manual.texi revisions to FULLTEXT section.

manual.texi other miscellaneous cleanups. manual.texi fix missing word Docs/manual.texi: revisions to FULLTEXT section. other miscellaneous cleanups.

67958804 · unknown · 85fd8dda · 67958804
Commit 67958804 authored Mar 20, 2002 by unknown

Showing with 112 additions and 96 deletions

Docs/manual.texi Docs/manual.texi +112 -96

--- a/Docs/manual.texi
+++ b/Docs/manual.texi
@@ -33990,8 +33990,8 @@ DELETE FROM t1,t2 USING t1,t2,t3 WHERE t1.id=t2.id AND t2.id=t3.id
 In the above case we delete matching rows just from tables @code{t1} and
 @code{t2}.
-@code{ORDER BY} and using multiple tables in the @code{DELETE} is supported
+@code{ORDER BY} and using multiple tables in the @code{DELETE} statement
-in MySQL 4.0.
+is supported in MySQL 4.0.
 If an @code{ORDER BY} clause is used, the rows will be deleted in that order.
 This is really only useful in conjunction with @code{LIMIT}.  For example:
@@ -35947,16 +35947,17 @@ You can set the default isolation level for @code{mysqld} with
 @cindex full-text search
 @cindex FULLTEXT
-Since Version 3.23.23, MySQL has support for full-text indexing
+As of Version 3.23.23, MySQL has support for full-text indexing
 and searching.  Full-text indexes in MySQL are an index of type
 @code{FULLTEXT}.  @code{FULLTEXT} indexes can be created from @code{VARCHAR}
 and @code{TEXT} columns at @code{CREATE TABLE} time or added later with
-@code{ALTER TABLE} or @code{CREATE INDEX}.  For large datasets, adding
+@code{ALTER TABLE} or @code{CREATE INDEX}.  For large datasets, it will be
-@code{FULLTEXT} index with @code{ALTER TABLE} (or @code{CREATE INDEX})
+much faster to load your data into a table that has no @code{FULLTEXT}
-would be much faster than inserting rows into the empty table that has
+index, then create the index with @code{ALTER TABLE} (or @code{CREATE
-a @code{FULLTEXT}  index.
+INDEX}).  Loading data into a table that already has a @code{FULLTEXT}
+index will be slower.
-Full-text search is performed with the @code{MATCH} function.
+Full-text searching is performed with the @code{MATCH()} function.
 @example
 mysql> CREATE TABLE articles (
@@ -35988,24 +35989,35 @@ mysql> SELECT * FROM articles
 2 rows in set (0.00 sec)
 @end example
-The function @code{MATCH} matches a natural language (or boolean,
+The @code{MATCH()} function performs a natural language search for a string
-see below) query in case-insensitive fashion @code{AGAINST}
+against a text collection (a set of one or more columns included in
-a text collection (which is simply the set of columns covered by a
+a @code{FULLTEXT} index).  The search string is given as the argument to
-@code{FULLTEXT} index).  For every row in a table it returns relevance -
+@code{AGAINST()}.  The search is performed in case-insensitive fashion.
-a similarity measure between the text in that row (in the columns that are
+For every row in the table, @code{MATCH()} returns a relevance value,
-part of the collection) and the query.  When it is used in a @code{WHERE}
+that is, a similarity measure between the search string and the text in
-clause (see example above) the rows returned are automatically sorted with
+that row in the columns named in the @code{MATCH()} list.
-relevance decreasing.  Relevance is a non-negative floating-point number.
-Zero relevance means no similarity.  Relevance is computed based on the
-number of words in the row, the number of unique words in that row, the
-total number of words in the collection, and the number of documents (rows)
-that contain a particular word.
-The above is a basic example of using @code{MATCH} function. Rows are
+When @code{MATCH()} is used in a @code{WHERE} clause (see example above)
-returned with relevance decreasing.
+the rows returned are automatically sorted with highest relevance first.
+Relevance values are non-negative floating-point numbers.  Zero relevance
+means no similarity.  Relevance is computed based on the number of words
+in the row, the number of unique words in that row, the total number of
+words in the collection, and the number of documents (rows) that contain
+a particular word.
+It is also possible to perform a boolean mode search.  This is explained
+later in the section.
+The preceding example is a basic illustration showing how to use the
+@code{MATCH()} function. Rows are returned in order of decreasing
+relevance.
+The next example shows how to retrieve the relevance values explicitly.
+As neither @code{WHERE} nor @code{ORDER BY} clauses are present, returned
+rows are not ordered.
 @example
-mysql> SELECT id,MATCH title,body AGAINST ('Tutorial') FROM articles;
+mysql> SELECT id,MATCH (title,body) AGAINST ('Tutorial') FROM articles;
 +----+-----------------------------------------+
 | id | MATCH (title,body) AGAINST ('Tutorial') |
 +----+-----------------------------------------+
@@ -36019,12 +36031,16 @@ mysql> SELECT id,MATCH title,body AGAINST ('Tutorial') FROM articles;
 6 rows in set (0.00 sec)
 @end example
-This example shows how to retrieve the relevances. As neither @code{WHERE}
+The following example is more complex.  The query returns the relevance
-nor @code{ORDER BY} clauses are present, returned rows are not ordered.
+and still sorts the rows in order of decreasing relevance. To achieve
+this result, you should specify @code{MATCH()} twice. This will cause no
+additional overhead, because the MySQL optimiser will notice that the
+two @code{MATCH()} calls are identical and invoke the full-text search
+code only once.
 @example
-mysql> SELECT id, body, MATCH title,body AGAINST (
+mysql> SELECT id, body, MATCH (title,body) AGAINST
-    -> 'Security implications of running MySQL as root') AS score
+    -> ('Security implications of running MySQL as root') AS score
    -> FROM articles WHERE MATCH (title,body) AGAINST
    -> ('Security implications of running MySQL as root');
 +----+-------------------------------------+-----------------+
@@ -36036,18 +36052,12 @@ mysql> SELECT id, body, MATCH title,body AGAINST (
 2 rows in set (0.00 sec)
 @end example
-This is more complex example - the query returns the relevance and still
+MySQL uses a very simple parser to split text into words.  A ``word''
-sorts the rows with relevance decreasing. To achieve it one should specify
+is any sequence of characters consisting of letters, numbers, @samp{'},
-@code{MATCH} twice. Note, that this will cause no additional overhead, as
+and @samp{_}.  Any ``word'' that is present in the stopword list or is just
-MySQL optimiser will notice that these two @code{MATCH} calls are
+too short (3 characters or less) is ignored.
-identical and will call full-text search code only once.
-MySQL uses a very simple parser to split text into words.  A
+Every correct word in the collection and in the query is weighted
-``word'' is any sequence of letters, numbers, @samp{'}, and @samp{_}.  Any
-``word'' that is present in the stopword list or just too short (3
-characters or less) is ignored.
-Every correct word in the collection and in the query is weighted,
 according to its significance in the query or collection.  This way, a
 word that is present in many documents will have lower weight (and may
 even have a zero weight), because it has lower semantic value in this
@@ -36057,28 +36067,28 @@ relevance of the row.
 Such a technique works best with large collections (in fact, it was
 carefully tuned this way).  For very small tables, word distribution
-does not reflect adequately their semantical value, and this model
+does not reflect adequately their semantic value, and this model
-may sometimes produce bisarre results.
+may sometimes produce bizarre results.
 @example
 mysql> SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('MySQL');
 Empty set (0.00 sec)
 @end example
-Search for the word @code{MySQL} produces no results in the above example.
+The search for the word @code{MySQL} produces no results in the above
-Word @code{MySQL} is present in more than half of rows, and as such, is
+example, because that word is present in more than half of the rows.  As such,
-effectively treated as a stopword (that is, with semantical value zero).
+it is effectively treated as a stopword (that is, a word with zero semantic
-It is, really, the desired behavior - a natural language query should not
+value).  This is the most desirable behavior -- a natural language query
-return every second row in 1GB table.
+should not return every second row from a 1GB table.
 A word that matches half of rows in a table is less likely to locate relevant
 documents.  In fact, it will most likely find plenty of irrelevant documents.
 We all know this happens far too often when we are trying to find something on
 the Internet with a search engine.  It is with this reasoning that such rows
-have been assigned a low semantical value in @strong{this particular dataset}.
+have been assigned a low semantic value in @strong{this particular dataset}.
-Since version 4.0.1 MySQL can also perform boolean fulltext searches using
+As of Version 4.0.1, MySQL can also perform boolean full-text searches using
-@code{IN BOOLEAN MODE} modifier.
+the @code{IN BOOLEAN MODE} modifier.
 @example
 mysql> SELECT * FROM articles WHERE MATCH (title,body)
@@ -36095,38 +36105,44 @@ mysql> SELECT * FROM articles WHERE MATCH (title,body)
 @end example
 This query retrieved all the rows that contain the word @code{MySQL}
-(note: 50% threshold is gone), but does @strong{not} contain the word
+(note: the 50% threshold is not used), but that do @strong{not} contain
-@code{YourSQL}.  Note, that it does not auto-magically sort rows in
+the word @code{YourSQL}.  Note that a boolean mode search does not
-decreasing relevance order (the last row has the highest relevance,
+auto-magically sort rows in order of decreasing relevance.  You can
-as it contains @code{MySQL} twice). Boolean fulltext search can also
+see this from the result of the preceding query, where the row with the
-work even without @code{FULLTEXT} index, but it would be @strong{slow}.
+highest relevance (the one that contains @code{MySQL} twice) is listed
+last, not first.  A boolean full-text search can also work even without
+a @code{FULLTEXT} index, although it would be @strong{slow}.
-Boolean fulltext search supports the following operators:
+The boolean full-text search capability supports the following operators:
 @table @code
 @item +
-A plus sign prepended to a word indicates that this word @strong{must be}
+A leading plus sign indicates that this word @strong{must be}
 present in every row returned.
 @item -
-A minus sign prepended to a word indicates that this word @strong{must not}
+A leading minus sign indicates that this word @strong{must not be}
-be present in the rows returned.
+present in any row returned.
 @item
-By default - without plus or minus - the word is optional, but the rows that
+By default (when neither plus nor minus is specified) the word is optional,
-contain it will be rated higher. This mimicks the behaviour of
+but the rows that contain it will be rated higher. This mimics the
-@code{MATCH ... AGAINST()} without @code{IN BOOLEAN MODE} modifier.
+behaviour of @code{MATCH() ... AGAINST()} without the @code{IN BOOLEAN
+MODE} modifier.
 @item < >
-These two operators are used to increase and decrease word's contribution
+These two operators are used to change a word's contribution to the
-to the relevance value, assigned to a row. See an example below.
+relevance value that is assigned to a row.  The @code{<} operator
+decreases the contribution and the @code{>} operator increases it.
+See the example below.
 @item ( )
-Parentheses are used - as usual - to group words into subexpressions.
+Parentheses are used to group words into subexpressions.
 @item ~
-This is negation operator. It makes word's contribution to the row
+A leading tilde acts as a negation operator, causing the word's
-relevance negative. It's useful for marking noise words. A row that has
+contribution to the row relevance to be negative. It's useful for marking
-such a word will be rated lower than others, but will not be excluded
+noise words. A row that contains such a word will be rated lower than
-altogether, as with @code{-} operator.
+others, but will not be excluded altogether, as it would be with the
+@code{-} operator.
 @item *
-This is truncation operator. Unlike others it should be @strong{appended}
+An asterisk is the truncation operator. Unlike the other operators, it
-to the word, not prepended.
+should be @strong{appended} to the word, not prepended.
 @end table
 And here are some examples:
@@ -36148,25 +36164,25 @@ order), but rank ``apple pie'' higher than ``apple strudel''.
 @end table
 @menu
-* Fulltext Restrictions::       Fulltext Restrictions
+* Fulltext Restrictions::       Full-text Restrictions
 * Fulltext Fine-tuning::        Fine-tuning MySQL Full-text Search
 * Fulltext TODO::               Full-text Search TODO
 @end menu
 @node Fulltext Restrictions, Fulltext Fine-tuning, Fulltext Search, Fulltext Search
-@subsection Fulltext Restrictions
+@subsection Full-text Restrictions
 @itemize @bullet
 @item
-All parameters to the @code{MATCH} function must be columns from the
+All parameters to the @code{MATCH()} function must be columns from the
-same table that is part of the same fulltext index, unless this
+same table that are part of the same @code{FULLTEXT} index, unless the
-@code{MATCH} is @code{IN BOOLEAN MODE}.
+@code{MATCH()} is @code{IN BOOLEAN MODE}.
 @item
-Column list between @code{MATCH} and @code{AGAINST} must match exactly
+The @code{MATCH()} column list must exactly match the column list in some
-a column list in the @code{FULLTEXT} index definition, unless this
+@code{FULLTEXT} index definition for the table, unless this @code{MATCH()}
-@code{MATCH} is @code{IN BOOLEAN MODE}.
+is @code{IN BOOLEAN MODE}.
 @item
-The argument to @code{AGAINST} must be a constant string.
+The argument to @code{AGAINST()} must be a constant string.
 @end itemize
@@ -36176,7 +36192,7 @@ The argument to @code{AGAINST} must be a constant string.
 Unfortunately, full-text search has few user-tunable parameters yet,
 although adding some is very high on the TODO. If you have a
 MySQL source distribution (@pxref{Installing source}), you can
-more control on the full-text search behavior.
+exert more control over full-text searching behavior.
 Note that full-text search was carefully tuned for the best searching
 effectiveness.  Modifying the default behavior will, in most cases,
@@ -36186,37 +36202,37 @@ unless you know what you are doing!
 @itemize @bullet
 @item
-Minimal length of word to be indexed is defined by MySQL
+The minimum length of words to be indexed is defined by the MySQL
 variable @code{ft_min_word_length}. @xref{SHOW VARIABLES}.
 Change it to the value you prefer, and rebuild
 your @code{FULLTEXT} indexes.
 @item
 The stopword list is defined in @file{myisam/ft_static.c}
-Modify it to your taste, recompile MySQL and rebuild
+Modify it to your taste, recompile MySQL, and rebuild
 your @code{FULLTEXT} indexes.
 @item
-The 50% threshold is caused by the particular weighting scheme chosen. To
+The 50% threshold is determined by the particular weighting scheme chosen.
-disable it, change the following line in @file{myisam/ftdefs.h}:
+To disable it, change the following line in @file{myisam/ftdefs.h}:
 @example
 #define GWS_IN_USE GWS_PROB
 @end example
-to
+To:
 @example
 #define GWS_IN_USE GWS_FREQ
 @end example
-and recompile MySQL.
+Then recompile MySQL.
 There is no need to rebuild the indexes in this case.
-@strong{Note:} by doing this you @strong{severely} decrease MySQL ability
+@strong{Note:} by doing this you @strong{severely} decrease MySQL's ability
-to provide adequate relevance values by @code{MATCH} function.
+to provide adequate relevance values for the @code{MATCH()} function.
-It means, that if you really need to search for such a common words,
+If you really need to search for such common words, it would be better to
-then you should rather search @code{IN BOOLEAN MODE}, which does not
+search using @code{IN BOOLEAN MODE} instead, which does not observe the 50%
-has 50% threshold.
+threshold.
 @item
-Sometimes search engine maintaner would like to change operators used
+Sometimes the search engine maintainer would like to change the operators used
-for boolean fulltext search. They are defined by a
+for boolean full-text searches. These are defined by the
 @code{ft_boolean_syntax} variable. @xref{SHOW VARIABLES}.
 Still, this variable is read-only, its value is set in
 @file{myisam/ft_static.c}.
@@ -36237,7 +36253,7 @@ the user wants to treat as words, examples are "C++", "AS/400", "TCP/IP", etc.
 @item Support for multi-byte charsets.
 @item Make stopword list to depend of the language of the data.
 @item Stemming (dependent of the language of the data, of course).
-@item Generic user-supplyable UDF (?) preparser.
+@item Generic user-suppliable UDF (?) preparser.
 @item Make the model more flexible (by adding some adjustable
 parameters to @code{FULLTEXT} in @code{CREATE/ALTER TABLE}).
 @end itemize
@@ -49697,7 +49713,7 @@ Fixed bug with @code{LOCK TABLE} and BDB tables.
 @itemize @bullet
 @item
-Fixed a bug when using @code{MATCH} in @code{HAVING} clause.
+Fixed a bug when using @code{MATCH()} in @code{HAVING} clause.
 @item
 Fixed a bug when using @code{HEAP} tables with @code{LIKE}.
 @item
@@ -50266,7 +50282,7 @@ that caused @code{mysql_install_db} to core dump on some Linux machines.
 @item
 Changed @code{mi_create()} to use less stack space.
 @item
-Fixed bug with optimiser trying to over-optimise @code{MATCH} when used
+Fixed bug with optimiser trying to over-optimise @code{MATCH()} when used
 with @code{UNIQUE} key.
 @item
 Changed @code{crash-me} and the MySQL benchmarks to also work
@@ -50722,7 +50738,7 @@ More variables in @code{SHOW SLAVE STATUS} and @code{SHOW MASTER STATUS}.
 @item
 @code{SLAVE STOP} now will not return until the slave thread actually exits.
 @item
-Full text search via the @code{MATCH} function and @code{FULLTEXT} index type
+Full text search via the @code{MATCH()} function and @code{FULLTEXT} index type
 (for MyISAM files).  This makes @code{FULLTEXT} a reserved word.
 @end itemize