ft_optimization: merge identical full-text queries; rename collection -> fulltext; fix bugs.

**************** !!! NOTE EVERYBODY: SYNTAX CHANGED !!! ********************
There are no COLLECTIONs any more. Full-text indexes are now created with the
keyword FULLTEXT, which is used the same way as UNIQUE.
parent 6236dfc7
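The identical-query merging named in the commit message can be modelled on its own. The following is a minimal, hypothetical C++ sketch, not the server code. MatchFunc, table_id, handler and merge_and_init() are made-up stand-ins for Item_func_match, its TABLE pointer, its FT_DOCLIST handle and the loop in setup_ftfuncs(). The sketch assumes, as eq() in this commit does, that two MATCH calls are identical when they use the same fulltext index on the same table with the same AGAINST() text. Like the committed loop, it links each earlier duplicate to a later one, so several identical MATCH calls form a chain and only the last one opens a search.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Item_func_match: only what eq() compares,
// plus the fields used for merging.
struct MatchFunc {
  unsigned key;               // fulltext index number chosen by fix_index()
  int table_id;               // stands in for the TABLE pointer
  std::string query;          // stands in for key_item(), the AGAINST() text
  MatchFunc* master = nullptr;
  bool first_call = true;
  int handler = -1;           // stands in for the FT_DOCLIST* search handle

  MatchFunc(unsigned k, int t, std::string q)
      : key(k), table_id(t), query(std::move(q)) {}

  bool eq(const MatchFunc& o) const {
    return key == o.key && table_id == o.table_id && query == o.query;
  }

  // Like init_search(): reuse the master's handle, or open a new search.
  void init_search(int& searches_opened) {
    if (!first_call) return;
    first_call = false;
    if (master) {
      master->init_search(searches_opened);
      handler = master->handler;
      return;
    }
    handler = searches_opened++;  // pretend ft_init_ext() was called here
  }
};

// Like the loop in setup_ftfuncs(): compare each MATCH with the ones before
// it; an earlier identical one that has no master yet gets this one as its
// master. Returns how many searches are actually opened.
int merge_and_init(std::vector<MatchFunc>& funcs) {
  for (std::size_t i = 0; i < funcs.size(); ++i)
    for (std::size_t j = 0; j < i; ++j)
      if (funcs[i].eq(funcs[j]) && !funcs[j].master)
        funcs[j].master = &funcs[i];

  int searches_opened = 0;
  for (MatchFunc& f : funcs) {
    f.init_search(searches_opened);
    assert(f.handler >= 0);  // every MATCH ends up with a handle
  }
  return searches_opened;
}
```

Take two identical MATCH calls plus one different call. merge_and_init() opens two searches instead of three, and the duplicate shares its master's handle. The vector must not be resized after merging, because the master links are raw pointers into it.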
......@@ -11670,7 +11670,6 @@ to restart @code{mysqld} with @code{--skip-grant-tables} to be able to run
* GRANT:: @code{GRANT} and @code{REVOKE} syntax
* CREATE INDEX:: @code{CREATE INDEX} syntax
* DROP INDEX:: @code{DROP INDEX} syntax
* CREATE COLLECTION:: @code{CREATE COLLECTION} syntax
* Comments:: Comment syntax
* CREATE FUNCTION:: @code{CREATE FUNCTION} syntax
* Reserved words:: Is @strong{MySQL} picky about reserved words?
......@@ -13436,9 +13435,9 @@ mysql> CREATE TABLE test (
For @code{BLOB} and @code{TEXT} columns, you must index a prefix of the
column; you cannot index the entire column.
In @strong{MySQL} 3.23.23 or later, you can also create special indexes
called @strong{collections}. They are used for full-text search. Only
@code{MyISAM} table type supports collections. Collection can be created
In @strong{MySQL} 3.23.23 or later, you can also create special
@strong{fulltext} indexes. They are used for full-text search. Only
@code{MyISAM} table type supports fulltext indexes. They can be created
only from @code{VARCHAR}, @code{BLOB}, and @code{TEXT} columns.
Indexing always happens over the entire column; partial indexing is not
supported. See @ref{MySQL full-text search} for details of operation.
......@@ -14165,7 +14164,7 @@ mysql> select STRCMP('text', 'text');
relevance - similarity measure between the text in columns
@code{(col1,col2,...)} and the query @code{expr}. Relevance is a
positive floating point number. Zero relevance means no similarity.
For @code{MATCH ... AGAINST()} to work, a @code{COLLECTION}
For @code{MATCH ... AGAINST()} to work, a @strong{fulltext index}
must be created first. @xref{CREATE TABLE, , @code{CREATE TABLE}}.
@code{MATCH ... AGAINST()} is available in @strong{MySQL} 3.23.23 or later.
For details and usage examples, see @ref{MySQL full-text search}.
......@@ -16178,7 +16177,7 @@ create_definition:
or KEY [index_name] (index_col_name,...)
or INDEX [index_name] (index_col_name,...)
or UNIQUE [INDEX] [index_name] (index_col_name,...)
or COLLECTION [collection_name] (collection_col_name,...)
or FULLTEXT [INDEX] [index_name] (index_col_name,...)
or [CONSTRAINT symbol] FOREIGN KEY index_name (index_col_name,...)
[reference_definition]
or CHECK (expr)
......@@ -16422,10 +16421,10 @@ When you use @code{ORDER BY} or @code{GROUP BY} with a @code{TEXT} or
@xref{BLOB, , @code{BLOB}}.
@item
In @strong{MySQL} 3.23.23 or later, you can also create special indexes
called @strong{collections}. They are used for full-text search. Only
@code{MyISAM} table type supports collections. Collection can be created
from any mix of @code{VARCHAR}, @code{BLOB}, and @code{TEXT} columns.
In @strong{MySQL} 3.23.23 or later, you can also create special
@strong{fulltext} indexes. They are used for full-text search. Only
@code{MyISAM} table type supports fulltext indexes. They can be created
only from @code{VARCHAR}, @code{BLOB}, and @code{TEXT} columns.
Indexing always happens over the entire column; partial indexing is not
supported. See @ref{MySQL full-text search} for details of operation.
......@@ -16598,7 +16597,7 @@ alter_specification:
or ADD INDEX [index_name] (index_col_name,...)
or ADD PRIMARY KEY (index_col_name,...)
or ADD UNIQUE [index_name] (index_col_name,...)
or ADD COLLECTION [collection_name] (collection_col_name,...)
or ADD FULLTEXT [index_name] (index_col_name,...)
or ALTER [COLUMN] col_name @{SET DEFAULT literal | DROP DEFAULT@}
or CHANGE [COLUMN] old_col_name create_definition
or MODIFY [COLUMN] create_definition
......@@ -19769,7 +19768,7 @@ dropped only with explicit @code{REVOKE} commands or by manipulating the
@section @code{CREATE INDEX} syntax
@example
CREATE [UNIQUE] INDEX index_name ON tbl_name (col_name[(length)],... )
CREATE [UNIQUE|FULLTEXT] INDEX index_name ON tbl_name (col_name[(length)],... )
@end example
The @code{CREATE INDEX} statement doesn't do anything in @strong{MySQL} prior
......@@ -19803,15 +19802,19 @@ which could save a lot of disk space and might also speed up @code{INSERT}
operations!
Note that you can only add an index on a column that can have @code{NULL}
values or on a @code{BLOB}/@code{TEXT} column if you are useing
values or on a @code{BLOB}/@code{TEXT} column if you are using
@strong{MySQL} version 3.23.2 or newer and are using the @code{MyISAM}
table type.
For more information about how @strong{MySQL} uses indexes, see
@ref{MySQL indexes, , @strong{MySQL} indexes}.
Fulltext indexes can index only @code{VARCHAR}, @code{BLOB}, and
@code{TEXT} columns, and only in @code{MyISAM} tables. Fulltext indexes
are available in @strong{MySQL} 3.23.23 and later. @xref{MySQL full-text search}.
@findex DROP INDEX
@node DROP INDEX, CREATE COLLECTION, CREATE INDEX, Reference
@node DROP INDEX, Comments, CREATE INDEX, Reference
@section @code{DROP INDEX} syntax
@example
......@@ -19824,30 +19827,8 @@ prior to version 3.22. In 3.22 or later, @code{DROP INDEX} is mapped to an
@code{ALTER TABLE} statement to drop the index.
@xref{ALTER TABLE, , @code{ALTER TABLE}}.
@findex CREATE COLLECTION
@node CREATE COLLECTION, Comments, DROP INDEX, Reference
@section @code{CREATE COLLECTION} syntax
@example
CREATE COLLECTION collection_name ON tbl_name (col_name,... )
@end example
@code{CREATE COLLECTION} statement is mapped to an
@code{ALTER TABLE} statement to create collections.
@xref{ALTER TABLE, , @code{ALTER TABLE}}.
A column list of the form @code{(col1,col2,...)} creates a
multiple-column collection. Search in such a collection means a search
over the concatenated columns that comprise the collection.
There is no special @code{DROP COLLECTION} statement.
@code{DROP INDEX} should be used to drop collections instead.
Only @code{VARCHAR}, @code{BLOB}, and @code{TEXT} columns can be part
of the collection. See @ref{MySQL full-text search} for details of operation.
@findex Comment syntax
@node Comments, CREATE FUNCTION, CREATE COLLECTION, Reference
@node Comments, CREATE FUNCTION, DROP INDEX, Reference
@section Comment syntax
The @strong{MySQL} server supports the @code{# to end of line}, @code{--
......@@ -34106,14 +34087,14 @@ DELAYED} threads.
@section MySQL full-text search
Since version 3.23.23, @strong{MySQL} has support for full-text indexing
and searching. Full-text index in @strong{MySQL} is a special type of index
called @strong{collection}. Collections can be created from @code{VARCHAR},
@code{TEXT}, and @code{BLOB} columns at @code{CREATE TABLE}
time or added later with @code{ALTER TABLE} or @code{CREATE COLLECTION}.
Collection is queried with @code{MATCH} function.
and searching. A full-text index in @strong{MySQL} is an
index of type @code{FULLTEXT}. Fulltext indexes can be created from
@code{VARCHAR}, @code{TEXT}, and @code{BLOB} columns at
@code{CREATE TABLE} time or added later with @code{ALTER TABLE} or
@code{CREATE INDEX}. Full-text search is performed with the @code{MATCH} function.
@example
mysql> CREATE TABLE t (a VARCHAR(200), b TEXT, COLLECTION (a,b));
mysql> CREATE TABLE t (a VARCHAR(200), b TEXT, FULLTEXT (a,b));
Query OK, 0 rows affected (0.00 sec)
mysql> INSERT INTO t VALUES
-> ('MySQL has now support', 'for full-text search'),
......@@ -34145,15 +34126,16 @@ mysql> SELECT *,MATCH a,b AGAINST ('collections support') as x FROM t;
@end example
Function @code{MATCH} matches a natural language query @code{AGAINST} a
text collection. For every row in a table it returns relevance -
text collection (which is simply the set of columns covered
by the fulltext index). For every row in a table it returns the relevance, a
similarity measure between the text in that row (in the columns that
are part of the collection) and the query. When it used in a @code{WHERE}
clause (see example above) the rows returned are automatically sorted
with relevance decreasing. Relevance is a non-negative floating point
number. Zero relevance means no similarity. Relevance is computed based
on number of words in the row and number of unique words in that row,
total number of words in the collection, number of documents (rows),
that contain a particular word, etc.
are part of the collection) and the query. When it is used in a
@code{WHERE} clause (see example above), the rows returned are
automatically sorted by decreasing relevance. Relevance is a
non-negative floating point number. Zero relevance means no similarity.
Relevance is computed based on the number of words in the row, the number
of unique words in that row, the total number of words in the collection,
the number of documents (rows) that contain a particular word, etc.
MySQL uses a very simple parser to split text into words. "Word" is
any sequence of letters, numbers, @code{'}, and @code{_}. Any "word"
......@@ -158,6 +158,7 @@ FT_DOCLIST * ft_init_search(void *info, uint keynr, byte *key,
ALL_IN_ONE aio;
FT_DOCLIST *dlist;
FT_DOC *dptr;
my_off_t saved_lastpos;
/* black magic ON */
if ((int) (keynr = _mi_check_index((MI_INFO *)info,keynr)) < 0)
......@@ -173,6 +174,8 @@ FT_DOCLIST * ft_init_search(void *info, uint keynr, byte *key,
aio.keyinfo=aio.info->s->keyinfo+keynr;
aio.key_root=aio.info->s->state.key_root[keynr];
saved_lastpos=aio.info->lastpos;
if(!(wtree=ft_parse(NULL,key,key_len))) return NULL;
init_tree(&aio.dtree,0,sizeof(FT_SUPERDOC),(qsort_cmp)&FT_SUPERDOC_cmp,0,
......@@ -199,6 +202,7 @@ FT_DOCLIST * ft_init_search(void *info, uint keynr, byte *key,
}
err:
aio.info->lastpos=saved_lastpos;
delete_tree(&aio.dtree);
delete_tree(wtree);
free(wtree);
......@@ -217,7 +221,8 @@ int ft_read_next(FT_DOCLIST *handler, char *record)
info->update&= (HA_STATE_CHANGED | HA_STATE_ROW_CHANGED);
if (!(*info->read_record)(info,handler->doc[handler->curdoc].dpos,record))
info->lastpos=handler->doc[handler->curdoc].dpos;
if (!(*info->read_record)(info,info->lastpos,record))
{
info->update|= HA_STATE_AKTIV; /* Record is read */
return 0;
......
......@@ -1141,7 +1141,13 @@ int mi_repair(MI_CHECK *param, register MI_INFO *info,
for (i=0 ; i < share->state.header.max_block_size ; i++)
share->state.key_del[i]= HA_OFFSET_ERROR;
share->state.key_map= ((ulonglong)1L << share->base.keys)-1; /* Should I ? */
/* I think mi_repair and mi_repair_by_sort should do the same
(according, e.g. to ha_myisam::repair), but as mi_repair doesn't
touch key_map it cannot be used to T_CREATE_MISSING_KEYS.
That is the next line for... (serg)
*/
share->state.key_map= ((ulonglong)1L << share->base.keys)-1;
info->state->key_file_length=share->base.keystart;
......@@ -1935,6 +1941,11 @@ static int sort_key_read(SORT_INFO *sort_info, void *key)
"Found too many records; Can`t continue");
DBUG_RETURN(1);
}
/* Hmm, repair_by_sort uses find_all_keys, and find_all_keys strictly
implies "one row - one key per keynr", while for ft_key one row/keynr
can produce as many keys as the number of unique words in the text.
That's why I disabled repair_by_sort for ft-keys. (serg)
*/
if (sort_info->keyinfo->flag & HA_FULLTEXT )
{
mi_check_print_error(sort_info->param,
......@@ -3009,6 +3020,13 @@ my_bool mi_test_if_sort_rep(MI_INFO *info, ha_rows rows)
return FALSE; /* Can't use sort */
for (i=0 ; i < share->base.keys ; i++,key++)
{
/* It's to disable repair_by_sort for ft-keys.
Another solution would be to make ft-keys just too_big_key_for_sort,
but then they won't be disabled by deactivate_non_unique_index
and so they will be created at the first stage. As ft-key creation
is a very time-consuming process, it's better to leave it to the repair stage,
but this repair shouldn't be repair_by_sort (serg)
*/
if (mi_too_big_key_for_sort(key,rows) || (key->flag & HA_FULLTEXT))
return FALSE;
}
......
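The mi_check.c comments above explain that a fulltext index breaks the "one row, one key per keynr" assumption behind repair_by_sort. The rough C++ sketch below shows why, under stated assumptions. ft_unique_words() and keys_for_row() are hypothetical helpers. Words are split by the rule the manual gives (letters, digits, ' and _). Lower-casing stands in for case-insensitive comparison. Any stopword or minimum-length filtering the real MyISAM parser may apply is left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical word splitter: a word is a run of letters, digits, '\'' and
// '_'. Words are lower-cased so "MySQL" and "mysql" count as one key
// (an assumption standing in for case-insensitive comparison).
std::set<std::string> ft_unique_words(const std::string& text) {
  std::set<std::string> words;
  std::string cur;
  for (unsigned char c : text) {
    if (std::isalnum(c) || c == '\'' || c == '_') {
      cur += static_cast<char>(std::tolower(c));
    } else if (!cur.empty()) {
      words.insert(cur);
      cur.clear();
    }
  }
  if (!cur.empty()) words.insert(cur);
  return words;
}

// A regular index yields exactly one key per row. A fulltext index yields
// one key per unique word, so the number of keys per row is not fixed.
std::size_t keys_for_row(const std::string& text, bool fulltext) {
  return fulltext ? ft_unique_words(text).size() : 1;
}
```

Take the manual's example text "MySQL has now support for full-text search". Under these assumptions it produces eight fulltext keys, because "full-text" splits into "full" and "text". A regular index on the same row produces one key.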
......@@ -1837,26 +1837,8 @@ longlong Item_func_inet_aton::val_int()
double Item_func_match::val()
{
my_off_t docid=table->file->row_position(); // HAVE to do it here...
if (first_call)
{
if (join_key=(table->file->get_index() == key &&
(ft_handler=(FT_DOCLIST *)table->file->ft_handler)))
;
else
{
/* join won't use this ft-key, but we must to init it anyway */
String *ft_tmp=0;
char tmp1[FT_QUERY_MAXLEN];
String tmp2(tmp1,sizeof(tmp1));
ft_tmp=key_item()->val_str(&tmp2);
ft_handler=(FT_DOCLIST *)
table->file->ft_init_ext(key, (byte*) ft_tmp->ptr(), ft_tmp->length());
}
first_call=0;
}
init_search();
// Don't know how to return an error from val(), so NULL will be returned
if ((null_value=(ft_handler==NULL)))
......@@ -1873,6 +1855,7 @@ double Item_func_match::val()
int a,b,c;
FT_DOC *docs=ft_handler->doc;
my_off_t docid=table->file->row_position();
if ((null_value=(docid==HA_OFFSET_ERROR)))
return 0.0;
......@@ -1893,6 +1876,36 @@ double Item_func_match::val()
}
}
void Item_func_match::init_search()
{
if (!first_call)
return;
first_call=false;
if (master)
{
master->init_search();
ft_handler=master->ft_handler;
join_key=master->join_key;
return;
}
if (join_key)
{
ft_handler=((FT_DOCLIST *)table->file->ft_handler);
return;
}
/* join won't use this ft-key, but we must init it anyway */
String *ft_tmp=0;
char tmp1[FT_QUERY_MAXLEN];
String tmp2(tmp1,sizeof(tmp1));
ft_tmp=key_item()->val_str(&tmp2);
ft_handler=(FT_DOCLIST *)
table->file->ft_init_ext(key, (byte*) ft_tmp->ptr(), ft_tmp->length());
}
bool Item_func_match::fix_fields(THD *thd,struct st_table_list *tlist)
{
List_iterator<Item> li(fields);
......@@ -1982,6 +1995,24 @@ bool Item_func_match::fix_index()
this->key=max_key;
first_call=1;
maybe_null=1;
join_key=0;
return 0;
}
bool Item_func_match::eq(const Item *item) const
{
if (item->type() != FUNC_ITEM)
return 0;
if (func_name() != ((Item_func*)item)->func_name())
return 0;
Item_func_match *ifm=(Item_func_match*) item;
if (key == ifm->key && table == ifm->table &&
key_item()->eq(ifm->key_item()))
return 1;
return 0;
}
......
......@@ -839,19 +839,28 @@ class Item_func_match :public Item_real_func
TABLE *table;
uint key;
bool first_call, join_key;
Item_func_match *master;
FT_DOCLIST *ft_handler;
Item_func_match(List<Item> &a, Item *b): Item_real_func(b),
fields(a), table(0), ft_handler(0)
{}
~Item_func_match() { ft_close_search(ft_handler);
if(join_key) table->file->ft_handler=0; }
fields(a), table(0), ft_handler(0), master(0) {}
~Item_func_match()
{
if (!master)
{
ft_close_search(ft_handler);
if(join_key)
table->file->ft_handler=0;
}
}
const char *func_name() const { return "match"; }
enum Functype functype() const { return FT_FUNC; }
void update_used_tables() {}
bool fix_fields(THD *thd,struct st_table_list *tlist);
bool fix_index();
bool eq(const Item *) const;
double val();
longlong val_int() { return val()!=0.0; }
bool fix_index();
void init_search();
};
......@@ -81,7 +81,6 @@ static SYMBOL symbols[] = {
{ "CHANGED", SYM(CHANGED),0,0},
{ "CHECK", SYM(CHECK_SYM),0,0},
{ "CHECKSUM", SYM(CHECKSUM_SYM),0,0},
{ "COLLECTION", SYM(COLLECTION),0,0},
{ "COLUMN", SYM(COLUMN_SYM),0,0},
{ "COLUMNS", SYM(COLUMNS),0,0},
{ "COMMENT", SYM(COMMENT_SYM),0,0},
......@@ -142,6 +141,7 @@ static SYMBOL symbols[] = {
{ "FROM", SYM(FROM),0,0},
{ "FOR", SYM(FOR_SYM),0,0},
{ "FULL", SYM(FULL),0,0},
{ "FULLTEXT", SYM(FULLTEXT_SYM),0,0},
{ "FUNCTION", SYM(UDF_SYM),0,0},
{ "GRANT", SYM(GRANT),0,0},
{ "GRANTS", SYM(GRANTS),0,0},
......
......@@ -2172,18 +2172,22 @@ bool remove_table_from_cache(THD *thd, const char *db,const char *table_name)
DBUG_RETURN(result);
}
/*
Will be used for ft-query optimization someday.
SerG.
*/
int setup_ftfuncs(THD *thd,TABLE_LIST *tables, List<Item_func_match> &ftfuncs)
{
List_iterator<Item_func_match> li(ftfuncs);
Item_func_match *ftf;
List_iterator<Item_func_match> li(ftfuncs), li2(ftfuncs);
Item_func_match *ftf, *ftf2;
while ((ftf=li++))
{
if (ftf->fix_index())
return 1;
li2.rewind();
while ((ftf2=li2++) != ftf)
{
if (ftf->eq(ftf2) && !ftf2->master)
ftf2->master=ftf;
}
}
return 0;
}
......@@ -1284,11 +1284,11 @@ add_ft_keys(DYNAMIC_ARRAY *keyuse_array,
KEYUSE keyuse;
keyuse.table= cond_func->table;
keyuse.val = cond_func->key_item();
keyuse.val = cond_func;
keyuse.key = cond_func->key;
#define FT_KEYPART (MAX_REF_PARTS+10)
keyuse.keypart=FT_KEYPART;
keyuse.used_tables=keyuse.val->used_tables();
keyuse.used_tables=cond_func->key_item()->used_tables();
VOID(insert_dynamic(keyuse_array,(gptr) &keyuse));
}
......@@ -1670,7 +1670,7 @@ find_best(JOIN *join,table_map rest_tables,uint idx,double record_count,
else
tmp=best_time; // Do nothing
}
} /* not ftkey */
} /* not ft_key */
if (tmp < best_time - records/(double) TIME_FOR_COMPARE)
{
best_time=tmp + records/(double) TIME_FOR_COMPARE;
......@@ -1882,9 +1882,12 @@ get_best_combination(JOIN *join)
keyinfo=table->key_info+key;
if (ftkey)
{
ft_tmp=keyuse->val->val_str(&tmp2);
Item_func_match *ifm=(Item_func_match *)keyuse->val;
ft_tmp=ifm->key_item()->val_str(&tmp2);
length=ft_tmp->length();
keyparts=1;
ifm->join_key=1;
}
else
{
......@@ -1924,7 +1927,7 @@ get_best_combination(JOIN *join)
byte *key_buff=j->ref.key_buff;
if (ftkey)
{
j->ref.items[0]=keyuse->val;
j->ref.items[0]=((Item_func*)(keyuse->val))->key_item();
if (!keyuse->used_tables &&
!(join->select_options & SELECT_DESCRIBE))
{
......
......@@ -137,7 +137,6 @@ bool my_yyoverflow(short **a, YYSTYPE **b,int *yystacksize);
%token CHANGED_FILES
%token CHECKSUM_SYM
%token CHECK_SYM
%token COLLECTION
%token COLUMNS
%token COLUMN_SYM
%token CONSTRAINT
......@@ -162,6 +161,7 @@ bool my_yyoverflow(short **a, YYSTYPE **b,int *yystacksize);
%token FOREIGN
%token FROM
%token FULL
%token FULLTEXT_SYM
%token GRANT
%token GRANTS
%token GREATEST_SYM
......@@ -457,7 +457,7 @@ bool my_yyoverflow(short **a, YYSTYPE **b,int *yystacksize);
expr_list udf_expr_list when_list ident_list
%type <key_type>
key_type opt_unique
key_type opt_unique_or_fulltext
%type <string_list>
key_usage_list
......@@ -628,7 +628,7 @@ create:
}
create2
| CREATE opt_unique INDEX ident ON table_ident
| CREATE opt_unique_or_fulltext INDEX ident ON table_ident
{
Lex->sql_command= SQLCOM_CREATE_INDEX;
if (!add_table_to_list($6,NULL))
......@@ -643,21 +643,6 @@ create:
Lex->key_list.push_back(new Key($2,$4.str,Lex->col_list));
Lex->col_list.empty();
}
| CREATE COLLECTION ident ON table_ident
{
Lex->sql_command= SQLCOM_CREATE_INDEX;
if (!add_table_to_list($5,NULL))
YYABORT;
Lex->create_list.empty();
Lex->key_list.empty();
Lex->col_list.empty();
Lex->change=NullS;
}
'(' key_list ')'
{
Lex->key_list.push_back(new Key(Key::FULLTEXT,$3.str,Lex->col_list));
Lex->col_list.empty();
}
| CREATE DATABASE opt_if_not_exists ident
{
Lex->sql_command=SQLCOM_CREATE_DB;
......@@ -964,7 +949,8 @@ delete_option:
key_type:
opt_constraint PRIMARY_SYM KEY_SYM { $$= Key::PRIMARY; }
| key_or_index { $$= Key::MULTIPLE; }
| COLLECTION { $$= Key::FULLTEXT; }
| FULLTEXT_SYM { $$= Key::FULLTEXT; }
| FULLTEXT_SYM key_or_index { $$= Key::FULLTEXT; }
| opt_constraint UNIQUE_SYM { $$= Key::UNIQUE; }
| opt_constraint UNIQUE_SYM key_or_index { $$= Key::UNIQUE; }
......@@ -976,9 +962,10 @@ keys_or_index:
KEYS {}
| INDEX {}
opt_unique:
opt_unique_or_fulltext:
/* empty */ { $$= Key::MULTIPLE; }
| UNIQUE_SYM { $$= Key::UNIQUE; }
| FULLTEXT_SYM { $$= Key::FULLTEXT; }
key_list:
key_list ',' key_part order_dir { Lex->col_list.push_back($3); }
......@@ -2443,7 +2430,6 @@ keyword:
| WORK_SYM {}
| YEAR_SYM {}
| SLAVE {}
| COLLECTION {}
/* Option functions */
......
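The grammar change replaces opt_unique with opt_unique_or_fulltext, so CREATE INDEX now accepts FULLTEXT the same way it accepts UNIQUE. The small C++ sketch below models that rule's mapping from the optional modifier to a key type. KeyType and index_kind() are hypothetical names. The real parser works on tokens (UNIQUE_SYM, FULLTEXT_SYM) produced by the lexer rather than on strings.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical mirror of the Key::MULTIPLE / UNIQUE / FULLTEXT values.
enum class KeyType { MULTIPLE, UNIQUE, FULLTEXT };

// Models opt_unique_or_fulltext: no modifier -> MULTIPLE,
// UNIQUE -> UNIQUE, FULLTEXT -> FULLTEXT. Keywords are matched
// case-insensitively, as SQL keywords are.
std::optional<KeyType> index_kind(std::string modifier) {
  std::transform(modifier.begin(), modifier.end(), modifier.begin(),
                 [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
  if (modifier.empty()) return KeyType::MULTIPLE;
  if (modifier == "UNIQUE") return KeyType::UNIQUE;
  if (modifier == "FULLTEXT") return KeyType::FULLTEXT;
  return std::nullopt;  // anything else, e.g. the removed COLLECTION
}
```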