Bug#17217128 : BAD INTERACTION BETWEEN MIN/MAX AND

"HAVING SUM(DISTINCT)": WRONG RESULTS. ISSUE: ------ If a query uses loose index scan and it has both AGG(DISTINCT) and MIN()/MAX()functions. Then, result values of MIN/MAX() is set improperly. When query has AGG(DISTINCT) then end_select is set to end_send_group. "end_send_group" keeps doing aggregation until it sees a record from next group. And, then it will send out the result row of that group. Since query also has MIN()/MAX() and loose index scan is used, values of MIN/MAX() are set as part of loose index scan itself. Setting MIN()/MAX() values as part of loose index scan overwrites values computed in end_send_group. This caused invalid result. For such queries to work loose index scan should stop performing MIN/MAX() aggregation. And, let end_send_group to do the same. But according to current design loose index scan can produce only one row per group key. If we have both MIN() and MAX() then it has to give two records out. This is not possible as interface has to use common buffer record[0]! for both records at a time. SOLUTIONS: ---------- For such queries to work we need a new interface for loose index scan. Hence, do not choose loose_index_scan for such cases. So a new rule SA7 is introduced to take care of the same. SA7: "If Q has both AGG_FUNC(DISTINCT ...) and MIN/MAX() functions then loose index scan access method is not used." mysql-test/r/group_min_max.result: Expected result. mysql-test/t/group_min_max.test: 1. Test with various combination of AGG(DISTINCT) and MIN(), MAX() functions. 2. Corrected the plan for old queries. sql/opt_range.cc: A new rule SA7 is introduced.

Bug#17217128 : BAD INTERACTION BETWEEN MIN/MAX AND
"HAVING SUM(DISTINCT)": WRONG RESULTS. ISSUE: ------ If a query uses loose index scan and it has both AGG(DISTINCT) and MIN()/MAX()functions. Then, result values of MIN/MAX() is set improperly. When query has AGG(DISTINCT) then end_select is set to end_send_group. "end_send_group" keeps doing aggregation until it sees a record from next group. And, then it will send out the result row of that group. Since query also has MIN()/MAX() and loose index scan is used, values of MIN/MAX() are set as part of loose index scan itself. Setting MIN()/MAX() values as part of loose index scan overwrites values computed in end_send_group. This caused invalid result. For such queries to work loose index scan should stop performing MIN/MAX() aggregation. And, let end_send_group to do the same. But according to current design loose index scan can produce only one row per group key. If we have both MIN() and MAX() then it has to give two records out. This is not possible as interface has to use common buffer record[0]! for both records at a time. SOLUTIONS: ---------- For such queries to work we need a new interface for loose index scan. Hence, do not choose loose_index_scan for such cases. So a new rule SA7 is introduced to take care of the same. SA7: "If Q has both AGG_FUNC(DISTINCT ...) and MIN/MAX() functions then loose index scan access method is not used." mysql-test/r/group_min_max.result: Expected result. mysql-test/t/group_min_max.test: 1. Test with various combination of AGG(DISTINCT) and MIN(), MAX() functions. 2. Corrected the plan for old queries. sql/opt_range.cc: A new rule SA7 is introduced.
f2202335 · mithun · e4931c92 · f2202335 · f2202335 · f2202335
Commit f2202335 authored May 15, 2014 by mithun
Showing with 95 additions and 2 deletions

mysql-test/r/group_min_max.result mysql-test/r/group_min_max.result +54 -2

mysql-test/t/group_min_max.test mysql-test/t/group_min_max.test +24 -0

sql/opt_range.cc sql/opt_range.cc +17 -0

No files found.
--- a/mysql-test/r/group_min_max.result
+++ b/mysql-test/r/group_min_max.result
@@ -3001,7 +3001,7 @@ COUNT(DISTINCT a, b)	SUM(DISTINCT a)
 0	NULL
 EXPLAIN SELECT SUM(DISTINCT a), MAX(b) FROM t2 GROUP BY a;
 id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
-1	SIMPLE	t2	range	NULL	a	5	NULL	9	Using index for group-by
+1	SIMPLE	t2	index	NULL	a	15	NULL	16	Using index
 SELECT SUM(DISTINCT a), MAX(b) FROM t2 GROUP BY a;
 SUM(DISTINCT a)	MAX(b)
 1	8
@@ -3029,7 +3029,7 @@ SELECT 42 * (a + c + COUNT(DISTINCT c, a, b)) FROM t2 GROUP BY a, b, c;
 168
 EXPLAIN SELECT (SUM(DISTINCT a) + MAX(b)) FROM t2 GROUP BY a;
 id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
-1	SIMPLE	t2	range	NULL	a	5	NULL	9	Using index for group-by
+1	SIMPLE	t2	index	NULL	a	15	NULL	16	Using index
 SELECT (SUM(DISTINCT a) + MAX(b)) FROM t2 GROUP BY a;
 (SUM(DISTINCT a) + MAX(b))
 9
@@ -3057,3 +3057,55 @@ id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
 1	SIMPLE	t1	range	NULL	PRIMARY	5	NULL	9	Using index for group-by (scanning)
 drop table t1;
 # End of test#50539.
+#
+# Bug#17217128 -  BAD INTERACTION BETWEEN MIN/MAX AND
+#                 "HAVING SUM(DISTINCT)": WRONG RESULTS.
+#
+CREATE TABLE t (a INT, b INT, KEY(a,b));
+INSERT INTO t VALUES (1,1), (2,2), (3,3), (4,4), (1,0), (3,2), (4,5);
+ANALYZE TABLE t;
+Table	Op	Msg_type	Msg_text
+test.t	analyze	status	OK
+SELECT a, SUM(DISTINCT a), MIN(b) FROM t GROUP BY a;
+a	SUM(DISTINCT a)	MIN(b)
+1	1	0
+2	2	2
+3	3	2
+4	4	4
+EXPLAIN SELECT a, SUM(DISTINCT a), MIN(b) FROM t GROUP BY a;
+id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
+1	SIMPLE	t	index	NULL	a	10	NULL	7	Using index
+SELECT a, SUM(DISTINCT a), MAX(b) FROM t GROUP BY a;
+a	SUM(DISTINCT a)	MAX(b)
+1	1	1
+2	2	2
+3	3	3
+4	4	5
+EXPLAIN SELECT a, SUM(DISTINCT a), MAX(b) FROM t GROUP BY a;
+id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
+1	SIMPLE	t	index	NULL	a	10	NULL	7	Using index
+SELECT a, MAX(b) FROM t GROUP BY a HAVING SUM(DISTINCT a);
+a	MAX(b)
+1	1
+2	2
+3	3
+4	5
+EXPLAIN SELECT a, MAX(b) FROM t GROUP BY a HAVING SUM(DISTINCT a);
+id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
+1	SIMPLE	t	index	NULL	a	10	NULL	7	Using index
+SELECT SUM(DISTINCT a), MIN(b), MAX(b) FROM t;
+SUM(DISTINCT a)	MIN(b)	MAX(b)
+10	0	5
+EXPLAIN SELECT SUM(DISTINCT a), MIN(b), MAX(b) FROM t;
+id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
+1	SIMPLE	t	index	NULL	a	10	NULL	7	Using index
+SELECT a, SUM(DISTINCT a), MIN(b), MAX(b) FROM t GROUP BY a;
+a	SUM(DISTINCT a)	MIN(b)	MAX(b)
+1	1	0	1
+2	2	2	2
+3	3	2	3
+4	4	4	5
+EXPLAIN SELECT a, SUM(DISTINCT a), MIN(b), MAX(b) FROM t GROUP BY a;
+id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
+1	SIMPLE	t	index	NULL	a	10	NULL	7	Using index
+DROP TABLE t;
--- a/mysql-test/t/group_min_max.test
+++ b/mysql-test/t/group_min_max.test
@@ -1243,3 +1243,27 @@ explain SELECT f1, COUNT(DISTINCT f2) FROM t1 GROUP BY f1;
 drop table t1;
 --echo # End of test#50539.

+--echo #
+--echo # Bug#17217128 -  BAD INTERACTION BETWEEN MIN/MAX AND
+--echo #                 "HAVING SUM(DISTINCT)": WRONG RESULTS.
+--echo #
+
+CREATE TABLE t (a INT, b INT, KEY(a,b));
+INSERT INTO t VALUES (1,1), (2,2), (3,3), (4,4), (1,0), (3,2), (4,5);
+ANALYZE TABLE t;
+
+SELECT a, SUM(DISTINCT a), MIN(b) FROM t GROUP BY a;
+EXPLAIN SELECT a, SUM(DISTINCT a), MIN(b) FROM t GROUP BY a;
+
+SELECT a, SUM(DISTINCT a), MAX(b) FROM t GROUP BY a;
+EXPLAIN SELECT a, SUM(DISTINCT a), MAX(b) FROM t GROUP BY a;
+
+SELECT a, MAX(b) FROM t GROUP BY a HAVING SUM(DISTINCT a);
+EXPLAIN SELECT a, MAX(b) FROM t GROUP BY a HAVING SUM(DISTINCT a);
+
+SELECT SUM(DISTINCT a), MIN(b), MAX(b) FROM t;
+EXPLAIN SELECT SUM(DISTINCT a), MIN(b), MAX(b) FROM t;
+
+SELECT a, SUM(DISTINCT a), MIN(b), MAX(b) FROM t GROUP BY a;
+EXPLAIN SELECT a, SUM(DISTINCT a), MIN(b), MAX(b) FROM t GROUP BY a;
+DROP TABLE t;
--- a/sql/opt_range.cc
+++ b/sql/opt_range.cc
@@ -9471,6 +9471,16 @@ cost_group_min_max(TABLE* table, KEY *index_info, uint used_key_parts,
         never stored after a unique key lookup in the clustered index and
         furhter index_next/prev calls can not be used. So loose index scan
         optimization can not be used in this case.
+    SA7. If Q has both AGG_FUNC(DISTINCT ...) and MIN/MAX() functions then this
+         access method is not used.
+         For above queries MIN/MAX() aggregation has to be done at
+         nested_loops_join (end_send_group). But with current design MIN/MAX()
+         is always set as part of loose index scan. Because of this mismatch
+         MIN() and MAX() values will be set incorrectly. For such queries to
+         work we need a new interface for loose index scan. This new interface
+         should only fetch records with min and max values and let
+         end_send_group to do aggregation. Until then do not use
+         loose_index_scan.
    GA1. If Q has a GROUP BY clause, then GA is a prefix of I. That is, if
         G_i = A_j => i = j.
    GA2. If Q has a DISTINCT clause, then there is a permutation of SA that
@@ -9636,6 +9646,13 @@ get_best_group_min_max(PARAM *param, SEL_TREE *tree, double read_time)
        DBUG_RETURN(NULL);
    }
  }
+
+  /* Check (SA7). */
+  if (is_agg_distinct && (have_max || have_min))
+  {
+    DBUG_RETURN(NULL);
+  }
+
  /* Check (SA5). */
  if (join->select_distinct)
  {