Work around poor UPDATE use of index.

The UPDATE query is expected to use the existing index on
(processing_node, priority, date) for both WHERE and ORDER BY, as shown by
EXPLAIN-ing the equivalent SELECT:

MariaDB [erp5]> explain select uid from message_queue WHERE processing_node=0 AND date <= '2013-06-06 22:22:49' ORDER BY priority, date LIMIT 1;
+------+-------------+---------------+------+----------------------------------------------------------+-------------------------------+---------+-------+-------+--------------------------+
| id   | select_type | table         | type | possible_keys                                            | key                           | key_len | ref   | rows  | Extra                    |
+------+-------------+---------------+------+----------------------------------------------------------+-------------------------------+---------+-------+-------+--------------------------+
|    1 | SIMPLE      | message_queue | ref  | processing_node_processing,processing_node_priority_date | processing_node_priority_date | 2       | const | 26622 | Using where; Using index |
+------+-------------+---------------+------+----------------------------------------------------------+-------------------------------+---------+-------+-------+--------------------------+

If it weren't using the index for ORDER BY, "Extra" would contain
"Using filesort".

Still, UPDATE behaves differently. The slow query log reports a filesort
("Filesort: Yes") over 49263 examined rows:

# User@Host: user[user] @ [10.0.0.3]
# Thread_id: 1635880 Schema: erp5 QC_hit: No
# Query_time: 2.668405 Lock_time: 2.460698 Rows_sent: 0 Rows_examined: 49263
# Full_scan: No Full_join: No Tmp_table: No Tmp_table_on_disk: No
# Filesort: Yes Filesort_on_disk: No Merge_passes: 0
SET TIMESTAMP=1370557446;
UPDATE message_queue SET processing_node=12 WHERE processing_node=0 AND DATE <= '2013-06-06 22:24:04' ORDER BY priority, DATE LIMIT 1;

So change the UPDATE..SELECT pattern into a SELECT FOR UPDATE..UPDATE
pattern, so that SELECT's correct execution plan is used.
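
As a rough illustration of the SELECT FOR UPDATE..UPDATE pattern described above, here is a minimal sketch. It uses Python's sqlite3 module purely as a stand-in so that it can run anywhere; this is an assumption, not what CMFActivity does. The helper names reserve_messages and make_demo_queue and the sample rows are hypothetical, and SQLite has no FOR UPDATE clause, so the row-locking part of the real MariaDB query is only indicated in a comment.

```python
import sqlite3


def reserve_messages(conn, processing_node, to_date, count):
    """Pick candidate rows with a SELECT, then UPDATE exactly those uids."""
    with conn:  # commit on success, roll back on error
        rows = conn.execute(
            "SELECT uid FROM message_queue "
            "WHERE processing_node = 0 AND date <= ? "
            "ORDER BY priority, date LIMIT ?",
            # On MariaDB/InnoDB this SELECT would end with FOR UPDATE.
            # SQLite has no such clause, so it is omitted in this sketch.
            (to_date, count),
        ).fetchall()
        uid_list = [uid for (uid,) in rows]
        if uid_list:
            placeholders = ",".join("?" * len(uid_list))
            # The UPDATE needs neither ORDER BY nor LIMIT: it targets uids.
            conn.execute(
                "UPDATE message_queue SET processing_node = ? "
                "WHERE uid IN (%s)" % placeholders,
                [processing_node] + uid_list,
            )
    return uid_list


def make_demo_queue():
    """Build a tiny in-memory table with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE message_queue ("
        "uid INTEGER PRIMARY KEY, processing_node INTEGER, "
        "priority INTEGER, date TEXT)"
    )
    conn.execute(
        "CREATE INDEX processing_node_priority_date "
        "ON message_queue (processing_node, priority, date)"
    )
    conn.executemany(
        "INSERT INTO message_queue VALUES (?, ?, ?, ?)",
        [
            (1, 0, 3, "2013-06-06 22:00:00"),
            (2, 0, 1, "2013-06-06 22:10:00"),
            (3, 0, 1, "2013-06-06 22:05:00"),
            (4, 0, 1, "2013-06-07 00:00:00"),  # not yet due
            (5, 7, 1, "2013-06-06 21:00:00"),  # already held by node 7
        ],
    )
    conn.commit()
    return conn
```

Calling reserve_messages(make_demo_queue(), 12, "2013-06-06 22:24:04", 2) reserves the two due messages with the lowest (priority, date). The point of the split is that the ordering and the limit live only in the SELECT, while the UPDATE addresses rows by uid.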

Commit 7daaf0a5 authored 11 years ago by Vincent Pelletier
4 changed files
--- a/product/CMFActivity/Activity/SQLBase.py
+++ b/product/CMFActivity/Activity/SQLBase.py
@@ -193,19 +193,36 @@ class SQLBase(Queue):
        This number is guaranted not to be exceeded.
        If None (or not given) no limit apply.
    """
-    select = activity_tool.SQLBase_selectReservedMessageList
-    if group_method_id:
-      reserve = limit - 1
-    else:
-      result = select(table=self.sql_table, count=limit,
-                      processing_node=processing_node)
-      reserve = limit - len(result)
-    if reserve:
-      activity_tool.SQLBase_reserveMessageList(table=self.sql_table,
-        count=reserve, processing_node=processing_node, to_date=date,
-        group_method_id=group_method_id)
-      result = select(table=self.sql_table,
-                      processing_node=processing_node, count=limit)
+    assert limit
+    # Do not check already-assigned messages when trying to reserve more
+    # activities, because in such case we will find one reserved activity.
+    result = activity_tool.SQLBase_selectReservedMessageList(
+      table=self.sql_table,
+      count=limit,
+      processing_node=processing_node,
+      group_method_id=group_method_id,
+    )
+    limit -= len(result)
+    if limit:
+      reservable = activity_tool.SQLBase_getReservableMessageList(
+        table=self.sql_table,
+        count=limit,
+        processing_node=processing_node,
+        to_date=date,
+        group_method_id=group_method_id,
+      )
+      if reservable:
+        activity_tool.SQLBase_reserveMessageList(
+          uid=[x.uid for x in reservable],
+          table=self.sql_table,
+          processing_node=processing_node,
+        )
+        # DC.ZRDB.Results.Results does not implement concatenation
+        # Implement an imperfect (but cheap) concatenation. Do not update
+        # __items__ nor _data_dictionary.
+        assert result._names == reservable._names, (result._names,
+          reservable._names)
+        result._data += reservable._data
    return result

  def makeMessageListAvailable(self, activity_tool, uid_list):

--- a/product/CMFActivity/skins/activity/SQLBase_getReservableMessageList.zsql
+++ b/product/CMFActivity/skins/activity/SQLBase_getReservableMessageList.zsql
+<dtml-comment>
+title:
+connection_id:cmf_activity_sql_connection
+max_rows:0
+max_cache:0
+cache_time:0
+class_name:
+class_file:
+</dtml-comment>
+<params>table
+processing_node
+to_date
+count
+group_method_id
+</params>
+SELECT
+  *
+FROM
+  <dtml-var table>
+WHERE
+  processing_node=0
+  AND date <= <dtml-sqlvar to_date type="datetime">
+  <dtml-if expr="group_method_id is not None">
+    AND group_method_id = <dtml-sqlvar group_method_id type="string">
+  </dtml-if>
+ORDER BY
+<dtml-comment>
+  During normal operation, sorting by date (as 2nd criteria) is fairer
+  for users and reduce the probability to do the same work several times
+  (think of an object that is modified several times in a short period of time).
+  However, current implementation is not optimal when reindexing a whole site
+  with several mount points (to different ZEO servers), because modules may not
+  be processed in parallel. If you want to speed up ERP5Site_reindexAll,
+  consider:
+  - ordering by 'priority, RAND()' temporarily;
+  - or better, hack ERP5Site_reindexAll so that all reindex messages have
+    identical/random dates (hint: add optional parameter to Folder_reindexAll
+    and Folder_reindexObjectList in order to forward a date from
+    ERP5Site_reindexAll, e.g. current date would work if MySQL
+    shuffles enough lines with same priority/date).
+  - or even better, use NEO <http://www.neoppod.org/>
+  For higher concurrency than 10 or 20 nodes of activity, it might be required
+  to add a random start point to reduce the risk of MySQL locks.
+</dtml-comment>
+  priority, date
+LIMIT <dtml-sqlvar count type="int">
+FOR UPDATE
--- a/product/CMFActivity/skins/activity/SQLBase_reserveMessageList.zsql
+++ b/product/CMFActivity/skins/activity/SQLBase_reserveMessageList.zsql
@@ -9,40 +9,13 @@ class_file:
 </dtml-comment>
 <params>table
 processing_node
-to_date
-count
-group_method_id
+uid
 </params>
 UPDATE
  <dtml-var table>
 SET
  processing_node=<dtml-sqlvar processing_node type="int">
 WHERE
-  processing_node=0
-  AND date <= <dtml-sqlvar to_date type="datetime">
-  <dtml-if expr="group_method_id is not None">
-    AND group_method_id = <dtml-sqlvar group_method_id type="string">
-  </dtml-if>
-ORDER BY
-<dtml-comment>
-  During normal operation, sorting by date (as 2nd criteria) is fairer
-  for users and reduce the probability to do the same work several times
-  (think of an object that is modified several times in a short period of time).
-  However, current implementation is not optimal when reindexing a whole site
-  with several mount points (to different ZEO servers), because modules may not
-  be processed in parallel. If you want to speed up ERP5Site_reindexAll,
-  consider:
-  - ordering by 'priority, RAND()' temporarily;
-  - or better, hack ERP5Site_reindexAll so that all reindex messages have
-    identical/random dates (hint: add optional parameter to Folder_reindexAll
-    and Folder_reindexObjectList in order to forward a date from
-    ERP5Site_reindexAll, e.g. current date would work if MySQL
-    shuffles enough lines with same priority/date).
-  - or even better, use NEO <http://www.neoppod.org/>
-  For higher concurrency than 10 or 20 nodes of activity, it might be required
-  to add a random start point to reduce the risk of MySQL locks.
-</dtml-comment>
-  priority, date
-LIMIT <dtml-sqlvar count type="int">
+  <dtml-sqltest uid type="int" multiple>
 <dtml-var sql_delimiter>
 COMMIT
--- a/product/CMFActivity/skins/activity/SQLBase_selectReservedMessageList.zsql
+++ b/product/CMFActivity/skins/activity/SQLBase_selectReservedMessageList.zsql
@@ -9,6 +9,7 @@ class_file:
 </dtml-comment>
 <params>table
 processing_node
+group_method_id
 count</params>
 SELECT
  *
@@ -16,6 +17,9 @@ FROM
  <dtml-var table>
 WHERE
  processing_node = <dtml-sqlvar processing_node type="int">
+<dtml-if expr="group_method_id is not None">
+  AND group_method_id = <dtml-sqlvar group_method_id type="string">
+</dtml-if>
 <dtml-if expr="count is not None">
  LIMIT <dtml-sqlvar count type="int">
 </dtml-if>
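
The control flow of the reworked reservation code in SQLBase.py (return what the node already holds, then top up from the reservable messages) can be modeled without a database. The sketch below is a simplified in-memory model under stated assumptions: messages are plain dicts, dates are ISO-formatted strings, and get_reserved_message_list is a hypothetical stand-in, not the actual method.

```python
def get_reserved_message_list(queue, processing_node, limit, to_date,
                              group_method_id=None):
    """In-memory model of the two-phase reservation (hypothetical helper)."""
    assert limit

    def in_group(message):
        return (group_method_id is None
                or message["group_method_id"] == group_method_id)

    # Phase 1: messages already assigned to this node.
    result = [m for m in queue
              if m["processing_node"] == processing_node and in_group(m)]
    result = result[:limit]
    limit -= len(result)
    if limit:
        # Phase 2: top up with unassigned, due messages in (priority, date)
        # order. ISO date strings compare correctly as plain strings.
        reservable = sorted(
            (m for m in queue
             if m["processing_node"] == 0 and m["date"] <= to_date
             and in_group(m)),
            key=lambda m: (m["priority"], m["date"]),
        )[:limit]
        for m in reservable:
            m["processing_node"] = processing_node  # models the UPDATE by uid
        result += reservable
    return result
```

In this model the list concatenation is trivial. The diff has to append to the private _data attribute instead, because, as its own comment notes, DC.ZRDB.Results.Results does not implement concatenation.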