1. 28 Oct, 2016 2 commits
  2. 27 Oct, 2016 1 commit
    • Iliya Manolov's avatar
      neoctl: make 'print ids' command display time of TIDs · d9dd39f0
      Iliya Manolov authored
      Currently, the command "neoctl [arguments] print ids" has the following output:
      
          last_oid = 0x...
          last_tid = 0x...
          last_ptid = ...
      
      or
      
          backup_tid = 0x...
          last_tid = 0x...
          last_ptid = ...
      
      depending on whether the cluster is in normal or backup mode.
      
      This is extremely unreadable since the admin is often interested in the time that corresponds to each tid. Now the output is:
      
          last_oid = 0x...
          last_tid = 0x... (yyyy-mm-dd hh:mm:ss.ssssss)
          last_ptid = ...
      
      or
      
          backup_tid = 0x... (yyyy-mm-dd hh:mm:ss.ssssss)
          last_tid = 0x... (yyyy-mm-dd hh:mm:ss.ssssss)
          last_ptid = ...
      
      /reviewed-on !2
      d9dd39f0
  3. 25 Oct, 2016 1 commit
  4. 24 Oct, 2016 3 commits
  5. 21 Oct, 2016 1 commit
  6. 19 Oct, 2016 2 commits
  7. 18 Oct, 2016 5 commits
  8. 17 Oct, 2016 1 commit
    • Kirill Smelkov's avatar
      mysql: force _getNextTID() to use appropriate/whole index · eaa00a88
      Kirill Smelkov authored
      Similarly to 13911ca3 on the same instance after MariaDB was upgraded to
      10.1.17 the following query, even after `OPTIMIZE TABLE obj`, started to execute
      very slowly:
      
          MariaDB [(none)]> SELECT tid FROM neo1.obj WHERE `partition`=5 AND oid=79613 AND tid>268707071353462798 ORDER BY tid LIMIT 1;
          +--------------------+
          | tid                |
          +--------------------+
          | 268707072758797063 |
          +--------------------+
          1 row in set (4.82 sec)
      
      Both explain and analyze says the query will/is using `partition` key but only partially (note key_len is only 10, not 18):
      
          MariaDB [(none)]> SHOW INDEX FROM neo1.obj;
          +-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
          | Table | Non_unique | Key_name  | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
          +-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
          | obj   |          0 | PRIMARY   |            1 | partition   | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
          | obj   |          0 | PRIMARY   |            2 | tid         | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
          | obj   |          0 | PRIMARY   |            3 | oid         | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
          | obj   |          0 | partition |            1 | partition   | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
          | obj   |          0 | partition |            2 | oid         | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
          | obj   |          0 | partition |            3 | tid         | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
          | obj   |          1 | data_id   |            1 | data_id     | A         |    28755928 |     NULL | NULL   | YES  | BTREE      |         |               |
          +-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
          7 rows in set (0.00 sec)
      
          MariaDB [(none)]> explain SELECT tid FROM neo1.obj WHERE `partition`=5 AND oid=79613 AND tid>268707071353462798 ORDER BY tid LIMIT 1;
          +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+--------------------------+
          | id   | select_type | table | type | possible_keys     | key       | key_len | ref         | rows | Extra                    |
          +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+--------------------------+
          |    1 | SIMPLE      | obj   | ref  | PRIMARY,partition | partition | 10      | const,const |    2 | Using where; Using index |
          +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+--------------------------+
          1 row in set (0.00 sec)
      
          MariaDB [(none)]> analyze SELECT tid FROM neo1.obj WHERE `partition`=5 AND oid=79613 AND tid>268707071353462798 ORDER BY tid LIMIT 1;
          +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+------------+----------+------------+--------------------------+
          | id   | select_type | table | type | possible_keys     | key       | key_len | ref         | rows | r_rows     | filtered | r_filtered | Extra                    |
          +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+------------+----------+------------+--------------------------+
          |    1 | SIMPLE      | obj   | ref  | PRIMARY,partition | partition | 10      | const,const |    2 | 9741121.00 |   100.00 |       0.00 | Using where; Using index |
          +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+------------+----------+------------+--------------------------+
          1 row in set (4.93 sec)
      
      By explicitly forcing (partition, oid, tid) index usage which is precisely designed to serve this and similar queries can avoid the query from being slow:
      
          MariaDB [(none)]> analyze SELECT tid FROM neo1.obj FORCE INDEX(`partition`) WHERE `partition`=5 AND oid=79613 AND tid>268707071353462798 ORDER BY tid LIMIT 1;
          +------+-------------+-------+-------+---------------+-----------+---------+------+------+--------+----------+------------+--------------------------+
          | id   | select_type | table | type  | possible_keys | key       | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra                    |
          +------+-------------+-------+-------+---------------+-----------+---------+------+------+--------+----------+------------+--------------------------+
          |    1 | SIMPLE      | obj   | range | partition     | partition | 18      | NULL |    2 |   1.00 |   100.00 |     100.00 | Using where; Using index |
          +------+-------------+-------+-------+---------------+-----------+---------+------+------+--------+----------+------------+--------------------------+
          1 row in set (0.00 sec)
      
      /cc @jm, @vpelltier, @Tyagov
      
      /reviewed-on nexedi/neoppod!1
      eaa00a88
  9. 12 Oct, 2016 1 commit
  10. 11 Oct, 2016 1 commit
  11. 10 Oct, 2016 1 commit
  12. 29 Sep, 2016 1 commit
  13. 27 Sep, 2016 1 commit
  14. 23 Sep, 2016 1 commit
  15. 22 Sep, 2016 2 commits
  16. 21 Sep, 2016 1 commit
  17. 20 Sep, 2016 1 commit
  18. 19 Sep, 2016 1 commit
  19. 12 Sep, 2016 1 commit
  20. 29 Aug, 2016 2 commits
    • Julien Muchembled's avatar
      mysql: fix use of wrong SQL index when checking for dropped partitions · 13911ca3
      Julien Muchembled authored
      After partitions were dropped with TokuDB, we had a case where MariaDB 10.1.14
      stopped using the most appropriate index.
      
      MariaDB [neo0]> explain SELECT DISTINCT data_id FROM obj WHERE `partition`=5;
      +------+-------------+-------+-------+-------------------+---------+---------+------+------+---------------------------------------+
      | id   | select_type | table | type  | possible_keys     | key     | key_len | ref  | rows | Extra                                 |
      +------+-------------+-------+-------+-------------------+---------+---------+------+------+---------------------------------------+
      |    1 | SIMPLE      | obj   | range | PRIMARY,partition | data_id | 11      | NULL |   10 | Using where; Using index for group-by |
      +------+-------------+-------+-------+-------------------+---------+---------+------+------+---------------------------------------+
      MariaDB [neo0]> SELECT SQL_NO_CACHE DISTINCT data_id FROM obj WHERE `partition`=5;
      Empty set (1 min 51.47 sec)
      
      Expected:
      
      MariaDB [neo1]> explain SELECT DISTINCT data_id FROM obj WHERE `partition`=4;
      +------+-------------+-------+------+-------------------+---------+---------+-------+------+------------------------------+
      | id   | select_type | table | type | possible_keys     | key     | key_len | ref   | rows | Extra                        |
      +------+-------------+-------+------+-------------------+---------+---------+-------+------+------------------------------+
      |    1 | SIMPLE      | obj   | ref  | PRIMARY,partition | PRIMARY | 2       | const |    1 | Using where; Using temporary |
      +------+-------------+-------+------+-------------------+---------+---------+-------+------+------------------------------+
      1 row in set (0.00 sec)
      MariaDB [neo1]> SELECT SQL_NO_CACHE DISTINCT data_id FROM obj WHERE `partition`=4;
      Empty set (0.00 sec)
      
      Restarting the server or 'OPTIMIZE TABLE obj; ' does not help.
      
      Such issue could prevent the cluster to start due to timeouts, by always going
      back to RECOVERING state.
      13911ca3
    • Julien Muchembled's avatar
      Update TODO · 00ffb1ef
      Julien Muchembled authored
      00ffb1ef
  21. 11 Aug, 2016 2 commits
    • Julien Muchembled's avatar
      Add test to check that a moved cell doesn't cause POSKeyError · df990a05
      Julien Muchembled authored
      Freeing disk space when a cell is dropped will have to be implemented with care,
      not only for performance reasons.
      df990a05
    • Julien Muchembled's avatar
      mysql: do not use unsafe TRUNCATE statement · c3c2ffe2
      Julien Muchembled authored
      TRUNCATE was chosen for performance reasons, but it's usually done on small
      tables, and not for performance-critical operations. TRUNCATE commits
      implicitely, so for pt/ttrans in particular, it's certainly slower due to extra
      fsyncs to disk.
      
      On the other side, committing too early can corrupt the database if the storage
      node is stopped just after. For example, a failure in changePartitionTable()
      can cause 'pt' to remain empty.
      c3c2ffe2
  22. 01 Aug, 2016 2 commits
  23. 31 Jul, 2016 1 commit
    • Julien Muchembled's avatar
      storage: review TransactionManager.abortFor · 2d388048
      Julien Muchembled authored
      This reverts commit 7aecdada partially.
      There seems to be no bug here, because:
      - abortFor() is only called upon a notification from the master that a client
        is disconnected,
      - and from the same TCP connection, we only receive a LockInformation packet
        if there's still such a transaction on the master side.
      
      The code removed in abortFor() was redundant with abort().
      2d388048
  24. 27 Jul, 2016 5 commits
    • Julien Muchembled's avatar
      cb144fdb
    • Julien Muchembled's avatar
      38583af9
    • Julien Muchembled's avatar
      client: do not limit the number of open connections to storage nodes · 77132157
      Julien Muchembled authored
      There was a bug that connections were not maintained during a TPC,
      which caused transactions to be aborted when the limit was reached.
      
      Given that oids are spreaded evenly over all partitions, and that clients always
      write to all cells of each involved partitions, clients would spend their time
      reconnecting to storage nodes as soon as the limit is reached. So such feature
      really looks counter-productive.
      77132157
    • Julien Muchembled's avatar
    • Julien Muchembled's avatar
      client: fix conflict of node id by never reading from storage without being connected to the master · 11d83ad9
      Julien Muchembled authored
      Client nodes ignored the state of the connection to the master node when reading
      data from storage, as long as their partition tables were recent enough. This
      way, they were able to finish read-only transactions even if they could't reach
      the master, which could be useful for high availability. The downside is that
      the master node ignored that their node ids were still used, which causes "uuid"
      conflicts when reallocating them.
      
      Rejected solutions:
      - An unused NEO Storage should not insist in staying connected to master node.
      - Reverting to big random node identifiers is a lot of work and it would make
        debugging annoying (see commit 23fad3af).
      - Always increasing node ids could have been a simple solution if we accepted
        that the cluster dies after that all 2^24 possible ids were allocated.
      
      Given that reading from storage without being connected to the master can only
      be useful to finish the current transaction (because we always ping the master
      at the beginning of every transaction), keeping such feature is not worth the
      effort.
      
      This commit fixes id conflicts in a very simple way, by clearing the partition
      table upon primary node failure, which forces reconnection to the master before
      querying any storage node. In such case, we raise a special exception that will
      cause the transaction to be restarted, so that the user does not get errors for
      temporary connection failures.
      11d83ad9