    Bug#45803 Inaccurate estimates for partial key values with IBMDB2I · f2251c77
    V Narayanan authored
    Some collations were causing IBMDB2I to report
    inaccurate key range estimations to the optimizer
    for LIKE clauses that select substrings. This can
    be seen by running EXPLAIN. This problem primarily
    affects multi-byte and Unicode character sets.
    
    This patch involves substantial changes to several
    modules. There are a number of problems with the
    character set and collation handling. These problems
    have been or are being fixed, and a comprehensive
    test has been included which should provide much
    better coverage than there was before. This test
    is enabled only for IBM i 6.1, because that version
    has support for the greatest number of collations.
    
    mysql-test/suite/ibmdb2i/r/ibmdb2i_collations.result:
      Bug#45803 Inaccurate estimates for partial key values with IBMDB2I
      
      Result file for the test case.
    mysql-test/suite/ibmdb2i/t/ibmdb2i_collations.test:
      Bug#45803 Inaccurate estimates for partial key values with IBMDB2I
      
      Tests for character sets and collations. This test
      is enabled only for IBM i 6.1, because that version
      has support for the greatest number of collations.
    storage/ibmdb2i/db2i_conversion.cc:
      Bug#45803 Inaccurate estimates for partial key values with IBMDB2I
      
      - Added support in convertFieldChars to enable records_in_range
        to determine how many substitute characters were inserted and
        to suppress conversion warnings.
      
      - Fixed a bug that caused all multi-byte and Unicode fields
        to be created as UTF16 (CCSID 1200) fields in DB2. The corrected
        code now creates UCS2 fields as UCS2 (CCSID 13488), UTF8
        fields (except for utf8_general_ci) as UTF8 (CCSID 1208), and
        all other multi-byte or Unicode fields as UTF16. This only
        affects tables that are newly created through the IBMDB2I storage
        engine. Existing IBMDB2I tables will retain their original CCSID
        until recreated. The existing behavior is believed to be
        functionally correct, but it may negatively impact performance
        by causing unnecessary character conversion. Additionally, users
        accessing IBMDB2I tables through DB2 should be aware that mixing
        tables created before and after this change may require extra type
        casts or other workarounds. For this reason, users who have
        existing IBMDB2I tables using a Unicode collation other than
        utf8_general_ci are encouraged to recreate their tables (e.g.
        ALTER TABLE t1 ENGINE=IBMDB2I) in order to get the updated CCSIDs
        associated with their DB2 tables.
      
      - Improved error reporting for unsupported character sets by forcing
        a check for the iconv conversion table at table creation time,
        rather than at data access time.
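
      The CCSID rules in the second item above can be sketched as a small
      lookup. This is only an illustration of the mapping as described in
      this message, not the engine's actual code: the function name
      ccsidForUnicodeField and the use of plain strings for the character
      set and collation names are assumptions made for the example.

```cpp
#include <cassert>
#include <string>

// CCSID values quoted in the commit message.
constexpr int CCSID_UTF16 = 1200;
constexpr int CCSID_UTF8  = 1208;
constexpr int CCSID_UCS2  = 13488;

// Hypothetical helper (not part of IBMDB2I): pick the DB2 CCSID for a
// multi-byte or Unicode MySQL field, following the rules stated above.
int ccsidForUnicodeField(const std::string& charset,
                         const std::string& collation)
{
    if (charset == "ucs2")
        return CCSID_UCS2;                       // UCS2 stays UCS2
    if (charset == "utf8" && collation != "utf8_general_ci")
        return CCSID_UTF8;                       // UTF8, except utf8_general_ci
    return CCSID_UTF16;                          // everything else
}
```

      Under these assumptions, a utf8_general_ci field still maps to
      CCSID 1200, which matches the exception noted in the text.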
    storage/ibmdb2i/db2i_myconv.h:
      Bug#45803 Inaccurate estimates for partial key values with IBMDB2I
      
      Fix to set errno when iconv fails.
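
      As a rough illustration of the convention this fix restores, the
      sketch below shows a conversion routine that reports failure through
      both its return value and errno, as callers of iconv expect. The
      routine convertAscii is a hypothetical stand-in, not the code in
      db2i_myconv.h, and unlike iconv it returns the number of bytes
      written on success.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>

// Hypothetical stand-in for a conversion routine: copies 7-bit ASCII only.
// On failure it returns (size_t)-1 and sets errno, so the caller can tell
// an invalid input byte (EILSEQ) from a too-small output buffer (E2BIG).
size_t convertAscii(const char* in, size_t inLen, char* out, size_t outLen)
{
    if (outLen < inLen) {
        errno = E2BIG;
        return static_cast<size_t>(-1);
    }
    for (size_t i = 0; i < inLen; ++i) {
        if (static_cast<unsigned char>(in[i]) > 0x7F) {
            errno = EILSEQ;
            return static_cast<size_t>(-1);
        }
        out[i] = in[i];
    }
    return inLen;
}
```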
    storage/ibmdb2i/db2i_rir.cc:
      Bug#45803 Inaccurate estimates for partial key values with IBMDB2I
      
      Significant improvements were made to the records_in_range code
      that handles partial length string data in keys for optimizer plan
      estimation. Previously, to obtain an estimate for a partial key
      value, the implementation would perform any necessary character
      conversion and then attempt to determine the unpadded length of
      the partial key by searching for the minimum or maximum sort
      character. While this algorithm was sufficient for most single-byte
      character sets, it did not treat Unicode and multi-byte strings
      correctly. Furthermore, due to an operating system limitation,
      partial keys having UTF8 collations (ICU sort sequences in DB2)
      could not be estimated with this method.
      
      With this patch, the code no longer attempts to explicitly determine
      the unpadded length of the key. Instead, the entire key is converted
      (if necessary), including padding, and then passed to the operating
      system for estimation. Depending on the source and target character
      sets and collations, additional logic is required to correctly
      handle cases in which MySQL uses unconvertible or differently
      weighted values to pad the key. The bulk of the patch exists
      to implement this additional logic.
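
      One way a length search based on the pad character can go wrong for
      multi-byte data is sketched below. This is an illustration only: the
      function names are hypothetical, the big-endian UCS2 layout is an
      assumption, and the commit message does not say that this is the
      exact failure in the old code. The patch itself avoids the problem by
      not computing the unpadded length at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte-wise scan, adequate for single-byte character sets: drop trailing
// bytes equal to the pad byte and return the remaining length.
size_t unpaddedLengthBytewise(const uint8_t* key, size_t len, uint8_t pad)
{
    while (len > 0 && key[len - 1] == pad)
        --len;
    return len;
}

// Unit-aware scan for 2-byte big-endian code units (assumed layout):
// drop trailing units equal to the 16-bit pad value.
size_t unpaddedLengthUcs2(const uint8_t* key, size_t len, uint16_t pad)
{
    while (len >= 2 &&
           key[len - 2] == (pad >> 8) &&
           key[len - 1] == (pad & 0xFF))
        len -= 2;
    return len;
}
```

      For the UCS2 key 00 41 00 20 00 20 ("A" padded with two spaces), the
      byte-wise scan stops at the 00 byte of the last space and reports a
      length of 5, splitting a character, while the unit-aware scan reports
      2.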
    storage/ibmdb2i/ha_ibmdb2i.h:
      Bug#45803 Inaccurate estimates for partial key values with IBMDB2I
      
      The convertFieldChars declaration was updated to support additional 
      optional behaviors.