    MDEV-27009 Add UCA-14.0.0 collations - dump logical positions and contractions · d7ffb7c3
    Alexander Barkov authored
    - uca-dump can now dump logical positions as a set of "#define" directives.
      Logical positions for 4.0.0 and for 5.2.0 were calculated and put into
      ctype-uca.c manually. That required some effort: analyzing allkeys.txt
      with the help of grep and sort.
      Now when defining a new MY_UCA_INFO it's possible to use the new #define's
      instead of calculating logical positions manually.
      Logical positions also print their weights in DUCET format as a comment
      before the define:
    
    /*
    [.0000.0021.0002]
    [.0000.0117.0002]
    */
    
      The comment helps to know weight ranges on various levels,
      which makes it easier to debug the code.
    
    - uca-dump can now dump built-in DUCET contractions
    
    - Adding a new uca-dump command line option --no-contractions. This is useful
      if one needs to re-dump 4.0.0 and 5.2.0 data in a ctype-uca.c compatible way.
    
    - Adding a new uca-dump command line option --case-first=upper|level.
      This can be useful if one needs to dump with UPPER case first by default.
      It's not yet decided if we'll use --case-first=upper during the dump though.
    
    - Moving parts of the code from the main loop into separate functions
      parse_chars() and parse_weights(). This allows the code to be reused between
      single characters and contractions.
    
    - Adding a new function my_ducet_weight_normalize(), to cut zero weights
      from a weight string, e.g. [AAAA][0000][BBBB] -> [AAAA][BBBB].
      This helps to reuse the code between single characters and contractions.
    
    - Weight normalization is now done before printing, in separate loops inside
      my_ducet_normalize(). Before this change, normalization was done during
      printing, inside the printing loop. This helps to separate the steps:
      loading -> normalizing -> printing.
      This makes it easier to follow what's going on, e.g. while debugging.
    
    - Fixing ctype-uca.c to handle built-in contractions of any length.
      Previously we had only built-in contractions in utf8mb4_thai_520_w2,
      which contains only 2-character contractions.
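
The zero-weight cutting described above for my_ducet_weight_normalize() can be sketched roughly as follows. This is only an illustration: the function name weight_string_cut_zeros(), its signature, and the use of a plain uint16_t array are assumptions, not the actual uca-dump code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
  Hypothetical sketch, not the actual uca-dump code: remove zero weights
  from w[0..n-1] in place, keeping the order of the non-zero weights.
  Returns the number of weights that remain. Elements past the returned
  length are left untouched, so the caller must use the returned length.
*/
static size_t
weight_string_cut_zeros(uint16_t *w, size_t n)
{
  size_t src, dst= 0;
  assert(w != NULL || n == 0);
  for (src= 0; src < n; src++)
  {
    if (w[src] != 0)
      w[dst++]= w[src];   /* Compact non-zero weights to the front */
  }
  return dst;
}
```

Applied to the weight string [AAAA][0000][BBBB], the sketch returns 2 and leaves [AAAA][BBBB] at the start of the array. Because it works on a bare weight array, the same routine can serve single characters and contractions alike.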
ctype-uca.c 2.12 MB