    btrfs: use percpu_read_positive instead of sum_positive for need_preempt · 2cdb3909
    Josef Bacik authored
    Looking at perf data for a fio workload I noticed that we were spending
    a pretty large chunk of time (around 5%) doing percpu_counter_sum() in
    need_preemptive_reclaim().  This is silly, as we only want to know if
    we have more ordered than delalloc to see if we should be counting the
    delayed items in our threshold calculation.  Change this to
    percpu_counter_read_positive() to avoid the overhead.
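To illustrate the trade-off behind this change, here is a minimal user-space sketch of a per-CPU counter. It is a simplified model, not the kernel implementation: `struct model_counter`, `model_add()`, `model_read_positive()`, `model_sum_positive()`, `NR_MODEL_CPUS`, and `MODEL_BATCH` are all hypothetical names invented for this example. The cheap read looks only at the shared total, while the exact sum has to walk every CPU's pending delta.

```c
#include <assert.h>
#include <stdint.h>

#define NR_MODEL_CPUS 4   /* hypothetical CPU count for this model */
#define MODEL_BATCH   32  /* hypothetical threshold for folding deltas */

/* Simplified model: a shared total plus per-CPU deltas not yet folded in. */
struct model_counter {
    int64_t count;
    int64_t delta[NR_MODEL_CPUS];
};

/* Update one CPU's delta; fold it into the shared total once it is large. */
static void model_add(struct model_counter *c, int cpu, int64_t amount)
{
    assert(cpu >= 0 && cpu < NR_MODEL_CPUS);
    c->delta[cpu] += amount;
    if (c->delta[cpu] >= MODEL_BATCH || c->delta[cpu] <= -MODEL_BATCH) {
        c->count += c->delta[cpu];
        c->delta[cpu] = 0;
    }
}

/* Cheap, approximate read: shared total only, clamped at zero. */
static int64_t model_read_positive(const struct model_counter *c)
{
    return c->count > 0 ? c->count : 0;
}

/* Exact read: visits every CPU's delta, which is where the cost comes from. */
static int64_t model_sum_positive(const struct model_counter *c)
{
    int64_t sum = c->count;

    for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
        sum += c->delta[cpu];
    return sum > 0 ? sum : 0;
}
```

In this model, adding 100 on CPU 0 and 10 on CPU 1 leaves the cheap read at 100 while the exact sum is 110. The approximate value can lag by less than one batch per CPU, which is acceptable when the result only feeds a coarse comparison such as ordered versus delalloc.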
    
    I ran this through fsperf to validate the changes.  Obviously the
    latency numbers in dbench and fio are quite jittery, so take them as
    you wish, but overall the improvements in throughput, iops, and bw are
    all positive.  Each test was run twice; each value is the average of
    both runs for its respective column.
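As a reading aid for the tables, the sketch below shows how a two-run average and a diff percentage could be computed. The formula (current - baseline) / baseline * 100 is an assumption: it is consistent with the rows shown here, but the commit does not state how fsperf derives the column. The helper names `avg2()` and `pct_diff()` are hypothetical.

```c
#include <stdio.h>

/* Average of two runs, as described for each column value. */
static double avg2(double run1, double run2)
{
    return (run1 + run2) / 2.0;
}

/*
 * Relative change from baseline to current, in percent.
 * Assumed formula; a zero baseline is reported as 0, matching the
 * 0.00% shown for the all-zero read metrics.
 */
static double pct_diff(double baseline, double current)
{
    if (baseline == 0.0)
        return 0.0;
    return (current - baseline) / baseline * 100.0;
}

/* Print one row in roughly the same layout as the result tables. */
static void print_row(const char *metric, double baseline, double current)
{
    printf("%-20s %12.2f %12.2f %8.2f%%\n",
           metric, baseline, current, pct_diff(baseline, current));
}
```

For example, `pct_diff(13888, 11840)` evaluates to about -14.75, which matches the write_clat_ns_p50 row of the bufferedrandwrite16g table.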
    
      btrfs ssd normal test results
    
      bufferedrandwrite16g results
           metric         baseline   current          diff
      ==========================================================
      write_io_kbytes     16777216   16777216     0.00%
      read_clat_ns_p99           0          0     0.00%
      write_bw_bytes      1.04e+08   1.05e+08     1.12%
      read_iops                  0          0     0.00%
      write_clat_ns_p50      13888      11840   -14.75%
      read_io_kbytes             0          0     0.00%
      read_io_bytes              0          0     0.00%
      write_clat_ns_p99      35008      29312   -16.27%
      read_bw_bytes              0          0     0.00%
      elapsed                  170        167    -1.76%
      write_lat_ns_min     4221.50    3762.50   -10.87%
      sys_cpu                39.65      35.37   -10.79%
      write_lat_ns_max    2.67e+10   2.50e+10    -6.63%
      read_lat_ns_min            0          0     0.00%
      write_iops          25270.10   25553.43     1.12%
      read_lat_ns_max            0          0     0.00%
      read_clat_ns_p50           0          0     0.00%
    
      dbench60 results
        metric     baseline   current         diff
      ==================================================
      qpathinfo       11.12     12.73    14.52%
      throughput     416.09    445.66     7.11%
      flush         3485.63   1887.55   -45.85%
      qfileinfo        0.70      1.92   173.86%
      ntcreatex      992.60    695.76   -29.91%
      qfsinfo          2.43      3.71    52.48%
      close            1.67      3.14    88.09%
      sfileinfo       66.54    105.20    58.10%
      rename         809.23    619.59   -23.43%
      find            16.88     15.46    -8.41%
      unlink         820.54    670.86   -18.24%
      writex        3375.20   2637.91   -21.84%
      deltree        386.33    449.98    16.48%
      readx            3.43      3.41    -0.60%
      mkdir            0.05      0.03   -38.46%
      lockx            0.26      0.26    -0.76%
      unlockx          0.81      0.32   -60.33%
    
      dio4kbs16threads results
           metric          baseline       current           diff
      ================================================================
      write_io_kbytes         5249676       3357150   -36.05%
      read_clat_ns_p99              0             0     0.00%
      write_bw_bytes      89583501.50   57291192.50   -36.05%
      read_iops                     0             0     0.00%
      write_clat_ns_p50        242688        263680     8.65%
      read_io_kbytes                0             0     0.00%
      read_io_bytes                 0             0     0.00%
      write_clat_ns_p99      15826944      36732928   132.09%
      read_bw_bytes                 0             0     0.00%
      elapsed                      61            61     0.00%
      write_lat_ns_min          42704         42095    -1.43%
      sys_cpu                    5.27          3.45   -34.52%
      write_lat_ns_max       7.43e+08      9.27e+08    24.71%
      read_lat_ns_min               0             0     0.00%
      write_iops             21870.97      13987.11   -36.05%
      read_lat_ns_max               0             0     0.00%
      read_clat_ns_p50              0             0     0.00%
    
      randwrite2xram results
           metric          baseline       current           diff
      ================================================================
      write_io_kbytes        24831972      28876262    16.29%
      read_clat_ns_p99              0             0     0.00%
      write_bw_bytes      83745273.50   92182192.50    10.07%
      read_iops                     0             0     0.00%
      write_clat_ns_p50         13952         11648   -16.51%
      read_io_kbytes                0             0     0.00%
      read_io_bytes                 0             0     0.00%
      write_clat_ns_p99         50176         52992     5.61%
      read_bw_bytes                 0             0     0.00%
      elapsed                     314           332     5.73%
      write_lat_ns_min        5920.50          5127   -13.40%
      sys_cpu                    7.82          7.35    -6.07%
      write_lat_ns_max       5.27e+10      3.88e+10   -26.44%
      read_lat_ns_min               0             0     0.00%
      write_iops             20445.62      22505.42    10.07%
      read_lat_ns_max               0             0     0.00%
      read_clat_ns_p50              0             0     0.00%
    
      untarfirefox results
      metric    baseline   current        diff
      ==============================================
      elapsed      47.41     47.40   -0.03%
    Signed-off-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: David Sterba <dsterba@suse.com>