bigfile/tests/bench_zblkfmt · 0c6f08509e319770f7527c06b07816f9d96db52c · Kirill Smelkov / wendelin.core

fixup! ZBigFile: Add ZBlk format option 'h' (heuristic) · 0c6f0850

Kirill Smelkov authored Mar 29, 2024

Rework the benchmark:

- Clean up the benchmark directory after each benchmark run.
  I was running out of memory after several benchmark runs because my /tmp is
  on tmpfs, and not cleaning up after a test run is a leak in general.

- Use only [size]int instead of [size][2]int as the test array.
  Apart from the major dimension, the array shape is orthogonal to testing how
  the storage behaves.

- Benchmark both append and random-write workloads, not only append.
  It is generally good to run the benchmark and get the full set of numbers.

- Fix an off-by-one error in accessrand: random.randint(a,b) returns values in
  [a,b], not [a,b), so A[randint(0, arraysize)] can result in A[arraysize],
  which is beyond the last array element.
  Also replace arraysize with len(A) for clarity.

- Rework the read access benchmark so that it robustly never accesses the same
  block twice. Previously the code just set niter=10 and hoped that no block
  would be hit a second time. However, the benchmarks do not have that many
  blocks, so blindly selecting 10 random ones starts to overlap.
  The updated code makes sure that each block is loaded at most once.

- Do not manually set sys.path when running the benchmark.
  When tests are run, wendelin.core is expected to be installed in development
  mode via e.g. `pip install -e`, or, under buildout, via a custom Python
  interpreter that has the wendelin.core egg on its path. In both cases the
  path is already set up so that `import wendelin.core` works. If that were
  not the case, we would have to adjust sys.path in every test and demo program.

- Use the unified benchmarking format for the output, so that tools like
  benchstat can be used to aggregate and compare results.

- Remove the code that raises RLIMIT_NOFILE.
  The benchmark uses only one array, and the number of needed file
  descriptors is proportional to the number of arrays in use. In other words,
  the benchmark should not be a heavy user of file descriptors.

  With `ulimit -n 20` the benchmarks run just fine, while the system
  default is usually 1024 or similar.

- Remove the use of bash: the benchmark now spawns its subprocesses itself,
  from Python code.

- Restructure the code for clarity.

- Rename the benchmark to start with bench_, similarly to the other existing
  benchmarks.
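The directory-cleanup item can be illustrated with a minimal sketch that runs a benchmark in a throwaway directory and removes that directory even if the benchmark fails. This is not the actual bench_zblkfmt code. The helper `run_in_tmpdir` and the `bench(workdir)` callback are hypothetical.

```python
import tempfile

def run_in_tmpdir(bench):
    """Run bench(workdir) in a fresh temporary directory (hypothetical helper).

    TemporaryDirectory removes the directory when the with-block exits,
    including when bench raises, so no database files are left behind
    on a tmpfs-backed /tmp."""
    with tempfile.TemporaryDirectory(prefix="bench_zblkfmt-") as workdir:
        return bench(workdir)
```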
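For the item about benchmarking both append and random-write workloads, here is a sketch of one driver that times both write orders over the same set of blocks. It assumes a hypothetical `write_block(i)` callback that writes block i of the array under test. In this sketch the random-write pattern touches every block exactly once in shuffled order. The real benchmark's random-write pattern may differ.

```python
import random
import time

def run_write_workloads(write_block, nblocks, seed=0):
    """Time two write patterns over nblocks blocks; return {name: seconds}.

    write_block(i) is a hypothetical callback writing block i of the array."""
    rng = random.Random(seed)
    randorder = list(range(nblocks))
    rng.shuffle(randorder)
    workloads = (
        ("append", range(nblocks)),   # blocks written in increasing order
        ("randwrite", randorder),     # each block written once, in random order
    )
    results = {}
    for name, order in workloads:
        t0 = time.perf_counter()
        for blk in order:
            write_block(blk)
        results[name] = time.perf_counter() - t0
    return results
```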
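The accessrand off-by-one fix follows from the fact that `random.randint(a, b)` includes `b`. The sketch below shows the in-range form. The body is illustrative, not the committed code. `randrange(len(A))` is equivalent to `randint(0, len(A) - 1)`.

```python
import random

def accessrand(A, rng=random):
    """Read one random element of A (illustrative sketch).

    randint(0, len(A)) could return len(A) and index past the end;
    randrange(len(A)) always returns 0 <= i < len(A)."""
    i = rng.randrange(len(A))   # same range as rng.randint(0, len(A) - 1)
    return A[i]
```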
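To never load the same block twice in the read benchmark, one can sample block indices without replacement. This is a sketch under assumptions: the function name and the `arraylen`/`blklen` parameters are hypothetical, and the last block may be partial. `random.sample` guarantees distinct blocks, and the number of reads is capped at the number of blocks.

```python
import random

def distinct_block_reads(arraylen, blklen, n, rng=random):
    """Return up to n element indices, each in a different block (hypothetical helper)."""
    nblocks = (arraylen + blklen - 1) // blklen   # ceil: the last block may be partial
    n = min(n, nblocks)                           # cannot pick more distinct blocks than exist
    blocks = rng.sample(range(nblocks), n)        # sampling without replacement
    # pick one element inside each chosen block; clamp so a partial last block stays in range
    return [min(b * blklen + rng.randrange(blklen), arraylen - 1) for b in blocks]
```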
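The unified benchmarking format is the Go benchmark result format that benchstat reads. In that format, each result line is `Benchmark<Name> <iterations> <value> <unit>`. The sketch below formats one such line and reports time per operation. The function name and the example benchmark name are hypothetical.

```python
def bench_line(name, niter, total_seconds):
    """Format one Go-style benchmark result line (hypothetical helper)."""
    # Go tooling expects the part after "Benchmark" not to start with a lowercase letter.
    if not name or name[0].islower():
        raise ValueError("benchmark name must not start with a lowercase letter: %r" % name)
    if niter <= 0:
        raise ValueError("niter must be positive")
    ns_per_op = total_seconds / niter * 1e9
    return "Benchmark%s %d %.1f ns/op" % (name, niter, ns_per_op)

# e.g. print(bench_line("ZBlkFmt/h/append", 4, 0.002))
```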
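The RLIMIT_NOFILE item says the benchmarks run fine under `ulimit -n 20`. A Python sketch can apply the same soft limit to a child process without a shell. It is Unix-only because it relies on the `resource` module. The helper name is hypothetical.

```python
import resource
import subprocess
import sys

def run_with_nofile_limit(argv, nofile=20):
    """Run argv with its soft RLIMIT_NOFILE set to nofile, roughly like `ulimit -n 20`."""
    def set_limit():
        soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        # the soft limit may not exceed the hard limit
        new_soft = nofile if hard == resource.RLIM_INFINITY else min(nofile, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return subprocess.run(argv, preexec_fn=set_limit, check=True,
                          capture_output=True, text=True)

# e.g. run_with_nofile_limit([sys.executable, "bench_zblkfmt"])  # path is illustrative
```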
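For the item about dropping bash, here is a sketch of spawning a benchmark step directly from Python. Using `sys.executable` runs the child under the same interpreter, so it sees the same installed packages (for example wendelin.core installed with `pip install -e`). The helper name is hypothetical.

```python
import subprocess
import sys

def run_bench_child(*args):
    """Run one benchmark step in a fresh Python interpreter, without a shell."""
    r = subprocess.run([sys.executable, *args], check=True,
                       capture_output=True, text=True)
    return r.stdout
```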
