X Notes on I/O

Commit 19476823 authored Mar 16, 2017 by Kirill Smelkov
Showing with 92 additions and 0 deletions

t/neo/storage/fs1/notes_io.txt +92 -0

--- a/t/neo/storage/fs1/notes_io.txt
+++ b/t/neo/storage/fs1/notes_io.txt
+Notes on Input/Output
+---------------------
+
+Several options are available here:
+
+pread
+~~~~~
+
+The kernel handles both disk I/O and caching (in pagecache).
+For hot cache case:
+
+Cost = C(pread(n)) = α + β⋅n
+
+α - syscall cost
+β - cost to copy 1 byte         (both src and dst are in cache)
+
+α is quite big ≈ (200 - 300) ns
+α/β ≈ (2 - 3.5) · 10^4
+
+see details here: https://github.com/golang/go/issues/19563
+
+thus:
+
+the cost to pread 1 page (n ≈ 4·10^3) is ~ (1.1 - 1.2) · α
+the cost to copy  1 page              is ~ (0.1 - 0.2) · α
+
+if there are many small reads and a syscall is made for each of them, it works
+slowly because α is big.
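
A small worked example of the cost model above, in units of α. This is only a
sketch: `costInAlpha` and its parameters are names made up for this note, and
the numbers plugged in are the α/β estimates quoted above, not new measurements.

```go
package main

// costInAlpha returns the modeled hot-cache cost, in units of α, of reading
// total bytes with nreads separate pread calls:
//
//	nreads·α + β·total  =  α · (nreads + total/(α/β))
//
// alphaOverBeta is the α/β ratio (assumed to be in the 2·10^4 - 3.5·10^4 range).
func costInAlpha(nreads, total int, alphaOverBeta float64) float64 {
	return float64(nreads) + float64(total)/alphaOverBeta
}
```

With α/β = 2·10^4, one 4096-byte pread costs costInAlpha(1, 4096, 2e4) ≈ 1.2·α,
while the same page fetched as 64 reads of 64 bytes each costs
costInAlpha(64, 4096, 2e4) ≈ 64.2·α.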
+
+
+pread + user-buffer
+~~~~~~~~~~~~~~~~~~~
+
+It is possible to mitigate the high α by buffering data from bigger reads in
+user-space and serving smaller client reads by copying from that buffer.
+
+Performance depends on the buffer hit/miss ratio and will be evaluated for a
+simple 1-page buffer.
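
A minimal sketch of the 1-page buffer idea, assuming the underlying reader is
any io.ReaderAt (for a real file, os.File.ReadAt issues pread). The names
`pageBufReaderAt`, `pageSize`, `Misses` and `example` are made up for this
note. The sketch is not safe for concurrent use and refills the buffer starting
at the requested offset, which favours forward sequential access.

```go
package main

import (
	"io"
	"strings"
)

const pageSize = 4096 // assumed buffer size: one page

// pageBufReaderAt serves small reads from a 1-page user-space buffer and
// goes to the underlying reader only on a buffer miss.
type pageBufReaderAt struct {
	r      io.ReaderAt
	buf    [pageSize]byte
	bufPos int64 // file offset of buf[0]
	bufLen int   // number of valid bytes in buf
	Misses int   // number of underlying ReadAt calls (≈ syscalls)
}

func (b *pageBufReaderAt) ReadAt(p []byte, off int64) (int, error) {
	// Requests larger than the buffer bypass it.
	if len(p) > pageSize {
		b.Misses++
		return b.r.ReadAt(p, off)
	}
	end := off + int64(len(p))
	hit := b.bufLen > 0 && off >= b.bufPos && end <= b.bufPos+int64(b.bufLen)
	if !hit {
		// Miss: pay α once and refill the whole buffer starting at off.
		b.Misses++
		n, err := b.r.ReadAt(b.buf[:], off)
		b.bufPos, b.bufLen = off, n
		if n < len(p) {
			// Not enough data for this request: report the error.
			if err == nil {
				err = io.ErrUnexpectedEOF
			}
			return copy(p, b.buf[:n]), err
		}
		// n >= len(p): any error (e.g. io.EOF) concerns bytes past p.
	}
	// Hit (or just refilled): only the β⋅n copy cost is paid.
	i := off - b.bufPos
	return copy(p, b.buf[i:i+int64(len(p))]), nil
}

// example reads a 10000-byte in-memory "file" in 100-byte chunks and
// returns the data read and the number of underlying reads.
func example() (data string, misses int) {
	src := strings.Repeat("0123456789", 1000)
	b := &pageBufReaderAt{r: strings.NewReader(src)}
	chunk := make([]byte, 100)
	out := make([]byte, 0, len(src))
	for off := 0; off < len(src); off += len(chunk) {
		if _, err := b.ReadAt(chunk, int64(off)); err != nil {
			panic(err)
		}
		out = append(out, chunk...)
	}
	return string(out), b.Misses
}
```

In `example` the 100 client reads result in only 3 underlying reads, so α is
paid 3 times instead of 100.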
+
+
+mmap
+~~~~
+
+The kernel handles both disk I/O and caching (in pagecache).
+
+Cost ~ α (XXX recheck) is spent on first-time access.
+Future accesses to the page, given it is still in pagecache, do not incur α cost.
+
+However I/O errors are reported as SIGBUS on memory access. Thus if a pointer
+to mmapped memory is returned for a read request, clients could get I/O errors
+as exceptions potentially everywhere.
+
+To get & check I/O errors on the actual read request the read service will thus
+need to access and copy data from mmapped memory to another buffer, incurring
+β⋅n cost in the hot-cache case.
+
+Not doing the copy can lead to a situation where data was first read/checked ok
+by the read service, then evicted from pagecache by the kernel, then accessed
+by the client, which causes real disk I/O, and if this I/O fails -> the client
+gets SIGBUS.
+
+Another potential disadvantage: if a memory access causes disk I/O, the whole
+OS thread is blocked, not only the goroutine which issued the access.
+
+Note: madvise should be used to guide kernel read-ahead (forward/backward) and
+to hint where we are planning to access data next. madvise is a syscall, so
+this can add α back.
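
The mmap points above (copy on the read request so that I/O errors surface
there, plus madvise hints) could look roughly like the sketch below. It assumes
Linux and Go's syscall.Mmap / syscall.Madvise together with
runtime/debug.SetPanicOnFault, which asks the runtime to turn a fault at a
non-nil address into a panic for the current goroutine instead of crashing the
program. `mmapFile`, `readAt` and `willNeed` are names made up for this note;
unmapping and empty files (mmap rejects zero length) are left out, and the
fault-to-error path should be rechecked on the target Go version.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"syscall"
)

// mmapFile maps the whole file read-only. The mapping stays valid after
// the file descriptor is closed.
func mmapFile(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	fi, err := f.Stat()
	if err != nil {
		return nil, err
	}
	return syscall.Mmap(int(f.Fd()), 0, int(fi.Size()),
		syscall.PROT_READ, syscall.MAP_SHARED)
}

// readAt copies data from mapped memory into p (β⋅n cost in the hot-cache
// case). A fault during the copy is reported as an error of this read
// request instead of hitting the client later.
func readAt(mem, p []byte, off int64) (n int, err error) {
	old := debug.SetPanicOnFault(true)
	defer debug.SetPanicOnFault(old)
	defer func() {
		if r := recover(); r != nil {
			n, err = 0, fmt.Errorf("mmap read @%d: %v", off, r)
		}
	}()
	if off < 0 || off > int64(len(mem)) {
		return 0, fmt.Errorf("mmap read @%d: offset out of range", off)
	}
	return copy(p, mem[off:]), nil
}

// willNeed hints the kernel that mem[off:off+n] will be accessed soon.
// It is one more syscall, so it costs α again.
func willNeed(mem []byte, off, n int) error {
	page := syscall.Getpagesize()
	start := off &^ (page - 1) // madvise needs a page-aligned start
	end := off + n
	if end > len(mem) {
		end = len(mem)
	}
	if start < 0 || start >= end {
		return nil
	}
	return syscall.Madvise(mem[start:end], syscall.MADV_WILLNEED)
}
```

The client only ever sees `p`, never a pointer into `mem`, so a later eviction
of the page from pagecache cannot turn into SIGBUS on the client side.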
+
+...
+
+Direct I/O
+~~~~~~~~~~
+
+The kernel performs disk I/O directly to user-space memory.
+The kernel does not handle caching.
+
+Cache must be implemented in user-space.
+
+pros:
+
+  - the kernel is accessed only when there is a real need for disk I/O.
+  - memory can be managed completely by "us" in userspace.
+  - what to cache and preload can be more integrated with client workload.
+  - various copy disciplines for reads are possible,
+    including providing a pointer to in-cache data to clients (though this
+    requires implementing ref-counting and such)
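
The last point (handing out pointers to in-cache data, protected by a
ref-count) could be sketched as below. Everything here is hypothetical:
`pageCache`, `cachedPage`, `newPageCache` and the `load` callback are names
made up for this note, and `load` stands in for whatever would actually issue
the Direct I/O read into a suitably aligned buffer. For simplicity one mutex
guards everything and is held during the load.

```go
package main

import "sync"

// cachedPage is a page of file data kept in the user-space cache.
type cachedPage struct {
	Data []byte
	refs int // clients currently holding Data; guarded by pageCache.mu
}

// pageCache is a user-space page cache with per-page reference counting.
type pageCache struct {
	mu    sync.Mutex
	pages map[int64]*cachedPage
	load  func(pgno int64) ([]byte, error) // hypothetical disk read
}

func newPageCache(load func(pgno int64) ([]byte, error)) *pageCache {
	return &pageCache{pages: make(map[int64]*cachedPage), load: load}
}

// Get returns page pgno without copying and takes a reference on it.
// The kernel is involved (via load) only on a cache miss.
func (c *pageCache) Get(pgno int64) (*cachedPage, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	p, ok := c.pages[pgno]
	if !ok {
		data, err := c.load(pgno)
		if err != nil {
			return nil, err
		}
		p = &cachedPage{Data: data}
		c.pages[pgno] = p
	}
	p.refs++
	return p, nil
}

// Put drops the reference taken by Get; the client must not touch
// p.Data afterwards.
func (c *pageCache) Put(p *cachedPage) {
	c.mu.Lock()
	p.refs--
	c.mu.Unlock()
}

// Evict removes page pgno from the cache unless a client still holds a
// reference to it. It reports whether the page was removed.
func (c *pageCache) Evict(pgno int64) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	p, ok := c.pages[pgno]
	if !ok || p.refs > 0 {
		return false
	}
	delete(c.pages, pgno)
	return true
}
```

Because eviction is decided in user-space, a page a client is still looking at
is never dropped underneath it, which is the guarantee mmap could not give.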
+
+
+cons:
+
+  - harder to implement
+  - Linus dislikes Direct I/O very much
+  - probably more kernel bugs, as this is a more exotic area