Commit ee1407f7 authored by Marko Mäkelä

MDEV-32268: GNU libc posix_fallocate() may be extremely slow

os_file_set_size(): Let us invoke the Linux system call fallocate(2)
directly, because the GNU libc posix_fallocate() implements a fallback
that writes to the file 1 byte every 4096 or fewer bytes. In one
environment, invoking fallocate() directly would lead to 4 times the
file growth rate during ALTER TABLE. Presumably, what happened was
that the NFS server used a smaller allocation block size than 4096 bytes
and therefore created a heavily fragmented sparse file when
posix_fallocate() was used. For example, extending a file by 4 MiB
would create 1,024 file fragments. When the file is actually being
written to with data, it would be "unsparsed".
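The 1,024 figure above follows from one touched byte per 4096-byte step across 4 MiB. A minimal sketch of that arithmetic, assuming one write per started block; `touch_write_count` is a hypothetical helper for illustration and is not taken from glibc or MariaDB (the real glibc fallback has further details, such as deriving the step from the file system block size):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from glibc or MariaDB): number of one-byte
   writes that a "touch one byte per block" emulation would issue when
   extending a file by len bytes, assuming one write per started block
   of step bytes. */
static uint64_t touch_write_count(uint64_t len, uint64_t step)
{
    assert(step > 0);               /* a zero block size makes no sense */
    return (len + step - 1) / step; /* round up to whole blocks */
}
```

With `len` = 4 MiB and `step` = 4096, this evaluates to 1024, which matches the fragment count quoted in the message when every touched block lands in its own extent.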

The built-in EOPNOTSUPP fallback in os_file_set_size() writes a buffer
of 1 MiB of NUL bytes. This was always used on musl libc and other
Linux implementations of posix_fallocate().
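The overall strategy described above (call fallocate(2) directly, and write NUL bytes only when the file system reports EOPNOTSUPP) can be sketched roughly as follows. This is an illustrative, Linux-only approximation and not the actual os_file_set_size() code: the name `extend_file`, its signature, and the error handling are assumptions, and it omits the shutdown check and the 4096-byte alignment of `current_size` that the real function performs.

```c
#define _GNU_SOURCE             /* needed for the fallocate() declaration */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: extend fd from current_size to size bytes.
   Returns 0 on success or an errno value on failure. */
static int extend_file(int fd, off_t current_size, off_t size)
{
    int err;

    assert(fd >= 0);
    if (current_size >= size)
        return 0;               /* nothing to extend */

    /* Ask the kernel to allocate the range; retry if interrupted. */
    do {
        if (fallocate(fd, 0, current_size, size - current_size) == 0)
            return 0;
        err = errno;
    } while (err == EINTR);

    if (err != EOPNOTSUPP)
        return err;             /* a real error such as ENOSPC */

    /* Fallback: fill the new range with NUL bytes, 1 MiB at a time. */
    enum { BUF_SIZE = 1 << 20 };
    char *buf = calloc(1, BUF_SIZE);
    if (buf == NULL)
        return ENOMEM;

    err = 0;
    off_t off = current_size;
    while (off < size) {
        size_t n = (size - off) < BUF_SIZE ? (size_t)(size - off) : BUF_SIZE;
        ssize_t written = pwrite(fd, buf, n, off);
        if (written < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing; retry */
            err = errno;
            break;
        }
        if (written == 0) {
            err = EIO;          /* no progress; give up instead of spinning */
            break;
        }
        off += written;
    }
    free(buf);
    return err;
}
```

Writing whole 1 MiB buffers keeps the fallback sequential, in contrast to touching a single byte in every block.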
parent 615f4a8c
@@ -4934,8 +4934,18 @@ os_file_set_size(
 				return true;
 			}
 			current_size &= ~4095ULL;
+# ifdef __linux__
+			if (!fallocate(file, 0, current_size,
+				       size - current_size)) {
+				err = 0;
+				break;
+			}
+			err = errno;
+# else
 			err = posix_fallocate(file, current_size,
 					      size - current_size);
+# endif
 		}
 	} while (err == EINTR
 		 && srv_shutdown_state <= SRV_SHUTDOWN_INITIATED);