Commit 681ce862 authored by Yafang Shao, committed by Linus Torvalds

vfs: Delete the associated dentry when deleting a file

Our applications, built on Elasticsearch[0], frequently create and
delete files.  These applications operate within containers, some with a
memory limit exceeding 100GB.  Over prolonged periods, the accumulation
of negative dentries within these containers can amount to tens of
gigabytes.
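One way to watch this accumulation is the system-wide counter in /proc/sys/fs/dentry-state. The following is a minimal sketch, not part of the patch: the struct and function names are hypothetical, and it assumes a kernel recent enough (5.0 or later) to report nr_negative as the fifth field. The counter covers unused negative dentries for the whole system, not for a single container.

```c
#include <assert.h>
#include <stdio.h>

/* Fields of /proc/sys/fs/dentry-state, in file order. */
struct dentry_state {
	long nr_dentry;    /* total allocated dentries */
	long nr_unused;    /* dentries not actively in use */
	long age_limit;
	long want_pages;
	long nr_negative;  /* unused dentries that are negative */
	long dummy;
};

/* Parse one line of dentry-state text; returns 0 on success, -1 on error. */
int parse_dentry_state(const char *line, struct dentry_state *st)
{
	assert(line != NULL && st != NULL);
	int n = sscanf(line, "%ld %ld %ld %ld %ld %ld",
		       &st->nr_dentry, &st->nr_unused, &st->age_limit,
		       &st->want_pages, &st->nr_negative, &st->dummy);
	return n == 6 ? 0 : -1;
}

/* Read and parse the live file; returns 0 on success, -1 on error. */
int read_dentry_state(struct dentry_state *st)
{
	char buf[256];
	FILE *f = fopen("/proc/sys/fs/dentry-state", "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_dentry_state(buf, st);
}
```

Calling read_dentry_state() before and after a create/delete workload and comparing nr_negative gives a rough view of how many negative dentries the workload left behind.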

Upon container exit, directories are deleted.  However, due to the
numerous associated dentries, this process can be time-consuming.  Our
users have expressed frustration with this prolonged exit duration,
which constitutes our first issue.

Simultaneously, other processes may attempt to access the parent
directory of the Elasticsearch directories.  Since the task responsible
for deleting the dentries holds the inode lock, processes attempting
directory lookup experience significant delays.  This issue, our second
problem, is easily demonstrated:

  - Task 1 generates negative dentries:
  $ pwd
  ~/test
  $ mkdir es && cd es/ && ./create_and_delete_files.sh

  [ After generating tens of GB of dentries ]

  $ cd ~/test && rm -rf es

  [ This takes a long time to finish ]

  - Task 2 attempts to look up the 'test/' directory:
  $ pwd
  ~/test
  $ ls

  The 'ls' command in Task 2 experiences prolonged execution as Task 1
  is deleting the dentries.
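The contents of create_and_delete_files.sh are not shown in this commit. As a hypothetical stand-in, the sketch below performs the same kind of churn in C: it creates uniquely named files and unlinks each one immediately. The function name and the file-name pattern are assumptions made for illustration. On a kernel without this patch, each unlink of an otherwise unused file would be expected to leave one negative dentry behind.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Create and immediately unlink `count` uniquely named files in `dir`.
 * Returns the number of files that were both created and removed;
 * stops at the first failure.
 */
long churn_files(const char *dir, long count)
{
	char path[4096];
	long done = 0;

	assert(dir != NULL);
	for (long i = 0; i < count; i++) {
		int n = snprintf(path, sizeof(path), "%s/churn-%ld", dir, i);
		if (n < 0 || (size_t)n >= sizeof(path))
			break;                  /* path would be truncated */

		int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
		if (fd < 0)
			break;                  /* e.g. directory missing */
		close(fd);

		if (unlink(path) != 0)
			break;
		done++;
	}
	return done;
}
```

Running churn_files("es", N) with a large N inside ~/test approximates Task 1 above; the count needed to reach tens of gigabytes of dentries depends on the system and is not specified here.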

We've devised a solution to address both issues by deleting the
associated dentry when removing a file.  Interestingly, we've noted that
a similar patch was proposed years ago[1], although it was rejected
citing the absence of tangible issues caused by negative dentries.
Given our current challenges, we're resubmitting the proposal.  All
relevant stakeholders from previous discussions have been included for
reference.

Some alternative solutions are also under discussion[2][3], such as
shrinking child dentries outside of the parent inode lock or even
asynchronously shrinking child dentries.  However, given the
straightforward nature of the current solution, I believe this approach
is still necessary.

[ NOTE! This is a pretty fundamental change in how we deal with
  unlinking dentries, and it doesn't change the fact that you can have
  lots of negative dentries from just doing negative lookups.

  But the kernel test robot is at least initially happy with this from a
  performance angle, so I'm applying this ASAP just to get more testing
  and as a "known fix for an issue people hit in real life".

  Put another way: we should still look at the alternatives, and this
  patch may get reverted if somebody finds a performance regression on
  some other load.       - Linus ]

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://github.com/elastic/elasticsearch [0]
Link: https://patchwork.kernel.org/project/linux-fsdevel/patch/1502099673-31620-1-git-send-email-wangkai86@huawei.com [1]
Link: https://lore.kernel.org/linux-fsdevel/20240511200240.6354-2-torvalds@linux-foundation.org/ [2]
Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wjEMf8Du4UFzxuToGDnF3yLaMcrYeyNAaH1NJWa6fwcNQ@mail.gmail.com/ [3]
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Waiman Long <longman@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Wangkai <wangkai86@huawei.com>
Cc: Colin Walters <walters@verbum.org>
Tested-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/all/202405221518.ecea2810-oliver.sang@intel.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent 29c73fc7
@@ -2360,19 +2360,17 @@ EXPORT_SYMBOL(d_hash_and_lookup);
  * - unhash this dentry and free it.
  *
  * Usually, we want to just turn this into
- * a negative dentry, but if anybody else is
- * currently using the dentry or the inode
- * we can't do that and we fall back on removing
- * it from the hash queues and waiting for
- * it to be deleted later when it has no users
+ * a negative dentry, but certain workloads can
+ * generate a large number of negative dentries.
+ * Therefore, it would be better to simply
+ * unhash it.
  */
 
 /**
  * d_delete - delete a dentry
  * @dentry: The dentry to delete
  *
- * Turn the dentry into a negative dentry if possible, otherwise
- * remove it from the hash queues so it can be deleted later
+ * Remove the dentry from the hash queues so it can be deleted later.
  */
 
 void d_delete(struct dentry * dentry)
@@ -2381,6 +2379,8 @@ void d_delete(struct dentry * dentry)
 
 	spin_lock(&inode->i_lock);
 	spin_lock(&dentry->d_lock);
+	__d_drop(dentry);
+
 	/*
 	 * Are we the only user?
 	 */
@@ -2388,7 +2388,6 @@ void d_delete(struct dentry * dentry)
 		dentry->d_flags &= ~DCACHE_CANT_MOUNT;
 		dentry_unlink_inode(dentry);
 	} else {
-		__d_drop(dentry);
 		spin_unlock(&dentry->d_lock);
 		spin_unlock(&inode->i_lock);
 	}
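
The behavioral change in the hunks above can be illustrated with a small userspace model. This is only a sketch under simplifying assumptions: struct toy_dentry and the toy_* functions are hypothetical names, locking is omitted, and the three fields stand in for d_lockref.count, hash membership, and the presence of an inode. It is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct dentry; NOT the kernel's layout. */
struct toy_dentry {
	int  refcount;   /* models d_lockref.count */
	bool hashed;     /* true while lookups can still find it */
	bool has_inode;  /* false means the dentry is negative */
};

/* Model of d_delete() before the patch. */
void toy_d_delete_old(struct toy_dentry *d)
{
	assert(d != NULL);
	if (d->refcount == 1)
		d->has_inode = false;   /* stays hashed as a negative dentry */
	else
		d->hashed = false;      /* __d_drop(): unhash, inode stays */
}

/* Model of d_delete() after the patch. */
void toy_d_delete_new(struct toy_dentry *d)
{
	assert(d != NULL);
	d->hashed = false;              /* __d_drop() now runs unconditionally */
	if (d->refcount == 1)
		d->has_inode = false;   /* dentry_unlink_inode() */
}

/* A dentry lingers as a cached negative entry only if hashed and inode-less. */
bool toy_is_cached_negative(const struct toy_dentry *d)
{
	assert(d != NULL);
	return d->hashed && !d->has_inode;
}
```

In this model, a sole-user dentry passed to toy_d_delete_old() ends up as a cached negative entry, while the same dentry passed to toy_d_delete_new() ends up unhashed, so it no longer accumulates in the cache. The multi-user case is the same in both versions.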