- 09 Jul, 2024 1 commit
-
-
Andreas Gruenbacher authored
The logic for determining when to demote a glock in glock_work_func(), introduced in commit 7cf8dcd3 ("GFS2: Automatically adjust glock min hold time"), doesn't make sense: inode glocks have a minimum hold time that delays demotion, while all other glocks are expected to be demoted immediately. Instead of demoting non-inode glocks immediately, glock_work_func() schedules glock work for them to be demoted, however. Get rid of that unnecessary indirection. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
- 20 Jun, 2024 3 commits
-
-
Andreas Gruenbacher authored
Since the previous commit, function gfs2_quota_sync() will not cause the sync generation to creep forward by one every time the function is called; this helps keep things a but more tidy. We also don't care that this function allocates a page of memory every time it is called, so no good reason for keeping qd_changed() anymore, which just duplicates qd_grab_sync(). This reverts commit 06aa6fd3. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
The quota sync generation is only ever updated under sd_quota_sync_mutex by gfs2_quota_sync(), but its current value is fetched ouside of that mutex, so use WRITE_ONCE() and READ_ONCE() when accessing it without holding that mutex. Pass the current sync generation to do_sync() from its callers to ensure that we're not recording the wrong generation when the syncing is done. Also, make sure that qd->qd_sync_gen only ever moves forward. In gfs2_quota_sync(), only write the new sync generation when we know that there are changes. This eliminates the need for function sd_changed(), which we will remove in the next commit. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
With the locking the previous patch has introduced for each struct gfs2_quota_data object, sd_quota_mutex has become largely irrelevant. By waiting on the buffer head instead of waiting on the mutex in get_bh(), it becomes completely irrelevant and can be removed. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
- 12 Jun, 2024 1 commit
-
-
Andreas Gruenbacher authored
The quota code is missing some locking between local quota changes and syncing those quota changes to the global quota file (gfs2_quota_sync); in particular, qd->qd_change needs to be kept in sync with the QDF_CHANGE change flag and the number of references held. Use the qd->qd_lockref.lock spinlock for that. With the qd->qd_lockref.lock spinlock held, we can no longer call lockref_get(), so turn qd_hold() into a variant that assumes that the lock is held. This function is really supposed to take an additional reference when one or more references are already held, so check for that instead of checking if the lockref is dead. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
- 08 Jun, 2024 7 commits
-
-
Andreas Gruenbacher authored
The split between qd_fish() and gfs2_quota_sync() is rather unfortunate as qd_fish() is repeatedly called to scan sdp->sd_quota_list only to find the next object to that needs syncing; if there are multiple objects on the list that need syncing, it makes more sense to grab them all in one go. This is relatively easy to do when qd_fish() is folded into gfs2_quota_sync(). Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Rename variable 'value' to 'change' as it stores a change in value. Add new 'value' and 'limit' variables for the current value and limit. Only fetch the tuning parameters when we need them. Get rid of unnecessary nesting. No change in functionality. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Function do_qc() is supposed to be conceptually simple: it alters the current in-memory and on-disk quota change values for a given uid/gid by a given delta. If the on-disk record isn't defined yet, a new record is created. If the on-disk record exists and the resulting change value is zero, there no longer is a need for that record and so the record is deleted. On top of that, some reference counting is involved when creating and deleting records. Currently, instead of doing the above, do_qc() alters the on-disk value and then it sets the in-memory value to the on-disk value. This is incorrect when the on-disk value differs from the in-memory value. The two values are allowed to differ when quota changes are synced to the global quota file. Fix by changing both values by the same amount. In addition, do_qc() currently gets confused when the delta value is 0. It isn't supposed to be called that way, but that assumption isn't mentioned and it makes the code harder to read. Make the code more explicit. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Commit 432928c9 ("gfs2: Add quota_change type") makes the incorrect assertion that function do_qc() should behave differently in the two contexts it is used in, but that isn't actually true. In all cases, do_qc() grabs a "reference" when it starts using a slot in the per-node quota changes file, and it releases that "reference" when no more residual changes remain. Revert that broken commit. There are some remaining issues with function do_qc() which are addressed in the next commit. This reverts commit 432928c9. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Commit 4c6a0812 ("gfs2: ignore negated quota changes") skips quota changes with qd_change == 0 instead of writing them back, which leaves behind non-zero qd_change values in the affected slots. The kernel then assumes that those slots are unused, while the qd_change values on disk indicate that they are indeed still in use. The next time the filesystem is mounted, those invalid slots are read in from disk, which will cause inconsistencies. Revert that commit to avoid filesystem corruption. This reverts commit 4c6a0812. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Rename qd_check_sync() to qd_grab_sync() and make it return a bool. Turn the sync_gen pointer into a regular u64 and pass in U64_MAX instead of a NULL pointer when sync generation checking isn't needed. Introduce a new qd_ungrab_sync() helper for undoing the effects of qd_grab_sync() if the subsequent bh_get() on the qd object fails. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
The qd_bh_get_or_undo() helper introduced by that commit doesn't improve the code much, so revert it and clean things up in a more useful way in the next commit. This reverts commit 7dbc6ae6. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
- 07 Jun, 2024 1 commit
-
-
Andreas Gruenbacher authored
In gfs2_quota_init(), make sure that the per-node "quota_change%u" file doesn't contain duplicate uids/gids. Those duplicates would cause us to acquire the glock corresponding to those ids repeatedly, which the glock code doesn't allow. When finding inconsistencies, we wipe them out and ignore them. The resulting quotas will likely be inconsistent, and running quotacheck(1) is advised. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
- 04 Jun, 2024 1 commit
-
-
Andreas Gruenbacher authored
Add a fail_brelse label and use it where useful. Move variable bh out of the loop to extend its visibility to the new label. No functional change. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
- 29 May, 2024 9 commits
-
-
Andreas Gruenbacher authored
The demote_ok glock operation is only still used to prevent the inode glocks of the "jindex" and "rindex" directories from getting recycled while they are still referenced by sdp->sd_jindex and sdp->sd_rindex. However, the LRU walking code will no longer recycle glocks which are referenced, so the demote_ok glock operation is obsolete and can be removed. Each of a glock's holders in the gl_holders list is holding a reference on the glock, so when the list of holders isn't empty in demote_ok(), the existing reference count check will already prevent the glock from getting released. This means that demote_ok() is obsolete as well. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
This reverts commit e7ccaf5f. Before commit e7ccaf5f, every time a resource group glock was dequeued by gfs2_glock_dq(), it was added to the glock LRU list even though the glock was still referenced by the resource group and could never be evicted, anyway. Commit e7ccaf5f added a GLOF_LRU hack to avoid that overhead for resource group glocks, and that hack was since adopted for some other types of glocks as well. We now no longer add glocks to the glock LRU list while they are still referenced. This solves the underlying problem, and obsoletes the GLOF_LRU hack. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com> (cherry picked from commit 3e5257c810cba91e274d07f3db5cf013c7c830be)
-
Andreas Gruenbacher authored
In the current glock reference counting model, a bias of one is added to a glock's refcount when it is locked (gl->gl_state != LM_ST_UNLOCKED). A glock is removed from the lru_list when it is enqueued, and added back when it is dequeued. This isn't a very appropriate model because most glocks are held for long periods of time (for example, the inode "owns" references to its inode and iopen glocks as long as the inode is cached even when the glock state changes to LM_ST_UNLOCKED), and they can only be freed when they are no longer referenced, anyway. Fix this by getting rid of the refcount bias for locked glocks. That way, we can use lockref_put_or_lock() to efficiently drop all but the last glock reference, and put the glock onto the lru_list when the last reference is dropped. When find_insert_glock() returns a reference to a cached glock, it removes the glock from the lru_list. Dumping the "glocks" and "glstats" debugfs files also takes glock references, but instead of removing the glocks from the lru_list in that case as well, we leave them on the list. This ensures that dumping those files won't perturb the order of the glocks on the lru_list. In addition, when the last reference to an *unlocked* glock is dropped, we immediately free it; this preserves the preexisting behavior. If it later turns out that caching unlocked glocks is useful in some situations, we can change the caching strategy. It is currently unclear if a glock that has no active references can have the GLF_LFLUSH flag set. To make sure that such a glock won't accidentally be evicted due to memory pressure, we add a GLF_LFLUSH check to gfs2_dispose_glock_lru(). Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Switch to a per-filesystem glock workqueue. Additional workqueues are cheap nowadays, and keeping separate workqueues allows to flush the work of each filesystem without affecting the others. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
When glocks cannot be freed for a long time, avoid the "task blocked for more than N seconds" messages and report how many glocks are still outstanding, instead. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Clean up the messy code in gfs2_glock_get(). No change in functionality. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Invert the meaning of the GLF_INITIAL flag: right now, when GLF_INITIAL is set, a DLM lock exists and we have a valid identifier for it; when GLF_INITIAL is cleared, no DLM lock exists (yet). This is confusing. In addition, it makes more sense to highlight the exceptional case (i.e., no DLM lock exists yet) in glock dumps and trace points than to highlight the common case. To avoid confusion between the "old" and the "new" meaning of the flag, use 'a' instead of 'I' to represent the flag. For improved code consistency, check if the GLF_INITIAL flag is cleared to determine whether a DLM lock exists instead of checking if the lock identifier is non-zero. Document what the flag is used for. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Rearrange the table of locking modes and associated caching capability to be in order of increasing caching capability. Update the description of the glock operations. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
- 28 May, 2024 6 commits
-
-
Andreas Gruenbacher authored
Function handle_callback() is used to request a glock demote. This often happens in response to a conflicting remote locking request and subsequent bast callback from DLM, but there are other reasons for triggering a demote request as well, such as when trying to release a glock in response to memory pressure. To clarify that, rename the function to request_demote(). Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
The GLF_FROZEN flag indicates that a reply to a DLM locking request has been received, but should not be processed at this time. To clarify that meaning, rename the flag to GLF_HAVE_FROZEN_REPLY. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
The GLF_REPLY_PENDING flag indicates to glock_work_func() that in response to a locking request, DLM has sent a reply that needs to be processed. A flag with that name could as well indicate that we are waiting on a reply from DLM, however. To disambiguate these two cases, rename the flag to GLF_HAVE_REPLY. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Rename the GLF_FREEING flag to GLF_UNLOCKED, and the ->go_free glock operation to ->go_unlocked. This mechanism is used to wait for the underlying DLM lock to be unlocked; being able to free the glock is a consequence of the DLM lock being unlocked. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
The return statement at the end of run_queue() is useless. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Function __gfs2_glock_dq() gets defined before it is used, so there is no need for a separate function declaration. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
- 26 May, 2024 5 commits
-
-
Linus Torvalds authored
-
Kent Overstreet authored
percpu.h depends on smp.h, but doesn't include it directly because of circular header dependency issues; percpu.h is needed in a bunch of low level headers. This fixes a randconfig build error on mips: include/linux/alloc_tag.h: In function '__alloc_tag_ref_set': include/asm-generic/percpu.h:31:40: error: implicit declaration of function 'raw_smp_processor_id' [-Werror=implicit-function-declaration] Reported-by:
kernel test robot <lkp@intel.com> Fixes: 24e44cc2 ("mm: percpu: enable per-cpu allocation tagging") Closes: https://lore.kernel.org/oe-kbuild-all/202405210052.DIrMXJNz-lkp@intel.com/Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
Merge tag 'perf-tools-fixes-for-v6.10-1-2024-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tool fix from Arnaldo Carvalho de Melo: "Revert a patch causing a regression. This made a simple 'perf record -e cycles:pp make -j199' stop working on the Ampere ARM64 system Linus uses to test ARM64 kernels". * tag 'perf-tools-fixes-for-v6.10-1-2024-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: Revert "perf parse-events: Prefer sysfs/JSON hardware events over legacy"
-
Arnaldo Carvalho de Melo authored
This reverts commit 617824a7. This made a simple 'perf record -e cycles:pp make -j199' stop working on the Ampere ARM64 system Linus uses to test ARM64 kernels, as discussed at length in the threads in the Link tags below. The fix provided by Ian wasn't acceptable and work to fix this will take time we don't have at this point, so lets revert this and work on it on the next devel cycle. Reported-by:
Linus Torvalds <torvalds@linux-foundation.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com> Cc: Ethan Adams <j.ethan.adams@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tycho Andersen <tycho@tycho.pizza> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/lkml/CAHk-=wi5Ri=yR2jBVk-4HzTzpoAWOgstr1LEvg_-OXtJvXXJOA@mail.gmail.com Link: https://lore.kernel.org/lkml/CAHk-=wiWvtFyedDNpoV7a8Fq_FpbB+F5KmWK2xPY3QoYseOf_A@mail.gmail.comSigned-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
git://git.samba.org/sfrench/cifs-2.6Linus Torvalds authored
Pull smb client fixes from Steve French: - two important netfs integration fixes - including for a data corruption and also fixes for multiple xfstests - reenable swap support over SMB3 * tag '6.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: cifs: Fix missing set of remote_i_size cifs: Fix smb3_insert_range() to move the zero_point cifs: update internal version number smb3: reenable swapfiles over SMB3 mounts
-
- 25 May, 2024 6 commits
-
-
Linus Torvalds authored
Merge tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "16 hotfixes, 11 of which are cc:stable. A few nilfs2 fixes, the remainder are for MM: a couple of selftests fixes, various singletons fixing various issues in various parts" * tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/ksm: fix possible UAF of stable_node mm/memory-failure: fix handling of dissolved but not taken off from buddy pages mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again nilfs2: fix potential hang in nilfs_detach_log_writer() nilfs2: fix unexpected freezing of nilfs_segctor_sync() nilfs2: fix use-after-free of timer for log writer thread selftests/mm: fix build warnings on ppc64 arm64: patching: fix handling of execmem addresses selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages selftests/mm: compaction_test: fix bogus test success on Aarch64 mailmap: update email address for Satya Priya mm/huge_memory: don't unpoison huge_zero_folio kasan, fortify: properly rename memintrinsics lib: add version into /proc/allocinfo output mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull irq fixes from Ingo Molnar: - Fix x86 IRQ vector leak caused by a CPU offlining race - Fix build failure in the riscv-imsic irqchip driver caused by an API-change semantic conflict - Fix use-after-free in irq_find_at_or_after() * tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/irqdesc: Prevent use-after-free in irq_find_at_or_after() genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline irqchip/riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fixes from Ingo Molnar: - Fix regressions of the new x86 CPU VFM (vendor/family/model) enumeration/matching code - Fix crash kernel detection on buggy firmware with non-compliant ACPI MADT tables - Address Kconfig warning * tag 'x86-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Fix x86_match_cpu() to match just X86_VENDOR_INTEL crypto: x86/aes-xts - switch to new Intel CPU model defines x86/topology: Handle bogus ACPI tables correctly x86/kconfig: Select ARCH_WANT_FRAME_POINTERS again when UNWINDER_FRAME_POINTER=y
-
https://github.com/cminyard/linux-ipmiLinus Torvalds authored
Pull ipmi updates from Corey Minyard: "Mostly updates for deprecated interfaces, platform.remove and converting from a tasklet to a BH workqueue. Also use HAS_IOPORT for disabling inb()/outb()" * tag 'for-linus-6.10-1' of https://github.com/cminyard/linux-ipmi: ipmi: kcs_bmc_npcm7xx: Convert to platform remove callback returning void ipmi: kcs_bmc_aspeed: Convert to platform remove callback returning void ipmi: ipmi_ssif: Convert to platform remove callback returning void ipmi: ipmi_si_platform: Convert to platform remove callback returning void ipmi: ipmi_powernv: Convert to platform remove callback returning void ipmi: bt-bmc: Convert to platform remove callback returning void char: ipmi: handle HAS_IOPORT dependencies ipmi: Convert from tasklet to BH workqueue
-
https://github.com/ceph/ceph-clientLinus Torvalds authored
Pull ceph updates from Ilya Dryomov: "A series from Xiubo that adds support for additional access checks based on MDS auth caps which were recently made available to clients. This is needed to prevent scenarios where the MDS quietly discards updates that a UID-restricted client previously (wrongfully) acked to the user. Other than that, just a documentation fixup" * tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client: doc: ceph: update userspace command to get CephFS metadata ceph: add CEPHFS_FEATURE_MDS_AUTH_CAPS_CHECK feature bit ceph: check the cephx mds auth access for async dirop ceph: check the cephx mds auth access for open ceph: check the cephx mds auth access for setattr ceph: add ceph_mds_check_access() helper ceph: save cap_auths in MDS client when session is opened
-
https://github.com/Paragon-Software-Group/linux-ntfs3Linus Torvalds authored
Pull ntfs3 updates from Konstantin Komarov: "Fixes: - reusing of the file index (could cause the file to be trimmed) - infinite dir enumeration - taking DOS names into account during link counting - le32_to_cpu conversion, 32 bit overflow, NULL check - some code was refactored Changes: - removed max link count info display during driver init Remove: - atomic_open has been removed for lack of use" * tag 'ntfs3_for_6.10' of https://github.com/Paragon-Software-Group/linux-ntfs3: fs/ntfs3: Break dir enumeration if directory contents error fs/ntfs3: Fix case when index is reused during tree transformation fs/ntfs3: Mark volume as dirty if xattr is broken fs/ntfs3: Always make file nonresident on fallocate call fs/ntfs3: Redesign ntfs_create_inode to return error code instead of inode fs/ntfs3: Use variable length array instead of fixed size fs/ntfs3: Use 64 bit variable to avoid 32 bit overflow fs/ntfs3: Check 'folio' pointer for NULL fs/ntfs3: Missed le32_to_cpu conversion fs/ntfs3: Remove max link count info display during driver init fs/ntfs3: Taking DOS names into account during link counting fs/ntfs3: remove atomic_open fs/ntfs3: use kcalloc() instead of kzalloc()
-