- 25 Jul, 2008 40 commits
-
-
Manfred Spraul authored
The attached patch: - reverses the locking order of ulp->lock and sem_lock: Previously, it was first ulp->lock, then inside sem_lock. Now it's the other way around. - converts the undo structure to rcu. Benefits: - With the old locking order, IPC_RMID could not kfree the undo structures. The stale entries remained in the linked lists and were released later. - The patch fixes a a race in semtimedop(): if both IPC_RMID and a semget() that recreates exactly the same id happen between find_alloc_undo() and sem_lock, then semtimedop() would access already kfree'd memory. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Reviewed-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Manfred Spraul authored
sem_array.sem_pending is a double linked list, the attached patch converts it to struct list_head. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Reviewed-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Manfred Spraul authored
sem_queue.sma and sem_queue.id were never used, the attached patch removes them. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Reviewed-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Manfred Spraul authored
The undo structures contain two linked lists, the attached patch replaces them with generic struct list_head lists. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nadia Derbey authored
Remove the ipc_lock_down() routines: they used to call idr_find() locklessly (given that the ipc ids lock was already held), so they are not needed anymore. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Jim Houston <jim.houston@comcast.net> Cc: Pierre Peiffer <peifferp@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nadia Derbey authored
Call idr_find() locklessly from ipc_lock(), since the idr tree is now RCU protected. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Jim Houston <jim.houston@comcast.net> Cc: Pierre Peiffer <peifferp@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nadia Derbey authored
Introduce the free_layer() routine: it is the one that actually frees memory after a grace period has elapsed. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Jim Houston <jim.houston@comcast.net> Cc: Pierre Peiffer <peifferp@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nadia Derbey authored
Make idr_find rcu-safe: it can now be called inside an rcu_read critical section. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Jim Houston <jim.houston@comcast.net> Cc: Pierre Peiffer <peifferp@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nadia Derbey authored
Make the idr_get_new* routines rcu-safe. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Jim Houston <jim.houston@comcast.net> Cc: Pierre Peiffer <peifferp@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nadia Derbey authored
Do some code factorization in the return code analysis. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Jim Houston <jim.houston@comcast.net> Cc: Pierre Peiffer <peifferp@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nadia Derbey authored
Fix the incomplete printk call. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Jim Houston <jim.houston@comcast.net> Cc: Pierre Peiffer <peifferp@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nadia Derbey authored
This is a trivial patch that renames: . alloc_layer to get_from_free_list since it idr_pre_get that actually allocates memory. . free_layer to move_to_free_list since memory is not actually freed there. This makes things more clear for the next patches. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Jim Houston <jim.houston@comcast.net> Cc: Pierre Peiffer <peifferp@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nadia Derbey authored
After scalability problems have been detected when using the sysV ipcs, I have proposed to use an RCU based implementation of the IDR api instead (see threads http://lkml.org/lkml/2008/4/11/212 and http://lkml.org/lkml/2008/4/29/295). This resulted in many people asking to convert the idr API and make it rcu safe (because most of the code was duplicated and thus unmaintanable and unreviewable). So here is a first attempt. The important change wrt to the idr API itself is during idr removes: idr layers are freed after a grace period, instead of being moved to the free list. The important change wrt to ipcs, is that idr_find() can now be called locklessly inside a rcu read critical section. Here are the results I've got for the pmsg test sent by Manfred: 2.6.25-rc3-mm1 2.6.25-rc3-mm1+ 2.6.25-mm1 Patched 2.6.25-mm1 1 1168441 1064021 876000 947488 2 1094264 921059 1549592 1730685 3 2082520 1738165 1694370 2324880 4 2079929 1695521 404553 2400408 5 2898758 406566 391283 3246580 6 2921417 261275 263249 3752148 7 3308761 126056 191742 4243142 8 3329456 100129 141722 4275780 1st column: stock 2.6.25-rc3-mm1 2nd column: 2.6.25-rc3-mm1 + ipc patches (store ipcs into idrs) 3nd column: stock 2.6.25-mm1 4th column: 2.6.25-mm1 + this pacth series. This patch: Add an rcu_head to the idr_layer structure in order to free it after a grace period. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Jim Houston <jim.houston@comcast.net> Cc: Pierre Peiffer <peifferp@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Chandru authored
kdump kernel fails to boot with calgary iommu and aacraid driver on a x366 box. The ongoing dma's of aacraid from the first kernel continue to exist until the driver is loaded in the kdump kernel. Calgary is initialized prior to aacraid and creation of new tce tables causes wrong dma's to occur. Here we try to get the tce tables of the first kernel in kdump kernel and use them. While in the kdump kernel we do not allocate new tce tables but instead read the base address register contents of calgary iommu and use the tables that the registers point to. With these changes the kdump kernel and hence aacraid now boots normally. Signed-off-by: Chandru Siddalingappa <chandru@in.ibm.com> Acked-by: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
The bug was pointed out by Akinobu Mita <akinobu.mita@gmail.com>, and this patch is based on his original patch. workqueue_cpu_callback(CPU_UP_PREPARE) expects that if it returns NOTIFY_BAD, _cpu_up() will send CPU_UP_CANCELED then. However, this is not true since "cpu hotplug: cpu: deliver CPU_UP_CANCELED only to NOTIFY_OKed callbacks with CPU_UP_PREPARE" commit: a0d8cdb6 The callback which has returned NOTIFY_BAD will not receive CPU_UP_CANCELED. Change the code to fulfil the CPU_UP_CANCELED logic if CPU_UP_PREPARE fails. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Reported-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
schedule_on_each_cpu() can use schedule_work_on() to avoid the code duplication. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
queue_work() can use queue_work_on() to avoid the code duplication. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Add lockdep annotations to flush_work() and update the comment. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Jarek Poplawski <jarkao2@o2.pl> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Now that it is safe to use get_online_cpus() we can revert [S390] cpu topology: Fix possible deadlock. commit: fd781fa2 and call arch_reinit_sched_domains() directly from topology_work_fn(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Gautham R Shenoy <ego@in.ibm.com> Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
workqueue_cpu_callback(CPU_DEAD) flushes cwq->thread under cpu_maps_update_begin(). This means that the multithreaded workqueues can't use get_online_cpus() due to the possible deadlock, very bad and very old problem. Introduce the new state, CPU_POST_DEAD, which is called after cpu_hotplug_done() but before cpu_maps_update_done(). Change workqueue_cpu_callback() to use CPU_POST_DEAD instead of CPU_DEAD. This means that create/destroy functions can't rely on get_online_cpus() any longer and should take cpu_add_remove_lock instead. [akpm@linux-foundation.org: fix CONFIG_SMP=n] Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Change schedule_on_each_cpu() to use flush_work() instead of flush_workqueue(), this way we don't wait for other work_struct's which can be queued meanwhile. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Jarek Poplawski <jarkao2@gmail.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Most of users of flush_workqueue() can be changed to use cancel_work_sync(), but sometimes we really need to wait for the completion and cancelling is not an option. schedule_on_each_cpu() is good example. Add the new helper, flush_work(work), which waits for the completion of the specific work_struct. More precisely, it "flushes" the result of of the last queue_work() which is visible to the caller. For example, this code queue_work(wq, work); /* WINDOW */ queue_work(wq, work); flush_work(work); doesn't necessary work "as expected". What can happen in the WINDOW above is - wq starts the execution of work->func() - the caller migrates to another CPU now, after the 2nd queue_work() this work is active on the previous CPU, and at the same time it is queued on another. In this case flush_work(work) may return before the first work->func() completes. It is trivial to add another helper int flush_work_sync(struct work_struct *work) { return flush_work(work) || wait_on_work(work); } which works "more correctly", but it has to iterate over all CPUs and thus it much slower than flush_work(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Max Krasnyansky <maxk@qualcomm.com> Acked-by: Jarek Poplawski <jarkao2@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
insert_work() inserts the new work_struct before or after cwq->worklist, depending on the "int tail" parameter. Change it to accept "list_head *" instead, this shrinks .text a bit and allows us to insert the barrier after specific work_struct. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Jarek Poplawski <jarkao2@gmail.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
I don't understand why the multi-thread coredump implies the core_uses_pid behaviour, but we shouldn't use mm->mm_users for that. This counter can be incremented by get_task_mm(). Use the valued returned by coredump_wait() instead. Also, remove the "const char *pattern" argument, format_corename() can use core_pattern directly. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Now that we have core_state->dumper list we can use it to wake up the sub-threads waiting for the coredump completion. This uglifies the code and .text grows by 47 bytes, but otoh mm_struct lessens by sizeof(struct completion). Also, with this change we can decouple exit_mm() from the coredumping code. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Kill the nasty rcu_read_lock() + do_each_thread() loop, use the list encoded in mm->core_state instead, s/GFP_ATOMIC/GFP_KERNEL/. This patch allows futher cleanups in binfmt_elf_fdpic.c. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Kill the nasty rcu_read_lock() + do_each_thread() loop, use the list encoded in mm->core_state instead, s/GFP_ATOMIC/GFP_KERNEL/. This patch allows futher cleanups in binfmt_elf.c, in particular we can kill the parallel info->threads list. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
binfmt->core_dump() has to iterate over the all threads in system in order to find the coredumping threads and construct the list using the GFP_ATOMIC allocations. With this patch each thread allocates the list node on exit_mm()'s stack and adds itself to the list. This allows us to do further changes: - simplify ->core_dump() - change exit_mm() to clear ->mm first, then wait for ->core_done. this makes the coredumping process visible to oom_kill - kill mm->core_done Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Move the "struct core_state core_state" from coredump_wait() to do_coredump(), this makes mm->core_state visible to binfmt->core_dump(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Turn core_state->nr_threads into atomic_t and kill now unneeded down_write(&mm->mmap_sem) in exit_mm(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Change zap_process() to return int instead of incrementing mm->core_state->nr_threads directly. Change zap_threads() to set mm->core_state only on success. This patch restores the original size of .text, and more importantly now ->nr_threads is used in two places only. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Move mm->core_waiters into "struct core_state" allocated on stack. This shrinks mm_struct a little bit and allows further changes. This patch mostly does s/core_waiters/core_state. The only essential change is that coredump_wait() must clear mm->core_state before return. The coredump_wait()'s path is uglified and .text grows by 30 bytes, this is fixed by the next patch. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
mm->core_startup_done points to "struct completion startup_done" allocated on the coredump_wait()'s stack. Introduce the new structure, core_state, which holds this "struct completion". This way we can add more info visible to the threads participating in coredump without enlarging mm_struct. No changes in affected .o files. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
linux_binfmt->core_dump() runs before the process does exit_aio(), this means that we can hit the kernel thread which shares the same ->mm. Afaics, nothing really bad can happen, but perhaps it makes sense to fix this minor bug. It is sad we have to iterate over all threads in system and use GFP_ATOMIC. Hopefully we can kill theses ugly do_each_thread()s, but this needs some nontrivial changes in mm_struct and do_coredump. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
The main loop in zap_threads() must skip kthreads which may use the same mm. Otherwise we "kill" this thread erroneously (for example, it can not fork or exec after that), and the coredumping task stucks in the TASK_UNINTERRUPTIBLE state forever because of the wrong ->core_waiters count. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Kill PF_BORROWED_MM. Change use_mm/unuse_mm to not play with ->flags, and do s/PF_BORROWED_MM/PF_KTHREAD/ for a couple of other users. No functional changes yet. But this allows us to do further fixes/cleanups. oom_kill/ptrace/etc often check "p->mm != NULL" to filter out the kthreads, this is wrong because of use_mm(). The problem with PF_BORROWED_MM is that we need task_lock() to avoid races. With this patch we can check PF_KTHREAD directly, or use a simple lockless helper: /* The result must not be dereferenced !!! */ struct mm_struct *__get_task_mm(struct task_struct *tsk) { if (tsk->flags & PF_KTHREAD) return NULL; return tsk->mm; } Note also ecard_task(). It runs with ->mm != NULL, but it's the kernel thread without PF_BORROWED_MM. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Introduce the new PF_KTHREAD flag to mark the kernel threads. It is set by INIT_TASK() and copied to the forked childs (we could set it in kthreadd() along with PF_NOFREEZE instead). daemonize() was changed as well. In that case testing of PF_KTHREAD is racy, but daemonize() is hopeless anyway. This flag is cleared in do_execve(), before search_binary_handler(). Probably not the best place, we can do this in exec_mmap() or in start_thread(), or clear it along with PF_FORKNOEXEC. But I think this doesn't matter in practice, and if do_execve() fails kthread should die soon. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
1. SIGKILL can't be blocked, remove this check from sigkill_pending(). 2. When ptrace_stop() sees sigkill_pending() == T, it can just return. Kill "int killed" and simplify the code. This also is more correct, the tracer shouldn't see us in TASK_TRACED if we are not going to stop. I strongly believe this code needs further changes. We should do the "was this task killed" check unconditionally, currently it depends on arch_ptrace_stop_needed(). On the other hand, sigkill_pending() isn't very clever. If the task was killed tkill(SIGKILL), the signal can be already dequeued if the caller is do_exit(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
ptrace_stop() has some complicated checks to prevent the scheduling in the TASK_TRACED state with the pending SIGKILL, but these checks are racy, and they depend on arch_ptrace_stop_needed(). This patch assumes that the traced task should die asap if it was killed by SIGKILL, in that case schedule()->signal_pending_state() has no reason to ignore the TASK_WAKEKILL part of TASK_TRACED, and we can kill this nasty special case. Note: do_exit()->ptrace_notify() is special, the killed task can already dequeue SIGKILL at this point. Another indication that fatal_signal_pending() is not exactly right. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Adrian Bunk authored
This patch contains the following cleanups for the asm/ptrace.h userspace headers: - include/asm-generic/Kbuild.asm already lists ptrace.h, remove the superfluous listings in the Kbuild files of the following architectures: - cris - frv - powerpc - x86 - don't expose function prototypes and macros to userspace: - arm - blackfin - cris - mn10300 - parisc - remove #ifdef CONFIG_'s around #define's: - blackfin - m68knommu - sh: AFAIK __SH5__ should work in both kernel and userspace, no need to leak CONFIG_SUPERH64 to userspace - xtensa: cosmetical change to remove empty #ifndef __ASSEMBLY__ #else #endif from the userspace headers Not changed by this patch is the fact that the following architectures have a different struct pt_regs depending on CONFIG_ variables: - h8300 - m68knommu - mips This does not work in userspace. Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: <linux-arch@vger.kernel.org> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Greg Ungerer <gerg@uclinux.org> Acked-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Grant Grundler <grundler@parisc-linux.org> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Acked-by: Chris Zankel <chris@zankel.net> Acked-by: David Howells <dhowells@redhat.com> Acked-by: Paul Mackerras <paulus@samba.org> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-