- 16 Dec, 2013 7 commits
-
-
Paul E. McKenney authored
RCU must ensure that there is the equivalent of a full memory barrier between any memory access preceding grace period and any memory access following that same grace period, regardless of which CPU(s) happen to execute the two memory accesses. Therefore, downgrading UNLOCK+LOCK to no longer imply a full memory barrier requires some adjustments to RCU. This commit therefore adds smp_mb__after_unlock_lock() invocations as needed after the RCU lock acquisitions that need to be part of a full-memory-barrier UNLOCK+LOCK. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <linux-arch@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1386799151-2219-7-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Paul E. McKenney authored
Historically, an UNLOCK+LOCK pair executed by one CPU, by one task, or on a given lock variable has implied a full memory barrier. In a recent LKML thread, the wisdom of this historical approach was called into question: http://www.spinics.net/lists/linux-mm/msg65653.html, in part due to the memory-order complexities of low-handoff-overhead queued locks on x86 systems. This patch therefore removes this guarantee from the documentation, and further documents how to restore it via a new smp_mb__after_unlock_lock() primitive. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <linux-arch@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1386799151-2219-6-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Paul E. McKenney authored
The Linux kernel has traditionally required that an UNLOCK+LOCK pair act as a full memory barrier when either (1) that UNLOCK+LOCK pair was executed by the same CPU or task, or (2) the same lock variable was used for the UNLOCK and LOCK. It now seems likely that very few places in the kernel rely on this full-memory-barrier semantic, and with the advent of queued locks, providing this semantic either requires complex reasoning, or for some architectures, added overhead. This commit therefore adds a smp_mb__after_unlock_lock(), which may be placed after a LOCK primitive to restore the full-memory-barrier semantic. All definitions are currently no-ops, but will be upgraded for some architectures when queued locks arrive. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <linux-arch@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1386799151-2219-5-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Paul E. McKenney authored
The situations in which ACCESS_ONCE() is required are not well documented, so this commit adds some verbiage to memory-barriers.txt. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <linux-arch@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1386799151-2219-4-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Peter Zijlstra authored
No SMP architecture currently supporting Linux allows speculative writes, so this commit updates Documentation/memory-barriers.txt to prohibit them in Linux core code. It also records restrictions on their use. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <linux-arch@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1386799151-2219-3-git-send-email-paulmck@linux.vnet.ibm.com [ Paul modified the original patch from Peter. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Paul E. McKenney authored
Although the atomic_long_t functions are quite useful, they are a bit obscure. This commit therefore adds the common ones alongside their atomic_t counterparts in Documentation/memory-barriers.txt. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <linux-arch@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1386799151-2219-2-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Paul E. McKenney authored
The Documentation/memory-barriers.txt file was written before the need for ACCESS_ONCE() was fully appreciated. It therefore contains no ACCESS_ONCE() calls, which can be a problem when people lift examples from it. This commit therefore adds ACCESS_ONCE() calls. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <linux-arch@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1386799151-2219-1-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
- 30 Nov, 2013 2 commits
-
-
Ingo Molnar authored
This reverts commit 9dd12201. Revert it until Linus's concerns are addressed: this option should not allow nonsensical CONFIG_CPUMASK_OFFSTACK and CONFIG_NR_CPUS values, and it should probably select sane defaults as well. Cc: Josh Boyer <jwboyer@fedoraproject.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-etcruvuw9neycYf0Rripxrjv@git.kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Henrik Austad authored
A strictly monotonically increasing sequence is normally used in numbered lists as they provide an intuitive ordering of the elements. However, in situations where race conditions can appear, this assumption breaks down and you can end up with unpredictable results, leading to a rather confusing list :-) This changes the numbered list 1,2,2,2 to the more intuitive 1,2,3,4. Introduced in: 2eec9ad9 [PATCH] lightweight robust futexes: docs Signed-off-by: Henrik Austad <henrik@austad.us> Link: http://lkml.kernel.org/r/1385590680-8110-1-git-send-email-henrik@austad.usSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
- 27 Nov, 2013 11 commits
-
-
Sasha Levin authored
Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/1371163284-6346-10-git-send-email-sasha.levin@oracle.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Sasha Levin authored
This is a simple wrapper to make using liblockdep on existing applications much easier. After running 'make && make install', it becomes quite simple to test things with liblockdep. For example, to try it on perf: lockdep perf No other integration required. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/1371163284-6346-9-git-send-email-sasha.levin@oracle.com [ Changed it to load ./liblockdep.so, so it can be tested in situ. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Sasha Levin authored
This allows lockdep to be used without being compiled in the original program. Usage is quite simple: LD_PRELOAD=/path/to/liblockdep.so /path/to/my/program And magically, you'll have lockdep checking in your program! Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/1371163284-6346-8-git-send-email-sasha.levin@oracle.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Sasha Levin authored
A simple test to make sure we handle rwlocks correctly. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/1371163284-6346-7-git-send-email-sasha.levin@oracle.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Sasha Levin authored
Both pthreads and lockdep support dealing with rwlocks, so here's the liblockdep implementation for those. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/1371163284-6346-6-git-send-email-sasha.levin@oracle.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Sasha Levin authored
This is a rather simple and basic test suite to test common locking issues. Beyond tests, it also shows how to use the library. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/1371163284-6346-5-git-send-email-sasha.levin@oracle.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Sasha Levin authored
These headers provide the same API as their pthread mutex counterparts. The design here is to allow to easily switch to liblockdep lock validation just by adding a "liblockdep_" to pthread_mutex_*() calls, which means that it's easy to integrate liblockdep into existing codebases. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/1371163284-6346-4-git-send-email-sasha.levin@oracle.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Sasha Levin authored
kernel/locking/lockdep.c deals with validating locking scenarios for various architectures supported by the kernel. There isn't anything kernel specific going on in lockdep, and when we compare userspace to other architectures that don't have to deal with irqs such as s390, they become all too similar. We wrap kernel/locking/lockdep.c and include/linux/lockdep.h with several headers which allow us to build and use lockdep from userspace. We don't touch the kernel code itself which means that any work done on lockdep in the kernel will automatically benefit userspace lockdep as well! Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/1371163284-6346-3-git-send-email-sasha.levin@oracle.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Sasha Levin authored
Lockdep is an awesome piece of code which detects locking issues which are relevant both to userspace and kernelspace. We can easily make lockdep work in userspace since there is really no kernel spacific magic going on in the code. All we need is to wrap two functions which are used by lockdep and are very kernel specific. Doing that will allow tools located in tools/ to easily utilize lockdep's code for their own use. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: penberg@kernel.org Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/1352753446-24109-1-git-send-email-sasha.levin@oracle.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Frederic Weisbecker authored
Instead of saving the hardirq state on a per CPU variable, which require an explicit call before the softirq handling and some complication, just save and restore the hardirq tracing state through functions return values and parameters. It simplifies a bit the black magic that works around the fact that softirqs can be called from hardirqs while hardirqs can nest on softirqs but those two cases have very different semantics and only the latter case assume both states. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1384906054-30676-1-git-send-email-fweisbec@gmail.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
Prepare for dependent patch. Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 22 Nov, 2013 1 commit
-
-
Ingo Molnar authored
This typo has been there forever, it is 7.5 years old, looks like this section of our memory ordering documentation is a place where most eyes are glazed over already ;-) [ Also fix some stray spaces and stray tabs while at it, shrinking the file by 49 bytes. Visual output unchanged. ] Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-gncea9cb8igosblizfqMXrie@git.kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
- 20 Nov, 2013 1 commit
-
-
Mahesh Rajashekhara authored
It appears that driver runs into a problem here if fibsize is too small because we allocate user_srbcmd with fibsize size only but later we access it until user_srbcmd->sg.count to copy it over to srbcmd. It is not correct to test (fibsize < sizeof(*user_srbcmd)) because this structure already includes one sg element and this is not needed for commands without data. So, we would recommend to add the following (instead of test for fibsize == 0). Signed-off-by: Mahesh Rajashekhara <Mahesh.Rajashekhara@pmcs.com> Reported-by: Nico Golde <nico@ngolde.de> Reported-by: Fabian Yamaguchi <fabs@goesec.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 19 Nov, 2013 18 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds authored
Pull networking fixes from David Miller: "Mostly these are fixes for fallout due to merge window changes, as well as cures for problems that have been with us for a much longer period of time" 1) Johannes Berg noticed two major deficiencies in our genetlink registration. Some genetlink protocols we passing in constant counts for their ops array rather than something like ARRAY_SIZE(ops) or similar. Also, some genetlink protocols were using fixed IDs for their multicast groups. We have to retain these fixed IDs to keep existing userland tools working, but reserve them so that other multicast groups used by other protocols can not possibly conflict. In dealing with these two problems, we actually now use less state management for genetlink operations and multicast groups. 2) When configuring interface hardware timestamping, fix several drivers that simply do not validate that the hwtstamp_config value is one the driver actually supports. From Ben Hutchings. 3) Invalid memory references in mwifiex driver, from Amitkumar Karwar. 4) In dev_forward_skb(), set the skb->protocol in the right order relative to skb_scrub_packet(). From Alexei Starovoitov. 5) Bridge erroneously fails to use the proper wrapper functions to make calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid. Fix from Toshiaki Makita. 6) When detaching a bridge port, make sure to flush all VLAN IDs to prevent them from leaking, also from Toshiaki Makita. 7) Put in a compromise for TCP Small Queues so that deep queued devices that delay TX reclaim non-trivially don't have such a performance decrease. One particularly problematic area is 802.11 AMPDU in wireless. From Eric Dumazet. 8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts here. Fix from Eric Dumzaet, reported by Dave Jones. 9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn. 10) When computing mergeable buffer sizes, virtio-net fails to take the virtio-net header into account. From Michael Dalton. 11) Fix seqlock deadlock in ip4_datagram_connect() wrt. statistic bumping, this one has been with us for a while. From Eric Dumazet. 12) Fix NULL deref in the new TIPC fragmentation handling, from Erik Hugne. 13) 6lowpan bit used for traffic classification was wrong, from Jukka Rissanen. 14) macvlan has the same issue as normal vlans did wrt. propagating LRO disabling down to the real device, fix it the same way. From Michal Kubecek. 15) CPSW driver needs to soft reset all slaves during suspend, from Daniel Mack. 16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet. 17) The xen-netfront RX buffer refill timer isn't properly scheduled on partial RX allocation success, from Ma JieYue. 18) When ipv6 ping protocol support was added, the AF_INET6 protocol initialization cleanup path on failure was borked a little. Fix from Vlad Yasevich. 19) If a socket disconnects during a read/recvmsg/recvfrom/etc that blocks we can do the wrong thing with the msg_name we write back to userspace. From Hannes Frederic Sowa. There is another fix in the works from Hannes which will prevent future problems of this nature. 20) Fix route leak in VTI tunnel transmit, from Fan Du. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits) genetlink: make multicast groups const, prevent abuse genetlink: pass family to functions using groups genetlink: add and use genl_set_err() genetlink: remove family pointer from genl_multicast_group genetlink: remove genl_unregister_mc_group() hsr: don't call genl_unregister_mc_group() quota/genetlink: use proper genetlink multicast APIs drop_monitor/genetlink: use proper genetlink multicast APIs genetlink: only pass array to genl_register_family_with_ops() tcp: don't update snd_nxt, when a socket is switched from repair mode atm: idt77252: fix dev refcnt leak xfrm: Release dst if this dst is improper for vti tunnel netlink: fix documentation typo in netlink_set_err() be2net: Delete secondary unicast MAC addresses during be_close be2net: Fix unconditional enabling of Rx interface options net, virtio_net: replace the magic value ping: prevent NULL pointer dereference on write to msg_name bnx2x: Prevent "timeout waiting for state X" bnx2x: prevent CFC attention bnx2x: Prevent panic during DMAE timeout ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds authored
Pull sparc fixes from David Miller: "Two merge window fallout build fixes" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: merge fix sparc64: fix build regession
-
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linuxLinus Torvalds authored
Pull ia64 fix from Tony Luck: "Unbreak ia64 build by avoiding circular dependency" * tag 'please-pull-fixia64' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: kernel/bounds: avoid circular dependencies in generated headers
-
Kirill A. Shutemov authored
<linux/spinlock.h> has heavy dependencies on other header files. It triggers circular dependencies in generated headers on IA64, at least: CC kernel/bounds.s In file included from /home/space/kas/git/public/linux/arch/ia64/include/asm/thread_info.h:9:0, from include/linux/thread_info.h:54, from include/asm-generic/preempt.h:4, from arch/ia64/include/generated/asm/preempt.h:1, from include/linux/preempt.h:18, from include/linux/spinlock.h:50, from kernel/bounds.c:14: /home/space/kas/git/public/linux/arch/ia64/include/asm/asm-offsets.h:1:35: fatal error: generated/asm-offsets.h: No such file or directory compilation terminated. Let's replace <linux/spinlock.h> with <linux/spinlock_types.h>, it's enough to find out size of spinlock_t. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-and-Tested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
-
David S. Miller authored
Johannes Berg says: ==================== genetlink: clean up multicast group APIs The generic netlink multicast group registration doesn't have to be dynamic, and can thus be simplified just like I did with the ops. This removes some complexity in registration code. Additionally, two users of generic netlink already use multicast groups in a wrong way, add workarounds for those two to keep the userspace API working, but at the same time make them not clash with other users of multicast groups as might happen now. While making it all a bit easier, also prevent such abuse by adding checks to the APIs so each family can only use the groups it owns. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
Register generic netlink multicast groups as an array with the family and give them contiguous group IDs. Then instead of passing the global group ID to the various functions that send messages, pass the ID relative to the family - for most families that's just 0 because the only have one group. This avoids the list_head and ID in each group, adding a new field for the mcast group ID offset to the family. At the same time, this allows us to prevent abusing groups again like the quota and dropmon code did, since we can now check that a family only uses a group it owns. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
This doesn't really change anything, but prepares for the next patch that will change the APIs to pass the group ID within the family, rather than the global group ID. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
Add a static inline to generic netlink to wrap netlink_set_err() to make it easier to use here - use it in openvswitch (the only generic netlink user of netlink_set_err()). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
There's no reason to have the family pointer there since it can just be passed internally where needed, so remove it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
There are no users of this API remaining, and we'll soon change group registration to be static (like ops are now) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
There's no need to unregister the multicast group if the generic netlink family is registered immediately after. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
The quota code is abusing the genetlink API and is using its family ID as the multicast group ID, which is invalid and may belong to somebody else (and likely will.) Make the quota code use the correct API, but since this is already used as-is by userspace, reserve a family ID for this code and also reserve that group ID to not break userspace assumptions. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
The drop monitor code is abusing the genetlink API and is statically using the generic netlink multicast group 1, even if that group belongs to somebody else (which it invariably will, since it's not reserved.) Make the drop monitor code use the proper APIs to reserve a group ID, but also reserve the group id 1 in generic netlink code to preserve the userspace API. Since drop monitor can be a module, don't clear the bit for it on unregistration. Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
As suggested by David Miller, make genl_register_family_with_ops() a macro and pass only the array, evaluating ARRAY_SIZE() in the macro, this is a little safer. The openvswitch has some indirection, assing ops/n_ops directly in that code. This might ultimately just assign the pointers in the family initializations, saving the struct genl_family_and_ops and code (once mcast groups are handled differently.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrey Vagin authored
snd_nxt must be updated synchronously with sk_send_head. Otherwise tp->packets_out may be updated incorrectly, what may bring a kernel panic. Here is a kernel panic from my host. [ 103.043194] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 [ 103.044025] IP: [<ffffffff815aaaaf>] tcp_rearm_rto+0xcf/0x150 ... [ 146.301158] Call Trace: [ 146.301158] [<ffffffff815ab7f0>] tcp_ack+0xcc0/0x12c0 Before this panic a tcp socket was restored. This socket had sent and unsent data in the write queue. Sent data was restored in repair mode, then the socket was switched from reapair mode and unsent data was restored. After that the socket was switched back into repair mode. In that moment we had a socket where write queue looks like this: snd_una snd_nxt write_seq |_________|________| | sk_send_head After a second switching from repair mode the state of socket was changed: snd_una snd_nxt, write_seq |_________ ________| | sk_send_head This state is inconsistent, because snd_nxt and sk_send_head are not synchronized. Bellow you can find a call trace, how packets_out can be incremented twice for one skb, if snd_nxt and sk_send_head are not synchronized. In this case packets_out will be always positive, even when sk_write_queue is empty. tcp_write_wakeup skb = tcp_send_head(sk); tcp_fragment if (!before(tp->snd_nxt, TCP_SKB_CB(buff)->end_seq)) tcp_adjust_pcount(sk, skb, diff); tcp_event_new_data_sent tp->packets_out += tcp_skb_pcount(skb); I think update of snd_nxt isn't required, when a socket is switched from repair mode. Because it's initialized in tcp_connect_init. Then when a write queue is restored, snd_nxt is incremented in tcp_event_new_data_sent, so it's always is in consistent state. I have checked, that the bug is not reproduced with this patch and all tests about restoring tcp connections work fine. Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrey Vagin <avagin@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ying Xue authored
init_card() calls dev_get_by_name() to get a network deceive. But it doesn't decrease network device reference count after the device is used. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
fan.du authored
After searching rt by the vti tunnel dst/src parameter, if this rt has neither attached to any transformation nor the transformation is not tunnel oriented, this rt should be released back to ip layer. otherwise causing dst memory leakage. Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
The parameter is just 'group', not 'groups', fix the documentation typo. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-