Commits · 01352fb81658cbf78c55844de8e3d1d606bbf3f8 · Kirill Smelkov / linux

16 Dec, 2013 5 commits

locking: Add an smp_mb__after_unlock_lock() for UNLOCK+BLOCK barrier · 01352fb8

Paul E. McKenney authored Dec 11, 2013

The Linux kernel has traditionally required that an UNLOCK+LOCK
pair act as a full memory barrier when either (1) that
UNLOCK+LOCK pair was executed by the same CPU or task, or (2)
the same lock variable was used for the UNLOCK and LOCK.  It now
seems likely that very few places in the kernel rely on this
full-memory-barrier semantic, and with the advent of queued
locks, providing this semantic either requires complex
reasoning, or for some architectures, added overhead.

This commit therefore adds a smp_mb__after_unlock_lock(), which
may be placed after a LOCK primitive to restore the
full-memory-barrier semantic. All definitions are currently
no-ops, but will be upgraded for some architectures when queued
locks arrive.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <linux-arch@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1386799151-2219-5-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

01352fb8

Documentation/memory-barriers.txt: Document ACCESS_ONCE() · 692118da

Paul E. McKenney authored Dec 11, 2013

The situations in which ACCESS_ONCE() is required are not well
documented, so this commit adds some verbiage to
memory-barriers.txt.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <linux-arch@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1386799151-2219-4-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

692118da

Documentation/memory-barriers.txt: Prohibit speculative writes · 18c03c61

Peter Zijlstra authored Dec 11, 2013

No SMP architecture currently supporting Linux allows
speculative writes, so this commit updates
Documentation/memory-barriers.txt to prohibit them in Linux core
code.  It also records restrictions on their use.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <linux-arch@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1386799151-2219-3-git-send-email-paulmck@linux.vnet.ibm.com
[ Paul modified the original patch from Peter. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

18c03c61

Documentation/memory-barriers.txt: Add long atomic examples to memory-barriers.txt · fb2b5819

Paul E. McKenney authored Dec 11, 2013

Although the atomic_long_t functions are quite useful, they are
a bit obscure.  This commit therefore adds the common ones
alongside their atomic_t counterparts in
Documentation/memory-barriers.txt.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <linux-arch@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1386799151-2219-2-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

fb2b5819

Documentation/memory-barriers.txt: Add needed ACCESS_ONCE() calls to memory-barriers.txt · 2ecf8101

Paul E. McKenney authored Dec 11, 2013

The Documentation/memory-barriers.txt file was written before
the need for ACCESS_ONCE() was fully appreciated.  It therefore
contains no ACCESS_ONCE() calls, which can be a problem when
people lift examples from it.  This commit therefore adds
ACCESS_ONCE() calls.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <linux-arch@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1386799151-2219-1-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

2ecf8101

30 Nov, 2013 2 commits

Revert "smp/cpumask: Make CONFIG_CPUMASK_OFFSTACK=y usable without debug dependency" · 962d9c57

Ingo Molnar authored Nov 30, 2013

This reverts commit 9dd12201.

Revert it until Linus's concerns are addressed: this option should not
allow nonsensical CONFIG_CPUMASK_OFFSTACK and CONFIG_NR_CPUS values, and
it should probably select sane defaults as well.

Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-etcruvuw9neycYf0Rripxrjv@git.kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>

962d9c57

Documentation/robust-futex-API: Count properly to 4 · 854ff82a

Henrik Austad authored Nov 27, 2013

A strictly monotonically increasing sequence is normally used in
numbered lists as they provide an intuitive ordering of the
elements.

However, in situations where race conditions can appear, this
assumption breaks down and you can end up with unpredictable
results, leading to a rather confusing list :-)

This changes the numbered list 1,2,2,2 to the more intuitive
1,2,3,4.

Introduced in:

  2eec9ad9 [PATCH] lightweight robust futexes: docs
Signed-off-by: Henrik Austad <henrik@austad.us>
Link: http://lkml.kernel.org/r/1385590680-8110-1-git-send-email-henrik@austad.usSigned-off-by: Ingo Molnar <mingo@kernel.org>

854ff82a

27 Nov, 2013 11 commits

liblockdep: Add a MAINTAINERS entry · 1acd437c

Sasha Levin authored Jun 13, 2013

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/1371163284-6346-10-git-send-email-sasha.levin@oracle.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

1acd437c

liblockdep: Add the 'lockdep' user-space utility · f612ac05

Sasha Levin authored Jun 13, 2013

This is a simple wrapper to make using liblockdep on existing
applications much easier.

After running 'make && make install', it becomes quite simple to
test things with liblockdep. For example, to try it on perf:

	lockdep perf

No other integration required.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/1371163284-6346-9-git-send-email-sasha.levin@oracle.com
[ Changed it to load ./liblockdep.so, so it can be tested in situ. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

f612ac05

liblockdep: Support using LD_PRELOAD · 231941ee

Sasha Levin authored Jun 13, 2013

This allows lockdep to be used without being compiled in the
original program.

Usage is quite simple:

	LD_PRELOAD=/path/to/liblockdep.so /path/to/my/program

And magically, you'll have lockdep checking in your program!
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/1371163284-6346-8-git-send-email-sasha.levin@oracle.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

231941ee

liblockdep: Add pthread_rwlock_t test suite · dbe94182

Sasha Levin authored Jun 13, 2013

A simple test to make sure we handle rwlocks correctly.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/1371163284-6346-7-git-send-email-sasha.levin@oracle.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

dbe94182

liblockdep: Add public headers for pthread_rwlock_t implementation · 5a52c9b4

Sasha Levin authored Jun 13, 2013

Both pthreads and lockdep support dealing with rwlocks, so
here's the liblockdep implementation for those.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/1371163284-6346-6-git-send-email-sasha.levin@oracle.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

5a52c9b4

liblockdep: Add pthread_mutex_t test suite · 878f968e

Sasha Levin authored Jun 13, 2013

This is a rather simple and basic test suite to test common
locking issues.

Beyond tests, it also shows how to use the library.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/1371163284-6346-5-git-send-email-sasha.levin@oracle.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

878f968e

liblockdep: Add public headers for pthread_mutex_t implementation · 45e62074

Sasha Levin authored Jun 13, 2013

These headers provide the same API as their pthread mutex
counterparts.

The design here is to allow to easily switch to liblockdep lock
validation just by adding a "liblockdep_" to pthread_mutex_*()
calls, which means that it's easy to integrate liblockdep into
existing codebases.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/1371163284-6346-4-git-send-email-sasha.levin@oracle.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

45e62074

liblockdep: Wrap kernel/locking/lockdep.c to allow usage from userspace · 5634bd7d

Sasha Levin authored Jun 13, 2013

kernel/locking/lockdep.c deals with validating locking scenarios for
various architectures supported by the kernel. There isn't
anything kernel specific going on in lockdep, and when we
compare userspace to other architectures that don't have to deal
with irqs such as s390, they become all too similar.

We wrap kernel/locking/lockdep.c and include/linux/lockdep.h with
several headers which allow us to build and use lockdep from
userspace. We don't touch the kernel code itself which means
that any work done on lockdep in the kernel will automatically
benefit userspace lockdep as well!
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/1371163284-6346-3-git-send-email-sasha.levin@oracle.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

5634bd7d

lockdep: Be nice about building from userspace · 8dce7a9a

Sasha Levin authored Jun 13, 2013

Lockdep is an awesome piece of code which detects locking issues
which are relevant both to userspace and kernelspace. We can
easily make lockdep work in userspace since there is really no
kernel spacific magic going on in the code.

All we need is to wrap two functions which are used by lockdep
and are very kernel specific.

Doing that will allow tools located in tools/ to easily utilize
lockdep's code for their own use.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: penberg@kernel.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/1352753446-24109-1-git-send-email-sasha.levin@oracle.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

8dce7a9a

lockdep: Simplify a bit hardirq <-> softirq transitions · 5c4853b6

Frederic Weisbecker authored Nov 20, 2013

Instead of saving the hardirq state on a per CPU variable, which require
an explicit call before the softirq handling and some complication,
just save and restore the hardirq tracing state through functions
return values and parameters.

It simplifies a bit the black magic that works around the fact that
softirqs can be called from hardirqs while hardirqs can nest on softirqs
but those two cases have very different semantics and only the latter
case assume both states.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1384906054-30676-1-git-send-email-fweisbec@gmail.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

5c4853b6

Merge branch 'core/urgent' into core/locking · 7d5b1583
Ingo Molnar authored Nov 27, 2013
```
Prepare for dependent patch.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
```
7d5b1583

22 Nov, 2013 1 commit

Documentation/memory-barriers.txt: Fix a typo in the data dependency description · e0edc78f

Ingo Molnar authored Nov 22, 2013

This typo has been there forever, it is 7.5 years old, looks like this
section of our memory ordering documentation is a place where most eyes
are glazed over already ;-)

[ Also fix some stray spaces and stray tabs while at it, shrinking the
  file by 49 bytes. Visual output unchanged. ]

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-gncea9cb8igosblizfqMXrie@git.kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>

e0edc78f

20 Nov, 2013 1 commit

aacraid: prevent invalid pointer dereference · b4789b8e

Mahesh Rajashekhara authored Oct 31, 2013

It appears that driver runs into a problem here if fibsize is too small
because we allocate user_srbcmd with fibsize size only but later we
access it until user_srbcmd->sg.count to copy it over to srbcmd.

It is not correct to test (fibsize < sizeof(*user_srbcmd)) because this
structure already includes one sg element and this is not needed for
commands without data.  So, we would recommend to add the following
(instead of test for fibsize == 0).
Signed-off-by: Mahesh Rajashekhara <Mahesh.Rajashekhara@pmcs.com>
Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b4789b8e

19 Nov, 2013 20 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 1ee2dcc2

Linus Torvalds authored Nov 19, 2013

Pull networking fixes from David Miller:
 "Mostly these are fixes for fallout due to merge window changes, as
  well as cures for problems that have been with us for a much longer
  period of time"

 1) Johannes Berg noticed two major deficiencies in our genetlink
    registration.  Some genetlink protocols we passing in constant
    counts for their ops array rather than something like
    ARRAY_SIZE(ops) or similar.  Also, some genetlink protocols were
    using fixed IDs for their multicast groups.

    We have to retain these fixed IDs to keep existing userland tools
    working, but reserve them so that other multicast groups used by
    other protocols can not possibly conflict.

    In dealing with these two problems, we actually now use less state
    management for genetlink operations and multicast groups.

 2) When configuring interface hardware timestamping, fix several
    drivers that simply do not validate that the hwtstamp_config value
    is one the driver actually supports.  From Ben Hutchings.

 3) Invalid memory references in mwifiex driver, from Amitkumar Karwar.

 4) In dev_forward_skb(), set the skb->protocol in the right order
    relative to skb_scrub_packet().  From Alexei Starovoitov.

 5) Bridge erroneously fails to use the proper wrapper functions to make
    calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid.  Fix from Toshiaki
    Makita.

 6) When detaching a bridge port, make sure to flush all VLAN IDs to
    prevent them from leaking, also from Toshiaki Makita.

 7) Put in a compromise for TCP Small Queues so that deep queued devices
    that delay TX reclaim non-trivially don't have such a performance
    decrease.  One particularly problematic area is 802.11 AMPDU in
    wireless.  From Eric Dumazet.

 8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts
    here.  Fix from Eric Dumzaet, reported by Dave Jones.

 9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn.

10) When computing mergeable buffer sizes, virtio-net fails to take the
    virtio-net header into account.  From Michael Dalton.

11) Fix seqlock deadlock in ip4_datagram_connect() wrt.  statistic
    bumping, this one has been with us for a while.  From Eric Dumazet.

12) Fix NULL deref in the new TIPC fragmentation handling, from Erik
    Hugne.

13) 6lowpan bit used for traffic classification was wrong, from Jukka
    Rissanen.

14) macvlan has the same issue as normal vlans did wrt.  propagating LRO
    disabling down to the real device, fix it the same way.  From Michal
    Kubecek.

15) CPSW driver needs to soft reset all slaves during suspend, from
    Daniel Mack.

16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet.

17) The xen-netfront RX buffer refill timer isn't properly scheduled on
    partial RX allocation success, from Ma JieYue.

18) When ipv6 ping protocol support was added, the AF_INET6 protocol
    initialization cleanup path on failure was borked a little.  Fix
    from Vlad Yasevich.

19) If a socket disconnects during a read/recvmsg/recvfrom/etc that
    blocks we can do the wrong thing with the msg_name we write back to
    userspace.  From Hannes Frederic Sowa.  There is another fix in the
    works from Hannes which will prevent future problems of this nature.

20) Fix route leak in VTI tunnel transmit, from Fan Du.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
  genetlink: make multicast groups const, prevent abuse
  genetlink: pass family to functions using groups
  genetlink: add and use genl_set_err()
  genetlink: remove family pointer from genl_multicast_group
  genetlink: remove genl_unregister_mc_group()
  hsr: don't call genl_unregister_mc_group()
  quota/genetlink: use proper genetlink multicast APIs
  drop_monitor/genetlink: use proper genetlink multicast APIs
  genetlink: only pass array to genl_register_family_with_ops()
  tcp: don't update snd_nxt, when a socket is switched from repair mode
  atm: idt77252: fix dev refcnt leak
  xfrm: Release dst if this dst is improper for vti tunnel
  netlink: fix documentation typo in netlink_set_err()
  be2net: Delete secondary unicast MAC addresses during be_close
  be2net: Fix unconditional enabling of Rx interface options
  net, virtio_net: replace the magic value
  ping: prevent NULL pointer dereference on write to msg_name
  bnx2x: Prevent "timeout waiting for state X"
  bnx2x: prevent CFC attention
  bnx2x: Prevent panic during DMAE timeout
  ...

1ee2dcc2

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 4457e6f6

Linus Torvalds authored Nov 19, 2013

Pull sparc fixes from David Miller:
 "Two merge window fallout build fixes"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: merge fix
  sparc64: fix build regession

4457e6f6

Merge tag 'please-pull-fixia64' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux · e87e7be9

Linus Torvalds authored Nov 19, 2013

Pull ia64 fix from Tony Luck:
 "Unbreak ia64 build by avoiding circular dependency"

* tag 'please-pull-fixia64' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  kernel/bounds: avoid circular dependencies in generated headers

e87e7be9

kernel/bounds: avoid circular dependencies in generated headers · 24b9fdc5

Kirill A. Shutemov authored Nov 18, 2013

<linux/spinlock.h> has heavy dependencies on other header files.
It triggers circular dependencies in generated headers on IA64, at
least:

  CC      kernel/bounds.s
In file included from /home/space/kas/git/public/linux/arch/ia64/include/asm/thread_info.h:9:0,
                 from include/linux/thread_info.h:54,
                 from include/asm-generic/preempt.h:4,
                 from arch/ia64/include/generated/asm/preempt.h:1,
                 from include/linux/preempt.h:18,
                 from include/linux/spinlock.h:50,
                 from kernel/bounds.c:14:
/home/space/kas/git/public/linux/arch/ia64/include/asm/asm-offsets.h:1:35: fatal error: generated/asm-offsets.h: No such file or directory
compilation terminated.

Let's replace <linux/spinlock.h> with <linux/spinlock_types.h>, it's
enough to find out size of spinlock_t.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-and-Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

24b9fdc5

Merge branch 'genetlink_mcast' · 091e0662

David S. Miller authored Nov 19, 2013

Johannes Berg says:

====================
genetlink: clean up multicast group APIs

The generic netlink multicast group registration doesn't have to
be dynamic, and can thus be simplified just like I did with the
ops. This removes some complexity in registration code.

Additionally, two users of generic netlink already use multicast
groups in a wrong way, add workarounds for those two to keep the
userspace API working, but at the same time make them not clash
with other users of multicast groups as might happen now.

While making it all a bit easier, also prevent such abuse by adding
checks to the APIs so each family can only use the groups it owns.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

091e0662

genetlink: make multicast groups const, prevent abuse · 2a94fe48

Johannes Berg authored Nov 19, 2013

Register generic netlink multicast groups as an array with
the family and give them contiguous group IDs. Then instead
of passing the global group ID to the various functions that
send messages, pass the ID relative to the family - for most
families that's just 0 because the only have one group.

This avoids the list_head and ID in each group, adding a new
field for the mcast group ID offset to the family.

At the same time, this allows us to prevent abusing groups
again like the quota and dropmon code did, since we can now
check that a family only uses a group it owns.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2a94fe48

genetlink: pass family to functions using groups · 68eb5503

Johannes Berg authored Nov 19, 2013

This doesn't really change anything, but prepares for the
next patch that will change the APIs to pass the group ID
within the family, rather than the global group ID.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

68eb5503

genetlink: add and use genl_set_err() · 62b68e99

Johannes Berg authored Nov 19, 2013

Add a static inline to generic netlink to wrap netlink_set_err()
to make it easier to use here - use it in openvswitch (the only
generic netlink user of netlink_set_err()).
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

62b68e99

genetlink: remove family pointer from genl_multicast_group · c2ebb908

Johannes Berg authored Nov 19, 2013

There's no reason to have the family pointer there since it
can just be passed internally where needed, so remove it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2ebb908

genetlink: remove genl_unregister_mc_group() · 06fb555a

Johannes Berg authored Nov 19, 2013

There are no users of this API remaining, and we'll soon
change group registration to be static (like ops are now)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06fb555a

hsr: don't call genl_unregister_mc_group() · 03ed3827

Johannes Berg authored Nov 19, 2013

There's no need to unregister the multicast group if the
generic netlink family is registered immediately after.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

03ed3827

quota/genetlink: use proper genetlink multicast APIs · 2ecf7536

Johannes Berg authored Nov 19, 2013

The quota code is abusing the genetlink API and is using
its family ID as the multicast group ID, which is invalid
and may belong to somebody else (and likely will.)

Make the quota code use the correct API, but since this
is already used as-is by userspace, reserve a family ID
for this code and also reserve that group ID to not break
userspace assumptions.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2ecf7536

drop_monitor/genetlink: use proper genetlink multicast APIs · e5dcecba

Johannes Berg authored Nov 19, 2013

The drop monitor code is abusing the genetlink API and is
statically using the generic netlink multicast group 1, even
if that group belongs to somebody else (which it invariably
will, since it's not reserved.)

Make the drop monitor code use the proper APIs to reserve a
group ID, but also reserve the group id 1 in generic netlink
code to preserve the userspace API. Since drop monitor can
be a module, don't clear the bit for it on unregistration.
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e5dcecba

genetlink: only pass array to genl_register_family_with_ops() · c53ed742

Johannes Berg authored Nov 19, 2013

As suggested by David Miller, make genl_register_family_with_ops()
a macro and pass only the array, evaluating ARRAY_SIZE() in the
macro, this is a little safer.

The openvswitch has some indirection, assing ops/n_ops directly in
that code. This might ultimately just assign the pointers in the
family initializations, saving the struct genl_family_and_ops and
code (once mcast groups are handled differently.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c53ed742

tcp: don't update snd_nxt, when a socket is switched from repair mode · dbde4979

Andrey Vagin authored Nov 19, 2013

snd_nxt must be updated synchronously with sk_send_head.  Otherwise
tp->packets_out may be updated incorrectly, what may bring a kernel panic.

Here is a kernel panic from my host.
[  103.043194] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[  103.044025] IP: [<ffffffff815aaaaf>] tcp_rearm_rto+0xcf/0x150
...
[  146.301158] Call Trace:
[  146.301158]  [<ffffffff815ab7f0>] tcp_ack+0xcc0/0x12c0

Before this panic a tcp socket was restored. This socket had sent and
unsent data in the write queue. Sent data was restored in repair mode,
then the socket was switched from reapair mode and unsent data was
restored. After that the socket was switched back into repair mode.

In that moment we had a socket where write queue looks like this:
snd_una    snd_nxt   write_seq
   |_________|________|
             |
	  sk_send_head

After a second switching from repair mode the state of socket was
changed:

snd_una          snd_nxt, write_seq
   |_________ ________|
             |
	  sk_send_head

This state is inconsistent, because snd_nxt and sk_send_head are not
synchronized.

Bellow you can find a call trace, how packets_out can be incremented
twice for one skb, if snd_nxt and sk_send_head are not synchronized.
In this case packets_out will be always positive, even when
sk_write_queue is empty.

tcp_write_wakeup
	skb = tcp_send_head(sk);
	tcp_fragment
		if (!before(tp->snd_nxt, TCP_SKB_CB(buff)->end_seq))
			tcp_adjust_pcount(sk, skb, diff);
	tcp_event_new_data_sent
		tp->packets_out += tcp_skb_pcount(skb);

I think update of snd_nxt isn't required, when a socket is switched from
repair mode.  Because it's initialized in tcp_connect_init. Then when a
write queue is restored, snd_nxt is incremented in tcp_event_new_data_sent,
so it's always is in consistent state.

I have checked, that the bug is not reproduced with this patch and
all tests about restoring tcp connections work fine.

Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dbde4979

atm: idt77252: fix dev refcnt leak · b5de4a22

Ying Xue authored Nov 19, 2013

init_card() calls dev_get_by_name() to get a network deceive. But it
doesn't decrease network device reference count after the device is
used.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b5de4a22

xfrm: Release dst if this dst is improper for vti tunnel · 236c9f84

fan.du authored Nov 19, 2013

After searching rt by the vti tunnel dst/src parameter,
if this rt has neither attached to any transformation
nor the transformation is not tunnel oriented, this rt
should be released back to ip layer.

otherwise causing dst memory leakage.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

236c9f84

netlink: fix documentation typo in netlink_set_err() · 840e93f2

Johannes Berg authored Nov 19, 2013

The parameter is just 'group', not 'groups', fix the documentation typo.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

840e93f2

Merge tag 'arc-v3.13-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc · dec8e461

Linus Torvalds authored Nov 19, 2013

Pull second set of ARC changes from Vineet Gupta:
 - Support for Perf from Mischa
 - Enabling GPIO/Pinctrl drivers for Abilis TB10x platform
 - New defconfig for buildroot

* tag 'arc-v3.13-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: [plat-arcfpga] Add defconfig without initramfs location
  ARC: perf: ARC 700 PMU doesn't support sampling events
  ARC: Add documentation on DT binding for ARC700 PMU
  ARC: Add perf support for ARC700 cores
  ARC: [TB10x] Updates for GPIO and pinctrl

dec8e461

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 806dace6

Linus Torvalds authored Nov 19, 2013

Pull second set of s390 patches from Martin Schwidefsky:
 "The handling of the PCI hotplug notifications has been improved, the
  zfcp dumper can now detect the HSA size dynamically and the default
  install kernel has been changed to the compressed bzImage.  And two
  bug-fixes for scm and 3720"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/pci: implement hotplug notifications
  s390/scm_block: do not hide eadm subchannel dependency
  s390/sclp: Consolidate early sclp init calls to sclp_early_detect()
  s390/sclp: Move early code from sclp_cmd.c to sclp_early.c
  s390/sclp: Determine HSA size dynamically for zfcpdump
  s390/sclp: Move declarations for sclp_sdias into separate header file
  s390/pci: implement pcibios_remove_bus
  s390/pci: improve handling of bus resources
  s390/3270: fix missing device_destroy() call
  s390/boot: Install bzImage as default kernel image

806dace6