1. 22 May, 2015 9 commits
    • Thadeu Lima de Souza Cascardo's avatar
      bridge: fix parsing of MLDv2 reports · 47cc84ce
      Thadeu Lima de Souza Cascardo authored
      When more than a multicast address is present in a MLDv2 report, all but
      the first address is ignored, because the code breaks out of the loop if
      there has not been an error adding that address.
      
      This has caused failures when two guests connected through the bridge
      tried to communicate using IPv6. Neighbor discoveries would not be
      transmitted to the other guest when both used a link-local address and a
      static address.
      
      This only happens when there is a MLDv2 querier in the network.
      
      The fix will only break out of the loop when there is a failure adding a
      multicast address.
      
      The mdb before the patch:
      
      dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp
      dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp
      dev ovirtmgmt port bond0.86 grp ff02::2 temp
      
      After the patch:
      
      dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp
      dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp
      dev ovirtmgmt port bond0.86 grp ff02::fb temp
      dev ovirtmgmt port bond0.86 grp ff02::2 temp
      dev ovirtmgmt port bond0.86 grp ff02::d temp
      dev ovirtmgmt port vnet0 grp ff02::1:ff00:76 temp
      dev ovirtmgmt port bond0.86 grp ff02::16 temp
      dev ovirtmgmt port vnet1 grp ff02::1:ff00:77 temp
      dev ovirtmgmt port bond0.86 grp ff02::1:ff00:def temp
      dev ovirtmgmt port bond0.86 grp ff02::1:ffa1:40bf temp
      
      Fixes: 08b202b6 ("bridge br_multicast: IPv6 MLD support.")
      Reported-by: default avatarRik Theys <Rik.Theys@esat.kuleuven.be>
      Signed-off-by: default avatarThadeu Lima de Souza Cascardo <cascardo@redhat.com>
      Tested-by: default avatarRik Theys <Rik.Theys@esat.kuleuven.be>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      47cc84ce
    • Nathan Sullivan's avatar
      ARM: zynq: DT: Use the zynq binding with macb · 9eeb5161
      Nathan Sullivan authored
      Use the new zynq binding for macb ethernet, since it will disable half
      duplex gigabit like the Zynq TRM says to do.
      Signed-off-by: default avatarNathan Sullivan <nathan.sullivan@ni.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9eeb5161
    • Nathan Sullivan's avatar
      net: macb: Disable half duplex gigabit on Zynq · 222ca8e0
      Nathan Sullivan authored
      According to the Zynq TRM, gigabit half duplex is not supported.  Add a
      new cap and compatible string so Zynq can avoid advertising that mode.
      Signed-off-by: default avatarNathan Sullivan <nathan.sullivan@ni.com>
      Acked-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      222ca8e0
    • Nathan Sullivan's avatar
    • Michal Kubeček's avatar
      ipv4: fill in table id when replacing a route · d4e64c29
      Michal Kubeček authored
      When replacing an IPv4 route, tb_id member of the new fib_alias
      structure is not set in the replace code path so that the new route is
      ignored.
      
      Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse")
      Signed-off-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Acked-by: default avatarAlexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d4e64c29
    • Bjørn Mork's avatar
      cdc_ncm: Fix tx_bytes statistics · 44f6731d
      Bjørn Mork authored
      The tx_curr_frame_payload field is u32. When we try to calculate a
      small negative delta based on it, we end up with a positive integer
      close to 2^32 instead.  So the tx_bytes pointer increases by about
      2^32 for every transmitted frame.
      
      Fix by calculating the delta as a signed long.
      
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Reported-by: default avatarFlorian Bruhin <me@the-compiler.org>
      Fixes: 7a1e890e ("usbnet: Fix tx_bytes statistic running backward in cdc_ncm")
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      44f6731d
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 572152ad
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contain Netfilter fixes for your net tree, they are:
      
      1) Fix a race in nfnetlink_log and nfnetlink_queue that can lead to a crash.
         This problem is due to wrong order in the per-net registration and netlink
         socket events. Patch from Francesco Ruggeri.
      
      2) Make sure that counters that userspace pass us are higher than 0 in all the
         x_tables frontends. Discovered via Trinity, patch from Dave Jones.
      
      3) Revert a patch for br_netfilter to rely on the conntrack status bits. This
         breaks stateless IPv6 NAT transformations. Patch from Florian Westphal.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      572152ad
    • Eric W. Biederman's avatar
      ipv4: Avoid crashing in ip_error · 381c759d
      Eric W. Biederman authored
      ip_error does not check if in_dev is NULL before dereferencing it.
      
      IThe following sequence of calls is possible:
      CPU A                          CPU B
      ip_rcv_finish
          ip_route_input_noref()
              ip_route_input_slow()
                                     inetdev_destroy()
          dst_input()
      
      With the result that a network device can be destroyed while processing
      an input packet.
      
      A crash was triggered with only unicast packets in flight, and
      forwarding enabled on the only network device.   The error condition
      was created by the removal of the network device.
      
      As such it is likely the that error code was -EHOSTUNREACH, and the
      action taken by ip_error (if in_dev had been accessible) would have
      been to not increment any counters and to have tried and likely failed
      to send an icmp error as the network device is going away.
      
      Therefore handle this weird case by just dropping the packet if
      !in_dev.  It will result in dropping the packet sooner, and will not
      result in an actual change of behavior.
      
      Fixes: 251da413 ("ipv4: Cache ip_error() routes even when not forwarding.")
      Reported-by: default avatarVittorio Gambaletta <linuxbugs@vittgam.net>
      Tested-by: default avatarVittorio Gambaletta <linuxbugs@vittgam.net>
      Signed-off-by: default avatarVittorio Gambaletta <linuxbugs@vittgam.net>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      381c759d
    • Eric Dumazet's avatar
      tcp: fix a potential deadlock in tcp_get_info() · d654976c
      Eric Dumazet authored
      Taking socket spinlock in tcp_get_info() can deadlock, as
      inet_diag_dump_icsk() holds the &hashinfo->ehash_locks[i],
      while packet processing can use the reverse locking order.
      
      We could avoid this locking for TCP_LISTEN states, but lockdep would
      certainly get confused as all TCP sockets share same lockdep classes.
      
      [  523.722504] ======================================================
      [  523.728706] [ INFO: possible circular locking dependency detected ]
      [  523.734990] 4.1.0-dbg-DEV #1676 Not tainted
      [  523.739202] -------------------------------------------------------
      [  523.745474] ss/18032 is trying to acquire lock:
      [  523.750002]  (slock-AF_INET){+.-...}, at: [<ffffffff81669d44>] tcp_get_info+0x2c4/0x360
      [  523.758129]
      [  523.758129] but task is already holding lock:
      [  523.763968]  (&(&hashinfo->ehash_locks[i])->rlock){+.-...}, at: [<ffffffff816bcb75>] inet_diag_dump_icsk+0x1d5/0x6c0
      [  523.774661]
      [  523.774661] which lock already depends on the new lock.
      [  523.774661]
      [  523.782850]
      [  523.782850] the existing dependency chain (in reverse order) is:
      [  523.790326]
      -> #1 (&(&hashinfo->ehash_locks[i])->rlock){+.-...}:
      [  523.796599]        [<ffffffff811126bb>] lock_acquire+0xbb/0x270
      [  523.802565]        [<ffffffff816f5868>] _raw_spin_lock+0x38/0x50
      [  523.808628]        [<ffffffff81665af8>] __inet_hash_nolisten+0x78/0x110
      [  523.815273]        [<ffffffff816819db>] tcp_v4_syn_recv_sock+0x24b/0x350
      [  523.822067]        [<ffffffff81684d41>] tcp_check_req+0x3c1/0x500
      [  523.828199]        [<ffffffff81682d09>] tcp_v4_do_rcv+0x239/0x3d0
      [  523.834331]        [<ffffffff816842fe>] tcp_v4_rcv+0xa8e/0xc10
      [  523.840202]        [<ffffffff81658fa3>] ip_local_deliver_finish+0x133/0x3e0
      [  523.847214]        [<ffffffff81659a9a>] ip_local_deliver+0xaa/0xc0
      [  523.853440]        [<ffffffff816593b8>] ip_rcv_finish+0x168/0x5c0
      [  523.859624]        [<ffffffff81659db7>] ip_rcv+0x307/0x420
      
      Lets use u64_sync infrastructure instead. As a bonus, 64bit
      arches get optimized, as these are nop for them.
      
      Fixes: 0df48c26 ("tcp: add tcpi_bytes_acked to tcp_info")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d654976c
  2. 21 May, 2015 1 commit
    • Daniel Borkmann's avatar
      net: sched: fix call_rcu() race on classifier module unloads · c78e1746
      Daniel Borkmann authored
      Vijay reported that a loop as simple as ...
      
        while true; do
          tc qdisc add dev foo root handle 1: prio
          tc filter add dev foo parent 1: u32 match u32 0 0  flowid 1
          tc qdisc del dev foo root
          rmmod cls_u32
        done
      
      ... will panic the kernel. Moreover, he bisected the change
      apparently introducing it to 78fd1d0a ("netlink: Re-add
      locking to netlink_lookup() and seq walker").
      
      The removal of synchronize_net() from the netlink socket
      triggering the qdisc to be removed, seems to have uncovered
      an RCU resp. module reference count race from the tc API.
      Given that RCU conversion was done after e341694e ("netlink:
      Convert netlink_lookup() to use RCU protected hash table")
      which added the synchronize_net() originally, occasion of
      hitting the bug was less likely (not impossible though):
      
      When qdiscs that i) support attaching classifiers and,
      ii) have at least one of them attached, get deleted, they
      invoke tcf_destroy_chain(), and thus call into ->destroy()
      handler from a classifier module.
      
      After RCU conversion, all classifier that have an internal
      prio list, unlink them and initiate freeing via call_rcu()
      deferral.
      
      Meanhile, tcf_destroy() releases already reference to the
      tp->ops->owner module before the queued RCU callback handler
      has been invoked.
      
      Subsequent rmmod on the classifier module is then not prevented
      since all module references are already dropped.
      
      By the time, the kernel invokes the RCU callback handler from
      the module, that function address is then invalid.
      
      One way to fix it would be to add an rcu_barrier() to
      unregister_tcf_proto_ops() to wait for all pending call_rcu()s
      to complete.
      
      synchronize_rcu() is not appropriate as under heavy RCU
      callback load, registered call_rcu()s could be deferred
      longer than a grace period. In case we don't have any pending
      call_rcu()s, the barrier is allowed to return immediately.
      
      Since we came here via unregister_tcf_proto_ops(), there
      are no users of a given classifier anymore. Further nested
      call_rcu()s pointing into the module space are not being
      done anywhere.
      
      Only cls_bpf_delete_prog() may schedule a work item, to
      unlock pages eventually, but that is not in the range/context
      of cls_bpf anymore.
      
      Fixes: 25d8c0d5 ("net: rcu-ify tcf_proto")
      Fixes: 9888faef ("net: sched: cls_basic use RCU")
      Reported-by: default avatarVijay Subramanian <subramanian.vijay@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Tested-by: default avatarVijay Subramanian <subramanian.vijay@gmail.com>
      Acked-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c78e1746
  3. 20 May, 2015 7 commits
    • Tim Beale's avatar
      net: phy: Make sure phy_start() always re-enables the phy interrupts · c15e10e7
      Tim Beale authored
      This is an alternative way of fixing:
       commit db9683fb ("net: phy: Make sure PHY_RESUMING state change
                            is always processed")
      
      When the PHY state transitions from PHY_HALTED to PHY_RESUMING, there are
      two things we need to do:
      1). Re-enable interrupts (and power up the physical link, if powered down)
      2). Update the PHY state and net-device based on the link status.
      
      There's no strict reason why #1 has to be done from within the main
      phy_state_machine() function. There is a risk that other changes to the
      PHY (e.g. setting speed/duplex, which calls phy_start_aneg()) could cause
      a subsequent state transition before phy_state_machine() has processed
      the PHY_RESUMING state change. This would leave the PHY with interrupts
      disabled and/or still in the BMCR_PDOWN/low-power mode.
      
      Moving enabling the interrupts and phy_resume() into phy_start() will
      guarantee this work always gets done. As the PHY is already in the HALTED
      state and interrupts are disabled, it shouldn't conflict with any work
      being done in phy_state_machine(). The downside of this change is that if
      the PHY_RESUMING state is ever entered from anywhere else, it'll also have
      to repeat this work.
      Signed-off-by: default avatarTim Beale <tim.beale@alliedtelesis.co.nz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c15e10e7
    • David S. Miller's avatar
      Merge branch 'ipv6_ecmp_fixes' · 7764b9dd
      David S. Miller authored
      Michal Kubecek says:
      
      ====================
      IPv6 ECMP route add/replace fixes
      
      (1) When adding a nexthop of a multipath route fails (e.g. because of a
      conflict with an existing route), we are supposed to delete nexthops
      already added. However, currently we try to also delete all nexthops we
      haven't even tried to add yet so that a "ip route add" command can
      actually remove pre-existing routes if it fails.
      
      (2) Attempt to replace a multipath route results in a broken siblings
      linked list. Following commands (like "ip route del") can then either
      follow a link into freed memory or end in an infinite loop (if the slab
      object has been reused).
      
      v2: fix an omission in first patch
      
      v3: change the semantics of replace operation to better match IPv4
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7764b9dd
    • Michal Kubeček's avatar
      ipv6: fix ECMP route replacement · 27596472
      Michal Kubeček authored
      When replacing an IPv6 multipath route with "ip route replace", i.e.
      NLM_F_CREATE | NLM_F_REPLACE, fib6_add_rt2node() replaces only first
      matching route without fixing its siblings, resulting in corrupted
      siblings linked list; removing one of the siblings can then end in an
      infinite loop.
      
      IPv6 ECMP implementation is a bit different from IPv4 so that route
      replacement cannot work in exactly the same way. This should be a
      reasonable approximation:
      
      1. If the new route is ECMP-able and there is a matching ECMP-able one
      already, replace it and all its siblings (if any).
      
      2. If the new route is ECMP-able and no matching ECMP-able route exists,
      replace first matching non-ECMP-able (if any) or just add the new one.
      
      3. If the new route is not ECMP-able, replace first matching
      non-ECMP-able route (if any) or add the new route.
      
      We also need to remove the NLM_F_REPLACE flag after replacing old
      route(s) by first nexthop of an ECMP route so that each subsequent
      nexthop does not replace previous one.
      
      Fixes: 51ebd318 ("ipv6: add support of equal cost multipath (ECMP)")
      Signed-off-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Acked-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      27596472
    • Michal Kubeček's avatar
      ipv6: do not delete previously existing ECMP routes if add fails · 35f1b4e9
      Michal Kubeček authored
      If adding a nexthop of an IPv6 multipath route fails, comment in
      ip6_route_multipath() says we are going to delete all nexthops already
      added. However, current implementation deletes even the routes it
      hasn't even tried to add yet. For example, running
      
        ip route add 1234:5678::/64 \
            nexthop via fe80::aa dev dummy1 \
            nexthop via fe80::bb dev dummy1 \
            nexthop via fe80::cc dev dummy1
      
      twice results in removing all routes first command added.
      
      Limit the second (delete) run to nexthops that succeeded in the first
      (add) run.
      
      Fixes: 51ebd318 ("ipv6: add support of equal cost multipath (ECMP)")
      Signed-off-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Acked-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      35f1b4e9
    • Florian Westphal's avatar
      Revert "netfilter: bridge: query conntrack about skb dnat" · faecbb45
      Florian Westphal authored
      This reverts commit c055d5b0.
      
      There are two issues:
      'dnat_took_place' made me think that this is related to
      -j DNAT/MASQUERADE.
      
      But thats only one part of the story.  This is also relevant for SNAT
      when we undo snat translation in reverse/reply direction.
      
      Furthermore, I originally wanted to do this mainly to avoid
      storing ipv6 addresses once we make DNAT/REDIRECT work
      for ipv6 on bridges.
      
      However, I forgot about SNPT/DNPT which is stateless.
      
      So we can't escape storing address for ipv6 anyway. Might as
      well do it for ipv4 too.
      Reported-and-tested-by: default avatarBernhard Thaler <bernhard.thaler@wvnet.at>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      faecbb45
    • Dave Jones's avatar
      netfilter: ensure number of counters is >0 in do_replace() · 1086bbe9
      Dave Jones authored
      After improving setsockopt() coverage in trinity, I started triggering
      vmalloc failures pretty reliably from this code path:
      
      warn_alloc_failed+0xe9/0x140
      __vmalloc_node_range+0x1be/0x270
      vzalloc+0x4b/0x50
      __do_replace+0x52/0x260 [ip_tables]
      do_ipt_set_ctl+0x15d/0x1d0 [ip_tables]
      nf_setsockopt+0x65/0x90
      ip_setsockopt+0x61/0xa0
      raw_setsockopt+0x16/0x60
      sock_common_setsockopt+0x14/0x20
      SyS_setsockopt+0x71/0xd0
      
      It turns out we don't validate that the num_counters field in the
      struct we pass in from userspace is initialized.
      
      The same problem also exists in ebtables, arptables, ipv6, and the
      compat variants.
      Signed-off-by: default avatarDave Jones <davej@codemonkey.org.uk>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      1086bbe9
    • Francesco Ruggeri's avatar
      netfilter: nfnetlink_{log,queue}: Register pernet in first place · 3bfe0498
      Francesco Ruggeri authored
      nfnetlink_{log,queue}_init() register the netlink callback nf*_rcv_nl_event
      before registering the pernet_subsys, but the callback relies on data
      structures allocated by pernet init functions.
      
      When nfnetlink_{log,queue} is loaded, if a netlink message is received after
      the netlink callback is registered but before the pernet_subsys is registered,
      the kernel will panic in the sequence
      
      nfulnl_rcv_nl_event
        nfnl_log_pernet
          net_generic
            BUG_ON(id == 0)  where id is nfnl_log_net_id.
      
      The panic can be easily reproduced in 4.0.3 by:
      
      while true ;do modprobe nfnetlink_log ; rmmod nfnetlink_log ; done &
      while true ;do ip netns add dummy ; ip netns del dummy ; done &
      
      This patch moves register_pernet_subsys to earlier in nfnetlink_log_init.
      
      Notice that the BUG_ON hit in 4.0.3 was recently removed in 2591ffd3
      ["netns: remove BUG_ONs from net_generic()"].
      Signed-off-by: default avatarFrancesco Ruggeri <fruggeri@arista.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      3bfe0498
  4. 19 May, 2015 5 commits
  5. 18 May, 2015 4 commits
  6. 16 May, 2015 6 commits
    • Herbert Xu's avatar
      rhashtable: Add cap on number of elements in hash table · 07ee0722
      Herbert Xu authored
      We currently have no limit on the number of elements in a hash table.
      This is a problem because some users (tipc) set a ceiling on the
      maximum table size and when that is reached the hash table may
      degenerate.  Others may encounter OOM when growing and if we allow
      insertions when that happens the hash table perofrmance may also
      suffer.
      
      This patch adds a new paramater insecure_max_entries which becomes
      the cap on the table.  If unset it defaults to max_size * 2.  If
      it is also zero it means that there is no cap on the number of
      elements in the table.  However, the table will grow whenever the
      utilisation hits 100% and if that growth fails, you will get ENOMEM
      on insertion.
      
      As allowing oversubscription is potentially dangerous, the name
      contains the word insecure.
      
      Note that the cap is not a hard limit.  This is done for performance
      reasons as enforcing a hard limit will result in use of atomic ops
      that are heavier than the ones we currently use.
      
      The reasoning is that we're only guarding against a gross over-
      subscription of the table, rather than a small breach of the limit.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      07ee0722
    • Tim Beale's avatar
      net: phy: Make sure PHY_RESUMING state change is always processed · db9683fb
      Tim Beale authored
      If phy_start_aneg() was called while the phydev is in the PHY_RESUMING
      state, then its state would immediately transition to PHY_AN (or
      PHY_FORCING). This meant the phy_state_machine() never processed the
      PHY_RESUMING state change, which meant interrupts weren't enabled for the
      PHY. If the PHY used low-power mode (i.e. using BMCR_PDOWN), then the
      physical link wouldn't get powered up again.
      
      There seems no point for phy_start_aneg() to make the PHY_RESUMING -->
      PHY_AN transition, as the state machine will do this anyway. I'm not sure
      about the case where autoneg is disabled, as my patch will change
      behaviour so that the PHY goes to PHY_NOLINK instead of PHY_FORCING. An
      alternative solution would be to move the phy_config_interrupt() and
      phy_resume() work out of the state machine and into phy_start().
      
      The background behind this: we're running linux v3.16.7 and from user-space
      we want to enable the eth port (i.e. do a SIOCSIFFLAGS ioctl with the
      IFF_UP flag) and immediately afterward set the interface's speed/duplex.
      Enabling the interface calls .ndo_open() then phy_start() and the PHY
      transitions PHY_HALTED --> PHY_RESUMING. Setting the speed/duplex ends up
      calling phy_ethtool_sset(), which calls phy_start_aneg() (meanwhile the
      phy_state_machine() hasn't processed the PHY_RESUMING state change yet).
      Signed-off-by: default avatarTim Beale <tim.beale@alliedtelesis.co.nz>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      db9683fb
    • Herbert Xu's avatar
      netlink: Reset portid after netlink_insert failure · c0bb07df
      Herbert Xu authored
      The commit c5adde94 ("netlink:
      eliminate nl_sk_hash_lock") breaks the autobind retry mechanism
      because it doesn't reset portid after a failed netlink_insert.
      
      This means that should autobind fail the first time around, then
      the socket will be stuck in limbo as it can never be bound again
      since it already has a non-zero portid.
      
      Fixes: c5adde94 ("netlink: eliminate nl_sk_hash_lock")
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c0bb07df
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 1d605701
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      The following patchset contains Netfilter fixes for your net tree, they are:
      
      1) Fix a leak in IPVS, the sysctl table is not released accordingly when
         destroying a netns, patch from Tommi Rantala.
      
      2) Fix a build error when TPROXY and socket are built-in but IPv6 defrag is
         compiled as module, from Florian Westphal.
      
      3) Fix TCP tracket wrt. RFC5961 challenge ACK when in LAST_ACK state, patch
         from Jesper Dangaard Brouer.
      
      4) Fix a bogus WARN_ON() in nf_tables when deleting a set element that stores
         a map, from Mirek Kratochvil.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1d605701
    • Florian Fainelli's avatar
      net: phy: Allow EEE for all RGMII variants · 7e140696
      Florian Fainelli authored
      RGMII interfaces come in multiple flavors: RGMII with transmit or
      receive internal delay, no delays at all, or delays in both direction.
      
      This change extends the initial check for PHY_INTERFACE_MODE_RGMII to
      cover all of these variants since EEE should be allowed for any of these
      modes, since it is a property of the RGMII, hence Gigabit PHY capability
      more than the RGMII electrical interface and its delays.
      
      Fixes: a59a4d19 ("phy: add the EEE support and the way to access to the MMD registers")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7e140696
    • Ying Xue's avatar
      rocker: fix a neigh entry leak issue · 1f9993f6
      Ying Xue authored
      Once we get a neighbour through looking up arp cache or creating a
      new one in rocker_port_ipv4_resolve(), the neighbour's refcount is
      already taken. But as we don't put the refcount again after it's
      used, this makes the neighbour entry leaked.
      Suggested-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarJiri Pirko <jiri@resnulli.us>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Acked-by: default avatarJiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1f9993f6
  7. 15 May, 2015 7 commits
  8. 14 May, 2015 1 commit
    • Eric Dumazet's avatar
      netlink: move nl_table in read_mostly section · 91dd93f9
      Eric Dumazet authored
      netlink sockets creation and deletion heavily modify nl_table_users
      and nl_table_lock.
      
      If nl_table is sharing one cache line with one of them, netlink
      performance is really bad on SMP.
      
      ffffffff81ff5f00 B nl_table
      ffffffff81ff5f0c b nl_table_users
      
      Putting nl_table in read_mostly section increased performance
      of my open/delete netlink sockets test by about 80 %
      
      This came up while diagnosing a getaddrinfo() problem.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      91dd93f9