06 Feb, 2016 30 commits
    • Parthasarathy Bhuvaragan's avatar
      tipc: donot create timers if subscription timeout = TIPC_WAIT_FOREVER · ae245557
      Parthasarathy Bhuvaragan authored
      Until now, we create timers even for subscription requests
      with timeout = TIPC_WAIT_FOREVER.
      This can be improved by avoiding timer creation when the timeout
      is set to TIPC_WAIT_FOREVER.
      
      In this commit, we introduce a check and create timers only
      when timeout != TIPC_WAIT_FOREVER.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ae245557
    • Parthasarathy Bhuvaragan's avatar
      tipc: protect tipc_subscrb_get() with subscriber spin lock · f3ad288c
      Parthasarathy Bhuvaragan authored
      Until now, during subscription creation, mod_timer() and
      tipc_subscrb_get() are called after releasing the subscriber
      spin lock.
      
      In an SMP system, when performing a subscription creation, if the
      subscription timeout occurs simultaneously (the timer is
      scheduled to run on another CPU), the timer thread
      might decrement the subscriber's refcount before the create
      thread increments the refcount.
      
      This can be simulated by creating a subscription with timeout=0;
      sometimes the timeout occurs before the create request is complete.
      This leads to the following message:
      [30.702949] BUG: spinlock bad magic on CPU#1, kworker/u8:3/87
      [30.703834] general protection fault: 0000 [#1] SMP
      [30.704826] CPU: 1 PID: 87 Comm: kworker/u8:3 Not tainted 4.4.0-rc8+ #18
      [30.704826] Workqueue: tipc_rcv tipc_recv_work [tipc]
      [30.704826] task: ffff88003f878600 ti: ffff88003fae0000 task.ti: ffff88003fae0000
      [30.704826] RIP: 0010:[<ffffffff8109196c>]  [<ffffffff8109196c>] spin_dump+0x5c/0xe0
      [...]
      [30.704826] Call Trace:
      [30.704826]  [<ffffffff81091a16>] spin_bug+0x26/0x30
      [30.704826]  [<ffffffff81091b75>] do_raw_spin_lock+0xe5/0x120
      [30.704826]  [<ffffffff81684439>] _raw_spin_lock_bh+0x19/0x20
      [30.704826]  [<ffffffffa0096f10>] tipc_subscrb_rcv_cb+0x1d0/0x330 [tipc]
      [30.704826]  [<ffffffffa00a37b1>] tipc_receive_from_sock+0xc1/0x150 [tipc]
      [30.704826]  [<ffffffffa00a31df>] tipc_recv_work+0x3f/0x80 [tipc]
      [30.704826]  [<ffffffff8106a739>] process_one_work+0x149/0x3c0
      [30.704826]  [<ffffffff8106aa16>] worker_thread+0x66/0x460
      [30.704826]  [<ffffffff8106a9b0>] ? process_one_work+0x3c0/0x3c0
      [30.704826]  [<ffffffff8106a9b0>] ? process_one_work+0x3c0/0x3c0
      [30.704826]  [<ffffffff8107029d>] kthread+0xed/0x110
      [30.704826]  [<ffffffff810701b0>] ? kthread_create_on_node+0x190/0x190
      [30.704826]  [<ffffffff81684bdf>] ret_from_fork+0x3f/0x70
      
      In this commit,
      1. we remove the check of the return code of mod_timer()
      2. we protect tipc_subscrb_get() using the subscriber spin lock.
         We increment the subscriber's refcount as soon as we add the
         subscription to the subscriber's subscription list.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f3ad288c
    • Parthasarathy Bhuvaragan's avatar
      tipc: hold subscriber->lock for tipc_nametbl_subscribe() · d4091899
      Parthasarathy Bhuvaragan authored
      Until now, while creating a subscription, the subscriber lock
      protects only the subscriber's subscription list and not the
      nametable. The call to tipc_nametbl_subscribe() is outside
      the lock. However, at subscription timeout and cancel, both
      the subscriber's subscription list and the nametable are
      protected by the subscriber lock.
      
      This asymmetric locking leads to the following problem:
      In an SMP system, the timer can fire on another core before
      the create request is complete.
      When the timer thread calls tipc_nametbl_unsubscribe() before the
      create thread calls tipc_nametbl_subscribe(), we get a NULL
      pointer dereference.
      
      This can be simulated by creating a subscription with timeout=0;
      sometimes the timeout occurs before the create request is complete.
      
      The following is the oops:
      [57.569661] BUG: unable to handle kernel NULL pointer dereference at (null)
      [57.577498] IP: [<ffffffffa02135aa>] tipc_nametbl_unsubscribe+0x8a/0x120 [tipc]
      [57.584820] PGD 0
      [57.586834] Oops: 0002 [#1] SMP
      [57.685506] CPU: 14 PID: 10077 Comm: kworker/u40:1 Tainted: P OENX 3.12.48-52.27.1.9688.1.PTF-default #1
      [57.703637] Workqueue: tipc_rcv tipc_recv_work [tipc]
      [57.708697] task: ffff88064c7f00c0 ti: ffff880629ef4000 task.ti: ffff880629ef4000
      [57.716181] RIP: 0010:[<ffffffffa02135aa>]  [<ffffffffa02135aa>] tipc_nametbl_unsubscribe+0x8a/0x120 [tipc]
      [...]
      [57.812327] Call Trace:
      [57.814806]  [<ffffffffa0211c77>] tipc_subscrp_delete+0x37/0x90 [tipc]
      [57.821357]  [<ffffffffa0211e2f>] tipc_subscrp_timeout+0x3f/0x70 [tipc]
      [57.827982]  [<ffffffff810618c1>] call_timer_fn+0x31/0x100
      [57.833490]  [<ffffffff81062709>] run_timer_softirq+0x1f9/0x2b0
      [57.839414]  [<ffffffff8105a795>] __do_softirq+0xe5/0x230
      [57.844827]  [<ffffffff81520d1c>] call_softirq+0x1c/0x30
      [57.850150]  [<ffffffff81004665>] do_softirq+0x55/0x90
      [57.855285]  [<ffffffff8105aa35>] irq_exit+0x95/0xa0
      [57.860290]  [<ffffffff815215b5>] smp_apic_timer_interrupt+0x45/0x60
      [57.866644]  [<ffffffff8152005d>] apic_timer_interrupt+0x6d/0x80
      [57.872686]  [<ffffffffa02121c5>] tipc_subscrb_rcv_cb+0x2a5/0x3f0 [tipc]
      [57.879425]  [<ffffffffa021c65f>] tipc_receive_from_sock+0x9f/0x100 [tipc]
      [57.886324]  [<ffffffffa021c826>] tipc_recv_work+0x26/0x60 [tipc]
      [57.892463]  [<ffffffff8106fb22>] process_one_work+0x172/0x420
      [57.898309]  [<ffffffff8107079a>] worker_thread+0x11a/0x3c0
      [57.903871]  [<ffffffff81077114>] kthread+0xb4/0xc0
      [57.908751]  [<ffffffff8151f318>] ret_from_fork+0x58/0x90
      
      In this commit, we do the following at subscription creation:
      1. set the subscription's subscriber pointer before performing
         tipc_nametbl_subscribe(), as this value is required further down
         the call chain, e.g. by tipc_subscrp_send_event().
      2. move tipc_nametbl_subscribe() under the scope of the subscriber lock
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d4091899
    • Parthasarathy Bhuvaragan's avatar
      tipc: fix connection abort when receiving invalid cancel request · cb01c7c8
      Parthasarathy Bhuvaragan authored
      Until now, the subscriber's endianness for a subscription
      create/cancel request is determined as:
          swap = !(s->filter & (TIPC_SUB_PORTS | TIPC_SUB_SERVICE))
      The check covers only port/service subscriptions.
      
      The swap calculation is incorrect if the filter in a subscription
      cancellation request is set to TIPC_SUB_CANCEL (a malformed
      cancel request, as the corresponding subscription create filter
      is missing).
      In that case, the check for whether the request is a cancellation
      fails and the request is treated as a subscription create request.
      The subscription creation then fails as the request is illegal,
      which terminates the connection.
      
      In this commit, we determine the endianness by also including
      TIPC_SUB_CANCEL in the check, which sets swap correctly so the
      request is processed as a cancellation request.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cb01c7c8
    • Parthasarathy Bhuvaragan's avatar
      tipc: fix connection abort during subscription cancellation · c8beccc6
      Parthasarathy Bhuvaragan authored
      In commit 7fe8097c ("tipc: fix nullpointer bug when subscribing
      to events"), we terminate the connection if the subscription
      creation fails.
      In the same commit, the subscription creation result was based on
      the value of the subscription pointer (set in the function) instead
      of the return code.
      
      Unfortunately, the same function also handles subscription
      cancellation requests. For a subscription cancellation request,
      the subscription pointer cannot be set. Thus the connection is
      terminated during cancellation requests.
      
      In this commit, we move the subscription cancel check outside
      of tipc_subscrp_create(). Hence,
      - tipc_subscrp_create() will create a subscription
      - tipc_subscrb_rcv_cb() will subscribe or cancel a subscription.
      
      Fixes: 7fe8097c ("tipc: fix nullpointer bug when subscribing to events")
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c8beccc6
    • Parthasarathy Bhuvaragan's avatar
      tipc: introduce tipc_subscrb_subscribe() routine · 7c13c622
      Parthasarathy Bhuvaragan authored
      In this commit, we split tipc_subscrp_create() into two:
      1. tipc_subscrp_create() creates a subscription
      2. A new function tipc_subscrb_subscribe() adds the
         subscription to the subscriber's subscription list,
         activates the subscription timer and subscribes to
         the nametable updates.
      
      In future commits, the purpose of tipc_subscrb_rcv_cb() will
      be to either subscribe or cancel a subscription.
      
      There is no functional change in this commit.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7c13c622
    • Parthasarathy Bhuvaragan's avatar
      tipc: remove struct tipc_name_seq from struct tipc_subscription · a4273c73
      Parthasarathy Bhuvaragan authored
      Until now, struct tipc_subscription has duplicate fields for
      type, upper and lower (as members of struct tipc_name_seq):
      1. as member seq in struct tipc_subscription
      2. as member seq in struct tipc_subscr, which is contained
         in struct tipc_event
      The former contains the type, upper and lower
      values in network byte order and the latter contains an
      intact copy of the request.
      struct tipc_subscription contains a field swap to
      determine if the request needs network byte order conversion.
      Thus by using swap, we can convert the request when
      required instead of duplicating it.
      
      In this commit,
      1. we remove the references to these elements as members of
         struct tipc_subscription and replace them with elements
         from struct tipc_subscr.
      2. we provide new functions to convert the user request into
         network byte order.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a4273c73
    • Parthasarathy Bhuvaragan's avatar
      tipc: remove filter and timeout elements from struct tipc_subscription · 30865231
      Parthasarathy Bhuvaragan authored
      Until now, struct tipc_subscription has duplicate timeout and filter
      attributes present:
      1. directly as members of struct tipc_subscription
      2. in struct tipc_subscr, which is contained in struct tipc_event
      
      In this commit, we remove the references to these elements as
      members of struct tipc_subscription and replace them with elements
      from struct tipc_subscr.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      30865231
    • Parthasarathy Bhuvaragan's avatar
      tipc: remove incorrect check for subscription timeout value · 4f61d4ef
      Parthasarathy Bhuvaragan authored
      Until now, during subscription creation we set sub->timeout by
      converting the timeout request value in milliseconds to jiffies.
      This is followed by setting the timeout value in the timer if
      sub->timeout != TIPC_WAIT_FOREVER.
      
      For a subscription create request with a timeout value of
      TIPC_WAIT_FOREVER, msecs_to_jiffies(TIPC_WAIT_FOREVER)
      returns MAX_JIFFY_OFFSET (0xfffffffe). This is not equal to
      TIPC_WAIT_FOREVER (0xffffffff).
      
      In this commit, we remove this check, since the comparison can
      never succeed.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4f61d4ef
    • Zhang Shengju's avatar
      bonding: add slave device name for debug · c6140a29
      Zhang Shengju authored
      netdev_dbg() already includes the bond device name; it is helpful
      to also print the slave device name.
      Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c6140a29
    • Rafał Miłecki's avatar
      bgmac: add helper checking for BCM4707 / BCM53018 chip id · 387b75f8
      Rafał Miłecki authored
      Chipsets with the BCM4707 / BCM53018 ID require special handling in
      a few places in the code, and it's likely there will be more IDs to
      check in the future. To simplify this, add a trivial helper.
      Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      387b75f8
    • David S. Miller's avatar
      Merge branch 'bpf-per-cpu-maps' · 8ac2c867
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      bpf: introduce per-cpu maps
      
      We've started to use bpf to trace every packet, and the atomic add
      instruction (even JITed) started to show up in perf profiles.
      The solution is per-cpu counters.
      For PERCPU_(HASH|ARRAY) maps the existing bpf_map_lookup() helper
      returns a per-cpu area which bpf programs can use to store and
      increment the counters. The BPF_MAP_LOOKUP_ELEM syscall command
      returns the areas from all cpus and the user process aggregates
      the counters. The usage example is in patch 6. The API turned out
      to be very easy to use from bpf programs and from user space.
      Long term we have been discussing adding a 'bounded loop'
      instruction, so bpf programs can do aggregation within the
      program, which may help some use cases. Right now user space
      aggregation of per-cpu counters fits best.
      
      This patch set is a new approach for per-cpu hash and array maps.
      I've reused the map tests written by Martin and Ming, but the
      implementation and API are new. Old discussion here:
      http://thread.gmane.org/gmane.linux.kernel/2123800/focus=2126435
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8ac2c867
    • tom.leiming@gmail.com's avatar
      samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY · df570f57
      tom.leiming@gmail.com authored
      A sanity test for BPF_MAP_TYPE_PERCPU_ARRAY.
      Signed-off-by: Ming Lei <tom.leiming@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      df570f57
    • Martin KaFai Lau's avatar
      samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_HASH · e1559671
      Martin KaFai Lau authored
      A sanity test for BPF_MAP_TYPE_PERCPU_HASH.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e1559671
    • Alexei Starovoitov's avatar
      bpf: add lookup/update support for per-cpu hash and array maps · 15a07b33
      Alexei Starovoitov authored
      The functions bpf_map_lookup_elem(map, key, value) and
      bpf_map_update_elem(map, key, value, flags) need to get/set
      values from all cpus for per-cpu hash and array maps,
      so that user space can aggregate/update them as necessary.
      
      Example of single counter aggregation in user space:
        unsigned int nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
        long values[nr_cpus];
        long value = 0;
        int i;
      
        bpf_lookup_elem(fd, key, values);
        for (i = 0; i < nr_cpus; i++)
          value += values[i];
      
      User space must provide a round_up(value_size, 8) * nr_cpus
      buffer to get/set the values, since the kernel uses a 'long' copy
      of the per-cpu values to try to copy good counters atomically.
      It's best-effort, since bpf programs and user space are racing
      to access the same memory.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      15a07b33
    • Alexei Starovoitov's avatar
      bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map · a10423b8
      Alexei Starovoitov authored
      The primary use case is a latency histogram array,
      where a bpf program computes the latency of block requests or
      other events and stores a histogram of latency into an array of
      64 elements. All cpus are constantly running, so a normal
      increment is not accurate and bpf_xadd causes cache ping-pong;
      this per-cpu approach allows the fastest collision-free counters.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a10423b8
    • Alexei Starovoitov's avatar
      bpf: introduce BPF_MAP_TYPE_PERCPU_HASH map · 824bd0ce
      Alexei Starovoitov authored
      Introduce the BPF_MAP_TYPE_PERCPU_HASH map type, which is used for
      accurate counters without the need for the BPF_XADD instruction,
      which turned out to be too costly for high-performance network
      monitoring.
      In the typical use case the 'key' is the flow tuple or another
      long-living object that sees a lot of events per second.
      
      bpf_map_lookup_elem() returns a per-cpu area.
      Example:
      struct {
        u32 packets;
        u32 bytes;
      } *ptr = bpf_map_lookup_elem(&map, &key);
      /* ptr points to the this_cpu area of the value, so the following
       * increments will not collide with other cpus
       */
      ptr->packets++;
      ptr->bytes += skb->len;
      
      bpf_update_elem() atomically creates a new element where all
      per-cpu values are zero-initialized and the this_cpu value is
      populated with the given 'value'.
      Note that a non-per-cpu hash map always allocates a new element
      and then deletes the old one after an RCU grace period to maintain
      atomicity of updates. A per-cpu hash map updates element values
      in-place.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      824bd0ce
    • Kim Jones's avatar
      ethtool: Declare netdev_rss_key as __read_mostly. · ba905f5e
      Kim Jones authored
      netdev_rss_key is written once and thereafter read by
      drivers when they are initialising. The fact that it is mostly
      read and rarely written makes it a candidate for a __read_mostly
      declaration.
      Signed-off-by: Kim Jones <kim-marie.jones@intel.com>
      Signed-off-by: Alan Carey <alan.carey@intel.com>
      Acked-by: Rami Rosen <rami.rosen@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ba905f5e
    • David S. Miller's avatar
      Merge branch 'tcp_fast_open_synack_fin' · ef449678
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      tcp: fastopen: accept data/FIN present in SYNACK
      
      Implements RFC 7413 (TCP Fast Open) 4.2.2, accepting payload and/or
      FIN in SYNACK messages, and prepares removal of the SYN flag test
      in tcp_recvmsg().
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ef449678
    • Eric Dumazet's avatar
      tcp: do not enqueue skb with SYN flag · 9d691539
      Eric Dumazet authored
      If we remove the SYN flag from the skbs that tcp_fastopen_add_skb()
      places in the socket receive queue, then we can remove the test
      that tcp_recvmsg() has to perform in the fast path.
      
      All we have to do is adjust the SEQ in the slow path.
      
      For the moment, we place an unlikely() and output a message
      if we find an skb with the SYN flag set.
      The goal is to get rid of the test completely.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9d691539
    • Eric Dumazet's avatar
      tcp: fastopen: accept data/FIN present in SYNACK message · 61d2bcae
      Eric Dumazet authored
      RFC 7413 (TCP Fast Open) 4.2.2 states that the SYNACK message
      MAY include data and/or a FIN.
      
      This patch adds support for the client side:
      
      If we receive a SYNACK with payload or FIN, queue the skb instead
      of ignoring it.
      
      Since we already support the same for SYN, we refactor the existing
      code and reuse it. Note we need to clone the skb, so this operation
      might fail under memory pressure.
      
      Sara Dickinson pointed out that the FreeBSD server Fast Open
      implementation was planned to generate such SYNACKs in the future.
      
      The server side might be implemented on Linux later.
      Reported-by: Sara Dickinson <sara@sinodun.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      61d2bcae
    • David S. Miller's avatar
      Merge branch 'rx_nohandler' · df03288b
      David S. Miller authored
      Jarod Wilson says:
      
      ====================
      net: add and use rx_nohandler stat counter
      
      The network core tries to keep track of dropped packets, but under
      certain circumstances some packets aren't really dropped so much as
      intentionally ignored. One such case is that of bonding and team
      device slaves that are currently inactive. Their respective
      rx_handler functions return RX_HANDLER_EXACT (the only places in
      the kernel that return it), which ends up in the network core's
      __netif_receive_skb_core() function's drop path, with no pt_prev
      set. On a noisy network, this can result in a very rapidly
      incrementing rx_dropped counter, not only on the inactive
      slave(s), but also on the master device, such as the following:
      
      $ cat /proc/net/dev
      Inter-|   Receive                                                |  Transmit
       face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
        p7p1: 14783346  140430    0 140428    0     0          0      2040      680       8    0    0    0     0       0          0
        p7p2: 14805198  140648    0    0    0     0          0      2034        0       0    0    0    0     0       0          0
       bond0: 53365248  532798    0 421160    0     0          0    115151     2040      24    0    0    0     0       0          0
          lo:    5420      54    0    0    0     0          0         0     5420      54    0    0    0     0       0          0
        p5p1: 19292195  196197    0 140368    0     0          0     56564      680       8    0    0    0     0       0          0
        p5p2: 19289707  196171    0 140364    0     0          0     56547      680       8    0    0    0     0       0          0
         em3: 20996626  158214    0    0    0     0          0       383        0       0    0    0    0     0       0          0
         em2: 14065122  138462    0    0    0     0          0       310        0       0    0    0    0     0       0          0
         em1: 14063162  138440    0    0    0     0          0       308        0       0    0    0    0     0       0          0
         em4: 21050830  158729    0    0    0     0          0       385    71662     469    0    0    0     0       0          0
         ib0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
      
      In this scenario, p5p1, p5p2 and p7p1 are all inactive slaves in an
      active-backup bond0, and you can see that all three have high drop counts,
      with the master bond0 showing a tally of all three.
      
      I know that this was previously discussed here:
      
          http://www.spinics.net/lists/netdev/msg226341.html
      
      It seems additional counters never came to fruition, so this is a
      first attempt at creating one of them, so that we stop calling
      these packets drops, which causes great alarm for users monitoring
      rx_dropped and renders that counter much less useful for them.
      
      This adds a sysfs statistics node and makes the counter available via
      netlink.
      
      Additionally, I'm not certain if this set qualifies for net, or if it
      should be put aside and resubmitted for net-next after 4.5 is put to
      bed, but I do have users who consider this an important bugfix.
      
      This has been tested quite a bit on x86_64, and now lightly on i686 as
      well, to verify functionality of updates to netdev_stats_to_stats64()
      on 32-bit arches.
      ====================
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      df03288b
    • Jarod Wilson's avatar
      bond: track sum of rx_nohandler for all slaves · f344b0d9
      Jarod Wilson authored
      Sample output with this set applied for an active-backup bond:
      
      $ cat /sys/devices/virtual/net/bond0/lower_p7p1/statistics/rx_nohandler
      16568
      $ cat /sys/devices/virtual/net/bond0/lower_p5p2/statistics/rx_nohandler
      16583
      $ cat /sys/devices/virtual/net/bond0/statistics/rx_nohandler
      33151
      
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <gospo@cumulusnetworks.com>
      CC: netdev@vger.kernel.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f344b0d9
    • Jarod Wilson's avatar
      team: track sum of rx_nohandler for all slaves · bb63daf9
      Jarod Wilson authored
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: netdev@vger.kernel.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bb63daf9
    • Jarod Wilson's avatar
      net: add rx_nohandler stat counter · 6e7333d3
      Jarod Wilson authored
      This adds an rx_nohandler stat counter, along with a sysfs statistics
      node, and copies the counter out via netlink as well.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@mellanox.com>
      CC: Daniel Borkmann <daniel@iogearbox.net>
      CC: Tom Herbert <tom@herbertland.com>
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <gospo@cumulusnetworks.com>
      CC: netdev@vger.kernel.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6e7333d3
    • Jarod Wilson's avatar
      net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64 · 9256645a
      Jarod Wilson authored
      The netdev_stats_to_stats64 function copies the deprecated
      net_device_stats format stats into rtnl_link_stats64 for legacy support
      purposes, but with the BUILD_BUG_ON as it was, it wasn't possible to
      extend rtnl_link_stats64 without also extending net_device_stats. Relax
      the BUILD_BUG_ON to only require that rtnl_link_stats64 is larger, and
      zero out all the stat counters that aren't present in net_device_stats.
      
      CC: Eric Dumazet <edumazet@google.com>
      CC: netdev@vger.kernel.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9256645a
    • Richard Alpe's avatar
      tipc: fix link priority propagation · 81729810
      Richard Alpe authored
      Currently, link priority changes aren't handled for active links.
      In this patch we resolve this by changing our priority if the peer
      passes a valid priority in a state message.
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      81729810
    • Richard Alpe's avatar
      tipc: fix link attribute propagation bug · d01332f1
      Richard Alpe authored
      Changing certain link attributes (link tolerance and link priority)
      from the TIPC management tool is supposed to automatically take
      effect at both endpoints of the affected link.
      
      Currently the media address is not instantiated for the link and is
      used uninstantiated when crafting protocol messages designated for the
      peer endpoint. This means that changing a link property currently
      results in the property being changed on the local machine but the
      protocol message designated for the peer gets lost. Resulting in
      property discrepancy between the endpoints.
      
      In this patch we resolve this by taking the media address from the
      link entry and sending via the bearer transmit function. Hence, we can
      now eliminate the redundant function tipc_link_prot_xmit() and the
      redundant field tipc_link::media_addr.
      
      Fixes: 2af5ae37 (tipc: clean up unused code and structures)
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Reported-by: Jason Hu <huzhijiang@gmail.com>
      Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d01332f1
    • David S. Miller's avatar
      Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 6247fd9f
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2016-02-03
      
      This series contains updates to i40e and i40evf only.
      
      Kiran adds the MAC filter element to the end of the list instead of
      the head, in case there are ever any ordering issues in the future.
      
      Anjali fixes several RSS issues. First she fixes the hash PCTYPE
      enable for X722, since it supports a broader selection of PCTYPEs for
      TCP and UDP. She then fixes a bug in XL710, X710, and X722 support for
      RSS, since we cannot reduce the 4-tuple for RSS for TCP/IPv4/IPv6 or
      UDP/IPv4/IPv6 packets, as this requires a product feature change
      coming in a later release. She also cleans up the reset code where the
      restart-autoneg workaround is applied; since X722 does not need the
      workaround, she adds a flag to indicate which MAC and firmware
      versions require it. Finally, she adds new device IDs for X722 along
      with code to support them, and another way to access the RSS keys and
      lookup table using the admin queue for X722 devices.
      
      Catherine updates the driver to replace the MAC check with a feature
      flag check for 100M SGMII, since it is currently only supported on
      X722 devices.
      
      Mitch reworks the VF driver to allow channel bonding, which was not
      possible before this patch due to the asynchronous nature of the admin
      queue mechanism.  He also fixes a rare case which causes a panic if
      the VF driver is removed during reset recovery, resolved by setting
      the ring pointers to NULL after freeing them.
      
      Shannon cleans up the driver where device capabilities were defined in
      two different places, neither of which had all the definitions, so he
      consolidates them in the admin queue API.  He also adds the new
      proxy-wake-on-lan capability bit available with the new X722 device.
      Lastly, he adds the new External Device Power Ability field to the
      get_link_status data structure by using a reserved field at the end
      of the structure.
      
      Jesse mimics the ixgbe driver's use of a private work queue in the i40e
      and i40evf drivers to avoid blocking the system work queue.
      
      Greg cleans up the driver to limit the firmware revision checks that
      properly handle DCB configurations from the firmware to the older
      devices which need them (specifically X710 and XL710 devices only).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6247fd9f
  2. 05 Feb, 2016 1 commit
    • Mahesh Bandewar's avatar
      ipvlan: inherit MTU from master device · 296d4856
      Mahesh Bandewar authored
      When we create an IPvlan slave, we use ether_setup(), which sets
      the default MTU to 1500 while the master device may have a lower /
      different MTU. Any subsequent changes to the master's MTU are
      reflected in the slave's MTU setting. However, if those changes
      don't happen (the most likely scenario), the slave's MTU stays at
      1500, which could be bad.
      
      This change adds code to inherit MTU from the master device
      instead of using the default value during the link initialization
      phase.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Tim Hockins <thockins@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      296d4856
  3. 04 Feb, 2016 9 commits