1. 18 Apr, 2017 21 commits
  2. 17 Apr, 2017 19 commits
    • Chonggang Li's avatar
      bonding: deliver link-local packets with skb->dev set to link that packets arrived on · b89f04c6
      Chonggang Li authored
      Bonding driver changes the skb->dev to the bonding-master before
      passing the packet to stack for further processing. This, however
      does not make sense for the link-local packets and it loses "the
      link info" once its skb->dev is changed to bonding-master.  This
      patch changes this behavior for link-local packets by not changing
      the skb->dev to the bonding-master and maintaining it as it is,
      i.e. the link on which the packet arrived.
      Signed-off-by: default avatarChonggang Li <chonggangli@google.com>
      Signed-off-by: default avatarMahesh Bandewar <maheshb@google.com>
      Signed-off-by: default avatarMaciej Żenczykowski <maze@google.com>
      Signed-off-by: default avatarJay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b89f04c6
    • David Ahern's avatar
      net: rtnetlink: plumb extended ack to doit function · c21ef3e3
      David Ahern authored
      Add netlink_ext_ack arg to rtnl_doit_func. Pass extack arg to nlmsg_parse
      for doit functions that call it directly.
      
      This is the first step to using extended error reporting in rtnetlink.
      >From here individual subsystems can be updated to set netlink_ext_ack as
      needed.
      Signed-off-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c21ef3e3
    • David Lebrun's avatar
      ipv6: sr: fix BUG due to headroom too small after SRH push · af3b5158
      David Lebrun authored
      When a locally generated packet receives an SRH with two or more segments,
      the remaining headroom is too small to push an ethernet header. This patch
      ensures that the headroom is large enough after SRH push.
      
      The BUG generated the following trace.
      
      [  192.950285] skbuff: skb_under_panic: text:ffffffff81809675 len:198 put:14 head:ffff88006f306400 data:ffff88006f3063fa tail:0xc0 end:0x2c0 dev:A-1
      [  192.952456] ------------[ cut here ]------------
      [  192.953218] kernel BUG at net/core/skbuff.c:105!
      [  192.953411] invalid opcode: 0000 [#1] PREEMPT SMP
      [  192.953411] Modules linked in:
      [  192.953411] CPU: 5 PID: 3433 Comm: ping6 Not tainted 4.11.0-rc3+ #237
      [  192.953411] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
      [  192.953411] task: ffff88007c2d42c0 task.stack: ffffc90000ef4000
      [  192.953411] RIP: 0010:skb_panic+0x61/0x70
      [  192.953411] RSP: 0018:ffffc90000ef7900 EFLAGS: 00010286
      [  192.953411] RAX: 0000000000000085 RBX: 00000000000086dd RCX: 0000000000000201
      [  192.953411] RDX: 0000000080000201 RSI: ffffffff81d104c5 RDI: 00000000ffffffff
      [  192.953411] RBP: ffffc90000ef7920 R08: 0000000000000001 R09: 0000000000000000
      [  192.953411] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      [  192.953411] R13: ffff88007c5a4000 R14: ffff88007b363d80 R15: 00000000000000b8
      [  192.953411] FS:  00007f94b558b700(0000) GS:ffff88007fd40000(0000) knlGS:0000000000000000
      [  192.953411] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  192.953411] CR2: 00007fff5ecd5080 CR3: 0000000074141000 CR4: 00000000001406e0
      [  192.953411] Call Trace:
      [  192.953411]  skb_push+0x3b/0x40
      [  192.953411]  eth_header+0x25/0xc0
      [  192.953411]  neigh_resolve_output+0x168/0x230
      [  192.953411]  ? ip6_finish_output2+0x242/0x8f0
      [  192.953411]  ip6_finish_output2+0x242/0x8f0
      [  192.953411]  ? ip6_finish_output2+0x76/0x8f0
      [  192.953411]  ip6_finish_output+0xa8/0x1d0
      [  192.953411]  ip6_output+0x64/0x2d0
      [  192.953411]  ? ip6_output+0x73/0x2d0
      [  192.953411]  ? ip6_dst_check+0xb5/0xc0
      [  192.953411]  ? dst_cache_per_cpu_get.isra.2+0x40/0x80
      [  192.953411]  seg6_output+0xb0/0x220
      [  192.953411]  lwtunnel_output+0xcf/0x210
      [  192.953411]  ? lwtunnel_output+0x59/0x210
      [  192.953411]  ip6_local_out+0x38/0x70
      [  192.953411]  ip6_send_skb+0x2a/0xb0
      [  192.953411]  ip6_push_pending_frames+0x48/0x50
      [  192.953411]  rawv6_sendmsg+0xa39/0xf10
      [  192.953411]  ? __lock_acquire+0x489/0x890
      [  192.953411]  ? __mutex_lock+0x1fc/0x970
      [  192.953411]  ? __lock_acquire+0x489/0x890
      [  192.953411]  ? __mutex_lock+0x1fc/0x970
      [  192.953411]  ? tty_ioctl+0x283/0xec0
      [  192.953411]  inet_sendmsg+0x45/0x1d0
      [  192.953411]  ? _copy_from_user+0x54/0x80
      [  192.953411]  sock_sendmsg+0x33/0x40
      [  192.953411]  SYSC_sendto+0xef/0x170
      [  192.953411]  ? entry_SYSCALL_64_fastpath+0x5/0xc2
      [  192.953411]  ? trace_hardirqs_on_caller+0x12b/0x1b0
      [  192.953411]  ? trace_hardirqs_on_thunk+0x1a/0x1c
      [  192.953411]  SyS_sendto+0x9/0x10
      [  192.953411]  entry_SYSCALL_64_fastpath+0x1f/0xc2
      [  192.953411] RIP: 0033:0x7f94b453db33
      [  192.953411] RSP: 002b:00007fff5ecd0578 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      [  192.953411] RAX: ffffffffffffffda RBX: 00007fff5ecd16e0 RCX: 00007f94b453db33
      [  192.953411] RDX: 0000000000000040 RSI: 000055a78352e9c0 RDI: 0000000000000003
      [  192.953411] RBP: 00007fff5ecd1690 R08: 000055a78352c940 R09: 000000000000001c
      [  192.953411] R10: 0000000000000000 R11: 0000000000000246 R12: 000055a783321e10
      [  192.953411] R13: 000055a7839890c0 R14: 0000000000000004 R15: 0000000000000000
      [  192.953411] Code: 00 00 48 89 44 24 10 8b 87 c4 00 00 00 48 89 44 24 08 48 8b 87 d8 00 00 00 48 c7 c7 90 58 d2 81 48 89 04 24 31 c0 e8 4f 70 9a ff <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 48 8b 97 d8 00 00
      [  192.953411] RIP: skb_panic+0x61/0x70 RSP: ffffc90000ef7900
      [  193.000186] ---[ end trace bd0b89fabdf2f92c ]---
      [  193.000951] Kernel panic - not syncing: Fatal exception in interrupt
      [  193.001137] Kernel Offset: disabled
      [  193.001169] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
      
      Fixes: 19d5a26f ("ipv6: sr: expand skb head only if necessary")
      Signed-off-by: default avatarDavid Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      af3b5158
    • Ilan Tayari's avatar
      gso: Validate assumption of frag_list segementation · 7a7a9bd7
      Ilan Tayari authored
      Commit 07b26c94 ("gso: Support partial splitting at the frag_list
      pointer") assumes that all SKBs in a frag_list (except maybe the last
      one) contain the same amount of GSO payload.
      
      This assumption is not always correct, resulting in the following
      warning message in the log:
          skb_segment: too many frags
      
      For example, mlx5 driver in Striding RQ mode creates some RX SKBs with
      one frag, and some with 2 frags.
      After GRO, the frag_list SKBs end up having different amounts of payload.
      If this frag_list SKB is then forwarded, the aforementioned assumption
      is violated.
      
      Validate the assumption, and fall back to software GSO if it not true.
      
      Fixes: 07b26c94 ("gso: Support partial splitting at the frag_list pointer")
      Signed-off-by: default avatarIlan Tayari <ilant@mellanox.com>
      Signed-off-by: default avatarIlya Lesokhin <ilyal@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7a7a9bd7
    • Xin Long's avatar
      sctp: get list_of_streams of strreset outreq earlier · edb12f2d
      Xin Long authored
      Now when processing strreset out responses, it gets outreq->list_of_streams
      only when result is performed. But if result is not performed, str_p will
      be NULL. It will cause panic in sctp_ulpevent_make_stream_reset_event if
      nums is not 0.
      
      This patch is to fix it by getting outreq->list_of_streams earlier, and
      also to improve some codes for the strreset inreq process.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      edb12f2d
    • Chenbo Feng's avatar
      Add uid and cookie bpf helper to cg_skb_func_proto · 9fd0f315
      Chenbo Feng authored
      BPF helper functions get_socket_cookie and get_socket_uid can be
      used for network traffic classifications, among others. Expose
      them also to programs of type BPF_PROG_TYPE_CGROUP_SKB. As of
      commit 8f917bba ("bpf: pass sk to helper functions") the
      required skb->sk function is available at both cgroup bpf ingress
      and egress hooks. With these two new helper, cg_skb_func_proto is
      effectively the same as sk_filter_func_proto.
      
      Change since V1:
      Instead of add the helper to cg_skb_func_proto, redirect the
      cg_skb_func_proto to sk_filter_func_proto since all helper function
      in sk_filter_func_proto are applicable to cg_skb_func_proto now.
      Signed-off-by: default avatarChenbo Feng <fengc@google.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9fd0f315
    • Simon Xiao's avatar
      hv_netvsc: change netvsc device default duplex to FULL · f3c9d40e
      Simon Xiao authored
      The netvsc device supports full duplex by default.
      This warnings in log from bonding device which did not like
      seeing UNKNOWN duplex.
      Signed-off-by: default avatarSimon Xiao <sixiao@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f3c9d40e
    • stephen hemminger's avatar
      netvsc: fix RCU warning in get_stats · 776e726b
      stephen hemminger authored
      The statistics functionis called with RTNL held during probe
      but with RCU held during access from /proc and elsewhere.
      This is safe so update the lockdep annotation.
      Signed-off-by: default avatarStephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      776e726b
    • Dan Carpenter's avatar
      net: phy: test the right variable in phy_write_mmd() · 1dbba4cb
      Dan Carpenter authored
      This is a copy and paste buglet.  We meant to test for ->write_mmd but
      we test for ->read_mmd.
      
      Fixes: 1ee6b9bc ("net: phy: make phy_(read|write)_mmd() generic MMD accessors")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1dbba4cb
    • David S. Miller's avatar
      Merge branch 'for-upstream' of... · 450cc8cc
      David S. Miller authored
      Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
      
      Johan Hedberg says:
      
      ====================
      pull request: bluetooth-next 2017-04-14
      
      Here's the main batch of Bluetooth & 802.15.4 patches for the 4.12
      kernel.
      
       - Many fixes to 6LoWPAN, in particular for BLE
       - New CA8210 IEEE 802.15.4 device driver (accounting for most of the
         lines of code added in this pull request)
       - Added Nokia Bluetooth (UART) HCI driver
       - Some serdev & TTY changes that are dependencies for the Nokia
         driver (with acks from relevant maintainers and an agreement that
         these come through the bluetooth tree)
       - Support for new Intel Bluetooth device
       - Various other minor cleanups/fixes here and there
      
      Please let me know if there are any issues pulling. Thanks.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      450cc8cc
    • David S. Miller's avatar
      Merge branch 'bpf-lru-perf' · d584fec6
      David S. Miller authored
      Martin KaFai Lau says:
      
      ====================
      bpf: LRU performance and test-program improvements
      
      The first 4 patches make a few improvements to the LRU tests.
      
      Patch 5/6 is to improve the performance of BPF_F_NO_COMMON_LRU map.
      
      Patch 6/6 adds an example in using LRU map with map-in-map.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d584fec6
    • Martin KaFai Lau's avatar
      bpf: lru: Add map-in-map LRU example · 3a5795b8
      Martin KaFai Lau authored
      This patch adds a map-in-map LRU example.
      If we know only a subset of cores will use the
      LRU, we can allocate a common LRU list per targeting core
      and store it into an array-of-hashs.
      
      It allows using the common LRU map with map-update performance
      comparable to the BPF_F_NO_COMMON_LRU map but without wasting memory
      on the unused cores that we know they will never access the LRU map.
      
      BPF_F_NO_COMMON_LRU:
      > map_perf_test 32 8 10000000 10000000 | awk '{sum += $3}END{print sum}'
      9234314 (9.23M/s)
      
      map-in-map LRU:
      > map_perf_test 512 8 1260000 80000000 | awk '{sum += $3}END{print sum}'
      9962743 (9.96M/s)
      
      Notes that the max_entries for the map-in-map LRU test is 1260000 which
      is the max_entries for each inner LRU map.  8 processes have been
      started, so 8 * 1260000 = 10080000 (~10M) which is close to what is
      used in the BPF_F_NO_COMMON_LRU test.
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3a5795b8
    • Martin KaFai Lau's avatar
      bpf: lru: Lower the PERCPU_NR_SCANS from 16 to 4 · 695ba265
      Martin KaFai Lau authored
      After doing map_perf_test with a much bigger
      BPF_F_NO_COMMON_LRU map, the perf report shows a
      lot of time spent in rotating the inactive list (i.e.
      __bpf_lru_list_rotate_inactive):
      > map_perf_test 32 8 10000 1000000 | awk '{sum += $3}END{print sum}'
      19644783 (19M/s)
      > map_perf_test 32 8 10000000 10000000 |  awk '{sum += $3}END{print sum}'
      6283930 (6.28M/s)
      
      By inactive, it usually means the element is not in cache.  Hence,
      there is a need to tune the PERCPU_NR_SCANS value.
      
      This patch finds a better number of elements to
      scan during each list rotation.  The PERCPU_NR_SCANS (which
      is defined the same as PERCPU_FREE_TARGET) decreases
      from 16 elements to 4 elements.  This change only
      affects the BPF_F_NO_COMMON_LRU map.
      
      The test_lru_dist does not show meaningful difference
      between 16 and 4.  Our production L4 load balancer which uses
      the LRU map for conntrack-ing also shows little change in cache
      hit rate.  Since both benchmark and production data show no
      cache-hit difference, PERCPU_NR_SCANS is lowered from 16 to 4.
      We can consider making it configurable if we find a usecase
      later that shows another value works better and/or use
      a different rotation strategy.
      
      After this change:
      > map_perf_test 32 8 10000000 10000000 |  awk '{sum += $3}END{print sum}'
      9240324 (9.2M/s)
      
      i.e. 6.28M/s -> 9.2M/s
      
      The test_lru_dist has not shown meaningful difference:
      > test_lru_dist zipf.100k.a1_01.out 4000 1:
      nr_misses: 31575 (Before) vs 31566 (After)
      
      > test_lru_dist zipf.100k.a0_01.out 40000 1
      nr_misses: 67036 (Before) vs 67031 (After)
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      695ba265
    • Martin KaFai Lau's avatar
      bpf: Allow bpf sample programs (*_user.c) to change bpf_map_def · 9fd63d05
      Martin KaFai Lau authored
      The current bpf_map_def is statically defined during compile
      time.  This patch allows the *_user.c program to change it during
      runtime.  It is done by adding load_bpf_file_fixup_map() which
      takes a callback.  The callback will be called before creating
      each map so that it has a chance to modify the bpf_map_def.
      
      The current usecase is to change max_entries in map_perf_test.
      It is interesting to test with a much bigger map size in
      some cases (e.g. the following patch on bpf_lru_map.c).
      However,  it is hard to find one size to fit all testing
      environment.  Hence, it is handy to take the max_entries
      as a cmdline arg and then configure the bpf_map_def during
      runtime.
      
      This patch adds two cmdline args.  One is to configure
      the map's max_entries.  Another is to configure the max_cnt
      which controls how many times a syscall is called.
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9fd63d05
    • Martin KaFai Lau's avatar
      bpf: lru: Refactor LRU map tests in map_perf_test · bf8db5d2
      Martin KaFai Lau authored
      One more LRU test will be added later in this patch series.
      In this patch, we first move all existing LRU map tests into
      a single syscall (connect) first so that the future new
      LRU test can be added without hunting another syscall.
      
      One of the map name is also changed from percpu_lru_hash_map
      to nocommon_lru_hash_map to avoid the confusion with percpu_hash_map.
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bf8db5d2
    • Martin KaFai Lau's avatar
      bpf: lru: Cleanup test_lru_map.c · 6467acbc
      Martin KaFai Lau authored
      This patch does the following cleanup on test_lru_map.c
      1) Fix indentation (Replace spaces by tabs)
      2) Remove redundant BPF_F_NO_COMMON_LRU test
      3) Simplify some comments
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6467acbc
    • Martin KaFai Lau's avatar
      bpf: lru: Add test_lru_sanity6 for BPF_F_NO_COMMON_LRU · 9746f856
      Martin KaFai Lau authored
      test_lru_sanity3 is not applicable to BPF_F_NO_COMMON_LRU.
      It just happens to work when PERCPU_FREE_TARGET == 16.
      
      This patch:
      1) Disable test_lru_sanity3 for BPF_F_NO_COMMON_LRU
      2) Add test_lru_sanity6 to test list rotation for
         the BPF_F_NO_COMMON_LRU map.
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9746f856
    • Jisheng Zhang's avatar
      net: mvneta: fix failed to suspend if WOL is enabled · 82960fff
      Jisheng Zhang authored
      Recently, suspend/resume and WOL support are added into mvneta driver.
      If we enable WOL, then we get some error as below on Marvell BG4CT
      platforms during suspend:
      
      [  184.149723] dpm_run_callback(): mdio_bus_suspend+0x0/0x50 returns -16
      [  184.149727] PM: Device f7b62004.mdio-mi:00 failed to suspend: error -16
      
      -16 means -EBUSY, phy_suspend() will return -EBUSY if it finds the
      device has WOL enabled.
      
      We fix this issue by properly setting the netdev's power.can_wakeup
      and power.wakeup, i.e
      
      1. in mvneta_mdio_probe(), call device_set_wakeup_capable() to set
      power.can_wakeup if the phy support WOL.
      
      2. in mvneta_ethtool_set_wol(), call device_set_wakeup_enable() to
      set power.wakeup if WOL has been successfully enabled in phy.
      Signed-off-by: default avatarJisheng Zhang <jszhang@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      82960fff
    • Nikolay Aleksandrov's avatar
      net: bridge: notify on hw fdb takeover · cab93af0
      Nikolay Aleksandrov authored
      Recently we added support for SW fdbs to take over HW ones, but that
      results in changing a user-visible fdb flag thus we need to send a
      notification, also it's consistent with how HW takes over SW entries.
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cab93af0