1. 23 Feb, 2016 5 commits
  2. 22 Feb, 2016 35 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · dea08e60
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "Looks like a lot, but mostly driver fixes scattered all over as usual.
      
        Of note:
      
         1) Add conditional sched in nf conntrack in cleanup to avoid NMI
            watchdogs.  From Florian Westphal.
      
         2) Fix deadlock in nfnetlink cttimeout, also from Florian.
      
         3) Fix handling of slaves in bonding ARP monitor validation, from Jay
            Vosburgh.
      
         4) Callers of ip_cmsg_send() are responsible for freeing IP options,
            some were not doing so.  Fix from Eric Dumazet.
      
         5) Fix per-cpu bugs in mvneta driver, from Gregory CLEMENT.
      
         6) Fix vlan handling in mv88e6xxx DSA driver, from Vivien Didelot.
      
         7) bcm7xxx PHY driver bug fixes from Florian Fainelli.
      
         8) Avoid unaligned accesses to protocol headers wrt.  GRE, from
            Alexander Duyck.
      
         9) SKB leaks and other problems in arc_emac driver, from Alexander
            Kochetkov.
      
        10) tcp_v4_inbound_md5_hash() releases listener socket instead of
            request socket on error path, oops.  Fix from Eric Dumazet.
      
        11) Missing socket release in pppoe_rcv_core() that seems to have
            existed basically forever.  From Guillaume Nault.
      
        12) Missing slave_dev unregister in dsa_slave_create() error path,
            from Florian Fainelli.
      
        13) crypto_alloc_hash() never returns NULL, fix return value check in
            __tcp_alloc_md5sig_pool.  From Insu Yun.
      
        14) Properly expire exception route entries in ipv4, from Xin Long.
      
        15) Fix races in tcp/dccp listener socket dismantle, from Eric
            Dumazet.
      
        16) Don't set IFF_TX_SKB_SHARING in vxlan, geneve, or GRE, it's not
            legal.  These drivers modify the SKB on transmit.  From Jiri Benc.
      
        17) Fix regression in the initialization of netdev->tx_queue_len.
            From Phil Sutter.
      
        18) Missing unlock in tipc_nl_add_bc_link() error path, from Insu Yun.
      
        19) SCTP port hash sizing does not properly ensure that table is a
            power of two in size.  From Neil Horman.
      
        20) Fix initializing of software copy of MAC address in fmvj18x_cs
            driver, from Ken Kawasaki"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (129 commits)
        bnx2x: Fix 84833 phy command handler
        bnx2x: Fix led setting for 84858 phy.
        bnx2x: Correct 84858 PHY fw version
        bnx2x: Fix 84833 RX CRC
        bnx2x: Fix link-forcing for KR2
        net: ethernet: davicom: fix devicetree irq resource
        fmvj18x_cs: fix incorrect indexing of dev->dev_addr[] when copying the MAC address
        Driver: Vmxnet3: Update Rx ring 2 max size
        net: netcp: rework the code for get/set sw_data in dma desc
        soc: ti: knav_dma: rename pad in struct knav_dma_desc to sw_data
        net: ti: netcp: restore get/set_pad_info() functionality
        MAINTAINERS: Drop myself as xen netback maintainer
        sctp: Fix port hash table size computation
        can: ems_usb: Fix possible tx overflow
        Bluetooth: hci_core: Avoid mixing up req_complete and req_complete_skb
        net: bcmgenet: Fix internal PHY link state
        af_unix: Don't use continue to re-execute unix_stream_read_generic loop
        unix_diag: fix incorrect sign extension in unix_lookup_by_ino
        bnxt_en: Failure to update PHY is not fatal condition.
        bnxt_en: Remove unnecessary call to update PHY settings.
        ...
      dea08e60
    • Merge tag 'hwmon-for-linus-v4.5-rc6' of... · 5c102d0e
      Linus Torvalds authored
      Merge tag 'hwmon-for-linus-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
      
      Pull hwmon fixes from Guenter Roeck:
       "Two fixes headed for stable:
      
         - Remove an unnecessary speed_index lookup for thermal hook in the
           gpio-fan driver.  The unnecessary speed lookup can hog the system.
      
         - Handle negative conversion values correctly in the ads1015 driver"
      
      * tag 'hwmon-for-linus-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: (gpio-fan) Remove un-necessary speed_index lookup for thermal hook
        hwmon: (ads1015) Handle negative conversion values correctly
      5c102d0e
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · a16152c8
      Linus Torvalds authored
      Pull rdma fixes from Doug Ledford:
       "One ocrdma fix:
      
         - The new CQ API support was added to ocrdma, but they got the arming
           logic wrong, so without this, transfers eventually fail when they
           fail to arm the interrupt properly under load
      
        Two related fixes for mlx4:
      
         - When we added the 64bit extended counters support to the core IB
           code, they forgot to update the RoCE side of the mlx4 driver (the
           IB side they properly updated).
      
           I debated whether or not to include these patches as they could be
           considered feature enablement patches, but the existing code will
           blindly copy the 32bit counters, whether any counters were requested
           at all (a bug).
      
           These two patches make it (a) check to see that counters were
           requested and (b) copy the right counters (the 64bit support is
           new, the 32bit is not).  For that reason I went ahead and took
           them"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
        IB/mlx4: Add support for the port info class for RoCE ports
        IB/mlx4: Add support for extended counters over RoCE ports
        RDMA/ocrdma: Fix arm logic to align with new cq API
      a16152c8
    • Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 7ee302f6
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "Some bugfixes from I2C for you:
      
        A fix for a RuntimePM regression with OMAP, a fix to enable TCO for
        Lewisburg platforms, and a typo fix while we are here"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: i801: Adding Intel Lewisburg support for iTCO
        i2c: uniphier: fix typos in error messages
        i2c: omap: Fix PM regression with deferred probe for pm_runtime_reinit
      7ee302f6
    • tools, bpf_asm: simplify parser rule for BPF extensions · b1d95ae5
      Ray Bellis authored
      We can already use yylval in the lexer for encoding the BPF extension
      number, so that the parser rules can be further reduced to a single one
      for each B/H/W case.
      Signed-off-by: Ray Bellis <ray@isc.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b1d95ae5
    • netcp: use pointer to fix build fail · 0c71de66
      Sudip Mukherjee authored
      While building keystone_defconfig of arm we are getting build failure
      with the error:
      
      drivers/net/ethernet/ti/netcp_core.c:1846:31: error: invalid type argument of '->' (have 'struct tc_to_netdev')
        if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
                                     ^
      drivers/net/ethernet/ti/netcp_core.c:1851:35: error: invalid type argument of '->' (have 'struct tc_to_netdev')
            (dev->real_num_tx_queues < tc->tc))
                                         ^
      drivers/net/ethernet/ti/netcp_core.c:1855:8: error: invalid type argument of '->' (have 'struct tc_to_netdev')
        if (tc->tc) {
              ^
      drivers/net/ethernet/ti/netcp_core.c:1856:28: error: invalid type argument of '->' (have 'struct tc_to_netdev')
         netdev_set_num_tc(dev, tc->tc);
                                  ^
      drivers/net/ethernet/ti/netcp_core.c:1857:21: error: invalid type argument of '->' (have 'struct tc_to_netdev')
         for (i = 0; i < tc->tc; i++)
                           ^
      drivers/net/ethernet/ti/netcp_core.c: At top level:
      drivers/net/ethernet/ti/netcp_core.c:1879:2: warning: initialization from incompatible pointer type
        .ndo_setup_tc  = netcp_setup_tc,
        ^
      
      The callback of ndo_setup_tc should be:
      int (*ndo_setup_tc)(struct net_device *dev, u32 handle, __be16 protocol,
                          struct tc_to_netdev *tc);
      
      But we missed marking the last argument as a pointer.
      
      Fixes: 16e5cc64 ("net: rework setup_tc ndo op to consume general tc operand")
      CC: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0c71de66
    • Merge tag 'linux-can-fixes-for-4.5-20160221' of... · d856626d
      David S. Miller authored
      Merge tag 'linux-can-fixes-for-4.5-20160221' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2016-02-21
      
      this is a pull request of one patch for net/master.
      
      The patch is by Gerhard Uttenthaler and fixes a potential tx overflow in the
      ems_usb driver.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d856626d
    • Merge branch 'bnx2x-848xx-phy-fixes' · dd78dac8
      David S. Miller authored
      Yuval Mintz says:
      
      ====================
      bnx2x: Fix 848xx phys
      
      This series contains link-related fixes, mostly for the 848xx phys
      [2 patches are for 84833, and 2 patches are for 84858].
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dd78dac8
    • bnx2x: Fix 84833 phy command handler · 4ec0b6d5
      Yuval Mintz authored
      Current initialization sequence is lacking, causing some configurations
      to fail.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4ec0b6d5
    • Yuval Mintz's avatar
      bb1187af
    • bnx2x: Correct 84858 PHY fw version · 27ba2d2d
      Yuval Mintz authored
      The phy's firmware version isn't being parsed properly as it's
      currently parsed like the rest of the 848xx phys.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      27ba2d2d
    • bnx2x: Fix 84833 RX CRC · 512ab9a0
      Yuval Mintz authored
      There's a problem in current 84833 phy configuration -
      in case 1Gb link is configured and jumbo-sized packets are being
      used, device will experience RX crc errors.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      512ab9a0
    • bnx2x: Fix link-forcing for KR2 · 1e411f01
      Yuval Mintz authored
      Currently, when link is using KR2 it cannot be forced to any speed other
      than 20g.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.om>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1e411f01
    • Merge branch 'qed-next' · 0162a583
      David S. Miller authored
      Yuval Mintz says:
      
      ====================
      qed*: Driver updates
      
      This contains various minor changes to driver - changing memory allocation,
      fixing a small theoretical bug, as well as some mostly-semantic changes.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0162a583
    • Yuval Mintz's avatar
      d4ee5289
    • qed: Introduce DMA_REGPAIR_LE · 94494598
      Yuval Mintz authored
      FW hsi contains regpairs, mostly for 64-bit address representations.
      Since same paradigm is applied each time a regpair is filled, this
      introduces a new utility macro for setting such regpairs.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      94494598
    • qed: Change metadata needed for SPQ entries · 06f56b81
      Yuval Mintz authored
      Each configuration element sent via ramrod requires a Slow Path Queue
      entry. This slightly changes the way such an entry is configured, but
      contains mostly semantic changes [where more parameters are gathered
      in a sub-struct instead of being directly passed].
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      06f56b81
    • qed: Handle possible race in SB config · 0a0c5d3b
      Yuval Mintz authored
      Due to HW design, some of the memories are wide-bus and access to those
      needs to be serialized on a per-HW-block level; a read/write to a
      given HW-block might break another read/write to wide-bus memory done at
      roughly the same time.
      
      Status blocks initialization in CAU is done into such a wide-bus memory.
      This moves the initialization into using DMAE which is guaranteed to be
      safe to use on such memories.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0a0c5d3b
    • qed: Turn most GFP_ATOMIC into GFP_KERNEL · 60fffb3b
      Yuval Mintz authored
      Initial driver submission used GFP_ATOMIC almost exclusively when
      allocating memory. We now remedy this point, using GFP_KERNEL where
      it's possible.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      60fffb3b
    • Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · 9ca69b70
      David S. Miller authored
      Johan Hedberg says:
      
      ====================
      pull request: bluetooth 2016-02-20
      
      Here's an important patch for 4.5 which fixes potential invalid pointer
      access when processing completed Bluetooth HCI commands.
      
      Please let me know if there are any issues pulling. Thanks.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9ca69b70
    • Merge branch 'ipvlan-misc' · ea5b2f44
      David S. Miller authored
      Mahesh Bandewar says:
      
      ====================
      IPvlan misc patches
      
      This is a collection of unrelated patches for IPvlan driver.
      a. skb scrubbing changes are added to ensure that the packets hit the
      NF_HOOKS in the master's ns in L3 mode.
      b. The u16 change is a bug fix.
      c. The third patch groups tx/rx variables in a single cacheline.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ea5b2f44
    • ipvlan: misc changes · ab5b7013
      Mahesh Bandewar authored
      1. Scope correction for a few functions that are used in a single file.
      2. Adjust variables that are used in the fast path to fit into a single cacheline.
      3. Update rcv_frame() to skip the shared check for frames coming over the wire.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ab5b7013
    • ipvlan: mode is u16 · e93fbc5a
      Mahesh Bandewar authored
      The mode argument was erroneously defined as u32 but it has always
      been u16. Also use the ipvlan_set_mode() helper to set the mode instead
      of assigning directly. This should avoid future erroneous assignments /
      updates.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e93fbc5a
    • ipvlan: scrub skb before routing in L3 mode. · c3aaa06d
      Mahesh Bandewar authored
      Scrub skb before hitting the iptable hooks to ensure packets hit
      these hooks. Set the xnet param only when the packet is crossing the
      ns boundary so if the IPvlan slave and master belong to the same ns,
      the param will be set to false.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      CC: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c3aaa06d
    • net: ethernet: davicom: fix devicetree irq resource · b5a099c6
      Robert Jarzmik authored
      The dm9000 driver doesn't work in at least one device-tree
      configuration, emitting an error message about the irq resource:
      [    1.062495] dm9000 8000000.ethernet: insufficient resources
      [    1.068439] dm9000 8000000.ethernet: not found (-2).
      [    1.073451] dm9000: probe of 8000000.ethernet failed with error -2
      
      The reason is that the interrupt might be provided by a gpio
      controller that is not yet probed when dm9000 is probed, so the probe
      deferral mechanism needs to apply.
      
      Currently, the interrupt is directly taken from resources. This patch
      changes this to use the more generic platform_get_irq(), which handles
      the deferral.
      
      Moreover, since commit 7085a740 ("drivers: platform: parse
      IRQ flags from resources"), the interrupt trigger flags are honored in
      platform_get_irq(), so remove the needless code in dm9000.
      Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
      Acked-by: Marcel Ziswiler <marcel@ziswiler.com>
      Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Tested-by: Sergei Ianovich <ynvich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b5a099c6
    • Merge tag 'linux-can-next-for-4.6-20160220' of... · 86310cc4
      David S. Miller authored
      Merge tag 'linux-can-next-for-4.6-20160220' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can-next 2016-02-20
      
      this is a pull request of 9 patches for net-next/master.

      The first 3 patches are from Damien Riegel, they add support for
      Technologic Systems IP core to the sja1000 driver. The next 6 patches by
      Marek Vasut (including one by me) first clean up and sort the CAN driver's
      Kconfig and Makefiles and then add support for the IFI CANFD IP core.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      86310cc4
    • fmvj18x_cs: fix incorrect indexing of dev->dev_addr[] when copying the MAC address · 1ad54668
      Ken Kawasaki authored
      fix incorrect indexing of dev->dev_addr[] when copying the MAC address
      of FMV-J182 at buf[5].
      Signed-off-by: Ken Kawasaki <ken_kawasaki@nifty.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1ad54668
    • Merge branch 'bpf-helper-improvements' · 9c572dc4
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      BPF updates
      
      This set contains various updates for eBPF, i.e. the addition of a
      generic csum helper function and other misc bits that mostly improve
      existing helpers and ease programming with eBPF on cls_bpf. For more
      details, please see individual patches.
      
      Set is rebased on top of http://patchwork.ozlabs.org/patch/584465/.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9c572dc4
    • bpf: don't emit mov A,A on return · 6205b9cf
      Daniel Borkmann authored
      While debugging with bpf_jit_disasm I noticed emissions of 'mov %eax,%eax',
      and found that this comes from BPF_RET | BPF_A translations from classic
      BPF. Emitting this is unnecessary as BPF_REG_A is mapped into BPF_REG_0
      already, therefore only emit a mov when immediates are used as return value.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6205b9cf
    • bpf: fix csum update in bpf_l4_csum_replace helper for udp · 2f72959a
      Daniel Borkmann authored
      When using this helper for updating UDP checksums, we need to extend
      this in order to write CSUM_MANGLED_0 for csum computations that result
      in 0 as sum. The reason we need this is that packets with a checksum
      could otherwise become incorrectly marked as a packet without a checksum.
      Likewise, if the user indicates BPF_F_MARK_MANGLED_0, then we should
      not turn packets without a checksum into ones with a checksum.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2f72959a
    • bpf: try harder on clones when writing into skb · 3697649f
      Daniel Borkmann authored
      When we're dealing with clones and the area is not writeable, try
      harder and get a copy via pskb_expand_head(). Also replace other
      occurrences in tc actions with the new skb_try_make_writable().
      Reported-by: Ashhad Sheikh <ashhadsheikh394@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3697649f
    • bpf: remove artificial bpf_skb_{load, store}_bytes buffer limitation · 21cafc1d
      Daniel Borkmann authored
      We currently limit bpf_skb_store_bytes() and bpf_skb_load_bytes()
      helpers to only store or load a maximum buffer of 16 bytes. Thus,
      loading, rewriting and storing headers require several bpf_skb_load_bytes()
      and bpf_skb_store_bytes() calls.
      
      Also here we can use a per-cpu scratch buffer instead in order to not
      pressure stack space any further. I do suspect that this limit was mainly
      set in place for this particular reason. So, ease program development
      by removing this limitation and make the scratchpad generic, so it can
      be reused.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      21cafc1d
    • bpf: add generic bpf_csum_diff helper · 7d672345
      Daniel Borkmann authored
      For L4 checksums, we currently have bpf_l4_csum_replace() helper. It's
      currently limited to handle 2 and 4 byte changes in a header and feeds the
      from/to into inet_proto_csum_replace{2,4}() helpers of the kernel. When
      working with IPv6, for example, this makes it rather cumbersome to deal
      with, similarly when editing larger parts of a header.
      
      Instead, extend the API in a more generic way: For bpf_l4_csum_replace(),
      add a case for header field mask of 0 to change the checksum at a given
      offset through inet_proto_csum_replace_by_diff(), and provide a helper
      bpf_csum_diff() that can generically calculate a from/to diff for arbitrary
      amounts of data.
      
      This can be used in multiple ways: for the bpf_l4_csum_replace() only
      part, this even provides us with the option to insert precalculated diffs
      from user space f.e. from a map, or from bpf_csum_diff() during runtime.
      
      bpf_csum_diff() has an optional from/to stack buffer input, so we can
      calculate a diff by using a scratchbuffer for scenarios where we're
      inserting (from is NULL), removing (to is NULL) or diffing (from/to buffers
      don't need to be of equal size) data. Also, bpf_csum_diff() allows to
      feed a previous csum into csum_partial(), so the function can also be
      cascaded.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7d672345
    • bpf: add new arg_type that allows for 0 sized stack buffer · 8e2fe1d9
      Daniel Borkmann authored
      Currently, when we pass a buffer from the eBPF stack into a helper
      function, the function proto indicates argument types as ARG_PTR_TO_STACK
      and ARG_CONST_STACK_SIZE pair. If R<X> contains the former, then R<X+1>
      must be of the latter type. Then, verifier checks whether the buffer
      points into eBPF stack, is initialized, etc. The verifier also guarantees
      that the constant value passed in R<X+1> is greater than 0, so helper
      functions don't need to test for it and can always assume a non-NULL
      initialized buffer as well as non-0 buffer size.
      
      This patch adds a new argument type ARG_CONST_STACK_SIZE_OR_ZERO that
      allows NULL to be passed as R<X> and 0 as R<X+1> into the helper function.
      Such helper functions, of course, need to be able to handle these cases
      internally then. Verifier guarantees that either R<X> == NULL && R<X+1> == 0
      or R<X> != NULL && R<X+1> != 0 (like the case of ARG_CONST_STACK_SIZE), any
      other combinations are not possible to load.
      
      I went through various options of extending the verifier, and introducing
      the type ARG_CONST_STACK_SIZE_OR_ZERO seems to have most minimal changes
      needed to the verifier.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8e2fe1d9
    • Merge branch 'geneve-vxlan-outer-checksum' · 8b393f83
      David S. Miller authored
      Alexander Duyck says:
      
      ====================
      GENEVE/VXLAN: Enable outer Tx checksum by default
      
      This patch series makes it so that we enable the outer Tx checksum for IPv4
      tunnels by default.  This makes the behavior consistent with how we were
      handling this for IPv6.  In addition I have updated the internal flags for
      these tunnels so that we use a ZERO_CSUM_TX flag for IPv4 which should
      match up well with the ZERO_CSUM6_TX flag which was already in use for
      IPv6.
      
      For most network devices this should be a net gain in terms of performance
      as having the outer header checksum present allows for devices to report
      CHECKSUM_UNNECESSARY which we can then convert to CHECKSUM_COMPLETE in order
      to determine if the inner header checksum is valid.
      
      Below is some data I collected with ixgbe with an X540 that demonstrates
      this.  I located two PFs connected back to back in two different
      namespaces and then set up a pair of tunnels on each, one with checksum
      enabled and one without.
      
      Recv   Send    Send                          Utilization
      Socket Socket  Message  Elapsed              Send
      Size   Size    Size     Time     Throughput  local
      bytes  bytes   bytes    secs.    10^6bits/s  % S
      
      noudpcsum:
       87380  16384  16384    30.00      8898.67   12.80
      udpcsum:
       87380  16384  16384    30.00      9088.47   5.69
      
      The one spot where this may cause a performance regression is if the
      environment contains devices that can parse the inner headers and a device
      supports NETIF_F_GSO_UDP_TUNNEL but not NETIF_F_GSO_UDP_TUNNEL_CSUM.  In
      the case of such a device we have to fall back to using GSO to segment the
      tunnel instead of TSO and as a result we may take a performance hit as seen
      below with i40e.
      
      Recv   Send    Send                          Utilization
      Socket Socket  Message  Elapsed              Send
      Size   Size    Size     Time     Throughput  local
      bytes  bytes   bytes    secs.    10^6bits/s  % S
      
      noudpcsum:
       87380  16384  16384    30.00      9085.21   3.32
      udpcsum:
       87380  16384  16384    30.00      9089.23   5.54
      
      In addition it will be necessary to update iproute2 so that we don't
      provide the checksum attribute unless specified.  This way on older kernels
      which don't have local checksum offload we will default to disabling the
      outer checksum, and on newer kernels that have LCO we can default to
      enabling it.
      
      I also haven't investigated the effect this will have on OVS.  However I
      suspect the impact should be minimal as the worst case scenario should be
      that Tx checksumming will become enabled by default which should be
      consistent with the existing behavior for IPv6.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8b393f83