1. 04 Nov, 2016 2 commits
  2. 03 Nov, 2016 12 commits
  3. 02 Nov, 2016 18 commits
    • Govindarajulu Varadarajan's avatar
      enic: set skb->hash type properly · 17197236
      Govindarajulu Varadarajan authored
      Driver sets the skb l4/l3 hash based on NIC_CFG_RSS_HASH_TYPE_*,
      which is bit mask. This is wrong. Hw actually provides us enum.
      Use CQ_ENET_RQ_DESC_RSS_TYPE_* to set l3 and l4 hash type.
      
      Fixes: bf751ba8 ("driver/net: enic: record q_number and rss_hash for skb")
      Signed-off-by: default avatarGovindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      17197236
    • Philippe Reynes's avatar
      net: 3com: typhoon: use new api ethtool_{get|set}_link_ksettings · f7a5537c
      Philippe Reynes authored
      The ethtool api {get|set}_settings is deprecated.
      We move this driver to new api {get|set}_link_ksettings.
      Signed-off-by: default avatarPhilippe Reynes <tremyfr@gmail.com>
      Reviewed-by: default avatarDavid Dillow <dave@thedillows.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f7a5537c
    • Tom Herbert's avatar
      ila: Fix crash caused by rhashtable changes · 1913540a
      Tom Herbert authored
      commit ca26893f ("rhashtable: Add rhlist interface")
      added a field to rhashtable_iter so that length became 56 bytes
      and would exceed the size of args in netlink_callback (which is
      48 bytes). The netlink diag dump function already has been
      allocating a iter structure and storing the pointed to that
      in the args of netlink_callback. ila_xlat also uses
      rhahstable_iter but is still putting that directly in
      the arg block. Now since rhashtable_iter size is increased
      we are overwriting beyond the structure. The next field
      happens to be cb_mutex pointer in netlink_sock and hence the crash.
      
      Fix is to alloc the rhashtable_iter and save it as pointer
      in arg.
      
      Tested:
      
        modprobe ila
        ./ip ila add loc 3333:0:0:0 loc_match 2222:0:0:1,
        ./ip ila list  # NO crash now
      Signed-off-by: default avatarTom Herbert <tom@herbertland.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1913540a
    • Cyrill Gorcunov's avatar
      net: ip, diag -- Adjust raw_abort to use unlocked __udp_disconnect · 3de864f8
      Cyrill Gorcunov authored
      While being preparing patches for killing raw sockets via
      diag netlink interface I noticed that my runs are stuck:
      
       | [root@pcs7 ~]# cat /proc/`pidof ss`/stack
       | [<ffffffff816d1a76>] __lock_sock+0x80/0xc4
       | [<ffffffff816d206a>] lock_sock_nested+0x47/0x95
       | [<ffffffff8179ded6>] udp_disconnect+0x19/0x33
       | [<ffffffff8179b517>] raw_abort+0x33/0x42
       | [<ffffffff81702322>] sock_diag_destroy+0x4d/0x52
      
      which has not been the case before. I narrowed it down to the commit
      
       | commit 286c72de
       | Author: Eric Dumazet <edumazet@google.com>
       | Date:   Thu Oct 20 09:39:40 2016 -0700
       |
       |     udp: must lock the socket in udp_disconnect()
      
      where we start locking the socket for different reason.
      
      So the raw_abort escaped the renaming and we have to
      fix this typo using __udp_disconnect instead.
      
      Fixes: 286c72de ("udp: must lock the socket in udp_disconnect()")
      CC: David S. Miller <davem@davemloft.net>
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: David Ahern <dsa@cumulusnetworks.com>
      CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      CC: James Morris <jmorris@namei.org>
      CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Andrey Vagin <avagin@openvz.org>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3de864f8
    • Woojung Huh's avatar
      lan78xx: Use irq_domain for phy interrupt from USB Int. EP · cc89c323
      Woojung Huh authored
      To utilize phylib with interrupt fully than handling some of phy stuff in the MAC driver,
      create irq_domain for USB interrupt EP of phy interrupt and
      pass the irq number to phy_connect_direct() instead of PHY_IGNORE_INTERRUPT.
      
      Idea comes from drivers/gpio/gpio-dl2.c
      Signed-off-by: default avatarWoojung Huh <woojung.huh@microchip.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cc89c323
    • Eric Dumazet's avatar
      tcp: enhance tcp collapsing · 2331ccc5
      Eric Dumazet authored
      As Ilya Lesokhin suggested, we can collapse two skbs at retransmit
      time even if the skb at the right has fragments.
      
      We simply have to use more generic skb_copy_bits() instead of
      skb_copy_from_linear_data() in tcp_collapse_retrans()
      
      Also need to guard this skb_copy_bits() in case there is nothing to
      copy, otherwise skb_put() could panic if left skb has frags.
      
      Tested:
      
      Used following packetdrill test
      
      // Establish a connection.
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
         +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
         +0 bind(3, ..., ...) = 0
         +0 listen(3, 1) = 0
      
         +0 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 8>
         +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
      +.100 < . 1:1(0) ack 1 win 257
         +0 accept(3, ..., ...) = 4
      
         +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
         +0 write(4, ..., 200) = 200
         +0 > P. 1:201(200) ack 1
      +.001 write(4, ..., 200) = 200
         +0 > P. 201:401(200) ack 1
      +.001 write(4, ..., 200) = 200
         +0 > P. 401:601(200) ack 1
      +.001 write(4, ..., 200) = 200
         +0 > P. 601:801(200) ack 1
      +.001 write(4, ..., 200) = 200
         +0 > P. 801:1001(200) ack 1
      +.001 write(4, ..., 100) = 100
         +0 > P. 1001:1101(100) ack 1
      +.001 write(4, ..., 100) = 100
         +0 > P. 1101:1201(100) ack 1
      +.001 write(4, ..., 100) = 100
         +0 > P. 1201:1301(100) ack 1
      +.001 write(4, ..., 100) = 100
         +0 > P. 1301:1401(100) ack 1
      
      +.100 < . 1:1(0) ack 1 win 257 <nop,nop,sack 1001:1401>
      // Check that TCP collapse works :
         +0 > P. 1:1001(1000) ack 1
      Reported-by: default avatarIlya Lesokhin <ilyal@mellanox.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2331ccc5
    • Philippe Reynes's avatar
      net: 3c509: use new api ethtool_{get|set}_link_ksettings · b646cf29
      Philippe Reynes authored
      The ethtool api {get|set}_settings is deprecated.
      We move this driver to new api {get|set}_link_ksettings.
      Signed-off-by: default avatarPhilippe Reynes <tremyfr@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b646cf29
    • Philippe Reynes's avatar
      net: 3c59x: use new api ethtool_{get|set}_link_ksettings · e19b7883
      Philippe Reynes authored
      The ethtool api {get|set}_settings is deprecated.
      We move this driver to new api {get|set}_link_ksettings.
      Signed-off-by: default avatarPhilippe Reynes <tremyfr@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e19b7883
    • Philippe Reynes's avatar
      net: mii: add generic function to support ksetting support · bc8ee596
      Philippe Reynes authored
      The old ethtool api (get_setting and set_setting) has generic mii
      functions mii_ethtool_sset and mii_ethtool_gset.
      
      To support the new ethtool api ({get|set}_link_ksettings), we add
      two generics mii function mii_ethtool_{get|set}_link_ksettings_get.
      Signed-off-by: default avatarPhilippe Reynes <tremyfr@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bc8ee596
    • David S. Miller's avatar
      Merge branch 'mlx4-XDP-tx-refactor' · 55454e9e
      David S. Miller authored
      Tariq Toukan says:
      
      ====================
      mlx4 XDP TX refactor
      
      This patchset refactors the XDP forwarding case, so that
      its dedicated transmit queues are managed in a complete
      separation from the other regular ones.
      
      It also adds ethtool counters for XDP cases.
      
      Series generated against net-next commit:
      22ca904a genetlink: fix error return code in genl_register_family()
      
      Thanks,
      Tariq.
      
      v3:
      * Exposed per ring counters.
      
      v2:
      * Added ethtool counters.
      * Rebased, now patch 2 reverts Brenden's fix, as the bug no longer exists:
        958b3d39 ("net/mlx4_en: fixup xdp tx irq to match rx")
      * Updated commit message of patch 2.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      55454e9e
    • Tariq Toukan's avatar
      net/mlx4_en: Add ethtool statistics for XDP cases · 15fca2c8
      Tariq Toukan authored
      XDP statistics are reported in ethtool, in total and per ring,
      as follows:
      - xdp_drop: the number of packets dropped by xdp.
      - xdp_tx: the number of packets forwarded by xdp.
      - xdp_tx_full: the number of times an xdp forward failed
      	due to a full tx xdp ring.
      
      In addition, all packets that are dropped/forwarded by XDP
      are no longer accounted in rx_packets/rx_bytes of the ring,
      so that they count traffic that is passed to the stack.
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      15fca2c8
    • Tariq Toukan's avatar
      net/mlx4_en: Refactor the XDP forwarding rings scheme · 67f8b1dc
      Tariq Toukan authored
      Separately manage the two types of TX rings: regular ones, and XDP.
      Upon an XDP set, do not borrow regular TX rings and convert them
      into XDP ones, but allocate new ones, unless we hit the max number
      of rings.
      Which means that in systems with smaller #cores we will not consume
      the current TX rings for XDP, while we are still in the num TX limit.
      
      XDP TX rings counters are not shown in ethtool statistics.
      Instead, XDP counters will be added to the respective RX rings
      in a downstream patch.
      
      This has no performance implications.
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      67f8b1dc
    • Tariq Toukan's avatar
      net/mlx4_en: Add TX_XDP for CQ types · ccc109b8
      Tariq Toukan authored
      Support XDP CQ type, and refactor the CQ type enum.
      Rename the is_tx field to match the change.
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ccc109b8
    • Xin Long's avatar
      sctp: clean up sctp_packet_transmit · e4ff952a
      Xin Long authored
      After adding sctp gso, sctp_packet_transmit is a quite big function now.
      
      This patch is to extract the codes for packing packet to sctp_packet_pack
      from sctp_packet_transmit, and add some comments, simplify the err path by
      freeing auth chunk when freeing packet chunk_list in out path and freeing
      head skb early if it fails to pack packet.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e4ff952a
    • David S. Miller's avatar
      Merge branch 'cls_flower-misc' · 92901827
      David S. Miller authored
      Roi Dayan says:
      
      ====================
      misc TC/flower changes
      
      This series includes two small changes to the TC flower classifier.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      92901827
    • Roi Dayan's avatar
      net/sched: cls_flower: merge filter delete/destroy common code · 13fa876e
      Roi Dayan authored
      Move common code from fl_delete and fl_detroy to __fl_delete.
      Signed-off-by: default avatarRoi Dayan <roid@mellanox.com>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      13fa876e
    • Roi Dayan's avatar
      net/sched: cls_flower: add missing unbind call when destroying flows · a1a8f7fe
      Roi Dayan authored
      tcf_unbind was called in fl_delete but was missing in fl_destroy when
      force deleting flows.
      
      Fixes: 77b9900e ('tc: introduce Flower classifier')
      Signed-off-by: default avatarRoi Dayan <roid@mellanox.com>
      Reviewed-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a1a8f7fe
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 4cb551a1
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      The following patchset contains Netfilter updates for your net-next
      tree. This includes better integration with the routing subsystem for
      nf_tables, explicit notrack support and smaller updates. More
      specifically, they are:
      
      1) Add fib lookup expression for nf_tables, from Florian Westphal. This
         new expression provides a native replacement for iptables addrtype
         and rp_filter matches. This is more flexible though, since we can
         populate the kernel flowi representation to inquire fib to
         accomodate new usecases, such as RTBH through skb mark.
      
      2) Introduce rt expression for nf_tables, from Anders K. Pedersen. This
         new expression allow you to access skbuff route metadata, more
         specifically nexthop and classid fields.
      
      3) Add notrack support for nf_tables, to skip conntracking, requested by
         many users already.
      
      4) Add boilerplate code to allow to use nf_log infrastructure from
         nf_tables ingress.
      
      5) Allow to mangle pkttype from nf_tables prerouting chain, to emulate
         the xtables cluster match, from Liping Zhang.
      
      6) Move socket lookup code into generic nf_socket_* infrastructure so
         we can provide a native replacement for the xtables socket match.
      
      7) Make sure nfnetlink_queue data that is updated on every packets is
         placed in a different cache from read-only data, from Florian Westphal.
      
      8) Handle NF_STOLEN from nf_tables core, also from Florian Westphal.
      
      9) Start round robin number generation in nft_numgen from zero,
         instead of n-1, for consistency with xtables statistics match,
         patch from Liping Zhang.
      
      10) Set GFP_NOWARN flag in skbuff netlink allocations in nfnetlink_log,
          given we retry with a smaller allocation on failure, from Calvin Owens.
      
      11) Cleanup xt_multiport to use switch(), from Gao feng.
      
      12) Remove superfluous check in nft_immediate and nft_cmp, from
          Liping Zhang.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4cb551a1
  4. 01 Nov, 2016 8 commits
    • Florian Westphal's avatar
      netfilter: nf_queue: place volatile data in own cacheline · 886bc503
      Florian Westphal authored
      As the comment indicates, the data at the end of nfqnl_instance struct is
      written on every queue/dequeue, so it should reside in its own cacheline.
      
      Before this change, 'lock' was in first cacheline so we dirtied both.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      886bc503
    • Liping Zhang's avatar
      netfilter: nf_tables: remove useless U8_MAX validation · e41e9d62
      Liping Zhang authored
      After call nft_data_init, size is already validated and desc.len will
      not exceed the sizeof(struct nft_data), i.e. 16 bytes. So it will never
      exceed U8_MAX.
      
      Furthermore, in nft_immediate_init, we forget to call nft_data_uninit
      when desc.len exceeds U8_MAX, although this will not happen, but it's
      a logical mistake.
      
      Now remove these redundant validation introduced by commit 36b701fa
      ("netfilter: nf_tables: validate maximum value of u32 netlink attributes")
      Signed-off-by: default avatarLiping Zhang <zlpnobody@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      e41e9d62
    • Anders K. Pedersen's avatar
      netfilter: nf_tables: introduce routing expression · 2fa84193
      Anders K. Pedersen authored
      Introduces an nftables rt expression for routing related data with support
      for nexthop (i.e. the directly connected IP address that an outgoing packet
      is sent to), which can be used either for matching or accounting, eg.
      
       # nft add rule filter postrouting \
      	ip daddr 192.168.1.0/24 rt nexthop != 192.168.0.1 drop
      
      This will drop any traffic to 192.168.1.0/24 that is not routed via
      192.168.0.1.
      
       # nft add rule filter postrouting \
      	flow table acct { rt nexthop timeout 600s counter }
       # nft add rule ip6 filter postrouting \
      	flow table acct { rt nexthop timeout 600s counter }
      
      These rules count outgoing traffic per nexthop. Note that the timeout
      releases an entry if no traffic is seen for this nexthop within 10 minutes.
      
       # nft add rule inet filter postrouting \
      	ether type ip \
      	flow table acct { rt nexthop timeout 600s counter }
       # nft add rule inet filter postrouting \
      	ether type ip6 \
      	flow table acct { rt nexthop timeout 600s counter }
      
      Same as above, but via the inet family, where the ether type must be
      specified explicitly.
      
      "rt classid" is also implemented identical to "meta rtclassid", since it
      is more logical to have this match in the routing expression going forward.
      Signed-off-by: default avatarAnders K. Pedersen <akp@cohaesio.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      2fa84193
    • Pablo Neira Ayuso's avatar
      netfilter: move socket lookup infrastructure to nf_socket_ipv{4,6}.c · 8db4c5be
      Pablo Neira Ayuso authored
      We need this split to reuse existing codebase for the upcoming nf_tables
      socket expression.
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      8db4c5be
    • Pablo Neira Ayuso's avatar
      netfilter: nf_log: add packet logging for netdev family · 1fddf4ba
      Pablo Neira Ayuso authored
      Move layer 2 packet logging into nf_log_l2packet() that resides in
      nf_log_common.c, so this can be shared by both bridge and netdev
      families.
      
      This patch adds the boiler plate code to register the netdev logging
      family.
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      1fddf4ba
    • Florian Westphal's avatar
      netfilter: nf_tables: add fib expression · f6d0cbcf
      Florian Westphal authored
      Add FIB expression, supported for ipv4, ipv6 and inet family (the latter
      just dispatches to ipv4 or ipv6 one based on nfproto).
      
      Currently supports fetching output interface index/name and the
      rtm_type associated with an address.
      
      This can be used for adding path filtering. rtm_type is useful
      to e.g. enforce a strong-end host model where packets
      are only accepted if daddr is configured on the interface the
      packet arrived on.
      
      The fib expression is a native nftables alternative to the
      xtables addrtype and rp_filter matches.
      
      FIB result order for oif/oifname retrieval is as follows:
       - if packet is local (skb has rtable, RTF_LOCAL set, this
         will also catch looped-back multicast packets), set oif to
         the loopback interface.
       - if fib lookup returns an error, or result points to local,
         store zero result.  This means '--local' option of -m rpfilter
         is not supported. It is possible to use 'fib type local' or add
         explicit saddr/daddr matching rules to create exceptions if this
         is really needed.
       - store result in the destination register.
         In case of multiple routes, search set for desired oif in case
         strict matching is requested.
      
      ipv4 and ipv6 behave fib expressions are supposed to behave the same.
      
      [ I have collapsed Arnd Bergmann's ("netfilter: nf_tables: fib warnings")
      
      	http://patchwork.ozlabs.org/patch/688615/
      
        to address fallout from this patch after rebasing nf-next, that was
        posted to address compilation warnings. --pablo ]
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      f6d0cbcf
    • Wei Yongjun's avatar
      genetlink: fix error return code in genl_register_family() · 22ca904a
      Wei Yongjun authored
      Fix to return a negative error code from the idr_alloc() error handling
      case instead of 0, as done elsewhere in this function.
      
      Also fix the return value check of idr_alloc() since idr_alloc return
      negative errors on failure, not zero.
      
      Fixes: 2ae0f17d ("genetlink: use idr to track families")
      Signed-off-by: default avatarWei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      22ca904a
    • David Ahern's avatar
      net: Enable support for VRF with ipv4 multicast · e58e4159
      David Ahern authored
      Enable support for IPv4 multicast:
      - similar to unicast the flow struct is updated to L3 master device
        if relevant prior to calling fib_rules_lookup. The table id is saved
        to the lookup arg so the rule action for ipmr can return the table
        associated with the device.
      
      - ip_mr_forward needs to check for master device mismatch as well
        since the skb->dev is set to it
      
      - allow multicast address on VRF device for Rx by checking for the
        daddr in the VRF device as well as the original ingress device
      
      - on Tx need to drop to __mkroute_output when FIB lookup fails for
        multicast destination address.
      
      - if CONFIG_IP_MROUTE_MULTIPLE_TABLES is enabled VRF driver creates
        IPMR FIB rules on first device create similar to FIB rules. In
        addition the VRF driver does not divert IPv4 multicast packets:
        it breaks on Tx since the fib lookup fails on the mcast address.
      
      With this patch, ipmr forwarding and local rx/tx work.
      Signed-off-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e58e4159