1. 13 Feb, 2020 12 commits
    • Merge branch 'icmp-account-for-NAT-when-sending-icmps-from-ndo-layer' · 803381f9
      David S. Miller authored
      Jason A. Donenfeld says:
      
      ====================
      icmp: account for NAT when sending icmps from ndo layer
      
      The ICMP routines use the source address for two reasons:
      
      1. Rate-limiting ICMP transmissions based on source address, so
         that one source address cannot provoke a flood of replies. If
         the source address is wrong, the rate limiting will be
         incorrectly applied.
      
      2. Choosing the interface and hence new source address of the
         generated ICMP packet. If the original packet source address
         is wrong, ICMP replies will be sent from the wrong source
         address, resulting in misdelivery, an infoleak, or just
         general network admin confusion.
      
      Most of the time, the icmp_send and icmpv6_send routines can just reach
      down into the skb's IP header to determine the saddr. However, if
      icmp_send or icmpv6_send is being called from a network device driver --
      there are a few in the tree -- then it's possible that by the time
      icmp_send or icmpv6_send looks at the packet, the packet's source
      address has already been transformed by SNAT or MASQUERADE or some other
      transformation that CONNTRACK knows about. In this case, the packet's
      source address is most certainly the *wrong* source address to be used
      for the purpose of ICMP replies.
      
      Rather, the source address we want to use for ICMP replies is the
      original one, from before the transformation occurred.
      
      Fortunately, it's very easy to just ask CONNTRACK if it knows about this
      packet, and if so, how to fix it up. The saddr is the only field in the
      header we need to fix up, for the purposes of the subsequent processing
      in the icmp_send and icmpv6_send functions, so we do the lookup very
      early on, so that the rest of the ICMP machinery can progress as usual.
      
      Changes v3->v4:
      - Add back the skb_shared checking, since the previous assumption isn't
        actually true [Eric]. This implies dropping the additional patches v3 had
        for removing skb_share_check from various drivers. We can revisit that
        general set of ideas later, but that's probably better suited as a net-next
        patchset rather than this stable one which is geared at fixing bugs. So,
        this implements things in the safe conservative way.
      
      Changes v2->v3:
      - Add selftest to ensure this actually does what we want and never regresses.
      - Check the size of the skb header before operating on it.
      - Use skb_ensure_writable to ensure we can modify the cloned skb [Florian].
      - Conditionalize this on IPS_SRC_NAT so we don't do anything unnecessarily
        [Florian].
      - It turns out that since we're calling these from the xmit path,
        skb_share_check isn't required, so remove that [Florian]. This simplifies the
        code a bit too. **The supposition here is that skbs passed to ndo_start_xmit
        are _never_ shared. If this is not correct NOW IS THE TIME TO PIPE UP, for
        doom awaits us later.**
      - While investigating the shared skb business, several drivers appeared to be
        calling it incorrectly in the xmit path, so this series also removes those
        unnecessary calls, based on the supposition mentioned in the previous point.
      
      Changes v1->v2:
      - icmpv6 takes subtly different types than icmpv4, like u32 instead of be32,
        u8 instead of int.
      - Since we're technically writing to the skb, we need to make sure it's not
        a shared one [Dave, 2017].
      - Restore the original skb data after icmp_send returns. All current users
        are freeing the packet right after, so it doesn't matter, but future users
        might not.
      - Remove superfluous route lookup in sunvnet [Dave].
      - Use NF_NAT instead of NF_CONNTRACK for condition [Florian].
      - Include this cover letter [Dave].
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
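      Point 1 of the cover letter can be illustrated with a toy limiter keyed on source address. This is a minimal sketch, not the kernel's limiter (which is time-based and per peer): `icmp_rate_allow`, `ICMP_BURST` and the fixed budget without refill are all hypothetical. It shows why the address must be the pre-NAT one: two hosts masqueraded behind one address would drain a single shared budget.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy per-source ICMP rate limiter. Hypothetical: each source gets a
 * fixed budget, and token refill over time is omitted. */
#define MAX_SOURCES 16
#define ICMP_BURST 2 /* replies allowed per source (assumed value) */

struct src_bucket {
    uint32_t saddr; /* IPv4 source address, host byte order */
    int tokens;     /* replies still allowed for this source */
    bool in_use;
};

static struct src_bucket buckets[MAX_SOURCES];

/* Find the bucket for saddr, creating one with a full budget if needed. */
static struct src_bucket *bucket_for(uint32_t saddr)
{
    for (size_t i = 0; i < MAX_SOURCES; i++)
        if (buckets[i].in_use && buckets[i].saddr == saddr)
            return &buckets[i];
    for (size_t i = 0; i < MAX_SOURCES; i++) {
        if (!buckets[i].in_use) {
            buckets[i] = (struct src_bucket){ saddr, ICMP_BURST, true };
            return &buckets[i];
        }
    }
    return NULL; /* table full: refuse rather than reply without limit */
}

/* Returns true if an ICMP reply provoked by saddr may be sent now. */
static bool icmp_rate_allow(uint32_t saddr)
{
    struct src_bucket *b = bucket_for(saddr);

    if (!b || b->tokens == 0)
        return false;
    b->tokens--;
    return true;
}
```

      For example, a third request from 10.0.0.1 is refused while 10.0.0.2 still gets a reply; if SNAT had already rewritten both to one address, they would share one budget.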
    • xfrm: interface: use icmp_ndo_send helper · 45942ba8
      Jason A. Donenfeld authored
      Because xfrmi is calling icmp from network device context, it should use
      the ndo helper so that the rate limiting applies correctly.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • wireguard: device: use icmp_ndo_send helper · a12d7f3c
      Jason A. Donenfeld authored
      Because wireguard is calling icmp from network device context, it should
      use the ndo helper so that the rate limiting applies correctly.  This
      commit adds a small test to the wireguard test suite to ensure that the
      new functions continue doing the right thing in the context of
      wireguard. It does this by setting up a condition that will definitely
      evoke an icmp error message from the driver, but along a nat'd path.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sunvnet: use icmp_ndo_send helper · 67c9a7e1
      Jason A. Donenfeld authored
      Because sunvnet is calling icmp from network device context, it should use
      the ndo helper so that the rate limiting applies correctly. While we're
      at it, doing the additional route lookup before calling icmp_ndo_send is
      superfluous, since this is the job of the icmp code in the first place.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: Shannon Nelson <shannon.nelson@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gtp: use icmp_ndo_send helper · e0fce6f9
      Jason A. Donenfeld authored
      Because gtp is calling icmp from network device context, it should use
      the ndo helper so that the rate limiting applies correctly.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: Harald Welte <laforge@gnumonks.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • icmp: introduce helper for nat'd source address in network device context · 0b41713b
      Jason A. Donenfeld authored
      This introduces a helper function to be called only by network drivers
      that wraps calls to icmp[v6]_send in a conntrack transformation, in case
      NAT has been used. We don't want to pollute the non-driver path, though,
      so we introduce this as a helper to be called by places that actually
      make use of this, as suggested by Florian.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
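      The shape of such a helper can be sketched in userspace. Everything below is hypothetical: `struct pkt` and `struct conn` stand in for the skb and its conntrack entry, `src_nat` plays the role of IPS_SRC_NAT, and the skb_shared/skb_ensure_writable handling and the IPv6 variant described in the cover letter are omitted. The point is the order of operations: swap in the original source address, run the normal ICMP path, then restore the header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: not the kernel's sk_buff or conntrack API. */
struct conn {
    uint32_t orig_saddr; /* source address in the original direction */
    int src_nat;         /* nonzero if SNAT/MASQUERADE was applied */
};

struct pkt {
    uint32_t saddr;        /* source address as currently in the header */
    uint32_t daddr;
    const struct conn *ct; /* tracked connection for this packet, or NULL */
};

typedef void (*icmp_send_fn)(const struct pkt *p, int type, int code);

/* Demo callback that records the source address it was handed. */
static uint32_t last_seen_saddr;

static void record_send(const struct pkt *p, int type, int code)
{
    (void)type;
    (void)code;
    last_seen_saddr = p->saddr;
}

/* Undo SNAT on the source address, call the normal ICMP path,
 * then restore the header for callers that keep the packet. */
static void icmp_ndo_send_sketch(struct pkt *p, int type, int code,
                                 icmp_send_fn send)
{
    uint32_t saved;

    if (!p->ct || !p->ct->src_nat) { /* not NAT'd: nothing to fix up */
        send(p, type, code);
        return;
    }
    saved = p->saddr;
    p->saddr = p->ct->orig_saddr; /* rate limiting and routing see the real sender */
    send(p, type, code);
    p->saddr = saved;
}
```

      Keeping the fixup in a separate helper, rather than inside the ICMP send path itself, matches the commit's stated goal of not touching the non-driver path.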
    • Merge branch 'skip_sw-skip_hw-validation' · 07134cf6
      David S. Miller authored
      Davide Caratti says:
      
      ====================
      add missing validation of 'skip_hw/skip_sw'
      
      ensure that all classifiers currently supporting HW offload
      validate the 'flags' parameter provided by user:
      
      - patch 1/2 fixes cls_matchall
      - patch 2/2 fixes cls_flower
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: flower: add missing validation of TCA_FLOWER_FLAGS · e2debf08
      Davide Caratti authored
      unlike other classifiers that can be offloaded (i.e. users can set flags
      like 'skip_hw' and 'skip_sw'), 'cls_flower' doesn't validate the size of
      netlink attribute 'TCA_FLOWER_FLAGS' provided by user: add a proper entry
      to fl_policy.
      
      Fixes: 5b33f488 ("net/flower: Introduce hardware offload support")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: matchall: add missing validation of TCA_MATCHALL_FLAGS · 1afa3cc9
      Davide Caratti authored
      unlike other classifiers that can be offloaded (i.e. users can set flags
      like 'skip_hw' and 'skip_sw'), 'cls_matchall' doesn't validate the size
      of netlink attribute 'TCA_MATCHALL_FLAGS' provided by user: add a proper
      entry to mall_policy.
      
      Fixes: b87f7936 ("net/sched: Add match-all classifier hw offloading.")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
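      Both fixes close the same gap: the flags attribute was read without the policy first checking its length. The following userspace model is hypothetical (`struct attr`, `policy_min_len` and `get_flags` are illustrative, not kernel API) and shows why the table entry matters; in the kernel such an entry is typically declared as `{ .type = NLA_U32 }`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified attribute: type, payload length, payload pointer.
 * Not the kernel's struct nlattr. */
struct attr {
    uint16_t type;
    uint16_t len; /* payload bytes supplied by userspace */
    const uint8_t *data;
};

enum { ATTR_UNSPEC, ATTR_CLASSID, ATTR_FLAGS, ATTR_MAX = ATTR_FLAGS };

/* Minimum payload length per attribute; 0 means "not validated".
 * The ATTR_FLAGS entry is the kind of line these fixes add. */
static const uint16_t policy_min_len[ATTR_MAX + 1] = {
    [ATTR_CLASSID] = sizeof(uint32_t),
    [ATTR_FLAGS]   = sizeof(uint32_t),
};

static bool attr_valid(const struct attr *a)
{
    return a->type <= ATTR_MAX && a->len >= policy_min_len[a->type];
}

/* Copy out a u32 flags word only after validation, so a 1-byte
 * attribute cannot cause a 4-byte read past its payload. */
static bool get_flags(const struct attr *a, uint32_t *flags)
{
    if (a->type != ATTR_FLAGS || !attr_valid(a))
        return false;
    memcpy(flags, a->data, sizeof(*flags));
    return true;
}
```

      Without the ATTR_FLAGS entry, its minimum length would be 0 and a truncated attribute would pass validation.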
    • net/flow_dissector: remove unexist field description · 6ee2deb6
      Hangbin Liu authored
      @thoff has moved to struct flow_dissector_key_control.
      
      Fixes: 42aecaa9 ("net: Get skb hash over flow_keys structure")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • page_pool: refill page when alloc.count of pool is zero · 304db6cb
      Li RongQing authored
      "do {} while" in page_pool_refill_alloc_cache will always
      refill a page once, whether refill is true or false, and whether
      alloc.count of the pool is less than PP_ALLOC_CACHE_REFILL or not.
      This is wrong and will cause an overflow of pool->alloc.cache.
      
      The caller of __page_pool_get_cached should guarantee that
      pool->alloc.cache is safe to access, so the in_serving_softirq
      check should be removed, as suggested by Jesper Dangaard Brouer in
      https://patchwork.ozlabs.org/patch/1233713/
      
      So fix this issue by calling page_pool_refill_alloc_cache()
      only when pool->alloc.count is zero.
      
      Fixes: 44768dec ("page_pool: handle page recycle for NUMA_NO_NODE condition")
      Signed-off-by: Li RongQing <lirongqing@baidu.com>
      Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
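      The invariant behind this fix can be shown with a simplified userspace model. It is a sketch under assumptions: pages are plain ints, the sizes are illustrative stand-ins for the kernel's cache constants, the ptr_ring is a flat array, and the NUMA check is omitted. Because a do/while body always runs once, the refill is entered only when the cache is empty, which the assert records.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of the page_pool alloc cache; pages are plain
 * ints (0 = none). Sizes are assumed values for illustration. */
#define CACHE_SIZE 128  /* capacity of the alloc cache */
#define CACHE_REFILL 64 /* refill target */
#define RING_SIZE 256

struct pool {
    size_t count;          /* pages currently in cache[] */
    int cache[CACHE_SIZE];
    int ring[RING_SIZE];   /* stands in for the ring of recycled pages */
    size_t ring_head;      /* next ring slot to consume */
    size_t ring_fill;      /* number of valid ring slots */
};

/* Take one page from the ring, or 0 if it is empty. */
static int ring_consume(struct pool *p)
{
    if (p->ring_head == p->ring_fill)
        return 0;
    return p->ring[p->ring_head++];
}

/* The do/while body runs at least once, so it must only be entered
 * when the cache is empty; otherwise a full cache would overflow. */
static int refill_alloc_cache(struct pool *p)
{
    int page;

    assert(p->count == 0);
    do {
        page = ring_consume(p);
        if (!page)
            break;
        p->cache[p->count++] = page;
    } while (p->count < CACHE_REFILL);

    /* Hand the most recently cached page to the caller. */
    return p->count > 0 ? p->cache[--p->count] : 0;
}

/* Fixed fast path: use the cache if non-empty, refill only at zero. */
static int get_cached_page(struct pool *p)
{
    if (p->count > 0)
        return p->cache[--p->count];
    return refill_alloc_cache(p);
}
```

      With 200 pages in the ring, 200 calls each return a page, the cache never holds more than CACHE_REFILL - 1 pages between calls, and the next call returns 0.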
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · 89e960b5
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2020-02-12
      
      This series contains fixes to only the ice driver.
      
      Dave fixes logic flaws in the DCB rebuild function which is used after a
      reset.  Also fixed a configuration issue when switching between firmware
      and software LLDP mode, where the number of TLVs configured was getting
      out of sync with what lldpad thinks is configured.
      
      Paul fixes how the driver displayed all the supported and advertised
      link modes by basing it on the PHY capabilities, and in the process
      cleaned up a lot of code.
      
      Brett fixes duplicate receive tail bumps by comparing the value we are
      writing to tail with the previously written tail value.  Also cleaned up
      workarounds that are no longer needed with the latest NVM images.
      
      Anirudh cleaned up unnecessary CONFIG_PCI_IOV wrappers.  Updated the
      driver to use ice_pf_to_dev() instead of &pf->pdev->dev or
      &vsi->back->pdev->dev.  Cleaned up the string format in print function
      calls to remove newlines where applicable.
      
      Akeem updates the link message logging to include "Full Duplex" and
      "Negotiated", to help distinguish from "Requested" for FEC.
      
      Bruce fixes and consolidates the logging of firmware/NVM information
      during driver load, since the information duplicates what is
      available via ethtool.  Fixed the checking of the Unit Load Status bits
      after reset to ensure they are 0x7FF before continuing, by updating the
      mask.  Cleaned up possible NULL dereferences that were created by a
      previous commit.
      
      Ben fixes the driver to use the correct netif_msg_tx/rx_error() to
      determine whether to print the MDD event type.
      
      Tony provides several trivial fixes, which include whitespace, typos,
      function header comments, and reverse Christmas tree issues.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
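      Brett's receive-tail fix rests on a compare-before-write idea. This is a minimal sketch with a hypothetical `struct rx_ring` and `rx_ring_write_tail`, not the ice driver's own code: the tail register is written only when the new value differs from the last one written, and a counter stands in for the MMIO write.

```c
#include <stdint.h>

/* Hypothetical RX ring model; tail_writes counts what would be MMIO
 * writes to the hardware tail register. Not the ice driver's code. */
struct rx_ring {
    uint32_t last_tail;   /* value most recently written to tail */
    int tail_valid;       /* 0 until the first write */
    unsigned tail_writes; /* number of (simulated) register writes */
};

/* Bump the tail only if the new value differs from the last one written. */
static void rx_ring_write_tail(struct rx_ring *r, uint32_t val)
{
    if (r->tail_valid && r->last_tail == val)
        return; /* duplicate bump: skip the register write */
    r->last_tail = val;
    r->tail_valid = 1;
    r->tail_writes++; /* a real driver would write the register here */
}
```

      Writing 8, 8, then 16 results in two register writes rather than three.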
  2. 12 Feb, 2020 28 commits