1. 28 Jun, 2021 33 commits
    • Merge branch 'bridge-replay-helpers' · 3095f512
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Cleanup for the bridge replay helpers
      
      This patch series brings some improvements to the logic added to the
      bridge and DSA to handle LAG interfaces sandwiched between a bridge and
      a DSA switch port.
      
              br0
              /  \
             /    \
           bond0  swp2
           /  \
          /    \
        swp0  swp1
      
      In particular, it ensures that the switchdev object additions and
      deletions are well balanced per physical port. This is important for
      future work in the area of offloading local bridge FDB entries to
      hardware in the context of DSA requesting a replay of those entries at
      bridge join time (this will be submitted in a future patch series).
      Due to some difficulty ensuring that the deletion of local FDB entries
      pointing towards the bridge device itself is notified to switchdev in
      time (before the switchdev port disconnects from the bridge), this is
      potentially still not the final form in which the replay helpers will
      exist. I'm thinking about moving from the pull mode (in which DSA
      requests the replay) to a push mode (in which the bridge initiates the
      replay). Nonetheless, these preliminary changes are needed either way.
      
      The patch series also addresses some feedback from Nikolai which is long
      overdue by now (sorry).
      
      Switchdev driver maintainers were deliberately omitted due to the
      trivial nature of the driver changes (just a function prototype).
      
      Changes in v2:
      - fix build issue in patch 4 (function prototype mismatch)
      - move switchdev object unsync to the NETDEV_PRECHANGEUPPER code path
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: replay a deletion of switchdev objects for ports leaving a bridged LAG · 74918945
      Vladimir Oltean authored
      When a DSA switch port leaves a bonding interface that is under a
      bridge, there might be dangling switchdev objects on that port left
      behind, because the bridge is not aware that its lower interface (the
      bond) changed state in any way.
      
      Call the bridge replay helpers with adding=false before changing
      dp->bridge_dev to NULL, because we need to simulate to
      dsa_slave_port_obj_del() that these notifications were emitted by the
      bridge.
      
      We add this hook to the NETDEV_PRECHANGEUPPER event handler, because
      we are calling into switchdev (and the __switchdev_handle_port_obj_del
      fanout helpers expect the upper/lower adjacency lists to still be valid)
      and PRECHANGEUPPER is the last moment in time when they still are.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: refactor the prechangeupper sanity checks into a dedicated function · 4ede74e7
      Vladimir Oltean authored
      We need to add more logic to the DSA NETDEV_PRECHANGEUPPER event
      handler, more exactly we need to request an unsync of switchdev objects.
      In order to fit more code, refactor the existing logic into a helper.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: allow the switchdev replay functions to be called for deletion · 7e8c1858
      Vladimir Oltean authored
      When a switchdev port leaves a LAG that is a bridge port, the switchdev
      objects and port attributes offloaded to that port are not removed:
      
      ip link add br0 type bridge
      ip link add bond0 type bond mode 802.3ad
      ip link set swp0 master bond0
      ip link set bond0 master br0
      bridge vlan add dev bond0 vid 100
      ip link set swp0 nomaster
      
      VLAN 100 will remain installed on swp0 despite it going into standalone
      mode, because as far as the bridge is concerned, nothing ever happened
      to its bridge port.
      
      Let's extend the bridge vlan, fdb and mdb replay functions to take a
      'bool adding' argument, and make DSA and ocelot call the replay
      functions with 'adding' as false from the switchdev unsync path, for the
      switch port that leaves the bridge.
      
      Note that this patch in itself does not salvage anything, because in the
      current pull mode of operation, DSA still needs to call the replay
      helpers with adding=false. This will be done in another patch.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
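
The stale-VLAN scenario above can be modeled with a short Python sketch. Nothing here is a kernel API: `replay_vlans` and `leave_lag` are hypothetical names, and plain sets stand in for the VLANs offloaded to a port. The sketch only shows why a replay with `adding=False` is needed when a port leaves a bridged LAG, under the assumption that the bridge itself emits no event for that port.

```python
def replay_vlans(bridge_port_vlans, port_vlans, adding):
    """Replay the bridge port's VLANs onto one physical port.

    adding=True installs each VLAN; adding=False removes each one.
    (Hypothetical helper, loosely modeled on the 'bool adding' argument.)
    """
    for vid in bridge_port_vlans:
        if adding:
            port_vlans.add(vid)
        else:
            port_vlans.discard(vid)


def leave_lag(bridge_port_vlans, port_vlans, unsync):
    """Model `ip link set swp0 nomaster`; optionally replay deletions first."""
    if unsync:
        replay_vlans(bridge_port_vlans, port_vlans, adding=False)
    return sorted(port_vlans)


bond0_vlans = {100}   # VLANs configured on the bridge port (bond0)
swp0_vlans = {100}    # VLANs offloaded to swp0 while it was under bond0

# Each call works on a copy so the two outcomes can be compared side by side.
stale = leave_lag(bond0_vlans, set(swp0_vlans), unsync=False)
clean = leave_lag(bond0_vlans, set(swp0_vlans), unsync=True)
print(stale, clean)
```

In this model `stale` still holds VLAN 100 after the port leaves, while `clean` is empty because the deletion replay ran first.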
    • net: bridge: constify variables in the replay helpers · bdf123b4
      Vladimir Oltean authored
      Some of the arguments and local variables for the newly added switchdev
      replay helpers can be const, so let's make them so.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: ignore switchdev events for LAG ports which didn't request replay · 0d2cfbd4
      Vladimir Oltean authored
      There is a slight inconvenience in the switchdev replay helpers added
      recently, and this is when:
      
      ip link add br0 type bridge
      ip link add bond0 type bond
      ip link set bond0 master br0
      bridge vlan add dev bond0 vid 100
      ip link set swp0 master bond0
      ip link set swp1 master bond0
      
      Since the underlying driver (currently only DSA) asks for a replay of
      VLANs when swp0 and swp1 join the LAG because it is bridged, what will
      happen is that DSA will try to react twice on the VLAN event for swp0.
      This is not really a huge problem right now, because most drivers accept
      duplicates since the bridge itself does, but it will become a problem
      when we add support for replaying switchdev object deletions.
      
      Let's fix this by adding a blank void *ctx in the replay helpers, which
      will be passed on by the bridge in the switchdev notifications. If the
      context is NULL, everything is the same as before. But if the context is
      populated with a valid pointer, the underlying switchdev driver
      (currently DSA) can use the pointer to 'see through' the bridge port
      (which in the example above is bond0) and 'know' that the event is only
      for a particular physical port offloading that bridge port, and not for
      all of them.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
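
The duplicate-event problem and the context-based fix can be illustrated with a toy model in Python. All names (`Port`, `notify_vlan_add`, `join_lag`, `run`) are hypothetical and do not correspond to kernel functions. The only idea taken from the commit message is that a NULL context means every port under the LAG acts, while a non-NULL context restricts the event to the one port that requested the replay.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Port:
    """Hypothetical stand-in for a physical switch port under a LAG."""
    name: str
    vlans: list = field(default_factory=list)  # every VLAN add it acted on


def notify_vlan_add(lag_ports, vid, ctx: Optional[Port] = None):
    """Fan a VLAN-add event out to the ports below a bridged LAG.

    ctx=None models a bridge-initiated event: all ports act.
    ctx=<port> models a replay requested for one port: only it acts.
    """
    for port in lag_ports:
        if ctx is not None and ctx is not port:
            continue  # replay was requested on behalf of another port
        port.vlans.append(vid)


def join_lag(lag_ports, port, bridge_vlans, use_ctx):
    """Model a port joining a bridged LAG and requesting a VLAN replay."""
    lag_ports.append(port)
    for vid in bridge_vlans:
        notify_vlan_add(lag_ports, vid, ctx=port if use_ctx else None)


def run(use_ctx):
    """Join swp0 then swp1 to bond0, which already carries VLAN 100."""
    swp0, swp1 = Port("swp0"), Port("swp1")
    bond0 = []
    for p in (swp0, swp1):
        join_lag(bond0, p, [100], use_ctx)
    return swp0.vlans, swp1.vlans
```

Without a context, swp0 records VLAN 100 twice (once for its own join, once for swp1's replay); with a context, each port records it exactly once.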
    • net: switchdev: add a context void pointer to struct switchdev_notifier_info · 69bfac96
      Vladimir Oltean authored
      In the case where the driver asks for a replay of a certain type of
      event (port object or attribute) for a bridge port that is a LAG, it may
      do so because this port has just joined the LAG.
      
      But there might already be other switchdev ports in that LAG, and it is
      preferable that those preexisting switchdev ports do not act upon the
      replayed event.
      
      The solution is to add a context to switchdev events, which is NULL most
      of the time (when the bridge layer initiates the call) but which can be
      set to a value controlled by the switchdev driver when a replay is
      requested. The driver can then check the context to figure out if all
      ports within the LAG should act upon the switchdev event, or just the
      ones that match the context.
      
      We have to modify all switchdev_handle_* helper functions as well as the
      prototypes in the drivers that use these helpers too, because these
      helpers hide the underlying struct switchdev_notifier_info from us and
      there is no way to retrieve the context otherwise.
      
      The context structure will be populated and used in later patches.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ocelot: delete call to br_fdb_replay · 97558e88
      Vladimir Oltean authored
      Not using this driver, I did not realize it doesn't react to
      SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE notifications, but it implements just
      the bridge bypass operations (.ndo_fdb_{add,del}). So the call to
      br_fdb_replay just produces notifications that are ignored; delete it
      for now.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: include the is_local bit in br_fdb_replay · e887b2df
      Vladimir Oltean authored
      Since commit 2c4eca3e ("net: bridge: switchdev: include local flag
      in FDB notifications"), the bridge emits SWITCHDEV_FDB_ADD_TO_DEVICE
      events with the is_local flag populated (but we ignore it nonetheless).
      
      We would like DSA to start treating this bit, but it is still not
      populated by the replay helper, so add it there too.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bnxt_en-ptp' · a1b05634
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: Add hardware PTP timestamping support on 575XX devices
      
      Add PTP RX and TX hardware timestamp support on 575XX devices.  These
      devices use the two-step method to implement the IEEE-1588 timestamping
      support.
      
      v2: Add spinlock to serialize access to the timecounter.
          Use .do_aux_work() for the periodic timer reading and to get the TX
          timestamp from the firmware.
          Propagate error code from ptp_clock_register().
          Make the 64-bit timer access safe on 32-bit CPUs.
          Read PHC using direct register access.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Enable hardware PTP support · 93cb62d9
      Michael Chan authored
      Call bnxt_ptp_init() to initialize and register with the clock driver
      to enable PTP support.  Call bnxt_ptp_free() to unregister and clean
      up during shutdown.
      Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Transmit and retrieve packet timestamps · 83bb623c
      Pavan Chebbi authored
      Setup the TXBD to enable TX timestamp if requested.  At TX packet DMA
      completion, if we requested TX timestamp on that packet, we defer to
      .do_aux_work() to obtain the TX timestamp from the firmware before we
      free the TX SKB.
      
      v2: Use .do_aux_work() to get the TX timestamp from firmware.
      Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Get the RX packet timestamp · 7f5515d1
      Pavan Chebbi authored
      If the RX packet is timestamped by the hardware, the RX completion
      record will contain the lower 32 bits of the timestamp.  This needs
      to be combined with the upper 16 bits of the periodic timestamp that
      we get from the timer.  The previous snapshot in ptp->old_time is
      used to make sure that the snapshot is not ahead of the RX timestamp
      and we adjust for wrap-around if needed.
      
      v2: Make ptp->old_time read access safe on 32-bit CPUs.
      Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
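
The combination of a 32-bit packet timestamp with an older 48-bit snapshot can be sketched as follows. This is a Python model of the general technique, not the driver's code: `extend_rx_timestamp` is a hypothetical name, and the sketch assumes the snapshot was taken before the packet arrived and less than 2**32 ticks earlier.

```python
MASK32 = (1 << 32) - 1
MASK48 = (1 << 48) - 1


def extend_rx_timestamp(pkt_lo32, old_time):
    """Rebuild a 48-bit timestamp from its low 32 bits.

    old_time is a full 48-bit clock snapshot assumed to be no later than
    the packet timestamp and less than 2**32 ticks behind it.
    """
    pkt_lo32 &= MASK32
    # Take the upper 16 bits from the snapshot, the lower 32 from the packet.
    ts = (old_time & ~MASK32) | pkt_lo32
    if pkt_lo32 < (old_time & MASK32):
        # The low 32 bits wrapped after the snapshot: carry into bit 32.
        ts += 1 << 32
    return ts & MASK48
```

For example, a snapshot of `0x0001_FFFF_FFF0` and a packet value of `0x0000_0010` yield `0x0002_0000_0010`, because the lower 32 bits wrapped between the snapshot and the packet.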
    • bnxt_en: Get the full 48-bit hardware timestamp periodically · 390862f4
      Pavan Chebbi authored
      From the bnxt_timer(), read the 48-bit hardware running clock
      periodically and store it in ptp->current_time.  The previous snapshot
      of the clock will be stored in ptp->old_time.  The old_time snapshot
      will be used in the next patches to compute the RX packet timestamps.
      
      v2: Use .do_aux_work() to read the timer periodically.
      Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Add PTP clock APIs, ioctls, and ethtool methods · 118612d5
      Michael Chan authored
      Add the clock APIs to set/get/adjust the hw clock, and the related
      ioctls and ethtool methods.
      
      v2: Propagate error code from ptp_clock_register().
          Add spinlock to serialize access to the timecounter.  The
          timecounter is accessed in process context and the RX datapath.
          Read the PHC using direct registers.
      Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Get PTP hardware capability from firmware · ae5c42f0
      Michael Chan authored
      Store PTP hardware info in a structure if hardware and firmware support PTP.
      Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Update firmware interface to 1.10.2.47 · 78eeadb8
      Michael Chan authored
      Adding the PTP related firmware interface is the main change.
      
      There is also a name change for admin_mtu, requiring code fixup.
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'hns3-next' · 2eeae3a5
      David S. Miller authored
      Guangbin Huang says:
      
      ====================
      net: hns3: add new debugfs commands
      
      This series adds three new debugfs commands for the HNS3 ethernet driver.
      
      change log:
      V1 -> V2:
      1. remove patch "net: hns3: add support for link diagnosis info in debugfs"
         and use ethtool extended link state to implement similar function
         according to Jakub Kicinski's opinion.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: add support for dumping MAC umv counter in debugfs · d59daf6a
      Jian Shen authored
      This patch adds support for dumping the MAC umv counter in debugfs,
      which will be helpful for debugging.
      
      The display style is below:
      $ cat umv_info
      num_alloc_vport  : 2
      max_umv_size     : 256
      wanted_umv_size  : 256
      priv_umv_size    : 85
      share_umv_size   : 86
      vport(0) used_umv_num : 1
      vport(1) used_umv_num : 1
      Signed-off-by: Jian Shen <shenjian15@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: add support for FD counter in debugfs · 03a92fe8
      Jian Shen authored
      Previously, the flow director counter was not enabled. To make it
      easier to check whether the flow director was hit, enable the
      flow director counter for each function, and add a debugfs query
      interface to query the counters for each function.
      
      The debugfs command is below:
      cat fd_counter
      func_id    hit_times
      pf         0
      vf0        0
      vf1        0
      Signed-off-by: Jian Shen <shenjian15@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tipc-next' · c948b46a
      David S. Miller authored
      Menglong Dong says:
      
      ====================
      net: tipc: fix FB_MTU eat two pages and do some code cleanup
      
      In the first patch, FB_MTU is redefined to make sure data size will not
      exceed PAGE_SIZE. Besides, I removed the alignment for buf_size in
      tipc_buf_acquire, because alloc_skb_fclone will do the alignment job.
      
      In the second patch, I removed align() in msg.c and replaced it with
      ALIGN().
      
      Changes since V5:
      - remove blank line after Fixes in commit log in the first patch
      
      Changes since V4:
      - remove ONE_PAGE_SKB_SZ and replace it with one_page_mtu in the first
        patch.
      - fix some code style problems for the second patch.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: tipc: replace align() with ALIGN in msg.c · d4cfb7fe
      Menglong Dong authored
      The function align() which is defined in msg.c is redundant, replace it
      with ALIGN() and introduce a BUF_ALIGN().
      Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
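
Why a local align() helper is redundant can be seen from a small Python model. The body of the removed helper is an assumption here (rounding up to a multiple of 4 bytes); `align4` is a hypothetical name for it. `ALIGN` models the generic kernel macro, which rounds `x` up to a multiple of a power-of-two `a`.

```python
def align4(i):
    """Assumed behavior of the removed helper: round i up to a multiple of 4."""
    return (i + 3) & ~3


def ALIGN(x, a):
    """Model of the generic macro: round x up to a multiple of a.

    a must be a power of two for the mask trick to be valid.
    """
    return (x + a - 1) & ~(a - 1)


# Under the stated assumption, the two agree for every input tried here.
same = all(align4(i) == ALIGN(i, 4) for i in range(1024))
```

Under that assumption, the dedicated helper is just `ALIGN(x, 4)`, which is presumably what a `BUF_ALIGN()` wrapper would expand to.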
    • net: tipc: fix FB_MTU eat two pages · 0c6de0c9
      Menglong Dong authored
      FB_MTU is used in 'tipc_msg_build()' to alloc smaller skb when memory
      allocation fails, which can avoid unnecessary sending failures.
      
      The value of FB_MTU now is 3744, and the data size will be:
      
        (3744 + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) + \
          SKB_DATA_ALIGN(BUF_HEADROOM + BUF_TAILROOM + 3))
      
      which is larger than one page (4096 bytes), so two pages will be allocated.
      
      To avoid it, replace '3744' with a calculation:
      
        (PAGE_SIZE - SKB_DATA_ALIGN(BUF_OVERHEAD) - \
          SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
      
      What's more, alloc_skb_fclone() will call SKB_DATA_ALIGN for data size,
      and it's not necessary to make alignment for buf_size in
      tipc_buf_acquire(). So, just remove it.
      
      Fixes: 4c94cc2d ("tipc: fall back to smaller MTU if allocation of local send skb fails")
      Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
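
The page arithmetic in this commit can be checked with a worked example in Python. Every constant below is an assumption chosen for illustration: a 4096-byte page, a 64-byte cache line for SKB_DATA_ALIGN, 320 bytes for `sizeof(struct skb_shared_info)`, and 80 and 16 bytes for BUF_HEADROOM and BUF_TAILROOM. BUF_OVERHEAD is assumed to be the sum of the last two. Real values depend on architecture and kernel configuration.

```python
PAGE_SIZE = 4096        # assumed page size
SMP_CACHE_BYTES = 64    # assumed alignment used by SKB_DATA_ALIGN
SHINFO_SIZE = 320       # assumed sizeof(struct skb_shared_info)
BUF_HEADROOM = 80       # assumed, configuration dependent
BUF_TAILROOM = 16       # assumed
BUF_OVERHEAD = BUF_HEADROOM + BUF_TAILROOM  # assumed definition


def skb_data_align(x):
    """Round x up to a multiple of the assumed cache line size."""
    return (x + SMP_CACHE_BYTES - 1) & ~(SMP_CACHE_BYTES - 1)


def data_size(mtu):
    """Total allocation size, following the formula quoted in the commit."""
    return (mtu + skb_data_align(SHINFO_SIZE)
            + skb_data_align(BUF_HEADROOM + BUF_TAILROOM + 3))


def pages(size):
    """Number of whole pages needed to hold size bytes."""
    return -(-size // PAGE_SIZE)


old_mtu = 3744
new_mtu = (PAGE_SIZE - skb_data_align(BUF_OVERHEAD)
           - skb_data_align(SHINFO_SIZE))

old_pages = pages(data_size(old_mtu))
new_pages = pages(data_size(new_mtu))
```

With these assumed values, the fixed 3744-byte MTU needs 4192 bytes and therefore two pages, while the computed MTU of 3648 bytes fits in exactly one page.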
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next · 1b077ce1
      David S. Miller authored
      
      Steffen Klassert says:
      
      ====================
      pull request (net-next): ipsec-next 2021-06-28
      
      1) Remove an unneeded error assignment in esp4_gro_receive().
         From Yang Li.
      
      2) Add a new byseq state hashtable to find acquire states faster.
         From Sabrina Dubroca.
      
      3) Remove some unnecessary variables in pfkey_create().
         From zuoqilin.
      
      4) Remove the unused description from xfrm_type struct.
         From Florian Westphal.
      
      5) Fix a spelling mistake in the comment of xfrm_state_ok().
         From gushengxian.
      
      6) Replace hdr_off indirections by a small helper function.
         From Florian Westphal.
      
      7) Remove xfrm4_output_finish and xfrm6_output_finish declarations,
         they are not used anymore.
         From Antony Antony.
      
      8) Remove xfrm replay indirections.
         From Florian Westphal.
      
      Please pull or let me know if there are problems.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mac80211-next-for-net-next-2021-06-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next · 007b312c
      David S. Miller authored
      
      Johannes Berg says:
      
      ====================
      Lots of changes:
       * aggregation handling improvements for some drivers
       * hidden AP discovery on 6 GHz and other HE 6 GHz
         improvements
       * minstrel improvements for no-ack frames
       * deferred rate control for TXQs to improve reaction
         times
       * virtual time-based airtime scheduler
       * along with various little cleanups/fixups
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: fix 'masking a bool' warning · c4512c63
      Matthieu Baerts authored
      Dan Carpenter reported an issue introduced in
      commit fde56eea ("mptcp: refine mptcp_cleanup_rbuf") where a new
      boolean (ack_pending) is masked with 0x9.
      
      The intention is not to ignore values by using a boolean. This
      variable should not have a 'bool' type: we should keep the 'u8' to allow
      this comparison.
      
      Fixes: fde56eea ("mptcp: refine mptcp_cleanup_rbuf")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
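
The effect of masking a boolean can be reproduced with a small Python model. The function names are hypothetical, and the flag values are arbitrary test inputs rather than the kernel's actual flag definitions. The sketch only shows that collapsing a flag byte to 0 or 1 before applying the 0x9 mask changes the result.

```python
def mask_as_u8(pending, mask=0x9):
    """Keep the raw flag byte, then test the masked bits."""
    return (pending & 0xFF) & mask


def mask_as_bool(pending, mask=0x9):
    """Collapse the byte to 0 or 1 first (as a C bool would), then mask."""
    return int(bool(pending)) & mask


# A flag byte with only bit 1 set should not match the 0x9 mask.
u8_result = mask_as_u8(0x2)      # bit 1 is outside the mask
bool_result = mask_as_bool(0x2)  # any nonzero byte became 1, which is in the mask
```

With the u8 the test correctly yields zero, whereas the boolean version reports a match for any nonzero input, which is why the variable has to stay a 'u8'.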
    • Merge branch 'reset-mac' · 8eb517a2
      David S. Miller authored
      Guillaume Nault says:
      
      ====================
      net: reset MAC header consistently across L3 virtual devices
      
      Some virtual L3 devices, like vxlan-gpe and gre (in collect_md mode),
      reset the MAC header pointer after they parsed the outer headers. This
      accurately reflects the fact that the decapsulated packet is a pure L3
      packet, as that makes the MAC header 0 bytes long (the MAC and network
      header pointers are equal).
      
      However, many L3 devices only adjust the network header after
      decapsulation and leave the MAC header pointer to its original value.
      This can confuse other parts of the networking stack, like TC, which
      then considers the outer headers as one big MAC header.
      
      This patch series makes the following L3 tunnels behave like VXLAN-GPE:
      bareudp, ipip, sit, gre, ip6gre, ip6tnl, gtp.
      
      The case of gre is a bit special. It already resets the MAC header
      pointer in collect_md mode, so only the classical mode needs to be
      adjusted. However, gre also has a special case that expects the MAC
      header pointer to keep pointing to the outer header even after
      decapsulation. Therefore, patch 4 keeps an exception for this case.
      
      Ideally, we'd centralise the call to skb_reset_mac_header() in
      ip_tunnel_rcv(), to avoid manual calls in ipip (patch 2),
      sit (patch 3) and gre (patch 4). That's unfortunately not feasible
      currently, because of the gre special case discussed above that
      precludes us from resetting the MAC header unconditionally.
      
      The original motivation is to redirect bareudp packets to Ethernet
      devices (as described in patch 1). The rest of this series aims at
      bringing consistency across all L3 devices (apart from gre's special
      case unfortunately).
      
      Note: the gtp patch results from pure code inspection and has been
      compile tested only.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
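
The header-pointer bookkeeping described in this series can be modeled in a few lines of Python. `Skb` and `can_push_eth` are hypothetical simplifications: the real sk_buff has many more fields, and the real push_eth check is more involved. The sketch assumes a MAC header is considered present whenever the MAC and network header offsets differ.

```python
from dataclasses import dataclass


@dataclass
class Skb:
    """Toy packet buffer: only the two header offsets matter here."""
    mac_header: int
    network_header: int

    def mac_len(self):
        """Apparent MAC header length: distance between the two offsets."""
        return self.network_header - self.mac_header

    def reset_mac_header(self):
        # Models resetting the MAC header once the inner L3 header is current.
        self.mac_header = self.network_header


def can_push_eth(skb):
    """Simplified stand-in: refuse if the skb appears to have a MAC header."""
    return skb.mac_len() == 0


# Outer Ethernet at offset 0, outer IPv4 (20 bytes) at 14, inner packet at 34.
stale = Skb(mac_header=0, network_header=34)   # only network header adjusted
fixed = Skb(mac_header=0, network_header=34)
fixed.reset_mac_header()                       # MAC header made 0 bytes long
```

In this model `stale` looks like it carries one 34-byte MAC header (the outer headers), so pushing an Ethernet header is refused; `fixed` has a zero-length MAC header and passes the check.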
    • gtp: reset mac_header after decap · b2d898c8
      Guillaume Nault authored
      For consistency with other L3 tunnel devices, reset the mac_header
      pointer after decapsulation. This makes the mac_header 0 bytes long,
      thus making it clear that this skb has no mac_header.
      
      Compile tested only.
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_tunnel: allow redirecting ip6gre and ipxip6 packets to eth devices · da5a2e49
      Guillaume Nault authored
      Reset the mac_header pointer even when the tunnel transports only L3
      data (in the ARPHRD_ETHER case, this is already done by eth_type_trans).
      This prevents other parts of the stack from mistakenly accessing the
      outer header after the packet has been decapsulated.
      
      In practice, this allows pushing an Ethernet header to ipip6, ip6ip6,
      mplsip6 or ip6gre packets and redirecting them to an Ethernet device:
      
        $ tc filter add dev ip6tnl0 ingress matchall       \
            action vlan push_eth dst_mac 00:00:5e:00:53:01 \
                                 src_mac 00:00:5e:00:53:00 \
            action mirred egress redirect dev eth0
      
      Without this patch, push_eth refuses to add an ethernet header because
      the skb appears to already have a MAC header.
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gre: let mac_header point to outer header only when necessary · aab1e898
      Guillaume Nault authored
      Commit e271c7b4 ("gre: do not keep the GRE header around in collect
      medata mode") did reset the mac_header for the collect_md case. Let's
      extend this behaviour to classical gre devices as well.
      
      ipgre_header_parse() seems to be the only case that requires mac_header
      to point to the outer header. We can detect this case accurately by
      checking ->header_ops. For all other cases, we can reset mac_header.
      
      This allows pushing an Ethernet header to ipgre packets and redirecting
      them to an Ethernet device:
      
        $ tc filter add dev gre0 ingress matchall          \
            action vlan push_eth dst_mac 00:00:5e:00:53:01 \
                                 src_mac 00:00:5e:00:53:00 \
            action mirred egress redirect dev eth0
      
      Before this patch, this worked only for collect_md gre devices.
      Now this works for regular gre devices as well. Only the special case
      of gre devices that use ipgre_header_ops isn't supported.
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sit: allow redirecting ip6ip, ipip and mplsip packets to eth devices · 730eed27
      Guillaume Nault authored
      Even though sit transports L3 data (IPv6, IPv4 or MPLS) packets, it
      needs to reset the mac_header pointer, so that other parts of the stack
      don't mistakenly access the outer header after the packet has been
      decapsulated. There are two rx handlers to modify: ipip6_rcv() for the
      ip6ip mode and sit_tunnel_rcv() which is used to re-implement the ipip
      and mplsip modes of ipip.ko.
      
      This allows pushing an Ethernet header to sit packets and redirecting
      them to an Ethernet device:
      
        $ tc filter add dev sit0 ingress matchall          \
            action vlan push_eth dst_mac 00:00:5e:00:53:01 \
                                 src_mac 00:00:5e:00:53:00 \
            action mirred egress redirect dev eth0
      
      Without this patch, push_eth refuses to add an ethernet header because
      the skb appears to already have a MAC header.
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipip: allow redirecting ipip and mplsip packets to eth devices · 7ad136fd
      Guillaume Nault authored
      Even though ipip transports IPv4 or MPLS packets, it needs to reset the
      mac_header pointer, so that other parts of the stack don't mistakenly
      access the outer header after the packet has been decapsulated.
      
      This allows pushing an Ethernet header to ipip or mplsip packets and
      redirecting them to an Ethernet device:
      
        $ tc filter add dev ipip0 ingress matchall         \
            action vlan push_eth dst_mac 00:00:5e:00:53:01 \
                                 src_mac 00:00:5e:00:53:00 \
            action mirred egress redirect dev eth0
      
      Without this patch, push_eth refuses to add an ethernet header because
      the skb appears to already have a MAC header.
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bareudp: allow redirecting bareudp packets to eth devices · 99c8719b
      Guillaume Nault authored
      Even though bareudp transports L3 data (typically IP or MPLS), it needs
      to reset the mac_header pointer, so that other parts of the stack don't
      mistakenly access the outer header after the packet has been
      decapsulated.
      
      This allows pushing an Ethernet header to bareudp packets and redirecting
      them to an Ethernet device:
      
        $ tc filter add dev bareudp0 ingress matchall      \
            action vlan push_eth dst_mac 00:00:5e:00:53:01 \
                                 src_mac 00:00:5e:00:53:00 \
            action mirred egress redirect dev eth0
      
      Without this patch, push_eth refuses to add an ethernet header because
      the skb appears to already have a MAC header.
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 25 Jun, 2021 7 commits