1. 16 May, 2022 1 commit
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next · 1a01a075
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      This is v2 including deadlock fix in conntrack ecache rework
      reported by Jakub Kicinski.
      
      The following patchset contains Netfilter updates for net-next,
      mostly updates to conntrack from Florian Westphal.
      
      1) Add a dedicated list for conntrack event redelivery.
      
      2) Include event redelivery list in conntrack dumps of dying type.
      
      3) Remove per-cpu dying list for event redelivery, not used anymore.
      
      4) Add netns .pre_exit to cttimeout to zap timeout objects before
         synchronize_rcu() call.
      
      5) Remove nf_ct_unconfirmed_destroy.
      
      6) Add generation id for conntrack extensions for conntrack
         timeout and helpers.
      
      7) Detach timeout policy from conntrack on cttimeout module removal.
      
      8) Remove __nf_ct_unconfirmed_destroy.
      
      9) Remove unconfirmed list.
      
      10) Remove unconditional local_bh_disable in init_conntrack().
      
      11) Consolidate conntrack iterator nf_ct_iterate_cleanup().
      
      12) Detect if ctnetlink listeners exist to short-circuit event
          path early.
      
      13) Un-inline nf_ct_ecache_ext_add().
      
      14) Add nf_conntrack_events autodetect ctnetlink listener mode
          and make it default.
      
      15) Add nf_ct_ecache_exist() to check for event cache extension.
      
      16) Extend flowtable reverse route lookup to include source, iif,
          tos and mark, from Sven Auhagen.
      
      17) Do not verify zero checksum UDP packets in nf_reject,
          from Kevin Mitchell.
      
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1a01a075
  2. 14 May, 2022 3 commits
  3. 13 May, 2022 36 commits
    • Jakub Kicinski's avatar
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next · 2c5f1536
      Jakub Kicinski authored
      Steffen Klassert says:
      
      ====================
      pull request (net-next): ipsec-next 2022-05-13
      
      1) Cleanups for the code behind the XFRM offload API. This is a
         preparation for the extension of the API for policy offload.
         From Leon Romanovsky.
      
      * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
        xfrm: drop not needed flags variable in XFRM offload struct
        net/mlx5e: Use XFRM state direction instead of flags
        netdevsim: rely on XFRM state direction instead of flags
        ixgbe: propagate XFRM offload state direction instead of flags
        xfrm: store and rely on direction to construct offload flags
        xfrm: rename xfrm_state_offload struct to allow reuse
        xfrm: delete not used number of external headers
        xfrm: free not used XFRM_ESP_NO_TRAILER flag
      ====================
      
      Link: https://lore.kernel.org/r/20220513151218.4010119-1-steffen.klassert@secunet.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      2c5f1536
    • Ren Zhijie's avatar
      sfc: siena: Fix Kconfig dependencies · f9a210c7
      Ren Zhijie authored
      If CONFIG_PTP_1588_CLOCK=m and CONFIG_SFC_SIENA=y, the siena driver will fail to link:
      
      drivers/net/ethernet/sfc/siena/ptp.o: In function `efx_ptp_remove_channel':
      ptp.c:(.text+0xa28): undefined reference to `ptp_clock_unregister'
      drivers/net/ethernet/sfc/siena/ptp.o: In function `efx_ptp_probe_channel':
      ptp.c:(.text+0x13a0): undefined reference to `ptp_clock_register'
      ptp.c:(.text+0x1470): undefined reference to `ptp_clock_unregister'
      drivers/net/ethernet/sfc/siena/ptp.o: In function `efx_ptp_pps_worker':
      ptp.c:(.text+0x1d29): undefined reference to `ptp_clock_event'
      drivers/net/ethernet/sfc/siena/ptp.o: In function `efx_siena_ptp_get_ts_info':
      ptp.c:(.text+0x301b): undefined reference to `ptp_clock_index'
      
      To fix this build error, make SFC_SIENA depends on PTP_1588_CLOCK.
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Fixes: d48523cb ("sfc: Copy shared files needed for Siena (part 2)")
      Signed-off-by: default avatarRen Zhijie <renzhijie2@huawei.com>
      Acked-by: default avatarMartin Habets <habetsm.xilinx@gmail.com>
      Link: https://lore.kernel.org/r/20220513012721.140871-1-renzhijie2@huawei.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      f9a210c7
    • Kevin Mitchell's avatar
      netfilter: conntrack: skip verification of zero UDP checksum · 4f9bd530
      Kevin Mitchell authored
      The checksum is optional for UDP packets. However nf_reject would
      previously require a valid checksum to elicit a response such as
      ICMP_DEST_UNREACH.
      
      Add some logic to nf_reject_verify_csum to determine if a UDP packet has
      a zero checksum and should therefore not be verified.
      Signed-off-by: default avatarKevin Mitchell <kevmitch@arista.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      4f9bd530
    • Sven Auhagen's avatar
      netfilter: flowtable: nft_flow_route use more data for reverse route · 3412e164
      Sven Auhagen authored
      When creating a flow table entry, the reverse route is looked
      up based on the current packet.
      There can be scenarios where the user creates a custom ip rule
      to route the traffic differently.
      In order to support those scenarios, the lookup needs to add
      more information based on the current packet.
      The patch adds multiple new information to the route lookup.
      Signed-off-by: default avatarSven Auhagen <sven.auhagen@voleatech.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      3412e164
    • Florian Westphal's avatar
      netfilter: prefer extension check to pointer check · 8edc8131
      Florian Westphal authored
      The pointer check usually results in a 'false positive': its likely
      that the ctnetlink module is loaded but no event monitoring is enabled.
      
      After recent change to autodetect ctnetlink usage and only allocate
      the ecache extension if a listener is active, check if the extension
      is present on a given conntrack.
      
      If its not there, there is nothing to report and calls to the
      notification framework can be elided.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      8edc8131
    • Florian Westphal's avatar
      netfilter: conntrack: add nf_conntrack_events autodetect mode · 90d1daa4
      Florian Westphal authored
      This adds the new nf_conntrack_events=2 mode and makes it the
      default.
      
      This leverages the earlier flag in struct net to allow to avoid
      the event extension as long as no event listener is active in
      the namespace.
      
      This avoids, for most cases, allocation of ct->ext area.
      A followup patch will take further advantage of this by avoiding
      calls down into the event framework if the extension isn't present.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      90d1daa4
    • Florian Westphal's avatar
      netfilter: conntrack: un-inline nf_ct_ecache_ext_add · b0a7ab4a
      Florian Westphal authored
      Only called when new ct is allocated or the extension isn't present.
      This function will be extended, place this in the conntrack module
      instead of inlining.
      
      The callers already depend on nf_conntrack module.
      Return value is changed to bool, noone used the returned pointer.
      
      Make sure that the core drops the newly allocated conntrack
      if the extension is requested but can't be added.
      This makes it necessary to ifdef the section, as the stub
      always returns false we'd drop every new conntrack if the
      the ecache extension is disabled in kconfig.
      
      Add from data path (xt_CT, nft_ct) is unchanged.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      b0a7ab4a
    • Florian Westphal's avatar
      netfilter: nfnetlink: allow to detect if ctnetlink listeners exist · 2794cdb0
      Florian Westphal authored
      At this time, every new conntrack gets the 'event cache extension'
      enabled for it.
      
      This is because the 'net.netfilter.nf_conntrack_events' sysctl defaults
      to 1.
      
      Changing the default to 0 means that commands that rely on the event
      notification extension, e.g. 'conntrack -E' or conntrackd, stop working.
      
      We COULD detect if there is a listener by means of
      'nfnetlink_has_listeners()' and only add the extension if this is true.
      
      The downside is a dependency from conntrack module to nfnetlink module.
      
      This adds a different way: inc/dec a counter whenever a ctnetlink group
      is being (un)subscribed and toggle a flag in struct net.
      
      Next patches will take advantage of this and will only add the event
      extension if the flag is set.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      2794cdb0
    • Pablo Neira Ayuso's avatar
      netfilter: conntrack: add nf_ct_iter_data object for nf_ct_iterate_cleanup*() · 8169ff58
      Pablo Neira Ayuso authored
      This patch adds a structure to collect all the context data that is
      passed to the cleanup iterator.
      
       struct nf_ct_iter_data {
             struct net *net;
             void *data;
             u32 portid;
             int report;
       };
      
      There is a netns field that allows to clean up conntrack entries
      specifically owned by the specified netns.
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      8169ff58
    • Florian Westphal's avatar
      netfilter: conntrack: avoid unconditional local_bh_disable · 0bcfbafb
      Florian Westphal authored
      Now that the conntrack entry isn't placed on the pcpu list anymore the
      bh only needs to be disabled in the 'expectation present' case.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      0bcfbafb
    • Florian Westphal's avatar
      netfilter: conntrack: remove unconfirmed list · 8a75a2c1
      Florian Westphal authored
      It has no function anymore and can be removed.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      8a75a2c1
    • Florian Westphal's avatar
      netfilter: conntrack: remove __nf_ct_unconfirmed_destroy · ace53fdc
      Florian Westphal authored
      Its not needed anymore:
      
      A. If entry is totally new, then the rcu-protected resource
      must already have been removed from global visibility before call
      to nf_ct_iterate_destroy.
      
      B. If entry was allocated before, but is not yet in the hash table
         (uncofirmed case), genid gets incremented and synchronize_rcu() call
         makes sure access has completed.
      
      C. Next attempt to peek at extension area will fail for unconfirmed
        conntracks, because ext->genid != genid.
      
      D. Conntracks in the hash are iterated as before.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      ace53fdc
    • Florian Westphal's avatar
      netfilter: cttimeout: decouple unlink and free on netns destruction · 42df4fb9
      Florian Westphal authored
      Increment the extid on module removal; this makes sure that even
      in extreme cases any old uncofirmed entry that happened to be kept
      e.g. on nfnetlink_queue list will not trip over a stale timeout
      reference.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      42df4fb9
    • Florian Westphal's avatar
      netfilter: extensions: introduce extension genid count · c56716c6
      Florian Westphal authored
      Multiple netfilter extensions store pointers to external data
      in their extension area struct.
      
      Examples:
      1. Timeout policies
      2. Connection tracking helpers.
      
      No references are taken for these.
      
      When a helper or timeout policy is removed, the conntrack table gets
      traversed and affected extensions are cleared.
      
      Conntrack entries not yet in the hashtable are referenced via a special
      list, the unconfirmed list.
      
      On removal of a policy or connection tracking helper, the unconfirmed
      list gets traversed an all entries are marked as dying, this prevents
      them from getting committed to the table at insertion time: core checks
      for dying bit, if set, the conntrack entry gets destroyed at confirm
      time.
      
      The disadvantage is that each new conntrack has to be added to the percpu
      unconfirmed list, and each insertion needs to remove it from this list.
      The list is only ever needed when a policy or helper is removed -- a rare
      occurrence.
      
      Add a generation ID count: Instead of adding to the list and then
      traversing that list on policy/helper removal, increment a counter
      that is stored in the extension area.
      
      For unconfirmed conntracks, the extension has the genid valid at ct
      allocation time.
      
      Removal of a helper/policy etc. increments the counter.
      At confirmation time, validate that ext->genid == global_id.
      
      If the stored number is not the same, do not allow the conntrack
      insertion, just like as if a confirmed-list traversal would have flagged
      the entry as dying.
      
      After insertion, the genid is no longer relevant (conntrack entries
      are now reachable via the conntrack table iterators and is set to 0.
      
      This allows removal of the percpu unconfirmed list.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      c56716c6
    • Florian Westphal's avatar
      netfilter: remove nf_ct_unconfirmed_destroy helper · 17438b42
      Florian Westphal authored
      This helper tags connections not yet in the conntrack table as
      dying.  These nf_conn entries will be dropped instead when the
      core attempts to insert them from the input or postrouting
      'confirm' hook.
      
      After the previous change, the entries get unlinked from the
      list earlier, so that by the time the actual exit hook runs,
      new connections no longer have a timeout policy assigned.
      
      Its enough to walk the hashtable instead.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      17438b42
    • Florian Westphal's avatar
      netfilter: cttimeout: decouple unlink and free on netns destruction · 78222bac
      Florian Westphal authored
      Make it so netns pre_exit unlinks the objects from the pernet list, so
      they cannot be found anymore.
      
      netns core issues a synchronize_rcu() before calling the exit hooks so
      any the time the exit hooks run unconfirmed nf_conn entries have been
      free'd or they have been committed to the hashtable.
      
      The exit hook still tags unconfirmed entries as dying, this can
      now be removed in a followup change.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      78222bac
    • Florian Westphal's avatar
      netfilter: conntrack: remove the percpu dying list · 1397af5b
      Florian Westphal authored
      Its no longer needed. Entries that need event redelivery are placed
      on the new pernet dying list.
      
      The advantage is that there is no need to take additional spinlock on
      conntrack removal unless event redelivery failed or the conntrack entry
      was never added to the table in the first place (confirmed bit not set).
      
      The IPS_CONFIRMED bit now needs to be set as soon as the entry has been
      unlinked from the unconfirmed list, else the destroy function may
      attempt to unlink it a second time.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      1397af5b
    • Florian Westphal's avatar
      netfilter: conntrack: include ecache dying list in dumps · 0d3cc504
      Florian Westphal authored
      The new pernet dying list includes conntrack entries that await
      delivery of the 'destroy' event via ctnetlink.
      
      The old percpu dying list will be removed soon.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      0d3cc504
    • Florian Westphal's avatar
      netfilter: ecache: use dedicated list for event redelivery · 2ed3bf18
      Florian Westphal authored
      This disentangles event redelivery and the percpu dying list.
      
      Because entries are now stored on a dedicated list, all
      entries are in NFCT_ECACHE_DESTROY_FAIL state and all entries
      still have confirmed bit set -- the reference count is at least 1.
      
      The 'struct net' back-pointer can be removed as well.
      
      The pcpu dying list will be removed eventually, it has no functionality.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      2ed3bf18
    • David S. Miller's avatar
      Merge branch 'bnxt_en-next' · a65cc843
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: Updates for net-next
      
      This small patchset updates the firmware interface, adds timestamping
      support for all receive packets, and adds revised NVRAM package error
      messages for ethtool and devlink.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a65cc843
    • Kalesh AP's avatar
      bnxt_en: parse and report result field when NVRAM package install fails · ab0bed4b
      Kalesh AP authored
      Instead of always returning -ENOPKG, decode the firmware error
      code further when the HWRM_NVM_INSTALL_UPDATE firmware call fails.
      Return a more suitable error code to userspace and log an error
      in dmesg.
      
      This is version 2 of the earlier patch that was reverted:
      
      02acd399 ("bnxt_en: parse result field when NVRAM package install fails")
      
      In this new version, if the call is made through devlink instead of
      ethtool, we'll also set the error message in extack.
      
      Link: https://lore.kernel.org/netdev/20220307141358.4d52462e@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/Reviewed-by: default avatarSomnath Kotur <somnath.kotur@broadcom.com>
      Reviewed-by: default avatarPavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: default avatarKalesh AP <kalesh-anakkur.purayil@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ab0bed4b
    • Pavan Chebbi's avatar
      bnxt_en: Enable packet timestamping for all RX packets · 66ed81dc
      Pavan Chebbi authored
      Add driver support to enable timestamping on all RX packets
      that are received by the NIC. This capability can be requested
      by the applications using SIOCSHWTSTAMP ioctl with filter type
      HWTSTAMP_FILTER_ALL.
      
      Cc: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: default avatarPavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      66ed81dc
    • Pavan Chebbi's avatar
      bnxt_en: Configure ptp filters during bnxt open · 11862689
      Pavan Chebbi authored
      For correctness, we need to configure the packet filters for timestamping
      during bnxt_open.  This way they are always configured after firmware
      reset or chip reset.  We should not assume that the filters will always
      be retained across resets.
      
      This patch modifies the ioctl handler and always configures the PTP
      filters in the bnxt_open() path.
      
      Cc: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: default avatarPavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      11862689
    • Michael Chan's avatar
      bnxt_en: Update firmware interface to 1.10.2.95 · ad04cc05
      Michael Chan authored
      The main changes are timestamp support for all RX packets and new PCIe
      statistics.
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ad04cc05
    • Robert Hancock's avatar
      net: axienet: Use NAPI for TX completion path · 9e2bc267
      Robert Hancock authored
      This driver was using the TX IRQ handler to perform all TX completion
      tasks. Under heavy TX network load, this can cause significant irqs-off
      latencies (found to be in the hundreds of microseconds using ftrace).
      This can cause other issues, such as overrunning serial UART FIFOs when
      using high baud rates with limited UART FIFO sizes.
      
      Switch to using a NAPI poll handler to perform the TX completion work
      to get this out of hard IRQ context and avoid the IRQ latency impact.
      A separate poll handler is used for TX and RX since they have separate
      IRQs on this controller, so that the completion work for each of them
      stays on the same CPU as the interrupt.
      
      Testing on a Xilinx MPSoC ZU9EG platform using iperf3 from a Linux PC
      through a switch at 1G link speed showed no significant change in TX or
      RX throughput, with approximately 941 Mbps before and after. Hard IRQ
      time in the TX throughput test was significantly reduced from 12% to
      below 1% on the CPU handling TX interrupts, with total hard+soft IRQ CPU
      usage dropping from about 56% down to 48%.
      Signed-off-by: default avatarRobert Hancock <robert.hancock@calian.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9e2bc267
    • Robert Hancock's avatar
      net: axienet: Be more careful about updating tx_bd_tail · f0cf4000
      Robert Hancock authored
      The axienet_start_xmit function was updating the tx_bd_tail variable
      multiple times, with potential rollbacks on error or invalid
      intermediate positions, even though this variable is also used in the
      TX completion path. Use READ_ONCE where this variable is read and
      WRITE_ONCE where it is written to make this update more atomic, and
      move the write before the MMIO write to start the transfer, so it is
      protected by that implicit write barrier.
      Signed-off-by: default avatarRobert Hancock <robert.hancock@calian.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f0cf4000
    • Eric Dumazet's avatar
      inet: add READ_ONCE(sk->sk_bound_dev_if) in INET_MATCH() · 4915d50e
      Eric Dumazet authored
      INET_MATCH() runs without holding a lock on the socket.
      
      We probably need to annotate most reads.
      
      This patch makes INET_MATCH() an inline function
      to ease our changes.
      
      v2:
      
      We remove the 32bit version of it, as modern compilers
      should generate the same code really, no need to
      try to be smarter.
      
      Also make 'struct net *net' the first argument.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4915d50e
    • Amit Cohen's avatar
      selftests: fib_nexthops: Make the test more robust · 49bb39bd
      Amit Cohen authored
      Rarely some of the test cases fail. Make the test more robust by increasing
      the timeout of ping commands to 5 seconds.
      Signed-off-by: default avatarAmit Cohen <amcohen@nvidia.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      49bb39bd
    • David S. Miller's avatar
      Merge branch 'lan95xx-no-polling' · b7da9c6b
      David S. Miller authored
      Lukas Wunner says:
      
      ====================
      Polling be gone on LAN95xx
      
      Do away with link status polling on LAN95xx USB Ethernet
      and rely on interrupts instead, thereby reducing bus traffic,
      CPU overhead and improving interface bringup latency.
      
      Link to v2:
      https://lore.kernel.org/netdev/cover.1651574194.git.lukas@wunner.de/
      
      Only change since v2:
      
      * Patch [5/7]:
        * Drop call to __irq_enter_raw() which worked around a warning in
          generic_handle_domain_irq().  That warning is gone since
          792ea6a0 (queued on tip.git/irq/urgent).
          (Marc Zyngier, Thomas Gleixner)
      ====================
      b7da9c6b
    • Lukas Wunner's avatar
      net: phy: smsc: Cope with hot-removal in interrupt handler · 1e7b81ed
      Lukas Wunner authored
      If reading the Interrupt Source Flag register fails with -ENODEV, then
      the PHY has been hot-removed and the correct response is to bail out
      instead of throwing a WARN splat and attempting to suspend the PHY.
      The PHY should be stopped in due course anyway as the kernel
      asynchronously tears down the device.
      
      Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500
      Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514
      Signed-off-by: default avatarLukas Wunner <lukas@wunner.de>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1e7b81ed
    • Lukas Wunner's avatar
      net: phy: smsc: Cache interrupt mask · 7e8b617e
      Lukas Wunner authored
      Cache the interrupt mask to avoid re-reading it from the PHY upon every
      interrupt.
      
      This will simplify a subsequent commit which detects hot-removal in the
      interrupt handler and bails out.
      
      Analyzing and debugging PHY transactions also becomes simpler if such
      redundant reads are avoided.
      
      Last not least, interrupt overhead and latency is slightly improved.
      
      Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500
      Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514
      Signed-off-by: default avatarLukas Wunner <lukas@wunner.de>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7e8b617e
    • Lukas Wunner's avatar
      usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling · 1ce8b372
      Lukas Wunner authored
      Link status of SMSC LAN95xx chips is polled once per second, even though
      they're capable of signaling PHY interrupts through the MAC layer.
      
      Forward those interrupts to the PHY driver to avoid polling.  Benefits
      are reduced bus traffic, reduced CPU overhead and quicker interface
      bringup.
      
      Polling was introduced in 2016 by commit d69d1694 ("usbnet:
      smsc95xx: fix link detection for disabled autonegotiation").
      Back then, the LAN95xx driver neglected to enable the ENERGYON interrupt,
      hence couldn't detect link-up events when auto-negotiation was disabled.
      The proper solution would have been to enable the ENERGYON interrupt
      instead of polling.
      
      Since then, PHY handling was moved from the LAN95xx driver to the SMSC
      PHY driver with commit 05b35e7e ("smsc95xx: add phylib support").
      That PHY driver is capable of link detection with auto-negotiation
      disabled because it enables the ENERGYON interrupt.
      
      Note that signaling interrupts through the MAC layer not only works with
      the integrated PHY, but also with an external PHY, provided its
      interrupt pin is attached to LAN95xx's nPHY_INT pin.
      
      In the unlikely event that the interrupt pin of an external PHY is
      attached to a GPIO of the SoC (or not connected at all), the driver can
      be amended to retrieve the irq from the PHY's of_node.
      
      To forward PHY interrupts to phylib, it is not sufficient to call
      phy_mac_interrupt().  Instead, the PHY's interrupt handler needs to run
      so that PHY interrupts are cleared.  That's because according to page
      119 of the LAN950x datasheet, "The source of this interrupt is a level.
      The interrupt persists until it is cleared in the PHY."
      
      https://www.microchip.com/content/dam/mchp/documents/UNG/ProductDocuments/DataSheets/LAN950x-Data-Sheet-DS00001875D.pdf
      
      Therefore, create an IRQ domain with a single IRQ for the PHY.  In the
      future, the IRQ domain may be extended to support the 11 GPIOs on the
      LAN95xx.
      
      Normally the PHY interrupt should be masked until the PHY driver has
      cleared it.  However masking requires a (sleeping) USB transaction and
      interrupts are received in (non-sleepable) softirq context.  I decided
      not to mask the interrupt at all (by using the dummy_irq_chip's noop
      ->irq_mask() callback):  The USB interrupt endpoint is polled in 1 msec
      intervals and normally that's sufficient to wake the PHY driver's IRQ
      thread and have it clear the interrupt.  If it does take longer, worst
      thing that can happen is the IRQ thread is woken again.  No big deal.
      
      Because PHY interrupts are now perpetually enabled, there's no need to
      selectively enable them on suspend.  So remove all invocations of
      smsc95xx_enable_phy_wakeup_interrupts().
      
      In smsc95xx_resume(), move the call of phy_init_hw() before
      usbnet_resume() (which restarts the status URB) to ensure that the PHY
      is fully initialized when an interrupt is handled.
      
      Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500
      Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514
      Signed-off-by: default avatarLukas Wunner <lukas@wunner.de>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch> # from a PHY perspective
      Cc: Andre Edich <andre.edich@microchip.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1ce8b372
    • Lukas Wunner's avatar
      usbnet: smsc95xx: Avoid link settings race on interrupt reception · 8960f878
      Lukas Wunner authored
      When a PHY interrupt is signaled, the SMSC LAN95xx driver updates the
      MAC full duplex mode and PHY flow control registers based on cached data
      in struct phy_device:
      
        smsc95xx_status()                 # raises EVENT_LINK_RESET
          usbnet_deferred_kevent()
            smsc95xx_link_reset()         # uses cached data in phydev
      
      Simultaneously, phylib polls link status once per second and updates
      that cached data:
      
        phy_state_machine()
          phy_check_link_status()
            phy_read_status()
              lan87xx_read_status()
                genphy_read_status()      # updates cached data in phydev
      
      If smsc95xx_link_reset() wins the race against genphy_read_status(),
      the registers may be updated based on stale data.
      
      E.g. if the link was previously down, phydev->duplex is set to
      DUPLEX_UNKNOWN and that's what smsc95xx_link_reset() will use, even
      though genphy_read_status() may update it to DUPLEX_FULL afterwards.
      
      PHY interrupts are currently only enabled on suspend to trigger wakeup,
      so the impact of the race is limited, but we're about to enable them
      perpetually.
      
      Avoid the race by delaying execution of smsc95xx_link_reset() until
      phy_state_machine() has done its job and calls back via
      smsc95xx_handle_link_change().
      
      Signaling EVENT_LINK_RESET on wakeup is not necessary because phylib
      picks up link status changes through polling.  So drop the declaration
      of a ->link_reset() callback.
      
      Note that the semicolon on a line by itself added in smsc95xx_status()
      is a placeholder for a function call which will be added in a subsequent
      commit.  That function call will actually handle the INT_ENP_PHY_INT_
      interrupt.
      
      Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500
      Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514
      Signed-off-by: default avatarLukas Wunner <lukas@wunner.de>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8960f878
    • Lukas Wunner's avatar
      usbnet: smsc95xx: Don't reset PHY behind PHY driver's back · 14021da6
      Lukas Wunner authored
      smsc95xx_reset() resets the PHY behind the PHY driver's back, which
      seems like a bad idea generally.  Remove that portion of the function.
      
      We're about to use PHY interrupts instead of polling to detect link
      changes on SMSC LAN95xx chips.  Because smsc95xx_reset() is called from
      usbnet_open(), PHY interrupt settings are lost whenever the net_device
      is brought up.
      
      There are two other callers of smsc95xx_reset(), namely smsc95xx_bind()
      and smsc95xx_reset_resume(), and both may indeed benefit from a PHY
      reset.  However they already perform one through their calls to
      phy_connect_direct() and phy_init_hw().
      
      Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500
      Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514
      Signed-off-by: default avatarLukas Wunner <lukas@wunner.de>
      Cc: Martyn Welch <martyn.welch@collabora.com>
      Cc: Gabriel Hojda <ghojda@yo2urs.ro>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      14021da6
    • Lukas Wunner's avatar
      usbnet: smsc95xx: Don't clear read-only PHY interrupt · 3108871f
      Lukas Wunner authored
      Upon receiving data from the Interrupt Endpoint, the SMSC LAN95xx driver
      attempts to clear the signaled interrupts by writing "all ones" to the
      Interrupt Status Register.
      
      However the driver only ever enables a single type of interrupt, namely
      the PHY Interrupt.  And according to page 119 of the LAN950x datasheet,
      its bit in the Interrupt Status Register is read-only.  There's no other
      way to clear it than in a separate PHY register:
      
      https://www.microchip.com/content/dam/mchp/documents/UNG/ProductDocuments/DataSheets/LAN950x-Data-Sheet-DS00001875D.pdf
      
      Consequently, writing "all ones" to the Interrupt Status Register is
      pointless and can be dropped.
      
      Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500
      Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514
      Signed-off-by: default avatarLukas Wunner <lukas@wunner.de>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3108871f
    • Lukas Wunner's avatar
      usbnet: Run unregister_netdev() before unbind() again · d1408f6b
      Lukas Wunner authored
      Commit 2c9d6c2b ("usbnet: run unbind() before unregister_netdev()")
      sought to fix a use-after-free on disconnect of USB Ethernet adapters.
      
      It turns out that a different fix is necessary to address the issue:
      https://lore.kernel.org/netdev/18b3541e5372bc9b9fc733d422f4e698c089077c.1650177997.git.lukas@wunner.de/
      
      So the commit was not necessary.
      
      The commit made binding and unbinding of USB Ethernet asymmetrical:
      Before, usbnet_probe() first invoked the ->bind() callback and then
      register_netdev().  usbnet_disconnect() mirrored that by first invoking
      unregister_netdev() and then ->unbind().
      
      Since the commit, the order in usbnet_disconnect() is reversed and no
      longer mirrors usbnet_probe().
      
      One consequence is that a PHY disconnected (and stopped) in ->unbind()
      is afterwards stopped once more by unregister_netdev() as it closes the
      netdev before unregistering.  That necessitates a contortion in ->stop()
      because the PHY may only be stopped if it hasn't already been
      disconnected.
      
      Reverting the commit allows making the call to phy_stop() unconditional
      in ->stop().
      
      Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500
      Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514
      Signed-off-by: default avatarLukas Wunner <lukas@wunner.de>
      Acked-by: default avatarOliver Neukum <oneukum@suse.com>
      Cc: Martyn Welch <martyn.welch@collabora.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d1408f6b