11 Jun, 2021 · 40 commits
    • Merge branch 'ipa-sysfs' · 1f1aa3fe
      David S. Miller authored
      Alex Elder says:
      
      ====================
      net: ipa: introduce ipa_sysfs.c
      
      This series (its last patch, actually) creates a new source file,
      "ipa_sysfs.c", to contain functions and data that expose to user
      space information known by the IPA driver via device attributes.
      
      The directory containing these files on supported systems is:
          /sys/devices/platform/soc@0/1e40000.ipa
      
      And within that directory, the following files and directories
      are added:
          .
          |-- feature
          |   |-- rx_offload          Type of checksum offload supported
          |   `-- tx_offload
          |   . . .
          |-- modem
          |   |-- rx_endpoint_id      IPA endpoint IDs for the embedded modem
          |   `-- tx_endpoint_id
          |   . . .
          |-- version                 IPA hardware version (informational)
              . . .
      
      The first patch just makes endpoint validation unconditional, as
      suggested by Leon Romanovsky.  The second just ensures the version
      defined in configuration data is valid, so the version attribute
      doesn't have to handle unrecognized version numbers.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: introduce sysfs code · 2e3cf97f
      Alex Elder authored
      Add IPA device attributes to expose information known by the IPA
      driver about the hardware and its configuration.
      
      All pointers used to display these attribute values (i.e., IPA
      pointer and endpoint pointers) will have been initialized by the
      time IPA probe has completed, so they may be safely dereferenced.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: introduce ipa_version_valid() · e22e8e2f
      Alex Elder authored
      Define and use a new function that just validates the version
      defined in configuration data.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
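A minimal sketch of the kind of check ipa_version_valid() performs, assuming a hypothetical subset of enum ipa_version values (the real enum lists more versions). A switch that accepts only known enumerators rejects configuration data with an unrecognized version before the sysfs version attribute ever has to format it.

```c
#include <stdbool.h>

/* Hypothetical subset of IPA hardware versions; the driver's real enum
 * contains more entries. */
enum ipa_version {
	IPA_VERSION_3_0,
	IPA_VERSION_3_5_1,
	IPA_VERSION_4_2,
	IPA_VERSION_4_5,
	IPA_VERSION_4_9,
	IPA_VERSION_4_11,
};

/* Accept only versions listed in the enum.  With no default label,
 * -Wswitch can warn if a new enumerator is added but not handled. */
static bool ipa_version_valid(enum ipa_version version)
{
	switch (version) {
	case IPA_VERSION_3_0:
	case IPA_VERSION_3_5_1:
	case IPA_VERSION_4_2:
	case IPA_VERSION_4_5:
	case IPA_VERSION_4_9:
	case IPA_VERSION_4_11:
		return true;
	}
	return false; /* anything else came from bad configuration data */
}
```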
    • net: ipa: make endpoint data validation unconditional · 9e8fb7bf
      Alex Elder authored
      The cost of validating the endpoint configuration data is not all
      that high, so just do it unconditionally, rather than doing so only
      when IPA_VALIDATION is defined.
      Suggested-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: fix kernel build warning in strncpy · 0b217d3d
      Lijun Pan authored
      drivers/net/ethernet/ibm/ibmvnic.c: In function ‘handle_vpd_rsp’:
      drivers/net/ethernet/ibm/ibmvnic.c:4393:3: warning: ‘strncpy’ output truncated before terminating nul copying 3 bytes from a string of the same length [-Wstringop-truncation]
       4393 |   strncpy((char *)adapter->fw_version, "N/A", 3 * sizeof(char));
            |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: Lijun Pan <lijunp213@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
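For illustration only, here is a userspace sketch of why the warning fires and one way to avoid it. It uses a hypothetical struct fake_adapter in place of the driver's adapter. strncpy() with a length of 3 copies "N/A" without its terminating NUL. In kernel code the usual replacement is strscpy(). This sketch uses snprintf() because it is available in userspace, and the actual ibmvnic patch may differ.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the adapter's firmware-version buffer. */
struct fake_adapter {
	char fw_version[32];
};

/* strncpy(dst, "N/A", 3) writes exactly three bytes and no NUL, which
 * is what -Wstringop-truncation flags.  snprintf() bounded by the
 * destination size always NUL-terminates (for a non-zero size). */
static void set_fw_version_na(struct fake_adapter *adapter)
{
	snprintf(adapter->fw_version, sizeof(adapter->fw_version), "%s", "N/A");
}
```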
    • Merge branch 'sja1105-xpcs' · 2227ec7b
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Port the SJA1105 DSA driver to XPCS
      
      As requested when adding support for the NXP SJA1110, the SJA1105 driver
      could make use of the common XPCS driver, to eliminate some hardware
      specific code duplication.
      
      This series modifies the XPCS driver so that it can accommodate the XPCS
      instantiation from NXP switches, and the SJA1105 driver so it can expose
      what the XPCS driver expects.
      
      Tested on NXP SJA1105S and SJA1110A.
      
      Changes in v3:
      None. This is a resend of v2 which had "changes requested" even though
      there was no direct feedback.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: plug in support for 2500base-x · 56b63466
      Vladimir Oltean authored
      The MAC treats 2500base-x the same as SGMII (yay for that), except that
      it must be set to a different speed.
      
      Extend all places that check for SGMII to also check for 2500base-x.
      
      Also add the missing 2500base-x compatibility matrix entry for SJA1110D.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: SGMII and 2500base-x on the SJA1110 are 'special' · ece578bc
      Vladimir Oltean authored
      For the xMII Mode Parameters Table to be properly configured for SGMII
      mode on SJA1110, we need to set the "special" bit, since SGMII is
      officially bitwise coded as 0b0011 in SJA1105 (decimal 3, equal to
      XMII_MODE_SGMII), and as 0b1011 in SJA1110 (decimal 11).
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
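Here is a small sketch of the arithmetic above. The macro name for the "special" bit is hypothetical. Because 0b1011 differs from 0b0011 only in bit 3, the SJA1110 encoding equals the SJA1105 value with that bit set.

```c
#include <stdbool.h>
#include <stdint.h>

/* From the commit message: SGMII is coded as 0b0011 (3). */
#define XMII_MODE_SGMII       0x3u
/* Hypothetical name for the "special" bit: 0b1011 - 0b0011 = 0b1000. */
#define XMII_MODE_SPECIAL_BIT 0x8u

/* Compute the xMII mode field for an SGMII port on either switch. */
static uint8_t sgmii_xmii_mode(bool is_sja1110)
{
	uint8_t mode = XMII_MODE_SGMII;

	if (is_sja1110)
		mode |= XMII_MODE_SPECIAL_BIT;
	return mode;
}
```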
    • net: dsa: sja1105: register the PCS MDIO bus for SJA1110 · 27871359
      Vladimir Oltean authored
      On the SJA1110, the PCS of each SERDES-capable port is accessed through
      a different memory window which is 0x100 bytes in size, denoted by
      "pcs_base".
      
      In each PCS register access window, the XPCS MMDs are accessed in an
      indirect way: in pages/banks of up to 0x100 addresses each. Changing the
      page/bank is done by writing to a special register at the end of the
      access window.
      
      The MDIO register map accessed indirectly through the indirect banked
      method described above is similar to what SJA1105 has: upper 5 bits are
      the MMD, lower 16 bits are the MDIO address within that MMD.
      
      Since the PHY ID reported by the XPCS inside SJA1110 is also all zeroes
      (like SJA1105), we need to trap those reads and return a fake PHY ID so
      that the xpcs driver can apply some specific fixups for our integration.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
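A sketch of how a register address might be split for the banked access described above. It assumes, hypothetically, that the MMD and register are first packed as (mmd << 16) | regnum and that each bank covers 0x100 consecutive addresses. The real driver also has to turn the offset into an address inside pcs_base and write the bank to the bank-select register at the end of the window. That part is omitted here.

```c
#include <stdint.h>

/* Hypothetical result of splitting an indirect PCS access. */
struct pcs_banked_addr {
	uint32_t bank;   /* value for the bank-select register */
	uint32_t offset; /* address within the selected bank (0x00-0xff) */
};

/* Pack MMD (upper 5 bits) and register (lower 16 bits), then split the
 * result into a bank number and an offset inside that bank. */
static struct pcs_banked_addr pcs_split_addr(uint8_t mmd, uint16_t regnum)
{
	uint32_t addr = ((uint32_t)(mmd & 0x1f) << 16) | regnum;
	struct pcs_banked_addr ba = {
		.bank = addr >> 8,
		.offset = addr & 0xff,
	};

	return ba;
}
```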
    • net: dsa: sja1105: migrate to xpcs for SGMII · 3ad1d171
      Vladimir Oltean authored
      There is a desire to use the generic driver for the Synopsys XPCS
      located in drivers/net/pcs, and to achieve that, the sja1105 driver must
      expose an MDIO bus for the SGMII PCS, because the XPCS probes as an
      mdio_device.
      
      In preparation for the SJA1110, which in fact has a different access
      procedure than the SJA1105, we register this PCS MDIO bus once in the
      common code, but we implement function pointers for the read and write
      methods. In this patch there is a single implementation for them.
      
      There is exactly one MDIO bus for the PCS; it contains all PCSes at
      MDIO addresses equal to the port number.
      
      We delete a bunch of hardware support code because the xpcs driver
      already does what we need.
      
      We need to hack up the MDIO reads for the PHY ID, since our XPCS
      instantiation returns zeroes and there are some specific fixups which
      need to be applied by the xpcs driver.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
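Here is a simplified sketch of the PHY ID trap. It ignores MMD selection. Reads of the standard Clause 22 ID registers (2 and 3) return a made-up ID, and every other read goes to a stand-in hw_read(). FAKE_XPCS_PHY_ID is hypothetical. It encodes OUI 0 and model number 1 in the usual PHY ID layout (model in bits 9:4), mirroring the "OUI is zero, device ID is 1" description in dd0721ea, but it is not taken from the driver.

```c
#include <stdint.h>

#define MII_PHYSID1 0x02 /* Clause 22 PHY ID register 2 */
#define MII_PHYSID2 0x03 /* Clause 22 PHY ID register 3 */

/* Hypothetical fake ID: OUI 0, model number 1 (bits 9:4), revision 0. */
#define FAKE_XPCS_PHY_ID 0x00000010u

/* Stand-in for the real bus access: the ID registers read as zero, as
 * described above; register 0 returns an arbitrary non-zero value. */
static uint16_t hw_read(int reg)
{
	return reg == 0x00 ? 0x1140 : 0x0000;
}

/* MDIO read hook: trap PHY ID reads, pass everything else through. */
static uint16_t pcs_mdio_read(int reg)
{
	switch (reg) {
	case MII_PHYSID1:
		return (uint16_t)(FAKE_XPCS_PHY_ID >> 16);
	case MII_PHYSID2:
		return (uint16_t)(FAKE_XPCS_PHY_ID & 0xffff);
	default:
		return hw_read(reg);
	}
}
```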
    • net: pcs: xpcs: export xpcs_do_config and xpcs_link_up · a853c68e
      Vladimir Oltean authored
      The sja1105 hardware has a quirk in that some changes require a switch
      reset, which loses all configuration. When the reset is initiated,
      everything needs to be reprogrammed, including the MACs and the PCS.
      This is currently done in sja1105_static_config_reload() - we manually
      call sja1105_adjust_port_config(), sja1105_sgmii_pcs_config() and
      sja1105_sgmii_pcs_force_speed() which are all internal functions.
      
      There is a desire for sja1105 to use the common xpcs driver, and that
      means that the equivalents of those functions, xpcs_do_config() and
      xpcs_link_up() respectively, will no longer be local functions.
      
      Forcing phylink to retrigger a resolve somehow, say by doing dev_close()
      followed by dev_open(), is not really an option, because the CPU port
      might have a PCS as well, and there is no net device which we can close
      and reopen for that. Additionally, the dev_close/dev_open sequence might
      force a renegotiation of the copper-side link for SGMII ports connected
      to a PHY, and this is undesirable as well, because the switch reset is
      much quicker than a PHY autoneg, so we would have a lot more downtime.
      
      The only solution I see is for the sja1105 driver to keep doing what
      it's doing, and that means we need to export the equivalents from xpcs
      for sja1105_sgmii_pcs_config and sja1105_sgmii_pcs_force_speed, and call
      them directly in sja1105_static_config_reload(). This will be done
      during the conversion patch.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: pcs: xpcs: add support for NXP SJA1110 · f7380bba
      Vladimir Oltean authored
      The NXP SJA1110 switch integrates its own, non-Synopsys PMA, but it
      manages it through the register space of the XPCS itself, in a small
      register window inside MDIO_MMD_VEND2 from address 0x8030 to 0x806e.
      
      This coincides with where the registers for the default Synopsys PMA
      are, but the register definitions are of course not the same.
      
      This situation is an odd hardware quirk, but the simplest way to manage
      it is to drive the SJA1110's PMA from within the XPCS driver.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
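A minimal sketch of the address check this implies, using a hypothetical helper name. MDIO_MMD_VEND2 has the value 31, as in <linux/mdio.h>. The window bounds come straight from the text above. A check like this would let a driver route an access to SJA1110-specific PMA code instead of the default Synopsys PMA handling.

```c
#include <stdbool.h>
#include <stdint.h>

#define MDIO_MMD_VEND2 31 /* same value as in <linux/mdio.h> */

/* Inclusive window holding the SJA1110's own PMA registers. */
#define SJA1110_PMA_FIRST 0x8030
#define SJA1110_PMA_LAST  0x806e

/* True if (mmd, reg) falls inside the SJA1110 PMA register window. */
static bool is_sja1110_pma_reg(int mmd, uint16_t reg)
{
	return mmd == MDIO_MMD_VEND2 &&
	       reg >= SJA1110_PMA_FIRST && reg <= SJA1110_PMA_LAST;
}
```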
    • net: pcs: xpcs: add support for NXP SJA1105 · dd0721ea
      Vladimir Oltean authored
      The NXP SJA1105 DSA switch integrates a Synopsys SGMII XPCS on port 4.
      The generic code works fine, except there is an integration issue which
      needs to be dealt with: in this switch, the XPCS is integrated with a
      PMA that has the TX lane polarity inverted by default (PLUS is MINUS,
      MINUS is PLUS).
      
      To obtain normal non-inverted behavior, the TX lane polarity must be
      inverted in the PCS, via the DIGITAL_CONTROL_2 register.
      
      We introduce a pma_config() method in xpcs_compat which is called by the
      phylink_pcs_config() implementation.
      
      Also, the NXP SJA1105 returns all zeroes in the PHY ID registers 2 and 3.
      We need to hack up an ad-hoc PHY ID (OUI is zero, device ID is 1) in
      order for the XPCS driver to recognize it. This PHY ID is added to the
      public include/linux/pcs/pcs-xpcs.h for that reason (for the sja1105
      driver to be able to use it in a later patch).
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
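Here is a sketch of the pma_config() idea as a read-modify-write, with the register simulated by a variable. The bit position of the TX polarity-invert flag in DIGITAL_CONTROL_2 is an assumption, and DIG_CTRL2_TX_POL_INV is a hypothetical name. Consult the XPCS register description for the real layout.

```c
#include <stdint.h>

/* Hypothetical bit for "invert TX lane polarity" in DIGITAL_CONTROL_2. */
#define DIG_CTRL2_TX_POL_INV (1u << 4)

/* Simulated register standing in for MDIO accesses. */
static uint16_t fake_dig_ctrl2;

static uint16_t reg_read(void) { return fake_dig_ctrl2; }
static void reg_write(uint16_t val) { fake_dig_ctrl2 = val; }

/* The PMA inverts TX by default, so invert it again in the PCS.
 * Read-modify-write preserves the register's other bits. */
static void sja1105_like_pma_config(void)
{
	uint16_t val = reg_read();

	val |= DIG_CTRL2_TX_POL_INV;
	reg_write(val);
}
```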
    • net: pcs: xpcs: also ignore phy id if it's all ones · 36641b04
      Vladimir Oltean authored
      xpcs_get_id() searches multiple MMDs for a known PHY ID, starting with
      MDIO_MMD_PCS (3). However, not all integrators might have implemented
      that MMD on their MDIO bus. For example, the NXP SJA1105 and SJA1110
      switches only implement vendor-specific MMD 1 and 2.
      
      When there is nothing on an MDIO bus at a certain address, traditionally
      the bus returns 0xffff, which means that the bus remained in its default
      pull-up state for the duration of the MDIO transaction. The 0xffff value
      is widely used in drivers/net/phy/phy_device.c (see get_phy_c22_id for
      example) to denote a missing device.
      
      So it makes sense for the xpcs to ignore this value as well, and
      continue its search, eventually finding the proper PHY ID in the
      vendor-specific MMDs.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
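A sketch of the search loop with the all-ones case added. It assumes a simulated read_phy_id() and a PCS, VEND1, VEND2 search order chosen for illustration; the driver's actual order and helper names may differ. Both 0 and 0xffffffff are treated as "nothing here", so the search falls through to the vendor MMD.

```c
#include <stddef.h>
#include <stdint.h>

#define MDIO_MMD_PCS   3
#define MDIO_MMD_VEND1 30
#define MDIO_MMD_VEND2 31

/* Simulated bus: unimplemented MMDs float to all ones, and only the
 * vendor MMD answers.  The ID value is made up. */
static uint32_t read_phy_id(int mmd)
{
	switch (mmd) {
	case MDIO_MMD_VEND2:
		return 0x00000010;
	default:
		return 0xffffffff;
	}
}

/* Return the first plausible ID, skipping all-zeroes and all-ones.
 * Returns 0xffffffff if no MMD provides a usable ID. */
static uint32_t find_phy_id(void)
{
	static const int mmds[] = { MDIO_MMD_PCS, MDIO_MMD_VEND1, MDIO_MMD_VEND2 };

	for (size_t i = 0; i < sizeof(mmds) / sizeof(mmds[0]); i++) {
		uint32_t id = read_phy_id(mmds[i]);

		if (id != 0 && id != 0xffffffff)
			return id;
	}
	return 0xffffffff;
}
```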
    • net: pcs: xpcs: add support for sgmii with no inband AN · 2031c09e
      Vladimir Oltean authored
      In fixed-link use cases, the XPCS can disable the clause 37 in-band
      autoneg process, disable the "Automatic Speed Mode Change after CL37 AN"
      setting, and force operation in a speed dictated by management.
      
      Add support for this operating mode.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: pcs: xpcs: move register bit descriptions to a header file · d4433d5b
      Vladimir Oltean authored
      Vendors which integrate the DesignWare XPCS might modify a few things
      here and there, and to support those, it's best to create separate C
      files in order to not clutter up the main pcs-xpcs.c.
      
      Because the vendor files might want to access the common xpcs registers
      too, let's move them into a header file which is local to this driver and
      can be included by vendor files as appropriate.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: reduce indentation when calling stmmac_xpcs_setup · 7413f9a6
      Vladimir Oltean authored
      There is no reason to embed an if within an if, we can just logically
      AND the two conditions.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: reverse Christmas tree notation in stmmac_xpcs_setup · 47538dbe
      Vladimir Oltean authored
      Reorder the variable declarations in descending line length order,
      according to the networking coding style.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: pcs: xpcs: rename mdio_xpcs_args to dw_xpcs · 5673ef86
      Vladimir Oltean authored
      The struct mdio_xpcs_args is reminiscent of when a similarly named
      struct mdio_xpcs_ops existed. Now that that is removed, we can shorten
      the name to dw_xpcs (dw for DesignWare).
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'rmnet-checksums-part-1' · a6e49699
      David S. Miller authored
      Alex Elder says:
      
      ====================
      net: qualcomm: rmnet: MAPv4 download checksum cleanup, part 1
      
      I'm posting a large series as two smaller parts; this is part 1.
      
      The RMNet driver handles MAP (or QMAP) protocol traffic.  There are
      several versions of this protocol.  Version 1 supports multiplexing,
      as well as aggregation of packets in a single buffer.  Version 4
      adds the ability to perform checksum offload.  And version 5
      implements checksum offload in a different way from version 4.
      
      This series involves only MAPv4 protocol checksum offload, and only
      in the download (RX) direction.  It affects handling of checksums
      computed by hardware for UDP datagrams and TCP segments, carried
      over both IPv4 and IPv6.
      
      MAP packets arriving on an RMNet port implementing MAPv4 checksum
      offload are passed to rmnet_map_checksum_downlink_packet() for
      handling.
      
      The packet is then passed to rmnet_map_ipv4_dl_csum_trailer() or
      rmnet_map_ipv6_dl_csum_trailer(), depending on the contents of the MAP
      payload.  These two functions interpret checksum metadata to
      determine whether the checksum in the received packet matches that
      calculated by the hardware.
      
      It is these two functions that are the subject of this series (parts
      1 and 2).  The bulk of these functions are transformed--in a lot of
      small steps--from an extremely difficult-to-follow block of checksum
      processing code into a fairly simple, heavily commented equivalent.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qualcomm: rmnet: avoid unnecessary IPv6 byte-swapping · 23a5708d
      Alex Elder authored
      In the previous patch IPv4 download checksum offload code was
      updated to avoid unnecessary byte swapping, based on properties of
      the Internet checksum algorithm.  This patch makes comparable
      changes to the IPv6 download checksum offload handling.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qualcomm: rmnet: avoid unnecessary byte-swapping · a2918a16
      Alex Elder authored
      Internet checksums are used for IPv4 header checksum, as well as TCP
      segment and UDP datagram checksums.  Such a checksum represents the
      negated sum of adjacent pairs of bytes, using ones' complement
      arithmetic.
      
      One property of the Internet checksum is byte order independence [1].
      Specifically, the sum of byte-swapped pairs is equal to the result
      of byte swapping the sum of those same pairs when not byte-swapped.
      
      So for example if a, b, c, d, y, and z are hexadecimal digits, and
      PLUS represents ones' complement addition:
          If:		ab PLUS cd = yz
          Then:	ba PLUS dc = zy
      
      For this reason, there is no need to swap the order of bytes in the
      checksum value held in a message header, nor the one in the QMAPv4
      trailer, in order to operate on them.
      
      In other words, we can determine whether the hardware-computed
      checksum matches the one in the message header without any byte
      swaps.
      
      (This patch leaves in place all existing type casts.)
      
      [1] https://tools.ietf.org/html/rfc1071
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
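Here is a small C sketch of the byte-order independence property from RFC 1071. It uses a plain 16-bit ones' complement sum with end-around carry rather than the kernel's csum helpers. Summing byte-swapped words gives the byte-swapped sum of the original words, so neither value needs swapping before the comparison.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones' complement sum with end-around carry (RFC 1071). */
static uint16_t ones_sum(const uint16_t *words, size_t n)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++) {
		sum += words[i];
		sum = (sum & 0xffff) + (sum >> 16); /* fold the carry back in */
	}
	return (uint16_t)sum;
}

static uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Returns 1 if sum(swap(w)) == swap(sum(w)) for up to 16 words. */
static int byte_order_independent(const uint16_t *words, size_t n)
{
	uint16_t swapped[16];

	if (n > 16)
		return 0;
	for (size_t i = 0; i < n; i++)
		swapped[i] = swap16(words[i]);
	return ones_sum(swapped, n) == swap16(ones_sum(words, n));
}
```

For example, 0xab12 PLUS 0xcd34 is 0x7847, and 0x12ab PLUS 0x34cd is 0x4778, its byte swap.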
    • net: qualcomm: rmnet: clarify a bit of code · e5adbbdf
      Alex Elder authored
      In rmnet_map_ipv6_dl_csum_trailer() there is an especially involved
      line of code that determines the ones' complement sum of the IPv6
      packet header (in host byte order).  Simplify that by storing the
      result of computing just the header checksum in a local variable,
      then using that in the original assignment.
      
      Use the size of the IPv6 header structure as the number of bytes to
      checksum, rather than computing the offset to the transport header.
      And use ip_fast_csum() rather than ip_compute_csum(), knowing that
      the size of an IPv6 header (40 bytes) is a multiple of 4 bytes
      greater than 16.
      
      Add some comments to match rmnet_map_ipv4_dl_csum_trailer().
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qualcomm: rmnet: IPv4 header has zero checksum · 16bf3d33
      Alex Elder authored
      In rmnet_map_ipv4_dl_csum_trailer(), an illegal checksum subtraction
      is done, subtracting hdr_csum (in host byte order) from csum_value (in
      network byte order).  Despite being illegal, it generally works,
      because it turns out the value subtracted is (or should be) always 0,
      which has the same representation in either byte order.
      
      Doing illegal operations is not good form though, so fix this by
      verifying the IP header checksum early in that function.  If its
      checksum is non-zero, the packet will be bad, so just return an
      error.  This will cause the packet to be passed to the IP layer where
      it can be dropped.
      
      Thereafter, there is no need to subtract the IP header checksum from
      the checksum value in the trailer because we know it is zero.
      Add a comment explaining this.
      
      This type of packet error is different from other types, so add a
      new statistics counter to track this condition.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
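A plain-C sketch of the property this patch relies on. It is not the rmnet code itself, which uses the kernel's ip_fast_csum(). Summing an IPv4 header including its checksum field gives 0xffff when the checksum is correct, so the negated sum is zero. The sample header in the check below is a commonly used textbook example.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones' complement sum over header bytes in network order; len must be
 * even (an IPv4 header is ihl * 4 bytes). */
static uint16_t csum_bytes(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2) {
		sum += (uint32_t)((p[i] << 8) | p[i + 1]);
		sum = (sum & 0xffff) + (sum >> 16); /* end-around carry */
	}
	return (uint16_t)sum;
}

/* A valid header, checksum field included, sums to 0xffff; its
 * complement (the "computed checksum") is therefore zero. */
static int ipv4_hdr_csum_ok(const uint8_t *hdr, size_t len)
{
	return (uint16_t)~csum_bytes(hdr, len) == 0;
}
```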
    • net: qualcomm: rmnet: simplify rmnet_map_get_csum_field() · 874a333f
      Alex Elder authored
      The checksum fields of the TCP and UDP header structures already
      have type __sum16.  We don't support any other protocol headers, so
      we can simplify rmnet_map_get_csum_field(), getting rid of the local
      variable entirely and just returning the appropriate address.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qualcomm: rmnet: get rid of some local variables · 1d257f45
      Alex Elder authored
      The value passed as an argument to rmnet_map_ipv4_ul_csum_header()
      is always an IPv4 header.  Rather than using a local variable, just
      have the type of the argument reflect the proper type.
      
      In rmnet_map_ipv6_ul_csum_header() things are defined a little
      differently, but make the same basic change there.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qualcomm: rmnet: eliminate some ifdefs · 75db5b07
      Alex Elder authored
      If IPV6 is not enabled in the kernel configuration, the RMNet
      checksum code indicates a buffer containing an IPv6 packet is not
      supported.  The same thing happens if a buffer contains something
      other than an IPv4 or IPv6 packet.
      
      We can rearrange things a bit in two functions so that some #ifdef
      blocks can simply be eliminated.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qualcomm: rmnet: use ip_is_fragment() · e4517d8a
      Alex Elder authored
      In rmnet_map_ipv4_dl_csum_trailer() use ip_is_fragment() to
      determine whether a socket buffer contains a packet fragment.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
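For reference, here is a userspace sketch of the test ip_is_fragment() performs. It works on a host-order frag_off value, whereas the kernel helper applies htons() to the masks and reads the network-order field. IP_MF and IP_OFFSET use the values from the IPv4 header format. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Masks for the IPv4 frag_off field (host byte order in this sketch). */
#define IP_MF     0x2000 /* "more fragments" flag */
#define IP_OFFSET 0x1fff /* fragment offset, in 8-byte units */

/* A packet is a fragment if MF is set (not the last piece) or the
 * offset is non-zero (not the first piece).  DF alone does not count. */
static bool ip_is_fragment_host(uint16_t frag_off)
{
	return (frag_off & (IP_MF | IP_OFFSET)) != 0;
}
```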
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · 9e4e1dd4
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Jake Keller says:
      
      ====================
      100GbE Intel Wired LAN Driver Updates 2021-06-11
      
      Extend the ice driver to support basic PTP clock functionality for E810
      devices.
      
      This includes some tangential work required to set up the sideband queue and
      driver shared parameters as well.
      
      This series only supports E810-based devices. This is because other devices
      based on the E822 MAC use a different and more complex PHY.
      
      The low level device functionality is kept within ice_ptp_hw.c and is
      designed to be extensible for supporting E822 devices in a future series.
      
      This series also only supports very basic functionality including the
      ptp_clock device and timestamping. Support for configuring periodic outputs
      and external input timestamps will be implemented in a future series.
      
      There are a couple of potential "what? why?" bits in this series I want to
      point out:
      
      1) the PTP hardware functionality is shared between multiple functions. This
      means that the same clock registers are shared across multiple PFs. In order
      to avoid contention or clashing between PFs, firmware assigns "ownership" to
      one PF, while other PFs are merely "associated" with the timer. Because we
      share the hardware resource, only the clock owner will allocate and register
      a PTP clock device. Other PFs determine the appropriate PTP clock index to
      report by using a firmware interface to read a shared parameter that is set
      by the owning PF.
      
      2) the ice driver uses its own kthread instead of using do_aux_work. This is
      because the periodic and asynchronous tasks are necessary for all PFs, but
      only one PF will allocate the clock.
      
      The series is broken up into functional pieces to allow easy review.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'virtio-vsock-seqpacket' · 5aa3bd9b
      David S. Miller authored
      Arseny Krasnov says:
      
      ====================
      virtio/vsock: introduce SOCK_SEQPACKET support
      
      This patchset implements support for SOCK_SEQPACKET in the virtio
      transport.
      	Because SOCK_SEQPACKET must preserve record boundaries, a new bit
      was added to the 'flags' field: SEQ_EOR. This bit is set to 1 in the
      last RW packet of a message.
      	Since packets of one socket are reordered neither by the vsock nor
      by the vhost transport layer, this bit allows the original message
      to be restored on the receiver's side. If the user's buffer is
      smaller than the message length, all data that does not fit is
      dropped.
      	The maximum length of a datagram is limited by the
      'peer_buf_alloc' value.
      	The implementation also supports the 'MSG_TRUNC' flag.
      	Tests are also implemented.
      
      	Thanks to stsp2@yandex.ru for encouragement and initial design
      recommendations.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
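A simplified, hypothetical model of the receive side described above. Packets are consumed until one carries the EOR bit. Bytes beyond the user's buffer are dropped, and the full record length is returned so a caller could report it the way MSG_TRUNC does. The SEQ_EOR value and struct rx_pkt are assumptions. The real transport works on queued virtio packets and also handles credit updates.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SEQ_EOR 0x1u /* hypothetical value for the end-of-record bit */

struct rx_pkt {
	const char *data;
	size_t len;
	uint32_t flags;
};

/* Copy one record from q[0..nq) into buf.  Stops after the packet that
 * carries SEQ_EOR; bytes that do not fit in buf are dropped.  Returns
 * the full record length, sets *truncated if data was dropped, and
 * reports in *consumed how many packets were taken from the queue. */
static size_t seqpacket_recv(const struct rx_pkt *q, size_t nq,
			     char *buf, size_t buflen,
			     bool *truncated, size_t *consumed)
{
	size_t copied = 0, total = 0, i;

	for (i = 0; i < nq; i++) {
		size_t room = buflen - copied;
		size_t n = q[i].len < room ? q[i].len : room;

		if (n)
			memcpy(buf + copied, q[i].data, n);
		copied += n;
		total += q[i].len;
		if (q[i].flags & SEQ_EOR) {
			i++; /* count the EOR packet as consumed */
			break;
		}
	}
	*consumed = i;
	*truncated = total > copied;
	return total;
}
```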
    • virtio/vsock: update trace event for SEQPACKET · 184039ee
      Arseny Krasnov authored
      Add SEQPACKET socket type to vsock trace event.
      Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vsock_test: add SOCK_SEQPACKET tests · 41b792d7
      Arseny Krasnov authored
      Implement two tests of SOCK_SEQPACKET sockets: the first sends data
      with several 'write()' calls and checks that the number of 'read()'
      calls is the same. The second test checks the MSG_TRUNC flag. Cases
      for connect(), bind(), etc. are not tested, because they behave the
      same as for stream sockets.
      Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
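The vsock tests need a vsock transport, so this sketch substitutes an AF_UNIX SOCK_SEQPACKET socketpair (Linux), which has the same record-boundary semantics, to show what the two tests check. Three write() calls are drained by exactly three reads, and a too-small buffer yields a short read with MSG_TRUNC set in msg_flags. It is an analogue, not the vsock_test code.

```c
#define _GNU_SOURCE
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Test 1 analogue: three records written, count reads needed to drain
 * them.  MSG_DONTWAIT ends the loop once the queue is empty. */
static int count_reads_for_three_writes(void)
{
	int sv[2], reads = 0;
	char buf[64];

	if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0)
		return -1;
	if (write(sv[0], "a", 1) == 1 && write(sv[0], "bb", 2) == 2 &&
	    write(sv[0], "ccc", 3) == 3) {
		while (recv(sv[1], buf, sizeof(buf), MSG_DONTWAIT) > 0)
			reads++;
	}
	close(sv[0]);
	close(sv[1]);
	return reads;
}

/* Test 2 analogue: a 10-byte record read into a 4-byte buffer should
 * return 4 bytes and set MSG_TRUNC in msg_flags.  Returns 1 on success. */
static int truncation_reported(void)
{
	int sv[2], ok = 0;
	char small[4];
	struct iovec iov = { .iov_base = small, .iov_len = sizeof(small) };
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };

	if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0)
		return -1;
	if (write(sv[0], "0123456789", 10) == 10) {
		ssize_t n = recvmsg(sv[1], &msg, MSG_DONTWAIT);

		ok = (n == 4 && (msg.msg_flags & MSG_TRUNC));
	}
	close(sv[0]);
	close(sv[1]);
	return ok;
}
```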
    • vsock/loopback: enable SEQPACKET for transport · 6e90a577
      Arseny Krasnov authored
      Add SEQPACKET ops for loopback transport and 'seqpacket_allow()'
      callback.
      Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vhost/vsock: support SEQPACKET for transport · ced7b713
      Arseny Krasnov authored
      When a received packet is copied to the guest's rx queue, the data
      buffers of the rx queue could be smaller than the data buffer of the
      input packet, so the data of the input packet is split across several
      rx buffers, and each rx buffer becomes a packet with a dynamically
      created header. Fields of such a header are initialized from the
      header of the input packet (except the length field, whose value
      depends on the number of bytes copied to the rx buffer). But in the
      SEQPACKET case, we also need to take care of the record delimiter
      bit: if the input packet has this bit set, we don't copy it to the
      header of a packet in an rx buffer, except when that rx buffer holds
      the last part of the input packet. Otherwise, we would get a
      sequence of packets with the delimiter bit set, thus breaking record
      bounds.
      Also stop ignoring non-stream packet types, and handle the SEQPACKET
      feature bit.
      Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
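A hypothetical sketch of the header rule described above. An input packet is split across rx buffers of a fixed size, every fragment inherits the original flags, and only the last fragment keeps the record delimiter bit. The names, the SEQ_EOR value, and the fixed buffer size are assumptions. In this sketch a zero-length packet produces no fragments.

```c
#include <stddef.h>
#include <stdint.h>

#define SEQ_EOR 0x1u /* hypothetical value for the record delimiter bit */

struct frag_hdr {
	uint32_t len;
	uint32_t flags;
};

/* Split a pkt_len-byte packet into rx buffers of at most buf_size bytes,
 * writing one header per buffer (at most max_hdrs).  SEQ_EOR is cleared
 * on every fragment except the one that completes the packet.
 * Returns the number of headers written. */
static size_t split_pkt(uint32_t pkt_len, uint32_t pkt_flags,
			uint32_t buf_size, struct frag_hdr *hdrs,
			size_t max_hdrs)
{
	size_t n = 0;
	uint32_t off = 0;

	while (off < pkt_len && n < max_hdrs) {
		uint32_t left = pkt_len - off;
		uint32_t chunk = left < buf_size ? left : buf_size;

		off += chunk;
		hdrs[n].len = chunk;
		hdrs[n].flags = pkt_flags & ~SEQ_EOR;
		if (off == pkt_len)
			hdrs[n].flags |= pkt_flags & SEQ_EOR;
		n++;
	}
	return n;
}
```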
    • virtio/vsock: enable SEQPACKET for transport · 53efbba1
      Arseny Krasnov authored
      To make the transport work with SOCK_SEQPACKET, add two things:
      1) SOCK_SEQPACKET ops for the virtio transport and a 'seqpacket_allow()'
         callback.
      2) Handling of the SEQPACKET feature bit: the guest tries to negotiate
         it with vhost, so the feature is enabled only if the bit is
         negotiated with the device.
      Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio/vsock: rest of SOCK_SEQPACKET support · 9ac841f5
      Arseny Krasnov authored
      Small updates to make SOCK_SEQPACKET work:
      1) Send SHUTDOWN on socket close for SEQPACKET type.
      2) Set SEQPACKET packet type during send.
      3) Set 'VIRTIO_VSOCK_SEQ_EOR' bit in flags for last
         packet of message.
      4) Implement data check function for SEQPACKET.
      5) Check for max datagram size.
      Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio/vsock: add SEQPACKET receive logic · e4b1ef15
      Arseny Krasnov authored
      Update the current receive logic for SEQPACKET support: check the
      packet and socket types on receive (on a mismatch, reset the
      connection), and increment the EOR counter on receive. Also, if the
      buffer of a new packet was appended to the buffer of the last packet
      in the rx queue, update the flags of the last packet with the flags
      of the new packet.
      Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio/vsock: dequeue callback for SOCK_SEQPACKET · 44931195
      Arseny Krasnov authored
      The callback fetches RW packets from the socket's rx queue until the
      whole record is copied (if the user's buffer is full, the user is not
      woken up). This avoids stalling the sender: if we woke up the user and
      it left the syscall, nobody would send a credit update for the rest
      of the record, and the sender would wait until the receiver enters
      the read syscall again. So if the user buffer is full, we just send
      a credit update and drop the data.
      Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio/vsock: defines and constants for SEQPACKET · f07b2a5b
      Arseny Krasnov authored
      Add set of defines and constants for SOCK_SEQPACKET support
      in vsock.
      Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio/vsock: simplify credit update function API · c10844c5
      Arseny Krasnov authored
      This function is static and its 'hdr' argument was always NULL.
      Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>