- 14 Feb, 2022 8 commits
-
-
Colin Foster authored
The ocelot_update_stats function only needs to read from one port, yet it was updating the stats for all ports. Update it to read only the stats that are necessary. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
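A minimal user-space sketch of the per-port idea, assuming a hypothetical switch with `NUM_PORTS` ports and `NUM_COUNTERS` counters per port. `read_hw_counter()`, `update_port_stats()` and the arrays are illustrative stand-ins, not the ocelot driver's code, and the counter wraparound handling a real driver needs is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PORTS    4  /* hypothetical switch size */
#define NUM_COUNTERS 3  /* hypothetical counters per port */

/* Hypothetical stand-ins: a real driver reads hardware registers. */
static uint32_t hw_counters[NUM_PORTS][NUM_COUNTERS];
static uint64_t sw_stats[NUM_PORTS][NUM_COUNTERS];
static unsigned int hw_reads;  /* number of counter reads issued */

static uint32_t read_hw_counter(int port, int idx)
{
	hw_reads++;
	return hw_counters[port][idx];
}

/* Refresh only the requested port instead of looping over every port. */
static void update_port_stats(int port)
{
	for (int i = 0; i < NUM_COUNTERS; i++)
		sw_stats[port][i] = read_hw_counter(port, i);
}
```

Refreshing one port costs `NUM_COUNTERS` reads instead of `NUM_PORTS * NUM_COUNTERS`, and the other ports' cached values are left untouched.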
-
David Ahern authored
Add reasons to __udp6_lib_rcv for skb drops. The only twist is that NO_SOCKET takes precedence over the CSUM and other counters on that path (the motivation behind this patch: the csum counter was misleading). Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
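As a hedged illustration of the precedence rule, the classifier below uses hypothetical `DROP_*` names rather than the kernel's own drop-reason enum, and reports a missing socket even when the checksum is also bad:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the kernel has its own drop-reason enum. */
enum drop_reason {
	DROP_NONE,
	DROP_NO_SOCKET,
	DROP_UDP_CSUM,
};

/* Classify a received datagram; a missing socket wins over a bad checksum. */
static enum drop_reason classify_udp_drop(bool have_socket, bool csum_ok)
{
	if (!have_socket)
		return DROP_NO_SOCKET;
	if (!csum_ok)
		return DROP_UDP_CSUM;
	return DROP_NONE;
}
```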
-
David S. Miller authored
Joseph CHAMG says: ==================== ADD DM9051 ETHERNET DRIVER DM9051 is an SPI interface chip; it needs CS/MOSI/MISO/clock lines plus an interrupt GPIO pin ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joseph CHAMG authored
Add the Davicom DM9051 SPI Ethernet driver. The driver works on device platforms that have an SPI master. Signed-off-by: Joseph CHAMG <josright123@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joseph CHAMG authored
This is a new YAML-based binding file for configuring the Davicom DM9051 via device tree. Signed-off-by: Joseph CHAMG <josright123@gmail.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tony Lu authored
The previous patch introduces a lock-free version of smc_tx_work() to solve unnecessary lock contention; the function is expected to be called with the lock held. So this adds a comment to remind people to keep an eye out for locks. Suggested-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kalash Nainwal authored
Generate an RTM_NEWROUTE netlink notification when the route preference changes on an existing kernel-generated default route in response to RA messages. Currently, netlink notifications are generated only when this route is added or deleted, but not when the route preference changes, which can cause userspace routing application state to go out of sync with the kernel. Signed-off-by: Kalash Nainwal <kalash@arista.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
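A minimal sketch of the behavior change, assuming a hypothetical route record and a counter standing in for sending RTM_NEWROUTE. The enum values are illustrative only and do not match the RFC 4191 on-wire encoding.

```c
#include <assert.h>

/* Hypothetical router preference values (not the RFC 4191 encoding). */
enum ra_pref { PREF_LOW, PREF_MEDIUM, PREF_HIGH };

struct default_route {
	enum ra_pref pref;
};

static int notifications;  /* stands in for sending RTM_NEWROUTE */

static void notify_newroute(const struct default_route *rt)
{
	(void)rt;
	notifications++;
}

/* Update an existing RA-learned default route; notify only on a real change. */
static void ra_update_pref(struct default_route *rt, enum ra_pref pref)
{
	if (rt->pref == pref)
		return;
	rt->pref = pref;
	notify_newroute(rt);
}
```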
-
Davide Caratti authored
In current Linux, MTU policing does not take into account that packets at TC ingress have the L2 header pulled. Thus, the same TC police action (with the same value of tcfp_mtu) behaves differently on ingress and egress. In addition, the full GSO size is compared to tcfp_mtu; as a consequence, the policer drops GSO packets even when the individual segments have an L2 + L3 + L4 + payload length below the configured value of tcfp_mtu. Improve the accuracy of MTU policing as follows: - account for mac_len for non-GSO packets at TC ingress. - compare the MTU threshold with the segmented size for GSO packets. Also, add a kselftest that verifies the correct behavior. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
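The two adjustments can be sketched as a length check. This is a simplified model, not the act_police code: `struct pkt_sketch` and its fields are hypothetical, and `gso_seg_len` is assumed to already hold the L2 + L3 + L4 + payload length of one segment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical packet summary; not struct sk_buff. */
struct pkt_sketch {
	unsigned int len;          /* bytes currently counted in the buffer */
	unsigned int mac_len;      /* L2 header length */
	bool at_ingress;           /* L2 header already pulled (TC ingress) */
	bool is_gso;
	unsigned int gso_seg_len;  /* L2 + L3 + L4 + payload of one segment */
};

/* Return true if the packet conforms to the policer's MTU. */
static bool police_mtu_ok(const struct pkt_sketch *p, unsigned int mtu)
{
	unsigned int len;

	if (p->is_gso)
		return p->gso_seg_len <= mtu;  /* judge each segment, not the total */

	len = p->len;
	if (p->at_ingress)
		len += p->mac_len;  /* add back the pulled L2 header */
	return len <= mtu;
}
```

With this model, the same 1514-byte frame passes the same `mtu` on both ingress and egress, and a 64 KB GSO packet built from 1514-byte segments is no longer dropped.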
-
- 13 Feb, 2022 10 commits
-
-
Kees Cook authored
With GCC 12, -Wstringop-overread was warning about an implicit cast from char[6] to char[8]. However, the extra 2 bytes are always thrown away, alignment doesn't matter, and the risk of hitting the edge of unallocated memory has been accepted, so this prototype can just be converted to a regular char *. Silences:
    net/core/dev.c: In function ‘bpf_prog_run_generic_xdp’:
    net/core/dev.c:4618:21: warning: ‘ether_addr_equal_64bits’ reading 8 bytes from a region of size 6 [-Wstringop-overread]
     4618 |         orig_host = ether_addr_equal_64bits(eth->h_dest, skb->dev->dev_addr);
          |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    net/core/dev.c:4618:21: note: referencing argument 1 of type ‘const u8[8]’ {aka ‘const unsigned char[8]’}
    net/core/dev.c:4618:21: note: referencing argument 2 of type ‘const u8[8]’ {aka ‘const unsigned char[8]’}
    In file included from net/core/dev.c:91:
    include/linux/etherdevice.h:375:20: note: in a call to function ‘ether_addr_equal_64bits’
      375 | static inline bool ether_addr_equal_64bits(const u8 addr1[6+2],
          |                    ^~~~~~~~~~~~~~~~~~~~~~~
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de> Tested-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/netdev/20220212090811.uuzk6d76agw2vv73@pengutronix.de Cc: Jakub Kicinski <kuba@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
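The reason the extra 2 bytes can be thrown away is easiest to see in a portable sketch of the 64-bit comparison trick. `mac_equal_64bits()` is a hypothetical user-space helper, not the kernel's `ether_addr_equal_64bits()`; it assumes both buffers have 2 readable bytes past the 6-byte address, which is the same contract the commit message relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Compare two 6-byte MAC addresses with one 8-byte load per address.
 * Both buffers must have 2 extra readable bytes after the address;
 * those bytes are shifted out before the result is tested.
 */
static bool mac_equal_64bits(const uint8_t *addr1, const uint8_t *addr2)
{
	const uint16_t probe = 1;
	uint64_t a, b, fold;

	memcpy(&a, addr1, sizeof(a));  /* reads 8 bytes, not 6 */
	memcpy(&b, addr2, sizeof(b));
	fold = a ^ b;                  /* nonzero bits mark differing bytes */

	if (*(const uint8_t *)&probe == 1)  /* little-endian host */
		return (fold << 16) == 0;   /* bytes 6..7 are the top 16 bits */
	return (fold >> 16) == 0;           /* big-endian: bytes 6..7 are the low 16 bits */
}
```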
-
Horatiu Vultur authored
When CONFIG_IPV6 is not set, linking the lan966x driver fails with the following error: drivers/net/ethernet/microchip/lan966x/lan966x_main.c:444: undefined reference to `ipv6_mc_check_mld' The fix is to also check IS_ENABLED(CONFIG_IPV6). Fixes: 47aeea0d ("net: lan966x: Implement the callback SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Horatiu Vultur authored
When CONFIG_PTP_1588_CLOCK is compiled as a module, linking the lan966x driver fails because it can't find references to the following functions: 'ptp_clock_index', 'ptp_clock_register' and 'ptp_clock_unregister'. The fix is to add CONFIG_PTP_1588_CLOCK_OPTIONAL as a dependency for the driver. Fixes: d0964594 ("net: lan966x: Add support for ptp clocks") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Raju Lakkaraju says: ==================== net: lan743x: PCI11010 / PCI11414 devices Enhancements This patch series adds support for the Ethernet function of the PCI11010 / PCI11414 devices to the LAN743x driver. The PCI1xxxx family of devices consists of a PCIe switch with a variety of embedded PCI endpoints on its downstream ports. The PCI11010 / PCI11414 devices include an Ethernet 10/100/1000/2500 function as one of those embedded endpoints. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Raju Lakkaraju authored
Add support for Clause-45 MDIO PHY management Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Raju Lakkaraju authored
This change facilitates the selection between SGMII and (R)GMII interfaces Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Raju Lakkaraju authored
Increase MSI / MSI-X vectors supported from 8 to 16 and Interrupt De-assertion timers from 8 to 10 Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Raju Lakkaraju authored
Add support for 4 Tx queues Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Raju Lakkaraju authored
PCI11010/PCI11414 devices are an enhancement of the LAN743x Ethernet chip family. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
M Chetan Kumar authored
This patch enables Intel M.2 7360 WWAN card support in the IOSM driver. The control path implementation is reused, whereas the data path uses a different protocol called MUX Aggregation. The major portion of this patch covers the MUX Aggregation protocol implementation used for IP traffic communication. For the M.2 7360 WWAN card, the driver exposes 2 WWAN AT ports for control communication. The user space application or the modem manager uses a WWAN AT port for data path establishment. During probe, the driver reads the mux protocol device capability register to learn the mux protocol version supported by the device. Based on this, the right mux protocol is initialized for data path communication. An overview of the Aggregation Protocol: 1> An IP packet is encapsulated with a 16-octet padding header to form a Datagram, and the start offset of the Datagram is indexed into the Datagram Header (DH). 2> Multiple such Datagrams are composed, and the start offset of each DH is indexed into the Datagram Table Header (DTH). 3> The Datagram Table (DT) is IP session specific, and the table_length item in the DTH holds the number of composed datagrams pertaining to that particular IP session. 4> Finally, the offset of the first DTH is indexed into the DBH (Datagram Block Header). So in the TX/RX flow, a Datagram Block (Datagram Block Header + Payload) is exchanged between the driver and the device. Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
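The four steps can be sketched as a block builder. Everything below is an assumption for illustration: the struct names, field widths and layout are hypothetical (the real IOSM wire format differs), the per-datagram DH is folded into the table entry for brevity, and the padding header is simply zero-filled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAD_HDR_LEN 16  /* the 16-octet padding header from the description */
#define MAX_DGRAMS  8   /* hypothetical table capacity */

/* Hypothetical layouts; the real IOSM wire format differs. */
struct dbh {                   /* Datagram Block Header */
	uint32_t block_len;
	uint32_t first_dth_offset;
};

struct dt_entry {              /* one indexed datagram (DH folded in) */
	uint32_t offset;           /* start of the datagram within the block */
	uint32_t len;              /* padding header + IP packet */
};

struct dth {                   /* Datagram Table Header for one IP session */
	uint32_t session_id;
	uint32_t table_length;     /* number of datagrams in this table */
	struct dt_entry entries[MAX_DGRAMS];
};

/*
 * Pack n IP packets of one session into buf: DBH, then each packet behind
 * a zero-filled 16-octet padding header, then a DTH indexing every datagram.
 * Returns the block length, or 0 if it does not fit.
 */
static size_t build_block(uint8_t *buf, size_t cap, uint32_t session,
			  const uint8_t *const pkts[], const uint32_t lens[],
			  uint32_t n)
{
	struct dbh hdr;
	struct dth table;
	size_t off = sizeof(hdr);

	if (n > MAX_DGRAMS || cap < sizeof(hdr))
		return 0;
	memset(&table, 0, sizeof(table));
	table.session_id = session;
	table.table_length = n;

	for (uint32_t i = 0; i < n; i++) {
		size_t dlen = PAD_HDR_LEN + lens[i];

		if (off + dlen > cap)
			return 0;
		memset(buf + off, 0, PAD_HDR_LEN);
		memcpy(buf + off + PAD_HDR_LEN, pkts[i], lens[i]);
		table.entries[i].offset = (uint32_t)off;
		table.entries[i].len = (uint32_t)dlen;
		off += dlen;
	}

	if (off + sizeof(table) > cap)
		return 0;
	hdr.first_dth_offset = (uint32_t)off;  /* DBH points at the first DTH */
	memcpy(buf + off, &table, sizeof(table));
	off += sizeof(table);
	hdr.block_len = (uint32_t)off;
	memcpy(buf, &hdr, sizeof(hdr));
	return off;
}
```

A receiver would walk the same chain in reverse: read the DBH, jump to the DTH, then use each entry's offset to skip the padding header and reach the IP packet.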
-
- 12 Feb, 2022 1 commit
-
-
Jakub Kicinski authored
This reverts commit 038fcdaf. Christophe points out that div64_u64() and do_div() have different calling conventions. One updates the param, the other returns the result. Reported-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/all/056a7276-c6f0-cd7e-9e46-1d8507a0b6b1@wanadoo.fr/ Fixes: 038fcdaf ("net: ethernet: cavium: use div64_u64() instead of do_div()") Link: https://lore.kernel.org/r/20220211020544.3262694-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
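The calling-convention mismatch is easy to reproduce with user-space stand-ins. `do_div_style()` and `div64_u64_style()` are hypothetical helpers that mimic the two conventions (the kernel's `do_div()` is a macro that modifies its first argument in place and returns a 32-bit remainder); they are not the kernel implementations.

```c
#include <assert.h>
#include <stdint.h>

/* do_div()-style: the dividend is overwritten with the quotient,
 * and the remainder is returned. */
static uint32_t do_div_style(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/* div64_u64()-style: operands are left untouched, the quotient is returned. */
static uint64_t div64_u64_style(uint64_t dividend, uint64_t divisor)
{
	return dividend / divisor;
}
```

With these semantics, mechanically rewriting `do_div(x, y);` as `div64_u64(x, y);` silently discards the quotient and leaves `x` unchanged.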
-
- 11 Feb, 2022 21 commits
-
-
Julia Lawall authored
Platform_driver probe functions aren't called with locks held and thus don't need GFP_ATOMIC. Use GFP_KERNEL instead. Problem found with Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20220210204223.104181-1-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Hariprasad Kelam authored
This patch fixes the error below by using the proper data type.
    drivers/net/ethernet/marvell/octeontx2/af/rpm.c: In function 'rpm_cfg_pfc_quanta_thresh':
    include/linux/find.h:40:23: error: array subscript 'long unsigned int[0]' is partly outside array bounds of 'u16[1]' {aka 'short unsigned int[1]'} [-Werror=array-bounds]
       40 | val = *addr & GENMASK(size - 1, offset);
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Link: https://lore.kernel.org/r/20220211155539.13931-1-hkelam@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
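A sketch of why the data type matters: bitmap search helpers read whole `unsigned long` words, so pointing one at a 2-byte `u16` reads past the object. `first_set_bit()` is a hypothetical stand-in for the kernel's find-bit helpers, and widening into an `unsigned long` is one assumed way to apply the "proper data type" fix, not necessarily the exact change made in rpm.c.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/* Like the kernel's bitmap helpers, this reads whole unsigned long words. */
static unsigned long first_set_bit(const unsigned long *addr, unsigned long size)
{
	for (unsigned long i = 0; i < size; i++)
		if (addr[i / BITS_PER_LONG_SKETCH] & (1UL << (i % BITS_PER_LONG_SKETCH)))
			return i;
	return size;  /* no bit set */
}

/*
 * Wrong: first_set_bit((unsigned long *)&u16_val, 16) reads past a 2-byte object.
 * Right: widen the value into an unsigned long first.
 */
static unsigned long first_set_bit_u16(uint16_t val)
{
	unsigned long word = val;

	return first_set_bit(&word, 16);
}
```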
-
David S. Miller authored
Merge tag 'wireless-next-2022-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next wireless-next patches for v5.18 First set of patches for v5.18, with both wireless and stack patches. rtw89 now has AP mode support and wcn36xx has survey support. But otherwise pretty normal. Major changes:
ath11k
* add LDPC FEC type in 802.11 radiotap header
* enable RX PPDU stats in monitor co-exist mode
wcn36xx
* implement survey reporting
brcmfmac
* add CYW43570 PCIE device
rtw88
* rtw8821c: enable RFE 6 devices
rtw89
* AP mode support
mt76
* mt7916 support
* background radar detection support
-
David S. Miller authored
Eric Dumazet says: ==================== ipv6: remove addrconf reliance on loopback The second patch in this series removes the IPv6 requirement that the netns loopback device be the last device dismantled. This was needed because rt6_uncached_list_flush_dev() and ip6_dst_ifdown() had to switch the dst dev to a known device (loopback). Instead of loopback, we can use the (hidden) blackhole_netdev, which is also always there. This will allow future simplifications of netdev_run_todo() and other parts of the stack like default_device_exit_batch(). The last two patches are optimizations for both IP families. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
This is an optimization to keep the per-cpu lists as short as possible: whenever rt_flush_dev() changes one rtable dst.dev matching the disappearing device, it can transfer the object to a quarantine list, waiting for a final rt_del_uncached_list(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
This is an optimization to keep the per-cpu lists as short as possible: whenever rt6_uncached_list_flush_dev() changes one rt6_info matching the disappearing device, it can transfer the object to a quarantine list, waiting for a final rt6_uncached_list_del(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
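The quarantine idea, shared by this patch and the IPv4 rt_flush_dev() patch above, can be sketched with a doubly linked list. This is a simplified model: the list helpers, `struct route_sketch` and the `blackhole` device are hypothetical stand-ins for the kernel's list_head, rt6_info/rtable and blackhole_netdev, and all locking and per-CPU handling is omitted.

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *h) { h->prev = h->next = h; }

static void list_del_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

static void list_add_tail_node(struct list_node *n, struct list_node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

struct net_device_sketch { const char *name; };

struct route_sketch {
	struct list_node node;  /* first member, so the node pointer converts back */
	struct net_device_sketch *dev;
};

/* Hypothetical stand-in for the never-freed blackhole device. */
static struct net_device_sketch blackhole = { "blackhole" };

/* Re-point routes that use @dev and move them off the per-CPU list. */
static void flush_dev(struct list_node *percpu, struct list_node *quarantine,
		      struct net_device_sketch *dev)
{
	struct list_node *pos, *next;

	for (pos = percpu->next; pos != percpu; pos = next) {
		struct route_sketch *rt = (struct route_sketch *)pos;

		next = pos->next;  /* save before the node moves */
		if (rt->dev != dev)
			continue;
		rt->dev = &blackhole;
		list_del_node(pos);
		list_add_tail_node(pos, quarantine);
	}
}
```

The final delete is the same `list_del_node()` call whichever list the route sits on, so later walks of the per-CPU list never revisit flushed routes.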
-
Eric Dumazet authored
IPv6 addrconf notifiers want the loopback device to be the last device dismantled at netns deletion. This caused many limitations and workarounds. Back in linux-5.3, Mahesh added a per-host blackhole_netdev that can be used whenever we need to make sure objects no longer refer to a disappearing device. If we attach an ip6_ptr to blackhole_netdev (allocate an idev), then we can use this special device (which is never freed) in place of the loopback_dev (which can be freed). This will permit improvements in netdev_run_todo() and other parts of the stack where we had steps to make sure loopback_dev was the last device to disappear. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
This counter has never been visible, so there is little point in trying to maintain it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Holger Brunck authored
The mv88e6352, mv88e6240 and mv88e6176 have a serdes interface. This patch allows configuring the output swing to a desired value in the phy-handle of the port. The peak-to-peak value has to be specified in microvolts. As the chips only support eight dedicated values, we return EINVAL if the value in the DTS does not match one of them. Signed-off-by: Holger Brunck <holger.brunck@hitachienergy.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
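A sketch of the validation, assuming a hypothetical table of eight swing values (the real mv88e6xxx values are not reproduced here): the requested peak-to-peak value must match an entry exactly, otherwise the helper returns -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table: eight supported peak-to-peak swings in microvolts,
 * indexed by register value. These are NOT the real chip values. */
static const uint32_t swing_uv[8] = {
	200000, 300000, 400000, 500000, 600000, 700000, 800000, 900000,
};

/* Return the register value for @uv, or -EINVAL if the chip can't do it. */
static int swing_to_reg(uint32_t uv)
{
	for (size_t i = 0; i < sizeof(swing_uv) / sizeof(swing_uv[0]); i++)
		if (swing_uv[i] == uv)
			return (int)i;
	return -EINVAL;
}
```

Requiring an exact match, rather than rounding to the nearest entry, makes a typo in the DTS fail loudly instead of silently selecting a different swing.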
-
Marek Behún authored
Common PHYs and network PCSes often have the possibility to specify the peak-to-peak voltage on the differential pair; the default voltage sometimes needs to be changed for a particular board. Add properties `tx-p2p-microvolt` and `tx-p2p-microvolt-names` for this purpose. The second property is needed to specify the mode for the corresponding voltage in the `tx-p2p-microvolt` property, if the voltage is to be used only for a specific mode. Multiple voltage-mode pairs can be specified. Example usage with only one voltage (it will be used for all supported PHY modes, and the `tx-p2p-microvolt-names` property is not needed in this case): tx-p2p-microvolt = <915000>; Example usage with voltages for multiple modes: tx-p2p-microvolt = <915000>, <1100000>, <1200000>; tx-p2p-microvolt-names = "2500base-x", "usb", "pcie"; Add these properties into a separate file phy/transmit-amplitude.yaml, which should be referenced by any binding that uses it. Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
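How a consumer might resolve these properties can be sketched as a lookup over the already-parsed arrays. `p2p_uv_for_mode()` is hypothetical and does no device-tree parsing; treating several voltages without names as an error is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Pick the voltage for @mode from parsed tx-p2p-microvolt values.
 * With a single value and no names, that value applies to every mode.
 */
static int p2p_uv_for_mode(const uint32_t *uv, size_t n_uv,
			   const char *const *names, size_t n_names,
			   const char *mode, uint32_t *out)
{
	if (n_uv == 0)
		return -ENOENT;
	if (n_names == 0) {
		if (n_uv != 1)
			return -EINVAL;  /* assumption: several voltages need names */
		*out = uv[0];
		return 0;
	}
	if (n_names != n_uv)
		return -EINVAL;  /* every voltage needs exactly one mode name */
	for (size_t i = 0; i < n_names; i++) {
		if (strcmp(names[i], mode) == 0) {
			*out = uv[i];
			return 0;
		}
	}
	return -ENOENT;  /* no voltage configured for this mode */
}
```

The test values mirror the two examples in the commit message.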
-
Guillaume Nault authored
The ->rtm_tos option is normally used to route packets based on both the destination address and the DS field. However, it's ignored for IPv6 routes. Setting ->rtm_tos for IPv6 is thus invalid, as the route is going to work only on the destination address anyway, so it won't behave as specified. Suggested-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
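The resulting check can be sketched as follows. `struct rtmsg_sketch` and `validate_ipv6_route()` are hypothetical, and the -EINVAL return value is an assumption rather than a quote of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>

/* Minimal stand-in for the rtmsg fields the check needs (hypothetical). */
struct rtmsg_sketch {
	uint8_t rtm_family;
	uint8_t rtm_tos;
};

/* Reject a DS field on IPv6 routes instead of silently ignoring it. */
static int validate_ipv6_route(const struct rtmsg_sketch *rtm)
{
	if (rtm->rtm_family == AF_INET6 && rtm->rtm_tos != 0)
		return -EINVAL;
	return 0;
}
```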
-
David S. Miller authored
Vladimir Oltean says: ==================== More aggressive DSA cleanup This series deletes some code which is apparently not needed. I've had these patches in my tree for a while, and testing on my boards didn't reveal any issues. Compared to the RFC v1 series, the only change is the addition of patch 3. https://patchwork.kernel.org/project/netdevbpf/cover/20220107184842.550334-1-vladimir.oltean@nxp.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
Since commit 2f1e8ea7 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings"), suggested by Cong Wang, the DSA interfaces and their master have different dev->nested_level, which makes netif_addr_lock() stop complaining about potentially recursive locking on the same lock class. So we no longer need DSA slave interfaces to have their own lockdep class. Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
Since commit 2f1e8ea7 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings"), suggested by Cong Wang, the DSA interfaces and their master have different dev->nested_level, which makes netif_addr_lock() stop complaining about potentially recursive locking on the same lock class. So we no longer need DSA masters to have their own lockdep class. Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
There are no legacy ports, DSA registers a devlink instance with ports unconditionally for all switch drivers. Therefore, delete the old-style ndo operations used for determining bridge forwarding domains. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
D. Wythe says: ==================== net/smc: Optimizing performance in short-lived scenarios This patch set aims to optimize the performance of SMC in short-lived link scenarios, which is quite unsatisfactory right now. In our benchmark, we test it with the following script: ./wrk -c 10000 -t 4 -H 'Connection: Close' -d 20 http://smc-server Current performance figures look like this: Running 20s test @ http://11.213.45.6 4 threads and 10000 connections 4956 requests in 20.06s, 3.24MB read Socket errors: connect 0, read 0, write 672, timeout 0 Requests/sec: 247.07 Transfer/sec: 165.28KB There are many reasons for this phenomenon; this patch set doesn't solve them all, but it alleviates the problem considerably. Patch 1/5 (Make smc_tcp_listen_work() independent): Separate smc_tcp_listen_work() from smc_listen_work() and make them independent of each other, so that a busy SMC handshake can no longer keep new TCP connections from being accepted. This avoids discarding a large number of TCP connections once the queue overflows, which would undoubtedly raise the connection establishment time. Patch 2/5 (Limit SMC backlog connections): Since patch 1 separated smc_tcp_listen_work() from smc_listen_work(), TCP accept has become unrestricted. This patch tries to put a limit on SMC backlog connections, referring to the TCP implementation. Patch 3/5 (Limit SMC visits when handshake workqueue congested): Considering the complexity of the SMC handshake right now, its performance is still quite poor in short-lived link scenarios, even though this may not be the main scenario for SMC. This patch tries to constrain the SMC handshake when the handshake workqueue is congested, which in our opinion is the sign of SMC handshakes stacking up. Patch 4/5 (Dynamic control handshake limitation by socket options): This patch allows applications to dynamically control SMC handshake limitation. Since SMC didn't support setting SMC socket options before, this patch also has to add support for SMC's own socket options.
Patch 5/5 (Add global configure for handshake limitation by netlink): This patch provides a way to benefit from handshake limitation without modifying any application code, which is quite useful for most existing applications. After this patch set, performance figures look like this: Running 20s test @ http://11.213.45.6 4 threads and 10000 connections 693253 requests in 20.10s, 452.88MB read Requests/sec: 34488.13 Transfer/sec: 22.53MB That's quite a good performance improvement, about 6 to 7 times in my environment. --- changelog: v1 -> v2: - fix compile warning - fix invalid dependencies in kconfig v2 -> v3: - correct spelling mistakes - fix useless variable declaration v3 -> v4 - make smc_tcp_ls_wq static v4 -> v5 - add dynamic control for SMC auto fallback by socket options - add global configure for SMC auto fallback through netlink v5 -> v6 - move auto fallback to net namespace scope - remove auto fallback attribute in SMC_GEN_SYS_INFO - add independent attributes for auto fallback v6 -> v7 - fix wording and the naming issues, rename 'auto fallback' to handshake limitation. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
D. Wythe authored
Although we can control SMC handshake limitation through socket options, applications that need it must then modify their code, which is quite troublesome for many existing applications. This patch allows modifying the global default value of SMC handshake limitation through netlink, providing a way to constrain handshakes without modifying any application code. Suggested-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
D. Wythe authored
This patch aims to add dynamic control of SMC handshake limitation for every SMC socket. In production environments, the same application may handle different service types and may have different needs regarding SMC handshake limitation. This patch uses socket options to achieve this; since SMC doesn't have a socket option level yet, we need to implement one at the same time. This patch does the following: - add a new socket option level: SOL_SMC. - add a new SMC socket option: SMC_LIMIT_HS. - provide a getter/setter for SMC socket options. Link: https://lore.kernel.org/all/20f504f961e1a803f85d64229ad84260434203bd.1644323503.git.alibuda@linux.alibaba.com/ Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
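A sketch of the getter/setter pair, assuming hypothetical structs and a placeholder option number (the real `SOL_SMC` and `SMC_LIMIT_HS` values come from the kernel headers). It also shows one plausible way a socket could inherit the netlink-configured namespace default from the global-configuration patch above; that inheritance step is an assumption here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Placeholder option number, NOT the real SMC_LIMIT_HS value. */
#define SKETCH_SMC_LIMIT_HS 1

struct netns_sketch { bool limit_smc_hs; };     /* namespace-wide default */
struct smc_sock_sketch { bool limit_smc_hs; };  /* per-socket setting */

/* Assumed: a new socket starts from the namespace default. */
static void smc_sock_init(struct smc_sock_sketch *smc, const struct netns_sketch *net)
{
	smc->limit_smc_hs = net->limit_smc_hs;
}

static int smc_setsockopt_sketch(struct smc_sock_sketch *smc, int optname,
				 const void *optval, size_t optlen)
{
	int val;

	if (optname != SKETCH_SMC_LIMIT_HS)
		return -ENOPROTOOPT;
	if (optlen < sizeof(int))
		return -EINVAL;
	memcpy(&val, optval, sizeof(val));
	smc->limit_smc_hs = val != 0;  /* any nonzero value enables the limit */
	return 0;
}

static int smc_getsockopt_sketch(const struct smc_sock_sketch *smc, int optname,
				 void *optval, size_t *optlen)
{
	int val;

	if (optname != SKETCH_SMC_LIMIT_HS)
		return -ENOPROTOOPT;
	if (*optlen < sizeof(int))
		return -EINVAL;
	val = smc->limit_smc_hs;
	memcpy(optval, &val, sizeof(val));
	*optlen = sizeof(val);
	return 0;
}
```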
-
D. Wythe authored
This patch intends to provide a mechanism to constrain SMC connection attempts according to the pressure on the SMC handshake process. At present, frequent connection attempts cause incoming connections to back up in the SMC handshake queue, raising the connection establishment time, which is quite unacceptable for applications based on short-lived connections. There are two ways to implement this mechanism: 1. Put the limitation after TCP is established. 2. Put the limitation before TCP is established. In the first way, we need to wait for and receive the CLC messages that the client will potentially send, and then actively reply with a decline message; in a sense, this is also a sort of SMC handshake, and it affects the connection establishment time along the way. In the second way, the only problem is that we need to inject SMC logic into TCP when it is about to reply to the incoming SYN; since we already do that, it no longer seems to be a problem. The advantage is obvious: few additional steps are required to enforce the constraint. This patch uses the second way. After this patch, connections beyond the constraint will not be given any SMC indication, and SMC will not be involved in any of their subsequent processing. Link: https://lore.kernel.org/all/1641301961-59331-1-git-send-email-alibuda@linux.alibaba.com/ Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
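The "second way" amounts to a decision made while answering the SYN. In this sketch, `struct smc_listener_sketch`, its queue counter and the `hs_congested_at` threshold are hypothetical stand-ins for the kernel's notion of a congested handshake workqueue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical listener state; not the kernel's smc_sock. */
struct smc_listener_sketch {
	bool limit_smc_hs;             /* handshake limitation enabled */
	unsigned int hs_queued;        /* handshakes waiting in the workqueue */
	unsigned int hs_congested_at;  /* hypothetical congestion threshold */
};

/* Decide, while building the SYN-ACK, whether to offer SMC at all. */
static bool smc_offer_on_synack(const struct smc_listener_sketch *l)
{
	if (l->limit_smc_hs && l->hs_queued >= l->hs_congested_at)
		return false;  /* plain TCP: SMC never gets involved */
	return true;
}
```

Because the decision happens before TCP is established, a refused connection costs nothing beyond an ordinary TCP handshake.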
-
D. Wythe authored
The current implementation does not handle backlog semantics; one potential risk is that the server will be flooded by an unlimited number of connections, even if the client is SMC-incapable. This patch puts a limit on backlog connections. Referring to the TCP implementation, we divide SMC connections into two categories: 1. Half SMC connections, which include all connections where TCP is established but SMC is not. 2. Full SMC connections, which include all SMC-established connections. For half SMC connections, since all of them start with TCP established, we can achieve our goal by putting a limit before TCP is established. Following the TCP implementation, this limit is based not only on the half SMC connections but also on the full connections, which also constrains full SMC connections. For full SMC connections, although we know exactly where they start, it's quite hard to put a limit before that point. The easiest way is to block and wait before receiving the SMC confirm CLC message, but that step is protected by smc_server_lgr_pending, a global lock, which would extend this limit to the entire host instead of a single listen socket. Another way is to drop the full connections, but considering the cost of SMC connections, we prefer to keep full SMC connections. Even so, a limit on full SMC connections still exists, as described for half SMC connections above. After this patch, the backlog limits look like this: For SMC: 1. A client with SMC capability can make at most 2 * backlog full SMC connections, or 1 * backlog half SMC connections plus 1 * backlog full SMC connections. 2. A client without SMC capability can only make 1 * backlog half TCP connections and 1 * backlog full TCP connections. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
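The two categories and the SYN-time check can be modeled with a pair of counters. This is a hypothetical sketch of the accounting only: the structure, the callbacks and the exact comparison are assumptions, and full connections are never dropped, matching the preference stated above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-listener counters for the two categories. */
struct smc_backlog_sketch {
	unsigned int half;     /* TCP established, SMC handshake not finished */
	unsigned int full;     /* SMC established, not yet accept()ed */
	unsigned int backlog;  /* listen() backlog */
};

/* SYN-time check: count both half and full connections against the backlog. */
static bool smc_syn_allowed(const struct smc_backlog_sketch *b)
{
	return b->half + b->full < b->backlog;
}

static void on_tcp_established(struct smc_backlog_sketch *b) { b->half++; }

static void on_smc_established(struct smc_backlog_sketch *b)
{
	b->half--;
	b->full++;  /* kept, never dropped */
}

static void on_accept(struct smc_backlog_sketch *b) { b->full--; }
```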
-
D. Wythe authored
In a multithreaded benchmark with 10K connections, backend TCP connections were established very slowly, and lots of TCP connections stayed in the SYN_SENT state. Client: smc_run wrk -c 10000 -t 4 http://server The netstat output on the server host shows: 145042 times the listen queue of a socket overflowed 145042 SYNs to LISTEN sockets dropped One reason for this issue is that smc_tcp_listen_work() shares the same workqueue (smc_hs_wq) with smc_listen_work(), and smc_listen_work() blocks waiting for the SMC connection to be established. Once the workqueue becomes congested, it blocks accept() on the TCP listen socket. This patch creates an independent workqueue (smc_tcp_ls_wq) for smc_tcp_listen_work(), separating it from smc_listen_work(), which is quite acceptable considering that smc_tcp_listen_work() runs very fast. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-