- 24 Aug, 2018 1 commit
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller authored
Daniel Borkmann says: ==================== pull-request: bpf 2018-08-24 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix BPF sockmap and tls where we get a hang in do_tcp_sendpages() when sndbuf is full due to missing calls into underlying socket's sk_write_space(), from John. 2) Two BPF sockmap fixes to reject invalid parameters on map creation and to fix a map element miscount on allocation failure. Another fix for BPF hash tables to use per hash table salt for jhash(), from Daniel. 3) Fix for bpftool's command line parsing in order to terminate on bad arguments instead of keeping looping in some border cases, from Quentin. 4) Fix error value of xdp_umem_assign_dev() in order to comply with expected bind ops error codes, from Prashant. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 23 Aug, 2018 36 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queueDavid S. Miller authored
Jeff Kirsher says: ==================== Intel Wired LAN Driver Fixes 2018-08-23 This series contains bug fixes to the ice driver. Anirudh provides several fixes, starting with static analysis fixes by replacing a bitwise-and with a constant value and replace "magic" numbers with defines. Fixed the control queue processing by removing unnecessary read/writes to registers, as well as getting a accurate value for "pending". Added additional checks to avoid NULL pointer dereferences. Fixed up code formatting issues, by cleaning up code comments and coding style. Bruce cleans up a duplicate check for owner, within the same function. Also cleans up interrupt causes that are not handled or applicable. Fix checkpatch warning about the use of bool in structures due to the wasted space and size of bool, so convert struct members to u8 instead. Jake fixes a number of potential bugs in the reporting of stats via ethtool, by simply reporting all the queue statistics, even for the queues that are not activated. Fixed a compiler warning, as well as make the code a bit cleaner but just using order_base_2() for calculating the power-of-2. Preethi adds a check to avoid a NULL pointer dereference crash during initialization. Brett clarifies the code when it comes to port VLANs and regular VLANs, by renaming defines and field values to match their intended use and purpose. Jesse initializes a variable to avoid garbage values being returned to the caller. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Anirudh Venkataramanan authored
1) Add missing "\n" when printing link event error message. 2) Update dev_err statement in probe. 3) Add function description for ice_clear_pf_cfg. 4) Fix coding style for ice_acquire_nvm. 5) netdev->mtu is unsigned so use %u. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Bruce Allan authored
Recent versions of checkpatch have a new warning based on a documented preference of Linus to not use bool in structures due to wasted space and the size of bool is implementation dependent. For more information, see the email thread at https://lkml.org/lkml/2017/11/21/384. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jesse Brandeburg authored
In ice_vsi_setup_[tx|rx]_rings, err is uninitialized which can result in a garbage value return to the caller. Fix that. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Anirudh Venkataramanan authored
1) When ice_ena_msix_range() fails to reserve vectors, a devm_kfree() warning was seen in the error flow path. So check pf->irq_tracker before use in ice_clear_interrupt_scheme(). 2) In ice_vsi_cfg(), check vsi->netdev before use. 3) In ice_get_link_status, check link_up before use. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Bruce Allan authored
Remove the following interrupt causes that are not applicable or not handled: - PFINT_OICR_HLP_RDY_M - PFINT_OICR_CPM_RDY_M - PFINT_OICR_GPIO_M - PFINT_OICR_STORM_DETECT_M Add the following interrupt cause that's actually handled in ice_misc_intr: - PFINT_OICR_PE_CRITERR_M Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Quentin Monnet authored
When command line parsing fails in the while loop in do_event_pipe() because the number of arguments is incorrect or because the keyword is unknown, an error message is displayed, but bpftool remains stuck in the loop. Make sure we exit the loop upon failure. Fixes: f412eed9 ("tools: bpftool: add simple perf event output reader") Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Brett Creeley authored
In the struct ice_aqc_vsi_props the field port_vlan_flags is an overloaded term because it is used for both port VLANs (PVLANs) and regular VLANs. This is an issue and is very confusing especially when dealing with VFs because normal VLANs and port VLANs are not the same. To fix this the field was renamed to vlan_flags and all of the #define's labeled *_PVLAN_* were renamed to *_VLAN_* if they are not specific to port VLANs. Also in ice_vsi_manage_vlan_stripping, set the ICE_AQ_VSI_VLAN_MODE_ALL bit to allow the driver to add a VLAN tag to all packets it sends. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jacob Keller authored
Currently, we use a combination of ilog2 and is_power_of_2() to calculate the next power of 2 for the qcount. This appears to be causing a warning on some combinations of GCC and the Linux kernel: MODPOST 1 modules WARNING: "____ilog2_NaN" [ice.ko] undefined! This appears to because because GCC realizes that qcount could be zero in some circumstances and thus attempts to link against the intentionally undefined ___ilog2_NaN function. The order_base_2 function is intentionally defined to return 0 when passed 0 as an argument, and thus will be safe to use here. This not only fixes the warning but makes the resulting code slightly cleaner, and is really what we should have used originally. Also update the comment to make it more clear that we are rounding up, not just incrementing the ilog2 of qcount unconditionally. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Anirudh Venkataramanan authored
This patch is a consolidation of multiple bug fixes for control queue processing. 1) In ice_clean_adminq_subtask() remove unnecessary reads/writes to registers. The bits PFINT_FW_CTL, PFINT_MBX_CTL and PFINT_SB_CTL are not set when an interrupt arrives, which means that clearing them again can be omitted. 2) Get an accurate value in "pending" by re-reading the control queue head register from the hardware. 3) Fix a corner case involving lost control queue messages by checking for new control messages (using ice_ctrlq_pending) before exiting the cleanup routine. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Preethi Banala authored
Clean control queues only when they are initialized. One of the ways to validate if the basic initialization is done is by checking value of cq->sq.head and cq->rq.head variables that specify the register address. This patch adds a check to avoid NULL pointer dereference crash when tried to shutdown uninitialized control queue. Signed-off-by: Preethi Banala <preethi.banala@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jacob Keller authored
It is not safe to have the string table for statistics change order or size over the lifetime of a given netdevice. This is because of the nature of the 3-step process for obtaining stats. First, user space performs a request for the size of the strings table. Second it performs a separate request for the strings themselves, after allocating space for the table. Third, it requests the stats themselves, also allocating space for the table. If the size decreased, there is potential to see garbage data or stats values. In the worst case, we could potentially see stats values become mis-aligned with their strings, so that it looks like a statistic is being reported differently than it actually is. Even worse, if the size increased, there is potential that the strings table or stats table was not allocated large enough and the stats code could access and write to memory it should not, potentially resulting in undefined behavior and system crashes. It isn't even safe if the size always changes under the RTNL lock. This is because the calls take place over multiple user space commands, so it is not possible to hold the RTNL lock for the entire duration of obtaining strings and stats. Further, not all consumers of the ethtool API are the user space ethtool program, and it is possible that one assumes the strings will not change (valid under the current contract), and thus only requests the stats values when requesting stats in a loop. Finally, it's not possible in the general case to detect when the size changes, because it is quite possible that one value which could impact the stat size increased, while another decreased. This would result in the same total number of stats, but reordering them so that stats no longer line up with the strings they belong to. Since only size changes aren't enough, we would need some sort of hash or token to determine when the strings no longer match. This would require extending the ethtool stats commands, but there is no more space in the relevant structures. The real solution to resolve this would be to add a completely new API for stats, probably over netlink. In the ice driver, the only thing impacting the stats that is not constant is the number of queues. Instead of reporting stats for each used queue, report stats for each allocated queue. We do not change the number of queues allocated for a given netdevice, as we pass this into the alloc_etherdev_mq() function to set the num_tx_queues and num_rx_queues. This resolves the potential bugs at the slight cost of displaying many queue statistics which will not be activated. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Anirudh Venkataramanan authored
Use define for the unit size shift of the Rx LAN context descriptor base address instead of the magic number 7. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Daniel Borkmann authored
All BPF hash and LRU maps currently have a known and global seed we feed into jhash() which is 0. This is suboptimal, thus fix it by generating a random seed upon hashtab setup time which we can later on feed into jhash() on lookup, update and deletions. Fixes: 0f8e4bd8 ("bpf: add hashtable type of eBPF maps") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Reviewed-by: Eduardo Valentin <eduval@amazon.com>
-
Bruce Allan authored
There is already a check for owner == ICE_SCHED_NODE_OWNER_LAN at the beginning of ice_sched_update_vsi_child_nodes. Remove the additional check to address the static analysis tool smatch issue "warn: we tested 'owner' before and it was 'false'". Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Anirudh Venkataramanan authored
This patch fixes the following smatch errors: 1) Fix "odd binop '0x0 & 0xc'" when performing the bitwise-and with a constant value of zero (ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_128_FLAG). Remove a similar bitwise-and with 0 in ice_add_marker_act() and use the right mask ICE_LG_ACT_GENERIC_OFFSET_M in the expression. 2) Fix a similar issue "odd binop '0x0 & 0x1800' in ice_req_irq_msix_misc. 3) Fix "odd binop '0x380000 & 0x7fff8'" in ice_add_marker_act(). Also, use a new define ICE_LG_ACT_GENERIC_OFF_RX_DESC_PROF_IDX instead of magic number '7'. 4) Fix warn: odd binop '0x0 & 0x18' in ice_set_dflt_vsi_ctx() by removing unnecessary logic to explicitly unset bits 3 and 4 in port_vlan_bits. These bits are unset already by the memset on ctxt->info. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetoothDavid S. Miller authored
Johan Hedberg says: ==================== pull request: bluetooth 2018-08-23 Here are two important Bluetooth fixes for the MediaTek and RealTek HCI drivers. Please let me know if there are any issues pulling, thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Huazhong Tan says: ==================== net: hns3: bug fix & optimization for HNS3 driver This patchset presents a bug fix found out when CONFIG_ARM64_64K_PAGES enable and an optimization for HNS3 driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huazhong Tan authored
'truesize' is supposed to be u32, not int, so fix it. Signed-off-by: Huazhong tan <tanhuazhong@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huazhong Tan authored
When enable the config item "CONFIG_ARM64_64K_PAGES", the size of PAGE_SIZE is 65536(64K). But the type of page_offset is u16, it will overflow. So change it to u32, when "CONFIG_ARM64_64K_PAGES" enabled. Fixes: 76ad4f0e ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hangbin Liu authored
Commit 6edb3c96 ("net/ipv6: Defer initialization of dst to data path") forgot to handle anycast route and init anycast rt->dst.input to ip6_forward. Fix it by setting anycast rt->dst.input back to ip6_input. Fixes: 6edb3c96 ("net/ipv6: Defer initialization of dst to data path") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Huazhong Tan says: ==================== net: hns: bug fixes & optimization for HNS driver This patchset presents some bug fixes found out when CONFIG_ARM64_64K_PAGES enable and an optimization for HNS driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huazhong Tan authored
Update hns to drop the hns_nic_get_headlen function in favour of eth_get_headlen, and hence also removes now redundant hns_nic_get_headlen. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huazhong Tan authored
skb->truesize is not meant to be tracking amount of used bytes in a skb, but amount of reserved/consumed bytes in memory. For instance, if we use a single byte in last page fragment, we have to account the full size of the fragment. So skb_add_rx_frag needs to calculate the length of the entire buffer into turesize. Fixes: 9cbe9fd5 ("net: hns: optimize XGE capability by reducing cpu usage") Signed-off-by: Huazhong tan <tanhuazhong@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huazhong Tan authored
'truesize' is supposed to be u32, not int, so fix it. Signed-off-by: Huazhong tan <tanhuazhong@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huazhong Tan authored
When enable the config item "CONFIG_ARM64_64K_PAGES", the size of PAGE_SIZE is 65536(64K). But the type of length and page_offset are u16, they will overflow. So change them to u32. Fixes: 6fe6611f ("net: add Hisilicon Network Subsystem hnae framework support") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Kevin Yang says: ==================== tcp_bbr: PROBE_RTT minor bug fixes This series includes two minor bug fixes for the TCP BBR PROBE_RTT mechanism, and one preparatory patch: (1) A preparatory patch to reorganize the PROBE_RTT logic by refactoring (into its own function) the code to exit PROBE_RTT, since the next patch will be using that code in a new context. (2) Fix: When BBR restarts from idle and if BBR is in PROBE_RTT mode, BBR should check if it's time to exit PROBE_RTT. If yes, then BBR should exit PROBE_RTT mode and restore the cwnd to its full value. (3) Fix: Apply the PROBE_RTT cwnd cap even if the count of fully-ACKed packets is 0. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kevin Yang authored
This commit fixes a corner case where TCP BBR would enter PROBE_RTT mode but not reduce its cwnd. If a TCP receiver ACKed less than one full segment, the number of delivered/acked packets was 0, so that bbr_set_cwnd() would short-circuit and exit early, without cutting cwnd to the value we want for PROBE_RTT. The fix is to instead make sure that even when 0 full packets are ACKed, we do apply all the appropriate caps, including the cap that applies in PROBE_RTT mode. Fixes: 0f8782ea ("tcp_bbr: add BBR congestion control") Signed-off-by: Kevin Yang <yyd@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kevin Yang authored
This patch fix the case where BBR does not exit PROBE_RTT mode when it restarts from idle. When BBR restarts from idle and if BBR is in PROBE_RTT mode, BBR should check if it's time to exit PROBE_RTT. If yes, then BBR should exit PROBE_RTT mode and restore the cwnd to its full value. Fixes: 0f8782ea ("tcp_bbr: add BBR congestion control") Signed-off-by: Kevin Yang <yyd@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kevin Yang authored
This patch add a helper function bbr_check_probe_rtt_done() to 1. check the condition to see if bbr should exit probe_rtt mode; 2. process the logic of exiting probe_rtt mode. Fixes: 0f8782ea ("tcp_bbr: add BBR congestion control") Signed-off-by: Kevin Yang <yyd@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
tcp uses per-cpu (and per namespace) sockets (net->ipv4.tcp_sk) internally to send some control packets. 1) RST packets, through tcp_v4_send_reset() 2) ACK packets in SYN-RECV and TIME-WAIT state, through tcp_v4_send_ack() These packets assert IP_DF, and also use the hashed IP ident generator to provide an IPv4 ID number. Geoff Alexander reported this could be used to build off-path attacks. These packets should not be fragmented, since their size is smaller than IPV4_MIN_MTU. Only some tunneled paths could eventually have to fragment, regardless of inner IPID. We really can use zero IPID, to address the flaw, and as a bonus, avoid a couple of atomic operations in ip_idents_reserve() Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Geoff Alexander <alexandg@cs.unm.edu> Tested-by: Geoff Alexander <alexandg@cs.unm.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Cong Wang authored
All the 3 callers of addrconf_add_mroute() assert RTNL lock, they don't take any additional lock either, so it is safe to convert it to GFP_KERNEL. Same for sit_add_v4_addrs(). Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
The new tcf_exts_for_each_action() macro doesn't reference its arguments when CONFIG_NET_CLS_ACT is disabled, which leads to a harmless warning in at least one driver: drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c: In function 'tc_fill_actions': drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c:64:6: error: unused variable 'i' [-Werror=unused-variable] Adding a cast to void lets us avoid this kind of warning. To be on the safe side, do it for all three arguments, not just the one that caused the warning. Fixes: 244cd96a ("net_sched: remove list_head from tc_action") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Toke Høiland-Jørgensen authored
The TC filter flow mapping override completely skipped the call to cake_hash(); however that meant that the internal state was not being updated, which ultimately leads to deadlocks in some configurations. Fix that by passing the overridden flow ID into cake_hash() instead so it can react appropriately. In addition, the major number of the class ID can now be set to override the host mapping in host isolation mode. If both host and flow are overridden (or if the respective modes are disabled), flow dissection and hashing will be skipped entirely; otherwise, the hashing will be kept for the portions that are not set by the filter. Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Samuel Mendoza-Jonas authored
The ncsi_pkg_info_all_nl() .dumpit handler is missing the NLM_F_MULTI flag, causing additional package information after the first to be lost. Also fixup a sanity check in ncsi_write_package_info() to reject out of range package IDs. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wolfram Sang authored
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 22 Aug, 2018 3 commits
-
-
John Fastabend authored
When sockmap code is using the stream parser it also handles the write space events in order to handle the case where (a) verdict redirects skb to another socket and (b) the sockmap then sends the skb but due to memory constraints (or other EAGAIN errors) needs to do a retry. But the initial code missed a third case where the skb_send_sock_locked() triggers an sk_wait_event(). A typically case would be when sndbuf size is exceeded. If this happens because we do not pass the write_space event to the lower layers we never wake up the event and it will wait for sndtimeo. Which as noted in ktls fix may be rather large and look like a hang to the user. To reproduce the best test is to reduce the sndbuf size and send 1B data chunks to stress the memory handling. To fix this pass the event from the upper layer to the lower layer. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
John Fastabend authored
Currently, the lower protocols sk_write_space handler is not called if TLS is sending a scatterlist via tls_push_sg. However, normally tls_push_sg calls do_tcp_sendpage, which may be under memory pressure, that in turn may trigger a wait via sk_wait_event. Typically, this happens when the in-flight bytes exceed the sdnbuf size. In the normal case when enough ACKs are received sk_write_space() will be called and the sk_wait_event will be woken up allowing it to send more data and/or return to the user. But, in the TLS case because the sk_write_space() handler does not wake up the events the above send will wait until the sndtimeo is exceeded. By default this is MAX_SCHEDULE_TIMEOUT so it look like a hang to the user (especially this impatient user). To fix this pass the sk_write_space event to the lower layers sk_write_space event which in the TCP case will wake any pending events. I observed the above while integrating sockmap and ktls. It initially appeared as test_sockmap (modified to use ktls) occasionally hanging. To reliably reproduce this reduce the sndbuf size and stress the tls layer by sending many 1B sends. This results in every byte needing a header and each byte individually being sent to the crypto layer. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Daniel Borkmann authored
When we try to allocate a new sock hash entry and the allocation fails, then sock hash map fails to reduce the map element counter, meaning we keep accounting this element although it was never used. Fix it by dropping the element counter on error. Fixes: 81110384 ("bpf: sockmap, add hash map support") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com>
-