- 17 Jul, 2021 11 commits
-
-
Liu Jian authored
I got below panic when doing fuzz test: Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 4056 Comm: syz-executor.3 Tainted: G B 5.14.0-rc1-00195-gcff5c4254439-dirty #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack_lvl+0x7a/0x9b panic+0x2cd/0x5af end_report.cold+0x5a/0x5a kasan_report+0xec/0x110 ip_check_mc_rcu+0x556/0x5d0 __mkroute_output+0x895/0x1740 ip_route_output_key_hash_rcu+0x2d0/0x1050 ip_route_output_key_hash+0x182/0x2e0 ip_route_output_flow+0x28/0x130 udp_sendmsg+0x165d/0x2280 udpv6_sendmsg+0x121e/0x24f0 inet6_sendmsg+0xf7/0x140 sock_sendmsg+0xe9/0x180 ____sys_sendmsg+0x2b8/0x7a0 ___sys_sendmsg+0xf0/0x160 __sys_sendmmsg+0x17e/0x3c0 __x64_sys_sendmmsg+0x9e/0x100 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x462eb9 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f3df5af1c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462eb9 RDX: 0000000000000312 RSI: 0000000020001700 RDI: 0000000000000007 RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f3df5af26bc R13: 00000000004c372d R14: 0000000000700b10 R15: 00000000ffffffff It is one use-after-free in ip_check_mc_rcu. In ip_mc_del_src, the ip_sf_list of pmc has been freed under pmc->lock protection. But access to ip_sf_list in ip_check_mc_rcu is not protected by the lock. Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Ronak Doshi says: ==================== vmxnet3: upgrade to version 6 vmxnet3 emulation has recently added several new features which includes increase in queues supported, remove power of 2 limitation on queues, add RSS for ESP IPv6, etc. This patch series extends the vmxnet3 driver to leverage these new features. Compatibility is maintained using existing vmxnet3 versioning mechanism as follows: - new features added to vmxnet3 emulation are associated with new vmxnet3 version viz. vmxnet3 version 6. - emulation advertises all the versions it supports to the driver. - during initialization, vmxnet3 driver picks the highest version number supported by both the emulation and the driver and configures emulation to run at that version. In particular, following changes are introduced: Patch 1: This patch introduces utility macros for vmxnet3 version 6 comparison and updates Copyright information. Patch 2: This patch adds support to increase maximum Tx/Rx queues from 8 to 32. Patch 3: This patch removes the limitation of power of 2 on the queues. Patch 4: Uses existing get_rss_hash_opts and set_rss_hash_opts methods to add support for ESP IPv6 RSS. Patch 5: This patch reports correct RSS hash type based on the type of RSS performed. Patch 6: This patch updates maximum configurable mtu to 9190. Patch 7: With all vmxnet3 version 6 changes incorporated in the vmxnet3 driver, with this patch, the driver can configure emulation to run at vmxnet3 version 6. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ronak Doshi authored
With all vmxnet3 version 6 changes incorporated in the vmxnet3 driver, the driver can configure emulation to run at vmxnet3 version 6, provided the emulation advertises support for version 6. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ronak Doshi authored
This patch increases the maximum configurable mtu to 9190 to accommodate jumbo packets of overlay traffic. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ronak Doshi authored
As vmxnet3 supports IP/TCP/UDP RSS, this patch sets appropriate hash type based on the type of RSS performed. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ronak Doshi authored
Vmxnet3 version 4 added support for ESP RSS. However, only IPv4 was supported. With vmxnet3 version 6, this patch enables RSS for ESP IPv6 packets as well. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ronak Doshi authored
With version 6, vmxnet3 relaxes the restriction on queues to be power of two. This is helpful in cases (Edge VM) where vcpus are less than 8 and device requires more than 4 queues. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ronak Doshi authored
Currently, vmxnet3 supports maximum of 8 Tx/Rx queues. With increase in number of vcpus on a VM, to achieve better performance and utilize idle vcpus, we need to increase the max number of queues supported. This patch enhances vmxnet3 to support maximum of 32 Tx/Rx queues. Increasing the Rx queues also increases the probability of distrubuting the traffic from different flows to different queues with RSS. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ronak Doshi authored
vmxnet3 is currently at version 4 and this patch initiates the preparation to accommodate changes for upto version 6. Introduced utility macros for vmxnet3 version 6 comparison and update Copyright information. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
Currently, when userspace reads a datagram with a buffer that is smaller than this datagram, the data will be truncated and only part of it can be received by users. It doesn't seem right that users don't know the datagram size and have to use a huge buffer to read it to avoid the truncation. This patch to fix it by keeping the skb in rcv queue until the whole data is read by users. Only the last msg of the datagram will be marked with MSG_EOR, just as TCP/SCTP does. Note that this will work as above only when MSG_EOR is set in the flags parameter of recvmsg(), so that it won't break any old user applications. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tDavid S. Miller authored
nguy/next-queue Tony Nguyen says: ==================== 1GbE Intel Wired LAN Driver Updates 2021-07-16 Vinicius Costa Gomes says: Add support for steering traffic to specific RX queues using Flex Filters. As the name implies, Flex Filters are more flexible than using Layer-2, VLAN or MAC address filters, one of the reasons is that they allow "AND" operations more easily, e.g. when the user wants to steer some traffic based on the source MAC address and the packet ethertype. Future work include adding support for offloading tc-u32 filters to the hardware. The series is divided as follows: Patch 1/5, add the low level primitives for configuring Flex filters. Patch 2/5 and 3/5, allow ethtool to manage Flex filters. Patch 4/5, when specifying filters that have multiple predicates, use Flex filters. Patch 5/5, Adds support for exposing the i225 LEDs using the LED subsystem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 16 Jul, 2021 29 commits
-
-
Kurt Kanzenbach authored
Each i225 has three LEDs. Export them via the LED class framework. Each LED is controllable via sysfs. Example: $ cd /sys/class/leds/igc_led0 $ cat brightness # Current Mode $ cat max_brightness # 15 $ echo 0 > brightness # Mode 0 $ echo 1 > brightness # Mode 1 The brightness field here reflects the different LED modes ranging from 0 to 15. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Kurt Kanzenbach authored
Currently flex filters are only used for filters containing user data. However, it makes sense to utilize them also for filters having multiple conditions, because that's not supported by the driver at the moment. Add it. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Vinicius Costa Gomes authored
Allows Flex Filters to be installed. The previous restriction to the types of filters that can be installed can now be lifted. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Kurt Kanzenbach authored
Use the flex filter mechanism to extend the current ethtool filter operations by intercoperating the user data. This allows to match eight more bytes within a Ethernet frame in addition to macs, ether types and vlan. The matching pattern looks like this: * dest_mac [6] * src_mac [6] * tpid [2] * vlan tci [2] * ether type [2] * user data [8] This can be used to match Profinet traffic classes by FrameID range. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Kurt Kanzenbach authored
The Intel i225 NIC has the possibility to add flex filters which can match up to the first 128 byte of a packet. These filters are useful for all kind of packet matching. One particular use case is Profinet, as the different traffic classes are distinguished by the frame id range which cannot be matched by any other means. Add code to configure and enable flex filters. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Voon Weifeng authored
Implement Wake-on-LAN feature for 88X3310 and 88E2110. This is done by enabling WoL interrupt and WoL detection and configuring MAC address into WoL magic packet registers Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: Ling Pei Lee <pei.lee.ling@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joakim Zhang authored
Remove un-used "phy-reset-delay" property which found when do dtbs_check (set additionalProperties: false in fsl,fec.yaml). Double check current driver and commit history, "phy-reset-delay" never comes up, so it should be safe to remove it. $ make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- dtbs_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/net/fsl,fec.yaml arch/arm/boot/dts/imx7d-mba7.dt.yaml: ethernet@30be0000: 'phy-reset-delay' does not match any of the regexes: 'pinctrl-[0-9]+' arch/arm/boot/dts/imx7d-mba7.dt.yaml: ethernet@30bf0000: 'phy-reset-delay' does not match any of the regexes: 'pinctrl-[0-9]+' /arch/arm/boot/dts/imx7s-mba7.dt.yaml: ethernet@30be0000: 'phy-reset-delay' does not match any of the regexes: 'pinctrl-[0-9]+' Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joakim Zhang authored
Correct node name for FEC which found when do dtbs_check. $ make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- dtbs_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/net/fsl,fec.yaml arch/arm/boot/dts/imx35-eukrea-mbimxsd35-baseboard.dt.yaml: fec@50038000: $nodename:0: 'fec@50038000' does not match '^ethernet(@.*)?$' arch/arm/boot/dts/imx35-pdk.dt.yaml: fec@50038000: $nodename:0: 'fec@50038000' does not match '^ethernet(@.*)?$' Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joakim Zhang authored
In order to automate the verification of DT nodes convert fsl-fec.txt to fsl,fec.yaml, and pass binding check with below command. $ make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- dt_binding_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/net/fsl,fec.yaml DTEX Documentation/devicetree/bindings/net/fsl,fec.example.dts DTC Documentation/devicetree/bindings/net/fsl,fec.example.dt.yaml CHECK Documentation/devicetree/bindings/net/fsl,fec.example.dt.yaml Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Peilin Ye authored
Currently netdevsim only supports a single queue per port, which is insufficient for testing multi-queue TC schedulers e.g. sch_mq. Extend the current sysfs interface so that users can create ports with multiple queues: $ echo "[ID] [PORT_COUNT] [NUM_QUEUES]" > /sys/bus/netdevsim/new_device As an example, echoing "2 4 8" creates 4 ports, with 8 queues per port. Note, this is compatible with the current interface, with default number of queues set to 1. For example, echoing "2 4" creates 4 ports with 1 queue per port; echoing "2" simply creates 1 port with 1 queue. Reviewed-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Mark Gray authored
The Open vSwitch kernel module uses the upcall mechanism to send packets from kernel space to user space when it misses in the kernel space flow table. The upcall sends packets via a Netlink socket. Currently, a Netlink socket is created for every vport. In this way, there is a 1:1 mapping between a vport and a Netlink socket. When a packet is received by a vport, if it needs to be sent to user space, it is sent via the corresponding Netlink socket. This mechanism, with various iterations of the corresponding user space code, has seen some limitations and issues: * On systems with a large number of vports, there is a correspondingly large number of Netlink sockets which can limit scaling. (https://bugzilla.redhat.com/show_bug.cgi?id=1526306) * Packet reordering on upcalls. (https://bugzilla.redhat.com/show_bug.cgi?id=1844576) * A thundering herd issue. (https://bugzilla.redhat.com/show_bug.cgi?id=1834444) This patch introduces an alternative, feature-negotiated, upcall mode using a per-cpu dispatch rather than a per-vport dispatch. In this mode, the Netlink socket to be used for the upcall is selected based on the CPU of the thread that is executing the upcall. In this way, it resolves the issues above as: a) The number of Netlink sockets scales with the number of CPUs rather than the number of vports. b) Ordering per-flow is maintained as packets are distributed to CPUs based on mechanisms such as RSS and flows are distributed to a single user space thread. c) Packets from a flow can only wake up one user space thread. The corresponding user space code can be found at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385139.html Bugzilla: https://bugzilla.redhat.com/1844576Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Bill Wendling authored
Fix the clang build warning: drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c:1862:13: error: variable 'cur_data_offset' set but not used [-Werror,-Wunused-but-set-variable] dma_addr_t cur_data_offset; Signed-off-by: Bill Wendling <morbo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christophe JAILLET authored
Use 'bitmap_alloc()/bitmap_free()' instead of hand-writing it. This makes the code less verbose. Also, use 'bitmap_alloc()' instead of 'bitmap_zalloc()' because the bitmap is fully overridden by a 'bitmap_copy()' call just after its allocation. While at it, remove an extra and unneeded space. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yajun Deng authored
It has been deal with the 'if (err' statement in rtnetlink_send() and rtnl_unicast(). so remove unnecessary if statement. v2: use the raw name rtnetlink_send(). Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yajun Deng authored
The netlink_{broadcast, unicast} don't deal with 'if (err > 0' statement but nlmsg_{multicast, unicast} do. The nlmsg_notify() contains them. so use nlmsg_notify() instead. so that the caller wouldn't deal with 'if (err > 0' statement. v2: use nlmsg_notify() will do well. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Haiyue Wang authored
The 'tail' pointer is also free-running count, so it needs to be masked as 'adminq_prod_cnt' does, to become an index value of AdminQ buffer. Fixes: 5cdad90d ("gve: Batch AQ commands for creating and destroying queues.") Signed-off-by: Haiyue Wang <haiyue.wang@intel.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller authored
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-07-15 The following pull-request contains BPF updates for your *net-next* tree. We've added 45 non-merge commits during the last 15 day(s) which contain a total of 52 files changed, 3122 insertions(+), 384 deletions(-). The main changes are: 1) Introduce bpf timers, from Alexei. 2) Add sockmap support for unix datagram socket, from Cong. 3) Fix potential memleak and UAF in the verifier, from He. 4) Add bpf_get_func_ip helper, from Jiri. 5) Improvements to generic XDP mode, from Kumar. 6) Support for passing xdp_md to XDP programs in bpf_prog_run, from Zvi. =================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexei Starovoitov authored
Cong Wang says: ==================== From: Cong Wang <cong.wang@bytedance.com> This is the last patchset of the original large patchset. In the previous patchset, a new BPF sockmap program BPF_SK_SKB_VERDICT was introduced and UDP began to support it too. In this patchset, we add BPF_SK_SKB_VERDICT support to Unix datagram socket, so that we can finally splice Unix datagram socket and UDP socket. Please check each patch description for more details. To see the big picture, the previous patchsets are available here: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=1e0ab70778bd86a90de438cc5e1535c115a7c396 https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=89d69c5d0fbcabd8656459bc8b1a476d6f1efee4 and this patchset is available here: https://github.com/congwang/linux/tree/sockmap3Acked-by: John Fastabend <john.fastabend@gmail.com> --- v5: lift socket state check for dgram remove ->unhash() case add retries for EAGAIN in all test cases remove an unused parameter of __unix_dgram_recvmsg() rebase on the latest bpf-next v4: fix af_unix disconnect case add unix_unhash() split out two small patches reduce u->iolock critical section remove an unused parameter of __unix_dgram_recvmsg() v3: fix Kconfig dependency make unix_read_sock() static fix a UAF in unix_release() add a missing header unix_bpf.c v2: separate out from the original large patchset rebase to the latest bpf-next clean up unix_read_sock() export sock_map_close() factor out some helpers in selftests for code reuse ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Cong Wang authored
Add two test cases to ensure redirection between udp and unix work bidirectionally. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-12-xiyou.wangcong@gmail.com
-
Cong Wang authored
Add a test case to ensure redirection between two AF_UNIX datagram sockets work. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-11-xiyou.wangcong@gmail.com
-
Cong Wang authored
Factor out a common helper add_to_sockmap() which adds two sockets into a sockmap. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-10-xiyou.wangcong@gmail.com
-
Cong Wang authored
Factor out a common helper udp_socketpair() which creates a pair of connected UDP sockets. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-9-xiyou.wangcong@gmail.com
-
Cong Wang authored
We have to implement unix_dgram_bpf_recvmsg() to replace the original ->recvmsg() to retrieve skmsg from ingress_msg. AF_UNIX is again special here because the lack of sk_prot->recvmsg(). I simply add a special case inside unix_dgram_recvmsg() to call sk->sk_prot->recvmsg() directly. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-8-xiyou.wangcong@gmail.com
-
Cong Wang authored
Now we can implement unix_bpf_update_proto() to update sk_prot, especially prot->close(). Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-7-xiyou.wangcong@gmail.com
-
Cong Wang authored
Unlike af_inet, unix_proto is very different, it does not even have a ->close(). We have to add a dummy implementation to satisfy sockmap. Normally it is just a nop, it is introduced only for sockmap to replace it. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-6-xiyou.wangcong@gmail.com
-
Cong Wang authored
Currently only unix stream socket sets TCP_ESTABLISHED, datagram socket can set this too when they connect to its peer socket. At least __ip4_datagram_connect() does the same. This will be used to determine whether an AF_UNIX datagram socket can be redirected to in sockmap. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-5-xiyou.wangcong@gmail.com
-
Cong Wang authored
Implement ->read_sock() for AF_UNIX datagram socket, it is pretty much similar to udp_read_sock(). Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-4-xiyou.wangcong@gmail.com
-
Cong Wang authored
TCP and other connection oriented sockets have accept() for each incoming connection on the server side, hence they can just insert those fd's from accept() to sockmap, which are of course established. Now with datagram sockets begin to support sockmap and redirection, the restriction is no longer applicable to them, as they have no accept(). So we have to lift this restriction for them. This is fine, because inside bpf_sk_redirect_map() we still have another socket status check, sock_map_redirect_allowed(), as a guard. This also means they do not have to be removed from sockmap when disconnecting. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-3-xiyou.wangcong@gmail.com
-
Cong Wang authored
Currently sock_map still has Kconfig dependency on CONFIG_INET, but there is no actual functional dependency on it after we introduce ->psock_update_sk_prot(). We have to extend it to CONFIG_NET now as we are going to support AF_UNIX. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-2-xiyou.wangcong@gmail.com
-