- 18 Jul, 2020 4 commits
-
-
Jakub Sitnicki authored
Run a BPF program before looking up a listening socket on the receive path. Program selects a listening socket to yield as result of socket lookup by calling bpf_sk_assign() helper and returning SK_PASS code. Program can revert its decision by assigning a NULL socket with bpf_sk_assign(). Alternatively, BPF program can also fail the lookup by returning with SK_DROP, or let the lookup continue as usual with SK_PASS on return, when no socket has been selected with bpf_sk_assign(). This lets the user match packets with listening sockets freely at the last possible point on the receive path, where we know that packets are destined for local delivery after undergoing policing, filtering, and routing. With BPF code selecting the socket, directing packets destined to an IP range or to a port range to a single socket becomes possible. In case multiple programs are attached, they are run in series in the order in which they were attached. The end result is determined from return codes of all the programs according to following rules: 1. If any program returned SK_PASS and selected a valid socket, the socket is used as result of socket lookup. 2. If more than one program returned SK_PASS and selected a socket, last selection takes effect. 3. If any program returned SK_DROP, and no program returned SK_PASS and selected a socket, socket lookup fails with -ECONNREFUSED. 4. If all programs returned SK_PASS and none of them selected a socket, socket lookup continues to htable-based lookup. Suggested-by: Marek Majkowski <marek@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200717103536.397595-5-jakub@cloudflare.com
-
Jakub Sitnicki authored
Prepare for calling into reuseport from __inet_lookup_listener as well. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-4-jakub@cloudflare.com
-
Jakub Sitnicki authored
Add a new program type BPF_PROG_TYPE_SK_LOOKUP with a dedicated attach type BPF_SK_LOOKUP. The new program kind is to be invoked by the transport layer when looking up a listening socket for a new connection request for connection oriented protocols, or when looking up an unconnected socket for a packet for connection-less protocols. When called, SK_LOOKUP BPF program can select a socket that will receive the packet. This serves as a mechanism to overcome the limits of what bind() API allows to express. Two use-cases driving this work are: (1) steer packets destined to an IP range, on fixed port to a socket 192.0.2.0/24, port 80 -> NGINX socket (2) steer packets destined to an IP address, on any port to a socket 198.51.100.1, any port -> L7 proxy socket In its run-time context program receives information about the packet that triggered the socket lookup. Namely IP version, L4 protocol identifier, and address 4-tuple. Context can be further extended to include ingress interface identifier. To select a socket BPF program fetches it from a map holding socket references, like SOCKMAP or SOCKHASH, and calls bpf_sk_assign(ctx, sk, ...) helper to record the selection. Transport layer then uses the selected socket as a result of socket lookup. In its basic form, SK_LOOKUP acts as a filter and hence must return either SK_PASS or SK_DROP. If the program returns with SK_PASS, transport should look for a socket to receive the packet, or use the one selected by the program if available, while SK_DROP informs the transport layer that the lookup should fail. This patch only enables the user to attach an SK_LOOKUP program to a network namespace. Subsequent patches hook it up to run on local delivery path in ipv4 and ipv6 stacks. Suggested-by: Marek Majkowski <marek@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200717103536.397595-3-jakub@cloudflare.com
-
Jakub Sitnicki authored
Extend the BPF netns link callbacks to rebuild (grow/shrink) or update the prog_array at given position when link gets attached/updated/released. This let's us lift the limit of having just one link attached for the new attach type introduced by subsequent patch. No functional changes intended. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-2-jakub@cloudflare.com
-
- 16 Jul, 2020 36 commits
-
-
Randy Dunlap authored
Drop doubled words "will" and "attach". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/6b9f71ae-4f8e-0259-2c5d-187ddaefe6eb@infradead.org
-
Stanislav Fomichev authored
Andrii reported that sockopt_inherit occasionally hangs up on 5.5 kernel [0]. This can happen if server_thread runs faster than the main thread. In that case, pthread_cond_wait will wait forever because pthread_cond_signal was executed before the main thread was blocking. Let's move pthread_mutex_lock up a bit to make sure server_thread runs strictly after the main thread goes to sleep. (Not sure why this is 5.5 specific, maybe scheduling is less deterministic? But I was able to confirm that it does indeed happen in a VM.) [0] https://lore.kernel.org/bpf/CAEf4BzY0-bVNHmCkMFPgObs=isUAyg-dFzGDY7QWYkmm7rmTSg@mail.gmail.com/Reported-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200715224107.3591967-1-sdf@google.com
-
Seth Forshee authored
This reverts commit 3203c901 ("test_bpf: flag tests that cannot be jited on s390"). The s390 bpf JIT previously had a restriction on the maximum program size, which required some tests in test_bpf to be flagged as expected failures. The program size limitation has been removed, and the tests now pass, so these tests should no longer be flagged. Fixes: d1242b10 ("s390/bpf: Remove JITed image size limitations") Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/bpf/20200716143931.330122-1-seth.forshee@canonical.com
-
Lorenzo Bianconi authored
Similar to what have been done for DEVMAP, introduce tests to verify ability to add a XDP program to an entry in a CPUMAP. Verify CPUMAP programs can not be attached to devices as a normal XDP program, and only programs with BPF_XDP_CPUMAP attach type can be loaded in a CPUMAP. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/9c632fcea5382ea7b4578bd06b6eddf382c3550b.1594734381.git.lorenzo@kernel.org
-
Lorenzo Bianconi authored
Extend xdp_redirect_cpu_{usr,kern}.c adding the possibility to load a XDP program on cpumap entries. The following options have been added: - mprog-name: cpumap entry program name - mprog-filename: cpumap entry program filename - redirect-device: output interface if the cpumap program performs a XDP_REDIRECT to an egress interface - redirect-map: bpf map used to perform XDP_REDIRECT to an egress interface - mprog-disable: disable loading XDP program on cpumap entries Add xdp_pass, xdp_drop, xdp_redirect stats accounting Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/aa5a9a281b9dac425620fdabe82670ffb6bbdb92.1594734381.git.lorenzo@kernel.org
-
Lorenzo Bianconi authored
As for DEVMAP, support SEC("xdp_cpumap/") as a short cut for loading the program with type BPF_PROG_TYPE_XDP and expected attach type BPF_XDP_CPUMAP. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/33174c41993a6d860d9c7c1f280a2477ee39ed11.1594734381.git.lorenzo@kernel.org
-
Lorenzo Bianconi authored
Introduce XDP_REDIRECT support for eBPF programs attached to cpumap entries. This patch has been tested on Marvell ESPRESSObin using a modified version of xdp_redirect_cpu sample in order to attach a XDP program to CPUMAP entries to perform a redirect on the mvneta interface. In particular the following scenario has been tested: rq (cpu0) --> mvneta - XDP_REDIRECT (cpu0) --> CPUMAP - XDP_REDIRECT (cpu1) --> mvneta $./xdp_redirect_cpu -p xdp_cpu_map0 -d eth0 -c 1 -e xdp_redirect \ -f xdp_redirect_kern.o -m tx_port -r eth0 tx: 285.2 Kpps rx: 285.2 Kpps Attaching a simple XDP program on eth0 to perform XDP_TX gives comparable results: tx: 288.4 Kpps rx: 288.4 Kpps Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/2cf8373a731867af302b00c4ff16c122630c4980.1594734381.git.lorenzo@kernel.org
-
Lorenzo Bianconi authored
Introduce the capability to attach an eBPF program to cpumap entries. The idea behind this feature is to add the possibility to define on which CPU run the eBPF program if the underlying hw does not support RSS. Current supported verdicts are XDP_DROP and XDP_PASS. This patch has been tested on Marvell ESPRESSObin using xdp_redirect_cpu sample available in the kernel tree to identify possible performance regressions. Results show there are no observable differences in packet-per-second: $./xdp_redirect_cpu --progname xdp_cpu_map0 --dev eth0 --cpu 1 rx: 354.8 Kpps rx: 356.0 Kpps rx: 356.8 Kpps rx: 356.3 Kpps rx: 356.6 Kpps rx: 356.6 Kpps rx: 356.7 Kpps rx: 355.8 Kpps rx: 356.8 Kpps rx: 356.8 Kpps Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/5c9febdf903d810b3415732e5cd98491d7d9067a.1594734381.git.lorenzo@kernel.org
-
Lorenzo Bianconi authored
As it has been already done for devmap, introduce 'struct bpf_cpumap_val' to formalize the expected values that can be passed in for a CPUMAP. Update cpumap code to use the struct. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/754f950674665dae6139c061d28c1d982aaf4170.1594734381.git.lorenzo@kernel.org
-
Lorenzo Bianconi authored
Do not update xdp_redirect_cpu maps running while option loop but defer it after all available options have been parsed. This is a preliminary patch to pass the program name we want to attach to the map entries as a user option Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/95dc46286fd2c609042948e04bb7ae1f5b425538.1594734381.git.lorenzo@kernel.org
-
David Ahern authored
Move the guts of xdp_convert_buff_to_frame to a new helper, xdp_update_frame_from_buff so it can be reused removing code duplication Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/90a68c283d7ebeb48924934c9b7ac79492300472.1594734381.git.lorenzo@kernel.org
-
Jesper Dangaard Brouer authored
Commit 77361825 ("bpf: cpumap use ptr_ring_consume_batched") changed away from using single frame ptr_ring dequeue (__ptr_ring_consume) to consume a batched, but it uses a locked version, which as the comment explain isn't needed. Change to use the non-locked version __ptr_ring_consume_batched. Fixes: 77361825 ("bpf: cpumap use ptr_ring_consume_batched") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/a9c7d06f9a009e282209f0c8c7b2c5d9b9ad60b9.1594734381.git.lorenzo@kernel.org
-
Randy Dunlap authored
Drop the doubled word "by" in a comment. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Randy Dunlap authored
Drop doubled words in several comments. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Randy Dunlap authored
Drop doubled word "the" in a comment. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Randy Dunlap authored
Drop doubled word "to" in a comment. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Randy Dunlap authored
Drop doubled words "or" and "the" in several comments. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Randy Dunlap authored
Drop doubled word "not" in a comment. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Randy Dunlap authored
Drop doubled words in two comments. Fix a spello/typo. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Randy Dunlap authored
Drop doubled words in several comments. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Randy Dunlap authored
Drop doubled word "the" in two comments. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Kieran Bingham authored
The word 'descriptor' is misspelled throughout the tree. Fix it up accordingly: decriptor -> descriptor Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Ido Schimmel says: ==================== mlxsw: Offload tc police action This patch set adds support for tc police action in mlxsw. Patches #1-#2 add defines for policer bandwidth limits and resource identifiers (e.g., maximum number of policers). Patch #3 adds a common policer core in mlxsw. Currently it is only used by the policy engine, but future patch sets will use it for trap policers and storm control policers. The common core allows us to share common logic between all policer types and abstract certain details from the various users in mlxsw. Patch #4 exposes the maximum number of supported policers and their current usage to user space via devlink-resource. This provides better visibility and also used for selftests purposes. Patches #5-#7 gradually add support for tc police action in the policy engine by calling into previously mentioned policer core. Patch #8 adds a generic selftest for tc-police that can be used with veth pairs or physical loopbacks. Patches #9-#11 add mlxsw-specific selftests. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
Test that policers shared by different tc filters are correctly reference counted by observing policers' occupancy via devlink-resource. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
Query the maximum number of supported policers using devlink-resource and test that this number can be reached by configuring tc filters with police action. Test that an error is returned in case the maximum number is exceeded. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
Test that upper and lower limits on rate and burst size imposed by the device are rejected by the kernel. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
Test tc-police action in various scenarios such as Rx policing, Tx policing, shared policer and police piped to mirred. The test passes with both veth pairs and loopbacked ports. # ./tc_police.sh TEST: police on rx [ OK ] TEST: police on tx [ OK ] TEST: police with shared policer - rx [ OK ] TEST: police with shared policer - tx [ OK ] TEST: police rx and mirror [ OK ] TEST: police tx and mirror [ OK ] Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
Offload action police when used with a flower classifier. The number of dropped packets is read from the policer and reported to tc. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
Add core functionality required to support police action in the policy engine. The utilized hardware policers are stored in a hash table keyed by the flow action index. This allows to support policer sharing between multiple ACL rules. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
In the policy engine, each ACL rule points to an action block where the ACL actions are stored. Each action block consists of one or more action sets. Each action set holds one or more individual actions, up to a maximum queried from the device. For example: Action set #1 Action set #2 +----------+ +--------------+ +--------------+ | ACL rule +----------> Action #1 | +-----> Action #4 | +----------+ +--------------+ | +--------------+ | Action #2 | | | Action #5 | +--------------+ | +--------------+ | Action #3 +------+ | | +--------------+ +--------------+ <---------+ Action block +-----------------> The hardware has a limitation that prevents a policing action (MLXSW_AFA_POLCNT_CODE when used with a policer, not a counter) from being configured in the same action set with a trap action (i.e., MLXSW_AFA_TRAP_CODE or MLXSW_AFA_TRAPWU_CODE). Note that the latter used to implement multiple actions: 'trap', 'mirred', 'drop'. Work around this limitation by teaching mlxsw_afa_block_append_action() to create a new action set not only when there is no more room left in the current set, but also when there is a conflict between previously mentioned actions. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
Expose via devlink-resource the maximum number of single-rate policers and their current occupancy. Example: $ devlink resource show pci/0000:01:00.0 ... name global_policers size 1000 unit entry dpipe_tables none resources: name single_rate_policers size 968 occ 0 unit entry dpipe_tables none Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
Add common code to handle all policer-related functionality in mlxsw. Currently, only policer for policy engines are supported, but it in the future more policer families will be added such as CPU (trap) policers and storm control policers. The API allows different modules to add / delete policers and read their drop counter. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
Add a resource identifier for maximum global policers so that it could be later used to query the information from firmware. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
Add policer bandwidth limits for both rate and burst size so that they could be enforced by a later patch. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Luo bin authored
add support to update firmware by the devlink flashing API Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Suraj Upadhyay authored
Remove the unnecessary label from dn_dev_ioctl() and make its error handling simpler to read. Signed-off-by: Suraj Upadhyay <usuraj35@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-