- 13 Dec, 2019 6 commits
-
-
Jakub Sitnicki authored
Prepare for switching reuseport tests to test_progs framework. Loop over the tests and perform setup/cleanup for each test separately, remembering that with test_progs we can select tests to run. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191212102259.418536-7-jakub@cloudflare.com
-
Jakub Sitnicki authored
Prepare for iterating over individual tests without introducing another nested loop in the main test function. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191212102259.418536-6-jakub@cloudflare.com
-
Jakub Sitnicki authored
Having string arrays to map socket family & type to a name prevents us from unrolling the test runner loop in the subsequent patch. Introduce helpers that do the same thing. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191212102259.418536-5-jakub@cloudflare.com
-
Jakub Sitnicki authored
Update the only function that is not using sa_family_t in this source file. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191212102259.418536-4-jakub@cloudflare.com
-
Jakub Sitnicki authored
Now that libbpf can recognize SK_REUSEPORT programs, we no longer have to pass a prog_type hint before loading the object file. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191212102259.418536-3-jakub@cloudflare.com
-
Jakub Sitnicki authored
Allow loading BPF object files that contain SK_REUSEPORT programs without having to manually set the program type before load if the the section name is set to "sk_reuseport". Makes user-space code needed to load SK_REUSEPORT BPF program more concise. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191212102259.418536-2-jakub@cloudflare.com
-
- 12 Dec, 2019 2 commits
-
-
Andrii Nakryiko authored
On ppc64le __u64 and __s64 are defined as long int and unsigned long int, respectively. This causes compiler to emit warning when %lld/%llu are used to printf 64-bit numbers. Fix this by casting to size_t/ssize_t with %zu and %zd format specifiers, respectively. v1->v2: - use size_t/ssize_t instead of custom typedefs (Martin). Fixes: 1f8e2bcb ("libbpf: Refactor relocation handling") Fixes: abd29c93 ("libbpf: allow specifying map definitions using BTF") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191212171918.638010-1-andriin@fb.com
-
Daniel Borkmann authored
After Spectre 2 fix via 290af866 ("bpf: introduce BPF_JIT_ALWAYS_ON config") most major distros use BPF_JIT_ALWAYS_ON configuration these days which compiles out the BPF interpreter entirely and always enables the JIT. Also given recent fix in e1608f3f ("bpf: Avoid setting bpf insns pages read-only when prog is jited"), we additionally avoid fragmenting the direct map for the BPF insns pages sitting in the general data heap since they are not used during execution. Latter is only needed when run through the interpreter. Since both x86 and arm64 JITs have seen a lot of exposure over the years, are generally most up to date and maintained, there is more downside in !BPF_JIT_ALWAYS_ON configurations to have the interpreter enabled by default rather than the JIT. Add a ARCH_WANT_DEFAULT_BPF_JIT config which archs can use to set the bpf_jit_{enable,kallsyms} to 1. Back in the days the bpf_jit_kallsyms knob was set to 0 by default since major distros still had /proc/kallsyms addresses exposed to unprivileged user space which is not the case anymore. Hence both knobs are set via BPF_JIT_DEFAULT_ON which is set to 'y' in case of BPF_JIT_ALWAYS_ON or ARCH_WANT_DEFAULT_BPF_JIT. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/f78ad24795c2966efcc2ee19025fa3459f622185.1575903816.git.daniel@iogearbox.net
-
- 11 Dec, 2019 13 commits
-
-
Daniel Borkmann authored
Allow for audit messages to be emitted upon BPF program load and unload for having a timeline of events. The load itself is in syscall context, so additional info about the process initiating the BPF prog creation can be logged and later directly correlated to the unload event. The only info really needed from BPF side is the globally unique prog ID where then audit user space tooling can query / dump all info needed about the specific BPF program right upon load event and enrich the record, thus these changes needed here can be kept small and non-intrusive to the core. Raw example output: # auditctl -D # auditctl -a always,exit -F arch=x86_64 -S bpf # ausearch --start recent -m 1334 ... ---- time->Wed Nov 27 16:04:13 2019 type=PROCTITLE msg=audit(1574867053.120:84664): proctitle="./bpf" type=SYSCALL msg=audit(1574867053.120:84664): arch=c000003e syscall=321 \ success=yes exit=3 a0=5 a1=7ffea484fbe0 a2=70 a3=0 items=0 ppid=7477 \ pid=12698 auid=1001 uid=1001 gid=1001 euid=1001 suid=1001 fsuid=1001 \ egid=1001 sgid=1001 fsgid=1001 tty=pts2 ses=4 comm="bpf" \ exe="/home/jolsa/auditd/audit-testsuite/tests/bpf/bpf" \ subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null) type=UNKNOWN[1334] msg=audit(1574867053.120:84664): prog-id=76 op=LOAD ---- time->Wed Nov 27 16:04:13 2019 type=UNKNOWN[1334] msg=audit(1574867053.120:84665): prog-id=76 op=UNLOAD ... Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/bpf/20191206214934.11319-1-jolsa@kernel.org
-
Stanislav Fomichev authored
Switch existing pattern of "offsetof(..., member) + FIELD_SIZEOF(..., member)' to "offsetofend(..., member)" which does exactly what we need without all the copy-paste. Suggested-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191210191933.105321-1-sdf@google.com
-
Andrii Nakryiko authored
New development cycles starts, bump to v0.0.7 proactively. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191209224022.3544519-1-andriin@fb.com
-
Russell King authored
Improve the prologue code sequence to be able to take advantage of 64-bit stores, changing the code from: push {r4, r5, r6, r7, r8, r9, fp, lr} mov fp, sp sub ip, sp, #80 ; 0x50 sub sp, sp, #600 ; 0x258 str ip, [fp, #-100] ; 0xffffff9c mov r6, #0 str r6, [fp, #-96] ; 0xffffffa0 mov r4, #0 mov r3, r4 mov r2, r0 str r4, [fp, #-104] ; 0xffffff98 str r4, [fp, #-108] ; 0xffffff94 to the tighter: push {r4, r5, r6, r7, r8, r9, fp, lr} mov fp, sp mov r3, #0 sub r2, sp, #80 ; 0x50 sub sp, sp, #600 ; 0x258 strd r2, [fp, #-100] ; 0xffffff9c mov r2, #0 strd r2, [fp, #-108] ; 0xffffff94 mov r2, r0 resulting in a saving of three instructions. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/E1ieH2g-0004ih-Rb@rmk-PC.armlinux.org.uk
-
Shahjada Abul Husain authored
T6 has a separate region known as high priority filter region that allows classifying packets going through ULD path. So, query firmware for HPFILTER resources and enable the high priority offload filter support when it is available. Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chen Wandun authored
Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/ethernet/freescale/enetc/enetc_qos.c: In function enetc_setup_tc_cbs: drivers/net/ethernet/freescale/enetc/enetc_qos.c:195:6: warning: variable tc_max_sized_frame set but not used [-Wunused-but-set-variable] Fixes: c431047c ("enetc: add support Credit Based Shaper(CBS) for hardware offload") Signed-off-by: Chen Wandun <chenwandun@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Device stats are currently hard coded in the PCI BAR0 layout. Add a ability to read them from the TLV area instead. Names for the stats are maintained by the driver, and their meaning documented. This allows us to more easily add and remove device stats. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kuniyuki Iwashima authored
When a TCP socket is created, sk->sk_state is initialized twice as TCP_CLOSE in sock_init_data() and tcp_init_sock(). The tcp_init_sock() is always called after the sock_init_data(), so it is not necessary to update sk->sk_state in the tcp_init_sock(). Before v2.1.8, the code of the two functions was in the inet_create(). In the patch of v2.1.8, the tcp_v4/v6_init_sock() were added and the code of initialization of sk->state was duplicated. Signed-off-by: Kuniyuki Iwashima <kuni1840@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Walle authored
Provide a software TX timestamp and add it to the ethtool query interface. skb_tx_timestamp() is also needed if one would like to use PHY timestamping. Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jon Maloy says: ==================== tipc: introduce variable window congestion control We improve thoughput greatly by introducing a variety of the Reno congestion control algorithm at the link level. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jon Maloy authored
We introduce a simple variable window congestion control for links. The algorithm is inspired by the Reno algorithm, covering both 'slow start', 'congestion avoidance', and 'fast recovery' modes. - We introduce hard lower and upper window limits per link, still different and configurable per bearer type. - We introduce a 'slow start theshold' variable, initially set to the maximum window size. - We let a link start at the minimum congestion window, i.e. in slow start mode, and then let is grow rapidly (+1 per rceived ACK) until it reaches the slow start threshold and enters congestion avoidance mode. - In congestion avoidance mode we increment the congestion window for each window-size number of acked packets, up to a possible maximum equal to the configured maximum window. - For each non-duplicate NACK received, we drop back to fast recovery mode, by setting the both the slow start threshold to and the congestion window to (current_congestion_window / 2). - If the timeout handler finds that the transmit queue has not moved since the previous timeout, it drops the link back to slow start and forces a probe containing the last sent sequence number to the sent to the peer, so that this can discover the stale situation. This change does in reality have effect only on unicast ethernet transport, as we have seen that there is no room whatsoever for increasing the window max size for the UDP bearer. For now, we also choose to keep the limits for the broadcast link unchanged and equal. This algorithm seems to give a 50-100% throughput improvement for messages larger than MTU. Suggested-by: Xin Long <lucien.xin@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jon Maloy authored
When we increase the link tranmsit window we often observe the following scenario: 1) A STATE message bypasses a sequence of traffic packets and arrives far ahead of those to the receiver. STATE messages contain a 'peers_nxt_snt' field to indicate which was the last packet sent from the peer. This mechanism is intended as a last resort for the receiver to detect missing packets, e.g., during very low traffic when there is no packet flow to help early loss detection. 3) The receiving link compares the 'peer_nxt_snt' field to its own 'rcv_nxt', finds that there is a gap, and immediately sends a NACK message back to the peer. 4) When this NACKs arrives at the sender, all the requested retransmissions are performed, since it is a first-time request. Just like in the scenario described in the previous commit this leads to many redundant retransmissions, with decreased throughput as a consequence. We fix this by adding two more conditions before we send a NACK in this sitution. First, the deferred queue must be empty, so we cannot assume that the potential packet loss has already been detected by other means. Second, we check the 'peers_snd_nxt' field only in probe/ probe_reply messages, thus turning this into a true mechanism of last resort as it was really meant to be. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jon Maloy authored
When we increase the link send window we sometimes observe the following scenario: 1) A packet #N arrives out of order far ahead of a sequence of older packets which are still under way. The packet is added to the deferred queue. 2) The missing packets arrive in sequence, and for each 16th of them an ACK is sent back to the receiver, as it should be. 3) When building those ACK messages, it is checked if there is a gap between the link's 'rcv_nxt' and the first packet in the deferred queue. This is always the case until packet number #N-1 arrives, and a 'gap' indicator is added, effectively turning them into NACK messages. 4) When those NACKs arrive at the sender, all the requested retransmissions are done, since it is a first-time request. This sometimes leads to a huge amount of redundant retransmissions, causing a drop in max throughput. This problem gets worse when we in a later commit introduce variable window congestion control, since it drops the link back to 'fast recovery' much more often than necessary. We now fix this by not sending any 'gap' indicator in regular ACK messages. We already have a mechanism for sending explicit NACKs in place, and this is sufficient to keep up the packet flow. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 10 Dec, 2019 8 commits
-
-
Nathan Chancellor authored
Clang warns: ../drivers/net/ppp/ppp_async.c:877:6: warning: misleading indentation; statement is not part of the previous 'if' [-Wmisleading-indentation] ap->rpkt = skb; ^ ../drivers/net/ppp/ppp_async.c:875:5: note: previous statement is here if (!skb) ^ 1 warning generated. This warning occurs because there is a space before the tab on this line. Clean up this entire block's indentation so that it is consistent with the Linux kernel coding style and clang no longer warns. Fixes: 6722e78c ("[PPP]: handle misaligned accesses") Link: https://github.com/ClangBuiltLinux/linux/issues/800Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nathan Chancellor authored
Clang warns: ../drivers/net/ethernet/smsc/smc911x.c:939:3: warning: misleading indentation; statement is not part of the previous 'if' [-Wmisleading-indentation] if (!lp->ctl_rfduplx) ^ ../drivers/net/ethernet/smsc/smc911x.c:936:2: note: previous statement is here if (lp->ctl_rspeed != 100) ^ 1 warning generated. This warning occurs because there is a space after the tab on this line. Remove it so that the indentation is consistent with the Linux kernel coding style and clang no longer warns. Fixes: 0a0c72c9 ("[PATCH] RE: [PATCH 1/1] net driver: Add support for SMSC LAN911x line of ethernet chips") Link: https://github.com/ClangBuiltLinux/linux/issues/796Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nathan Chancellor authored
Clang warns: ../drivers/net/ethernet/dec/tulip/uli526x.c:1812:3: warning: misleading indentation; statement is not part of the previous 'if' [-Wmisleading-indentation] switch (mode) { ^ ../drivers/net/ethernet/dec/tulip/uli526x.c:1809:2: note: previous statement is here if (cr6set) ^ 1 warning generated. ../drivers/net/ethernet/dec/tulip/dmfe.c:2217:3: warning: misleading indentation; statement is not part of the previous 'if' [-Wmisleading-indentation] switch(mode) { ^ ../drivers/net/ethernet/dec/tulip/dmfe.c:2214:2: note: previous statement is here if (cr6set) ^ 1 warning generated. This warning occurs because there is a space before the tab on these lines. Remove them so that the indentation is consistent with the Linux kernel coding style and clang no longer warns. While we are here, adjust the default block in dmfe_init_module to have a proper break between the label and assignment and add a space between the switch and opening parentheses to avoid a checkpatch warning. Fixes: e1c3e501 ("[PATCH] initialisation cleanup for ULI526x-net-driver") Link: https://github.com/ClangBuiltLinux/linux/issues/795Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Dan Murphy says: ==================== Fix Tx/Rx FIFO depth for DP83867 The DP83867 supports both the RGMII and SGMII modes. The Tx and Rx FIFO depths are configurable in these modes but may not applicable for both modes. When the device is configured for RGMII mode the Tx FIFO depth is applicable and for SGMII mode both Tx and Rx FIFO depth settings are applicable. When the driver was originally written only the RGMII device was available and there were no standard fifo-depth DT properties. The patchset converts the special ti,fifo-depth property to the standard tx-fifo-depth property while still allowing the ti,fifo-depth property to be set as to maintain backward compatibility. In addition to this change the rx-fifo-depth property support was added and only written when the device is configured for SGMII mode. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dan Murphy authored
This code changes the TI specific ti,fifo-depth to the common tx-fifo-depth property. The tx depth is applicable for both RGMII and SGMII modes of operation. rx-fifo-depth was added as well but this is only applicable for SGMII mode. So in summary if RGMII mode write tx fifo depth only if SGMII mode write both rx and tx fifo depths If the property is not populated in the device tree then set the value to the default values. Signed-off-by: Dan Murphy <dmurphy@ti.com> Reported-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dan Murphy authored
Convert the ti,fifo-depth from a TI specific property to the common tx-fifo-depth property. Also add support for the rx-fifo-depth. These are optional properties for this device and if these are not available then the fifo depths are set to device default values. Signed-off-by: Dan Murphy <dmurphy@ti.com> Reported-by: Adrian Bunk <bunk@kernel.org> CC: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kevin(Yudong) Yang authored
This patch introduces a sysctl knob "net.ipv4.tcp_no_ssthresh_metrics_save" that disables TCP ssthresh metrics cache by default. Other parts of TCP metrics cache, e.g. rtt, cwnd, remain unchanged. As modern networks becoming more and more dynamic, TCP metrics cache today often causes more harm than benefits. For example, the same IP address is often shared by different subscribers behind NAT in residential networks. Even if the IP address is not shared by different users, caching the slow-start threshold of a previous short flow using loss-based congestion control (e.g. cubic) often causes the future longer flows of the same network path to exit slow-start prematurely with abysmal throughput. Caching ssthresh is very risky and can lead to terrible performance. Therefore it makes sense to make disabling ssthresh caching by default and opt-in for specific networks by the administrators. This practice also has worked well for several years of deployment with CUBIC congestion control at Google. Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Kevin(Yudong) Yang <yyd@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
Commit 31243461 ("sctp: cache netns in sctp_ep_common") set netns in asoc and ep base since they're created, and it will never change. It's a better way to get netns from asoc and ep base, comparing to calling sock_net(). This patch is to replace them. v1->v2: - no change. Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 09 Dec, 2019 5 commits
-
-
Russell King authored
The Nokia GPON module can hold tx-fault active while it is initialising which can take up to 60s. Avoid this causing the module to be declared faulty after the SFP MSA defined non-cooled module timeout. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
The variable rc is assigned with a value that is never read and it is re-assigned a new value later on. The assignment is redundant and can be removed. Clean up multiple occurrances of this pattern. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Mao Wenan authored
Convert cpu_to_le16(le16_to_cpu(frame->datalen) + len) to use le16_add_cpu(), which is more concise and does the same thing. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2019-12-09 Here's the first bluetooth-next pull request for 5.6: - Devicetree bindings updates for Broadcom controllers - Add support for PCM configuration for Broadcom controllers - btusb: Fixes for Realtek devices - butsb: A few other smaller fixes (mem leak & non-atomic allocation issue) Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jason A. Donenfeld authored
WireGuard is a layer 3 secure networking tunnel made specifically for the kernel, that aims to be much simpler and easier to audit than IPsec. Extensive documentation and description of the protocol and considerations, along with formal proofs of the cryptography, are available at: * https://www.wireguard.com/ * https://www.wireguard.com/papers/wireguard.pdf This commit implements WireGuard as a simple network device driver, accessible in the usual RTNL way used by virtual network drivers. It makes use of the udp_tunnel APIs, GRO, GSO, NAPI, and the usual set of networking subsystem APIs. It has a somewhat novel multicore queueing system designed for maximum throughput and minimal latency of encryption operations, but it is implemented modestly using workqueues and NAPI. Configuration is done via generic Netlink, and following a review from the Netlink maintainer a year ago, several high profile userspace tools have already implemented the API. This commit also comes with several different tests, both in-kernel tests and out-of-kernel tests based on network namespaces, taking profit of the fact that sockets used by WireGuard intentionally stay in the namespace the WireGuard interface was originally created, exactly like the semantics of userspace tun devices. See wireguard.com/netns/ for pictures and examples. The source code is fairly short, but rather than combining everything into a single file, WireGuard is developed as cleanly separable files, making auditing and comprehension easier. Things are laid out as follows: * noise.[ch], cookie.[ch], messages.h: These implement the bulk of the cryptographic aspects of the protocol, and are mostly data-only in nature, taking in buffers of bytes and spitting out buffers of bytes. They also handle reference counting for their various shared pieces of data, like keys and key lists. * ratelimiter.[ch]: Used as an integral part of cookie.[ch] for ratelimiting certain types of cryptographic operations in accordance with particular WireGuard semantics. * allowedips.[ch], peerlookup.[ch]: The main lookup structures of WireGuard, the former being trie-like with particular semantics, an integral part of the design of the protocol, and the latter just being nice helper functions around the various hashtables we use. * device.[ch]: Implementation of functions for the netdevice and for rtnl, responsible for maintaining the life of a given interface and wiring it up to the rest of WireGuard. * peer.[ch]: Each interface has a list of peers, with helper functions available here for creation, destruction, and reference counting. * socket.[ch]: Implementation of functions related to udp_socket and the general set of kernel socket APIs, for sending and receiving ciphertext UDP packets, and taking care of WireGuard-specific sticky socket routing semantics for the automatic roaming. * netlink.[ch]: Userspace API entry point for configuring WireGuard peers and devices. The API has been implemented by several userspace tools and network management utility, and the WireGuard project distributes the basic wg(8) tool. * queueing.[ch]: Shared function on the rx and tx path for handling the various queues used in the multicore algorithms. * send.c: Handles encrypting outgoing packets in parallel on multiple cores, before sending them in order on a single core, via workqueues and ring buffers. Also handles sending handshake and cookie messages as part of the protocol, in parallel. * receive.c: Handles decrypting incoming packets in parallel on multiple cores, before passing them off in order to be ingested via the rest of the networking subsystem with GRO via the typical NAPI poll function. Also handles receiving handshake and cookie messages as part of the protocol, in parallel. * timers.[ch]: Uses the timer wheel to implement protocol particular event timeouts, and gives a set of very simple event-driven entry point functions for callers. * main.c, version.h: Initialization and deinitialization of the module. * selftest/*.h: Runtime unit tests for some of the most security sensitive functions. * tools/testing/selftests/wireguard/netns.sh: Aforementioned testing script using network namespaces. This commit aims to be as self-contained as possible, implementing WireGuard as a standalone module not needing much special handling or coordination from the network subsystem. I expect for future optimizations to the network stack to positively improve WireGuard, and vice-versa, but for the time being, this exists as intentionally standalone. We introduce a menu option for CONFIG_WIREGUARD, as well as providing a verbose debug log and self-tests via CONFIG_WIREGUARD_DEBUG. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: David Miller <davem@davemloft.net> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 08 Dec, 2019 6 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull networking fixes from David Miller: 1) More jumbo frame fixes in r8169, from Heiner Kallweit. 2) Fix bpf build in minimal configuration, from Alexei Starovoitov. 3) Use after free in slcan driver, from Jouni Hogander. 4) Flower classifier port ranges don't work properly in the HW offload case, from Yoshiki Komachi. 5) Use after free in hns3_nic_maybe_stop_tx(), from Yunsheng Lin. 6) Out of bounds access in mqprio_dump(), from Vladyslav Tarasiuk. 7) Fix flow dissection in dsa TX path, from Alexander Lobakin. 8) Stale syncookie timestampe fixes from Guillaume Nault. [ Did an evil merge to silence a warning introduced by this pull - Linus ] * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits) r8169: fix rtl_hw_jumbo_disable for RTL8168evl net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add() r8169: add missing RX enabling for WoL on RTL8125 vhost/vsock: accept only packets with the right dst_cid net: phy: dp83867: fix hfs boot in rgmii mode net: ethernet: ti: cpsw: fix extra rx interrupt inet: protect against too small mtu values. gre: refetch erspan header from skb->data after pskb_may_pull() pppoe: remove redundant BUG_ON() check in pppoe_pernet tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE() tcp: tighten acceptance of ACKs not matching a child socket tcp: fix rejected syncookies due to stale timestamps lpc_eth: kernel BUG on remove tcp: md5: fix potential overestimation of TCP option space net: sched: allow indirect blocks to bind to clsact in TC net: core: rename indirect block ingress cb function net-sysfs: Call dev_hold always in netdev_queue_add_kobject net: dsa: fix flow dissection on Tx path net/tls: Fix return values to avoid ENOTSUPP net: avoid an indirect call in ____sys_recvmsg() ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds authored
Pull more SCSI updates from James Bottomley: "Eleven patches, all in drivers (no core changes) that are either minor cleanups or small fixes. They were late arriving, but still safe for -rc1" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: MAINTAINERS: Add the linux-scsi mailing list to the ISCSI entry scsi: megaraid_sas: Make poll_aen_lock static scsi: sd_zbc: Improve report zones error printout scsi: qla2xxx: Fix qla2x00_request_irqs() for MSI scsi: qla2xxx: unregister ports after GPN_FT failure scsi: qla2xxx: fix rports not being mark as lost in sync fabric scan scsi: pm80xx: Remove unused include of linux/version.h scsi: pm80xx: fix logic to break out of loop when register value is 2 or 3 scsi: scsi_transport_sas: Fix memory leak when removing devices scsi: lpfc: size cpu map by last cpu id set scsi: ibmvscsi_tgt: Remove unneeded variable rc
-
git://git.samba.org/sfrench/cifs-2.6Linus Torvalds authored
Pull cifs fixes from Steve French: "Nine cifs/smb3 fixes: - one fix for stable (oops during oplock break) - two timestamp fixes including important one for updating mtime at close to avoid stale metadata caching issue on dirty files (also improves perf by using SMB2_CLOSE_FLAG_POSTQUERY_ATTRIB over the wire) - two fixes for "modefromsid" mount option for file create (now allows mode bits to be set more atomically and accurately on create by adding "sd_context" on create when modefromsid specified on mount) - two fixes for multichannel found in testing this week against different servers - two small cleanup patches" * tag '5.5-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: smb3: improve check for when we send the security descriptor context on create smb3: fix mode passed in on create for modetosid mount option cifs: fix possible uninitialized access and race on iface_list cifs: Fix lookup of SMB connections on multichannel smb3: query attributes on file close smb3: remove unused flag passed into close functions cifs: remove redundant assignment to pointer pneg_ctxt fs: cifs: Fix atime update check vs mtime CIFS: Fix NULL-pointer dereference in smb2_push_mandatory_locks
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds authored
Pull misc vfs cleanups from Al Viro: "No common topic, just three cleanups". * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: make __d_alloc() static fs/namespace: add __user to open_tree and move_mount syscalls fs/fnctl: fix missing __user in fcntl_rw_hint()
-
git://github.com/jonmason/ntbLinus Torvalds authored
Pull NTB update from Jon Mason: "Just a simple patch to add a new Hygon Device ID to the AMD NTB device driver" * tag 'ntb-5.5' of git://github.com/jonmason/ntb: NTB: Add Hygon Device ID
-