- 18 Oct, 2017 3 commits
-
-
David Howells authored
Make AF_RXRPC accept MSG_WAITALL as a flag to sendmsg() to tell it to ignore signals whilst loading up the message queue, provided progress is being made in emptying the queue at the other side. Progress is defined as the base of the transmit window having being advanced within 2 RTT periods. If the period is exceeded with no progress, sendmsg() will return anyway, indicating how much data has been copied, if any. Once the supplied buffer is entirely decanted, the sendmsg() will return. Signed-off-by: David Howells <dhowells@redhat.com>
-
David Howells authored
Provide a couple of functions to allow cleaner handling of signals in a kernel service. They are: (1) rxrpc_kernel_get_rtt() This allows the kernel service to find out the RTT time for a call, so as to better judge how large a timeout to employ. Note, though, that whilst this returns a value in nanoseconds, the timeouts can only actually be in jiffies. (2) rxrpc_kernel_check_life() This returns a number that is updated when ACKs are received from the peer (notably including PING RESPONSE ACKs which we can elicit by sending PING ACKs to see if the call still exists on the server). The caller should compare the numbers of two calls to see if the call is still alive. These can be used to provide an extending timeout rather than returning immediately in the case that a signal occurs that would otherwise abort an RPC operation. The timeout would be extended if the server is still responsive and the call is still apparently alive on the server. For most operations this isn't that necessary - but for FS.StoreData it is: OpenAFS writes the data to storage as it comes in without making a backup, so if we immediately abort it when partially complete on a CTRL+C, say, we have no idea of the state of the file after the abort. Signed-off-by: David Howells <dhowells@redhat.com>
-
David Howells authored
Provide support for a kernel service to make use of the service upgrade facility. This involves: (1) Pass an upgrade request flag to rxrpc_kernel_begin_call(). (2) Make rxrpc_kernel_recv_data() return the call's current service ID so that the caller can detect service upgrade and see what the service was upgraded to. Signed-off-by: David Howells <dhowells@redhat.com>
-
- 17 Oct, 2017 1 commit
-
-
Henrik Austad authored
In commit 32302902 ("mqprio: Reserve last 32 classid values for HW traffic classes and misc IDs") sch_mqprio started using netdev_txq_to_tc to find the correct tc instead of dev->tc_to_txq[] However, when mqprio is compiled as a module, it cannot resolve the symbol, leading to this error: ERROR: "netdev_txq_to_tc" [net/sched/sch_mqprio.ko] undefined! This adds an EXPORT_SYMBOL() since the other user in the kernel (netif_set_xps_queue) is also EXPORT_SYMBOL() (and not _GPL) or in a sysfs-callback. Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Henrik Austad <haustad@cisco.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 16 Oct, 2017 31 commits
-
-
David S. Miller authored
Jiri Pirko says: ==================== mlxsw: GRE: Offload decap without encap Petr says: The current code doesn't offload GRE decapsulation unless there's a corresponding encapsulation route as well. While not strictly incorrect (when encap route is absent, the decap route traps traffic to CPU and the kernel handles it), it's a missed optimization opportunity. With this patchset, IPIP entries are created as soon as offloadable tunneling netdevice is created. This then leads to offloading of decap route, if one exists, or is added afterwards, even when no encap route is present. In Linux, when there is a decap route, matching IP-in-IP packets are always decapsulated. However, with IPv4 overlays in particular, whether the inner packet is then forwarded depends on setting of net.ipv4.conf.*.rp_filter. When RP filtering is turned on, inner packets aren't forwarded unless there's a matching encap route. The mlxsw driver doesn't reflect this behavior in other router interfaces, and thus it's not implemented for tunnel types either. A better support for this will be subject of follow-up work. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Formerly, IPIP entries were created lazily by next hops that referenced an offloadable IP-in-IP netdevice. However now that they are created eagerly as a reaction to events on such netdevices, the reference counting is useless. Hence drop it. The routes whose next hops reference an offloaded IP-in-IP netdevice actually linger around a bit after their device is unregistered. However, mlxsw_sp_ipip_entry_destroy() also destroys the backing loopback, and mlxsw_sp_rif_destroy() transitively (via mlxsw_sp_nexthop_rif_gone_sync()) calls mlxsw_sp_nexthop_ipip_fini(), which unlinks the IPIP entry from a next hop. Thus no dangling pointers are left behind for the brief window after netdevice is gone, but routes not yet. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
IPIP entries are created as soon as an offloadable device is created. That means that when such a device is later moved to a different VRF, the loopback device that backs the tunnel is wrong. Thus when an offloadable encapsulating netdevice moves from one VRF to another, make sure that the loopback is updated as necessary. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Current code for offloading IP-in-IP tunneling assumes that there is no decap without encap. But that's never true for IPv6 overlays, and is not true for IPv4 ones either, if net.ipv4.conf.*.rp_filter is unset. To support decap-only tunnels, an IPIP entry is now created as soon as an offloadable tunneling device is created. When that netdevice is up'd, a decap route is looked up and possibly offloaded. Thus decap is not handled implicitly as part of mlxsw_sp_ipip_entry_get() call anymore, but needs to be done explicitly after the get, if desired. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
So far, all netdevice notifications that the driver cared about were related to its own ports, and mlxsw_sp could be retrieved from the netdevice's private data. For IP-in-IP offloading however, the driver cares about events on foreign netdevices, and getting at mlxsw_sp or router data structures from the handler is inconvenient. Therefore move the netdevice notifier blocks from global scope to struct mlxsw_sp to allow retrieval from the notifier block pointer itself. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jon Maloy authored
In commit 2f487712 ("tipc: guarantee that group broadcast doesn't bypass group unicast") there was introduced a last-minute rebasing error that broke non-group communication. We fix this here. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Florian Westphal says: ==================== net: core: rcuify rtnl af_ops None of the rtnl af_ops callbacks sleep, so they can be called while holding rcu read lock. Switch handling of af_ops to rcu. This would allow to later call af_ops functions without holding the rtnl mutex anymore. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Westphal authored
rtnl af_ops currently rely on rtnl mutex: unregister (called from module exit functions) takes the rtnl mutex and all users that do af_ops lookup also take the rtnl mutex. IOW, parallel rmmod will block until doit() callback is done. As none of the af_ops implementation sleep we can use rcu instead. doit functions that need the af_ops can now use rcu instead of the rtnl mutex provided the mutex isn't needed for other reasons. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Westphal authored
next patch will rcu-ify rtnl af_ops, i.e. allow af_ops lookup and function calls with rcu read lock held instead of rtnl mutex. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
The structure tcp_cdg is local to the source and does not need to be in global scope, so make it static. Cleans up sparse warning: symbol 'tcp_cdg' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
The notifier cause a link error when NET_DSA is a loadable module: drivers/net/ethernet/broadcom/bcmsysport.o: In function `bcm_sysport_remove': bcmsysport.c:(.text+0x1582): undefined reference to `unregister_dsa_notifier' drivers/net/ethernet/broadcom/bcmsysport.o: In function `bcm_sysport_probe': bcmsysport.c:(.text+0x278d): undefined reference to `register_dsa_notifier' This adds a dependency that forces the systemport driver to be a loadable module as well when that happens, but otherwise allows it to be built normally when DSA is either built-in or completely disabled. Fixes: d1565763 ("net: systemport: Establish lower/upper queue mapping") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sudip Mukherjee authored
Modify baycom driver to use the new parallel port device model. Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Gustavo A. R. Silva authored
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Notice that for options.c file, I placed the "fall through" comment on its own line, which is what GCC is expecting to find. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andy Shevchenko authored
This removes custom flag handling. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Steven Rostedt (VMware) authored
All the trace events defined in include/trace/events/bpf.h are only used when CONFIG_BPF_SYSCALL is defined. But this file gets included by include/linux/bpf_trace.h which is included by the networking code with CREATE_TRACE_POINTS defined. If a trace event is created but not used it still has data structures and functions created for its use, even though nothing is using them. To not waste space, do not define the BPF trace events in bpf.h unless CONFIG_BPF_SYSCALL is defined. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Wang authored
In order to not dirty the cacheline too often, we try to only update dst->__use and dst->lastusetime at most once per jiffy. As dst->lastusetime is only used by ipv6 garbage collector, it should be good enough time resolution. And __use is only used in ipv6_route_seq_show() to show how many times a dst has been used. And as __use is not atomic_t right now, it does not show the precise number of usage times anyway. So we think it should be OK to only update it at most once per jiffy. According to my latest syn flood test on a machine with intel Xeon 6th gen processor and 2 10G mlx nics bonded together, each with 8 rx queues on 2 NUMA nodes: With this patch, the packet process rate increases from ~3.49Mpps to ~3.75Mpps with a 7% increase rate. Note: dst_use() is being renamed to dst_hold_and_use() to better specify the purpose of the function. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@googl.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Wang authored
In fib6_locate(), we need to first make sure fn is not NULL before doing FIB6_SUBTREE(fn) to avoid crash. This fixes the following static checker warning: net/ipv6/ip6_fib.c:1462 fib6_locate() warn: variable dereferenced before check 'fn' (see line 1459) net/ipv6/ip6_fib.c 1458 if (src_len) { 1459 struct fib6_node *subtree = FIB6_SUBTREE(fn); ^^^^^^^^^^^^^^^^ We shifted this dereference 1460 1461 WARN_ON(saddr == NULL); 1462 if (fn && subtree) ^^ before the check for NULL. 1463 fn = fib6_locate_1(subtree, saddr, src_len, 1464 offsetof(struct rt6_info, rt6i_src) Fixes: 66f5d6ce ("ipv6: replace rwlock with rcu and spinlock in fib6_table") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Abhijit Ayarekar authored
Update to llvm excludes assembly instructions. llvm git revision is below commit 65fad7c26569 ("bpf: add inline-asm support") This change will be part of llvm release 6.0 __ASM_SYSREG_H define is not required for native compile. -target switch includes appropriate target specific files while cross compiling Tested on x86 and arm64. Signed-off-by: Abhijit Ayarekar <abhijit.ayarekar@caviumnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jiri Pirko <jiri@mellanox.com> ==================== net: sched: remove some tp->q usage In order to prepare for block sharing, tcf_proto instances need to be independent on particular qdisc instances. This patchset takes care of removal of couple occurrences of tp->q usage. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
The callers have this info, they will pass it down to tcf_fill_node. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Use tcf_block_q helper to get q pointer to be used for direct call of sch_tree_lock/unlock instead of tcf_tree_lock/unlock. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Use helper to get q pointer per block. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
tc_u_common is now per-q. With blocks, it has to be converted to be per-block. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Instead of using tp->q, use block to get the net pointer. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Whenever the block->q is set, it can be used instead of tp->q as it contains the same value. When it is not set, which can't happen now but it might happen with the follow-up shared blocks introduction, the class is not set in the result. That would lead to a class lookup instead of direct class pointer use for classful qdiscs. However, it is not planned to support classful qdisqs sharing filter blocks, so that may never happen. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
These helpers allows to get a q and netdev pointers for given block easily. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Store net pointer in the block structure. Along the way, introduce qdisc_net helper which allows to easily obtain net pointer for qdisc instance. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Prepare for removal of tp->q and store Qdisc pointer in the block structure. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Duyck authored
This patch makes a slight tweak to mqprio in order to bring the classid values used back in line with what is used for mq. The general idea is to reserve values :ffe0 - :ffef to identify hardware traffic classes normally reported via dev->num_tc. By doing this we can maintain a consistent behavior with mq for classid where :1 - :ffdf will represent a physical qdisc mapped onto a Tx queue represented by classid - 1, and the traffic classes will be mapped onto a known subset of classid values reserved for our virtual qdiscs. Note I reserved the range from :fff0 - :ffff since this way we might be able to reuse these classid values with clsact and ingress which would mean that for mq, mqprio, ingress, and clsact we should be able to maintain a similar classid layout. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linuxDavid S. Miller authored
Saeed Mahameed says: ==================== mlx5-updates-2017-10-11: IPoIB Multi Pkey support This series provides the support for IPoIB Multi Pkey. InfiniBand Pkeys are the equivalent of Ethernet vlans. Currently IPoIB device driver supports only default Pkey and IPoIB Pkey child interfaces are not supported with IPoIB offloads mode, this series will add the support for that by allowing creating mlx5 multiple IPoIB netdevices with a non-default Pkey. mlx5 IPoIB Pkey child interface is smaller version of mlx5i IPoIB interfaces and shares most of its resources with the parent IPoIB interface, namely RX steering and ring queue resources. The only mlx5 resources a child Pkey interface will be creating are the TX rings, since they should be assigned to a specific Pkey. mlx5i Pkey netdev is implemented via new mlx5e netdev profile implemented in mlx5/core/ipoib/ipoib_vlan.c. The series starts with a refactoring of mlx5e PTP and mlx5 clock implementation to move the code to be part of mlx5 core rather than mlx5e netdevice, in order to make mlx5 clock and PTP registration part of the core to be shared with mlx5e master Ethernet netdev/IPoIB parent netdev and mlx5_ib in the near future. Add the support for attaching multiple underlay QPs for the different Pkeys in mlx5 core RX steering. Add Pkey index to rdma_netdev to add the ability to set PKEY index to lower IPoIB offload netdev. Use hash-table to map between DQPN (Destination QP number) to child netdev for the IPoIB parent netdev to forward RX packets to the corresponding child Pkey netdev, since the RX rings are shared. The reset of the series adds the ipoib child Pkey: mlx5e netdev profile, netdev nods implementation and minimal set of ethtool callbacks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 15 Oct, 2017 5 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queueDavid S. Miller authored
Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2017-10-13 This series contains updates to mqprio and i40e. Amritha introduces a new hardware offload mode in tc/mqprio where the TCs, the queue configurations and bandwidth rate limits are offloaded to the hardware. The existing mqprio framework is extended to configure the queue counts and layout and also added support for rate limiting. This is achieved through new netlink attributes for the 'mode' option which takes values such as 'dcb' (default) and 'channel' and a 'shaper' option for QoS attributes such as bandwidth rate limits in hw mode 1. Legacy devices can fall back to the existing setup supporting hw mode 1 without these additional options where only the TCs are offloaded and then the 'mode' and 'shaper' options defaults to DCB support. The i40e driver enables the new mqprio hardware offload mechanism factoring the TCs, queue configuration and bandwidth rates by creating HW channel VSIs. In this new mode, the priority to traffic class mapping and the user specified queue ranges are used to configure the traffic class when the 'mode' option is set to 'channel'. This is achieved by creating HW channels(VSI). A new channel is created for each of the traffic class configuration offloaded via mqprio framework except for the first TC (TC0) which is for the main VSI. TC0 for the main VSI is also reconfigured as per user provided queue parameters. Finally, bandwidth rate limits are set on these traffic classes through the shaper attribute by sending these rates in addition to the number of TCs and the queue configurations. Colin Ian King makes an array of constant values "constant". Alan fixes and issue where on some firmware versions, we were failing to actually fill out the phy_types which caused ethtool to not report any link types. Also hardened against a potentially malicious VF by not letting the VF to reset itself after requesting to change the number of queues (via ethtool), let the PF reset the VF to institute the requested changes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Lucas Bates says: ==================== tc-testing: Test suite updates This patch series is a roundup of changes to the tc-testing suite: - Add test cases for police and mirred modules and some coverage in already-submitted test categories - Break the test case files down into more user-friendly sizes - Bug fix to the tdc.py script's handling of the -l argument v2: fix the lack of final newlines in two new files (thanks David) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Lucas Bates authored
This patch fixes a bug in the tdc script, where executing tdc with the -l argument would cause the tests to start running as opposed to listing all the known test cases. Signed-off-by: Lucas Bates <lucasb@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Lucas Bates authored
Add basic unit tests for police and skbmod actions in tc. Signed-off-by: Lucas Bates <lucasb@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Lucas Bates authored
The original submission had the test cases stored in one monolithic file. This can be unwieldy to edit, especially as more test cases are added. This patch removes the original tests.json file in favour of individual ones broken down by category. Signed-off-by: Lucas Bates <lucasb@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-