- 27 Oct, 2016 7 commits
-
-
David S. Miller authored
Johannes Berg says: ==================== genetlink improvements This series contains some generic netlink improvements, making the API safer to use, and making the function pointers in the family struct safer by allowing it to be __ro_after_init. The first patch, introducing genl_family_attrbuf(), just ensures that the users of family->attrbuf aren't actually racy, but making them use the indirection function for obtaining a reference and checking that the context can actually do so. The second patch removes the more or less broken ability to have a static family ID, the three IDs that need to be static because it's simply needed (genl controller), or due to old API misused. Everything else couldn't be static anyway, or could fail when the family is registered, if somebody else already got a static ID. The third patch statically initializes the families, mostly to save some code. I wrote this initially because I thought I could make them all const, but that ends up being very inefficient (it would require always doing some kind of family -> id lookup), so now it's just here because I had it already and it reduces the code size. The fourth patch then, finally, lays the groundwork for what I had really wanted - now with __ro_after_init instead of const; I remove code there to do the ID->family hash table mapping in genetlink and use IDR instead to both allocate and map the IDs, which again ends up saving some code size. Finally, the fifth patch updates all families, as it turns out, no families exist that really dynamically register/unregister. This last patch should perhaps be split up, I could submit it for each subsystem separately, but it'd depend on the second and third to go in first, so would take a while. I can do that though, if that seems better to you. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
Now genl_register_family() is the only thing (other than the users themselves, perhaps, but I didn't find any doing that) writing to the family struct. In all families that I found, genl_register_family() is only called from __init functions (some indirectly, in which case I've add __init annotations to clarifly things), so all can actually be marked __ro_after_init. This protects the data structure from accidental corruption. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
Since generic netlink family IDs are small integers, allocated densely, IDR is an ideal match for lookups. Replace the existing hand-written hash-table with IDR for allocation and lookup. This lets the families only be written to once, during register, since the list_head can be removed and removal of a family won't cause any writes. It also slightly reduces the code size (by about 1.3k on x86-64). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
Instead of providing macros/inline functions to initialize the families, make all users initialize them statically and get rid of the macros. This reduces the kernel code size by about 1.6k on x86-64 (with allyesconfig). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
Static family IDs have never really been used, the only use case was the workaround I introduced for those users that assumed their family ID was also their multicast group ID. Additionally, because static family IDs would never be reserved by the generic netlink code, using a relatively low ID would only work for built-in families that can be registered immediately after generic netlink is started, which is basically only the control family (apart from the workaround code, which I also had to add code for so it would reserve those IDs) Thus, anything other than GENL_ID_GENERATE is flawed and luckily not used except in the cases I mentioned. Move those workarounds into a few lines of code, and then get rid of GENL_ID_GENERATE entirely, making it more robust. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
This helper function allows family implementations to access their family's attrbuf. This gets rid of the attrbuf usage in families, and also adds locking validation, since it's not valid to use the attrbuf with parallel_ops or outside of the dumpit callback. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Antonio Quartulli authored
The user may want to use only some bits of the skb mark in his skbedit rules because the remaining part might be used by something else. Introduce the "mask" parameter to the skbedit actor in order to implement such functionality. When the mask is specified, only those bits selected by the latter are altered really changed by the actor, while the rest is left untouched. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 26 Oct, 2016 15 commits
-
-
Elad Raz authored
When a port_type_set() is been called and the new port type set is the same as the old one, just return success. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefan Richter authored
firewire-net, like the older eth1394 driver, reduced the initial MTU to less than 1500 octets if the local link layer controller's asynchronous packet reception limit was lower. This is bogus, since this reception limit does not have anything to do with the transmission limit. Neither did this reduction affect the TX path positively, nor could it prevent link fragmentation at the RX path. Many FireWire CardBus cards have a max_rec of 9, causing an initial MTU of 1024 - 16 = 1008. RFC 2734 and RFC 3146 allow a minimum max_rec = 8, which would result in an initial MTU of 512 - 16 = 496. On such cards, IPv6 could only be employed if the MTU was manually increased to 1280 or more, i.e. IPv6 would not work without intervention from userland. We now always initialize the MTU to 1500, which is the default according to RFC 2734 and RFC 3146. On a VIA VT6316 based CardBus card which was affected by this, changing the MTU from 1008 to 1500 also increases TX bandwidth by 6 %. RX remains unaffected. CC: netdev@vger.kernel.org CC: linux1394-devel@lists.sourceforge.net CC: Jarod Wilson <jarod@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefan Richter authored
Commit b3e3893e ("net: use core MTU range checking in misc drivers") mistakenly introduced an upper limit for firewire-net's MTU based on the local link layer controller's reception capability. Revert this. Neither RFC 2734 nor our implementation impose any particular upper limit. Actually, to be on the safe side and to make the code explicit, set ETH_MAX_MTU = 65535 as upper limit now. (I replaced sizeof(struct rfc2734_header) by the equivalent RFC2374_FRAG_HDR_SIZE in order to avoid distracting long/int conversions.) Fixes: b3e3893e('net: use core MTU range checking in misc drivers') CC: netdev@vger.kernel.org CC: linux1394-devel@lists.sourceforge.net CC: Jarod Wilson <jarod@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Acked-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Yongjun authored
This node pointer is returned by of_get_child_by_name() with refcount incremented in this function. of_node_put() on it before exitting this function. This is detected by Coccinelle semantic patch. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Yongjun authored
Use setup_timer() instead of init_timer(), being the preferred/standard way to set a timer up. Also, quoting the mod_timer() function comment: -> mod_timer() is a more efficient way to update the expire field of an active timer (if the timer is inactive it will be activated). Use setup_timer and mod_timer to setup and arm a timer, to make the code cleaner and easier to read. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Yongjun authored
Fix to return error code -ENODEV from the DMA is not supported error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Yongjun authored
It is not allowed to call kfree_skb() from hardware interrupt context or with interrupts being disabled, spin_lock_irqsave() make sure always in irq disable context. So the kfree_skb() should be replaced with dev_kfree_skb_irq(). This is detected by Coccinelle semantic patch. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Yongjun authored
Fix to return error code -EINVAL from the error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Yongjun authored
Use setup_timer function instead of initializing timer with the function and data fields. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Yongjun authored
It's not necessary to free memory allocated with devm_kzalloc in the remove path and using kfree leads to a double free. Fixes: 84640e27 ("net: netcp: Add Keystone NetCP core ethernet driver") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sven Eckelmann authored
The maximum MTU is defined via the slave devices of an batman-adv interface. Thus it is not possible to calculate the max_mtu during the creation of the batman-adv device when no slave devices are attached. Doing so would for example break non-fragmentation setups which then (incorrectly) allow an MTU of 1500 even when underlying device cannot transport 1500 bytes + batman-adv headers. Checking the dynamically calculated max_mtu via the minimum of the slave devices MTU during .ndo_change_mtu is also used by the bridge interface. Cc: Jarod Wilson <jarod@redhat.com> Fixes: b3e3893e ("net: use core MTU range checking in misc drivers") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Xo Wang says: ==================== Broadcom BCM54612E support This series is based on tip of torvalds/master. The first patch adds register definitions from Broadcom docs. The second patch adds the BCM54612E PHY ID, flags, and device-specific RGMII internal delay initialization. I tested on a custom board with an Aspeed AST2500 SOC with its second MAC connected to this PHY. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xo Wang authored
This PHY has internal delays enabled after reset. This clears the internal delay enables unless the interface specifically requests them. Signed-off-by: Xo Wang <xow@google.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xo Wang authored
Add the RXD-to-RXC skew (delay) time bit in the Miscellaneous Control shadow register and a mask for the shadow selector field. Remove a re-definition of MII_BCM54XX_AUXCTL_SHDWSEL_AUXCTL. Signed-off-by: Xo Wang <xow@google.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
When I prepared commit d250a5f9 ("pkt_sched: gen_estimator: Dont report fake rate estimators"), htb still had an implicit rate estimator for all its classes. Then later, I made this rate estimator optional in commit 64153ce0 ("net_sched: htb: do not setup default rate estimators"), but I forgot to update htb use of gnet_stats_copy_rate_est() After this patch, "tc -s qdisc ..." no longer report fake rate estimators for HTB classes. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 23 Oct, 2016 11 commits
-
-
Cyrill Gorcunov authored
In criu we are actively using diag interface to collect sockets present in the system when dumping applications. And while for unix, tcp, udp[lite], packet, netlink it works as expected, the raw sockets do not have. Thus add it. v2: - add missing sock_put calls in raw_diag_dump_one (by eric.dumazet@) - implement @destroy for diag requests (by dsa@) v3: - add export of raw_abort for IPv6 (by dsa@) - pass net-admin flag into inet_sk_diag_fill due to changes in net-next branch (by dsa@) v4: - use @pad in struct inet_diag_req_v2 for raw socket protocol specification: raw module carries sockets which may have custom protocol passed from socket() syscall and sole @sdiag_protocol is not enough to match underlied ones - start reporting protocol specifed in socket() call when sockets are raw ones for the same reason: user space tools like ss may parse this attribute and use it for socket matching v5 (by eric.dumazet@): - use sock_hold in raw_sock_get instead of atomic_inc, we're holding (raw_v4_hashinfo|raw_v6_hashinfo)->lock when looking up so counter won't be zero here. v6: - use sdiag_raw_protocol() helper which will access @pad structure used for raw sockets protocol specification: we can't simply rename this member without breaking uapi v7: - sine sdiag_raw_protocol() helper is not suitable for uapi lets rather make an alias structure with proper names. __check_inet_diag_req_raw helper will catch if any of structure unintentionally changed. CC: David S. Miller <davem@davemloft.net> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: David Ahern <dsa@cumulusnetworks.com> CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> CC: James Morris <jmorris@namei.org> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> CC: Patrick McHardy <kaber@trash.net> CC: Andrey Vagin <avagin@openvz.org> CC: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Graf authored
The field is initialized by ILA and MPLS but never used. Remove it. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrey Vagin authored
net_mutex can be locked for a long time. It may be because many namespaces are being destroyed or many processes decide to create a network namespace. Both these operations are heavy, so it is better to have an ability to kill a process which is waiting net_mutex. Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shmulik Ladkani authored
META_COLLECTOR int_vlan_tag() assumes that if the accel tag (vlan_tci) is zero, then no vlan accel tag is present. This is incorrect for zero VID vlan accel packets, making the following match fail: tc filter add ... basic match 'meta(vlan mask 0xfff eq 0)' ... Apparently 'int_vlan_tag' was implemented prior VLAN_TAG_PRESENT was introduced in 05423b24 "vlan: allow null VLAN ID to be used" (and at time introduced, the 'vlan_tx_tag_get' call in em_meta was not adapted). Fix, testing skb_vlan_tag_present instead of testing skb_vlan_tag_get's value. Fixes: 05423b24 ("vlan: allow null VLAN ID to be used") Fixes: 1a31f204 ("netsched: Allow meta match on vlan tag on receive") Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jiri Pirko says: ==================== mlxsw: Driver update Mostly cosmetics and small resource values management rewrite. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Since the number of resources is going to get much bigger, ease up the addition by simly defining IDs. Convert the existing structure members to a set array, one for validity, one for values. Introduce a set of getters and setters for easy access. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Push cmd resource query related defines to cmd.h where they belong. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Extend the MLXSW_REG_DEFINE macro to store register name in string form. Use this string later on instead of hard coded string values. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Save some code and also prepare to easily carry name in string form. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Enforce const for getter buf args. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
These should be const, so enforce it. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 22 Oct, 2016 7 commits
-
-
David S. Miller authored
Daniel Borkmann says: ==================== Add BPF numa id helper This patch set adds a helper for retrieving current numa node id and a test case for SO_REUSEPORT. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
The test case is very similar to reuseport_bpf_cpu, only that here we select socket members based on current numa node id. # numactl -H available: 2 nodes (0-1) node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17 node 0 size: 128867 MB node 0 free: 120080 MB node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23 node 1 size: 96765 MB node 1 free: 87504 MB node distances: node 0 1 0: 10 20 1: 20 10 # ./reuseport_bpf_numa ---- IPv4 UDP ---- send node 0, receive socket 0 send node 1, receive socket 1 send node 1, receive socket 1 send node 0, receive socket 0 ---- IPv6 UDP ---- send node 0, receive socket 0 send node 1, receive socket 1 send node 1, receive socket 1 send node 0, receive socket 0 ---- IPv4 TCP ---- send node 0, receive socket 0 send node 1, receive socket 1 send node 1, receive socket 1 send node 0, receive socket 0 ---- IPv6 TCP ---- send node 0, receive socket 0 send node 1, receive socket 1 send node 1, receive socket 1 send node 0, receive socket 0 SUCCESS Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Use case is mainly for soreuseport to select sockets for the local numa node, but since generic, lets also add this for other networking and tracing program types. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Paolo Abeni says: ==================== udp: refactor memory accounting This patch series refactor the udp memory accounting, replacing the generic implementation with a custom one, in order to remove the needs for locking the socket on the enqueue and dequeue operations. The socket backlog usage is dropped, as well. The first patch factor out pieces of some queue and memory management socket helpers, so that they can later be used by the udp memory accounting functions. The second patch adds the memory account helpers, without using them. The third patch replacse the old rx memory accounting path for udp over ipv4 and udp over ipv6. In kernel UDP users are updated, as well. The memory accounting schema is described in detail in the individual patch commit message. The performance gain depends on the specific scenario; with few flows (and little contention in the original code) the differences are in the noise range, while with several flows contending the same socket, the measured speed-up is relevant (e.g. even over 100% in case of extreme contention) Many thanks to Eric Dumazet for the reiterated reviews and suggestions. v5 -> v6: - do not orphan the skb on enqueue, skb_steal_sock() already did the work for us v4 -> v5: - use the receive queue spin lock to protect the memory accounting - several minor clean-up v3 -> v4: - simplified the locking schema, always use a plain spinlock v2 -> v3: - do not set the now unsed backlog_rcv callback v1 -> v2: - changed slighly the memory accounting schema, we now perform lazy reclaim - fixed forward_alloc updating issue - fixed memory counter integer overflows ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
Completely avoid default sock memory accounting and replace it with udp-specific accounting. Since the new memory accounting model encapsulates completely the required locking, remove the socket lock on both enqueue and dequeue, and avoid using the backlog on enqueue. Be sure to clean-up rx queue memory on socket destruction, using udp its own sk_destruct. Tested using pktgen with random src port, 64 bytes packet, wire-speed on a 10G link as sender and udp_sink as the receiver, using an l4 tuple rxhash to stress the contention, and one or more udp_sink instances with reuseport. nr readers Kpps (vanilla) Kpps (patched) 1 170 440 3 1250 2150 6 3000 3650 9 4200 4450 12 5700 6250 v4 -> v5: - avoid unneeded test in first_packet_length v3 -> v4: - remove useless sk_rcvqueues_full() call v2 -> v3: - do not set the now unsed backlog_rcv callback v1 -> v2: - add memory pressure support - fixed dropwatch accounting for ipv6 Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
Avoid using the generic helpers. Use the receive queue spin lock to protect the memory accounting operation, both on enqueue and on dequeue. On dequeue perform partial memory reclaiming, trying to leave a quantum of forward allocated memory. On enqueue use a custom helper, to allow some optimizations: - use a plain spin_lock() variant instead of the slightly costly spin_lock_irqsave(), - avoid dst_force check, since the calling code has already dropped the skb dst - avoid orphaning the skb, since skb_steal_sock() already did the work for us The above needs custom memory reclaiming on shutdown, provided by the udp_destruct_sock(). v5 -> v6: - don't orphan the skb on enqueue v4 -> v5: - replace the mem_lock with the receive queue spin lock - ensure that the bh is always allowed to enqueue at least a skb, even if sk_rcvbuf is exceeded v3 -> v4: - reworked memory accunting, simplifying the schema - provide an helper for both memory scheduling and enqueuing v1 -> v2: - use a udp specific destrctor to perform memory reclaiming - remove a couple of helpers, unneeded after the above cleanup - do not reclaim memory on dequeue if not under memory pressure - reworked the fwd accounting schema to avoid potential integer overflow Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
Basic sock operations that udp code can use with its own memory accounting schema. No functional change is introduced in the existing APIs. v4 -> v5: - avoid whitespace changes v2 -> v4: - avoid exporting __sock_enqueue_skb v1 -> v2: - avoid export sock_rmem_free Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-