- 13 Mar, 2015 32 commits
-
-
Jeff Kirsher authored
Fix the code comments to align with drivers/net/ code commenting style, as well as whitespace issues. The whitespace issues resolve checkpatch errors, like lines exceeding 80 chars (except for strings) and the use of tabs where possible. CC: <kernel-team@fb.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
-
Alexander Duyck authored
This patch removes some dead code from the cleanup path for ixgbe. Setting and clearing the flag doesn't do anything since all we are doing is setting the flag, scheduling NAPI, clearing the flag and then letting netpoll do the polling cleanup. As such it doesn't make much sense to have it there. This patch also removes one minor white-space error. CC: <kernel-team@fb.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jeff Kirsher authored
This patch makes sure that relaxed ordering is not disabled when on SPARC, where it helps with performance. CC: <kernel-team@fb.com> CC: Sowmini Varadhan <sowmini.varadhan@oracle.com> Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
-
Don Skidmore authored
Correcting a mistake when I initial created this function. I should have made this static since it is only referenced where the function pointer is assigned. CC: <kernel-team@fb.com> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Don Skidmore authored
Missed this when I created commit 6a14ee0c ("ixgbe: Add X550 support function pointers"). Use a the __be* type to be consistent with how the value is assigned. CC: <kernel-team@fb.com> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Don Skidmore authored
For the X550 mac type we have to do additional steps around enabling/disabling Rx. This patch will add a layer of indirection around these support functions to enable this. CC: <kernel-team@fb.com> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
David S. Miller authored
Florian Fainelli says: ==================== net: bcmgenet: xmit_more support This patch series adds xmit_more support to the GENET driver by allowing the deferal of the producer index write to the TDMA engine. Changes in v2: - move the netif_tx_stop_queue check *before* updating the producer index ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
Delay the update of the TDMA producer index unless this is the last SKB in a batch, or the queue is already stopped. Move the check for whether the queue should be stopped before the xmit_more check to avoid locking the transmit queue in case there was a SKB submitted which has xmit_more set. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
There is no need to have both bcmgenet_xmit_single() and bcmgenet_xmit_frag() perform a free_bds decrement and a prod_index increment by one. In case one of these functions fails to map a SKB or fragment for transmit, we will return and exit bcmgenet_xmit() with an error. We can therefore safely use our local copy of nr_frags to know by how much we should decrement the number of free buffers available, and by how much the producer count must be incremented and do this in the tail of bcmgenet_xmit(). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Petri Gynther <pgynther@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petri Gynther authored
Currently, bcmgenet_desc_rx() calls bcmgenet_rx_refill() at the end of Rx packet processing loop, after the current Rx packet has already been passed to napi_gro_receive(). However, bcmgenet_rx_refill() might fail to allocate a new Rx skb, thus leaving a hole on the Rx queue where no valid Rx buffer exists. To eliminate this situation: 1. Rewrite bcmgenet_rx_refill() to retain the current Rx skb on the Rx queue if a new replacement Rx skb can't be allocated and DMA-mapped. In this case, the data on the current Rx skb is effectively dropped. 2. Modify bcmgenet_desc_rx() to call bcmgenet_rx_refill() at the top of Rx packet processing loop, so that the new replacement Rx skb is already in place before the current Rx skb is processed. Signed-off-by: Petri Gynther <pgynther@google.com> Tested-by: Jaedon Shin <jaedon.shin@gmail.com>-- Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Eric W. Biederman says: ==================== tcp_metrics: Network namespace bloat reduction v3 This is a small pile of patches that convert tcp_metrics from using a hash table per network namespace to using a single hash table for all network namespaces. This is broken up into several patches so that each small step along the way could be carefully scrutinized as I wrote it, and equally so that each small step can be reviewed. There are several cleanups included in this series. The addition of panic calls during boot where we can not handle failure, and not trying simplifies the code. The removal of the return code from tcp_metrics_flush_all. The motivation for this change is that the tcp_metrics hash table at 128KiB is one of the largest components of a freshly allocated network namespace. I am resending the the previous version I sent has suffered bitrot, so I have respun the patches so that they apply. I believe I have addressed all of the review concerns except optimal behavior on little machines with 32-byte cache lines, which is beyond me as even the current code has bad behavior in that case. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
Now that all of the operations are safe on a single hash table accross network namespaces, allocate a single global hash table and update the code to use it. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
Rewrite tcp_metrics_flush_all so that it can cope with entries from different network namespaces on it's hash chain. This is based on the logic in tcp_metrics_nl_cmd_del for deleting a selection of entries from a tcp metrics hash chain. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
tcp_metrics_flush_all always returns 0. Remove the unnecessary return code. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
In preparation for using one tcp metrics hash table for all network namespaces add a field tcpm_net to struct tcp_metrics_block, and verify that field on all hash table lookups. Make the field tcpm_net of type possible_net_t so it takes no space when network namespaces are disabled. Further add a function tm_net to read that field so we can be efficient when network namespaces are disabled and concise the rest of the time. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
In preparation for using one hash table for all network namespaces mix the network namespace into the hash value. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
There is not a practical way to cleanup during boot so just panic if there is a problem initializing tcp_metrics. That will at least give us a clear place to start debugging if something does go wrong. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Simon Horman authored
In the case of AF_INET s_addr was set to INADDR_ANY (0) which which both symmetric with the AF_INET6 case, where s_addr is not set, and unnecessary as udp_conf is zeroed out earlier in the same function. I suspect this change does not have any run-time effect due to compiler optimisations. But it does make the code a little easier on the/my eyes. Cc: Tom Herbert <therbert@google.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
Reobert Shearman noticed that mpls_egress is failing to verify that the bytes to be examined are in fact present in the packet before mpls_egress reads those bytes. As suggested by David Miller reduce this to a single pskb_may_pull call so that we don't do unnecessary work in the fast path. Reported-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jaeden Amero authored
The PHY state machine (in drivers/net/phy/phy.c) will unconditionally call phydev->adjust_link (macb_handle_link_change) when polling in the PHY_CHANGELINK state. As currently written, macb always ends up requesting a new tx_clk frequency in macb_handle_link_change. It is a waste of time to request a new tx_clk frequency if the link state hasn't changed, as the tx_clk will already be configured properly. Let's only request a new tx_clk clock frequency when necessary. Signed-off-by: Jaeden Amero <jaeden.amero@ni.com> Cc: Josh Cartwright <joshc@ni.com> Cc: Soren Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Herbert Xu authored
This patch fixes a typo rhashtable_lookup_compare where we fail to recompute the hash when looking up the new table. This causes elements to be missed and potentially a crash during a resize. Reported-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Commit c0c09bfd ("rhashtable: avoid unnecessary wakeup for worker queue") changed ht->shift to be atomic, which is actually unnecessary. Instead of leaving the current shift in the core rhashtable structure, it can be cached inside the individual bucket tables. There, it will only be initialized once during a new table allocation in the shrink/expansion slow path, and from then onward it stays immutable for the rest of the bucket table liftime. That allows shift to be non-atomic. The patch also moves hash_rnd management into the table setup. The rhashtable structure now consumes 3 instead of 4 cachelines. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Ying Xue <ying.xue@windriver.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Herbert Xu authored
There is a potential race condition between readers and the rehasher. In particular, the rehasher could have started a rehash while the reader finishes a scan of the old table but fails to see the new table pointer. This patch closes this window by adding smp_wmb/smp_rmb. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Eric Dumazet says: ==================== inet: tcp listener refactoring, part 8 These patches prepare request socks being hashed into general ehash table : We declare 3 aliases (ireq_state, ireq_refcnt, ireq_family) Note that refcnt is not yet handled, this will be done later. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Before inserting request socks into general hash table, fill their socket family. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
ireq->ir_num contains local port, use it. Also, get_openreq4() dumping listen_sk->refcnt makes litle sense. inet_diag_fill_req() can also use ireq->ir_num Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
sock_edemux() & sock_gen_put() should be ready to cope with request socks. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Make proto_register() & proto_unregister() a bit nicer. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
When request socks will be in ehash, they'll need to be refcounted. This patch adds rsk_refcnt/ireq_refcnt macros, and adds reqsk_put() function, but nothing yet use them. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
We need to identify request sock when they'll be visible in global ehash table. ireq_state is an alias to req.__req_common.skc_state. Its value is set to TCP_NEW_SYN_RECV Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
TCP_SYN_RECV state is currently used by fast open sockets. Initial TCP requests (the pseudo sockets created when a SYN is received) are not yet associated to a state. They are attached to their parent, and the parent is in TCP_LISTEN state. This commit adds TCP_NEW_SYN_RECV state, so that we can convert TCP stack to a different schem gradually. This state is not exported to user space. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
I forgot to update dccp_v6_conn_request() & cookie_v6_check(). They both need to set ireq->ireq_net and ireq->ir_cookie Lets clear ireq->ir_cookie in inet_reqsk_alloc() Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 33cf7c90 ("net: add real socket cookies") Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 12 Mar, 2015 8 commits
-
-
Daniel Borkmann authored
Currently, it is possible in cls_bpf to access eBPF maps only under rcu_read_lock_bh() variants: while on ingress side, that is, handle_ing(), the classifier would be called from __netif_receive_skb_core() under rcu_read_lock(); on egress side, however, it's rcu_read_lock_bh() via __dev_queue_xmit(). This rcu/rcu_bh mix doesn't work together with eBPF maps as they require soley to be called under rcu_read_lock(). eBPF maps could also be shared among various other eBPF programs (possibly even with other eBPF program types, f.e. tracing) and user space processes, so any context is assumed. Therefore, a possible fix for cls_bpf is to wrap/nest eBPF program invocation under non-bh RCU lock variant. Fixes: e2e9b654 ("cls_bpf: add initial eBPF support for programmable classifiers") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Alexander Duyck says: ==================== fib_trie: Minor fixes for table merge This patch set addresses two issues reported with the tables merged, the first is a NULL pointer dereference, and the other is to remove a WARN_ON and set the ordering for aliases from different tables with the same slen values. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Duyck authored
This change makes it so that we should always have a deterministic ordering for the main and local aliases within the merged table when two leaves overlap. So for example if we have a leaf with a key of 192.168.254.0. If we previously added two aliases with a prefix length of 24 from both local and main the first entry would be first and the second would be second. When I was coding this I had added a WARN_ON should such a situation occur as I wasn't sure how likely it would be. However this WARN_ON has been triggered so this is something that should be addressed. With this patch the ordering of the aliases is as follows. First they are sorted on prefix length, then on their table ID, then tos, and finally priority. This way what we end up doing is essentially interleaving the two tables on what used to be leaf_info structure boundaries. Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Duyck authored
The function fib_unmerge assumed the local table had already been allocated. If that is not the case however when custom rules are applied then this can result in a NULL pointer dereference. In order to prevent this we must check the value of the local table pointer and if it is NULL simply return 0 as there is no local table to separate from the main. Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse") Reported-by: Madhu Challa <challa@noironetworks.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
I noticed that a helper function with argument type ARG_ANYTHING does not need to have an initialized value (register). This can worst case lead to unintented stack memory leakage in future helper functions if they are not carefully designed, or unintended application behaviour in case the application developer was not careful enough to match a correct helper function signature in the API. The underlying issue is that ARG_ANYTHING should actually be split into two different semantics: 1) ARG_DONTCARE for function arguments that the helper function does not care about (in other words: the default for unused function arguments), and 2) ARG_ANYTHING that is an argument actually being used by a helper function and *guaranteed* to be an initialized register. The current risk is low: ARG_ANYTHING is only used for the 'flags' argument (r4) in bpf_map_update_elem() that internally does strict checking. Fixes: 17a52670 ("bpf: verifier (add verifier core)") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Eric W. Biederman says: ==================== Introduce possible_net_t The current usage of write_pnet and read_pnet is a little laborious and error prone as you only notice if you failed to include them if are compiling with network namespaces enabled. possible_net_t remedies that by using a type that is 0 bytes when network namespaces are disabled and can only be read and written to with read_pnet and write_pnet. Aka less work and safer for the same effect. I kill hold_net and release_net first as are they are haven't been used since 2008 and are noise at the points where write_pnet and read_pnet are used. I have folded in Eric Dumazets suggestions to improve the killing of hold_net and release net. And respon. I had to respin anyway as there was enough changes elsewhere in the tree the previous version of these patches did not quite apply cleanly. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
Having to say > #ifdef CONFIG_NET_NS > struct net *net; > #endif in structures is a little bit wordy and a little bit error prone. Instead it is possible to say: > typedef struct { > #ifdef CONFIG_NET_NS > struct net *net; > #endif > } possible_net_t; And then in a header say: > possible_net_t net; Which is cleaner and easier to use and easier to test, as the possible_net_t is always there no matter what the compile options. Further this allows read_pnet and write_pnet to be functions in all cases which is better at catching typos. This change adds possible_net_t, updates the definitions of read_pnet and write_pnet, updates optional struct net * variables that write_pnet uses on to have the type possible_net_t, and finally fixes up the b0rked users of read_pnet and write_pnet. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
hold_net and release_net were an idea that turned out to be useless. The code has been disabled since 2008. Kill the code it is long past due. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-