- 24 Apr, 2012 3 commits
-
-
Eric Dumazet authored
While investigating TCP performance problems on 10Gb+ links, we found a tcp sender was dropping lot of incoming ACKS because of sk_rcvbuf limit in sk_add_backlog(), especially if receiver doesnt use GRO/LRO and sends one ACK every two MSS segments. A sender usually tweaks sk_sndbuf, but sk_rcvbuf stays at its default value (87380), allowing a too small backlog. A TCP ACK, even being small, can consume nearly same truesize space than outgoing packets. Using sk_rcvbuf + sk_sndbuf as a limit makes sense and is fast to compute. Performance results on netperf, single flow, receiver with disabled GRO/LRO : 7500 Mbits instead of 6050 Mbits, no more TCPBacklogDrop increments at sender. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
sk_add_backlog() & sk_rcvqueues_full() hard coded sk_rcvbuf as the memory limit. We need to make this limit a parameter for TCP use. No functional change expected in this patch, all callers still using the old sk_rcvbuf limit. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
Randy Dunlap <rdunlap@xenotime.net> reported: > On 04/23/2012 12:07 AM, Stephen Rothwell wrote: > >> Hi all, >> >> Changes since 20120420: > > > include/net/ax25.h:447:75: error: expected ';' before '}' token > > static inline int ax25_register_dev_sysctl(ax25_dev *ax25_dev) { return 0 }; > static inline void ax25_unregister_dev_sysctl(ax25_dev *ax25_dev) {}; > > First function: move ';' inside braces. > Second function: drop the ';'. Put the semicolons where it makes sense. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 23 Apr, 2012 3 commits
-
-
Eric W. Biederman authored
Randy Dunlap <rdunlap@xenotime.net> reported: > On 04/23/2012 12:07 AM, Stephen Rothwell wrote: > >> Hi all, >> >> Changes since 20120420: > > > > ERROR: "unregister_net_sysctl_table" [net/phonet/phonet.ko] undefined! > ERROR: "register_net_sysctl" [net/phonet/phonet.ko] undefined! > > when CONFIG_SYSCTL is not enabled. Add static inline stub functions to gracefully handle the case when sysctl support is not present. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ajit Khaparde authored
ethtool get settings was not displaying all the settings correctly. use the get_phy_info to get more information about the PHY to fix this. Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
net/ipv4/tcp_ipv4.c: In function 'tcp_v4_init_sock': net/ipv4/tcp_ipv4.c:1891:19: warning: unused variable 'tp' [-Wunused-variable] net/ipv6/tcp_ipv6.c: In function 'tcp_v6_init_sock': net/ipv6/tcp_ipv6.c:1836:19: warning: unused variable 'tp' [-Wunused-variable] Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 21 Apr, 2012 34 commits
-
-
Wu Jiajun-B06378 authored
Replace netif_receive_skb with napi_gro_receive. Signed-off-by: Jiajun Wu <b06378@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Factorize code, since most fetched values are int type. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Neal Cardwell authored
This commit moves the (substantial) common code shared between tcp_v4_init_sock() and tcp_v6_init_sock() to a new address-family independent function, tcp_init_sock(). Centralizing this functionality should help avoid drift issues, e.g. where the IPv4 side is updated without a corresponding update to IPv6. There was already some drift: IPv4 initialized snd_cwnd to TCP_INIT_CWND, while the IPv6 side was still initializing snd_cwnd to 2 (in this case it should not matter, since snd_cwnd is also initialized in tcp_init_metrics(), but the general risks and maintenance overhead remain). When diffing the old and new code, note that new tcp_init_sock() function uses the order of steps from the tcp_v4_init_sock() implementation (the order is slightly different in tcp_v6_init_sock()). Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
splice() from socket to pipe needs linear_to_page() helper to transfert skb header to part of page. We can reset the offset in the current sk->sk_sndmsg_page if we are the last user of the page. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
This patch changes content of hashlist (used to get port struct by computed index (0...en_port_count-1)). Now the hash list contains only enabled ports so userspace will be able to say what ports can be used for tx/rx. This becomes handy when userspace will need to disable ports which does not belong to active aggregator. By default, newly added port is enabled. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Better to leave this for userspace Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
iov of more than 8 entries are allocated in sendmsg()/recvmsg() through sock_kmalloc() As these allocations are temporary only and small enough, it makes sense to use plain kmalloc() and avoid sk_omem_alloc atomic overhead. Slightly changed fast path to be even faster. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mike Waychison <mikew@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
There are options, which are set up on a socket while performing TCP handshake. Need to resurrect them on a socket while repairing. A new sockoption accepts a buffer and parses it. The buffer should be CODE:VALUE sequence of bytes, where CODE is standard option code and VALUE is the respective value. Only 4 options should be handled on repaired socket. To read 3 out of 4 of these options the TCP_INFO sockoption can be used. An ability to get the last one (the mss_clamp) was added by the previous patch. Now the restore. Three of these options -- timestamp_ok, mss_clamp and snd_wscale -- are just restored on a coket. The sack_ok flags has 2 issues. First, whether or not to do sacks at all. This flag is just read and set back. No other sack info is saved or restored, since according to the standart and the code dropping all sack-ed segments is OK, the sender will resubmit them again, so after the repair we will probably experience a pause in connection. Next, the fack bit. It's just set back on a socket if the respective sysctl is set. No collected stats about packets flow is preserved. As far as I see (plz, correct me if I'm wrong) the fack-based congestion algorithm survives dropping all of the stats and repairs itself eventually, probably losing the performance for that period. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
The mss_clamp is the only connection-time negotiated option which cannot be obtained from the user space. Make the TCP_MAXSEG sockopt report one in the repair mode. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
Reading queues under repair mode is done with recvmsg call. The queue-under-repair set by TCP_REPAIR_QUEUE option is used to determine which queue should be read. Thus both send and receive queue can be read with this. Caller must pass the MSG_PEEK flag. Writing to queues is done with sendmsg call and yet again -- the repair-queue option can be used to push data into the receive queue. When putting an skb into receive queue a zero tcp header is appented to its head to address the tcp_hdr(skb)->syn and the ->fin checks by the (after repair) tcp_recvmsg. These flags flags are both set to zero and that's why. The fin cannot be met in the queue while reading the source socket, since the repair only works for closed/established sockets and queueing fin packet always changes its state. The syn in the queue denotes that the respective skb's seq is "off-by-one" as compared to the actual payload lenght. Thus, at the rcv queue refill we can just drop this flag and set the skb's sequences to precice values. When the repair mode is turned off, the write queue seqs are updated so that the whole queue is considered to be 'already sent, waiting for ACKs' (write_seq = snd_nxt <= snd_una). From the protocol POV the send queue looks like it was sent, but the data between the write_seq and snd_nxt is lost in the network. This helps to avoid another sockoption for setting the snd_nxt sequence. Leaving the whole queue in a 'not yet sent' state (as it will be after sendmsg-s) will not allow to receive any acks from the peer since the ack_seq will be after the snd_nxt. Thus even the ack for the window probe will be dropped and the connection will be 'locked' with the zero peer window. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
This includes (according the the previous description): * TCP_REPAIR sockoption This one just puts the socket in/out of the repair mode. Allowed for CAP_NET_ADMIN and for closed/establised sockets only. When repair mode is turned off and the socket happens to be in the established state the window probe is sent to the peer to 'unlock' the connection. * TCP_REPAIR_QUEUE sockoption This one sets the queue which we're about to repair. The 'no-queue' is set by default. * TCP_QUEUE_SEQ socoption Sets the write_seq/rcv_nxt of a selected repaired queue. Allowed for TCP_CLOSE-d sockets only. When the socket changes its state the other seq-s are changed by the kernel according to the protocol rules (most of the existing code is actually reused). * Ability to forcibly bind a socket to a port The sk->sk_reuse is set to SK_FORCE_REUSE. * Immediate connect modification The connect syscall initializes the connection, then directly jumps to the code which finalizes it. * Silent close modification The close just aborts the connection (similar to SO_LINGER with 0 time) but without sending any FIN/RST-s to peer. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
This is just the preparation patch, which makes the needed for TCP repair code ready for use. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
Name them in a "backward compatible" manner, i.e. reuse or not are still 1 and 0 respectively. The reuse value of 2 means that the socket with it will forcibly reuse everyone else's port. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
The two options are separate, and some platforms (e.g. arm pxa) have ISA slots but no ISA dma controller, so they cannot build drivers using the DMA API functions. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
Some architectures like ARM cannot handle large numbers as arguments to udelay, so the drivers should use mdelay when delaying for multiple miliseconds. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
No driver should spin the CPU for 10ms, so better use an msleep, which is allowed in the ->suspend function. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
The ax88796 driver uses the CRC32 functions, so make sure that they are actually enabled. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
The iwmc3200 driver selects other code in Kconfig that depends on EXPERIMENTAL. Kconfig warns about this when CONFIG_EXPERIMENTAL is not already set, so logically, these options should also be marked experimental or promoted to stable. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
The caif_shmcore requires io.h in order to use ioremap, so include that explicitly to compile in all configurations. Also add a note about the use of ioremap(), which is not a proper way to map a DMA buffer into kernel space. It's not completely clear what the intention is for using ioremap, but it is clear that the result of ioremap must not simply be accessed using kernel pointers but should use readl/writel or memcopy_{to,from}io. Assigning the result of ioremap to a regular pointer that can also be set to something else is not ok. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
Drivers that refer to a __devexit function in an operations structure need to annotate that pointer with __devexit_p so replace it with a NULL pointer when the section gets discarded. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
The davinci_emac driver can be a module, so the symbols it needs from the cpdma driver must be exported. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Richard Cochran authored
The time stamping code in this driver appears to have been copied from the ixp4xx_eth.c driver, including this timing comment. I had actually measured the time stamp delay on an IXP425, but I really doubt that this value also applies here. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Richard Cochran authored
This patch fixes code which needlessly ran the BPF twice per packet. Instead, we just run the classifier once and test whether the packet is any kind of PTP event message. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Takahiro Shimizu authored
This patch fixes the driver so that multicast PTP event messages can be recognized by the hardware time stamping unit. The station address register must be set according to the desired transport type. [ RC - Rebased Takahiro's changes and wrote a commit message explaining the changes. ] Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Takahiro Shimizu authored
We will let the pch_gbe code do that according to the receive time stamp filter. [ RC - Rebased Takahiro's changes and wrote a commit message explaining the changes. ] Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Takahiro Shimizu authored
This patch clears up a few coding style issues: - Makes two function definitions a bit nicer looking. - Remove unneeded parentheses. - Simplify macros for register bits. [ RC - Rebased Takahiro's changes and wrote a commit message explaining the changes. ] Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Takahiro Shimizu authored
The code in phc_gbe_main will need to call this method in order to set the station address register according to the receive time stamping filter. [ RC - Rebased Takahiro's changes and wrote a commit message explaining the changes. ] Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Takahiro Shimizu authored
The reset logic after a Rx FIFO overrun will clear the programmed multicast addresses. This patch fixes the issue by reprogramming the registers after the reset. [ RC - Rebased Takahiro's changes and wrote a commit message explaining the changes. ] Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Takahiro Shimizu authored
This patch makes logic surrounding the test of the transmit time stamping flag more readable. [ RC - Rebased Takahiro's changes and wrote a commit message explaining the changes. ] Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Takahiro Shimizu authored
This patch fixes the helper functions that give the transmit and receive time stamps to return nanoseconds, instead of arbitrary clock ticks. [ RC - Rebased Takahiro's changes and wrote a commit message explaining the changes. ] Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
All of the users have been converted to use registera_net_sysctl so we no longer need register_net_sysctl. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
We don't use struct ctl_path anymore so delete the exported constants. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
This results in code with less boiler plate that is a bit easier to read. Additionally stops us from using compatibility code in the sysctl core, hastening the day when the compatibility code can be removed. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-