- 19 Feb, 2016 2 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queueDavid S. Miller authored
Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2016-02-18 This series contains updates to i40e and i40evf only. Alex Duyck provides all the patches in the series to update and fix the drivers. Fixed the driver to drop the outer checksum offload on UDP tunnels, since the issue is that the upper levels of the stack never requested such an offload and it results in possible errors. Updates the TSO function to just use u64 values, so we do not have to end up casting u32 values. In the TSO path, factored out the L4 header offsets allowing us to ignore the L4 header offsets when dealing with the L3 checksum and length update. Consolidates all of the spots where we were updating either the TCP or IP checksums in the TSO and checksum path into the TSO function. Fixed two issues by adding support for IPv4 encapsulated in IPv6, first issue was the fact that iphdr(skb)->protocol was being used to test for the outer transport protocol which breaks IPv6 support. The second was that we cleared the flag for v4 going to v6, but we did not take care of txflags going the other way. Added support for IPv6 extension headers in setting up the Tx checksum. Added exception handling to the Tx checksum path so that we can handle cases of TSO where the frame is bad, or Tx checksum where we did not recognize a protocol. Fixed a number of issues to make certain that we are using the correct protocols when parsing both the inner and outer headers of a frame that is mixed between IPv4 and IPv6 for inner and outer. Updated the feature flags to reflect the newly enabled/added features. Sorry, no witty patch descriptions this time around, probably should let Mitch help in writing patch descriptions for Alex. :-) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuval Mintz authored
Commit e5d3a51c ("bnx2x: extend DCBx support") was missing HSI changes for big-endian machine, breaking compilation on such platforms. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 18 Feb, 2016 38 commits
-
-
Alexander Duyck authored
This patch updates the code for determining the L4 protocol and L3 header length so that when IPv6 extension headers are being used we can determine the offset and type of the L4 protocol. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Alexander Duyck authored
Recent changes should have enabled support for IPv6 based tunnels and support for TSO with outer UDP checksums. As such we can update the feature flags to reflect that. In addition we can clean-up the flags that aren't needed such as SCTP and RXCSUM since having the bits there doesn't add any value. I also found one spot where we were setting the same flag twice. It looks like it was probably a git merge error that resulted in the line being duplicated. As such I have dropped it in this patch. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Acked-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Alexander Duyck authored
Recent changes should have enabled support for IPv6 based tunnels and support for TSO with outer UDP checksums. As such we can update the feature flags to reflect that. In addition we can clean-up the flags that aren't needed such as SCTP and RXCSUM since having the bits there doesn't add any value. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Alexander Duyck authored
All of the documentation in the datasheets for the XL710 do not call out any reason to exclude support for IPv6 based tunnels. As such I am dropping the code that was excluding these tunnel types from having their port numbers recognized. This way we can take advantage of things such as checksum offload for inner headers over IPv6 based VXLAN or GENEVE tunnels. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Alexander Duyck authored
This patch contains a number of fixes to make certain that we are using the correct protocols when parsing both the inner and outer headers of a frame that is mixed between IPv4 and IPv6 for inner and outer. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Acked-by: Kiran Patil <kiran.patil@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Alexander Duyck authored
The XL722 has support for providing the outer UDP tunnel checksum on transmits. Make use of this feature to support segmenting UDP tunnels with outer checksums enabled. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Alexander Duyck authored
This is mostly a minor clean-up for the Rx checksum path in order to avoid some of the unnecessary conditional checks that were being applied. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
David S. Miller authored
Yuval Mintz says: ==================== qed{,e}: Add vlan filtering offload This series adds vlan filtering offload to qede. First patch introduces small additional infrastructure needed in qed to support it, while second contains the main bulk of driver changes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sudarsana Reddy Kalluru authored
Device would start receiving only vlan-tagged traffic with tags matching that of one of the configured vlan IDs, unless: - Device is expliicly placed in PROMISC mode. - Device exhausts its vlan filter credits. Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuval Mintz authored
Today, interfaces are working in vlan-promisc mode; But once vlan filtering offloaded would be supported, we'll need a method to control it directly [e.g., when setting device to PROMISC, or when running out of vlan credits]. This adds the necessary API for L2 client to manually choose whether to accept all vlans or only those for which filters were configured. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew F. Davis authored
Files in sysfs are created using the name from the phy_driver struct, when two names are the same we may get a duplicate filename warning, fix this. Reported-by: kernel test robot <ying.huang@linux.intel.com> Signed-off-by: Andrew F. Davis <afd@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Duyck authored
This patch takes advantage of several assumptions we can make about the headers of the frame in order to reduce overall processing overhead for computing the outer header checksum. First we can assume the entire header is in the region pointed to by skb->head as this is what csum_start is based on. Second, as a result of our first assumption, we can just call csum_partial instead of making a call to skb_checksum which would end up having to configure things so that we could walk through the frags list. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Benjamin Poirier authored
follows up commit 45f6fad8 ("ipv6: add complete rcu protection around np->opt") which added mixed rcu/refcount protection to np->opt. Given the current implementation of rcu_pointer_handoff(), this has no effect at runtime. Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jiri Benc says: ==================== iptunnel: scrub packet in iptunnel_pull_header As every IP tunnel has to scrub skb on decapsulation, iptunnel_pull_header tried to do that and open coded part of skb_scrub_packet. Various tunneling protocols (VXLAN, Geneve) then called full skb_scrub_packet on their own, duplicating part of the scrubbing already done. Consolidate the code, calling skb_scrub_packet from iptunnel_pull_header. This will allow additional cleanups in VXLAN code, as the packet is scrubbed early during rx processing after this patchset and VXLAN can start filling out skb fields earlier. The full picture of vxlan cleanup patches can be seen at: https://github.com/jbenc/linux-vxlan/commits/master ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Benc authored
Part of skb_scrub_packet was open coded in iptunnel_pull_header. Let it call skb_scrub_packet directly instead. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Benc authored
This is in preparation for iptunnel_pull_header calling skb_scrub_packet. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Benc authored
This is in preparation for iptunnel_pull_header calling skb_scrub_packet. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Benc authored
Similarly to the existing vxlan_get_sk_family. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
Remove the shared br_log_state function and print the info directly in br_set_state, where the net_bridge_port state is actually changed. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Hariprasad Shenai says: ==================== cxgb4: Use __dev_[um]c_[un]sync for MAC address syncing This patch series adds support to use __dev_uc_sync/__dev_mc_sync to add MAC address and __dev_uc_unsync/__dev_mc_unsync to delete MAC address. This patch series has been created against net-next tree and includes patches on cxgb4 and cxgb4vf driver. We have included all the maintainers of respective drivers. Kindly review the change and let us know in case of any review comments. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hariprasad Shenai authored
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hariprasad Shenai authored
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Duyck authored
Add exception handling to the Tx checksum path so that we can handle cases of TSO where the frame is bad, or Tx checksum where we didn't recognize a protocol Drop I40E_TX_FLAGS_CSUM as it is unused, move the CHECKSUM_PARTIAL check into the function itself so that we can decrease indent. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Alexander Duyck authored
This patch defers writing to the Tx descriptor bits until we know we have successfully completed a given operation. So for example we defer updating the tunnelling portion of the context descriptor until we have fully identified the type. The advantage to this approach is that we can assemble values as we go instead of having to try and kludge everything together all at once. As a result we can significantly clean up the tunneling configuration for instance as we can just do a pointer walk and do the math for the distance between each set of points. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jiri Benc authored
The tun_id field in struct ip_tunnel_key is __be64, not __be32. We need to convert the vni to tun_id correctly. Fixes: 54bfd872 ("vxlan: keep flags and vni in network byte order") Reported-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Duyck authored
This patch adds support for IPv6 extension headers in setting up the Tx checksum. Without this patch extension headers would cause IPv6 traffic to fail as the transport protocol could not be identified. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Alexander Duyck authored
This patch fixes two issues. First was the fact that iphdr(skb)->protocl was being used to test for the outer transport protocol. This completely breaks IPv6 support. Second was the fact that we cleared the flag for v4 going to v6, but we didn't take care of txflags going the other way. As such we would have the v6 flag still set even if the inner header was v4. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Alexander Duyck authored
The Tx checksum path was maintaining a set of 3 pointers and two lengths in order to prepare the packet for being checksummed. The thing is we only really needed 2 pointers, and the lengths that were being maintained can easily be computed. As such we can replace the IPv4 and IPv6 header pointers with one single union that represents both, or a generic pointer to the start of the network header. For the L4 headers we can do the same with TCP and a generic pointer to the start of the transport header. The length of the TCP header is obtained by simply multiplying doff by 4, and the network header length can be obtained by subtracting the network header pointer from the transport header pointer. While I was at it I renamed l4_hdr to l4_proto to make it a bit more clear and less likely to be confused with l4.hdr which is the transport header pointer. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Alexander Duyck authored
This patch goes through and pulls all of the spots where we were updating either the TCP or IP checksums in the TSO and checksum path into the TSO function. The general idea here is that we should only be updating the header after we verify we have completed a skb_cow_head check to verify the head is writable. One other advantage to doing this is that it makes things much more obvious. For example, in the case of IPv6 there was one spot where the offset of the IPv4 header checksum was being updated which is obviously incorrect. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Alexander Duyck authored
This patch makes it so that the L4 header offsets and such can be ignored when dealing with the L3 checksum and length update. This is done making use of two things. First we can just use the offset from the L4 header to the start of the packet to determine the L4 offset, and from that we can then make use of the data offset to determine the full length of the headers. As far as adjusting the checksum to remove the length we can simply add the inverse of the length instead of having to recompute the entire pseudo-header without the length. In the case of an IPv6 header this should be significantly cheaper since we can make use of a value we already needed instead of having to read the source and destination address out of the packet. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Alexander Duyck authored
Instead of casing u32 values to u64 it makes more sense to just start out with u64 values in the first place. This way we don't need to create a mess with all of the casts needed to populate a 64b value. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Alexander Duyck authored
The i40e and i40evf drivers contained code for inserting an outer checksum on UDP tunnels. The issue however is that the upper levels of the stack never requested such an offload and it results in possible errors. In addition the same logic was being applied to the Rx side where it was attempting to validate the outer checksum, but the logic there was incorrect in that it was testing for the resultant sum to be equal to the header checksum instead of being equal to 0. Since this code is so massively flawed, and doing things that we didn't ask for it to do I am just dropping it, and will bring it back later to use as an offload for SKB_GSO_UDP_TUNNEL_CSUM which can make use of such a feature. As far as the Rx feature I am dropping it completely since it would need to be massively expanded and applied to IPv4 and IPv6 checksums for all parts, not just the one that supports Tx checksum offload for the outer. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
David S. Miller authored
Florian Westphal says: ==================== netlink: remove mmapped netlink support As discussed during netconf 2016 in Seville, this series removes CONFIG_NETLINK_MMAP. Close to three years after it was merged it has retained several problems that do not appear to be fixable. No official netfilter libmnl release contains support for mmap backed netlink sockets. No openvswitch release makes use of it either. To use the mmap interface, userspace not only has to probe for mmap netlink support, it also has to implement a recv/socket receive path in order to handle messages that exceed the size of an rx ring element (NL_MMAP_STATUS_COPY). So if there are odd programs out there that attempt to use MMAP netlink they should continue to work as they already need a socket based code path to work properly. The actual revert (first patch) has a list of problems. The followup patches remove a couple of helpers that are no longer needed after the revert. I did a few tests with mmap vs. socket based interface on a 4.4 based kernel on an i7-4790 box and there are no performance advantages: loopback, single nfqueue, queueing in -t filter INPUT: traffic generated by 8 * ping -q -f localhost: socket backend: real 0m27.325s user 0m3.993s sys 0m23.292s with mmap ring backend: real 0m29.054s user 0m4.924s sys 0m24.127s with single tcp stream, unidirectional, loopback mtu set at 1500 (nc localhost discard < /dev/zero > /dev/null): socket interface: time nfqdump -b $((8 * 1024 * 1024 * 1024)) -w /dev/null real 0m15.960s user 0m1.756s sys 0m11.143s mmap ring: real 0m16.441s user 0m3.040s sys 0m13.687s socket interface nfqdump[1] with --gso option (i.e. MTU is exceeded, no kernel-side segmentation and checksum fixups) completes in about 5s. I also tested dumping a conntrack table with 1m entries. On my box this takes about 2.4 seconds for both mmap and socket backend: time LD_PRELOAD=../../src/.libs/libmnl.so ./nfct-dump-sk > /dev/null mnl_cb_run: Success messages: 1000000 real 0m2.485s user 0m1.085s sys 0m1.400s time LD_PRELOAD=../../src/.libs/libmnl.so ./nfct-dump-mmap > /dev/null messages: 1000000 real 0m2.451s user 0m1.124s sys 0m1.328s [1] https://git.breakpoint.cc/cgit/fw/nfqdump.git/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Westphal authored
reverts commit 3ab1f683 ("nfnetlink: add support for memory mapped netlink")' Like previous commits in the series, remove wrappers that are not needed after mmapped netlink removal. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Westphal authored
Following mmapped netlink removal this code can be simplified by removing the alloc wrapper. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Westphal authored
This reverts commit bb9b18fb ("genl: Add genlmsg_new_unicast() for unicast message allocation")'. Nothing wrong with it; its no longer needed since this was only for mmapped netlink support. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Westphal authored
revert commit 795449d8 ("openvswitch: Enable memory mapped Netlink i/o"). Following the mmaped netlink removal this code can be removed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Westphal authored
mmapped netlink has a number of unresolved issues: - TX zerocopy support had to be disabled more than a year ago via commit 4682a035 ("netlink: Always copy on mmap TX.") because the content of the mmapped area can change after netlink attribute validation but before message processing. - RX support was implemented mainly to speed up nfqueue dumping packet payload to userspace. However, since commit ae08ce00 ("netfilter: nfnetlink_queue: zero copy support") we avoid one copy with the socket-based interface too (via the skb_zerocopy helper). The other problem is that skbs attached to mmaped netlink socket behave different from normal skbs: - they don't have a shinfo area, so all functions that use skb_shinfo() (e.g. skb_clone) cannot be used. - reserving headroom prevents userspace from seeing the content as it expects message to start at skb->head. See for instance commit aa3a0220 ("netlink: not trim skb for mmaped socket when dump"). - skbs handed e.g. to netlink_ack must have non-NULL skb->sk, else we crash because it needs the sk to check if a tx ring is attached. Also not obvious, leads to non-intuitive bug fixes such as 7c7bdf35 ("netfilter: nfnetlink: use original skbuff when acking batches"). mmaped netlink also didn't play nicely with the skb_zerocopy helper used by nfqueue and openvswitch. Daniel Borkmann fixed this via commit 6bb0fef4 ("netlink, mmap: fix edge-case leakages in nf queue zero-copy")' but at the cost of also needing to provide remaining length to the allocation function. nfqueue also has problems when used with mmaped rx netlink: - mmaped netlink doesn't allow use of nfqueue batch verdict messages. Problem is that in the mmap case, the allocation time also determines the ordering in which the frame will be seen by userspace (A allocating before B means that A is located in earlier ring slot, but this also means that B might get a lower sequence number then A since seqno is decided later. To fix this we would need to extend the spinlocked region to also cover the allocation and message setup which isn't desirable. - nfqueue can now be configured to queue large (GSO) skbs to userspace. Queing GSO packets is faster than having to force a software segmentation in the kernel, so this is a desirable option. However, with a mmap based ring one has to use 64kb per ring slot element, else mmap has to fall back to the socket path (NL_MMAP_STATUS_COPY) for all large packets. To use the mmap interface, userspace not only has to probe for mmap netlink support, it also has to implement a recv/socket receive path in order to handle messages that exceed the size of an rx ring element. Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Ken-ichirou MATSUZAWA <chamaken@gmail.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-