1. 29 Jun, 2018 23 commits
  2. 28 Jun, 2018 17 commits
    • Merge branch 'net-preserve-sock-reference-when-scrubbing-the-skb' · 16c0cd07
      David S. Miller authored
      Flavio Leitner says:
      
      ====================
      net: preserve sock reference when scrubbing the skb.
      
      The sock reference is lost when scrubbing the packet and that breaks
      TSQ (TCP Small Queues) and XPS (Transmit Packet Steering) causing
      performance impacts of about 50% in a single TCP stream when crossing
      network namespaces.
      
      XPS breaks because the queue mapping stored in the socket is not
      available, so another random queue might be selected when the stack
      needs to transmit something like a TCP ACK, or TCP Retransmissions.
      That causes packet re-ordering and/or performance issues.
      
      TSQ breaks because it orphans the packet while it is still in the
      host, so packets are queued contributing to the buffer bloat problem.
      
      Preserving the sock reference fixes both issues. The socket is
      orphaned anyway in the receiving path before any relevant action,
      but the transmit side needs some extra checking included in the
      first patch.
      
      The first patch updates netfilter to check that the socket
      netns is local before using it.
      
      The second patch removes the skb_orphan() call from skb_scrub_packet()
      and improves the documentation.
      
      ChangeLog:
      - split into two (Eric)
      - addressed Paolo's offline feedback to swap the checks in xt_socket.c
        to preserve original behavior.
      - improved ip-sysctl.txt (reported by Cong)
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • skbuff: preserve sock reference when scrubbing the skb. · 9c4c3252
      Flavio Leitner authored
      The sock reference is lost when scrubbing the packet and that breaks
      TSQ (TCP Small Queues) and XPS (Transmit Packet Steering) causing
      performance impacts of about 50% in a single TCP stream when crossing
      network namespaces.
      
      XPS breaks because the queue mapping stored in the socket is not
      available, so another random queue might be selected when the stack
      needs to transmit something like a TCP ACK, or TCP Retransmissions.
      That causes packet re-ordering and/or performance issues.
      
      TSQ breaks because it orphans the packet while it is still in the
      host, so packets are queued contributing to the buffer bloat problem.
      
      Preserving the sock reference fixes both issues. The socket is
      orphaned anyway in the receiving path before any relevant action,
      and on the TX side netfilter checks whether the reference is local
      before using it.
      Signed-off-by: Flavio Leitner <fbl@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
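The XPS part of the commit message above can be illustrated with a small userspace model. This is only a sketch, not kernel code: `Sock`, `Skb`, `scrub()` and `pick_tx_queue()` are hypothetical names, and the CRC-based fallback merely stands in for whatever queue the stack would otherwise pick once the socket is gone.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sock:
    # Hypothetical stand-in for the TX queue mapping cached in the socket.
    tx_queue_mapping: int


@dataclass
class Skb:
    flow_key: bytes            # hypothetical bytes fed to the fallback hash
    sk: Optional[Sock] = None  # socket reference carried by the packet


def scrub(skb: Skb, orphan: bool) -> Skb:
    """Model of scrubbing; orphan=True mimics the old behaviour of dropping skb.sk."""
    if orphan:
        skb.sk = None
    return skb


def pick_tx_queue(skb: Skb, num_queues: int) -> int:
    """Use the socket's cached queue if present, else fall back to a hash."""
    if skb.sk is not None:
        return skb.sk.tx_queue_mapping % num_queues
    # Without the socket, the choice no longer follows the cached mapping.
    return zlib.crc32(skb.flow_key) % num_queues


if __name__ == "__main__":
    sk = Sock(tx_queue_mapping=3)
    kept = scrub(Skb(flow_key=b"flow", sk=sk), orphan=False)
    lost = scrub(Skb(flow_key=b"flow", sk=sk), orphan=True)
    print(pick_tx_queue(kept, 8), pick_tx_queue(lost, 8))
```

With the reference preserved, every packet of the flow maps to the socket's cached queue; once orphaned, the queue depends on the fallback and may differ, which is the re-ordering risk the message describes.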
    • netfilter: check if the socket netns is correct. · f5646501
      Flavio Leitner authored
      Netfilter assumes that if the socket is present in the skb, then
      it can be used because that reference is cleaned up while the skb
      is crossing netns.
      
      We want to change that to preserve the socket reference in a future
      patch, so this is a preparation, updating netfilter to check that the
      socket netns matches before using it.
      Signed-off-by: Flavio Leitner <fbl@redhat.com>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
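The check described above amounts to "ignore the skb's socket unless it belongs to the namespace doing the lookup". A minimal userspace sketch, assuming hypothetical `Netns`, `Sock`, `Skb` types and a hypothetical `usable_sock()` helper (in the kernel a comparison like this is typically written with `net_eq()` and `sock_net()`; the exact form used by the patch is not shown here):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Netns:
    name: str


@dataclass
class Sock:
    netns: Netns  # namespace the socket was created in


@dataclass
class Skb:
    sk: Optional[Sock] = None


def usable_sock(skb: Skb, current_netns: Netns) -> Optional[Sock]:
    """Return skb.sk only if it lives in the namespace doing the lookup."""
    sk = skb.sk
    if sk is not None and sk.netns != current_netns:
        # Socket came from another namespace: treat the skb as having none.
        return None
    return sk


if __name__ == "__main__":
    host, container = Netns("host"), Netns("container")
    skb = Skb(sk=Sock(netns=container))
    print(usable_sock(skb, container) is skb.sk)  # same netns: socket is used
    print(usable_sock(skb, host))                 # foreign netns: ignored
```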
    • Merge branch 'net-sched-actions-code-style-cleanup-and-fixes' · 003504a2
      David S. Miller authored
      Roman Mashak says:
      
      ====================
      net sched actions: code style cleanup and fixes
      
      The patchset fixes a few code style issues and typos, as well as one
      issue detected by the sparse semantic checker.
      
      No functional changes introduced.
      
      Patch 1 & 2 fix coding style bits caught by the checkpatch.pl script
      Patch 3 fixes an issue with a shadowed variable
      Patch 4 adds sizeof() operator instead of magic number for buffer length
      Patch 5 fixes typos in diagnostics messages
      Patch 6 explicitly sets unsigned char for bitwise operation
      
      v2:
         - submit for net-next
         - added Reviewed-by tags
         - use u8* instead of char* as per Davide Caratti suggestion
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net sched actions: avoid bitwise operation on signed value in pedit · 43052741
      Roman Mashak authored
      Since char can be unsigned or signed, and bitwise operators may have
      implementation-dependent results when performed on signed operands,
      declare 'u8 *' operand instead.
      Suggested-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: Roman Mashak <mrv@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
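Why the signedness of the operand matters can be shown by interpreting the same byte both ways. The sketch below is a userspace illustration only: `as_signed_char()` and `high_nibble()` are hypothetical helpers, and Python's arithmetic right shift is just one of the results a C implementation may produce for a negative signed operand.

```python
def as_signed_char(byte: int) -> int:
    """Reinterpret an 8-bit value (0..255) as a two's-complement signed char."""
    return byte - 256 if byte >= 128 else byte


def high_nibble(value: int) -> int:
    """Shift right by four bits, as code extracting a high nibble might do."""
    return value >> 4


if __name__ == "__main__":
    raw = 0xF0
    print(high_nibble(raw))                  # treated as unsigned (u8): 15
    print(high_nibble(as_signed_char(raw)))  # treated as signed char: -1
```

Declaring the operand as `u8 *` removes the dependence on whether plain `char` is signed on a given architecture.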
    • net sched actions: fix misleading text strings in pedit action · 95b0d2dc
      Roman Mashak authored
      Change "tc filter pedit .." to "tc actions pedit .." in error
      messages to clearly refer to pedit action.
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Roman Mashak <mrv@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net sched actions: use sizeof operator for buffer length · 6ff7586e
      Roman Mashak authored
      Replace constant integer with sizeof() to clearly indicate
      the destination buffer length in skb_header_pointer() calls.
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Roman Mashak <mrv@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net sched actions: fix sparse warning · 544377cd
      Roman Mashak authored
      The variable _data is already declared in include/asm-generic/sections.h,
      so the local variable of the same name causes a sparse warning in pedit:
      
      net/sched/act_pedit.c:293:35: warning: symbol '_data' shadows an earlier one
      ./include/asm-generic/sections.h:36:13: originally declared here
      
      Therefore rename the variable.
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Roman Mashak <mrv@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net sched actions: fix coding style in pedit headers · d020d455
      Roman Mashak authored
      Fix coding style issues in tc pedit headers detected by the
      checkpatch script.
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Roman Mashak <mrv@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net sched actions: fix coding style in pedit action · 80f0f574
      Roman Mashak authored
      Fix coding style issues in tc pedit action detected by the
      checkpatch script.
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Roman Mashak <mrv@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netem: slotting with non-uniform distribution · 0a9fe5c3
      Yousuk Seung authored
      Extend slotting with support for non-uniform distributions. This is
      similar to netem's non-uniform distribution delay feature.
      
      Commit f043efeae2f1 ("netem: support delivering packets in delayed
      time slots") added the slotting feature to approximate the behaviors
      of media with packet aggregation but only supported a uniform
      distribution for delays between transmission attempts. Tests with TCP
      BBR with emulated wifi links with non-uniform distributions produced
      more useful results.
      
      Syntax:
         slot dist DISTRIBUTION DELAY JITTER [packets MAX_PACKETS] \
            [bytes MAX_BYTES]
      
      The syntax and use of the distribution table is the same as in the
      non-uniform distribution delay feature. A file DISTRIBUTION must be
      present in TC_LIB_DIR (e.g. /usr/lib/tc) containing numbers scaled by
      NETEM_DIST_SCALE. A random value x is selected from the table and the
      slot delay is DELAY + ( x * JITTER ). Correlation between values is not
      supported.
      
      Examples:
        Normal distribution delay with mean = 800us and stdev = 100us.
        > tc qdisc add dev eth0 root netem slot dist normal 800us 100us
      
        Optionally set the max slot size in bytes and/or packets.
        > tc qdisc add dev eth0 root netem slot dist normal 800us 100us \
          bytes 64k packets 42
      Signed-off-by: Yousuk Seung <ysseung@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
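The DELAY + ( x * JITTER ) rule from the message above can be sketched in a few lines. This is a simplified userspace model, not the kernel's integer arithmetic: `slot_delay_us()` is a hypothetical helper, the table is assumed to hold integers already scaled by NETEM_DIST_SCALE, and the scale is assumed to be 8192.

```python
import random
from typing import Sequence

# Assumption: the scale factor applied to values in the distribution table.
NETEM_DIST_SCALE = 8192


def slot_delay_us(delay_us: float, jitter_us: float,
                  table: Sequence[int], rng: random.Random) -> float:
    """Pick a table entry, undo the scaling, and apply DELAY + x * JITTER."""
    x = rng.choice(table) / NETEM_DIST_SCALE
    return delay_us + x * jitter_us


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy table: entries at -1, 0 and +1 "standard deviations" after scaling.
    toy_table = [-NETEM_DIST_SCALE, 0, NETEM_DIST_SCALE]
    print([slot_delay_us(800, 100, toy_table, rng) for _ in range(5)])
```

With DELAY = 800us and JITTER = 100us, the toy table above can only yield 700, 800 or 900 microseconds; a real distribution file would supply many more entries.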
    • netlink: Return extack message if attribute validation fails · 7861552c
      David Ahern authored
      Have one extack message for parsing and validating.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: xgmiitorgmii: Check read_status results · 8d0752d1
      Brandon Maier authored
      We're ignoring the result of the attached phy device's read_status().
      Return it so we can detect errors.
      Signed-off-by: Brandon Maier <brandon.maier@rockwellcollins.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: xgmiitorgmii: Use correct mdio bus · cf31ea71
      Brandon Maier authored
      The xgmiitorgmii is using the mii_bus of the device it's attached to,
      instead of the bus it was given during probe.
      Signed-off-by: Brandon Maier <brandon.maier@rockwellcollins.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: xgmiitorgmii: Check phy_driver ready before accessing · ab4e6ee5
      Brandon Maier authored
      A phy_device is added to the global mdio_bus list during
      phy_device_register(), but its phy_driver is not attached until
      phy_probe(). It is therefore possible for of_phy_find_device() in
      xgmiitorgmii to return a valid phy with a NULL phy_driver, leading to
      a NULL pointer access during the memcpy().
      
      Fixes this Oops:
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000000
      pgd = c0004000
      [00000000] *pgd=00000000
      Internal error: Oops: 5 [#1] PREEMPT SMP ARM
      Modules linked in:
      CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.40 #1
      Hardware name: Xilinx Zynq Platform
      task: ce4c8d00 task.stack: ce4ca000
      PC is at memcpy+0x48/0x330
      LR is at xgmiitorgmii_probe+0x90/0xe8
      pc : [<c074bc68>]    lr : [<c0529548>]    psr: 20000013
      sp : ce4cbb54  ip : 00000000  fp : ce4cbb8c
      r10: 00000000  r9 : 00000000  r8 : c0c49178
      r7 : 00000000  r6 : cdc14718  r5 : ce762800  r4 : cdc14710
      r3 : 00000000  r2 : 00000054  r1 : 00000000  r0 : cdc14718
      Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
      Control: 18c5387d  Table: 0000404a  DAC: 00000051
      Process swapper/0 (pid: 1, stack limit = 0xce4ca210)
      ...
      [<c074bc68>] (memcpy) from [<c0529548>] (xgmiitorgmii_probe+0x90/0xe8)
      [<c0529548>] (xgmiitorgmii_probe) from [<c0526a94>] (mdio_probe+0x28/0x34)
      [<c0526a94>] (mdio_probe) from [<c04db98c>] (driver_probe_device+0x254/0x414)
      [<c04db98c>] (driver_probe_device) from [<c04dbd58>] (__device_attach_driver+0xac/0x10c)
      [<c04dbd58>] (__device_attach_driver) from [<c04d96f4>] (bus_for_each_drv+0x84/0xc8)
      [<c04d96f4>] (bus_for_each_drv) from [<c04db5bc>] (__device_attach+0xd0/0x134)
      [<c04db5bc>] (__device_attach) from [<c04dbdd4>] (device_initial_probe+0x1c/0x20)
      [<c04dbdd4>] (device_initial_probe) from [<c04da8fc>] (bus_probe_device+0x98/0xa0)
      [<c04da8fc>] (bus_probe_device) from [<c04d8660>] (device_add+0x43c/0x5d0)
      [<c04d8660>] (device_add) from [<c0526cb8>] (mdio_device_register+0x34/0x80)
      [<c0526cb8>] (mdio_device_register) from [<c0580b48>] (of_mdiobus_register+0x170/0x30c)
      [<c0580b48>] (of_mdiobus_register) from [<c05349c4>] (macb_probe+0x710/0xc00)
      [<c05349c4>] (macb_probe) from [<c04dd700>] (platform_drv_probe+0x44/0x80)
      [<c04dd700>] (platform_drv_probe) from [<c04db98c>] (driver_probe_device+0x254/0x414)
      [<c04db98c>] (driver_probe_device) from [<c04dbc58>] (__driver_attach+0x10c/0x118)
      [<c04dbc58>] (__driver_attach) from [<c04d9600>] (bus_for_each_dev+0x8c/0xd0)
      [<c04d9600>] (bus_for_each_dev) from [<c04db1fc>] (driver_attach+0x2c/0x30)
      [<c04db1fc>] (driver_attach) from [<c04daa98>] (bus_add_driver+0x50/0x260)
      [<c04daa98>] (bus_add_driver) from [<c04dc440>] (driver_register+0x88/0x108)
      [<c04dc440>] (driver_register) from [<c04dd6b4>] (__platform_driver_register+0x50/0x58)
      [<c04dd6b4>] (__platform_driver_register) from [<c0b31248>] (macb_driver_init+0x24/0x28)
      [<c0b31248>] (macb_driver_init) from [<c010203c>] (do_one_initcall+0x60/0x1a4)
      [<c010203c>] (do_one_initcall) from [<c0b00f78>] (kernel_init_freeable+0x15c/0x1f8)
      [<c0b00f78>] (kernel_init_freeable) from [<c0763d10>] (kernel_init+0x18/0x124)
      [<c0763d10>] (kernel_init) from [<c0112d74>] (ret_from_fork+0x14/0x20)
      Code: ba000002 f5d1f03c f5d1f05c f5d1f07c (e8b151f8)
      ---[ end trace 3e4ec21905820a1f ]---
      Signed-off-by: Brandon Maier <brandon.maier@rockwellcollins.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
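The ordering problem behind the Oops above (the device is findable before its driver is attached) suggests a guard of the following shape. This is a userspace sketch under stated assumptions: `PhyDriver`, `PhyDevice` and `probe()` are hypothetical models, and returning a deferral code is one plausible reaction to a missing driver rather than a statement of what the patch does.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: kernel-internal code meaning "retry this probe later".
EPROBE_DEFER = 517


@dataclass
class PhyDriver:
    name: str


@dataclass
class PhyDevice:
    drv: Optional[PhyDriver] = None  # stays None until the phy itself is probed


def probe(phy_dev: PhyDevice) -> int:
    """Refuse to touch the attached phy's driver until it actually exists."""
    if phy_dev.drv is None:
        # The device is registered but not yet bound to a driver.
        return -EPROBE_DEFER
    # Safe point: the driver can now be copied or wrapped.
    return 0


if __name__ == "__main__":
    print(probe(PhyDevice()))                          # driver missing
    print(probe(PhyDevice(drv=PhyDriver("generic"))))  # driver attached
```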
    • Merge branch 'ipsec-selftests-updates' · 26eef11a
      David S. Miller authored
      Shannon Nelson says:
      
      ====================
      Updates for ipsec selftests
      
      Fix up the existing ipsec selftest and add tests for
      the ipsec offload driver API.
      
      v2: addressed formatting nits in netdevsim from Jakub Kicinski
      v3: a couple more nits from Jakub
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: rtnetlink: add ipsec offload API test · 2766a111
      Shannon Nelson authored
      Using the netdevsim as a device for testing, try out the XFRM commands
      for setting up IPsec hardware offloads.
      Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>