1. 17 Apr, 2018 14 commits
  2. 16 Apr, 2018 21 commits
    • net: Remove unused tcp_set_state tracepoint · ef53e9e1
      Andrey Ignatov authored
      This tracepoint was replaced by inet_sock_set_state in 563e0bb0 and not
      used anywhere in the kernel anymore. Remove it.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'pci-mrrs-consts' · 4c85d2d4
      David S. Miller authored
      Heiner Kallweit says:
      
      ====================
      PCI: add two more values for PCIe Max_Read_Request_Size and initially use them in r8169 network driver
      
      In r8169 network driver I stumbled across a magic number translating
      to PCI MRRS size 4K. The PCI core is still missing constants for
      values 2K and 4K (as defined in PCI standard).
      
      So let's add these two constants and use the 4K constant in r8169.
      
      Second patch depends on the first one, therefore both patches
      preferably should go through either PCI or netdev tree.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8169: replace magic numbers with PCI MRRS constant · 8d98aa39
      Heiner Kallweit authored
      Replace magic number "0x5 << MAX_READ_REQUEST_SHIFT" with the
      appropriate constant as defined in PCI core.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • PCI: Add two more values for PCIe Max_Read_Request_Size · a5724fc3
      Heiner Kallweit authored
      This patch adds missing values for the max read request size.
      E.g. network driver r8169 uses a value of 4K.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
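The relationship between the r8169 magic number and the new PCI constants can be sketched in plain C. This is an illustration, not the patch itself: the constant values are assumed to match include/uapi/linux/pci_regs.h, MAX_READ_REQUEST_SHIFT is assumed to be 12 as in r8169, and mrrs_bytes() is a hypothetical helper added here only to show how the field decodes.

```c
#include <stdint.h>

/* Assumed to mirror include/uapi/linux/pci_regs.h. */
#define PCI_EXP_DEVCTL_READRQ        0x7000 /* Max_Read_Request_Size field mask */
#define PCI_EXP_DEVCTL_READRQ_2048B  0x4000 /* 2048 bytes */
#define PCI_EXP_DEVCTL_READRQ_4096B  0x5000 /* 4096 bytes */

/* Assumed to mirror the shift used by the r8169 driver. */
#define MAX_READ_REQUEST_SHIFT 12

/*
 * Hypothetical helper: decode the Max_Read_Request_Size field of a
 * Device Control register value into bytes. The field is an exponent:
 * 0 means 128 bytes, and each increment doubles the size.
 */
static unsigned int mrrs_bytes(uint16_t devctl)
{
	return 128u << ((devctl & PCI_EXP_DEVCTL_READRQ) >> 12);
}
```

Under these assumptions, `0x5 << MAX_READ_REQUEST_SHIFT` and `PCI_EXP_DEVCTL_READRQ_4096B` are the same value, which is why the driver change has no functional effect.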
    • Merge branch 'net-stmmac-Stop-using-hard-coded-callbacks' · 5da8baa3
      David S. Miller authored
      Jose Abreu says:
      
      ====================
      net: stmmac: Stop using hard-coded callbacks
      
      This is a starting point for a cleanup and re-organization of stmmac.
      
      In this series we stop using hard-coded callbacks along the code and use
      instead helpers which are defined in a single place ("hwif.h").
      
      This brings several advantages:
      	1) Less typing :)
      	2) Guaranteed function pointer check
      	3) More flexibility
      
      By 2) we stop using the repeated pattern of:
      	if (priv->hw->mac->some_func)
      		priv->hw->mac->some_func(...)
      
      I didn't check but I expect the final .ko will be bigger with this series
      because *all* of function pointers are checked.
      
      Anyway, I hope this can make the code more readable and more flexible now.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
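The idea of moving the repeated function-pointer check into one helper can be sketched in userspace C. This is a simplified illustration under stated assumptions, not the contents of hwif.h: every demo_* name is hypothetical, and returning -EINVAL for a missing callback is a choice made for this sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback table; any entry may be NULL. */
struct demo_mac_ops {
	int (*set_speed)(int mbps);
	int (*flush_queue)(int queue);
};

struct demo_priv {
	const struct demo_mac_ops *mac;
};

/*
 * Single place for the NULL check: evaluates to the callback's return
 * value, or to -EINVAL when the table or the callback is missing.
 */
#define demo_do_callback(priv, cname, ...)                 \
	(((priv)->mac && (priv)->mac->cname) ?             \
		(priv)->mac->cname(__VA_ARGS__) : -EINVAL)

/* Call sites use these wrappers instead of open-coding the check. */
#define demo_set_speed(priv, mbps)   demo_do_callback(priv, set_speed, mbps)
#define demo_flush_queue(priv, q)    demo_do_callback(priv, flush_queue, q)

/* Example callback used to exercise the wrappers. */
static int demo_last_speed;

static int demo_store_speed(int mbps)
{
	demo_last_speed = mbps;
	return 0;
}
```

A caller that fills in only `.set_speed = demo_store_speed` can call `demo_set_speed(&priv, 1000)` and `demo_flush_queue(&priv, 0)` alike; the second simply yields -EINVAL instead of dereferencing a NULL pointer.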
    • net: stmmac: Switch stmmac_mode_ops to generic HW Interface Helpers · 2c520b1c
      Jose Abreu authored
      Switch stmmac_mode_ops to generic Hardware Interface Helpers instead of
      using hard-coded callbacks. This makes the code more readable and more
      flexible.
      
      No functional change.
      Signed-off-by: Jose Abreu <joabreu@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Switch stmmac_hwtimestamp to generic HW Interface Helpers · cc4c9001
      Jose Abreu authored
      Switch stmmac_hwtimestamp to generic Hardware Interface Helpers instead
      of using hard-coded callbacks. This makes the code more readable and
      more flexible.
      
      No functional change.
      Signed-off-by: Jose Abreu <joabreu@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Switch stmmac_ops to generic HW Interface Helpers · c10d4c82
      Jose Abreu authored
      Switch stmmac_ops to generic Hardware Interface Helpers instead of using
      hard-coded callbacks. This makes the code more readable and more
      flexible.
      
      No functional change.
      Signed-off-by: Jose Abreu <joabreu@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Switch stmmac_dma_ops to generic HW Interface Helpers · a4e887fa
      Jose Abreu authored
      Switch stmmac_dma_ops to generic Hardware Interface Helpers instead of
      using hard-coded callbacks. This makes the code more readable and more
      flexible.
      
      No functional change.
      Signed-off-by: Jose Abreu <joabreu@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Switch stmmac_desc_ops to generic HW Interface Helpers · 42de047d
      Jose Abreu authored
      Switch stmmac_desc_ops to generic Hardware Interface Helpers instead of
      using hard-coded callbacks. This makes the code more readable and more
      flexible.
      
      No functional change.
      Signed-off-by: Jose Abreu <joabreu@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp-zero-copy-receive' · 309c446c
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      tcp: add zero copy receive
      
      This patch series adds mmap() support to TCP sockets for RX zero copy.
      
      While tcp_mmap() patch itself is quite small (~100 LOC), optimal support
      for asynchronous mmap() required better SO_RCVLOWAT behavior, and a
      test program to demonstrate how mmap() on TCP sockets can be used.
      
      Note that mmap() (and associated munmap()) calls add more
      pressure on the per-process VM semaphore, so they might not show
      benefit for processes with a high number of threads.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net: add tcp_mmap program · 192dc405
      Eric Dumazet authored
      This is a reference program showing how mmap() can be used
      on TCP flows to implement receive zero copy.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: implement mmap() for zero copy receive · 93ab6cc6
      Eric Dumazet authored
      Some networks can make sure TCP payload can exactly fit 4KB pages,
      with well chosen MSS/MTU and architectures.
      
      Implement mmap() system call so that applications can avoid
      copying data without complex splice() games.
      
      Note that a successful mmap( X bytes) on TCP socket is consuming
      bytes, as if recvmsg() has been done. (tp->copied += X)
      
      Only PROT_READ mappings are accepted, as skb page frags
      are fundamentally shared and read only.
      
      If tcp_mmap() finds data that is not a full page, or a patch of
      urgent data, -EINVAL is returned, no bytes are consumed.
      
      Application must fall back to recvmsg() to read the problematic sequence.
      
      mmap() won't block, regardless of socket being in blocking or
      non-blocking mode. If not enough bytes are in receive queue,
      mmap() would return -EAGAIN, or -EIO if socket is in a state
      where no other bytes can be added into receive queue.
      
      An application might use SO_RCVLOWAT, poll() and/or ioctl( FIONREAD)
      to efficiently use mmap().
      
      On the sender side, MSG_EOR might help to clearly separate unaligned
      headers and 4K-aligned chunks if necessary.
      
      Tested:
      
      mlx4 (cx-3) 40Gbit NIC, with tcp_mmap program provided in following patch.
      MTU set to 4168  (4096 TCP payload, 40 bytes IPv6 header, 32 bytes TCP header)
      
      Without mmap() (tcp_mmap -s)
      
      received 32768 MB (0 % mmap'ed) in 8.13342 s, 33.7961 Gbit,
        cpu usage user:0.034 sys:3.778, 116.333 usec per MB, 63062 c-switches
      received 32768 MB (0 % mmap'ed) in 8.14501 s, 33.748 Gbit,
        cpu usage user:0.029 sys:3.997, 122.864 usec per MB, 61903 c-switches
      received 32768 MB (0 % mmap'ed) in 8.11723 s, 33.8635 Gbit,
        cpu usage user:0.048 sys:3.964, 122.437 usec per MB, 62983 c-switches
      received 32768 MB (0 % mmap'ed) in 8.39189 s, 32.7552 Gbit,
        cpu usage user:0.038 sys:4.181, 128.754 usec per MB, 55834 c-switches
      
      With mmap() on receiver (tcp_mmap -s -z)
      
      received 32768 MB (100 % mmap'ed) in 8.03083 s, 34.2278 Gbit,
        cpu usage user:0.024 sys:1.466, 45.4712 usec per MB, 65479 c-switches
      received 32768 MB (100 % mmap'ed) in 7.98805 s, 34.4111 Gbit,
        cpu usage user:0.026 sys:1.401, 43.5486 usec per MB, 65447 c-switches
      received 32768 MB (100 % mmap'ed) in 7.98377 s, 34.4296 Gbit,
        cpu usage user:0.028 sys:1.452, 45.166 usec per MB, 65496 c-switches
      received 32768 MB (99.9969 % mmap'ed) in 8.01838 s, 34.281 Gbit,
        cpu usage user:0.02 sys:1.446, 44.7388 usec per MB, 65505 c-switches
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
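A receiver-side sketch of the error handling described in the commit message above, assuming a kernel that carries exactly this patch (other kernel versions may expose zero copy receive differently). The helper name read_chunk() and its out-parameter are hypothetical, and the fallback policy is an illustration rather than code taken from the tcp_mmap selftest.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: try to obtain `len` bytes from `fd` without a copy.
 * On a successful mmap(), *mapped points at the data and `len` is returned;
 * the caller must munmap(*mapped, len) when done. Otherwise *mapped is NULL
 * and the data (if any) is copied into copy_buf with recv().
 * Returns the byte count, or -1 with errno set.
 */
static ssize_t read_chunk(int fd, size_t len, void *copy_buf, void **mapped)
{
	void *addr;

	*mapped = NULL;

	/* Only PROT_READ mappings are accepted per the commit message. */
	addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (addr != MAP_FAILED) {
		*mapped = addr;
		return (ssize_t)len;
	}

	/* Not enough bytes queued yet: let the caller wait and retry. */
	if (errno == EAGAIN)
		return -1;

	/*
	 * Any other failure (for example EINVAL for a partial page or
	 * urgent data, or a socket type without mmap support): fall back
	 * to a regular copying read.
	 */
	return recv(fd, copy_buf, len, 0);
}
```

In this sketch `len` would be a multiple of the page size, and a caller would typically combine it with SO_RCVLOWAT and poll() so that mmap() is attempted only once a full chunk is queued.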
    • tcp: avoid extra wakeups for SO_RCVLOWAT users · 03f45c88
      Eric Dumazet authored
      SO_RCVLOWAT is properly handled in tcp_poll(), so that POLLIN is only
      generated when enough bytes are available in receive queue, after
      David's change (commit c7004482 "tcp: Respect SO_RCVLOWAT in tcp_poll().")

      But TCP still calls sk->sk_data_ready() for each chunk added in receive
      queue, meaning the thread is awakened and goes back to sleep shortly after.
      
      Tested:
      
      tcp_mmap test program, receiving 32768 MB of data with SO_RCVLOWAT set to 512KB
      
      -> Should get ~2 wakeups (c-switches) per MB, regardless of how many
      (tiny or big) packets were received.
      
      High speed (mostly full size GRO packets)
      
      received 32768 MB (100 % mmap'ed) in 8.03112 s, 34.2266 Gbit,
        cpu usage user:0.037 sys:1.404, 43.9758 usec per MB, 65497 c-switches
      
      received 32768 MB (99.9954 % mmap'ed) in 7.98453 s, 34.4263 Gbit,
        cpu usage user:0.03 sys:1.422, 44.3115 usec per MB, 65485 c-switches
      
      Low speed (sender is ratelimited and sends 1-MSS at a time, so GRO is not helping)
      
      received 22474.5 MB (100 % mmap'ed) in 6015.35 s, 0.0313414 Gbit,
        cpu usage user:0.05 sys:1.586, 72.7952 usec per MB, 44950 c-switches
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix delayed acks behavior for SO_RCVLOWAT · 796f82ea
      Eric Dumazet authored
      We should not delay acks if there are not enough bytes
      in receive queue to satisfy SO_RCVLOWAT.
      
      Since [E]POLLIN event is not going to be generated, there is little
      hope for a delayed ack to be useful.
      
      In fact, delaying ACK prevents sender from completing
      the transfer.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix SO_RCVLOWAT and RCVBUF autotuning · d1361840
      Eric Dumazet authored
      Applications might use SO_RCVLOWAT on TCP socket hoping to receive
      one [E]POLLIN event only when a given amount of bytes are ready in socket
      receive queue.
      
      Problem is that receive autotuning is not aware of this constraint,
      meaning sk_rcvbuf might be too small to allow all bytes to be stored.
      
      Add a new (struct proto_ops)->set_rcvlowat method so that a protocol
      can override the default setsockopt(SO_RCVLOWAT) behavior.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
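The application-side pattern these SO_RCVLOWAT commits target can be sketched as follows. The helper names are hypothetical and the sketch only shows the standard setsockopt(), getsockopt() and poll() calls; how the kernel sizes the receive buffer in response is the subject of the patch and is not modelled here.

```c
#include <poll.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: ask the kernel to report the socket as readable
 * only once at least `bytes` are queued. Returns 0, or -1 with errno set.
 */
static int set_rcvlowat(int fd, int bytes)
{
	return setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &bytes, sizeof(bytes));
}

/* Hypothetical helper: read the current low-water mark back, or -1 on error. */
static int get_rcvlowat(int fd)
{
	int bytes = 0;
	socklen_t len = sizeof(bytes);

	if (getsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &bytes, &len) < 0)
		return -1;
	return bytes;
}

/*
 * Hypothetical helper: wait for POLLIN. Returns the poll() result:
 * 1 if ready, 0 on timeout, -1 on error.
 */
static int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };

	return poll(&pfd, 1, timeout_ms);
}
```

With a TCP socket, an application might call `set_rcvlowat(fd, 512 * 1024)` once and then loop on `wait_readable()`, expecting roughly one wakeup per 512KB received, which matches the "~2 wakeups per MB" figure quoted in the wakeup commit.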
    • tc-testing: add sample action tests · 10b19aea
      Roman Mashak authored
      Signed-off-by: Roman Mashak <mrv@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: remove unnecessary check in addrconf_prefix_rcv_add_addr() · f85f94b8
      Lorenzo Bianconi authored
      Remove unnecessary check on update_lft variable in
      addrconf_prefix_rcv_add_addr routine since it is always set to 0.
      Moreover, remove the update_lft re-initialization to 0.
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: socionext: reset hardware in ndo_stop · 9a00b697
      Masahisa KOJIMA authored
      When the interface is down, head/tail of the descriptor
      ring address is set to 0 in netsec_netdev_stop().
      But netsec hardware still keeps the previous descriptor
      ring address, so there is inconsistency between driver
      and hardware after interface is up at a later time.
      To address this inconsistency, add netsec_reset_hardware()
      when the interface is down.
      
      In addition, to minimize the reset process,
      add flag to decide whether driver loads the netsec microcode.
      Even if the driver resets the netsec hardware, the netsec microcode
      stays resident in RAM, so it is OK to load the microcode only
      at initialization.
      
      This patch is critical for installation over network.
      Signed-off-by: Masahisa KOJIMA <masahisa.kojima@linaro.org>
      Fixes: 533dd11a ("net: socionext: Add Synquacer NetSec driver")
      Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: netsec: enable tx-irq during open callback · c009f413
      Jassi Brar authored
      Enable TX-irq as well during ndo_open() as we can not count upon
      RX to arrive early enough to trigger the napi. This patch is critical
      for installation over network.
      
      Fixes: 533dd11a ("net: socionext: Add Synquacer NetSec driver")
      Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mediatek: use of_device_get_match_data() · eda7d46d
      Ryder Lee authored
      Using of_device_get_match_data() reduces the code size a bit.
      
      Also, the only way to call mtk_probe() is to match an entry in
      of_mtk_match[], so match cannot be NULL.
      Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 12 Apr, 2018 5 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 5d136594
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) In ip_gre tunnel, handle the conflict between TUNNEL_{SEQ,CSUM} and
          GSO/LLTX properly. From Sabrina Dubroca.
      
       2) Stop properly on error in lan78xx_read_otp(), from Phil Elwell.
      
       3) Don't uncompress in slip before rstate is initialized, from Tejaswi
          Tanikella.
      
       4) When using 1.x firmware on aquantia, issue a deinit before we
          hardware reset the chip, otherwise we break dirty wake WOL. From
          Igor Russkikh.
      
       5) Correct log check in vhost_vq_access_ok(), from Stefan Hajnoczi.
      
       6) Fix ethtool -x crashes in bnxt_en, from Michael Chan.
      
       7) Fix races in l2tp tunnel creation and duplicate tunnel detection,
          from Guillaume Nault.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (22 commits)
        l2tp: fix race in duplicate tunnel detection
        l2tp: fix races in tunnel creation
        tun: send netlink notification when the device is modified
        tun: set the flags before registering the netdevice
        lan78xx: Don't reset the interface on open
        bnxt_en: Fix NULL pointer dereference at bnxt_free_irq().
        bnxt_en: Need to include RDMA rings in bnxt_check_rings().
        bnxt_en: Support max-mtu with VF-reps
        bnxt_en: Ignore src port field in decap filter nodes
        bnxt_en: do not allow wildcard matches for L2 flows
        bnxt_en: Fix ethtool -x crash when device is down.
        vhost: return bool from *_access_ok() functions
        vhost: fix vhost_vq_access_ok() log check
        vhost: Fix vhost_copy_to_user()
        net: aquantia: oops when shutdown on already stopped device
        net: aquantia: Regression on reset with 1.x firmware
        cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN
        slip: Check if rstate is initialized before uncompressing
        lan78xx: Avoid spurious kevent 4 "error"
        lan78xx: Correctly indicate invalid OTP
        ...
    • Merge tag 'for-linus-4.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 67a7a8ff
      Linus Torvalds authored
      Pull xen fixes from Juergen Gross:
       "A few fixes of Xen related core code and drivers"
      
      * tag 'for-linus-4.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/pvh: Indicate XENFEAT_linux_rsdp_unrestricted to Xen
        xen/acpi: off by one in read_acpi_id()
        xen/acpi: upload _PSD info for non Dom0 CPUs too
        x86/xen: Delay get_cpu_cap until stack canary is established
        xen: xenbus_dev_frontend: Verify body of XS_TRANSACTION_END
        xen: xenbus: Catch closing of non existent transactions
        xen: xenbus_dev_frontend: Fix XS_TRANSACTION_END handling
    • Merge tag 'dma-mapping-4.17-2' of git://git.infradead.org/users/hch/dma-mapping · c5c177c5
      Linus Torvalds authored
      Pull dma-mapping fix from Christoph Hellwig:
       "Fix for one swiotlb regression in 2.16 from Takashi"
      
      * tag 'dma-mapping-4.17-2' of git://git.infradead.org/users/hch/dma-mapping:
        swiotlb: fix unexpected swiotlb_alloc_coherent failures
    • Merge tag 'mmc-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · d1cb7718
      Linus Torvalds authored
      Pull MMC fixes from Ulf Hansson:
       "MMC core:
         - Prevent bus reference leak in mmc_blk_init()
      
        MMC host:
         - tmio: Fix error handling when issuing CMD23
         - jz4740: Fix race condition in IRQ mask update"
      
      * tag 'mmc-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: tmio: Fix error handling when issuing CMD23
        mmc: core: Prevent bus reference leak in mmc_blk_init()
        mmc: jz4740: Fix race condition in IRQ mask update
    • Merge tag 'for_linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb · cb098d50
      Linus Torvalds authored
      Pull kdb updates from Jason Wessel:
      
       - fix 2032 time access issues and new compiler warnings
      
       - minor regression test cleanup
      
       - formatting fixes for end user use of kdb
      
      * tag 'for_linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
        kdb: use memmove instead of overlapping memcpy
        kdb: use ktime_get_mono_fast_ns() instead of ktime_get_ts()
        kdb: bl: don't use tab character in output
        kdb: drop newline in unknown command output
        kdb: make "mdr" command repeat
        kdb: use __ktime_get_real_seconds instead of __current_kernel_time
        misc: kgdbts: Display progress of asynchronous tests