1. 11 Dec, 2019 3 commits
    • tipc: introduce variable window congestion control · 16ad3f40
      Jon Maloy authored
      We introduce a simple variable window congestion control for links.
      The algorithm is inspired by the Reno algorithm, covering the 'slow
      start', 'congestion avoidance', and 'fast recovery' modes.
      
      - We introduce hard lower and upper window limits per link, still
        different and configurable per bearer type.
      
      - We introduce a 'slow start threshold' variable, initially set to
        the maximum window size.
      
      - We let a link start at the minimum congestion window, i.e. in slow
        start mode, and then let it grow rapidly (+1 per received ACK) until
        it reaches the slow start threshold and enters congestion avoidance
        mode.
      
      - In congestion avoidance mode we increment the congestion window for
        each window-size number of acked packets, up to a possible maximum
        equal to the configured maximum window.
      
      - For each non-duplicate NACK received, we drop back to fast recovery
        mode by setting both the slow start threshold and the congestion
        window to (current_congestion_window / 2).
      
      - If the timeout handler finds that the transmit queue has not moved
        since the previous timeout, it drops the link back to slow start
        and forces a probe containing the last sent sequence number to be
        sent to the peer, so that the peer can discover the stale situation.
      
      In practice, this change affects only unicast Ethernet transport, as
      we have seen that there is no room whatsoever for increasing the
      maximum window size for the UDP bearer.
      For now, we also choose to keep the limits for the broadcast link
      unchanged and equal.
      
      This algorithm seems to give a 50-100% throughput improvement for
      messages larger than MTU.
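      The window rules listed above can be modelled in a few lines of
      user-space Python. This is only a sketch of the behaviour as the
      commit message describes it, not the kernel code: the class name,
      method names, and the choice to clamp the halved window at the
      configured minimum are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LinkWindow:
    """Toy model of the per-link congestion window (names are hypothetical)."""
    min_win: int
    max_win: int
    window: int = field(init=False)
    ssthresh: int = field(init=False)
    acked_since_growth: int = field(init=False, default=0)

    def __post_init__(self):
        self.window = self.min_win      # a link starts in slow start
        self.ssthresh = self.max_win    # threshold starts at the maximum window

    def on_ack(self, acked_packets=1):
        if self.window < self.ssthresh:
            # Slow start: +1 per received ACK.
            self.window = min(self.window + 1, self.max_win)
            return
        # Congestion avoidance: +1 per window-size number of acked packets.
        self.acked_since_growth += acked_packets
        if self.acked_since_growth >= self.window:
            self.acked_since_growth = 0
            self.window = min(self.window + 1, self.max_win)

    def on_nack(self):
        # Fast recovery: threshold and window both become window / 2.
        # Clamping at min_win is an assumption, not stated in the message.
        half = max(self.window // 2, self.min_win)
        self.ssthresh = half
        self.window = half
        self.acked_since_growth = 0

    def on_stalled_timeout(self):
        # Transmit queue has not moved since the last timeout: back to slow start.
        # Leaving ssthresh untouched here is also an assumption.
        self.window = self.min_win
        self.acked_since_growth = 0
```

      With min_win=50 and max_win=500, one hundred ACKs in slow start take
      the window from 50 to 150, and a single non-duplicate NACK then halves
      both the window and the threshold to 75.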
      Suggested-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: eliminate more unnecessary nacks and retransmissions · d3b09995
      Jon Maloy authored
      When we increase the link transmit window we often observe the following
      scenario:
      
      1) A STATE message bypasses a sequence of traffic packets and arrives
         far ahead of those to the receiver. STATE messages contain a
         'peers_nxt_snt' field to indicate which was the last packet sent
         from the peer. This mechanism is intended as a last resort for the
         receiver to detect missing packets, e.g., during very low traffic
         when there is no packet flow to help early loss detection.
      2) The receiving link compares the 'peer_nxt_snt' field to its own
         'rcv_nxt', finds that there is a gap, and immediately sends a
         NACK message back to the peer.
      3) When this NACK arrives at the sender, all the requested
         retransmissions are performed, since it is a first-time request.
      
      Just like in the scenario described in the previous commit this leads
      to many redundant retransmissions, with decreased throughput as a
      consequence.
      
      We fix this by adding two more conditions before we send a NACK in
      this situation. First, the deferred queue must be empty, so we cannot
      assume that the potential packet loss has already been detected by
      other means. Second, we check the 'peers_snd_nxt' field only in probe/
      probe_reply messages, thus turning this into a true mechanism of last
      resort as it was really meant to be.
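      The resulting decision can be written as a single predicate. The
      sketch below is a simplified user-space model, not the TIPC source:
      the function name and parameters are hypothetical, and sequence
      number wraparound is deliberately ignored.

```python
def should_send_nack(peers_snd_nxt, rcv_nxt, deferred_queue_len,
                     is_probe_or_probe_reply):
    """Return True if a STATE message should trigger a NACK (toy model)."""
    # Assumption: plain integer comparison, no sequence number wraparound.
    has_gap = peers_snd_nxt > rcv_nxt
    # Condition 1: an empty deferred queue means the loss has not already
    # been detected through out-of-order traffic packets.
    nothing_deferred = deferred_queue_len == 0
    # Condition 2: only probe / probe_reply messages are allowed to act as
    # the last-resort loss detector.
    return has_gap and nothing_deferred and is_probe_or_probe_reply
```

      A STATE message that merely overtakes in-flight traffic therefore no
      longer triggers a NACK unless it is a probe or probe reply and nothing
      is waiting in the deferred queue.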
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: eliminate gap indicator from ACK messages · 02288248
      Jon Maloy authored
      When we increase the link send window we sometimes observe the
      following scenario:
      
      1) A packet #N arrives out of order far ahead of a sequence of older
         packets which are still under way. The packet is added to the
         deferred queue.
      2) The missing packets arrive in sequence, and for each 16th of them
         an ACK is sent back to the sender, as it should be.
      3) When building those ACK messages, it is checked if there is a gap
         between the link's 'rcv_nxt' and the first packet in the deferred
         queue. This is always the case until packet number #N-1 arrives, and
         a 'gap' indicator is added, effectively turning them into NACK
         messages.
      4) When those NACKs arrive at the sender, all the requested
         retransmissions are done, since it is a first-time request.
      
      This sometimes leads to a huge amount of redundant retransmissions,
      causing a drop in max throughput. This problem gets worse when we
      in a later commit introduce variable window congestion control,
      since it drops the link back to 'fast recovery' much more often
      than necessary.
      
      We now fix this by not sending any 'gap' indicator in regular ACK
      messages. We already have a mechanism for sending explicit NACKs
      in place, and this is sufficient to keep up the packet flow.
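      To get a feel for how many regular ACKs used to be turned into
      NACKs, the scenario above can be replayed in a small simulation.
      This is an illustrative model only: the function name and its
      parameters are hypothetical, and the ACK interval of 16 is taken
      from step 2 of the description.

```python
def count_gap_acks(n, rcv_nxt=1, ack_interval=16, gap_in_acks=True):
    """Count ACKs carrying a 'gap' indicator while packets rcv_nxt..n-1
    arrive in order and packet #n already sits in the deferred queue."""
    gap_acks = 0
    first_deferred = n
    for seqno in range(rcv_nxt, n):
        next_expected = seqno + 1
        received = seqno - rcv_nxt + 1
        if received % ack_interval != 0:
            continue  # no ACK is due for this packet
        # Old behaviour: a gap is flagged whenever the first deferred
        # packet is still ahead of the next expected sequence number.
        if gap_in_acks and first_deferred > next_expected:
            gap_acks += 1
    return gap_acks
```

      In this model, with packet #161 deferred and packets 1 to 160 still
      under way, nine of the ten ACKs carry a gap indicator under the old
      behaviour; with gap_in_acks=False, as after this change, none do.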
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 10 Dec, 2019 8 commits
  3. 09 Dec, 2019 5 commits
    • net: sfp: avoid tx-fault with Nokia GPON module · 26c97a2d
      Russell King authored
      The Nokia GPON module can hold tx-fault active while it is initialising
      which can take up to 60s. Avoid this causing the module to be declared
      faulty after the SFP MSA defined non-cooled module timeout.
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: remove redundant assignments to rc · e70ac628
      Colin Ian King authored
      The variable rc is assigned with a value that is never read and
      it is re-assigned a new value later on.  The assignment is redundant
      and can be removed.  Clean up multiple occurrences of this pattern.
      
      Addresses-Coverity: ("Unused value")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • NFC: port100: Convert cpu_to_le16(le16_to_cpu(E1) + E2) to use le16_add_cpu(). · 718eae27
      Mao Wenan authored
      Convert cpu_to_le16(le16_to_cpu(frame->datalen) + len) to
      use le16_add_cpu(), which is more concise and does the same thing.
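      The equivalence of the two forms can be checked with a small
      user-space model. The Python functions below only borrow the names
      of the kernel helpers; they operate on a two-byte little-endian
      buffer and are an illustration, not the kernel implementation.

```python
def le16_to_cpu(raw):
    """Decode a 2-byte little-endian field into an integer."""
    return int.from_bytes(raw, "little")


def cpu_to_le16(value):
    """Encode an integer as a 2-byte little-endian field (truncated to 16 bits)."""
    return (value & 0xFFFF).to_bytes(2, "little")


def le16_add_cpu(field, val):
    """Add val to a little-endian 16-bit field in place (field is a bytearray)."""
    field[:] = cpu_to_le16(le16_to_cpu(field) + val)
```

      Both spellings yield the same stored bytes; the helper simply
      hides the decode, add, encode sequence behind one call.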
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Mao Wenan <maowenan@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'for-upstream' of... · 4a63ef71
      David S. Miller authored
      Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
      
      Johan Hedberg says:
      
      ====================
      pull request: bluetooth-next 2019-12-09
      
      Here's the first bluetooth-next pull request for 5.6:
      
       - Devicetree bindings updates for Broadcom controllers
       - Add support for PCM configuration for Broadcom controllers
       - btusb: Fixes for Realtek devices
       - btusb: A few other smaller fixes (mem leak & non-atomic allocation issue)
      
      Please let me know if there are any issues pulling. Thanks.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: WireGuard secure network tunnel · e7096c13
      Jason A. Donenfeld authored
      WireGuard is a layer 3 secure networking tunnel made specifically for
      the kernel, that aims to be much simpler and easier to audit than IPsec.
      Extensive documentation and description of the protocol and
      considerations, along with formal proofs of the cryptography, are
      available at:
      
        * https://www.wireguard.com/
        * https://www.wireguard.com/papers/wireguard.pdf
      
      This commit implements WireGuard as a simple network device driver,
      accessible in the usual RTNL way used by virtual network drivers. It
      makes use of the udp_tunnel APIs, GRO, GSO, NAPI, and the usual set of
      networking subsystem APIs. It has a somewhat novel multicore queueing
      system designed for maximum throughput and minimal latency of encryption
      operations, but it is implemented modestly using workqueues and NAPI.
      Configuration is done via generic Netlink, and following a review from
      the Netlink maintainer a year ago, several high profile userspace tools
      have already implemented the API.
      
      This commit also comes with several different tests, both in-kernel
      tests and out-of-kernel tests based on network namespaces, taking
      advantage of the fact that sockets used by WireGuard intentionally stay
      in the namespace in which the WireGuard interface was originally
      created, exactly like the semantics of userspace tun devices. See
      wireguard.com/netns/ for pictures and examples.
      
      The source code is fairly short, but rather than combining everything
      into a single file, WireGuard is developed as cleanly separable files,
      making auditing and comprehension easier. Things are laid out as
      follows:
      
        * noise.[ch], cookie.[ch], messages.h: These implement the bulk of the
          cryptographic aspects of the protocol, and are mostly data-only in
          nature, taking in buffers of bytes and spitting out buffers of
          bytes. They also handle reference counting for their various shared
          pieces of data, like keys and key lists.
      
        * ratelimiter.[ch]: Used as an integral part of cookie.[ch] for
          ratelimiting certain types of cryptographic operations in accordance
          with particular WireGuard semantics.
      
        * allowedips.[ch], peerlookup.[ch]: The main lookup structures of
          WireGuard, the former being trie-like with particular semantics, an
          integral part of the design of the protocol, and the latter just
          being nice helper functions around the various hashtables we use.
      
        * device.[ch]: Implementation of functions for the netdevice and for
          rtnl, responsible for maintaining the life of a given interface and
          wiring it up to the rest of WireGuard.
      
        * peer.[ch]: Each interface has a list of peers, with helper functions
          available here for creation, destruction, and reference counting.
      
        * socket.[ch]: Implementation of functions related to udp_socket and
          the general set of kernel socket APIs, for sending and receiving
          ciphertext UDP packets, and taking care of WireGuard-specific sticky
          socket routing semantics for the automatic roaming.
      
        * netlink.[ch]: Userspace API entry point for configuring WireGuard
          peers and devices. The API has been implemented by several userspace
          tools and network management utilities, and the WireGuard project
          distributes the basic wg(8) tool.
      
        * queueing.[ch]: Shared functions on the rx and tx paths for handling
          the various queues used in the multicore algorithms.
      
        * send.c: Handles encrypting outgoing packets in parallel on
          multiple cores, before sending them in order on a single core, via
          workqueues and ring buffers. Also handles sending handshake and cookie
          messages as part of the protocol, in parallel.
      
        * receive.c: Handles decrypting incoming packets in parallel on
          multiple cores, before passing them off in order to be ingested via
          the rest of the networking subsystem with GRO via the typical NAPI
          poll function. Also handles receiving handshake and cookie messages
          as part of the protocol, in parallel.
      
        * timers.[ch]: Uses the timer wheel to implement protocol-specific
          event timeouts, and gives a set of very simple event-driven entry
          point functions for callers.
      
        * main.c, version.h: Initialization and deinitialization of the module.
      
        * selftest/*.h: Runtime unit tests for some of the most security
          sensitive functions.
      
        * tools/testing/selftests/wireguard/netns.sh: Aforementioned testing
          script using network namespaces.
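      The send.c and receive.c entries above describe the same pattern:
      process packets in parallel on several cores, then hand them on
      in their original order. A rough user-space analogue is sketched
      here. It is not how WireGuard implements it (the text says the
      driver uses workqueues, ring buffers and NAPI); the function names
      are hypothetical and fake_encrypt is a placeholder transform, not
      cryptography.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_encrypt(packet):
    """Placeholder per-packet transform; NOT real cryptography."""
    return bytes(b ^ 0x5A for b in packet)


def encrypt_in_parallel_send_in_order(packets, workers=4):
    """Transform packets concurrently, but emit results in submission order."""
    sent = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Work may complete in any order across the worker threads...
        futures = [pool.submit(fake_encrypt, p) for p in packets]
        # ...but results are consumed in the order the packets were queued.
        for future in futures:
            sent.append(future.result())
    return sent
```

      Keeping the ordering step on a single consumer is what lets the
      parallel stage scale without reordering the packet stream.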
      
      This commit aims to be as self-contained as possible, implementing
      WireGuard as a standalone module not needing much special handling or
      coordination from the network subsystem. I expect future
      optimizations to the network stack to improve WireGuard, and
      vice versa, but for the time being, this exists as intentionally
      standalone.
      
      We introduce a menu option for CONFIG_WIREGUARD, as well as providing a
      verbose debug log and self-tests via CONFIG_WIREGUARD_DEBUG.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: linux-crypto@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 08 Dec, 2019 12 commits
    • Linux 5.5-rc1 · e42617b8
      Linus Torvalds authored
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 95e6ba51
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) More jumbo frame fixes in r8169, from Heiner Kallweit.
      
       2) Fix bpf build in minimal configuration, from Alexei Starovoitov.
      
       3) Use after free in slcan driver, from Jouni Hogander.
      
       4) Flower classifier port ranges don't work properly in the HW offload
          case, from Yoshiki Komachi.
      
       5) Use after free in hns3_nic_maybe_stop_tx(), from Yunsheng Lin.
      
       6) Out of bounds access in mqprio_dump(), from Vladyslav Tarasiuk.
      
       7) Fix flow dissection in dsa TX path, from Alexander Lobakin.
      
       8) Stale syncookie timestamp fixes from Guillaume Nault.
      
      [ Did an evil merge to silence a warning introduced by this pull - Linus ]
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
        r8169: fix rtl_hw_jumbo_disable for RTL8168evl
        net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add()
        r8169: add missing RX enabling for WoL on RTL8125
        vhost/vsock: accept only packets with the right dst_cid
        net: phy: dp83867: fix hfs boot in rgmii mode
        net: ethernet: ti: cpsw: fix extra rx interrupt
        inet: protect against too small mtu values.
        gre: refetch erspan header from skb->data after pskb_may_pull()
        pppoe: remove redundant BUG_ON() check in pppoe_pernet
        tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE()
        tcp: tighten acceptance of ACKs not matching a child socket
        tcp: fix rejected syncookies due to stale timestamps
        lpc_eth: kernel BUG on remove
        tcp: md5: fix potential overestimation of TCP option space
        net: sched: allow indirect blocks to bind to clsact in TC
        net: core: rename indirect block ingress cb function
        net-sysfs: Call dev_hold always in netdev_queue_add_kobject
        net: dsa: fix flow dissection on Tx path
        net/tls: Fix return values to avoid ENOTSUPP
        net: avoid an indirect call in ____sys_recvmsg()
        ...
    • Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 138f371d
      Linus Torvalds authored
      Pull more SCSI updates from James Bottomley:
       "Eleven patches, all in drivers (no core changes) that are either minor
        cleanups or small fixes.
      
        They were late arriving, but still safe for -rc1"
      
      * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: MAINTAINERS: Add the linux-scsi mailing list to the ISCSI entry
        scsi: megaraid_sas: Make poll_aen_lock static
        scsi: sd_zbc: Improve report zones error printout
        scsi: qla2xxx: Fix qla2x00_request_irqs() for MSI
        scsi: qla2xxx: unregister ports after GPN_FT failure
        scsi: qla2xxx: fix rports not being mark as lost in sync fabric scan
        scsi: pm80xx: Remove unused include of linux/version.h
        scsi: pm80xx: fix logic to break out of loop when register value is 2 or 3
        scsi: scsi_transport_sas: Fix memory leak when removing devices
        scsi: lpfc: size cpu map by last cpu id set
        scsi: ibmvscsi_tgt: Remove unneeded variable rc
    • Merge tag '5.5-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6 · a78f7cdd
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Nine cifs/smb3 fixes:
      
         - one fix for stable (oops during oplock break)
      
         - two timestamp fixes including important one for updating mtime at
           close to avoid stale metadata caching issue on dirty files (also
           improves perf by using SMB2_CLOSE_FLAG_POSTQUERY_ATTRIB over the
           wire)
      
         - two fixes for "modefromsid" mount option for file create (now
           allows mode bits to be set more atomically and accurately on create
           by adding "sd_context" on create when modefromsid specified on
           mount)
      
         - two fixes for multichannel found in testing this week against
           different servers
      
         - two small cleanup patches"
      
      * tag '5.5-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
        smb3: improve check for when we send the security descriptor context on create
        smb3: fix mode passed in on create for modetosid mount option
        cifs: fix possible uninitialized access and race on iface_list
        cifs: Fix lookup of SMB connections on multichannel
        smb3: query attributes on file close
        smb3: remove unused flag passed into close functions
        cifs: remove redundant assignment to pointer pneg_ctxt
        fs: cifs: Fix atime update check vs mtime
        CIFS: Fix NULL-pointer dereference in smb2_push_mandatory_locks
    • Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 5bf9a06a
      Linus Torvalds authored
      Pull misc vfs cleanups from Al Viro:
       "No common topic, just three cleanups".
      
      * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        make __d_alloc() static
        fs/namespace: add __user to open_tree and move_mount syscalls
        fs/fnctl: fix missing __user in fcntl_rw_hint()
    • Merge tag 'ntb-5.5' of git://github.com/jonmason/ntb · 9455d25f
      Linus Torvalds authored
      Pull NTB update from Jon Mason:
       "Just a simple patch to add a new Hygon Device ID to the AMD NTB device
        driver"
      
      * tag 'ntb-5.5' of git://github.com/jonmason/ntb:
        NTB: Add Hygon Device ID
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 73721451
      Linus Torvalds authored
      Pull more input updates from Dmitry Torokhov:
      
       - fixups for Synaptics RMI4 driver
      
       - a quirk for Goodix touchscreen on Teclast tablet
      
       - a new keycode definition for activating privacy screen feature found
         on a few "enterprise" laptops
      
       - updates to snvs_pwrkey driver
      
       - polling uinput device for writing (which is always allowed) now works
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: synaptics-rmi4 - don't increment rmiaddr for SMBus transfers
        Input: synaptics-rmi4 - re-enable IRQs in f34v7_do_reflash
        Input: goodix - add upside-down quirk for Teclast X89 tablet
        Input: add privacy screen toggle keycode
        Input: uinput - fix returning EPOLLOUT from uinput_poll
        Input: snvs_pwrkey - remove gratuitous NULL initializers
        Input: snvs_pwrkey - send key events for i.MX6 S, DL and Q
    • Merge tag 'iomap-5.5-merge-14' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 95207d55
      Linus Torvalds authored
      Pull iomap fixes from Darrick Wong:
       "Fix a race condition and a use-after-free error:
      
         - Fix a UAF when reporting writeback errors
      
         - Fix a race condition when handling page uptodate on fragmented file
           with blocksize < pagesize"
      
      * tag 'iomap-5.5-merge-14' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        iomap: stop using ioend after it's been freed in iomap_finish_ioend()
        iomap: fix sub-page uptodate handling
    • Merge tag 'xfs-5.5-merge-17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 50caca9d
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
       "Fix a couple of resource management errors and a hang:
      
         - fix a crash in the log setup code when log mounting fails
      
         - fix a hang when allocating space on the realtime device
      
         - fix a block leak when freeing space on the realtime device"
      
      * tag 'xfs-5.5-merge-17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: fix mount failure crash on invalid iclog memory access
        xfs: don't check for AG deadlock for realtime files in bunmapi
        xfs: fix realtime file data space leak
    • Merge tag 'for-linus-5.5-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux · 316933cf
      Linus Torvalds authored
      Pull orangefs update from Mike Marshall:
       "orangefs: posix open permission checking...
      
        Orangefs has no open, and orangefs checks file permissions on each
        file access. Posix requires that file permissions be checked on open
        and nowhere else. Orangefs-through-the-kernel needs to seem posix
        compliant.
      
        The VFS opens files, even if the filesystem provides no method. We can
        see if a file was successfully opened for read and or for write by
        looking at file->f_mode.
      
        When writes are flowing from the page cache, file is no longer
        available. We can trust the VFS to have checked file->f_mode before
        writing to the page cache.
      
        The mode of a file might change between when it is opened and IO
        commences, or it might be created with an arbitrary mode.
      
        We'll make sure we don't hit EACCES during the IO stage by using
        UID 0"
      
      [ This is "posixish", but not a great solution in the long run, since a
        proper secure network server shouldn't really trust the client like this.
        But proper and secure POSIX behavior requires an open method and a
        resulting cookie for IO of some kind, or similar.    - Linus ]
      
      * tag 'for-linus-5.5-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
        orangefs: posix open permission checking...
    • Merge tag 'nfsd-5.5' of git://linux-nfs.org/~bfields/linux · 911d137a
      Linus Torvalds authored
      Pull nfsd updates from Bruce Fields:
       "This is a relatively quiet cycle for nfsd, mainly various bugfixes.
      
        Possibly most interesting is Trond's fixes for some callback races
        that were due to my incomplete understanding of rpc client shutdown.
        Unfortunately at the last minute I've started noticing a new
        intermittent failure to send callbacks. As the logic seems basically
        correct, I'm leaving Trond's patches in for now, and hope to find a
        fix in the next week so I don't have to revert those patches"
      
      * tag 'nfsd-5.5' of git://linux-nfs.org/~bfields/linux: (24 commits)
        nfsd: depend on CRYPTO_MD5 for legacy client tracking
        NFSD fixing possible null pointer derefering in copy offload
        nfsd: check for EBUSY from vfs_rmdir/vfs_unink.
        nfsd: Ensure CLONE persists data and metadata changes to the target file
        SUNRPC: Fix backchannel latency metrics
        nfsd: restore NFSv3 ACL support
        nfsd: v4 support requires CRYPTO_SHA256
        nfsd: Fix cld_net->cn_tfm initialization
        lockd: remove __KERNEL__ ifdefs
        sunrpc: remove __KERNEL__ ifdefs
        race in exportfs_decode_fh()
        nfsd: Drop LIST_HEAD where the variable it declares is never used.
        nfsd: document callback_wq serialization of callback code
        nfsd: mark cb path down on unknown errors
        nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback()
        nfsd: minor 4.1 callback cleanup
        SUNRPC: Fix svcauth_gss_proxy_init()
        SUNRPC: Trace gssproxy upcall results
        sunrpc: fix crash when cache_head become valid before update
        nfsd: remove private bin2hex implementation
        ...
    • Merge tag 'nfs-for-5.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · fb9bf40c
      Linus Torvalds authored
      Pull NFS client updates from Trond Myklebust:
       "Highlights include:
      
        Features:
      
         - NFSv4.2 now supports cross device offloaded copy (i.e. offloaded
           copy of a file from one source server to a different target
           server).
      
         - New RDMA tracepoints for debugging congestion control and Local
           Invalidate WRs.
      
        Bugfixes and cleanups
      
         - Drop the NFSv4.1 session slot if nfs4_delegreturn_prepare waits for
           layoutreturn
      
         - Handle bad/dead sessions correctly in nfs41_sequence_process()
      
         - Various bugfixes to the delegation return operation.
      
         - Various bugfixes pertaining to delegations that have been revoked.
      
         - Cleanups to the NFS timespec code to avoid unnecessary conversions
           between timespec and timespec64.
      
         - Fix unstable RDMA connections after a reconnect
      
         - Close race between waking an RDMA sender and posting a receive
      
         - Wake pending RDMA tasks if connection fails
      
         - Fix MR list corruption, and clean up MR usage
      
         - Fix another RPCSEC_GSS issue with MIC buffer space"
      
      * tag 'nfs-for-5.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits)
        SUNRPC: Capture completion of all RPC tasks
        SUNRPC: Fix another issue with MIC buffer space
        NFS4: Trace lock reclaims
        NFS4: Trace state recovery operation
        NFSv4.2 fix memory leak in nfs42_ssc_open
        NFSv4.2 fix kfree in __nfs42_copy_file_range
        NFS: remove duplicated include from nfs4file.c
        NFSv4: Make _nfs42_proc_copy_notify() static
        NFS: Fallocate should use the nfs4_fattr_bitmap
        NFS: Return -ETXTBSY when attempting to write to a swapfile
        fs: nfs: sysfs: Remove NULL check before kfree
        NFS: remove unneeded semicolon
        NFSv4: add declaration of current_stateid
        NFSv4.x: Drop the slot if nfs4_delegreturn_prepare waits for layoutreturn
        NFSv4.x: Handle bad/dead sessions correctly in nfs41_sequence_process()
        nfsv4: Move NFSPROC4_CLNT_COPY_NOTIFY to end of list
        SUNRPC: Avoid RPC delays when exiting suspend
        NFS: Add a tracepoint in nfs_fh_to_dentry()
        NFSv4: Don't retry the GETATTR on old stateid in nfs4_delegreturn_done()
        NFSv4: Handle NFS4ERR_OLD_STATEID in delegreturn
        ...
  5. 07 Dec, 2019 12 commits
    • smb3: improve check for when we send the security descriptor context on create · 231e2a0b
      Steve French authored
      We had cases in the previous patch where we were sending the security
      descriptor context on SMB3 open (file create) in cases when we hadn't
      mounted with the "modefromsid" mount option.
      
      Add a check for that mount flag before calling add_sd_context in
      open init.
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
    • Merge tag 'vfio-v5.5-rc1' of git://github.com/awilliam/linux-vfio · 94e89b40
      Linus Torvalds authored
      Pull VFIO updates from Alex Williamson:
      
       - Remove hugepage checks for reserved pfns (Ben Luo)
      
       - Fix irq-bypass unregister ordering (Jiang Yi)
      
      * tag 'vfio-v5.5-rc1' of git://github.com/awilliam/linux-vfio:
        vfio/pci: call irq_bypass_unregister_producer() before freeing irq
        vfio/type1: remove hugepage checks in is_invalid_reserved_pfn()
    • Merge tag 'for-linus-5.5b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · f74fd13f
      Linus Torvalds authored
      Pull more xen updates from Juergen Gross:
      
       - a patch to fix a build warning
      
       - a cleanup of no longer needed code in the Xen event handling
      
       - a small series for the Xen grant driver avoiding high order
         allocations and replacing an insane global limit by a per-call one
      
       - a small series fixing Xen frontend/backend module referencing
      
      * tag 'for-linus-5.5b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen-blkback: allow module to be cleanly unloaded
        xen/xenbus: reference count registered modules
        xen/gntdev: switch from kcalloc() to kvcalloc()
        xen/gntdev: replace global limit of mapped pages by limit per call
        xen/gntdev: remove redundant non-zero check on ret
        xen/events: remove event handling recursion detection
    • Merge branch 'akpm' (patches from Andrew) · 6dc517a3
      Linus Torvalds authored
      Merge misc Kconfig updates from Andrew Morton:
       "A number of changes to Kconfig files under lib/ from Changbin Du and
        Krzysztof Kozlowski"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        lib/: fix Kconfig indentation
        kernel-hacking: move DEBUG_FS to 'Generic Kernel Debugging Instruments'
        kernel-hacking: move DEBUG_BUGVERBOSE to 'printk and dmesg options'
        kernel-hacking: create a submenu for scheduler debugging options
        kernel-hacking: move SCHED_STACK_END_CHECK after DEBUG_STACK_USAGE
        kernel-hacking: move Oops into 'Lockups and Hangs'
        kernel-hacking: move kernel testing and coverage options to same submenu
        kernel-hacking: group kernel data structures debugging together
        kernel-hacking: create submenu for arch special debugging options
        kernel-hacking: group sysrq/kgdb/ubsan into 'Generic Kernel Debugging Instruments'
      6dc517a3
    • Heiner Kallweit's avatar
      r8169: fix rtl_hw_jumbo_disable for RTL8168evl · 0fc75219
      Heiner Kallweit authored
      In the referenced fix we removed the RTL8168e-specific jumbo config for
      RTL8168evl in rtl_hw_jumbo_enable(). We have to do the same in
      rtl_hw_jumbo_disable().
      
      v2: fix referenced commit id
      
      Fixes: 14012c9f ("r8169: fix jumbo configuration for RTL8168evl")
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0fc75219
    • Linus Torvalds's avatar
      pipe: don't use pipe_wait() for basic pipe IO · 85190d15
      Linus Torvalds authored
      pipe_wait() may be simple, but since it relies on the pipe lock, it
      means that we have to do the wakeup while holding the lock.  That's
      unfortunate, because the very first thing the waked entity will want to
      do is to get the pipe lock for itself.
      
      So get rid of the pipe_wait() usage by simply releasing the pipe lock,
      doing the wakeup (if required) and then using wait_event_interruptible()
      to wait on the right condition instead.
      
      wait_event_interruptible() handles races on its own by comparing the
      wakeup condition before and after adding itself to the wait queue, so
      you can use an optimistic unlocked condition for it.
      
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      85190d15
    • Jiasen Lin's avatar
      NTB: Add Hygon Device ID · 9b5b99a8
      Jiasen Lin authored
      Signed-off-by: Jiasen Lin <linjiasen@hygon.cn>
      Signed-off-by: Jon Mason <jdmason@kudzu.us>
      9b5b99a8
    • Linus Torvalds's avatar
      pipe: remove 'waiting_writers' merging logic · a28c8b9d
      Linus Torvalds authored
      This code is ancient, and goes back to when we only had a single page
      for the pipe buffers.  The exact history is hidden in the mists of time
      (ie "before git", and in fact predates the BK repository too).
      
      At that long-ago point in time, it actually helped to try to merge big
      back-and-forth pipe reads and writes, and not limit pipe reads to the
      single pipe buffer in length just because that was all we had at a time.
      
      However, since then we've expanded the pipe buffers to multiple pages,
      and this logic really doesn't seem to make sense.  And a lot of it is
      somewhat questionable (ie "hmm, the user asked for a non-blocking read,
      but we see that there's a writer pending, so let's wait anyway to get
      the extra data that the writer will have").
      
      But more importantly, it makes the "go to sleep" logic much less
      obvious, and considering the wakeup issues we've had, I want to make for
      less of those kinds of things.
      
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a28c8b9d
    • Linus Torvalds's avatar
      pipe: fix and clarify pipe read wakeup logic · f467a6a6
      Linus Torvalds authored
      This is the read side version of the previous commit: it simplifies the
      logic to only wake up waiting writers when necessary, and makes sure to
      use a synchronous wakeup.  This time not so much for GNU make jobserver
      reasons (that pipe never fills up), but simply to get the writer going
      quickly again.
      
      A bit less verbose commentary this time, if only because I assume that
      the write side commentary isn't going to be ignored if you touch this
      code.
      
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f467a6a6
    • Linus Torvalds's avatar
      pipe: fix and clarify pipe write wakeup logic · 1b6b26ae
      Linus Torvalds authored
      The pipe rework ends up having been extra painful, partly because of
      actual bugs with ordering and caching of the pipe state, but also
      because of subtle performance issues.
      
      In particular, the pipe rework caused the kernel build to inexplicably
      slow down.
      
      The reason turns out to be that the GNU make jobserver (which limits the
      parallelism of the build) uses a pipe to implement a "token" system: a
      parallel submake will read a character from the pipe to get the job
      token before starting a new job, and will write a character back to the
      pipe when it is done.  The overall job limit is thus easily controlled
      by just writing the appropriate number of initial token characters into
      the pipe.
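
      The token scheme described above can be sketched in userspace C. This is
      only an illustration of the idea, not GNU make's actual code: the helper
      names (jobserver_init, jobserver_acquire, jobserver_release) and the '+'
      token character are assumptions made for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: create the token pipe and preload it with `jobs`
 * one-byte tokens. The number of tokens is the overall job limit.
 * `jobs` is assumed to be far below the pipe capacity, so the writes
 * here do not block.
 */
static int jobserver_init(int fds[2], int jobs)
{
    if (pipe(fds) != 0)
        return -1;
    for (int i = 0; i < jobs; i++) {
        char token = '+';   /* arbitrary token byte, assumed for the sketch */
        if (write(fds[1], &token, 1) != 1)
            return -1;
    }
    return 0;
}

/*
 * Take one token before starting a job. On a blocking descriptor this
 * sleeps until another job writes its token back.
 */
static int jobserver_acquire(int read_fd, char *token)
{
    return read(read_fd, token, 1) == 1 ? 0 : -1;
}

/* Return the token when the job is done, waking one waiting reader. */
static int jobserver_release(int write_fd, char token)
{
    return write(write_fd, &token, 1) == 1 ? 0 : -1;
}
```

      A reader blocked in jobserver_acquire() can only proceed once some
      writer calls jobserver_release(), which is why the latency of that
      write-side wakeup directly affects how parallel the build stays.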
      
      But to work well, that really means that the old behavior of write
      wakeups being synchronous (WF_SYNC) is very important - when the pipe
      writer wakes up a reader, we want the reader to actually get scheduled
      immediately.  Otherwise you lose the parallelism of the build.
      
      The pipe rework lost that synchronous wakeup on write, and we had
      clearly all forgotten the reasons and rules for it.
      
      This rewrites the pipe write wakeup logic to do the required WF_SYNC
      wakeups, but also clarifies the logic and avoids extraneous wakeups.
      
      It also ends up adding a number of comments about what it does and why,
      so that we hopefully don't end up forgetting about this next time we
      change this code.
      
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1b6b26ae
    • Eric Dumazet's avatar
      net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add() · 2dd5616e
      Eric Dumazet authored
      Use the new tcf_proto_check_kind() helper to make sure user
      provided value is well formed.
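
      As a rough userspace illustration of what "well formed" can mean for a
      user-supplied name, the sketch below copies an untrusted payload that
      may lack a terminating NUL into a fixed-size buffer and rejects it if
      it does not fit. It is not the kernel helper: kind_copy_checked() and
      KIND_MAX are hypothetical names, and the 16-byte bound is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define KIND_MAX 16 /* assumed upper bound on the name, including the NUL */

/*
 * Hypothetical validator: `payload` holds `len` bytes from an untrusted
 * source and is not guaranteed to be NUL-terminated. On success `name`
 * holds a NUL-terminated copy. On failure `name` is left as an empty
 * string so that later formatting never reads uninitialized bytes.
 */
static bool kind_copy_checked(const char *payload, size_t len,
                              char name[KIND_MAX])
{
    size_t n = 0;

    memset(name, 0, KIND_MAX);
    if (payload != NULL && len > 0) {
        /* Look for a terminator, but never read past `len` bytes. */
        const char *end = memchr(payload, '\0', len);
        n = end ? (size_t)(end - payload) : len;
    }
    if (n >= KIND_MAX)
        return false;   /* too long to fit with its terminator */
    if (n > 0)
        memcpy(name, payload, n);
    return true;
}
```

      The key property is that the name handed to later code is always
      terminated inside a buffer of known size, whatever the caller sent.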
      
      BUG: KMSAN: uninit-value in string_nocheck lib/vsprintf.c:606 [inline]
      BUG: KMSAN: uninit-value in string+0x4be/0x600 lib/vsprintf.c:668
      CPU: 0 PID: 12358 Comm: syz-executor.1 Not tainted 5.4.0-rc8-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x220 lib/dump_stack.c:118
       kmsan_report+0x128/0x220 mm/kmsan/kmsan_report.c:108
       __msan_warning+0x64/0xc0 mm/kmsan/kmsan_instr.c:245
       string_nocheck lib/vsprintf.c:606 [inline]
       string+0x4be/0x600 lib/vsprintf.c:668
       vsnprintf+0x218f/0x3210 lib/vsprintf.c:2510
       __request_module+0x2b1/0x11c0 kernel/kmod.c:143
       tcf_proto_lookup_ops+0x171/0x700 net/sched/cls_api.c:139
       tc_chain_tmplt_add net/sched/cls_api.c:2730 [inline]
       tc_ctl_chain+0x1904/0x38a0 net/sched/cls_api.c:2850
       rtnetlink_rcv_msg+0x115a/0x1580 net/core/rtnetlink.c:5224
       netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
       rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5242
       netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
       netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1328
       netlink_sendmsg+0x110f/0x1330 net/netlink/af_netlink.c:1917
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg net/socket.c:657 [inline]
       ___sys_sendmsg+0x14ff/0x1590 net/socket.c:2311
       __sys_sendmsg net/socket.c:2356 [inline]
       __do_sys_sendmsg net/socket.c:2365 [inline]
       __se_sys_sendmsg+0x305/0x460 net/socket.c:2363
       __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2363
       do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x45a649
      Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f0790795c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 000000000045a649
      RDX: 0000000000000000 RSI: 0000000020000300 RDI: 0000000000000006
      RBP: 000000000075bfc8 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f07907966d4
      R13: 00000000004c8db5 R14: 00000000004df630 R15: 00000000ffffffff
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:149 [inline]
       kmsan_internal_poison_shadow+0x5c/0x110 mm/kmsan/kmsan.c:132
       kmsan_slab_alloc+0x97/0x100 mm/kmsan/kmsan_hooks.c:86
       slab_alloc_node mm/slub.c:2773 [inline]
       __kmalloc_node_track_caller+0xe27/0x11a0 mm/slub.c:4381
       __kmalloc_reserve net/core/skbuff.c:141 [inline]
       __alloc_skb+0x306/0xa10 net/core/skbuff.c:209
       alloc_skb include/linux/skbuff.h:1049 [inline]
       netlink_alloc_large_skb net/netlink/af_netlink.c:1174 [inline]
       netlink_sendmsg+0x783/0x1330 net/netlink/af_netlink.c:1892
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg net/socket.c:657 [inline]
       ___sys_sendmsg+0x14ff/0x1590 net/socket.c:2311
       __sys_sendmsg net/socket.c:2356 [inline]
       __do_sys_sendmsg net/socket.c:2365 [inline]
       __se_sys_sendmsg+0x305/0x460 net/socket.c:2363
       __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2363
       do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 6f96c3c6 ("net_sched: fix backward compatibility for TCA_KIND")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2dd5616e
    • Heiner Kallweit's avatar
      r8169: add missing RX enabling for WoL on RTL8125 · 00222d13
      Heiner Kallweit authored
      RTL8125 also requires RX to be enabled for WoL.
      
      v2: add missing Fixes tag
      
      Fixes: f1bce4ad ("r8169: add support for RTL8125")
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      00222d13