1. 17 Jun, 2020 16 commits
    • rxrpc: Fix afs large storage transmission performance drop · 02c28dff
      David Howells authored
      Commit 2ad6691d, which moved the modification of the status annotation
      for a packet in the Tx buffer prior to the retransmission, moved the state
      clearance but managed to lose the bit that set it to UNACK.
      
      Consequently, if a retransmission occurs, the packet is accidentally
      changed to the ACK state (ie. 0) by masking it off, which means that the
      packet isn't counted towards the tally of newly-ACK'd packets if it gets
      hard-ACK'd.  This then prevents the congestion control algorithm from
      recovering properly.
      
      Fix by reinstating the change of state to UNACK.
      
      Spotted by the generic/460 xfstest.
      
      Fixes: 2ad6691d ("rxrpc: Fix race between incoming ACK parser and retransmitter")
      Signed-off-by: David Howells <dhowells@redhat.com>
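
A minimal userspace sketch of the failure mode this commit describes, not the rxrpc code itself. The names and bit values are hypothetical; the only detail taken from the commit text is that the ACK state is 0, so masking off the state bits without setting UNACK leaves the packet looking already ACK'd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout (assumed): low two bits hold the state, higher bits
 * hold flags.  Only "ACK is 0" comes from the commit message. */
enum {
	ANNO_ACK     = 0x00,
	ANNO_UNACK   = 0x01,
	ANNO_RETRANS = 0x03,
	ANNO_MASK    = 0x03,
	ANNO_LAST    = 0x04,
};

/* Buggy: clears the state bits and stops, so the state becomes ACK (0). */
static uint8_t mark_for_resend_buggy(uint8_t anno)
{
	return (uint8_t)(anno & ~ANNO_MASK);
}

/* Fixed: clears the state bits, then sets the state to UNACK. */
static uint8_t mark_for_resend_fixed(uint8_t anno)
{
	return (uint8_t)((anno & ~ANNO_MASK) | ANNO_UNACK);
}

/* Tally packets that a hard ACK newly acknowledges: every packet whose
 * state is not already ACK. */
static unsigned int count_newly_acked(const uint8_t *annos, size_t n)
{
	unsigned int nr = 0;
	size_t i;

	assert(annos != NULL || n == 0);
	for (i = 0; i < n; i++)
		if ((annos[i] & ANNO_MASK) != ANNO_ACK)
			nr++;
	return nr;
}
```

With the buggy helper, a retransmitted packet drops out of the tally that feeds congestion control; with the fixed helper it is counted, and flag bits such as ANNO_LAST are preserved either way.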
    • rxrpc: Fix handling of rwind from an ACK packet · a2ad7c21
      David Howells authored
      The handling of the receive window size (rwind) from a received ACK packet
      is not correct.  The rxrpc_input_ackinfo() function currently checks the
      current Tx window size against the rwind from the ACK to see if it has
      changed, but then limits the rwind size before storing it in the tx_winsize
      member and, if it increased, wakes up the transmitting process.  This means
      that if rwind > RXRPC_RXTX_BUFF_SIZE - 1, this path will always be
      followed.
      
      Fix this by limiting rwind before we compare it to tx_winsize.
      
      The effect of this can be seen by enabling the rxrpc_rx_rwind_change
      tracepoint.
      
      Fixes: 702f2ac8 ("rxrpc: Wake up the transmitter if Rx window size increases on the peer")
      Signed-off-by: David Howells <dhowells@redhat.com>
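
The ordering problem in this commit can be sketched in userspace as follows. This is an illustration under assumptions, not the body of rxrpc_input_ackinfo(): the struct, the function names and the limit value 64 are hypothetical stand-ins, and the wake-up and tracepoint are reduced to a boolean "change path taken" result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BUFF_SIZE 64u			/* assumed stand-in for RXRPC_RXTX_BUFF_SIZE */
#define SKETCH_MAX_WIN   (SKETCH_BUFF_SIZE - 1)

struct call_sketch {
	unsigned int tx_winsize;
};

/* Buggy order: compare the raw rwind first, clamp afterwards.  Returns true
 * when the "window changed" path is taken. */
static bool rwind_update_buggy(struct call_sketch *call, unsigned int rwind)
{
	assert(call != NULL);
	if (call->tx_winsize == rwind)
		return false;
	if (rwind > SKETCH_MAX_WIN)
		rwind = SKETCH_MAX_WIN;
	call->tx_winsize = rwind;
	return true;
}

/* Fixed order: clamp first, then compare against the stored window. */
static bool rwind_update_fixed(struct call_sketch *call, unsigned int rwind)
{
	assert(call != NULL);
	if (rwind > SKETCH_MAX_WIN)
		rwind = SKETCH_MAX_WIN;
	if (call->tx_winsize == rwind)
		return false;
	call->tx_winsize = rwind;
	return true;
}
```

If the peer keeps advertising an rwind above the limit, the buggy version takes the change path on every ACK even though the stored window never changes, while the fixed version takes it only when the clamped value differs.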
    • rxrpc: Fix trace string · aadf9dce
      David Howells authored
      The trace symbol printer (__print_symbolic()) ignores symbols that map to
      an empty string and prints the hex value instead.
      
      Fix the symbol for rxrpc_cong_no_change to " -" instead of "" to avoid
      this.
      
      Fixes: b54a134a ("rxrpc: Fix handling of enums-to-string translation in tracing")
      Signed-off-by: David Howells <dhowells@redhat.com>
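
A simplified userspace model of the behaviour described in this commit, assuming a printer that falls back to the hex value whenever the symbol lookup produces no output. The struct and function names are hypothetical and this is not the kernel's __print_symbolic() implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct sym_sketch {
	unsigned long val;
	const char *name;
};

/* Write the name matching val into buf.  If that leaves buf empty (no match,
 * or a match whose name is ""), print the hex value instead. */
static void print_symbolic_sketch(char *buf, size_t len, unsigned long val,
				  const struct sym_sketch *syms, size_t nr)
{
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < nr; i++) {
		if (syms[i].val == val) {
			snprintf(buf, len, "%s", syms[i].name);
			break;
		}
	}
	if (buf[0] == '\0')
		snprintf(buf, len, "0x%lx", val);
}
```

In this model an entry mapped to "" is indistinguishable from a missing entry, which is why a non-empty placeholder such as " -" avoids the hex fallback.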
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · b9d37bbb
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2020-06-17
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 10 non-merge commits during the last 2 day(s) which contain
      a total of 14 files changed, 158 insertions(+), 59 deletions(-).
      
      The main changes are:
      
      1) Important fix for bpf_probe_read_kernel_str() return value, from Andrii.
      
      2) [gs]etsockopt fix for large optlen, from Stanislav.
      
      3) devmap allocation fix, from Toke.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: Document optval > PAGE_SIZE behavior for sockopt hooks · 8030e250
      Stanislav Fomichev authored
      Extend existing doc with more details about requiring ctx->optlen = 0
      for handling optval > PAGE_SIZE.
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200617010416.93086-3-sdf@google.com
    • selftests/bpf: Make sure optvals > PAGE_SIZE are bypassed · a0cb12b0
      Stanislav Fomichev authored
      We are relying on the fact that we can pass > sizeof(int) optvals
      to the SOL_IP+IP_FREEBIND option (the kernel will take first 4 bytes).
      In the BPF program we check that we can only touch PAGE_SIZE bytes,
      but the real optlen is PAGE_SIZE * 2. In both cases, we override it to
      some predefined value and trim the optlen.
      
      Also, let's modify the existing IP_TOS use case to test the optlen=0 case
      where the BPF program just bypasses the data as is.
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200617010416.93086-2-sdf@google.com
    • bpf: Don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE · d8fe449a
      Stanislav Fomichev authored
      Attaching to these hooks can break iptables because its optval is
      usually quite big, or at least bigger than the current PAGE_SIZE limit.
      David also mentioned some SCTP options can be big (around 256k).
      
      For such optvals we expose only the first PAGE_SIZE bytes to
      the BPF program. The BPF program has two options:
      1. Set ctx->optlen to 0 to indicate that the BPF's optval
         should be ignored and the kernel should use original userspace
         value.
      2. Set ctx->optlen to something that's smaller than the PAGE_SIZE.
      
      v5:
      * use ctx->optlen == 0 with trimmed buffer (Alexei Starovoitov)
      * update the docs accordingly
      
      v4:
      * use temporary buffer to avoid optval == optval_end == NULL;
        this removes the corner case in the verifier that might assume
        non-zero PTR_TO_PACKET/PTR_TO_PACKET_END.
      
      v3:
      * don't increase the limit, bypass the argument
      
      v2:
      * proper comments formatting (Jakub Kicinski)
      
      Fixes: 0d01da6a ("bpf: implement getsockopt and setsockopt hooks")
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Link: https://lore.kernel.org/bpf/20200617010416.93086-1-sdf@google.com
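
The two options this commit gives a BPF program can be modelled in plain C as below. This is a sketch under assumptions rather than the kernel's cgroup sockopt code: the type and function names are hypothetical, the page size is fixed at 4096 for illustration, and rejecting an optlen larger than the exposed buffer is an assumed policy that the commit text does not spell out.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for illustration */

/* Hypothetical stand-in for the context the BPF program can modify. */
struct sockopt_ctx_sketch {
	size_t optlen;
};

enum optval_source {
	USE_USER_OPTVAL,	/* option 1: program set optlen to 0 */
	USE_BPF_OPTVAL,		/* option 2: program set a length that fits */
	OPTLEN_OUT_OF_RANGE,	/* assumed handling for anything larger */
};

/* How many bytes of the user's optval the program gets to see. */
static size_t exposed_len(size_t user_optlen)
{
	return user_optlen > SKETCH_PAGE_SIZE ? SKETCH_PAGE_SIZE : user_optlen;
}

/* Decide which buffer the kernel handler should use after the program ran. */
static enum optval_source pick_optval(size_t user_optlen,
				      const struct sockopt_ctx_sketch *ctx)
{
	assert(ctx != NULL);
	if (ctx->optlen == 0)
		return USE_USER_OPTVAL;
	if (ctx->optlen > exposed_len(user_optlen))
		return OPTLEN_OUT_OF_RANGE;
	return USE_BPF_OPTVAL;
}
```

In this model, a program that only wants to observe a large option sets optlen to 0 and the original userspace value passes through untouched.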
    • devmap: Use bpf_map_area_alloc() for allocating hash buckets · 99c51064
      Toke Høiland-Jørgensen authored
      Syzkaller discovered that creating a hash of type devmap_hash with a large
      number of entries can hit the memory allocator limit for allocating
      contiguous memory regions. There's really no reason to use kmalloc_array()
      directly in the devmap code, so just switch it to the existing
      bpf_map_area_alloc() function that is used elsewhere.
      
      Fixes: 6f9d451a ("xdp: Add devmap_hash map type for looking up devices by hashed index")
      Reported-by: Xiumei Mu <xmu@redhat.com>
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200616142829.114173-1-toke@redhat.com
    • xdp: Handle frame_sz in xdp_convert_zc_to_xdp_frame() · 3ff23516
      Hangbin Liu authored
      In commit 34cc0b33 we only handled the frame_sz in convert_to_xdp_frame().
      This patch will also handle frame_sz in xdp_convert_zc_to_xdp_frame().
      
      Fixes: 34cc0b33 ("xdp: Xdp_frame add member frame_sz and handle in convert_to_xdp_frame")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200616103518.2963410-1-liuhangbin@gmail.com
    • tools, bpftool: Add ringbuf map type to map command docs · 1c7fb20d
      Tobias Klauser authored
      Commit c34a06c5 ("tools/bpftool: Add ringbuf map to a list of known
      map types") added the symbolic "ringbuf" name. Document it in the bpftool
      map command docs and usage as well.
      Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200616113303.8123-1-tklauser@distanz.ch
    • bpf: bpf_probe_read_kernel_str() has to return amount of data read on success · 02553b91
      Andrii Nakryiko authored
      During recent refactorings, bpf_probe_read_kernel_str() started returning 0 on
      success, instead of the amount of data successfully read. This majorly breaks
      applications relying on bpf_probe_read_kernel_str() and bpf_probe_read_str()
      and their results. Fix this by returning the actual number of bytes read.
      
      Fixes: 8d92db5c ("bpf: rework the compat kernel probe handling")
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200616050432.1902042-1-andriin@fb.com
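
To illustrate why callers depend on the returned length, here is a userspace sketch of a bounded string read that returns the number of bytes written, including the trailing NUL, together with a caller that packs two strings back to back. Both functions are hypothetical stand-ins and do not reproduce the BPF helper.

```c
#include <assert.h>
#include <stddef.h>

/* Copy at most size bytes of src into dst, always NUL-terminating.
 * Returns the number of bytes written including the NUL, or -1 on
 * invalid arguments. */
static long read_str_sketch(char *dst, size_t size, const char *src)
{
	size_t i;

	if (!dst || !src || size == 0)
		return -1;
	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';
	return (long)(i + 1);
}

/* Pack two strings back to back, using each returned length as the
 * offset for the next write.  Returns the total bytes used or -1. */
static long pack_two(char *buf, size_t size, const char *a, const char *b)
{
	long n1, n2;

	assert(buf != NULL);
	n1 = read_str_sketch(buf, size, a);
	if (n1 < 0)
		return -1;
	n2 = read_str_sketch(buf + n1, size - (size_t)n1, b);
	if (n2 < 0)
		return -1;
	return n1 + n2;
}
```

If the read function returned 0 on success, the second write would land at offset 0 and overwrite the first string, which is the kind of breakage the commit message refers to.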
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 69119673
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Don't get per-cpu pointer with preemption enabled in nft_set_pipapo,
          fix from Stefano Brivio.
      
       2) Fix memory leak in ctnetlink, from Pablo Neira Ayuso.
      
       3) Multiple definitions of MPTCP_PM_MAX_ADDR, from Geliang Tang.
      
       4) Accidentally disabling NAPI in non-error paths of macb_open(), from
          Charles Keepax.
      
       5) Fix races between alx_stop and alx_remove, from Zekun Shen.
      
       6) We forget to re-enable SRIOV during resume in bnxt_en driver, from
          Michael Chan.
      
       7) Fix memory leak in ipv6_mc_destroy_dev(), from Wang Hai.
      
       8) rxtx stats use wrong index in mvpp2 driver, from Sven Auhagen.
      
       9) Fix memory leak in mptcp_subflow_create_socket error path, from Wei
          Yongjun.
      
      10) We should not adjust the TCP window advertised when sending dup acks
          in non-SACK mode, because it won't be counted as a dup by the sender
          if the window size changes. From Eric Dumazet.
      
      11) Destroy the right number of queues during remove in mvpp2 driver,
          from Sven Auhagen.
      
      12) Various WOL and PM fixes to e1000 driver, from Chen Yu, Vaibhav
          Gupta, and Arnd Bergmann.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits)
        e1000e: fix unused-function warning
        e1000: use generic power management
        e1000e: Do not wake up the system via WOL if device wakeup is disabled
        lan743x: add MODULE_DEVICE_TABLE for module loading alias
        mlxsw: spectrum: Adjust headroom buffers for 8x ports
        bareudp: Fixed configuration to avoid having garbage values
        mvpp2: remove module bugfix
        tcp: grow window for OOO packets only for SACK flows
        mptcp: fix memory leak in mptcp_subflow_create_socket()
        netfilter: flowtable: Make nf_flow_table_offload_add/del_cb inline
        net/sched: act_ct: Make tcf_ct_flow_table_restore_skb inline
        net: dsa: sja1105: fix PTP timestamping with large tc-taprio cycles
        mvpp2: ethtool rxtx stats fix
        MAINTAINERS: switch to my private email for Renesas Ethernet drivers
        rocker: fix incorrect error handling in dma_rings_init
        test_objagg: Fix potential memory leak in error handling
        net: ethernet: mtk-star-emac: simplify interrupt handling
        mld: fix memory leak in ipv6_mc_destroy_dev()
        bnxt_en: Return from timer if interface is not in open state.
        bnxt_en: Fix AER reset logic on 57500 chips.
        ...
    • Merge tag 'afs-fixes-20200616' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 26c20ffc
      Linus Torvalds authored
      Pull AFS fixes from David Howells:
       "I've managed to get xfstests kind of working with afs. Here are a set
        of patches that fix most of the bugs found.
      
        There are a number of primary issues:
      
         - Incorrect handling of mtime and non-handling of ctime. It might be
           argued that the latter isn't a bug since the AFS protocol doesn't
           support ctime, but I should probably still update it locally.
      
         - Shared-write mmap, truncate and writeback bugs. This includes not
           changing i_size under the callback lock, overwriting local i_size
           with the reply from the server after a partial writeback, not
           limiting the writeback from an mmapped page to EOF.
      
         - Checks for an abort code indicating that the primary vnode in an
           operation was deleted by a third-party are done in the wrong place.
      
         - Silly rename bugs. This includes an incomplete conversion to the
           new operation handling, duplicate nlink handling, nlink changing
           not being done inside the callback lock and insufficient handling
           of third-party conflicting directory changes.
      
        And some secondary ones:
      
         - The UAEOVERFLOW abort code should map to EOVERFLOW not EREMOTEIO.
      
         - Remove a couple of unused or incompletely used bits.
      
         - Remove a couple of redundant success checks.
      
        These seem to fix all the data-corruption bugs found by
      
      	./check -afs -g quick
      
        along with the obvious silly rename bugs and time bugs.
      
        There are still some test failures, but they seem to fall into two
        classes: firstly, the authentication/security model is different to
        the standard UNIX model and permission is arbitrated by the server and
        cached locally; and secondly, there are a number of features that AFS
        does not support (such as mknod). But in these cases, the tests
        themselves need to be adapted or skipped.
      
        Using the in-kernel afs client with xfstests also found a bug in the
        AuriStor AFS server that has been fixed for a future release"
      
      * tag 'afs-fixes-20200616' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        afs: Fix silly rename
        afs: afs_vnode_commit_status() doesn't need to check the RPC error
        afs: Fix use of afs_check_for_remote_deletion()
        afs: Remove afs_operation::abort_code
        afs: Fix yfs_fs_fetch_status() to honour vnode selector
        afs: Remove yfs_fs_fetch_file_status() as it's not used
        afs: Fix the mapping of the UAEOVERFLOW abort code
        afs: Fix truncation issues and mmap writeback size
        afs: Concoct ctimes
        afs: Fix EOF corruption
        afs: afs_write_end() should change i_size under the right lock
        afs: Fix non-setting of mtime when writing into mmap
    • Documentation: remove SH-5 index entries · f17957f7
      Randy Dunlap authored
      Remove SH-5 documentation index entries following the removal
      of SH-5 source code.
      
      Error: Cannot open file ../arch/sh/mm/tlb-sh5.c
      Error: Cannot open file ../arch/sh/mm/tlb-sh5.c
      Error: Cannot open file ../arch/sh/include/asm/tlb_64.h
      Error: Cannot open file ../arch/sh/include/asm/tlb_64.h
      
      Fixes: 3b69e8b4 ("Merge tag 'sh-for-5.8' of git://git.libc.org/linux-sh")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: ysato@users.sourceforge.jp
      Cc: linux-sh@vger.kernel.org
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'flex-array-conversions-5.8-rc2' of... · ffbc9376
      Linus Torvalds authored
      Merge tag 'flex-array-conversions-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
      
      Pull flexible-array member conversions from Gustavo A. R. Silva:
       "Replace zero-length arrays with flexible-array members.
      
        Notice that all of these patches have been baking in linux-next for
        two development cycles now.
      
        There is a regular need in the kernel to provide a way to declare
        having a dynamically sized set of trailing elements in a structure.
        Kernel code should always use “flexible array members”[1] for these
        cases. The older style of one-element or zero-length arrays should no
        longer be used[2].
      
        C99 introduced “flexible array members”, which lack a numeric size
        for the array declaration entirely:
      
              struct something {
                      size_t count;
                      struct foo items[];
              };
      
        This is the way the kernel expects dynamically sized trailing elements
        to be declared. It allows the compiler to generate errors when the
        flexible array does not occur last in the structure, which helps to
        prevent some kinds of undefined behavior[3] bugs from being
        inadvertently introduced to the codebase.
      
        It also allows the compiler to correctly analyze array sizes (via
        sizeof(), CONFIG_FORTIFY_SOURCE, and CONFIG_UBSAN_BOUNDS). For
        instance, there is no mechanism that warns us that the following
        application of the sizeof() operator to a zero-length array always
        results in zero:
      
              struct something {
                      size_t count;
                      struct foo items[0];
              };
      
              struct something *instance;
      
              instance = kmalloc(struct_size(instance, items, count), GFP_KERNEL);
              instance->count = count;
      
              size = sizeof(instance->items) * instance->count;
              memcpy(instance->items, source, size);
      
        At the last line of code above, size turns out to be zero, when one
        might have thought it represents the total size in bytes of the
        dynamic memory recently allocated for the trailing array items. Here
        are a couple examples of this issue[4][5].
      
        Instead, flexible array members have incomplete type, and so the
        sizeof() operator may not be applied[6], so any misuse of such
        operators will be immediately noticed at build time.
      
        The cleanest and least error-prone way to implement this is through
        the use of a flexible array member:
      
              struct something {
                      size_t count;
                      struct foo items[];
              };
      
              struct something *instance;
      
              instance = kmalloc(struct_size(instance, items, count), GFP_KERNEL);
              instance->count = count;
      
              size = sizeof(instance->items[0]) * instance->count;
              memcpy(instance->items, source, size);
      
        instead"
      
      [1] https://en.wikipedia.org/wiki/Flexible_array_member
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      [4] commit f2cd32a4 ("rndis_wlan: Remove logically dead code")
      [5] commit ab91c2a8 ("tpm: eventlog: Replace zero-length array with flexible-array member")
      [6] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      
      * tag 'flex-array-conversions-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (41 commits)
        w1: Replace zero-length array with flexible-array
        tracing/probe: Replace zero-length array with flexible-array
        soc: ti: Replace zero-length array with flexible-array
        tifm: Replace zero-length array with flexible-array
        dmaengine: tegra-apb: Replace zero-length array with flexible-array
        stm class: Replace zero-length array with flexible-array
        Squashfs: Replace zero-length array with flexible-array
        ASoC: SOF: Replace zero-length array with flexible-array
        ima: Replace zero-length array with flexible-array
        sctp: Replace zero-length array with flexible-array
        phy: samsung: Replace zero-length array with flexible-array
        RxRPC: Replace zero-length array with flexible-array
        rapidio: Replace zero-length array with flexible-array
        media: pwc: Replace zero-length array with flexible-array
        firmware: pcdp: Replace zero-length array with flexible-array
        oprofile: Replace zero-length array with flexible-array
        block: Replace zero-length array with flexible-array
        tools/testing/nvdimm: Replace zero-length array with flexible-array
        libata: Replace zero-length array with flexible-array
        kprobes: Replace zero-length array with flexible-array
        ...
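
The flexible-array pattern from the pull message can be tried out in userspace with the sketch below. It is an approximation under stated assumptions: malloc() plus an explicit overflow check stands in for kmalloc(struct_size(...)), struct foo is given a single hypothetical field, and something_create() is an illustrative helper that does not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct foo {
	int value;		/* hypothetical payload */
};

struct something {
	size_t count;
	struct foo items[];	/* flexible array member; must be last */
};

/* Allocate a struct something with room for count items and copy them in.
 * Returns NULL on size overflow or allocation failure. */
static struct something *something_create(const struct foo *source, size_t count)
{
	struct something *instance;
	size_t size;

	assert(source != NULL || count == 0);

	/* Overflow check that struct_size() would perform in the kernel. */
	if (count > (SIZE_MAX - sizeof(*instance)) / sizeof(instance->items[0]))
		return NULL;

	/* Size of the trailing array: element size times element count. */
	size = sizeof(instance->items[0]) * count;

	instance = malloc(sizeof(*instance) + size);
	if (!instance)
		return NULL;

	instance->count = count;
	if (size)
		memcpy(instance->items, source, size);
	return instance;
}
```

Because items has incomplete type, writing sizeof(instance->items) here is a compile error instead of a silent zero, so the element-size form sizeof(instance->items[0]) is the only one that builds.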
    • x86/purgatory: Add -fno-stack-protector · ff58155c
      Arvind Sankar authored
      The purgatory Makefile removes -fstack-protector options if they were
      configured in, but does not currently add -fno-stack-protector.
      
      If gcc was configured with the --enable-default-ssp configure option,
      this results in the stack protector still being enabled for the
      purgatory (absent distro-specific specs files that might disable it
      again for freestanding compilations), if the main kernel is being
      compiled with stack protection enabled (if it's disabled for the main
      kernel, the top-level Makefile will add -fno-stack-protector).
      
      This will break the build since commit
        e4160b2e ("x86/purgatory: Fail the build if purgatory.ro has missing symbols")
      and prior to that would have caused runtime failure when trying to use
      kexec.
      
      Explicitly add -fno-stack-protector to avoid this, as done in other
      Makefiles that need to disable the stack protector.
      Reported-by: Gabriel C <nix.or.die@googlemail.com>
      Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 16 Jun, 2020 24 commits