1. 08 Feb, 2018 29 commits
    • tipc: fix skb truesize/datasize ratio control · 55b3280d
      Hoang Le authored
      In commit d618d09a ("tipc: enforce valid ratio between skb truesize
      and contents") we introduced a test for ensuring that the condition
      truesize/datasize <= 4 is true for a received buffer. Unfortunately this
      test has two problems.
      
      - Because of integer arithmetic, the test
        if (skb->truesize / buf_roundup_len(skb) > 4) will miss all
        ratios [4 < ratio < 5], which was not the intention.
      - The buffer returned by skb_copy() inherits skb->truesize of the
        original buffer, which doesn't help the situation at all.
      
      In this commit, we change the ratio condition and replace skb_copy()
      with a call to skb_copy_expand() to finally get this right.
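
The truncation problem described above can be reproduced in plain C. This is a minimal userspace sketch, not the kernel code: the function names are hypothetical, and the multiplication form is only one way to avoid truncation (the commit's actual condition may differ).

```c
#include <stdbool.h>
#include <stddef.h>

/* Original-style check: the quotient is truncated before the comparison,
 * so a true ratio of 4.5 is seen as 4 and passes. */
static bool ratio_exceeded_truncating(size_t truesize, size_t datasize)
{
	return truesize / datasize > 4;
}

/* Multiplication form (an assumption, not the commit's code): no division,
 * so every ratio strictly above 4 is caught. */
static bool ratio_exceeded_exact(size_t truesize, size_t datasize)
{
	return truesize > 4 * datasize;
}
```

With truesize 4500 and datasize 1000, the truncating check does not fire while the exact one does.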
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: cls_u32: fix cls_u32 on filter replace · eb53f7af
      Ivan Vecera authored
      The following sequence is currently broken:
      
       # tc qdisc add dev foo ingress
       # tc filter replace dev foo protocol all ingress \
         u32 match u8 0 0 action mirred egress mirror dev bar1
       # tc filter replace dev foo protocol all ingress \
         handle 800::800 pref 49152 \
         u32 match u8 0 0 action mirred egress mirror dev bar2
       Error: cls_u32: Key node flags do not match passed flags.
       We have an error talking to the kernel, -1
      
      The error comes from u32_change() when comparing new and
      existing flags. The existing flags always contain one of the
      TCA_CLS_FLAGS_{,NOT}_IN_HW flags, depending on offloading state.
      These flags cannot be passed from userspace, so the check
      (n->flags != flags) in u32_change() always rejects the request.
      
      Fix the condition so the flags TCA_CLS_FLAGS_NOT_IN_HW and
      TCA_CLS_FLAGS_IN_HW are not taken into account.
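
The idea of ignoring the status bits can be sketched as a masked comparison. This is a userspace illustration under assumptions: the helper name is hypothetical and the bit values are illustrative stand-ins for the kernel UAPI definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values; the real definitions live in the kernel UAPI headers. */
#define TCA_CLS_FLAGS_SKIP_HW   (1u << 0)
#define TCA_CLS_FLAGS_SKIP_SW   (1u << 1)
#define TCA_CLS_FLAGS_IN_HW     (1u << 2)
#define TCA_CLS_FLAGS_NOT_IN_HW (1u << 3)

/* Hypothetical helper: compare node flags with user-supplied flags while
 * ignoring the kernel-maintained offload status bits. */
static bool user_flags_match(uint32_t node_flags, uint32_t user_flags)
{
	const uint32_t status = TCA_CLS_FLAGS_IN_HW | TCA_CLS_FLAGS_NOT_IN_HW;

	return (node_flags & ~status) == user_flags;
}
```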
      
      Fixes: 24d3dc6d ("net/sched: cls_u32: Reflect HW offload status")
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mpls, nospec: Sanitize array index in mpls_label_ok() · 3968523f
      Dan Williams authored
      mpls_label_ok() validates that the 'platform_label' array index from a
      userspace netlink message payload is valid. Under speculation the
      mpls_label_ok() result may not resolve in the CPU pipeline until after
      the index is used to access an array element. Sanitize the index to zero
      to prevent userspace-controlled arbitrary out-of-bounds speculation, a
      precursor for a speculative execution side channel vulnerability.
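
Sanitizing an index "to zero" is typically done with a branch-free mask rather than a conditional. The following is a userspace sketch of that masking idea only; the function names are hypothetical, it assumes size <= LONG_MAX, and it relies on the compiler using an arithmetic right shift for negative values (which GCC and Clang do). In the kernel this job belongs to array_index_nospec().

```c
#include <limits.h>
#include <stddef.h>

/* All-ones when index < size, zero otherwise, computed without a branch.
 * For index >= size, (size - 1 - index) wraps and sets the top bit, so the
 * complement is non-negative and the shift yields 0. */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/* Out-of-range indices collapse to 0; in-range indices pass through. */
static unsigned long clamp_index(unsigned long index, unsigned long size)
{
	return index & index_mask(index, size);
}
```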
      
      Cc: <stable@vger.kernel.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq management · ebeeb1ad
      Sowmini Varadhan authored
      
      An rds_connection can get added during netns deletion between lines 528
      and 529 of
      
        506 static void rds_tcp_kill_sock(struct net *net)
        :
        /* code to pull out all the rds_connections that should be destroyed */
        :
        528         spin_unlock_irq(&rds_tcp_conn_lock);
        529         list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node)
        530                 rds_conn_destroy(tc->t_cpath->cp_conn);
      
      Such an rds_connection would miss the rds_conn_destroy()
      loop (that cancels all pending work) and (if it was scheduled
      after netns deletion) could trigger a use-after-free.
      
      A similar race window exists for the module unload path
      in rds_tcp_exit -> rds_tcp_destroy_conns.
      
      Concurrency with netns deletion (rds_tcp_kill_sock()) must be handled
      by checking check_net() before enqueuing new work or adding new
      connections.
      
      Concurrency with module-unload is handled by maintaining a module
      specific flag that is set at the start of the module exit function,
      and must be checked before enqueuing new work or adding new connections.
      
      This commit refactors existing RDS_DESTROY_PENDING checks added by
      commit 3db6e0d1 ("rds: use RCU to synchronize work-enqueue with
      connection teardown") and consolidates all the concurrency checks
      listed above into the function rds_destroy_pending().
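
The consolidation can be pictured as a single predicate that every enqueue path consults. This is a simplified userspace sketch: the struct, field, and function names are hypothetical, and the RCU protection the kernel code depends on is omitted.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state an RDS connection can observe. */
struct sketch_conn {
	bool netns_alive;      /* stands in for the check_net() result */
	bool module_unloading; /* flag set at the start of module exit */
};

/* One place that answers "is teardown in progress?". */
static bool sketch_destroy_pending(const struct sketch_conn *c)
{
	return !c->netns_alive || c->module_unloading;
}

/* Enqueue paths bail out instead of scheduling work on a dying connection. */
static bool sketch_queue_work(const struct sketch_conn *c, int *queued)
{
	if (sketch_destroy_pending(c))
		return false;
	(*queued)++;
	return true;
}
```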
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Whitelist the skbuff_head_cache "cb" field · 79a8a642
      Kees Cook authored
      Most callers of put_cmsg() use a "sizeof(foo)" for the length argument.
      Within put_cmsg(), a copy_to_user() call is made with a dynamic size, as a
      result of the cmsg header calculations. This means that hardened usercopy
      will examine the copy, even though it was technically a fixed size and
      should be implicitly whitelisted. All the put_cmsg() calls being built
      from values in skbuff_head_cache are coming out of the protocol-defined
      "cb" field, so whitelist this field entirely instead of creating per-use
      bounce buffers, for which there are concerns about performance.
      
      Original report was:
      
      Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLAB object 'skbuff_head_cache' (offset 64, size 16)!
      WARNING: CPU: 0 PID: 3663 at mm/usercopy.c:81 usercopy_warn+0xdb/0x100 mm/usercopy.c:76
      ...
       __check_heap_object+0x89/0xc0 mm/slab.c:4426
       check_heap_object mm/usercopy.c:236 [inline]
       __check_object_size+0x272/0x530 mm/usercopy.c:259
       check_object_size include/linux/thread_info.h:112 [inline]
       check_copy_size include/linux/thread_info.h:143 [inline]
       copy_to_user include/linux/uaccess.h:154 [inline]
       put_cmsg+0x233/0x3f0 net/core/scm.c:242
       sock_recv_errqueue+0x200/0x3e0 net/core/sock.c:2913
       packet_recvmsg+0xb2e/0x17a0 net/packet/af_packet.c:3296
       sock_recvmsg_nosec net/socket.c:803 [inline]
       sock_recvmsg+0xc9/0x110 net/socket.c:810
       ___sys_recvmsg+0x2a4/0x640 net/socket.c:2179
       __sys_recvmmsg+0x2a9/0xaf0 net/socket.c:2287
       SYSC_recvmmsg net/socket.c:2368 [inline]
       SyS_recvmmsg+0xc4/0x160 net/socket.c:2352
       entry_SYSCALL_64_fastpath+0x29/0xa0
      
      Reported-by: syzbot+e2d6cfb305e9f3911dea@syzkaller.appspotmail.com
      Fixes: 6d07d1cd ("usercopy: Restrict non-usercopy caches to size 0")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Extra '_get' in declaration of arch_get_platform_mac_address · e728789c
      Mathieu Malaterre authored
      In commit c7f5d105 ("net: Add eth_platform_get_mac_address() helper."),
      two declarations were added:
      
        int eth_platform_get_mac_address(struct device *dev, u8 *mac_addr);
        unsigned char *arch_get_platform_get_mac_address(void);
      
      An extra '_get' was introduced in arch_get_platform_get_mac_address, remove
      it. Fix compile warning using W=1:
      
        CC      net/ethernet/eth.o
      net/ethernet/eth.c:523:24: warning: no previous prototype for ‘arch_get_platform_mac_address’ [-Wmissing-prototypes]
       unsigned char * __weak arch_get_platform_mac_address(void)
                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        AR      net/ethernet/built-in.o
      Signed-off-by: Mathieu Malaterre <malat@debian.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: queue reset when CRQ gets closed during reset · ec95dffa
      Nathan Fontenot authored
      While handling a driver reset, we may get an H_CLOSED return when trying
      to send a CRQ event. When this occurs, we need to queue up another
      reset attempt. Without doing this, we see instances where the driver
      is left in a closed state because the reset failed and there are no
      further attempts to reset the driver.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • atm: he: use 64-bit arithmetic instead of 32-bit · 583133b3
      Gustavo A. R. Silva authored
      Add suffix ULL to constants 272, 204, 136 and 68 in order to give the
      compiler complete information about the proper arithmetic to use.
      Notice that these constants are used in contexts that expect
      expressions of type unsigned long long (64 bits, unsigned).
      
      The following expressions are currently being evaluated using 32-bit
      arithmetic:
      
      272 * mult
      204 * mult
      136 * mult
      68 * mult
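
The effect of the ULL suffix can be shown in isolation. This sketch assumes mult is a 32-bit unsigned value and that int is 32 bits wide; the function names are hypothetical.

```c
#include <stdint.h>

/* Without a suffix, the product is computed in 32 bits and wraps before it
 * is widened to the 64-bit return type. */
static uint64_t scaled_32bit(uint32_t mult)
{
	return 272 * mult;
}

/* With the ULL suffix, mult is promoted first and the product is 64-bit. */
static uint64_t scaled_64bit(uint32_t mult)
{
	return 272ULL * mult;
}
```

For mult = 0x1000000 the 32-bit version yields 0x10000000, while the 64-bit version yields 0x110000000.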
      
      Addresses-Coverity-ID: 201058
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rtnetlink: require unique netns identifier · 4ff66cae
      Christian Brauner authored
      Since we've added support for IFLA_IF_NETNSID for RTM_{DEL,GET,SET,NEW}LINK
      it is possible for userspace to send us requests with three different
      properties to identify a target network namespace. This affects at least
      RTM_{NEW,SET}LINK. Each of them could potentially refer to a different
      network namespace which is confusing. For legacy reasons the kernel will
      pick the IFLA_NET_NS_PID property first and then look for the
      IFLA_NET_NS_FD property but there is no reason to extend this type of
      behavior to network namespace ids. The regression potential is quite
      minimal since the rtnetlink requests in question either won't allow
      IFLA_IF_NETNSID requests before 4.16 is out (RTM_{NEW,SET}LINK) or don't
      support IFLA_NET_NS_{PID,FD} (RTM_{DEL,GET}LINK) in the first place.
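
A uniqueness requirement of this kind can be sketched as counting how many identifying attributes are present. This is an illustration under assumptions: the struct and function are hypothetical, it assumes that any combination of more than one identifier is rejected, and the -EINVAL return value is a guess rather than something stated in the commit message.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of which netns attributes a request carried. */
struct netns_attrs {
	bool has_pid;     /* IFLA_NET_NS_PID present */
	bool has_fd;      /* IFLA_NET_NS_FD present */
	bool has_netnsid; /* IFLA_IF_NETNSID present */
};

/* Accept zero or one identifier; reject ambiguous requests. */
static int ensure_unique_netns(const struct netns_attrs *a)
{
	int n = a->has_pid + a->has_fd + a->has_netnsid;

	return n > 1 ? -EINVAL : 0;
}
```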
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Acked-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tuntap: add missing xdp flush · 762c330d
      Jason Wang authored
      When using devmap to redirect packets between interfaces,
      xdp_do_flush() is usually required to flush any batched
      packets. Unfortunately, this is missing in the current tuntap
      implementation.
      
      Unlike most hardware drivers, which run XDP inside the NAPI loop and
      call xdp_do_flush() at the end of each round of polling, TAP runs it
      in process context, e.g. in tun_get_user(). Fix this by counting the
      pending redirected packets and flushing when the count exceeds
      NAPI_POLL_WEIGHT or when MSG_MORE is cleared by the sendmsg() caller.
      
      With this fix, xdp_redirect_map works again between two TAPs.
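
The counting scheme can be sketched as follows. This is a userspace illustration, not the driver code: the struct and function names are hypothetical, and the flush is represented by a counter where the real code would call the XDP flush helper.

```c
#include <stdbool.h>

#define SKETCH_POLL_WEIGHT 64 /* NAPI_POLL_WEIGHT is 64 in the kernel */

/* Hypothetical per-queue state. */
struct sketch_tfile {
	unsigned int xdp_pending; /* redirected packets not yet flushed */
	unsigned int flushes;     /* how many times a flush was issued */
};

/* Called after each redirected packet; 'more' mirrors MSG_MORE. */
static void sketch_after_redirect(struct sketch_tfile *t, bool more)
{
	t->xdp_pending++;
	if (t->xdp_pending > SKETCH_POLL_WEIGHT || !more) {
		t->flushes++; /* where the XDP flush would be called */
		t->xdp_pending = 0;
	}
}
```

A sender that keeps MSG_MORE set still gets a flush once the batch grows past the poll weight, and the last packet of a burst (MSG_MORE cleared) always flushes.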
      
      Fixes: 761876c8 ("tap: XDP support")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: ensure to loop over all netns in genlmsg_multicast_allns() · cb9f7a9a
      Nicolas Dichtel authored
      Nowadays, nlmsg_multicast() returns only 0 or -ESRCH, but this was not the
      case when commit 134e6375 was pushed.
      However, there was no reason to stop the loop if a netns does not have
      listeners.
      Return -ESRCH only if there were no listeners in any netns.
      
      To avoid having the same problem in the future, I did not assume
      that nlmsg_multicast() returns only 0 or -ESRCH.
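
The intended loop behaviour can be sketched over an array of per-netns results. This is a userspace illustration with a hypothetical function name; each array entry stands in for the value nlmsg_multicast() would return in one netns, and treating any other error as fatal is an assumption drawn from the last paragraph.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* results[i] stands in for the nlmsg_multicast() return value in netns i. */
static int multicast_allns(const int *results, size_t n)
{
	bool delivered = false;
	size_t i;

	for (i = 0; i < n; i++) {
		if (results[i] == 0)
			delivered = true;       /* at least one listener */
		else if (results[i] != -ESRCH)
			return results[i];      /* unexpected error: stop */
		/* -ESRCH: no listeners here, keep looping */
	}
	return delivered ? 0 : -ESRCH;
}
```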
      
      Fixes: 134e6375 ("genetlink: make netns aware")
      CC: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Don't put crypto buffers on the stack · 8c2f826d
      David Howells authored
      Don't put buffers of data to be handed to crypto on the stack as this may
      cause an assertion failure in the kernel (see below).  Fix this by using a
      kmalloc'd buffer instead.
      
      kernel BUG at ./include/linux/scatterlist.h:147!
      ...
      RIP: 0010:rxkad_encrypt_response.isra.6+0x191/0x1b0 [rxrpc]
      RSP: 0018:ffffbe2fc06cfca8 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff989277d59900 RCX: 0000000000000028
      RDX: 0000259dc06cfd88 RSI: 0000000000000025 RDI: ffffbe30406cfd88
      RBP: ffffbe2fc06cfd60 R08: ffffbe2fc06cfd08 R09: ffffbe2fc06cfd08
      R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff7c5f80d9f95
      R13: ffffbe2fc06cfd88 R14: ffff98927a3f7aa0 R15: ffffbe2fc06cfd08
      FS:  0000000000000000(0000) GS:ffff98927fc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055b1ff28f0f8 CR3: 000000001b412003 CR4: 00000000003606f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       rxkad_respond_to_challenge+0x297/0x330 [rxrpc]
       rxrpc_process_connection+0xd1/0x690 [rxrpc]
       ? process_one_work+0x1c3/0x680
       ? __lock_is_held+0x59/0xa0
       process_one_work+0x249/0x680
       worker_thread+0x3a/0x390
       ? process_one_work+0x680/0x680
       kthread+0x121/0x140
       ? kthread_create_worker_on_cpu+0x70/0x70
       ret_from_fork+0x3a/0x50
      Reported-by: Jonathan Billings <jsbillings@jsbillings.org>
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Jonathan Billings <jsbillings@jsbillings.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'nfp-fix-disabling-TC-offloads-in-flower-max-TSO-segs-and-module-version' · c7025586
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      nfp: fix disabling TC offloads in flower, max TSO segs and module version
      
      This set corrects the way nfp deals with the NETIF_F_HW_TC flag.
      It has slipped the review that flower offload does not currently
      refuse disabling this flag when filter offload is active.
      
      nfp's flower offload does not actually keep track of how many filters
      for each port are offloaded.  The accounting of the number of filters
      is added to the nfp core structures, and BPF moved to use these
      structures as well.
      
      If users are allowed to disable TC offloads while filters are active,
      not only is it incorrect behaviour, but actually the NFP will never
      be told to remove the flows, leading to use-after-free when stats
      arrive.
      
      Fourth patch makes sure we declare the max number of TSO segments.
      FW should drop longer packets cleanly (otherwise this would be a
      security problem for untrusted VFs) but dropping longer TSO frames
      is not nice and driver should prevent them from being generated.
      
      Last small addition populates MODULE_VERSION with kernel version.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: populate MODULE_VERSION · 1a5e8e35
      Jakub Kicinski authored
      DKMS and similar out-of-tree module replacement services use
      module version to make sure the out-of-tree software is not
      older than the module shipped with the kernel.  We use the
      kernel version in ethtool -i output, put it into MODULE_VERSION
      as well.
      Reported-by: Jan Gutter <jan.gutter@netronome.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: limit the number of TSO segments · 0d592e52
      Jakub Kicinski authored
      Most FWs limit the number of TSO segments a frame can produce
      to 64.  This is for fairness and efficiency (of FW datapath)
      reasons.  If a frame with larger number of segments is submitted
      the FW will drop it.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: forbid disabling hw-tc-offload on representors while offload active · d692403e
      Jakub Kicinski authored
      All netdevs which can accept TC offloads must implement
      .ndo_set_features().  nfp_reprs currently do not do that, which
      means hw-tc-offload can be turned on and off even when offloads
      are active.
      
      Whether the offloads are active is really a question to nfp_ports,
      so remove the per-app tc_busy callback indirection thing, and
      simply count the number of offloaded items in nfp_port structure.
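
Counting offloaded items per port makes the feature check simple. This is a userspace sketch under assumptions: the struct, the flag constant, and the function are hypothetical stand-ins, and returning -EBUSY is an assumed choice of error code.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_F_HW_TC (1u << 0) /* stands in for NETIF_F_HW_TC */

/* Hypothetical port state: active offload count and current features. */
struct sketch_port {
	unsigned int tc_offload_cnt;
	uint32_t features;
};

/* Refuse to clear the TC offload flag while offloads are still installed. */
static int sketch_set_features(struct sketch_port *p, uint32_t wanted)
{
	int disabling = (p->features & SKETCH_F_HW_TC) &&
			!(wanted & SKETCH_F_HW_TC);

	if (disabling && p->tc_offload_cnt)
		return -EBUSY;
	p->features = wanted;
	return 0;
}
```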
      
      Fixes: 8a276873 ("nfp: provide infrastructure for offloading flower based TC filters")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Tested-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: don't advertise hw-tc-offload on non-port netdevs · 0b9de4ca
      Jakub Kicinski authored
      nfp_port is a structure which represents an ASIC port, either a
      PCIe vNIC (on a PF or a VF) or the external MAC port.  vNIC
      netdev (struct nfp_net) and pure representor netdev (struct
      nfp_repr) both have a pointer to this structure.  nfp_reprs
      always have a port associated.  nfp_nets, however, only represent
      a device port in legacy mode, where they are considered the
      MAC port. In switchdev mode they are just the CPU's side of
      the PCIe link.
      
      By definition TC offloads only apply to device ports.  Don't
      set the flag on vNICs without a port (i.e. in switchdev mode).
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Tested-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: bpf: require ETH table · e3ac6c07
      Jakub Kicinski authored
      Upcoming changes will require all netdevs supporting TC offloads
      to have a full struct nfp_port.  Require those for BPF offload.
      The operation without management FW reporting information about
      Ethernet ports is something we only support for very old and very
      basic NIC firmwares anyway.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Tested-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: tracepoint: only call trace_tcp_send_reset with full socket · 5c487bb9
      Song Liu authored
      tracepoint tcp_send_reset requires a full socket to work. However, it
      may be called when in TCP_TIME_WAIT:
      
              case TCP_TW_RST:
                      tcp_v6_send_reset(sk, skb);
                      inet_twsk_deschedule_put(inet_twsk(sk));
                      goto discard_it;
      
      To avoid this problem, this patch checks the socket with sk_fullsock()
      before calling trace_tcp_send_reset().
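
The guard can be sketched as a state check in front of the trace call. This is a userspace illustration: the struct, the counter, and the function names are hypothetical, and the state constants mirror the kernel's TCP state numbering as an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed to match the kernel's TCP state values. */
enum {
	SKETCH_TCP_ESTABLISHED = 1,
	SKETCH_TCP_TIME_WAIT = 6,
	SKETCH_TCP_NEW_SYN_RECV = 12,
};

struct sketch_sock { int state; };

static int sketch_trace_calls; /* counts simulated tracepoint hits */

/* Mini sockets (TIME_WAIT, NEW_SYN_RECV) lack the fields a full socket has. */
static bool sketch_fullsock(const struct sketch_sock *sk)
{
	return sk->state != SKETCH_TCP_TIME_WAIT &&
	       sk->state != SKETCH_TCP_NEW_SYN_RECV;
}

/* The reset itself would be sent in every case; only tracing is guarded. */
static void sketch_send_reset(const struct sketch_sock *sk)
{
	if (sk != NULL && sketch_fullsock(sk))
		sketch_trace_calls++;
}
```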
      
      Fixes: c24b14c4 ("tcp: add tracepoint trace_tcp_send_reset")
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Reviewed-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sch_netem: Bug fixing in calculating Netem interval · 043e337f
      Md. Islam authored
      In Kernel 4.15.0+, Netem does not work properly.
      
      Netem setup:
      
      tc qdisc add dev h1-eth0 root handle 1: netem delay 10ms 2ms
      
      Result:
      
      PING 172.16.101.2 (172.16.101.2) 56(84) bytes of data.
      64 bytes from 172.16.101.2: icmp_seq=1 ttl=64 time=22.8 ms
      64 bytes from 172.16.101.2: icmp_seq=2 ttl=64 time=10.9 ms
      64 bytes from 172.16.101.2: icmp_seq=3 ttl=64 time=10.9 ms
      64 bytes from 172.16.101.2: icmp_seq=5 ttl=64 time=11.4 ms
      64 bytes from 172.16.101.2: icmp_seq=6 ttl=64 time=11.8 ms
      64 bytes from 172.16.101.2: icmp_seq=4 ttl=64 time=4303 ms
      64 bytes from 172.16.101.2: icmp_seq=10 ttl=64 time=11.2 ms
      64 bytes from 172.16.101.2: icmp_seq=11 ttl=64 time=10.3 ms
      64 bytes from 172.16.101.2: icmp_seq=7 ttl=64 time=4304 ms
      64 bytes from 172.16.101.2: icmp_seq=8 ttl=64 time=4303 ms
      
      Patch:
      
      (rnd % (2 * sigma)) - sigma was overflowing s32. After applying the
      patch, I got the following output, which is the desired behavior.
      
      PING 172.16.101.2 (172.16.101.2) 56(84) bytes of data.
      64 bytes from 172.16.101.2: icmp_seq=1 ttl=64 time=21.1 ms
      64 bytes from 172.16.101.2: icmp_seq=2 ttl=64 time=8.46 ms
      64 bytes from 172.16.101.2: icmp_seq=3 ttl=64 time=9.00 ms
      64 bytes from 172.16.101.2: icmp_seq=4 ttl=64 time=11.8 ms
      64 bytes from 172.16.101.2: icmp_seq=5 ttl=64 time=8.36 ms
      64 bytes from 172.16.101.2: icmp_seq=6 ttl=64 time=11.8 ms
      64 bytes from 172.16.101.2: icmp_seq=7 ttl=64 time=8.11 ms
      64 bytes from 172.16.101.2: icmp_seq=8 ttl=64 time=10.0 ms
      64 bytes from 172.16.101.2: icmp_seq=9 ttl=64 time=11.3 ms
      64 bytes from 172.16.101.2: icmp_seq=10 ttl=64 time=11.5 ms
      64 bytes from 172.16.101.2: icmp_seq=11 ttl=64 time=10.2 ms
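
One plausible reading of the overflow can be reproduced in userspace. The sketch below assumes a 32-bit unsigned rnd, a 32-bit signed sigma, and a 64-bit signed mean mu, all in nanoseconds; these types, the function names, and the reordered expression are assumptions, not quotes from the patch. When the subtraction happens in 32-bit unsigned arithmetic, a negative jitter wraps to a value close to 2^32 ns (about 4.29 s), which is in line with the 4303 ms outliers shown earlier.

```c
#include <stdint.h>

/* Subtraction is done in 32-bit unsigned arithmetic, so a result that
 * should be negative wraps before mu is added. */
static int64_t jitter_buggy(uint32_t rnd, int32_t sigma, int64_t mu)
{
	return (rnd % (2 * sigma)) - sigma + mu;
}

/* Adding the 64-bit mean first widens the expression, so the subtraction
 * of sigma is performed in signed 64-bit arithmetic. */
static int64_t jitter_fixed(uint32_t rnd, int32_t sigma, int64_t mu)
{
	return ((rnd % (2 * sigma)) + mu) - sigma;
}
```

With mu = 10 ms, sigma = 2 ms and rnd = 1000000, the first form gives 4303967296 ns while the second gives 9000000 ns.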
      Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: ti: cpsw: fix net watchdog timeout · 62f94c21
      Grygorii Strashko authored
      It was discovered that a simple program which indefinitely sends 200b UDP
      packets and runs on a TI AM574x SoC (SMP) under an RT kernel triggers a
      network watchdog timeout in the TI CPSW driver (<6 hours run). The network
      watchdog timeout is triggered by a race between cpsw_ndo_start_xmit() and
      cpsw_tx_handler() [NAPI]:
      
      cpsw_ndo_start_xmit()
      	if (unlikely(!cpdma_check_free_tx_desc(txch))) {
      		txq = netdev_get_tx_queue(ndev, q_idx);
      		netif_tx_stop_queue(txq);
      
      ^^ as per [1], a barrier has to be used after set_bit(), otherwise the new
      value might not be visible to other CPUs
      	}
      
      cpsw_tx_handler()
      	if (unlikely(netif_tx_queue_stopped(txq)))
      		netif_tx_wake_queue(txq);
      
      When this happens, the ndev TX queue stays disabled forever while the
      driver's HW TX queue is empty.
      
      Fix this by adding smp_mb__after_atomic() after netif_tx_stop_queue()
      calls and double-checking for free TX descriptors after stopping the ndev
      TX queue: if there are free TX descriptors, wake up the ndev TX queue.
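
The stop, barrier, re-check pattern can be sketched with C11 atomics. This is a userspace analogy only: the struct and function names are hypothetical, the sequentially consistent fence stands in for smp_mb__after_atomic(), and the sketch shows the control flow rather than proving the ordering.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical TX queue state shared between the xmit and completion paths. */
struct sketch_txq {
	atomic_bool stopped;
	atomic_int free_desc;
};

/* Tail of the transmit path: stop the queue when out of descriptors, then
 * look again in case the completion path freed some in the meantime. */
static void sketch_xmit_tail(struct sketch_txq *q)
{
	if (atomic_load(&q->free_desc) > 0)
		return;

	atomic_store_explicit(&q->stopped, true, memory_order_relaxed);
	/* Make the stopped state visible before re-reading the descriptors. */
	atomic_thread_fence(memory_order_seq_cst);

	if (atomic_load(&q->free_desc) > 0)
		atomic_store(&q->stopped, false); /* wake the queue again */
}
```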
      
      [1] https://www.kernel.org/doc/html/latest/core-api/atomic_ops.html
      
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Reviewed-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Ensure that buffers are NULL after free · b0992eca
      Thomas Falcon authored
      This change will guard against a double free in the case that the
      buffers were previously freed at some other time, such as during
      a device reset. It resolves a kernel oops that occurred when changing
      the VNIC device's MTU.
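
The guard follows a familiar idiom: clear the pointer right after freeing it so that a second release becomes harmless. A minimal userspace sketch, with a hypothetical struct and function name and free() standing in for the kernel allocator:

```c
#include <stdlib.h>

/* Hypothetical buffer holder. */
struct sketch_pool {
	void *buf;
};

/* Safe to call more than once: free(NULL) is a no-op. */
static void sketch_release(struct sketch_pool *p)
{
	free(p->buf);
	p->buf = NULL; /* a later release cannot free the same memory again */
}
```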
      Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Fix rx queue cleanup for non-fatal resets · 3468656f
      John Allen authored
      At some point, a check was added to exit the polling routine during resets.
      This makes sense for most reset conditions, but for a non-fatal error, we
      expect the polling routine to continue running to properly clean up the rx
      queues. This patch checks if we are performing a non-fatal reset and if we
      are, continues normal polling operation.
      Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • i40e: Fix the number of queues available to be mapped for use · bc6d33c8
      Amritha Nambiar authored
      Fix the number of queues per enabled TC and report available queues
      to the kernel without having to limit them to the max RSS limit so
      they are available to be mapped for XPS. This allows a queue per
      processing thread available for handling traffic for the given
      traffic class.
      Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: onlink nexthop checks should default to main table · 44750f84
      David Ahern authored
      Because of differences in how IPv4 and IPv6 handle fib lookups,
      verification of nexthops with the onlink flag needs to default to the main
      table rather than the local table used by IPv4. As it stands an
      address within a connected route on device 1 can be used with
      onlink on device 2. Updating the table properly rejects the route
      due to the egress device mismatch.
      
      Update the extack message as well to show it could be a device
      mismatch for the nexthop spec.
      
      Fixes: fc1e64e1 ("net/ipv6: Add support for onlink flag")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: Handle reject routes with onlink flag · 58e354c0
      David Ahern authored
      Verification of nexthops with the onlink flag needs to handle unreachable
      routes. The lookup is only intended to validate that the gateway address
      is not a local address and that, if the gateway resolves, the egress
      device matches the given device. Hence, hitting any default reject route
      is ok.
      
      Fixes: fc1e64e1 ("net/ipv6: Add support for onlink flag")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sun: Add SPDX license tags to Sun network drivers · c861ef83
      Shannon Nelson authored
      Add the appropriate SPDX license tags to the Sun network drivers
      as outlined in Documentation/process/license-rules.rst.
      Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
      Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Fix received abort handling · 17e9e23b
      David Howells authored
      AF_RXRPC is incorrectly sending back to the server any abort it receives
      for a client connection.  This is due to the final-ACK offload to the
      connection event processor patch.  The abort code is copied into the
      last-call information on the connection channel and then the event
      processor is set.
      
      Instead, the following should be done:
      
       (1) In the case of a final-ACK for a successful call, the ACK should be
           scheduled as before.
      
       (2) In the case of a locally generated ABORT, the ABORT details should be
           cached for sending in response to further packets related to that
           call and no further action scheduled at call disconnect time.
      
       (3) In the case of an ACK received from the peer, the call should be
           considered dead, no ABORT should be transmitted at this time.  In
           response to further non-ABORT packets from the peer relating to this
           call, an RX_USER_ABORT ABORT should be transmitted.
      
       (4) In the case of a call killed due to network error, an RX_USER_ABORT
           ABORT should be cached for transmission in response to further
           packets, but no ABORT should be sent at this time.
      
      Fixes: 3136ef49 ("rxrpc: Delay terminal ACK transmission on a client call")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: Fix error handling path in 'init_one()' · e729452e
      Christophe JAILLET authored
      Commit baf50868 ("cxgb4: restructure VF mgmt code") has reordered
      some code but an error handling label has not been updated accordingly.
      So fix it and free 'adapter' if 't4_wait_dev_ready()' fails.
      
      Fixes: baf50868 ("cxgb4: restructure VF mgmt code")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 07 Feb, 2018 11 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 4d80ecdb
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for your net tree, they
      are:
      
      1) Restore __GFP_NORETRY in xt_table allocations to mitigate effects of
         large memory allocation requests, from Michal Hocko.
      
      2) Release IPv6 fragment queue in case of error in fragmentation header,
         this is a follow up to amend patch 83f1999c, from Subash Abhinov
         Kasiviswanathan.
      
      3) Flowtable infrastructure depends on NETFILTER_INGRESS as it registers
         a hook for each flowtable, reported by John Crispin.
      
      4) Missing initialization of info->priv in xt_cgroup version 1, from
         Cong Wang.
      
      5) Give the garbage collector a chance to run after scheduling
         flowtable cleanup.
      
      6) Releasing flowtable content on nft_flow_offload module removal is
         not required at all: there are no dependencies between this module
         and flowtables, so remove it.
      
      7) Fix missing xt_rateest_mutex grabbing for hash insertions, also from
         Cong Wang.
      
      8) Move nf_flow_table_cleanup() routine to flowtable core, this patch is
         a dependency for the next patch in this list.
      
      9) Flowtable resources are not properly released on removal from the
         control plane. Fix this resource leak by scheduling removal of all
         entries and making an explicit call to the garbage collector.
      
      10) The nf_ct_nat_offset() declaration is dead code: this function
          prototype is not used anywhere, so remove it. From Taehee Yoo.
      
      11) Fix another flowtable resource leak on entry insertion failures,
          this patch also fixes a possible use-after-free. Patch from Felix
          Fietkau.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4d80ecdb
    • Felix Fietkau's avatar
      netfilter: nf_flow_offload: fix use-after-free and a resource leak · 0ff90b6c
      Felix Fietkau authored
      flow_offload_del frees the flow, so all associated resources must be
      freed before it is called.
      
      Since the ct entry in struct flow_offload_entry was allocated by
      flow_offload_alloc, it should be freed by flow_offload_free to take care
      of the error handling path when flow_offload_add fails.
      
      While at it, make flow_offload_del static, since it should never be
      called directly, only from the gc step.
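
The ownership rule described above can be sketched in plain C. This is an illustration under stated assumptions, not the nf_flow_table code: `struct conn`, `struct flow_entry`, `flow_entry_alloc()`, `flow_entry_free()`, `flow_entry_add()` and `offload_sketch()` are hypothetical names. The idea is that the routine pairing with the allocator drops every resource the entry holds before freeing the entry, and the failure path of the insertion step goes through that same routine.

```c
#include <stdlib.h>

/* Hypothetical types; not the kernel's conntrack or flow_offload structures. */
struct conn {
    int refcount;
};

struct flow_entry {
    struct conn *ct; /* reference taken at allocation time */
};

/* Allocates an entry and takes a reference on the connection. */
struct flow_entry *flow_entry_alloc(struct conn *ct)
{
    struct flow_entry *e = malloc(sizeof(*e));

    if (!e)
        return NULL;
    ct->refcount++;
    e->ct = ct;
    return e;
}

/*
 * Counterpart of flow_entry_alloc(): drop the reference first, then free
 * the entry. Touching e->ct after free(e) would be a use-after-free.
 */
void flow_entry_free(struct flow_entry *e)
{
    e->ct->refcount--;
    free(e);
}

/* Hypothetical insertion step that can fail. */
int flow_entry_add(struct flow_entry *e, int simulate_failure)
{
    (void)e;
    return simulate_failure ? -1 : 0;
}

/* Returns the inserted entry, or NULL with nothing leaked on failure. */
struct flow_entry *offload_sketch(struct conn *ct, int simulate_failure)
{
    struct flow_entry *e = flow_entry_alloc(ct);

    if (!e)
        return NULL;
    if (flow_entry_add(e, simulate_failure) < 0) {
        flow_entry_free(e); /* releases the reference and the entry */
        return NULL;
    }
    return e;
}
```

Freeing the entry with a bare `free()` on the failure path would leave the connection's reference count raised, which is the leak pattern this sketch is meant to show.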
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      0ff90b6c
    • Taehee Yoo's avatar
      netfilter: remove useless prototype · d8ed9600
      Taehee Yoo authored
      The nf_ct_nat_offset prototype is not used anymore.
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      d8ed9600
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · a2e5790d
      Linus Torvalds authored
      Merge misc updates from Andrew Morton:
      
       - kasan updates
      
       - procfs
      
       - lib/bitmap updates
      
       - other lib/ updates
      
       - checkpatch tweaks
      
       - rapidio
      
       - ubsan
      
       - pipe fixes and cleanups
      
       - lots of other misc bits
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits)
        Documentation/sysctl/user.txt: fix typo
        MAINTAINERS: update ARM/QUALCOMM SUPPORT patterns
        MAINTAINERS: update various PALM patterns
        MAINTAINERS: update "ARM/OXNAS platform support" patterns
        MAINTAINERS: update Cortina/Gemini patterns
        MAINTAINERS: remove ARM/CLKDEV SUPPORT file pattern
        MAINTAINERS: remove ANDROID ION pattern
        mm: docs: add blank lines to silence sphinx "Unexpected indentation" errors
        mm: docs: fix parameter names mismatch
        mm: docs: fixup punctuation
        pipe: read buffer limits atomically
        pipe: simplify round_pipe_size()
        pipe: reject F_SETPIPE_SZ with size over UINT_MAX
        pipe: fix off-by-one error when checking buffer limits
        pipe: actually allow root to exceed the pipe buffer limits
        pipe, sysctl: remove pipe_proc_fn()
        pipe, sysctl: drop 'min' parameter from pipe-max-size converter
        kasan: rework Kconfig settings
        crash_dump: is_kdump_kernel can be boolean
        kernel/mutex: mutex_is_locked can be boolean
        ...
      a2e5790d
    • Linus Torvalds's avatar
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ab2d92ad
      Linus Torvalds authored
      Pull scheduler updates from Ingo Molnar:
      
       - membarrier updates (Mathieu Desnoyers)
      
       - SMP balancing optimizations (Mel Gorman)
      
       - stats update optimizations (Peter Zijlstra)
      
       - RT scheduler race fixes (Steven Rostedt)
      
       - misc fixes and updates
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/fair: Use a recently used CPU as an idle candidate and the basis for SIS
        sched/fair: Do not migrate if the prev_cpu is idle
        sched/fair: Restructure wake_affine*() to return a CPU id
        sched/fair: Remove unnecessary parameters from wake_affine_idle()
        sched/rt: Make update_curr_rt() more accurate
        sched/rt: Up the root domain ref count when passing it around via IPIs
        sched/rt: Use container_of() to get root domain in rto_push_irq_work_func()
        sched/core: Optimize update_stats_*()
        sched/core: Optimize ttwu_stat()
        membarrier/selftest: Test private expedited sync core command
        membarrier/arm64: Provide core serializing command
        membarrier/x86: Provide core serializing command
        membarrier: Provide core serializing command, *_SYNC_CORE
        lockin/x86: Implement sync_core_before_usermode()
        locking: Introduce sync_core_before_usermode()
        membarrier/selftest: Test global expedited command
        membarrier: Provide GLOBAL_EXPEDITED command
        membarrier: Document scheduler barrier requirements
        powerpc, membarrier: Skip memory barrier in switch_mm()
        membarrier/selftest: Test private expedited command
      ab2d92ad
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4b0dda4f
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "Tooling fixes, plus add missing interval sampling to certain x86 PEBS
        events"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf tools: Add trace/beauty/generated/ into .gitignore
        perf trace: Fix call-graph output
        x86/events/intel/ds: Add PERF_SAMPLE_PERIOD into PEBS_FREERUNNING_FLAGS
        perf record: Fix period option handling
        perf evsel: Fix period/freq terms setup
        tools headers: Synchoronize x86 features UAPI headers
        tools headers: Synchronize uapi/linux/sched.h
        tools headers: Sync {tools/,}arch/powerpc/include/uapi/asm/kvm.h
        tooling headers: Synchronize updated s390 kvm UAPI headers
        tools headers: Synchronize sound/asound.h
      4b0dda4f
    • Linus Torvalds's avatar
      Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b3250aab
      Linus Torvalds authored
      Pull locking fixlets from Ingo Molnar:
       "An endianness fix and a jump labels branch hint update"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/qrwlock: include asm/byteorder.h as needed
        jump_label: Add branch hints to static_branch_{un,}likely()
      b3250aab
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 0dc400f4
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix error path in netdevsim, from Jakub Kicinski.
      
       2) Default values listed in tcp_wmem and tcp_rmem documentation were
          inaccurate, from Tonghao Zhang.
      
       3) Fix route leaks in SCTP, both for ipv4 and ipv6. From Alexey Kodanev
          and Tommi Rantala.
      
       4) Fix "MASK < Y" meant to be "MASK << Y" in xgbe driver, from Wolfram
          Sang.
      
       5) Use after free in u32_destroy_key(), from Paolo Abeni.
      
       6) Fix two TX issues in be2net driver, from Suredh Reddy.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (25 commits)
        be2net: Handle transmit completion errors in Lancer
        be2net: Fix HW stall issue in Lancer
        RDS: IB: Fix null pointer issue
        nfp: fix kdoc warnings on nested structures
        sample/bpf: fix erspan metadata
        net: erspan: fix erspan config overwrite
        net: erspan: fix metadata extraction
        cls_u32: fix use after free in u32_destroy_key()
        net: amd-xgbe: fix comparison to bitshift when dealing with a mask
        net: phy: Handle not having GPIO enabled in the kernel
        ibmvnic: fix empty firmware version and errors cleanup
        sctp: fix dst refcnt leak in sctp_v4_get_dst
        sctp: fix dst refcnt leak in sctp_v6_get_dst()
        dwc-xlgmac: remove Jie Deng as co-maintainer
        doc: Change the min default value of tcp_wmem/tcp_rmem.
        samples/bpf: use bpf_set_link_xdp_fd
        libbpf: add missing SPDX-License-Identifier
        libbpf: add error reporting in XDP
        libbpf: add function to setup XDP
        tools: add netlink.h and if_link.h in tools uapi
        ...
      0dc400f4
    • Kangmin Park's avatar
      60c3e026
    • Joe Perches's avatar
      MAINTAINERS: update ARM/QUALCOMM SUPPORT patterns · c1dad9ad
      Joe Perches authored
      Commit 32173741 ("tty: serial: msm: Move header file into driver")
      removed the .h file, update the patterns.
      
      Link: http://lkml.kernel.org/r/2b7478bc4c35ab3ac6b06b4edd3b645a8c34a4a2.1517147485.git.joe@perches.com
      Signed-off-by: Joe Perches <joe@perches.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Andy Gross <andy.gross@linaro.org>
      Cc: David Brown <david.brown@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c1dad9ad
    • Joe Perches's avatar
      MAINTAINERS: update various PALM patterns · c660f367
      Joe Perches authored
      Commit 4c25c5d2 ("ARM: pxa: make more mach/*.h files local") moved
      the files around, update the patterns.
      
      Link: http://lkml.kernel.org/r/a291f6f61e378a1f35e266fe4c5f646b9feeaa6a.1517147485.git.joe@perches.com
      Signed-off-by: Joe Perches <joe@perches.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Marek Vasut <marek.vasut@gmail.com>
      Cc: Tomas Cech <sleep_walker@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c660f367