1. 09 Aug, 2018 10 commits
    • Andrew Lunn's avatar
      dsa: slave: eee: Allow ports to use phylink · 1be52e97
      Andrew Lunn authored
      For a port to be able to use EEE, both the MAC and the PHY must
      support EEE. A phy can be provided by both a phydev or phylink. Verify
      at least one of these exist, not just phydev.
      
      Fixes: aab9c406 ("net: dsa: Plug in PHYLINK support")
      Signed-off-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1be52e97
    • David S. Miller's avatar
      Merge branch 'smc-fixes' · ef91b6f9
      David S. Miller authored
      Ursula Braun says:
      
      ====================
      net/smc: fixes 2018-08-08
      
      here are small fixes for SMC: The first patch makes sure, shutdown code
      is not executed for sockets in state SMC_LISTEN. The second patch resets
      send and receive buffer values for accepted sockets, since TCP buffer size
      optimizations for the internal CLC socket should not be forwarded to the
      outer SMC socket. The third patch solves a race between connect and ioctl
      reported by syzbot.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ef91b6f9
    • Ursula Braun's avatar
      net/smc: move sock lock in smc_ioctl() · 7311d665
      Ursula Braun authored
      When an SMC socket is connecting it is decided whether fallback to
      TCP is needed. To avoid races between connect and ioctl move the
      sock lock before the use_fallback check.
      
      Reported-by: syzbot+5b2cece1a8ecb2ca77d8@syzkaller.appspotmail.com
      Reported-by: syzbot+19557374321ca3710990@syzkaller.appspotmail.com
      Fixes: 1992d998 ("net/smc: take sock lock in smc_ioctl()")
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7311d665
    • Ursula Braun's avatar
      net/smc: allow sysctl rmem and wmem defaults for servers · bd58c7e0
      Ursula Braun authored
      Without setsockopt SO_SNDBUF and SO_RCVBUF settings, the sysctl
      defaults net.ipv4.tcp_wmem and net.ipv4.tcp_rmem should be the base
      for the sizes of the SMC sndbuf and rcvbuf. Any TCP buffer size
      optimizations for servers should be ignored.
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bd58c7e0
    • Ursula Braun's avatar
      net/smc: no shutdown in state SMC_LISTEN · caa21e19
      Ursula Braun authored
      Invoking shutdown for a socket in state SMC_LISTEN does not make
      sense. Nevertheless programs like syzbot fuzzing the kernel may
      try to do this. For SMC this means a socket refcounting problem.
      This patch makes sure a shutdown call for an SMC socket in state
      SMC_LISTEN simply returns with -ENOTCONN.
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      caa21e19
    • Dmitry Bogdanov's avatar
      net: aquantia: Fix IFF_ALLMULTI flag functionality · 11ba961c
      Dmitry Bogdanov authored
      It was noticed that NIC always pass all multicast traffic to the host
      regardless of IFF_ALLMULTI flag on the interface.
      The rule in MC Filter Table in NIC, that is configured to accept any
      multicast packets, is turning on if IFF_MULTICAST flag is set on the
      interface. It leads to passing all multicast traffic to the host.
      This fix changes the condition to turn on that rule by checking
      IFF_ALLMULTI flag as it should.
      
      Fixes: b21f502f ("net:ethernet:aquantia: Fix for multicast filter handling.")
      Signed-off-by: default avatarDmitry Bogdanov <dmitry.bogdanov@aquantia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      11ba961c
    • David Howells's avatar
      rxrpc: Fix the keepalive generator [ver #2] · 330bdcfa
      David Howells authored
      AF_RXRPC has a keepalive message generator that generates a message for a
      peer ~20s after the last transmission to that peer to keep firewall ports
      open.  The implementation is incorrect in the following ways:
      
       (1) It mixes up ktime_t and time64_t types.
      
       (2) It uses ktime_get_real(), the output of which may jump forward or
           backward due to adjustments to the time of day.
      
       (3) If the current time jumps forward too much or jumps backwards, the
           generator function will crank the base of the time ring round one slot
           at a time (ie. a 1s period) until it catches up, spewing out VERSION
           packets as it goes.
      
      Fix the problem by:
      
       (1) Only using time64_t.  There's no need for sub-second resolution.
      
       (2) Use ktime_get_seconds() rather than ktime_get_real() so that time
           isn't perceived to go backwards.
      
       (3) Simplifying rxrpc_peer_keepalive_worker() by splitting it into two
           parts:
      
           (a) The "worker" function that manages the buckets and the timer.
      
           (b) The "dispatch" function that takes the pending peers and
           	 potentially transmits a keepalive packet before putting them back
           	 in the ring into the slot appropriate to the revised last-Tx time.
      
       (4) Taking everything that's pending out of the ring and splicing it into
           a temporary collector list for processing.
      
           In the case that there's been a significant jump forward, the ring
           gets entirely emptied and then the time base can be warped forward
           before the peers are processed.
      
           The warping can't happen if the ring isn't empty because the slot a
           peer is in is keepalive-time dependent, relative to the base time.
      
       (5) Limit the number of iterations of the bucket array when scanning it.
      
       (6) Set the timer to skip any empty slots as there's no point waking up if
           there's nothing to do yet.
      
      This can be triggered by an incoming call from a server after a reboot with
      AF_RXRPC and AFS built into the kernel causing a peer record to be set up
      before userspace is started.  The system clock is then adjusted by
      userspace, thereby potentially causing the keepalive generator to have a
      meltdown - which leads to a message like:
      
      	watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:23]
      	...
      	Workqueue: krxrpcd rxrpc_peer_keepalive_worker
      	EIP: lock_acquire+0x69/0x80
      	...
      	Call Trace:
      	 ? rxrpc_peer_keepalive_worker+0x5e/0x350
      	 ? _raw_spin_lock_bh+0x29/0x60
      	 ? rxrpc_peer_keepalive_worker+0x5e/0x350
      	 ? rxrpc_peer_keepalive_worker+0x5e/0x350
      	 ? __lock_acquire+0x3d3/0x870
      	 ? process_one_work+0x110/0x340
      	 ? process_one_work+0x166/0x340
      	 ? process_one_work+0x110/0x340
      	 ? worker_thread+0x39/0x3c0
      	 ? kthread+0xdb/0x110
      	 ? cancel_delayed_work+0x90/0x90
      	 ? kthread_stop+0x70/0x70
      	 ? ret_from_fork+0x19/0x24
      
      Fixes: ace45bec ("rxrpc: Fix firewall route keepalive")
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      330bdcfa
    • David S. Miller's avatar
      Merge branch 'mlx5-fixes' · f39cc1c7
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      Mellanox, mlx5e fixes 2018-08-07
      
      I know it is late into 4.18 release, and this is why I am submitting
      only two mlx5e ethernet fixes.
      
      The first one from Or, is needed for -stable and it fixes hairpin
      for "same device" check.
      
      The second fix is a non risk fix from Huy which cleans up and improves
      error return value reporting for dcbnl_ieee_setapp.
      
      For -stable v4.16
      - net/mlx5e: Properly check if hairpin is possible between two functions
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f39cc1c7
    • Huy Nguyen's avatar
      net/mlx5e: Cleanup of dcbnl related fields · f280c6a1
      Huy Nguyen authored
      Remove unused netdev_registered_init/remove in en.h
      Return ENOSUPPORT if the check MLX5_DSCP_SUPPORTED fails.
      Remove extra white space
      
      Fixes: 2a5e7a13 ("net/mlx5e: Add dcbnl dscp to priority support")
      Signed-off-by: default avatarHuy Nguyen <huyn@mellanox.com>
      Cc: Yuval Shaia <yuval.shaia@oracle.com>
      Reviewed-by: default avatarParav Pandit <parav@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f280c6a1
    • Or Gerlitz's avatar
      net/mlx5e: Properly check if hairpin is possible between two functions · 816f6706
      Or Gerlitz authored
      The current check relies on function BDF addresses and can get
      us wrong e.g when two VFs are assigned into a VM and the PCI
      v-address is set by the hypervisor.
      
      Fixes: 5c65c564 ('net/mlx5e: Support offloading TC NIC hairpin flows')
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Reported-by: default avatarAlaa Hleihel <alaa@mellanox.com>
      Tested-by: default avatarAlaa Hleihel <alaa@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      816f6706
  2. 08 Aug, 2018 1 commit
  3. 07 Aug, 2018 6 commits
    • Cong Wang's avatar
      llc: use refcount_inc_not_zero() for llc_sap_find() · 0dcb8225
      Cong Wang authored
      llc_sap_put() decreases the refcnt before deleting sap
      from the global list. Therefore, there is a chance
      llc_sap_find() could find a sap with zero refcnt
      in this global list.
      
      Close this race condition by checking if refcnt is zero
      or not in llc_sap_find(), if it is zero then it is being
      removed so we can just treat it as gone.
      
      Reported-by: <syzbot+278893f3f7803871f7ce@syzkaller.appspotmail.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0dcb8225
    • Alexey Kodanev's avatar
      dccp: fix undefined behavior with 'cwnd' shift in ccid2_cwnd_restart() · 61ef4b07
      Alexey Kodanev authored
      The shift of 'cwnd' with '(now - hc->tx_lsndtime) / hc->tx_rto' value
      can lead to undefined behavior [1].
      
      In order to fix this use a gradual shift of the window with a 'while'
      loop, similar to what tcp_cwnd_restart() is doing.
      
      When comparing delta and RTO there is a minor difference between TCP
      and DCCP, the last one also invokes dccp_cwnd_restart() and reduces
      'cwnd' if delta equals RTO. That case is preserved in this change.
      
      [1]:
      [40850.963623] UBSAN: Undefined behaviour in net/dccp/ccids/ccid2.c:237:7
      [40851.043858] shift exponent 67 is too large for 32-bit type 'unsigned int'
      [40851.127163] CPU: 3 PID: 15940 Comm: netstress Tainted: G        W   E     4.18.0-rc7.x86_64 #1
      ...
      [40851.377176] Call Trace:
      [40851.408503]  dump_stack+0xf1/0x17b
      [40851.451331]  ? show_regs_print_info+0x5/0x5
      [40851.503555]  ubsan_epilogue+0x9/0x7c
      [40851.548363]  __ubsan_handle_shift_out_of_bounds+0x25b/0x2b4
      [40851.617109]  ? __ubsan_handle_load_invalid_value+0x18f/0x18f
      [40851.686796]  ? xfrm4_output_finish+0x80/0x80
      [40851.739827]  ? lock_downgrade+0x6d0/0x6d0
      [40851.789744]  ? xfrm4_prepare_output+0x160/0x160
      [40851.845912]  ? ip_queue_xmit+0x810/0x1db0
      [40851.895845]  ? ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp]
      [40851.963530]  ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp]
      [40852.029063]  dccp_xmit_packet+0x1d3/0x720 [dccp]
      [40852.086254]  dccp_write_xmit+0x116/0x1d0 [dccp]
      [40852.142412]  dccp_sendmsg+0x428/0xb20 [dccp]
      [40852.195454]  ? inet_dccp_listen+0x200/0x200 [dccp]
      [40852.254833]  ? sched_clock+0x5/0x10
      [40852.298508]  ? sched_clock+0x5/0x10
      [40852.342194]  ? inet_create+0xdf0/0xdf0
      [40852.388988]  sock_sendmsg+0xd9/0x160
      ...
      
      Fixes: 113ced1f ("dccp ccid-2: Perform congestion-window validation")
      Signed-off-by: default avatarAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      61ef4b07
    • Ying Xue's avatar
      tipc: fix an interrupt unsafe locking scenario · 37436d9c
      Ying Xue authored
      Commit 9faa89d4 ("tipc: make function tipc_net_finalize() thread
      safe") tries to make it thread safe to set node address, so it uses
      node_list_lock lock to serialize the whole process of setting node
      address in tipc_net_finalize(). But it causes the following interrupt
      unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        rht_deferred_worker()
        rhashtable_rehash_table()
        lock(&(&ht->lock)->rlock)
      			       tipc_nl_compat_doit()
                                     tipc_net_finalize()
                                     local_irq_disable();
                                     lock(&(&tn->node_list_lock)->rlock);
                                     tipc_sk_reinit()
                                     rhashtable_walk_enter()
                                     lock(&(&ht->lock)->rlock);
        <Interrupt>
        tipc_disc_rcv()
        tipc_node_check_dest()
        tipc_node_create()
        lock(&(&tn->node_list_lock)->rlock);
      
       *** DEADLOCK ***
      
      When rhashtable_rehash_table() holds ht->lock on CPU0, it doesn't
      disable BH. So if an interrupt happens after the lock, it can create
      an inverse lock ordering between ht->lock and tn->node_list_lock. As
      a consequence, deadlock might happen.
      
      The reason causing the inverse lock ordering scenario above is because
      the initial purpose of node_list_lock is not designed to do the
      serialization of node address setting.
      
      As cmpxchg() can guarantee CAS (compare-and-swap) process is atomic,
      we use it to replace node_list_lock to ensure setting node address can
      be atomically finished. It turns out the potential deadlock can be
      avoided as well.
      
      Fixes: 9faa89d4 ("tipc: make function tipc_net_finalize() thread safe")
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Acked-by: default avatarJon Maloy <maloy@donjonn.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      37436d9c
    • Cong Wang's avatar
      vsock: split dwork to avoid reinitializations · 455f05ec
      Cong Wang authored
      syzbot reported that we reinitialize an active delayed
      work in vsock_stream_connect():
      
      	ODEBUG: init active (active state 0) object type: timer_list hint:
      	delayed_work_timer_fn+0x0/0x90 kernel/workqueue.c:1414
      	WARNING: CPU: 1 PID: 11518 at lib/debugobjects.c:329
      	debug_print_object+0x16a/0x210 lib/debugobjects.c:326
      
      The pattern is apparently wrong, we should only initialize
      the dealyed work once and could repeatly schedule it. So we
      have to move out the initializations to allocation side.
      And to avoid confusion, we can split the shared dwork
      into two, instead of re-using the same one.
      
      Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
      Reported-by: <syzbot+8a9b1bd330476a4f3db6@syzkaller.appspotmail.com>
      Cc: Andy king <acking@vmware.com>
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: Jorgen Hansen <jhansen@vmware.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      455f05ec
    • Colin Ian King's avatar
      net: thunderx: check for failed allocation lmac->dmacs · a94cead7
      Colin Ian King authored
      The allocation of lmac->dmacs is not being checked for allocation
      failure. Add the check.
      
      Fixes: 3a34ecfd ("net: thunderx: add MAC address filter tracking for LMAC")
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a94cead7
    • Al Viro's avatar
      cxgb4: mk_act_open_req() buggers ->{local, peer}_ip on big-endian hosts · adfb442d
      Al Viro authored
      Unlike fs.val.lport and fs.val.fport, cxgb4_process_flow_match()
      sets fs.val.{l,f}ip to net-endian values without conversion - they come
      straight from flow_dissector_key_ipv4_addrs ->dst and ->src resp.  So
      the assignment in mk_act_open_req() ought to be a straight copy.
      
      	As far as I know, T4 PCIe cards do exist, so it's not as if that
      thing could only be found on little-endian systems...
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Acked-by: default avatarRahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      adfb442d
  4. 06 Aug, 2018 3 commits
  5. 05 Aug, 2018 11 commits
    • Linus Torvalds's avatar
      Linux 4.18-rc8 · 1ffaddd0
      Linus Torvalds authored
      1ffaddd0
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a8c19920
      Linus Torvalds authored
      Pull x86 fix from Thomas Gleixner:
       "A single fix, which addresses boot failures on machines which do not
        report EBDA correctly, which can place the trampoline into reserved
        memory regions. Validating against E820 prevents that"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/boot/compressed/64: Validate trampoline placement against E820
      a8c19920
    • Linus Torvalds's avatar
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2f3672cb
      Linus Torvalds authored
      Pull timer fixes from Thomas Gleixner:
       "Two oneliners addressing NOHZ failures:
      
         - Use a bitmask to check for the pending timer softirq and not the
           bit number. The existing code using the bit number checked for
           the wrong bit, which caused timers to either expire late or stop
           completely.
      
         - Make the nohz evaluation on interrupt exit more robust. The
           existing code did not re-arm the hardware when interrupting a
           running softirq in task context (ksoftirqd or tail of
           local_bh_enable()), which caused timers to either expire late
           or stop completely"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        nohz: Fix missing tick reprogram when interrupting an inline softirq
        nohz: Fix local_timer_softirq_pending()
      2f3672cb
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0cdf6d46
      Linus Torvalds authored
      Pull perf fixes from Thomas Gleixner:
       "A set of fixes for perf:
      
        Kernel side:
      
         - Fix the hardcoded index of extra PCI devices on Broadwell which
           caused a resource conflict and triggered warnings on CPU hotplug.
      
        Tooling:
      
         - Update the tools copy of several files, including perf_event.h,
           powerpc's asm/unistd.h (new io_pgetevents syscall), bpf.h and x86's
           memcpy_64.s (used in 'perf bench mem'), silencing the respective
           warnings during the perf tools build.
      
         - Fix the build on the alpine:edge distro"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/intel/uncore: Fix hardcoded index of Broadwell extra PCI devices
        perf tools: Fix the build on the alpine:edge distro
        tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy'
        tools headers uapi: Refresh linux/bpf.h copy
        tools headers powerpc: Update asm/unistd.h copy to pick new
        tools headers uapi: Update tools's copy of linux/perf_event.h
      0cdf6d46
    • Linus Torvalds's avatar
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b9fb1fc7
      Linus Torvalds authored
      Pull irq fix from Thomas Gleixner:
       "A single bugfix for the irq core to prevent silent data corruption and
        malfunction of threaded interrupts under certain conditions"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq: Make force irq threading setup more robust
      b9fb1fc7
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 212dab05
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Handle frames in error situations properly in AF_XDP, from Jakub
          Kicinski.
      
       2) tcp_mmap test case only tests ipv6 due to a thinko, fix from
          Maninder Singh.
      
       3) Session refcnt fix in l2tp_ppp, from Guillaume Nault.
      
       4) Fix regression in netlink bind handling of multicast gruops, from
          Dmitry Safonov.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        netlink: Don't shift on 64 for ngroups
        net/smc: no cursor update send in state SMC_INIT
        l2tp: fix missing refcount drop in pppol2tp_tunnel_ioctl()
        mlxsw: core_acl_flex_actions: Remove redundant mirror resource destruction
        mlxsw: core_acl_flex_actions: Remove redundant counter destruction
        mlxsw: core_acl_flex_actions: Remove redundant resource destruction
        mlxsw: core_acl_flex_actions: Return error for conflicting actions
        selftests/bpf: update test_lwt_seg6local.sh according to iproute2
        drivers: net: lmc: fix case value for target abort error
        selftest/net: fix protocol family to work for IPv4.
        net: xsk: don't return frames via the allocator on error
        tools/bpftool: fix a percpu_array map dump problem
      212dab05
    • Linus Torvalds's avatar
      Merge tag 'usercopy-fix-v4.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 60f5a217
      Linus Torvalds authored
      Pull usercopy whitelisting fix from Kees Cook:
       "Bart Massey discovered that the usercopy whitelist for JFS was
        incomplete: the inline inode data may intentionally "overflow" into
        the neighboring "extended area", so the size of the whitelist needed
        to be raised to include the neighboring field"
      
      * tag 'usercopy-fix-v4.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        jfs: Fix usercopy whitelist for inline inode data
      60f5a217
    • Linus Torvalds's avatar
      Merge tag 'xfs-4.18-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · f639bef5
      Linus Torvalds authored
      Pull xfs bugfix from Darrick Wong:
       "One more patch for 4.18 to fix a coding error in the iomap_bmap()
        function introduced in -rc1: fix incorrect shifting"
      
      * tag 'xfs-4.18-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        fs: fix iomap_bmap position calculation
      f639bef5
    • Linus Torvalds's avatar
      Partially revert "block: fail op_is_write() requests to read-only partitions" · a32e236e
      Linus Torvalds authored
      It turns out that commit 721c7fc7 ("block: fail op_is_write()
      requests to read-only partitions"), while obviously correct, causes
      problems for some older lvm2 installations.
      
      The reason is that the lvm snapshotting will continue to write to the
      snapshow COW volume, even after the volume has been marked read-only.
      End result: snapshot failure.
      
      This has actually been fixed in newer version of the lvm2 tool, but the
      old tools still exist, and the breakage was reported both in the kernel
      bugzilla and in the Debian bugzilla:
      
        https://bugzilla.kernel.org/show_bug.cgi?id=200439
        https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900442
      
      The lvm2 fix is here
      
        https://sourceware.org/git/?p=lvm2.git;a=commit;h=a6fdb9d9d70f51c49ad11a87ab4243344e6701a3
      
      but until everybody has updated to recent versions, we'll have to weaken
      the "never write to read-only partitions" check.  It now allows the
      write to happen, but causes a warning, something like this:
      
        generic_make_request: Trying to write to read-only block-device dm-3 (partno X)
        Modules linked in: nf_tables xt_cgroup xt_owner kvm_intel iwlmvm kvm irqbypass iwlwifi
        CPU: 1 PID: 77 Comm: kworker/1:1 Not tainted 4.17.9-gentoo #3
        Hardware name: LENOVO 20B6A019RT/20B6A019RT, BIOS GJET91WW (2.41 ) 09/21/2016
        Workqueue: ksnaphd do_metadata
        RIP: 0010:generic_make_request_checks+0x4ac/0x600
        ...
        Call Trace:
         generic_make_request+0x64/0x400
         submit_bio+0x6c/0x140
         dispatch_io+0x287/0x430
         sync_io+0xc3/0x120
         dm_io+0x1f8/0x220
         do_metadata+0x1d/0x30
         process_one_work+0x1b9/0x3e0
         worker_thread+0x2b/0x3c0
         kthread+0x113/0x130
         ret_from_fork+0x35/0x40
      
      Note that this is a "revert" in behavior only.  I'm leaving alone the
      actual code cleanups in commit 721c7fc7, but letting the previously
      uncaught request go through with a warning instead of stopping it.
      
      Fixes: 721c7fc7 ("block: fail op_is_write() requests to read-only partitions")
      Reported-and-tested-by: default avatarWGH <wgh@torlan.ru>
      Acked-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Ilya Dryomov <idryomov@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Zdenek Kabelac <zkabelac@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a32e236e
    • Dmitry Safonov's avatar
      netlink: Don't shift on 64 for ngroups · 91874ecf
      Dmitry Safonov authored
      It's legal to have 64 groups for netlink_sock.
      
      As user-supplied nladdr->nl_groups is __u32, it's possible to subscribe
      only to first 32 groups.
      
      The check for correctness of .bind() userspace supplied parameter
      is done by applying mask made from ngroups shift. Which broke Android
      as they have 64 groups and the shift for mask resulted in an overflow.
      
      Fixes: 61f4b237 ("netlink: Don't shift with UB on nlk->ngroups")
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: netdev@vger.kernel.org
      Cc: stable@vger.kernel.org
      Reported-and-Tested-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: default avatarDmitry Safonov <dima@arista.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      91874ecf
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 5dbfb6ec
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-08-05
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix bpftool percpu_array dump by using correct roundup to next
         multiple of 8 for the value size, from Yonghong.
      
      2) Fix in AF_XDP's __xsk_rcv_zc() to not returning frames back to
         allocator since driver will recycle frame anyway in case of an
         error, from Jakub.
      
      3) Fix up BPF test_lwt_seg6local test cases to final iproute2
         syntax, from Mathieu.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5dbfb6ec
  6. 04 Aug, 2018 2 commits
    • Ursula Braun's avatar
      net/smc: no cursor update send in state SMC_INIT · 5607016c
      Ursula Braun authored
      If a writer blocked condition is received without data, the current
      consumer cursor is immediately sent. Servers could already receive this
      condition in state SMC_INIT without finished tx-setup. This patch
      avoids sending a consumer cursor update in this case.
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5607016c
    • Kees Cook's avatar
      jfs: Fix usercopy whitelist for inline inode data · 961b33c2
      Kees Cook authored
      Bart Massey reported what turned out to be a usercopy whitelist false
      positive in JFS when symlink contents exceeded 128 bytes. The inline
      inode data (i_inline) is actually designed to overflow into the "extended
      area" following it (i_inline_ea) when needed. So the whitelist needed to
      be expanded to include both i_inline and i_inline_ea (the whole size
      of which is calculated internally using IDATASIZE, 256, instead of
      sizeof(i_inline), 128).
      
      $ cd /mnt/jfs
      $ touch $(perl -e 'print "B" x 250')
      $ ln -s B* b
      $ ls -l >/dev/null
      
      [  249.436410] Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLUB object 'jfs_ip' (offset 616, size 250)!
      Reported-by: default avatarBart Massey <bart.massey@gmail.com>
      Fixes: 8d2704d3 ("jfs: Define usercopy region in jfs_ip slab cache")
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: jfs-discussion@lists.sourceforge.net
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      961b33c2
  7. 03 Aug, 2018 7 commits
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 0b5b1f9a
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
       "Two vmx bugfixes"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        kvm: x86: vmx: fix vpid leak
        KVM: vmx: use local variable for current_vmptr when emulating VMPTRST
      0b5b1f9a
    • Guillaume Nault's avatar
      l2tp: fix missing refcount drop in pppol2tp_tunnel_ioctl() · f664e37d
      Guillaume Nault authored
      If 'session' is not NULL and is not a PPP pseudo-wire, then we fail to
      drop the reference taken by l2tp_session_get().
      
      Fixes: ecd012e4 ("l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()")
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f664e37d
    • David S. Miller's avatar
      Merge branch 'mlxsw-Fix-ACL-actions-error-condition-handling' · 60a01828
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw: Fix ACL actions error condition handling
      
      Nir says:
      
      Two issues were lately noticed within mlxsw ACL actions error condition
      handling. The first patch deals with conflicting actions such as:
      
       # tc filter add dev swp49 parent ffff: \
         protocol ip pref 10 flower skip_sw dst_ip 192.168.101.1 \
         action goto chain 100 \
         action mirred egress redirect dev swp4
      
      The second action will never execute, however SW model allows this
      configuration, while the mlxsw driver cannot allow for it as it
      implements actions in sets of up to three actions per set with a single
      termination marking. Conflicting actions create a contradiction over
      this single marking and thus cannot be configured. The fix replaces a
      misplaced warning with an error code to be returned.
      
      Patches 2-4 fix a condition of duplicate destruction of resources. Some
      actions require allocation of specific resource prior to setting the
      action itself. On error condition this resource was destroyed twice,
      leading to a crash when using mirror action, and to a redundant
      destruction in other cases, since for error condition rule destruction
      also takes care of resource destruction. In order to fix this state a
      symmetry in behavior is added and resource destruction also takes care
      of removing the resource from rule's resource list.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      60a01828
    • Nir Dotan's avatar
      mlxsw: core_acl_flex_actions: Remove redundant mirror resource destruction · caebd1b3
      Nir Dotan authored
      In previous patch mlxsw_afa_resource_del() was added to avoid a duplicate
      resource detruction scenario.
      For mirror actions, such duplicate destruction leads to a crash as in:
      
       # tc qdisc add dev swp49 ingress
       # tc filter add dev swp49 parent ffff: \
         protocol ip chain 100 pref 10 \
         flower skip_sw dst_ip 192.168.101.1 action drop
       # tc filter add dev swp49 parent ffff: \
         protocol ip pref 10 \
         flower skip_sw dst_ip 192.168.101.1 action goto chain 100 \
         action mirred egress mirror dev swp4
      
      Therefore add a call to mlxsw_afa_resource_del() in
      mlxsw_afa_mirror_destroy() in order to clear that resource
      from rule's resources.
      
      Fixes: d0d13c18 ("mlxsw: spectrum_acl: Add support for mirror action")
      Signed-off-by: default avatarNir Dotan <nird@mellanox.com>
      Reviewed-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      caebd1b3
    • Nir Dotan's avatar
      mlxsw: core_acl_flex_actions: Remove redundant counter destruction · 7cc61694
      Nir Dotan authored
      Each tc flower rule uses a hidden count action. As counter resource may
      not be available due to limited HW resources, update _counter_create()
      and _counter_destroy() pair to follow previously introduced symmetric
      error condition handling, add a call to mlxsw_afa_resource_del() as part
      of the counter resource destruction.
      
      Fixes: c18c1e18 ("mlxsw: core: Make counter index allocated inside the action append")
      Signed-off-by: default avatarNir Dotan <nird@mellanox.com>
      Reviewed-by: default avatarPetr Machata <petrm@mellanox.com>
      Reviewed-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7cc61694
    • Nir Dotan's avatar
      mlxsw: core_acl_flex_actions: Remove redundant resource destruction · dda0a3a3
      Nir Dotan authored
      Some ACL actions require the allocation of a separate resource
      prior to applying the action itself. When facing an error condition
      during the setup phase of the action, resource should be destroyed.
      For such actions the destruction was done twice which is dangerous
      and lead to a potential crash.
      The destruction took place first upon error on action setup phase
      and then as the rule was destroyed.
      
      The following sequence generated a crash:
      
       # tc qdisc add dev swp49 ingress
       # tc filter add dev swp49 parent ffff: \
         protocol ip chain 100 pref 10 \
         flower skip_sw dst_ip 192.168.101.1 action drop
       # tc filter add dev swp49 parent ffff: \
         protocol ip pref 10 \
         flower skip_sw dst_ip 192.168.101.1 action goto chain 100 \
         action mirred egress mirror dev swp4
      
      Therefore add mlxsw_afa_resource_del() as a complement of
      mlxsw_afa_resource_add() to add symmetry to resource_list membership
      handling. Call this from mlxsw_afa_fwd_entry_ref_destroy() to make the
      _fwd_entry_ref_create() and _fwd_entry_ref_destroy() pair of calls a
      NOP.
      
      Fixes: 140ce421 ("mlxsw: core: Convert fwd_entry_ref list to be generic per-block resource list")
      Signed-off-by: default avatarNir Dotan <nird@mellanox.com>
      Reviewed-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dda0a3a3
    • Nir Dotan's avatar
      mlxsw: core_acl_flex_actions: Return error for conflicting actions · 3757b255
      Nir Dotan authored
      Spectrum switch ACL action set is built in groups of three actions
      which may point to additional actions. A group holds a single record
      which can be set as goto record for pointing at a following group
      or can be set to mark the termination of the lookup. This is perfectly
      adequate for handling a series of actions to be executed on a packet.
      While the SW model allows configuration of conflicting actions
      where it is clear that some actions will never execute, the mlxsw
      driver must block such configurations as it creates a conflict
      over the single terminate/goto record value.
      
      For a conflicting actions configuration such as:
      
       # tc filter add dev swp49 parent ffff: \
         protocol ip pref 10 \
         flower skip_sw dst_ip 192.168.101.1 \
         action goto chain 100 \
         action mirred egress mirror dev swp4
      
      Where it is clear that the last action will never execute, the
      mlxsw driver was issuing a warning instead of returning an error.
      Therefore replace that warning with an error for this specific
      case.
      
      Fixes: 4cda7d8d ("mlxsw: core: Introduce flexible actions support")
      Signed-off-by: default avatarNir Dotan <nird@mellanox.com>
      Reviewed-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3757b255