1. 25 Jul, 2018 4 commits
    • Kiran Kumar Modukuri's avatar
      cachefiles: Fix missing clear of the CACHEFILES_OBJECT_ACTIVE flag · 5ce83d4b
      Kiran Kumar Modukuri authored
      In cachefiles_mark_object_active(), the new object is marked active and
      then we try to add it to the active object tree.  If a conflicting object
      is already present, we want to wait for that to go away.  After the wait,
      we go round again and try to re-mark the object as being active - but it's
      already marked active from the first time we went through and a BUG is
      issued.
      
      Fix this by clearing the CACHEFILES_OBJECT_ACTIVE flag before we try again.
      
      Analysis from Kiran Kumar Modukuri:
      
      [Impact]
      Oops during heavy NFS + FSCache + Cachefiles
      
      CacheFiles: Error: Overlong wait for old active object to go away.
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
      
      CacheFiles: Error: Object already active kernel BUG at
      fs/cachefiles/namei.c:163!
      
      [Cause]
      In a heavily loaded system with big files being read and truncated, an
      fscache object for a cookie is being dropped and a new object being
      looked. The new object being looked for has to wait for the old object
      to go away before the new object is moved to active state.
      
      [Fix]
      Clear the flag 'CACHEFILES_OBJECT_ACTIVE' for the new object when
      retrying the object lookup.
      
      [Testcase]
      Have run ~100 hours of NFS stress tests and have not seen this bug recur.
      
      [Regression Potential]
       - Limited to fscache/cachefiles.
      
      Fixes: 9ae326a6 ("CacheFiles: A cache that backs onto a mounted filesystem")
      Signed-off-by: default avatarKiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      5ce83d4b
    • Kiran Kumar Modukuri's avatar
      fscache: Fix reference overput in fscache_attach_object() error handling · f29507ce
      Kiran Kumar Modukuri authored
      When a cookie is allocated that causes fscache_object structs to be
      allocated, those objects are initialised with the cookie pointer, but
      aren't blessed with a ref on that cookie unless the attachment is
      successfully completed in fscache_attach_object().
      
      If attachment fails because the parent object was dying or there was a
      collision, fscache_attach_object() returns without incrementing the cookie
      counter - but upon failure of this function, the object is released which
      then puts the cookie, whether or not a ref was taken on the cookie.
      
      Fix this by taking a ref on the cookie when it is assigned in
      fscache_object_init(), even when we're creating a root object.
      
      
      Analysis from Kiran Kumar:
      
      This bug has been seen in 4.4.0-124-generic #148-Ubuntu kernel
      
      BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776277
      
      fscache cookie ref count updated incorrectly during fscache object
      allocation resulting in following Oops.
      
      kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/internal.h:321!
      kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!
      
      [Cause]
      Two threads are trying to do operate on a cookie and two objects.
      
      (1) One thread tries to unmount the filesystem and in process goes over a
          huge list of objects marking them dead and deleting the objects.
          cookie->usage is also decremented in following path:
      
            nfs_fscache_release_super_cookie
             -> __fscache_relinquish_cookie
              ->__fscache_cookie_put
              ->BUG_ON(atomic_read(&cookie->usage) <= 0);
      
      (2) A second thread tries to lookup an object for reading data in following
          path:
      
          fscache_alloc_object
          1) cachefiles_alloc_object
              -> fscache_object_init
                 -> assign cookie, but usage not bumped.
          2) fscache_attach_object -> fails in cant_attach_object because the
               cookie's backing object or cookie's->parent object are going away
          3) fscache_put_object
              -> cachefiles_put_object
                ->fscache_object_destroy
                  ->fscache_cookie_put
                     ->BUG_ON(atomic_read(&cookie->usage) <= 0);
      
      [NOTE from dhowells] It's unclear as to the circumstances in which (2) can
      take place, given that thread (1) is in nfs_kill_super(), however a
      conflicting NFS mount with slightly different parameters that creates a
      different superblock would do it.  A backtrace from Kiran seems to show
      that this is a possibility:
      
          kernel BUG at/build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!
          ...
          RIP: __fscache_cookie_put+0x3a/0x40 [fscache]
          Call Trace:
           __fscache_relinquish_cookie+0x87/0x120 [fscache]
           nfs_fscache_release_super_cookie+0x2d/0xb0 [nfs]
           nfs_kill_super+0x29/0x40 [nfs]
           deactivate_locked_super+0x48/0x80
           deactivate_super+0x5c/0x60
           cleanup_mnt+0x3f/0x90
           __cleanup_mnt+0x12/0x20
           task_work_run+0x86/0xb0
           exit_to_usermode_loop+0xc2/0xd0
           syscall_return_slowpath+0x4e/0x60
           int_ret_from_sys_call+0x25/0x9f
      
      [Fix] Bump up the cookie usage in fscache_object_init, when it is first
      being assigned a cookie atomically such that the cookie is added and bumped
      up if its refcount is not zero.  Remove the assignment in
      fscache_attach_object().
      
      [Testcase]
      I have run ~100 hours of NFS stress tests and not seen this bug recur.
      
      [Regression Potential]
       - Limited to fscache/cachefiles.
      
      Fixes: ccc4fc3d ("FS-Cache: Implement the cookie management part of the netfs API")
      Signed-off-by: default avatarKiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      f29507ce
    • Kiran Kumar Modukuri's avatar
      cachefiles: Fix refcounting bug in backing-file read monitoring · 934140ab
      Kiran Kumar Modukuri authored
      cachefiles_read_waiter() has the right to access a 'monitor' object by
      virtue of being called under the waitqueue lock for one of the pages in its
      purview.  However, it has no ref on that monitor object or on the
      associated operation.
      
      What it is allowed to do is to move the monitor object to the operation's
      to_do list, but once it drops the work_lock, it's actually no longer
      permitted to access that object.  However, it is trying to enqueue the
      retrieval operation for processing - but it can only do this via a pointer
      in the monitor object, something it shouldn't be doing.
      
      If it doesn't enqueue the operation, the operation may not get processed.
      If the order is flipped so that the enqueue is first, then it's possible
      for the work processor to look at the to_do list before the monitor is
      enqueued upon it.
      
      Fix this by getting a ref on the operation so that we can trust that it
      will still be there once we've added the monitor to the to_do list and
      dropped the work_lock.  The op can then be enqueued after the lock is
      dropped.
      
      The bug can manifest in one of a couple of ways.  The first manifestation
      looks like:
      
       FS-Cache:
       FS-Cache: Assertion failed
       FS-Cache: 6 == 5 is false
       ------------[ cut here ]------------
       kernel BUG at fs/fscache/operation.c:494!
       RIP: 0010:fscache_put_operation+0x1e3/0x1f0
       ...
       fscache_op_work_func+0x26/0x50
       process_one_work+0x131/0x290
       worker_thread+0x45/0x360
       kthread+0xf8/0x130
       ? create_worker+0x190/0x190
       ? kthread_cancel_work_sync+0x10/0x10
       ret_from_fork+0x1f/0x30
      
      This is due to the operation being in the DEAD state (6) rather than
      INITIALISED, COMPLETE or CANCELLED (5) because it's already passed through
      fscache_put_operation().
      
      The bug can also manifest like the following:
      
       kernel BUG at fs/fscache/operation.c:69!
       ...
          [exception RIP: fscache_enqueue_operation+246]
       ...
       #7 [ffff883fff083c10] fscache_enqueue_operation at ffffffffa0b793c6
       #8 [ffff883fff083c28] cachefiles_read_waiter at ffffffffa0b15a48
       #9 [ffff883fff083c48] __wake_up_common at ffffffff810af028
      
      I'm not entirely certain as to which is line 69 in Lei's kernel, so I'm not
      entirely clear which assertion failed.
      
      Fixes: 9ae326a6 ("CacheFiles: A cache that backs onto a mounted filesystem")
      Reported-by: default avatarLei Xue <carmark.dlut@gmail.com>
      Reported-by: default avatarVegard Nossum <vegard.nossum@gmail.com>
      Reported-by: default avatarAnthony DeRobertis <aderobertis@metrics.net>
      Reported-by: default avatarNeilBrown <neilb@suse.com>
      Reported-by: default avatarDaniel Axtens <dja@axtens.net>
      Reported-by: default avatarKiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Reviewed-by: default avatarDaniel Axtens <dja@axtens.net>
      934140ab
    • Kiran Kumar Modukuri's avatar
      fscache: Allow cancelled operations to be enqueued · d0eb06af
      Kiran Kumar Modukuri authored
      Alter the state-check assertion in fscache_enqueue_operation() to allow
      cancelled operations to be given processing time so they can be cleaned up.
      
      Also fix a debugging statement that was requiring such operations to have
      an object assigned.
      
      Fixes: 9ae326a6 ("CacheFiles: A cache that backs onto a mounted filesystem")
      Reported-by: default avatarKiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      d0eb06af
  2. 04 Jul, 2018 5 commits
  3. 03 Jul, 2018 1 commit
    • Linus Torvalds's avatar
      net/smc: fix up merge error with poll changes · 410da1e1
      Linus Torvalds authored
      My networking merge (commit 4e33d7d4: "Pull networking fixes from
      David Miller") got the poll() handling conflict wrong for af_smc.
      
      The conflict between my a11e1d43 ("Revert changes to convert to
      ->poll_mask() and aio IOCB_CMD_POLL") and Ursula Braun's 24ac3a08
      ("net/smc: rebuild nonblocking connect") should have left the call to
      sock_poll_wait() in place, just without the socket lock release/retake.
      
      And I really should have realized that.  But happily, I at least asked
      Ursula to double-check the merge, and she set me right.
      
      This also fixes an incidental whitespace issue nearby that annoyed me
      while looking at this.
      Pointed-out-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      410da1e1
  4. 02 Jul, 2018 11 commits
    • Linus Torvalds's avatar
      Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md · d0fbad0a
      Linus Torvalds authored
      Pull MD fixes from Shaohua Li:
       "Two small fixes for MD:
      
         - an error handling fix from me
      
         - a recover bug fix for raid10 from BingJing"
      
      * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
        md/raid10: fix that replacement cannot complete recovery after reassemble
        MD: cleanup resources in failure
      d0fbad0a
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://github.com/stffrdhrn/linux · 8d2b6f6b
      Linus Torvalds authored
      Pull OpenRISC fixes from Stafford Horne:
       "Two fixes for issues which were breaking OpenRISC boot:
      
         - Fix bug in __pte_free_tlb() exposed in 4.18 by Matthew Wilcox's
           page table flag addition.
      
         - Fix issue booting on real hardware if delay slot detection
           emulation is disabled"
      
      * tag 'for-linus' of git://github.com/stffrdhrn/linux:
        openrisc: entry: Fix delay slot exception detection
        openrisc: Call destructor during __pte_free_tlb
      8d2b6f6b
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 4e33d7d4
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Verify netlink attributes properly in nf_queue, from Eric Dumazet.
      
       2) Need to bump memory lock rlimit for test_sockmap bpf test, from
          Yonghong Song.
      
       3) Fix VLAN handling in lan78xx driver, from Dave Stevenson.
      
       4) Fix uninitialized read in nf_log, from Jann Horn.
      
       5) Fix raw command length parsing in mlx5, from Alex Vesker.
      
       6) Cleanup loopback RDS connections upon netns deletion, from Sowmini
          Varadhan.
      
       7) Fix regressions in FIB rule matching during create, from Jason A.
          Donenfeld and Roopa Prabhu.
      
       8) Fix mpls ether type detection in nfp, from Pieter Jansen van Vuuren.
      
       9) More bpfilter build fixes/adjustments from Masahiro Yamada.
      
      10) Fix XDP_{TX,REDIRECT} flushing in various drivers, from Jesper
          Dangaard Brouer.
      
      11) fib_tests.sh file permissions were broken, from Shuah Khan.
      
      12) Make sure BH/preemption is disabled in data path of mac80211, from
          Denis Kenzior.
      
      13) Don't ignore nla_parse_nested() return values in nl80211, from
          Johannes berg.
      
      14) Properly account sock objects ot kmemcg, from Shakeel Butt.
      
      15) Adjustments to setting bpf program permissions to read-only, from
          Daniel Borkmann.
      
      16) TCP Fast Open key endianness was broken, it always took on the host
          endiannness. Whoops. Explicitly make it little endian. From Yuching
          Cheng.
      
      17) Fix prefix route setting for link local addresses in ipv6, from
          David Ahern.
      
      18) Potential Spectre v1 in zatm driver, from Gustavo A. R. Silva.
      
      19) Various bpf sockmap fixes, from John Fastabend.
      
      20) Use after free for GRO with ESP, from Sabrina Dubroca.
      
      21) Passing bogus flags to crypto_alloc_shash() in ipv6 SR code, from
          Eric Biggers.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
        qede: Adverstise software timestamp caps when PHC is not available.
        qed: Fix use of incorrect size in memcpy call.
        qed: Fix setting of incorrect eswitch mode.
        qed: Limit msix vectors in kdump kernel to the minimum required count.
        ipvlan: call dev_change_flags when ipvlan mode is reset
        ipv6: sr: fix passing wrong flags to crypto_alloc_shash()
        net: fix use-after-free in GRO with ESP
        tcp: prevent bogus FRTO undos with non-SACK flows
        bpf: sockhash, add release routine
        bpf: sockhash fix omitted bucket lock in sock_close
        bpf: sockmap, fix smap_list_map_remove when psock is in many maps
        bpf: sockmap, fix crash when ipv6 sock is added
        net: fib_rules: bring back rule_exists to match rule during add
        hv_netvsc: split sub-channel setup into async and sync
        net: use dev_change_tx_queue_len() for SIOCSIFTXQLEN
        atm: zatm: Fix potential Spectre v1
        s390/qeth: consistently re-enable device features
        s390/qeth: don't clobber buffer on async TX completion
        s390/qeth: avoid using is_multicast_ether_addr_64bits on (u8 *)[6]
        s390/qeth: fix race when setting MAC address
        ...
      4e33d7d4
    • David S. Miller's avatar
      Merge branch 'qed-fixes' · e48e0979
      David S. Miller authored
      Sudarsana Reddy Kalluru says:
      
      ====================
      qed*: Fix series.
      
      The patch series addresses few issues in the qed* drivers.
      
      Please consider applying it to 'net' branch.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e48e0979
    • Sudarsana Reddy Kalluru's avatar
      qede: Adverstise software timestamp caps when PHC is not available. · 82a4e71b
      Sudarsana Reddy Kalluru authored
      When ptp clock is not available for a PF (e.g., higher PFs in NPAR mode),
      get-tsinfo() callback should return the software timestamp capabilities
      instead of returning the error.
      
      Fixes: 4c55215c ("qede: Add driver support for PTP")
      Signed-off-by: default avatarSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
      Signed-off-by: default avatarMichal Kalderon <Michal.Kalderon@cavium.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      82a4e71b
    • Sudarsana Reddy Kalluru's avatar
      qed: Fix use of incorrect size in memcpy call. · cc9b27cd
      Sudarsana Reddy Kalluru authored
      Use the correct size value while copying chassis/port id values.
      
      Fixes: 6ad8c632 ("qed: Add support for query/config dcbx.")
      Signed-off-by: default avatarSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
      Signed-off-by: default avatarMichal Kalderon <Michal.Kalderon@cavium.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cc9b27cd
    • Sudarsana Reddy Kalluru's avatar
      qed: Fix setting of incorrect eswitch mode. · 538f8d00
      Sudarsana Reddy Kalluru authored
      By default, driver sets the eswitch mode incorrectly as VEB (virtual
      Ethernet bridging).
      Need to set VEB eswitch mode only when sriov is enabled, and it should be
      to set NONE by default. The patch incorporates this change.
      
      Fixes: 0fefbfba ("qed*: Management firmware - notifications and defaults")
      Signed-off-by: default avatarSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
      Signed-off-by: default avatarMichal Kalderon <Michal.Kalderon@cavium.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      538f8d00
    • Sudarsana Reddy Kalluru's avatar
      qed: Limit msix vectors in kdump kernel to the minimum required count. · bb7858ba
      Sudarsana Reddy Kalluru authored
      Memory size is limited in the kdump kernel environment. Allocation of more
      msix-vectors (or queues) consumes few tens of MBs of memory, which might
      lead to the kdump kernel failure.
      This patch adds changes to limit the number of MSI-X vectors in kdump
      kernel to minimum required value (i.e., 2 per engine).
      
      Fixes: fe56b9e6 ("qed: Add module with basic common support")
      Signed-off-by: default avatarSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
      Signed-off-by: default avatarMichal Kalderon <Michal.Kalderon@cavium.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bb7858ba
    • Hangbin Liu's avatar
      ipvlan: call dev_change_flags when ipvlan mode is reset · 5dc2d399
      Hangbin Liu authored
      After we change the ipvlan mode from l3 to l2, or vice versa, we only
      reset IFF_NOARP flag, but don't flush the ARP table cache, which will
      cause eth->h_dest to be equal to eth->h_source in ipvlan_xmit_mode_l2().
      Then the message will not come out of host.
      
      Here is the reproducer on local host:
      
      ip link set eth1 up
      ip addr add 192.168.1.1/24 dev eth1
      ip link add link eth1 ipvlan1 type ipvlan mode l3
      
      ip netns add net1
      ip link set ipvlan1 netns net1
      ip netns exec net1 ip link set ipvlan1 up
      ip netns exec net1 ip addr add 192.168.2.1/24 dev ipvlan1
      
      ip route add 192.168.2.0/24 via 192.168.1.2
      ping 192.168.2.2 -c 2
      
      ip netns exec net1 ip link set ipvlan1 type ipvlan mode l2
      ping 192.168.2.2 -c 2
      
      Add the same configuration on remote host. After we set the mode to l2,
      we could find that the src/dst MAC addresses are the same on eth1:
      
      21:26:06.648565 00:b7:13:ad:d3:05 > 00:b7:13:ad:d3:05, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 58356, offset 0, flags [DF], proto ICMP (1), length 84)
          192.168.2.1 > 192.168.2.2: ICMP echo request, id 22686, seq 1, length 64
      
      Fix this by calling dev_change_flags(), which will call netdevice notifier
      with flag change info.
      
      v2:
      a) As pointed out by Wang Cong, check return value for dev_change_flags() when
      change dev flags.
      b) As suggested by Stefano and Sabrina, move flags setting before l3mdev_ops.
      So we don't need to redo ipvlan_{, un}register_nf_hook() again in err path.
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Reviewed-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Fixes: 2ad7bf36 ("ipvlan: Initial check-in of the IPVLAN driver.")
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5dc2d399
    • Eric Biggers's avatar
      ipv6: sr: fix passing wrong flags to crypto_alloc_shash() · fc9c2029
      Eric Biggers authored
      The 'mask' argument to crypto_alloc_shash() uses the CRYPTO_ALG_* flags,
      not 'gfp_t'.  So don't pass GFP_KERNEL to it.
      
      Fixes: bf355b8d ("ipv6: sr: add core files for SR HMAC support")
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fc9c2029
    • Sabrina Dubroca's avatar
      net: fix use-after-free in GRO with ESP · 603d4cf8
      Sabrina Dubroca authored
      Since the addition of GRO for ESP, gro_receive can consume the skb and
      return -EINPROGRESS. In that case, the lower layer GRO handler cannot
      touch the skb anymore.
      
      Commit 5f114163 ("net: Add a skb_gro_flush_final helper.") converted
      some of the gro_receive handlers that can lead to ESP's gro_receive so
      that they wouldn't access the skb when -EINPROGRESS is returned, but
      missed other spots, mainly in tunneling protocols.
      
      This patch finishes the conversion to using skb_gro_flush_final(), and
      adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and
      GUE.
      
      Fixes: 5f114163 ("net: Add a skb_gro_flush_final helper.")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      603d4cf8
  5. 01 Jul, 2018 10 commits
    • Linus Torvalds's avatar
      Linux 4.18-rc3 · 021c9179
      Linus Torvalds authored
      021c9179
    • Linus Torvalds's avatar
      Merge tag 'for-4.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · d3bc0e67
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "We have a few regression fixes for qgroup rescan status tracking and
        the vm_fault_t conversion that mixed up the error values"
      
      * tag 'for-4.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        Btrfs: fix mount failure when qgroup rescan is in progress
        Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion
        btrfs: quota: Set rescan progress to (u64)-1 if we hit last leaf
      d3bc0e67
    • Linus Torvalds's avatar
      Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 4a770e63
      Linus Torvalds authored
      Pull vfs fix from Al Viro:
       "Followup to procfs-seq_file series this window"
      
      This fixes a memory leak by making sure that proc seq files release any
      private data on close.  The 'proc_seq_open' has to be properly paired
      with 'proc_seq_release' that releases the extra private data.
      
      * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        proc: add proc_seq_release
      4a770e63
    • Linus Torvalds's avatar
      Merge tag 'staging-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · d7563ca5
      Linus Torvalds authored
      Pull staging/IIO fixes from Greg KH:
       "Here are a few small staging and IIO driver fixes for 4.18-rc3.
      
        Nothing major or big, all just fixes for reported problems since
        4.18-rc1. All of these have been in linux-next this week with no
        reported problems"
      
      * tag 'staging-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        staging: android: ion: Return an ERR_PTR in ion_map_kernel
        staging: comedi: quatech_daqp_cs: fix no-op loop daqp_ao_insn_write()
        iio: imu: inv_mpu6050: Fix probe() failure on older ACPI based machines
        iio: buffer: fix the function signature to match implementation
        iio: mma8452: Fix ignoring MMA8452_INT_DRDY
        iio: tsl2x7x/tsl2772: avoid potential division by zero
        iio: pressure: bmp280: fix relative humidity unit
      d7563ca5
    • Linus Torvalds's avatar
      Merge tag 'tty-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 652788a9
      Linus Torvalds authored
      Pull tty/serial fixes from Greg KH:
       "Here are five fixes for the tty core and some serial drivers.
      
        The tty core ones fix some security and other issues reported by the
        syzbot that I have taken too long in responding to (sorry Tetsuo!).
      
        The 8350 serial driver fix resolves an issue of devices that used to
        work properly stopping working as they shouldn't have been added to a
        blacklist.
      
        All of these have been in linux-next for a few days with no reported
        issues"
      
      * tag 'tty-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        vt: prevent leaking uninitialized data to userspace via /dev/vcs*
        serdev: fix memleak on module unload
        serial: 8250_pci: Remove stalled entries in blacklist
        n_tty: Access echo_* variables carefully.
        n_tty: Fix stall at n_tty_receive_char_special().
      652788a9
    • Linus Torvalds's avatar
      Merge tag 'usb-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · c2aee376
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here is a number of USB gadget and other driver fixes for 4.18-rc3.
      
        There's a bunch of them here, most of them being gadget driver and
        xhci host controller fixes for reported issues (as normal), but there
        are also some new device ids, and some fixes for the typec code.
      
        There is an acpi core patch in here that was acked by the acpi
        maintainer as it is needed for the typec fixes in order to properly
        solve a problem in that driver.
      
        All of these have been in linux-next this week with no reported
        issues"
      
      * tag 'usb-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (33 commits)
        usb: chipidea: host: fix disconnection detect issue
        usb: typec: tcpm: fix logbuffer index is wrong if _tcpm_log is re-entered
        typec: tcpm: Fix a msecs vs jiffies bug
        NFC: pn533: Fix wrong GFP flag usage
        usb: cdc_acm: Add quirk for Uniden UBC125 scanner
        staging/typec: fix tcpci_rt1711h build errors
        usb: typec: ucsi: Fix for incorrect status data issue
        usb: typec: ucsi: acpi: Workaround for cache mode issue
        acpi: Add helper for deactivating memory region
        usb: xhci: increase CRS timeout value
        usb: xhci: tegra: fix runtime PM error handling
        usb: xhci: remove the code build warning
        xhci: Fix kernel oops in trace_xhci_free_virt_device
        xhci: Fix perceived dead host due to runtime suspend race with event handler
        dwc2: gadget: Fix ISOC IN DDMA PID bitfield value calculation
        usb: gadget: dwc2: fix memory leak in gadget_init()
        usb: gadget: composite: fix delayed_status race condition when set_interface
        usb: dwc2: fix isoc split in transfer with no data
        usb: dwc2: alloc dma aligned buffer for isoc split in
        usb: dwc2: fix the incorrect bitmaps for the ports of multi_tt hub
        ...
      c2aee376
    • Linus Torvalds's avatar
      Merge tag 'dma-mapping-4.18-2' of git://git.infradead.org/users/hch/dma-mapping · c350d6d1
      Linus Torvalds authored
      Pull dma mapping fixlet from Christoph Hellwig:
       "Add a missing export required by riscv and unicore"
      
      * tag 'dma-mapping-4.18-2' of git://git.infradead.org/users/hch/dma-mapping:
        swiotlb: export swiotlb_dma_ops
      c350d6d1
    • Ilpo Järvinen's avatar
      tcp: prevent bogus FRTO undos with non-SACK flows · 1236f22f
      Ilpo Järvinen authored
      If SACK is not enabled and the first cumulative ACK after the RTO
      retransmission covers more than the retransmitted skb, a spurious
      FRTO undo will trigger (assuming FRTO is enabled for that RTO).
      The reason is that any non-retransmitted segment acknowledged will
      set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is
      no indication that it would have been delivered for real (the
      scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK
      case so the check for that bit won't help like it does with SACK).
      Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo
      in tcp_process_loss.
      
      We need to use more strict condition for non-SACK case and check
      that none of the cumulatively ACKed segments were retransmitted
      to prove that progress is due to original transmissions. Only then
      keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in
      non-SACK case.
      
      (FLAG_ORIG_SACK_ACKED is planned to be renamed to FLAG_ORIG_PROGRESS
      to better indicate its purpose but to keep this change minimal, it
      will be done in another patch).
      
      Besides burstiness and congestion control violations, this problem
      can result in RTO loop: When the loss recovery is prematurely
      undoed, only new data will be transmitted (if available) and
      the next retransmission can occur only after a new RTO which in case
      of multiple losses (that are not for consecutive packets) requires
      one RTO per loss to recover.
      Signed-off-by: default avatarIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Tested-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1236f22f
    • Stafford Horne's avatar
      openrisc: entry: Fix delay slot exception detection · ae15a41a
      Stafford Horne authored
      Originally in patch e6d20c55 ("openrisc: entry: Fix delay slot
      detection") I fixed delay slot detection, but only for QEMU.  We missed
      that hardware delay slot detection using delay slot exception flag (DSX)
      was still broken.  This was because QEMU set the DSX flag in both
      pre-exception supervision register (ESR) and supervision register (SR)
      register, but on real hardware the DSX flag is only set on the SR
      register during exceptions.
      
      Fix this by carrying the DSX flag into the SR register during exception.
      We also update the DSX flag read locations to read the value from the SR
      register not the pt_regs SR register which represents ESR.  The ESR
      should never have the DSX flag set.
      
      In the process I updated/removed a few comments to match the current
      state.  Including removing a comment saying that the DSX detection logic
      was inefficient and needed to be rewritten.
      
      I have tested this on QEMU with a patch ensuring it matches the hardware
      specification.
      
      Link: https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg00000.html
      Fixes: e6d20c55 ("openrisc: entry: Fix delay slot detection")
      Signed-off-by: default avatarStafford Horne <shorne@gmail.com>
      ae15a41a
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 271b955e
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-07-01
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) A bpf_fib_lookup() helper fix to change the API before freeze to
         return an encoding of the FIB lookup result and return the nexthop
         device index in the params struct (instead of device index as return
         code that we had before), from David.
      
      2) Various BPF JIT fixes to address syzkaller fallout, that is, do not
         reject progs when set_memory_*() fails since it could still be RO.
         Also arm32 JIT was not using bpf_jit_binary_lock_ro() API which was
         an issue, and a memory leak in s390 JIT found during review, from
         Daniel.
      
      3) Multiple fixes for sockmap/hash to address most of the syzkaller
         triggered bugs. Usage with IPv6 was crashing, a GPF in bpf_tcp_close(),
         a missing sock_map_release() routine to hook up to callbacks, and a
         fix for an omitted bucket lock in sock_close(), from John.
      
      4) Two bpftool fixes to remove duplicated error message on program load,
         and another one to close the libbpf object after program load. One
         additional fix for nfp driver's BPF offload to avoid stopping offload
         completely if replace of program failed, from Jakub.
      
      5) Couple of BPF selftest fixes that bail out in some of the test
         scripts if the user does not have the right privileges, from Jeffrin.
      
      6) Fixes in test_bpf for s390 when CONFIG_BPF_JIT_ALWAYS_ON is set
         where we need to set the flag that some of the test cases are expected
         to fail, from Kleber.
      
      7) Fix to detangle BPF_LIRC_MODE2 dependency from CONFIG_CGROUP_BPF
         since it has no relation to it and lirc2 users often have configs
         without cgroups enabled and thus would not be able to use it, from Sean.
      
      8) Fix a selftest failure in sockmap by removing a useless setrlimit()
         call that would set a too low limit where at the same time we are
         already including bpf_rlimit.h that does the job, from Yonghong.
      
      9) Fix BPF selftest config with missing missing NET_SCHED, from Anders.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      271b955e
  6. 30 Jun, 2018 9 commits
    • Daniel Borkmann's avatar
      Merge branch 'bpf-sockmap-fixes' · bf2b866a
      Daniel Borkmann authored
      John Fastabend says:
      
      ====================
      This addresses two syzbot issues that lead to identifying (by Eric and
      Wei) a class of bugs where we don't correctly check for IPv4/v6
      sockets and their associated state. The second issue was a locking
      omission in sockhash.
      
      The first patch addresses IPv6 socks and fixing an error where
      sockhash would overwrite the prot pointer with IPv4 prot. To fix
      this build similar solution to TLS ULP. Although we continue to
      allow socks in all states not just ESTABLISH in this patch set
      because as Martin points out there should be no issue with this
      on the sockmap ULP because we don't use the ctx in this code. Once
      multiple ULPs coexist we may need to revisit this. However we
      can do this in *next trees.
      
      The other issue syzbot found that the tcp_close() handler missed
      locking the hash bucket lock which could result in corrupting the
      sockhash bucket list if delete and close ran at the same time.
      And also the smap_list_remove() routine was not working correctly
      at all. This was not caught in my testing because in general my
      tests (to date at least lets add some more robust selftest in
      bpf-next) do things in the "expected" order, create map, add socks,
      delete socks, then tear down maps. The tests we have that do the
      ops out of this order where only working on single maps not multi-
      maps so we never saw the issue. Thanks syzbot. The fix is to
      restructure the tcp_close() lock handling. And fix the obvious
      bug in smap_list_remove().
      
      Finally, during review I noticed the release handler was omitted
      from the upstream code (patch 4) due to an incorrect merge conflict
      fix when I ported the code to latest bpf-next before submitting.
      This would leave references to the map around if the user never
      closes the map.
      
      v3: rework patches, dropping ESTABLISH check and adding rcu
          annotation along with the smap_list_remove fix
      
      v4: missed one more case where maps was being accessed without
          the sk_callback_lock, spoted by Martin as well.
      
      v5: changed to use a specific lock for maps and reduced callback
          lock so that it is only used to gaurd sk callbacks. I think
          this makes the logic a bit cleaner and avoids confusion
          ovoer what each lock is doing.
      
      Also big thanks to Martin for thorough review he caught at least
      one case where I missed a rcu_call().
      ====================
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      bf2b866a
    • John Fastabend's avatar
      bpf: sockhash, add release routine · caac76a5
      John Fastabend authored
      Add map_release_uref pointer to hashmap ops. This was dropped when
      original sockhash code was ported into bpf-next before initial
      commit.
      
      Fixes: 81110384 ("bpf: sockmap, add hash map support")
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      caac76a5
    • John Fastabend's avatar
      bpf: sockhash fix omitted bucket lock in sock_close · e9db4ef6
      John Fastabend authored
      First the sk_callback_lock() was being used to protect both the
      sock callback hooks and the psock->maps list. This got overly
      convoluted after the addition of sockhash (in sockmap it made
      some sense because masp and callbacks were tightly coupled) so
      lets split out a specific lock for maps and only use the callback
      lock for its intended purpose. This fixes a couple cases where
      we missed using maps lock when it was in fact needed. Also this
      makes it easier to follow the code because now we can put the
      locking closer to the actual code its serializing.
      
      Next, in sock_hash_delete_elem() the pattern was as follows,
      
        sock_hash_delete_elem()
           [...]
           spin_lock(bucket_lock)
           l = lookup_elem_raw()
           if (l)
              hlist_del_rcu()
              write_lock(sk_callback_lock)
               .... destroy psock ...
              write_unlock(sk_callback_lock)
           spin_unlock(bucket_lock)
      
      The ordering is necessary because we only know the {p}sock after
      dereferencing the hash table which we can't do unless we have the
      bucket lock held. Once we have the bucket lock and the psock element
      it is deleted from the hashmap to ensure any other path doing a lookup
      will fail. Finally, the refcnt is decremented and if zero the psock
      is destroyed.
      
      In parallel with the above (or free'ing the map) a tcp close event
      may trigger tcp_close(). Which at the moment omits the bucket lock
      altogether (oops!) where the flow looks like this,
      
        bpf_tcp_close()
           [...]
           write_lock(sk_callback_lock)
           for each psock->maps // list of maps this sock is part of
               hlist_del_rcu(ref_hash_node);
               .... destroy psock ...
           write_unlock(sk_callback_lock)
      
      Obviously, and demonstrated by syzbot, this is broken because
      we can have multiple threads deleting entries via hlist_del_rcu().
      
      To fix this we might be tempted to wrap the hlist operation in a
      bucket lock but that would create a lock inversion problem. In
      summary to follow locking rules the psocks maps list needs the
      sk_callback_lock (after this patch maps_lock) but we need the bucket
      lock to do the hlist_del_rcu.
      
      To resolve the lock inversion problem pop the head of the maps list
      repeatedly and remove the reference until no more are left. If a
      delete happens in parallel from the BPF API that is OK as well because
      it will do a similar action, lookup the lock in the map/hash, delete
      it from the map/hash, and dec the refcnt. We check for this case
      before doing a destroy on the psock to ensure we don't have two
      threads tearing down a psock. The new logic is as follows,
      
        bpf_tcp_close()
        e = psock_map_pop(psock->maps) // done with map lock
        bucket_lock() // lock hash list bucket
        l = lookup_elem_raw(head, hash, key, key_size);
        if (l) {
           //only get here if elmnt was not already removed
           hlist_del_rcu()
           ... destroy psock...
        }
        bucket_unlock()
      
      And finally for all the above to work add missing locking around  map
      operations per above. Then add RCU annotations and use
      rcu_dereference/rcu_assign_pointer to manage values relying on RCU so
      that the object is not free'd from sock_hash_free() while it is being
      referenced in bpf_tcp_close().
      
      Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
      Fixes: 81110384 ("bpf: sockmap, add hash map support")
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      e9db4ef6
    • John Fastabend's avatar
      bpf: sockmap, fix smap_list_map_remove when psock is in many maps · 54fedb42
      John Fastabend authored
      If a hashmap is free'd with open socks it removes the reference to
      the hash entry from the psock. If that is the last reference to the
      psock then it will also be free'd by the reference counting logic.
      However the current logic that removes the hash reference from the
      list of references is broken. In smap_list_remove() we first check
      if the sockmap entry matches and then check if the hashmap entry
      matches. But, the sockmap entry sill always match because its NULL in
      this case which causes the first entry to be removed from the list.
      If this is always the "right" entry (because the user adds/removes
      entries in order) then everything is OK but otherwise a subsequent
      bpf_tcp_close() may reference a free'd object.
      
      To fix this create two list handlers one for sockmap and one for
      sockhash.
      
      Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
      Fixes: 81110384 ("bpf: sockmap, add hash map support")
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      54fedb42
    • John Fastabend's avatar
      bpf: sockmap, fix crash when ipv6 sock is added · 9901c5d7
      John Fastabend authored
      This fixes a crash where we assign tcp_prot to IPv6 sockets instead
      of tcpv6_prot.
      
      Previously we overwrote the sk->prot field with tcp_prot even in the
      AF_INET6 case. This patch ensures the correct tcp_prot and tcpv6_prot
      are used.
      
      Tested with 'netserver -6' and 'netperf -H [IPv6]' as well as
      'netperf -H [IPv4]'. The ESTABLISHED check resolves the previously
      crashing case here.
      
      Fixes: 174a79ff ("bpf: sockmap with sk redirect support")
      Reported-by: syzbot+5c063698bdbfac19f363@syzkaller.appspotmail.com
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      9901c5d7
    • Linus Torvalds's avatar
      Merge branch 'parisc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 883c9ab9
      Linus Torvalds authored
      Pull parisc fixes and cleanups from Helge Deller:
       "Nothing exiting in this patchset, just
      
         - small cleanups of header files
      
         - default to 4 CPUs when building a SMP kernel
      
         - mark 16kB and 64kB page sizes broken
      
         - addition of the new io_pgetevents syscall"
      
      * 'parisc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Build kernel without -ffunction-sections
        parisc: Reduce debug output in unwind code
        parisc: Wire up io_pgetevents syscall
        parisc: Default to 4 SMP CPUs
        parisc: Convert printk(KERN_LEVEL) to pr_lvl()
        parisc: Mark 16kB and 64kB page sizes BROKEN
        parisc: Drop struct sigaction from not exported header file
      883c9ab9
    • Linus Torvalds's avatar
      Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 08af78d7
      Linus Torvalds authored
      Pull ARM SoC fixes from Olof Johansson:
       "A smaller batch for the end of the week (let's see if I can keep the
        weekly cadence going for once).
      
        All medium-grade fixes here, nothing worrisome:
      
         - Fixes for some fairly old bugs around SD card write-protect
           detection and GPIO interrupt assignments on Davinci.
      
         - Wifi module suspend fix for Hikey.
      
         - Minor DT tweaks to fix inaccuracies for Amlogic platforms, one
           of which solves booting with third-party u-boot"
      
      * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        arm64: dts: hikey960: Define wl1837 power capabilities
        arm64: dts: hikey: Define wl1835 power capabilities
        ARM64: dts: meson-gxl: fix Mali GPU compatible string
        ARM64: dts: meson-axg: fix ethernet stability issue
        ARM64: dts: meson-gx: fix ATF reserved memory region
        ARM64: dts: meson-gxl-s905x-p212: Add phy-supply for usb0
        ARM64: dts: meson: fix register ranges for SD/eMMC
        ARM64: dts: meson: disable sd-uhs modes on the libretech-cc
        ARM: dts: da850: Fix interrups property for gpio
        ARM: davinci: board-da850-evm: fix WP pin polarity for MMC/SD
      08af78d7
    • Linus Torvalds's avatar
      Merge tag 'kbuild-fixes-v4.18' of... · 22d3e0c3
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - introduce __diag_* macros and suppress -Wattribute-alias warnings
         from GCC 8
      
       - fix stack protector test script for x86_64
      
       - fix line number handling in Kconfig
      
       - document that '#' starts a comment in Kconfig
      
       - handle P_SYMBOL property in dump debugging of Kconfig
      
       - correct help message of LD_DEAD_CODE_DATA_ELIMINATION
      
       - fix occasional segmentation faults in Kconfig
      
      * tag 'kbuild-fixes-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kconfig: loop boundary condition fix
        kbuild: reword help of LD_DEAD_CODE_DATA_ELIMINATION
        kconfig: handle P_SYMBOL in print_symbol()
        kconfig: document Kconfig source file comments
        kconfig: fix line numbers for if-entries in menu tree
        stack-protector: Fix test with 32-bit userland and CONFIG_64BIT=y
        powerpc: Remove -Wattribute-alias pragmas
        disable -Wattribute-alias warning for SYSCALL_DEFINEx()
        kbuild: add macro for controlling warnings to linux/compiler.h
      22d3e0c3
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0fbc4aea
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "The biggest diffstat comes from self-test updates, plus there's entry
        code fixes, 5-level paging related fixes, console debug output fixes,
        and misc fixes"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mm: Clean up the printk()s in show_fault_oops()
        x86/mm: Drop unneeded __always_inline for p4d page table helpers
        x86/efi: Fix efi_call_phys_epilog() with CONFIG_X86_5LEVEL=y
        selftests/x86/sigreturn: Do minor cleanups
        selftests/x86/sigreturn/64: Fix spurious failures on AMD CPUs
        x86/entry/64/compat: Fix "x86/entry/64/compat: Preserve r8-r11 in int $0x80"
        x86/mm: Don't free P4D table when it is folded at runtime
        x86/entry/32: Add explicit 'l' instruction suffix
        x86/mm: Get rid of KERN_CONT in show_fault_oops()
      0fbc4aea