1. 13 Mar, 2019 40 commits
    • Keith Busch's avatar
      nvme-pci: add missing unlock for reset error · 7066774e
      Keith Busch authored
      [ Upstream commit 4726bcf3 ]
      
      The reset work holds a mutex to prevent races with removal modifying the
      same resources, but was unlocking only on success. Unlock on failure
      too.
      
      Fixes: 5c959d73 ("nvme-pci: fix rapid add remove sequence")
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      7066774e
    • Liu Bo's avatar
      blk-iolatency: fix IO hang due to negative inflight counter · 6d482bc5
      Liu Bo authored
      [ Upstream commit 8c772a9b ]
      
      Our test reported the following stack, and vmcore showed that
      ->inflight counter is -1.
      
      [ffffc9003fcc38d0] __schedule at ffffffff8173d95d
      [ffffc9003fcc3958] schedule at ffffffff8173de26
      [ffffc9003fcc3970] io_schedule at ffffffff810bb6b6
      [ffffc9003fcc3988] blkcg_iolatency_throttle at ffffffff813911cb
      [ffffc9003fcc3a20] rq_qos_throttle at ffffffff813847f3
      [ffffc9003fcc3a48] blk_mq_make_request at ffffffff8137468a
      [ffffc9003fcc3b08] generic_make_request at ffffffff81368b49
      [ffffc9003fcc3b68] submit_bio at ffffffff81368d7d
      [ffffc9003fcc3bb8] ext4_io_submit at ffffffffa031be00 [ext4]
      [ffffc9003fcc3c00] ext4_writepages at ffffffffa03163de [ext4]
      [ffffc9003fcc3d68] do_writepages at ffffffff811c49ae
      [ffffc9003fcc3d78] __filemap_fdatawrite_range at ffffffff811b6188
      [ffffc9003fcc3e30] filemap_write_and_wait_range at ffffffff811b6301
      [ffffc9003fcc3e60] ext4_sync_file at ffffffffa030cee8 [ext4]
      [ffffc9003fcc3ea8] vfs_fsync_range at ffffffff8128594b
      [ffffc9003fcc3ee8] do_fsync at ffffffff81285abd
      [ffffc9003fcc3f18] sys_fsync at ffffffff81285d50
      [ffffc9003fcc3f28] do_syscall_64 at ffffffff81003c04
      [ffffc9003fcc3f50] entry_SYSCALL_64_after_swapgs at ffffffff81742b8e
      
      The ->inflight counter may be negative (-1) if
      
      1) blk-iolatency was disabled when the IO was issued,
      
      2) blk-iolatency was enabled before this IO reached its endio,
      
      3) the ->inflight counter is decreased from 0 to -1 in endio()
      
      In fact the hang can be easily reproduced by the below script,
      
      H=/sys/fs/cgroup/unified/
      P=/sys/fs/cgroup/unified/test
      
      echo "+io" > $H/cgroup.subtree_control
      mkdir -p $P
      
      echo $$ > $P/cgroup.procs
      
      xfs_io -f -d -c "pwrite 0 4k" /dev/sdg
      
      echo "`cat /sys/block/sdg/dev` target=1000000" > $P/io.latency
      
      xfs_io -f -d -c "pwrite 0 4k" /dev/sdg
      
      This fixes the problem by freezing the queue so that while
      enabling/disabling iolatency, there is no inflight rq running.
      
      Note that quiesce_queue is not needed as this only updating iolatency
      configuration about which dispatching request_queue doesn't care.
      Signed-off-by: default avatarLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6d482bc5
    • Sudarsana Reddy Kalluru's avatar
      qede: Fix system crash on configuring channels. · 1781ae6f
      Sudarsana Reddy Kalluru authored
      [ Upstream commit 0aa4febb ]
      
      Under heavy traffic load, when changing number of channels via
      ethtool (ethtool -L) which will cause interface to be reloaded,
      it was observed that some packets gets transmitted on old TX
      channel/queue id which doesn't really exist after the channel
      configuration leads to system crash.
      
      Add a safeguard in the driver by validating queue id through
      ndo_select_queue() which is called before the ndo_start_xmit().
      Signed-off-by: default avatarSudarsana Reddy Kalluru <skalluru@marvell.com>
      Signed-off-by: default avatarAriel Elior <aelior@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      1781ae6f
    • Sudarsana Reddy Kalluru's avatar
      qed: Consider TX tcs while deriving the max num_queues for PF. · 84828dd2
      Sudarsana Reddy Kalluru authored
      [ Upstream commit fb1faab7 ]
      
      Max supported queues is derived incorrectly in the case of multi-CoS.
      Need to consider TCs while calculating num_queues for PF.
      Signed-off-by: default avatarSudarsana Reddy Kalluru <skalluru@marvell.com>
      Signed-off-by: default avatarAriel Elior <aelior@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      84828dd2
    • Manish Chopra's avatar
      qed: Fix EQ full firmware assert. · d727c0ed
      Manish Chopra authored
      [ Upstream commit 660492bc ]
      
      When slowpath messages are sent with high rate, the resulting
      events can lead to a FW assert in case they are not handled fast
      enough (Event Queue Full assert). Attempt to send queued slowpath
      messages only after the newly evacuated entries in the EQ ring
      are indicated to FW.
      Signed-off-by: default avatarManish Chopra <manishc@marvell.com>
      Signed-off-by: default avatarAriel Elior <aelior@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d727c0ed
    • Tetsuo Handa's avatar
      fs: ratelimit __find_get_block_slow() failure message. · 72426ed2
      Tetsuo Handa authored
      [ Upstream commit 43636c80 ]
      
      When something let __find_get_block_slow() hit all_mapped path, it calls
      printk() for 100+ times per a second. But there is no need to print same
      message with such high frequency; it is just asking for stall warning, or
      at least bloating log files.
      
        [  399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
        [  399.873324][T15342] b_state=0x00000029, b_size=512
        [  399.878403][T15342] device loop0 blocksize: 4096
        [  399.883296][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
        [  399.890400][T15342] b_state=0x00000029, b_size=512
        [  399.895595][T15342] device loop0 blocksize: 4096
        [  399.900556][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
        [  399.907471][T15342] b_state=0x00000029, b_size=512
        [  399.912506][T15342] device loop0 blocksize: 4096
      
      This patch reduces frequency to up to once per a second, in addition to
      concatenating three lines into one.
      
        [  399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8, b_state=0x00000029, b_size=512, device loop0 blocksize: 4096
      Signed-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      72426ed2
    • Keith Busch's avatar
      nvme-pci: fix rapid add remove sequence · 3cc6703d
      Keith Busch authored
      [ Upstream commit 5c959d73 ]
      
      A surprise removal may fail to tear down request queues if it is racing
      with the initial asynchronous probe. If that happens, the remove path
      won't see the queue resources to tear down, and the controller reset
      path may create a new request queue on a removed device, but will not
      be able to make forward progress, deadlocking the pci removal.
      
      Protect setting up non-blocking resources from a shutdown by holding the
      same mutex, and transition to the CONNECTING state after these resources
      are initialized so the probe path may see the dead controller state
      before dispatching new IO.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202081Reported-by: default avatarAlex Gagniuc <Alex_Gagniuc@Dellteam.com>
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Tested-by: default avatarAlex Gagniuc <mr.nuke.me@gmail.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      3cc6703d
    • Keith Busch's avatar
      nvme: lock NS list changes while handling command effects · e3aabe4c
      Keith Busch authored
      [ Upstream commit e7ad43c3 ]
      
      If a controller supports the NS Change Notification, the namespace
      scan_work is automatically triggered after attaching a new namespace.
      
      Occasionally the namespace scan_work may append the new namespace to the
      list before the admin command effects handling is completed. The effects
      handling unfreezes namespaces, but if it unfreezes the newly attached
      namespace, its request_queue freeze depth will be off and we'll hit the
      warning in blk_mq_unfreeze_queue().
      
      On the next namespace add, we will fail to freeze that queue due to the
      previous bad accounting and deadlock waiting for frozen.
      
      Fix that by preventing scan work from altering the namespace list while
      command effects handling needs to pair freeze with unfreeze.
      Reported-by: default avatarWen Xiong <wenxiong@us.ibm.com>
      Tested-by: default avatarWen Xiong <wenxiong@us.ibm.com>
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Reviewed-by: default avatarChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      e3aabe4c
    • Philip Yang's avatar
      drm/amdgpu: use spin_lock_irqsave to protect vm_manager.pasid_idr · 25aa5c8b
      Philip Yang authored
      [ Upstream commit 0a5f49cb ]
      
      amdgpu_vm_get_task_info is called from interrupt handler and sched timeout
      workqueue, we should use irq version spin_lock to avoid deadlock.
      Signed-off-by: default avatarPhilip Yang <Philip.Yang@amd.com>
      Reviewed-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      25aa5c8b
    • Tony Lindgren's avatar
      i2c: omap: Use noirq system sleep pm ops to idle device for suspend · ee84b62f
      Tony Lindgren authored
      [ Upstream commit c6e2bd95 ]
      
      We currently get the following error with pixcir_ts driver during a
      suspend resume cycle:
      
      omap_i2c 4802a000.i2c: controller timed out
      pixcir_ts 1-005c: pixcir_int_enable: can't read reg 0x34 : -110
      pixcir_ts 1-005c: Failed to disable interrupt generation: -110
      pixcir_ts 1-005c: Failed to stop
      dpm_run_callback(): pixcir_i2c_ts_resume+0x0/0x98
      [pixcir_i2c_ts] returns -110
      PM: Device 1-005c failed to resume: error -110
      
      And at least am437x based devices with pixcir_ts will fail to resume
      to a touchscreen that is configured as the wakeup-source in device
      tree for these devices.
      
      This is because pixcir_ts tries to reconfigure it's registers for
      noirq suspend which fails. This also leaves i2c-omap in enabled state
      for suspend.
      
      Let's fix the pixcir_ts issue and make sure i2c-omap is suspended by
      adding SET_NOIRQ_SYSTEM_SLEEP_PM_OPS.
      
      Let's also get rid of some ifdefs while at it and replace them with
      __maybe_unused as SET_RUNTIME_PM_OPS and SET_NOIRQ_SYSTEM_SLEEP_PM_OPS
      already deal with the various PM Kconfig options.
      Reported-by: default avatarKeerthy <j-keerthy@ti.com>
      Signed-off-by: default avatarTony Lindgren <tony@atomide.com>
      Acked-by: default avatarVignesh R <vigneshr@ti.com>
      Signed-off-by: default avatarWolfram Sang <wsa@the-dreams.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ee84b62f
    • Ross Lagerwall's avatar
      Revert "scsi: libfc: Add WARN_ON() when deleting rports" · 29f7b376
      Ross Lagerwall authored
      [ Upstream commit d8f6382a ]
      
      This reverts commit bbc0f8bd.
      
      It added a warning whose intent was to check whether the rport was still
      linked into the peer list. It doesn't work as intended and gives false
      positive warnings for two reasons:
      
      1) If the rport is never linked into the peer list it will not be
      considered empty since the list_head is never initialized.
      
      2) If the rport is deleted from the peer list using list_del_rcu(), then
      the list_head is in an undefined state and it is not considered empty.
      Signed-off-by: default avatarRoss Lagerwall <ross.lagerwall@citrix.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      29f7b376
    • Jun-Ru Chang's avatar
      MIPS: Remove function size check in get_frame_info() · cd8520a2
      Jun-Ru Chang authored
      [ Upstream commit 2b424cfc ]
      
      Patch (b6c7a324 "MIPS: Fix get_frame_info() handling of
      microMIPS function size.") introduces additional function size
      check for microMIPS by only checking insn between ip and ip + func_size.
      However, func_size in get_frame_info() is always 0 if KALLSYMS is not
      enabled. This causes get_frame_info() to return immediately without
      calculating correct frame_size, which in turn causes "Can't analyze
      schedule() prologue" warning messages at boot time.
      
      This patch removes func_size check, and let the frame_size check run
      up to 128 insns for both MIPS and microMIPS.
      Signed-off-by: default avatarJun-Ru Chang <jrjang@realtek.com>
      Signed-off-by: default avatarTony Wu <tonywu@realtek.com>
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Fixes: b6c7a324 ("MIPS: Fix get_frame_info() handling of microMIPS function size.")
      Cc: <ralf@linux-mips.org>
      Cc: <jhogan@kernel.org>
      Cc: <macro@mips.com>
      Cc: <yamada.masahiro@socionext.com>
      Cc: <peterz@infradead.org>
      Cc: <mingo@kernel.org>
      Cc: <linux-mips@vger.kernel.org>
      Cc: <linux-kernel@vger.kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      cd8520a2
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Support multiple "vfs_getname" probes · 738f9e27
      Arnaldo Carvalho de Melo authored
      [ Upstream commit 6ab3bc24 ]
      
      With a suitably defined "probe:vfs_getname" probe, 'perf trace' can
      "beautify" its output, so syscalls like open() or openat() can print the
      "filename" argument instead of just its hex address, like:
      
        $ perf trace -e open -- touch /dev/null
        [...]
             0.590 ( 0.014 ms): touch/18063 open(filename: /dev/null, flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) = 3
        [...]
      
      The output without such beautifier looks like:
      
           0.529 ( 0.011 ms): touch/18075 open(filename: 0xc78cf288, flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) = 3
      
      However, when the vfs_getname probe expands to multiple probes and it is
      not the first one that is hit, the beautifier fails, as following:
      
           0.326 ( 0.010 ms): touch/18072 open(filename: , flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) = 3
      
      Fix it by hooking into all the expanded probes (inlines), now, for instance:
      
        [root@quaco ~]# perf probe -l
          probe:vfs_getname    (on getname_flags:73@fs/namei.c with pathname)
          probe:vfs_getname_1  (on getname_flags:73@fs/namei.c with pathname)
        [root@quaco ~]# perf trace -e open* sleep 1
             0.010 ( 0.005 ms): sleep/5588 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: RDONLY|CLOEXEC)   = 3
             0.029 ( 0.006 ms): sleep/5588 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: RDONLY|CLOEXEC)   = 3
             0.194 ( 0.008 ms): sleep/5588 openat(dfd: CWD, filename: /usr/lib/locale/locale-archive, flags: RDONLY|CLOEXEC) = 3
        [root@quaco ~]#
      
      Works, further verified with:
      
        [root@quaco ~]# perf test vfs
        65: Use vfs_getname probe to get syscall args filenames   : Ok
        66: Add vfs_getname probe to get syscall args filenames   : Ok
        67: Check open filename arg using perf trace + vfs_getname: Ok
        [root@quaco ~]#
      Reported-by: default avatarMichael Petlan <mpetlan@redhat.com>
      Tested-by: default avatarMichael Petlan <mpetlan@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-mv8kolk17xla1smvmp3qabv1@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      738f9e27
    • Jiri Olsa's avatar
      perf symbols: Filter out hidden symbols from labels · 47e3f3c0
      Jiri Olsa authored
      [ Upstream commit 59a17706 ]
      
      When perf is built with the annobin plugin (RHEL8 build) extra symbols
      are added to its binary:
      
        # nm perf | grep annobin | head -10
        0000000000241100 t .annobin_annotate.c
        0000000000326490 t .annobin_annotate.c
        0000000000249255 t .annobin_annotate.c_end
        00000000003283a8 t .annobin_annotate.c_end
        00000000001bce18 t .annobin_annotate.c_end.hot
        00000000001bce18 t .annobin_annotate.c_end.hot
        00000000001bc3e2 t .annobin_annotate.c_end.unlikely
        00000000001bc400 t .annobin_annotate.c_end.unlikely
        00000000001bce18 t .annobin_annotate.c.hot
        00000000001bce18 t .annobin_annotate.c.hot
        ...
      
      Those symbols have no use for report or annotation and should be
      skipped.  Moreover they interfere with the DWARF unwind test on the PPC
      arch, where they are mixed with checked symbols and then the test fails:
      
        # perf test dwarf -v
        59: Test dwarf unwind                                     :
        --- start ---
        test child forked, pid 8515
        unwind: .annobin_dwarf_unwind.c:ip = 0x10dba40dc (0x2740dc)
        ...
        got: .annobin_dwarf_unwind.c 0x10dba40dc, expecting test__arch_unwind_sample
        unwind: failed with 'no error'
      
      The annobin symbols are defined as NOTYPE/LOCAL/HIDDEN:
      
        # readelf -s ./perf | grep annobin | head -1
          40: 00000000001bce4f     0 NOTYPE  LOCAL  HIDDEN    13 .annobin_init.c
      
      They can still pass the check for the label symbol. Adding check for
      HIDDEN and INTERNAL (as suggested by Nick below) visibility and filter
      out such symbols.
      
      >   Just to be awkward, if you are going to ignore STV_HIDDEN
      >   symbols then you should probably also ignore STV_INTERNAL ones
      >   as well...  Annobin does not generate them, but you never know,
      >   one day some other tool might create some.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nick Clifton <nickc@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20190128133526.GD15461@kravaSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      47e3f3c0
    • Julian Wiedmann's avatar
      s390/qeth: cancel close_dev work before removing a card · 825e58bc
      Julian Wiedmann authored
      [ Upstream commit c2780c1a ]
      
      A card's close_dev work is scheduled on a driver-wide workqueue. If the
      card is removed and freed while the work is still active, this causes a
      use-after-free.
      So make sure that the work is completed before freeing the card.
      
      Fixes: 0f54761d ("qeth: Support VEPA mode")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      825e58bc
    • Julian Wiedmann's avatar
      s390/qeth: fix use-after-free in error path · 5327c553
      Julian Wiedmann authored
      [ Upstream commit afa0c590 ]
      
      The error path in qeth_alloc_qdio_buffers() that takes care of
      cleaning up the Output Queues is buggy. It first frees the queue, but
      then calls qeth_clear_outq_buffers() with that very queue struct.
      
      Make the call to qeth_clear_outq_buffers() part of the free action
      (in the correct order), and while at it fix the naming of the helper.
      
      Fixes: 0da9581d ("qeth: exploit asynchronous delivery of storage blocks")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.ibm.com>
      Reviewed-by: default avatarAlexandra Winter <wintera@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      5327c553
    • Julian Wiedmann's avatar
      s390/qeth: release cmd buffer in error paths · 575a2461
      Julian Wiedmann authored
      [ Upstream commit 5065b2dd ]
      
      Whenever we fail before/while starting an IO, make sure to release the
      IO buffer. Usually qeth_irq() would do this for us, but if the IO
      doesn't even start we obviously won't get an interrupt for it either.
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      575a2461
    • Martynas Pumputis's avatar
      netfilter: nf_nat: skip nat clash resolution for same-origin entries · 5058447b
      Martynas Pumputis authored
      [ Upstream commit 4e35c1cb ]
      
      It is possible that two concurrent packets originating from the same
      socket of a connection-less protocol (e.g. UDP) can end up having
      different IP_CT_DIR_REPLY tuples which results in one of the packets
      being dropped.
      
      To illustrate this, consider the following simplified scenario:
      
      1. Packet A and B are sent at the same time from two different threads
         by same UDP socket.  No matching conntrack entry exists yet.
         Both packets cause allocation of a new conntrack entry.
      2. get_unique_tuple gets called for A.  No clashing entry found.
         conntrack entry for A is added to main conntrack table.
      3. get_unique_tuple is called for B and will find that the reply
         tuple of B is already taken by A.
         It will allocate a new UDP source port for B to resolve the clash.
      4. conntrack entry for B cannot be added to main conntrack table
         because its ORIGINAL direction is clashing with A and the REPLY
         directions of A and B are not the same anymore due to UDP source
         port reallocation done in step 3.
      
      This patch modifies nf_conntrack_tuple_taken so it doesn't consider
      colliding reply tuples if the IP_CT_DIR_ORIGINAL tuples are equal.
      
      [ Florian: simplify patch to not use .allow_clash setting
        and always ignore identical flows ]
      Signed-off-by: default avatarMartynas Pumputis <martynas@weave.works>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      5058447b
    • Florian Westphal's avatar
      selftests: netfilter: add simple masq/redirect test cases · 5c39e08f
      Florian Westphal authored
      [ Upstream commit 98bfc341 ]
      
      Check basic nat/redirect/masquerade for ipv4 and ipv6.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      5c39e08f
    • Naresh Kamboju's avatar
      selftests: netfilter: fix config fragment CONFIG_NF_TABLES_INET · 974ed365
      Naresh Kamboju authored
      [ Upstream commit 952b72f8 ]
      
      In selftests the config fragment for netfilter was added as
      NF_TABLES_INET=y and this patch correct it as CONFIG_NF_TABLES_INET=y
      Signed-off-by: default avatarNaresh Kamboju <naresh.kamboju@linaro.org>
      Acked-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      974ed365
    • Andy Shevchenko's avatar
      dmaengine: dmatest: Abort test in case of mapping error · 0203f0c9
      Andy Shevchenko authored
      [ Upstream commit 6454368a ]
      
      In case of mapping error the DMA addresses are invalid and continuing
      will screw system memory or potentially something else.
      
      [  222.480310] dmatest: dma0chan7-copy0: summary 1 tests, 3 failures 6 iops 349 KB/s (0)
      ...
      [  240.912725] check: Corrupted low memory at 00000000c7c75ac9 (2940 phys) = 5656000000000000
      [  240.921998] check: Corrupted low memory at 000000005715a1cd (2948 phys) = 279f2aca5595ab2b
      [  240.931280] check: Corrupted low memory at 000000002f4024c0 (2950 phys) = 5e5624f349e793cf
      ...
      
      Abort any test if mapping failed.
      
      Fixes: 4076e755 ("dmatest: convert to dmaengine_unmap_data")
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: default avatarVinod Koul <vkoul@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      0203f0c9
    • Stefano Garzarella's avatar
      vsock/virtio: reset connected sockets on device removal · 5eae5899
      Stefano Garzarella authored
      [ Upstream commit 85965487 ]
      
      When the virtio transport device disappear, we should reset all
      connected sockets in order to inform the users.
      Signed-off-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Reviewed-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      5eae5899
    • Stefano Garzarella's avatar
      vsock/virtio: fix kernel panic after device hot-unplug · cd201356
      Stefano Garzarella authored
      [ Upstream commit 22b5c0b6 ]
      
      virtio_vsock_remove() invokes the vsock_core_exit() also if there
      are opened sockets for the AF_VSOCK protocol family. In this way
      the vsock "transport" pointer is set to NULL, triggering the
      kernel panic at the first socket activity.
      
      This patch move the vsock_core_init()/vsock_core_exit() in the
      virtio_vsock respectively in module_init and module_exit functions,
      that cannot be invoked until there are open sockets.
      
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1609699Reported-by: default avatarYan Fu <yafu@redhat.com>
      Signed-off-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Acked-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      cd201356
    • Codrin Ciubotariu's avatar
      dmaengine: at_xdmac: Fix wrongfull report of a channel as in use · f3ffd455
      Codrin Ciubotariu authored
      [ Upstream commit dc3f595b ]
      
      atchan->status variable is used to store two different information:
       - pass channel interrupts status from interrupt handler to tasklet;
       - channel information like whether it is cyclic or paused;
      
      This causes a bug when device_terminate_all() is called,
      (AT_XDMAC_CHAN_IS_CYCLIC cleared on atchan->status) and then a late End
      of Block interrupt arrives (AT_XDMAC_CIS_BIS), which sets bit 0 of
      atchan->status. Bit 0 is also used for AT_XDMAC_CHAN_IS_CYCLIC, so when
      a new descriptor for a cyclic transfer is created, the driver reports
      the channel as in use:
      
      if (test_and_set_bit(AT_XDMAC_CHAN_IS_CYCLIC, &atchan->status)) {
      	dev_err(chan2dev(chan), "channel currently used\n");
      	return NULL;
      }
      
      This patch fixes the bug by adding a different struct member to keep
      the interrupts status separated from the channel status bits.
      
      Fixes: e1f7c9ee ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver")
      Signed-off-by: default avatarCodrin Ciubotariu <codrin.ciubotariu@microchip.com>
      Acked-by: default avatarLudovic Desroches <ludovic.desroches@microchip.com>
      Signed-off-by: default avatarVinod Koul <vkoul@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      f3ffd455
    • Paul Kocialkowski's avatar
      drm/sun4i: tcon: Prepare and enable TCON channel 0 clock at init · 7cf4466d
      Paul Kocialkowski authored
      [ Upstream commit b14e945b ]
      
      When initializing clocks, a reference to the TCON channel 0 clock is
      obtained. However, the clock is never prepared and enabled later.
      Switching from simplefb to DRM actually disables the clock (that was
      usually configured by U-Boot) because of that.
      
      On the V3s, this results in a hang when writing to some mixer registers
      when switching over to DRM from simplefb.
      
      Fix this by preparing and enabling the clock when initializing other
      clocks. Waiting for sun4i_tcon_channel_enable to enable the clock is
      apparently too late and results in the same mixer register access hang.
      Signed-off-by: default avatarPaul Kocialkowski <paul.kocialkowski@bootlin.com>
      Signed-off-by: default avatarMaxime Ripard <maxime.ripard@bootlin.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190131132550.26355-1-paul.kocialkowski@bootlin.comSigned-off-by: default avatarSasha Levin <sashal@kernel.org>
      7cf4466d
    • Martin KaFai Lau's avatar
      bpf: Fix syscall's stackmap lookup potential deadlock · ae26a710
      Martin KaFai Lau authored
      [ Upstream commit 7c4cd051 ]
      
      The map_lookup_elem used to not acquiring spinlock
      in order to optimize the reader.
      
      It was true until commit 557c0c6e ("bpf: convert stackmap to pre-allocation")
      The syscall's map_lookup_elem(stackmap) calls bpf_stackmap_copy().
      bpf_stackmap_copy() may find the elem no longer needed after the copy is done.
      If that is the case, pcpu_freelist_push() saves this elem for reuse later.
      This push requires a spinlock.
      
      If a tracing bpf_prog got run in the middle of the syscall's
      map_lookup_elem(stackmap) and this tracing bpf_prog is calling
      bpf_get_stackid(stackmap) which also requires the same pcpu_freelist's
      spinlock, it may end up with a dead lock situation as reported by
      Eric Dumazet in https://patchwork.ozlabs.org/patch/1030266/
      
      The situation is the same as the syscall's map_update_elem() which
      needs to acquire the pcpu_freelist's spinlock and could race
      with tracing bpf_prog.  Hence, this patch fixes it by protecting
      bpf_stackmap_copy() with this_cpu_inc(bpf_prog_active)
      to prevent tracing bpf_prog from running.
      
      A later syscall's map_lookup_elem commit f1a2e44a ("bpf: add queue and stack maps")
      also acquires a spinlock and races with tracing bpf_prog similarly.
      Hence, this patch is forward looking and protects the majority
      of the map lookups.  bpf_map_offload_lookup_elem() is the exception
      since it is for network bpf_prog only (i.e. never called by tracing
      bpf_prog).
      
      Fixes: 557c0c6e ("bpf: convert stackmap to pre-allocation")
      Reported-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ae26a710
    • Alexei Starovoitov's avatar
      bpf: fix potential deadlock in bpf_prog_register · 3bbe6a42
      Alexei Starovoitov authored
      [ Upstream commit e16ec340 ]
      
      Lockdep found a potential deadlock between cpu_hotplug_lock, bpf_event_mutex, and cpuctx_mutex:
      [   13.007000] WARNING: possible circular locking dependency detected
      [   13.007587] 5.0.0-rc3-00018-g2fa53f89-dirty #477 Not tainted
      [   13.008124] ------------------------------------------------------
      [   13.008624] test_progs/246 is trying to acquire lock:
      [   13.009030] 0000000094160d1d (tracepoints_mutex){+.+.}, at: tracepoint_probe_register_prio+0x2d/0x300
      [   13.009770]
      [   13.009770] but task is already holding lock:
      [   13.010239] 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
      [   13.010877]
      [   13.010877] which lock already depends on the new lock.
      [   13.010877]
      [   13.011532]
      [   13.011532] the existing dependency chain (in reverse order) is:
      [   13.012129]
      [   13.012129] -> #4 (bpf_event_mutex){+.+.}:
      [   13.012582]        perf_event_query_prog_array+0x9b/0x130
      [   13.013016]        _perf_ioctl+0x3aa/0x830
      [   13.013354]        perf_ioctl+0x2e/0x50
      [   13.013668]        do_vfs_ioctl+0x8f/0x6a0
      [   13.014003]        ksys_ioctl+0x70/0x80
      [   13.014320]        __x64_sys_ioctl+0x16/0x20
      [   13.014668]        do_syscall_64+0x4a/0x180
      [   13.015007]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   13.015469]
      [   13.015469] -> #3 (&cpuctx_mutex){+.+.}:
      [   13.015910]        perf_event_init_cpu+0x5a/0x90
      [   13.016291]        perf_event_init+0x1b2/0x1de
      [   13.016654]        start_kernel+0x2b8/0x42a
      [   13.016995]        secondary_startup_64+0xa4/0xb0
      [   13.017382]
      [   13.017382] -> #2 (pmus_lock){+.+.}:
      [   13.017794]        perf_event_init_cpu+0x21/0x90
      [   13.018172]        cpuhp_invoke_callback+0xb3/0x960
      [   13.018573]        _cpu_up+0xa7/0x140
      [   13.018871]        do_cpu_up+0xa4/0xc0
      [   13.019178]        smp_init+0xcd/0xd2
      [   13.019483]        kernel_init_freeable+0x123/0x24f
      [   13.019878]        kernel_init+0xa/0x110
      [   13.020201]        ret_from_fork+0x24/0x30
      [   13.020541]
      [   13.020541] -> #1 (cpu_hotplug_lock.rw_sem){++++}:
      [   13.021051]        static_key_slow_inc+0xe/0x20
      [   13.021424]        tracepoint_probe_register_prio+0x28c/0x300
      [   13.021891]        perf_trace_event_init+0x11f/0x250
      [   13.022297]        perf_trace_init+0x6b/0xa0
      [   13.022644]        perf_tp_event_init+0x25/0x40
      [   13.023011]        perf_try_init_event+0x6b/0x90
      [   13.023386]        perf_event_alloc+0x9a8/0xc40
      [   13.023754]        __do_sys_perf_event_open+0x1dd/0xd30
      [   13.024173]        do_syscall_64+0x4a/0x180
      [   13.024519]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   13.024968]
      [   13.024968] -> #0 (tracepoints_mutex){+.+.}:
      [   13.025434]        __mutex_lock+0x86/0x970
      [   13.025764]        tracepoint_probe_register_prio+0x2d/0x300
      [   13.026215]        bpf_probe_register+0x40/0x60
      [   13.026584]        bpf_raw_tracepoint_open.isra.34+0xa4/0x130
      [   13.027042]        __do_sys_bpf+0x94f/0x1a90
      [   13.027389]        do_syscall_64+0x4a/0x180
      [   13.027727]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   13.028171]
      [   13.028171] other info that might help us debug this:
      [   13.028171]
      [   13.028807] Chain exists of:
      [   13.028807]   tracepoints_mutex --> &cpuctx_mutex --> bpf_event_mutex
      [   13.028807]
      [   13.029666]  Possible unsafe locking scenario:
      [   13.029666]
      [   13.030140]        CPU0                    CPU1
      [   13.030510]        ----                    ----
      [   13.030875]   lock(bpf_event_mutex);
      [   13.031166]                                lock(&cpuctx_mutex);
      [   13.031645]                                lock(bpf_event_mutex);
      [   13.032135]   lock(tracepoints_mutex);
      [   13.032441]
      [   13.032441]  *** DEADLOCK ***
      [   13.032441]
      [   13.032911] 1 lock held by test_progs/246:
      [   13.033239]  #0: 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
      [   13.033909]
      [   13.033909] stack backtrace:
      [   13.034258] CPU: 1 PID: 246 Comm: test_progs Not tainted 5.0.0-rc3-00018-g2fa53f89-dirty #477
      [   13.034964] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
      [   13.035657] Call Trace:
      [   13.035859]  dump_stack+0x5f/0x8b
      [   13.036130]  print_circular_bug.isra.37+0x1ce/0x1db
      [   13.036526]  __lock_acquire+0x1158/0x1350
      [   13.036852]  ? lock_acquire+0x98/0x190
      [   13.037154]  lock_acquire+0x98/0x190
      [   13.037447]  ? tracepoint_probe_register_prio+0x2d/0x300
      [   13.037876]  __mutex_lock+0x86/0x970
      [   13.038167]  ? tracepoint_probe_register_prio+0x2d/0x300
      [   13.038600]  ? tracepoint_probe_register_prio+0x2d/0x300
      [   13.039028]  ? __mutex_lock+0x86/0x970
      [   13.039337]  ? __mutex_lock+0x24a/0x970
      [   13.039649]  ? bpf_probe_register+0x1d/0x60
      [   13.039992]  ? __bpf_trace_sched_wake_idle_without_ipi+0x10/0x10
      [   13.040478]  ? tracepoint_probe_register_prio+0x2d/0x300
      [   13.040906]  tracepoint_probe_register_prio+0x2d/0x300
      [   13.041325]  bpf_probe_register+0x40/0x60
      [   13.041649]  bpf_raw_tracepoint_open.isra.34+0xa4/0x130
      [   13.042068]  ? __might_fault+0x3e/0x90
      [   13.042374]  __do_sys_bpf+0x94f/0x1a90
      [   13.042678]  do_syscall_64+0x4a/0x180
      [   13.042975]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   13.043382] RIP: 0033:0x7f23b10a07f9
      [   13.045155] RSP: 002b:00007ffdef42fdd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
      [   13.045759] RAX: ffffffffffffffda RBX: 00007ffdef42ff70 RCX: 00007f23b10a07f9
      [   13.046326] RDX: 0000000000000070 RSI: 00007ffdef42fe10 RDI: 0000000000000011
      [   13.046893] RBP: 00007ffdef42fdf0 R08: 0000000000000038 R09: 00007ffdef42fe10
      [   13.047462] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
      [   13.048029] R13: 0000000000000016 R14: 00007f23b1db4690 R15: 0000000000000000
      
      Since tracepoints_mutex will be taken in tracepoint_probe_register/unregister()
      there is no need to take bpf_event_mutex too.
      bpf_event_mutex is protecting modifications to prog array used in kprobe/perf bpf progs.
      bpf_raw_tracepoints don't need to take this mutex.
      
      Fixes: c4f6699d ("bpf: introduce BPF_RAW_TRACEPOINT")
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      3bbe6a42
    • Alexei Starovoitov's avatar
      bpf: fix lockdep false positive in percpu_freelist · e3bc64c9
      Alexei Starovoitov authored
      [ Upstream commit a89fac57 ]
      
      Lockdep warns about false positive:
      [   12.492084] 00000000e6b28347 (&head->lock){+...}, at: pcpu_freelist_push+0x2a/0x40
      [   12.492696] but this lock was taken by another, HARDIRQ-safe lock in the past:
      [   12.493275]  (&rq->lock){-.-.}
      [   12.493276]
      [   12.493276]
      [   12.493276] and interrupts could create inverse lock ordering between them.
      [   12.493276]
      [   12.494435]
      [   12.494435] other info that might help us debug this:
      [   12.494979]  Possible interrupt unsafe locking scenario:
      [   12.494979]
      [   12.495518]        CPU0                    CPU1
      [   12.495879]        ----                    ----
      [   12.496243]   lock(&head->lock);
      [   12.496502]                                local_irq_disable();
      [   12.496969]                                lock(&rq->lock);
      [   12.497431]                                lock(&head->lock);
      [   12.497890]   <Interrupt>
      [   12.498104]     lock(&rq->lock);
      [   12.498368]
      [   12.498368]  *** DEADLOCK ***
      [   12.498368]
      [   12.498837] 1 lock held by dd/276:
      [   12.499110]  #0: 00000000c58cb2ee (rcu_read_lock){....}, at: trace_call_bpf+0x5e/0x240
      [   12.499747]
      [   12.499747] the shortest dependencies between 2nd lock and 1st lock:
      [   12.500389]  -> (&rq->lock){-.-.} {
      [   12.500669]     IN-HARDIRQ-W at:
      [   12.500934]                       _raw_spin_lock+0x2f/0x40
      [   12.501373]                       scheduler_tick+0x4c/0xf0
      [   12.501812]                       update_process_times+0x40/0x50
      [   12.502294]                       tick_periodic+0x27/0xb0
      [   12.502723]                       tick_handle_periodic+0x1f/0x60
      [   12.503203]                       timer_interrupt+0x11/0x20
      [   12.503651]                       __handle_irq_event_percpu+0x43/0x2c0
      [   12.504167]                       handle_irq_event_percpu+0x20/0x50
      [   12.504674]                       handle_irq_event+0x37/0x60
      [   12.505139]                       handle_level_irq+0xa7/0x120
      [   12.505601]                       handle_irq+0xa1/0x150
      [   12.506018]                       do_IRQ+0x77/0x140
      [   12.506411]                       ret_from_intr+0x0/0x1d
      [   12.506834]                       _raw_spin_unlock_irqrestore+0x53/0x60
      [   12.507362]                       __setup_irq+0x481/0x730
      [   12.507789]                       setup_irq+0x49/0x80
      [   12.508195]                       hpet_time_init+0x21/0x32
      [   12.508644]                       x86_late_time_init+0xb/0x16
      [   12.509106]                       start_kernel+0x390/0x42a
      [   12.509554]                       secondary_startup_64+0xa4/0xb0
      [   12.510034]     IN-SOFTIRQ-W at:
      [   12.510305]                       _raw_spin_lock+0x2f/0x40
      [   12.510772]                       try_to_wake_up+0x1c7/0x4e0
      [   12.511220]                       swake_up_locked+0x20/0x40
      [   12.511657]                       swake_up_one+0x1a/0x30
      [   12.512070]                       rcu_process_callbacks+0xc5/0x650
      [   12.512553]                       __do_softirq+0xe6/0x47b
      [   12.512978]                       irq_exit+0xc3/0xd0
      [   12.513372]                       smp_apic_timer_interrupt+0xa9/0x250
      [   12.513876]                       apic_timer_interrupt+0xf/0x20
      [   12.514343]                       default_idle+0x1c/0x170
      [   12.514765]                       do_idle+0x199/0x240
      [   12.515159]                       cpu_startup_entry+0x19/0x20
      [   12.515614]                       start_kernel+0x422/0x42a
      [   12.516045]                       secondary_startup_64+0xa4/0xb0
      [   12.516521]     INITIAL USE at:
      [   12.516774]                      _raw_spin_lock_irqsave+0x38/0x50
      [   12.517258]                      rq_attach_root+0x16/0xd0
      [   12.517685]                      sched_init+0x2f2/0x3eb
      [   12.518096]                      start_kernel+0x1fb/0x42a
      [   12.518525]                      secondary_startup_64+0xa4/0xb0
      [   12.518986]   }
      [   12.519132]   ... key      at: [<ffffffff82b7bc28>] __key.71384+0x0/0x8
      [   12.519649]   ... acquired at:
      [   12.519892]    pcpu_freelist_pop+0x7b/0xd0
      [   12.520221]    bpf_get_stackid+0x1d2/0x4d0
      [   12.520563]    ___bpf_prog_run+0x8b4/0x11a0
      [   12.520887]
      [   12.521008] -> (&head->lock){+...} {
      [   12.521292]    HARDIRQ-ON-W at:
      [   12.521539]                     _raw_spin_lock+0x2f/0x40
      [   12.521950]                     pcpu_freelist_push+0x2a/0x40
      [   12.522396]                     bpf_get_stackid+0x494/0x4d0
      [   12.522828]                     ___bpf_prog_run+0x8b4/0x11a0
      [   12.523296]    INITIAL USE at:
      [   12.523537]                    _raw_spin_lock+0x2f/0x40
      [   12.523944]                    pcpu_freelist_populate+0xc0/0x120
      [   12.524417]                    htab_map_alloc+0x405/0x500
      [   12.524835]                    __do_sys_bpf+0x1a3/0x1a90
      [   12.525253]                    do_syscall_64+0x4a/0x180
      [   12.525659]                    entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   12.526167]  }
      [   12.526311]  ... key      at: [<ffffffff838f7668>] __key.13130+0x0/0x8
      [   12.526812]  ... acquired at:
      [   12.527047]    __lock_acquire+0x521/0x1350
      [   12.527371]    lock_acquire+0x98/0x190
      [   12.527680]    _raw_spin_lock+0x2f/0x40
      [   12.527994]    pcpu_freelist_push+0x2a/0x40
      [   12.528325]    bpf_get_stackid+0x494/0x4d0
      [   12.528645]    ___bpf_prog_run+0x8b4/0x11a0
      [   12.528970]
      [   12.529092]
      [   12.529092] stack backtrace:
      [   12.529444] CPU: 0 PID: 276 Comm: dd Not tainted 5.0.0-rc3-00018-g2fa53f89 #475
      [   12.530043] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
      [   12.530750] Call Trace:
      [   12.530948]  dump_stack+0x5f/0x8b
      [   12.531248]  check_usage_backwards+0x10c/0x120
      [   12.531598]  ? ___bpf_prog_run+0x8b4/0x11a0
      [   12.531935]  ? mark_lock+0x382/0x560
      [   12.532229]  mark_lock+0x382/0x560
      [   12.532496]  ? print_shortest_lock_dependencies+0x180/0x180
      [   12.532928]  __lock_acquire+0x521/0x1350
      [   12.533271]  ? find_get_entry+0x17f/0x2e0
      [   12.533586]  ? find_get_entry+0x19c/0x2e0
      [   12.533902]  ? lock_acquire+0x98/0x190
      [   12.534196]  lock_acquire+0x98/0x190
      [   12.534482]  ? pcpu_freelist_push+0x2a/0x40
      [   12.534810]  _raw_spin_lock+0x2f/0x40
      [   12.535099]  ? pcpu_freelist_push+0x2a/0x40
      [   12.535432]  pcpu_freelist_push+0x2a/0x40
      [   12.535750]  bpf_get_stackid+0x494/0x4d0
      [   12.536062]  ___bpf_prog_run+0x8b4/0x11a0
      
      It has been explained that is a false positive here:
      https://lkml.org/lkml/2018/7/25/756
      Recap:
      - stackmap uses pcpu_freelist
      - The lock in pcpu_freelist is a percpu lock
      - stackmap is only used by tracing bpf_prog
      - A tracing bpf_prog cannot be run if another bpf_prog
        has already been running (ensured by the percpu bpf_prog_active counter).
      
      Eric pointed out that this lockdep splats stops other
      legit lockdep splats in selftests/bpf/test_progs.c.
      
      Fix this by calling local_irq_save/restore for stackmap.
      
      Another false positive had also been worked around by calling
      local_irq_save in commit 89ad2fa3 ("bpf: fix lockdep splat").
      That commit added unnecessary irq_save/restore to fast path of
      bpf hash map. irqs are already disabled at that point, since htab
      is holding per bucket spin_lock with irqsave.
      
      Let's reduce overhead for htab by introducing __pcpu_freelist_push/pop
      function w/o irqsave and convert pcpu_freelist_push/pop to irqsave
      to be used elsewhere (right now only in stackmap).
      It stops lockdep false positive in stackmap with a bit of acceptable overhead.
      
      Fixes: 557c0c6e ("bpf: convert stackmap to pre-allocation")
      Reported-by: default avatarNaresh Kamboju <naresh.kamboju@linaro.org>
      Reported-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      e3bc64c9
    • Martynas Pumputis's avatar
      bpf, selftests: fix handling of sparse CPU allocations · 0ace0d28
      Martynas Pumputis authored
      [ Upstream commit 1bb54c40 ]
      
      Previously, bpf_num_possible_cpus() had a bug when calculating a
      number of possible CPUs in the case of sparse CPU allocations, as
      it was considering only the first range or element of
      /sys/devices/system/cpu/possible.
      
      E.g. in the case of "0,2-3" (CPU 1 is not available), the function
      returned 1 instead of 3.
      
      This patch fixes the function by making it parse all CPU ranges and
      elements.
      Signed-off-by: default avatarMartynas Pumputis <m@lambda.lt>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      0ace0d28
    • Greg Kroah-Hartman's avatar
      relay: check return of create_buf_file() properly · 232bd90c
      Greg Kroah-Hartman authored
      [ Upstream commit 2c1cf00e ]
      
      If create_buf_file() returns an error, don't try to reference it later
      as a valid dentry pointer.
      
      This problem was exposed when debugfs started to return errors instead
      of just NULL for some calls when they do not succeed properly.
      
      Also, the check for WARN_ON(dentry) was just wrong :)
      Reported-by: default avatarKees Cook <keescook@chromium.org>
      Reported-and-tested-by: syzbot+16c3a70e1e9b29346c43@syzkaller.appspotmail.com
      Reported-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: David Rientjes <rientjes@google.com>
      Fixes: ff9fb72b ("debugfs: return error values, not NULL")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      232bd90c
    • Zenghui Yu's avatar
      irqchip/gic-v3-its: Fix ITT_entry_size accessor · 2a5c84e1
      Zenghui Yu authored
      [ Upstream commit 56841070 ]
      
      According to ARM IHI 0069C (ID070116), we should use GITS_TYPER's
      bits [7:4] as ITT_entry_size instead of [8:4]. Although this is
      pretty annoying, it only results in a potential over-allocation
      of memory, and nothing bad happens.
      
      Fixes: 3dfa576b ("irqchip/gic-v3-its: Add probing for VLPI properties")
      Signed-off-by: default avatarZenghui Yu <yuzenghui@huawei.com>
      [maz: massaged subject and commit message]
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      2a5c84e1
    • Jose Abreu's avatar
      net: stmmac: Disable EEE mode earlier in XMIT callback · fbdbb194
      Jose Abreu authored
      [ Upstream commit e2cd682d ]
      
      In stmmac xmit callback we use a different flow for TSO packets but TSO
      xmit callback is not disabling the EEE mode.
      
      Fix this by disabling earlier the EEE mode, i.e. before calling the TSO
      xmit callback.
      Signed-off-by: default avatarJose Abreu <joabreu@synopsys.com>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      fbdbb194
    • Jose Abreu's avatar
      net: stmmac: Send TSO packets always from Queue 0 · 496eaed7
      Jose Abreu authored
      [ Upstream commit c5acdbee ]
      
      The number of TSO enabled channels in HW can be different than the
      number of total channels. There is no way to determined, at runtime, the
      number of TSO capable channels and its safe to assume that if TSO is
      enabled then at least channel 0 will be TSO capable.
      
      Lets always send TSO packets from Queue 0.
      Signed-off-by: default avatarJose Abreu <joabreu@synopsys.com>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      496eaed7
    • Jose Abreu's avatar
      net: stmmac: Fallback to Platform Data clock in Watchdog conversion · 46ba03c5
      Jose Abreu authored
      [ Upstream commit 4ec5302f ]
      
      If we don't have DT then stmmac_clk will not be available. Let's add a
      new Platform Data field so that we can specify the refclk by this mean.
      
      This way we can still use the coalesce command in PCI based setups.
      Signed-off-by: default avatarJose Abreu <joabreu@synopsys.com>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      46ba03c5
    • Chris Wilson's avatar
      drm/amdgpu: Transfer fences to dmabuf importer · 8096bc39
      Chris Wilson authored
      [ Upstream commit 6e11ea9d ]
      
      amdgpu only uses shared-fences internally, but dmabuf importers rely on
      implicit write hazard tracking via the reservation_object.fence_excl.
      For example, the importer use the write hazard for timing a page flip to
      only occur after the exporter has finished flushing its write into the
      surface. As such, on exporting a dmabuf, we must either flush all
      outstanding fences (for we do not know which are writes and should have
      been exclusive) or alternatively create a new exclusive fence that is
      the composite of all the existing shared fences, and so will only be
      signaled when all earlier fences are signaled (ensuring that we can not
      be signaled before the completion of any earlier write).
      
      v2: reservation_object is already locked by amdgpu_bo_reserve()
      v3: Replace looping with get_fences_rcu and special case the promotion
      of a single shared fence directly to an exclusive fence, bypassing the
      fence array.
      v4: Drop the fence array ref after assigning to reservation_object
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107341
      Testcase: igt/amd_prime/amd-to-i915
      References: 8e94a46c ("drm/amdgpu: Attach exclusive fence to prime exported bo's. (v5)")
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: "Christian König" <christian.koenig@amd.com>
      Reviewed-by: default avatar"Christian König" <christian.koenig@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      8096bc39
    • Alex Deucher's avatar
      drm/radeon: check if device is root before getting pci speed caps · 4ec880d7
      Alex Deucher authored
      [ Upstream commit afeff4c1 ]
      
      Check if the device is root rather before attempting to see what
      speeds the pcie port supports.  Fixes a crash with pci passthrough
      in a VM.
      
      Bug: https://bugs.freedesktop.org/show_bug.cgi?id=109366Reviewed-by: default avatarEvan Quan <evan.quan@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      4ec880d7
    • Alex Deucher's avatar
      drm/amdgpu: Add missing power attribute to APU check · 09439238
      Alex Deucher authored
      [ Upstream commit dc14eb12 ]
      
      Add missing power_average to visible check for power
      attributes for APUs.  Was missed before.
      Reviewed-by: default avatarEvan Quan <evan.quan@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      09439238
    • Lubomir Rintel's avatar
      irqchip/mmp: Only touch the PJ4 IRQ & FIQ bits on enable/disable · 1bf79102
      Lubomir Rintel authored
      [ Upstream commit 2380a22b ]
      
      Resetting bit 4 disables the interrupt delivery to the "secure
      processor" core. This breaks the keyboard on a OLPC XO 1.75 laptop,
      where the firmware running on the "secure processor" bit-bangs the
      PS/2 protocol over the GPIO lines.
      
      It is not clear what the rest of the bits are and Marvell was unhelpful
      when asked for documentation. Aside from the SP bit, there are probably
      priority bits.
      
      Leaving the unknown bits as the firmware set them up seems to be a wiser
      course of action compared to just turning them off.
      Signed-off-by: default avatarLubomir Rintel <lkundrak@v3.sk>
      Acked-by: default avatarPavel Machek <pavel@ucw.cz>
      [maz: fixed-up subject and commit message]
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      1bf79102
    • Marc Zyngier's avatar
      irqchip/gic-v3-its: Gracefully fail on LPI exhaustion · 423869f8
      Marc Zyngier authored
      [ Upstream commit 45725e0f ]
      
      In the unlikely event that we cannot find any available LPI in the
      system, we should gracefully return an error instead of carrying
      on with no LPI allocated at all.
      
      Fixes: 38dd7c49 ("irqchip/gic-v3-its: Drop chunk allocation compatibility")
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      423869f8
    • Heyi Guo's avatar
      irqchip/gic-v4: Fix occasional VLPI drop · dc81cfaf
      Heyi Guo authored
      [ Upstream commit 6479450f ]
      
      1. In current implementation, every VLPI will temporarily be mapped to
      the first CPU in system (normally CPU0) and then moved to the real
      scheduled CPU later.
      
      2. So there is a time window and a VLPI may be sent to CPU0 instead of
      the real scheduled vCPU, in a multi-CPU virtual machine.
      
      3. However, CPU0 may have not been scheduled as a virtual CPU after
      system boots up, so the value of its GICR_VPROPBASER is unknown at
      that moment.
      
      4. If the INTID of VLPI is larger than 2^(GICR_VPROPBASER.IDbits+1),
      while IDbits is also in unknown state, GIC will behave as if the VLPI
      is out of range and simply drop it, which results in interrupt missing
      in Guest.
      
      As no code will clear GICR_VPROPBASER at runtime, we can safely
      initialize the IDbits field at boot time for each CPU to get rid of
      this issue.
      
      We also clear Valid bit of GICR_VPENDBASER in case any ancient
      programming gets left in and causes memory corrupting. A new function
      its_clear_vpend_valid() is added to reuse the code in
      its_vpe_deschedule().
      
      Fixes: e643d803 ("irqchip/gic-v3-its: Add VPE scheduling")
      Signed-off-by: default avatarHeyi Guo <guoheyi@huawei.com>
      Signed-off-by: default avatarHeyi Guo <heyi.guo@linaro.org>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      dc81cfaf