1. 03 Apr, 2019 40 commits
    • ALSA: hda/realtek - Add support headset mode for New DELL WYSE NB · 89ec6d40
      Kailang Yang authored
      commit da484d00 upstream.
      
      Enable headset mode support for new WYSE NB platform.
      Signed-off-by: Kailang Yang <kailang@realtek.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: hda/realtek - Add support headset mode for DELL WYSE AIO · 522f06c9
      Kailang Yang authored
      commit 136824ef upstream.
      
      This patch will enable WYSE AIO for Headset mode.
      Signed-off-by: Kailang Yang <kailang@realtek.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: pcm: Don't suspend stream in unrecoverable PCM state · 5b93302b
      Takashi Iwai authored
      commit 113ce081 upstream.
      
      Currently PCM core sets each opened stream forcibly to SUSPENDED state
      via snd_pcm_suspend_all() call, and the user-space is responsible for
      re-triggering the resume manually either via snd_pcm_resume() or
      prepare call.  The scheme works fine usually, but there are corner
      cases where the stream can't be resumed by that call: the streams
      still in OPEN state before finishing hw_params.  When they are
      suspended, user-space cannot perform resume or prepare because they
      haven't been set up yet.  The only possible recovery is to re-open the
      device, which isn't nice at all.  Similarly, when a stream is in
      DISCONNECTED state, it makes no sense to change it to SUSPENDED
      state.  Ditto for the SETUP state, which you can re-prepare directly.
      
      So, this patch addresses these issues by filtering the PCM streams to
      be suspended by checking the PCM state.  When a stream is in either
      OPEN, SETUP or DISCONNECTED as well as already SUSPENDED, the suspend
      action is skipped.
      
      To be noted, this problem was originally reported for the PCM runtime
      PM on HD-audio.  And, the runtime PM problem itself was already
      addressed (although not intended) by the code refactoring commits
      3d21ef0b ("ALSA: pcm: Suspend streams globally via device type PM
      ops") and 17bc4815 ("ALSA: pci: Remove superfluous
      snd_pcm_suspend*() calls").  These commits eliminated the
      snd_pcm_suspend*() calls from the runtime PM suspend callback code
      path, hence the racy OPEN state won't appear during runtime PM.
      (FWIW, the race window is between snd_pcm_open_substream() and the
      first power up in azx_pcm_open().)
      
      Although the runtime PM issue was already "fixed", the same problem is
      still present for the system PM, hence this patch is still needed.
      And for stable trees, this patch alone should suffice for fixing the
      runtime PM problem, too.
      Reported-and-tested-by: Jon Hunter <jonathanh@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
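The state filter described in the commit above can be sketched in plain C. This is a minimal userspace illustration, not the kernel code: the enum and the helper name `should_suspend()` are hypothetical stand-ins for the kernel's PCM state values and for whatever check the actual patch adds.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's PCM stream states. */
enum pcm_state {
    PCM_STATE_OPEN,
    PCM_STATE_SETUP,
    PCM_STATE_PREPARED,
    PCM_STATE_RUNNING,
    PCM_STATE_XRUN,
    PCM_STATE_DRAINING,
    PCM_STATE_PAUSED,
    PCM_STATE_SUSPENDED,
    PCM_STATE_DISCONNECTED
};

/*
 * Return true only for streams that user-space can later recover with
 * resume or prepare. Streams in OPEN, SETUP, DISCONNECTED or already
 * SUSPENDED state are skipped, as the commit message describes.
 */
static bool should_suspend(enum pcm_state state)
{
    switch (state) {
    case PCM_STATE_OPEN:
    case PCM_STATE_SETUP:
    case PCM_STATE_DISCONNECTED:
    case PCM_STATE_SUSPENDED:
        return false;
    default:
        return true;
    }
}
```

A suspend-all loop would call such a predicate per stream and leave the skipped streams in their current state.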
    • ALSA: pcm: Fix possible OOB access in PCM oss plugins · 7fc6064d
      Takashi Iwai authored
      commit ca0214ee upstream.
      
      The PCM OSS emulation converts and transfers the data on the fly via
      "plugins".  The data is converted over the dynamically allocated
      buffer for each plugin, and recently syzkaller caught OOB in this
      flow.
      
      Although the bisection by syzbot pointed to commit
      65766ee0 ("ALSA: oss: Use kvzalloc() for local buffer
      allocations"), this is merely a commit that replaces vmalloc() with
      kvmalloc(), hence it can't be the cause.  Further debugging
      revealed that this happens when a slave PCM supports only
      stereo channels while the OSS stream is set up for a
      mono channel.  Below is a brief explanation:
      
      At each OSS parameter change, the driver sets up the PCM hw_params
      again in snd_pcm_oss_change_params_lock().  This is also the place
      where plugins are created and local buffers are allocated.  The
      problem is that the plugins are created before the final hw_params is
      determined.  Namely, two snd_pcm_hw_param_near() calls for setting the
      period size and periods may also influence the final result of channels,
      rates, etc., while the current code has already created plugins
      beforehand with the premature values.  So, the plugin believes that
      channels=1, while the actual I/O is with channels=2, which makes the
      driver read/write beyond the allocated buffer size.
      
      The fix is simply to move the plugin allocation code after the final
      hw_params call.
      
      Reported-by: syzbot+d4503ae45b65c5bc1194@syzkaller.appspotmail.com
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: seq: oss: Fix Spectre v1 vulnerability · b425f452
      Gustavo A. R. Silva authored
      commit c709f14f upstream.
      
      dev is indirectly controlled by user-space, hence leading to
      a potential exploitation of the Spectre variant 1 vulnerability.
      
      This issue was detected with the help of Smatch:
      
      sound/core/seq/oss/seq_oss_synth.c:626 snd_seq_oss_synth_make_info() warn: potential spectre issue 'dp->synths' [w] (local cap)
      
      Fix this by sanitizing dev before using it to index dp->synths.
      
      Notice that given that speculation windows are large, the policy is
      to kill the speculation on the first load and not worry if it can be
      completed with a dependent load/store [1].
      
      [1] https://lore.kernel.org/lkml/20180423164740.GY17484@dhcp22.suse.cz/
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
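"Sanitizing" an index before using it, as in the commit above, means clamping it with a branch-free mask so that even a mispredicted bounds check cannot produce an out-of-range load. The following is a userspace sketch modeled on the generic mask approach; the names `index_mask_nospec()` and `clamp_index_nospec()` are chosen here for illustration. It assumes an arithmetic right shift on signed `long` (true for GCC and Clang) and that both index and size are below `LONG_MAX`.

```c
#include <limits.h>

/*
 * Build an all-ones mask when index < size and an all-zero mask
 * otherwise, without a conditional branch. Relies on arithmetic
 * right shift of a negative signed long (implementation-defined in
 * ISO C, but what GCC and Clang do).
 */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
    return ~(long)(index | (size - 1UL - index)) >> (sizeof(long) * CHAR_BIT - 1);
}

/* Clamp an index to [0, size); out-of-range values collapse to 0. */
static unsigned long clamp_index_nospec(unsigned long index, unsigned long size)
{
    return index & index_mask_nospec(index, size);
}
```

In practice the clamp goes right after the ordinary bounds check and before the array access, so the architectural behaviour is unchanged and only the speculative path is constrained.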
    • ALSA: rawmidi: Fix potential Spectre v1 vulnerability · bd55e672
      Gustavo A. R. Silva authored
      commit 2b1d9c8f upstream.
      
      info->stream is indirectly controlled by user-space, hence leading to
      a potential exploitation of the Spectre variant 1 vulnerability.
      
      This issue was detected with the help of Smatch:
      
      sound/core/rawmidi.c:604 __snd_rawmidi_info_select() warn: potential spectre issue 'rmidi->streams' [r] (local cap)
      
      Fix this by sanitizing info->stream before using it to index
      rmidi->streams.
      
      Notice that given that speculation windows are large, the policy is
      to kill the speculation on the first load and not worry if it can be
      completed with a dependent load/store [1].
      
      [1] https://lore.kernel.org/lkml/20180423164740.GY17484@dhcp22.suse.cz/
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: dsa: qca8k: remove leftover phy accessors · a485919f
      Christian Lamparter authored
      commit 1eec7151 upstream.
      
      This belated patch implements Andrew Lunn's request of
      "remove the phy_read() and phy_write() functions."
      <https://lore.kernel.org/patchwork/comment/902734/>
      
      While seemingly harmless, this causes the switch's user
      port PHYs to get registered twice. This is because the
      DSA subsystem will create a slave mdio-bus not knowing
      that the qca8k_phy_(read|write) accessors operate on
      the external mdio-bus. So the same "bus" gets effectively
      duplicated.
      
      Cc: stable@vger.kernel.org
      Fixes: 6b93fb46 ("net-next: dsa: add new driver for qca8xxx family")
      Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • NFSv4.1 don't free interrupted slot on open · 64751542
      Olga Kornievskaia authored
      commit 0cb98abb upstream.
      
      Allow the async rpc task to finish and update the open state if needed,
      then free the slot. Otherwise, the async rpc is unable to decode the reply.
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Fixes: ae55e59d ("pnfs: Don't release the sequence slot...")
      Cc: stable@vger.kernel.org # v4.18+
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • NFS: fix mount/umount race in nlmclnt. · da57cba4
      NeilBrown authored
      commit 4a9be28c upstream.
      
      If the last NFSv3 unmount from a given host races with a mount from the
      same host, we can destroy an nlm_host that is still in use.
      
      Specifically nlmclnt_lookup_host() can increment h_count on
      an nlm_host that nlmclnt_release_host() has just successfully called
      refcount_dec_and_test() on.
      Once nlmclnt_lookup_host() drops the mutex, nlm_destroy_host_lock()
      will be called to destroy the nlmclnt which is now in use again.
      
      The cause of the problem is that the dec_and_test happens outside the
      locked region.  This is easily fixed by using
      refcount_dec_and_mutex_lock().
      
      Fixes: 8ea6ecc8 ("lockd: Create client-side nlm_host cache")
      Cc: stable@vger.kernel.org (v2.6.38+)
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • vfio: ccw: only free cp on final interrupt · 0f273f0c
      Cornelia Huck authored
      commit 50b7f1b7 upstream.
      
      When we get an interrupt for a channel program, it is not
      necessarily the final interrupt; for example, the issuing
      guest may request an intermediate interrupt by specifying
      the program-controlled-interrupt flag on a ccw.
      
      We must not switch the state to idle if the interrupt is not
      yet final; even more importantly, we must not free the translated
      channel program if the interrupt is not yet final, or the host
      can crash during cp rewind.
      
      Fixes: e5f84dba ("vfio: ccw: return I/O results asynchronously")
      Cc: stable@vger.kernel.org # v4.12+
      Reviewed-by: Eric Farman <farman@linux.ibm.com>
      Signed-off-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc: bpf: Fix generation of load/store DW instructions · 92d4ee2e
      Naveen N. Rao authored
      commit 86be36f6 upstream.
      
      Yauheni Kaliuta pointed out that PTR_TO_STACK store/load verifier test
      was failing on powerpc64 BE, and rightfully indicated that the PPC_LD()
      macro is not masking away the last two bits of the offset per the ISA,
      resulting in the generation of 'lwa' instruction instead of the intended
      'ld' instruction.
      
      Segher also pointed out that we can't simply mask away the last two bits
      as that will result in loading/storing from/to a memory location that
      was not intended.
      
      This patch addresses this by using ldx/stdx if the offset is not
      word-aligned. We load the offset into a temporary register (TMP_REG_2)
      and use that as the index register in a subsequent ldx/stdx. We fix
      PPC_LD() macro to mask off the last two bits, but enhance PPC_BPF_LL()
      and PPC_BPF_STL() to factor in the offset value and generate the proper
      instruction sequence. We also convert all existing users of PPC_LD() and
      PPC_STD() to use these macros. All existing uses of these macros have
      been audited to ensure that TMP_REG_2 can be clobbered.
      
      Fixes: 156d0e29 ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
      Cc: stable@vger.kernel.org # v4.9+
      Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
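The encoding problem in the commit above can be shown with a small sketch. In the Power ISA, `ld` and `lwa` share primary opcode 58 and differ only in the low two bits of the instruction word (0 for `ld`, 2 for `lwa`), which overlap the low bits of an unmasked 16-bit offset. The helper names below are hypothetical and only model the bit layout; they are not the kernel's PPC_LD() or PPC_BPF_LL() macros.

```c
#include <stdbool.h>
#include <stdint.h>

#define OPCODE_58 0xE8000000u /* primary opcode 58: ld when low bits are 0, lwa when 2 */

/* Buggy form: the full 16-bit offset leaks into the two low opcode bits. */
static uint32_t encode_ds_unmasked(unsigned rt, unsigned ra, int off)
{
    return OPCODE_58 | (rt << 21) | (ra << 16) | ((uint32_t)off & 0xFFFFu);
}

/* Masked form: always a valid 'ld', but a misaligned offset is silently rounded down. */
static uint32_t encode_ds_masked(unsigned rt, unsigned ra, int off)
{
    return OPCODE_58 | (rt << 21) | (ra << 16) | ((uint32_t)off & 0xFFFCu);
}

/* Offsets that are not a multiple of 4 need the indexed (ldx/stdx) sequence instead. */
static bool needs_indexed_form(int off)
{
    return (off % 4) != 0;
}
```

With an offset of 6, the unmasked encoding ends in binary 10 and decodes as `lwa`, while the masked encoding would load from offset 4. That is why the patch falls back to loading the offset into a temporary register and using `ldx`/`stdx` for such offsets.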
    • ARM: imx6q: cpuidle: fix bug that CPU might not wake up at expected time · 9397f0d9
      Kohji Okuno authored
      commit 91740fc8 upstream.
      
      In the current cpuidle implementation for i.MX6q, the CPU that sets
      'WAIT_UNCLOCKED' and the CPU that returns to 'WAIT_CLOCKED' are always
      the same. While the CPU that sets 'WAIT_UNCLOCKED' is in the "WAIT"
      idle state, if the other CPU wakes up and enters the "WFI" idle state
      instead of "WAIT", that CPU cannot wake up at the expected time.
      This is because, in the "WFI" case, the CPU must be woken up by the
      local timer interrupt. But while 'WAIT_UNCLOCKED' is set, the local
      timer is stopped when all CPUs execute the "wfi" instruction. As a
      result, the local timer interrupt does not fire.
      In this situation, the CPU is only woken up by an IRQ other than the
      local timer (e.g. the broadcast timer).

      So, this fix changes which CPU returns to 'WAIT_CLOCKED'.
      Signed-off-by: Kohji Okuno <okuno.kohji@jp.panasonic.com>
      Fixes: e5f9dec8 ("ARM: imx6q: support WAIT mode using cpuidle")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Shawn Guo <shawnguo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: fix assertion failure on fsync with NO_HOLES enabled · fd1b2536
      Filipe Manana authored
      commit 0ccc3876 upstream.
      
      Back in commit a89ca6f2 ("Btrfs: fix fsync after truncate when
      no_holes feature is enabled") I added an assertion that is triggered when
      an inline extent is found, to assert that the length of the (uncompressed)
      data the extent represents is the same as the i_size of the inode, since
      that is true most of the time and I couldn't find or didn't remember
      any exception at that time. Later on the assertion was expanded twice to
      deal with a case of a compressed inline extent representing a range that
      matches the sector size followed by an expanding truncate, and another
      case where fallocate can update the i_size of the inode without adding
      or updating existing extents (if the fallocate range falls entirely within
      the first block of the file). These two expansion/fixes of the assertion
      were done by commit 7ed586d0 ("Btrfs: fix assertion on fsync of
      regular file when using no-holes feature") and commit 6399fb5a
      ("Btrfs: fix assertion failure during fsync in no-holes mode").
      These however missed the case where a falloc expands the i_size of an
      inode to exactly the sector size and an inline extent exists, for example:
      
       $ mkfs.btrfs -f -O no-holes /dev/sdc
       $ mount /dev/sdc /mnt
      
       $ xfs_io -f -c "pwrite -S 0xab 0 1096" /mnt/foobar
       wrote 1096/1096 bytes at offset 0
       1 KiB, 1 ops; 0.0002 sec (4.448 MiB/sec and 4255.3191 ops/sec)
      
       $ xfs_io -c "falloc 1096 3000" /mnt/foobar
       $ xfs_io -c "fsync" /mnt/foobar
       Segmentation fault
      
       $ dmesg
       [701253.602385] assertion failed: len == i_size || (len == fs_info->sectorsize && btrfs_file_extent_compression(leaf, extent) != BTRFS_COMPRESS_NONE) || (len < i_size && i_size < fs_info->sectorsize), file: fs/btrfs/tree-log.c, line: 4727
       [701253.602962] ------------[ cut here ]------------
       [701253.603224] kernel BUG at fs/btrfs/ctree.h:3533!
       [701253.603503] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
       [701253.603774] CPU: 2 PID: 7192 Comm: xfs_io Tainted: G        W         5.0.0-rc8-btrfs-next-45 #1
       [701253.604054] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
       [701253.604650] RIP: 0010:assfail.constprop.23+0x18/0x1a [btrfs]
       (...)
       [701253.605591] RSP: 0018:ffffbb48c186bc48 EFLAGS: 00010286
       [701253.605914] RAX: 00000000000000de RBX: ffff921d0a7afc08 RCX: 0000000000000000
       [701253.606244] RDX: 0000000000000000 RSI: ffff921d36b16868 RDI: ffff921d36b16868
       [701253.606580] RBP: ffffbb48c186bcf0 R08: 0000000000000000 R09: 0000000000000000
       [701253.606913] R10: 0000000000000003 R11: 0000000000000000 R12: ffff921d05d2de18
       [701253.607247] R13: ffff921d03b54000 R14: 0000000000000448 R15: ffff921d059ecf80
       [701253.607769] FS:  00007f14da906700(0000) GS:ffff921d36b00000(0000) knlGS:0000000000000000
       [701253.608163] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [701253.608516] CR2: 000056087ea9f278 CR3: 00000002268e8001 CR4: 00000000003606e0
       [701253.608880] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       [701253.609250] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       [701253.609608] Call Trace:
       [701253.609994]  btrfs_log_inode+0xdfb/0xe40 [btrfs]
       [701253.610383]  btrfs_log_inode_parent+0x2be/0xa60 [btrfs]
       [701253.610770]  ? do_raw_spin_unlock+0x49/0xc0
       [701253.611150]  btrfs_log_dentry_safe+0x4a/0x70 [btrfs]
       [701253.611537]  btrfs_sync_file+0x3b2/0x440 [btrfs]
       [701253.612010]  ? do_sysinfo+0xb0/0xf0
       [701253.612552]  do_fsync+0x38/0x60
       [701253.612988]  __x64_sys_fsync+0x10/0x20
       [701253.613360]  do_syscall_64+0x60/0x1b0
       [701253.613733]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
       [701253.614103] RIP: 0033:0x7f14da4e66d0
       (...)
       [701253.615250] RSP: 002b:00007fffa670fdb8 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
       [701253.615647] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f14da4e66d0
       [701253.616047] RDX: 000056087ea9c260 RSI: 000056087ea9c260 RDI: 0000000000000003
       [701253.616450] RBP: 0000000000000001 R08: 0000000000000020 R09: 0000000000000010
       [701253.616854] R10: 000000000000009b R11: 0000000000000246 R12: 000056087ea9c260
       [701253.617257] R13: 000056087ea9c240 R14: 0000000000000000 R15: 000056087ea9dd10
       (...)
       [701253.619941] ---[ end trace e088d74f132b6da5 ]---
      
      Updating the assertion again to allow for this particular case would result
      in a meaningless assertion, plus there is currently no risk of logging
      content that would result in any corruption after a log replay if the size
      of the data encoded in an inline extent is greater than the inode's i_size
      (which is not currently possible either with or without compression),
      therefore just remove the assertion.
      
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: Avoid possible qgroup_rsv_size overflow in btrfs_calculate_inode_block_rsv_size · 0ae3b84b
      Nikolay Borisov authored
      commit 139a5617 upstream.
      
      qgroup_rsv_size is calculated as the product of
      outstanding_extent * fs_info->nodesize. The product is calculated with
      32 bit precision since both variables are defined as u32. Yet
      qgroup_rsv_size expects a 64 bit result.
      
      Avoid possible multiplication overflow by casting outstanding_extent to
      u64. Such overflow would in the worst case (64K nodesize) require more
      than 65536 extents, which is quite large and it's not likely that it
      would happen in practice.
      
      Fixes-coverity-id: 1435101
      Fixes: ff6bc37e ("btrfs: qgroup: Use independent and accurate per inode qgroup rsv")
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
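The overflow described in the commit above comes from multiplying two 32-bit values before the result is widened to 64 bits. A minimal sketch, with hypothetical function names standing in for the calculation in btrfs_calculate_inode_block_rsv_size():

```c
#include <stdint.h>

/* Both operands are 32-bit, so the product wraps modulo 2^32 before widening. */
static uint64_t rsv_size_32bit(uint32_t outstanding_extents, uint32_t nodesize)
{
    return outstanding_extents * nodesize;
}

/* Casting one operand first makes the multiplication itself 64-bit. */
static uint64_t rsv_size_64bit(uint32_t outstanding_extents, uint32_t nodesize)
{
    return (uint64_t)outstanding_extents * nodesize;
}
```

With a 64K nodesize and 65537 outstanding extents, the 32-bit version yields 65536 while the 64-bit version yields 4295032832.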
    • btrfs: raid56: properly unmap parity page in finish_parity_scrub() · 1cf4ab01
      Andrea Righi authored
      commit 3897b6f0 upstream.
      
      Parity page is incorrectly unmapped in finish_parity_scrub(), triggering
      a reference counter bug on i386, i.e.:
      
       [ 157.662401] kernel BUG at mm/highmem.c:349!
       [ 157.666725] invalid opcode: 0000 [#1] SMP PTI
      
      The reason is that kunmap(p_page) was completely left out, so we never
      did an unmap for the p_page and the loop unmapping the rbio page was
      iterating over the wrong number of stripes: unmapping should be done
      with nr_data instead of rbio->real_stripes.
      
      Test case to reproduce the bug:
      
       - create a raid5 btrfs filesystem:
         # mkfs.btrfs -m raid5 -d raid5 /dev/sdb /dev/sdc /dev/sdd /dev/sde
      
       - mount it:
         # mount /dev/sdb /mnt
      
       - run btrfs scrub in a loop:
         # while :; do btrfs scrub start -BR /mnt; done
      
      BugLink: https://bugs.launchpad.net/bugs/1812845
      Fixes: 5a6ac9ea ("Btrfs, raid56: support parity scrub on raid56")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: don't report readahead errors and don't update statistics · d952c337
      David Sterba authored
      commit 0cc068e6 upstream.
      
      As readahead is an optimization, all errors are usually filtered out,
      but still properly handled when the real read call is done. The commit
      5e9d3982 ("btrfs: readpages() should submit IO as read-ahead") added
      REQ_RAHEAD to readpages() because that's only used for readahead
      (despite what one would expect from the callback name).
      
      This causes a flood of messages and inflated read error stats, so skip
      reporting in case it's readahead.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202403
      Reported-by: LimeTech <tomm@lime-technology.com>
      Fixes: 5e9d3982 ("btrfs: readpages() should submit IO as read-ahead")
      CC: stable@vger.kernel.org # 4.19+
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: remove WARN_ON in log_dir_items · b57220cc
      Josef Bacik authored
      commit 2cc83342 upstream.
      
      When Filipe added the recursive directory logging stuff in
      2f2ff0ee ("Btrfs: fix metadata inconsistencies after directory
      fsync") he specifically didn't take the directory i_mutex for the
      children directories that we need to log because of lockdep.  This is
      generally fine, but can lead to this WARN_ON() tripping if we happen to
      run delayed deletions in between our first search and our second search
      of dir_item/dir_indexes for this directory.  We expect this to happen,
      so the WARN_ON() isn't necessary.  Drop the WARN_ON() and add a comment
      so we know why this case can happen.
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: fix incorrect file size after shrinking truncate and fsync · 22dcb30f
      Filipe Manana authored
      commit bf504110 upstream.
      
      If we do a shrinking truncate against an inode which is already present
      in the respective log tree and then rename it, as part of logging the new
      name we end up logging an inode item that reflects the old size of the
      file (the one which we previously logged) and not the new smaller size.
      The decision to preserve the size previously logged was added by commit
      1a4bcf47 ("Btrfs: fix fsync data loss after adding hard link to
      inode") in order to avoid data loss after replaying the log. However that
      decision is only needed for the case the logged inode size is smaller than
      the current size of the inode, as explained in that commit's change log.
      If the current size of the inode is smaller than the previously logged
      size, we know a shrinking truncate happened and therefore need to use
      that smaller size.
      
      Example to trigger the problem:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
      
        $ xfs_io -f -c "pwrite -S 0xab 0 8000" /mnt/foo
        $ xfs_io -c "fsync" /mnt/foo
        $ xfs_io -c "truncate 3000" /mnt/foo
      
        $ mv /mnt/foo /mnt/bar
        $ xfs_io -c "fsync" /mnt/bar
      
        <power failure>
      
        $ mount /dev/sdb /mnt
        $ od -t x1 -A d /mnt/bar
        0000000 ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
        *
        0008000
      
      Once we rename the file, we log its name (and inode item), and because
      the inode was already logged before in the current transaction, we log it
      with a size of 8000 bytes because that is the size we previously logged
      (with the first fsync). As part of the rename, besides logging the inode,
      we do also sync the log, which is done since commit d4682ba0
      ("Btrfs: sync log after logging new name"), so the next fsync against our
      inode is effectively a no-op, since no new changes happened since the
      rename operation. Even if we did not sync the log during the rename
      operation, the same problem (file size of 8000 bytes instead of 3000
      bytes) would be visible after replaying the log if the log ended up
      getting synced to disk through some other means, such as for example by
      fsyncing some other modified file. In the example above the fsync after
      the rename operation is there just because not every filesystem may
      guarantee logging/journalling the inode (and syncing the log/journal)
      during the rename operation, for example it is needed for f2fs, but not
      for ext4 and xfs.
      
      Fix this scenario by, when logging a new name (which is triggered by
      rename and link operations), using the current size of the inode instead
      of the previously logged inode size.
      
      A test case for fstests follows soon.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202695
      CC: stable@vger.kernel.org # 4.4+
      Reported-by: Seulbae Kim <seulbae@gatech.edu>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc/security: Fix spectre_v2 reporting · a1df5db3
      Michael Ellerman authored
      commit 92edf8df upstream.
      
      When I updated the spectre_v2 reporting to handle software count cache
      flush I got the logic wrong when there's no software count cache
      enabled at all.
      
      The result is that on systems with the software count cache flush
      disabled we print:
      
        Mitigation: Indirect branch cache disabled, Software count cache flush
      
      Which correctly indicates that the count cache is disabled, but
      incorrectly says the software count cache flush is enabled.
      
      The root of the problem is that we are trying to handle all
      combinations of options. But we know now that we only expect to see
      the software count cache flush enabled if the other options are false.
      
      So split the two cases, which simplifies the logic and fixes the bug.
      We were also missing a space before "(hardware accelerated)".
      
      The result is we see one of:
      
        Mitigation: Indirect branch serialisation (kernel only)
        Mitigation: Indirect branch cache disabled
        Mitigation: Software count cache flush
        Mitigation: Software count cache flush (hardware accelerated)
      
      Fixes: ee13cb24 ("powerpc/64s: Add support for software count cache flush")
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Michael Neuling <mikey@neuling.org>
      Reviewed-by: Diana Craciun <diana.craciun@nxp.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
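The "split the two cases" logic from the commit above can be sketched as a function that returns one of the four report strings listed in the message. This is an illustrative simplification, not the kernel's sysfs handler: the enum, the parameter names, the precedence of serialisation over "cache disabled" when both are set, and the final "Vulnerable" fallback are assumptions made for the sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's count cache flush setting. */
enum flush_type { FLUSH_NONE, FLUSH_SW, FLUSH_HW };

/*
 * First case: a hardware option (serialisation or disabled cache) is on.
 * Second case: only the software count cache flush is enabled.
 * The two cases are handled separately, so "Software count cache flush"
 * is never appended to a "cache disabled" report.
 */
static const char *spectre_v2_report(bool branch_serialised,
                                     bool count_cache_disabled,
                                     enum flush_type flush)
{
    if (branch_serialised)
        return "Mitigation: Indirect branch serialisation (kernel only)";
    if (count_cache_disabled)
        return "Mitigation: Indirect branch cache disabled";
    if (flush == FLUSH_HW)
        return "Mitigation: Software count cache flush (hardware accelerated)";
    if (flush == FLUSH_SW)
        return "Mitigation: Software count cache flush";
    return "Vulnerable"; /* assumed fallback, not taken from the commit message */
}
```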
    • powerpc/fsl: Fix the flush of branch predictor. · 986f0c65
      Christophe Leroy authored
      commit 27da8071 upstream.
      
      The commit identified below adds MC_BTB_FLUSH macro only when
      CONFIG_PPC_FSL_BOOK3E is defined. This results in the following error
      on some configs (seen several times with kisskb randconfig_defconfig)
      
      arch/powerpc/kernel/exceptions-64e.S:576: Error: Unrecognized opcode: `mc_btb_flush'
      make[3]: *** [scripts/Makefile.build:367: arch/powerpc/kernel/exceptions-64e.o] Error 1
      make[2]: *** [scripts/Makefile.build:492: arch/powerpc/kernel] Error 2
      make[1]: *** [Makefile:1043: arch/powerpc] Error 2
      make: *** [Makefile:152: sub-make] Error 2
      
      This patch adds a blank definition of MC_BTB_FLUSH for other cases.
      
      Fixes: 10c5e83a ("powerpc/fsl: Flush the branch predictor at each kernel entry (64bit)")
      Cc: Diana Craciun <diana.craciun@nxp.com>
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Daniel Axtens <dja@axtens.net>
      Reviewed-by: Diana Craciun <diana.craciun@nxp.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc/fsl: Fixed warning: orphan section `__btb_flush_fixup' · b848d19c
      Diana Craciun authored
      commit 039daac5 upstream.
      
      Fixed the following build warning:
      powerpc-linux-gnu-ld: warning: orphan section `__btb_flush_fixup' from
      `arch/powerpc/kernel/head_44x.o' being placed in section
      `__btb_flush_fixup'.
      Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b848d19c
    • Diana Craciun's avatar
      powerpc/fsl: Update Spectre v2 reporting · 632d8392
      Diana Craciun authored
      commit dfa88658 upstream.
      
      Report branch predictor state flush as a mitigation for
      Spectre variant 2.
      Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      632d8392
    • Diana Craciun's avatar
      powerpc/fsl: Enable runtime patching if nospectre_v2 boot arg is used · 43f40620
      Diana Craciun authored
      commit 3bc8ea86 upstream.
      
      If the user chooses not to use the mitigations, replace
      the code sequence with nops.
      Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      43f40620
    • Diana Craciun's avatar
      powerpc/fsl: Flush branch predictor when entering KVM · a46a5038
      Diana Craciun authored
      commit e7aa61f4 upstream.
      
      Switching from the guest to host is another place
      where the speculative accesses can be exploited.
      Flush the branch predictor when entering KVM.
      Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a46a5038
    • Diana Craciun's avatar
      powerpc/fsl: Flush the branch predictor at each kernel entry (32 bit) · 3cb931c7
      Diana Craciun authored
      commit 7fef4362 upstream.
      
      In order to protect against speculation attacks on
      indirect branches, the branch predictor is flushed at
      kernel entry to protect against the following situations:
      - userspace process attacking another userspace process
      - userspace process attacking the kernel
      Basically, when the privilege level changes (i.e. the kernel
      is entered), the branch predictor state is flushed.
      Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3cb931c7
    • Diana Craciun's avatar
      powerpc/fsl: Flush the branch predictor at each kernel entry (64bit) · cf72dad9
      Diana Craciun authored
      commit 10c5e83a upstream.
      
      In order to protect against speculation attacks on
      indirect branches, the branch predictor is flushed at
      kernel entry to protect against the following situations:
      - userspace process attacking another userspace process
      - userspace process attacking the kernel
      Basically, when the privilege level changes (i.e. the
      kernel is entered), the branch predictor state is flushed.
      Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cf72dad9
    • Diana Craciun's avatar
      powerpc/fsl: Add nospectre_v2 command line argument · 020e5f13
      Diana Craciun authored
      commit f633a8ad upstream.
      
      When the command line argument is present, the Spectre variant 2
      mitigations are disabled.
      Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      020e5f13
    • Diana Craciun's avatar
      powerpc/fsl: Emulate SPRN_BUCSR register · 4a6a2287
      Diana Craciun authored
      commit 98518c4d upstream.
      
      In order to flush the branch predictor, the guest kernel performs
      writes to the BUCSR register, which is hypervisor privileged. However,
      the branch predictor is flushed at each KVM entry, so it has already
      been flushed by the time the write is trapped; just return to the
      guest as soon as possible.
      Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
      [mpe: Tweak comment formatting]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4a6a2287
    • Diana Craciun's avatar
      powerpc/fsl: Add macro to flush the branch predictor · 4944f1d4
      Diana Craciun authored
      commit 1cbf8990 upstream.
      
      The BUCSR register can be used to invalidate the entries in the
      branch prediction mechanisms.
      Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4944f1d4
    • Diana Craciun's avatar
      powerpc/fsl: Add infrastructure to fixup branch predictor flush · d67ab3d9
      Diana Craciun authored
      commit 76a5eaa3 upstream.
      
      In order to protect against speculation attacks (Spectre
      variant 2) on NXP PowerPC platforms, the branch predictor
      should be flushed when the privilege level is changed.
      This patch adds the infrastructure to fix up at runtime
      the code sections that perform the branch predictor flush,
      depending on a boot arg parameter which is added later in a
      separate patch.
      Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d67ab3d9
    • Eric Dumazet's avatar
      tun: add a missing rcu_read_unlock() in error path · e044d21c
      Eric Dumazet authored
      commit 9180bb4f upstream.
      
      In my latest patch I missed one rcu_read_unlock(), in case
      device is down.
      
      Fixes: 4477138f ("tun: properly test for IFF_UP")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e044d21c
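The class of bug fixed here, an early return that skips the unlock, can be illustrated with a small userspace sketch. It does not reproduce the tun driver code: `sketch_read_lock()`, `sketch_read_unlock()` and `sketch_receive()` are hypothetical stand-ins that only track nesting depth, so an unbalanced path becomes observable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for rcu_read_lock()/rcu_read_unlock(): they only
 * count nesting depth so that a missing unlock can be detected. */
int read_lock_depth;

void sketch_read_lock(void)
{
	read_lock_depth++;
}

void sketch_read_unlock(void)
{
	assert(read_lock_depth > 0);
	read_lock_depth--;
}

/* Returns 0 on success and -1 when the device is down. Both paths leave
 * the read-side critical section before returning. */
int sketch_receive(bool device_up)
{
	int ret = 0;

	sketch_read_lock();
	if (!device_up) {
		ret = -1;
		goto out;	/* the error path shares the unlock below */
	}
	/* ... deliver the packet while the device is known to be up ... */
out:
	sketch_read_unlock();
	return ret;
}
```

Routing every exit through a single label is one common way to keep lock and unlock balanced; whether the upstream fix uses a label or an explicit unlock before the early return is not stated in the commit message.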
    • Dean Nelson's avatar
      thunderx: eliminate extra calls to put_page() for pages held for recycling · 6bdb5fdc
      Dean Nelson authored
      [ Upstream commit cd35ef91 ]
      
      For the non-XDP case, commit 77322538 ("net: thunderx: Optimize
      page recycling for XDP") added code to nicvf_free_rbdr() that, when releasing
      the additional receive buffer page reference held for recycling, repeatedly
      calls put_page() until the page's _refcount goes to zero, which results in
      the page being freed.
      
      This is not okay if the page's _refcount was greater than 1 (in the non-XDP
      case), because nicvf_free_rbdr() should not be subtracting more than what
      nicvf_alloc_page() had previously added to the page's _refcount, which was
      only 1 (in the non-XDP case).
      
      This can arise if a received packet is still being processed and the receive
      buffer (i.e., skb->head) has not yet been freed via skb_free_head() when
      nicvf_free_rbdr() is spinning through the aforementioned put_page() loop.
      
      If this should occur, when the received packet finishes processing and
      skb_free_head() is called, various problems can ensue. Exactly what, depends on
      whether the page has already been reallocated or not, anything from "BUG: Bad
      page state ... ", to "Unable to handle kernel NULL pointer dereference ..." or
      "Unable to handle kernel paging request...".
      
      So this patch changes nicvf_free_rbdr() to only call put_page() once for pages
      held for recycling (in the non-XDP case).
      
      Fixes: 77322538 ("net: thunderx: Optimize page recycling for XDP")
      Signed-off-by: Dean Nelson <dnelson@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6bdb5fdc
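The reference-counting rule described above (release only the references you took) can be shown with a minimal userspace model. The `sketch_page` type and the `sketch_*` functions are hypothetical and do not mirror the driver's data structures.

```c
#include <assert.h>

/* Hypothetical page with a bare reference counter. */
struct sketch_page {
	int refcount;
	int freed;
};

/* Drop one reference; the page is freed when the last one goes away. */
void sketch_put_page(struct sketch_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->freed = 1;
}

/* Buggy teardown: loops until the count reaches zero, so it also drops
 * references that belong to other holders (e.g. an in-flight packet). */
void sketch_release_buggy(struct sketch_page *p)
{
	while (p->refcount > 0)
		sketch_put_page(p);
}

/* Fixed teardown: drops only the single reference held for recycling. */
void sketch_release_fixed(struct sketch_page *p)
{
	sketch_put_page(p);
}
```

With a starting count of 2 (one recycling reference plus one in-flight packet), the fixed variant leaves the count at 1 and the page alive, while the buggy variant frees the page underneath the packet that is still using it.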
    • Dean Nelson's avatar
      thunderx: enable page recycling for non-XDP case · ac8411d7
      Dean Nelson authored
      [ Upstream commit b3e20806 ]
      
      Commit 77322538 ("net: thunderx: Optimize page recycling for XDP")
      added code to nicvf_alloc_page() that inadvertently disables receive buffer
      page recycling for the non-XDP case by always NULL'ng the page pointer.
      
      This patch corrects two if-conditionals to allow for the recycling of non-XDP
      mode pages by only setting the page pointer to NULL when the page is not ready
      for recycling.
      
      Fixes: 77322538 ("net: thunderx: Optimize page recycling for XDP")
      Signed-off-by: Dean Nelson <dnelson@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ac8411d7
    • John Hurley's avatar
      net: sched: fix cleanup NULL pointer exception in act_mirr · a491de90
      John Hurley authored
      [ Upstream commit 064c5d68 ]
      
      A new mirred action is created by the tcf_mirred_init function. This
      contains a list head struct which is inserted into a global list on
      successful creation of a new action. However, after a creation, it is
      still possible to error out and call the tcf_idr_release function. This,
      in turn, calls the act_mirr cleanup function via __tcf_idr_release and
      __tcf_action_put. This cleanup function tries to delete the list entry
      which is as yet uninitialised, leading to a NULL pointer exception.
      
      Fix this by initialising the list entry on creation of a new action.
      
      Bug report:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      PGD 8000000840c73067 P4D 8000000840c73067 PUD 858dcc067 PMD 0
      Oops: 0002 [#1] SMP PTI
      CPU: 32 PID: 5636 Comm: handler194 Tainted: G           OE     5.0.0+ #186
      Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.3.6 06/03/2015
      RIP: 0010:tcf_mirred_release+0x42/0xa7 [act_mirred]
      Code: f0 90 39 c0 e8 52 04 57 c8 48 c7 c7 b8 80 39 c0 e8 94 fa d4 c7 48 8b 93 d0 00 00 00 48 8b 83 d8 00 00 00 48 c7 c7 f0 90 39 c0 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 d0 00
      RSP: 0018:ffffac4aa059f688 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: ffff9dcd1b214d00 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffff9dcd1fa165f8 RDI: ffffffffc03990f0
      RBP: ffff9dccf9c7af80 R08: 0000000000000a3b R09: 0000000000000000
      R10: ffff9dccfa11f420 R11: 0000000000000000 R12: 0000000000000001
      R13: ffff9dcd16b433c0 R14: ffff9dcd1b214d80 R15: 0000000000000000
      FS:  00007f441bfff700(0000) GS:ffff9dcd1fa00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000008 CR3: 0000000839e64004 CR4: 00000000001606e0
      Call Trace:
      tcf_action_cleanup+0x59/0xca
      __tcf_action_put+0x54/0x6b
      __tcf_idr_release.cold.33+0x9/0x12
      tcf_mirred_init.cold.20+0x22e/0x3b0 [act_mirred]
      tcf_action_init_1+0x3d0/0x4c0
      tcf_action_init+0x9c/0x130
      tcf_exts_validate+0xab/0xc0
      fl_change+0x1ca/0x982 [cls_flower]
      tc_new_tfilter+0x647/0x8d0
      ? load_balance+0x14b/0x9e0
      rtnetlink_rcv_msg+0xe3/0x370
      ? __switch_to_asm+0x40/0x70
      ? __switch_to_asm+0x34/0x70
      ? _cond_resched+0x15/0x30
      ? __kmalloc_node_track_caller+0x1d4/0x2b0
      ? rtnl_calcit.isra.31+0xf0/0xf0
      netlink_rcv_skb+0x49/0x110
      netlink_unicast+0x16f/0x210
      netlink_sendmsg+0x1df/0x390
      sock_sendmsg+0x36/0x40
      ___sys_sendmsg+0x27b/0x2c0
      ? futex_wake+0x80/0x140
      ? do_futex+0x2b9/0xac0
      ? ep_scan_ready_list.constprop.22+0x1f2/0x210
      ? ep_poll+0x7a/0x430
      __sys_sendmsg+0x47/0x80
      do_syscall_64+0x55/0x100
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 4e232818 ("net: sched: act_mirred: remove dependency on rtnl lock")
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a491de90
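Why initialising the list entry at creation makes the cleanup path safe can be seen in a small userspace model of a circular doubly linked list. The `sketch_*` names are hypothetical, and `sketch_list_del()` re-initialises the entry for simplicity, which is an assumption of this sketch rather than a description of the kernel's list_del().

```c
#include <assert.h>

/* Minimal circular doubly linked list, loosely modelled on kernel lists. */
struct sketch_list_head {
	struct sketch_list_head *next, *prev;
};

/* An initialised entry points at itself. */
void sketch_init_list_head(struct sketch_list_head *h)
{
	h->next = h;
	h->prev = h;
}

void sketch_list_add(struct sketch_list_head *entry,
		     struct sketch_list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlinking dereferences entry->next and entry->prev. On a zeroed,
 * never-initialised entry both are NULL and this would crash. */
void sketch_list_del(struct sketch_list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	sketch_init_list_head(entry);
}

int sketch_list_empty(const struct sketch_list_head *h)
{
	return h->next == h;
}
```

Deleting an entry that was initialised but never inserted only rewrites its own self-pointers, so an error path that runs the cleanup before insertion becomes harmless.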
    • Herbert Xu's avatar
      ila: Fix rhashtable walker list corruption · 7254ad09
      Herbert Xu authored
      [ Upstream commit b5f9bd15 ]
      
      ila_xlat_nl_cmd_flush uses rhashtable walkers allocated from the
      stack but it never frees them.  This corrupts the walker list of
      the hash table.
      
      This patch fixes it.
      
      Reported-by: syzbot+dae72a112334aa65a159@syzkaller.appspotmail.com
      Fixes: b6e71bde ("ila: Flush netlink command to clear xlat...")
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7254ad09
    • Zhiqiang Liu's avatar
      vxlan: Don't call gro_cells_destroy() before device is unregistered · 979f8a67
      Zhiqiang Liu authored
      [ Upstream commit cc4807bb ]
      
      Commit ad6c9986 ("vxlan: Fix GRO cells race condition between
      receive and link delete") fixed a race condition for the typical case where a
      vxlan device is dismantled from the current netns. But if a netns is dismantled,
      vxlan_destroy_tunnels() is called to schedule an unregister_netdevice_queue()
      of all the vxlan tunnels that are related to this netns.
      
      In vxlan_destroy_tunnels(), gro_cells_destroy() is called and finished before
      unregister_netdevice_queue(). This means that the gro_cells_destroy() call is
      done too soon, for the same reasons explained in the above commit.
      
      So we need to fully respect the RCU rules, and thus must remove the
      gro_cells_destroy() call or risk a use-after-free.
      
      Fixes: 58ce31cc ("vxlan: GRO support at tunnel layer")
      Signed-off-by: Suanming.Mou <mousuanming@huawei.com>
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Reviewed-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      979f8a67
    • Sabrina Dubroca's avatar
      vrf: prevent adding upper devices · 3b1386be
      Sabrina Dubroca authored
      [ Upstream commit 1017e098 ]
      
      VRF devices don't work with upper devices. Currently, it's possible to
      add a VRF device to a bridge or team, and to create macvlan, macsec, or
      ipvlan devices on top of a VRF (bond and vlan are prevented respectively
      by the lack of an ndo_set_mac_address op and the NETIF_F_VLAN_CHALLENGED
      feature flag).
      
      Fix this by setting the IFF_NO_RX_HANDLER flag (introduced in commit
      f5426250 ("net: introduce IFF_NO_RX_HANDLER")).
      
      Cc: David Ahern <dsahern@gmail.com>
      Fixes: 193125db ("net: Introduce VRF device driver")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3b1386be
    • Eric Dumazet's avatar
      tun: properly test for IFF_UP · 8ea78da1
      Eric Dumazet authored
      [ Upstream commit 4477138f ]
      
      Same reasons as the ones explained in commit 4179cb5a
      ("vxlan: test dev->flags & IFF_UP before calling netif_rx()").
      
      netif_rx_ni() or napi_gro_frags() must be called under a strict contract.
      
      At device dismantle phase, core networking clears IFF_UP
      and flush_all_backlogs() is called after rcu grace period
      to make sure no incoming packet might be in a cpu backlog
      and still referencing the device.
      
      A similar protocol is used for gro layer.
      
      Most drivers call netif_rx() from their interrupt handler,
      and since the interrupts are disabled at device dismantle,
      netif_rx() does not have to check dev->flags & IFF_UP
      
      Virtual drivers do not have this guarantee, and must
      therefore make the check themselves.
      
      Fixes: 1bd4978a ("tun: honor IFF_UP in tun_get_user()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8ea78da1
    • Erik Hugne's avatar
      tipc: fix cancellation of topology subscriptions · 52a7505c
      Erik Hugne authored
      [ Upstream commit 33872d79 ]
      
      When cancelling a subscription, we have to clear the cancel bit in the
      request before iterating over any established subscriptions with memcmp.
      Otherwise no subscription will ever be found, and it will not be
      possible to explicitly unsubscribe individual subscriptions.
      
      Fixes: 8985ecc7 ("tipc: simplify endianness handling in topology subscriber")
      Signed-off-by: Erik Hugne <erik.hugne@gmail.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      52a7505c
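The matching problem described in this commit can be reproduced in a few lines of userspace C. A cancel request is the original request with one extra flag bit set, so a byte-wise memcmp() against stored subscriptions only matches after that bit is cleared. The struct layout, the flag value `SKETCH_SUB_CANCEL` and all `sketch_*` names are hypothetical and are not TIPC's actual definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SUB_CANCEL 0x4u	/* hypothetical "cancel" flag bit */
#define SKETCH_MAX_SUBS   8

/* Four 32-bit fields, so the struct has no padding and memcmp() is safe. */
struct sketch_sub_req {
	uint32_t type;
	uint32_t lower;
	uint32_t upper;
	uint32_t filter;
};

struct sketch_sub_req sketch_subs[SKETCH_MAX_SUBS];
int sketch_sub_count;

/* Store a subscription; returns 0 on success, -1 if the table is full. */
int sketch_subscribe(const struct sketch_sub_req *req)
{
	if (sketch_sub_count >= SKETCH_MAX_SUBS)
		return -1;
	sketch_subs[sketch_sub_count++] = *req;
	return 0;
}

/* Cancel a subscription; returns 1 if one was removed, 0 if none matched.
 * The request is taken by value so clearing the bit stays local. */
int sketch_cancel(struct sketch_sub_req req)
{
	int i;

	/* Clear the cancel bit first, otherwise memcmp() never matches. */
	req.filter &= ~SKETCH_SUB_CANCEL;

	for (i = 0; i < sketch_sub_count; i++) {
		if (memcmp(&sketch_subs[i], &req, sizeof(req)) == 0) {
			sketch_subs[i] = sketch_subs[--sketch_sub_count];
			return 1;
		}
	}
	return 0;
}
```

If the line that clears the bit is removed, the stored entry and the request always differ in `filter`, and no subscription can ever be cancelled individually.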
    • Xin Long's avatar
      tipc: change to check tipc_own_id to return in tipc_net_stop · 1be6c0c7
      Xin Long authored
      [ Upstream commit 9926cb5f ]
      
      When running a syz script, a panic occurred:
      
      [  156.088228] BUG: KASAN: use-after-free in tipc_disc_timeout+0x9c9/0xb20 [tipc]
      [  156.094315] Call Trace:
      [  156.094844]  <IRQ>
      [  156.095306]  dump_stack+0x7c/0xc0
      [  156.097346]  print_address_description+0x65/0x22e
      [  156.100445]  kasan_report.cold.3+0x37/0x7a
      [  156.102402]  tipc_disc_timeout+0x9c9/0xb20 [tipc]
      [  156.106517]  call_timer_fn+0x19a/0x610
      [  156.112749]  run_timer_softirq+0xb51/0x1090
      
      It was caused by the netns being freed without the discoverer timer being
      deleted, while the timer handler would later access the freed netns.
      
      The timer should have been deleted by tipc_net_stop() when cleaning up a
      netns. However, tipc has been able to enable a bearer and start d->timer
      without the local node_addr set since Commit 52dfae5c ("tipc: obtain
      node identity from interface by default"), which caused the timer not to
      be deleted in tipc_net_stop() then.
      
      So fix it in tipc_net_stop() by changing to check local node_id instead
      of local node_addr, as Jon suggested.
      
      While at it, remove the call to tipc_nametbl_withdraw() there, since
      tipc_nametbl_stop() will take care of freeing the nametbl afterwards.
      
      Fixes: 52dfae5c ("tipc: obtain node identity from interface by default")
      Reported-by: syzbot+a25307ad099309f1c2b9@syzkaller.appspotmail.com
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1be6c0c7