1. 24 Feb, 2020 35 commits
    • Arvind Sankar's avatar
      x86/sysfb: Fix check for bad VRAM size · 0ef2661d
      Arvind Sankar authored
      [ Upstream commit dacc9092 ]
      
      When checking whether the reported lfb_size makes sense, the height
      * stride result is page-aligned before seeing whether it exceeds the
      reported size.
      
      This doesn't work if height * stride is not an exact number of pages.
      For example, as reported in the kernel bugzilla below, an 800x600x32 EFI
      framebuffer gets skipped because of this.
      
      Move the PAGE_ALIGN to after the check vs size.
      Reported-by: Christopher Head <chead@chead.ca>
      Tested-by: Christopher Head <chead@chead.ca>
      Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=206051
      Link: https://lkml.kernel.org/r/20200107230410.2291947-1-nivedita@alum.mit.edu
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0ef2661d
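The ordering problem described in the x86/sysfb commit above can be sketched in user-space C. This is only an illustration, not the kernel code: the function names, the `SKETCH_` macros, the 4096-byte page size, and the assumption that firmware reports exactly height * stride bytes are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for this sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096ULL
/* Round x up to the next multiple of the page size. */
#define SKETCH_PAGE_ALIGN(x) \
	(((x) + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1))

/* Old order: align first, then compare against the reported size. */
static bool vram_ok_align_first(uint64_t height, uint64_t stride,
				uint64_t reported)
{
	uint64_t len = SKETCH_PAGE_ALIGN(height * stride);

	return len <= reported;
}

/* Fixed order: compare the exact byte count, align only afterwards. */
static bool vram_ok_check_first(uint64_t height, uint64_t stride,
				uint64_t reported, uint64_t *mapped_len)
{
	uint64_t len = height * stride;

	if (len > reported)
		return false;

	*mapped_len = SKETCH_PAGE_ALIGN(len);
	return true;
}
```

For an 800x600x32 mode the stride is 3200 bytes, so height * stride is 1,920,000 bytes, which is 468.75 pages. Aligning first rounds that up to 1,921,024 bytes, which exceeds a reported size of 1,920,000 and rejects the framebuffer; checking first accepts it.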
    • Kai Li's avatar
      jbd2: clear JBD2_ABORT flag before journal_reset to update log tail info when load journal · 8d8a4711
      Kai Li authored
      [ Upstream commit a09decff ]
      
      If the journal is dirty when the filesystem is mounted, jbd2 will replay
      the journal but the journal superblock will not be updated by
      journal_reset() because JBD2_ABORT flag is still set (it was set in
      journal_init_common()). This is problematic because when a new transaction
      is then committed, it will be recorded in block 1 (journal->j_tail was set
      to 1 in journal_reset()). If an unclean shutdown happens again before the
      journal superblock is updated, the newly recorded transaction will not be
      replayed during the next mount (because of stale sb->s_start and
      sb->s_sequence values) which can lead to filesystem corruption.
      
      Fixes: 85e0c4e8 ("jbd2: if the journal is aborted then don't allow update of the log tail")
      Signed-off-by: Kai Li <li.kai4@h3c.com>
      Link: https://lore.kernel.org/r/20200111022542.5008-1-li.kai4@h3c.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8d8a4711
    • Siddhesh Poyarekar's avatar
      kselftest: Minimise dependency of get_size on C library interfaces · 0ee2c886
      Siddhesh Poyarekar authored
      [ Upstream commit 6b64a650 ]
      
      It was observed[1] on arm64 that __builtin_strlen led to an infinite
      loop in the get_size selftest.  This is because __builtin_strlen (and
      other builtins) may sometimes result in a call to the C library
      function.  The C library implementation of strlen uses an IFUNC
      resolver to load the most efficient strlen implementation for the
      underlying machine and hence has a PLT indirection even for static
      binaries.  Because this binary avoids the C library startup routines,
      the PLT initialization never happens and hence the program gets stuck
      in an infinite loop.
      
      On x86_64 the __builtin_strlen just happens to expand inline and avoid
      the call but that is not always guaranteed.
      
      Further, while testing on x86_64 (Fedora 31), it was observed that the
      test also failed with a segfault inside write() because the generated
      code for the write function in glibc seems to access TLS before the
      syscall (probably due to the cancellation point check) and fails
      because TLS is not initialised.
      
      To mitigate these problems, this patch reduces the interface with the
      C library to just the syscall function.  The syscall function still
      sets errno on failure, which is undesirable but for now it only
      affects cases where syscalls fail.
      
      [1] https://bugs.linaro.org/show_bug.cgi?id=5479

      Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
      Reported-by: Masami Hiramatsu <masami.hiramatsu@linaro.org>
      Tested-by: Masami Hiramatsu <masami.hiramatsu@linaro.org>
      Reviewed-by: Tim Bird <tim.bird@sony.com>
      Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0ee2c886
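The idea in the kselftest commit above, reducing the C library interface to just syscall(), might look roughly like the following. This is a sketch, not the selftest's actual code: `sketch_strlen` and `sketch_print` are hypothetical names, and a sufficiently aggressive compiler could in principle still turn the loop back into a library call.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hand-rolled length loop, used instead of strlen()/__builtin_strlen(). */
static size_t sketch_strlen(const char *s)
{
	size_t len = 0;

	while (s[len])
		len++;
	return len;
}

/*
 * Write a string to stdout through syscall() directly rather than the
 * write() wrapper. Returns the number of bytes written, or -1 on error
 * (syscall() still sets errno in that case).
 */
static long sketch_print(const char *s)
{
	return syscall(SYS_write, 1, s, sketch_strlen(s));
}
```

With this shape, the only C library symbol the program depends on is syscall() itself, which is the trade-off the commit message describes.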
    • Colin Ian King's avatar
      clocksource/drivers/bcm2835_timer: Fix memory leak of timer · be1777ba
      Colin Ian King authored
      [ Upstream commit 2052d032 ]
      
      Currently when setup_irq fails the error exit path will leak the
      recently allocated timer structure.  Originally the code would
      throw a panic but a later commit changed the behaviour to return
      via the err_iounmap path and hence we now have a memory leak. Fix
      this by adding an err_timer_free error path that kfree's timer.
      
      Addresses-Coverity: ("Resource Leak")
      Fixes: 524a7f08 ("clocksource/drivers/bcm2835_timer: Convert init function to return error")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Link: https://lore.kernel.org/r/20191219213246.34437-1-colin.king@canonical.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      be1777ba
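The error-path fix in the bcm2835_timer commit above follows the usual goto-unwind pattern. Below is a user-space sketch of that pattern only; every `sketch_` name is hypothetical, calloc()/free() stand in for kzalloc()/kfree(), and the allocation counter exists purely to make a leak observable.

```c
#include <errno.h>
#include <stdlib.h>

struct sketch_timer {
	int irq;
};

/* Counts outstanding allocations so a leak can be detected. */
static int sketch_live_allocs;

static struct sketch_timer *sketch_alloc_timer(void)
{
	struct sketch_timer *t = calloc(1, sizeof(*t));

	if (t)
		sketch_live_allocs++;
	return t;
}

static void sketch_free_timer(struct sketch_timer *t)
{
	free(t);
	sketch_live_allocs--;
}

/* Stand-in for setup_irq(); fails when asked to. */
static int sketch_setup_irq(struct sketch_timer *t, int fail)
{
	(void)t;
	return fail ? -EINVAL : 0;
}

static int sketch_timer_init(int fail_irq, struct sketch_timer **out)
{
	struct sketch_timer *timer;
	int ret;

	timer = sketch_alloc_timer();
	if (!timer) {
		ret = -ENOMEM;
		goto err_iounmap;
	}

	ret = sketch_setup_irq(timer, fail_irq);
	if (ret)
		goto err_timer_free;	/* the label the fix adds */

	*out = timer;
	return 0;

err_timer_free:
	sketch_free_timer(timer);
err_iounmap:
	/* A real driver would undo its register mapping here. */
	return ret;
}
```

Labels are ordered so that each failure point jumps to the cleanup for everything acquired before it and then falls through to the earlier cleanups.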
    • John Keeping's avatar
      usb: dwc2: Fix IN FIFO allocation · 39a80bbf
      John Keeping authored
      [ Upstream commit 644139f8 ]
      
      On chips with fewer FIFOs than endpoints (for example RK3288 which has 9
      endpoints, but only 6 which are capable of input), the DPTXFSIZN
      registers above the FIFO count may return invalid values.
      
      With logging added on startup, I see:
      
      	dwc2 ff580000.usb: dwc2_hsotg_init_fifo: ep=1 sz=256
      	dwc2 ff580000.usb: dwc2_hsotg_init_fifo: ep=2 sz=128
      	dwc2 ff580000.usb: dwc2_hsotg_init_fifo: ep=3 sz=128
      	dwc2 ff580000.usb: dwc2_hsotg_init_fifo: ep=4 sz=64
      	dwc2 ff580000.usb: dwc2_hsotg_init_fifo: ep=5 sz=64
      	dwc2 ff580000.usb: dwc2_hsotg_init_fifo: ep=6 sz=32
      	dwc2 ff580000.usb: dwc2_hsotg_init_fifo: ep=7 sz=0
      	dwc2 ff580000.usb: dwc2_hsotg_init_fifo: ep=8 sz=0
      	dwc2 ff580000.usb: dwc2_hsotg_init_fifo: ep=9 sz=0
      	dwc2 ff580000.usb: dwc2_hsotg_init_fifo: ep=10 sz=0
      	dwc2 ff580000.usb: dwc2_hsotg_init_fifo: ep=11 sz=0
      	dwc2 ff580000.usb: dwc2_hsotg_init_fifo: ep=12 sz=0
      	dwc2 ff580000.usb: dwc2_hsotg_init_fifo: ep=13 sz=0
      	dwc2 ff580000.usb: dwc2_hsotg_init_fifo: ep=14 sz=0
      	dwc2 ff580000.usb: dwc2_hsotg_init_fifo: ep=15 sz=0
      
      but:
      
      	# cat /sys/kernel/debug/ff580000.usb/fifo
      	Non-periodic FIFOs:
      	RXFIFO: Size 275
      	NPTXFIFO: Size 16, Start 0x00000113
      
      	Periodic TXFIFOs:
      		DPTXFIFO 1: Size 256, Start 0x00000123
      		DPTXFIFO 2: Size 128, Start 0x00000223
      		DPTXFIFO 3: Size 128, Start 0x000002a3
      		DPTXFIFO 4: Size 64, Start 0x00000323
      		DPTXFIFO 5: Size 64, Start 0x00000363
      		DPTXFIFO 6: Size 32, Start 0x000003a3
      		DPTXFIFO 7: Size 0, Start 0x000003e3
      		DPTXFIFO 8: Size 0, Start 0x000003a3
      		DPTXFIFO 9: Size 256, Start 0x00000123
      
      so it seems that FIFO 9 is mirroring FIFO 1.
      
      Fix the allocation by using the FIFO count instead of the endpoint count
      when selecting a FIFO for an endpoint.
      Acked-by: Minas Harutyunyan <hminas@synopsys.com>
      Signed-off-by: John Keeping <john@metanate.com>
      Signed-off-by: Felipe Balbi <balbi@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      39a80bbf
    • Jia-Ju Bai's avatar
      usb: gadget: udc: fix possible sleep-in-atomic-context bugs in gr_probe() · 6c053825
      Jia-Ju Bai authored
      [ Upstream commit 9c1ed62a ]
      
      The driver may sleep while holding a spinlock.
      The function call path (from bottom to top) in Linux 4.19 is:
      
      drivers/usb/gadget/udc/core.c, 1175:
      	kzalloc(GFP_KERNEL) in usb_add_gadget_udc_release
      drivers/usb/gadget/udc/core.c, 1272:
      	usb_add_gadget_udc_release in usb_add_gadget_udc
      drivers/usb/gadget/udc/gr_udc.c, 2186:
      	usb_add_gadget_udc in gr_probe
      drivers/usb/gadget/udc/gr_udc.c, 2183:
      	spin_lock in gr_probe
      
      drivers/usb/gadget/udc/core.c, 1195:
      	mutex_lock in usb_add_gadget_udc_release
      drivers/usb/gadget/udc/core.c, 1272:
      	usb_add_gadget_udc_release in usb_add_gadget_udc
      drivers/usb/gadget/udc/gr_udc.c, 2186:
      	usb_add_gadget_udc in gr_probe
      drivers/usb/gadget/udc/gr_udc.c, 2183:
      	spin_lock in gr_probe
      
      drivers/usb/gadget/udc/gr_udc.c, 212:
      	debugfs_create_file in gr_probe
      drivers/usb/gadget/udc/gr_udc.c, 2197:
      	gr_dfs_create in gr_probe
      drivers/usb/gadget/udc/gr_udc.c, 2183:
          spin_lock in gr_probe
      
      drivers/usb/gadget/udc/gr_udc.c, 2114:
      	devm_request_threaded_irq in gr_request_irq
      drivers/usb/gadget/udc/gr_udc.c, 2202:
      	gr_request_irq in gr_probe
      drivers/usb/gadget/udc/gr_udc.c, 2183:
          spin_lock in gr_probe
      
      kzalloc(GFP_KERNEL), mutex_lock(), debugfs_create_file() and
      devm_request_threaded_irq() can sleep at runtime.
      
      To fix these possible bugs, usb_add_gadget_udc(), gr_dfs_create() and
      gr_request_irq() are called without holding the spinlock.
      
      These bugs were found by STCheck, a static analysis tool written by myself.
      Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Signed-off-by: Felipe Balbi <balbi@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6c053825
    • Jia-Ju Bai's avatar
      uio: fix a sleep-in-atomic-context bug in uio_dmem_genirq_irqcontrol() · ea6b7b1d
      Jia-Ju Bai authored
      [ Upstream commit b7435128 ]
      
      The driver may sleep while holding a spinlock.
      The function call path (from bottom to top) in Linux 4.19 is:
      
      kernel/irq/manage.c, 523:
      	synchronize_irq in disable_irq
      drivers/uio/uio_dmem_genirq.c, 140:
      	disable_irq in uio_dmem_genirq_irqcontrol
      drivers/uio/uio_dmem_genirq.c, 134:
      	_raw_spin_lock_irqsave in uio_dmem_genirq_irqcontrol
      
      synchronize_irq() can sleep at runtime.
      
      To fix this bug, disable_irq() is called without holding the spinlock.
      
      This bug was found by STCheck, a static analysis tool written by myself.
      Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Link: https://lore.kernel.org/r/20191218094405.6009-1-baijiaju1990@gmail.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ea6b7b1d
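The sleep-in-atomic-context commits in this list share one general remedy: decide what to do while holding the lock, then perform the possibly sleeping call after releasing it. The following user-space model illustrates that shape only. It is not the code of any of these patches, and every `sketch_` name is hypothetical; the "lock" is a plain flag so that a violation can be counted.

```c
#include <stdbool.h>

/* Models "a spinlock is held", i.e. atomic context. */
static bool sketch_lock_held;
/* Number of times a sleeping call ran in atomic context. */
static int sketch_sleep_violations;

static void sketch_lock(void)   { sketch_lock_held = true; }
static void sketch_unlock(void) { sketch_lock_held = false; }

/* Stand-in for a call that may sleep, such as disable_irq(). */
static void sketch_may_sleep(void)
{
	if (sketch_lock_held)
		sketch_sleep_violations++;
}

struct sketch_dev {
	bool irq_enabled;
};

/* Buggy shape: the sleeping call runs with the lock held. */
static void sketch_irq_off_buggy(struct sketch_dev *d)
{
	sketch_lock();
	if (d->irq_enabled) {
		d->irq_enabled = false;
		sketch_may_sleep();
	}
	sketch_unlock();
}

/* Fixed shape: update state under the lock, sleep after unlocking. */
static void sketch_irq_off_fixed(struct sketch_dev *d)
{
	bool need_disable = false;

	sketch_lock();
	if (d->irq_enabled) {
		d->irq_enabled = false;
		need_disable = true;
	}
	sketch_unlock();

	if (need_disable)
		sketch_may_sleep();
}
```

Whether the state update and the sleeping call may safely be separated like this depends on the driver; the sketch assumes they can.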
    • David S. Miller's avatar
      sparc: Add .exit.data section. · 73a1803c
      David S. Miller authored
      [ Upstream commit 548f0b9a ]
      
      This fixes build errors of all sorts.
      
      Also, emit .exit.text unconditionally.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      73a1803c
    • Tiezhu Yang's avatar
      MIPS: Loongson: Fix potential NULL dereference in loongson3_platform_init() · 2ebbbc9b
      Tiezhu Yang authored
      [ Upstream commit 72d052e2 ]
      
      If kzalloc() fails, the function should return -ENOMEM; otherwise it may
      trigger a NULL pointer dereference.
      
      Fixes: 3adeb256 ("MIPS: Loongson: Improve LEFI firmware interface")
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Signed-off-by: Paul Burton <paulburton@kernel.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Huacai Chen <chenhc@lemote.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: linux-mips@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2ebbbc9b
    • Ard Biesheuvel's avatar
      efi/x86: Map the entire EFI vendor string before copying it · cf8938b1
      Ard Biesheuvel authored
      [ Upstream commit ffc2760b ]
      
      Fix a couple of issues with the way we map and copy the vendor string:
      - we map only 2 bytes, which usually works since you get at least a
        page, but if the vendor string happens to cross a page boundary,
        a crash will result,
      - only call early_memunmap() if early_memremap() succeeded, or we will
        call it with a NULL address, which it doesn't like,
      - while at it, switch to early_memremap_ro(), and use array indexing
        rather than pointer dereferencing to read the CHAR16 characters.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arvind Sankar <nivedita@alum.mit.edu>
      Cc: Matthew Garrett <mjg59@google.com>
      Cc: linux-efi@vger.kernel.org
      Fixes: 5b83683f ("x86: EFI runtime service support")
      Link: https://lkml.kernel.org/r/20200103113953.9571-5-ardb@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      cf8938b1
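The copy step described in the efi/x86 commit above, array indexing over 16-bit characters with a bounded destination, can be sketched in plain C. The function name and signature are hypothetical, and the sketch assumes the caller has already mapped at least vendor_size 16-bit elements of the source (vendor_size * sizeof(uint16_t) bytes), which is the point of mapping the entire string first.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy a NUL-terminated string of 16-bit characters into a byte buffer,
 * keeping only the low byte of each character. At most vendor_size - 1
 * characters are copied and the result is always NUL-terminated.
 * Returns the number of characters copied.
 */
static size_t sketch_copy_vendor(const uint16_t *c16, char *vendor,
				 size_t vendor_size)
{
	size_t i;

	if (vendor_size == 0)
		return 0;

	for (i = 0; i < vendor_size - 1 && c16[i]; i++)
		vendor[i] = (char)c16[i];
	vendor[i] = '\0';

	return i;
}
```

Because the loop can read up to vendor_size - 1 source elements, a mapping of only 2 bytes is enough only when the rest of the string happens to sit in the same mapped page.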
    • Hans de Goede's avatar
      pinctrl: baytrail: Do not clear IRQ flags on direct-irq enabled pins · 0a8a859f
      Hans de Goede authored
      [ Upstream commit a2368059 ]
      
      Suspending Goodix touchscreens requires changing the interrupt pin to
      output before sending them a power-down command. Waking the device up
      then requires wiggling the interrupt pin, after which it is put back
      in input mode.
      
      On Bay Trail devices with a Goodix touchscreen direct-irq mode is used
      in combination with listing the pin as a normal GpioIo resource.
      
      This works fine, until the goodix driver gets rmmod-ed and then insmod-ed
      again. In this case byt_gpio_disable_free() calls
      byt_gpio_clear_triggering() which clears the IRQ flags and after that the
      (direct) IRQ no longer triggers.
      
      This commit fixes this by adding a check for the BYT_DIRECT_IRQ_EN flag
      to byt_gpio_clear_triggering().
      
      Note that byt_gpio_clear_triggering() only gets called from
      byt_gpio_disable_free() for direct-irq enabled pins, as these are excluded
      from the irq_valid mask by byt_init_irq_valid_mask().
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0a8a859f
    • Jia-Ju Bai's avatar
      media: sti: bdisp: fix a possible sleep-in-atomic-context bug in bdisp_device_run() · 47505a7d
      Jia-Ju Bai authored
      [ Upstream commit bb6d4206 ]
      
      The driver may sleep while holding a spinlock.
      The function call path (from bottom to top) in Linux 4.19 is:
      
      drivers/media/platform/sti/bdisp/bdisp-hw.c, 385:
          msleep in bdisp_hw_reset
      drivers/media/platform/sti/bdisp/bdisp-v4l2.c, 341:
          bdisp_hw_reset in bdisp_device_run
      drivers/media/platform/sti/bdisp/bdisp-v4l2.c, 317:
          _raw_spin_lock_irqsave in bdisp_device_run
      
      To fix this bug, msleep() is replaced with udelay().
      
      This bug was found by STCheck, a static analysis tool written by myself.
      Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Reviewed-by: Fabien Dessenne <fabien.dessenne@st.com>
      Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
      Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      47505a7d
    • Sergey Senozhatsky's avatar
      char/random: silence a lockdep splat with printk() · 15341b1d
      Sergey Senozhatsky authored
      [ Upstream commit 1b710b1b ]
      
      Sergey didn't like the locking order,
      
      uart_port->lock  ->  tty_port->lock
      
      uart_write (uart_port->lock)
        __uart_start
          pl011_start_tx
            pl011_tx_chars
              uart_write_wakeup
                tty_port_tty_wakeup
                  tty_port_default
                    tty_port_tty_get (tty_port->lock)
      
      but that code is so old, and I have no clue how to de-couple it after
      checking other locks in the splat. There is an ongoing effort to make all
      printk() calls deferred, so until that happens, work around it for now as
      a short-term fix.
      
      LTP: starting iogen01 (export LTPROOT; rwtest -N iogen01 -i 120s -s
      read,write -Da -Dv -n 2 500b:$TMPDIR/doio.f1.$$
      1000b:$TMPDIR/doio.f2.$$)
      WARNING: possible circular locking dependency detected
      ------------------------------------------------------
      doio/49441 is trying to acquire lock:
      ffff008b7cff7290 (&(&zone->lock)->rlock){..-.}, at: rmqueue+0x138/0x2050
      
      but task is already holding lock:
      60ff000822352818 (&pool->lock/1){-.-.}, at: start_flush_work+0xd8/0x3f0
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #4 (&pool->lock/1){-.-.}:
             lock_acquire+0x320/0x360
             _raw_spin_lock+0x64/0x80
             __queue_work+0x4b4/0xa10
             queue_work_on+0xac/0x11c
             tty_schedule_flip+0x84/0xbc
             tty_flip_buffer_push+0x1c/0x28
             pty_write+0x98/0xd0
             n_tty_write+0x450/0x60c
             tty_write+0x338/0x474
             __vfs_write+0x88/0x214
             vfs_write+0x12c/0x1a4
             redirected_tty_write+0x90/0xdc
             do_loop_readv_writev+0x140/0x180
             do_iter_write+0xe0/0x10c
             vfs_writev+0x134/0x1cc
             do_writev+0xbc/0x130
             __arm64_sys_writev+0x58/0x8c
             el0_svc_handler+0x170/0x240
             el0_sync_handler+0x150/0x250
             el0_sync+0x164/0x180
      
        -> #3 (&(&port->lock)->rlock){-.-.}:
             lock_acquire+0x320/0x360
             _raw_spin_lock_irqsave+0x7c/0x9c
             tty_port_tty_get+0x24/0x60
             tty_port_default_wakeup+0x1c/0x3c
             tty_port_tty_wakeup+0x34/0x40
             uart_write_wakeup+0x28/0x44
             pl011_tx_chars+0x1b8/0x270
             pl011_start_tx+0x24/0x70
             __uart_start+0x5c/0x68
             uart_write+0x164/0x1c8
             do_output_char+0x33c/0x348
             n_tty_write+0x4bc/0x60c
             tty_write+0x338/0x474
             redirected_tty_write+0xc0/0xdc
             do_loop_readv_writev+0x140/0x180
             do_iter_write+0xe0/0x10c
             vfs_writev+0x134/0x1cc
             do_writev+0xbc/0x130
             __arm64_sys_writev+0x58/0x8c
             el0_svc_handler+0x170/0x240
             el0_sync_handler+0x150/0x250
             el0_sync+0x164/0x180
      
        -> #2 (&port_lock_key){-.-.}:
             lock_acquire+0x320/0x360
             _raw_spin_lock+0x64/0x80
             pl011_console_write+0xec/0x2cc
             console_unlock+0x794/0x96c
             vprintk_emit+0x260/0x31c
             vprintk_default+0x54/0x7c
             vprintk_func+0x218/0x254
             printk+0x7c/0xa4
             register_console+0x734/0x7b0
             uart_add_one_port+0x734/0x834
             pl011_register_port+0x6c/0xac
             sbsa_uart_probe+0x234/0x2ec
             platform_drv_probe+0xd4/0x124
             really_probe+0x250/0x71c
             driver_probe_device+0xb4/0x200
             __device_attach_driver+0xd8/0x188
             bus_for_each_drv+0xbc/0x110
             __device_attach+0x120/0x220
             device_initial_probe+0x20/0x2c
             bus_probe_device+0x54/0x100
             device_add+0xae8/0xc2c
             platform_device_add+0x278/0x3b8
             platform_device_register_full+0x238/0x2ac
             acpi_create_platform_device+0x2dc/0x3a8
             acpi_bus_attach+0x390/0x3cc
             acpi_bus_attach+0x108/0x3cc
             acpi_bus_attach+0x108/0x3cc
             acpi_bus_attach+0x108/0x3cc
             acpi_bus_scan+0x7c/0xb0
             acpi_scan_init+0xe4/0x304
             acpi_init+0x100/0x114
             do_one_initcall+0x348/0x6a0
             do_initcall_level+0x190/0x1fc
             do_basic_setup+0x34/0x4c
             kernel_init_freeable+0x19c/0x260
             kernel_init+0x18/0x338
             ret_from_fork+0x10/0x18
      
        -> #1 (console_owner){-...}:
             lock_acquire+0x320/0x360
             console_lock_spinning_enable+0x6c/0x7c
             console_unlock+0x4f8/0x96c
             vprintk_emit+0x260/0x31c
             vprintk_default+0x54/0x7c
             vprintk_func+0x218/0x254
             printk+0x7c/0xa4
             get_random_u64+0x1c4/0x1dc
             shuffle_pick_tail+0x40/0xac
             __free_one_page+0x424/0x710
             free_one_page+0x70/0x120
             __free_pages_ok+0x61c/0xa94
             __free_pages_core+0x1bc/0x294
             memblock_free_pages+0x38/0x48
             __free_pages_memory+0xcc/0xfc
             __free_memory_core+0x70/0x78
             free_low_memory_core_early+0x148/0x18c
             memblock_free_all+0x18/0x54
             mem_init+0xb4/0x17c
             mm_init+0x14/0x38
             start_kernel+0x19c/0x530
      
        -> #0 (&(&zone->lock)->rlock){..-.}:
             validate_chain+0xf6c/0x2e2c
             __lock_acquire+0x868/0xc2c
             lock_acquire+0x320/0x360
             _raw_spin_lock+0x64/0x80
             rmqueue+0x138/0x2050
             get_page_from_freelist+0x474/0x688
             __alloc_pages_nodemask+0x3b4/0x18dc
             alloc_pages_current+0xd0/0xe0
             alloc_slab_page+0x2b4/0x5e0
             new_slab+0xc8/0x6bc
             ___slab_alloc+0x3b8/0x640
             kmem_cache_alloc+0x4b4/0x588
             __debug_object_init+0x778/0x8b4
             debug_object_init_on_stack+0x40/0x50
             start_flush_work+0x16c/0x3f0
             __flush_work+0xb8/0x124
             flush_work+0x20/0x30
             xlog_cil_force_lsn+0x88/0x204 [xfs]
             xfs_log_force_lsn+0x128/0x1b8 [xfs]
             xfs_file_fsync+0x3c4/0x488 [xfs]
             vfs_fsync_range+0xb0/0xd0
             generic_write_sync+0x80/0xa0 [xfs]
             xfs_file_buffered_aio_write+0x66c/0x6e4 [xfs]
             xfs_file_write_iter+0x1a0/0x218 [xfs]
             __vfs_write+0x1cc/0x214
             vfs_write+0x12c/0x1a4
             ksys_write+0xb0/0x120
             __arm64_sys_write+0x54/0x88
             el0_svc_handler+0x170/0x240
             el0_sync_handler+0x150/0x250
             el0_sync+0x164/0x180
      
             other info that might help us debug this:
      
       Chain exists of:
         &(&zone->lock)->rlock --> &(&port->lock)->rlock --> &pool->lock/1
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&pool->lock/1);
                                     lock(&(&port->lock)->rlock);
                                     lock(&pool->lock/1);
        lock(&(&zone->lock)->rlock);
      
                      *** DEADLOCK ***
      
      4 locks held by doio/49441:
       #0: a0ff00886fc27408 (sb_writers#8){.+.+}, at: vfs_write+0x118/0x1a4
       #1: 8fff00080810dfe0 (&xfs_nondir_ilock_class){++++}, at:
      xfs_ilock+0x2a8/0x300 [xfs]
       #2: ffff9000129f2390 (rcu_read_lock){....}, at:
      rcu_lock_acquire+0x8/0x38
       #3: 60ff000822352818 (&pool->lock/1){-.-.}, at:
      start_flush_work+0xd8/0x3f0
      
                     stack backtrace:
      CPU: 48 PID: 49441 Comm: doio Tainted: G        W
      Hardware name: HPE Apollo 70             /C01_APACHE_MB         , BIOS
      L50_5.13_1.11 06/18/2019
      Call trace:
       dump_backtrace+0x0/0x248
       show_stack+0x20/0x2c
       dump_stack+0xe8/0x150
       print_circular_bug+0x368/0x380
       check_noncircular+0x28c/0x294
       validate_chain+0xf6c/0x2e2c
       __lock_acquire+0x868/0xc2c
       lock_acquire+0x320/0x360
       _raw_spin_lock+0x64/0x80
       rmqueue+0x138/0x2050
       get_page_from_freelist+0x474/0x688
       __alloc_pages_nodemask+0x3b4/0x18dc
       alloc_pages_current+0xd0/0xe0
       alloc_slab_page+0x2b4/0x5e0
       new_slab+0xc8/0x6bc
       ___slab_alloc+0x3b8/0x640
       kmem_cache_alloc+0x4b4/0x588
       __debug_object_init+0x778/0x8b4
       debug_object_init_on_stack+0x40/0x50
       start_flush_work+0x16c/0x3f0
       __flush_work+0xb8/0x124
       flush_work+0x20/0x30
       xlog_cil_force_lsn+0x88/0x204 [xfs]
       xfs_log_force_lsn+0x128/0x1b8 [xfs]
       xfs_file_fsync+0x3c4/0x488 [xfs]
       vfs_fsync_range+0xb0/0xd0
       generic_write_sync+0x80/0xa0 [xfs]
       xfs_file_buffered_aio_write+0x66c/0x6e4 [xfs]
       xfs_file_write_iter+0x1a0/0x218 [xfs]
       __vfs_write+0x1cc/0x214
       vfs_write+0x12c/0x1a4
       ksys_write+0xb0/0x120
       __arm64_sys_write+0x54/0x88
       el0_svc_handler+0x170/0x240
       el0_sync_handler+0x150/0x250
       el0_sync+0x164/0x180
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Signed-off-by: Qian Cai <cai@lca.pw>
      Link: https://lore.kernel.org/r/1573679785-21068-1-git-send-email-cai@lca.pw
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      15341b1d
    • Jacob Pan's avatar
      iommu/vt-d: Fix off-by-one in PASID allocation · 4802b257
      Jacob Pan authored
      [ Upstream commit 39d630e3 ]
      
      PASID allocator uses IDR which is exclusive for the end of the
      allocation range. There is no need to decrement pasid_max.
      
      Fixes: af395073 ("iommu/vt-d: Apply global PASID in SVA")
      Reported-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4802b257
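The off-by-one in the PASID commit above comes from treating an exclusive upper bound as inclusive. A toy allocator with the same [start, end) convention shows the effect. It is not the kernel IDR: the `sketch_` names are hypothetical, the table is a fixed array, and failure is reported as -1 rather than a kernel error code.

```c
#include <stdbool.h>

#define SKETCH_MAX_IDS 64

struct sketch_idr {
	bool used[SKETCH_MAX_IDS];
};

/*
 * Allocate the lowest free id in [start, end). The end bound is
 * exclusive, so passing end = max already excludes max itself.
 * Returns the id, or -1 if the range is exhausted.
 */
static int sketch_alloc_range(struct sketch_idr *idr, int start, int end)
{
	int id;

	if (start < 0)
		start = 0;
	if (end > SKETCH_MAX_IDS)
		end = SKETCH_MAX_IDS;

	for (id = start; id < end; id++) {
		if (!idr->used[id]) {
			idr->used[id] = true;
			return id;
		}
	}
	return -1;
}
```

With a hypothetical pasid_max of 4 and ids starting at 1, passing pasid_max - 1 as the end yields only ids 1 and 2, while passing pasid_max yields 1, 2 and 3. Decrementing the bound therefore loses the last valid id.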
    • Jia-Ju Bai's avatar
      gpio: gpio-grgpio: fix possible sleep-in-atomic-context bugs in grgpio_irq_map/unmap() · 442b50c0
      Jia-Ju Bai authored
      [ Upstream commit e36eaf94 ]
      
      The driver may sleep while holding a spinlock.
      The function call path (from bottom to top) in Linux 4.19 is:
      
      drivers/gpio/gpio-grgpio.c, 261:
      	request_irq in grgpio_irq_map
      drivers/gpio/gpio-grgpio.c, 255:
      	_raw_spin_lock_irqsave in grgpio_irq_map
      
      drivers/gpio/gpio-grgpio.c, 318:
      	free_irq in grgpio_irq_unmap
      drivers/gpio/gpio-grgpio.c, 299:
      	_raw_spin_lock_irqsave in grgpio_irq_unmap
      
      request_irq() and free_irq() can sleep at runtime.
      
      To fix these bugs, request_irq() and free_irq() are called without
      holding the spinlock.
      
      These bugs were found by STCheck, a static analysis tool written by myself.
      Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Link: https://lore.kernel.org/r/20191218132605.10594-1-baijiaju1990@gmail.com
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      442b50c0
    • Oliver O'Halloran's avatar
      powerpc/powernv/iov: Ensure the pdn for VFs always contains a valid PE number · 67f7f0c7
      Oliver O'Halloran authored
      [ Upstream commit 3b5b9997 ]
      
      On pseries there is a bug with adding hotplugged devices to an IOMMU
      group. For a number of dumb reasons fixing that bug first requires
      re-working how VFs are configured on PowerNV. For background, on
      PowerNV we use the pcibios_sriov_enable() hook to do two things:
      
        1. Create a pci_dn structure for each of the VFs, and
        2. Configure the PHB's internal BARs so the MMIO range for each VF
           maps to a unique PE.
      
      Roughly speaking a PE is the hardware counterpart to a Linux IOMMU
      group since all the devices in a PE share the same IOMMU table. A PE
      also defines the set of devices that should be isolated in response to
      a PCI error (i.e. bad DMA, UR/CA, AER events, etc). When isolated all
      MMIO and DMA traffic to and from devices in the PE is blocked by the
      root complex until the PE is recovered by the OS.
      
      The requirement to block MMIO causes a giant headache because the P8
      PHB generally uses a fixed mapping between MMIO addresses and PEs. As
      a result we need to delay configuring the IOMMU groups for devices
      until after MMIO resources are assigned. For physical devices (i.e.
      non-VFs) the PE assignment is done in pcibios_setup_bridge() which is
      called immediately after the MMIO resources for downstream
      devices (and the bridge's windows) are assigned. For VFs the setup is
      more complicated because:
      
        a) pcibios_setup_bridge() is not called again when VFs are activated, and
        b) The pci_dev for VFs are created by generic code which runs after
           pcibios_sriov_enable() is called.
      
      The work around for this is a two step process:
      
        1. A fixup in pcibios_add_device() is used to initialise the cached
           pe_number in pci_dn, then
        2. A bus notifier then adds the device to the IOMMU group for the PE
           specified in pci_dn->pe_number.
      
      A side effect of fixing the pseries bug mentioned in the first paragraph
      is moving the fixup out of pcibios_add_device() and into
      pcibios_bus_add_device(), which is called much later. This results in
      step 2 failing because pci_dn->pe_number won't be initialised when
      the bus notifier is run.
      
      We can fix this by removing the need for the fixup. The PE for a VF is
      known before the VF is even scanned so we can initialise
      pci_dn->pe_number in pcibios_sriov_enable() instead. Unfortunately,
      moving the initialisation causes two problems:
      
        1. We trip the WARN_ON() in the current fixup code, and
        2. The EEH core clears pdn->pe_number when recovering a VF and
           relies on the fixup to correctly re-set it.
      
      The only justification for either of these is a comment in
      eeh_rmv_device() suggesting that pdn->pe_number *must* be set to
      IODA_INVALID_PE in order for the VF to be scanned. However, this
      comment appears to have no basis in reality. Both bugs can be fixed by
      just deleting the code.
      Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20191028085424.12006-1-oohall@gmail.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      67f7f0c7
    • Eugen Hristev's avatar
      media: i2c: mt9v032: fix enum mbus codes and frame sizes · 03ac6ed4
      Eugen Hristev authored
      [ Upstream commit 1451d5ae ]
      
      This driver supports both the mt9v032 (color) and the mt9v022 (mono)
      sensors. Depending on which sensor is used, the format from the sensor is
      different. The format.code inside the dev struct holds this information.
      The enum mbus and enum frame sizes need to take into account both types of
      sensors, not just the color one. To solve this, use the format.code in
      these functions instead of the hardcoded bayer color format (which is only
      used for mt9v032).
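
      As a rough illustration of the change described above (a toy model, not
      the driver code), the enumeration can report the format stored for the
      probed sensor instead of a fixed Bayer code. The Sensor class, the
      function names and the format strings are stand-ins chosen for this
      sketch.

```python
# Toy model with stand-in names; this is not the mt9v032 driver's API.
BAYER_CODE = "SGRBG10_1X10"   # illustrative label for the colour format
MONO_CODE = "Y10_1X10"        # illustrative label for the monochrome format


class Sensor:
    def __init__(self, format_code):
        # Chosen once at probe time, depending on the detected sensor model.
        self.format_code = format_code


def enum_mbus_code_hardcoded(sensor, index):
    """Old behaviour: always report the Bayer code, whatever the sensor."""
    return BAYER_CODE if index == 0 else None


def enum_mbus_code(sensor, index):
    """New behaviour: report the code stored for this particular sensor."""
    return sensor.format_code if index == 0 else None


mono = Sensor(MONO_CODE)
print(enum_mbus_code_hardcoded(mono, 0))  # Bayer code, wrong for a mono sensor
print(enum_mbus_code(mono, 0))            # the mono sensor's own code
```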
      
      [Sakari Ailus: rewrapped commit message]
      Suggested-by: Wenyou Yang <wenyou.yang@microchip.com>
      Signed-off-by: Eugen Hristev <eugen.hristev@microchip.com>
      Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
      Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      03ac6ed4
    • Christophe JAILLET's avatar
      pxa168fb: Fix the function used to release some memory in an error handling path · 8cc5aa5c
      Christophe JAILLET authored
      [ Upstream commit 3c911fe7 ]
      
      In the probe function, some resources are allocated using 'dma_alloc_wc()',
      they should be released with 'dma_free_wc()', not 'dma_free_coherent()'.
      
      We already use 'dma_free_wc()' in the remove function, but not in the
      error handling path of the probe function.
      
      Also, remove a useless 'PAGE_ALIGN()'. 'info->fix.smem_len' is already
      PAGE_ALIGNed.
      
      Fixes: 638772c7 ("fb: add support of LCD display controller on pxa168/910 (base layer)")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: Lubomir Rintel <lkundrak@v3.sk>
      CC: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190831100024.3248-1-christophe.jaillet@wanadoo.fr
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8cc5aa5c
    • Geert Uytterhoeven's avatar
      pinctrl: sh-pfc: sh7264: Fix CAN function GPIOs · e5c8d49b
      Geert Uytterhoeven authored
      [ Upstream commit 55b1cb1f ]
      
      pinmux_func_gpios[] contains a hole due to the missing function GPIO
      definition for the "CTX0&CTX1" signal, which is the logical "AND" of the
      two CAN outputs.
      
      Fix this by:
        - Renaming CRX0_CRX1_MARK to CTX0_CTX1_MARK, as PJ2MD[2:0]=010
          configures the combined "CTX0&CTX1" output signal,
        - Renaming CRX0X1_MARK to CRX0_CRX1_MARK, as PJ3MD[1:0]=10 configures
          the shared "CRX0/CRX1" input signal, which is fed to both CAN
          inputs,
        - Adding the missing function GPIO definition for "CTX0&CTX1" to
          pinmux_func_gpios[],
        - Moving all CAN enums next to each other.
      
      See SH7262 Group, SH7264 Group User's Manual: Hardware, Rev. 4.00:
        [1] Figure 1.2 (3) Pin Assignment for the SH7264 Group (1-Mbyte
            Version),
        [2] Figure 1.2 (4) Pin Assignment for the SH7264 Group (640-Kbyte
            Version),
        [3] Table 1.4 List of Pins,
        [4] Figure 20.29 Connection Example when Using This Module as 1-Channel
            Module (64 Mailboxes x 1 Channel),
        [5] Table 32.10 Multiplexed Pins (Port J),
        [6] Section 32.2.30 (3) Port J Control Register 0 (PJCR0).
      
      Note that the last 2 disagree about PJ2MD[2:0], which is probably the
      root cause of this bug.  But considering [4], "CTx0&CTx1" in [5] must
      be correct, and "CRx0&CRx1" in [6] must be wrong.
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Link: https://lore.kernel.org/r/20191218194812.12741-4-geert+renesas@glider.be
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e5c8d49b
    • Vladimir Oltean's avatar
      gianfar: Fix TX timestamping with a stacked DSA driver · 195e54e6
      Vladimir Oltean authored
      [ Upstream commit c26a2c2d ]
      
      The driver wrongly assumes that it is the only entity that can set the
      SKBTX_IN_PROGRESS bit of the current skb. Therefore, in the
      gfar_clean_tx_ring function, where the TX timestamp is collected if
      necessary, the aforementioned bit is used to discriminate whether or not
      the TX timestamp should be delivered to the socket's error queue.
      
      But a stacked driver such as a DSA switch can also set the
      SKBTX_IN_PROGRESS bit, which is actually exactly what it should do in
      order to denote that hardware timestamping is in progress.
      
      Therefore, gianfar would misinterpret the "in progress" bit as being its
      own, and deliver a second skb clone in the socket's error queue,
      completely throwing off a PTP process which is not expecting to receive
      it, _even though_ TX timestamping is not enabled for gianfar.
      
      There have been discussions [0] as to whether non-MAC drivers need to
      set SKBTX_IN_PROGRESS at all (its purpose is to avoid sending 2
      timestamps, a sw and a hw one, to applications which only expect one).
      But as of this patch, there are at least 2 PTP drivers that would break
      in conjunction with gianfar: the sja1105 DSA switch and the felix
      switch, by way of its ocelot core driver.
      
      So regardless of that conclusion, fix the gianfar driver to not do stuff
      based on flags set by others and not intended for it.
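
      The failure mode can be sketched with a toy model. This is not the
      gianfar source: the Skb and MacDriver classes, the hwts_tx_en field and
      the flag values are assumptions made for illustration. The point is
      that a bit shared between stacked drivers cannot tell the MAC whether
      the timestamp request was its own, so the MAC has to consult its own
      state.

```python
# Toy model; names and flag values are illustrative assumptions.
SKBTX_HW_TSTAMP = 1 << 0     # a hardware timestamp was requested
SKBTX_IN_PROGRESS = 1 << 2   # some driver is handling that request


class Skb:
    def __init__(self, tx_flags=0):
        self.tx_flags = tx_flags


class MacDriver:
    def __init__(self, hwts_tx_en):
        self.hwts_tx_en = hwts_tx_en   # is TX timestamping enabled on the MAC?
        self.error_queue = []          # timestamps delivered to the socket

    def clean_tx_buggy(self, skb):
        # Trusts a bit that a stacked driver may have set.
        if skb.tx_flags & SKBTX_IN_PROGRESS:
            self.error_queue.append("mac-timestamp")

    def clean_tx_fixed(self, skb):
        # Only delivers when the MAC itself was asked to timestamp.
        if self.hwts_tx_en and (skb.tx_flags & SKBTX_HW_TSTAMP):
            self.error_queue.append("mac-timestamp")


# A stacked switch driver has taken the timestamp request for this skb.
skb = Skb(SKBTX_HW_TSTAMP | SKBTX_IN_PROGRESS)

buggy = MacDriver(hwts_tx_en=False)
buggy.clean_tx_buggy(skb)
fixed = MacDriver(hwts_tx_en=False)
fixed.clean_tx_fixed(skb)
print(buggy.error_queue, fixed.error_queue)
```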
      
      [0]: https://www.spinics.net/lists/netdev/msg619699.html
      
      Fixes: f0ee7acf ("gianfar: Add hardware TX timestamping support")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      195e54e6
    • Takashi Sakamoto's avatar
      ALSA: ctl: allow TLV read operation for callback type of element in locked case · 2dbae70b
      Takashi Sakamoto authored
      [ Upstream commit d61fe22c ]
      
      The design of the ALSA control core allows applications to execute three
      operations for the TLV feature: read, write and command. Furthermore, it
      allows driver developers to process the operations in two ways: an
      allocated array or a callback function. With the former, only the read
      operation is allowed, thus developers use the latter when the device
      driver supports a variety of models or the target model is expected to
      dynamically change the information stored in the TLV container.
      
      The core also allows applications to lock any element so that other
      applications can't perform write operations on the element's value and
      TLV information. When the element is locked, write and command
      operations for TLV information are prohibited, as are writes to the
      element value. Any read operation should be allowed in this case.
      
      At present, when an element has a callback function for TLV information,
      the TLV read operation returns EPERM if the element is locked. On the
      other hand, the read operation succeeds when an element has an allocated
      array for TLV information. In both cases, the read operation for the
      element value succeeds, as expected.
      
      This commit fixes the bug. This change can be backported to v4.14
      kernel or later.
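
      The intended rule can be written down as a small decision function. This
      is a simplified sketch, not the control core's code: the function name
      and the locked_by_other parameter are hypothetical, and per-element
      access flags are ignored.

```python
import errno

OP_READ, OP_WRITE, OP_CMD = "read", "write", "command"


def tlv_access(op, locked_by_other):
    """Return 0 if the TLV operation may proceed, -EPERM otherwise."""
    if op == OP_READ:
        # Reads never modify the element, so a lock must not block them.
        return 0
    if locked_by_other:
        # Write and command operations are refused while another
        # application holds the lock.
        return -errno.EPERM
    return 0


for op in (OP_READ, OP_WRITE, OP_CMD):
    print(op, tlv_access(op, locked_by_other=True))
```
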
      Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
      Reviewed-by: Jaroslav Kysela <perex@perex.cz>
      Link: https://lore.kernel.org/r/20191223093347.15279-1-o-takashi@sakamocchi.jp
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2dbae70b
    • Ritesh Harjani's avatar
      ext4: fix ext4_dax_read/write inode locking sequence for IOCB_NOWAIT · 428bb08a
      Ritesh Harjani authored
      [ Upstream commit f629afe3 ]
      
      Apparently our current rwsem code doesn't like doing the trylock, then
      lock for real scheme.  So change our dax read/write methods to just do the
      trylock for the RWF_NOWAIT case.
      This seems to fix an AIM7 regression of up to ~25% in some scalable
      filesystems in some cases, as claimed in commit 942491c9 ("xfs: fix AIM7 regression").
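
      The locking pattern described above can be sketched as follows. This is
      only an analogy: threading.Lock is an exclusive lock rather than a
      kernel rwsem, and dax_read is a hypothetical name. The non-blocking
      caller tries the lock once and reports EAGAIN on failure, while the
      blocking caller goes straight to a normal lock instead of trying first.

```python
import errno
import threading


def dax_read(inode_lock, nowait):
    """Take the lock according to the caller's blocking preference."""
    if nowait:
        # Non-blocking request: one attempt only, no fallback to waiting.
        if not inode_lock.acquire(blocking=False):
            return -errno.EAGAIN
    else:
        # Blocking request: wait for the lock directly.
        inode_lock.acquire()
    try:
        return 0  # the actual I/O would happen here
    finally:
        inode_lock.release()


lock = threading.Lock()
print(dax_read(lock, nowait=True))   # lock is free, so the read proceeds
lock.acquire()                       # simulate a concurrent holder
print(dax_read(lock, nowait=True))   # contended, so the caller is told to retry
lock.release()
```
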
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
      Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
      Link: https://lore.kernel.org/r/20191212055557.11151-2-riteshh@linux.ibm.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      428bb08a
    • Zahari Petkov's avatar
      leds: pca963x: Fix open-drain initialization · 44d748f2
      Zahari Petkov authored
      [ Upstream commit 69752909 ]
      
      Before commit bb29b9cc ("leds: pca963x: Add bindings to invert
      polarity") Mode register 2 was initialized directly with either 0x01
      or 0x05 for open-drain or totem pole (push-pull) configuration.
      
      Afterwards, MODE2 initialization started using bitwise operations on
      top of the default MODE2 register value (0x05). Using bitwise OR for
      setting OUTDRV with 0x01 and 0x05 does not produce correct results.
      When open-drain is used, instead of setting OUTDRV to 0, the driver
      keeps it as 1:
      
      Open-drain: 0x05 | 0x01 -> 0x05 (0b101 - incorrect)
      Totem pole: 0x05 | 0x05 -> 0x05 (0b101 - correct result, wrong operation)
      
      Now OUTDRV setting uses correct bitwise operations for initialization:
      
      Open-drain: 0x05 & ~0x04 -> 0x01 (0b001 - correct)
      Totem pole: 0x05 | 0x04 -> 0x05 (0b101 - correct)
      
      Additional MODE2 register definitions are introduced now as well.
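
      The arithmetic above can be checked directly. In this sketch the
      constant and function names are illustrative; only the values 0x05,
      0x01 and 0x04 come from the text. OUTDRV is taken to be bit 2 (0x04),
      as implied by the corrected operations.

```python
MODE2_DEFAULT = 0x05   # default MODE2 register value, per the text
OUTDRV = 0x04          # bit 2: set for totem pole, clear for open-drain


def mode2_buggy(open_drain):
    # OR-ing a full register value cannot clear a bit already set.
    return MODE2_DEFAULT | (0x01 if open_drain else 0x05)


def mode2_fixed(open_drain):
    if open_drain:
        return MODE2_DEFAULT & ~OUTDRV   # clear OUTDRV
    return MODE2_DEFAULT | OUTDRV        # set OUTDRV


for open_drain in (True, False):
    print(open_drain, bin(mode2_buggy(open_drain)), bin(mode2_fixed(open_drain)))
```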
      
      Fixes: bb29b9cc ("leds: pca963x: Add bindings to invert polarity")
      Signed-off-by: Zahari Petkov <zahari@balena.io>
      Signed-off-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      44d748f2
    • Dan Carpenter's avatar
      brcmfmac: Fix use after free in brcmf_sdio_readframes() · ead1cee8
      Dan Carpenter authored
      [ Upstream commit 216b4400 ]
      
      The brcmu_pkt_buf_free_skb() function frees "pkt" so it leads to a
      static checker warning:
      
          drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c:1974 brcmf_sdio_readframes()
          error: dereferencing freed memory 'pkt'
      
      It looks like there was supposed to be a continue after we free "pkt".
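
      The shape of the bug can be imitated in a short loop. This is a toy
      model with hypothetical names, not the brcmfmac code: once a packet has
      been freed, the loop has to move on to the next one rather than fall
      through to code that still uses it.

```python
class Pkt:
    def __init__(self, data, ok):
        self.data = data
        self.ok = ok


def free_pkt(pkt):
    # Stand-in for freeing the buffer: the contents are gone afterwards.
    pkt.data = None


def read_frames(pkts):
    delivered = []
    for pkt in pkts:
        if not pkt.ok:
            free_pkt(pkt)
            continue  # without this, len(pkt.data) below would use freed data
        delivered.append(len(pkt.data))
    return delivered


print(read_frames([Pkt(b"abc", True), Pkt(b"x", False), Pkt(b"hello", True)]))
```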
      
      Fixes: 4754fcee ("brcmfmac: streamline SDIO read frame routine")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Franky Lin <franky.lin@broadcom.com>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ead1cee8
    • Peter Zijlstra's avatar
      cpu/hotplug, stop_machine: Fix stop_machine vs hotplug order · b9dc4d61
      Peter Zijlstra authored
      [ Upstream commit 45178ac0 ]
      
      Paul reported a very sporadic, rcutorture induced, workqueue failure.
      When the planets align, the workqueue rescuer's self-migrate fails and
      then triggers a WARN for running a work on the wrong CPU.
      
      Tejun then figured that set_cpus_allowed_ptr()'s stop_one_cpu() call
      could be ignored! When stopper->enabled is false, stop_machine will
      insta complete the work, without actually doing the work. Worse, it
      will not WARN about this (we really should fix this).
      
      It turns out there is a small window where a freshly online'ed CPU is
      marked 'online' but doesn't yet have the stopper task running:
      
      	BP				AP
      
      	bringup_cpu()
      	  __cpu_up(cpu, idle)	 -->	start_secondary()
      					...
      					cpu_startup_entry()
      	  bringup_wait_for_ap()
      	    wait_for_ap_thread() <--	  cpuhp_online_idle()
      					  while (1)
      					    do_idle()
      
      					... available to run kthreads ...
      
      	    stop_machine_unpark()
      	      stopper->enable = true;
      
      Close this by moving the stop_machine_unpark() into
      cpuhp_online_idle(), such that the stopper thread is ready before we
      start the idle loop and schedule.
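
      The window can be modelled in a few lines. This is a deliberately
      simplified sketch with hypothetical names; it does not reflect the
      hotplug state machine, only the ordering problem: a stop request made
      while the stopper is disabled completes without running its work.

```python
class Cpu:
    def __init__(self):
        self.online = False
        self.stopper_enabled = False


def stop_one_cpu(cpu, work):
    """Run work on the stopper; silently skip it if the stopper is disabled."""
    if not cpu.stopper_enabled:
        return False
    work()
    return True


def bringup(order, on_online):
    """Apply bring-up steps in order; on_online models kthreads that may
    run as soon as the CPU is marked online."""
    cpu = Cpu()
    for step in order:
        if step == "online":
            cpu.online = True
            on_online(cpu)
        elif step == "unpark":
            cpu.stopper_enabled = True
    return cpu


def migrations_done(order):
    done = []
    bringup(order, lambda cpu: stop_one_cpu(cpu, lambda: done.append("migrated")))
    return done


print(migrations_done(["online", "unpark"]))  # old order: request is lost
print(migrations_done(["unpark", "online"]))  # new order: request is served
```
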
      Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
      Debugged-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: "Paul E. McKenney" <paulmck@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b9dc4d61
    • Paul Kocialkowski's avatar
      drm/gma500: Fixup fbdev stolen size usage evaluation · 5d358e7e
      Paul Kocialkowski authored
      [ Upstream commit fd1a5e52 ]
      
      psbfb_probe performs an evaluation of the required size from the stolen
      GTT memory, but gets it wrong in two distinct ways:
      - The resulting size must be page-size-aligned;
      - The size to allocate is derived from the surface dimensions, not the fb
        dimensions.
      
      When two connectors are connected with different modes, the smallest will
      be stored in the fb dimensions, but the size that needs to be allocated must
      match the largest (surface) dimensions. This is what is used in the actual
      allocation code.
      
      Fix this by correcting the evaluation to conform to the two points above.
      It allows correctly switching to 16bpp when one connector is e.g. 1920x1080
      and the other is 1024x768.
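
      A worked example of the two points, under stated assumptions: a 4096-byte
      page, a pitch equal to width times bytes per pixel (any extra pitch
      alignment the driver applies is ignored), and hypothetical function
      names.

```python
PAGE_SIZE = 4096  # assumed page size


def page_align(n):
    """Round n up to the next multiple of PAGE_SIZE."""
    return (n + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)


def stolen_size_needed(surface_width, surface_height, bpp):
    """Size to check against stolen memory, from the surface dimensions."""
    pitch = surface_width * ((bpp + 7) // 8)
    return page_align(pitch * surface_height)


# Two connectors: 1920x1080 (largest, i.e. the surface) and 1024x768 (fb).
print(stolen_size_needed(1920, 1080, 32))  # 8294400
print(stolen_size_needed(1920, 1080, 16))  # 4149248
print(stolen_size_needed(1024, 768, 32))   # 3145728, too small if used by mistake
```

      If the available stolen memory sits between the 16bpp and 32bpp figures,
      evaluating with the fb dimensions would wrongly suggest that 32bpp fits.
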
      Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
      Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191107153048.843881-1-paul.kocialkowski@bootlin.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5d358e7e
    • Sean Christopherson's avatar
      KVM: nVMX: Use correct root level for nested EPT shadow page tables · 2130de7d
      Sean Christopherson authored
      [ Upstream commit 148d735e ]
      
      Hardcode the EPT page-walk level for L2 to be 4 levels, as KVM's MMU
      currently also hardcodes the page walk level for nested EPT to be 4
      levels.  The L2 guest is all but guaranteed to soft hang on its first
      instruction when L1 is using EPT, as KVM will construct 4-level page
      tables and then tell hardware to use 5-level page tables.
      
      Fixes: 855feb67 ("KVM: MMU: Add 5 level EPT & Shadow page table support.")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2130de7d
    • Sasha Levin's avatar
      Revert "KVM: VMX: Add non-canonical check on writes to RTIT address MSRs" · 9c270ce3
      Sasha Levin authored
      This reverts commit 57211b73.
      
      This patch isn't needed on 4.19 and older.
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9c270ce3
    • Sasha Levin's avatar
      249387d7
    • Davide Caratti's avatar
      net/sched: flower: add missing validation of TCA_FLOWER_FLAGS · e2eb6f22
      Davide Caratti authored
      [ Upstream commit e2debf08 ]
      
      unlike other classifiers that can be offloaded (i.e. users can set flags
      like 'skip_hw' and 'skip_sw'), 'cls_flower' doesn't validate the size of
      netlink attribute 'TCA_FLOWER_FLAGS' provided by user: add a proper entry
      to fl_policy.
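
      What a policy entry buys can be shown with a minimal model of
      attribute validation. This is a sketch, not the kernel's netlink
      parser: attributes are a dict of raw payloads, the policy maps an
      attribute name to a minimum payload length, and the validate function
      is hypothetical.

```python
import errno

TCA_FLOWER_FLAGS = "TCA_FLOWER_FLAGS"
U32_LEN = 4  # the flags are read as a 32-bit value


def validate(attrs, policy):
    """Reject attributes shorter than the policy requires."""
    for attr_type, payload in attrs.items():
        min_len = policy.get(attr_type)
        if min_len is not None and len(payload) < min_len:
            return -errno.ERANGE
    return 0


policy_without_flags = {}
policy_with_flags = {TCA_FLOWER_FLAGS: U32_LEN}

# A user supplies a 1-byte payload where 4 bytes are expected.
short_attr = {TCA_FLOWER_FLAGS: b"\x01"}

print(validate(short_attr, policy_without_flags))  # accepted: nothing checks it
print(validate(short_attr, policy_with_flags))     # rejected
```

      Without the entry, later code that reads the attribute as a 32-bit
      value would look at bytes the user never supplied.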
      
      Fixes: 5b33f488 ("net/flower: Introduce hardware offload support")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e2eb6f22
    • Davide Caratti's avatar
      net/sched: matchall: add missing validation of TCA_MATCHALL_FLAGS · 6752ae60
      Davide Caratti authored
      [ Upstream commit 1afa3cc9 ]
      
      unlike other classifiers that can be offloaded (i.e. users can set flags
      like 'skip_hw' and 'skip_sw'), 'cls_matchall' doesn't validate the size
      of netlink attribute 'TCA_MATCHALL_FLAGS' provided by user: add a proper
      entry to mall_policy.
      
      Fixes: b87f7936 ("net/sched: Add match-all classifier hw offloading.")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6752ae60
    • Per Forlin's avatar
      net: dsa: tag_qca: Make sure there is headroom for tag · d1e0f10e
      Per Forlin authored
      [ Upstream commit 04fb9124 ]
      
      Passing tag size to skb_cow_head will make sure
      there is enough headroom for the tag data.
      This change does not introduce any overhead in case there
      is already available headroom for tag.
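
      The claim about overhead can be illustrated with a toy buffer model.
      The Skb class, cow_head, push_tag and the 2-byte tag length are
      assumptions for this sketch; the real helper also deals with shared
      headers, which is ignored here.

```python
QCA_HDR_LEN = 2  # assumed tag size for this sketch


class Skb:
    def __init__(self, headroom, data):
        self.headroom = headroom   # free bytes in front of the data
        self.data = data
        self.reallocations = 0


def cow_head(skb, needed):
    """Ensure at least `needed` bytes of headroom; reallocate only if short."""
    if skb.headroom >= needed:
        return
    skb.headroom = needed
    skb.reallocations += 1


def push_tag(skb, tag):
    cow_head(skb, len(tag))        # pass the tag size, not 0
    skb.headroom -= len(tag)
    skb.data = tag + skb.data


roomy = Skb(headroom=16, data=b"payload")
tight = Skb(headroom=0, data=b"payload")
for skb in (roomy, tight):
    push_tag(skb, b"\x80\x01")
    print(skb.reallocations, skb.headroom, skb.data)
```
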
      Signed-off-by: Per Forlin <perfn@axis.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d1e0f10e
    • Eric Dumazet's avatar
      net/smc: fix leak of kernel memory to user space · 421ab411
      Eric Dumazet authored
      [ Upstream commit 457fed77 ]
      
      As nlmsg_put() does not clear the memory that is reserved,
      it is the caller's responsibility to make sure all of this
      memory will be written, in order to not reveal prior content.
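
      The effect is easy to reproduce in miniature. In this sketch a reused
      bytearray stands in for the reserved message area; the 8-byte layout,
      the field names and build_msg are invented for illustration. Writing
      only the named fields leaves the padding bytes holding whatever was
      there before, while clearing the area first is one way to avoid that.

```python
import struct

MSG_LEN = 8  # invented layout: 1-byte family, 3 bytes padding, 4-byte inode


def build_msg(buf, family, inode, clear_first):
    """Fill a reserved area that, like the real one, is not pre-cleared."""
    if clear_first:
        buf[:MSG_LEN] = bytes(MSG_LEN)
    struct.pack_into("<B", buf, 0, family)   # byte 0
    # Bytes 1..3 are padding that no field write touches.
    struct.pack_into("<I", buf, 4, inode)    # bytes 4..7
    return bytes(buf[:MSG_LEN])


stale = bytearray(b"\xaa" * MSG_LEN)         # prior content of the buffer
leaky = build_msg(bytearray(stale), 43, 1234, clear_first=False)
clean = build_msg(bytearray(stale), 43, 1234, clear_first=True)
print(leaky.hex())
print(clean.hex())
```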
      
      While we are at it, we can provide the socket cookie even
      if clsock is not set.
      
      syzbot reported :
      
      BUG: KMSAN: uninit-value in __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
      BUG: KMSAN: uninit-value in __fswab32 include/uapi/linux/swab.h:59 [inline]
      BUG: KMSAN: uninit-value in __swab32p include/uapi/linux/swab.h:179 [inline]
      BUG: KMSAN: uninit-value in __be32_to_cpup include/uapi/linux/byteorder/little_endian.h:82 [inline]
      BUG: KMSAN: uninit-value in get_unaligned_be32 include/linux/unaligned/access_ok.h:30 [inline]
      BUG: KMSAN: uninit-value in ____bpf_skb_load_helper_32 net/core/filter.c:240 [inline]
      BUG: KMSAN: uninit-value in ____bpf_skb_load_helper_32_no_cache net/core/filter.c:255 [inline]
      BUG: KMSAN: uninit-value in bpf_skb_load_helper_32_no_cache+0x14a/0x390 net/core/filter.c:252
      CPU: 1 PID: 5262 Comm: syz-executor.5 Not tainted 5.5.0-rc5-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x220 lib/dump_stack.c:118
       kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
       __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
       __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
       __fswab32 include/uapi/linux/swab.h:59 [inline]
       __swab32p include/uapi/linux/swab.h:179 [inline]
       __be32_to_cpup include/uapi/linux/byteorder/little_endian.h:82 [inline]
       get_unaligned_be32 include/linux/unaligned/access_ok.h:30 [inline]
       ____bpf_skb_load_helper_32 net/core/filter.c:240 [inline]
       ____bpf_skb_load_helper_32_no_cache net/core/filter.c:255 [inline]
       bpf_skb_load_helper_32_no_cache+0x14a/0x390 net/core/filter.c:252
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline]
       kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127
       kmsan_kmalloc_large+0x73/0xc0 mm/kmsan/kmsan_hooks.c:128
       kmalloc_large_node_hook mm/slub.c:1406 [inline]
       kmalloc_large_node+0x282/0x2c0 mm/slub.c:3841
       __kmalloc_node_track_caller+0x44b/0x1200 mm/slub.c:4368
       __kmalloc_reserve net/core/skbuff.c:141 [inline]
       __alloc_skb+0x2fd/0xac0 net/core/skbuff.c:209
       alloc_skb include/linux/skbuff.h:1049 [inline]
       netlink_dump+0x44b/0x1ab0 net/netlink/af_netlink.c:2224
       __netlink_dump_start+0xbb2/0xcf0 net/netlink/af_netlink.c:2352
       netlink_dump_start include/linux/netlink.h:233 [inline]
       smc_diag_handler_dump+0x2ba/0x300 net/smc/smc_diag.c:242
       sock_diag_rcv_msg+0x211/0x610 net/core/sock_diag.c:256
       netlink_rcv_skb+0x451/0x650 net/netlink/af_netlink.c:2477
       sock_diag_rcv+0x63/0x80 net/core/sock_diag.c:275
       netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
       netlink_unicast+0xf9e/0x1100 net/netlink/af_netlink.c:1328
       netlink_sendmsg+0x1248/0x14d0 net/netlink/af_netlink.c:1917
       sock_sendmsg_nosec net/socket.c:639 [inline]
       sock_sendmsg net/socket.c:659 [inline]
       kernel_sendmsg+0x433/0x440 net/socket.c:679
       sock_no_sendpage+0x235/0x300 net/core/sock.c:2740
       kernel_sendpage net/socket.c:3776 [inline]
       sock_sendpage+0x1e1/0x2c0 net/socket.c:937
       pipe_to_sendpage+0x38c/0x4c0 fs/splice.c:458
       splice_from_pipe_feed fs/splice.c:512 [inline]
       __splice_from_pipe+0x539/0xed0 fs/splice.c:636
       splice_from_pipe fs/splice.c:671 [inline]
       generic_splice_sendpage+0x1d5/0x2d0 fs/splice.c:844
       do_splice_from fs/splice.c:863 [inline]
       do_splice fs/splice.c:1170 [inline]
       __do_sys_splice fs/splice.c:1447 [inline]
       __se_sys_splice+0x2380/0x3350 fs/splice.c:1427
       __x64_sys_splice+0x6e/0x90 fs/splice.c:1427
       do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: f16a7dd5 ("smc: netlink interface for SMC sockets")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      421ab411
    • Firo Yang's avatar
      enic: prevent waking up stopped tx queues over watchdog reset · 150f8c56
      Firo Yang authored
      [ Upstream commit 0f905225 ]
      
      In recent months, our customer reported several kernel crashes, all
      preceded by the following message:
      NETDEV WATCHDOG: eth2 (enic): transmit queue 0 timed out
      Error message of one of those crashes:
      BUG: unable to handle kernel paging request at ffffffffa007e090
      
      After analyzing several vmcores, I found that most of the crashes are
      caused by memory corruption, and all the corrupted memory areas
      are overwritten by data from network packets. Moreover, I also found
      that the tx queues were enabled over watchdog reset.
      
      After going through the source code, I found that in enic_stop(),
      the tx queues stopped by netif_tx_disable() could be woken up over
      a small time window between netif_tx_disable() and the
      napi_disable() by the following code path:
      napi_poll->
        enic_poll_msix_wq->
           vnic_cq_service->
              enic_wq_service->
                 netif_wake_subqueue(enic->netdev, q_number)->
                    test_and_clear_bit(__QUEUE_STATE_DRV_XOFF, &txq->state)
      In turn, the upper network stack could queue an skb to the ENIC NIC through
      enic_hard_start_xmit(), which might introduce a race condition.
      
      Our customer confirmed that this kind of kernel crash has not occurred in
      over 90 days since they applied this patch.
      Signed-off-by: Firo Yang <firo.yang@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      150f8c56
    • Toke Høiland-Jørgensen's avatar
      core: Don't skip generic XDP program execution for cloned SKBs · ce754a31
      Toke Høiland-Jørgensen authored
      [ Upstream commit ad1e03b2 ]
      
      The current generic XDP handler skips execution of XDP programs entirely if
      an SKB is marked as cloned. This leads to some surprising behaviour, as
      packets can end up being cloned in various ways, which will make an XDP
      program not see all the traffic on an interface.
      
      This was discovered by a simple test case where an XDP program that always
      returns XDP_DROP is installed on a veth device. When combining this with
      the Scapy packet sniffer (which uses an AF_PACKET socket) on the sending
      side, SKBs reliably end up in the cloned state, causing them to be passed
      through to the receiving interface instead of being dropped. A minimal
      reproducer script for this is included below.
      
      This patch fixes the issue by simply triggering the existing linearisation
      code for cloned SKBs instead of skipping the XDP program execution. This
      behaviour is in line with the behaviour of the native XDP implementation
      for the veth driver, which will reallocate and copy the SKB data if the SKB
      is marked as shared.
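
      The change in control flow can be summarised with a toy model, separate
      from the reproducer. The Skb class and both handler functions are
      hypothetical simplifications: previously a cloned SKB bypassed the
      program entirely, whereas now it takes the same copy-then-run path as a
      non-linear SKB.

```python
XDP_PASS, XDP_DROP = "XDP_PASS", "XDP_DROP"


class Skb:
    def __init__(self, cloned=False, linear=True):
        self.cloned = cloned
        self.linear = linear


def linearise(skb):
    # Stand-in for copying the data so the program gets a private,
    # linear buffer to work on.
    skb.cloned = False
    skb.linear = True


def generic_xdp_old(skb, prog):
    if skb.cloned:
        return XDP_PASS            # program never runs
    if not skb.linear:
        linearise(skb)
    return prog(skb)


def generic_xdp_new(skb, prog):
    if skb.cloned or not skb.linear:
        linearise(skb)
    return prog(skb)


def drop_all(skb):
    return XDP_DROP


print(generic_xdp_old(Skb(cloned=True), drop_all))  # XDP_PASS
print(generic_xdp_new(Skb(cloned=True), drop_all))  # XDP_DROP
```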
      
      Reproducer Python script (requires BCC and Scapy):
      
      from scapy.all import TCP, IP, Ether, sendp, sniff, AsyncSniffer, Raw, UDP
      from bcc import BPF
      import time, sys, subprocess, shlex
      
      SKB_MODE = (1 << 1)
      DRV_MODE = (1 << 2)
      PYTHON=sys.executable
      
      def client():
          time.sleep(2)
          # Sniffing on the sender causes skb_cloned() to be set
          s = AsyncSniffer()
          s.start()
      
          for p in range(10):
              sendp(Ether(dst="aa:aa:aa:aa:aa:aa", src="cc:cc:cc:cc:cc:cc")/IP()/UDP()/Raw("Test"),
                    verbose=False)
              time.sleep(0.1)
      
          s.stop()
          return 0
      
      def server(mode):
          prog = BPF(text="int dummy_drop(struct xdp_md *ctx) {return XDP_DROP;}")
          func = prog.load_func("dummy_drop", BPF.XDP)
          prog.attach_xdp("a_to_b", func, mode)
      
          time.sleep(1)
      
          s = sniff(iface="a_to_b", count=10, timeout=15)
          if len(s):
              print(f"Got {len(s)} packets - should have gotten 0")
              return 1
          else:
              print("Got no packets - as expected")
              return 0
      
      if len(sys.argv) < 2:
          print(f"Usage: {sys.argv[0]} <skb|drv>")
          sys.exit(1)
      
      if sys.argv[1] == "client":
          sys.exit(client())
      elif sys.argv[1] == "server":
          mode = SKB_MODE if sys.argv[2] == 'skb' else DRV_MODE
          sys.exit(server(mode))
      else:
          try:
              mode = sys.argv[1]
              if mode not in ('skb', 'drv'):
                  print(f"Usage: {sys.argv[0]} <skb|drv>")
                  sys.exit(1)
              print(f"Running in {mode} mode")
      
              for cmd in [
                      'ip netns add netns_a',
                      'ip netns add netns_b',
                      'ip -n netns_a link add a_to_b type veth peer name b_to_a netns netns_b',
                      # Disable ipv6 to make sure there's no address autoconf traffic
                      'ip netns exec netns_a sysctl -qw net.ipv6.conf.a_to_b.disable_ipv6=1',
                      'ip netns exec netns_b sysctl -qw net.ipv6.conf.b_to_a.disable_ipv6=1',
                      'ip -n netns_a link set dev a_to_b address aa:aa:aa:aa:aa:aa',
                      'ip -n netns_b link set dev b_to_a address cc:cc:cc:cc:cc:cc',
                      'ip -n netns_a link set dev a_to_b up',
                      'ip -n netns_b link set dev b_to_a up']:
                  subprocess.check_call(shlex.split(cmd))
      
              server = subprocess.Popen(shlex.split(f"ip netns exec netns_a {PYTHON} {sys.argv[0]} server {mode}"))
              client = subprocess.Popen(shlex.split(f"ip netns exec netns_b {PYTHON} {sys.argv[0]} client"))
      
              client.wait()
              server.wait()
              sys.exit(server.returncode)
      
          finally:
              subprocess.run(shlex.split("ip netns delete netns_a"))
              subprocess.run(shlex.split("ip netns delete netns_b"))
      
      Fixes: d4455169 ("net: xdp: support xdp generic on virtual devices")
      Reported-by: Stepan Horacek <shoracek@redhat.com>
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ce754a31
  2. 19 Feb, 2020 5 commits