1. 12 Nov, 2020 3 commits
  2. 28 Oct, 2020 1 commit
  3. 27 Oct, 2020 7 commits
    • ARM: 9017/2: Enable KASan for ARM · 42101571
      Linus Walleij authored
      This patch enables the kernel address sanitizer for ARM. XIP_KERNEL
      has not been tested and is therefore not allowed for now.
      
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: kasan-dev@googlegroups.com
      Acked-by: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: Ard Biesheuvel <ardb@kernel.org> # QEMU/KVM/mach-virt/LPAE/8G
      Tested-by: Florian Fainelli <f.fainelli@gmail.com> # Brahma SoCs
      Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # i.MX6Q
      Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
    • ARM: 9016/2: Initialize the mapping of KASan shadow memory · 5615f69b
      Linus Walleij authored
      This patch initializes the KASan shadow region's page table and memory.
      There are two stages to KASan initialization:
      
      1. At the early boot stage the whole shadow region is mapped to just
         one physical page (kasan_zero_page). This is done by the function
         kasan_early_init(), which is called from __mmap_switched()
         (arch/arm/kernel/head-common.S).
      
      2. After paging_init has been called, we use kasan_zero_page as the
         zero shadow for memory that KASan does not need to track, and we
         allocate new shadow space for the memory that KASan does need to
         track. This is done by the function kasan_init, which is called
         from setup_arch.
      
      When using KASan we also need to increase the THREAD_SIZE_ORDER
      from 1 to 2, as the extra calls for shadow memory use quite a bit
      of stack.
      
      As we need to make a temporary copy of the PGD when setting up
      shadow memory we create a helpful PGD_SIZE definition for both
      LPAE and non-LPAE setups.
      
      The KASan core code unconditionally calls pud_populate() so this
      needs to be changed from BUG() to do {} while (0) when building
      with KASan enabled.
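
      For a non-LPAE (2-level) configuration the change is conceptually
      the following (an illustrative sketch of the stub, not the exact
      kernel diff):

      -       #define pud_populate(mm, pmd, pte)      BUG()
      +       /* KASan calls this unconditionally, make it a no-op */
      +       #define pud_populate(mm, pmd, pte)      do { } while (0)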
      
      After the initial development by Andrey Ryabinin several modifications
      have been made to this code:
      
      Abbott Liu <liuwenliang@huawei.com>:
      - Add support for ARM LPAE: if LPAE is enabled, the KASan shadow
        region's mapping table needs to be copied in the pgd_alloc()
        function.
      - Change kasan_pte_populate, kasan_pmd_populate, kasan_pud_populate
        and kasan_pgd_populate from the .meminit.text section to the
        .init.text section.
        Reported by Florian Fainelli <f.fainelli@gmail.com>
      
      Linus Walleij <linus.walleij@linaro.org>:
      - Drop the custom manipulation of TTBR0 and just use
        cpu_switch_mm() to switch the pgd table.
      - Adapt to handle 4th-level page table folding.
      - Rewrite the entire page directory and page entry initialization
        sequence to be recursive, based on ARM64's kasan_init.c.
      
      Ard Biesheuvel <ardb@kernel.org>:
      - Necessary underlying fixes.
      - Crucial bug fixes to the memory set-up code.
      Co-developed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Co-developed-by: Abbott Liu <liuwenliang@huawei.com>
      Co-developed-by: Ard Biesheuvel <ardb@kernel.org>
      
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: kasan-dev@googlegroups.com
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: Ard Biesheuvel <ardb@kernel.org> # QEMU/KVM/mach-virt/LPAE/8G
      Tested-by: Florian Fainelli <f.fainelli@gmail.com> # Brahma SoCs
      Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # i.MX6Q
      Reported-by: Russell King - ARM Linux <rmk+kernel@armlinux.org.uk>
      Reported-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
    • ARM: 9015/2: Define the virtual space of KASan's shadow region · c12366ba
      Linus Walleij authored
      Define KASAN_SHADOW_OFFSET, KASAN_SHADOW_START and KASAN_SHADOW_END for
      the Arm kernel address sanitizer. We are "stealing" lowmem (the 4GB
      addressable by a 32-bit architecture) out of the virtual address
      space to use as shadow memory for KASan as follows:
      
       +----+ 0xffffffff
       |    |
       |    | |-> Static kernel image (vmlinux) BSS and page table
       |    |/
       +----+ PAGE_OFFSET
       |    |
       |    | |->  Loadable kernel modules virtual address space area
       |    |/
       +----+ MODULES_VADDR = KASAN_SHADOW_END
       |    |
       |    | |-> The shadow area of kernel virtual address.
       |    |/
       +----+->  TASK_SIZE (start of kernel space) = KASAN_SHADOW_START,
       |    |    the shadow address of MODULES_VADDR
       |    | |
       |    | |
       |    | |-> The user space area in lowmem. The kernel address
       |    | |   sanitizer does not use this space, nor does it map it.
       |    | |
       |    | |
       |    | |
       |    | |
       |    |/
       ------ 0
      
      0 .. TASK_SIZE is the memory that can be used by shared
      userspace/kernelspace. It is used for userspace processes and for
      passing parameters and memory buffers in system calls etc. We do not
      need to shadow this area.
      
      KASAN_SHADOW_START:
       This value is the shadow address of MODULES_VADDR. It sits at the
       start of the kernel virtual space. Since we have modules to load, we
       need to cover that area with shadow memory as well, so we can find
       memory bugs in modules.
      
      KASAN_SHADOW_END:
       This value is the shadow address of 0x100000000: the mapping that
       would come right after the end of kernel memory at 0xffffffff. It is
       the end of the kernel address sanitizer's shadow area, and also the
       start of the module area.
      
      KASAN_SHADOW_OFFSET:
       This value is used to map an address to the corresponding shadow
       address by the following formula:
      
         shadow_addr = (address >> 3) + KASAN_SHADOW_OFFSET;
      
       As you would expect, >> 3 is equal to dividing by 8, meaning each
       byte in the shadow memory covers 8 bytes of kernel memory, so one
       bit of shadow memory is used per byte of kernel memory.
      
       The KASAN_SHADOW_OFFSET is provided in a Kconfig option depending
       on the VMSPLIT layout of the system: the kernel and userspace can
       split up lowmem in different ways according to needs, so we calculate
       the shadow offset depending on this.
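
       As a quick stand-alone illustration of this formula (plain
       user-space C, not the kernel's code; the value 0x9f000000 is the
       offset derived below for the default 3G/1G split):

         #include <stdio.h>

         #define KASAN_SHADOW_OFFSET 0x9f000000UL /* default 3G/1G split */

         static unsigned long mem_to_shadow(unsigned long addr)
         {
                 return (addr >> 3) + KASAN_SHADOW_OFFSET;
         }

         int main(void)
         {
                 /* MODULES_VADDR maps to the first shadow byte ... */
                 printf("%#lx -> %#lx\n", 0xbf000000UL,
                        mem_to_shadow(0xbf000000UL));
                 /* ... and the last kernel byte maps to the last one */
                 printf("%#lx -> %#lx\n", 0xffffffffUL,
                        mem_to_shadow(0xffffffffUL));
                 return 0;
         }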
      
      When KASan is enabled, the definition of TASK_SIZE is not an 8-bit
      rotated constant, so we need to modify the TASK_SIZE access code in
      the *.S files.
      
      The kernel and modules may use different amounts of memory,
      according to the VMSPLIT configuration, which in turn
      determines the PAGE_OFFSET.
      
      We use the following KASAN_SHADOW_OFFSETs depending on how the
      virtual memory is split up:
      
      - 0x1f000000 if we have 1G userspace / 3G kernelspace split:
        - The kernel address space is 3G (0xc0000000)
        - PAGE_OFFSET is then set to 0x40000000 so the kernel static
          image (vmlinux) uses addresses 0x40000000 .. 0xffffffff
        - On top of that we have the MODULES_VADDR which under
          the worst case (using ARM instructions) is
          PAGE_OFFSET - 16M (0x01000000) = 0x3f000000
          so the modules use addresses 0x3f000000 .. 0x3fffffff
        - So the addresses 0x3f000000 .. 0xffffffff need to be
          covered with shadow memory. That is 0xc1000000 bytes
          of memory.
        - 1/8 of that is needed for its shadow memory, so
          0x18200000 bytes of shadow memory is needed. We
          "steal" that from the remaining lowmem.
        - The KASAN_SHADOW_START becomes 0x26e00000, to
          KASAN_SHADOW_END at 0x3effffff.
        - Now we can calculate the KASAN_SHADOW_OFFSET for any
          kernel address as 0x3f000000 needs to map to the first
          byte of shadow memory and 0xffffffff needs to map to
          the last byte of shadow memory. Since:
          SHADOW_ADDR = (address >> 3) + KASAN_SHADOW_OFFSET
          0x26e00000 = (0x3f000000 >> 3) + KASAN_SHADOW_OFFSET
          KASAN_SHADOW_OFFSET = 0x26e00000 - (0x3f000000 >> 3)
          KASAN_SHADOW_OFFSET = 0x26e00000 - 0x07e00000
          KASAN_SHADOW_OFFSET = 0x1f000000
      
      - 0x5f000000 if we have 2G userspace / 2G kernelspace split:
        - The kernel space is 2G (0x80000000)
        - PAGE_OFFSET is set to 0x80000000 so the kernel static
          image uses 0x80000000 .. 0xffffffff.
        - On top of that we have the MODULES_VADDR which under
          the worst case (using ARM instructions) is
          PAGE_OFFSET - 16M (0x01000000) = 0x7f000000
          so the modules use addresses 0x7f000000 .. 0x7fffffff
        - So the addresses 0x7f000000 .. 0xffffffff need to be
          covered with shadow memory. That is 0x81000000 bytes
          of memory.
        - 1/8 of that is needed for its shadow memory, so
          0x10200000 bytes of shadow memory is needed. We
          "steal" that from the remaining lowmem.
        - The KASAN_SHADOW_START becomes 0x6ee00000, to
          KASAN_SHADOW_END at 0x7effffff.
        - Now we can calculate the KASAN_SHADOW_OFFSET for any
          kernel address as 0x7f000000 needs to map to the first
          byte of shadow memory and 0xffffffff needs to map to
          the last byte of shadow memory. Since:
          SHADOW_ADDR = (address >> 3) + KASAN_SHADOW_OFFSET
          0x6ee00000 = (0x7f000000 >> 3) + KASAN_SHADOW_OFFSET
          KASAN_SHADOW_OFFSET = 0x6ee00000 - (0x7f000000 >> 3)
          KASAN_SHADOW_OFFSET = 0x6ee00000 - 0x0fe00000
          KASAN_SHADOW_OFFSET = 0x5f000000
      
      - 0x9f000000 if we have 3G userspace / 1G kernelspace split,
        and this is the default split for ARM:
        - The kernel address space is 1GB (0x40000000)
        - PAGE_OFFSET is set to 0xc0000000 so the kernel static
          image uses 0xc0000000 .. 0xffffffff.
        - On top of that we have the MODULES_VADDR which under
          the worst case (using ARM instructions) is
          PAGE_OFFSET - 16M (0x01000000) = 0xbf000000
          so the modules use addresses 0xbf000000 .. 0xbfffffff
        - So the addresses 0xbf000000 .. 0xffffffff need to be
          covered with shadow memory. That is 0x41000000 bytes
          of memory.
        - 1/8 of that is needed for its shadow memory, so
          0x08200000 bytes of shadow memory is needed. We
          "steal" that from the remaining lowmem.
        - The KASAN_SHADOW_START becomes 0xb6e00000, to
          KASAN_SHADOW_END at 0xbeffffff.
        - Now we can calculate the KASAN_SHADOW_OFFSET for any
          kernel address as 0xbf000000 needs to map to the first
          byte of shadow memory and 0xffffffff needs to map to
          the last byte of shadow memory. Since:
          SHADOW_ADDR = (address >> 3) + KASAN_SHADOW_OFFSET
          0xb6e00000 = (0xbf000000 >> 3) + KASAN_SHADOW_OFFSET
          KASAN_SHADOW_OFFSET = 0xb6e00000 - (0xbf000000 >> 3)
          KASAN_SHADOW_OFFSET = 0xb6e00000 - 0x17e00000
          KASAN_SHADOW_OFFSET = 0x9f000000
      
      - 0x8f000000 if we have 3G userspace / 1G kernelspace with
        full 1 GB low memory (VMSPLIT_3G_OPT):
        - The kernel address space is 1GB (0x40000000)
        - PAGE_OFFSET is set to 0xb0000000 so the kernel static
          image uses 0xb0000000 .. 0xffffffff.
        - On top of that we have the MODULES_VADDR which under
          the worst case (using ARM instructions) is
          PAGE_OFFSET - 16M (0x01000000) = 0xaf000000
          so the modules use addresses 0xaf000000 .. 0xafffffff
        - So the addresses 0xaf000000 .. 0xffffffff need to be
          covered with shadow memory. That is 0x51000000 bytes
          of memory.
        - 1/8 of that is needed for its shadow memory, so
          0x0a200000 bytes of shadow memory is needed. We
          "steal" that from the remaining lowmem.
        - The KASAN_SHADOW_START becomes 0xa4e00000, to
          KASAN_SHADOW_END at 0xaeffffff.
        - Now we can calculate the KASAN_SHADOW_OFFSET for any
          kernel address as 0xaf000000 needs to map to the first
          byte of shadow memory and 0xffffffff needs to map to
          the last byte of shadow memory. Since:
          SHADOW_ADDR = (address >> 3) + KASAN_SHADOW_OFFSET
          0xa4e00000 = (0xaf000000 >> 3) + KASAN_SHADOW_OFFSET
          KASAN_SHADOW_OFFSET = 0xa4e00000 - (0xaf000000 >> 3)
          KASAN_SHADOW_OFFSET = 0xa4e00000 - 0x15e00000
          KASAN_SHADOW_OFFSET = 0x8f000000
      
      - The default value of 0xffffffff for KASAN_SHADOW_OFFSET
        is an error value. We should always match one of the
        above shadow offsets.
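
      The arithmetic above can be double-checked with a small stand-alone
      C program (a sketch for illustration; the constants are the ones
      from the list above):

        #include <stdio.h>

        int main(void)
        {
                /* PAGE_OFFSET for VMSPLIT_1G, _2G, _3G (default), _3G_OPT */
                unsigned long long page_offset[] = {
                        0x40000000, 0x80000000, 0xc0000000, 0xb0000000
                };

                for (int i = 0; i < 4; i++) {
                        /* worst case: modules start 16M below PAGE_OFFSET */
                        unsigned long long modules = page_offset[i] - 0x01000000;
                        /* 1 shadow byte covers 8 bytes, up to 0xffffffff */
                        unsigned long long size = (0x100000000ULL - modules) >> 3;
                        /* the shadow area ends where the module area begins */
                        unsigned long long start = modules - size;
                        unsigned long long offset = start - (modules >> 3);

                        printf("PAGE_OFFSET %#llx: shadow %#llx..%#llx, "
                               "KASAN_SHADOW_OFFSET %#llx\n",
                               page_offset[i], start, modules - 1, offset);
                }
                return 0;
        }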
      
      When we do this, TASK_SIZE will sometimes take odd values that do
      not fit into the immediate field of a mov assembly instruction.
      To account for this, we need to rewrite some assembly that uses
      TASK_SIZE, like this:
      
      -       mov     r1, #TASK_SIZE
      +       ldr     r1, =TASK_SIZE
      
      or
      
      -       cmp     r4, #TASK_SIZE
      +       ldr     r0, =TASK_SIZE
      +       cmp     r4, r0
      
      This is done to avoid an immediate #TASK_SIZE that would need to
      fit into a limited number of bits.
      
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: kasan-dev@googlegroups.com
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: Ard Biesheuvel <ardb@kernel.org> # QEMU/KVM/mach-virt/LPAE/8G
      Tested-by: Florian Fainelli <f.fainelli@gmail.com> # Brahma SoCs
      Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # i.MX6Q
      Reported-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
    • ARM: 9014/2: Replace string mem* functions for KASan · d6d51a96
      Linus Walleij authored
      Functions like memset()/memmove()/memcpy() do a lot of memory
      accesses.
      
      If a bad pointer is passed to one of these functions it is important
      to catch this. Compiler instrumentation cannot do this since these
      functions are written in assembly.
      
      KASan replaces these memory functions with instrumented variants.
      
      The original functions are declared as weak symbols so that
      the strong definitions in mm/kasan/kasan.c can replace them.
      
      The original functions have aliases with a '__' prefix in their
      name, so we can call the non-instrumented variant if needed.
      
      We must use __memcpy()/__memset() in place of memcpy()/memset()
      when we copy .data to RAM and when we clear .bss, because
      kasan_early_init cannot be called before the initialization of
      .data and .bss.
      
      For the kernel compression and EFI libstub's custom string
      libraries we need a special quirk: even if these are built
      without KASan enabled, they rely on the global headers for their
      custom string libraries, which means that e.g. memcpy()
      will be defined to __memcpy() and we get link failures.
      Since these implementations are written in C rather than
      assembly we use e.g. __alias(memcpy) to redirect any
      users back to the local implementation.
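
      The same idea in minimal stand-alone form (hypothetical my_memcpy
      names for illustration, not the kernel's actual code; build with
      gcc -fno-builtin):

        #include <stddef.h>
        #include <stdio.h>

        /* The raw, uninstrumented implementation; callable directly
         * where instrumentation must be avoided (cf. __memcpy above). */
        void *__my_memcpy(void *dst, const void *src, size_t n)
        {
                char *d = dst;
                const char *s = src;

                while (n--)
                        *d++ = *s++;
                return dst;
        }

        /* Weak alias: used as-is when no strong definition exists; a
         * strong instrumented my_memcpy elsewhere would override it at
         * link time, exactly how KASan overrides memcpy. */
        void *my_memcpy(void *dst, const void *src, size_t n)
                __attribute__((weak, alias("__my_memcpy")));

        int main(void)
        {
                char buf[6];

                my_memcpy(buf, "hello", 6);
                puts(buf);
                return 0;
        }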
      
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: kasan-dev@googlegroups.com
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: Ard Biesheuvel <ardb@kernel.org> # QEMU/KVM/mach-virt/LPAE/8G
      Tested-by: Florian Fainelli <f.fainelli@gmail.com> # Brahma SoCs
      Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # i.MX6Q
      Reported-by: Russell King - ARM Linux <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
      Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
    • ARM: 9013/2: Disable KASan instrumentation for some code · d5d44e7e
      Linus Walleij authored
      Disable instrumentation for arch/arm/boot/compressed/*
      since that code is executed before the kernel has even
      set up its mappings and is definitely out of scope for
      KASan.
      
      Disable instrumentation of arch/arm/vdso/* because that code
      is not linked with the kernel image, so the KASan management
      code would fail to link.
      
      Disable instrumentation of arch/arm/mm/physaddr.c. See commit
      ec6d06ef ("arm64: Add support for CONFIG_DEBUG_VIRTUAL")
      for more details.
      
      Disable the KASan check in the function unwind_pop_register()
      because it does not matter if KASan checks fail when
      unwind_pop_register() reads the stack memory of a task.
      
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: kasan-dev@googlegroups.com
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: Ard Biesheuvel <ardb@kernel.org> # QEMU/KVM/mach-virt/LPAE/8G
      Tested-by: Florian Fainelli <f.fainelli@gmail.com> # Brahma SoCs
      Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # i.MX6Q
      Reported-by: Florian Fainelli <f.fainelli@gmail.com>
      Reported-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
    • ARM: 9012/1: move device tree mapping out of linear region · 7a1be318
      Ard Biesheuvel authored
      On ARM, setting up the linear region is tricky, given the constraints
      around placement and alignment of the memblocks, and how the kernel
      itself as well as the DT are placed in physical memory.
      
      Let's simplify matters a bit, by moving the device tree mapping to the
      top of the address space, right between the end of the vmalloc region
      and the start of the fixmap region, and create a read-only mapping
      for it that is independent of the size of the linear region, and how it
      is organized.
      
      Since this region was formerly used as a guard region, which will now be
      populated fully on LPAE builds by this read-only mapping (which will
      still be able to function as a guard region for stray writes), bump the
      start of the [underutilized] fixmap region by 512 KB as well, to ensure
      that there is always a proper guard region here. Doing so still leaves
      ample room for the fixmap space, even with NR_CPUS set to its maximum
      value of 32.
      Tested-by: Linus Walleij <linus.walleij@linaro.org>
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
    • ARM: 9011/1: centralize phys-to-virt conversion of DT/ATAGS address · e9a2f8b5
      Ard Biesheuvel authored
      Before moving the DT mapping out of the linear region, let's prepare
      for this change by removing all the phys-to-virt translations of the
      __atags_pointer variable, and perform this translation only once at
      setup time.
      Tested-by: Linus Walleij <linus.walleij@linaro.org>
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Acked-by: Nicolas Pitre <nico@fluxnic.net>
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
  4. 25 Oct, 2020 17 commits
  5. 24 Oct, 2020 12 commits
    • Merge tag 'block-5.10-2020-10-24' of git://git.kernel.dk/linux-block · d7691390
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - NVMe pull request from Christoph
           - rdma error handling fixes (Chao Leng)
           - fc error handling and reconnect fixes (James Smart)
           - fix the qid displace when tracing ioctl command (Keith Busch)
           - don't use BLK_MQ_REQ_NOWAIT for passthru (Chaitanya Kulkarni)
           - fix MTDT for passthru (Logan Gunthorpe)
           - blacklist Write Same on more devices (Kai-Heng Feng)
           - fix an uninitialized work struct (zhenwei pi)"
      
       - lightnvm out-of-bounds fix (Colin)
      
       - SG allocation leak fix (Doug)
      
       - rnbd fixes (Gioh, Guoqing, Jack)
      
       - zone error translation fixes (Keith)
      
       - kerneldoc markup fix (Mauro)
      
       - zram lockdep fix (Peter)
      
       - Kill unused io_context members (Yufen)
      
       - NUMA memory allocation cleanup (Xianting)
      
       - NBD config wakeup fix (Xiubo)
      
      * tag 'block-5.10-2020-10-24' of git://git.kernel.dk/linux-block: (27 commits)
        block: blk-mq: fix a kernel-doc markup
        nvme-fc: shorten reconnect delay if possible for FC
        nvme-fc: wait for queues to freeze before calling update_hr_hw_queues
        nvme-fc: fix error loop in create_hw_io_queues
        nvme-fc: fix io timeout to abort I/O
        null_blk: use zone status for max active/open
        nvmet: don't use BLK_MQ_REQ_NOWAIT for passthru
        nvmet: cleanup nvmet_passthru_map_sg()
        nvmet: limit passthru MTDS by BIO_MAX_PAGES
        nvmet: fix uninitialized work for zero kato
        nvme-pci: disable Write Zeroes on Sandisk Skyhawk
        nvme: use queuedata for nvme_req_qid
        nvme-rdma: fix crash due to incorrect cqe
        nvme-rdma: fix crash when connect rejected
        block: remove unused members for io_context
        blk-mq: remove the calling of local_memory_node()
        zram: Fix __zram_bvec_{read,write}() locking order
        skd_main: remove unused including <linux/version.h>
        sgl_alloc_order: fix memory leak
        lightnvm: fix out-of-bounds write to array devices->info[]
        ...
    • Merge tag 'io_uring-5.10-2020-10-24' of git://git.kernel.dk/linux-block · af004187
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
      
       - fsize was missed in previous unification of work flags
      
       - Few fixes cleaning up the flags unification creds cases (Pavel)
      
       - Fix NUMA affinities for completely unplugged/replugged node for io-wq
      
       - Two fallout fixes from the set_fs changes. One local to io_uring, one
         for the splice entry point that io_uring uses.
      
       - Linked timeout fixes (Pavel)
      
       - Removal of ->flush() ->files work-around that we don't need anymore
         with referenced files (Pavel)
      
       - Various cleanups (Pavel)
      
      * tag 'io_uring-5.10-2020-10-24' of git://git.kernel.dk/linux-block:
        splice: change exported internal do_splice() helper to take kernel offset
        io_uring: make loop_rw_iter() use original user supplied pointers
        io_uring: remove req cancel in ->flush()
        io-wq: re-set NUMA node affinities if CPUs come online
        io_uring: don't reuse linked_timeout
        io_uring: unify fsize with def->work_flags
        io_uring: fix racy REQ_F_LINK_TIMEOUT clearing
        io_uring: do poll's hash_node init in common code
        io_uring: inline io_poll_task_handler()
        io_uring: remove extra ->file check in poll prep
        io_uring: make cached_cq_overflow non atomic_t
        io_uring: inline io_fail_links()
        io_uring: kill ref get/drop in personality init
        io_uring: flags-based creds init in queue
    • Merge tag 'libata-5.10-2020-10-24' of git://git.kernel.dk/linux-block · cb6b2897
      Linus Torvalds authored
      Pull libata fixes from Jens Axboe:
       "Two minor libata fixes:
      
         - Fix a DMA boundary mask regression for sata_rcar (Geert)
      
         - kerneldoc markup fix (Mauro)"
      
      * tag 'libata-5.10-2020-10-24' of git://git.kernel.dk/linux-block:
        ata: fix some kernel-doc markups
        ata: sata_rcar: Fix DMA boundary mask
    • Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 0eac1102
      Linus Torvalds authored
      Pull misc vfs updates from Al Viro:
       "Assorted stuff all over the place (the largest group here is
        Christoph's stat cleanups)"
      
      * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fs: remove KSTAT_QUERY_FLAGS
        fs: remove vfs_stat_set_lookup_flags
        fs: move vfs_fstatat out of line
        fs: implement vfs_stat and vfs_lstat in terms of vfs_fstatat
        fs: remove vfs_statx_fd
        fs: omfs: use kmemdup() rather than kmalloc+memcpy
        [PATCH] reduce boilerplate in fsid handling
        fs: Remove duplicated flag O_NDELAY occurring twice in VALID_OPEN_FLAGS
        selftests: mount: add nosymfollow tests
        Add a "nosymfollow" mount option.
    • Merge tag 'dma-mapping-5.10-1' of git://git.infradead.org/users/hch/dma-mapping · 1b307ac8
      Linus Torvalds authored
      Pull dma-mapping fixes from Christoph Hellwig:
      
       - document the new dma_{alloc,free}_pages() API
      
       - two fixups for the dma-mapping.h split
      
      * tag 'dma-mapping-5.10-1' of git://git.infradead.org/users/hch/dma-mapping:
        dma-mapping: document dma_{alloc,free}_pages
        dma-mapping: move more functions to dma-map-ops.h
        ARM/sa1111: add a missing include of dma-map-ops.h
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 9bf8d8bc
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
       "Two fixes for this merge window, and an unrelated bugfix for a host
        hang"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: ioapic: break infinite recursion on lazy EOI
        KVM: vmx: rename pi_init to avoid conflict with paride
        KVM: x86/mmu: Avoid modulo operator on 64-bit value to fix i386 build
    • Merge tag 'x86_seves_fixes_for_v5.10_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c51ae124
      Linus Torvalds authored
      Pull x86 SEV-ES fixes from Borislav Petkov:
       "Three fixes to SEV-ES to correct setting up the new early pagetable on
        5-level paging machines, to always map boot_params and the kernel
        cmdline, and disable stack protector for ../compressed/head{32,64}.c.
        (Arvind Sankar)"
      
      * tag 'x86_seves_fixes_for_v5.10_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/boot/64: Explicitly map boot_params and command line
        x86/head/64: Disable stack protection for head$(BITS).o
        x86/boot/64: Initialize 5-level paging variables earlier
    • random32: add a selftest for the prandom32 code · c6e169bc
      Willy Tarreau authored
      Given that this code is new, let's add a selftest for it as well.
      It doesn't rely on fixed sets; instead, it picks 1024 numbers and
      verifies that they're not more correlated than desired.
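
      A flavor of the idea in stand-alone form (an illustrative per-bit
      bias check using libc rand(); much simpler than the actual selftest):

        #include <stdio.h>
        #include <stdlib.h>

        int main(void)
        {
                int ones[32] = { 0 };

                srand(1);
                for (int i = 0; i < 1024; i++) {
                        /* combine two draws so all 32 bits vary */
                        unsigned int x = ((unsigned int)rand() << 16) ^
                                         (unsigned int)rand();
                        for (int b = 0; b < 32; b++)
                                if (x & (1u << b))
                                        ones[b]++;
                }
                /* expect ~512 set bits per position; flag strong bias */
                for (int b = 0; b < 32; b++)
                        if (ones[b] < 384 || ones[b] > 640)
                                printf("bit %d biased: %d/1024\n", b, ones[b]);
                return 0;
        }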
      
      Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
      Cc: George Spelvin <lkml@sdf.org>
      Cc: Amit Klein <aksecurity@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: tytso@mit.edu
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Marc Plumb <lkml.mplumb@gmail.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • random32: add noise from network and scheduling activity · 3744741a
      Willy Tarreau authored
      With the removal of the interrupt perturbations in previous random32
      change (random32: make prandom_u32() output unpredictable), the PRNG
      has become 100% deterministic again. While SipHash is expected to be
      way more robust against brute force than the previous Tausworthe LFSR,
      there's still the risk that whoever has even one temporary access to
      the PRNG's internal state is able to predict all subsequent draws till
      the next reseed (roughly every minute). This may happen through a side
      channel attack or any data leak.
      
      This patch restores the spirit of commit f227e3ec ("random32: update
      the net random state on interrupt and activity") in that it will perturb
      the internal PRNG's state using externally collected noise, except that
      it will not pick that noise from the random pool's bits nor upon
      interrupt, but will rather combine a few elements along the Tx path
      that are collectively hard to predict, such as dev, skb and txq
      pointers, packet length and jiffies values. These ones are combined
      using a single round of SipHash into a single long variable that is
      mixed with the net_rand_state upon each invocation.
      
      The operation was inlined because it produces very small and efficient
      code, typically 3 xor, 2 add and 2 rol. The performance was measured
      to be the same (even very slightly better) than before the switch to
      SipHash; on a 6-core 12-thread Core i7-8700k equipped with a 40G NIC
      (i40e), the connection rate dropped from 556k/s to 555k/s while the
      SYN cookie rate grew from 5.38 Mpps to 5.45 Mpps.
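
      For reference, folding a few values through one round of the public
      SipHash permutation looks like this in stand-alone C (rotation
      constants from the SipHash specification; the input here is a
      placeholder, not the actual skb/dev/txq values):

        #include <stdint.h>
        #include <stdio.h>

        static uint64_t rol64(uint64_t x, int r)
        {
                return (x << r) | (x >> (64 - r));
        }

        /* one SIPROUND over the state v0..v3 */
        static void sipround(uint64_t v[4])
        {
                v[0] += v[1]; v[1] = rol64(v[1], 13); v[1] ^= v[0];
                v[0] = rol64(v[0], 32);
                v[2] += v[3]; v[3] = rol64(v[3], 16); v[3] ^= v[2];
                v[0] += v[3]; v[3] = rol64(v[3], 21); v[3] ^= v[0];
                v[2] += v[1]; v[1] = rol64(v[1], 17); v[1] ^= v[2];
                v[2] = rol64(v[2], 32);
        }

        int main(void)
        {
                /* SipHash initialization constants from the paper */
                uint64_t v[4] = { 0x736f6d6570736575ULL, 0x646f72616e646f6dULL,
                                  0x6c7967656e657261ULL, 0x7465646279746573ULL };
                uint64_t input = 0x1234abcdULL; /* placeholder noise word */

                v[3] ^= input;
                sipround(v);
                v[0] ^= input;
                printf("mixed: %#llx\n",
                       (unsigned long long)(v[0] ^ v[1] ^ v[2] ^ v[3]));
                return 0;
        }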
      
      Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
      Cc: George Spelvin <lkml@sdf.org>
      Cc: Amit Klein <aksecurity@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: tytso@mit.edu
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Marc Plumb <lkml.mplumb@gmail.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • random32: make prandom_u32() output unpredictable · c51f8f88
      George Spelvin authored
      Non-cryptographic PRNGs may have great statistical properties, but
      are usually trivially predictable to someone who knows the algorithm,
      given a small sample of their output.  An LFSR like prandom_u32() is
      particularly simple, even if the sample is widely scattered bits.
      
      It turns out the network stack uses prandom_u32() for some things like
      random port numbers which it would prefer are *not* trivially predictable.
      Predictability led to a practical DNS spoofing attack.  Oops.
      
      This patch replaces the LFSR with a homebrew cryptographic PRNG based
      on the SipHash round function, which is in turn seeded with 128 bits
      of strong random key.  (The authors of SipHash have *not* been consulted
      about this abuse of their algorithm.)  Speed is prioritized over security;
      attacks are rare, while performance is always wanted.
      
      Replacing all callers of prandom_u32() is the quick fix.
      Whether to reinstate a weaker PRNG for uses which can tolerate it
      is an open question.
      
      Commit f227e3ec ("random32: update the net random state on interrupt
      and activity") was an earlier attempt at a solution.  This patch replaces
      it.
      Reported-by: Amit Klein <aksecurity@gmail.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: tytso@mit.edu
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Marc Plumb <lkml.mplumb@gmail.com>
      Fixes: f227e3ec ("random32: update the net random state on interrupt and activity")
      Signed-off-by: George Spelvin <lkml@sdf.org>
      Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
      [ willy: partial reversal of f227e3ec; moved SIPROUND definitions
        to prandom.h for later use; merged George's prandom_seed() proposal;
        inlined siprand_u32(); replaced the net_rand_state[] array with 4
        members to fix a build issue; cosmetic cleanups to make checkpatch
        happy; fixed RANDOM32_SELFTEST build ]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • Merge tag 'powerpc-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · b6f96e75
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - A fix for undetected data corruption on Power9 Nimbus <= DD2.1 in the
         emulation of VSX loads. The affected CPUs were not widely available.
      
       - Two fixes for machine check handling in guests under PowerVM.
      
       - A fix for our recent changes to SMP setup, when
         CONFIG_CPUMASK_OFFSTACK=y.
      
       - Three fixes for races in the handling of some of our powernv sysfs
         attributes.
      
       - One change to remove TM from the set of Power10 CPU features.
      
       - A couple of other minor fixes.
      
      Thanks to: Aneesh Kumar K.V, Christophe Leroy, Ganesh Goudar, Jordan
      Niethe, Mahesh Salgaonkar, Michael Neuling, Oliver O'Halloran, Qian Cai,
      Srikar Dronamraju, Vasant Hegde.
      
      * tag 'powerpc-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/pseries: Avoid using addr_to_pfn in real mode
        powerpc/uaccess: Don't use "m<>" constraint with GCC 4.9
        powerpc/eeh: Fix eeh_dev_check_failure() for PE#0
        powerpc/64s: Remove TM from Power10 features
        selftests/powerpc: Make alignment handler test P9N DD2.1 vector CI load workaround
        powerpc: Fix undetected data corruption with P9N DD2.1 VSX CI load emulation
        powerpc/powernv/dump: Handle multiple writes to ack attribute
        powerpc/powernv/dump: Fix race while processing OPAL dump
        powerpc/smp: Use GFP_ATOMIC while allocating tmp mask
        powerpc/smp: Remove unnecessary variable
        powerpc/mce: Avoid nmi_enter/exit in real mode on pseries hash
        powerpc/opal_elog: Handle multiple writes to ack attribute
    • Merge tag 'riscv-for-linus-5.10-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 0593c1b4
      Linus Torvalds authored
      Pull more RISC-V updates from Palmer Dabbelt:
       "Just a single patch set: the remainder of Christoph's work to remove
        set_fs, including the RISC-V portion"
      
      * tag 'riscv-for-linus-5.10-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: remove address space overrides using set_fs()
        riscv: implement __get_kernel_nofault and __put_user_nofault
        riscv: refactor __get_user and __put_user
        riscv: use memcpy based uaccess for nommu again
        asm-generic: make the set_fs implementation optional
        asm-generic: add nommu implementations of __{get,put}_kernel_nofault
        asm-generic: improve the nommu {get,put}_user handling
        uaccess: provide a generic TASK_SIZE_MAX definition