1. 20 Jun, 2017 12 commits
    • Benjamin Marzinski's avatar
      dm space map metadata: fix 'struct sm_metadata' leak on failed create · 82559603
      Benjamin Marzinski authored
      commit 314c25c5 upstream.
      
      In dm_sm_metadata_create() we temporarily change the dm_space_map
      operations from 'ops' (whose .destroy function deallocates the
      sm_metadata) to 'bootstrap_ops' (whose .destroy function doesn't).
      
      If dm_sm_metadata_create() fails in sm_ll_new_metadata() or
      sm_ll_extend(), it exits back to dm_tm_create_internal(), which calls
      dm_sm_destroy() with the intention of freeing the sm_metadata, but it
      doesn't (because the dm_space_map operations is still set to
      'bootstrap_ops').
      
      Fix this by setting the dm_space_map operations back to 'ops' if
      dm_sm_metadata_create() fails when it is set to 'bootstrap_ops'.
      
      [js] no nr_blocks test in 3.12 yet
      Signed-off-by: default avatarBenjamin Marzinski <bmarzins@redhat.com>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      82559603
    • Ondrej Kozina's avatar
      dm crypt: mark key as invalid until properly loaded · 3044e195
      Ondrej Kozina authored
      commit 265e9098 upstream.
      
      In crypt_set_key(), if a failure occurs while replacing the old key
      (e.g. tfm->setkey() fails) the key must not have DM_CRYPT_KEY_VALID flag
      set.  Otherwise, the crypto layer would have an invalid key that still
      has DM_CRYPT_KEY_VALID flag set.
      Signed-off-by: default avatarOndrej Kozina <okozina@redhat.com>
      Reviewed-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      3044e195
    • Dan Williams's avatar
      block: fix del_gendisk() vs blkdev_ioctl crash · 1dd3d3e6
      Dan Williams authored
      commit ac34f15e upstream.
      
      When tearing down a block device early in its lifetime, userspace may
      still be performing discovery actions like blkdev_ioctl() to re-read
      partitions.
      
      The nvdimm_revalidate_disk() implementation depends on
      disk->driverfs_dev to be valid at entry.  However, it is set to NULL in
      del_gendisk() and fatally this is happening *before* the disk device is
      deleted from userspace view.
      
      There's no reason for del_gendisk() to clear ->driverfs_dev.  That
      device is the parent of the disk.  It is guaranteed to not be freed
      until the disk, as a child, drops its ->parent reference.
      
      We could also fix this issue locally in nvdimm_revalidate_disk() by
      using disk_to_dev(disk)->parent, but lets fix it globally since
      ->driverfs_dev follows the lifetime of the parent.  Longer term we
      should probably just add a @parent parameter to add_disk(), and stop
      carrying this pointer in the gendisk.
      
       BUG: unable to handle kernel NULL pointer dereference at           (null)
       IP: [<ffffffffa00340a8>] nvdimm_revalidate_disk+0x18/0x90 [libnvdimm]
       CPU: 2 PID: 538 Comm: systemd-udevd Tainted: G           O    4.4.0-rc5 #2257
       [..]
       Call Trace:
        [<ffffffff8143e5c7>] rescan_partitions+0x87/0x2c0
        [<ffffffff810f37f9>] ? __lock_is_held+0x49/0x70
        [<ffffffff81438c62>] __blkdev_reread_part+0x72/0xb0
        [<ffffffff81438cc5>] blkdev_reread_part+0x25/0x40
        [<ffffffff8143982d>] blkdev_ioctl+0x4fd/0x9c0
        [<ffffffff811246c9>] ? current_kernel_time64+0x69/0xd0
        [<ffffffff812916dd>] block_ioctl+0x3d/0x50
        [<ffffffff81264c38>] do_vfs_ioctl+0x308/0x560
        [<ffffffff8115dbd1>] ? __audit_syscall_entry+0xb1/0x100
        [<ffffffff810031d6>] ? do_audit_syscall_entry+0x66/0x70
        [<ffffffff81264f09>] SyS_ioctl+0x79/0x90
        [<ffffffff81902672>] entry_SYSCALL_64_fastpath+0x12/0x76
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@fb.com>
      Reported-by: default avatarRobert Hu <robert.hu@intel.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      1dd3d3e6
    • Mauricio Faria de Oliveira's avatar
      block: allow WRITE_SAME commands with the SG_IO ioctl · 5cb01741
      Mauricio Faria de Oliveira authored
      commit 25cdb645 upstream.
      
      The WRITE_SAME commands are not present in the blk_default_cmd_filter
      write_ok list, and thus are failed with -EPERM when the SG_IO ioctl()
      is executed without CAP_SYS_RAWIO capability (e.g., unprivileged users).
      [ sg_io() -> blk_fill_sghdr_rq() > blk_verify_command() -> -EPERM ]
      
      The problem can be reproduced with the sg_write_same command
      
        # sg_write_same --num 1 --xferlen 512 /dev/sda
        #
      
        # capsh --drop=cap_sys_rawio -- -c \
          'sg_write_same --num 1 --xferlen 512 /dev/sda'
          Write same: pass through os error: Operation not permitted
        #
      
      For comparison, the WRITE_VERIFY command does not observe this problem,
      since it is in that list:
      
        # capsh --drop=cap_sys_rawio -- -c \
          'sg_write_verify --num 1 --ilen 512 --lba 0 /dev/sda'
        #
      
      So, this patch adds the WRITE_SAME commands to the list, in order
      for the SG_IO ioctl to finish successfully:
      
        # capsh --drop=cap_sys_rawio -- -c \
          'sg_write_same --num 1 --xferlen 512 /dev/sda'
        #
      
      That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices
      (qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2]),
      which employs the SG_IO ioctl() and runs as an unprivileged user (libvirt-qemu).
      
      In that scenario, when a filesystem (e.g., ext4) performs its zero-out calls,
      which are translated to write-same calls in the guest kernel, and then into
      SG_IO ioctls to the host kernel, SCSI I/O errors may be observed in the guest:
      
        [...] sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
        [...] sd 0:0:0:0: [sda] tag#0 Sense Key : Aborted Command [current]
        [...] sd 0:0:0:0: [sda] tag#0 Add. Sense: I/O process terminated
        [...] sd 0:0:0:0: [sda] tag#0 CDB: Write Same(10) 41 00 01 04 e0 78 00 00 08 00
        [...] blk_update_request: I/O error, dev sda, sector 17096824
      
      Links:
      [1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52
      [2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device')
      Signed-off-by: default avatarMauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Signed-off-by: default avatarBrahadambal Srinivasan <latha@linux.vnet.ibm.com>
      Reported-by: default avatarManjunatha H R <manjuhr1@in.ibm.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSumit Semwal <sumit.semwal@linaro.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      5cb01741
    • Omar Sandoval's avatar
      block: fix use-after-free in sys_ioprio_get() · 0f3a4aaa
      Omar Sandoval authored
      commit 8ba86821 upstream.
      
      get_task_ioprio() accesses the task->io_context without holding the task
      lock and thus can race with exit_io_context(), leading to a
      use-after-free. The reproducer below hits this within a few seconds on
      my 4-core QEMU VM:
      
      int main(int argc, char **argv)
      {
      	pid_t pid, child;
      	long nproc, i;
      
      	/* ioprio_set(IOPRIO_WHO_PROCESS, 0, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0)); */
      	syscall(SYS_ioprio_set, 1, 0, 0x6000);
      
      	nproc = sysconf(_SC_NPROCESSORS_ONLN);
      
      	for (i = 0; i < nproc; i++) {
      		pid = fork();
      		assert(pid != -1);
      		if (pid == 0) {
      			for (;;) {
      				pid = fork();
      				assert(pid != -1);
      				if (pid == 0) {
      					_exit(0);
      				} else {
      					child = wait(NULL);
      					assert(child == pid);
      				}
      			}
      		}
      
      		pid = fork();
      		assert(pid != -1);
      		if (pid == 0) {
      			for (;;) {
      				/* ioprio_get(IOPRIO_WHO_PGRP, 0); */
      				syscall(SYS_ioprio_get, 2, 0);
      			}
      		}
      	}
      
      	for (;;) {
      		/* ioprio_get(IOPRIO_WHO_PGRP, 0); */
      		syscall(SYS_ioprio_get, 2, 0);
      	}
      
      	return 0;
      }
      
      This gets us KASAN dumps like this:
      
      [   35.526914] ==================================================================
      [   35.530009] BUG: KASAN: out-of-bounds in get_task_ioprio+0x7b/0x90 at addr ffff880066f34e6c
      [   35.530009] Read of size 2 by task ioprio-gpf/363
      [   35.530009] =============================================================================
      [   35.530009] BUG blkdev_ioc (Not tainted): kasan: bad access detected
      [   35.530009] -----------------------------------------------------------------------------
      
      [   35.530009] Disabling lock debugging due to kernel taint
      [   35.530009] INFO: Allocated in create_task_io_context+0x2b/0x370 age=0 cpu=0 pid=360
      [   35.530009] 	___slab_alloc+0x55d/0x5a0
      [   35.530009] 	__slab_alloc.isra.20+0x2b/0x40
      [   35.530009] 	kmem_cache_alloc_node+0x84/0x200
      [   35.530009] 	create_task_io_context+0x2b/0x370
      [   35.530009] 	get_task_io_context+0x92/0xb0
      [   35.530009] 	copy_process.part.8+0x5029/0x5660
      [   35.530009] 	_do_fork+0x155/0x7e0
      [   35.530009] 	SyS_clone+0x19/0x20
      [   35.530009] 	do_syscall_64+0x195/0x3a0
      [   35.530009] 	return_from_SYSCALL_64+0x0/0x6a
      [   35.530009] INFO: Freed in put_io_context+0xe7/0x120 age=0 cpu=0 pid=1060
      [   35.530009] 	__slab_free+0x27b/0x3d0
      [   35.530009] 	kmem_cache_free+0x1fb/0x220
      [   35.530009] 	put_io_context+0xe7/0x120
      [   35.530009] 	put_io_context_active+0x238/0x380
      [   35.530009] 	exit_io_context+0x66/0x80
      [   35.530009] 	do_exit+0x158e/0x2b90
      [   35.530009] 	do_group_exit+0xe5/0x2b0
      [   35.530009] 	SyS_exit_group+0x1d/0x20
      [   35.530009] 	entry_SYSCALL_64_fastpath+0x1a/0xa4
      [   35.530009] INFO: Slab 0xffffea00019bcd00 objects=20 used=4 fp=0xffff880066f34ff0 flags=0x1fffe0000004080
      [   35.530009] INFO: Object 0xffff880066f34e58 @offset=3672 fp=0x0000000000000001
      [   35.530009] ==================================================================
      
      Fix it by grabbing the task lock while we poke at the io_context.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Acked-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      0f3a4aaa
    • Daeho Jeong's avatar
      ext4: fix inode checksum calculation problem if i_extra_size is small · cd0d9254
      Daeho Jeong authored
      commit 05ac5aa1 upstream.
      
      We've fixed the race condition problem in calculating ext4 checksum
      value in commit b47820ed ("ext4: avoid modifying checksum fields
      directly during checksum veficationon"). However, by this change,
      when calculating the checksum value of inode whose i_extra_size is
      less than 4, we couldn't calculate the checksum value in a proper way.
      This problem was found and reported by Nix, Thank you.
      Reported-by: default avatarNix <nix@esperi.org.uk>
      Signed-off-by: default avatarDaeho Jeong <daeho.jeong@samsung.com>
      Signed-off-by: default avatarYoungjin Gil <youngjin.gil@samsung.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      cd0d9254
    • Theodore Ts'o's avatar
      ext4: return EROFS if device is r/o and journal replay is needed · 48a5889b
      Theodore Ts'o authored
      commit 4753d8a2 upstream.
      
      If the file system requires journal recovery, and the device is
      read-ony, return EROFS to the mount system call.  This allows xfstests
      generic/050 to pass.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      48a5889b
    • Theodore Ts'o's avatar
      ext4: preserve the needs_recovery flag when the journal is aborted · 399562b6
      Theodore Ts'o authored
      commit 97abd7d4 upstream.
      
      If the journal is aborted, the needs_recovery feature flag should not
      be removed.  Otherwise, it's the journal might not get replayed and
      this could lead to more data getting lost.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      399562b6
    • Jan Kara's avatar
      ext4: trim allocation requests to group size · 98f58e05
      Jan Kara authored
      commit cd648b8a upstream.
      
      If filesystem groups are artifically small (using parameter -g to
      mkfs.ext4), ext4_mb_normalize_request() can result in a request that is
      larger than a block group. Trim the request size to not confuse
      allocation code.
      Reported-by: default avatar"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      98f58e05
    • Theodore Ts'o's avatar
      ext4: fix fencepost in s_first_meta_bg validation · 77bd57e6
      Theodore Ts'o authored
      commit 2ba3e6e8 upstream.
      
      It is OK for s_first_meta_bg to be equal to the number of block group
      descriptor blocks.  (It rarely happens, but it shouldn't cause any
      problems.)
      
      https://bugzilla.kernel.org/show_bug.cgi?id=194567
      
      Fixes: 3a4b77cdSigned-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      77bd57e6
    • Theodore Ts'o's avatar
      jbd2: don't leak modified metadata buffers on an aborted journal · 45f1a95e
      Theodore Ts'o authored
      commit e112666b upstream.
      
      If the journal has been aborted, we shouldn't mark the underlying
      buffer head as dirty, since that will cause the metadata block to get
      modified.  And if the journal has been aborted, we shouldn't allow
      this since it will almost certainly lead to a corrupted file system.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      45f1a95e
    • Eryu Guan's avatar
      ext4: validate s_first_meta_bg at mount time · 188b2ebb
      Eryu Guan authored
      commit 3a4b77cd upstream.
      
      Ralf Spenneberg reported that he hit a kernel crash when mounting a
      modified ext4 image. And it turns out that kernel crashed when
      calculating fs overhead (ext4_calculate_overhead()), this is because
      the image has very large s_first_meta_bg (debug code shows it's
      842150400), and ext4 overruns the memory in count_overhead() when
      setting bitmap buffer, which is PAGE_SIZE.
      
      ext4_calculate_overhead():
        buf = get_zeroed_page(GFP_NOFS);  <=== PAGE_SIZE buffer
        blks = count_overhead(sb, i, buf);
      
      count_overhead():
        for (j = ext4_bg_num_gdb(sb, grp); j > 0; j--) { <=== j = 842150400
                ext4_set_bit(EXT4_B2C(sbi, s++), buf);   <=== buffer overrun
                count++;
        }
      
      This can be reproduced easily for me by this script:
      
        #!/bin/bash
        rm -f fs.img
        mkdir -p /mnt/ext4
        fallocate -l 16M fs.img
        mke2fs -t ext4 -O bigalloc,meta_bg,^resize_inode -F fs.img
        debugfs -w -R "ssv first_meta_bg 842150400" fs.img
        mount -o loop fs.img /mnt/ext4
      
      Fix it by validating s_first_meta_bg first at mount time, and
      refusing to mount if its value exceeds the largest possible meta_bg
      number.
      
      [js] use EXT4_HAS_INCOMPAT_FEATURE instead of new
           ext4_has_feature_meta_bg
      Reported-by: default avatarRalf Spenneberg <ralf@os-t.de>
      Signed-off-by: default avatarEryu Guan <guaneryu@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      188b2ebb
  2. 19 Jun, 2017 5 commits
    • Theodore Ts'o's avatar
      ext4: add sanity checking to count_overhead() · d61f4e22
      Theodore Ts'o authored
      commit c48ae41b upstream.
      
      The commit "ext4: sanity check the block and cluster size at mount
      time" should prevent any problems, but in case the superblock is
      modified while the file system is mounted, add an extra safety check
      to make sure we won't overrun the allocated buffer.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      d61f4e22
    • Theodore Ts'o's avatar
      ext4: fix in-superblock mount options processing · bd652ad1
      Theodore Ts'o authored
      commit 5aee0f8a upstream.
      
      Fix a large number of problems with how we handle mount options in the
      superblock.  For one, if the string in the superblock is long enough
      that it is not null terminated, we could run off the end of the string
      and try to interpret superblocks fields as characters.  It's unlikely
      this will cause a security problem, but it could result in an invalid
      parse.  Also, parse_options is destructive to the string, so in some
      cases if there is a comma-separated string, it would be modified in
      the superblock.  (Fortunately it only happens on file systems with a
      1k block size.)
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      bd652ad1
    • Theodore Ts'o's avatar
      ext4: use more strict checks for inodes_per_block on mount · 408d8245
      Theodore Ts'o authored
      commit cd6bb35b upstream.
      
      Centralize the checks for inodes_per_block and be more strict to make
      sure the inodes_per_block_group can't end up being zero.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      408d8245
    • Liu Bo's avatar
      Btrfs: fix memory leak in reading btree blocks · de714a8a
      Liu Bo authored
      commit 2571e739 upstream.
      
      So we can read a btree block via readahead or intentional read,
      and we can end up with a memory leak when something happens as
      follows,
      1) readahead starts to read block A but does not wait for read
         completion,
      2) btree_readpage_end_io_hook finds that block A is corrupted,
         and it needs to clear all block A's pages' uptodate bit.
      3) meanwhile an intentional read kicks in and checks block A's
         pages' uptodate to decide which page needs to be read.
      4) when some pages have the uptodate bit during 3)'s check so
         3) doesn't count them for eb->io_pages, but they are later
         cleared by 2) so we has to readpage on the page, we get
         the wrong eb->io_pages which results in a memory leak of
         this block.
      
      This fixes the problem by firstly getting all pages's locking and
      then checking pages' uptodate bit.
      
         t1(readahead)                              t2(readahead endio)                                       t3(the following read)
      read_extent_buffer_pages                    end_bio_extent_readpage
        for pg in eb:                                for page 0,1,2 in eb:
            if pg is uptodate:                           btree_readpage_end_io_hook(pg)
                num_reads++                              if uptodate:
        eb->io_pages = num_reads                             SetPageUptodate(pg)              _______________
        for pg in eb:                                for page 3 in eb:                                     read_extent_buffer_pages
             if pg is NOT uptodate:                      btree_readpage_end_io_hook(pg)                       for pg in eb:
                 __extent_read_full_page(pg)                 sanity check reports something wrong                 if pg is uptodate:
                                                             clear_extent_buffer_uptodate(eb)                         num_reads++
                                                                 for pg in eb:                                eb->io_pages = num_reads
                                                                     ClearPageUptodate(page)  _______________
                                                                                                              for pg in eb:
                                                                                                                  if pg is NOT uptodate:
                                                                                                                      __extent_read_full_page(pg)
      
      So t3's eb->io_pages is not consistent with the number of pages it's reading,
      and during endio(), atomic_dec_and_test(&eb->io_pages) will get a negative
      number so that we're not able to free the eb.
      Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      de714a8a
    • Jeff Mahoney's avatar
      Revert "Btrfs: don't delay inode ref updates during log, replay" · 0ee88216
      Jeff Mahoney authored
      commit 081fafdd upstream.
      
      This reverts commit 644d1071, upstream
      commit 6f896054.
      
      The original patch for mainline, 6f896054 (Btrfs: don't delay
      inode ref updates during log replay) lists 1d52c78a (Btrfs: try
      not to ENOSPC on log replay) as the only pre-3.18 dependency, but it
      also depends on 67de1176 (Btrfs: introduce the delayed inode ref
      deletion for the single link inode), which was introduced in 3.14
      and isn't in 3.12.y.
      
      The -stable commit added the check to btrfs_delayed_update_inode,
      which may look similar to btrfs_delayed_delete_inode_ref, but it's
      only superficial.  The tops of both functions handle typical
      delayed node boilerplate.  The upshot is that the patch is harmless
      since the caller already checks to see if we're doing log recovery,
      so we're not breaking anything.  It should be reverted because it
      makes it appear as if this issue was fixed for users who did
      backport 67de1176, when it is not.
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      0ee88216
  3. 15 Jun, 2017 1 commit
  4. 09 Jun, 2017 1 commit
  5. 07 Jun, 2017 21 commits
    • Willy Tarreau's avatar
      char: lp: fix possible integer overflow in lp_setup() · 66cb3240
      Willy Tarreau authored
      commit 3e21f4af upstream.
      
      The lp_setup() code doesn't apply any bounds checking when passing
      "lp=none", and only in this case, resulting in an overflow of the
      parport_nr[] array. All versions in Git history are affected.
      Reported-By: default avatarRoee Hay <roee.hay@hcl.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      66cb3240
    • Eric Dumazet's avatar
      dccp/tcp: do not inherit mc_list from parent · 2f954a62
      Eric Dumazet authored
      commit 657831ff upstream.
      
      syzkaller found a way to trigger double frees from ip_mc_drop_socket()
      
      It turns out that leave a copy of parent mc_list at accept() time,
      which is very bad.
      
      Very similar to commit 8b485ce6 ("tcp: do not inherit
      fastopen_req from parent")
      
      Initial report from Pray3r, completed by Andrey one.
      Thanks a lot to them !
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarPray3r <pray3r.z@gmail.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      2f954a62
    • Keno Fischer's avatar
      mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp · e64530fd
      Keno Fischer authored
      commit 8310d48b upstream.
      
      In commit 19be0eaf ("mm: remove gup_flags FOLL_WRITE games from
      __get_user_pages()"), the mm code was changed from unsetting FOLL_WRITE
      after a COW was resolved to setting the (newly introduced) FOLL_COW
      instead.  Simultaneously, the check in gup.c was updated to still allow
      writes with FOLL_FORCE set if FOLL_COW had also been set.
      
      However, a similar check in huge_memory.c was forgotten.  As a result,
      remote memory writes to ro regions of memory backed by transparent huge
      pages cause an infinite loop in the kernel (handle_mm_fault sets
      FOLL_COW and returns 0 causing a retry, but follow_trans_huge_pmd bails
      out immidiately because `(flags & FOLL_WRITE) && !pmd_write(*pmd)` is
      true.
      
      While in this state the process is stil SIGKILLable, but little else
      works (e.g.  no ptrace attach, no other signals).  This is easily
      reproduced with the following code (assuming thp are set to always):
      
          #include <assert.h>
          #include <fcntl.h>
          #include <stdint.h>
          #include <stdio.h>
          #include <string.h>
          #include <sys/mman.h>
          #include <sys/stat.h>
          #include <sys/types.h>
          #include <sys/wait.h>
          #include <unistd.h>
      
          #define TEST_SIZE 5 * 1024 * 1024
      
          int main(void) {
            int status;
            pid_t child;
            int fd = open("/proc/self/mem", O_RDWR);
            void *addr = mmap(NULL, TEST_SIZE, PROT_READ,
                              MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
            assert(addr != MAP_FAILED);
            pid_t parent_pid = getpid();
            if ((child = fork()) == 0) {
              void *addr2 = mmap(NULL, TEST_SIZE, PROT_READ | PROT_WRITE,
                                 MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
              assert(addr2 != MAP_FAILED);
              memset(addr2, 'a', TEST_SIZE);
              pwrite(fd, addr2, TEST_SIZE, (uintptr_t)addr);
              return 0;
            }
            assert(child == waitpid(child, &status, 0));
            assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
            return 0;
          }
      
      Fix this by updating follow_trans_huge_pmd in huge_memory.c analogously
      to the update in gup.c in the original commit.  The same pattern exists
      in follow_devmap_pmd.  However, we should not be able to reach that
      check with FOLL_COW set, so add WARN_ONCE to make sure we notice if we
      ever do.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/20170106015025.GA38411@juliacomputing.comSigned-off-by: default avatarKeno Fischer <keno@juliacomputing.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2:
       - Drop change to follow_devmap_pmd()
       - pmd_dirty() is not available; check the page flags as in
         can_follow_write_pte()
       - Adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      [mhocko:
        This has been forward ported from the 3.2 stable tree.
        And fixed to return NULL.]
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      e64530fd
    • Aleksa Sarai's avatar
      fs: exec: apply CLOEXEC before changing dumpable task flags · 43fe8188
      Aleksa Sarai authored
      commit 613cc2b6 upstream.
      
      If you have a process that has set itself to be non-dumpable, and it
      then undergoes exec(2), any CLOEXEC file descriptors it has open are
      "exposed" during a race window between the dumpable flags of the process
      being reset for exec(2) and CLOEXEC being applied to the file
      descriptors. This can be exploited by a process by attempting to access
      /proc/<pid>/fd/... during this window, without requiring CAP_SYS_PTRACE.
      
      The race in question is after set_dumpable has been (for get_link,
      though the trace is basically the same for readlink):
      
      [vfs]
      -> proc_pid_link_inode_operations.get_link
         -> proc_pid_get_link
            -> proc_fd_access_allowed
               -> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
      
      Which will return 0, during the race window and CLOEXEC file descriptors
      will still be open during this window because do_close_on_exec has not
      been called yet. As a result, the ordering of these calls should be
      reversed to avoid this race window.
      
      This is of particular concern to container runtimes, where joining a
      PID namespace with file descriptors referring to the host filesystem
      can result in security issues (since PRCTL_SET_DUMPABLE doesn't protect
      against access of CLOEXEC file descriptors -- file descriptors which may
      reference filesystem objects the container shouldn't have access to).
      
      Cc: dev@opencontainers.org
      Reported-by: default avatarMichael Crosby <crosbymichael@gmail.com>
      Signed-off-by: default avatarAleksa Sarai <asarai@suse.de>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      43fe8188
    • Dave Jones's avatar
      ipv6: handle -EFAULT from skb_copy_bits · 8676954b
      Dave Jones authored
      commit a98f9175 upstream.
      
      By setting certain socket options on ipv6 raw sockets, we can confuse the
      length calculation in rawv6_push_pending_frames triggering a BUG_ON.
      
      RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40
      RSP: 0018:ffff881f6c4a7c18  EFLAGS: 00010282
      RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
      RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
      RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
      R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
      R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80
      
      Call Trace:
       [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830
       [<ffffffff81772697>] inet_sendmsg+0x67/0xa0
       [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50
       [<ffffffff816d982f>] SYSC_sendto+0xef/0x170
       [<ffffffff816da27e>] SyS_sendto+0xe/0x10
       [<ffffffff81002910>] do_syscall_64+0x50/0xa0
       [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25
      
      Handle by jumping to the failure path if skb_copy_bits gets an EFAULT.
      
      Reproducer:
      
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/types.h>
      #include <sys/socket.h>
      #include <netinet/in.h>
      
      #define LEN 504
      
      int main(int argc, char* argv[])
      {
      	int fd;
      	int zero = 0;
      	char buf[LEN];
      
      	memset(buf, 0, LEN);
      
      	fd = socket(AF_INET6, SOCK_RAW, 7);
      
      	setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
      	setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);
      
      	sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
      }
      Signed-off-by: default avatarDave Jones <davej@codemonkey.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      8676954b
    • Alexander Popov's avatar
      tty: n_hdlc: get rid of racy n_hdlc.tbuf · 030e0b58
      Alexander Popov authored
      commit 82f2341c upstream.
      
      Currently N_HDLC line discipline uses a self-made singly linked list for
      data buffers and has n_hdlc.tbuf pointer for buffer retransmitting after
      an error.
      
      The commit be10eb75
      ("tty: n_hdlc add buffer flushing") introduced racy access to n_hdlc.tbuf.
      After tx error concurrent flush_tx_queue() and n_hdlc_send_frames() can put
      one data buffer to tx_free_buf_list twice. That causes double free in
      n_hdlc_release().
      
      Let's use standard kernel linked list and get rid of n_hdlc.tbuf:
      in case of tx error put current data buffer after the head of tx_buf_list.
      Signed-off-by: default avatarAlexander Popov <alex.popov@linux.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      030e0b58
    • Jiri Slaby's avatar
      TTY: n_hdlc, fix lockdep false positive · a9511e79
      Jiri Slaby authored
      commit e9b736d8 upstream.
      
      The class of 4 n_hdls buf locks is the same because a single function
      n_hdlc_buf_list_init is used to init all the locks. But since
      flush_tx_queue takes n_hdlc->tx_buf_list.spinlock and then calls
      n_hdlc_buf_put which takes n_hdlc->tx_free_buf_list.spinlock, lockdep
      emits a warning:
      =============================================
      [ INFO: possible recursive locking detected ]
      4.3.0-25.g91e30a7-default #1 Not tainted
      ---------------------------------------------
      a.out/1248 is trying to acquire lock:
       (&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fd020>] n_hdlc_buf_put+0x20/0x60 [n_hdlc]
      
      but task is already holding lock:
       (&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fdc07>] n_hdlc_tty_ioctl+0x127/0x1d0 [n_hdlc]
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&(&list->spinlock)->rlock);
        lock(&(&list->spinlock)->rlock);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      2 locks held by a.out/1248:
       #0:  (&tty->ldisc_sem){++++++}, at: [<ffffffff814c9eb0>] tty_ldisc_ref_wait+0x20/0x50
       #1:  (&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fdc07>] n_hdlc_tty_ioctl+0x127/0x1d0 [n_hdlc]
      ...
      Call Trace:
      ...
       [<ffffffff81738fd0>] _raw_spin_lock_irqsave+0x50/0x70
       [<ffffffffa01fd020>] n_hdlc_buf_put+0x20/0x60 [n_hdlc]
       [<ffffffffa01fdc24>] n_hdlc_tty_ioctl+0x144/0x1d0 [n_hdlc]
       [<ffffffff814c25c1>] tty_ioctl+0x3f1/0xe40
      ...
      
      Fix it by initializing the spin_locks separately. This removes also
      reduntand memset of a freshly kzallocated space.
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      a9511e79
    • David Hildenbrand's avatar
      KVM: kvm_io_bus_unregister_dev() should never fail · 211bcbcc
      David Hildenbrand authored
      commit 90db1043 upstream.
      
      No caller currently checks the return value of
      kvm_io_bus_unregister_dev(). This is evil, as all callers silently go on
      freeing their device. A stale reference will remain in the io_bus,
      getting at least used again, when the iobus gets teared down on
      kvm_destroy_vm() - leading to use after free errors.
      
      There is nothing the callers could do, except retrying over and over
      again.
      
      So let's simply remove the bus altogether, print an error and make
      sure no one can access this broken bus again (returning -ENOMEM on any
      attempt to access it).
      
      Fixes: e93f8a0f ("KVM: convert io_bus to SRCU")
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Reviewed-by: default avatarCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      [wt: no kvm_io_bus_read_cookie in 3.10, slightly different constructs]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      211bcbcc
    • Amos Kong's avatar
      kvm: exclude ioeventfd from counting kvm_io_range limit · d8abb8b8
      Amos Kong authored
      commit 6ea34c9b upstream.
      
      We can easily reach the 1000 limit by start VM with a couple
      hundred I/O devices (multifunction=on). The hardcode limit
      already been adjusted 3 times (6 ~ 200 ~ 300 ~ 1000).
      
      In userspace, we already have maximum file descriptor to
      limit ioeventfd count. But kvm_io_bus devices also are used
      for pit, pic, ioapic, coalesced_mmio. They couldn't be limited
      by maximum file descriptor.
      
      Currently only ioeventfds take too much kvm_io_bus devices,
      so just exclude it from counting kvm_io_range limit.
      
      Also fixed one indent issue in kvm_host.h
      Signed-off-by: default avatarAmos Kong <akong@redhat.com>
      Reviewed-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: default avatarGleb Natapov <gleb@redhat.com>
      [wt: next patch depends on this one]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      d8abb8b8
    • Peter Xu's avatar
      KVM: x86: clear bus pointer when destroyed · dd8db853
      Peter Xu authored
      commit df630b8c upstream.
      
      When releasing the bus, let's clear the bus pointers to mark it out. If
      any further device unregister happens on this bus, we know that we're
      done if we found the bus being released already.
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      dd8db853
    • Marcelo Ricardo Leitner's avatar
      sctp: deny peeloff operation on asocs with threads sleeping on it · 620d73a9
      Marcelo Ricardo Leitner authored
      commit dfcb9f4f upstream.
      
      commit 2dcab598 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
      attempted to avoid a BUG_ON call when the association being used for a
      sendmsg() is blocked waiting for more sndbuf and another thread did a
      peeloff operation on such asoc, moving it to another socket.
      
      As Ben Hutchings noticed, then in such case it would return without
      locking back the socket and would cause two unlocks in a row.
      
      Further analysis also revealed that it could allow a double free if the
      application managed to peeloff the asoc that is created during the
      sendmsg call, because then sctp_sendmsg() would try to free the asoc
      that was created only for that call.
      
      This patch takes another approach. It will deny the peeloff operation
      if there is a thread sleeping on the asoc, so this situation doesn't
      exist anymore. This avoids the issues described above and also honors
      the syscalls that are already being handled (it can be multiple sendmsg
      calls).
      
      Joint work with Xin Long.
      
      Fixes: 2dcab598 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
      Cc: Alexander Popov <alex.popov@linux.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      620d73a9
    • Marcelo Ricardo Leitner's avatar
      sctp: avoid BUG_ON on sctp_wait_for_sndbuf · 1f7cb732
      Marcelo Ricardo Leitner authored
      commit 2dcab598 upstream.
      
      Alexander Popov reported that an application may trigger a BUG_ON in
      sctp_wait_for_sndbuf if the socket tx buffer is full, a thread is
      waiting on it to queue more data and meanwhile another thread peels off
      the association being used by the first thread.
      
      This patch replaces the BUG_ON call with a proper error handling. It
      will return -EPIPE to the original sendmsg call, similarly to what would
      have been done if the association wasn't found in the first place.
      Acked-by: default avatarAlexander Popov <alex.popov@linux.com>
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Reviewed-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      1f7cb732
    • Li RongQing's avatar
      ipv6: fix the use of pcpu_tstats in ip6_tunnel · abcfcd04
      Li RongQing authored
      commit abb6013c upstream.
      
      when read/write the 64bit data, the correct lock should be hold.
      
      Fixes: 87b6d218 ("tunnel: implement 64 bits statistics")
      
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarLi RongQing <roy.qing.li@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      abcfcd04
    • Dan Carpenter's avatar
      ipv6: pointer math error in ip6_tnl_parse_tlv_enc_lim() · e246c098
      Dan Carpenter authored
      commit 63117f09 upstream.
      
      Casting is a high precedence operation but "off" and "i" are in terms of
      bytes so we need to have some parenthesis here.
      
      Fixes: fbfa743a ("ipv6: fix ip6_tnl_parse_tlv_enc_lim()")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      e246c098
    • Eric Dumazet's avatar
      ipv6: fix ip6_tnl_parse_tlv_enc_lim() · 72bbf335
      Eric Dumazet authored
      commit fbfa743a upstream.
      
      This function suffers from multiple issues.
      
      First one is that pskb_may_pull() may reallocate skb->head,
      so the 'raw' pointer needs either to be reloaded or not used at all.
      
      Second issue is that NEXTHDR_DEST handling does not validate
      that the options are present in skb->data, so we might read
      garbage or access non existent memory.
      
      With help from Willem de Bruijn.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov  <dvyukov@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      72bbf335
    • Takashi Iwai's avatar
      xc2028: Fix use-after-free bug properly · 3bdb1571
      Takashi Iwai authored
      commit 22a1e778 upstream.
      
      The commit 8dfbcc43 ("[media] xc2028: avoid use after free") tried
      to address the reported use-after-free by clearing the reference.
      
      However, it's clearing the wrong pointer; it sets NULL to
      priv->ctrl.fname, but it's anyway overwritten by the next line
      memcpy(&priv->ctrl, p, sizeof(priv->ctrl)).
      
      OTOH, the actual code accessing the freed string is the strcmp() call
      with priv->fname:
      	if (!firmware_name[0] && p->fname &&
      	    priv->fname && strcmp(p->fname, priv->fname))
      		free_firmware(priv);
      
      where priv->fname points to the previous file name, and this was
      already freed by kfree().
      
      For fixing the bug properly, this patch does the following:
      
      - Keep the copy of firmware file name in only priv->fname,
        priv->ctrl.fname isn't changed;
      - The allocation is done only when the firmware gets loaded;
      - The kfree() is called in free_firmware() commonly
      
      Fixes: commit 8dfbcc43 ('[media] xc2028: avoid use after free')
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      3bdb1571
    • Dan Carpenter's avatar
      xc2028: unlock on error in xc2028_set_config() · d9a854e5
      Dan Carpenter authored
      commit 210bd104 upstream.
      
      We have to unlock before returning -ENOMEM.
      
      Fixes: 8dfbcc43 ('[media] xc2028: avoid use after free')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      d9a854e5
    • Mauro Carvalho Chehab's avatar
      xc2028: avoid use after free · bb4884f2
      Mauro Carvalho Chehab authored
      commit 8dfbcc43 upstream.
      
      If struct xc2028_config is passed without a firmware name,
      the following trouble may happen:
      
      [11009.907205] xc2028 5-0061: type set to XCeive xc2028/xc3028 tuner
      [11009.907491] ==================================================================
      [11009.907750] BUG: KASAN: use-after-free in strcmp+0x96/0xb0 at addr ffff8803bd78ab40
      [11009.907992] Read of size 1 by task modprobe/28992
      [11009.907994] =============================================================================
      [11009.907997] BUG kmalloc-16 (Tainted: G        W      ): kasan: bad access detected
      [11009.907999] -----------------------------------------------------------------------------
      
      [11009.908008] INFO: Allocated in xhci_urb_enqueue+0x214/0x14c0 [xhci_hcd] age=0 cpu=3 pid=28992
      [11009.908012] 	___slab_alloc+0x581/0x5b0
      [11009.908014] 	__slab_alloc+0x51/0x90
      [11009.908017] 	__kmalloc+0x27b/0x350
      [11009.908022] 	xhci_urb_enqueue+0x214/0x14c0 [xhci_hcd]
      [11009.908026] 	usb_hcd_submit_urb+0x1e8/0x1c60
      [11009.908029] 	usb_submit_urb+0xb0e/0x1200
      [11009.908032] 	usb_serial_generic_write_start+0xb6/0x4c0
      [11009.908035] 	usb_serial_generic_write+0x92/0xc0
      [11009.908039] 	usb_console_write+0x38a/0x560
      [11009.908045] 	call_console_drivers.constprop.14+0x1ee/0x2c0
      [11009.908051] 	console_unlock+0x40d/0x900
      [11009.908056] 	vprintk_emit+0x4b4/0x830
      [11009.908061] 	vprintk_default+0x1f/0x30
      [11009.908064] 	printk+0x99/0xb5
      [11009.908067] 	kasan_report_error+0x10a/0x550
      [11009.908070] 	__asan_report_load1_noabort+0x43/0x50
      [11009.908074] INFO: Freed in xc2028_set_config+0x90/0x630 [tuner_xc2028] age=1 cpu=3 pid=28992
      [11009.908077] 	__slab_free+0x2ec/0x460
      [11009.908080] 	kfree+0x266/0x280
      [11009.908083] 	xc2028_set_config+0x90/0x630 [tuner_xc2028]
      [11009.908086] 	xc2028_attach+0x310/0x8a0 [tuner_xc2028]
      [11009.908090] 	em28xx_attach_xc3028.constprop.7+0x1f9/0x30d [em28xx_dvb]
      [11009.908094] 	em28xx_dvb_init.part.3+0x8e4/0x5cf4 [em28xx_dvb]
      [11009.908098] 	em28xx_dvb_init+0x81/0x8a [em28xx_dvb]
      [11009.908101] 	em28xx_register_extension+0xd9/0x190 [em28xx]
      [11009.908105] 	em28xx_dvb_register+0x10/0x1000 [em28xx_dvb]
      [11009.908108] 	do_one_initcall+0x141/0x300
      [11009.908111] 	do_init_module+0x1d0/0x5ad
      [11009.908114] 	load_module+0x6666/0x9ba0
      [11009.908117] 	SyS_finit_module+0x108/0x130
      [11009.908120] 	entry_SYSCALL_64_fastpath+0x16/0x76
      [11009.908123] INFO: Slab 0xffffea000ef5e280 objects=25 used=25 fp=0x          (null) flags=0x2ffff8000004080
      [11009.908126] INFO: Object 0xffff8803bd78ab40 @offset=2880 fp=0x0000000000000001
      
      [11009.908130] Bytes b4 ffff8803bd78ab30: 01 00 00 00 2a 07 00 00 9d 28 00 00 01 00 00 00  ....*....(......
      [11009.908133] Object ffff8803bd78ab40: 01 00 00 00 00 00 00 00 b0 1d c3 6a 00 88 ff ff  ...........j....
      [11009.908137] CPU: 3 PID: 28992 Comm: modprobe Tainted: G    B   W       4.5.0-rc1+ #43
      [11009.908140] Hardware name:                  /NUC5i7RYB, BIOS RYBDWi35.86A.0350.2015.0812.1722 08/12/2015
      [11009.908142]  ffff8803bd78a000 ffff8802c273f1b8 ffffffff81932007 ffff8803c6407a80
      [11009.908148]  ffff8802c273f1e8 ffffffff81556759 ffff8803c6407a80 ffffea000ef5e280
      [11009.908153]  ffff8803bd78ab40 dffffc0000000000 ffff8802c273f210 ffffffff8155ccb4
      [11009.908158] Call Trace:
      [11009.908162]  [<ffffffff81932007>] dump_stack+0x4b/0x64
      [11009.908165]  [<ffffffff81556759>] print_trailer+0xf9/0x150
      [11009.908168]  [<ffffffff8155ccb4>] object_err+0x34/0x40
      [11009.908171]  [<ffffffff8155f260>] kasan_report_error+0x230/0x550
      [11009.908175]  [<ffffffff81237d71>] ? trace_hardirqs_off_caller+0x21/0x290
      [11009.908179]  [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
      [11009.908182]  [<ffffffff8155f5c3>] __asan_report_load1_noabort+0x43/0x50
      [11009.908185]  [<ffffffff8155ea00>] ? __asan_register_globals+0x50/0xa0
      [11009.908189]  [<ffffffff8194cea6>] ? strcmp+0x96/0xb0
      [11009.908192]  [<ffffffff8194cea6>] strcmp+0x96/0xb0
      [11009.908196]  [<ffffffffa13ba4ac>] xc2028_set_config+0x15c/0x630 [tuner_xc2028]
      [11009.908200]  [<ffffffffa13bac90>] xc2028_attach+0x310/0x8a0 [tuner_xc2028]
      [11009.908203]  [<ffffffff8155ea78>] ? memset+0x28/0x30
      [11009.908206]  [<ffffffffa13ba980>] ? xc2028_set_config+0x630/0x630 [tuner_xc2028]
      [11009.908211]  [<ffffffffa157a59a>] em28xx_attach_xc3028.constprop.7+0x1f9/0x30d [em28xx_dvb]
      [11009.908215]  [<ffffffffa157aa2a>] ? em28xx_dvb_init.part.3+0x37c/0x5cf4 [em28xx_dvb]
      [11009.908219]  [<ffffffffa157a3a1>] ? hauppauge_hvr930c_init+0x487/0x487 [em28xx_dvb]
      [11009.908222]  [<ffffffffa01795ac>] ? lgdt330x_attach+0x1cc/0x370 [lgdt330x]
      [11009.908226]  [<ffffffffa01793e0>] ? i2c_read_demod_bytes.isra.2+0x210/0x210 [lgdt330x]
      [11009.908230]  [<ffffffff812e87d0>] ? ref_module.part.15+0x10/0x10
      [11009.908233]  [<ffffffff812e56e0>] ? module_assert_mutex_or_preempt+0x80/0x80
      [11009.908238]  [<ffffffffa157af92>] em28xx_dvb_init.part.3+0x8e4/0x5cf4 [em28xx_dvb]
      [11009.908242]  [<ffffffffa157a6ae>] ? em28xx_attach_xc3028.constprop.7+0x30d/0x30d [em28xx_dvb]
      [11009.908245]  [<ffffffff8195222d>] ? string+0x14d/0x1f0
      [11009.908249]  [<ffffffff8195381f>] ? symbol_string+0xff/0x1a0
      [11009.908253]  [<ffffffff81953720>] ? uuid_string+0x6f0/0x6f0
      [11009.908257]  [<ffffffff811a775e>] ? __kernel_text_address+0x7e/0xa0
      [11009.908260]  [<ffffffff8104b02f>] ? print_context_stack+0x7f/0xf0
      [11009.908264]  [<ffffffff812e9846>] ? __module_address+0xb6/0x360
      [11009.908268]  [<ffffffff8137fdc9>] ? is_ftrace_trampoline+0x99/0xe0
      [11009.908271]  [<ffffffff811a775e>] ? __kernel_text_address+0x7e/0xa0
      [11009.908275]  [<ffffffff81240a70>] ? debug_check_no_locks_freed+0x290/0x290
      [11009.908278]  [<ffffffff8104a24b>] ? dump_trace+0x11b/0x300
      [11009.908282]  [<ffffffffa13e8143>] ? em28xx_register_extension+0x23/0x190 [em28xx]
      [11009.908285]  [<ffffffff81237d71>] ? trace_hardirqs_off_caller+0x21/0x290
      [11009.908289]  [<ffffffff8123ff56>] ? trace_hardirqs_on_caller+0x16/0x590
      [11009.908292]  [<ffffffff812404dd>] ? trace_hardirqs_on+0xd/0x10
      [11009.908296]  [<ffffffffa13e8143>] ? em28xx_register_extension+0x23/0x190 [em28xx]
      [11009.908299]  [<ffffffff822dcbb0>] ? mutex_trylock+0x400/0x400
      [11009.908302]  [<ffffffff810021a1>] ? do_one_initcall+0x131/0x300
      [11009.908306]  [<ffffffff81296dc7>] ? call_rcu_sched+0x17/0x20
      [11009.908309]  [<ffffffff8159e708>] ? put_object+0x48/0x70
      [11009.908314]  [<ffffffffa1579f11>] em28xx_dvb_init+0x81/0x8a [em28xx_dvb]
      [11009.908317]  [<ffffffffa13e81f9>] em28xx_register_extension+0xd9/0x190 [em28xx]
      [11009.908320]  [<ffffffffa0150000>] ? 0xffffffffa0150000
      [11009.908324]  [<ffffffffa0150010>] em28xx_dvb_register+0x10/0x1000 [em28xx_dvb]
      [11009.908327]  [<ffffffff810021b1>] do_one_initcall+0x141/0x300
      [11009.908330]  [<ffffffff81002070>] ? try_to_run_init_process+0x40/0x40
      [11009.908333]  [<ffffffff8123ff56>] ? trace_hardirqs_on_caller+0x16/0x590
      [11009.908337]  [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
      [11009.908340]  [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
      [11009.908343]  [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
      [11009.908346]  [<ffffffff8155ea37>] ? __asan_register_globals+0x87/0xa0
      [11009.908350]  [<ffffffff8144da7b>] do_init_module+0x1d0/0x5ad
      [11009.908353]  [<ffffffff812f2626>] load_module+0x6666/0x9ba0
      [11009.908356]  [<ffffffff812e9c90>] ? symbol_put_addr+0x50/0x50
      [11009.908361]  [<ffffffffa1580037>] ? em28xx_dvb_init.part.3+0x5989/0x5cf4 [em28xx_dvb]
      [11009.908366]  [<ffffffff812ebfc0>] ? module_frob_arch_sections+0x20/0x20
      [11009.908369]  [<ffffffff815bc940>] ? open_exec+0x50/0x50
      [11009.908374]  [<ffffffff811671bb>] ? ns_capable+0x5b/0xd0
      [11009.908377]  [<ffffffff812f5e58>] SyS_finit_module+0x108/0x130
      [11009.908379]  [<ffffffff812f5d50>] ? SyS_init_module+0x1f0/0x1f0
      [11009.908383]  [<ffffffff81004044>] ? lockdep_sys_exit_thunk+0x12/0x14
      [11009.908394]  [<ffffffff822e6936>] entry_SYSCALL_64_fastpath+0x16/0x76
      [11009.908396] Memory state around the buggy address:
      [11009.908398]  ffff8803bd78aa00: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [11009.908401]  ffff8803bd78aa80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [11009.908403] >ffff8803bd78ab00: fc fc fc fc fc fc fc fc 00 00 fc fc fc fc fc fc
      [11009.908405]                                            ^
      [11009.908407]  ffff8803bd78ab80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [11009.908409]  ffff8803bd78ac00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [11009.908411] ==================================================================
      
      In order to avoid it, let's set the cached value of the firmware
      name to NULL after freeing it. While here, return an error if
      the memory allocation fails.
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      bb4884f2
    • Vitaly Kuznetsov's avatar
      Drivers: hv: avoid vfree() on crash · 9391c802
      Vitaly Kuznetsov authored
      commit a9f61ca7 upstream.
      
      When we crash from NMI context (e.g. after NMI injection from host when
      'sysctl -w kernel.unknown_nmi_panic=1' is set) we hit
      
          kernel BUG at mm/vmalloc.c:1530!
      
      as vfree() is denied. While the issue could be solved with in_nmi() check
      instead I opted for skipping vfree on all sorts of crashes to reduce the
      amount of work which can cause consequent crashes. We don't really need to
      free anything on crash.
      
      [js] no tsc and kexec in 3.12 yet
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: default avatarK. Y. Srinivasan <kys@microsoft.com>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      9391c802
    • Eric Dumazet's avatar
      can: Fix kernel panic at security_sock_rcv_skb · 801c8a02
      Eric Dumazet authored
      commit f1712c73 upstream.
      
      Zhang Yanmin reported crashes [1] and provided a patch adding a
      synchronize_rcu() call in can_rx_unregister()
      
      The main problem seems that the sockets themselves are not RCU
      protected.
      
      If CAN uses RCU for delivery, then sockets should be freed only after
      one RCU grace period.
      
      Recent kernels could use sock_set_flag(sk, SOCK_RCU_FREE), but let's
      ease stable backports with the following fix instead.
      
      [1]
      BUG: unable to handle kernel NULL pointer dereference at (null)
      IP: [<ffffffff81495e25>] selinux_socket_sock_rcv_skb+0x65/0x2a0
      
      Call Trace:
       <IRQ>
       [<ffffffff81485d8c>] security_sock_rcv_skb+0x4c/0x60
       [<ffffffff81d55771>] sk_filter+0x41/0x210
       [<ffffffff81d12913>] sock_queue_rcv_skb+0x53/0x3a0
       [<ffffffff81f0a2b3>] raw_rcv+0x2a3/0x3c0
       [<ffffffff81f06eab>] can_rcv_filter+0x12b/0x370
       [<ffffffff81f07af9>] can_receive+0xd9/0x120
       [<ffffffff81f07beb>] can_rcv+0xab/0x100
       [<ffffffff81d362ac>] __netif_receive_skb_core+0xd8c/0x11f0
       [<ffffffff81d36734>] __netif_receive_skb+0x24/0xb0
       [<ffffffff81d37f67>] process_backlog+0x127/0x280
       [<ffffffff81d36f7b>] net_rx_action+0x33b/0x4f0
       [<ffffffff810c88d4>] __do_softirq+0x184/0x440
       [<ffffffff81f9e86c>] do_softirq_own_stack+0x1c/0x30
       <EOI>
       [<ffffffff810c76fb>] do_softirq.part.18+0x3b/0x40
       [<ffffffff810c8bed>] do_softirq+0x1d/0x20
       [<ffffffff81d30085>] netif_rx_ni+0xe5/0x110
       [<ffffffff8199cc87>] slcan_receive_buf+0x507/0x520
       [<ffffffff8167ef7c>] flush_to_ldisc+0x21c/0x230
       [<ffffffff810e3baf>] process_one_work+0x24f/0x670
       [<ffffffff810e44ed>] worker_thread+0x9d/0x6f0
       [<ffffffff810e4450>] ? rescuer_thread+0x480/0x480
       [<ffffffff810ebafc>] kthread+0x12c/0x150
       [<ffffffff81f9ccef>] ret_from_fork+0x3f/0x70
      Reported-by: default avatarZhang Yanmin <yanmin.zhang@intel.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarOliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      801c8a02
    • Oliver O'Halloran's avatar
      mm/init: fix zone boundary creation · 414989d2
      Oliver O'Halloran authored
      commit 90cae1fe upstream.
      
      As a part of memory initialisation the architecture passes an array to
      free_area_init_nodes() which specifies the max PFN of each memory zone.
      This array is not necessarily monotonic (due to unused zones) so this
      array is parsed to build monotonic lists of the min and max PFN for each
      zone.  ZONE_MOVABLE is special cased here as its limits are managed by
      the mm subsystem rather than the architecture.  Unfortunately, this
      special casing is broken when ZONE_MOVABLE is the not the last zone in
      the zone list.  The core of the issue is:
      
      	if (i == ZONE_MOVABLE)
      		continue;
      	arch_zone_lowest_possible_pfn[i] =
      		arch_zone_highest_possible_pfn[i-1];
      
      As ZONE_MOVABLE is skipped the lowest_possible_pfn of the next zone will
      be set to zero.  This patch fixes this bug by adding explicitly tracking
      where the next zone should start rather than relying on the contents
      arch_zone_highest_possible_pfn[].
      
      Thie is low priority.  To get bitten by this you need to enable a zone
      that appears after ZONE_MOVABLE in the zone_type enum.  As far as I can
      tell this means running a kernel with ZONE_DEVICE or ZONE_CMA enabled,
      so I can't see this affecting too many people.
      
      I only noticed this because I've been fiddling with ZONE_DEVICE on
      powerpc and 4.6 broke my test kernel.  This bug, in conjunction with the
      changes in Taku Izumi's kernelcore=mirror patch (d91749c1) and
      powerpc being the odd architecture which initialises max_zone_pfn[] to
      ~0ul instead of 0 caused all of system memory to be placed into
      ZONE_DEVICE at boot, followed a panic since device memory cannot be used
      for kernel allocations.  I've already submitted a patch to fix the
      powerpc specific bits, but I figured this should be fixed too.
      
      Link: http://lkml.kernel.org/r/1462435033-15601-1-git-send-email-oohall@gmail.comSigned-off-by: default avatarOliver O'Halloran <oohall@gmail.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      414989d2