1. 26 Jan, 2017 34 commits
    • Ondrej Kozina's avatar
      dm crypt: mark key as invalid until properly loaded · 1a131486
      Ondrej Kozina authored
      commit 265e9098 upstream.
      
      In crypt_set_key(), if a failure occurs while replacing the old key
      (e.g. tfm->setkey() fails) the key must not have DM_CRYPT_KEY_VALID flag
      set.  Otherwise, the crypto layer would have an invalid key that still
      has DM_CRYPT_KEY_VALID flag set.
      Signed-off-by: default avatarOndrej Kozina <okozina@redhat.com>
      Reviewed-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      1a131486
    • Aleksa Sarai's avatar
      fs: exec: apply CLOEXEC before changing dumpable task flags · aae555fb
      Aleksa Sarai authored
      commit 613cc2b6 upstream.
      
      If you have a process that has set itself to be non-dumpable, and it
      then undergoes exec(2), any CLOEXEC file descriptors it has open are
      "exposed" during a race window between the dumpable flags of the process
      being reset for exec(2) and CLOEXEC being applied to the file
      descriptors. This can be exploited by a process by attempting to access
      /proc/<pid>/fd/... during this window, without requiring CAP_SYS_PTRACE.
      
      The race in question is after set_dumpable has been (for get_link,
      though the trace is basically the same for readlink):
      
      [vfs]
      -> proc_pid_link_inode_operations.get_link
         -> proc_pid_get_link
            -> proc_fd_access_allowed
               -> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
      
      Which will return 0, during the race window and CLOEXEC file descriptors
      will still be open during this window because do_close_on_exec has not
      been called yet. As a result, the ordering of these calls should be
      reversed to avoid this race window.
      
      This is of particular concern to container runtimes, where joining a
      PID namespace with file descriptors referring to the host filesystem
      can result in security issues (since PRCTL_SET_DUMPABLE doesn't protect
      against access of CLOEXEC file descriptors -- file descriptors which may
      reference filesystem objects the container shouldn't have access to).
      
      Cc: dev@opencontainers.org
      Reported-by: default avatarMichael Crosby <crosbymichael@gmail.com>
      Signed-off-by: default avatarAleksa Sarai <asarai@suse.de>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      aae555fb
    • Shaohua Li's avatar
      mm/vmscan.c: set correct defer count for shrinker · 9ba6fb6a
      Shaohua Li authored
      commit 5f33a080 upstream.
      
      Our system uses significantly more slab memory with memcg enabled with
      the latest kernel.  With 3.10 kernel, slab uses 2G memory, while with
      4.6 kernel, 6G memory is used.  The shrinker has problem.  Let's see we
      have two memcg for one shrinker.  In do_shrink_slab:
      
      1. Check cg1.  nr_deferred = 0, assume total_scan = 700.  batch size
         is 1024, then no memory is freed.  nr_deferred = 700
      
      2. Check cg2.  nr_deferred = 700.  Assume freeable = 20, then
         total_scan = 10 or 40.  Let's assume it's 10.  No memory is freed.
         nr_deferred = 10.
      
      The deferred share of cg1 is lost in this case.  kswapd will free no
      memory even run above steps again and again.
      
      The fix makes sure one memcg's deferred share isn't lost.
      
      Link: http://lkml.kernel.org/r/2414be961b5d25892060315fbb56bb19d81d0c07.1476227351.git.shli@fb.comSigned-off-by: default avatarShaohua Li <shli@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9ba6fb6a
    • Nicolai Stange's avatar
      f2fs: set ->owner for debugfs status file's file_operations · 5b0d12d3
      Nicolai Stange authored
      commit 05e6ea26 upstream.
      
      The struct file_operations instance serving the f2fs/status debugfs file
      lacks an initialization of its ->owner.
      
      This means that although that file might have been opened, the f2fs module
      can still get removed. Any further operation on that opened file, releasing
      included,  will cause accesses to unmapped memory.
      
      Indeed, Mike Marshall reported the following:
      
        BUG: unable to handle kernel paging request at ffffffffa0307430
        IP: [<ffffffff8132a224>] full_proxy_release+0x24/0x90
        <...>
        Call Trace:
         [] __fput+0xdf/0x1d0
         [] ____fput+0xe/0x10
         [] task_work_run+0x8e/0xc0
         [] do_exit+0x2ae/0xae0
         [] ? __audit_syscall_entry+0xae/0x100
         [] ? syscall_trace_enter+0x1ca/0x310
         [] do_group_exit+0x44/0xc0
         [] SyS_exit_group+0x14/0x20
         [] do_syscall_64+0x61/0x150
         [] entry_SYSCALL64_slow_path+0x25/0x25
        <...>
        ---[ end trace f22ae883fa3ea6b8 ]---
        Fixing recursive fault but reboot is needed!
      
      Fix this by initializing the f2fs/status file_operations' ->owner with
      THIS_MODULE.
      
      This will allow debugfs to grab a reference to the f2fs module upon any
      open on that file, thus preventing it from getting removed.
      
      Fixes: 902829aa ("f2fs: move proc files to debugfs")
      Reported-by: default avatarMike Marshall <hubcap@omnibond.com>
      Reported-by: default avatarMartin Brandenburg <martin@omnibond.com>
      Signed-off-by: default avatarNicolai Stange <nicstange@gmail.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      5b0d12d3
    • Dan Carpenter's avatar
      ext4: return -ENOMEM instead of success · e832a16a
      Dan Carpenter authored
      commit 578620f4 upstream.
      
      We should set the error code if kzalloc() fails.
      
      Fixes: 67cf5b09 ("ext4: add the basic function for inline data support")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      e832a16a
    • Darrick J. Wong's avatar
      ext4: reject inodes with negative size · 30256224
      Darrick J. Wong authored
      commit 7e6e1ef4 upstream.
      
      Don't load an inode with a negative size; this causes integer overflow
      problems in the VFS.
      
      [ Added EXT4_ERROR_INODE() to mark file system as corrupted. -TYT]
      
      js: use EIO for 3.12 instead of EFSCORRUPTED.
      
      Fixes: a48380f7 (ext4: rename i_dir_acl to i_size_high)
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      30256224
    • Theodore Ts'o's avatar
      ext4: add sanity checking to count_overhead() · 2b47f735
      Theodore Ts'o authored
      commit c48ae41b upstream.
      
      The commit "ext4: sanity check the block and cluster size at mount
      time" should prevent any problems, but in case the superblock is
      modified while the file system is mounted, add an extra safety check
      to make sure we won't overrun the allocated buffer.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      2b47f735
    • Theodore Ts'o's avatar
      ext4: fix in-superblock mount options processing · 399cf969
      Theodore Ts'o authored
      commit 5aee0f8a upstream.
      
      Fix a large number of problems with how we handle mount options in the
      superblock.  For one, if the string in the superblock is long enough
      that it is not null terminated, we could run off the end of the string
      and try to interpret superblocks fields as characters.  It's unlikely
      this will cause a security problem, but it could result in an invalid
      parse.  Also, parse_options is destructive to the string, so in some
      cases if there is a comma-separated string, it would be modified in
      the superblock.  (Fortunately it only happens on file systems with a
      1k block size.)
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      399cf969
    • Theodore Ts'o's avatar
      ext4: use more strict checks for inodes_per_block on mount · de4f994b
      Theodore Ts'o authored
      commit cd6bb35b upstream.
      
      Centralize the checks for inodes_per_block and be more strict to make
      sure the inodes_per_block_group can't end up being zero.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      de4f994b
    • Chandan Rajendra's avatar
      ext4: fix stack memory corruption with 64k block size · c4c0dbd2
      Chandan Rajendra authored
      commit 30a9d7af upstream.
      
      The number of 'counters' elements needed in 'struct sg' is
      super_block->s_blocksize_bits + 2. Presently we have 16 'counters'
      elements in the array. This is insufficient for block sizes >= 32k. In
      such cases the memcpy operation performed in ext4_mb_seq_groups_show()
      would cause stack memory corruption.
      
      Fixes: c9de560dSigned-off-by: default avatarChandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      c4c0dbd2
    • Chandan Rajendra's avatar
      ext4: fix mballoc breakage with 64k block size · a844e8a7
      Chandan Rajendra authored
      commit 69e43e8c upstream.
      
      'border' variable is set to a value of 2 times the block size of the
      underlying filesystem. With 64k block size, the resulting value won't
      fit into a 16-bit variable. Hence this commit changes the data type of
      'border' to 'unsigned int'.
      
      Fixes: c9de560dSigned-off-by: default avatarChandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      a844e8a7
    • Alex Porosanu's avatar
      crypto: caam - fix AEAD givenc descriptors · a8e872c9
      Alex Porosanu authored
      commit d128af17 upstream.
      
      The AEAD givenc descriptor relies on moving the IV through the
      output FIFO and then back to the CTX2 for authentication. The
      SEQ FIFO STORE could be scheduled before the data can be
      read from OFIFO, especially since the SEQ FIFO LOAD needs
      to wait for the SEQ FIFO LOAD SKIP to finish first. The
      SKIP takes more time when the input is SG than when it's
      a contiguous buffer. If the SEQ FIFO LOAD is not scheduled
      before the STORE, the DECO will hang waiting for data
      to be available in the OFIFO so it can be transferred to C2.
      In order to overcome this, first force transfer of IV to C2
      by starting the "cryptlen" transfer first and then starting to
      store data from OFIFO to the output buffer.
      
      Fixes: 1acebad3 ("crypto: caam - faster aead implementation")
      Signed-off-by: default avatarAlex Porosanu <alexandru.porosanu@nxp.com>
      Signed-off-by: default avatarHoria Geantă <horia.geanta@nxp.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      a8e872c9
    • NeilBrown's avatar
      block_dev: don't test bdev->bd_contains when it is not stable · 94b87438
      NeilBrown authored
      commit bcc7f5b4 upstream.
      
      bdev->bd_contains is not stable before calling __blkdev_get().
      When __blkdev_get() is called on a parition with ->bd_openers == 0
      it sets
        bdev->bd_contains = bdev;
      which is not correct for a partition.
      After a call to __blkdev_get() succeeds, ->bd_openers will be > 0
      and then ->bd_contains is stable.
      
      When FMODE_EXCL is used, blkdev_get() calls
         bd_start_claiming() ->  bd_prepare_to_claim() -> bd_may_claim()
      
      This call happens before __blkdev_get() is called, so ->bd_contains
      is not stable.  So bd_may_claim() cannot safely use ->bd_contains.
      It currently tries to use it, and this can lead to a BUG_ON().
      
      This happens when a whole device is already open with a bd_holder (in
      use by dm in my particular example) and two threads race to open a
      partition of that device for the first time, one opening with O_EXCL and
      one without.
      
      The thread that doesn't use O_EXCL gets through blkdev_get() to
      __blkdev_get(), gains the ->bd_mutex, and sets bdev->bd_contains = bdev;
      
      Immediately thereafter the other thread, using FMODE_EXCL, calls
      bd_start_claiming() from blkdev_get().  This should fail because the
      whole device has a holder, but because bdev->bd_contains == bdev
      bd_may_claim() incorrectly reports success.
      This thread continues and blocks on bd_mutex.
      
      The first thread then sets bdev->bd_contains correctly and drops the mutex.
      The thread using FMODE_EXCL then continues and when it calls bd_may_claim()
      again in:
      			BUG_ON(!bd_may_claim(bdev, whole, holder));
      The BUG_ON fires.
      
      Fix this by removing the dependency on ->bd_contains in
      bd_may_claim().  As bd_may_claim() has direct access to the whole
      device, it can simply test if the target bdev is the whole device.
      
      Fixes: 6b4517a7 ("block: implement bd_claiming and claiming block")
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      94b87438
    • Liu Bo's avatar
      Btrfs: fix memory leak in reading btree blocks · 3d83da25
      Liu Bo authored
      commit 2571e739 upstream.
      
      So we can read a btree block via readahead or intentional read,
      and we can end up with a memory leak when something happens as
      follows,
      1) readahead starts to read block A but does not wait for read
         completion,
      2) btree_readpage_end_io_hook finds that block A is corrupted,
         and it needs to clear all block A's pages' uptodate bit.
      3) meanwhile an intentional read kicks in and checks block A's
         pages' uptodate to decide which page needs to be read.
      4) when some pages have the uptodate bit during 3)'s check so
         3) doesn't count them for eb->io_pages, but they are later
         cleared by 2) so we has to readpage on the page, we get
         the wrong eb->io_pages which results in a memory leak of
         this block.
      
      This fixes the problem by firstly getting all pages's locking and
      then checking pages' uptodate bit.
      
         t1(readahead)                              t2(readahead endio)                                       t3(the following read)
      read_extent_buffer_pages                    end_bio_extent_readpage
        for pg in eb:                                for page 0,1,2 in eb:
            if pg is uptodate:                           btree_readpage_end_io_hook(pg)
                num_reads++                              if uptodate:
        eb->io_pages = num_reads                             SetPageUptodate(pg)              _______________
        for pg in eb:                                for page 3 in eb:                                     read_extent_buffer_pages
             if pg is NOT uptodate:                      btree_readpage_end_io_hook(pg)                       for pg in eb:
                 __extent_read_full_page(pg)                 sanity check reports something wrong                 if pg is uptodate:
                                                             clear_extent_buffer_uptodate(eb)                         num_reads++
                                                                 for pg in eb:                                eb->io_pages = num_reads
                                                                     ClearPageUptodate(page)  _______________
                                                                                                              for pg in eb:
                                                                                                                  if pg is NOT uptodate:
                                                                                                                      __extent_read_full_page(pg)
      
      So t3's eb->io_pages is not consistent with the number of pages it's reading,
      and during endio(), atomic_dec_and_test(&eb->io_pages) will get a negative
      number so that we're not able to free the eb.
      Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      3d83da25
    • Takashi Iwai's avatar
      ALSA: hda - Gate the mic jack on HP Z1 Gen3 AiO · 2628573b
      Takashi Iwai authored
      commit f73cd43a upstream.
      
      HP Z1 Gen3 AiO with Conexant codec doesn't give an unsolicited event
      to the headset mic pin upon the jack plugging, it reports only to the
      headphone pin.  It results in the missing mic switching.  Let's fix up
      by simply gating the jack event.
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      2628573b
    • Jussi Laako's avatar
      ALSA: hiface: Fix M2Tech hiFace driver sampling rate change · 72588bd4
      Jussi Laako authored
      commit 995c6a7f upstream.
      
      Sampling rate changes after first set one are not reflected to the
      hardware, while driver and ALSA think the rate has been changed.
      
      Fix the problem by properly stopping the interface at the beginning of
      prepare call, allowing new rate to be set to the hardware. This keeps
      the hardware in sync with the driver.
      Signed-off-by: default avatarJussi Laako <jussi@sonarnerd.net>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      72588bd4
    • Con Kolivas's avatar
      ALSA: usb-audio: Add QuickCam Communicate Deluxe/S7500 to volume_control_quirks · 5bf1a774
      Con Kolivas authored
      commit 82ffb6fc upstream.
      
      The Logitech QuickCam Communicate Deluxe/S7500 microphone fails with the
      following warning.
      
      [    6.778995] usb 2-1.2.2.2: Warning! Unlikely big volume range (=3072),
      cval->res is probably wrong.
      [    6.778996] usb 2-1.2.2.2: [5] FU [Mic Capture Volume] ch = 1, val =
      4608/7680/1
      
      Adding it to the list of devices in volume_control_quirks makes it work
      properly, fixing related typo.
      Signed-off-by: default avatarCon Kolivas <kernel@kolivas.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      5bf1a774
    • Alan Stern's avatar
      USB: UHCI: report non-PME wakeup signalling for Intel hardware · 1d22fa31
      Alan Stern authored
      commit ccdb6be9 upstream.
      
      The UHCI controllers in Intel chipsets rely on a platform-specific non-PME
      mechanism for wakeup signalling.  They can generate wakeup signals even
      though they don't support PME.
      
      We need to let the USB core know this so that it will enable runtime
      suspend for UHCI controllers.
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Acked-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      1d22fa31
    • Felipe Balbi's avatar
      usb: gadget: composite: correctly initialize ep->maxpacket · bc87609a
      Felipe Balbi authored
      commit e8f29bb7 upstream.
      
      usb_endpoint_maxp() returns wMaxPacketSize in its
      raw form. Without taking into consideration that it
      also contains other bits reserved for isochronous
      endpoints.
      
      This patch fixes one occasion where this is a
      problem by making sure that we initialize
      ep->maxpacket only with lower 10 bits of the value
      returned by usb_endpoint_maxp(). Note that seperate
      patches will be necessary to audit all call sites of
      usb_endpoint_maxp() and make sure that
      usb_endpoint_maxp() only returns lower 10 bits of
      wMaxPacketSize.
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      bc87609a
    • Mathias Nyman's avatar
      usb: hub: Fix auto-remount of safely removed or ejected USB-3 devices · 6cc18ebe
      Mathias Nyman authored
      commit 37be6676 upstream.
      
      USB-3 does not have any link state that will avoid negotiating a connection
      with a plugged-in cable but will signal the host when the cable is
      unplugged.
      
      For USB-3 we used to first set the link to Disabled, then to RxDdetect to
      be able to detect cable connects or disconnects. But in RxDetect the
      connected device is detected again and eventually enabled.
      
      Instead set the link into U3 and disable remote wakeups for the device.
      This is what Windows does, and what Alan Stern suggested.
      
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Acked-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      6cc18ebe
    • Nathaniel Quillin's avatar
      USB: cdc-acm: add device id for GW Instek AFG-125 · cdd9e42b
      Nathaniel Quillin authored
      commit 30121604 upstream.
      
      Add device-id entry for GW Instek AFG-125, which has a byte swapped
      bInterfaceSubClass (0x20).
      Signed-off-by: default avatarNathaniel Quillin <ndq@google.com>
      Acked-by: default avatarOliver Neukum <oneukum@suse.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      cdd9e42b
    • Johan Hovold's avatar
      USB: serial: kl5kusb105: fix open error path · 7d15019f
      Johan Hovold authored
      commit 6774d5f5 upstream.
      
      Kill urbs and disable read before returning from open on failure to
      retrieve the line state.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      7d15019f
    • Giuseppe Lippolis's avatar
      USB: serial: option: add dlink dwm-158 · 99362880
      Giuseppe Lippolis authored
      commit d8a12b71 upstream.
      
      Adding registration for 3G modem DWM-158 in usb-serial-option
      Signed-off-by: default avatarGiuseppe Lippolis <giu.lippolis@gmail.com>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      99362880
    • Daniele Palmas's avatar
      USB: serial: option: add support for Telit LE922A PIDs 0x1040, 0x1041 · 09f7a171
      Daniele Palmas authored
      commit 5b09eff0 upstream.
      
      This patch adds support for PIDs 0x1040, 0x1041 of Telit LE922A.
      
      Since the interface positions are the same than the ones used
      for other Telit compositions, previous defined blacklists are used.
      Signed-off-by: default avatarDaniele Palmas <dnlplm@gmail.com>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      09f7a171
    • Robbie Ko's avatar
      Btrfs: fix tree search logic when replaying directory entry deletes · 5b5c8f9f
      Robbie Ko authored
      commit 2a7bf53f upstream.
      
      If a log tree has a layout like the following:
      
      leaf N:
              ...
              item 240 key (282 DIR_LOG_ITEM 0) itemoff 8189 itemsize 8
                      dir log end 1275809046
      leaf N + 1:
              item 0 key (282 DIR_LOG_ITEM 3936149215) itemoff 16275 itemsize 8
                      dir log end 18446744073709551615
              ...
      
      When we pass the value 1275809046 + 1 as the parameter start_ret to the
      function tree-log.c:find_dir_range() (done by replay_dir_deletes()), we
      end up with path->slots[0] having the value 239 (points to the last item
      of leaf N, item 240). Because the dir log item in that position has an
      offset value smaller than *start_ret (1275809046 + 1) we need to move on
      to the next leaf, however the logic for that is wrong since it compares
      the current slot to the number of items in the leaf, which is smaller
      and therefore we don't lookup for the next leaf but instead we set the
      slot to point to an item that does not exist, at slot 240, and we later
      operate on that slot which has unexpected content or in the worst case
      can result in an invalid memory access (accessing beyond the last page
      of leaf N's extent buffer).
      
      So fix the logic that checks when we need to lookup at the next leaf
      by first incrementing the slot and only after to check if that slot
      is beyond the last item of the current leaf.
      Signed-off-by: default avatarRobbie Ko <robbieko@synology.com>
      Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
      Fixes: e02119d5 (Btrfs: Add a write ahead tree log to optimize synchronous operations)
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      [Modified changelog for clarity and correctness]
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      5b5c8f9f
    • Jeff Mahoney's avatar
      Revert "Btrfs: don't delay inode ref updates during log, replay" · 081fafdd
      Jeff Mahoney authored
      This reverts commit 644d1071, upstream
      commit 6f896054.
      
      The original patch for mainline, 6f896054 (Btrfs: don't delay
      inode ref updates during log replay) lists 1d52c78a (Btrfs: try
      not to ENOSPC on log replay) as the only pre-3.18 dependency, but it
      also depends on 67de1176 (Btrfs: introduce the delayed inode ref
      deletion for the single link inode), which was introduced in 3.14
      and isn't in 3.12.y.
      
      The -stable commit added the check to btrfs_delayed_update_inode,
      which may look similar to btrfs_delayed_delete_inode_ref, but it's
      only superficial.  The tops of both functions handle typical
      delayed node boilerplate.  The upshot is that the patch is harmless
      since the caller already checks to see if we're doing log recovery,
      so we're not breaking anything.  It should be reverted because it
      makes it appear as if this issue was fixed for users who did
      backport 67de1176, when it is not.
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      081fafdd
    • Michal Hocko's avatar
      hotplug: Make register and unregister notifier API symmetric · ae252fd8
      Michal Hocko authored
      commit 777c6e0d upstream.
      
      Yu Zhao has noticed that __unregister_cpu_notifier only unregisters its
      notifiers when HOTPLUG_CPU=y while the registration might succeed even
      when HOTPLUG_CPU=n if MODULE is enabled. This means that e.g. zswap
      might keep a stale notifier on the list on the manual clean up during
      the pool tear down and thus corrupt the list. Resulting in the following
      
      [  144.964346] BUG: unable to handle kernel paging request at ffff880658a2be78
      [  144.971337] IP: [<ffffffffa290b00b>] raw_notifier_chain_register+0x1b/0x40
      <snipped>
      [  145.122628] Call Trace:
      [  145.125086]  [<ffffffffa28e5cf8>] __register_cpu_notifier+0x18/0x20
      [  145.131350]  [<ffffffffa2a5dd73>] zswap_pool_create+0x273/0x400
      [  145.137268]  [<ffffffffa2a5e0fc>] __zswap_param_set+0x1fc/0x300
      [  145.143188]  [<ffffffffa2944c1d>] ? trace_hardirqs_on+0xd/0x10
      [  145.149018]  [<ffffffffa2908798>] ? kernel_param_lock+0x28/0x30
      [  145.154940]  [<ffffffffa2a3e8cf>] ? __might_fault+0x4f/0xa0
      [  145.160511]  [<ffffffffa2a5e237>] zswap_compressor_param_set+0x17/0x20
      [  145.167035]  [<ffffffffa2908d3c>] param_attr_store+0x5c/0xb0
      [  145.172694]  [<ffffffffa290848d>] module_attr_store+0x1d/0x30
      [  145.178443]  [<ffffffffa2b2b41f>] sysfs_kf_write+0x4f/0x70
      [  145.183925]  [<ffffffffa2b2a5b9>] kernfs_fop_write+0x149/0x180
      [  145.189761]  [<ffffffffa2a99248>] __vfs_write+0x18/0x40
      [  145.194982]  [<ffffffffa2a9a412>] vfs_write+0xb2/0x1a0
      [  145.200122]  [<ffffffffa2a9a732>] SyS_write+0x52/0xa0
      [  145.205177]  [<ffffffffa2ff4d97>] entry_SYSCALL_64_fastpath+0x12/0x17
      
      This can be even triggered manually by changing
      /sys/module/zswap/parameters/compressor multiple times.
      
      Fix this issue by making unregister APIs symmetric to the register so
      there are no surprises.
      
      [js] backport to 3.12
      
      Fixes: 47e627bc ("[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU")
      Reported-and-tested-by: default avatarYu Zhao <yuzhao@google.com>
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: linux-mm@kvack.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Link: http://lkml.kernel.org/r/20161207135438.4310-1-mhocko@kernel.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ae252fd8
    • Boris Brezillon's avatar
      m68k: Fix ndelay() macro · 13e993f8
      Boris Brezillon authored
      commit 7e251bb2 upstream.
      
      The current ndelay() macro definition has an extra semi-colon at the
      end of the line thus leading to a compilation error when ndelay is used
      in a conditional block without curly braces like this one:
      
      	if (cond)
      		ndelay(t);
      	else
      		...
      
      which, after the preprocessor pass gives:
      
      	if (cond)
      		m68k_ndelay(t);;
      	else
      		...
      
      thus leading to the following gcc error:
      
      	error: 'else' without a previous 'if'
      
      Remove this extra semi-colon.
      Signed-off-by: default avatarBoris Brezillon <boris.brezillon@free-electrons.com>
      Fixes: c8ee038b ("m68k: Implement ndelay() based on the existing udelay() logic")
      Signed-off-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      13e993f8
    • 추지호's avatar
      can: peak: fix bad memory access and free sequence · f1f12289
      추지호 authored
      commit b67d0dd7 upstream.
      
      Fix for bad memory access while disconnecting. netdev is freed before
      private data free, and dev is accessed after freeing netdev.
      
      This makes a slub problem, and it raise kernel oops with slub debugger
      config.
      Signed-off-by: default avatarJiho Chu <jiho.chu@samsung.com>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      f1f12289
    • Marc Kleine-Budde's avatar
      can: raw: raw_setsockopt: limit number of can_filter that can be set · 9f412aa5
      Marc Kleine-Budde authored
      commit 332b05ca upstream.
      
      This patch adds a check to limit the number of can_filters that can be
      set via setsockopt on CAN_RAW sockets. Otherwise allocations > MAX_ORDER
      are not prevented resulting in a warning.
      
      Reference: https://lkml.org/lkml/2016/12/2/230Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9f412aa5
    • Peter Zijlstra (Intel)'s avatar
      perf/x86: Fix full width counter, counter overflow · d9dba3a9
      Peter Zijlstra (Intel) authored
      commit 7f612a7f upstream.
      
      Lukasz reported that perf stat counters overflow handling is broken on KNL/SLM.
      
      Both these parts have full_width_write set, and that does indeed have
      a problem. In order to deal with counter wrap, we must sample the
      counter at at least half the counter period (see also the sampling
      theorem) such that we can unambiguously reconstruct the count.
      
      However commit:
      
        069e0c3c ("perf/x86/intel: Support full width counting")
      
      sets the sampling interval to the full period, not half.
      
      Fixing that exposes another issue, in that we must not sign extend the
      delta value when we shift it right; the counter cannot have
      decremented after all.
      
      With both these issues fixed, counter overflow functions correctly
      again.
      Reported-by: default avatarLukasz Odzioba <lukasz.odzioba@intel.com>
      Tested-by: default avatarLiang, Kan <kan.liang@intel.com>
      Tested-by: default avatarOdzioba, Lukasz <lukasz.odzioba@intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 069e0c3c ("perf/x86/intel: Support full width counting")
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      d9dba3a9
    • Thomas Gleixner's avatar
      locking/rtmutex: Use READ_ONCE() in rt_mutex_owner() · 15543c65
      Thomas Gleixner authored
      commit 1be5d4fa upstream.
      
      While debugging the rtmutex unlock vs. dequeue race Will suggested to use
      READ_ONCE() in rt_mutex_owner() as it might race against the
      cmpxchg_release() in unlock_rt_mutex_safe().
      
      Will: "It's a minor thing which will most likely not matter in practice"
      
      Careful search did not unearth an actual problem in todays code, but it's
      better to be safe than surprised.
      Suggested-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: David Daney <ddaney@caviumnetworks.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20161130210030.431379999@linutronix.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      15543c65
    • Thomas Gleixner's avatar
      locking/rtmutex: Prevent dequeue vs. unlock race · a0346c32
      Thomas Gleixner authored
      commit dbb26055 upstream.
      
      David reported a futex/rtmutex state corruption. It's caused by the
      following problem:
      
      CPU0		CPU1		CPU2
      
      l->owner=T1
      		rt_mutex_lock(l)
      		lock(l->wait_lock)
      		l->owner = T1 | HAS_WAITERS;
      		enqueue(T2)
      		boost()
      		  unlock(l->wait_lock)
      		schedule()
      
      				rt_mutex_lock(l)
      				lock(l->wait_lock)
      				l->owner = T1 | HAS_WAITERS;
      				enqueue(T3)
      				boost()
      				  unlock(l->wait_lock)
      				schedule()
      		signal(->T2)	signal(->T3)
      		lock(l->wait_lock)
      		dequeue(T2)
      		deboost()
      		  unlock(l->wait_lock)
      				lock(l->wait_lock)
      				dequeue(T3)
      				  ===> wait list is now empty
      				deboost()
      				 unlock(l->wait_lock)
      		lock(l->wait_lock)
      		fixup_rt_mutex_waiters()
      		  if (wait_list_empty(l)) {
      		    owner = l->owner & ~HAS_WAITERS;
      		    l->owner = owner
      		     ==> l->owner = T1
      		  }
      
      				lock(l->wait_lock)
      rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
      				  if (wait_list_empty(l)) {
      				    owner = l->owner & ~HAS_WAITERS;
      cmpxchg(l->owner, T1, NULL)
       ===> Success (l->owner = NULL)
      				    l->owner = owner
      				     ==> l->owner = T1
      				  }
      
      That means the problem is caused by fixup_rt_mutex_waiters() which does the
      RMW to clear the waiters bit unconditionally when there are no waiters in
      the rtmutexes rbtree.
      
      This can be fatal: A concurrent unlock can release the rtmutex in the
      fastpath because the waiters bit is not set. If the cmpxchg() gets in the
      middle of the RMW operation then the previous owner, which just unlocked
      the rtmutex is set as the owner again when the write takes place after the
      successfull cmpxchg().
      
      The solution is rather trivial: verify that the owner member of the rtmutex
      has the waiters bit set before clearing it. This does not require a
      cmpxchg() or other atomic operations because the waiters bit can only be
      set and cleared with the rtmutex wait_lock held. It's also safe against the
      fast path unlock attempt. The unlock attempt via cmpxchg() will either see
      the bit set and take the slowpath or see the bit cleared and release it
      atomically in the fastpath.
      
      It's remarkable that the test program provided by David triggers on ARM64
      and MIPS64 really quick, but it refuses to reproduce on x86-64, while the
      problem exists there as well. That refusal might explain that this got not
      discovered earlier despite the bug existing from day one of the rtmutex
      implementation more than 10 years ago.
      
      Thanks to David for meticulously instrumenting the code and providing the
      information which allowed to decode this subtle problem.
      Reported-by: default avatarDavid Daney <ddaney@caviumnetworks.com>
      Tested-by: default avatarDavid Daney <david.daney@cavium.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Fixes: 23f78d4a ("[PATCH] pi-futex: rt mutex core")
      Link: http://lkml.kernel.org/r/20161130210030.351136722@linutronix.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      a0346c32
    • Jan Kara's avatar
      ext4: fix data exposure after a crash · 10de8b68
      Jan Kara authored
      commit 06bd3c36 upstream.
      
      Huang has reported that in his powerfail testing he is seeing stale
      block contents in some of recently allocated blocks although he mounts
      ext4 in data=ordered mode. After some investigation I have found out
      that indeed when delayed allocation is used, we don't add inode to
      transaction's list of inodes needing flushing before commit. Originally
      we were doing that but commit f3b59291 removed the logic with a
      flawed argument that it is not needed.
      
      The problem is that although for delayed allocated blocks we write their
      contents immediately after allocating them, there is no guarantee that
      the IO scheduler or device doesn't reorder things and thus transaction
      allocating blocks and attaching them to inode can reach stable storage
      before actual block contents. Actually whenever we attach freshly
      allocated blocks to inode using a written extent, we should add inode to
      transaction's ordered inode list to make sure we properly wait for block
      contents to be written before committing the transaction. So that is
      what we do in this patch. This also handles other cases where stale data
      exposure was possible - like filling hole via mmap in
      data=ordered,nodelalloc mode.
      
      The only exception to the above rule are extending direct IO writes where
      blkdev_direct_IO() waits for IO to complete before increasing i_size and
      thus stale data exposure is not possible. For now we don't complicate
      the code with optimizing this special case since the overhead is pretty
      low. In case this is observed to be a performance problem we can always
      handle it using a special flag to ext4_map_blocks().
      
      Fixes: f3b59291Reported-by: default avatar"HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>
      Tested-by: default avatar"HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      10de8b68
  2. 15 Dec, 2016 2 commits
  3. 13 Dec, 2016 4 commits