1. 11 Apr, 2017 9 commits
  2. 10 Apr, 2017 12 commits
    • Revert "cpufreq: fix garbage kobjects on errors during suspend/resume" · e060479e
      Rafael J. Wysocki authored
      commit d4faadd5 upstream.
      
      Commit 2167e239 (cpufreq: fix garbage kobjects on errors during
      suspend/resume) breaks suspend/resume on Martin Ziegler's system
      (hard lockup during resume), so revert it.
      
      Fixes: 2167e239 (cpufreq: fix garbage kobjects on errors during suspend/resume)
      References: https://bugzilla.kernel.org/show_bug.cgi?id=66751
      Reported-by: Martin Ziegler <ziegler@uni-freiburg.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • ALSA: ctxfi: Fix the incorrect check of dma_set_mask() call · 5bde2c27
      Takashi Iwai authored
      commit f363a066 upstream.
      
      In the commit [15c75b09: ALSA: ctxfi: Fallback DMA mask to 32bit],
      I forgot to put "!" at the dma_set_mask() call check in cthw20k1.c
      (while cthw20k2.c is OK).  This patch fixes that obvious bug.
      
      (As a side note: although the original commit was completely wrong,
       it still works on most machines, as it sets the 32bit DMA mask in
       the end.  So the bug severity is low.)
      
      Fixes: 15c75b09 ("ALSA: ctxfi: Fallback DMA mask to 32bit")
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
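
      Illustration (not part of the commit): a minimal user-space sketch of
      why the missing "!" still left most machines with a usable 32bit mask.
      fake_dma_set_mask() and struct fake_dev are hypothetical stand-ins; the
      only property taken from the kernel API is that dma_set_mask() returns
      0 on success. The initial 32bit mask is an assumption of this model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model: remembers the last mask width it accepted. */
struct fake_dev {
	bool supports_64bit;
	int mask_bits;		/* assumed to start at a 32bit default */
};

/* Stand-in for dma_set_mask(): returns 0 on success, non-zero on failure. */
static int fake_dma_set_mask(struct fake_dev *dev, int bits)
{
	assert(bits == 32 || bits == 64);
	if (bits > 32 && !dev->supports_64bit)
		return -1;
	dev->mask_bits = bits;
	return 0;
}

/* Inverted check: a non-zero (failure) return is treated as success. */
static int setup_mask_buggy(struct fake_dev *dev)
{
	if (fake_dma_set_mask(dev, 64))
		return dev->mask_bits;	/* reached only when 64bit FAILED */
	fake_dma_set_mask(dev, 32);	/* reached when 64bit succeeded */
	return dev->mask_bits;
}

/* Fixed check: "!" makes the first branch the success path. */
static int setup_mask_fixed(struct fake_dev *dev)
{
	if (!fake_dma_set_mask(dev, 64))
		return dev->mask_bits;	/* 64bit accepted */
	fake_dma_set_mask(dev, 32);	/* fall back to 32bit */
	return dev->mask_bits;
}
```

      In this model the buggy variant always ends up at 32 bits, which matches
      the "severity is low" remark; only the fixed variant ever keeps 64 bits.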
    • ALSA: ctxfi: Fallback DMA mask to 32bit · cbd32ce4
      Takashi Iwai authored
      commit 15c75b09 upstream.
      
      Currently ctxfi driver tries to set only the 64bit DMA mask on 64bit
      architectures, and bails out if it fails.  This causes a problem on
      some platforms since the 64bit DMA isn't always guaranteed.  We should
      fall back to the default 32bit DMA when 64bit DMA fails.
      
      Fixes: 6d74b86d ("ALSA: ctxfi - Allow 64bit DMA")
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • padata: avoid race in reordering · c9645aa7
      Jason A. Donenfeld authored
      commit de5540d0 upstream.
      
      Under extremely heavy uses of padata, crashes occur, and with list
      debugging turned on, this happens instead:
      
      [87487.298728] WARNING: CPU: 1 PID: 882 at lib/list_debug.c:33
      __list_add+0xae/0x130
      [87487.301868] list_add corruption. prev->next should be next
      (ffffb17abfc043d0), but was ffff8dba70872c80. (prev=ffff8dba70872b00).
      [87487.339011]  [<ffffffff9a53d075>] dump_stack+0x68/0xa3
      [87487.342198]  [<ffffffff99e119a1>] ? console_unlock+0x281/0x6d0
      [87487.345364]  [<ffffffff99d6b91f>] __warn+0xff/0x140
      [87487.348513]  [<ffffffff99d6b9aa>] warn_slowpath_fmt+0x4a/0x50
      [87487.351659]  [<ffffffff9a58b5de>] __list_add+0xae/0x130
      [87487.354772]  [<ffffffff9add5094>] ? _raw_spin_lock+0x64/0x70
      [87487.357915]  [<ffffffff99eefd66>] padata_reorder+0x1e6/0x420
      [87487.361084]  [<ffffffff99ef0055>] padata_do_serial+0xa5/0x120
      
      padata_reorder calls list_add_tail with the list to which it's adding
      locked, which seems correct:
      
      spin_lock(&squeue->serial.lock);
      list_add_tail(&padata->list, &squeue->serial.list);
      spin_unlock(&squeue->serial.lock);
      
      This therefore leaves only one place where such an inconsistency could
      occur: if padata->list is added at the same time on two different
      threads.  This padata pointer comes from the function call to
      padata_get_next(pd), which has in it the following block:
      
      next_queue = per_cpu_ptr(pd->pqueue, cpu);
      padata = NULL;
      reorder = &next_queue->reorder;
      if (!list_empty(&reorder->list)) {
             padata = list_entry(reorder->list.next,
                                 struct padata_priv, list);
             spin_lock(&reorder->lock);
             list_del_init(&padata->list);
             atomic_dec(&pd->reorder_objects);
             spin_unlock(&reorder->lock);
      
             pd->processed++;
      
             goto out;
      }
      out:
      return padata;
      
      I strongly suspect that the problem here is that two threads can race
      on the reorder list. Even though the deletion is locked, the call to
      list_entry is not locked, which means it's feasible that two threads
      pick up the same padata object and subsequently call list_add_tail on
      it at the same time. The fix is thus to hoist that lock outside of
      that block.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
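
      Illustration (not part of the commit): a single-threaded sketch of the
      hoisted lock described above. All names here (fake_lock, reorder_queue,
      get_next) are hypothetical; the "lock" only records whether it is held,
      so that an unlocked peek at the list head trips an assertion instead of
      racing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lock model: records only whether the lock is held. */
struct fake_lock { int held; };

struct item {
	struct item *next;
	int seq;
};

struct reorder_queue {
	struct fake_lock lock;
	struct item *head;
};

static void fake_lock_acquire(struct fake_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void fake_lock_release(struct fake_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Peeking at the head is only legal with the lock held. */
static struct item *queue_first(struct reorder_queue *q)
{
	assert(q->lock.held);
	return q->head;
}

/* The lock now covers both the lookup and the removal. */
static struct item *get_next(struct reorder_queue *q)
{
	struct item *it;

	fake_lock_acquire(&q->lock);
	it = queue_first(q);
	if (it) {
		q->head = it->next;
		it->next = NULL;
	}
	fake_lock_release(&q->lock);
	return it;
}
```

      With the lookup inside the critical section, two callers can no longer
      obtain the same entry: the second one sees the list after the first
      removal.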
    • KVM: kvm_io_bus_unregister_dev() should never fail · a517ec56
      David Hildenbrand authored
      commit 90db1043 upstream.
      
      No caller currently checks the return value of
      kvm_io_bus_unregister_dev(). This is evil, as all callers silently go on
      freeing their device. A stale reference will remain in the io_bus and
      will be used again at the latest when the iobus gets torn down on
      kvm_destroy_vm(), leading to use-after-free errors.
      
      There is nothing the callers could do, except retrying over and over
      again.
      
      So let's simply remove the bus altogether, print an error and make
      sure no one can access this broken bus again (returning -ENOMEM on any
      attempt to access it).
      
      Fixes: e93f8a0f ("KVM: convert io_bus to SRCU")
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
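
      Illustration (not part of the commit): a user-space sketch of the
      "never fail" policy, assuming a bus that is replaced by a freshly
      allocated copy on every unregister. struct vm, struct io_bus, the
      fail_alloc hook and the function names are hypothetical; only the
      behaviour (drop the bus on allocation failure, print an error, answer
      later accesses with -ENOMEM) is taken from the message above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

struct io_bus {
	int dev_count;
	int devs[8];		/* hypothetical device ids */
};

struct vm {
	struct io_bus *bus;	/* NULL once the bus has been dropped */
	int fail_alloc;		/* test hook: make the next allocation fail */
};

static struct io_bus *bus_alloc(struct vm *vm)
{
	if (vm->fail_alloc)
		return NULL;
	return calloc(1, sizeof(struct io_bus));
}

/* Never fails: on allocation failure the whole bus is dropped instead. */
static void bus_unregister_dev(struct vm *vm, int dev)
{
	struct io_bus *old = vm->bus, *new_bus;
	int i;

	if (!old)
		return;		/* bus already gone, nothing left to unlink */
	assert(old->dev_count <= 8);

	new_bus = bus_alloc(vm);
	if (new_bus) {
		for (i = 0; i < old->dev_count; i++)
			if (old->devs[i] != dev)
				new_bus->devs[new_bus->dev_count++] = old->devs[i];
	} else {
		fprintf(stderr, "failed to shrink bus, removing it completely\n");
	}
	vm->bus = new_bus;	/* NULL leaves no stale device reference */
	free(old);
}

static int bus_write(struct vm *vm, int dev)
{
	int i;

	if (!vm->bus)
		return -ENOMEM;	/* broken bus: refuse every access */
	for (i = 0; i < vm->bus->dev_count; i++)
		if (vm->bus->devs[i] == dev)
			return 0;
	return -EOPNOTSUPP;
}
```

      The early return on a NULL bus also models the case where a device is
      unregistered after the bus has already been released.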
    • rtc: s35390a: improve irq handling · 64275cb4
      Uwe Kleine-König authored
      commit 3bd32722 upstream.
      
      On some QNAP NAS devices the rtc can wake the machine. Several people
      noticed that once the machine was woken this way it fails to shut down.
      That's because the driver fails to acknowledge the interrupt, so it
      stays active and restarts the machine immediately after shutdown. See
      https://bugs.debian.org/794266 for a bug report.
      
      Doing this correctly requires interpreting the INT2 flag of the first read
      of the STATUS1 register because this bit is cleared by read.
      
      Note this is not maximally robust though because a pending irq isn't
      detected when the STATUS1 register was already read (and so INT2 is not
      set) but the irq was not disabled. But that is a hardware imposed problem
      that cannot easily be fixed by software.
      Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org>
      Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • rtc: s35390a: implement reset routine as suggested by the reference · b31f881e
      Uwe Kleine-König authored
      commit 8e6583f1 upstream.
      
      There were two deviations from the reference manual: you have to wait
      half a second when POC is active and you might have to repeat
      initialization when POC or BLD are still set after the sequence.
      
      Note however that as POC and BLD are cleared by read the driver might
      not be able to detect that a reset is necessary. I don't have a good
      idea how to fix this.
      
      Additionally report the value read from STATUS1 to the caller. This
      prepares the next patch.
      Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org>
      Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • rtc: s35390a: make sure all members in the output are set · ac4d4f65
      Uwe Kleine-König authored
      The rtc core calls the .read_alarm with all fields initialized to 0. As
      the s35390a driver doesn't touch some fields the returned date is
      interpreted as a date in January 1900. So make sure all fields are set
      to -1; some of them are then overwritten with the right data depending
      on the hardware state.
      
      In mainline this is done by commit d68778b8 ("rtc: initialize output
      parameter for read alarm to "uninitialized"") in the core. This is
      considered too dangerous for stable as it might have side effects for
      other rtc drivers that might for example rely on alarm->time.tm_sec
      being initialized to 0.
      Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
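
      Illustration (not part of the commit): a user-space sketch of the
      "set everything to -1 first" approach, using struct tm as a stand-in
      for the kernel's struct rtc_time. The helper names and the choice of
      which fields the chip model fills in are assumptions of this sketch.

```c
#include <assert.h>
#include <time.h>

/* Mark every field as "not set" so untouched ones are not read as 0. */
static void alarm_time_init_unset(struct tm *t)
{
	t->tm_sec = -1;
	t->tm_min = -1;
	t->tm_hour = -1;
	t->tm_mday = -1;
	t->tm_mon = -1;
	t->tm_year = -1;
	t->tm_wday = -1;
	t->tm_yday = -1;
	t->tm_isdst = -1;
}

/* Hypothetical read-out: this chip model only knows weekday, hour, minute. */
static void read_alarm_sketch(struct tm *t, int wday, int hour, int min)
{
	assert(hour >= 0 && hour < 24 && min >= 0 && min < 60);
	alarm_time_init_unset(t);
	t->tm_wday = wday;
	t->tm_hour = hour;
	t->tm_min = min;
}
```

      A caller that passed in a zeroed structure can now tell "field not
      provided" (-1) apart from a genuine 0, instead of reading the result
      as a date in January 1900.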
    • rtc: s35390a: fix reading out alarm · 19901cad
      Uwe Kleine-König authored
      commit f87e904d upstream.
      
      There are several issues fixed in this patch:
      
       - When alarm isn't enabled, set .enabled to zero instead of returning
         -EINVAL.
       - Ignore how IRQ1 is configured when determining if IRQ2 is on.
       - The three alarm registers have an enable flag which must be
         evaluated.
       - The chip always triggers when the seconds register gets 0.
      
      Note that the rtc framework however doesn't handle the result correctly
      because it doesn't check wday being initialized and so interprets an
      alarm being set for 10:00 AM in three days as 10:00 AM tomorrow (or
      today if that's not over yet).
      Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org>
      Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd() · 69caf454
      Naoya Horiguchi authored
      commit c9d398fa upstream.
      
      I found a race condition which triggers the following bug when
      move_pages() and soft offline are called on a single hugetlb page
      concurrently.
      
          Soft offlining page 0x119400 at 0x700000000000
          BUG: unable to handle kernel paging request at ffffea0011943820
          IP: follow_huge_pmd+0x143/0x190
          PGD 7ffd2067
          PUD 7ffd1067
          PMD 0
              [61163.582052] Oops: 0000 [#1] SMP
          Modules linked in: binfmt_misc ppdev virtio_balloon parport_pc pcspkr i2c_piix4 parport i2c_core acpi_cpufreq ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk 8139too crc32c_intel ata_piix serio_raw libata virtio_pci 8139cp virtio_ring virtio mii floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: cap_check]
          CPU: 0 PID: 22573 Comm: iterate_numa_mo Tainted: P           OE   4.11.0-rc2-mm1+ #2
          Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
          RIP: 0010:follow_huge_pmd+0x143/0x190
          RSP: 0018:ffffc90004bdbcd0 EFLAGS: 00010202
          RAX: 0000000465003e80 RBX: ffffea0004e34d30 RCX: 00003ffffffff000
          RDX: 0000000011943800 RSI: 0000000000080001 RDI: 0000000465003e80
          RBP: ffffc90004bdbd18 R08: 0000000000000000 R09: ffff880138d34000
          R10: ffffea0004650000 R11: 0000000000c363b0 R12: ffffea0011943800
          R13: ffff8801b8d34000 R14: ffffea0000000000 R15: 000077ff80000000
          FS:  00007fc977710740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: ffffea0011943820 CR3: 000000007a746000 CR4: 00000000001406f0
          Call Trace:
           follow_page_mask+0x270/0x550
           SYSC_move_pages+0x4ea/0x8f0
           SyS_move_pages+0xe/0x10
           do_syscall_64+0x67/0x180
           entry_SYSCALL64_slow_path+0x25/0x25
          RIP: 0033:0x7fc976e03949
          RSP: 002b:00007ffe72221d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000117
          RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc976e03949
          RDX: 0000000000c22390 RSI: 0000000000001400 RDI: 0000000000005827
          RBP: 00007ffe72221e00 R08: 0000000000c2c3a0 R09: 0000000000000004
          R10: 0000000000c363b0 R11: 0000000000000246 R12: 0000000000400650
          R13: 00007ffe72221ee0 R14: 0000000000000000 R15: 0000000000000000
          Code: 81 e4 ff ff 1f 00 48 21 c2 49 c1 ec 0c 48 c1 ea 0c 4c 01 e2 49 bc 00 00 00 00 00 ea ff ff 48 c1 e2 06 49 01 d4 f6 45 bc 04 74 90 <49> 8b 7c 24 20 40 f6 c7 01 75 2b 4c 89 e7 8b 47 1c 85 c0 7e 2a
          RIP: follow_huge_pmd+0x143/0x190 RSP: ffffc90004bdbcd0
          CR2: ffffea0011943820
          ---[ end trace e4f81353a2d23232 ]---
          Kernel panic - not syncing: Fatal exception
          Kernel Offset: disabled
      
      This bug is triggered when pmd_present() returns true for non-present
      hugetlb, so fixing the present check in follow_huge_pmd() prevents it.
      Using pmd_present() to determine present/non-present for hugetlb is not
      correct, because pmd_present() checks multiple bits (not only
      _PAGE_PRESENT) for historical reasons and it can misjudge hugetlb state.
      
      Fixes: e66f17ff ("mm/hugetlb: take page table lock in follow_huge_pmd()")
      Link: http://lkml.kernel.org/r/1490149898-20231-1-git-send-email-n-horiguchi@ah.jp.nec.com
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • KVM: x86: clear bus pointer when destroyed · 5e75d593
      Peter Xu authored
      commit df630b8c upstream.
      
      When releasing the bus, let's clear the bus pointers to mark it out. If
      any further device unregister happens on this bus, we know that we're
      done if we found the bus being released already.
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • USB: fix linked-list corruption in rh_call_control() · f8f0b420
      Alan Stern authored
      commit 16336820 upstream.
      
      Using KASAN, Dmitry found a bug in the rh_call_control() routine: If
      buffer allocation fails, the routine returns immediately without
      unlinking its URB from the control endpoint, eventually leading to
      linked-list corruption.
      
      This patch fixes the problem by jumping to the end of the routine
      (where the URB is unlinked) when an allocation failure occurs.
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Reported-and-tested-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
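
      Illustration (not part of the commit): a user-space sketch of the
      "jump to the common exit instead of returning" pattern described
      above. struct endpoint, call_control_sketch() and the fail_alloc hook
      are hypothetical; the counter stands in for the URB being linked to
      and unlinked from the endpoint.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct endpoint { int linked_urbs; };

/* fail_alloc is a test hook that simulates a failed buffer allocation. */
static int call_control_sketch(struct endpoint *ep, size_t len, int fail_alloc)
{
	char *buf;
	int status = 0;

	assert(len > 0);
	ep->linked_urbs++;		/* link the request to the endpoint */

	buf = fail_alloc ? NULL : malloc(len);
	if (!buf) {
		status = -ENOMEM;
		goto unlink;		/* not "return": the unlink must run */
	}

	buf[0] = 0;			/* ... handle the request ... */
	free(buf);

unlink:
	ep->linked_urbs--;		/* runs on every exit path */
	return status;
}
```

      An early "return -ENOMEM" in place of the goto would leave
      linked_urbs at 1, which is the analogue of the dangling list entry.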
  3. 07 Apr, 2017 19 commits
    • ACPI: Fix incompatibility with mcount-based function graph tracing · 5178151b
      Josh Poimboeuf authored
      commit 61b79e16 upstream.
      
      Paul Menzel reported a warning:
      
        WARNING: CPU: 0 PID: 774 at /build/linux-ROBWaj/linux-4.9.13/kernel/trace/trace_functions_graph.c:233 ftrace_return_to_handler+0x1aa/0x1e0
        Bad frame pointer: expected f6919d98, received f6919db0
          from func acpi_pm_device_sleep_wake return to c43b6f9d
      
      The warning means that function graph tracing is broken for the
      acpi_pm_device_sleep_wake() function.  That's because the ACPI Makefile
      unconditionally sets the '-Os' gcc flag to optimize for size.  That's an
      issue because mcount-based function graph tracing is incompatible with
      '-Os' on x86, thanks to the following gcc bug:
      
        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109
      
      I have another patch pending which will ensure that mcount-based
      function graph tracing is never used with CONFIG_CC_OPTIMIZE_FOR_SIZE on
      x86.
      
      But this patch is needed in addition to that one because the ACPI
      Makefile overrides that config option for no apparent reason.  It has
      had this flag since the beginning of git history, and there's no related
      comment, so I don't know why it's there.  As far as I can tell, there's
      no reason for it to be there.  The appropriate behavior is for it to
      honor CONFIG_CC_OPTIMIZE_FOR_{SIZE,PERFORMANCE} like the rest of the
      kernel.
      Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • ALSA: seq: Fix race during FIFO resize · d4b8e8a3
      Takashi Iwai authored
      commit 2d7d5400 upstream.
      
      When a new event is queued while the FIFO is being resized in
      snd_seq_fifo_clear(), it may lead to a use-after-free, as the old pool
      that is being queued gets removed.  To avoid this race, we need to
      close the pool to be deleted and sync its usage before actually
      deleting it.
      
      The issue was spotted by syzkaller.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • scsi: libsas: fix ata xfer length · a90c3a3c
      John Garry authored
      commit 9702c67c upstream.
      
      The total ata xfer length may not be calculated properly, in that we do
      not use the proper method to get an sg element dma length.
      
      According to the code comment, sg_dma_len() should be used after
      dma_map_sg() is called.
      
      This issue was found by turning on the SMMUv3 in front of the hisi_sas
      controller in hip07. Multiple sg elements were being combined into a
      single element, but the original first element length was being used as
      the total xfer length.
      
      Fixes: ff2aeb1e ("libata: convert to chained sg")
      Signed-off-by: John Garry <john.garry@huawei.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
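
      Illustration (not part of the commit): a toy model of why the CPU-side
      length is the wrong field to sum after mapping. struct sg_entry and
      map_merge_all() are hypothetical; they only mimic a mapping step that
      coalesces several segments into one, as the message reports for the
      SMMUv3 case.

```c
#include <assert.h>

/* Hypothetical scatterlist entry: CPU-side length and post-mapping length. */
struct sg_entry {
	unsigned int length;
	unsigned int dma_length;
};

/*
 * Toy mapping step that merges all entries into the first one.
 * Returns the number of mapped entries.
 */
static int map_merge_all(struct sg_entry *sg, int nents)
{
	unsigned int total = 0;
	int i;

	assert(nents > 0);
	for (i = 0; i < nents; i++)
		total += sg[i].length;
	sg[0].dma_length = total;
	return 1;
}

/* Sums the original lengths over the mapped entries: too short after a merge. */
static unsigned int xfer_len_wrong(const struct sg_entry *sg, int mapped)
{
	unsigned int xfer = 0;
	int i;

	for (i = 0; i < mapped; i++)
		xfer += sg[i].length;
	return xfer;
}

/* Sums the DMA lengths, which is what the mapped entries actually describe. */
static unsigned int xfer_len_right(const struct sg_entry *sg, int mapped)
{
	unsigned int xfer = 0;
	int i;

	for (i = 0; i < mapped; i++)
		xfer += sg[i].dma_length;
	return xfer;
}
```

      For three segments of 4096, 4096 and 512 bytes merged into one mapped
      entry, the first function reports 4096 while the second reports 8704.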
    • scsi: mpt3sas: fix hang on ata passthrough commands · e8caca34
      James Bottomley authored
      commit ffb58456 upstream.
      
      mpt3sas has a firmware failure where it can only handle one pass through
      ATA command at a time.  If another comes in, contrary to the SAT
      standard, it will hang until the first one completes (causing long
      commands like secure erase to timeout).  The original fix was to block
      the device when an ATA command came in, but this caused a regression
      with
      
      commit 669f0441
      Author: Bart Van Assche <bart.vanassche@sandisk.com>
      Date:   Tue Nov 22 16:17:13 2016 -0800
      
          scsi: srp_transport: Move queuecommand() wait code to SCSI core
      
      So fix the original fix of the secure erase timeout by properly
      returning SAM_STAT_BUSY like the SAT recommends.  The original patch
      also had a concurrency problem since scsih_qcmd is lockless at that
      point (this is fixed by using atomic bitops to set and test the flag).
      
      [mkp: addressed feedback wrt. test_bit and fixed whitespace]
      
      Fixes: 18f6084a (mpt3sas: Fix secure erase premature termination)
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
      Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reported-by: Ingo Molnar <mingo@kernel.org>
      Tested-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Joe Korty <joe.korty@ccur.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
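
      Illustration (not part of the commit): a user-space sketch of
      "atomically test and set a flag, report busy if it was already set",
      using C11 atomic_flag in place of the kernel's atomic bitops. The
      structure, function names and status codes are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

enum { CMD_ACCEPTED = 0, CMD_BUSY = 1 };	/* hypothetical status codes */

struct fake_target {
	atomic_flag ata_cmd_pending;
};

static int queue_ata_passthrough(struct fake_target *t)
{
	/* Test and set in one step: only one caller can see "was clear". */
	if (atomic_flag_test_and_set(&t->ata_cmd_pending))
		return CMD_BUSY;
	return CMD_ACCEPTED;
}

static void complete_ata_passthrough(struct fake_target *t)
{
	atomic_flag_clear(&t->ata_cmd_pending);
}
```

      Because the test and the set are a single atomic operation, two
      lockless submitters cannot both believe they own the device, which a
      separate "check, then set" sequence would allow.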
    • libceph: force GFP_NOIO for socket allocations · 80899b3a
      Ilya Dryomov authored
      commit 633ee407 upstream.
      
      sock_alloc_inode() allocates socket+inode and socket_wq with
      GFP_KERNEL, which is not allowed on the writeback path:
      
          Workqueue: ceph-msgr con_work [libceph]
          ffff8810871cb018 0000000000000046 0000000000000000 ffff881085d40000
          0000000000012b00 ffff881025cad428 ffff8810871cbfd8 0000000000012b00
          ffff880102fc1000 ffff881085d40000 ffff8810871cb038 ffff8810871cb148
          Call Trace:
          [<ffffffff816dd629>] schedule+0x29/0x70
          [<ffffffff816e066d>] schedule_timeout+0x1bd/0x200
          [<ffffffff81093ffc>] ? ttwu_do_wakeup+0x2c/0x120
          [<ffffffff81094266>] ? ttwu_do_activate.constprop.135+0x66/0x70
          [<ffffffff816deb5f>] wait_for_completion+0xbf/0x180
          [<ffffffff81097cd0>] ? try_to_wake_up+0x390/0x390
          [<ffffffff81086335>] flush_work+0x165/0x250
          [<ffffffff81082940>] ? worker_detach_from_pool+0xd0/0xd0
          [<ffffffffa03b65b1>] xlog_cil_force_lsn+0x81/0x200 [xfs]
          [<ffffffff816d6b42>] ? __slab_free+0xee/0x234
          [<ffffffffa03b4b1d>] _xfs_log_force_lsn+0x4d/0x2c0 [xfs]
          [<ffffffff811adc1e>] ? lookup_page_cgroup_used+0xe/0x30
          [<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
          [<ffffffffa03b4dcf>] xfs_log_force_lsn+0x3f/0xf0 [xfs]
          [<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
          [<ffffffffa03a62c6>] xfs_iunpin_wait+0xc6/0x1a0 [xfs]
          [<ffffffff810aa250>] ? wake_atomic_t_function+0x40/0x40
          [<ffffffffa039a723>] xfs_reclaim_inode+0xa3/0x330 [xfs]
          [<ffffffffa039ac07>] xfs_reclaim_inodes_ag+0x257/0x3d0 [xfs]
          [<ffffffffa039bb13>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
          [<ffffffffa03ab745>] xfs_fs_free_cached_objects+0x15/0x20 [xfs]
          [<ffffffff811c0c18>] super_cache_scan+0x178/0x180
          [<ffffffff8115912e>] shrink_slab_node+0x14e/0x340
          [<ffffffff811afc3b>] ? mem_cgroup_iter+0x16b/0x450
          [<ffffffff8115af70>] shrink_slab+0x100/0x140
          [<ffffffff8115e425>] do_try_to_free_pages+0x335/0x490
          [<ffffffff8115e7f9>] try_to_free_pages+0xb9/0x1f0
          [<ffffffff816d56e4>] ? __alloc_pages_direct_compact+0x69/0x1be
          [<ffffffff81150cba>] __alloc_pages_nodemask+0x69a/0xb40
          [<ffffffff8119743e>] alloc_pages_current+0x9e/0x110
          [<ffffffff811a0ac5>] new_slab+0x2c5/0x390
          [<ffffffff816d71c4>] __slab_alloc+0x33b/0x459
          [<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0
          [<ffffffff8164bda1>] ? inet_sendmsg+0x71/0xc0
          [<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0
          [<ffffffff811a21f2>] kmem_cache_alloc+0x1a2/0x1b0
          [<ffffffff815b906d>] sock_alloc_inode+0x2d/0xd0
          [<ffffffff811d8566>] alloc_inode+0x26/0xa0
          [<ffffffff811da04a>] new_inode_pseudo+0x1a/0x70
          [<ffffffff815b933e>] sock_alloc+0x1e/0x80
          [<ffffffff815ba855>] __sock_create+0x95/0x220
          [<ffffffff815baa04>] sock_create_kern+0x24/0x30
          [<ffffffffa04794d9>] con_work+0xef9/0x2050 [libceph]
          [<ffffffffa04aa9ec>] ? rbd_img_request_submit+0x4c/0x60 [rbd]
          [<ffffffff81084c19>] process_one_work+0x159/0x4f0
          [<ffffffff8108561b>] worker_thread+0x11b/0x530
          [<ffffffff81085500>] ? create_worker+0x1d0/0x1d0
          [<ffffffff8108b6f9>] kthread+0xc9/0xe0
          [<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90
          [<ffffffff816e1b98>] ret_from_fork+0x58/0x90
          [<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90
      
      Use memalloc_noio_{save,restore}() to temporarily force GFP_NOIO here.
      
      Link: http://tracker.ceph.com/issues/19309
      Reported-by: Sergey Jerusalimov <wintchester@gmail.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
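
      Illustration (not part of the commit): a user-space model of the
      save/restore scoping that memalloc_noio_{save,restore}() provides.
      task_flags, FLAG_NOIO and the function names are hypothetical
      stand-ins; the sketch only shows that the previous state is returned
      by "save" and put back by "restore", so nested sections compose.

```c
#include <assert.h>

#define FLAG_NOIO 0x1u			/* hypothetical per-task flag bit */

static unsigned int task_flags;		/* stand-in for the task's flags */

/* Set the flag and hand back its previous state. */
static unsigned int noio_save(void)
{
	unsigned int old = task_flags & FLAG_NOIO;

	task_flags |= FLAG_NOIO;
	return old;
}

/* Put the flag back to exactly what noio_save() returned. */
static void noio_restore(unsigned int old)
{
	assert((old & ~FLAG_NOIO) == 0);
	task_flags = (task_flags & ~FLAG_NOIO) | old;
}

/* What an allocation made right now would be allowed to do. */
static int alloc_may_do_io(void)
{
	return !(task_flags & FLAG_NOIO);
}

/* Models a socket creation whose internal allocations must not start I/O. */
static int create_socket_sketch(void)
{
	unsigned int old = noio_save();
	int io_allowed = alloc_may_do_io();	/* seen by nested allocations */

	noio_restore(old);
	return io_allowed;
}
```

      The callee does not need to know its caller's context: if an outer
      section is already active, the inner restore leaves the flag set.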
    • sched/rt: Add a missing rescheduling point · 258b4e67
      Sebastian Andrzej Siewior authored
      commit 619bd4a7 upstream.
      
      Since the change in commit:
      
        fd7a4bed ("sched, rt: Convert switched_{from, to}_rt() / prio_changed_rt() to balance callbacks")
      
      ... we don't reschedule a task under certain circumstances:
      
      Let's say task-A, SCHED_OTHER, is running on CPU0 (and it may run only on
      CPU0) and holds a PI lock. This task is removed from the CPU because it
      used up its time slice and another SCHED_OTHER task is running. Task-B on
      CPU1 runs at RT priority and asks for the lock owned by task-A. This
      results in a priority boost for task-A. Task-B goes to sleep until the
      lock has been made available. Task-A is already runnable (but not active),
      so it receives no wake up.
      
      The reality now is that task-A gets on the CPU once the scheduler decides
      to remove the current task despite the fact that a high priority task is
      enqueued and waiting. This may take a long time.
      
      The desired behaviour is that CPU0 immediately reschedules after the
      priority boost which made task-A the task with the lowest priority.
      
      [js] no deadline in 3.12 yet
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: fd7a4bed ("sched, rt: Convert switched_{from, to}_rt() prio_changed_rt() to balance callbacks")
      Link: http://lkml.kernel.org/r/20170124144006.29821-1-bigeasy@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • metag/ptrace: Reject partial NT_METAG_RPIPE writes · ef40721f
      Dave Martin authored
      commit 7195ee31 upstream.
      
      It's not clear what behaviour is sensible when doing a partial write of
      NT_METAG_RPIPE, so just don't bother.
      
      This patch assumes that userspace will never rely on a partial SETREGSET
      in this case, since it's not clear what should happen anyway.
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Acked-by: James Hogan <james.hogan@imgtec.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS · 0adb097c
      Dave Martin authored
      commit 5fe81fe9 upstream.
      
      Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
      to fill TXSTATUS, a well-defined default value is used, based on the
      task's current value.
      Suggested-by: James Hogan <james.hogan@imgtec.com>
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • metag/ptrace: Preserve previous registers for short regset write · ea347c31
      Dave Martin authored
      commit a78ce80d upstream.
      
      Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
      to fill all the registers, the thread's old registers are preserved.
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Acked-by: James Hogan <james.hogan@imgtec.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
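
      Illustration (not part of the commit): a user-space sketch of how a
      short register-set write can keep the old values for the part that was
      not supplied. struct regs, NREGS and regset_write_sketch() are
      hypothetical; the point is only that the temporary copy starts from
      the current registers instead of from uninitialized memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NREGS 4

struct regs { unsigned long r[NREGS]; };

/*
 * count may be smaller than sizeof(struct regs). Starting from a copy of
 * the current registers means the missing tail keeps its old values.
 */
static int regset_write_sketch(struct regs *task_regs, const void *ubuf,
			       size_t count)
{
	struct regs tmp = *task_regs;	/* preserve the previous registers */

	assert(task_regs && ubuf);
	if (count > sizeof(tmp))
		return -1;
	memcpy(&tmp, ubuf, count);	/* overlay only what was supplied */
	*task_regs = tmp;
	return 0;
}
```

      Writing two of four registers leaves the last two untouched; with an
      uninitialized temporary they would have been overwritten with stack
      garbage.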
    • sparc/ptrace: Preserve previous registers for short regset write · bc586353
      Dave Martin authored
      commit d3805c54 upstream.
      
      Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
      to fill all the registers, the thread's old registers are preserved.
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • c6x/ptrace: Remove useless PTRACE_SETREGSET implementation · bd38b388
      Dave Martin authored
      commit fb411b83 upstream.
      
      gpr_set won't work correctly and can never have been tested, and the
      correct behaviour is not clear due to the endianness-dependent task
      layout.
      
      So, just remove it.  The core code will now return -EOPNOTSUPP when
      trying to set NT_PRSTATUS on this architecture until/unless a correct
      implementation is supplied.
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • virtio_balloon: init 1st buffer in stats vq · 1614d285
      Ladi Prosek authored
      commit fc865322 upstream.
      
      When init_vqs runs, virtio_balloon.stats is either uninitialized or
      contains stale values. The host updates its state with garbage data
      because it has no way of knowing that this is just a marker buffer
      used for signaling.
      
      This patch updates the stats before pushing the initial buffer.
      
      Alternative fixes:
      * Push an empty buffer in init_vqs. Not easily done with the current
        virtio implementation and violates the spec "Driver MUST supply the
        same subset of statistics in all buffers submitted to the statsq".
      * Push a buffer with invalid tags in init_vqs. Violates the same
        spec clause, plus "invalid tag" is not really defined.
      
      Note: the spec says:
      	When using the legacy interface, the device SHOULD ignore all values in
      	the first buffer in the statsq supplied by the driver after device
      	initialization. Note: Historically, drivers supplied an uninitialized
      	buffer in the first buffer.
      
      Unfortunately QEMU does not seem to implement the recommendation
      even for the legacy interface.
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      1614d285
    • Andy Whitcroft's avatar
      xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder · 1b215097
      Andy Whitcroft authored
      commit f843ee6d upstream.
      
      Kees Cook has pointed out that xfrm_replay_state_esn_len() is subject to
      wrapping issues.  To make sure that the two ESN structures really are the
      same size, compare both the overall size as reported by
      xfrm_replay_state_esn_len() and the internal length.
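
A minimal sketch of why comparing only the computed size is not enough, assuming a length helper whose result is truncated to 32 bits. The names esn_hdr, esn_len32, same_size_weak and same_size_strict are hypothetical stand-ins and are not the kernel's code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the fixed part of an ESN replay state. */
struct esn_hdr {
    uint32_t bmp_len;        /* number of 32-bit words in the bitmap */
    uint32_t replay_window;  /* window size in bits */
};

/* Assumption: the total length is computed in 32-bit arithmetic,
 * so a large bmp_len makes the multiplication wrap around. */
static uint32_t esn_len32(const struct esn_hdr *e)
{
    return (uint32_t)sizeof(*e) + e->bmp_len * (uint32_t)sizeof(uint32_t);
}

/* Size-only comparison: fooled when the length wraps. */
static bool same_size_weak(const struct esn_hdr *cur, const struct esn_hdr *upd)
{
    return esn_len32(cur) == esn_len32(upd);
}

/* Stricter comparison: the internal length field must match as well. */
static bool same_size_strict(const struct esn_hdr *cur, const struct esn_hdr *upd)
{
    return esn_len32(cur) == esn_len32(upd) && cur->bmp_len == upd->bmp_len;
}
```

With bmp_len values of 1 and 0x40000001 the two wrapped lengths are equal, so only the strict comparison rejects the mismatched update.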
      
      CVE-2017-7184
      Signed-off-by: Andy Whitcroft <apw@canonical.com>
      Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      1b215097
    • Andy Whitcroft's avatar
      xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window · be4f4140
      Andy Whitcroft authored
      commit 677e806d upstream.
      
      When a new xfrm state is created during an XFRM_MSG_NEWSA call we
      validate the user supplied replay_esn to ensure that the size is valid
      and to ensure that the replay_window size is within the allocated
      buffer.  However, it is later possible to update this replay_esn via an
      XFRM_MSG_NEWAE call.  There we again validate that the size of the
      supplied buffer matches the existing state and, if so, inject the
      contents.  We do not at this point check that the replay_window is
      within the allocated memory.  This leads to out-of-bounds reads and
      writes triggered by netlink packets, and hence to memory corruption and
      the potential for privilege escalation.
      
      We already attempt to validate the incoming replay information in
      xfrm_new_ae() via xfrm_replay_verify_len().  This confirms that the user
      is not trying to change the size of the replay state buffer which
      includes the replay_esn.  It does not, however, check that the
      replay_window remains within that buffer.  Add validation of the
      contained replay_window.
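
The added validation can be sketched as follows, assuming the bitmap holds bmp_len 32-bit words so that the window, counted in bits, must not exceed bmp_len * 32. The struct esn_update and the function verify_replay_update are hypothetical names used only for illustration:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the user-supplied ESN replay state. */
struct esn_update {
    uint32_t bmp_len;        /* bitmap length in 32-bit words */
    uint32_t replay_window;  /* window size in bits */
};

/* Reject updates that resize the bitmap or whose window does not fit in it. */
static int verify_replay_update(const struct esn_update *cur,
                                const struct esn_update *upd)
{
    /* The update must not change the size of the existing buffer. */
    if (upd->bmp_len != cur->bmp_len)
        return -EINVAL;

    /* 64-bit arithmetic so the bit count itself cannot wrap. */
    if ((uint64_t)upd->replay_window > (uint64_t)upd->bmp_len * 32u)
        return -EINVAL;

    return 0;
}
```

An update that keeps bmp_len at 2 but asks for a 65-bit window is refused, while a 64-bit window is accepted.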
      
      CVE-2017-7184
      Signed-off-by: Andy Whitcroft <apw@canonical.com>
      Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      be4f4140
    • Jiri Slaby's avatar
      crypto: algif_hash - avoid zero-sized array · 66c7dc1e
      Jiri Slaby authored
      commit 62071194 upstream.
      
      With this reproducer:
        struct sockaddr_alg alg = {
                .salg_family = 0x26,
                .salg_type = "hash",
                .salg_feat = 0xf,
                .salg_mask = 0x5,
                .salg_name = "digest_null",
        };
        int sock, sock2;
      
        sock = socket(AF_ALG, SOCK_SEQPACKET, 0);
        bind(sock, (struct sockaddr *)&alg, sizeof(alg));
        sock2 = accept(sock, NULL, NULL);
        setsockopt(sock, SOL_ALG, ALG_SET_KEY, "\x9b\xca", 2);
        accept(sock2, NULL, NULL);
      
      ==== 8< ======== 8< ======== 8< ======== 8< ====
      
      one can immediately see an UBSAN warning:
      UBSAN: Undefined behaviour in crypto/algif_hash.c:187:7
      variable length array bound value 0 <= 0
      CPU: 0 PID: 15949 Comm: syz-executor Tainted: G            E      4.4.30-0-default #1
      ...
      Call Trace:
      ...
       [<ffffffff81d598fd>] ? __ubsan_handle_vla_bound_not_positive+0x13d/0x188
       [<ffffffff81d597c0>] ? __ubsan_handle_out_of_bounds+0x1bc/0x1bc
       [<ffffffffa0e2204d>] ? hash_accept+0x5bd/0x7d0 [algif_hash]
       [<ffffffffa0e2293f>] ? hash_accept_nokey+0x3f/0x51 [algif_hash]
       [<ffffffffa0e206b0>] ? hash_accept_parent_nokey+0x4a0/0x4a0 [algif_hash]
       [<ffffffff8235c42b>] ? SyS_accept+0x2b/0x40
      
      It is a correct warning, as hash state is propagated to accept as zero,
      but creating a zero-length variable array is not allowed in C.
      
      Fix this as proposed by Herbert -- do "?: 1" on that site. No sizeof or
      similar happens in the code there, so we just allocate one byte even
      though we do not use the array.
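
A user-space sketch of the same idea, assuming a state size that may legitimately be zero. GNU C spells the fallback as "x ?: 1"; the portable form "x ? x : 1" is used here. The helpers vla_bound and copy_via_state are hypothetical and only illustrate the pattern:

```c
#include <stddef.h>
#include <string.h>

/* A variable-length array bound must be greater than zero,
 * so fall back to one element when the state is empty. */
static size_t vla_bound(size_t statesize)
{
    return statesize ? statesize : 1;
}

/* Copies src to dst through a temporary VLA and returns the VLA's size. */
static size_t copy_via_state(const unsigned char *src, unsigned char *dst,
                             size_t statesize)
{
    unsigned char state[vla_bound(statesize)];

    if (statesize) {
        memcpy(state, src, statesize);
        memcpy(dst, state, statesize);
    }
    /* For statesize == 0 the single byte is allocated but never used. */
    return sizeof(state);
}
```

Calling copy_via_state(NULL, NULL, 0) allocates a one-byte array and touches nothing, which mirrors the "allocate one byte even though we do not use the array" remark above.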
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: "David S. Miller" <davem@davemloft.net> (maintainer:CRYPTO API)
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      66c7dc1e
    • Takashi Iwai's avatar
      fbcon: Fix vc attr at deinit · 3fc017fe
      Takashi Iwai authored
      commit 8aac7f34 upstream.
      
      fbcon can deal with vc_hi_font_mask (the upper 256 chars) and adjust
      the vc attrs dynamically when vc_hi_font_mask is changed at
      fbcon_init().  When the vc_hi_font_mask is set, it remaps the attrs in
      the existing console buffer with one bit shift up (for 9 bits), while
      it remaps with one bit shift down (for 8 bits) when the value is
      cleared.  It works fine as long as the font gets updated after fbcon
      was initialized.
      
      However, we hit a bizarre problem when the console is switched to
      another fb driver (typically from vesafb or efifb to drmfb).  When
      switching to the new fb driver, we temporarily rebind the console to
      the dummy console, then rebind to the new driver.  During the
      switch, we leave the modified attrs as is.  Thus, the new fbcon
      takes over the old buffer as if it contained 8-bit chars
      (although the attrs are still shifted for 9 bits), and effectively
      this results in yellow text instead of the original white, as found
      in the bugzilla entry below.
      
      An easy fix for this is to re-adjust the attrs before leaving the
      fbcon at con_deinit callback.  Since the code to adjust the attrs is
      already present in the current fbcon code, in this patch, we simply
      factor out the relevant code, and call it from fbcon_deinit().
      
      Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1000619
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      3fc017fe
    • Sumit Semwal's avatar
      uvcvideo: uvc_scan_fallback() for webcams with broken chain · 2465421b
      Sumit Semwal authored
      From: Henrik Ingo <henrik.ingo@avoinelama.fi>
      
      [ Upstream commit e950267a ]
      
      Some devices have invalid baSourceID references, causing uvc_scan_chain()
      to fail, but if we just take the entities we can find and put them
      together in the most sensible chain we can think of, turns out they do
      work anyway. Note: This heuristic assumes there is a single chain.
      
      At the time of writing, devices known to have such a broken chain are
        - Acer Integrated Camera (5986:055a)
        - Realtek rtl157a7 (0bda:57a7)
      Signed-off-by: Henrik Ingo <henrik.ingo@avoinelama.fi>
      Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      2465421b
    • Sumit Semwal's avatar
      block: allow WRITE_SAME commands with the SG_IO ioctl · 596be9a8
      Sumit Semwal authored
      From: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      
      [ Upstream commit 25cdb645 ]
      
      The WRITE_SAME commands are not present in the blk_default_cmd_filter
      write_ok list, and thus are failed with -EPERM when the SG_IO ioctl()
      is executed without CAP_SYS_RAWIO capability (e.g., unprivileged users).
      [ sg_io() -> blk_fill_sghdr_rq() -> blk_verify_command() -> -EPERM ]
      
      The problem can be reproduced with the sg_write_same command
      
        # sg_write_same --num 1 --xferlen 512 /dev/sda
        #
      
        # capsh --drop=cap_sys_rawio -- -c \
          'sg_write_same --num 1 --xferlen 512 /dev/sda'
          Write same: pass through os error: Operation not permitted
        #
      
      For comparison, the WRITE_VERIFY command does not observe this problem,
      since it is in that list:
      
        # capsh --drop=cap_sys_rawio -- -c \
          'sg_write_verify --num 1 --ilen 512 --lba 0 /dev/sda'
        #
      
      So, this patch adds the WRITE_SAME commands to the list, in order
      for the SG_IO ioctl to finish successfully:
      
        # capsh --drop=cap_sys_rawio -- -c \
          'sg_write_same --num 1 --xferlen 512 /dev/sda'
        #
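
The filtering described above can be sketched as a per-opcode allow list that is consulted only when the caller lacks CAP_SYS_RAWIO. This is a simplified model with assumptions, not the kernel's implementation: cmd_filter, filter_allow_write and verify_command are hypothetical names, and the read-side list is omitted. The opcode values are the standard SCSI ones (0x41 also appears in the CDB logged further down):

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define OP_WRITE_VERIFY  0x2e
#define OP_WRITE_SAME    0x41
#define OP_WRITE_SAME_16 0x93

/* One bit per SCSI opcode: set means "allowed for unprivileged writers". */
struct cmd_filter {
    uint8_t write_ok[256 / 8];
};

static void filter_allow_write(struct cmd_filter *f, uint8_t op)
{
    f->write_ok[op / 8] |= (uint8_t)(1u << (op % 8));
}

/* Returns 0 if the command may proceed, -EPERM otherwise. */
static int verify_command(const struct cmd_filter *f, uint8_t op,
                          bool has_rawio, bool opened_for_write)
{
    if (has_rawio)  /* privileged callers bypass the filter */
        return 0;
    if (opened_for_write && (f->write_ok[op / 8] & (1u << (op % 8))))
        return 0;
    return -EPERM;
}
```

In this model, a filter that lists only OP_WRITE_VERIFY rejects OP_WRITE_SAME for an unprivileged caller, and adding the opcode with filter_allow_write() lets the same call succeed.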
      
      That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices
      (qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2]),
      where QEMU employs the SG_IO ioctl() and runs as an unprivileged user (libvirt-qemu).
      
      In that scenario, when a filesystem (e.g., ext4) performs its zero-out calls,
      which are translated to write-same calls in the guest kernel, and then into
      SG_IO ioctls to the host kernel, SCSI I/O errors may be observed in the guest:
      
        [...] sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
        [...] sd 0:0:0:0: [sda] tag#0 Sense Key : Aborted Command [current]
        [...] sd 0:0:0:0: [sda] tag#0 Add. Sense: I/O process terminated
        [...] sd 0:0:0:0: [sda] tag#0 CDB: Write Same(10) 41 00 01 04 e0 78 00 00 08 00
        [...] blk_update_request: I/O error, dev sda, sector 17096824
      
      Links:
      [1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52
      [2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device')
      Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Signed-off-by: Brahadambal Srinivasan <latha@linux.vnet.ibm.com>
      Reported-by: Manjunatha H R <manjuhr1@in.ibm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      596be9a8
    • Darrick J. Wong's avatar
      xfs: clear _XBF_PAGES from buffers when readahead page · 6cf4695d
      Darrick J. Wong authored
      commit 2aa6ba7b upstream.
      
      If we try to allocate memory pages to back an xfs_buf that we're trying
      to read, it's possible that we'll be so short on memory that the page
      allocation fails.  For a blocking read we'll just wait, but for
      readahead we simply dump all the pages we've collected so far.
      
      Unfortunately, after dumping the pages we neglect to clear the
      _XBF_PAGES state, which means that the subsequent call to xfs_buf_free
      thinks that b_pages still points to pages we own.  It then double-frees
      the b_pages pages.
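
A user-space sketch of the ownership-flag pattern involved, under stated assumptions: BUF_OWNS_PAGES stands in for _XBF_PAGES, and struct buf and its helpers are hypothetical, not XFS code. The point is that whoever dumps the pages early must also clear the flag, otherwise the later free path releases them a second time:

```c
#include <stdlib.h>

#define BUF_OWNS_PAGES 0x1u  /* hypothetical stand-in for _XBF_PAGES */
#define MAX_PAGES 4

struct buf {
    unsigned int flags;
    void *pages[MAX_PAGES];
    int page_count;
};

static int pages_freed;  /* counts releases so the behaviour can be checked */

static void release_pages(struct buf *b)
{
    for (int i = 0; i < b->page_count; i++) {
        free(b->pages[i]);
        pages_freed++;
    }
}

/* Collect up to n pages and mark the buffer as owning them. */
static void buf_collect_pages(struct buf *b, int n)
{
    b->flags |= BUF_OWNS_PAGES;
    b->page_count = 0;
    for (int i = 0; i < n && i < MAX_PAGES; i++)
        b->pages[b->page_count++] = malloc(64);
}

/* Readahead gives up: dump the collected pages and drop ownership. */
static void buf_readahead_abort(struct buf *b)
{
    release_pages(b);
    b->flags &= ~BUF_OWNS_PAGES;  /* without this, buf_free() frees again */
}

/* Final teardown: only release pages the buffer still owns. */
static void buf_free(struct buf *b)
{
    if (b->flags & BUF_OWNS_PAGES)
        release_pages(b);
}
```

After buf_collect_pages(&b, 3), buf_readahead_abort(&b) and buf_free(&b), each page has been released exactly once because the cleared flag makes buf_free() skip them.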
      
      This results in screaming about negative page refcounts from the memory
      manager, which xfs oughtn't be triggering.  To reproduce this case,
      mount a filesystem where the size of the inodes far outweighs the
      available memory (a ~500M inode filesystem on a VM with 300MB memory
      did the trick here) and run bulkstat in parallel with other memory
      eating processes to put a huge load on the system.  The "check summary"
      phase of xfs_scrub also works for this purpose.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Cc: Ivan Kozik <ivan@ludios.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      6cf4695d