1. 03 Aug, 2015 40 commits
    • dell-laptop: Fix allocating & freeing SMI buffer page · 4e5c0806
      Pali Rohár authored
      commit b8830a4e upstream.
      
      This commit fixes a kernel crash that occurs when probing for rfkill
      devices in the dell-laptop driver fails. free_page() was incorrectly
      called on a struct page * instead of the virtual address of the SMI
      buffer.
      
      This commit also simplifies allocation of the SMI buffer page by using
      __get_free_page() instead of sequential calls to alloc_page() and
      page_address().
      Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Darren Hart <dvhart@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • of/address: use atomic allocation in pci_register_io_range() · a35b0d6c
      Jingoo Han authored
      commit 294240ff upstream.
      
      When kzalloc() is called under spin_lock(), GFP_ATOMIC should be
      used to avoid sleeping allocation.
      The call tree is:
        of_pci_range_to_resource()
          --> pci_register_io_range() <-- takes spin_lock(&io_range_lock);
             --> kzalloc()
      Signed-off-by: Jingoo Han <jingoohan1@gmail.com>
      Signed-off-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ideapad: fix software rfkill setting · 0b99ffc6
      Arnd Bergmann authored
      commit 4b200b46 upstream.
      
      This fixes a several-year-old regression that I found while trying
      to get the Yoga 3 11 to work. The ideapad_rfk_set function is meant
      to send a command to the embedded controller through ACPI, but
      as of c1f73658, it sends the index of the rfkill device instead
      of the command, and ignores the opcode field.
      
      This changes it back to the original behavior, which indeed
      flips the rfkill state as seen in the debugfs interface.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: c1f73658 ("ideapad: pass ideapad_priv as argument (part 2)")
      Signed-off-by: Darren Hart <dvhart@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ideapad_laptop: Lenovo G50-30 fix rfkill reports wireless blocked · 65f5cac6
      Dmitry Tunin authored
      commit 4fa9dabc upstream.
      
      Lenovo G50-30 does not have a hardware wireless switch, so rfkill always
      reports wireless as blocked.
      
      BugLink: https://bugs.launchpad.net/bugs/1397021
      Signed-off-by: Dmitry Tunin <hanipouspilot@gmail.com>
      Signed-off-by: Philippe Coval <philippe.coval@open.eurogiciel.org>
      [dvhart@linux.intel.com: Reordered dmi id per Philippe's later version]
      Signed-off-by: Darren Hart <dvhart@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • clocksource: exynos_mct: Avoid blocking calls in the cpu hotplug notifier · 4f0b316f
      Damian Eppel authored
      commit 56a94f13 upstream.
      
      Whilst testing cpu hotplug events on a kernel configured with
      DEBUG_PREEMPT and DEBUG_ATOMIC_SLEEP, we get the following BUG message,
      caused by calling request_irq() and free_irq() in the context of
      hotplug notification (which is in this case atomic context).
      
      [   40.785859] CPU1: Software reset
      [   40.786660] BUG: sleeping function called from invalid context at mm/slub.c:1241
      [   40.786668] in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper/1
      [   40.786678] Preemption disabled at:[<  (null)>]   (null)
      [   40.786681]
      [   40.786692] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc4-00024-g7dca860 #36
      [   40.786698] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
      [   40.786728] [<c0014a00>] (unwind_backtrace) from [<c0011980>] (show_stack+0x10/0x14)
      [   40.786747] [<c0011980>] (show_stack) from [<c0449ba0>] (dump_stack+0x70/0xbc)
      [   40.786767] [<c0449ba0>] (dump_stack) from [<c00c6124>] (kmem_cache_alloc+0xd8/0x170)
      [   40.786785] [<c00c6124>] (kmem_cache_alloc) from [<c005d6f8>] (request_threaded_irq+0x64/0x128)
      [   40.786804] [<c005d6f8>] (request_threaded_irq) from [<c0350b8c>] (exynos4_local_timer_setup+0xc0/0x13c)
      [   40.786820] [<c0350b8c>] (exynos4_local_timer_setup) from [<c0350ca8>] (exynos4_mct_cpu_notify+0x30/0xa8)
      [   40.786838] [<c0350ca8>] (exynos4_mct_cpu_notify) from [<c003b330>] (notifier_call_chain+0x44/0x84)
      [   40.786857] [<c003b330>] (notifier_call_chain) from [<c0022fd4>] (__cpu_notify+0x28/0x44)
      [   40.786873] [<c0022fd4>] (__cpu_notify) from [<c0013714>] (secondary_start_kernel+0xec/0x150)
      [   40.786886] [<c0013714>] (secondary_start_kernel) from [<40008764>] (0x40008764)
      
      Interrupts cannot be requested/freed in the CPU_STARTING/CPU_DYING
      notifications which run on the hotplugged cpu with interrupts and
      preemption disabled.
      
      To avoid the issue, request the interrupts for all possible cpus in
      the boot code. The interrupts are marked NO_AUTOENABLE to avoid a racy
      request_irq/disable_irq() sequence. The flag prevents the
      request_irq() code from enabling the interrupt immediately.
      
      The interrupt is then enabled in the CPU_STARTING notifier of the
      hotplugged cpu and again disabled with disable_irq_nosync() in the
      CPU_DYING notifier.
      
      [ tglx: Massaged changelog to match the patch ]
      
      Fixes: 7114cd74 ("clocksource: exynos_mct: use (request/free)_irq calls for local timer registration")
      Reported-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Tested-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Tested-by: Marcin Jabrzyk <m.jabrzyk@samsung.com>
      Signed-off-by: Damian Eppel <d.eppel@samsung.com>
      Cc: m.szyprowski@samsung.com
      Cc: kyungmin.park@samsung.com
      Cc: daniel.lezcano@linaro.org
      Cc: kgene@kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/1435324984-7328-1-git-send-email-d.eppel@samsung.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • e1000e: Cleanup handling of VLAN_HLEN as a part of max frame size · 8b7c99ee
      Alexander Duyck authored
      commit 8084b86d upstream.
      
      When the VLAN_HLEN was added to the calculation for the maximum frame size
      there seems to have been a number of issues added to the driver.
      
      The first issue is that in some cases the maximum frame size for a device
      never really reached the actual maximum frame size as the VLAN header
      length was not included in the calculation for that value.  As a result some
      parts only supported a maximum frame size of either 1496 in the case of
      parts that didn't support jumbo frames, and 8996 in the case of the parts
      that do.
      
      The second issue is the fact that there were several checks that weren't
      updated so as a result setting an MTU of 1500 was treated as enabling jumbo
      frames as the calculated value was 1522 instead of 1518.  I have addressed
      those by replacing ETH_FRAME_LEN with VLAN_ETH_FRAME_LEN where appropriate.
      
      The final issue was the fact that lowering the MTU below 1500 would cause
      the driver to allocate 2K buffers for the rings.  This is an old issue that
      was fixed several years ago in igb/ixgbe and I am addressing now by just
      replacing == with a <= so that we always just round up to 1522 for anything
      that isn't a jumbo frame.
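
The arithmetic behind the second and third issues can be sketched in plain C. This is only an illustration, not the driver's code: the helper names (max_frame_size, is_jumbo_old, is_jumbo_new, rx_buffer_len) are hypothetical, the jumbo branch of the buffer sizing is deliberately simplified, and only the header-length constants are meant to match the usual Ethernet/VLAN values.

```c
#include <assert.h>
#include <stdbool.h>

/* Usual Ethernet/VLAN sizes in bytes. */
#define ETH_HLEN            14   /* destination + source + ethertype */
#define ETH_FCS_LEN          4   /* frame check sequence */
#define VLAN_HLEN            4   /* 802.1Q tag */
#define ETH_FRAME_LEN     1514   /* 1500 payload + ETH_HLEN */
#define VLAN_ETH_FRAME_LEN (ETH_FRAME_LEN + VLAN_HLEN)

static_assert(VLAN_ETH_FRAME_LEN == 1518, "VLAN frame length without FCS");

/* Hypothetical helper: on-wire frame size for an MTU, VLAN tag included. */
int max_frame_size(int mtu)
{
    return mtu + ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN;
}

/* Old-style check: compares against a limit that ignores the VLAN tag. */
bool is_jumbo_old(int mtu)
{
    return max_frame_size(mtu) > ETH_FRAME_LEN + ETH_FCS_LEN;
}

/* Check after switching the limit to VLAN_ETH_FRAME_LEN. */
bool is_jumbo_new(int mtu)
{
    return max_frame_size(mtu) > VLAN_ETH_FRAME_LEN + ETH_FCS_LEN;
}

/*
 * Hypothetical, simplified buffer sizing: "<=" (rather than "==") rounds
 * every non-jumbo frame up to 1522 bytes instead of falling back to 2K.
 */
int rx_buffer_len(int mtu)
{
    int frame = max_frame_size(mtu);

    if (frame <= VLAN_ETH_FRAME_LEN + ETH_FCS_LEN)
        return VLAN_ETH_FRAME_LEN + ETH_FCS_LEN;
    return frame <= 2048 ? 2048 : 4096;
}
```

With these definitions an MTU of 1500 yields a 1522-byte frame, which the old check classifies as jumbo and the new check does not.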
      
      Fixes: c751a3d5 ("e1000e: Correctly include VLAN_HLEN when changing interface MTU")
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mac80211: prevent possible crypto tx tailroom corruption · 063c47a0
      Michal Kazior authored
      commit ab499db8 upstream.
      
      There was a possible race between
      ieee80211_reconfig() and
      ieee80211_delayed_tailroom_dec(). This could
      result in inability to transmit data if driver
      crashed during roaming or rekeying and subsequent
      skbs with insufficient tailroom appeared.
      
      This race was probably never seen in the wild
      because a device driver would have to crash AND
      recover within 0.5s which is very unlikely.
      
      I was able to prove this race exists after
      changing the delay to 10s locally and crashing
      ath10k via debugfs immediately after GTK
      rekeying. In case of ath10k the counter went below
      0. This was harmless but other drivers which
      actually require tailroom (e.g. for WEP ICV or
      MMIC) could end up with the counter at 0 instead
      of >0 and introduce insufficient skb tailroom
      failures because mac80211 would not resize skbs
      appropriately anymore.
      
      Fixes: 8d1f7ecd ("mac80211: defer tailroom counter manipulation when roaming")
      Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cfg80211: ignore netif running state when changing iftype · 7b53ca5c
      Michal Kazior authored
      commit 6cbfb1bb upstream.
      
      It was possible for mac80211 to be coerced into an
      unexpected flow causing sdata union to become
      corrupted. Station pointer was put into
      sdata->u.vlan.sta memory location while it was
      really master AP's sdata->u.ap.next_beacon. This
      led to station entry being later freed as
      next_beacon before __sta_info_flush() in
      ieee80211_stop_ap() and a subsequent invalid
      pointer dereference crash.
      
      The problem was that ieee80211_ptr->use_4addr
      wasn't cleared on interface type changes.
      
      This could be reproduced with the following steps:
      
       # host A and host B have just booted; no
       # wpa_s/hostapd running; all vifs are down
       host A> iw wlan0 set type station
       host A> iw wlan0 set 4addr on
       host A> printf 'interface=wlan0\nssid=4addrcrash\nchannel=1\nwds_sta=1' > /tmp/hconf
       host A> hostapd -B /tmp/hconf
       host B> iw wlan0 set 4addr on
       host B> ifconfig wlan0 up
       host B> iw wlan0 connect -w hostAssid
       host A> pkill hostapd
       # host A crashed:
      
       [  127.928192] BUG: unable to handle kernel NULL pointer dereference at 00000000000006c8
       [  127.929014] IP: [<ffffffff816f4f32>] __sta_info_flush+0xac/0x158
       ...
       [  127.934578]  [<ffffffff8170789e>] ieee80211_stop_ap+0x139/0x26c
       [  127.934578]  [<ffffffff8100498f>] ? dump_trace+0x279/0x28a
       [  127.934578]  [<ffffffff816dc661>] __cfg80211_stop_ap+0x84/0x191
       [  127.934578]  [<ffffffff816dc7ad>] cfg80211_stop_ap+0x3f/0x58
       [  127.934578]  [<ffffffff816c5ad6>] nl80211_stop_ap+0x1b/0x1d
       [  127.934578]  [<ffffffff815e53f8>] genl_family_rcv_msg+0x259/0x2b5
      
      Note: This isn't a revert of f8cdddb8
      ("cfg80211: check iface combinations only when
      iface is running") as far as functionality is
      considered because b6a55015 ("cfg80211/mac80211:
      move more combination checks to mac80211") moved
      the logic somewhere else already.
      
      Fixes: f8cdddb8 ("cfg80211: check iface combinations only when iface is running")
      Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • iwlwifi: mvm: fix ROC reference accounting · a35fe5b6
      Eliad Peller authored
      commit c779273b upstream.
      
      commit b112889c ("iwlwifi: mvm: add Aux ROC request/response flow")
      added aux ROC flow in addition to the existing ROC flow. While doing
      it, it moved the ROC reference release to a common work item, which
      is being called for both the ROC and aux ROC flows.
      
      This resulted in invalid reference accounting, as no reference was
      taken in case of aux ROC, while a reference was released on completion.
      
      Fix it by adding a reference for the aux ROC as well, and release
      only the relevant references on completion (according to the set bits).
      
      While at it, convert cancel_work_sync() to flush_work(), in order
      to make sure the references are being cleaned properly.
      
      Fixes: b112889c ("iwlwifi: mvm: add Aux ROC request/response flow")
      Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
      Reviewed-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mac80211: fix the beacon csa counter for mesh and ibss · 80879086
      Chun-Yeow Yeoh authored
      commit 8df734e8 upstream.
      
      The csa counter has moved from sdata to beacon/presp but
      it is not updated accordingly for mesh and ibss. Fix this.
      
      Fixes: af296bdb ("mac80211: move csa counters from sdata to beacon/presp")
      Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • security_syslog() should be called once only · ae41bfc6
      Vasily Averin authored
      commit d194e5d6 upstream.
      
      The final version of commit 637241a9 ("kmsg: honor dmesg_restrict
      sysctl on /dev/kmsg") lost a few hooks; as a result, security_syslog()
      calls are processed incorrectly:
      
      - open of /dev/kmsg checks syslog access permissions by using
        check_syslog_permissions() where security_syslog() is not called if
        dmesg_restrict is set.
      
      - syslog syscall and /proc/kmsg calls do_syslog() where security_syslog
        can be executed twice (inside check_syslog_permissions() and then
        directly in do_syslog())
      
      With this patch security_syslog() is called once only in all
      syslog-related operations regardless of dmesg_restrict value.
      
      Fixes: 637241a9 ("kmsg: honor dmesg_restrict sysctl on /dev/kmsg")
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Josh Boyer <jwboyer@redhat.com>
      Cc: Eric Paris <eparis@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • __bitmap_parselist: fix bug in empty string handling · f843b096
      Chris Metcalf authored
      commit 2528a8b8 upstream.
      
      bitmap_parselist("", &mask, nmaskbits) will erroneously set bit zero in
      the mask.  The same bug is visible in cpumask_parselist() since it is
      layered on top of the bitmask code, e.g.  if you boot with "isolcpus=",
      you will actually end up with cpu zero isolated.
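
The expected behaviour for an empty list can be illustrated with a small userspace parser. This is a simplified stand-in written for this note, not the kernel's __bitmap_parselist(): the name parse_list, the 64-bit limit, and the error convention are all assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a list such as "0-3,5" into a bit mask of at most 64 bits.
 * An empty string is valid and yields an empty mask.
 * Returns 0 on success, -1 on malformed input or an out-of-range bit.
 */
int parse_list(const char *s, uint64_t *mask, unsigned int nbits)
{
    assert(s != NULL && mask != NULL);

    *mask = 0;                      /* start from an empty mask */
    if (nbits > 64)
        return -1;

    while (*s != '\0') {
        char *end;
        unsigned long a, b;

        if (!isdigit((unsigned char)*s))
            return -1;
        a = strtoul(s, &end, 10);
        b = a;                      /* single number: range of one */
        if (*end == '-') {
            s = end + 1;
            if (!isdigit((unsigned char)*s))
                return -1;
            b = strtoul(s, &end, 10);
        }
        if (a > b || b >= nbits)
            return -1;
        for (unsigned long i = a; i <= b; i++)
            *mask |= (uint64_t)1 << i;

        if (*end == ',')
            end++;                  /* skip the separator */
        else if (*end != '\0')
            return -1;
        s = end;
    }
    return 0;
}
```

Because bits are only set inside the loop body, an empty input never touches bit zero. The sketch tolerates a trailing comma, which may or may not match what a real parser accepts.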
      
      The bug was introduced in commit 4b060420 ("bitmap, irq: add
      smp_affinity_list interface to /proc/irq") when bitmap_parselist() was
      generalized to support userspace as well as kernelspace.
      
      Fixes: 4b060420 ("bitmap, irq: add smp_affinity_list interface to /proc/irq")
      Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • compiler-intel: fix wrong compiler barrier() macro · 9116f601
      Daniel Borkmann authored
      commit b86a50c3 upstream.
      
      Cleanup commit 73679e50 ("compiler-intel.h: Remove duplicate
      definition") removed the double definition of __memory_barrier()
      intrinsics.
      
      However, in doing so, it also removed the preceding #undef barrier by
      accident, meaning, the actual barrier() macro from compiler-gcc.h with
      inline asm is still in place as __GNUC__ is provided.
      
      Subsequently, barrier() can never be defined as __memory_barrier() from
      compiler.h since it already has a definition in place and if we trust
      the comment in compiler-intel.h, ecc doesn't support gcc specific asm
      statements.
      
      I don't have an ecc at hand (unsure if that's still used in the field?)
      and only found this by accident during code review; a revert of that
      cleanup would be the simplest option.
      
      Fixes: 73679e50 ("compiler-intel.h: Remove duplicate definition")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
      Cc: Pranith Kumar <bobby.prani@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: mancha security <mancha1@zoho.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • firmware: dmi_scan: Only honor end-of-table for 64-bit tables · 534cc628
      Jean Delvare authored
      commit 17cd5bd5 upstream.
      
      A 32-bit entry point to a DMI table says how many structures the table
      contains. The SMBIOS specification explicitly says that end-of-table
      markers should be ignored if they are not actually at the end of the
      DMI table. So only honor the end-of-table marker for tables accessed
      through 64-bit entry points, as they do not specify a structure count.
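
The rule described above can be sketched as a userspace table walker. This is an illustration under assumptions, not the kernel's dmi_scan code: dmi_walk is a hypothetical name, and a structure count of 0 is used here to mean "count unknown", as with a 64-bit entry point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMI_END_OF_TABLE 127   /* SMBIOS end-of-table structure type */

/*
 * Walk the structures in buf and return how many were visited.
 * Each structure is a header (type, length, 2-byte handle) followed by
 * a string set terminated by two NUL bytes.
 * num > 0:  32-bit entry point, visit exactly that many structures and
 *           ignore any end-of-table marker.
 * num == 0: 64-bit entry point, stop at the end-of-table marker.
 */
unsigned int dmi_walk(const uint8_t *buf, size_t len, unsigned int num)
{
    size_t pos = 0;
    unsigned int seen = 0;

    assert(buf != NULL);

    while ((num == 0 || seen < num) && pos + 4 <= len) {
        uint8_t type = buf[pos];
        uint8_t hlen = buf[pos + 1];
        size_t p;

        if (hlen < 4)
            break;                  /* malformed header */

        /* Skip the formatted area, then scan for the double NUL. */
        p = pos + hlen;
        while (p + 1 < len && (buf[p] != 0 || buf[p + 1] != 0))
            p++;
        if (p + 1 >= len)
            break;                  /* truncated structure */

        seen++;
        if (num == 0 && type == DMI_END_OF_TABLE)
            break;                  /* only honored without a count */
        pos = p + 2;
    }
    return seen;
}
```

For a table that holds an end-of-table marker in the middle, passing the real structure count walks past the marker, while passing 0 stops at it.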
      
      Fixes: fc430262 ("dmi: add support for SMBIOS 3.0 64-bit entry point")
      Signed-off-by: Jean Delvare <jdelvare@suse.de>
      Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Leif Lindholm <leif.lindholm@linaro.org>
      Cc: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PM / sleep: Increase default DPM watchdog timeout to 60 · 01fed233
      Takashi Iwai authored
      commit fff3b16d upstream.
      
      Many hard disks (mostly WD ones) have firmware problems and take too
      long, more than 10 seconds, to resume from suspend.  This often
      exceeds the default DPM watchdog timeout (12 seconds), resulting in a
      sudden kernel panic.
      
      Since most distros just take the default as is, we should provide a
      safer value.  This patch increases the default value from 12
      seconds to one minute, which has been confirmed to be long enough for
      such problematic disks.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=91921
      Fixes: 70fea60d (PM / Sleep: Detect device suspend/resume lockup and log event)
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm/hugetlb: introduce minimum hugepage order · c791ad1e
      Naoya Horiguchi authored
      commit 641844f5 upstream.
      
      Currently the initial value of order in dissolve_free_huge_pages() is 64
      or 32, which leads to the following warning from a static checker:
      
        mm/hugetlb.c:1203 dissolve_free_huge_pages()
        warn: potential right shift more than type allows '9,18,64'
      
      This is a potential risk of infinite loop, because 1 << order (== 0) is used
      in for-loop like this:
      
        for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << order)
            ...
      
      So this patch fixes it by using global minimum_order calculated at boot time.
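
A userspace sketch of the idea follows. It is not the kernel code: min_hugepage_order and count_scan_steps are hypothetical helpers, and returning UINT_MAX for an empty list is an assumption made here so that a caller can refuse to shift by an out-of-range amount (which is undefined behaviour in C and is the source of the zero stride described above).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Smallest order among the given huge page orders, or UINT_MAX when
 * the list is empty (no valid stride exists in that case).
 */
unsigned int min_hugepage_order(const unsigned int *orders, size_t n)
{
    unsigned int min = UINT_MAX;

    assert(orders != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
        if (orders[i] < min)
            min = orders[i];
    return min;
}

/*
 * Count the iterations of a scan over [start_pfn, end_pfn) with a
 * stride of 1 << order. An order that is not a valid shift count for
 * unsigned long is rejected up front instead of being shifted.
 */
unsigned long count_scan_steps(unsigned long start_pfn,
                               unsigned long end_pfn,
                               unsigned int order)
{
    unsigned long steps = 0;

    if (order >= sizeof(unsigned long) * CHAR_BIT)
        return 0;

    for (unsigned long pfn = start_pfn; pfn < end_pfn; pfn += 1UL << order)
        steps++;
    return steps;
}
```

Computing the minimum once and validating it before the loop keeps the stride strictly positive, so the scan always makes progress.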
      
          text    data     bss     dec     hex filename
         28313     469   84236  113018   1b97a mm/hugetlb.o
         28256     473   84236  112965   1b945 mm/hugetlb.o (patched)
      
      Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tty: remove platform_sysrq_reset_seq · 0bcd7774
      Arnd Bergmann authored
      commit ffb6e0c9 upstream.
      
      The platform_sysrq_reset_seq code was intended as a way for an embedded
      platform to provide its own sysrq sequence at compile time. After over two
      years, nobody has started using it in an upstream kernel, and the platforms
      that were interested in it have moved on to devicetree, which can be used
      to configure the sequence without requiring kernel changes. The method is
      also incompatible with the way that most architectures build support for
      multiple platforms into a single kernel.
      
      Now the code is producing warnings when built with gcc-5.1:
      
      drivers/tty/sysrq.c: In function 'sysrq_init':
      drivers/tty/sysrq.c:959:33: warning: array subscript is above array bounds [-Warray-bounds]
         key = platform_sysrq_reset_seq[i];
      
      We could fix this, but it seems unlikely that it will ever be used, so
      let's just remove the code instead. We still have the option to pass the
      sequence either in DT, using the kernel command line, or using the
      /sys/module/sysrq/parameters/reset_seq file.
      
      Fixes: 154b7a48 ("Input: sysrq - allow specifying alternate reset sequence")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • RDMA/ocrdma: fix double free on pd · f354666d
      Colin Ian King authored
      commit 4dc54442 upstream.
      
      A reorganisation of the PD allocation and deallocation in commit
      9ba1377d ("RDMA/ocrdma: Move PD resource management to driver.")
      introduced a double free on pd, as detected by static analysis by
      smatch:
      
      drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:682 ocrdma_alloc_pd()
        error: double free of 'pd'^
      
      The original call to ocrdma_mbx_dealloc_pd() (which does not kfree
      pd) was replaced with a call to _ocrdma_dealloc_pd() (which does
      kfree pd).  The kfree following this call causes the double free,
      so just remove it to fix the problem.
      
      Fixes: 9ba1377d ("RDMA/ocrdma: Move PD resource management to driver.")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Acked-by: Devesh Sharma <devesh.sharma@avagotech.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PM / clk: Fix clock error check in __pm_clk_add() · 32419b85
      Geert Uytterhoeven authored
      commit 3fc3a0be upstream.
      
      In the final iteration of commit 245bd6f6 ("PM / clock_ops: Add
      pm_clk_add_clk()"), a refcount increment was added by Grygorii Strashko.
      However, the accompanying IS_ERR() check operates on the wrong clock
      pointer, which is always zero at this point, i.e. not an error.
      This may lead to a NULL pointer dereference later, when __clk_get()
      tries to dereference an error pointer.
      
      Check the passed clock pointer instead to fix this.
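
The class of bug can be reproduced in userspace with a small emulation of the kernel's error-pointer convention. Everything below is a sketch under assumptions: err_ptr/is_err imitate ERR_PTR()/IS_ERR(), and struct clock_entry, add_clk_buggy and add_clk_fixed are hypothetical names, not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* True if the pointer lies in the top MAX_ERRNO addresses. */
static inline bool is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct clock_entry {
    void *clk;
};

/* Buggy variant: checks the freshly zeroed field, never the argument. */
int add_clk_buggy(struct clock_entry *ce, void *clk)
{
    assert(ce != NULL);
    ce->clk = NULL;                 /* stands in for a zeroed allocation */
    if (is_err(ce->clk))            /* always false: NULL is not an error */
        return -ENOENT;
    ce->clk = clk;                  /* an error pointer slips through */
    return 0;
}

/* Fixed variant: checks the pointer that was actually passed in. */
int add_clk_fixed(struct clock_entry *ce, void *clk)
{
    assert(ce != NULL);
    ce->clk = NULL;
    if (is_err(clk))
        return -ENOENT;
    ce->clk = clk;
    return 0;
}
```

In the buggy variant an error pointer is stored as if it were a valid clock, so the failure only surfaces later when something dereferences it.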
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Fixes: 245bd6f6 ("PM / clock_ops: Add pm_clk_add_clk()")
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mmc: sdhci: Restore behavior while creating OCR mask · 55df3292
      Ulf Hansson authored
      commit 5fd26c7e upstream.
      
      Commit 3a48edc4 ("mmc: sdhci: Use mmc core regulator infrastucture")
      changed the behavior for how to assign the ocr_avail mask for the mmc
      host. More precisely it started to mask the bits instead of assigning
      them.
      
      Restore the behavior, but also make it clear that an OCR mask created
      from an external regulator overrides the other ones. The OCR mask is
      determined by one of the following with this priority:
      
      1. Supported ranges of external regulator if one supplies VDD
      2. Host OCR mask if set by the driver (based on DT properties)
      3. The capabilities reported by the controller itself
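
The priority order listed above, and the difference between masking and assigning, can be sketched as follows. The function names are hypothetical, a mask of 0 is assumed to mean "this source is not available", and the voltage-window bit values are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative OCR voltage-window bits. */
#define VDD_165_195 0x00000080u
#define VDD_32_33   0x00100000u
#define VDD_33_34   0x00200000u

/* Disjoint windows: AND-ing such masks leaves no usable voltage at all. */
static_assert((VDD_165_195 & (VDD_32_33 | VDD_33_34)) == 0,
              "voltage windows must not overlap in this example");

/* Regressed behaviour: the regulator mask only narrows the capabilities. */
uint32_t ocr_masked(uint32_t caps_mask, uint32_t regulator_mask)
{
    return caps_mask & regulator_mask;
}

/*
 * Restored behaviour: pick the first available source by priority.
 * 1. external regulator, 2. host/driver supplied mask, 3. controller caps.
 */
uint32_t pick_ocr_mask(uint32_t regulator_mask, uint32_t host_mask,
                       uint32_t caps_mask)
{
    if (regulator_mask)
        return regulator_mask;
    if (host_mask)
        return host_mask;
    return caps_mask;
}
```

With a regulator that supplies only 1.8 V and a controller that advertises only 3.3 V, masking produces an empty OCR, whereas the priority scheme returns the regulator's range.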
      
      Fixes: 3a48edc4 ("mmc: sdhci: Use mmc core regulator infrastucture")
      Cc: Tim Kryger <tim.kryger@gmail.com>
      Reported-by: Yangbo Lu <yangbo.lu@freescale.com>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Reviewed-by: Tim Kryger <tim.kryger@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mmc: card: Fixup request missing in mmc_blk_issue_rw_rq · f213f0f7
      Ding Wang authored
      commit 29535f7b upstream.
      
      The current handler of MMC_BLK_CMD_ERR in mmc_blk_issue_rw_rq() may
      cause a newly arrived request to go permanently missing when the
      ongoing (previously started) request completes.
      
      The problem scenario is as follows:
      (1) Request A is ongoing;
      (2) Request B arrives, and eventually mmc_blk_issue_rw_rq() is called;
      (3) Request A encounters the MMC_BLK_CMD_ERR error;
      (4) In the error handling of MMC_BLK_CMD_ERR, suppose mmc_blk_cmd_err()
          ends request A as completed and returns zero. Error handling
          continues, and suppose mmc_blk_reset() resets the device successfully;
      (5) Execution continues, and the while loop terminates because the
          variable ret is now zero;
      (6) Finally, mmc_blk_issue_rw_rq() returns without processing request B.
      
      The process that issued the missing request may wait forever for the
      I/O request to complete, possibly crashing the application or hanging
      the system.
      
      Fix this issue by starting the new request when the reset succeeds.
      Signed-off-by: Ding Wang <justin.wang@spreadtrum.com>
      Fixes: 67716327 ("mmc: block: add eMMC hardware reset support")
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • serial: samsung: only use earlycon for console · 06ab12e6
      Arnd Bergmann authored
      commit 357d5615 upstream.
      
      A configuration that enables earlycon but not the core console
      code causes a link error:
      
        drivers/built-in.o: In function `setup_earlycon':
        drivers/tty/serial/earlycon.c:70: undefined reference to `uart_parse_earlycon'
      
      That error can be triggered by the newly added samsung earlycon support,
      which is missing a 'select' statement.
      
      As suggested by Peter Hurley, solve the problem by moving the
      'select SERIAL_EARLYCON' statement to the samsung console driver
      option, as it is done by all other console drivers.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: b94ba032 ("serial: samsung: Add support for early console")
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ACPI / PCI: Fix regressions caused by resource_size_t overflow with 32-bit kernel · 1d7a398b
      Jiang Liu authored
      commit 1fb01ca9 upstream.
      
      Zoltan Boszormenyi reported this regression:
        "There's a Realtek RTL8111/8168/8411 (PCI ID 10ec:8168, Subsystem ID
         1565:230e) network chip on the mainboard. After the r8169 driver loaded
         the IRQs in the machine went berserk. Keyboard keypressed arrived with
         considerable latency and duplicated, so no real work was possible.
         The machine responded to the power button but didn't actually power
         down. It just stuck at the powering down message. I had to press the
         power button for 4 seconds to power it down.
      
         The computer is a POS machine with a big battery inside. Because of this,
         either ACPI or the Realtek chip kept the bad state and after rebooting,
         the network chip didn't even show up in lspci. Not even the PXE ROM
         announced itself during boot. I had to disconnect the battery to beat
         some sense back to the computer.
      
         The regression happens with 4.0.5, 4.1.0-rc8 and 4.1.0-final. 3.18.16 was
         good."
      
      The regression is caused by commit 593669c2 (x86/PCI/ACPI: Use common
      ACPI resource interfaces to simplify implementation). Since commit
      593669c2, x86 PCI ACPI host bridge driver validates ACPI resources by
      first converting an ACPI resource to a 'struct resource' structure and
      then applying checks against the converted resource structure. The 'start'
      and 'end' fields in 'struct resource' are defined to be type of
      resource_size_t, which may be 32 bits or 64 bits depending on
      CONFIG_PHYS_ADDR_T_64BIT.
      
      This may cause incorrect resource validation results with 32-bit kernels
      because 64-bit ACPI resource descriptors may get truncated when converting
      to 32-bit 'start' and 'end' fields in 'struct resource'. It eventually
      affects PCI resource allocation subsystem and makes some PCI devices and
      the system behave abnormally due to incorrect resource assignment.
      
      So enhance the ACPI resource parsing interfaces to ignore ACPI resource
      descriptors with address/offset above 4G when running in 32-bit mode.
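      
      As a rough illustration of the truncation described above, the following
      Python sketch models a 64-bit descriptor being stored in 32-bit 'start'
      and 'end' fields. It is not the kernel code: the helper names and the
      example window are hypothetical, and masking with 0xFFFFFFFF merely stands
      in for assigning a 64-bit value to a 32-bit resource_size_t.

```python
U32_MAX = 0xFFFFFFFF  # largest value a 32-bit resource_size_t can hold


def convert_truncating(start, end):
    """Model of the regression: both ends are silently cut to 32 bits."""
    return start & U32_MAX, end & U32_MAX


def convert_checked(start, end):
    """Model of the fix: skip descriptors that do not fit in 32 bits."""
    if start > U32_MAX or end > U32_MAX:
        return None  # descriptor ignored on a 32-bit kernel
    return start, end


# Hypothetical host bridge window located just above 4G.
window = (0x1_0000_0000, 0x1_0FFF_FFFF)

print([hex(v) for v in convert_truncating(*window)])
print(convert_checked(*window))
```

      In this model the truncated window appears to start at address 0, which
      no longer describes the hardware, while the checked conversion simply
      drops the descriptor.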
      
      With the fix applied, the behavior of the machine was restored to how
      3.18.16 worked, i.e. the memory range that is over 4GB is ignored again,
      and lspci -vvxxx shows that everything is at the same memory window as
      it was with 3.18.16.
      Reported-and-tested-by: Boszormenyi Zoltan <zboszor@pr.hu>
      Fixes: 593669c2 (x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation)
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d7a398b
    • ACPICA: Tables: Enable default 64-bit FADT addresses favor · 24b2b68e
      Lv Zheng authored
      commit 0ea61381 upstream.
      
      ACPICA commit 4da56eeae0749dfe8491285c1e1fad48f6efafd8
      
      The following commit temporarily disables the correct favoring of 64-bit
      FADT addresses while the root cause of the bug remains unfixed:
       Commit: 85dbd580
       ACPICA: Tables: Restore old behavor to favor 32-bit FADT addresses.
      
      With enough protections in place, this patch re-enables 64-bit FADT
      addresses by default. If regressions are reported against this change, this
      patch should be bisected and reverted.
      Note that favoring the 64-bit FACS and the 64-bit firmware waking vector is
      excluded from this commit in order not to break OSPMs. Lv Zheng.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=74021
      Link: https://github.com/acpica/acpica/commit/4da56eea
      Reported-and-tested-by: Oswald Buddenhagen <ossi@kde.org>
      Signed-off-by: Lv Zheng <lv.zheng@intel.com>
      Signed-off-by: Bob Moore <robert.moore@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      24b2b68e
    • ACPICA: Tables: Fix an issue that FACS initialization is performed twice · c0f23125
      Lv Zheng authored
      commit c04be184 upstream.
      
      ACPICA commit 90f5332a15e9d9ba83831ca700b2b9f708274658
      
      This patch adds a new FACS initialization flag for acpi_tb_initialize().
      acpi_enable_subsystem() might be invoked several times during the OS boot
      process, and we don't want FACS initialization to be performed twice. Lv Zheng.
      
      Link: https://github.com/acpica/acpica/commit/90f5332a
      Signed-off-by: Lv Zheng <lv.zheng@intel.com>
      Signed-off-by: Bob Moore <robert.moore@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c0f23125
    • ACPICA: Tables: Enable both 32-bit and 64-bit FACS · b1bce17e
      Lv Zheng authored
      commit c04e1fb4 upstream.
      
      ACPICA commit f7b86f35416e3d1f71c3d816ff5075ddd33ed486
      
      The following commit is reported to have broken s2ram on some platforms:
       Commit: 0249ed24
       ACPICA: Add option to favor 32-bit FADT addresses.
      The platform reports 2 FACS tables (which is not allowed by the ACPI
      specification) and the new 32-bit address favor rule forces OSPMs to use
      the FACS table reported via FADT's X_FIRMWARE_CTRL field.
      
      The root cause of the reported bug might be one of the following:
      1. BIOS may favor the 64-bit firmware waking vector address when the
         version of the FACS is greater than 0 and Linux currently only supports
         resuming from the real mode, so the 64-bit firmware waking vector has
         never been set and might be invalid to BIOS while the commit enables
         higher version FACS.
      2. BIOS may favor the FACS reported via the "FIRMWARE_CTRL" field in the
         FADT while the commit doesn't set the firmware waking vector address of
         the FACS reported by "FIRMWARE_CTRL"; it only sets the firmware waking
         vector address of the FACS reported by "X_FIRMWARE_CTRL".
      
      This patch excludes the cases that can trigger the bugs caused by root
      cause 2.
      
      There is no handshaking mechanism that OSPM can use to tell BIOS which
      FACS is currently used. Thus the FACS reported by "FIRMWARE_CTRL" may still
      be used by BIOS and the 0 value of the 32-bit firmware waking vector might
      trigger such failure.
      
      This patch tries to favor the 32-bit FACS address in another way, where both
      the FACS reported by "FIRMWARE_CTRL" and the FACS reported by
      "X_FIRMWARE_CTRL" are loaded so that a further commit can set the firmware
      waking vector in both tables, ensuring we can exclude the cases that trigger
      the bugs caused by root cause 2. The exclusion is split into 2 commits
      because this commit is also useful for dumping more ACPI tables; it won't
      get reverted when such exclusion is no longer necessary. Lv Zheng.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=74021
      Link: https://github.com/acpica/acpica/commit/f7b86f35
      Reported-and-tested-by: Oswald Buddenhagen <ossi@kde.org>
      Signed-off-by: Lv Zheng <lv.zheng@intel.com>
      Signed-off-by: Bob Moore <robert.moore@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b1bce17e
    • ACPI / LPSS: Fix up acpi_lpss_create_device() · af3cc772
      Rafael J. Wysocki authored
      commit d3e13ff3 upstream.
      
      Fix a return value (which should be a negative error code) and a
      memory leak (the list allocated by acpi_dev_get_resources() needs
      to be freed on ioremap() errors too) in acpi_lpss_create_device()
      introduced by commit 4483d59e 'ACPI / LPSS: check the result
      of ioremap()'.
      
      Fixes: 4483d59e 'ACPI / LPSS: check the result of ioremap()'
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      af3cc772
    • ACPI / PNP: Reserve ACPI resources at the fs_initcall_sync stage · 3dfbf877
      Rafael J. Wysocki authored
      commit 0294112e upstream.
      
      This effectively reverts the following three commits:
      
       7bc10388 ACPI / resources: free memory on error in add_region_before()
       0f1b414d ACPI / PNP: Avoid conflicting resource reservations
       b9a5e5e1 ACPI / init: Fix the ordering of acpi_reserve_resources()
      
      (commit b9a5e5e1 introduced regressions some of which, but not
      all, were addressed by commit 0f1b414d and commit 7bc10388
      was a fixup on top of the latter) and causes ACPI fixed hardware
      resources to be reserved at the fs_initcall_sync stage of system
      initialization.
      
      The story is as follows.  First, a boot regression was reported due
      to an apparent resource reservation ordering change after a commit
      that shouldn't lead to such changes.  Investigation led to the
      conclusion that the problem happened because acpi_reserve_resources()
      was executed at the device_initcall() stage of system initialization
      which wasn't strictly ordered with respect to driver initialization
      (and with respect to the initialization of the pcieport driver in
      particular), so a random change causing the device initcalls to be
      run in a different order might break things.
      
      The response to that was to attempt to run acpi_reserve_resources()
      as soon as we knew that ACPI would be in use (commit b9a5e5e1).
      However, that turned out to be too early, because it caused resource
      reservations made by the PNP system driver to fail on at least one
      system and that failure was addressed by commit 0f1b414d.
      
      That fix still turned out to be insufficient, though, because
      calling acpi_reserve_resources() before the fs_initcall stage of
      system initialization caused a boot regression to happen on the
      eCAFE EC-800-H20G/S netbook.  That meant that we only could call
      acpi_reserve_resources() at the fs_initcall initialization stage
      or later, but then we might just as well call it after the PNP
      initialization, in which case commit 0f1b414d wouldn't be
      necessary any more.
      
      For this reason, the changes made by commit 0f1b414d are reverted
      (along with a memory leak fixup on top of that commit), the changes
      made by commit b9a5e5e1 that went too far are reverted too and
      acpi_reserve_resources() is changed into fs_initcall_sync, which
      will cause it to be executed after the PNP subsystem initialization
      (which is an fs_initcall) and before device initcalls (including
      the pcieport driver initialization) which should avoid the initial
      issue.
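      
      The ordering argument above can be sketched with a small Python model of
      initcall levels. This is only an illustration under stated assumptions:
      the level list is a subset of the kernel's levels, the call names are
      illustrative labels rather than verified symbol names, and registration
      order stands in for link order within a level.

```python
# Subset of the kernel's initcall levels, listed in execution order.
LEVELS = ["core", "postcore", "arch", "subsys", "fs", "fs_sync", "device", "late"]


def run_order(initcalls):
    """Return call names ordered by level.

    sorted() is stable, so calls sharing a level keep their registration
    order, which stands in for link order here.
    """
    return [name for name, level in sorted(initcalls, key=lambda c: LEVELS.index(c[1]))]


def reserve_is_well_ordered(order):
    """True if PNP init runs before the reservation and pcieport init after it."""
    pnp = order.index("pnp_system_init")
    reserve = order.index("acpi_reserve_resources")
    pcieport = order.index("pcieport_init")
    return pnp < reserve < pcieport


# Old placement: same level as pcieport, so the result depends on registration order.
old_a = [("pnp_system_init", "fs"), ("acpi_reserve_resources", "device"), ("pcieport_init", "device")]
old_b = [("pnp_system_init", "fs"), ("pcieport_init", "device"), ("acpi_reserve_resources", "device")]

# New placement: fs_sync sits strictly between fs and device.
new = [("pcieport_init", "device"), ("acpi_reserve_resources", "fs_sync"), ("pnp_system_init", "fs")]

print(reserve_is_well_ordered(run_order(old_a)))
print(reserve_is_well_ordered(run_order(old_b)))
print(reserve_is_well_ordered(run_order(new)))
```

      In the model, only the fs_sync placement gives the desired order
      regardless of how the calls happen to be registered.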
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=100581
      Link: http://marc.info/?t=143092384600002&r=1&w=2
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=99831
      Link: http://marc.info/?t=143389402600001&r=1&w=2
      Fixes: b9a5e5e1 "ACPI / init: Fix the ordering of acpi_reserve_resources()"
      Reported-by: Roland Dreier <roland@purestorage.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3dfbf877
    • ACPI / resources: free memory on error in add_region_before() · 2dfdaa26
      Dan Carpenter authored
      commit 7bc10388 upstream.
      
      There is a small memory leak on error.
      
      Fixes: 0f1b414d (ACPI / PNP: Avoid conflicting resource reservations)
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2dfdaa26
    • crush: fix a bug in tree bucket decode · 94fc3084
      Ilya Dryomov authored
      commit 82cd003a upstream.
      
      struct crush_bucket_tree::num_nodes is u8, so ceph_decode_8_safe()
      should be used.  -Wconversion catches this, but I guess it went
      unnoticed in all the noise it spews.  The actual problem (at least for
      common crushmaps) isn't the u32 -> u8 truncation though - it's the
      advancement by 4 bytes instead of 1 in the crushmap buffer.
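      
      A minimal Python sketch of why the cursor advancement matters more than
      the truncation. The buffer layout below is hypothetical (it is not the
      real crushmap encoding) and decode_8()/decode_32() are stand-ins for the
      ceph_decode_*_safe() helpers, assuming little-endian 32-bit fields.

```python
import struct


def decode_8(buf, pos):
    """Read one byte and advance the cursor by 1."""
    return buf[pos], pos + 1


def decode_32(buf, pos):
    """Read a little-endian 32-bit value and advance the cursor by 4."""
    (value,) = struct.unpack_from("<I", buf, pos)
    return value, pos + 4


# Hypothetical layout: a u8 num_nodes (5), then a 32-bit field, then padding.
buf = bytes([5]) + struct.pack("<I", 0x11223344) + bytes(3)

# Correct: a 1-byte read leaves the cursor on the next field.
num_nodes, pos = decode_8(buf, 0)
next_field, _ = decode_32(buf, pos)

# Buggy: a 4-byte read truncated to u8 still yields 5 here,
# but the cursor ends up 3 bytes too far.
raw, bad_pos = decode_32(buf, 0)
bad_num_nodes = raw & 0xFF
bad_next_field, _ = decode_32(buf, bad_pos)

print(num_nodes, hex(next_field))
print(bad_num_nodes, hex(bad_next_field))
```

      Both paths recover the same num_nodes in this example, yet every field
      decoded after it is read from the wrong offset in the buggy path.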
      
      Fixes: http://tracker.ceph.com/issues/2759
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Josh Durgin <jdurgin@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      94fc3084
    • fuse: initialize fc->release before calling it · 650b07ba
      Miklos Szeredi authored
      commit 0ad0b325 upstream.
      
      fc->release is called from fuse_conn_put() which was used in the error
      cleanup before fc->release was initialized.
      
      [Jeremiah Mahler <jmmahler@gmail.com>: assign fc->release after calling
      fuse_conn_init(fc) instead of before.]
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Fixes: a325f9b9 ("fuse: update fuse_conn_init() and separate out fuse_conn_kill()")
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      650b07ba
    • selinux: fix mprotect PROT_EXEC regression caused by mm change · 872d2790
      Stephen Smalley authored
      commit 892e8cac upstream.
      
      commit 66fc1303 ("mm: shmem_zero_setup
      skip security check and lockdep conflict with XFS") caused a regression
      for SELinux by disabling any SELinux checking of mprotect PROT_EXEC on
      shared anonymous mappings.  However, even before that regression, the
      checking on such mprotect PROT_EXEC calls was inconsistent with the
      checking on a mmap PROT_EXEC call for a shared anonymous mapping.  On a
      mmap, the security hook is passed a NULL file and knows it is dealing
      with an anonymous mapping and therefore applies an execmem check and no
      file checks.  On a mprotect, the security hook is passed a vma with a
      non-NULL vm_file (as this was set from the internally-created shmem
      file during mmap) and therefore applies the file-based execute check
      and no execmem check.  Since the aforementioned commit now marks the
      shmem zero inode with the S_PRIVATE flag, the file checks are disabled
      and we have no checking at all on mprotect PROT_EXEC.  Add a test to
      the mprotect hook logic for such private inodes, and apply an execmem
      check in that case.  This makes the mmap and mprotect checking
      consistent for shared anonymous mappings, as well as for /dev/zero and
      ashmem.
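      
      The decision logic described above can be summarised in a heavily
      simplified Python model. It follows only the wording of this commit
      message: the function names, the dict standing in for a file with an
      S_PRIVATE inode, and the check labels are all hypothetical, and the real
      hooks consider more conditions than are shown here.

```python
def mmap_exec_checks(file):
    """mmap(PROT_EXEC): a NULL file means an anonymous mapping."""
    return ["execmem"] if file is None else ["file_execute"]


def mprotect_exec_checks(vm_file, fixed):
    """mprotect(PROT_EXEC) on an existing vma.

    vm_file is None or a dict whose "private" flag stands in for S_PRIVATE.
    """
    if vm_file is None:
        return ["execmem"]
    if vm_file["private"]:
        # File checks are skipped for private inodes; the fix applies execmem instead.
        return ["execmem"] if fixed else []
    return ["file_execute"]


# Internally created shmem file behind a shared anonymous mapping.
shmem_zero_file = {"private": True}

print(mmap_exec_checks(None))
print(mprotect_exec_checks(shmem_zero_file, fixed=False))
print(mprotect_exec_checks(shmem_zero_file, fixed=True))
```

      With fixed=False the model reproduces the regression (no check at all);
      with fixed=True, mprotect applies the same execmem check as mmap.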
      Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: Paul Moore <pmoore@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      872d2790
    • selinux: don't waste ebitmap space when importing NetLabel categories · 9d680e03
      Paul Moore authored
      commit 33246035 upstream.
      
      At present we don't create efficient ebitmaps when importing NetLabel
      category bitmaps.  This can present a problem when comparing ebitmaps
      since ebitmap_cmp() is very strict about these things and considers
      these wasteful ebitmaps not equal when compared to their more
      efficient counterparts, even if their values are the same.  This isn't
      likely to cause problems on 64-bit systems due to a bit of luck on
      how NetLabel/CIPSO works and the default ebitmap size, but it can be
      a problem on 32-bit systems.
      
      This patch fixes this problem by being a bit more intelligent when
      importing NetLabel category bitmaps by skipping over empty sections
      which should result in a nice, efficient ebitmap.
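      
      To illustrate why a strict comparison can treat equal category sets as
      different, here is a small Python model. It rests on assumptions: the
      node width, the (startbit, bits) tuple representation, and the idea that
      the old import created a node for every section are simplifications made
      for this sketch, not a description of the actual ebitmap code.

```python
NODE_BITS = 64  # hypothetical number of bits covered by one ebitmap node


def import_padded(categories):
    """Old behaviour as modelled here: one node per section, empty or not."""
    if not categories:
        return []
    last_section = max(categories) // NODE_BITS
    nodes = []
    for section in range(last_section + 1):
        bits = sum(1 << (c % NODE_BITS) for c in set(categories) if c // NODE_BITS == section)
        nodes.append((section * NODE_BITS, bits))
    return nodes


def import_compact(categories):
    """New behaviour: skip sections that have no bits set."""
    return [node for node in import_padded(categories) if node[1] != 0]


def strict_cmp(a, b):
    """Node-by-node comparison, in the spirit of ebitmap_cmp()."""
    return a == b


cats = {3, 200}
print(import_padded(cats))
print(import_compact(cats))
print(strict_cmp(import_padded(cats), import_compact(cats)))
```

      Both imports encode the same categories, but only the compact form
      compares equal to another efficiently built bitmap.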
      Signed-off-by: Paul Moore <pmoore@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9d680e03
    • Btrfs: fix file corruption after cloning inline extents · df7c9ca8
      Filipe Manana authored
      commit ed958762 upstream.
      
      Using the clone ioctl (or the extent_same ioctl, which calls the same extent
      cloning function) we end up allowing an inline extent to be copied from
      the source file into a non-zero offset of the destination file. This is
      not expected and the btrfs code is not prepared to deal with it: all
      inline extents must be at a file offset equal to 0.
      
      For example, the following excerpt of a test case for fstests triggers
      a crash/BUG_ON() on a write operation after an inline extent is cloned
      into a non-zero offset:
      
        _scratch_mkfs >>$seqres.full 2>&1
        _scratch_mount
      
        # Create our test files. File foo has the same 2K of data at offset 4K
        # as file bar has at its offset 0.
        $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
            -c "pwrite -S 0xbb 4k 2K" \
            -c "pwrite -S 0xcc 8K 4K" \
            $SCRATCH_MNT/foo | _filter_xfs_io
      
        # File bar consists of a single inline extent (2K size).
        $XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
           $SCRATCH_MNT/bar | _filter_xfs_io
      
        # Now call the clone ioctl to clone the extent of file bar into file
        # foo at its offset 4K. This made file foo have an inline extent at
        # offset 4K, something which the btrfs code can not deal with in future
        # IO operations because all inline extents are supposed to start at an
        # offset of 0, resulting in all sorts of chaos.
        # So here we validate that clone ioctl returns an EOPNOTSUPP, which is
        # what it returns for other cases dealing with inlined extents.
        $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
            $SCRATCH_MNT/bar $SCRATCH_MNT/foo
      
        # Because of the inline extent at offset 4K, the following write made
        # the kernel crash with a BUG_ON().
        $XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io
      
        status=0
        exit
      
      The stack trace of the BUG_ON() triggered by the last write is:
      
        [152154.035903] ------------[ cut here ]------------
        [152154.036424] kernel BUG at mm/page-writeback.c:2286!
        [152154.036424] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
        [152154.036424] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc acpi_cpu$
        [152154.036424] CPU: 2 PID: 17873 Comm: xfs_io Tainted: G        W       4.1.0-rc6-btrfs-next-11+ #2
        [152154.036424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
        [152154.036424] task: ffff880429f70990 ti: ffff880429efc000 task.ti: ffff880429efc000
        [152154.036424] RIP: 0010:[<ffffffff8111a9d5>]  [<ffffffff8111a9d5>] clear_page_dirty_for_io+0x1e/0x90
        [152154.036424] RSP: 0018:ffff880429effc68  EFLAGS: 00010246
        [152154.036424] RAX: 0200000000000806 RBX: ffffea0006a6d8f0 RCX: 0000000000000001
        [152154.036424] RDX: 0000000000000000 RSI: ffffffff81155d1b RDI: ffffea0006a6d8f0
        [152154.036424] RBP: ffff880429effc78 R08: ffff8801ce389fe0 R09: 0000000000000001
        [152154.036424] R10: 0000000000002000 R11: ffffffffffffffff R12: ffff8800200dce68
        [152154.036424] R13: 0000000000000000 R14: ffff8800200dcc88 R15: ffff8803d5736d80
        [152154.036424] FS:  00007fbf119f6700(0000) GS:ffff88043d280000(0000) knlGS:0000000000000000
        [152154.036424] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [152154.036424] CR2: 0000000001bdc000 CR3: 00000003aa555000 CR4: 00000000000006e0
        [152154.036424] Stack:
        [152154.036424]  ffff8803d5736d80 0000000000000001 ffff880429effcd8 ffffffffa04e97c1
        [152154.036424]  ffff880429effd68 ffff880429effd60 0000000000000001 ffff8800200dc9c8
        [152154.036424]  0000000000000001 ffff8800200dcc88 0000000000000000 0000000000001000
        [152154.036424] Call Trace:
        [152154.036424]  [<ffffffffa04e97c1>] lock_and_cleanup_extent_if_need+0x147/0x18d [btrfs]
        [152154.036424]  [<ffffffffa04ea82c>] __btrfs_buffered_write+0x245/0x4c8 [btrfs]
        [152154.036424]  [<ffffffffa04ed14b>] ? btrfs_file_write_iter+0x150/0x3e0 [btrfs]
        [152154.036424]  [<ffffffffa04ed15a>] ? btrfs_file_write_iter+0x15f/0x3e0 [btrfs]
        [152154.036424]  [<ffffffffa04ed2c7>] btrfs_file_write_iter+0x2cc/0x3e0 [btrfs]
        [152154.036424]  [<ffffffff81165a4a>] __vfs_write+0x7c/0xa5
        [152154.036424]  [<ffffffff81165f89>] vfs_write+0xa0/0xe4
        [152154.036424]  [<ffffffff81166855>] SyS_pwrite64+0x64/0x82
        [152154.036424]  [<ffffffff81465197>] system_call_fastpath+0x12/0x6f
        [152154.036424] Code: 48 89 c7 e8 0f ff ff ff 5b 41 5c 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 ae ef 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 59 49 8b 3c 2$
        [152154.036424] RIP  [<ffffffff8111a9d5>] clear_page_dirty_for_io+0x1e/0x90
        [152154.036424]  RSP <ffff880429effc68>
        [152154.242621] ---[ end trace e3d3376b23a57041 ]---
      
      Fix this by returning the error EOPNOTSUPP if an attempt to copy an
      inline extent into a non-zero offset happens, just like what is done for
      other scenarios that would require copying/splitting inline extents,
      which were introduced by the following commits:
      
         00fdf13a ("Btrfs: fix a crash of clone with inline extents's split")
         3f9e3df8 ("btrfs: replace error code from btrfs_drop_extents")
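      
      The rule enforced by the fix can be sketched as a single guard. This
      Python function is a simplified illustration, not the btrfs code: the
      function name and its two parameters are hypothetical, and the real
      check operates on extent items and keys inside the clone path.

```python
import errno


def check_clone_extent(is_inline, dest_offset):
    """Return 0 if the clone may proceed, or a negative errno otherwise."""
    if is_inline and dest_offset != 0:
        # Inline extents are only valid at file offset 0.
        return -errno.EOPNOTSUPP
    return 0


print(check_clone_extent(True, 4 * 1024))
print(check_clone_extent(True, 0))
```

      The first call mirrors the test case above (an inline extent cloned to
      offset 4K) and is rejected; the second is allowed.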
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      df7c9ca8
    • Btrfs: fix list transaction->pending_ordered corruption · 98f7bfe6
      Filipe Manana authored
      commit d3efe084 upstream.
      
      When we call btrfs_commit_transaction(), we splice the list "ordered"
      of our transaction handle into the transaction's "pending_ordered"
      list, but we don't re-initialize the "ordered" list of our transaction
      handle, which means it still points to the same elements it used to
      before the splice. Then we check if the current transaction's state is
      >= TRANS_STATE_COMMIT_START and if it is we end up calling
      btrfs_end_transaction() which simply splices again the "ordered" list
      of our handle into the transaction's "pending_ordered" list, leaving
      multiple pointers to the same ordered extents which results in list
      corruption when we are iterating, removing and freeing ordered extents
      at btrfs_wait_pending_ordered(), resulting in access to dangling
      pointers / use-after-free issues.
      Similarly, btrfs_end_transaction() can end up in some cases calling
      btrfs_commit_transaction(), and both did a list splice of the transaction
      handle's "ordered" list into the transaction's "pending_ordered" without
      re-initializing the handle's "ordered" list, resulting in exactly the
      same problem.
      
      This produces the following warning on a kernel with linked list
      debugging enabled:
      
      [109749.265416] ------------[ cut here ]------------
      [109749.266410] WARNING: CPU: 7 PID: 324 at lib/list_debug.c:59 __list_del_entry+0x5a/0x98()
      [109749.267969] list_del corruption. prev->next should be ffff8800ba087e20, but was fffffff8c1f7c35d
      (...)
      [109749.287505] Call Trace:
      [109749.288135]  [<ffffffff8145f077>] dump_stack+0x4f/0x7b
      [109749.298080]  [<ffffffff81095de5>] ? console_unlock+0x356/0x3a2
      [109749.331605]  [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
      [109749.334849]  [<ffffffff81260642>] ? __list_del_entry+0x5a/0x98
      [109749.337093]  [<ffffffff8104b410>] warn_slowpath_fmt+0x46/0x48
      [109749.337847]  [<ffffffff81260642>] __list_del_entry+0x5a/0x98
      [109749.338678]  [<ffffffffa053e8bf>] btrfs_wait_pending_ordered+0x46/0xdb [btrfs]
      [109749.340145]  [<ffffffffa058a65f>] ? __btrfs_run_delayed_items+0x149/0x163 [btrfs]
      [109749.348313]  [<ffffffffa054077d>] btrfs_commit_transaction+0x36b/0xa10 [btrfs]
      [109749.349745]  [<ffffffff81087310>] ? trace_hardirqs_on+0xd/0xf
      [109749.350819]  [<ffffffffa055370d>] btrfs_sync_file+0x36f/0x3fc [btrfs]
      [109749.351976]  [<ffffffff8118ec98>] vfs_fsync_range+0x8f/0x9e
      [109749.360341]  [<ffffffff8118ecc3>] vfs_fsync+0x1c/0x1e
      [109749.368828]  [<ffffffff8118ee1d>] do_fsync+0x34/0x4e
      [109749.369790]  [<ffffffff8118f045>] SyS_fsync+0x10/0x14
      [109749.370925]  [<ffffffff81465197>] system_call_fastpath+0x12/0x6f
      [109749.382274] ---[ end trace 48e0d07f7c03d95a ]---
      
      On a non-debug kernel this leads to invalid memory accesses, causing a
      crash. Fix this by using list_splice_init() instead of list_splice() in
      btrfs_commit_transaction() and btrfs_end_transaction().
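      
      The difference between the two splice variants can be reproduced with a
      small Python model of a circular doubly linked list. This is a sketch
      that mimics the semantics of the kernel list helpers; ListHead,
      is_consistent() and commit_then_end() are hypothetical names introduced
      for the example, and a single entry stands in for the ordered extents.

```python
class ListHead:
    """Minimal stand-in for the kernel's circular doubly linked list_head."""

    def __init__(self, name=""):
        self.name = name
        self.next = self
        self.prev = self


def list_empty(head):
    return head.next is head


def list_add_tail(new, head):
    new.prev, new.next = head.prev, head
    head.prev.next = new
    head.prev = new


def list_splice(src, dst):
    """Insert src's entries right after dst; src keeps its stale pointers."""
    if list_empty(src):
        return
    first, last, after = src.next, src.prev, dst.next
    first.prev, dst.next = dst, first
    last.next, after.prev = after, last


def list_splice_init(src, dst):
    """Like list_splice(), but src is reset to an empty list afterwards."""
    list_splice(src, dst)
    src.next = src.prev = src


def is_consistent(head, limit=100):
    """Walk the list and check that every next/prev pair agrees."""
    node = head
    for _ in range(limit):
        if node.next.prev is not node:
            return False
        node = node.next
        if node is head:
            return True
    return False


def commit_then_end(splice):
    """Splice the handle's list into pending_ordered twice, as described above."""
    ordered, pending, extent = ListHead("ordered"), ListHead("pending_ordered"), ListHead("extent")
    list_add_tail(extent, ordered)
    splice(ordered, pending)  # first splice, e.g. in btrfs_commit_transaction()
    splice(ordered, pending)  # second splice, e.g. in btrfs_end_transaction()
    return is_consistent(pending)


print(commit_then_end(list_splice))
print(commit_then_end(list_splice_init))
```

      With list_splice() the second call re-links the same entry and leaves
      pending_ordered inconsistent; with list_splice_init() the second call
      finds an empty source list and does nothing.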
      
      Fixes: 50d9aa99 ("Btrfs: make sure logged extents complete in the current transaction V3")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      98f7bfe6
    • Btrfs: fix memory leak in the extent_same ioctl · 992a3fbb
      Filipe Manana authored
      commit 497b4050 upstream.
      
      We were allocating memory with memdup_user() but we were never releasing
      that memory. This affected pretty much every call to the ioctl, whether
      it deduplicated extents or not.
      
      This issue was reported on IRC by Julian Taylor and on the mailing list
      by Marcel Ritter, credit goes to them for finding the issue.
      Reported-by: Julian Taylor <jtaylor.debian@googlemail.com>
      Reported-by: Marcel Ritter <ritter.marcel@gmail.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      992a3fbb
    • Btrfs: fix fsync data loss after append write · 544f8fbe
      Filipe Manana authored
      commit e4545de5 upstream.
      
      If we do an append write to a file (which increases its inode's i_size)
      that does not have the flag BTRFS_INODE_NEEDS_FULL_SYNC set in its inode,
      and the previous transaction added a new hard link to the file, which sets
      the flag BTRFS_INODE_COPY_EVERYTHING in the file's inode, and then fsync
      the file, the inode's new i_size isn't logged. This has the consequence
      that after the fsync log is replayed, the file size remains what it was
      before the append write operation, which means users/applications will
      not be able to read the data that was successfully fsync'ed before.
      
      This happens because neither the inode item nor the delayed inode get
      their i_size updated when the append write is made - doing so would
      require starting a transaction in the buffered write path, something that
      we do not do intentionally for performance reasons.
      
      Fix this by making sure that when the flag BTRFS_INODE_COPY_EVERYTHING is
      set the inode is logged with its current i_size (log the in-memory inode
      into the log tree).
      
      This issue is not a recent regression and is easy to reproduce with the
      following test case for fstests:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
      
        here=`pwd`
        tmp=/tmp/$$
        status=1	# failure is the default!
      
        _cleanup()
        {
                _cleanup_flakey
                rm -f $tmp.*
        }
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
        . ./common/dmflakey
      
        # real QA test starts here
        _supported_fs generic
        _supported_os Linux
        _need_to_be_root
        _require_scratch
        _require_dm_flakey
        _require_metadata_journaling $SCRATCH_DEV
      
        _crash_and_mount()
        {
                # Simulate a crash/power loss.
                _load_flakey_table $FLAKEY_DROP_WRITES
                _unmount_flakey
                # Allow writes again and mount. This makes the fs replay its fsync log.
                _load_flakey_table $FLAKEY_ALLOW_WRITES
                _mount_flakey
        }
      
        rm -f $seqres.full
      
        _scratch_mkfs >> $seqres.full 2>&1
        _init_flakey
        _mount_flakey
      
        # Create the test file with some initial data and then fsync it.
        # The fsync here is only needed to trigger the issue in btrfs, as it causes
        # the flag BTRFS_INODE_NEEDS_FULL_SYNC to be removed from the btrfs inode.
        $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 32k" \
                        -c "fsync" \
                        $SCRATCH_MNT/foo | _filter_xfs_io
        sync
      
        # Add a hard link to our file.
        # On btrfs this sets the flag BTRFS_INODE_COPY_EVERYTHING on the btrfs inode,
        # which is a necessary condition to trigger the issue.
        ln $SCRATCH_MNT/foo $SCRATCH_MNT/bar
      
        # Sync the filesystem to force a commit of the current btrfs transaction, this
        # is a necessary condition to trigger the bug on btrfs.
        sync
      
        # Now append more data to our file, increasing its size, and fsync the file.
        # In btrfs because the inode flag BTRFS_INODE_COPY_EVERYTHING was set and the
        # write path did not update the inode item in the btree nor the delayed inode
        # item (in memory structure) in the current transaction (created by the fsync
        # handler), the fsync did not record the inode's new i_size in the fsync
        # log/journal. This made the data unavailable after the fsync log/journal is
        # replayed.
        $XFS_IO_PROG -c "pwrite -S 0xbb 32K 32K" \
                     -c "fsync" \
                     $SCRATCH_MNT/foo | _filter_xfs_io
      
        echo "File content after fsync and before crash:"
        od -t x1 $SCRATCH_MNT/foo
      
        _crash_and_mount
      
        echo "File content after crash and log replay:"
        od -t x1 $SCRATCH_MNT/foo
      
        status=0
        exit
      
      The expected file content, both before and after the crash/power failure,
      includes the appended data:
      
        0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
        *
        0100000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
        *
        0200000
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      544f8fbe
    • Btrfs: fix race between caching kthread and returning inode to inode cache · 9547e86b
      Filipe Manana authored
      commit ae9d8f17 upstream.
      
      While the inode cache caching kthread is calling btrfs_unpin_free_ino(),
      we could have a concurrent call to btrfs_return_ino() that adds a new
      entry to the root's free space cache of pinned inodes. This concurrent
      call does not acquire the fs_info->commit_root_sem before adding a new
      entry if the caching state is BTRFS_CACHE_FINISHED, which is a problem
      because the caching kthread calls btrfs_unpin_free_ino() after setting
      the caching state to BTRFS_CACHE_FINISHED and therefore races with
      the task calling btrfs_return_ino(), which is adding a new entry, while
      the former (caching kthread) is navigating the cache's rbtree, removing
      and freeing nodes from the cache's rbtree without acquiring the spinlock
      that protects the rbtree.
      
      This race resulted in memory corruption due to double free of struct
      btrfs_free_space objects because both tasks can end up freeing the
      same objects. Note that adding a new entry can result in merging it with
      other entries in the cache, in which case those entries are freed.
      This is particularly important as btrfs_free_space structures are also
      used for the block group free space caches.
      
      This memory corruption can be detected by a debugging kernel, which
      reports it with the following trace:
      
      [132408.501148] slab error in verify_redzone_free(): cache `btrfs_free_space': double free detected
      [132408.505075] CPU: 15 PID: 12248 Comm: btrfs-ino-cache Tainted: G        W       4.1.0-rc5-btrfs-next-10+ #1
      [132408.505075] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
      [132408.505075]  ffff880023e7d320 ffff880163d73cd8 ffffffff8145eec7 ffffffff81095dce
      [132408.505075]  ffff880009735d40 ffff880163d73ce8 ffffffff81154e1e ffff880163d73d68
      [132408.505075]  ffffffff81155733 ffffffffa054a95a ffff8801b6099f00 ffffffffa0505b5f
      [132408.505075] Call Trace:
      [132408.505075]  [<ffffffff8145eec7>] dump_stack+0x4f/0x7b
      [132408.505075]  [<ffffffff81095dce>] ? console_unlock+0x356/0x3a2
      [132408.505075]  [<ffffffff81154e1e>] __slab_error.isra.28+0x25/0x36
      [132408.505075]  [<ffffffff81155733>] __cache_free+0xe2/0x4b6
      [132408.505075]  [<ffffffffa054a95a>] ? __btrfs_add_free_space+0x2f0/0x343 [btrfs]
      [132408.505075]  [<ffffffffa0505b5f>] ? btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
      [132408.505075]  [<ffffffff810f3b30>] ? time_hardirqs_off+0x15/0x28
      [132408.505075]  [<ffffffff81084d42>] ? trace_hardirqs_off+0xd/0xf
      [132408.505075]  [<ffffffff811563a1>] ? kfree+0xb6/0x14e
      [132408.505075]  [<ffffffff811563d0>] kfree+0xe5/0x14e
      [132408.505075]  [<ffffffffa0505b5f>] btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
      [132408.505075]  [<ffffffffa0505e08>] caching_kthread+0x29e/0x2d9 [btrfs]
      [132408.505075]  [<ffffffffa0505b6a>] ? btrfs_unpin_free_ino+0x99/0x99 [btrfs]
      [132408.505075]  [<ffffffff8106698f>] kthread+0xef/0xf7
      [132408.505075]  [<ffffffff810f3b08>] ? time_hardirqs_on+0x15/0x28
      [132408.505075]  [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad
      [132408.505075]  [<ffffffff814653d2>] ret_from_fork+0x42/0x70
      [132408.505075]  [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad
      [132408.505075] ffff880023e7d320: redzone 1:0x9f911029d74e35b, redzone 2:0x9f911029d74e35b.
      [132409.501654] slab: double free detected in cache 'btrfs_free_space', objp ffff880023e7d320
      [132409.503355] ------------[ cut here ]------------
      [132409.504241] kernel BUG at mm/slab.c:2571!
      
      Therefore fix this by having btrfs_unpin_free_ino() acquire the lock
      that protects the rbtree while doing the searches and removing entries.
      
      Fixes: 1c70d8fb ("Btrfs: fix inode caching vs tree log")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9547e86b
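      The locking pattern described in the commit above can be illustrated
      with a minimal userspace sketch. This is not the btrfs code: the
      names pinned_cache, cache_add() and cache_drain_below() are
      hypothetical, a singly linked list stands in for the rbtree, and a
      pthread mutex stands in for the kernel spinlock. It only shows the
      idea of holding the same lock while searching and removing entries
      that the adding side holds while inserting.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a free space entry. */
struct free_entry {
    unsigned long offset;
    struct free_entry *next;
};

/* Hypothetical cache: one lock protects the list and the counter. */
struct pinned_cache {
    pthread_mutex_t lock;
    struct free_entry *head;
    size_t count;
};

/* Adder side: inserts a new entry while holding the lock. */
static int cache_add(struct pinned_cache *c, unsigned long offset)
{
    struct free_entry *e = malloc(sizeof(*e));

    if (!e)
        return -1;
    e->offset = offset;

    pthread_mutex_lock(&c->lock);
    e->next = c->head;
    c->head = e;
    c->count++;
    pthread_mutex_unlock(&c->lock);
    return 0;
}

/*
 * Drainer side: walks the list and frees every entry below 'limit'.
 * The whole search-and-remove loop runs under the lock, so a
 * concurrent cache_add() cannot observe or modify the list while
 * nodes are being unlinked and freed.
 */
static size_t cache_drain_below(struct pinned_cache *c, unsigned long limit)
{
    size_t freed = 0;
    struct free_entry **pp;

    pthread_mutex_lock(&c->lock);
    pp = &c->head;
    while (*pp) {
        struct free_entry *e = *pp;

        if (e->offset < limit) {
            *pp = e->next;      /* unlink before freeing */
            free(e);
            c->count--;
            freed++;
        } else {
            pp = &e->next;
        }
    }
    pthread_mutex_unlock(&c->lock);
    return freed;
}
```

      Without the lock in cache_drain_below(), a concurrent cache_add()
      could touch the list while nodes are being unlinked, which is the
      kind of interleaving the commit message describes.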
    • Btrfs: use kmem_cache_free when freeing entry in inode cache · 6f953ad8
      Filipe Manana authored
      commit c3f4a168 upstream.
      
      The free space entries are allocated using kmem_cache_zalloc(),
      through __btrfs_add_free_space(), therefore we should use
      kmem_cache_free() and not kfree() to avoid any confusion and
      any potential problem. The kfree() definition in mm/slab.c
      carries the following comment:
      
        /*
         * (...)
         *
         * Don't free memory not originally allocated by kmalloc()
         * or you will run into trouble.
         */
      
      So better be safe and use kmem_cache_free().
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f953ad8
    • md: fix a build warning · 528feaea
      Firo Yang authored
      commit 4e023612 upstream.
      
      The build emits a warning like this:
      
      drivers/md/md.c: In function "update_array_info":
      drivers/md/md.c:6394:26: warning: logical not is only applied
      to the left hand side of comparison [-Wlogical-not-parentheses]
            !mddev->persistent  != info->not_persistent||
      
      Fix it as Neil Brown suggested:
      mddev->persistent != !info->not_persistent ||
      Signed-off-by: Firo Yang <firogm@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      528feaea
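      The precedence issue behind the warning in the md commit above can
      be shown with a small standalone sketch. The helper names are
      hypothetical, and plain int parameters stand in for
      mddev->persistent and info->not_persistent. In C, '!' binds tighter
      than '!=', so the flagged expression compares (!persistent) with
      not_persistent. For 0/1 flag values it gives the same result as the
      rewritten form; the two differ only when not_persistent holds some
      other nonzero value.

```c
#include <assert.h>

/* The flagged expression, with the implicit grouping made explicit. */
static int mismatch_as_written(int persistent, int not_persistent)
{
    return (!persistent) != not_persistent;
}

/* The rewritten form: the negation is applied to the right-hand side. */
static int mismatch_as_fixed(int persistent, int not_persistent)
{
    return persistent != !not_persistent;
}

/* Both forms agree for every combination of 0/1 flag values. */
static void check_forms_agree_on_flags(void)
{
    for (int p = 0; p <= 1; p++)
        for (int np = 0; np <= 1; np++)
            assert(mismatch_as_written(p, np) == mismatch_as_fixed(p, np));
}
```

      With a non-boolean right-hand side such as 2, the first form
      compares 1 with 2 and reports a mismatch, while the second
      normalizes 2 to "set" before comparing.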