1. 26 Oct, 2016 40 commits
    • Linux 3.4.113 · 8d1988f8
      Zefan Li authored
    • net: Fix use after free in the recvmmsg exit path · 887cbce4
      Arnaldo Carvalho de Melo authored
      commit 34b88a68 upstream.
      
      The syzkaller fuzzer hit the following use-after-free:
      
        Call Trace:
         [<ffffffff8175ea0e>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:295
         [<ffffffff851cc31a>] __sys_recvmmsg+0x6fa/0x7f0 net/socket.c:2261
         [<     inline     >] SYSC_recvmmsg net/socket.c:2281
         [<ffffffff851cc57f>] SyS_recvmmsg+0x16f/0x180 net/socket.c:2270
         [<ffffffff86332bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a
        arch/x86/entry/entry_64.S:185
      
      And, as Dmitry rightly assessed, that is because we can drop the
      reference and then touch it when the underlying recvmsg calls return
      some packets and then hit an error, which will make recvmmsg set
      sock->sk->sk_err, oops, fix it.
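The ordering problem described above can be sketched in plain userspace C. This is only a model under stated assumptions: `model_sock`, `model_put()` and `model_recvmmsg_exit()` are hypothetical names, not kernel symbols, and the real exit path has more cases. The point it illustrates is that the pending error is recorded while the reference is still held, and the reference is dropped last.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted socket; not the kernel's struct. */
struct model_sock {
    int refcnt;
    int sk_err;
};

static int model_freed; /* counts objects released by model_put() */

/* Drop one reference and free the object when it was the last one. */
static void model_put(struct model_sock *s)
{
    if (--s->refcnt == 0) {
        free(s);
        model_freed++;
    }
}

/*
 * Safe ordering: store the pending error while our reference is still
 * held, then drop the reference as the very last access to *s.
 * Returns the number of datagrams already received, or err if none were.
 */
int model_recvmmsg_exit(struct model_sock *s, int datagrams, int err)
{
    if (datagrams != 0 && err != 0)
        s->sk_err = -err;   /* reference still held here */
    model_put(s);           /* nothing may touch s after this point */
    return datagrams != 0 ? datagrams : err;
}
```

Swapping the two statements in `model_recvmmsg_exit()` would reproduce the reported pattern: the object could already be freed when `sk_err` is written.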
      Reported-and-Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Fixes: a2e27255 ("net: Introduce recvmmsg socket syscall")
      http://lkml.kernel.org/r/20160122211644.GC2470@redhat.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • mm, gup: close FOLL MAP_PRIVATE race · 1c8544a9
      Michal Hocko authored
      commit 19be0eaf upstream.
      
      faultin_page drops FOLL_WRITE after the page fault handler did the CoW
      and then we retry follow_page_mask to get our CoWed page. This is racy,
      however, because the page might have been unmapped by that time and so
      we would have to do a page fault again, this time without CoW. This
      would cause the page cache corruption for FOLL_FORCE on MAP_PRIVATE
      read only mappings with obvious consequences.
      
      This is an ancient bug that was actually already fixed once by Linus
      eleven years ago in commit 4ceb5db9 ("Fix get_user_pages() race
      for write access") but that was then undone due to problems on s390
      by commit f33ea7f4 ("fix get_user_pages bug") because s390 didn't
      have proper dirty pte tracking until abf09bed ("s390/mm: implement
      software dirty bits"). This wasn't a problem at the time as pointed out
      by Hugh Dickins because madvise relied on mmap_sem for write up until
      0a27a14a ("mm: madvise avoid exclusive mmap_sem") but since then we
      can race with madvise which can unmap the fresh COWed page or with KSM
      and corrupt the content of the shared page.
      
      This patch is based on the Linus' approach to not clear FOLL_WRITE after
      the CoW page fault (aka VM_FAULT_WRITE) but instead introduces FOLL_COW
      to note this fact. The flag is then rechecked during follow_pfn_pte to
      enforce the page fault again if we do not see the CoWed page. Linus was
      suggesting to check pte_dirty again as s390 is OK now. But that would
      make backporting to some old kernels harder. So instead let's just make
      sure that vm_normal_page sees a pure anonymous page.
      
      This would guarantee we are seeing a real CoW page. Introduce
      can_follow_write_pte which checks both pte_write and falls back to
      PageAnon on forced write faults which passed CoW already. Thanks to Hugh
      for pointing out that special care has to be taken for KSM pages because
      our COWed page might have been merged with a KSM one and keep its
      PageAnon flag.
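The check described in this message can be modelled roughly as below. This is a sketch of the stated idea only, not the patch itself: the flag values, `struct model_pte` and `model_can_follow_write()` are assumptions made for illustration, and the actual kernel helper and its backports may test different conditions.

```c
#include <stdbool.h>

/* Illustrative flag values for this model only. */
#define MODEL_FOLL_WRITE 0x01u
#define MODEL_FOLL_FORCE 0x10u
#define MODEL_FOLL_COW   0x4000u

/* Hypothetical summary of what the lookup sees for one mapping. */
struct model_pte {
    bool writable;   /* models pte_write() */
    bool anon_page;  /* models PageAnon() on the mapped page */
    bool ksm_page;   /* models a page that KSM has merged */
};

/*
 * A write may proceed if the pte is writable, or if this is a forced
 * write that already went through CoW and we still see a private
 * anonymous page that has not been merged by KSM.
 */
bool model_can_follow_write(const struct model_pte *pte, unsigned int flags)
{
    if (pte->writable)
        return true;
    return (flags & MODEL_FOLL_FORCE) && (flags & MODEL_FOLL_COW) &&
           pte->anon_page && !pte->ksm_page;
}
```

If the check fails, the caller is expected to fault again instead of writing to whatever page is currently mapped.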
      
      Fixes: 0a27a14a ("mm: madvise avoid exclusive mmap_sem")
      Reported-by: Phil "not Paul" Oester <kernel@linuxace.com>
      Disclosed-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      [bwh: Backported to 3.2:
       - Adjust filename, context, indentation
       - The 'no_page' exit path in follow_page() is different, so open-code the
         cleanup
       - Delete a now-unused label]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • net/core: revert "net: fix __netdev_update_features return.." and add comment · 86eecef7
      Nikolay Aleksandrov authored
      commit 17b85d29 upstream.
      
      This reverts commit 00ee5927 ("net: fix __netdev_update_features return
      on ndo_set_features failure")
      and adds a comment explaining why it's okay to return a value other than
      0 upon error. Some drivers might actually change flags and return an
      error so it's better to fire a spurious notification rather than miss
      these.
      
      CC: Michał Mirosław <mirq-linux@rere.qmqm.pl>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • ser_gigaset: use container_of() instead of detour · 17fd6bbd
      Paul Bolle authored
      commit 8d2c3ab4 upstream.
      
      The purpose of gigaset_device_release() is to kfree() the struct
      ser_cardstate that contains our struct device. This is done via a bit of
      a detour. First we make our struct device's driver_data point to the
      container of our struct ser_cardstate (which is a struct cardstate). In
      gigaset_device_release() we then retrieve that driver_data again. And
      after that we finally kfree() the struct ser_cardstate that was saved in
      the struct cardstate.
      
      All of this can be achieved much easier by using container_of() to get
      from our struct device to its container, struct ser_cardstate. Do so.
      
      Note that at the time the detour was implemented commit b8b2c7d8
      ("base/platform: assert that dev_pm_domain callbacks are called
      unconditionally") had just entered the tree. That commit disconnected
      our platform_device and our platform_driver. These were reconnected
      again in v4.5-rc2 through commit 25cad69f ("base/platform: Fix
      platform drivers with no probe callback"). And one of the consequences
      of that fix was that it broke the detour via driver_data. That's because
      it made __device_release_driver() stop being a NOP for our struct device
      and actually do stuff again. One of the things it now does, is setting
      our driver_data to NULL. That, in turn, makes it impossible for
      gigaset_device_release() to get to our struct cardstate. Which has the
      net effect of leaking a struct ser_cardstate at every call of this
      driver's tty close() operation. So using container_of() has the
      additional benefit of actually working.
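For readers unfamiliar with the idiom, here is a minimal userspace sketch of container_of(). The macro is written with the standard offsetof(); the struct names and `model_device_release()` are hypothetical stand-ins for the driver's types, not the driver code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Step back from a member pointer to the structure that embeds it. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct device and struct ser_cardstate. */
struct model_device {
    int id;
};

struct model_ser_cardstate {
    int other_state;
    struct model_device dev; /* embedded, not pointed to */
};

static int model_released; /* counts release calls, for illustration */

/* The release callback gets only the embedded device, yet can free its container. */
void model_device_release(struct model_device *dev)
{
    struct model_ser_cardstate *scs =
        container_of(dev, struct model_ser_cardstate, dev);

    free(scs);
    model_released++;
}
```

Because the container is recovered from the pointer itself, the callback no longer depends on driver_data still being set when it runs.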
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
      Acked-by: Tilman Schmidt <tilman@imap.cc>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • ser_gigaset: remove unnecessary kfree() calls from release method · 34428839
      Tilman Schmidt authored
      commit 8aeb3c3d upstream.
      
      device->platform_data and platform_device->resource are never used
      and remain NULL through their entire life. Drop the kfree() calls
      for them from the device release method.
      Signed-off-by: Tilman Schmidt <tilman@imap.cc>
      Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • xen/pciback: Save the number of MSI-X entries to be copied later. · 658ba629
      Konrad Rzeszutek Wilk authored
      commit d159457b upstream.
      
      Commit 8135cf8b (xen/pciback: Save
      xen_pci_op commands before processing it) broke enabling MSI-X because
      it would never copy the resulting vectors into the response.  The
      number of vectors requested was being overwritten by the return value
      (typically zero for success).
      
      Save the number of vectors before processing the op, so the correct
      number of vectors are copied afterwards.
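The save-before-processing pattern can be sketched as follows. Everything here is a simplified model: `struct model_op`, `model_enable_msix()` and `model_do_op()` are hypothetical, and the real op structure and handler look different.

```c
#define MODEL_MAX_VECS 8

/* Hypothetical op: 'value' carries the request in and the result out. */
struct model_op {
    int value;
    int vectors[MODEL_MAX_VECS];
};

/* Models a handler that fills in vectors, then overwrites 'value' with 0. */
static int model_enable_msix(struct model_op *op)
{
    for (int i = 0; i < op->value && i < MODEL_MAX_VECS; i++)
        op->vectors[i] = 100 + i;
    op->value = 0; /* return value replaces the requested count */
    return 0;
}

/* Returns the number of vectors copied into 'response', or an error. */
int model_do_op(struct model_op *op, int *response)
{
    int nr = op->value; /* saved before the handler clobbers it */
    int err;

    if (nr < 0)
        nr = 0;
    if (nr > MODEL_MAX_VECS)
        nr = MODEL_MAX_VECS;

    err = model_enable_msix(op);
    if (err != 0)
        return err;

    for (int i = 0; i < nr; i++)
        response[i] = op->vectors[i];
    return nr;
}
```

Looping up to `op->value` after the call instead of the saved `nr` would copy nothing, which is the failure mode the message describes.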
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress · 98c65564
      Tetsuo Handa authored
      commit 564e81a5 upstream.
      
      Jan Stancek has reported that the system occasionally hangs after the "oom01"
      testcase from LTP triggers OOM.  Judging from the fact that there is a
      kworker thread doing memory allocation and that the values of "Node 0
      Normal free:" and "Node 0 Normal:" differ when hanging, vmstat is not
      up-to-date for some reason.
      
      According to commit 373ccbe5 ("mm, vmstat: allow WQ concurrency to
      discover memory reclaim doesn't make any progress"), it was meant to force
      the kworker thread to take a short sleep, but it mistakenly used
      schedule_timeout(1).  We missed that schedule_timeout() in state
      TASK_RUNNING doesn't do anything.
      
      Fix it by using schedule_timeout_uninterruptible(1) which forces the
      kworker thread to take a short sleep in order to make sure that vmstat
      is up-to-date.
      
      Fixes: 373ccbe5 ("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress")
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: Jan Stancek <jstancek@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Cristopher Lameter <clameter@sgi.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Arkadiusz Miskiewicz <arekm@maven.pl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge · 8c095d35
      John Stultz authored
      commit 833f32d7 upstream.
      
      Currently, leapsecond adjustments are done at tick time. As a result,
      the leapsecond was applied at the first timer tick *after* the
      leapsecond (~1-10ms late depending on HZ), rather than exactly on the
      second edge.
      
      This was in part historical from back when we were always tick based,
      but correcting this has since been avoided, because it adds extra
      conditional checks in the gettime fastpath, which has performance
      overhead.
      
      However, it was recently pointed out that ABS_TIME CLOCK_REALTIME
      timers set for right after the leapsecond could fire a second early,
      since some timers may be expired before we trigger the timekeeping
      timer, which then applies the leapsecond.
      
      This isn't quite as bad as it sounds, since behaviorally it is similar
      to what is possible w/ ntpd made leapsecond adjustments done w/o using
      the kernel discipline. Where due to latencies, timers may fire just
      prior to the settimeofday call. (Also, one should note that all
      applications using CLOCK_REALTIME timers should always be careful,
      since they are prone to quirks from settimeofday() disturbances.)
      
      However, the purpose of having the kernel do the leap adjustment is to
      avoid such latencies, so I think this is worth fixing.
      
      So in order to properly keep those timers from firing a second early,
      this patch modifies the ntp and timekeeping logic so that we keep
      enough state so that the update_base_offsets_now accessor, which
      provides the hrtimer core the current time, can check and apply the
      leapsecond adjustment on the second edge. This prevents the hrtimer
      core from expiring timers too early.
      
      This patch does not modify any other time read path, so no additional
      overhead is incurred. However, this also means that the leap-second
      continues to be applied at tick time for all other read-paths.
      
      Apologies to Richard Cochran, who pushed for similar changes years
      ago, which I resisted due to the concerns about the performance
      overhead.
      
      While I suspect this isn't extremely critical, folks who care about
      strict leap-second correctness will likely want to watch
      this. Potentially a -stable candidate eventually.
      Originally-suggested-by: Richard Cochran <richardcochran@gmail.com>
      Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
      Reported-by: Prarit Bhargava <prarit@redhat.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Link: http://lkml.kernel.org/r/1434063297-28657-4-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      [Yadi: Move do_adjtimex to timekeeping.c and solve context issues]
      Signed-off-by: Hu <yadi.hu@windriver.com>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • tcp: make challenge acks less predictable · d91a2aa4
      Eric Dumazet authored
      commit 75ff39cc upstream.
      
      Yue Cao claims that current host rate limiting of challenge ACKS
      (RFC 5961) could leak enough information to allow a patient attacker
      to hijack TCP sessions. He will soon provide details in an academic
      paper.
      
      This patch increases the default limit from 100 to 1000, and adds
      some randomization so that the attacker can no longer hijack
      sessions without spending a considerable amount of probes.
      
      Based on initial analysis and patch from Linus.
      
      Note that we also have per socket rate limiting, so it is tempting
      to remove the host limit in the future.
      
      v2: randomize the count of challenge acks per second, not the period.
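One plausible way to randomize the per-second count is sketched below. The exact formula and the helper names `model_u32_max()` and `model_challenge_budget()` are assumptions for illustration; the message only states that the count, not the period, is randomized. The bounded-random helper is written out by hand, in the spirit of the "open-code prandom_u32_max()" backport note.

```c
#include <stdint.h>

/* Open-coded bounded random: maps a 32-bit random value into [0, ep_ro). */
static uint32_t model_u32_max(uint32_t rnd, uint32_t ep_ro)
{
    return (uint32_t)(((uint64_t)rnd * ep_ro) >> 32);
}

/*
 * Challenge-ACK budget for the next one-second window: at least half of
 * the configured limit, plus a random share of up to (limit - 1) more.
 * 'rnd' is a fresh 32-bit random number supplied by the caller.
 */
uint32_t model_challenge_budget(uint32_t limit, uint32_t rnd)
{
    uint32_t half = (limit + 1) >> 1;

    return half + model_u32_max(rnd, limit);
}
```

With a limit of 1000 the budget falls somewhere in 500..1499 each second, so an observer can no longer infer state from an exact, fixed count.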
      
      Fixes: 282f23c6 ("tcp: implement RFC 5961 3.2")
      Reported-by: Yue Cao <ycao009@ucr.edu>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [lizf: Backported to 3.4:
       - adjust context
       - use ACCESS_ONCE instead of WRITE_ONCE/READ_ONCE
       - open-code prandom_u32_max()]
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • Revert "USB: Add OTG PET device to TPL" · 00e9ff59
      Zefan Li authored
      This reverts commit 97fa724b.
      
      Conflicts:
      	drivers/usb/core/quirks.c
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • Revert "USB: Add device quirk for ASUS T100 Base Station keyboard" · 7862b8a3
      Zefan Li authored
      This reverts commit eea5a87d.
      
      Conflicts:
      	drivers/usb/core/quirks.c
      	include/linux/usb/quirks.h
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • Fix incomplete backport of commit 0f792cf9 · d64519bf
      Zefan Li authored
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • Fix incomplete backport of commit 423f04d6 · 6883832b
      Zefan Li authored
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • ipv6: fix handling of blackhole and prohibit routes · af706acb
      Nicolas Dichtel authored
      commit ef2c7d7b upstream.
      
      When adding a blackhole or a prohibit route, they were handled like classic
      routes. Moreover, it was only possible to add this kind of route by specifying
      an interface.
      
      Bug already reported here:
        http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=498498
      
      Before the patch:
        $ ip route add blackhole 2001::1/128
        RTNETLINK answers: No such device
        $ ip route add blackhole 2001::1/128 dev eth0
        $ ip -6 route | grep 2001
        2001::1 dev eth0  metric 1024
      
      After:
        $ ip route add blackhole 2001::1/128
        $ ip -6 route | grep 2001
        blackhole 2001::1 dev lo  metric 1024  error -22
      
      v2: wrong patch
      v3: add a field fc_type in struct fib6_config to store RTN_* type
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • ipv6: don't call fib6_run_gc() until routing is ready · a4ea6252
      Michal Kubeček authored
      commit 2c861cc6 upstream.
      
      When loading the ipv6 module, ndisc_init() is called before
      ip6_route_init(). As the former registers a handler calling
      fib6_run_gc(), this opens a window to run the garbage collector
      before necessary data structures are initialized. If a network
      device is initialized in this window, adding MAC address to it
      triggers a NETDEV_CHANGEADDR event, leading to a crash in
      fib6_clean_all().
      
      Take the event handler registration out of ndisc_init() into a
      separate function ndisc_late_init() and move it after
      ip6_route_init().
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • ipv6: update ip6_rt_last_gc every time GC is run · 3b02ae3d
      Michal Kubeček authored
      commit 49a18d86 upstream.
      
      As pointed out by Eric Dumazet, net->ipv6.ip6_rt_last_gc should
      hold the last time garbage collector was run so that we should
      update it whenever fib6_run_gc() calls fib6_clean_all(), not only
      if we got there from ip6_dst_gc().
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • sctp: Prevent soft lockup when sctp_accept() is called during a timeout event · cc639575
      Karl Heiss authored
      commit 635682a1 upstream.
      
      A case can occur when sctp_accept() is called by the user during
      a heartbeat timeout event after the 4-way handshake.  Since
      sctp_assoc_migrate() changes both assoc->base.sk and assoc->ep, the
      bh_sock_lock in sctp_generate_heartbeat_event() will be taken with
      the listening socket but released with the new association socket.
      The result is a deadlock on any future attempts to take the listening
      socket lock.
      
      Note that this race can occur with other SCTP timeouts that take
      the bh_lock_sock() in the event sctp_accept() is called.
      
       BUG: soft lockup - CPU#9 stuck for 67s! [swapper:0]
       ...
       RIP: 0010:[<ffffffff8152d48e>]  [<ffffffff8152d48e>] _spin_lock+0x1e/0x30
       RSP: 0018:ffff880028323b20  EFLAGS: 00000206
       RAX: 0000000000000002 RBX: ffff880028323b20 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: ffff880028323be0 RDI: ffff8804632c4b48
       RBP: ffffffff8100bb93 R08: 0000000000000000 R09: 0000000000000000
       R10: ffff880610662280 R11: 0000000000000100 R12: ffff880028323aa0
       R13: ffff8804383c3880 R14: ffff880028323a90 R15: ffffffff81534225
       FS:  0000000000000000(0000) GS:ffff880028320000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
       CR2: 00000000006df528 CR3: 0000000001a85000 CR4: 00000000000006e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
       Process swapper (pid: 0, threadinfo ffff880616b70000, task ffff880616b6cab0)
       Stack:
       ffff880028323c40 ffffffffa01c2582 ffff880614cfb020 0000000000000000
       <d> 0100000000000000 00000014383a6c44 ffff8804383c3880 ffff880614e93c00
       <d> ffff880614e93c00 0000000000000000 ffff8804632c4b00 ffff8804383c38b8
       Call Trace:
       <IRQ>
       [<ffffffffa01c2582>] ? sctp_rcv+0x492/0xa10 [sctp]
       [<ffffffff8148c559>] ? nf_iterate+0x69/0xb0
       [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
       [<ffffffff8148c716>] ? nf_hook_slow+0x76/0x120
       [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
       [<ffffffff8149757d>] ? ip_local_deliver_finish+0xdd/0x2d0
       [<ffffffff81497808>] ? ip_local_deliver+0x98/0xa0
       [<ffffffff81496ccd>] ? ip_rcv_finish+0x12d/0x440
       [<ffffffff81497255>] ? ip_rcv+0x275/0x350
       [<ffffffff8145cfeb>] ? __netif_receive_skb+0x4ab/0x750
       ...
      
      With lockdep debugging:
      
       =====================================
       [ BUG: bad unlock balance detected! ]
       -------------------------------------
       CslRx/12087 is trying to release lock (slock-AF_INET) at:
       [<ffffffffa01bcae0>] sctp_generate_timeout_event+0x40/0xe0 [sctp]
       but there are no more locks to release!
      
       other info that might help us debug this:
       2 locks held by CslRx/12087:
       #0:  (&asoc->timers[i]){+.-...}, at: [<ffffffff8108ce1f>] run_timer_softirq+0x16f/0x3e0
       #1:  (slock-AF_INET){+.-...}, at: [<ffffffffa01bcac3>] sctp_generate_timeout_event+0x23/0xe0 [sctp]
      
      Ensure the socket taken is also the same one that is released by
      saving a copy of the socket before entering the timeout event
      critical section.
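The fix pattern, locking and unlocking through a saved pointer, can be sketched in userspace as below. This is a deliberately simplified model: locks are plain counters, the migration is triggered from inside the handler to stand in for a concurrent accept, and all `model_*` names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical socket whose lock is modelled as a depth counter. */
struct model_sk {
    int lock_depth;
};

/* Hypothetical association that points at its current socket. */
struct model_asoc {
    struct model_sk *sk;
};

static void model_lock(struct model_sk *sk)   { sk->lock_depth++; }
static void model_unlock(struct model_sk *sk) { sk->lock_depth--; }

/*
 * Timeout handler: take a copy of the socket pointer first, then lock and
 * unlock through that copy. 'migrate_to' (may be NULL) models an accept
 * that re-points the association while the handler is running.
 */
void model_timeout_event(struct model_asoc *asoc, struct model_sk *migrate_to)
{
    struct model_sk *sk = asoc->sk; /* saved before the critical section */

    model_lock(sk);
    if (migrate_to != NULL)
        asoc->sk = migrate_to;      /* asoc->sk no longer equals sk */
    model_unlock(sk);               /* release what was actually taken */
}
```

Unlocking via `asoc->sk` instead of the saved `sk` would leave the listening socket locked and unbalance the new one, matching the lockdep report quoted above.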
      Signed-off-by: Karl Heiss <kheiss@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2:
       - Net namespaces are not used
       - Keep using sctp_bh_{,un}lock_sock()
       - Adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • drm/radeon: fix hotplug race at startup · d78e7762
      Dave Airlie authored
      commit 7f98ca45 upstream.
      
      We apparently get a hotplug irq before we've initialised
      modesetting,
      
      [drm] Loading R100 Microcode
      BUG: unable to handle kernel NULL pointer dereference at   (null)
      IP: [<c125f56f>] __mutex_lock_slowpath+0x23/0x91
      *pde = 00000000
      Oops: 0002 [#1]
      Modules linked in: radeon(+) drm_kms_helper ttm drm i2c_algo_bit backlight pcspkr psmouse evdev sr_mod input_leds led_class cdrom sg parport_pc parport floppy intel_agp intel_gtt lpc_ich acpi_cpufreq processor button mfd_core agpgart uhci_hcd ehci_hcd rng_core snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm usbcore usb_common i2c_i801 i2c_core snd_timer snd soundcore thermal_sys
      CPU: 0 PID: 15 Comm: kworker/0:1 Not tainted 4.2.0-rc7-00015-gbf674028 #111
      Hardware name: MicroLink                               /D850MV                         , BIOS MV85010A.86A.0067.P24.0304081124 04/08/2003
      Workqueue: events radeon_hotplug_work_func [radeon]
      task: f6ca5900 ti: f6d3e000 task.ti: f6d3e000
      EIP: 0060:[<c125f56f>] EFLAGS: 00010282 CPU: 0
      EIP is at __mutex_lock_slowpath+0x23/0x91
      EAX: 00000000 EBX: f5e900fc ECX: 00000000 EDX: fffffffe
      ESI: f6ca5900 EDI: f5e90100 EBP: f5e90000 ESP: f6d3ff0c
       DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
      CR0: 8005003b CR2: 00000000 CR3: 36f61000 CR4: 000006d0
      Stack:
       f5e90100 00000000 c103c4c1 f6d2a5a0 f5e900fc f6df394c c125f162 f8b0faca
       f6d2a5a0 c138ca00 f6df394c f7395600 c1034741 00d40000 00000000 f6d2a5a0
       c138ca00 f6d2a5b8 c138ca10 c1034b58 00000001 f6d40000 f6ca5900 f6d0c940
      Call Trace:
       [<c103c4c1>] ? dequeue_task_fair+0xa4/0xb7
       [<c125f162>] ? mutex_lock+0x9/0xa
       [<f8b0faca>] ? radeon_hotplug_work_func+0x17/0x57 [radeon]
       [<c1034741>] ? process_one_work+0xfc/0x194
       [<c1034b58>] ? worker_thread+0x18d/0x218
       [<c10349cb>] ? rescuer_thread+0x1d5/0x1d5
       [<c103742a>] ? kthread+0x7b/0x80
       [<c12601c0>] ? ret_from_kernel_thread+0x20/0x30
       [<c10373af>] ? init_completion+0x18/0x18
      Code: 42 08 e8 8e a6 dd ff c3 57 56 53 83 ec 0c 8b 35 48 f7 37 c1 8b 10 4a 74 1a 89 c3 8d 78 04 8b 40 08 89 63
      Reported-and-Tested-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • udp: properly support MSG_PEEK with truncated buffers · 088be966
      Eric Dumazet authored
      commit 197c949e upstream.
      
      Backport of this upstream commit into stable kernels :
      89c22d8c ("net: Fix skb csum races when peeking")
      exposed a bug in udp stack vs MSG_PEEK support, when user provides
      a buffer smaller than skb payload.
      
      In this case,
      skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr),
                                       msg->msg_iov);
      returns -EFAULT.
      
      This bug does not happen in upstream kernels since Al Viro did a great
      job to replace this into :
      skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg);
      This variant is safe vs short buffers.
      
      For the time being, instead of reverting Herbert Xu's patch and adding back
      skb->ip_summed invalid changes, simply store the result of
      udp_lib_checksum_complete() so that we avoid computing the checksum a
      second time, and avoid the problematic
      skb_copy_and_csum_datagram_iovec() call.
      
      This patch can be applied on recent kernels as it avoids a double
      checksumming, then backported to stable kernels as a bug fix.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • net: Fix skb csum races when peeking · 538b3c02
      Herbert Xu authored
      [ Upstream commit 89c22d8c ]
      
      When we calculate the checksum on the recv path, we store the
      result in the skb as an optimisation in case we need the checksum
      again down the line.
      
      This is in fact bogus for the MSG_PEEK case as this is done without
      any locking.  So multiple threads can peek and then store the result
      to the same skb, potentially resulting in bogus skb states.
      
      This patch fixes this by only storing the result if the skb is not
      shared.  This preserves the optimisations for the few cases where
      it can be done safely due to locking or other reasons, e.g., SIOCINQ.
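The rule "cache the verdict only when the buffer is not shared" can be illustrated with a small model. `struct model_skb` and the helper names are assumptions for this sketch; the real sk_buff and checksum helpers carry considerably more state.

```c
#include <stdbool.h>

/* Hypothetical buffer: a user count plus a cached checksum verdict. */
struct model_skb {
    int users;         /* number of holders of this buffer */
    bool csum_cached;  /* set once the checksum is known to be good */
};

/* A buffer is shared when someone else also holds it. */
static bool model_shared(const struct model_skb *skb)
{
    return skb->users != 1;
}

/*
 * Verify a folded checksum (0 means valid). The verdict is written back
 * into the buffer only when no other holder could be writing to it at
 * the same time; otherwise it is just returned to the caller.
 */
bool model_checksum_complete(struct model_skb *skb, unsigned int folded_sum)
{
    bool ok = (folded_sum == 0);

    if (ok && !model_shared(skb))
        skb->csum_cached = true;
    return ok;
}
```

A peeking reader still gets the correct answer for a shared buffer; it simply does not store it, so the next reader recomputes.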
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • USB: ti_usb_3410_502: Fix ID table size · c6192e5d
      Ben Hutchings authored
      Commit 35a2fbc9 ("USB: serial: ti_usb_3410_5052: new device id for
      Abbot strip port cable") failed to update the size of the
      ti_id_table_3410 array.  This doesn't need to be fixed upstream
      following commit d7ece651 ("USB: ti_usb_3410_5052: remove
      vendor/product module parameters") but should be fixed in stable
      branches older than 3.12.
      
      Backports of commit c9d09dc7 ("USB: serial: ti_usb_3410_5052: add
      Abbott strip port ID to combined table as well.") similarly failed to
      update the size of the ti_id_table_combined array.
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • af_unix: fix a fatal race with bit fields · 5781d89c
      Eric Dumazet authored
      commit 60bc851a upstream.
      
      Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
      uses 64bit instructions to manipulate them.
      If the 64bit word includes any atomic_t or spinlock_t, we can lose
      critical concurrent changes.
      
      This is happening in af_unix, where unix_sk(sk)->gc_candidate/
      gc_maybe_cycle/lock share the same 64bit word.
      
      This leads to fatal deadlock, as one/several cpus spin forever
      on a spinlock that will never be available again.
      
      A safer way would be to use a long to store flags.
      This way we are sure compiler/arch won't do bad things.
      
      As we own unix_gc_lock spinlock when clearing or setting bits,
      we can use the non atomic __set_bit()/__clear_bit().
      
      recursion_level can share the same 64bit location with the spinlock,
      as it is set only with this spinlock held.
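A minimal sketch of the "flags in a long" layout follows. The bit numbers, `struct model_unix_sock` and the `model_*_bit()` helpers are hypothetical userspace stand-ins for the kernel's non-atomic bit operations; they assume the caller already holds the lock that serializes flag updates.

```c
/* Illustrative bit numbers within the flags word. */
#define MODEL_GC_CANDIDATE   0
#define MODEL_GC_MAYBE_CYCLE 1

/* Hypothetical socket: flags live in their own word, apart from the lock. */
struct model_unix_sock {
    unsigned long gc_flags;
    int lock; /* stand-in for a spinlock */
};

/* Non-atomic bit helpers; callers must hold the serializing lock. */
static void model_set_bit(int nr, unsigned long *addr)
{
    *addr |= 1UL << nr;
}

static void model_clear_bit(int nr, unsigned long *addr)
{
    *addr &= ~(1UL << nr);
}

static int model_test_bit(int nr, const unsigned long *addr)
{
    return (int)((*addr >> nr) & 1UL);
}

/* Mark or unmark a socket as a GC candidate without touching 'lock'. */
void model_mark_candidate(struct model_unix_sock *u, int on)
{
    if (on)
        model_set_bit(MODEL_GC_CANDIDATE, &u->gc_flags);
    else
        model_clear_bit(MODEL_GC_CANDIDATE, &u->gc_flags);
}
```

Every read-modify-write now covers exactly one `unsigned long`, so a wide store for the flags cannot overwrite a neighbouring lock word.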
      
      [1] bug fixed in gcc-4.8.0 :
      http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
      Reported-by: Ambrose Feinstein <ambrose@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: hejianet <hejianet@gmail.com>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • net: possible use after free in dst_release · edd32246
      Francesco Ruggeri authored
      commit 07a5d384 upstream.
      
      dst_release should not access dst->flags after decrementing
      __refcnt to 0. The dst_entry may be in dst_busy_list and
      dst_gc_task may dst_destroy it before dst_release gets a chance
      to access dst->flags.
      
      Fixes: d69bbf88 ("net: fix a race in dst_release()")
      Fixes: 27b75c95 ("net: avoid RCU for NOCACHE dst")
      Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      edd32246
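The ordering rule in the dst_release commit above (read what you need from the object before dropping your reference) can be sketched in userspace with C11 atomics. The `demo_dst` struct, the `DEMO_NOCACHE` flag and the return convention are assumptions made for this illustration, not the kernel's code.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_NOCACHE 0x1u /* hypothetical flag bit */

struct demo_dst {
    atomic_int refcnt;
    unsigned int flags;
};

/* Returns true when the caller dropped the last reference on an entry
 * flagged DEMO_NOCACHE. The flag is sampled BEFORE the decrement: once
 * the count reaches zero another thread may free the object, so
 * dst->flags must not be read after that point. */
static bool demo_dst_release(struct demo_dst *dst)
{
    bool nocache = (dst->flags & DEMO_NOCACHE) != 0;
    int newref = atomic_fetch_sub(&dst->refcnt, 1) - 1;

    return newref == 0 && nocache;
}
```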
    • Colin Ian King's avatar
      ftrace/scripts: Fix incorrect use of sprintf in recordmcount · 626c6e6e
      Colin Ian King authored
      commit 713a3e4d upstream.
      
      Fix build warning:
      
      scripts/recordmcount.c:589:4: warning: format not a string
      literal and no format arguments [-Wformat-security]
          sprintf("%s: failed\n", file);
      
      Fixes: a50bd439 ("ftrace/scripts: Have recordmcount copy the object file")
      Link: http://lkml.kernel.org/r/1451516801-16951-1-git-send-email-colin.king@canonical.com
      
      Cc: Li Bin <huawei.libin@huawei.com>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      626c6e6e
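The warning quoted in the recordmcount commit above arises because `sprintf()` takes the destination buffer as its first argument, so the format literal ends up where a writable buffer is expected. A hedged sketch of two well-formed alternatives follows; the `demo_*` function names are hypothetical, and which form the actual fix uses is not stated in the commit message.

```c
#include <stdio.h>

/* Report a failure on a stream: the stream comes first, then the
 * format string, then the arguments the format consumes. */
static int demo_report_failure(FILE *out, const char *file)
{
    return fprintf(out, "%s: failed\n", file);
}

/* Format the same message into a caller-supplied buffer. snprintf()
 * takes the buffer and its size before the format string and returns
 * the number of characters the full message needs. */
static int demo_format_failure(char *buf, size_t len, const char *file)
{
    return snprintf(buf, len, "%s: failed\n", file);
}
```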
    • Andrew Banman's avatar
      mm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone() · 42d0b53b
      Andrew Banman authored
      commit 5f0f2887 upstream.
      
      test_pages_in_a_zone() does not account for the possibility of missing
      sections in the given pfn range.  pfn_valid_within always returns 1 when
      CONFIG_HOLES_IN_ZONE is not set, allowing invalid pfns from missing
      sections to pass the test, leading to a kernel oops.
      
      Wrap an additional pfn loop with PAGES_PER_SECTION granularity to check
      for missing sections before proceeding into the zone-check code.
      
      This also prevents a crash from offlining memory devices with missing
      sections.  Despite this, it may be a good idea to keep the related patch
      '[PATCH 3/3] drivers: memory: prohibit offlining of memory blocks with
      missing sections' because missing sections in a memory block may lead to
      other problems not covered by the scope of this fix.
      Signed-off-by: Andrew Banman <abanman@sgi.com>
      Acked-by: Alex Thorlton <athorlton@sgi.com>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Greg KH <greg@kroah.com>
      Cc: Seth Jennings <sjennings@variantweb.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      42d0b53b
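The memory_hotplug commit above wraps the per-page check in an outer loop that walks the pfn range one section at a time. A rough userspace sketch of that loop shape follows. The presence array, the tiny section size and the choice to skip (rather than reject) a missing section are all assumptions made for illustration.

```c
#include <stdbool.h>

#define DEMO_PAGES_PER_SECTION 8UL /* hypothetical, small for illustration */

/* Counts the pfns in [start_pfn, end_pfn) that would reach the per-page
 * check. section_present[] is a hypothetical map with one entry per
 * section; pfns in missing sections are never examined. */
static unsigned long demo_count_checked_pfns(const bool *section_present,
                                             unsigned long start_pfn,
                                             unsigned long end_pfn)
{
    unsigned long checked = 0;
    unsigned long sec_start = start_pfn - (start_pfn % DEMO_PAGES_PER_SECTION);

    for (; sec_start < end_pfn; sec_start += DEMO_PAGES_PER_SECTION) {
        unsigned long sec_end = sec_start + DEMO_PAGES_PER_SECTION;
        unsigned long pfn = sec_start < start_pfn ? start_pfn : sec_start;

        /* Outer check at section granularity: skip holes entirely. */
        if (!section_present[sec_start / DEMO_PAGES_PER_SECTION])
            continue;

        /* Inner loop: the zone check would be performed on each pfn here. */
        for (; pfn < sec_end && pfn < end_pfn; pfn++)
            checked++;
    }
    return checked;
}
```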
    • Joseph Qi's avatar
      ocfs2: fix BUG when calculate new backup super · c173fefc
      Joseph Qi authored
      commit 5c9ee4cb upstream.
      
      When resizing, it first extends the last gd.  If a backup super
      belongs in that gd, it calculates the new backup super and updates the
      corresponding value.
      
      But it currently doesn't consider the case where the backup super has
      already been done.  In that case it still sets the bit in the gd bitmap
      and then subtracts it from bg_free_bits_count, which leads to a
      corrupted gd and triggers the BUG in ocfs2_block_group_set_bits:
      
          BUG_ON(le16_to_cpu(bg->bg_free_bits_count) < num_bits);
      
      So check whether the backup super is done and then do the updates.
      Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
      Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
      Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      c173fefc
    • Andrey Ryabinin's avatar
      ipv6/addrlabel: fix ip6addrlbl_get() · 2c165804
      Andrey Ryabinin authored
      commit e459dfee upstream.
      
      ip6addrlbl_get() has never worked. If ip6addrlbl_hold() succeeds,
      ip6addrlbl_get() exits with '-ESRCH'. If ip6addrlbl_hold() fails,
      ip6addrlbl_get() uses an about-to-be-freed ip6addrlbl_entry pointer.
      
      Fix this by inverting ip6addrlbl_hold() check.
      
      Fixes: 2a8cc6c8 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.")
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Reviewed-by: Cong Wang <cwang@twopensource.com>
      Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      2c165804
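The inverted check described in the ip6addrlbl_get() commit above can be illustrated with a small single-threaded model. The `demo_entry` type and the `demo_*` functions are hypothetical; the real code takes the reference atomically, which this sketch does not attempt.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_entry {
    int refcnt; /* 0 means the entry is already on its way to being freed */
    int label;
};

/* Modeled on an "increment unless zero" primitive: returns true only
 * if a reference was actually taken. */
static bool demo_hold(struct demo_entry *e)
{
    if (e->refcnt == 0)
        return false;
    e->refcnt++;
    return true;
}

/* Corrected sense of the check: a FAILED hold means the entry must be
 * treated as absent; a successful hold lets the lookup proceed. */
static int demo_get(struct demo_entry *e, int *label_out)
{
    if (e != NULL && !demo_hold(e))
        e = NULL;
    if (e == NULL)
        return -ESRCH;

    *label_out = e->label;
    e->refcnt--; /* drop the reference taken above */
    return 0;
}
```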
    • Helge Deller's avatar
      parisc: Fix syscall restarts · 5244151e
      Helge Deller authored
      commit 71a71fb5 upstream.
      
      On parisc, syscalls which are interrupted by signals sometimes failed to
      restart and instead returned -ENOSYS, which in the worst case led to
      userspace crashes.
      A similar problem existed on MIPS and was fixed by commit e967ef02
      ("MIPS: Fix restart of indirect syscalls").
      
      On parisc the current syscall restart code assumes that all syscall
      callers load the syscall number in the delay slot of the ble
      instruction. That's how it is e.g. done in the unistd.h header file:
      	ble 0x100(%sr2, %r0)
      	ldi #syscall_nr, %r20
      Because of that assumption the current code never restored %r20 before
      returning to userspace.
      
      This assumption is at least not true for code which uses the glibc
      syscall() function, which instead uses this syntax:
      	ble 0x100(%sr2, %r0)
      	copy regX, %r20
      where regX depends on how the compiler optimizes the code and register
      usage.
      
      This patch fixes this problem by adding code to analyze how the syscall
      number is loaded in the branch delay slot and - if needed - copy the
      syscall number to regX prior to returning to userspace for the syscall
      restart.
      Signed-off-by: Helge Deller <deller@gmx.de>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      5244151e
    • David Howells's avatar
      KEYS: Fix race between read and revoke · 87f4dcb8
      David Howells authored
      commit b4a1b4f5 upstream.
      
      This fixes CVE-2015-7550.
      
      There's a race between keyctl_read() and keyctl_revoke().  If the revoke
      happens between keyctl_read() checking the validity of a key and the key's
      semaphore being taken, then the key type read method will see a revoked key.
      
      This causes a problem for the user-defined key type because it assumes in
      its read method that there will always be a payload in a non-revoked key
      and doesn't check for a NULL pointer.
      
      Fix this by making keyctl_read() check the validity of a key after taking
      semaphore instead of before.
      
      I think the bug was introduced with the original keyrings code.
      
      This was discovered by a multithreaded test program generated by syzkaller
      (http://github.com/google/syzkaller).  Here's a cleaned up version:
      
      	#include <sys/types.h>
      	#include <keyutils.h>
      	#include <pthread.h>
      	void *thr0(void *arg)
      	{
      		key_serial_t key = (unsigned long)arg;
      		keyctl_revoke(key);
      		return 0;
      	}
      	void *thr1(void *arg)
      	{
      		key_serial_t key = (unsigned long)arg;
      		char buffer[16];
      		keyctl_read(key, buffer, 16);
      		return 0;
      	}
      	int main()
      	{
      		key_serial_t key = add_key("user", "%", "foo", 3, KEY_SPEC_USER_KEYRING);
      		pthread_t th[5];
      		pthread_create(&th[0], 0, thr0, (void *)(unsigned long)key);
      		pthread_create(&th[1], 0, thr1, (void *)(unsigned long)key);
      		pthread_create(&th[2], 0, thr0, (void *)(unsigned long)key);
      		pthread_create(&th[3], 0, thr1, (void *)(unsigned long)key);
      		pthread_join(th[0], 0);
      		pthread_join(th[1], 0);
      		pthread_join(th[2], 0);
      		pthread_join(th[3], 0);
      		return 0;
      	}
      
      Build as:
      
      	cc -o keyctl-race keyctl-race.c -lkeyutils -lpthread
      
      Run as:
      
      	while keyctl-race; do :; done
      
      as it may need several iterations to crash the kernel.  The crash can be
      summarised as:
      
      	BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      	IP: [<ffffffff81279b08>] user_read+0x56/0xa3
      	...
      	Call Trace:
      	 [<ffffffff81276aa9>] keyctl_read_key+0xb6/0xd7
      	 [<ffffffff81277815>] SyS_keyctl+0x83/0xe0
      	 [<ffffffff815dbb97>] entry_SYSCALL_64_fastpath+0x12/0x6f
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      87f4dcb8
    • Alan Stern's avatar
      USB: fix invalid memory access in hub_activate() · 7a42c72c
      Alan Stern authored
      commit e50293ef upstream.
      
      Commit 8520f380 ("USB: change hub initialization sleeps to
      delayed_work") changed the hub_activate() routine to make part of it
      run in a workqueue.  However, the commit failed to take a reference to
      the usb_hub structure or to lock the hub interface while doing so.  As
      a result, if a hub is plugged in and quickly unplugged before the work
      routine can run, the routine will try to access memory that has been
      deallocated.  Or, if the hub is unplugged while the routine is
      running, the memory may be deallocated while it is in active use.
      
      This patch fixes the problem by taking a reference to the usb_hub at
      the start of hub_activate() and releasing it at the end (when the work
      is finished), and by locking the hub interface while the work routine
      is running.  It also adds a check at the start of the routine to see
      if the hub has already been disconnected, in which case nothing should
      be done.
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Reported-by: Alexandru Cornea <alexandru.cornea@intel.com>
      Tested-by: Alexandru Cornea <alexandru.cornea@intel.com>
      Fixes: 8520f380 ("USB: change hub initialization sleeps to delayed_work")
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      [lizf: Backported to 3.4: add forward declaration of hub_release()]
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      7a42c72c
    • Dan Carpenter's avatar
      USB: ipaq.c: fix a timeout loop · 681e2852
      Dan Carpenter authored
      commit abdc9a3b upstream.
      
      The code expects the loop to end with "retries" set to zero but, because
      it is a post-op, it will end set to -1.  I have fixed this by moving the
      decrement inside the loop.
      
      Fixes: 014aa2a3 ('USB: ipaq: minor ipaq_open() cleanup.')
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      681e2852
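The off-by-one described in the ipaq commit above is easy to reproduce in isolation. The sketch below contrasts the two loop shapes; `demo_always_fail()` is a hypothetical stand-in for the operation being retried, and both loops assume a non-negative starting count.

```c
/* Hypothetical operation that never succeeds, so every retry is used. */
static int demo_always_fail(void)
{
    return -1;
}

/* Post-decrement in the loop condition: the final, failing test still
 * decrements, so an exhausted counter ends at -1 rather than 0. */
static int demo_retries_left_postop(int retries)
{
    while (retries--) {
        if (demo_always_fail() == 0)
            break;
    }
    return retries;
}

/* Decrement moved inside the loop body: an exhausted counter ends at
 * exactly 0, so a later "!retries" test detects the timeout. */
static int demo_retries_left_fixed(int retries)
{
    while (retries) {
        retries--;
        if (demo_always_fail() == 0)
            break;
    }
    return retries;
}
```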
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set. · c053c997
      Konrad Rzeszutek Wilk authored
      commit 408fb0e5 upstream.
      
      commit f598282f ("PCI: Fix the NIU MSI-X problem in a better way")
      teaches us that dealing with MSI-X can be troublesome.
      
      Further checks of the MSI-X architecture show that if the
      PCI_COMMAND_MEMORY bit is turned off in PCI_COMMAND we
      may not be able to access the BARs (since they are memory regions).
      
      Since the MSI-X tables are located there, that can lead
      to us causing PCIe errors. Inhibit any operation on
      MSI-X unless the MEMORY bit is set.
      
      Note that the Xen hypervisor with:
      "x86/MSI-X: access MSI-X table only after having enabled MSI-X"
      will return:
      xen_pciback: 0000:0a:00.1: error -6 enabling MSI-X for guest 3!
      
      when the generic MSI code tries to set up the PIRQ without the
      MEMORY bit set. This means that with later versions of Xen
      (4.6) this patch is not necessary.
      
      This is part of XSA-157
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      c053c997
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has MSI(X) enabled. · 7e8c20ac
      Konrad Rzeszutek Wilk authored
      commit 7cfb905b upstream.
      
      Otherwise just continue on, returning the same values as
      previously (return of 0, and op->result has the PIRQ value).
      
      This does not change the behavior of XEN_PCI_OP_disable_msi[|x].
      
      The pci_disable_msi or pci_disable_msix have the checks for
      msi_enabled or msix_enabled so they will error out immediately.
      
      However the guest can still call these operations and cause
      us to disable the 'ack_intr'. That means the backend IRQ handler
      for the legacy interrupt will not respond to interrupts anymore.
      
      If the device is causing an interrupt storm, this will lead
      the Linux generic code to disable the interrupt line.
      
      Naturally this will only happen if the device in question
      is plugged in on the motherboard on a shared level interrupt GSI.
      
      This is part of XSA-157
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      7e8c20ac
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: Do not install an IRQ handler for MSI interrupts. · 073a5ac1
      Konrad Rzeszutek Wilk authored
      commit a396f3a2 upstream.
      
      Otherwise a guest can subvert the generic MSI code to trigger
      a BUG_ON condition during MSI interrupt freeing:
      
       for (i = 0; i < entry->nvec_used; i++)
              BUG_ON(irq_has_action(entry->irq + i));
      
      The Xen PCI backend installs an IRQ handler (request_irq) for
      dev->irq whenever the guest writes PCI_COMMAND_MEMORY
      (or PCI_COMMAND_IO) to the PCI_COMMAND register. This is
      done in case the device has legacy interrupts and the GSI line
      is shared by the backend devices.
      
      To subvert the backend the guest needs to make the backend
      change dev->irq from the GSI to the MSI interrupt line,
      make the backend allocate an interrupt handler, and then command
      the backend to free the MSI interrupt and hit the BUG_ON.
      
      Since the backend only calls 'request_irq' when the guest
      writes to the PCI_COMMAND register the guest needs to call
      XEN_PCI_OP_enable_msi before any other operation. This will
      cause the generic MSI code to setup an MSI entry and
      populate dev->irq with the new PIRQ value.
      
      Then the guest can write to PCI_COMMAND PCI_COMMAND_MEMORY
      and cause the backend to setup an IRQ handler for dev->irq
      (which instead of the GSI value has the MSI pirq). See
      'xen_pcibk_control_isr'.
      
      Then the guest disables the MSI: XEN_PCI_OP_disable_msi
      which ends up triggering the BUG_ON condition in 'free_msi_irqs'
      as there is an IRQ handler for the entry->irq (dev->irq).
      
      Note that this cannot be done using MSI-X as the generic
      code does not over-write dev->irq with the MSI-X PIRQ values.
      
      The patch inhibits setting up the IRQ handler if MSI or
      MSI-X (for symmetry reasons) code had been called successfully.
      
      P.S.
      When Xen PCIBack sets up the device for guest consumption it
      ends up writing 0 to PCI_COMMAND (see xen_pcibk_reset_device).
      XSA-120 addendum patch removed that - however when upstreaming said
      addendum we found that it caused issues with qemu upstream. That
      has now been fixed in qemu upstream.
      
      This is part of XSA-157
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      073a5ac1
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or MSI-X enabled · 8f805dec
      Konrad Rzeszutek Wilk authored
      commit 5e0ce145 upstream.
      
      The guest sequence of:
      
        a) XEN_PCI_OP_enable_msix
        b) XEN_PCI_OP_enable_msix
      
      results in hitting a NULL pointer due to using freed pointers.
      
      The device passed in the guest MUST have MSI-X capability.
      
      Step a) constructs a SysFS representation of MSI and MSI groups.
      Step b) adds a second set of them, but adding them to SysFS fails (duplicate entry).
      'populate_msi_sysfs' frees the newly allocated msi_irq_groups (note that
      in a) pdev->msi_irq_groups is still set) and also frees ALL of the
      MSI-X entries of the device (the ones allocated in steps a) and b)).
      
      The unwind code, 'free_msi_irqs', deletes all the entries and tries to
      delete pdev->msi_irq_groups (which hasn't been set to NULL).
      However the pointers in SysFS are already freed and we hit a
      NULL pointer further on when 'strlen' is attempted on a freed pointer.
      
      The patch adds a simple check in XEN_PCI_OP_enable_msix to guard
      against that. The check for msi_enabled is not strictly necessary.
      
      This is part of XSA-157
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      8f805dec
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled · d3ec8867
      Konrad Rzeszutek Wilk authored
      commit 56441f3c upstream.
      
      The guest sequence of:
      
       a) XEN_PCI_OP_enable_msi
       b) XEN_PCI_OP_enable_msi
       c) XEN_PCI_OP_disable_msi
      
      results in hitting a BUG_ON condition in the msi.c code.
      
      The MSI code uses a dev->msi_list to which it adds MSI entries.
      Under the above conditions a BUG_ON() can be hit. The device
      passed in the guest MUST have MSI capability.
      
      Step a) adds the entry to dev->msi_list and sets msi_enabled.
      Step b) adds a second entry, but adding it to SysFS fails (duplicate entry);
      it deletes all of the entries from msi_list and returns (with msi_enabled
      still set).  In step c) pci_disable_msi passes the msi_enabled checks and hits:
      
      BUG_ON(list_empty(dev_to_msi_list(&dev->dev)));
      
      and blows up.
      
      The patch adds a simple check in XEN_PCI_OP_enable_msi to guard
      against that. The check for msix_enabled is not strictly necessary.
      
      This is part of XSA-157.
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      d3ec8867
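The guard described in the two enable_msi/enable_msix commits above amounts to refusing a second enable request while either mode is already active. A minimal model of that check follows; the `demo_pci_dev` struct, the `enable_calls` counter and the `-EALREADY` return value are assumptions for illustration and are not taken from the commit messages.

```c
#include <errno.h>
#include <stdbool.h>

struct demo_pci_dev {
    bool msi_enabled;
    bool msix_enabled;
    int enable_calls; /* counts how often the real enable path would run */
};

/* Refuse the request when MSI or MSI-X is already enabled, so that the
 * underlying enable path never runs twice for the same device. */
static int demo_enable_msi(struct demo_pci_dev *dev)
{
    if (dev->msi_enabled || dev->msix_enabled)
        return -EALREADY; /* hypothetical error code */

    dev->enable_calls++;
    dev->msi_enabled = true;
    return 0;
}
```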
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: Save xen_pci_op commands before processing it · 550fe257
      Konrad Rzeszutek Wilk authored
      commit 8135cf8b upstream.
      
      Double fetch vulnerabilities happen when a variable is
      fetched twice from shared memory but a security check is only
      performed the first time.
      
      The xen_pcibk_do_op function performs a switch statement on the op->cmd
      value, which is stored in shared memory. Interestingly this can result
      in a double fetch vulnerability depending on the compiler
      optimization performed.
      
      This patch fixes it by saving the xen_pci_op command before
      processing it. We also use 'barrier' to make sure that the
      compiler does not perform any optimization.
      
      This is part of XSA155.
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Jan Beulich <JBeulich@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      550fe257
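The "save the command before processing it" approach from the commit above can be sketched as follows. This userspace version uses a `volatile` read to force a single fetch, whereas the commit message mentions a compiler barrier; the `demo_op` struct, `DEMO_CMD_MAX` and the return convention are hypothetical.

```c
#define DEMO_CMD_MAX 3u /* hypothetical highest valid command */

struct demo_op {
    unsigned int cmd; /* lives in memory the other side can rewrite */
};

/* Returns the validated command, or -1 if it is out of range. The
 * shared field is read exactly once into a local; both the range
 * check and the later use operate on that local copy only. */
static int demo_dispatch(const volatile struct demo_op *shared)
{
    unsigned int cmd = shared->cmd; /* the single fetch */

    if (cmd > DEMO_CMD_MAX)
        return -1;
    return (int)cmd;
}
```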
    • Roger Pau Monné's avatar
      xen-blkback: only read request operation from shared ring once · a7bc1af5
      Roger Pau Monné authored
      commit 1f13d75c upstream.
      
      A compiler may load a switch statement value multiple times, which could
      be bad when the value is in memory shared with the frontend.
      
      When converting a non-native request to a native one, ensure that
      src->operation is only loaded once by using READ_ONCE().
      
      This is part of XSA155.
      Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [lizf: Backported to 3.4:
       - adjust context
       - call ACCESS_ONCE instead of READ_ONCE]
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      a7bc1af5
    • David Vrabel's avatar
      xen-netback: use RING_COPY_REQUEST() throughout · f97ed0a9
      David Vrabel authored
      commit 68a33bfd upstream.
      
      Instead of open-coding memcpy()s and directly accessing Tx and Rx
      requests, use the new RING_COPY_REQUEST() that ensures the local copy
      is correct.
      
      This is more than is strictly necessary for guest Rx requests since
      only the id and gref fields are used and it is harmless if the
      frontend modifies these.
      
      This is part of XSA155.
      Reviewed-by: Wei Liu <wei.liu2@citrix.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [lizf: Backported to 3.4:
       - adjust context
       - s/queue/vif/g]
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      f97ed0a9
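The idea behind copying a whole ring request before using it, as the xen-netback commit above describes, can be sketched like this. It is not the definition of RING_COPY_REQUEST(); the `demo_req` layout and `demo_snapshot_request()` are hypothetical, and the barrier uses GCC/Clang inline-asm syntax.

```c
#include <string.h>

/* Hypothetical request layout; the real ring structures differ. */
struct demo_req {
    unsigned short id;
    unsigned int gref;
    unsigned short size;
};

/* Take a private snapshot of a request that lives in shared memory.
 * The compiler barrier after the copy keeps the compiler from
 * re-reading the shared slot in place of the local copy. */
static struct demo_req demo_snapshot_request(const struct demo_req *shared_slot)
{
    struct demo_req local;

    memcpy(&local, shared_slot, sizeof(local));
    __asm__ __volatile__("" ::: "memory"); /* compiler barrier (GCC/Clang) */
    return local;
}
```

All validation and later use should then refer to the returned copy, so a frontend that rewrites the slot afterwards cannot change what was checked.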