1. 11 Mar, 2014 40 commits
    • ASoC: sglt5000: Fix the default value of CHIP_SSS_CTRL · 9e611474
      Fabio Estevam authored
      commit 016fcab8 upstream.
      
      According to the sgtl5000 reference manual, the default value of CHIP_SSS_CTRL
      is 0x10.

      Reported-by: Oskar Schirmer <oskar@scara.com>
      Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
      Signed-off-by: Mark Brown <broonie@linaro.org>
      [bwh: Backported to 3.2: format of register defaults array is different]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Weng Meiling <wengmeiling.weng@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ASoC: imx-ssi: Fix occasional AC97 reset failure · d17395ac
      Sascha Hauer authored
      commit b6e51600 upstream.

      Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
      Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
      Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
      [bwh: Backported to 3.2: adjust filename]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Weng Meiling <wengmeiling.weng@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • SUNRPC: Prevent an rpc_task wakeup race · d4d811d5
      Trond Myklebust authored
      commit a3c3cac5 upstream.

      The lockless RPC_IS_QUEUED() test in __rpc_execute means that we need to
      be careful about ordering the calls to rpc_test_and_set_running(task) and
      rpc_clear_queued(task). If we get the order wrong, then we may end up
      testing the RPC_TASK_RUNNING flag after __rpc_execute() has looped
      and changed the state of the rpc_task.

      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Weng Meiling <wengmeiling.weng@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sunrpc: clarify comments on rpc_make_runnable · e8d5ce17
      Jeff Layton authored
      commit 506026c3 upstream.

      rpc_make_runnable is not generally called with the queue lock held, unless
      it's waking up a task that has been sitting on a waitqueue. This is safe
      when the task has not entered the FSM yet, but the comments don't really
      spell this out.

      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Weng Meiling <wengmeiling.weng@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xen/events: mask events when changing their VCPU binding · b7d2a5e8
      David Vrabel authored
      commit 5e72fdb8 upstream.
      
      commit 4704fe4f upstream.
      
      When an event is being bound to a VCPU there is a window between the
      EVTCHNOP_bind_vcpu call and the adjustment of the local per-cpu masks
      where an event may be lost.  The hypervisor upcalls the new VCPU but
      the kernel thinks that event is still bound to the old VCPU and
      ignores it.

      There is even a problem when the event is being bound to the same VCPU
      as there is a small window between the clear_bit() and set_bit() calls
      in bind_evtchn_to_cpu().  When scanning for pending events, the kernel
      may read the bit when it is momentarily clear and ignore the event.
      
      Avoid this by masking the event during the whole bind operation.

      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      [bwh: Backported to 3.2: remove the BM() cast]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Yijing Wang <wangyijing@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xen/blkback: Check for insane amounts of request on the ring (v6). · 62047439
      Konrad Rzeszutek Wilk authored
      commit 9371cadb upstream.
      
      commit 8e3f8755 upstream.
      
      Check that the ring does not have an insane amount of requests
      (more than there could fit on the ring).
      
      If we detect this case we will stop processing the requests
      and wait until the XenBus disconnects the ring.
      
      The existing check, RING_REQUEST_CONS_OVERFLOW, compares how many
      responses we have created in the past (rsp_prod_pvt) with the
      requests consumed (req_cons) and tests whether the difference is
      greater than or equal to the size of the ring. It does not catch this case.

      What the condition does check is whether there is a need to process more,
      as we still have a backlog of responses to finish. Note that neither
      of those values (rsp_prod_pvt and req_cons) is exposed on the
      shared ring.

      To understand this problem, a mini crash course in ring protocol
      response/request updates is in order.
      
      There are four entries: req_prod and rsp_prod; req_event and rsp_event
      to track the ring entries. We are only concerned about the first two -
      which set the tone of this bug.
      
      The req_prod is a value incremented by frontend for each request put
      on the ring. Conversely the rsp_prod is a value incremented by the backend
      for each response put on the ring (rsp_prod gets set by rsp_prod_pvt when
      pushing the responses on the ring).  Both values can
      wrap and are used modulo the size of the ring (in the block case that is 32).
      Please see RING_GET_REQUEST and RING_GET_RESPONSE for more details.
      
      The culprit here is that if the difference between the
      req_prod and req_cons is greater than the ring size we have a problem.
      Fortunately for us, the '__do_block_io_op' loop:
      
      	rc = blk_rings->common.req_cons;
      	rp = blk_rings->common.sring->req_prod;
      
      	while (rc != rp) {
      
      		..
      		blk_rings->common.req_cons = ++rc; /* before make_response() */
      
      	}
      
      will loop up to the point when rc == rp. The macro inside the
      loop (RING_GET_REQUEST) is smart and indexes modulo the ring
      size. If the frontend has provided a bogus req_prod value
      we will loop until 'rc == rp', which means we could be processing
      already processed requests (or responses) repeatedly.

      The reason RING_REQUEST_CONS_OVERFLOW is not helping here is
      that it only tracks how many responses we have internally produced
      and whether we should process more. The astute reader will
      notice that the macro RING_REQUEST_CONS_OVERFLOW takes two
      arguments - more on this later.
      
      For example, if we were to enter this function with these values:
      
             	blk_rings->common.sring->req_prod =  X+31415 (X is the value from
      		the last time __do_block_io_op was called).
              blk_rings->common.req_cons = X
              blk_rings->common.rsp_prod_pvt = X
      
      The RING_REQUEST_CONS_OVERFLOW(&blk_rings->common, blk_rings->common.req_cons)
      is doing:
      
      	req_cons - rsp_prod_pvt >= 32
      
      Which is,
      	X - X >= 32 or 0 >= 32
      
      And that is false, so we continue on looping (this bug).
      
      If we re-use said macro RING_REQUEST_CONS_OVERFLOW and pass in rp
      (sring->req_prod) instead of rc, then this macro can do the check:

           req_prod - rsp_prod_pvt >= 32
      
      Which is,
             X + 31415 - X >= 32 , or 31415 >= 32
      
      which is true, so we can error out and break out of the function.
      
      Unfortunately the difference between rsp_prod_pvt and req_prod can be
      exactly 32 (which would error out in the macro). This condition exists when
      the backend is lagging behind with the responses and still has not finished
      responding to all of them (so make_response has not been called), and
      rsp_prod_pvt + 32 == req_cons. This ends up with us not being able
      to use said macro.
      
      Hence introducing a new macro called RING_REQUEST_PROD_OVERFLOW which does
      a simple check of:
      
          req_prod - rsp_prod_pvt > RING_SIZE
      
      And with the X values from above:
      
         X + 31415 - X > 32
      
      Returns true. Also note that if the ring is full (which is where
      the RING_REQUEST_CONS_OVERFLOW triggered), we would not hit the
      same condition:
      
         X + 32 - X > 32
      
      Which is false.
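
      The arithmetic above can be sketched in plain C. This is only an
      illustration, not the kernel's macros: the names sketch_ring,
      sketch_cons_overflow and sketch_prod_overflow are hypothetical, the ring
      size is fixed at 32 as in the block case, and the indices are assumed to
      be free-running unsigned 32-bit counters so that subtraction still yields
      the right distance after wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative ring size; the commit message uses 32 for the block ring. */
#define SKETCH_RING_SIZE 32u

/* Hypothetical stand-in for the backend's private ring state. */
struct sketch_ring {
    uint32_t req_cons;     /* requests consumed by the backend */
    uint32_t rsp_prod_pvt; /* responses produced, possibly not yet published */
};

/* The ">=" check described for RING_REQUEST_CONS_OVERFLOW. */
static bool sketch_cons_overflow(const struct sketch_ring *r, uint32_t idx)
{
    assert(r != NULL);
    return (uint32_t)(idx - r->rsp_prod_pvt) >= SKETCH_RING_SIZE;
}

/* The ">" check described for RING_REQUEST_PROD_OVERFLOW. */
static bool sketch_prod_overflow(const struct sketch_ring *r, uint32_t req_prod)
{
    assert(r != NULL);
    return (uint32_t)(req_prod - r->rsp_prod_pvt) > SKETCH_RING_SIZE;
}
```

      With req_cons == rsp_prod_pvt == X, the ">" variant rejects a req_prod of
      X + 31415 but accepts X + 32 (a full ring), whereas the ">=" variant
      would also reject the full ring.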
      
      Let's use that macro.
      Note that in v5 of this patchset the macro was different - we used an
      earlier version.
      
      [v1: Move the check outside the loop]
      [v2: Add a pr_warn as suggested by David]
      [v3: Use RING_REQUEST_CONS_OVERFLOW as suggested by Jan]
      [v4: Move wake_up after kthread_stop as suggested by Jan]
      [v5: Use RING_REQUEST_PROD_OVERFLOW instead]
      [v6: Use RING_REQUEST_PROD_OVERFLOW - Jan's version]

      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Yijing Wang <wangyijing@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xen/io/ring.h: new macro to detect whether there are too many requests on the ring · 23ced59b
      Jan Beulich authored
      commit 8d925690 upstream.
      
      Backends may need to protect themselves against an insane number of
      produced requests stored by a frontend, in case they iterate over
      requests until reaching the req_prod value. There can't be more
      requests on the ring than the difference between produced requests
      and produced (but possibly not yet published) responses.
      
      This is a more strict alternative to a patch previously posted by
      Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>.

      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Yijing Wang <wangyijing@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xen-netback: don't disconnect frontend when seeing oversize packet · f1369580
      Wei Liu authored
      commit 03393fd5 upstream.
      
      Some frontend drivers are sending packets > 64 KiB in length. This length
      overflows the length field in the first slot making the following slots have
      an invalid length.
      
      Turn this error back into a non-fatal error by dropping the packet. To avoid
      having the following slots having fatal errors, consume all slots in the
      packet.
      
      This does not reopen the security hole in XSA-39, as a packet with an
      invalid number of slots will still hit the fatal error case.

      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Wei Liu <wei.liu2@citrix.com>
      Acked-by: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Yijing Wang <wangyijing@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xen-netback: coalesce slots in TX path and fix regressions · 9832f4a0
      Wei Liu authored
      commit 2810e5b9 upstream.
      
      This patch tries to coalesce tx requests when constructing grant copy
      structures. It enables netback to deal with situation when frontend's
      MAX_SKB_FRAGS is larger than backend's MAX_SKB_FRAGS.
      
      With the help of coalescing, this patch tries to address two regressions
      and avoid reopening the security hole in XSA-39.

      Regression 1. The reduction of the number of supported ring entries (slots)
      per packet (from 18 to 17). This regression has been around for some time but
      remained unnoticed until the XSA-39 security fix. This is fixed by coalescing
      slots.
      
      Regression 2. The XSA-39 security fix turning "too many frags" errors from
      just dropping the packet to a fatal error and disabling the VIF. This is fixed
      by coalescing slots (handling 18 slots when backend's MAX_SKB_FRAGS is 17)
      which rules out false positive (using 18 slots is legit) and dropping packets
      using 19 to `max_skb_slots` slots.
      
      To avoid reopening the security hole in XSA-39, a frontend sending a packet
      using more than max_skb_slots slots is considered malicious.

      The behavior of netback for a packet is thus:
      
          1-18            slots: valid
         19-max_skb_slots slots: drop and respond with an error
         max_skb_slots+   slots: fatal error
      
      max_skb_slots is configurable by admin, default value is 20.
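
      The table above amounts to a three-way classification on the slot count.
      A minimal sketch follows; the enum, the function name sketch_classify and
      the constant SKETCH_LEGACY_SLOTS are hypothetical and are not taken from
      netback. The threshold of 18 comes from the table, and max_skb_slots is
      assumed to be at least 18.

```c
#include <assert.h>

/* Hypothetical verdicts mirroring the three rows of the table. */
enum sketch_verdict { SKETCH_VALID, SKETCH_DROP, SKETCH_FATAL };

/* Slot count that is always accepted according to the table. */
#define SKETCH_LEGACY_SLOTS 18u

static enum sketch_verdict sketch_classify(unsigned int slots,
                                           unsigned int max_skb_slots)
{
    /* Assumption of this sketch: at least one slot, limit not below 18. */
    assert(slots >= 1 && max_skb_slots >= SKETCH_LEGACY_SLOTS);

    if (slots <= SKETCH_LEGACY_SLOTS)
        return SKETCH_VALID;   /* 1-18 slots: process normally */
    if (slots <= max_skb_slots)
        return SKETCH_DROP;    /* 19..max_skb_slots: drop, respond with error */
    return SKETCH_FATAL;       /* more than max_skb_slots: treat as malicious */
}
```

      With the default of 20, a packet using 19 or 20 slots would be dropped
      and a packet using 21 slots would be treated as fatal.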
      
      Also change variable name from "frags" to "slots" in netbk_count_requests.
      
      Please note that the RX path still has a dependency on MAX_SKB_FRAGS. This
      will be fixed with a separate patch.

      Signed-off-by: Wei Liu <wei.liu2@citrix.com>
      Acked-by: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Yijing Wang <wangyijing@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xen-netback: fix sparse warning · 047140a3
      stephen hemminger authored
      commit 9eaee8be upstream.

      Fix warning about 0 used as NULL.

      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Yijing Wang <wangyijing@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xen/smp/spinlock: Fix leakage of the spinlock interrupt line for every CPU online/offline · 63f12e8d
      Konrad Rzeszutek Wilk authored
      commit 66ff0fe9 upstream.
      
      While we don't use the spinlock interrupt line (for details see
      commit f10cd522 -
      xen: disable PV spinlocks on HVM) - we should still do the proper
      init / deinit sequence. We did not do that correctly: on
      CPU init for a PVHVM guest we would allocate an interrupt line but
      fail to deallocate the old interrupt line.
      
      This resulted in leakage of an irq_desc but more importantly this splat
      as we online an offlined CPU:
      
      genirq: Flags mismatch irq 71. 0002cc20 (spinlock1) vs. 0002cc20 (spinlock1)
      Pid: 2542, comm: init.late Not tainted 3.9.0-rc6upstream #1
      Call Trace:
       [<ffffffff811156de>] __setup_irq+0x23e/0x4a0
       [<ffffffff81194191>] ? kmem_cache_alloc_trace+0x221/0x250
       [<ffffffff811161bb>] request_threaded_irq+0xfb/0x160
       [<ffffffff8104c6f0>] ? xen_spin_trylock+0x20/0x20
       [<ffffffff813a8423>] bind_ipi_to_irqhandler+0xa3/0x160
       [<ffffffff81303758>] ? kasprintf+0x38/0x40
       [<ffffffff8104c6f0>] ? xen_spin_trylock+0x20/0x20
       [<ffffffff810cad35>] ? update_max_interval+0x15/0x40
       [<ffffffff816605db>] xen_init_lock_cpu+0x3c/0x78
       [<ffffffff81660029>] xen_hvm_cpu_notify+0x29/0x33
       [<ffffffff81676bdd>] notifier_call_chain+0x4d/0x70
       [<ffffffff810bb2a9>] __raw_notifier_call_chain+0x9/0x10
       [<ffffffff8109402b>] __cpu_notify+0x1b/0x30
       [<ffffffff8166834a>] _cpu_up+0xa0/0x14b
       [<ffffffff816684ce>] cpu_up+0xd9/0xec
       [<ffffffff8165f754>] store_online+0x94/0xd0
       [<ffffffff8141d15b>] dev_attr_store+0x1b/0x20
       [<ffffffff81218f44>] sysfs_write_file+0xf4/0x170
       [<ffffffff811a2864>] vfs_write+0xb4/0x130
       [<ffffffff811a302a>] sys_write+0x5a/0xa0
       [<ffffffff8167ada9>] system_call_fastpath+0x16/0x1b
      cpu 1 spinlock event irq -16
      smpboot: Booting Node 0 Processor 1 APIC 0x2
      
      And if one looks at the /proc/interrupts right after
      offlining (CPU1):
      
        70:          0          0  xen-percpu-ipi       spinlock0
        71:          0          0  xen-percpu-ipi       spinlock1
        77:          0          0  xen-percpu-ipi       spinlock2
      
      There is the oddity of the 'spinlock1' still being present.

      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Yijing Wang <wangyijing@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xen/smp: Fix leakage of timer interrupt line for every CPU online/offline. · 32ed904e
      Konrad Rzeszutek Wilk authored
      commit 888b65b4 upstream.
      
      In the PVHVM path, when we do a CPU online/offline cycle we would
      leak the timer%d IRQ line every time we do an offline event. The
      online path (xen_hvm_setup_cpu_clockevents via
      x86_cpuinit.setup_percpu_clockev) would allocate a new interrupt
      line for the timer%d.
      
      But we would still use the old interrupt line leading to:
      
      kernel BUG at /home/konrad/ssd/konrad/linux/kernel/hrtimer.c:1261!
      invalid opcode: 0000 [#1] SMP
      RIP: 0010:[<ffffffff810b9e21>]  [<ffffffff810b9e21>] hrtimer_interrupt+0x261/0x270
      .. snip..
       <IRQ>
       [<ffffffff810445ef>] xen_timer_interrupt+0x2f/0x1b0
       [<ffffffff81104825>] ? stop_machine_cpu_stop+0xb5/0xf0
       [<ffffffff8111434c>] handle_irq_event_percpu+0x7c/0x240
       [<ffffffff811175b9>] handle_percpu_irq+0x49/0x70
       [<ffffffff813a74a3>] __xen_evtchn_do_upcall+0x1c3/0x2f0
       [<ffffffff813a760a>] xen_evtchn_do_upcall+0x2a/0x40
       [<ffffffff8167c26d>] xen_hvm_callback_vector+0x6d/0x80
       <EOI>
       [<ffffffff81666d01>] ? start_secondary+0x193/0x1a8
       [<ffffffff81666cfd>] ? start_secondary+0x18f/0x1a8
      
      There is also the oddity (timer1) in the /proc/interrupts after
      offlining CPU1:
      
        64:       1121          0  xen-percpu-virq      timer0
        78:          0          0  xen-percpu-virq      timer1
        84:          0       2483  xen-percpu-virq      timer2
      
      This patch fixes it.

      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Yijing Wang <wangyijing@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xen/boot: Disable BIOS SMP MP table search. · 79b6a2d6
      Konrad Rzeszutek Wilk authored
      commit bd49940a upstream.
      
      As the initial domain we are able to search/map certain regions
      of memory to harvest configuration data. For all low-level we
      use ACPI tables - for interrupts we use exclusively ACPI _PRT
      (so DSDT) and MADT for INT_SRC_OVR.
      
      The SMP MP table is not used at all. As a matter of fact we do
      not even support machines that only have SMP MP but no ACPI tables.
      
      Let's follow how Moorestown does it and just disable searching
      for BIOS SMP tables.
      
      This also fixes an issue on HP Proliant BL680c G5 and DL380 G6:
      
      9f->100 for 1:1 PTE
      Freeing 9f-100 pfn range: 97 pages freed
      1-1 mapping on 9f->100
      .. snip..
      e820: BIOS-provided physical RAM map:
      Xen: [mem 0x0000000000000000-0x000000000009efff] usable
      Xen: [mem 0x000000000009f400-0x00000000000fffff] reserved
      Xen: [mem 0x0000000000100000-0x00000000cfd1dfff] usable
      .. snip..
      Scan for SMP in [mem 0x00000000-0x000003ff]
      Scan for SMP in [mem 0x0009fc00-0x0009ffff]
      Scan for SMP in [mem 0x000f0000-0x000fffff]
      found SMP MP-table at [mem 0x000f4fa0-0x000f4faf] mapped at [ffff8800000f4fa0]
      (XEN) mm.c:908:d0 Error getting mfn 100 (pfn 5555555555555555) from L1 entry 0000000000100461 for l1e_owner=0, pg_owner=0
      (XEN) mm.c:4995:d0 ptwr_emulate: could not get_page_from_l1e()
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff81ac07e2>] xen_set_pte_init+0x66/0x71
      . snip..
      Pid: 0, comm: swapper Not tainted 3.6.0-rc6upstream-00188-gb6fb969-dirty #2 HP ProLiant BL680c G5
      .. snip..
      Call Trace:
       [<ffffffff81ad31c6>] __early_ioremap+0x18a/0x248
       [<ffffffff81624731>] ? printk+0x48/0x4a
       [<ffffffff81ad32ac>] early_ioremap+0x13/0x15
       [<ffffffff81acc140>] get_mpc_size+0x2f/0x67
       [<ffffffff81acc284>] smp_scan_config+0x10c/0x136
       [<ffffffff81acc2e4>] default_find_smp_config+0x36/0x5a
       [<ffffffff81ac3085>] setup_arch+0x5b3/0xb5b
       [<ffffffff81624731>] ? printk+0x48/0x4a
       [<ffffffff81abca7f>] start_kernel+0x90/0x390
       [<ffffffff81abc356>] x86_64_start_reservations+0x131/0x136
       [<ffffffff81abfa83>] xen_start_kernel+0x65f/0x661
      (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
      
      which is that ioremap would end up mapping 0xff using _PAGE_IOMAP
      (which is what early_ioremap sticks as a flag) - which meant
      we would get MFN 0xFF (pte ff461, which is OK), and then it would
      also map 0x100 (because ioremap tries to get a page-aligned request, and
      it was trying to map 0xf4fa0 + PAGE_SIZE - so it mapped the next page)
      as _PAGE_IOMAP. Since 0x100 is actually a RAM page, and _PAGE_IOMAP
      bypasses the P2M lookup, we would happily set the PTE to 100461.
      Xen would deny the request since we do not have access to the
      Machine Frame Number (MFN) of 0x100. The P2M[0x100] is for example
      0x80140.
      
      Fixes-Oracle-Bugzilla: https://bugzilla.oracle.com/bugzilla/show_bug.cgi?id=13665
      Acked-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Yijing Wang <wangyijing@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • saa7134: Fix unlocked snd_pcm_stop() call · d54ecc0f
      Takashi Iwai authored
      commit e6355ad7 upstream.

      snd_pcm_stop() must be called in the PCM substream lock context.

      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      [wml: Backported to 3.4: Adjust filename]
      Signed-off-by: Weng Meiling <wengmeiling.weng@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: return ENOMEM if sb_getblk() fails · c28414d3
      Theodore Ts'o authored
      commit 860d21e2 upstream.

      The only reason for sb_getblk() failing is if it can't allocate the
      buffer_head.  So ENOMEM is more appropriate than EIO.  In addition,
      make sure that the file system is marked as being inconsistent if
      sb_getblk() fails.

      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      [xr: Backported to 3.4:
       - Drop change to inline.c
       - Call to ext4_ext_check() from ext4_ext_find_extent() is conditional]
      Signed-off-by: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • block: Don't access request after it might be freed · 0ef4c881
      Roland Dreier authored
      commit 893d290f upstream.
      
      After we've done __elv_add_request() and __blk_run_queue() in
      blk_execute_rq_nowait(), the request might finish and be freed
      immediately.  Therefore checking if the type is REQ_TYPE_PM_RESUME
      isn't safe afterwards, because if it isn't, rq might be gone.
      Instead, check beforehand and stash the result in a temporary.
      
      This fixes crashes in blk_execute_rq_nowait() I get occasionally when
      running with lots of memory debugging options enabled -- I think this
      race is usually harmless because the window for rq to be reallocated
      is so small.
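
      The "check beforehand and stash the result" pattern can be sketched as
      follows. This is an illustration under stated assumptions, not the
      kernel code: sketch_request, sketch_submit and sketch_execute_nowait are
      hypothetical names, and sketch_submit frees the request at once to stand
      in for a request that completes immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical request types for the sketch. */
enum sketch_req_type { SKETCH_REQ_NORMAL, SKETCH_REQ_PM_RESUME };

struct sketch_request {
    enum sketch_req_type type;
};

/* Stand-in for queueing and running the request: here it completes and is
 * freed before the caller regains control. */
static void sketch_submit(struct sketch_request *rq)
{
    free(rq);
}

/* Returns whether the submitted request was a PM resume request. */
static bool sketch_execute_nowait(struct sketch_request *rq)
{
    bool is_pm_resume;

    assert(rq != NULL);

    /* Read everything we need while we still own rq. */
    is_pm_resume = (rq->type == SKETCH_REQ_PM_RESUME);

    sketch_submit(rq);   /* rq must not be dereferenced after this point */

    return is_pm_resume; /* uses only the stashed copy */
}
```

      The design point is that the temporary is the only thing consulted after
      submission, so it does not matter how quickly the request is freed.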

      Signed-off-by: Roland Dreier <roland@purestorage.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      [xr: Backported to 3.4: adjust context]
      Signed-off-by: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nbd: correct disconnect behavior · 50e97121
      Paul Clements authored
      commit c378f70a upstream.
      
      Currently, when a disconnect is requested by the user (via NBD_DISCONNECT
      ioctl) the return from NBD_DO_IT is undefined (it is usually one of
      several error codes).  This means that nbd-client does not know if a
      manual disconnect was performed or whether a network error occurred.
      Because of this, nbd-client's persist mode (which tries to reconnect after
      error, but not after manual disconnect) does not always work correctly.
      
      This change fixes this by causing NBD_DO_IT to always return 0 if a user
      requests a disconnect.  This means that nbd-client can correctly either
      persist the connection (if an error occurred) or disconnect (if the user
      requested it).

      Signed-off-by: Paul Clements <paul.clements@steeleye.com>
      Acked-by: Rob Landley <rob@landley.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [xr: Backported to 3.4: adjust context]
      Signed-off-by: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cifs: adjust sequence number downward after signing NT_CANCEL request · b0f9634d
      Jeff Layton authored
      commit 31efee60 upstream.
      
      When a call goes out, the signing code adjusts the sequence number
      upward by two to account for the request and the response. An NT_CANCEL
      however doesn't get a response of its own; it just hurries the server
      along to get it to respond to the original request more quickly.
      Therefore, we must adjust the sequence number back down by one after
      signing a NT_CANCEL request.
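
      The bookkeeping described above can be sketched as a small counter
      model. The names sketch_server and sketch_sign are hypothetical and the
      real signing code is not shown; the sketch only follows the rule stated
      in this message (advance by two per call, then step back by one for an
      NT_CANCEL).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection state holding the next sequence number. */
struct sketch_server {
    uint32_t sequence_number;
};

/* Returns the sequence number used for this request and advances the
 * counter for the next one. */
static uint32_t sketch_sign(struct sketch_server *srv, bool is_nt_cancel)
{
    uint32_t used;

    assert(srv != NULL);

    used = srv->sequence_number;
    srv->sequence_number += 2;      /* one for the request, one for the response */
    if (is_nt_cancel)
        srv->sequence_number -= 1;  /* an NT_CANCEL gets no response of its own */

    return used;
}
```

      Starting from 0, a normal call uses 0 and leaves the counter at 2; an
      NT_CANCEL then uses 2 and leaves the counter at 3 rather than 4.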

      Reported-by: Tim Perry <tdparmor-sambabugs@yahoo.com>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      [bwh: Backported to 3.2: adjust filename]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix possible use-after-free with AIO · b54e3acc
      Jan Kara authored
      commit 091e26df upstream.

      Running AIO pins the inode in memory using a file reference. Once AIO
      is completed using aio_complete(), the file reference is put and the inode
      can be freed from memory. So we have to be sure that calling aio_complete()
      is the last thing we do with the inode.

      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • UBIFS: fix double free of ubifs_orphan objects · 8a4188e2
      Adam Thomas authored
      commit 8afd500c upstream.

      The last orphan in the dnext list has its dnext set to NULL. Because
      of that, ubifs_delete_orphan assumes that it is not on the dnext list
      and frees it immediately instead of ignoring it as a second delete. The
      orphan is later freed again by erase_deleted.

      This change adds an explicit flag to ubifs_orphan indicating whether
      it is pending delete.

      Signed-off-by: Adam Thomas <adamthomas1111@gmail.com>
      Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4/jbd2: don't wait (forever) for stale tid caused by wraparound · ebdc12a0
      Theodore Ts'o authored
      commit d76a3a77 upstream.
      
      In the case where an inode has a very stale transaction id (tid) in
      i_datasync_tid or i_sync_tid, it's possible that after a very large
      (2**31) number of transactions, the tid number space might wrap,
      causing tid_geq()'s calculations to fail.
      
      Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
      by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
      attempted to fix this problem, but it only avoided kjournald spinning
      forever by fixing the logic in jbd2_log_start_commit().
      
      Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
      that might call jbd2_log_start_commit() with a stale tid, those
      functions will subsequently call jbd2_log_wait_commit() with the same
      stale tid, and then wait for a very long time.  To fix this, we
      replace the calls to jbd2_log_start_commit() and
      jbd2_log_wait_commit() with a call to a new function,
      jbd2_complete_transaction(), which will correctly handle stale tid's.
      
      As a bonus, jbd2_complete_transaction() will avoid locking
      j_state_lock for writing unless a commit needs to be started.  This
      should have a small (but probably not measurable) improvement for
      ext4's scalability.
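
      The wraparound failure can be illustrated with a small model of a
      wraparound-aware comparison. This is a sketch under assumptions:
      sketch_tid_geq is a hypothetical helper that compares 32-bit tids through
      a signed difference, which is assumed here to be how tid_geq() behaves,
      and the unsigned-to-signed conversion relies on the usual two's-complement
      behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit transaction id type for this sketch. */
typedef uint32_t sketch_tid_t;

static_assert(sizeof(sketch_tid_t) == 4, "tids are 32-bit in this sketch");

/* Wraparound-aware "x >= y": true while x is fewer than 2**31 steps ahead
 * of y. Once the distance reaches 2**31 the sign flips and the answer is
 * wrong, which is the stale-tid case described above. */
static bool sketch_tid_geq(sketch_tid_t x, sketch_tid_t y)
{
    int32_t difference = (int32_t)(x - y);

    return difference >= 0;
}
```

      A tid that is 2**31 - 1 transactions ahead still compares as newer, but
      one that is exactly 2**31 ahead compares as older, so a waiter holding
      the stale tid would conclude that its transaction has not yet committed.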

      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reported-by: Ben Hutchings <ben@decadent.org.uk>
      Reported-by: George Barnett <gbarnett@atlassian.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Dave Chiluk's avatar
      ncpfs: fix rmdir returns Device or resource busy · d7d659d6
      Dave Chiluk authored
      commit 698b8223 upstream.
      
      1d2ef590 caused a regression in ncpfs such that directories could no
      longer be removed.  This was because ncp_rmdir checked whether a dentry
      could be unhashed before allowing it to be removed.  Since 1d2ef590
      incremented dentry->d_count, causing it to always be greater than 1,
      the unhash would always fail, and the error path in ncp_rmdir was
      always taken.  Removing this error path is safe, as unhashing is still
      accomplished by calls to dput from vfs_rmdir.
      Signed-off-by: Dave Chiluk <chiluk@canonical.com>
      Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d7d659d6
    • Jeff Layton's avatar
      cifs: don't instantiate new dentries in readdir for inodes that need to be revalidated immediately · 3d956c8a
      Jeff Layton authored
      commit 757c4f62 upstream.
      
      David reported that commit c2b93e06 (cifs: only set ops for inodes in
      I_NEW state) caused a regression with mfsymlinks. Prior to that patch,
      if a mfsymlink dentry was instantiated at readdir time, the inode would
      get a new set of ops when it was revalidated. After that patch, this
      did not occur.
      
      This patch addresses this by simply skipping instantiating dentries in
      the readdir codepath when we know that they will need to be immediately
      revalidated. The next attempt to use that dentry will cause a new lookup
      to occur (which is basically what we want to happen anyway).
      
      Cc: "Stefan (metze) Metzmacher" <metze@samba.org>
      Cc: Sachin Prabhu <sprabhu@redhat.com>
      Reported-and-Tested-by: David McBride <dwm37@cam.ac.uk>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      [bwh: Backported to 3.2: need to return NULL]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3d956c8a
    • majianpeng's avatar
      libceph: unregister request in __map_request failed and nofail == false · b3f19e7f
      majianpeng authored
      commit 73d9f7ee upstream.
      
      For a request with nofail == false, if __map_request fails, the caller
      does the cleanup work, such as releasing the related pages.  It doesn't
      make any sense to retry this request.
      Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      [bwh: Backported to 3.2: adjust indentation]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3f19e7f
    • Maxim Patlasov's avatar
      fuse: hotfix truncate_pagecache() issue · fe17c202
      Maxim Patlasov authored
      commit 06a7c3c2 upstream.
      
      The way fuse calls truncate_pagecache() from fuse_change_attributes()
      is completely wrong because, w/o i_mutex held, we are never sure whether
      'oldsize' and 'attr->size' are still valid by the time
      truncate_pagecache(inode, oldsize, attr->size) executes. In fact, as soon as we
      release fc->lock in the middle of fuse_change_attributes(), we completely
      lose control of the actions which may happen with the given inode until we reach
      truncate_pagecache. The list of potentially dangerous actions includes
      mmap-ed reads and writes, ftruncate(2) and write(2) extending file size.
      
      The typical outcome of doing truncate_pagecache() with outdated arguments
      is data corruption from the user's point of view. This is (in some sense)
      acceptable in cases when the issue is triggered by a change of the file on
      the server (i.e. externally wrt fuse operation), but it is absolutely
      intolerable in scenarios when a single fuse client modifies a file without
      any external intervention. A real-life case I discovered with fsx-linux
      looked like this:
      
      1. Shrinking ftruncate(2) comes to fuse_do_setattr(). The latter sends
      FUSE_SETATTR to the server synchronously, but before getting fc->lock ...
      2. fuse_dentry_revalidate() is asynchronously called. It sends FUSE_LOOKUP
      to the server synchronously, then calls fuse_change_attributes(). The
      latter updates i_size, releases fc->lock, but before comparing oldsize vs
      attr->size ...
      3. fuse_do_setattr() from the first step proceeds by acquiring fc->lock and
      updating attributes and i_size, but now oldsize is equal to
      outarg.attr.size because i_size has just been updated (step 2). Hence,
      fuse_do_setattr() returns w/o calling truncate_pagecache().
      4. As soon as ftruncate(2) completes, the user extends the file size by
      write(2), making a hole in the middle of the file, then reads data from the hole
      either by read(2) or mmap-ed read. The user expects to get zero data from
      the hole, but gets stale data because truncate_pagecache() has not been executed
      yet.
      
      The scenario above illustrates one side of the problem: not truncating the
      page cache even though we should. Another side corresponds to truncating
      page cache too late, when the state of inode changed significantly.
      Theoretically, the following is possible:
      
      1. As in the previous scenario fuse_dentry_revalidate() discovered that
      i_size changed (due to our own fuse_do_setattr()) and is going to call
      truncate_pagecache() for some 'new_size' it believes valid right now. But
      by the time that particular truncate_pagecache() is called ...
      2. fuse_do_setattr() returns (either having called truncate_pagecache() or
      not -- it doesn't matter).
      3. The file is extended either by write(2) or ftruncate(2) or fallocate(2).
      4. mmap-ed write makes a page in the extended region dirty.
      
      The result will be the loss of the data the user wrote in the fourth step.
      
      The patch is a hotfix resolving the issue in a simplistic way: let's skip
      dangerous i_size update and truncate_pagecache if an operation changing
      file size is in progress. This simplistic approach looks correct for the
      cases w/o external changes. And to handle them properly, more sophisticated
      and intrusive techniques (e.g. NFS-like one) would be required. I'd like to
      postpone it until the issue is well discussed on the mailing list(s).
      
      Changed in v2:
       - improved patch description to cover both sides of the issue.
      Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      [bwh: Backported to 3.2: add the fuse_inode::state field which we didn't have]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fe17c202
    • Miklos Szeredi's avatar
      fuse: readdir: check for slash in names · f4a69e06
      Miklos Szeredi authored
      commit efeb9e60 upstream.
      
      Userspace can add names containing a slash character to the directory
      listing.  Don't allow this as it could cause all sorts of trouble.
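A check of this kind might look like the sketch below. dirent_name_ok() is a hypothetical user-space stand-in, not the function touched by the patch, and the rejection of empty names is an added assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical validator for a directory entry name supplied by a userspace
 * filesystem.  Names are length-delimited rather than NUL-terminated, so
 * memchr() is used instead of strchr().  Returns 1 if the name is
 * acceptable, 0 otherwise. */
static int dirent_name_ok(const char *name, size_t namelen)
{
	if (namelen == 0)
		return 0;	/* assumption: empty names are never valid */
	if (memchr(name, '/', namelen) != NULL)
		return 0;	/* a slash would be read as a path separator */
	return 1;
}
```

Only the first namelen bytes are inspected, so a slash that lies beyond the declared length does not cause a rejection.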
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      [bwh: Backported to 3.2: drop changes to parse_dirplusfile() which we
       don't have]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4a69e06
    • Vyacheslav Dubeyko's avatar
      nilfs2: fix issue with race condition of competition between segments for dirty blocks · 831c8764
      Vyacheslav Dubeyko authored
      commit 7f42ec39 upstream.
      
      Many NILFS2 users have reported strange file system corruption
      (for example):
      
         NILFS: bad btree node (blocknr=185027): level = 0, flags = 0x0, nchildren = 768
         NILFS error (device sda4): nilfs_bmap_last_key: broken bmap (inode number=11540)
      
      But such error messages are a consequence of a file system issue that takes
      place much earlier.  Fortunately, Jerome Poulin <jeromepoulin@gmail.com>
      and Anton Eliasson <devel@antoneliasson.se> reported another
      issue not long ago.  Their reports describe a crash of the segctor
      thread:
      
        BUG: unable to handle kernel paging request at 0000000000004c83
        IP: nilfs_end_page_io+0x12/0xd0 [nilfs2]
      
        Call Trace:
         nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2]
         nilfs_segctor_construct+0x17b/0x290 [nilfs2]
         nilfs_segctor_thread+0x122/0x3b0 [nilfs2]
         kthread+0xc0/0xd0
         ret_from_fork+0x7c/0xb0
      
      These two issues have the same cause.  This cause can trigger a third issue
      too: the segctor thread hangs while consuming
      100% CPU.
      
      REPRODUCING PATH:
      
      One possible way of reproducing the issue was described by
      Jerome Poulin <jeromepoulin@gmail.com>:
      
      1. init S to get to single user mode.
      2. sysrq+E to make sure only my shell is running
      3. start network-manager to get my wifi connection up
      4. login as root and launch "screen"
      5. cd /boot/log/nilfs which is a ext3 mount point and can log when NILFS dies.
      6. lscp | xz -9e > lscp.txt.xz
      7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root /mnt/nilfs
      8. start a screen to dump /proc/kmsg to text file since rsyslog is killed
      9. start a screen and launch strace -f -o find-cat.log -t find
      /mnt/nilfs -type f -exec cat {} > /dev/null \;
      10. start a screen and launch strace -f -o apt-get.log -t apt-get update
      11. launch the last command again as it did not crash the first time
      12. apt-get crashes
      13. ps aux > ps-aux-crashed.log
      13. sysrq+W
      14. sysrq+E  wait for everything to terminate
      15. sysrq+SUSB
      
      A simplified way of reproducing the issue is to start a kernel compilation
      task and "apt-get update" in parallel.
      
      REPRODUCIBILITY:
      
      The issue does not reproduce reliably [60% - 80%].  It is very important to
      have a proper environment for reproducing the issue.  The critical
      conditions for successful reproduction:
      
      (1) There should be a big file modified by means of mmap().
      
      (2) From time to time during processing, this file should have a count of
          dirty blocks greater than several segments in size (for example, two
          or three).
      
      (3) There should be intensive background file modification activity
          in another thread.
      
      INVESTIGATION:
      
      First of all, it is possible to see that the reason for the crash is an
      invalid page address:
      
        NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
        NILFS [nilfs_segctor_complete_write]:2101 segbuf->sb_segnum 6783
      
      Moreover, the value of b_page (0x1a82) is 6786.  This value looks like a segment
      number, and the b_blocknr and b_size values look like block numbers.  So
      the buffer_head pointer points to an improper address.
      
      Detailed investigation of the issue revealed the following picture:
      
        [-----------------------------SEGMENT 6783-------------------------------]
        NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
        NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
        NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
        NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
        NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
        NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
        NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111149024, segbuf->sb_segnum 6783
      
        [-----------------------------SEGMENT 6784-------------------------------]
        NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
        NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
        NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff8802174a6798, bh->b_assoc_buffers.prev ffff880221cffee8
        NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
        NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
        NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
        NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
        NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
        NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6784
        NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111150080, segbuf->sb_segnum 6784, segbuf->sb_nbio 0
        [----------] ditto
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111164416, segbuf->sb_segnum 6784, segbuf->sb_nbio 15
      
        [-----------------------------SEGMENT 6785-------------------------------]
        NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
        NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
        NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff880219277e80, bh->b_assoc_buffers.prev ffff880221cffc88
        NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
        NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
        NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
        NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
        NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6785
        NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111165440, segbuf->sb_segnum 6785, segbuf->sb_nbio 0
        [----------] ditto
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111177728, segbuf->sb_segnum 6785, segbuf->sb_nbio 12
      
        NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait
        NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6783
        NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6784
        NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6785
      
        NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
      
        BUG: unable to handle kernel paging request at 0000000000001a82
        IP: [<ffffffffa024d0f2>] nilfs_end_page_io+0x12/0xd0 [nilfs2]
      
      Usually, for every segment we collect dirty files in a list.  Then, dirty
      blocks are gathered for every dirty file, prepared for write and
      submitted by means of a nilfs_segbuf_submit_bh() call.  Finally, the
      complete-write phase takes place after nilfs_end_bio_write() is called by the
      block layer.  In this final phase, buffers/pages are marked as not dirty and
      processed files are removed from the list of dirty files.
      
      It is possible to see that we had three prepare_write and submit_bio
      phases before the segbuf_wait and complete_write phases.  Moreover, segments
      compete with each other for dirty blocks because on every iteration
      of segment processing, dirty buffer_heads are added to several
      payload_buffers lists:
      
        [SEGMENT 6784]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
        [SEGMENT 6785]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
      
      The next pointer is the same but the prev pointer has changed.  It means
      that the buffer_head has a next pointer from one list but a prev pointer from
      another.  Such a modification can be made several times and can finally
      result in various issues: (1) segctor hanging, (2) segctor
      crashing, (3) file system metadata corruption.
      
      FIX:
      This patch adds:
      
      (1) setting of BH_Async_Write flag in nilfs_segctor_prepare_write()
          for every processed dirty block;
      
      (2) checking of BH_Async_Write flag in
          nilfs_lookup_dirty_data_buffers() and
          nilfs_lookup_dirty_node_buffers();
      
      (3) clearing of BH_Async_Write flag in nilfs_segctor_complete_write(),
          nilfs_abort_logs(), nilfs_forget_buffer(), nilfs_clear_dirty_page().
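
The three steps above can be sketched in user space as follows. This is a toy model, not the nilfs2 code: struct buf, collect_dirty(), prepare_write() and complete_write() are hypothetical names, and the async_write field merely stands in for the BH_Async_Write flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, in the style of the kernel's list_head. */
struct list_node { struct list_node *next, *prev; };

static void list_init(struct list_node *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void list_del(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = node->prev = node;
}

struct buf {
	struct list_node assoc;	/* links the buffer into ONE payload list */
	bool dirty;
	bool async_write;	/* stand-in for BH_Async_Write */
};

/* Step (2): the collect phase skips buffers a previous segment already owns. */
static bool collect_dirty(struct buf *b, struct list_node *payload)
{
	if (!b->dirty || b->async_write)
		return false;
	list_add_tail(&b->assoc, payload);
	return true;
}

/* Step (1): mark the buffer as in flight once it is prepared for writing. */
static void prepare_write(struct buf *b)
{
	b->async_write = true;
}

/* Step (3): completion unlinks the buffer and clears both flags. */
static void complete_write(struct buf *b)
{
	list_del(&b->assoc);
	b->async_write = false;
	b->dirty = false;
}
```

Without the flag test in collect_dirty(), a second segment would call list_add_tail() on a node that is still linked into the first segment's list, leaving it with a next pointer from one list and a prev pointer from another.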
      Reported-by: Jerome Poulin <jeromepoulin@gmail.com>
      Reported-by: Anton Eliasson <devel@antoneliasson.se>
      Cc: Paul Fertser <fercerpav@gmail.com>
      Cc: ARAI Shun-ichi <hermes@ceres.dti.ne.jp>
      Cc: Piotr Szymaniak <szarpaj@grubelek.pl>
      Cc: Juan Barry Manuel Canham <Linux@riotingpacifist.net>
      Cc: Zahid Chowdhury <zahid.chowdhury@starsolutions.com>
      Cc: Elmer Zhang <freeboy6716@gmail.com>
      Cc: Kenneth Langga <klangga@gmail.com>
      Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
      Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2: nilfs_clear_dirty_page() has not been separated
       from nilfs_clear_dirty_pages()]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      831c8764
    • Jiri Olsa's avatar
      perf tools: Fix cache event name generation · 73a11828
      Jiri Olsa authored
      commit 275ef387 upstream.
      
      If the event name is specified with all 3 components, the last one
      overwrites the previous one when the name is composed in the
      parse_events_add_cache function.
      
      Fix this by properly adjusting the string index.
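
The idea of advancing the string index can be shown with a small sketch. compose_cache_name() and MAX_NAME_LEN are hypothetical names chosen for illustration; the real parsing code is organised differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME_LEN 100	/* buffer size assumed for illustration */

/* Hypothetical helper: joins a cache type with up to two further components
 * as "type-op-result".  `n` is the running write index: every component is
 * written at name + n and n is then advanced, so a later component cannot
 * overwrite an earlier one. */
static void compose_cache_name(char *name, size_t size,
			       const char *type, const char *op_result[2])
{
	int n = snprintf(name, size, "%s", type);

	for (int i = 0; i < 2 && op_result[i] != NULL; i++) {
		if (n < 0 || (size_t)n >= size)
			break;	/* error or truncated output: stop appending */
		n += snprintf(name + n, size - n, "-%s", op_result[i]);
	}
}
```

If n were left unchanged after the first append, the second component would be written at the same offset and replace the first one, which matches the symptom described above.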
      Reported-by: Joel Uckelman <joel@lightboxtechnologies.com>
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Joel Uckelman <joel@lightboxtechnologies.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LPU-Reference: 20120905175133.GA18352@krava.brq.redhat.com
      [ committer note: Remove the newline fix, done already in 42e1fb77 ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Vinson Lee <vlee@twopensource.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      73a11828
    • Arnaldo Carvalho de Melo's avatar
      perf tools: Remove extraneous newline when parsing hardware cache events · f25c118b
      Arnaldo Carvalho de Melo authored
      commit 42e1fb77 upstream.
      
      Noticed while developing a 'perf test' entry to verify that
      perf_evsel__name works.
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-xz6zgh38mp3cjnd2udh38z8f@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Vinson Lee <vlee@twopensource.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f25c118b
    • Jiang Liu's avatar
      mm/hotplug: correctly add new zone to all other nodes' zone lists · 446327d6
      Jiang Liu authored
      commit 08dff7b7 upstream.
      
      When online_pages() is called to add new memory to an empty zone, it
      rebuilds all zone lists by calling build_all_zonelists().  But there's a
      bug which prevents the new zone from being added to other nodes' zone lists.
      
      online_pages() {
      	build_all_zonelists()
      	.....
      	node_set_state(zone_to_nid(zone), N_HIGH_MEMORY)
      }
      
      Here the node of the zone is put into N_HIGH_MEMORY state after calling
      build_all_zonelists(), but build_all_zonelists() only adds zones from
      nodes in N_HIGH_MEMORY state to the fallback zone lists.
      
      build_all_zonelists()
          ->__build_all_zonelists()
      	->build_zonelists()
      	    ->find_next_best_node()
      		->for_each_node_state(n, N_HIGH_MEMORY)
      
      So memory in the new zone will never be used by other nodes, and it may
      cause strange behavior when the system is under memory pressure.  So put the node
      into N_HIGH_MEMORY state before calling build_all_zonelists().
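
The ordering problem can be reproduced with a toy model. Everything in the sketch below is hypothetical: high_memory_mask stands in for the N_HIGH_MEMORY node state and rebuild_fallback() for build_all_zonelists(); none of it is kernel code.

```c
#include <assert.h>

#define MAX_NODES 4

static unsigned int high_memory_mask;	/* bit n set: node n is marked as having memory */
static int fallback[MAX_NODES];		/* nodes eligible for the fallback list */
static int nr_fallback;

/* Toy stand-in for the zonelist rebuild: only nodes that are ALREADY marked
 * in the mask make it into the fallback list. */
static void rebuild_fallback(void)
{
	nr_fallback = 0;
	for (int n = 0; n < MAX_NODES; n++)
		if (high_memory_mask & (1u << n))
			fallback[nr_fallback++] = n;
}

/* Original order: rebuild first, mark the node afterwards. */
static void online_node_buggy(int nid)
{
	rebuild_fallback();
	high_memory_mask |= 1u << nid;
}

/* Corrected order: mark the node, then rebuild. */
static void online_node_fixed(int nid)
{
	high_memory_mask |= 1u << nid;
	rebuild_fallback();
}
```

In the first variant the newly onlined node is missing from the list until some later rebuild happens; in the second it is visible immediately.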
      Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
      Signed-off-by: Jiang Liu <liuj97@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Keping Chen <chenkeping@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Qiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      446327d6
    • Tejun Heo's avatar
      cgroup: fix RCU accesses to task->cgroups · 71a20685
      Tejun Heo authored
      commit 14611e51 upstream.
      
      task->cgroups is a RCU pointer pointing to struct css_set.  A task
      switches to a different css_set on cgroup migration but a css_set
      doesn't change once created and its pointers to cgroup_subsys_states
      aren't RCU protected.
      
      task_subsys_state[_check]() is the macro to acquire css given a task
      and subsys_id pair.  It RCU-dereferences task->cgroups->subsys[] not
      task->cgroups, so the RCU pointer task->cgroups ends up being
      dereferenced without read_barrier_depends() after it.  It's broken.
      
      Fix it by introducing task_css_set[_check]() which does
      RCU-dereference on task->cgroups.  task_subsys_state[_check]() is
      reimplemented to directly dereference ->subsys[] of the css_set
      returned from task_css_set[_check]().
      
      This removes some of sparse RCU warnings in cgroup.
      
      v2: Fixed unbalanced parenthesis and there's no need to use
          rcu_dereference_raw() when !CONFIG_PROVE_RCU.  Both spotted by Li.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      [bwh: Backported to 3.2:
       - Adjust context
       - Remove CONFIG_PROVE_RCU condition
       - s/lockdep_is_held(&cgroup_mutex)/cgroup_lock_is_held()/]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Qiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      71a20685
    • Kees Cook's avatar
      proc connector: reject unprivileged listener bumps · 2f590c47
      Kees Cook authored
      commit e70ab977 upstream.
      
      While PROC_CN_MCAST_LISTEN/IGNORE is entirely advisory, it was possible
      for an unprivileged user to turn off notifications for all listeners by
      sending PROC_CN_MCAST_IGNORE. Instead, require the same privileges as
      required for a multicast bind.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Evgeniy Polyakov <zbr@ioremap.net>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
      Acked-by: Matt Helsley <matthltc@us.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Qiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2f590c47
    • Greg Edwards's avatar
      KVM: IOMMU: hva align mapping page size · f7741e3b
      Greg Edwards authored
      commit 27ef63c7 upstream.
      
      When determining the page size we could use to map with the IOMMU, the
      page size should also be aligned with the hva, not just the gfn.  The
      gfn may not reflect the real alignment within the hugetlbfs file.
      
      Most of the time, this works fine.  However, if the hugetlbfs file is
      backed by non-contiguous huge pages, a multi-huge page memslot starts at
      an unaligned offset within the hugetlbfs file, and the gfn is aligned
      with respect to the huge page size, kvm_host_page_size() will return the
      huge page size and we will use that to map with the IOMMU.
      
      When we later unpin that same memslot, the IOMMU returns the unmap size
      as the huge page size, and we happily unpin that many pfns in
      monotonically increasing order, not realizing we are spanning
      non-contiguous huge pages and partially unpin the wrong huge page.
      
      Ensure the IOMMU mapping page size is aligned with the hva corresponding
      to the gfn, which does reflect the alignment within the hugetlbfs file.
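
The alignment rule can be sketched as a stand-alone function. iommu_map_size() is a hypothetical helper, and it assumes 4 KiB base pages, power-of-two sizes and an hva that is already supplied by the caller; the kernel obtains the hva from the memslot instead.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12			/* assumption: 4 KiB base pages */
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/* Hypothetical helper: start from the host page size backing the gfn and
 * halve it until BOTH the guest physical address and the host virtual
 * address are aligned to the candidate mapping size. */
static uint64_t iommu_map_size(uint64_t gfn, uint64_t hva, uint64_t host_page_size)
{
	uint64_t size = host_page_size;
	uint64_t gpa = gfn << PAGE_SHIFT;

	while (size > PAGE_SIZE &&
	       ((gpa & (size - 1)) != 0 || (hva & (size - 1)) != 0))
		size >>= 1;

	return size;
}
```

A gfn that is aligned to 2 MiB keeps the full huge page size only when the hva is aligned to 2 MiB as well; if the hva sits at an unaligned offset, the mapping size shrinks until the alignment holds.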
      Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Greg Edwards <gedwards@ddn.com>
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      [bwh: Backported to 3.2: s/__gfn_to_hva_memslot/gfn_to_hva_memslot/]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Qiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f7741e3b
    • Alexander Graf's avatar
      KVM: PPC: Emulate dcbf · 67bc20f7
      Alexander Graf authored
      commit d3286144 upstream.
      
      Guests can trigger MMIO exits using dcbf. Since we don't emulate cache
      incoherent MMIO, just do nothing and move on.
      Reported-by: Ben Collins <ben.c@servergy.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Tested-by: Ben Collins <ben.c@servergy.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Qiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      67bc20f7
    • Christian Borntraeger's avatar
      s390/kvm: dont announce RRBM support · 115f5063
      Christian Borntraeger authored
      commit 87cac8f8 upstream.
      
      Newer kernels (linux-next with the transparent huge page patches)
      use rrbm if the feature is announced via feature bit 66.
      RRBM causes intercepts, which KVM does not handle right now,
      resulting in an illegal instruction in the guest.
      The easy solution is to disable the feature bit for the guest.
      
      This fixes bugs like:
      Kernel BUG at 0000000000124c2a [verbose debug info unavailable]
      illegal operation: 0001 [#1] SMP
      Modules linked in: virtio_balloon virtio_net ipv6 autofs4
      CPU: 0 Not tainted 3.5.4 #1
      Process fmempig (pid: 659, task: 000000007b712fd0, ksp: 000000007bed3670)
      Krnl PSW : 0704d00180000000 0000000000124c2a (pmdp_clear_flush_young+0x5e/0x80)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
           00000000003cc000 0000000000000004 0000000000000000 0000000079800000
           0000000000040000 0000000000000000 000000007bed3918 000000007cf40000
           0000000000000001 000003fff7f00000 000003d281a94000 000000007bed383c
           000000007bed3918 00000000005ecbf8 00000000002314a6 000000007bed36e0
       Krnl Code:>0000000000124c2a: b9810025          ogr     %r2,%r5
                 0000000000124c2e: 41343000           la      %r3,0(%r4,%r3)
                 0000000000124c32: a716fffa           brct    %r1,124c26
                 0000000000124c36: b9010022           lngr    %r2,%r2
                 0000000000124c3a: e3d0f0800004       lg      %r13,128(%r15)
                 0000000000124c40: eb22003f000c       srlg    %r2,%r2,63
      [ 2150.713198] Call Trace:
      [ 2150.713223] ([<00000000002312c4>] page_referenced_one+0x6c/0x27c)
      [ 2150.713749]  [<0000000000233812>] page_referenced+0x32a/0x410
      [...]
      
      CC: Alex Graf <agraf@suse.de>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Qiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      115f5063
    • Dominik Dingel's avatar
      KVM: s390: move kvm_guest_enter,exit closer to sie · bf5597c6
      Dominik Dingel authored
      commit 2b29a9fd upstream.
      
      Any uaccess between guest_enter and guest_exit could trigger a page fault;
      the page fault handler would then handle it as a guest fault and translate a
      user address as a guest address.
      Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      CC: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      [hq: Backported to 3.4: adjust context]
      Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cgroup: cgroup_subsys->fork() should be called after the task is added to css_set · 30ec268b
      Tejun Heo authored
      commit 5edee61e upstream.
      
      cgroup core has a bug which violates a basic rule about event
      notifications - when a new entity needs to be added, you add that to
      the notification list first and then make the new entity conform to
      the current state.  If done in the reverse order, an event happening
      in between will be lost.
      
      cgroup_subsys->fork() is invoked way before the new task is added to
      the css_set.  Currently, cgroup_freezer is the only user of ->fork()
      and uses it to make new tasks conform to the current state of the
      freezer.  If FROZEN state is requested while fork is in progress
      between cgroup_fork_callbacks() and cgroup_post_fork(), the child
      could escape freezing - the cgroup isn't frozen when ->fork() is
      called and the freezer couldn't see the new task on the css_set.
      
      This patch moves cgroup_subsys->fork() invocation to
      cgroup_post_fork() after the new task is added to the css_set.
      cgroup_fork_callbacks() is removed.
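
      The ordering rule described above can be illustrated with a minimal
      user-space sketch in C. Every name here (struct group, struct task,
      group_freeze(), the attach_* helpers) is a hypothetical stand-in, not
      the kernel's css_set or freezer code, and the "event" is injected
      synchronously between the two steps to mimic a freeze request that
      arrives while fork is in progress.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 8

/* Hypothetical stand-ins; not the kernel's task_struct or css_set. */
struct task {
    bool frozen;
};

struct group {
    bool frozen;                      /* state currently requested */
    struct task *members[MAX_TASKS];  /* the "notification list" */
    size_t count;
};

/* Event: freeze the group and every task that is on its list right now. */
static void group_freeze(struct group *g)
{
    g->frozen = true;
    for (size_t i = 0; i < g->count; i++)
        g->members[i]->frozen = true;
}

/* Step A: make the task visible to future events. */
static void group_add(struct group *g, struct task *t)
{
    if (g->count < MAX_TASKS)
        g->members[g->count++] = t;
}

/* Step B: make the task conform to the group's state as of this moment. */
static void task_conform(const struct group *g, struct task *t)
{
    t->frozen = g->frozen;
}

/* Buggy order: conform first, then add. An event in between is lost. */
static void attach_conform_first(struct group *g, struct task *t,
                                 void (*event)(struct group *))
{
    task_conform(g, t);
    if (event)
        event(g);   /* the task is not on the list yet, so it is missed */
    group_add(g, t);
}

/* Fixed order: add first, then conform. An event in between is seen. */
static void attach_add_first(struct group *g, struct task *t,
                             void (*event)(struct group *))
{
    group_add(g, t);
    if (event)
        event(g);   /* the task is on the list, so the event reaches it */
    task_conform(g, t);
}
```

      Passing group_freeze as the event to attach_conform_first() leaves the
      group frozen and the new task unfrozen, which is the escape described
      above; attach_add_first() leaves both frozen.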
      
      Because now a task may be migrated during cgroup_subsys->fork(),
      freezer_fork() is updated so that it adheres to the usual RCU locking
      and the rather pointless comment on why locking can be different there
      is removed (if it doesn't make anything simpler, why even bother?).
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      [hq: Backported to 3.4:
       - Adjust context
       - Iterate over first CGROUP_BUILTIN_SUBSYS_COUNT elements of subsys]
      Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: vmscan: fix endless loop in kswapd balancing · f47929fd
      Johannes Weiner authored
      commit 60cefed4 upstream.
      
      Kswapd does not in all places have the same criteria for a balanced
      zone.  Zones are only being reclaimed when their high watermark is
      breached, but compaction checks loop over the zonelist again when the
      zone does not meet the low watermark plus two times the size of the
      allocation.  This gets kswapd stuck in an endless loop over a small
      zone, like the DMA zone, where the high watermark is smaller than the
      compaction requirement.
      
      Add a function, zone_balanced(), that checks the watermark, and, for
      higher order allocations, if compaction has enough free memory.  Then
      use it uniformly to check for balanced zones.
      
      This makes sure that when the compaction watermark is not met, at least
      reclaim happens and progress is made - or the zone is declared
      unreclaimable at some point and skipped entirely.
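
      The idea of one shared predicate can be sketched as a small user-space
      model in C. This is not the kernel function: struct zone_model, its
      fields, and the helpers watermark_ok() and compaction_has_room() are
      hypothetical simplifications, and the compaction test only encodes the
      "low watermark plus two times the size of the allocation" rule quoted
      above.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified zone; the kernel's struct zone differs. */
struct zone_model {
    unsigned long free_pages;
    unsigned long low_wmark;
    unsigned long high_wmark;
};

/* Reclaim criterion: free pages reach the high watermark plus a gap. */
static bool watermark_ok(const struct zone_model *z, unsigned long balance_gap)
{
    return z->free_pages >= z->high_wmark + balance_gap;
}

/* Compaction criterion: low watermark plus twice the allocation size. */
static bool compaction_has_room(const struct zone_model *z, unsigned int order)
{
    return z->free_pages >= z->low_wmark + (2UL << order);
}

/*
 * Single definition of "balanced", used by every caller, so reclaim and
 * compaction checks can no longer disagree about the same zone.
 */
static bool zone_balanced_model(const struct zone_model *z, unsigned int order,
                                unsigned long balance_gap)
{
    if (!watermark_ok(z, balance_gap))
        return false;

    /* Only higher-order allocations depend on compaction. */
    if (order > 0 && !compaction_has_room(z, order))
        return false;

    return true;
}
```

      For a small zone with low_wmark = 10, high_wmark = 20 and 25 free
      pages, an order-4 request passes the watermark test but fails the
      compaction test (10 + 32 pages needed), so the model reports the zone
      as unbalanced and reclaim keeps making progress on it.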
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: George Spelvin <linux@horizon.com>
      Reported-by: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
      Reported-by: Tomas Racek <tracek@redhat.com>
      Tested-by: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [hq: Backported to 3.4: adjust context]
      Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
    • dm mpath: fix stalls when handling invalid ioctls · 8371cffe
      Hannes Reinecke authored
      commit a1989b33 upstream.
      
      An invalid ioctl will never be valid, irrespective of whether multipath
      has active paths or not.  So for invalid ioctls we do not have to wait
      for multipath to activate any paths, but can rather return an error
      code immediately.  This fix resolves numerous instances of:
      
       udevd[]: worker [] unexpectedly returned with status 0x0100
      
      that have been seen during testing.
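
      The "validate first, wait later" ordering can be sketched in user-space
      C as follows. This is only an illustration: the command numbers,
      struct mpath_model, cmd_is_valid() and handle_ioctl() are hypothetical,
      and the choice of -ENOTTY and -EAGAIN as return values is an assumption,
      not taken from the dm-mpath source.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command numbers, for illustration only. */
enum {
    CMD_QUERY = 1,
    CMD_RESET = 2
};

/* Hypothetical multipath state; `waits` counts path-activation waits. */
struct mpath_model {
    bool has_active_path;
    unsigned int waits;
};

static bool cmd_is_valid(unsigned int cmd)
{
    return cmd == CMD_QUERY || cmd == CMD_RESET;
}

static int handle_ioctl(struct mpath_model *m, unsigned int cmd)
{
    /*
     * An invalid command stays invalid whether or not a path is active,
     * so reject it before doing anything that could stall the caller.
     */
    if (!cmd_is_valid(cmd))
        return -ENOTTY;

    if (!m->has_active_path) {
        m->waits++;        /* stand-in for waiting on path activation */
        return -EAGAIN;
    }

    return 0;              /* stand-in for forwarding to the active path */
}
```

      In this model an unknown command on a device with no active path
      returns an error at once and never increments waits, while a valid
      command on the same device still goes through the waiting branch.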
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dma: ste_dma40: don't dereference free:d descriptor · 8d8e4839
      Linus Walleij authored
      commit e9baa9d9 upstream.
      
      It appears that in the DMA40 driver the DMA tasklet will very
      often dereference memory for a descriptor just free:d from the
      DMA40 slab. Nothing happens because no other part of the driver
      has yet had a chance to claim this memory, but it's really
      nasty to dereference free:d memory, so let's check the flag
      before the descriptor is free and store it in a bool variable.
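
      The pattern of reading the flag before the free can be sketched in
      user-space C as follows. The names struct desc_model, wants_callback,
      desc_new(), finish_desc() and callbacks_run are hypothetical; the
      sketch uses malloc()/free() in place of the driver's slab and does not
      reproduce the DMA40 tasklet.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical descriptor carrying one flag that is needed after the free. */
struct desc_model {
    bool wants_callback;
};

/* Counts how many completion callbacks the model has "run". */
static int callbacks_run;

static struct desc_model *desc_new(bool wants_callback)
{
    struct desc_model *d = malloc(sizeof(*d));

    if (d)
        d->wants_callback = wants_callback;
    return d;
}

static void finish_desc(struct desc_model *d)
{
    /* Snapshot the flag while the descriptor is still valid memory. */
    bool wants_callback = d->wants_callback;

    free(d);
    d = NULL;   /* any later dereference would now fail loudly */

    /* Decide using the local copy only, never the released descriptor. */
    if (wants_callback)
        callbacks_run++;
}
```

      After finish_desc() returns, the decision has been made from the local
      bool alone, so it no longer matters whether another allocation has
      already reused the descriptor's memory.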
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Vinod Koul <vinod.koul@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>