1. 23 Mar, 2015 22 commits
    • Jan Beulich's avatar
      xen-pciback: limit guest control of command register · e321556c
      Jan Beulich authored
      commit af6fc858 upstream.
      
      Otherwise the guest can abuse that control to cause e.g. PCIe
      Unsupported Request responses by disabling memory and/or I/O decoding
      and subsequently causing (CPU side) accesses to the respective address
      ranges, which (depending on system configuration) may be fatal to the
      host.
      
      Note that to alter any of the bits collected together as
      PCI_COMMAND_GUEST permissive mode is now required to be enabled
      globally or on the specific device.
      
      This is CVE-2015-2150 / XSA-120.
      Signed-off-by: default avatarJan Beulich <jbeulich@suse.com>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      e321556c
    • Christian König's avatar
      drm/radeon: drop setting UPLL to sleep mode · ff37ae6a
      Christian König authored
      commit a17d4996 upstream.
      
      Just keep it working, seems to fix some PLL problems.
      
      Bug: https://bugs.freedesktop.org/show_bug.cgi?id=73378Signed-off-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      ff37ae6a
    • Peter Chen's avatar
      ARM: imx6sl-evk: set swbst_reg as vbus's parent reg · 410ac5d8
      Peter Chen authored
      commit 2de9dd03 upstream.
      
      USB vbus 5V is from PMIC SWBST, so set swbst_reg as vbus's
      parent reg, it fixed a bug that the voltage of vbus is incorrect
      due to swbst_reg is disabled after boots up.
      Signed-off-by: default avatarPeter Chen <peter.chen@freescale.com>
      Signed-off-by: default avatarShawn Guo <shawn.guo@linaro.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      410ac5d8
    • Peter Chen's avatar
      ARM: imx6qdl-sabresd: set swbst_reg as vbus's parent reg · 4a3082e1
      Peter Chen authored
      commit 40f73779 upstream.
      
      USB vbus 5V is from PMIC SWBST, so set swbst_reg as vbus's
      parent reg, it fixed a bug that the voltage of vbus is incorrect
      due to swbst_reg is disabled after boots up.
      Signed-off-by: default avatarPeter Chen <peter.chen@freescale.com>
      Signed-off-by: default avatarShawn Guo <shawn.guo@linaro.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      4a3082e1
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Fix ftrace enable ordering of sysctl ftrace_enabled · 297c2311
      Steven Rostedt (Red Hat) authored
      commit 524a3868 upstream.
      
      Some archs (specifically PowerPC), are sensitive with the ordering of
      the enabling of the calls to function tracing and setting of the
      function to use to be traced.
      
      That is, update_ftrace_function() sets what function the ftrace_caller
      trampoline should call. Some archs require this to be set before
      calling ftrace_run_update_code().
      
      Another bug was discovered, that ftrace_startup_sysctl() called
      ftrace_run_update_code() directly. If the function the ftrace_caller
      trampoline changes, then it will not be updated. Instead a call
      to ftrace_startup_enable() should be called because it tests to see
      if the callback changed since the code was disabled, and will
      tell the arch to update appropriately. Most archs do not need this
      notification, but PowerPC does.
      
      The problem could be seen by the following commands:
      
       # echo 0 > /proc/sys/kernel/ftrace_enabled
       # echo function > /sys/kernel/debug/tracing/current_tracer
       # echo 1 > /proc/sys/kernel/ftrace_enabled
       # cat /sys/kernel/debug/tracing/trace
      
      The trace will show that function tracing was not active.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      297c2311
    • Pratyush Anand's avatar
      ftrace: Fix en(dis)able graph caller when en(dis)abling record via sysctl · 729ea71c
      Pratyush Anand authored
      commit 1619dc3f upstream.
      
      When ftrace is enabled globally through the proc interface, we must check if
      ftrace_graph_active is set. If it is set, then we should also pass the
      FTRACE_START_FUNC_RET command to ftrace_run_update_code(). Similarly, when
      ftrace is disabled globally through the proc interface, we must check if
      ftrace_graph_active is set. If it is set, then we should also pass the
      FTRACE_STOP_FUNC_RET command to ftrace_run_update_code().
      
      Consider the following situation.
      
       # echo 0 > /proc/sys/kernel/ftrace_enabled
      
      After this ftrace_enabled = 0.
      
       # echo function_graph > /sys/kernel/debug/tracing/current_tracer
      
      Since ftrace_enabled = 0, ftrace_enable_ftrace_graph_caller() is never
      called.
      
       # echo 1 > /proc/sys/kernel/ftrace_enabled
      
      Now ftrace_enabled will be set to true, but still
      ftrace_enable_ftrace_graph_caller() will not be called, which is not
      desired.
      
      Further if we execute the following after this:
        # echo nop > /sys/kernel/debug/tracing/current_tracer
      
      Now since ftrace_enabled is set it will call
      ftrace_disable_ftrace_graph_caller(), which causes a kernel warning on
      the ARM platform.
      
      On the ARM platform, when ftrace_enable_ftrace_graph_caller() is called,
      it checks whether the old instruction is a nop or not. If it's not a nop,
      then it returns an error. If it is a nop then it replaces instruction at
      that address with a branch to ftrace_graph_caller.
      ftrace_disable_ftrace_graph_caller() behaves just the opposite. Therefore,
      if generic ftrace code ever calls either ftrace_enable_ftrace_graph_caller()
      or ftrace_disable_ftrace_graph_caller() consecutively two times in a row,
      then it will return an error, which will cause the generic ftrace code to
      raise a warning.
      
      Note, x86 does not have an issue with this because the architecture
      specific code for ftrace_enable_ftrace_graph_caller() and
      ftrace_disable_ftrace_graph_caller() does not check the previous state,
      and calling either of these functions twice in a row has no ill effect.
      
      Link: http://lkml.kernel.org/r/e4fbe64cdac0dd0e86a3bf914b0f83c0b419f146.1425666454.git.panand@redhat.comSigned-off-by: default avatarPratyush Anand <panand@redhat.com>
      [
        removed extra if (ftrace_start_up) and defined ftrace_graph_active as 0
        if CONFIG_FUNCTION_GRAPH_TRACER is not set.
      ]
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      729ea71c
    • Ahmed S. Darwish's avatar
      can: kvaser_usb: Read all messages in a bulk-in URB buffer · d5da6576
      Ahmed S. Darwish authored
      commit 2fec5104 upstream.
      
      The Kvaser firmware can only read and write messages that are
      not crossing the USB endpoint's wMaxPacketSize boundary. While
      receiving commands from the CAN device, if the next command in
      the same URB buffer crossed that max packet size boundary, the
      firmware puts a zero-length placeholder command in its place
      then moves the real command to the next boundary mark.
      
      The driver did not recognize such behavior, leading to missing
      a good number of rx events during a heavy rx load session.
      
      Moreover, a tx URB context only gets freed upon receiving its
      respective tx ACK event. Over time, the free tx URB contexts
      pool gets depleted due to the missing ACK events. Consequently,
      the netif transmission queue gets __permanently__ stopped; no
      frames could be sent again except after restarting the CAN
      newtwork interface.
      Signed-off-by: default avatarAhmed S. Darwish <ahmed.darwish@valeo.com>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      d5da6576
    • Ahmed S. Darwish's avatar
      can: kvaser_usb: Avoid double free on URB submission failures · 8c27234f
      Ahmed S. Darwish authored
      commit deb2701c upstream.
      
      Upon a URB submission failure, the driver calls usb_free_urb()
      but then manually frees the URB buffer by itself.  Meanwhile
      usb_free_urb() has alredy freed out that transfer buffer since
      we're the only code path holding a reference to this URB.
      
      Remove two of such invalid manual free().
      Signed-off-by: default avatarAhmed S. Darwish <ahmed.darwish@valeo.com>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      8c27234f
    • Oliver Hartkopp's avatar
      can: add missing initialisations in CAN related skbuffs · 96336fb9
      Oliver Hartkopp authored
      commit 96943901 upstream.
      
      When accessing CAN network interfaces with AF_PACKET sockets e.g. by dhclient
      this can lead to a skb_under_panic due to missing skb initialisations.
      
      Add the missing initialisations at the CAN skbuff creation times on driver
      level (rx path) and in the network layer (tx path).
      Reported-by: default avatarAustin Schuh <austin@peloton-tech.com>
      Reported-by: default avatarDaniel Steer <daniel.steer@mclaren.com>
      Signed-off-by: default avatarOliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      96336fb9
    • Takashi Iwai's avatar
      ALSA: hda - Fix regression of HD-audio controller fallback modes · 77198b34
      Takashi Iwai authored
      commit a1f3f1ca upstream.
      
      The commit [63e51fd7: ALSA: hda - Don't take unresponsive D3
      transition too serious] introduced a conditional fallback behavior to
      the HD-audio controller depending on the flag set.  However, it
      introduced a silly bug, too, that the flag was evaluated in a reverse
      way.  This resulted in a regression of HD-audio controller driver
      where it can't go to the fallback mode at communication errors.
      
      Unfortunately (or fortunately?) this didn't come up until recently
      because the affected code path is an error handling that happens only
      on an unstable hardware chip.  Most of recent chips work stably, thus
      they didn't hit this problem.  Now, we've got a regression report with
      a VIA chip, and this seems indeed requiring the fallback to the
      polling mode, and finally the bug was revealed.
      
      The fix is a oneliner to remove the wrong logical NOT in the check.
      (Lesson learned - be careful about double negation.)
      
      The bug should be backported to stable, but the patch won't be
      applicable to 3.13 or earlier because of the code splits.  The stable
      fix patches for earlier kernels will be posted later manually.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=94021
      Fixes: 63e51fd7 ('ALSA: hda - Don't take unresponsive D3 transition too serious')
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      77198b34
    • Maxime Ripard's avatar
      irqchip: armada-370-xp: Fix chained per-cpu interrupts · 5a8b9eaa
      Maxime Ripard authored
      commit 5724be84 upstream.
      
      On the Cortex-A9-based Armada SoCs, the MPIC is not the primary interrupt
      controller. Yet, it still has to handle some per-cpu interrupt.
      
      To do so, it is chained with the GIC using a per-cpu interrupt. However, the
      current code only call irq_set_chained_handler, which is called and enable that
      interrupt only on the boot CPU, which means that the parent per-CPU interrupt
      is never unmasked on the secondary CPUs, preventing the per-CPU interrupt to
      actually work as expected.
      
      This was not seen until now since the only MPIC PPI users were the Marvell
      timers that were not working, but not used either since the system use the ARM
      TWD by default, and the ethernet controllers, that are faking there interrupts
      as SPI, and don't really expect to have interrupts on the secondary cores
      anyway.
      
      Add a CPU notifier that will enable the PPI on the secondary cores when they
      are brought up.
      Signed-off-by: default avatarMaxime Ripard <maxime.ripard@free-electrons.com>
      Acked-by: default avatarGregory CLEMENT <gregory.clement@free-electrons.com>
      Link: https://lkml.kernel.org/r/1425378443-28822-1-git-send-email-maxime.ripard@free-electrons.comSigned-off-by: default avatarJason Cooper <jason@lakedaemon.net>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      5a8b9eaa
    • James Bottomley's avatar
      libsas: Fix Kernel Crash in smp_execute_task · 742275d4
      James Bottomley authored
      commit 6302ce4d upstream.
      
      This crash was reported:
      
      [  366.947370] sd 3:0:1:0: [sdb] Spinning up disk....
      [  368.804046] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [  368.804072] IP: [<ffffffff81358457>] __mutex_lock_common.isra.7+0x9c/0x15b
      [  368.804098] PGD 0
      [  368.804114] Oops: 0002 [#1] SMP
      [  368.804143] CPU 1
      [  368.804151] Modules linked in: sg netconsole s3g(PO) uinput joydev hid_multitouch usbhid hid snd_hda_codec_via cpufreq_userspace cpufreq_powersave cpufreq_stats uhci_hcd cpufreq_conservative snd_hda_intel snd_hda_codec snd_hwdep snd_pcm sdhci_pci snd_page_alloc sdhci snd_timer snd psmouse evdev serio_raw pcspkr soundcore xhci_hcd shpchp s3g_drm(O) mvsas mmc_core ahci libahci drm i2c_core acpi_cpufreq mperf video processor button thermal_sys dm_dmirror exfat_fs exfat_core dm_zcache dm_mod padlock_aes aes_generic padlock_sha iscsi_target_mod target_core_mod configfs sswipe libsas libata scsi_transport_sas picdev via_cputemp hwmon_vid fuse parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd2 sd_mod crc_t10dif usb_storage scsi_mod ehci_hcd usbcore usb_common
      [  368.804749]
      [  368.804764] Pid: 392, comm: kworker/u:3 Tainted: P        W  O 3.4.87-logicube-ng.22 #1 To be filled by O.E.M. To be filled by O.E.M./EPIA-M920
      [  368.804802] RIP: 0010:[<ffffffff81358457>]  [<ffffffff81358457>] __mutex_lock_common.isra.7+0x9c/0x15b
      [  368.804827] RSP: 0018:ffff880117001cc0  EFLAGS: 00010246
      [  368.804842] RAX: 0000000000000000 RBX: ffff8801185030d0 RCX: ffff88008edcb420
      [  368.804857] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff8801185030d4
      [  368.804873] RBP: ffff8801181531c0 R08: 0000000000000020 R09: 00000000fffffffe
      [  368.804885] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801185030d4
      [  368.804899] R13: 0000000000000002 R14: ffff880117001fd8 R15: ffff8801185030d8
      [  368.804916] FS:  0000000000000000(0000) GS:ffff88011fc80000(0000) knlGS:0000000000000000
      [  368.804931] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [  368.804946] CR2: 0000000000000000 CR3: 000000000160b000 CR4: 00000000000006e0
      [  368.804962] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  368.804978] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  368.804995] Process kworker/u:3 (pid: 392, threadinfo ffff880117000000, task ffff8801181531c0)
      [  368.805009] Stack:
      [  368.805017]  ffff8801185030d8 0000000000000000 ffffffff8161ddf0 ffffffff81056f7c
      [  368.805062]  000000000000b503 ffff8801185030d0 ffff880118503000 0000000000000000
      [  368.805100]  ffff8801185030d0 ffff8801188b8000 ffff88008edcb420 ffffffff813583ac
      [  368.805135] Call Trace:
      [  368.805153]  [<ffffffff81056f7c>] ? up+0xb/0x33
      [  368.805168]  [<ffffffff813583ac>] ? mutex_lock+0x16/0x25
      [  368.805194]  [<ffffffffa018c414>] ? smp_execute_task+0x4e/0x222 [libsas]
      [  368.805217]  [<ffffffffa018ce1c>] ? sas_find_bcast_dev+0x3c/0x15d [libsas]
      [  368.805240]  [<ffffffffa018ce4f>] ? sas_find_bcast_dev+0x6f/0x15d [libsas]
      [  368.805264]  [<ffffffffa018e989>] ? sas_ex_revalidate_domain+0x37/0x2ec [libsas]
      [  368.805280]  [<ffffffff81355a2a>] ? printk+0x43/0x48
      [  368.805296]  [<ffffffff81359a65>] ? _raw_spin_unlock_irqrestore+0xc/0xd
      [  368.805318]  [<ffffffffa018b767>] ? sas_revalidate_domain+0x85/0xb6 [libsas]
      [  368.805336]  [<ffffffff8104e5d9>] ? process_one_work+0x151/0x27c
      [  368.805351]  [<ffffffff8104f6cd>] ? worker_thread+0xbb/0x152
      [  368.805366]  [<ffffffff8104f612>] ? manage_workers.isra.29+0x163/0x163
      [  368.805382]  [<ffffffff81052c4e>] ? kthread+0x79/0x81
      [  368.805399]  [<ffffffff8135fea4>] ? kernel_thread_helper+0x4/0x10
      [  368.805416]  [<ffffffff81052bd5>] ? kthread_flush_work_fn+0x9/0x9
      [  368.805431]  [<ffffffff8135fea0>] ? gs_change+0x13/0x13
      [  368.805442] Code: 83 7d 30 63 7e 04 f3 90 eb ab 4c 8d 63 04 4c 8d 7b 08 4c 89 e7 e8 fa 15 00 00 48 8b 43 10 4c 89 3c 24 48 89 63 10 48 89 44 24 08 <48> 89 20 83 c8 ff 48 89 6c 24 10 87 03 ff c8 74 35 4d 89 ee 41
      [  368.805851] RIP  [<ffffffff81358457>] __mutex_lock_common.isra.7+0x9c/0x15b
      [  368.805877]  RSP <ffff880117001cc0>
      [  368.805886] CR2: 0000000000000000
      [  368.805899] ---[ end trace b720682065d8f4cc ]---
      
      It's directly caused by 89d3cf6a [SCSI] libsas: add mutex for SMP task
      execution, but shows a deeper cause: expander functions expect to be able to
      cast to and treat domain devices as expanders.  The correct fix is to only do
      expander discover when we know we've got an expander device to avoid wrongly
      casting a non-expander device.
      Reported-by: default avatarPraveen Murali <pmurali@logicube.com>
      Tested-by: default avatarPraveen Murali <pmurali@logicube.com>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Parallels.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      742275d4
    • jmlatten@linux.vnet.ibm.com's avatar
      tpm/ibmvtpm: Additional LE support for tpm_ibmvtpm_send · 690f544a
      jmlatten@linux.vnet.ibm.com authored
      commit 62dfd912 upstream.
      
      Problem: When IMA and VTPM are both enabled in kernel config,
      kernel hangs during bootup on LE OS.
      
      Why?: IMA calls tpm_pcr_read() which results in tpm_ibmvtpm_send
      and tpm_ibmtpm_recv getting called. A trace showed that
      tpm_ibmtpm_recv was hanging.
      
      Resolution: tpm_ibmtpm_recv was hanging because tpm_ibmvtpm_send
      was sending CRQ message that probably did not make much sense
      to phype because of Endianness. The fix below sends correctly
      converted CRQ for LE. This was not caught before because it
      seems IMA is not enabled by default in kernel config and
      IMA exercises this particular code path in vtpm.
      
      Tested with IMA and VTPM enabled in kernel config and VTPM
      enabled on both a BE OS and a LE OS ppc64 lpar. This exercised
      CRQ and TPM command code paths in vtpm.
      Patch is against Peter's tpmdd tree on github which included
      Vicky's previous vtpm le patches.
      Signed-off-by: default avatarJoy Latten <jmlatten@linux.vnet.ibm.com>
      Reviewed-by: default avatarAshley Lai <ashley@ahsleylai.com>
      Signed-off-by: default avatarPeter Huewe <peterhuewe@gmx.de>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      690f544a
    • Alexander Sverdlin's avatar
      spi: pl022: Fix race in giveback() leading to driver lock-up · b973ad9a
      Alexander Sverdlin authored
      commit cd6fa8d2 upstream.
      
      Commit fd316941 ("spi/pl022: disable port when unused") introduced a race,
      which leads to possible driver lock up (easily reproducible on SMP).
      
      The problem happens in giveback() function where the completion of the transfer
      is signalled to SPI subsystem and then the HW SPI controller is disabled. Another
      transfer might be setup in between, which brings driver in locked-up state.
      
      Exact event sequence on SMP:
      
      core0                                   core1
      
                                              => pump_transfers()
                                              /* message->state == STATE_DONE */
                                                => giveback()
                                                  => spi_finalize_current_message()
      
      => pl022_unprepare_transfer_hardware()
      => pl022_transfer_one_message
        => flush()
        => do_interrupt_dma_transfer()
          => set_up_next_transfer()
          /* Enable SSP, turn on interrupts */
          writew((readw(SSP_CR1(pl022->virtbase)) |
                 SSP_CR1_MASK_SSE), SSP_CR1(pl022->virtbase));
      
      ...
      
      => pl022_interrupt_handler()
        => readwriter()
      
                                              /* disable the SPI/SSP operation */
                                              => writew((readw(SSP_CR1(pl022->virtbase)) &
                                                        (~SSP_CR1_MASK_SSE)), SSP_CR1(pl022->virtbase));
      
      Lockup! SPI controller is disabled and the data will never be received. Whole
      SPI subsystem is waiting for transfer ACK and blocked.
      
      So, only signal transfer completion after disabling the controller.
      
      Fixes: fd316941 (spi/pl022: disable port when unused)
      Signed-off-by: default avatarAlexander Sverdlin <alexander.sverdlin@nokia.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      b973ad9a
    • Juergen Gross's avatar
      xen/events: avoid NULL pointer dereference in dom0 on large machines · a922ccc2
      Juergen Gross authored
      commit 85e40b05 upstream.
      
      Using the pvops kernel a NULL pointer dereference was detected on a
      large machine (144 processors) when booting as dom0 in
      evtchn_fifo_unmask() during assignment of a pirq.
      
      The event channel in question was the first to need a new entry in
      event_array[] in events_fifo.c. Unfortunately xen_irq_info_pirq_setup()
      is called with evtchn being 0 for a new pirq and the real event channel
      number is assigned to the pirq only during __startup_pirq().
      
      It is mandatory to call xen_evtchn_port_setup() after assigning the
      event channel number to the pirq to make sure all memory needed for the
      event channel is allocated.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      a922ccc2
    • Brian King's avatar
      bnx2x: Force fundamental reset for EEH recovery · ef3112c8
      Brian King authored
      commit da293700 upstream.
      
      EEH recovery for bnx2x based adapters is not reliable on all Power
      systems using the default hot reset, which can result in an
      unrecoverable EEH error. Forcing the use of fundamental reset
      during EEH recovery fixes this.
      Signed-off-by: default avatarBrian King <brking@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      ef3112c8
    • Tejun Heo's avatar
      workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE · 1264f1e3
      Tejun Heo authored
      commit 8603e1b3 upstream.
      
      cancel[_delayed]_work_sync() are implemented using
      __cancel_work_timer() which grabs the PENDING bit using
      try_to_grab_pending() and then flushes the work item with PENDING set
      to prevent the on-going execution of the work item from requeueing
      itself.
      
      try_to_grab_pending() can always grab PENDING bit without blocking
      except when someone else is doing the above flushing during
      cancelation.  In that case, try_to_grab_pending() returns -ENOENT.  In
      this case, __cancel_work_timer() currently invokes flush_work().  The
      assumption is that the completion of the work item is what the other
      canceling task would be waiting for too and thus waiting for the same
      condition and retrying should allow forward progress without excessive
      busy looping
      
      Unfortunately, this doesn't work if preemption is disabled or the
      latter task has real time priority.  Let's say task A just got woken
      up from flush_work() by the completion of the target work item.  If,
      before task A starts executing, task B gets scheduled and invokes
      __cancel_work_timer() on the same work item, its try_to_grab_pending()
      will return -ENOENT as the work item is still being canceled by task A
      and flush_work() will also immediately return false as the work item
      is no longer executing.  This puts task B in a busy loop possibly
      preventing task A from executing and clearing the canceling state on
      the work item leading to a hang.
      
      task A			task B			worker
      
      						executing work
      __cancel_work_timer()
        try_to_grab_pending()
        set work CANCELING
        flush_work()
          block for work completion
      						completion, wakes up A
      			__cancel_work_timer()
      			while (forever) {
      			  try_to_grab_pending()
      			    -ENOENT as work is being canceled
      			  flush_work()
      			    false as work is no longer executing
      			}
      
      This patch removes the possible hang by updating __cancel_work_timer()
      to explicitly wait for clearing of CANCELING rather than invoking
      flush_work() after try_to_grab_pending() fails with -ENOENT.
      
      Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com
      
      v3: bit_waitqueue() can't be used for work items defined in vmalloc
          area.  Switched to custom wake function which matches the target
          work item and exclusive wait and wakeup.
      
      v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if
          the target bit waitqueue has wait_bit_queue's on it.  Use
          DEFINE_WAIT_BIT() and __wake_up_bit() instead.  Reported by Tomeu
          Vizoso.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarRabin Vincent <rabin.vincent@axis.com>
      Cc: Tomeu Vizoso <tomeu.vizoso@gmail.com>
      Tested-by: default avatarJesper Nilsson <jesper.nilsson@axis.com>
      Tested-by: default avatarRabin Vincent <rabin.vincent@axis.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      1264f1e3
    • Jason Low's avatar
      cpuset: Fix cpuset sched_relax_domain_level · 6efeedf2
      Jason Low authored
      commit 283cb41f upstream.
      
      The cpuset.sched_relax_domain_level can control how far we do
      immediate load balancing on a system. However, it was found on recent
      kernels that echo'ing a value into cpuset.sched_relax_domain_level
      did not reduce any immediate load balancing.
      
      The reason this occurred was because the update_domain_attr_tree() traversal
      did not update for the "top_cpuset". This resulted in nothing being changed
      when modifying the sched_relax_domain_level parameter.
      
      This patch is able to address that problem by having update_domain_attr_tree()
      allow updates for the root in the cpuset traversal.
      
      Fixes: fc560a26 ("cpuset: replace cpuset->stack_list with cpuset_for_each_descendant_pre()")
      Signed-off-by: default avatarJason Low <jason.low2@hp.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Tested-by: default avatarSerge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      6efeedf2
    • Maxime Ripard's avatar
      mtd: nand: pxa3xx: Fix PIO FIFO draining · 41ba1af9
      Maxime Ripard authored
      commit 8dad0386 upstream.
      
      The NDDB register holds the data that are needed by the read and write
      commands.
      
      However, during a read PIO access, the datasheet specifies that after each 32
      bytes read in that register, when BCH is enabled, we have to make sure that the
      RDDREQ bit is set in the NDSR register.
      
      This fixes an issue that was seen on the Armada 385, and presumably other mvebu
      SoCs, when a read on a newly erased page would end up in the driver reporting a
      timeout from the NAND.
      Signed-off-by: default avatarMaxime Ripard <maxime.ripard@free-electrons.com>
      Reviewed-by: default avatarBoris Brezillon <boris.brezillon@free-electrons.com>
      Acked-by: default avatarEzequiel Garcia <ezequiel.garcia@free-electrons.com>
      Signed-off-by: default avatarBrian Norris <computersforpeace@gmail.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      41ba1af9
    • Torsten Fleischer's avatar
      spi: atmel: Fix interrupt setup for PDC transfers · 293a6372
      Torsten Fleischer authored
      commit 76e1d14b upstream.
      
      Additionally to the current DMA transfer the PDC allows to set up a next DMA
      transfer. This is useful for larger SPI transfers.
      
      The driver currently waits for ENDRX as end of the transfer. But ENDRX is set
      when the current DMA transfer is done (RCR = 0), i.e. it doesn't include the
      next DMA transfer.
      Thus a subsequent SPI transfer could be started although there is currently a
      transfer in progress. This can cause invalid accesses to the SPI slave devices
      and to SPI transfer errors.
      
      This issue has been observed on a hardware with a M25P128 SPI NOR flash.
      
      So instead of ENDRX we should wait for RXBUFF. This flag is set if there is
      no more DMA transfer in progress (RCR = RNCR = 0).
      Signed-off-by: default avatarTorsten Fleischer <torfl6749@gmail.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      293a6372
    • Andy Shevchenko's avatar
      spi: dw: revisit FIFO size detection again · e44ea338
      Andy Shevchenko authored
      commit 9d239d35 upstream.
      
      The commit d297933c (spi: dw: Fix detecting FIFO depth) tries to fix the
      logic of the FIFO detection based on the description on the comments. However,
      there is a slight difference between numbers in TX Level and TX FIFO size.
      
      So, by specification the FIFO size would be in a range 2-256 bytes. From TX
      Level prospective it means we can set threshold in the range 0-(FIFO size - 1)
      bytes. Hence there are currently two issues:
        a) FIFO size 2 bytes is actually skipped since TX Level is 1 bit and could be
           either 0 or 1 byte;
        b) FIFO size is incorrectly decreased by 1 which already done by meaning of
           TX Level register.
      
      This patch fixes it eventually right.
      
      Fixes: d297933c (spi: dw: Fix detecting FIFO depth)
      Reviewed-by: default avatarAxel Lin <axel.lin@ingics.com>
      Signed-off-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      e44ea338
    • Sasha Levin's avatar
      PCI: Don't read past the end of sysfs "driver_override" buffer · d81b1275
      Sasha Levin authored
      commit 4efe874a upstream.
      
      When printing the driver_override parameter when it is 4095 and 4094 bytes
      long, the printing code would access invalid memory because we need count+1
      bytes for printing.
      
      Fixes: 782a985d ("PCI: Introduce new device binding path using pci_dev.driver_override")
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Acked-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      CC: Alexander Graf <agraf@suse.de>
      CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      d81b1275
  2. 20 Mar, 2015 4 commits
  3. 18 Mar, 2015 14 commits
    • Sergey Ryazanov's avatar
      ath5k: fix spontaneus AR5312 freezes · 032f0e4f
      Sergey Ryazanov authored
      commit 8bfae4f9 upstream.
      
      Sometimes while CPU have some load and ath5k doing the wireless
      interface reset the whole WiSoC completely freezes. Set of tests shows
      that using atomic delay function while we wait interface reset helps to
      avoid such freezes.
      
      The easiest way to reproduce this issue: create a station interface,
      start continous scan with wpa_supplicant and load CPU by something. Or
      just create multiple station interfaces and put them all in continous
      scan.
      
      This patch partially reverts the commit 1846ac3d ("ath5k: Use
      usleep_range where possible"), which replaces initial udelay()
      by usleep_range().
      
      I do not know actual source of this issue, but all looks like that HW
      freeze is caused by transaction on internal SoC bus, while wireless
      block is in reset state.
      
      Also I should note that I do not know how many chips are affected, but I
      did not see this issue with chips, other than AR5312.
      
      CC: Jiri Slaby <jirislaby@gmail.com>
      CC: Nick Kossifidis <mickflemm@gmail.com>
      CC: Luis R. Rodriguez <mcgrof@do-not-panic.com>
      Fixes: 1846ac3d ("ath5k: Use usleep_range where possible")
      Reported-by: default avatarChristophe Prevotaux <c.prevotaux@rural-networks.com>
      Tested-by: default avatarChristophe Prevotaux <c.prevotaux@rural-networks.com>
      Tested-by: default avatarEric Bree <ebree@nltinc.com>
      Signed-off-by: default avatarSergey Ryazanov <ryazanov.s.a@gmail.com>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      032f0e4f
    • Chuck Lever's avatar
      SUNRPC: Always manipulate rpc_rqst::rq_bc_pa_list under xprt->bc_pa_lock · bd074fc2
      Chuck Lever authored
      commit 813b00d6 upstream.
      
      Other code that accesses rq_bc_pa_list holds xprt->bc_pa_lock.
      xprt_complete_bc_request() should do the same.
      
      Fixes: 2ea24497 ("SUNRPC: RPC callbacks may be split . . .")
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      bd074fc2
    • David Ramos's avatar
      svcrpc: fix memory leak in gssp_accept_sec_context_upcall · ee0edc3e
      David Ramos authored
      commit a1d1e9be upstream.
      
      Our UC-KLEE tool found a kernel memory leak of 512 bytes (on x86_64) for
      each call to gssp_accept_sec_context_upcall()
      (net/sunrpc/auth_gss/gss_rpc_upcall.c). Since it appears that this call
      can be triggered by remote connections (at least, from a cursory a
      glance at the call chain), it may be exploitable to cause kernel memory
      exhaustion. We found the bug in kernel 3.16.3, but it appears to date
      back to commit 9dfd87da (2013-08-20).
      
      The gssp_accept_sec_context_upcall() function performs a pair of calls
      to gssp_alloc_receive_pages() and gssp_free_receive_pages().  The first
      allocates memory for arg->pages.  The second then frees the pages
      pointed to by the arg->pages array, but not the array itself.
      Reported-by: default avatarDavid A. Ramos <daramos@stanford.edu>
      Fixes: 9dfd87da ("rpc: fix huge kmalloc's in gss-proxy”)
      Signed-off-by: default avatarDavid A. Ramos <daramos@stanford.edu>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      ee0edc3e
    • Al Viro's avatar
      sunrpc: fix braino in ->poll() · 4574bc36
      Al Viro authored
      commit 1711fd9a upstream.
      
      POLL_OUT isn't what callers of ->poll() are expecting to see; it's
      actually __SI_POLL | 2 and it's a siginfo code, not a poll bitmap
      bit...
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Cc: Bruce Fields <bfields@fieldses.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      4574bc36
    • Johan Hovold's avatar
      TTY: fix tty_wait_until_sent on 64-bit machines · 5a4dbe15
      Johan Hovold authored
      commit 79fbf4a5 upstream.
      
      Fix overflow bug in tty_wait_until_sent on 64-bit machines, where an
      infinite timeout (0) would be passed to the underlying tty-driver's
      wait_until_sent-operation as a negative timeout (-1), causing it to
      return immediately.
      
      This manifests itself for example as tcdrain() returning immediately,
      drivers not honouring the drain flags when setting terminal attributes,
      or even dropped data on close as a requested infinite closing-wait
      timeout would be ignored.
      
      The first symptom  was reported by Asier LLANO who noted that tcdrain()
      returned prematurely when using the ftdi_sio usb-serial driver.
      
      Fix this by passing 0 rather than MAX_SCHEDULE_TIMEOUT (LONG_MAX) to the
      underlying tty driver.
      
      Note that the serial-core wait_until_sent-implementation is not affected
      by this bug due to a lucky chance (comparison to an unsigned maximum
      timeout), and neither is the cyclades one that had an explicit check for
      negative timeouts, but all other tty drivers appear to be affected.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: default avatarZIV-Asier Llano Palacios <asier.llano@cgglobal.com>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Reviewed-by: default avatarPeter Hurley <peter@hurleysoftware.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      5a4dbe15
    • Johan Hovold's avatar
      USB: serial: fix infinite wait_until_sent timeout · 8b47d032
      Johan Hovold authored
      commit f528bf4f upstream.
      
      Make sure to handle an infinite timeout (0).
      
      Note that wait_until_sent is currently never called with a 0-timeout
      argument due to a bug in tty_wait_until_sent.
      
      Fixes: dcf01050 ("USB: serial: add generic wait_until_sent
      implementation")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      8b47d032
    • Johan Hovold's avatar
      net: irda: fix wait_until_sent poll timeout · 232cccf2
      Johan Hovold authored
      commit 2c3fbe3c upstream.
      
      In case an infinite timeout (0) is requested, the irda wait_until_sent
      implementation would use a zero poll timeout rather than the default
      200ms.
      
      Note that wait_until_sent is currently never called with a 0-timeout
      argument due to a bug in tty_wait_until_sent.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      232cccf2
    • Peter Hurley's avatar
      console: Fix console name size mismatch · 11f41b27
      Peter Hurley authored
      commit 30a22c21 upstream.
      
      commit 6ae9200f ("enlarge console.name") increased the storage
      for the console name to 16 bytes, but not the corresponding
      struct console_cmdline::name storage. Console names longer than
      8 bytes cause read beyond end-of-string and failure to match
      console; I'm not sure if there are other unexpected consequences.
      Signed-off-by: default avatarPeter Hurley <peter@hurleysoftware.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      11f41b27
    • Jiri Slaby's avatar
      tty: fix up atime/mtime mess, take four · c468bf26
      Jiri Slaby authored
      commit f0bf0bd0 upstream.
      
      This problem was taken care of three times already in
      * b0de59b5 (TTY: do not update
        atime/mtime on read/write),
      * 37b7f3c7 (TTY: fix atime/mtime
        regression), and
      * b0b88565 (tty: fix up atime/mtime
        mess, take three)
      
      But it still misses one point. As John Paul correctly points out, we
      do not care about setting date. If somebody ever changes wall
      time backwards (by mistake for example), tty timestamps are never
      updated until the original wall time passes.
      
      So check the absolute difference of times and if it large than "8
      seconds or so", always update the time. That means we will update
      immediatelly when changing time. Ergo, CAP_SYS_TIME can foul the
      check, but it was always that way.
      
      Thanks John for serving me this so nicely debugged.
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Reported-by: default avatarJohn Paul Perry <john_paul.perry@alcatel-lucent.com>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      c468bf26
    • Russell King's avatar
      Change email address for 8250_pci · 5b428ea8
      Russell King authored
      commit f2e0ea86 upstream.
      
      I'm still receiving reports to my email address, so let's point this
      at the linux-serial mailing list instead.
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      5b428ea8
    • Mathias Nyman's avatar
      xhci: Workaround for PME stuck issues in Intel xhci · 41f2c377
      Mathias Nyman authored
      commit b8cb91e0 upstream.
      
      The xhci in Intel Sunrisepoint and Cherryview platforms need a driver
      workaround for a Stuck PME that might either block PME events in suspend,
      or create spurious PME events preventing runtime suspend.
      
      Workaround is to clear a internal PME flag, BIT(28) in a vendor specific
      PMCTRL register at offset 0x80a4, in both suspend resume callbacks
      
      Without this, xhci connected usb devices might never be able to wake up the
      system from suspend, or prevent device from going to suspend (xhci d3)
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      41f2c377
    • Aleksander Morgado's avatar
      xhci: fix reporting of 0-sized URBs in control endpoint · 34a58e7e
      Aleksander Morgado authored
      commit 45ba2154 upstream.
      
      When a control transfer has a short data stage, the xHCI controller generates
      two transfer events: a COMP_SHORT_TX event that specifies the untransferred
      amount, and a COMP_SUCCESS event. But when the data stage is not short, only the
      COMP_SUCCESS event occurs. Therefore, xhci-hcd must set urb->actual_length to
      urb->transfer_buffer_length while processing the COMP_SUCCESS event, unless
      urb->actual_length was set already by a previous COMP_SHORT_TX event.
      
      The driver checks this by seeing whether urb->actual_length == 0, but this alone
      is the wrong test, as it is entirely possible for a short transfer to have an
      urb->actual_length = 0.
      
      This patch changes the xhci driver to rely on a new td->urb_length_set flag,
      which is set to true when a COMP_SHORT_TX event is received and the URB length
      updated at that stage.
      
      This fixes a bug which affected the HSO plugin, which relies on URBs with
      urb->actual_length == 0 to halt re-submitting the RX URB in the control
      endpoint.
      Signed-off-by: default avatarAleksander Morgado <aleksander@aleksander.es>
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      34a58e7e
    • Quentin Casasnovas's avatar
      Btrfs:__add_inode_ref: out of bounds memory read when looking for extended ref. · 028a0a83
      Quentin Casasnovas authored
      commit dd9ef135 upstream.
      
      Improper arithmetics when calculting the address of the extended ref could
      lead to an out of bounds memory read and kernel panic.
      Signed-off-by: default avatarQuentin Casasnovas <quentin.casasnovas@oracle.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      028a0a83
    • Filipe Manana's avatar
      Btrfs: fix data loss in the fast fsync path · f8da4b44
      Filipe Manana authored
      commit 3a8b36f3 upstream.
      
      When using the fast file fsync code path we can miss the fact that new
      writes happened since the last file fsync and therefore return without
      waiting for the IO to finish and write the new extents to the fsync log.
      
      Here's an example scenario where the fsync will miss the fact that new
      file data exists that wasn't yet durably persisted:
      
      1. fs_info->last_trans_committed == N - 1 and current transaction is
         transaction N (fs_info->generation == N);
      
      2. do a buffered write;
      
      3. fsync our inode, this clears our inode's full sync flag, starts
         an ordered extent and waits for it to complete - when it completes
         at btrfs_finish_ordered_io(), the inode's last_trans is set to the
         value N (via btrfs_update_inode_fallback -> btrfs_update_inode ->
         btrfs_set_inode_last_trans);
      
      4. transaction N is committed, so fs_info->last_trans_committed is now
         set to the value N and fs_info->generation remains with the value N;
      
      5. do another buffered write, when this happens btrfs_file_write_iter
         sets our inode's last_trans to the value N + 1 (that is
         fs_info->generation + 1 == N + 1);
      
      6. transaction N + 1 is started and fs_info->generation now has the
         value N + 1;
      
      7. transaction N + 1 is committed, so fs_info->last_trans_committed
         is set to the value N + 1;
      
      8. fsync our inode - because it doesn't have the full sync flag set,
         we only start the ordered extent, we don't wait for it to complete
         (only in a later phase) therefore its last_trans field has the
         value N + 1 set previously by btrfs_file_write_iter(), and so we
         have:
      
             inode->last_trans <= fs_info->last_trans_committed
                 (N + 1)              (N + 1)
      
         Which made us not log the last buffered write and exit the fsync
         handler immediately, returning success (0) to user space and resulting
         in data loss after a crash.
      
      This can actually be triggered deterministically and the following excerpt
      from a testcase I made for xfstests triggers the issue. It moves a dummy
      file across directories and then fsyncs the old parent directory - this
      is just to trigger a transaction commit, so moving files around isn't
      directly related to the issue but it was chosen because running 'sync' for
      example does more than just committing the current transaction, as it
      flushes/waits for all file data to be persisted. The issue can also happen
      at random periods, since the transaction kthread periodicaly commits the
      current transaction (about every 30 seconds by default).
      The body of the test is:
      
        _scratch_mkfs >> $seqres.full 2>&1
        _init_flakey
        _mount_flakey
      
        # Create our main test file 'foo', the one we check for data loss.
        # By doing an fsync against our file, it makes btrfs clear the 'needs_full_sync'
        # bit from its flags (btrfs inode specific flags).
        $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 8K" \
                        -c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io
      
        # Now create one other file and 2 directories. We will move this second file
        # from one directory to the other later because it forces btrfs to commit its
        # currently open transaction if we fsync the old parent directory. This is
        # necessary to trigger the data loss bug that affected btrfs.
        mkdir $SCRATCH_MNT/testdir_1
        touch $SCRATCH_MNT/testdir_1/bar
        mkdir $SCRATCH_MNT/testdir_2
      
        # Make sure everything is durably persisted.
        sync
      
        # Write more 8Kb of data to our file.
        $XFS_IO_PROG -c "pwrite -S 0xbb 8K 8K" $SCRATCH_MNT/foo | _filter_xfs_io
      
        # Move our 'bar' file into a new directory.
        mv $SCRATCH_MNT/testdir_1/bar $SCRATCH_MNT/testdir_2/bar
      
        # Fsync our first directory. Because it had a file moved into some other
        # directory, this made btrfs commit the currently open transaction. This is
        # a condition necessary to trigger the data loss bug.
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir_1
      
        # Now fsync our main test file. If the fsync succeeds, we expect the 8Kb of
        # data we wrote previously to be persisted and available if a crash happens.
        # This did not happen with btrfs, because of the transaction commit that
        # happened when we fsynced the parent directory.
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo
      
        # Simulate a crash/power loss.
        _load_flakey_table $FLAKEY_DROP_WRITES
        _unmount_flakey
      
        _load_flakey_table $FLAKEY_ALLOW_WRITES
        _mount_flakey
      
        # Now check that all data we wrote before are available.
        echo "File content after log replay:"
        od -t x1 $SCRATCH_MNT/foo
      
        status=0
        exit
      
      The expected golden output for the test, which is what we get with this
      fix applied (or when running against ext3/4 and xfs), is:
      
        wrote 8192/8192 bytes at offset 0
        XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
        wrote 8192/8192 bytes at offset 8192
        XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
        File content after log replay:
        0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
        *
        0020000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
        *
        0040000
      
      Without this fix applied, the output shows the test file does not have
      the second 8Kb extent that we successfully fsynced:
      
        wrote 8192/8192 bytes at offset 0
        XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
        wrote 8192/8192 bytes at offset 8192
        XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
        File content after log replay:
        0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
        *
        0020000
      
      So fix this by skipping the fsync only if we're doing a full sync and
      if the inode's last_trans is <= fs_info->last_trans_committed, or if
      the inode is already in the log. Also remove setting the inode's
      last_trans in btrfs_file_write_iter since it's useless/unreliable.
      
      Also because btrfs_file_write_iter no longer sets inode->last_trans to
      fs_info->generation + 1, don't set last_trans to 0 if we bail out and don't
      bail out if last_trans is 0, otherwise something as simple as the following
      example wouldn't log the second write on the last fsync:
      
        1. write to file
      
        2. fsync file
      
        3. fsync file
             |--> btrfs_inode_in_log() returns true and it set last_trans to 0
      
        4. write to file
             |--> btrfs_file_write_iter() no longers sets last_trans, so it
                  remained with a value of 0
        5. fsync
             |--> inode->last_trans == 0, so it bails out without logging the
                  second write
      
      A test case for xfstests will be sent soon.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      f8da4b44