1. 18 Apr, 2014 2 commits
    • Linus Torvalds's avatar
      Merge tag 'xfs-for-linus-3.15-rc2' of git://oss.sgi.com/xfs/xfs · 962bf3ea
      Linus Torvalds authored
      Pull xfs bug fixes from Dave Chinner:
       "The fixes are for data corruption issues, memory corruption and
        regressions for changes merged in -rc1.
      
        Data corruption fixes:
         - fix a bunch of delayed allocation state mismatches
         - fix collapse/zero range bugs
         - fix a direct IO block mapping bug @ EOF
      
        Other fixes:
         - fix a use after free on metadata IO error
         - fix a use after free on IO error during unmount
         - fix an incorrect error sign on direct IO write errors
         - add missing O_TMPFILE inode security context initialisation"
      
      * tag 'xfs-for-linus-3.15-rc2' of git://oss.sgi.com/xfs/xfs:
        xfs: fix tmpfile/selinux deadlock and initialize security
        xfs: fix buffer use after free on IO error
        xfs: wrong error sign conversion during failed DIO writes
        xfs: unmount does not wait for shutdown during unmount
        xfs: collapse range is delalloc challenged
        xfs: don't map ranges that span EOF for direct IO
        xfs: zeroing space needs to punch delalloc blocks
        xfs: xfs_vm_write_end truncates too much on failure
        xfs: write failure beyond EOF truncates too much data
        xfs: kill buffers over failed write ranges properly
      962bf3ea
    • Linus Torvalds's avatar
      Merge tag 'trace-fixes-v3.15-rc1' of... · 7d77879b
      Linus Torvalds authored
      Merge tag 'trace-fixes-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
      
      Pull tracing fixes from Steven Rostedt:
       "This contains two fixes.
      
        The first is to remove a duplication of creating debugfs files that
        already exist and causes an error report to be printed due to the
        failure of the second creation.
      
        The second is a memory leak fix that was introduced in 3.14"
      
      * tag 'trace-fixes-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing/uprobes: Fix uprobe_cpu_buffer memory leak
        tracing: Do not try to recreated toplevel set_ftrace_* files
      7d77879b
  2. 17 Apr, 2014 17 commits
  3. 16 Apr, 2014 18 commits
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 6ca2a88a
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Various fixes:
      
         - reboot regression fix
         - build message spam fix
         - GPU quirk fix
         - 'make kvmconfig' fix
      
        plus the wire-up of the renameat2() system call on i386"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: Remove the PCI reboot method from the default chain
        x86/build: Supress "Nothing to be done for ..." messages
        x86/gpu: Fix sign extension issue in Intel graphics stolen memory quirks
        x86/platform: Fix "make O=dir kvmconfig"
        i386: Wire up the renameat2() syscall
      6ca2a88a
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2a83dc7e
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "Tooling fixes, plus a simple hardware-enablement patch for the Intel
        RAPL PMU (energy use measurement) on Haswell CPUs, which I hope is
        still fine at this stage"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf tools: Instead of redirecting flex output, use -o
        perf tools: Fix double free in perf test 21 (code-reading.c)
        perf stat: Initialize statistics correctly
        perf bench: Set more defaults in the 'numa' suite
        perf bench: Fix segfault at the end of an 'all' execution
        perf bench: Update manpage to mention numa and futex
        perf probe: Use dwarf_getcfi_elf() instead of dwarf_getcfi()
        perf probe: Fix to handle errors in line_range searching
        perf probe: Fix --line option behavior
        perf tools: Pick up libdw without explicit LIBDW_DIR
        MAINTAINERS: Change e-mail to kernel.org one
        perf callchains: Disable unwind libraries when libelf isn't found
        tools lib traceevent: Do not call warning() directly
        tools lib traceevent: Print event name when show warning if possible
        perf top: Fix documentation of invalid -s option
        perf/x86: Enable DRAM RAPL support on Intel Haswell
      2a83dc7e
    • Linus Torvalds's avatar
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 17cf7db2
      Linus Torvalds authored
      Pull irq fix from Ingo Molnar:
       "ARM VIC (Vectored Irq Controller) irqchip driver fix"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip: vic: Properly chain the cascaded IRQs
      17cf7db2
    • Linus Torvalds's avatar
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d99d5917
      Linus Torvalds authored
      Pull locking fixes from Ingo Molnar:
       "liblockdep fixes and mutex debugging fixes"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/mutex: Fix debug_mutexes
        tools/liblockdep: Add proper versioning to the shared obj
        tools/liblockdep: Ignore asmlinkage and visible
      d99d5917
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Do not try to recreated toplevel set_ftrace_* files · 5d6c97c5
      Steven Rostedt (Red Hat) authored
      With the restructing of the function tracer working with instances, the
      "top level" buffer is a bit special, as the function tracing is mapped
      to the same set of filters. This is done by using a "global_ops" descriptor
      and having the "set_ftrace_filter" and "set_ftrace_notrace" map to it.
      
      When an instance is created, it creates the same files but its for the
      local instance and not the global_ops.
      
      The issues is that the local instance creation shares some code with
      the global instance one and we end up trying to create th top level
      "set_ftrace_*" files twice, and on boot up, we get an error like this:
      
       Could not create debugfs 'set_ftrace_filter' entry
       Could not create debugfs 'set_ftrace_notrace' entry
      
      The reason they failed to be created was because they were created
      twice, and the second time gives this error as you can not create the
      same file twice.
      Reported-by: default avatarBorislav Petkov <bp@alien8.de>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      5d6c97c5
    • Linus Torvalds's avatar
      Merge tag 'fbdev-fixes-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux · 498f9620
      Linus Torvalds authored
      Pull fbdev fixes from Tomi Valkeinen:
       - fix build errors for bf54x-lq043fb and imxfb
       - fbcon fix for da8xx-fb
       - omapdss fixes for hdmi audio, irq handling and fclk calculation
      
      * tag 'fbdev-fixes-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
        video: bf54x-lq043fb: fix build error
        OMAPDSS: Change struct reg_field to dispc_reg_field
        OMAPDSS: Take pixelclock unit change into account in hdmi_compute_acr()
        OMAPDSS: fix shared irq handlers
        video: imxfb: Select LCD_CLASS_DEVICE unconditionally
        OMAPDSS: fix rounding when calculating fclk rate
        video: da8xx-fb: Fix casting of info->pseudo_palette
      498f9620
    • Linus Torvalds's avatar
      Merge tag 'pinctrl-v3.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 5f63517c
      Linus Torvalds authored
      Pull pincontrol fixes from Linus Walleij:
       "A first set of pin control fixes for the v3.15 series:
      
         - Fix a couple of barnsjukdomar on the Rockchip driver.
      
         - Remove an idiotic debug print I happened to leave behind in the
           Nomadik driver.
      
         - Fixup the Qualcomm MSM interrupt handling code for the TLMM v2.
      
         - Three patches renaming the Broadcom Capri driver to BCM28155.  This
           has been falling between the chairs for some time due to some
           cross-tree synchronization misunderstandings, now I'm fed up with
           this and just rename it in this -rc1 phase"
      
      * tag 'pinctrl-v3.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: fix typo in bindings documentation
        Update bcm_defconfig with new pinctrl CONFIG
        pinctrl: Rename Broadcom Capri pinctrl driver
        pinctrl: msm: Correct interrupt code for TLMM v2
        pinctrl: nomadik: delete stray debug print
        pinctrl: rockchip: handle first half of rk3188-bank0 correctly
        pinctrl: rockchip: add return value to rockchip_set_mux
        pinctrl: rockchip: fix offset of mux registers for rk3188
      5f63517c
    • Brian Foster's avatar
      xfs: fix tmpfile/selinux deadlock and initialize security · 330033d6
      Brian Foster authored
      xfstests generic/004 reproduces an ilock deadlock using the tmpfile
      interface when selinux is enabled. This occurs because
      xfs_create_tmpfile() takes the ilock and then calls d_tmpfile(). The
      latter eventually calls into xfs_xattr_get() which attempts to get the
      lock again. E.g.:
      
      xfs_io          D ffffffff81c134c0  4096  3561   3560 0x00000080
      ffff8801176a1a68 0000000000000046 ffff8800b401b540 ffff8801176a1fd8
      00000000001d5800 00000000001d5800 ffff8800b401b540 ffff8800b401b540
      ffff8800b73a6bd0 fffffffeffffffff ffff8800b73a6bd8 ffff8800b5ddb480
      Call Trace:
      [<ffffffff8177f969>] schedule+0x29/0x70
      [<ffffffff81783a65>] rwsem_down_read_failed+0xc5/0x120
      [<ffffffffa05aa97f>] ? xfs_ilock_attr_map_shared+0x1f/0x50 [xfs]
      [<ffffffff813b3434>] call_rwsem_down_read_failed+0x14/0x30
      [<ffffffff810ed179>] ? down_read_nested+0x89/0xa0
      [<ffffffffa05aa7f2>] ? xfs_ilock+0x122/0x250 [xfs]
      [<ffffffffa05aa7f2>] xfs_ilock+0x122/0x250 [xfs]
      [<ffffffffa05aa97f>] xfs_ilock_attr_map_shared+0x1f/0x50 [xfs]
      [<ffffffffa05701d0>] xfs_attr_get+0x90/0xe0 [xfs]
      [<ffffffffa0565e07>] xfs_xattr_get+0x37/0x50 [xfs]
      [<ffffffff8124842f>] generic_getxattr+0x4f/0x70
      [<ffffffff8133fd9e>] inode_doinit_with_dentry+0x1ae/0x650
      [<ffffffff81340e0c>] selinux_d_instantiate+0x1c/0x20
      [<ffffffff813351bb>] security_d_instantiate+0x1b/0x30
      [<ffffffff81237db0>] d_instantiate+0x50/0x70
      [<ffffffff81237e85>] d_tmpfile+0xb5/0xc0
      [<ffffffffa05add02>] xfs_create_tmpfile+0x362/0x410 [xfs]
      [<ffffffffa0559ac8>] xfs_vn_tmpfile+0x18/0x20 [xfs]
      [<ffffffff81230388>] path_openat+0x228/0x6a0
      [<ffffffff810230f9>] ? sched_clock+0x9/0x10
      [<ffffffff8105a427>] ? kvm_clock_read+0x27/0x40
      [<ffffffff8124054f>] ? __alloc_fd+0xaf/0x1f0
      [<ffffffff8123101a>] do_filp_open+0x3a/0x90
      [<ffffffff817845e7>] ? _raw_spin_unlock+0x27/0x40
      [<ffffffff8124054f>] ? __alloc_fd+0xaf/0x1f0
      [<ffffffff8121e3ce>] do_sys_open+0x12e/0x210
      [<ffffffff8121e4ce>] SyS_open+0x1e/0x20
      [<ffffffff8178eda9>] system_call_fastpath+0x16/0x1b
      
      xfs_vn_tmpfile() also fails to initialize security on the newly created
      inode.
      
      Pull the d_tmpfile() call up into xfs_vn_tmpfile() after the transaction
      has been committed and the inode unlocked. Also, initialize security on
      the inode based on the parent directory provided via the tmpfile call.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      330033d6
    • Eric Sandeen's avatar
      xfs: fix buffer use after free on IO error · 8d6c1210
      Eric Sandeen authored
      When testing exhaustion of dm snapshots, the following appeared
      with CONFIG_DEBUG_OBJECTS_FREE enabled:
      
      ODEBUG: free active (active state 0) object type: work_struct hint: xfs_buf_iodone_work+0x0/0x1d0 [xfs]
      
      indicating that we'd freed a buffer which still had a pending reference,
      down this path:
      
      [  190.867975]  [<ffffffff8133e6fb>] debug_check_no_obj_freed+0x22b/0x270
      [  190.880820]  [<ffffffff811da1d0>] kmem_cache_free+0xd0/0x370
      [  190.892615]  [<ffffffffa02c5924>] xfs_buf_free+0xe4/0x210 [xfs]
      [  190.905629]  [<ffffffffa02c6167>] xfs_buf_rele+0xe7/0x270 [xfs]
      [  190.911770]  [<ffffffffa034c826>] xfs_trans_read_buf_map+0x7b6/0xac0 [xfs]
      
      At issue is the fact that if IO fails in xfs_buf_iorequest,
      we'll queue completion unconditionally, and then call
      xfs_buf_rele; but if IO failed, there are no IOs remaining,
      and xfs_buf_rele will free the bp while work is still queued.
      
      Fix this by not scheduling completion if the buffer has
      an error on it; run it immediately.  The rest is only comment
      changes.
      
      Thanks to dchinner for spotting the root cause.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      8d6c1210
    • Dave Chinner's avatar
      xfs: wrong error sign conversion during failed DIO writes · 07d5035a
      Dave Chinner authored
      We negate the error value being returned from a generic function
      incorrectly. The code path that it is running in returned negative
      errors, so there is no need to negate it to get the correct error
      signs here.
      
      This was uncovered by generic/019.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      07d5035a
    • Dave Chinner's avatar
      xfs: unmount does not wait for shutdown during unmount · 9c23eccc
      Dave Chinner authored
      And interesting situation can occur if a log IO error occurs during
      the unmount of a filesystem. The cases reported have the same
      signature - the update of the superblock counters fails due to a log
      write IO error:
      
      XFS (dm-16): xfs_do_force_shutdown(0x2) called from line 1170 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffa08a44a1
      XFS (dm-16): Log I/O Error Detected.  Shutting down filesystem
      XFS (dm-16): Unable to update superblock counters. Freespace may not be correct on next mount.
      XFS (dm-16): xfs_log_force: error 5 returned.
      XFS (¿-¿¿¿): Please umount the filesystem and rectify the problem(s)
      
      It can be seen that the last line of output contains a corrupt
      device name - this is because the log and xfs_mount structures have
      already been freed by the time this message is printed. A kernel
      oops closely follows.
      
      The issue is that the shutdown is occurring in a separate IO
      completion thread to the unmount. Once the shutdown processing has
      started and all the iclogs are marked with XLOG_STATE_IOERROR, the
      log shutdown code wakes anyone waiting on a log force so they can
      process the shutdown error. This wakes up the unmount code that
      is doing a synchronous transaction to update the superblock
      counters.
      
      The unmount path now sees all the iclogs are marked with
      XLOG_STATE_IOERROR and so never waits on them again, knowing that if
      it does, there will not be a wakeup trigger for it and we will hang
      the unmount if we do. Hence the unmount runs through all the
      remaining code and frees all the filesystem structures while the
      xlog_iodone() is still processing the shutdown. When the log
      shutdown processing completes, xfs_do_force_shutdown() emits the
      "Please umount the filesystem and rectify the problem(s)" message,
      and xlog_iodone() then aborts all the objects attached to the iclog.
      An iclog that has already been freed....
      
      The real issue here is that there is no serialisation point between
      the log IO and the unmount. We have serialisations points for log
      writes, log forces, reservations, etc, but we don't actually have
      any code that wakes for log IO to fully complete. We do that for all
      other types of object, so why not iclogbufs?
      
      Well, it turns out that we can easily do this. We've got xfs_buf
      handles, and that's what everyone else uses for IO serialisation.
      i.e. bp->b_sema. So, lets hold iclogbufs locked over IO, and only
      release the lock in xlog_iodone() when we are finished with the
      buffer. That way before we tear down the iclog, we can lock and
      unlock the buffer to ensure IO completion has finished completely
      before we tear it down.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Tested-by: default avatarMike Snitzer <snitzer@redhat.com>
      Tested-by: default avatarBob Mastors <bob.mastors@solidfire.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      9c23eccc
    • Dave Chinner's avatar
      xfs: collapse range is delalloc challenged · d39a2ced
      Dave Chinner authored
      FSX has been detecting data corruption after to collapse range
      calls. The key observation is that the offset of the last extent in
      the file was not being shifted, and hence when the file size was
      adjusted it was truncating away data because the extents handled
      been correctly shifted.
      
      Tracing indicated that before the collapse, the extent list looked
      like:
      
      ....
      ino 0x5788 state  idx 6 offset 26 block 195904 count 10 flag 0
      ino 0x5788 state  idx 7 offset 39 block 195917 count 35 flag 0
      ino 0x5788 state  idx 8 offset 86 block 195964 count 32 flag 0
      
      and after the shift of 2 blocks:
      
      ino 0x5788 state  idx 6 offset 24 block 195904 count 10 flag 0
      ino 0x5788 state  idx 7 offset 37 block 195917 count 35 flag 0
      ino 0x5788 state  idx 8 offset 86 block 195964 count 32 flag 0
      
      Note that the last extent did not change offset. After the changing
      of the file size:
      
      ino 0x5788 state  idx 6 offset 24 block 195904 count 10 flag 0
      ino 0x5788 state  idx 7 offset 37 block 195917 count 35 flag 0
      ino 0x5788 state  idx 8 offset 86 block 195964 count 30 flag 0
      
      You can see that the last extent had it's length truncated,
      indicating that we've lost data.
      
      The reason for this is that the xfs_bmap_shift_extents() loop uses
      XFS_IFORK_NEXTENTS() to determine how many extents are in the inode.
      This, unfortunately, doesn't take into account delayed allocation
      extents - it's a count of physically allocated extents - and hence
      when the file being collapsed has a delalloc extent like this one
      does prior to the range being collapsed:
      
      ....
      ino 0x5788 state  idx 4 offset 11 block 4503599627239429 count 1 flag 0
      ....
      
      it gets the count wrong and terminates the shift loop early.
      
      Fix it by using the in-memory extent array size that includes
      delayed allocation extents to determine the number of extents on the
      inode.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Tested-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      d39a2ced
    • Dave Chinner's avatar
      xfs: don't map ranges that span EOF for direct IO · 0e1f789d
      Dave Chinner authored
      Al Viro tracked down the problem that has caused generic/263 to fail
      on XFS since the test was introduced. If is caused by
      xfs_get_blocks() mapping a single extent that spans EOF without
      marking it as buffer-new() so that the direct IO code does not zero
      the tail of the block at the new EOF. This is a long standing bug
      that has been around for many, many years.
      
      Because xfs_get_blocks() starts the map before EOF, it can't set
      buffer_new(), because that causes he direct IO code to also zero
      unaligned sectors at the head of the IO. This would overwrite valid
      data with zeros, and hence we cannot validly return a single extent
      that spans EOF to direct IO.
      
      Fix this by detecting a mapping that spans EOF and truncate it down
      to EOF. This results in the the direct IO code doing the right thing
      for unaligned data blocks before EOF, and then returning to get
      another mapping for the region beyond EOF which XFS treats correctly
      by setting buffer_new() on it. This makes direct Io behave correctly
      w.r.t. tail block zeroing beyond EOF, and fsx is happy about that.
      
      Again, thanks to Al Viro for finding what I couldn't.
      
      [ dchinner: Fix for __divdi3 build error:
      Reported-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Tested-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: default avatarMark Tinguely <tinguely@sgi.com>
      Reviewed-by: default avatarEric Sandeen <sandeen@redhat.com>
      ]
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Tested-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      0e1f789d
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 0f689a33
      Linus Torvalds authored
      Pull s390 patches from Martin Schwidefsky:
       "An update to the oops output with additional information about the
        crash.  The renameat2 system call is enabled.  Two patches in regard
        to the PTR_ERR_OR_ZERO cleanup.  And a bunch of bug fixes"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/sclp_cmd: replace PTR_RET with PTR_ERR_OR_ZERO
        s390/sclp: replace PTR_RET with PTR_ERR_OR_ZERO
        s390/sclp_vt220: Fix kernel panic due to early terminal input
        s390/compat: fix typo
        s390/uaccess: fix possible register corruption in strnlen_user_srst()
        s390: add 31 bit warning message
        s390: wire up sys_renameat2
        s390: show_registers() should not map user space addresses to kernel symbols
        s390/mm: print control registers and page table walk on crash
        s390/smp: fix smp_stop_cpu() for !CONFIG_SMP
        s390: fix control register update
      0f689a33
    • Linus Torvalds's avatar
      Merge tag 'please-pull-ia64-erratum' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux · 7d38cc02
      Linus Torvalds authored
      Pull itanium erratum fix from Tony Luck:
       "Small workaround for a rare, but annoying, erratum #237"
      
      * tag 'please-pull-ia64-erratum' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
        [IA64] Change default PSR.ac from '1' to '0' (Fix erratum #237)
      7d38cc02
    • Tony Luck's avatar
      [IA64] Change default PSR.ac from '1' to '0' (Fix erratum #237) · c0b5a64d
      Tony Luck authored
      April 2014 Itanium processor specification update:
      
      http://www.intel.com/content/www/us/en/processors/itanium/itanium-specification-update.html
      
      describes this erratum:
      
      =========================================================================
      237. Under a complex set of conditions, store to load forwarding for a
      sub 8-byte load may complete incorrectly
      
      Problem: A load instruction may complete incorrectly when a code sequence
      using 4-byte or smaller load and store operations to the same address
      is executed in combination with specific timing of all the following
      concurrent conditions: store to load forwarding, alignment checking
      enabled, a mis-predicted branch, and complex cache utilization activity.
      
      Implication: The affected sub 8-byte instruction may complete
      incorrectly resulting in unpredictable system behavior. There is an
      extremely low probability of exposure due to the significant number of
      complex microarchitectural concurrent conditions required to encounter
      the erratum.
      
      Workaround: Set PSR.ac = 0 to completely avoid the erratum. Disabling
      Hyper-Threading will significantly reduce exposure to the conditions
      that contribute to encountering the erratum.
      
      Status: See the Summary Table of Changes for the affected steppings.
      =========================================================================
      
      [Table of changes essentially lists all models from McKinley to Tukwila]
      
      The PSR.ac bit controls whether the processor will always generate
      an unaligned reference trap (0x5a00) for a misaligned data access
      (when PSR.ac=1) or if it will let the access succeed when running
      on a cpu that implements logic to handle some unaligned accesses.
      
      Way back in 2008 in commit b704882e
        [IA64] Rationalize kernel mode alignment checking
      we made the decision to always enable strict checking. We were
      already doing so in trap/interrupt context because the common
      preamble code set this bit - but the rest of supervisor code
      (and by inheritance user code) ran with PSR.ac=0.
      
      We now reverse that decision and set PSR.ac=0 everywhere in the
      kernel (also inherited by user processes). This will avoid the
      erratum using the method described in the Itanium specification
      update.  Net effect for users is that the processor will handle
      unaligned access when it can (typically with a tiny performance
      bubble in the pipeline ... but much less invasive than taking a
      trap and having the OS perform the access).
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      c0b5a64d
    • Ingo Molnar's avatar
      x86: Remove the PCI reboot method from the default chain · 5be44a6f
      Ingo Molnar authored
      Steve reported a reboot hang and bisected it back to this commit:
      
        a4f1987e x86, reboot: Add EFI and CF9 reboot methods into the default list
      
      He heroically tested all reboot methods and found the following:
      
        reboot=t       # triple fault                  ok
        reboot=k       # keyboard ctrl                 FAIL
        reboot=b       # BIOS                          ok
        reboot=a       # ACPI                          FAIL
        reboot=e       # EFI                           FAIL   [system has no EFI]
        reboot=p       # PCI 0xcf9                     FAIL
      
      And I think it's pretty obvious that we should only try PCI 0xcf9 as a
      last resort - if at all.
      
      The other observation is that (on this box) we should never try
      the PCI reboot method, but close with either the 'triple fault'
      or the 'BIOS' (terminal!) reboot methods.
      
      Thirdly, CF9_COND is a total misnomer - it should be something like
      CF9_SAFE or CF9_CAREFUL, and 'CF9' should be 'CF9_FORCE' ...
      
      So this patch fixes the worst problems:
      
       - it orders the actual reboot logic to follow the reboot ordering
         pattern - it was in a pretty random order before for no good
         reason.
      
       - it fixes the CF9 misnomers and uses BOOT_CF9_FORCE and
         BOOT_CF9_SAFE flags to make the code more obvious.
      
       - it tries the BIOS reboot method before the PCI reboot method.
         (Since 'BIOS' is a terminal reboot method resulting in a hang
          if it does not work, this is essentially equivalent to removing
          the PCI reboot method from the default reboot chain.)
      
       - just for the miraculous possibility of terminal (resulting
         in hang) reboot methods of triple fault or BIOS returning
         without having done their job, there's an ordering between
         them as well.
      Reported-and-bisected-and-tested-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Li Aubrey <aubrey.li@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Link: http://lkml.kernel.org/r/20140404064120.GB11877@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5be44a6f
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 10ec34fc
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix BPF filter validation of netlink attribute accesses, from
          Mathias Kruase.
      
       2) Netfilter conntrack generation seqcount not initialized properly,
          from Andrey Vagin.
      
       3) Fix comparison mask computation on big-endian in nft_cmp_fast(),
          from Patrick McHardy.
      
       4) Properly limit MTU over ipv6, from Eric Dumazet.
      
       5) Fix seccomp system call argument population on 32-bit, from Daniel
          Borkmann.
      
       6) skb_network_protocol() should not use hard-coded ETH_HLEN, instead
          skb->mac_len needs to be used.  From Vlad Yasevich.
      
       7) We have several cases of using socket based communications to
          implement a tunnel.  For example, some tunnels are encapsulations
          over UDP so we use an internal kernel UDP socket to do the
          transmits.
      
          These tunnels should behave just like other software devices and
          pass the packets on down to the next layer.
      
          Most importantly we want the top-level socket (eg TCP) that created
          the traffic to be charged for the SKB memory.
      
          However, once you get into the IP output path, we have code that
          assumed that whatever was attached to skb->sk is an IP socket.
      
          To keep the top-level socket being charged for the SKB memory,
          whilst satisfying the needs of the IP output path, we now pass in an
          explicit 'sk' argument.
      
          From Eric Dumazet.
      
       8) ping_init_sock() leaks group info, from Xiaoming Wang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
        cxgb4: use the correct max size for firmware flash
        qlcnic: Fix MSI-X initialization code
        ip6_gre: don't allow to remove the fb_tunnel_dev
        ipv4: add a sock pointer to dst->output() path.
        ipv4: add a sock pointer to ip_queue_xmit()
        driver/net: cosa driver uses udelay incorrectly
        at86rf230: fix __at86rf230_read_subreg function
        at86rf230: remove check if AVDD settled
        net: cadence: Add architecture dependencies
        net: Start with correct mac_len in skb_network_protocol
        Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer"
        cxgb4: Save the correct mac addr for hw-loopback connections in the L2T
        net: filter: seccomp: fix wrong decoding of BPF_S_ANC_SECCOMP_LD_W
        seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF
        qlcnic: Do not disable SR-IOV when VFs are assigned to VMs
        qlcnic: Fix QLogic application/driver interface for virtual NIC configuration
        qlcnic: Fix PVID configuration on eSwitch port.
        qlcnic: Fix max ring count calculation
        qlcnic: Fix to send INIT_NIC_FUNC as first mailbox.
        qlcnic: Fix panic due to uninitialzed delayed_work struct in use.
        ...
      10ec34fc
  4. 15 Apr, 2014 3 commits