1. 06 Feb, 2014 11 commits
  2. 29 Jan, 2014 10 commits
    • Linux 3.4.78 · a1322407
      Greg Kroah-Hartman authored
      a1322407
    • md/raid10: fix two bugs in handling of known-bad-blocks. · 7e34f43d
      NeilBrown authored
      commit b50c259e upstream.
      
      If we discover a bad block when reading we split the request and
      potentially read some of it from a different device.
      
      The code path of this has two bugs in RAID10.
      1/ we get a spin_lock with _irq, but unlock without _irq!!
      2/ The calculation of 'sectors_handled' is wrong, as can be clearly
         seen by comparison with raid1.c
      
      This leads to at least 2 warnings and a probable crash if a RAID10
      ever had known bad blocks.
      
      Fixes: 856e08e2
      Reported-by: Damian Nowak <spam@nowaker.net>
      URL: https://bugzilla.kernel.org/show_bug.cgi?id=68181
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7e34f43d
    • md/raid10: fix bug when raid10 recovery fails to recover a block. · 511375d1
      NeilBrown authored
      commit e8b84915 upstream.
      
      commit e875ecea
          md/raid10 record bad blocks as needed during recovery.
      
      added code to the "cannot recover this block" path to record a bad
      block rather than fail the whole recovery.
      Unfortunately this new case was placed *after* r10bio was freed rather
      than *before*, yet it still uses r10bio.
      This will crash with a null dereference.
      
      So move the freeing of r10bio down where it is safe.
      
      Fixes: e875ecea
      Reported-by: Damian Nowak <spam@nowaker.net>
      URL: https://bugzilla.kernel.org/show_bug.cgi?id=68181
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      511375d1
    • nilfs2: fix segctor bug that causes file system corruption · ddcb318f
      Andreas Rohner authored
      commit 70f2fe3a upstream.
      
      There is a bug in the function nilfs_segctor_collect, which results in
      active data being written to a segment that is marked as clean.  It is
      possible that this segment is selected for a later segment
      construction, whereby the old data is overwritten.
      
      The problem shows itself with the following kernel log message:
      
        nilfs_sufile_do_cancel_free: segment 6533 must be clean
      
      Usually a few hours later the file system gets corrupted:
      
        NILFS: bad btree node (blocknr=8748107): level = 0, flags = 0x0, nchildren = 0
        NILFS error (device sdc1): nilfs_bmap_last_key: broken bmap (inode number=114660)
      
      The issue can be reproduced with a file system that is nearly full and
      with the cleaner running, while some IO intensive task is running,
      although it is quite hard to reproduce.
      
      This is what happens:
      
       1. The cleaner starts the segment construction
       2. nilfs_segctor_collect is called
       3. sc_stage is on NILFS_ST_SUFILE and segments are freed
       4. sc_stage is on NILFS_ST_DAT and the current segment is full
       5. nilfs_segctor_extend_segments is called, which
          allocates a new segment
       6. The new segment is one of the segments freed in step 3
       7. nilfs_sufile_cancel_freev is called and produces an error message
       8. Loop around and the collection starts again
       9. sc_stage is on NILFS_ST_SUFILE and segments are freed
          including the newly allocated segment, which will contain active
          data and can be allocated at a later time
      10. A few hours later another segment construction allocates the
          segment and causes file system corruption
      
      This can be prevented by simply reordering the statements.  If
      nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments
      the freed segments are marked as dirty and cannot be allocated any more.
      Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
      Reviewed-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ddcb318f
    • SELinux: Fix possible NULL pointer dereference in selinux_inode_permission() · 9e74d93d
      Steven Rostedt authored
      commit 3dc91d43 upstream.
      
      While running stress tests on adding and deleting ftrace instances I hit
      this bug:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
        IP: selinux_inode_permission+0x85/0x160
        PGD 63681067 PUD 7ddbe067 PMD 0
        Oops: 0000 [#1] PREEMPT
        CPU: 0 PID: 5634 Comm: ftrace-test-mki Not tainted 3.13.0-rc4-test-00033-gd2a6dde-dirty #20
        Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
        task: ffff880078375800 ti: ffff88007ddb0000 task.ti: ffff88007ddb0000
        RIP: 0010:[<ffffffff812d8bc5>]  [<ffffffff812d8bc5>] selinux_inode_permission+0x85/0x160
        RSP: 0018:ffff88007ddb1c48  EFLAGS: 00010246
        RAX: 0000000000000000 RBX: 0000000000800000 RCX: ffff88006dd43840
        RDX: 0000000000000001 RSI: 0000000000000081 RDI: ffff88006ee46000
        RBP: ffff88007ddb1c88 R08: 0000000000000000 R09: ffff88007ddb1c54
        R10: 6e6576652f6f6f66 R11: 0000000000000003 R12: 0000000000000000
        R13: 0000000000000081 R14: ffff88006ee46000 R15: 0000000000000000
        FS:  00007f217b5b6700(0000) GS:ffffffff81e21000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000020 CR3: 000000006a0fe000 CR4: 00000000000007f0
        Call Trace:
          security_inode_permission+0x1c/0x30
          __inode_permission+0x41/0xa0
          inode_permission+0x18/0x50
          link_path_walk+0x66/0x920
          path_openat+0xa6/0x6c0
          do_filp_open+0x43/0xa0
          do_sys_open+0x146/0x240
          SyS_open+0x1e/0x20
          system_call_fastpath+0x16/0x1b
        Code: 84 a1 00 00 00 81 e3 00 20 00 00 89 d8 83 c8 02 40 f6 c6 04 0f 45 d8 40 f6 c6 08 74 71 80 cf 02 49 8b 46 38 4c 8d 4d cc 45 31 c0 <0f> b7 50 20 8b 70 1c 48 8b 41 70 89 d9 8b 78 04 e8 36 cf ff ff
        RIP  selinux_inode_permission+0x85/0x160
        CR2: 0000000000000020
      
      Investigating, I found that the inode->i_security was NULL, and the
      dereference of it caused the oops.
      
      in selinux_inode_permission():
      
      	isec = inode->i_security;
      
      	rc = avc_has_perm_noaudit(sid, isec->sid, isec->sclass, perms, 0, &avd);
      
      Note, the crash came from stressing the deletion and reading of debugfs
      files.  I was not able to recreate this via normal files.  But I'm not
      sure they are safe.  It may just be that the race window is much harder
      to hit.
      
      What seems to have happened (and what I have traced), is the file is
      being opened at the same time the file or directory is being deleted.
      As the dentry and inode locks are not held during the path walk, nor are
      the inode's ref counts being incremented, there is nothing saving these
      structures from being discarded except for an rcu_read_lock().
      
      The rcu_read_lock() protects against freeing of the inode, but it does
      not protect freeing of the inode_security_struct.  Now if the freeing of
      the i_security happens with a call_rcu(), and the i_security field of
      the inode is not changed (it gets freed as the inode gets freed) then
      there will be no issue here.  (Linus Torvalds suggested not setting the
      field to NULL such that we do not need to check if it is NULL in the
      permission check).
      
      Note, this is a hack, but it fixes the problem at hand.  A real fix is
      to restructure the destroy_inode() to call all the destructor handlers
      from the RCU callback.  But that is a major job to do, and requires a
      lot of work.  For now, we just band-aid this bug with this fix (it
      works), and work on a more maintainable solution in the future.
      
      Link: http://lkml.kernel.org/r/20140109101932.0508dec7@gandalf.local.home
      Link: http://lkml.kernel.org/r/20140109182756.17abaaa8@gandalf.local.home
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9e74d93d
    • hwmon: (coretemp) Fix truncated name of alarm attributes · e34cdde4
      Jean Delvare authored
      commit 3f9aec76 upstream.
      
      When the core number exceeds 9, the size of the buffer storing the
      alarm attribute name is insufficient and the attribute name is
      truncated. This causes libsensors to skip these attributes as the
      truncated name is not recognized.
      Reported-by: Andreas Hollmann <hollmann@in.tum.de>
      Signed-off-by: Jean Delvare <khali@linux-fr.org>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e34cdde4
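The truncation described in the coretemp entry above is easy to reproduce in userspace with `snprintf()`. The following is a minimal sketch, not the driver's code: the helper `format_alarm_name`, the `"temp%d_crit_alarm"` format string, and the buffer sizes used in the usage note are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format an alarm attribute name into buf and report
 * whether it fit. snprintf() returns the length the full string would have
 * had, so a return value >= size means the name was truncated.
 */
static int format_alarm_name(char *buf, size_t size, int index)
{
	int needed;

	assert(buf != NULL && size > 0);
	needed = snprintf(buf, size, "temp%d_crit_alarm", index);
	return needed >= 0 && (size_t)needed < size;
}
```

With a 17-byte buffer, a single-digit index fits exactly, but a two-digit index loses its last character, which is the kind of name a consumer such as libsensors would no longer recognize. A slightly larger buffer avoids the problem.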
    • mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully · 9a22404c
      Jianguo Wu authored
      commit a49ecbcd upstream.
      
      After a successful hugetlb page migration by soft offline, the source
      page will either be freed into hugepage_freelists or buddy (over-commit
      page).  If the page is in buddy, page_hstate(page) will be NULL.  It will
      hit a NULL pointer dereference in dequeue_hwpoisoned_huge_page().
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
        IP: [<ffffffff81163761>] dequeue_hwpoisoned_huge_page+0x131/0x1d0
        PGD c23762067 PUD c24be2067 PMD 0
        Oops: 0000 [#1] SMP
      
      So check PageHuge(page) after migrate_pages() returns successfully.
      
      [wujg: backport to 3.4:
       - adjust context
       - s/num_poisoned_pages/mce_bad_pages/]
      Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
      Tested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      9a22404c
    • perf/x86/amd/ibs: Fix waking up from S3 for AMD family 10h · 7e99f216
      Robert Richter authored
      commit bee09ed9 upstream.
      
      On AMD family 10h we see the following error messages while waking up from
      S3 for all non-boot CPUs, leading to a failed IBS initialization:
      
       Enabling non-boot CPUs ...
       smpboot: Booting Node 0 Processor 1 APIC 0x1
       [Firmware Bug]: cpu 1, try to use APIC500 (LVT offset 0) for vector 0x400, but the register is already in use for vector 0xf9 on another cpu
       perf: IBS APIC setup failed on cpu #1
       process: Switch to broadcast mode on CPU1
       CPU1 is up
       ...
       ACPI: Waking up from system sleep state S3
      
      The reason for this is that during suspend the LVT offset for the IBS
      vector gets lost and needs to be reinitialized while resuming.
      
      The offset is read from the IBSCTL msr. On family 10h the offset needs
      to be 1 as offset 0 is used for the MCE threshold interrupt, but
      firmware assigns it for IBS to 0 too. The kernel needs to reprogram
      the vector. The msr is a readonly node msr, but a new value can be
      written via pci config space access. The reinitialization is
      implemented for family 10h in setup_ibs_ctl() which is forced during
      IBS setup.
      
      This patch fixes IBS setup after waking up from S3 by adding
      resume/suspend hooks for the boot cpu which do the offset
      reinitialization.
      
      Marking it as stable to let distros pick up this fix.
      Signed-off-by: Robert Richter <rric@kernel.org>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1389797849-5565-1-git-send-email-rric.net@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7e99f216
    • staging: comedi: 8255_pci: fix for newer PCI-DIO48H · 6013932c
      Ian Abbott authored
      commit 0283f7a1 upstream.
      
      At some point, Measurement Computing / ComputerBoards redesigned the
      PCI-DIO48H to use a PLX PCI interface chip instead of an AMCC chip.
      This meant they had to put their hardware registers in the PCI BAR 2
      region instead of PCI BAR 1.  Unfortunately, they kept the same PCI
      device ID for the new design.  This means the driver recognizes the
      newer cards, but doesn't work (and is likely to screw up the local
      configuration registers of the PLX chip) because it's using the wrong
      region.
      
      Since the PCI subvendor and subdevice IDs were both zero on the old
      design, but are the same as the vendor and device on the new design, we
      can tell the old design and new design apart easily enough.  Split the
      existing entry for the PCI-DIO48H in `pci_8255_boards[]` into two new
      entries, referenced by different entries in the PCI device ID table
      `pci_8255_pci_table[]`.  Use the same board name for both entries.
      Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
      Acked-by: H Hartley Sweeten <hsweeten@visionengravers.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6013932c
    • KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368) · 777f8f3b
      Andy Honig authored
      commit fda4e2e8 upstream.
      
      In kvm_lapic_sync_from_vapic and kvm_lapic_sync_to_vapic there is the
      potential to corrupt kernel memory if userspace provides an address that
      is at the end of a page.  This patch converts those functions to use
      kvm_write_guest_cached and kvm_read_guest_cached.  It also checks the
      vapic_address specified by userspace during ioctl processing and returns
      an error to userspace if the address is not a valid GPA.
      
      This is generally not guest triggerable, because the required write is
      done by firmware that runs before the guest.  Also, it only affects AMD
      processors and oldish Intel that do not have the FlexPriority feature
      (unless you disable FlexPriority, of course; then newer processors are
      also affected).
      
      Fixes: b93463aa ('KVM: Accelerated apic support')
      Reported-by: Andrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Andrew Honig <ahonig@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      [ lizf: backported to 3.4: based on Paolo's backport hints for <3.10 ]
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      777f8f3b
  3. 15 Jan, 2014 19 commits
    • Linux 3.4.77 · 4b9c8e9b
      Greg Kroah-Hartman authored
      4b9c8e9b
    • sched: Guarantee new group-entities always have weight · 9af6b695
      Paul Turner authored
      commit 0ac9b1c2 upstream.
      
      Currently, group entity load-weights are initialized to zero. This
      admits some races with respect to the first time they are re-weighted in
      early use. ( Let g[x] denote the se for "g" on cpu "x". )
      
      Suppose that we have root->a and that a enters a throttled state,
      immediately followed by a[0]->t1 (the only task running on cpu[0])
      blocking:
      
        put_prev_task(group_cfs_rq(a[0]), t1)
        put_prev_entity(..., t1)
        check_cfs_rq_runtime(group_cfs_rq(a[0]))
        throttle_cfs_rq(group_cfs_rq(a[0]))
      
      Then, before unthrottling occurs, let a[0]->b[0]->t2 wake for the first
      time:
      
        enqueue_task_fair(rq[0], t2)
        enqueue_entity(group_cfs_rq(b[0]), t2)
        enqueue_entity_load_avg(group_cfs_rq(b[0]), t2)
        account_entity_enqueue(group_cfs_rq(b[0]), t2)
        update_cfs_shares(group_cfs_rq(b[0]))
        < skipped because b is part of a throttled hierarchy >
        enqueue_entity(group_cfs_rq(a[0]), b[0])
        ...
      
      We now have b[0] enqueued, yet group_cfs_rq(a[0])->load.weight == 0
      which violates invariants in several code-paths. Eliminate the
      possibility of this by initializing group entity weight.
      Signed-off-by: Paul Turner <pjt@google.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20131016181627.22647.47543.stgit@sword-of-the-dawn.mtv.corp.google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9af6b695
    • sched: Fix hrtimer_cancel()/rq->lock deadlock · 03d35a39
      Ben Segall authored
      commit 927b54fc upstream.
      
      __start_cfs_bandwidth calls hrtimer_cancel while holding rq->lock,
      waiting for the hrtimer to finish. However, if sched_cfs_period_timer
      runs for another loop iteration, the hrtimer can attempt to take
      rq->lock, resulting in deadlock.
      
      Fix this by ensuring that cfs_b->timer_active is cleared only if the
      _latest_ call to do_sched_cfs_period_timer is returning as idle. Then
      __start_cfs_bandwidth can just call hrtimer_try_to_cancel and wait for
      that to succeed or timer_active == 1.
      Signed-off-by: Ben Segall <bsegall@google.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: pjt@google.com
      Link: http://lkml.kernel.org/r/20131016181622.22647.16643.stgit@sword-of-the-dawn.mtv.corp.google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      03d35a39
    • sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining · 9b318052
      Ben Segall authored
      commit db06e78c upstream.
      
      hrtimer_expires_remaining does not take internal hrtimer locks and thus
      must be guarded against concurrent __hrtimer_start_range_ns (but
      returning HRTIMER_RESTART is safe). Use cfs_b->lock to make it safe.
      Signed-off-by: Ben Segall <bsegall@google.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: pjt@google.com
      Link: http://lkml.kernel.org/r/20131016181617.22647.73829.stgit@sword-of-the-dawn.mtv.corp.google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9b318052
    • sched: Fix race on toggling cfs_bandwidth_used · 16e7480c
      Ben Segall authored
      commit 1ee14e6c upstream.
      
      When we transition cfs_bandwidth_used to false, any currently
      throttled groups will incorrectly return false from cfs_rq_throttled.
      While tg_set_cfs_bandwidth will unthrottle them eventually, currently
      running code (including at least dequeue_task_fair and
      distribute_cfs_runtime) will cause errors.
      
      Fix this by turning off cfs_bandwidth_used only after unthrottling all
      cfs_rqs.
      
      Tested: toggle bandwidth back and forth on a loaded cgroup. Caused
      crashes in minutes without the patch, hasn't crashed with it.
      Signed-off-by: Ben Segall <bsegall@google.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: pjt@google.com
      Link: http://lkml.kernel.org/r/20131016181611.22647.80365.stgit@sword-of-the-dawn.mtv.corp.google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      16e7480c
    • x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround · a63f31f1
      Linus Torvalds authored
      commit 26bef131 upstream.
      
      Before we do an EMMS in the AMD FXSAVE information leak workaround we
      need to clear any pending exceptions, otherwise we trap with a
      floating-point exception inside this code.
      Reported-by: halfdog <me@halfdog.net>
      Tested-by: Borislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/CA%2B55aFxQnY_PCG_n4=0w-VG=YLXL-yr7oMxyy0WU2gCBAf3ydg@mail.gmail.com
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a63f31f1
    • ARM: shmobile: mackerel: Fix coherent DMA mask · 925ece07
      Laurent Pinchart authored
      commit b6328a6b upstream.
      
      Commit 4dcfa600 ("ARM: DMA-API: better
      handing of DMA masks for coherent allocations") added an additional
      check to the coherent DMA mask that results in an error when the mask is
      larger than what dma_addr_t can address.
      
      Set the LCDC coherent DMA mask to DMA_BIT_MASK(32) instead of ~0 to fix
      the problem.
      Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
      Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      925ece07
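To see why `~0` and `DMA_BIT_MASK(32)` behave differently in the mackerel entry above, the following userspace sketch redefines a macro with the same shape as the kernel's `DMA_BIT_MASK()` and applies a simplified range check. `DEMO_DMA_BIT_MASK` and `mask_fits` are names invented for this example, and the check is an assumption about the kind of test involved, not the kernel's actual validation code.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK(); redefined here for userspace. */
#define DEMO_DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Hypothetical check: a mask is acceptable only if every address it
 * allows can be represented in addr_bits bits.
 */
static int mask_fits(uint64_t mask, unsigned int addr_bits)
{
	assert(addr_bits >= 1 && addr_bits <= 64);
	return mask <= DEMO_DMA_BIT_MASK(addr_bits);
}
```

An all-ones 64-bit mask fails this check when addresses are only 32 bits wide, while a 32-bit mask passes, which mirrors the reasoning behind switching the LCDC mask to `DMA_BIT_MASK(32)`.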
    • ARM: fix "bad mode in ... handler" message for undefined instructions · 10252aa2
      Russell King authored
      commit 29c350bf upstream.
      
      The array was missing the final entry for the undefined instruction
      exception handler; this commit adds it.
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      10252aa2
    • bridge: use spin_lock_bh() in br_multicast_set_hash_max · 09e333b0
      Curt Brune authored
      [ Upstream commit fe0d692b ]
      
      br_multicast_set_hash_max() is called from process context in
      net/bridge/br_sysfs_br.c by the sysfs store_hash_max() function.
      
      br_multicast_set_hash_max() calls spin_lock(&br->multicast_lock),
      which can deadlock the CPU if a softirq that also tries to take the
      same lock interrupts br_multicast_set_hash_max() while the lock is
      held.  This can happen quite easily when any of the bridge multicast
      timers expire, which try to take the same lock.
      
      The fix here is to use spin_lock_bh(), preventing other softirqs from
      executing on this CPU.
      
      Steps to reproduce:
      
      1. Create a bridge with several interfaces (I used 4).
      2. Set the "multicast query interval" to a low number, like 2.
      3. Enable the bridge as a multicast querier.
      4. Repeatedly set the bridge hash_max parameter via sysfs.
      
        # brctl addbr br0
        # brctl addif br0 eth1 eth2 eth3 eth4
        # brctl setmcqi br0 2
        # brctl setmcquerier br0 1
      
        # while true ; do echo 4096 > /sys/class/net/br0/bridge/hash_max; done
      Signed-off-by: Curt Brune <curt@cumulusnetworks.com>
      Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      09e333b0
    • net: llc: fix use after free in llc_ui_recvmsg · 5a5bf44e
      Daniel Borkmann authored
      [ Upstream commit 4d231b76 ]
      
      While commit 30a584d9 fixes datagram interface in LLC, a use
      after free bug has been introduced for SOCK_STREAM sockets that do
      not make use of MSG_PEEK.
      
      The flow is as follows ...
      
        if (!(flags & MSG_PEEK)) {
          ...
          sk_eat_skb(sk, skb, false);
          ...
        }
        ...
        if (used + offset < skb->len)
          continue;
      
      ... where sk_eat_skb() calls __kfree_skb(). Therefore, cache
      original length and work on skb_len to check partial reads.
      
      Fixes: 30a584d9 ("[LLX]: SOCK_DGRAM interface fixes")
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5a5bf44e
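The "cache the length before the buffer can be freed" idea from the llc entry above can be sketched in userspace C. This is an illustrative model only: `struct fake_skb`, `eat_skb`, `make_skb`, and `consume` are hypothetical stand-ins, not the kernel's `sk_buff` API, and `free()` stands in for the release that `sk_eat_skb()` performs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a socket buffer; not the kernel's sk_buff. */
struct fake_skb {
	size_t len;
};

/* Stand-in for sk_eat_skb(): consumes and frees the buffer. */
static void eat_skb(struct fake_skb *skb)
{
	free(skb);
}

/* Allocate a buffer with the given length for the demonstration. */
static struct fake_skb *make_skb(size_t len)
{
	struct fake_skb *skb = malloc(sizeof(*skb));

	assert(skb != NULL);
	skb->len = len;
	return skb;
}

/*
 * Returns 1 if the read was partial (more data was left in the buffer),
 * 0 if the buffer was fully consumed. The length is cached before the
 * buffer may be freed, so the final comparison never touches freed memory.
 */
static int consume(struct fake_skb *skb, size_t used, size_t offset, int peek)
{
	size_t skb_len;

	assert(skb != NULL);
	skb_len = skb->len;	/* cache while skb is still valid */

	if (!peek)
		eat_skb(skb);	/* skb must not be dereferenced after this */

	return used + offset < skb_len;
}
```

In the non-peek case the caller gives up ownership of the buffer, so only the cached `skb_len` is safe to consult afterwards. In the peek case the buffer stays alive and the caller remains responsible for freeing it.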
    • vlan: Fix header ops passthru when doing TX VLAN offload. · 86dc6b93
      David S. Miller authored
      [ Upstream commit 2205369a ]
      
      When the vlan code detects that the real device can do TX VLAN offloads
      in hardware, it tries to arrange for the real device's header_ops to
      be invoked directly.
      
      But it does so illegally, by simply hooking the real device's
      header_ops up to the VLAN device.
      
      This doesn't work because we will end up invoking a set of header_ops
      routines which expect a device type which matches the real device, but
      will see a VLAN device instead.
      
      Fix this by providing a pass-thru set of header_ops which will arrange
      to pass the proper real device instead.
      
      To facilitate this add a dev_rebuild_header().  There are
      implementations which provide a ->cache and ->create but not a
      ->rebuild (e.g. PLIP).  So we need a helper function just like
      dev_hard_header() to avoid crashes.
      
      Use this helper in the one existing place where the
      header_ops->rebuild was being invoked, the neighbour code.
      
      With lots of help from Florian Westphal.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      86dc6b93
    • net: rose: restore old recvmsg behavior · e25027c9
      Florian Westphal authored
      [ Upstream commit f81152e3 ]
      
      The recvmsg handler in net/rose/af_rose.c performs a size check on ->msg_namelen.
      
      After commit f3d33426
      (net: rework recvmsg handler msg_name and msg_namelen logic), we now
      always take the else branch due to namelen being initialized to 0.
      
      Digging in netdev-vger-cvs git repo shows that msg_namelen was
      initialized with a fixed-size since at least 1995, so the else branch
      was never taken.
      
      Compile tested only.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e25027c9
    • rds: prevent dereference of a NULL device · 7918313d
      Sasha Levin authored
      [ Upstream commit c2349758 ]
      
      Binding might result in a NULL device, which is dereferenced
      causing this BUG:
      
      [ 1317.260548] BUG: unable to handle kernel NULL pointer dereference at 0000000000000974
      [ 1317.261847] IP: [<ffffffff84225f52>] rds_ib_laddr_check+0x82/0x110
      [ 1317.263315] PGD 418bcb067 PUD 3ceb21067 PMD 0
      [ 1317.263502] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      [ 1317.264179] Dumping ftrace buffer:
      [ 1317.264774]    (ftrace buffer empty)
      [ 1317.265220] Modules linked in:
      [ 1317.265824] CPU: 4 PID: 836 Comm: trinity-child46 Tainted: G        W    3.13.0-rc4-next-20131218-sasha-00013-g2cebb9b-dirty #4159
      [ 1317.267415] task: ffff8803ddf33000 ti: ffff8803cd31a000 task.ti: ffff8803cd31a000
      [ 1317.268399] RIP: 0010:[<ffffffff84225f52>]  [<ffffffff84225f52>] rds_ib_laddr_check+0x82/0x110
      [ 1317.269670] RSP: 0000:ffff8803cd31bdf8  EFLAGS: 00010246
      [ 1317.270230] RAX: 0000000000000000 RBX: ffff88020b0dd388 RCX: 0000000000000000
      [ 1317.270230] RDX: ffffffff8439822e RSI: 00000000000c000a RDI: 0000000000000286
      [ 1317.270230] RBP: ffff8803cd31be38 R08: 0000000000000000 R09: 0000000000000000
      [ 1317.270230] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
      [ 1317.270230] R13: 0000000054086700 R14: 0000000000a25de0 R15: 0000000000000031
      [ 1317.270230] FS:  00007ff40251d700(0000) GS:ffff88022e200000(0000) knlGS:0000000000000000
      [ 1317.270230] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 1317.270230] CR2: 0000000000000974 CR3: 00000003cd478000 CR4: 00000000000006e0
      [ 1317.270230] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1317.270230] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000090602
      [ 1317.270230] Stack:
      [ 1317.270230]  0000000054086700 5408670000a25de0 5408670000000002 0000000000000000
      [ 1317.270230]  ffffffff84223542 00000000ea54c767 0000000000000000 ffffffff86d26160
      [ 1317.270230]  ffff8803cd31be68 ffffffff84223556 ffff8803cd31beb8 ffff8800c6765280
      [ 1317.270230] Call Trace:
      [ 1317.270230]  [<ffffffff84223542>] ? rds_trans_get_preferred+0x42/0xa0
      [ 1317.270230]  [<ffffffff84223556>] rds_trans_get_preferred+0x56/0xa0
      [ 1317.270230]  [<ffffffff8421c9c3>] rds_bind+0x73/0xf0
      [ 1317.270230]  [<ffffffff83e4ce62>] SYSC_bind+0x92/0xf0
      [ 1317.270230]  [<ffffffff812493f8>] ? context_tracking_user_exit+0xb8/0x1d0
      [ 1317.270230]  [<ffffffff8119313d>] ? trace_hardirqs_on+0xd/0x10
      [ 1317.270230]  [<ffffffff8107a852>] ? syscall_trace_enter+0x32/0x290
      [ 1317.270230]  [<ffffffff83e4cece>] SyS_bind+0xe/0x10
      [ 1317.270230]  [<ffffffff843a6ad0>] tracesys+0xdd/0xe2
      [ 1317.270230] Code: 00 8b 45 cc 48 8d 75 d0 48 c7 45 d8 00 00 00 00 66 c7 45 d0 02 00 89 45 d4 48 89 df e8 78 49 76 ff 41 89 c4 85 c0 75 0c 48 8b 03 <80> b8 74 09 00 00 01 74 06 41 bc 9d ff ff ff f6 05 2a b6 c2 02
      [ 1317.270230] RIP  [<ffffffff84225f52>] rds_ib_laddr_check+0x82/0x110
      [ 1317.270230]  RSP <ffff8803cd31bdf8>
      [ 1317.270230] CR2: 0000000000000974
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7918313d
    • hamradio/yam: fix info leak in ioctl · eb2da112
      Salva Peiró authored
      [ Upstream commit 8e3fbf87 ]
      
      The yam_ioctl() code fails to initialise the cmd field
      of the struct yamdrv_ioctl_cfg. Add an explicit memset(0)
      before filling the structure to avoid the 4-byte info leak.
      Signed-off-by: Salva Peiró <speiro@ai2.upv.es>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      eb2da112
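The zero-before-fill pattern described in this commit can be sketched in plain user-space C. This is only an illustration: `struct demo_ioctl_cfg` and `demo_fill_cfg()` are hypothetical stand-ins, not the driver's real `struct yamdrv_ioctl_cfg` or `yam_ioctl()` code.

```c
#include <string.h>

/* Hypothetical stand-in for the ioctl reply: a command word followed
 * by the configuration payload. Not the real driver structure. */
struct demo_ioctl_cfg {
	int cmd;	/* never assigned by the filler below */
	struct {
		int iobase;
		int irq;
		int bitrate;
	} cfg;
};

/* Zero the whole reply first so that no field (and no padding)
 * carries stale memory contents, then fill the known fields. */
static void demo_fill_cfg(struct demo_ioctl_cfg *out,
			  int iobase, int irq, int bitrate)
{
	memset(out, 0, sizeof(*out));
	out->cfg.iobase = iobase;
	out->cfg.irq = irq;
	out->cfg.bitrate = bitrate;
}
```

Without the memset(), `cmd` would keep whatever happened to be in that memory, and a copy of the whole structure to userspace would expose it.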
    • drivers/net/hamradio: Integer overflow in hdlcdrv_ioctl() · b7e3762c
      Wenliang Fan authored
      [ Upstream commit e9db5c21 ]
      
      The local variable 'bi' comes from userspace. If userspace passed a
      large number to 'bi.data.calibrate', there would be an integer overflow
      in the following line:
      	s->hdlctx.calibrate = bi.data.calibrate * s->par.bitrate / 16;
      Signed-off-by: Wenliang Fan <fanwlexca@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b7e3762c
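One common way to guard a multiplication like the one quoted above is to compare against INT_MAX divided by the other factor before multiplying. The sketch below assumes both values are plain `int`; `demo_calibrate()` is a hypothetical helper, and the rejection of negative or zero inputs is an extra assumption made here for the example, not something the commit message states.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical helper: computes calibrate * bitrate / 16, rejecting
 * inputs whose product would not fit in an int. */
static int demo_calibrate(int calibrate, int bitrate, int *out)
{
	/* Assumption for this sketch: only non-negative calibrate and
	 * positive bitrate values are meaningful. */
	if (calibrate < 0 || bitrate <= 0)
		return -EINVAL;
	/* calibrate * bitrate <= INT_MAX iff calibrate <= INT_MAX / bitrate */
	if (calibrate > INT_MAX / bitrate)
		return -EINVAL;
	*out = calibrate * bitrate / 16;
	return 0;
}
```

The check happens before the multiplication, so the overflowing product is never computed.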
    • net: inet_diag: zero out uninitialized idiag_{src,dst} fields · 931a701d
      Daniel Borkmann authored
      [ Upstream commit b1aac815 ]
      
      Jakub reported while working with nlmon netlink sniffer that parts of
      the inet_diag_sockid are not initialized when r->idiag_family != AF_INET6.
      That is, fields of r->id.idiag_src[1 ... 3], r->id.idiag_dst[1 ... 3].
      
      In fact, it seems that we can leak 6 * sizeof(u32) bytes of kernel [slab]
      memory through this. At least, in udp_dump_one(), we allocate an skb in ...
      
        rep = nlmsg_new(sizeof(struct inet_diag_msg) + ..., GFP_KERNEL);
      
      ... and then pass that to inet_sk_diag_fill() that puts the whole struct
      inet_diag_msg into the skb, where we only fill out r->id.idiag_src[0],
      r->id.idiag_dst[0] and leave the rest untouched:
      
        r->id.idiag_src[0] = inet->inet_rcv_saddr;
        r->id.idiag_dst[0] = inet->inet_daddr;
      
      struct inet_diag_msg embeds struct inet_diag_sockid, which is correctly and
      fully filled out in the IPv6 case, but not for IPv4.
      
      So just zero them out by using plain memset (for this little amount of
      bytes it's probably not worth the extra check for idiag_family == AF_INET).
      
      Similarly, also fix the other places where we fill that out.
      Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      931a701d
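The arithmetic behind the "6 * sizeof(u32)" figure, and the memset approach described in this commit, can be sketched in user-space C. `struct demo_sockid` and `demo_fill_ipv4()` are hypothetical names that only mimic the two four-word address arrays; they are not the kernel's `struct inet_diag_sockid` or `inet_sk_diag_fill()`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: four 32-bit words per address, so that an
 * IPv6 address fits. An IPv4 address uses only word 0. */
struct demo_sockid {
	uint32_t src[4];
	uint32_t dst[4];
};

/* IPv4 fill: clear both arrays first; otherwise words 1..3 of each
 * array (2 * 3 = 6 words) would keep stale memory contents. */
static void demo_fill_ipv4(struct demo_sockid *id,
			   uint32_t saddr, uint32_t daddr)
{
	memset(id->src, 0, sizeof(id->src));
	memset(id->dst, 0, sizeof(id->dst));
	id->src[0] = saddr;
	id->dst[0] = daddr;
}
```

Clearing unconditionally costs 32 bytes of writes, which matches the commit's reasoning that a family check is not worth it for so few bytes.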
    • net: unix: allow bind to fail on mutex lock · c0be5de1
      Sasha Levin authored
      [ Upstream commit 37ab4fa7 ]
      
      This is similar to the set_peek_off patch where calling bind while the
      socket is stuck in unix_dgram_recvmsg() will block and cause a hung task
      spew after a while.
      
      This is also the last place that did a straightforward mutex_lock(), so
      there shouldn't be any more of these patches.
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c0be5de1
    • netvsc: don't flush peers notifying work during setting mtu · 0248d4b4
      Jason Wang authored
      [ Upstream commit 50dc875f ]
      
      There's a possible deadlock if we flush the peers notifying work while setting
      the MTU:
      
      [   22.991149] ======================================================
      [   22.991173] [ INFO: possible circular locking dependency detected ]
      [   22.991198] 3.10.0-54.0.1.el7.x86_64.debug #1 Not tainted
      [   22.991219] -------------------------------------------------------
      [   22.991243] ip/974 is trying to acquire lock:
      [   22.991261]  ((&(&net_device_ctx->dwork)->work)){+.+.+.}, at: [<ffffffff8108af95>] flush_work+0x5/0x2e0
      [   22.991307]
      but task is already holding lock:
      [   22.991330]  (rtnl_mutex){+.+.+.}, at: [<ffffffff81539deb>] rtnetlink_rcv+0x1b/0x40
      [   22.991367]
      which lock already depends on the new lock.
      
      [   22.991398]
      the existing dependency chain (in reverse order) is:
      [   22.991426]
      -> #1 (rtnl_mutex){+.+.+.}:
      [   22.991449]        [<ffffffff810dfdd9>] __lock_acquire+0xb19/0x1260
      [   22.991477]        [<ffffffff810e0d12>] lock_acquire+0xa2/0x1f0
      [   22.991501]        [<ffffffff81673659>] mutex_lock_nested+0x89/0x4f0
      [   22.991529]        [<ffffffff815392b7>] rtnl_lock+0x17/0x20
      [   22.991552]        [<ffffffff815230b2>] netdev_notify_peers+0x12/0x30
      [   22.991579]        [<ffffffffa0340212>] netvsc_send_garp+0x22/0x30 [hv_netvsc]
      [   22.991610]        [<ffffffff8108d251>] process_one_work+0x211/0x6e0
      [   22.991637]        [<ffffffff8108d83b>] worker_thread+0x11b/0x3a0
      [   22.991663]        [<ffffffff81095e5d>] kthread+0xed/0x100
      [   22.991686]        [<ffffffff81681c6c>] ret_from_fork+0x7c/0xb0
      [   22.991715]
      -> #0 ((&(&net_device_ctx->dwork)->work)){+.+.+.}:
      [   22.991715]        [<ffffffff810de817>] check_prevs_add+0x967/0x970
      [   22.991715]        [<ffffffff810dfdd9>] __lock_acquire+0xb19/0x1260
      [   22.991715]        [<ffffffff810e0d12>] lock_acquire+0xa2/0x1f0
      [   22.991715]        [<ffffffff8108afde>] flush_work+0x4e/0x2e0
      [   22.991715]        [<ffffffff8108e1b5>] __cancel_work_timer+0x95/0x130
      [   22.991715]        [<ffffffff8108e303>] cancel_delayed_work_sync+0x13/0x20
      [   22.991715]        [<ffffffffa03404e4>] netvsc_change_mtu+0x84/0x200 [hv_netvsc]
      [   22.991715]        [<ffffffff815233d4>] dev_set_mtu+0x34/0x80
      [   22.991715]        [<ffffffff8153bc2a>] do_setlink+0x23a/0xa00
      [   22.991715]        [<ffffffff8153d054>] rtnl_newlink+0x394/0x5e0
      [   22.991715]        [<ffffffff81539eac>] rtnetlink_rcv_msg+0x9c/0x260
      [   22.991715]        [<ffffffff8155cdd9>] netlink_rcv_skb+0xa9/0xc0
      [   22.991715]        [<ffffffff81539dfa>] rtnetlink_rcv+0x2a/0x40
      [   22.991715]        [<ffffffff8155c41d>] netlink_unicast+0xdd/0x190
      [   22.991715]        [<ffffffff8155c807>] netlink_sendmsg+0x337/0x750
      [   22.991715]        [<ffffffff8150d219>] sock_sendmsg+0x99/0xd0
      [   22.991715]        [<ffffffff8150d63e>] ___sys_sendmsg+0x39e/0x3b0
      [   22.991715]        [<ffffffff8150eba2>] __sys_sendmsg+0x42/0x80
      [   22.991715]        [<ffffffff8150ebf2>] SyS_sendmsg+0x12/0x20
      [   22.991715]        [<ffffffff81681d19>] system_call_fastpath+0x16/0x1b
      
      This happens because rtnl_lock() is held before ndo_change_mtu() is called and
      netvsc_change_mtu() then tries to flush the work; meanwhile, netdev_notify_peers()
      may be called from the worker, which also tries to take rtnl_lock. As a result,
      the flush can never complete. Solve this by not canceling and flushing the work.
      This is safe because the transmission done by NETDEV_NOTIFY_PEERS is
      synchronized with the netif_tx_disable() called by netvsc_change_mtu().
      Reported-by: Yaju Cao <yacao@redhat.com>
      Tested-by: Yaju Cao <yacao@redhat.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0248d4b4
    • tg3: Initialize REG_BASE_ADDR at PCI config offset 120 to 0 · e5893b25
      Nat Gurumoorthy authored
      [ Upstream commit 388d3335 ]
      
      The new tg3 driver leaves REG_BASE_ADDR (PCI config offset 120)
      uninitialized. After a power-on reset this register may contain garbage. The
      Register Base Address register defines the device local address of a
      register. The data pointed to by this location is read or written using
      the Register Data register (PCI config offset 128). When REG_BASE_ADDR
      contains garbage, any read or write of the Register Data register (PCI config
      offset 128) will cause the PCI bus to lock up. The TCO watchdog will fire and
      bring down the system.
      Signed-off-by: Nat Gurumoorthy <natg@google.com>
      Acked-by: Michael Chan <mchan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e5893b25
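The base-address/data register pair described in this commit is an indirect access window: one register selects a device-local address, the other reads or writes whatever is selected. The toy model below is a user-space sketch only. `struct demo_dev`, `demo_init()`, `demo_read_data()` and the window size are all hypothetical, and the bus lockup is modelled as a plain error return for an address outside the window, which is an assumption of this sketch rather than a description of the hardware.

```c
#include <stdint.h>

#define DEMO_NREGS 64	/* hypothetical size of the device-local window */

/* Toy device: reg_base_addr plays the role of config offset 120,
 * and reads of "config offset 128" go through demo_read_data(). */
struct demo_dev {
	uint32_t reg_base_addr;
	uint32_t regs[DEMO_NREGS];
};

/* Probe-time step: put the selector register into a known state
 * instead of trusting its contents after reset. */
static void demo_init(struct demo_dev *d)
{
	d->reg_base_addr = 0;
}

/* Read through the data register. Returns 0 on success, or -1 where
 * this model stands in for the bus hanging on a bad address. */
static int demo_read_data(const struct demo_dev *d, uint32_t *val)
{
	uint32_t idx = d->reg_base_addr / 4;

	if (d->reg_base_addr % 4 != 0 || idx >= DEMO_NREGS)
		return -1;
	*val = d->regs[idx];
	return 0;
}
```

In this model a garbage selector makes every data access fail until demo_init() has run, which mirrors why the commit writes 0 to the register up front.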