1. 07 Oct, 2015 28 commits
    • Jann Horn's avatar
      fs: if a coredump already exists, unlink and recreate with O_EXCL · 5068ce15
      Jann Horn authored
      [ Upstream commit fbb18169 ]
      
      It was possible for an attacking user to trick root (or another user) into
      writing his coredumps into an attacker-readable, pre-existing file using
      rename() or link(), causing the disclosure of secret data from the victim
      process' virtual memory.  Depending on the configuration, it was also
      possible to trick root into overwriting system files with coredumps.  Fix
      that issue by never writing coredumps into existing files.
      
      Requirements for the attack:
       - The attack only applies if the victim's process has a nonzero
         RLIMIT_CORE and is dumpable.
       - The attacker can trick the victim into coredumping into an
         attacker-writable directory D, either because the core_pattern is
         relative and the victim's cwd is attacker-writable or because an
         absolute core_pattern pointing to a world-writable directory is used.
       - The attacker has one of these:
        A: on a system with protected_hardlinks=0:
           execute access to a folder containing a victim-owned,
           attacker-readable file on the same partition as D, and the
           victim-owned file will be deleted before the main part of the attack
           takes place. (In practice, there are lots of files that fulfill
           this condition, e.g. entries in Debian's /var/lib/dpkg/info/.)
           This does not apply to most Linux systems because most distros set
           protected_hardlinks=1.
        B: on a system with protected_hardlinks=1:
           execute access to a folder containing a victim-owned,
           attacker-readable and attacker-writable file on the same partition
           as D, and the victim-owned file will be deleted before the main part
           of the attack takes place.
           (This seems to be uncommon.)
        C: on any system, independent of protected_hardlinks:
           write access to a non-sticky folder containing a victim-owned,
           attacker-readable file on the same partition as D
           (This seems to be uncommon.)
      
      The basic idea is that the attacker moves the victim-owned file to where
      he expects the victim process to dump its core.  The victim process dumps
      its core into the existing file, and the attacker reads the coredump from
      it.
      
      If the attacker can't move the file because he does not have write access
      to the containing directory, he can instead link the file to a directory
      he controls, then wait for the original link to the file to be deleted
      (because the kernel checks that the link count of the corefile is 1).
      
      A less reliable variant that requires D to be non-sticky works with link()
      and does not require deletion of the original link: link() the file into
      D, but then unlink() it directly before the kernel performs the link count
      check.
      
      On systems with protected_hardlinks=0, this variant allows an attacker to
      not only gain information from coredumps, but also clobber existing,
      victim-writable files with coredumps.  (This could theoretically lead to a
      privilege escalation.)
      Signed-off-by: default avatarJann Horn <jann@thejh.net>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      5068ce15
    • Jaewon Kim's avatar
      vmscan: fix increasing nr_isolated incurred by putback unevictable pages · e6064491
      Jaewon Kim authored
      [ Upstream commit c54839a7 ]
      
      reclaim_clean_pages_from_list() assumes that shrink_page_list() returns
      number of pages removed from the candidate list.  But shrink_page_list()
      puts back mlocked pages without passing it to caller and without
      counting as nr_reclaimed.  This increases nr_isolated.
      
      To fix this, this patch changes shrink_page_list() to pass unevictable
      pages back to caller.  Caller will take care those pages.
      
      Minchan said:
      
      It fixes two issues.
      
      1. With unevictable page, cma_alloc will be successful.
      
      Exactly speaking, cma_alloc of current kernel will fail due to
      unevictable pages.
      
      2. fix leaking of NR_ISOLATED counter of vmstat
      
      With it, too_many_isolated works.  Otherwise, it could make hang until
      the process get SIGKILL.
      Signed-off-by: default avatarJaewon Kim <jaewon31.kim@samsung.com>
      Acked-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e6064491
    • Helge Deller's avatar
      parisc: Filter out spurious interrupts in PA-RISC irq handler · c9cfe460
      Helge Deller authored
      [ Upstream commit b1b4e435 ]
      
      When detecting a serial port on newer PA-RISC machines (with iosapic) we have a
      long way to go to find the right IRQ line, registering it, then registering the
      serial port and the irq handler for the serial port. During this phase spurious
      interrupts for the serial port may happen which then crashes the kernel because
      the action handler might not have been set up yet.
      
      So, basically it's a race condition between the serial port hardware and the
      CPU which sets up the necessary fields in the irq sructs. The main reason for
      this race is, that we unmask the serial port irqs too early without having set
      up everything properly before (which isn't easily possible because we need the
      IRQ number to register the serial ports).
      
      This patch is a work-around for this problem. It adds checks to the CPU irq
      handler to verify if the IRQ action field has been initialized already. If not,
      we just skip this interrupt (which isn't critical for a serial port at bootup).
      The real fix would probably involve rewriting all PA-RISC specific IRQ code
      (for CPU, IOSAPIC, GSC and EISA) to use IRQ domains with proper parenting of
      the irq chips and proper irq enabling along this line.
      
      This bug has been in the PA-RISC port since the beginning, but the crashes
      happened very rarely with currently used hardware.  But on the latest machine
      which I bought (a C8000 workstation), which uses the fastest CPUs (4 x PA8900,
      1GHz) and which has the largest possible L1 cache size (64MB each), the kernel
      crashed at every boot because of this race. So, without this patch the machine
      would currently be unuseable.
      
      For the record, here is the flow logic:
      1. serial_init_chip() in 8250_gsc.c calls iosapic_serial_irq().
      2. iosapic_serial_irq() calls txn_alloc_irq() to find the irq.
      3. iosapic_serial_irq() calls cpu_claim_irq() to register the CPU irq
      4. cpu_claim_irq() unmasks the CPU irq (which it shouldn't!)
      5. serial_init_chip() then registers the 8250 port.
      Problems:
      - In step 4 the CPU irq shouldn't have been registered yet, but after step 5
      - If serial irq happens between 4 and 5 have finished, the kernel will crash
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      c9cfe460
    • John David Anglin's avatar
      parisc: Use double word condition in 64bit CAS operation · 4ea349a7
      John David Anglin authored
      [ Upstream commit 1b59ddfc ]
      
      The attached change fixes the condition used in the "sub" instruction.
      A double word comparison is needed.  This fixes the 64-bit LWS CAS
      operation on 64-bit kernels.
      
      I can now enable 64-bit atomic support in GCC.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: John David Anglin <dave.anglin>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4ea349a7
    • Trond Myklebust's avatar
      NFS: nfs_set_pgio_error sometimes misses errors · 8c87cb09
      Trond Myklebust authored
      [ Upstream commit e9ae58ae ]
      
      We should ensure that we always set the pgio_header's error field
      if a READ or WRITE RPC call returns an error. The current code depends
      on 'hdr->good_bytes' always being initialised to a large value, which
      is not always done correctly by callers.
      When this happens, applications may end up missing important errors.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      8c87cb09
    • Kinglong Mee's avatar
      NFS: Fix a NULL pointer dereference of migration recovery ops for v4.2 client · 7730c1b9
      Kinglong Mee authored
      [ Upstream commit 18e3b739 ]
      
      ---Steps to Reproduce--
      <nfs-server>
      # cat /etc/exports
      /nfs/referal  *(rw,insecure,no_subtree_check,no_root_squash,crossmnt)
      /nfs/old      *(ro,insecure,subtree_check,root_squash,crossmnt)
      
      <nfs-client>
      # mount -t nfs nfs-server:/nfs/ /mnt/
      # ll /mnt/*/
      
      <nfs-server>
      # cat /etc/exports
      /nfs/referal   *(rw,insecure,no_subtree_check,no_root_squash,crossmnt,refer=/nfs/old/@nfs-server)
      /nfs/old       *(ro,insecure,subtree_check,root_squash,crossmnt)
      # service nfs restart
      
      <nfs-client>
      # ll /mnt/*/    --->>>>> oops here
      
      [ 5123.102925] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [ 5123.103363] IP: [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
      [ 5123.103752] PGD 587b9067 PUD 3cbf5067 PMD 0
      [ 5123.104131] Oops: 0000 [#1]
      [ 5123.104529] Modules linked in: nfsv4(OE) nfs(OE) fscache(E) nfsd(OE) xfs libcrc32c iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev vmw_balloon parport_pc parport i2c_piix4 shpchp auth_rpcgss nfs_acl vmw_vmci lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi serio_raw scsi_transport_spi e1000 mptscsih mptbase ata_generic pata_acpi [last unloaded: nfsd]
      [ 5123.105887] CPU: 0 PID: 15853 Comm: ::1-manager Tainted: G           OE   4.2.0-rc6+ #214
      [ 5123.106358] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
      [ 5123.106860] task: ffff88007620f300 ti: ffff88005877c000 task.ti: ffff88005877c000
      [ 5123.107363] RIP: 0010:[<ffffffffa03ed38b>]  [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
      [ 5123.107909] RSP: 0018:ffff88005877fdb8  EFLAGS: 00010246
      [ 5123.108435] RAX: ffff880053f3bc00 RBX: ffff88006ce6c908 RCX: ffff880053a0d240
      [ 5123.108968] RDX: ffffea0000e6d940 RSI: ffff8800399a0000 RDI: ffff88006ce6c908
      [ 5123.109503] RBP: ffff88005877fe28 R08: ffffffff81c708a0 R09: 0000000000000000
      [ 5123.110045] R10: 00000000000001a2 R11: ffff88003ba7f5c8 R12: ffff880054c55800
      [ 5123.110618] R13: 0000000000000000 R14: ffff880053a0d240 R15: ffff880053a0d240
      [ 5123.111169] FS:  0000000000000000(0000) GS:ffffffff81c27000(0000) knlGS:0000000000000000
      [ 5123.111726] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 5123.112286] CR2: 0000000000000000 CR3: 0000000054cac000 CR4: 00000000001406f0
      [ 5123.112888] Stack:
      [ 5123.113458]  ffffea0000e6d940 ffff8800399a0000 00000000000167d0 0000000000000000
      [ 5123.114049]  0000000000000000 0000000000000000 0000000000000000 00000000a7ec82c6
      [ 5123.114662]  ffff88005877fe18 ffffea0000e6d940 ffff8800399a0000 ffff880054c55800
      [ 5123.115264] Call Trace:
      [ 5123.115868]  [<ffffffffa03fb44b>] nfs4_try_migration+0xbb/0x220 [nfsv4]
      [ 5123.116487]  [<ffffffffa03fcb3b>] nfs4_run_state_manager+0x4ab/0x7b0 [nfsv4]
      [ 5123.117104]  [<ffffffffa03fc690>] ? nfs4_do_reclaim+0x510/0x510 [nfsv4]
      [ 5123.117813]  [<ffffffff810a4527>] kthread+0xd7/0xf0
      [ 5123.118456]  [<ffffffff810a4450>] ? kthread_worker_fn+0x160/0x160
      [ 5123.119108]  [<ffffffff816d9cdf>] ret_from_fork+0x3f/0x70
      [ 5123.119723]  [<ffffffff810a4450>] ? kthread_worker_fn+0x160/0x160
      [ 5123.120329] Code: 4c 8b 6a 58 74 17 eb 52 48 8d 55 a8 89 c6 4c 89 e7 e8 4a b5 ff ff 8b 45 b0 85 c0 74 1c 4c 89 f9 48 8b 55 90 48 8b 75 98 48 89 df <41> ff 55 00 3d e8 d8 ff ff 41 89 c6 74 cf 48 8b 4d c8 65 48 33
      [ 5123.121643] RIP  [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
      [ 5123.122308]  RSP <ffff88005877fdb8>
      [ 5123.122942] CR2: 0000000000000000
      
      Fixes: ec011fe8 ("NFS: Introduce a vector of migration recovery ops")
      Cc: stable@vger.kernel.org # v3.13+
      Signed-off-by: default avatarKinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7730c1b9
    • NeilBrown's avatar
      NFSv4: don't set SETATTR for O_RDONLY|O_EXCL · 47bd9168
      NeilBrown authored
      [ Upstream commit efcbc04e ]
      
      It is unusual to combine the open flags O_RDONLY and O_EXCL, but
      it appears that libre-office does just that.
      
      [pid  3250] stat("/home/USER/.config", {st_mode=S_IFDIR|0700, st_size=8192, ...}) = 0
      [pid  3250] open("/home/USER/.config/libreoffice/4-suse/user/extensions/buildid", O_RDONLY|O_EXCL <unfinished ...>
      
      NFSv4 takes O_EXCL as a sign that a setattr command should be sent,
      probably to reset the timestamps.
      
      When it was an O_RDONLY open, the SETATTR command does not
      identify any actual attributes to change.
      If no delegation was provided to the open, the SETATTR uses the
      all-zeros stateid and the request is accepted (at least by the
      Linux NFS server - no harm, no foul).
      
      If a read-delegation was provided, this is used in the SETATTR
      request, and a Netapp filer will justifiably claim
      NFS4ERR_BAD_STATEID, which the Linux client takes as a sign
      to retry - indefinitely.
      
      So only treat O_EXCL specially if O_CREAT was also given.
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      47bd9168
    • Filipe Manana's avatar
      Btrfs: check if previous transaction aborted to avoid fs corruption · 04b4d54f
      Filipe Manana authored
      [ Upstream commit 1f9b8c8f ]
      
      While we are committing a transaction, it's possible the previous one is
      still finishing its commit and therefore we wait for it to finish first.
      However we were not checking if that previous transaction ended up getting
      aborted after we waited for it to commit, so we ended up committing the
      current transaction which can lead to fs corruption because the new
      superblock can point to trees that have had one or more nodes/leafs that
      were never durably persisted.
      The following sequence diagram exemplifies how this is possible:
      
                CPU 0                                                        CPU 1
      
        transaction N starts
      
        (...)
      
        btrfs_commit_transaction(N)
      
          cur_trans->state = TRANS_STATE_COMMIT_START;
          (...)
          cur_trans->state = TRANS_STATE_COMMIT_DOING;
          (...)
      
          cur_trans->state = TRANS_STATE_UNBLOCKED;
          root->fs_info->running_transaction = NULL;
      
                                                                    btrfs_start_transaction()
                                                                       --> starts transaction N + 1
      
          btrfs_write_and_wait_transaction(trans, root);
            --> starts writing all new or COWed ebs created
                at transaction N
      
                                                                    creates some new ebs, COWs some
                                                                    existing ebs but doesn't COW or
                                                                    deletes eb X
      
                                                                    btrfs_commit_transaction(N + 1)
                                                                      (...)
                                                                      cur_trans->state = TRANS_STATE_COMMIT_START;
                                                                      (...)
                                                                      wait_for_commit(root, prev_trans);
                                                                        --> prev_trans == transaction N
      
          btrfs_write_and_wait_transaction() continues
          writing ebs
             --> fails writing eb X, we abort transaction N
                 and set bit BTRFS_FS_STATE_ERROR on
                 fs_info->fs_state, so no new transactions
                 can start after setting that bit
      
             cleanup_transaction()
               btrfs_cleanup_one_transaction()
                 wakes up task at CPU 1
      
                                                                      continues, doesn't abort because
                                                                      cur_trans->aborted (transaction N + 1)
                                                                      is zero, and no checks for bit
                                                                      BTRFS_FS_STATE_ERROR in fs_info->fs_state
                                                                      are made
      
                                                                      btrfs_write_and_wait_transaction(trans, root);
                                                                        --> succeeds, no errors during writeback
      
                                                                      write_ctree_super(trans, root, 0);
                                                                        --> succeeds
                                                                        --> we have now a superblock that points us
                                                                            to some root that uses eb X, which was
                                                                            never written to disk
      
      In this scenario future attempts to read eb X from disk results in an
      error message like "parent transid verify failed on X wanted Y found Z".
      
      So fix this by aborting the current transaction if after waiting for the
      previous transaction we verify that it was aborted.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarJosef Bacik <jbacik@fb.com>
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      04b4d54f
    • Sakari Ailus's avatar
      [media] v4l: omap3isp: Fix sub-device power management code · bf4f1059
      Sakari Ailus authored
      [ Upstream commit 9d39f054 ]
      
      Commit 813f5c0a ("media: Change media device link_notify behaviour")
      modified the media controller link setup notification API and updated the
      OMAP3 ISP driver accordingly. As a side effect it introduced a bug by
      turning power on after setting the link instead of before. This results in
      sub-devices not being powered down in some cases when they should be. Fix
      it.
      
      Fixes: 813f5c0a [media] media: Change media device link_notify behaviour
      Signed-off-by: default avatarSakari Ailus <sakari.ailus@iki.fi>
      Cc: stable@vger.kernel.org # since v3.10
      Signed-off-by: default avatarLaurent Pinchart <laurent.pinchart@ideasonboard.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      bf4f1059
    • David Härdeman's avatar
      [media] rc-core: fix remove uevent generation · a2e1f66e
      David Härdeman authored
      [ Upstream commit a66b0c41 ]
      
      The input_dev is already gone when the rc device is being unregistered
      so checking for its presence only means that no remove uevent will be
      generated.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarDavid Härdeman <david@hardeman.nu>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a2e1f66e
    • Minfei Huang's avatar
      x86/mm: Initialize pmd_idx in page_table_range_init_count() · 946f8cba
      Minfei Huang authored
      [ Upstream commit 9962eea9 ]
      
      The variable pmd_idx is not initialized for the first iteration of the
      for loop.
      
      Assign the proper value which indexes the start address.
      
      Fixes: 719272c4 'x86, mm: only call early_ioremap_page_table_range_init() once'
      Signed-off-by: default avatarMinfei Huang <mnfhuang@gmail.com>
      Cc: tony.luck@intel.com
      Cc: wangnan0@huawei.com
      Cc: david.vrabel@citrix.com
      Reviewed-by: yinghai@kernel.org
      Link: http://lkml.kernel.org/r/1436703522-29552-1-git-send-email-mhuang@redhat.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      946f8cba
    • Jeffery Miller's avatar
      Add radeon suspend/resume quirk for HP Compaq dc5750. · 2f4bef0f
      Jeffery Miller authored
      [ Upstream commit 09bfda10 ]
      
      With the radeon driver loaded the HP Compaq dc5750
      Small Form Factor machine fails to resume from suspend.
      Adding a quirk similar to other devices avoids
      the problem and the system resumes properly.
      Signed-off-by: default avatarJeffery Miller <jmiller@neverware.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      2f4bef0f
    • Jann Horn's avatar
      CIFS: fix type confusion in copy offload ioctl · ea807983
      Jann Horn authored
      [ Upstream commit 4c17a6d5 ]
      
      This might lead to local privilege escalation (code execution as
      kernel) for systems where the following conditions are met:
      
       - CONFIG_CIFS_SMB2 and CONFIG_CIFS_POSIX are enabled
       - a cifs filesystem is mounted where:
        - the mount option "vers" was used and set to a value >=2.0
        - the attacker has write access to at least one file on the filesystem
      
      To attack this, an attacker would have to guess the target_tcon
      pointer (but guessing wrong doesn't cause a crash, it just returns an
      error code) and win a narrow race.
      
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: default avatarJann Horn <jann@thejh.net>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ea807983
    • Aneesh Kumar K.V's avatar
      powerpc/mm: Recompute hash value after a failed update · 522cbcf3
      Aneesh Kumar K.V authored
      [ Upstream commit 36b35d5d ]
      
      If we had secondary hash flag set, we ended up modifying hash value in
      the updatepp code path. Hence with a failed updatepp we will be using
      a wrong hash value for the following hash insert. Fix this by
      recomputing hash before insert.
      
      Without this patch we can end up with using wrong slot number in linux
      pte. That can result in us missing an hash pte update or invalidate
      which can cause memory corruption or even machine check.
      
      Fixes: 6d492ecc ("powerpc/THP: Add code to handle HPTE faults for hugepages")
      Cc: stable@vger.kernel.org # v3.11+
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Reviewed-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      522cbcf3
    • Thomas Huth's avatar
      powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers · 5fc706cc
      Thomas Huth authored
      [ Upstream commit 1c2cb594 ]
      
      The EPOW interrupt handler uses rtas_get_sensor(), which in turn
      uses rtas_busy_delay() to wait for RTAS becoming ready in case it
      is necessary. But rtas_busy_delay() is annotated with might_sleep()
      and thus may not be used by interrupts handlers like the EPOW handler!
      This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is
      enabled:
      
       BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:496
       in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
       CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6
       Call Trace:
       [c00000007ffe7b90] [c000000000807670] dump_stack+0xa0/0xdc (unreliable)
       [c00000007ffe7bc0] [c0000000000e1f14] ___might_sleep+0x134/0x180
       [c00000007ffe7c20] [c00000000002aec0] rtas_busy_delay+0x30/0xd0
       [c00000007ffe7c50] [c00000000002bde4] rtas_get_sensor+0x74/0xe0
       [c00000007ffe7ce0] [c000000000083264] ras_epow_interrupt+0x44/0x450
       [c00000007ffe7d90] [c000000000120260] handle_irq_event_percpu+0xa0/0x300
       [c00000007ffe7e70] [c000000000120524] handle_irq_event+0x64/0xc0
       [c00000007ffe7eb0] [c000000000124dbc] handle_fasteoi_irq+0xec/0x260
       [c00000007ffe7ef0] [c00000000011f4f0] generic_handle_irq+0x50/0x80
       [c00000007ffe7f20] [c000000000010f3c] __do_irq+0x8c/0x200
       [c00000007ffe7f90] [c0000000000236cc] call_do_irq+0x14/0x24
       [c00000007e6f39e0] [c000000000011144] do_IRQ+0x94/0x110
       [c00000007e6f3a30] [c000000000002594] hardware_interrupt_common+0x114/0x180
      
      Fix this issue by introducing a new rtas_get_sensor_fast() function
      that does not use rtas_busy_delay() - and thus can only be used for
      sensors that do not cause a BUSY condition - known as "fast" sensors.
      
      The EPOW sensor is defined to be "fast" in sPAPR - mpe.
      
      Fixes: 587f83e8 ("powerpc/pseries: Use rtas_get_sensor in RAS code")
      Signed-off-by: default avatarThomas Huth <thuth@redhat.com>
      Reviewed-by: default avatarNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      5fc706cc
    • Michael Ellerman's avatar
      powerpc/mm: Fix pte_pagesize_index() crash on 4K w/64K hash · e937b1ac
      Michael Ellerman authored
      [ Upstream commit 74b5037b ]
      
      The powerpc kernel can be built to have either a 4K PAGE_SIZE or a 64K
      PAGE_SIZE.
      
      However when built with a 4K PAGE_SIZE there is an additional config
      option which can be enabled, PPC_HAS_HASH_64K, which means the kernel
      also knows how to hash a 64K page even though the base PAGE_SIZE is 4K.
      
      This is used in one obscure configuration, to support 64K pages for SPU
      local store on the Cell processor when the rest of the kernel is using
      4K pages.
      
      In this configuration, pte_pagesize_index() is defined to just pass
      through its arguments to get_slice_psize(). However pte_pagesize_index()
      is called for both user and kernel addresses, whereas get_slice_psize()
      only knows how to handle user addresses.
      
      This has been broken forever, however until recently it happened to
      work. That was because in get_slice_psize() the large kernel address
      would cause the right shift of the slice mask to return zero.
      
      However in commit 7aa0727f ("powerpc/mm: Increase the slice range to
      64TB"), the get_slice_psize() code was changed so that instead of a
      right shift we do an array lookup based on the address. When passed a
      kernel address this means we index way off the end of the slice array
      and return random junk.
      
      That is only fatal if we happen to hit something non-zero, but when we
      do return a non-zero value we confuse the MMU code and eventually cause
      a check stop.
      
      This fix is ugly, but simple. When we're called for a kernel address we
      return 4K, which is always correct in this configuration, otherwise we
      use the slice mask.
      
      Fixes: 7aa0727f ("powerpc/mm: Increase the slice range to 64TB")
      Reported-by: default avatarCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e937b1ac
    • Takashi Iwai's avatar
      ALSA: hda - Use ALC880_FIXUP_FUJITSU for FSC Amilo M1437 · e0e8eb9e
      Takashi Iwai authored
      [ Upstream commit a161574e ]
      
      It turned out that the machine has a bass speaker, so take a correct
      fixup entry.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=102501
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e0e8eb9e
    • Takashi Iwai's avatar
      ALSA: hda - Enable headphone jack detect on old Fujitsu laptops · 48ab75cf
      Takashi Iwai authored
      [ Upstream commit bb148bde ]
      
      According to the bug report, FSC Amilo laptops with ALC880 can detect
      the headphone jack but currently the driver disables it.  It's partly
      intentionally, as non-working jack detect was reported in the past.
      Let's enable now.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=102501
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      48ab75cf
    • Takashi Iwai's avatar
      Input: evdev - do not report errors form flush() · 638f3101
      Takashi Iwai authored
      [ Upstream commit eb38f3a4 ]
      
      We've got bug reports showing the old systemd-logind (at least
      system-210) aborting unexpectedly, and this turned out to be because
      of an invalid error code from close() call to evdev devices.  close()
      is supposed to return only either EINTR or EBADFD, while the device
      returned ENODEV.  logind was overreacting to it and decided to kill
      itself when an unexpected error code was received.  What a tragedy.
      
      The bad error code comes from flush fops, and actually evdev_flush()
      returns ENODEV when device is disconnected or client's access to it is
      revoked. But in these cases the fact that flush did not actually happen is
      not an error, but rather normal behavior. For non-disconnected devices
      result of flush is also not that interesting as there is no potential of
      data loss and even if it fails application has no way of handling the
      error. Because of that we are better off always returning success from
      evdev_flush().
      
      Also returning EINTR from flush()/close() is discouraged (as it is not
      clear how application should handle this error), so let's stop taking
      evdev->mutex interruptibly.
      
      Bugzilla: http://bugzilla.suse.com/show_bug.cgi?id=939834
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      638f3101
    • Marc Zyngier's avatar
      arm64: KVM: Disable virtual timer even if the guest is not using it · 325f7917
      Marc Zyngier authored
      [ Upstream commit c4cbba9f ]
      
      When running a guest with the architected timer disabled (with QEMU and
      the kernel_irqchip=off option, for example), it is important to make
      sure the timer gets turned off. Otherwise, the guest may try to
      enable it anyway, leading to a screaming HW interrupt.
      
      The fix is to unconditionally turn off the virtual timer on guest
      exit.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      325f7917
    • Will Deacon's avatar
      arm64: errata: add module build workaround for erratum #843419 · be9d2b0e
      Will Deacon authored
      [ Upstream commit df057cc7 ]
      
      Cortex-A53 processors <= r0p4 are affected by erratum #843419 which can
      lead to a memory access using an incorrect address in certain sequences
      headed by an ADRP instruction.
      
      There is a linker fix to generate veneers for ADRP instructions, but
      this doesn't work for kernel modules which are built as unlinked ELF
      objects.
      
      This patch adds a new config option for the erratum which, when enabled,
      builds kernel modules with the mcmodel=large flag. This uses absolute
      addressing for all kernel symbols, thereby removing the use of ADRP as
      a PC-relative form of addressing. The ADRP relocs are removed from the
      module loader so that we fail to load any potentially affected modules.
      
      Cc: <stable@vger.kernel.org>
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      be9d2b0e
    • Will Deacon's avatar
      arm64: head.S: initialise mdcr_el2 in el2_setup · dbb4b0e5
      Will Deacon authored
      [ Upstream commit d10bcd47 ]
      
      When entering the kernel at EL2, we fail to initialise the MDCR_EL2
      register which controls debug access and PMU capabilities at EL1.
      
      This patch ensures that the register is initialised so that all traps
      are disabled and all the PMU counters are available to the host. When a
      guest is scheduled, KVM takes care to configure trapping appropriately.
      
      Cc: <stable@vger.kernel.org>
      Acked-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      dbb4b0e5
    • Will Deacon's avatar
      arm64: compat: fix vfp save/restore across signal handlers in big-endian · 0e49849f
      Will Deacon authored
      [ Upstream commit bdec97a8 ]
      
      When saving/restoring the VFP registers from a compat (AArch32)
      signal frame, we rely on the compat registers forming a prefix of the
      native register file and therefore make use of copy_{to,from}_user to
      transfer between the native fpsimd_state and the compat_vfp_sigframe.
      
      Unfortunately, this doesn't work so well in a big-endian environment.
      Our fpsimd save/restore code operates directly on 128-bit quantities
      (Q registers) whereas the compat_vfp_sigframe represents the registers
      as an array of 64-bit (D) registers. The architecture packs the compat D
      registers into the Q registers, with the least significant bytes holding
      the lower register. Consequently, we need to swap the 64-bit halves when
      converting between these two representations on a big-endian machine.
      
      This patch replaces the __copy_{to,from}_user invocations in our
      compat VFP signal handling code with explicit __put_user loops that
      operate on 64-bit values and swap them accordingly.
      
      Cc: <stable@vger.kernel.org>
      Reviewed-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      0e49849f
    • Jeff Vander Stoep's avatar
      arm64: kconfig: Move LIST_POISON to a safe value · 698cebfe
      Jeff Vander Stoep authored
      [ Upstream commit bf0c4e04 ]
      
      Move the poison pointer offset to 0xdead000000000000, a
      recognized value that is not mappable by user-space exploits.
      
      Cc: <stable@vger.kernel.org>
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarThierry Strudel <tstrudel@google.com>
      Signed-off-by: default avatarJeff Vander Stoep <jeffv@google.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      698cebfe
    • Bob Copeland's avatar
      mac80211: enable assoc check for mesh interfaces · 0e90b248
      Bob Copeland authored
      [ Upstream commit 3633ebeb ]
      
      We already set a station to be associated when peering completes, both
      in user space and in the kernel.  Thus we should always have an
      associated sta before sending data frames to that station.
      
      Failure to check assoc state can cause crashes in the lower-level driver
      due to transmitting unicast data frames before driver sta structures
      (e.g. ampdu state in ath9k) are initialized.  This occurred when
      forwarding in the presence of fixed mesh paths: frames were transmitted
      to stations with whom we hadn't yet completed peering.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarAlexis Green <agreen@cococorp.com>
      Tested-by: default avatarJesse Jones <jjones@cococorp.com>
      Signed-off-by: default avatarBob Copeland <me@bobcopeland.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      0e90b248
    • Jean Delvare's avatar
      tg3: Fix temperature reporting · 39691dda
      Jean Delvare authored
      [ Upstream commit d3d11fe0 ]
      
      The temperature registers appear to report values in degrees Celsius
      while the hwmon API mandates values to be exposed in millidegrees
      Celsius. Do the conversion so that the values reported by "sensors"
      are correct.
      
      Fixes: aed93e0b ("tg3: Add hwmon support for temperature")
      Signed-off-by: default avatarJean Delvare <jdelvare@suse.de>
      Cc: Prashant Sreedharan <prashant@broadcom.com>
      Cc: Michael Chan <mchan@broadcom.com>
      Cc: stable@vger.kernel.org [v3.6+]
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      39691dda
    • Eric W. Biederman's avatar
      unshare: Unsharing a thread does not require unsharing a vm · e12419a5
      Eric W. Biederman authored
      [ Upstream commit 12c641ab ]
      
      In the logic in the initial commit of unshare made creating a new
      thread group for a process, contingent upon creating a new memory
      address space for that process.  That is wrong.  Two separate
      processes in different thread groups can share a memory address space
      and clone allows creation of such proceses.
      
      This is significant because it was observed that mm_users > 1 does not
      mean that a process is multi-threaded, as reading /proc/PID/maps
      temporarily increments mm_users, which allows other processes to
      (accidentally) interfere with unshare() calls.
      
      Correct the check in check_unshare_flags() to test for
      !thread_group_empty() for CLONE_THREAD, CLONE_SIGHAND, and CLONE_VM.
      For sighand->count > 1 for CLONE_SIGHAND and CLONE_VM.
      For !current_is_single_threaded instead of mm_users > 1 for CLONE_VM.
      
      By using the correct checks in unshare this removes the possibility of
      an accidental denial of service attack.
      
      Additionally using the correct checks in unshare ensures that only an
      explicit unshare(CLONE_VM) can possibly trigger the slow path of
      current_is_single_threaded().  As an explict unshare(CLONE_VM) is
      pointless it is not expected there are many applications that make
      that call.
      
      Cc: stable@vger.kernel.org
      Fixes: b2e0d987 userns: Implement unshare of the user namespace
      Reported-by: default avatarRicky Zhou <rickyz@chromium.org>
      Reported-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e12419a5
    • Ming Lei's avatar
      blk-mq: fix buffer overflow when reading sysfs file of 'pending' · 89ca6eef
      Ming Lei authored
      [ Upstream commit 596f5aad ]
      
      There may be lots of pending requests so that the buffer of PAGE_SIZE
      can't hold them at all.
      
      One typical example is scsi-mq, the queue depth(.can_queue) of
      scsi_host and blk-mq is quite big but scsi_device's queue_depth
      is a bit small(.cmd_per_lun), then it is quite easy to have lots
      of pending requests in hw queue.
      
      This patch fixes the following warning and the related memory
      destruction.
      
      [  359.025101] fill_read_buffer: blk_mq_hw_sysfs_show+0x0/0x7d returned bad count^M
      [  359.055595] irq event stamp: 15537^M
      [  359.055606] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC ^M
      [  359.055614] Dumping ftrace buffer:^M
      [  359.055660]    (ftrace buffer empty)^M
      [  359.055672] Modules linked in: nbd ipv6 kvm_intel kvm serio_raw^M
      [  359.055678] CPU: 4 PID: 21631 Comm: stress-ng-sysfs Not tainted 4.2.0-rc5-next-20150805 #434^M
      [  359.055679] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011^M
      [  359.055682] task: ffff8802161cc000 ti: ffff88021b4a8000 task.ti: ffff88021b4a8000^M
      [  359.055693] RIP: 0010:[<ffffffff811541c5>]  [<ffffffff811541c5>] __kmalloc+0xe8/0x152^M
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarMing Lei <ming.lei@canonical.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      89ca6eef
  2. 01 Oct, 2015 1 commit
  3. 28 Sep, 2015 11 commits
    • Julian Anastasov's avatar
      net: call rcu_read_lock early in process_backlog · 52135f13
      Julian Anastasov authored
      [ Upstream commit 2c17d27c ]
      
      Incoming packet should be either in backlog queue or
      in RCU read-side section. Otherwise, the final sequence of
      flush_backlog() and synchronize_net() may miss packets
      that can run without device reference:
      
      CPU 1                  CPU 2
                             skb->dev: no reference
                             process_backlog:__skb_dequeue
                             process_backlog:local_irq_enable
      
      on_each_cpu for
      flush_backlog =>       IPI(hardirq): flush_backlog
                             - packet not found in backlog
      
                             CPU delayed ...
      synchronize_net
      - no ongoing RCU
      read-side sections
      
      netdev_run_todo,
      rcu_barrier: no
      ongoing callbacks
                             __netif_receive_skb_core:rcu_read_lock
                             - too late
      free dev
                             process packet for freed dev
      
      Fixes: 6e583ce5 ("net: eliminate refcounting in backlog queue")
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      52135f13
    • James Smart's avatar
      lpfc: Fix scsi prep dma buf error. · cf76d3de
      James Smart authored
      [ Upstream commit 5116fbf1 ]
      
      Didn't check for less-than-or-equal zero. Means we may later call
      scsi_dma_unmap() even though we don't have valid mappings.
      Signed-off-by: default avatarDick Kennedy <dick.kennedy@avagotech.com>
      Signed-off-by: default avatarJames Smart <james.smart@avagotech.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Odin.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      cf76d3de
    • Dan Carpenter's avatar
      rds: fix an integer overflow test in rds_info_getsockopt() · 573f4d61
      Dan Carpenter authored
      [ Upstream commit 468b732b ]
      
      "len" is a signed integer.  We check that len is not negative, so it
      goes from zero to INT_MAX.  PAGE_SIZE is unsigned long so the comparison
      is type promoted to unsigned long.  ULONG_MAX - 4095 is a higher than
      INT_MAX so the condition can never be true.
      
      I don't know if this is harmful but it seems safe to limit "len" to
      INT_MAX - 4095.
      
      Fixes: a8c879a7 ('RDS: Info and stats')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      573f4d61
    • Jack Morgenstein's avatar
      net/mlx4_core: Fix wrong index in propagating port change event to VFs · 5e1cc32f
      Jack Morgenstein authored
      [ Upstream commit 1c1bf349 ]
      
      The port-change event processing in procedure mlx4_eq_int() uses "slave"
      as the vf_oper array index. Since the value of "slave" is the PF function
      index, the result is that the PF link state is used for deciding to
      propagate the event for all the VFs. The VF link state should be used,
      so the VF function index should be used here.
      
      Fixes: 948e306d ('net/mlx4: Add VF link state support')
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarMatan Barak <matanb@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      5e1cc32f
    • Florian Westphal's avatar
      netlink: don't hold mutex in rcu callback when releasing mmapd ring · 6c897d8c
      Florian Westphal authored
      [ Upstream commit 0470eb99 ]
      
      Kirill A. Shutemov says:
      
      This simple test-case trigers few locking asserts in kernel:
      
      int main(int argc, char **argv)
      {
              unsigned int block_size = 16 * 4096;
              struct nl_mmap_req req = {
                      .nm_block_size          = block_size,
                      .nm_block_nr            = 64,
                      .nm_frame_size          = 16384,
                      .nm_frame_nr            = 64 * block_size / 16384,
              };
              unsigned int ring_size;
      	int fd;
      
      	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
              if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0)
                      exit(1);
              if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0)
                      exit(1);
      
      	ring_size = req.nm_block_nr * req.nm_block_size;
      	mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
      	return 0;
      }
      
      +++ exited with 0 +++
      BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616
      in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
      3 locks held by init/1:
       #0:  (reboot_mutex){+.+...}, at: [<ffffffff81080959>] SyS_reboot+0xa9/0x220
       #1:  ((reboot_notifier_list).rwsem){.+.+..}, at: [<ffffffff8107f379>] __blocking_notifier_call_chain+0x39/0x70
       #2:  (rcu_callback){......}, at: [<ffffffff810d32e0>] rcu_do_batch.isra.49+0x160/0x10c0
      Preemption disabled at:[<ffffffff8145365f>] __delay+0xf/0x20
      
      CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-00009-gbddf4c4818e0 #253
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
       ffff88017b3d8000 ffff88027bc03c38 ffffffff81929ceb 0000000000000102
       0000000000000000 ffff88027bc03c68 ffffffff81085a9d 0000000000000002
       ffffffff81ca2a20 0000000000000268 0000000000000000 ffff88027bc03c98
      Call Trace:
       <IRQ>  [<ffffffff81929ceb>] dump_stack+0x4f/0x7b
       [<ffffffff81085a9d>] ___might_sleep+0x16d/0x270
       [<ffffffff81085bed>] __might_sleep+0x4d/0x90
       [<ffffffff8192e96f>] mutex_lock_nested+0x2f/0x430
       [<ffffffff81932fed>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
       [<ffffffff81464143>] ? __this_cpu_preempt_check+0x13/0x20
       [<ffffffff8182fc3d>] netlink_set_ring+0x1ed/0x350
       [<ffffffff8182e000>] ? netlink_undo_bind+0x70/0x70
       [<ffffffff8182fe20>] netlink_sock_destruct+0x80/0x150
       [<ffffffff817e484d>] __sk_free+0x1d/0x160
       [<ffffffff817e49a9>] sk_free+0x19/0x20
      [..]
      
      Cong Wang says:
      
      We can't hold mutex lock in a rcu callback, [..]
      
      Thomas Graf says:
      
      The socket should be dead at this point. It might be simpler to
      add a netlink_release_ring() function which doesn't require
      locking at all.
      Reported-by: default avatar"Kirill A. Shutemov" <kirill@shutemov.name>
      Diagnosed-by: default avatarCong Wang <cwang@twopensource.com>
      Suggested-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      6c897d8c
    • Edward Hyunkoo Jee's avatar
      inet: frags: fix defragmented packet's IP header for af_packet · 02948c19
      Edward Hyunkoo Jee authored
      [ Upstream commit 0848f642 ]
      
      When ip_frag_queue() computes positions, it assumes that the passed
      sk_buff does not contain L2 headers.
      
      However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly
      functions can be called on outgoing packets that contain L2 headers.
      
      Also, IPv4 checksum is not corrected after reassembly.
      
      Fixes: 7736d33f ("packet: Add pre-defragmentation support for ipv4 fanouts.")
      Signed-off-by: default avatarEdward Hyunkoo Jee <edjee@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Jerry Chu <hkchu@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      02948c19
    • dingtianhong's avatar
      bonding: correct the MAC address for "follow" fail_over_mac policy · 13e4ceb7
      dingtianhong authored
      [ Upstream commit a951bc1e ]
      
      The "follow" fail_over_mac policy is useful for multiport devices that
      either become confused or incur a performance penalty when multiple
      ports are programmed with the same MAC address, but the same MAC
      address still may happened by this steps for this policy:
      
      1) echo +eth0 > /sys/class/net/bond0/bonding/slaves
         bond0 has the same mac address with eth0, it is MAC1.
      
      2) echo +eth1 > /sys/class/net/bond0/bonding/slaves
         eth1 is backup, eth1 has MAC2.
      
      3) ifconfig eth0 down
         eth1 became active slave, bond will swap MAC for eth0 and eth1,
         so eth1 has MAC1, and eth0 has MAC2.
      
      4) ifconfig eth1 down
         there is no active slave, and eth1 still has MAC1, eth2 has MAC2.
      
      5) ifconfig eth0 up
         the eth0 became active slave again, the bond set eth0 to MAC1.
      
      Something wrong here, then if you set eth1 up, the eth0 and eth1 will have the same
      MAC address, it will break this policy for ACTIVE_BACKUP mode.
      
      This patch will fix this problem by finding the old active slave and
      swap them MAC address before change active slave.
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Tested-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      13e4ceb7
    • Nikolay Aleksandrov's avatar
      bonding: fix destruction of bond with devices different from arphrd_ether · 09fad06d
      Nikolay Aleksandrov authored
      [ Upstream commit 06f6d109 ]
      
      When the bonding is being unloaded and the netdevice notifier is
      unregistered it executes NETDEV_UNREGISTER for each device which should
      remove the bond's proc entry but if the device enslaved is not of
      ARPHRD_ETHER type and is in front of the bonding, it may execute
      bond_release_and_destroy() first which would release the last slave and
      destroy the bond device leaving the proc entry and thus we will get the
      following error (with dynamic debug on for bond_netdev_event to see the
      events order):
      [  908.963051] eql: event: 9
      [  908.963052] eql: IFF_SLAVE
      [  908.963054] eql: event: 2
      [  908.963056] eql: IFF_SLAVE
      [  908.963058] eql: event: 6
      [  908.963059] eql: IFF_SLAVE
      [  908.963110] bond0: Releasing active interface eql
      [  908.976168] bond0: Destroying bond bond0
      [  908.976266] bond0 (unregistering): Released all slaves
      [  908.984097] ------------[ cut here ]------------
      [  908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575
      remove_proc_entry+0x112/0x160()
      [  908.984110] remove_proc_entry: removing non-empty directory
      'net/bonding', leaking at least 'bond0'
      [  908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss
      oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul
      crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper
      snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw
      gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec
      psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev
      drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core
      pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button
      autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom
      ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci
      virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore
      usb_common [last unloaded: bonding]
      
      [  908.984168] CPU: 0 PID: 1787 Comm: rmmod Tainted: G        W  O
      4.2.0-rc2+ #8
      [  908.984170] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  908.984172]  0000000000000000 ffffffff81732d41 ffffffff81525b34
      ffff8800358dfda8
      [  908.984175]  ffffffff8106c521 ffff88003595af78 ffff88003595af40
      ffff88003e3a4280
      [  908.984178]  ffffffffa058d040 0000000000000000 ffffffff8106c59a
      ffffffff8172ebd0
      [  908.984181] Call Trace:
      [  908.984188]  [<ffffffff81525b34>] ? dump_stack+0x40/0x50
      [  908.984193]  [<ffffffff8106c521>] ? warn_slowpath_common+0x81/0xb0
      [  908.984196]  [<ffffffff8106c59a>] ? warn_slowpath_fmt+0x4a/0x50
      [  908.984199]  [<ffffffff81218352>] ? remove_proc_entry+0x112/0x160
      [  908.984205]  [<ffffffffa05850e6>] ? bond_destroy_proc_dir+0x26/0x30
      [bonding]
      [  908.984208]  [<ffffffffa057540e>] ? bond_net_exit+0x8e/0xa0 [bonding]
      [  908.984217]  [<ffffffff8142f407>] ? ops_exit_list.isra.4+0x37/0x70
      [  908.984225]  [<ffffffff8142f52d>] ?
      unregister_pernet_operations+0x8d/0xd0
      [  908.984228]  [<ffffffff8142f58d>] ?
      unregister_pernet_subsys+0x1d/0x30
      [  908.984232]  [<ffffffffa0585269>] ? bonding_exit+0x23/0xdba [bonding]
      [  908.984236]  [<ffffffff810e28ba>] ? SyS_delete_module+0x18a/0x250
      [  908.984241]  [<ffffffff81086f99>] ? task_work_run+0x89/0xc0
      [  908.984244]  [<ffffffff8152b732>] ?
      entry_SYSCALL_64_fastpath+0x16/0x75
      [  908.984247] ---[ end trace 7c006ed4abbef24b ]---
      
      Thus remove the proc entry manually if bond_release_and_destroy() is
      used. Because of the checks in bond_remove_proc_entry() it's not a
      problem for a bond device to change namespaces (the bug fixed by the
      Fixes commit) but since commit
      f9399814 ("bonding: Don't allow bond devices to change network
      namespaces.") that can't happen anyway.
      Reported-by: default avatarCarol Soto <clsoto@linux.vnet.ibm.com>
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Fixes: a64d49c3 ("bonding: Manage /proc/net/bonding/ entries from
                            the netdev events")
      Tested-by: default avatarCarol L Soto <clsoto@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      09fad06d
    • Eric Dumazet's avatar
      ipv6: lock socket in ip6_datagram_connect() · 4bbded3d
      Eric Dumazet authored
      [ Upstream commit 03645a11 ]
      
      ip6_datagram_connect() is doing a lot of socket changes without
      socket being locked.
      
      This looks wrong, at least for udp_lib_rehash() which could corrupt
      lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4bbded3d
    • Tilman Schmidt's avatar
      isdn/gigaset: reset tty->receive_room when attaching ser_gigaset · 7cc24090
      Tilman Schmidt authored
      [ Upstream commit fd98e941 ]
      
      Commit 79901317 ("n_tty: Don't flush buffer when closing ldisc"),
      first merged in kernel release 3.10, caused the following regression
      in the Gigaset M101 driver:
      
      Before that commit, when closing the N_TTY line discipline in
      preparation to switching to N_GIGASET_M101, receive_room would be
      reset to a non-zero value by the call to n_tty_flush_buffer() in
      n_tty's close method. With the removal of that call, receive_room
      might be left at zero, blocking data reception on the serial line.
      
      The present patch fixes that regression by setting receive_room
      to an appropriate value in the ldisc open method.
      
      Fixes: 79901317 ("n_tty: Don't flush buffer when closing ldisc")
      Signed-off-by: default avatarTilman Schmidt <tilman@imap.cc>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7cc24090
    • Nikolay Aleksandrov's avatar
      bridge: mdb: fix double add notification · dbb5ff1d
      Nikolay Aleksandrov authored
      [ Upstream commit 5ebc7846 ]
      
      Since the mdb add/del code was introduced there have been 2 br_mdb_notify
      calls when doing br_mdb_add() resulting in 2 notifications on each add.
      
      Example:
       Command: bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent
       Before patch:
       root@debian:~# bridge monitor all
       [MDB]dev br0 port eth1 grp 239.0.0.1 permanent
       [MDB]dev br0 port eth1 grp 239.0.0.1 permanent
      
       After patch:
       root@debian:~# bridge monitor all
       [MDB]dev br0 port eth1 grp 239.0.0.1 permanent
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Fixes: cfd56754 ("bridge: add support of adding and deleting mdb entries")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      dbb5ff1d