1. 24 May, 2016 20 commits
  2. 20 May, 2016 1 commit
  3. 18 May, 2016 1 commit
  4. 12 May, 2016 1 commit
  5. 09 May, 2016 17 commits
    • Hariprasad S's avatar
      RDMA/iw_cxgb4: Fix bar2 virt addr calculation for T4 chips · 5fd753b3
      Hariprasad S authored
      commit 32cc92c7 upstream.
      
      For T4, kernel mode qps don't use the user doorbell. User mode qps during
      flow control db ringing are forced into kernel, where user doorbell is
      treated as kernel doorbell and proper bar2 offset in bar2 virtual space is
      calculated, which incase of T4 is a bogus address, causing a kernel panic
      due to illegal write during doorbell ringing.
      In case of T4, kernel mode qp bar2 virtual address should be 0. Added T4
      check during bar2 virtual address calculation to return 0. Fixed Bar2
      range checks based on bar2 physical address.
      
      The below oops will be fixed
      
        <1>BUG: unable to handle kernel paging request at 000000000002aa08
        <1>IP: [<ffffffffa011d800>] c4iw_uld_control+0x4e0/0x880 [iw_cxgb4]
        <4>PGD 1416a8067 PUD 15bf35067 PMD 0
        <4>Oops: 0002 [#1] SMP
        <4>last sysfs file:
        /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.4/infiniband/cxgb4_0/node_guid
        <4>CPU 5
        <4>Modules linked in: rdma_ucm rdma_cm ib_cm ib_sa ib_mad ib_uverbs
        ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE
        iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack
        ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge autofs4
        target_core_iblock target_core_file target_core_pscsi target_core_mod
        configfs bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc scsi_tgt 8021q
        garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf vhost_net macvtap
        macvlan tun kvm uinput microcode iTCO_wdt iTCO_vendor_support sg joydev
        serio_raw i2c_i801 i2c_core lpc_ich mfd_core e1000e ptp pps_core ioatdma dca
        i7core_edac edac_core shpchp ext3 jbd mbcache sd_mod crc_t10dif pata_acpi
        ata_generic ata_piix iw_cxgb4 iw_cm ib_core ib_addr cxgb4 ipv6 dm_mirror
        dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
        <4>
        Supermicro X8ST3/X8ST3
        <4>RIP: 0010:[<ffffffffa011d800>]  [<ffffffffa011d800>]
        c4iw_uld_control+0x4e0/0x880 [iw_cxgb4]
        <4>RSP: 0000:ffff880155a03db0  EFLAGS: 00010006
        <4>RAX: 000000000000001d RBX: ffff88013ae5fc00 RCX: ffff880155adb180
        <4>RDX: 000000000002aa00 RSI: 0000000000000001 RDI: ffff88013ae5fdf8
        <4>RBP: ffff880155a03e10 R08: 0000000000000000 R09: 0000000000000001
        <4>R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
        <4>R13: 000000000000001d R14: ffff880156414ab0 R15: ffffe8ffffc05b88
        <4>FS:  0000000000000000(0000) GS:ffff8800282a0000(0000) knlGS:0000000000000000
        <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
        <4>CR2: 000000000002aa08 CR3: 000000015bd0e000 CR4: 00000000000007e0
        <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
        <4>Process cxgb4 (pid: 394, threadinfo ffff880155a00000, task ffff880156414ab0)
        <4>Stack:
        <4> ffff880156415068 ffff880155adb180 ffff880155a03df0 ffffffffa00a344b
        <4><d> 00000000000003e8 ffff880155920000 0000000000000004 ffff880155920000
        <4><d> ffff88015592d438 ffffffffa00a3860 ffff880155a03fd8 ffffe8ffffc05b88
        <4>Call Trace:
        <4> [<ffffffffa00a344b>] ? enable_txq_db+0x2b/0x80 [cxgb4]
        <4> [<ffffffffa00a3860>] ? process_db_full+0x0/0xa0 [cxgb4]
        <4> [<ffffffffa00a38a6>] process_db_full+0x46/0xa0 [cxgb4]
        <4> [<ffffffff8109fda0>] worker_thread+0x170/0x2a0
        <4> [<ffffffff810a6aa0>] ? autoremove_wake_function+0x0/0x40
        <4> [<ffffffff8109fc30>] ? worker_thread+0x0/0x2a0
        <4> [<ffffffff810a660e>] kthread+0x9e/0xc0
        <4> [<ffffffff8100c28a>] child_rip+0xa/0x20
        <4> [<ffffffff810a6570>] ? kthread+0x0/0xc0
        <4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
        <4>Code: e9 ba 00 00 00 66 0f 1f 44 00 00 44 8b 05 29 07 02 00 45 85 c0 0f 85
        71 02 00 00 8b 83 70 01 00 00 45 0f b7 ed c1 e0 0f 44 09 e8 <89> 42 08 0f ae f8
        66 c7 83 82 01 00 00 00 00 44 0f b7 ab dc 01
        <1>RIP  [<ffffffffa011d800>] c4iw_uld_control+0x4e0/0x880 [iw_cxgb4]
        <4> RSP <ffff880155a03db0>
        <4>CR2: 000000000002aa08`
      
      Based on original work by Bharat Potnuri <bharat@chelsio.com>
      
      Fixes: 74217d4c ("iw_cxgb4: support for bar2 qid densities exceeding the page size")
      Signed-off-by: default avatarSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: default avatarHariprasad Shenai <hariprasad@chelsio.com>
      Reviewed-by: default avatarLeon Romanovsky <leon@leon.nu>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      5fd753b3
    • Jiri Benc's avatar
      cxgbi: fix uninitialized flowi6 · dc1a748a
      Jiri Benc authored
      commit 3d6d30d6 upstream.
      
      ip6_route_output looks into different fields in the passed flowi6 structure,
      yet cxgbi passes garbage in nearly all those fields. Zero the structure out
      first.
      
      Fixes: fc8d0590 ("libcxgbi: Add ipv6 api to driver")
      Signed-off-by: default avatarJiri Benc <jbenc@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      dc1a748a
    • Rana Shahout's avatar
      net/mlx5e: Fix MLX5E_100BASE_T define · 74daeff3
      Rana Shahout authored
      commit 6e4c2189 upstream.
      
      Bit 25 of eth_proto_capability in PTYS register is
      1000Base-TT and not 100Base-T.
      
      Fixes: f62b8bb8 ('net/mlx5: Extend mlx5_core to
      support ConnectX-4 Ethernet functionality')
      Signed-off-by: default avatarRana Shahout <ranas@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      74daeff3
    • Krzysztof Kozlowski's avatar
      ARM: EXYNOS: Properly skip unitialized parent clock in power domain on · 503f54a5
      Krzysztof Kozlowski authored
      commit a0a966b8 upstream.
      
      We want to skip reparenting a clock on turning on power domain, if we
      do not have the parent yet. The parent is obtained when turning the
      domain off. However due to a typo, the loop is continued on IS_ERR() of
      clock being reparented, not on the IS_ERR() of the parent.
      
      Theoretically this could lead to OOPS on first turn on of a power
      domain, if there was no turn off before. Practically that should never
      happen because all power domains are turned on by default (reset value,
      bootloader does not turn off them usually) so the first action will be
      always turn off.
      
      Fixes: 29e5eea0 ("ARM: EXYNOS: Get current parent clock for power domain on/off")
      Reported-by: default avatarVladimir Zapolskiy <vz@mleia.com>
      Signed-off-by: default avatarKrzysztof Kozlowski <k.kozlowski@samsung.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      503f54a5
    • Jason Baron's avatar
      mm: update min_free_kbytes from khugepaged after core initialization · 1032c8e5
      Jason Baron authored
      commit bc22af74 upstream.
      
      Khugepaged attempts to raise min_free_kbytes if its set too low.
      However, on boot khugepaged sets min_free_kbytes first from
      subsys_initcall(), and then the mm 'core' over-rides min_free_kbytes
      after from init_per_zone_wmark_min(), via a module_init() call.
      
      Khugepaged used to use a late_initcall() to set min_free_kbytes (such
      that it occurred after the core initialization), however this was
      removed when the initialization of min_free_kbytes was integrated into
      the starting of the khugepaged thread.
      
      The fix here is simply to invoke the core initialization using a
      core_initcall() instead of module_init(), such that the previous
      initialization ordering is restored.  I didn't restore the
      late_initcall() since start_stop_khugepaged() already sets
      min_free_kbytes via set_recommended_min_free_kbytes().
      
      This was noticed when we had a number of page allocation failures when
      moving a workload to a kernel with this new initialization ordering.  On
      an 8GB system this restores min_free_kbytes back to 67584 from 11365
      when CONFIG_TRANSPARENT_HUGEPAGE=y is set and either
      CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y or
      CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y.
      
      Fixes: 79553da2 ("thp: cleanup khugepaged startup")
      Signed-off-by: default avatarJason Baron <jbaron@akamai.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1032c8e5
    • Linus Lüssing's avatar
      batman-adv: Fix broadcast/ogm queue limit on a removed interface · 6fcce5bd
      Linus Lüssing authored
      commit c4fdb6cf upstream.
      
      When removing a single interface while a broadcast or ogm packet is
      still pending then we will free the forward packet without releasing the
      queue slots again.
      
      This patch is supposed to fix this issue.
      
      Fixes: 6d5808d4 ("batman-adv: Add missing hardif_free_ref in forw_packet_free")
      Signed-off-by: default avatarLinus Lüssing <linus.luessing@c0d3.blue>
      [sven@narfation.org: fix conflicts with current version]
      Signed-off-by: default avatarSven Eckelmann <sven@narfation.org>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarAntonio Quartulli <a@unstable.cc>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      6fcce5bd
    • Sven Eckelmann's avatar
      batman-adv: Check skb size before using encapsulated ETH+VLAN header · 51eff931
      Sven Eckelmann authored
      commit c7829666 upstream.
      
      The encapsulated ethernet and VLAN header may be outside the received
      ethernet frame. Thus the skb buffer size has to be checked before it can be
      parsed to find out if it encapsulates another batman-adv packet.
      
      Fixes: 42019357 ("batman-adv: softif bridge loop avoidance")
      Signed-off-by: default avatarSven Eckelmann <sven@narfation.org>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarAntonio Quartulli <a@unstable.cc>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      51eff931
    • Sven Eckelmann's avatar
      batman-adv: Reduce refcnt of removed router when updating route · 71ede9b0
      Sven Eckelmann authored
      commit d1a65f17 upstream.
      
      _batadv_update_route rcu_derefences orig_ifinfo->router outside of a
      spinlock protected region to print some information messages to the debug
      log. But this pointer is not checked again when the new pointer is assigned
      in the spinlock protected region. Thus is can happen that the value of
      orig_ifinfo->router changed in the meantime and thus the reference counter
      of the wrong router gets reduced after the spinlock protected region.
      
      Just rcu_dereferencing the value of orig_ifinfo->router inside the spinlock
      protected region (which also set the new pointer) is enough to get the
      correct old router object.
      
      Fixes: e1a5382f ("batman-adv: Make orig_node->router an rcu protected pointer")
      Signed-off-by: default avatarSven Eckelmann <sven@narfation.org>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarAntonio Quartulli <a@unstable.cc>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      71ede9b0
    • Eric Dumazet's avatar
      net/mlx4_en: fix spurious timestamping callbacks · 46f59986
      Eric Dumazet authored
      commit fc96256c upstream.
      
      When multiple skb are TX-completed in a row, we might incorrectly keep
      a timestamp of a prior skb and cause extra work.
      
      Fixes: ec693d47 ("net/mlx4_en: Add HW timestamping (TS) support")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Reviewed-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      46f59986
    • Guo-Fu Tseng's avatar
      jme: Fix device PM wakeup API usage · f099f0a8
      Guo-Fu Tseng authored
      commit 81422e67 upstream.
      
      According to Documentation/power/devices.txt
      
      The driver should not use device_set_wakeup_enable() which is the policy
      for user to decide.
      
      Using device_init_wakeup() to initialize dev->power.should_wakeup and
      dev->power.can_wakeup on driver initialization.
      
      And use device_may_wakeup() on suspend to decide if WoL function should
      be enabled on NIC.
      Reported-by: default avatarDiego Viola <diego.viola@gmail.com>
      Signed-off-by: default avatarGuo-Fu Tseng <cooldavid@cooldavid.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      f099f0a8
    • Guo-Fu Tseng's avatar
      jme: Do not enable NIC WoL functions on S0 · c429ec8f
      Guo-Fu Tseng authored
      commit 0772a99b upstream.
      
      Otherwise it might be back on resume right after going to suspend in
      some hardware.
      Reported-by: default avatarDiego Viola <diego.viola@gmail.com>
      Signed-off-by: default avatarGuo-Fu Tseng <cooldavid@cooldavid.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      c429ec8f
    • Dmitry V. Levin's avatar
      parisc: fix a bug when syscall number of tracee is __NR_Linux_syscalls · 47d24373
      Dmitry V. Levin authored
      commit f0b22d1b upstream.
      
      Do not load one entry beyond the end of the syscall table when the
      syscall number of a traced process equals to __NR_Linux_syscalls.
      Similar bug with regular processes was fixed by commit 3bb457af
      ("[PARISC] Fix bug when syscall nr is __NR_Linux_syscalls").
      
      This bug was found by strace test suite.
      Signed-off-by: default avatarDmitry V. Levin <ldv@altlinux.org>
      Acked-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      47d24373
    • Chen Yu's avatar
      x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO · df5d9f5e
      Chen Yu authored
      commit 886123fb upstream.
      
      Currently we read the tsc radio: ratio = (MSR_PLATFORM_INFO >> 8) & 0x1f;
      
      Thus we get bit 8-12 of MSR_PLATFORM_INFO, however according to the SDM
      (35.5), the ratio bits are bit 8-15.
      
      Ignoring the upper bits can result in an incorrect tsc ratio, which causes the
      TSC calibration and the Local APIC timer frequency to be incorrect.
      
      Fix this problem by masking 0xff instead.
      
      [ tglx: Massaged changelog ]
      
      Fixes: 7da7c156 "x86, tsc: Add static (MSR) TSC calibration on Intel Atom SoCs"
      Signed-off-by: default avatarChen Yu <yu.c.chen@intel.com>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Bin Gao <bin.gao@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: http://lkml.kernel.org/r/1462505619-5516-1-git-send-email-yu.c.chen@intel.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      df5d9f5e
    • Hugh Dickins's avatar
      mm, cma: prevent nr_isolated_* counters from going negative · b33d8d45
      Hugh Dickins authored
      commit 14af4a5e upstream.
      
      /proc/sys/vm/stat_refresh warns nr_isolated_anon and nr_isolated_file go
      increasingly negative under compaction: which would add delay when
      should be none, or no delay when should delay.  The bug in compaction
      was due to a recent mmotm patch, but much older instance of the bug was
      also noticed in isolate_migratepages_range() which is used for CMA and
      gigantic hugepage allocations.
      
      The bug is caused by putback_movable_pages() in an error path
      decrementing the isolated counters without them being previously
      incremented by acct_isolated().  Fix isolate_migratepages_range() by
      removing the error-path putback, thus reaching acct_isolated() with
      migratepages still isolated, and leaving putback to caller like most
      other places do.
      
      Fixes: edc2ca61 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
      [vbabka@suse.cz: expanded the changelog]
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      b33d8d45
    • Howard Cochran's avatar
      writeback: Fix performance regression in wb_over_bg_thresh() · 52f2abea
      Howard Cochran authored
      commit 74d36944 upstream.
      
      Commit 947e9762 ("writeback: update wb_over_bg_thresh() to use
      wb_domain aware operations") unintentionally changed this function's
      meaning from "are there more dirty pages than the background writeback
      threshold" to "are there more dirty pages than the writeback threshold".
      The background writeback threshold is typically half of the writeback
      threshold, so this had the effect of raising the number of dirty pages
      required to cause a writeback worker to perform background writeout.
      
      This can cause a very severe performance regression when a BDI uses
      BDI_CAP_STRICTLIMIT because balance_dirty_pages() and the writeback worker
      can now disagree on whether writeback should be initiated.
      
      For example, in a system having 1GB of RAM, a single spinning disk, and a
      "pass-through" FUSE filesystem mounted over the disk, application code
      mmapped a 128MB file on the disk and was randomly dirtying pages in that
      mapping.
      
      Because FUSE uses strictlimit and has a default max_ratio of only 1%, in
      balance_dirty_pages, thresh is ~200, bg_thresh is ~100, and the
      dirty_freerun_ceiling is the average of those, ~150. So, it pauses the
      dirtying processes when we have 151 dirty pages and wakes up a background
      writeback worker. But the worker tests the wrong threshold (200 instead of
      100), so it does not initiate writeback and just returns.
      
      Thus, balance_dirty_pages keeps looping, sleeping and then waking up the
      worker who will do nothing. It remains stuck in this state until the few
      dirty pages that we have finally expire and we write them back for that
      reason. Then the whole process repeats, resulting in near-zero throughput
      through the FUSE BDI.
      
      The fix is to call the parameterized variant of wb_calc_thresh, so that the
      worker will do writeback if the bg_thresh is exceeded which was the
      behavior before the referenced commit.
      
      Fixes: 947e9762 ("writeback: update wb_over_bg_thresh() to use wb_domain aware operations")
      Signed-off-by: default avatarHoward Cochran <hcochran@kernelspring.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Tested-by Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      52f2abea
    • Eric W. Biederman's avatar
      propogate_mnt: Handle the first propogated copy being a slave · 7aa7d365
      Eric W. Biederman authored
      commit 5ec0811d upstream.
      
      When the first propgated copy was a slave the following oops would result:
      > BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      > IP: [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0
      > PGD bacd4067 PUD bac66067 PMD 0
      > Oops: 0000 [#1] SMP
      > Modules linked in:
      > CPU: 1 PID: 824 Comm: mount Not tainted 4.6.0-rc5userns+ #1523
      > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      > task: ffff8800bb0a8000 ti: ffff8800bac3c000 task.ti: ffff8800bac3c000
      > RIP: 0010:[<ffffffff811fba4e>]  [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0
      > RSP: 0018:ffff8800bac3fd38  EFLAGS: 00010283
      > RAX: 0000000000000000 RBX: ffff8800bb77ec00 RCX: 0000000000000010
      > RDX: 0000000000000000 RSI: ffff8800bb58c000 RDI: ffff8800bb58c480
      > RBP: ffff8800bac3fd48 R08: 0000000000000001 R09: 0000000000000000
      > R10: 0000000000001ca1 R11: 0000000000001c9d R12: 0000000000000000
      > R13: ffff8800ba713800 R14: ffff8800bac3fda0 R15: ffff8800bb77ec00
      > FS:  00007f3c0cd9b7e0(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000
      > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      > CR2: 0000000000000010 CR3: 00000000bb79d000 CR4: 00000000000006e0
      > Stack:
      >  ffff8800bb77ec00 0000000000000000 ffff8800bac3fd88 ffffffff811fbf85
      >  ffff8800bac3fd98 ffff8800bb77f080 ffff8800ba713800 ffff8800bb262b40
      >  0000000000000000 0000000000000000 ffff8800bac3fdd8 ffffffff811f1da0
      > Call Trace:
      >  [<ffffffff811fbf85>] propagate_mnt+0x105/0x140
      >  [<ffffffff811f1da0>] attach_recursive_mnt+0x120/0x1e0
      >  [<ffffffff811f1ec3>] graft_tree+0x63/0x70
      >  [<ffffffff811f1f6b>] do_add_mount+0x9b/0x100
      >  [<ffffffff811f2c1a>] do_mount+0x2aa/0xdf0
      >  [<ffffffff8117efbe>] ? strndup_user+0x4e/0x70
      >  [<ffffffff811f3a45>] SyS_mount+0x75/0xc0
      >  [<ffffffff8100242b>] do_syscall_64+0x4b/0xa0
      >  [<ffffffff81988f3c>] entry_SYSCALL64_slow_path+0x25/0x25
      > Code: 00 00 75 ec 48 89 0d 02 22 22 01 8b 89 10 01 00 00 48 89 05 fd 21 22 01 39 8e 10 01 00 00 0f 84 e0 00 00 00 48 8b 80 d8 00 00 00 <48> 8b 50 10 48 89 05 df 21 22 01 48 89 15 d0 21 22 01 8b 53 30
      > RIP  [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0
      >  RSP <ffff8800bac3fd38>
      > CR2: 0000000000000010
      > ---[ end trace 2725ecd95164f217 ]---
      
      This oops happens with the namespace_sem held and can be triggered by
      non-root users.  An all around not pleasant experience.
      
      To avoid this scenario when finding the appropriate source mount to
      copy stop the walk up the mnt_master chain when the first source mount
      is encountered.
      
      Further rewrite the walk up the last_source mnt_master chain so that
      it is clear what is going on.
      
      The reason why the first source mount is special is that it it's
      mnt_parent is not a mount in the dest_mnt propagation tree, and as
      such termination conditions based up on the dest_mnt mount propgation
      tree do not make sense.
      
      To avoid other kinds of confusion last_dest is not changed when
      computing last_source.  last_dest is only used once in propagate_one
      and that is above the point of the code being modified, so changing
      the global variable is meaningless and confusing.
      
      fixes: f2ebb3a9 ("smarter propagate_mnt()")
      Reported-by: default avatarTycho Andersen <tycho.andersen@canonical.com>
      Reviewed-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      Tested-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      7aa7d365
    • Maxim Patlasov's avatar
      fs/pnode.c: treat zero mnt_group_id-s as unequal · a56967e3
      Maxim Patlasov authored
      commit 7ae8fd03 upstream.
      
      propagate_one(m) calculates "type" argument for copy_tree() like this:
      
      >    if (m->mnt_group_id == last_dest->mnt_group_id) {
      >        type = CL_MAKE_SHARED;
      >    } else {
      >        type = CL_SLAVE;
      >        if (IS_MNT_SHARED(m))
      >           type |= CL_MAKE_SHARED;
      >   }
      
      The "type" argument then governs clone_mnt() behavior with respect to flags
      and mnt_master of new mount. When we iterate through a slave group, it is
      possible that both current "m" and "last_dest" are not shared (although,
      both are slaves, i.e. have non-NULL mnt_master-s). Then the comparison
      above erroneously makes new mount shared and sets its mnt_master to
      last_source->mnt_master. The patch fixes the problem by handling zero
      mnt_group_id-s as though they are unequal.
      
      The similar problem exists in the implementation of "else" clause above
      when we have to ascend upward in the master/slave tree by calling:
      
      >    last_source = last_source->mnt_master;
      >    last_dest = last_source->mnt_parent;
      
      proper number of times. The last step is governed by
      "n->mnt_group_id != last_dest->mnt_group_id" condition that may lie if
      both are zero. The patch fixes this case in the same way as the former one.
      
      [AV: don't open-code an obvious helper...]
      Signed-off-by: default avatarMaxim Patlasov <mpatlasov@virtuozzo.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      a56967e3