1. 10 Jun, 2015 2 commits
    • Vladimir Davydov's avatar
      memcg: do not call reclaim if !__GFP_WAIT · 7d638093
      Vladimir Davydov authored
      When trimming memcg consumption excess (see memory.high), we call
      try_to_free_mem_cgroup_pages without checking if we are allowed to sleep
      in the current context, which can result in a deadlock.  Fix this.
      
      Fixes: 241994ed ("mm: memcontrol: default hierarchy interface for memory")
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7d638093
    • Gu Zheng's avatar
      mm/memory_hotplug.c: set zone->wait_table to null after freeing it · 85bd8399
      Gu Zheng authored
      Izumi found the following oops when hot re-adding a node:
      
          BUG: unable to handle kernel paging request at ffffc90008963690
          IP: __wake_up_bit+0x20/0x70
          Oops: 0000 [#1] SMP
          CPU: 68 PID: 1237 Comm: rs:main Q:Reg Not tainted 4.1.0-rc5 #80
          Hardware name: FUJITSU PRIMEQUEST2800E/SB, BIOS PRIMEQUEST 2000 Series BIOS Version 1.87 04/28/2015
          task: ffff880838df8000 ti: ffff880017b94000 task.ti: ffff880017b94000
          RIP: 0010:[<ffffffff810dff80>]  [<ffffffff810dff80>] __wake_up_bit+0x20/0x70
          RSP: 0018:ffff880017b97be8  EFLAGS: 00010246
          RAX: ffffc90008963690 RBX: 00000000003c0000 RCX: 000000000000a4c9
          RDX: 0000000000000000 RSI: ffffea101bffd500 RDI: ffffc90008963648
          RBP: ffff880017b97c08 R08: 0000000002000020 R09: 0000000000000000
          R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a0797c73800
          R13: ffffea101bffd500 R14: 0000000000000001 R15: 00000000003c0000
          FS:  00007fcc7ffff700(0000) GS:ffff880874800000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: ffffc90008963690 CR3: 0000000836761000 CR4: 00000000001407e0
          Call Trace:
            unlock_page+0x6d/0x70
            generic_write_end+0x53/0xb0
            xfs_vm_write_end+0x29/0x80 [xfs]
            generic_perform_write+0x10a/0x1e0
            xfs_file_buffered_aio_write+0x14d/0x3e0 [xfs]
            xfs_file_write_iter+0x79/0x120 [xfs]
            __vfs_write+0xd4/0x110
            vfs_write+0xac/0x1c0
            SyS_write+0x58/0xd0
            system_call_fastpath+0x12/0x76
          Code: 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 45 f8 31 c0 48 8d 47 48 <48> 39 47 48 48 c7 45 e8 00 00 00 00 48 c7 45 f0 00 00 00 00 48
          RIP  [<ffffffff810dff80>] __wake_up_bit+0x20/0x70
           RSP <ffff880017b97be8>
          CR2: ffffc90008963690
      
      Reproduce method (re-add a node)::
        Hot-add nodeA --> remove nodeA --> hot-add nodeA (panic)
      
      This seems an use-after-free problem, and the root cause is
      zone->wait_table was not set to *NULL* after free it in
      try_offline_node.
      
      When hot re-add a node, we will reuse the pgdat of it, so does the zone
      struct, and when add pages to the target zone, it will init the zone
      first (including the wait_table) if the zone is not initialized.  The
      judgement of zone initialized is based on zone->wait_table:
      
      	static inline bool zone_is_initialized(struct zone *zone)
      	{
      		return !!zone->wait_table;
      	}
      
      so if we do not set the zone->wait_table to *NULL* after free it, the
      memory hotplug routine will skip the init of new zone when hot re-add
      the node, and the wait_table still points to the freed memory, then we
      will access the invalid address when trying to wake up the waiting
      people after the i/o operation with the page is done, such as mentioned
      above.
      Signed-off-by: default avatarGu Zheng <guz.fnst@cn.fujitsu.com>
      Reported-by: default avatarTaku Izumi <izumi.taku@jp.fujitsu.com>
      Reviewed by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      85bd8399
  2. 09 Jun, 2015 1 commit
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 5879ae5f
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix stack allocation in s390 BPF JIT, from Michael Holzheu.
      
       2) Disable LRO on openvswitch paths, from Jiri Benc.
      
       3) UDP early demux doesn't handle multicast group membership properly,
          fix from Shawn Bohrer.
      
       4) Fix TX queue hang due to incorrect handling of mixed sized fragments
          and linearlization in i40e driver, from Anjali Singhai Jain.
      
       5) Cannot use disable_irq() in timer handler of AMD xgbe driver, from
          Thomas Lendacky.
      
       6) b2net driver improperly assumes pci_alloc_consistent() gives zero'd
          out memory, use dma_zalloc_coherent().  From Sriharsha Basavapatna.
      
       7) Fix use-after-free in MPLS and ipv6, from Robert Shearman.
      
       8) Missing neif_napi_del() calls in cleanup paths of b44 driver, from
          Hauke Mehrtens.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        net: replace last open coded skb_orphan_frags with function call
        net: bcmgenet: power on MII block for all MII modes
        ipv6: Fix protocol resubmission
        ipv6: fix possible use after free of dev stats
        b44: call netif_napi_del()
        bridge: disable softirqs around br_fdb_update to avoid lockup
        Revert "bridge: use _bh spinlock variant for br_fdb_update to avoid lockup"
        mpls: fix possible use after free of device
        be2net: Replace dma/pci_alloc_coherent() calls with dma_zalloc_coherent()
        bridge: use _bh spinlock variant for br_fdb_update to avoid lockup
        amd-xgbe: Use disable_irq_nosync from within timer function
        rhashtable: add missing import <linux/export.h>
        i40e: Make sure to be in VEB mode if SRIOV is enabled at probe
        i40e: start up in VEPA mode by default
        i40e/i40evf: Fix mixed size frags and linearization
        ipv4/udp: Verify multicast group is ours in upd_v4_early_demux()
        openvswitch: disable LRO
        s390/bpf: fix bpf frame pointer setup
        s390/bpf: fix stack allocation
      5879ae5f
  3. 08 Jun, 2015 15 commits
  4. 07 Jun, 2015 10 commits
  5. 06 Jun, 2015 11 commits
  6. 05 Jun, 2015 1 commit