1. 05 May, 2015 1 commit
    • Len Brown's avatar
      sched/idle/x86: Restore mwait_idle() to fix boot hangs, to improve power... · 420d57e8
      Len Brown authored
      sched/idle/x86: Restore mwait_idle() to fix boot hangs, to improve power savings and to improve performance
      
      commit b253149b upstream.
      
      In Linux-3.9 we removed the mwait_idle() loop:
      
        69fb3676 ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param")
      
      The reasoning was that modern machines should be sufficiently
      happy during the boot process using the default_idle() HALT
      loop, until cpuidle loads and either acpi_idle or intel_idle
      invoke the newer MWAIT-with-hints idle loop.
      
      But two machines reported problems:
      
       1. Certain Core2-era machines support MWAIT-C1 and HALT only.
          MWAIT-C1 is preferred for optimal power and performance.
          But if they support just C1, cpuidle never loads and
          so they use the boot-time default idle loop forever.
      
       2. Some laptops will boot-hang if HALT is used,
          but will boot successfully if MWAIT is used.
          This appears to be a hidden assumption in BIOS SMI,
          that is presumably valid on the proprietary OS
          where the BIOS was validated.
      
             https://bugzilla.kernel.org/show_bug.cgi?id=60770
      
      So here we effectively revert the patch above, restoring
      the mwait_idle() loop.  However, we don't bother restoring
      the idle=mwait cmdline parameter, since it appears to add
      no value.
      
      Maintainer notes:
      
        For 3.9, simply revert 69fb3676
        for 3.10, patch -F3 applies, fuzz needed due to __cpuinit use in
        context For 3.11, 3.12, 3.13, this patch applies cleanly
      Tested-by: default avatarMike Galbraith <bitbucket@online.de>
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      Acked-by: default avatarMike Galbraith <bitbucket@online.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ian Malone <ibmalone@gmail.com>
      Cc: Josh Boyer <jwboyer@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/345254a551eb5a6a866e048d7ab570fd2193aca4.1389763084.git.len.brown@intel.com
      [ Ported to recent kernels. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarMike Galbraith <efault@gmx.de>
      [ luis: backported to 3.16: used Mike's backport to 3.10 and 3.14 ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      420d57e8
  2. 04 May, 2015 19 commits
  3. 29 Apr, 2015 7 commits
    • Eric Dumazet's avatar
      net: fix crash in build_skb() · fcc9ecaa
      Eric Dumazet authored
      commit 2ea2f62c upstream.
      
      When I added pfmemalloc support in build_skb(), I forgot netlink
      was using build_skb() with a vmalloc() area.
      
      In this patch I introduce __build_skb() for netlink use,
      and build_skb() is a wrapper handling both skb->head_frag and
      skb->pfmemalloc
      
      This means netlink no longer has to hack skb->head_frag
      
      [ 1567.700067] kernel BUG at arch/x86/mm/physaddr.c:26!
      [ 1567.700067] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
      [ 1567.700067] Dumping ftrace buffer:
      [ 1567.700067]    (ftrace buffer empty)
      [ 1567.700067] Modules linked in:
      [ 1567.700067] CPU: 9 PID: 16186 Comm: trinity-c182 Not tainted 4.0.0-next-20150424-sasha-00037-g4796e21 #2167
      [ 1567.700067] task: ffff880127efb000 ti: ffff880246770000 task.ti: ffff880246770000
      [ 1567.700067] RIP: __phys_addr (arch/x86/mm/physaddr.c:26 (discriminator 3))
      [ 1567.700067] RSP: 0018:ffff8802467779d8  EFLAGS: 00010202
      [ 1567.700067] RAX: 000041000ed8e000 RBX: ffffc9008ed8e000 RCX: 000000000000002c
      [ 1567.700067] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffffb3fd6049
      [ 1567.700067] RBP: ffff8802467779f8 R08: 0000000000000019 R09: ffff8801d0168000
      [ 1567.700067] R10: ffff8801d01680c7 R11: ffffed003a02d019 R12: ffffc9000ed8e000
      [ 1567.700067] R13: 0000000000000f40 R14: 0000000000001180 R15: ffffc9000ed8e000
      [ 1567.700067] FS:  00007f2a7da3f700(0000) GS:ffff8801d1000000(0000) knlGS:0000000000000000
      [ 1567.700067] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1567.700067] CR2: 0000000000738308 CR3: 000000022e329000 CR4: 00000000000007e0
      [ 1567.700067] Stack:
      [ 1567.700067]  ffffc9000ed8e000 ffff8801d0168000 ffffc9000ed8e000 ffff8801d0168000
      [ 1567.700067]  ffff880246777a28 ffffffffad7c0a21 0000000000001080 ffff880246777c08
      [ 1567.700067]  ffff88060d302e68 ffff880246777b58 ffff880246777b88 ffffffffad9a6821
      [ 1567.700067] Call Trace:
      [ 1567.700067] build_skb (include/linux/mm.h:508 net/core/skbuff.c:316)
      [ 1567.700067] netlink_sendmsg (net/netlink/af_netlink.c:1633 net/netlink/af_netlink.c:2329)
      [ 1567.774369] ? sched_clock_cpu (kernel/sched/clock.c:311)
      [ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273)
      [ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273)
      [ 1567.774369] sock_sendmsg (net/socket.c:614 net/socket.c:623)
      [ 1567.774369] sock_write_iter (net/socket.c:823)
      [ 1567.774369] ? sock_sendmsg (net/socket.c:806)
      [ 1567.774369] __vfs_write (fs/read_write.c:479 fs/read_write.c:491)
      [ 1567.774369] ? get_lock_stats (kernel/locking/lockdep.c:249)
      [ 1567.774369] ? default_llseek (fs/read_write.c:487)
      [ 1567.774369] ? vtime_account_user (kernel/sched/cputime.c:701)
      [ 1567.774369] ? rw_verify_area (fs/read_write.c:406 (discriminator 4))
      [ 1567.774369] vfs_write (fs/read_write.c:539)
      [ 1567.774369] SyS_write (fs/read_write.c:586 fs/read_write.c:577)
      [ 1567.774369] ? SyS_read (fs/read_write.c:577)
      [ 1567.774369] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
      [ 1567.774369] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2594 kernel/locking/lockdep.c:2636)
      [ 1567.774369] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:42)
      [ 1567.774369] system_call_fastpath (arch/x86/kernel/entry_64.S:261)
      
      Fixes: 79930f58 ("net: do not deplete pfmemalloc reserve")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      fcc9ecaa
    • Eric Dumazet's avatar
      net: do not deplete pfmemalloc reserve · dff9eb05
      Eric Dumazet authored
      commit 79930f58 upstream.
      
      build_skb() should look at the page pfmemalloc status.
      If set, this means page allocator allocated this page in the
      expectation it would help to free other pages. Networking
      stack can do that only if skb->pfmemalloc is also set.
      
      Also, we must refrain using high order pages from the pfmemalloc
      reserve, so __page_frag_refill() must also use __GFP_NOMEMALLOC for
      them. Under memory pressure, using order-0 pages is probably the best
      strategy.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [ luis: backported to 3.16: used davem's backport to 3.14 ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      dff9eb05
    • Eric Dumazet's avatar
      tcp: avoid looping in tcp_send_fin() · 929b845f
      Eric Dumazet authored
      commit 845704a5 upstream.
      
      Presence of an unbound loop in tcp_send_fin() had always been hard
      to explain when analyzing crash dumps involving gigantic dying processes
      with millions of sockets.
      
      Lets try a different strategy :
      
      In case of memory pressure, try to add the FIN flag to last packet
      in write queue, even if packet was already sent. TCP stack will
      be able to deliver this FIN after a timeout event. Note that this
      FIN being delivered by a retransmit, it also carries a Push flag
      given our current implementation.
      
      By checking sk_under_memory_pressure(), we anticipate that cooking
      many FIN packets might deplete tcp memory.
      
      In the case we could not allocate a packet, even with __GFP_WAIT
      allocation, then not sending a FIN seems quite reasonable if it allows
      to get rid of this socket, free memory, and not block the process from
      eventually doing other useful work.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      929b845f
    • Eric Dumazet's avatar
      tcp: fix possible deadlock in tcp_send_fin() · 049ba123
      Eric Dumazet authored
      commit d83769a5 upstream.
      
      Using sk_stream_alloc_skb() in tcp_send_fin() is dangerous in
      case a huge process is killed by OOM, and tcp_mem[2] is hit.
      
      To be able to free memory we need to make progress, so this
      patch allows FIN packets to not care about tcp_mem[2], if
      skb allocation succeeded.
      
      In a follow-up patch, we might abort tcp_send_fin() infinite loop
      in case TIF_MEMDIE is set on this thread, as memory allocator
      did its best getting extra memory already.
      
      This patch reverts d22e1537 ("tcp: fix tcp fin memory accounting")
      
      Fixes: d22e1537 ("tcp: fix tcp fin memory accounting")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      049ba123
    • Tom Herbert's avatar
      ppp: call skb_checksum_complete_unset in ppp_receive_frame · a283d4e4
      Tom Herbert authored
      commit 3dfb0534 upstream.
      
      Call checksum_complete_unset in PPP receive to discard checksum-complete
      value. PPP does not pull checksum for headers and also modifies packet
      as in VJ compression.
      Signed-off-by: default avatarTom Herbert <tom@herbertland.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      a283d4e4
    • Tom Herbert's avatar
      net: add skb_checksum_complete_unset · c0fd4b0e
      Tom Herbert authored
      commit 4e18b9ad upstream.
      
      This function changes ip_summed to CHECKSUM_NONE if CHECKSUM_COMPLETE
      is set. This is called to discard checksum-complete when packet
      is being modified and checksum is not pulled for headers in a layer.
      Signed-off-by: default avatarTom Herbert <tom@herbertland.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      c0fd4b0e
    • Sebastian Pöhn's avatar
      ip_forward: Drop frames with attached skb->sk · ee446a4d
      Sebastian Pöhn authored
      commit 2ab95749 upstream.
      
      Initial discussion was:
      [FYI] xfrm: Don't lookup sk_policy for timewait sockets
      
      Forwarded frames should not have a socket attached. Especially
      tw sockets will lead to panics later-on in the stack.
      
      This was observed with TPROXY assigning a tw socket and broken
      policy routing (misconfigured). As a result frame enters
      forwarding path instead of input. We cannot solve this in
      TPROXY as it cannot know that policy routing is broken.
      
      v2:
      Remove useless comment
      Signed-off-by: default avatarSebastian Poehn <sebastian.poehn@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      ee446a4d
  4. 28 Apr, 2015 1 commit
  5. 27 Apr, 2015 12 commits