1. 08 Aug, 2014 29 commits
    • Tyler Hall's avatar
      slip: Fix deadlock in write_wakeup · ccd4bc19
      Tyler Hall authored
      [ Upstream commit 661f7fda ]
      
      Use schedule_work() to avoid potentially taking the spinlock in
      interrupt context.
      
      Commit cc9fa74e ("slip/slcan: added locking in wakeup function") added
      necessary locking to the wakeup function and 367525c8/ddcde142 ("can:
      slcan: Fix spinlock variant") converted it to spin_lock_bh() because the lock
      is also taken in timers.
      
      Disabling softirqs is not sufficient, however, as tty drivers may call
      write_wakeup from interrupt context. This driver calls tty->ops->write() with
      its spinlock held, which may immediately cause an interrupt on the same CPU and
      subsequent spin_bug().
      
      Simply converting to spin_lock_irq/irqsave() prevents this deadlock, but
      causes lockdep to point out a possible circular locking dependency
      between these locks:
      
      (&(&sl->lock)->rlock){-.....}, at: slip_write_wakeup
      (&port_lock_key){-.....}, at: serial8250_handle_irq.part.13
      
      The slip transmit is holding the slip spinlock when calling the tty write.
      This grabs the port lock. On an interrupt, the handler grabs the port
      lock and calls write_wakeup which grabs the slip lock. This could be a
      problem if a serial interrupt occurs on another CPU during the slip
      transmit.
      
      To deal with these issues, don't grab the lock in the wakeup function by
      deferring the writeout to a workqueue. Also hold the lock during close
      when de-assigning the tty pointer to safely disarm the worker and
      timers.
      
      This bug is easily reproducible on the first transmit when slip is
      used with the standard 8250 serial driver.
      
      [<c0410b7c>] (spin_bug+0x0/0x38) from [<c006109c>] (do_raw_spin_lock+0x60/0x1d0)
       r5:eab27000 r4:ec02754c
      [<c006103c>] (do_raw_spin_lock+0x0/0x1d0) from [<c04185c0>] (_raw_spin_lock+0x28/0x2c)
       r10:0000001f r9:eabb814c r8:eabb8140 r7:40070193 r6:ec02754c r5:eab27000
       r4:ec02754c r3:00000000
      [<c0418598>] (_raw_spin_lock+0x0/0x2c) from [<bf3a0220>] (slip_write_wakeup+0x50/0xe0 [slip])
       r4:ec027540 r3:00000003
      [<bf3a01d0>] (slip_write_wakeup+0x0/0xe0 [slip]) from [<c026e420>] (tty_wakeup+0x48/0x68)
       r6:00000000 r5:ea80c480 r4:eab27000 r3:bf3a01d0
      [<c026e3d8>] (tty_wakeup+0x0/0x68) from [<c028a8ec>] (uart_write_wakeup+0x2c/0x30)
       r5:ed68ea90 r4:c06790d8
      [<c028a8c0>] (uart_write_wakeup+0x0/0x30) from [<c028dc44>] (serial8250_tx_chars+0x114/0x170)
      [<c028db30>] (serial8250_tx_chars+0x0/0x170) from [<c028dffc>] (serial8250_handle_irq+0xa0/0xbc)
       r6:000000c2 r5:00000060 r4:c06790d8 r3:00000000
      [<c028df5c>] (serial8250_handle_irq+0x0/0xbc) from [<c02933a4>] (dw8250_handle_irq+0x38/0x64)
       r7:00000000 r6:edd2f390 r5:000000c2 r4:c06790d8
      [<c029336c>] (dw8250_handle_irq+0x0/0x64) from [<c028d2f4>] (serial8250_interrupt+0x44/0xc4)
       r6:00000000 r5:00000000 r4:c06791c4 r3:c029336c
      [<c028d2b0>] (serial8250_interrupt+0x0/0xc4) from [<c0067fe4>] (handle_irq_event_percpu+0xb4/0x2b0)
       r10:c06790d8 r9:eab27000 r8:00000000 r7:00000000 r6:0000001f r5:edd52980
       r4:ec53b6c0 r3:c028d2b0
      [<c0067f30>] (handle_irq_event_percpu+0x0/0x2b0) from [<c006822c>] (handle_irq_event+0x4c/0x6c)
       r10:c06790d8 r9:eab27000 r8:c0673ae0 r7:c05c2020 r6:ec53b6c0 r5:edd529d4
       r4:edd52980
      [<c00681e0>] (handle_irq_event+0x0/0x6c) from [<c006b140>] (handle_level_irq+0xe8/0x100)
       r6:00000000 r5:edd529d4 r4:edd52980 r3:00022000
      [<c006b058>] (handle_level_irq+0x0/0x100) from [<c00676f8>] (generic_handle_irq+0x30/0x40)
       r5:0000001f r4:0000001f
      [<c00676c8>] (generic_handle_irq+0x0/0x40) from [<c000f57c>] (handle_IRQ+0xd0/0x13c)
       r4:ea997b18 r3:000000e0
      [<c000f4ac>] (handle_IRQ+0x0/0x13c) from [<c00086c4>] (armada_370_xp_handle_irq+0x4c/0x118)
       r8:000003ff r7:ea997b18 r6:ffffffff r5:60070013 r4:c0674dc0
      [<c0008678>] (armada_370_xp_handle_irq+0x0/0x118) from [<c0013840>] (__irq_svc+0x40/0x70)
      Exception stack(0xea997b18 to 0xea997b60)
      7b00:                                                       00000001 20070013
      7b20: 00000000 0000000b 20070013 eab27000 20070013 00000000 ed10103e eab27000
      7b40: c06790d8 ea997b74 ea997b60 ea997b60 c04186c0 c04186c8 60070013 ffffffff
       r9:eab27000 r8:ed10103e r7:ea997b4c r6:ffffffff r5:60070013 r4:c04186c8
      [<c04186a4>] (_raw_spin_unlock_irqrestore+0x0/0x54) from [<c0288fc0>] (uart_start+0x40/0x44)
       r4:c06790d8 r3:c028ddd8
      [<c0288f80>] (uart_start+0x0/0x44) from [<c028982c>] (uart_write+0xe4/0xf4)
       r6:0000003e r5:00000000 r4:ed68ea90 r3:0000003e
      [<c0289748>] (uart_write+0x0/0xf4) from [<bf3a0d20>] (sl_xmit+0x1c4/0x228 [slip])
       r10:ed388e60 r9:0000003c r8:ffffffdd r7:0000003e r6:ec02754c r5:ea717eb8
       r4:ec027000
      [<bf3a0b5c>] (sl_xmit+0x0/0x228 [slip]) from [<c0368d74>] (dev_hard_start_xmit+0x39c/0x6d0)
       r8:eaf163c0 r7:ec027000 r6:ea717eb8 r5:00000000 r4:00000000
      Signed-off-by: default avatarTyler Hall <tylerwhall@gmail.com>
      Cc: Oliver Hartkopp <socketcan@hartkopp.net>
      Cc: Andre Naujoks <nautsch2@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ccd4bc19
    • Dmitry Popov's avatar
      ip_tunnel: fix ip_tunnel_lookup · 50c2bd26
      Dmitry Popov authored
      [ Upstream commit e0056593 ]
      
      This patch fixes 3 similar bugs where incoming packets might be routed into
      wrong non-wildcard tunnels:
      
      1) Consider the following setup:
          ip address add 1.1.1.1/24 dev eth0
          ip address add 1.1.1.2/24 dev eth0
          ip tunnel add ipip1 remote 2.2.2.2 local 1.1.1.1 mode ipip dev eth0
          ip link set ipip1 up
      
      Incoming ipip packets from 2.2.2.2 were routed into ipip1 even if it has dst =
      1.1.1.2. Moreover even if there was wildcard tunnel like
         ip tunnel add ipip0 remote 2.2.2.2 local any mode ipip dev eth0
      but it was created before explicit one (with local 1.1.1.1), incoming ipip
      packets with src = 2.2.2.2 and dst = 1.1.1.2 were still routed into ipip1.
      
      Same issue existed with all tunnels that use ip_tunnel_lookup (gre, vti)
      
      2)  ip address add 1.1.1.1/24 dev eth0
          ip tunnel add ipip1 remote 2.2.146.85 local 1.1.1.1 mode ipip dev eth0
          ip link set ipip1 up
      
      Incoming ipip packets with dst = 1.1.1.1 were routed into ipip1, no matter what
      src address is. Any remote ip address which has ip_tunnel_hash = 0 raised this
      issue, 2.2.146.85 is just an example, there are more than 4 million of them.
      And again, wildcard tunnel like
         ip tunnel add ipip0 remote any local 1.1.1.1 mode ipip dev eth0
      wouldn't be ever matched if it was created before explicit tunnel like above.
      
      Gre & vti tunnels had the same issue.
      
      3)  ip address add 1.1.1.1/24 dev eth0
          ip tunnel add gre1 remote 2.2.146.84 local 1.1.1.1 key 1 mode gre dev eth0
          ip link set gre1 up
      
      Any incoming gre packet with key = 1 were routed into gre1, no matter what
      src/dst addresses are. Any remote ip address which has ip_tunnel_hash = 0 raised
      the issue, 2.2.146.84 is just an example, there are more than 4 million of them.
      Wildcard tunnel like
         ip tunnel add gre2 remote any local any key 1 mode gre dev eth0
      wouldn't be ever matched if it was created before explicit tunnel like above.
      
      All this stuff happened because while looking for a wildcard tunnel we didn't
      check that matched tunnel is a wildcard one. Fixed.
      Signed-off-by: default avatarDmitry Popov <ixaphire@qrator.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      50c2bd26
    • Wei Yang's avatar
      net/mlx4_core: Preserve pci_dev_data after __mlx4_remove_one() · b2710b82
      Wei Yang authored
      [ Upstream commit befdf897 ]
      
      pci_match_id() just match the static pci_device_id, which may return NULL if
      someone binds the driver to a device manually using
      /sys/bus/pci/drivers/.../new_id.
      
      This patch wrap up a helper function __mlx4_remove_one() which does the tear
      down function but preserve the drv_data. Functions like
      mlx4_pci_err_detected() and mlx4_restart_one() will call this one with out
      releasing drvdata.
      
      Fixes: 97a5221f "net/mlx4_core: pass pci_device_id.driver_data to __mlx4_init_one during reset".
      
      CC: Bjorn Helgaas <bhelgaas@google.com>
      CC: Amir Vadai <amirv@mellanox.com>
      CC: Jack Morgenstein <jackm@dev.mellanox.co.il>
      CC: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarWei Yang <weiyang@linux.vnet.ibm.com>
      Acked-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      b2710b82
    • Wei Yang's avatar
      net/mlx4_core: pass pci_device_id.driver_data to __mlx4_init_one during reset · 55264a65
      Wei Yang authored
      [ No upstream commit, this is a cherry picked backport enabler. ]
      
      The second parameter of __mlx4_init_one() is used to identify whether the
      pci_dev is a PF or VF. Currently, when it is invoked in mlx4_pci_slot_reset()
      this information is missed.
      
      This patch match the pci_dev with mlx4_pci_table and passes the
      pci_device_id.driver_data to __mlx4_init_one() in mlx4_pci_slot_reset().
      Signed-off-by: default avatarWei Yang <weiyang@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      55264a65
    • Eric Dumazet's avatar
      udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookup · d3412627
      Eric Dumazet authored
      [ Upstream commit 63c6f81c ]
      
      Its too easy to add thousand of UDP sockets on a particular bucket,
      and slow down an innocent multicast receiver.
      
      Early demux is supposed to be an optimization, we should avoid spending
      too much time in it.
      
      It is interesting to note __udp4_lib_demux_lookup() only tries to
      match first socket in the chain.
      
      10 is the threshold we already have in __udp4_lib_lookup() to switch
      to secondary hash.
      
      Fixes: 421b3885 ("udp: ipv4: Add udp early demux")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDavid Held <drheld@google.com>
      Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      d3412627
    • Cong Wang's avatar
      vxlan: use dev->needed_headroom instead of dev->hard_header_len · ef442ce9
      Cong Wang authored
      [ Upstream commit 2853af6a ]
      
      When we mirror packets from a vxlan tunnel to other device,
      the mirror device should see the same packets (that is, without
      outer header). Because vxlan tunnel sets dev->hard_header_len,
      tcf_mirred() resets mac header back to outer mac, the mirror device
      actually sees packets with outer headers
      
      Vxlan tunnel should set dev->needed_headroom instead of
      dev->hard_header_len, like what other ip tunnels do. This fixes
      the above problem.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: stephen hemminger <stephen@networkplumber.org>
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: default avatarCong Wang <cwang@twopensource.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ef442ce9
    • Michal Schmidt's avatar
      rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 · afadafe7
      Michal Schmidt authored
      [ Upstream commit e5eca6d4 ]
      
      When running RHEL6 userspace on a current upstream kernel, "ip link"
      fails to show VF information.
      
      The reason is a kernel<->userspace API change introduced by commit
      88c5b5ce ("rtnetlink: Call nlmsg_parse() with correct header length"),
      after which the kernel does not see iproute2's IFLA_EXT_MASK attribute
      in the netlink request.
      
      iproute2 adjusted for the API change in its commit 63338dca4513
      ("libnetlink: Use ifinfomsg instead of rtgenmsg in rtnl_wilddump_req_filter").
      
      The problem has been noticed before:
      http://marc.info/?l=linux-netdev&m=136692296022182&w=2
      (Subject: Re: getting VF link info seems to be broken in 3.9-rc8)
      
      We can do better than tell those with old userspace to upgrade. We can
      recognize the old iproute2 in the kernel by checking the netlink message
      length. Even when including the IFLA_EXT_MASK attribute, its netlink
      message is shorter than struct ifinfomsg.
      
      With this patch "ip link" shows VF information in both old and new
      iproute2 versions.
      Signed-off-by: default avatarMichal Schmidt <mschmidt@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      afadafe7
    • Eric Dumazet's avatar
      ipv4: fix a race in ip4_datagram_release_cb() · d32874a9
      Eric Dumazet authored
      [ Upstream commit 9709674e ]
      
      Alexey gave a AddressSanitizer[1] report that finally gave a good hint
      at where was the origin of various problems already reported by Dormando
      in the past [2]
      
      Problem comes from the fact that UDP can have a lockless TX path, and
      concurrent threads can manipulate sk_dst_cache, while another thread,
      is holding socket lock and calls __sk_dst_set() in
      ip4_datagram_release_cb() (this was added in linux-3.8)
      
      It seems that all we need to do is to use sk_dst_check() and
      sk_dst_set() so that all the writers hold same spinlock
      (sk->sk_dst_lock) to prevent corruptions.
      
      TCP stack do not need this protection, as all sk_dst_cache writers hold
      the socket lock.
      
      [1]
      https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel
      
      AddressSanitizer: heap-use-after-free in ipv4_dst_check
      Read of size 2 by thread T15453:
       [<ffffffff817daa3a>] ipv4_dst_check+0x1a/0x90 ./net/ipv4/route.c:1116
       [<ffffffff8175b789>] __sk_dst_check+0x89/0xe0 ./net/core/sock.c:531
       [<ffffffff81830a36>] ip4_datagram_release_cb+0x46/0x390 ??:0
       [<ffffffff8175eaea>] release_sock+0x17a/0x230 ./net/core/sock.c:2413
       [<ffffffff81830882>] ip4_datagram_connect+0x462/0x5d0 ??:0
       [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
       [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
       [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682
       [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b
      ./arch/x86/kernel/entry_64.S:629
      
      Freed by thread T15455:
       [<ffffffff8178d9b8>] dst_destroy+0xa8/0x160 ./net/core/dst.c:251
       [<ffffffff8178de25>] dst_release+0x45/0x80 ./net/core/dst.c:280
       [<ffffffff818304c1>] ip4_datagram_connect+0xa1/0x5d0 ??:0
       [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
       [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
       [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682
       [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b
      ./arch/x86/kernel/entry_64.S:629
      
      Allocated by thread T15453:
       [<ffffffff8178d291>] dst_alloc+0x81/0x2b0 ./net/core/dst.c:171
       [<ffffffff817db3b7>] rt_dst_alloc+0x47/0x50 ./net/ipv4/route.c:1406
       [<     inlined    >] __ip_route_output_key+0x3e8/0xf70
      __mkroute_output ./net/ipv4/route.c:1939
       [<ffffffff817dde08>] __ip_route_output_key+0x3e8/0xf70 ./net/ipv4/route.c:2161
       [<ffffffff817deb34>] ip_route_output_flow+0x14/0x30 ./net/ipv4/route.c:2249
       [<ffffffff81830737>] ip4_datagram_connect+0x317/0x5d0 ??:0
       [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
       [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
       [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682
       [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b
      ./arch/x86/kernel/entry_64.S:629
      
      [2]
      <4>[196727.311203] general protection fault: 0000 [#1] SMP
      <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio
      <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1
      <4>[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
      <4>[196727.311364] task: ffff885e6f069700 ti: ffff885e6f072000 task.ti: ffff885e6f072000
      <4>[196727.311377] RIP: 0010:[<ffffffff815f8c7f>]  [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80
      <4>[196727.311399] RSP: 0018:ffff885effd23a70  EFLAGS: 00010282
      <4>[196727.311409] RAX: dead000000200200 RBX: ffff8854c398ecc0 RCX: 0000000000000040
      <4>[196727.311423] RDX: dead000000100100 RSI: dead000000100100 RDI: dead000000200200
      <4>[196727.311437] RBP: ffff885effd23a80 R08: ffffffff815fd9e0 R09: ffff885d5a590800
      <4>[196727.311451] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      <4>[196727.311464] R13: ffffffff81c8c280 R14: 0000000000000000 R15: ffff880e85ee16ce
      <4>[196727.311510] FS:  0000000000000000(0000) GS:ffff885effd20000(0000) knlGS:0000000000000000
      <4>[196727.311554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4>[196727.311581] CR2: 00007a46751eb000 CR3: 0000005e65688000 CR4: 00000000000407e0
      <4>[196727.311625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>[196727.311669] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>[196727.311713] Stack:
      <4>[196727.311733]  ffff8854c398ecc0 ffff8854c398ecc0 ffff885effd23ab0 ffffffff815b7f42
      <4>[196727.311784]  ffff88be6595bc00 ffff8854c398ecc0 0000000000000000 ffff8854c398ecc0
      <4>[196727.311834]  ffff885effd23ad0 ffffffff815b86c6 ffff885d5a590800 ffff8816827821c0
      <4>[196727.311885] Call Trace:
      <4>[196727.311907]  <IRQ>
      <4>[196727.311912]  [<ffffffff815b7f42>] dst_destroy+0x32/0xe0
      <4>[196727.311959]  [<ffffffff815b86c6>] dst_release+0x56/0x80
      <4>[196727.311986]  [<ffffffff81620bd5>] tcp_v4_do_rcv+0x2a5/0x4a0
      <4>[196727.312013]  [<ffffffff81622b5a>] tcp_v4_rcv+0x7da/0x820
      <4>[196727.312041]  [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360
      <4>[196727.312070]  [<ffffffff815de02d>] ? nf_hook_slow+0x7d/0x150
      <4>[196727.312097]  [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360
      <4>[196727.312125]  [<ffffffff815fda92>] ip_local_deliver_finish+0xb2/0x230
      <4>[196727.312154]  [<ffffffff815fdd9a>] ip_local_deliver+0x4a/0x90
      <4>[196727.312183]  [<ffffffff815fd799>] ip_rcv_finish+0x119/0x360
      <4>[196727.312212]  [<ffffffff815fe00b>] ip_rcv+0x22b/0x340
      <4>[196727.312242]  [<ffffffffa0339680>] ? macvlan_broadcast+0x160/0x160 [macvlan]
      <4>[196727.312275]  [<ffffffff815b0c62>] __netif_receive_skb_core+0x512/0x640
      <4>[196727.312308]  [<ffffffff811427fb>] ? kmem_cache_alloc+0x13b/0x150
      <4>[196727.312338]  [<ffffffff815b0db1>] __netif_receive_skb+0x21/0x70
      <4>[196727.312368]  [<ffffffff815b0fa1>] netif_receive_skb+0x31/0xa0
      <4>[196727.312397]  [<ffffffff815b1ae8>] napi_gro_receive+0xe8/0x140
      <4>[196727.312433]  [<ffffffffa00274f1>] ixgbe_poll+0x551/0x11f0 [ixgbe]
      <4>[196727.312463]  [<ffffffff815fe00b>] ? ip_rcv+0x22b/0x340
      <4>[196727.312491]  [<ffffffff815b1691>] net_rx_action+0x111/0x210
      <4>[196727.312521]  [<ffffffff815b0db1>] ? __netif_receive_skb+0x21/0x70
      <4>[196727.312552]  [<ffffffff810519d0>] __do_softirq+0xd0/0x270
      <4>[196727.312583]  [<ffffffff816cef3c>] call_softirq+0x1c/0x30
      <4>[196727.312613]  [<ffffffff81004205>] do_softirq+0x55/0x90
      <4>[196727.312640]  [<ffffffff81051c85>] irq_exit+0x55/0x60
      <4>[196727.312668]  [<ffffffff816cf5c3>] do_IRQ+0x63/0xe0
      <4>[196727.312696]  [<ffffffff816c5aaa>] common_interrupt+0x6a/0x6a
      <4>[196727.312722]  <EOI>
      <1>[196727.313071] RIP  [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80
      <4>[196727.313100]  RSP <ffff885effd23a70>
      <4>[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]---
      <0>[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt
      Reported-by: default avatarAlexey Preobrazhensky <preobr@google.com>
      Reported-by: default avatardormando <dormando@rydia.ne>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Fixes: 8141ed9f ("ipv4: Add a socket release callback for datagram sockets")
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      d32874a9
    • Jon Cooper's avatar
      sfc: PIO:Restrict to 64bit arch and use 64-bit writes. · 36b4e591
      Jon Cooper authored
      [ Upstream commit daf37b55 ]
      
      Fixes:ee45fd92
      ("sfc: Use TX PIO for sufficiently small packets")
      
      The linux net driver uses memcpy_toio() in order to copy into
      the PIO buffers.
      Even on a 64bit machine this causes 32bit accesses to a write-
      combined memory region.
      There are hardware limitations that mean that only 64bit
      naturally aligned accesses are safe in all cases.
      Due to being write-combined memory region two 32bit accesses
      may be coalesced to form a 64bit non 64bit aligned access.
      Solution was to open-code the memory copy routines using pointers
      and to only enable PIO for x86_64 machines.
      
      Not tested on platforms other than x86_64 because this patch
      disables the PIO feature on other platforms.
      Compile-tested on x86 to ensure that works.
      
      The WARN_ON_ONCE() code in the previous version of this patch
      has been moved into the internal sfc debug driver as the
      assertion was unnecessary in the upstream kernel code.
      
      This bug fix applies to v3.13 and v3.14 stable branches.
      Signed-off-by: default avatarShradha Shah <sshah@solarflare.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      36b4e591
    • Dmitry Popov's avatar
      ipip, sit: fix ipv4_{update_pmtu,redirect} calls · 4b7abc6c
      Dmitry Popov authored
      [ Upstream commit 2346829e ]
      
      ipv4_{update_pmtu,redirect} were called with tunnel's ifindex (t->dev is a
      tunnel netdevice). It caused wrong route lookup and failure of pmtu update or
      redirect. We should use the same ifindex that we use in ip_route_output_* in
      *tunnel_xmit code. It is t->parms.link .
      Signed-off-by: default avatarDmitry Popov <ixaphire@qrator.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      4b7abc6c
    • Eric Dumazet's avatar
      net: force a list_del() in unregister_netdevice_many() · b31f6907
      Eric Dumazet authored
      [ Upstream commit 87757a91 ]
      
      unregister_netdevice_many() API is error prone and we had too
      many bugs because of dangling LIST_HEAD on stacks.
      
      See commit f87e6f47 ("net: dont leave active on stack LIST_HEAD")
      
      In fact, instead of making sure no caller leaves an active list_head,
      just force a list_del() in the callee. No one seems to need to access
      the list after unregister_netdevice_many()
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      b31f6907
    • Bjørn Mork's avatar
      net: qmi_wwan: add Olivetti Olicard modems · bf66e576
      Bjørn Mork authored
      [ Upstream commit ba6de0f5 ]
      
      Lars writes: "I'm only 99% sure that the net interfaces are qmi
      interfaces, nothing to lose by adding them in my opinion."
      
      And I tend to agree based on the similarity with the two Olicard
      modems we already have here.
      Reported-by: default avatarLars Melin <larsm17@gmail.com>
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      bf66e576
    • Alexei Starovoitov's avatar
      net: filter: fix sparc32 typo · 6e313e54
      Alexei Starovoitov authored
      [ Upstream commit 588f5d62 ]
      
      Fixes: 569810d1 ("net: filter: fix typo in sparc BPF JIT")
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      6e313e54
    • Alexei Starovoitov's avatar
      net: filter: fix typo in sparc BPF JIT · edda90ea
      Alexei Starovoitov authored
      [ Upstream commit 569810d1 ]
      
      fix typo in sparc codegen for SKF_AD_IFINDEX and SKF_AD_HATYPE
      classic BPF extensions
      
      Fixes: 2809a208 ("net: filter: Just In Time compiler for sparc")
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      edda90ea
    • Sergei Shtylyov's avatar
      sh_eth: fix SH7619/771x support · abf517e3
      Sergei Shtylyov authored
      [ Upstream commit d8b0426a ]
      
      Commit 4a55530f (net: sh_eth: modify the definitions of register) managed
      to leave out the E-DMAC register entries in sh_eth_offset_fast_sh3_sh2[], thus
      totally breaking SH7619/771x support.  Add the missing entries using  the data
      from before that commit.
      Signed-off-by: default avatarSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Acked-by: default avatarYoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      abf517e3
    • Ben Dooks's avatar
      sh_eth: use RNC mode for packet reception · fc2e4a20
      Ben Dooks authored
      [ Upstream commit 530aa2d0 ]
      
      The current behaviour of the sh_eth driver is not to use the RNC bit
      for the receive ring. This means that every packet recieved is not only
      generating an IRQ but it also stops the receive ring DMA as well until
      the driver re-enables it after unloading the packet.
      
      This means that a number of the following errors are generated due to
      the receive packet FIFO overflowing due to nowhere to put packets:
      
      	net eth0: Receive FIFO Overflow
      
      Since feedback from Yoshihiro Shimoda shows that every supported LSI
      for this driver should have the bit enabled it seems the best way is
      to remove the RMCR default value from the per-system data and just
      write it when initialising the RMCR value. This is discussed in
      the message (http://www.spinics.net/lists/netdev/msg284912.html).
      
      I have tested the RMCR_RNC configuration with NFS root filesystem and
      the driver has not failed yet.  There are further test reports from
      Sergei Shtylov and others for both the R8A7790 and R8A7791.
      
      There is also feedback fron Cao Minh Hiep[1] which reports the
      same issue in (http://comments.gmane.org/gmane.linux.network/316285)
      showing this fixes issues with losing UDP datagrams under iperf.
      Tested-by: default avatarSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: default avatarBen Dooks <ben.dooks@codethink.co.uk>
      Acked-by: default avatarYoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Acked-by: default avatarSimon Horman <horms+renesas@verge.net.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      fc2e4a20
    • Yuchung Cheng's avatar
      tcp: fix cwnd undo on DSACK in F-RTO · ce2d4ab4
      Yuchung Cheng authored
      [ Upstream commit 0cfa5c07 ]
      
      This bug is discovered by an recent F-RTO issue on tcpm list
      https://www.ietf.org/mail-archive/web/tcpm/current/msg08794.html
      
      The bug is that currently F-RTO does not use DSACK to undo cwnd in
      certain cases: upon receiving an ACK after the RTO retransmission in
      F-RTO, and the ACK has DSACK indicating the retransmission is spurious,
      the sender only calls tcp_try_undo_loss() if some never retransmisted
      data is sacked (FLAG_ORIG_DATA_SACKED).
      
      The correct behavior is to unconditionally call tcp_try_undo_loss so
      the DSACK information is used properly to undo the cwnd reduction.
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ce2d4ab4
    • Jiri Pirko's avatar
      team: fix mtu setting · 3691c776
      Jiri Pirko authored
      [ Upstream commit 9d0d68fa ]
      
      Now it is not possible to set mtu to team device which has a port
      enslaved to it. The reason is that when team_change_mtu() calls
      dev_set_mtu() for port device, notificator for NETDEV_PRECHANGEMTU
      event is called and team_device_event() returns NOTIFY_BAD forbidding
      the change. So fix this by returning NOTIFY_DONE here in case team is
      changing mtu in team_change_mtu().
      
      Introduced-by: 3d249d4c "net: introduce ethernet teaming device"
      Signed-off-by: default avatarJiri Pirko <jiri@resnulli.us>
      Acked-by: default avatarFlavio Leitner <fbl@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      3691c776
    • Eric Dumazet's avatar
      net: fix inet_getid() and ipv6_select_ident() bugs · 25de3837
      Eric Dumazet authored
      [ Upstream commit 39c36094 ]
      
      I noticed we were sending wrong IPv4 ID in TCP flows when MTU discovery
      is disabled.
      Note how GSO/TSO packets do not have monotonically incrementing ID.
      
      06:37:41.575531 IP (id 14227, proto: TCP (6), length: 4396)
      06:37:41.575534 IP (id 14272, proto: TCP (6), length: 65212)
      06:37:41.575544 IP (id 14312, proto: TCP (6), length: 57972)
      06:37:41.575678 IP (id 14317, proto: TCP (6), length: 7292)
      06:37:41.575683 IP (id 14361, proto: TCP (6), length: 63764)
      
      It appears I introduced this bug in linux-3.1.
      
      inet_getid() must return the old value of peer->ip_id_count,
      not the new one.
      
      Lets revert this part, and remove the prevention of
      a null identification field in IPv6 Fragment Extension Header,
      which is dubious and not even done properly.
      
      Fixes: 87c48fa3 ("ipv6: make fragment identifications less predictable")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      25de3837
    • Tom Gundersen's avatar
      net: tunnels - enable module autoloading · fcdb3fa2
      Tom Gundersen authored
      [ Upstream commit f98f89a0 ]
      
      Enable the module alias hookup to allow tunnel modules to be autoloaded on demand.
      
      This is in line with how most other netdev kinds work, and will allow userspace
      to create tunnels without having CAP_SYS_MODULE.
      Signed-off-by: default avatarTom Gundersen <teg@jklm.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      fcdb3fa2
    • Toshiaki Makita's avatar
      bridge: Prevent insertion of FDB entry with disallowed vlan · 69c3a915
      Toshiaki Makita authored
      [ Upstream commit e0d7968a ]
      
      br_handle_local_finish() is allowing us to insert an FDB entry with
      disallowed vlan. For example, when port 1 and 2 are communicating in
      vlan 10, and even if vlan 10 is disallowed on port 3, port 3 can
      interfere with their communication by spoofed src mac address with
      vlan id 10.
      
      Note: Even if it is judged that a frame should not be learned, it should
      not be dropped because it is destined for not forwarding layer but higher
      layer. See IEEE 802.1Q-2011 8.13.10.
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: default avatarVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      69c3a915
    • Michal Schmidt's avatar
      netlink: rate-limit leftover bytes warning and print process name · d213d665
      Michal Schmidt authored
      [ Upstream commit bfc5184b ]
      
      Any process is able to send netlink messages with leftover bytes.
      Make the warning rate-limited to prevent too much log spam.
      
      The warning is supposed to help find userspace bugs, so print the
      triggering command name to implicate the buggy program.
      
      [v2: Use pr_warn_ratelimited instead of printk_ratelimited.]
      Signed-off-by: default avatarMichal Schmidt <mschmidt@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      d213d665
    • Dan Carpenter's avatar
      qlcnic: info leak in qlcnic_dcb_peer_app_info() · cdc2ae50
      Dan Carpenter authored
      [ Upstream commit 7df566bb ]
      
      This function is called from dcbnl_build_peer_app().  The "info"
      struct isn't initialized at all so we disclose 2 bytes of uninitialized
      stack data.  We should clear it before passing it to the user.
      
      Fixes: 48365e48 ('qlcnic: dcb: Add support for CEE Netlink interface.')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      cdc2ae50
    • Eric W. Biederman's avatar
      netlink: Only check file credentials for implicit destinations · ca2c63c0
      Eric W. Biederman authored
      [ Upstream commit 2d7a85f4 ]
      
      It was possible to get a setuid root or setcap executable to write to
      it's stdout or stderr (which has been set made a netlink socket) and
      inadvertently reconfigure the networking stack.
      
      To prevent this we check that both the creator of the socket and
      the currentl applications has permission to reconfigure the network
      stack.
      
      Unfortunately this breaks Zebra which always uses sendto/sendmsg
      and creates it's socket without any privileges.
      
      To keep Zebra working don't bother checking if the creator of the
      socket has privilege when a destination address is specified.  Instead
      rely exclusively on the privileges of the sender of the socket.
      
      Note from Andy: This is exactly Eric's code except for some comment
      clarifications and formatting fixes.  Neither I nor, I think, anyone
      else is thrilled with this approach, but I'm hesitant to wait on a
      better fix since 3.15 is almost here.
      
      Note to stable maintainers: This is a mess.  An earlier series of
      patches in 3.15 fix a rather serious security issue (CVE-2014-0181),
      but they did so in a way that breaks Zebra.  The offending series
      includes:
      
          commit aa4cf945
          Author: Eric W. Biederman <ebiederm@xmission.com>
          Date:   Wed Apr 23 14:28:03 2014 -0700
      
              net: Add variants of capable for use on netlink messages
      
      If a given kernel version is missing that series of fixes, it's
      probably worth backporting it and this patch.  if that series is
      present, then this fix is critical if you care about Zebra.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ca2c63c0
    • Eric W. Biederman's avatar
      net: Use netlink_ns_capable to verify the permisions of netlink messages · 7539d2b6
      Eric W. Biederman authored
      [ Upstream commit 90f62cf3 ]
      
      It is possible by passing a netlink socket to a more privileged
      executable and then to fool that executable into writing to the socket
      data that happens to be valid netlink message to do something that
      privileged executable did not intend to do.
      
      To keep this from happening replace bare capable and ns_capable calls
      with netlink_capable, netlink_net_calls and netlink_ns_capable calls.
      Which act the same as the previous calls except they verify that the
      opener of the socket had the desired permissions as well.
      Reported-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [ kamal: backport to 3.13-stable from David's 3.14 port: remove unused 'net' ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      7539d2b6
    • Eric W. Biederman's avatar
      net: Add variants of capable for use on netlink messages · ad0ac1a6
      Eric W. Biederman authored
      [ Upstream commit aa4cf945 ]
      
      netlink_net_capable - The common case use, for operations that are safe on a network namespace
      netlink_capable - For operations that are only known to be safe for the global root
      netlink_ns_capable - The general case of capable used to handle special cases
      
      __netlink_ns_capable - Same as netlink_ns_capable except taking a netlink_skb_parms instead of
      		       the skbuff of a netlink message.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ad0ac1a6
    • Eric W. Biederman's avatar
      net: Add variants of capable for use on on sockets · c57c185d
      Eric W. Biederman authored
      [ Upstream commit a3b299da ]
      
      sk_net_capable - The common case, operations that are safe in a network namespace.
      sk_capable - Operations that are not known to be safe in a network namespace
      sk_ns_capable - The general case for special cases.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      c57c185d
    • Eric W. Biederman's avatar
      net: Move the permission check in sock_diag_put_filterinfo to packet_diag_dump · ea254771
      Eric W. Biederman authored
      [ Upstream commit a53b72c8 ]
      
      The permission check in sock_diag_put_filterinfo is wrong, and it is so removed
      from it's sources it is not clear why it is wrong.  Move the computation
      into packet_diag_dump and pass a bool of the result into sock_diag_filterinfo.
      
      This does not yet correct the capability check but instead simply moves it to make
      it clear what is going on.
      Reported-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ea254771
    • Eric W. Biederman's avatar
      netlink: Rename netlink_capable netlink_allowed · e11c7f82
      Eric W. Biederman authored
      [ Upstream commit 5187cd05 ]
      
      netlink_capable is a static internal function in af_netlink.c and we
      have better uses for the name netlink_capable.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      e11c7f82
  2. 07 Aug, 2014 11 commits
    • Anssi Hannula's avatar
      dm cache: fix race affecting dirty block count · c3422f91
      Anssi Hannula authored
      commit 44fa816b upstream.
      
      nr_dirty is updated without locking, causing it to drift so that it is
      non-zero (either a small positive integer, or a very large one when an
      underflow occurs) even when there are no actual dirty blocks.  This was
      due to a race between the workqueue and map function accessing nr_dirty
      in parallel without proper protection.
      
      People were seeing under runs due to a race on increment/decrement of
      nr_dirty, see: https://lkml.org/lkml/2014/6/3/648
      
      Fix this by using an atomic_t for nr_dirty.
      
      Reported-by: roma1390@gmail.com
      Signed-off-by: default avatarAnssi Hannula <anssi.hannula@iki.fi>
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      [ luis: backported to 3.11: adjusted context ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      c3422f91
    • Greg Thelen's avatar
      dm bufio: fully initialize shrinker · b45bb0fd
      Greg Thelen authored
      commit d8c712ea upstream.
      
      1d3d4437 ("vmscan: per-node deferred work") added a flags field to
      struct shrinker assuming that all shrinkers were zero filled.  The dm
      bufio shrinker is not zero filled, which leaves arbitrary kmalloc() data
      in flags.  So far the only defined flags bit is SHRINKER_NUMA_AWARE.
      But there are proposed patches which add other bits to shrinker.flags
      (e.g. memcg awareness).
      
      Rather than simply initializing the shrinker, this patch uses kzalloc()
      when allocating the dm_bufio_client to ensure that the embedded shrinker
      and any other similar structures are zeroed.
      
      This fixes theoretical over aggressive shrinking of dm bufio objects.
      If the uninitialized dm_bufio_client.shrinker.flags contains
      SHRINKER_NUMA_AWARE then shrink_slab() would call the dm shrinker for
      each numa node rather than just once.  This has been broken since 3.12.
      Signed-off-by: default avatarGreg Thelen <gthelen@google.com>
      Acked-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      b45bb0fd
    • Jan Kara's avatar
      timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks · 97653aba
      Jan Kara authored
      commit 504d5874 upstream.
      
      clockevents_increase_min_delta() calls printk() from under
      hrtimer_bases.lock. That causes lock inversion on scheduler locks because
      printk() can call into the scheduler. Lockdep puts it as:
      
      ======================================================
      [ INFO: possible circular locking dependency detected ]
      3.15.0-rc8-06195-g939f04be #2 Not tainted
      -------------------------------------------------------
      trinity-main/74 is trying to acquire lock:
       (&port_lock_key){-.....}, at: [<811c60be>] serial8250_console_write+0x8c/0x10c
      
      but task is already holding lock:
       (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #5 (hrtimer_bases.lock){-.-...}:
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
             [<8103c918>] __hrtimer_start_range_ns+0x1c/0x197
             [<8107ec20>] perf_swevent_start_hrtimer.part.41+0x7a/0x85
             [<81080792>] task_clock_event_start+0x3a/0x3f
             [<810807a4>] task_clock_event_add+0xd/0x14
             [<8108259a>] event_sched_in+0xb6/0x17a
             [<810826a2>] group_sched_in+0x44/0x122
             [<81082885>] ctx_sched_in.isra.67+0x105/0x11f
             [<810828e6>] perf_event_sched_in.isra.70+0x47/0x4b
             [<81082bf6>] __perf_install_in_context+0x8b/0xa3
             [<8107eb8e>] remote_function+0x12/0x2a
             [<8105f5af>] smp_call_function_single+0x2d/0x53
             [<8107e17d>] task_function_call+0x30/0x36
             [<8107fb82>] perf_install_in_context+0x87/0xbb
             [<810852c9>] SYSC_perf_event_open+0x5c6/0x701
             [<810856f9>] SyS_perf_event_open+0x17/0x19
             [<8142f8ee>] syscall_call+0x7/0xb
      
      -> #4 (&ctx->lock){......}:
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f04c>] _raw_spin_lock+0x21/0x30
             [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
             [<8142cacc>] __schedule+0x4c6/0x4cb
             [<8142cae0>] schedule+0xf/0x11
             [<8142f9a6>] work_resched+0x5/0x30
      
      -> #3 (&rq->lock){-.-.-.}:
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f04c>] _raw_spin_lock+0x21/0x30
             [<81040873>] __task_rq_lock+0x33/0x3a
             [<8104184c>] wake_up_new_task+0x25/0xc2
             [<8102474b>] do_fork+0x15c/0x2a0
             [<810248a9>] kernel_thread+0x1a/0x1f
             [<814232a2>] rest_init+0x1a/0x10e
             [<817af949>] start_kernel+0x303/0x308
             [<817af2ab>] i386_start_kernel+0x79/0x7d
      
      -> #2 (&p->pi_lock){-.-...}:
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
             [<810413dd>] try_to_wake_up+0x1d/0xd6
             [<810414cd>] default_wake_function+0xb/0xd
             [<810461f3>] __wake_up_common+0x39/0x59
             [<81046346>] __wake_up+0x29/0x3b
             [<811b8733>] tty_wakeup+0x49/0x51
             [<811c3568>] uart_write_wakeup+0x17/0x19
             [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
             [<811c5f28>] serial8250_handle_irq+0x54/0x6a
             [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
             [<811c56d8>] serial8250_interrupt+0x38/0x9e
             [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
             [<81051296>] handle_irq_event+0x2c/0x43
             [<81052cee>] handle_level_irq+0x57/0x80
             [<81002a72>] handle_irq+0x46/0x5c
             [<810027df>] do_IRQ+0x32/0x89
             [<8143036e>] common_interrupt+0x2e/0x33
             [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
             [<811c25a4>] uart_start+0x2d/0x32
             [<811c2c04>] uart_write+0xc7/0xd6
             [<811bc6f6>] n_tty_write+0xb8/0x35e
             [<811b9beb>] tty_write+0x163/0x1e4
             [<811b9cd9>] redirected_tty_write+0x6d/0x75
             [<810b6ed6>] vfs_write+0x75/0xb0
             [<810b7265>] SyS_write+0x44/0x77
             [<8142f8ee>] syscall_call+0x7/0xb
      
      -> #1 (&tty->write_wait){-.....}:
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
             [<81046332>] __wake_up+0x15/0x3b
             [<811b8733>] tty_wakeup+0x49/0x51
             [<811c3568>] uart_write_wakeup+0x17/0x19
             [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
             [<811c5f28>] serial8250_handle_irq+0x54/0x6a
             [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
             [<811c56d8>] serial8250_interrupt+0x38/0x9e
             [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
             [<81051296>] handle_irq_event+0x2c/0x43
             [<81052cee>] handle_level_irq+0x57/0x80
             [<81002a72>] handle_irq+0x46/0x5c
             [<810027df>] do_IRQ+0x32/0x89
             [<8143036e>] common_interrupt+0x2e/0x33
             [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
             [<811c25a4>] uart_start+0x2d/0x32
             [<811c2c04>] uart_write+0xc7/0xd6
             [<811bc6f6>] n_tty_write+0xb8/0x35e
             [<811b9beb>] tty_write+0x163/0x1e4
             [<811b9cd9>] redirected_tty_write+0x6d/0x75
             [<810b6ed6>] vfs_write+0x75/0xb0
             [<810b7265>] SyS_write+0x44/0x77
             [<8142f8ee>] syscall_call+0x7/0xb
      
      -> #0 (&port_lock_key){-.....}:
             [<8104a62d>] __lock_acquire+0x9ea/0xc6d
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
             [<811c60be>] serial8250_console_write+0x8c/0x10c
             [<8104e402>] call_console_drivers.constprop.31+0x87/0x118
             [<8104f5d5>] console_unlock+0x1d7/0x398
             [<8104fb70>] vprintk_emit+0x3da/0x3e4
             [<81425f76>] printk+0x17/0x19
             [<8105bfa0>] clockevents_program_min_delta+0x104/0x116
             [<8105c548>] clockevents_program_event+0xe7/0xf3
             [<8105cc1c>] tick_program_event+0x1e/0x23
             [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
             [<8103c49e>] __remove_hrtimer+0x5b/0x79
             [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
             [<8103cb4b>] hrtimer_cancel+0xd/0x18
             [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
             [<81080705>] task_clock_event_stop+0x20/0x64
             [<81080756>] task_clock_event_del+0xd/0xf
             [<81081350>] event_sched_out+0xab/0x11e
             [<810813e0>] group_sched_out+0x1d/0x66
             [<81081682>] ctx_sched_out+0xaf/0xbf
             [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
             [<8142cacc>] __schedule+0x4c6/0x4cb
             [<8142cae0>] schedule+0xf/0x11
             [<8142f9a6>] work_resched+0x5/0x30
      
      other info that might help us debug this:
      
      Chain exists of:
        &port_lock_key --> &ctx->lock --> hrtimer_bases.lock
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(hrtimer_bases.lock);
                                     lock(&ctx->lock);
                                     lock(hrtimer_bases.lock);
        lock(&port_lock_key);
      
       *** DEADLOCK ***
      
      4 locks held by trinity-main/74:
       #0:  (&rq->lock){-.-.-.}, at: [<8142c6f3>] __schedule+0xed/0x4cb
       #1:  (&ctx->lock){......}, at: [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
       #2:  (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66
       #3:  (console_lock){+.+...}, at: [<8104fb5d>] vprintk_emit+0x3c7/0x3e4
      
      stack backtrace:
      CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04be #2
       00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570
       8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0
       8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003
      Call Trace:
       [<81426f69>] dump_stack+0x16/0x18
       [<81425a99>] print_circular_bug+0x18f/0x19c
       [<8104a62d>] __lock_acquire+0x9ea/0xc6d
       [<8104a942>] lock_acquire+0x92/0x101
       [<811c60be>] ? serial8250_console_write+0x8c/0x10c
       [<811c6032>] ? wait_for_xmitr+0x76/0x76
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<811c60be>] ? serial8250_console_write+0x8c/0x10c
       [<811c60be>] serial8250_console_write+0x8c/0x10c
       [<8104af87>] ? lock_release+0x191/0x223
       [<811c6032>] ? wait_for_xmitr+0x76/0x76
       [<8104e402>] call_console_drivers.constprop.31+0x87/0x118
       [<8104f5d5>] console_unlock+0x1d7/0x398
       [<8104fb70>] vprintk_emit+0x3da/0x3e4
       [<81425f76>] printk+0x17/0x19
       [<8105bfa0>] clockevents_program_min_delta+0x104/0x116
       [<8105cc1c>] tick_program_event+0x1e/0x23
       [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
       [<8103c49e>] __remove_hrtimer+0x5b/0x79
       [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
       [<8103cb4b>] hrtimer_cancel+0xd/0x18
       [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
       [<81080705>] task_clock_event_stop+0x20/0x64
       [<81080756>] task_clock_event_del+0xd/0xf
       [<81081350>] event_sched_out+0xab/0x11e
       [<810813e0>] group_sched_out+0x1d/0x66
       [<81081682>] ctx_sched_out+0xaf/0xbf
       [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
       [<8104416d>] ? __dequeue_entity+0x23/0x27
       [<81044505>] ? pick_next_task_fair+0xb1/0x120
       [<8142cacc>] __schedule+0x4c6/0x4cb
       [<81047574>] ? trace_hardirqs_off_caller+0xd7/0x108
       [<810475b0>] ? trace_hardirqs_off+0xb/0xd
       [<81056346>] ? rcu_irq_exit+0x64/0x77
      
      Fix the problem by using printk_deferred() which does not call into the
      scheduler.
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      97653aba
    • John Stultz's avatar
      printk: rename printk_sched to printk_deferred · 8341e650
      John Stultz authored
      commit aac74dc4 upstream.
      
      After learning we'll need some sort of deferred printk functionality in
      the timekeeping core, Peter suggested we rename the printk_sched function
      so it can be reused by needed subsystems.
      
      This only changes the function name. No logic changes.
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Reviewed-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [ luis: prereq for 504d5874 ("timer: Fix lock inversion between
        hrtimer_bases.lock and scheduler locks"); backported to 3.11:
        - dropped changes to kernel/sched/deadline.c ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8341e650
    • Milan Broz's avatar
      crypto: af_alg - properly label AF_ALG socket · ccd13b37
      Milan Broz authored
      commit 4c63f83c upstream.
      
      Th AF_ALG socket was missing a security label (e.g. SELinux)
      which means that socket was in "unlabeled" state.
      
      This was recently demonstrated in the cryptsetup package
      (cryptsetup v1.6.5 and later.)
      See https://bugzilla.redhat.com/show_bug.cgi?id=1115120
      
      This patch clones the sock's label from the parent sock
      and resolves the issue (similar to AF_BLUETOOTH protocol family).
      Signed-off-by: default avatarMilan Broz <gmazyland@gmail.com>
      Acked-by: default avatarPaul Moore <paul@paul-moore.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ccd13b37
    • Michal Hocko's avatar
      memcg: oom_notify use-after-free fix · 473ce8c0
      Michal Hocko authored
      commit 2bcf2e92 upstream.
      
      Paul Furtado has reported the following GPF:
      
        general protection fault: 0000 [#1] SMP
        Modules linked in: ipv6 dm_mod xen_netfront coretemp hwmon x86_pkg_temp_thermal crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 microcode pcspkr ext4 jbd2 mbcache raid0 xen_blkfront
        CPU: 3 PID: 3062 Comm: java Not tainted 3.16.0-rc5 #1
        task: ffff8801cfe8f170 ti: ffff8801d2ec4000 task.ti: ffff8801d2ec4000
        RIP: e030:mem_cgroup_oom_synchronize+0x140/0x240
        RSP: e02b:ffff8801d2ec7d48  EFLAGS: 00010283
        RAX: 0000000000000001 RBX: ffff88009d633800 RCX: 000000000000000e
        RDX: fffffffffffffffe RSI: ffff88009d630200 RDI: ffff88009d630200
        RBP: ffff8801d2ec7da8 R08: 0000000000000012 R09: 00000000fffffffe
        R10: 0000000000000000 R11: 0000000000000000 R12: ffff88009d633800
        R13: ffff8801d2ec7d48 R14: dead000000100100 R15: ffff88009d633a30
        FS:  00007f1748bb4700(0000) GS:ffff8801def80000(0000) knlGS:0000000000000000
        CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: 00007f4110300308 CR3: 00000000c05f7000 CR4: 0000000000002660
        Call Trace:
          pagefault_out_of_memory+0x18/0x90
          mm_fault_error+0xa9/0x1a0
          __do_page_fault+0x478/0x4c0
          do_page_fault+0x2c/0x40
          page_fault+0x28/0x30
        Code: 44 00 00 48 89 df e8 40 ca ff ff 48 85 c0 49 89 c4 74 35 4c 8b b0 30 02 00 00 4c 8d b8 30 02 00 00 4d 39 fe 74 1b 0f 1f 44 00 00 <49> 8b 7e 10 be 01 00 00 00 e8 42 d2 04 00 4d 8b 36 4d 39 fe 75
        RIP  mem_cgroup_oom_synchronize+0x140/0x240
      
      Commit fb2a6fc5 ("mm: memcg: rework and document OOM waiting and
      wakeup") has moved mem_cgroup_oom_notify outside of memcg_oom_lock
      assuming it is protected by the hierarchical OOM-lock.
      
      Although this is true for the notification part the protection doesn't
      cover unregistration of event which can happen in parallel now so
      mem_cgroup_oom_notify can see already unlinked and/or freed
      mem_cgroup_eventfd_list.
      
      Fix this by using memcg_oom_lock also in mem_cgroup_oom_notify.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=80881
      
      Fixes: fb2a6fc5 (mm: memcg: rework and document OOM waiting and wakeup)
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Reported-by: default avatarPaul Furtado <paulfurtado91@gmail.com>
      Tested-by: default avatarPaul Furtado <paulfurtado91@gmail.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      473ce8c0
    • Alexandre Bounine's avatar
      rapidio/tsi721_dma: fix failure to obtain transaction descriptor · 4a4f762e
      Alexandre Bounine authored
      commit 0193ed82 upstream.
      
      This is a bug fix for the situation when function tsi721_desc_get() fails
      to obtain a free transaction descriptor.
      
      The bug usually results in a memory access crash dump when data transfer
      scatter-gather list has more entries than size of hardware buffer
      descriptors ring.  This fix ensures that error is properly returned to a
      caller instead of an invalid entry.
      
      This patch is applicable to kernel versions starting from v3.5.
      Signed-off-by: default avatarAlexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
      Cc: Stef van Os <stef.van.os@prodrive-technologies.com>
      Cc: Vinod Koul <vinod.koul@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      4a4f762e
    • David Rientjes's avatar
      mm, thp: do not allow thp faults to avoid cpuset restrictions · 1ba32b22
      David Rientjes authored
      commit b104a35d upstream.
      
      The page allocator relies on __GFP_WAIT to determine if ALLOC_CPUSET
      should be set in allocflags.  ALLOC_CPUSET controls if a page allocation
      should be restricted only to the set of allowed cpuset mems.
      
      Transparent hugepages clears __GFP_WAIT when defrag is disabled to prevent
      the fault path from using memory compaction or direct reclaim.  Thus, it
      is unfairly able to allocate outside of its cpuset mems restriction as a
      side-effect.
      
      This patch ensures that ALLOC_CPUSET is only cleared when the gfp mask is
      truly GFP_ATOMIC by verifying it is also not a thp allocation.
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Reported-by: default avatarAlex Thorlton <athorlton@sgi.com>
      Tested-by: default avatarAlex Thorlton <athorlton@sgi.com>
      Cc: Bob Liu <lliubbo@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Hedi Berriche <hedi@sgi.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1ba32b22
    • Maxim Patlasov's avatar
      mm/page-writeback.c: fix divide by zero in bdi_dirty_limits() · fbeb558d
      Maxim Patlasov authored
      commit f6789593 upstream.
      
      Under memory pressure, it is possible for dirty_thresh, calculated by
      global_dirty_limits() in balance_dirty_pages(), to equal zero.  Then, if
      strictlimit is true, bdi_dirty_limits() tries to resolve the proportion:
      
        bdi_bg_thresh : bdi_thresh = background_thresh : dirty_thresh
      
      by dividing by zero.
      Signed-off-by: default avatarMaxim Patlasov <mpatlasov@parallels.com>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      fbeb558d
    • Andrey Ryabinin's avatar
      net: sendmsg: fix NULL pointer dereference · 158c3d64
      Andrey Ryabinin authored
      commit 40eea803 upstream.
      
      Sasha's report:
      	> While fuzzing with trinity inside a KVM tools guest running the latest -next
      	> kernel with the KASAN patchset, I've stumbled on the following spew:
      	>
      	> [ 4448.949424] ==================================================================
      	> [ 4448.951737] AddressSanitizer: user-memory-access on address 0
      	> [ 4448.952988] Read of size 2 by thread T19638:
      	> [ 4448.954510] CPU: 28 PID: 19638 Comm: trinity-c76 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813
      	> [ 4448.956823]  ffff88046d86ca40 0000000000000000 ffff880082f37e78 ffff880082f37a40
      	> [ 4448.958233]  ffffffffb6e47068 ffff880082f37a68 ffff880082f37a58 ffffffffb242708d
      	> [ 4448.959552]  0000000000000000 ffff880082f37a88 ffffffffb24255b1 0000000000000000
      	> [ 4448.961266] Call Trace:
      	> [ 4448.963158] dump_stack (lib/dump_stack.c:52)
      	> [ 4448.964244] kasan_report_user_access (mm/kasan/report.c:184)
      	> [ 4448.965507] __asan_load2 (mm/kasan/kasan.c:352)
      	> [ 4448.966482] ? netlink_sendmsg (net/netlink/af_netlink.c:2339)
      	> [ 4448.967541] netlink_sendmsg (net/netlink/af_netlink.c:2339)
      	> [ 4448.968537] ? get_parent_ip (kernel/sched/core.c:2555)
      	> [ 4448.970103] sock_sendmsg (net/socket.c:654)
      	> [ 4448.971584] ? might_fault (mm/memory.c:3741)
      	> [ 4448.972526] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3740)
      	> [ 4448.973596] ? verify_iovec (net/core/iovec.c:64)
      	> [ 4448.974522] ___sys_sendmsg (net/socket.c:2096)
      	> [ 4448.975797] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
      	> [ 4448.977030] ? lock_release_holdtime (kernel/locking/lockdep.c:273)
      	> [ 4448.978197] ? lock_release_non_nested (kernel/locking/lockdep.c:3434 (discriminator 1))
      	> [ 4448.979346] ? check_chain_key (kernel/locking/lockdep.c:2188)
      	> [ 4448.980535] __sys_sendmmsg (net/socket.c:2181)
      	> [ 4448.981592] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
      	> [ 4448.982773] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
      	> [ 4448.984458] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1500 (discriminator 2))
      	> [ 4448.985621] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
      	> [ 4448.986754] SyS_sendmmsg (net/socket.c:2201)
      	> [ 4448.987708] tracesys (arch/x86/kernel/entry_64.S:542)
      	> [ 4448.988929] ==================================================================
      
      This reports means that we've come to netlink_sendmsg() with msg->msg_name == NULL and msg->msg_namelen > 0.
      
      After this report there was no usual "Unable to handle kernel NULL pointer dereference"
      and this gave me a clue that address 0 is mapped and contains valid socket address structure in it.
      
      This bug was introduced in f3d33426
      (net: rework recvmsg handler msg_name and msg_namelen logic).
      Commit message states that:
      	"Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
      	 non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
      	 affect sendto as it would bail out earlier while trying to copy-in the
      	 address."
      But in fact this affects sendto when address 0 is mapped and contains
      socket address structure in it. In such case copy-in address will succeed,
      verify_iovec() function will successfully exit with msg->msg_namelen > 0
      and msg->msg_name == NULL.
      
      This patch fixes it by setting msg_namelen to 0 if msg_name == NULL.
      
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarAndrey Ryabinin <a.ryabinin@samsung.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      158c3d64
    • Russell King's avatar
      ARM: fix alignment of keystone page table fixup · 6b6b053d
      Russell King authored
      commit 823a19cd upstream.
      
      If init_mm.brk is not section aligned, the LPAE fixup code will miss
      updating the final PMD.  Fix this by aligning map_end.
      
      Fixes: a77e0c7b ("ARM: mm: Recreate kernel mappings in early_paging_init()")
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      6b6b053d