1. 28 Jul, 2013 27 commits
    • Maarten Lankhorst's avatar
      alx: fix lockdep annotation · 61b6f128
      Maarten Lankhorst authored
      [ Upstream commit a8798a5c ]
      
      Move spin_lock_init to be called before the spinlocks are used, preventing a lockdep splat.
      Signed-off-by: default avatarMaarten Lankhorst <maarten.lankhorst@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      61b6f128
    • Sasha Levin's avatar
      9p: fix off by one causing access violations and memory corruption · 53effe16
      Sasha Levin authored
      [ Upstream commit 110ecd69 ]
      
      p9_release_pages() would attempt to dereference one value past the end of
      pages[]. This would cause the following crashes:
      
      [ 6293.171817] BUG: unable to handle kernel paging request at ffff8807c96f3000
      [ 6293.174146] IP: [<ffffffff8412793b>] p9_release_pages+0x3b/0x60
      [ 6293.176447] PGD 79c5067 PUD 82c1e3067 PMD 82c197067 PTE 80000007c96f3060
      [ 6293.180060] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      [ 6293.180060] Modules linked in:
      [ 6293.180060] CPU: 62 PID: 174043 Comm: modprobe Tainted: G        W    3.10.0-next-20130710-sasha #3954
      [ 6293.180060] task: ffff8807b803b000 ti: ffff880787dde000 task.ti: ffff880787dde000
      [ 6293.180060] RIP: 0010:[<ffffffff8412793b>]  [<ffffffff8412793b>] p9_release_pages+0x3b/0x60
      [ 6293.214316] RSP: 0000:ffff880787ddfc28  EFLAGS: 00010202
      [ 6293.214316] RAX: 0000000000000001 RBX: ffff8807c96f2ff8 RCX: 0000000000000000
      [ 6293.222017] RDX: ffff8807b803b000 RSI: 0000000000000001 RDI: ffffea001c7e3d40
      [ 6293.222017] RBP: ffff880787ddfc48 R08: 0000000000000000 R09: 0000000000000000
      [ 6293.222017] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001
      [ 6293.222017] R13: 0000000000000001 R14: ffff8807cc50c070 R15: ffff8807cc50c070
      [ 6293.222017] FS:  00007f572641d700(0000) GS:ffff8807f3600000(0000) knlGS:0000000000000000
      [ 6293.256784] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 6293.256784] CR2: ffff8807c96f3000 CR3: 00000007c8e81000 CR4: 00000000000006e0
      [ 6293.256784] Stack:
      [ 6293.256784]  ffff880787ddfcc8 ffff880787ddfcc8 0000000000000000 ffff880787ddfcc8
      [ 6293.256784]  ffff880787ddfd48 ffffffff84128be8 ffff880700000002 0000000000000001
      [ 6293.256784]  ffff8807b803b000 ffff880787ddfce0 0000100000000000 0000000000000000
      [ 6293.256784] Call Trace:
      [ 6293.256784]  [<ffffffff84128be8>] p9_virtio_zc_request+0x598/0x630
      [ 6293.256784]  [<ffffffff8115c610>] ? wake_up_bit+0x40/0x40
      [ 6293.256784]  [<ffffffff841209b1>] p9_client_zc_rpc+0x111/0x3a0
      [ 6293.256784]  [<ffffffff81174b78>] ? sched_clock_cpu+0x108/0x120
      [ 6293.256784]  [<ffffffff84122a21>] p9_client_read+0xe1/0x2c0
      [ 6293.256784]  [<ffffffff81708a90>] v9fs_file_read+0x90/0xc0
      [ 6293.256784]  [<ffffffff812bd073>] vfs_read+0xc3/0x130
      [ 6293.256784]  [<ffffffff811a78bd>] ? trace_hardirqs_on+0xd/0x10
      [ 6293.256784]  [<ffffffff812bd5a2>] SyS_read+0x62/0xa0
      [ 6293.256784]  [<ffffffff841a1a00>] tracesys+0xdd/0xe2
      [ 6293.256784] Code: 66 90 48 89 fb 41 89 f5 48 8b 3f 48 85 ff 74 29 85 f6 74 25 45 31 e4 66 0f 1f 84 00 00 00 00 00 e8 eb 14 12 fd 41 ff c4 49 63 c4 <48> 8b 3c c3 48 85 ff 74 05 45 39 e5 75 e7 48 83 c4 08 5b 41 5c
      [ 6293.256784] RIP  [<ffffffff8412793b>] p9_release_pages+0x3b/0x60
      [ 6293.256784]  RSP <ffff880787ddfc28>
      [ 6293.256784] CR2: ffff8807c96f3000
      [ 6293.256784] ---[ end trace 50822ee72cd360fc ]---
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      53effe16
    • Hannes Frederic Sowa's avatar
      ipv6: in case of link failure remove route directly instead of letting it expire · a025e28a
      Hannes Frederic Sowa authored
      [ Upstream commit 1eb4f758 ]
      
      We could end up expiring a route which is part of an ecmp route set. Doing
      so would invalidate the rt->rt6i_nsiblings calculations and could provoke
      the following panic:
      
      [   80.144667] ------------[ cut here ]------------
      [   80.145172] kernel BUG at net/ipv6/ip6_fib.c:733!
      [   80.145172] invalid opcode: 0000 [#1] SMP
      [   80.145172] Modules linked in: 8021q nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables
      +snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer virtio_balloon snd soundcore i2c_piix4 i2c_core virtio_net virtio_blk
      [   80.145172] CPU: 1 PID: 786 Comm: ping6 Not tainted 3.10.0+ #118
      [   80.145172] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [   80.145172] task: ffff880117fa0000 ti: ffff880118770000 task.ti: ffff880118770000
      [   80.145172] RIP: 0010:[<ffffffff815f3b5d>]  [<ffffffff815f3b5d>] fib6_add+0x75d/0x830
      [   80.145172] RSP: 0018:ffff880118771798  EFLAGS: 00010202
      [   80.145172] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011350e480
      [   80.145172] RDX: ffff88011350e238 RSI: 0000000000000004 RDI: ffff88011350f738
      [   80.145172] RBP: ffff880118771848 R08: ffff880117903280 R09: 0000000000000001
      [   80.145172] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88011350f680
      [   80.145172] R13: ffff880117903280 R14: ffff880118771890 R15: ffff88011350ef90
      [   80.145172] FS:  00007f02b5127740(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000
      [   80.145172] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [   80.145172] CR2: 00007f981322a000 CR3: 00000001181b1000 CR4: 00000000000006e0
      [   80.145172] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   80.145172] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [   80.145172] Stack:
      [   80.145172]  0000000000000001 ffff880100000000 ffff880100000000 ffff880117903280
      [   80.145172]  0000000000000000 ffff880119a4cf00 0000000000000400 00000000000007fa
      [   80.145172]  0000000000000000 0000000000000000 0000000000000000 ffff88011350f680
      [   80.145172] Call Trace:
      [   80.145172]  [<ffffffff815eeceb>] ? rt6_bind_peer+0x4b/0x90
      [   80.145172]  [<ffffffff815ed985>] __ip6_ins_rt+0x45/0x70
      [   80.145172]  [<ffffffff815eee35>] ip6_ins_rt+0x35/0x40
      [   80.145172]  [<ffffffff815ef1e4>] ip6_pol_route.isra.44+0x3a4/0x4b0
      [   80.145172]  [<ffffffff815ef34a>] ip6_pol_route_output+0x2a/0x30
      [   80.145172]  [<ffffffff81616077>] fib6_rule_action+0xd7/0x210
      [   80.145172]  [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30
      [   80.145172]  [<ffffffff81553026>] fib_rules_lookup+0xc6/0x140
      [   80.145172]  [<ffffffff81616374>] fib6_rule_lookup+0x44/0x80
      [   80.145172]  [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30
      [   80.145172]  [<ffffffff815edea3>] ip6_route_output+0x73/0xb0
      [   80.145172]  [<ffffffff815dfdf3>] ip6_dst_lookup_tail+0x2c3/0x2e0
      [   80.145172]  [<ffffffff813007b1>] ? list_del+0x11/0x40
      [   80.145172]  [<ffffffff81082a4c>] ? remove_wait_queue+0x3c/0x50
      [   80.145172]  [<ffffffff815dfe4d>] ip6_dst_lookup_flow+0x3d/0xa0
      [   80.145172]  [<ffffffff815fda77>] rawv6_sendmsg+0x267/0xc20
      [   80.145172]  [<ffffffff815a8a83>] inet_sendmsg+0x63/0xb0
      [   80.145172]  [<ffffffff8128eb93>] ? selinux_socket_sendmsg+0x23/0x30
      [   80.145172]  [<ffffffff815218d6>] sock_sendmsg+0xa6/0xd0
      [   80.145172]  [<ffffffff81524a68>] SYSC_sendto+0x128/0x180
      [   80.145172]  [<ffffffff8109825c>] ? update_curr+0xec/0x170
      [   80.145172]  [<ffffffff81041d09>] ? kvm_clock_get_cycles+0x9/0x10
      [   80.145172]  [<ffffffff810afd1e>] ? __getnstimeofday+0x3e/0xd0
      [   80.145172]  [<ffffffff8152509e>] SyS_sendto+0xe/0x10
      [   80.145172]  [<ffffffff8164efd9>] system_call_fastpath+0x16/0x1b
      [   80.145172] Code: fe ff ff 41 f6 45 2a 06 0f 85 ca fe ff ff 49 8b 7e 08 4c 89 ee e8 94 ef ff ff e9 b9 fe ff ff 48 8b 82 28 05 00 00 e9 01 ff ff ff <0f> 0b 49 8b 54 24 30 0d 00 00 40 00 89 83 14 01 00 00 48 89 53
      [   80.145172] RIP  [<ffffffff815f3b5d>] fib6_add+0x75d/0x830
      [   80.145172]  RSP <ffff880118771798>
      [   80.387413] ---[ end trace 02f20b7a8b81ed95 ]---
      [   80.390154] Kernel panic - not syncing: Fatal exception in interrupt
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a025e28a
    • Jason Wang's avatar
      macvtap: correctly linearize skb when zerocopy is used · bd31fdd2
      Jason Wang authored
      [ Upstream commit 61d46bf9 ]
      
      Userspace may produce vectors greater than MAX_SKB_FRAGS. When we try to
      linearize parts of the skb to let the rest of iov to be fit in
      the frags, we need count copylen into linear when calling macvtap_alloc_skb()
      instead of partly counting it into data_len. Since this breaks
      zerocopy_sg_from_iovec() since its inner counter assumes nr_frags should
      be zero at beginning. This cause nr_frags to be increased wrongly without
      setting the correct frags.
      
      This bug were introduced from b92946e2
      (macvtap: zerocopy: validate vectors before building skb).
      
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bd31fdd2
    • Jason Wang's avatar
      tuntap: correctly linearize skb when zerocopy is used · d09ec76a
      Jason Wang authored
      [ Upstream commit 3dd5c330 ]
      
      Userspace may produce vectors greater than MAX_SKB_FRAGS. When we try to
      linearize parts of the skb to let the rest of iov to be fit in
      the frags, we need count copylen into linear when calling tun_alloc_skb()
      instead of partly counting it into data_len. Since this breaks
      zerocopy_sg_from_iovec() since its inner counter assumes nr_frags should
      be zero at beginning. This cause nr_frags to be increased wrongly without
      setting the correct frags.
      
      This bug were introduced from 0690899b
      (tun: experimental zero copy tx support)
      
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d09ec76a
    • dingtianhong's avatar
      ifb: fix rcu_sched self-detected stalls · 6683151a
      dingtianhong authored
      [ Upstream commit 440d57bc ]
      
      According to the commit 16b0dc29
      (dummy: fix rcu_sched self-detected stalls)
      
      Eric Dumazet fix the problem in dummy, but the ifb will occur the
      same problem like the dummy modules.
      
      Trying to "modprobe ifb numifbs=30000" triggers :
      
      INFO: rcu_sched self-detected stall on CPU
      
      After this splat, RTNL is locked and reboot is needed.
      
      We must call cond_resched() to avoid this, even holding RTNL.
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6683151a
    • Dave Kleikamp's avatar
      sunvnet: vnet_port_remove must call unregister_netdev · c51a7a30
      Dave Kleikamp authored
      [ Upstream commit aabb9875 ]
      
      The missing call to unregister_netdev() leaves the interface active
      after the driver is unloaded by rmmod.
      Signed-off-by: default avatarDave Kleikamp <dave.kleikamp@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c51a7a30
    • Michael S. Tsirkin's avatar
      vhost-net: fix use-after-free in vhost_net_flush · f5ce1d25
      Michael S. Tsirkin authored
      [ Upstream commit c38e39c3 ]
      
      vhost_net_ubuf_put_and_wait has a confusing name:
      it will actually also free it's argument.
      Thus since commit 1280c27f
          "vhost-net: flush outstanding DMAs on memory change"
      vhost_net_flush tries to use the argument after passing it
      to vhost_net_ubuf_put_and_wait, this results
      in use after free.
      To fix, don't free the argument in vhost_net_ubuf_put_and_wait,
      add an new API for callers that want to free ubufs.
      Acked-by: default avatarAsias He <asias@redhat.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5ce1d25
    • Michael S. Tsirkin's avatar
      virtio_net: fix race in RX VQ processing · 2b0e8a4f
      Michael S. Tsirkin authored
      [ Upstream commit cbdadbbf ]
      
      virtio net called virtqueue_enable_cq on RX path after napi_complete, so
      with NAPI_STATE_SCHED clear - outside the implicit napi lock.
      This violates the requirement to synchronize virtqueue_enable_cq wrt
      virtqueue_add_buf.  In particular, used event can move backwards,
      causing us to lose interrupts.
      In a debug build, this can trigger panic within START_USE.
      
      Jason Wang reports that he can trigger the races artificially,
      by adding udelay() in virtqueue_enable_cb() after virtio_mb().
      
      However, we must call napi_complete to clear NAPI_STATE_SCHED before
      polling the virtqueue for used buffers, otherwise napi_schedule_prep in
      a callback will fail, causing us to lose RX events.
      
      To fix, call virtqueue_enable_cb_prepare with NAPI_STATE_SCHED
      set (under napi lock), later call virtqueue_poll with
      NAPI_STATE_SCHED clear (outside the lock).
      Reported-by: default avatarJason Wang <jasowang@redhat.com>
      Tested-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b0e8a4f
    • Michael S. Tsirkin's avatar
      virtio: support unlocked queue poll · c23b1ece
      Michael S. Tsirkin authored
      [ Upstream commit cc229884 ]
      
      This adds a way to check ring empty state after enable_cb outside any
      locks. Will be used by virtio_net.
      
      Note: there's room for more optimization: caller is likely to have a
      memory barrier already, which means we might be able to get rid of a
      barrier here.  Deferring this optimization until we do some
      benchmarking.
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c23b1ece
    • Jongsung Kim's avatar
    • Ben Hutchings's avatar
      sfc: Fix memory leak when discarding scattered packets · 1c6d3d1d
      Ben Hutchings authored
      [ Upstream commit 734d4e15 ]
      
      Commit 2768935a ('sfc: reuse pages to avoid DMA mapping/unmapping
      costs') did not fully take account of DMA scattering which was
      introduced immediately before.  If a received packet is invalid and
      must be discarded, we only drop a reference to the first buffer's
      page, but we need to drop a reference for each buffer the packet
      used.
      
      I think this bug was missed partly because efx_recycle_rx_buffers()
      was not renamed and so no longer does what its name says.  It does not
      change the state of buffers, but only prepares the underlying pages
      for recycling.  Rename it accordingly.
      Signed-off-by: default avatarBen Hutchings <bhutchings@solarflare.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1c6d3d1d
    • Hannes Frederic Sowa's avatar
      ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available · c3a54912
      Hannes Frederic Sowa authored
      [ Upstream commit 3630d400 ]
      
      After the removal of rt->n we do not create a neighbour entry at route
      insertion time (rt6_bind_neighbour is gone). As long as no neighbour is
      created because of "useful traffic" we skip this routing entry because
      rt6_check_neigh cannot pick up a valid neighbour (neigh == NULL) and
      thus returns false.
      
      This change was introduced by commit
      887c95cc ("ipv6: Complete neighbour
      entry removal from dst_entry.")
      
      To quote RFC4191:
      "If the host has no information about the router's reachability, then
      the host assumes the router is reachable."
      
      and also:
      "A host MUST NOT probe a router's reachability in the absence of useful
      traffic that the host would have sent to the router if it were reachable."
      
      So, just assume the router is reachable and let's rt6_probe do the
      rest. We don't need to create a neighbour on route insertion time.
      
      If we don't compile with CONFIG_IPV6_ROUTER_PREF (RFC4191 support)
      a neighbour is only valid if its nud_state is NUD_VALID. I did not find
      any references that we should probe the router on route insertion time
      via the other RFCs. So skip this route in that case.
      
      v2:
      a) use IS_ENABLED instead of #ifdefs (thanks to Sergei Shtylyov)
      Reported-by: default avatarPierre Emeriaud <petrus.lt@gmail.com>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c3a54912
    • Hannes Frederic Sowa's avatar
      ipv6: ip6_append_data_mtu did not care about pmtudisc and frag_size · 7852c5bf
      Hannes Frederic Sowa authored
      [ Upstream commit 75a493e6 ]
      
      If the socket had an IPV6_MTU value set, ip6_append_data_mtu lost track
      of this when appending the second frame on a corked socket. This results
      in the following splat:
      
      [37598.993962] ------------[ cut here ]------------
      [37598.994008] kernel BUG at net/core/skbuff.c:2064!
      [37598.994008] invalid opcode: 0000 [#1] SMP
      [37598.994008] Modules linked in: tcp_lp uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media vfat fat usb_storage fuse ebtable_nat xt_CHECKSUM bridge stp llc ipt_MASQUERADE nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat
      +nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi
      +scsi_transport_iscsi rfcomm bnep iTCO_wdt iTCO_vendor_support snd_hda_codec_conexant arc4 iwldvm mac80211 snd_hda_intel acpi_cpufreq mperf coretemp snd_hda_codec microcode cdc_wdm cdc_acm
      [37598.994008]  snd_hwdep cdc_ether snd_seq snd_seq_device usbnet mii joydev btusb snd_pcm bluetooth i2c_i801 e1000e lpc_ich mfd_core ptp iwlwifi pps_core snd_page_alloc mei cfg80211 snd_timer thinkpad_acpi snd tpm_tis soundcore rfkill tpm tpm_bios vhost_net tun macvtap macvlan kvm_intel kvm uinput binfmt_misc
      +dm_crypt i915 i2c_algo_bit drm_kms_helper drm i2c_core wmi video
      [37598.994008] CPU 0
      [37598.994008] Pid: 27320, comm: t2 Not tainted 3.9.6-200.fc18.x86_64 #1 LENOVO 27744PG/27744PG
      [37598.994008] RIP: 0010:[<ffffffff815443a5>]  [<ffffffff815443a5>] skb_copy_and_csum_bits+0x325/0x330
      [37598.994008] RSP: 0018:ffff88003670da18  EFLAGS: 00010202
      [37598.994008] RAX: ffff88018105c018 RBX: 0000000000000004 RCX: 00000000000006c0
      [37598.994008] RDX: ffff88018105a6c0 RSI: ffff88018105a000 RDI: ffff8801e1b0aa00
      [37598.994008] RBP: ffff88003670da78 R08: 0000000000000000 R09: ffff88018105c040
      [37598.994008] R10: ffff8801e1b0aa00 R11: 0000000000000000 R12: 000000000000fff8
      [37598.994008] R13: 00000000000004fc R14: 00000000ffff0504 R15: 0000000000000000
      [37598.994008] FS:  00007f28eea59740(0000) GS:ffff88023bc00000(0000) knlGS:0000000000000000
      [37598.994008] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [37598.994008] CR2: 0000003d935789e0 CR3: 00000000365cb000 CR4: 00000000000407f0
      [37598.994008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [37598.994008] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [37598.994008] Process t2 (pid: 27320, threadinfo ffff88003670c000, task ffff88022c162ee0)
      [37598.994008] Stack:
      [37598.994008]  ffff88022e098a00 ffff88020f973fc0 0000000000000008 00000000000004c8
      [37598.994008]  ffff88020f973fc0 00000000000004c4 ffff88003670da78 ffff8801e1b0a200
      [37598.994008]  0000000000000018 00000000000004c8 ffff88020f973fc0 00000000000004c4
      [37598.994008] Call Trace:
      [37598.994008]  [<ffffffff815fc21f>] ip6_append_data+0xccf/0xfe0
      [37598.994008]  [<ffffffff8158d9f0>] ? ip_copy_metadata+0x1a0/0x1a0
      [37598.994008]  [<ffffffff81661f66>] ? _raw_spin_lock_bh+0x16/0x40
      [37598.994008]  [<ffffffff8161548d>] udpv6_sendmsg+0x1ed/0xc10
      [37598.994008]  [<ffffffff812a2845>] ? sock_has_perm+0x75/0x90
      [37598.994008]  [<ffffffff815c3693>] inet_sendmsg+0x63/0xb0
      [37598.994008]  [<ffffffff812a2973>] ? selinux_socket_sendmsg+0x23/0x30
      [37598.994008]  [<ffffffff8153a450>] sock_sendmsg+0xb0/0xe0
      [37598.994008]  [<ffffffff810135d1>] ? __switch_to+0x181/0x4a0
      [37598.994008]  [<ffffffff8153d97d>] sys_sendto+0x12d/0x180
      [37598.994008]  [<ffffffff810dfb64>] ? __audit_syscall_entry+0x94/0xf0
      [37598.994008]  [<ffffffff81020ed1>] ? syscall_trace_enter+0x231/0x240
      [37598.994008]  [<ffffffff8166a7e7>] tracesys+0xdd/0xe2
      [37598.994008] Code: fe 07 00 00 48 c7 c7 04 28 a6 81 89 45 a0 4c 89 4d b8 44 89 5d a8 e8 1b ac b1 ff 44 8b 5d a8 4c 8b 4d b8 8b 45 a0 e9 cf fe ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 48
      [37598.994008] RIP  [<ffffffff815443a5>] skb_copy_and_csum_bits+0x325/0x330
      [37598.994008]  RSP <ffff88003670da18>
      [37599.007323] ---[ end trace d69f6a17f8ac8eee ]---
      
      While there, also check if path mtu discovery is activated for this
      socket. The logic was adapted from ip6_append_data when first writing
      on the corked socket.
      
      This bug was introduced with commit
      0c183379 ("ipv6: fix incorrect ipsec
      fragment").
      
      v2:
      a) Replace IPV6_PMTU_DISC_DO with IPV6_PMTUDISC_PROBE.
      b) Don't pass ipv6_pinfo to ip6_append_data_mtu (suggestion by Gao
         feng, thanks!).
      c) Change mtu to unsigned int, else we get a warning about
         non-matching types because of the min()-macro type-check.
      Acked-by: default avatarGao feng <gaofeng@cn.fujitsu.com>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7852c5bf
    • Hannes Frederic Sowa's avatar
      ipv6: call udp_push_pending_frames when uncorking a socket with AF_INET pending data · 07243c3d
      Hannes Frederic Sowa authored
      [ Upstream commit 8822b64a ]
      
      We accidentally call down to ip6_push_pending_frames when uncorking
      pending AF_INET data on a ipv6 socket. This results in the following
      splat (from Dave Jones):
      
      skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL>
      ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:126!
      invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth
      +netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
      CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37
      task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000
      RIP: 0010:[<ffffffff816e759c>]  [<ffffffff816e759c>] skb_panic+0x63/0x65
      RSP: 0018:ffff8801e6431de8  EFLAGS: 00010282
      RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006
      RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520
      RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800
      R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800
      FS:  00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
      Stack:
       ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4
       ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6
       ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0
      Call Trace:
       [<ffffffff8159a9aa>] skb_push+0x3a/0x40
       [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0
       [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140
       [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0
       [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20
       [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0
       [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0
       [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40
       [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20
       [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0
       [<ffffffff816f5d54>] tracesys+0xdd/0xe2
      Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
      RIP  [<ffffffff816e759c>] skb_panic+0x63/0x65
       RSP <ffff8801e6431de8>
      
      This patch adds a check if the pending data is of address family AF_INET
      and directly calls udp_push_ending_frames from udp_v6_push_pending_frames
      if that is the case.
      
      This bug was found by Dave Jones with trinity.
      
      (Also move the initialization of fl6 below the AF_INET check, even if
      not strictly necessary.)
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Dave Jones <davej@redhat.com>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      07243c3d
    • Cong Wang's avatar
      ipip: fix a regression in ioctl · 0e7eadef
      Cong Wang authored
      [ Upstream commit 3b7b514f ]
      
      This is a regression introduced by
      commit fd58156e (IPIP: Use ip-tunneling code.)
      
      Similar to GRE tunnel, previously we only check the parameters
      for SIOCADDTUNNEL and SIOCCHGTUNNEL, after that commit, the
      check is moved for all commands.
      
      So, just check for SIOCADDTUNNEL and SIOCCHGTUNNEL.
      
      Also, the check for i_key, o_key etc. is suspicious too,
      which did not exist before, reset them before passing
      to ip_tunnel_ioctl().
      Signed-off-by: default avatarCong Wang <amwang@redhat.com>
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0e7eadef
    • Wei Yongjun's avatar
      l2tp: add missing .owner to struct pppox_proto · ed7f614a
      Wei Yongjun authored
      [ Upstream commit e1558a93 ]
      
      Add missing .owner of struct pppox_proto. This prevents the
      module from being removed from underneath its users.
      Signed-off-by: default avatarWei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed7f614a
    • Pravin B Shelar's avatar
      ip_tunnels: Use skb-len to PMTU check. · edb42cad
      Pravin B Shelar authored
      [ Upstream commit 23a3647b ]
      
      In path mtu check, ip header total length works for gre device
      but not for gre-tap device.  Use skb len which is consistent
      for all tunneling types.  This is old bug in gre.
      This also fixes mtu calculation bug introduced by
      commit c5441932 (GRE: Refactor GRE tunneling code).
      Reported-by: default avatarTimo Teras <timo.teras@iki.fi>
      Signed-off-by: default avatarPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      edb42cad
    • Amerigo Wang's avatar
      ipv6,mcast: always hold idev->lock before mca_lock · 36bddbad
      Amerigo Wang authored
      [ Upstream commit 8965779d, with
        some bits from commit b7b1bfce
        ("ipv6: split duplicate address detection and router solicitation timer")
        to get the __ipv6_get_lladdr() used by this patch. ]
      
      dingtianhong reported the following deadlock detected by lockdep:
      
       ======================================================
       [ INFO: possible circular locking dependency detected ]
       3.4.24.05-0.1-default #1 Not tainted
       -------------------------------------------------------
       ksoftirqd/0/3 is trying to acquire lock:
        (&ndev->lock){+.+...}, at: [<ffffffff8147f804>] ipv6_get_lladdr+0x74/0x120
      
       but task is already holding lock:
        (&mc->mca_lock){+.+...}, at: [<ffffffff8149d130>] mld_send_report+0x40/0x150
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (&mc->mca_lock){+.+...}:
              [<ffffffff810a8027>] validate_chain+0x637/0x730
              [<ffffffff810a8417>] __lock_acquire+0x2f7/0x500
              [<ffffffff810a8734>] lock_acquire+0x114/0x150
              [<ffffffff814f691a>] rt_spin_lock+0x4a/0x60
              [<ffffffff8149e4bb>] igmp6_group_added+0x3b/0x120
              [<ffffffff8149e5d8>] ipv6_mc_up+0x38/0x60
              [<ffffffff81480a4d>] ipv6_find_idev+0x3d/0x80
              [<ffffffff81483175>] addrconf_notify+0x3d5/0x4b0
              [<ffffffff814fae3f>] notifier_call_chain+0x3f/0x80
              [<ffffffff81073471>] raw_notifier_call_chain+0x11/0x20
              [<ffffffff813d8722>] call_netdevice_notifiers+0x32/0x60
              [<ffffffff813d92d4>] __dev_notify_flags+0x34/0x80
              [<ffffffff813d9360>] dev_change_flags+0x40/0x70
              [<ffffffff813ea627>] do_setlink+0x237/0x8a0
              [<ffffffff813ebb6c>] rtnl_newlink+0x3ec/0x600
              [<ffffffff813eb4d0>] rtnetlink_rcv_msg+0x160/0x310
              [<ffffffff814040b9>] netlink_rcv_skb+0x89/0xb0
              [<ffffffff813eb357>] rtnetlink_rcv+0x27/0x40
              [<ffffffff81403e20>] netlink_unicast+0x140/0x180
              [<ffffffff81404a9e>] netlink_sendmsg+0x33e/0x380
              [<ffffffff813c4252>] sock_sendmsg+0x112/0x130
              [<ffffffff813c537e>] __sys_sendmsg+0x44e/0x460
              [<ffffffff813c5544>] sys_sendmsg+0x44/0x70
              [<ffffffff814feab9>] system_call_fastpath+0x16/0x1b
      
       -> #0 (&ndev->lock){+.+...}:
              [<ffffffff810a798e>] check_prev_add+0x3de/0x440
              [<ffffffff810a8027>] validate_chain+0x637/0x730
              [<ffffffff810a8417>] __lock_acquire+0x2f7/0x500
              [<ffffffff810a8734>] lock_acquire+0x114/0x150
              [<ffffffff814f6c82>] rt_read_lock+0x42/0x60
              [<ffffffff8147f804>] ipv6_get_lladdr+0x74/0x120
              [<ffffffff8149b036>] mld_newpack+0xb6/0x160
              [<ffffffff8149b18b>] add_grhead+0xab/0xc0
              [<ffffffff8149d03b>] add_grec+0x3ab/0x460
              [<ffffffff8149d14a>] mld_send_report+0x5a/0x150
              [<ffffffff8149f99e>] igmp6_timer_handler+0x4e/0xb0
              [<ffffffff8105705a>] call_timer_fn+0xca/0x1d0
              [<ffffffff81057b9f>] run_timer_softirq+0x1df/0x2e0
              [<ffffffff8104e8c7>] handle_pending_softirqs+0xf7/0x1f0
              [<ffffffff8104ea3b>] __do_softirq_common+0x7b/0xf0
              [<ffffffff8104f07f>] __thread_do_softirq+0x1af/0x210
              [<ffffffff8104f1c1>] run_ksoftirqd+0xe1/0x1f0
              [<ffffffff8106c7de>] kthread+0xae/0xc0
              [<ffffffff814fff74>] kernel_thread_helper+0x4/0x10
      
      actually we can just hold idev->lock before taking pmc->mca_lock,
      and avoid taking idev->lock again when iterating idev->addr_list,
      since the upper callers of mld_newpack() already take
      read_lock_bh(&idev->lock).
      Reported-by: default avatardingtianhong <dingtianhong@huawei.com>
      Cc: dingtianhong <dingtianhong@huawei.com>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Tested-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Tested-by: default avatarChen Weilong <chenweilong@huawei.com>
      Signed-off-by: default avatarCong Wang <amwang@redhat.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      36bddbad
    • Cong Wang's avatar
      vti: remove duplicated code to fix a memory leak · da04e7df
      Cong Wang authored
      [ Upstream commit ab6c7a0a ]
      
      vti module allocates dev->tstats twice: in vti_fb_tunnel_init()
      and in vti_tunnel_init(), this lead to a memory leak of
      dev->tstats.
      
      Just remove the duplicated operations in vti_fb_tunnel_init().
      
      (candidate for -stable)
      Signed-off-by: default avatarCong Wang <amwang@redhat.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Saurabh Mohan <saurabh.mohan@vyatta.com>
      Acked-by: default avatarStephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      da04e7df
    • Cong Wang's avatar
      gre: fix a regression in ioctl · 3d3fa8bc
      Cong Wang authored
      [ Upstream commit 6c734fb8 ]
      
      When testing GRE tunnel, I got:
      
       # ip tunnel show
       get tunnel gre0 failed: Invalid argument
       get tunnel gre1 failed: Invalid argument
      
      This is a regression introduced by commit c5441932
      ("GRE: Refactor GRE tunneling code.") because previously we
      only check the parameters for SIOCADDTUNNEL and SIOCCHGTUNNEL,
      after that commit, the check is moved for all commands.
      
      So, just check for SIOCADDTUNNEL and SIOCCHGTUNNEL.
      
      After this patch I got:
      
       # ip tunnel show
       gre0: gre/ip  remote any  local any  ttl inherit  nopmtudisc
       gre1: gre/ip  remote 192.168.122.101  local 192.168.122.45  ttl inherit
      Signed-off-by: default avatarCong Wang <amwang@redhat.com>
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3d3fa8bc
    • Changli Gao's avatar
      net: Swap ver and type in pppoe_hdr · 51778da5
      Changli Gao authored
      [ Upstream commit b1a5a34b ]
      
      Ver and type in pppoe_hdr should be swapped as defined by RFC2516
      section-4.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      51778da5
    • Dave Jones's avatar
      x25: Fix broken locking in ioctl error paths. · ea4c218f
      Dave Jones authored
      [ Upstream commit 4ccb93ce ]
      
      Two of the x25 ioctl cases have error paths that break out of the function without
      unlocking the socket, leading to this warning:
      
      ================================================
      [ BUG: lock held when returning to user space! ]
      3.10.0-rc7+ #36 Not tainted
      ------------------------------------------------
      trinity-child2/31407 is leaving the kernel with locks still held!
      1 lock held by trinity-child2/31407:
       #0:  (sk_lock-AF_X25){+.+.+.}, at: [<ffffffffa024b6da>] x25_ioctl+0x8a/0x740 [x25]
      Signed-off-by: default avatarDave Jones <davej@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ea4c218f
    • Eric Dumazet's avatar
      neighbour: fix a race in neigh_destroy() · ac294f13
      Eric Dumazet authored
      [ Upstream commit c9ab4d85 ]
      
      There is a race in neighbour code, because neigh_destroy() uses
      skb_queue_purge(&neigh->arp_queue) without holding neighbour lock,
      while other parts of the code assume neighbour rwlock is what
      protects arp_queue
      
      Convert all skb_queue_purge() calls to the __skb_queue_purge() variant
      
      Use __skb_queue_head_init() instead of skb_queue_head_init()
      to make clear we do not use arp_queue.lock
      
      And hold neigh->lock in neigh_destroy() to close the race.
      Reported-by: default avatarJoe Jin <joe.jin@oracle.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ac294f13
    • Hannes Frederic Sowa's avatar
      ipv6: only apply anti-spoofing checks to not-pointopoint tunnels · ce08aa04
      Hannes Frederic Sowa authored
      [ Upstream commit 5c29fb12 ]
      
      Because of commit 218774dc ("ipv6: add
      anti-spoofing checks for 6to4 and 6rd") the sit driver dropped packets
      for 2002::/16 destinations and sources even when configured to work as a
      tunnel with fixed endpoint. We may only apply the 6rd/6to4 anti-spoofing
      checks if the device is not in pointopoint mode.
      
      This was an oversight from me in the above commit, sorry.  Thanks to
      Roman Mamedov for reporting this!
      Reported-by: default avatarRoman Mamedov <rm@romanrm.ru>
      Cc: David Miller <davem@davemloft.net>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ce08aa04
    • Olivier DANET's avatar
      sparc32: vm_area_struct access for old Sun SPARCs. · b692b298
      Olivier DANET authored
      upstream commit 961246b4.
      
      Commit e4c6bfd2 ("mm: rearrange
      vm_area_struct for fewer cache misses") changed the layout of the
      vm_area_struct structure, it broke several SPARC32 assembly routines
      which used numerical constants for accessing the vm_mm field.
      
      This patch defines the VMA_VM_MM constant to replace the immediate values.
      Signed-off-by: default avatarOlivier DANET <odanet@caramail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b692b298
    • Jan Kara's avatar
      writeback: Fix periodic writeback after fs mount · 9fa65e09
      Jan Kara authored
      commit a5faeaf9 upstream.
      
      Code in blkdev.c moves a device inode to default_backing_dev_info when
      the last reference to the device is put and moves the device inode back
      to its bdi when the first reference is acquired. This includes moving to
      wb.b_dirty list if the device inode is dirty. The code however doesn't
      setup timer to wake corresponding flusher thread and while wb.b_dirty
      list is non-empty __mark_inode_dirty() will not set it up either. Thus
      periodic writeback is effectively disabled until a sync(2) call which can
      lead to unexpected data loss in case of crash or power failure.
      
      Fix the problem by setting up a timer for periodic writeback in case we
      add the first dirty inode to wb.b_dirty list in bdev_inode_switch_bdi().
      Reported-by: default avatarBert De Jonghe <Bert.DeJonghe@amplidata.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9fa65e09
  2. 25 Jul, 2013 13 commits
    • Greg Kroah-Hartman's avatar
      Linux 3.10.3 · 81a46483
      Greg Kroah-Hartman authored
      81a46483
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Add trace_array_get/put() to event handling · fc82a11a
      Steven Rostedt (Red Hat) authored
      commit 8e2e2fa4 upstream.
      
      Commit a695cb58 "tracing: Prevent deleting instances when they are being read"
      tried to fix a race between deleting a trace instance and reading contents
      of a trace file. But it wasn't good enough. The following could crash the kernel:
      
       # cd /sys/kernel/debug/tracing/instances
       # ( while :; do mkdir foo; rmdir foo; done ) &
       # ( while :; do echo 1 > foo/events/sched/sched_switch 2> /dev/null; done ) &
      
      Luckily this can only be done by root user, but it should be fixed regardless.
      
      The problem is that a delete of the file can happen after the write to the event
      is opened, but before the enabling happens.
      
      The solution is to make sure the trace_array is available before succeeding in
      opening for write, and incerment the ref counter while opened.
      
      Now the instance can be deleted when the events are writing to the buffer,
      but the deletion of the instance will disable all events before the instance
      is actually deleted.
      Reported-by: default avatarAlexander Lam <azl@google.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc82a11a
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Fix race between deleting buffer and setting events · 68cebd26
      Steven Rostedt (Red Hat) authored
      commit 2a6c24af upstream.
      
      While analyzing the code, I discovered that there's a potential race between
      deleting a trace instance and setting events. There are a few races that can
      occur if events are being traced as the buffer is being deleted. Mostly the
      problem comes with freeing the descriptor used by the trace event callback.
      To prevent problems like this, the events are disabled before the buffer is
      deleted. The problem with the current solution is that the event_mutex is let
      go between disabling the events and freeing the files, which means that the events
      could be enabled again while the freeing takes place.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      68cebd26
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Get trace_array ref counts when accessing trace files · 6492334c
      Steven Rostedt (Red Hat) authored
      commit 7b85af63 upstream.
      
      When a trace file is opened that may access a trace array, it must
      increment its ref count to prevent it from being deleted.
      Reported-by: default avatarAlexander Lam <azl@google.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6492334c
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Add trace_array_get/put() to handle instance refs better · 59d8f488
      Steven Rostedt (Red Hat) authored
      commit ff451961 upstream.
      
      Commit a695cb58 "tracing: Prevent deleting instances when they are being read"
      tried to fix a race between deleting a trace instance and reading contents
      of a trace file. But it wasn't good enough. The following could crash the kernel:
      
       # cd /sys/kernel/debug/tracing/instances
       # ( while :; do mkdir foo; rmdir foo; done ) &
       # ( while :; do cat foo/trace &> /dev/null; done ) &
      
      Luckily this can only be done by root user, but it should be fixed regardless.
      
      The problem is that a delete of the file can happen after the reader starts
      to open the file but before it grabs the trace_types_mutex.
      
      The solution is to validate the trace array before using it. If the trace
      array does not exist in the list of trace arrays, then it returns -ENODEV.
      
      There's a possibility that a trace_array could be deleted and a new one
      created and the open would open its file instead. But that is very minor as
      it will just return the data of the new trace array, it may confuse the user
      but it will not crash the system. As this can only be done by root anyway,
      the race will only occur if root is deleting what its trying to read at
      the same time.
      Reported-by: default avatarAlexander Lam <azl@google.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      59d8f488
    • Alexander Z Lam's avatar
      tracing: Protect ftrace_trace_arrays list in trace_events.c · 9713f785
      Alexander Z Lam authored
      commit a8227415 upstream.
      
      There are multiple places where the ftrace_trace_arrays list is accessed in
      trace_events.c without the trace_types_lock held.
      
      Link: http://lkml.kernel.org/r/1372732674-22726-1-git-send-email-azl@google.comSigned-off-by: default avatarAlexander Z Lam <azl@google.com>
      Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Cc: Alexander Z Lam <lambchop468@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9713f785
    • Alexander Z Lam's avatar
      tracing: Make trace_marker use the correct per-instance buffer · b7f15519
      Alexander Z Lam authored
      commit 2d71619c upstream.
      
      The trace_marker file was present for each new instance created, but it
      added the trace mark to the global trace buffer instead of to
      the instance's buffer.
      
      Link: http://lkml.kernel.org/r/1372717885-4543-2-git-send-email-azl@google.comSigned-off-by: default avatarAlexander Z Lam <azl@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Cc: Alexander Z Lam <lambchop468@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b7f15519
    • zhangwei(Jovi)'s avatar
      tracing: Fix irqs-off tag display in syscall tracing · 86515381
      zhangwei(Jovi) authored
      commit 11034ae9 upstream.
      
      All syscall tracing irqs-off tags are wrong, the syscall enter entry doesn't
      disable irqs.
      
       [root@jovi tracing]#echo "syscalls:sys_enter_open" > set_event
       [root@jovi tracing]# cat trace
       # tracer: nop
       #
       # entries-in-buffer/entries-written: 13/13   #P:2
       #
       #                              _-----=> irqs-off
       #                             / _----=> need-resched
       #                            | / _---=> hardirq/softirq
       #                            || / _--=> preempt-depth
       #                            ||| /     delay
       #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
       #              | |       |   ||||       |         |
             irqbalance-513   [000] d... 56115.496766: sys_open(filename: 804e1a6, flags: 0, mode: 1b6)
             irqbalance-513   [000] d... 56115.497008: sys_open(filename: 804e1bb, flags: 0, mode: 1b6)
               sendmail-771   [000] d... 56115.827982: sys_open(filename: b770e6d1, flags: 0, mode: 1b6)
      
      The reason is syscall tracing doesn't record irq_flags into buffer.
      The proper display is:
      
       [root@jovi tracing]#echo "syscalls:sys_enter_open" > set_event
       [root@jovi tracing]# cat trace
       # tracer: nop
       #
       # entries-in-buffer/entries-written: 14/14   #P:2
       #
       #                              _-----=> irqs-off
       #                             / _----=> need-resched
       #                            | / _---=> hardirq/softirq
       #                            || / _--=> preempt-depth
       #                            ||| /     delay
       #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
       #              | |       |   ||||       |         |
             irqbalance-514   [001] ....    46.213921: sys_open(filename: 804e1a6, flags: 0, mode: 1b6)
             irqbalance-514   [001] ....    46.214160: sys_open(filename: 804e1bb, flags: 0, mode: 1b6)
                  <...>-920   [001] ....    47.307260: sys_open(filename: 4e82a0c5, flags: 80000, mode: 0)
      
      Link: http://lkml.kernel.org/r/1365564393-10972-3-git-send-email-jovi.zhangwei@huawei.comSigned-off-by: default avatarzhangwei(Jovi) <jovi.zhangwei@huawei.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      86515381
    • Steven Rostedt's avatar
      tracing: Failed to create system directory · e6929efa
      Steven Rostedt authored
      commit 6e94a780 upstream.
      
      Running the following:
      
       # cd /sys/kernel/debug/tracing
       # echo p:i do_sys_open > kprobe_events
       # echo p:j schedule >> kprobe_events
       # cat kprobe_events
      p:kprobes/i do_sys_open
      p:kprobes/j schedule
       # echo p:i do_sys_open >> kprobe_events
       # cat kprobe_events
      p:kprobes/j schedule
      p:kprobes/i do_sys_open
       # ls /sys/kernel/debug/tracing/events/kprobes/
      enable  filter  j
      
      Notice that the 'i' is missing from the kprobes directory.
      
      The console produces:
      
      "Failed to create system directory kprobes"
      
      This is because kprobes passes in a allocated name for the system
      and the ftrace event subsystem saves off that name instead of creating
      a duplicate for it. But the kprobes may free the system name making
      the pointer to it invalid.
      
      This bug was introduced by 92edca07 "tracing: Use direct field, type
      and system names" which switched from using kstrdup() on the system name
      in favor of just keeping apointer to it, as the internal ftrace event
      system names are static and exist for the life of the computer being booted.
      
      Instead of reverting back to duplicating system names again, we can use
      core_kernel_data() to determine if the passed in name was allocated or
      static. Then use the MSB of the ref_count to be a flag to keep track if
      the name was allocated or not. Then we can still save from having to duplicate
      strings that will always exist, but still copy the ones that may be freed.
      Reported-by: default avatar"zhangwei(Jovi)" <jovi.zhangwei@huawei.com>
      Reported-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Tested-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e6929efa
    • Peter Zijlstra's avatar
      perf: Fix perf_lock_task_context() vs RCU · 65e303d7
      Peter Zijlstra authored
      commit 058ebd0e upstream.
      
      Jiri managed to trigger this warning:
      
       [] ======================================================
       [] [ INFO: possible circular locking dependency detected ]
       [] 3.10.0+ #228 Tainted: G        W
       [] -------------------------------------------------------
       [] p/6613 is trying to acquire lock:
       []  (rcu_node_0){..-...}, at: [<ffffffff810ca797>] rcu_read_unlock_special+0xa7/0x250
       []
       [] but task is already holding lock:
       []  (&ctx->lock){-.-...}, at: [<ffffffff810f2879>] perf_lock_task_context+0xd9/0x2c0
       []
       [] which lock already depends on the new lock.
       []
       [] the existing dependency chain (in reverse order) is:
       []
       [] -> #4 (&ctx->lock){-.-...}:
       [] -> #3 (&rq->lock){-.-.-.}:
       [] -> #2 (&p->pi_lock){-.-.-.}:
       [] -> #1 (&rnp->nocb_gp_wq[1]){......}:
       [] -> #0 (rcu_node_0){..-...}:
      
      Paul was quick to explain that due to preemptible RCU we cannot call
      rcu_read_unlock() while holding scheduler (or nested) locks when part
      of the read side critical section was preemptible.
      
      Therefore solve it by making the entire RCU read side non-preemptible.
      
      Also pull out the retry from under the non-preempt to play nice with RT.
      Reported-by: default avatarJiri Olsa <jolsa@redhat.com>
      Helped-out-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      65e303d7
    • Jiri Olsa's avatar
      perf: Remove WARN_ON_ONCE() check in __perf_event_enable() for valid scenario · b2412679
      Jiri Olsa authored
      commit 06f41796 upstream.
      
      The '!ctx->is_active' check has a valid scenario, so
      there's no need for the warning.
      
      The reason is that there's a time window between the
      'ctx->is_active' check in the perf_event_enable() function
      and the __perf_event_enable() function having:
      
        - IRQs on
        - ctx->lock unlocked
      
      where the task could be killed and 'ctx' deactivated by
      perf_event_exit_task(), ending up with the warning below.
      
      So remove the WARN_ON_ONCE() check and add comments to
      explain it all.
      
      This addresses the following warning reported by Vince Weaver:
      
      [  324.983534] ------------[ cut here ]------------
      [  324.984420] WARNING: at kernel/events/core.c:1953 __perf_event_enable+0x187/0x190()
      [  324.984420] Modules linked in:
      [  324.984420] CPU: 19 PID: 2715 Comm: nmi_bug_snb Not tainted 3.10.0+ #246
      [  324.984420] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010
      [  324.984420]  0000000000000009 ffff88043fce3ec8 ffffffff8160ea0b ffff88043fce3f00
      [  324.984420]  ffffffff81080ff0 ffff8802314fdc00 ffff880231a8f800 ffff88043fcf7860
      [  324.984420]  0000000000000286 ffff880231a8f800 ffff88043fce3f10 ffffffff8108103a
      [  324.984420] Call Trace:
      [  324.984420]  <IRQ>  [<ffffffff8160ea0b>] dump_stack+0x19/0x1b
      [  324.984420]  [<ffffffff81080ff0>] warn_slowpath_common+0x70/0xa0
      [  324.984420]  [<ffffffff8108103a>] warn_slowpath_null+0x1a/0x20
      [  324.984420]  [<ffffffff81134437>] __perf_event_enable+0x187/0x190
      [  324.984420]  [<ffffffff81130030>] remote_function+0x40/0x50
      [  324.984420]  [<ffffffff810e51de>] generic_smp_call_function_single_interrupt+0xbe/0x130
      [  324.984420]  [<ffffffff81066a47>] smp_call_function_single_interrupt+0x27/0x40
      [  324.984420]  [<ffffffff8161fd2f>] call_function_single_interrupt+0x6f/0x80
      [  324.984420]  <EOI>  [<ffffffff816161a1>] ? _raw_spin_unlock_irqrestore+0x41/0x70
      [  324.984420]  [<ffffffff8113799d>] perf_event_exit_task+0x14d/0x210
      [  324.984420]  [<ffffffff810acd04>] ? switch_task_namespaces+0x24/0x60
      [  324.984420]  [<ffffffff81086946>] do_exit+0x2b6/0xa40
      [  324.984420]  [<ffffffff8161615c>] ? _raw_spin_unlock_irq+0x2c/0x30
      [  324.984420]  [<ffffffff81087279>] do_group_exit+0x49/0xc0
      [  324.984420]  [<ffffffff81096854>] get_signal_to_deliver+0x254/0x620
      [  324.984420]  [<ffffffff81043057>] do_signal+0x57/0x5a0
      [  324.984420]  [<ffffffff8161a164>] ? __do_page_fault+0x2a4/0x4e0
      [  324.984420]  [<ffffffff8161665c>] ? retint_restore_args+0xe/0xe
      [  324.984420]  [<ffffffff816166cd>] ? retint_signal+0x11/0x84
      [  324.984420]  [<ffffffff81043605>] do_notify_resume+0x65/0x80
      [  324.984420]  [<ffffffff81616702>] retint_signal+0x46/0x84
      [  324.984420] ---[ end trace 442ec2f04db3771a ]---
      Reported-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Suggested-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1373384651-6109-2-git-send-email-jolsa@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2412679
    • Jiri Olsa's avatar
      perf: Clone child context from parent context pmu · f38bac3d
      Jiri Olsa authored
      commit 734df5ab upstream.
      
      Currently when the child context for inherited events is
      created, it's based on the pmu object of the first event
      of the parent context.
      
      This is wrong for the following scenario:
      
        - HW context having HW and SW event
        - HW event got removed (closed)
        - SW event stays in HW context as the only event
          and its pmu is used to clone the child context
      
      The issue starts when the cpu context object is touched
      based on the pmu context object (__get_cpu_context). In
      this case the HW context will work with SW cpu context
      ending up with following WARN below.
      
      Fixing this by using parent context pmu object to clone
      from child context.
      
      Addresses the following warning reported by Vince Weaver:
      
      [ 2716.472065] ------------[ cut here ]------------
      [ 2716.476035] WARNING: at kernel/events/core.c:2122 task_ctx_sched_out+0x3c/0x)
      [ 2716.476035] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs locn
      [ 2716.476035] CPU: 0 PID: 3164 Comm: perf_fuzzer Not tainted 3.10.0-rc4 #2
      [ 2716.476035] Hardware name: AOpen   DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BI2
      [ 2716.476035]  0000000000000000 ffffffff8102e215 0000000000000000 ffff88011fc18
      [ 2716.476035]  ffff8801175557f0 0000000000000000 ffff880119fda88c ffffffff810ad
      [ 2716.476035]  ffff880119fda880 ffffffff810af02a 0000000000000009 ffff880117550
      [ 2716.476035] Call Trace:
      [ 2716.476035]  [<ffffffff8102e215>] ? warn_slowpath_common+0x5b/0x70
      [ 2716.476035]  [<ffffffff810ab2bd>] ? task_ctx_sched_out+0x3c/0x5f
      [ 2716.476035]  [<ffffffff810af02a>] ? perf_event_exit_task+0xbf/0x194
      [ 2716.476035]  [<ffffffff81032a37>] ? do_exit+0x3e7/0x90c
      [ 2716.476035]  [<ffffffff810cd5ab>] ? __do_fault+0x359/0x394
      [ 2716.476035]  [<ffffffff81032fe6>] ? do_group_exit+0x66/0x98
      [ 2716.476035]  [<ffffffff8103dbcd>] ? get_signal_to_deliver+0x479/0x4ad
      [ 2716.476035]  [<ffffffff810ac05c>] ? __perf_event_task_sched_out+0x230/0x2d1
      [ 2716.476035]  [<ffffffff8100205d>] ? do_signal+0x3c/0x432
      [ 2716.476035]  [<ffffffff810abbf9>] ? ctx_sched_in+0x43/0x141
      [ 2716.476035]  [<ffffffff810ac2ca>] ? perf_event_context_sched_in+0x7a/0x90
      [ 2716.476035]  [<ffffffff810ac311>] ? __perf_event_task_sched_in+0x31/0x118
      [ 2716.476035]  [<ffffffff81050dd9>] ? mmdrop+0xd/0x1c
      [ 2716.476035]  [<ffffffff81051a39>] ? finish_task_switch+0x7d/0xa6
      [ 2716.476035]  [<ffffffff81002473>] ? do_notify_resume+0x20/0x5d
      [ 2716.476035]  [<ffffffff813654f5>] ? retint_signal+0x3d/0x78
      [ 2716.476035] ---[ end trace 827178d8a5966c3d ]---
      Reported-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1373384651-6109-1-git-send-email-jolsa@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f38bac3d
    • Takashi Iwai's avatar
      staging: line6: Fix unlocked snd_pcm_stop() call · df3b055c
      Takashi Iwai authored
      commit 86f0b5b8 upstream.
      
      snd_pcm_stop() must be called in the PCM substream lock context.
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      df3b055c