- 31 Mar, 2018 19 commits
-
-
Eric Dumazet authored
[ Upstream commit ca0edb13 ] A tun device type can trivially be set to arbitrary value using TUNSETLINK ioctl(). Therefore, lowpan_device_event() must really check that ieee802154_ptr is not NULL. Fixes: 2c88b528 ("ieee802154: 6lowpan: remove check on null") Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Alexander Aring <alex.aring@gmail.com> Cc: Stefan Schmidt <stefan@osg.samsung.com> Reported-by:
syzbot <syzkaller@googlegroups.com> Acked-by:
Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexey Kodanev authored
[ Upstream commit 35d889d1 ] When we exceed current packets limit and we have more than one segment in the list returned by skb_gso_segment(), netem drops only the first one, skipping the rest, hence kmemleak reports: unreferenced object 0xffff880b5d23b600 (size 1024): comm "softirq", pid 0, jiffies 4384527763 (age 2770.629s) hex dump (first 32 bytes): 00 80 23 5d 0b 88 ff ff 00 00 00 00 00 00 00 00 ..#]............ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000d8a19b9d>] __alloc_skb+0xc9/0x520 [<000000001709b32f>] skb_segment+0x8c8/0x3710 [<00000000c7b9bb88>] tcp_gso_segment+0x331/0x1830 [<00000000c921cba1>] inet_gso_segment+0x476/0x1370 [<000000008b762dd4>] skb_mac_gso_segment+0x1f9/0x510 [<000000002182660a>] __skb_gso_segment+0x1dd/0x620 [<00000000412651b9>] netem_enqueue+0x1536/0x2590 [sch_netem] [<0000000005d3b2a9>] __dev_queue_xmit+0x1167/0x2120 [<00000000fc5f7327>] ip_finish_output2+0x998/0xf00 [<00000000d309e9d3>] ip_output+0x1aa/0x2c0 [<000000007ecbd3a4>] tcp_transmit_skb+0x18db/0x3670 [<0000000042d2a45f>] tcp_write_xmit+0x4d4/0x58c0 [<0000000056a44199>] tcp_tasklet_func+0x3d9/0x540 [<0000000013d06d02>] tasklet_action+0x1ca/0x250 [<00000000fcde0b8b>] __do_softirq+0x1b4/0x5a3 [<00000000e7ed027c>] irq_exit+0x1e2/0x210 Fix it by adding the rest of the segments, if any, to skb 'to_free' list. Add new __qdisc_drop_all() and qdisc_drop_all() functions because they can be useful in the future if we need to drop segmented GSO packets in other places. Fixes: 6071bd1a ("netem: Segment GSO packets on enqueue") Signed-off-by:
Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by:
Neil Horman <nhorman@tuxdriver.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tom Herbert authored
[ Upstream commit 2cc683e8 ] Need to lock lower socket in order to provide mutual exclusion with kcm_unattach. v2: Add Reported-by for syzbot Fixes: ab7ac4eb ("kcm: Kernel Connection Multiplexor module") Reported-by: syzbot+ea75c0ffcd353d32515f064aaebefc5279e6161e@syzkaller.appspotmail.com Signed-off-by:
Tom Herbert <tom@quantonium.net> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Paul Blakey authored
[ Upstream commit d3dcf8eb ] When inserting duplicate objects (those with the same key), current rhlist implementation messes up the chain pointers by updating the bucket pointer instead of prev next pointer to the newly inserted node. This causes missing elements on removal and travesal. Fix that by properly updating pprev pointer to point to the correct rhash_head next pointer. Issue: 1241076 Change-Id: I86b2c140bcb4aeb10b70a72a267ff590bb2b17e7 Fixes: ca26893f ('rhashtable: Add rhlist interface') Signed-off-by:
Paul Blakey <paulb@mellanox.com> Acked-by:
Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Guillaume Nault authored
[ Upstream commit 6d066734 ] We already detect situations where a PPP channel sends packets back to its upper PPP device. While this is enough to avoid deadlocking on xmit locks, this doesn't prevent packets from looping between the channel and the unit. The problem is that ppp_start_xmit() enqueues packets in ppp->file.xq before checking for xmit recursion. Therefore, __ppp_xmit_process() might dequeue a packet from ppp->file.xq and send it on the channel which, in turn, loops it back on the unit. Then ppp_start_xmit() queues the packet back to ppp->file.xq and __ppp_xmit_process() picks it up and sends it again through the channel. Therefore, the packet will loop between __ppp_xmit_process() and ppp_start_xmit() until some other part of the xmit path drops it. For L2TP, we rapidly fill the skb's headroom and pppol2tp_xmit() drops the packet after a few iterations. But PPTP reallocates the headroom if necessary, letting the loop run and exhaust the machine resources (as reported in https://bugzilla.kernel.org/show_bug.cgi?id=199109). Fix this by letting __ppp_xmit_process() enqueue the skb to ppp->file.xq, so that we can check for recursion before adding it to the queue. Now ppp_xmit_process() can drop the packet when recursion is detected. __ppp_channel_push() is a bit special. It calls __ppp_xmit_process() without having any actual packet to send. This is used by ppp_output_wakeup() to re-enable transmission on the parent unit (for implementations like ppp_async.c, where the .start_xmit() function might not consume the skb, leaving it in ppp->xmit_pending and disabling transmission). Therefore, __ppp_xmit_process() needs to handle the case where skb is NULL, dequeuing as many packets as possible from ppp->file.xq. Reported-by:
xu heng <xuheng333@zoho.com> Fixes: 55454a56 ("ppp: avoid dealock on recursive xmit") Signed-off-by:
Guillaume Nault <g.nault@alphalink.fr> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Roman Mashak authored
[ Upstream commit 51d4740f ] If set/unset mode of the tunnel_key action is not provided, ->init() still returns 0, and the caller proceeds with bogus 'struct tc_action *' object, this results in crash: % tc actions add action tunnel_key src_ip 1.1.1.1 dst_ip 2.2.2.1 id 7 index 1 [ 35.805515] general protection fault: 0000 [#1] SMP PTI [ 35.806161] Modules linked in: act_tunnel_key kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd serio_raw [ 35.808233] CPU: 1 PID: 428 Comm: tc Not tainted 4.16.0-rc4+ #286 [ 35.808929] RIP: 0010:tcf_action_init+0x90/0x190 [ 35.809457] RSP: 0018:ffffb8edc068b9a0 EFLAGS: 00010206 [ 35.810053] RAX: 1320c000000a0003 RBX: 0000000000000001 RCX: 0000000000000000 [ 35.810866] RDX: 0000000000000070 RSI: 0000000000007965 RDI: ffffb8edc068b910 [ 35.811660] RBP: ffffb8edc068b9d0 R08: 0000000000000000 R09: ffffb8edc068b808 [ 35.812463] R10: ffffffffc02bf040 R11: 0000000000000040 R12: ffffb8edc068bb38 [ 35.813235] R13: 0000000000000000 R14: 0000000000000000 R15: ffffb8edc068b910 [ 35.814006] FS: 00007f3d0d8556c0(0000) GS:ffff91d1dbc40000(0000) knlGS:0000000000000000 [ 35.814881] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 35.815540] CR2: 000000000043f720 CR3: 0000000019248001 CR4: 00000000001606a0 [ 35.816457] Call Trace: [ 35.817158] tc_ctl_action+0x11a/0x220 [ 35.817795] rtnetlink_rcv_msg+0x23d/0x2e0 [ 35.818457] ? __slab_alloc+0x1c/0x30 [ 35.819079] ? __kmalloc_node_track_caller+0xb1/0x2b0 [ 35.819544] ? rtnl_calcit.isra.30+0xe0/0xe0 [ 35.820231] netlink_rcv_skb+0xce/0x100 [ 35.820744] netlink_unicast+0x164/0x220 [ 35.821500] netlink_sendmsg+0x293/0x370 [ 35.822040] sock_sendmsg+0x30/0x40 [ 35.822508] ___sys_sendmsg+0x2c5/0x2e0 [ 35.823149] ? pagecache_get_page+0x27/0x220 [ 35.823714] ? filemap_fault+0xa2/0x640 [ 35.824423] ? page_add_file_rmap+0x108/0x200 [ 35.825065] ? alloc_set_pte+0x2aa/0x530 [ 35.825585] ? finish_fault+0x4e/0x70 [ 35.826140] ? __handle_mm_fault+0xbc1/0x10d0 [ 35.826723] ? __sys_sendmsg+0x41/0x70 [ 35.827230] __sys_sendmsg+0x41/0x70 [ 35.827710] do_syscall_64+0x68/0x120 [ 35.828195] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 35.828859] RIP: 0033:0x7f3d0ca4da67 [ 35.829331] RSP: 002b:00007ffc9f284338 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 35.830304] RAX: ffffffffffffffda RBX: 00007ffc9f284460 RCX: 00007f3d0ca4da67 [ 35.831247] RDX: 0000000000000000 RSI: 00007ffc9f2843b0 RDI: 0000000000000003 [ 35.832167] RBP: 000000005aa6a7a9 R08: 0000000000000001 R09: 0000000000000000 [ 35.833075] R10: 00000000000005f1 R11: 0000000000000246 R12: 0000000000000000 [ 35.833997] R13: 00007ffc9f2884c0 R14: 0000000000000001 R15: 0000000000674640 [ 35.834923] Code: 24 30 bb 01 00 00 00 45 31 f6 eb 5e 8b 50 08 83 c2 07 83 e2 fc 83 c2 70 49 8b 07 48 8b 40 70 48 85 c0 74 10 48 89 14 24 4c 89 ff <ff> d0 48 8b 14 24 48 01 c2 49 01 d6 45 85 ed 74 05 41 83 47 2c [ 35.837442] RIP: tcf_action_init+0x90/0x190 RSP: ffffb8edc068b9a0 [ 35.838291] ---[ end trace a095c06ee4b97a26 ]--- Fixes: d0f6dd8a ("net/sched: Introduce act_tunnel_key") Signed-off-by:
Roman Mashak <mrv@mojatatu.com> Acked-by:
Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Brad Mouring authored
[ Upstream commit a2c054a8 ] In 664fcf12 (net: phy: Threaded interrupts allow some simplification) the phy_interrupt system was changed to use a traditional threaded interrupt scheme instead of a workqueue approach. With this change, the phy status check moved into phy_change, which did not report back to the caller whether or not the interrupt was handled. This means that, in the case of a shared phy interrupt, only the first phydev's interrupt registers are checked (since phy_interrupt() would always return IRQ_HANDLED). This leads to interrupt storms when it is a secondary device that's actually the interrupt source. Signed-off-by:
Brad Mouring <brad.mouring@ni.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ido Schimmel authored
[ Upstream commit bcdd5de8 ] In commit 9ffcc372 ("mlxsw: spectrum: Allow packets to be trapped from any PG") I fixed a problem where packets could not be trapped to the CPU due to exceeded shared buffer quotas. The mentioned commit explains the problem in detail. The problem was fixed by assigning a minimum quota for the CPU port and the traffic class used for scheduling traffic to the CPU. However, commit 117b0dad ("mlxsw: Create a different trap group list for each device") assigned different traffic classes to different packet types and rendered the fix useless. Fix the problem by assigning a minimum quota for the CPU port and all the traffic classes that are currently in use. Fixes: 117b0dad ("mlxsw: Create a different trap group list for each device") Signed-off-by:
Ido Schimmel <idosch@mellanox.com> Reported-by:
Eddie Shklaer <eddies@mellanox.com> Tested-by:
Eddie Shklaer <eddies@mellanox.com> Acked-by:
Jiri Pirko <jiri@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David Lebrun authored
[ Upstream commit 191f86ca ] The seg6_build_state() function is called with RCU read lock held, so we cannot use GFP_KERNEL. This patch uses GFP_ATOMIC instead. [ 92.770271] ============================= [ 92.770628] WARNING: suspicious RCU usage [ 92.770921] 4.16.0-rc4+ #12 Not tainted [ 92.771277] ----------------------------- [ 92.771585] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section! [ 92.772279] [ 92.772279] other info that might help us debug this: [ 92.772279] [ 92.773067] [ 92.773067] rcu_scheduler_active = 2, debug_locks = 1 [ 92.773514] 2 locks held by ip/2413: [ 92.773765] #0: (rtnl_mutex){+.+.}, at: [<00000000e5461720>] rtnetlink_rcv_msg+0x441/0x4d0 [ 92.774377] #1: (rcu_read_lock){....}, at: [<00000000df4f161e>] lwtunnel_build_state+0x59/0x210 [ 92.775065] [ 92.775065] stack backtrace: [ 92.775371] CPU: 0 PID: 2413 Comm: ip Not tainted 4.16.0-rc4+ #12 [ 92.775791] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014 [ 92.776608] Call Trace: [ 92.776852] dump_stack+0x7d/0xbc [ 92.777130] __schedule+0x133/0xf00 [ 92.777393] ? unwind_get_return_address_ptr+0x50/0x50 [ 92.777783] ? __sched_text_start+0x8/0x8 [ 92.778073] ? rcu_is_watching+0x19/0x30 [ 92.778383] ? kernel_text_address+0x49/0x60 [ 92.778800] ? __kernel_text_address+0x9/0x30 [ 92.779241] ? unwind_get_return_address+0x29/0x40 [ 92.779727] ? pcpu_alloc+0x102/0x8f0 [ 92.780101] _cond_resched+0x23/0x50 [ 92.780459] __mutex_lock+0xbd/0xad0 [ 92.780818] ? pcpu_alloc+0x102/0x8f0 [ 92.781194] ? seg6_build_state+0x11d/0x240 [ 92.781611] ? save_stack+0x9b/0xb0 [ 92.781965] ? __ww_mutex_wakeup_for_backoff+0xf0/0xf0 [ 92.782480] ? seg6_build_state+0x11d/0x240 [ 92.782925] ? lwtunnel_build_state+0x1bd/0x210 [ 92.783393] ? ip6_route_info_create+0x687/0x1640 [ 92.783846] ? ip6_route_add+0x74/0x110 [ 92.784236] ? inet6_rtm_newroute+0x8a/0xd0 Fixes: 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Signed-off-by:
David Lebrun <dlebrun@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David Lebrun authored
[ Upstream commit 8936ef76 ] When using seg6 in encap mode, we call ipv6_dev_get_saddr() to set the source address of the outer IPv6 header, in case none was specified. Using skb->dev can lead to BUG() when it is in an inconsistent state. This patch uses the net_device attached to the skb's dst instead. [940807.667429] BUG: unable to handle kernel NULL pointer dereference at 000000000000047c [940807.762427] IP: ipv6_dev_get_saddr+0x8b/0x1d0 [940807.815725] PGD 0 P4D 0 [940807.847173] Oops: 0000 [#1] SMP PTI [940807.890073] Modules linked in: [940807.927765] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G W 4.16.0-rc1-seg6bpf+ #2 [940808.028988] Hardware name: HP ProLiant DL120 G6/ProLiant DL120 G6, BIOS O26 09/06/2010 [940808.128128] RIP: 0010:ipv6_dev_get_saddr+0x8b/0x1d0 [940808.187667] RSP: 0018:ffff88043fd836b0 EFLAGS: 00010206 [940808.251366] RAX: 0000000000000005 RBX: ffff88042cb1c860 RCX: 00000000000000fe [940808.338025] RDX: 00000000000002c0 RSI: ffff88042cb1c860 RDI: 0000000000004500 [940808.424683] RBP: ffff88043fd83740 R08: 0000000000000000 R09: ffffffffffffffff [940808.511342] R10: 0000000000000040 R11: 0000000000000000 R12: ffff88042cb1c850 [940808.598012] R13: ffffffff8208e380 R14: ffff88042ac8da00 R15: 0000000000000002 [940808.684675] FS: 0000000000000000(0000) GS:ffff88043fd80000(0000) knlGS:0000000000000000 [940808.783036] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [940808.852975] CR2: 000000000000047c CR3: 00000004255fe000 CR4: 00000000000006e0 [940808.939634] Call Trace: [940808.970041] <IRQ> [940808.995250] ? ip6t_do_table+0x265/0x640 [940809.043341] seg6_do_srh_encap+0x28f/0x300 [940809.093516] ? seg6_do_srh+0x1a0/0x210 [940809.139528] seg6_do_srh+0x1a0/0x210 [940809.183462] seg6_output+0x28/0x1e0 [940809.226358] lwtunnel_output+0x3f/0x70 [940809.272370] ip6_xmit+0x2b8/0x530 [940809.313185] ? ac6_proc_exit+0x20/0x20 [940809.359197] inet6_csk_xmit+0x7d/0xc0 [940809.404173] tcp_transmit_skb+0x548/0x9a0 [940809.453304] __tcp_retransmit_skb+0x1a8/0x7a0 [940809.506603] ? ip6_default_advmss+0x40/0x40 [940809.557824] ? tcp_current_mss+0x24/0x90 [940809.605925] tcp_retransmit_skb+0xd/0x80 [940809.654016] tcp_xmit_retransmit_queue.part.17+0xf9/0x210 [940809.719797] tcp_ack+0xa47/0x1110 [940809.760612] tcp_rcv_established+0x13c/0x570 [940809.812865] tcp_v6_do_rcv+0x151/0x3d0 [940809.858879] tcp_v6_rcv+0xa5c/0xb10 [940809.901770] ? seg6_output+0xdd/0x1e0 [940809.946745] ip6_input_finish+0xbb/0x460 [940809.994837] ip6_input+0x74/0x80 [940810.034612] ? ip6_rcv_finish+0xb0/0xb0 [940810.081663] ipv6_rcv+0x31c/0x4c0 ... Fixes: 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Reported-by:
Tom Herbert <tom@quantonium.net> Signed-off-by:
David Lebrun <dlebrun@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stefano Brivio authored
[ Upstream commit 5f2fb802 ] Fixes: 2f987a76 ("net: ipv6: keep sk status consistent after datagram connect failure") Signed-off-by:
Stefano Brivio <sbrivio@redhat.com> Acked-by:
Paolo Abeni <pabeni@redhat.com> Acked-by:
Guillaume Nault <g.nault@alphalink.fr> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Paolo Abeni authored
[ Upstream commit 2f987a76 ] On unsuccesful ip6_datagram_connect(), if the failure is caused by ip6_datagram_dst_update(), the sk peer information are cleared, but the sk->sk_state is preserved. If the socket was already in an established status, the overall sk status is inconsistent and fouls later checks in datagram code. Fix this saving the old peer information and restoring them in case of failure. This also aligns ipv6 datagram connect() behavior with ipv4. v1 -> v2: - added missing Fixes tag Fixes: 85cb73ff ("net: ipv6: reset daddr and dport in sk if connect() fails") Signed-off-by:
Paolo Abeni <pabeni@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Shannon Nelson authored
[ Upstream commit 13fbcc8d ] Adding a macvlan device on top of a lowerdev that supports the xfrm offloads fails with a new regression: # ip link add link ens1f0 mv0 type macvlan RTNETLINK answers: Operation not permitted Tracing down the failure shows that the macvlan device inherits the NETIF_F_HW_ESP and NETIF_F_HW_ESP_TX_CSUM feature flags from the lowerdev, but with no dev->xfrmdev_ops API filled in, it doesn't actually support xfrm. When the request is made to add the new macvlan device, the XFRM listener for NETDEV_REGISTER calls xfrm_api_check() which fails the new registration because dev->xfrmdev_ops is NULL. The macvlan creation succeeds when we filter out the ESP feature flags in macvlan_fix_features(), so let's filter them out like we're already filtering out ~NETIF_F_NETNS_LOCAL. When XFRM support is added in the future, we can add the flags into MACVLAN_FEATURES. This same problem could crop up in the future with any other new feature flags, so let's filter out any flags that aren't defined as supported in macvlan. Fixes: d77e38e6 ("xfrm: Add an IPsec hardware offloading API") Reported-by:
Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by:
Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Arkadi Sharshevsky authored
[ Upstream commit 7fe4d6dc ] The current code performs unneeded free. Remove the redundant skb freeing during the error path. Fixes: 1555d204 ("devlink: Support for pipeline debug (dpipe)") Signed-off-by:
Arkadi Sharshevsky <arkadis@mellanox.com> Acked-by:
Jiri Pirko <jiri@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Grygorii Strashko authored
[ Upstream commit 4414b3ed ] Some ethernet drivers (like TI CPSW) may connect and manage >1 Net PHYs per one netdevice, as result such drivers will produce warning during system boot and fail to connect second phy to netdevice when PHYLIB framework will try to create sysfs link netdev->phydev for second PHY in phy_attach_direct(), because sysfs link with the same name has been created already for the first PHY. As result, second CPSW external port will became unusable. Fix it by relaxing error checking when PHYLIB framework is creating sysfs link netdev->phydev in phy_attach_direct(), suppressing warning by using sysfs_create_link_nowarn() and adding error message instead. After this change links (phy->netdev and netdev->phy) creation failure is not fatal any more and system can continue working, which fixes TI CPSW issue. Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Andrew Lunn <andrew@lunn.ch> Fixes: a3995460 ("net: phy: Relax error checking on sysfs_create_link()") Signed-off-by:
Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Grygorii Strashko authored
[ Upstream commit 2399ac42 ] The sysfs_create_link_nowarn() is going to be used in phylib framework in subsequent patch which can be built as module. Hence, export sysfs_create_link_nowarn() to avoid build errors. Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Andrew Lunn <andrew@lunn.ch> Fixes: a3995460 ("net: phy: Relax error checking on sysfs_create_link()") Signed-off-by:
Grygorii Strashko <grygorii.strashko@ti.com> Acked-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michal Kalderon authored
[ Upstream commit 16da0904 ] FW workaround. The iWARP LL2 connection did not expect TCP packets to arrive on it's connection. The fix drops any non-tcp packets Fixes b5c29ca7 ("qed: iWARP CM - setup a ll2 connection for handling SYN packets") Signed-off-by:
Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by:
Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Soheil Hassas Yeganeh authored
[ Upstream commit e05836ac ] When the connection is aborted, there is no point in keeping the packets on the write queue until the connection is closed. Similar to a27fd7a8 ('tcp: purge write queue upon RST'), this is essential for a correct MSG_ZEROCOPY implementation, because userspace cannot call close(fd) before receiving zerocopy signals even when the connection is aborted. Fixes: f214f915 ("tcp: enable MSG_ZEROCOPY") Signed-off-by:
Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by:
Neal Cardwell <ncardwell@google.com> Reviewed-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
Yuchung Cheng <ycheng@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Soheil Hassas Yeganeh authored
tcp_write_queue_purge clears all the SKBs in the write queue but does not reset the sk_send_head. As a result, we can have a NULL pointer dereference anywhere that we use tcp_send_head instead of the tcp_write_queue_tail. For example, after a27fd7a8 (tcp: purge write queue upon RST), we can purge the write queue on RST. Prior to 75c119af (tcp: implement rb-tree based retransmit queue), tcp_push will only check tcp_send_head and then accesses tcp_write_queue_tail to send the actual SKB. As a result, it will dereference a NULL pointer. This has been reported twice for 4.14 where we don't have 75c119af: By Timofey Titovets: [ 422.081094] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 [ 422.081254] IP: tcp_push+0x42/0x110 [ 422.081314] PGD 0 P4D 0 [ 422.081364] Oops: 0002 [#1] SMP PTI By Yongjian Xu: BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 IP: tcp_push+0x48/0x120 PGD 80000007ff77b067 P4D 80000007ff77b067 PUD 7fd989067 PMD 0 Oops: 0002 [#18] SMP PTI Modules linked in: tcp_diag inet_diag tcp_bbr sch_fq iTCO_wdt iTCO_vendor_support pcspkr ixgbe mdio i2c_i801 lpc_ich joydev input_leds shpchp e1000e igb dca ptp pps_core hwmon mei_me mei ipmi_si ipmi_msghandler sg ses scsi_transport_sas enclosure ext4 jbd2 mbcache sd_mod ahci libahci megaraid_sas wmi ast ttm dm_mirror dm_region_hash dm_log dm_mod dax CPU: 6 PID: 14156 Comm: [ET_NET 6] Tainted: G D 4.14.26-1.el6.x86_64 #1 Hardware name: LENOVO ThinkServer RD440 /ThinkServer RD440, BIOS A0TS80A 09/22/2014 task: ffff8807d78d8140 task.stack: ffffc9000e944000 RIP: 0010:tcp_push+0x48/0x120 RSP: 0018:ffffc9000e947a88 EFLAGS: 00010246 RAX: 00000000000005b4 RBX: ffff880f7cce9c00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000040 RDI: ffff8807d00f5000 RBP: ffffc9000e947aa8 R08: 0000000000001c84 R09: 0000000000000000 R10: ffff8807d00f5158 R11: 0000000000000000 R12: ffff8807d00f5000 R13: 0000000000000020 R14: 00000000000256d4 R15: 0000000000000000 FS: 00007f5916de9700(0000) GS:ffff88107fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000038 CR3: 00000007f8226004 CR4: 00000000001606e0 Call Trace: tcp_sendmsg_locked+0x33d/0xe50 tcp_sendmsg+0x37/0x60 inet_sendmsg+0x39/0xc0 sock_sendmsg+0x49/0x60 sock_write_iter+0xb6/0x100 do_iter_readv_writev+0xec/0x130 ? rw_verify_area+0x49/0xb0 do_iter_write+0x97/0xd0 vfs_writev+0x7e/0xe0 ? __wake_up_common_lock+0x80/0xa0 ? __fget_light+0x2c/0x70 ? __do_page_fault+0x1e7/0x530 do_writev+0x60/0xf0 ? inet_shutdown+0xac/0x110 SyS_writev+0x10/0x20 do_syscall_64+0x6f/0x140 ? prepare_exit_to_usermode+0x8b/0xa0 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x3135ce0c57 RSP: 002b:00007f5916de4b00 EFLAGS: 00000293 ORIG_RAX: 0000000000000014 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000003135ce0c57 RDX: 0000000000000002 RSI: 00007f5916de4b90 RDI: 000000000000606f RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f5916de8c38 R10: 0000000000000000 R11: 0000000000000293 R12: 00000000000464cc R13: 00007f5916de8c30 R14: 00007f58d8bef080 R15: 0000000000000002 Code: 48 8b 97 60 01 00 00 4c 8d 97 58 01 00 00 41 b9 00 00 00 00 41 89 f3 4c 39 d2 49 0f 44 d1 41 81 e3 00 80 00 00 0f 85 b0 00 00 00 <80> 4a 38 08 44 8b 8f 74 06 00 00 44 89 8f 7c 06 00 00 83 e6 01 RIP: tcp_push+0x48/0x120 RSP: ffffc9000e947a88 CR2: 0000000000000038 ---[ end trace 8d545c2e93515549 ]--- Fixes: a27fd7a8 (tcp: purge write queue upon RST) Reported-by:
Timofey Titovets <nefelim4ag@gmail.com> Reported-by:
Yongjian Xu <yongjianchn@gmail.com> Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
Soheil Hassas Yeganeh <soheil@google.com> Tested-by:
Yongjian Xu <yongjianchn@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 28 Mar, 2018 21 commits
-
-
Greg Kroah-Hartman authored
-
Daniel Borkmann authored
commit 6007b080 upstream. In Cilium some of the main programs we run today are hitting 9 passes on x64's JIT compiler, and we've had cases already where we surpassed the limit where the JIT then punts the program to the interpreter instead, leading to insertion failures due to CONFIG_BPF_JIT_ALWAYS_ON or insertion failures due to the prog array owner being JITed but the program to insert not (both must have the same JITed/non-JITed property). One concrete case the program image shrunk from 12,767 bytes down to 10,288 bytes where the image converged after 16 steps. I've measured that this took 340us in the JIT until it converges on my i7-6600U. Thus, increase the original limit we had from day one where the JIT covered cBPF only back then before we run into the case (as similar with the complexity limit) where we trip over this and hit program rejections. Also add a cond_resched() into the compilation loop, the JIT process runs without any locks and may sleep anyway. Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Acked-by:
Alexei Starovoitov <ast@kernel.org> Reviewed-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
Alexei Starovoitov <ast@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Chenbo Feng authored
commit 0fa4fe85 upstream. The current check statement in BPF syscall will do a capability check for CAP_SYS_ADMIN before checking sysctl_unprivileged_bpf_disabled. This code path will trigger unnecessary security hooks on capability checking and cause false alarms on unprivileged process trying to get CAP_SYS_ADMIN access. This can be resolved by simply switch the order of the statement and CAP_SYS_ADMIN is not required anyway if unprivileged bpf syscall is allowed. Signed-off-by:
Chenbo Feng <fengc@google.com> Acked-by:
Lorenzo Colitti <lorenzo@google.com> Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Daniel Borkmann authored
commit 87e0d4f0 upstream. Prasad reported that he has seen crashes in BPF subsystem with netd on Android with arm64 in the form of (note, the taint is unrelated): [ 4134.721483] Unable to handle kernel paging request at virtual address 800000001 [ 4134.820925] Mem abort info: [ 4134.901283] Exception class = DABT (current EL), IL = 32 bits [ 4135.016736] SET = 0, FnV = 0 [ 4135.119820] EA = 0, S1PTW = 0 [ 4135.201431] Data abort info: [ 4135.301388] ISV = 0, ISS = 0x00000021 [ 4135.359599] CM = 0, WnR = 0 [ 4135.470873] user pgtable: 4k pages, 39-bit VAs, pgd = ffffffe39b946000 [ 4135.499757] [0000000800000001] *pgd=0000000000000000, *pud=0000000000000000 [ 4135.660725] Internal error: Oops: 96000021 [#1] PREEMPT SMP [ 4135.674610] Modules linked in: [ 4135.682883] CPU: 5 PID: 1260 Comm: netd Tainted: G S W 4.14.19+ #1 [ 4135.716188] task: ffffffe39f4aa380 task.stack: ffffff801d4e0000 [ 4135.731599] PC is at bpf_prog_add+0x20/0x68 [ 4135.741746] LR is at bpf_prog_inc+0x20/0x2c [ 4135.751788] pc : [<ffffff94ab7ad584>] lr : [<ffffff94ab7ad638>] pstate: 60400145 [ 4135.769062] sp : ffffff801d4e3ce0 [...] [ 4136.258315] Process netd (pid: 1260, stack limit = 0xffffff801d4e0000) [ 4136.273746] Call trace: [...] [ 4136.442494] 3ca0: ffffff94ab7ad584 0000000060400145 ffffffe3a01bf8f8 0000000000000006 [ 4136.460936] 3cc0: 0000008000000000 ffffff94ab844204 ffffff801d4e3cf0 ffffff94ab7ad584 [ 4136.479241] [<ffffff94ab7ad584>] bpf_prog_add+0x20/0x68 [ 4136.491767] [<ffffff94ab7ad638>] bpf_prog_inc+0x20/0x2c [ 4136.504536] [<ffffff94ab7b5d08>] bpf_obj_get_user+0x204/0x22c [ 4136.518746] [<ffffff94ab7ade68>] SyS_bpf+0x5a8/0x1a88 Android's netd was basically pinning the uid cookie BPF map in BPF fs (/sys/fs/bpf/traffic_cookie_uid_map) and later on retrieving it again resulting in above panic. Issue is that the map was wrongly identified as a prog! Above kernel was compiled with clang 4.0, and it turns out that clang decided to merge the bpf_prog_iops and bpf_map_iops into a single memory location, such that the two i_ops could then not be distinguished anymore. Reason for this miscompilation is that clang has the more aggressive -fmerge-all-constants enabled by default. In fact, clang source code has a comment about it in lib/AST/ExprConstant.cpp on why it is okay to do so: Pointers with different bases cannot represent the same object. (Note that clang defaults to -fmerge-all-constants, which can lead to inconsistent results for comparisons involving the address of a constant; this generally doesn't matter in practice.) The issue never appeared with gcc however, since gcc does not enable -fmerge-all-constants by default and even *explicitly* states in it's option description that using this flag results in non-conforming behavior, quote from man gcc: Languages like C or C++ require each variable, including multiple instances of the same variable in recursive calls, to have distinct locations, so using this option results in non-conforming behavior. There are also various clang bug reports open on that matter [1], where clang developers acknowledge the non-conforming behavior, and refer to disabling it with -fno-merge-all-constants. But even if this gets fixed in clang today, there are already users out there that triggered this. Thus, fix this issue by explicitly adding -fno-merge-all-constants to the kernel's Makefile to generically disable this optimization, since potentially other places in the kernel could subtly break as well. Note, there is also a flag called -fmerge-constants (not supported by clang), which is more conservative and only applies to strings and it's enabled in gcc's -O/-O2/-O3/-Os optimization levels. In gcc's code, the two flags -fmerge-{all-,}constants share the same variable internally, so when disabling it via -fno-merge-all-constants, then we really don't merge any const data (e.g. strings), and text size increases with gcc (14,927,214 -> 14,942,646 for vmlinux.o). $ gcc -fverbose-asm -O2 foo.c -S -o foo.S -> foo.S lists -fmerge-constants under options enabled $ gcc -fverbose-asm -O2 -fno-merge-all-constants foo.c -S -o foo.S -> foo.S doesn't list -fmerge-constants under options enabled $ gcc -fverbose-asm -O2 -fno-merge-all-constants -fmerge-constants foo.c -S -o foo.S -> foo.S lists -fmerge-constants under options enabled Thus, as a workaround we need to set both -fno-merge-all-constants *and* -fmerge-constants in the Makefile in order for text size to stay as is. [1] https://bugs.llvm.org/show_bug.cgi?id=18538Reported-by:
Prasad Sodagudi <psodagud@codeaurora.org> Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Chenbo Feng <fengc@google.com> Cc: Richard Smith <richard-llvm@metafoo.co.uk> Cc: Chandler Carruth <chandlerc@gmail.com> Cc: linux-kernel@vger.kernel.org Tested-by:
Prasad Sodagudi <psodagud@codeaurora.org> Acked-by:
Alexei Starovoitov <ast@kernel.org> Signed-off-by:
Alexei Starovoitov <ast@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dave Hansen authored
commit 91c49c2d upstream. 'si_pkey' is now #defined to be the name of the new siginfo field that protection keys uses. Rename it not to conflict. Signed-off-by:
Dave Hansen <dave.hansen@linux.intel.com> Acked-by:
Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20171111001231.DFFC8285@viggo.jf.intel.comSigned-off-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lu Baolu authored
commit cd3f1790 upstream. xhci_disable_slot() allows the invoker to pass a command pointer as paramenter. Otherwise, it will allocate one. This will cause memory leak when a command structure was allocated inside of this function while queuing command trb fails. Another problem comes up when the invoker passed a command pointer, but xhci_disable_slot() frees it when it detects a dead host. This patch fixes these two problems by removing the command parameter from xhci_disable_slot(). Fixes: f9e609b8 ("usb: xhci: Add helper function xhci_disable_slot().") Cc: Guoqing Zhang <guoqing.zhang@intel.com> Signed-off-by:
Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by:
Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lu Baolu authored
commit b64149ca upstream. xhci_disable_slot() is a helper for disabling a slot when a device goes away or recovers from error situations. Currently, it checks the corespoding virt-dev pointer and returns directly (w/o issuing disable slot command) if it's null. This is unnecessary and will cause problems in case where virt-dev allocation fails and xhci_disable_slot() is called to roll back the hardware state. Refer to the implementation of xhci_alloc_dev(). This patch removes lines to check virt-dev in xhci_disable_slot(). Fixes: f9e609b8 ("usb: xhci: Add helper function xhci_disable_slot().") Cc: Guoqing Zhang <guoqing.zhang@intel.com> Signed-off-by:
Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by:
Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nadav Amit authored
commit c3eec596 upstream. rq_reqbuf is allocated using kvmalloc() but released in one occasion using kfree() instead of kvfree(). The issue was found using grep based on a similar bug. Fixes: d7e09d03 ("add Lustre file system client support") Fixes: ee0ec194 ("lustre: ptlrpc: Replace uses of OBD_{ALLOC,FREE}_LARGE") Cc: Peng Tao <bergwolf@gmail.com> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: James Simmons <jsimmons@infradead.org> Signed-off-by:
Nadav Amit <namit@vmware.com> Signed-off-by:
Andreas Dilger <andreas.dilger@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Liam Mark authored
commit 6d79bd5b upstream. Since commit 204f6722 ("staging: android: ion: Use CMA APIs directly") the CMA API is now used directly and therefore the allocated memory is no longer automatically zeroed. Explicitly zero CMA allocated memory to ensure that no data is exposed to userspace. Fixes: 204f6722 ("staging: android: ion: Use CMA APIs directly") Signed-off-by:
Liam Mark <lmark@codeaurora.org> Acked-by:
Laura Abbott <labbott@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lorenzo Bianconi authored
commit 7b9ebe42 upstream. Apply le16_to_cpu() to data read from the sensor in order to take into account architecture endianness Fixes: 290a6ce1 (iio: imu: add support to lsm6dsx driver) Signed-off-by:
Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by:
Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Linus Walleij authored
commit b9a35893 upstream. The name of the file is "current_timetamp_clock" not "timestamp_clock". Fixes: bc2b7dab ("iio:core: timestamping clock selection support") Cc: Gregor Boirie <gregor.boirie@parrot.com> Signed-off-by:
Linus Walleij <linus.walleij@linaro.org> Signed-off-by:
Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kan Liang authored
commit 320b0651 upstream. The number of CHAs is miscalculated on multi-domain PCI Skylake server systems, resulting in an uncore driver initialization error. Gary Kroening explains: "For systems with a single PCI segment, it is sufficient to look for the bus number to change in order to determine that all of the CHa's have been counted for a single socket. However, for multi PCI segment systems, each socket is given a new segment and the bus number does NOT change. So looking only for the bus number to change ends up counting all of the CHa's on all sockets in the system. This leads to writing CPU MSRs beyond a valid range and causes an error in ivbep_uncore_msr_init_box()." To fix this bug, query the number of CHAs from the CAPID6 register: it should read bits 27:0 in the CAPID6 register located at Device 30, Function 3, Offset 0x9C. These 28 bits form a bit vector of available LLC slices and the CHAs that manage those slices. Reported-by:
Kroening, Gary <gary.kroening@hpe.com> Tested-by:
Kroening, Gary <gary.kroening@hpe.com> Signed-off-by:
Kan Liang <kan.liang@linux.intel.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by:
Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: abanman@hpe.com Cc: dimitri.sivanich@hpe.com Cc: hpa@zytor.com Cc: mike.travis@hpe.com Cc: russ.anderson@hpe.com Fixes: cd34cd97 ("perf/x86/intel/uncore: Add Skylake server uncore support") Link: http://lkml.kernel.org/r/1520967094-13219-1-git-send-email-kan.liang@linux.intel.comSigned-off-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Carpenter authored
commit e5ea9b54 upstream. We intended to clear the lowest 6 bits but because of a type bug we clear the high 32 bits as well. Andi says that periods are rarely more than U32_MAX so this bug probably doesn't have a huge runtime impact. Signed-off-by:
Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 294fe0f5 ("perf/x86/intel: Add INST_RETIRED.ALL workarounds") Link: http://lkml.kernel.org/r/20180317115216.GB4035@mwandaSigned-off-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Song Liu authored
commit bd903afe upstream. In ctx_resched(), EVENT_FLEXIBLE should be sched_out when EVENT_PINNED is added. However, ctx_resched() calculates ctx_event_type before checking this condition. As a result, pinned events will NOT get higher priority than flexible events. The following shows this issue on an Intel CPU (where ref-cycles can only use one hardware counter). 1. First start: perf stat -C 0 -e ref-cycles -I 1000 2. Then, in the second console, run: perf stat -C 0 -e ref-cycles:D -I 1000 The second perf uses pinned events, which is expected to have higher priority. However, because it failed in ctx_resched(). It is never run. This patch fixes this by calculating ctx_event_type after re-evaluating event_type. Reported-by:
Ephraim Park <ephiepark@fb.com> Signed-off-by:
Song Liu <songliubraving@fb.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <jolsa@redhat.com> Cc: <kernel-team@fb.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 487f05e1 ("perf/core: Optimize event rescheduling on active contexts") Link: http://lkml.kernel.org/r/20180306055504.3283731-1-songliubraving@fb.comSigned-off-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ilya Pronin authored
commit 40c21898 upstream. When printing stats in CSV mode, 'perf stat' appends extra separators when a counter is not supported: <not supported>,,L1-dcache-store-misses,mesos/bd442f34-2b4a-47df-b966-9b281f9f56fc,0,100.00,,,, Which causes a failure when parsing fields. The numbers of separators should be the same for each line, no matter if the counter is or not supported. Signed-off-by:
Ilya Pronin <ipronin@twitter.com> Acked-by:
Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20180306064353.31930-1-xiyou.wangcong@gmail.com Fixes: 92a61f64 ("perf stat: Implement CSV metrics output") Signed-off-by:
Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kan Liang authored
commit 31766094 upstream. There is no event extension (bit 21) for SKX UPI, so use 'event' instead of 'event_ext'. Reported-by:
Stephane Eranian <eranian@google.com> Signed-off-by:
Kan Liang <kan.liang@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: cd34cd97 ("perf/x86/intel/uncore: Add Skylake server uncore support") Link: http://lkml.kernel.org/r/1520004150-4855-1-git-send-email-kan.liang@linux.intel.comSigned-off-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Chris Wilson authored
commit e7cdf5c8 upstream. The vk cts test: dEQP-VK.api.external.semaphore.opaque_fd.export_multiple_times_temporary triggers a lot of VFS: Close: file count is 0 Dave pointed out that clearing the syncobj->file from drm_syncobj_file_release() was sufficient to silence the test, but that opens a can of worm since we assumed that the syncobj->file was never unset. Stop trying to reuse the same struct file for every fd pointing to the drm_syncobj, and allocate one file for each fd instead. v2: Fixup return handling of drm_syncobj_fd_to_handle v2.1: [airlied: fix possible syncobj ref race] v2.2: [jekstrand: back-port to 4.14] Reported-by:
Dave Airlie <airlied@redhat.com> Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Tested-by:
Dave Airlie <airlied@redhat.com> Reviewed-by:
Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by:
Dave Airlie <airlied@redhat.com> Signed-off-by:
Jason Ekstrand <jason@jlekstrand.net> Tested-by:
Clayton Craft <clayton.a.craft@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
H.J. Lu authored
commit c55b8550 upstream. Since the x86-64 kernel must be aligned to 2MB, refuse to boot the kernel if the alignment of the LOAD segment isn't a multiple of 2MB. Signed-off-by:
H.J. Lu <hjl.tools@gmail.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/CAMe9rOrR7xSJgUfiCoZLuqWUwymRxXPoGBW38%2BpN%3D9g%2ByKNhZw@mail.gmail.comSigned-off-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
H.J. Lu authored
commit e3d03598 upstream. Binutils 2.31 will enable -z separate-code by default for x86 to avoid mixing code pages with data to improve cache performance as well as security. To reduce x86-64 executable and shared object sizes, the maximum page size is reduced from 2MB to 4KB. But x86-64 kernel must be aligned to 2MB. Pass -z max-page-size=0x200000 to linker to force 2MB page size regardless of the default page size used by linker. Tested with Linux kernel 4.15.6 on x86-64. Signed-off-by:
H.J. Lu <hjl.tools@gmail.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/CAMe9rOp4_%3D_8twdpTyAP2DhONOCeaTOsniJLoppzhoNptL8xzA@mail.gmail.comSigned-off-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Linus Torvalds authored
commit 32d43cd3 upstream. The undocumented 'icebp' instruction (aka 'int1') works pretty much like 'int3' in the absense of in-circuit probing equipment (except, obviously, that it raises #DB instead of raising #BP), and is used by some validation test-suites as such. But Andy Lutomirski noticed that his test suite acted differently in kvm than on bare hardware. The reason is that kvm used an inexact test for the icebp instruction: it just assumed that an all-zero VM exit qualification value meant that the VM exit was due to icebp. That is not unlike the guess that do_debug() does for the actual exception handling case, but it's purely a heuristic, not an absolute rule. do_debug() does it because it wants to ascribe _some_ reasons to the #DB that happened, and an empty %dr6 value means that 'icebp' is the most likely casue and we have no better information. But kvm can just do it right, because unlike the do_debug() case, kvm actually sees the real reason for the #DB in the VM-exit interruption information field. So instead of relying on an inexact heuristic, just use the actual VM exit information that says "it was 'icebp'". Right now the 'icebp' instruction isn't technically documented by Intel, but that will hopefully change. The special "privileged software exception" information _is_ actually mentioned in the Intel SDM, even though the cause of it isn't enumerated. Reported-by:
Andy Lutomirski <luto@kernel.org> Tested-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thomas Gleixner authored
commit 19b558db upstream. The clockid argument of clockid_to_kclock() comes straight from user space via various syscalls and is used as index into the posix_clocks array. Protect it against spectre v1 array out of bounds speculation. Remove the redundant check for !posix_clock[id] as this is another source for speculation and does not provide any advantage over the return posix_clock[id] path which returns NULL in that case anyway. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Acked-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by:
Dan Williams <dan.j.williams@intel.com> Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: David Woodhouse <dwmw@amazon.co.uk> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1802151718320.1296@nanos.tec.linutronix.deSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-