- 04 Feb, 2017 40 commits
-
-
Darrick J. Wong authored
commit 2aa6ba7b upstream. If we try to allocate memory pages to back an xfs_buf that we're trying to read, it's possible that we'll be so short on memory that the page allocation fails. For a blocking read we'll just wait, but for readahead we simply dump all the pages we've collected so far. Unfortunately, after dumping the pages we neglect to clear the _XBF_PAGES state, which means that the subsequent call to xfs_buf_free thinks that b_pages still points to pages we own. It then double-frees the b_pages pages. This results in screaming about negative page refcounts from the memory manager, which xfs oughtn't be triggering. To reproduce this case, mount a filesystem where the size of the inodes far outweighs the availalble memory (a ~500M inode filesystem on a VM with 300MB memory did the trick here) and run bulkstat in parallel with other memory eating processes to put a huge load on the system. The "check summary" phase of xfs_scrub also works for this purpose. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christoph Hellwig authored
commit 493611eb upstream. With COW files they are the hotpath, just like for files with the extent size hint attribute. We really shouldn't micro-manage anything but failure cases with unlikely. Additionally Arnd Bergmann recently reported that one of these two unlikely annotations causes link failures together with an upcoming kernel instrumentation patch, so let's get rid of it ASAP. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Brian Foster authored
commit 5a93790d upstream. xfs_attr_[get|remove]() have unlocked attribute fork checks to optimize away a lock cycle in cases where the fork does not exist or is otherwise empty. This check is not safe, however, because an attribute fork short form to extent format conversion includes a transient state that causes the xfs_inode_hasattr() check to fail. Specifically, xfs_attr_shortform_to_leaf() creates an empty extent format attribute fork and then adds the existing shortform attributes to it. This means that lookup of an existing xattr can spuriously return -ENOATTR when racing against a setxattr that causes the associated format conversion. This was originally reproduced by an untar on a particularly configured glusterfs volume, but can also be reproduced on demand with properly crafted xattr requests. The format conversion occurs under the exclusive ilock. xfs_attr_get() and xfs_attr_remove() already have the proper locking and checks further down in the functions to handle this situation correctly. Drop the unlocked checks to avoid the spurious failure and rely on the existing logic. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Darrick J. Wong authored
commit 83d230eb upstream. sb_dirblklog is added to sb_blocklog to compute the directory block size in bytes. Therefore, we must compare the sum of both those values against XFS_MAX_BLOCKSIZE_LOG, not just dirblklog. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christoph Hellwig authored
commit d2b3964a upstream. Due to the way how xfs_iomap_write_allocate tries to convert the whole found extents from delalloc to real space we can run into a race condition with multiple threads doing writes to this same extent. For the non-COW case that is harmless as the only thing that can happen is that we call xfs_bmapi_write on an extent that has already been converted to a real allocation. For COW writes where we move the extent from the COW to the data fork after I/O completion the race is, however, not quite as harmless. In the worst case we are now calling xfs_bmapi_write on a region that contains hole in the COW work, which will trip up an assert in debug builds or lead to file system corruption in non-debug builds. This seems to be reproducible with workloads of small O_DSYNC write, although so far I've not managed to come up with a with an isolated reproducer. The fix for the issue is relatively simple: tell xfs_bmapi_write that we are only asked to convert delayed allocations and skip holes in that case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Arnd Bergmann authored
commit fd29f7af upstream. A harmless warning just got introduced: fs/xfs/libxfs/xfs_dir2.h:40:8: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers] Removing the 'const' modifier avoids the warning and has no other effect. Fixes: 1fc4d33f ("xfs: replace xfs_mode_to_ftype table with switch statement") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Sandeen authored
commit 657bdfb7 upstream. The GETNEXTQOTA ioctl takes whatever ID is sent in, and looks for the next active quota for an user equal or higher to that ID. But if we are at the maximum ID and then ask for the "next" one, we may wrap back to zero. In this case, userspace may loop forever, because it will start querying again at zero. We'll fix this in userspace as well, but for the kernel, return -ENOENT if we ask for the next quota ID past UINT_MAX so the caller knows to stop. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Amir Goldstein authored
commit a324cbf1 upstream. Check for invalid file type in xfs_dinode_verify() and fail to load the inode structure from disk. Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Amir Goldstein authored
commit fab8eef8 upstream. The helper xfs_dentry_to_name() is used by 2 different classes of callers: Callers that pass zero mode and don't care about the returned name.type field and Callers that pass non zero mode and do care about the name.type field. Change xfs_dentry_to_name() to not take the mode argument and change the call sites of the first class to not pass the mode argument. Create a new helper xfs_dentry_mode_to_name() which does pass the mode argument and returns -EFSCORRUPTED if mode is invalid. Callers that translate non zero mode to on-disk file type now check the return value and will export the error to user instead of staging an invalid file type to be written to directory entry. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Amir Goldstein authored
commit 1fc4d33f. The size of the xfs_mode_to_ftype[] conversion table was too small to handle an invalid value of mode=S_IFMT. Instead of fixing the table size, replace the conversion table with a conversion helper that uses a switch statement. Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Amir Goldstein authored
commit b597dd53 upstream. xfs_dir2.h dereferences some data types in inline functions and fails to include those type definitions, e.g.: xfs_dir2_data_aoff_t, struct xfs_da_geometry. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Amir Goldstein authored
commit 3c6f46ea upstream. This changes fixes an assertion hit when fuzzing on-disk i_mode values. The easy case to fix is when changing an empty file i_mode to S_IFDIR. In this case, xfs_dinode_verify() detects an illegal zero size for directory and fails to load the inode structure from disk. For the case of non empty file whose i_mode is changed to S_IFDIR, the ASSERT() statement in xfs_dir2_isblock() is replaced with return -EFSCORRUPTED, to avoid interacting with corrupted jusk also when XFS_DEBUG is disabled. Suggested-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Amir Goldstein authored
commit bf46ecc3 upstream. The ASSERT() condition is the normal case, not the exception, so testing the condition should be likely(), not unlikely(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christoph Hellwig authored
commit 84a4620c upstream. There are only two reasons for xfs_log_force / xfs_log_force_lsn to fail: one is an I/O error, for which xlog_bdstrat already logs a warning, and the second is an already shutdown log due to a previous I/O errors. In the latter case we'll already have a previous indication for the actual error, but the large stream of misleading warnings from xfs_log_force will probably scroll it out of the message buffer. Simply removing the warnings thus makes the XFS log reporting significantly better. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christoph Hellwig authored
commit 12ef8301 upstream. ->total is a bit of an odd parameter passed down to the low-level allocator all the way from the high-level callers. It's supposed to contain the maximum number of blocks to be allocated for the whole transaction [1]. But in xfs_iomap_write_allocate we only convert existing delayed allocations and thus only have a minimal block reservation for the current transaction, so xfs_alloc_space_available can't use it for the allocation decisions. Use the maximum of args->total and the calculated block requirement to make a decision. We probably should get rid of args->total eventually and instead apply ->minleft more broadly, but that will require some extensive changes all over. [1] which creates lots of confusion as most callers don't decrement it once doing a first allocation. But that's for a separate series. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christoph Hellwig authored
commit 54fee133 upstream. We must decide in xfs_alloc_fix_freelist if we can perform an allocation from a given AG is possible or not based on the available space, and should not fail the allocation past that point on a healthy file system. But currently we have two additional places that second-guess xfs_alloc_fix_freelist: xfs_alloc_ag_vextent tries to adjust the maxlen parameter to remove the reservation before doing the allocation (but ignores the various minium freespace requirements), and xfs_alloc_fix_minleft tries to fix up the allocated length after we've found an extent, but ignores the reservations and also doesn't take the AGFL into account (and thus fails allocations for not matching minlen in some cases). Remove all these later fixups and just correct the maxlen argument inside xfs_alloc_fix_freelist once we have the AGF buffer locked. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christoph Hellwig authored
commit 255c5162 upstream. We can't just set minleft to 0 when we're low on space - that's exactly what we need minleft for: to protect space in the AG for btree block allocations when we are low on free space. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christoph Hellwig authored
commit 5149fd32 upstream. Setting aside 4 blocks globally for bmbt splits isn't all that useful, as different threads can allocate space in parallel. Bump it to 4 blocks per AG to allow each thread that is currently doing an allocation to dip into it separately. Without that we may no have enough reserved blocks if there are enough parallel transactions in an almost out space file system that all run into bmap btree splits. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Florian Fainelli authored
[ Upstream commit f154be24 ] Commit 448b4482 ("net: dsa: Add lockdep class to tx queues to avoid lockdep splat") removed the netif_device_detach() call done in dsa_slave_suspend() which is necessary, and paired with a corresponding netif_device_attach(), bring it back. Fixes: 448b4482 ("net: dsa: Add lockdep class to tx queues to avoid lockdep splat") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Robert Shearman authored
[ Upstream commit 85c81401 ] When attempting to free lwtunnel state after the module for the encap has been unloaded an oops occurs: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: lwtstate_free+0x18/0x40 [..] task: ffff88003e372380 task.stack: ffffc900001fc000 RIP: 0010:lwtstate_free+0x18/0x40 RSP: 0018:ffff88003fd83e88 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88002bbb3380 RCX: ffff88000c91a300 [..] Call Trace: <IRQ> free_fib_info_rcu+0x195/0x1a0 ? rt_fibinfo_free+0x50/0x50 rcu_process_callbacks+0x2d3/0x850 ? rcu_process_callbacks+0x296/0x850 __do_softirq+0xe4/0x4cb irq_exit+0xb0/0xc0 smp_apic_timer_interrupt+0x3d/0x50 apic_timer_interrupt+0x93/0xa0 [..] Code: e8 6e c6 fc ff 89 d8 5b 5d c3 bb de ff ff ff eb f4 66 90 66 66 66 66 90 55 48 89 e5 53 0f b7 07 48 89 fb 48 8b 04 c5 00 81 d5 81 <48> 8b 40 08 48 85 c0 74 13 ff d0 48 8d 7b 20 be 20 00 00 00 e8 The problem is after the module for the encap can be unloaded the corresponding ops is removed and is thus NULL here. Modules implementing lwtunnel ops should not be allowed to unload while there is state alive using those ops, so grab the module reference for the ops on creating lwtunnel state and of course release the reference when freeing the state. Fixes: 1104d9ba ("lwtunnel: Add destroy state operation") Signed-off-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Robert Shearman authored
[ Upstream commit 88ff7334 ] Modules implementing lwtunnel ops should not be allowed to unload while there is state alive using those ops, so specify the owning module for all lwtunnel ops. Signed-off-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Bjørn Mork authored
[ Upstream commit 5b9f5751 ] Another rebranded Novatel E371. qmi_wwan should drive this device, while cdc_ether should ignore it. Even though the USB descriptors are plain CDC-ETHER that USB interface is a QMI interface. Ref commit 7fdb7846 ("qmi_wwan/cdc_ether: add device IDs for Dell 5804 (Novatel E371) WWAN card") Cc: Dan Williams <dcbw@redhat.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
WANG Cong authored
[ Upstream commit 0fb44559 ] Dmitry reported a deadlock scenario: unix_bind() path: u->bindlock ==> sb_writer do_splice() path: sb_writer ==> pipe->mutex ==> u->bindlock In the unix_bind() code path, unix_mknod() does not have to be done with u->bindlock held, since it is a pure fs operation, so we can just move unix_mknod() out. Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Cc: Rainer Weikusat <rweikusat@mobileactivedefense.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
hayeswang authored
[ Upstream commit 6a0b76c0 ] Runtime suspend shouldn't be executed if the tx queue is not empty, because the device is not idle. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David Ahern authored
[ Upstream commit 9f427a0e ] MPLS multipath for LSR is broken -- always selecting the first nexthop in the one label case. For example: $ ip -f mpls ro ls 100 nexthop as to 200 via inet 172.16.2.2 dev virt12 nexthop as to 300 via inet 172.16.3.2 dev virt13 101 nexthop as to 201 via inet6 2000:2::2 dev virt12 nexthop as to 301 via inet6 2000:3::2 dev virt13 In this example incoming packets have a single MPLS labels which means BOS bit is set. The BOS bit is passed from mpls_forward down to mpls_multipath_hash which never processes the hash loop because BOS is 1. Update mpls_multipath_hash to process the entire label stack. mpls_hdr_len tracks the total mpls header length on each pass (on pass N mpls_hdr_len is N * sizeof(mpls_shim_hdr)). When the label is found with the BOS set it verifies the skb has sufficient header for ipv4 or ipv6, and find the IPv4 and IPv6 header by using the last mpls_hdr pointer and adding 1 to advance past it. With these changes I have verified the code correctly sees the label, BOS, IPv4 and IPv6 addresses in the network header and icmp/tcp/udp traffic for ipv4 and ipv6 are distributed across the nexthops. Fixes: 1c78efa8 ("mpls: flow-based multipath selection") Acked-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ivan Vecera authored
[ Upstream commit b6677449 ] Any bridge options specified during link creation (e.g. ip link add) are ignored as br_dev_newlink() does not process them. Use br_changelink() to do it. Fixes: 13323516 ("bridge: implement rtnl_link_ops->changelink") Signed-off-by: Ivan Vecera <cera@cera.cz> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit e048fc50 ] A driver using dev_alloc_page() must not reuse a page allocated from emergency memory reserve. Otherwise all packets using this page will be immediately dropped, unless for very specific sockets having SOCK_MEMALLOC bit set. This issue might be hard to debug, because only a fraction of received packets would be dropped. Fixes: 4415a031 ("net/mlx5e: Implement RX mapped page cache for page recycle") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexey Kodanev authored
[ Upstream commit 0dbd7ff3 ] Found that if we run LTP netstress test with large MSS (65K), the first attempt from server to send data comparable to this MSS on fastopen connection will be delayed by the probe timer. Here is an example: < S seq 0:0 win 43690 options [mss 65495 wscale 7 tfo cookie] length 32 > S. seq 0:0 ack 1 win 43690 options [mss 65495 wscale 7] length 0 < . ack 1 win 342 length 0 Inside tcp_sendmsg(), tcp_send_mss() returns max MSS in 'mss_now', as well as in 'size_goal'. This results the segment not queued for transmition until all the data copied from user buffer. Then, inside __tcp_push_pending_frames(), it breaks on send window test and continues with the check probe timer. Fragmentation occurs in tcp_write_wakeup()... +0.2 > P. seq 1:43777 ack 1 win 342 length 43776 < . ack 43777, win 1365 length 0 > P. seq 43777:65001 ack 1 win 342 options [...] length 21224 ... This also contradicts with the fact that we should bound to the half of the window if it is large. Fix this flaw by correctly initializing max_window. Before that, it could have large values that affect further calculations of 'size_goal'. Fixes: 168a8f58 ("tcp: TCP Fast Open Server - main code path") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kefeng Wang authored
[ Upstream commit 03e4deff ] Just like commit 4acd4945 ("ipv6: addrconf: Avoid calling netdevice notifiers with RCU read-side lock"), it is unnecessary to make addrconf_disable_change() use RCU iteration over the netdev list, since it already holds the RTNL lock, or we may meet Illegal context switch in RCU read-side critical section. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David Ahern authored
[ Upstream commit 9ed59592] Trying to add an mpls encap route when the MPLS modules are not loaded hangs. For example: CONFIG_MPLS=y CONFIG_NET_MPLS_GSO=m CONFIG_MPLS_ROUTING=m CONFIG_MPLS_IPTUNNEL=m $ ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2 The ip command hangs: root 880 826 0 21:25 pts/0 00:00:00 ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2 $ cat /proc/880/stack [<ffffffff81065a9b>] call_usermodehelper_exec+0xd6/0x134 [<ffffffff81065efc>] __request_module+0x27b/0x30a [<ffffffff814542f6>] lwtunnel_build_state+0xe4/0x178 [<ffffffff814aa1e4>] fib_create_info+0x47f/0xdd4 [<ffffffff814ae451>] fib_table_insert+0x90/0x41f [<ffffffff814a8010>] inet_rtm_newroute+0x4b/0x52 ... modprobe is trying to load rtnl-lwt-MPLS: root 881 5 0 21:25 ? 00:00:00 /sbin/modprobe -q -- rtnl-lwt-MPLS and it hangs after loading mpls_router: $ cat /proc/881/stack [<ffffffff81441537>] rtnl_lock+0x12/0x14 [<ffffffff8142ca2a>] register_netdevice_notifier+0x16/0x179 [<ffffffffa0033025>] mpls_init+0x25/0x1000 [mpls_router] [<ffffffff81000471>] do_one_initcall+0x8e/0x13f [<ffffffff81119961>] do_init_module+0x5a/0x1e5 [<ffffffff810bd070>] load_module+0x13bd/0x17d6 ... The problem is that lwtunnel_build_state is called with rtnl lock held preventing mpls_init from registering. Given the potential references held by the time lwtunnel_build_state it can not drop the rtnl lock to the load module. So, extract the module loading code from lwtunnel_build_state into a new function to validate the encap type. The new function is called while converting the user request into a fib_config which is well before any table, device or fib entries are examined. Fixes: 745041e2 ("lwtunnel: autoload of lwt modules") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Daniel Gonzalez Cabanelas authored
[ Upstream commit cd33b3e0 ] Commit a1cba561 ("net: phy: Add Broadcom phy library for common interfaces") make the BCM63xx PHY driver utilize bcm_phy_config_intr() which would appear to do the right thing, except that it does not write to the MII_BCM63XX_IR register but to MII_BCM54XX_ECR which is different. This would be causing invalid link parameters and events from being generated by the PHY interrupt. Fixes: a1cba561 ("net: phy: Add Broadcom phy library for common interfaces") Signed-off-by: Daniel Gonzalez Cabanelas <dgcbueu@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit 7be2c82c ] Ashizuka reported a highmem oddity and sent a patch for freescale fec driver. But the problem root cause is that core networking stack must ensure no skb with highmem fragment is ever sent through a device that does not assert NETIF_F_HIGHDMA in its features. We need to call illegal_highdma() from harmonize_features() regardless of CSUM checks. Fixes: ec5f0615 ("net: Kill link between CSUM and SG features.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Pravin Shelar <pshelar@ovn.org> Reported-by: "Ashizuka, Yuusuke" <ashiduka@jp.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lance Richardson authored
[ Upstream commit d5ff72d9 ] vxlan->cfg.dst_port is in network byte order, so an htons() is needed here. Also reduced comment length to stay closer to 80 column width (still slightly over, however). Fixes: e1e5314d ("vxlan: implement GPE") Signed-off-by: Lance Richardson <lrichard@redhat.com> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jason Wang authored
[ Upstream commit 6391a448 ] Commit 501db511 ("virtio: don't set VIRTIO_NET_HDR_F_DATA_VALID on xmit") in fact disables VIRTIO_HDR_F_DATA_VALID on receiving path too, fixing this by adding a hint (has_data_valid) and set it only on the receiving path. Cc: Rolf Neugebauer <rolf.neugebauer@docker.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Rolf Neugebauer <rolf.neugebauer@docker.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Rolf Neugebauer authored
[ Upstream commit 501db511 ] This patch part reverts fd2a0437 and e858fae2 which introduced a subtle change in how the virtio_net flags are derived from the SKBs ip_summed field. With the above commits, the flags are set to VIRTIO_NET_HDR_F_DATA_VALID when ip_summed == CHECKSUM_UNNECESSARY, thus treating it differently to ip_summed == CHECKSUM_NONE, which should be the same. Further, the virtio spec 1.0 / CS04 explicitly says that VIRTIO_NET_HDR_F_DATA_VALID must not be set by the driver. Fixes: fd2a0437 ("virtio_net: introduce virtio_net_hdr_{from,to}_skb") Fixes: e858fae2 (" virtio_net: use common code for virtio_net_hdr and skb GSO conversion") Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jamal Hadi Salim authored
[ Upstream commit 0faa9cb5 ] Demonstrating the issue: .. add a drop action $sudo $TC actions add action drop index 10 .. retrieve it $ sudo $TC -s actions get action gact index 10 action order 1: gact action drop random type none pass val 0 index 10 ref 2 bind 0 installed 29 sec used 29 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 ... bug 1 above: reference is two. Reference is actually 1 but we forget to subtract 1. ... do a GET again and we see the same issue try a few times and nothing changes ~$ sudo $TC -s actions get action gact index 10 action order 1: gact action drop random type none pass val 0 index 10 ref 2 bind 0 installed 31 sec used 31 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 ... lets try to bind the action to a filter.. $ sudo $TC qdisc add dev lo ingress $ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \ u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10 ... and now a few GETs: $ sudo $TC -s actions get action gact index 10 action order 1: gact action drop random type none pass val 0 index 10 ref 3 bind 1 installed 204 sec used 204 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 $ sudo $TC -s actions get action gact index 10 action order 1: gact action drop random type none pass val 0 index 10 ref 4 bind 1 installed 206 sec used 206 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 $ sudo $TC -s actions get action gact index 10 action order 1: gact action drop random type none pass val 0 index 10 ref 5 bind 1 installed 235 sec used 235 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 .... as can be observed the reference count keeps going up. After the fix $ sudo $TC actions add action drop index 10 $ sudo $TC -s actions get action gact index 10 action order 1: gact action drop random type none pass val 0 index 10 ref 1 bind 0 installed 4 sec used 4 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 $ sudo $TC -s actions get action gact index 10 action order 1: gact action drop random type none pass val 0 index 10 ref 1 bind 0 installed 6 sec used 6 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 $ sudo $TC qdisc add dev lo ingress $ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \ u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10 $ sudo $TC -s actions get action gact index 10 action order 1: gact action drop random type none pass val 0 index 10 ref 2 bind 1 installed 32 sec used 32 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 $ sudo $TC -s actions get action gact index 10 action order 1: gact action drop random type none pass val 0 index 10 ref 2 bind 1 installed 33 sec used 33 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 Fixes: aecc5cef ("net sched actions: fix GETing actions") Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Basil Gunn authored
[ Upstream commit 8a367e74 ] The ax.25 socket connection timed out & the sock struct has been previously taken down ie. sock struct is now a NULL pointer. Checking the sock_flag causes the segfault. Check if the socket struct pointer is NULL before checking sock_flag. This segfault is seen in timed out netrom connections. Please submit to -stable. Signed-off-by: Basil Gunn <basil@pacabunga.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jakub Sitnicki authored
[ Upstream commit 02ca0423 ] With ip6gre we have a tunnel header which also makes the tunnel MTU smaller. We need to reserve room for it. Previously we were using up space reserved for the Tunnel Encapsulation Limit option header (RFC 2473). Also, after commit b05229f4 ("gre6: Cleanup GREv6 transmit path, call common GRE functions") our contract with the caller has changed. Now we check if the packet length exceeds the tunnel MTU after the tunnel header has been pushed, unlike before. This is reflected in the check where we look at the packet length minus the size of the tunnel header, which is already accounted for in tunnel MTU. Fixes: b05229f4 ("gre6: Cleanup GREv6 transmit path, call common GRE functions") Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Masaru Nagai authored
[ Upstream commit 8ec3e8a1 ] Due to alignment requirements of the hardware transmissions are split into two DMA descriptors, a small padding descriptor of 0 - 3 bytes in length followed by a descriptor for rest of the packet. In the case of IP packets the first descriptor will never be zero due to the way that the stack aligns buffers for IP packets. However, for non-IP packets it may be zero. In that case it has been reported that timeouts occur, presumably because transmission stops at the first zero-length DMA descriptor and thus the packet is not transmitted. However, in my environment a BUG is triggered as follows: [ 20.381417] ------------[ cut here ]------------ [ 20.386054] kernel BUG at lib/swiotlb.c:495! [ 20.390324] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [ 20.395805] Modules linked in: [ 20.398862] CPU: 0 PID: 2089 Comm: mz Not tainted 4.10.0-rc3-00001-gf13ad2db193f #162 [ 20.406689] Hardware name: Renesas Salvator-X board based on r8a7796 (DT) [ 20.413474] task: ffff80063b1f1900 task.stack: ffff80063a71c000 [ 20.419404] PC is at swiotlb_tbl_map_single+0x178/0x2ec [ 20.424625] LR is at map_single+0x4c/0x98 [ 20.428629] pc : [<ffff00000839c4c0>] lr : [<ffff00000839c680>] pstate: 800001c5 [ 20.436019] sp : ffff80063a71f9b0 [ 20.439327] x29: ffff80063a71f9b0 x28: ffff80063a20d500 [ 20.444636] x27: ffff000008ed5000 x26: 0000000000000000 [ 20.449944] x25: 000000067abe2adc x24: 0000000000000000 [ 20.455252] x23: 0000000000200000 x22: 0000000000000001 [ 20.460559] x21: 0000000000175ffe x20: ffff80063b2a0010 [ 20.465866] x19: 0000000000000000 x18: 0000ffffcae6fb20 [ 20.471173] x17: 0000ffffa09ba018 x16: ffff0000087c8b70 [ 20.476480] x15: 0000ffffa084f588 x14: 0000ffffa09cfa14 [ 20.481787] x13: 0000ffffcae87ff0 x12: 000000000063abe2 [ 20.487098] x11: ffff000008096360 x10: ffff80063abe2adc [ 20.492407] x9 : 0000000000000000 x8 : 0000000000000000 [ 20.497718] x7 : 0000000000000000 x6 : ffff000008ed50d0 [ 20.503028] x5 : 0000000000000000 x4 : 0000000000000001 [ 20.508338] x3 : 0000000000000000 x2 : 000000067abe2adc [ 20.513648] x1 : 00000000bafff000 x0 : 0000000000000000 [ 20.518958] [ 20.520446] Process mz (pid: 2089, stack limit = 0xffff80063a71c000) [ 20.526798] Stack: (0xffff80063a71f9b0 to 0xffff80063a720000) [ 20.532543] f9a0: ffff80063a71fa30 ffff00000839c680 [ 20.540374] f9c0: ffff80063b2a0010 ffff80063b2a0010 0000000000000001 0000000000000000 [ 20.548204] f9e0: 000000000000006e ffff80063b23c000 ffff80063b23c000 0000000000000000 [ 20.556034] fa00: ffff80063b23c000 ffff80063a20d500 000000013b1f1900 0000000000000000 [ 20.563864] fa20: ffff80063ffd18e0 ffff80063b2a0010 ffff80063a71fa60 ffff00000839cd10 [ 20.571694] fa40: ffff80063b2a0010 0000000000000000 ffff80063ffd18e0 000000067abe2adc [ 20.579524] fa60: ffff80063a71fa90 ffff000008096380 ffff80063b2a0010 0000000000000000 [ 20.587353] fa80: 0000000000000000 0000000000000001 ffff80063a71fac0 ffff00000864f770 [ 20.595184] faa0: ffff80063b23caf0 0000000000000000 0000000000000000 0000000000000140 [ 20.603014] fac0: ffff80063a71fb60 ffff0000087e6498 ffff80063a20d500 ffff80063b23c000 [ 20.610843] fae0: 0000000000000000 ffff000008daeaf0 0000000000000000 ffff000008daeb00 [ 20.618673] fb00: ffff80063a71fc0c ffff000008da7000 ffff80063b23c090 ffff80063a44f000 [ 20.626503] fb20: 0000000000000000 ffff000008daeb00 ffff80063a71fc0c ffff000008da7000 [ 20.634333] fb40: ffff80063b23c090 0000000000000000 ffff800600000037 ffff0000087e63d8 [ 20.642163] fb60: ffff80063a71fbc0 ffff000008807510 ffff80063a692400 ffff80063a20d500 [ 20.649993] fb80: ffff80063a44f000 ffff80063b23c000 ffff80063a69249c 0000000000000000 [ 20.657823] fba0: 0000000000000000 ffff80063a087800 ffff80063b23c000 ffff80063a20d500 [ 20.665653] fbc0: ffff80063a71fc10 ffff0000087e67dc ffff80063a20d500 ffff80063a692400 [ 20.673483] fbe0: ffff80063b23c000 0000000000000000 ffff80063a44f000 ffff80063a69249c [ 20.681312] fc00: ffff80063a5f1a10 000000103a087800 ffff80063a71fc70 ffff0000087e6b24 [ 20.689142] fc20: ffff80063a5f1a80 ffff80063a71fde8 000000000000000f 00000000000005ea [ 20.696972] fc40: ffff80063a5f1a10 0000000000000000 000000000000000f ffff00000887fbd0 [ 20.704802] fc60: fffffff43a5f1a80 0000000000000000 ffff80063a71fc80 ffff000008880240 [ 20.712632] fc80: ffff80063a71fd90 ffff0000087c7a34 ffff80063afc7180 0000000000000000 [ 20.720462] fca0: 0000ffffcae6fe18 0000000000000014 0000000060000000 0000000000000015 [ 20.728292] fcc0: 0000000000000123 00000000000000ce ffff0000088d2000 ffff80063b1f1900 [ 20.736122] fce0: 0000000000008933 ffff000008e7cb80 ffff80063a71fd80 ffff0000087c50a4 [ 20.743951] fd00: 0000000000008933 ffff000008e7cb80 ffff000008e7cb80 000000100000000e [ 20.751781] fd20: ffff80063a71fe4c 0000ffff00000300 0000000000000123 0000000000000000 [ 20.759611] fd40: 0000000000000000 ffff80063b1f0000 000000000000000e 0000000000000300 [ 20.767441] fd60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 20.775271] fd80: 0000000000000000 0000000000000000 ffff80063a71fda0 ffff0000087c8c20 [ 20.783100] fda0: 0000000000000000 ffff000008082f30 0000000000000000 0000800637260000 [ 20.790930] fdc0: ffffffffffffffff 0000ffffa0903078 0000000000000000 000000001ea87232 [ 20.798760] fde0: 000000000000000f ffff80063a71fe40 ffff800600000014 ffff000000000001 [ 20.806590] fe00: 0000000000000000 0000000000000000 ffff80063a71fde8 0000000000000000 [ 20.814420] fe20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001 [ 20.822249] fe40: 0000000203000011 0000000000000000 0000000000000000 ffff80063a68aa00 [ 20.830079] fe60: ffff80063a68aa00 0000000000000003 0000000000008933 ffff0000081f1b9c [ 20.837909] fe80: 0000000000000000 ffff000008082f30 0000000000000000 0000800637260000 [ 20.845739] fea0: ffffffffffffffff 0000ffffa07ca81c 0000000060000000 0000000000000015 [ 20.853569] fec0: 0000000000000003 000000001ea87232 000000000000000f 0000000000000000 [ 20.861399] fee0: 0000ffffcae6fe18 0000000000000014 0000000000000300 0000000000000000 [ 20.869228] ff00: 00000000000000ce 0000000000000000 00000000ffffffff 0000000000000000 [ 20.877059] ff20: 0000000000000002 0000ffffcae87ff0 0000ffffa09cfa14 0000ffffa084f588 [ 20.884888] ff40: 0000000000000000 0000ffffa09ba018 0000ffffcae6fb20 000000001ea87010 [ 20.892718] ff60: 0000ffffa09b9000 0000ffffcae6fe30 0000ffffcae6fe18 000000000000000f [ 20.900548] ff80: 0000000000000003 000000001ea87232 0000000000000000 0000000000000000 [ 20.908378] ffa0: 0000000000000000 0000ffffcae6fdc0 0000ffffa09a7824 0000ffffcae6fdc0 [ 20.916208] ffc0: 0000ffffa0903078 0000000060000000 0000000000000003 00000000000000ce [ 20.924038] ffe0: 0000000000000000 0000000000000000 ffffffffffffffff ffffffffffffffff [ 20.931867] Call trace: [ 20.934312] Exception stack(0xffff80063a71f7e0 to 0xffff80063a71f910) [ 20.940750] f7e0: 0000000000000000 0001000000000000 ffff80063a71f9b0 ffff00000839c4c0 [ 20.948580] f800: ffff80063a71f840 ffff00000888a6e4 ffff80063a24c418 ffff80063a24c448 [ 20.956410] f820: 0000000000000000 ffff00000811cd54 ffff80063a71f860 ffff80063a24c458 [ 20.964240] f840: ffff80063a71f870 ffff00000888b258 ffff80063a24c418 0000000000000001 [ 20.972070] f860: ffff80063a71f910 ffff80063a7b7028 ffff80063a71f890 ffff0000088825e4 [ 20.979899] f880: 0000000000000000 00000000bafff000 000000067abe2adc 0000000000000000 [ 20.987729] f8a0: 0000000000000001 0000000000000000 ffff000008ed50d0 0000000000000000 [ 20.995560] f8c0: 0000000000000000 0000000000000000 ffff80063abe2adc ffff000008096360 [ 21.003390] f8e0: 000000000063abe2 0000ffffcae87ff0 0000ffffa09cfa14 0000ffffa084f588 [ 21.011219] f900: ffff0000087c8b70 0000ffffa09ba018 [ 21.016097] [<ffff00000839c4c0>] swiotlb_tbl_map_single+0x178/0x2ec [ 21.022362] [<ffff00000839c680>] map_single+0x4c/0x98 [ 21.027411] [<ffff00000839cd10>] swiotlb_map_page+0xa4/0x138 [ 21.033072] [<ffff000008096380>] __swiotlb_map_page+0x20/0x7c [ 21.038821] [<ffff00000864f770>] ravb_start_xmit+0x174/0x668 [ 21.044484] [<ffff0000087e6498>] dev_hard_start_xmit+0x8c/0x120 [ 21.050407] [<ffff000008807510>] sch_direct_xmit+0x108/0x1a0 [ 21.056064] [<ffff0000087e67dc>] __dev_queue_xmit+0x194/0x4cc [ 21.061807] [<ffff0000087e6b24>] dev_queue_xmit+0x10/0x18 [ 21.067214] [<ffff000008880240>] packet_sendmsg+0xf40/0x1220 [ 21.072873] [<ffff0000087c7a34>] sock_sendmsg+0x18/0x2c [ 21.078097] [<ffff0000087c8c20>] SyS_sendto+0xb0/0xf0 [ 21.083150] [<ffff000008082f30>] el0_svc_naked+0x24/0x28 [ 21.088462] Code: d34bfef7 2a1803f3 1a9f86d6 35fff878 (d4210000) [ 21.094611] ---[ end trace 5bc544ad491f3814 ]--- [ 21.099234] Kernel panic - not syncing: Fatal exception in interrupt [ 21.105587] Kernel Offset: disabled [ 21.109073] Memory Limit: none [ 21.112126] ---[ end Kernel panic - not syncing: Fatal exception in interrupt Fixes: 2f45d190 ("ravb: minimize TX data copying") Signed-off-by: Masaru Nagai <masaru.nagai.vx@renesas.com Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit 8cf699ec ] Disable BH around the call to napi_schedule() to avoid following warning [ 52.095499] NOHZ: local_softirq_pending 08 [ 52.421291] NOHZ: local_softirq_pending 08 [ 52.608313] NOHZ: local_softirq_pending 08 Fixes: 8d59de8f ("net/mlx4_en: Process all completions in RX rings after port goes up") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Erez Shitrit <erezsh@mellanox.com> Cc: Eugenia Emantayev <eugenia@mellanox.com> Cc: Tariq Toukan <tariqt@mellanox.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-