- 22 Oct, 2015 2 commits
-
-
Wanpeng Li authored
commit 04bb92e4 upstream. Reference SDM 28.1: The current VPID is 0000H in the following situations: - Outside VMX operation. (This includes operation in system-management mode under the default treatment of SMIs and SMM with VMX operation; see Section 34.14.) - In VMX root operation. - In VMX non-root operation when the “enable VPID” VM-execution control is 0. The VPID should never be 0000H in non-root operation when "enable VPID" VM-execution control is 1. However, commit 34a1cd60 ("kvm: x86: vmx: move some vmx setting from vmx_init() to hardware_setup()") remove the codes which reserve 0000H for VMX root operation. This patch fix it by again reserving 0000H for VMX root operation. Fixes: 34a1cd60Reported-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marek Majtyka authored
commit ca09f02f upstream. A critical bug has been found in device memory stage1 translation for VMs with more then 4GB of address space. Once vm_pgoff size is smaller then pa (which is true for LPAE case, u32 and u64 respectively) some more significant bits of pa may be lost as a shift operation is performed on u32 and later cast onto u64. Example: vm_pgoff(u32)=0x00210030, PAGE_SHIFT=12 expected pa(u64): 0x0000002010030000 produced pa(u64): 0x0000000010030000 The fix is to change the order of operations (casting first onto phys_addr_t and then shifting). Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> [maz: fixed changelog and patch formatting] Signed-off-by: Marek Majtyka <marek.majtyka@tieto.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 03 Oct, 2015 30 commits
-
-
Greg Kroah-Hartman authored
-
Kyle Evans authored
commit 8a1513b4 upstream. Do not write initialize magic on systems that do not have feature query 0xb. Fixes Bug #82451. Redefine FEATURE_QUERY to align with 0xb and FEATURE2 with 0xd for code clearity. Add a new test function, hp_wmi_bios_2008_later() & simplify hp_wmi_bios_2009_later(), which fixes a bug in cases where an improper value is returned. Probably also fixes Bug #69131. Add missing __init tag. Signed-off-by: Kyle Evans <kvans32@gmail.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Luis Henriques authored
commit 3aaf14da upstream. zcomp_create() verifies the success of zcomp_strm_{multi,single}_create() through comp->stream, which can potentially be pointing to memory that was freed if these functions returned an error. While at it, replace a 'ERR_PTR(-ENOMEM)' by a more generic 'ERR_PTR(error)' as in the future zcomp_strm_{multi,siggle}_create() could return other error codes. Function documentation updated accordingly. Fixes: beca3ec7 ("zram: add multi stream functionality") Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Herbert Xu authored
[ Upstream commit da314c99 ] On Mon, Sep 21, 2015 at 02:20:22PM -0400, Tejun Heo wrote: > > store_release and load_acquire are different from the usual memory > barriers and can't be paired this way. You have to pair store_release > and load_acquire. Besides, it isn't a particularly good idea to OK I've decided to drop the acquire/release helpers as they don't help us at all and simply pessimises the code by using full memory barriers (on some architectures) where only a write or read barrier is needed. > depend on memory barriers embedded in other data structures like the > above. Here, especially, rhashtable_insert() would have write barrier > *before* the entry is hashed not necessarily *after*, which means that > in the above case, a socket which appears to have set bound to a > reader might not visible when the reader tries to look up the socket > on the hashtable. But you are right we do need an explicit write barrier here to ensure that the hashing is visible. > There's no reason to be overly smart here. This isn't a crazy hot > path, write barriers tend to be very cheap, store_release more so. > Please just do smp_store_release() and note what it's paired with. It's not about being overly smart. It's about actually understanding what's going on with the code. I've seen too many instances of people simply sprinkling synchronisation primitives around without any knowledge of what is happening underneath, which is just a recipe for creating hard-to-debug races. > > @@ -1539,7 +1546,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr, > > } > > } > > > > - if (!nlk->portid) { > > + if (!nlk->bound) { > > I don't think you can skip load_acquire here just because this is the > second deref of the variable. That doesn't change anything. Race > condition could still happen between the first and second tests and > skipping the second would lead to the same kind of bug. The reason this one is OK is because we do not use nlk->portid or try to get nlk from the hash table before we return to user-space. However, there is a real bug here that none of these acquire/release helpers discovered. The two bound tests here used to be a single one. Now that they are separate it is entirely possible for another thread to come in the middle and bind the socket. So we need to repeat the portid check in order to maintain consistency. > > @@ -1587,7 +1594,7 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr, > > !netlink_allowed(sock, NL_CFG_F_NONROOT_SEND)) > > return -EPERM; > > > > - if (!nlk->portid) > > + if (!nlk->bound) > > Don't we need load_acquire here too? Is this path holding a lock > which makes that unnecessary? Ditto. ---8<--- The commit 1f770c0a ("netlink: Fix autobind race condition that leads to zero port ID") created some new races that can occur due to inconcsistencies between the two port IDs. Tejun is right that a barrier is unavoidable. Therefore I am reverting to the original patch that used a boolean to indicate that a user netlink socket has been bound. Barriers have been added where necessary to ensure that a valid portid and the hashed socket is visible. I have also changed netlink_insert to only return EBUSY if the socket is bound to a portid different to the requested one. This combined with only reading nlk->bound once in netlink_bind fixes a race where two threads that bind the socket at the same time with different port IDs may both succeed. Fixes: 1f770c0a ("netlink: Fix autobind race condition that leads to zero port ID") Reported-by: Tejun Heo <tj@kernel.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Nacked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Herbert Xu authored
[ Upstream commit 1f770c0a ] The commit c0bb07df ("netlink: Reset portid after netlink_insert failure") introduced a race condition where if two threads try to autobind the same socket one of them may end up with a zero port ID. This led to kernel deadlocks that were observed by multiple people. This patch reverts that commit and instead fixes it by introducing a separte rhash_portid variable so that the real portid is only set after the socket has been successfully hashed. Fixes: c0bb07df ("netlink: Reset portid after netlink_insert failure") Reported-by: Tejun Heo <tj@kernel.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stas Sergeev authored
[ Upstream commit f8af8e6e in net-next tree, will be pushed to Linus very soon. ] The commit 898b2970 ("mvneta: implement SGMII-based in-band link state signaling") implemented the link parameters auto-negotiation unconditionally. Unfortunately it appears that some HW that implements SGMII protocol, doesn't generate the inband status, so it is not possible to auto-negotiate anything with such HW. This patch enables the auto-negotiation only if explicitly requested with the 'managed' DT property. This patch fixes the following regression: https://lkml.org/lkml/2015/7/8/865Signed-off-by: Stas Sergeev <stsp@users.sourceforge.net> CC: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> CC: netdev@vger.kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stas Sergeev authored
[ Upstream commit 4cba5c21 in net-next tree, will be pushed to Linus very soon. ] Currently the PHY management type is selected by the MAC driver arbitrary. The decision is based on the presence of the "fixed-link" node and on a will of the driver's authors. This caused a regression recently, when mvneta driver suddenly started to use the in-band status for auto-negotiation on fixed links. It appears the auto-negotiation may not work when expected by the MAC driver. Sebastien Rannou explains: << Yes, I confirm that my HW does not generate an in-band status. AFAIK, it's a PHY that aggregates 4xSGMIIs to 1xQSGMII ; the MAC side of the PHY (with inband status) is connected to the switch through QSGMII, and in this context we are on the media side of the PHY. >> https://lkml.org/lkml/2015/7/10/206 This patch introduces the new string property 'managed' that allows the user to set the management type explicitly. The supported values are: "auto" - default. Uses either MDIO or nothing, depending on the presence of the fixed-link node "in-band-status" - use in-band status Signed-off-by: Stas Sergeev <stsp@users.sourceforge.net> CC: Rob Herring <robh+dt@kernel.org> CC: Pawel Moll <pawel.moll@arm.com> CC: Mark Rutland <mark.rutland@arm.com> CC: Ian Campbell <ijc+devicetree@hellion.org.uk> CC: Kumar Gala <galak@codeaurora.org> CC: Florian Fainelli <f.fainelli@gmail.com> CC: Grant Likely <grant.likely@linaro.org> CC: devicetree@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stas Sergeev authored
[ Upstream 868a4215 in net-next tree, will be pushed to Linus very soon. ] fixed_phy_register() currently hardcodes the fixed PHY link to 1, and expects to find a "speed" parameter to provide correct information towards the fixed PHY consumer. In a subsequent change, where we allow "managed" (e.g: (RS)GMII in-band status auto-negotiation) fixed PHYs, none of these parameters can be provided since they will be auto-negotiated, hence, we just provide a zero-initialized fixed_phy_status to fixed_phy_register() which makes it fail when we call fixed_phy_update_regs() since status.speed = 0 which makes us hit the "default" label and error out. Without this change, we would also see potentially inconsistent speed/duplex parameters for fixed PHYs when the link is DOWN. CC: netdev@vger.kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: Stas Sergeev <stsp@users.sourceforge.net> [florian: add more background to why this is correct and desirable] Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Florian Fainelli authored
[ Upstream d2eac98f in net-next tree, will be pushed to Linus very soon. ] The SF2 driver currently overrides speed settings for its port configured using a fixed PHY, this is both unnecessary and incorrect, because we keep feedback to the hardware parameters that we read from the PHY device, which in the case of a fixed PHY cannot possibly change speed. This is a required change to allow the fixed PHY code to allow registering a PHY with a link configured as DOWN by default and avoid some sort of circular dependency where we require the link_update callback to run to program the hardware, and we then utilize the fixed PHY parameters to program the hardware with the same settings. Fixes: 246d7f77 ("net: dsa: add Broadcom SF2 switch driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Wilson Kok authored
[ Upstream commit 41fc0143 ] dump_rules returns skb length and not error. But when family == AF_UNSPEC, the caller of dump_rules assumes that it returns an error. Hence, when family == AF_UNSPEC, we continue trying to dump on -EMSGSIZE errors resulting in incorrect dump idx carried between skbs belonging to the same dump. This results in fib rule dump always only dumping rules that fit into the first skb. This patch fixes dump_rules to return error so that we exit correctly and idx is correctly maintained between skbs that are part of the same dump. Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
WANG Cong authored
[ Upstream commit d8aecb10 ] fw filter uses tp->root==NULL to check if it is the old method, so it doesn't need allocation at all in this case. This patch reverts the offending commit and adds some comments for old method to make it obvious. Fixes: 33f8b9ec ("net_sched: move tp->root allocation into fw_init()") Reported-by: Akshat Kakkar <akshat.1984@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit 675ee231 ] RST packets sent on behalf of TCP connections with TS option (RFC 7323 TCP timestamps) have incorrect TS val (set to 0), but correct TS ecr. A > B: Flags [S], seq 0, win 65535, options [mss 1000,nop,nop,TS val 100 ecr 0], length 0 B > A: Flags [S.], seq 2444755794, ack 1, win 28960, options [mss 1460,nop,nop,TS val 7264344 ecr 100], length 0 A > B: Flags [.], ack 1, win 65535, options [nop,nop,TS val 110 ecr 7264344], length 0 B > A: Flags [R.], seq 1, ack 1, win 28960, options [nop,nop,TS val 0 ecr 110], length 0 We need to call skb_mstamp_get() to get proper TS val, derived from skb->skb_mstamp Note that RFC 1323 was advocating to not send TS option in RST segment, but RFC 7323 recommends the opposite : Once TSopt has been successfully negotiated, that is both <SYN> and <SYN,ACK> contain TSopt, the TSopt MUST be sent in every non-<RST> segment for the duration of the connection, and SHOULD be sent in an <RST> segment (see Section 5.2 for details) Note this RFC recommends to send TS val = 0, but we believe it is premature : We do not know if all TCP stacks are properly handling the receive side : When an <RST> segment is received, it MUST NOT be subjected to the PAWS check by verifying an acceptable value in SEG.TSval, and information from the Timestamps option MUST NOT be used to update connection state information. SEG.TSecr MAY be used to provide stricter <RST> acceptance checks. In 5 years, if/when all TCP stack are RFC 7323 ready, we might consider to decide to send TS val = 0, if it buys something. Fixes: 7faee5c0 ("tcp: remove TCP_SKB_CB(skb)->when") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jesse Gross authored
[ Upstream commit ae5f2fb1 ] When support for megaflows was introduced, OVS needed to start installing flows with a mask applied to them. Since masking is an expensive operation, OVS also had an optimization that would only take the parts of the flow keys that were covered by a non-zero mask. The values stored in the remaining pieces should not matter because they are masked out. While this works fine for the purposes of matching (which must always look at the mask), serialization to netlink can be problematic. Since the flow and the mask are serialized separately, the uninitialized portions of the flow can be encoded with whatever values happen to be present. In terms of functionality, this has little effect since these fields will be masked out by definition. However, it leaks kernel memory to userspace, which is a potential security vulnerability. It is also possible that other code paths could look at the masked key and get uninitialized data, although this does not currently appear to be an issue in practice. This removes the mask optimization for flows that are being installed. This was always intended to be the case as the mask optimizations were really targetting per-packet flow operations. Fixes: 03f0d916 ("openvswitch: Mega flow implementation") Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael S. Tsirkin authored
[ Upstream commit 3ea79249 ] Upon TUNSETSNDBUF, macvtap reads the requested sndbuf size into a local variable u. commit 39ec7de7 ("macvtap: fix uninitialized access on TUNSETIFF") changed its type to u16 (which is the right thing to do for all other macvtap ioctls), breaking all values > 64k. The value of TUNSETSNDBUF is actually a signed 32 bit integer, so the right thing to do is to read it into an int. Cc: David S. Miller <davem@davemloft.net> Fixes: 39ec7de7 ("macvtap: fix uninitialized access on TUNSETIFF") Reported-by: Mark A. Peloquin Bisected-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com> Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upsteam commit 4671fc6d ] When changing rss key, we do not want to overwrite user provided key by the one provided by netdev_rss_key_fill(), which is the host random key generated at boot time. Fixes: 947cbb0a ("net/mlx4_en: Support for configurable RSS hash function") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Eyal Perry <eyalpe@mellanox.com> CC: Amir Vadai <amirv@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Linus Lüssing authored
[ Upstream commit c2d4fbd2 ] With the newly introduced helper functions the skb pulling is hidden in the checksumming function - and undone before returning to the caller. The IGMPv3 and MLDv2 report parsing functions in the bridge still assumed that the skb is pointing to the beginning of the IGMP/MLD message while it is now kept at the beginning of the IPv4/6 header, breaking the message parsing and creating packet loss. Fixing this by taking the offset between IP and IGMP/MLD header into account, too. Fixes: 9afd85c9 ("net: Export IGMP/MLD message validation code") Reported-by: Tobias Powalowski <tobias.powalowski@googlemail.com> Tested-by: Tobias Powalowski <tobias.powalowski@googlemail.com> Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marcelo Ricardo Leitner authored
[ Upstream commit 8e2d61e0 ] Consider sctp module is unloaded and is being requested because an user is creating a sctp socket. During initialization, sctp will add the new protocol type and then initialize pernet subsys: status = sctp_v4_protosw_init(); if (status) goto err_protosw_init; status = sctp_v6_protosw_init(); if (status) goto err_v6_protosw_init; status = register_pernet_subsys(&sctp_net_ops); The problem is that after those calls to sctp_v{4,6}_protosw_init(), it is possible for userspace to create SCTP sockets like if the module is already fully loaded. If that happens, one of the possible effects is that we will have readers for net->sctp.local_addr_list list earlier than expected and sctp_net_init() does not take precautions while dealing with that list, leading to a potential panic but not limited to that, as sctp_sock_init() will copy a bunch of blank/partially initialized values from net->sctp. The race happens like this: CPU 0 | CPU 1 socket() | __sock_create | socket() inet_create | __sock_create list_for_each_entry_rcu( | answer, &inetsw[sock->type], | list) { | inet_create /* no hits */ | if (unlikely(err)) { | ... | request_module() | /* socket creation is blocked | * the module is fully loaded | */ | sctp_init | sctp_v4_protosw_init | inet_register_protosw | list_add_rcu(&p->list, | last_perm); | | list_for_each_entry_rcu( | answer, &inetsw[sock->type], sctp_v6_protosw_init | list) { | /* hit, so assumes protocol | * is already loaded | */ | /* socket creation continues | * before netns is initialized | */ register_pernet_subsys | Simply inverting the initialization order between register_pernet_subsys() and sctp_v4_protosw_init() is not possible because register_pernet_subsys() will create a control sctp socket, so the protocol must be already visible by then. Deferring the socket creation to a work-queue is not good specially because we loose the ability to handle its errors. So, as suggested by Vlad, the fix is to split netns initialization in two moments: defaults and control socket, so that the defaults are already loaded by when we register the protocol, while control socket initialization is kept at the same moment it is today. Fixes: 4db67e80 ("sctp: Make the address lists per network namespace") Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Daniel Borkmann authored
[ Upstream commit 1853c949 ] Ken-ichirou reported that running netlink in mmap mode for receive in combination with nlmon will throw a NULL pointer dereference in __kfree_skb() on nlmon_xmit(), in my case I can also trigger an "unable to handle kernel paging request". The problem is the skb_clone() in __netlink_deliver_tap_skb() for skbs that are mmaped. I.e. the cloned skb doesn't have a destructor, whereas the mmap netlink skb has it pointed to netlink_skb_destructor(), set in the handler netlink_ring_setup_skb(). There, skb->head is being set to NULL, so that in such cases, __kfree_skb() doesn't perform a skb_release_data() via skb_release_all(), where skb->head is possibly being freed through kfree(head) into slab allocator, although netlink mmap skb->head points to the mmap buffer. Similarly, the same has to be done also for large netlink skbs where the data area is vmalloced. Therefore, as discussed, make a copy for these rather rare cases for now. This fixes the issue on my and Ken-ichirou's test-cases. Reference: http://thread.gmane.org/gmane.linux.network/371129 Fixes: bcbde0d4 ("net: netlink: virtual tap device management") Reported-by: Ken-ichirou MATSUZAWA <chamaken@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Ken-ichirou MATSUZAWA <chamaken@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Florian Fainelli authored
[ Upstream commit 03679a14 ] The macro to write 64-bits quantities to the 32-bits register swapped the value and offsets arguments, we want to preserve the ordering of the arguments with respect to how writel() is implemented for instance: value first, offset/base second. Fixes: 246d7f77 ("net: dsa: add Broadcom SF2 switch driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Roopa Prabhu authored
[ Upstream commit 6b9ea5a6 ] Problem: The ecmp route replace support for ipv6 in the kernel, deletes the existing ecmp route too early, ie when it installs the first nexthop. If there is an error in installing the subsequent nexthops, its too late to recover the already deleted existing route leaving the fib in an inconsistent state. This patch reduces the possibility of this by doing the following: a) Changes the existing multipath route add code to a two stage process: build rt6_infos + insert them ip6_route_add rt6_info creation code is moved into ip6_route_info_create. b) This ensures that most errors are caught during building rt6_infos and we fail early c) Separates multipath add and del code. Because add needs the special two stage mode in a) and delete essentially does not care. d) In any event if the code fails during inserting a route again, a warning is printed (This should be unlikely) Before the patch: $ip -6 route show 3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024 /* Try replacing the route with a duplicate nexthop */ $ip -6 route change 3000:1000:1000:1000::2/128 nexthop via fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1 RTNETLINK answers: File exists $ip -6 route show /* previously added ecmp route 3000:1000:1000:1000::2 dissappears from * kernel */ After the patch: $ip -6 route show 3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024 /* Try replacing the route with a duplicate nexthop */ $ip -6 route change 3000:1000:1000:1000::2/128 nexthop via fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1 RTNETLINK answers: File exists $ip -6 route show 3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024 Fixes: 27596472 ("ipv6: fix ECMP route replacement") Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Florian Fainelli authored
[ Upstream commit 39797a27 ] The comparison check between cur_hw_state and hw_state is currently invalid because cur_hw_state is right shifted by G_MISTP_SHIFT, while hw_state is not, so we end-up comparing bits 2:0 with bits 7:5, which is going to cause an additional aging to occur. Fix this by not shifting cur_hw_state while reading it, but instead, mask the value with the appropriately shitfted bitmask. The other problem with the fast-ageing process is that we did not set the EN_AGE_DYNAMIC bit to request the ageing to occur for dynamically learned MAC addresses. Finally, write back 0 to the FAST_AGE_CTRL register to avoid leaving spurious bits sets from one operation to the other. Fixes: 12f460f2 ("net: dsa: bcm_sf2: add HW bridging support") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Richard Laing authored
[ Upstream commit 25b4a44c ] In the IPv6 multicast routing code the mrt_lock was not being released correctly in the MFC iterator, as a result adding or deleting a MIF would cause a hang because the mrt_lock could not be acquired. This fix is a copy of the code for the IPv4 case and ensures that the lock is released correctly. Signed-off-by: Richard Laing <richard.laing@alliedtelesis.co.nz> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Atsushi Nemoto authored
[ Upstream commit 4548a697 ] tse_poll() calls __napi_complete() with irq enabled. This leads napi poll_list corruption and may stop all napi drivers working. Use napi_complete() instead of __napi_complete(). Signed-off-by: Atsushi Nemoto <nemoto@toshiba-tops.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Russell King authored
[ Upstream commit ed63f1dc ] The patch just to re-submit the patch "db3421c1" because the patch "4d494cdc" remove the change. Clear any pending receive interrupt before we process a pending packet. This helps to avoid any spurious interrupts being raised after we have fully cleaned the receive ring, while still allowing an interrupt to be raised if we receive another packet. The position of this is critical: we must do this prior to reading the next packet status to avoid potentially dropping an interrupt when a packet is still pending. Acked-by: Fugang Duan <B38611@freescale.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Daniel Borkmann authored
[ Upstream commit e41b0bed ] We previously register IPPROTO_ROUTING offload under inet6_add_offload(), but in error path, we try to unregister it with inet_del_offload(). This doesn't seem correct, it should actually be inet6_del_offload(), also ipv6_exthdrs_offload_exit() from that commit seems rather incorrect (it also uses rthdr_offload twice), but it got removed entirely later on. Fixes: 3336288a ("ipv6: Switch to using new offload infrastructure.") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Daniel Borkmann authored
[ Upstream commit b382c086 ] diag socket's sock_diag_put_filterinfo() dumps classic BPF programs upon request to user space (ss -0 -b). However, native eBPF programs attached to sockets (SO_ATTACH_BPF) cannot be dumped with this method: Their orig_prog is always NULL. However, sock_diag_put_filterinfo() unconditionally tries to access its filter length resp. wants to copy the filter insns from there. Internal cBPF to eBPF transformations attached to sockets don't have this issue, as orig_prog state is kept. It's currently only used by packet sockets. If we would want to add native eBPF support in the future, this needs to be done through a different attribute than PACKET_DIAG_FILTER to not confuse possible user space disassemblers that work on diag data. Fixes: 89aa0758 ("net: sock: allow eBPF programs to be attached to sockets") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eugene Shatokhin authored
[ Upstream commit f50791ac ] It is needed to check EVENT_NO_RUNTIME_PM bit of dev->flags in usbnet_stop(), but its value should be read before it is cleared when dev->flags is set to 0. The problem was spotted and the fix was provided by Oliver Neukum <oneukum@suse.de>. Signed-off-by: Eugene Shatokhin <eugene.shatokhin@rosalab.ru> Acked-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
WANG Cong authored
[ Upstream commit a6c1aea0 ] In commit 1e052be6 ("net_sched: destroy proto tp when all filters are gone") I added a check in u32_destroy() to see if all real filters are gone for each tp, however, that is only done for root_ht, same is needed for others. This can be reproduced by the following tc commands: tc filter add dev eth0 parent 1:0 prio 5 handle 15: protocol ip u32 divisor 256 tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:2 u32 ht 15:2: match ip src 10.0.0.2 flowid 1:10 tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:3 u32 ht 15:2: match ip src 10.0.0.3 flowid 1:10 Fixes: 1e052be6 ("net_sched: destroy proto tp when all filters are gone") Reported-by: Akshat Kakkar <akshat.1984@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marcelo Ricardo Leitner authored
[ Upstream commit bef0057b ] Before 56ef9c90[1] it used to ignore all errors from igmp_join(). That commit enhanced that and made it error out whatever error happened with igmp_join(), but that's not good because when using multicast groups vxlan will try to join it multiple times if the socket is reused and then the 2nd and further attempts will fail with EADDRINUSE. As we don't track to which groups the socket is already subscribed, it's okay to just ignore that error. Fixes: 56ef9c90 ("vxlan: Move socket initialization to within rtnl scope") Reported-by: John Nielsen <lists@jnielsen.net> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
huaibin Wang authored
[ Upstream commit d4257295 ] When a tunnel is deleted, the cached dst entry should be released. This problem may prevent the removal of a netns (seen with a x-netns IPv6 gre tunnel): unregister_netdevice: waiting for lo to become free. Usage count = 3 CC: Dmitry Kozlov <xeb@mail.ru> Fixes: c12b395a ("gre: Support GRE over IPv6") Signed-off-by: huaibin Wang <huaibin.wang@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 29 Sep, 2015 8 commits
-
-
Greg Kroah-Hartman authored
-
Daniel Axtens authored
commit 4e1efb40 upstream. If the driver doesn't participate in EEH, the AFUs will be removed by cxl_remove, which will be invoked by EEH. If the driver does particpate in EEH, the vPHB needs to stick around so that the it can particpate. In both cases, we shouldn't remove the AFU/vPHB. Reviewed-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andy Whitcroft authored
[ Upstream commit 25b97c01 ] When generating /proc/net/route we emit a header followed by a line for each route. When a short read is performed we will restart this process based on the open file descriptor. When calculating the start point we fail to take into account that the 0th entry is the header. This leads us to skip the first entry when doing a continuation read. This can be easily seen with the comparison below: while read l; do echo "$l"; done </proc/net/route >A cat /proc/net/route >B diff -bu A B | grep '^[+-]' On my example machine I have approximatly 10KB of route output. There we see the very first non-title element is lost in the while read case, and an entry around the 8K mark in the cat case: +wlan0 00000000 02021EAC 0003 0 0 400 00000000 0 0 0 -tun1 00C0AC0A 00000000 0001 0 0 950 00C0FFFF 0 0 0 Fix up the off-by-one when reaquiring position on continuation. Fixes: 8be33e95 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf") BugLink: http://bugs.launchpad.net/bugs/1483440Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Florian Fainelli authored
[ Upstream commit 211c504a ] In case we need to divert reads/writes using the slave MII bus, we may have already fetched a valid PHY interface property from Device Tree, and that mode is used by the PHY driver to make configuration decisions. If we could not fetch the "phy-mode" property, we will assign p->phy_interface to PHY_INTERFACE_MODE_NA, such that we can actually check for that condition as to whether or not we should override the interface value. Fixes: 19334920 ("net: dsa: Set valid phy interface type") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit 2235f2ac ] reqsk_queue_destroy() and reqsk_queue_unlink() should use del_timer_sync() instead of del_timer() before calling reqsk_put(), otherwise we could free a req still used by another cpu. But before doing so, reqsk_queue_destroy() must release syn_wait_lock spinlock or risk a dead lock, as reqsk_timer_handler() might need to take this same spinlock from reqsk_queue_unlink() (called from inet_csk_reqsk_queue_drop()) Fixes: fa76ce73 ("inet: get rid of central tcp/dccp listener timer") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit 3257d8b1 ] In commit b357a364 ("inet: fix possible panic in reqsk_queue_unlink()"), I missed fact that tcp_check_req() can return the listener socket in one case, and that we must release the request socket refcount or we leak it. Tested: Following packetdrill test template shows the issue 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 2920 <mss 1460,sackOK,nop,nop> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK> +.002 < . 1:1(0) ack 21 win 2920 +0 > R 21:21(0) Fixes: b357a364 ("inet: fix possible panic in reqsk_queue_unlink()") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Daniel Borkmann authored
[ Upstream commit 4e7c1330 ] Linus reports the following deadlock on rtnl_mutex; triggered only once so far (extract): [12236.694209] NetworkManager D 0000000000013b80 0 1047 1 0x00000000 [12236.694218] ffff88003f902640 0000000000000000 ffffffff815d15a9 0000000000000018 [12236.694224] ffff880119538000 ffff88003f902640 ffffffff81a8ff84 00000000ffffffff [12236.694230] ffffffff81a8ff88 ffff880119c47f00 ffffffff815d133a ffffffff81a8ff80 [12236.694235] Call Trace: [12236.694250] [<ffffffff815d15a9>] ? schedule_preempt_disabled+0x9/0x10 [12236.694257] [<ffffffff815d133a>] ? schedule+0x2a/0x70 [12236.694263] [<ffffffff815d15a9>] ? schedule_preempt_disabled+0x9/0x10 [12236.694271] [<ffffffff815d2c3f>] ? __mutex_lock_slowpath+0x7f/0xf0 [12236.694280] [<ffffffff815d2cc6>] ? mutex_lock+0x16/0x30 [12236.694291] [<ffffffff814f1f90>] ? rtnetlink_rcv+0x10/0x30 [12236.694299] [<ffffffff8150ce3b>] ? netlink_unicast+0xfb/0x180 [12236.694309] [<ffffffff814f5ad3>] ? rtnl_getlink+0x113/0x190 [12236.694319] [<ffffffff814f202a>] ? rtnetlink_rcv_msg+0x7a/0x210 [12236.694331] [<ffffffff8124565c>] ? sock_has_perm+0x5c/0x70 [12236.694339] [<ffffffff814f1fb0>] ? rtnetlink_rcv+0x30/0x30 [12236.694346] [<ffffffff8150d62c>] ? netlink_rcv_skb+0x9c/0xc0 [12236.694354] [<ffffffff814f1f9f>] ? rtnetlink_rcv+0x1f/0x30 [12236.694360] [<ffffffff8150ce3b>] ? netlink_unicast+0xfb/0x180 [12236.694367] [<ffffffff8150d344>] ? netlink_sendmsg+0x484/0x5d0 [12236.694376] [<ffffffff810a236f>] ? __wake_up+0x2f/0x50 [12236.694387] [<ffffffff814cad23>] ? sock_sendmsg+0x33/0x40 [12236.694396] [<ffffffff814cb05e>] ? ___sys_sendmsg+0x22e/0x240 [12236.694405] [<ffffffff814cab75>] ? ___sys_recvmsg+0x135/0x1a0 [12236.694415] [<ffffffff811a9d12>] ? eventfd_write+0x82/0x210 [12236.694423] [<ffffffff811a0f9e>] ? fsnotify+0x32e/0x4c0 [12236.694429] [<ffffffff8108cb70>] ? wake_up_q+0x60/0x60 [12236.694434] [<ffffffff814cba09>] ? __sys_sendmsg+0x39/0x70 [12236.694440] [<ffffffff815d4797>] ? entry_SYSCALL_64_fastpath+0x12/0x6a It seems so far plausible that the recursive call into rtnetlink_rcv() looks suspicious. One way, where this could trigger is that the senders NETLINK_CB(skb).portid was wrongly 0 (which is rtnetlink socket), so the rtnl_getlink() request's answer would be sent to the kernel instead to the actual user process, thus grabbing rtnl_mutex() twice. One theory would be that netlink_autobind() triggered via netlink_sendmsg() internally overwrites the -EBUSY error to 0, but where it is wrongly originating from __netlink_insert() instead. That would reset the socket's portid to 0, which is then filled into NETLINK_CB(skb).portid later on. As commit d470e3b4 ("[NETLINK]: Fix two socket hashing bugs.") also puts it, -EBUSY should not be propagated from netlink_insert(). It looks like it's very unlikely to reproduce. We need to trigger the rhashtable_insert_rehash() handler under a situation where rehashing currently occurs (one /rare/ way would be to hit ht->elasticity limits while not filled enough to expand the hashtable, but that would rather require a specifically crafted bind() sequence with knowledge about destination slots, seems unlikely). It probably makes sense to guard __netlink_insert() in any case and remap that error. It was suggested that EOVERFLOW might be better than an already overloaded ENOMEM. Reference: http://thread.gmane.org/gmane.linux.network/372676Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ivan Vecera authored
[ Upstream commit ade4dc3e ] The commit "e29aa339 bna: Enable Multi Buffer RX" moved packets counter increment from the beginning of the NAPI processing loop after the check for erroneous packets so they are never accounted. This counter is used to inform firmware about number of processed completions (packets). As these packets are never acked the firmware fires IRQs for them again and again. Fixes: e29aa339 ("bna: Enable Multi Buffer RX") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Acked-by: Rasesh Mody <rasesh.mody@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-