- 14 Jan, 2015 9 commits
-
-
Prashant Sreedharan authored
This is to avoid the race between tg3_timer() and the execution paths which does not invoke tg3_timer_stop() and releases tp->lock before calling synchronize_irq() Reported-by: Peter Hurley <peter@hurleysoftware.com> Tested-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Prashant Sreedharan <prashant@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
This patch is fixing a race condition that may cause setting count_pending to -1, which results in unwanted big bulk of arp messages (in case of "notify peers"). Consider following scenario: count_pending == 2 CPU0 CPU1 team_notify_peers_work atomic_dec_and_test (dec count_pending to 1) schedule_delayed_work team_notify_peers atomic_add (adding 1 to count_pending) team_notify_peers_work atomic_dec_and_test (dec count_pending to 1) schedule_delayed_work team_notify_peers_work atomic_dec_and_test (dec count_pending to 0) schedule_delayed_work team_notify_peers_work atomic_dec_and_test (dec count_pending to -1) Fix this race by using atomic_dec_if_positive - that will prevent count_pending running under 0. Fixes: fc423ff0 ("team: add peer notification") Fixes: 492b200e ("team: add support for sending multicast rejoins") Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Graf authored
User space is currently sending a OVS_FLOW_ATTR_PROBE for both flow and packet messages. This leads to an out-of-bounds access in ovs_packet_cmd_execute() because OVS_FLOW_ATTR_PROBE > OVS_PACKET_ATTR_MAX. Introduce a new OVS_PACKET_ATTR_PROBE with the same numeric value as OVS_FLOW_ATTR_PROBE to grow the range of accepted packet attributes while maintaining to be binary compatible with existing OVS binaries. Fixes: 05da5898 ("openvswitch: Add support for OVS_FLOW_ATTR_PROBE.") Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Tracked-down-by: Florian Westphal <fw@strlen.de> Signed-off-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Jesse Gross <jesse@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vasu Dev authored
Adds FCoE config option I40E_FCOE, so that FCoE can be enabled as needed but otherwise have it disabled by default. This also eliminate multiple FCoE config checks, instead now just one config check for CONFIG_I40E_FCOE. The I40E FCoE was added with 3.17 kernel and therefore this patch shall be applied to stable 3.17 kernel also. CC: <stable@vger.kernel.org> Signed-off-by: Vasu Dev <vasu.dev@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hariprasad Shenai authored
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Benjamin Poirier authored
For example, one could conceivably call for_each_netdev_in_bond_rcu(condition ? bond1 : bond2, slave) and get an unexpected result. Signed-off-by: Benjamin Poirier <bpoirier@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
When IPV4 support is disabled, we cannot call arp_send from the bridge code, which would result in a kernel link error: net/built-in.o: In function `br_handle_frame_finish': :(.text+0x59914): undefined reference to `arp_send' :(.text+0x59a50): undefined reference to `arp_tbl' This makes the newly added proxy ARP support in the bridge code depend on the CONFIG_INET symbol and lets the compiler optimize the code out to avoid the link error. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 95850116 ("bridge: Add support for IEEE 802.11 Proxy ARP") Cc: Kyeyoon Park <kyeyoonp@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jean-Francois Remy authored
When setting base_reachable_time or base_reachable_time_ms on a specific interface through sysctl or netlink, the reachable_time value is not updated. This means that neighbour entries will continue to be updated using the old value until it is recomputed in neigh_period_work (which recomputes the value every 300*HZ). On systems with HZ equal to 1000 for instance, it means 5mins before the change is effective. This patch changes this behavior by recomputing reachable_time after each set on base_reachable_time or base_reachable_time_ms. The new value will become effective the next time the neighbour's timer is triggered. Changes are made in two places: the netlink code for set and the sysctl handling code. For sysctl, I use a proc_handler. The ipv6 network code does provide its own handler but it already refreshes reachable_time correctly so it's not an issue. Any other user of neighbour which provide its own handlers must refresh reachable_time. Signed-off-by: Jean-Francois Remy <jeff@melix.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefan Agner authored
On i.MX28, the MDIO bus is shared between the two FEC instances. The driver makes sure that the second FEC uses the MDIO bus of the first FEC. This is done conditionally if FEC_QUIRK_ENET_MAC is set. However, in newer designs, such as Vybrid or i.MX6SX, each FEC MAC has its own MDIO bus. Simply removing the quirk FEC_QUIRK_ENET_MAC is not an option since other logic, triggered by this quirk, is still needed. Furthermore, there are board designs which use the same MDIO bus for both PHY's even though the second bus would be available on the SoC side. Such layout are popular since it saves pins on SoC side. Due to the above quirk, those boards currently do work fine. The boards in the mainline tree with such a layout are: - Freescale Vybrid Tower with TWR-SER2 (vf610-twr.dts) - Freescale i.MX6 SoloX SDB Board (imx6sx-sdb.dts) This patch adds a new quirk FEC_QUIRK_SINGLE_MDIO for i.MX28, which makes sure that the MDIO bus of the first FEC is used in any case. However, the boards above do have a SoC with a MDIO bus for each FEC instance. But the PHY's are not connected in a 1:1 configuration. A proper device tree description is needed to allow the driver to figure out where to find its PHY. This patch fixes that shortcoming by adding a MDIO bus child node to the first FEC instance, along with the two PHY's on that bus, and making use of the phy-handle property to add a reference to the PHY's. Acked-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Stefan Agner <stefan@agner.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 13 Jan, 2015 4 commits
-
-
David Vrabel authored
In netfront the Rx and Tx path are independent and use different locks. The Tx lock is held with hard irqs disabled, but Rx lock is held with only BH disabled. Since both sides use the same stats lock, a deadlock may occur. [ INFO: possible irq lock inversion dependency detected ] 3.16.2 #16 Not tainted --------------------------------------------------------- swapper/0/0 just changed the state of lock: (&(&queue->tx_lock)->rlock){-.....}, at: [<c03adec8>] xennet_tx_interrupt+0x14/0x34 but this lock took another, HARDIRQ-unsafe lock in the past: (&stat->syncp.seq#2){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&stat->syncp.seq#2); local_irq_disable(); lock(&(&queue->tx_lock)->rlock); lock(&stat->syncp.seq#2); <Interrupt> lock(&(&queue->tx_lock)->rlock); Using separate locks for the Rx and Tx stats fixes this deadlock. Reported-by: Dmitry Piotrovsky <piotrovskydmitry@gmail.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Mugunthan V N authored
Since ALE table is a common resource for both the interfaces in Dual EMAC mode and while bringing up the second interface in cpsw_ndo_set_rx_mode() all the multicast entries added by the first interface is flushed out and only second interface multicast addresses are added. Fixing this by flushing multicast addresses based on dual EMAC port vlans which will not affect the other emac port multicast addresses. Fixes: d9ba8f9e (driver: net: ethernet: cpsw: dual emac interface implementation) Cc: <stable@vger.kernel.org> # v3.9+ Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hariprasad Shenai authored
In commit 5ad24def ("cxgb4vf: Fix ethtool get_settings for VF driver") mdio_addr of port_info structure was used unininitialzed. Fixing it. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
B Viswanath authored
net: Corrected the comment describing the ndo operations to reflect the actual prototype for couple of operations Corrected the comment describing the ndo operations to reflect the actual prototype for couple of operations Signed-off-by: B Viswanath <marichika4@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 12 Jan, 2015 11 commits
-
-
Alexey Khoroshilov authored
Commit e4c7f259 ("USB: kaweth.c: use GFP_ATOMIC under spin_lock") makes sure that kaweth_internal_control_msg() allocates memory with GFP_ATOMIC, but kaweth_internal_control_msg() also calls usb_start_wait_urb() that still allocates memory with GFP_NOIO. The patch fixes usb_start_wait_urb() as well. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Oliver Neukum <oliver@neukum.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Falcon authored
Adding myself as the ibmveth maintainer and replacing Santiago Leon. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Cc: Santiago Leon <santi_leon@yahoo.com> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jon Paul Maloy authored
In commit 58dc55f2 ("tipc: use generic SKB list APIs to manage link transmission queue") we replace all list traversal loops with the macros skb_queue_walk() or skb_queue_walk_safe(). While the previous loops were based on the assumption that the list was NULL-terminated, the standard macros stop when the iterator reaches the list head, which is non-NULL. In the function bclink_retransmit_pkt() this macro replacement has lead to a bug. When we receive a BCAST STATE_MSG we unconditionally call the function bclink_retransmit_pkt(), whether there really is anything to retransmit or not, assuming that the sequence number comparisons will lead to the correct behavior. However, if the transmission queue is empty, or if there are no eligible buffers in the transmission queue, we will by mistake pass the list head pointer to the function tipc_link_retransmit(). Since the list head is not a valid sk_buff, this leads to a crash. In this commit we fix this by only calling tipc_link_retransmit() if we actually found eligible buffers in the transmission queue. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ani Sinha authored
Update documentation to reflect the fact that /proc/sys/net/ipv4/route/max_size is no longer used for ipv4. Signed-off-by: Ani Sinha <ani@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexandre Belloni authored
The clock is enabled without being prepared, this leads to: WARNING: CPU: 0 PID: 1 at drivers/clk/clk.c:889 __clk_enable+0x24/0xa8() and a non working ethernet interface. Use clk_prepare_enable() and clk_disable_unprepare() to handle the clock. Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Giel van Schijndel authored
In C one can either use '\0' or '\x00' (or '\000') to add a NUL byte to a string. '\0x00' isn't part of these and will in fact result in a single NUL followed by "x00". This fixes that. Signed-off-by: Giel van Schijndel <me@mortis.eu> Reported-at: http://www.viva64.com/en/b/0299/Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Merge tag 'wireless-drivers-for-davem-2015-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers * rtlwifi: fix a regression in large skb allocation failure iwlwifi: * fix for 7265D NVM check * fixes for scan: fix long scanning times and network discovery * new firmware API for iwlmvm supported devices * fixes in rate control Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller authored
Pablo Neira Ayuso says: ==================== netfilter/ipvs fixes for net The following patchset contains netfilter/ipvs fixes, they are: 1) Small fix for the FTP helper in IPVS, a diff variable may be left unset when CONFIG_IP_VS_IPV6 is set. Patch from Dan Carpenter. 2) Fix nf_tables port NAT in little endian archs, patch from leroy christophe. 3) Fix race condition between conntrack confirmation and flush from userspace. This is the second reincarnation to resolve this problem. 4) Make sure inner messages in the batch come with the nfnetlink header. 5) Relax strict check from nfnetlink_bind() that may break old userspace applications using all 1s group mask. 6) Schedule removal of chains once no sets and rules refer to them in the new nf_tables ruleset flush command. Reported by Asbjoern Sloth Toennesen. Note that this batch comes later than usual because of the short winter holidays. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christoph Jaeger authored
Due to a misplaced parenthesis, the expression (unlikely(offset) < 0), which expands to (__builtin_expect(!!(offset), 0) < 0), never evaluates to true. Therefore, when sending packets with PF_PACKET/SOCK_DGRAM, packet_snd() does not abort as intended if the creation of the layer 2 header fails. Spotted by Coverity - CID 1259975 ("Operands don't affect result"). Fixes: 9c707762 ("packet: make packet_snd fail on len smaller than l2 header") Signed-off-by: Christoph Jaeger <cj@linux.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Commit d75b1ade ("net: less interrupt masking in NAPI") uncovered wrong alx_poll() behavior. A NAPI poll() handler is supposed to return exactly the budget when/if napi_complete() has not been called. It is also supposed to return number of frames that were received, so that netdev_budget can have a meaning. Also, in case of TX pressure, we still have to dequeue received packets : alx_clean_rx_irq() has to be called even if alx_clean_tx_irq(alx) returns false, otherwise device is half duplex. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: d75b1ade ("net: less interrupt masking in NAPI") Reported-by: Oded Gabbay <oded.gabbay@amd.com> Bisected-by: Oded Gabbay <oded.gabbay@amd.com> Tested-by: Oded Gabbay <oded.gabbay@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
A NAPI poll() handler is supposed to return exactly the budget when/if napi_complete() has not been called. It is also supposed to return number of frames that were received, so that netdev_budget can have a meaning. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 09 Jan, 2015 4 commits
-
-
Hubert Feurstein authored
This patch initialises the fep->netdev pointer. This pointer was not initialised at all, but is used in fec_enet_timeout_work and in some error paths. Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nobuhiro Iwamatsu authored
TRSCER register is configured differently by SoCs. TRSCER of R-Car Gen2 is RINT8 bit only valid, other bits are reserved bits. This removes access to TRSCER register reserve bit by adding variable trscer_err_mask to sh_eth_cpu_data structure, set the register information to each SoCs. Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nobuhiro Iwamatsu authored
FDR register of R-Car set in fdr_value can have the original settings. This sets the value that is suitable for each SoCs to fdr_value of R8A777x and R8A779x. Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/netDavid S. Miller authored
Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2015-01-06 This series contains fixes to i40e only. Jesse provides a fix for when the driver was polling with interrupts disabled the hardware would occasionally not write back descriptors. His fix causes the driver to detect this situation and force an interrupt to fire which will flush the stuck descriptor. Anjali provides a couple of fixes, the first corrects an issue where the receive port checksum error counter was incrementing incorrectly with UDP encapsulated tunneled traffic. The second fix resolves an issue where the driver was examining the outer protocol layer to set the inner protocol layer checksum offload. In the case of TCP over IPv6 over an IPv4 based VXLAN, the inner checksum offloads would be set to look for IPv4/UDP instead of IPv6/TCP, so fixed the issue so that the driver will look at the proper layer for encapsulation offload settings. v2: fixed a bug in patch 01 of the series, where the interrupt rate impacted 4 port workloads by reducing throughput. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 07 Jan, 2015 5 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds authored
Pull networking fixes from David Miller: "Just a pile of random fixes, including: 1) Do not apply TSO limits to non-TSO packets, fix from Herbert Xu. 2) MDI{,X} eeprom check in e100 driver is reversed, from John W. Linville. 3) Missing error return assignments in several ethernet drivers, from Julia Lawall. 4) Altera TSE device doesn't come back up after ifconfig down/up sequence, fix from Kostya Belezko. 5) Add more cases to the check for whether the qmi_wwan device has a bogus MAC address and needs to be assigned a random one. From Kristian Evensen. 6) Fix interrupt hangs in CPSW, from Felipe Balbi. 7) Implement ndo_features_check in r8152 so that the stack doesn't feed GSO packets which are outside of the chip's capabilities. From Hayes Wang" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits) qla3xxx: don't allow never end busy loop xen-netback: fixing the propagation of the transmit shaper timeout r8152: support ndo_features_check batman-adv: fix potential TT client + orig-node memory leak batman-adv: fix multicast counter when purging originators batman-adv: fix counter for multicast supporting nodes batman-adv: fix lock class for decoding hash in network-coding.c batman-adv: fix delayed foreign originator recognition batman-adv: fix and simplify condition when bonding should be used Revert "mac80211: Fix accounting of the tailroom-needed counter" net: ethernet: cpsw: fix hangs with interrupts enic: free all rq buffs when allocation fails qmi_wwan: Set random MAC on devices with buggy fw openvswitch: Consistently include VLAN header in flow and port stats. tcp: Do not apply TSO segment limit to non-TSO packets Altera TSE: Add missing phydev net/mlx4_core: Fix error flow in mlx4_init_hca() net/mlx4_core: Correcly update the mtt's offset in the MR re-reg flow qlcnic: Fix return value in qlcnic_probe() net: axienet: fix error return code ...
-
git://git.code.sf.net/p/openipmi/linux-ipmiLinus Torvalds authored
Pull IPMI fixlet from Corey Minyard: "Fix a compile warning" * tag 'for-linus-3' of git://git.code.sf.net/p/openipmi/linux-ipmi: ipmi: Fix compile warning with tv_usec
-
Anjali Singhai authored
The driver was examining the outer protocol layer to set the inner protocol layer checksum offload. In the case of TCP over IPV6 over an IPv4 based VXLAN the inner checksum offloads would be set to look for IPv4/UDP instead of IPv6/TCP. This code fixes that so that the driver will look at the proper layer for encapsulation offload settings. Signed-off-by: Anjali Singhai <anjali.singhai@intel.com> Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Anjali Singhai authored
The Rx port checksum error counter was incrementing incorrectly with UDP encapsulated tunneled traffic. This patch fixes the problem so that the port_rx_csum counter will show accurate statistics. Signed-off-by: Anjali Singhai <anjali.singhai@intel.com> Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jesse Brandeburg authored
When the driver was polling with interrupts disabled the hardware will occasionally not write back descriptors. This patch causes the driver to detect this situation and force an interrupt to fire which will flush the stuck descriptor. Does not conflict with napi because if we are already polling the napi_schedule is ignored. Additionally the extra interrupts are rate limited, so don't cause a burden to the CPU. Change-ID: Iba4616d2a71288672a5f08e4512e2704b97335e8 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
- 06 Jan, 2015 7 commits
-
-
Andy Shevchenko authored
The counter variable wasn't increased at all which may stuck under certain circumstances. Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4Linus Torvalds authored
Pull ext4 bugfixes from Ted Ts'o: "Revert a potential seek_data/hole regression which shows up when using ext4 to handle ext3 file systems, plus two minor bug fixes" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: remove spurious KERN_INFO from ext4_warning call Revert "ext4: fix suboptimal seek_{data,hole} extents traversial" ext4: prevent online resize with backup superblock
-
Pablo Neira Ayuso authored
Jumping between chains doesn't mix well with flush ruleset. Rules from a different chain and set elements may still refer to us. [ 353.373791] ------------[ cut here ]------------ [ 353.373845] kernel BUG at net/netfilter/nf_tables_api.c:1159! [ 353.373896] invalid opcode: 0000 [#1] SMP [ 353.373942] Modules linked in: intel_powerclamp uas iwldvm iwlwifi [ 353.374017] CPU: 0 PID: 6445 Comm: 31c3.nft Not tainted 3.18.0 #98 [ 353.374069] Hardware name: LENOVO 5129CTO/5129CTO, BIOS 6QET47WW (1.17 ) 07/14/2010 [...] [ 353.375018] Call Trace: [ 353.375046] [<ffffffff81964c31>] ? nf_tables_commit+0x381/0x540 [ 353.375101] [<ffffffff81949118>] nfnetlink_rcv+0x3d8/0x4b0 [ 353.375150] [<ffffffff81943fc5>] netlink_unicast+0x105/0x1a0 [ 353.375200] [<ffffffff8194438e>] netlink_sendmsg+0x32e/0x790 [ 353.375253] [<ffffffff818f398e>] sock_sendmsg+0x8e/0xc0 [ 353.375300] [<ffffffff818f36b9>] ? move_addr_to_kernel.part.20+0x19/0x70 [ 353.375357] [<ffffffff818f44f9>] ? move_addr_to_kernel+0x19/0x30 [ 353.375410] [<ffffffff819016d2>] ? verify_iovec+0x42/0xd0 [ 353.375459] [<ffffffff818f3e10>] ___sys_sendmsg+0x3f0/0x400 [ 353.375510] [<ffffffff810615fa>] ? native_sched_clock+0x2a/0x90 [ 353.375563] [<ffffffff81176697>] ? acct_account_cputime+0x17/0x20 [ 353.375616] [<ffffffff8110dc78>] ? account_user_time+0x88/0xa0 [ 353.375667] [<ffffffff818f4bbd>] __sys_sendmsg+0x3d/0x80 [ 353.375719] [<ffffffff81b184f4>] ? int_check_syscall_exit_work+0x34/0x3d [ 353.375776] [<ffffffff818f4c0d>] SyS_sendmsg+0xd/0x20 [ 353.375823] [<ffffffff81b1826d>] system_call_fastpath+0x16/0x1b Release objects in this order: rules -> sets -> chains -> tables, to make sure no references to chains are held anymore. Reported-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.biz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Pablo Neira Ayuso authored
Relax the checking that was introduced in 97840cb6 ("netfilter: nfnetlink: fix insufficient validation in nfnetlink_bind") when the subscription bitmask is used. Existing userspace code code may request to listen to all of the existing netlink groups by setting an all to one subscription group bitmask. Netlink already validates subscription via setsockopt() for us. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Pablo Neira Ayuso authored
Make sure there is enough room for the nfnetlink header in the netlink messages that are part of the batch. There is a similar check in netlink_rcv_skb(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Pablo Neira Ayuso authored
Commit 5195c14c ("netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse") aimed to resolve the race condition between the confirmation (packet path) and the flush command (from control plane). However, it introduced a crash when several packets race to add a new conntrack, which seems easier to reproduce when nf_queue is in place. Fix this race, in __nf_conntrack_confirm(), by removing the CT from unconfirmed list before checking the DYING bit. In case race occured, re-add the CT to the dying list This patch also changes the verdict from NF_ACCEPT to NF_DROP when we lose race. Basically, the confirmation happens for the first packet that we see in a flow. If you just invoked conntrack -F once (which should be the common case), then this is likely to be the first packet of the flow (unless you already called flush anytime soon in the past). This should be hard to trigger, but better drop this packet, otherwise we leave things in inconsistent state since the destination will likely reply to this packet, but it will find no conntrack, unless the origin retransmits. The change of the verdict has been discussed in: https://www.marc.info/?l=linux-netdev&m=141588039530056&w=2Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Linus Torvalds authored
Jay Foad reports that the address sanitizer test (asan) sometimes gets confused by a stack pointer that ends up being outside the stack vma that is reported by /proc/maps. This happens due to an interaction between RLIMIT_STACK and the guard page: when we do the guard page check, we ignore the potential error from the stack expansion, which effectively results in a missing guard page, since the expected stack expansion won't have been done. And since /proc/maps explicitly ignores the guard page (commit d7824370: "mm: fix up some user-visible effects of the stack guard page"), the stack pointer ends up being outside the reported stack area. This is the minimal patch: it just propagates the error. It also effectively makes the guard page part of the stack limit, which in turn measn that the actual real stack is one page less than the stack limit. Let's see if anybody notices. We could teach acct_stack_growth() to allow an extra page for a grow-up/grow-down stack in the rlimit test, but I don't want to add more complexity if it isn't needed. Reported-and-tested-by: Jay Foad <jay.foad@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-