- 30 Sep, 2015 4 commits
-
-
Bjorn Helgaas authored
commit d1541dc9 upstream. In fixup_ti816x_class(), we assigned "class = PCI_CLASS_MULTIMEDIA_VIDEO". But PCI_CLASS_MULTIMEDIA_VIDEO is only the two-byte base class/sub-class and needs to be shifted to make space for the low-order interface byte. Shift PCI_CLASS_MULTIMEDIA_VIDEO to set the correct class code. Fixes: 63c44080 ("PCI: Add quirk for setting valid class for TI816X Endpoint") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Hemant Pedanekar <hemantp@ti.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Dan Carpenter authored
commit 3294bee8 upstream. The ">" should be ">=" or we end up reading beyond the end of the array. Fixes: 6e973d2c ('clk: vexpress: Add separate SP810 driver') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Pawel Moll <pawel.moll@arm.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Lars-Peter Clausen authored
commit 7abad106 upstream. The different devices support by the adis16480 driver have slightly different scales for the gyroscope and accelerometer channels. Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Lars-Peter Clausen authored
commit c689a923 upstream. Add inverse unit conversion macro to convert from standard IIO units to units that might be used by some devices. Those are useful in combination with scale factors that are specified as IIO_VAL_FRACTIONAL. Typically the denominator for those specifications will contain the maximum raw value the sensor will generate and the numerator the value it maps to in a specific unit. Sometimes datasheets specify those in different units than the standard IIO units (e.g. degree/s instead of rad/s) and so we need to do a unit conversion. From a mathematical point of view it does not make a difference whether we apply the unit conversion to the numerator or the inverse unit conversion to the denominator since (x / y) / z = x / (y * z). But as the denominator is typically a larger value and we are rounding both the numerator and denominator to integer values using the later method gives us a better precision (E.g. the relative error is smaller if we round 8000.3 to 8000 rather than rounding 8.3 to 8). This is where in inverse unit conversion macros will be used. Marked for stable as used by some upcoming fixes. Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
- 21 Sep, 2015 2 commits
-
-
Jonathon Jongsma authored
commit bd3e1c7c upstream. Due to some recent changes in drm_helper_probe_single_connector_modes_merge_bits(), old custom modes were not being pruned properly. In current kernels, drm_mode_validate_basic() is called to sanity-check each mode in the list. If the sanity-check passes, the mode's status gets set to to MODE_OK. In older kernels this check was not done, so old custom modes would still have a status of MODE_UNVERIFIED at this point, and would therefore be pruned later in the function. As a result of this new behavior, the list of modes for a device always includes every custom mode ever configured for the device, with the largest one listed first. Since desktop environments usually choose the first preferred mode when a hotplug event is emitted, this had the result of making it very difficult for the user to reduce the size of the display. The qxl driver did implement the mode_valid connector function, but it was empty. In order to restore the old behavior where old custom modes are pruned, we implement a proper mode_valid function for the qxl driver. This function now checks each mode against the last configured custom mode and the list of standard modes. If the mode doesn't match any of these, its status is set to MODE_BAD so that it will be pruned as expected. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Stephen Chandler Paul authored
commit 924f92bf upstream. Most of the time this isn't an issue since hotplugging an adaptor will trigger a crtc mode change which in turn, causes the driver to probe every DisplayPort for a dpcd. However, in cases where hotplugging doesn't cause a mode change (specifically when one unplugs a monitor from a DisplayPort connector, then plugs that same monitor back in seconds later on the same port without any other monitors connected), we never probe for the dpcd before starting the initial link training. What happens from there looks like this: - GPU has only one monitor connected. It's connected via DisplayPort, and does not go through an adaptor of any sort. - User unplugs DisplayPort connector from GPU. - Change in HPD is detected by the driver, we probe every DisplayPort for a possible connection. - Probe the port the user originally had the monitor connected on for it's dpcd. This fails, and we clear the first (and only the first) byte of the dpcd to indicate we no longer have a dpcd for this port. - User plugs the previously disconnected monitor back into the same DisplayPort. - radeon_connector_hotplug() is called before everyone else, and tries to handle the link training. Since only the first byte of the dpcd is zeroed, the driver is able to complete link training but does so against the wrong dpcd, causing it to initialize the link with the wrong settings. - Display stays blank (usually), dpcd is probed after the initial link training, and the driver prints no obvious messages to the log. In theory, since only one byte of the dpcd is chopped off (specifically, the byte that contains the revision information for DisplayPort), it's not entirely impossible that this bug may not show on certain monitors. For instance, the only reason this bug was visible on my ASUS PB238 monitor was due to the fact that this monitor using the enhanced framing symbol sequence, the flag for which is ignored if the radeon driver thinks that the DisplayPort version is below 1.1. Signed-off-by: Stephen Chandler Paul <cpaul@redhat.com> Reviewed-by: Jerome Glisse <jglisse@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
- 18 Sep, 2015 11 commits
-
-
Marc Zyngier authored
commit 126c69a0 upstream. When injecting a fault into a misbehaving 32bit guest, it seems rather idiotic to also inject a 64bit fault that is only going to corrupt the guest state. This leads to a situation where we perform an illegal exception return at EL2 causing the host to crash instead of killing the guest. Just fix the stupid bug that has been there from day 1. Reported-by: Russell King <rmk+kernel@arm.linux.org.uk> Tested-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Horia Geant? authored
commit b310c178 upstream. When doing pointer operation for accessing the HW S/G table, a value representing number of entries (and not number of bytes) must be used. Fixes: 045e3678 ("crypto: caam - ahash hmac support") Signed-off-by: Horia Geant? <horia.geanta@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Guenter Roeck authored
commit 8ef9724b upstream. When inserting a new register into a block, the present bit map size is increased using krealloc. krealloc does not clear the additionally allocated memory, leaving it filled with random values. Result is that some registers are considered cached even though this is not the case. Fix the problem by clearing the additionally allocated memory. Also, if the bitmap size does not increase, do not reallocate the bitmap at all to reduce overhead. Fixes: 3f4ff561 ("regmap: rbtree: Make cache_present bitmap per node") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Bart Van Assche authored
commit 8f2777f5 upstream. Since fc_fcp_cleanup_cmd() can sleep this function must not be called while holding a spinlock. This patch avoids that fc_fcp_cleanup_each_cmd() triggers the following bug: BUG: scheduling while atomic: sg_reset/1512/0x00000202 1 lock held by sg_reset/1512: #0: (&(&fsp->scsi_pkt_lock)->rlock){+.-...}, at: [<ffffffffc0225cd5>] fc_fcp_cleanup_each_cmd.isra.21+0xa5/0x150 [libfc] Preemption disabled at:[<ffffffffc0225cd5>] fc_fcp_cleanup_each_cmd.isra.21+0xa5/0x150 [libfc] Call Trace: [<ffffffff816c612c>] dump_stack+0x4f/0x7b [<ffffffff810828bc>] __schedule_bug+0x6c/0xd0 [<ffffffff816c87aa>] __schedule+0x71a/0xa10 [<ffffffff816c8ad2>] schedule+0x32/0x80 [<ffffffffc0217eac>] fc_seq_set_resp+0xac/0x100 [libfc] [<ffffffffc0218b11>] fc_exch_done+0x41/0x60 [libfc] [<ffffffffc0225cff>] fc_fcp_cleanup_each_cmd.isra.21+0xcf/0x150 [libfc] [<ffffffffc0225f43>] fc_eh_device_reset+0x1c3/0x270 [libfc] [<ffffffff814a2cc9>] scsi_try_bus_device_reset+0x29/0x60 [<ffffffff814a3908>] scsi_ioctl_reset+0x258/0x2d0 [<ffffffff814a2650>] scsi_ioctl+0x150/0x440 [<ffffffff814b3a9d>] sd_ioctl+0xad/0x120 [<ffffffff8132f266>] blkdev_ioctl+0x1b6/0x810 [<ffffffff811da608>] block_ioctl+0x38/0x40 [<ffffffff811b4e08>] do_vfs_ioctl+0x2f8/0x530 [<ffffffff811b50c1>] SyS_ioctl+0x81/0xa0 [<ffffffff816cf8b2>] system_call_fastpath+0x16/0x7a Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Vasu Dev <vasu.dev@intel.com> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Davide Italiano authored
commit 280227a7 upstream fallocate() checks that the file is extent-based and returns EOPNOTSUPP in case is not. Other tasks can convert from and to indirect and extent so it's safe to check only after grabbing the inode mutex. [Nikolay Borisov: Bakported to 3.12.47 - Adjusted context - Add the 'out' label] Signed-off-by: Davide Italiano <dccitaliano@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Nikolay Borisov <kernel@kyup.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Ian Abbott authored
commit ad83dbd9 upstream The "adl_pci7x3x" driver replaced the "adl_pci7230" and "adl_pci7432" drivers in commits 8f567c37 ("staging: comedi: new adl_pci7x3x driver") and 657f77d1 ("staging: comedi: remove adl_pci7230 and adl_pci7432 drivers"). Although the new driver code agrees with the user manuals for the respective boards, digital outputs stopped working on the PCI-7230. This has 16 digital output channels and the previous adl_pci7230 driver shifted the 16 bit output state left by 16 bits before writing to the hardware register. The new adl_pci7x3x driver doesn't do that. Fix it in `adl_pci7x3x_do_insn_bits()` by checking for the special case of the subdevice having only 16 channels and duplicating the 16 bit output state into both halves of the 32-bit register. That should work both for what the board actually does and for what the user manual says it should do. Fixes: 8f567c37 ("staging: comedi: new adl_pci7x3x driver") Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Ian Abbott authored
commit c04a1f17 upstream `devpriv->ao_timer` is used while an asynchronous command is running on the AO subdevice. It also gets modified by the subdevice's `cmdtest` handler for checking new asynchronous commands, `usbduxsigma_ao_cmdtest()`, which is not correct as it's allowed to check new commands while an old command is still running. Fix it by moving the code which sets up `devpriv->ao_timer` into the subdevice's `cmd` handler, `usbduxsigma_ao_cmd()`. ** This backported patch also moves the code that sets up `devpriv->ao_sample_count` and `devpriv->ao_continuous` from `usbduxsigma_ao_cmdtest()` to `usbduxsigma_ao_cmd()` for the same reason as above. (This was not needed in the upstream commit.) ** Note that the removed code in `usbduxsigma_ao_cmdtest()` checked that `devpriv->ao_timer` did not end up less that 1, but that could not happen due because `cmd->scan_begin_arg` or `cmd->convert_arg` had already been range-checked. Also note that we tested the `high_speed` variable in the old code, but that is currently always 0 and means that we always use "scan" timing (`cmd->scan_begin_src == TRIG_TIMER` and `cmd->convert_src == TRIG_NOW`) and never "convert" (individual sample) timing (`cmd->scan_begin_src == TRIG_FOLLOW` and `cmd->convert_src == TRIG_TIMER`). The moved code tests `cmd->convert_src` instead to decide whether "scan" or "convert" timing is being used, although currently only "scan" timing is supported. Fixes: fb1ef622 ("staging: comedi: usbduxsigma: tidy up analog output command support") Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Ian Abbott authored
commit 423b24c3 upstream `devpriv->ai_timer` is used while an asynchronous command is running on the AI subdevice. It also gets modified by the subdevice's `cmdtest` handler for checking new asynchronous commands (`usbduxsigma_ai_cmdtest()`), which is not correct as it's allowed to check new commands while an old command is still running. Fix it by moving the code which sets up `devpriv->ai_timer` and `devpriv->ai_interval` into the subdevice's `cmd` handler, `usbduxsigma_ai_cmd()`. ** This backported patch also moves the code that sets up `devpriv->ai_sample_count` and `devpriv->ai_continuous` from `usbduxsigma_ai_cmdtest()` to `usbduxsigma_ai_cmd()` for the same reason as above. (This was not needed in the upstream commit.) ** Note that the removed code in `usbduxsigma_ai_cmdtest()` checked that `devpriv->ai_timer` did not end up less than than 1, but that could not happen because `cmd->scan_begin_arg` had already been checked to be at least the minimum required value (at least when `cmd->scan_begin_src == TRIG_TIMER`, which had also been checked to be the case). Fixes: b986be85 ("staging: comedi: usbduxsigma: tidy up analog input command support) Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Mark Rustad authored
commit 7aa6ca4d upstream. Set the PCI_DEV_FLAGS_VPD_REF_F0 flag on all Intel Ethernet device functions other than function 0, so that on multi-function devices, we will always read VPD from function 0 instead of from the other functions. [bhelgaas: changelog] Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Mark Rustad authored
commit 932c435c upstream. Add a dev_flags bit, PCI_DEV_FLAGS_VPD_REF_F0, to access VPD through function 0 to provide VPD access on other functions. This is for hardware devices that provide copies of the same VPD capability registers in multiple functions. Because the kernel expects that each function has its own registers, both the locking and the state tracking are affected by VPD accesses to different functions. On such devices for example, if a VPD write is performed on function 0, *any* later attempt to read VPD from any other function of that device will hang. This has to do with how the kernel tracks the expected value of the F bit per function. Concurrent accesses to different functions of the same device can not only hang but also corrupt both read and write VPD data. When hangs occur, typically the error message: vpd r/w failed. This is likely a firmware bug on this device. will be seen. Never set this bit on function 0 or there will be an infinite recursion. Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Jiri Slaby authored
-
- 14 Sep, 2015 2 commits
-
-
Pablo Neira Ayuso authored
[ Upstream commit e53376be ] With this patch, the conntrack refcount is initially set to zero and it is bumped once it is added to any of the list, so we fulfill Eric's golden rule which is that all released objects always have a refcount that equals zero. Andrey Vagin reports that nf_conntrack_free can't be called for a conntrack with non-zero ref-counter, because it can race with nf_conntrack_find_get(). A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero ref-counter says that this conntrack is used. So when we release a conntrack with non-zero counter, we break this assumption. CPU1 CPU2 ____nf_conntrack_find() nf_ct_put() destroy_conntrack() ... init_conntrack __nf_conntrack_alloc (set use = 1) atomic_inc_not_zero(&ct->use) (use = 2) if (!l4proto->new(ct, skb, dataoff, timeouts)) nf_conntrack_free(ct); (use = 2 !!!) ... __nf_conntrack_alloc (set use = 1) if (!nf_ct_key_equal(h, tuple, zone)) nf_ct_put(ct); (use = 0) destroy_conntrack() /* continue to work with CT */ After applying the path "[PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get" another bug was triggered in destroy_conntrack(): <4>[67096.759334] ------------[ cut here ]------------ <2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211! ... <4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G C --------------- 2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB <4>[67096.759932] RIP: 0010:[<ffffffffa03d99ac>] [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack] <4>[67096.760255] Call Trace: <4>[67096.760255] [<ffffffff814844a7>] nf_conntrack_destroy+0x17/0x30 <4>[67096.760255] [<ffffffffa03d9bb5>] nf_conntrack_find_get+0x85/0x130 [nf_conntrack] <4>[67096.760255] [<ffffffffa03d9fb2>] nf_conntrack_in+0x352/0xb60 [nf_conntrack] <4>[67096.760255] [<ffffffffa048c771>] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4] <4>[67096.760255] [<ffffffff81484419>] nf_iterate+0x69/0xb0 <4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20 <4>[67096.760255] [<ffffffff814845d4>] nf_hook_slow+0x74/0x110 <4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20 <4>[67096.760255] [<ffffffff814b66d5>] raw_sendmsg+0x775/0x910 <4>[67096.760255] [<ffffffff8104c5a8>] ? flush_tlb_others_ipi+0x128/0x130 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff814c136a>] inet_sendmsg+0x4a/0xb0 <4>[67096.760255] [<ffffffff81444e93>] ? sock_sendmsg+0x13/0x140 <4>[67096.760255] [<ffffffff81444f97>] sock_sendmsg+0x117/0x140 <4>[67096.760255] [<ffffffff8102e299>] ? native_smp_send_reschedule+0x49/0x60 <4>[67096.760255] [<ffffffff81519beb>] ? _spin_unlock_bh+0x1b/0x20 <4>[67096.760255] [<ffffffff8109d930>] ? autoremove_wake_function+0x0/0x40 <4>[67096.760255] [<ffffffff814960f0>] ? do_ip_setsockopt+0x90/0xd80 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff814457c9>] sys_sendto+0x139/0x190 <4>[67096.760255] [<ffffffff810efa77>] ? audit_syscall_entry+0x1d7/0x200 <4>[67096.760255] [<ffffffff810ef7c5>] ? __audit_syscall_exit+0x265/0x290 <4>[67096.760255] [<ffffffff81474daf>] compat_sys_socketcall+0x13f/0x210 <4>[67096.760255] [<ffffffff8104dea3>] ia32_sysret+0x0/0x5 I have reused the original title for the RFC patch that Andrey posted and most of the original patch description. Cc: Eric Dumazet <edumazet@google.com> Cc: Andrew Vagin <avagin@parallels.com> Cc: Florian Westphal <fw@strlen.de> Reported-by: Andrew Vagin <avagin@parallels.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Andrew Vagin <avagin@parallels.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Andrey Vagin authored
[ Upstream commit c6825c09 ] Lets look at destroy_conntrack: hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode); ... nf_conntrack_free(ct) kmem_cache_free(net->ct.nf_conntrack_cachep, ct); net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU. The hash is protected by rcu, so readers look up conntracks without locks. A conntrack is removed from the hash, but in this moment a few readers still can use the conntrack. Then this conntrack is released and another thread creates conntrack with the same address and the equal tuple. After this a reader starts to validate the conntrack: * It's not dying, because a new conntrack was created * nf_ct_tuple_equal() returns true. But this conntrack is not initialized yet, so it can not be used by two threads concurrently. In this case BUG_ON may be triggered from nf_nat_setup_info(). Florian Westphal suggested to check the confirm bit too. I think it's right. task 1 task 2 task 3 nf_conntrack_find_get ____nf_conntrack_find destroy_conntrack hlist_nulls_del_rcu nf_conntrack_free kmem_cache_free __nf_conntrack_alloc kmem_cache_alloc memset(&ct->tuplehash[IP_CT_DIR_MAX], if (nf_ct_is_dying(ct)) if (!nf_ct_tuple_equal() I'm not sure, that I have ever seen this race condition in a real life. Currently we are investigating a bug, which is reproduced on a few nodes. In our case one conntrack is initialized from a few tasks concurrently, we don't have any other explanation for this. <2>[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322! ... <4>[46267.083951] RIP: 0010:[<ffffffffa01e00a4>] [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590 [nf_nat] ... <4>[46267.085549] Call Trace: <4>[46267.085622] [<ffffffffa023421b>] alloc_null_binding+0x5b/0xa0 [iptable_nat] <4>[46267.085697] [<ffffffffa02342bc>] nf_nat_rule_find+0x5c/0x80 [iptable_nat] <4>[46267.085770] [<ffffffffa0234521>] nf_nat_fn+0x111/0x260 [iptable_nat] <4>[46267.085843] [<ffffffffa0234798>] nf_nat_out+0x48/0xd0 [iptable_nat] <4>[46267.085919] [<ffffffff814841b9>] nf_iterate+0x69/0xb0 <4>[46267.085991] [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0 <4>[46267.086063] [<ffffffff81484374>] nf_hook_slow+0x74/0x110 <4>[46267.086133] [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0 <4>[46267.086207] [<ffffffff814b5890>] ? dst_output+0x0/0x20 <4>[46267.086277] [<ffffffff81495204>] ip_output+0xa4/0xc0 <4>[46267.086346] [<ffffffff814b65a4>] raw_sendmsg+0x8b4/0x910 <4>[46267.086419] [<ffffffff814c10fa>] inet_sendmsg+0x4a/0xb0 <4>[46267.086491] [<ffffffff814459aa>] ? sock_update_classid+0x3a/0x50 <4>[46267.086562] [<ffffffff81444d67>] sock_sendmsg+0x117/0x140 <4>[46267.086638] [<ffffffff8151997b>] ? _spin_unlock_bh+0x1b/0x20 <4>[46267.086712] [<ffffffff8109d370>] ? autoremove_wake_function+0x0/0x40 <4>[46267.086785] [<ffffffff81495e80>] ? do_ip_setsockopt+0x90/0xd80 <4>[46267.086858] [<ffffffff8100be0e>] ? call_function_interrupt+0xe/0x20 <4>[46267.086936] [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90 <4>[46267.087006] [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90 <4>[46267.087081] [<ffffffff8118f2e8>] ? kmem_cache_alloc+0xd8/0x1e0 <4>[46267.087151] [<ffffffff81445599>] sys_sendto+0x139/0x190 <4>[46267.087229] [<ffffffff81448c0d>] ? sock_setsockopt+0x16d/0x6f0 <4>[46267.087303] [<ffffffff810efa47>] ? audit_syscall_entry+0x1d7/0x200 <4>[46267.087378] [<ffffffff810ef795>] ? __audit_syscall_exit+0x265/0x290 <4>[46267.087454] [<ffffffff81474885>] ? compat_sys_setsockopt+0x75/0x210 <4>[46267.087531] [<ffffffff81474b5f>] compat_sys_socketcall+0x13f/0x210 <4>[46267.087607] [<ffffffff8104dea3>] ia32_sysret+0x0/0x5 <4>[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 <0f> 0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74 <1>[46267.088023] RIP [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590 Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Florian Westphal <fw@strlen.de> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrey Vagin <avagin@openvz.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
- 02 Sep, 2015 5 commits
-
-
Benjamin LaHaise authored
commit d856f32a upstream. As reported by Dan Aloni, commit f8567a38 ("aio: fix aio request leak when events are reaped by userspace") introduces a regression when user code attempts to perform io_submit() with more events than are available in the ring buffer. Reverting that commit would reintroduce a regression when user space event reaping is used. Fixing this bug is a bit more involved than the previous attempts to fix this regression. Since we do not have a single point at which we can count events as being reaped by user space and io_getevents(), we have to track event completion by looking at the number of events left in the event ring. So long as there are as many events in the ring buffer as there have been completion events generate, we cannot call put_reqs_available(). The code to check for this is now placed in refill_reqs_available(). A test program from Dan and modified by me for verifying this bug is available at http://www.kvack.org/~bcrl/20140824-aio_bug.c . Reported-by: Dan Aloni <dan@kernelim.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Acked-by: Dan Aloni <dan@kernelim.com> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Heinz Mauelshagen authored
commit 14f398ca upstream. The memory allocated for the multiqueue policy's hash table doesn't need to be physically contiguous. Use vzalloc() instead of kzalloc(). Fedora has been carrying this fix since 10/10/2013. Failure seen during creation of a 10TB cached device with a 2048 sector block size and 411GB cache size: dmsetup: page allocation failure: order:9, mode:0x10c0d0 CPU: 11 PID: 29235 Comm: dmsetup Not tainted 3.10.4 #3 Hardware name: Supermicro X8DTL/X8DTL, BIOS 2.1a 12/30/2011 000000000010c0d0 ffff880090941898 ffffffff81387ab4 ffff880090941928 ffffffff810bb26f 0000000000000009 000000000010c0d0 ffff880090941928 ffffffff81385dbc ffffffff815f3840 ffffffff00000000 000002000010c0d0 Call Trace: [<ffffffff81387ab4>] dump_stack+0x19/0x1b [<ffffffff810bb26f>] warn_alloc_failed+0x110/0x124 [<ffffffff81385dbc>] ? __alloc_pages_direct_compact+0x17c/0x18e [<ffffffff810bda2e>] __alloc_pages_nodemask+0x6c7/0x75e [<ffffffff810bdad7>] __get_free_pages+0x12/0x3f [<ffffffff810ea148>] kmalloc_order_trace+0x29/0x88 [<ffffffff810ec1fd>] __kmalloc+0x36/0x11b [<ffffffffa031eeed>] ? mq_create+0x1dc/0x2cf [dm_cache_mq] [<ffffffffa031efc0>] mq_create+0x2af/0x2cf [dm_cache_mq] [<ffffffffa0314605>] dm_cache_policy_create+0xa7/0xd2 [dm_cache] [<ffffffffa0312530>] ? cache_ctr+0x245/0xa13 [dm_cache] [<ffffffffa031263e>] cache_ctr+0x353/0xa13 [dm_cache] [<ffffffffa012b916>] dm_table_add_target+0x227/0x2ce [dm_mod] [<ffffffffa012e8e4>] table_load+0x286/0x2ac [dm_mod] [<ffffffffa012e65e>] ? dev_wait+0x8a/0x8a [dm_mod] [<ffffffffa012e324>] ctl_ioctl+0x39a/0x3c2 [dm_mod] [<ffffffffa012e35a>] dm_ctl_ioctl+0xe/0x12 [dm_mod] [<ffffffff81101181>] vfs_ioctl+0x21/0x34 [<ffffffff811019d3>] do_vfs_ioctl+0x3b1/0x3f4 [<ffffffff810f4d2e>] ? ____fput+0x9/0xb [<ffffffff81050b6c>] ? task_work_run+0x7e/0x92 [<ffffffff81101a68>] SyS_ioctl+0x52/0x82 [<ffffffff81391d92>] system_call_fastpath+0x16/0x1b Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Akinobu Mita authored
commit 34f2fd8d upstream. The data type of max_sectors and max_hw_sectors in queue settings are unsigned int. But these values are passed to __bio_add_page() as an argument whose data type is unsigned short. In the worst case such as max_sectors is 0x10000, bio_add_page() can't add a page and IOs can't proceed. Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
James Smart authored
commit 5116fbf1 upstream. Didn't check for less-than-or-equal zero. Means we may later call scsi_dma_unmap() even though we don't have valid mappings. Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com> Signed-off-by: James Smart <james.smart@avagotech.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Shirish Pargaonkar authored
commit 7f48558e upstream. Send a smb session logoff request before removing smb session off of the list. On a signed smb session, remvoing a session off of the list before sending a logoff request results in server returning an error for lack of smb signature. Never seen an error during smb logoff, so as per MS-SMB2 3.2.5.1, not sure how an error during logoff should be retried. So for now, if a server returns an error to a logoff request, log the error and remove the session off of the list. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
- 31 Aug, 2015 1 commit
-
-
David Milburn authored
commit c8afd0dc upstream. Dynamically allocate buf to prevent warnings: drivers/block/mtip32xx/mtip32xx.c: In function ‘mtip_hw_read_device_status’: drivers/block/mtip32xx/mtip32xx.c:2823: warning: the frame size of 1056 bytes is larger than 1024 bytes drivers/block/mtip32xx/mtip32xx.c: In function ‘mtip_hw_read_registers’: drivers/block/mtip32xx/mtip32xx.c:2894: warning: the frame size of 1056 bytes is larger than 1024 bytes drivers/block/mtip32xx/mtip32xx.c: In function ‘mtip_hw_read_flags’: drivers/block/mtip32xx/mtip32xx.c:2917: warning: the frame size of 1056 bytes is larger than 1024 bytes Signed-off-by: David Milburn <dmilburn@redhat.com> Acked-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
- 27 Aug, 2015 15 commits
-
-
Dan Carpenter authored
[ Upstream commit 468b732b ] "len" is a signed integer. We check that len is not negative, so it goes from zero to INT_MAX. PAGE_SIZE is unsigned long so the comparison is type promoted to unsigned long. ULONG_MAX - 4095 is a higher than INT_MAX so the condition can never be true. I don't know if this is harmful but it seems safe to limit "len" to INT_MAX - 4095. Fixes: a8c879a7 ('RDS: Info and stats') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Jack Morgenstein authored
[ Upstream commit 1c1bf349 ] The port-change event processing in procedure mlx4_eq_int() uses "slave" as the vf_oper array index. Since the value of "slave" is the PF function index, the result is that the PF link state is used for deciding to propagate the event for all the VFs. The VF link state should be used, so the VF function index should be used here. Fixes: 948e306d ('net/mlx4: Add VF link state support') Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Florian Westphal authored
[ Upstream commit 0470eb99 ] Kirill A. Shutemov says: This simple test-case trigers few locking asserts in kernel: int main(int argc, char **argv) { unsigned int block_size = 16 * 4096; struct nl_mmap_req req = { .nm_block_size = block_size, .nm_block_nr = 64, .nm_frame_size = 16384, .nm_frame_nr = 64 * block_size / 16384, }; unsigned int ring_size; int fd; fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC); if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0) exit(1); if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0) exit(1); ring_size = req.nm_block_nr * req.nm_block_size; mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); return 0; } +++ exited with 0 +++ BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616 in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init 3 locks held by init/1: #0: (reboot_mutex){+.+...}, at: [<ffffffff81080959>] SyS_reboot+0xa9/0x220 #1: ((reboot_notifier_list).rwsem){.+.+..}, at: [<ffffffff8107f379>] __blocking_notifier_call_chain+0x39/0x70 #2: (rcu_callback){......}, at: [<ffffffff810d32e0>] rcu_do_batch.isra.49+0x160/0x10c0 Preemption disabled at:[<ffffffff8145365f>] __delay+0xf/0x20 CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-00009-gbddf4c4818e0 #253 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014 ffff88017b3d8000 ffff88027bc03c38 ffffffff81929ceb 0000000000000102 0000000000000000 ffff88027bc03c68 ffffffff81085a9d 0000000000000002 ffffffff81ca2a20 0000000000000268 0000000000000000 ffff88027bc03c98 Call Trace: <IRQ> [<ffffffff81929ceb>] dump_stack+0x4f/0x7b [<ffffffff81085a9d>] ___might_sleep+0x16d/0x270 [<ffffffff81085bed>] __might_sleep+0x4d/0x90 [<ffffffff8192e96f>] mutex_lock_nested+0x2f/0x430 [<ffffffff81932fed>] ? _raw_spin_unlock_irqrestore+0x5d/0x80 [<ffffffff81464143>] ? __this_cpu_preempt_check+0x13/0x20 [<ffffffff8182fc3d>] netlink_set_ring+0x1ed/0x350 [<ffffffff8182e000>] ? netlink_undo_bind+0x70/0x70 [<ffffffff8182fe20>] netlink_sock_destruct+0x80/0x150 [<ffffffff817e484d>] __sk_free+0x1d/0x160 [<ffffffff817e49a9>] sk_free+0x19/0x20 [..] Cong Wang says: We can't hold mutex lock in a rcu callback, [..] Thomas Graf says: The socket should be dead at this point. It might be simpler to add a netlink_release_ring() function which doesn't require locking at all. Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name> Diagnosed-by: Cong Wang <cwang@twopensource.com> Suggested-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Edward Hyunkoo Jee authored
[ Upstream commit 0848f642 ] When ip_frag_queue() computes positions, it assumes that the passed sk_buff does not contain L2 headers. However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly functions can be called on outgoing packets that contain L2 headers. Also, IPv4 checksum is not corrected after reassembly. Fixes: 7736d33f ("packet: Add pre-defragmentation support for ipv4 fanouts.") Signed-off-by: Edward Hyunkoo Jee <edjee@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
dingtianhong authored
[ Upstream commit a951bc1e ] The "follow" fail_over_mac policy is useful for multiport devices that either become confused or incur a performance penalty when multiple ports are programmed with the same MAC address, but the same MAC address still may happened by this steps for this policy: 1) echo +eth0 > /sys/class/net/bond0/bonding/slaves bond0 has the same mac address with eth0, it is MAC1. 2) echo +eth1 > /sys/class/net/bond0/bonding/slaves eth1 is backup, eth1 has MAC2. 3) ifconfig eth0 down eth1 became active slave, bond will swap MAC for eth0 and eth1, so eth1 has MAC1, and eth0 has MAC2. 4) ifconfig eth1 down there is no active slave, and eth1 still has MAC1, eth2 has MAC2. 5) ifconfig eth0 up the eth0 became active slave again, the bond set eth0 to MAC1. Something wrong here, then if you set eth1 up, the eth0 and eth1 will have the same MAC address, it will break this policy for ACTIVE_BACKUP mode. This patch will fix this problem by finding the old active slave and swap them MAC address before change active slave. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Nikolay Aleksandrov authored
[ Upstream commit 06f6d109 ] When the bonding is being unloaded and the netdevice notifier is unregistered it executes NETDEV_UNREGISTER for each device which should remove the bond's proc entry but if the device enslaved is not of ARPHRD_ETHER type and is in front of the bonding, it may execute bond_release_and_destroy() first which would release the last slave and destroy the bond device leaving the proc entry and thus we will get the following error (with dynamic debug on for bond_netdev_event to see the events order): [ 908.963051] eql: event: 9 [ 908.963052] eql: IFF_SLAVE [ 908.963054] eql: event: 2 [ 908.963056] eql: IFF_SLAVE [ 908.963058] eql: event: 6 [ 908.963059] eql: IFF_SLAVE [ 908.963110] bond0: Releasing active interface eql [ 908.976168] bond0: Destroying bond bond0 [ 908.976266] bond0 (unregistering): Released all slaves [ 908.984097] ------------[ cut here ]------------ [ 908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575 remove_proc_entry+0x112/0x160() [ 908.984110] remove_proc_entry: removing non-empty directory 'net/bonding', leaking at least 'bond0' [ 908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore usb_common [last unloaded: bonding] [ 908.984168] CPU: 0 PID: 1787 Comm: rmmod Tainted: G W O 4.2.0-rc2+ #8 [ 908.984170] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 908.984172] 0000000000000000 ffffffff81732d41 ffffffff81525b34 ffff8800358dfda8 [ 908.984175] ffffffff8106c521 ffff88003595af78 ffff88003595af40 ffff88003e3a4280 [ 908.984178] ffffffffa058d040 0000000000000000 ffffffff8106c59a ffffffff8172ebd0 [ 908.984181] Call Trace: [ 908.984188] [<ffffffff81525b34>] ? dump_stack+0x40/0x50 [ 908.984193] [<ffffffff8106c521>] ? warn_slowpath_common+0x81/0xb0 [ 908.984196] [<ffffffff8106c59a>] ? warn_slowpath_fmt+0x4a/0x50 [ 908.984199] [<ffffffff81218352>] ? remove_proc_entry+0x112/0x160 [ 908.984205] [<ffffffffa05850e6>] ? bond_destroy_proc_dir+0x26/0x30 [bonding] [ 908.984208] [<ffffffffa057540e>] ? bond_net_exit+0x8e/0xa0 [bonding] [ 908.984217] [<ffffffff8142f407>] ? ops_exit_list.isra.4+0x37/0x70 [ 908.984225] [<ffffffff8142f52d>] ? unregister_pernet_operations+0x8d/0xd0 [ 908.984228] [<ffffffff8142f58d>] ? unregister_pernet_subsys+0x1d/0x30 [ 908.984232] [<ffffffffa0585269>] ? bonding_exit+0x23/0xdba [bonding] [ 908.984236] [<ffffffff810e28ba>] ? SyS_delete_module+0x18a/0x250 [ 908.984241] [<ffffffff81086f99>] ? task_work_run+0x89/0xc0 [ 908.984244] [<ffffffff8152b732>] ? entry_SYSCALL_64_fastpath+0x16/0x75 [ 908.984247] ---[ end trace 7c006ed4abbef24b ]--- Thus remove the proc entry manually if bond_release_and_destroy() is used. Because of the checks in bond_remove_proc_entry() it's not a problem for a bond device to change namespaces (the bug fixed by the Fixes commit) but since commit f9399814 ("bonding: Don't allow bond devices to change network namespaces.") that can't happen anyway. Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: a64d49c3 ("bonding: Manage /proc/net/bonding/ entries from the netdev events") Tested-by: Carol L Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Eric Dumazet authored
[ Upstream commit 03645a11 ] ip6_datagram_connect() is doing a lot of socket changes without socket being locked. This looks wrong, at least for udp_lib_rehash() which could corrupt lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Tilman Schmidt authored
[ Upstream commit fd98e941 ] Commit 79901317 ("n_tty: Don't flush buffer when closing ldisc"), first merged in kernel release 3.10, caused the following regression in the Gigaset M101 driver: Before that commit, when closing the N_TTY line discipline in preparation to switching to N_GIGASET_M101, receive_room would be reset to a non-zero value by the call to n_tty_flush_buffer() in n_tty's close method. With the removal of that call, receive_room might be left at zero, blocking data reception on the serial line. The present patch fixes that regression by setting receive_room to an appropriate value in the ldisc open method. Fixes: 79901317 ("n_tty: Don't flush buffer when closing ldisc") Signed-off-by: Tilman Schmidt <tilman@imap.cc> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Nikolay Aleksandrov authored
[ Upstream commit 5ebc7846 ] Since the mdb add/del code was introduced there have been 2 br_mdb_notify calls when doing br_mdb_add() resulting in 2 notifications on each add. Example: Command: bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent Before patch: root@debian:~# bridge monitor all [MDB]dev br0 port eth1 grp 239.0.0.1 permanent [MDB]dev br0 port eth1 grp 239.0.0.1 permanent After patch: root@debian:~# bridge monitor all [MDB]dev br0 port eth1 grp 239.0.0.1 permanent Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: cfd56754 ("bridge: add support of adding and deleting mdb entries") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Herbert Xu authored
[ Upstream commit a0a2a660 ] The commit 738ac1eb ("net: Clone skb before setting peeked flag") introduced a use-after-free bug in skb_recv_datagram. This is because skb_set_peeked may create a new skb and free the existing one. As it stands the caller will continue to use the old freed skb. This patch fixes it by making skb_set_peeked return the new skb (or the old one if unchanged). Fixes: 738ac1eb ("net: Clone skb before setting peeked flag") Reported-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Brenden Blanco <bblanco@plumgrid.com> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Herbert Xu authored
[ Upstream commit 89c22d8c ] When we calculate the checksum on the recv path, we store the result in the skb as an optimisation in case we need the checksum again down the line. This is in fact bogus for the MSG_PEEK case as this is done without any locking. So multiple threads can peek and then store the result to the same skb, potentially resulting in bogus skb states. This patch fixes this by only storing the result if the skb is not shared. This preserves the optimisations for the few cases where it can be done safely due to locking or other reasons, e.g., SIOCINQ. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Herbert Xu authored
[ Upstream commit 738ac1eb ] Shared skbs must not be modified and this is crucial for broadcast and/or multicast paths where we use it as an optimisation to avoid unnecessary cloning. The function skb_recv_datagram breaks this rule by setting peeked without cloning the skb first. This causes funky races which leads to double-free. This patch fixes this by cloning the skb and replacing the skb in the list when setting skb->peeked. Fixes: a59322be ("[UDP]: Only increment counter on first peek/recv") Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Julian Anastasov authored
[ Upstream commit 2c17d27c ] Incoming packet should be either in backlog queue or in RCU read-side section. Otherwise, the final sequence of flush_backlog() and synchronize_net() may miss packets that can run without device reference: CPU 1 CPU 2 skb->dev: no reference process_backlog:__skb_dequeue process_backlog:local_irq_enable on_each_cpu for flush_backlog => IPI(hardirq): flush_backlog - packet not found in backlog CPU delayed ... synchronize_net - no ongoing RCU read-side sections netdev_run_todo, rcu_barrier: no ongoing callbacks __netif_receive_skb_core:rcu_read_lock - too late free dev process packet for freed dev Fixes: 6e583ce5 ("net: eliminate refcounting in backlog queue") Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Julian Anastasov authored
[ Upstream commit e9e4dd32 ] commit 381c759d ("ipv4: Avoid crashing in ip_error") fixes a problem where processed packet comes from device with destroyed inetdev (dev->ip_ptr). This is not expected because inetdev_destroy is called in NETDEV_UNREGISTER phase and packets should not be processed after dev_close_many() and synchronize_net(). Above fix is still required because inetdev_destroy can be called for other reasons. But it shows the real problem: backlog can keep packets for long time and they do not hold reference to device. Such packets are then delivered to upper levels at the same time when device is unregistered. Calling flush_backlog after NETDEV_UNREGISTER_FINAL still accounts all packets from backlog but before that some packets continue to be delivered to upper levels long after the synchronize_net call which is supposed to wait the last ones. Also, as Eric pointed out, processed packets, mostly from other devices, can continue to add new packets to backlog. Fix the problem by moving flush_backlog early, after the device driver is stopped and before the synchronize_net() call. Then use netif_running check to make sure we do not add more packets to backlog. We have to do it in enqueue_to_backlog context when the local IRQ is disabled. As result, after the flush_backlog and synchronize_net sequence all packets should be accounted. Thanks to Eric W. Biederman for the test script and his valuable feedback! Reported-by: Vittorio Gambaletta <linuxbugs@vittgam.net> Fixes: 6e583ce5 ("net: eliminate refcounting in backlog queue") Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Oleg Nesterov authored
[ Upstream commit fecdf8be ] pktgen_thread_worker() is obviously racy, kthread_stop() can come between the kthread_should_stop() check and set_current_state(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Jan Stancek <jstancek@redhat.com> Reported-by: Marcelo Leitner <mleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-