Commits · 96a420d2d37cc019d0fbb95c9f0e965fa1080e1f · nexedi / linux

02 Jan, 2017 11 commits

usb: gadget: f_fs: Document eventfd effect on descriptor format. · 96a420d2

Vincent Pelletier authored Dec 15, 2016

When FUNCTIONFS_EVENTFD flag is set, __ffs_data_got_descs reads a 32bits,
little-endian value right after the fixed structure header, and passes it
to eventfd_ctx_fdget. Document this.

Also, rephrase a comment to be affirmative about the role of string
descriptor at index 0. Ref: USB 2.0 spec paragraph "9.6.7 String", and
also checked to still be current in USB 3.0 spec paragraph "9.6.9 String".
Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>

96a420d2

usb: gadget: composite: Test get_alt() presence instead of set_alt() · 7e4da3fc

Krzysztof Opasiak authored Dec 20, 2016

By convention (according to doc) if function does not provide
get_alt() callback composite framework should assume that it has only
altsetting 0 and should respond with error if host tries to set
other one.

After commit dd4dff8b ("USB: composite: Fix bug: should test
set_alt function pointer before use it")
we started checking set_alt() callback instead of get_alt().
This check is useless as we check if set_alt() is set inside
usb_add_function() and fail if it's NULL.

Let's fix this check and move comment about why we check the get
method instead of set a little bit closer to prevent future false
fixes.

Fixes: dd4dff8b ("USB: composite: Fix bug: should test set_alt function pointer before use it")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Krzysztof Opasiak <k.opasiak@samsung.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>

7e4da3fc

usb: dwc3: pci: Fix dr_mode misspelling · 51c1685d

Hans de Goede authored Dec 27, 2016

usb_get_dr_mode() expects the device-property to be spelled
"dr_mode" not "dr-mode".

Spelling it properly fixes the following warning showing up in dmesg:
[ 8704.500545] dwc3 dwc3.2.auto: Configuration mismatch. dr_mode forced to gadget

Signed-off-by: Hans de Goede <hdegoede@redhat.com
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>

51c1685d

usb: dwc3: core: avoid Overflow events · e71d363d

Felipe Balbi authored Dec 23, 2016

Now that we're handling so many transfers at a time
and for some dwc3 revisions LPM events *must* be
enabled, we can fall into a situation where too many
events fire and we start receiving Overflow events.

Let's do what XHCI does and allocate a full page for
the Event Ring, this will avoid any future issues.

Cc: <stable@vger.kernel.org> # v4.9
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>

e71d363d

usb: dwc3: gadget: always unmap EP0 requests · d6214592

Felipe Balbi authored Dec 20, 2016

commit 0416e494 ("usb: dwc3: ep0: correct cache
sync issue in case of ep0_bounced") introduced a bug
where we would leak DMA resources which would cause
us to starve the system of them resulting in failing
DMA transfers.

Fix the bug by making sure that we always unmap EP0
requests since those are *always* mapped.

Fixes: 0416e494 ("usb: dwc3: ep0: correct cache
	sync issue in case of ep0_bounced")
Cc: <stable@vger.kernel.org>
Tested-by: Tomasz Medrek <tomaszx.medrek@intel.com>
Reported-by: Janusz Dziedzic <januszx.dziedzic@linux.intel.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>

d6214592

usb: dwc3: ep0: explicitly call dwc3_ep0_prepare_one_trb() · 19ec3123

Felipe Balbi authored Dec 20, 2016

Let's call dwc3_ep0_prepare_one_trb() explicitly
because there are occasions where we will need more
than one TRB to handle an EP0 transfer.

A follow-up patch will fix one bug related to
multiple-TRB Data Phases when it comes to
mapping/unmapping requests for DMA.

Cc: <stable@vger.kernel.org>
Reported-by: Janusz Dziedzic <januszx.dziedzic@linux.intel.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>

19ec3123

usb: dwc3: ep0: add dwc3_ep0_prepare_one_trb() · 7931ec86

Felipe Balbi authored Dec 20, 2016

For now this is just a cleanup patch, no functional
changes. We will be using the new function to fix a
bug introduced long ago by commit 0416e494
("usb: dwc3: ep0: correct cache sync issue in case
of ep0_bounced") and further worsened by commit
c0bd5456 ("usb: dwc3: ep0: handle non maxpacket
aligned transfers > 512")

Cc: <stable@vger.kernel.org>
Reported-by: Janusz Dziedzic <januszx.dziedzic@linux.intel.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>

7931ec86

usb: dwc2: gadget: fix default value for gadget-dma-desc · efc95b2c

Stefan Wahren authored Nov 20, 2016

The current default for gadget DMA descriptor results on bcm2835 in a
unnecessary error message:

  Invalid value 1 for param gadget-dma-desc

So fix this by using hw->dma_desc_enable as default value.

Fixes: dec4b556 ("usb: dwc2: gadget: Add descriptor DMA parameter")
Acked-by: John Youn <johnyoun@synopsys.com>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>

efc95b2c

usb: dwc2: fix default value for DMA support · 6118d064

Stefan Wahren authored Nov 20, 2016

The current defaults for DMA results on a non-DMA platform in a unnecessary
error message:

  Invalid value 0 for param gadget-dma

So fix this by using dma_capable as default value.

Fixes: 9962b62f ("usb: dwc2: Deprecate g-use-dma binding")
Acked-by: John Youn <johnyoun@synopsys.com>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>

6118d064

usb: dwc2: fix dwc2_get_device_property for u8 and u16 · de02238d

Stefan Wahren authored Nov 20, 2016

According to the Devicetree ePAPR [1] the datatypes u8 and u16 are
not defined. So using device_property_read_u16() would result in
a partial read of a 32-bit big-endian integer which is not intended.
So we better read the complete 32-bit value. This fixes a regression
on bcm2835 where the values for g-rx-fifo-size and g-np-tx-fifo-size
always read as zero:

  Invalid value 0 for param g-rx-fifo-size
  Invalid value 0 for param g-np-tx-fifo-size

[1] - http://elinux.org/images/c/cf/Power_ePAPR_APPROVED_v1.1.pdf

Fixes: 05ee799f ("usb: dwc2: Move gadget settings into core_params")
Acked-by: John Youn <johnyoun@synopsys.com>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>

de02238d

usb: dwc2: Do not set host parameter in peripheral mode · f3419735

Stefan Wahren authored Nov 20, 2016

Since commit "usb: dwc2: Improve handling of host and device hwparams" the
host mode specific hardware parameter aren't initialized in peripheral mode
from the register settings anymore. So we better do not set them in this
case which avoids the following warnings on bcm2835:

256 invalid for host_nperio_tx_fifo_size. Check HW configuration.
512 invalid for host_perio_tx_fifo_size. Check HW configuration.

Fixes: 55e1040e ("usb: dwc2: Improve handling of host and device hwparams")
Acked-by: John Youn <johnyoun@synopsys.com>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>

f3419735

01 Jan, 2017 2 commits

Linux 4.10-rc2 · 0c744ea4
Linus Torvalds authored Jan 01, 2017

0c744ea4

Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 4759d386

Linus Torvalds authored Jan 01, 2017

Pull DAX updates from Dan Williams:
 "The completion of Jan's DAX work for 4.10.

  As I mentioned in the libnvdimm-for-4.10 pull request, these are some
  final fixes for the DAX dirty-cacheline-tracking invalidation work
  that was merged through the -mm, ext4, and xfs trees in -rc1. These
  patches were prepared prior to the merge window, but we waited for
  4.10-rc1 to have a stable merge base after all the prerequisites were
  merged.

  Quoting Jan on the overall changes in these patches:

     "So I'd like all these 6 patches to go for rc2. The first three
      patches fix invalidation of exceptional DAX entries (a bug which
      is there for a long time) - without these patches data loss can
      occur on power failure even though user called fsync(2). The other
      three patches change locking of DAX faults so that ->iomap_begin()
      is called in a more relaxed locking context and we are safe to
      start a transaction there for ext4"

  These have received a build success notification from the kbuild
  robot, and pass the latest libnvdimm unit tests. There have not been
  any -next releases since -rc1, so they have not appeared there"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  ext4: Simplify DAX fault path
  dax: Call ->iomap_begin without entry lock during dax fault
  dax: Finish fault completely when loading holes
  dax: Avoid page invalidation races and unnecessary radix tree traversals
  mm: Invalidate DAX radix tree entries only if appropriate
  ext2: Return BH_New buffers for zeroed blocks

4759d386

30 Dec, 2016 2 commits

Merge tag 'docs-4.10-rc1-fix' of git://git.lwn.net/linux · 238d1d0f

Linus Torvalds authored Dec 30, 2016

Pull documentation fixes from Jonathan Corbet:
 "Two small fixes:

   - A merge error on my part broke the DocBook build. I've
     requisitioned one of tglx's frozen sharks for appropriate
     disciplinary action and resolved to be more careful about testing
     the DocBook stuff as long as it's still around.

   - Fix an error in unaligned-memory-access.txt"

* tag 'docs-4.10-rc1-fix' of git://git.lwn.net/linux:
  Documentation/unaligned-memory-access.txt: fix incorrect comparison operator
  docs: Fix build failure

238d1d0f

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · f3de082c

Linus Torvalds authored Dec 30, 2016

Pull crypto fix from Herbert Xu:
 "This fixes a boot failure on some platforms when crypto self test is
  enabled along with the new acomp interface"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: testmgr - Use heap buffer for acomp test input

f3de082c

29 Dec, 2016 2 commits

mm/filemap: fix parameters to test_bit() · 98473f9f

Olof Johansson authored Dec 29, 2016

 mm/filemap.c: In function 'clear_bit_unlock_is_negative_byte':
  mm/filemap.c:933:9: error: too few arguments to function 'test_bit'
    return test_bit(PG_waiters);
         ^~~~~~~~

Fixes: b91e1302 ('mm: optimize PageWaiters bit use for unlock_page()')
Signed-off-by: Olof Johansson <olof@lixom.net>
Brown-paper-bag-by: Linus Torvalds <dummy@duh.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

98473f9f

mm: optimize PageWaiters bit use for unlock_page() · b91e1302

Linus Torvalds authored Dec 27, 2016

In commit 62906027 ("mm: add PageWaiters indicating tasks are
waiting for a page bit") Nick Piggin made our page locking no longer
unconditionally touch the hashed page waitqueue, which not only helps
performance in general, but is particularly helpful on NUMA machines
where the hashed wait queues can bounce around a lot.

However, the "clear lock bit atomically and then test the waiters bit"
sequence turns out to be much more expensive than it needs to be,
because you get a nasty stall when trying to access the same word that
just got updated atomically.

On architectures where locking is done with LL/SC, this would be trivial
to fix with a new primitive that clears one bit and tests another
atomically, but that ends up not working on x86, where the only atomic
operations that return the result end up being cmpxchg and xadd.  The
atomic bit operations return the old value of the same bit we changed,
not the value of an unrelated bit.

On x86, we could put the lock bit in the high bit of the byte, and use
"xadd" with that bit (where the overflow ends up not touching other
bits), and look at the other bits of the result.  However, an even
simpler model is to just use a regular atomic "and" to clear the lock
bit, and then the sign bit in eflags will indicate the resulting state
of the unrelated bit #7.

So by moving the PageWaiters bit up to bit #7, we can atomically clear
the lock bit and test the waiters bit on x86 too.  And architectures
with LL/SC (which is all the usual RISC suspects), the particular bit
doesn't matter, so they are fine with this approach too.

This avoids the extra access to the same atomic word, and thus avoids
the costly stall at page unlock time.

The only downside is that the interface ends up being a bit odd and
specialized: clear a bit in a byte, and test the sign bit.  Nick doesn't
love the resulting name of the new primitive, but I'd rather make the
name be descriptive and very clear about the limitation imposed by
trying to work across all relevant architectures than make it be some
generic thing that doesn't make the odd semantics explicit.

So this introduces the new architecture primitive

    clear_bit_unlock_is_negative_byte();

and adds the trivial implementation for x86.  We have a generic
non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)"
combination) which can be overridden by any architecture that can do
better.  According to Nick, Power has the same hickup x86 has, for
example, but some other architectures may not even care.

All these optimizations mean that my page locking stress-test (which is
just executing a lot of small short-lived shell scripts: "make test" in
the git source tree) no longer makes our page locking look horribly bad.
Before all these optimizations, just the unlock_page() costs were just
over 3% of all CPU overhead on "make test".  After this, it's down to
0.66%, so just a quarter of the cost it used to be.

(The difference on NUMA is bigger, but there this micro-optimization is
likely less noticeable, since the big issue on NUMA was not the accesses
to 'struct page', but the waitqueue accesses that were already removed
by Nick's earlier commit).
Acked-by: Nick Piggin <npiggin@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b91e1302

28 Dec, 2016 2 commits

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 2d706e79

Linus Torvalds authored Dec 27, 2016

Pull crypto fix from Herbert Xu:
 "This fixes a hash corruption bug in the marvell driver"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: marvell - Copy IVDIG before launching partial DMA ahash requests

2d706e79

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 8f18e4d0

Linus Torvalds authored Dec 27, 2016

Pull networking fixes from David Miller:

 1) Various ipvlan fixes from Eric Dumazet and Mahesh Bandewar.

    The most important is to not assume the packet is RX just because
    the destination address matches that of the device. Such an
    assumption causes problems when an interface is put into loopback
    mode.

 2) If we retry when creating a new tc entry (because we dropped the
    RTNL mutex in order to load a module, for example) we end up with
    -EAGAIN and then loop trying to replay the request. But we didn't
    reset some state when looping back to the top like this, and if
    another thread meanwhile inserted the same tc entry we were trying
    to, we re-link it creating an enless loop in the tc chain. Fix from
    Daniel Borkmann.

 3) There are two different WRITE bits in the MDIO address register for
    the stmmac chip, depending upon the chip variant. Due to a bug we
    could set them both, fix from Hock Leong Kweh.

 4) Fix mlx4 bug in XDP_TX handling, from Tariq Toukan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net: stmmac: fix incorrect bit set in gmac4 mdio addr register
  r8169: add support for RTL8168 series add-on card.
  net: xdp: remove unused bfp_warn_invalid_xdp_buffer()
  openvswitch: upcall: Fix vlan handling.
  ipv4: Namespaceify tcp_tw_reuse knob
  net: korina: Fix NAPI versus resources freeing
  net, sched: fix soft lockup in tc_classify
  net/mlx4_en: Fix user prio field in XDP forward
  tipc: don't send FIN message from connectionless socket
  ipvlan: fix multicast processing
  ipvlan: fix various issues in ipvlan_process_multicast()

8f18e4d0

27 Dec, 2016 17 commits

Documentation/unaligned-memory-access.txt: fix incorrect comparison operator · 36f671be

Cihangir Akturk authored Dec 17, 2016

In the actual implementation ether_addr_equal function tests for equality to 0
when returning. It seems in commit 0d74c4 it is somehow overlooked to change
this operator to reflect the actual function.
Signed-off-by: Cihangir Akturk <cakturk@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

36f671be

docs: Fix build failure · 66115335

John Brooks authored Dec 23, 2016

The 80211.tmpl DocBook file was removed in commit 819bf593 ("docs-rst:
sphinxify 802.11 documentation"), but the 80211.xml target was re-added to
the Makefile by commit 7ddedebb ("ALSA: doc: ReSTize
writing-an-alsa-driver document"), leading to a failure when building the
documentation:

*** No rule to make target 'Documentation/DocBook/80211.xml', needed by
'Documentation/DocBook/80211.aux.xml'.

cc: stable@vger.kernel.org
Signed-off-by: John Brooks <john@fastquake.com>
Mea-culpa-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

66115335

Merge tag 'v4.10-rc1' into docs-next · 54ab6db0
Jonathan Corbet authored Dec 27, 2016
```
Linux 4.10-rc1
```
54ab6db0

net: stmmac: fix incorrect bit set in gmac4 mdio addr register · 5799fc90

Kweh, Hock Leong authored Dec 28, 2016

Fixing the gmac4 mdio write access to use MII_GMAC4_WRITE only instead of
OR together with MII_WRITE.
Signed-off-by: Kweh, Hock Leong <hock.leong.kweh@intel.com>
Acked-By: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5799fc90

r8169: add support for RTL8168 series add-on card. · 610c9087

Chun-Hao Lin authored Dec 27, 2016

This chip is the same as RTL8168, but its device id is 0x8161.
Signed-off-by: Chun-Hao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

610c9087

net: xdp: remove unused bfp_warn_invalid_xdp_buffer() · be267277

Jason Wang authored Dec 27, 2016

After commit 73b62bd0 ("virtio-net:
remove the warning before XDP linearizing"), there's no users for
bpf_warn_invalid_xdp_buffer(), so remove it. This is a revert for
commit f23bc46c.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

be267277

openvswitch: upcall: Fix vlan handling. · df30f740

pravin shelar authored Dec 26, 2016

Networking stack accelerate vlan tag handling by
keeping topmost vlan header in skb. This works as
long as packet remains in OVS datapath. But during
OVS upcall vlan header is pushed on to the packet.
When such packet is sent back to OVS datapath, core
networking stack might not handle it correctly. Following
patch avoids this issue by accelerating the vlan tag
during flow key extract. This simplifies datapath by
bringing uniform packet processing for packets from
all code paths.

Fixes: 5108bbad ("openvswitch: add processing of L3 packets").
CC: Jarno Rajahalme <jarno@ovn.org>
CC: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

df30f740

ipv4: Namespaceify tcp_tw_reuse knob · 56ab6b93

Haishuang Yan authored Dec 25, 2016

Different namespaces might have different requirements to reuse
TIME-WAIT sockets for new connections. This might be required in
cases where different namespace applications are in place which
require TIME_WAIT socket connections to be reduced independently
of the host.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

56ab6b93

crypto: testmgr - Use heap buffer for acomp test input · 02608e02

Laura Abbott authored Dec 21, 2016

Christopher Covington reported a crash on aarch64 on recent Fedora
kernels:

kernel BUG at ./include/linux/scatterlist.h:140!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 2 PID: 752 Comm: cryptomgr_test Not tainted 4.9.0-11815-ge93b1cc8 #162
Hardware name: linux,dummy-virt (DT)
task: ffff80007c650080 task.stack: ffff800008910000
PC is at sg_init_one+0xa0/0xb8
LR is at sg_init_one+0x24/0xb8
...
[<ffff000008398db8>] sg_init_one+0xa0/0xb8
[<ffff000008350a44>] test_acomp+0x10c/0x438
[<ffff000008350e20>] alg_test_comp+0xb0/0x118
[<ffff00000834f28c>] alg_test+0x17c/0x2f0
[<ffff00000834c6a4>] cryptomgr_test+0x44/0x50
[<ffff0000080dac70>] kthread+0xf8/0x128
[<ffff000008082ec0>] ret_from_fork+0x10/0x50

The test vectors used for input are part of the kernel image. These
inputs are passed as a buffer to sg_init_one which eventually blows up
with BUG_ON(!virt_addr_valid(buf)). On arm64, virt_addr_valid returns
false for the kernel image since virt_to_page will not return the
correct page. Fix this by copying the input vectors to heap buffer
before setting up the scatterlist.
Reported-by: Christopher Covington <cov@codeaurora.org>
Fixes: d7db7a88 ("crypto: acomp - update testmgr with support for acomp")
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

02608e02

ext4: Simplify DAX fault path · 1db17542

Jan Kara authored Oct 21, 2016

Now that dax_iomap_fault() calls ->iomap_begin() without entry lock, we
can use transaction starting in ext4_iomap_begin() and thus simplify
ext4_dax_fault(). It also provides us proper retries in case of ENOSPC.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

1db17542

dax: Call ->iomap_begin without entry lock during dax fault · 9f141d6e

Jan Kara authored Oct 19, 2016

Currently ->iomap_begin() handler is called with entry lock held. If the
filesystem held any locks between ->iomap_begin() and ->iomap_end()
(such as ext4 which will want to hold transaction open), this would cause
lock inversion with the iomap_apply() from standard IO path which first
calls ->iomap_begin() and only then calls ->actor() callback which grabs
entry locks for DAX (if it faults when copying from/to user provided
buffers).

Fix the problem by nesting grabbing of entry lock inside ->iomap_begin()
- ->iomap_end() pair.
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

9f141d6e

dax: Finish fault completely when loading holes · f449b936

Jan Kara authored Oct 19, 2016

The only case when we do not finish the page fault completely is when we
are loading hole pages into a radix tree. Avoid this special case and
finish the fault in that case as well inside the DAX fault handler. It
will allow us for easier iomap handling.
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

f449b936

dax: Avoid page invalidation races and unnecessary radix tree traversals · e3fce68c

Jan Kara authored Aug 10, 2016

Currently dax_iomap_rw() takes care of invalidating page tables and
evicting hole pages from the radix tree when write(2) to the file
happens. This invalidation is only necessary when there is some block
allocation resulting from write(2). Furthermore in current place the
invalidation is racy wrt page fault instantiating a hole page just after
we have invalidated it.

So perform the page invalidation inside dax_iomap_actor() where we can
do it only when really necessary and after blocks have been allocated so
nobody will be instantiating new hole pages anymore.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

e3fce68c

mm: Invalidate DAX radix tree entries only if appropriate · c6dcf52c

Jan Kara authored Aug 10, 2016

Currently invalidate_inode_pages2_range() and invalidate_mapping_pages()
just delete all exceptional radix tree entries they find. For DAX this
is not desirable as we track cache dirtiness in these entries and when
they are evicted, we may not flush caches although it is necessary. This
can for example manifest when we write to the same block both via mmap
and via write(2) (to different offsets) and fsync(2) then does not
properly flush CPU caches when modification via write(2) was the last
one.

Create appropriate DAX functions to handle invalidation of DAX entries
for invalidate_inode_pages2_range() and invalidate_mapping_pages() and
wire them up into the corresponding mm functions.
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

c6dcf52c

ext2: Return BH_New buffers for zeroed blocks · e568df6b

Jan Kara authored Aug 10, 2016

So far we did not return BH_New buffers from ext2_get_blocks() when we
allocated and zeroed-out a block for DAX inode to avoid racy zeroing in
DAX code. This zeroing is gone these days so we can remove the
workaround.
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

e568df6b

x86/mce/AMD: Make the init code more robust · 0dad3a30

Thomas Gleixner authored Dec 26, 2016

If mce_device_init() fails then the mce device pointer is NULL and the
AMD mce code happily dereferences it.

Add a sanity check.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0dad3a30

smp/hotplug: Undo tglxs brainfart · b9d9d691

Thomas Gleixner authored Dec 26, 2016

The attempt to prevent overwriting an active state resulted in a
disaster which effectively disables all dynamically allocated hotplug
states.

Cleanup the mess.

Fixes: dc280d93 ("cpu/hotplug: Prevent overwriting of callbacks")
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b9d9d691

26 Dec, 2016 4 commits

arm64: don't pull uaccess.h into *.S · b4b8664d

Al Viro authored Dec 26, 2016

Split asm-only parts of arm64 uaccess.h into a new header and use that
from *.S.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b4b8664d

net: korina: Fix NAPI versus resources freeing · e6afb1ad

Florian Fainelli authored Dec 23, 2016

Commit beb0babf ("korina: disable napi on close and restart")
introduced calls to napi_disable() that were missing before,
unfortunately this leaves a small window during which NAPI has a chance
to run, yet we just freed resources since korina_free_ring() has been
called:

Fix this by disabling NAPI first then freeing resource, and make sure
that we also cancel the restart task before doing the resource freeing.

Fixes: beb0babf ("korina: disable napi on close and restart")
Reported-by: Alexandros C. Couloumbis <alex@ozo.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e6afb1ad

net, sched: fix soft lockup in tc_classify · 628185cf

Daniel Borkmann authored Dec 21, 2016

Shahar reported a soft lockup in tc_classify(), where we run into an
endless loop when walking the classifier chain due to tp->next == tp
which is a state we should never run into. The issue only seems to
trigger under load in the tc control path.

What happens is that in tc_ctl_tfilter(), thread A allocates a new
tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
with it. In that classifier callback we had to unlock/lock the rtnl
mutex and returned with -EAGAIN. One reason why we need to drop there
is, for example, that we need to request an action module to be loaded.

This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
after we loaded and found the requested action, we need to redo the
whole request so we don't race against others. While we had to unlock
rtnl in that time, thread B's request was processed next on that CPU.
Thread B added a new tp instance successfully to the classifier chain.
When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
and destroying its tp instance which never got linked, we goto replay
and redo A's request.

This time when walking the classifier chain in tc_ctl_tfilter() for
checking for existing tp instances we had a priority match and found
the tp instance that was created and linked by thread B. Now calling
again into tp->ops->change() with that tp was successful and returned
without error.

tp_created was never cleared in the second round, thus kernel thinks
that we need to link it into the classifier chain (once again). tp and
*back point to the same object due to the match we had earlier on. Thus
for thread B's already public tp, we reset tp->next to tp itself and
link it into the chain, which eventually causes the mentioned endless
loop in tc_classify() once a packet hits the data path.

Fix is to clear tp_created at the beginning of each request, also when
we replay it. On the paths that can cause -EAGAIN we already destroy
the original tp instance we had and on replay we really need to start
from scratch. It seems that this issue was first introduced in commit
12186be7 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
and avoid kernel panic when we use cls_cgroup").

Fixes: 12186be7 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
Reported-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

628185cf

Linux 4.10-rc1 · 7ce7d89f
Linus Torvalds authored Dec 25, 2016

7ce7d89f