1. 22 Oct, 2015 40 commits
    • regulator: axp20x: Fix enable bit indexes for DCDC4 and DCDC5 · 24a51705
      Chen-Yu Tsai authored
      commit 6b3600b4 upstream.
      
      The enable bit indexes for DCDC4 and DCDC5 regulators are off by 1.
      
      We haven't run into any problems with this since either the regulators
      aren't defined in the DT and aren't used, or all the DCDC regulators
      have the "always-on" property set, as they are almost always used
      for system critical loads.
      Signed-off-by: Chen-Yu Tsai <wens@csie.org>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • regulator: core: Correct return value check in regulator_resolve_supply · b6556d8c
      Charles Keepax authored
      commit 23c3f310 upstream.
      
      The ret pointer passed to regulator_dev_lookup is only filled with a
      valid error code if regulator_dev_lookup returned NULL. Currently
      regulator_resolve_supply checks this ret value before it checks if a
      regulator was returned, which can result in valid regulator lookups
      being ignored.
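
      The ordering issue can be sketched in user-space C. This is only an
      illustration: lookup() and resolve_supply() are hypothetical stand-ins,
      not the kernel functions, and -ENODEV is an arbitrary error code.

```c
#include <errno.h>
#include <stddef.h>

struct regulator_dev { const char *name; };

/* Hypothetical stand-in for the lookup helper: *ret is written only on failure. */
static struct regulator_dev *lookup(struct regulator_dev *found, int *ret)
{
	if (!found)
		*ret = -ENODEV;
	return found;
}

/* Test the returned pointer first; read ret only when the pointer is NULL. */
static int resolve_supply(struct regulator_dev *candidate)
{
	int ret = -EINVAL; /* stale value that a successful lookup leaves untouched */
	struct regulator_dev *r = lookup(candidate, &ret);

	if (!r)
		return ret;
	return 0;
}
```

      Checking ret before r would report -EINVAL even when the lookup
      succeeded, which is the kind of ignored lookup described above.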
      
      Fixes: 6261b06d ("regulator: Defer lookup of supply to regulator_get")
      Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netfilter: nf_log: don't zap all loggers on unregister · 3d21d90d
      Florian Westphal authored
      commit 205ee117 upstream.
      
      Like nf_log_unset(), nf_log_unregister() must not reset the list of loggers.
      Otherwise, a call to nf_log_unregister() will render loggers of other nf
      protocols unusable:
      
      iptables -A INPUT -j LOG
      modprobe nf_log_arp ; rmmod nf_log_arp
      iptables -A INPUT -j LOG
      iptables: No chain/target/match by that name
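
      A minimal user-space sketch of the intended behaviour, assuming one
      active logger slot per protocol family. The names active,
      set_logger() and unregister_logger() are hypothetical and do not
      come from the kernel source.

```c
#include <stddef.h>

#define NPROTO 3 /* hypothetical number of protocol families */

struct logger { const char *name; };

/* One active logger per protocol family. */
static const struct logger *active[NPROTO];

static void set_logger(int pf, const struct logger *l)
{
	active[pf] = l;
}

/* Clear only the slots that point at the logger being removed. */
static void unregister_logger(const struct logger *l)
{
	for (int pf = 0; pf < NPROTO; pf++)
		if (active[pf] == l)
			active[pf] = NULL;
}
```

      Clearing every slot unconditionally would reproduce the failure in
      the transcript above, where removing one logger module breaks the
      LOG target for another family.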
      
      Fixes: 30e0c6a6 ("netfilter: nf_log: prepare net namespace support for loggers")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netfilter: nft_compat: skip family comparison in case of NFPROTO_UNSPEC · 9177f81f
      Pablo Neira Ayuso authored
      commit ba378ca9 upstream.
      
      Fix lookup of existing match/target structures in the corresponding list
      by skipping the family check if NFPROTO_UNSPEC is used.
      
      Without this fix, every use of a match/target allocates and inserts
      a new structure. This not only bloats memory consumption but also
      severely affects the time to reload the ruleset from the
      iptables-compat utility.
      
      After this patch, iptables-compat-restore and iptables-compat take
      almost the same time to reload large rulesets.
      
      Fixes: 0ca743a5 ("netfilter: nf_tables: add compatibility layer for x_tables")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netfilter: nf_log: wait for rcu grace after logger unregistration · e1f52516
      Pablo Neira Ayuso authored
      commit ad5001cc upstream.
      
      The nf_log_unregister() function needs to call synchronize_rcu() to make sure
      that the objects are not dereferenced anymore on module removal.
      
      Fixes: 5962815a ("netfilter: nf_log: use an array of loggers instead of list")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netfilter: conntrack: use nf_ct_tmpl_free in CT/synproxy error paths · 14573919
      Daniel Borkmann authored
      commit 9cf94eab upstream.
      
      Commit 0838aa7f ("netfilter: fix netns dependencies with conntrack
      templates") migrated templates to the new allocator api, but forgot to
      update error paths for them in CT and synproxy to use nf_ct_tmpl_free()
      instead of nf_conntrack_free().
      
      Due to that, memory is being freed into the wrong kmemcache, but also
      we drop the per net reference count of ct objects causing an imbalance.
      
      In Brad's case, this leads to a wrap-around of net->ct.count and thus
      lets __nf_conntrack_alloc() refuse to create a new ct object:
      
        [   10.340913] xt_addrtype: ipv6 does not support BROADCAST matching
        [   10.810168] nf_conntrack: table full, dropping packet
        [   11.917416] r8169 0000:07:00.0 eth0: link up
        [   11.917438] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
        [   12.815902] nf_conntrack: table full, dropping packet
        [   15.688561] nf_conntrack: table full, dropping packet
        [   15.689365] nf_conntrack: table full, dropping packet
        [   15.690169] nf_conntrack: table full, dropping packet
        [   15.690967] nf_conntrack: table full, dropping packet
        [...]
      
      With slab debugging, it also reports the wrong kmemcache (kmalloc-512 vs.
      nf_conntrack_ffffffff81ce75c0) and reports poison overwrites, etc. Thus,
      to fix the problem, export and use nf_ct_tmpl_free() instead.
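
      Why an unbalanced decrement shows up as "table full" can be modelled
      as follows. This is a sketch under the assumption that a signed
      counter is compared against an unsigned limit; table_full() is a
      hypothetical helper, not the kernel check itself.

```c
#include <stdbool.h>

/* Hypothetical model: a signed counter compared against an unsigned limit. */
static bool table_full(int count, unsigned int max)
{
	/* A counter that dropped below zero converts to a huge unsigned value. */
	return max && (unsigned int)count > max;
}
```

      With a limit of 65536, a count of 0 is not full, but a count of -1
      (one decrement too many) is reported as full.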
      
      Fixes: 0838aa7f ("netfilter: fix netns dependencies with conntrack templates")
      Reported-by: Brad Jackson <bjackson0971@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netfilter: ipset: Fixing unnamed union init · 43fd0843
      Elad Raz authored
      commit 96be5f28 upstream.
      
      Following up on Vinson Lee's proposed post [1], this patch fixes compilation
      issues found with gcc 4.4.7. Initializing the .cidr field of unnamed
      unions causes a compilation error in gcc 4.4.x.
      
      References
      
      Visible links
      [1] https://lkml.org/lkml/2015/7/5/74
      Signed-off-by: Elad Raz <eladr@mellanox.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netfilter: ipset: Out of bound access in hash:net* types fixed · 44016f5e
      Jozsef Kadlecsik authored
      commit 6fe7ccfd upstream.
      
      Dave Jones reported that KASan detected out of bounds access in hash:net*
      types:
      
      [   23.139532] ==================================================================
      [   23.146130] BUG: KASan: out of bounds access in hash_net4_add_cidr+0x1db/0x220 at addr ffff8800d4844b58
      [   23.152937] Write of size 4 by task ipset/457
      [   23.159742] =============================================================================
      [   23.166672] BUG kmalloc-512 (Not tainted): kasan: bad access detected
      [   23.173641] -----------------------------------------------------------------------------
      [   23.194668] INFO: Allocated in hash_net_create+0x16a/0x470 age=7 cpu=1 pid=456
      [   23.201836]  __slab_alloc.constprop.66+0x554/0x620
      [   23.208994]  __kmalloc+0x2f2/0x360
      [   23.216105]  hash_net_create+0x16a/0x470
      [   23.223238]  ip_set_create+0x3e6/0x740
      [   23.230343]  nfnetlink_rcv_msg+0x599/0x640
      [   23.237454]  netlink_rcv_skb+0x14f/0x190
      [   23.244533]  nfnetlink_rcv+0x3f6/0x790
      [   23.251579]  netlink_unicast+0x272/0x390
      [   23.258573]  netlink_sendmsg+0x5a1/0xa50
      [   23.265485]  SYSC_sendto+0x1da/0x2c0
      [   23.272364]  SyS_sendto+0xe/0x10
      [   23.279168]  entry_SYSCALL_64_fastpath+0x12/0x6f
      
      The bug is fixed in the patch and the testsuite is extended in ipset
      to check cidr handling more thoroughly.
      Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netfilter: nf_tables: Use 32 bit addressing register from nft_type_to_reg() · 81907cb7
      Pablo Neira Ayuso authored
      commit bf798657 upstream.
      
      nft_type_to_reg() needs to return the register in the new 32 bit addressing,
      otherwise we hit EINVAL when using mappings.
      
      Fixes: 49499c3e ("netfilter: nf_tables: switch registers to 32 bit addressing")
      Reported-by: Andreas Schultz <aschultz@tpip.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netfilter: nfnetlink: work around wrong endianess in res_id field · fa193d93
      Pablo Neira Ayuso authored
      commit a9de9777 upstream.
      
      The convention in nfnetlink is to use network byte order in every header field
      as well as in the attribute payload. The initial version of the batching
      infrastructure assumes that res_id comes in host byte order though.
      
      The only client of the batching infrastructure is nf_tables, so let's add a
      workaround to address this inconsistency. We currently have 11 nfnetlink
      subsystems according to NFNL_SUBSYS_COUNT, so we can assume that the subsystem
      2560, i.e. htons(10), will not be allocated anytime soon, so it can be an alias
      of nf_tables from the nfnetlink batching path when interpreting the res_id
      field.
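
      The arithmetic behind the alias can be checked with a small sketch.
      swap16() is a portable byte swap that gives the same result as
      htons() on a little-endian host; SUBSYS_NFTABLES and
      normalize_res_id() are hypothetical names used only for this
      illustration.

```c
#include <stdint.h>

#define SUBSYS_NFTABLES 10 /* subsystem id implied by htons(10) == 2560 */

/* Swap the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Treat the byte-swapped id as an alias of the nf_tables subsystem. */
static uint16_t normalize_res_id(uint16_t res_id)
{
	if (res_id == swap16(SUBSYS_NFTABLES))
		return SUBSYS_NFTABLES;
	return res_id;
}
```

      Because 10 is 0x000a, swapping its bytes gives 0x0a00, which is 2560.
      The alias is safe only as long as no real subsystem is assigned that
      number.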
      
      Based on original patch from Florian Westphal.
      Reported-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netfilter: bridge: fix IPv6 packets not being bridged with CONFIG_IPV6=n · 969bdb93
      Bernhard Thaler authored
      commit 18e1db67 upstream.
      
      Commit 230ac490 introduced a dependency on CONFIG_IPV6, which breaks
      bridging of IPv6 packets on a bridge with CONFIG_IPV6=n.
      
      Sysctl entry /proc/sys/net/bridge/bridge-nf-call-ip6tables defaults to 1,
      for this reason packets are handled by br_nf_pre_routing_ipv6(). When compiled
      with CONFIG_IPV6=n this function returns NF_DROP but should return NF_ACCEPT
      to let packets through.
      
      Change CONFIG_IPV6=n br_nf_pre_routing_ipv6() return value to NF_ACCEPT.
      
      Tested with a simple bridge with two interfaces and IPv6 packets trying
      to pass from host on left side to host on right side of the bridge.
      
      Fixes: 230ac490 ("netfilter: bridge: split ipv6 code into separated file")
      Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dm raid: fix round up of default region size · 64d7be57
      Mikulas Patocka authored
      commit 042745ee upstream.
      
      Commit 3a0f9aae ("dm raid: round region_size to power of two")
      intended to make sure that the default region size is a power of two.
      However, the logic in that commit is incorrect and sets the variable
      region_size to 0 or 1, depending on whether min_region_size is a power
      of two.
      
      Fix this logic, using roundup_pow_of_two(), so that region_size is
      properly rounded up to the next power of two.
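
      The difference between testing for a power of two and rounding up to
      one can be sketched in user-space C. is_pow2() and roundup_pow2() are
      hypothetical helpers written for this illustration; they are not the
      kernel's roundup_pow_of_two() and not the code that was replaced.

```c
#include <stdbool.h>

/* A power-of-two test yields a truth value (0 or 1), not a size. */
static bool is_pow2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Smallest power of two >= n; assumes 1 <= n <= ULONG_MAX / 2 + 1. */
static unsigned long roundup_pow2(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}
```

      Assigning the result of the test to a size variable would leave it at
      0 or 1, whereas the rounding helper maps 8193 to 16384 and leaves
      8192 unchanged.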
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Fixes: 3a0f9aae ("dm raid: round region_size to power of two")
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • md/raid0: apply base queue limits *before* disk_stack_limits · c4e429f1
      NeilBrown authored
      commit 66eefe5d upstream.
      
      Calling e.g. blk_queue_max_hw_sectors() after calls to
      disk_stack_limits() discards the settings determined by
      disk_stack_limits().
      So we need to make those calls first.
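
      A simplified model of why the order matters, assuming that the base
      setter overwrites the limit while stacking only ever lowers it. The
      struct and both helpers are hypothetical and much simpler than the
      block layer's queue limits.

```c
struct limits { unsigned int max_hw_sectors; };

/* Models a base setter: it overwrites whatever was there. */
static void set_max_hw_sectors(struct limits *l, unsigned int v)
{
	l->max_hw_sectors = v;
}

/* Models stacking: the array may not exceed what a member device allows. */
static void stack_limits(struct limits *l, unsigned int dev_max)
{
	if (dev_max < l->max_hw_sectors)
		l->max_hw_sectors = dev_max;
}
```

      Setting the base value to 1024 and then stacking a device that allows
      256 leaves 256. Doing it the other way round ends at 1024, which the
      member device cannot handle.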
      
      Fixes: 199dc6ed ("md/raid0: update queue parameter in a safer location.")
      Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • md/raid0: update queue parameter in a safer location. · a585e23d
      NeilBrown authored
      commit 199dc6ed upstream.
      
      When a (e.g.) RAID5 array is reshaped to RAID0, the updating
      of queue parameters (e.g. max number of sectors per bio) is
      done in the wrong place.
      It should be part of ->run, but it is actually part of ->takeover.
      This means it happens before level_store() calls:
      
      	blk_set_stacking_limits(&mddev->queue->limits);
      
      and so it is ineffective.  This can lead to errors from underlying
      devices.
      
      So move all the relevant settings out of create_stripe_zones()
      and into raid0_run().
      
      As this can lead to a bug-on it is suitable for any -stable
      kernel which supports reshape to RAID0.  So 2.6.35 or later.
      As the bug has been present for five years there is no urgency,
      so no need to rush into -stable.
      
      Fixes: 9af204cf ("md: Add support for Raid5->Raid0 and Raid10->Raid0 takeover")
      Reported-by: Yi Zhang <yizhan@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • USB: option: add ZTE PIDs · 05bbd1d8
      Liu.Zhao authored
      commit 19ab6bc5 upstream.
      
      This adds ZTE device PIDs to the kernel.
      Signed-off-by: Liu.Zhao <lzsos369@163.com>
      [johan: sort the new entries ]
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • staging: ion: fix corruption of ion_import_dma_buf · 0bb5f025
      Shawn Lin authored
      commit 6fa92e2b upstream.
      
      We found this issue, and it still exists in the latest kernel. Simply
      keep ion_handle_create under mutex_lock to avoid this race.
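
      The locking pattern can be sketched in user space with a pthread
      mutex: the lookup and the creation of a missing handle happen under
      one lock, so two importers cannot both decide to create. All names
      here (import_buffer(), client_lock, the fixed-size pool) are
      hypothetical and only model the idea.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_HANDLES 8

struct handle { int buffer_id; struct handle *next; };

static pthread_mutex_t client_lock = PTHREAD_MUTEX_INITIALIZER;
static struct handle pool[MAX_HANDLES]; /* stand-in for dynamic allocation */
static size_t used;
static struct handle *handles;

/* Look up and, if missing, create the handle inside one critical section. */
static struct handle *import_buffer(int buffer_id)
{
	struct handle *h;

	pthread_mutex_lock(&client_lock);
	for (h = handles; h; h = h->next)
		if (h->buffer_id == buffer_id)
			break;
	if (!h && used < MAX_HANDLES) {
		h = &pool[used++];
		h->buffer_id = buffer_id;
		h->next = handles;
		handles = h;
	}
	pthread_mutex_unlock(&client_lock);
	return h;
}
```

      If the lock were dropped between the lookup and the creation, a
      second importer could add a handle for the same buffer in between,
      which matches the "buffer already found" warning below.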
      
      WARNING: CPU: 2 PID: 2648 at drivers/staging/android/ion/ion.c:512 ion_handle_add+0xb4/0xc0()
      ion_handle_add: buffer already found.
      Modules linked in: iwlmvm iwlwifi mac80211 cfg80211 compat
      CPU: 2 PID: 2648 Comm: TimedEventQueue Tainted: G        W    3.14.0 #7
       00000000 00000000 9a3efd2c 80faf273 9a3efd6c 9a3efd5c 80935dc9 811d7fd3
       9a3efd88 00000a58 812208a0 00000200 80e128d4 80e128d4 8d4ae00c a8cd8600
       a8cd8094 9a3efd74 80935e0e 00000009 9a3efd6c 811d7fd3 9a3efd88 9a3efd9c
      Call Trace:
        [<80faf273>] dump_stack+0x48/0x69
        [<80935dc9>] warn_slowpath_common+0x79/0x90
        [<80e128d4>] ? ion_handle_add+0xb4/0xc0
        [<80e128d4>] ? ion_handle_add+0xb4/0xc0
        [<80935e0e>] warn_slowpath_fmt+0x2e/0x30
        [<80e128d4>] ion_handle_add+0xb4/0xc0
        [<80e144cc>] ion_import_dma_buf+0x8c/0x110
        [<80c517c4>] reg_init+0x364/0x7d0
        [<80993363>] ? futex_wait+0x123/0x210
        [<80992e0e>] ? get_futex_key+0x16e/0x1e0
        [<8099308f>] ? futex_wake+0x5f/0x120
        [<80c51e19>] vpu_service_ioctl+0x1e9/0x500
        [<80994aec>] ? do_futex+0xec/0x8e0
        [<80971080>] ? prepare_to_wait_event+0xc0/0xc0
        [<80c51c30>] ? reg_init+0x7d0/0x7d0
        [<80a22562>] do_vfs_ioctl+0x2d2/0x4c0
        [<80b198ad>] ? inode_has_perm.isra.41+0x2d/0x40
        [<80b199cf>] ? file_has_perm+0x7f/0x90
        [<80b1a5f7>] ? selinux_file_ioctl+0x47/0xf0
        [<80a227a8>] SyS_ioctl+0x58/0x80
        [<80fb45e8>] syscall_call+0x7/0x7
        [<80fb0000>] ? mmc_do_calc_max_discard+0xab/0xe4
      
      Fixes: 83271f62 ("ion: hold reference to handle...")
      Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
      Reviewed-by: Laura Abbott <labbott@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • svcrdma: Fix send_reply() scatter/gather set-up · d4aaf034
      Chuck Lever authored
      commit 9d11b51c upstream.
      
      The Linux NFS server returns garbage in the data payload of inline
      NFS/RDMA READ replies. These are READs of under 1000 bytes or so
      where the client has not provided either a reply chunk or a write
      list.
      
      The NFS server delivers the data payload for an NFS READ reply to
      the transport in an xdr_buf page list. If the NFS client did not
      provide a reply chunk or a write list, send_reply() is supposed to
      set up a separate sge for the page containing the READ data, and
      another sge for XDR padding if needed, then post all of the sges via
      a single SEND Work Request.
      
      The problem is send_reply() does not advance through the xdr_buf
      when setting up scatter/gather entries for SEND WR. It always calls
      dma_map_xdr with xdr_off set to zero. When there's more than one
      sge, dma_map_xdr() sets up the SEND sge's so they all point to the
      xdr_buf's head.
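
      Advancing through the buffer amounts to keeping a running offset
      while the entries are set up. The sketch below assumes the segments
      are laid out back to back; sge_offsets() is a hypothetical helper,
      not part of svcrdma.

```c
#include <stddef.h>

/* Compute the starting offset of each segment by accumulating lengths. */
static void sge_offsets(const size_t *len, size_t n, size_t *off)
{
	size_t xdr_off = 0;

	for (size_t i = 0; i < n; i++) {
		off[i] = xdr_off;   /* this entry starts where the last one ended */
		xdr_off += len[i];
	}
}
```

      For segment lengths of 100, 900 and 3 bytes the offsets are 0, 100
      and 1000. Leaving the offset at zero for every entry makes all of
      them point at the start of the buffer.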
      
      The current Linux NFS/RDMA client always provides a reply chunk or
      a write list when performing an NFS READ over RDMA. Therefore, it
      does not exercise this particular case. The Linux server has never
      had to use more than one extra sge for building RPC/RDMA replies
      with a Linux client.
      
      However, an NFS/RDMA client _is_ allowed to send small NFS READs
      without setting up a write list or reply chunk. The NFS READ reply
      fits entirely within the inline reply buffer in this case. This is
      perhaps a more efficient way of performing NFS READs that the Linux
      NFS/RDMA client may some day adopt.
      
      Fixes: b432e6b3 ('svcrdma: Change DMA mapping logic to . . .')
      BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=285
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ath10k: fix dma_mapping_error() handling · e3f48292
      Michal Kazior authored
      commit 5e55e3cb upstream.
      
      The function returns 1 when DMA mapping fails. The
      driver would return bogus values and could
      possibly confuse itself if DMA failed.
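
      The handling issue can be modelled like this: a check that returns a
      truth value must be translated into a negative errno before it is
      propagated. fake_dma_mapping_error() and map_buffer() are
      hypothetical stand-ins, and -EIO is an assumed choice of error code.

```c
#include <errno.h>

/* Hypothetical stand-in for the mapping check: 1 on failure, 0 on success. */
static int fake_dma_mapping_error(int failed)
{
	return failed ? 1 : 0;
}

/* Translate the truth value into an errno instead of returning it as is. */
static int map_buffer(int failed)
{
	if (fake_dma_mapping_error(failed))
		return -EIO;
	return 0;
}
```

      Returning the raw 1 would look like a positive, non-error value to
      callers that only test for negative results.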
      
      Fixes: 767d34fc ("ath10k: remove DMA mapping wrappers")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
      Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dm crypt: constrain crypt device's max_segment_size to PAGE_SIZE · b544e589
      Mike Snitzer authored
      commit 586b286b upstream.
      
      Setting the dm-crypt device's max_segment_size to PAGE_SIZE is an
      unfortunate constraint that is required to avoid the potential for
      exceeding dm-crypt's underlying device's max_segments limits -- due to
      crypt_alloc_buffer() possibly allocating pages for the encryption bio
      that are not as physically contiguous as the original bio.
      
      It is interesting to note that this problem was already fixed back in
      2007 via commit 91e10625 ("dm crypt: use bio_add_page").  But Linux 4.0
      commit cf2f1abf ("dm crypt: don't allocate pages for a partial
      request") regressed dm-crypt back to _not_ using bio_add_page().  But
      given that dm-crypt's CPU parallelization changes all depend on commit
      cf2f1abf's abandoning of the more complex io fragments processing that
      dm-crypt previously had, we cannot easily go back to using
      bio_add_page().
      
      So all said the cleanest way to resolve this issue is to fix dm-crypt to
      properly constrain the original bios entering dm-crypt so the encryption
      bios that dm-crypt generates from the original bios are always
      compatible with the underlying device's max_segments queue limits.
      
      It should be noted that technically Linux 4.3 does _not_ need this fix
      because of the block core's new late bio-splitting capability.  But, it
      is reasoned, there is little to be gained by having the block core split
      the encrypted bio that is composed of PAGE_SIZE segments.  That said, in
      the future we may revert this change.
      
      Fixes: cf2f1abf ("dm crypt: don't allocate pages for a partial request")
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=104421
      Suggested-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dm thin: disable discard support for thin devices if pool's is disabled · 0bc3b652
      Mike Snitzer authored
      commit 21607670 upstream.
      
      If the pool is configured with 'ignore_discard' its discard support is
      disabled.  The pool's thin devices should also have queue_limits that
      reflect discards are disabled.
      
      Fixes: 34fbcf62 ("dm thin: range discard support")
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI: Clear IORESOURCE_UNSET when clipping a bridge window · 4e0bada8
      Bjorn Helgaas authored
      commit b838b39e upstream.
      
      c770cb4c ("PCI: Mark invalid BARs as unassigned") sets IORESOURCE_UNSET
      if we fail to claim a resource.  If we tried to claim a bridge window,
      failed, clipped the window, and tried to claim the clipped window, we
      failed again because of IORESOURCE_UNSET:
      
        pci_bus 0000:00: root bus resource [mem 0xc0000000-0xffffffff window]
        pci 0000:00:01.0: can't claim BAR 15 [mem 0xbdf00000-0xddefffff 64bit pref]: no compatible bridge window
        pci 0000:00:01.0: [mem size 0x20000000 64bit pref] clipped to [mem size 0x1df00000 64bit pref]
        pci 0000:00:01.0:   bridge window [mem size 0x1df00000 64bit pref]
        pci 0000:00:01.0: can't claim BAR 15 [mem size 0x1df00000 64bit pref]: no address assigned
      
      The 00:01.0 window started as [mem 0xbdf00000-0xddefffff 64bit pref].  That
      starts before the host bridge window [mem 0xc0000000-0xffffffff window], so
      we clipped the 00:01.0 window to [mem 0xc0000000-0xddefffff 64bit pref].
      But we left it marked IORESOURCE_UNSET, so the second claim failed when it
      should have succeeded.
      
      This means downstream devices will also fail for lack of resources, e.g.,
      in the bugzilla below,
      
        radeon 0000:01:00.0: Fatal error during GPU init
      
      Clear IORESOURCE_UNSET when we clip a bridge window.  Also clear
      IORESOURCE_UNSET in our copy of the unclipped window so we can see exactly
      what the original window was and how it now fits inside the upstream
      window.
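
      The clipping arithmetic from the log above can be reproduced with a
      small sketch. struct res, RES_UNSET and clip() are hypothetical
      stand-ins for the kernel's resource type, IORESOURCE_UNSET and its
      clipping helper.

```c
#include <stdbool.h>
#include <stdint.h>

#define RES_UNSET 0x1u /* hypothetical flag standing in for IORESOURCE_UNSET */

struct res { uint64_t start, end; unsigned int flags; };

/* Clip r so it fits inside avail; clear the unset flag when clipping succeeds. */
static bool clip(struct res *r, const struct res *avail)
{
	uint64_t start = r->start > avail->start ? r->start : avail->start;
	uint64_t end = r->end < avail->end ? r->end : avail->end;

	if (start > end)
		return false; /* no overlap at all */
	if (start == r->start && end == r->end)
		return false; /* already inside, nothing to clip */

	r->start = start;
	r->end = end;
	r->flags &= ~RES_UNSET; /* the clipped window now has a valid address */
	return true;
}
```

      Clipping [0xbdf00000-0xddefffff] against [0xc0000000-0xffffffff]
      yields [0xc0000000-0xddefffff], whose size is 0x1df00000, the value
      shown in the log.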
      
      Fixes: c770cb4c ("PCI: Mark invalid BARs as unassigned")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=85491#c47
      Based-on-patch-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Based-on-patch-by: Yinghai Lu <yinghai@kernel.org>
      Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Acked-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI: Use function 0 VPD for identical functions, regular VPD for others · f6983f24
      Alex Williamson authored
      commit da2d03ea upstream.
      
      932c435c ("PCI: Add dev_flags bit to access VPD through function 0")
      added PCI_DEV_FLAGS_VPD_REF_F0.  Previously, we set the flag on every
      non-zero function of quirked devices.  If a function turned out to be
      different from function 0, i.e., it had a different class, vendor ID, or
      device ID, the flag remained set but we didn't make VPD accessible at all.
      
      Flip this around so we only set PCI_DEV_FLAGS_VPD_REF_F0 for functions that
      are identical to function 0, and allow regular VPD access for any other
      functions.
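
      The decision described above can be sketched as a comparison of the
      three identifying fields named in the text. struct pci_fn and
      vpd_via_function0() are hypothetical, and the IDs used with them are
      arbitrary.

```c
#include <stdbool.h>

struct pci_fn { unsigned int vendor, device, class_code; };

/* Redirect VPD to function 0 only when the function is identical to it. */
static bool vpd_via_function0(const struct pci_fn *fn, const struct pci_fn *f0)
{
	return fn->vendor == f0->vendor &&
	       fn->device == f0->device &&
	       fn->class_code == f0->class_code;
}
```

      A function that differs in any of the three fields keeps regular VPD
      access.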
      
      [bhelgaas: changelog, stable tag]
      Fixes: 932c435c ("PCI: Add dev_flags bit to access VPD through function 0")
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
      Acked-by: Myron Stowe <myron.stowe@redhat.com>
      Acked-by: Mark Rustad <mark.d.rustad@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI: Fix devfn for VPD access through function 0 · 471802a3
      Alex Williamson authored
      commit 9d924075 upstream.
      
      Commit 932c435c ("PCI: Add dev_flags bit to access VPD through function
      0") passes PCI_SLOT(devfn) for the devfn parameter of pci_get_slot().
      Generally this works because we're fairly well guaranteed that a PCIe
      device is at slot address 0, but for the general case, including
      conventional PCI, it's incorrect.  We need to get the slot and then convert
      it back into a devfn.
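
      The slot/devfn conversion can be checked in user space. The three
      macros below are written to match the usual PCI encoding (5 bits of
      slot, 3 bits of function); function0_devfn() is a hypothetical
      helper for this illustration.

```c
/* devfn packs a 5-bit slot number and a 3-bit function number. */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)       ((devfn) & 0x07)

/* Extract the slot, then convert it back into the devfn of function 0. */
static unsigned int function0_devfn(unsigned int devfn)
{
	return PCI_DEVFN(PCI_SLOT(devfn), 0);
}
```

      For slot 3, function 2 the devfn is 26 and function 0 lives at devfn
      24. Passing the bare slot number 3 as a devfn would instead address
      slot 0, function 3, which only happens to work when the device sits
      in slot 0.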
      
      Fixes: 932c435c ("PCI: Add dev_flags bit to access VPD through function 0")
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
      Acked-by: Myron Stowe <myron.stowe@redhat.com>
      Acked-by: Mark Rustad <mark.d.rustad@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tools/lguest: Fix redefinition of struct virtio_pci_cfg_cap · 141b2ad2
      Rusty Russell authored
      commit e523caa6 upstream.
      
      Ours uses a u32 for the data, since we ensure it's always
      aligned and it's x86 so it doesn't matter anyway.
      
        lguest.c:128:8: error: redefinition of ‘struct virtio_pci_cfg_cap’
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Fixes: 3121bb02 ("virtio: define virtio_pci_cfg_cap in header.")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: update fix for read corruption of compressed and shared extents · e83e472a
      Filipe Manana authored
      commit 808f80b4 upstream.
      
      My previous fix in commit 005efedf ("Btrfs: fix read corruption of
      compressed and shared extents") was effective only if the compressed
      extents cover a file range with a length that is not a multiple of 16
      pages. That's because the detection of when we reached a different range
      of the file that shares the same compressed extent as the previously
      processed range was done at extent_io.c:__do_contiguous_readpages(),
      which covers subranges with a length up to 16 pages, because
      extent_readpages() groups the pages in clusters no larger than 16 pages.
      So fix this by tracking the start of the previously processed file
      range's extent map at extent_readpages().
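
      Whether a given range falls into the case the earlier fix missed is
      simple arithmetic. The sketch assumes 4 KiB pages, which the commit
      text does not state; range_is_cluster_multiple() is a hypothetical
      helper.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u /* assumption: 4 KiB pages */
#define CLUSTER_PAGES   16u   /* cluster size named in the commit text */

/* True when the range covers a whole number of 16-page clusters. */
static bool range_is_cluster_multiple(size_t len)
{
	size_t pages = (len + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;

	return pages % CLUSTER_PAGES == 0;
}
```

      Under that assumption a 64K range is exactly 16 pages, so it lands
      on a cluster boundary, while a 24K range (6 pages) does not.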
      
      The following test case for fstests reproduces the issue:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
      
        # real QA test starts here
        _need_to_be_root
        _supported_fs btrfs
        _supported_os Linux
        _require_scratch
        _require_cloner
      
        rm -f $seqres.full
      
        test_clone_and_read_compressed_extent()
        {
            local mount_opts=$1
      
            _scratch_mkfs >>$seqres.full 2>&1
            _scratch_mount $mount_opts
      
            # Create our test file with a single extent of 64Kb that is going to
            # be compressed no matter which compression algo is used (zlib/lzo).
            $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 64K" \
                $SCRATCH_MNT/foo | _filter_xfs_io
      
            # Now clone the compressed extent into an adjacent file offset.
            $CLONER_PROG -s 0 -d $((64 * 1024)) -l $((64 * 1024)) \
                $SCRATCH_MNT/foo $SCRATCH_MNT/foo
      
            echo "File digest before unmount:"
            md5sum $SCRATCH_MNT/foo | _filter_scratch
      
            # Remount the fs or clear the page cache to trigger the bug in
            # btrfs. Because the extent has an uncompressed length that is a
            # multiple of 16 pages, all the pages belonging to the second range
            # of the file (64K to 128K), which points to the same extent as the
            # first range (0K to 64K), had their contents full of zeroes instead
            # of the byte 0xaa. This was a bug exclusively in the read path of
            # compressed extents, the correct data was stored on disk, btrfs
            # just failed to fill in the pages correctly.
            _scratch_remount
      
            echo "File digest after remount:"
            # Must match the digest we got before.
            md5sum $SCRATCH_MNT/foo | _filter_scratch
        }
      
        echo -e "\nTesting with zlib compression..."
        test_clone_and_read_compressed_extent "-o compress=zlib"
      
        _scratch_unmount
      
        echo -e "\nTesting with lzo compression..."
        test_clone_and_read_compressed_extent "-o compress=lzo"
      
        status=0
        exit
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Tested-by: Timofey Titovets <nefelim4ag@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: fix read corruption of compressed and shared extents · ee81cb3d
      Filipe Manana authored
      commit 005efedf upstream.
      
      If a file has a range pointing to a compressed extent, followed by
      another range that points to the same compressed extent and a read
      operation attempts to read both ranges (either completely or part of
      them), the pages that correspond to the second range are incorrectly
      filled with zeroes.
      
      Consider the following example:
      
        File layout
        [0 - 8K]                      [8K - 24K]
            |                             |
            |                             |
         points to extent X,         points to extent X,
         offset 4K, length of 8K     offset 0, length 16K
      
        [extent X, compressed length = 4K uncompressed length = 16K]
      
      If a readpages() call spans the 2 ranges, a single bio to read the extent
      is submitted - extent_io.c:submit_extent_page() would only create a new
      bio to cover the second range pointing to the extent if the extent it
      points to had a different logical address than the extent associated with
      the first range. As a consequence, the compressed read end io handler
      (compression.c:end_compressed_bio_read()) finishes once the extent
      is decompressed into the pages covering the first range, leaving the
      remaining pages (belonging to the second range) filled with zeroes (done
      by compression.c:btrfs_clear_biovec_end()).
      
      So fix this by submitting the current bio whenever we find a range
      pointing to a compressed extent that was preceded by a range with a
      different extent map. This is the simplest solution for this corner
      case. Making the end io callback populate both ranges (or more, if
      multiple ranges point to the same extent) is a much more complex
      solution since each bio is tightly coupled with a single extent map and
      the extent maps associated with the ranges pointing to the shared extent
      can have different offsets and lengths.
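
The submission rule described above can be illustrated with a toy model in shell. This is only a sketch, not the kernel code: the count_bios function, its "old" and "fixed" modes, and the three-column range description (extent map start, extent address, compressed flag) are hypothetical names introduced here, and the real submit_extent_page() takes more conditions into account.

```shell
# Toy model: count how many bios a read spanning several file ranges needs.
# Each stdin line describes one range: "<em_start> <extent_addr> <compressed>".
# All names here are hypothetical; this is not the btrfs implementation.
count_bios() {
    mode=$1        # "old" = behaviour before the fix, "fixed" = after it
    bios=0
    prev_em=""
    prev_addr=""
    while read -r em_start extent_addr compressed; do
        if [ -z "$prev_addr" ] || [ "$extent_addr" != "$prev_addr" ]; then
            # First range, or a range backed by a different extent:
            # a new bio is started.
            bios=$((bios + 1))
        elif [ "$mode" = fixed ] && [ "$compressed" = 1 ] &&
             [ "$em_start" != "$prev_em" ]; then
            # Same compressed extent reached through a different extent map:
            # submit the current bio and start a new one.
            bios=$((bios + 1))
        fi
        prev_em=$em_start
        prev_addr=$extent_addr
    done
    echo "$bios"
}

# The example layout: ranges [0 - 8K] and [8K - 24K] both point to extent X.
# 1048576 is an arbitrary stand-in for the address of extent X.
layout='0 1048576 1
8192 1048576 1'

printf '%s\n' "$layout" | count_bios old
printf '%s\n' "$layout" | count_bios fixed
```

In "old" mode both ranges share a single bio (the model prints 1), which is the situation where the pages of the second range were left zeroed. In "fixed" mode the second range gets its own bio (the model prints 2).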
      
      The following test case for fstests triggers the issue:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
      
        # real QA test starts here
        _need_to_be_root
        _supported_fs btrfs
        _supported_os Linux
        _require_scratch
        _require_cloner
      
        rm -f $seqres.full
      
        test_clone_and_read_compressed_extent()
        {
            local mount_opts=$1
      
            _scratch_mkfs >>$seqres.full 2>&1
            _scratch_mount $mount_opts
      
            # Create a test file with a single extent that is compressed (the
            # data we write into it is highly compressible no matter which
            # compression algorithm is used, zlib or lzo).
            $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 4K"        \
                            -c "pwrite -S 0xbb 4K 8K"        \
                            -c "pwrite -S 0xcc 12K 4K"       \
                            $SCRATCH_MNT/foo | _filter_xfs_io
      
            # Now clone our extent into an adjacent offset.
            $CLONER_PROG -s $((4 * 1024)) -d $((16 * 1024)) -l $((8 * 1024)) \
                $SCRATCH_MNT/foo $SCRATCH_MNT/foo
      
            # Same as before but for this file we clone the extent into a lower
            # file offset.
            $XFS_IO_PROG -f -c "pwrite -S 0xaa 8K 4K"         \
                            -c "pwrite -S 0xbb 12K 8K"        \
                            -c "pwrite -S 0xcc 20K 4K"        \
                            $SCRATCH_MNT/bar | _filter_xfs_io
      
            $CLONER_PROG -s $((12 * 1024)) -d 0 -l $((8 * 1024)) \
                $SCRATCH_MNT/bar $SCRATCH_MNT/bar
      
            echo "File digests before unmounting filesystem:"
            md5sum $SCRATCH_MNT/foo | _filter_scratch
            md5sum $SCRATCH_MNT/bar | _filter_scratch
      
            # Evicting the inode or clearing the page cache before reading
            # again the file would also trigger the bug - reads were returning
            # all bytes in the range corresponding to the second reference to
            # the extent with a value of 0, but the correct data was persisted
            # (it was a bug exclusively in the read path). The issue happened
            # only if the same readpages() call targeted pages belonging to the
            # first and second ranges that point to the same compressed extent.
            _scratch_remount
      
            echo "File digests after mounting filesystem again:"
            # Must match the same digests we got before.
            md5sum $SCRATCH_MNT/foo | _filter_scratch
            md5sum $SCRATCH_MNT/bar | _filter_scratch
        }
      
        echo -e "\nTesting with zlib compression..."
        test_clone_and_read_compressed_extent "-o compress=zlib"
      
        _scratch_unmount
      
        echo -e "\nTesting with lzo compression..."
        test_clone_and_read_compressed_extent "-o compress=lzo"
      
        status=0
        exit
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ee81cb3d
    • Jeff Mahoney's avatar
      btrfs: skip waiting on ordered range for special files · 6117f993
      Jeff Mahoney authored
      commit a30e577c upstream.
      
      In btrfs_evict_inode, we properly truncate the page cache for evicted
      inodes but then we call btrfs_wait_ordered_range for every inode as well.
      It's the right thing to do for regular files but results in incorrect
      behavior for device inodes for block devices.
      
      filemap_fdatawrite_range gets called with inode->i_mapping which gets
      resolved to the block device inode before getting passed to
      wbc_attach_fdatawrite_inode and ultimately to inode_to_bdi.  What happens
      next depends on whether there's an open file handle associated with the
      inode.  If there is, we write to the block device, which is unexpected
      behavior.  If there isn't, we fall through normally and inode->i_data is used.
      We can also end up racing against open/close which can result in crashes
      when i_mapping points to a block device inode that has been closed.
      
      Since there can't be any page cache associated with special file inodes,
      it's safe to skip the btrfs_wait_ordered_range call entirely and avoid
      the problem.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=100911
      Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6117f993
    • Andreas Dannenberg's avatar
      ASoC: tas2552: fix dBscale-min declaration · 0bfc2f5b
      Andreas Dannenberg authored
      commit e2600460 upstream.
      
      The minimum volume level for the TAS2552 (control register value 0x00)
      is -7dB; however, the driver declares it as -0.07dB.
      
      Running amixer before the patch reports:
      dBscale-min=-0.07dB,step=1.00dB,mute=0
      
      Running amixer with the patch applied reports:
      dBscale-min=-7.00dB,step=1.00dB,mute=0
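
The factor of 100 between the two readings is consistent with a minimum expressed in units of 0.01 dB, which is how ALSA dB scale declarations are written. The sketch below only illustrates that unit conversion. The helper name centi_db_to_db is hypothetical, and the inputs -7 and -700 are illustrative values chosen to reproduce the two amixer readings, not values quoted from the driver.

```shell
# Convert a value given in 0.01 dB units into a dB string.
# Hypothetical helper for illustration only; not part of the driver or amixer.
centi_db_to_db() {
    v=$1
    sign=""
    if [ "$v" -lt 0 ]; then
        sign="-"
        v=$((0 - v))
    fi
    # Integer part and two-digit fractional part, e.g. 700 -> 7.00
    printf '%s%d.%02ddB\n' "$sign" $((v / 100)) $((v % 100))
}

centi_db_to_db -7      # a minimum of -7 read as 0.01 dB units
centi_db_to_db -700    # a minimum of -700 read as 0.01 dB units
```

The first call prints -0.07dB and the second prints -7.00dB, matching the before and after output shown above.
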
      Signed-off-by: Andreas Dannenberg <dannenberg@ti.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0bfc2f5b
    • Gianluca Renzi's avatar
    • Lars-Peter Clausen's avatar
      ASoC: db1200: Fix DAI link format for db1300 and db1550 · 795d7120
      Lars-Peter Clausen authored
      commit e74679b3 upstream.
      
      Commit b4508d0f ("ASoC: db1200: Use static DAI format setup") switched
      the db1200 driver over to using static DAI format setup instead of a
      callback function. But the commit only added the dai_fmt field to one of
      the three DAI links in the driver. This breaks audio on db1300 and db1550.
      
      Add the two missing dai_fmt settings to fix the issue.
      
      Fixes: b4508d0f ("ASoC: db1200: Use static DAI format setup")
      Reported-by: Manuel Lauss <manuel.lauss@gmail.com>
      Tested-by: Manuel Lauss <manuel.lauss@gmail.com>
      Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      795d7120
    • Yitian Bu's avatar
      ASoC: dwc: correct irq clear method · 0633d0e6
      Yitian Bu authored
      commit 4873867e upstream.
      
      According to the DesignWare I2S datasheet, the tx/rx XRUN irq is cleared
      by reading the TOR/ROR registers, rather than by writing to them.
      Signed-off-by: Yitian Bu <yitian.bu@tangramtek.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0633d0e6
    • Robert Jarzmik's avatar
      ASoC: fix broken pxa SoC support · a0e9b9f1
      Robert Jarzmik authored
      commit 3c8f7710 upstream.
      
      The previous fix of pxa library support, which was introduced to fix the
      library dependency, broke the previous SoC behavior, where machine
      code binding pxa2xx-ac97 with a codec relied on:
       - sound/soc/pxa/pxa2xx-ac97.c
       - sound/soc/codecs/XXX.c
      
      For example, the mioa701_wm9713.c machine code is currently broken. The
      "select ARM" statement wrongly selects the soc/arm/pxa2xx-ac97 for
      compilation because, unfortunately, SND_PXA2XX_AC97 is declared both
      in sound/arm/Kconfig and in sound/soc/pxa/Kconfig.
      
      Fix this by ensuring that SND_PXA2XX_SOC correctly triggers the correct
      pxa2xx-ac97 compilation.
      
      Fixes: 846172df ("ASoC: fix SND_PXA2XX_LIB Kconfig warning")
      Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a0e9b9f1
    • Robert Jarzmik's avatar
      ASoC: pxa: pxa2xx-ac97: fix dma requestor lines · e23564c6
      Robert Jarzmik authored
      commit 8811191f upstream.
      
      PCM receive and transmit DMA requestor lines were swapped, breaking the
      PCM playback interface for PXA platforms using the sound/soc/ variant
      instead of the sound/arm variant.
      
      The commit below shows the inversion in the requestor lines.
      
      Fixes: d65a1458 ("ASoC: pxa: use snd_dmaengine_dai_dma_data")
      Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e23564c6
    • Takashi Iwai's avatar
      ALSA: hda - Disable power_save_node for IDT 92HD73xx chips · 403fd405
      Takashi Iwai authored
      commit c7e10080 upstream.
      
      The recent widget power saving introduced some unavoidable click
      noises on old IDT 92HD73xx chips, while it still seems to work on the
      compatible new chips.  In the bugzilla, we tried lots of tests and
      workarounds, but they didn't help much.  So, let's disable the feature
      for these specific chips as the minimal (but safest) fix.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=104981
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      403fd405
    • John Flatness's avatar
      ALSA: hda - Apply SPDIF pin ctl to MacBookPro 12,1 · 99b50c24
      John Flatness authored
      commit e8ff581f upstream.
      
      The MacBookPro 12,1 has the same setup as the 11 for controlling the
      status of the optical audio light. Simply apply the existing workaround
      to the subsystem ID for the 12,1.
      
      [sorted the fixup entry by tiwai]
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=105401
      Signed-off-by: John Flatness <john@zerocrates.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      99b50c24
    • Laura Abbott's avatar
      ALSA: hda: Add dock support for ThinkPad T550 · 22b502d4
      Laura Abbott authored
      commit d05ea7da upstream.
      
      Much like all the other Lenovo laptops, add a quirk to make
      sound work with docking.
      
      Reported-and-tested-by: lacknerflo@gmail.com
      Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      22b502d4
    • Takashi Iwai's avatar
      ALSA: synth: Fix conflicting OSS device registration on AWE32 · 4154d4e8
      Takashi Iwai authored
      commit 225db576 upstream.
      
      When OSS emulation is loaded on an ISA SB AWE32 chip, we now get kernel
      warnings like:
        WARNING: CPU: 0 PID: 2791 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x51/0x80()
        sysfs: cannot create duplicate filename '/devices/isa/sbawe.0/sound/card0/seq-oss-0-0'
      
      It's because both emux synth and opl3 drivers try to register their
      OSS device object with the same static index number 0.  This hasn't
      been a big problem until the recent rewrite of device management code
      (that exposes sysfs at the same time), but it's been an obvious bug.
      
      This patch works around it just by using a different index number for
      the emux synth object.  There may be a more elegant fix, but this is
      enough for now, as this code won't be touched very often anyway.
      Reported-and-tested-by: Michael Shell <list1@michaelshell.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4154d4e8
    • Takashi Iwai's avatar
      ALSA: hda - Disable power_save_node for Thinkpads · d061d1d5
      Takashi Iwai authored
      commit 7f57d803 upstream.
      
      Lenovo Thinkpads with recent Realtek codecs seem to suffer from click
      noises at power transitions since the introduction of widget power
      saving in the 4.1 kernel.  Although this might be solved by some delays
      at appropriate points, as a quick workaround, just disable the
      power_save_node feature for now.  The gain it gives is relatively
      small, and this restores the pre-4.1 behavior.
      
      This patch ended up with a bit more code changes than usual because
      the existing fixup for Thinkpads is highly chained.  Instead of adding
      yet another chain, combine a few of them into a single fixup entry, as
      a gratis cleanup.
      
      Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=943982
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d061d1d5
    • Takashi Iwai's avatar
      ALSA: hda/tegra - async probe for avoiding module loading deadlock · 839fe740
      Takashi Iwai authored
      commit 83510441 upstream.
      
      The Tegra HD-audio controller driver causes deadlocks when loaded as a
      module since the driver invokes request_module() at binding with the
      codec driver.  This patch works around it by deferring the probe to a
      work item, as the Intel HD-audio controller driver does.  Although moving
      the codec probe stuff into udev would be a better solution, it may
      cause other regressions, so let's try this band-aid fix until a more
      proper solution lands.
      Reported-by: Thierry Reding <treding@nvidia.com>
      Tested-by: Thierry Reding <treding@nvidia.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      839fe740
    • Greg Thelen's avatar
      memcg: fix dirty page migration · aba3953d
      Greg Thelen authored
      commit 0610c25d upstream.
      
      The problem starts with a file backed dirty page which is charged to a
      memcg.  Then page migration is used to move oldpage to newpage.
      
      Migration:
       - copies the oldpage's data to newpage
       - clears oldpage.PG_dirty
       - sets newpage.PG_dirty
       - uncharges oldpage from memcg
       - charges newpage to memcg
      
      Clearing oldpage.PG_dirty decrements the charged memcg's dirty page
      count.
      
      However, because newpage is not yet charged, setting newpage.PG_dirty
      does not increment the memcg's dirty page count.  After migration
      completes newpage.PG_dirty is eventually cleared, often in
      account_page_cleaned().  At this time newpage is charged to a memcg so
      the memcg's dirty page count is decremented which causes underflow
      because the count was not previously incremented by migration.  This
      underflow causes balance_dirty_pages() to see a very large unsigned
      number of dirty memcg pages which leads to aggressive throttling of
      buffered writes by processes in non root memcg.
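
The accounting sequence above can be replayed with a toy counter in shell. This is only a sketch of the bookkeeping, not kernel code: the account_dirty helper and its "charged" argument are hypothetical, and shell arithmetic is signed, so the underflow shows up as -1 rather than as a large unsigned value.

```shell
# Toy model of a memcg dirty page counter across one page migration.
# A page only affects the counter while it is charged to the memcg.
# Hypothetical helper for illustration; not the kernel's accounting code.
dirty=0

account_dirty() {   # usage: account_dirty <charged: 0|1> <delta: 1|-1>
    if [ "$1" = 1 ]; then
        dirty=$((dirty + $2))
    fi
}

account_dirty 1 1    # oldpage is dirtied while charged:          counter = 1
account_dirty 1 -1   # migration clears oldpage.PG_dirty:         counter = 0
account_dirty 0 1    # migration sets newpage.PG_dirty, but newpage
                     # is not charged yet, so nothing is counted: counter = 0
account_dirty 1 -1   # later newpage (now charged) is cleaned:    counter = -1

echo "dirty counter after migration and writeback: $dirty"
```

The model ends at -1 for a single migrated page. Stored in an unsigned counter, that value becomes the very large number of dirty pages that balance_dirty_pages() then acts on.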
      
      This issue:
       - can harm performance of non root memcg buffered writes.
       - can report too small (even negative) values in
         memory.stat[(total_)dirty] counters of all memcg, including the root.
      
      To avoid polluting migrate.c with #ifdef CONFIG_MEMCG checks, introduce
      page_memcg() and set_page_memcg() helpers.
      
      Test:
          0) setup and enter limited memcg
          mkdir /sys/fs/cgroup/test
          echo 1G > /sys/fs/cgroup/test/memory.limit_in_bytes
          echo $$ > /sys/fs/cgroup/test/cgroup.procs
      
          1) buffered writes baseline
          dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
          sync
          grep ^dirty /sys/fs/cgroup/test/memory.stat
      
          2) buffered writes with compaction antagonist to induce migration
          yes 1 > /proc/sys/vm/compact_memory &
          rm -rf /data/tmp/foo
          dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
          kill %
          sync
          grep ^dirty /sys/fs/cgroup/test/memory.stat
      
          3) buffered writes without antagonist, should match baseline
          rm -rf /data/tmp/foo
          dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
          sync
          grep ^dirty /sys/fs/cgroup/test/memory.stat
      
                             (speed, dirty residue)
                   unpatched                       patched
          1) 841 MB/s 0 dirty pages          886 MB/s 0 dirty pages
          2) 611 MB/s -33427456 dirty pages  793 MB/s 0 dirty pages
          3) 114 MB/s -33427456 dirty pages  891 MB/s 0 dirty pages
      
          Notice that unpatched baseline performance (1) fell after
          migration (3): 841 -> 114 MB/s.  In the patched kernel, post
          migration performance matches baseline.
      
      Fixes: c4843a75 ("memcg: add per cgroup dirty page accounting")
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Reported-by: Dave Hansen <dave.hansen@intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      aba3953d