1. 11 Mar, 2014 40 commits
    • majianpeng's avatar
      libceph: unregister request in __map_request failed and nofail == false · b3f19e7f
      majianpeng authored
      commit 73d9f7ee upstream.
      
      For nofail == false request, if __map_request failed, the caller does
      cleanup work, like releasing the relative pages.  It doesn't make any sense
      to retry this request.
      Signed-off-by: default avatarJianpeng Ma <majianpeng@gmail.com>
      Reviewed-by: default avatarSage Weil <sage@inktank.com>
      [bwh: Backported to 3.2: adjust indentation]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3f19e7f
    • Maxim Patlasov's avatar
      fuse: hotfix truncate_pagecache() issue · fe17c202
      Maxim Patlasov authored
      commit 06a7c3c2 upstream.
      
      The way how fuse calls truncate_pagecache() from fuse_change_attributes()
      is completely wrong. Because, w/o i_mutex held, we never sure whether
      'oldsize' and 'attr->size' are valid by the time of execution of
      truncate_pagecache(inode, oldsize, attr->size). In fact, as soon as we
      released fc->lock in the middle of fuse_change_attributes(), we completely
      loose control of actions which may happen with given inode until we reach
      truncate_pagecache. The list of potentially dangerous actions includes
      mmap-ed reads and writes, ftruncate(2) and write(2) extending file size.
      
      The typical outcome of doing truncate_pagecache() with outdated arguments
      is data corruption from user point of view. This is (in some sense)
      acceptable in cases when the issue is triggered by a change of the file on
      the server (i.e. externally wrt fuse operation), but it is absolutely
      intolerable in scenarios when a single fuse client modifies a file without
      any external intervention. A real life case I discovered by fsx-linux
      looked like this:
      
      1. Shrinking ftruncate(2) comes to fuse_do_setattr(). The latter sends
      FUSE_SETATTR to the server synchronously, but before getting fc->lock ...
      2. fuse_dentry_revalidate() is asynchronously called. It sends FUSE_LOOKUP
      to the server synchronously, then calls fuse_change_attributes(). The
      latter updates i_size, releases fc->lock, but before comparing oldsize vs
      attr->size..
      3. fuse_do_setattr() from the first step proceeds by acquiring fc->lock and
      updating attributes and i_size, but now oldsize is equal to
      outarg.attr.size because i_size has just been updated (step 2). Hence,
      fuse_do_setattr() returns w/o calling truncate_pagecache().
      4. As soon as ftruncate(2) completes, the user extends file size by
      write(2) making a hole in the middle of file, then reads data from the hole
      either by read(2) or mmap-ed read. The user expects to get zero data from
      the hole, but gets stale data because truncate_pagecache() is not executed
      yet.
      
      The scenario above illustrates one side of the problem: not truncating the
      page cache even though we should. Another side corresponds to truncating
      page cache too late, when the state of inode changed significantly.
      Theoretically, the following is possible:
      
      1. As in the previous scenario fuse_dentry_revalidate() discovered that
      i_size changed (due to our own fuse_do_setattr()) and is going to call
      truncate_pagecache() for some 'new_size' it believes valid right now. But
      by the time that particular truncate_pagecache() is called ...
      2. fuse_do_setattr() returns (either having called truncate_pagecache() or
      not -- it doesn't matter).
      3. The file is extended either by write(2) or ftruncate(2) or fallocate(2).
      4. mmap-ed write makes a page in the extended region dirty.
      
      The result will be the lost of data user wrote on the fourth step.
      
      The patch is a hotfix resolving the issue in a simplistic way: let's skip
      dangerous i_size update and truncate_pagecache if an operation changing
      file size is in progress. This simplistic approach looks correct for the
      cases w/o external changes. And to handle them properly, more sophisticated
      and intrusive techniques (e.g. NFS-like one) would be required. I'd like to
      postpone it until the issue is well discussed on the mailing list(s).
      
      Changed in v2:
       - improved patch description to cover both sides of the issue.
      Signed-off-by: default avatarMaxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      [bwh: Backported to 3.2: add the fuse_inode::state field which we didn't have]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fe17c202
    • Miklos Szeredi's avatar
      fuse: readdir: check for slash in names · f4a69e06
      Miklos Szeredi authored
      commit efeb9e60 upstream.
      
      Userspace can add names containing a slash character to the directory
      listing.  Don't allow this as it could cause all sorts of trouble.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      [bwh: Backported to 3.2: drop changes to parse_dirplusfile() which we
       don't have]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4a69e06
    • Vyacheslav Dubeyko's avatar
      nilfs2: fix issue with race condition of competition between segments for dirty blocks · 831c8764
      Vyacheslav Dubeyko authored
      commit 7f42ec39 upstream.
      
      Many NILFS2 users were reported about strange file system corruption
      (for example):
      
         NILFS: bad btree node (blocknr=185027): level = 0, flags = 0x0, nchildren = 768
         NILFS error (device sda4): nilfs_bmap_last_key: broken bmap (inode number=11540)
      
      But such error messages are consequence of file system's issue that takes
      place more earlier.  Fortunately, Jerome Poulin <jeromepoulin@gmail.com>
      and Anton Eliasson <devel@antoneliasson.se> were reported about another
      issue not so recently.  These reports describe the issue with segctor
      thread's crash:
      
        BUG: unable to handle kernel paging request at 0000000000004c83
        IP: nilfs_end_page_io+0x12/0xd0 [nilfs2]
      
        Call Trace:
         nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2]
         nilfs_segctor_construct+0x17b/0x290 [nilfs2]
         nilfs_segctor_thread+0x122/0x3b0 [nilfs2]
         kthread+0xc0/0xd0
         ret_from_fork+0x7c/0xb0
      
      These two issues have one reason.  This reason can raise third issue
      too.  Third issue results in hanging of segctor thread with eating of
      100% CPU.
      
      REPRODUCING PATH:
      
      One of the possible way or the issue reproducing was described by
      Jermoe me Poulin <jeromepoulin@gmail.com>:
      
      1. init S to get to single user mode.
      2. sysrq+E to make sure only my shell is running
      3. start network-manager to get my wifi connection up
      4. login as root and launch "screen"
      5. cd /boot/log/nilfs which is a ext3 mount point and can log when NILFS dies.
      6. lscp | xz -9e > lscp.txt.xz
      7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root /mnt/nilfs
      8. start a screen to dump /proc/kmsg to text file since rsyslog is killed
      9. start a screen and launch strace -f -o find-cat.log -t find
      /mnt/nilfs -type f -exec cat {} > /dev/null \;
      10. start a screen and launch strace -f -o apt-get.log -t apt-get update
      11. launch the last command again as it did not crash the first time
      12. apt-get crashes
      13. ps aux > ps-aux-crashed.log
      13. sysrq+W
      14. sysrq+E  wait for everything to terminate
      15. sysrq+SUSB
      
      Simplified way of the issue reproducing is starting kernel compilation
      task and "apt-get update" in parallel.
      
      REPRODUCIBILITY:
      
      The issue is reproduced not stable [60% - 80%].  It is very important to
      have proper environment for the issue reproducing.  The critical
      conditions for successful reproducing:
      
      (1) It should have big modified file by mmap() way.
      
      (2) This file should have the count of dirty blocks are greater that
          several segments in size (for example, two or three) from time to time
          during processing.
      
      (3) It should be intensive background activity of files modification
          in another thread.
      
      INVESTIGATION:
      
      First of all, it is possible to see that the reason of crash is not valid
      page address:
      
        NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
        NILFS [nilfs_segctor_complete_write]:2101 segbuf->sb_segnum 6783
      
      Moreover, value of b_page (0x1a82) is 6786.  This value looks like segment
      number.  And b_blocknr with b_size values look like block numbers.  So,
      buffer_head's pointer points on not proper address value.
      
      Detailed investigation of the issue is discovered such picture:
      
        [-----------------------------SEGMENT 6783-------------------------------]
        NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
        NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
        NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
        NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
        NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
        NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
        NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111149024, segbuf->sb_segnum 6783
      
        [-----------------------------SEGMENT 6784-------------------------------]
        NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
        NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
        NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff8802174a6798, bh->b_assoc_buffers.prev ffff880221cffee8
        NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
        NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
        NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
        NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
        NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
        NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6784
        NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111150080, segbuf->sb_segnum 6784, segbuf->sb_nbio 0
        [----------] ditto
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111164416, segbuf->sb_segnum 6784, segbuf->sb_nbio 15
      
        [-----------------------------SEGMENT 6785-------------------------------]
        NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
        NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
        NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff880219277e80, bh->b_assoc_buffers.prev ffff880221cffc88
        NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
        NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
        NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
        NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
        NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6785
        NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111165440, segbuf->sb_segnum 6785, segbuf->sb_nbio 0
        [----------] ditto
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111177728, segbuf->sb_segnum 6785, segbuf->sb_nbio 12
      
        NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait
        NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6783
        NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6784
        NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6785
      
        NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
      
        BUG: unable to handle kernel paging request at 0000000000001a82
        IP: [<ffffffffa024d0f2>] nilfs_end_page_io+0x12/0xd0 [nilfs2]
      
      Usually, for every segment we collect dirty files in list.  Then, dirty
      blocks are gathered for every dirty file, prepared for write and
      submitted by means of nilfs_segbuf_submit_bh() call.  Finally, it takes
      place complete write phase after calling nilfs_end_bio_write() on the
      block layer.  Buffers/pages are marked as not dirty on final phase and
      processed files removed from the list of dirty files.
      
      It is possible to see that we had three prepare_write and submit_bio
      phases before segbuf_wait and complete_write phase.  Moreover, segments
      compete between each other for dirty blocks because on every iteration
      of segments processing dirty buffer_heads are added in several lists of
      payload_buffers:
      
        [SEGMENT 6784]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
        [SEGMENT 6785]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
      
      The next pointer is the same but prev pointer has changed.  It means
      that buffer_head has next pointer from one list but prev pointer from
      another.  Such modification can be made several times.  And, finally, it
      can be resulted in various issues: (1) segctor hanging, (2) segctor
      crashing, (3) file system metadata corruption.
      
      FIX:
      This patch adds:
      
      (1) setting of BH_Async_Write flag in nilfs_segctor_prepare_write()
          for every proccessed dirty block;
      
      (2) checking of BH_Async_Write flag in
          nilfs_lookup_dirty_data_buffers() and
          nilfs_lookup_dirty_node_buffers();
      
      (3) clearing of BH_Async_Write flag in nilfs_segctor_complete_write(),
          nilfs_abort_logs(), nilfs_forget_buffer(), nilfs_clear_dirty_page().
      Reported-by: default avatarJerome Poulin <jeromepoulin@gmail.com>
      Reported-by: default avatarAnton Eliasson <devel@antoneliasson.se>
      Cc: Paul Fertser <fercerpav@gmail.com>
      Cc: ARAI Shun-ichi <hermes@ceres.dti.ne.jp>
      Cc: Piotr Szymaniak <szarpaj@grubelek.pl>
      Cc: Juan Barry Manuel Canham <Linux@riotingpacifist.net>
      Cc: Zahid Chowdhury <zahid.chowdhury@starsolutions.com>
      Cc: Elmer Zhang <freeboy6716@gmail.com>
      Cc: Kenneth Langga <klangga@gmail.com>
      Signed-off-by: default avatarVyacheslav Dubeyko <slava@dubeyko.com>
      Acked-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2: nilfs_clear_dirty_page() has not been separated
       from nilfs_clear_dirty_pages()]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      831c8764
    • Jiri Olsa's avatar
      perf tools: Fix cache event name generation · 73a11828
      Jiri Olsa authored
      commit 275ef387 upstream.
      
      If the event name is specified with all 3 components, the last one
      overwrites the previous one during the name composing within the
      parse_events_add_cache function.
      
      Fixing this by properly adjusting the string index.
      Reported-by: default avatarJoel Uckelman <joel@lightboxtechnologies.com>
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Joel Uckelman <joel@lightboxtechnologies.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LPU-Reference: 20120905175133.GA18352@krava.brq.redhat.com
      [ committer note: Remove the newline fix, done already in 42e1fb77 ]
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Vinson Lee <vlee@twopensource.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      73a11828
    • Arnaldo Carvalho de Melo's avatar
      perf tools: Remove extraneous newline when parsing hardware cache events · f25c118b
      Arnaldo Carvalho de Melo authored
      commit 42e1fb77 upstream.
      
      Noticed while developing a 'perf test' entry to verify that
      perf_evsel__name works.
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-xz6zgh38mp3cjnd2udh38z8f@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Vinson Lee <vlee@twopensource.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f25c118b
    • Jiang Liu's avatar
      mm/hotplug: correctly add new zone to all other nodes' zone lists · 446327d6
      Jiang Liu authored
      commit 08dff7b7 upstream.
      
      When online_pages() is called to add new memory to an empty zone, it
      rebuilds all zone lists by calling build_all_zonelists().  But there's a
      bug which prevents the new zone to be added to other nodes' zone lists.
      
      online_pages() {
      	build_all_zonelists()
      	.....
      	node_set_state(zone_to_nid(zone), N_HIGH_MEMORY)
      }
      
      Here the node of the zone is put into N_HIGH_MEMORY state after calling
      build_all_zonelists(), but build_all_zonelists() only adds zones from
      nodes in N_HIGH_MEMORY state to the fallback zone lists.
      build_all_zonelists()
      
          ->__build_all_zonelists()
      	->build_zonelists()
      	    ->find_next_best_node()
      		->for_each_node_state(n, N_HIGH_MEMORY)
      
      So memory in the new zone will never be used by other nodes, and it may
      cause strange behavor when system is under memory pressure.  So put node
      into N_HIGH_MEMORY state before calling build_all_zonelists().
      Signed-off-by: default avatarJianguo Wu <wujianguo@huawei.com>
      Signed-off-by: default avatarJiang Liu <liuj97@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Keping Chen <chenkeping@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Qiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      446327d6
    • Tejun Heo's avatar
      cgroup: fix RCU accesses to task->cgroups · 71a20685
      Tejun Heo authored
      commit 14611e51 upstream.
      
      task->cgroups is a RCU pointer pointing to struct css_set.  A task
      switches to a different css_set on cgroup migration but a css_set
      doesn't change once created and its pointers to cgroup_subsys_states
      aren't RCU protected.
      
      task_subsys_state[_check]() is the macro to acquire css given a task
      and subsys_id pair.  It RCU-dereferences task->cgroups->subsys[] not
      task->cgroups, so the RCU pointer task->cgroups ends up being
      dereferenced without read_barrier_depends() after it.  It's broken.
      
      Fix it by introducing task_css_set[_check]() which does
      RCU-dereference on task->cgroups.  task_subsys_state[_check]() is
      reimplemented to directly dereference ->subsys[] of the css_set
      returned from task_css_set[_check]().
      
      This removes some of sparse RCU warnings in cgroup.
      
      v2: Fixed unbalanced parenthsis and there's no need to use
          rcu_dereference_raw() when !CONFIG_PROVE_RCU.  Both spotted by Li.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Acked-by: default avatarLi Zefan <lizefan@huawei.com>
      [bwh: Backported to 3.2:
       - Adjust context
       - Remove CONFIG_PROVE_RCU condition
       - s/lockdep_is_held(&cgroup_mutex)/cgroup_lock_is_held()/]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Qiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      71a20685
    • Kees Cook's avatar
      proc connector: reject unprivileged listener bumps · 2f590c47
      Kees Cook authored
      commit e70ab977 upstream.
      
      While PROC_CN_MCAST_LISTEN/IGNORE is entirely advisory, it was possible
      for an unprivileged user to turn off notifications for all listeners by
      sending PROC_CN_MCAST_IGNORE. Instead, require the same privileges as
      required for a multicast bind.
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Evgeniy Polyakov <zbr@ioremap.net>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Acked-by: default avatarEvgeniy Polyakov <zbr@ioremap.net>
      Acked-by: default avatarMatt Helsley <matthltc@us.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Qiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2f590c47
    • Greg Edwards's avatar
      KVM: IOMMU: hva align mapping page size · f7741e3b
      Greg Edwards authored
      commit 27ef63c7 upstream.
      
      When determining the page size we could use to map with the IOMMU, the
      page size should also be aligned with the hva, not just the gfn.  The
      gfn may not reflect the real alignment within the hugetlbfs file.
      
      Most of the time, this works fine.  However, if the hugetlbfs file is
      backed by non-contiguous huge pages, a multi-huge page memslot starts at
      an unaligned offset within the hugetlbfs file, and the gfn is aligned
      with respect to the huge page size, kvm_host_page_size() will return the
      huge page size and we will use that to map with the IOMMU.
      
      When we later unpin that same memslot, the IOMMU returns the unmap size
      as the huge page size, and we happily unpin that many pfns in
      monotonically increasing order, not realizing we are spanning
      non-contiguous huge pages and partially unpin the wrong huge page.
      
      Ensure the IOMMU mapping page size is aligned with the hva corresponding
      to the gfn, which does reflect the alignment within the hugetlbfs file.
      Reviewed-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: default avatarGreg Edwards <gedwards@ddn.com>
      Signed-off-by: default avatarGleb Natapov <gleb@redhat.com>
      [bwh: Backported to 3.2: s/__gfn_to_hva_memslot/gfn_to_hva_memslot/]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Qiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f7741e3b
    • Alexander Graf's avatar
      KVM: PPC: Emulate dcbf · 67bc20f7
      Alexander Graf authored
      commit d3286144 upstream.
      
      Guests can trigger MMIO exits using dcbf. Since we don't emulate cache
      incoherent MMIO, just do nothing and move on.
      Reported-by: default avatarBen Collins <ben.c@servergy.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Tested-by: default avatarBen Collins <ben.c@servergy.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Qiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      67bc20f7
    • Christian Borntraeger's avatar
      s390/kvm: dont announce RRBM support · 115f5063
      Christian Borntraeger authored
      commit 87cac8f8 upstream.
      
      Newer kernels (linux-next with the transparent huge page patches)
      use rrbm if the feature is announced via feature bit 66.
      RRBM will cause intercepts, so KVM does not handle it right now,
      causing an illegal instruction in the guest.
      The  easy solution is to disable the feature bit for the guest.
      
      This fixes bugs like:
      Kernel BUG at 0000000000124c2a [verbose debug info unavailable]
      illegal operation: 0001 [#1] SMP
      Modules linked in: virtio_balloon virtio_net ipv6 autofs4
      CPU: 0 Not tainted 3.5.4 #1
      Process fmempig (pid: 659, task: 000000007b712fd0, ksp: 000000007bed3670)
      Krnl PSW : 0704d00180000000 0000000000124c2a (pmdp_clear_flush_young+0x5e/0x80)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
           00000000003cc000 0000000000000004 0000000000000000 0000000079800000
           0000000000040000 0000000000000000 000000007bed3918 000000007cf40000
           0000000000000001 000003fff7f00000 000003d281a94000 000000007bed383c
           000000007bed3918 00000000005ecbf8 00000000002314a6 000000007bed36e0
       Krnl Code:>0000000000124c2a: b9810025          ogr     %r2,%r5
                 0000000000124c2e: 41343000           la      %r3,0(%r4,%r3)
                 0000000000124c32: a716fffa           brct    %r1,124c26
                 0000000000124c36: b9010022           lngr    %r2,%r2
                 0000000000124c3a: e3d0f0800004       lg      %r13,128(%r15)
                 0000000000124c40: eb22003f000c       srlg    %r2,%r2,63
      [ 2150.713198] Call Trace:
      [ 2150.713223] ([<00000000002312c4>] page_referenced_one+0x6c/0x27c)
      [ 2150.713749]  [<0000000000233812>] page_referenced+0x32a/0x410
      [...]
      
      CC: Alex Graf <agraf@suse.de>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Qiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      115f5063
    • Dominik Dingel's avatar
      KVM: s390: move kvm_guest_enter,exit closer to sie · bf5597c6
      Dominik Dingel authored
      commit 2b29a9fd upstream.
      
      Any uaccess between guest_enter and guest_exit could trigger a page fault,
      the page fault handler would handle it as a guest fault and translate a
      user address as guest address.
      Signed-off-by: default avatarDominik Dingel <dingel@linux.vnet.ibm.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      [hq: Backported to 3.4: adjust context]
      Signed-off-by: default avatarQiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bf5597c6
    • Tejun Heo's avatar
      cgroup: cgroup_subsys->fork() should be called after the task is added to css_set · 30ec268b
      Tejun Heo authored
      commit 5edee61e upstream.
      
      cgroup core has a bug which violates a basic rule about event
      notifications - when a new entity needs to be added, you add that to
      the notification list first and then make the new entity conform to
      the current state.  If done in the reverse order, an event happening
      inbetween will be lost.
      
      cgroup_subsys->fork() is invoked way before the new task is added to
      the css_set.  Currently, cgroup_freezer is the only user of ->fork()
      and uses it to make new tasks conform to the current state of the
      freezer.  If FROZEN state is requested while fork is in progress
      between cgroup_fork_callbacks() and cgroup_post_fork(), the child
      could escape freezing - the cgroup isn't frozen when ->fork() is
      called and the freezer couldn't see the new task on the css_set.
      
      This patch moves cgroup_subsys->fork() invocation to
      cgroup_post_fork() after the new task is added to the css_set.
      cgroup_fork_callbacks() is removed.
      
      Because now a task may be migrated during cgroup_subsys->fork(),
      freezer_fork() is updated so that it adheres to the usual RCU locking
      and the rather pointless comment on why locking can be different there
      is removed (if it doesn't make anything simpler, why even bother?).
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      [hq: Backported to 3.4:
       - Adjust context
       - Iterate over first CGROUP_BUILTIN_SUBSYS_COUNT elements of subsys]
      Signed-off-by: default avatarQiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      30ec268b
    • Johannes Weiner's avatar
      mm: vmscan: fix endless loop in kswapd balancing · f47929fd
      Johannes Weiner authored
      commit 60cefed4 upstream.
      
      Kswapd does not in all places have the same criteria for a balanced
      zone.  Zones are only being reclaimed when their high watermark is
      breached, but compaction checks loop over the zonelist again when the
      zone does not meet the low watermark plus two times the size of the
      allocation.  This gets kswapd stuck in an endless loop over a small
      zone, like the DMA zone, where the high watermark is smaller than the
      compaction requirement.
      
      Add a function, zone_balanced(), that checks the watermark, and, for
      higher order allocations, if compaction has enough free memory.  Then
      use it uniformly to check for balanced zones.
      
      This makes sure that when the compaction watermark is not met, at least
      reclaim happens and progress is made - or the zone is declared
      unreclaimable at some point and skipped entirely.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: default avatarGeorge Spelvin <linux@horizon.com>
      Reported-by: default avatarJohannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
      Reported-by: default avatarTomas Racek <tracek@redhat.com>
      Tested-by: default avatarJohannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [hq: Backported to 3.4: adjust context]
      Signed-off-by: default avatarQiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      f47929fd
    • Hannes Reinecke's avatar
      dm mpath: fix stalls when handling invalid ioctls · 8371cffe
      Hannes Reinecke authored
      commit a1989b33 upstream.
      
      An invalid ioctl will never be valid, irrespective of whether multipath
      has active paths or not.  So for invalid ioctls we do not have to wait
      for multipath to activate any paths, but can rather return an error
      code immediately.  This fix resolves numerous instances of:
      
       udevd[]: worker [] unexpectedly returned with status 0x0100
      
      that have been seen during testing.
      Signed-off-by: default avatarHannes Reinecke <hare@suse.de>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8371cffe
    • Linus Walleij's avatar
      dma: ste_dma40: don't dereference free:d descriptor · 8d8e4839
      Linus Walleij authored
      commit e9baa9d9 upstream.
      
      It appears that in the DMA40 driver the DMA tasklet will very
      often dereference memory for a descriptor just free:d from the
      DMA40 slab. Nothing happens because no other part of the driver
      has yet had a chance to claim this memory, but it's really
      nasty to dereference free:d memory, so let's check the flag
      before the descriptor is free and store it in a bool variable.
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarVinod Koul <vinod.koul@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d8e4839
    • Jan Kara's avatar
      quota: Fix race between dqput() and dquot_scan_active() · 54499e71
      Jan Kara authored
      commit 1362f4ea upstream.
      
      Currently last dqput() can race with dquot_scan_active() causing it to
      call callback for an already deactivated dquot. The race is as follows:
      
      CPU1					CPU2
        dqput()
          spin_lock(&dq_list_lock);
          if (atomic_read(&dquot->dq_count) > 1) {
           - not taken
          if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) {
            spin_unlock(&dq_list_lock);
            ->release_dquot(dquot);
              if (atomic_read(&dquot->dq_count) > 1)
               - not taken
      					  dquot_scan_active()
      					    spin_lock(&dq_list_lock);
      					    if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags))
      					     - not taken
      					    atomic_inc(&dquot->dq_count);
      					    spin_unlock(&dq_list_lock);
              - proceeds to release dquot
      					    ret = fn(dquot, priv);
      					     - called for inactive dquot
      
      Fix the problem by making sure possible ->release_dquot() is finished by
      the time we call the callback and new calls to it will notice reference
      dquot_scan_active() has taken and bail out.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      54499e71
    • Eric Paris's avatar
      SELinux: bigendian problems with filename trans rules · 186ef238
      Eric Paris authored
      commit 9085a642 upstream.
      
      When writing policy via /sys/fs/selinux/policy I wrote the type and class
      of filename trans rules in CPU endian instead of little endian.  On
      x86_64 this works just fine, but it means that on big endian arch's like
      ppc64 and s390 userspace reads the policy and converts it from
      le32_to_cpu.  So the values are all screwed up.  Write the values in le
      format like it should have been to start.
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      Acked-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: default avatarPaul Moore <pmoore@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      186ef238
    • Peter Zijlstra's avatar
      perf: Fix hotplug splat · f80747a4
      Peter Zijlstra authored
      commit e3703f8c upstream.
      
      Drew Richardson reported that he could make the kernel go *boom* when hotplugging
      while having perf events active.
      
      It turned out that when you have a group event, the code in
      __perf_event_exit_context() fails to remove the group siblings from
      the context.
      
      We then proceed with destroying and freeing the event, and when you
      re-plug the CPU and try and add another event to that CPU, things go
      *boom* because you've still got dead entries there.
      Reported-by: default avatarDrew Richardson <drew.richardson@arm.com>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/n/tip-k6v5wundvusvcseqj1si0oz0@git.kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f80747a4
    • Lai Jiangshan's avatar
      workqueue: ensure @task is valid across kthread_stop() · 23f0913c
      Lai Jiangshan authored
      commit 5bdfff96 upstream.
      
      When a kworker should die, the kworkre is notified through WORKER_DIE
      flag instead of kthread_should_stop().  This, IIRC, is primarily to
      keep the test synchronized inside worker_pool lock.  WORKER_DIE is
      first set while holding pool->lock, the lock is dropped and
      kthread_stop() is called.
      
      Unfortunately, this means that there's a slight chance that the target
      kworker may see WORKER_DIE before kthread_stop() finishes and exits
      and frees the target task before or during kthread_stop().
      
      Fix it by pinning the target task before setting WORKER_DIE and
      putting it after kthread_stop() is done.
      
      tj: Improved patch description and comment.  Moved pinning above
          WORKER_DIE for better signify what it's protecting.
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      23f0913c
    • Guenter Roeck's avatar
      hwmon: (max1668) Fix writing the minimum temperature · 8d362994
      Guenter Roeck authored
      commit 500a9157 upstream.
      
      When trying to set the minimum temperature, the driver was erroneously
      writing the maximum temperature into the chip.
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Reviewed-by: default avatarJean Delvare <jdelvare@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d362994
    • Joerg Dorchain's avatar
      USB: ftdi_sio: add Cressi Leonardo PID · 564a4d6c
      Joerg Dorchain authored
      commit 6dbd46c8 upstream.
      
      Hello,
      
      the following patch adds an entry for the PID of a Cressi Leonardo
      diving computer interface to kernel 3.13.0.
      It is detected as FT232RL.
      Works with subsurface.
      Signed-off-by: default avatarJoerg Dorchain <joerg@dorchain.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      564a4d6c
    • Aleksander Morgado's avatar
      USB: serial: option: blacklist interface 4 for Cinterion PHS8 and PXS8 · f922754e
      Aleksander Morgado authored
      commit 12df84d4 upstream.
      
      This interface is to be handled by the qmi_wwan driver.
      
      CC: Hans-Christoph Schemmel <hans-christoph.schemmel@gemalto.com>
      CC: Christian Schmiedl <christian.schmiedl@gemalto.com>
      CC: Nicolaus Colberg <nicolaus.colberg@gemalto.com>
      CC: David McCullough <david.mccullough@accelecon.com>
      Signed-off-by: default avatarAleksander Morgado <aleksander@aleksander.es>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f922754e
    • Lan Tianyu's avatar
      ACPI / processor: Rework processor throttling with work_on_cpu() · 1d023027
      Lan Tianyu authored
      commit f3ca4164 upstream.
      
      acpi_processor_set_throttling() uses set_cpus_allowed_ptr() to make
      sure that the (struct acpi_processor)->acpi_processor_set_throttling()
      callback will run on the right CPU.  However, the function may be
      called from a worker thread already bound to a different CPU in which
      case that won't work.
      
      Make acpi_processor_set_throttling() use work_on_cpu() as appropriate
      instead of abusing set_cpus_allowed_ptr().
      Reported-and-tested-by: default avatarJiri Olsa <jolsa@redhat.com>
      Signed-off-by: default avatarLan Tianyu <tianyu.lan@intel.com>
      [rjw: Changelog]
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d023027
    • Hans de Goede's avatar
      ACPI / video: Filter the _BCL table for duplicate brightness values · b0b0c264
      Hans de Goede authored
      commit bd8ba205 upstream.
      
      Some devices have duplicate entries in there brightness levels table, ie
      on my Dell Latitude E6430 the table looks like this:
      
      [    3.686060] acpi backlight index   0, val 80
      [    3.686095] acpi backlight index   1, val 50
      [    3.686122] acpi backlight index   2, val 5
      [    3.686147] acpi backlight index   3, val 5
      [    3.686172] acpi backlight index   4, val 5
      [    3.686197] acpi backlight index   5, val 5
      [    3.686223] acpi backlight index   6, val 5
      [    3.686248] acpi backlight index   7, val 5
      [    3.686273] acpi backlight index   8, val 6
      [    3.686332] acpi backlight index   9, val 7
      [    3.686356] acpi backlight index  10, val 8
      [    3.686380] acpi backlight index  11, val 9
      etc.
      
      Notice that brightness values 0-5 are all mapped to 5. This means that
      if userspace writes any value between 0 and 5 to the brightness sysfs attribute
      and then reads it, it will always return 0, which is somewhat unexpected.
      
      This is a problem for ie gnome-settings-daemon, which uses read-modify-write
      logic when the users presses the brightness up or down keys. This is done
      this way to take brightness changes from other sources into account.
      
      On this specific laptop what happens once the brightness has been set to 0,
      is that gsd reads 0, adds 5, writes 5, and on the next brightness up key press
      again reads 0, so things get stuck at the lowest brightness setting.
      
      Filtering out the duplicate table entries, makes any write to brightness
      read back as the written value as one would expect, fixing this.
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Reviewed-by: default avatarAaron Lu <aaron.lu@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b0b0c264
    • Jean Delvare's avatar
      i7core_edac: Fix PCI device reference count · 9f84cff0
      Jean Delvare authored
      commit c0f5eeed upstream.
      
      The reference count changes done by pci_get_device can be a little
      misleading when the usage diverges from the most common scheme. The
      reference count of the device passed as the last parameter is always
      decreased, even if the function returns no new device. So if we are
      going to try alternative device IDs, we must manually increment the
      device reference count before each retry. If we don't, we end up
      decreasing the reference count, and after a few modprobe/rmmod cycles
      the PCI devices will vanish.
      
      In other words and as Alan put it: without this fix the EDAC code
      corrupts the PCI device list.
      
      This fixes kernel bug #50491:
      https://bugzilla.kernel.org/show_bug.cgi?id=50491Signed-off-by: default avatarJean Delvare <jdelvare@suse.de>
      Link: http://lkml.kernel.org/r/20140224093927.7659dd9d@endymion.delvareReviewed-by: default avatarAlan Cox <alan@linux.intel.com>
      Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
      Cc: Doug Thompson <dougthompson@xmission.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9f84cff0
    • Tejun Heo's avatar
      sata_sil: apply MOD15WRITE quirk to TOSHIBA MK2561GSYN · 31d183aa
      Tejun Heo authored
      commit 9f9c47f0 upstream.
      
      It's a bit odd to see a newer device showing mod15write; however, the
      reported behavior is highly consistent and other factors which could
      contribute seem to have been verified well enough.  Also, both
      sata_sil itself and the drive are fairly outdated at this point making
      the risk of this change fairly low.  It is possible, probably likely,
      that other drive models in the same family have the same problem;
      however, for now, let's just add the specific model which was tested.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarmatson <lists-matsonpa@luxsci.me>
      References: http://lkml.kernel.org/g/201401211912.s0LJCk7F015058@rs103.luxsci.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      31d183aa
    • Denis V. Lunev's avatar
      ata: enable quirk from jmicron JMB350 for JMB394 · 13e60b22
      Denis V. Lunev authored
      commit efb9e0f4 upstream.
      
      Without the patch the kernel generates the following error.
      
       ata11.15: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
       ata11.15: Port Multiplier vendor mismatch '0x197b' != '0x123'
       ata11.15: PMP revalidation failed (errno=-19)
       ata11.15: failed to recover PMP after 5 tries, giving up
      
      This patch helps to bypass this error and the device becomes
      functional.
      Signed-off-by: default avatarDenis V. Lunev <den@openvz.org>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: <linux-ide@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      13e60b22
    • Peter Zijlstra's avatar
      perf/x86: Fix event scheduling · 494dac8e
      Peter Zijlstra authored
      commit 26e61e89 upstream.
      
      Vince "Super Tester" Weaver reported a new round of syscall fuzzing (Trinity) failures,
      with perf WARN_ON()s triggering. He also provided traces of the failures.
      
      This is I think the relevant bit:
      
      	>    pec_1076_warn-2804  [000] d...   147.926153: x86_pmu_disable: x86_pmu_disable
      	>    pec_1076_warn-2804  [000] d...   147.926153: x86_pmu_state: Events: {
      	>    pec_1076_warn-2804  [000] d...   147.926156: x86_pmu_state:   0: state: .R config: ffffffffffffffff (          (null))
      	>    pec_1076_warn-2804  [000] d...   147.926158: x86_pmu_state:   33: state: AR config: 0 (ffff88011ac99800)
      	>    pec_1076_warn-2804  [000] d...   147.926159: x86_pmu_state: }
      	>    pec_1076_warn-2804  [000] d...   147.926160: x86_pmu_state: n_events: 1, n_added: 0, n_txn: 1
      	>    pec_1076_warn-2804  [000] d...   147.926161: x86_pmu_state: Assignment: {
      	>    pec_1076_warn-2804  [000] d...   147.926162: x86_pmu_state:   0->33 tag: 1 config: 0 (ffff88011ac99800)
      	>    pec_1076_warn-2804  [000] d...   147.926163: x86_pmu_state: }
      	>    pec_1076_warn-2804  [000] d...   147.926166: collect_events: Adding event: 1 (ffff880119ec8800)
      
      So we add the insn:p event (fd[23]).
      
      At this point we should have:
      
        n_events = 2, n_added = 1, n_txn = 1
      
      	>    pec_1076_warn-2804  [000] d...   147.926170: collect_events: Adding event: 0 (ffff8800c9e01800)
      	>    pec_1076_warn-2804  [000] d...   147.926172: collect_events: Adding event: 4 (ffff8800cbab2c00)
      
      We try and add the {BP,cycles,br_insn} group (fd[3], fd[4], fd[15]).
      These events are 0:cycles and 4:br_insn, the BP event isn't x86_pmu so
      that's not visible.
      
      	group_sched_in()
      	  pmu->start_txn() /* nop - BP pmu */
      	  event_sched_in()
      	     event->pmu->add()
      
      So here we should end up with:
      
        0: n_events = 3, n_added = 2, n_txn = 2
        4: n_events = 4, n_added = 3, n_txn = 3
      
      But seeing the below state on x86_pmu_enable(), the must have failed,
      because the 0 and 4 events aren't there anymore.
      
      Looking at group_sched_in(), since the BP is the leader, its
      event_sched_in() must have succeeded, for otherwise we would not have
      seen the sibling adds.
      
      But since neither 0 or 4 are in the below state; their event_sched_in()
      must have failed; but I don't see why, the complete state: 0,0,1:p,4
      fits perfectly fine on a core2.
      
      However, since we try and schedule 4 it means the 0 event must have
      succeeded!  Therefore the 4 event must have failed, its failure will
      have put group_sched_in() into the fail path, which will call:
      
      	event_sched_out()
      	  event->pmu->del()
      
      on 0 and the BP event.
      
      Now x86_pmu_del() will reduce n_events; but it will not reduce n_added;
      giving what we see below:
      
       n_event = 2, n_added = 2, n_txn = 2
      
      	>    pec_1076_warn-2804  [000] d...   147.926177: x86_pmu_enable: x86_pmu_enable
      	>    pec_1076_warn-2804  [000] d...   147.926177: x86_pmu_state: Events: {
      	>    pec_1076_warn-2804  [000] d...   147.926179: x86_pmu_state:   0: state: .R config: ffffffffffffffff (          (null))
      	>    pec_1076_warn-2804  [000] d...   147.926181: x86_pmu_state:   33: state: AR config: 0 (ffff88011ac99800)
      	>    pec_1076_warn-2804  [000] d...   147.926182: x86_pmu_state: }
      	>    pec_1076_warn-2804  [000] d...   147.926184: x86_pmu_state: n_events: 2, n_added: 2, n_txn: 2
      	>    pec_1076_warn-2804  [000] d...   147.926184: x86_pmu_state: Assignment: {
      	>    pec_1076_warn-2804  [000] d...   147.926186: x86_pmu_state:   0->33 tag: 1 config: 0 (ffff88011ac99800)
      	>    pec_1076_warn-2804  [000] d...   147.926188: x86_pmu_state:   1->0 tag: 1 config: 1 (ffff880119ec8800)
      	>    pec_1076_warn-2804  [000] d...   147.926188: x86_pmu_state: }
      	>    pec_1076_warn-2804  [000] d...   147.926190: x86_pmu_enable: S0: hwc->idx: 33, hwc->last_cpu: 0, hwc->last_tag: 1 hwc->state: 0
      
      So the problem is that x86_pmu_del(), when called from a
      group_sched_in() that fails (for whatever reason), and without x86_pmu
      TXN support (because the leader is !x86_pmu), will corrupt the n_added
      state.
      Reported-and-Tested-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Dave Jones <davej@redhat.com>
      Link: http://lkml.kernel.org/r/20140221150312.GF3104@twins.programming.kicks-ass.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      494dac8e
    • Laurent Dufour's avatar
      powerpc/crashdump : Fix page frame number check in copy_oldmem_page · 8dd74be4
      Laurent Dufour authored
      commit f5295bd8 upstream.
      
      In copy_oldmem_page, the current check using max_pfn and min_low_pfn to
      decide if the page is backed or not, is not valid when the memory layout is
      not continuous.
      
      This happens when running as a QEMU/KVM guest, where RTAS is mapped higher
      in the memory. In that case max_pfn points to the end of RTAS, and a hole
      between the end of the kdump kernel and RTAS is not backed by PTEs. As a
      consequence, the kdump kernel is crashing in copy_oldmem_page when accessing
      in a direct way the pages in that hole.
      
      This fix relies on the memblock's service memblock_is_region_memory to
      check if the read page is part or not of the directly accessible memory.
      Signed-off-by: default avatarLaurent Dufour <ldufour@linux.vnet.ibm.com>
      Tested-by: default avatarMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8dd74be4
    • Trond Myklebust's avatar
      SUNRPC: Fix races in xs_nospace() · 5dd00a93
      Trond Myklebust authored
      commit 06ea0bfe upstream.
      
      When a send failure occurs due to the socket being out of buffer space,
      we call xs_nospace() in order to have the RPC task wait until the
      socket has drained enough to make it worth while trying again.
      The current patch fixes a race in which the socket is drained before
      we get round to setting up the machinery in xs_nospace(), and which
      is reported to cause hangs.
      
      Link: http://lkml.kernel.org/r/20140210170315.33dfc621@notabene.brown
      Fixes: a9a6b52e (SUNRPC: Don't start the retransmission timer...)
      Reported-by: default avatarNeil Brown <neilb@suse.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5dd00a93
    • Lars-Peter Clausen's avatar
      ASoC: wm8958-dsp: Fix firmware block loading · 52b418e8
      Lars-Peter Clausen authored
      commit 548da08f upstream.
      
      The codec->control_data contains a pointer to the device's regmap struct. But
      wm8994_bulk_write() expects a pointer to the parent wm8998 device.
      
      The issue was introduced in commit d9a7666f ("ASoC: Remove ASoC-specific
      WM8994 I/O code").
      
      Fixes: d9a7666f ("ASoC: Remove ASoC-specific WM8994 I/O code")
      Signed-off-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: default avatarMark Brown <broonie@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      52b418e8
    • Takashi Iwai's avatar
      ASoC: sta32x: Fix array access overflow · 71a3a9e7
      Takashi Iwai authored
      commit 025c3fa9 upstream.
      
      Preset EQ enum of sta32x codec driver declares too many number of
      items and it may lead to the access over the actual array size.
      
      Use SOC_ENUM_SINGLE_DECL() helper and it's automatically fixed.
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Acked-by: default avatarLiam Girdwood <liam.r.girdwood@linux.intel.com>
      Acked-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: default avatarMark Brown <broonie@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      71a3a9e7
    • Takashi Iwai's avatar
      ASoC: sta32x: Fix wrong enum for limiter2 release rate · 7cc83574
      Takashi Iwai authored
      commit b3619b28 upstream.
      
      There is a typo in the Limiter2 Release Rate control, a wrong enum for
      Limiter1 is assigned.  It must point to Limiter2.
      Spotted by a compile warning:
      
      In file included from sound/soc/codecs/sta32x.c:34:0:
      sound/soc/codecs/sta32x.c:223:29: warning: ‘sta32x_limiter2_release_rate_enum’ defined but not used [-Wunused-variable]
       static SOC_ENUM_SINGLE_DECL(sta32x_limiter2_release_rate_enum,
                                   ^
      include/sound/soc.h:275:18: note: in definition of macro ‘SOC_ENUM_DOUBLE_DECL’
        struct soc_enum name = SOC_ENUM_DOUBLE(xreg, xshift_l, xshift_r, \
                        ^
      sound/soc/codecs/sta32x.c:223:8: note: in expansion of macro ‘SOC_ENUM_SINGLE_DECL’
       static SOC_ENUM_SINGLE_DECL(sta32x_limiter2_release_rate_enum,
              ^
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarMark Brown <broonie@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7cc83574
    • Takashi Iwai's avatar
      ASoC: wm8770: Fix wrong number of enum items · ff177edb
      Takashi Iwai authored
      commit 7a6c0a58 upstream.
      
      wm8770 codec driver defines ain_enum with a wrong number of items.
      
      Use SOC_ENUM_DOUBLE_DECL() macro and it's automatically fixed.
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Acked-by: default avatarLiam Girdwood <liam.r.girdwood@linux.intel.com>
      Acked-by: default avatarCharles Keepax <ckeepax@opensource.wolfsonmicro.com>
      Acked-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: default avatarMark Brown <broonie@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ff177edb
    • Clemens Ladisch's avatar
      ALSA: usb-audio: work around KEF X300A firmware bug · 969d5499
      Clemens Ladisch authored
      commit 624aef49 upstream.
      
      When the driver tries to access Function Unit 10, the KEF X300A
      speakers' firmware apparently locks up, making even PCM streaming
      impossible.  Work around this by ignoring this FU.
      Signed-off-by: default avatarClemens Ladisch <clemens@ladisch.de>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      969d5499
    • Florian Westphal's avatar
      net: ip, ipv6: handle gso skbs in forwarding path · 29a3cd46
      Florian Westphal authored
      commit fe6cc55f upstream.
      
      [ use zero netdev_feature mask to avoid backport of
        netif_skb_dev_features function ]
      
      Marcelo Ricardo Leitner reported problems when the forwarding link path
      has a lower mtu than the incoming one if the inbound interface supports GRO.
      
      Given:
      Host <mtu1500> R1 <mtu1200> R2
      
      Host sends tcp stream which is routed via R1 and R2.  R1 performs GRO.
      
      In this case, the kernel will fail to send ICMP fragmentation needed
      messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu
      checks in forward path. Instead, Linux tries to send out packets exceeding
      the mtu.
      
      When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does
      not fragment the packets when forwarding, and again tries to send out
      packets exceeding R1-R2 link mtu.
      
      This alters the forwarding dstmtu checks to take the individual gso
      segment lengths into account.
      
      For ipv6, we send out pkt too big error for gso if the individual
      segments are too big.
      
      For ipv4, we either send icmp fragmentation needed, or, if the DF bit
      is not set, perform software segmentation and let the output path
      create fragments when the packet is leaving the machine.
      It is not 100% correct as the error message will contain the headers of
      the GRO skb instead of the original/segmented one, but it seems to
      work fine in my (limited) tests.
      
      Eric Dumazet suggested to simply shrink mss via ->gso_size to avoid
      sofware segmentation.
      
      However it turns out that skb_segment() assumes skb nr_frags is related
      to mss size so we would BUG there.  I don't want to mess with it considering
      Herbert and Eric disagree on what the correct behavior should be.
      
      Hannes Frederic Sowa notes that when we would shrink gso_size
      skb_segment would then also need to deal with the case where
      SKB_MAX_FRAGS would be exceeded.
      
      This uses sofware segmentation in the forward path when we hit ipv4
      non-DF packets and the outgoing link mtu is too small.  Its not perfect,
      but given the lack of bug reports wrt. GRO fwd being broken this is a
      rare case anyway.  Also its not like this could not be improved later
      once the dust settles.
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Reported-by: default avatarMarcelo Ricardo Leitner <mleitner@redhat.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      29a3cd46
    • Florian Westphal's avatar
      net: add and use skb_gso_transport_seglen() · 32e66c06
      Florian Westphal authored
      commit de960aa9 upstream.
      
      [ no skb_gso_seglen helper in 3.4, leave tbf alone ]
      
      This moves part of Eric Dumazets skb_gso_seglen helper from tbf sched to
      skbuff core so it may be reused by upcoming ip forwarding path patch.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      32e66c06
    • Daniel Borkmann's avatar
      net: sctp: fix sctp_connectx abi for ia32 emulation/compat mode · 19e48381
      Daniel Borkmann authored
      [ Upstream commit ffd59393 ]
      
      SCTP's sctp_connectx() abi breaks for 64bit kernels compiled with 32bit
      emulation (e.g. ia32 emulation or x86_x32). Due to internal usage of
      'struct sctp_getaddrs_old' which includes a struct sockaddr pointer,
      sizeof(param) check will always fail in kernel as the structure in
      64bit kernel space is 4bytes larger than for user binaries compiled
      in 32bit mode. Thus, applications making use of sctp_connectx() won't
      be able to run under such circumstances.
      
      Introduce a compat interface in the kernel to deal with such
      situations by using a 'struct compat_sctp_getaddrs_old' structure
      where user data is copied into it, and then sucessively transformed
      into a 'struct sctp_getaddrs_old' structure with the help of
      compat_ptr(). That fixes sctp_connectx() abi without any changes
      needed in user space, and lets the SCTP test suite pass when compiled
      in 32bit and run on 64bit kernels.
      
      Fixes: f9c67811 ("sctp: Fix regression introduced by new sctp_connectx api")
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      19e48381