1. 16 Mar, 2016 29 commits
    • Chris Bainbridge's avatar
      mac80211: fix use of uninitialised values in RX aggregation · fac8bf1b
      Chris Bainbridge authored
      commit f39ea269 upstream.
      
      Use kzalloc instead of kmalloc for struct tid_ampdu_rx to
      initialize the "removed" field (all others are initialized
      manually). That fixes:
      
      UBSAN: Undefined behaviour in net/mac80211/rx.c:932:29
      load of value 2 is not a valid value for type '_Bool'
      CPU: 3 PID: 1134 Comm: kworker/u16:7 Not tainted 4.5.0-rc1+ #265
      Workqueue: phy0 rt2x00usb_work_rxdone
       0000000000000004 ffff880254a7ba50 ffffffff8181d866 0000000000000007
       ffff880254a7ba78 ffff880254a7ba68 ffffffff8188422d ffffffff8379b500
       ffff880254a7bab8 ffffffff81884747 0000000000000202 0000000348620032
      Call Trace:
       [<ffffffff8181d866>] dump_stack+0x45/0x5f
       [<ffffffff8188422d>] ubsan_epilogue+0xd/0x40
       [<ffffffff81884747>] __ubsan_handle_load_invalid_value+0x67/0x70
       [<ffffffff82227b4d>] ieee80211_sta_reorder_release.isra.16+0x5ed/0x730
       [<ffffffff8222ca14>] ieee80211_prepare_and_rx_handle+0xd04/0x1c00
       [<ffffffff8222db03>] __ieee80211_rx_handle_packet+0x1f3/0x750
       [<ffffffff8222e4a7>] ieee80211_rx_napi+0x447/0x990
      
      While at it, convert to use sizeof(*tid_agg_rx) instead.
      
      Fixes: 788211d8 ("mac80211: fix RX A-MPDU session reorder timer deletion")
      Signed-off-by: default avatarChris Bainbridge <chris.bainbridge@gmail.com>
      [reword commit message, use sizeof(*tid_agg_rx)]
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fac8bf1b
    • Sven Eckelmann's avatar
      mac80211: minstrel: Change expected throughput unit back to Kbps · 03d76167
      Sven Eckelmann authored
      commit 212c5a5e upstream.
      
      The change from cur_tp to the function
      minstrel_get_tp_avg/minstrel_ht_get_tp_avg changed the unit used for the
      current throughput. For example in minstrel_ht the correct
      conversion between them would be:
      
          mrs->cur_tp / 10 == minstrel_ht_get_tp_avg(..).
      
      This factor 10 must also be included in the calculation of
      minstrel_get_expected_throughput and minstrel_ht_get_expected_throughput to
      return values with the unit [Kbps] instead of [10Kbps]. Otherwise routing
      algorithms like B.A.T.M.A.N. V will make incorrect decision based on these
      values. Its kernel based implementation expects expected_throughput always
      to have the unit [Kbps] and not sometimes [10Kbps] and sometimes [Kbps].
      
      The same requirement has iw or olsrdv2's nl80211 based statistics module
      which retrieve the same data via NL80211_STA_INFO_TX_BITRATE.
      
      Fixes: 6a27b2c4 ("mac80211: restructure per-rate throughput calculation into function")
      Signed-off-by: default avatarSven Eckelmann <sven@open-mesh.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      03d76167
    • Liad Kaufman's avatar
      iwlwifi: mvm: inc pending frames counter also when txing non-sta · 8c904a44
      Liad Kaufman authored
      commit fb896c44 upstream.
      
      Until this patch, when TXing non-sta the pending_frames counter
      wasn't increased, but it WAS decreased in
      iwl_mvm_rx_tx_cmd_single(), what makes it negative in certain
      conditions. This in turn caused much trouble when we need to
      remove the station since we won't be waiting forever until
      pending_frames gets 0. In certain cases, we were exhausting
      the station table even in BSS mode, because we had a lot of
      stale stations.
      
      Increase the counter also in iwl_mvm_tx_skb_non_sta() after a
      successful TX to avoid this outcome.
      Signed-off-by: default avatarLiad Kaufman <liad.kaufman@intel.com>
      Signed-off-by: default avatarEmmanuel Grumbach <emmanuel.grumbach@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8c904a44
    • Maximilain Schneider's avatar
      can: gs_usb: fixed disconnect bug by removing erroneous use of kfree() · 16517aa0
      Maximilain Schneider authored
      commit e9a2d81b upstream.
      
      gs_destroy_candev() erroneously calls kfree() on a struct gs_can *, which is
      allocated through alloc_candev() and should instead be freed using
      free_candev() alone.
      
      The inappropriate use of kfree() causes the kernel to hang when
      gs_destroy_candev() is called.
      
      Only the struct gs_usb * which is allocated through kzalloc() should be freed
      using kfree() when the device is disconnected.
      Signed-off-by: default avatarMaximilian Schneider <max@schneidersoft.net>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      16517aa0
    • Johannes Berg's avatar
      cfg80211/wext: fix message ordering · bfed1f51
      Johannes Berg authored
      commit cb150b9d upstream.
      
      Since cfg80211 frequently takes actions from its netdev notifier
      call, wireless extensions messages could still be ordered badly
      since the wext netdev notifier, since wext is built into the
      kernel, runs before the cfg80211 netdev notifier. For example,
      the following can happen:
      
      5: wlan1: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default
          link/ether 02:00:00:00:01:00 brd ff:ff:ff:ff:ff:ff
      5: wlan1: <BROADCAST,MULTICAST,UP>
          link/ether
      
      when setting the interface down causes the wext message.
      
      To also fix this, export the wireless_nlevent_flush() function
      and also call it from the cfg80211 notifier.
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bfed1f51
    • Johannes Berg's avatar
      wext: fix message delay/ordering · 176e879d
      Johannes Berg authored
      commit 8bf86273 upstream.
      
      Beniamino reported that he was getting an RTM_NEWLINK message for a
      given interface, after the RTM_DELLINK for it. It turns out that the
      message is a wireless extensions message, which was sent because the
      interface had been connected and disconnection while it was deleted
      caused a wext message.
      
      For its netlink messages, wext uses RTM_NEWLINK, but the message is
      without all the regular rtnetlink attributes, so "ip monitor link"
      prints just rudimentary information:
      
      5: wlan1: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default
          link/ether 02:00:00:00:01:00 brd ff:ff:ff:ff:ff:ff
      Deleted 5: wlan1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
          link/ether 02:00:00:00:01:00 brd ff:ff:ff:ff:ff:ff
      5: wlan1: <BROADCAST,MULTICAST,UP>
          link/ether
      (from my hwsim reproduction)
      
      This can cause userspace to get confused since it doesn't expect an
      RTM_NEWLINK message after RTM_DELLINK.
      
      The reason for this is that wext schedules a worker to send out the
      messages, and the scheduling delay can cause the messages to get out
      to userspace in different order.
      
      To fix this, have wext register a netdevice notifier and flush out
      any pending messages when netdevice state changes. This fixes any
      ordering whenever the original message wasn't sent by a notifier
      itself.
      Reported-by: default avatarBeniamino Galvani <bgalvani@redhat.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      176e879d
    • Konstantin Khlebnikov's avatar
      ovl: fix working on distributed fs as lower layer · 455fadec
      Konstantin Khlebnikov authored
      commit b5891cfa upstream.
      
      This adds missing .d_select_inode into alternative dentry_operations.
      Signed-off-by: default avatarKonstantin Khlebnikov <koct9i@gmail.com>
      Fixes: 7c03b5d4 ("ovl: allow distributed fs as lower layer")
      Fixes: 4bacc9c9 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
      Reviewed-by: default avatarNikolay Borisov <kernel@kyup.com>
      Tested-by: default avatarNikolay Borisov <kernel@kyup.com>
      Signed-off-by: default avatarMiklos Szeredi <miklos@szeredi.hu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      455fadec
    • Konstantin Khlebnikov's avatar
      ovl: ignore lower entries when checking purity of non-directory entries · e25cdcef
      Konstantin Khlebnikov authored
      commit 45d11738 upstream.
      
      After rename file dentry still holds reference to lower dentry from
      previous location. This doesn't matter for data access because data comes
      from upper dentry. But this stale lower dentry taints dentry at new
      location and turns it into non-pure upper. Such file leaves visible
      whiteout entry after remove in directory which shouldn't have whiteouts at
      all.
      
      Overlayfs already tracks pureness of file location in oe->opaque.  This
      patch just uses that for detecting actual path type.
      
      Comment from Vivek Goyal's patch:
      
      Here are the details of the problem. Do following.
      
      $ mkdir upper lower work merged upper/dir/
      $ touch lower/test
      $ sudo mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=
      work merged
      $ mv merged/test merged/dir/
      $ rm merged/dir/test
      $ ls -l merged/dir/
      /usr/bin/ls: cannot access merged/dir/test: No such file or directory
      total 0
      c????????? ? ? ? ?            ? test
      
      Basic problem seems to be that once a file has been unlinked, a whiteout
      has been left behind which was not needed and hence it becomes visible.
      
      Whiteout is visible because parent dir is of not type MERGE, hence
      od->is_real is set during ovl_dir_open(). And that means ovl_iterate()
      passes on iterate handling directly to underlying fs. Underlying fs does
      not know/filter whiteouts so it becomes visible to user.
      
      Why did we leave a whiteout to begin with when we should not have.
      ovl_do_remove() checks for OVL_TYPE_PURE_UPPER() and does not leave
      whiteout if file is pure upper. In this case file is not found to be pure
      upper hence whiteout is left.
      
      So why file was not PURE_UPPER in this case? I think because dentry is
      still carrying some leftover state which was valid before rename. For
      example, od->numlower was set to 1 as it was a lower file. After rename,
      this state is not valid anymore as there is no such file in lower.
      Signed-off-by: default avatarKonstantin Khlebnikov <koct9i@gmail.com>
      Reported-by: default avatarViktor Stanchev <me@viktorstanchev.com>
      Suggested-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=109611Acked-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarMiklos Szeredi <miklos@szeredi.hu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e25cdcef
    • Takashi Iwai's avatar
      ASoC: wm8958: Fix enum ctl accesses in a wrong type · 3f2200cf
      Takashi Iwai authored
      commit d0784829 upstream.
      
      "MBC Mode", "VSS Mode", "VSS HPF Mode" and "Enhanced EQ Mode" ctls in
      wm8958 codec driver are enum, while the current driver accesses
      wrongly via value.integer.value[].  They have to be via
      value.enumerated.item[] instead.
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f2200cf
    • Takashi Iwai's avatar
      ASoC: wm8994: Fix enum ctl accesses in a wrong type · 8d8dddbd
      Takashi Iwai authored
      commit 8019c0b3 upstream.
      
      The DRC Mode like "AIF1DRC1 Mode" and EQ Mode like "AIF1.1 EQ Mode" in
      wm8994 codec driver are enum ctls, while the current driver accesses
      wrongly via value.integer.value[].  They have to be via
      value.enumerated.item[] instead.
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d8dddbd
    • Charles Keepax's avatar
      ASoC: samsung: Use IRQ safe spin lock calls · 48034ae6
      Charles Keepax authored
      commit 316fa9e0 upstream.
      
      Lockdep warns of a potential lock inversion, i2s->lock is held numerous
      times whilst we are under the substream lock (snd_pcm_stream_lock). If
      we use the IRQ unsafe spin lock calls, you can also end up locking
      snd_pcm_stream_lock whilst under i2s->lock (if an IRQ happens whilst we
      are holding i2s->lock). This could result in deadlock.
      
      [   18.147001]        CPU0                    CPU1
      [   18.151509]        ----                    ----
      [   18.156022]   lock(&(&pri_dai->spinlock)->rlock);
      [   18.160701]                                local_irq_disable();
      [   18.166622]                                lock(&(&substream->self_group.lock)->rlock);
      [   18.174595]                                lock(&(&pri_dai->spinlock)->rlock);
      [   18.181806]   <Interrupt>
      [   18.184408]     lock(&(&substream->self_group.lock)->rlock);
      [   18.190045]
      [   18.190045]  *** DEADLOCK ***
      
      This patch changes to using the irq safe spinlock calls, to avoid this
      issue.
      
      Fixes: ce8bcdbb ("ASoC: samsung: i2s: Protect more registers with a spinlock")
      Signed-off-by: default avatarCharles Keepax <ckeepax@opensource.wolfsonmicro.com>
      Tested-by: default avatarAnand Moon <linux.amoon@gmail.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      48034ae6
    • Takashi Iwai's avatar
      ASoC: dapm: Fix ctl value accesses in a wrong type · 21a397c8
      Takashi Iwai authored
      commit 741338f9 upstream.
      
      snd_soc_dapm_dai_link_get() and _put() access the associated ctl
      values as value.integer.value[].  However, this is an enum ctl, and it
      has to be accessed via value.enumerated.item[].  The former is long
      while the latter is unsigned int, so they don't align.
      
      Fixes: c6615082 ('ASoC: dapm: add code to configure dai link parameters')
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      21a397c8
    • Al Viro's avatar
      ncpfs: fix a braino in OOM handling in ncp_fill_cache() · 3c92cc68
      Al Viro authored
      commit 803c0012 upstream.
      
      Failing to allocate an inode for child means that cache for *parent* is
      incompletely populated.  So it's parent directory inode ('dir') that
      needs NCPI_DIR_CACHE flag removed, *not* the child inode ('inode', which
      is what we'd failed to allocate in the first place).
      
      Fucked-up-in: commit 5e993e25 ("ncpfs: get rid of d_validate() nonsense")
      Fucked-up-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3c92cc68
    • Al Viro's avatar
      jffs2: reduce the breakage on recovery from halfway failed rename() · a29fe6f3
      Al Viro authored
      commit f9381284 upstream.
      
      d_instantiate(new_dentry, old_inode) is absolutely wrong thing to
      do - it will oops if new_dentry used to be positive, for starters.
      What we need is d_invalidate() the target and be done with that.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a29fe6f3
    • Ludovic Desroches's avatar
      dmaengine: at_xdmac: fix residue computation · 782cfeb2
      Ludovic Desroches authored
      commit 25c5e962 upstream.
      
      When computing the residue we need two pieces of information: the current
      descriptor and the remaining data of the current descriptor. To get
      that information, we need to read consecutively two registers but we
      can't do it in an atomic way. For that reason, we have to check manually
      that current descriptor has not changed.
      Signed-off-by: default avatarLudovic Desroches <ludovic.desroches@atmel.com>
      Suggested-by: default avatarCyrille Pitchen <cyrille.pitchen@atmel.com>
      Reported-by: default avatarDavid Engraf <david.engraf@sysgo.com>
      Tested-by: default avatarDavid Engraf <david.engraf@sysgo.com>
      Fixes: e1f7c9ee ("dmaengine: at_xdmac: creation of the atmel
      eXtended DMA Controller driver")
      Signed-off-by: default avatarVinod Koul <vinod.koul@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      782cfeb2
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Fix check for cpu online when event is disabled · f3c83858
      Steven Rostedt (Red Hat) authored
      commit dc17147d upstream.
      
      Commit f3775549 ("tracepoints: Do not trace when cpu is offline") added
      a check to make sure that tracepoints only get called when the cpu is
      online, as it uses rcu_read_lock_sched() for protection.
      
      Commit 3a630178 ("tracing: generate RCU warnings even when tracepoints
      are disabled") added lockdep checks (including rcu checks) for events that
      are not enabled to catch possible RCU issues that would only be triggered if
      a trace event was enabled. Commit f3775549 only stopped the warnings
      when the trace event was enabled but did not prevent warnings if the trace
      event was called when disabled.
      
      To fix this, the cpu online check is moved to where the condition is added
      to the trace event. This will place the cpu online check in all places that
      it may be used now and in the future.
      
      Fixes: f3775549 ("tracepoints: Do not trace when cpu is offline")
      Fixes: 3a630178 ("tracing: generate RCU warnings even when tracepoints are disabled")
      Reported-by: default avatarSudeep Holla <sudeep.holla@arm.com>
      Tested-by: default avatarSudeep Holla <sudeep.holla@arm.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f3c83858
    • Heiko Carstens's avatar
      s390/dasd: fix diag 0x250 inline assembly · dc5a6075
      Heiko Carstens authored
      commit ce0c12b6 upstream.
      
      git commit 1ec2772e ("s390/diag: add a statistic for diagnose
      calls") added function calls to gather diagnose statistics.
      
      In case of the dasd diag driver the function call was added between a
      register asm statement which initialized register r2 and the inline
      assembly itself.  The function call clobbers the contents of register
      r2 and therefore the diag 0x250 call behaves in a more or less random
      way.
      
      Fix this by extracting the function call into a separate function like
      we do everywhere else.
      
      Fixes: 1ec2772e ("s390/diag: add a statistic for diagnose calls")
      Reported-and-tested-by: default avatarStefan Haberland <sth@linux.vnet.ibm.com>
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dc5a6075
    • Martin Schwidefsky's avatar
      s390/mm: four page table levels vs. fork · 5833fac3
      Martin Schwidefsky authored
      commit 3446c13b upstream.
      
      The fork of a process with four page table levels is broken since
      git commit 6252d702 "[S390] dynamic page tables."
      
      All new mm contexts are created with three page table levels and
      an asce limit of 4TB. If the parent has four levels dup_mmap will
      add vmas to the new context which are outside of the asce limit.
      The subsequent call to copy_page_range will walk the three level
      page table structure of the new process with non-zero pgd and pud
      indexes. This leads to memory clobbers as the pgd_index *and* the
      pud_index is added to the mm->pgd pointer without a pgd_deref
      in between.
      
      The init_new_context() function is selecting the number of page
      table levels for a new context. The function is used by mm_init()
      which in turn is called by dup_mm() and mm_alloc(). These two are
      used by fork() and exec(). The init_new_context() function can
      distinguish the two cases by looking at mm->context.asce_limit,
      for fork() the mm struct has been copied and the number of page
      table levels may not change. For exec() the mm_alloc() function
      set the new mm structure to zero, in this case a three-level page
      table is created as the temporary stack space is located at
      STACK_TOP_MAX = 4TB.
      
      This fixes CVE-2016-2143.
      Reported-by: default avatarMarcin Kościelnicki <koriakin@0x04.net>
      Reviewed-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5833fac3
    • Paolo Bonzini's avatar
      KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 · 1ebd29d6
      Paolo Bonzini authored
      commit 5f0b8199 upstream.
      
      KVM has special logic to handle pages with pte.u=1 and pte.w=0 when
      CR0.WP=1.  These pages' SPTEs flip continuously between two states:
      U=1/W=0 (user and supervisor reads allowed, supervisor writes not allowed)
      and U=0/W=1 (supervisor reads and writes allowed, user writes not allowed).
      
      When SMEP is in effect, however, U=0 will enable kernel execution of
      this page.  To avoid this, KVM also sets NX=1 in the shadow PTE together
      with U=0, making the two states U=1/W=0/NX=gpte.NX and U=0/W=1/NX=1.
      When guest EFER has the NX bit cleared, the reserved bit check thinks
      that the latter state is invalid; teach it that the smep_andnot_wp case
      will also use the NX bit of SPTEs.
      Reviewed-by: default avatarXiao Guangrong <guangrong.xiao@linux.inel.com>
      Fixes: c258b62bSigned-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1ebd29d6
    • Paolo Bonzini's avatar
      KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo · 68ed2ca1
      Paolo Bonzini authored
      commit 844a5fe2 upstream.
      
      Yes, all of these are needed. :) This is admittedly a bit odd, but
      kvm-unit-tests access.flat tests this if you run it with "-cpu host"
      and of course ept=0.
      
      KVM runs the guest with CR0.WP=1, so it must handle supervisor writes
      specially when pte.u=1/pte.w=0/CR0.WP=0.  Such writes cause a fault
      when U=1 and W=0 in the SPTE, but they must succeed because CR0.WP=0.
      When KVM gets the fault, it sets U=0 and W=1 in the shadow PTE and
      restarts execution.  This will still cause a user write to fault, while
      supervisor writes will succeed.  User reads will fault spuriously now,
      and KVM will then flip U and W again in the SPTE (U=1, W=0).  User reads
      will be enabled and supervisor writes disabled, going back to the
      originary situation where supervisor writes fault spuriously.
      
      When SMEP is in effect, however, U=0 will enable kernel execution of
      this page.  To avoid this, KVM also sets NX=1 in the shadow PTE together
      with U=0.  If the guest has not enabled NX, the result is a continuous
      stream of page faults due to the NX bit being reserved.
      
      The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER
      switch.  (All machines with SMEP have the CPU_LOAD_IA32_EFER vm-entry
      control, so they do not use user-return notifiers for EFER---if they did,
      EFER.NX would be forced to the same value as the host).
      
      There is another bug in the reserved bit check, which I've split to a
      separate patch for easier application to stable kernels.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Reviewed-by: default avatarXiao Guangrong <guangrong.xiao@linux.intel.com>
      Fixes: f6577a5fSigned-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      68ed2ca1
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Sanitize special-purpose register values on guest exit · 1c463a39
      Paul Mackerras authored
      commit ccec4456 upstream.
      
      Thomas Huth discovered that a guest could cause a hard hang of a
      host CPU by setting the Instruction Authority Mask Register (IAMR)
      to a suitable value.  It turns out that this is because when the
      code was added to context-switch the new special-purpose registers
      (SPRs) that were added in POWER8, we forgot to add code to ensure
      that they were restored to a sane value on guest exit.
      
      This adds code to set those registers where a bad value could
      compromise the execution of the host kernel to a suitable neutral
      value on guest exit.
      
      Fixes: b005255eReported-by: default avatarThomas Huth <thuth@redhat.com>
      Reviewed-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1c463a39
    • David Hildenbrand's avatar
      KVM: s390: correct fprs on SIGP (STOP AND) STORE STATUS · 78939530
      David Hildenbrand authored
      commit 9522b37f upstream.
      
      With MACHINE_HAS_VX, we convert the floating point registers from the
      vector registeres when storing the status. For other VCPUs, these are
      stored to vcpu->run->s.regs.vrs, but we are using current->thread.fpu.vxrs,
      which resolves to the currently loaded VCPU.
      
      So kvm_s390_store_status_unloaded() currently writes the wrong floating
      point registers (converted from the vector registers) when called from
      another VCPU on a z13.
      
      This is only the case for old user space not handling SIGP STORE STATUS and
      SIGP STOP AND STORE STATUS, but relying on the kernel implementation. All
      other calls come from the loaded VCPU via kvm_s390_store_status().
      
      Fixes: 9abc2a08 (KVM: s390: fix memory overwrites when vx is disabled)
      Reviewed-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarDavid Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      78939530
    • Radim Krčmář's avatar
      KVM: VMX: disable PEBS before a guest entry · 0bbe5fa4
      Radim Krčmář authored
      commit 7099e2e1 upstream.
      
      Linux guests on Haswell (and also SandyBridge and Broadwell, at least)
      would crash if you decided to run a host command that uses PEBS, like
        perf record -e 'cpu/mem-stores/pp' -a
      
      This happens because KVM is using VMX MSR switching to disable PEBS, but
      SDM [2015-12] 18.4.4.4 Re-configuring PEBS Facilities explains why it
      isn't safe:
        When software needs to reconfigure PEBS facilities, it should allow a
        quiescent period between stopping the prior event counting and setting
        up a new PEBS event. The quiescent period is to allow any latent
        residual PEBS records to complete its capture at their previously
        specified buffer address (provided by IA32_DS_AREA).
      
      There might not be a quiescent period after the MSR switch, so a CPU
      ends up using host's MSR_IA32_DS_AREA to access an area in guest's
      memory.  (Or MSR switching is just buggy on some models.)
      
      The guest can learn something about the host this way:
      If the guest doesn't map address pointed by MSR_IA32_DS_AREA, it results
      in #PF where we leak host's MSR_IA32_DS_AREA through CR2.
      
      After that, a malicious guest can map and configure memory where
      MSR_IA32_DS_AREA is pointing and can therefore get an output from
      host's tracing.
      
      This is not a critical leak as the host must initiate with PEBS tracing
      and I have not been able to get a record from more than one instruction
      before vmentry in vmx_vcpu_run() (that place has most registers already
      overwritten with guest's).
      
      We could disable PEBS just few instructions before vmentry, but
      disabling it earlier shouldn't affect host tracing too much.
      We also don't need to switch MSR_IA32_PEBS_ENABLE on VMENTRY, but that
      optimization isn't worth its code, IMO.
      
      (If you are implementing PEBS for guests, be sure to handle the case
       where both host and guest enable PEBS, because this patch doesn't.)
      
      Fixes: 26a4f3c0 ("perf/x86: disable PEBS on a guest entry.")
      Reported-by: default avatarJiří Olša <jolsa@redhat.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0bbe5fa4
    • David Matlack's avatar
      kvm: cap halt polling at exactly halt_poll_ns · c9e1bbef
      David Matlack authored
      commit 313f636d upstream.
      
      When growing halt-polling, there is no check that the poll time exceeds
      the limit. It's possible for vcpu->halt_poll_ns grow once past
      halt_poll_ns, and stay there until a halt which takes longer than
      vcpu->halt_poll_ns. For example, booting a Linux guest with
      halt_poll_ns=11000:
      
       ... kvm:kvm_halt_poll_ns: vcpu 0: halt_poll_ns 0 (shrink 10000)
       ... kvm:kvm_halt_poll_ns: vcpu 0: halt_poll_ns 10000 (grow 0)
       ... kvm:kvm_halt_poll_ns: vcpu 0: halt_poll_ns 20000 (grow 10000)
      Signed-off-by: default avatarDavid Matlack <dmatlack@google.com>
      Fixes: aca6ff29Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c9e1bbef
    • Krzysztof Hałasa's avatar
      PCI: Allow a NULL "parent" pointer in pci_bus_assign_domain_nr() · 431c9f01
      Krzysztof Hałasa authored
      commit 54c6e2dd upstream.
      
      pci_create_root_bus() passes a "parent" pointer to
      pci_bus_assign_domain_nr().  When CONFIG_PCI_DOMAINS_GENERIC is defined,
      pci_bus_assign_domain_nr() dereferences that pointer.  Many callers of
      pci_create_root_bus() supply a NULL "parent" pointer, which leads to a NULL
      pointer dereference error.
      
      7c674700 ("PCI: Move domain assignment from arm64 to generic code")
      moved the "parent" dereference from arm64 to generic code.  Only arm64 used
      that code (because only arm64 defined CONFIG_PCI_DOMAINS_GENERIC), and it
      always supplied a valid "parent" pointer.  Other arches supplied NULL
      "parent" pointers but didn't defined CONFIG_PCI_DOMAINS_GENERIC, so they
      used a no-op version of pci_bus_assign_domain_nr().
      
      8c7d1474 ("ARM/PCI: Move to generic PCI domains") defined
      CONFIG_PCI_DOMAINS_GENERIC on ARM, and many ARM platforms use
      pci_common_init(), which supplies a NULL "parent" pointer.
      These platforms (cns3xxx, dove, footbridge, iop13xx, etc.) crash
      with a NULL pointer dereference like this while probing PCI:
      
        Unable to handle kernel NULL pointer dereference at virtual address 000000a4
        PC is at pci_bus_assign_domain_nr+0x10/0x84
        LR is at pci_create_root_bus+0x48/0x2e4
        Kernel panic - not syncing: Attempted to kill init!
      
      [bhelgaas: changelog, add "Reported:" and "Fixes:" tags]
      Reported: http://forum.doozan.com/read.php?2,17868,22070,quote=1
      Fixes: 8c7d1474 ("ARM/PCI: Move to generic PCI domains")
      Fixes: 7c674700 ("PCI: Move domain assignment from arm64 to generic code")
      Signed-off-by: default avatarKrzysztof Hałasa <khalasa@piap.pl>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Acked-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      431c9f01
    • Lokesh Vutla's avatar
      ARM: OMAP2+: hwmod: Introduce ti,no-idle dt property · 6327a31a
      Lokesh Vutla authored
      commit 2e18f5a1 upstream.
      
      Introduce a dt property, ti,no-idle, that prevents an IP to idle at any
      point. This is to handle Errata i877, which tells that GMAC clocks
      cannot be disabled.
      Acked-by: default avatarRoger Quadros <rogerq@ti.com>
      Tested-by: default avatarMugunthan V N <mugunthanvnm@ti.com>
      Signed-off-by: default avatarLokesh Vutla <lokeshvutla@ti.com>
      Signed-off-by: default avatarSekhar Nori <nsekhar@ti.com>
      Signed-off-by: default avatarDave Gerlach <d-gerlach@ti.com>
      Acked-by: default avatarRob Herring <robh@kernel.org>
      Signed-off-by: default avatarPaul Walmsley <paul@pwsan.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6327a31a
    • Mugunthan V N's avatar
      ARM: dts: dra7: do not gate cpsw clock due to errata i877 · 958df498
      Mugunthan V N authored
      commit 0f514e69 upstream.
      
      Errata id: i877
      
      Description:
      ------------
      The RGMII 1000 Mbps Transmit timing is based on the output clock
      (rgmiin_txc) being driven relative to the rising edge of an internal
      clock and the output control/data (rgmiin_txctl/txd) being driven relative
      to the falling edge of an internal clock source. If the internal clock
      source is allowed to be static low (i.e., disabled) for an extended period
      of time then when the clock is actually enabled the timing delta between
      the rising edge and falling edge can change over the lifetime of the
      device. This can result in the device switching characteristics degrading
      over time, and eventually failing to meet the Data Manual Delay Time/Skew
      specs.
      To maintain RGMII 1000 Mbps IO Timings, SW should minimize the
      duration that the Ethernet internal clock source is disabled. Note that
      the device reset state for the Ethernet clock is "disabled".
      Other RGMII modes (10 Mbps, 100Mbps) are not affected
      
      Workaround:
      -----------
      If the SoC Ethernet interface(s) are used in RGMII mode at 1000 Mbps,
      SW should minimize the time the Ethernet internal clock source is disabled
      to a maximum of 200 hours in a device life cycle. This is done by enabling
      the clock as early as possible in IPL (QNX) or SPL/u-boot (Linux/Android)
      by setting the register CM_GMAC_CLKSTCTRL[1:0]CLKTRCTRL = 0x2:SW_WKUP.
      
      So, do not allow to gate the cpsw clocks using ti,no-idle property in
      cpsw node assuming 1000 Mbps is being used all the time. If someone does
      not need 1000 Mbps and wants to gate clocks to cpsw, this property needs
      to be deleted in their respective board files.
      Signed-off-by: default avatarMugunthan V N <mugunthanvnm@ti.com>
      Signed-off-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: default avatarLokesh Vutla <lokeshvutla@ti.com>
      Signed-off-by: default avatarPaul Walmsley <paul@pwsan.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      958df498
    • Thomas Petazzoni's avatar
      ARM: mvebu: fix overlap of Crypto SRAM with PCIe memory window · 744744e2
      Thomas Petazzoni authored
      commit d7d5a43c upstream.
      
      When the Crypto SRAM mappings were added to the Device Tree files
      describing the Armada XP boards in commit c466d997 ("ARM: mvebu:
      define crypto SRAM ranges for all armada-xp boards"), the fact that
      those mappings were overlaping with the PCIe memory aperture was
      overlooked. Due to this, we currently have for all Armada XP platforms
      a situation that looks like this:
      
      Memory mapping on Armada XP boards with internal registers at
      0xf1000000:
      
       - 0x00000000 -> 0xf0000000	3.75G 	RAM
       - 0xf0000000 -> 0xf1000000	16M	NOR flashes (AXP GP / AXP DB)
       - 0xf1000000 -> 0xf1100000	1M	internal registers
       - 0xf8000000 -> 0xffe0000	126M	PCIe memory aperture
       - 0xf8100000 -> 0xf8110000	64KB	Crypto SRAM #0	=> OVERLAPS WITH PCIE !
       - 0xf8110000 -> 0xf8120000	64KB	Crypto SRAM #1	=> OVERLAPS WITH PCIE !
       - 0xffe00000 -> 0xfff00000	1M	PCIe I/O aperture
       - 0xfff0000  -> 0xffffffff	1M	BootROM
      
      The overlap means that when PCIe devices are added, depending on their
      memory window needs, they might or might not be mapped into the
      physical address space. Indeed, they will not be mapped if the area
      allocated in the PCIe memory aperture by the PCI core overlaps with
      one of the Crypto SRAM. Typically, a Intel IGB PCIe NIC that needs 8MB
      of PCIe memory will see its PCIe memory window allocated from
      0xf80000000 for 8MB, which overlaps with the Crypto SRAM windows. Due
      to this, the PCIe window is not created, and any attempt to access the
      PCIe window makes the kernel explode:
      
      [    3.302213] igb: Copyright (c) 2007-2014 Intel Corporation.
      [    3.307841] pci 0000:00:09.0: enabling device (0140 -> 0143)
      [    3.313539] mvebu_mbus: cannot add window '4:f8', conflicts with another window
      [    3.320870] mvebu-pcie soc:pcie-controller: Could not create MBus window at [mem 0xf8000000-0xf87fffff]: -22
      [    3.330811] Unhandled fault: external abort on non-linefetch (0x1008) at 0xf08c0018
      
      This problem does not occur on Armada 370 boards, because we use the
      following memory mapping (for boards that have internal registers at
      0xf1000000):
      
       - 0x00000000 -> 0xf0000000	3.75G 	RAM
       - 0xf0000000 -> 0xf1000000	16M	NOR flashes (AXP GP / AXP DB)
       - 0xf1000000 -> 0xf1100000	1M	internal registers
       - 0xf1100000 -> 0xf1110000	64KB	Crypto SRAM #0 => OK !
       - 0xf8000000 -> 0xffe0000	126M	PCIe memory
       - 0xffe00000 -> 0xfff00000	1M	PCIe I/O
       - 0xfff0000  -> 0xffffffff	1M	BootROM
      
      Obviously, the solution is to align the location of the Crypto SRAM
      mappings of Armada XP to be similar with the ones on Armada 370, i.e
      have them between the "internal registers" area and the beginning of
      the PCIe aperture.
      
      However, we have a special case with the OpenBlocks AX3-4 platform,
      which has a 128 MB NOR flash. Currently, this NOR flash is mapped from
      0xf0000000 to 0xf8000000. This is possible because on OpenBlocks
      AX3-4, the internal registers are not at 0xf1000000. And this explains
      why the Crypto SRAM mappings were not configured at the same place on
      Armada XP.
      
      Hence, the solution is two-fold:
      
       (1) Move the NOR flash mapping on Armada XP OpenBlocks AX3-4 from
           0xe8000000 to 0xf0000000. This frees the 0xf0000000 ->
           0xf80000000 space.
      
       (2) Move the Crypto SRAM mappings on Armada XP to be similar to
           Armada 370 (except of course that Armada XP has two Crypto SRAM
           and not one).
      
      After this patch, the memory mapping on Armada XP boards with
      registers at 0xf1 is:
      
       - 0x00000000 -> 0xf0000000	3.75G 	RAM
       - 0xf0000000 -> 0xf1000000	16M	NOR flashes (AXP GP / AXP DB)
       - 0xf1000000 -> 0xf1100000	1M	internal registers
       - 0xf1100000 -> 0xf1110000	64KB	Crypto SRAM #0
       - 0xf1110000 -> 0xf1120000	64KB	Crypto SRAM #1
       - 0xf8000000 -> 0xffe0000	126M	PCIe memory
       - 0xffe00000 -> 0xfff00000	1M	PCIe I/O
       - 0xfff0000  -> 0xffffffff	1M	BootROM
      
      And the memory mapping for the special case of the OpenBlocks AX3-4
      (internal registers at 0xd0000000, NOR of 128 MB):
      
       - 0x00000000 -> 0xc0000000	3G 	RAM
       - 0xd0000000 -> 0xd1000000	1M	internal registers
       - 0xe800000  -> 0xf0000000	128M	NOR flash
       - 0xf1100000 -> 0xf1110000	64KB	Crypto SRAM #0
       - 0xf1110000 -> 0xf1120000	64KB	Crypto SRAM #1
       - 0xf8000000 -> 0xffe0000	126M	PCIe memory
       - 0xffe00000 -> 0xfff00000	1M	PCIe I/O
       - 0xfff0000  -> 0xffffffff	1M	BootROM
      
      Fixes: c466d997 ("ARM: mvebu: define crypto SRAM ranges for all armada-xp boards")
      Reported-by: default avatarPhil Sutter <phil@nwl.cc>
      Cc: Phil Sutter <phil@nwl.cc>
      Signed-off-by: default avatarThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Acked-by: default avatarGregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: default avatarOlof Johansson <olof@lixom.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      744744e2
    • Ard Biesheuvel's avatar
      arm64: account for sparsemem section alignment when choosing vmemmap offset · 97142f30
      Ard Biesheuvel authored
      commit 36e5cd6b upstream.
      
      Commit dfd55ad8 ("arm64: vmemmap: use virtual projection of linear
      region") fixed an issue where the struct page array would overflow into the
      adjacent virtual memory region if system RAM was placed so high up in
      physical memory that its addresses were not representable in the build time
      configured virtual address size.
      
      However, the fix failed to take into account that the vmemmap region needs
      to be relatively aligned with respect to the sparsemem section size, so that
      a sequence of page structs corresponding with a sparsemem section in the
      linear region appears naturally aligned in the vmemmap region.
      
      So round up vmemmap to sparsemem section size. Since this essentially moves
      the projection of the linear region up in memory, also revert the reduction
      of the size of the vmemmap region.
      
      Fixes: dfd55ad8 ("arm64: vmemmap: use virtual projection of linear region")
      Tested-by: default avatarMark Langsdorf <mlangsdo@redhat.com>
      Tested-by: default avatarDavid Daney <david.daney@cavium.com>
      Tested-by: default avatarRobert Richter <rrichter@cavium.com>
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      97142f30
  2. 09 Mar, 2016 11 commits