1. 14 Jan, 2015 4 commits
  2. 13 Jan, 2015 13 commits
    • x86_64, vdso: Fix the vdso address randomization algorithm · ff845a15
      Andy Lutomirski authored
      commit 394f56fe upstream.
      
      The theory behind vdso randomization is that it's mapped at a random
      offset above the top of the stack.  To avoid wasting a page of
      memory for an extra page table, the vdso isn't supposed to extend
      past the lowest PMD into which it can fit.  Other than that, the
      address should be a uniformly distributed address that meets all of
      the alignment requirements.
      
      The current algorithm is buggy: the vdso has about a 50% probability
      of being at the very end of a PMD.  The current algorithm also has a
      decent chance of failing outright due to incorrect handling of the
      case where the top of the stack is near the top of its PMD.
      
      This fixes the implementation.  The paxtest estimate of vdso
      "randomisation" improves from 11 bits to 18 bits.  (Disclaimer: I
      don't know what the paxtest code is actually calculating.)
      
      It's worth noting that this algorithm is inherently biased: the vdso
      is more likely to end up near the end of its PMD than near the
      beginning.  Ideally we would either nix the PMD sharing requirement
      or jointly randomize the vdso and the stack to reduce the bias.
      
      In the meantime, this is a considerable improvement with basically
      no risk of compatibility issues, since the allowed outputs of the
      algorithm are unchanged.
      
      As an easy test, doing this:
      
      for i in `seq 10000`
        do grep -P vdso /proc/self/maps |cut -d- -f1
      done |sort |uniq -d
      
      used to produce lots of output (1445 lines on my most recent run).
      A tiny subset looks like this:
      
      7fffdfffe000
      7fffe01fe000
      7fffe05fe000
      7fffe07fe000
      7fffe09fe000
      7fffe0bfe000
      7fffe0dfe000
      
      Note the suspicious fe000 endings.  With the fix, I get a much more
      palatable 76 repeated addresses.
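
The placement rule described above (a page-aligned address at or above the stack top, with the vdso ending no later than the lowest PMD it fits in) can be sketched in user-space C. This is a minimal sketch, not the kernel's code: the name `sketch_vdso_addr`, the `SK_*` constants (4 KiB pages, 2 MiB PMDs), and the caller-supplied random value `rnd` are assumptions made for illustration, and the clamp against the top of the address space is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry for illustration: 4 KiB pages, 2 MiB PMDs. */
#define SK_PAGE_SHIFT 12
#define SK_PAGE_SIZE  (UINT64_C(1) << SK_PAGE_SHIFT)
#define SK_PMD_SIZE   (UINT64_C(1) << 21)
#define SK_PMD_MASK   (~(SK_PMD_SIZE - 1))

/*
 * Pick a page-aligned address in [start, end], where end is the highest
 * address at which a mapping of `len` bytes still ends inside the lowest
 * PMD that can hold it. `rnd` is a random number supplied by the caller.
 */
uint64_t sketch_vdso_addr(uint64_t start, uint64_t len, uint32_t rnd)
{
	uint64_t end, slots;

	assert(len > 0 && len <= SK_PMD_SIZE);

	/* The stack top may be unaligned, so round it up to a page. */
	start = (start + SK_PAGE_SIZE - 1) & ~(SK_PAGE_SIZE - 1);

	/* Round the lowest possible end address up to a PMD boundary. */
	end = (start + len + SK_PMD_SIZE - 1) & SK_PMD_MASK;
	end -= len;	/* highest allowed start address */

	if (end <= start)
		return start;

	/* Every page-aligned slot in [start, end] is equally likely. */
	slots = ((end - start) >> SK_PAGE_SHIFT) + 1;
	return start + ((rnd % slots) << SK_PAGE_SHIFT);
}
```

With a PMD-aligned start of 0x7fffe0000000 and a two-page mapping there are 511 candidate slots, and the last one is 0x7fffe01fe000, the kind of address that dominated the output before the fix.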

      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • isofs: Fix unchecked printing of ER records · 24c7fcc3
      Jan Kara authored
      commit 4e202462 upstream.
      
      We didn't check the length of Rock Ridge ER records before printing them.
      Thus a corrupted isofs image can cause us to access and print memory
      beyond the end of the buffer, with obvious consequences.

      Reported-and-tested-by: Carl Henrik Lunde <chlunde@ping.uio.no>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • KEYS: close race between key lookup and freeing · 55036ae4
      Sasha Levin authored
      commit a3a87844 upstream.
      
      When a key is being garbage collected, its key->user would get put before
      the ->destroy() callback is called, where the key is removed from its
      respective tracking structures.

      This leaves a key hanging in a semi-invalid state which leaves a window open
      for a different task to try to access key->user. An example is
      find_keyring_by_name() which would dereference key->user for a key that is
      in the process of being garbage collected (where key->user was freed but
      ->destroy() wasn't called yet - so it's still present in the linked list).
      
      This would cause either a panic, or corrupt memory.
      
      Fixes CVE-2014-9529.

      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • batman-adv: Calculate extra tail size based on queued fragments · e6e75eaa
      Sven Eckelmann authored
      commit 5b6698b0 upstream.
      
      The fragmentation code was replaced in 610bfc6b
      ("batman-adv: Receive fragmented packets and merge"). The new code provided a
      mostly unused parameter skb for the merging function. It is used inside the
      function to calculate the additionally needed skb tailroom. But instead of
      increasing its own tailroom, it is only increasing the tailroom of the first
      queued skb. This is not correct in some situations because the first queued
      entry can be a different one than the parameter.
      
      An observed problem was:
      
      1. packet with size 104, total_size 1464, fragno 1 was received
         - packet is queued
      2. packet with size 1400, total_size 1464, fragno 0 was received
         - packet is queued at the end of the list
      3. enough data was received and can be given to the merge function
         (1464 == (1400 - 20) + (104 - 20))
         - merge function gets the 1400-byte packet as skb argument
      4. merge function gets first entry in queue (104 byte)
         - stored as skb_out
      5. merge function calculates the required extra tail as total_size - skb->len
         - pskb_expand_head tail of skb_out with 64 bytes
      6. merge function tries to squeeze the extra 1380 bytes from the second queued
         skb (1400 byte aka skb parameter) in the 64 extra tail bytes of skb_out
      
      Instead calculate the extra required tail bytes for skb_out also using skb_out
      instead of using the parameter skb. The skb parameter is only used to get the
      total_size from the last received packet. This is also the total_size used to
      decide that all fragments were received.
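
The arithmetic behind the observed problem can be checked with a tiny helper. This is only a sketch of the subtraction in step 5, not the batman-adv merge code; `sketch_extra_tail` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Extra tail space needed so that a buffer currently holding `len` bytes
 * can grow to `total_size` bytes.
 */
size_t sketch_extra_tail(size_t total_size, size_t len)
{
	assert(len <= total_size);
	return total_size - len;
}
```

Sizing from the 1400-byte skb parameter yields 64 bytes, while sizing from the 104-byte skb_out that is actually expanded yields 1360 bytes, which is the difference the commit message describes.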

      Reported-by: Philipp Psurek <philipp.psurek@gmail.com>
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Acked-by: Martin Hundebøll <martin@hundeboll.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • isofs: Fix infinite looping over CE entries · f5034d91
      Jan Kara authored
      commit f54e18f1 upstream.
      
      Rock Ridge extensions define so-called Continuation Entries (CE), which
      specify where further space with Rock Ridge data is located. A corrupted
      isofs image can contain an arbitrarily long chain of these, including one
      containing a loop, thus causing the kernel to end up in an infinite loop
      when traversing these entries.
      
      Limit the traversal to 32 entries which should be more than enough space
      to store all the Rock Ridge data.
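
A bounded traversal of this kind can be sketched as follows. It is a user-space illustration under assumptions, not the isofs code: the chain is modeled as an array in which `next[i]` is the index of the following continuation area (or -1 at the end), and `sketch_walk_ce_chain` and `SK_MAX_CE_ENTRIES` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_CE_ENTRIES 32	/* cap taken from the commit message */

/*
 * Follow a chain of continuation areas starting at index `first`.
 * Returns the number of areas visited, or -1 if the chain leaves the
 * table or is longer than the cap (which also catches loops).
 */
int sketch_walk_ce_chain(const int *next, size_t count, int first)
{
	int visited = 0;
	int cur = first;

	assert(next != NULL);

	while (cur >= 0) {
		if ((size_t)cur >= count)
			return -1;	/* link points outside the table */
		if (++visited > SK_MAX_CE_ENTRIES)
			return -1;	/* too long, or a loop */
		cur = next[cur];
	}
	return visited;
}
```

A well-formed chain of three areas returns 3, while a two-entry table whose entries point at each other hits the cap and returns -1 instead of spinning forever.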

      Reported-by: P J P <ppandit@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • x86_64, switch_to(): Load TLS descriptors before switching DS and ES · 39f1a2d2
      Andy Lutomirski authored
      commit f647d7c1 upstream.
      
      Otherwise, if buggy user code points DS or ES into the TLS
      array, they would be corrupted after a context switch.
      
      This also significantly improves the comments and documents some
      gotchas in the code.
      
      Before this patch, both tests below failed.  With this
      patch, the es test passes, although the gsbase test still fails.
      
       ----- begin es test -----
      
      /*
       * Copyright (c) 2014 Andy Lutomirski
       * GPL v2
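       */

      /* Headers needed to build this test */
      #define _GNU_SOURCE
      #include <err.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      #include <asm/ldt.h>
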
      static unsigned short GDT3(int idx)
      {
      	return (idx << 3) | 3;
      }
      
      static int create_tls(int idx, unsigned int base)
      {
      	struct user_desc desc = {
      		.entry_number    = idx,
      		.base_addr       = base,
      		.limit           = 0xfffff,
      		.seg_32bit       = 1,
      		.contents        = 0, /* Data, grow-up */
      		.read_exec_only  = 0,
      		.limit_in_pages  = 1,
      		.seg_not_present = 0,
      		.useable         = 0,
      	};
      
      	if (syscall(SYS_set_thread_area, &desc) != 0)
      		err(1, "set_thread_area");
      
      	return desc.entry_number;
      }
      
      int main()
      {
      	int idx = create_tls(-1, 0);
      	printf("Allocated GDT index %d\n", idx);
      
      	unsigned short orig_es;
      	asm volatile ("mov %%es,%0" : "=rm" (orig_es));
      
      	int errors = 0;
      	int total = 1000;
      	for (int i = 0; i < total; i++) {
      		asm volatile ("mov %0,%%es" : : "rm" (GDT3(idx)));
      		usleep(100);
      
      		unsigned short es;
      		asm volatile ("mov %%es,%0" : "=rm" (es));
      		asm volatile ("mov %0,%%es" : : "rm" (orig_es));
      		if (es != GDT3(idx)) {
      			if (errors == 0)
      				printf("[FAIL]\tES changed from 0x%hx to 0x%hx\n",
      				       GDT3(idx), es);
      			errors++;
      		}
      	}
      
      	if (errors) {
      		printf("[FAIL]\tES was corrupted %d/%d times\n", errors, total);
      		return 1;
      	} else {
      		printf("[OK]\tES was preserved\n");
      		return 0;
      	}
      }
      
       ----- end es test -----
      
       ----- begin gsbase test -----
      
      /*
       * gsbase.c, a gsbase test
       * Copyright (c) 2014 Andy Lutomirski
       * GPL v2
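       */

      /* Headers needed to build this test */
      #define _GNU_SOURCE
      #include <err.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/mman.h>
      #include <sys/syscall.h>
      #include <asm/prctl.h>
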
      static unsigned char *testptr, *testptr2;
      
      static unsigned char read_gs_testvals(void)
      {
      	unsigned char ret;
      	asm volatile ("movb %%gs:%1, %0" : "=r" (ret) : "m" (*testptr));
      	return ret;
      }
      
      int main()
      {
      	int errors = 0;
      
      	testptr = mmap((void *)0x200000000UL, 1, PROT_READ | PROT_WRITE,
      		       MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
      	if (testptr == MAP_FAILED)
      		err(1, "mmap");
      
      	testptr2 = mmap((void *)0x300000000UL, 1, PROT_READ | PROT_WRITE,
      		       MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
      	if (testptr2 == MAP_FAILED)
      		err(1, "mmap");
      
      	*testptr = 0;
      	*testptr2 = 1;
      
      	if (syscall(SYS_arch_prctl, ARCH_SET_GS,
      		    (unsigned long)testptr2 - (unsigned long)testptr) != 0)
      		err(1, "ARCH_SET_GS");
      
      	usleep(100);
      
      	if (read_gs_testvals() == 1) {
      		printf("[OK]\tARCH_SET_GS worked\n");
      	} else {
      		printf("[FAIL]\tARCH_SET_GS failed\n");
      		errors++;
      	}
      
      	asm volatile ("mov %0,%%gs" : : "r" (0));
      
      	if (read_gs_testvals() == 0) {
      		printf("[OK]\tWriting 0 to gs worked\n");
      	} else {
      		printf("[FAIL]\tWriting 0 to gs failed\n");
      		errors++;
      	}
      
      	usleep(100);
      
      	if (read_gs_testvals() == 0) {
      		printf("[OK]\tgsbase is still zero\n");
      	} else {
      		printf("[FAIL]\tgsbase was corrupted\n");
      		errors++;
      	}
      
      	return errors == 0 ? 0 : 1;
      }
      
       ----- end gsbase test -----

      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/509d27c9fec78217691c3dad91cec87e1006b34a.1418075657.git.luto@amacapital.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • userns: Only allow the creator of the userns unprivileged mappings · be2aec30
      Eric W. Biederman authored
      commit f95d7918 upstream.
      
      If you did not create the user namespace and are allowed
      to write to uid_map or gid_map, you should already have the necessary
      privilege in the parent user namespace to establish any mapping
      you want, so this will not affect userspace in practice.
      
      Limiting unprivileged uid mapping establishment to the creator of the
      user namespace makes it easier to verify all credentials obtained with
      the uid mapping can be obtained without the uid mapping without
      privilege.
      
      Limiting unprivileged gid mapping establishment (which is temporarily
      absent) to the creator of the user namespace also ensures that the
      combination of uid and gid can already be obtained without privilege.
      
      This is part of the fix for CVE-2014-8989.

      Reviewed-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • userns: Document what the invariant required for safe unprivileged mappings. · be9edd8b
      Eric W. Biederman authored
      commit 0542f17b upstream.
      
      The rule is simple.  Don't allow anything that wouldn't be allowed
      without unprivileged mappings.
      
      It was previously overlooked that establishing gid mappings would
      allow dropping groups and potentially gaining permission to files and
      directories that had lesser permissions for a specific group than for
      all other users.
      
      This is the rule needed to fix CVE-2014-8989 and prevent any other
      security issues with new_idmap_permitted.
      
      The reason for this rule is that the unix permission model is old and
      there are programs out there somewhere that take advantage of every
      little corner of it.  So allowing a uid or gid mapping to be
      established without privilege that would allow anything that would not
      be allowed without that mapping will result in expectations from some
      code somewhere being violated.  Violated expectations about the
      behavior of the OS is a long way of saying a security issue.

      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • userns: Check euid no fsuid when establishing an unprivileged uid mapping · 65007036
      Eric W. Biederman authored
      commit 80dd00a2 upstream.
      
      setresuid allows the euid to be set to any of uid, euid, suid, and
      fsuid.  Therefore it is safe to allow an unprivileged user to map
      their euid and use the CAP_SETUID privilege with exactly that uid,
      as no new credentials can be obtained.
      
      I cannot find a combination of existing system calls that allows setting
      uid, euid, suid, and fsuid from the fsuid, making the previous use
      of fsuid for allowing unprivileged mappings a bug.
      
      This is part of a fix for CVE-2014-8989.

      Reviewed-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • x86/tls: Validate TLS entries to protect espfix · eb8b9652
      Andy Lutomirski authored
      commit 41bdc785 upstream.
      
      Installing a 16-bit RW data segment into the GDT defeats espfix.
      AFAICT this will not affect glibc, Wine, or dosemu at all.

      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: security@kernel.org <security@kernel.org>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • [3.13-stable only] KVM: x86: Fix far-jump to non-canonical check · 111fefa3
      Nadav Amit authored
      commit 7e46dddd upstream.
      
      [3.13-stable's first backport (f9bffe04) of this commit accidentally omitted
      part of the upstream patch (the WARN_ON fixes), supplied here.]
      
      Commit d1442d85 ("KVM: x86: Handle errors when RIP is set during far
      jumps") introduced a bug that caused the fix to be incomplete.  Due to
      incorrect evaluation, a far jump to a segment with the L bit cleared (i.e.,
      a 32-bit segment) and RIP with any of the high bits set (i.e.,
      RIP[63:32] != 0) may not trigger #GP.  As we know, this poses a security
      problem.
      
      In addition, the condition for two warnings was incorrect.
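
The condition described above can be sketched as a pair of predicates. This is an illustrative sketch, not the KVM emulator code: the function names are hypothetical, and the canonical check assumes 48-bit virtual addresses (bits 63:47 all equal).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Canonical under the assumed 48-bit layout: bits 63:47 are all 0 or all 1. */
bool sketch_is_canonical(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the upper 17 bits */

	return top == 0 || top == UINT64_C(0x1ffff);
}

/*
 * Returns true if `rip` is acceptable as a far-jump target for a code
 * segment with the given L bit; false means #GP should be raised.
 */
bool sketch_far_jump_rip_ok(uint64_t rip, bool l_bit)
{
	if (!l_bit)
		return (rip >> 32) == 0;	/* 32-bit segment: RIP[63:32] must be 0 */

	return sketch_is_canonical(rip);	/* 64-bit segment */
}
```

For a 32-bit segment, 0xffffffff is accepted and 0x100000000 is rejected, which is the case the incomplete fix could let through.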
      
      Fixes: d1442d85
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
      [Add #ifdef CONFIG_X86_64 to avoid complaints of undefined behavior. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Vinson Lee <vlee@twopensource.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • usb: gadget: at91_udc: move prepare clk into process context · 0847cfb9
      Ronald Wahl authored
      commit b2ba27a5 upstream.
      
      Commit 76280832 (usb: gadget: at91_udc:
      prepare clk before calling enable) added clock preparation in interrupt
      context. This is not allowed as it might sleep. Also setting the clock
      rate is unsafe to call from there for the same reason. Move clock
      preparation and setting clock rate into process context (at91udc_probe).

      Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com>
      Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
      Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
      Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Signed-off-by: Felipe Balbi <balbi@ti.com>
      [ kamal: backport to 3.13-stable: at91_udc.c moved ]
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • e1000e: Fix no connectivity when driver loaded with cable out · b81e3d0a
      David Ertman authored
      commit b20a7744 upstream.
      
      In commit da1e2046, the flow for enabling/disabling an Si errata
      workaround (e1000_lv_jumbo_workaround_ich8lan) was changed to fix a problem
      with iAMT connections dropping on interface down with jumbo frames set.
      Part of this change was to move the function call disabling the workaround
      to e1000e_down() from the e1000_setup_rctl() function.  The mechanic for
      disabling of this workaround involves writing several MAC and PHY registers
      back to hardware defaults.
      
      After this commit, when the driver is loaded with the cable out, the PHY
      registers are not programmed with the correct default values.  This leaves
      the device able to transmit packets but unable to receive them until this
      workaround is called.

      The flow of e1000e's open code relies upon calling the above workaround to
      explicitly program these registers either with jumbo frame appropriate settings
      or h/w defaults on 82579 and newer hardware.
      
      Fix this issue by adding logic to e1000_setup_rctl() that not only calls
      e1000_lv_jumbo_workaround_ich8lan() when jumbo frames are set, to enable the
      workaround, but also calls this function to explicitly disable the workaround
      in the case that jumbo frames are not set.

      Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
      Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Joseph Salisbury <joseph.salisbury@canonical.com>
      BugLink: http://bugs.launchpad.net/bugs/1400365
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
  3. 09 Jan, 2015 1 commit
  4. 18 Dec, 2014 1 commit
  5. 15 Dec, 2014 11 commits
  6. 09 Dec, 2014 10 commits
    • i2c: davinci: generate STP always when NACK is received · f9659683
      Grygorii Strashko authored
      commit 9ea359f7 upstream.
      
      According to I2C specification the NACK should be handled as follows:
      "When SDA remains HIGH during this ninth clock pulse, this is defined as the Not
      Acknowledge signal. The master can then generate either a STOP condition to
      abort the transfer, or a repeated START condition to start a new transfer."
      [I2C spec Rev. 6, 3.1.6: http://www.nxp.com/documents/user_manual/UM10204.pdf]
      
      Currently the Davinci i2c driver interrupts the transfer on receipt of a
      NACK but fails to send a STOP in some situations and so makes the bus
      stuck until next I2C IP reset (idle/enable).
      
      For example, the issue will happen during an SMBus read transfer, which
      consists of two I2C messages, write command/address and read data:
      
      S Slave Address Wr A Command Code A Sr Slave Address Rd A D1..Dn A P
      <--- write -----------------------> <--- read --------------------->
      
      The I2C client device will send a NACK if it can't recognize "Command Code",
      and the I2C master is expected to generate STP in this case.
      But currently the Davinci I2C driver just exits with -EREMOTEIO and STP is
      not generated.
      
      Hence, fix it by generating Stop condition (STP) always when NACK is received.
      
      This patch fixes Davinci I2C in the same way it was done for OMAP I2C
      commit cda2109a ("i2c: omap: query STP always when NACK is received").

      Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Reported-by: Hein Tibosch <hein_tibosch@yahoo.es>
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • ahci: disable MSI on SAMSUNG 0xa800 SSD · 8de541ed
      Tejun Heo authored
      commit 2b21ef0a upstream.
      
      Just like 0x1600 which got blacklisted by 66a7cbc3 ("ahci: disable
      MSI instead of NCQ on Samsung pci-e SSDs on macbooks"), 0xa800 chokes
      on NCQ commands if MSI is enabled.  Disable MSI.

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Dominik Mierzejewski <dominik@greysector.net>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=89171
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • slab: fix nodeid bounds check for non-contiguous node IDs · b462f598
      Paul Mackerras authored
      commit 7c3fbbdd upstream.
      
      The bounds check for nodeid in ____cache_alloc_node gives false
      positives on machines where the node IDs are not contiguous, leading to
      a panic at boot time.  For example, on a POWER8 machine the node IDs are
      typically 0, 1, 16 and 17.  This means that num_online_nodes() returns
      4, so when ____cache_alloc_node is called with nodeid = 16 the VM_BUG_ON
      triggers, like this:
      
        kernel BUG at /home/paulus/kernel/kvm/mm/slab.c:3079!
        Call Trace:
          .____cache_alloc_node+0x5c/0x270 (unreliable)
          .kmem_cache_alloc_node_trace+0xdc/0x360
          .init_list+0x3c/0x128
          .kmem_cache_init+0x1dc/0x258
          .start_kernel+0x2a0/0x568
          start_here_common+0x20/0xa8
      
      To fix this, we instead compare the nodeid with MAX_NUMNODES, and
      additionally make sure it isn't negative (since nodeid is an int).  The
      check is there mainly to protect the array dereference in the get_node()
      call in the next line, and the array being dereferenced is of size
      MAX_NUMNODES.  If the nodeid is in range but invalid (for example if the
      node is off-line), the BUG_ON in the next line will catch that.
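
The difference between the two bounds checks can be sketched with the POWER8 example above. This is a user-space illustration, not the slab code: the function names are hypothetical, the old check is reconstructed from the description (a comparison against the number of online nodes), and `SK_MAX_NUMNODES` is an assumed value standing in for MAX_NUMNODES.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_MAX_NUMNODES 256	/* assumed limit, for illustration only */

/* Old-style check: trips when the node ID exceeds the online node count. */
bool sketch_old_check_trips(int nodeid, int num_online)
{
	return nodeid > num_online;
}

/* New-style check: trips only when the ID cannot index the node array. */
bool sketch_new_check_trips(int nodeid)
{
	return nodeid < 0 || nodeid >= SK_MAX_NUMNODES;
}
```

With node IDs 0, 1, 16 and 17 online, the count is 4, so the old check trips for node 16 even though the node is valid, while the new check accepts it and still rejects negative or oversized IDs.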
      
      Fixes: 14e50c6a ("mm: slab: Verify the nodeid passed to ____cache_alloc_node")
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Reviewed-by: Pekka Enberg <penberg@kernel.org>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • mm: fix anon_vma_clone() error treatment · e27dc77a
      Daniel Forrest authored
      commit c4ea95d7 upstream.
      
      Andrew Morton noticed that the error return from anon_vma_clone() was
      being dropped and replaced with -ENOMEM (which is not itself a bug
      because the only error return value from anon_vma_clone() is -ENOMEM).
      
      I did an audit of callers of anon_vma_clone() and discovered an actual
      bug where the error return was being lost.  In __split_vma(), between
      Linux 3.11 and 3.12 the code was changed so the err variable is used
      before the call to anon_vma_clone() and the default initial value of
      -ENOMEM is overwritten.  So a failure of anon_vma_clone() will return
      success since err at this point is now zero.
      
      Below is a patch which fixes this bug and also propagates the error
      return value from anon_vma_clone() in all cases.
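
The shape of the lost-error bug can be sketched in a few lines of user-space C. This is a simplified illustration, not the __split_vma() code: `sketch_clone` is a hypothetical stand-in for anon_vma_clone(), and the other function names are likewise made up.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for anon_vma_clone(): fails with -ENOMEM on request. */
static int sketch_clone(int should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

/* Buggy shape: err has been reused by an earlier step and is 0 by now. */
int sketch_split_buggy(int clone_fails)
{
	int err = -ENOMEM;	/* default error value */

	err = 0;		/* an earlier, successful step overwrote it */
	if (sketch_clone(clone_fails))
		goto out;	/* err is still 0: failure looks like success */
	err = 0;
out:
	return err;
}

/* Fixed shape: capture and propagate the callee's return value. */
int sketch_split_fixed(int clone_fails)
{
	int err;

	err = sketch_clone(clone_fails);
	if (err)
		goto out;
	/* remaining work would go here */
out:
	return err;
}
```

When the clone step fails, the buggy variant returns 0 while the fixed variant returns -ENOMEM.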
      
      Fixes: ef0855d3 ("mm: mempolicy: turn vma_set_policy() into vma_dup_policy()")
      Signed-off-by: Daniel Forrest <dan.forrest@ssec.wisc.edu>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Tim Hartrick <tim@edgecast.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • mm: fix swapoff hang after page migration and fork · 548ecdf7
      Hugh Dickins authored
      commit 2022b4d1 upstream.
      
      I've been seeing swapoff hangs in recent testing: it's cycling around
      trying unsuccessfully to find an mm for some remaining pages of swap.
      
      I have been exercising swap and page migration more heavily recently,
      and now notice a long-standing error in copy_one_pte(): it's trying to
      add dst_mm to swapoff's mmlist when it finds a swap entry, but is doing
      so even when it's a migration entry or an hwpoison entry.
      
      Which wouldn't matter much, except it adds dst_mm next to src_mm,
      assuming src_mm is already on the mmlist: which may not be so.  Then if
      pages are later swapped out from dst_mm, swapoff won't be able to find
      where to replace them.
      
      There's already a !non_swap_entry() test for stats: move that up before
      the swap_duplicate() and the addition to mmlist.

      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Kelley Nielsen <kelleynnn@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • drivers/input/evdev.c: don't kfree() a vmalloc address · e925c37d
      Andrew Morton authored
      commit 92788ac1 upstream.
      
      If kzalloc() failed and then evdev_open_device() fails, evdev_open()
      will pass a vmalloc'ed pointer to kfree.
      
      This might fix https://bugzilla.kernel.org/show_bug.cgi?id=88401, where
      there was a crash in kfree().

      Reported-by: Christian Casteyde <casteyde.christian@free.fr>
      Belatedly-Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Henrik Rydberg <rydberg@euromail.se>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • xen-netfront: Remove BUGs on paged skb data which crosses a page boundary · 30b9d4d1
      Seth Forshee authored
      commit 8d609725 upstream.
      
      These BUGs can be erroneously triggered by frags which refer to
      tail pages within a compound page. The data in these pages may
      overrun the hardware page while still being contained within the
      compound page, but since compound_order() evaluates to 0 for tail
      pages the assertion fails. The code already iterates through
      subsequent pages correctly in this scenario, so the BUGs are
      unnecessary and can be removed.
      
      Fixes: f36c3747 ("xen/netfront: handle compound page fragments on transmit")
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • mm/vmpressure.c: fix race in vmpressure_work_fn() · 13d5f213
      Andrew Morton authored
      commit 91b57191 upstream.
      
      On some Android devices, a "divide by zero" exception can occur because
      vmpr->scanned could be zero before spin_lock(&vmpr->sr_lock).
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=88051
      
      [akpm@linux-foundation.org: neaten]
      Reported-by: ji_ang <ji_ang@163.com>
      Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • mm: frontswap: invalidate expired data on a dup-store failure · 2aef7caf
      Weijie Yang authored
      commit fb993fa1 upstream.
      
      If a frontswap dup-store fails, it should invalidate the expired page
      in the backend, or it could trigger data corruption.
      For example:
       1. use zswap as the frontswap backend with the writeback feature
       2. store a swap page (version_1) to entry A: success
       3. dup-store a newer page (version_2) to the same entry A: fail
       4. use __swap_writepage() to write the version_2 page to the swapfile: success
       5. zswap shrinks and writes the version_1 page back to the swapfile
       6. the version_2 page is overwritten by version_1: data corruption

      This patch fixes this issue by invalidating expired data immediately
      when a dup-store failure occurs.
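
The rule can be sketched with a one-slot toy backend. This is a minimal illustration of "invalidate on failed store", not the frontswap or zswap code: `struct sketch_entry`, `sketch_store`, and the `backend_fails` flag that simulates a rejected store are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One backend slot: whether it holds a page, and which version. */
struct sketch_entry {
	bool present;
	int  version;
};

/*
 * Try to store `version` into the slot. Returns 0 on success, -1 on
 * failure. On failure any older copy is dropped, so a stale version
 * can never be written back later.
 */
int sketch_store(struct sketch_entry *e, int version, bool backend_fails)
{
	assert(e != NULL);

	if (backend_fails) {
		e->present = false;	/* invalidate the expired copy */
		return -1;
	}

	e->present = true;
	e->version = version;
	return 0;
}
```

After a successful store of version 1 and a failed dup-store of version 2, the slot is empty, so there is no version 1 left for a later writeback to resurrect.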

      Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Seth Jennings <sjennings@variantweb.net>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • drm/radeon: kernel panic in drm_calc_vbltimestamp_from_scanoutpos with 3.18.0-rc6 · dea632e1
      Petr Mladek authored
      commit f5475cc4 upstream.
      
      I was unable to boot 3.18.0-rc6 because of the following kernel
      panic in drm_calc_vbltimestamp_from_scanoutpos():
      
          [drm] Initialized drm 1.1.0 20060810
          [drm] radeon kernel modesetting enabled.
          [drm] initializing kernel modesetting (RV100 0x1002:0x515E 0x15D9:0x8080).
          [drm] register mmio base: 0xC8400000
          [drm] register mmio size: 65536
          radeon 0000:0b:01.0: VRAM: 128M 0x00000000D0000000 - 0x00000000D7FFFFFF (16M used)
          radeon 0000:0b:01.0: GTT: 512M 0x00000000B0000000 - 0x00000000CFFFFFFF
          [drm] Detected VRAM RAM=128M, BAR=128M
          [drm] RAM width 16bits DDR
          [TTM] Zone  kernel: Available graphics memory: 3829346 kiB
          [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
          [TTM] Initializing pool allocator
          [TTM] Initializing DMA pool allocator
          [drm] radeon: 16M of VRAM memory ready
          [drm] radeon: 512M of GTT memory ready.
          [drm] GART: num cpu pages 131072, num gpu pages 131072
          [drm] PCI GART of 512M enabled (table at 0x0000000037880000).
          radeon 0000:0b:01.0: WB disabled
          radeon 0000:0b:01.0: fence driver on ring 0 use gpu addr 0x00000000b0000000 and cpu addr 0xffff8800bbbfa000
          [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
          [drm] Driver supports precise vblank timestamp query.
          [drm] radeon: irq initialized.
          [drm] Loading R100 Microcode
          radeon 0000:0b:01.0: Direct firmware load for radeon/R100_cp.bin failed with error -2
          radeon_cp: Failed to load firmware "radeon/R100_cp.bin"
          [drm:r100_cp_init] *ERROR* Failed to load firmware!
          radeon 0000:0b:01.0: failed initializing CP (-2).
          radeon 0000:0b:01.0: Disabling GPU acceleration
          [drm] radeon: cp finalized
          BUG: unable to handle kernel NULL pointer dereference at 000000000000025c
          IP: [<ffffffff8150423b>] drm_calc_vbltimestamp_from_scanoutpos+0x4b/0x320
          PGD 0
          Oops: 0000 [#1] SMP
          Modules linked in:
          CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.18.0-rc6-4-default #2649
          Hardware name: Supermicro X7DB8/X7DB8, BIOS 6.00 07/26/2006
          task: ffff880234da2010 ti: ffff880234da4000 task.ti: ffff880234da4000
          RIP: 0010:[<ffffffff8150423b>]  [<ffffffff8150423b>] drm_calc_vbltimestamp_from_scanoutpos+0x4b/0x320
          RSP: 0000:ffff880234da7918  EFLAGS: 00010086
          RAX: ffffffff81557890 RBX: 0000000000000000 RCX: ffff880234da7a48
          RDX: ffff880234da79f4 RSI: 0000000000000000 RDI: ffff880232e15000
          RBP: ffff880234da79b8 R08: 0000000000000000 R09: 0000000000000000
          R10: 000000000000000a R11: 0000000000000001 R12: ffff880232dda1c0
          R13: ffff880232e1518c R14: 0000000000000292 R15: ffff880232e15000
          FS:  0000000000000000(0000) GS:ffff88023fc40000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
          CR2: 000000000000025c CR3: 0000000002014000 CR4: 00000000000007e0
          Stack:
           ffff880234da79d8 0000000000000286 ffff880232dcbc00 0000000000002480
           ffff880234da7958 0000000000000296 ffff880234da7998 ffffffff8151b51d
           ffff880234da7a48 0000000032dcbeb0 ffff880232dcbc00 ffff880232dcbc58
          Call Trace:
           [<ffffffff8151b51d>] ? drm_vma_offset_remove+0x1d/0x110
           [<ffffffff8152dc98>] radeon_get_vblank_timestamp_kms+0x38/0x60
           [<ffffffff8152076a>] ? ttm_bo_release_list+0xba/0x180
           [<ffffffff81503751>] drm_get_last_vbltimestamp+0x41/0x70
           [<ffffffff81503933>] vblank_disable_and_save+0x73/0x1d0
           [<ffffffff81106b2f>] ? try_to_del_timer_sync+0x4f/0x70
           [<ffffffff81505245>] drm_vblank_cleanup+0x65/0xa0
           [<ffffffff815604fa>] radeon_irq_kms_fini+0x1a/0x70
           [<ffffffff8156c07e>] r100_init+0x26e/0x410
           [<ffffffff8152ae3e>] radeon_device_init+0x7ae/0xb50
           [<ffffffff8152d57f>] radeon_driver_load_kms+0x8f/0x210
           [<ffffffff81506965>] drm_dev_register+0xb5/0x110
           [<ffffffff8150998f>] drm_get_pci_dev+0x8f/0x200
           [<ffffffff815291cd>] radeon_pci_probe+0xad/0xe0
           [<ffffffff8141a365>] local_pci_probe+0x45/0xa0
           [<ffffffff8141b741>] pci_device_probe+0xd1/0x130
           [<ffffffff81633dad>] driver_probe_device+0x12d/0x3e0
           [<ffffffff8163413b>] __driver_attach+0x9b/0xa0
           [<ffffffff816340a0>] ? __device_attach+0x40/0x40
           [<ffffffff81631cd3>] bus_for_each_dev+0x63/0xa0
           [<ffffffff8163378e>] driver_attach+0x1e/0x20
           [<ffffffff81633390>] bus_add_driver+0x180/0x240
           [<ffffffff81634914>] driver_register+0x64/0xf0
           [<ffffffff81419cac>] __pci_register_driver+0x4c/0x50
           [<ffffffff81509bf5>] drm_pci_init+0xf5/0x120
           [<ffffffff821dc871>] ? ttm_init+0x6a/0x6a
           [<ffffffff821dc908>] radeon_init+0x97/0xb5
           [<ffffffff810002fc>] do_one_initcall+0xbc/0x1f0
           [<ffffffff810e3278>] ? __wake_up+0x48/0x60
           [<ffffffff8218e256>] kernel_init_freeable+0x18a/0x215
           [<ffffffff8218d983>] ? initcall_blacklist+0xc0/0xc0
           [<ffffffff818a78f0>] ? rest_init+0x80/0x80
           [<ffffffff818a78fe>] kernel_init+0xe/0xf0
           [<ffffffff818c0c3c>] ret_from_fork+0x7c/0xb0
           [<ffffffff818a78f0>] ? rest_init+0x80/0x80
          Code: 45 ac 0f 88 a8 01 00 00 3b b7 d0 01 00 00 49 89 ff 0f 83 99 01 00 00 48 8b 47 20 48 8b 80 88 00 00 00 48 85 c0 0f 84 cd 01 00 00 <41> 8b b1 5c 02 00 00 41 8b 89 58 02 00 00 89 75 98 41 8b b1 60
          RIP  [<ffffffff8150423b>] drm_calc_vbltimestamp_from_scanoutpos+0x4b/0x320
           RSP <ffff880234da7918>
          CR2: 000000000000025c
          ---[ end trace ad2c0aadf48e2032 ]---
          Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
      
      Adding the NULL pointer check that was suggested at
      http://lists.freedesktop.org/archives/dri-devel/2014-October/070663.html
      fixed the problem for me.
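
      In the oops above, CR2 is 000000000000025c and R09 is zero, which is
      consistent with reading a member at offset 0x25c through a NULL
      pointer. The sketch below illustrates that relationship and the
      shape of a guard; it is not the radeon code. struct crtc_sketch, its
      padding, the member name hdisplay and get_timestamp_sketch() are all
      hypothetical and chosen only to reproduce the offset.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical layout: padding places the member at offset 0x25c. */
struct crtc_sketch {
	char pad[0x25c];
	int hdisplay;
};

/*
 * Without the NULL check, crtc->hdisplay on a NULL crtc would access
 * address 0 + offsetof(struct crtc_sketch, hdisplay) = 0x25c.
 */
static int get_timestamp_sketch(const struct crtc_sketch *crtc, int *out)
{
	if (crtc == NULL)
		return -EINVAL;       /* refuse instead of faulting */
	*out = crtc->hdisplay;
	return 0;
}
```

      Returning an error for a missing object lets a cleanup path that
      runs after a failed initialization finish instead of crashing.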
      
      I am not familiar with the code. But the change looks sane
      and we need something fast at this stage of 3.18 development.
      Suggested-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Petr Mladek <pmladek@suse.cz>
      Tested-by: Petr Mladek <pmladek@suse.cz>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      dea632e1