1. 13 Mar, 2022 6 commits
    • virt: vmgenid: notify RNG of VM fork and supply generation ID · af6b54e2
      Jason A. Donenfeld authored
      VM Generation ID is a feature from Microsoft, described at
      <https://go.microsoft.com/fwlink/?LinkId=260709>, and supported by
      Hyper-V and QEMU. Its usage is described in Microsoft's RNG whitepaper,
      <https://aka.ms/win10rng>, as:
      
          If the OS is running in a VM, there is a problem that most
          hypervisors can snapshot the state of the machine and later rewind
          the VM state to the saved state. This results in the machine running
          a second time with the exact same RNG state, which leads to serious
          security problems.  To reduce the window of vulnerability, Windows
          10 on a Hyper-V VM will detect when the VM state is reset, retrieve
          a unique (not random) value from the hypervisor, and reseed the root
          RNG with that unique value.  This does not eliminate the
          vulnerability, but it greatly reduces the time during which the RNG
          system will produce the same outputs as it did during a previous
          instantiation of the same VM state.
      
      Linux has the same issue, and given that vmgenid is supported already by
      multiple hypervisors, we can implement more or less the same solution.
      So this commit wires up the vmgenid ACPI notification to the RNG's newly
      added add_vmfork_randomness() function.
      
      It can be used from QEMU via the `-device vmgenid,guid=auto` parameter.
      After setting that, use `savevm` in the monitor to save the VM state,
      then quit QEMU, start it again, and use `loadvm`. That will trigger this
      driver's notify function, which hands the new UUID to the RNG. This is
      described in <https://git.qemu.org/?p=qemu.git;a=blob;f=docs/specs/vmgenid.txt>.
      And there are hooks for this in libvirt as well, described in
      <https://libvirt.org/formatdomain.html#general-metadata>.
      
      Note, however, that the treatment of this as a UUID is considered to be
      an accidental QEMU nuance, per
      <https://github.com/libguestfs/virt-v2v/blob/master/docs/vm-generation-id-across-hypervisors.txt>,
      so this driver simply treats these bytes as an opaque 128-bit binary
      blob, as per the spec. This doesn't really make a difference anyway,
      considering that's how it ends up when handed to the RNG in the end.
      
      Cc: Alexander Graf <graf@amazon.com>
      Cc: Adrian Catangiu <adrian@parity.io>
      Cc: Daniel P. Berrangé <berrange@redhat.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Wei Yongjun <weiyongjun1@huawei.com>
      Tested-by: Souradeep Chakrabarti <souradch.linux@gmail.com> # With Hyper-V's virtual hardware
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • ACPI: allow longer device IDs · d273845e
      Alexander Graf authored
      We create a list of ACPI "PNP" IDs which contains _HID, _CID, and CLS
      entries of the respective devices. However, when making structs for
      matching, we squeeze those IDs into acpi_device_id, which only has 9
      bytes space to store the identifier. The subsystem actually captures the
      full length of the IDs, and the modalias has the full length, but this
      struct we use for matching is limited. It originally had 16 bytes, but
      was changed to only have 9 in 6543becf ("mod/file2alias: make
      modalias generation safe for cross compiling"), presumably on the theory
      that it would match the ACPI spec so it didn't matter.
      
      Unfortunately, while most people adhere to the ACPI specs, Microsoft
      decided that its VM Generation Counter device [1] should only be
      identifiable by _CID with a value of "VM_Gen_Counter", which is longer
      than 9 characters.
      
      To allow device drivers to match identifiers that exceed the 9 byte
      limit, this simply ups the length to 16, just like it was before the
      aforementioned commit. Empirical testing indicates that this
      doesn't actually increase vmlinux size on 64-bit, because the ulong in
      the same struct caused there to be 7 bytes of padding anyway, and when
      doing a s/M/Y/g i386_defconfig build, the bzImage only increased by
      0.0055%, so negligible.
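The padding argument above can be checked with a small userspace sketch. The two structs below are hypothetical stand-ins (their names and field layout are assumptions for illustration, not copied from the kernel headers); they only model an ID array followed by an unsigned long.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the matching struct: an ID buffer followed by
 * an unsigned long. Names and layout are illustrative assumptions. */
struct id_match_9 {
	unsigned char id[9];
	unsigned long driver_data;
};

struct id_match_16 {
	unsigned char id[16];
	unsigned long driver_data;
};

/* Bytes of padding the compiler inserts between id[9] and driver_data. */
size_t padding_after_id_9(void)
{
	size_t off = offsetof(struct id_match_9, driver_data);

	assert(off >= 9);
	return off - 9;
}

/* Print both layouts so the sizes can be compared on the build target. */
void print_layouts(void)
{
	printf("id[9]:  sizeof=%zu padding=%zu\n",
	       sizeof(struct id_match_9), padding_after_id_9());
	printf("id[16]: sizeof=%zu\n", sizeof(struct id_match_16));
}
```

On targets where unsigned long is 8 bytes with 8-byte alignment, both structs should come out the same size, since the 7 padding bytes are simply put to use. On 32-bit targets the sizes can differ, which matches the small bzImage growth mentioned above.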
      
      This patch is a prerequisite to add support for VMGenID in Linux, the
      subsequent patch in this series. It has been confirmed to also work on
      the udev/modalias side in userspace.
      
      [1] https://download.microsoft.com/download/3/1/C/31CFC307-98CA-4CA5-914C-D9772691E214/VirtualMachineGenerationID.docx

      Signed-off-by: Alexander Graf <graf@amazon.com>
      Co-developed-by: Jason A. Donenfeld <Jason@zx2c4.com>
      [Jason: reworked commit message a bit, went with len=16 approach.]
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Acked-by: Hans de Goede <hdegoede@redhat.com>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: add mechanism for VM forks to reinitialize crng · ae099e8e
      Jason A. Donenfeld authored
      When a VM forks, we must immediately mix in additional information to
      the stream of random output so that two forks or a rollback don't
      produce the same stream of random numbers, which could have catastrophic
      cryptographic consequences. This commit adds a simple API,
      add_vmfork_randomness(), for that, by force reseeding the crng.
      
      This has the added benefit of also draining the entropy pool and setting
      its timer back, so that any old entropy that was there prior -- which
      could have already been used by a different fork, or generally gone
      stale -- does not contribute to the accounting of the next 256 bits.
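To illustrate why mixing in a per-fork unique value matters, here is a toy userspace sketch. It is not the kernel's crng: `toy_rng`, `toy_next()` and `toy_add_vmfork_id()` are hypothetical names, and the generator is a plain splitmix64 step, which is not cryptographically secure. The sketch only shows that two copies of one snapshot emit identical output until a distinct ID is folded into each.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy generator state. A VM snapshot would copy this verbatim. */
struct toy_rng {
	uint64_t state;
};

/* One splitmix64 step. NOT cryptographic; for illustration only. */
uint64_t toy_next(struct toy_rng *r)
{
	uint64_t z;

	assert(r != NULL);
	z = (r->state += 0x9E3779B97F4A7C15ULL);
	z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
	z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
	return z ^ (z >> 31);
}

/* Fold a 16-byte unique (not necessarily random) ID into the state, so
 * that forks holding different IDs diverge from this point on. */
void toy_add_vmfork_id(struct toy_rng *r, const uint8_t id[16])
{
	uint64_t lo, hi;

	assert(r != NULL && id != NULL);
	memcpy(&lo, id, sizeof(lo));
	memcpy(&hi, id + 8, sizeof(hi));
	r->state ^= lo;
	(void)toy_next(r);
	r->state ^= hi;
	(void)toy_next(r);
}
```

A caller would copy one `struct toy_rng` to model a fork, call `toy_add_vmfork_id()` on each copy with that fork's ID, and then observe that `toy_next()` no longer agrees between the copies.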
      
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Jann Horn <jannh@google.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: don't let 644 read-only sysctls be written to · 77553cf8
      Jason A. Donenfeld authored
      We leave around these old sysctls for compatibility, and we keep them
      "writable" for compatibility, but even after writing, we should keep
      reporting the same value. This is consistent with how userspaces tend to
      use sysctl_random_write_wakeup_bits, writing to it, and then later
      reading from it and using the value.
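The behavior described above, where a write appears to succeed but the reported value never changes, can be modeled in a few lines of userspace C. Everything here is a hypothetical model (`legacy_knob` and its two functions are invented names), not the kernel's sysctl handler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a legacy knob that stays "writable" for
 * compatibility while always reporting the same value. */
struct legacy_knob {
	int reported_value;
};

/* Accept the write and report every byte as consumed, but discard it. */
size_t legacy_knob_write(struct legacy_knob *knob, const char *buf, size_t len)
{
	assert(knob != NULL);
	(void)buf;
	return len;
}

/* Reads always return the value the knob started with. */
int legacy_knob_read(const struct legacy_knob *knob)
{
	assert(knob != NULL);
	return knob->reported_value;
}
```

The point of this shape is that old programs which write first and read back later neither fail on the write nor pick up a value the kernel does not actually use.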
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: give sysctl_random_min_urandom_seed a more sensible value · d0efdf35
      Jason A. Donenfeld authored
      This isn't used by anything or anywhere, but we can't delete it due to
      compatibility. So at least give it the correct value of what it's
      supposed to be instead of a garbage one.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: block in /dev/urandom · 6f98a4bf
      Jason A. Donenfeld authored
      This topic has come up countless times, and usually doesn't go anywhere.
      This time I thought I'd bring it up with a slightly narrower focus,
      updated for some developments over the last three years: we finally can
      make /dev/urandom always secure, in light of the fact that our RNG is
      now always seeded.
      
      Ever since Linus' 50ee7529 ("random: try to actively add entropy
      rather than passively wait for it"), the RNG does a haveged-style jitter
      dance around the scheduler, in order to produce entropy (and credit it)
      for the case when we're stuck in wait_for_random_bytes(). However you
      feel about the Linus Jitter Dance is beside the point: it's been there
      for three years and usually gets the RNG initialized in a second or so.
      
      As a matter of fact, this is what happens currently when people use
      getrandom(). It's already there and working, and most people have been
      using it for years without realizing.
      
      So, given that the kernel has grown this mechanism for seeding itself
      from nothing, and that this procedure happens pretty fast, maybe there's
      no point any longer in having /dev/urandom give insecure bytes. In the
      past we didn't want the boot process to deadlock, which was
      understandable. But now, in the worst case, a second goes by, and the
      problem is resolved. It seems like maybe we're finally at a point when
      we can get rid of the infamous "urandom read hole".
      
      The one slight drawback is that the Linus Jitter Dance relies on
      random_get_entropy() being implemented. The first lines of
      try_to_generate_entropy() are:
      
      	stack.now = random_get_entropy();
      	if (stack.now == random_get_entropy())
      		return;
      
      On most platforms, random_get_entropy() is simply aliased to get_cycles().
      In 2022, the number of machines that lack a cycle counter or some other
      implementation of random_get_entropy(), can still run a mainline kernel,
      and also have a broken, out-of-date userspace that relies on /dev/urandom
      never blocking at boot is thought to be exceedingly low. And to be clear:
      those museum pieces without
      cycle counters will continue to run Linux just fine, and even
      /dev/urandom will be operable just like before; the RNG just needs to be
      seeded first through the usual means, which should already be the case
      now.
      
      On systems that really do want unseeded randomness, we already offer
      getrandom(GRND_INSECURE), which is in use by, e.g., systemd for seeding
      their hash tables at boot. Nothing in this commit would affect
      GRND_INSECURE, and it remains the means of getting those types of random
      numbers.
      
      This patch goes a long way toward eliminating a long overdue userspace
      crypto footgun. After several decades of endless user confusion, we will
      finally be able to say, "use any single one of our random interfaces and
      you'll be fine. They're all the same. It doesn't matter." And that, I
      think, is really something. Finally all of those blog posts and
      disagreeing forums and contradictory articles will all become correct
      about whatever they happened to recommend, and along with it, a whole
      class of vulnerabilities eliminated.
      
      With very minimal downside, we're finally in a position where we can
      make this change.
      
      Cc: Dinh Nguyen <dinguyen@kernel.org>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Joshua Kinard <kumba@gentoo.org>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Lennart Poettering <mzxreary@0pointer.de>
      Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  2. 28 Feb, 2022 3 commits
    • random: do crng pre-init loading in worker rather than irq · c2a7de4f
      Jason A. Donenfeld authored
      Taking spinlocks from IRQ context is generally problematic for
      PREEMPT_RT. That is, in part, why we take trylocks instead. However, a
      spin_trylock() is also problematic since another spin_lock() invocation
      can potentially PI-boost the wrong task, as the spin_trylock() is
      invoked from an IRQ-context, so the task on CPU (random task or idle) is
      not the actual owner.
      
      Additionally, by deferring the crng pre-init loading to the worker, we
      can use the cryptographic hash function rather than xor, which is
      perhaps a meaningful difference when considering this data has only been
      through the relatively weak fast_mix() function.
      
      The biggest downside of this approach is that the pre-init loading is
      now deferred until later, which means things that need random numbers
      after interrupts are enabled, but before workqueues are running -- or
      before this particular worker manages to run -- are going to get into
      trouble. Hopefully in the real world, this window is rather small,
      especially since this code won't run until 64 interrupts have occurred.
      
      Cc: Sultan Alsawaf <sultan@kerneltoast.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: unify cycles_t and jiffies usage and types · abded93e
      Jason A. Donenfeld authored
      random_get_entropy() returns a cycles_t, not an unsigned long, which is
      sometimes 64 bits on various 32-bit platforms, including x86.
      Conversely, jiffies is always unsigned long. This commit fixes things to
      use cycles_t for fields that use random_get_entropy(), named "cycles",
      and unsigned long for fields that use jiffies, named "now". It's also
      good to mix in a cycles_t and a jiffies in the same way for both
      add_device_randomness and add_timer_randomness, rather than using xor in
      one case. Finally, we unify the order of these volatile reads, always
      reading the more precise cycles counter, and then jiffies, so that the
      cycle counter is as close to the event as possible.
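The type mismatch described above is easy to reproduce outside the kernel. In this sketch, `cycles64_t` and `ulong32_t` are hypothetical stand-ins for a 64-bit cycles_t and a 32-bit unsigned long. It shows that storing the former in the latter silently drops the upper half.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cycles64_t; /* stand-in for a 64-bit cycles_t */
typedef uint32_t ulong32_t;  /* stand-in for a 32-bit unsigned long */

static_assert(sizeof(ulong32_t) < sizeof(cycles64_t),
	      "the narrow type must be smaller for the demo to mean anything");

/* Store a cycle count in the narrower type; the upper 32 bits are lost. */
ulong32_t store_in_ulong32(cycles64_t cycles)
{
	return (ulong32_t)cycles;
}

/* Returns nonzero if the value survives a round trip through the
 * narrower type unchanged. */
int survives_round_trip(cycles64_t cycles)
{
	return (cycles64_t)store_in_ulong32(cycles) == cycles;
}
```

Keeping a field typed as cycles_t end to end avoids this truncation without needing per-architecture special cases.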
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: cleanup UUID handling · 64276a99
      Jason A. Donenfeld authored
      Rather than hard coding various lengths, we can use the right constants.
      Strings should be `char *` while buffers should be `u8 *`. Rather than
      have a nonsensical and unused maxlength, just remove it. Finally, use
      snprintf instead of sprintf, just out of good hygiene.
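As a userspace illustration of the same hygiene (named constants instead of magic lengths, `char *` for the string, `uint8_t *` for the raw bytes, and snprintf bounded by the remaining space), consider the sketch below. The `DEMO_` constants and `demo_format_uuid()` are hypothetical names. The kernel has its own constants and format specifier for this.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_UUID_SIZE       16 /* raw bytes in a UUID */
#define DEMO_UUID_STRING_LEN 36 /* 32 hex digits plus 4 dashes, no NUL */

/* Format 16 raw bytes as xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.
 * Returns 0 on success, or -1 if the output buffer is too small. */
int demo_format_uuid(char *out, size_t out_len,
		     const uint8_t uuid[DEMO_UUID_SIZE])
{
	size_t pos = 0;

	assert(out != NULL && uuid != NULL);
	if (out_len < DEMO_UUID_STRING_LEN + 1)
		return -1;

	for (size_t i = 0; i < DEMO_UUID_SIZE; i++) {
		/* Dashes go before bytes 4, 6, 8 and 10. */
		if (i == 4 || i == 6 || i == 8 || i == 10)
			out[pos++] = '-';
		/* Bounded by the space left, so it cannot overrun out[]. */
		pos += (size_t)snprintf(out + pos, out_len - pos, "%02x",
					(unsigned int)uuid[i]);
	}
	return 0;
}
```

Sizing the caller's buffer as `DEMO_UUID_STRING_LEN + 1` keeps the length in one place, so the format and the buffer cannot drift apart.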
      
      As well, remove the old comment about returning a binary UUID via the
      binary sysctl syscall. That syscall was removed from the kernel in 5.5,
      and actually, the "uuid_strategy" function and related infrastructure
      for even serving it via the binary sysctl syscall was removed with
      894d2491 ("sysctl drivers: Remove dead binary sysctl support") back
      in 2.6.33.

      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  3. 24 Feb, 2022 2 commits
  4. 21 Feb, 2022 29 commits