An error occurred fetching the project authors.
  1. 18 Aug, 2023 1 commit
  2. 05 Dec, 2022 1 commit
    • Jann Horn's avatar
      ipc/sem: Fix dangling sem_array access in semtimedop race · b52be557
      Jann Horn authored
      When __do_semtimedop() goes to sleep because it has to wait for a
      semaphore value becoming zero or becoming bigger than some threshold, it
      links the on-stack sem_queue to the sem_array, then goes to sleep
      without holding a reference on the sem_array.
      
      When __do_semtimedop() comes back out of sleep, one of two things must
      happen:
      
       a) We prove that the on-stack sem_queue has been disconnected from the
          (possibly freed) sem_array, making it safe to return from the stack
          frame that the sem_queue exists in.
      
       b) We stabilize our reference to the sem_array, lock the sem_array, and
          detach the sem_queue from the sem_array ourselves.
      
      sem_array has RCU lifetime, so for case (b), the reference can be
      stabilized inside an RCU read-side critical section by locklessly
      checking whether the sem_queue is still connected to the sem_array.
      
      However, the current code does the lockless check on sem_queue before
      starting an RCU read-side critical section, so the result of the
      lockless check immediately becomes useless.
      
      Fix it by doing rcu_read_lock() before the lockless check.  Now RCU
      ensures that if we observe the object being on our queue, the object
      can't be freed until rcu_read_unlock().
      
      This bug is only hittable on kernel builds with full preemption support
      (either CONFIG_PREEMPT or PREEMPT_DYNAMIC with preempt=full).
      
      Fixes: 370b262c ("ipc/sem: avoid idr tree lookup for interrupted semop")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b52be557
  3. 10 May, 2022 2 commits
  4. 04 Feb, 2022 1 commit
  5. 14 Sep, 2021 1 commit
  6. 03 Sep, 2021 1 commit
    • Vasily Averin's avatar
      memcg: enable accounting of ipc resources · 18319498
      Vasily Averin authored
      When user creates IPC objects it forces kernel to allocate memory for
      these long-living objects.
      
      It makes sense to account them to restrict the host's memory consumption
      from inside the memcg-limited container.
      
      This patch enables accounting for IPC shared memory segments, messages
      semaphores and semaphore's undo lists.
      
      Link: https://lkml.kernel.org/r/d6507b06-4df6-78f8-6c54-3ae86e3b5339@virtuozzo.comSigned-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Reviewed-by: default avatarShakeel Butt <shakeelb@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrei Vagin <avagin@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Jeff Layton <jlayton@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Jiri Slaby <jirislaby@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Yutian Yang <nglaive@gmail.com>
      Cc: Zefan Li <lizefan.x@bytedance.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      18319498
  7. 20 Aug, 2021 1 commit
    • Arnd Bergmann's avatar
      ARM: 9114/1: oabi-compat: rework sys_semtimedop emulation · bdec0145
      Arnd Bergmann authored
      sys_oabi_semtimedop() is one of the last users of set_fs() on Arm. To
      remove this one, expose the internal code of the actual implementation
      that operates on a kernel pointer and call it directly after copying.
      
      There should be no measurable impact on the normal execution of this
      function, and it makes the overly long function a little shorter, which
      may help readability.
      
      While reworking the oabi version, make it behave a little more like
      the native one, using kvmalloc_array() and restructure the code
      flow in a similar way.
      
      The naming of __do_semtimedop() is not very good, I hope someone can
      come up with a better name.
      
      One regression was spotted by kernel test robot <rong.a.chen@intel.com>
      and fixed before the first mailing list submission.
      Acked-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
      bdec0145
  8. 01 Jul, 2021 2 commits
  9. 23 May, 2021 1 commit
    • Varad Gautam's avatar
      ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiry · a11ddb37
      Varad Gautam authored
      do_mq_timedreceive calls wq_sleep with a stack local address.  The
      sender (do_mq_timedsend) uses this address to later call pipelined_send.
      
      This leads to a very hard to trigger race where a do_mq_timedreceive
      call might return and leave do_mq_timedsend to rely on an invalid
      address, causing the following crash:
      
        RIP: 0010:wake_q_add_safe+0x13/0x60
        Call Trace:
         __x64_sys_mq_timedsend+0x2a9/0x490
         do_syscall_64+0x80/0x680
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f5928e40343
      
      The race occurs as:
      
      1. do_mq_timedreceive calls wq_sleep with the address of `struct
         ext_wait_queue` on function stack (aliased as `ewq_addr` here) - it
         holds a valid `struct ext_wait_queue *` as long as the stack has not
         been overwritten.
      
      2. `ewq_addr` gets added to info->e_wait_q[RECV].list in wq_add, and
         do_mq_timedsend receives it via wq_get_first_waiter(info, RECV) to call
         __pipelined_op.
      
      3. Sender calls __pipelined_op::smp_store_release(&this->state,
         STATE_READY).  Here is where the race window begins.  (`this` is
         `ewq_addr`.)
      
      4. If the receiver wakes up now in do_mq_timedreceive::wq_sleep, it
         will see `state == STATE_READY` and break.
      
      5. do_mq_timedreceive returns, and `ewq_addr` is no longer guaranteed
         to be a `struct ext_wait_queue *` since it was on do_mq_timedreceive's
         stack.  (Although the address may not get overwritten until another
         function happens to touch it, which means it can persist around for an
         indefinite time.)
      
      6. do_mq_timedsend::__pipelined_op() still believes `ewq_addr` is a
         `struct ext_wait_queue *`, and uses it to find a task_struct to pass to
         the wake_q_add_safe call.  In the lucky case where nothing has
         overwritten `ewq_addr` yet, `ewq_addr->task` is the right task_struct.
         In the unlucky case, __pipelined_op::wake_q_add_safe gets handed a
         bogus address as the receiver's task_struct causing the crash.
      
      do_mq_timedsend::__pipelined_op() should not dereference `this` after
      setting STATE_READY, as the receiver counterpart is now free to return.
      Change __pipelined_op to call wake_q_add_safe on the receiver's
      task_struct returned by get_task_struct, instead of dereferencing `this`
      which sits on the receiver's stack.
      
      As Manfred pointed out, the race potentially also exists in
      ipc/msg.c::expunge_all and ipc/sem.c::wake_up_sem_queue_prepare.  Fix
      those in the same way.
      
      Link: https://lkml.kernel.org/r/20210510102950.12551-1-varad.gautam@suse.com
      Fixes: c5b2cbdb ("ipc/mqueue.c: update/document memory barriers")
      Fixes: 8116b54e ("ipc/sem.c: document and update memory barriers")
      Fixes: 0d97a82b ("ipc/msg.c: update and document memory barriers")
      Signed-off-by: default avatarVarad Gautam <varad.gautam@suse.com>
      Reported-by: default avatarMatthias von Faber <matthias.vonfaber@aox-tech.de>
      Acked-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Acked-by: default avatarManfred Spraul <manfred@colorfullife.com>
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a11ddb37
  10. 07 May, 2021 2 commits
  11. 23 Aug, 2020 1 commit
  12. 12 Aug, 2020 1 commit
  13. 21 Feb, 2020 1 commit
    • Ioanna Alifieraki's avatar
      Revert "ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()" · edf28f40
      Ioanna Alifieraki authored
      This reverts commit a9795584.
      
      Commit a9795584 ("ipc,sem: remove uneeded sem_undo_list lock usage
      in exit_sem()") removes a lock that is needed.  This leads to a process
      looping infinitely in exit_sem() and can also lead to a crash.  There is
      a reproducer available in [1] and with the commit reverted the issue
      does not reproduce anymore.
      
      Using the reproducer found in [1] is fairly easy to reach a point where
      one of the child processes is looping infinitely in exit_sem between
      for(;;) and if (semid == -1) block, while it's trying to free its last
      sem_undo structure which has already been freed by freeary().
      
      Each sem_undo struct is on two lists: one per semaphore set (list_id)
      and one per process (list_proc).  The list_id list tracks undos by
      semaphore set, and the list_proc by process.
      
      Undo structures are removed either by freeary() or by exit_sem().  The
      freeary function is invoked when the user invokes a syscall to remove a
      semaphore set.  During this operation freeary() traverses the list_id
      associated with the semaphore set and removes the undo structures from
      both the list_id and list_proc lists.
      
      For this case, exit_sem() is called at process exit.  Each process
      contains a struct sem_undo_list (referred to as "ulp") which contains
      the head for the list_proc list.  When the process exits, exit_sem()
      traverses this list to remove each sem_undo struct.  As in freeary(),
      whenever a sem_undo struct is removed from list_proc, it is also removed
      from the list_id list.
      
      Removing elements from list_id is safe for both exit_sem() and freeary()
      due to sem_lock().  Removing elements from list_proc is not safe;
      freeary() locks &un->ulp->lock when it performs
      list_del_rcu(&un->list_proc) but exit_sem() does not (locking was
      removed by commit a9795584 ("ipc,sem: remove uneeded sem_undo_list
      lock usage in exit_sem()").
      
      This can result in the following situation while executing the
      reproducer [1] : Consider a child process in exit_sem() and the parent
      in freeary() (because of semctl(sid[i], NSEM, IPC_RMID)).
      
       - The list_proc for the child contains the last two undo structs A and
         B (the rest have been removed either by exit_sem() or freeary()).
      
       - The semid for A is 1 and semid for B is 2.
      
       - exit_sem() removes A and at the same time freeary() removes B.
      
       - Since A and B have different semid sem_lock() will acquire different
         locks for each process and both can proceed.
      
      The bug is that they remove A and B from the same list_proc at the same
      time because only freeary() acquires the ulp lock. When exit_sem()
      removes A it makes ulp->list_proc.next to point at B and at the same
      time freeary() removes B setting B->semid=-1.
      
      At the next iteration of for(;;) loop exit_sem() will try to remove B.
      
      The only way to break from for(;;) is for (&un->list_proc ==
      &ulp->list_proc) to be true which is not. Then exit_sem() will check if
      B->semid=-1 which is and will continue looping in for(;;) until the
      memory for B is reallocated and the value at B->semid is changed.
      
      At that point, exit_sem() will crash attempting to unlink B from the
      lists (this can be easily triggered by running the reproducer [1] a
      second time).
      
      To prove this scenario instrumentation was added to keep information
      about each sem_undo (un) struct that is removed per process and per
      semaphore set (sma).
      
                CPU0                                CPU1
        [caller holds sem_lock(sma for A)]      ...
        freeary()                               exit_sem()
        ...                                     ...
        ...                                     sem_lock(sma for B)
        spin_lock(A->ulp->lock)                 ...
        list_del_rcu(un_A->list_proc)           list_del_rcu(un_B->list_proc)
      
      Undo structures A and B have different semid and sem_lock() operations
      proceed.  However they belong to the same list_proc list and they are
      removed at the same time.  This results into ulp->list_proc.next
      pointing to the address of B which is already removed.
      
      After reverting commit a9795584 ("ipc,sem: remove uneeded
      sem_undo_list lock usage in exit_sem()") the issue was no longer
      reproducible.
      
      [1] https://bugzilla.redhat.com/show_bug.cgi?id=1694779
      
      Link: http://lkml.kernel.org/r/20191211191318.11860-1-ioanna-maria.alifieraki@canonical.com
      Fixes: a9795584 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()")
      Signed-off-by: default avatarIoanna Alifieraki <ioanna-maria.alifieraki@canonical.com>
      Acked-by: default avatarManfred Spraul <manfred@colorfullife.com>
      Acked-by: default avatarHerton R. Krzesinski <herton@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: <malat@debian.org>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Jay Vosburgh <jay.vosburgh@canonical.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      edf28f40
  14. 04 Feb, 2020 1 commit
  15. 26 Sep, 2019 1 commit
  16. 08 Mar, 2019 2 commits
  17. 06 Feb, 2019 1 commit
    • Arnd Bergmann's avatar
      y2038: syscalls: rename y2038 compat syscalls · 8dabe724
      Arnd Bergmann authored
      A lot of system calls that pass a time_t somewhere have an implementation
      using a COMPAT_SYSCALL_DEFINEx() on 64-bit architectures, and have
      been reworked so that this implementation can now be used on 32-bit
      architectures as well.
      
      The missing step is to redefine them using the regular SYSCALL_DEFINEx()
      to get them out of the compat namespace and make it possible to build them
      on 32-bit architectures.
      
      Any system call that ends in 'time' gets a '32' suffix on its name for
      that version, while the others get a '_time32' suffix, to distinguish
      them from the normal version, which takes a 64-bit time argument in the
      future.
      
      In this step, only 64-bit architectures are changed, doing this rename
      first lets us avoid touching the 32-bit architectures twice.
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      8dabe724
  18. 25 Jan, 2019 1 commit
    • Arnd Bergmann's avatar
      ipc: rename old-style shmctl/semctl/msgctl syscalls · 275f2214
      Arnd Bergmann authored
      The behavior of these system calls is slightly different between
      architectures, as determined by the CONFIG_ARCH_WANT_IPC_PARSE_VERSION
      symbol. Most architectures that implement the split IPC syscalls don't set
      that symbol and only get the modern version, but alpha, arm, microblaze,
      mips-n32, mips-n64 and xtensa expect the caller to pass the IPC_64 flag.
      
      For the architectures that so far only implement sys_ipc(), i.e. m68k,
      mips-o32, powerpc, s390, sh, sparc, and x86-32, we want the new behavior
      when adding the split syscalls, so we need to distinguish between the
      two groups of architectures.
      
      The method I picked for this distinction is to have a separate system call
      entry point: sys_old_*ctl() now uses ipc_parse_version, while sys_*ctl()
      does not. The system call tables of the five architectures are changed
      accordingly.
      
      As an additional benefit, we no longer need the configuration specific
      definition for ipc_parse_version(), it always does the same thing now,
      but simply won't get called on architectures with the modern interface.
      
      A small downside is that on architectures that do set
      ARCH_WANT_IPC_PARSE_VERSION, we now have an extra set of entry points
      that are never called. They only add a few bytes of bloat, so it seems
      better to keep them compared to adding yet another Kconfig symbol.
      I considered adding new syscall numbers for the IPC_64 variants for
      consistency, but decided against that for now.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      275f2214
  19. 27 Aug, 2018 1 commit
    • Arnd Bergmann's avatar
      y2038: globally rename compat_time to old_time32 · 9afc5eee
      Arnd Bergmann authored
      Christoph Hellwig suggested a slightly different path for handling
      backwards compatibility with the 32-bit time_t based system calls:
      
      Rather than simply reusing the compat_sys_* entry points on 32-bit
      architectures unchanged, we get rid of those entry points and the
      compat_time types by renaming them to something that makes more sense
      on 32-bit architectures (which don't have a compat mode otherwise),
      and then share the entry points under the new name with the 64-bit
      architectures that use them for implementing the compatibility.
      
      The following types and interfaces are renamed here, and moved
      from linux/compat_time.h to linux/time32.h:
      
      old				new
      ---				---
      compat_time_t			old_time32_t
      struct compat_timeval		struct old_timeval32
      struct compat_timespec		struct old_timespec32
      struct compat_itimerspec	struct old_itimerspec32
      ns_to_compat_timeval()		ns_to_old_timeval32()
      get_compat_itimerspec64()	get_old_itimerspec32()
      put_compat_itimerspec64()	put_old_itimerspec32()
      compat_get_timespec64()		get_old_timespec32()
      compat_put_timespec64()		put_old_timespec32()
      
      As we already have aliases in place, this patch addresses only the
      instances that are relevant to the system call interface in particular,
      not those that occur in device drivers and other modules. Those
      will get handled separately, while providing the 64-bit version
      of the respective interfaces.
      
      I'm not renaming the timex, rusage and itimerval structures, as we are
      still debating what the new interface will look like, and whether we
      will need a replacement at all.
      
      This also doesn't change the names of the syscall entry points, which can
      be done more easily when we actually switch over the 32-bit architectures
      to use them, at that point we need to change COMPAT_SYSCALL_DEFINEx to
      SYSCALL_DEFINEx with a new name, e.g. with a _time32 suffix.
      Suggested-by: default avatarChristoph Hellwig <hch@infradead.org>
      Link: https://lore.kernel.org/lkml/20180705222110.GA5698@infradead.org/Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      9afc5eee
  20. 22 Aug, 2018 5 commits
  21. 27 Jul, 2018 1 commit
  22. 22 Jun, 2018 1 commit
    • NeilBrown's avatar
      rhashtable: split rhashtable.h · 0eb71a9d
      NeilBrown authored
      Due to the use of rhashtables in net namespaces,
      rhashtable.h is included in lots of the kernel,
      so a small changes can required a large recompilation.
      This makes development painful.
      
      This patch splits out rhashtable-types.h which just includes
      the major type declarations, and does not include (non-trivial)
      inline code.  rhashtable.h is no longer included by anything
      in the include/ directory.
      Common include files only include rhashtable-types.h so a large
      recompilation is only triggered when that changes.
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0eb71a9d
  23. 14 Jun, 2018 1 commit
  24. 12 Jun, 2018 1 commit
    • Kees Cook's avatar
      treewide: kvmalloc() -> kvmalloc_array() · 344476e1
      Kees Cook authored
      The kvmalloc() function has a 2-factor argument form, kvmalloc_array(). This
      patch replaces cases of:
      
              kvmalloc(a * b, gfp)
      
      with:
              kvmalloc_array(a * b, gfp)
      
      as well as handling cases of:
      
              kvmalloc(a * b * c, gfp)
      
      with:
      
              kvmalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kvmalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kvmalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kvmalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kvmalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kvmalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kvmalloc
      + kvmalloc_array
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kvmalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kvmalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kvmalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kvmalloc(C1 * C2 * C3, ...)
      |
        kvmalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kvmalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kvmalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kvmalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kvmalloc(sizeof(THING) * C2, ...)
      |
        kvmalloc(sizeof(TYPE) * C2, ...)
      |
        kvmalloc(C1 * C2 * C3, ...)
      |
        kvmalloc(C1 * C2, ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      344476e1
  25. 20 Apr, 2018 4 commits
    • Arnd Bergmann's avatar
      y2038: ipc: Enable COMPAT_32BIT_TIME · b0d17578
      Arnd Bergmann authored
      Three ipc syscalls (mq_timedsend, mq_timedreceive and and semtimedop)
      take a timespec argument. After we move 32-bit architectures over to
      useing 64-bit time_t based syscalls, we need seperate entry points for
      the old 32-bit based interfaces.
      
      This changes the #ifdef guards for the existing 32-bit compat syscalls
      to check for CONFIG_COMPAT_32BIT_TIME instead, which will then be
      enabled on all existing 32-bit architectures.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      b0d17578
    • Arnd Bergmann's avatar
      y2038: ipc: Use __kernel_timespec · 21fc538d
      Arnd Bergmann authored
      This is a preparatation for changing over __kernel_timespec to 64-bit
      times, which involves assigning new system call numbers for mq_timedsend(),
      mq_timedreceive() and semtimedop() for compatibility with future y2038
      proof user space.
      
      The existing ABIs will remain available through compat code.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      21fc538d
    • Arnd Bergmann's avatar
      y2038: ipc: Report long times to user space · c2ab975c
      Arnd Bergmann authored
      The shmid64_ds/semid64_ds/msqid64_ds data structures have been extended
      to contain extra fields for storing the upper bits of the time stamps,
      this patch does the other half of the job and and fills the new fields on
      32-bit architectures as well as 32-bit tasks running on a 64-bit kernel
      in compat mode.
      
      There should be no change for native 64-bit tasks.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      c2ab975c
    • Arnd Bergmann's avatar
      y2038: ipc: Use ktime_get_real_seconds consistently · 2a70b787
      Arnd Bergmann authored
      In some places, we still used get_seconds() instead of
      ktime_get_real_seconds(), and I'm changing the remaining ones now to
      all use ktime_get_real_seconds() so we use the full available range for
      timestamps instead of overflowing the 'unsigned long' return value in
      year 2106 on 32-bit kernels.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      2a70b787
  26. 11 Apr, 2018 1 commit
    • Davidlohr Bueso's avatar
      ipc/sem: introduce semctl(SEM_STAT_ANY) · a280d6dc
      Davidlohr Bueso authored
      There is a permission discrepancy when consulting shm ipc object
      metadata between /proc/sysvipc/sem (0444) and the SEM_STAT semctl
      command.  The later does permission checks for the object vs S_IRUGO.
      As such there can be cases where EACCESS is returned via syscall but the
      info is displayed anyways in the procfs files.
      
      While this might have security implications via info leaking (albeit no
      writing to the sma metadata), this behavior goes way back and showing
      all the objects regardless of the permissions was most likely an
      overlook - so we are stuck with it.  Furthermore, modifying either the
      syscall or the procfs file can cause userspace programs to break (ie
      ipcs).  Some applications require getting the procfs info (without root
      privileges) and can be rather slow in comparison with a syscall -- up to
      500x in some reported cases for shm.
      
      This patch introduces a new SEM_STAT_ANY command such that the sem ipc
      object permissions are ignored, and only audited instead.  In addition,
      I've left the lsm security hook checks in place, as if some policy can
      block the call, then the user has no other choice than just parsing the
      procfs file.
      
      Link: http://lkml.kernel.org/r/20180215162458.10059-3-dave@stgolabs.netSigned-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Reported-by: default avatarRobert Kettler <robert.kettler@outlook.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a280d6dc
  27. 02 Apr, 2018 3 commits