  1. 18 Oct, 2004 3 commits
    • [PATCH] fix & clean up zombie/dead task handling & preemption · d3069b4d
      Ingo Molnar authored
      This patch fixes all the preempt-after-task->state-is-TASK_DEAD problems we
      had.  Right now, the moment procfs does a down() that sleeps in
      proc_pid_flush() [it could], our TASK_DEAD state is zapped, we might be
      back to TASK_RUNNING, and we trigger this assert:
      
              schedule();
              BUG();
              /* Avoid "noreturn function does return".  */
              for (;;) ;
      
      I have split out TASK_ZOMBIE and TASK_DEAD into a separate p->exit_state
      field, to allow the detaching of exit-signal/parent/wait-handling from
      descheduling a dead task.  Dead-task freeing is done via PF_DEAD.
      
      Tested the patch on x86 SMP and UP, but all architectures should work
      fine.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] exec: fix posix-timers leak and pending signal loss · fef60c1b
      Roland McGrath authored
      I've found some problems with exec and fixed them with this patch to
      de_thread.
      
      The second problem is that a multithreaded exec loses all pending signals. 
      This is a violation of POSIX rules.  But a moment's thought will show it's
      also just not desirable: if you send a process a SIGTERM while it's in the
      middle of calling exec, you expect either the original program in that
      process or the new program being exec'd to handle that signal or be killed
      by it.  As it stands now, you can try to kill a process and have that
      signal just evaporate if it's multithreaded and calls exec just then.  I
      really don't know what the rationale was behind the de_thread code that
      allocates a new signal_struct.  It doesn't make any sense now.  The other
      code there ensures that the old signal_struct is no longer shared.  Except
      for posix-timers, all the state there is stuff you want to keep.  So my
      changes just keep the old structs when they are no longer shared, and all
      the right state is retained (after clearing out posix-timers).
      
      The final bug is that the cumulative statistics of dead threads and dead
      child processes are lost in the abandoned signal_struct.  This is also
      fixed by holding on to it instead of replacing it.
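      The difference between replacing and keeping the struct can be sketched
      as a toy model. sig_model and its fields are hypothetical stand-ins for
      signal_struct, not its real layout; only the behaviour described above
      is modelled:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for signal_struct. */
struct sig_model {
    int count;                     /* threads sharing this struct */
    unsigned long shared_pending;  /* bitmask of process-wide pending signals */
    unsigned long dead_utime;      /* statistics folded in from dead threads */
    int nr_posix_timers;           /* timers created with timer_create() */
};

/* Previous behaviour as described: allocate a fresh struct, so pending
 * signals and accumulated statistics stay behind in the old one. */
static struct sig_model *exec_replace(struct sig_model *old)
{
    struct sig_model *fresh = calloc(1, sizeof(*fresh));
    if (fresh != NULL) {
        fresh->count = 1;
        old->count--;
    }
    return fresh;
}

/* Patched behaviour: once no other thread shares the struct, keep it and
 * discard only the posix-timers, which exec must not preserve. */
static struct sig_model *exec_keep(struct sig_model *sig)
{
    assert(sig->count == 1);
    sig->nr_posix_timers = 0;
    return sig;
}
```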
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] make rlimit settings per-process instead of per-thread · 31180071
      Roland McGrath authored
      POSIX specifies that the limit settings provided by getrlimit/setrlimit are
      shared by the whole process, not specific to individual threads.  This
      patch changes the behavior of those calls to comply with POSIX.
      
      I've moved the struct rlimit array from task_struct to signal_struct, as it
      has the correct sharing properties.  (This reduces kernel memory usage per
      thread in multithreaded processes by around 100/200 bytes on 32-/64-bit
      machines respectively.)  I took a fairly minimal approach to the locking
      issues with the newly shared struct rlimit array.  It turns out that all
      the code that is checking limits really just needs to look at one word at a
      time (one rlim_cur field, usually).  It's only the few places like
      getrlimit itself (and fork), that require atomicity in accessing a whole
      struct rlimit, so I just used a spin lock for them and no locking for most
      of the checks.  If it turns out that readers of struct rlimit need more
      atomicity where they are now cheap, or less overhead where they are now
      atomic (e.g. fork), then seqcount is certainly the right thing to use for
      them instead of readers using the spin lock.  Though it's in signal_struct,
      I didn't use siglock since the access to rlimits never needs to disable
      irqs and doesn't overlap with other siglock uses.  Instead of adding
      something new, I overloaded task_lock(task->group_leader) for this; it is
      used for other things that are not likely to happen simultaneously with
      limit tweaking.  To me that seems preferable to adding a word, but it would
      be trivial (and arguably cleaner) to add a separate lock for these users
      (or e.g. just use seqlock, which adds two words but is optimal for readers).
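      A sketch of the locking split described above, assuming a hypothetical
      model_signal structure in place of signal_struct and a C11 spin lock in
      place of task_lock(task->group_leader). Whole-struct readers and writers
      take the lock; single-word limit checks do not:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NLIMITS 16  /* hypothetical; the real count is arch-specific */

struct model_rlimit { unsigned long rlim_cur, rlim_max; };

/* Per-process state shared by all threads, like signal_struct. */
struct model_signal {
    struct model_rlimit rlim[MODEL_NLIMITS];
    atomic_flag lock;     /* stands in for task_lock(task->group_leader) */
};

static void model_lock(struct model_signal *s)
{
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ; /* spin */
}

static void model_unlock(struct model_signal *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* getrlimit()/setrlimit()/fork(): need the whole {cur, max} pair, so lock. */
static struct model_rlimit model_getrlimit(struct model_signal *s, int res)
{
    assert(res >= 0 && res < MODEL_NLIMITS);
    model_lock(s);
    struct model_rlimit r = s->rlim[res];
    model_unlock(s);
    return r;
}

static void model_setrlimit(struct model_signal *s, int res, struct model_rlimit r)
{
    assert(res >= 0 && res < MODEL_NLIMITS);
    model_lock(s);
    s->rlim[res] = r;
    model_unlock(s);
}

/* Ordinary limit checks read a single word (rlim_cur) and take no lock.
 * The kernel relies on aligned word loads here; strictly portable C11
 * would use an atomic load to avoid a formal data race. */
static bool model_over_limit(const struct model_signal *s, int res, unsigned long v)
{
    return v > s->rlim[res].rlim_cur;
}
```

      Swapping the reader side to a seqcount, as the text suggests, would
      leave the writers unchanged; that variant is not shown.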
      
      Most of the changes here are just the trivial s/->rlim/->signal->rlim/. 
      
      I stumbled across what must be a long-standing bug, in reparent_to_init.
      It does:
      	memcpy(current->rlim, init_task.rlim, sizeof(*(current->rlim)));
      when surely it was intended to be:
      	memcpy(current->rlim, init_task.rlim, sizeof(current->rlim));
      As rlim is an array, the * in the sizeof expression gets the size of the
      first element, so this just changes the first limit (RLIMIT_CPU).  This is
      for kernel threads, where it's clear that resetting all the rlimits is what
      you want.  With that fixed, the setting of RLIMIT_FSIZE in nfsd is
      superfluous since it will now already have been reset to RLIM_INFINITY.
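      The sizeof difference is easy to check in isolation. The types below are
      hypothetical stand-ins for the task's rlim array; the point is only that
      sizeof(*arr) is one element while sizeof(arr) on an array member is the
      whole array:

```c
#include <assert.h>
#include <string.h>

#define NLIM 16  /* hypothetical number of limits */

struct lim { unsigned long rlim_cur, rlim_max; };
struct task_model { struct lim rlim[NLIM]; };

/* Buggy form: *(t->rlim) is the first element, so only one
 * struct lim (RLIMIT_CPU in the kernel's ordering) is copied. */
static void reset_buggy(struct task_model *t, const struct task_model *init)
{
    memcpy(t->rlim, init->rlim, sizeof(*(t->rlim)));
}

/* Fixed form: sizeof applied to the array member covers all NLIM entries. */
static void reset_fixed(struct task_model *t, const struct task_model *init)
{
    assert(sizeof(t->rlim) == NLIM * sizeof(struct lim));
    memcpy(t->rlim, init->rlim, sizeof(t->rlim));
}
```

      The fixed form only works because rlim is an array member; if it were a
      pointer (for example a function parameter), sizeof would yield the size
      of the pointer instead.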
      
      The other subtlety is removing:
      	tsk->rlim[RLIMIT_CPU].rlim_cur = RLIM_INFINITY;
      in exit_notify, which was to avoid a race signalling during self-reaping
      exit.  As the limit is now shared, a dying thread should not change it for
      others.  Instead, I avoid that race by checking current->state before the
      RLIMIT_CPU check.  (Adding one new conditional in that path is now required
      one way or another, since if not for this check there would also be a new
      race with self-reaping exit later on clearing current->signal that would
      have to be checked for.)
      
      The one loose end left by this patch is with process accounting.
      do_acct_process temporarily resets the RLIMIT_FSIZE limit while writing the
      accounting record.  I left this as it was, but it is now changing a limit
      that might be shared by other threads still running.  I left this in a
      dubious state because it seems to me that process accounting may already
      be more generally a dubious state when it comes to NPTL threads.  I would
      think you would want one record per process, with aggregate data about all
      threads that ever lived in it, not a separate record for each thread.
      I don't use process accounting myself, but if anyone is interested in
      testing it out I could provide a patch to change it this way.
      
      One final note: this does not achieve 100% POSIX compliance with regard to rlimits.
      POSIX specifies that RLIMIT_CPU refers to a whole process in aggregate, not
      to each individual thread.  I will provide patches later on to achieve that
      change, assuming this patch goes in first.
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 17 Sep, 2004 1 commit
    • [PATCH] fix posix-timers leak · c68f9a4d
      Roland McGrath authored
      Exec fails to clean up posix-timers.  This manifests itself in two ways, one
      worse than the other.  In the single-threaded case, it just fails to clear out
      the timers on exec.  POSIX says that exec clears out the timers from
      timer_create (though not the setitimer ones), so it's wrong that a lingering
      timer could fire after exec and kill the process with a signal it's not
      expecting.  In the multi-threaded case, it not only leaves lingering timers,
      but it leaks them entirely when it replaces signal_struct, so they will never
      be freed by the process exiting after that exec.  The new per-user
      RLIMIT_SIGPENDING actually limits the damage here, because a UID will fill up
      its quota with leaked timers and then never be able to use timer_create again
      (that's what my test program does).  But if you have many, many untrusted UIDs,
      this leak could be considered a DoS risk.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  3. 15 Sep, 2004 1 commit
  4. 27 Aug, 2004 1 commit
    • [PATCH] O(1) proc_pid_statm() · 6ac0a8d7
      William Lee Irwin III authored
      Merely removing down_read(&mm->mmap_sem) from task_vsize() is too
      half-assed to let stand. The following patch removes the vma iteration
      as well as the down_read(&mm->mmap_sem) from both task_mem() and
      task_statm() and callers for the CONFIG_MMU=y case in favor of
      accounting the various stats reported at the times of vma creation,
      destruction, and modification. Unlike the 2.4.x patches of the same
      name, this has no per-pte-modification overhead whatsoever.
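      Accounting at vma creation and destruction, rather than walking the vma
      list on every read, can be sketched like this. mm_model, vma_model and
      the counter names are hypothetical; a modification such as mprotect()
      toggling PROT_EXEC would be modelled as a remove followed by an add:

```c
#include <assert.h>

/* Hypothetical per-mm counters, updated whenever a vma changes. */
struct mm_model {
    unsigned long total_vm;   /* all mapped pages */
    unsigned long shared_vm;  /* pages in shared mappings */
    unsigned long exec_vm;    /* pages in executable mappings */
};

struct vma_model {
    unsigned long pages;
    int shared;
    int exec;
};

static void vma_account_add(struct mm_model *mm, const struct vma_model *v)
{
    mm->total_vm += v->pages;
    if (v->shared)
        mm->shared_vm += v->pages;
    if (v->exec)
        mm->exec_vm += v->pages;
}

static void vma_account_remove(struct mm_model *mm, const struct vma_model *v)
{
    assert(mm->total_vm >= v->pages);
    mm->total_vm -= v->pages;
    if (v->shared)
        mm->shared_vm -= v->pages;
    if (v->exec)
        mm->exec_vm -= v->pages;
}

/* statm-style read: O(1) regardless of the number of vmas. */
static unsigned long statm_size(const struct mm_model *mm)
{
    return mm->total_vm;
}
```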
      
      This patch quashes end user complaints of top(1) being slow as well as
      kernel hacker complaints of per-pte accounting overhead simultaneously.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  5. 24 Aug, 2004 3 commits
    • [PATCH] Reduce bkl usage in do_coredump · 77dc05e7
      Josh Aas authored
      This patch reduces BKL usage in do_coredump.  I don't see anywhere that
      the BKL is necessary except for the call to format_corename, which reads
      the core name pattern controlled via sysctl (sys_sysctl holds the BKL).
      
      Also make format_corename() static.
      Signed-off-by: Josh Aas <josha@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] i386 virtual memory layout rework · 8913d55b
      Ingo Molnar authored
        Rework the i386 mm layout to allow applications to allocate more virtual
        memory, and larger contiguous chunks.
      
      
        - the patch is compatible with existing architectures that either make
          use of HAVE_ARCH_UNMAPPED_AREA or use the default mmap() allocator - there
          is no change in behavior.
      
        - 64-bit architectures can use the same mechanism to clean up 32-bit
          compatibility layouts: by defining HAVE_ARCH_PICK_MMAP_LAYOUT and
          providing an arch_pick_mmap_layout() function - which can then decide
          between various mmap() layout functions.
      
        - I also introduced a new personality bit (ADDR_COMPAT_LAYOUT) to signal
          older binaries that don't have PT_GNU_STACK.  x86 uses this to revert
          to the stock layout.  I also changed x86 to not clear the personality bits
          upon exec(), like x86-64 already does.
      
        - once every architecture that uses HAVE_ARCH_UNMAPPED_AREA has defined
          its arch_pick_mmap_layout() function, we can get rid of
          HAVE_ARCH_UNMAPPED_AREA altogether, as a final cleanup.
      
        The new layout generation function (__get_unmapped_area()) got significant
        testing in FC1/2, so I'm pretty confident it's robust.
      
      
        Compiles & boots fine on an 'old' and on a 'new' x86 distro as well.
      
        The two known breakages were:
      
           http://www.redhatconfig.com/msg/67248.html
      
           [ 'cyzload' third-party utility broke. ]
      
           http://www.zipworld.com/au/~akpm/dde.tar.gz
      
           [ your editor broke :-) ]
      
        both were caused by application bugs that did:
      
      	int ret = malloc();
      
      	if (ret <= 0)
      		failure;
      
        such bugs are easy to spot if they happen, and when they do, they can be
        worked around immediately without having to change the binary, via the
        setarch patch.
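      Why such code breaks can be shown with plain integer conversion. The
      address 0xb7f00000 is an assumed example of an mmap()-backed allocation
      near the top of a 32-bit address space; stored in a 32-bit int it becomes
      negative, so the "ret <= 0" test reports a failure that never happened:

```c
#include <stddef.h>
#include <stdint.h>

/* The broken pattern: the pointer lands in a 32-bit int and is tested
 * with "<= 0". Converting an out-of-range value to int32_t is
 * implementation-defined; GCC and Clang wrap it modulo 2^32. */
static int looks_like_failure_buggy(uint32_t addr)
{
    int32_t ret = (int32_t)addr;
    return ret <= 0;
}

/* The robust pattern: keep the pointer type and compare against NULL. */
static int looks_like_failure_fixed(const void *p)
{
    return p == NULL;
}
```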
      
        No other application has been found to be affected, and this particular
        change has already had wide coverage in RHEL3 and exec-shield, where it
        has been in use for more than a year.
      
      
        The setarch utility can be used to trigger the compatibility layout on
        x86, the following version has been patched to take the `-L' option:
      
       	http://people.redhat.com/mingo/flexible-mmap/setarch-1.4-2.tar.gz
      
        "setarch -L i386 <command>" will run the command with the old layout.
      
      From: Hugh Dickins <hugh@veritas.com>
      
        The problem is in the flexible mmap patch: arch_get_unmapped_area_topdown
        is liable to give your mmap vm_start above TASK_SIZE with vm_end wrapped;
        which is confusing, and ends up as that BUG_ON(mm->map_count).
      
        The patch below stops that behaviour, but it's not the full solution:
        wilson_mmap_test -s 1000 then simply cannot allocate memory for the large
        mmap, whereas it works fine non-top-down.
      
        I think it's wrong to interpret a large or rlim_infinite stack rlimit as
        an inviolable request to reserve that much for the stack: it makes much less
        VM available than bottom up, not what was intended.  Perhaps top down should
        go bottom up (instead of belly up) when it fails - but I'd probably better
        leave that to Ingo.
      
        Or perhaps the default should place stack below text (as WLI suggested and
        ELF intended, with its text defaulting to 0x08048000, small progs sharing
        page table between stack and text and data); with a further personality for
        those needing bigger stack.
      
      From: Ingo Molnar <mingo@elte.hu>
      
        - fall back to the bottom-up layout if the stack can grow unlimited (if
        the stack ulimit has been set to RLIM_INFINITY)
      
        - try the bottom-up allocator if the top-down allocator fails - this can
        utilize the hole between the true bottom of the stack and its ulimit, as a
        last-resort effort.
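      The resulting policy can be sketched as below. pick_layout(), get_area()
      and the toy allocators are hypothetical names, 0 stands in for an
      allocation failure, and the addresses returned by the toys are arbitrary:

```c
#include <stdbool.h>

enum layout { LAYOUT_BOTTOM_UP, LAYOUT_TOP_DOWN };

/* Legacy personality or an unlimited stack rlimit: stay bottom-up. */
static enum layout pick_layout(bool compat_layout, bool stack_unlimited)
{
    if (compat_layout || stack_unlimited)
        return LAYOUT_BOTTOM_UP;
    return LAYOUT_TOP_DOWN;
}

typedef unsigned long (*area_allocator)(unsigned long len);

/* Try top-down first; if it fails, fall back to the bottom-up allocator,
 * which can still use the hole between the stack's true bottom and its
 * ulimit. */
static unsigned long get_area(enum layout l, unsigned long len,
                              area_allocator top_down, area_allocator bottom_up)
{
    if (l == LAYOUT_TOP_DOWN) {
        unsigned long addr = top_down(len);
        if (addr != 0)
            return addr;
    }
    return bottom_up(len);
}

/* Toy allocators for illustration: top-down fails for large requests. */
static unsigned long toy_top_down(unsigned long len)
{
    return len > 0x1000UL ? 0 : 0xb7000000UL;
}

static unsigned long toy_bottom_up(unsigned long len)
{
    (void)len;
    return 0x40000000UL;
}
```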
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: cleanup, improve sched <=> fork APIs · 3632d86a
      Nick Piggin authored
      Move balancing and child-runs-first logic from fork.c into sched.c where
      it belongs.
      
      * Consolidate wake_up_forked_process and wake_up_forked_thread into
        wake_up_new_process, and pass in clone_flags as suggested by Linus.  This
        removes a lot of code duplication and allows all logic to be handled in that
        function.
      
      * Don't do balance-on-clone balancing for vfork'ed threads.
      
      * Don't do set_task_cpu or balance-on-clone in wake_up_new_process.
        Instead do it in sched_fork to fix set_cpus_allowed races.
      
      * Don't do child-runs-first for CLONE_VM processes, as there is obviously no
        COW benefit to be had.  This is a big one, it enables Andi's workload to run
        well without clone balancing, because the OpenMP child threads can get
        balanced off to other nodes *before* they start running and allocating
        memory.
      
      * Rename sched_balance_exec to sched_exec: hide the policy from the API.
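      The flag-based decisions in the list above reduce to simple tests on
      clone_flags. A minimal sketch, assuming hypothetical helper names; the
      MODEL_ constants carry the same values as Linux's CLONE_VM and
      CLONE_VFORK:

```c
#include <stdbool.h>

#define MODEL_CLONE_VM    0x00000100UL  /* same value as Linux's CLONE_VM */
#define MODEL_CLONE_VFORK 0x00004000UL  /* same value as Linux's CLONE_VFORK */

/* Child-runs-first exists to spare the parent copy-on-write faults; with a
 * shared address space (CLONE_VM) there is nothing to copy. */
static bool child_runs_first(unsigned long clone_flags)
{
    return !(clone_flags & MODEL_CLONE_VM);
}

/* Balance-on-clone is skipped for vfork'ed children, per the list above. */
static bool balance_on_clone(unsigned long clone_flags)
{
    return !(clone_flags & MODEL_CLONE_VFORK);
}
```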
      
      
      From: Ingo Molnar <mingo@elte.hu>
      
        rename wake_up_new_process -> wake_up_new_task.
      
        in sched.c we are gradually moving away from the overloaded 'process' or
        'thread' notion to the traditional task (or context) naming.
      Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  6. 23 Aug, 2004 1 commit
  7. 18 Jul, 2004 1 commit
    • [PATCH] NX: clean up legacy binary support · 1bb0fa18
      Ingo Molnar authored
      This cleans up legacy x86 binary support by introducing a new
      personality bit: READ_IMPLIES_EXEC, and implements Linus' suggestion to
      add the PROT_EXEC bit on the two affected syscall entry places,
      sys_mprotect() and sys_mmap().  If this bit is set then PROT_READ will
      also add the PROT_EXEC bit - as expected by legacy x86 binaries.  The
      ELF loader will automatically set this bit when it encounters a legacy
      binary.
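      The syscall-entry rule amounts to one conditional on the protection bits.
      A sketch with a hypothetical adjust_prot() helper; PROT_* come from
      <sys/mman.h>, and MODEL_READ_IMPLIES_EXEC uses the value that
      <linux/personality.h> assigns to READ_IMPLIES_EXEC:

```c
#include <sys/mman.h>  /* PROT_NONE, PROT_READ, PROT_EXEC */

#define MODEL_READ_IMPLIES_EXEC 0x0400000UL  /* READ_IMPLIES_EXEC's value */

/* Applied at the sys_mmap()/sys_mprotect() entry points: for a task whose
 * personality has the bit set, readable implies executable. */
static int adjust_prot(unsigned long personality, int prot)
{
    if ((prot & PROT_READ) && (personality & MODEL_READ_IMPLIES_EXEC))
        prot |= PROT_EXEC;
    return prot;
}
```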
      
      This approach avoids the problems the previous ->def_flags solution
      caused.  In particular this patch fixes the PROT_NONE problem in a
      cleaner way (http://lkml.org/lkml/2004/7/12/227), and it should fix the
      ia64 PROT_EXEC problem reported by David Mosberger.  Also,
      mprotect(PROT_READ) done by legacy binaries will do the right thing as
      well. 
      
      the details:
      
      - the personality bit is added to the personality mask upon exec(),
        within the ELF loader, but is not cleared (see the exceptions below). 
        This means that if an environment that already has the bit exec()s a
        new-style binary it will still get the old behavior.
      
      - one exception is setuid/setgid binaries: these will reset the
        bit - thus local attackers cannot manually set the bit and circumvent
        NX protection. Legacy setuid binaries will still get the bit through
        the ELF loader. This gives us maximum flexibility in shaping
        compatibility environments.
      
      - selinux also clears the bit when switching SIDs via exec().
      
      - x86 is the only arch making use of READ_IMPLIES_EXEC currently. Other
        arches will have the pre-NX-patch protection setup they always had.
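      The exec-time rules in the list above can be modelled as a small
      function. exec_model and personality_after_exec() are hypothetical; only
      the bit's value is taken from <linux/personality.h>:

```c
#include <stdbool.h>

#define MODEL_READ_IMPLIES_EXEC 0x0400000UL  /* READ_IMPLIES_EXEC's value */

struct exec_model {
    bool legacy_binary;  /* ELF binary without PT_GNU_STACK */
    bool set_id;         /* setuid or setgid binary */
    bool sid_change;     /* SELinux switches SID on this exec */
};

static unsigned long personality_after_exec(unsigned long pers,
                                            const struct exec_model *e)
{
    /* setuid/setgid execs and SELinux SID switches drop an inherited bit... */
    if (e->set_id || e->sid_change)
        pers &= ~MODEL_READ_IMPLIES_EXEC;
    /* ...but the ELF loader still sets it for legacy binaries. */
    if (e->legacy_binary)
        pers |= MODEL_READ_IMPLIES_EXEC;
    /* In every other case the bit is inherited unchanged. */
    return pers;
}
```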
      
      I have booted an old distro [RH 7.2] and two new PT_GNU_STACK distros
      [SuSE 9.2 and FC2] on an NX-capable CPU - they work just fine and all
      the mapping details are right. I've checked the PROT_NONE test-utility
      as well and it works as expected. I have checked various setuid
      scenarios as well involving legacy and new-style binaries.
      
      an improved setarch utility can be used to set the personality bit
      manually:
      
      	http://redhat.com/~mingo/nx-patches/setarch-1.4-3.tar.gz
      
      the new '-X' flag does it, e.g.:
      
      	./setarch -X linux /bin/cat /proc/self/maps
      
      will trigger the old protection layout even on a new distro.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  8. 29 Jun, 2004 1 commit
    • [PATCH] binfmt misc fd passing via ELF aux vector · 191312cc
      Yoav Zach authored
      The proposed patch uses the aux-vector to pass the fd of the open misc
      binary to the interpreter, instead of using argv[1] for that purpose.
      
      The previous patch, open_nonreadable_binaries, offered the option of
      binfmt_misc opening the binary on behalf of the interpreter.  In case
      binfmt_misc is requested to do that it would pass the file-descriptor of
      the open binary to the interpreter as its second argument (argv[1]).  This
      method of passing the file descriptor was suspected to be problematic,
      since it changes the command line that users expect to see when using tools
      such as 'ps' and 'top'. 
      
      The proposed patch changes the method of passing the fd of the open binary
      to the translator.  Instead of passing it as an argument, binfmt_misc will
      request the ELF loader to pass it as a new element in the aux-vector that
      it prepares on the stack for the ELF interpreter.  With this patch, argv[1]
      will hold the full path to the binary regardless of whether it opened it or
      not.
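      On the interpreter side, the fd can be read from the aux vector with
      glibc's getauxval(). This sketch assumes the new element is AT_EXECFD, as
      in mainline binfmt_elf, and glibc 2.19 or newer, which sets errno to
      ENOENT when an entry is absent:

```c
#include <elf.h>       /* AT_EXECFD */
#include <errno.h>
#include <sys/auxv.h>  /* getauxval() */

/* Returns the fd passed by the kernel in the aux vector, or -1 if the
 * entry is absent (e.g. the program was exec'd directly). */
static int binary_fd_from_auxv(void)
{
    errno = 0;
    unsigned long v = getauxval(AT_EXECFD);
    if (v == 0 && errno == ENOENT)
        return -1;
    return (int)v;
}
```

      The interpreter then reads the binary through this fd, while argv[1]
      keeps the path that tools such as ps and top display.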
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  9. 27 Jun, 2004 1 commit
    • [PATCH] NX (No eXecute) support for x86 · 36bc33ba
      Ingo Molnar authored
      we'd like to announce the availability of the following kernel patch:
      
           http://redhat.com/~mingo/nx-patches/nx-2.6.7-rc2-bk2-AE
      
      which makes use of the 'NX' x86 feature pioneered in AMD64 CPUs and for
      which support has also been announced by Intel. (other x86 CPU vendors,
      Transmeta and VIA announced support as well. Windows support for NX has
      also been announced by Microsoft, for their next service pack.) The NX
      feature is also being marketed as 'Enhanced Virus Protection'. This
      patch makes sure Linux has full support for this hardware feature on x86
      too.
      
      What does this patch do? The pagetable format of current x86 CPUs does
      not have an 'execute' bit. This means that even if an application maps a
      memory area without PROT_EXEC, the CPU will still allow code to be
      executed in this memory. This property is often abused by exploits when
      they manage to inject hostile code into this memory, for example via a
      buffer overflow.
      
      The NX feature changes this and adds a 'don't execute' bit to the PAE
      pagetable format. But since the flag defaults to zero (for compatibility
      reasons), all pages are executable by default and the kernel has to be
      taught to make use of this bit.
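      At the page-table level, the mechanism is one bit per entry. In the
      64-bit PAE entry format NX is bit 63; the helpers and the simplified entry
      layout below are hypothetical, and real entries carry many more flags:

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PTE_PRESENT (UINT64_C(1) << 0)
#define MODEL_PTE_NX      (UINT64_C(1) << 63)  /* NX bit in PAE entries */

/* Build an entry; NX is only set when the CPU supports it, since on other
 * CPUs the bit is reserved. */
static uint64_t make_pte(uint64_t frame_bits, bool prot_exec, bool cpu_has_nx)
{
    uint64_t pte = frame_bits | MODEL_PTE_PRESENT;
    if (!prot_exec && cpu_has_nx)
        pte |= MODEL_PTE_NX;
    return pte;
}

/* A present page is executable unless its NX bit is set. */
static bool pte_executable(uint64_t pte)
{
    return (pte & MODEL_PTE_PRESENT) && !(pte & MODEL_PTE_NX);
}
```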
      
      If the NX feature is supported by the CPU then the patched kernel turns
      on NX and it will enforce userspace executability constraints such as a
      no-exec stack and no-exec mmap and data areas. This means less chance
      for stack overflows and buffer-overflows to cause exploits.
      
      furthermore, the patch also implements 'NX protection' for kernelspace
      code: only the kernel code and modules are executable - so even
      kernel-space overflows are harder (in some cases, impossible) to
      exploit. Here is how kernel code that tries to execute off the stack is 
      stopped:
      
       kernel tried to access NX-protected page - exploit attempt? (uid: 500)
       Unable to handle kernel paging request at virtual address f78d0f40
        printing eip:
       ...
      
      The patch is based on a prototype NX patch written for 2.4 by Intel -
      special thanks go to Suresh Siddha and Jun Nakajima @ Intel. The
      existing NX support in the 64-bit x86_64 kernels has been written by
      Andi Kleen and this patch is modeled after his code.
      
      Arjan van de Ven has also provided lots of feedback and he has
      integrated the patch into the Fedora Core 2 kernel. Test rpms are
      available for download at:
      
          http://redhat.com/~arjanv/2.6/RPMS.kernel/
      
      the kernel-2.6.6-1.411 rpms have the NX patch applied.
      
      here's a quickstart to recompile the vanilla kernel from source with the
      NX patch:
      
          http://redhat.com/~mingo/nx-patches/QuickStart-NX.txt
      
      update:
      
       - make the heap non-executable on PT_GNU_STACK binaries.
      
       - make all data mmap()s (and the heap) executable on !PT_GNU_STACK
         (legacy) binaries. This has no effect on non-NX CPUs, but should be
         much more compatible on NX CPUs. The only effect it has on
         non-NX CPUs is the extra 'x' bit displayed in /proc/PID/maps.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  10. 18 Jun, 2004 2 commits
    • [PATCH] Clean up asm/pgalloc.h include · 1c60f076
      Russell King authored
      This patch cleans up needless includes of asm/pgalloc.h from the fs/
      kernel/ and mm/ subtrees.  Compile-tested on multiple ARM platforms and
      on x86; this patch appears safe.
      
      This patch is part of a larger patch aiming towards getting the include of
      asm/pgtable.h out of linux/mm.h, so that asm/pgtable.h can sanely get at
      things like mm_struct and friends.
      
      I suggest testing in -mm for a while to ensure there aren't any hidden arch
      issues.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Handle non-readable binfmt_misc executables · 79baf43b
      Yoav Zach authored
      <background>
      
      I work in a group that works on enabling the IA-32 Execution Layer
      (http://www.intel.com/pressroom/archive/releases/20040113comp.htm) on Linux.
      In a few words - this is a dynamic translator for IA-32 binaries on IPF
      platform.  Following David Mosberger's advice - we use the binfmt_misc
      mechanism for the invocation of the translator whenever the user tries to
      exec an IA-32 binary.
      
      The EL is meant to help in the migration path from IA-32 to IPF.  From our
      beta customers we learned that in the first stage they tend to keep their
      environment mostly intact, using the legacy IA-32 binaries.
      
      Such an environment has, naturally, setuid and non-readable binaries.  It
      will be useless to ask the administrator to change the settings of such an
      environment - some of them are very complex, and the administrators are
      reluctant to make any changes in a system that already proved itself to be
      robust and secure.  So, our target with these patches is not to enhance the
      support for scripts but rather to allow a translator to be integrated into a
      working environment that is not (and should not be) aware of the fact it's
      being emulated.
      
      As I said before - it is practically hopeless to expect an administrator of
      such a system to change it so that it will suit the current behavior of
      binfmt_misc.  But even if we could do that, I'm not sure it would be a good
      idea - these changes are likely to be less secure than the suggested
      patches -
      
      - In order to execute non-readable binaries the binary will have to be made
        readable, which is obviously less secure than allowing only a trusted
        translator to read it
      
      - There will be no way for the translator to calculate the accurate
        AT_SECURE value for the translated process.  This might end up with the
        translated process running in a non-secured mode when it actually needs to
        be secured.
      
      </background>
      
      
      I prepared a patch that solves a couple of problems that interpreters have
      when invoked via binfmt_misc.  Currently -
      
      1) such interpreters cannot open non-readable binaries
      
      2) the processes will have their credentials and security attributes
         calculated according to interpreter permissions and not those of the
         original binary
      
      the proposed patch solves these problems by -
      
      1) opening the binary on behalf of the interpreter and passing its fd
         instead of the path as argv[1] to the interpreter
      
      2) calling prepare_binprm with the file struct of the binary and not the
         one of the interpreter
      
      The new functionality is enabled by adding a special flag to the registration
      string.  If this flag is not added then old behavior is not changed.
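      The flag lives in the last field of the registration string
      (:name:type:offset:magic:mask:interpreter:flags). This parser is a
      minimal sketch that assumes the flag letter is 'O', the letter
      binfmt_misc.txt uses for the 'open-binary' feature, and it does not
      validate the other fields:

```c
#include <stdbool.h>
#include <string.h>

/* The first character of a registration string is the field delimiter;
 * the flags field follows the 7th delimiter. */
static bool wants_open_binary(const char *reg)
{
    if (reg[0] == '\0')
        return false;
    char delim = reg[0];
    const char *p = reg;
    for (int field = 0; field < 7; field++) {
        p = strchr(p, delim);
        if (p == NULL)
            return false;   /* no flags field present */
        p++;
    }
    return strchr(p, 'O') != NULL;
}
```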
      
      A preliminary version of this patch was sent to the list on 9/1/2003 with the
      title "[PATCH]: non-readable binaries - binfmt_misc 2.6.0-test4".  This new
      version fixes the concerns that were raised about the patch, except for
      calling unshare_files() before allocating a new fd, because that feature
      has not entered 2.6 yet.
      
      
      Arun Sharma <arun.sharma@intel.com> says:
      
      We were going through an internal review of this patch:
      
      http://marc.theaimsgroup.com/?l=linux-kernel&m=107424598901720&w=2
      
      which is in your tree already.  I'm not sure if this line of code got
      sufficient review.
      
      +               /* call prepare_binprm before switching to interpreter's file
      +                * so that all security calculation will be done according to
      +                * binary and not interpreter */
      +               retval = prepare_binprm(bprm);
      
      The case that concerns me is: unprivileged interpreter and a privileged
      binary.  One can use binfmt_misc to execute untrusted code (interpreter) with
      elevated privileges.  One could argue that all binfmt_misc interpreters are
      trusted, because only root can register them.  But that's a change from the
      traditional behavior of binfmt_misc (and binfmt_script).
      
      
      (Update):
      
      Arun pointed out that calculating the process credentials according to the
      binary that needs to be translated is a bit risky, since it requires the
      administrator to pay extra attention not to register an interpreter which is
      not intended to run with root credentials.
      
      After discussing this issue with him, I would like to propose a modified
      patch: The old patch did 2 things - 1) open the binary for reading and 2)
      calculate the credentials according to the binary.
      
      I removed the riskier part of changing the credentials calculation, so the
      revised patch only opens the binary for reading.  It also includes a few words
      of warning in the description of the 'open-binary' feature in
      binfmt_misc.txt, and makes the function entry_status print the flags in use.
      
      As for the 'credentials' part of the patch, I will prepare a separate patch
      for it, send it again to the LKML, describe the problem and ask for people's
      comments.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  11. 03 Jun, 2004 1 commit
    • [PATCH] sched: balance-on-exec fix · eb313b41
      Andrew Morton authored
      From: Jack Steiner <steiner@sgi.com>
      
      It looks like the call to sched_balance_exec() from do_execve() is in the
      wrong spot.  The code calls sched_balance_exec() before determining whether
      "filename" actually exists.
      
      In many cases, users have several entries in $PATH.  If a full path name is
      not specified on the 'exec' call, the library code iterates through the files
      in the PATH list until it finds the program.  This can result in numerous
      migrations of the parent process before the program is actually found.
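      The fix amounts to reordering: look the file up first, and only then run
      the balancing hook. A toy model with hypothetical function names and a
      counter standing in for migrations:

```c
#include <stdbool.h>
#include <string.h>

static int migrations;  /* counts calls to the balancing hook */

/* Hypothetical stand-in for sched_balance_exec(). */
static void balance_exec_hook(void)
{
    migrations++;
}

/* Hypothetical stand-in for opening the executable: only one path exists. */
static bool open_exec_model(const char *path)
{
    return strcmp(path, "/usr/bin/prog") == 0;
}

/* Fixed ordering: the hook runs only after the file has been found. */
static int execve_model(const char *path)
{
    if (!open_exec_model(path))
        return -1;          /* lookup failed: no migration happened */
    balance_exec_hook();
    return 0;
}
```

      Failed PATH probes return before the hook runs, so only the successful
      lookup can migrate the task.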
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  12. 29 May, 2004 1 commit
    • [PATCH] sparse: bits and pieces · 4d306504
      Alexander Viro authored
      Independent minor bits caught by sparse:
      
       - paride.h mixing void and int in ? :, used always in a void context
       - ide-iops.c: return insw() - insw is void()
       - scsi/constants.c uses undefined macros in #if; added #define to 0 in
         case that used to leave it undefined
       - usb/host/hcd.h: fixed-point arithmetics in constant
       - fs/exec.c: missing UL on a large constant
       - fs/locks.c: #if where #ifdef should've been
       - fs.h: missing UL on MAX_LFS_FILESIZE in 64bit case
      4d306504
  13. 22 May, 2004 8 commits
    • Andrew Morton
      [PATCH] rmap 39 add anon_vma rmap · 8aa3448c
      Andrew Morton authored
      From: Hugh Dickins <hugh@veritas.com>
      
      Andrea Arcangeli's anon_vma object-based reverse mapping scheme for anonymous
      pages.  Instead of tracking anonymous pages by pte_chains or by mm, this
      tracks them by vma.  But because vmas are frequently split and merged
      (particularly by mprotect), a page cannot point directly to its vma(s), but
      instead to an anon_vma list of those vmas likely to contain the page - a list
      on which vmas can easily be linked and unlinked as they come and go.  The vmas
      on one list are all related, either by forking or by splitting.
      
      This has three particular advantages over anonmm: it copes effortlessly
      with mremap moves; it no longer needs page_table_lock to protect an mm's
      vma tree, since try_to_unmap finds vmas via page -> anon_vma -> vma
      instead of using find_vma; and it should use less CPU for swapout, since it
      can locate its anonymous vmas more quickly.
      
      It does have disadvantages too: a lot more change in mmap.c to deal with
      anon_vmas, though small straightforward additions now that the vma merging has
      been refactored there; more lowmem needed for each anon_vma and vma structure;
      an additional restriction on the merging of vmas (cannot be merged if already
      assigned different anon_vmas, since then their pages will be pointing to
      different heads).
      
      (There would be no need to enlarge the vma structure if anonymous pages
      belonged only to anonymous vmas; but private file mappings accumulate
      anonymous pages by copy-on-write, so need to be listed in both anon_vma and
      prio_tree at the same time.  A different implementation could avoid that by
      using anon_vmas only for purely anonymous vmas, and use the existing prio_tree
      to locate cow pages - but that would involve a long search for each single
      private copy, probably not a good idea.)
      
      Where before the vm_pgoff of a purely anonymous (not file-backed) vma was
      meaningless, now it represents the virtual start address at which that vma is
      mapped - which the standard file pgoff manipulations treat linearly as vmas
      are split and merged.  But if mremap moves the vma, then it generally carries
      its original vm_pgoff to the new location, so pages shared with the old
      location can still be found.  Magic.
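The vm_pgoff arithmetic can be illustrated with a small C sketch. The struct and function names are hypothetical and 4 KiB pages are assumed. The two functions show how an anonymous page's index is derived from its faulting address, and how the same index is mapped back to an address in a vma that has since been moved by mremap or split.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the few vm_area_struct fields used here. */
struct anon_vma_sketch_vma {
    uintptr_t vm_start;      /* first virtual address of the vma */
    uintptr_t vm_end;        /* one past its last address */
    unsigned long vm_pgoff;  /* anon vma: vm_start >> PAGE_SHIFT when created */
};

/* Index recorded for an anonymous page first faulted in at addr. */
unsigned long sketch_linear_page_index(const struct anon_vma_sketch_vma *vma,
                                       uintptr_t addr)
{
    return ((addr - vma->vm_start) >> SKETCH_PAGE_SHIFT) + vma->vm_pgoff;
}

/* Reverse lookup: where does page index `index` sit inside vma?
 * Returns 0 if the page would fall outside the vma. */
uintptr_t sketch_vma_address(const struct anon_vma_sketch_vma *vma,
                             unsigned long index)
{
    uintptr_t addr;

    if (index < vma->vm_pgoff)
        return 0;
    addr = vma->vm_start +
           ((uintptr_t)(index - vma->vm_pgoff) << SKETCH_PAGE_SHIFT);
    if (addr < vma->vm_start || addr >= vma->vm_end)
        return 0;
    return addr;
}
```

With these definitions, a page faulted at 0x403000 in a vma starting at 0x400000 (vm_pgoff 0x400) gets index 0x403. If mremap later moves the vma to 0x800000 but keeps vm_pgoff, the page is found at 0x803000. If the vma is instead split at 0x402000, the upper part has vm_pgoff 0x402 and still resolves the page to 0x403000.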
      
      Hugh has massaged it somewhat: building on the earlier rmap patches, this
      patch is a fifth of the size of Andrea's original anon_vma patch.  Please note
      that this posting will be his first sight of this patch, which he may or may
      not approve.
      8aa3448c
    • Andrew Morton
      [PATCH] rmap 37 page_add_anon_rmap vma · e1fd9cc9
      Andrew Morton authored
      From: Hugh Dickins <hugh@veritas.com>
      
      Silly final patch for anonmm rmap: change page_add_anon_rmap's mm arg to vma
      arg like anon_vma rmap, to smooth the transition between them.
      e1fd9cc9
    • Andrew Morton
      [PATCH] rmap 33 install_arg_page vma · 114c71ee
      Andrew Morton authored
      From: Hugh Dickins <hugh@veritas.com>
      
      anon_vma will need to pass vma to put_dirty_page, so change it and its
      various callers (setup_arg_pages and its 32-on-64-bit arch variants); and
      please, let's rename it to install_arg_page.
      
      An earlier attempt to do this (rmap 26 __setup_arg_pages) tried to clean up
      those callers instead, but failed to boot; so now apply rmap 27's memset
      initialization of vmas to these callers too, which relieves them of
      needing the recently included linux/mempolicy.h.
      
      While there, moved install_arg_page's flush_dcache_page up before
      page_table_lock - doesn't in fact matter at all, just saves one worry when
      researching flush_dcache_page locking constraints.
      114c71ee
    • Andrew Morton
      [PATCH] rmap 27 memset 0 vma · c8ba2065
      Andrew Morton authored
      From: Hugh Dickins <hugh@veritas.com>
      
      We're NULLifying more and more fields when initializing a vma
      (mpol_set_vma_default does that too, if configured to do anything).  Now use
      memset to avoid specifying fields, and save a little code too.
      
      (Yes, I realize anon_vma will want to set vm_pgoff non-0, but I think that
      will be better handled at the core, since anon vm_pgoff is negotiable up until
      an anon_vma is actually assigned.)
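Here is a minimal sketch of the memset approach. It uses a hypothetical cut-down vma structure rather than the real `vm_area_struct`. Clearing the whole structure first means any field added later, such as a NUMA policy pointer, starts out as 0 or NULL without every caller having to name it. This relies on NULL being all-bits-zero, which holds on the architectures Linux supports but is not promised by ISO C.

```c
#include <string.h>

/* Hypothetical, much-reduced vma with the kinds of fields the message means. */
struct memset_sketch_vma {
    unsigned long vm_start, vm_end, vm_pgoff, vm_flags;
    void *vm_mm;
    void *vm_file;
    void *vm_private_data;
    void *vm_policy;                  /* stands in for a NUMA policy pointer */
    struct memset_sketch_vma *vm_next;
};

void sketch_init_vma(struct memset_sketch_vma *vma, void *mm,
                     unsigned long start, unsigned long end,
                     unsigned long flags)
{
    /* One memset clears every field (zero pgoff, NULL pointers),
     * so the caller only sets the fields it actually cares about. */
    memset(vma, 0, sizeof(*vma));
    vma->vm_mm = mm;
    vma->vm_start = start;
    vma->vm_end = end;
    vma->vm_flags = flags;
}
```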
      c8ba2065
    • Andrew Morton
      [PATCH] rmap 16: pretend prio_tree · fc96c90f
      Andrew Morton authored
      From: Hugh Dickins <hugh@veritas.com>
      
      Pave the way for prio_tree by switching over to its interfaces, but actually
      still implement them with the same old lists as before.
      
      Most of the vma_prio_tree interfaces are straightforward.  The interesting one
      is vma_prio_tree_next, used to search the tree for all vmas which overlap the
      given range: unlike the list_for_each_entry it replaces, it does not find
      every vma, just those that match.
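The difference between the old list walk and the prio_tree-style iterator can be sketched as below. Everything here is hypothetical. Vmas are kept in a plain array, and `pretend_prio_tree_next` scans it linearly, much as this patch keeps the old lists behind the new interface. Unlike a plain list walk, it only returns vmas whose file page range overlaps `[first, last]`.

```c
/* Hypothetical file-backed vma: which page offsets of the file it maps. */
struct pt_sketch_vma {
    unsigned long vm_pgoff;  /* first file page mapped */
    unsigned long npages;    /* number of pages mapped; assumed >= 1 */
};

/*
 * "Pretend" iterator: returns the index of the next vma after `prev`
 * (pass -1 to start) whose pages overlap [first, last], or -1 when none
 * remain. A real priority search tree would avoid the linear scan.
 */
int pretend_prio_tree_next(const struct pt_sketch_vma *vmas, int n, int prev,
                           unsigned long first, unsigned long last)
{
    for (int i = prev + 1; i < n; i++) {
        unsigned long vfirst = vmas[i].vm_pgoff;
        unsigned long vlast = vmas[i].vm_pgoff + vmas[i].npages - 1;

        if (vfirst <= last && vlast >= first)  /* the two ranges overlap */
            return i;
    }
    return -1;
}
```

A caller iterates with `for (int i = -1; (i = pretend_prio_tree_next(vmas, n, i, first, last)) >= 0; )`. Swapping in a real tree later then changes only the iterator, not its callers.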
      
      But this does leave handling of nonlinear vmas in a very unsatisfactory state:
      for now we have to search again over the maximum range to find all the
      nonlinear vmas which might contain a page, which of course takes away the
      point of the tree.  Fixed in a later patch of this batch.
      
      There is no need to initialize vma linkage all over, just do it before
      inserting the vma in list or tree.  /proc/pid/statm had an odd test for its
      shared count: simplified to an equivalent test on vm_file.
      fc96c90f
    • Andrew Morton
      [PATCH] small numa api fixups · e52c02f7
      Andrew Morton authored
      From: Christoph Hellwig <hch@lst.de>
      
      - don't include mempolicy.h in sched.h and mm.h when a forward declaration
        is enough.  Andi argued against that in the past, but I'd really hate to add
        another header to two of the includes used in basically every driver when we
        can include it in the six files actually needing it instead (that number is
        for my ppc32 system, maybe other arches need more includes in their
        directories)
      
      - make numa api fields in task_struct conditional on CONFIG_NUMA, this gives
        us a few ugly ifdefs but avoids wasting memory on non-NUMA systems.
      e52c02f7
    • Andrew Morton
      [PATCH] numa api: Add VMA hooks for policy · c78b023f
      Andrew Morton authored
      From: Andi Kleen <ak@suse.de>
      
      NUMA API adds a policy to each VMA.  During VMA creation, merging and
      splitting, these policies must be handled properly.  This patch adds the
      calls to do this.
      
      It is a nop when CONFIG_NUMA is not defined.
      c78b023f
    • Andrew Morton
      [PATCH] rmap 9 remove pte_chains · 123e4df7
      Andrew Morton authored
      From: Hugh Dickins <hugh@veritas.com>
      
      Lots of deletions: the next patch will put in the new anon rmap, which
      should look clearer if first we remove all of the old pte-pointer-based
      rmap from the core in this patch - which therefore leaves anonymous rmap
      totally disabled, anon pages locked in memory until process frees them.
      
      Leave arch files (and page table rmap) untouched for now, clean them up in
      a later batch.  A few constructive changes amidst all the deletions:
      
      Choose names (e.g.  page_add_anon_rmap) and args (e.g.  no more pteps) now
      so we need not revisit so many files in the next patch.  Inline function
      page_dup_rmap for fork's copy_page_range simply bumps mapcount under lock.
      cond_resched_lock added in copy_page_range.  Struct page rearranged: no pte
      union, just mapcount moved next to atomic count, so two ints can occupy one
      long on 64-bit; i386 struct page now 32 bytes even with PAE.  Never pass
      PageReserved to page_remove_rmap, only do_wp_page did so.
      
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Move page_add_anon_rmap's BUG_ON(page_mapping(page)) inside the rmap_lock
        (well, might as well just check mapping if !mapcount then): if this page is
        being mapped or unmapped on another cpu at the same time, page_mapping's
        PageAnon(page) and page->mapping are volatile.
      
        But page_mapping(page) is used more widely: I've a nasty feeling that
        clear_page_anon, page_add_anon_rmap and/or page_mapping need barriers added
        (also in 2.6.6 itself),
      123e4df7
  14. 26 Apr, 2004 1 commit
    • Andrew Morton
      [PATCH] credentials locking fix · 10c189cd
      Andrew Morton authored
      From: Chris Wright <chrisw@osdl.org>
      
      Contributions from:
      Stephen Smalley <sds@epoch.ncsc.mil>
      Andy Lutomirski <luto@stanford.edu>
      
      During exec the LSM bprm_apply_creds() hooks may transition the program to a
      new security context (like setuid binaries).  The security context of the new
      task depends on state such as whether the task is being ptraced.
      
      ptrace_detach() doesn't take the task_lock() when clearing task->ptrace.  So
      a race is possible: a process starts off being ptraced, the malicious
      ptracer detaches, and if any checks against task->ptrace are done more
      than once, the results are indeterminate.
      
      This patch ensures task_lock() is held while bprm_apply_creds() hooks are
      called, keeping it safe against ptrace_attach() races.  Additionally, tests
      against task->ptrace (and ->fs->count, ->files->count and ->sighand->count, all
      of which signify potential unsafe resource sharing during a security context
      transition) are done only once, and the results are passed down to the hooks,
      making it safe against ptrace_detach() races.
      
      Additionally:
      
      - s/must_must_not_trace_exec/unsafe_exec/
      - move unsafe_exec() call above security_bprm_apply_creds() call rather than
        in call for readability.
      - fix dummy hook to honor the case where root is ptracing
      - couple minor formatting/spelling fixes
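The locking pattern the patch describes can be sketched in user-space C. This is an illustration under stated assumptions, not kernel code. A pthread mutex stands in for `task_lock()`, the struct and function names are hypothetical, and the reduced `unsafe` flags only mimic the idea behind `unsafe_exec()`. The point is that the ptrace and sharing state is read exactly once, under the same lock that covers the credential change. The hook acts only on that snapshot, so a concurrent detach cannot change the outcome halfway through.

```c
#include <pthread.h>

#define SKETCH_UNSAFE_PTRACE 0x1
#define SKETCH_UNSAFE_SHARE  0x2

/* Hypothetical, heavily reduced task: only what the check needs. */
struct cred_sketch_task {
    pthread_mutex_t lock;   /* stands in for task_lock() */
    int ptraced;            /* stands in for task->ptrace */
    int fs_users, files_users, sighand_users;
    unsigned int uid, euid;
};

/* Evaluate the "unsafe" conditions once; caller must hold t->lock. */
static int sketch_unsafe_exec(const struct cred_sketch_task *t)
{
    int unsafe = 0;

    if (t->ptraced)
        unsafe |= SKETCH_UNSAFE_PTRACE;
    if (t->fs_users > 1 || t->files_users > 1 || t->sighand_users > 1)
        unsafe |= SKETCH_UNSAFE_SHARE;
    return unsafe;
}

/* The "hook": it only sees the snapshot and never re-reads t->ptraced. */
static void sketch_apply_creds(struct cred_sketch_task *t,
                               unsigned int new_euid, int unsafe,
                               int capable_setuid)
{
    if (new_euid != t->uid && unsafe && !capable_setuid)
        new_euid = t->uid;  /* refuse the privilege change */
    t->euid = new_euid;
}

void sketch_exec_creds(struct cred_sketch_task *t, unsigned int new_euid,
                       int capable_setuid)
{
    pthread_mutex_lock(&t->lock);
    sketch_apply_creds(t, new_euid, sketch_unsafe_exec(t), capable_setuid);
    pthread_mutex_unlock(&t->lock);
}
```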
      10c189cd
  15. 21 Apr, 2004 1 commit
    • Andrew Morton
      [PATCH] compute_creds race · b7fbe52c
      Andrew Morton authored
      From: Andy Lutomirski <luto@myrealbox.com>
      
      Fixes from me, Olaf Dietsche <olaf+list.linux-kernel@olafdietsche.de>
      
      In fs/exec.c, compute_creds does:
      
      	task_lock(current);
      	if (bprm->e_uid != current->uid || bprm->e_gid != current->gid) {
      		current->mm->dumpable = 0;
      
      		if (must_not_trace_exec(current)
      		    || atomic_read(&current->fs->count) > 1
      		    || atomic_read(&current->files->count) > 1
      		    || atomic_read(&current->sighand->count) > 1) {
      			if(!capable(CAP_SETUID)) {
      				bprm->e_uid = current->uid;
      				bprm->e_gid = current->gid;
      			}
      		}
      	}
      
      	current->suid = current->euid = current->fsuid = bprm->e_uid;
      	current->sgid = current->egid = current->fsgid = bprm->e_gid;
      
      	task_unlock(current);
      
      	security_bprm_compute_creds(bprm);
      
      I assume the task_lock is to prevent another process (on SMP or preempt)
      from ptracing the execing process between the check and the assignment.  If
      that's the concern, then the fact that the lock is dropped before the call
      to security_bprm_compute_creds means that, if security_bprm_compute_creds
      does anything interesting, there's a race.
      
      For my (nearly complete) caps patch, I obviously need to fix this.  But I
      think it may be exploitable now.  Suppose there are two processes, A (the
      malicious code) and B (which uses exec).  B starts out unprivileged (A and
      B have, e.g., uid and euid = 500).
      
      1. A ptraces B.
      
      2. B calls exec on some setuid-root program.
      
      3. in cap_bprm_set_security, B sets bprm->cap_permitted to the full
         set.
      
      4. B gets to compute_creds in exec.c, calls task_lock, and does not
         change its uid.
      
      5. B calls task_unlock.
      
      6. A detaches from B (on preempt or SMP).
      
      7. B gets to task_lock in cap_bprm_compute_creds, changes its
         capabilities, and returns from compute_creds into load_elf_binary.
      
      8. load_elf_binary calls create_elf_tables (line 852 in 2.6.5-mm1),
         which calls cap_bprm_secureexec (through LSM), which returns false (!).
      
      9. exec finishes.
      
      The setuid program is now running with uid=euid=500 but full permitted
      capabilities.  There are two (or three) ways to effectively get local root
      now:
      
      1. IIRC, Linux 2.4 doesn't check capabilities in ptrace, so A could
         just ptrace B again.
      
      2. LD_PRELOAD.
      
      3. There are probably programs that will misbehave on their own under
         these circumstances.
      
      Is there some reason why this is not doable?
      
      The patch renames bprm_compute_creds to bprm_apply_creds and moves all uid
      logic into the hook, where the test and the resulting modification can both
      happen under task_lock().
      
      This way, out-of-tree LSMs will fail to compile instead of malfunctioning. 
      It should also make life easier for LSMs and will certainly make it easier
      for me to finish the cap patch.
      b7fbe52c
  16. 17 Apr, 2004 1 commit
    • Petr Vandrovec
      [PATCH] Fix exec in multithreaded application · bea63af0
      Petr Vandrovec authored
      The recent controlling terminal changes broke exec from multithreaded
      applications because de_thread was not updated to the new arrangement.  I
      know that I should not have an LD_PRELOAD library which automatically
      creates one thread, but it looked like a cool solution to the problem I
      had.
      
      de_thread must initialize the controlling terminal information in the
      new thread group.
      bea63af0
  17. 12 Apr, 2004 4 commits
    • Andrew Morton
      [PATCH] rmap 1 linux/rmap.h · 4c4acd24
      Andrew Morton authored
      From: Hugh Dickins <hugh@veritas.com>
      
      First of a batch of three rmap patches.  This initial batch paves the way
      for a move to some form of object-based rmap (probably Andrea's, but
      drawing from mine too) and makes almost no functional change by itself.  A
      few days will intervene before the next batch, to give the struct page
      changes in the second patch some exposure before proceeding.
      
      rmap 1 create include/linux/rmap.h
      
      Start small: linux/rmap-locking.h has already gathered some declarations
      unrelated to locking, and the rest of the rmap declarations were over in
      linux/swap.h: gather them all together in linux/rmap.h, and rename the
      pte_chain_lock to rmap_lock.
      4c4acd24
    • Andrew Morton
      [PATCH] fix posix-timers to have proper per-process scope · 0e568881
      Andrew Morton authored
      From: Roland McGrath <roland@redhat.com>
      
      The posix-timers implementation associates timers with the creating thread
      and destroys timers when their creator thread dies.  POSIX clearly
      specifies that these timers are per-process, and a timer should not be torn
      down when the thread that created it exits.  I hope there won't be any
      controversy on what the correct semantics are here, since POSIX is clear
      and the Linux feature is called "posix-timers".
      
      The attached program built with NPTL -lrt -lpthread demonstrates the bug.
      The program is correct by POSIX, but fails on Linux.  Note that until
      just the other day, NPTL had a trivial bug that always disabled its use of
      kernel timer syscalls (check strace for lack of timer_create/SYS_259).  So
      unless you have built your own NPTL libs very recently, you probably won't
      see the kernel calls actually used by this program.
      
      Also attached is my patch to fix this.  It (you guessed it) moves the
      posix_timers field from task_struct to signal_struct.  Access is now
      governed by the siglock instead of the task lock.  exit_itimers is called
      from __exit_signal, i.e.  only on the death of the last thread in the
      group, rather than from do_exit for every thread.  Timers' it_process
      fields store the group leader's pointer, which won't die.  For the case of
      SIGEV_THREAD_ID, I hold a ref on the task_struct for it_process to stay
      robust in case the target thread dies; the ref is released and the dangling
      pointer cleared when the timer fires and the target thread is dead.  (This
      should only come up in a buggy user program, so no one cares exactly how the
      kernel handles that case.  But I think what I did is robust and sensical.)
      
      /* Test for bogus per-thread deletion of timers.  */
      
      #include <stdio.h>
      #include <error.h>
      #include <time.h>
      #include <signal.h>
      #include <stdint.h>
      #include <sys/time.h>
      #include <sys/resource.h>
      #include <unistd.h>
      #include <pthread.h>
      
      /* Creating timers in another thread should work too.  */
      static void *do_timer_create(void *arg)
      {
      	struct sigevent *const sigev = arg;
      	timer_t *const timerId = sigev->sigev_value.sival_ptr;
      	if (timer_create(CLOCK_REALTIME, sigev, timerId) < 0) {
      		perror("timer_create");
      		return NULL;
      	}
      	return timerId;
      }
      
      int main(void)
      {
      	int i, res;
      	timer_t timerId;
      	struct itimerspec itval;
      	struct sigevent sigev;
      
      	itval.it_interval.tv_sec = 2;
      	itval.it_interval.tv_nsec = 0;
      	itval.it_value.tv_sec = 2;
      	itval.it_value.tv_nsec = 0;
      
      	sigev.sigev_notify = SIGEV_SIGNAL;
      	sigev.sigev_signo = SIGALRM;
      	sigev.sigev_value.sival_ptr = (void *)&timerId;
      
      	for (i = 0; i < 100; i++) {
      		printf("cnt = %d\n", i);
      
      		pthread_t thr;
      		res = pthread_create(&thr, NULL, &do_timer_create, &sigev);
      		if (res) {
      			error(0, res, "pthread_create");
      			continue;
      		}
      		void *val;
      		res = pthread_join(thr, &val);
      		if (res) {
      			error(0, res, "pthread_join");
      			continue;
      		}
      		if (val == NULL)
      			continue;
      
      		res = timer_settime(timerId, 0, &itval, NULL);
      		if (res < 0)
      			perror("timer_settime");
      
      		res = timer_delete(timerId);
      		if (res < 0)
      			perror("timer_delete");
      	}
      
      	return 0;
      }
      0e568881
    • Andrew Morton
      [PATCH] Non-Exec stack support · 01cc53b2
      Andrew Morton authored
      From: Kurt Garloff <garloff@suse.de>
      
      A patch to parse ELF binaries for a PT_GNU_STACK program header and set the
      stack non-executable if possible.  Most parts have been shamelessly stolen
      from Ingo Molnar's more ambitious exec-shield patch:
      http://people.redhat.com/mingo/exec-shield/exec-shield-2.6.4-C9
      
      The toolchain now supports marking binaries with a PT_GNU_STACK header
      without the x bit as needed.
      
      If no such header is found, we leave the stack to whatever the arch defaults
      to.  If there is one, we explicitly disable the VM_EXEC bit if no x bit is
      found, and otherwise explicitly enable it.
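The check can be sketched in user-space C with the `<elf.h>` definitions from glibc. The function name is hypothetical, and it only handles 64-bit ELF images in the host's byte order. Its three return values mirror the three cases above: no PT_GNU_STACK header, a header without PF_X, and a header with PF_X.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if PT_GNU_STACK asks for an executable stack, 0 if it asks
 * for a non-executable one, and -1 if there is no PT_GNU_STACK header
 * (leave the arch default) or the image is not a usable ELF64 file.
 */
int elf64_stack_exec(const unsigned char *img, size_t len)
{
    Elf64_Ehdr eh;

    if (len < sizeof(eh))
        return -1;
    memcpy(&eh, img, sizeof(eh));  /* copy to avoid unaligned access */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr) ||
        eh.e_phoff > len)
        return -1;

    for (size_t i = 0; i < eh.e_phnum; i++) {
        size_t off = (size_t)eh.e_phoff + i * sizeof(Elf64_Phdr);
        Elf64_Phdr ph;

        if (off > len || len - off < sizeof(ph))  /* truncated table */
            return -1;
        memcpy(&ph, img + off, sizeof(ph));
        if (ph.p_type == PT_GNU_STACK)
            return (ph.p_flags & PF_X) ? 1 : 0;
    }
    return -1;  /* no PT_GNU_STACK: arch default applies */
}
```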
      01cc53b2
    • Andrew Morton
      [PATCH] move job control fields from task_struct to signal_struct · 7860b371
      Andrew Morton authored
      From: Roland McGrath <roland@redhat.com>
      
      This patch moves all the fields relating to job control from task_struct to
      signal_struct, so that all this info is properly per-process rather than
      being per-thread.
      7860b371
  18. 25 Feb, 2004 1 commit
    • Andrew Morton
      [PATCH] add syscalls.h · 0bab0642
      Andrew Morton authored
      From: "Randy.Dunlap" <rddunlap@osdl.org>
      
      Add syscalls.h, which contains prototypes for the kernel's system calls.
      Replace open-coded declarations all over the place.  This patch found a
      couple of prior bugs.  It appears to be more important with -mregparm=3 as we
      discover more asmlinkage mismatches.
      
      Some syscalls have arch-dependent arguments, so their prototypes are in the
      arch-specific unistd.h.  Maybe it should have been asm/syscalls.h, but there
      were already arch-specific syscall prototypes in asm/unistd.h...
      
      Tested on x86, ia64, x86_64, ppc64, s390 and sparc64.  May cause
      trivial-to-fix build breakage on other architectures.
      0bab0642
  19. 18 Feb, 2004 1 commit
    • Andrew Morton
      [PATCH] Enable coredumps > 2GB · 95b387a4
      Andrew Morton authored
      From: Andi Kleen <ak@muc.de>
      
      Some x86-64 users were complaining that coredumps >2GB don't work.
      
      This will enable large coredumps for everybody.  Apparently the 32bit
      gdb/binutils cannot handle them, but I hear the binutils people are working
      on fixing that.  I doubt it will harm people - unreadable coredumps are no
      worse than no coredump, and it won't make any difference in space usage if
      you get a 1.99GB or a 2.5GB coredump.  So just enable it unconditionally.
      If it really is a problem for 32bit, the rlimit defaults in resource.h
      could be changed.
      
      For file systems that don't support O_LARGEFILE you should just get
      truncated coredumps for big address spaces.
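A process, or a shell via `ulimit -c`, can already cap its own dump size with RLIMIT_CORE. That is the per-process side of the rlimit defaults mentioned above. The helper below is a hypothetical sketch using the standard `getrlimit()`/`setrlimit()` calls. It clamps the requested soft limit to the hard limit, since an unprivileged process cannot raise the soft limit above it.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper: set this process's core-dump size soft limit,
 * clamped to the hard limit. Returns 0 on success, -1 on failure
 * (errno is set by getrlimit/setrlimit).
 */
int set_core_limit(rlim_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
        bytes = rl.rlim_max;  /* cannot exceed the hard limit unprivileged */
    rl.rlim_cur = bytes;
    return setrlimit(RLIMIT_CORE, &rl);
}
```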
      95b387a4
  20. 19 Jan, 2004 1 commit
    • Andrew Morton
      [PATCH] nfs: Fix an open intent bug · 7de3a7b2
      Andrew Morton authored
      From: Trond Myklebust <trond.myklebust@fys.uio.no>
      
      The following patch fixes a bug when initializing the intent structure
      in sys_uselib(): intents use the FMODE_READ convention rather than
      O_RDONLY.
      
      It also adds a missing open intent to open_exec(). This ensures that NFS
      clients will do the necessary close-to-open data cache consistency
      checking.
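The O_RDONLY/FMODE_READ mix-up comes from O_RDONLY being 0, so an intent initialized with it carries no read bit at all. The sketch below assumes the 2.6-era kernel-internal values FMODE_READ = 1 and FMODE_WRITE = 2. These are not exported to user space, hence the hypothetical `SKETCH_` names. It converts open(2) access modes with the "add one" trick, similar to the `(flags + 1) & O_ACCMODE` conversion the kernel of that era used when opening files.

```c
#include <fcntl.h>

/* Assumed kernel-internal values (2.6 era); not a user-space API. */
#define SKETCH_FMODE_READ  0x1
#define SKETCH_FMODE_WRITE 0x2

/* O_RDONLY (0) -> READ, O_WRONLY (1) -> WRITE, O_RDWR (2) -> READ|WRITE. */
int sketch_open_flags_to_fmode(int flags)
{
    return ((flags & O_ACCMODE) + 1) & O_ACCMODE;
}
```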
      7de3a7b2
  21. 30 Dec, 2003 1 commit
    • Andrew Morton
      [PATCH] Fix memleak on execve failure · 7764b6de
      Andrew Morton authored
      From: IWAMOTO Toshihiro <iwamoto@valinux.co.jp>
      
      I found linux-2.6.0-test11 leaks memory when execve fails.  I've also
      checked the bitkeeper tree and the problem seems to be unchanged.
      
      The attached patch is a partial backout of bitkeeper rev.  1.87 of
      fs/exec.c.  I guess the original change was a simple mistake.
      (free_arg_pages() is a NOP when CONFIG_MMU is defined).
      7764b6de
  22. 29 Dec, 2003 2 commits
    • Andrew Morton
      [PATCH] use new steal_locks helper · 02c541ec
      Andrew Morton authored
      From: Chris Wright <chrisw@osdl.org>
      
      Use the new steal_locks helper to steal the locks from the old files struct
      left from unshare_files() when the new unshared struct files gets used.
      02c541ec
    • Andrew Morton
      [PATCH] use new unshare_files helper · 04e9bcb4
      Andrew Morton authored
      From: Chris Wright <chrisw@osdl.org>
      
      Use unshare_files during binary loading to eliminate potential leak of
      the binary's fd installed during execve().  As is, this breaks
      binfmt_som.c.
      04e9bcb4
  23. 09 Oct, 2003 1 commit
    • Linus Torvalds
      Revert the process group accessor functions. They are buggy, and · 06349d9d
      Linus Torvalds authored
      cause NULL pointer references in /proc.
      
      Moreover, it's questionable whether the whole thing makes sense at all. 
      Per-thread state is good.
      
      Cset exclude: davem@nuts.ninka.net|ChangeSet|20031005193942|01097
      Cset exclude: akpm@osdl.org[torvalds]|ChangeSet|20031005180420|42200
      Cset exclude: akpm@osdl.org[torvalds]|ChangeSet|20031005180411|42211
      06349d9d
  24. 05 Oct, 2003 1 commit
    • Andrew Morton
      [PATCH] move job control fields from task_struct to · 1bd563fd
      Andrew Morton authored
      From: Roland McGrath <roland@redhat.com>
      
      This patch completes what was started with the `process_group' accessor
      function, moving all the job control-related fields from task_struct into
      signal_struct and using process_foo accessor functions to read them.  All
      these things are per-process in POSIX, none per-thread.  Offhand it's hard
      to come up with the hairy MT scenarios in which the existing code would do
      insane things, but trust me, they're there.  At any rate, all the uses
      being done via inline accessor functions now has got to be all good.
      
      I did a "make allyesconfig" build and caught the few random drivers and
      whatnot that referred to these fields.  I was surprised to find how few
      references to ->tty there really were to fix up.  I'm sure there will be a
      few more fixups needed in non-x86 code.  The only actual testing of a
      running kernel with these patches I've done is on my normal minimal x86
      config.  Everything works fine as it did before as far as I can tell.
      
      One issue that may be of concern is the lack of any locking on multiple
      threads diddling these fields.  I don't think it really matters, though
      there might be some obscure races that could produce inconsistent job
      control results.  Nothing shattering, I'm sure; probably only something
      like a multi-threaded program calling setsid while its other threads do tty
      i/o, which never happens in reality.  This is the same situation we get by
      using ->group_leader->foo without other synchronization, which seemed to be
      the trend and no one was worried about it.
      1bd563fd