1. 31 Aug, 2004 25 commits
    • Nick Piggin's avatar
      33982f7f
    • [PATCH] use hlist for pid hash · f59ad67e
      Nick Piggin authored
      Use hlists for the PID hashes.  This halves the memory footprint of these
      hashes.  No benchmarks, but I think this is a worthy improvement because
      the hashes are something that would be likely to have significant portions
      loaded into the cache of every CPU on some workloads.
      
      This comes at the "expense" of
      	1. reintroducing the memory prefetch into the hash traversal loop;
      	2. adding new pids to the head of the list instead of the tail. I
      	   suspect that if this was a big problem then the hash isn't sized
      	   well or could benefit from moving hot entries to the head.
      
      Also, account for all the pid hashes when reporting hash memory usage.
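
      As a rough illustration of where the saving comes from, the sketch below
      contrasts a two-pointer list head with a one-pointer hlist head and shows
      insertion at the head of a bucket.  The definitions follow the usual
      list/hlist shapes but are written out here as a standalone userspace
      sketch, not copied from the patch.

```c
#include <stddef.h>

/* Doubly linked list head: two pointers per hash bucket. */
struct list_head {
	struct list_head *next, *prev;
};

/* hlist head: a single pointer per hash bucket. */
struct hlist_head {
	struct hlist_node *first;
};

/* pprev points at the previous node's next field, or at the head's first. */
struct hlist_node {
	struct hlist_node *next, **pprev;
};

/* Insert n at the front of the bucket (new entries go to the head). */
static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink n without needing to know which bucket it lives in. */
static void hlist_del(struct hlist_node *n)
{
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
}
```

      Each bucket head shrinks from two pointers to one, which is where the
      halved table footprint comes from; the nodes keep two pointers each.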
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fix PID hash sizing · 2c05d9eb
      Nick Piggin authored
      A 4GB, 4-way Opteron would create the smallest size table (16 entries) because
      pidhash_init is called before mem_init which is where x86-64 sets up max_pfn.
      
      nr_kernel_pages is set up by paging_init, called from setup_arch, which is also
      where i386 sets up max_pfn.
      
      So export nr_kernel_pages, nr_all_pages.  Use nr_kernel_pages when sizing the
      PID hash.  This fixes the problem.
      
      This also makes the pid hash dependent on the size of ZONE_NORMAL instead of
      the total size of memory.
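
      To make the failure mode concrete, here is a hedged userspace sketch of
      sizing a hash from a page count.  The scaling rule (four buckets per
      megabyte, clamped between 16 and 4096 buckets) and the 4 KiB page size
      are assumptions for illustration; only the 16-entry minimum comes from
      the description above.

```c
#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Position of the most significant set bit, 1-based; 0 when x == 0. */
static int fls_ul(unsigned long x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* Number of PID hash buckets for a given count of kernel pages. */
static unsigned long pidhash_entries(unsigned long nr_kernel_pages)
{
	unsigned long megabytes = nr_kernel_pages >> (20 - PAGE_SHIFT);
	int shift = fls_ul(megabytes * 4);

	if (shift < 4)
		shift = 4;	/* floor: 16 buckets */
	if (shift > 12)
		shift = 12;	/* assumed ceiling: 4096 buckets */
	return 1UL << shift;
}
```

      With a page count of zero, as happens when the count is read before it
      has been initialised, the result is the 16-entry minimum regardless of
      how much memory the machine has.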
      Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fix rusage semantics · 497c9d68
      Roland McGrath authored
      This patch changes the rusage bookkeeping and the semantics of the
      getrusage and times calls in a couple of ways.
      
      The first change is in the c* fields counting dead child processes.  POSIX
      requires that children that have died be counted in these fields when they
      are reaped by a wait* call, and that if they are never reaped (e.g.
      because of ignoring SIGCHLD, or exiting yourself first) then they are
      never counted.  These were counted in release_task for all threads.  I've
      changed it so they are counted in wait_task_zombie, i.e.  exactly when
      being reaped.
      
      POSIX also specifies for RUSAGE_CHILDREN that the report include the reaped
      child processes of the calling process, i.e.  whole thread group in Linux,
      not just ones forked by the calling thread.  POSIX specifies tms_c[us]time
      fields in the times call the same way.  I've moved the c* fields that
      contain this information into signal_struct, where the single set of
      counters accumulates data from any thread in the group that calls wait*.
      
      Finally, POSIX specifies getrusage and times as returning cumulative totals
      for the whole process (aka thread group), not just the calling thread.
      I've added fields in signal_struct to accumulate the stats of detached
      threads as they die.  The process stats are the sums of these records plus
      the stats of each remaining live/zombie thread.  The times and getrusage
      calls, and the internal uses for filling in wait4 results and siginfo_t,
      now iterate over the threads in the thread group and sum up their stats
      along with the stats recorded for threads already dead and gone.
      
      I added a new value RUSAGE_GROUP (-3) for the getrusage system call rather
      than changing the behavior of the old RUSAGE_SELF (0).  POSIX specifies
      RUSAGE_SELF to mean all threads, so the glibc getrusage call will just
      translate it to RUSAGE_GROUP for new kernels.  I did this thinking that
      someone somewhere might want the old behavior with an old glibc and a new
      kernel (it is only different if they are using CLONE_THREAD anyway). 
      However, I've changed the times system call to conform to POSIX as well and
      did not provide any backward compatibility there.  In that case there is
      nothing easy like a parameter value to use, it would have to be a new
      system call number.  That seems pretty pointless.  Given that, I wonder if
      it is worth bothering to preserve the compatible RUSAGE_SELF behavior by
      introducing RUSAGE_GROUP instead of just changing RUSAGE_SELF's meaning.
      Comments?
      
      I've done some basic testing on x86 and x86-64, and all the numbers come
      out right after these fixes.  (I have a test program that shows a few
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] waitid system call · ca3f74aa
      Roland McGrath authored
      This patch adds a new system call `waitid'.  This is a new POSIX call that
      subsumes the rest of the wait* family and can do some things the older
      calls cannot.  A minor addition is the ability to select what kinds of
      status to check for with a mask of independent bits, so you can wait for
      just stops and not terminations, for example.  A more significant
      improvement is the WNOWAIT flag, which allows for polling child status
      without reaping.  This interface fills in a siginfo_t with the same details
      that a SIGCHLD for the status change has; some of that info (e.g.  si_uid)
      is not available via wait4 or other calls.
      
      I've added a new system call that has the parameter conventions of the
      POSIX function because that seems like the cleanest thing.  This patch
      includes the actual system call table additions for i386 and x86-64; other
      architectures will need to assign the system call number, and 64-bit ones
      may need to implement 32-bit compat support for it as I did for x86-64. 
      The new features could instead be provided by some new kludge inventions in
      the wait4 system call interface (that's what BSD did).  If kludges are
      preferable to adding a system call, I can work up something different.
      
      I added a struct rusage field si_rusage to siginfo_t in the SIGCHLD case
      (this does not affect the size or layout of the struct).  This is not part
      of the POSIX interface, but it makes it so that `waitid' subsumes all the
      functionality of `wait4'.  Future kernel ABIs (new arch's or whatnot) can
      have only the `waitid' system call and the rest of the wait* family
      including wait3 and wait4 can be implemented in user space using waitid.
      There is nothing in user space as yet that would make use of the new field.
      
      Most of the new functionality is implemented purely in the waitid system
      call itself.  POSIX also provides for the WCONTINUED flag to report when a
      child process had been stopped by job control and then resumed with
      SIGCONT.  Corresponding to this, a SIGCHLD is now generated when a child
      resumes (unless SA_NOCLDSTOP is set), with the value CLD_CONTINUED in
      siginfo_t.si_code.  To implement this, some additional bookkeeping is
      required in the signal code handling job control stops.
      
      The motivation for this work is to make it possible to implement the POSIX
      semantics of the `waitid' function in glibc completely and correctly.  If
      changing either the system call interface used to accomplish that, or any
      details of the kernel implementation work, would improve the chances of
      getting this incorporated, I am more than happy to work through any issues.
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Using get_cycles for add_timer_randomness · 4c746d40
      Anton Blanchard authored
      I tested how long it took to do a dd from /dev/random on ppc64 before and
      after this patch, while doing a ping flood from another machine.
      
      before:
      # /usr/bin/time dd if=/dev/random of=/dev/zero count=1k
      0+51 records in
      Command terminated by signal 2
      0.00user 0.00system 19:18.46elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
      
      I gave up after 19 minutes.
      
      after:
      # /usr/bin/time dd if=/dev/random of=/dev/zero count=1k
      0+1024 records in
      0.00user 0.00system 0:33.38elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
      
      Just over 33 seconds. Better.
      
      From: Arnd Bergmann <arnd@arndb.de>
      
      I noticed that only i386 and x86-64 are currently using a high resolution
      timer source when adding randomness.  Since many architectures have a
      working get_cycles() implementation, it seems rather straightforward to use
      that.
      
      Has this been discussed before, or can anyone comment on the implementation
      below?
      
      This patch attempts to take into account the size of cycles_t, which is
      either 32 or 64 bits wide but independent of the architecture's word size.
      
      The behavior should be nearly identical to the old one on i386, x86-64 and
      all architectures without a time stamp counter, while finding more entropy
      on the other architectures.
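
      One way to handle a cycle counter that may be wider than 32 bits is to
      fold the upper half into the lower half, testing the width of the type
      rather than the machine word size.  The sketch below assumes a 64-bit
      cycles_t and a hypothetical helper name; it illustrates the idea and is
      not the patch's code.

```c
#include <stdint.h>

typedef uint64_t cycles_t;	/* assumption: a 64-bit cycle counter */

/* Fold a cycle count into 32 bits so the high-order bits still contribute. */
static uint32_t fold_cycles(cycles_t cycles)
{
	uint32_t num = (uint32_t)cycles;

	/* Decided at compile time from the type's width, not the word size. */
	if (sizeof(cycles_t) > sizeof(uint32_t))
		num ^= (uint32_t)(cycles >> 32);
	return num;
}
```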
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] x86_64: emulate NUMA on non-NUMA hardware · 60b292ca
      Andi Kleen authored
      Apply this handy patch and boot with numa=fake=4 (or however many nodes
      you want; the maximum is 8 right now).

      There is a minor issue with the hash function, which can make the last node
      bigger than the others.  It is probably fixable if it turns out to be a problem.
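
      The message does not say why the last node can end up larger.  One
      plausible mechanism, sketched here purely as an assumption, is that
      per-node sizes are rounded down to a power of two to suit the hash and
      the last node absorbs whatever is left over.  Function names are
      hypothetical.

```c
#include <stdint.h>

/* Largest power of two that is <= x, for x >= 1. */
static uint64_t round_down_pow2(uint64_t x)
{
	uint64_t p = 1;

	while ((p << 1) != 0 && (p << 1) <= x)
		p <<= 1;
	return p;
}

/* Size in bytes of fake node i when total bytes are split into nodes nodes. */
static uint64_t fake_node_size(uint64_t total, unsigned nodes, unsigned i)
{
	uint64_t sz = round_down_pow2(total / nodes);

	if (i == nodes - 1)
		return total - (uint64_t)i * sz;	/* last node takes the remainder */
	return sz;
}
```

      Under this assumption, splitting 3 GB four ways gives three 512 MB nodes
      and a final node of 1536 MB.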
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] tiny shmem/tmpfs replacement · 14ef4d0a
      Matt Mackall authored
      A patch to replace tmpfs/shmem with ramfs on systems without swap,
      incorporating the suggestions from Andi and Hugh.
      Signed-off-by: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] VLAN support for 3c59x/3c90x · 59835997
      Alan Cox authored
      This adds VLAN support to the 3c59x/90x series hardware.
      
      Stefan de Konink ported this code from the 2.4 VLAN patches and tested it
      extensively. I cleaned up the ifdefs and fixed a problem with bracketing
      that made older cards fail.
      
      --
      
      Developer's Certificate of Origin 1.0
      
      By making a contribution to this project, I certify that:
      
      (a) The contribution was created in whole or in part by me and I have the
      right to submit it under the open source license indicated in the file; or
      
      (b) The contribution is based upon previous work that, to the best of my
      knowledge, is covered under an appropriate open source license and I have
      the right under that license to submit that work with modifications,
      whether created in whole or in part by me, under the same open source
      license (unless I am permitted to submit under a different license), as
      indicated in the file; or
      
      (c) The contribution was provided directly to me by some other person who
      certified (a), (b) or (c) and I have not modified it.
      
      I, Stefan de Konink, certify that:
       The contribution is based upon previous work that, again is based on GPL
      code and I have the right under that license to submit that work with
      modifications, whether created in whole or in part by me, under the same
      open source license.
      
      I, Alan Cox, certify likewise.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] truncate_inode_pages latency fix · 65fe40ed
      Andrew Morton authored
      Fix scheduling latency issues with large truncates.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] hugetlbfs private mappings · c3dfa712
      Oleg Nesterov authored
      Hugetlbfs silently coerces private mappings of hugetlb files into shared
      ones, so a private writable mapping has MAP_SHARED semantics.  I think such
      mappings should be disallowed.

      First, this behavior allows a hugetlbfs file opened O_RDONLY to be
      overwritten via mmap(PROT_READ|PROT_WRITE, MAP_PRIVATE), so it is a
      security bug.

      Second, a private writable mmap() should fail simply because the kernel
      does not support it.

      I believe it is OK to allow private read-only hugetlb mappings, since
      sys_mprotect() does not work with hugetlb vmas.

      There is another problem.  A hugetlb mapping is always prefaulted: pages
      are allocated at mmap() time.  So even a read-only mapping can enlarge
      the hugetlbfs file and steal huge pages without appropriate
      permissions.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] /dev/zero vs hugetlb mappings. · ec081b11
      Oleg Nesterov authored
      A hugetlbfs mmap with MAP_PRIVATE silently becomes MAP_SHARED, but
      vma->vm_flags has no VM_SHARED bit.  Reading from /dev/zero into a hugetlb
      area will do:
      
      read_zero()
          read_zero_pagealigned()
              if (vma->vm_flags & VM_SHARED)
                  break;                      // fallback to clear_user()
              zap_page_range();
              zeromap_page_range();
      
      It will hit the BUG_ON() in unmap_hugepage_range() if the region is not
      huge-page aligned, or silently convert it into a private anonymous mapping.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ppc64: add a pfn_to_kaddr() function · 82b11318
      Dave Hansen authored
      This is a helper function that a few architectures already have.  This just
      copies the i386 implementation to ppc64.
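
      For context, a helper of this kind turns a page frame number into the
      kernel virtual address of that page in the linear mapping.  The
      userspace sketch below works on plain integers; the PAGE_SHIFT and
      PAGE_OFFSET values and the sketch_va name are assumptions chosen for
      illustration.

```c
#include <stdint.h>

#define PAGE_SHIFT  12					/* assumption: 4 KiB pages */
#define PAGE_OFFSET UINT64_C(0xC000000000000000)	/* assumption: base of the linear map */

/* Virtual address of a physical address within the linear mapping. */
#define sketch_va(paddr)   ((uint64_t)(paddr) + PAGE_OFFSET)

/* Virtual address of the first byte of page frame pfn. */
#define pfn_to_kaddr(pfn)  sketch_va((uint64_t)(pfn) << PAGE_SHIFT)
```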
      Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ppc64: allocate irqstacks only for possible cpus · de2c2c9b
      Paul Mackerras authored
      With earlier setup of cpu_possible_map the number of irqstacks shrinks from
      NR_CPUS to the number of possible cpus.
      Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ppc64: set platform cpuids later in boot · 4fd4fa10
      Paul Mackerras authored
      Move the initialization of the per-cpu paca->hw_cpu_id out of the Open
      Firmware client boot code and into a common location which is executed
      later.
      Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ppc64: rework PPC64 cpu map setup · 815e7a88
      Paul Mackerras authored
      Move all cpu map initializations to one place (except for the online map --
      cpus mark themselves online as they come up).  This sets up
      cpu_possible_map early enough that we can use num_possible_cpus for
      allocating irqstacks instead of NR_CPUS.  Hopefully this should also help
      set the stage for kexec.
      Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Update PPC MAINTAINERS & CREDITS · ce26f197
      Paul Mackerras authored
      David Engebretsen has moved on to other things and is no longer maintaining
      ppc64.  This patch adds an entry in CREDITS to note his contribution in
      leading the team that did the PPC64 port originally and updates various
      PPC-related MAINTAINERS entries.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Fix the unnecessary entropy call in the irq handler · 8cd809c5
      Takashi Iwai authored
      Currently add_interrupt_randomness() is called at each interrupt when one
      of the handlers has the SA_SAMPLE_RANDOM flag, regardless of whether the
      interrupt is processed by that handler or not.  This results in higher
      latency and performance loss.
      
      The patch fixes this behavior to avoid the unnecessary call by checking the
      return value from each handler.
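
      The idea can be sketched in userspace as follows: walk the handlers
      sharing an interrupt line, accumulate flags only from handlers that
      report the interrupt as theirs, and feed the entropy pool only if one of
      those asked for sampling.  The types, the flag value and the counter are
      stand-ins invented for this sketch.

```c
#include <stddef.h>

#define SA_SAMPLE_RANDOM 0x1UL	/* assumption: flag value chosen for the sketch */

typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;

struct irqaction {
	irqreturn_t (*handler)(int irq);
	unsigned long flags;
	struct irqaction *next;
};

static int randomness_calls;	/* counts simulated entropy-pool updates */

/* Stand-in for the real entropy call; it only counts invocations. */
static void add_interrupt_randomness(int irq)
{
	(void)irq;
	randomness_calls++;
}

/* Run every handler on the line; only handlers that claim the irq count. */
static int handle_irq_event(int irq, struct irqaction *action)
{
	unsigned long status = 0;
	int handled = 0;

	for (; action; action = action->next) {
		if (action->handler(irq) == IRQ_HANDLED) {
			status |= action->flags;
			handled = 1;
		}
	}
	if (status & SA_SAMPLE_RANDOM)
		add_interrupt_randomness(irq);
	return handled;
}

/* Two trivial handlers used to exercise the loop. */
static irqreturn_t claims(int irq)  { (void)irq; return IRQ_HANDLED; }
static irqreturn_t ignores(int irq) { (void)irq; return IRQ_NONE; }
```

      With a sampling handler that ignores the interrupt and a non-sampling
      one that claims it, no entropy call is made.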
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Jumper Probes to provide function arguments · b08e7589
      Prasanna S. Panchamukhi authored
      A special kprobe type which can be placed on function entry points, and
      employs a simple mirroring principle to allow seamless access to the
      arguments of a function being probed.  The probe handler routine should
      have the same prototype as the function being probed.  Currently
      implemented for x86.
      
      The way it works is that when the probe is hit, the breakpoint handler
      simply irets to the probe handler's eip while retaining register and stack
      state corresponding to the function entry.  After it is done, the probe
      handler calls jprobe_return() which traps again to restore processor state
      and switch back to the probed function.  Linus noted correctly at KS that
      we need to be careful as gcc assumes that the callee owns arguments.  We
      save and restore enough stack bytes to cover argument space.
      
      Sample Usage:
      	static int jip_queue_xmit(struct sk_buff *skb, int ipfragok)
      	{
      		... whatever ...
      		jprobe_return();
      		return 0;
      	}
      
      	struct jprobe jp = {
      		{.addr = (kprobe_opcode_t *) ip_queue_xmit},
      		.entry = (kprobe_opcode_t *) jip_queue_xmit
      	};
      	register_jprobe(&jp);
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] kprobes base patch · 4ece5899
      Prasanna S. Panchamukhi authored
      This patch helps developers to trap at almost any kernel code address,
      specifying a handler routine to be invoked when the breakpoint is hit.  
      
      Useful for analysing the Linux kernel by collecting debugging information
      non-disruptively.  Employs single-stepping out-of-line to avoid probe
      misses on SMP and may be especially useful in aiding debugging elusive
      races and problems on live systems.  More elaborate dynamic tracing tools
      such as DProbes can be built over the kprobes interface.
      
      
      Sample usage:
      	To place a probe on __blockdev_direct_IO:
      	static int probe_handler(struct kprobe *p, struct pt_regs *regs)
      	{
      		... whatever ...
      		return 0;
      	}
      	struct kprobe kp = {
      		.addr = (kprobe_opcode_t *) __blockdev_direct_IO,
      		.pre_handler = probe_handler
      	};
      	register_kprobe(&kp);
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] i386 exceptions notifier for kprobes · f63b75f9
      Prasanna S. Panchamukhi authored
      This patch provides notifiers for i386 architecture exceptions.  This patch
      has been ported from x86_64 architecture as suggested by Andi Kleen.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Fix oops with nmi-watchdog=2 · 5ae3fd75
      Philippe Elie authored
      Contributions from  Zarakin <zarakin@hotpop.com>
      
      Intel removed two MSRs, MSR_P4_IQ_ESCR_0|1 (0x3ba/0x3bb), on P4 model >= 3.
      See the Intel documentation, Vol. 3, System Programming Guide, Appendix B.
      
      nmi_watchdog=2 oopsed at boot time and oprofile at driver load.
      
      Avoid touching them when model >= 3.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] platform update for ES7000 · 75b02c33
      Jason Davis authored
      This update only applies to Unisys' ES7000 server machines.  The patch adds
      an OEM ID check to verify that the running machine is actually a Unisys
      box before executing the Unisys OEM parser routine.  It also increases
      the MAX_MP_BUSSES definition from 32 to 256.  On the ES7000s, bus ID
      numbering can range from 0 to 255.  Without the patch, the system panics if
      booted with acpi=off.
      
      This patch has been tested and verified on an authentic ES7000 machine.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Make assign_irq_vector() non-__init · 92d1cc78
      Bjorn Helgaas authored
      Make assign_irq_vector() non-__init always (it's called from
      io_apic_set_pci_routing(), which is used in the pci_enable_device() path).
      Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] prio-tree: remove function prototype inside function · f4939b75
      Anton Blanchard authored
      I had a problem when compiling a 2.6 kernel with gcc 3.5 CVS.  The
      prototype for prio_tree_remove in mm/prio_tree.c is inside another
      function.  gcc 3.5 gets upset and removes the function completely.
      Apparently this isn't valid C, so let's fix it up.
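
      As a standalone illustration of the pattern (with hypothetical function
      names, not the prio_tree code), the fix amounts to moving the static
      prototype out of the calling function to file scope:

```c
/* File-scope forward declaration: a static prototype is valid here. */
static int remove_node(int *slot);

/* The caller is defined before the callee, so it needs the declaration above. */
static int replace_node(int *slot, int value)
{
	/*
	 * Writing "static int remove_node(int *slot);" here, inside the
	 * function body, is the invalid form: a block-scope function
	 * declaration may carry no storage class other than extern.
	 */
	int old = remove_node(slot);

	*slot = value;
	return old;
}

/* Clear the slot and return what it held. */
static int remove_node(int *slot)
{
	int old = *slot;

	*slot = 0;
	return old;
}
```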
      
      Details can be found here:
      
      http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17205
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 30 Aug, 2004 3 commits
  3. 31 Aug, 2004 3 commits
  4. 30 Aug, 2004 9 commits