Commit 942f4576 authored by Linus Torvalds

Import 1.3.90

parent b4dfb143
......@@ -202,6 +202,14 @@ S: 14509 NE 39th Street #1096
S: Bellevue, Washington 98007
S: USA
N: Stuart Cheshire
E: cheshire@cs.stanford.edu
D: Author of Starmode Radio IP (STRIP) driver
D: Originator of design for new combined interrupt handlers
S: William Gates Department
S: Stanford University
S: Stanford, California 94305, USA
N: Juan Jose Ciarlante
E: jjciarla@raiz.uncu.edu.ar
D: Network driver alias support
......@@ -264,10 +272,9 @@ E: davison@borland.com
D: Second extended file system co-designer
N: Terry Dawson
E: terryd@extro.ucc.su.oz.au
E: vk2ktj@gw.vk2ktj.ampr.org (Amateur Radio use only)
D: NET-2-HOWTO author.
D: RADIOLINUX Amateur Radio software for Linux list collator.
E: terry@perf.no.itg.telecom.com.au
E: terry@albert.vk2ktj.ampr.org (Amateur Radio use only)
D: AX25-HOWTO, HAM-HOWTO, IPX-HOWTO, NET-2-HOWTO
N: Todd J. Derr
E: tjd@fore.com
......
......@@ -1241,7 +1241,7 @@ EATA-PIO (old DPT PM2001, PM2012A) support
CONFIG_SCSI_EATA_PIO
This driver supports all EATA-PIO protocol compliant SCSI Host Adaptors
like the DPT PM2001 and the PM2012A. EATA-DMA compliant HBAs can also use
this driver but are discuraged from doing so, since this driver only
this driver but are discouraged from doing so, since this driver only
supports harddisks and lacks numerous features.
You might want to have a look at the SCSI-HOWTO, available via ftp
(user: anonymous) at sunsite.unc.edu:/pub/Linux/docs/HOWTO.
......@@ -1334,9 +1334,12 @@ CONFIG_SCSI_IN2000
explained in section 3.6 of the SCSI-HOWTO, available via ftp (user:
anonymous) at sunsite.unc.edu:/pub/Linux/docs/HOWTO. If it doesn't
work out of the box, you may have to change some settings in
drivers/scsi/inn2000.h. If you want to compile this as a module ( =
code which can be inserted in and removed from the running kernel
whenever you want), say M here and read Documentation/modules.txt.
drivers/scsi/in2000.h. You may also want to drop in a rewritten,
and probably more reliable, driver from John Shifflett, which you
can get from ftp://ftp.netcom.com/pub/js/jshiffle/in2000/ . If you
want to compile this as a module ( = code which can be inserted in
and removed from the running kernel whenever you want), say M here
and read Documentation/modules.txt.
PAS16 SCSI support
CONFIG_SCSI_PAS16
......@@ -1357,7 +1360,7 @@ CONFIG_SCSI_QLOGIC
Seagate ST-02 and Future Domain TMC-8xx SCSI support
CONFIG_SCSI_SEAGATE
These are 8-bit SCSI controller; the ST-01 is also supported by this
These are 8-bit SCSI controllers; the ST-01 is also supported by this
driver. It is explained in section 3.9 of the SCSI-HOWTO, available
via ftp (user: anonymous) at
sunsite.unc.edu:/pub/Linux/docs/HOWTO. If it doesn't work out of the
......
......@@ -2,7 +2,7 @@
Andy Walker <andy@lysaker.kvaerner.no>
06 April 1996
15 April 1996
What is mandatory locking?
......@@ -41,7 +41,8 @@ to entire files, so the mandatory locking rules also have byte level
granularity.
Note 2: POSIX.1 does not specify any scheme for mandatory locking, despite
borrowing the fcntl() locking scheme from System V.
borrowing the fcntl() locking scheme from System V. The mandatory locking
scheme is defined by the System V Interface Definition (SVID) Version 3.
Marking a file for mandatory locking
------------------------------------
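For reference, a file is marked for mandatory locking by setting the
setgid bit in its mode while clearing the group execute bit (the SVID
convention, equivalent to "chmod g+s,g-x file"). A minimal C sketch,
with a hypothetical file name and error handling elided:

	#include <sys/stat.h>

	/* Mark "somefile" (a hypothetical name) for mandatory locking:
	 * setgid bit on, group execute bit off. */
	int main(void)
	{
		struct stat st;

		if (stat("somefile", &st) != 0)
			return 1;
		return chmod("somefile",
			     (st.st_mode & 07777 & ~S_IXGRP) | S_ISGID);
	}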
......@@ -66,25 +67,23 @@ SunOS 4.1.x, Solaris 2.x and HP-UX 9.x.
Generally I have tried to make the most sense out of the behaviour exhibited
by these three reference systems. There are many anomalies.
Originally I wrote (about SunOS):
"For one thing, calls to open() for a file fail with EAGAIN if another
process holds a mandatory lock on the file. However, processes already
holding open file descriptors can carry on using them. Weird!"
Well, all my reference systems do it, so I decided to go with the flow.
My gut feeling is that only calls to open() and creat() with O_TRUNC should be
rejected, as these are the only ones that try to modify the file contents as
part of the open() call.
All the reference systems reject all calls to open() for a file on which
another process has outstanding mandatory locks. This is in direct
contravention of SVID 3, which states that only calls to open() with the
O_TRUNC flag set should be rejected. The Linux implementation follows the SVID
definition, which is the "Right Thing", since only calls with O_TRUNC can
modify the contents of the file.
HP-UX even disallows open() with O_TRUNC for a file with advisory locks, not
just mandatory locks. That to me contravenes POSIX.1.
just mandatory locks. That would appear to contravene POSIX.1.
mmap() is another interesting case. All the operating systems mentioned
prevent mandatory locks from being applied to an mmap()'ed file, but HP-UX
also disallows advisory locks for such a file.
also disallows advisory locks for such a file. SVID actually specifies the
paranoid HP-UX behaviour.
My opinion is that only MAP_SHARED mappings should be immune from locking, and
then only from mandatory locks - that is what is currently implemented.
In my opinion only MAP_SHARED mappings should be immune from locking, and then
only from mandatory locks - that is what is currently implemented.
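To illustrate the SVID behaviour that Linux implements (a hedged
sketch, not part of this document; "locked.dat" is a hypothetical file
already marked for mandatory locking and write-locked by another
process):

	#include <errno.h>
	#include <fcntl.h>
	#include <stdio.h>

	int main(void)
	{
		/* A plain open() is allowed under the SVID rules... */
		int fd = open("locked.dat", O_RDWR);
		if (fd < 0)
			perror("open");

		/* ...but open() with O_TRUNC must fail with EAGAIN. */
		fd = open("locked.dat", O_RDWR | O_TRUNC);
		if (fd < 0 && errno == EAGAIN)
			printf("O_TRUNC refused while mandatory lock held\n");
		return 0;
	}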
SunOS is so hopeless that it doesn't even honour the O_NONBLOCK flag for
mandatory locks, so reads and writes to locked files always block when they
......@@ -113,8 +112,9 @@ Semantics
unless a process has opened the file with the O_NONBLOCK flag in which case
the system call will return immediately with the error status EAGAIN.
4. Calls to open() or creat() on a file that has any mandatory locks owned
by other processes will be rejected with the error status EAGAIN.
4. Calls to open() with O_TRUNC, or to creat(), on an existing file that has
any mandatory locks owned by other processes will be rejected with the
error status EAGAIN.
5. Attempts to apply a mandatory lock to a file that is memory mapped and
shared (via mmap() with MAP_SHARED) will be rejected with the error status
......
......@@ -193,6 +193,12 @@ M: longyear@netcom.com, Cc: longyear@sii.com
L: linux-ppp@vger.rutgers.edu
S: Maintained
STARMODE RADIO IP (STRIP) PROTOCOL DRIVER
P: Stuart Cheshire
M: cheshire@cs.stanford.edu
W: http://mosquitonet.Stanford.EDU/strip.html
S: Maintained
SMB FILESYSTEM:
P: Volker Lendecke
M: lendecke@namu01.gwdg.de
......
......@@ -82,55 +82,98 @@ if (!(context.x & 0xfffc) || (context.x & 3) != 3) goto badframe; COPY(x);
* Set up a signal frame... Make the stack look the way iBCS2 expects
* it to look.
*/
void setup_frame(struct sigaction * sa, unsigned long ** fp, unsigned long eip,
struct pt_regs * regs, int signr, unsigned long oldmask)
static void setup_frame(struct sigaction * sa,
struct pt_regs * regs, int signr,
unsigned long oldmask)
{
unsigned long * frame;
#define __CODE ((unsigned long)(frame+24))
#define CODE(x) ((unsigned long *) ((x)+__CODE))
frame = *fp;
frame = (unsigned long *) regs->esp;
if (regs->ss != USER_DS && sa->sa_restorer)
frame = (unsigned long *) sa->sa_restorer;
frame -= 32;
if (verify_area(VERIFY_WRITE,frame,32*4))
do_exit(SIGSEGV);
/* set up the "normal" stack seen by the signal handler (iBCS2) */
put_fs_long(__CODE,frame);
#define __CODE ((unsigned long)(frame+24))
#define CODE(x) ((unsigned long *) ((x)+__CODE))
put_user(__CODE,frame);
if (current->exec_domain && current->exec_domain->signal_invmap)
put_fs_long(current->exec_domain->signal_invmap[signr], frame+1);
put_user(current->exec_domain->signal_invmap[signr], frame+1);
else
put_fs_long(signr, frame+1);
put_fs_long(regs->gs, frame+2);
put_fs_long(regs->fs, frame+3);
put_fs_long(regs->es, frame+4);
put_fs_long(regs->ds, frame+5);
put_fs_long(regs->edi, frame+6);
put_fs_long(regs->esi, frame+7);
put_fs_long(regs->ebp, frame+8);
put_fs_long((long)*fp, frame+9);
put_fs_long(regs->ebx, frame+10);
put_fs_long(regs->edx, frame+11);
put_fs_long(regs->ecx, frame+12);
put_fs_long(regs->eax, frame+13);
put_fs_long(current->tss.trap_no, frame+14);
put_fs_long(current->tss.error_code, frame+15);
put_fs_long(eip, frame+16);
put_fs_long(regs->cs, frame+17);
put_fs_long(regs->eflags, frame+18);
put_fs_long(regs->esp, frame+19);
put_fs_long(regs->ss, frame+20);
put_fs_long(0,frame+21); /* 387 state pointer - not implemented*/
put_user(signr, frame+1);
put_user(regs->gs, frame+2);
put_user(regs->fs, frame+3);
put_user(regs->es, frame+4);
put_user(regs->ds, frame+5);
put_user(regs->edi, frame+6);
put_user(regs->esi, frame+7);
put_user(regs->ebp, frame+8);
put_user(regs->esp, frame+9);
put_user(regs->ebx, frame+10);
put_user(regs->edx, frame+11);
put_user(regs->ecx, frame+12);
put_user(regs->eax, frame+13);
put_user(current->tss.trap_no, frame+14);
put_user(current->tss.error_code, frame+15);
put_user(regs->eip, frame+16);
put_user(regs->cs, frame+17);
put_user(regs->eflags, frame+18);
put_user(regs->esp, frame+19);
put_user(regs->ss, frame+20);
put_user(NULL,frame+21);
/* non-iBCS2 extensions.. */
put_fs_long(oldmask, frame+22);
put_fs_long(current->tss.cr2, frame+23);
put_user(oldmask, frame+22);
put_user(current->tss.cr2, frame+23);
/* set up the return code... */
put_fs_long(0x0000b858, CODE(0)); /* popl %eax ; movl $,%eax */
put_fs_long(0x80cd0000, CODE(4)); /* int $0x80 */
put_fs_long(__NR_sigreturn, CODE(2));
*fp = frame;
put_user(0x0000b858, CODE(0)); /* popl %eax ; movl $,%eax */
put_user(0x80cd0000, CODE(4)); /* int $0x80 */
put_user(__NR_sigreturn, CODE(2));
#undef __CODE
#undef CODE
/* Set up registers for signal handler */
regs->esp = (unsigned long) frame;
regs->eip = (unsigned long) sa->sa_handler;
regs->cs = USER_CS; regs->ss = USER_DS;
regs->ds = USER_DS; regs->es = USER_DS;
regs->gs = USER_DS; regs->fs = USER_DS;
regs->eflags &= ~TF_MASK;
}
/*
* OK, we're invoking a handler
*/
static void handle_signal(unsigned long signr, struct sigaction *sa,
unsigned long oldmask, struct pt_regs * regs)
{
/* are we from a system call? */
if (regs->orig_eax >= 0) {
/* If so, check system call restarting.. */
switch (regs->eax) {
case -ERESTARTNOHAND:
regs->eax = -EINTR;
break;
case -ERESTARTSYS:
if (!(sa->sa_flags & SA_RESTART)) {
regs->eax = -EINTR;
break;
}
/* fallthrough */
case -ERESTARTNOINTR:
regs->eax = regs->orig_eax;
regs->eip -= 2;
}
}
/* set up the stack frame */
setup_frame(sa, regs, signr, oldmask);
if (sa->sa_flags & SA_ONESHOT)
sa->sa_handler = NULL;
current->blocked |= sa->sa_mask;
}
/*
......@@ -145,9 +188,6 @@ void setup_frame(struct sigaction * sa, unsigned long ** fp, unsigned long eip,
asmlinkage int do_signal(unsigned long oldmask, struct pt_regs * regs)
{
unsigned long mask = ~current->blocked;
unsigned long handler_signal = 0;
unsigned long *frame = NULL;
unsigned long eip = 0;
unsigned long signr;
struct sigaction * sa;
......@@ -219,48 +259,19 @@ asmlinkage int do_signal(unsigned long oldmask, struct pt_regs * regs)
do_exit(signr);
}
}
/*
* OK, we're invoking a handler
*/
if (regs->orig_eax >= 0) {
if (regs->eax == -ERESTARTNOHAND ||
(regs->eax == -ERESTARTSYS && !(sa->sa_flags & SA_RESTART)))
regs->eax = -EINTR;
}
handler_signal |= 1 << (signr-1);
mask &= ~sa->sa_mask;
handle_signal(signr, sa, oldmask, regs);
return 1;
}
if (regs->orig_eax >= 0 &&
(regs->eax == -ERESTARTNOHAND ||
regs->eax == -ERESTARTSYS ||
regs->eax == -ERESTARTNOINTR)) {
regs->eax = regs->orig_eax;
regs->eip -= 2;
}
if (!handler_signal) /* no handler will be called - return 0 */
return 0;
eip = regs->eip;
frame = (unsigned long *) regs->esp;
signr = 1;
sa = current->sig->action;
for (mask = 1 ; mask ; sa++,signr++,mask += mask) {
if (mask > handler_signal)
break;
if (!(mask & handler_signal))
continue;
setup_frame(sa,&frame,eip,regs,signr,oldmask);
eip = (unsigned long) sa->sa_handler;
if (sa->sa_flags & SA_ONESHOT)
sa->sa_handler = NULL;
regs->cs = USER_CS; regs->ss = USER_DS;
regs->ds = USER_DS; regs->es = USER_DS;
regs->gs = USER_DS; regs->fs = USER_DS;
current->blocked |= sa->sa_mask;
oldmask |= sa->sa_mask;
/* Did we come from a system call? */
if (regs->orig_eax >= 0) {
/* Restart the system call - no handlers present */
if (regs->eax == -ERESTARTNOHAND ||
regs->eax == -ERESTARTSYS ||
regs->eax == -ERESTARTNOINTR) {
regs->eax = regs->orig_eax;
regs->eip -= 2;
}
}
regs->esp = (unsigned long) frame;
regs->eip = eip; /* "return" to the first handler */
regs->eflags &= ~TF_MASK;
current->tss.trap_no = current->tss.error_code = 0;
return 1;
return 0;
}
......@@ -442,39 +442,6 @@ struct request *get_md_request (int max_req, kdev_t dev)
#endif
/*
* Swap partitions are now read via brw_page. ll_rw_page is an
* asynchronous function now --- we must call wait_on_page afterwards
* if synchronous IO is required.
*/
void ll_rw_page(int rw, kdev_t dev, unsigned long page, char * buffer)
{
unsigned int major = MAJOR(dev);
int block = page;
if (major >= MAX_BLKDEV || !(blk_dev[major].request_fn)) {
printk("Trying to read nonexistent block-device %s (%ld)\n",
kdevname(dev), page);
return;
}
switch (rw) {
case READ:
break;
case WRITE:
if (is_read_only(dev)) {
printk("Can't page to read-only device %s\n",
kdevname(dev));
return;
}
break;
default:
panic("ll_rw_page: bad block dev cmd, must be R/W");
}
if (set_bit(PG_locked, &mem_map[MAP_NR(buffer)].flags))
panic ("ll_rw_page: page already locked");
brw_page(rw, (unsigned long) buffer, dev, &block, PAGE_SIZE, 0);
}
/* This function can be used to request a number of buffers from a block
device. Currently the only restriction is that all buffers must belong to
the same device */
......
......@@ -692,7 +692,7 @@ vortex_start_xmit(struct sk_buff *skb, struct device *dev)
dev->tbusy = 0;
} else
/* Interrupt us when the FIFO has room for max-sized packet. */
outw(SetTxThreshold + 1536, ioaddr + EL3_CMD);
outw(SetTxThreshold + (1536>>2), ioaddr + EL3_CMD);
}
#else
/* ... and the packet rounded to a doubleword. */
......@@ -702,7 +702,7 @@ vortex_start_xmit(struct sk_buff *skb, struct device *dev)
dev->tbusy = 0;
} else
/* Interrupt us when the FIFO has room for max-sized packet. */
outw(SetTxThreshold + 1536, ioaddr + EL3_CMD);
outw(SetTxThreshold + (1536>>2), ioaddr + EL3_CMD);
#endif /* bus master */
dev->trans_start = jiffies;
......
......@@ -626,7 +626,7 @@ int dlci_setup(void)
}
#ifdef MODULE
static struct device dlci = {devname, 0, 0, 0, 0, 0, 0, 0, 0, 0, NULL, dlci_init, };
static struct device dlci = {"dlci", 0, 0, 0, 0, 0, 0, 0, 0, 0, NULL, dlci_init, };
int init_module(void)
{
......
......@@ -797,7 +797,7 @@ static void sdla_receive(struct device *dev)
if (i == CONFIG_DLCI_MAX)
{
printk(KERN_NOTICE "%s: Recieved packet from invalid DLCI %i, ignoring.", dev->name, dlci);
printk(KERN_NOTICE "%s: Received packet from invalid DLCI %i, ignoring.", dev->name, dlci);
flp->stats.rx_errors++;
success = 0;
}
......@@ -876,7 +876,7 @@ static void sdla_isr(int irq, void *dev_id, struct pt_regs * regs)
if (!flp->initialized)
{
printk(KERN_WARNING "%s: irq %d for unintialiazed device.\n", dev->name, irq);
printk(KERN_WARNING "%s: irq %d for uninitialized device.\n", dev->name, irq);
return;
}
......@@ -1176,7 +1176,7 @@ static int sdla_config(struct device *dev, struct frad_conf *conf, int get)
if (err)
return(err);
/* no sense reading if the CPU isnt' started */
/* no sense reading if the CPU isn't started */
if (dev->start)
{
size = sizeof(data);
......
......@@ -1534,7 +1534,7 @@ static void SK_rxintr(struct device *dev)
if (rmdstat & RX_STP)
{
p->stats.rx_errors++; /* bad packet received */
p->stats.rx_length_errors++; /* packet to long */
p->stats.rx_length_errors++; /* packet too long */
printk("%s: packet too long\n", dev->name);
}
......
......@@ -24,7 +24,7 @@
* in the text.
*
*
* OPTION_NOASYNC
* OPTION_NO_ASYNC
* Don't negotiate for asynchronous transfers on the first command
* when OPTION_ALWAYS_SYNCHRONOUS is set. Useful for dain bramaged
* devices which do something bad rather than sending a MESSAGE
......
......@@ -639,7 +639,7 @@ static void BusLogic_InitializeAddressProbeList(void)
if (AutoSCSIByte45.ForceBusDeviceScanningOrder)
{
/*
Sort the I/O Addresses such that the corrseponding PCI devices
Sort the I/O Addresses such that the corresponding PCI devices
are in ascending order by Bus Number and Device Number.
*/
int LastInterchange = DestinationIndex-1, Bound, j;
......
......@@ -6,8 +6,6 @@
(c) 1995,1996 Grant R. Guenther, grant@torque.net,
under the terms of the GNU Public License.
THIS IS BETA SOFTWARE - PLEASE TAKE ALL NECESSARY PRECAUTIONS
*/
/* This driver was developed without the benefit of any technical
......
......@@ -2207,7 +2207,7 @@ static int update_timeout(Scsi_Cmnd * SCset, int timeout)
* called, and again when scsi_done completes the command. To limit
* the load this routine can cause, we shortcut processing if no clock
* ticks have occurred since the last time it was called. This may
* cause the computation of least below to be inaccurrate, but it will
* cause the computation of least below to be inaccurate, but it will
* be corrected after the next clock tick.
*/
......
......@@ -394,7 +394,7 @@ typedef struct scsi_cmnd {
passes it to the driver's queue command function. The serial_number
is cleared when scsi_done is entered indicating that the command has
been completed. If a timeout occurs, the serial number at the moment
of timeout is copied into serial_number_at_timeout. By subseuqently
of timeout is copied into serial_number_at_timeout. By subsequently
comparing the serial_number and serial_number_at_timeout fields
during abort or reset processing, we can detect whether the command
has already completed. This also detects cases where the command has
......
......@@ -147,7 +147,7 @@ static int sd_open(struct inode * inode, struct file * filp)
static void sd_release(struct inode * inode, struct file * file)
{
int target;
sync_dev(inode->i_rdev);
fsync_dev(inode->i_rdev);
target = DEVICE_NR(inode->i_rdev);
......
......@@ -803,6 +803,10 @@ void set_writetime(struct buffer_head * buf, int flag)
}
/*
* A buffer may need to be moved from one buffer list to another
* (e.g. in case it is not shared any more). Handle this.
*/
void refile_buffer(struct buffer_head * buf)
{
int dispose;
......@@ -1088,6 +1092,46 @@ static struct buffer_head * create_buffers(unsigned long page, unsigned long siz
return NULL;
}
/* Run the hooks that have to be done when a page I/O has completed. */
static inline void after_unlock_page (struct page * page)
{
if (clear_bit(PG_decr_after, &page->flags))
nr_async_pages--;
if (clear_bit(PG_free_after, &page->flags))
free_page(page_address(page));
if (clear_bit(PG_swap_unlock_after, &page->flags))
swap_after_unlock_page(page->swap_unlock_entry);
}
/* Free all temporary buffers belonging to a page. */
static inline void free_async_buffers (struct buffer_head * bh)
{
struct buffer_head * tmp;
unsigned long flags;
tmp = bh;
save_flags(flags);
cli();
do {
if (!test_bit(BH_FreeOnIO, &tmp->b_state)) {
printk ("Whoops: unlock_buffer: "
"async IO mismatch on page.\n");
restore_flags(flags);
return;
}
tmp->b_next_free = reuse_list;
reuse_list = tmp;
clear_bit(BH_FreeOnIO, &tmp->b_state);
tmp = tmp->b_this_page;
} while (tmp != bh);
restore_flags(flags);
}
/*
* Start I/O on a page.
* This function expects the page to be locked and may return before I/O is complete.
* You then have to check page->locked, page->uptodate, and maybe wait on page->wait.
*/
int brw_page(int rw, unsigned long address, kdev_t dev, int b[], int size, int bmap)
{
struct buffer_head *bh, *prev, *next, *arr[MAX_BUF_PER_PAGE];
......@@ -1095,10 +1139,20 @@ int brw_page(int rw, unsigned long address, kdev_t dev, int b[], int size, int b
struct page *page;
page = mem_map + MAP_NR(address);
if (!PageLocked(page))
panic("brw_page: page not locked for I/O");
clear_bit(PG_uptodate, &page->flags);
/*
* Allocate buffer heads pointing to this page, just for I/O.
* They do _not_ show up in the buffer hash table!
* They are _not_ registered in page->buffers either!
*/
bh = create_buffers(address, size);
if (!bh)
if (!bh) {
clear_bit(PG_locked, &page->flags);
wake_up(&page->wait);
return -ENOMEM;
}
nr = 0;
next = bh;
do {
......@@ -1148,33 +1202,32 @@ int brw_page(int rw, unsigned long address, kdev_t dev, int b[], int size, int b
} while (prev = next, (next = next->b_this_page) != NULL);
prev->b_this_page = bh;
if (nr)
if (nr) {
ll_rw_block(rw, nr, arr);
else {
unsigned long flags;
/* The rest of the work is done in mark_buffer_uptodate()
* and unlock_buffer(). */
} else {
clear_bit(PG_locked, &page->flags);
set_bit(PG_uptodate, &page->flags);
wake_up(&page->wait);
next = bh;
save_flags(flags);
cli();
do {
next->b_next_free = reuse_list;
reuse_list = next;
next = next->b_this_page;
} while (next != bh);
restore_flags(flags);
free_async_buffers(bh);
after_unlock_page(page);
}
++current->maj_flt;
return 0;
}
/*
* This is called by end_request() when I/O has completed.
*/
void mark_buffer_uptodate(struct buffer_head * bh, int on)
{
if (on) {
struct buffer_head *tmp = bh;
int page_uptodate = 1;
set_bit(BH_Uptodate, &bh->b_state);
/* If a page has buffers and all these buffers are uptodate,
* then the page is uptodate. */
do {
if (!test_bit(BH_Uptodate, &tmp->b_state)) {
page_uptodate = 0;
......@@ -1188,10 +1241,12 @@ void mark_buffer_uptodate(struct buffer_head * bh, int on)
clear_bit(BH_Uptodate, &bh->b_state);
}
/*
* This is called by end_request() when I/O has completed.
*/
void unlock_buffer(struct buffer_head * bh)
{
struct buffer_head *tmp;
unsigned long flags;
struct page *page;
clear_bit(BH_Lock, &bh->b_state);
......@@ -1199,6 +1254,7 @@ void unlock_buffer(struct buffer_head * bh)
if (!test_bit(BH_FreeOnIO, &bh->b_state))
return;
/* This is a temporary buffer used for page I/O. */
page = mem_map + MAP_NR(bh->b_data);
if (!PageLocked(page)) {
printk ("Whoops: unlock_buffer: "
......@@ -1218,31 +1274,11 @@ void unlock_buffer(struct buffer_head * bh)
if (test_bit(BH_Lock, &tmp->b_state) || tmp->b_count)
return;
}
/* OK, go ahead and complete the async IO on this page. */
save_flags(flags);
/* OK, the async IO on this page is complete. */
clear_bit(PG_locked, &page->flags);
wake_up(&page->wait);
cli();
tmp = bh;
do {
if (!test_bit(BH_FreeOnIO, &tmp->b_state)) {
printk ("Whoops: unlock_buffer: "
"async IO mismatch on page.\n");
restore_flags(flags);
return;
}
tmp->b_next_free = reuse_list;
reuse_list = tmp;
clear_bit(BH_FreeOnIO, &tmp->b_state);
tmp = tmp->b_this_page;
} while (tmp != bh);
restore_flags(flags);
if (clear_bit(PG_freeafter, &page->flags)) {
extern int nr_async_pages;
nr_async_pages--;
free_page(page_address(page));
}
free_async_buffers(bh);
after_unlock_page(page);
wake_up(&buffer_wait);
}
......@@ -1262,6 +1298,7 @@ int generic_readpage(struct inode * inode, struct page * page)
address = page_address(page);
page->count++;
set_bit(PG_locked, &page->flags);
set_bit(PG_free_after, &page->flags);
i = PAGE_SIZE >> inode->i_sb->s_blocksize_bits;
block = page->offset >> inode->i_sb->s_blocksize_bits;
......@@ -1275,7 +1312,6 @@ int generic_readpage(struct inode * inode, struct page * page)
/* IO start */
brw_page(READ, address, inode->i_dev, nr, inode->i_sb->s_blocksize, 1);
free_page(address);
return 0;
}
......
......@@ -390,7 +390,7 @@ static int check_idq(struct dquot *dquot, short type, u_long short inodes)
(dquot->dq_curinodes + inodes) > dquot->dq_isoftlimit &&
dquot->dq_itime && CURRENT_TIME >= dquot->dq_itime && !fsuser()) {
if (need_print_warning(type, dquot)) {
sprintf(quotamessage, "%s: warning, %s file quota exceeded to long.\r\n",
sprintf(quotamessage, "%s: warning, %s file quota exceeded too long.\r\n",
dquot->dq_mnt->mnt_dirname, quotatypes[type]);
tty_write_message(current->tty, quotamessage);
}
......@@ -428,7 +428,7 @@ static int check_bdq(struct dquot *dquot, short type, u_long blocks)
(dquot->dq_curblocks + blocks) > dquot->dq_bsoftlimit &&
dquot->dq_btime && CURRENT_TIME >= dquot->dq_btime && !fsuser()) {
if (need_print_warning(type, dquot)) {
sprintf(quotamessage, "%s: write failed, %s disk quota exceeded to long.\r\n",
sprintf(quotamessage, "%s: write failed, %s disk quota exceeded too long.\r\n",
dquot->dq_mnt->mnt_dirname, quotatypes[type]);
tty_write_message(current->tty, quotamessage);
}
......
......@@ -385,14 +385,6 @@ int open_namei(const char * pathname, int flag, int mode,
iput(dir);
return error;
}
/* SunOS, Solaris 2.x and HPUX all deny open() on
* an existing file with mandatory locks.
*/
error = locks_verify_locked(inode);
if (error) {
iput(inode);
return error;
}
error = follow_link(dir,inode,flag,mode,&inode);
if (error)
return error;
......@@ -438,18 +430,14 @@ int open_namei(const char * pathname, int flag, int mode,
iput(inode);
return error;
}
#if 0
/*
* In my opinion the mandatory lock check should really be
* here. Only O_TRUNC calls can modify the file contents -
* but none of the commercial OS'es seem to do it this way.
* Refuse to truncate files with mandatory locks held on them
*/
error = locks_verify_locked(inode);
if (error) {
iput(inode);
return error;
}
#endif
if (inode->i_sb && inode->i_sb->dq_op)
inode->i_sb->dq_op->initialize(inode, -1);
......
......@@ -47,6 +47,10 @@
* from being used (thanks to Leo Spiekman)
* Andy Walker : Allow to specify the NFS server in nfs_root
* without giving a path name
* Swen Thümmler : Allow to specify the NFS options in nfs_root
* without giving a path name. Fix BOOTP request
* for domainname (domainname is NIS domain, not
* DNS domain!). Skip dummy devices for BOOTP.
*
*/
......@@ -168,6 +172,7 @@ static int root_dev_open(void)
if (dev->type < ARPHRD_SLIP &&
dev->family == AF_INET &&
!(dev->flags & (IFF_LOOPBACK | IFF_POINTOPOINT)) &&
(0 != strncmp(dev->name, "dummy", 5)) &&
(!user_dev_name[0] || !strcmp(dev->name, user_dev_name))) {
/* First up the interface */
old_flags = dev->flags;
......@@ -622,7 +627,7 @@ static void root_bootp_init_ext(u8 *e)
*e++ = 12; /* Host name request */
*e++ = 32;
e += 32;
*e++ = 15; /* Domain name request */
*e++ = 40; /* NIS Domain name request */
*e++ = 32;
e += 32;
*e++ = 17; /* Boot path */
......@@ -756,7 +761,6 @@ static int root_bootp_string(char *dest, char *src, int len, int max)
static void root_do_bootp_ext(u8 *ext)
{
u8 *c;
static int got_bootp_domain = 0;
#ifdef NFSROOT_BOOTP_DEBUG
printk("BOOTP: Got extension %02x",*ext);
......@@ -775,20 +779,9 @@ static void root_do_bootp_ext(u8 *ext)
memcpy(&gateway.sin_addr.s_addr, ext+1, 4);
break;
case 12: /* Host name */
if (root_bootp_string(system_utsname.nodename, ext+1, *ext, __NEW_UTS_LEN)) {
c = strchr(system_utsname.nodename, '.');
if (c) {
*c++ = 0;
if (!system_utsname.domainname[0]) {
strcpy(system_utsname.domainname, c);
got_bootp_domain = 1;
}
}
}
root_bootp_string(system_utsname.nodename, ext+1, *ext, __NEW_UTS_LEN);
break;
case 15: /* Domain name */
if (got_bootp_domain && *ext && ext[1])
system_utsname.domainname[0] = '\0';
case 40: /* NIS Domain name */
root_bootp_string(system_utsname.domainname, ext+1, *ext, __NEW_UTS_LEN);
break;
case 17: /* Root path */
......@@ -1094,7 +1087,9 @@ static int root_nfs_name(char *name)
printk(KERN_ERR "Root-NFS: Pathname for remote directory too long.\n");
return -1;
}
sprintf(nfs_path, buf, cp);
/* update nfs_path with path from nfsroot=... command line parameter */
if (*buf)
sprintf(nfs_path, buf, cp);
/* Set some default values */
nfs_port = -1;
......
......@@ -117,30 +117,31 @@ struct vm_operations_struct {
*/
typedef struct page {
atomic_t count;
unsigned dirty:16,
age:8;
unsigned flags; /* atomic flags, some possibly updated asynchronously */
struct wait_queue *wait;
struct page *next;
struct page *next_hash;
unsigned long offset;
struct inode *inode;
struct page *write_list;
struct page *prev;
struct page *prev_hash;
struct buffer_head * buffers;
unsigned dirty:16,
age:8;
unsigned long swap_unlock_entry;
unsigned long map_nr; /* page->map_nr == page - mem_map */
} mem_map_t;
/* Page flag bit values */
#define PG_locked 0
#define PG_error 1
#define PG_referenced 2
#define PG_uptodate 3
#define PG_freeafter 4
#define PG_DMA 5
#define PG_reserved 31
#define PG_locked 0
#define PG_error 1
#define PG_referenced 2
#define PG_uptodate 3
#define PG_free_after 4
#define PG_decr_after 5
#define PG_swap_unlock_after 6
#define PG_DMA 7
#define PG_reserved 31
/* Make it prettier to test the above... */
#define PageLocked(page) (test_bit(PG_locked, &(page)->flags))
......@@ -148,10 +149,79 @@ typedef struct page {
#define PageReferenced(page) (test_bit(PG_referenced, &(page)->flags))
#define PageDirty(page) (test_bit(PG_dirty, &(page)->flags))
#define PageUptodate(page) (test_bit(PG_uptodate, &(page)->flags))
#define PageFreeafter(page) (test_bit(PG_freeafter, &(page)->flags))
#define PageFreeAfter(page) (test_bit(PG_free_after, &(page)->flags))
#define PageDecrAfter(page) (test_bit(PG_decr_after, &(page)->flags))
#define PageSwapUnlockAfter(page) (test_bit(PG_swap_unlock_after, &(page)->flags))
#define PageDMA(page) (test_bit(PG_DMA, &(page)->flags))
#define PageReserved(page) (test_bit(PG_reserved, &(page)->flags))
/*
* page->reserved denotes a page which must never be accessed (which
* may not even be present).
*
* page->dma is set for those pages which lie in the range of
* physical addresses capable of carrying DMA transfers.
*
* Multiple processes may "see" the same page. E.g. for untouched
* mappings of /dev/null, all processes see the same page full of
* zeroes, and text pages of executables and shared libraries have
* only one copy in memory, at most, normally.
*
* For the non-reserved pages, page->count denotes a reference count.
* page->count == 0 means the page is free.
* page->count == 1 means the page is used for exactly one purpose
* (e.g. a private data page of one process).
*
* A page may be used by kmalloc() or anyone else who does a
* get_free_page(). In this case the page->count is at least 1, and
* all other fields are unused but should be 0 or NULL. The
* management of this page is the responsibility of the one who uses
* it.
*
* The other pages (we may call them "process pages") are completely
* managed by the Linux memory manager: I/O, buffers, swapping etc.
* The following discussion applies only to them.
*
* A page may belong to an inode's memory mapping. In this case,
* page->inode is the inode, and page->offset is the file offset
* of the page (not necessarily a multiple of PAGE_SIZE).
*
* A page may have buffers allocated to it. In this case,
* page->buffers is a circular list of these buffer heads. Else,
* page->buffers == NULL.
*
* For pages belonging to inodes, the page->count is the number of
* attaches, plus 1 if buffers are allocated to the page.
*
* All pages belonging to an inode make up a doubly linked list
* inode->i_pages, using the fields page->next and page->prev. (These
* fields are also used for freelist management when page->count==0.)
* There is also a hash table mapping (inode,offset) to the page
* in memory if present. The lists for this hash table use the fields
* page->next_hash and page->prev_hash.
*
* All process pages can do I/O:
* - inode pages may need to be read from disk,
* - inode pages which have been modified and are MAP_SHARED may need
* to be written to disk,
* - private pages which have been modified may need to be swapped out
* to swap space and (later) to be read back into memory.
* During disk I/O, page->locked is true. This bit is set before I/O
* and reset when I/O completes. page->wait is a wait queue of all
* tasks waiting for the I/O on this page to complete.
* page->uptodate tells whether the page's contents is valid.
* When a read completes, the page becomes uptodate, unless a disk I/O
* error happened.
* When a write completes, and page->free_after is true, the page is
* freed without any further delay.
*
* For choosing which pages to swap out, inode pages carry a
* page->referenced bit, which is set any time the system accesses
* that page through the (inode,offset) hash table.
* There is also the page->age counter, which implements a linear
* decay (why not an exponential decay?), see swapctl.h.
*/
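As a rough illustration of the count conventions above (a hedged
sketch, not part of this header; it assumes page->count can be read
directly, as with this kernel's non-SMP atomic_t):

	/* Hypothetical helper: classify a page by the rules above. */
	static inline const char * page_state(struct page * page)
	{
		if (PageReserved(page))
			return "reserved";	/* must never be accessed */
		if (page->count == 0)
			return "free";
		if (page->count == 1)
			return "single user";	/* e.g. a private data page */
		return "shared";	/* several mappings and/or buffers */
	}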
extern mem_map_t * mem_map;
/*
......
......@@ -14,7 +14,7 @@
static inline unsigned long page_address(struct page * page)
{
return PAGE_OFFSET + PAGE_SIZE*(page - mem_map);
return PAGE_OFFSET + PAGE_SIZE * page->map_nr;
}
#define PAGE_HASH_BITS 10
......@@ -22,7 +22,7 @@ static inline unsigned long page_address(struct page * page)
#define PAGE_AGE_VALUE 16
extern unsigned long page_cache_size;
extern unsigned long page_cache_size; /* # of pages currently in the hash table */
extern struct page * page_hash_table[PAGE_HASH_SIZE];
/*
......@@ -33,7 +33,7 @@ extern struct page * page_hash_table[PAGE_HASH_SIZE];
*/
static inline unsigned long _page_hashfn(struct inode * inode, unsigned long offset)
{
#define i (((unsigned long) inode)/sizeof(unsigned long))
#define i (((unsigned long) inode)/(sizeof(struct inode) & ~ (sizeof(struct inode) - 1)))
#define o (offset >> PAGE_SHIFT)
#define s(x) ((x)+((x)>>PAGE_HASH_BITS))
return s(i+o) & (PAGE_HASH_SIZE-1);
......
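To show how this hash is consumed, here is a hedged sketch of an
(inode,offset) lookup over page_hash_table, walking the next_hash
chain described in the mm.h comment (find_page_sketch is a
hypothetical name):

	/* Hypothetical sketch: find the page cached for (inode, offset). */
	static inline struct page * find_page_sketch(struct inode * inode,
						     unsigned long offset)
	{
		struct page * p = page_hash_table[_page_hashfn(inode, offset)];

		while (p) {
			if (p->inode == inode && p->offset == offset)
				break;
			p = p->next_hash;
		}
		return p;
	}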
......@@ -54,6 +54,7 @@ extern void rw_swap_page(int, unsigned long, char *, int);
rw_swap_page(READ,(nr),(buf),1)
#define write_swap_page(nr,buf) \
rw_swap_page(WRITE,(nr),(buf),1)
extern void swap_after_unlock_page (unsigned long entry);
/* linux/mm/page_alloc.c */
extern void swap_in(struct task_struct *, struct vm_area_struct *,
......
......@@ -521,7 +521,7 @@ static void parse_options(char *line)
int n;
line += 8;
ROOT_DEV = MKDEV(UNNAMED_MAJOR, 255);
if (line[0] == '/' || (line[0] >= '0' && line[0] <= '9')) {
if (line[0] == '/' || line[0] == ',' || (line[0] >= '0' && line[0] <= '9')) {
strncpy(nfs_root_name, line, sizeof(nfs_root_name));
nfs_root_name[sizeof(nfs_root_name)-1] = '\0';
continue;
......
......@@ -421,6 +421,7 @@ static int shm_map (struct vm_area_struct *shmd)
pmd_t *page_middle;
pte_t *page_table;
unsigned long tmp, shm_sgn;
int error;
/* clear old mappings */
do_munmap(shmd->vm_start, shmd->vm_end - shmd->vm_start);
......@@ -431,6 +432,7 @@ static int shm_map (struct vm_area_struct *shmd)
merge_segments(current, shmd->vm_start, shmd->vm_end);
/* map page range */
error = 0;
shm_sgn = shmd->vm_pte +
SWP_ENTRY(0, (shmd->vm_offset >> PAGE_SHIFT) << SHM_IDX_SHIFT);
flush_cache_range(shmd->vm_mm, shmd->vm_start, shmd->vm_end);
......@@ -440,11 +442,15 @@ static int shm_map (struct vm_area_struct *shmd)
{
page_dir = pgd_offset(shmd->vm_mm,tmp);
page_middle = pmd_alloc(page_dir,tmp);
if (!page_middle)
return -ENOMEM;
if (!page_middle) {
error = -ENOMEM;
break;
}
page_table = pte_alloc(page_middle,tmp);
if (!page_table)
return -ENOMEM;
if (!page_table) {
error = -ENOMEM;
break;
}
set_pte(page_table, __pte(shm_sgn));
}
flush_tlb_range(shmd->vm_mm, shmd->vm_start, shmd->vm_end);
......@@ -712,7 +718,7 @@ int shm_swap (int prio, int dma)
pte_val(page) = shp->shm_pages[idx];
if (!pte_present(page))
goto check_table;
if (dma && !PageDMA(MAP_NR(pte_page(page)) + mem_map))
if (dma && !PageDMA(&mem_map[MAP_NR(pte_page(page))]))
goto check_table;
swap_attempts++;
......
......@@ -246,7 +246,7 @@ asmlinkage int sys_setregid(gid_t rgid, gid_t egid)
(current->egid == egid) ||
(current->sgid == egid) ||
suser())
current->egid = egid;
current->fsgid = current->egid = egid;
else {
current->gid = old_rgid;
return(-EPERM);
......@@ -455,7 +455,7 @@ asmlinkage int sys_setreuid(uid_t ruid, uid_t euid)
(current->euid == euid) ||
(current->suid == euid) ||
suser())
current->euid = euid;
current->fsuid = current->euid = euid;
else {
current->uid = old_ruid;
return(-EPERM);
......
......@@ -464,7 +464,7 @@ if (count > 0) ccount += count;
* Will try asynchronous read-ahead.
* Double the max read ahead size each time.
* That heuristic avoids doing large IO for files that are not really
* accessed sequentialy.
* accessed sequentially.
*/
} else {
try_async = 1;
......
......@@ -569,7 +569,7 @@ unsigned long put_dirty_page(struct task_struct * tsk, unsigned long page, unsig
return 0;
}
set_pte(pte, pte_mkwrite(pte_mkdirty(mk_pte(page, PAGE_COPY))));
/* no need for invalidate */
/* no need for flush_tlb */
return page;
}
......
......@@ -37,7 +37,7 @@ int nr_free_pages = 0;
struct free_area_struct {
struct page list;
unsigned int * map;
unsigned int * map;
};
static struct free_area_struct free_area[NR_MEM_LISTS];
......@@ -143,7 +143,7 @@ do { struct free_area_struct * area = free_area+order; \
do { struct page *prev = &area->list, *ret; \
while (&area->list != (ret = prev->next)) { \
if (!dma || CAN_DMA(ret)) { \
unsigned long map_nr = ret - mem_map; \
unsigned long map_nr = ret->map_nr; \
(prev->next = ret->next)->prev = prev; \
MARK_USED(map_nr, new_order, area); \
nr_free_pages -= 1 << order; \
......@@ -263,6 +263,7 @@ unsigned long free_area_init(unsigned long start_mem, unsigned long end_mem)
do {
--p;
p->flags = (1 << PG_DMA) | (1 << PG_reserved);
p->map_nr = p - mem_map;
} while (p > mem_map);
for (i = 0 ; i < NR_MEM_LISTS ; i++) {
......@@ -310,6 +311,7 @@ void swap_in(struct task_struct * tsk, struct vm_area_struct * vma,
vma->vm_mm->rss++;
tsk->maj_flt++;
if (!write_access && add_to_swap_cache(MAP_NR(page), entry)) {
/* keep swap page allocated for the moment (swap cache) */
set_pte(page_table, mk_pte(page, vma->vm_page_prot));
return;
}
......
......@@ -5,6 +5,7 @@
*
* Swap reorganised 29.12.95,
* Asynchronous swapping added 30.12.95. Stephen Tweedie
* Removed race in async swapping. 14.4.1996. Bruno Haible
*/
#include <linux/mm.h>
......@@ -28,6 +29,18 @@
static struct wait_queue * lock_queue = NULL;
/*
* Reads or writes a swap page.
* wait=1: start I/O and wait for completion. wait=0: start asynchronous I/O.
*
* Important prevention of race condition: The first thing we do is set a lock
* on this swap page, which lasts until I/O completes. This way a
* write_swap_page(entry) immediately followed by a read_swap_page(entry)
* on the same entry will first complete the write_swap_page(). Fortunately,
* not more than one write_swap_page() request can be pending per entry. So
* all races the caller must catch are: multiple read_swap_page() requests
* on the same entry.
*/
void rw_swap_page(int rw, unsigned long entry, char * buf, int wait)
{
unsigned long type, offset;
......@@ -53,6 +66,7 @@ void rw_swap_page(int rw, unsigned long entry, char * buf, int wait)
printk("Trying to swap to unused swap-device\n");
return;
}
/* Make sure we are the only process doing I/O with this swap page. */
while (set_bit(offset,p->swap_lockmap))
sleep_on(&lock_queue);
if (rw == READ)
......@@ -64,12 +78,16 @@ void rw_swap_page(int rw, unsigned long entry, char * buf, int wait)
if (p->swap_device) {
if (!wait) {
page->count++;
set_bit(PG_freeafter, &page->flags);
set_bit(PG_free_after, &page->flags);
set_bit(PG_decr_after, &page->flags);
set_bit(PG_swap_unlock_after, &page->flags);
page->swap_unlock_entry = entry;
nr_async_pages++;
}
ll_rw_page(rw,p->swap_device,offset,buf);
if (wait)
wait_on_page(page);
if (!wait)
return;
wait_on_page(page);
} else if (p->swap_file) {
struct inode *swapf = p->swap_file;
unsigned int zones[PAGE_SIZE/512];
......@@ -114,3 +132,52 @@ void rw_swap_page(int rw, unsigned long entry, char * buf, int wait)
printk("rw_swap_page: lock already cleared\n");
wake_up(&lock_queue);
}
/* This is run when asynchronous page I/O has completed. */
void swap_after_unlock_page (unsigned long entry)
{
unsigned long type, offset;
struct swap_info_struct * p;
type = SWP_TYPE(entry);
if (type >= nr_swapfiles) {
printk("swap_after_unlock_page: bad swap-device\n");
return;
}
p = &swap_info[type];
offset = SWP_OFFSET(entry);
if (offset >= p->max) {
printk("swap_after_unlock_page: weirdness\n");
return;
}
if (!clear_bit(offset,p->swap_lockmap))
printk("swap_after_unlock_page: lock already cleared\n");
wake_up(&lock_queue);
}
/*
* Swap partitions are now read via brw_page. ll_rw_page is an
* asynchronous function now --- we must call wait_on_page afterwards
* if synchronous IO is required.
*/
void ll_rw_page(int rw, kdev_t dev, unsigned long page, char * buffer)
{
int block = page;
switch (rw) {
case READ:
break;
case WRITE:
if (is_read_only(dev)) {
printk("Can't page to read-only device %s\n",
kdevname(dev));
return;
}
break;
default:
panic("ll_rw_page: bad block dev cmd, must be R/W");
}
if (set_bit(PG_locked, &mem_map[MAP_NR(buffer)].flags))
panic ("ll_rw_page: page already locked");
brw_page(rw, (unsigned long) buffer, dev, &block, PAGE_SIZE, 0);
}
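As the comment above notes, synchronous callers must now pair
ll_rw_page() with wait_on_page(); a hedged usage sketch
(read_page_sync is a hypothetical helper):

	/* Hypothetical helper: synchronous page read on top of the
	 * asynchronous ll_rw_page(). */
	static void read_page_sync(kdev_t dev, unsigned long page_nr,
				   char * buffer)
	{
		ll_rw_page(READ, dev, page_nr, buffer);
		wait_on_page(mem_map + MAP_NR(buffer));
	}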
......@@ -636,46 +636,58 @@ int ip_fw_demasquerade(struct sk_buff **skb_p, struct device *dev)
static int ip_msqhst_procinfo(char *buffer, char **start, off_t offset,
int length, int unused)
{
off_t pos=0, begin=0;
off_t pos=0, begin;
struct ip_masq *ms;
unsigned long flags;
char temp[129];
int idx = 0;
int len=0;
len=sprintf(buffer,"Prc FromIP FPrt ToIP TPrt Masq Init-seq Delta PDelta Expires (free=%d,%d)\n",
ip_masq_free_ports[0], ip_masq_free_ports[1]);
if (offset < 128)
{
sprintf(temp,
"Prc FromIP FPrt ToIP TPrt Masq Init-seq Delta PDelta Expires (free=%d,%d)",
ip_masq_free_ports[0], ip_masq_free_ports[1]);
len = sprintf(buffer, "%-127s\n", temp);
}
pos = 128;
save_flags(flags);
cli();
for(idx = 0; idx < IP_MASQ_TAB_SIZE; idx++)
for(ms = ip_masq_m_tab[idx]; ms ; ms = ms->m_link)
{
int timer_active = del_timer(&ms->timer);
int timer_active;
pos += 128;
if (pos <= offset)
continue;
timer_active = del_timer(&ms->timer);
if (!timer_active)
ms->timer.expires = jiffies;
len+=sprintf(buffer+len,"%s %08lX:%04X %08lX:%04X %04X %08X %6d %6d %lu\n",
sprintf(temp,"%s %08lX:%04X %08lX:%04X %04X %08X %6d %6d %7lu",
masq_proto_name(ms->protocol),
ntohl(ms->saddr),ntohs(ms->sport),
ntohl(ms->daddr),ntohs(ms->dport),
ntohl(ms->saddr), ntohs(ms->sport),
ntohl(ms->daddr), ntohs(ms->dport),
ntohs(ms->mport),
ms->out_seq.init_seq,ms->out_seq.delta,ms->out_seq.previous_delta,ms->timer.expires-jiffies);
ms->out_seq.init_seq,
ms->out_seq.delta,
ms->out_seq.previous_delta,
ms->timer.expires-jiffies);
if (timer_active)
add_timer(&ms->timer);
len += sprintf(buffer+len, "%-127s\n", temp);
pos=begin+len;
if(pos<offset)
{
len=0;
begin=pos;
}
if(pos>offset+length)
break;
if(len >= length)
goto done;
}
done:
restore_flags(flags);
*start=buffer+(offset-begin);
len-=(offset-begin);
begin = len - (pos - offset);
*start = buffer + begin;
len -= begin;
if(len>length)
len=length;
len = length;
return len;
}
......
......@@ -394,7 +394,7 @@ static void tcp_conn_request(struct sock *sk, struct sk_buff *skb,
skb_queue_head_init(&newsk->back_log);
newsk->rtt = 0; /*TCP_CONNECT_TIME<<3*/
newsk->rto = TCP_TIMEOUT_INIT;
newsk->mdev = 0;
newsk->mdev = TCP_TIMEOUT_INIT<<1;
newsk->max_window = 0;
newsk->cong_window = 1;
newsk->cong_count = 0;
......@@ -598,7 +598,6 @@ static int tcp_ack(struct sock *sk, struct tcphdr *th, u32 ack, int len)
* in shutdown state
* 2 - data from retransmit queue was acked and removed
* 4 - window shrunk or data from retransmit queue was acked and removed
* 8 - we want to do a fast retransmit. One packet only.
*/
if(sk->zapped)
......@@ -709,18 +708,52 @@ static int tcp_ack(struct sock *sk, struct tcphdr *th, u32 ack, int len)
* This will allow us to do fast retransmits.
*/
if (sk->rcv_ack_seq == ack && sk->window_seq == window_seq && !(flag&1))
/* We are looking for duplicate ACKs here.
* An ACK is a duplicate if:
* (1) it has the same sequence number as the largest number we've seen,
* (2) it has the same window as the last ACK,
* (3) we have outstanding data that has not been ACKed
* (4) The packet was not carrying any data.
* I've tried to order these from most likely to fail to least
* likely to fail.
* [These are the rules BSD stacks use to determine if an ACK is a
* duplicate.]
*/
if (sk->rcv_ack_seq == ack
&& sk->window_seq == window_seq
&& !(flag&1)
&& before(ack, sk->sent_seq))
{
/*
* We only want to short cut this once, many
* ACKs may still come, we'll do a normal transmit
* for these ACKs.
/* See draft-stevens-tcpca-spec-01 for explanation
* of what we are doing here.
*/
if (++sk->rcv_ack_cnt == MAX_DUP_ACKS+1)
flag |= 8; /* flag for a fast retransmit */
sk->rcv_ack_cnt++;
if (sk->rcv_ack_cnt == MAX_DUP_ACKS+1) {
sk->ssthresh = max(sk->cong_window >> 1, 2);
sk->cong_window = sk->ssthresh+MAX_DUP_ACKS+1;
tcp_do_retransmit(sk,0);
/* reduce the count. We don't want to be
* seen to be in "retransmit" mode if we
* are doing a fast retransmit.
*/
sk->retransmits--;
} else if (sk->rcv_ack_cnt > MAX_DUP_ACKS+1) {
sk->cong_window++;
/*
* At this point we are supposed to transmit a NEW
* packet (not retransmit the missing packet,
* this would only get us into a retransmit war.)
* I think that having just adjusted cong_window
* we will transmit the new packet below.
*/
}
}
else
{
if (sk->rcv_ack_cnt > MAX_DUP_ACKS) {
sk->cong_window = sk->ssthresh;
}
sk->window_seq = window_seq;
sk->rcv_ack_seq = ack;
sk->rcv_ack_cnt = 1;
......@@ -1046,8 +1079,7 @@ static int tcp_ack(struct sock *sk, struct tcphdr *th, u32 ack, int len)
if (((!flag) || (flag&4)) && sk->send_head != NULL &&
(((flag&2) && sk->retransmits) ||
(flag&8) ||
(sk->send_head->when + sk->rto < jiffies)))
(sk->send_head->when + sk->rto < jiffies)))
{
if(sk->send_head->when + sk->rto < jiffies)
tcp_retransmit(sk,0);
......