Finish implementation of proper handling of allocated_size vs initialized_size vs data_size (i.e. i_size) everywhere.

The host of the mftbmp address space mapping is now set to the ntfs volume so that it can be retrieved in the async i/o completion handler. Hopefully this will not cause problems; if it does, we will need to use a fake inode.
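For an ordinary, non-compressed attribute, the three sizes in the commit message nest as initialized_size <= data_size (i_size) <= allocated_size, and bytes past initialized_size must read back as zeros even if they are allocated on disk. Below is a minimal sketch, not the driver's code, of the per-block decision that the rewritten readpage functions and their completion handlers make between them: a block at or past allocated_size gets no i/o and is zeroed, and a block that extends past initialized_size is read and then zeroed from initialized_size onwards. The enum and `classify_block()` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: what happens to one block of a page. */
enum block_action {
	BLOCK_READ,           /* read from disk, keep every byte */
	BLOCK_READ_ZERO_TAIL, /* read from disk, then zero from *zero_from on */
	BLOCK_ZERO_NO_IO      /* no i/o at all, zero the whole block */
};

/*
 * file_ofs:  byte offset of the block within the attribute
 * bsize:     block size in bytes
 * zero_from: for BLOCK_READ_ZERO_TAIL, the offset within the block at
 *            which zeroing starts (plays the role of "ofs" in the
 *            end_buffer_read_*_async() handlers)
 */
static enum block_action classify_block(int64_t file_ofs, uint32_t bsize,
		int64_t initialized_size, int64_t allocated_size,
		uint32_t *zero_from)
{
	assert(initialized_size <= allocated_size);
	*zero_from = 0;
	/* Outside the allocation: the iblock >= lblock case, no i/o. */
	if (file_ofs >= allocated_size)
		return BLOCK_ZERO_NO_IO;
	/* Entirely below initialized_size: nothing to zero. */
	if (file_ofs + bsize <= initialized_size)
		return BLOCK_READ;
	/* Straddles initialized_size: keep the initialized head. */
	if (file_ofs < initialized_size)
		*zero_from = (uint32_t)(initialized_size - file_ofs);
	/* Otherwise the block lies wholly past it and is zeroed entirely. */
	return BLOCK_READ_ZERO_TAIL;
}
```

For example, with 1024-byte blocks, initialized_size 1500 and allocated_size 4096, the block at offset 1024 is read and then zeroed from byte 476 of the block, while the block at offset 4096 is zeroed without any i/o.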

Commit d3c671c7 authored by Anton Altaparmakov on Mar 23, 2002; committed by Anton Altaparmakov on Mar 23, 2002.

Showing with 367 additions and 306 deletions

fs/ntfs/ChangeLog +17 -14

fs/ntfs/aops.c +344 -290

fs/ntfs/super.c +6 -2

--- a/fs/ntfs/ChangeLog
+++ b/fs/ntfs/ChangeLog
 ToDo:
-	- Audit for allocated_size vs initialized_size vs data_size (i.e.
-	  i_size) in whole driver.
-	  Need to enforce limits and zeroes need to be written when overflow is
-	  detected. We CANNOT use block_read_full_page() at all anywhere! This
-	  is because initialized_size can lie within a block and ntfs_get_block
-	  has no way to tell block_read_full_page about it. So our readpage
-	  functions need to clone block_read_full_page and modify it to cope
-	  with the significance of the different attribute sizes.
-	  Still need to go through:
-		aops.c
 	- Find and fix bugs.
 	- W.r.t. s_maxbytes still need to be careful on reading/truncating as
 	  there are dragons lurking in the details, e.g. read_inode() currently
@@ -24,13 +14,18 @@ ToDo:
 	  strictly a speed optimization. Obviously need to keep the ->run_list
 	  locked or RACE. load_attribute_list() already performs such an
 	  optimization so use the same optimization where desired.
-	- Optimize mft_readpage() to not do i/o on buffer heads beyond
+	- Optimize all our readpage functions to not do i/o on buffer heads
-	  initialized_size, just zero the buffer heads instead. Question: How
+	  beyond initialized_size, just zero the buffer heads instead.
-	  to setup the buffer heads so they point to the on disk location
+	  Question: How to setup the buffer heads so they point to the on disk
-	  correctly (after all they are allocated) but are not read from disk?
+	  location correctly (after all they are allocated) but are not read
+	  from disk?
 	- Consider if ntfs_file_read_compressed_block() shouldn't be coping
 	  with initialized_size < data_size. I don't think it can happen but
 	  it requires more careful consideration.
+	- CLEANUP: Modularise and reuse code in aops.c. At the moment we have
+	  several copies of almost identical functions and the functions are
+	  quite big. Modularising them a bit, e.g. a-la get_block(), will make
+	  them cleaner and make code reuse easier.
 tng-0.0.9 - Work in progress
@@ -55,6 +50,14 @@ tng-0.0.9 - Work in progress
 	- Rename aops.c::end_buffer_read_index_async() to
 	  end_buffer_read_mst_async() and optimize the overflow checking and
 	  handling.
+	- Use the host of the mftbmp address space mapping to hold the ntfs
+	  volume. This is needed so the async i/o completion handler can
+	  retrieve a pointer to the volume. Hopefully this will not cause
+	  problems elsewhere in the kernel... Otherwise will need to use a
+	  fake inode.
+	- Complete implementation of proper handling of allocated_size vs
+	  initialized_size vs data_size (i.e. i_size) in whole driver.
+	  Basically aops.c is now completely rewritten.
 - Don't bother getting the run list in inode.c::ntfs_read_inode().
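The ChangeLog entry above stores an ntfs_volume pointer in a field typed `struct inode *`, so any generic code that dereferences `mapping->host` as an inode would read garbage. The fallback it mentions, a fake inode, could look roughly like the sketch below: embed an inode-typed member in the volume, point `host` at it, and recover the volume with `offsetof()` arithmetic (the idea behind the kernel's `container_of()`). All struct layouts and names here (`fake_inode`, `fake_mapping`, `fake_volume`, `CONTAINER_OF`, `vol_from_host()`) are simplified, hypothetical stand-ins, not the real kernel types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the kernel types. */
struct fake_inode {
	unsigned char i_blkbits;
};

struct fake_mapping {
	struct fake_inode *host; /* generic code expects a real inode here */
};

struct fake_volume {
	int64_t mftbmp_initialized_size;
	struct fake_inode mftbmp_ino;     /* the embedded "fake inode" */
	struct fake_mapping mftbmp_mapping;
};

/* Same idea as the kernel's container_of(). */
#define CONTAINER_OF(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void setup_mftbmp_mapping(struct fake_volume *vol)
{
	/* host now points at a genuine inode-typed object. */
	vol->mftbmp_ino.i_blkbits = 9; /* illustrative value */
	vol->mftbmp_mapping.host = &vol->mftbmp_ino;
}

/* What a completion handler would do instead of casting host. */
static struct fake_volume *vol_from_host(struct fake_inode *host)
{
	assert(host != NULL);
	return CONTAINER_OF(host, struct fake_volume, mftbmp_ino);
}
```

The trade-off: the cast used in this commit costs nothing extra but relies on nobody treating `host` as an inode, whereas an embedded inode costs one inode-sized object per volume but keeps the pointer's type honest.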

--- a/fs/ntfs/aops.c
+++ b/fs/ntfs/aops.c
@@ -29,116 +29,183 @@
 #include "ntfs.h"
-/**
+#define MAX_BUF_PER_PAGE (PAGE_CACHE_SIZE / 512)
- * ntfs_file_get_block - read/create inode @ino block @blk into buffer head @bh
- * @ino:	inode to read/create block from/onto
+/*
- * @blk:	block number to read/create
+ * Async io completion handler for accessing files. Adapted from
- * @bh:		buffer in which to return the read/created block
+ * end_buffer_read_mst_async().
- * @create:	if not zero, create the block if it doesn't exist already
- * 
- * ntfs_file_get_block() remaps the block number @blk of the inode @ino from
- * file offset into disk block position and returns the result in the buffer
- * head @bh. If the block doesn't exist and create is not zero,
- * ntfs_file_get_block() creates the block before returning it. @blk is the
- * file offset divided by the file system block size, as defined by the field
- * s_blocksize in the super block reachable by @ino->i_sb.
- *
- * If the block doesn't exist, create is true, and the inode is marked
- * for synchronous I/O, then we will wait for creation to complete before
- * returning the created block (which will be zeroed). Otherwise we only
- * schedule creation and return. - FIXME: Need to have a think whether this is
- * really necessary. What would happen if we didn't actually write the block to
- * disk at this stage? We would just save writing a block full of zeroes to the
- * device. - We can always write it synchronously when the user actually writes
- * some data into it. - But this might result in random data being returned
- * should the computer crash. - Hmmm. - This requires more thought.
- *
- * Obviously the block is only created if the file system super block flag
- * MS_RDONLY is not set and only if NTFS write support is compiled in.
 */
-int ntfs_file_get_block(struct inode *vi, const sector_t blk,
+static void end_buffer_read_file_async(struct buffer_head *bh, int uptodate)
-		struct buffer_head *bh, const int create)
+{
+	static spinlock_t page_uptodate_lock = SPIN_LOCK_UNLOCKED;
+	unsigned long flags;
+	struct buffer_head *tmp;
+	struct page *page;
+	mark_buffer_uptodate(bh, uptodate);
+	page = bh->b_page;
+	if (likely(uptodate)) {
+		s64 file_ofs;
+		ntfs_inode *ni = NTFS_I(page->mapping->host);
+		file_ofs = (page->index << PAGE_CACHE_SHIFT) + bh_offset(bh);
+		if (file_ofs + bh->b_size > ni->initialized_size) {
+			char *addr;
+			int ofs = 0;
+			if (file_ofs < ni->initialized_size)
+				ofs = ni->initialized_size - file_ofs;
+			addr = kmap_atomic(page, KM_BIO_IRQ);
+			memset(addr + bh_offset(bh) + ofs, 0, bh->b_size - ofs);
+			flush_dcache_page(page);
+			kunmap_atomic(addr, KM_BIO_IRQ);
+		}
+	} else
+		SetPageError(page);
+	spin_lock_irqsave(&page_uptodate_lock, flags);
+	mark_buffer_async(bh, 0);
+	unlock_buffer(bh);
+	tmp = bh->b_this_page;
+	while (tmp != bh) {
+		if (buffer_locked(tmp)) {
+			if (buffer_async(tmp))
+				goto still_busy;
+		} else if (!buffer_uptodate(tmp))
+			SetPageError(page);
+		tmp = tmp->b_this_page;
+	}
+	spin_unlock_irqrestore(&page_uptodate_lock, flags);
+	if (!PageError(page))
+		SetPageUptodate(page);
+	UnlockPage(page);
+	return;
+still_busy:
+	spin_unlock_irqrestore(&page_uptodate_lock, flags);
+	return;
+}
+/* NTFS version of block_read_full_page(). Adapted from ntfs_mst_readpage(). */
+static int ntfs_file_read_block(struct page *page)
 {
-	ntfs_inode *ni = NTFS_I(vi);
-	ntfs_volume *vol = ni->vol;
 	VCN vcn;
 	LCN lcn;
-	int ofs;
+	ntfs_inode *ni;
-	BOOL is_retry = FALSE;
+	ntfs_volume *vol;
+	struct buffer_head *bh, *head, *arr[MAX_BUF_PER_PAGE];
+	sector_t iblock, lblock;
+	unsigned int blocksize, blocks, vcn_ofs;
+	int i, nr;
+	unsigned char blocksize_bits;
-	bh->b_dev = vi->i_dev;
+	ni = NTFS_I(page->mapping->host);
-	bh->b_blocknr = -1;
+	vol = ni->vol;
-	bh->b_state &= ~(1UL << BH_Mapped);
-	/* Convert @blk into a virtual cluster number (vcn) and offset. */
+	blocksize_bits = VFS_I(ni)->i_blkbits;
-	vcn = (VCN)blk << vol->sb->s_blocksize_bits >> vol->cluster_size_bits;
+	blocksize = 1 << blocksize_bits;
-	ofs = ((VCN)blk << vol->sb->s_blocksize_bits) & vol->cluster_size_mask;
-	/* Check for initialized size overflow. */
+	create_empty_buffers(page, blocksize);
-	if ((vcn << vol->cluster_size_bits) + ofs >= ni->initialized_size)
+	bh = head = page->buffers;
-		return 0;
+	if (!bh)
-	/*
+		return -ENOMEM;
-	 * Further, we need to be checking i_size and be just doing the
-	 * following if it is zero or we are out of bounds:
-	 * 	bh->b_blocknr = -1UL;
-	 * 	raturn 0;
-	 * Also, we need to deal with attr->initialized_size.
-	 * Also, we need to deal with the case where the last block is
-	 * requested but it is not initialized fully, i.e. it is a partial
-	 * block. We then need to read it synchronously and fill the remainder
-	 * with zero. Can't do it other way round as reading from the block
-	 * device would result in our pre-zeroed data to be overwritten as the
-	 * whole block is loaded from disk.
-	 * Also, need to lock run_list in inode so we don't have someone
-	 * reading it at the same time as someone else writing it.
-	 */
-retry_remap:
+	blocks = PAGE_CACHE_SIZE >> blocksize_bits;
+	iblock = page->index << (PAGE_CACHE_SHIFT - blocksize_bits);
+	lblock = (ni->allocated_size + blocksize - 1) >> blocksize_bits;
-	/* Convert the vcn to the corresponding logical cluster number (lcn). */
+#ifdef DEBUG
+	if (unlikely(!ni->mft_no)) {
+		ntfs_error(vol->sb, "NTFS: Attempt to access $MFT! This is a "
+				"very serious bug! Denying access...");
+		return -EACCES;
+	}
+#endif
+	/* Loop through all the buffers in the page. */
+	nr = i = 0;
+	do {
+		BUG_ON(buffer_mapped(bh) || buffer_uptodate(bh));
+		bh->b_dev = VFS_I(ni)->i_dev;
+		/* Is the block within the allowed limits? */
+		if (iblock < lblock) {
+			BOOL is_retry = FALSE;
+			/* Convert iblock into corresponding vcn and offset. */
+			vcn = (VCN)iblock << blocksize_bits >>
+					vol->cluster_size_bits;
+			vcn_ofs = ((VCN)iblock << blocksize_bits) &
+					vol->cluster_size_mask;
+retry_remap:
+			/* Convert the vcn to the corresponding lcn. */
 			down_read(&ni->run_list.lock);
 			lcn = vcn_to_lcn(ni->run_list.rl, vcn);
 			up_read(&ni->run_list.lock);
 			/* Successful remap. */
 			if (lcn >= 0) {
-		/* Setup the buffer head to describe the correct block. */
+				/* Setup buffer head to correct block. */
-#if 0
+				bh->b_blocknr = ((lcn << vol->cluster_size_bits)
-		/* Already the case when we are called. */
+						+ vcn_ofs) >> blocksize_bits;
-		bh->b_dev = vfs_ino->i_dev;
-#endif
-		bh->b_blocknr = ((lcn << vol->cluster_size_bits) + ofs) >>
-				vol->sb->s_blocksize_bits;
 				bh->b_state |= (1UL << BH_Mapped);
-		return 0;
+				arr[nr++] = bh;
-	}
+				continue;
-	/* It is a hole. */
-	if (lcn == LCN_HOLE) {
-		if (create)
-			/* FIXME: We should instantiate the hole. */
-			return -EROFS;
-		/*
-		 * Hole. Set the block number to -1 (it is ignored but
-		 * just in case and might help with debugging).
-		 */
-		bh->b_blocknr = -1UL;
-		bh->b_state &= ~(1UL << BH_Mapped);
-		return 0;
 			}
-	/* If on first try and the run list was not mapped, map it and retry. */
+			/* It is a hole, need to zero it. */
+			if (lcn == LCN_HOLE)
+				goto handle_hole;
+			/* If first try and run list unmapped, map and retry. */
 			if (!is_retry && lcn == LCN_RL_NOT_MAPPED) {
-		int err = map_run_list(ni, vcn);
-		if (!err) {
 				is_retry = TRUE;
+				if (!map_run_list(ni, vcn))
 					goto retry_remap;
 			}
-		return err;
+			/* Hard error, zero out region. */
+			SetPageError(page);
+			ntfs_error(vol->sb, "vcn_to_lcn(vcn = 0x%Lx) failed "
+					"with error code 0x%Lx%s.",
+					(long long)vcn, (long long)-lcn,
+					is_retry ? " even after retrying" : "");
+			// FIXME: Depending on vol->on_errors, do something.
 		}
-	if (create)
+		/*
-		/* FIXME: We might need to extend the attribute. */
+		 * Either iblock was outside lblock limits or vcn_to_lcn()
-		return -EROFS;
+		 * returned error. Just zero that portion of the page and set
-	/* Error. */
+		 * the buffer uptodate.
-	return -EIO;
+		 */
+handle_hole:
+		bh->b_blocknr = -1UL;
+		bh->b_state &= ~(1UL << BH_Mapped);
+		memset(kmap(page) + i * blocksize, 0, blocksize);
+		flush_dcache_page(page);
+		kunmap(page);
+		set_bit(BH_Uptodate, &bh->b_state);
+	} while (i++, iblock++, (bh = bh->b_this_page) != head);
+	/* Check we have at least one buffer ready for i/o. */
+	if (nr) {
+		/* Lock the buffers. */
+		for (i = 0; i < nr; i++) {
+			struct buffer_head *tbh = arr[i];
+			lock_buffer(tbh);
+			tbh->b_end_io = end_buffer_read_file_async;
+			mark_buffer_async(tbh, 1);
+		}
+		/* Finally, start i/o on the buffers. */
+		for (i = 0; i < nr; i++)
+			submit_bh(READ, arr[i]);
+		return 0;
+	}
+	/* No i/o was scheduled on any of the buffers. */
+	if (!PageError(page))
+		SetPageUptodate(page);
+	else /* Signal synchronous i/o error. */
+		nr = -EIO;
+	UnlockPage(page);
+	return nr;
 }
 /**
@@ -162,29 +229,17 @@ int ntfs_file_get_block(struct inode *vi, const sector_t blk,
 static int ntfs_file_readpage(struct file *file, struct page *page)
 {
 	s64 attr_pos;
-	struct inode *vi;
 	ntfs_inode *ni;
-	char *page_addr;
+	char *addr;
-	u32 attr_len;
-	int err = 0;
 	attr_search_context *ctx;
 	MFT_RECORD *mrec;
+	u32 attr_len;
+	int err = 0;
-	//ntfs_debug("Entering for index 0x%lx.", page->index);
-	/* The page must be locked. */
 	if (!PageLocked(page))
 		PAGE_BUG(page);
-	/*
-	 * Get the VFS and ntfs inodes associated with the page. This could
+	ni = NTFS_I(page->mapping->host);
-	 * be achieved by looking at f->f_dentry->d_inode, too, unless the
-	 * dentry is negative, but could it really be negative considering we
-	 * are reading from the opened file? - NOTE: We can't get it from file,
-	 * because we can use ntfs_file_readpage on inodes not representing
-	 * open files!!! So basically we never ever touch file or at least we
-	 * must check it is not NULL before doing so.
-	 */
-	vi = page->mapping->host;
-	ni = NTFS_I(vi);
 	/* Is the unnamed $DATA attribute resident? */
 	if (test_bit(NI_NonResident, &ni->state)) {
@@ -195,35 +250,29 @@ static int ntfs_file_readpage(struct file *file, struct page *page)
 			err = -EACCES;
 			goto unl_err_out;
 		}
-		if (!test_bit(NI_Compressed, &ni->state))
-			/* Normal data stream, use generic functionality. */
-			return block_read_full_page(page, ntfs_file_get_block);
 		/* Compressed data stream. Handled in compress.c. */
+		if (test_bit(NI_Compressed, &ni->state))
 			return ntfs_file_read_compressed_block(page);
+		/* Normal data stream. */
+		return ntfs_file_read_block(page);
 	}
 	/* Attribute is resident, implying it is not compressed or encrypted. */
-	/*
-	 * Make sure the inode doesn't disappear under us. - Shouldn't be
-	 * needed as the page is locked.
-	 */
-	// atomic_inc(&vfs_ino->i_count);
 	/* Map, pin and lock the mft record for reading. */
 	mrec = map_mft_record(READ, ni);
 	if (IS_ERR(mrec)) {
 		err = PTR_ERR(mrec);
-		goto dec_unl_err_out;
+		goto unl_err_out;
 	}
 	err = get_attr_search_ctx(&ctx, ni, mrec);
 	if (err)
-		goto unm_dec_unl_err_out;
+		goto unm_unl_err_out;
 	/* Find the data attribute in the mft record. */
 	if (!lookup_attr(AT_DATA, NULL, 0, 0, 0, NULL, 0, ctx)) {
 		err = -ENOENT;
-		goto put_unm_dec_unl_err_out;
+		goto put_unm_unl_err_out;
 	}
 	/* Starting position of the page within the attribute value. */
@@ -232,173 +281,188 @@ static int ntfs_file_readpage(struct file *file, struct page *page)
 	/* The total length of the attribute value. */
 	attr_len = le32_to_cpu(ctx->attr->_ARA(value_length));
-	/* Map the page so we can access it. */
+	addr = kmap(page);
-	page_addr = kmap(page);
+	/* Copy over in bounds data, zeroing the remainder of the page. */
-	/*
-	 * TODO: Find out whether we really need to zero the page. If it is
-	 * initialized to zero already we could skip this.
-	 */
-	/* 
-	 * If we are asking for any in bounds data, copy it over, zeroing the
-	 * remainder of the page if necessary. Otherwise just zero the page.
-	 */
 	if (attr_pos < attr_len) {
 		u32 bytes = attr_len - attr_pos;
 		if (bytes > PAGE_CACHE_SIZE)
 			bytes = PAGE_CACHE_SIZE;
 		else if (bytes < PAGE_CACHE_SIZE)
-			memset(page_addr + bytes, 0, PAGE_CACHE_SIZE - bytes);
+			memset(addr + bytes, 0, PAGE_CACHE_SIZE - bytes);
 		/* Copy the data to the page. */
-		memcpy(page_addr, attr_pos + (char*)ctx->attr +
+		memcpy(addr, attr_pos + (char*)ctx->attr +
-				le16_to_cpu(ctx->attr->_ARA(value_offset)), bytes);
+				le16_to_cpu(ctx->attr->_ARA(value_offset)),
+				bytes);
 	} else
-		memset(page_addr, 0, PAGE_CACHE_SIZE);
+		memset(addr, 0, PAGE_CACHE_SIZE);
 	kunmap(page);
-	/* We are done. */
 	SetPageUptodate(page);
-put_unm_dec_unl_err_out:
+put_unm_unl_err_out:
 	put_attr_search_ctx(ctx);
-unm_dec_unl_err_out:
+unm_unl_err_out:
-	/* Unlock, unpin and release the mft record. */
 	unmap_mft_record(READ, ni);
-dec_unl_err_out:
-	/* Release the inode. - Shouldn't be needed as the page is locked. */
-	// atomic_dec(&vfs_ino->i_count);
 unl_err_out:
 	UnlockPage(page);
 	return err;
 }
 /*
- * Specialized get block for reading the mft bitmap. Adapted from
+ * Async io completion handler for accessing mft bitmap. Adapted from
- * ntfs_file_get_block.
+ * end_buffer_read_mst_async().
 */
-static int ntfs_mftbmp_get_block(ntfs_volume *vol, const sector_t blk,
+static void end_buffer_read_mftbmp_async(struct buffer_head *bh, int uptodate)
-		struct buffer_head *bh)
 {
-	VCN vcn = (VCN)blk << vol->sb->s_blocksize_bits >>
+	static spinlock_t page_uptodate_lock = SPIN_LOCK_UNLOCKED;
-			vol->cluster_size_bits;
+	unsigned long flags;
-	int ofs = (blk << vol->sb->s_blocksize_bits) &
+	struct buffer_head *tmp;
-			vol->cluster_size_mask;
+	struct page *page;
-	LCN lcn;
-	ntfs_debug("Entering for blk = 0x%lx, vcn = 0x%Lx, ofs = 0x%x.",
+	mark_buffer_uptodate(bh, uptodate);
-			blk, (long long)vcn, ofs);
-	bh->b_dev = vol->mft_ino->i_dev;
+	page = bh->b_page;
-	bh->b_state &= ~(1UL << BH_Mapped);
-	bh->b_blocknr = -1;
+	if (likely(uptodate)) {
-	/* Check for initialized size overflow. */
+		s64 file_ofs;
-	if ((vcn << vol->cluster_size_bits) + ofs >=
-			vol->mftbmp_initialized_size) {
+		/* Host is the ntfs volume. Our mft bitmap access kludge... */
-		ntfs_debug("Done.");
+		ntfs_volume *vol = (ntfs_volume*)page->mapping->host;
-		return 0;
+		file_ofs = (page->index << PAGE_CACHE_SHIFT) + bh_offset(bh);
+		if (file_ofs + bh->b_size > vol->mftbmp_initialized_size) {
+			char *addr;
+			int ofs = 0;
+			if (file_ofs < vol->mftbmp_initialized_size)
+				ofs = vol->mftbmp_initialized_size - file_ofs;
+			addr = kmap_atomic(page, KM_BIO_IRQ);
+			memset(addr + bh_offset(bh) + ofs, 0, bh->b_size - ofs);
+			flush_dcache_page(page);
+			kunmap_atomic(addr, KM_BIO_IRQ);
 		}
-	down_read(&vol->mftbmp_rl.lock);
+	} else
-	lcn = vcn_to_lcn(vol->mftbmp_rl.rl, vcn);
+		SetPageError(page);
-	up_read(&vol->mftbmp_rl.lock);
-	ntfs_debug("lcn = 0x%Lx.", (long long)lcn);
+	spin_lock_irqsave(&page_uptodate_lock, flags);
-	if (lcn < 0LL) {
+	mark_buffer_async(bh, 0);
-		ntfs_error(vol->sb, "Returning -EIO, lcn = 0x%Lx.",
+	unlock_buffer(bh);
-				(long long)lcn);
-		return -EIO;
+	tmp = bh->b_this_page;
+	while (tmp != bh) {
+		if (buffer_locked(tmp)) {
+			if (buffer_async(tmp))
+				goto still_busy;
+		} else if (!buffer_uptodate(tmp))
+			SetPageError(page);
+		tmp = tmp->b_this_page;
 	}
-	/* Setup the buffer head to describe the correct block. */
-	bh->b_blocknr = ((lcn << vol->cluster_size_bits) + ofs) >>
-			vol->sb->s_blocksize_bits;
-	bh->b_state |= (1UL << BH_Mapped);
-	ntfs_debug("Done, bh->b_blocknr = 0x%lx.", bh->b_blocknr);
-	return 0;
-}
-#define MAX_BUF_PER_PAGE (PAGE_CACHE_SIZE / 512)
+	spin_unlock_irqrestore(&page_uptodate_lock, flags);
+	if (!PageError(page))
+		SetPageUptodate(page);
+	UnlockPage(page);
+	return;
+still_busy:
+	spin_unlock_irqrestore(&page_uptodate_lock, flags);
+	return;
+}
-/*
+/* Readpage for accessing mft bitmap. Adapted from ntfs_mst_readpage(). */
- * Specialized readpage for accessing mft bitmap. Adapted from
- * block_read_full_page().
- */
 static int ntfs_mftbmp_readpage(ntfs_volume *vol, struct page *page)
 {
-	sector_t iblock, lblock;
+	VCN vcn;
+	LCN lcn;
 	struct buffer_head *bh, *head, *arr[MAX_BUF_PER_PAGE];
-	unsigned int blocksize, blocks;
+	sector_t iblock, lblock;
+	unsigned int blocksize, blocks, vcn_ofs;
 	int nr, i;
 	unsigned char blocksize_bits;
-	ntfs_debug("Entering for index 0x%lx.", page->index);
 	if (!PageLocked(page))
 		PAGE_BUG(page);
 	blocksize = vol->sb->s_blocksize;
 	blocksize_bits = vol->sb->s_blocksize_bits;
-	if (!page->buffers)
 	create_empty_buffers(page, blocksize);
-	head = page->buffers;
+	bh = head = page->buffers;
-	if (!head) {
+	if (!bh)
-		ntfs_error(vol->sb, "Creation of empty buffers failed, cannot "
+		return -ENOMEM;
-				"read page.");
-		return -EINVAL;
-	}
 	blocks = PAGE_CACHE_SIZE >> blocksize_bits;
 	iblock = page->index << (PAGE_CACHE_SHIFT - blocksize_bits);
-	lblock = (((vol->_VMM(nr_mft_records) + 7) >> 3) + blocksize - 1) >>
+	lblock = (vol->mftbmp_allocated_size + blocksize - 1) >> blocksize_bits;
-			blocksize_bits;
-	ntfs_debug("blocks = 0x%x, iblock = 0x%lx, lblock = 0x%lx.", blocks,
+	/* Loop through all the buffers in the page. */
-			iblock, lblock);
-	bh = head;
 	nr = i = 0;
 	do {
-		ntfs_debug("In do loop, i = 0x%x, iblock = 0x%lx.", i,
+		BUG_ON(buffer_mapped(bh) || buffer_uptodate(bh));
-				iblock);
+		bh->b_dev = vol->mft_ino->i_dev;
-		if (buffer_uptodate(bh)) {
+		/* Is the block within the allowed limits? */
-			ntfs_debug("Buffer is already uptodate.");
-			continue;
-		}
-		if (!buffer_mapped(bh)) {
 		if (iblock < lblock) {
-				if (ntfs_mftbmp_get_block(vol, iblock, bh))
+			/* Convert iblock into corresponding vcn and offset. */
+			vcn = (VCN)iblock << blocksize_bits >>
+					vol->cluster_size_bits;
+			vcn_ofs = ((VCN)iblock << blocksize_bits) &
+					vol->cluster_size_mask;
+			/* Convert the vcn to the corresponding lcn. */
+			down_read(&vol->mftbmp_rl.lock);
+			lcn = vcn_to_lcn(vol->mftbmp_rl.rl, vcn);
+			up_read(&vol->mftbmp_rl.lock);
+			/* Successful remap. */
+			if (lcn >= 0) {
+				/* Setup buffer head to correct block. */
+				bh->b_blocknr = ((lcn << vol->cluster_size_bits)
+						+ vcn_ofs) >> blocksize_bits;
+				bh->b_state |= (1UL << BH_Mapped);
+				arr[nr++] = bh;
 				continue;
 			}
-			if (!buffer_mapped(bh)) {
+			if (lcn != LCN_HOLE) {
-				ntfs_debug("Buffer is not mapped, setting "
+				/* Hard error, zero out region. */
-						"uptodate.");
+				SetPageError(page);
-				memset(kmap(page) + i*blocksize, 0, blocksize);
+				ntfs_error(vol->sb, "vcn_to_lcn(vcn = 0x%Lx) "
-				flush_dcache_page(page);
+						"failed with error code "
-				kunmap(page);
+						"0x%Lx.", (long long)vcn,
-				set_bit(BH_Uptodate, &bh->b_state);
+						(long long)-lcn);
-				continue;
+				// FIXME: Depending on vol->on_errors, do
+				// something.
+			}
 		}
 		/*
-			 * ntfs_mftbmp_get_block() might have updated the
+		 * Either iblock was outside lblock limits or vcn_to_lcn()
-			 * buffer synchronously.
+		 * returned error. Just zero that portion of the page and set
+		 * the buffer uptodate.
 		 */
-			if (buffer_uptodate(bh)) {
+		bh->b_blocknr = -1UL;
-				ntfs_debug("Buffer is now uptodate.");
+		bh->b_state &= ~(1UL << BH_Mapped);
-				continue;
+		memset(kmap(page) + i * blocksize, 0, blocksize);
-			}
+		flush_dcache_page(page);
-		}
+		kunmap(page);
-		arr[nr++] = bh;
+		set_bit(BH_Uptodate, &bh->b_state);
 	} while (i++, iblock++, (bh = bh->b_this_page) != head);
-	ntfs_debug("After do loop, i = 0x%x, iblock = 0x%lx, nr = 0x%x.", i,
-			iblock, nr);
+	/* Check we have at least one buffer ready for i/o. */
-	if (!nr) {
+	if (nr) {
-		/* All buffers are uptodate - set the page uptodate as well. */
+		/* Lock the buffers. */
-		ntfs_debug("All buffers are uptodate, returning 0.");
-		SetPageUptodate(page);
-		UnlockPage(page);
-		return 0;
-	}
-	/* Stage two: lock the buffers */
-	ntfs_debug("Locking buffers.");
 		for (i = 0; i < nr; i++) {
-		struct buffer_head *bh = arr[i];
+			struct buffer_head *tbh = arr[i];
-		lock_buffer(bh);
+			lock_buffer(tbh);
-		set_buffer_async_io(bh);
+			tbh->b_end_io = end_buffer_read_mftbmp_async;
+			mark_buffer_async(tbh, 1);
 		}
-	/* Stage 3: start the IO */
+		/* Finally, start i/o on the buffers. */
-	ntfs_debug("Starting IO on buffers.");
 		for (i = 0; i < nr; i++)
 			submit_bh(READ, arr[i]);
-	ntfs_debug("Done.");
 		return 0;
+	}
+	/* No i/o was scheduled on any of the buffers. */
+	if (!PageError(page))
+		SetPageUptodate(page);
+	else /* Signal synchronous i/o error. */
+		nr = -EIO;
+	UnlockPage(page);
+	return nr;
 }
 /**
@@ -432,7 +496,9 @@ static void end_buffer_read_mst_async(struct buffer_head *bh, int uptodate)
 	mark_buffer_uptodate(bh, uptodate);
 	page = bh->b_page;
 	ni = NTFS_I(page->mapping->host);
 	if (likely(uptodate)) {
 		s64 file_ofs;
@@ -445,36 +511,27 @@ static void end_buffer_read_mst_async(struct buffer_head *bh, int uptodate)
 			if (file_ofs < ni->initialized_size)
 				ofs = ni->initialized_size - file_ofs;
 			addr = kmap_atomic(page, KM_BIO_IRQ);
-			memset(addr + bh_offset(bh) + ofs, 0,
+			memset(addr + bh_offset(bh) + ofs, 0, bh->b_size - ofs);
-					bh->b_size - ofs);
 			flush_dcache_page(page);
 			kunmap_atomic(addr, KM_BIO_IRQ);
 		}
 	} else
 		SetPageError(page);
-	/*
-	 * Be _very_ careful from here on. Bad things can happen if
-	 * two buffer heads end IO at almost the same time and both
-	 * decide that the page is now completely done.
-	 *
-	 * Async buffer_heads are here only as labels for IO, and get
-	 * thrown away once the IO for this page is complete.  IO is
-	 * deemed complete once all buffers have been visited
-	 * (b_count==0) and are now unlocked. We must make sure that
-	 * only the _last_ buffer that decrements its count is the one
-	 * that unlock the page..
-	 */
 	spin_lock_irqsave(&page_uptodate_lock, flags);
 	mark_buffer_async(bh, 0);
 	unlock_buffer(bh);
 	tmp = bh->b_this_page;
 	while (tmp != bh) {
-		if (buffer_async(tmp) && buffer_locked(tmp))
+		if (buffer_locked(tmp)) {
+			if (buffer_async(tmp))
 				goto still_busy;
+		} else if (!buffer_uptodate(tmp))
+			SetPageError(page);
 		tmp = tmp->b_this_page;
 	}
-	/* OK, the async IO on this page is complete. */
 	spin_unlock_irqrestore(&page_uptodate_lock, flags);
 	/*
 	 * If none of the buffers had errors then we can set the page uptodate,
@@ -544,9 +601,7 @@ int ntfs_mst_readpage(struct file *dir, struct page *page)
 {
 	VCN vcn;
 	LCN lcn;
-	struct inode *vi;
 	ntfs_inode *ni;
-	struct super_block *sb;
 	ntfs_volume *vol;
 	struct buffer_head *bh, *head, *arr[MAX_BUF_PER_PAGE];
 	sector_t iblock, lblock;
@@ -554,28 +609,24 @@ int ntfs_mst_readpage(struct file *dir, struct page *page)
 	int i, nr;
 	unsigned char blocksize_bits;
-	/* The page must be locked. */
 	if (!PageLocked(page))
 		PAGE_BUG(page);
-	/* Get the VFS and ntfs nodes as well as the super blocks for page. */
-	vi = page->mapping->host;
-	ni = NTFS_I(vi);
-	sb = vi->i_sb;
-	vol = NTFS_SB(sb);
-	blocksize = sb->s_blocksize;
+	ni = NTFS_I(page->mapping->host);
-	blocksize_bits = sb->s_blocksize_bits;
+	vol = ni->vol;
+	blocksize_bits = VFS_I(ni)->i_blkbits;
+	blocksize = 1 << blocksize_bits;
-	/* We need to create buffers for the page so we can do low level io. */
 	create_empty_buffers(page, blocksize);
+	bh = head = page->buffers;
+	if (!bh)
+		return -ENOMEM;
 	blocks = PAGE_CACHE_SIZE >> blocksize_bits;
 	iblock = page->index << (PAGE_CACHE_SHIFT - blocksize_bits);
 	lblock = (ni->allocated_size + blocksize - 1) >> blocksize_bits;
-	bh = head = page->buffers;
-	BUG_ON(!bh);
 #ifdef DEBUG
 	if (unlikely(!ni->run_list.rl && !ni->mft_no))
 		panic("NTFS: $MFT/$DATA run list has been unmapped! This is a "
@@ -583,10 +634,10 @@ int ntfs_mst_readpage(struct file *dir, struct page *page)
 #endif
 	/* Loop through all the buffers in the page. */
-	i = nr = 0;
+	nr = i = 0;
 	do {
 		BUG_ON(buffer_mapped(bh) || buffer_uptodate(bh));
-		bh->b_dev = vi->i_dev;
+		bh->b_dev = VFS_I(ni)->i_dev;
 		/* Is the block within the allowed limits? */
 		if (iblock < lblock) {
 			BOOL is_retry = FALSE;
@@ -620,10 +671,11 @@ int ntfs_mst_readpage(struct file *dir, struct page *page)
 					goto retry_remap;
 			}
 			/* Hard error, zero out region. */
-			ntfs_error(sb, "vcn_to_lcn(vcn = 0x%Lx) failed with "
+			SetPageError(page);
-					"error code 0x%Lx%s.", (long long)vcn,
+			ntfs_error(vol->sb, "vcn_to_lcn(vcn = 0x%Lx) failed "
-					(long long)-lcn, is_retry ? " even "
+					"with error code 0x%Lx%s.",
-					"after retrying" : "");
+					(long long)vcn, (long long)-lcn,
+					is_retry ? " even after retrying" : "");
 			// FIXME: Depending on vol->on_errors, do something.
 		}
 		/*
@@ -640,7 +692,7 @@ int ntfs_mst_readpage(struct file *dir, struct page *page)
 		set_bit(BH_Uptodate, &bh->b_state);
 	} while (i++, iblock++, (bh = bh->b_this_page) != head);
-	/* Check we have at least one buffer ready for io. */
+	/* Check we have at least one buffer ready for i/o. */
 	if (nr) {
 		/* Lock the buffers. */
 		for (i = 0; i < nr; i++) {
@@ -649,16 +701,18 @@ int ntfs_mst_readpage(struct file *dir, struct page *page)
 			tbh->b_end_io = end_buffer_read_mst_async;
 			mark_buffer_async(tbh, 1);
 		}
-		/* Finally, start io on the buffers. */
+		/* Finally, start i/o on the buffers. */
 		for (i = 0; i < nr; i++)
 			submit_bh(READ, arr[i]);
 		return 0;
 	}
-	/* We didn't schedule any io on any of the buffers. */
+	/* No i/o was scheduled on any of the buffers. */
-	ntfs_error(sb, "No I/O was scheduled on any buffers. Page I/O error.");
+	if (!PageError(page))
-	SetPageError(page);
+		SetPageUptodate(page);
+	else /* Signal synchronous i/o error. */
+		nr = -EIO;
 	UnlockPage(page);
-	return -EIO;
+	return nr;
 }
 /* Address space operations for accessing normal file data. */
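Each readpage loop in aops.c turns a file block number (iblock) into a device block number by way of the cluster it falls in: vcn and vcn_ofs from iblock, lcn from the run list, then b_blocknr from lcn and vcn_ofs. A self-contained sketch of that arithmetic follows, using a flat array in place of the real run list. `toy_run`, `toy_vcn_to_lcn()`, `map_block()` and the `TOY_LCN_*` values are hypothetical stand-ins, not the driver's `vcn_to_lcn()` or its LCN_* constants. All values are held in 64-bit signed types so that the shifts cannot overflow on 32-bit hosts.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t VCN;
typedef int64_t LCN;

/* Illustrative negative codes; not claimed to match the driver's. */
#define TOY_LCN_HOLE   ((LCN)-1)
#define TOY_LCN_ENOENT ((LCN)-3)

/* One run: clusters [vcn, vcn + length) start at lcn, or are a hole. */
struct toy_run {
	VCN vcn;
	LCN lcn;        /* TOY_LCN_HOLE for a sparse run */
	int64_t length;
};

/* Hypothetical stand-in for vcn_to_lcn() on a flat array of runs. */
static LCN toy_vcn_to_lcn(const struct toy_run *rl, int nr_runs, VCN vcn)
{
	for (int i = 0; i < nr_runs; i++) {
		if (vcn >= rl[i].vcn && vcn < rl[i].vcn + rl[i].length) {
			if (rl[i].lcn < 0)
				return rl[i].lcn; /* hole: no disk location */
			return rl[i].lcn + (vcn - rl[i].vcn);
		}
	}
	return TOY_LCN_ENOENT; /* vcn not covered by any run */
}

/*
 * Map file block iblock to a device block, mirroring the readpage loops.
 * Returns 0 and sets *blocknr on success, else the negative lcn code.
 */
static LCN map_block(const struct toy_run *rl, int nr_runs, int64_t iblock,
		unsigned blocksize_bits, unsigned cluster_size_bits,
		int64_t *blocknr)
{
	assert(blocksize_bits <= cluster_size_bits);
	int64_t cluster_size_mask = ((int64_t)1 << cluster_size_bits) - 1;
	/* Byte offset -> cluster number and offset within the cluster. */
	VCN vcn = (iblock << blocksize_bits) >> cluster_size_bits;
	uint32_t vcn_ofs = (uint32_t)((iblock << blocksize_bits) &
			cluster_size_mask);
	LCN lcn = toy_vcn_to_lcn(rl, nr_runs, vcn);

	if (lcn < 0)
		return lcn;
	*blocknr = ((lcn << cluster_size_bits) + vcn_ofs) >> blocksize_bits;
	return 0;
}
```

With 1024-byte blocks and 4096-byte clusters, block 5 lies at offset 1024 within vcn 1; if vcn 1 maps to lcn 101, the device block is (101 * 4096 + 1024) / 1024 = 405.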

--- a/fs/ntfs/super.c
+++ b/fs/ntfs/super.c
@@ -925,8 +925,12 @@ static BOOL load_system_files(ntfs_volume *vol)
 	vol->mftbmp_rl.rl = rl;
 	vol->mftbmp_mapping.a_ops = &ntfs_mftbmp_aops;
-	/* Not inode data, set to NULL. Our mft bitmap access kludge... */
+	/*
-	vol->mftbmp_mapping.host = NULL;
+	 * Not inode data, set to volume. Our mft bitmap access kludge...
+	 * We can only pray this is not going to cause problems... If it does
+	 * cause problems we will need a fake inode for this.
+	 */
+	vol->mftbmp_mapping.host = (struct inode*)vol;
 	// FIXME: If mounting read-only, it would be ok to ignore errors when
 	// loading the mftbmp but we then need to make sure nobody remounts the