- 16 Oct, 2007 40 commits
-
-
Christoph Lameter authored
Why do we need to support memoryless nodes? KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > For fujitsu, problem is called "empty" node. > > When ACPI's SRAT table includes "possible nodes", ia64 bootstrap(acpi_numa_init) > creates nodes, which includes no memory, no cpu. > > I tried to remove empty-node in past, but that was denied. > It was because we can hot-add cpu to the empty node. > (node-hotplug triggered by cpu is not implemented now. and it will be ugly.) > > > For HP, (Lee can comment on this later), they have memory-less-node. > As far as I hear, HP's machine can have following configration. > > (example) > Node0: CPU0 memory AAA MB > Node1: CPU1 memory AAA MB > Node2: CPU2 memory AAA MB > Node3: CPU3 memory AAA MB > Node4: Memory XXX GB > > AAA is very small value (below 16MB) and will be omitted by ia64 bootstrap. > After boot, only Node 4 has valid memory (but have no cpu.) > > Maybe this is memory-interleave by firmware config. Christoph Lameter <clameter@sgi.com> wrote: > Future SGI platforms (actually also current one can have but nothing like > that is deployed to my knowledge) have nodes with only cpus. Current SGI > platforms have nodes with just I/O that we so far cannot manage in the > core. So the arch code maps them to the nearest memory node. Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote: > For the HP platforms, we can configure each cell with from 0% to 100% > "cell local memory". When we configure with <100% CLM, the "missing > percentages" are interleaved by hardware on a cache-line granularity to > improve bandwidth at the expense of latency for numa-challenged > applications [and OSes, but not our problem ;-)]. When we boot Linux on > such a config, all of the real nodes have no memory--it all resides in a > single interleaved pseudo-node. > > When we boot Linux on a 100% CLM configuration [== NUMA], we still have > the interleaved pseudo-node. It contains a few hundred MB stolen from > the real nodes to contain the DMA zone. [Interleaved memory resides at > phys addr 0]. The memoryless-nodes patches, along with the zoneorder > patches, support this config as well. > > Also, when we boot a NUMA config with the "mem=" command line, > specifying less memory than actually exists, Linux takes the excluded > memory "off the top" rather than distributing it across the nodes. This > can result in memoryless nodes, as well. > This patch: Preparation for memoryless node patches. Provide a generic way to keep nodemasks describing various characteristics of NUMA nodes. Remove the node_online_map and the node_possible map and realize the same functionality using two nodes stats: N_POSSIBLE and N_ONLINE. [Lee.Schermerhorn@hp.com: Initialize N_*_MEMORY and N_CPU masks for non-NUMA config] Signed-off-by: Christoph Lameter <clameter@sgi.com> Tested-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Bob Picco <bob.picco@hp.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@skynet.ie> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
prepare/commit_write no longer returns AOP_TRUNCATED_PAGE since OCFS2 and GFS2 were converted to the new aops, so we can make some simplifications for that. [michal.k.k.piotrowski@gmail.com: fix warning] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Implement nobh in new aops. This is a bit tricky. FWIW, nobh_truncate is now implemented in a way that does not create blocks in sparse regions, which is a silly thing for it to have been doing (isn't it?) ext2 survives fsx and fsstress. jfs is converted as well... ext3 should be easy to do (but not done yet). [akpm@linux-foundation.org: coding-style fixes] Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Plug ocfs2 into the ->write_begin and ->write_end aops. A bunch of custom code is now gone - the iovec iteration stuff during write and the ocfs2 splice write actor. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Andries Brouwer <Andries.Brouwer@cwi.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Convert udf to new aops. Also seem to have fixed pagecache corruption in udf_adinicb_commit_write -- page was marked uptodate when it is not. Also, fixed the silly setup where prepare_write was doing a kmap to be used in commit_write: just do kmap_atomic in write_end. Use libfs helpers to make this easier. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: <bfennema@falcon.csc.calpoly.edu> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
This also gets rid of a lot of useless read_file stuff. And also optimises the full page write case by marking a !uptodate page uptodate. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
[mszeredi] - don't send zero length write requests - it is not legal for the filesystem to return with zero written bytes Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
[akpm@linux-foundation.org: fix against git-nfs] [peterz@infradead.org: fix against git-nfs] Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vladimir Saveliev authored
This patch makes reiserfs to use AOP_FLAG_CONT_EXPAND in order to get rid of the special generic_cont_expand routine Signed-off-by: Vladimir Saveliev <vs@namesys.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vladimir Saveliev authored
Convert reiserfs to new aops Signed-off-by: Vladimir Saveliev <vs@namesys.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vladimir Saveliev authored
Make reiserfs to write via generic routines. Original reiserfs write optimized for big writes is deadlock rone Signed-off-by: Vladimir Saveliev <vs@namesys.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Anders Larsen <al@alarsen.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Rework the generic block "cont" routines to handle the new aops. Supporting cont_prepare_write would take quite a lot of code to support, so remove it instead (and we later convert all filesystems to use it). write_begin gets passed AOP_FLAG_CONT_EXPAND when called from generic_cont_expand, so filesystems can avoid the old hacks they used. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Steven Whitehouse authored
Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: David Chinner <dgc@sgi.com> Cc: Timothy Shimmin <tes@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Convert ext4 to use write_begin()/write_end() methods. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Dmitriy Monakhov <dmonakhov@sw.ru> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Various fixes and improvements Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Implement new aops for some of the simpler filesystems. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Restore the KERNEL_DS optimisation, especially helpful to the 2copy write path. This may be a pretty questionable gain in most cases, especially after the legacy 2copy write path is removed, but it doesn't cost much. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Dmitry Monakhov authored
Partial write can be easily supported by LO_CRYPT_NONE mode, but it is not easy in LO_CRYPT_CRYPTOAPI case, because of its block nature. I don't know who still used cryptoapi, but theoretically it is possible. So let's leave things as they are. Loop device doesn't support partial write before Nick's "write_begin/write_end" patch set, and let's it behave the same way after. Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
These are intended to replace prepare_write and commit_write with more flexible alternatives that are also able to avoid the buffered write deadlock problems efficiently (which prepare_write is unable to do). [mark.fasheh@oracle.com: API design contributions, code review and fixes] [akpm@linux-foundation.org: various fixes] [dmonakhov@sw.ru: new aop block_write_begin fix] Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
New buffers against uptodate pages are simply be marked uptodate, while the buffer_new bit remains set. This causes error-case code to zero out parts of those buffers because it thinks they contain stale data: wrong, they are actually uptodate so this is a data loss situation. Fix this by actually clearning buffer_new and marking the buffer dirty. It makes sense to always clear buffer_new before setting a buffer uptodate. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Add an iterator data structure to operate over an iovec. Add usercopy operators needed by generic_file_buffered_write, and convert that function over. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Piggin authored
Modify the core write() code so that it won't take a pagefault while holding a lock on the pagecache page. There are a number of different deadlocks possible if we try to do such a thing: 1. generic_buffered_write 2. lock_page 3. prepare_write 4. unlock_page+vmtruncate 5. copy_from_user 6. mmap_sem(r) 7. handle_mm_fault 8. lock_page (filemap_nopage) 9. commit_write 10. unlock_page a. sys_munmap / sys_mlock / others b. mmap_sem(w) c. make_pages_present d. get_user_pages e. handle_mm_fault f. lock_page (filemap_nopage) 2,8 - recursive deadlock if page is same 2,8;2,8 - ABBA deadlock is page is different 2,6;b,f - ABBA deadlock if page is same The solution is as follows: 1. If we find the destination page is uptodate, continue as normal, but use atomic usercopies which do not take pagefaults and do not zero the uncopied tail of the destination. The destination is already uptodate, so we can commit_write the full length even if there was a partial copy: it does not matter that the tail was not modified, because if it is dirtied and written back to disk it will not cause any problems (uptodate *means* that the destination page is as new or newer than the copy on disk). 1a. The above requires that fault_in_pages_readable correctly returns access information, because atomic usercopies cannot distinguish between non-present pages in a readable mapping, from lack of a readable mapping. 2. If we find the destination page is non uptodate, unlock it (this could be made slightly more optimal), then allocate a temporary page to copy the source data into. Relock the destination page and continue with the copy. However, instead of a usercopy (which might take a fault), copy the data from the pinned temporary page via the kernel address space. (also, rename maxlen to seglen, because it was confusing) This increases the CPU/memory copy cost by almost 50% on the affected workloads. That will be solved by introducing a new set of pagecache write aops in a subsequent patch. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-