1. 16 Feb, 2011 18 commits
    • Tejun Heo's avatar
      x86-64, NUMA: Unify the rest of memblk registration · fd0435d8
      Tejun Heo authored
      Move the remaining memblk registration logic from acpi_scan_nodes() to
      numa_register_memblks() and initmem_init().
      
      This applies nodes_cover_memory() sanity check, memory node sorting
      and node_online() checking, which were only applied to acpi, to all
      init methods.
      
      As all memblk registration is moved to common code, active range
      clearing is moved to initmem_init() too and removed from bad_srat().
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Shaohui Zheng <shaohui.zheng@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      fd0435d8
    • Tejun Heo's avatar
      x86-64, NUMA: Unify use of memblk in all init methods · 43a662f0
      Tejun Heo authored
      Make both amd and dummy use numa_add_memblk() to describe the detected
      memory blocks.  This allows initmem_init() to call
      numa_register_memblk() regardless of init method in use.  Drop custom
      memory registration codes from amd and dummy.
      
      After this change, memblk merge/cleanup in numa_register_memblks() is
      applied to all init methods.
      
      As this makes compute_hash_shift() and numa_register_memblks() used
      only inside numa_64.c, make them static.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Shaohui Zheng <shaohui.zheng@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      43a662f0
    • Tejun Heo's avatar
      x86-64, NUMA: Factor out memblk handling into numa_{add|register}_memblk() · ef396ec9
      Tejun Heo authored
      Factor out memblk handling from srat_64.c into two functions in
      numa_64.c.  This patch doesn't introduce any behavior change.  The
      next patch will make all init methods use these functions.
      
      - v2: Fixed build failure on 32bit due to misplaced NR_NODE_MEMBLKS.
            Reported by Ingo.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Shaohui Zheng <shaohui.zheng@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      ef396ec9
    • Tejun Heo's avatar
      x86-64, NUMA: Kill {acpi|amd}_get_nodes() · 19095548
      Tejun Heo authored
      With common numa_nodes[], common code in numa_64.c can access it
      directly.  Copy directly and kill {acpi|amd}_get_nodes().
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Shaohui Zheng <shaohui.zheng@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      19095548
    • Tejun Heo's avatar
      x86-64, NUMA: Use common numa_nodes[] · 206e4208
      Tejun Heo authored
      ACPI and amd are using separate nodes[] array.  Add numa_nodes[] and
      use them in all NUMA init methods.  cutoff_node() cleanup is moved
      from srat_64.c to numa_64.c and applied in initmem_init() regardless
      of init methods.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Shaohui Zheng <shaohui.zheng@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      206e4208
    • Tejun Heo's avatar
      x86-64, NUMA: Move apicid to numa mapping initialization from amd_scan_nodes() to amd_numa_init() · 45fe6c78
      Tejun Heo authored
      This brings amd initialization behavior closer to that of acpi.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Shaohui Zheng <shaohui.zheng@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      45fe6c78
    • Tejun Heo's avatar
      x86-64, NUMA: Remove local variable found from amd_numa_init() · 99df738c
      Tejun Heo authored
      Use weight count on mem_nodes_parsed instead.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Shaohui Zheng <shaohui.zheng@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      99df738c
    • Tejun Heo's avatar
      x86-64, NUMA: Use common {cpu|mem}_nodes_parsed · ec8cf29b
      Tejun Heo authored
      ACPI and amd are using separate nodes_parsed masks.  Add
      {cpu|mem}_nodes_parsed and use them in all NUMA init methods.
      Initialization of the masks and building node_possible_map are now
      handled commonly by initmem_init().
      
      dummy_numa_init() is updated to set node 0 on both masks.  While at
      it, move the info messages from scan to init.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Shaohui Zheng <shaohui.zheng@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      ec8cf29b
    • Tejun Heo's avatar
      x86-64, NUMA: Restructure initmem_init() · ffe77a46
      Tejun Heo authored
      Reorganize initmem_init() such that,
      
      * Different NUMA init methods are iterated in a consistent way.
      
      * Each iteration re-initializes all the parameters and different
        method can be tried after a failure.
      
      * Dummy init is handled the same as other methods.
      
      Apart from how retry after failure, this patch doesn't change the
      behavior.  The call sequences are kept equivalent across the
      conversion.
      
      After the change, bad_srat() doesn't need to clear apic to node
      mapping or worry about numa_off.  Simplified accordingly.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Shaohui Zheng <shaohui.zheng@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      ffe77a46
    • Tejun Heo's avatar
      x86, NUMA: Move *_numa_init() invocations into initmem_init() · d8fc3afc
      Tejun Heo authored
      There's no reason for these to live in setup_arch().  Move them inside
      initmem_init().
      
      - v2: x86-32 initmem_init() weren't updated breaking 32bit builds.
        Fixed.  Found by Ankita.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Ankita Garg <ankita@in.ibm.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Shaohui Zheng <shaohui.zheng@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      d8fc3afc
    • Tejun Heo's avatar
      x86-64, NUMA: Wrap acpi_numa_init() so that failure can be indicated by return value · a9aec56a
      Tejun Heo authored
      Because of the way ACPI tables are parsed, the generic
      acpi_numa_init() couldn't return failure when error was detected by
      arch hooks.  Instead, the failure state was recorded and later arch
      dependent init hook - acpi_scan_nodes() - would fail.
      
      Wrap acpi_numa_init() with x86_acpi_numa_init() so that failure can be
      indicated as return value immediately.  This is in preparation for
      further NUMA init cleanups.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Shaohui Zheng <shaohui.zheng@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      a9aec56a
    • Tejun Heo's avatar
      x86-64, NUMA: Unify {acpi|amd}_{numa_init|scan_nodes}() arguments and return values · 940fed2e
      Tejun Heo authored
      The functions used during NUMA initialization - *_numa_init() and
      *_scan_nodes() - have different arguments and return values.  Unify
      them such that they all take no argument and return 0 on success and
      -errno on failure.  This is in preparation for further NUMA init
      cleanups.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Shaohui Zheng <shaohui.zheng@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      940fed2e
    • Tejun Heo's avatar
      x86, NUMA: Drop @start/last_pfn from initmem_init() · 86ef4dbf
      Tejun Heo authored
      initmem_init() extensively accesses and modifies global data
      structures and the parameters aren't even followed depending on which
      path is being used.  Drop @start/last_pfn and let it deal with
      @max_pfn directly.  This is in preparation for further NUMA init
      cleanups.
      
      - v2: x86-32 initmem_init() weren't updated breaking 32bit builds.
        Fixed.  Found by Yinghai.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Shaohui Zheng <shaohui.zheng@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      86ef4dbf
    • Tejun Heo's avatar
      x86-64, NUMA: Simplify hotplug node handling in acpi_numa_memory_affinity_init() · 13081df5
      Tejun Heo authored
      Hotplug node handling in acpi_numa_memory_affinity_init() was
      unnecessarily complicated with storing the original nodes[] entry and
      restoring it afterwards.  Simplify it by not modifying the nodes[]
      entry for hotplug nodes from the beginning.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Shaohui Zheng <shaohui.zheng@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      13081df5
    • Tejun Heo's avatar
      x86-64, NUMA: Make dummy node initialization path similar to non-dummy ones · 7d36b7bc
      Tejun Heo authored
      Dummy node initialization in initmem_init() didn't initialize apicid
      to node mapping and set cpu to node mapping directly by caling
      numa_set_node(), which is different from non-dummy init paths.
      
      Update it such that they behave similarly.  Initialize apicid to node
      mapping and call numa_init_array().  The actual cpu to node mapping is
      handled by init_cpu_to_node() later.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarYinghai Lu <yinghai@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Shaohui Zheng <shaohui.zheng@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      7d36b7bc
    • Ingo Molnar's avatar
      Merge branch 'x86/amd-nb' into x86/mm · 275a88d3
      Ingo Molnar authored
      Merge reason: consolidate it into the more generic x86/mm tree to prevent conflicts
                    with ongoing NUMA work.
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      275a88d3
    • Ingo Molnar's avatar
      Merge branch 'x86/numa' into x86/mm · 52b8b8d7
      Ingo Molnar authored
      Merge reason: consolidate it into the more generic x86/mm tree to prevent conflicts.
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      52b8b8d7
    • Ingo Molnar's avatar
      Merge branch 'x86/bootmem' into x86/mm · 02ac81a8
      Ingo Molnar authored
      Merge reason: the topic is ready - consolidate it into the more generic x86/mm tree
                    and prevent conflicts.
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      02ac81a8
  2. 15 Feb, 2011 1 commit
    • Borislav Petkov's avatar
      x86, amd: Initialize variable properly · 9e81509e
      Borislav Petkov authored
      Commit d518573d ("x86, amd: Normalize compute unit IDs on
      multi-node processors") introduced compute unit normalization
      but causes a compiler warning:
      
       arch/x86/kernel/cpu/amd.c: In function 'amd_detect_cmp':
       arch/x86/kernel/cpu/amd.c:268: warning: 'cores_per_cu' may be used uninitialized in this function
       arch/x86/kernel/cpu/amd.c:268: note: 'cores_per_cu' was declared here
      
      The compiler is right - initialize it with a proper value.
      
      Also, fix up a comment while at it.
      Reported-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarBorislav Petkov <borislav.petkov@amd.com>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      LKML-Reference: <20110214171451.GB10076@kryptos.osrc.amd.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      9e81509e
  3. 14 Feb, 2011 10 commits
  4. 13 Feb, 2011 6 commits
  5. 12 Feb, 2011 5 commits
    • Grant Likely's avatar
      MAINTAINERS: Add entry for GPIO subsystem · a0dc00b4
      Grant Likely authored
      I'll probably regret this....
      Signed-off-by: default avatarGrant Likely <grant.likely@secretlab.ca>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a0dc00b4
    • Linus Torvalds's avatar
      Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · c8e0b00e
      Linus Torvalds authored
      * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        jbd2: call __jbd2_log_start_commit with j_state_lock write locked
        ext4: serialize unaligned asynchronous DIO
        ext4: make grpinfo slab cache names static
        ext4: Fix data corruption with multi-block writepages support
        ext4: fix up ext4 error handling
        ext4: unregister features interface on module unload
        ext4: fix panic on module unload when stopping lazyinit thread
      c8e0b00e
    • Theodore Ts'o's avatar
      jbd2: call __jbd2_log_start_commit with j_state_lock write locked · e4471831
      Theodore Ts'o authored
      On an SMP ARM system running ext4, I've received a report that the
      first J_ASSERT in jbd2_journal_commit_transaction has been triggering:
      
      	J_ASSERT(journal->j_running_transaction != NULL);
      
      While investigating possible causes for this problem, I noticed that
      __jbd2_log_start_commit() is getting called with j_state_lock only
      read-locked, in spite of the fact that it's possible for it might
      j_commit_request.  Fix this by grabbing the necessary information so
      we can test to see if we need to start a new transaction before
      dropping the read lock, and then calling jbd2_log_start_commit() which
      will grab the write lock.
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      e4471831
    • Eric Sandeen's avatar
      ext4: serialize unaligned asynchronous DIO · e9e3bcec
      Eric Sandeen authored
      ext4 has a data corruption case when doing non-block-aligned
      asynchronous direct IO into a sparse file, as demonstrated
      by xfstest 240.
      
      The root cause is that while ext4 preallocates space in the
      hole, mappings of that space still look "new" and 
      dio_zero_block() will zero out the unwritten portions.  When
      more than one AIO thread is going, they both find this "new"
      block and race to zero out their portion; this is uncoordinated
      and causes data corruption.
      
      Dave Chinner fixed this for xfs by simply serializing all
      unaligned asynchronous direct IO.  I've done the same here.
      The difference is that we only wait on conversions, not all IO.
      This is a very big hammer, and I'm not very pleased with
      stuffing this into ext4_file_write().  But since ext4 is
      DIO_LOCKING, we need to serialize it at this high level.
      
      I tried to move this into ext4_ext_direct_IO, but by then
      we have the i_mutex already, and we will wait on the
      work queue to do conversions - which must also take the
      i_mutex.  So that won't work.
      
      This was originally exposed by qemu-kvm installing to
      a raw disk image with a normal sector-63 alignment.  I've
      tested a backport of this patch with qemu, and it does
      avoid the corruption.  It is also quite a lot slower
      (14 min for package installs, vs. 8 min for well-aligned)
      but I'll take slow correctness over fast corruption any day.
      
      Mingming suggested that we can track outstanding
      conversions, and wait on those so that non-sparse
      files won't be affected, and I've implemented that here;
      unaligned AIO to nonsparse files won't take a perf hit.
      
      [tytso@mit.edu: Keep the mutex as a hashed array instead
       of bloating the ext4 inode]
      
      [tytso@mit.edu: Fix up namespace issues so that global
       variables are protected with an "ext4_" prefix.]
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      e9e3bcec
    • Eric Sandeen's avatar
      ext4: make grpinfo slab cache names static · 2892c15d
      Eric Sandeen authored
      In 2.6.37 I was running into oopses with repeated module
      loads & unloads.  I tracked this down to:
      
      fb1813f4 ext4: use dedicated slab caches for group_info structures
      
      (this was in addition to the features advert unload problem)
      
      The kstrdup & subsequent kfree of the cache name was causing
      a double free.  In slub, at least, if I read it right it allocates
      & frees the name itself, slab seems to do something different...
      so in slub I think we were leaking -our- cachep->name, and double
      freeing the one allocated by slub.
      
      After getting lost in slab/slub/slob a bit, I just looked at other
      sized-caches that get allocated.  jbd2, biovec, sgpool all do it
      more or less the way jbd2 does.  Below patch follows the jbd2
      method of dynamically allocating a cache at mount time from
      a list of static names.
      
      (This might also possibly fix a race creating the caches with
      parallel mounts running).
      
      [Folded in a fix from Dan Carpenter which fixed an off-by-one error in
      the original patch]
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      2892c15d