1. 13 Feb, 2008 4 commits
    • Stephen Hemminger's avatar
      fib_trie: /proc/net/route performance improvement · 8315f5d8
      Stephen Hemminger authored
      Use key/offset caching to change /proc/net/route (use by iputils route)
      from O(n^2) to O(n). This improves performance from 30sec with 160,000
      routes to 1sec.
      Signed-off-by: default avatarStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8315f5d8
    • Stephen Hemminger's avatar
      fib_trie: handle empty tree · ec28cf73
      Stephen Hemminger authored
      This fixes possible problems when trie_firstleaf() returns NULL
      to trie_leafindex().
      Signed-off-by: default avatarStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ec28cf73
    • David S. Miller's avatar
      [IPV4]: Remove IP_TOS setting privilege checks. · e4f8b5d4
      David S. Miller authored
      Various RFCs have all sorts of things to say about the CS field of the
      DSCP value.  In particular they try to make the distinction between
      values that should be used by "user applications" and things like
      routing daemons.
      
      This seems to have influenced the CAP_NET_ADMIN check which exists for
      IP_TOS socket option settings, but in fact it has an off-by-one error
      so it wasn't allowing CS5 which is meant for "user applications" as
      well.
      
      Further adding to the inconsistency and brokenness here, IPV6 does not
      validate the DSCP values specified for the IPV6_TCLASS socket option.
      
      The real actual uses of these TOS values are system specific in the
      final analysis, and these RFC recommendations are just that, "a
      recommendation".  In fact the standards very purposefully use
      "SHOULD" and "SHOULD NOT" when describing how these values can be
      used.
      
      In the final analysis the only clean way to provide consistency here
      is to remove the CAP_NET_ADMIN check.  The alternatives just don't
      work out:
      
      1) If we add the CAP_NET_ADMIN check to ipv6, this can break existing
         setups.
      
      2) If we just fix the off-by-one error in the class comparison in
         IPV4, certain DSCP values can be used in IPV6 but not IPV4 by
         default.  So people will just ask for a sysctl asking to
         override that.
      
      I checked several other freely available kernel trees and they
      do not make any privilege checks in this area like we do.  For
      the BSD stacks, this goes back all the way to Stevens Volume 2
      and beyond.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e4f8b5d4
    • David S. Miller's avatar
  2. 12 Feb, 2008 9 commits
    • Linus Torvalds's avatar
      WMI: initialize wmi_blocks.list even if ACPI is disabled · 96b5a46e
      Linus Torvalds authored
      Even if we don't want to register the WMI driver, we should initialize
      the wmi_blocks list to be empty, since we don't want the wmi helper
      functions to oops just because that basic list has not even been set up.
      
      With this, "find_guid()" will happily return "not found" rather than
      oopsing all over the place, and the callers will then just automatically
      return false or AE_NOT_FOUND as appropriate.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      96b5a46e
    • Roland McGrath's avatar
      x86: vdso_install fix · 2c158269
      Roland McGrath authored
      The makefile magic for installing the 32-bit vdso images on disk had a
      little error.  A single-line change would fix that bug, but this does a
      little more to reduce the error-prone duplication of this bit of
      makefile variable magic.
      Signed-off-by: default avatarRoland McGrath <roland@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2c158269
    • KOSAKI Motohiro's avatar
      mempolicy: silently restrict nodemask to allowed nodes · 31f1de46
      KOSAKI Motohiro authored
      Kosaki Motohito noted that "numactl --interleave=all ..." failed in the
      presence of memoryless nodes.  This patch attempts to fix that problem.
      
      Some background:
      
      numactl --interleave=all calls set_mempolicy(2) with a fully populated
      [out to MAXNUMNODES] nodemask.  set_mempolicy() [in do_set_mempolicy()]
      calls contextualize_policy() which requires that the nodemask be a
      subset of the current task's mems_allowed; else EINVAL will be returned.
      
      A task's mems_allowed will always be a subset of node_states[N_HIGH_MEMORY]
      i.e., nodes with memory.  So, a fully populated nodemask will be
      declared invalid if it includes memoryless nodes.
      
        NOTE:  the same thing will occur when running in a cpuset
               with restricted mem_allowed--for the same reason:
               node mask contains dis-allowed nodes.
      
      mbind(2), on the other hand, just masks off any nodes in the nodemask
      that are not included in the caller's mems_allowed.
      
      In each case [mbind() and set_mempolicy()], mpol_check_policy() will
      complain [again, resulting in EINVAL] if the nodemask contains any
      memoryless nodes.  This is somewhat redundant as mpol_new() will remove
      memoryless nodes for interleave policy, as will bind_zonelist()--called
      by mpol_new() for BIND policy.
      
      Proposed fix:
      
      1) modify contextualize_policy logic to:
         a) remember whether the incoming node mask is empty.
         b) if not, restrict the nodemask to allowed nodes, as is
            currently done in-line for mbind().  This guarantees
            that the resulting mask includes only nodes with memory.
      
            NOTE:  this is a [benign, IMO] change in behavior for
                   set_mempolicy().  Dis-allowed nodes will be
                   silently ignored, rather than returning an error.
      
         c) fold this code into mpol_check_policy(), replace 2 calls to
            contextualize_policy() to call mpol_check_policy() directly
            and remove contextualize_policy().
      
      2) In existing mpol_check_policy() logic, after "contextualization":
         a) MPOL_DEFAULT:  require that in coming mask "was_empty"
         b) MPOL_{BIND|INTERLEAVE}:  require that contextualized nodemask
            contains at least one node.
         c) add a case for MPOL_PREFERRED:  if in coming was not empty
            and resulting mask IS empty, user specified invalid nodes.
            Return EINVAL.
         c) remove the now redundant check for memoryless nodes
      
      3) remove the now redundant masking of policy nodes for interleave
         policy from mpol_new().
      
      4) Now that mpol_check_policy() contextualizes the nodemask, remove
         the in-line nodes_and() from sys_mbind().  I believe that this
         restores mbind() to the behavior before the memoryless-nodes
         patch series.  E.g., we'll no longer treat an invalid nodemask
         with MPOL_PREFERRED as local allocation.
      
      [ Patch history:
      
        v1 -> v2:
         - Communicate whether or not incoming node mask was empty to
           mpol_check_policy() for better error checking.
         - As suggested by David Rientjes, remove the now unused
           cpuset_nodes_subset_current_mems_allowed() from cpuset.h
      
        v2 -> v3:
         - As suggested by Kosaki Motohito, fold the "contextualization"
           of policy nodemask into mpol_check_policy().  Looks a little
           cleaner. ]
      Signed-off-by: default avatarLee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Tested-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      31f1de46
    • Linus Torvalds's avatar
    • Jonathan Corbet's avatar
      Be more robust about bad arguments in get_user_pages() · 900cf086
      Jonathan Corbet authored
      So I spent a while pounding my head against my monitor trying to figure
      out the vmsplice() vulnerability - how could a failure to check for
      *read* access turn into a root exploit? It turns out that it's a buffer
      overflow problem which is made easy by the way get_user_pages() is
      coded.
      
      In particular, "len" is a signed int, and it is only checked at the
      *end* of a do {} while() loop.  So, if it is passed in as zero, the loop
      will execute once and decrement len to -1.  At that point, the loop will
      proceed until the next invalid address is found; in the process, it will
      likely overflow the pages array passed in to get_user_pages().
      
      I think that, if get_user_pages() has been asked to grab zero pages,
      that's what it should do.  Thus this patch; it is, among other things,
      enough to block the (already fixed) root exploit and any others which
      might be lurking in similar code.  I also think that the number of pages
      should be unsigned, but changing the prototype of this function probably
      requires some more careful review.
      Signed-off-by: default avatarJonathan Corbet <corbet@lwn.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      900cf086
    • Linus Torvalds's avatar
    • Pekka Enberg's avatar
      Add Matt to MAINTAINERS as a SLAB allocator maintainer · c76d118e
      Pekka Enberg authored
      Matt is already the maintainer of SLOB which is one of the "SLAB" allocators in
      the kernel so add him to MAINTAINERS.
      Signed-off-by: default avatarPekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c76d118e
    • Linus Torvalds's avatar
      Merge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev · a17b7a39
      Linus Torvalds authored
      * 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
        sata_mv: platform driver allocs dma without create
        pata_ninja32: setup changes
        pata_legacy: typo fix
        pata_amd: Note in the module description it handles Nvidia
        sata_mv: fix loop with last port
        libata: ignore deverr on SETXFER if mode is configured
        pata_via: fix SATA cable detection on cx700
      a17b7a39
    • Andi Kleen's avatar
      Make topology fallback macros reference their arguments. · 271cad6d
      Andi Kleen authored
      This avoids warnings with unreferenced variables in the !NUMA case.
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      271cad6d
  3. 11 Feb, 2008 27 commits