1. 10 Aug, 2011 1 commit
    • Michal Hocko's avatar
      Revert "memcg: get rid of percpu_charge_mutex lock" · 9f50fad6
      Michal Hocko authored
      This reverts commit 8521fc50.
      
      The patch incorrectly assumes that using atomic FLUSHING_CACHED_CHARGE
      bit operations is sufficient but that is not true.  Johannes Weiner has
      reported a crash during parallel memory cgroup removal:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
        IP: [<ffffffff81083b70>] css_is_ancestor+0x20/0x70
        Oops: 0000 [#1] PREEMPT SMP
        Pid: 19677, comm: rmdir Tainted: G        W   3.0.0-mm1-00188-gf38d32b #35 ECS MCP61M-M3/MCP61M-M3
        RIP: 0010:[<ffffffff81083b70>]  css_is_ancestor+0x20/0x70
        RSP: 0018:ffff880077b09c88  EFLAGS: 00010202
        Process rmdir (pid: 19677, threadinfo ffff880077b08000, task ffff8800781bb310)
        Call Trace:
         [<ffffffff810feba3>] mem_cgroup_same_or_subtree+0x33/0x40
         [<ffffffff810feccf>] drain_all_stock+0x11f/0x170
         [<ffffffff81103211>] mem_cgroup_force_empty+0x231/0x6d0
         [<ffffffff811036c4>] mem_cgroup_pre_destroy+0x14/0x20
         [<ffffffff81080559>] cgroup_rmdir+0xb9/0x500
         [<ffffffff81114d26>] vfs_rmdir+0x86/0xe0
         [<ffffffff81114e7b>] do_rmdir+0xfb/0x110
         [<ffffffff81114ea6>] sys_rmdir+0x16/0x20
         [<ffffffff8154d76b>] system_call_fastpath+0x16/0x1b
      
      We are crashing because we try to dereference cached memcg when we are
      checking whether we should wait for draining on the cache.  The cache is
      already cleaned up, though.
      
      There is also a theoretical chance that the cached memcg gets freed
      between we test for the FLUSHING_CACHED_CHARGE and dereference it in
      mem_cgroup_same_or_subtree:
      
              CPU0                    CPU1                         CPU2
        mem=stock->cached
        stock->cached=NULL
                                    clear_bit
                                                              test_and_set_bit
        test_bit()                    ...
        <preempted>             mem_cgroup_destroy
        use after free
      
      The percpu_charge_mutex protected from this race because sync draining
      is exclusive.
      
      It is safer to revert now and come up with a more parallel
      implementation later.
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Reported-by: default avatarJohannes Weiner <jweiner@redhat.com>
      Acked-by: default avatarJohannes Weiner <jweiner@redhat.com>
      Acked-by: default avatarKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9f50fad6
  2. 09 Aug, 2011 7 commits
    • Linus Torvalds's avatar
      47e180d6
    • Christoph Lameter's avatar
      slub: Fix partial count comparison confusion · 81107188
      Christoph Lameter authored
      deactivate_slab() has the comparison if more than the minimum number of
      partial pages are in the partial list wrong. An effect of this may be that
      empty pages are not freed from deactivate_slab(). The result could be an
      OOM due to growth of the partial slabs per node. Frees mostly occur from
      __slab_free which is okay so this would only affect use cases where a lot
      of switching around of per cpu slabs occur.
      
      Switching per cpu slabs occurs with high frequency if debugging options are
      enabled.
      Reported-and-tested-by: default avatarXiaotian Feng <xtfeng@gmail.com>
      Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
      Signed-off-by: default avatarPekka Enberg <penberg@kernel.org>
      81107188
    • Linus Torvalds's avatar
      Merge branch 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 · e6a99d31
      Linus Torvalds authored
      * 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
        slub: fix check_bytes() for slub debugging
        slub: Fix full list corruption if debugging is on
      e6a99d31
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 · 6bb615bc
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
        sound: pss - don't use the deprecated function check_region
        ALSA: timer - Add NULL-check for invalid slave timer
        ALSA: timer - Fix Oops at closing slave timer
        ASoC: Acknowledge WM8996 interrupts before acting on them
        ASoC: Rename WM8915 to WM8996
        ALSA: Fix dependency of CONFIG_SND_TEA575X
        ALSA: asihpi - use kzalloc()
        ALSA: snd-usb-caiaq: Fix keymap for RigKontrol3
        ALSA: snd-usb: Fix uninitialized variable usage
        ALSA: hda - Fix a complile warning in patch_via.c
        ALSA: hdspm - Fix uninitialized compile warnings
        ALSA: usb-audio - add quirk for Keith McMillen StringPort
        ALSA: snd-usb: operate on given mixer interface only
        ALSA: snd-usb: avoid dividing by zero on invalid input
        ALSA: snd-usb: Accept UAC2 FORMAT_TYPE descriptors with bLength > 6
        sound: oss/pas2: Remove CLOCK_TICK_RATE dependency from PAS16 driver
        ALSA: hda - Use auto-parser for ASUS UX50, Eee PC P901, S101 and P1005
        ALSA: hda - Fix digital-mic mono recording on ASUS Eee PC
        ASoC: sgtl5000: fix cache handling
        ASoC: Disable wm_hubs periodic DC servo update
      6bb615bc
    • Alan Cox's avatar
      gma500: Fix clashes with DRM updates · ab04fc58
      Alan Cox authored
      The private object support has migrated from gma500 into the DRM core,
      remove our now clashing copy.
      Signed-off-by: default avatarAlan Cox <alan@linux.intel.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ab04fc58
    • Akinobu Mita's avatar
      slub: fix check_bytes() for slub debugging · ef62fb32
      Akinobu Mita authored
      The check_bytes() function is used by slub debugging.  It returns a pointer
      to the first unmatching byte for a character in the given memory area.
      
      If the character for matching byte is greater than 0x80, check_bytes()
      doesn't work.  Becuase 64-bit pattern is generated as below.
      
      	value64 = value | value << 8 | value << 16 | value << 24;
      	value64 = value64 | value64 << 32;
      
      The integer promotions are performed and sign-extended as the type of value
      is u8.  The upper 32 bits of value64 is 0xffffffff in the first line, and
      the second line has no effect.
      
      This fixes the 64-bit pattern generation.
      Signed-off-by: default avatarAkinobu Mita <akinobu.mita@gmail.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Reviewed-by: default avatarMarcin Slusarz <marcin.slusarz@gmail.com>
      Acked-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarPekka Enberg <penberg@kernel.org>
      ef62fb32
    • Christoph Lameter's avatar
      slub: Fix full list corruption if debugging is on · 6fbabb20
      Christoph Lameter authored
      When a slab is freed by __slab_free() and the slab can only contain a
      single object ever then it was full (and therefore not on the partial
      lists but on the full list in the debug case) before we reached
      slab_empty.
      
      This caused the following full list corruption when SLUB debugging was enabled:
      
        [ 5913.233035] ------------[ cut here ]------------
        [ 5913.233097] WARNING: at lib/list_debug.c:53 __list_del_entry+0x8d/0x98()
        [ 5913.233101] Hardware name: Adamo 13
        [ 5913.233105] list_del corruption. prev->next should be ffffea000434fd20, but was ffffea0004199520
        [ 5913.233108] Modules linked in: nfs fscache fuse ebtable_nat ebtables ppdev parport_pc lp parport ipt_MASQUERADE iptable_nat nf_nat nfsd lockd nfs_acl auth_rpcgss xt_CHECKSUM sunrpc iptable_mangle bridge stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables rfcomm bnep arc4 iwlagn snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_intel btusb mac80211 snd_hda_codec bluetooth snd_hwdep snd_seq snd_seq_device snd_pcm usb_debug dell_wmi sparse_keymap cdc_ether usbnet cdc_acm uvcvideo cdc_wdm mii cfg80211 snd_timer dell_laptop videodev dcdbas snd microcode v4l2_compat_ioctl32 soundcore joydev tg3 pcspkr snd_page_alloc iTCO_wdt i2c_i801 rfkill iTCO_vendor_support wmi virtio_net kvm_intel kvm ipv6 xts gf128mul dm_crypt i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
        [ 5913.233213] Pid: 0, comm: swapper Not tainted 3.0.0+ #127
        [ 5913.233213] Call Trace:
        [ 5913.233213]  <IRQ>  [<ffffffff8105df18>] warn_slowpath_common+0x83/0x9b
        [ 5913.233213]  [<ffffffff8105dfd3>] warn_slowpath_fmt+0x46/0x48
        [ 5913.233213]  [<ffffffff8127e7c1>] __list_del_entry+0x8d/0x98
        [ 5913.233213]  [<ffffffff8127e7da>] list_del+0xe/0x2d
        [ 5913.233213]  [<ffffffff814e0430>] __slab_free+0x1db/0x235
        [ 5913.233213]  [<ffffffff811706ab>] ? bvec_free_bs+0x35/0x37
        [ 5913.233213]  [<ffffffff811706ab>] ? bvec_free_bs+0x35/0x37
        [ 5913.233213]  [<ffffffff811706ab>] ? bvec_free_bs+0x35/0x37
        [ 5913.233213]  [<ffffffff81133085>] kmem_cache_free+0x88/0x102
        [ 5913.233213]  [<ffffffff811706ab>] bvec_free_bs+0x35/0x37
        [ 5913.233213]  [<ffffffff811706e1>] bio_free+0x34/0x64
        [ 5913.233213]  [<ffffffff813dc390>] dm_bio_destructor+0x12/0x14
        [ 5913.233213]  [<ffffffff8116fef6>] bio_put+0x2b/0x2d
        [ 5913.233213]  [<ffffffff813dccab>] clone_endio+0x9e/0xb4
        [ 5913.233213]  [<ffffffff8116f7dd>] bio_endio+0x2d/0x2f
        [ 5913.233213]  [<ffffffffa00148da>] crypt_dec_pending+0x5c/0x8b [dm_crypt]
        [ 5913.233213]  [<ffffffffa00150a9>] crypt_endio+0x78/0x81 [dm_crypt]
      
      [ Full discussion here: https://lkml.org/lkml/2011/8/4/375 ]
      
      Make sure that we remove such a slab also from the full lists.
      Reported-and-tested-by: default avatarDave Jones <davej@redhat.com>
      Reported-and-tested-by: default avatarXiaotian Feng <xtfeng@gmail.com>
      Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
      Signed-off-by: default avatarPekka Enberg <penberg@kernel.org>
      6fbabb20
  3. 08 Aug, 2011 16 commits
  4. 07 Aug, 2011 16 commits
    • Linus Torvalds's avatar
      9e233113
    • Rafael J. Wysocki's avatar
      sh: Fix boot crash related to SCI · fc97114b
      Rafael J. Wysocki authored
      Commit d006199e72a9 ("serial: sh-sci: Regtype probing doesn't need to be
      fatal.") made sci_init_single() return when sci_probe_regmap() succeeds,
      although it should return when sci_probe_regmap() fails.  This causes
      systems using the serial sh-sci driver to crash during boot.
      
      Fix the problem by using the right return condition.
      Signed-off-by: default avatarRafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fc97114b
    • Linus Torvalds's avatar
      arm: remove stale export of 'sha_transform' · f23c126b
      Linus Torvalds authored
      The generic library code already exports the generic function, this was
      left-over from the ARM-specific version that just got removed.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f23c126b
    • Linus Torvalds's avatar
      arm: remove "optimized" SHA1 routines · 4d448714
      Linus Torvalds authored
      Since commit 1eb19a12 ("lib/sha1: use the git implementation of
      SHA-1"), the ARM SHA1 routines no longer work.  The reason? They
      depended on the larger 320-byte workspace, and now the sha1 workspace is
      just 16 words (64 bytes).  So the assembly version would overwrite the
      stack randomly.
      
      The optimized asm version is also probably slower than the new improved
      C version, so there's no reason to keep it around.  At least that was
      the case in git, where what appears to be the same assembly language
      version was removed two years ago because the optimized C BLK_SHA1 code
      was faster.
      Reported-and-tested-by: default avatarJoachim Eastwood <manabian@gmail.com>
      Cc: Andreas Schwab <schwab@linux-m68k.org>
      Cc: Nicolas Pitre <nico@fluxnic.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4d448714
    • Al Viro's avatar
      fix rcu annotations noise in cred.h · 32955148
      Al Viro authored
      task->cred is declared as __rcu, and access to other tasks' ->cred is,
      indeed, protected.  Access to current->cred does not need rcu_dereference()
      at all, since only the task itself can change its ->cred.  sparse, of
      course, has no way of knowing that...
      
      Add force-cast in current_cred(), make current_fsuid() et.al. use it.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      32955148
    • Linus Torvalds's avatar
      vfs: rename 'do_follow_link' to 'should_follow_link' · 7813b94a
      Linus Torvalds authored
      Al points out that the do_follow_link() helper function really is
      misnamed - it's about whether we should try to follow a symlink or not,
      not about actually doing the following.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7813b94a
    • Takashi Iwai's avatar
      ALSA: Fix dependency of CONFIG_SND_TEA575X · df944f66
      Takashi Iwai authored
      CONFIG_SND_TEA575X is enabled by RADIO_SF16FMR2, but the latter one is
      no PCI device.  Since tea575x-tuner itself is independent from the board
      bus type, the config should be moved out of SND_PCI dependency.
      Reported-by: default avatarRandy Dunlap <rdunlap@xenotime.net>
      Acked-by: default avatarRandy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      df944f66
    • Thomas Meyer's avatar
      ALSA: asihpi - use kzalloc() · 67ada836
      Thomas Meyer authored
       Use kzalloc rather than kmalloc followed by memset with 0
      
       This considers some simple cases that are common and easy to validate
       Note in particular that there are no ...s in the rule, so all of the
       matched code has to be contiguous
      
       The semantic patch that makes this output is available
       in scripts/coccinelle/api/alloc/kzalloc-simple.cocci.
      
       More information about semantic patching is available at
       http://coccinelle.lip6.fr/Signed-off-by: default avatarThomas Meyer <thomas@m3y3r.de>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      67ada836
    • Ari Savolainen's avatar
      Fix POSIX ACL permission check · 206b1d09
      Ari Savolainen authored
      After commit 3567866b: "RCUify freeing acls, let check_acl() go ahead in
      RCU mode if acl is cached" posix_acl_permission is being called with an
      unsupported flag and the permission check fails. This patch fixes the issue.
      Signed-off-by: default avatarAri Savolainen <ari.m.savolainen@gmail.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      206b1d09
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd · c2f340a6
      Linus Torvalds authored
      * 'for-linus' of git://git.open-osd.org/linux-open-osd:
        ore: Make ore its own module
        exofs: Rename raid engine from exofs/ios.c => ore
        exofs: ios: Move to a per inode components & device-table
        exofs: Move exofs specific osd operations out of ios.c
        exofs: Add offset/length to exofs_get_io_state
        exofs: Fix truncate for the raid-groups case
        exofs: Small cleanup of exofs_fill_super
        exofs: BUG: Avoid sbi realloc
        exofs: Remove pnfs-osd private definitions
        nfs_xdr: Move nfs4_string definition out of #ifdef CONFIG_NFS_V4
      c2f340a6
    • Linus Torvalds's avatar
      vfs: optimize inode cache access patterns · 3ddcd056
      Linus Torvalds authored
      The inode structure layout is largely random, and some of the vfs paths
      really do care.  The path lookup in particular is already quite D$
      intensive, and profiles show that accessing the 'inode->i_op->xyz'
      fields is quite costly.
      
      We already optimized the dcache to not unnecessarily load the d_op
      structure for members that are often NULL using the DCACHE_OP_xyz bits
      in dentry->d_flags, and this does something very similar for the inode
      ops that are used during pathname lookup.
      
      It also re-orders the fields so that the fields accessed by 'stat' are
      together at the beginning of the inode structure, and roughly in the
      order accessed.
      
      The effect of this seems to be in the 1-2% range for an empty kernel
      "make -j" run (which is fairly kernel-intensive, mostly in filename
      lookup), so it's visible.  The numbers are fairly noisy, though, and
      likely depend a lot on exact microarchitecture.  So there's more tuning
      to be done.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3ddcd056
    • Linus Torvalds's avatar
      vfs: renumber DCACHE_xyz flags, remove some stale ones · 830c0f0e
      Linus Torvalds authored
      Gcc tends to generate better code with small integers, including the
      DCACHE_xyz flag tests - so move the common ones to be first in the list.
      Also just remove the unused DCACHE_INOTIFY_PARENT_WATCHED and
      DCACHE_AUTOFS_PENDING values, their users no longer exists in the source
      tree.
      
      And add a "unlikely()" to the DCACHE_OP_COMPARE test, since we want the
      common case to be a nice straight-line fall-through.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      830c0f0e
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 7cd4767e
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        net: Compute protocol sequence numbers and fragment IDs using MD5.
        crypto: Move md5_transform to lib/md5.c
      7cd4767e
    • Boaz Harrosh's avatar
      ore: Make ore its own module · cf283ade
      Boaz Harrosh authored
      Export everything from ore need exporting. Change Kbuild and Kconfig
      to build ore.ko as an independent module. Import ore from exofs
      Signed-off-by: default avatarBoaz Harrosh <bharrosh@panasas.com>
      cf283ade
    • Boaz Harrosh's avatar
      exofs: Rename raid engine from exofs/ios.c => ore · 8ff660ab
      Boaz Harrosh authored
      ORE stands for "Objects Raid Engine"
      
      This patch is a mechanical rename of everything that was in ios.c
      and its API declaration to an ore.c and an osd_ore.h header. The ore
      engine will later be used by the pnfs objects layout driver.
      
      * File ios.c => ore.c
      
      * Declaration of types and API are moved from exofs.h to a new
        osd_ore.h
      
      * All used types are prefixed by ore_ from their exofs_ name.
      
      * Shift includes from exofs.h to osd_ore.h so osd_ore.h is
        independent, include it from exofs.h.
      
      Other than a pure rename there are no other changes. Next patch
      will move the ore into it's own module and will export the API
      to be used by exofs and later the layout driver
      Signed-off-by: default avatarBoaz Harrosh <bharrosh@panasas.com>
      8ff660ab
    • Boaz Harrosh's avatar
      exofs: ios: Move to a per inode components & device-table · 9e9db456
      Boaz Harrosh authored
      Exofs raid engine was saving on memory space by having a single layout-info,
      single pid, and a single device-table, global to the filesystem. Then passing
      a credential and object_id info at the io_state level, private for each
      inode. It would also devise this contraption of rotating the device table
      view for each inode->ino to spread out the device usage.
      
      This is not compatible with the pnfs-objects standard, demanding that
      each inode can have it's own layout-info, device-table, and each object
      component it's own pid, oid and creds.
      
      So: Bring exofs raid engine to be usable for generic pnfs-objects use by:
      
      * Define an exofs_comp structure that holds obj_id and credential info.
      
      * Break up exofs_layout struct to an exofs_components structure that holds a
        possible array of exofs_comp and the array of devices + the size of the
        arrays.
      
      * Add a "comps" parameter to get_io_state() that specifies the ids creds
        and device array to use for each IO.
      
        This enables to keep the layout global, but the device-table view, creds
        and IDs at the inode level. It only adds two 64bit to each inode, since
        some of these members already existed in another form.
      
      * ios raid engine now access layout-info and comps-info through the passed
        pointers. Everything is pre-prepared by caller for generic access of
        these structures and arrays.
      
      At the exofs Level:
      
      * Super block holds an exofs_components struct that holds the device
        array, previously in layout. The devices there are in device-table
        order. The device-array is twice bigger and repeats the device-table
        twice so now each inode's device array can point to a random device
        and have a round-robin view of the table, making it compatible to
        previous exofs versions.
      
      * Each inode has an exofs_components struct that is initialized at
        load time, with it's own view of the device table IDs and creds.
        When doing IO this gets passed to the io_state together with the
        layout.
      
      While preforming this change. Bugs where found where credentials with the
      wrong IDs where used to access the different SB objects (super.c). As well
      as some dead code. It was never noticed because the target we use does not
      check the credentials.
      Signed-off-by: default avatarBoaz Harrosh <bharrosh@panasas.com>
      9e9db456