1. 24 Jan, 2013 4 commits
    • Li Zefan's avatar
      sched: remove redundant NULL cgroup check in task_group_path() · 2a73991b
      Li Zefan authored
      A task_group won't be online (thus no one can see it) until
      cpu_cgroup_css_online(), and at that time tg->css.cgroup has
      been initialized, so this NULL check is redundant.
      Signed-off-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      2a73991b
    • Li Zefan's avatar
      sched: split out css_online/css_offline from tg creation/destruction · ace783b9
      Li Zefan authored
      This is a preparaton for later patches.
      
      - What do we gain from cpu_cgroup_css_online():
      
      After ss->css_alloc() and before ss->css_online(), there's a small
      window that tg->css.cgroup is NULL. With this change, tg won't be seen
      before ss->css_online(), where it's added to the global list, so we're
      guaranteed we'll never see NULL tg->css.cgroup.
      
      - What do we gain from cpu_cgroup_css_offline():
      
      tg is freed via RCU, so is cgroup. Without this change, This is how
      synchronization works:
      
      cgroup_rmdir()
        no ss->css_offline()
      diput()
        syncornize_rcu()
        ss->css_free()       <-- unregister tg, and free it via call_rcu()
        kfree_rcu(cgroup)    <-- wait possible refs to cgroup, and free cgroup
      
      We can't just kfree(cgroup), because tg might access tg->css.cgroup.
      
      With this change:
      
      cgroup_rmdir()
        ss->css_offline()    <-- unregister tg
      diput()
        synchronize_rcu()    <-- wait possible refs to tg and cgroup
        ss->css_free()       <-- free tg
        kfree_rcu(cgroup)    <-- free cgroup
      
      As you see, kfree_rcu() is redundant now.
      Signed-off-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      ace783b9
    • Li Zefan's avatar
      cgroup: initialize cgrp->dentry before css_alloc() · fe1c06ca
      Li Zefan authored
      With this change, we're guaranteed that cgroup_path() won't see NULL
      cgrp->dentry, and thus we can remove the NULL check in it.
      
      (Well, it's not strictly true, because dummptop.dentry is always NULL
       but we already handle that separately.)
      Signed-off-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      fe1c06ca
    • Li Zefan's avatar
      cgroup: remove a NULL check in cgroup_exit() · b5d646f5
      Li Zefan authored
      init_task.cgroups is initialized at boot phase, and whenver a ask
      is forked, it's cgroups pointer is inherited from its parent, and
      it's never set to NULL afterwards.
      Signed-off-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      b5d646f5
  2. 23 Jan, 2013 1 commit
    • Li Zefan's avatar
      cgroup: fix bogus kernel warnings when cgroup_create() failed · 2739d3cc
      Li Zefan authored
      If cgroup_create() failed and cgroup_destroy_locked() is called to
      do cleanup, we'll see a bunch of warnings:
      
      cgroup_addrm_files: failed to remove 2MB.limit_in_bytes, err=-2
      cgroup_addrm_files: failed to remove 2MB.usage_in_bytes, err=-2
      cgroup_addrm_files: failed to remove 2MB.max_usage_in_bytes, err=-2
      cgroup_addrm_files: failed to remove 2MB.failcnt, err=-2
      cgroup_addrm_files: failed to remove prioidx, err=-2
      cgroup_addrm_files: failed to remove ifpriomap, err=-2
      ...
      
      We failed to remove those files, because cgroup_create() has failed
      before creating those cgroup files.
      
      To fix this, we simply don't warn if cgroup_rm_file() can't find the
      cft entry.
      Signed-off-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      2739d3cc
  3. 14 Jan, 2013 2 commits
    • Li Zefan's avatar
      cgroup: remove synchronize_rcu() from rebind_subsystems() · 130e3695
      Li Zefan authored
      Nothing's protected by RCU in rebind_subsystems(), and I can't think
      of a reason why it is needed.
      Signed-off-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      130e3695
    • Li Zefan's avatar
      cgroup: remove synchronize_rcu() from cgroup_attach_{task|proc}() · 5d65bc0c
      Li Zefan authored
      These 2 syncronize_rcu()s make attaching a task to a cgroup
      quite slow, and it can't be ignored in some situations.
      
      A real case from Colin Cross: Android uses cgroups heavily to
      manage thread priorities, putting threads in a background group
      with reduced cpu.shares when they are not visible to the user,
      and in a foreground group when they are. Some RPCs from foreground
      threads to background threads will temporarily move the background
      thread into the foreground group for the duration of the RPC.
      This results in many calls to cgroup_attach_task.
      
      In cgroup_attach_task() it's task->cgroups that is protected by RCU,
      and put_css_set() calls kfree_rcu() to free it.
      
      If we remove this synchronize_rcu(), there can be threads in RCU-read
      sections accessing their old cgroup via current->cgroups with
      concurrent rmdir operation, but this is safe.
      
       # time for ((i=0; i<50; i++)) { echo $$ > /mnt/sub/tasks; echo $$ > /mnt/tasks; }
      
      real    0m2.524s
      user    0m0.008s
      sys     0m0.004s
      
      With this patch:
      
      real    0m0.004s
      user    0m0.004s
      sys     0m0.000s
      
      tj: These synchronize_rcu()s are utterly confused.  synchornize_rcu()
          necessarily has to come between two operations to guarantee that
          the changes made by the former operation are visible to all rcu
          readers before proceeding to the latter operation.  Here,
          synchornize_rcu() are at the end of attach operations with nothing
          beyond it.  Its only effect would be delaying completion of
          write(2) to sysfs tasks/procs files until all rcu readers see the
          change, which doesn't mean anything.
      Signed-off-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarColin Cross <ccross@google.com>
      5d65bc0c
  4. 10 Jan, 2013 1 commit
  5. 08 Jan, 2013 1 commit
    • Greg Thelen's avatar
      cgroups: fix cgroup_event_listener error handling · 799105d5
      Greg Thelen authored
      The error handling in cgroup_event_listener.c did not correctly deal
      with either an error opening either <control_file> or
      cgroup.event_control.  Due to an uninitialized variable the program
      exit code was undefined if either of these opens failed.
      
      This patch simplifies and corrects cgroup_event_listener.c error
      handling by:
      1. using err*() rather than printf(),exit()
      2. depending on process exit to close open files
      
      With this patch failures always return non-zero error.
      Signed-off-by: default avatarGreg Thelen <gthelen@google.com>
      Acked-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      799105d5
  6. 07 Jan, 2013 3 commits
  7. 03 Jan, 2013 10 commits
  8. 02 Jan, 2013 8 commits
    • Linus Torvalds's avatar
      Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 5439ca6b
      Linus Torvalds authored
      Pull ext4 bug fixes from Ted Ts'o:
       "Various bug fixes for ext4.  Perhaps the most serious bug fixed is one
        which could cause file system corruptions when performing file punch
        operations."
      
      * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: avoid hang when mounting non-journal filesystems with orphan list
        ext4: lock i_mutex when truncating orphan inodes
        ext4: do not try to write superblock on ro remount w/o journal
        ext4: include journal blocks in df overhead calcs
        ext4: remove unaligned AIO warning printk
        ext4: fix an incorrect comment about i_mutex
        ext4: fix deadlock in journal_unmap_buffer()
        ext4: split off ext4_journalled_invalidatepage()
        jbd2: fix assertion failure in jbd2_journal_flush()
        ext4: check dioread_nolock on remount
        ext4: fix extent tree corruption caused by hole punch
      5439ca6b
    • Hugh Dickins's avatar
      mempolicy: remove arg from mpol_parse_str, mpol_to_str · a7a88b23
      Hugh Dickins authored
      Remove the unused argument (formerly no_context) from mpol_parse_str()
      and from mpol_to_str().
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a7a88b23
    • Hugh Dickins's avatar
      tmpfs mempolicy: fix /proc/mounts corrupting memory · f2a07f40
      Hugh Dickins authored
      Recently I suggested using "mount -o remount,mpol=local /tmp" in NUMA
      mempolicy testing.  Very nasty.  Reading /proc/mounts, /proc/pid/mounts
      or /proc/pid/mountinfo may then corrupt one bit of kernel memory, often
      in a page table (causing "Bad swap" or "Bad page map" warning or "Bad
      pagetable" oops), sometimes in a vm_area_struct or rbnode or somewhere
      worse.  "mpol=prefer" and "mpol=prefer:Node" are equally toxic.
      
      Recent NUMA enhancements are not to blame: this dates back to 2.6.35,
      when commit e17f74af "mempolicy: don't call mpol_set_nodemask() when
      no_context" skipped mpol_parse_str()'s call to mpol_set_nodemask(),
      which used to initialize v.preferred_node, or set MPOL_F_LOCAL in flags.
      With slab poisoning, you can then rely on mpol_to_str() to set the bit
      for node 0x6b6b, probably in the next page above the caller's stack.
      
      mpol_parse_str() is only called from shmem_parse_options(): no_context
      is always true, so call it unused for now, and remove !no_context code.
      Set v.nodes or v.preferred_node or MPOL_F_LOCAL as mpol_to_str() might
      expect.  Then mpol_to_str() can ignore its no_context argument also,
      the mpol being appropriately initialized whether contextualized or not.
      Rename its no_context unused too, and let subsequent patch remove them
      (that's not needed for stable backporting, which would involve rejects).
      
      I don't understand why MPOL_LOCAL is described as a pseudo-policy:
      it's a reasonable policy which suffers from a confusing implementation
      in terms of MPOL_PREFERRED with MPOL_F_LOCAL.  I believe this would be
      much more robust if MPOL_LOCAL were recognized in switch statements
      throughout, MPOL_F_LOCAL deleted, and MPOL_PREFERRED use the (possibly
      empty) nodes mask like everyone else, instead of its preferred_node
      variant (I presume an optimization from the days before MPOL_LOCAL).
      But that would take me too long to get right and fully tested.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f2a07f40
    • Eric Wong's avatar
      epoll: prevent missed events on EPOLL_CTL_MOD · 128dd175
      Eric Wong authored
      EPOLL_CTL_MOD sets the interest mask before calling f_op->poll() to
      ensure events are not missed.  Since the modifications to the interest
      mask are not protected by the same lock as ep_poll_callback, we need to
      ensure the change is visible to other CPUs calling ep_poll_callback.
      
      We also need to ensure f_op->poll() has an up-to-date view of past
      events which occured before we modified the interest mask.  So this
      barrier also pairs with the barrier in wq_has_sleeper().
      
      This should guarantee either ep_poll_callback or f_op->poll() (or both)
      will notice the readiness of a recently-ready/modified item.
      
      This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang in:
      http://thread.gmane.org/gmane.linux.kernel/1408782/Signed-off-by: default avatarEric Wong <normalperson@yhbt.net>
      Cc: Hans Verkuil <hans.verkuil@cisco.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Hans de Goede <hdegoede@redhat.com>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andreas Voellmy <andreas.voellmy@yale.edu>
      Tested-by: default avatar"Junchang(Jason) Wang" <junchang.wang@yale.edu>
      Cc: netdev@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      128dd175
    • Aaro Koskinen's avatar
      watchdog: twl4030_wdt: add DT support · 8899b8d9
      Aaro Koskinen authored
      Add DT support for twl4030_wdt. This is needed to get twl4030_wdt to
      probe when booting with DT.
      Signed-off-by: default avatarAaro Koskinen <aaro.koskinen@iki.fi>
      Signed-off-by: default avatarWim Van Sebroeck <wim@iguana.be>
      8899b8d9
    • Aaro Koskinen's avatar
      watchdog: omap_wdt: eliminate unused variable and a compiler warning · 412b3729
      Aaro Koskinen authored
      We forgot to delete this in the commit 4f4753d9 (watchdog: omap_wdt:
      convert to devm_ functions), and as a result the following compilation
      warning was introduced:
      
      drivers/watchdog/omap_wdt.c: In function 'omap_wdt_remove':
      drivers/watchdog/omap_wdt.c:299:19: warning: unused variable 'res' [-Wunused-variable]
      Signed-off-by: default avatarAaro Koskinen <aaro.koskinen@iki.fi>
      Reviewed-by: default avatarPaul Walmsley <paul@pwsan.com>
      Signed-off-by: default avatarWim Van Sebroeck <wim@iguana.be>
      412b3729
    • Axel Lin's avatar
      watchdog: da9055: Don't update wdt_dev->timeout in da9055_wdt_set_timeout error path · 98e4a293
      Axel Lin authored
      Otherwise, WDIOC_GETTIMEOUT returns wrong value if set_timeout fails.
      This patch also removes unnecessary ret variable in da9055_wdt_ping function.
      Signed-off-by: default avatarAxel Lin <axel.lin@ingics.com>
      Signed-off-by: default avatarWim Van Sebroeck <wim@iguana.be>
      98e4a293
    • Axel Lin's avatar
      watchdog: da9055: Fix invalid free of devm_ allocated data · ee8c94ad
      Axel Lin authored
      It is not required to free devm_ allocated data. Since kref_put
      needs a valid release function, da9055_wdt_release_resources()
      is not deleted.
      Signed-off-by: default avatarAxel Lin <axel.lin@ingics.com>
      Signed-off-by: default avatarWim Van Sebroeck <wim@iguana.be>
      ee8c94ad
  9. 30 Dec, 2012 10 commits