1. 09 Jan, 2013 12 commits
    • cfq-iosched: implement cfq_group->nr_active and ->children_weight · 7918ffb5
      Tejun Heo authored
      To prepare for blkcg hierarchy support, add cfqg->nr_active and
      ->children_weight.  cfqg->nr_active counts the number of active cfqgs
      at the cfqg's level and ->children_weight is sum of weights of those
      cfqgs.  The level covers itself (cfqg->leaf_weight) and immediate
      children.
      
      The two values are updated when a cfqg enters and leaves the group
      service tree.  Unless the hierarchy is very deep, the added overhead
      should be negligible.
      
      Currently, the parent is determined using cfqg_flat_parent() which
      makes the root cfqg the parent of all other cfqgs.  This is to make
      the transition to hierarchy-aware scheduling gradual.  Scheduling
      logic will be converted to use cfqg->children_weight without actually
      changing the behavior.  When everything is ready,
      cfqg_flat_parent() will be replaced with a proper parent function.
      
      This patch doesn't introduce any behavior change.
      
      v2: s/cfqg->level_weight/cfqg->children_weight/ as per Vivek.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
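The enter/leave bookkeeping described in this commit can be sketched as a small userspace toy model. This is not the kernel source: `struct toy_cfqg` and both function names are hypothetical, and the sketch assumes a group is activated at most once before it is deactivated.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct cfq_group; only the fields discussed above. */
struct toy_cfqg {
	struct toy_cfqg *parent;      /* NULL for the root group */
	unsigned int weight;          /* weight against siblings */
	unsigned int leaf_weight;     /* weight of the group's own tasks */
	unsigned int nr_active;       /* active entries at this level */
	unsigned int children_weight; /* sum of weights of those entries */
};

/* Called when a group enters the group service tree. */
void toy_cfqg_activate(struct toy_cfqg *cfqg)
{
	struct toy_cfqg *pos = cfqg;
	int propagate;

	/* The group's own tasks count as one entry at its own level. */
	propagate = !pos->nr_active++;
	pos->children_weight += pos->leaf_weight;

	/* Walk up only while the level below has just become active. */
	while (propagate && pos->parent) {
		struct toy_cfqg *parent = pos->parent;

		propagate = !parent->nr_active++;
		parent->children_weight += pos->weight;
		pos = parent;
	}
}

/* Called when a group leaves the group service tree; mirrors activation. */
void toy_cfqg_deactivate(struct toy_cfqg *cfqg)
{
	struct toy_cfqg *pos = cfqg;
	int propagate;

	assert(pos->nr_active > 0);
	propagate = !--pos->nr_active;
	pos->children_weight -= pos->leaf_weight;

	while (propagate && pos->parent) {
		struct toy_cfqg *parent = pos->parent;

		assert(parent->nr_active > 0);
		propagate = !--parent->nr_active;
		parent->children_weight -= pos->weight;
		pos = parent;
	}
}
```

The walk stops at the first ancestor that was already active, which is why the cost stays small unless the hierarchy is very deep.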
    • cfq-iosched: add leaf_weight · e71357e1
      Tejun Heo authored
      cfq blkcg is about to grow proper hierarchy handling, where a child
      blkg's weight would nest inside the parent's.  This makes tasks in a
      blkg compete against both the tasks in the sibling blkgs and the
      tasks of child blkgs.
      
      We're gonna use the existing weight as the group weight which decides
      the blkg's weight against its siblings.  This patch introduces a new
      weight - leaf_weight - which decides the weight of a blkg against the
      child blkgs.
      
      It's named leaf_weight because another way to look at it is that each
      internal blkg node has a hidden child leaf node which contains all
      its tasks; leaf_weight is the weight of that leaf node and is handled
      the same as the weight of the child blkgs.
      
      This patch only adds leaf_weight fields and exposes it to userland.
      The new weight isn't actually used anywhere yet.  Note that
      cfq-iosched currently officially supports only single level hierarchy
      and root blkgs compete with the first level blkgs - ie. root weight is
      basically being used as leaf_weight.  For root blkgs, the two weights
      are kept in sync for backward compatibility.
      
      v2: cfqd->root_group->leaf_weight initialization was missing from
          cfq_init_queue() causing divide by zero when
          !CONFIG_CFQ_GROUP_SCHED.  Fix it.  Reported by Fengguang.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
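One way to read the "hidden leaf node" picture is as plain proportional arithmetic. The helper below is a hypothetical illustration of that reading, not code from the patch (which only adds the field and does not use it yet). It also shows why a zero leaf_weight on a group without children, as in the v2 note, ends in a division by zero.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Share, in per-mille, that a group's own tasks would receive if the
 * hidden leaf node competed with the child groups by weight.
 * Hypothetical helper for illustration only.
 */
unsigned int toy_leaf_share_permille(unsigned int leaf_weight,
				     const unsigned int *child_weights,
				     size_t nr_children)
{
	unsigned long total = leaf_weight;
	size_t i;

	for (i = 0; i < nr_children; i++)
		total += child_weights[i];

	/* A zero total (e.g. an uninitialized leaf_weight) would divide by zero. */
	assert(total > 0);
	return (unsigned int)(leaf_weight * 1000UL / total);
}
```

With a leaf_weight of 100 and two children weighted 200 and 100, the group's own tasks would get 100 / 400 of the group's share under this reading.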
    • blkcg: make blkcg_gq's hierarchical · 3c547865
      Tejun Heo authored
      Currently a child blkg (blkcg_gq) can be created even if its parent
      doesn't exist.  ie. Given a blkg, it's not guaranteed that its
      ancestors will exist.  This makes it difficult to implement proper
      hierarchy support for blkcg policies.
      
      Always create blkgs recursively and make a child blkg hold a reference
      to its parent.  blkg->parent is added so that finding the parent is
      easy.  blkcg_parent() is also added in the process.
      
      This change can be visible to userland.  E.g. issuing IO in a nested
      cgroup previously didn't affect the ancestors at all; now it
      initializes all ancestor blkgs, and zero stats for the request_queue
      will always appear on them.  While this is userland visible, it
      shouldn't cause any functional difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
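The "create recursively, child pins parent" rule can be sketched in userspace as follows. All names are hypothetical, and the model is simplified to a single request queue, so each toy cgroup has at most one blkg. Locking and error unwinding are omitted.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_blkg;

struct toy_blkcg {
	struct toy_blkcg *parent;   /* NULL for the root cgroup */
	struct toy_blkg *blkg;      /* NULL until created */
};

struct toy_blkg {
	struct toy_blkcg *blkcg;
	struct toy_blkg *parent;    /* NULL for the root blkg */
	int refcnt;
};

/* Create a single blkg; the parent's blkg must already exist. */
struct toy_blkg *toy_blkg_create(struct toy_blkcg *blkcg)
{
	struct toy_blkg *blkg = calloc(1, sizeof(*blkg));

	if (!blkg)
		return NULL;
	blkg->blkcg = blkcg;
	blkg->refcnt = 1;
	if (blkcg->parent) {
		assert(blkcg->parent->blkg);
		blkg->parent = blkcg->parent->blkg;
		blkg->parent->refcnt++;   /* child holds a reference on its parent */
	}
	blkcg->blkg = blkg;
	return blkg;
}

/* Look up first; on a miss, create missing ancestors from the top down. */
struct toy_blkg *toy_blkg_lookup_create(struct toy_blkcg *blkcg)
{
	while (!blkcg->blkg) {
		struct toy_blkcg *pos = blkcg;

		/* Find the topmost ancestor that still lacks a blkg. */
		while (pos->parent && !pos->parent->blkg)
			pos = pos->parent;
		if (!toy_blkg_create(pos))
			return NULL;
	}
	return blkcg->blkg;
}
```

Creating top-down means a blkg never becomes visible before its parent, so every blkg's ancestors are guaranteed to exist.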
    • blkcg: cosmetic updates to blkg_create() · 93e6d5d8
      Tejun Heo authored
      * Rename out_* labels to err_*.
      
      * Do ERR_PTR() conversion once in the error return path.
      
      This patch is cosmetic and to prepare for the hierarchy support.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
    • blkcg: reorganize blkg_lookup_create() and friends · 86cde6b6
      Tejun Heo authored
      Reorganize such that
      
      * __blkg_lookup() takes bool param @update_hint to determine whether
        to update hint.
      
      * __blkg_lookup_create() no longer performs lookup before trying to
        create.  Renamed to blkg_create().
      
      * blkg_lookup_create() now performs lookup and then invokes
        blkg_create() if lookup fails.
      
      * root_blkg creation in blkcg_activate_policy() updated accordingly.
        Note that blkcg_activate_policy() no longer updates lookup hint if
        root_blkg already exists.
      
      Except for the last lookup hint bit which is immaterial, this is pure
      reorganization and doesn't introduce any visible behavior change.
      This is to prepare for proper hierarchy support.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
    • blkcg: fix minor bug in blkg_alloc() · 356d2e58
      Tejun Heo authored
      blkg_alloc() was mistakenly checking blkcg_policy_enabled() twice.
      The latter test should have been on whether pol->pd_init_fn() exists.
      This doesn't cause actual problems because both blkcg policies
      implement pol->pd_init_fn().  Fix it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
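The intended shape of the two checks can be sketched like this. The types and names are hypothetical stand-ins, not the blkcg structures: the first test asks whether the policy is enabled, the second whether the optional init hook exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_policy {
	bool enabled;
	void (*pd_init_fn)(int *pd);   /* optional hook; may be NULL */
};

/* Example init hook used in the usage below. */
void toy_pd_init(int *pd)
{
	*pd = 1;
}

/* Returns how many policies had their per-group data initialized. */
int toy_init_policy_data(const struct toy_policy *pols, int *pds, size_t n)
{
	int initialized = 0;
	size_t i;

	assert(pols && pds);
	for (i = 0; i < n; i++) {
		if (!pols[i].enabled)
			continue;
		/* The second test is about the hook, not enablement again. */
		if (pols[i].pd_init_fn) {
			pols[i].pd_init_fn(&pds[i]);
			initialized++;
		}
	}
	return initialized;
}
```

A policy that is enabled but has no init hook is simply skipped instead of calling through a NULL pointer.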
    • cfq-iosched: Print sync-noidle information in blktrace messages · b226e5c4
      Vivek Goyal authored
      Currently we attach a character "S" or "A" to the cfqq<pid>, to represent
      whether the queue is sync or async. Add one more character "N" to represent
      whether it is a sync-noidle queue or a sync queue. So now the three different
      types of queues will look as follows.
      
      cfq1234S   --> sync queue
      cfq1234SN  --> sync-noidle queue
      cfq1234A   --> async queue
      
      Previously S/A classification was being printed only if group scheduling
      was enabled. This patch also makes sure that this classification is
      displayed even if group scheduling is disabled.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
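The naming scheme above can be reproduced with a short formatter. This is a userspace sketch with a hypothetical function name, not the kernel's logging macro; it assumes "N" is only ever appended to sync queues.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Formats a queue tag such as "cfq1234S", "cfq1234SN" or "cfq1234A". */
int toy_cfqq_name(char *buf, size_t len, int pid, bool sync, bool noidle)
{
	assert(buf && len > 0);
	/* 'N' is only meaningful for sync queues. */
	return snprintf(buf, len, "cfq%d%c%s", pid, sync ? 'S' : 'A',
			(sync && noidle) ? "N" : "");
}
```

Calling `toy_cfqq_name(buf, sizeof(buf), 1234, true, true)` fills `buf` with `cfq1234SN`.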
    • cfq-iosched: Get rid of unnecessary local variable · 1f23f121
      Vivek Goyal authored
      Use of local variable "n" seems to be unnecessary. Remove it. This brings
      it in line with function __cfq_group_st_add(), which also does the
      similar operation of adding a group to an rb tree.
      
      No functionality change here.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cfq-iosched: Rename few functions related to selecting workload · 6d816ec7
      Vivek Goyal authored
      choose_service_tree() selects/sets both wl_class and wl_type.  Rename it to
      choose_wl_class_and_type() to make it very clear.
      
      cfq_choose_wl() only selects and sets wl_type. It is easy to confuse
      it with choose_st(). So rename it to cfq_choose_wl_type() to make
      it clear what it does.
      
      Just renaming. No functionality change.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cfq-iosched: Rename "service_tree" to "st" at some places · 34b98d03
      Vivek Goyal authored
      In quite a few places we use the keyword "service_tree". In some places,
      especially local variables, I have abbreviated it to "st".
      
      Also in a couple of places moved the binary operator "+" from the beginning
      of a line to the end of the previous line, as per Tejun's feedback.
      
      v2:
       Reverted most of the service tree name change based on Jeff Moyer's feedback.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cfq-iosched: More renaming to better represent wl_class and wl_type · 4d2ceea4
      Vivek Goyal authored
      Some more renaming. Again making the code uniform w.r.t. the use of
      wl_class/class to represent IO class (RT, BE, IDLE) and using
      wl_type/type to represent subclass (SYNC, SYNC-NOIDLE, ASYNC).
      
      At places this patch shortens the string "workload" to "wl".
      Renamed "saved_workload" to "saved_wl_type". Renamed
      "saved_serving_class" to "saved_wl_class".
      
      For uniformity with "saved_wl_*" variables, renamed "serving_class"
      to "serving_wl_class" and renamed "serving_type" to "serving_wl_type".
      
      Again, just trying to improve upon code uniformity and improve
      readability. No functional change.
      
      v2:
      - Restored the usage of keyword "service" based on Jeff Moyer's feedback.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cfq-iosched: Properly name all references to IO class · 3bf10fea
      Vivek Goyal authored
      Currently CFQ has three IO classes: RT, BE and IDLE. In many places we
      refer to workloads belonging to these classes as "prio". This gets
      very confusing as one starts to associate it with ioprio.
      
      So this patch just does a bunch of renaming so that reading the code
      becomes easier. All references to RT, BE and IDLE workloads are made using
      the keyword "class" and all references to the subclasses SYNC, SYNC-NOIDLE
      and ASYNC are made using the keyword "type".
      
      This makes me feel much better while I am reading the code. There is no
      functionality change due to this patch.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  2. 03 Jan, 2013 10 commits
  3. 02 Jan, 2013 8 commits
    • Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 5439ca6b
      Linus Torvalds authored
      Pull ext4 bug fixes from Ted Ts'o:
       "Various bug fixes for ext4.  Perhaps the most serious bug fixed is one
        which could cause file system corruptions when performing file punch
        operations."
      
      * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: avoid hang when mounting non-journal filesystems with orphan list
        ext4: lock i_mutex when truncating orphan inodes
        ext4: do not try to write superblock on ro remount w/o journal
        ext4: include journal blocks in df overhead calcs
        ext4: remove unaligned AIO warning printk
        ext4: fix an incorrect comment about i_mutex
        ext4: fix deadlock in journal_unmap_buffer()
        ext4: split off ext4_journalled_invalidatepage()
        jbd2: fix assertion failure in jbd2_journal_flush()
        ext4: check dioread_nolock on remount
        ext4: fix extent tree corruption caused by hole punch
    • mempolicy: remove arg from mpol_parse_str, mpol_to_str · a7a88b23
      Hugh Dickins authored
      Remove the unused argument (formerly no_context) from mpol_parse_str()
      and from mpol_to_str().
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tmpfs mempolicy: fix /proc/mounts corrupting memory · f2a07f40
      Hugh Dickins authored
      Recently I suggested using "mount -o remount,mpol=local /tmp" in NUMA
      mempolicy testing.  Very nasty.  Reading /proc/mounts, /proc/pid/mounts
      or /proc/pid/mountinfo may then corrupt one bit of kernel memory, often
      in a page table (causing "Bad swap" or "Bad page map" warning or "Bad
      pagetable" oops), sometimes in a vm_area_struct or rbnode or somewhere
      worse.  "mpol=prefer" and "mpol=prefer:Node" are equally toxic.
      
      Recent NUMA enhancements are not to blame: this dates back to 2.6.35,
      when commit e17f74af "mempolicy: don't call mpol_set_nodemask() when
      no_context" skipped mpol_parse_str()'s call to mpol_set_nodemask(),
      which used to initialize v.preferred_node, or set MPOL_F_LOCAL in flags.
      With slab poisoning, you can then rely on mpol_to_str() to set the bit
      for node 0x6b6b, probably in the next page above the caller's stack.
      
      mpol_parse_str() is only called from shmem_parse_options(): no_context
      is always true, so call it unused for now, and remove !no_context code.
      Set v.nodes or v.preferred_node or MPOL_F_LOCAL as mpol_to_str() might
      expect.  Then mpol_to_str() can ignore its no_context argument also,
      the mpol being appropriately initialized whether contextualized or not.
      Rename its no_context unused too, and let subsequent patch remove them
      (that's not needed for stable backporting, which would involve rejects).
      
      I don't understand why MPOL_LOCAL is described as a pseudo-policy:
      it's a reasonable policy which suffers from a confusing implementation
      in terms of MPOL_PREFERRED with MPOL_F_LOCAL.  I believe this would be
      much more robust if MPOL_LOCAL were recognized in switch statements
      throughout, MPOL_F_LOCAL deleted, and MPOL_PREFERRED use the (possibly
      empty) nodes mask like everyone else, instead of its preferred_node
      variant (I presume an optimization from the days before MPOL_LOCAL).
      But that would take me too long to get right and fully tested.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • epoll: prevent missed events on EPOLL_CTL_MOD · 128dd175
      Eric Wong authored
      EPOLL_CTL_MOD sets the interest mask before calling f_op->poll() to
      ensure events are not missed.  Since the modifications to the interest
      mask are not protected by the same lock as ep_poll_callback, we need to
      ensure the change is visible to other CPUs calling ep_poll_callback.
      
      We also need to ensure f_op->poll() has an up-to-date view of past
      events which occurred before we modified the interest mask.  So this
      barrier also pairs with the barrier in wq_has_sleeper().
      
      This should guarantee either ep_poll_callback or f_op->poll() (or both)
      will notice the readiness of a recently-ready/modified item.
      
      This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang in:
      http://thread.gmane.org/gmane.linux.kernel/1408782/
      
      Signed-off-by: Eric Wong <normalperson@yhbt.net>
      Cc: Hans Verkuil <hans.verkuil@cisco.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Hans de Goede <hdegoede@redhat.com>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andreas Voellmy <andreas.voellmy@yale.edu>
      Tested-by: "Junchang(Jason) Wang" <junchang.wang@yale.edu>
      Cc: netdev@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
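The "either side (or both) will notice" guarantee is the classic store, full barrier, load pairing. Below is a userspace analogue using C11 atomics, not the epoll code itself: `struct toy_epitem` and both functions are hypothetical, and `atomic_thread_fence(memory_order_seq_cst)` stands in for the kernel's full memory barrier.

```c
#include <assert.h>
#include <stdatomic.h>

struct toy_epitem {
	atomic_uint interest;  /* event mask the waiter cares about */
	atomic_uint ready;     /* events the "file" currently reports */
};

/* Modifier side: publish the new mask, full fence, then poll readiness. */
unsigned int toy_ctl_mod(struct toy_epitem *epi, unsigned int new_mask)
{
	assert(epi);
	atomic_store_explicit(&epi->interest, new_mask, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&epi->ready, memory_order_relaxed) & new_mask;
}

/* Wakeup side: record readiness, full fence, then check the mask. */
unsigned int toy_poll_callback(struct toy_epitem *epi, unsigned int events)
{
	assert(epi);
	atomic_fetch_or_explicit(&epi->ready, events, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&epi->interest, memory_order_relaxed) & events;
}
```

With a fence between the store and the load on both sides, the two calls cannot both miss each other's update when they race on different CPUs. Without the fences, each load could be satisfied before the other side's store becomes visible, and the event would be lost.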
    • watchdog: twl4030_wdt: add DT support · 8899b8d9
      Aaro Koskinen authored
      Add DT support for twl4030_wdt. This is needed to get twl4030_wdt to
      probe when booting with DT.
      Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
      Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
    • watchdog: omap_wdt: eliminate unused variable and a compiler warning · 412b3729
      Aaro Koskinen authored
      We forgot to delete this in the commit 4f4753d9 (watchdog: omap_wdt:
      convert to devm_ functions), and as a result the following compilation
      warning was introduced:
      
      drivers/watchdog/omap_wdt.c: In function 'omap_wdt_remove':
      drivers/watchdog/omap_wdt.c:299:19: warning: unused variable 'res' [-Wunused-variable]
      Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
      Reviewed-by: Paul Walmsley <paul@pwsan.com>
      Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
    • watchdog: da9055: Don't update wdt_dev->timeout in da9055_wdt_set_timeout error path · 98e4a293
      Axel Lin authored
      Otherwise, WDIOC_GETTIMEOUT returns the wrong value if set_timeout fails.
      This patch also removes the unnecessary ret variable in the da9055_wdt_ping() function.
      Signed-off-by: Axel Lin <axel.lin@ingics.com>
      Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
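The error-path rule here is "cache the new timeout only after the hardware accepted it". A minimal sketch with hypothetical names follows; the `hw_fail` field is just a knob to simulate a failing register write and has no counterpart in the driver.

```c
#include <assert.h>
#include <errno.h>

struct toy_wdt {
	unsigned int timeout;   /* value a GETTIMEOUT-style query would report */
	int hw_fail;            /* simulation knob: make the register write fail */
};

/* Stand-in for the hardware register update. */
int toy_hw_write_timeout(struct toy_wdt *wdt, unsigned int timeout)
{
	(void)timeout;
	return wdt->hw_fail ? -EIO : 0;
}

int toy_wdt_set_timeout(struct toy_wdt *wdt, unsigned int timeout)
{
	int ret;

	assert(wdt);
	ret = toy_hw_write_timeout(wdt, timeout);
	if (ret < 0)
		return ret;	/* leave the cached value untouched */

	wdt->timeout = timeout;	/* only record what the hardware accepted */
	return 0;
}
```

If the cached value were updated before the write, a failed call would leave the reported timeout out of step with what the hardware is actually using.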
    • watchdog: da9055: Fix invalid free of devm_ allocated data · ee8c94ad
      Axel Lin authored
      It is not required to free devm_ allocated data. Since kref_put
      needs a valid release function, da9055_wdt_release_resources()
      is not deleted.
      Signed-off-by: Axel Lin <axel.lin@ingics.com>
      Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
  4. 30 Dec, 2012 10 commits