1. 14 Nov, 2012 7 commits
    • Daniel Lezcano's avatar
      cpuidle / sysfs: move structure declaration into the sysfs.c file · 349631e0
      Daniel Lezcano authored
      The structure cpuidle_state_kobj is not used anywhere except
      in the sysfs.c file. The definition of this structure is not
      needed in the cpuidle header file. This patch moves it to the
      sysfs.c file in order to encapsulate the code a bit more.
      Signed-off-by: default avatarDaniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      349631e0
    • Youquan Song's avatar
      cpuidle: Get typical recent sleep interval · c96ca4fb
      Youquan Song authored
      The function detect_repeating_patterns was not very useful for
      workloads with alternating long and short pauses, for example
      virtual machines handling network requests for each other (say
      a web and database server).
      
      Instead, try to find a recent sleep interval that is somewhere
      between the median and the mode sleep time, by discarding outliers
      to the up side and recalculating the average and standard deviation
      until that is no longer required.
      
      This should do something sane with a sleep interval series like:
      
      	200 180 210 10000 30 1000 170 200
      
      The current code would simply discard such a series, while the
      new code will guess a typical sleep interval just shy of 200.
      
      The original patch come from Rik van Riel <riel@redhat.com>.
      Signed-off-by: default avatarRik van Riel <riel@redhat.com>
      Signed-off-by: default avatarYouquan Song <youquan.song@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      c96ca4fb
    • Youquan Song's avatar
      cpuidle: Set residency to 0 if target Cstate not enter · d73d68dc
      Youquan Song authored
      When cpuidle governor choose a C-state to enter for idle CPU, but it notice that
      there is tasks request to be executed. So the idle CPU will not really enter
      the target C-state and go to run task.
      
      In this situation, it will use the residency of previous really entered target
      C-states. Obviously, it is not reasonable.
      
      So, this patch fix it by set the target C-state residency to 0.
      Signed-off-by: default avatarRik van Riel <riel@redhat.com>
      Signed-off-by: default avatarYouquan Song <youquan.song@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      d73d68dc
    • Youquan Song's avatar
      cpuidle: Quickly notice prediction failure in general case · e11538d1
      Youquan Song authored
      The prediction for future is difficult and when the cpuidle governor prediction
      fails and govenor possibly choose the shallower C-state than it should. How to
      quickly notice and find the failure becomes important for power saving.
      
      The patch extends to general case that prediction logic get a small predicted
      residency, so it choose a shallow C-state though the expected residency is large
      . Once the prediction will be fail, the CPU will keep staying at shallow C-state
      for a long time. Acutally, the CPU has change enter into deep C-state.
      So when the expected residency is long enough but governor choose a shallow
      C-state, an timer will be added in order to monitor if the prediction failure.
      
      When C-state is waken up prior to the adding timer, the timer will be cancelled
      initiatively. When the timer is triggered and menu governor will quickly notice
      prediction failure and re-evaluates deeper C-states possibility.
      Signed-off-by: default avatarRik van Riel <riel@redhat.com>
      Signed-off-by: default avatarYouquan Song <youquan.song@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      e11538d1
    • Youquan Song's avatar
      cpuidle: Quickly notice prediction failure for repeat mode · 69a37bea
      Youquan Song authored
      The prediction for future is difficult and when the cpuidle governor prediction
      fails and govenor possibly choose the shallower C-state than it should. How to
      quickly notice and find the failure becomes important for power saving.
      
      cpuidle menu governor has a method to predict the repeat pattern if there are 8
      C-states residency which are continuous and the same or very close, so it will
      predict the next C-states residency will keep same residency time.
      
      There is a real case that turbostat utility (tools/power/x86/turbostat)
      at kernel 3.3 or early. turbostat utility will read 10 registers one by one at
      Sandybridge, so it will generate 10 IPIs to wake up idle CPUs. So cpuidle menu
       governor will predict it is repeat mode and there is another IPI wake up idle
       CPU soon, so it keeps idle CPU stay at C1 state even though CPU is totally
      idle. However, in the turbostat, following 10 registers reading is sleep 5
      seconds by default, so the idle CPU will keep at C1 for a long time though it is
       idle until break event occurs.
      In a idle Sandybridge system, run "./turbostat -v", we will notice that deep
      C-state dangles between "70% ~ 99%". After patched the kernel, we will notice
      deep C-state stays at >99.98%.
      
      In the patch, a timer is added when menu governor detects a repeat mode and
      choose a shallow C-state. The timer is set to a time out value that greater
      than predicted time, and we conclude repeat mode prediction failure if timer is
      triggered. When repeat mode happens as expected, the timer is not triggered
      and CPU waken up from C-states and it will cancel the timer initiatively.
      When repeat mode does not happen, the timer will be time out and menu governor
      will quickly notice that the repeat mode prediction fails and then re-evaluates
      deeper C-states possibility.
      
      Below is another case which will clearly show the patch much benefit:
      
      #include <stdlib.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <signal.h>
      #include <sys/time.h>
      #include <time.h>
      #include <pthread.h>
      
      volatile int * shutdown;
      volatile long * count;
      int delay = 20;
      int loop = 8;
      
      void usage(void)
      {
      	fprintf(stderr,
      		"Usage: idle_predict [options]\n"
      		"  --help	-h  Print this help\n"
      		"  --thread	-n  Thread number\n"
      		"  --loop     	-l  Loop times in shallow Cstate\n"
      		"  --delay	-t  Sleep time (uS)in shallow Cstate\n");
      }
      
      void *simple_loop() {
      	int idle_num = 1;
      	while (!(*shutdown)) {
      		*count = *count + 1;
      
      		if (idle_num % loop)
      			usleep(delay);
      		else {
      			/* sleep 1 second */
      			usleep(1000000);
      			idle_num = 0;
      		}
      		idle_num++;
      	}
      
      }
      
      static void sighand(int sig)
      {
      	*shutdown = 1;
      }
      
      int main(int argc, char *argv[])
      {
      	sigset_t sigset;
      	int signum = SIGALRM;
      	int i, c, er = 0, thread_num = 8;
      	pthread_t pt[1024];
      
      	static char optstr[] = "n:l:t:h:";
      
      	while ((c = getopt(argc, argv, optstr)) != EOF)
      		switch (c) {
      			case 'n':
      				thread_num = atoi(optarg);
      				break;
      			case 'l':
      				loop = atoi(optarg);
      				break;
      			case 't':
      				delay = atoi(optarg);
      				break;
      			case 'h':
      			default:
      				usage();
      				exit(1);
      		}
      
      	printf("thread=%d,loop=%d,delay=%d\n",thread_num,loop,delay);
      	count = malloc(sizeof(long));
      	shutdown = malloc(sizeof(int));
      	*count = 0;
      	*shutdown = 0;
      
      	sigemptyset(&sigset);
      	sigaddset(&sigset, signum);
      	sigprocmask (SIG_BLOCK, &sigset, NULL);
      	signal(SIGINT, sighand);
      	signal(SIGTERM, sighand);
      
      	for(i = 0; i < thread_num ; i++)
      		pthread_create(&pt[i], NULL, simple_loop, NULL);
      
      	for (i = 0; i < thread_num; i++)
      		pthread_join(pt[i], NULL);
      
      	exit(0);
      }
      
      Get powertop V2 from git://github.com/fenrus75/powertop, build powertop.
      After build the above test application, then run it.
      Test plaform can be Intel Sandybridge or other recent platforms.
      #./idle_predict -l 10 &
      #./powertop
      
      We will find that deep C-state will dangle between 40%~100% and much time spent
      on C1 state. It is because menu governor wrongly predict that repeat mode
      is kept, so it will choose the C1 shallow C-state even though it has chance to
      sleep 1 second in deep C-state.
      
      While after patched the kernel, we find that deep C-state will keep >99.6%.
      Signed-off-by: default avatarRik van Riel <riel@redhat.com>
      Signed-off-by: default avatarYouquan Song <youquan.song@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      69a37bea
    • Daniel Lezcano's avatar
      cpuidle / sysfs: move kobj initialization in the syfs file · e45a00d6
      Daniel Lezcano authored
      Move the kobj initialization and completion in the sysfs.c
      and encapsulate the code more.
      Signed-off-by: default avatarDaniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      e45a00d6
    • Daniel Lezcano's avatar
      cpuidle / sysfs: change function parameter · 1aef40e2
      Daniel Lezcano authored
      The function needs the cpuidle_device which is initially passed to the
      caller.
      
      The current code gets the struct device from the struct cpuidle_device,
      pass it the cpuidle_add_sysfs function. This function calls
      per_cpu(cpuidle_devices, cpu) to get the cpuidle_device.
      
      This patch pass the cpuidle_device instead and simplify the code.
      Signed-off-by: default avatarDaniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      1aef40e2
  2. 11 Nov, 2012 1 commit
  3. 10 Nov, 2012 10 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · b251f0f3
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "Bug fixes galore, mostly in drivers as is often the case:
      
        1) USB gadget and cdc_eem drivers need adjustments to their frame size
           lengths in order to handle VLANs correctly.  From Ian Coolidge.
      
        2) TIPC and several network drivers erroneously call tasklet_disable
           before tasklet_kill, fix from Xiaotian Feng.
      
        3) r8169 driver needs to apply the WOL suspend quirk to more chipsets,
           fix from Cyril Brulebois.
      
        4) Fix multicast filters on RTL_GIGA_MAC_VER_35 r8169 chips, from
           Nathan Walp.
      
        5) FDB netlink dumps should use RTM_NEWNEIGH as the message type, not
           zero.  From John Fastabend.
      
        6) Fix smsc95xx tx checksum offload on big-endian, from Steve
           Glendinning.
      
        7) __inet_diag_dump() needs to repsect and report the error value
           returned from inet_diag_lock_handler() rather than ignore it.
           Otherwise if an inet diag handler is not available for a particular
           protocol, we essentially report success instead of giving an error
           indication.  Fix from Cyrill Gorcunov.
      
        8) When the QFQ packet scheduler sees TSO/GSO packets it does not
           handle things properly, and in fact ends up corrupting it's
           datastructures as well as mis-schedule packets.  Fix from Paolo
           Valente.
      
        9) Fix oopser in skb_loop_sk(), from Eric Leblond.
      
        10) CXGB4 passes partially uninitialized datastructures in to FW
            commands, fix from Vipul Pandya.
      
        11) When we send unsolicited ipv6 neighbour advertisements, we should
            send them to the link-local allnodes multicast address, as per
            RFC4861.  Fix from Hannes Frederic Sowa.
      
        12) There is some kind of bug in the usbnet's kevent deferral
            mechanism, but more immediately when it triggers an uncontrolled
            stream of kernel messages spam the log.  Rate limit the error log
            message triggered when this problem occurs, as sending thousands
            of error messages into the kernel log doesn't help matters at all,
            and in fact makes further diagnosis more difficult.
      
            From Steve Glendinning.
      
        13) Fix gianfar restore from hibernation, from Wang Dongsheng.
      
        14) The netlink message attribute sizes are wrong in the ipv6 GRE
            driver, it was using the size of ipv4 addresses instead of ipv6
            ones :-) Fix from Nicolas Dichtel."
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        gre6: fix rtnl dump messages
        gianfar: ethernet vanishes after restoring from hibernation
        usbnet: ratelimit kevent may have been dropped warnings
        ipv6: send unsolicited neighbour advertisements to all-nodes
        net: usb: cdc_eem: Fix rx skb allocation for 802.1Q VLANs
        usb: gadget: g_ether: fix frame size check for 802.1Q
        cxgb4: Fix initialization of SGE_CONTROL register
        isdn: Make CONFIG_ISDN depend on CONFIG_NETDEVICES
        cxgb4: Initialize data structures before using.
        af-packet: fix oops when socket is not present
        pkt_sched: enable QFQ to support TSO/GSO
        net: inet_diag -- Return error code if protocol handler is missed
        net: bnx2x: Fix typo in bnx2x driver
        smsc95xx: fix tx checksum offload for big endian
        rtnetlink: Use nlmsg type RTM_NEWNEIGH from dflt fdb dump
        ptp: update adjfreq callback description
        r8169: allow multicast packets on sub-8168f chipset.
        r8169: Fix WoL on RTL8168d/8111d.
        drivers/net: use tasklet_kill in device remove/close process
        tipc: do not use tasklet_disable before tasklet_kill
      b251f0f3
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 2b1768f3
      Linus Torvalds authored
      Pull sparc fixes from David Miller:
       "Several build/bug fixes for sparc, including:
      
        1) Configuring a mix of static vs.  modular sparc64 crypto modules
           didn't work, remove an ill-conceived attempt to only have to build
           the device match table for these drivers once to fix the problem.
      
           Reported by Meelis Roos.
      
        2) Make the montgomery multiple/square and mpmul instructions actually
           usable in 32-bit tasks.  Essentially this involves providing 32-bit
           userspace with a way to use a 64-bit stack when it needs to.
      
        3) Our sparc64 atomic backoffs don't yield cpu strands properly on
           Niagara chips.  Use pause instruction when available to achieve
           this, otherwise use a benign instruction we know blocks the strand
           for some time.
      
        4) Wire up kcmp
      
        5) Fix the build of various drivers by removing the unnecessary
           blocking of OF_GPIO when SPARC.
      
        6) Fix unintended regression wherein of_address_to_resource stopped
           being provided.  Fix from Andreas Larsson.
      
        7) Fix NULL dereference in leon_handle_ext_irq(), also from Andreas
           Larsson."
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc64: Fix build with mix of modular vs. non-modular crypto drivers.
        sparc: Support atomic64_dec_if_positive properly.
        of/address: sparc: Declare of_address_to_resource() as an extern function for sparc again
        sparc32, leon: Check for existent irq_map entry in leon_handle_ext_irq
        sparc: Add sparc support for platform_get_irq()
        sparc: Allow OF_GPIO on sparc.
        qlogicpti: Fix build warning.
        sparc: Wire up sys_kcmp.
        sparc64: Improvde documentation and readability of atomic backoff code.
        sparc64: Use pause instruction when available.
        sparc64: Fix cpu strand yielding.
        sparc64: Make montmul/montsqr/mpmul usable in 32-bit threads.
      2b1768f3
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6 · affd9a8d
      Linus Torvalds authored
      Pull cifs fixes from Jeff Layton.
      
      * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: Do not lookup hashed negative dentry in cifs_atomic_open
        cifs: fix potential buffer overrun in cifs.idmap handling code
      affd9a8d
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64 · 487bda54
      Linus Torvalds authored
      Pull arm64 fixes from Catalin Marinas:
       - correct argument type (pgprot_t) when calling __ioremap()
       - PCI_IOBASE virtual address change
       - use architected event for CPU cycle counter
       - fix ELF core dumping
       - select CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION
       - missing completion for secondary CPU boot
       - booting on systems with all memory beyond 4GB
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
        arm64: mm: fix booting on systems with no memory below 4GB
        arm64: smp: add missing completion for secondary boot
        arm64: compat: select CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION
        arm64: elf: fix core dumping definitions for GP and FP registers
        arm64: perf: use architected event for CPU cycle counter
        arm64: Move PCI_IOBASE closer to MODULES_VADDR
        arm64: Use pgprot_t as the last argument when invoking __ioremap()
      487bda54
    • Linus Torvalds's avatar
      Merge tag 'stable/for-linus-3.7-rc5-tag' of... · 0020dd0b
      Linus Torvalds authored
      Merge tag 'stable/for-linus-3.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
      
      Pull Xen fixes from Konrad Rzeszutek Wilk:
       "There are three ARM compile fixes (we forgot to export certain
        functions and if the drivers are built as an module - we go belly-up).
      
        There is also an mismatch of irq_enter() / exit_idle() calls sequence
        which were fixed some time ago in other piece of codes, but failed to
        appear in the Xen code.
      
        Lastly a fix for to help in the field with troubleshooting in case we
        cannot get the appropriate parameter and also fallback code when
        working with very old hypervisors."
      
      Bug-fixes:
       - Fix compile issues on ARM.
       - Fix hypercall fallback code for old hypervisors.
       - Print out which HVM parameter failed if it fails.
       - Fix idle notifier call after irq_enter.
      
      * tag 'stable/for-linus-3.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
        xen/arm: Fix compile errors when drivers are compiled as modules (export more).
        xen/arm: Fix compile errors when drivers are compiled as modules.
        xen/generic: Disable fallback build on ARM.
        xen/events: fix RCU warning, or Call idle notifier after irq_enter()
        xen/hvm: If we fail to fetch an HVM parameter print out which flag it is.
        xen/hypercall: fix hypercall fallback code for very old hypervisors
      0020dd0b
    • David S. Miller's avatar
      sparc64: Fix build with mix of modular vs. non-modular crypto drivers. · 226f7cea
      David S. Miller authored
      We tried linking in a single built object to hold the device table,
      but only works if all of the sparc64 crypto modules get built the same
      way (modular vs. non-modular).
      
      Just include the device ID stub into each driver source file so that
      the table gets compiled into the correct result in all cases.
      Reported-by: default avatarMeelis Roos <mroos@linux.ee>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      226f7cea
    • David S. Miller's avatar
      sparc: Support atomic64_dec_if_positive properly. · 193d2aad
      David S. Miller authored
      Sparc32 already supported it, as a consequence of using the
      generic atomic64 implementation.  And the sparc64 implementation
      is rather trivial.
      
      This allows us to set ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE for all
      of sparc, and avoid the annoying warning from lib/atomic64_test.c
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      193d2aad
    • Andreas Larsson's avatar
      of/address: sparc: Declare of_address_to_resource() as an extern function for sparc again · 0bce04be
      Andreas Larsson authored
      This bug-fix makes sure that of_address_to_resource is defined extern for sparc
      so that the sparc-specific implementation of of_address_to_resource() is once
      again used when including include/linux/of_address.h in a sparc context. A
      number of drivers in mainline relies on this function working for sparc.
      
      The bug was introduced in a850a755, "of/address:
      add empty static inlines for !CONFIG_OF". Contrary to that commit title, the
      static inlines are added for !CONFIG_OF_ADDRESS, and CONFIG_OF_ADDRESS is never
      defined for sparc. This is good behavior for the other functions in
      include/linux/of_address.h, as the extern functions defined in
      drivers/of/address.c only gets linked when OF_ADDRESS is configured. However,
      for of_address_to_resource there exists a sparc-specific implementation in
      arch/sparc/arch/sparc/kernel/of_device_common.c
      
      Solution suggested by: Sam Ravnborg <sam@ravnborg.org>
      Signed-off-by: default avatarAndreas Larsson <andreas@gaisler.com>
      Acked-by: default avatarRob Herring <rob.herring@calxeda.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0bce04be
    • Andreas Larsson's avatar
      sparc32, leon: Check for existent irq_map entry in leon_handle_ext_irq · 20424d85
      Andreas Larsson authored
      If an irq is being unlinked concurrently with leon_handle_ext_irq,
      irq_map[eirq] might be null in leon_handle_ext_irq. Make sure that
      this is not dereferenced.
      Signed-off-by: default avatarAndreas Larsson <andreas@gaisler.com>
      Acked-by: default avatarSam Ravnborg <sam@ravnborg.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      20424d85
    • Andreas Larsson's avatar
      sparc: Add sparc support for platform_get_irq() · 5cf8f7db
      Andreas Larsson authored
      This adds sparc support for platform_get_irq that in the normal case use
      platform_get_resource() to get an irq. This standard approach fails for sparc as
      there are no resources of type IORESOURCE_IRQ for irqs for sparc.
      
      Cross platform drivers can then use this standard platform function and work on
      sparc instead of having to have a special case for sparc.
      Signed-off-by: default avatarAndreas Larsson <andreas@gaisler.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5cf8f7db
  4. 09 Nov, 2012 22 commits