Commits · ca3d30cc02f780f68771087040ce935add6ba2b7 · Kirill Smelkov / linux

15 Dec, 2011 4 commits

x86_64, asm: Optimise fls(), ffs() and fls64() · ca3d30cc

David Howells authored Dec 13, 2011

fls(N), ffs(N) and fls64(N) can be optimised on x86_64.  Currently they use a
CMOV instruction after the BSR/BSF to set the destination register to -1 if the
value to be scanned was 0 (in which case BSR/BSF set the Z flag).

Instead, according to the AMD64 specification, we can make use of the fact that
BSR/BSF doesn't modify its output register if its input is 0.  By preloading
the output with -1 and incrementing the result, we achieve the desired result
without the need for a conditional check.

The Intel x86_64 specification, however, says that the result of BSR/BSF in
such a case is undefined.  That said, when queried, one of the Intel CPU
architects said that the behaviour on all Intel CPUs is that:

 (1) with BSRQ/BSFQ, the 64-bit destination register is written with its
     original value if the source is 0, thus, in essence, giving the effect we
     want.  And,

 (2) with BSRL/BSFL, the lower half of the 64-bit destination register is
     written with its original value if the source is 0, and the upper half is
     cleared, thus giving us the effect we want (we return a 4-byte int).

Further, it was indicated that they (Intel) are unlikely to get away with
changing the behaviour.

It might be possible to optimise the 32-bit versions of these functions, but
there's a lot more variation, and so the effective non-destructive property of
BSRL/BSRF cannot be relied on.

[ hpa: specifically, some 486 chips are known to NOT have this property. ]

I have benchmarked these functions on my Core2 Duo test machine using the
following program:

	#include <stdlib.h>
	#include <stdio.h>

	#ifndef __x86_64__
	#error
	#endif

	#define PAGE_SHIFT 12

	typedef unsigned long long __u64, u64;
	typedef unsigned int __u32, u32;
	#define noinline	__attribute__((noinline))

	static __always_inline int fls64(__u64 x)
	{
		long bitpos = -1;

		asm("bsrq %1,%0"
		    : "+r" (bitpos)
		    : "rm" (x));
		return bitpos + 1;
	}

	static inline unsigned long __fls(unsigned long word)
	{
		asm("bsr %1,%0"
		    : "=r" (word)
		    : "rm" (word));
		return word;
	}
	static __always_inline int old_fls64(__u64 x)
	{
		if (x == 0)
			return 0;
		return __fls(x) + 1;
	}

	static noinline // __attribute__((const))
	int old_get_order(unsigned long size)
	{
		int order;

		size = (size - 1) >> (PAGE_SHIFT - 1);
		order = -1;
		do {
			size >>= 1;
			order++;
		} while (size);
		return order;
	}

	static inline __attribute__((const))
	int get_order_old_fls64(unsigned long size)
	{
		int order;
		size--;
		size >>= PAGE_SHIFT;
		order = old_fls64(size);
		return order;
	}

	static inline __attribute__((const))
	int get_order(unsigned long size)
	{
		int order;
		size--;
		size >>= PAGE_SHIFT;
		order = fls64(size);
		return order;
	}

	unsigned long prevent_optimise_out;

	static noinline unsigned long test_old_get_order(void)
	{
		unsigned long n, total = 0;
		long rep, loop;

		for (rep = 1000000; rep > 0; rep--) {
			for (loop = 0; loop <= 16384; loop += 4) {
				n = 1UL << loop;
				total += old_get_order(n);
			}
		}
		return total;
	}

	static noinline unsigned long test_get_order_old_fls64(void)
	{
		unsigned long n, total = 0;
		long rep, loop;

		for (rep = 1000000; rep > 0; rep--) {
			for (loop = 0; loop <= 16384; loop += 4) {
				n = 1UL << loop;
				total += get_order_old_fls64(n);
			}
		}
		return total;
	}

	static noinline unsigned long test_get_order(void)
	{
		unsigned long n, total = 0;
		long rep, loop;

		for (rep = 1000000; rep > 0; rep--) {
			for (loop = 0; loop <= 16384; loop += 4) {
				n = 1UL << loop;
				total += get_order(n);
			}
		}
		return total;
	}

	int main(int argc, char **argv)
	{
		unsigned long total;

		switch (argc) {
		case 1:  total = test_old_get_order();		break;
		case 2:  total = test_get_order_old_fls64();	break;
		default: total = test_get_order();		break;
		}
		prevent_optimise_out = total;
		return 0;
	}

This allows me to test the use of the old fls64() implementation and the new
fls64() implementation and also to contrast these to the out-of-line loop-based
implementation of get_order().  The results were:

	warthog>time ./get_order
	real    1m37.191s
	user    1m36.313s
	sys     0m0.861s
	warthog>time ./get_order x
	real    0m16.892s
	user    0m16.586s
	sys     0m0.287s
	warthog>time ./get_order x x
	real    0m7.731s
	user    0m7.727s
	sys     0m0.002s

Using the current upstream fls64() as a basis for an inlined get_order() [the
second result above] is much faster than using the current out-of-line
loop-based get_order() [the first result above].

Using my optimised inline fls64()-based get_order() [the third result above]
is even faster still.

[ hpa: changed the selection of 32 vs 64 bits to use CONFIG_X86_64
  instead of comparing BITS_PER_LONG, updated comments, rebased manually
  on top of 83d99df7 x86, bitops: Move fls64.h inside __KERNEL__ ]
Signed-off-by: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/20111213145654.14362.39868.stgit@warthog.procyon.org.uk
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

ca3d30cc

x86, bitops: Move fls64.h inside __KERNEL__ · 83d99df7

H. Peter Anvin authored Dec 15, 2011

We would include <asm-generic/bitops/fls64.h> even without __KERNEL__,
but that doesn't make sense, as:

1. That file provides fls64(), but the corresponding function fls() is
   not exported to user space.
2. The implementation of fls64.h uses kernel-only symbols.
3. fls64.h is not exported to user space.

This appears to have been a bug introduced in checkin:

d57594c2 bitops: use __fls for fls64 on 64-bit archs

Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Alexander van Heukelum <heukelum@mailshack.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/4EEA77E1.6050009@zytor.com

83d99df7

x86: Fix and improve percpu_cmpxchg{8,16}b_double() · cebef5be

Jan Beulich authored Dec 14, 2011

They had several problems/shortcomings:

Only the first memory operand was mentioned in the 2x32bit asm()
operands, and 2x64-bit version had a memory clobber. The first
allowed the compiler to not recognize the need to re-load the
data in case it had it cached in some register, and the second
was overly destructive.

The memory operand in the 2x32-bit asm() was declared to only be
an output.

The types of the local copies of the old and new values were
incorrect (as in other per-CPU ops, the types of the per-CPU
variables accessed should be used here, to make sure the
respective types are compatible).

The __dummy variable was pointless (and needlessly initialized
in the 2x32-bit case), given that local copies of the inputs
already exist.

The 2x64-bit variant forced the address of the first object into
%rsi, even though this is needed only for the call to the
emulation function. The real cmpxchg16b can operate on an
memory.

At once also change the return value type to what it really is -
'bool'.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4EE86D6502000078000679FE@nat28.tlf.novell.comSigned-off-by: Ingo Molnar <mingo@elte.hu>

cebef5be

x86: Report cpb and eff_freq_ro flags correctly · 969df4b8

Joerg Roedel authored Dec 14, 2011

Add the flags to get rid of the [9] and [10] feature names
in cpuinfo's 'power management' fields and replace them with
meaningful names.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Link: http://lkml.kernel.org/r/1323875574-17881-1-git-send-email-joerg.roedel@amd.comSigned-off-by: Ingo Molnar <mingo@elte.hu>

969df4b8

12 Dec, 2011 1 commit

x86/i386: Use less assembly in strlen(), speed things up a bit · 890890cb

Alexey Dobriyan authored Dec 11, 2011

Current i386 strlen() hardcodes NOT/DEC sequence. DEC is
mentioned to be suboptimal on Core2. So, put only REPNE SCASB
sequence in assembly, compiler can do the rest.

The difference in generated code is like below (MCORE2=y):

	<strlen>:
		push   %edi
		mov    $0xffffffff,%ecx
		mov    %eax,%edi
		xor    %eax,%eax
		repnz scas %es:(%edi),%al
		not    %ecx

	-	dec    %ecx
	-	mov    %ecx,%eax
	+	lea    -0x1(%ecx),%eax

		pop    %edi
		ret
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Beulich <JBeulich@suse.com>
Link: http://lkml.kernel.org/r/20111211181319.GA17097@p183.telecom.bySigned-off-by: Ingo Molnar <mingo@elte.hu>

890890cb

07 Dec, 2011 1 commit

x86: Use the same node_distance for 32 and 64-bit · 79f1ddd0

H Hartley Sweeten authored Dec 06, 2011

The node_distance function is not x86 64-bit specific.  Having
the #ifdef around the extern function declaration and the
 #define causes the default node_distance macro to be used in
asm-generic/topology.h. This also causes a sparse warning in
arch/x86/mm/numa.c when CONFIG_X86_64 is not set:

warning: symbol '__node_distance' was not declared. Should it be
static?

Remove the #ifdef to fix both issues.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1112061220310.28251@chino.kir.corp.google.comSigned-off-by: Ingo Molnar <mingo@elte.hu>

79f1ddd0

06 Dec, 2011 3 commits

x86: Fix rflags in FAKE_STACK_FRAME · 1cf8343f

Seiichi Ikarashi authored Dec 06, 2011

The x86_64 kernel pushes the fake kernel stack in
arch/x86/kernel/entry_64.S:FAKE_STACK_FRAME, and
rflags register in it does not conform to the specification.

Although Intel's manual[1] says bit 1 of it shall be set to 1,
this bit is cleared to 0 on pushing the fake stack.

[1] Intel(R) 64 and IA-32 Architectures Software Developer's Manual
    Vol.1 3-21 Figure 3-8. EFLAGS Register

If it is not on purpose, it is better to be fixed, because
it can lead some tools misunderstanding the stack frame. For example,
"crash" utility[2] actually detects it and warns you like
below:

       RIP: ffffffff8005dfa2  RSP: ffff8104ce0c7f58  RFLAGS: 00000200
       [...]

       bt: WARNING: possibly bogus exception frame
Signed-off-by: Seiichi Ikarashi <s.ikarashi@jp.fujitsu.com>
Tested-by: Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

1cf8343f

x86: Clean up and extend do_int3() · cc3a1bf5

Srikar Dronamraju authored Oct 25, 2011

Since there is a possibility of !KPROBES int3 listeners
(such as kgdb) and since DIE_TRAP is currently not being
used by anybody, notify all listeners with DIE_INT3.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20111025142159.GB21225@linux.vnet.ibm.comSigned-off-by: Ingo Molnar <mingo@elte.hu>

cc3a1bf5

x86: Call do_notify_resume() with interrupts enabled · 3596ff4e

Srikar Dronamraju authored Oct 25, 2011

do_notify_resume() gets called with interrupts disabled on x86_32. This
is different from the x86_64 behavior, where interrupts are enabled at
the time.

Queries on lkml on this issue hasn't yielded any clear answer. Lets make
x86_32 behave the same as x86_64, unless there is a real reason to
maintain status quo.

Please refer https://lkml.org/lkml/2011/9/27/130 for more
details.

A similar change was suggested in ARM:

	https://lkml.org/lkml/2011/8/25/231

My 32-bit machine works fine (tm) with this patch.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20111025141812.GA21225@linux.vnet.ibm.comSigned-off-by: Ingo Molnar <mingo@elte.hu>

3596ff4e

05 Dec, 2011 9 commits

x86/div64: Add a micro-optimization shortcut if base is power of two · 668b4484

Sebastian Andrzej Siewior authored Nov 30, 2011

In the target code I have a do_div(x, PAGE_SIZE). The x86-64
version of it was doing a shift and a mask which is clever. The
32bit version of it had a div operation in it which made me
think. After digging I noticed that x86 has an optimized version
of it. This patch adds this shift and mask optimization if base
is constant so we don't have any runtime "checking" overhead
since most users use a power of ten.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1322649814-544-1-git-send-email-bigeasy@linutronix.deSigned-off-by: Ingo Molnar <mingo@elte.hu>

668b4484

x86-64: Cleanup some assembly entry points · f6b2bc84

Jan Beulich authored Nov 29, 2011

system_call_after_swapgs doesn't really benefit from forcing
alignment from it - quite the opposite, native code needlessly
so far got a big NOP instruction inserted in front of it. Xen
being the only user of the separate entry point can well live
with the branch going to three bytes into a cache line.

The compatibility mode ptregs entry points for one can make use
of the GLOBAL() macro, and should be suitably aligned. Their
shared continuation point (ia32_ptregs_common) otoh doesn't need
to be global at all, but should continue to be properly aligned.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/4ED4CEEA020000780006407D@nat28.tlf.novell.comSigned-off-by: Ingo Molnar <mingo@elte.hu>

f6b2bc84

x86-64: Slightly shorten line system call entry and exit paths · 46db09d3

Jan Beulich authored Nov 29, 2011

GET_THREAD_INFO() involves a memory read immediately followed by
an "sub" on the value read, in turn (in several cases)
immediately followed by a use of the calculated value as the
base address of a memory access. This combination of
instructions has a non-negligible potential for stalls.

In the system call entry point code, however, the (fixed) offset
of the stack pointer from the end of the stack is generally
known, and hence we can instead avoid the memory load and
subtract, and instead do the memory reference using %rsp as the
base register. To do so in a legible fashion, introduce a
THREAD_INFO() macro which, provided a register (generally %rsp)
and the known offset from the end of the stack, produces a
suitable memory access operand.

The patch attempts to only touch the fast paths (no auditing and
alike), but manages to do so only in the 64-bit entry point
case; the compatibility mode entry points have so many
interdependencies between their various branch targets that it
was necessary to also adjust the slow paths to eliminate the
risk of having missed some register dependency during code
analysis.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/4ED4CD690200007800064075@nat28.tlf.novell.comSigned-off-by: Ingo Molnar <mingo@elte.hu>

46db09d3

x86-64: Reduce amount of redundant code generated for invalidate_interruptNN · 39e95433

Jan Beulich authored Nov 29, 2011

Previously these up to 32 entry points, consisting of all the
same code except for their very first instruction, consumed 0x70
bytes per instance. Just like for device interrupt entry points,
fold them together so that they all use a single instance of the
code after having pushed their vector indicator (resulting in
0x10 bytes per instance, to retain 16-byte alignment of the
individual entry points).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/4ED4CA230200007800064065@nat28.tlf.novell.comSigned-off-by: Ingo Molnar <mingo@elte.hu>

39e95433

x86-64: Slightly shorten int_ret_from_sys_call · 70ea6855

Jan Beulich authored Nov 29, 2011

Testing for a return to ring 0 was necessary here solely because
of the branch out of ret_from_fork. That branch, however, can be
directed to retint_restore_args, and thus the test-and-branch
can be eliminated here.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/4ED4C7EE0200007800064028@nat28.tlf.novell.comSigned-off-by: Ingo Molnar <mingo@elte.hu>

70ea6855

x86, efi: Convert efi_phys_get_time() args to physical addresses · e9a9eca5

Maurice Ma authored Oct 11, 2011

Because callers of efi_phys_get_time() pass virtual stack
addresses as arguments, we need to find their corresponding
physical addresses and when calling GetTime() in physical mode.

Without this patch the following line is printed on boot,

	"Oops: efitime: can't read time!"
Signed-off-by: Maurice Ma <maurice.ma@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Matthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1318330333-4617-1-git-send-email-matt@console-pimps.orgSigned-off-by: Ingo Molnar <mingo@elte.hu>

e9a9eca5

x86: Default to vsyscall=emulate · 2e57ae05

Andy Lutomirski authored Nov 07, 2011

This essentially reverts:

  2b666859: x86: Default to vsyscall=native for now

The ABI breakage should now be fixed by:

 commit 48c4206f5b02f28c4c78a1f5b491d3772fb64fb9
 Author: Andy Lutomirski <luto@mit.edu>
 Date:   Thu Oct 20 08:48:19 2011 -0700

    x86-64: Set siginfo and context on vsyscall emulation faults
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Adrian Bunk <bunk@stusta.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/93154af3b2b6d208906ae02d80d92cf60c6fa94f.1320712291.git.luto@amacapital.netSigned-off-by: Ingo Molnar <mingo@elte.hu>

2e57ae05

x86-64: Set siginfo and context on vsyscall emulation faults · 4fc34901

Andy Lutomirski authored Nov 07, 2011

To make this work, we teach the page fault handler how to send
signals on failed uaccess.  This only works for user addresses
(kernel addresses will never hit the page fault handler in the
first place), so we need to generate signals for those
separately.

This gets the tricky case right: if the user buffer spans
multiple pages and only the second page is invalid, we set
cr2 and si_addr correctly.  UML relies on this behavior to
"fault in" pages as needed.

We steal a bit from thread_info.uaccess_err to enable this.
Before this change, uaccess_err was a 32-bit boolean value.

This fixes issues with UML when vsyscall=emulate.
Reported-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/4c8f91de7ec5cd2ef0f59521a04e1015f11e42b4.1320712291.git.luto@amacapital.netSigned-off-by: Ingo Molnar <mingo@elte.hu>

4fc34901

Merge branch 'upstream/ticketlock-cleanup' of... · 01acc269

Ingo Molnar authored Dec 05, 2011

Merge branch 'upstream/ticketlock-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen into x86/asm

01acc269

04 Dec, 2011 1 commit

x86: Fix boot failures on older AMD CPU's · 8e8da023

Linus Torvalds authored Dec 04, 2011

People with old AMD chips are getting hung boots, because commit
bcb80e53 ("x86, microcode, AMD: Add microcode revision to
/proc/cpuinfo") moved the microcode detection too early into
"early_init_amd()".

At that point we are *so* early in the booth that the exception tables
haven't even been set up yet, so the whole

	rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);

doesn't actually work: if the rdmsr does a GP fault (due to non-existant
MSR register on older CPU's), we can't fix it up yet, and the boot fails.

Fix it by simply moving the code to a slightly later point in the boot
(init_amd() instead of early_init_amd()), since the kernel itself
doesn't even really care about the microcode patchlevel at this point
(or really ever: it's made available to user space in /proc/cpuinfo, and
updated if you do a microcode load).
Reported-tested-and-bisected-by: Larry Finger <Larry.Finger@lwfinger.net>
Tested-by: Bob Tracy <rct@gherkin.frus.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8e8da023

03 Dec, 2011 1 commit

xen/pm_idle: Make pm_idle be default_idle under Xen. · e5fd47bf

Konrad Rzeszutek Wilk authored Nov 21, 2011

The idea behind commit d91ee586 ("cpuidle: replace xen access to x86
pm_idle and default_idle") was to have one call - disable_cpuidle()
which would make pm_idle not be molested by other code.  It disallows
cpuidle_idle_call to be set to pm_idle (which is excellent).

But in the select_idle_routine() and idle_setup(), the pm_idle can still
be set to either: amd_e400_idle, mwait_idle or default_idle.  This
depends on some CPU flags (MWAIT) and in AMD case on the type of CPU.

In case of mwait_idle we can hit some instances where the hypervisor
(Amazon EC2 specifically) sets the MWAIT and we get:

  Brought up 2 CPUs
  invalid opcode: 0000 [#1] SMP

  Pid: 0, comm: swapper Not tainted 3.1.0-0.rc6.git0.3.fc16.x86_64 #1
  RIP: e030:[<ffffffff81015d1d>]  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
  ...
  Call Trace:
   [<ffffffff8100e2ed>] cpu_idle+0xae/0xe8
   [<ffffffff8149ee78>] cpu_bringup_and_idle+0xe/0x10
  RIP  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
   RSP <ffff8801d28ddf10>

In the case of amd_e400_idle we don't get so spectacular crashes, but we
do end up making an MSR which is trapped in the hypervisor, and then
follow it up with a yield hypercall.  Meaning we end up going to
hypervisor twice instead of just once.

The previous behavior before v3.0 was that pm_idle was set to
default_idle regardless of select_idle_routine/idle_setup.

We want to do that, but only for one specific case: Xen.  This patch
does that.

Fixes RH BZ #739499 and Ubuntu #881076
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e5fd47bf

02 Dec, 2011 14 commits

Merge branch 'usb-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · af968e29

Linus Torvalds authored Dec 02, 2011

* 'usb-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (21 commits)
  usb: ftdi_sio: add PID for Propox ISPcable III
  Revert "xHCI: reset-on-resume quirk for NEC uPD720200"
  xHCI: fix bug in xhci_clear_command_ring()
  usb: gadget: fsl_udc: fix dequeuing a request in progress
  usb: fsl_mxc_udc.c: Remove compile-time dependency of MX35 SoC type
  usb: fsl_mxc_udc.c: Fix build issue by including missing header file
  USB: fsl_udc_core: use usb_endpoint_xfer_isoc to judge ISO XFER
  usb: udc: Fix gadget driver's speed check in various UDC drivers
  usb: gadget: fix g_serial regression
  usb: renesas_usbhs: fixup driver speed
  usb: renesas_usbhs: fixup gadget.dev.driver when udc_stop.
  usb: renesas_usbhs: fixup signal the driver that cable was disconnected
  usb: renesas_usbhs: fixup device_register timing
  usb: musb: PM: fix context save/restore in suspend/resume path
  USB: linux-cdc-acm.inf: add support for the acm_ms gadget
  EHCI : Fix a regression in the ISO scheduler
  xHCI: reset-on-resume quirk for NEC uPD720200
  USB: whci-hcd: fix endian conversion in qset_clear()
  USB: usb-storage: unusual_devs entry for Kingston DT 101 G2
  usb: option: add SIMCom SIM5218
  ...

af968e29

Merge branch 'staging-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · f9143eae

Linus Torvalds authored Dec 02, 2011

* 'staging-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  Staging: comedi: fix integer overflow in do_insnlist_ioctl()
  Revert "Staging: comedi: integer overflow in do_insnlist_ioctl()"
  Staging: comedi: integer overflow in do_insnlist_ioctl()
  Staging: comedi: fix signal handling in read and write
  Staging: comedi: fix mmap_count
  staging: comedi: fix oops for USB DAQ devices.
  staging: comedi: usbduxsigma: Fixed wrong range for the analogue channel.
  staging:rts_pstor:Complete scanning_done variable
  staging: usbip: bugfix for deadlock

f9143eae

Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs · ffb8fb54

Linus Torvalds authored Dec 02, 2011

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: fix attr2 vs large data fork assert
  xfs: force buffer writeback before blocking on the ilock in inode reclaim
  xfs: validate acl count

ffb8fb54

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · 7ed89aed
Linus Torvalds authored Dec 02, 2011
```
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: Correct General touch PID
```
7ed89aed

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · c2b5adb4

Linus Torvalds authored Dec 02, 2011

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  vmwgfx: integer overflow in vmw_kms_update_layout_ioctl()
  drm/radeon/kms: fix 2D tiling CS support on EG/CM
  drm/radeon/kms: fix scanout of 2D tiled buffers on EG/CM
  drm: Fix lack of CRTC disable for drm_crtc_helper_set_config(.fb=NULL)
  drm/radeon/kms: add some new pci ids
  drm/radeon/kms: Skip ACPI call to ATIF when possible
  drm/radeon/kms: Hide debugging message
  drm/radeon/kms: add some loop timeouts in pageflip code
  drm/nv50/disp: silence compiler warning
  drm/nouveau: fix oopses caused by clear being called on unpopulated ttms
  drm/nouveau: Keep RAMIN heap within the channel.
  drm/nvd0/disp: fix sor dpms typo, preventing dpms on in some situations
  drm/nvc0/gr: fix TP init for transform feedback offset queries
  drm/nouveau: add dumb ioctl support

c2b5adb4

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 0efebaa7

Linus Torvalds authored Dec 02, 2011

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Fix S3/S4 problem on machines with VREF-pin mute-LED
  ALSA: hda_intel - revert a quirk that affect VIA chipsets
  ALSA: hda - Avoid touching mute-VREF pin for IDT codecs
  firmware: Sigma: Fix endianess issues
  firmware: Sigma: Skip header during CRC generation
  firmware: Sigma: Prevent out of bounds memory access
  ALSA: usb-audio - Support for Roland GAIA SH-01 Synthesizer
  ASoC: Supply dcs_codes for newer WM1811 revisions
  ASoC: Error out if we can't generate a LRCLK at all for WM8994
  ASoC: Correct name of Speyside Main Speaker widget
  ASoC: skip resume of soc-audio devices without codecs
  ASoC: cs42l51: Fix off-by-one for reg_cache_size
  ASoC: drop support for PlayPaq with WM8510
  ASoC: mpc8610: tell the CS4270 codec that it's the master
  ASoC: cs4720: use snd_soc_cache_sync()
  ASoC: SAMSUNG: Fix build error
  ASoC: max9877: Update register if either val or val2 is changed
  ASoC: Fix wrong define for AD1836_ADC_WORD_OFFSET

0efebaa7

vmwgfx: integer overflow in vmw_kms_update_layout_ioctl() · bab9efc2

Xi Wang authored Nov 28, 2011

There are two issues in vmw_kms_update_layout_ioctl().  First, the
for loop forgets to index rects and only checks the first element.
Second, there is a potential integer overflow if userspace passes
in a large arg->num_outputs.  The call to kzalloc() would allocate
a small buffer, leading to out-of-bounds read.
Reported-by: Haogang Chen <haogangchen@gmail.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

bab9efc2

drm/radeon/kms: fix 2D tiling CS support on EG/CM · f3a71df0

Alex Deucher authored Nov 28, 2011

Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=43191Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

f3a71df0

drm/radeon/kms: fix scanout of 2D tiled buffers on EG/CM · 392e3722

Alex Deucher authored Nov 28, 2011

Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=43191Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

392e3722

drm: Fix lack of CRTC disable for drm_crtc_helper_set_config(.fb=NULL) · 6eebd6bb

Chris Wilson authored Nov 28, 2011

Disabling the CRTC by setting its framebuffer to NULL, as used by
drm_framebuffer_cleanup(), was failing to pass the current framebuffer
to the crtc_func->disable callback. This is because of the dance within
drm_crtc_helper_set_config to pass the new_fb (NULL in this case) to the
drm_crtc_helper_set_mode with the currently attached fb as a parameter.
drm_crtc_helper_set_mode treats this as a no-op and the encoder is still
enabled. And so the current fb is forgotten before the call to
drm_helper_disable_unused_functions.

This patch treats disabling the CRTC as a simple special case rather
than adding further complexity into the configuration logic.

This fixes a pin-leak of the fb bo on Xserver close.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>

6eebd6bb

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 5983fe2b

Linus Torvalds authored Dec 01, 2011

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits)
  netfilter: Remove ADVANCED dependency from NF_CONNTRACK_NETBIOS_NS
  ipv4: flush route cache after change accept_local
  sch_red: fix red_change
  Revert "udp: remove redundant variable"
  bridge: master device stuck in no-carrier state forever when in user-stp mode
  ipv4: Perform peer validation on cached route lookup.
  net/core: fix rollback handler in register_netdevice_notifier
  sch_red: fix red_calc_qavg_from_idle_time
  bonding: only use primary address for ARP
  ipv4: fix lockdep splat in rt_cache_seq_show
  sch_teql: fix lockdep splat
  net: fec: Select the FEC driver by default for i.MX SoCs
  isdn: avoid copying too long drvid
  isdn: make sure strings are null terminated
  netlabel: Fix build problems when IPv6 is not enabled
  sctp: better integer overflow check in sctp_auth_create_key()
  sctp: integer overflow in sctp_auth_create_key()
  ipv6: Set mcast_hops to IPV6_DEFAULT_MCASTHOPS when -1 was given.
  net: Fix corruption in /proc/*/net/dev_mcast
  mac80211: fix race between the AGG SM and the Tx data path
  ...

5983fe2b

netfilter: Remove ADVANCED dependency from NF_CONNTRACK_NETBIOS_NS · 3ced1be5
David S. Miller authored Dec 01, 2011
```
firewalld in Fedora 16 needs this.
Signed-off-by: David S. Miller <davem@davemloft.net>
```
3ced1be5

ipv4: flush route cache after change accept_local · d01ff0a0

Peter Pan(潘卫平) authored Dec 01, 2011

After reset ipv4_devconf->data[IPV4_DEVCONF_ACCEPT_LOCAL] to 0,
we should flush route cache, or it will continue receive packets with local
source address, which should be dropped.
Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d01ff0a0

sch_red: fix red_change · 1ee5fa1e

Eric Dumazet authored Dec 01, 2011

Le mercredi 30 novembre 2011 à 14:36 -0800, Stephen Hemminger a écrit :

> (Almost) nobody uses RED because they can't figure it out.
> According to Wikipedia, VJ says that:
>  "there are not one, but two bugs in classic RED."

RED is useful for high throughput routers, I doubt many linux machines
act as such devices.

I was considering adding Adaptative RED (Sally Floyd, Ramakrishna
Gummadi, Scott Shender), August 2001

In this version, maxp is dynamic (from 1% to 50%), and user only have to
setup min_th (target average queue size)
(max_th and wq (burst in linux RED) are automatically setup)

By the way it seems we have a small bug in red_change()

if (skb_queue_empty(&sch->q))
	red_end_of_idle_period(&q->parms);

First, if queue is empty, we should call
red_start_of_idle_period(&q->parms);

Second, since we dont use anymore sch->q, but q->qdisc, the test is
meaningless.

Oh well...

[PATCH] sch_red: fix red_change()

Now RED is classful, we must check q->qdisc->q.qlen, and if queue is empty,
we start an idle period, not end it.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1ee5fa1e

01 Dec, 2011 6 commits

Linux 3.2-rc4 · 5611cc45
Linus Torvalds authored Dec 01, 2011

5611cc45

Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 · 0a4ebed7

Linus Torvalds authored Dec 01, 2011

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (31 commits)
  ocfs2: avoid unaligned access to dqc_bitmap
  ocfs2: Use filemap_write_and_wait() instead of write_inode_now()
  ocfs2: honor O_(D)SYNC flag in fallocate
  ocfs2: Add a missing journal credit in ocfs2_link_credits() -v2
  ocfs2: send correct UUID to cleancache initialization
  ocfs2: Commit transactions in error cases -v2
  ocfs2: make direntry invalid when deleting it
  fs/ocfs2/dlm/dlmlock.c: free kmem_cache_zalloc'd data using kmem_cache_free
  ocfs2: Avoid livelock in ocfs2_readpage()
  ocfs2: serialize unaligned aio
  ocfs2: Implement llseek()
  ocfs2: Fix ocfs2_page_mkwrite()
  ocfs2: Add comment about orphan scanning
  ocfs2: Clean up messages in the fs
  ocfs2/cluster: Cluster up now includes network connections too
  ocfs2/cluster: Add new function o2net_fill_node_map()
  ocfs2/cluster: Fix output in file elapsed_time_in_ms
  ocfs2/dlm: dlmlock_remote() needs to account for remastery
  ocfs2/dlm: Take inflight reference count for remotely mastered resources too
  ocfs2/dlm: Cleanup dlm_wait_for_node_death() and dlm_wait_for_node_recovery()
  ...

0a4ebed7

ocfs2: avoid unaligned access to dqc_bitmap · 93925579

Akinobu Mita authored Nov 15, 2011

The dqc_bitmap field of struct ocfs2_local_disk_chunk is 32-bit aligned,
but not 64-bit aligned.  The dqc_bitmap is accessed by ocfs2_set_bit(),
ocfs2_clear_bit(), ocfs2_test_bit(), or ocfs2_find_next_zero_bit().  These
are wrapper macros for ext2_*_bit() which need to take an unsigned long
aligned address (though some architectures are able to handle unaligned
address correctly)

So some 64bit architectures may not be able to access the dqc_bitmap
correctly.

This avoids such unaligned access by using another wrapper functions for
ext2_*_bit().  The code is taken from fs/ext4/mballoc.c which also need to
handle unaligned bitmap access.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Joel Becker <jlbec@evilplan.org>

93925579

Merge branch 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm · 3b120ab7

Linus Torvalds authored Dec 01, 2011

* 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm:
ARM: 7182/1: ARM cpu topology: fix warning
ARM: 7181/1: Restrict kprobes probing SWP instructions to ARMv5 and below
ARM: 7180/1: Change kprobes testcase with unpredictable STRD instruction
ARM: 7177/1: GIC: avoid skipping non-existent PPIs in irq_start calculation
ARM: 7176/1: cpu_pm: register GIC PM notifier only once
ARM: 7175/1: add subname parameter to mfp_set_groupg callers
ARM: 7174/1: Fix build error in kprobes test code on Thumb2 kernels
ARM: 7172/1: dma: Drop GFP_COMP for DMA memory allocations
ARM: 7171/1: unwind: add unwind directives to bitops assembly macros
ARM: 7170/2: fix compilation breakage in entry-armv.S
ARM: 7168/1: use cache type functions for arch_get_unmapped_area
ARM: perf: check that we have a platform device when reserving PMU
ARM: 7166/1: Use PMD_SHIFT instead of PGDIR_SHIFT in dma-consistent.c
ARM: 7165/2: PL330: Fix typo in _prepare_ccr()
ARM: 7163/2: PL330: Only register usable channels
ARM: 7162/1: errata: tidy up Kconfig options for PL310 errata workarounds
ARM: 7161/1: errata: no automatic store buffer drain
ARM: perf: initialise used_mask for fake PMU during validation
ARM: PMU: remove pmu_init declaration
ARM: PMU: re-export release_pmu symbol to modules

3b120ab7

Revert "udp: remove redundant variable" · 59c2cdae

David S. Miller authored Dec 01, 2011

This reverts commit 81d54ec8.

If we take the "try_again" goto, due to a checksum error,
the 'len' has already been truncated.  So we won't compute
the same values as the original code did.
Reported-by: paul bilke <fsmail@conspiracy.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

59c2cdae

Merge branch 'for-usb-linus' of... · 8593b6f6

Greg Kroah-Hartman authored Dec 01, 2011

Merge branch 'for-usb-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sarah/xhci into usb-linus

* 'for-usb-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sarah/xhci:
  Revert "xHCI: reset-on-resume quirk for NEC uPD720200"
  xHCI: fix bug in xhci_clear_command_ring()

8593b6f6