Commits · 9a262d5c24c63d2b7bea05e41d9b3bfbef63e903 · nexedi / linux

09 Jan, 2008 21 commits

[NET]: Fix netx-eth.c compilation. · 9a262d5c

Adrian Bunk authored Jan 05, 2008

This was missed when commit e2ac455a
fixed the compile errors in drivers/net/netx-eth.c caused by
commit 09f75cd7.
Signed-off-by: Adrian Bunk <adrian.bunk@movial.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV4] ipconfig: Fix regression in ip command line processing · 92ffb85d

Amos Waterland authored Jan 05, 2008

The recent changes for ip command line processing fixed some problems
but unfortunately broke some common usage scenarios.  In current
2.6.24-rc6 the following command line results in no IP address
assignment, which is surely a regression:

 ip=10.0.2.15::10.0.2.2:255.255.255.0::eth0:off

Please find below a patch that works for all cases I can find.
Signed-off-by: Amos Waterland <apw@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
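
A minimal sketch of how a command line like the one above splits into fields, assuming the documented ip= field order (client, server, gateway, netmask, hostname, device, autoconf). The helper name split_ip_param() and the buffer sizes are hypothetical and are not taken from the patch; the point is only that empty fields between colons must be preserved rather than skipped.

```c
#include <stddef.h>
#include <string.h>

#define IP_FIELDS    7
#define IP_FIELD_LEN 64

/* Field order assumed from the ip= boot parameter documentation. */
enum { F_CLIENT, F_SERVER, F_GATEWAY, F_NETMASK, F_HOSTNAME, F_DEVICE, F_AUTOCONF };

/*
 * Hypothetical helper: split "a:b::d" on ':' and keep empty fields.
 * strtok() is not used because it would collapse the empty fields.
 * Returns the number of fields stored (at most IP_FIELDS).
 */
static int split_ip_param(const char *arg, char out[IP_FIELDS][IP_FIELD_LEN])
{
    int n = 0;

    while (n < IP_FIELDS) {
        const char *colon = strchr(arg, ':');
        size_t len = colon ? (size_t)(colon - arg) : strlen(arg);

        if (len >= IP_FIELD_LEN)
            len = IP_FIELD_LEN - 1;   /* truncate over-long fields */
        memcpy(out[n], arg, len);
        out[n][len] = '\0';
        n++;

        if (!colon)
            break;
        arg = colon + 1;
    }
    return n;
}
```

For the command line quoted in the message, this yields seven fields, with the server and hostname fields empty and the device field set to eth0.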

[IPV4] raw: Strengthen check on validity of iph->ihl · f844c74f

Herbert Xu authored Jan 05, 2008

We currently check that iph->ihl is bounded by the real length and that
the real length is greater than the minimum IP header length.  However,
we did not check the case where iph->ihl is less than the minimum IP
header length.

This breaks because some ip_fast_csum implementations assume that
iph->ihl is at least the minimum, which is quite reasonable.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
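
The three conditions described above can be sketched as a stand-alone check. This is an illustration only: iph_len_ok() and MIN_IP_HDR_LEN are hypothetical names, not the code from the patch. ihl is taken in 32-bit words, as in the IPv4 header, so the minimum header of 20 bytes corresponds to ihl == 5.

```c
#include <stddef.h>

#define MIN_IP_HDR_LEN 20u   /* bytes; corresponds to ihl == 5 */

/*
 * Hypothetical validity check for a user-supplied IPv4 header.
 * ihl is the header length in 32-bit words; length is the number of
 * bytes actually supplied.  Returns 1 if the header is plausible.
 */
static int iph_len_ok(unsigned int ihl, size_t length)
{
    size_t iphlen = (size_t)ihl * 4;

    if (length < MIN_IP_HDR_LEN)   /* too short to hold any IP header */
        return 0;
    if (iphlen > length)           /* header claims more than was supplied */
        return 0;
    if (iphlen < MIN_IP_HDR_LEN)   /* the case the message says was unchecked */
        return 0;
    return 1;
}
```

With only the first two tests, a header claiming ihl == 4 in a 20-byte buffer would pass; the third test rejects it.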

[NIU]: Update driver version and release date. · cb77df3e

David S. Miller authored Jan 05, 2008

Signed-off-by: David S. Miller <davem@davemloft.net>

[NIU]: Fix potentially stuck TCP socket send queues. · 3ebebccf

David S. Miller authored Jan 04, 2008

It is possible for the TX ring to have packets sit in it for unbounded
amounts of time.

The only way to defer TX interrupts in the chip is to periodically set
"mark" bits.  When processing of a TX descriptor with the mark bit set
is complete, it triggers the interrupt for the TX queue's LDG.

A consequence of this kind of scheme is that if packet flow suddenly
stops, the remaining TX packets will just sit there.

If this happens, since those packets could be charged to TCP socket
send queues, such sockets could get stuck.

The simplest solution is to divorce the socket ownership of the packet
once the device takes the SKB, by using skb_orphan() in
niu_start_xmit().

In hindsight, it would have been much nicer if the chip provided two
interrupt sources for TX (like basically every other ethernet chip
does).  Namely, keep the "mark" bit, but also signal the LDG when the
TX queue becomes completely empty.  That way there is no need to have
a deadlock breaker like this.
Signed-off-by: David S. Miller <davem@davemloft.net>
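
A toy user-space model of why orphaning the packet breaks the deadlock, under the assumption that a socket is charged for a packet until the packet's destructor runs. All toy_* names are hypothetical; this is not the driver code, only a sketch of the ownership hand-off that skb_orphan() performs.

```c
#include <stddef.h>

struct toy_sock {
    size_t wmem_queued;                 /* bytes charged to the send queue */
};

struct toy_skb {
    struct toy_sock *sk;                /* owning socket, or NULL */
    size_t truesize;                    /* bytes charged for this packet */
    void (*destructor)(struct toy_skb *);
};

/* Destructor: give the charged bytes back to the owning socket. */
static void toy_wfree(struct toy_skb *skb)
{
    skb->sk->wmem_queued -= skb->truesize;
}

/* Charge the packet to a socket, as happens when it is queued for sending. */
static void toy_set_owner(struct toy_skb *skb, struct toy_sock *sk)
{
    skb->sk = sk;
    skb->destructor = toy_wfree;
    sk->wmem_queued += skb->truesize;
}

/* Drop socket ownership early: run the destructor now, then forget the owner. */
static void toy_orphan(struct toy_skb *skb)
{
    if (skb->destructor)
        skb->destructor(skb);
    skb->destructor = NULL;
    skb->sk = NULL;
}
```

Once the device has orphaned the packet, the socket's accounting no longer depends on when the TX ring is reclaimed, so a packet sitting in the ring cannot hold the socket's send queue hostage.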

[NIU]: Missing ->last_rx update. · 792dd90f

David S. Miller authored Jan 04, 2008

Noticed by Paul Lodridge.
Signed-off-by: David S. Miller <davem@davemloft.net>

[NIU]: Fix slowpath interrupt handling. · 406f353c

Matheos Worku authored Jan 04, 2008

niu_slowpath_interrupt() expects values to be set up in lp->{v0,v1,v2}
but they aren't.  That's only done by niu_schedule_napi() which is
done later in the interrupt path.

If niu_rx_error() returns zero, and v0 is clear, hit the
RX_DMA_CTL_STATE register with a RX_DMA_CTL_STAT_MEX.

Only emit verbose RX error logs if a fatal channel or port error is
signalled.  Other cases will be recorded into statistics by
niu_log_rxchan_errors().
Signed-off-by: David S. Miller <davem@davemloft.net>

futex: Prevent stale futex owner when interrupted/timeout · cdf71a10

Thomas Gleixner authored Jan 08, 2008

Roland Westrelin did a great analysis of a long standing thinko in the
return path of futex_lock_pi.

While we fixed the lock steal case long ago, which was easy to trigger,
we never had a test case which exposed this problem and stupidly never
thought about the reverse lock stealing scenario and the return to user
space with a stale state.

When a blocked task returns from rt_mutex_timed_lock() without holding
the rt_mutex (due to a signal or timeout) and at the same time the task
holding the futex is releasing the futex and assigning the ownership of
the futex to the returning task, then it might happen that a third task
acquires the rt_mutex before the final rt_mutex_trylock() of the
returning task happens under the futex hash bucket lock. The returning
task returns to user space with ETIMEDOUT or EINTR, but the user space
futex value is assigned to this task. The task which acquired the
rt_mutex fixes the user space futex value right after the hash bucket
lock has been released by the returning task, but for a short period of
time the user space value is wrong.

Detailed description is available at:

https://bugzilla.redhat.com/show_bug.cgi?id=400541

The fix for this is the same as we do when the rt_mutex was acquired by
a higher priority task via lock stealing from the designated new owner.
In that case we already fix the user space value and the internal
pi_state up before we return. This mechanism can be used to fixup the
above corner case as well. When the returning task, which failed to
acquire the rt_mutex, notices that it is the designated owner of the
futex, then it fixes up the stale user space value and the pi_state,
before returning to user space. This happens with the futex hash bucket
lock held, so the task which acquired the rt_mutex is guaranteed to be
blocked on the hash bucket lock. We can access the rt_mutex owner, which
gives us the pid of the new owner, safely here as the owner is not able
to modify (release) it while waiting on the hash bucket lock.

Rename the "curr" argument of fixup_pi_state_owner() to "newowner" to
avoid confusion with current and add the check for the stale state into
the failure path of rt_mutex_trylock() in the return path of
unlock_futex_pi(). If the situation is detected use
fixup_pi_state_owner() to assign everything to the owner of the
rt_mutex.
Pointed-out-and-tested-by: Roland Westrelin <roland.westrelin@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

pl2303: Fix mode switching regression · bf5e5834

Alan Cox authored Jan 08, 2008

Cleaning out all the incorrect 'no change made' checks for termios
settings showed up a problem with the PL2303. The hardware here seems to
lose sync and bits if you tell it to make no changes. This shows up with
a real world application.

To fix this, the driver check for meaningful hardware changes is restored,
but with the tests done correctly and as a tty layer function, so it doesn't
get duplicated wrongly everywhere if other drivers turn out to need it.
Signed-off-by: Alan Cox <alan@redhat.com>
Tested-by: Mirko Parthey <mirko.parthey@informatik.tu-chemnitz.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
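
A user-space sketch of what a "meaningful hardware change" test might look like, using the POSIX termios interface. The function name toy_termios_hw_change() is hypothetical, and the choice of HUPCL, CREAD and CLOCAL as software-only bits is an assumption made for this illustration, not something stated in the message.

```c
#include <termios.h>

/*
 * Hypothetical user-space analogue of a "did the hardware settings change?"
 * test.  Treating HUPCL, CREAD and CLOCAL as software-only bits is an
 * assumption made for this sketch.
 * Returns 1 if the line speed or a hardware-relevant c_cflag bit differs.
 */
static int toy_termios_hw_change(const struct termios *a, const struct termios *b)
{
    if (cfgetispeed(a) != cfgetispeed(b) || cfgetospeed(a) != cfgetospeed(b))
        return 1;
    if ((a->c_cflag ^ b->c_cflag) & ~(tcflag_t)(HUPCL | CREAD | CLOCAL))
        return 1;
    return 0;
}
```

A driver-style caller would skip reprogramming the device when this returns 0, which is the behaviour the PL2303 hardware appears to need.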

hfs: handle more on-disk corruptions without oopsing · cf059462

Eric Sandeen authored Jan 08, 2008

hfs seems prone to bad things when it encounters on-disk corruption.  For
example, many values are read from disk and used as lengths for memcpy.
This patch fixes up several of these problematic cases.

o sanity check the on-disk maximum key lengths on mount
  (these are set to a defined value at mkfs time and shouldn't differ)
o check on-disk node keylens against the maximum key length for each tree
o fix hfs_btree_open so that going out via free_tree: doesn't wind
  up in hfs_releasepage, which wants to follow the very pointer
  we were trying to set up:
	HFS_SB(sb)->cat_tree = hfs_btree_open()
		...
		failure gets to hfs_releasepage and tries
		to follow HFS_SB(sb)->cat_tree

Tested with the fsfuzzer; it survives more than it used to.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
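
The kind of check described in the list above can be sketched in user space as follows. The record layout (one length byte followed by the key bytes) and the name copy_key_checked() are assumptions for illustration, not the hfs code itself; the point is that a length read from disk is compared against the tree's maximum before it is handed to memcpy.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy a key whose length byte was read from disk.
 * rec points at the record: one length byte followed by the key bytes.
 * key must have room for max_key_len bytes.
 * Returns the key length, or -1 if the on-disk length is not plausible.
 */
static int copy_key_checked(const unsigned char *rec, size_t rec_size,
                            unsigned char *key, size_t max_key_len)
{
    size_t keylen;

    if (rec_size < 1)
        return -1;
    keylen = rec[0];
    if (keylen > max_key_len)        /* corrupt: longer than the tree allows */
        return -1;
    if (keylen > rec_size - 1)       /* corrupt: runs past the record */
        return -1;
    memcpy(key, rec + 1, keylen);
    return (int)keylen;
}
```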

Fix crash with FLAT_MEMORY and ARCH_PFN_OFFSET != 0 · 467bc461

Thomas Bogendoerfer authored Jan 08, 2008

When using FLAT_MEMORY and ARCH_PFN_OFFSET is not 0, the kernel crashes in
memmap_init_zone().  This bug got introduced by commit
c713216d.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Bob Picco <bob.picco@hp.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

snd_mixer_oss_build_input(): fix for __you_cannot_kmalloc_that_much failure with gcc-3.2 · 22a860a9

Jean Delvare authored Jan 08, 2008

Rework this function so that gcc-3.2 can successfully perform
constant-folding.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dmi-id: fix for __you_cannot_kmalloc_that_much failure · ce8c628a

Jean Delvare authored Jan 08, 2008

gcc 3.2 has a hard time coping with the code in dmi_id_init():

drivers/built-in.o(.init.text+0x789e): In function `dmi_id_init':
: undefined reference to `__you_cannot_kmalloc_that_much'
make: *** [.tmp_vmlinux1] Error 1

Moving half of the code to a separate function seems to help.  This is a
no-op for gcc 4.1 which will successfully inline the code anyway.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Dave Airlie <airlied@linux.ie>
Tested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

vmcoreinfo: add the array length of "free_list" for filtering free pages · 83a08e7c

Ken'ichi Ohmichi authored Jan 08, 2008

This patch adds the array length of "free_area.free_list" to the vmcoreinfo
data so that makedumpfile (dump filtering command) can exclude all free pages
in linux-2.6.24.

makedumpfile creates a small dumpfile by excluding unnecessary pages for the
analysis. To distinguish unnecessary pages, makedumpfile gets the vmcoreinfo
data which has the minimum debugging information only for dump filtering.

In 2.6.24-rc1 or later, the free_area.free_list is an array which has one list
for each migrate type instead of a single list. makedumpfile needs the array
length of "free_area.free_list" and the vmcoreinfo data should contain it.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Tested-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Acked-by: Simon Horman <horms@verge.net.au>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
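
A small sketch of exporting an array length as a text record that a dump-filtering tool could read. The toy_* names, the value of TOY_MIGRATE_TYPES and the "LENGTH(name)=value" record layout are assumptions made for this example; the actual vmcoreinfo helpers are not reproduced here.

```c
#include <stdio.h>

#define TOY_MIGRATE_TYPES 5   /* assumed value, for illustration only */
#define TOY_ARRAY_LEN(a)  (sizeof(a) / sizeof((a)[0]))

struct toy_list_head {
    struct toy_list_head *next, *prev;
};

/* One list per migrate type instead of a single list. */
struct toy_free_area {
    struct toy_list_head free_list[TOY_MIGRATE_TYPES];
    unsigned long nr_free;
};

/* Number of elements in free_area.free_list, computed at compile time. */
static unsigned long toy_free_list_len(void)
{
    return TOY_ARRAY_LEN(((struct toy_free_area *)0)->free_list);
}

/* Write "LENGTH(<name>)=<n>\n" into buf; returns snprintf()'s result. */
static int toy_emit_length(char *buf, size_t size, const char *name,
                           unsigned long n)
{
    return snprintf(buf, size, "LENGTH(%s)=%lu\n", name, n);
}
```

A reader of such a record can then walk every list in the array rather than assuming there is exactly one.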

eCryptfs: fix dentry handling on create error, unlink, and inode destroy · caeeeecf

Michael Halcrow authored Jan 08, 2008

This patch corrects some erroneous dentry handling in eCryptfs.

If there is a problem creating the lower file, then there is nothing that
the persistent lower file can do to really help us.  This patch makes a
vfs_create() failure in the lower filesystem always lead to an
unconditional do_create failure in eCryptfs.

Under certain sequences of operations, the eCryptfs dentry can remain in
the dcache after an unlink.  This patch calls d_drop() on the eCryptfs
dentry to correct this.

eCryptfs has no business calling d_delete() directly on a lower
filesystem's dentry.  This patch removes the call to d_delete() on the
lower persistent file's dentry in ecryptfs_destroy_inode().

(Thanks to David Kleikamp, Eric Sandeen, and Jeff Moyer for helping
identify and resolve this issue)
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

xip: fix get_zeroed_page with __GFP_HIGHMEM · c51b1a16

Akinobu Mita authored Jan 08, 2008

The use of get_zeroed_page() with __GFP_HIGHMEM is invalid.  Use
alloc_page() with __GFP_ZERO instead of invalid get_zeroed_page().

(This patch is only compile tested)

Cc: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

md: fix data corruption when a degraded raid5 array is reshaped · 0f94e87c

Dan Williams authored Jan 08, 2008

We currently do not wait for the block from the missing device to be
computed from parity before copying data to the new stripe layout.

The change in the raid6 code is not technically needed as we don't delay
data block recovery in the same way for raid6 yet.  But making the change
now is safer long-term.

This bug exists in 2.6.23 and 2.6.24-rc.

Cc: <stable@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
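
For readers unfamiliar with the recovery step being waited on: in a single-parity layout, the block of a missing device is the XOR of the parity block with all surviving data blocks. The sketch below is a toy user-space model with hypothetical names (toy_compute_block() is not the md code); it shows the computation that has to complete before the data may be copied to the new stripe layout.

```c
#include <stddef.h>

/*
 * Toy model: rebuild the block of the missing device by XOR-ing the parity
 * block with every surviving data block.  blocks[missing] is overwritten.
 */
static void toy_compute_block(unsigned char **blocks, size_t nblocks,
                              const unsigned char *parity,
                              size_t missing, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char v = parity[i];

        for (size_t d = 0; d < nblocks; d++)
            if (d != missing)
                v ^= blocks[d][i];
        blocks[missing][i] = v;
    }
}
```

Copying blocks[missing] before this has run would copy whatever stale bytes happen to be in the buffer, which is the corruption the message describes.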

KEYS: fix macro · 5b7741b3

Sebastian Siewior authored Jan 08, 2008

Commit 664cceb0 changed the parameters of
the function make_key_ref().  The macros that are used in case CONFIG_KEYS
is not defined did not change.

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fat: optimize fat_count_free_clusters() · 9f966be8

OGAWA Hirofumi authored Jan 08, 2008

On a large partition, scanning the free clusters is very slow if the user
doesn't use the "usefree" option.

To optimize it, this patch uses sb_breadahead() to read ahead the FAT
sectors. On one user's 15GB partition, this patch improved it very
much (1min => 600ms).

The following is the result for a 2GB partition on my machine.

without patch:
	root@devron (/)# time df -h > /dev/null

	real    0m1.202s
	user    0m0.000s
	sys     0m0.440s

with patch:
	root@devron (/)# time df -h > /dev/null

	real    0m0.378s
	user    0m0.012s
	sys     0m0.168s
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

spi_bitbang: always grab lock with irqs blocked · d52df2e2

David Brownell authored Jan 08, 2008

Fix a glitch reported by lockdep in the spi_bitbang code: it needs to
consistently block IRQs when holding that spinlock.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86: fix do_fork_idle section mismatch · a2b484a2

Thomas Gleixner authored Jan 09, 2008

With CPU_HOTPLUG=n:

WARNING: vmlinux.o(.text+0x104f8): Section mismatch: reference to .init.text:fork_idle (between
'do_fork_idle' and 'lapic_timer_broadcast')

do_fork_idle() needs to be __cpuinit. It can be static as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

08 Jan, 2008 5 commits

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband · 165e4694

Linus Torvalds authored Jan 08, 2008

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/srp: Release transport before removing host
  IB/mlx4: Fix value of pkey_index in QP1 completions
  MAINTAINERS: Update Sean Hefty's email address

IB/srp: Release transport before removing host · ad696989

Dave Dillow authored Jan 03, 2008

The documented call sequence for removing a host is to call the
transport xxx_remove_host() prior to scsi_remove_host(). The SRP
transport used to crash when that order was followed, but as it is now
fixed, use the documented order.
Signed-off-by: David Dillow <dillowda@ornl.gov>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/mlx4: Fix value of pkey_index in QP1 completions · e1bb7843

Dotan Barak authored Jan 07, 2008

Fix the value of pkey_index in completions to get a valid value for
GSI QPs.  Without this fix, incoming GSI packets on port 2 get an
invalid P_Key index in the completion, which prevents the MAD layer
from sending back a response, which can make the second port of
ConnectX HCAs completely useless.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

Revert "hda_intel suspend latency: shorten codec read" · d238998f

Linus Torvalds authored Jan 08, 2008

This reverts commit 57a04513.

Harald Dunkel reports that it broke sound for him:
  "Alsa stopped working for me.  I still can access /dev/dsp, change the
   volume and so on, but the speakers are quiet."

Reverting it fixed things for him.
Reported-and-tested-by: Harald Dunkel <harald.dunkel@t-online.de>
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MAINTAINERS: Update Sean Hefty's email address · ed96f247

Sean Hefty authored Jan 02, 2008

My Unix email account is being discontinued at end of Q1 '08.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

07 Jan, 2008 11 commits

acct: real_parent ppid · b59f8197

Roland McGrath authored Jan 07, 2008

The ac_ppid field reported in process accounting records
should match what getppid() would have returned to that
process, regardless of whether a debugger is attached.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

core dump: real_parent ppid · 45626bb2

Roland McGrath authored Jan 07, 2008

The pr_ppid field reported in core dumps should match what
getppid() would have returned to that process, regardless of
whether a debugger is attached.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus · e4c6d3c6

Linus Torvalds authored Jan 07, 2008

* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  [MIPS] Fix CONFIG_BOOT_RAW.
  [MIPS] Assume R4000/R4400 newer than 3.0 don't have the mfc0 count bug
  [MIPS] Fix IP32 breakage
  [MIPS] Alchemy: Fix use of __init code bug exposed by modpost warning
  [MIPS] Move inclusing of kernel/time/Kconfig menu to appropriate place

Merge git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb · 89a30a83

Linus Torvalds authored Jan 07, 2008

* git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb:
  V4L/DVB (6916): ivtv: udelay has to be changed *after* the eeprom was read, not before
  V4L/DVB (6944a): Fix Regression VIDIOCGMBUF ioctl hangs on bttv driver

[MIPS] Fix CONFIG_BOOT_RAW. · ba820c5c

Ralf Baechle authored Jan 07, 2008

This was broken by 017e3a492683b32d17dcd1b13b279745cc656073 (lmo) /
396a2ae0 (kernel.org).
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Assume R4000/R4400 newer than 3.0 don't have the mfc0 count bug · ce202cbb

Thomas Bogendoerfer authored Jan 04, 2008

This seems a reasonable assumption and gets some SNI machines to work
which currently must rely on the cp0 counter as clocksource.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Fix IP32 breakage · c990081b

Thomas Bogendoerfer authored Jan 05, 2008

- suppress master aborts during config read
- set io_map_base
- only fixup end of iomem resource to avoid failing request_resource
  in serial driver
- killed useless setting of crime_int bit, which caused wrong interrupts
- use physical address for serial port platform device and let 8250
  driver do the ioremap
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Alchemy: Fix use of __init code bug exposed by modpost warning · 9cfacb79

Sergei Shtylyov authored Dec 25, 2007

WARNING: vmlinux.o(.text+0x1ca608): Section mismatch: reference to
.init.text: add_wired_entry (between 'config_access' and 'config_read')

by refactoring the code calling add_wired_entry() from config_access() to
a separate function which is called from aau1x_pci_setup(). While at it:

- make some unnecessarily global variables 'static';

- fix the letter case, whitespace, etc. in the comments...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Move inclusing of kernel/time/Kconfig menu to appropriate place · c4eee283

Atsushi Nemoto authored Nov 12, 2007

CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS should be selected in "Kernel
type" menu, not in "CPU selection" menu.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

V4L/DVB (6916): ivtv: udelay has to be changed *after* the eeprom was read, not before · 89dab357

Hans Verkuil authored Jan 07, 2008

The eeprom decides which Hauppauge model it is, so the decision whether to
use an udelay of 5 or 10 needs to be taken after reading the eeprom, not
before.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>

V4L/DVB (6944a): Fix Regression VIDIOCGMBUF ioctl hangs on bttv driver · d9030f57

Gregor Jasny authored Jan 06, 2008

Fix bttv VIDIOCGMBUF locking like done in commit
820eacd8. 
Signed-off-by: Gregor Jasny <gjasny@web.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>

06 Jan, 2008 3 commits

Merge master.kernel.org:/home/rmk/linux-2.6-arm · 2b300d20

Linus Torvalds authored Jan 06, 2008

* master.kernel.org:/home/rmk/linux-2.6-arm:
  [ARM] 4691/1: add missing i2c_board_info struct for at91rm9200
  [ARM] 4735/1: Unbreak pxa25x suspend/resume

Linux 2.6.24-rc7 · 3ce54450

Linus Torvalds authored Jan 06, 2008

CPU hotplug: fix cpu_is_offline() on !CONFIG_HOTPLUG_CPU · a263898f

Ingo Molnar authored Dec 30, 2007

make randconfig bootup testing found that the cpufreq code
crashes on bootup, if the powernow-k8 driver is enabled and
if maxcpus=1 passed on the boot line to a !CONFIG_HOTPLUG_CPU
kernel.

First lockdep found out that there's an inconsistent unlock
sequence:

 =====================================
 [ BUG: bad unlock balance detected! ]
 -------------------------------------
 swapper/1 is trying to release lock (&per_cpu(cpu_policy_rwsem, cpu)) at:
 [<ffffffff806ffd8e>] unlock_policy_rwsem_write+0x3c/0x42
 but there are no more locks to release!

Call Trace:
 [<ffffffff806ffd8e>] unlock_policy_rwsem_write+0x3c/0x42
 [<ffffffff80251c29>] print_unlock_inbalance_bug+0x104/0x12c
 [<ffffffff80252f3a>] mark_held_locks+0x56/0x94
 [<ffffffff806ffd8e>] unlock_policy_rwsem_write+0x3c/0x42
 [<ffffffff807008b6>] cpufreq_add_dev+0x2a8/0x5c4
 ...

then shortly afterwards the cpufreq code crashed on an assert:

 ------------[ cut here ]------------
 kernel BUG at drivers/cpufreq/cpufreq.c:1068!
 invalid opcode: 0000 [1] SMP
 [...]
 Call Trace:
  [<ffffffff805145d6>] sysdev_driver_unregister+0x5b/0x91
  [<ffffffff806ff520>] cpufreq_register_driver+0x15d/0x1a2
  [<ffffffff80cc0596>] powernowk8_init+0x86/0x94
 [...]
 ---[ end trace 1e9219be2b4431de ]---

the bug was caused by maxcpus=1 bootup, which brought up the
secondary core as !cpu_online() but !cpu_is_offline() either,
which on !CONFIG_HOTPLUG_CPU is always 0 (include/linux/cpu.h):

  /* CPUs don't go offline once they're online w/o CONFIG_HOTPLUG_CPU */
  static inline int cpu_is_offline(int cpu) { return 0; }

but the cpufreq code uses cpu_online() and cpu_is_offline() in
a mixed way - the low-level drivers use cpu_online(), while
the cpufreq core uses cpu_is_offline(). This opened up the
possibility to add the non-initialized sysdev device of the
secondary core:

 cpufreq-core: trying to register driver powernow-k8
 cpufreq-core: adding CPU 0
 powernow-k8: BIOS error - no PSB or ACPI _PSS objects
 cpufreq-core: initialization failed
 cpufreq-core: adding CPU 1
 cpufreq-core: initialization failed

which then blew up. The fix is to make cpu_is_offline() always
the negation of cpu_online(). With that fix applied the kernel
boots up fine without crashing:

 Calling initcall 0xffffffff80cc0510: powernowk8_init+0x0/0x94()
 powernow-k8: Found 1 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ processors (1 cpu cores) (version 2.20.00)
 powernow-k8: BIOS error - no PSB or ACPI _PSS objects
 initcall 0xffffffff80cc0510: powernowk8_init+0x0/0x94() returned -19.
 initcall 0xffffffff80cc0510 ran for 19 msecs: powernowk8_init+0x0/0x94()
 Calling initcall 0xffffffff80cc328f: init_lapic_nmi_sysfs+0x0/0x39()

We could fix this by making CPU enumeration aware of max_cpus, but that
would be more fragile IMO, and the cpu_online(cpu) != cpu_is_offline(cpu)
possibility was quite confusing and a continuous source of bugs too.

Most distributions have kernels with CPU hotplug enabled, so this bug
remained hidden for a long time.

Bug forensics:

The broken cpu_is_offline() API variant was introduced via:

 commit a59d2e4e
 Author: Rusty Russell <rusty@rustcorp.com.au>
 Date:   Mon Mar 8 06:06:03 2004 -0800

     [PATCH] minor cleanups for hotplug CPUs

( this predates linux-2.6.git, this commit is available from Thomas's
  historic git tree. )

Then 1.5 years later the cpufreq code made use of it:

 commit c32b6b8e
 Author: Ashok Raj <ashok.raj@intel.com>
 Date:   Sun Oct 30 14:59:54 2005 -0800

     [PATCH] create and destroy cpufreq sysfs entries based on cpu notifiers

 +       if (cpu_is_offline(cpu))
 +               return 0;

which is a correct use of the subtly broken new API. v2.6.15 then
shipped with this bug included.

then it took two more years for random-kernel qa to hit it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
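
The inconsistency and the fix described above can be modelled in a few lines of user-space C. The toy_* names and the bitmask representation are hypothetical; only the two variants of cpu_is_offline() follow the message (a constant 0 before the fix, the negation of cpu_online() after it).

```c
#include <stdbool.h>

static unsigned long toy_online_mask;   /* bit n set => CPU n is online */

static bool toy_cpu_online(int cpu)
{
    return (toy_online_mask >> cpu) & 1UL;
}

/* Old !CONFIG_HOTPLUG_CPU variant quoted in the message: never offline. */
static bool toy_cpu_is_offline_old(int cpu)
{
    (void)cpu;
    return false;
}

/* Fixed variant described in the message: the negation of cpu_online(). */
static bool toy_cpu_is_offline(int cpu)
{
    return !toy_cpu_online(cpu);
}
```

With only CPU 0 online (the maxcpus=1 case), the old variant reports CPU 1 as neither online nor offline, so code that tests one predicate and code that tests the other disagree about whether CPU 1 should be touched. The fixed variant makes the two predicates exact complements.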
