Commits · a2b897687d39aea80a8604748a1a48f32e616715 · Kirill Smelkov / linux

16 Nov, 2007 40 commits

Stephen Hemminger authored Nov 06, 2007

patch ab5adecb in mainline.

The D-Link PCI-X board (and maybe others) can lie about status
ring entries. It seems it will update the register for last status
index before completing the DMA for the ring entry. To avoid reading
stale data, zap the old entry and check.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
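
The zap-and-check idea can be sketched in plain C roughly as follows. This is a simplified userspace model, not the driver code: `struct status_entry`, `OP_NONE`, `RING_SIZE` and `ring_poll()` are hypothetical names, and the memory barriers and DMA handling a real driver needs are left out.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8u          /* hypothetical ring size */
#define OP_NONE   0u          /* hypothetical "slot not written yet" marker */

struct status_entry {
    uint8_t  opcode;          /* nonzero once the device has filled the slot */
    uint32_t value;
};

/*
 * Consume entries from 'next' up to (not including) 'hw_index', the index
 * the device claims to have reached. Stop early if a slot still holds the
 * zapped marker: the index register ran ahead of the DMA write.
 * Returns the new consumer position.
 */
static unsigned ring_poll(struct status_entry *ring, unsigned next,
                          unsigned hw_index, uint32_t *sum)
{
    assert(next < RING_SIZE && hw_index < RING_SIZE);

    while (next != hw_index) {
        struct status_entry *e = &ring[next];

        if (e->opcode == OP_NONE)
            break;            /* stale slot: retry on the next poll */

        *sum += e->value;
        e->opcode = OP_NONE;  /* zap so a later stale read is detectable */
        next = (next + 1) % RING_SIZE;
    }
    return next;
}
```

If the device reports index 2 while only slot 0 has been written, the first poll stops at slot 1 and a later poll picks it up once the data has landed.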

a2b89768

skge: XM PHY handling fixes · 738a0133

Stephen Hemminger authored Nov 06, 2007

patch 501fb72d in mainline.

Change how PHY is managed on SysKonnect fibre based boards.
Poll for the PHY coming up once per second, but use an interrupt to detect loss.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

738a0133

Fix L2TP oopses. · 7581142e

James Chapman authored Nov 13, 2007

changeset 91781004 in mainline.

[PPP]: L2TP: Fix oops in transmit and receive paths

Changes made on 18-sep to fix skb handling in the pppol2tp driver
broke the transmit and receive paths. Users are only running into this
now because distros are now using 2.6.23 and I must have messed up
when I tested the change.

For receive, we now do our own calculation of how much to pull from
the skb (variable length L2TP header) rather than using
skb_transport_offset(). Also, if the skb isn't a data packet, it must
be passed back to UDP with skb->data pointing to the UDP header.

For transmit, make sure skb->sk is set up because ip_queue_xmit()
needs it.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

7581142e

TG3: Fix performance regression on 5705. · 08b37aee

Michael Chan authored Oct 15, 2007

patch 114342f2 in mainline.

A performance regression was introduced by the following commit:

    commit ee6a99b5
    Author: Michael Chan <mchan@broadcom.com>
    Date:   Wed Jul 18 21:49:10 2007 -0700

    [TG3]: Fix msi issue with kexec/kdump.

In making that change, the PCI latency timer and cache line size
registers were not restored after chip reset.  On the 5705, the
latency timer gets reset to 0 during chip reset and this causes
very poor performance.

Update version to 3.81.1
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

08b37aee

forcedeth: add MCP77 device IDs · 091e78e2

Ayaz Abdulla authored Oct 25, 2007

patch 96fd4cd3 in mainline.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

091e78e2

forcedeth msi bugfix · 38892570

Manfred Spraul authored Oct 17, 2007

patch a7475906 in mainline.

pci_enable_msi() replaces the INTx irq number in pci_dev->irq with the
new MSI irq number.
The forcedeth driver did not update the copy in netdevice->irq and
parts of the driver used the stale copy.
See bugzilla.kernel.org, bug 9047.

The patch
- updates netdevice->irq
- replaces all accesses to netdevice->irq with pci_dev->irq.

The patch is against 2.6.23.1. IMHO suitable for both 2.6.23 and 2.6.24
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

38892570

ehea: 64K page kernel support fix · c5b130d0

Jan-Bernd Themann authored Oct 16, 2007

based on 2c69448b in mainline.

The current eHEA module compiled for 64K page kernels can not
be loaded with insmod due to bad hypervisor call parameters.
The patch is a subset of the following patch which has been applied
for 2.6.24 upstream:

http://www.spinics.net/lists/netdev/msg42814.html
Signed-off-by: Jan-Bernd Themann <themann@de.ibm.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

c5b130d0

libertas: fix endianness breakage · 48884d97

Al Viro authored Oct 09, 2007

patch 57077081 in mainline.

	wep->keytype[] is u8
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

48884d97

libertas: more endianness breakage · 7debca25

Al Viro authored Oct 09, 2007

based on patch 8362cd41 in mainline.

	domain->header.len is le16 and has just been assigned
cpu_to_le16(arithmetical expression).  And all fields of adapter->logmsg
are __le32; not a single 16-bit among them...
	That's incremental to the previous one
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

7debca25

Linux 2.6.23.4 · 08050400
Greg Kroah-Hartman authored Nov 16, 2007

08050400

mac80211: make ieee802_11_parse_elems return void · ef081163

John W. Linville authored Oct 26, 2007

patch 67a4cce4 in mainline.

Some APs send management frames with junk padding after the last IE.
We already account for a similar problem with some Apple Airport
devices, but at least one device is known to send more than a single
extra byte.  The device in question is the Draytek Vigor2900:

	http://www.draytek.com.au/products/Vigor2900.php

The junk in question looks like an IE that runs off the end of the
frame.  This causes us to return ParseFailed.  Since the frame in
question is an association response, this causes us to fail to associate
with this AP.

The return code from ieee802_11_parse_elems is superfluous.
All callers still check for the presence of the specific IEs that
interest them anyway.  So, remove the return code so the parse never
"fails".
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
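
A parser with no failure return might look roughly like the sketch below. It is a userspace illustration only: `struct parsed_elems` and `parse_elems()` are hypothetical names, and only the SSID element (element ID 0) is recorded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EID_SSID 0u            /* element ID 0 is the SSID element */

struct parsed_elems {
    const uint8_t *ssid;       /* NULL if no SSID element was seen */
    uint8_t ssid_len;
};

/*
 * Walk id/length/value elements. A trailing element whose length runs past
 * the end of the buffer is treated as junk padding: parsing simply stops,
 * and whatever was found before it is kept. There is no failure return.
 */
static void parse_elems(const uint8_t *pos, size_t left,
                        struct parsed_elems *out)
{
    assert(out != NULL);
    out->ssid = NULL;
    out->ssid_len = 0;

    while (left >= 2) {
        uint8_t id = pos[0];
        uint8_t len = pos[1];

        pos += 2;
        left -= 2;
        if (len > left)
            return;            /* junk running off the frame: stop quietly */

        if (id == EID_SSID) {
            out->ssid = pos;
            out->ssid_len = len;
        }
        pos += len;
        left -= len;
    }
}
```

Callers then test the fields they care about (here `ssid != NULL`) instead of a parse status, which matches the observation that all callers check for specific IEs anyway.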

ef081163

mac80211: only honor IW_SCAN_THIS_ESSID in STA, IBSS, and AP modes · f08fdbd7

John W. Linville authored Oct 26, 2007

patch d114f399 in mainline.

The previous IW_SCAN_THIS_ESSID patch left a hole allowing scan
requests on interfaces in inappropriate modes.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

f08fdbd7

mac80211: honor IW_SCAN_THIS_ESSID in siwscan ioctl · f4d709e3

Bill Moss authored Oct 26, 2007

patch 107acb23 in mainline.

This patch fixes the problem of associating with wpa_secured hidden
AP.  Please try out.

The original author of this patch is Bill Moss <bmoss@clemson.edu>
Signed-off-by: Abhijeet Kolekar <abhijeet.kolekar@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

f4d709e3

mac80211: store SSID in sta_bss_list · 75fc21b4

John W. Linville authored Oct 26, 2007

patch cffdd30d in mainline.

Some AP equipment "in the wild" services multiple SSIDs using the
same BSSID.  This patch changes the key of sta_bss_list to include
the SSID as well as the BSSID and the channel so as to prevent one
SSID from eclipsing another SSID with the same BSSID.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

75fc21b4

mac80211: store channel info in sta_bss_list · 66bc3f66

John W. Linville authored Oct 26, 2007

patch 65c107ab in mainline.

Some AP equipment "in the wild" uses the same BSSID on multiple channels
(particularly "a" vs. "b/g").  This patch changes the key of sta_bss_list
to include both the BSSID and the channel so as to prevent a BSSID on
one channel from eclipsing the same BSSID on another channel.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

66bc3f66

mac80211: reorder association debug output · 20bf97cf

Johannes Berg authored Oct 26, 2007

patch 1dd84aa2 in mainline.

There's no reason to warn about an invalid AID field when the
association was denied.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

20bf97cf

ieee80211: fix TKIP QoS bug · 31a1d25a

Johannes Berg authored Oct 26, 2007

patch e797aa1b in mainline.

The commit 65b6a277 titled "ieee80211: Fix header->qos_ctl endian issue"
*introduced* an endianness bug. Partially revert it.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

31a1d25a

NETFILTER: nf_conntrack_tcp: fix connection reopening · 0ac38060

Jozsef Kadlecsik authored Nov 05, 2007

Upstream commits: 17311393 + bc34b841 merged together.  Merge done by
Patrick McHardy <kaber@trash.net>

[NETFILTER]: nf_conntrack_tcp: fix connection reopening

With your description I could reproduce the bug and actually you were
completely right: the code above is incorrect. Somehow I was able to
misread RFC1122 and mixed the roles :-(:

   When a connection is >>closed actively<<, it MUST linger in
   TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime).
   However, it MAY >>accept<< a new SYN from the remote TCP to
   reopen the connection directly from TIME-WAIT state, if it:
   [...]

The fix is as follows: if the receiver initiated an active close, then the
sender may reopen the connection - otherwise try to figure out if we hold
a dead connection.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

0ac38060

Fix netlink timeouts. · c6736fd4

Patrick McHardy authored Nov 13, 2007

[NETLINK]: Fix unicast timeouts

[ Upstream commit: c3d8d1e3 ]

Commit ed6dcf4a in the history.git tree broke netlink_unicast timeouts
by moving the schedule_timeout() call to a new function that doesn't
propagate the remaining timeout back to the caller. This means on each
retry we start with the full timeout again.

ipc/mqueue.c seems to actually want to wait indefinitely so this
behaviour is retained.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
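
The effect of not propagating the remaining timeout can be modelled as below. This is a toy simulation under stated assumptions, not the netlink code: `sleep_ticks()` stands in for a sleep that returns the time left (as schedule_timeout() does), and all other names are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for a sleep that is woken after 'slept' ticks;
 * like schedule_timeout(), it returns the time that was left. */
static long sleep_ticks(long timeout, long slept)
{
    assert(timeout >= 0 && slept >= 0);
    return slept >= timeout ? 0 : timeout - slept;
}

/* Buggy shape: the helper drops the remaining time, so the caller's
 * timeout never shrinks between retries. */
static void wait_buggy(long timeo, long slept)
{
    (void)sleep_ticks(timeo, slept);
}

/* Fixed shape: the remaining time is written back through the pointer. */
static void wait_fixed(long *timeo, long slept)
{
    *timeo = sleep_ticks(*timeo, slept);
}

/* Count retries until the budget is used up, capped at 'max_tries'. */
static int retries_until_timeout(int fixed, long timeo, long slept,
                                 int max_tries)
{
    int tries = 0;

    while (timeo > 0 && tries < max_tries) {
        if (fixed)
            wait_fixed(&timeo, slept);
        else
            wait_buggy(timeo, slept);
        tries++;
    }
    return tries;
}
```

With a budget of 100 ticks and 30 ticks spent per retry, the fixed variant gives up after four retries, while the buggy variant only stops at the artificial cap.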

c6736fd4

Fix crypto_alloc_comp() error checking. · be8962a1

Herbert Xu authored Nov 13, 2007

[IPSEC]: Fix crypto_alloc_comp error checking

[ Upstream commit: 4999f362 ]

The function crypto_alloc_comp returns an errno instead of NULL
to indicate error.  So it needs to be tested with IS_ERR.

This is based on a patch by Vicenç Beltran Querol.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
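
The difference between a NULL test and an IS_ERR test can be shown with a small userspace imitation of the kernel's error-pointer helpers. The helpers below are simplified re-implementations for illustration, and `alloc_transform()` and `setup()` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace imitations of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static void *ERR_PTR(long error)   { return (void *)(intptr_t)error; }
static long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static int IS_ERR(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator that, like crypto_alloc_comp(), reports failure
 * with an encoded errno instead of NULL. */
static void *alloc_transform(int fail)
{
    if (fail)
        return ERR_PTR(-ENOENT);
    return malloc(16);
}

/* Returns 0 on success or a negative errno. A plain "if (!tfm)" check
 * would miss the failure, because an error pointer is never NULL. */
static int setup(int fail)
{
    void *tfm = alloc_transform(fail);

    if (IS_ERR(tfm))
        return (int)PTR_ERR(tfm);
    free(tfm);
    return 0;
}
```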

be8962a1

Fix SET_VLAN_INGRESS_PRIORITY_CMD error return. · 9def747b

Patrick McHardy authored Nov 13, 2007

patch fffe470a in mainline.

[VLAN]: Fix SET_VLAN_INGRESS_PRIORITY_CMD ioctl

Based on report and patch by Doug Kehn <rdkehn@yahoo.com>:

vconfig returns the following error when attempting to execute the
set_ingress_map command:

vconfig: socket or ioctl error for set_ingress_map: Operation not permitted

In vlan.c, vlan_ioctl_handler for SET_VLAN_INGRESS_PRIORITY_CMD
sets err = -EPERM and calls vlan_dev_set_ingress_priority.
vlan_dev_set_ingress_priority is a void function so err remains
at -EPERM and results in the vconfig error (even though the ingress
map was set).

Fix by setting err = 0 after the vlan_dev_set_ingress_priority call.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
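
The control flow described above can be reduced to the following sketch. It is not the code from vlan.c: `ioctl_set_ingress()`, `set_ingress_priority()` and the `fixed` switch are hypothetical and exist only to show both shapes side by side.

```c
#include <assert.h>
#include <errno.h>

static int ingress_map[8];     /* hypothetical priority map */

/* Like vlan_dev_set_ingress_priority(), this returns nothing. */
static void set_ingress_priority(unsigned skb_prio, int vlan_prio)
{
    ingress_map[skb_prio & 7u] = vlan_prio;
}

/* 'fixed' selects between the buggy and the corrected handler shape. */
static int ioctl_set_ingress(int privileged, unsigned skb_prio,
                             int vlan_prio, int fixed)
{
    int err = -EPERM;          /* default for the permission check */

    if (!privileged)
        return err;

    set_ingress_priority(skb_prio, vlan_prio);
    if (fixed)
        err = 0;               /* the call cannot fail, so report success */
    return err;
}
```

In the buggy shape the map is updated and the caller still sees -EPERM, which is exactly the confusing vconfig behaviour reported.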

9def747b

Fix VLAN address syncing. · dae1e6e8

Patrick McHardy authored Nov 13, 2007

patch d932e04a in mainline.

[PATCH] [VLAN]: Don't synchronize addresses while the vlan device is down

While the VLAN device is down, the unicast addresses are not configured
on the underlying device, so we shouldn't attempt to sync them.

Noticed by Dmitry Butskoy <buc@odusz.so-cdu.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

dae1e6e8

Fix endianness bug in U32 classifier. · e809e9c0

Radu Rendec authored Nov 13, 2007

changeset 543821c6 in mainline.

[PKT_SCHED] CLS_U32: Fix endianness problem with u32 classifier hash masks.

While trying to implement u32 hashes in my shaping machine I ran into
a possible bug in the u32 hash/bucket computing algorithm
(net/sched/cls_u32.c).

The problem occurs only with hash masks that extend over the octet
boundary, on little endian machines (where htonl() actually does
something).

Let's say that I would like to use 0x3fc0 as the hash mask. This means
8 contiguous "1" bits starting at b6. With such a mask, the expected
(and logical) behavior is to hash any address in, for instance,
192.168.0.0/26 in bucket 0, then any address in 192.168.0.64/26 in
bucket 1, then 192.168.0.128/26 in bucket 2 and so on.

This is exactly what would happen on a big endian machine, but on
little endian machines, what would actually happen with current
implementation is 0x3fc0 being reversed (into 0xc03f0000) by htonl()
in the userspace tool and then applied to 192.168.x.x in the u32
classifier. When shifting right by 16 bits (rank of first "1" bit in
the reversed mask) and applying the divisor mask (0xff for divisor
256), what would actually remain is 0x3f applied on the "168" octet of
the address.

One could say this can be easily worked around by taking endianness
into account in userspace and supplying an appropriate mask (0xfc03)
that would be turned into contiguous "1" bits when reversed
(0x03fc0000). But the actual problem is the network address (inside
the packet) not being converted to host order, but used as a
host-order value when computing the bucket.

Let's say the network address is written as n31 n30 ... n0, with n0
being the least significant bit. When used directly (without any
conversion) on a little endian machine, it becomes n7 ... n0 n15 ... n8
etc in the machine's registers. Thus bits n7 and n8 would no longer be
adjacent and 192.168.0.64/26 and 192.168.0.128/26 would no longer be
consecutive.

The fix is to apply ntohl() on the hmask before computing fshift,
and in u32_hash_fold() convert the packet data to host order before
shifting down by fshift.

With helpful feedback from Jamal Hadi Salim and Jarek Poplawski.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
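
The bucket arithmetic can be reproduced in userspace with the sketch below. It is an endian-independent model, not the code from cls_u32.c: `load_raw_le()` imitates what a little endian CPU gets when it loads network-order bytes without conversion, `load_host()` imitates the result of ntohl(), and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Value a little-endian CPU sees when it loads 4 network-order bytes
 * directly into a register, with no conversion. */
static uint32_t load_raw_le(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Host-order value, i.e. what ntohl() yields on any machine. */
static uint32_t load_host(const uint8_t b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Index of the lowest set bit; mask must be nonzero. */
static unsigned lowest_bit(uint32_t mask)
{
    unsigned n = 0;

    assert(mask != 0);
    while (!(mask & 1u)) {
        mask >>= 1;
        n++;
    }
    return n;
}

/* Old behaviour: mask and key are both used in raw (unconverted) form. */
static unsigned bucket_raw(const uint8_t addr[4], const uint8_t mask[4],
                           unsigned divisor_mask)
{
    uint32_t m = load_raw_le(mask);

    return ((load_raw_le(addr) & m) >> lowest_bit(m)) & divisor_mask;
}

/* Fixed behaviour: convert both to host order before shifting. */
static unsigned bucket_fixed(const uint8_t addr[4], const uint8_t mask[4],
                             unsigned divisor_mask)
{
    uint32_t m = load_host(mask);

    return ((load_host(addr) & m) >> lowest_bit(m)) & divisor_mask;
}
```

With the mask bytes 00 00 3f c0 (0x3fc0 in network order) and a divisor mask of 0xff, `bucket_fixed()` maps 192.168.0.0, 192.168.0.64 and 192.168.0.128 to buckets 0, 1 and 2, while `bucket_raw()` in this model puts all three into bucket 0.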

e809e9c0

Fix TEQL oops. · 61254a93

Evgeniy Polyakov authored Nov 13, 2007

[PKT_SCHED]: Fix OOPS when removing devices from a teql queuing discipline

[ Upstream commit: 4f9f8311 ]

teql_reset() is called from deactivate and qdisc is set to noop already,
but subsequent teql_xmit does not know about it and dereferences private
data as teql qdisc and thus oopses.
not catch it first :)
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

61254a93

Fix error returns in sys_socketpair() · c669e2ad

David Miller authored Nov 13, 2007

patch bf3c23d1 in mainline.

[NET]: Fix error reporting in sys_socketpair().

If either of the two sock_alloc_fd() calls fail, we
forget to update 'err' and thus we'll erroneously
return zero in these cases.

Based upon a report and patch from Rich Paul, and
commentary from Chuck Ebbert.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

c669e2ad

softmac: fix wext MLME request reason code endianness · eeb4e8c2

Johannes Berg authored Oct 25, 2007

patch 94e10bfb in mainline.

The MLME request reason code is host-endian and our passing
it to the low level functions is host-endian as well since
they do the swapping. I noticed that the reason code 768 was
sent (0x300) rather than 3 when wpa_supplicant terminates.
This removes the superfluous cpu_to_le16() call.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
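
The 768-versus-3 symptom is what a double conversion produces when the swap is not a no-op. The sketch below models that case, assuming a big-endian host; `cpu_to_le16_on_be()`, `store_be_host()`, `wire_le16()` and `send_reason()` are hypothetical helpers written for this illustration.

```c
#include <assert.h>
#include <stdint.h>

static uint16_t swab16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Model of cpu_to_le16() on a big-endian host: it byte-swaps. */
static uint16_t cpu_to_le16_on_be(uint16_t v)
{
    return swab16(v);
}

/* Bytes a big-endian host writes to memory for a 16-bit value. */
static void store_be_host(uint16_t v, uint8_t out[2])
{
    out[0] = (uint8_t)(v >> 8);
    out[1] = (uint8_t)(v & 0xff);
}

/* What the receiver reads: the frame field is little-endian. */
static uint16_t wire_le16(const uint8_t in[2])
{
    return (uint16_t)(in[0] | (in[1] << 8));
}

/* Low-level send path: converts the host-order reason exactly once. */
static uint16_t send_reason(uint16_t host_reason)
{
    uint8_t frame[2];

    store_be_host(cpu_to_le16_on_be(host_reason), frame);
    return wire_le16(frame);
}
```

Passing the host-order value, `send_reason(3)`, puts 3 on the wire; passing an already converted value, `send_reason(cpu_to_le16_on_be(3))`, puts 768 (0x300) on the wire.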

eeb4e8c2

Fix kernel_accept() return handling. · cfebbe5f

Tony Battersby authored Oct 23, 2007

patch fa8705b0 in mainline.

[NET]: sanitize kernel_accept() error path

If kernel_accept() returns an error, it may pass back a pointer to
freed memory (which the caller should ignore).  Make it pass back NULL
instead for better safety.
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
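
The sanitized error path follows a common pattern, sketched here in userspace C. `struct conn`, `do_accept()` and `accept_conn()` are hypothetical stand-ins; malloc()/free() take the place of socket creation and release.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct conn { int fd; };       /* hypothetical stand-in for struct socket */

/* Hypothetical accept step that can fail after allocation. */
static int do_accept(struct conn *c, int fail)
{
    if (fail)
        return -EAGAIN;
    c->fd = 42;
    return 0;
}

/*
 * On failure the new object is released and *newconn is reset to NULL,
 * so a caller that ignores the return value cannot touch freed memory.
 */
static int accept_conn(struct conn **newconn, int fail)
{
    int err;

    *newconn = malloc(sizeof(**newconn));
    if (!*newconn)
        return -ENOMEM;

    err = do_accept(*newconn, fail);
    if (err < 0) {
        free(*newconn);
        *newconn = NULL;
    }
    return err;
}
```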

cfebbe5f

TCP: Fix size calculation in sk_stream_alloc_pskb · 9fcba471

Herbert Xu authored Nov 14, 2007

[TCP]: Fix size calculation in sk_stream_alloc_pskb

[ Upstream commit: fb93134d ]

We round up the header size in sk_stream_alloc_pskb so that
TSO packets get zero tail room.  Unfortunately this rounding
up is not coordinated with the select_size() function used by
TCP to calculate the second parameter of sk_stream_alloc_pskb.

As a result, we may allocate more than a page of data in the
non-TSO case when exactly one page is desired.

In fact, rounding up the head room is detrimental in the non-TSO
case because it makes memory that would otherwise be available to
the payload head room.  TSO doesn't need this either, all it wants
is the guarantee that there is no tail room.

So this patch fixes this by adjusting the skb_reserve call so that
exactly the requested amount (which all callers have calculated in
a precise way) is made available as tail room.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

9fcba471

Fix SKB_WITH_OVERHEAD calculations. · 8522b496

Herbert Xu authored Oct 23, 2007

patch deea84b0 in mainline.

[NET]: Fix SKB_WITH_OVERHEAD calculation

The calculation in SKB_WITH_OVERHEAD is incorrect in that it can cause
an overflow across a page boundary which is what it's meant to prevent.
In particular, the header length (X) should not be lumped together with
skb_shared_info.  The latter needs to be aligned properly while the header
has no choice but to sit in front of wherever the payload is.

Therefore the correct calculation is to take away the aligned size of
skb_shared_info, and then subtract the header length.  The resulting
quantity L satisfies the following inequality:

	SKB_DATA_ALIGN(L + X) + sizeof(struct skb_shared_info) <= PAGE_SIZE

This is the quantity used by alloc_skb to do the actual allocation.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
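
A worked example of the inequality, with made-up numbers: the page size, cache line size and shared-info size below are assumptions chosen to make the arithmetic visible, and `payload_lumped()` is only one reading of the "lumped together" form the message criticises, not a copy of the old macro.

```c
#include <assert.h>

/* Hypothetical constants chosen only to make the arithmetic visible. */
#define PAGE_SZ     4096u
#define CACHE_LINE  64u
#define SHINFO_SZ   200u      /* stand-in for sizeof(struct skb_shared_info) */

#define DATA_ALIGN(x) (((x) + (CACHE_LINE - 1)) & ~(CACHE_LINE - 1))

/* Lumped form: header length and shared info are subtracted together,
 * then the result is rounded down to a cache line. */
static unsigned payload_lumped(unsigned hdr)
{
    return (PAGE_SZ - hdr - SHINFO_SZ) & ~(CACHE_LINE - 1);
}

/* Corrected form: take away the aligned shared info first, then the
 * header length. */
static unsigned payload_fixed(unsigned hdr)
{
    assert(hdr <= PAGE_SZ - DATA_ALIGN(SHINFO_SZ));
    return PAGE_SZ - DATA_ALIGN(SHINFO_SZ) - hdr;
}

/* Bytes the allocation needs for payload L behind a header of length X:
 * SKB_DATA_ALIGN(L + X) + sizeof(struct skb_shared_info). */
static unsigned alloc_size(unsigned payload, unsigned hdr)
{
    return DATA_ALIGN(payload + hdr) + SHINFO_SZ;
}
```

With a 20-byte header, the lumped form yields L = 3840 and an allocation of 4104 bytes, which spills past the 4096-byte page. The corrected form yields L = 3820 and an allocation of 4040 bytes, which fits.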

8522b496

Fix 9P protocol build · b3540573

Ingo Molnar authored Oct 23, 2007

patch 092e9d93 in mainline.

[9P]: build fix with !CONFIG_SYSCTL

found via make randconfig build testing:

 net/built-in.o: In function `init_p9':
 mod.c:(.init.text+0x3b39): undefined reference to `p9_sysctl_register'
 net/built-in.o: In function `exit_p9':
 mod.c:(.exit.text+0x36b): undefined reference to `p9_sysctl_unregister'
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

b3540573

Fix advertised packet scheduler timer resolution · e20e6446

Patrick McHardy authored Oct 23, 2007

patch 3c0cfc13 in mainline

The fourth parameter of /proc/net/psched is supposed to show the timer
resolution and is used by HTB userspace to calculate the necessary
burst rate. Currently we show the clock resolution, which results in a
too low burst rate when the two differ.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

e20e6446

Add get_unaligned to ieee80211_get_radiotap_len · d876cd16

Andy Green authored Oct 09, 2007

patch dfe6e81d in mainline.

ieee80211_get_radiotap_len() tries to dereference radiotap length without
taking care that it is completely unaligned and get_unaligned()
is required.
Signed-off-by: Andy Green <andy@warmcat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
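
The idea behind get_unaligned() can be shown with a portable byte-wise read. This is a userspace sketch, not the kernel helper: `read_le16_unaligned()` and `radiotap_len()` are hypothetical names, and the only layout assumed is that the radiotap header starts with a one-byte version, a one-byte pad and a little-endian 16-bit length.

```c
#include <assert.h>
#include <stdint.h>

/* Read a little-endian 16-bit value from any address, byte by byte,
 * so that no aligned 16-bit load is ever issued. */
static uint16_t read_le16_unaligned(const void *p)
{
    const uint8_t *b = p;

    assert(p != 0);
    return (uint16_t)(b[0] | (b[1] << 8));
}

/* The radiotap header starts with version (u8), pad (u8), length (le16). */
static uint16_t radiotap_len(const uint8_t *hdr)
{
    return read_le16_unaligned(hdr + 2);
}
```

The read works the same whether the header starts at an even or an odd address, which is the property a direct 16-bit dereference does not guarantee on all architectures.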

d876cd16

mac80211: Improve sanity checks on injected packets · 1e3bfd14

Andy Green authored Oct 09, 2007

patch 9b8a74e3 in mainline.

Michael Wu noticed that skb length checking is insufficient when
a packet is presented on the Monitor interface for injection.

This patch improves the sanity checking and removes fake offsets placed
into the skb network and transport header.
Signed-off-by: Andy Green <andy@warmcat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

1e3bfd14

mac80211: filter locally-originated multicast frames · be3d7bec

John W. Linville authored Oct 09, 2007

patch b3316157 in mainline.

In STA mode, the AP will echo our traffic.  This includes multicast
traffic.

Receiving these frames confuses some protocols and applications,
notably IPv6 Duplicate Address Detection.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

be3d7bec

Linux 2.6.23.3 · ef7cede8
Greg Kroah-Hartman authored Nov 16, 2007

ef7cede8

revert "x86_64: allocate sparsemem memmap above 4G" · edc0636c

Linus Torvalds authored Nov 01, 2007

Reverted upstream by commit 6a22c57b

Revert this commit:

	commit 2e1c49db
	Author: Zou Nan hai <nanhai.zou@intel.com>
	Date:   Fri Jun 1 00:46:28 2007 -0700
	
	x86_64: allocate sparsemem memmap above 4G

This reverts commit 2e1c49db.

First off, testing in Fedora has shown it to cause boot failures,
bisected down by Martin Ebourne, and reported by Dave Jones.  So the
commit will likely be reverted in the 2.6.23 stable kernels.

Secondly, in the 2.6.24 model, x86-64 has now grown support for
SPARSEMEM_VMEMMAP, which disables the relevant code anyway, so while the
bug is not visible any more, it's become invisible due to the code just
being irrelevant and no longer enabled on the only architecture that
this ever affected.
Reported-by: Dave Jones <davej@redhat.com>
Tested-by: Martin Ebourne <fedora@ebourne.me.uk>
Cc: Zou Nan hai <nanhai.zou@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

edc0636c

x86: fix TSC clock source calibration error · 963bbb3b

Dave Johnson authored Oct 23, 2007

patch edaf420f in mainline.

I ran into this problem on a system that was unable to obtain NTP sync
because the clock was running very slow (over 10000ppm slow). ntpd had
declared all of its peers 'reject' with 'peer_dist' reason.

On investigation, the tsc_khz variable was significantly incorrect
causing xtime to run slow.  After a reboot tsc_khz was correct so I
did a reboot test to see how often the problem occurred:

Test was done on a 2000 MHz Xeon system.  Of 689 reboots, 8 of them
had unacceptable tsc_khz values (>500ppm):

 range of tsc_khz  # of boots  % of boots
 ----------------  ----------  ----------
        < 1999750           0      0.000%
1999750 - 1999800          21      3.048%
1999800 - 1999850         166     24.128%
1999850 - 1999900         241     35.029%
1999900 - 1999950         211     30.669%
1999950 - 2000000          42      6.105%
2000000 - 2000050           0      0.000%
2000050 - 2000100           0      0.000%
                   [...]
2000100 - 2015000           1      0.145%  << BAD
2015000 - 2030000           6      0.872%  << BAD
2030000 - 2045000           1      0.145%  << BAD
2045000 <                   0      0.000%

The worst boot was 2032.577 MHz, over 1.5% off!

It appears that on rare occasions, mach_countup() is taking longer to
complete than necessary.

I suspect that this is caused by the CPU taking a periodic SMI
interrupt right at the end of the 30ms calibration loop.  This would
cause the loop to delay while the SMI BIOS handler runs. The resulting
TSC value is beyond what it actually should be, resulting in a higher
tsc_khz.

The below patch makes native_calculate_cpu_khz() take the best
(shortest duration, lowest khz) run of its 3 calibration loops.  If an
SMI goes off causing a bad result (long duration, higher khz) it will
be discarded.

With the patch applied, 300 boots of the same system produce good
results:

 range of tsc_khz  # of boots  % of boots
 ----------------  ----------  ----------
        < 1999750           0      0.000%
1999750 - 1999800          30     10.000%
1999800 - 1999850         166     55.333%
1999850 - 1999900          89     29.667%
1999900 - 1999950          15      5.000%
1999950 <                   0      0.000%

Problem was found and tested against 2.6.18.  Patch is against 2.6.22.
Signed-off-by: Dave Johnson <djohnson@sw.starentnetworks.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
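
The selection rule amounts to taking the minimum over the calibration runs. A minimal sketch, with `best_khz()` as a hypothetical name and the sample values invented for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Keep the lowest estimate: a run disturbed by an SMI shows up as a
 * longer duration and therefore a higher khz value. */
static unsigned long best_khz(const unsigned long *runs, size_t n)
{
    unsigned long best;
    size_t i;

    assert(n > 0);
    best = runs[0];
    for (i = 1; i < n; i++)
        if (runs[i] < best)
            best = runs[i];
    return best;
}
```

For three runs of 1999870, 2032577 and 1999910 kHz, the outlier in the middle is discarded and 1999870 is used.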

963bbb3b

x86 setup: sizeof() is unsigned, unbreak comparisons · 2d49e888

H. Peter Anvin authored Oct 25, 2007

patch e6e1ace9 in mainline.


We use signed values for limit checking since the values can go
negative under certain circumstances.  However, sizeof() is unsigned
and forces the comparison to be unsigned, so move the comparison into
the heap_free() macros so we can ensure it is a signed comparison.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
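
The pitfall is easy to reproduce outside the setup code. In the sketch below, `fits_unsigned()` and `fits_signed()` are hypothetical helpers; the casts are written out explicitly to show what the implicit conversion does when a signed int meets a sizeof() result.

```c
#include <assert.h>
#include <stddef.h>

/* What "avail >= sizeof(x)" really computes: the signed operand is
 * converted to size_t, so a negative value becomes a huge positive one.
 * The cast is written out here to make that conversion explicit. */
static int fits_unsigned(int avail, size_t need)
{
    return (size_t)avail >= need;
}

/* Forcing the comparison to be signed gives the intended answer. */
static int fits_signed(int avail, size_t need)
{
    return avail >= (int)need;
}
```

With `avail` at -1, the unsigned comparison claims the space is sufficient, while the signed comparison correctly rejects it.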

2d49e888

x86 setup: handle boot loaders which set up the stack incorrectly · 430bb2ee

H. Peter Anvin authored Oct 25, 2007

patch 6b6815c6 in mainline.

Apparently some specific versions of LILO enter the kernel with a
stack pointer that doesn't match the rest of the segments.  Make our
best attempt at untangling the resulting mess.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

430bb2ee

x86: fix global_flush_tlb() bug · 4b69ffe3

Ingo Molnar authored Oct 19, 2007

patch 9a24d04a upstream

While we were reviewing pageattr_32/64.c for unification,
Thomas Gleixner noticed the following serious SMP bug in
global_flush_tlb():

	down_read(&init_mm.mmap_sem);
	list_replace_init(&deferred_pages, &l);
	up_read(&init_mm.mmap_sem);

this is SMP-unsafe because list_replace_init() done on two CPUs in
parallel can corrupt the list.

This bug was introduced about a year ago in the 64-bit tree:

       commit ea7322de
       Author: Andi Kleen <ak@suse.de>
       Date:   Thu Dec 7 02:14:05 2006 +0100

       [PATCH] x86-64: Speed and clean up cache flushing in change_page_attr

                down_read(&init_mm.mmap_sem);
        -       dpage = xchg(&deferred_pages, NULL);
        +       list_replace_init(&deferred_pages, &l);
                up_read(&init_mm.mmap_sem);

the xchg() based version was SMP-safe, but list_replace_init() is not.
So this "cleanup" introduced a nasty bug.

why this bug never became prominent is a mystery - it can probably be
explained with the (still) relative obscurity of the x86_64 architecture.

the safe fix for now is to write-lock init_mm.mmap_sem.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

4b69ffe3