Commits · c1a2562ac5edcb3965760f4a37368122d85657af · nexedi / linux

20 Sep, 2011 38 commits

powerpc/powernv: Implement MSI support for p5ioc2 PCIe · c1a2562a

Benjamin Herrenschmidt authored Sep 19, 2011

This implements support for MSIs on p5ioc2 PHBs. We only support
MSIs on the PCIe PHBs, not the PCI-X ones as the later hasn't been
properly verified in HW.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

c1a2562a

powerpc/powernv: Add support for p5ioc2 PCI-X and PCIe · 61305a96

Benjamin Herrenschmidt authored Sep 19, 2011

This adds support for PCI-X and PCIe on the p5ioc2 IO hub using
OPAL. This includes allocating & setting up TCE tables and config
space access routines.

This also supports fallbacks via RTAS when OPAL is absent, using
legacy TCE format pre-allocated via the device-tree (BML style)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

61305a96

powerpc/powernv: Machine check and other system interrupts · ed79ba9e

Benjamin Herrenschmidt authored Sep 19, 2011

OPAL can handle various interrupt for us such as Machine Checks (it
performs all sorts of recovery tasks and passes back control to us with
informations about the error), Hardware Management Interrupts and Softpatch
interrupts.

This wires up the mechanisms and prints out specific informations returned
by HAL when a machine check occurs.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

ed79ba9e

powerpc/powernv: Register and handle OPAL interrupts · a125e092

Benjamin Herrenschmidt authored Sep 19, 2011

We do the minimum which is to "pass" interrupts to HAL, which
makes the console smoother and will allow us to implement
interrupt based completion and console.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

a125e092

powerpc/powernv: Add OPAL ICS backend · 5c7c1e94

Benjamin Herrenschmidt authored Sep 19, 2011

OPAL handles HW access to the various ICS or equivalent chips
for us (with the exception of p5ioc2 based HEA which uses a

different backend) similarily to what RTAS does on pSeries.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

5c7c1e94

powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks · 628daa8d

Benjamin Herrenschmidt authored Sep 19, 2011

Implements OPAL RTC and NVRAM support and wire all that up to
the powernv platform.

We use RTAS for RTC as a fallback if available. Using RTAS for nvram
is not supported yet, pending some rework/cleanup and generalization
of the pSeries & CHRP code. We also use RTAS fallbacks for power off
and reboot
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

628daa8d

powerpc/powernv: Hookup reboot and poweroff functions · ec27329f

Benjamin Herrenschmidt authored Sep 19, 2011

This calls the respective HAL functions, and spin on hal_poll_event()
to ensure the HAL has a chance to communicate with the FSP to trigger
the reboot or shutdown operation
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

ec27329f

powerpc/powernv: Support for OPAL console · daea1175

Benjamin Herrenschmidt authored Sep 19, 2011

This adds a udbg and an hvc console backend for supporting a console
using the OPAL console interfaces.

On OPAL v1 we have hvc0 mapped to whatever console the system was
configured for (network or hvsi serial port) via the service
processor.

On OPAL v2 we have hvcN mapped to the Nth console provided by OPAL
which generally corresponds to:

	hvc0 : network console (raw protocol)
	hvc1 : serial port S1 (hvsi)
	hvc2 : serial port S2 (hvsi)

Note: At this point, early debug console only works with OPAL v1
and shouldn't be enabled in a normal kernel.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

daea1175

powerpc/powernv: Add support for instanciating OPAL v2 from Open Firmware · 6e35d5da

Benjamin Herrenschmidt authored Sep 19, 2011

OPAL v2 is instantiated in a way similar to RTAS using Open Firmware
client interface calls, and the resulting address and entry point are
put in the device-tree
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

6e35d5da

powerpc/powernv: Basic support for OPAL · 14a43e69

Benjamin Herrenschmidt authored Sep 19, 2011

Add definition of OPAL interfaces along with  the wrappers to call
into OPAL runtime and the early device-tree parsing hook to locate
the OPAL runtime firmware.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

14a43e69

powerpc/powernv: Get kernel command line accross OPAL takeover · 817c21ad

Benjamin Herrenschmidt authored Sep 19, 2011

We stash it in boot_command_line which isn't in BSS and so won't
be overwritten. We then use that as a default cmd_line before
we walk the device-tree.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

817c21ad

powerpc/powernv: Add OPAL takeover from PowerVM · 27f44888

Benjamin Herrenschmidt authored Sep 19, 2011

On machines supporting the OPAL firmware version 1, the system
is initially booted under pHyp. We then use a special hypercall
to verify if OPAL is available and if it is, we then trigger
a "takeover" which disables pHyp and loads the OPAL runtime
firmware, giving control to the kernel in hypervisor mode.

This patch add the necessary code to detect that the OPAL takeover
capability is present when running under PowerVM (aka pHyp) and
perform said takeover to get hypervisor control of the processor.

To perform the takeover, we must first use RTAS (within Open
Firmware runtime environment) to start all processors & threads,
in order to give control to OPAL on all of them. We then call
the takeover hypercall on everybody, OPAL will re-enter the kernel
main entry point passing it a flat device-tree.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

27f44888

powerpc/powernv: Add CPU hotplug support · 344eb010

Benjamin Herrenschmidt authored Sep 19, 2011

Unplugged CPU go into NAP mode in a loop until woken up
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

344eb010

of: Change logic to overwrite cmd_line with CONFIG_CMDLINE · 78b782cb

Benjamin Herrenschmidt authored Sep 19, 2011

We used to overwrite with CONFIG_CMDLINE if we found a chosen
node but failed to get bootargs out of it or they were empty,
unless CONFIG_CMDLINE_FORCE is set.

Instead change that to overwrite if "data" is non empty after
the bootargs check. It allows arch code to have other mechanisms
to retrieve the command line prior to parsing the device-tree.

Note: CONFIG_CMDLINE_FORCE case should ideally be handled elsewhere
as it won't work as it-is if the device-tree has no /chosen node
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: devicetree-discuss@lists-ozlabs.org
CC: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Grant Likely <grant.likely@secretlab.ca>

78b782cb

powerpc: Add skeleton PowerNV platform · 55190f88

Benjamin Herrenschmidt authored Sep 19, 2011

This adds a skeletton for the new Power "Non Virtualized"
platform which will be used by machines supporting running
without an hypervisor, for example in order to run KVM.

These machines will be using a new firmware called OPAL
for which the support will be provided by later patches.

The PowerNV platform is intended to be also usable under
the BML environment used internally for early CPU bringup
which is why the code also supports using RTAS instead of
OPAL in various places.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

55190f88

powerpc/powernv: Don't clobber r9 in relative_toc() · e550592e

Benjamin Herrenschmidt authored Sep 19, 2011

With OPAL, r8 and r9 will be used to pass the OPAL base and entry
for debugging purposes (those informations are also in the
device-tree). We don't want to clobber those registers that
early.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

e550592e

powerpc/pci: Call pcie_bus_configure_settings() · 781fb7a3

Benjamin Herrenschmidt authored Sep 19, 2011

This new function is used to properly setup the PCI Express Max Payload Size
(and in some circumstances Max Read Request Size).

Some systems will not operate properly if these aren't set correctly and
the firmware doesn't always do it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

781fb7a3

powerpc/smp: More generic support for "soft hotplug" · fb82b839

Benjamin Herrenschmidt authored Sep 19, 2011

This adds more generic support for doing CPU hotplug with a simple
idle loop and no actual reset of the processors. The generic
smp_generic_kick_cpu() does the hotplug bringup trick if the PACA
shows that the CPU has already been started at boot and we provide
an accessor for the CPU state.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

fb82b839

powerpc/udbg: Fix Kconfig entry for avoiding 44x early debug with KVM · b8bb922c

Benjamin Herrenschmidt authored Sep 19, 2011

It was preventing the global early debug selection whenever KVM was enabled
instead of only preventing the 440 specific one.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

b8bb922c

hvcs: Ensure page aligned partner info buffer · ac07a4a5

Brian King authored Sep 13, 2011

The Power platform requires the partner info buffer to be page aligned
otherwise it will fail the partner info hcall with H_PARAMETER. Switch
from using kmalloc to allocate this buffer to __get_free_page to ensure
page alignment.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

ac07a4a5

powerpc: Fix deadlock in icswx code · 8bdafa39

Anton Blanchard authored Sep 14, 2011

The icswx code introduced an A-B B-A deadlock:

     CPU0                    CPU1
     ----                    ----
lock(&anon_vma->mutex);
                             lock(&mm->mmap_sem);
                             lock(&anon_vma->mutex);
lock(&mm->mmap_sem);

Instead of using the mmap_sem to keep mm_users constant, take the
page table spinlock.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

8bdafa39

powerpc: Fix oops when echoing bad values to /sys/devices/system/memory/probe · a1194097

Anton Blanchard authored Aug 10, 2011

If we echo an address the hypervisor doesn't like to
/sys/devices/system/memory/probe we oops the box:

# echo 0x10000000000 > /sys/devices/system/memory/probe

kernel BUG at arch/powerpc/mm/hash_utils_64.c:541!

The backtrace is:

create_section_mapping
arch_add_memory
add_memory
memory_probe_store
sysdev_class_store
sysfs_write_file
vfs_write
SyS_write

In create_section_mapping we BUG if htab_bolt_mapping returned
an error. A better approach is to return an error which will
propagate back to userspace.

Rerunning the test with this patch applied:

# echo 0x10000000000 > /sys/devices/system/memory/probe
-bash: echo: write error: Invalid argument
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

a1194097

powerpc: Coding style cleanups · dfbe93a2

Anton Blanchard authored Aug 10, 2011

While converting code to use for_each_node_by_type I noticed a
number of coding style issues.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

dfbe93a2

powerpc: Use for_each_node_by_type instead of open coding it · 94db7c5e

Anton Blanchard authored Aug 10, 2011

Use for_each_node_by_type instead of open coding it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

94db7c5e

powerpc/numa: Remove double of_node_put in hot_add_node_scn_to_nid · 60831842

Anton Blanchard authored Aug 10, 2011

During memory hotplug testing, I got the following warning:

ERROR: Bad of_node_put() on /memory@0

of_node_release
kref_put
of_node_put
of_find_node_by_type
hot_add_node_scn_to_nid
hot_add_scn_to_nid
memory_add_physaddr_to_nid
...

of_find_node_by_type() loop does the of_node_put for us so we only
need the handle the case where we terminate the loop early.

As suggested by Stephen Rothwell we can do the of_node_put
unconditionally outside of the loop since of_node_put handles a
NULL argument fine.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

60831842

powerpc/numa: Remove duplicate RECLAIM_DISTANCE definition · e377bc5d

Anton Blanchard authored Jul 24, 2011

We have two identical definitions of RECLAIM_DISTANCE, looks like
the patch got applied twice. Remove one.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

e377bc5d

powerpc/numa: Disable NEWIDLE balancing at node level · 7bebcf09

Anton Blanchard authored Jul 24, 2011

On big POWER7 boxes we see large amounts of CPU time in system
processes like workqueue and watchdog kernel threads.

We currently rebalance the entire machine each time a task goes
idle and this is very expensive on large machines. Disable newidle
balancing at the node level and rely on the scheduler tick to
rebalance across nodes.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

7bebcf09

powerpc/numa: Increase SD_NODES_PER_DOMAIN to 32. · d4761ad2

Anton Blanchard authored Jul 24, 2011

The largest POWER7 boxes have 32 nodes. SD_NODES_PER_DOMAIN groups
nodes into chunks of 16 and adds a global balancing domain
(SD_ALLNODES) above it.

If we bump SD_NODES_PER_DOMAIN to 32, then we avoid this extra
level of balancing on our largest boxes.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

d4761ad2

sched: Allow SD_NODES_PER_DOMAIN to be overridden · 590e4d85

Anton Blanchard authored Jul 24, 2011

We want to override the default value of SD_NODES_PER_DOMAIN on ppc64,
so move it into linux/topology.h.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

590e4d85

powerpc/numa: Enable SD_WAKE_AFFINE in node definition · a200d8e4

Anton Blanchard authored Jul 24, 2011

When chasing a performance issue on ppc64, I noticed tasks
communicating via a pipe would often end up on different nodes.

It turns out SD_WAKE_AFFINE is not set in our node defition. Commit
9fcd18c9 (sched: re-tune balancing) enabled SD_WAKE_AFFINE
in the node definition for x86 and we need a similar change for
ppc64.

I used lmbench lat_ctx and perf bench pipe to verify this fix. Each
benchmark was run 10 times and the average taken.

lmbench lat_ctx:

before:  66565 ops/sec
after:  204700 ops/sec

3.1x faster

perf bench pipe:

before: 5.6570 usecs
after:  1.3470 usecs

4.2x faster
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

a200d8e4

Merge remote-tracking branch 'origin/master' into next · 1cce058b
Benjamin Herrenschmidt authored Sep 20, 2011
```
(Merge in order to get the PCIe mps/mrss code fixes)
```
1cce058b

Merge branch 'irq-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip · 9d037a77

Linus Torvalds authored Sep 19, 2011

* 'irq-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  x86, iommu: Mark DMAR IRQ as non-threaded
  genirq: Make irq_shutdown() symmetric vs. irq_startup again

9d037a77

Merge branch 'for-linus' of git://github.com/chrismason/linux · 50f2d407

Linus Torvalds authored Sep 19, 2011

* 'for-linus' of git://github.com/chrismason/linux:
  Btrfs: only clear the need lookup flag after the dentry is setup
  BTRFS: Fix lseek return value for error
  Btrfs: don't change inode flag of the dest clone file
  Btrfs: don't make a file partly checksummed through file clone
  Btrfs: fix pages truncation in btrfs_ioctl_clone()
  btrfs: fix d_off in the first dirent

50f2d407

USB: xHCI: prevent infinite loop when processing MSE event · c2d7b49f

Andiry Xu authored Sep 19, 2011

When a xHC host is unable to handle isochronous transfer in the
interval, it reports a Missed Service Error event and skips some tds.

Currently xhci driver handles MSE event in the following ways:

1. When encounter a MSE event, set ep->skip flag, update event ring
   dequeue pointer and return.

2. When encounter the next event on this ep, the driver will run the
   do-while loop, fetch td from ep's td_list to find the td
   corresponding to this event.  All tds missed are marked as short
   transfer(-EXDEV).

The do-while loop will end in two ways:

1. If the td pointed by the event trb is found;

2. If the ep ring's td_list is empty.

However, if a buggy HW reports some unpredicted event (for example, an
overrun event following a MSE event while the ep ring is actually not
empty), the driver will never find the td, and it will loop until the
td_list is empty.

Unfortunately, the spinlock is dropped when give back a urb in the
do-while loop.  During the spinlock released period, the class driver
may still submit urbs and add tds to the td_list.  This may cause
disaster, since the td_list will never be empty and the loop never ends,
and the system hangs.

To fix this, count the number of TDs on the ep ring before skipping TDs,
and quit the loop when skipped that number of tds.  This guarantees the
do-while loop will end after certain number of cycles, and driver will
not be trapped in an infinite loop.
Signed-off-by: Andiry Xu <andiry.xu@amd.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c2d7b49f

USB: xhci: Set change bit when warm reset change is set. · 44f4c3ed

Greg KH authored Sep 19, 2011

Sometimes, when a USB 3.0 device is disconnected, the Intel Panther
Point xHCI host controller will report a link state change with the
state set to "SS.Inactive". This causes the xHCI host controller to
issue a warm port reset, which doesn't finish before the USB core times
out while waiting for it to complete.

When the warm port reset does complete, and the xHC gives back a port
status change event, the xHCI driver kicks khubd. However, it fails to
set the bit indicating there is a change event for that port because the
logic in xhci-hub.c doesn't check for the warm port reset bit.

After that, the warm port status change bit is never cleared by the USB
core, and the xHC stops reporting port status change bits. (The xHCI
spec says it shouldn't report more port events until all change bits are
cleared.) This means any port changes when a new device is connected
will never be reported, and the port will seem "dead" until the xHCI
driver is unloaded and reloaded, or the computer is rebooted. Fix this
by making the xHCI driver set the port change bit when a warm port reset
change bit is set.

A better solution would be to make the USB core handle warm port reset
in differently, merging the current code with the standard port reset
code that does an incremental backoff on the timeout, and tries to
complete the port reset two more times before giving up. That more
complicated fix will be merged next window, and this fix will be
backported to stable.

This should be backported to kernels as old as 3.0, since that was the
first kernel with commit a11496eb ("xHCI: warm reset support").
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

44f4c3ed

staging: fix comedi build when ISA_DMA_API is enabled but COMEDI_PCI is not enabled · c19cc78e

Randy Dunlap authored Sep 19, 2011

Fix build when CONFIG_ISA_DMA_API is enabled but
CONFIG_COMEDI_PCI[_DRIVERS] is not enabled.

Fixes these build errors:

drivers/staging/comedi/drivers/ni_labpc.c: In function 'labpc_ai_cmd':
drivers/staging/comedi/drivers/ni_labpc.c:1351: error: implicit declaration of function 'labpc_suggest_transfer_size'
drivers/staging/comedi/drivers/ni_labpc.c: At top level:
drivers/staging/comedi/drivers/ni_labpc.c:1802: error: conflicting types for 'labpc_suggest_transfer_size'
drivers/staging/comedi/drivers/ni_labpc.c:1351: note: previous implicit declaration of 'labpc_suggest_transfer_size' was here
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c19cc78e

Make taskstats round statistics down to nearest 1k bytes/events · 58c3c3aa

Linus Torvalds authored Sep 19, 2011

Even with just the interface limited to admin, there really is little to
reason to give byte-per-byte counts for taskstats.  So round it down to
something less intrusive.
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

58c3c3aa

Make TASKSTATS require root access · 1a51410a

Linus Torvalds authored Sep 19, 2011

Ok, this isn't optimal, since it means that 'iotop' needs admin
capabilities, and we may have to work on this some more.  But at the
same time it is very much not acceptable to let anybody just read
anybody elses IO statistics quite at this level.

Use of the GENL_ADMIN_PERM suggested by Johannes Berg as an alternative
to checking the capabilities by hand.
Reported-by: Vasiliy Kulikov <segoon@openwall.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1a51410a

19 Sep, 2011 2 commits

powerpc/ps3: Add gelic udbg driver · c26afe9e

Hector Martin authored Aug 31, 2011

Add a new udbg driver for the PS3 gelic Ehthernet device.

This driver shares only a few stucture and constant definitions with the
gelic Ethernet device driver, so is implemented as a stand-alone driver
with no dependencies on the gelic Ethernet device driver.
Signed-off-by: Hector Martin <hector@marcansoft.com>
Signed-off-by: Andre Heider <a.heider@gmail.com>
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

c26afe9e

powerpc/eeh: Fix /proc/ppc64/eeh creation · 8feaa434

Thadeu Lima de Souza Cascardo authored Aug 26, 2011

Since commit 188917e1, /proc/ppc64 is a
symlink to /proc/powerpc/. That means that creating /proc/ppc64/eeh will
end up with a unaccessible file, that is not listed under /proc/powerpc/
and, then, not listed under /proc/ppc64/.

Creating /proc/powerpc/eeh fixes that problem and maintain the
compatibility intended with the ppc64 symlink.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: <stable@kernel.org>	[3.x]

8feaa434