- 13 Mar, 2014 8 commits
-
-
Hans Schillstrom authored
[ Upstream commit accfe0e3 ] The commit 9195bb8e ("ipv6: improve ipv6_find_hdr() to skip empty routing headers") broke ipv6_find_hdr(). When a target such as IPPROTO_ICMPV6 is specified, ipv6_find_hdr() returns -ENOENT when the header is found, instead of returning the header as expected. Part of IPVS is broken, and possibly also nft_exthdr_eval(). When the target is -1, as it is in most cases, it works. This patch exits the do-while loop if the specific header is found, so that nexthdr can be returned as expected. Reported-by: Art -kwaak- van Breemen <ard@telegraafnet.nl> Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Cc: Ansis Atteka <aatteka@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
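The shape of the fix, as a minimal hedged sketch: the header chain is modeled as a plain array and the extension-header parsing is elided, so this only illustrates the control flow, not the real ipv6_find_hdr().

  #include <stdbool.h>

  /* Sketch: the loop must break once the target header is seen, so the
   * caller gets the header value back instead of -ENOENT. */
  static int find_hdr_sketch(const int *chain, int len, int target)
  {
          int i = 0, nexthdr = chain[0];
          bool found;

          do {
                  found = (target < 0 || nexthdr == target);
                  if (found)
                          break;          /* the missing exit the patch adds */
                  if (++i >= len)
                          return -2;      /* stands in for -ENOENT */
                  nexthdr = chain[i];     /* advance to the next header */
          } while (!found);

          return nexthdr;                 /* the matched header, as expected */
  }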
-
Edward Cree authored
[ Upstream commit 8f355e5c ] If we receive a PTP event from the NIC when we haven't set up PTP state in the driver, we attempt to read through a NULL pointer efx->ptp_data, triggering a panic. Signed-off-by: Edward Cree <ecree@solarflare.com> Acked-by: Shradha Shah <sshah@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
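A minimal kernel-context sketch of the guard (the handler body is elided; the warning style mirrors the sfc driver's conventions but is illustrative):

  /* Sketch: bail out of the PTP event handler when PTP state was never
   * set up, instead of reading through a NULL efx->ptp_data. */
  void efx_ptp_event_sketch(struct efx_nic *efx, efx_qword_t *ev)
  {
          struct efx_ptp_data *ptp = efx->ptp_data;

          if (!ptp) {
                  netif_warn(efx, drv, efx->net_dev,
                             "Received PTP event but PTP not set up\n");
                  return;
          }
          /* ... normal handling dereferences ptp from here on ... */
  }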
-
Hannes Frederic Sowa authored
[ Upstream commit 916e4cf4 ] Currently we generate a new fragmentation id on UFO segmentation. It is pretty hairy to identify the correct net namespace and dst there. Especially tunnels use IFF_XMIT_DST_RELEASE and thus have no skb_dst available at all. This causes unreliable or very predictable ipv6 fragmentation id generation during segmentation. Luckily we have already pregenerated the ip6_frag_id in ip6_ufo_append_data and can use it here. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Jason Wang authored
[ Upstream commit 0e7ede80 ] We should also allocate big buffers when the guest can receive UFO packets, so that big packets fit into the guest rx buffer. Fixes: 5c516751 ("virtio-net: Allow UFO feature to be set and advertised.") Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
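The gating condition, as a hedged sketch (this mirrors how virtio-net decides on big receive buffers from the guest GSO features; the helper name is made up):

  /* Sketch: big buffers are needed whenever the guest may receive
   * large offloaded packets -- now including UFO. */
  static bool guest_needs_big_buffers(struct virtio_device *vdev)
  {
          return virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
                 virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
                 virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN)  ||
                 virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO); /* new */
  }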
-
Duan Jiong authored
[ Upstream commit feff9ab2 ] If the number of entries in the neigh table is less than gc_thresh1, the function returns directly and reachable_time is never recomputed, so reachable_time can be guessed. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
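A hedged kernel-context sketch of the reordering in neigh_periodic_work(): the periodic re-randomization of reachable_time moves ahead of the gc_thresh1 early exit (field names follow the 3.12-era neighbour code but should be treated as illustrative).

  static void neigh_periodic_work_sketch(struct neigh_table *tbl)
  {
          write_lock_bh(&tbl->lock);

          /* Re-randomize reachable_time roughly every 300 seconds,
           * even for small tables. */
          if (time_after(jiffies, tbl->last_rand + 300 * HZ)) {
                  struct neigh_parms *p;

                  tbl->last_rand = jiffies;
                  for (p = &tbl->parms; p; p = p->next)
                          p->reachable_time =
                              neigh_rand_reach_time(p->base_reachable_time);
          }

          /* The small-table early exit now comes *after* the update. */
          if (atomic_read(&tbl->entries) < tbl->gc_thresh1)
                  goto out;

          /* ... garbage-collect stale entries ... */
  out:
          write_unlock_bh(&tbl->lock);
  }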
-
Eric Dumazet authored
[ Upstream commit f5ddcbbb ] This patch fixes two bugs in fastopen: 1) The tcp_sendmsg(..., @size) argument was ignored. Code was relying on the user not fooling the kernel with iovec mismatches. 2) When the MTU is about 64KB, tcp_send_syn_data() attempts order-5 allocations, which are likely to fail when memory gets fragmented. Fixes: 783237e8 ("net-tcp: Fast Open client - sending SYN-data") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Tested-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Fernando Luis Vazquez Cao authored
[ Upstream commit 6671b224 ] Even though only the outer vlan tag can be HW-accelerated in the transmission path, in the TUN/TAP driver vlan_features mirrors hw_features, which happens to have the NETIF_F_HW_VLAN_?TAG_TX flags set. Because of this, during packet transmission through a stacked vlan device, dev_hard_start_xmit (incorrectly assuming that the vlan device supports hardware vlan acceleration) does not add the vlan header to the skb payload, and the inner vlan tags are lost (vlan_tci contains the outer vlan tag when userspace reads the packet from the tap device). Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Toshiaki Makita authored
[ Upstream commit 8d0d21f4 ] Even if we create a stacked vlan interface such as veth0.10.20, it sends single tagged frames (tagged with only vid 10). Because vlan_features of a veth interface has the NETIF_F_HW_VLAN_[CTAG/STAG]_TX bits, veth0.10 also has that feature, so dev_hard_start_xmit(veth0.10) doesn't call __vlan_put_tag() and vlan_dev_hard_start_xmit(veth0.10) overwrites vlan_tci. This prevents us from using a combination of 802.1ad and 802.1Q in containers, etc. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
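Both this fix and the preceding TUN/TAP fix come down to masking the vlan TX-acceleration flags out of vlan_features. A hedged sketch of the idea (not the exact hunks from either patch):

  /* Sketch: only the outermost tag can be HW-accelerated, so a device
   * whose packets may gain a second (stacked) tag must not advertise
   * vlan TX offload in vlan_features. */
  #define STACKED_VLAN_TX_FLAGS (NETIF_F_HW_VLAN_CTAG_TX | \
                                 NETIF_F_HW_VLAN_STAG_TX)

  static void setup_vlan_features_sketch(struct net_device *dev)
  {
          dev->vlan_features = dev->features & ~STACKED_VLAN_TX_FLAGS;
  }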
-
- 12 Mar, 2014 32 commits
-
-
Waiman Long authored
commit a767f680 upstream. Currently, the ebitmap_node structure has a fixed size of 32 bytes. On a 32-bit system, the overhead is 8 bytes, leaving 24 bytes for use as bitmaps. The overhead ratio is 1/4. On a 64-bit system, the overhead is 16 bytes. Therefore, only 16 bytes are left for bitmap use and the overhead ratio is 1/2. With a 3.8.2 kernel, a boot-up operation will cause the ebitmap_get_bit() function to be called about 9 million times. The average number of ebitmap_node traversals is about 3.7. This patch increases the size of the ebitmap_node structure to 64 bytes on 64-bit systems to keep the overhead ratio at 1/4. This may also improve performance a little bit by making node-to-node traversal less frequent (< 2) as more bits are available in each node. Signed-off-by: Waiman Long <Waiman.Long@hp.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Waiman Long authored
commit fee71142 upstream. While running the high_systime workload of the AIM7 benchmark on a 2-socket 12-core Westmere x86-64 machine running the 3.10-rc4 kernel (with HT on), it was found that a sizable amount of time was spent in the SELinux code. Below is the perf trace of a "perf record -a -s" test run at 1500 users:

  5.04%  ls  [kernel.kallsyms]  [k] ebitmap_get_bit
  1.96%  ls  [kernel.kallsyms]  [k] mls_level_isvalid
  1.95%  ls  [kernel.kallsyms]  [k] find_next_bit

ebitmap_get_bit() was the hottest function in the perf-report output. Both ebitmap_get_bit() and find_next_bit() were, in fact, called by mls_level_isvalid(). As a result, the mls_level_isvalid() call consumed 8.95% of the total CPU time of all 24 virtual CPUs, which is quite a lot. The majority of the mls_level_isvalid() invocations come from the socket creation system call. Looking at the mls_level_isvalid() function, it checks whether all the bits set in one ebitmap structure are also set in another one, and whether the highest set bit is no bigger than the one specified by the given policydb data structure. It does this in a bit-by-bit manner, so if the ebitmap structure has many bits set, the iteration loop runs many times. The code can be rewritten to use a similar algorithm to that of the ebitmap_contains() function, with an additional check for the highest set bit. The ebitmap_contains() function was extended to cover an optional check for the highest set bit, and the mls_level_isvalid() function was modified to call ebitmap_contains(). With that change, the perf trace showed that the CPU time used dropped to just 0.08% (ebitmap_contains + mls_level_isvalid) of the total, which is about 100X less than before:

  0.07%  ls  [kernel.kallsyms]  [k] ebitmap_contains
  0.05%  ls  [kernel.kallsyms]  [k] ebitmap_get_bit
  0.01%  ls  [kernel.kallsyms]  [k] mls_level_isvalid
  0.01%  ls  [kernel.kallsyms]  [k] find_next_bit

The remaining ebitmap_get_bit() and find_next_bit() calls are made by other kernel routines, as the new mls_level_isvalid() no longer calls them. This patch also improves the high_systime AIM7 benchmark result, though the improvement is not as impressive as the reduction in ebitmap CPU time suggests. The table below shows the performance change on the 2-socket x86-64 system (with HT on) mentioned above.

  +--------------+---------------+----------------+-----------------+
  |   Workload   | mean % change | mean % change  | mean % change   |
  |              | 10-100 users  | 200-1000 users | 1100-2000 users |
  +--------------+---------------+----------------+-----------------+
  | high_systime |     +0.1%     |     +0.9%      |     +2.6%       |
  +--------------+---------------+----------------+-----------------+

Signed-off-by: Waiman Long <Waiman.Long@hp.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
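The resulting check, as a hedged userspace sketch: the kernel's ebitmap is a chained node structure, so a flat array only models the word-at-a-time algorithm, not the real data layout.

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  static bool bitmap_contains(const uint64_t *super, const uint64_t *sub,
                              size_t nwords, unsigned int high_bit)
  {
          /* Every bit set in 'sub' must also be set in 'super'. */
          for (size_t i = 0; i < nwords; i++)
                  if (sub[i] & ~super[i])
                          return false;

          /* The highest set bit of 'sub' must not exceed 'high_bit'. */
          for (size_t i = nwords; i-- > 0; ) {
                  if (!sub[i])
                          continue;
                  return (i * 64 + 63 - __builtin_clzll(sub[i])) <= high_bit;
          }
          return true;    /* an empty bitmap is trivially contained */
  }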
-
Mel Gorman authored
commit c78e9363 upstream. It has been reported on very large machines that show_mem is taking almost 5 minutes to display information. This is a serious problem if there is an OOM storm. The bulk of the cost is in show_mem doing a very expensive PFN walk to give us the following information:

  Total RAM:       Also available as totalram_pages
  Highmem pages:   Also available as totalhigh_pages
  Reserved pages:  Can be inferred from the zone structure
  Shared pages:    PFN walk required
  Unshared pages:  PFN walk required
  Quick pages:     Per-cpu walk required

Only the shared/unshared pages require a full PFN walk, but that information is useless. It is also inaccurate, as page pins of unshared pages would be accounted for as shared. Even if the information were accurate, I'm struggling to think how the shared/unshared information could be useful for debugging OOM conditions. Maybe it was useful before rmap existed, when reclaiming shared pages was costly, but it is less relevant today. The PFN walk could be optimised a bit, but why bother, as the information is useless. This patch deletes the PFN walker and infers the total RAM, highmem and reserved page counts from struct zone. It omits the shared/unshared page usage on the grounds that it is useless. It also corrects the reporting of HighMem as HighMem/MovableOnly, as ZONE_MOVABLE has problems similar to HighMem with respect to lowmem/highmem exhaustion. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Jason Baron authored
commit 4ff36ee9 upstream. The EPOLL_CTL_DEL path of epoll contains a classic ab-ba deadlock. That is, epoll_ctl(a, EPOLL_CTL_DEL, b, x) will deadlock with epoll_ctl(b, EPOLL_CTL_DEL, a, x). The deadlock was introduced with commit 67347fe4 ("epoll: do not take global 'epmutex' for simple topologies"). The acquisition of the ep->mtx for the destination 'ep' was added so that a concurrent EPOLL_CTL_ADD operation would see the correct state of the ep (specifically, the check for '!list_empty(&f.file->f_ep_links)'). However, by simply not acquiring the lock, we do not serialize behind the ep->mtx from the add path, and thus may perform a full path check that would have been unnecessary had we waited a little longer. However, this is a transient state, and performing the full loop checking in this case is not harmful. The important point is that we wouldn't miss doing the full loop checking when required, since EPOLL_CTL_ADD always locks any 'ep's that it is operating upon. The reason we don't need lock ordering in the add path is that we are already holding the global 'epmutex' whenever we do the double lock. Further, the original posting of this patch, which was tested for the intended performance gains, did not perform this additional locking. Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: Nathan Zimmer <nzimmer@sgi.com> Cc: Eric Wong <normalperson@yhbt.net> Cc: Nelson Elhage <nelhage@nelhage.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Jason Baron authored
commit 67347fe4 upstream. When calling EPOLL_CTL_ADD for an epoll file descriptor that is attached directly to a wakeup source, we do not need to take the global 'epmutex', unless the epoll file descriptor is nested. The purpose of taking the 'epmutex' on add is to prevent complex topologies such as loops and deep wakeup paths from forming in parallel through multiple EPOLL_CTL_ADD operations. However, for the simple case of an epoll file descriptor attached directly to a wakeup source (with no nesting), we do not need to hold the 'epmutex'. This patch along with 'epoll: optimize EPOLL_CTL_DEL using rcu' improves scalability on larger systems. Quoting Nathan Zimmer's mail on SPECjbb performance: "On the 16 socket run the performance went from 35k jOPS to 125k jOPS. In addition the benchmark went from scaling well on 10 sockets to scaling well on just over 40 sockets. ... Currently the benchmark stops scaling at around 40-44 sockets but it seems like I found a second unrelated bottleneck." [akpm@linux-foundation.org: use `bool' for boolean variables, remove unneeded/undesirable cast of void*, add missed ep_scan_ready_list() kerneldoc] Signed-off-by: Jason Baron <jbaron@akamai.com> Tested-by: Nathan Zimmer <nzimmer@sgi.com> Cc: Eric Wong <normalperson@yhbt.net> Cc: Nelson Elhage <nelhage@nelhage.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Jason Baron authored
commit ae10b2b4 upstream. Nathan Zimmer found that once we get over 10+ cpus, the scalability of SPECjbb falls over due to the contention on the global 'epmutex', which is taken on EPOLL_CTL_ADD and EPOLL_CTL_DEL operations. Patch #1 removes the 'epmutex' lock completely from the EPOLL_CTL_DEL path by using rcu to guard against any concurrent traversals. Patch #2 removes the 'epmutex' lock from EPOLL_CTL_ADD operations for simple topologies, i.e. when adding a link from an epoll file descriptor to a wakeup source, where the epoll file descriptor is not nested. This patch (of 2): Optimize EPOLL_CTL_DEL such that it does not require the 'epmutex' by converting the file->f_ep_links list into an rcu one. In this way, we can traverse the epoll network on the add path in parallel with deletes. Since deletes can't create loops or worse wakeup paths, this is safe. This patch, in combination with the patch "epoll: Do not take global 'epmutex' for simple topologies", shows a dramatic performance improvement in scalability for SPECjbb. Signed-off-by: Jason Baron <jbaron@akamai.com> Tested-by: Nathan Zimmer <nzimmer@sgi.com> Cc: Eric Wong <normalperson@yhbt.net> Cc: Nelson Elhage <nelhage@nelhage.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Jiri Slaby authored
commit 5f01c988 upstream. Consider a kernel crash in a module, simulated the following way:

  static int my_init(void)
  {
          char *map = (void *)0x5;
          *map = 3;
          return 0;
  }
  module_init(my_init);

When we turn off FRAME_POINTERs, the very first instruction in that function causes a BUG. The problem is that we print the IP in the BUG report using %pB (from printk_address). And %pB decrements the pointer by one to fix printing addresses of functions with tail calls. This was added in commit 71f9e598 ("x86, dumpstack: Use %pB format specifier for stack trace") to fix the call stack printouts. So instead of the correct output:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
  IP: [<ffffffffa01ac000>] my_init+0x0/0x10 [pb173]

we get:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
  IP: [<ffffffffa0152000>] 0xffffffffa0151fff

To fix that, we use %pB only for stack address printouts (via the newly added printk_stack_address) and %pS for regs->ip (via printk_address), i.e. we revert to the old behaviour for everything except call stacks. And since all remaining callers pass reliable=1, that parameter is removed from printk_address. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: joe@perches.com Cc: jirislaby@gmail.com Link: http://lkml.kernel.org/r/1382706418-8435-1-git-send-email-jslaby@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
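The resulting split, as a hedged sketch (close to the actual x86 dumpstack helpers, but treat the details as illustrative):

  /* %pB assumes a return address and backs the pointer up by one before
   * symbol lookup; that is right only for call-stack entries. */
  static void printk_stack_address(unsigned long address, int reliable)
  {
          pr_cont(" [<%p>] %s%pB\n",
                  (void *)address, reliable ? "" : "? ", (void *)address);
  }

  /* regs->ip is a direct address: resolve it as-is with %pS. */
  void printk_address(unsigned long address)
  {
          pr_cont(" [<%p>] %pS\n", (void *)address);
  }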
-
NeilBrown authored
commit 93dc41bd upstream. We have one report of a crash in xs_tcp_setup_socket. The call path to the crash is: xs_tcp_setup_socket -> inet_stream_connect -> lock_sock_nested. The 'sock' passed to that last function is NULL. The only way I can see this happening is a concurrent call to xs_close: xs_close -> xs_reset_transport -> sock_release -> inet_release. inet_release sets sock->sk = NULL, and inet_stream_connect calls lock_sock(sock->sk), which gets NULL. All calls to xs_close are protected by XPRT_LOCKED, as are most activations of the workqueue which runs xs_tcp_setup_socket. The exception is xs_tcp_schedule_linger_timeout. So presumably the timeout queued by the latter fires exactly when some other code runs xs_close(). To protect against this we can move the cancel_delayed_work_sync() call from xs_destroy() to xs_close(). As xs_close is never called from the worker scheduled on ->connect_worker, this can never deadlock. Signed-off-by: NeilBrown <neilb@suse.de> [Trond: Make it safe to call cancel_delayed_work_sync() on AF_LOCAL sockets] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Shawn Bohrer authored
commit 6bfa687c upstream. In 76854c7e ("sched: Use rt.nr_cpus_allowed to recover select_task_rq() cycles") an optimization was added to select_task_rq_rt() that immediately returns when p->nr_cpus_allowed == 1 at the beginning of the function. This makes the latter p->nr_cpus_allowed > 1 check redundant, which can now be removed. Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <mgalbraith@suse.de> Cc: tomk@rgmadvisors.com Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1380914693-24634-1-git-send-email-shawn.bohrer@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Peter Zijlstra authored
commit 7c3f2ab7 upstream. While discussing the proposed SCHED_DEADLINE patches which in parts mimic the existing FIFO code it was noticed that the wmb in rt_set_overloaded() didn't have a matching barrier. The only site using rt_overloaded() to test the rto_count is pull_rt_task() and we should issue a matching rmb before then assuming there's an rto_mask bit set. Without that smp_rmb() in there we could actually miss seeing the rto_mask bit. Also, change to using smp_[wr]mb(), even though this is SMP only code; memory barriers without smp_ always make me think they're against hardware of some sort. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: vincent.guittot@linaro.org Cc: luca.abeni@unitn.it Cc: bruce.ashfield@windriver.com Cc: dhaval.giani@gmail.com Cc: rostedt@goodmis.org Cc: hgu1972@gmail.com Cc: oleg@redhat.com Cc: fweisbec@gmail.com Cc: darren@dvhart.com Cc: johan.eker@ericsson.com Cc: p.faure@akatech.ch Cc: paulmck@linux.vnet.ibm.com Cc: raistlin@linux.it Cc: claudio@evidence.eu.com Cc: insop.song@gmail.com Cc: michael@amarulasolutions.com Cc: liming.wang@windriver.com Cc: fchecconi@gmail.com Cc: jkacur@redhat.com Cc: tommaso.cucinotta@sssup.it Cc: Juri Lelli <juri.lelli@gmail.com> Cc: harald.gustafsson@ericsson.com Cc: nicola.manica@disi.unitn.it Cc: tglx@linutronix.de Link: http://lkml.kernel.org/r/20131015103507.GF10651@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
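The pairing, sketched (this follows the kernel's rt.c closely but is trimmed; consider it illustrative):

  /* Writer: publish the rto_mask bit before bumping rto_count. */
  static inline void rt_set_overload(struct rq *rq)
  {
          if (!rq->online)
                  return;
          cpumask_set_cpu(rq->cpu, rq->rd->rto_mask);
          /* Matched by the smp_rmb() in pull_rt_task(). */
          smp_wmb();
          atomic_inc(&rq->rd->rto_count);
  }

  /* Reader: after seeing rto_count != 0, order the mask read. */
  static int pull_rt_task(struct rq *this_rq)
  {
          if (likely(!atomic_read(&this_rq->rd->rto_count)))
                  return 0;
          /* Without this rmb we could read a stale rto_mask and miss
           * a bit that was set before the rto_count increment. */
          smp_rmb();
          /* ... iterate this_rq->rd->rto_mask and pull tasks ... */
          return 0;
  }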
-
Mel Gorman authored
commit 5d4cf996 upstream. Commit 42eb088e (sched: Avoid NULL dereference on sd_busy) corrected a NULL dereference on sd_busy but the fix also altered what scheduling domain it used for the 'sd_llc' percpu variable. One impact of this is that a task selecting a runqueue may consider idle CPUs that are not cache siblings as candidates for running. Tasks are then running on CPUs that are not cache hot. This was found through bisection where ebizzy threads were not seeing equal performance and it looked like a scheduling fairness issue. This patch mitigates but does not completely fix the problem on all machines tested implying there may be an additional bug or a common root cause. Here are the average range of performance seen by individual ebizzy threads. It was tested on top of candidate patches related to x86 TLB range flushing.

  4-core machine
                      3.13.0-rc3            3.13.0-rc3
                         vanilla            fixsd-v3r3
  Mean   1         0.00 (  0.00%)        0.00 (  0.00%)
  Mean   2         0.34 (  0.00%)        0.10 ( 70.59%)
  Mean   3         1.29 (  0.00%)        0.93 ( 27.91%)
  Mean   4         7.08 (  0.00%)        0.77 ( 89.12%)
  Mean   5       193.54 (  0.00%)        2.14 ( 98.89%)
  Mean   6       151.12 (  0.00%)        2.06 ( 98.64%)
  Mean   7       115.38 (  0.00%)        2.04 ( 98.23%)
  Mean   8       108.65 (  0.00%)        1.92 ( 98.23%)

  8-core machine
  Mean   1         0.00 (  0.00%)        0.00 (  0.00%)
  Mean   2         0.40 (  0.00%)        0.21 ( 47.50%)
  Mean   3        23.73 (  0.00%)        0.89 ( 96.25%)
  Mean   4        12.79 (  0.00%)        1.04 ( 91.87%)
  Mean   5        13.08 (  0.00%)        2.42 ( 81.50%)
  Mean   6        23.21 (  0.00%)       69.46 (-199.27%)
  Mean   7        15.85 (  0.00%)      101.72 (-541.77%)
  Mean   8       109.37 (  0.00%)       19.13 ( 82.51%)
  Mean   12      124.84 (  0.00%)       28.62 ( 77.07%)
  Mean   16      113.50 (  0.00%)       24.16 ( 78.71%)

It's eliminated for one machine and reduced for another. Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Alex Shi <alex.shi@linaro.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: H Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20131217092124.GV11295@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Peter Zijlstra authored
commit 8e8339a3 upstream. Yinghai reported that he saw a /0 in sg_capacity on his EX parts. Make sure to always initialize power_orig now that we actually use it. Ideally build_sched_domains() -> init_sched_groups_power() would also initialize this; but for some yet unexplained reason some setups seem to miss updates there. Reported-by: Yinghai Lu <yinghai@kernel.org> Tested-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-l8ng2m9uml6fhibln8wqpom7@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Peter Zijlstra authored
commit 42eb088e upstream. Commit 37dc6b50 ("sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus") forgot to clear 'sd_busy' under some conditions leading to a possible NULL deref in set_cpu_sd_state_idle(). Reported-by: Anton Blanchard <anton@samba.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20131118113701.GF3866@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Preeti U Murthy authored
commit 37dc6b50 upstream. nr_busy_cpus parameter is used by nohz_kick_needed() to find out the number of busy cpus in a sched domain which has SD_SHARE_PKG_RESOURCES flag set. Therefore instead of updating nr_busy_cpus at every level of sched domain, since it is irrelevant, we can update this parameter only at the parent domain of the sd which has this flag set. Introduce a per-cpu parameter sd_busy which represents this parent domain. In nohz_kick_needed() we directly query the nr_busy_cpus parameter associated with the groups of sd_busy. By associating sd_busy with the highest domain which has SD_SHARE_PKG_RESOURCES flag set, we cover all lower level domains which could have this flag set and trigger nohz_idle_balancing if any of the levels have more than one busy cpu. sd_busy is irrelevant for asymmetric load balancing. However sd_asym has been introduced to represent the highest sched domain which has SD_ASYM_PACKING flag set so that it can be queried directly when required. While we are at it, we might as well change the nohz_idle parameter to be updated at the sd_busy domain level alone and not the base domain level of a CPU. This will unify the concept of busy cpus at just one level of sched domain where it is currently used. Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: svaidy@linux.vnet.ibm.com Cc: vincent.guittot@linaro.org Cc: bitbucket@online.de Cc: benh@kernel.crashing.org Cc: anton@samba.org Cc: Morten.Rasmussen@arm.com Cc: pjt@google.com Cc: peterz@infradead.org Cc: mikey@neuling.org Link: http://lkml.kernel.org/r/20131030031252.23426.4417.stgit@preeti.in.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Paul E. McKenney authored
commit c229828c upstream. The rcu_try_advance_all_cbs() function is invoked on each attempted entry to and every exit from idle. If this function determines that there are callbacks ready to invoke, the caller will invoke the RCU core, which in turn will result in a pair of context switches. If a CPU enters and exits idle extremely frequently, this can result in an excessive number of context switches and high CPU overhead. This commit therefore causes rcu_try_advance_all_cbs() to throttle itself, refusing to do work more than once per jiffy. Reported-by: Tibor Billes <tbilles@gmx.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Tibor Billes <tbilles@gmx.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
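A hedged sketch of the throttle (the per-CPU field name is assumed from the commit description):

  /* Sketch: refuse to rescan callbacks more than once per jiffy. */
  static bool rcu_try_advance_all_cbs_sketch(void)
  {
          struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
          bool cbs_ready = false;

          /* Exit early if we already advanced during this jiffy. */
          if (jiffies == rdtp->last_advance_all)
                  return false;
          rdtp->last_advance_all = jiffies;

          /* ... advance callbacks for each RCU flavor, setting
           * cbs_ready if any are now invocable ... */
          return cbs_ready;
  }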
-
Paul E. McKenney authored
commit c337f8f5 upstream. If a non-lazy callback arrives on a CPU that has previously gone idle with no non-lazy callbacks, invoke_rcu_core() forces the RCU core to run. However, it does not update the conditions, which could result in several closely spaced invocations of the RCU core, which in turn could result in an excessively high context-switch rate and resulting high overhead. This commit therefore updates the ->all_lazy and ->nonlazy_posted_snap fields to prevent closely spaced invocations. Reported-by: Tibor Billes <tbilles@gmx.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Tibor Billes <tbilles@gmx.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Benjamin Herrenschmidt authored
commit 0c4888ef upstream. When restoring the PPR value, we incorrectly access the thread structure at a time when MSR:RI is clear, which means we cannot recover from nested faults. However the thread structure isn't covered by the "bolted" SLB entries, and thus accessing it can fault. This fixes it by splitting the code so that the PPR value is loaded into a GPR before MSR:RI is cleared. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Vaidyanathan Srinivasan authored
commit 2042abe7 upstream. Asymmetric scheduling within a core is a scheduler load-balancing feature that is triggered when the SD_ASYM_PACKING flag is set. The goal for the load balancer is to move tasks to lower-order idle SMT threads within a core on a POWER7 system. In nohz_kick_needed(), we intend to check whether our sched domain (core) is completely busy or has an idle cpu. The following check for SD_ASYM_PACKING: (cpumask_first_and(nohz.idle_cpus_mask, sched_domain_span(sd)) < cpu) already covers the case of checking if the domain has an idle cpu, because cpumask_first_and() will not yield any set bits if this domain has no idle cpu. Hence, the nr_busy check against the group weight can be removed. Reported-by: Michael Neuling <michael.neuling@au1.ibm.com> Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Tested-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: vincent.guittot@linaro.org Cc: bitbucket@online.de Cc: benh@kernel.crashing.org Cc: anton@samba.org Cc: Morten.Rasmussen@arm.com Cc: pjt@google.com Link: http://lkml.kernel.org/r/20131030031242.23426.13019.stgit@preeti.in.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
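Why the cpumask check subsumes the nr_busy test, as a hedged sketch (the helper name is invented; nohz.idle_cpus_mask and sched_domain_span() are the real symbols):

  /* Sketch: if any CPU in this domain is idle, cpumask_first_and()
   * returns its number; with no idle CPU it returns >= nr_cpu_ids,
   * which can never be below 'cpu'. So one test covers both "is
   * there an idle cpu" and "is it a lower-order SMT thread". */
  static bool asym_needs_kick_sketch(struct sched_domain *sd, int cpu)
  {
          return cpumask_first_and(nohz.idle_cpus_mask,
                                   sched_domain_span(sd)) < cpu;
  }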
-
Bjorn Helgaas authored
commit fbeeb822 upstream. f41f064c ("PCI: Workaround missing pci_set_master in pci drivers") made pci_enable_bridge() turn on bus mastering if the driver hadn't done so already. It also added a warning in this case. But there's no reason to warn about it unless it's actually a problem to enable bus mastering here. This patch drops the warning because I'm not aware of any such problem. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Thomas Gleixner authored
commit d689fe22 upstream. RCU and the fine grained idle time accounting functions check tick_nohz_enabled. But that variable merely tells that NOHZ has been enabled in the config and not been disabled on the command line. It does not tell anything about nohz being active, which is what all this should check for. Matthew reported that the idle accounting on his old P1 machine showed bogus values when he enabled NOHZ in the config and did not disable it on the kernel command line. The reason is that his machine uses (refined) jiffies as a clocksource, which explains why the "fine" grained accounting went into lala land, because it depends on when the system goes and leaves idle relative to the jiffies increment. Provide a tick_nohz_active indicator and let RCU and the accounting code use this instead of tick_nohz_enabled. Reported-and-tested-by: Matthew Whitehead <tedheadster@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: john.stultz@linaro.org Cc: mwhitehe@redhat.com Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1311132052240.30673@ionos.tec.linutronix.de Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Thomas Gleixner authored
commit 0e576acb upstream. If CONFIG_NO_HZ=n tick_nohz_get_sleep_length() returns NSEC_PER_SEC/HZ. If CONFIG_NO_HZ=y and the nohz functionality is disabled via the command line option "nohz=off" or not enabled due to missing hardware support, then tick_nohz_get_sleep_length() returns 0. That happens because ts->sleep_length is never set in that case. Set it to NSEC_PER_SEC/HZ when the NOHZ mode is inactive. Reported-by: Michal Hocko <mhocko@suse.cz> Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Linus Torvalds authored
commit 5cdec2d8 upstream. When debugging the read-only hugepage case, I was confused by the fact that get_futex_key() did an access_ok() only for the non-shared futex case, since the user address checking really isn't in any way specific to the private key handling. Now, it turns out that the shared key handling does effectively do the equivalent checks inside get_user_pages_fast() (it doesn't actually check the address range on x86, but does check the page protections for being a user page). So it wasn't actually a bug, but the fact that we treat the address differently for private and shared futexes threw me for a loop. Just move the check up, so that it gets done for both cases. Also, use the 'rw' parameter for the type, even if it doesn't actually matter any more (it's a historical artifact of the old racy i386 "page faults from kernel space don't check write protections"). Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
James Bates authored
commit 55aa42f2 upstream. The dmi_list array is initialized using gnu designated initializers, and may therefore contain fewer explicitly defined entries than there are elements in it, because the enum above with the M_xyz constants contains more items than the designated initializer. Elements not explicitly initialized are implicitly set to 0. Now efifb_setup() loops through all these array elements and performs a strcmp on each item; for elements that were not explicitly initialized, this operates on a null pointer. This patch swaps the check order in the if statement, so that it first checks whether dmi_list[i].base is null. Signed-off-by: James Bates <james.h.bates@gmail.com> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
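The reordered test, sketched as the matching loop inside efifb_setup() (field names follow the efifb driver but treat them as illustrative):

  /* Sketch: test base first, so zero-initialized (sparse) entries are
   * skipped before strcmp ever sees their NULL optname. */
  for (i = 0; i < M_UNKNOWN; i++) {
          if (efifb_dmi_list[i].base != 0 &&
              !strcmp(this_opt, efifb_dmi_list[i].optname)) {
                  /* ... adopt this entry's base/stride/width/height ... */
                  break;
          }
  }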
-
Jan Kara authored
commit a404d557 upstream. Currently each task sends a BLK_TN_PROCESS event to the first traced device it interacts with after a new trace is started. When there are several traced devices and the task accesses more devices, this logic can result in BLK_TN_PROCESS being sent several times to some devices while it is never sent to other devices. Thus blkparse doesn't display the command name when parsing some blktrace files. Fix the problem by sending the BLK_TN_PROCESS event to all traced devices when a task interacts with any of them. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Huang Rui authored
commit 02c123ee upstream. Commit "usb: pci-quirks: refactor AMD quirk to abstract AMD chipset types" introduced a new AMD chipset type to filter AMD platforms with different chipsets. According to a recent thread [1], this patch updates the SB800 prefetch routine in the AMD PLL quirk and makes it use the new chipset type to represent the SB800 generation. [1] http://marc.info/?l=linux-usb&m=138012321616452&w=2 Signed-off-by: Huang Rui <ray.huang@amd.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Huang Rui authored
commit 3ad145b6 upstream. Commit "usb: pci-quirks: refactor AMD quirk to abstract AMD chipset types" introduced a new AMD chipset type to filter AMD platforms with different chipsets. According to a recent thread [1], this patch updates the USB subsystem hang symptom quirk, which is observed on all AMD SB600 chipsets and on SB700 revisions 0x3a/0x3b, and makes it use the new chipset type to represent them. [1] http://marc.info/?l=linux-usb&m=138012321616452&w=2 Signed-off-by: Huang Rui <ray.huang@amd.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Huang Rui authored
commit 22b4f0cd upstream. This patch abstracts out an AMD chipset type which includes the southbridge generation and its revision. When the OS executes the usb_amd_find_chipset_info routine to initialize the AMD chipset type, the driver will know which kind of chipset is in use. This update has the following benefits:

  - The driver can confirm which southbridge generations and revisions
    are used, with a single chipset detection.
  - Describing chipset generations with enumeration types brings better
    readability.
  - It is flexible enough to filter AMD platforms when implementing new
    quirks in the future.

Signed-off-by: Huang Rui <ray.huang@amd.com> Cc: Andiry Xu <andiry.xu@gmail.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
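The abstraction plausibly looks like this (a hedged sketch modeled on the pci-quirks code; the exact enum members and helper are illustrative):

  /* Sketch: one detection pass caches generation + revision, which
   * quirk sites can then test directly. */
  enum amd_chipset_gen {
          NOT_AMD_CHIPSET = 0,
          AMD_CHIPSET_SB600,
          AMD_CHIPSET_SB700,
          AMD_CHIPSET_SB800,
          AMD_CHIPSET_HUDSON2,
          AMD_CHIPSET_UNKNOWN,
  };

  struct amd_chipset_type {
          enum amd_chipset_gen gen;
          u8 rev;
  };

  /* E.g. the hang-symptom quirk from the previous entry becomes: */
  static bool needs_hang_quirk_sketch(const struct amd_chipset_type *t)
  {
          return t->gen == AMD_CHIPSET_SB600 ||
                 (t->gen == AMD_CHIPSET_SB700 &&
                  t->rev >= 0x3a && t->rev <= 0x3b);
  }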
-
David Henningsson authored
commit da4a7a39 upstream. This should help us avoid the following mutex deadlock:

  [] mutex_lock+0x2a/0x50
  [] hdmi_present_sense+0x53/0x3a0 [snd_hda_codec_hdmi]
  [] generic_hdmi_resume+0x5a/0x70 [snd_hda_codec_hdmi]
  [] hda_call_codec_resume+0xec/0x1d0 [snd_hda_codec]
  [] snd_hda_power_save+0x1e4/0x280 [snd_hda_codec]
  [] codec_exec_verb+0x5f/0x290 [snd_hda_codec]
  [] snd_hda_codec_read+0x5b/0x90 [snd_hda_codec]
  [] snd_hdmi_get_eld_size+0x1e/0x20 [snd_hda_codec_hdmi]
  [] snd_hdmi_get_eld+0x2c/0xd0 [snd_hda_codec_hdmi]
  [] hdmi_present_sense+0x9a/0x3a0 [snd_hda_codec_hdmi]
  [] hdmi_repoll_eld+0x34/0x50 [snd_hda_codec_hdmi]

Signed-off-by: David Henningsson <david.henningsson@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Takashi Iwai authored
commit efe47108 upstream. There is a small gap between the jack detection unsolicited event and the time the ELD is updated. When user-space queries the HDMI ELD immediately after receiving the notification, it might fail because of this gap. For avoiding such a problem, this patch tries to delay the HDMI jack detect notification until ELD information is fully updated. The workaround is imperfect, but good enough as a starting point. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Takashi Iwai authored
commit fab1285a upstream. "HDA Intel MID" is not a correct name for Haswell HDMI controllers. Give them a better name, "HDA Intel HDMI". Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Clemens Ladisch authored
commit bbaa0d66 upstream. The device IDs of the AMD Cypress/Juniper/Redwood/Cedar/Cayman/Antilles/Barts/Turks/Caicos HDMI HDA controllers weren't added explicitly because the generic entry works, but this made the device appear as "Generic", and people are confused into thinking it is not a proper HDMI controller. Add the IDs so that the name shows up properly as "ATI HDMI" instead of "Generic". According to Takashi's tests and the lack of complaints, these devices work fine without disabling snooping. Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
James Ralston authored
commit 4eeca499 upstream. This patch adds the HD Audio Device IDs for the Intel Wildcat Point-LP PCH. Signed-off-by: James Ralston <james.d.ralston@intel.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-