1. 15 Oct, 2014 6 commits
    • l2tp: fix race while getting PMTU on PPP pseudo-wire · 696c5d5f
      Guillaume Nault authored
      [ Upstream commit eed4d839 ]
      
      Use dst_entry held by sk_dst_get() to retrieve tunnel's PMTU.
      
      The dst_mtu(__sk_dst_get(tunnel->sock)) call was racy. __sk_dst_get()
      could return NULL if tunnel->sock->sk_dst_cache was reset just before the
      call, thus making dst_mtu() dereference a NULL pointer:
      
      [ 1937.661598] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
      [ 1937.664005] IP: [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
      [ 1937.664005] PGD daf0c067 PUD d9f93067 PMD 0
      [ 1937.664005] Oops: 0000 [#1] SMP
      [ 1937.664005] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables udp_tunnel pppoe pppox ppp_generic slhc deflate ctr twofish_generic twofish_x86_64_3way xts lrw gf128mul glue_helper twofish_x86_64 twofish_common blowfish_generic blowfish_x86_64 blowfish_common des_generic cbc xcbc rmd160 sha512_generic hmac crypto_null af_key xfrm_algo 8021q garp bridge stp llc tun atmtcp clip atm ext3 mbcache jbd iTCO_wdt coretemp kvm_intel iTCO_vendor_support kvm pcspkr evdev ehci_pci lpc_ich mfd_core i5400_edac edac_core i5k_amb shpchp button processor thermal_sys xfs crc32c_generic libcrc32c dm_mod usbhid sg hid sr_mod sd_mod cdrom crc_t10dif crct10dif_common ata_generic ahci ata_piix tg3 libahci libata uhci_hcd ptp ehci_hcd pps_core usbcore scsi_mod libphy usb_common [last unloaded: l2tp_core]
      [ 1937.664005] CPU: 0 PID: 10022 Comm: l2tpstress Tainted: G           O   3.17.0-rc1 #1
      [ 1937.664005] Hardware name: HP ProLiant DL160 G5, BIOS O12 08/22/2008
      [ 1937.664005] task: ffff8800d8fda790 ti: ffff8800c43c4000 task.ti: ffff8800c43c4000
      [ 1937.664005] RIP: 0010:[<ffffffffa049db88>]  [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
      [ 1937.664005] RSP: 0018:ffff8800c43c7de8  EFLAGS: 00010282
      [ 1937.664005] RAX: ffff8800da8a7240 RBX: ffff8800d8c64600 RCX: 000001c325a137b5
      [ 1937.664005] RDX: 8c6318c6318c6320 RSI: 000000000000010c RDI: 0000000000000000
      [ 1937.664005] RBP: ffff8800c43c7ea8 R08: 0000000000000000 R09: 0000000000000000
      [ 1937.664005] R10: ffffffffa048e2c0 R11: ffff8800d8c64600 R12: ffff8800ca7a5000
      [ 1937.664005] R13: ffff8800c439bf40 R14: 000000000000000c R15: 0000000000000009
      [ 1937.664005] FS:  00007fd7f610f700(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000
      [ 1937.664005] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 1937.664005] CR2: 0000000000000020 CR3: 00000000d9d75000 CR4: 00000000000027e0
      [ 1937.664005] Stack:
      [ 1937.664005]  ffffffffa049da80 ffff8800d8fda790 000000000000005b ffff880000000009
      [ 1937.664005]  ffff8800daf3f200 0000000000000003 ffff8800c43c7e48 ffffffff81109b57
      [ 1937.664005]  ffffffff81109b0e ffffffff8114c566 0000000000000000 0000000000000000
      [ 1937.664005] Call Trace:
      [ 1937.664005]  [<ffffffffa049da80>] ? pppol2tp_connect+0x235/0x41e [l2tp_ppp]
      [ 1937.664005]  [<ffffffff81109b57>] ? might_fault+0x9e/0xa5
      [ 1937.664005]  [<ffffffff81109b0e>] ? might_fault+0x55/0xa5
      [ 1937.664005]  [<ffffffff8114c566>] ? rcu_read_unlock+0x1c/0x26
      [ 1937.664005]  [<ffffffff81309196>] SYSC_connect+0x87/0xb1
      [ 1937.664005]  [<ffffffff813e56f7>] ? sysret_check+0x1b/0x56
      [ 1937.664005]  [<ffffffff8107590d>] ? trace_hardirqs_on_caller+0x145/0x1a1
      [ 1937.664005]  [<ffffffff81213dee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [ 1937.664005]  [<ffffffff8114c262>] ? spin_lock+0x9/0xb
      [ 1937.664005]  [<ffffffff813092b4>] SyS_connect+0x9/0xb
      [ 1937.664005]  [<ffffffff813e56d2>] system_call_fastpath+0x16/0x1b
      [ 1937.664005] Code: 10 2a 84 81 e8 65 76 bd e0 65 ff 0c 25 10 bb 00 00 4d 85 ed 74 37 48 8b 85 60 ff ff ff 48 8b 80 88 01 00 00 48 8b b8 10 02 00 00 <48> 8b 47 20 ff 50 20 85 c0 74 0f 83 e8 28 89 83 10 01 00 00 89
      [ 1937.664005] RIP  [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
      [ 1937.664005]  RSP <ffff8800c43c7de8>
      [ 1937.664005] CR2: 0000000000000020
      [ 1939.559375] ---[ end trace 82d44500f28f8708 ]---
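
      A minimal sketch of the safe pattern (not the exact upstream diff): take
      a reference on the dst with sk_dst_get(), check it for NULL, and only
      then read the MTU (the 1500 fallback below is illustrative):

          struct dst_entry *dst;
          u32 mtu = 1500;                     /* illustrative fallback */

          dst = sk_dst_get(tunnel->sock);     /* takes a reference; may be NULL */
          if (dst) {
                  mtu = dst_mtu(dst);
                  dst_release(dst);
          }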
      
      Fixes: f34c4a35 ("l2tp: take PMTU from tunnel UDP socket")
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      696c5d5f
    • openvswitch: fix panic with multiple vlan headers · 522ad79b
      Jiri Benc authored
      [ Upstream commit 2ba5af42 ]
      
      When there are multiple vlan headers present in a received frame, the first
      one is put into vlan_tci and protocol is set to ETH_P_8021Q. Anything in the
      skb beyond the VLAN TPID may still be non-linear, including the inner TCI
      and ethertype. While ovs_flow_extract takes care of IP and IPv6 headers, it
      does nothing with ETH_P_8021Q. Later, if OVS_ACTION_ATTR_POP_VLAN is
      executed, __pop_vlan_tci pulls the next vlan header into vlan_tci.
      
      This leads to two things:
      
      1. Part of the resulting ethernet header is in the non-linear part of the
         skb. When eth_type_trans is called later as the result of
         OVS_ACTION_ATTR_OUTPUT, kernel BUGs in __skb_pull. Also, __pop_vlan_tci
         is in fact accessing random data when it reads past the TPID.
      
      2. network_header points into the ethernet header instead of behind it.
         mac_len is set to a wrong value (10), too.
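
      A hedged sketch of the general remedy (lengths illustrative, not the
      exact ovs patch): make sure the inner VLAN header is in the linear part
      of the skb before reading or popping it:

          /* Pull the bytes of the next (inner) VLAN header into the linear
           * area before its TCI and ethertype are read. */
          if (unlikely(!pskb_may_pull(skb, ETH_HLEN + VLAN_HLEN)))
                  return -ENOMEM;             /* illustrative error handling */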
      Reported-by: Yulong Pei <ypei@redhat.com>
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      522ad79b
    • packet: handle too big packets for PACKET_V3 · 4035ed7b
      Eric Dumazet authored
      [ Upstream commit dc808110 ]
      
      af_packet can currently overwrite kernel memory by out-of-bound
      accesses, because it assumes a [new] block can always hold one frame.
      
      This is not generally the case, even if most existing tools do it right.
      
      This patch clamps too-long frames, as the API permits, and issues a
      one-time error to syslog.
      
      [  394.357639] tpacket_rcv: packet too big, clamped from 5042 to 3966. macoff=82
      
      In this example, the packet header tp_snaplen was set to 3966,
      and tp_len was set to 5042 (skb->len).
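
      The clamping logic is roughly the following sketch (block_size and the
      surrounding names are illustrative, not the exact tpacket_rcv() code):

          if (unlikely(macoff + snaplen > block_size)) {
                  u32 clamped = block_size - macoff;

                  pr_err_once("tpacket_rcv: packet too big, clamped from %u to %u. macoff=%u\n",
                              snaplen, clamped, macoff);
                  snaplen = clamped;
          }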
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.")
      Acked-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4035ed7b
    • tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced() · 5f80f4d8
      Neal Cardwell authored
      [ Upstream commit 4fab9071 ]
      
      Make sure we use the correct address-family-specific function for
      handling MTU reductions from within tcp_release_cb().
      
      Previously AF_INET6 sockets were incorrectly always using the IPv6
      code path when sometimes they were handling IPv4 traffic and thus had
      an IPv4 dst.
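
      Conceptually, tcp_release_cb() now dispatches through the per-family
      icsk_af_ops instead of hard-coding one handler; a sketch of the idea:

          /* Deferred MTU reduction: let the address-family ops pick the
           * right handler, so an IPv4-mapped IPv6 socket is handled as IPv4. */
          if (flags & (1UL << TCP_MTU_REDUCED_DEFERRED))
                  inet_csk(sk)->icsk_af_ops->mtu_reduced(sk);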
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Diagnosed-by: Willem de Bruijn <willemb@google.com>
      Fixes: 563d34d0 ("tcp: dont drop MTU reduction indications")
      Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f80f4d8
    • sit: Fix ipip6_tunnel_lookup device matching criteria · f2c58cc4
      Shmulik Ladkani authored
      [ Upstream commit bc8fc7b8 ]
      
      As of 4fddbf5d ("sit: strictly restrict incoming traffic to tunnel link device"),
      when looking up a tunnel, the tunnel's underlying interface (t->parms.link)
      is verified to match the ingress device of the incoming traffic.
      
      However the comparison was incorrectly based on skb->dev->iflink.
      
      Instead, dev->ifindex should be used, which correctly represents the
      interface from which the IP stack hands the ipip6 packets.
      
      This allows setting up sit tunnels bound to vlan interfaces (otherwise
      incoming ipip6 traffic on the vlan interface was dropped due to
      ipip6_tunnel_lookup match failure).
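
      The change boils down to the matching condition; a sketch (fields as in
      ip_tunnel_parm, shown here as an assumption):

          /* was: t->parms.link == skb->dev->iflink */
          if (remote == t->parms.iph.daddr &&
              local == t->parms.iph.saddr &&
              t->parms.link == dev->ifindex)
                  return t;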
      Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f2c58cc4
    • myri10ge: check for DMA mapping errors · 5d711774
      Stanislaw Gruszka authored
      [ Upstream commit 10545937 ]
      
      On IOMMU systems DMA mapping can fail; we need to check for
      that possibility.
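
      The pattern is the usual one for the PCI DMA API; a hedged sketch with
      illustrative names (mgp, len, and the error path are assumptions):

          dma_addr_t bus = pci_map_single(mgp->pdev, skb->data, len,
                                          PCI_DMA_TODEVICE);
          if (unlikely(pci_dma_mapping_error(mgp->pdev, bus))) {
                  dev_kfree_skb_any(skb);     /* illustrative error handling */
                  return NETDEV_TX_OK;
          }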
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5d711774
  2. 09 Oct, 2014 14 commits
    • Linux 3.10.57 · f41c15f2
      Greg Kroah-Hartman authored
      f41c15f2
    • cpufreq: ondemand: Change the calculation of target frequency · bed53965
      Stratos Karafotis authored
      commit dfa5bb62 upstream.
      
      The ondemand governor calculates load in terms of frequency and
      increases it only if load_freq is greater than up_threshold
      multiplied by the current or average frequency.  This appears to
      produce oscillations of frequency between min and max because,
      for example, a relatively small load can easily saturate minimum
      frequency and lead the CPU to the max.  Then, it will decrease
      back to the min due to small load_freq.
      
      Change the calculation method of load and target frequency on the
      basis of the following two observations:
      
       - Load computation should not depend on the current or average
         measured frequency.  For example, absolute load of 80% at 100MHz
         is not necessarily equivalent to 8% at 1000MHz in the next
         sampling interval.
      
       - It should be possible to increase the target frequency to any
         value present in the frequency table proportional to the absolute
         load, rather than to the max only, so that:
      
         Target frequency = C * load
      
         where we take C = policy->cpuinfo.max_freq / 100.
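
      In code, the proportional target is roughly (a sketch; freq_next and
      policy as in the ondemand governor):

          /* C = policy->cpuinfo.max_freq / 100, target = C * load */
          freq_next = load * policy->cpuinfo.max_freq / 100;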
      
      Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
      The Phoronix Linux Kernel Compilation 3.1 benchmark shows an increase
      of ~1.5% in performance.  cpufreq_stats (time_in_state) shows that
      middle frequencies are used more with this patch, while the highest
      and lowest frequencies were used less by ~9%.
      
      [rjw: We have run multiple other tests on kernels with this
       change applied and in the vast majority of cases it turns out
       that the resulting performance improvement also leads to reduced
       consumption of energy.  The change is additionally justified by
       the overall simplification of the code in question.]
      Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      bed53965
    • cpufreq: Fix wrong time unit conversion · 35c23914
      Andreas Schwab authored
      commit a857c0b9 upstream.
      
      The time spent by a CPU under a given frequency is stored in jiffies unit
      in the cpu var cpufreq_stats_table->time_in_state[i], i being the index of
      the frequency.
      
      This is what is displayed in the following file on the right column:
      
           cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state
           2301000 19835820
           2300000 3172
           [...]
      
      Now cpufreq converts this jiffies unit delta to clock_t before returning it
      to the user as in the above file. And that conversion is achieved using the API
      cputime64_to_clock_t().
      
      Although it accidentally works on traditional tick based cputime accounting, where
      cputime_t maps directly to jiffies, it doesn't work with other types of cputime
      accounting such as CONFIG_VIRT_CPU_ACCOUNTING_* where cputime_t can map to nsecs
      or any granularity preferred by the architecture.
      
      For example we get a buggy zero delta on full dyntick configurations:
      
           cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state
           2301000 0
           2300000 0
           [...]
      
      Fix this by using the proper jiffies_64 to clock_t conversion.
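
      A minimal sketch of the corrected conversion in the time_in_state show
      path (the stat and freq_table names are assumed from cpufreq_stats):

          len += sprintf(buf + len, "%u %llu\n", stat->freq_table[i],
                         (unsigned long long)
                         jiffies_64_to_clock_t(stat->time_in_state[i]));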
      Reported-and-tested-by: Carsten Emde <C.Emde@osadl.org>
      Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      35c23914
    • nl80211: clear skb cb before passing to netlink · 7dd31112
      Johannes Berg authored
      commit bd8c78e7 upstream.
      
      In testmode and vendor command reply/event SKBs we use the
      skb cb data to store nl80211 parameters between allocation
      and sending. This causes the code for CONFIG_NETLINK_MMAP
      to get confused, because it takes ownership of the skb cb
      data when the SKB is handed off to netlink, and it doesn't
      explicitly clear it.
      
      Clear the skb cb explicitly when we're done and before it
      gets passed to netlink to avoid this issue.
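
      The fix is essentially a one-liner right before the skb is handed to
      netlink; a sketch:

          /* Wipe nl80211's use of the control buffer so netlink (and the
           * NETLINK_MMAP code in particular) starts from a clean cb. */
          memset(skb->cb, 0, sizeof(skb->cb));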
      Reported-by: Assaf Azulay <assaf.azulay@intel.com>
      Reported-by: David Spinadel <david.spinadel@intel.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7dd31112
    • drbd: fix regression 'out of mem, failed to invoke fence-peer helper' · 6353c97a
      Lars Ellenberg authored
      commit bbc1c5e8 upstream.
      
      Since linux kernel 3.13, kthread_run() internally uses
      wait_for_completion_killable().  We sometimes may use kthread_run()
      while we still have a signal pending, which we used to kick our threads
      out of potentially blocking network functions, causing kthread_run() to
      mistake that as a new fatal signal and fail.
      
      Fix: flush_signals() before kthread_run().
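
      A minimal sketch of that fix (fence_peer_fn, connection, and the thread
      name are illustrative placeholders, not drbd's actual identifiers):

          struct task_struct *task;

          flush_signals(current);     /* drop the signal we sent ourselves */
          task = kthread_run(fence_peer_fn, connection, "drbd_fence_peer");
          if (IS_ERR(task))
                  return false;       /* illustrative error handling */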
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6353c97a
    • jiffies: Fix timeval conversion to jiffies · 00790d45
      Andrew Hunter authored
      commit d78c9300 upstream.
      
      timeval_to_jiffies tried to round a timeval up to an integral number
      of jiffies, but the logic for doing so was incorrect: intervals
      corresponding to exactly N jiffies would become N+1. This manifested
      itself particularly repeatedly stopping/starting an itimer:
      
      setitimer(ITIMER_PROF, &val, NULL);
      setitimer(ITIMER_PROF, NULL, &val);
      
      would add a full tick to val, _even if it was exactly representable in
      terms of jiffies_ (say, the result of a previous rounding.)  Doing
      this repeatedly would cause unbounded growth in val.  So fix the math.
      
      Here's what was wrong with the conversion: we essentially computed
      (eliding seconds)
      
      jiffies = usec  * (NSEC_PER_USEC/TICK_NSEC)
      
      by using scaling arithmetic, which took the best approximation of
      NSEC_PER_USEC/TICK_NSEC with denominator 2^USEC_JIFFIE_SC, i.e.
      x/(2^USEC_JIFFIE_SC), and computed:
      
      jiffies = (usec * x) >> USEC_JIFFIE_SC
      
      and rounded this calculation up in the intermediate form (since we
      can't necessarily exactly represent TICK_NSEC in usec.) But the
      scaling arithmetic is a (very slight) *over*approximation of the true
      value; that is, instead of dividing by (1 usec/ 1 jiffie), we
      effectively divided by (1 usec/1 jiffie)-epsilon (rounding
      down). This would normally be fine, but we want to round timeouts up,
      and we did so by adding 2^USEC_JIFFIE_SC - 1 before the shift; this
      would be fine if our division was exact, but dividing this by the
      slightly smaller factor was equivalent to adding just _over_ 1 to the
      final result (instead of just _under_ 1, as desired.)
      
      In particular, with HZ=1000, we consistently computed that 10000 usec
      was 11 jiffies; the same was true for any exact multiple of
      TICK_NSEC.
      
      We could possibly still round in the intermediate form, adding
      something less than 2^USEC_JIFFIE_SC - 1, but easier still is to
      convert usec->nsec, round in nanoseconds, and then convert using
      time*spec*_to_jiffies.  This adds one constant multiplication, and is
      not observably slower in microbenchmarks on recent x86 hardware.
      
      Tested: the following program:
      
      #include <stdio.h>
      #include <stddef.h>
      #include <sys/time.h>

      int main() {
        struct itimerval zero = {{0, 0}, {0, 0}};
        /* Initially set to 10 ms. */
        struct itimerval initial = zero;
        initial.it_interval.tv_usec = 10000;
        setitimer(ITIMER_PROF, &initial, NULL);
        /* Save and restore several times. */
        for (size_t i = 0; i < 10; ++i) {
          struct itimerval prev;
          setitimer(ITIMER_PROF, &zero, &prev);
          /* on old kernels, this goes up by TICK_USEC every iteration */
          printf("previous value: %ld %ld %ld %ld\n",
                 prev.it_interval.tv_sec, prev.it_interval.tv_usec,
                 prev.it_value.tv_sec, prev.it_value.tv_usec);
          setitimer(ITIMER_PROF, &prev, NULL);
        }
        return 0;
      }
      
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Reviewed-by: Paul Turner <pjt@google.com>
      Reported-by: Aaron Jacobs <jacobsa@google.com>
      Signed-off-by: Andrew Hunter <ahh@google.com>
      [jstultz: Tweaked to apply to 3.17-rc]
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      [bwh: Backported to 3.16: adjust filename]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      00790d45
    • md/raid5: disable 'DISCARD' by default due to safety concerns. · 06905ff8
      NeilBrown authored
      commit 8e0e99ba upstream.
      
      It has come to my attention (thanks Martin) that 'discard_zeroes_data'
      is only a hint.  Some devices in some cases don't do what it
      says on the label.
      
      The use of DISCARD in RAID5 depends on reads from discarded regions
      being predictably zero.  If a write to a previously discarded region
      performs a read-modify-write cycle it assumes that the parity block
      was consistent with the data blocks.  If all were zero, this would
      be the case.  If some are and some aren't this would not be the case.
      This could lead to data corruption after a device failure when
      data needs to be reconstructed from the parity.
      
      As we cannot trust 'discard_zeroes_data', ignore it by default
      and so disallow DISCARD on all raid4/5/6 arrays.
      
      As many devices are trustworthy, and as there are benefits to using
      DISCARD, add a module parameter to over-ride this caution and cause
      DISCARD to work if discard_zeroes_data is set.
      
      If a site wants to enable DISCARD on some arrays but not on others, they
      should select DISCARD support at the filesystem level, and set the
      raid456 module parameter.
          raid456.devices_handle_discard_safely=Y
      
      As this is a data-safety issue, I believe this patch is suitable for
      -stable.
      DISCARD support for RAID456 was added in 3.7
      
      Cc: Shaohua Li <shli@kernel.org>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Heinz Mauelshagen <heinzm@redhat.com>
      Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Fixes: 620125f2
      Signed-off-by: NeilBrown <neilb@suse.de>
      [bwh: Backported to 3.10: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      06905ff8
    • media: vb2: fix VBI/poll regression · f5d34b7c
      Hans Verkuil authored
      commit 58d75f4b upstream.
      
      The recent conversion of saa7134 to vb2 uncovered a poll() bug that
      broke the teletext applications alevt and mtt. These applications
      expect that calling poll() without having called VIDIOC_STREAMON will
      cause poll() to return POLLERR. That did not happen in vb2.
      
      This patch fixes that behavior. It also fixes what happens when poll()
      is called after STREAMON but before any buffers have been queued: in
      that case poll() will also return POLLERR, but only for capture queues,
      since output queues will always return POLLOUT anyway in that situation.
      
      This brings the vb2 behavior in line with the old videobuf behavior.
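
      Conceptually, the new checks at the top of the vb2 poll handler look
      like this sketch (no_buffers_queued is a hypothetical flag standing in
      for the upstream bookkeeping, not the real field name):

          if (!vb2_is_streaming(q))
                  return POLLERR;
          if (!V4L2_TYPE_IS_OUTPUT(q->type) && q->no_buffers_queued)
                  return POLLERR;     /* capture queue, nothing ever queued */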
      Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
      Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5d34b7c
    • mm: numa: Do not mark PTEs pte_numa when splitting huge pages · f35407ac
      Mel Gorman authored
      commit abc40bd2 upstream.
      
      This patch reverts 1ba6e0b5 ("mm: numa: split_huge_page: transfer the
      NUMA type from the pmd to the pte"). If a huge page is being split due to
      a protection change and the tail will be in a PROT_NONE vma, then NUMA
      hinting PTEs are temporarily created in the protected VMA.
      
       VM_RW|VM_PROTNONE
      |-----------------|
            ^
            split here
      
      In the specific case above, it should get fixed up by change_pte_range()
      but there is a window of opportunity for weirdness to happen. Similarly,
      if a huge page is shrunk and split during a protection update but before
      pmd_numa is cleared then a pte_numa can be left behind.
      
      Instead of adding complexity trying to deal with the case, this patch
      will not mark PTEs NUMA when splitting a huge page. NUMA hinting faults
      will not be triggered which is marginal in comparison to the complexity
      in dealing with the corner cases during THP split.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f35407ac
    • mm, thp: move invariant bug check out of loop in __split_huge_page_map · 183c062c
      Waiman Long authored
      commit f8303c25 upstream.
      
      In __split_huge_page_map(), the check for page_mapcount(page) is
      invariant within the for loop.  Because the macro is implemented using
      atomic_read(), the redundant check cannot be optimized away by the
      compiler, leading to unnecessary reads of the page structure.
      
      This patch moves the invariant bug check out of the loop so that it will
      be done only once.  On a 3.16-rc1 based kernel, a microbenchmark that
      broke up 1000 transparent huge pages using munmap() had an execution
      time of 38,245us and 38,548us with and without the patch, respectively.
      The performance gain is about 1%.
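
      The transformation is the classic hoisting of a loop-invariant check;
      schematically (a sketch, not the exact __split_huge_page_map() body,
      and the BUG_ON condition is illustrative):

          /* page_mapcount(page) does not change inside the loop: check once. */
          BUG_ON(page_mapcount(page) == 0);

          for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
                  /* per-subpage PTE setup; no per-iteration mapcount check */
          }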
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      183c062c
    • ring-buffer: Fix infinite spin in reading buffer · 78a3db11
      Steven Rostedt (Red Hat) authored
      commit 24607f11 upstream.
      
      Commit 651e22f2 "ring-buffer: Always reset iterator to reader page"
      fixed one bug but in the process caused another one. The reset is to
      update the header page, but that fix also changed the way the cached
      reads were updated. The cache reads are used to test if an iterator
      needs to be updated or not.
      
      A ring buffer iterator, when created, disables writes to the ring buffer
      but does not stop other readers or consuming reads from happening.
      Although all readers are synchronized via a lock, they are only
      synchronized when in the ring buffer functions. Those functions may
      be called by any number of readers. The iterator continues down when
      it's not interrupted by a consuming reader. If a consuming read
      occurs, the iterator starts from the beginning of the buffer.
      
      The way the iterator sees that a consuming read has happened since
      its last read is by checking the reader "cache". The cache holds the
      last counts of the read and the reader page itself.
      
      Commit 651e22f2 changed what was saved by the cache_read when
      the rb_iter_reset() occurred, making the iterator never match the cache.
      Then if the iterator calls rb_iter_reset(), it will go into an
      infinite loop by checking if the cache doesn't match, doing the reset
      and retrying, just to see that the cache still doesn't match! This
      should never happen, as the reset is supposed to set the cache to the
      current value, and there are locks that keep a consuming reader from
      having access to the data.
      
      Fixes: 651e22f2 "ring-buffer: Always reset iterator to reader page"
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      78a3db11
    • init/Kconfig: Fix HAVE_FUTEX_CMPXCHG to not break up the EXPERT menu · 0000372a
      Josh Triplett authored
      commit 62b4d204 upstream.
      
      commit 03b8c7b6 ("futex: Allow
      architectures to skip futex_atomic_cmpxchg_inatomic() test") added the
      HAVE_FUTEX_CMPXCHG symbol right below FUTEX.  This placed it right in
      the middle of the options for the EXPERT menu.  However,
      HAVE_FUTEX_CMPXCHG does not depend on EXPERT or FUTEX, so Kconfig stops
      placing items in the EXPERT menu, and displays the remaining several
      EXPERT items (starting with EPOLL) directly in the General Setup menu.
      
      Since both users of HAVE_FUTEX_CMPXCHG only select it "if FUTEX", make
      HAVE_FUTEX_CMPXCHG itself depend on FUTEX.  With this change, the
      subsequent items display as part of the EXPERT menu again; the EMBEDDED
      menu now appears as the next top-level item in the General Setup menu,
      which makes General Setup much shorter and more usable.
      Signed-off-by: Josh Triplett <josh@joshtriplett.org>
      Acked-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0000372a
    • perf: fix perf bug in fork() · bee870fc
      Peter Zijlstra authored
      commit 6c72e350 upstream.
      
      Oleg noticed that a cleanup by Sylvain actually uncovered a bug; by
      calling perf_event_free_task() when failing sched_fork() we will not yet
      have done the memset() on ->perf_event_ctxp[] and will therefore try and
      'free' the inherited contexts, which are still in use by the parent
      process.  This is bad.
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Reported-by: Sylvain 'ythier' Hitier <sylvain.hitier@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      bee870fc
    • udf: Avoid infinite loop when processing indirect ICBs · 07d209bd
      Jan Kara authored
      commit c03aa9f6 upstream.
      
      We did not implement any bound on the number of indirect ICBs we follow
      when loading an inode. Thus a corrupted medium could cause the kernel to
      go into an infinite loop, possibly causing a stack overflow.

      Fix the possible stack overflow by removing recursion from
      __udf_read_inode() and limiting the number of indirect ICBs we follow
      to avoid infinite loops.
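
      A hedged sketch of the shape of the fix (the bound and the
      fe_is_indirect flag below are illustrative, not the exact upstream
      identifiers):

          #define UDF_MAX_ICB_NESTING 1024    /* illustrative cap */

          unsigned int indirections = 0;

      reread:
          if (fe_is_indirect) {
                  if (++indirections > UDF_MAX_ICB_NESTING)
                          return;             /* give up instead of looping forever */
                  /* load the ICB this entry points to, then re-check */
                  goto reread;                /* iterate; never recurse */
          }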
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      07d209bd
  3. 05 Oct, 2014 20 commits