1. 31 Jul, 2014 7 commits
    • x86/mm: Set TLB flush tunable to sane value (33) · a5102476
      Dave Hansen authored
      This has been run through Intel's LKP tests across a wide range
      of modern systems and workloads, and it was not shown to make a
      measurable performance difference, positive or negative.
      
      Now that we have some shiny new tracepoints, we can actually
      figure out what the heck is going on.
      
      During a kernel compile, about 60% of the flush_tlb_mm_range()
      calls are for a single page.  It breaks down like this:
      
       size   percent  percent<=
        V        V        V
      GLOBAL:   2.20%   2.20% avg cycles:  2283
           1:  56.92%  59.12% avg cycles:  1276
           2:  13.78%  72.90% avg cycles:  1505
           3:   8.26%  81.16% avg cycles:  1880
           4:   7.41%  88.58% avg cycles:  2447
           5:   1.73%  90.31% avg cycles:  2358
           6:   1.32%  91.63% avg cycles:  2563
           7:   1.14%  92.77% avg cycles:  2862
           8:   0.62%  93.39% avg cycles:  3542
           9:   0.08%  93.47% avg cycles:  3289
          10:   0.43%  93.90% avg cycles:  3570
          11:   0.20%  94.10% avg cycles:  3767
          12:   0.08%  94.18% avg cycles:  3996
          13:   0.03%  94.20% avg cycles:  4077
          14:   0.02%  94.23% avg cycles:  4836
          15:   0.04%  94.26% avg cycles:  5699
          16:   0.06%  94.32% avg cycles:  5041
          17:   0.57%  94.89% avg cycles:  5473
          18:   0.02%  94.91% avg cycles:  5396
          19:   0.03%  94.95% avg cycles:  5296
          20:   0.02%  94.96% avg cycles:  6749
          21:   0.18%  95.14% avg cycles:  6225
          22:   0.01%  95.15% avg cycles:  6393
          23:   0.01%  95.16% avg cycles:  6861
          24:   0.12%  95.28% avg cycles:  6912
          25:   0.05%  95.32% avg cycles:  7190
          26:   0.01%  95.33% avg cycles:  7793
          27:   0.01%  95.34% avg cycles:  7833
          28:   0.01%  95.35% avg cycles:  8253
          29:   0.08%  95.42% avg cycles:  8024
          30:   0.03%  95.45% avg cycles:  9670
          31:   0.01%  95.46% avg cycles:  8949
          32:   0.01%  95.46% avg cycles:  9350
          33:   3.11%  98.57% avg cycles:  8534
          34:   0.02%  98.60% avg cycles: 10977
          35:   0.02%  98.62% avg cycles: 11400
      
      We get into diminishing returns pretty quickly.  On pre-IvyBridge
      CPUs, we used to set the limit at 8 pages, and it was set at 128
      on IvyBridge.  That 128 number looks pretty silly considering that
      less than 0.5% of the flushes are that large.
      
      The previous code tried to size this number based on the size of
      the TLB.  Good idea, but it's error-prone, needs maintenance
      (which it has not been getting), and probably would not matter
      much in practice.
      
      Setting it to 33 means that we cover the mallopt
      M_TRIM_THRESHOLD, which is the most universally common flush
      size.
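
      As a rough illustration of where flushes of that size come from,
      consider heap trimming in glibc.  This is a hypothetical
      user-space sketch, not part of this patch:

      	#include <malloc.h>
      	#include <stdio.h>

      	int main(void)
      	{
      		/* glibc's default trim threshold: free() hands memory
      		 * back to the kernel once 128k of free space sits at
      		 * the top of the heap, unmapping ~32 pages at once. */
      		mallopt(M_TRIM_THRESHOLD, 128 * 1024);

      		printf("%d pages\n", (128 * 1024) / 4096); /* prints 32 */
      		return 0;
      	}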
      
      That's the short version.  Here's the long one for why I chose 33:
      
      1. These numbers have a constant bias in the timestamps from the
         tracing.  It probably accounts for a couple hundred cycles in each
         these tests, but it should be fairly _even_ across all of them.
         The smallest delta between the tracepoints I have ever seen is
         335 cycles.  This is one reason the cycles/page cost goes down in
         general as the flushes get larger.  The true cost is nearer to
         100 cycles.
      2. A full flush is more expensive than a single invlpg, but not
         by much (single percentages).
      3. A dtlb miss is 17.1ns (~45 cycles) and an itlb miss is 13.0ns
         (~34 cycles).  At those rates, refilling the 512-entry dTLB takes
         22,000 cycles (the arithmetic is spelled out just after this list).
      4. 22,000 cycles is approximately the equivalent of doing 85
         invlpg operations.  But the odds are that the TLB can
         actually be filled up faster than that because TLB misses that
         are close in time also tend to leverage the same caches.
      5. ~98% of flushes are <=33 pages.  There are a lot of flushes of
         exactly 33 pages, probably because libc's M_TRIM_THRESHOLD is
         set to 128k (32 pages).
      6. I've found no consistent data to support changing the IvyBridge
         vs. SandyBridge tunable by a factor of 16.
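
      Spelling out the arithmetic behind points 3 and 4, using the
      ~260 cycles/page marginal invlpg cost visible in the table below:

      	512 dTLB entries * ~45 cycles/miss  ~= 23,000 (call it 22,000) cycles
      	22,000 cycles / ~260 cycles/invlpg  ~= 85 invlpg operations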
      
      I used the performance counters on this hardware (IvyBridge i5-3320M)
      to figure out the tlb miss costs:
      
      ocperf.py stat -e dtlb_load_misses.walk_duration,dtlb_load_misses.walk_completed,dtlb_store_misses.walk_duration,dtlb_store_misses.walk_completed,itlb_misses.walk_duration,itlb_misses.walk_completed,itlb.itlb_flush
      
           7,720,030,970      dtlb_load_misses_walk_duration                                    [57.13%]
             169,856,353      dtlb_load_misses_walk_completed                                    [57.15%]
             708,832,859      dtlb_store_misses_walk_duration                                    [57.17%]
              19,346,823      dtlb_store_misses_walk_completed                                    [57.17%]
           2,779,687,402      itlb_misses_walk_duration                                    [57.15%]
              82,241,148      itlb_misses_walk_completed                                    [57.13%]
                 770,717      itlb_itlb_flush                                              [57.11%]
      
      These show that a dtlb miss is 17.1ns (~45 cycles) and an itlb
      miss is 13.0ns (~34 cycles).  At those rates, refilling the
      512-entry dTLB takes 22,000 cycles.  On a SandyBridge system with
      more cores and larger caches, those are dtlb=13.4ns and itlb=9.5ns.
      
      cat perf.stat.txt | perl -pe 's/,//g' \
      	| awk '/itlb_misses_walk_duration/ { icyc+=$1 }
      		/itlb_misses_walk_completed/ { imiss+=$1 }
      		/dtlb_.*_walk_duration/ { dcyc+=$1 }
      		/dtlb_.*_walk_completed/ { dmiss+=$1 }
      		END {print "itlb cyc/miss: ", icyc/imiss, " dtlb cyc/miss: ", dcyc/dmiss, "   -----    ", icyc,imiss, dcyc,dmiss }'
      
      On Westmere CPUs, the counters to use are: itlb_flush,itlb_misses.walk_cycles,itlb_misses.any,dtlb_misses.walk_cycles,dtlb_misses.any
      
      The assumptions under which this code originally went in
      (https://lkml.org/lkml/2012/6/12/119) say that a flush and a
      refill are about 100ns.  Being generous, that is over by a factor
      of 6 on the refill side, although it is fairly close on the cost
      of an invlpg.  Each additional invlpg operation seems to lengthen
      the flush-range operation by about 200 cycles.  Here is one
      example of the data collected for flushing 10 and 11 pages (full
      data are below):
      
          10:   0.43%  93.90% avg cycles:  3570 cycles/page:  357 samples: 4714
          11:   0.20%  94.10% avg cycles:  3767 cycles/page:  342 samples: 2145
      
      How to generate this table:
      
      	echo 10000 > /sys/kernel/debug/tracing/buffer_size_kb
      	echo x86-tsc > /sys/kernel/debug/tracing/trace_clock
      	echo 'reason != 0' > /sys/kernel/debug/tracing/events/tlb/tlb_flush/filter
      	echo 1 > /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
      
      Pipe the trace output into this script:
      
      	http://sr71.net/~dave/intel/201402-tlb/trace-time-diff-process.pl.txt
      
      Note that these data were gathered with the invlpg threshold set to
      150 pages.  Only data points with >=50 samples were printed:
      
      flush size   % of      % <=
      (pages)      flushes   this size
      ------------------------------------------------------------------------------
          -1:   2.20%   2.20% avg cycles:  2283 cycles/page: xxxx samples: 23960
           1:  56.92%  59.12% avg cycles:  1276 cycles/page: 1276 samples: 620895
           2:  13.78%  72.90% avg cycles:  1505 cycles/page:  752 samples: 150335
           3:   8.26%  81.16% avg cycles:  1880 cycles/page:  626 samples: 90131
           4:   7.41%  88.58% avg cycles:  2447 cycles/page:  611 samples: 80877
           5:   1.73%  90.31% avg cycles:  2358 cycles/page:  471 samples: 18885
           6:   1.32%  91.63% avg cycles:  2563 cycles/page:  427 samples: 14397
           7:   1.14%  92.77% avg cycles:  2862 cycles/page:  408 samples: 12441
           8:   0.62%  93.39% avg cycles:  3542 cycles/page:  442 samples: 6721
           9:   0.08%  93.47% avg cycles:  3289 cycles/page:  365 samples: 917
          10:   0.43%  93.90% avg cycles:  3570 cycles/page:  357 samples: 4714
          11:   0.20%  94.10% avg cycles:  3767 cycles/page:  342 samples: 2145
          12:   0.08%  94.18% avg cycles:  3996 cycles/page:  333 samples: 864
          13:   0.03%  94.20% avg cycles:  4077 cycles/page:  313 samples: 289
          14:   0.02%  94.23% avg cycles:  4836 cycles/page:  345 samples: 236
          15:   0.04%  94.26% avg cycles:  5699 cycles/page:  379 samples: 390
          16:   0.06%  94.32% avg cycles:  5041 cycles/page:  315 samples: 643
          17:   0.57%  94.89% avg cycles:  5473 cycles/page:  321 samples: 6229
          18:   0.02%  94.91% avg cycles:  5396 cycles/page:  299 samples: 224
          19:   0.03%  94.95% avg cycles:  5296 cycles/page:  278 samples: 367
          20:   0.02%  94.96% avg cycles:  6749 cycles/page:  337 samples: 185
          21:   0.18%  95.14% avg cycles:  6225 cycles/page:  296 samples: 1964
          22:   0.01%  95.15% avg cycles:  6393 cycles/page:  290 samples: 83
          23:   0.01%  95.16% avg cycles:  6861 cycles/page:  298 samples: 61
          24:   0.12%  95.28% avg cycles:  6912 cycles/page:  288 samples: 1307
          25:   0.05%  95.32% avg cycles:  7190 cycles/page:  287 samples: 533
          26:   0.01%  95.33% avg cycles:  7793 cycles/page:  299 samples: 94
          27:   0.01%  95.34% avg cycles:  7833 cycles/page:  290 samples: 66
          28:   0.01%  95.35% avg cycles:  8253 cycles/page:  294 samples: 73
          29:   0.08%  95.42% avg cycles:  8024 cycles/page:  276 samples: 846
          30:   0.03%  95.45% avg cycles:  9670 cycles/page:  322 samples: 296
          31:   0.01%  95.46% avg cycles:  8949 cycles/page:  288 samples: 79
          32:   0.01%  95.46% avg cycles:  9350 cycles/page:  292 samples: 60
          33:   3.11%  98.57% avg cycles:  8534 cycles/page:  258 samples: 33936
          34:   0.02%  98.60% avg cycles: 10977 cycles/page:  322 samples: 268
          35:   0.02%  98.62% avg cycles: 11400 cycles/page:  325 samples: 177
          36:   0.01%  98.63% avg cycles: 11504 cycles/page:  319 samples: 161
          37:   0.02%  98.65% avg cycles: 11596 cycles/page:  313 samples: 182
          38:   0.02%  98.66% avg cycles: 11850 cycles/page:  311 samples: 195
          39:   0.01%  98.68% avg cycles: 12158 cycles/page:  311 samples: 128
          40:   0.01%  98.68% avg cycles: 11626 cycles/page:  290 samples: 78
          41:   0.04%  98.73% avg cycles: 11435 cycles/page:  278 samples: 477
          42:   0.01%  98.73% avg cycles: 12571 cycles/page:  299 samples: 74
          43:   0.01%  98.74% avg cycles: 12562 cycles/page:  292 samples: 78
          44:   0.01%  98.75% avg cycles: 12991 cycles/page:  295 samples: 108
          45:   0.01%  98.76% avg cycles: 13169 cycles/page:  292 samples: 78
          46:   0.02%  98.78% avg cycles: 12891 cycles/page:  280 samples: 261
          47:   0.01%  98.79% avg cycles: 13099 cycles/page:  278 samples: 67
          48:   0.01%  98.80% avg cycles: 13851 cycles/page:  288 samples: 77
          49:   0.01%  98.80% avg cycles: 13749 cycles/page:  280 samples: 66
          50:   0.01%  98.81% avg cycles: 13949 cycles/page:  278 samples: 73
          52:   0.00%  98.82% avg cycles: 14243 cycles/page:  273 samples: 52
          54:   0.01%  98.83% avg cycles: 15312 cycles/page:  283 samples: 87
          55:   0.01%  98.84% avg cycles: 15197 cycles/page:  276 samples: 109
          56:   0.02%  98.86% avg cycles: 15234 cycles/page:  272 samples: 208
          57:   0.00%  98.86% avg cycles: 14888 cycles/page:  261 samples: 53
          58:   0.01%  98.87% avg cycles: 15037 cycles/page:  259 samples: 59
          59:   0.01%  98.87% avg cycles: 15752 cycles/page:  266 samples: 63
          62:   0.00%  98.89% avg cycles: 16222 cycles/page:  261 samples: 54
          64:   0.02%  98.91% avg cycles: 17179 cycles/page:  268 samples: 248
          65:   0.12%  99.03% avg cycles: 18762 cycles/page:  288 samples: 1324
          85:   0.00%  99.10% avg cycles: 21649 cycles/page:  254 samples: 50
         127:   0.01%  99.18% avg cycles: 32397 cycles/page:  255 samples: 75
         128:   0.13%  99.31% avg cycles: 31711 cycles/page:  247 samples: 1466
         129:   0.18%  99.49% avg cycles: 33017 cycles/page:  255 samples: 1927
         181:   0.33%  99.84% avg cycles:  2489 cycles/page:   13 samples: 3547
         256:   0.05%  99.91% avg cycles:  2305 cycles/page:    9 samples: 550
         512:   0.03%  99.95% avg cycles:  2133 cycles/page:    4 samples: 304
        1512:   0.01%  99.99% avg cycles:  3038 cycles/page:    2 samples: 65
      
      Here are the tlb counters during a 10-second slice of a kernel compile
      on a SandyBridge system.  It does better than the IvyBridge, probably
      due to the larger caches, since this was one of the 'X' extreme parts.
      
          10,873,007,282      dtlb_load_misses_walk_duration
             250,711,333      dtlb_load_misses_walk_completed
           1,212,395,865      dtlb_store_misses_walk_duration
              31,615,772      dtlb_store_misses_walk_completed
           5,091,010,274      itlb_misses_walk_duration
             163,193,511      itlb_misses_walk_completed
               1,321,980      itlb_itlb_flush
      
            10.008045158 seconds time elapsed
      
      # cat perf.stat.1392743721.txt | perl -pe 's/,//g' | awk '/itlb_misses_walk_duration/ { icyc+=$1 } /itlb_misses_walk_completed/ { imiss+=$1 } /dtlb_.*_walk_duration/ { dcyc+=$1 } /dtlb_.*.*completed/ { dmiss+=$1 } END {print "itlb cyc/miss: ", icyc/imiss/3.3, " dtlb cyc/miss: ", dcyc/dmiss/3.3, "   -----    ", icyc,imiss, dcyc,dmiss }'
      itlb ns/miss:  9.45338  dtlb ns/miss:  12.9716
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Link: http://lkml.kernel.org/r/20140731154103.10C1115E@viggo.jf.intel.com
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86/mm: New tunable for single vs full TLB flush · 2d040a1c
      Dave Hansen authored
      Most of the logic here is in the documentation file.  Please take
      a look at it.
      
      I know we've come full-circle here back to a tunable, but this
      new one is *WAY* simpler.  I challenge anyone to describe in one
      sentence how the old one worked.  Here's the way the new one
      works:
      
      	If we are flushing more pages than the ceiling, we use
      	the full flush, otherwise we use per-page flushes.
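
      In code form, that rule is just the following (a schematic
      sketch, not the verbatim kernel source; tlb_single_page_flush_ceiling
      is the tunable this series introduces, set to 33 by the patch above):

      	if (((end - start) >> PAGE_SHIFT) > tlb_single_page_flush_ceiling) {
      		local_flush_tlb();		/* one full flush */
      	} else {
      		unsigned long addr;
      		/* one invlpg per page */
      		for (addr = start; addr < end; addr += PAGE_SIZE)
      			__flush_tlb_one(addr);
      	}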
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Link: http://lkml.kernel.org/r/20140731154101.12B52CAF@viggo.jf.intel.com
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86/mm: Add tracepoints for TLB flushes · d17d8f9d
      Dave Hansen authored
      We don't have any good way to figure out what kinds of flushes
      are being attempted.  Right now, we can try to use the vm
      counters, but those only tell us what we actually did with the
      hardware (one-by-one vs full) and don't tell us what was actually
      _requested_.
      
      This allows us to select out "interesting" TLB flushes that we
      might want to optimize (like the ranged ones) and ignore the ones
      that we have very little control over (the ones at context
      switch).
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Link: http://lkml.kernel.org/r/20140731154059.4C96CBA5@viggo.jf.intel.com
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86/mm: Unify remote INVLPG code · a23421f1
      Dave Hansen authored
      There are currently three paths through the remote flush code:
      
      1. full invalidation
      2. single page invalidation using invlpg
      3. ranged invalidation using invlpg
      
      This takes 2 and 3 and combines them into a single path by making
      the single-page case simply pass a range whose end is the start
      plus a single page.  This makes placement of our tracepoint
      easier (see the sketch below).
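
      A sketch of the idea (illustrative, using the shape of the ranged
      flush API rather than quoting the patch):

      	/* The single-page case no longer has its own remote path;
      	 * it simply passes a one-page range, so both cases share
      	 * the same ranged-invalidation code: */
      	flush_tlb_others(mm_cpumask(mm), mm, start, start + PAGE_SIZE);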
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Link: http://lkml.kernel.org/r/20140731154058.E0F90408@viggo.jf.intel.com
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86/mm: Fix missed global TLB flush stat · 9dfa6dee
      Dave Hansen authored
      If we take the
      
      	if (end == TLB_FLUSH_ALL || vmflag & VM_HUGETLB) {
      		local_flush_tlb();
      		goto out;
      	}
      
      path out of flush_tlb_mm_range(), we will have flushed the tlb,
      but not incremented NR_TLB_LOCAL_FLUSH_ALL.  This unifies the
      way out of the function so that we always take a single path when
      doing a full tlb flush.
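
      Schematically, the unified exit looks something like this (the
      variable names here are assumptions, not a quote of the patch):

      	if (end == TLB_FLUSH_ALL || vmflag & VM_HUGETLB)
      		base_pages_to_flush = TLB_FLUSH_ALL;
      	/* ... per-page flush logic for the other cases ... */
      	if (base_pages_to_flush == TLB_FLUSH_ALL) {
      		/* stat and flush are now always paired */
      		count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
      		local_flush_tlb();
      	}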
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Link: http://lkml.kernel.org/r/20140731154056.FF763B76@viggo.jf.intel.com
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86/mm: Rip out complicated, out-of-date, buggy TLB flushing · e9f4e0a9
      Dave Hansen authored
      I think the flush_tlb_mm_range() code that tries to tune the
      flush sizes based on the CPU needs to get ripped out for
      several reasons:
      
      1. It is obviously buggy.  It uses mm->total_vm to judge the
         task's footprint in the TLB.  It should certainly be using
         some measure of RSS, *NOT* ->total_vm since only resident
         memory can populate the TLB.
      2. Haswell and several other CPUs are missing from the
         intel_tlb_flushall_shift_set() function.  Thus, it has been
         demonstrated to bitrot quickly in practice.
      3. It is plain wrong in my VM:
      	[    0.037444] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
      	[    0.037444] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0
      	[    0.037444] tlb_flushall_shift: 6
         which leads it to never use invlpg.
      4. The assumptions about TLB refill costs are wrong:
      	http://lkml.kernel.org/r/1337782555-8088-3-git-send-email-alex.shi@intel.com
          (more on this in later patches)
      5. I can not reproduce the original data: https://lkml.org/lkml/2012/5/17/59
         I believe the sample times were too short.  Running the
         benchmark in a loop yields times that vary quite a bit.
      
      Note that this leaves us with a static ceiling of 1 page.  This
      is a conservative, dumb setting, and will be revised in a later
      patch.
      
      This also removes the code which attempts to predict whether we
      are flushing data or instructions.  We expect instruction flushes
      to be relatively rare and not worth tuning for explicitly.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Link: http://lkml.kernel.org/r/20140731154055.ABC88E89@viggo.jf.intel.com
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86/mm: Clean up the TLB flushing code · 4995ab9c
      Dave Hansen authored
      The
      
      	if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
      
      line of code is not exactly the easiest to audit, especially when
      it ends up at two different indentation levels.  This eliminates
      one of the copy-n-paste versions.  It also gives us a unified
      exit point for each path through this function.  We need this in
      a minute for our tracepoint.
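
      For reference, that test asks whether any CPU other than the
      current one has this mm in its cpumask, i.e. whether remote
      flushes are needed at all:

      	/* cpumask_any_but() returns a CPU number < nr_cpu_ids iff
      	 * some CPU besides ours is set in the mm's cpumask: */
      	if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
      		flush_tlb_others(mm_cpumask(mm), mm, start, end);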
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Link: http://lkml.kernel.org/r/20140731154054.44F1CDDC@viggo.jf.intel.com
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  2. 12 Jun, 2014 1 commit
  3. 08 Jun, 2014 8 commits
    • Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ed81e780
      Linus Torvalds authored
      Pull x86 vdso build fix from Peter Anvin:
       "This fixes building the vdso code on older Linux systems, and probably
        some non-Linux systems"
      
      * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86, vdso: Use <tools/le_byteshift.h> for littleendian access
    • Merge branch 'next' (accumulated 3.16 merge window patches) into master · 3f17ea6d
      Linus Torvalds authored
      Now that 3.15 is released, this merges the 'next' branch into 'master',
      bringing us to the normal situation where my 'master' branch is the
      merge window.
      
      * accumulated work in next: (6809 commits)
        ufs: sb mutex merge + mutex_destroy
        powerpc: update comments for generic idle conversion
        cris: update comments for generic idle conversion
        idle: remove cpu_idle() forward declarations
        nbd: zero from and len fields in NBD_CMD_DISCONNECT.
        mm: convert some level-less printks to pr_*
        MAINTAINERS: adi-buildroot-devel is moderated
        MAINTAINERS: add linux-api for review of API/ABI changes
        mm/kmemleak-test.c: use pr_fmt for logging
        fs/dlm/debug_fs.c: replace seq_printf by seq_puts
        fs/dlm/lockspace.c: convert simple_str to kstr
        fs/dlm/config.c: convert simple_str to kstr
        mm: mark remap_file_pages() syscall as deprecated
        mm: memcontrol: remove unnecessary memcg argument from soft limit functions
        mm: memcontrol: clean up memcg zoneinfo lookup
        mm/memblock.c: call kmemleak directly from memblock_(alloc|free)
        mm/mempool.c: update the kmemleak stack trace for mempool allocations
        lib/radix-tree.c: update the kmemleak stack trace for radix tree allocations
        mm: introduce kmemleak_update_trace()
        mm/kmemleak.c: use %u to print ->checksum
        ...
    • Linux 3.15 · 1860e379
      Linus Torvalds authored
    • Revert "x86/smpboot: Initialize secondary CPU only if master CPU will wait for it" · bb077d60
      Linus Torvalds authored
      This reverts commit 3e1a878b.
      
      It came in very late, and already has one reported failure: Sitsofe
      reports that the current tree fails to boot on his EeePC, and bisected
      it down to this.  Rather than waste time trying to figure out what's
      wrong, just revert it.
      Reported-by: Sitsofe Wheeler <sitsofe@gmail.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'clk-for-linus-3.16' of git://git.linaro.org/people/mike.turquette/linux into next · 1a5700bc
      Linus Torvalds authored
      Pull clock framework updates from Mike Turquette:
       "The clock framework changes for 3.16 are pretty typical: mostly clock
        driver additions and fixes.  There are additions to the clock core
        code for some of the basic types (e.g. the common divider type has
        some fixes and featured added to it).
      
        One minor annoyance is a last-minute dependency that wasn't handled
        quite right.  Commit ba0fae3b ("clk: berlin: add core clock driver
        for BG2/BG2CD") in this pull request depends on
        include/dt-bindings/clock/berlin2.h, which is already in your tree via
        the arm-soc pull request.  Building for the berlin platform will break
        when the clk tree is built on its own, but merged into your master
        branch everything should be fine"
      
      * tag 'clk-for-linus-3.16' of git://git.linaro.org/people/mike.turquette/linux: (75 commits)
        mmc: sunxi: Add driver for SD/MMC hosts found on Allwinner sunxi SoCs
        clk: export __clk_round_rate for providers
        clk: versatile: free icst on error return
        clk: qcom: Return error pointers for unimplemented clocks
        clk: qcom: Support msm8974pro global clock control hardware
        clk: qcom: Properly support display clocks on msm8974
        clk: qcom: Support display RCG clocks
        clk: qcom: Return highest rate when round_rate() exceeds plan
        clk: qcom: Fix mmcc-8974's PLL configurations
        clk: qcom: Fix clk_rcg2_is_enabled() check
        clk: berlin: add core clock driver for BG2Q
        clk: berlin: add core clock driver for BG2/BG2CD
        clk: berlin: add driver for BG2x complex divider cells
        clk: berlin: add driver for BG2x simple PLLs
        clk: berlin: add driver for BG2x audio/video PLL
        clk: st: Terminate of match table
        clk/exynos4: Fix compilation warning
        ARM: shmobile: r8a7779: Add clock index macros for DT sources
        clk: divider: Fix overflow in clk_divider_bestdiv
        clk: u300: Terminate of match table
        ...
    • Merge tag 'vfio-v3.16-rc1' of git://github.com/awilliam/linux-vfio into next · a68a7509
      Linus Torvalds authored
      Pull VFIO updates from Alex Williamson:
       "A handful of VFIO bug fixes for v3.16"
      
      * tag 'vfio-v3.16-rc1' of git://github.com/awilliam/linux-vfio:
        drivers/vfio/pci: Fix wrong MSI interrupt count
        drivers/vfio: Rework offsetofend()
        vfio/iommu_type1: Avoid overflow
        vfio/pci: Fix unchecked return value
        vfio/pci: Fix sizing of DPA and THP express capabilities
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6 into next · 639b4ac6
      Linus Torvalds authored
      Pull crypto updates from Herbert Xu:
       "Here is the crypto update for 3.16:
      
         - Added test vectors for SHA/AES-CCM/DES-CBC/3DES-CBC.
         - Fixed a number of error-path memory leaks in tcrypt.
         - Fixed error-path memory leak in caam.
         - Removed unnecessary global mutex from mxs-dcp.
         - Added ahash walk interface that can actually be asynchronous.
         - Cleaned up caam error reporting.
         - Allow crypto_user get operation to be used by non-root users.
         - Add support for SSS module on Exynos.
         - Misc fixes"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6: (60 commits)
        crypto: testmgr - add aead cbc des, des3_ede tests
        crypto: testmgr - Fix DMA-API warning
        crypto: cesa - tfm->__crt_alg->cra_type directly
        crypto: sahara - tfm->__crt_alg->cra_name directly
        crypto: padlock - tfm->__crt_alg->cra_name directly
        crypto: n2 - tfm->__crt_alg->cra_name directly
        crypto: dcp - tfm->__crt_alg->cra_name directly
        crypto: cesa - tfm->__crt_alg->cra_name directly
        crypto: ccp - tfm->__crt_alg->cra_name directly
        crypto: geode - Don't use tfm->__crt_alg->cra_name directly
        crypto: geode - Weed out printk() from probe()
        crypto: geode - Consistently use AES_KEYSIZE_128
        crypto: geode - Kill AES_IV_LENGTH
        crypto: geode - Kill AES_MIN_BLOCK_SIZE
        crypto: mxs-dcp - Remove global mutex
        crypto: hash - Add real ahash walk interface
        hwrng: n2-drv - Introduce the use of the managed version of kzalloc
        crypto: caam - reinitialize keys_fit_inline for decrypt and givencrypt
        crypto: s5p-sss - fix multiplatform build
        hwrng: timeriomem - remove unnecessary OOM messages
        ...
    • Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd into next · 9d2cd01b
      Linus Torvalds authored
      Pull exofs raid6 support from Boaz Harrosh:
       "These simple patches will enable raid6 using the kernel's raid6_pq
        engine for support under exofs and pnfs-objects.
      
        There is nothing needed to do at exofs and pnfs-obj.  Just fire your
        mkfs.exofs with --raid=6 (that was already supported before) and off
        you go as usual.  The ORE will pick up the new map and will start
        writing two devices of redundancy bits.  The patches are so simple
        because most of the ORE was already for the general raid case, only a
        few bug fixes were needed and the actual wiring into the raid6_pq
        engine"
      
      * 'for-linus' of git://git.open-osd.org/linux-open-osd:
        ore: Support for raid 6
        ore: Remove redundant dev_order(), more cleanups
        ore: (trivial) reformat some code
  4. 07 Jun, 2014 4 commits
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · c593e897
      Linus Torvalds authored
      Pull btrfs fix from Chris Mason:
       "I had this in my 3.16 merge window queue, but it is small and obvious
        enough for 3.15.  I cherry-picked and retested against current rc8"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        Btrfs: send, fix corrupted path strings for long paths
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending · 052e5c7e
      Linus Torvalds authored
      Pull SCSI target fixes from Nicholas Bellinger:
       "Here are the remaining fixes for v3.15.
      
        This series includes:
      
         - iser-target fix for ImmediateData exception reference count bug
           (Sagi + nab)
         - iscsi-target fix for MC/S login + potential iser-target MRDSL
           buffer overrun (Santosh + Roland)
         - iser-target fix for v3.15-rc multi network portal shutdown
           regression (nab)
         - target fix for allowing READ_CAPACITY opcode during ALUA Standby
           state (Chris + nab)
         - target fix for NULL pointer dereference of alua_access_state for
           un-configured devices (Chris + nab)"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
        target: Fix alua_access_state attribute OOPs for un-configured devices
        target: Allow READ_CAPACITY opcode in ALUA Standby access state
        iser-target: Fix multi network portal shutdown regression
        iscsi-target: Fix wrong buffer / buffer overrun in iscsi_change_param_value()
        iser-target: Add missing target_put_sess_cmd for ImmedateData failure
    • Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 813895f8
      Linus Torvalds authored
      Pull x86 fixes from Peter Anvin:
       "A significantly larger than I'd like set of patches for just below the
        wire.  All of these, however, fix real problems.
      
        The one thing that is genuinely scary in here is the change of SMP
        initialization, but that *does* fix a confirmed hang when booting
        virtual machines.
      
        There is also a patch to actually do the right thing about not
        offlining a CPU when there are not enough interrupt vectors available
        in the system; the accounting was done incorrectly.  The worst case
        for that patch is that we fail to offline CPUs when we should (the new
        code is strictly more conservative than the old), so is not
        particularly risky.
      
        Most of the rest is minor stuff; the EFI patches are all about
        exporting correct information to boot loaders and kexec"
      
      * 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/boot: EFI_MIXED should not prohibit loading above 4G
        x86/smpboot: Initialize secondary CPU only if master CPU will wait for it
        x86/smpboot: Log error on secondary CPU wakeup failure at ERR level
        x86: Fix list/memory corruption on CPU hotplug
        x86: irq: Get correct available vectors for cpu disable
        x86/efi: Do not export efi runtime map in case old map
        x86/efi: earlyprintk=efi,keep fix
    • x86/boot: EFI_MIXED should not prohibit loading above 4G · 745c5167
      Matt Fleming authored
      commit 7d453eee ("x86/efi: Wire up CONFIG_EFI_MIXED") introduced a
      regression for the functionality to load kernels above 4G. The relevant
      (incorrect) reasoning behind this change can be seen in the commit
      message,
      
        "The xloadflags field in the bzImage header is also updated to reflect
        that the kernel supports both entry points by setting both of
        XLF_EFI_HANDOVER_32 and XLF_EFI_HANDOVER_64 when CONFIG_EFI_MIXED=y.
        XLF_CAN_BE_LOADED_ABOVE_4G is disabled so that the kernel text is
        guaranteed to be addressable with 32-bits."
      
      This is obviously bogus since 32-bit EFI loaders will never place the
      kernel above the 4G mark. So this restriction is entirely unnecessary.
      
      But things are worse than that - since we want to encourage people to
      always compile with CONFIG_EFI_MIXED=y so that their kernels work out of
      the box for both 32-bit and 64-bit firmware, commit 7d453eee
      effectively disables XLF_CAN_BE_LOADED_ABOVE_4G completely.
      
      Remove the overzealous and superfluous restriction and restore the
      XLF_CAN_BE_LOADED_ABOVE_4G functionality.
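
      For reference, the xloadflags bits involved, as defined in
      arch/x86/include/uapi/asm/bootparam.h at the time:

      	#define XLF_KERNEL_64			(1<<0)
      	#define XLF_CAN_BE_LOADED_ABOVE_4G	(1<<1)
      	#define XLF_EFI_HANDOVER_32		(1<<2)
      	#define XLF_EFI_HANDOVER_64		(1<<3)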
      
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Link: http://lkml.kernel.org/r/1402140380-15377-1-git-send-email-matt@console-pimps.org
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  5. 06 Jun, 2014 20 commits