1. 21 Jul, 2017 8 commits
    • Linus Torvalds's avatar
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5a77f025
      Linus Torvalds authored
      Pull scheduler fixes from Ingo Molnar:
       "A cputime fix and code comments/organization fix to the deadline
        scheduler"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/deadline: Fix confusing comments about selection of top pi-waiter
        sched/cputime: Don't use smp_processor_id() in preemptible context
      5a77f025
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · bbcdea65
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "Two hw-enablement patches, two race fixes, three fixes for regressions
        of semantics, plus a number of tooling fixes"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/intel: Add proper condition to run sched_task callbacks
        perf/core: Fix locking for children siblings group read
        perf/core: Fix scheduling regression of pinned groups
        perf/x86/intel: Fix debug_store reset field for freq events
        perf/x86/intel: Add Goldmont Plus CPU PMU support
        perf/x86/intel: Enable C-state residency events for Apollo Lake
        perf symbols: Accept zero as the kernel base address
        Revert "perf/core: Drop kernel samples even though :u is specified"
        perf annotate: Fix broken arrow at row 0 connecting jmp instruction to its target
        perf evsel: State in the default event name if attr.exclude_kernel is set
        perf evsel: Fix attr.exclude_kernel setting for default cycles:p
      bbcdea65
    • Linus Torvalds's avatar
      Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8b810a3a
      Linus Torvalds authored
      Pull locking fixlet from Ingo Molnar:
       "Remove an unnecessary priority adjustment in the rtmutex code"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/rtmutex: Remove unnecessary priority adjustment
      8b810a3a
    • Linus Torvalds's avatar
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 34eddefe
      Linus Torvalds authored
      Pull irq fixes from Ingo Molnar:
       "A resume_irq() fix, plus a number of static declaration fixes"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/digicolor: Drop unnecessary static
        irqchip/mips-cpu: Drop unnecessary static
        irqchip/gic/realview: Drop unnecessary static
        irqchip/mips-gic: Remove population of irq domain names
        genirq/PM: Properly pretend disabled state when force resuming interrupts
      34eddefe
    • Linus Torvalds's avatar
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0a6109fd
      Linus Torvalds authored
      Pull core fixes from Ingo Molnar:
       "A fix to WARN_ON_ONCE() done by modules, plus a MAINTAINERS update"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        debug: Fix WARN_ON_ONCE() for modules
        MAINTAINERS: Update the PTRACE entry
      0a6109fd
    • Jiri Olsa's avatar
      perf/x86/intel: Add proper condition to run sched_task callbacks · df6c3db8
      Jiri Olsa authored
      We have 2 functions using the same sched_task callback:
      
        - PEBS drain for free running counters
        - LBR save/store
      
      Both of them are called from intel_pmu_sched_task() and
      either of them can be unwillingly triggered when the
      other one is configured to run.
      
      Let's say there's PEBS drain configured in sched_task
      callback for the event, but in the callback itself
      (intel_pmu_sched_task()) we will also run the code for
      LBR save/restore, which we did not ask for, but the
      code in intel_pmu_sched_task() does not check for that.
      
      This can lead to extra cycles in some perf monitoring,
      like when we monitor PEBS event without LBR data.
      
        # perf record --no-timestamp -c 10000 -e cycles:p ./perf bench sched pipe -l 1000000
      
        (We need PEBS, non freq/non timestamp event to enable
         the sched_task callback)
      
      The perf stat of cycles and msr:write_msr for above
      command before the change:
        ...
        Performance counter stats for './perf record --no-timestamp -c 10000 -e cycles:p \
                                       ./perf bench sched pipe -l 1000000' (5 runs):
      
          18,519,557,441      cycles:k
              91,195,527      msr:write_msr
      
            29.334476406 seconds time elapsed
      
      And after the change:
        ...
        Performance counter stats for './perf record --no-timestamp -c 10000 -e cycles:p \
                                       ./perf bench sched pipe -l 1000000' (5 runs):
      
          18,704,973,540      cycles:k
              27,184,720      msr:write_msr
      
            16.977875900 seconds time elapsed
      
      There's no affect on cycles:k because the sched_task happens
      with events switched off, however the msr:write_msr tracepoint
      counter together with almost 50% of time speedup show the
      improvement.
      
      Monitoring LBR event and having extra PEBS drain processing
      in sched_task callback showed just a little speedup, because
      the drain function does not do much extra work in case there
      is no PEBS data.
      
      Adding conditions to recognize the configured work that needs
      to be done in the x86_pmu's sched_task callback.
      Suggested-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170719075247.GA27506@kravaSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      df6c3db8
    • Jiri Olsa's avatar
      perf/core: Fix locking for children siblings group read · 2aeb1883
      Jiri Olsa authored
      We're missing ctx lock when iterating children siblings
      within the perf_read path for group reading. Following
      race and crash can happen:
      
      User space doing read syscall on event group leader:
      
      T1:
        perf_read
          lock event->ctx->mutex
          perf_read_group
            lock leader->child_mutex
            __perf_read_group_add(child)
              list_for_each_entry(sub, &leader->sibling_list, group_entry)
      
      ---->   sub might be invalid at this point, because it could
              get removed via perf_event_exit_task_context in T2
      
      Child exiting and cleaning up its events:
      
      T2:
        perf_event_exit_task_context
          lock ctx->mutex
          list_for_each_entry_safe(child_event, next, &child_ctx->event_list,...
            perf_event_exit_event(child)
              lock ctx->lock
              perf_group_detach(child)
              unlock ctx->lock
      
      ---->   child is removed from sibling_list without any sync
              with T1 path above
      
              ...
              free_event(child)
      
      Before the child is removed from the leader's child_list,
      (and thus is omitted from perf_read_group processing), we
      need to ensure that perf_read_group touches child's
      siblings under its ctx->lock.
      
      Peter further notes:
      
      | One additional note; this bug got exposed by commit:
      |
      |   ba5213ae ("perf/core: Correct event creation with PERF_FORMAT_GROUP")
      |
      | which made it possible to actually trigger this code-path.
      Tested-by: default avatarAndi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: ba5213ae ("perf/core: Correct event creation with PERF_FORMAT_GROUP")
      Link: http://lkml.kernel.org/r/20170720141455.2106-1-jolsa@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      2aeb1883
    • Arnd Bergmann's avatar
      ide: avoid warning for timings calculation · 921edf31
      Arnd Bergmann authored
      gcc-7 warns about the result of a constant multiplication used as
      a boolean:
      
      drivers/ide/ide-timings.c: In function 'ide_timing_quantize':
      drivers/ide/ide-timings.c:112:24: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context]
        q->setup   = EZ(t->setup   * 1000,  T);
      
      This slightly rearranges the macro to simplify the code and avoid
      the warning at the same time.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      921edf31
  2. 20 Jul, 2017 28 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 96080f69
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) BPF verifier signed/unsigned value tracking fix, from Daniel
          Borkmann, Edward Cree, and Josef Bacik.
      
       2) Fix memory allocation length when setting up calls to
          ->ndo_set_mac_address, from Cong Wang.
      
       3) Add a new cxgb4 device ID, from Ganesh Goudar.
      
       4) Fix FIB refcount handling, we have to set it's initial value before
          the configure callback (which can bump it). From David Ahern.
      
       5) Fix double-free in qcom/emac driver, from Timur Tabi.
      
       6) A bunch of gcc-7 string format overflow warning fixes from Arnd
          Bergmann.
      
       7) Fix link level headroom tests in ip_do_fragment(), from Vasily
          Averin.
      
       8) Fix chunk walking in SCTP when iterating over error and parameter
          headers. From Alexander Potapenko.
      
       9) TCP BBR congestion control fixes from Neal Cardwell.
      
      10) Fix SKB fragment handling in bcmgenet driver, from Doug Berger.
      
      11) BPF_CGROUP_RUN_PROG_SOCK_OPS needs to check for null __sk, from Cong
          Wang.
      
      12) xmit_recursion in ppp driver needs to be per-device not per-cpu,
          from Gao Feng.
      
      13) Cannot release skb->dst in UDP if IP options processing needs it.
          From Paolo Abeni.
      
      14) Some netdev ioctl ifr_name[] NULL termination fixes. From Alexander
          Levin and myself.
      
      15) Revert some rtnetlink notification changes that are causing
          regressions, from David Ahern.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits)
        net: bonding: Fix transmit load balancing in balance-alb mode
        rds: Make sure updates to cp_send_gen can be observed
        net: ethernet: ti: cpsw: Push the request_irq function to the end of probe
        ipv4: initialize fib_trie prior to register_netdev_notifier call.
        rtnetlink: allocate more memory for dev_set_mac_address()
        net: dsa: b53: Add missing ARL entries for BCM53125
        bpf: more tests for mixed signed and unsigned bounds checks
        bpf: add test for mixed signed and unsigned bounds checks
        bpf: fix up test cases with mixed signed/unsigned bounds
        bpf: allow to specify log level and reduce it for test_verifier
        bpf: fix mixed signed/unsigned derived min/max value bounds
        ipv6: avoid overflow of offset in ip6_find_1stfragopt
        net: tehuti: don't process data if it has not been copied from userspace
        Revert "rtnetlink: Do not generate notifications for CHANGEADDR event"
        net: dsa: mv88e6xxx: Enable CMODE config support for 6390X
        dt-binding: ptp: Add SoC compatibility strings for dte ptp clock
        NET: dwmac: Make dwmac reset unconditional
        net: Zero terminate ifr_name in dev_ifname().
        wireless: wext: terminate ifr name coming from userspace
        netfilter: fix netfilter_net_init() return
        ...
      96080f69
    • Kosuke Tatsukawa's avatar
      net: bonding: Fix transmit load balancing in balance-alb mode · cbf5ecb3
      Kosuke Tatsukawa authored
      balance-alb mode used to have transmit dynamic load balancing feature
      enabled by default.  However, transmit dynamic load balancing no longer
      works in balance-alb after commit 8b426dc5 ("bonding: remove
      hardcoded value").
      
      Both balance-tlb and balance-alb use the function bond_do_alb_xmit() to
      send packets.  This function uses the parameter tlb_dynamic_lb.
      tlb_dynamic_lb used to have the default value of 1 for balance-alb, but
      now the value is set to 0 except in balance-tlb.
      
      Re-enable transmit dyanmic load balancing by initializing tlb_dynamic_lb
      for balance-alb similar to balance-tlb.
      
      Fixes: 8b426dc5 ("bonding: remove hardcoded value")
      Signed-off-by: default avatarKosuke Tatsukawa <tatsu@ab.jp.nec.com>
      Acked-by: default avatarAndy Gospodarek <andy@greyhouse.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cbf5ecb3
    • Håkon Bugge's avatar
      rds: Make sure updates to cp_send_gen can be observed · e623a48e
      Håkon Bugge authored
      cp->cp_send_gen is treated as a normal variable, although it may be
      used by different threads.
      
      This is fixed by using {READ,WRITE}_ONCE when it is incremented and
      READ_ONCE when it is read outside the {acquire,release}_in_xmit
      protection.
      
      Normative reference from the Linux-Kernel Memory Model:
      
          Loads from and stores to shared (but non-atomic) variables should
          be protected with the READ_ONCE(), WRITE_ONCE(), and
          ACCESS_ONCE().
      
      Clause 5.1.2.4/25 in the C standard is also relevant.
      Signed-off-by: default avatarHåkon Bugge <haakon.bugge@oracle.com>
      Reviewed-by: default avatarKnut Omang <knut.omang@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e623a48e
    • Keerthy's avatar
      net: ethernet: ti: cpsw: Push the request_irq function to the end of probe · 070f9c65
      Keerthy authored
      Push the request_irq function to the end of probe so as
      to ensure all the required fields are populated in the event
      of an ISR getting executed right after requesting the irq.
      
      Currently while loading the crash kernel a crash was seen as
      soon as devm_request_threaded_irq was called. This was due to
      n->poll being NULL which is called as part of net_rx_action
      function.
      Suggested-by: default avatarSekhar Nori <nsekhar@ti.com>
      Signed-off-by: default avatarKeerthy <j-keerthy@ti.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      070f9c65
    • Mahesh Bandewar's avatar
      ipv4: initialize fib_trie prior to register_netdev_notifier call. · 8799a221
      Mahesh Bandewar authored
      Net stack initialization currently initializes fib-trie after the
      first call to netdevice_notifier() call. In fact fib_trie initialization
      needs to happen before first rtnl_register(). It does not cause any problem
      since there are no devices UP at this moment, but trying to bring 'lo'
      UP at initialization would make this assumption wrong and exposes the issue.
      
      Fixes following crash
      
       Call Trace:
        ? alternate_node_alloc+0x76/0xa0
        fib_table_insert+0x1b7/0x4b0
        fib_magic.isra.17+0xea/0x120
        fib_add_ifaddr+0x7b/0x190
        fib_netdev_event+0xc0/0x130
        register_netdevice_notifier+0x1c1/0x1d0
        ip_fib_init+0x72/0x85
        ip_rt_init+0x187/0x1e9
        ip_init+0xe/0x1a
        inet_init+0x171/0x26c
        ? ipv4_offload_init+0x66/0x66
        do_one_initcall+0x43/0x160
        kernel_init_freeable+0x191/0x219
        ? rest_init+0x80/0x80
        kernel_init+0xe/0x150
        ret_from_fork+0x22/0x30
       Code: f6 46 23 04 74 86 4c 89 f7 e8 ae 45 01 00 49 89 c7 4d 85 ff 0f 85 7b ff ff ff 31 db eb 08 4c 89 ff e8 16 47 01 00 48 8b 44 24 38 <45> 8b 6e 14 4d 63 76 74 48 89 04 24 0f 1f 44 00 00 48 83 c4 08
       RIP: kmem_cache_alloc+0xcf/0x1c0 RSP: ffff9b1500017c28
       CR2: 0000000000000014
      
      Fixes: 7b1a74fd ("[NETNS]: Refactor fib initialization so it can handle multiple namespaces.")
      Fixes: 7f9b8052 ("[IPV4]: fib hash|trie initialization")
      Signed-off-by: default avatarMahesh Bandewar <maheshb@google.com>
      Acked-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8799a221
    • WANG Cong's avatar
      rtnetlink: allocate more memory for dev_set_mac_address() · 153711f9
      WANG Cong authored
      virtnet_set_mac_address() interprets mac address as struct
      sockaddr, but upper layer only allocates dev->addr_len
      which is ETH_ALEN + sizeof(sa_family_t) in this case.
      
      We lack a unified definition for mac address, so just fix
      the upper layer, this also allows drivers to interpret it
      to struct sockaddr freely.
      Reported-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      153711f9
    • Florian Fainelli's avatar
      net: dsa: b53: Add missing ARL entries for BCM53125 · be35e8c5
      Florian Fainelli authored
      The BCM53125 entry was missing an arl_entries member which would
      basically prevent the ARL search from terminating properly. This switch
      has 4 ARL entries, so add that.
      
      Fixes: 1da6df85 ("net: dsa: b53: Implement ARL add/del/dump operations")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: default avatarVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      be35e8c5
    • David S. Miller's avatar
      Merge branch 'BPF-map-value-adjust-fix' · 5067f4cf
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      BPF map value adjust fix
      
      First patch in the series is the actual fix and the remaining
      patches are just updates to selftests.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5067f4cf
    • Daniel Borkmann's avatar
      bpf: more tests for mixed signed and unsigned bounds checks · 86412502
      Daniel Borkmann authored
      Add a couple of more test cases to BPF selftests that are related
      to mixed signed and unsigned checks.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      86412502
    • Edward Cree's avatar
      bpf: add test for mixed signed and unsigned bounds checks · b712296a
      Edward Cree authored
      These failed due to a bug in verifier bounds handling.
      Signed-off-by: default avatarEdward Cree <ecree@solarflare.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b712296a
    • Daniel Borkmann's avatar
      bpf: fix up test cases with mixed signed/unsigned bounds · a1502132
      Daniel Borkmann authored
      Fix the few existing test cases that used mixed signed/unsigned
      bounds and switch them only to one flavor. Reason why we need this
      is that proper boundaries cannot be derived from mixed tests.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a1502132
    • Daniel Borkmann's avatar
      bpf: allow to specify log level and reduce it for test_verifier · d6554904
      Daniel Borkmann authored
      For the test_verifier case, it's quite hard to parse log level 2 to
      figure out what's causing an issue when used to log level 1. We do
      want to use bpf_verify_program() in order to simulate some of the
      tests with strict alignment. So just add an argument to pass the level
      and put it to 1 for test_verifier.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d6554904
    • Daniel Borkmann's avatar
      bpf: fix mixed signed/unsigned derived min/max value bounds · 4cabc5b1
      Daniel Borkmann authored
      Edward reported that there's an issue in min/max value bounds
      tracking when signed and unsigned compares both provide hints
      on limits when having unknown variables. E.g. a program such
      as the following should have been rejected:
      
         0: (7a) *(u64 *)(r10 -8) = 0
         1: (bf) r2 = r10
         2: (07) r2 += -8
         3: (18) r1 = 0xffff8a94cda93400
         5: (85) call bpf_map_lookup_elem#1
         6: (15) if r0 == 0x0 goto pc+7
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
         7: (7a) *(u64 *)(r10 -16) = -8
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = -1
        10: (2d) if r1 > r2 goto pc+3
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0
        R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
        11: (65) if r1 s> 0x1 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0,max_value=1
        R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
        12: (0f) r0 += r1
        13: (72) *(u8 *)(r0 +0) = 0
        R0=map_value_adj(ks=8,vs=8,id=0),min_value=0,max_value=1 R1=inv,min_value=0,max_value=1
        R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
        14: (b7) r0 = 0
        15: (95) exit
      
      What happens is that in the first part ...
      
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = -1
        10: (2d) if r1 > r2 goto pc+3
      
      ... r1 carries an unsigned value, and is compared as unsigned
      against a register carrying an immediate. Verifier deduces in
      reg_set_min_max() that since the compare is unsigned and operation
      is greater than (>), that in the fall-through/false case, r1's
      minimum bound must be 0 and maximum bound must be r2. Latter is
      larger than the bound and thus max value is reset back to being
      'invalid' aka BPF_REGISTER_MAX_RANGE. Thus, r1 state is now
      'R1=inv,min_value=0'. The subsequent test ...
      
        11: (65) if r1 s> 0x1 goto pc+2
      
      ... is a signed compare of r1 with immediate value 1. Here,
      verifier deduces in reg_set_min_max() that since the compare
      is signed this time and operation is greater than (>), that
      in the fall-through/false case, we can deduce that r1's maximum
      bound must be 1, meaning with prior test, we result in r1 having
      the following state: R1=inv,min_value=0,max_value=1. Given that
      the actual value this holds is -8, the bounds are wrongly deduced.
      When this is being added to r0 which holds the map_value(_adj)
      type, then subsequent store access in above case will go through
      check_mem_access() which invokes check_map_access_adj(), that
      will then probe whether the map memory is in bounds based
      on the min_value and max_value as well as access size since
      the actual unknown value is min_value <= x <= max_value; commit
      fce366a9 ("bpf, verifier: fix alu ops against map_value{,
      _adj} register types") provides some more explanation on the
      semantics.
      
      It's worth to note in this context that in the current code,
      min_value and max_value tracking are used for two things, i)
      dynamic map value access via check_map_access_adj() and since
      commit 06c1c049 ("bpf: allow helpers access to variable memory")
      ii) also enforced at check_helper_mem_access() when passing a
      memory address (pointer to packet, map value, stack) and length
      pair to a helper and the length in this case is an unknown value
      defining an access range through min_value/max_value in that
      case. The min_value/max_value tracking is /not/ used in the
      direct packet access case to track ranges. However, the issue
      also affects case ii), for example, the following crafted program
      based on the same principle must be rejected as well:
      
         0: (b7) r2 = 0
         1: (bf) r3 = r10
         2: (07) r3 += -512
         3: (7a) *(u64 *)(r10 -16) = -8
         4: (79) r4 = *(u64 *)(r10 -16)
         5: (b7) r6 = -1
         6: (2d) if r4 > r6 goto pc+5
        R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512
        R4=inv,min_value=0 R6=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
         7: (65) if r4 s> 0x1 goto pc+4
        R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512
        R4=inv,min_value=0,max_value=1 R6=imm-1,max_value=18446744073709551615,min_align=1
        R10=fp
         8: (07) r4 += 1
         9: (b7) r5 = 0
        10: (6a) *(u16 *)(r10 -512) = 0
        11: (85) call bpf_skb_load_bytes#26
        12: (b7) r0 = 0
        13: (95) exit
      
      Meaning, while we initialize the max_value stack slot that the
      verifier thinks we access in the [1,2] range, in reality we
      pass -7 as length which is interpreted as u32 in the helper.
      Thus, this issue is relevant also for the case of helper ranges.
      Resetting both bounds in check_reg_overflow() in case only one
      of them exceeds limits is also not enough as similar test can be
      created that uses values which are within range, thus also here
      learned min value in r1 is incorrect when mixed with later signed
      test to create a range:
      
         0: (7a) *(u64 *)(r10 -8) = 0
         1: (bf) r2 = r10
         2: (07) r2 += -8
         3: (18) r1 = 0xffff880ad081fa00
         5: (85) call bpf_map_lookup_elem#1
         6: (15) if r0 == 0x0 goto pc+7
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
         7: (7a) *(u64 *)(r10 -16) = -8
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = 2
        10: (3d) if r2 >= r1 goto pc+3
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        11: (65) if r1 s> 0x4 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0
        R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        12: (0f) r0 += r1
        13: (72) *(u8 *)(r0 +0) = 0
        R0=map_value_adj(ks=8,vs=8,id=0),min_value=3,max_value=4
        R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        14: (b7) r0 = 0
        15: (95) exit
      
      This leaves us with two options for fixing this: i) to invalidate
      all prior learned information once we switch signed context, ii)
      to track min/max signed and unsigned boundaries separately as
      done in [0]. (Given latter introduces major changes throughout
      the whole verifier, it's rather net-next material, thus this
      patch follows option i), meaning we can derive bounds either
      from only signed tests or only unsigned tests.) There is still the
      case of adjust_reg_min_max_vals(), where we adjust bounds on ALU
      operations, meaning programs like the following where boundaries
      on the reg get mixed in context later on when bounds are merged
      on the dst reg must get rejected, too:
      
         0: (7a) *(u64 *)(r10 -8) = 0
         1: (bf) r2 = r10
         2: (07) r2 += -8
         3: (18) r1 = 0xffff89b2bf87ce00
         5: (85) call bpf_map_lookup_elem#1
         6: (15) if r0 == 0x0 goto pc+6
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
         7: (7a) *(u64 *)(r10 -16) = -8
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = 2
        10: (3d) if r2 >= r1 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        11: (b7) r7 = 1
        12: (65) if r7 s> 0x0 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,max_value=0 R10=fp
        13: (b7) r0 = 0
        14: (95) exit
      
        from 12 to 15: R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0
        R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,min_value=1 R10=fp
        15: (0f) r7 += r1
        16: (65) if r7 s> 0x4 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp
        17: (0f) r0 += r7
        18: (72) *(u8 *)(r0 +0) = 0
        R0=map_value_adj(ks=8,vs=8,id=0),min_value=4,max_value=4 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp
        19: (b7) r0 = 0
        20: (95) exit
      
      Meaning, in adjust_reg_min_max_vals() we must also reset range
      values on the dst when src/dst registers have mixed signed/
      unsigned derived min/max value bounds with one unbounded value
      as otherwise they can be added together deducing false boundaries.
      Once both boundaries are established from either ALU ops or
      compare operations w/o mixing signed/unsigned insns, then they
      can safely be added to other regs also having both boundaries
      established. Adding regs with one unbounded side to a map value
      where the bounded side has been learned w/o mixing ops is
      possible, but the resulting map value won't recover from that,
      meaning such op is considered invalid on the time of actual
      access. Invalid bounds are set on the dst reg in case i) src reg,
      or ii) in case dst reg already had them. The only way to recover
      would be to perform i) ALU ops but only 'add' is allowed on map
      value types or ii) comparisons, but these are disallowed on
      pointers in case they span a range. This is fine as only BPF_JEQ
      and BPF_JNE may be performed on PTR_TO_MAP_VALUE_OR_NULL registers
      which potentially turn them into PTR_TO_MAP_VALUE type depending
      on the branch, so only here min/max value cannot be invalidated
      for them.
      
      In terms of state pruning, value_from_signed is considered
      as well in states_equal() when dealing with adjusted map values.
      With regards to breaking existing programs, there is a small
      risk, but use-cases are rather quite narrow where this could
      occur and mixing compares probably unlikely.
      
      Joint work with Josef and Edward.
      
        [0] https://lists.iovisor.org/pipermail/iovisor-dev/2017-June/000822.html
      
      Fixes: 48461135 ("bpf: allow access into map value arrays")
      Reported-by: default avatarEdward Cree <ecree@solarflare.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarEdward Cree <ecree@solarflare.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4cabc5b1
    • Linus Torvalds's avatar
      Merge tag 'pm-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 63a86362
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "These are two stable-candidate fixes for the intel_pstate driver and
        the generic power domains (genpd) framework.
      
        Specifics:
      
         - Fix the average CPU load computations in the intel_pstate driver on
           Knights Landing (Xeon Phi) processors that require an extra factor
           to compensate for a rate change differences between the TSC and
           MPERF which is missing (Srinivas Pandruvada).
      
         - Fix an initialization ordering issue in the generic power domains
           (genpd) framework (Sudeep Holla)"
      
      * tag 'pm-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM / Domains: defer dev_pm_domain_set() until genpd->attach_dev succeeds if present
        cpufreq: intel_pstate: Correct the busy calculation for KNL
      63a86362
    • Linus Torvalds's avatar
      x86: mark kprobe templates as character arrays, not single characters · 54a7d50b
      Linus Torvalds authored
      They really are, and the "take the address of a single character" makes
      the string fortification code unhappy (it believes that you can now only
      acccess one byte, rather than a byte range, and then raises errors for
      the memory copies going on in there).
      
      We could now remove a few 'addressof' operators (since arrays naturally
      degrade to pointers), but this is the minimal patch that just changes
      the C prototypes of those template arrays (the templates themselves are
      defined in inline asm).
      Reported-by: default avatarkernel test robot <xiaolong.ye@intel.com>
      Acked-and-tested-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      54a7d50b
    • Linus Torvalds's avatar
      Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · 791f2df3
      Linus Torvalds authored
      Pull misc filesystem fixes from Jan Kara:
       "Several ACL related fixes for ext2, reiserfs, and hfsplus.
      
        And also one minor isofs cleanup"
      
      * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        hfsplus: Don't clear SGID when inheriting ACLs
        isofs: Fix off-by-one in 'session' mount option parsing
        reiserfs: preserve i_mode if __reiserfs_set_acl() fails
        ext2: preserve i_mode if ext2_set_acl() fails
        ext2: Don't clear SGID when inheriting ACLs
        reiserfs: Don't clear SGID when inheriting ACLs
      791f2df3
    • Linus Torvalds's avatar
      Merge tag 'for-f2fs-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs · 465b0dbb
      Linus Torvalds authored
      Pull f2fs fixes from Jaegeuk Kim:
       "We've filed some bug fixes:
      
         - missing f2fs case in terms of stale SGID bit, introduced by Jan
      
         - build error for seq_file.h
      
         - avoid cpu lockup
      
         - wrong inode_unlock in error case"
      
      * tag 'for-f2fs-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
        f2fs: avoid cpu lockup
        f2fs: include seq_file.h for sysfs.c
        f2fs: Don't clear SGID when inheriting ACLs
        f2fs: remove extra inode_unlock() in error path
      465b0dbb
    • Linus Torvalds's avatar
      Merge branch 'stable-4.13' of git://git.infradead.org/users/pcmoore/audit · f58781c9
      Linus Torvalds authored
      Pull audit fix from Paul Moore:
       "A small audit fix, just a single line, to plug a memory leak in some
        audit error handling code"
      
      * 'stable-4.13' of git://git.infradead.org/users/pcmoore/audit:
        audit: fix memleak in auditd_send_unicast_skb.
      f58781c9
    • Linus Torvalds's avatar
      Merge tag 'libnvdimm-fixes-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · ae1c9085
      Linus Torvalds authored
      Pull libnvdimm fixes from Dan Williams:
       "A handful of small fixes for 4.13-rc2. Three of these fixes are tagged
        for -stable. They have all appeared in at least one -next release with
        no reported issues
      
         - Fix handling of media errors that span a sector
      
         - Fix support of multiple namespaces in a libnvdimm region being in
           device-dax mode
      
         - Clean up the machine check notifier properly when the nfit driver
           fails to register
      
         - Address a static analysis (smatch) report in device-dax"
      
      * tag 'libnvdimm-fixes-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        device-dax: fix sysfs duplicate warnings
        MAINTAINERS: list drivers/acpi/nfit/ files for libnvdimm sub-system
        acpi/nfit: Fix memory corruption/Unregister mce decoder on failure
        device-dax: fix 'passing zero to ERR_PTR()' warning
        libnvdimm: fix badblock range handling of ARS range
      ae1c9085
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · c6efb454
      Linus Torvalds authored
      Pull HID fixes from Jiri Kosina:
      
       - HID multitouch 4.12 regression fix from Dmitry Torokhov
      
       - error handling fix for HID++ driver from Gustavo A. R. Silva
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
        HID: hid-logitech-hidpp: add NULL check on devm_kmemdup() return value
        HID: multitouch: do not blindly set EV_KEY or EV_ABS bits
      c6efb454
    • Rafael J. Wysocki's avatar
      Merge branches 'intel_pstate' and 'pm-domains' · ffa64d5e
      Rafael J. Wysocki authored
      * intel_pstate:
        cpufreq: intel_pstate: Correct the busy calculation for KNL
      
      * pm-domains:
        PM / Domains: defer dev_pm_domain_set() until genpd->attach_dev succeeds if present
      ffa64d5e
    • Gustavo A. R. Silva's avatar
      HID: hid-logitech-hidpp: add NULL check on devm_kmemdup() return value · 929b60a8
      Gustavo A. R. Silva authored
      Check return value from call to devm_kmemdup() in order to prevent a NULL
      pointer dereference.
      Signed-off-by: default avatarGustavo A. R. Silva <garsilva@embeddedor.com>
      Reviewed-by: default avatarBenjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      929b60a8
    • Josh Poimboeuf's avatar
      debug: Fix WARN_ON_ONCE() for modules · 325cdacd
      Josh Poimboeuf authored
      Mike Galbraith reported a situation where a WARN_ON_ONCE() call in DRM
      code turned into an oops.  As it turns out, WARN_ON_ONCE() seems to be
      completely broken when called from a module.
      
      The bug was introduced with the following commit:
      
        19d43626 ("debug: Add _ONCE() logic to report_bug()")
      
      That commit changed WARN_ON_ONCE() to move its 'once' logic into the bug
      trap handler.  It requires a writable bug table so that the BUGFLAG_DONE
      bit can be written to the flags to indicate the first warning has
      occurred.
      
      The bug table was made writable for vmlinux, which relies on
      vmlinux.lds.S and vmlinux.lds.h for laying out the sections.  However,
      it wasn't made writable for modules, which rely on the ELF section
      header flags.
      Reported-by: default avatarMike Galbraith <efault@gmx.de>
      Tested-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 19d43626 ("debug: Add _ONCE() logic to report_bug()")
      Link: http://lkml.kernel.org/r/a53b04235a65478dd9afc51f5b329fdc65c84364.1500095401.git.jpoimboe@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      325cdacd
    • Alexander Shishkin's avatar
      perf/core: Fix scheduling regression of pinned groups · 3bda69c1
      Alexander Shishkin authored
      Vince Weaver reported:
      
      > I was tracking down some regressions in my perf_event_test testsuite.
      > Some of the tests broke in the 4.11-rc1 timeframe.
      >
      > I've bisected one of them, this report is about
      >	tests/overflow/simul_oneshot_group_overflow
      > This test creates an event group containing two sampling events, set
      > to overflow to a signal handler (which disables and then refreshes the
      > event).
      >
      > On a good kernel you get the following:
      > 	Event perf::instructions with period 1000000
      > 	Event perf::instructions with period 2000000
      > 		fd 3 overflows: 946 (perf::instructions/1000000)
      > 		fd 4 overflows: 473 (perf::instructions/2000000)
      > 	Ending counts:
      > 		Count 0: 946379875
      > 		Count 1: 946365218
      >
      > With the broken kernels you get:
      > 	Event perf::instructions with period 1000000
      > 	Event perf::instructions with period 2000000
      > 		fd 3 overflows: 938 (perf::instructions/1000000)
      > 		fd 4 overflows: 318 (perf::instructions/2000000)
      > 	Ending counts:
      > 		Count 0: 946373080
      > 		Count 1: 653373058
      
      The root cause of the bug is that the following commit:
      
        487f05e1 ("perf/core: Optimize event rescheduling on active contexts")
      
      erronously assumed that event's 'pinned' setting determines whether the
      event belongs to a pinned group or not, but in fact, it's the group
      leader's pinned state that matters.
      
      This was discovered by Vince in the test case described above, where two instruction
      counters are grouped, the group leader is pinned, but the other event is not;
      in the regressed case the counters were off by 33% (the difference between events'
      periods), but should be the same within the error margin.
      
      Fix the problem by looking at the group leader's pinning.
      Reported-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Tested-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Fixes: 487f05e1 ("perf/core: Optimize event rescheduling on active contexts")
      Link: http://lkml.kernel.org/r/87lgnmvw7h.fsf@ashishki-desk.ger.corp.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      3bda69c1
    • Sabrina Dubroca's avatar
      ipv6: avoid overflow of offset in ip6_find_1stfragopt · 6399f1fa
      Sabrina Dubroca authored
      In some cases, offset can overflow and can cause an infinite loop in
      ip6_find_1stfragopt(). Make it unsigned int to prevent the overflow, and
      cap it at IPV6_MAXPLEN, since packets larger than that should be invalid.
      
      This problem has been here since before the beginning of git history.
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6399f1fa
    • Colin Ian King's avatar
      net: tehuti: don't process data if it has not been copied from userspace · 1e6c22ae
      Colin Ian King authored
      The array data is only populated with valid information from userspace
      if cmd != SIOCDEVPRIVATE, other cases the array contains garbage on
      the stack. The subsequent switch statement acts on a subcommand in
      data[0] which could be any garbage value if cmd is SIOCDEVPRIVATE which
      seems incorrect to me.  Instead, just return EOPNOTSUPP for the case
      where cmd == SIOCDEVPRIVATE to avoid this issue.
      
      As a side note, I suspect that the original intention of the code
      was for this ioctl to work just for cmd == SIOCDEVPRIVATE (and the
      current logic is reversed). However, I don't wont to change the current
      semantics in case any userspace code relies on this existing behaviour.
      
      Detected by CoverityScan, CID#139647 ("Uninitialized scalar variable")
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1e6c22ae
    • David Ahern's avatar
      Revert "rtnetlink: Do not generate notifications for CHANGEADDR event" · 3753654e
      David Ahern authored
      This reverts commit cd8966e7.
      
      The duplicate CHANGEADDR event message is sent regardless of link
      status whereas the setlink changes only generate a notification when
      the link is up. Not sending a notification when the link is down breaks
      dhcpcd which only processes hwaddr changes when the link is down.
      
      Fixes reported regression:
          https://bugzilla.kernel.org/show_bug.cgi?id=196355Reported-by: default avatarYaroslav Isakov <yaroslav.isakov@gmail.com>
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3753654e
    • Martin Hundebøll's avatar
      net: dsa: mv88e6xxx: Enable CMODE config support for 6390X · bb0a2675
      Martin Hundebøll authored
      Commit f39908d3 ('net: dsa: mv88e6xxx: Set the CMODE for mv88e6390
      ports 9 & 10') added support for setting the CMODE for the 6390X family,
      but only enabled it for 9290 and 6390 - and left out 6390X.
      
      Fix support for setting the CMODE on 6390X also by assigning
      mv88e6390x_port_set_cmode() to the .port_set_cmode function pointer in
      mv88e6390x_ops too.
      
      Fixes: f39908d3 ("net: dsa: mv88e6xxx: Set the CMODE for mv88e6390 ports 9 & 10")
      Signed-off-by: default avatarMartin Hundebøll <mnhu@prevas.dk>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: default avatarVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bb0a2675
  3. 19 Jul, 2017 4 commits