1. 07 Feb, 2018 4 commits
    • Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ab2d92ad
      Linus Torvalds authored
      Pull scheduler updates from Ingo Molnar:
      
       - membarrier updates (Mathieu Desnoyers)
      
       - SMP balancing optimizations (Mel Gorman)
      
       - stats update optimizations (Peter Zijlstra)
      
       - RT scheduler race fixes (Steven Rostedt)
      
       - misc fixes and updates
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/fair: Use a recently used CPU as an idle candidate and the basis for SIS
        sched/fair: Do not migrate if the prev_cpu is idle
        sched/fair: Restructure wake_affine*() to return a CPU id
        sched/fair: Remove unnecessary parameters from wake_affine_idle()
        sched/rt: Make update_curr_rt() more accurate
        sched/rt: Up the root domain ref count when passing it around via IPIs
        sched/rt: Use container_of() to get root domain in rto_push_irq_work_func()
        sched/core: Optimize update_stats_*()
        sched/core: Optimize ttwu_stat()
        membarrier/selftest: Test private expedited sync core command
        membarrier/arm64: Provide core serializing command
        membarrier/x86: Provide core serializing command
        membarrier: Provide core serializing command, *_SYNC_CORE
        lockin/x86: Implement sync_core_before_usermode()
        locking: Introduce sync_core_before_usermode()
        membarrier/selftest: Test global expedited command
        membarrier: Provide GLOBAL_EXPEDITED command
        membarrier: Document scheduler barrier requirements
        powerpc, membarrier: Skip memory barrier in switch_mm()
        membarrier/selftest: Test private expedited command
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4b0dda4f
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "Tooling fixes, plus add missing interval sampling to certain x86 PEBS
        events"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf tools: Add trace/beauty/generated/ into .gitignore
        perf trace: Fix call-graph output
        x86/events/intel/ds: Add PERF_SAMPLE_PERIOD into PEBS_FREERUNNING_FLAGS
        perf record: Fix period option handling
        perf evsel: Fix period/freq terms setup
        tools headers: Synchoronize x86 features UAPI headers
        tools headers: Synchronize uapi/linux/sched.h
        tools headers: Sync {tools/,}arch/powerpc/include/uapi/asm/kvm.h
        tooling headers: Synchronize updated s390 kvm UAPI headers
        tools headers: Synchronize sound/asound.h
    • Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b3250aab
      Linus Torvalds authored
      Pull locking fixlets from Ingo Molnar:
       "An endianness fix and a jump labels branch hint update"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/qrwlock: include asm/byteorder.h as needed
        jump_label: Add branch hints to static_branch_{un,}likely()
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 0dc400f4
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix error path in netdevsim, from Jakub Kicinski.
      
       2) Default values listed in tcp_wmem and tcp_rmem documentation were
          inaccurate, from Tonghao Zhang.
      
       3) Fix route leaks in SCTP, both for ipv4 and ipv6. From Alexey Kodanev
          and Tommi Rantala.
      
       4) Fix "MASK < Y" meant to be "MASK << Y" in xgbe driver, from Wolfram
          Sang.
      
       5) Use after free in u32_destroy_key(), from Paolo Abeni.
      
       6) Fix two TX issues in be2net driver, from Suresh Reddy.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (25 commits)
        be2net: Handle transmit completion errors in Lancer
        be2net: Fix HW stall issue in Lancer
        RDS: IB: Fix null pointer issue
        nfp: fix kdoc warnings on nested structures
        sample/bpf: fix erspan metadata
        net: erspan: fix erspan config overwrite
        net: erspan: fix metadata extraction
        cls_u32: fix use after free in u32_destroy_key()
        net: amd-xgbe: fix comparison to bitshift when dealing with a mask
        net: phy: Handle not having GPIO enabled in the kernel
        ibmvnic: fix empty firmware version and errors cleanup
        sctp: fix dst refcnt leak in sctp_v4_get_dst
        sctp: fix dst refcnt leak in sctp_v6_get_dst()
        dwc-xlgmac: remove Jie Deng as co-maintainer
        doc: Change the min default value of tcp_wmem/tcp_rmem.
        samples/bpf: use bpf_set_link_xdp_fd
        libbpf: add missing SPDX-License-Identifier
        libbpf: add error reporting in XDP
        libbpf: add function to setup XDP
        tools: add netlink.h and if_link.h in tools uapi
        ...
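Item 4 of the pull message describes a comparison operator typed where a shift was intended. A minimal Python sketch of that class of bug follows; the mask and shift values are hypothetical and are not taken from the xgbe driver.

```python
# Hypothetical register field: a 4-bit field located at bit 8.
FIELD_MASK = 0xF
FIELD_SHIFT = 8


def clear_field_buggy(reg: int) -> int:
    # Bug: '<' compares the two numbers and yields a boolean (0 or 1)
    # instead of a shifted mask. Here 0xF < 8 is False (0), so ~0
    # keeps every bit and nothing is cleared.
    return reg & ~(FIELD_MASK < FIELD_SHIFT)


def clear_field_fixed(reg: int) -> int:
    # Intended: move the mask into position, then clear those bits.
    return reg & ~(FIELD_MASK << FIELD_SHIFT)


if __name__ == "__main__":
    reg = 0xFFFF
    print(hex(clear_field_buggy(reg)))
    print(hex(clear_field_fixed(reg)))
```

With `reg = 0xFFFF`, the buggy version returns the register unchanged, while the fixed version clears bits 8 to 11.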
  2. 06 Feb, 2018 35 commits
    • Merge tag 'platform-drivers-x86-v4.16-1' of git://git.infradead.org/linux-platform-drivers-x86 · cbd7b8a7
      Linus Torvalds authored
      Pull x86 platform-driver updates from Darren Hart:
       "New model support added for Dell, Ideapad, Acer, Asus, Thinkpad, and
        GPD laptops. Improvements to the common intel-vbtn driver, including
        tablet mode, rotate, and front button support. Intel CPU support added
        for Cannonlake and platform support for Dollar Cove power button.
      
        Overhaul of the mellanox platform driver, creating a new
        platform/mellanox directory for the newly multi-architecture regmap
        interface.
      
        Significant Intel PMC update with CannonLake support, Coffeelake
        update, CPUID enumeration, module support, new read64 API, refactoring
        and cleanups.
      
        Revert the apple-gmux iGP IO lock, addressing reported issues with
        non-binary drivers, leaving Nvidia binary driver users to comment out
        conflicting code.
      
        Miscellaneous fixes and cleanups"
      
      * tag 'platform-drivers-x86-v4.16-1' of git://git.infradead.org/linux-platform-drivers-x86: (81 commits)
        platform/x86: mlx-platform: Fix an ERR_PTR vs NULL issue
        platform/x86: intel_pmc_core: Special case for Coffeelake
        platform/x86: intel_pmc_core: Add CannonLake PCH support
        x86/cpu: Add Cannonlake to Intel family
        platform/x86: intel_pmc_core: Read base address from LPIT
        ACPI / LPIT: Export lpit_read_residency_count_address()
        platform/x86: intel-vbtn: Replace License by SDPX identifier
        platform/x86: intel-vbtn: Remove redundant inclusions
        platform/x86: intel-vbtn: Support tablet mode switch
        platform/x86: dell-laptop: Allocate buffer on heap rather than globally
        platform/x86: intel_pmc_core: Remove unused header file
        platform/x86: mlx-platform: Add hotplug device unregister to error path
        platform/x86: mlx-platform: fix module aliases
        platform/mellanox: mlxreg-hotplug: Add check for negative adapter number
        platform/x86: mlx-platform: Add IO access verification callbacks
        platform/x86: mlx-platform: Document pdev_hotplug field
        platform/x86: mlx-platform: Allow compilation for 32 bit arch
        platform/mellanox: mlxreg-hotplug: Enable building for ARM
        platform/mellanox: mlxreg-hotplug: Modify to use a regmap interface
        platform/mellanox: Group create/destroy with attribute functions
        ...
    • Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux · 3f551e3c
      Linus Torvalds authored
      Pull thermal management updates from Zhang Rui:
      
       - fix a race condition issue in power allocator governor (Yi Zeng).
      
       - add support for AP806 and CP110 in armada thermal driver, together
         with several improvements (Baruch Siach, Miquel Raynal)
      
       - add support for r8a7743 in rcar thermal driver (Biju Das)
      
       - convert thermal core to use new hwmon API to avoid warning (Fabio
         Estevam)
      
       - small fixes and cleanups in thermal core and x86_pkg_thermal,
         int3400_thermal, hisi_thermal, mtk_thermal and imx_thermal drivers
         (Pravin Shedge, Geert Uytterhoeven, Alexey Khoroshilov, Brian Bian,
         Matthias Brugger, Nicolin Chen, Uwe Kleine-König)
      
      * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (25 commits)
        thermal: thermal_hwmon: Convert to hwmon_device_register_with_info()
        thermal/x86 pkg temp: Remove debugfs_create_u32() casts
        thermal: int3400_thermal: fix error handling in int3400_thermal_probe()
        thermal/drivers/hisi: Remove bogus const from function return type
        thermal: armada: Give meaningful names to the thermal zones
        thermal: armada: Wait sensors validity before exiting the init callback
        thermal: armada: Change sensors trim default value
        thermal: armada: Update Kconfig and module description
        thermal: armada: Add support for Armada CP110
        thermal: armada: Add support for Armada AP806
        thermal: armada: Use real status register name
        thermal: armada: Clarify control registers accesses
        thermal: armada: Simplify the check of the validity bit
        thermal: armada: Use msleep for long delays
        dt-bindings: thermal: Describe Armada AP806 and CP110
        dt-bindings: thermal: rcar: Add device tree support for r8a7743
        thermal: mtk: Cleanup unused defines
        thermal: imx: update to new formula according to NXP AN5215
        thermal: imx: use consistent style to write temperatures
        thermal: imx: improve comments describing algorithm for temp calculation
        ...
    • Merge branch 'linus' into sched/urgent, to resolve conflicts · 82845079
      Ingo Molnar authored
       Conflicts:
      	arch/arm64/kernel/entry.S
      	arch/x86/Kconfig
      	include/linux/sched/mm.h
      	kernel/fork.c
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Merge tag 'media/v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 68c5735e
      Linus Torvalds authored
      Pull media updates from Mauro Carvalho Chehab:
      
       - videobuf2 was moved to a media/common dir, as it is now used by the
         DVB subsystem too
      
       - Digital TV core memory mapped support interface
      
       - new sensor driver: ov7740
      
       - several improvements at ddbridge driver
      
       - new V4L2 driver: IPU3 CIO2 CSI-2 receiver unit, found on some Intel
         SoCs
      
       - new tuner driver: tda18250
      
       - finally got rid of all LIRC staging drivers
      
       - as we don't have old lirc drivers anymore, restructure the lirc device
         code
      
       - add support for UVC metadata
      
       - add a new staging driver for NVIDIA Tegra Video Decoder Engine
      
       - DVB kAPI headers moved to include/media
      
       - synchronize the kAPI and uAPI for the DVB subsystem, removing the gap
         for non-legacy APIs
      
       - reduce the kAPI gap for V4L2
      
       - lots of other driver enhancements, cleanups, etc.
      
      * tag 'media/v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (407 commits)
        media: v4l2-compat-ioctl32.c: make ctrl_is_pointer work for subdevs
        media: v4l2-compat-ioctl32.c: refactor compat ioctl32 logic
        media: v4l2-compat-ioctl32.c: don't copy back the result for certain errors
        media: v4l2-compat-ioctl32.c: drop pr_info for unknown buffer type
        media: v4l2-compat-ioctl32.c: copy clip list in put_v4l2_window32
        media: v4l2-compat-ioctl32.c: fix ctrl_is_pointer
        media: v4l2-compat-ioctl32.c: copy m.userptr in put_v4l2_plane32
        media: v4l2-compat-ioctl32.c: avoid sizeof(type)
        media: v4l2-compat-ioctl32.c: move 'helper' functions to __get/put_v4l2_format32
        media: v4l2-compat-ioctl32.c: fix the indentation
        media: v4l2-compat-ioctl32.c: add missing VIDIOC_PREPARE_BUF
        media: v4l2-ioctl.c: don't copy back the result for -ENOTTY
        media: v4l2-ioctl.c: use check_fmt for enum/g/s/try_fmt
        media: vivid: fix module load error when enabling fb and no_error_inj=1
        media: dvb_demux: improve debug messages
        media: dvb_demux: Better handle discontinuity errors
        media: cxusb, dib0700: ignore XC2028_I2C_FLUSH
        media: ts2020: avoid integer overflows on 32 bit machines
        media: i2c: ov7740: use gpio/consumer.h instead of gpio.h
        media: entity: Add a nop variant of media_entity_cleanup
        ...
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 2246edfa
      Linus Torvalds authored
      Pull more rdma updates from Doug Ledford:
       "Items of note:
      
         - two patches fix a regression in the 4.15 kernel. The 4.14 kernel
           worked fine with NVMe over Fabrics and mlx5 adapters. That broke in
           4.15. The fix is here.
      
         - one of the patches (the endian notation patch from Lijun) looks
           like a lot of lines of change, but it's mostly mechanical in
           nature. It amounts to the biggest chunk of change in it (it's about
           2/3rds of the overall pull request).
      
        Summary:
      
         - Clean up some function signatures in rxe for clarity
      
         - Tidy the RDMA netlink header to remove unimplemented constants
      
         - bnxt_re driver fixes, one is a regression this window.
      
         - Minor hns driver fixes
      
         - Various fixes from Dan Carpenter and his tool
      
         - Fix IRQ cleanup race in HFI1
      
         - HFI1 performance optimizations and a fix to report counters in the right units
      
         - Fix for an IPoIB startup sequence race with the external manager
      
         - Oops fix for the new kabi path
      
         - Endian cleanups for hns
      
         - Fix for mlx5 related to the new automatic affinity support"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (38 commits)
        net/mlx5: increase async EQ to avoid EQ overrun
        mlx5: fix mlx5_get_vector_affinity to start from completion vector 0
        RDMA/hns: Fix the endian problem for hns
        IB/uverbs: Use the standard kConfig format for experimental
        IB: Update references to libibverbs
        IB/hfi1: Add 16B rcvhdr trace support
        IB/hfi1: Convert kzalloc_node and kcalloc to use kcalloc_node
        IB/core: Avoid a potential OOPs for an unused optional parameter
        IB/core: Map iWarp AH type to undefined in rdma_ah_find_type
        IB/ipoib: Fix for potential no-carrier state
        IB/hfi1: Show fault stats in both TX and RX directions
        IB/hfi1: Remove blind constants from 16B update
        IB/hfi1: Convert PortXmitWait/PortVLXmitWait counters to flit times
        IB/hfi1: Do not override given pcie_pset value
        IB/hfi1: Optimize process_receive_ib()
        IB/hfi1: Remove unnecessary fecn and becn fields
        IB/hfi1: Look up ibport using a pointer in receive path
        IB/hfi1: Optimize packet type comparison using 9B and bypass code paths
        IB/hfi1: Compute BTH only for RDMA_WRITE_LAST/SEND_LAST packet
        IB/hfi1: Remove dependence on qp->s_hdrwords
        ...
    • Merge tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 3ff1b28c
      Linus Torvalds authored
      Pull libnvdimm updates from Ross Zwisler:
      
       - Require struct page by default for filesystem DAX to remove a number
         of surprising failure cases. This includes failures with direct I/O,
         gdb and fork(2).
      
       - Add support for the new Platform Capabilities Structure added to the
         NFIT in ACPI 6.2a. This new table tells us whether the platform
         supports flushing of CPU and memory controller caches on unexpected
         power loss events.
      
       - Revamp vmem_altmap and dev_pagemap handling to clean up code and
         better support future PCI P2P uses.
      
       - Deprecate the ND_IOCTL_SMART_THRESHOLD command whose payload has
         become out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL
         spec, and instead rely on the generic ND_CMD_CALL approach used by
         the two other IOCTL families, NVDIMM_FAMILY_{HPE,MSFT}.
      
       - Enhance nfit_test so we can test some of the new things added in
         version 1.6 of the DSM specification. This includes testing firmware
         download and simulating the Last Shutdown State (LSS) status.
      
      * tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (37 commits)
        libnvdimm, namespace: remove redundant initialization of 'nd_mapping'
        acpi, nfit: fix register dimm error handling
        libnvdimm, namespace: make min namespace size 4K
        tools/testing/nvdimm: force nfit_test to depend on instrumented modules
        libnvdimm/nfit_test: adding support for unit testing enable LSS status
        libnvdimm/nfit_test: add firmware download emulation
        nfit-test: Add platform cap support from ACPI 6.2a to test
        libnvdimm: expose platform persistence attribute for nd_region
        acpi: nfit: add persistent memory control flag for nd_region
        acpi: nfit: Add support for detect platform CPU cache flush on power loss
        device-dax: Fix trailing semicolon
        libnvdimm, btt: fix uninitialized err_lock
        dax: require 'struct page' by default for filesystem dax
        ext2: auto disable dax instead of failing mount
        ext4: auto disable dax instead of failing mount
        mm, dax: introduce pfn_t_special()
        mm: Fix devm_memremap_pages() collision handling
        mm: Fix memory size alignment in devm_memremap_pages_release()
        memremap: merge find_dev_pagemap into get_dev_pagemap
        memremap: change devm_memremap_pages interface to use struct dev_pagemap
        ...
    • Merge tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 105cf3c8
      Linus Torvalds authored
      Pull PCI updates from Bjorn Helgaas:
      
       - skip AER driver error recovery callbacks for correctable errors
         reported via ACPI APEI, as we already do for errors reported via the
         native path (Tyler Baicar)
      
       - fix DPC shared interrupt handling (Alex Williamson)
      
       - print full DPC interrupt number (Keith Busch)
      
       - enable DPC only if AER is available (Keith Busch)
      
       - simplify DPC code (Bjorn Helgaas)
      
       - calculate ASPM L1 substate parameter instead of hardcoding it (Bjorn
         Helgaas)
      
       - enable Latency Tolerance Reporting for ASPM L1 substates (Bjorn
         Helgaas)
      
       - move ASPM internal interfaces out of public header (Bjorn Helgaas)
      
       - allow hot-removal of VGA devices (Mika Westerberg)
      
       - speed up unplug and shutdown by assuming Thunderbolt controllers
         don't support Command Completed events (Lukas Wunner)
      
       - add AtomicOps support for GPU and Infiniband drivers (Felix Kuehling,
         Jay Cornwall)
      
       - expose "ari_enabled" in sysfs to help NIC naming (Stuart Hayes)
      
       - clean up PCI DMA interface usage (Christoph Hellwig)
      
       - remove PCI pool API (replaced with DMA pool) (Romain Perier)
      
       - deprecate pci_get_bus_and_slot(), which assumed PCI domain 0 (Sinan
         Kaya)
      
       - move DT PCI code from drivers/of/ to drivers/pci/ (Rob Herring)
      
       - add PCI-specific wrappers for dev_info(), etc (Frederick Lawler)
      
       - remove warnings on sysfs mmap failure (Bjorn Helgaas)
      
       - quiet ROM validation messages (Alex Deucher)
      
       - remove redundant memory alloc failure messages (Markus Elfring)
      
       - fill in types for compile-time VGA and other I/O port resources
         (Bjorn Helgaas)
      
       - make "pci=pcie_scan_all" work for Root Ports as well as Downstream
         Ports to help AmigaOne X1000 (Bjorn Helgaas)
      
       - add SPDX tags to all PCI files (Bjorn Helgaas)
      
       - quirk Marvell 9128 DMA aliases (Alex Williamson)
      
       - quirk broken INTx disable on Ceton InfiniTV4 (Bjorn Helgaas)
      
       - fix CONFIG_PCI=n build by adding dummy pci_irqd_intx_xlate() (Niklas
         Cassel)
      
       - use DMA API to get MSI address for DesignWare IP (Niklas Cassel)
      
       - fix endpoint-mode DMA mask configuration (Kishon Vijay Abraham I)
      
       - fix ARTPEC-6 incorrect IS_ERR() usage (Wei Yongjun)
      
       - add support for ARTPEC-7 SoC (Niklas Cassel)
      
       - add endpoint-mode support for ARTPEC (Niklas Cassel)
      
       - add Cadence PCIe host and endpoint controller driver (Cyrille
         Pitchen)
      
       - handle multiple INTx status bits being set in dra7xx (Vignesh R)
      
       - translate dra7xx hwirq range to fix INTD handling (Vignesh R)
      
       - remove deprecated Exynos PHY initialization code (Jaehoon Chung)
      
       - fix MSI erratum workaround for HiSilicon Hip06/Hip07 (Dongdong Liu)
      
       - fix NULL pointer dereference in iProc BCMA driver (Ray Jui)
      
       - fix Keystone interrupt-controller-node lookup (Johan Hovold)
      
       - constify qcom driver structures (Julia Lawall)
      
       - rework Tegra config space mapping to increase space available for
         endpoints (Vidya Sagar)
      
       - simplify Tegra driver by using bus->sysdata (Manikanta Maddireddy)
      
       - remove PCI_REASSIGN_ALL_BUS usage on Tegra (Manikanta Maddireddy)
      
       - add support for Global Fabric Manager Server (GFMS) event to
         Microsemi Switchtec switch driver (Logan Gunthorpe)
      
       - add IDs for Switchtec PSX 24xG3 and PSX 48xG3 (Kelvin Cao)
      
      * tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits)
        PCI: cadence: Add EndPoint Controller driver for Cadence PCIe controller
        dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe endpoint controller
        PCI: endpoint: Fix EPF device name to support multi-function devices
        PCI: endpoint: Add the function number as argument to EPC ops
        PCI: cadence: Add host driver for Cadence PCIe controller
        dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe host controller
        PCI: Add vendor ID for Cadence
        PCI: Add generic function to probe PCI host controllers
        PCI: generic: fix missing call of pci_free_resource_list()
        PCI: OF: Add generic function to parse and allocate PCI resources
        PCI: Regroup all PCI related entries into drivers/pci/Makefile
        PCI/DPC: Reformat DPC register definitions
        PCI/DPC: Add and use DPC Status register field definitions
        PCI/DPC: Squash dpc_rp_pio_get_info() into dpc_process_rp_pio_error()
        PCI/DPC: Remove unnecessary RP PIO register structs
        PCI/DPC: Push dpc->rp_pio_status assignment into dpc_rp_pio_get_info()
        PCI/DPC: Squash dpc_rp_pio_print_error() into dpc_rp_pio_get_info()
        PCI/DPC: Make RP PIO log size check more generic
        PCI/DPC: Rename local "status" to "dpc_status"
        PCI/DPC: Squash dpc_rp_pio_print_tlp_header() into dpc_rp_pio_print_error()
        ...
    • Merge branch 'be2net-patch-set' · 176bfb40
      David S. Miller authored
      Suresh Reddy says:
      
      ====================
      be2net: patch-set
      
      Hi Dave, Please consider applying these two patches to net
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • be2net: Handle transmit completion errors in Lancer · ffc39620
      Suresh Reddy authored
      If the driver receives a TX CQE with status as 0x1 or 0x9 or 0xb,
      the completion indexes should not be used. The driver must stop
      consuming CQEs from this TXQ/CQ. From this point onwards the TXQ is
      in a bad state, so the driver should destroy and recreate it.
      
      0x1: LANCER_TX_COMP_LSO_ERR
      0x9: LANCER_TX_COMP_SGE_ERR
      0xb: LANCER_TX_COMP_PARITY_ERR
      
      Reset the adapter if driver sees this error in TX completion. Also
      adding sge error counter in ethtool stats.
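The handling rule in this message can be sketched in Python as below. The enum mirrors the three status codes listed above; `process_tx_completions` and its return shape are hypothetical and only model the decision, not the driver's code.

```python
from enum import IntEnum


class LancerTxCompErr(IntEnum):
    # Status codes quoted in the commit message.
    LSO_ERR = 0x1
    SGE_ERR = 0x9
    PARITY_ERR = 0xB


FATAL_TX_STATUS = frozenset(int(e) for e in LancerTxCompErr)


def process_tx_completions(statuses):
    """Consume completion statuses until a fatal one is seen.

    Returns (consumed, needs_reset). On a fatal status the entry is not
    consumed and the caller is expected to reset the adapter so that
    the TX queue is destroyed and recreated.
    """
    consumed = 0
    for status in statuses:
        if status in FATAL_TX_STATUS:
            return consumed, True
        consumed += 1
    return consumed, False


if __name__ == "__main__":
    print(process_tx_completions([0x0, 0x0, 0x9, 0x0]))
```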
      Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • be2net: Fix HW stall issue in Lancer · 3df40aad
      Suresh Reddy authored
      Lancer HW cannot handle a TSO packet with a single segment.
      Disable TSO/GSO for such packets.
      Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • RDS: IB: Fix null pointer issue · 2c0aa086
      Guanglei Li authored
      Scenario:
      1. Port goes down and failover occurs
      2. The application (ap) issues an rds_bind syscall
      
      PID: 47039  TASK: ffff89887e2fe640  CPU: 47  COMMAND: "kworker/u:6"
       #0 [ffff898e35f159f0] machine_kexec at ffffffff8103abf9
       #1 [ffff898e35f15a60] crash_kexec at ffffffff810b96e3
       #2 [ffff898e35f15b30] oops_end at ffffffff8150f518
       #3 [ffff898e35f15b60] no_context at ffffffff8104854c
       #4 [ffff898e35f15ba0] __bad_area_nosemaphore at ffffffff81048675
       #5 [ffff898e35f15bf0] bad_area_nosemaphore at ffffffff810487d3
       #6 [ffff898e35f15c00] do_page_fault at ffffffff815120b8
       #7 [ffff898e35f15d10] page_fault at ffffffff8150ea95
          [exception RIP: unknown or invalid address]
          RIP: 0000000000000000  RSP: ffff898e35f15dc8  RFLAGS: 00010282
          RAX: 00000000fffffffe  RBX: ffff889b77f6fc00  RCX: ffffffff81c99d88
          RDX: 0000000000000000  RSI: ffff896019ee08e8  RDI: ffff889b77f6fc00
          RBP: ffff898e35f15df0   R8: ffff896019ee08c8   R9: 0000000000000000
          R10: 0000000000000400  R11: 0000000000000000  R12: ffff896019ee08c0
          R13: ffff889b77f6fe68  R14: ffffffff81c99d80  R15: ffffffffa022a1e0
          ORIG_RAX: ffffffffffffffff  CS: 0010 SS: 0018
       #8 [ffff898e35f15dc8] cma_ndev_work_handler at ffffffffa022a228 [rdma_cm]
       #9 [ffff898e35f15df8] process_one_work at ffffffff8108a7c6
       #10 [ffff898e35f15e58] worker_thread at ffffffff8108bda0
       #11 [ffff898e35f15ee8] kthread at ffffffff81090fe6
      
      PID: 45659  TASK: ffff880d313d2500  CPU: 31  COMMAND: "oracle_45659_ap"
       #0 [ffff881024ccfc98] __schedule at ffffffff8150bac4
       #1 [ffff881024ccfd40] schedule at ffffffff8150c2cf
       #2 [ffff881024ccfd50] __mutex_lock_slowpath at ffffffff8150cee7
       #3 [ffff881024ccfdc0] mutex_lock at ffffffff8150cdeb
       #4 [ffff881024ccfde0] rdma_destroy_id at ffffffffa022a027 [rdma_cm]
       #5 [ffff881024ccfe10] rds_ib_laddr_check at ffffffffa0357857 [rds_rdma]
       #6 [ffff881024ccfe50] rds_trans_get_preferred at ffffffffa0324c2a [rds]
       #7 [ffff881024ccfe80] rds_bind at ffffffffa031d690 [rds]
       #8 [ffff881024ccfeb0] sys_bind at ffffffff8142a670
      
      PID: 45659                          PID: 47039
      rds_ib_laddr_check
        /* create id_priv with a null event_handler */
        rdma_create_id
        rdma_bind_addr
          cma_acquire_dev
            /* add id_priv to cma_dev->id_list */
            cma_attach_to_dev
                                          cma_ndev_work_handler
                                            /* event_handler is null */
                                            id_priv->id.event_handler
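The interleaving above ends with a call through a null `event_handler`. Below is a minimal Python model of that failure mode. The `CmId` class, the `ndev_work_handler` function, and the event name are purely illustrative assumptions and are not the rdma_cm API.

```python
class CmId:
    """Illustrative stand-in for an id_priv entry on a device's id list."""

    def __init__(self, event_handler=None):
        self.event_handler = event_handler


def ndev_work_handler(id_list, event):
    # Models the worker: every id already attached to the device has its
    # handler invoked, with no check that a handler was registered.
    return [cm_id.event_handler(cm_id, event) for cm_id in id_list]


if __name__ == "__main__":
    # An id created without a handler is already visible on the list.
    id_list = [CmId(event_handler=None)]
    try:
        ndev_work_handler(id_list, "ADDR_CHANGE")
    except TypeError as exc:
        # Python's analogue of jumping to address 0 in the trace above.
        print("crash:", exc)
```

Supplying a real handler when the id is created is one way to avoid the crash in this model; the commit message itself does not spell out the fix.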
      Signed-off-by: Guanglei Li <guanglei.li@oracle.com>
      Signed-off-by: Honglei Wang <honglei.wang@oracle.com>
      Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Yanjun Zhu <yanjun.zhu@oracle.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Acked-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: fix kdoc warnings on nested structures · 703f578a
      Jakub Kicinski authored
      Commit 84ce5b98 ("scripts: kernel-doc: improve nested logic to
      handle multiple identifiers") improved the handling of nested structure
      definitions in scripts/kernel-doc, and changed the expected format of
      documentation.  This causes new warnings to appear on W=1 builds.
      
      Only comment changes.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-erspan-fixes' · 67ae44e1
      David S. Miller authored
      William Tu says:
      
      ====================
      net: erspan fixes
      
      The first patch fixes erspan metadata extraction issue from packet
      header due to commit d350a823 ("net: erspan: create erspan metadata
      uapi header").  The commit moves the erspan 'version' in
      'struct erspan_metadata' in front of 'struct erspan_md2' for later
      extensibility, but breaks the existing metadata extraction code due
      to extra 4-byte size 'version'.  The second patch fixes the case where
      tunnel device receives an erspan packet with different tunnel metadata
      (ex: version, index, hwid, direction), existing code overwrites the
      tunnel device's erspan configuration.  The third patch fixes the bpf
      tests due to the above patches.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sample/bpf: fix erspan metadata · 9c33ca43
      William Tu authored
      The commit c69de58b ("net: erspan: use bitfield instead of
      mask and offset") changes the erspan header to use bitfield, and
      commit d350a823 ("net: erspan: create erspan metadata uapi header")
      creates a uapi header file.  The above two commits break the current
      erspan test.  This patch fixes it by adapting to the above two changes.
      
      Fixes: ac80c2a1 ("samples/bpf: add erspan v2 sample code")
      Fixes: ef88f89c ("samples/bpf: extend test_tunnel_bpf.sh with ERSPAN")
      Signed-off-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: erspan: fix erspan config overwrite · 39f57f67
      William Tu authored
      When an erspan tunnel device receives an erspan packet with different
      tunnel metadata (ex: version, index, hwid, direction), existing code
      overwrites the tunnel device's erspan configuration with the received
      packet's metadata.  The patch fixes it.
      
      Fixes: 1a66a836 ("gre: add collect_md mode to ERSPAN tunnel")
      Fixes: f551c91d ("net: erspan: introduce erspan v2 for ip_gre")
      Fixes: ef7baf5e ("ip6_gre: add ip6 erspan collect_md mode")
      Fixes: 94d7d8f2 ("ip6_gre: add erspan v2 support")
      Signed-off-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      39f57f67
    • William Tu's avatar
      net: erspan: fix metadata extraction · 3df19283
      William Tu authored
      Commit d350a823 ("net: erspan: create erspan metadata uapi header")
      moves the erspan 'version' in front of 'struct erspan_md2' for
      later extensibility.  This breaks the existing erspan metadata
      extraction code because erspan_md2 is then offset by 4 bytes
      between erspan_metadata and erspan_base_hdr.  This patch
      fixes it.
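
      A minimal sketch of the layout effect described above. The toy_*
      types are hypothetical stand-ins, not the real uapi definitions; they
      only show that a leading 4-byte member shifts the v2 payload, so code
      that treats the start of the metadata as the payload is off by 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the v2 payload; fields are illustrative. */
struct toy_md2 {
	uint32_t timestamp;
	uint16_t sgt;
	uint16_t flags;
};

/* Hypothetical metadata container with a leading 4-byte version. */
struct toy_metadata {
	int32_t version;
	union {
		uint32_t index;        /* v1 payload */
		struct toy_md2 md2;    /* v2 payload */
	} u;
};

/* Byte offset of the v2 payload from the start of the metadata. */
size_t toy_md2_offset(void)
{
	return offsetof(struct toy_metadata, u.md2);
}

/* Reach the payload through the member, never via the struct start. */
const struct toy_md2 *toy_get_md2(const struct toy_metadata *md)
{
	return &md->u.md2;
}
```

      With these assumed types the payload starts 4 bytes into the
      container, which matches the 4-byte 'version' mentioned above.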
      
      Fixes: 1a66a836 ("gre: add collect_md mode to ERSPAN tunnel")
      Fixes: ef7baf5e ("ip6_gre: add ip6 erspan collect_md mode")
      Fixes: 1d7e2ed2 ("net: erspan: refactor existing erspan code")
      Signed-off-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3df19283
    • Paolo Abeni's avatar
      cls_u32: fix use after free in u32_destroy_key() · d7cdee5e
      Paolo Abeni authored
      Li Shuang reported an Oops with cls_u32 due to a use-after-free
      in u32_destroy_key(). The use-after-free can be triggered with:
      
      dev=lo
      tc qdisc add dev $dev root handle 1: htb default 10
      tc filter add dev $dev parent 1: prio 5 handle 1: protocol ip u32 divisor 256
      tc filter add dev $dev protocol ip parent 1: prio 5 u32 ht 800:: match ip dst\
       10.0.0.0/8 hashkey mask 0x0000ff00 at 16 link 1:
      tc qdisc del dev $dev root
      
      Which causes the following kasan splat:
      
       ==================================================================
       BUG: KASAN: use-after-free in u32_destroy_key.constprop.21+0x117/0x140 [cls_u32]
       Read of size 4 at addr ffff881b83dae618 by task kworker/u48:5/571
      
       CPU: 17 PID: 571 Comm: kworker/u48:5 Not tainted 4.15.0+ #87
       Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
       Workqueue: tc_filter_workqueue u32_delete_key_freepf_work [cls_u32]
       Call Trace:
        dump_stack+0xd6/0x182
        ? dma_virt_map_sg+0x22e/0x22e
        print_address_description+0x73/0x290
        kasan_report+0x277/0x360
        ? u32_destroy_key.constprop.21+0x117/0x140 [cls_u32]
        u32_destroy_key.constprop.21+0x117/0x140 [cls_u32]
        u32_delete_key_freepf_work+0x1c/0x30 [cls_u32]
        process_one_work+0xae0/0x1c80
        ? sched_clock+0x5/0x10
        ? pwq_dec_nr_in_flight+0x3c0/0x3c0
        ? _raw_spin_unlock_irq+0x29/0x40
        ? trace_hardirqs_on_caller+0x381/0x570
        ? _raw_spin_unlock_irq+0x29/0x40
        ? finish_task_switch+0x1e5/0x760
        ? finish_task_switch+0x208/0x760
        ? preempt_notifier_dec+0x20/0x20
        ? __schedule+0x839/0x1ee0
        ? check_noncircular+0x20/0x20
        ? firmware_map_remove+0x73/0x73
        ? find_held_lock+0x39/0x1c0
        ? worker_thread+0x434/0x1820
        ? lock_contended+0xee0/0xee0
        ? lock_release+0x1100/0x1100
        ? init_rescuer.part.16+0x150/0x150
        ? retint_kernel+0x10/0x10
        worker_thread+0x216/0x1820
        ? process_one_work+0x1c80/0x1c80
        ? lock_acquire+0x1a5/0x540
        ? lock_downgrade+0x6b0/0x6b0
        ? sched_clock+0x5/0x10
        ? lock_release+0x1100/0x1100
        ? compat_start_thread+0x80/0x80
        ? do_raw_spin_trylock+0x190/0x190
        ? _raw_spin_unlock_irq+0x29/0x40
        ? trace_hardirqs_on_caller+0x381/0x570
        ? _raw_spin_unlock_irq+0x29/0x40
        ? finish_task_switch+0x1e5/0x760
        ? finish_task_switch+0x208/0x760
        ? preempt_notifier_dec+0x20/0x20
        ? __schedule+0x839/0x1ee0
        ? kmem_cache_alloc_trace+0x143/0x320
        ? firmware_map_remove+0x73/0x73
        ? sched_clock+0x5/0x10
        ? sched_clock_cpu+0x18/0x170
        ? find_held_lock+0x39/0x1c0
        ? schedule+0xf3/0x3b0
        ? lock_downgrade+0x6b0/0x6b0
        ? __schedule+0x1ee0/0x1ee0
        ? do_wait_intr_irq+0x340/0x340
        ? do_raw_spin_trylock+0x190/0x190
        ? _raw_spin_unlock_irqrestore+0x32/0x60
        ? process_one_work+0x1c80/0x1c80
        ? process_one_work+0x1c80/0x1c80
        kthread+0x312/0x3d0
        ? kthread_create_worker_on_cpu+0xc0/0xc0
        ret_from_fork+0x3a/0x50
      
       Allocated by task 1688:
        kasan_kmalloc+0xa0/0xd0
        __kmalloc+0x162/0x380
        u32_change+0x1220/0x3c9e [cls_u32]
        tc_ctl_tfilter+0x1ba6/0x2f80
        rtnetlink_rcv_msg+0x4f0/0x9d0
        netlink_rcv_skb+0x124/0x320
        netlink_unicast+0x430/0x600
        netlink_sendmsg+0x8fa/0xd60
        sock_sendmsg+0xb1/0xe0
        ___sys_sendmsg+0x678/0x980
        __sys_sendmsg+0xc4/0x210
        do_syscall_64+0x232/0x7f0
        return_from_SYSCALL_64+0x0/0x75
      
       Freed by task 112:
        kasan_slab_free+0x71/0xc0
        kfree+0x114/0x320
        rcu_process_callbacks+0xc3f/0x1600
        __do_softirq+0x2bf/0xc06
      
       The buggy address belongs to the object at ffff881b83dae600
        which belongs to the cache kmalloc-4096 of size 4096
       The buggy address is located 24 bytes inside of
        4096-byte region [ffff881b83dae600, ffff881b83daf600)
       The buggy address belongs to the page:
       page:ffffea006e0f6a00 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
       flags: 0x17ffffc0008100(slab|head)
       raw: 0017ffffc0008100 0000000000000000 0000000000000000 0000000100070007
       raw: dead000000000100 dead000000000200 ffff880187c0e600 0000000000000000
       page dumped because: kasan: bad access detected
      
       Memory state around the buggy address:
        ffff881b83dae500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
        ffff881b83dae580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       >ffff881b83dae600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                   ^
        ffff881b83dae680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        ffff881b83dae700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ==================================================================
      
      The problem is that the htnode is freed before the linked knodes and the
      latter will try to access the first at u32_destroy_key() time.
      This change addresses the issue using the htnode refcnt to guarantee
      the correct free order. While at it also add a RCU annotation,
      to keep sparse happy.
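
      The free-order idea can be sketched in user space as follows. This is
      an illustration under assumptions, not the cls_u32 code: the toy_*
      names are hypothetical, and a plain int stands in for the refcnt. Each
      key holds a reference on its hash table, so the table outlives every
      key that still needs to read it.

```c
#include <stdlib.h>

/* Hypothetical hash table node: freed only when refcnt drops to zero. */
struct toy_hnode {
	int refcnt;
	unsigned int divisor;
};

/* Hypothetical key node that links to a hash table. */
struct toy_knode {
	struct toy_hnode *ht_down;
};

struct toy_hnode *toy_hnode_new(unsigned int divisor)
{
	struct toy_hnode *ht = malloc(sizeof(*ht));

	if (!ht)
		return NULL;
	ht->refcnt = 1;            /* reference held by the creator */
	ht->divisor = divisor;
	return ht;
}

/* Drop one reference; returns 1 if this call freed the table. */
int toy_hnode_put(struct toy_hnode *ht)
{
	if (--ht->refcnt > 0)
		return 0;
	free(ht);
	return 1;
}

/* Linking a key takes a reference on the table it points to. */
void toy_knode_link(struct toy_knode *n, struct toy_hnode *ht)
{
	ht->refcnt++;
	n->ht_down = ht;
}

/* Destroying a key reads the table first, then drops its reference. */
unsigned int toy_knode_destroy(struct toy_knode *n)
{
	unsigned int divisor = n->ht_down->divisor;

	toy_hnode_put(n->ht_down);
	n->ht_down = NULL;
	return divisor;
}
```

      In this sketch the creator may drop its reference first; the table
      is only released once the last linked key has been destroyed.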
      
      v1 -> v2: use rtnl_dereference() instead of RCU read locks
      v2 -> v3:
        - don't check refcnt in u32_destroy_hnode()
        - cleaned-up u32_destroy() implementation
        - cleaned-up code comment
      v3 -> v4:
        - dropped unneeded comment
      Reported-by: Li Shuang <shuali@redhat.com>
      Fixes: c0d378ef ("net_sched: use tcf_queue_work() in u32 filter")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d7cdee5e
    • Wolfram Sang's avatar
      net: amd-xgbe: fix comparison to bitshift when dealing with a mask · a3276892
      Wolfram Sang authored
      Due to a typo, the mask was destroyed by a comparison instead of a bit
      shift.
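
      A small illustration of this class of typo, assuming a hypothetical
      helper that clears one bit in a flags word (the toy_* names are not
      from the driver). With '<' the expression evaluates to 0 or 1, so the
      wrong bit is cleared.

```c
/* Typo form: "1u < bit" is a comparison and yields only 0 or 1. */
unsigned int toy_clear_bit_typo(unsigned int flags, unsigned int bit)
{
	return flags & ~(unsigned int)(1u < bit);
}

/* Intended form: "1u << bit" builds a single-bit mask (bit < 32). */
unsigned int toy_clear_bit_fixed(unsigned int flags, unsigned int bit)
{
	return flags & ~(1u << bit);
}
```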
      Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
      Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a3276892
    • Andrew Lunn's avatar
      net: phy: Handle not having GPIO enabled in the kernel · a56c6980
      Andrew Lunn authored
      If CONFIG_GPIOLIB is disabled, fwnode_get_named_gpiod() becomes a stub
      function, which returns -ENOSYS. Handle this in the same way as
      -ENOENT, i.e. assume there is no GPIO used to reset the PHYs.
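
      The error handling described above might look roughly like this
      user-space sketch. toy_check_reset_gpio() is a hypothetical helper
      that receives the lookup result as a negative errno (0 on success);
      it is not the phylib code.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Returns 0 when probing may continue, otherwise the original error.
 * *have_gpio reports whether a reset GPIO was actually found.
 */
int toy_check_reset_gpio(int err, bool *have_gpio)
{
	*have_gpio = false;

	if (err == 0) {
		*have_gpio = true;
		return 0;
	}

	/* "No such GPIO" and "GPIO support compiled out" both mean: no reset line. */
	if (err == -ENOENT || err == -ENOSYS)
		return 0;

	return err;
}
```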
      Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
      Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Fixes: bafbdd52 ("phylib: Add device reset GPIO support")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a56c6980
    • Dan Carpenter's avatar
      platform/x86: mlx-platform: Fix an ERR_PTR vs NULL issue · 8a0f5b6f
      Dan Carpenter authored
      devm_ioport_map() returns NULL on error but we accidentally check for
      error pointers instead.
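
      Why the wrong check misses the failure can be shown with a user-space
      sketch. The toy_* helpers below only imitate the error-pointer
      convention and a mapper that returns NULL on error; they are
      assumptions for illustration, not the kernel macros or the driver code.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer, error-pointer style. */
void *toy_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True only for pointers in the top TOY_MAX_ERRNO addresses. */
int toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Hypothetical mapper that reports failure with NULL. */
void *toy_map(int fail)
{
	static int region;

	return fail ? NULL : &region;
}

/* Wrong: NULL is not an error pointer, so the failure goes unnoticed. */
int toy_probe_wrong(int fail)
{
	void *p = toy_map(fail);

	return toy_is_err(p) ? -1 : 0;
}

/* Right: test for NULL, which is what the mapper actually returns. */
int toy_probe_right(int fail)
{
	void *p = toy_map(fail);

	return p ? 0 : -1;
}
```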
      
      Fixes: c6acad68 ("platform/mellanox: mlxreg-hotplug: Modify to use a regmap interface")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Vadim Pasternak <vadimp@melanox.com>
      Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
      8a0f5b6f
    • Arnd Bergmann's avatar
      locking/qrwlock: include asm/byteorder.h as needed · ca66e797
      Arnd Bergmann authored
      Moving the qrwlock struct definition into a header file introduced
      a subtle bug on all little-endian machines, where some files in some
      configurations would see the fields in an incorrect order.  This was
      found by building with an LTO enabled compiler that warns every time we
      try to link together files with incompatible data structures.
      
      A second patch changes linux/kconfig.h to always define the symbols,
      but this seems to be the root cause of most of the issues, so I'd suggest
      we do both.
      
      On a current linux-next kernel, I verified that this header is
      responsible for all type mismatches that result from the endianness
      confusion.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Babu Moger <babu.moger@oracle.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Fixes: e0d02285 ("locking/qrwlock: Use 'struct qrwlock' instead of 'struct __qrwlock'")
      Link: http://lkml.kernel.org/r/20180202154104.1522809-1-arnd@arndb.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ca66e797
    • Peter Zijlstra's avatar
      jump_label: Add branch hints to static_branch_{un,}likely() · 81dcf89f
      Peter Zijlstra authored
      For some reason these were missing. I've not observed this patch
      making a difference in the few code locations I checked, but it
      makes sense.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      81dcf89f
    • Mel Gorman's avatar
      sched/fair: Use a recently used CPU as an idle candidate and the basis for SIS · 32e839dd
      Mel Gorman authored
      The select_idle_sibling() (SIS) rewrite in commit:
      
        10e2f1ac ("sched/core: Rewrite and improve select_idle_siblings()")
      
      ... replaced a domain iteration with a search that broadly speaking
      does a wrapped walk of the scheduler domain sharing a last-level-cache.
      
      While this had a number of improvements, one consequence is that two tasks
      that share a waker/wakee relationship push each other around a socket. Even
      though two tasks may be active, all cores are evenly used. This is great from
      a search perspective and spreads a load across individual cores, but it has
      adverse consequences for cpufreq. As each CPU has relatively low utilisation,
      cpufreq may decide the utilisation is too low to use a higher P-state and
      overall computation throughput suffers.
      
      While individual cpufreq and cpuidle drivers may compensate by artificially
      boosting P-state (at c0) or avoiding lower C-states (during idle), it does
      not help if hardware-based cpufreq (e.g. HWP) is used.
      
      This patch tracks a recently used CPU: the CPU a task was running on
      when it was last a waker, or a CPU it was recently using when the task
      is a wakee. During SIS, the recently used CPU is used as a target if it's
      still allowed by the task and is idle.
      
      The benefit may be non-obvious so consider an example of two tasks
      communicating back and forth. Task A may be an application doing IO where
      task B is a kworker or kthread like journald. Task A may issue IO, wake
      B and B wakes up A on completion.  With the existing scheme this may look
      like the following (potentially different IDs if SMT is in use but a similar
      principle applies).
      
       A (cpu 0)	wake	B (wakes on cpu 1)
       B (cpu 1)	wake	A (wakes on cpu 2)
       A (cpu 2)	wake	B (wakes on cpu 3)
       etc.
      
      A careful reader may wonder why CPU 0 was not idle when B wakes A the
      first time. It is simply because A can be rescheduled to another CPU,
      so prev == target when B tries to wake up A and the information about
      CPU 0 has been lost.
      
      With this patch, the pattern is more likely to be:
      
       A (cpu 0)	wake	B (wakes on cpu 1)
       B (cpu 1)	wake	A (wakes on cpu 0)
       A (cpu 0)	wake	B (wakes on cpu 1)
       etc
      
      i.e. two communicating tasks are more likely to use just two cores instead
      of all available cores sharing a LLC.
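
      The selection rule can be approximated by the sketch below. It is a
      simplified model under assumptions: CPUs are bits in a 32-bit mask,
      the toy_* names are hypothetical, and the cache-sharing condition that
      a real implementation would need is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if cpu is a valid index and its bit is set in mask. */
static bool toy_cpu_in(uint32_t mask, int cpu)
{
	return cpu >= 0 && cpu < 32 && (mask & (UINT32_C(1) << cpu)) != 0;
}

/*
 * Prefer the recently used CPU when it is idle and still allowed for
 * the task; otherwise fall back to the original target.
 */
int toy_select_idle_cpu(int target, int recent_used_cpu,
			uint32_t idle_mask, uint32_t allowed_mask)
{
	if (recent_used_cpu != target &&
	    toy_cpu_in(idle_mask, recent_used_cpu) &&
	    toy_cpu_in(allowed_mask, recent_used_cpu))
		return recent_used_cpu;

	return target;
}
```

      In the A/B example above, if A last ran on CPU 0 and CPU 0 is idle
      and allowed, the sketch returns 0 rather than a new CPU.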
      
      The most dramatic speedup was noticed on dbench using the XFS filesystem on
      UMA as clients interact heavily with workqueues in that configuration. Note
      that a similar speedup is not observed on ext4 as the wakeup pattern
      is different:
      
                                4.15.0-rc9             4.15.0-rc9
                                 waprev-v1        biasancestor-v1
       Hmean      1      287.54 (   0.00%)      817.01 ( 184.14%)
       Hmean      2     1268.12 (   0.00%)     1781.24 (  40.46%)
       Hmean      4     1739.68 (   0.00%)     1594.47 (  -8.35%)
       Hmean      8     2464.12 (   0.00%)     2479.56 (   0.63%)
       Hmean     64     1455.57 (   0.00%)     1434.68 (  -1.44%)
      
      The results can be less dramatic on NUMA where automatic balancing interferes
      with the test. Network benchmarks running on localhost are also known to
      benefit quite a bit from this patch (roughly 10% on netperf RR for UDP
      and TCP depending on the machine). Hackbench also sees small improvements
      (6-11% depending on machine and thread count). The facebook schbench was also
      tested but in most cases showed little or no difference to wakeup latencies.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180130104555.4125-5-mgorman@techsingularity.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      32e839dd
    • Mel Gorman's avatar
      sched/fair: Do not migrate if the prev_cpu is idle · 806486c3
      Mel Gorman authored
      wake_affine_idle() prefers to move a task to the current CPU if the
      wakeup is due to an interrupt. The expectation is that the interrupt
      data is cache hot and relevant to the waking task as well as avoiding
      a search. However, there is no way to determine if there was cache hot
      data on the previous CPU that may exceed the interrupt data. Furthermore,
      round-robin delivery of interrupts can migrate tasks around a socket where
      each CPU is under-utilised.  This can interact badly with cpufreq which
      makes decisions based on per-cpu data. It has been observed on machines
      with HWP that p-states are not boosted to their maximum levels even though
      the workload is latency and throughput sensitive.
      
      This patch uses the previous CPU for the task if it's idle and cache-affine
      with the current CPU even if the current CPU is idle due to the wakeup
      being related to the interrupt. This reduces migrations at the cost of
      the interrupt data not being cache hot when the task wakes.
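
      A sketch of the decision described above, with the inputs reduced to
      booleans. toy_wake_affine_idle() and TOY_NO_DECISION are hypothetical
      names; a real helper would query the scheduler for idleness and cache
      affinity instead of taking them as parameters.

```c
#include <stdbool.h>

#define TOY_NO_DECISION (-1)

/*
 * If the current CPU is idle and shares cache with the previous CPU,
 * prefer the previous CPU when it is idle too (no migration);
 * otherwise use the current CPU. Any other case is left undecided.
 */
int toy_wake_affine_idle(int this_cpu, int prev_cpu,
			 bool this_idle, bool prev_idle, bool share_cache)
{
	if (this_idle && share_cache)
		return prev_idle ? prev_cpu : this_cpu;

	return TOY_NO_DECISION;
}
```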
      
      A variety of workloads were tested on various machines and no adverse
      impact was noticed that was outside noise. dbench on ext4 on UMA showed
      roughly 10% reduction in the number of CPU migrations and it is a case
      where interrupts are frequent for IO completions. In most cases, the
      difference in performance is quite small but variability is often
      reduced. For example, this is the result for pgbench running on a UMA
      machine with different numbers of clients.
      
                                4.15.0-rc9             4.15.0-rc9
                                  baseline              waprev-v1
       Hmean     1     22096.28 (   0.00%)    22734.86 (   2.89%)
       Hmean     4     74633.42 (   0.00%)    75496.77 (   1.16%)
       Hmean     7    115017.50 (   0.00%)   113030.81 (  -1.73%)
       Hmean     12   126209.63 (   0.00%)   126613.40 (   0.32%)
       Hmean     16   131886.91 (   0.00%)   130844.35 (  -0.79%)
       Stddev    1       636.38 (   0.00%)      417.11 (  34.46%)
       Stddev    4       614.64 (   0.00%)      583.24 (   5.11%)
       Stddev    7       542.46 (   0.00%)      435.45 (  19.73%)
       Stddev    12      173.93 (   0.00%)      171.50 (   1.40%)
       Stddev    16      671.42 (   0.00%)      680.30 (  -1.32%)
       CoeffVar  1         2.88 (   0.00%)        1.83 (  36.26%)
      
      Note that the difference in performance is marginal but for low utilisation,
      there is less variability.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180130104555.4125-4-mgorman@techsingularity.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      806486c3
    • Mel Gorman's avatar
      sched/fair: Restructure wake_affine*() to return a CPU id · 3b76c4a3
      Mel Gorman authored
      This is a preparation patch that has wake_affine*() return a CPU ID instead of
      a boolean. The intent is to allow the wake_affine() helpers to be avoided
      if a decision is already made. This patch has no functional change.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180130104555.4125-3-mgorman@techsingularity.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3b76c4a3
    • Mel Gorman's avatar
      sched/fair: Remove unnecessary parameters from wake_affine_idle() · 89a55f56
      Mel Gorman authored
      wake_affine_idle() takes parameters it never uses so clean it up.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180130104555.4125-2-mgorman@techsingularity.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      89a55f56
    • Wen Yang's avatar
      sched/rt: Make update_curr_rt() more accurate · e7ad2031
      Wen Yang authored
      rq->clock_task may be updated between the two calls of
      rq_clock_task() in update_curr_rt(). Calling rq_clock_task() only
      once makes it more accurate and efficient, taking update_curr() as
      reference.
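
      The effect of reading the clock once can be modelled in user space.
      In this sketch the toy clock advances by 'tick' on every read to
      imitate an update landing between two reads; all toy_* names are
      hypothetical and the model is deliberately simplified.

```c
#include <stdint.h>

/* Hypothetical runqueue clock that moves forward on each read. */
struct toy_rq {
	uint64_t clock_task;
	uint64_t tick;
};

uint64_t toy_clock_task(struct toy_rq *rq)
{
	uint64_t now = rq->clock_task;

	rq->clock_task += rq->tick;
	return now;
}

/* Two reads: the time between them is never charged to the task. */
uint64_t toy_update_two_reads(struct toy_rq *rq, uint64_t *exec_start)
{
	uint64_t delta = toy_clock_task(rq) - *exec_start;

	*exec_start = toy_clock_task(rq);
	return delta;
}

/* One read: the charged delta and the new start use the same value. */
uint64_t toy_update_one_read(struct toy_rq *rq, uint64_t *exec_start)
{
	uint64_t now = toy_clock_task(rq);
	uint64_t delta = now - *exec_start;

	*exec_start = now;
	return delta;
}
```

      With two reads the new start time can run ahead of the charged
      interval; with one read the two stay consistent.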
      Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: zhong.weidong@zte.com.cn
      Link: http://lkml.kernel.org/r/1517800721-42092-1-git-send-email-wen.yang99@zte.com.cn
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e7ad2031
    • Steven Rostedt (VMware)'s avatar
      sched/rt: Up the root domain ref count when passing it around via IPIs · 364f5665
      Steven Rostedt (VMware) authored
      When issuing an IPI RT push, where an IPI is sent to each CPU that has more
      than one RT task scheduled on it, the code references the root domain's
      rto_mask, which contains all the CPUs within the root domain that have more
      than one RT task in the runnable state. The problem is that, after the IPIs
      are initiated, the rq->lock is released. This means that the root domain
      associated with the run queue could be freed while the IPIs are going around.
      
      Add a sched_get_rd() and a sched_put_rd() that will increment and decrement
      the root domain's ref count respectively. This way when initiating the IPIs,
      the scheduler will up the root domain's ref count before releasing the
      rq->lock, ensuring that the root domain does not go away until the IPI round
      is complete.
      Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 4bdced5c ("sched/rt: Simplify the IPI based RT balancing logic")
      Link: http://lkml.kernel.org/r/CAEU1=PkiHO35Dzna8EQqNSKW1fr1y1zRQ5y66X117MG06sQtNA@mail.gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      364f5665
    • Steven Rostedt (VMware)'s avatar
      sched/rt: Use container_of() to get root domain in rto_push_irq_work_func() · ad0f1d9d
      Steven Rostedt (VMware) authored
      When the rto_push_irq_work_func() is called, it looks at the RT overloaded
      bitmask in the root domain via the runqueue (rq->rd). The problem is that
      during CPU up and down, nothing here stops rq->rd from changing between
      taking the rq->rd->rto_lock and releasing it. That means the lock that is
      released is not the same lock that was taken.
      
      Instead of using this_rq()->rd to get the root domain, as the irq work is
      part of the root domain, we can simply get the root domain from the irq work
      that is passed to the routine:
      
       container_of(work, struct root_domain, rto_push_work)
      
      This keeps the root domain consistent.
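
      For readers unfamiliar with the idiom, here is a user-space sketch of
      recovering the enclosing structure from a pointer to an embedded
      member. The toy_* types and the macro are simplified stand-ins, not
      the kernel definitions.

```c
#include <stddef.h>

/* Hypothetical embedded work item. */
struct toy_irq_work {
	int pending;
};

/* Hypothetical root domain that embeds its push work item. */
struct toy_root_domain {
	int refcount;
	struct toy_irq_work rto_push_work;
};

/* Simplified container_of: step back from the member to its owner. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The work pointer alone identifies the root domain it belongs to. */
struct toy_root_domain *toy_rd_from_work(struct toy_irq_work *work)
{
	return toy_container_of(work, struct toy_root_domain, rto_push_work);
}
```

      Because the result depends only on the work pointer, it cannot
      change underneath the caller the way a lookup through the runqueue can.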
      Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 4bdced5c ("sched/rt: Simplify the IPI based RT balancing logic")
      Link: http://lkml.kernel.org/r/CAEU1=PkiHO35Dzna8EQqNSKW1fr1y1zRQ5y66X117MG06sQtNA@mail.gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ad0f1d9d
    • Peter Zijlstra's avatar
      sched/core: Optimize update_stats_*() · 2ed41a55
      Peter Zijlstra authored
      These functions are already gated by schedstats_enabled(), there is no
      point in then issuing another static_branch for every individual
      update in them.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2ed41a55
    • Peter Zijlstra's avatar
      sched/core: Optimize ttwu_stat() · b85c8b71
      Peter Zijlstra authored
      The whole of ttwu_stat() is guarded by a single schedstat_enabled(),
      there is absolutely no point in then issuing another static_branch for
      every single schedstat_inc() in there.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b85c8b71
    • Desnes Augusto Nunes do Rosario's avatar
      ibmvnic: fix empty firmware version and errors cleanup · 21a2545b
      Desnes Augusto Nunes do Rosario authored
      This patch makes sure that the firmware version is never NULL. Moreover,
      it also performs some cleanup on the error messages.
      
      Fixes: a107311d ("ibmvnic: fix firmware version when no firmware level
      has been provided by the VIOS server")
      Signed-off-by: Desnes A. Nunes do Rosario <desnesn@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      21a2545b
    • Tommi Rantala's avatar
      sctp: fix dst refcnt leak in sctp_v4_get_dst · 4a31a6b1
      Tommi Rantala authored
      Fix dst reference count leak in sctp_v4_get_dst() introduced in commit
      410f0383 ("sctp: add routing output fallback"):
      
      When walking the address_list, successive ip_route_output_key() calls
      may return the same rt->dst with the reference incremented on each call.
      
      The code would not decrement the dst refcount when the dst pointer was
      identical from the previous iteration, causing the dst refcnt leak.
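
      The leak pattern can be reproduced with a toy model. This is a sketch
      under assumptions, not the sctp code: every lookup returns the same
      route and takes a reference, the first result is kept as a fallback,
      and the toy_* names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical route entry with a plain integer reference count. */
struct toy_dst {
	int refcnt;
};

/* Each lookup hands back the route with one more reference held. */
struct toy_dst *toy_route_lookup(struct toy_dst *route)
{
	route->refcnt++;
	return route;
}

void toy_dst_release(struct toy_dst *dst)
{
	dst->refcnt--;
}

/*
 * Walk n candidate addresses that all resolve to the same route and
 * return how many references are still held afterwards (the leak).
 * fixed == 0 models releasing only when the pointer differs from the
 * kept one; fixed != 0 releases every reference that is not kept.
 */
int toy_walk(struct toy_dst *route, int n, int fixed)
{
	int before = route->refcnt;
	struct toy_dst *kept = NULL;

	for (int i = 0; i < n; i++) {
		struct toy_dst *cur = toy_route_lookup(route);

		if (!kept) {
			kept = cur;        /* first result is the fallback */
			continue;
		}
		if (fixed || cur != kept)
			toy_dst_release(cur);
	}

	if (kept)
		toy_dst_release(kept);     /* caller is done with the route */

	return route->refcnt - before;
}
```

      With three candidates the unfixed variant leaves two references
      behind, which is the kind of imbalance that keeps a device from
      being unregistered.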
      
      Testcase:
        ip netns add TEST
        ip netns exec TEST ip link set lo up
        ip link add dummy0 type dummy
        ip link add dummy1 type dummy
        ip link add dummy2 type dummy
        ip link set dev dummy0 netns TEST
        ip link set dev dummy1 netns TEST
        ip link set dev dummy2 netns TEST
        ip netns exec TEST ip addr add 192.168.1.1/24 dev dummy0
        ip netns exec TEST ip link set dummy0 up
        ip netns exec TEST ip addr add 192.168.1.2/24 dev dummy1
        ip netns exec TEST ip link set dummy1 up
        ip netns exec TEST ip addr add 192.168.1.3/24 dev dummy2
        ip netns exec TEST ip link set dummy2 up
        ip netns exec TEST sctp_test -H 192.168.1.2 -P 20002 -h 192.168.1.1 -p 20000 -s -B 192.168.1.3
        ip netns del TEST
      
      In 4.4 and 4.9 kernels this results in:
        [  354.179591] unregister_netdevice: waiting for lo to become free. Usage count = 1
        [  364.419674] unregister_netdevice: waiting for lo to become free. Usage count = 1
        [  374.663664] unregister_netdevice: waiting for lo to become free. Usage count = 1
        [  384.903717] unregister_netdevice: waiting for lo to become free. Usage count = 1
        [  395.143724] unregister_netdevice: waiting for lo to become free. Usage count = 1
        [  405.383645] unregister_netdevice: waiting for lo to become free. Usage count = 1
        ...
      
      Fixes: 410f0383 ("sctp: add routing output fallback")
      Fixes: 0ca50d12 ("sctp: fix src address selection if using secondary addresses")
      Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4a31a6b1
    • Alexey Kodanev's avatar
      sctp: fix dst refcnt leak in sctp_v6_get_dst() · 957d761c
      Alexey Kodanev authored
      When going through the bind address list in sctp_v6_get_dst() and
      the previously found address is better ('matchlen > bmatchlen'),
      the code continues to the next iteration without releasing the currently
      held destination.
      
      Fix it by releasing 'bdst' before continuing to the next iteration, and
      instead of introducing one more '!IS_ERR(bdst)' check for dst_release(),
      move the existing one right after ip6_dst_lookup_flow(), i.e. we
      shouldn't proceed further if we get an error for the route lookup.
      
      Fixes: dbc2b5e9 ("sctp: fix src address selection if using secondary addresses for ipv6")
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      957d761c
  3. 05 Feb, 2018 1 commit
    • Linus Torvalds's avatar
      Merge tag 'xfs-4.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · e237f98a
      Linus Torvalds authored
      Pull more xfs updates from Darrick Wong:
       "As promised, here's a (much smaller) second pull request for the
        second week of the merge cycle. This time around we have a couple
        patches shutting off unsupported fs configurations, and a couple of
        cleanups.
      
        Last, we turn off EXPERIMENTAL for the reverse mapping btree, since
        the primary downstream user of that information (online fsck) is now
        upstream and I haven't seen any major failures in a few kernel
        releases.
      
        Summary:
      
         - Print scrub build status in the xfs build info.
      
         - Explicitly call out the remaining two scenarios where we don't
           support reflink and never have.
      
         - Remove EXPERIMENTAL tag from reverse mapping btree!"
      
      * tag 'xfs-4.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: remove experimental tag for reverse mapping
        xfs: don't allow reflink + realtime filesystems
        xfs: don't allow DAX on reflink filesystems
        xfs: add scrub to XFS_BUILD_OPTIONS
        xfs: fix u32 type usage in sb validation function
      e237f98a