1. 02 Jun, 2014 30 commits
    • net: cdc_ncm: always reallocate tx_curr_skb when tx_max increases · 1ba5d0ff
      Bjørn Mork authored
      We are calling usbnet_start_xmit() to flush any remaining data,
      depending on the side effect that tx_curr_skb is set to NULL,
      ensuring a new allocation using the updated tx_max.  But this
      side effect will only happen if there were any cached data ready
      to transmit. If not, then an empty tx_curr_skb is still allocated
      using the old tx_max size. Free it to avoid a buffer overrun.
      
      Fixes: 68864abf ("net: cdc_ncm: support rx_max/tx_max updates when running")
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: cdc_ncm: reduce skb truesize in rx path · 1e2c6117
      Bjørn Mork authored
      Cloning the big skbs we use for USB buffering chokes up TCP and
      SCTP because the socket memory limits are hitting earlier than
      they should. It is better to unconditionally copy the unwrapped
      packets to freshly allocated skbs.
      Reported-by: Jim Baxter <jim_baxter@mentor.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macvlan: fix the problem when mac address changes for passthru mode · e289fd28
      dingtianhong authored
      In passthru mode the macvlan dev should always have the same mac
      address as the lowerdev. Changing the mac address of only one of them
      breaks this mechanism, so when the mac address of either the lowerdev
      or the macvlan changes, we should propagate the change to the other dev.
      
      v1->v2: Allow macvlan dev to change mac address for passthru mode and propagate to
      	lowerdev.
      
      v2->v3: Don't set the mac address to the lower dev's unicast address for
      	passthru mode when mac address changes.
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Handle different error codes from platform_get_irq_byname · d7ec8584
      Chen-Yu Tsai authored
      The following patch moved device tree interrupt resolution into
      platform_get_irq_byname:
      
        ad69674e of/irq: do irq resolution in platform_get_irq_byname()
      
      As a result, the function no longer returns only -ENXIO on error.
      This breaks DT based probing of stmmac, as seen in test runs of
      linux-next next-20140526 cubie2-sunxi_defconfig:
      
        http://lists.linaro.org/pipermail/kernel-build-reports/2014-May/003659.html
      
      This patch makes the stmmac_platform probe function properly handle
      error codes, such as returning for deferred probing, and other codes
      returned by of_irq_get_by_name.
      Signed-off-by: Chen-Yu Tsai <wens@csie.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next · 31595de2
      David S. Miller authored
      John W. Linville says:
      
      ====================
      pull request: wireless-next 2014-06-02
      
      Please pull this remaining batch of updates intended for the 3.16 stream...
      
      For the mac80211 bits, Johannes says:
      
      "The remainder for -next right now is mostly fixes, and a handful of
      small new things like some CSA infrastructure, the regdb script mW/dBm
      conversion change and sending wiphy notifications."
      
      For the bluetooth bits, Gustavo says:
      
      "Some more patches for 3.16. There is nothing really special here, just a
      bunch of clean ups, fixes plus some small improvements. Please pull."
      
      For the nfc bits, Samuel says:
      
      "We have:
      
      - Felica (Type3) tags support for trf7970a
      - Type 4b tags support for port100
      - st21nfca DTS typo fix
      - A few sparse warning fixes"
      
      For the atheros bits, Kalle says:
      
      "Ben added support for setting antenna configurations. Michal improved
      warm reset so that we would not need to fall back to cold reset that
      often, fixed an issue where ath10k stripped protected flag while in monitor
      mode and made module initialisation asynchronous to fix the problems
      with firmware loading when the driver is linked to the kernel.
      
      Luca removed unused channel_switch_beacon callbacks both from ath9k and
      ath10k. Marek fixed Protected Management Frames (PMF) when using Action
      Frames. Also we had other small fixes everywhere in the driver."
      
      Along with that, there are a handful of updates to a variety
      of drivers.  This includes updates to at76c50x-usb, ath9k, b43,
      brcmfmac, mwifiex, rsi, rtlwifi, and wil6210.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inetpeer: get rid of ip_id_count · 73f156a6
      Eric Dumazet authored
      Ideally, we would need to generate IP ID using a per destination IP
      generator.
      
      Linux kernels used inet_peer cache for this purpose, but this had a huge
      cost on servers disabling MTU discovery.
      
      1) each inet_peer struct consumes 192 bytes
      
      2) inetpeer cache uses a binary tree of inet_peer structs,
         with a nominal size of ~66000 elements under load.
      
      3) lookups in this tree are hitting a lot of cache lines, as tree depth
         is about 20.
      
      4) If server deals with many tcp flows, we have a high probability of
         not finding the inet_peer, allocating a fresh one, inserting it in
         the tree with same initial ip_id_count, (cf secure_ip_id())
      
      5) We garbage collect inet_peer aggressively.
      
      IP ID generation does not have to be 'perfect'.

      The goal is to avoid duplicates within a short period of time,
      so that reassembly units have a chance to complete reassembly of
      fragments belonging to one message before receiving other fragments
      with a recycled ID.
      
      We simply use an array of generators, and a Jenkins hash using the dst IP
      as a key.
      
      ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
      belongs (it is only used from this file)
      
      secure_ip_id() and secure_ipv6_id() no longer are needed.
      
      Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
      unnecessary decrement/increment of the number of segments.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
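The "array of generators" idea from the inetpeer commit above can be sketched in userspace C. This is only an illustration under stated assumptions: the table size, the seed parameter and the names `toy_idents`, `oaat_hash` and `toy_select_ident` are made up for this sketch, the hash is Jenkins' one-at-a-time hash rather than whichever Jenkins variant the patch itself uses, and the counter update is not atomic as real kernel code would require.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table size; must be a power of two for the mask below. */
#define TOY_IDENTS_SZ 2048u

/* One shared ID generator per hash bucket instead of one per peer. */
static uint32_t toy_idents[TOY_IDENTS_SZ];

/* Bob Jenkins' one-at-a-time hash over a byte buffer. */
static uint32_t oaat_hash(const uint8_t *key, size_t len, uint32_t seed)
{
    uint32_t h = seed;

    for (size_t i = 0; i < len; i++) {
        h += key[i];
        h += h << 10;
        h ^= h >> 6;
    }
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}

/*
 * Pick the generator for this destination address, hand out its current
 * value as the IP ID and reserve `segs` consecutive IDs for the caller.
 */
uint16_t toy_select_ident(uint32_t daddr, uint32_t seed, uint32_t segs)
{
    uint8_t key[4] = {
        (uint8_t)(daddr >> 24), (uint8_t)(daddr >> 16),
        (uint8_t)(daddr >> 8), (uint8_t)daddr
    };
    uint32_t bucket = oaat_hash(key, sizeof(key), seed) & (TOY_IDENTS_SZ - 1);
    uint32_t id = toy_idents[bucket];

    assert(segs >= 1);
    toy_idents[bucket] += segs; /* not atomic: single-threaded sketch only */
    return (uint16_t)id;
}
```

Destinations that hash to the same bucket share a counter, so IDs are not unique per destination. That is acceptable here because, as the commit message says, the aim is only to avoid reusing an ID within a short period of time.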
    • of: of_mdio: export symbol of_mdiobus_link_phydev · e067ee33
      Daniel Mack authored
      Make of_mdiobus_link_phydev externally available.
      This fixes CONFIG_OF_MDIO=m.
      Signed-off-by: Daniel Mack <zonque@gmail.com>
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Fixes: 86f6cf41 ("net: of_mdio: add of_mdiobus_link_phydev()")
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: of_mdio: use int type for address variable · 4cd984b0
      Daniel Mack authored
      Use int rather than u32 to fix the following warning:
      
      drivers/of/of_mdio.c:147 of_mdiobus_register() warn: unsigned 'addr' is
      never less than zero.
      Signed-off-by: Daniel Mack <zonque@gmail.com>
      Fixes: 8f838288 ("net: of_mdio: factor out code to parse a phy's 'reg' property")
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'netdevsync' · c7bfbe51
      David S. Miller authored
      Alexander Duyck says:
      
      ====================
      Provide common means for device address sync
      
      The following series implements a means for synchronizing both unicast and
      multicast addresses on a device interface.  The code is based on the original
      implementation of dev_uc_sync that was available for syncing a VLAN to the
      lower dev.
      
      The original reason for coming up with this patch is a driver that is still in
      the early stages of development.  The nearest driver I could find that
      appeared to have the same limitations as the driver I was working on was the
      Cisco enic driver.  For this reason I chose it as the first driver to make use
      of this interface publicly.
      
      However, I do not have a Cisco enic interface so I have only been able to
      compile test any changes made to the driver.  I tried to keep this change as
      simple as possible to avoid any issues.  Any help with testing would be
      greatly appreciated.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enic: Update driver to use __dev_uc/mc_sync/unsync calls · f009618a
      Alexander Duyck authored
      This change updates the enic driver to make use of __dev_uc_sync and
      __dev_mc_sync calls.  Previously the driver was doing its own list
      management by storing the mc_addr and uc_addr list in a 32 address array.
      With this change the sync data is stored in the netdev_addr_list structures
      and instead we just track how many addresses we have written to the device.
      When we encounter 32 we stop and print a message as occurred previously with
      the old approach.
      
      Other than the core change the only other bit needed was to propagate the
      constant attribute with the MAC address as there were several spots where
      it was only passed as a u8 * instead of a const u8 *.
      
      This patch is meant to maintain the original functionality without the use
      of the mc_addr and uc_addr arrays.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Govindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add support for device specific address syncing · 670e5b8e
      Alexander Duyck authored
      This change provides a function to be used in order to break the
      ndo_set_rx_mode call into a set of address add and remove calls.  The code
      is based on the implementation of dev_uc_sync/dev_mc_sync.  Since they
      essentially do the same thing but with only one dev I simply named my
      functions __dev_uc_sync/__dev_mc_sync.
      
      I also implemented an unsync version of the functions as well to allow for
      cleanup on close.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
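To make the "address add and remove calls" idea concrete, here is a minimal userspace model of syncing an address list to a device through two callbacks. It is a sketch under assumptions, not the kernel implementation: `struct addr_entry`, `addr_list_sync`, `struct toy_dev`, `toy_add` and `toy_del` are hypothetical names, and the toy device simply counts written addresses up to a fixed limit, similar in spirit to the 32-address limit mentioned in the enic commit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ETH_ALEN 6

struct addr_entry {
    uint8_t addr[TOY_ETH_ALEN];
    bool wanted; /* still present on the software address list */
    bool synced; /* already written to the device */
};

typedef int (*addr_cb)(void *dev, const uint8_t *addr);

/*
 * Flush stale entries first, then push new ones, so that a device with
 * limited filter slots frees space before it is asked to add addresses.
 * Stops and returns the error of the first failing add.
 */
int addr_list_sync(struct addr_entry *list, size_t n, void *dev,
                   addr_cb sync, addr_cb unsync)
{
    assert(sync != NULL && unsync != NULL);

    for (size_t i = 0; i < n; i++) {
        if (!list[i].wanted && list[i].synced) {
            unsync(dev, list[i].addr);
            list[i].synced = false;
        }
    }
    for (size_t i = 0; i < n; i++) {
        if (list[i].wanted && !list[i].synced) {
            int err = sync(dev, list[i].addr);

            if (err)
                return err;
            list[i].synced = true;
        }
    }
    return 0;
}

/* Toy device: only tracks how many addresses have been written to it. */
struct toy_dev {
    unsigned int hw_count;
    unsigned int hw_max;
};

static int toy_add(void *dev, const uint8_t *addr)
{
    struct toy_dev *d = dev;

    (void)addr;
    if (d->hw_count == d->hw_max)
        return -ENOSPC; /* filter table full */
    d->hw_count++;
    return 0;
}

static int toy_del(void *dev, const uint8_t *addr)
{
    struct toy_dev *d = dev;

    (void)addr;
    d->hw_count--;
    return 0;
}
```

With this shape the driver no longer needs its own copy of the address list: the sync state lives next to each list entry and the driver only supplies the two callbacks.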
    • Merge branch '6lowpan-next' · 3e820811
      David S. Miller authored
      Alexander Aring says:
      
      ====================
      6lowpan: fragmentation fixes
      
      This patch series fixes the 6LoWPAN fragmentation, which is broken in two cases.

      The first case is if we have exactly two 6LoWPAN fragments only. This is fixed
      by patch "6lowpan_rtnl: fix fragmentation with two fragments".
      The second case is an off-by-one issue if we have a payload which hits the
      fragment boundary.
      
      Both issues are introduced by commit d4b2816d
      ("6lowpan: fix fragmentation").
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 6lowpan_rtnl: fix off by one while fragmentation · eb06481d
      Alexander Aring authored
      This patch fixes an off-by-one error during fragmentation. If the frag_cap
      value is equal to the skb_unprocessed value, we need to stop the
      fragmentation loop because the last fragment, which has a size of
      skb_unprocessed, fits into the fragment capacity.
      
      This issue was introduced by commit d4b2816d
      ("6lowpan: fix fragmentation").
      Signed-off-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 6lowpan_rtnl: fix fragmentation with two fragments · 51263fff
      Alexander Aring authored
      This patch fixes the 6LoWPAN fragmentation for the case where we have
      exactly two fragments. The problem is that the (skb_unprocessed >= frag_cap)
      condition is always false on the second fragment after sending the first
      fragment. A fragmentation with only one fragment doesn't make any sense.
      The solution is to use a do-while loop here, which ensures that we always
      send a minimum of two fragments if we need fragmentation.
      
      This issue was introduced by commit d4b2816d
      ("6lowpan: fix fragmentation").
      Signed-off-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
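The two 6LoWPAN fixes above can be pictured with a small model of the fragmentation loop. This is a simplified sketch, not the driver code: `split_fragments` and its parameters are hypothetical, header sizes and 8-byte rounding are left out, and only the fragment lengths are recorded. It assumes the first fragment has already been sized by the caller.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Split `total` payload bytes into a first fragment of `first_len` bytes
 * followed by fragments of at most `frag_cap` bytes each. The lengths are
 * stored in `lens` and the number of fragments is returned.
 */
size_t split_fragments(size_t total, size_t first_len, size_t frag_cap,
                       size_t *lens, size_t max_frags)
{
    size_t unprocessed = total;
    size_t len = first_len;
    size_t n = 0;

    /* Fragmentation only makes sense if one fragment is not enough. */
    assert(first_len > 0 && frag_cap > 0 && total > first_len);
    assert(lens != NULL && max_frags >= 2);

    lens[n++] = len; /* first fragment */

    /*
     * do-while: a second fragment is always sent, even when the rest is
     * smaller than frag_cap (the "exactly two fragments" case).
     */
    do {
        unprocessed -= len; /* account for the fragment sent before */
        len = unprocessed < frag_cap ? unprocessed : frag_cap;
        assert(n < max_frags);
        lens[n++] = len;
        /*
         * Strictly greater: if unprocessed == frag_cap, the fragment just
         * recorded already carried everything, so the loop must stop
         * (the off-by-one case).
         */
    } while (unprocessed > frag_cap);

    return n;
}
```

For example, 100 bytes with a 40-byte first fragment and a capacity of 30 yields lengths 40, 30, 30. With `>=` in the loop condition the same input would produce an extra zero-length fragment.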
    • stmmac: Remove spin_lock call in stmmac_get_pauseparam() · 86c92ee3
      Emil Goode authored
      The following patch removed unnecessary spin_lock/unlock calls
      in ethtool_ops callback functions. In the second and final version
      of the patch one spin_lock call was left behind.
      
      commit cab6715c
      Author: Yang Wei <Wei.Yang@windriver.com>
      Date:   Sun May 25 09:53:44 2014 +0800
      
          net: driver: stmicro: Remove some useless the lock protection
      
      This introduced the following sparse warning:
      
      drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c:424:1: warning:
      	context imbalance in 'stmmac_get_pauseparam' -
      	different lock contexts for basic block
      Signed-off-by: Emil Goode <emilgoode@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • genetlink: remove superfluous assignment · 2f91abd4
      Denis ChengRq authored
      The local variables ops and n_ops were just read out from family,
      and not changed, hence no need to assign back.
      
      Validation functions should operate on const parameters and not
      change anything.
      Signed-off-by: Cheng Renquan <crquan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of... · fcb2c0d6
      John W. Linville authored
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
    • Revert "net/mlx4_en: Use affinity hint" · 96b2e73c
      David S. Miller authored
      This reverts commit 70a640d0.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ks8851: Don't use regulator_get_optional() · d64eed1d
      Stephen Boyd authored
      We shouldn't be using regulator_get_optional() here. These
      regulators are always present as part of the physical design and
      there isn't any way to use an internal regulator or change the
      source of the reference voltage via software. Given that the only
      users of this driver in the kernel are DT based, this change
      should be transparent to them even if they don't specify any
      supplies because the regulator framework will insert dummy
      supplies as needed.
      
      Cc: Nishanth Menon <nm@ti.com>
      Cc: Mark Brown <broonie@kernel.org>
      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
      Reviewed-by: Mark Brown <broonie@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'filter-next' · c532cea9
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      BPF + test suite updates
      
      These are the last bigger BPF changes that I had in my todo
      queue for now. As the first two patches from this series
      contain additional test cases for the test suite, I have
      rebased them on top of current net-next with the set from [1]
      applied to avoid introducing any unnecessary merge conflicts.
      
      For details, please refer to the individual patches. Test
      suite runs fine with the set applied.
      
       [1] http://patchwork.ozlabs.org/patch/352599/
           http://patchwork.ozlabs.org/patch/352600/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: filter: improve filter block macros · f8f6d679
      Daniel Borkmann authored
      Commit 9739eef1 ("net: filter: make BPF conversion more readable")
      started to introduce helper macros similar to BPF_STMT()/BPF_JUMP()
      macros from classic BPF.
      
      However, quite some statements in the filter conversion functions
      remained in the old style which gives a mixture of block macros and
      non block macros in the code. This patch makes the block macros itself
      more readable by using explicit member initialization, and converts
      the remaining ones where possible to remain in a more consistent state.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: filter: get rid of BPF_S_* enum · 34805931
      Daniel Borkmann authored
      This patch finally allows us to get rid of the BPF_S_* enum.
      Currently, the code performs unnecessary encode and decode
      workarounds in seccomp and filter migration itself when a filter
      is being attached in order to overcome BPF_S_* encoding which
      is not used anymore by the new interpreter resp. JIT compilers.
      
      Keeping it around would mean that also in future we would need
      to extend and maintain this enum and related encoders/decoders.
      We can get rid of all that and save us these operations during
      filter attaching. Naturally, also JIT compilers need to be updated
      by this.
      
      Before JIT conversion is being done, each compiler checks if A
      is being loaded at startup to obtain information if it needs to
      emit instructions to clear A first. Since BPF extensions are a
      subset of BPF_LD | BPF_{W,H,B} | BPF_ABS variants, case statements
      for extensions can be removed at that point. To ease and minimize
      code changes in the classic JITs, we have introduced bpf_anc_helper().
      
      Tested with test_bpf on x86_64 (JIT, int), s390x (JIT, int),
      arm (JIT, int), i386 (int), ppc64 (JIT, int); for sparc we
      unfortunately didn't have access, but changes are analogous to
      the rest.
      
      Joint work with Alexei Starovoitov.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mircea Gherzan <mgherzan@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Acked-by: Chema Gonzalez <chemag@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: filter: add test for loading SKF_AD_OFF limits · d50bc157
      Daniel Borkmann authored
      This check tests that overloading BPF_LD | BPF_ABS with an
      always invalid BPF extension, that is SKF_AD_MAX, fails to
      make sure classic BPF behaviour is correct in filter checker.
      
      Also, we add a test for loading at packet offset SKF_AD_OFF-1
      which should pass the filter, but later on fail during runtime.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: filter: add slot overlapping test with fully filled M[] · 9fe13baa
      Daniel Borkmann authored
      Also add a test for the scratch memory store that first fills
      all slots and then successively reads all of them back adding
      up to A, and eventually returning A. This and the previous
      M[] test with alternating fill/spill will detect possible JIT
      errors on M[].
      Suggested-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
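The shape of the "fill every slot, then read them all back into A" test can be illustrated with a toy interpreter. Everything here is an assumption made for the sketch: the instruction set (`toy_op`), the builder `build_fill_then_sum`, the runner `toy_run` and the stored values are hypothetical and do not reproduce the actual test_bpf case. Only the count of 16 scratch slots matches classic BPF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MEMWORDS 16 /* classic BPF has 16 scratch memory slots */

enum toy_op { OP_LDX_IMM, OP_STX, OP_LDX_MEM, OP_ADD_X, OP_RET_A };

struct toy_insn {
    enum toy_op op;
    uint32_t k;
};

/*
 * Emit a program that stores base+k into every slot M[k], then loads each
 * slot back and adds it to A, and finally returns A.
 */
size_t build_fill_then_sum(struct toy_insn *prog, size_t cap, uint32_t base)
{
    size_t n = 0;

    assert(prog != NULL && cap >= 4 * TOY_MEMWORDS + 1);
    for (uint32_t k = 0; k < TOY_MEMWORDS; k++) {
        prog[n++] = (struct toy_insn){ OP_LDX_IMM, base + k };
        prog[n++] = (struct toy_insn){ OP_STX, k };
    }
    for (uint32_t k = 0; k < TOY_MEMWORDS; k++) {
        prog[n++] = (struct toy_insn){ OP_LDX_MEM, k };
        prog[n++] = (struct toy_insn){ OP_ADD_X, 0 };
    }
    prog[n++] = (struct toy_insn){ OP_RET_A, 0 };
    return n;
}

/* Reference interpreter: A and X registers plus the scratch store M[]. */
uint32_t toy_run(const struct toy_insn *prog, size_t len)
{
    uint32_t A = 0, X = 0, M[TOY_MEMWORDS] = { 0 };

    for (size_t pc = 0; pc < len; pc++) {
        uint32_t k = prog[pc].k;

        switch (prog[pc].op) {
        case OP_LDX_IMM: X = k; break;
        case OP_STX:     assert(k < TOY_MEMWORDS); M[k] = X; break;
        case OP_LDX_MEM: assert(k < TOY_MEMWORDS); X = M[k]; break;
        case OP_ADD_X:   A += X; break;
        case OP_RET_A:   return A;
        }
    }
    return 0;
}
```

Because every slot holds a distinct value, an implementation that lets two slots overlap or loses a store returns a sum different from the expected one, which is the kind of error the commit wants to catch.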
    • bridge: fix the unbalanced promiscuous count when add_if failed · 019ee792
      wangweidong authored
      Commit 2796d0c6 ("bridge: Automatically manage port
      promiscuous mode.") made add_if use dev_set_allmulti
      instead of dev_set_promiscuity, so when add_if fails, we
      should do dev_set_allmulti(dev, -1).
      Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
      Reviewed-by: Amos Kong <akong@redhat.com>
      Acked-by: Vlad Yasevich <vyasevic@redhat.com>
      Acked-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Revert mlx4 cpumask changes. · ee39facb
      David S. Miller authored
      This reverts commit 70a640d0
      ("net/mlx4_en: Use affinity hint") and commit
      c8865b64 ("cpumask: Utility function
      to set n'th cpu - local cpu first") because these changes break
      the build when SMP is disabled amongst other things.
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ks8851: Don't use regulator_get_optional() · 2a82e40d
      Stephen Boyd authored
      We shouldn't be using regulator_get_optional() here. These
      regulators are always present as part of the physical design and
      there isn't any way to use an internal regulator or change the
      source of the reference voltage via software. Given that the only
      users of this driver in the kernel are DT based, this change
      should be transparent to them even if they don't specify any
      supplies because the regulator framework will insert dummy
      supplies as needed.
      
      Cc: Nishanth Menon <nm@ti.com>
      Cc: Mark Brown <broonie@kernel.org>
      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
      Reviewed-by: Mark Brown <broonie@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlx4-next' · b07166b2
      David S. Miller authored
      Amir Vadai says:
      
      ====================
      cpumask,net: Affinity hint helper function
      
      This patchset will set affinity hint to influence IRQs to be allocated on the
      same NUMA node as the one where the card resides. As discussed in
      http://www.spinics.net/lists/netdev/msg271497.html
      
      If the number of IRQs allocated is greater than the number of local NUMA cores, all
      local cores will be used first, and the rest of the IRQs will be on a remote
      NUMA node.
      If there is no NUMA support, IRQs and cores will be mapped 1:1.
      
      Since the utility function to calculate the mapping could be useful in other mq
      drivers in the kernel, it was added to cpumask.[ch]
      
      This patchset was tested and applied on top of net-next since the first
      consumer is a network device (mlx4_en).  Over commit 506724c4: "tg3: Override
      clock, link aware and link idle mode during NVRAM dump"
      
      I couldn't find a maintainer for cpumask.c, so only added the kernel mailing
      list
      
      Amir
      
      Changes from V5:
      - Moved the utility function from kernel/irq/manage.c to lib/cpumask.c, and
        renamed it accordingly to cpumask_set_cpu_local_first()
      - Added some comments as Thomas Gleixner suggested
      - Changed -EINVAL to -EAGAIN, which describes the error situation better.
      
      Changes from V4:
      - Patch 1/2: irq: Utility function to get affinity_hint by policy
        Thank you Ben for the great review:
        - Moved the function to kernel/irq/manage.c since it could be useful for
          block mq devices
        - Fixed typos
        - Use cpumask_t * instead of cpumask_var_t in function header
        - Restructured the function to remove NULL assignment in a cpumask_var_t
        - Fix for offline local CPU's
      
      Changes from V3:
      - Patch 2/2: net/mlx4_en: Use affinity hint
        - somehow patch file was corrupted
      
      Changes from V2:
      - Patch 1/2: net: Utility function to get affinity_hint by policy
        - Fixed style issues
      
      Changes from V1:
      - Patch 1/2: net: Utility function to get affinity_hint by policy
        - Fixed error flow to return -EINVAL on error (thanks govind)
      - Patch 2/2: net/mlx4_en: Use affinity hint
        - Set ring->affinity_hint to NULL on error
      
      Changes from V0:
      - Fixed small style issues
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_en: Use affinity hint · 70a640d0
      Yuval Atias authored
      The “affinity hint” mechanism is used by the user space
      daemon, irqbalancer, to indicate a preferred CPU mask for irqs.
      Irqbalancer can use this hint to balance the irqs between the
      cpus indicated by the mask.
      
      We wish the HCA to preferentially map the IRQs it uses to numa cores
      close to it.  To accomplish this, we use cpumask_set_cpu_local_first(), that
      sets the affinity hint according to the following policy:
      First it maps IRQs to “close” numa cores.  If these are exhausted, the
      remaining IRQs are mapped to “far” numa cores.
      Signed-off-by: Yuval Atias <yuvala@mellanox.com>
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cpumask: Utility function to set n'th cpu - local cpu first · c8865b64
      Amir Vadai authored
      This function sets the n'th cpu, counting local cpus first.
      For example, on a 16-core server where the even cpus are local, we get
      the following values:
      cpumask_set_cpu_local_first(0, numa, cpumask) => cpu 0 is set
      cpumask_set_cpu_local_first(1, numa, cpumask) => cpu 2 is set
      ...
      cpumask_set_cpu_local_first(7, numa, cpumask) => cpu 14 is set
      cpumask_set_cpu_local_first(8, numa, cpumask) => cpu 1 is set
      cpumask_set_cpu_local_first(9, numa, cpumask) => cpu 3 is set
      ...
      cpumask_set_cpu_local_first(15, numa, cpumask) => cpu 15 is set
      
      Currently this function will be used by multi queue networking devices to
      calculate the irq affinity mask, such that as many local cpu's as
      possible will be utilized to handle the mq device irq's.
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
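The "local cpus first" ordering from the cpumask commit above can be modelled in a few lines of userspace C. This is a sketch under assumptions: `nth_cpu_local_first` is a hypothetical helper that takes the local CPUs as a plain 64-bit mask and returns a CPU index, whereas the kernel function described in the commit works on a NUMA node and sets a bit in a cpumask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return the n'th CPU when all local CPUs (bits set in local_mask) are
 * enumerated before the remote ones, or -1 if n is out of range.
 */
int nth_cpu_local_first(unsigned int n, uint64_t local_mask,
                        unsigned int ncpus)
{
    assert(ncpus <= 64);
    if (n >= ncpus)
        return -1;

    /* Pass 0 walks the local CPUs, pass 1 walks the remote CPUs. */
    for (int pass = 0; pass < 2; pass++) {
        for (unsigned int cpu = 0; cpu < ncpus; cpu++) {
            bool is_local = ((local_mask >> cpu) & 1u) != 0;

            if (is_local != (pass == 0))
                continue;
            if (n == 0)
                return (int)cpu;
            n--;
        }
    }
    return -1;
}
```

With 16 CPUs and the even CPUs local (mask `0x5555`), the helper reproduces the table in the commit message: n = 0 gives cpu 0, n = 7 gives cpu 14, n = 8 gives cpu 1 and n = 15 gives cpu 15.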
  2. 31 May, 2014 10 commits