1. 20 Jan, 2017 17 commits
    • Merge branch 'tipc-multicast-through-replication' · 8d00e202
      David S. Miller authored
      Jon Maloy says:
      
      ====================
      tipc: emulate multicast through replication
      
      TIPC multicast messages are currently distributed via L2 broadcast
      or IP multicast to all nodes in the cluster, irrespective of the
      number of real destinations of the message.
      
      In this series we introduce an option to transport messages via
      replication ("replicast") across a selected number of unicast links,
      instead of relying on the underlying media. This option is used when
      true broadcast/multicast is not supported by the media, or when the
      number of true destinations is much smaller than the cluster size.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: make replicast a user selectable option · 01fd12bb
      Jon Paul Maloy authored
      If the bearer carrying multicast messages supports broadcast, those
      messages will be sent to all cluster nodes, irrespective of whether
      these nodes host any actual destination sockets or not. This is clearly
      wasteful if the cluster is large and there are only a few real
      destinations for the message being sent.
      
      In this commit we extend the eligibility of the newly introduced
      "replicast" transmit option. A user can now select which method to
      use, either as a mandatory setting
      via setsockopt(), or as a relative setting where we let the broadcast
      layer decide which method to use based on the ratio between cluster
      size and the message's actual number of destination nodes.
      
      In the latter case, a sending socket must stick to a previously
      selected method until it enters an idle period of at least 5 seconds.
      This eliminates the risk of message reordering caused by method change,
      i.e., when changes to cluster size or number of destinations would
      otherwise mandate a new method to be used.
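
      The selection policy described above can be sketched roughly as
      follows. This is an illustrative userspace model, not the kernel
      code: the names (mc_method, mc_select), the ratio threshold of 10
      and the millisecond clock are assumptions made for the example;
      only the 5 second idle period comes from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the per-socket multicast method state. */
enum mc_kind { MC_BROADCAST, MC_REPLICAST };

struct mc_method {
	bool mandatory;       /* user forced 'kind' via a socket option */
	bool used;            /* a method has been selected at least once */
	enum mc_kind kind;    /* method currently in use */
	uint64_t expires_ms;  /* 'kind' may not change before this time */
};

#define MC_IDLE_MS 5000u  /* idle period from the commit message */
#define MC_RATIO   10u    /* assumed cluster/destination ratio threshold */

static enum mc_kind mc_select(struct mc_method *m, unsigned int cluster_size,
			      unsigned int dests, bool bcast_support,
			      uint64_t now_ms)
{
	assert(m != NULL);

	/* Without bearer broadcast support, replication is the only option. */
	if (!bcast_support) {
		m->kind = MC_REPLICAST;
		return m->kind;
	}
	/* A mandatory user setting is never overridden. */
	if (m->mandatory)
		return m->kind;
	/* Stick to the previous method until the socket has been idle. */
	if (m->used && now_ms < m->expires_ms) {
		m->expires_ms = now_ms + MC_IDLE_MS;
		return m->kind;
	}
	/* Few destinations relative to the cluster size: replicate. */
	m->kind = (dests * MC_RATIO <= cluster_size) ? MC_REPLICAST
						     : MC_BROADCAST;
	m->used = true;
	m->expires_ms = now_ms + MC_IDLE_MS;
	return m->kind;
}
```

      In this model every send refreshes the expiry time, so the method
      can only change after a gap of at least MC_IDLE_MS between two
      messages, which is what prevents reordering across a method change.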
      Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: introduce replicast as transport option for multicast · a853e4c6
      Jon Paul Maloy authored
      TIPC multicast messages are currently carried over a reliable
      'broadcast link', making use of the underlying media's ability to
      transport packets as L2 broadcast or IP multicast to all nodes in
      the cluster.
      
      When the used bearer is lacking that ability, we can instead emulate
      the broadcast service by replicating and sending the packets over as
      many unicast links as needed to reach all identified destinations.
      We now introduce a new TIPC link-level 'replicast' service that does
      this.
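
      The idea of emulating broadcast by replication can be illustrated
      with a small userspace sketch. Everything here is hypothetical
      (dest_list, dest_add, replicast_send and the unicast callback are
      names invented for the example); it only shows the shape of the
      technique: collect each destination node once, then send one copy
      of the packet per node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DESTS 8  /* arbitrary bound for the example */

/* Hypothetical list of nodes hosting at least one destination. */
struct dest_list {
	uint32_t node[MAX_DESTS];
	size_t count;
};

/* Add a node unless already present: 1 if added, 0 if duplicate, -1 if full. */
static int dest_add(struct dest_list *l, uint32_t node)
{
	assert(l != NULL);
	for (size_t i = 0; i < l->count; i++)
		if (l->node[i] == node)
			return 0;
	if (l->count == MAX_DESTS)
		return -1;
	l->node[l->count++] = node;
	return 1;
}

/* Stand-in for sending one packet over the unicast link to 'node'. */
typedef int (*unicast_fn)(uint32_t node, const void *pkt, size_t len,
			  void *ctx);

/* Send one copy per destination node; returns the number of copies sent. */
static size_t replicast_send(const struct dest_list *l, const void *pkt,
			     size_t len, unicast_fn xmit, void *ctx)
{
	size_t sent = 0;

	assert(l != NULL && xmit != NULL);
	for (size_t i = 0; i < l->count; i++)
		if (xmit(l->node[i], pkt, len, ctx) == 0)
			sent++;
	return sent;
}

/* Example callback: counts transmissions instead of sending anything. */
static int count_xmit(uint32_t node, const void *pkt, size_t len, void *ctx)
{
	(void)node; (void)pkt; (void)len;
	(*(size_t *)ctx)++;
	return 0;
}
```

      With three sockets spread over two nodes, dest_add() leaves two
      entries in the list and replicast_send() transmits two copies,
      regardless of how many nodes the cluster contains.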
      Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: add functionality to lookup multicast destination nodes · 2ae0b8af
      Jon Paul Maloy authored
      As a further preparation for the upcoming 'replicast' functionality,
      we add some necessary structs and functions for looking up and returning
      a list of all nodes that host destinations for a given multicast message.
      Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: add function for checking broadcast support in bearer · 9999974a
      Jon Paul Maloy authored
      As a preparation for the 'replicast' functionality we are going to
      introduce in the next commits, we need the broadcast base structure to
      store whether bearer broadcast is available at all from the currently
      used bearer or bearers.
      
      We do this by adding a new function, tipc_bearer_bcast_support(), to
      the bearer layer, and letting the bearer selection function in
      bcast.c use it to give a new boolean field, 'bcast_support', the
      appropriate value.
      Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: add bpf_probe_read_str helper · a5e8c070
      Gianluca Borello authored
      Provide a simple helper with the same semantics as strncpy_from_unsafe():
      
      int bpf_probe_read_str(void *dst, int size, const void *unsafe_addr)
      
      This gives more flexibility to a bpf program. A typical use case is
      intercepting a file name during sys_open(). The current approach is:
      
      SEC("kprobe/sys_open")
      void bpf_sys_open(struct pt_regs *ctx)
      {
      	char buf[PATHLEN]; // PATHLEN is defined to 256
      	bpf_probe_read(buf, sizeof(buf), (void *)ctx->di);
      
      	/* consume buf */
      }
      
      This is suboptimal because the size of the string must be estimated
      at compile time, so more memory is often copied than necessary. It
      becomes more problematic if buf is processed further, for example by
      pushing it to userspace via bpf_perf_event_output(): the real length
      of the string is unknown, so the entire buffer must be copied (and
      defining an unrolled strnlen() inside the bpf program is very
      inefficient and infeasible).
      
      With the new helper, the code can easily operate on the actual string
      length rather than the buffer size:
      
      SEC("kprobe/sys_open")
      void bpf_sys_open(struct pt_regs *ctx)
      {
      	char buf[PATHLEN]; // PATHLEN is defined to 256
      	int res = bpf_probe_read_str(buf, sizeof(buf), (void *)ctx->di);
      
      	/* consume buf, for example push it to userspace via
      	 * bpf_perf_event_output(), but this time we can use
      	 * res (the string length) as event size, after checking
      	 * its boundaries.
      	 */
      }
      
      Another use case is parsing individual process arguments or
      environment variables by walking the memory areas that start at
      current->mm->arg_start and current->mm->env_start: using this helper
      and its return value, one can quickly advance to the offset of the
      next string in the area.
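
      The iteration over a packed argument area can be modelled in plain
      userspace C. In this sketch read_str() is a hypothetical stand-in
      for bpf_probe_read_str(): it is assumed to copy at most size - 1
      bytes, always NUL-terminate, and return the copied length including
      the trailing NUL. count_args() and the 64-byte buffer are likewise
      illustrative, and each string is assumed to fit in the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the helper: returns length including the NUL. */
static int read_str(char *dst, int size, const char *src)
{
	int i;

	assert(dst != NULL && src != NULL);
	if (size <= 0)
		return -1;
	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';
	return i + 1;
}

/*
 * Walk a region of back-to-back NUL-terminated strings, as found between
 * arg_start and arg_end, advancing by the helper's return value.
 * Assumes every string (plus its NUL) fits in buf.
 */
static int count_args(const char *start, const char *end, int *total_bytes)
{
	char buf[64];
	const char *p = start;
	int n = 0;

	*total_bytes = 0;
	while (p < end) {
		int res = read_str(buf, (int)sizeof(buf), p);

		if (res <= 0)
			break;
		p += res;             /* jump straight to the next string */
		*total_bytes += res;  /* bytes actually worth exporting */
		n++;
	}
	return n;
}
```

      Because the return value includes the terminating NUL, adding it to
      the current pointer lands exactly on the first byte of the next
      argument, so no separate strlen-style pass is needed.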
      
      The code changes simply leverage the existing strncpy_from_unsafe()
      kernel function, which is safe to call from a bpf program as it is
      already used in bpf_trace_printk().
      Signed-off-by: Gianluca Borello <g.borello@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bus-agnostic-num-vf' · 07604628
      David S. Miller authored
      Phil Sutter says:
      
      ====================
      Retrieve number of VFs in a bus-agnostic way
      
      Previously, it was assumed that only PCI NICs would be capable of having
      virtual functions - with my proposed enhancement of dummy NIC driver
      implementing (fake) ones for testing purposes, this is no longer true.
      
      Discussion of said patch has led to the suggestion of implementing a
      bus-agnostic method for VF count retrieval so rtnetlink could work with
      both real VF-capable PCI NICs as well as my dummy modifications without
      introducing ugly hacks.
      
      The following series tries to achieve just that by introducing a bus
      type callback to retrieve a device's number of VFs, implementing this
      callback for PCI bus and finally adjusting rtnetlink to make use of the
      generalized infrastructure.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • device: Implement a bus agnostic dev_num_vf routine · 9af15c38
      Phil Sutter authored
      Now that pci_bus_type has num_vf callback set, dev_num_vf can be
      implemented in a bus type independent way and the check for whether a
      PCI device is being handled in rtnetlink can be dropped.
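
      The dispatch pattern described here can be sketched in userspace as
      follows. The struct layouts are deliberately simplified models, not
      the kernel definitions: the vf_count field, pci_like_bus, plain_bus
      and pci_num_vf_cb are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct device;

/* Simplified model of a bus type with an optional num_vf callback. */
struct bus_type {
	const char *name;
	int (*num_vf)(struct device *dev);
};

/* Simplified device; vf_count stands in for bus-specific state. */
struct device {
	const struct bus_type *bus;
	int vf_count;
};

/* Example callback as a PCI-like bus might provide it. */
static int pci_num_vf_cb(struct device *dev)
{
	return dev->vf_count;
}

static const struct bus_type pci_like_bus = {
	.name = "pci-like",
	.num_vf = pci_num_vf_cb,
};

/* A bus type that does not implement the callback. */
static const struct bus_type plain_bus = {
	.name = "plain",
	.num_vf = NULL,
};

/* Bus-agnostic lookup: ask the bus if it can answer, else report 0. */
static int dev_num_vf(struct device *dev)
{
	assert(dev != NULL);
	if (dev->bus && dev->bus->num_vf)
		return dev->bus->num_vf(dev);
	return 0;
}
```

      A caller such as rtnetlink then needs no knowledge of the bus type:
      devices on buses without the callback, or without a bus at all,
      simply report zero virtual functions.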
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • PCI: implement num_vf bus type callback · 02e0bea6
      Phil Sutter authored
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • device: bus_type: Introduce num_vf callback · 582a686f
      Phil Sutter authored
      This allows for bus types to implement their own method of retrieving
      the number of virtual functions a NIC on that type of bus supports.
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sock: use hlist_entry_safe · 6c59ebd3
      Geliang Tang authored
      Use hlist_entry_safe() instead of open-coding it.
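
      For readers unfamiliar with the pattern, the difference between
      the open-coded form and the helper can be shown with a simplified
      userspace model. The macros below are stand-ins written for this
      example, not copies of the kernel headers (this simplified
      hlist_entry_safe evaluates its pointer argument twice, so it
      should only be given side-effect-free expressions), and struct
      item is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node {
	struct hlist_node *next;
};

/* Simplified stand-ins for the kernel macros (assumptions, see lead-in). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define hlist_entry_safe(ptr, type, member) \
	((ptr) ? container_of(ptr, type, member) : NULL)

/* Hypothetical structure embedding a list node. */
struct item {
	int id;
	struct hlist_node node;
};

/* Open-coded: NULL check and container lookup written out by hand. */
static int first_id_open_coded(struct hlist_node *first)
{
	return first ? container_of(first, struct item, node)->id : -1;
}

/* Same logic using the NULL-safe helper. */
static int first_id(struct hlist_node *first)
{
	struct item *it = hlist_entry_safe(first, struct item, node);

	/* Sanity check: the recovered entry embeds the node we started from. */
	assert(it == NULL || &it->node == first);
	return it ? it->id : -1;
}
```

      Both functions return the same result for an empty and a non-empty
      list head; the helper just keeps the NULL handling in one place.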
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gre6: Clean up unused struct ipv6_tel_txoption definition · c10aa71b
      Jakub Sitnicki authored
      Commit b05229f4 ("gre6: Cleanup GREv6 transmit path, call common GRE
      functions") removed the ip6gre specific transmit function, but left the
      struct ipv6_tel_txoption definition. Clean it up.
      Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: remove bh disabling around percpu_counter accesses · c2a2efbb
      Eric Dumazet authored
      Shaohua Li made percpu_counter irq safe in commit 098faf58
      ("percpu_counter: make APIs irq safe").
      
      We can safely remove BH disable/enable sections around various
      percpu_counter manipulations.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: hide unused warnings · 0a327889
      Arnd Bergmann authored
      The two new variables are only used inside of an #ifdef and cause
      harmless warnings when that is disabled:
      
      drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c: In function 'init_one':
      drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:4646:9: error: unused variable 'port_vec' [-Werror=unused-variable]
      drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:4646:6: error: unused variable 'v' [-Werror=unused-variable]
      
      This adds another #ifdef around the declarations.
      
      Fixes: 96fe11f2 ("cxgb4: Implement ndo_get_phys_port_id for mgmt dev")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv6: Keep nexthop of multipath route on admin down · a1a22c12
      David Ahern authored
      IPv6 deletes route entries associated with multipath routes on
      admin down, whereas IPv4 does not. For example:
          $ ip ro ls vrf red
          unreachable default metric 8192
          1.1.1.0/24 metric 64
                  nexthop via 10.100.1.254  dev eth1 weight 1
                  nexthop via 10.100.2.254  dev eth2 weight 1
          10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.4
          10.100.2.0/24 dev eth2 proto kernel scope link src 10.100.2.4
      
          $ ip -6 ro ls vrf red
          2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
          2001:db8:2:: dev red proto none metric 0  pref medium
          2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
          2001:db8:11::/120 via 2001:db8:1::16 dev eth1 metric 1024  pref medium
          2001:db8:11::/120 via 2001:db8:2::17 dev eth2 metric 1024  pref medium
          ...
      
      Set link down:
          $ ip li set eth1 down
      
      IPv4 retains the multipath route but flags the eth1 nexthop as dead:
      
          $ ip ro ls vrf red
          unreachable default metric 8192
          1.1.1.0/24
                  nexthop via 10.100.1.16  dev eth1 weight 1 dead linkdown
                  nexthop via 10.100.2.16  dev eth2 weight 1
          10.100.2.0/24 dev eth2 proto kernel scope link src 10.100.2.4
      
      and IPv6 deletes the route as part of flushing all routes for the device:
      
          $ ip -6 ro ls vrf red
          2001:db8:2:: dev red proto none metric 0  pref medium
          2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
          2001:db8:11::/120 via 2001:db8:2::17 dev eth2 metric 1024  pref medium
          ...
      
      Worse, on admin up of the device the multipath route has to be deleted
      to get this leg of the route re-added.
      
      If ignore_routes_with_linkdown is set, this patch keeps routes that
      are part of a multipath route and marks them with the dead and
      linkdown flags, making IPv6 behave consistently with IPv4:
      
          $ ip -6 ro ls vrf red
          2001:db8:2:: dev red proto none metric 0  pref medium
          2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
          2001:db8:11::/120 via 2001:db8:1::16 dev eth1 metric 1024 dead linkdown  pref medium
          2001:db8:11::/120 via 2001:db8:2::17 dev eth2 metric 1024  pref medium
          ...
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlx4: support __GFP_MEMALLOC for rx · dceeab0e
      Eric Dumazet authored
      Commit 04aeb56a ("net/mlx4_en: allocate non 0-order pages for RX
      ring with __GFP_NOMEMALLOC") added code that appears not to have been
      needed at the time, since mlx4 never used __GFP_MEMALLOC allocations
      anyway.
      
      As using memory reserves is a must in some situations (swap over NFS or
      iSCSI), this patch adds this flag.
      
      Note that this driver does not reuse pages (yet) so we do not have to
      add anything else.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "net: qcom/emac: configure the external phy to allow pause frames" · 8a43c052
      Timur Tabi authored
      This reverts commit 3e884493.
      
      With commit 529ed127 ("net: phy: phy drivers should not set
      SUPPORTED_[Asym_]Pause"), phylib now handles automatically enabling
      pause frame support in the PHY, and the MAC driver should follow suit.
      
      Since the EMAC driver does this, we no longer need to force
      pause frame support.
      Signed-off-by: Timur Tabi <timur@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 19 Jan, 2017 4 commits
  3. 18 Jan, 2017 19 commits