1. 19 Mar, 2015 16 commits
  2. 18 Mar, 2015 24 commits
    • David S. Miller's avatar
      Merge branch 'txq_max_rate' · 8f6320de
      David S. Miller authored
      Or Gerlitz says:
      
      ====================
      Add max rate TXQ attribute
      
      Add the ability to set a max-rate limitation for TX queues.
      The attribute name is maxrate and the units are Mbps, to make
      it similar to the existing max-rate limitation knobs (ETS and
      SRIOV ndo calls).
      
      changes from V2:
        - added Documentation (thanks Florian and Tom)
        - rebased to latest net-next to comply with the swdev ndo removal
        - addressed more feedback from Dave on the comments style
      
      changes from V1:
        - addressed feedback from Dave
      
      changes from V0:
        - addressed feedback from Sergei
      
      John Fastabend (1):
        net: Add max rate tx queue attribute
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Or Gerlitz's avatar
      net/mlx4_en: Add tx queue maxrate support · c10e4fc6
      Or Gerlitz authored
      Add ndo_set_tx_maxrate support.
      
      To support per tx queue maxrate limit, we use the update-qp firmware
      command to do run-time rate setting for the qp that serves this tx ring.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Ido Shamay <idos@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Or Gerlitz's avatar
      net/mlx4_core: Add basic support for QP max-rate limiting · fc31e256
      Or Gerlitz authored
      Add the low-level device commands and definitions used for QP max-rate limiting.
      
      This is done through the following elements:
      
        - read rate-limit device caps in QUERY_DEV_CAP: number of different
          rates and the min/max rates in Kbps/Mbps/Gbps units
      
        - enhance the QP context struct to contain rate limit units and value
      
        - allow run-time rate-limit setting on QPs through the
          update-qp firmware command
      
        - QP rate-limiting is disallowed for VFs
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • John Fastabend's avatar
      net: Add max rate tx queue attribute · 822b3b2e
      John Fastabend authored
      This adds a tx_maxrate attribute to the tx queue sysfs entry allowing
      for max-rate limiting. Along with DCB-ETS and BQL this provides another
      knob to tune queue performance. The limit units are Mbps.
      
      By default it is disabled. To disable the rate limitation after it
      has been set for a queue, it should be set to zero.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
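A minimal user-space sketch of how the new attribute might be driven. It assumes the attribute is exposed as `/sys/class/net/<dev>/queues/tx-<n>/tx_maxrate`; the helper names, the `root` parameter and the device name `eth0` are hypothetical and exist only for illustration. Writing to the real sysfs file requires root privileges and a driver that implements `ndo_set_tx_maxrate`.

```python
from pathlib import Path

# Assumed location of per-queue attributes in sysfs.
SYSFS_NET = Path("/sys/class/net")


def tx_maxrate_path(dev: str, queue: int, root: Path = SYSFS_NET) -> Path:
    """Return the file that holds the max rate of one TX queue."""
    return root / dev / "queues" / f"tx-{queue}" / "tx_maxrate"


def set_tx_maxrate(dev: str, queue: int, mbps: int, root: Path = SYSFS_NET) -> None:
    """Write a rate limit in Mbps; writing 0 disables the limit again."""
    if mbps < 0:
        raise ValueError("rate must be >= 0 Mbps")
    tx_maxrate_path(dev, queue, root).write_text(f"{mbps}\n")
```

For example, `set_tx_maxrate("eth0", 0, 1000)` would cap queue 0 at 1000 Mbps, and `set_tx_maxrate("eth0", 0, 0)` would remove the cap.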
    • David S. Miller's avatar
      Merge branch 'rhashtable_remove_shift' · b65885d2
      David S. Miller authored
      Herbert Xu says:
      
      ====================
      rhashtable: Kill redundant shift parameter
      
      I was trying to squeeze bucket_table->rehash in by downsizing
      bucket_table->size, only to find that my spot had been taken
      over by bucket_table->shift.  These patches kill shift and make
      me feel better :)
      
      v2 corrects the typo in the test_rhashtable changelog and also
      notes the min_shift parameter in the tipc patch changelog.
      ====================
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Herbert Xu's avatar
      rhashtable: Remove max_shift and min_shift · e2e21c1c
      Herbert Xu authored
      Now that nobody uses max_shift and min_shift, we can safely remove
      them.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Herbert Xu's avatar
      test_rhashtable: Use rhashtable max_size instead of max_shift · 4f509df4
      Herbert Xu authored
      This patch converts test_rhashtable to use rhashtable max_size
      instead of the obsolete max_shift.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Herbert Xu's avatar
      tipc: Use rhashtable max/min_size instead of max/min_shift · 446c89ac
      Herbert Xu authored
      This patch converts tipc to use rhashtable max/min_size instead of
      the obsolete max/min_shift.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Herbert Xu's avatar
      netlink: Use rhashtable max_size instead of max_shift · b06eee59
      Herbert Xu authored
      This patch converts netlink to use rhashtable max_size instead
      of the obsolete max_shift.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Herbert Xu's avatar
      rhashtable: Introduce max_size/min_size · c2e213cf
      Herbert Xu authored
      This patch adds the parameters max_size and min_size which are
      meant to replace max_shift and min_shift.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Herbert Xu's avatar
      rhashtable: Remove shift from bucket_table · 6aebd940
      Herbert Xu authored
      Keeping both size and shift is silly.  We only need one.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
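The redundancy between size and shift can be sketched as follows, assuming the bucket count is always a power of two so that size = 1 << shift. The function names are hypothetical and this is not the kernel code; it only shows that either value can be recomputed from the other, which is why storing both is unnecessary.

```python
def size_from_shift(shift: int) -> int:
    """Bucket count for a given shift (size = 2**shift)."""
    if shift < 0:
        raise ValueError("shift must be non-negative")
    return 1 << shift


def shift_from_size(size: int) -> int:
    """Recover log2(size) for a power-of-two bucket count."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a positive power of two")
    return size.bit_length() - 1
```

Under the same assumption, a caller that used to pass `max_shift = n` would pass `max_size = 1 << n` instead.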
    • David S. Miller's avatar
      Merge branch 'xgene-next' · a61bfa65
      David S. Miller authored
      Keyur Chudgar says:
      
      ====================
      drivers: net: xgene: Add second SGMII based 1G interface
      
      This patch adds support for second SGMII based 1G interface.
      ====================
      Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
      Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Keyur Chudgar's avatar
      drivers: net: xgene: Add second SGMII based 1G interface · ca626454
      Keyur Chudgar authored
      - Added resource initialization based on port-id field
      - Enabled second SGMII 1G interface
      Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
      Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Keyur Chudgar's avatar
      dtb: xgene: Add second SGMII based 1G interface node · 2d33394e
      Keyur Chudgar authored
      - Added new SGMII node for port 1
      - Added port-id field
      Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
      Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller's avatar
      Merge branch 'tipc_netns_leak' · e7a9eee5
      David S. Miller authored
      Ying Xue says:
      
      ====================
      tipc: fix netns refcnt leak
      
      This series aims to eliminate the netns refcount leak. While
      fixing it, two additional problems were found, so all known
      issues associated with the netns refcnt leak are resolved
      together in this patchset.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Ying Xue's avatar
      tipc: withdraw tipc topology server name when namespace is deleted · 2b9bb7f3
      Ying Xue authored
      The TIPC topology server is a per-namespace service associated with the
      tipc name {1, 1}. When a namespace is deleted, that name must be withdrawn
      before we call sk_release_kernel, because the kernel socket release is
      done in init_net, and trying to withdraw a TIPC name published in another
      namespace fails with an error such as:
      
      [  170.093264] Unable to remove local publication
      [  170.093264] (type=1, lower=1, ref=2184244004, key=2184244005)
      
      We fix this by breaking the association between the topology server name
      and socket before calling sk_release_kernel.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Ying Xue's avatar
      tipc: fix a potential deadlock when nametable is purged · 8460504b
      Ying Xue authored
      [   28.531768] =============================================
      [   28.532322] [ INFO: possible recursive locking detected ]
      [   28.532322] 3.19.0+ #194 Not tainted
      [   28.532322] ---------------------------------------------
      [   28.532322] insmod/583 is trying to acquire lock:
      [   28.532322]  (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000d219>] tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]
      [   28.532322] but task is already holding lock:
      [   28.532322]  (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000e0dc>] tipc_nametbl_stop+0xfc/0x1f0 [tipc]
      [   28.532322]
      [   28.532322] other info that might help us debug this:
      [   28.532322]  Possible unsafe locking scenario:
      [   28.532322]
      [   28.532322]        CPU0
      [   28.532322]        ----
      [   28.532322]   lock(&(&nseq->lock)->rlock);
      [   28.532322]   lock(&(&nseq->lock)->rlock);
      [   28.532322]
      [   28.532322]  *** DEADLOCK ***
      [   28.532322]
      [   28.532322]  May be due to missing lock nesting notation
      [   28.532322]
      [   28.532322] 3 locks held by insmod/583:
      [   28.532322]  #0:  (net_mutex){+.+.+.}, at: [<ffffffff8163e30f>] register_pernet_subsys+0x1f/0x50
      [   28.532322]  #1:  (&(&tn->nametbl_lock)->rlock){+.....}, at: [<ffffffffa000e091>] tipc_nametbl_stop+0xb1/0x1f0 [tipc]
      [   28.532322]  #2:  (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000e0dc>] tipc_nametbl_stop+0xfc/0x1f0 [tipc]
      [   28.532322]
      [   28.532322] stack backtrace:
      [   28.532322] CPU: 1 PID: 583 Comm: insmod Not tainted 3.19.0+ #194
      [   28.532322] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [   28.532322]  ffffffff82394460 ffff8800144cb928 ffffffff81792f3e 0000000000000007
      [   28.532322]  ffffffff82394460 ffff8800144cba28 ffffffff810a8080 ffff8800144cb998
      [   28.532322]  ffffffff810a4df3 ffff880013e9cb10 ffffffff82b0d330 ffff880013e9cb38
      [   28.532322] Call Trace:
      [   28.532322]  [<ffffffff81792f3e>] dump_stack+0x4c/0x65
      [   28.532322]  [<ffffffff810a8080>] __lock_acquire+0x740/0x1ca0
      [   28.532322]  [<ffffffff810a4df3>] ? __bfs+0x23/0x270
      [   28.532322]  [<ffffffff810a7506>] ? check_irq_usage+0x96/0xe0
      [   28.532322]  [<ffffffff810a8a73>] ? __lock_acquire+0x1133/0x1ca0
      [   28.532322]  [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]  [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
      [   28.532322]  [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]  [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
      [   28.532322]  [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]  [<ffffffffa000d219>] tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]  [<ffffffffa000e11e>] tipc_nametbl_stop+0x13e/0x1f0 [tipc]
      [   28.532322]  [<ffffffffa000dfe5>] ? tipc_nametbl_stop+0x5/0x1f0 [tipc]
      [   28.532322]  [<ffffffffa0004bab>] tipc_init_net+0x13b/0x150 [tipc]
      [   28.532322]  [<ffffffffa0004a75>] ? tipc_init_net+0x5/0x150 [tipc]
      [   28.532322]  [<ffffffff8163dece>] ops_init+0x4e/0x150
      [   28.532322]  [<ffffffff810aa66d>] ? trace_hardirqs_on+0xd/0x10
      [   28.532322]  [<ffffffff8163e1d3>] register_pernet_operations+0xf3/0x190
      [   28.532322]  [<ffffffff8163e31e>] register_pernet_subsys+0x2e/0x50
      [   28.532322]  [<ffffffffa002406a>] tipc_init+0x6a/0x1000 [tipc]
      [   28.532322]  [<ffffffffa0024000>] ? 0xffffffffa0024000
      [   28.532322]  [<ffffffff810002d9>] do_one_initcall+0x89/0x1c0
      [   28.532322]  [<ffffffff811b7cb0>] ? kmem_cache_alloc_trace+0x50/0x1b0
      [   28.532322]  [<ffffffff810e725b>] ? do_init_module+0x2b/0x200
      [   28.532322]  [<ffffffff810e7294>] do_init_module+0x64/0x200
      [   28.532322]  [<ffffffff810e9353>] load_module+0x12f3/0x18e0
      [   28.532322]  [<ffffffff810e5890>] ? show_initstate+0x50/0x50
      [   28.532322]  [<ffffffff810e9a19>] SyS_init_module+0xd9/0x110
      [   28.532322]  [<ffffffff8179f3b3>] sysenter_dispatch+0x7/0x1f
      
      Before tipc_purge_publications() calls tipc_nametbl_remove_publ() to
      remove a publication from a name sequence, the name sequence's lock
      is already held. However, when tipc_nametbl_remove_publ() calls
      tipc_nameseq_remove_publ() to remove the publication, it first looks
      up the name sequence instance for the publication and then takes
      the lock of the name sequence it found. Because that lock may already
      be held by tipc_purge_publications(), a deadlock occurs, as the trace
      above demonstrates. Since tipc_nameseq_remove_publ() does not take the
      name sequence's lock, the deadlock can be avoided by having
      tipc_purge_publications() invoke it directly.
      
      Fixes: 97ede29e ("tipc: convert name table read-write lock to RCU")
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
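The locking pattern described in this commit can be modelled in user space. The sketch below is not the TIPC code: the class and function names are simplified stand-ins, and `threading.Lock` (which is non-reentrant) plays the role of the name sequence spinlock. A non-blocking acquire is used so the buggy path reports the recursion instead of hanging.

```python
import threading


class NameSeq:
    """Stand-in for a name sequence protected by its own lock."""

    def __init__(self):
        self.lock = threading.Lock()  # non-reentrant, like a spinlock
        self.publications = {"pub-a", "pub-b"}


def nameseq_remove_publ(seq, publ):
    """Unlocked helper: the caller must already hold seq.lock."""
    seq.publications.discard(publ)


def nametbl_remove_publ(seq, publ):
    """Locking wrapper: takes seq.lock itself before removing."""
    # A real spinlock would spin forever if the caller already held it.
    if not seq.lock.acquire(blocking=False):
        raise RuntimeError("recursive locking on seq.lock")
    try:
        nameseq_remove_publ(seq, publ)
    finally:
        seq.lock.release()


def purge_buggy(seq):
    with seq.lock:
        for publ in list(seq.publications):
            nametbl_remove_publ(seq, publ)  # tries to take seq.lock again


def purge_fixed(seq):
    with seq.lock:
        for publ in list(seq.publications):
            nameseq_remove_publ(seq, publ)  # lock already held, no re-acquire
```

Calling `purge_buggy(NameSeq())` raises `RuntimeError`, while `purge_fixed(NameSeq())` removes every publication under a single lock acquisition.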
    • Ying Xue's avatar
      tipc: fix netns refcnt leak · 76100a8a
      Ying Xue authored
      When the TIPC module is loaded, we launch a topology server in kernel
      space, which in turn creates TIPC sockets for communication
      with topology server users. Because both the socket's creator and
      provider reside in the same module, it is necessary that the TIPC
      module's reference count remains zero after the server is started and
      the socket created; otherwise it becomes impossible to perform "rmmod"
      even on an idle module.
      
      Currently, we achieve this by defining a separate "tipc_proto_kern"
      protocol struct, that is used only for kernel space socket allocations.
      This structure has the "owner" field set to NULL, which prevents the
      module reference count from being bumped when sk_alloc() for local
      sockets is called. Furthermore, we have defined three kernel-specific
      functions, tipc_sock_create_local(), tipc_sock_release_local() and
      tipc_sock_accept_local(), to avoid the module counter being modified
      when module local sockets are created or deleted. This has worked well
      until we introduced name space support.
      
      However, after name space support was introduced, we have observed that
      a reference count leak occurs, because the netns counter is not
      decremented in tipc_sock_delete_local().
      
      This commit remedies this problem. But instead of just modifying
      tipc_sock_delete_local(), we eliminate the whole parallel socket
      handling infrastructure, and start using the regular sk_create_kern(),
      kernel_accept() and sk_release_kernel() calls. Since those functions
      manipulate the module counter, we must now compensate for that by
      explicitly decrementing the counter after module local sockets are
      created, and increment it just before calling sk_release_kernel().
      
      Fixes: a62fbcce ("tipc: make subscriber server support net namespace")
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericson.com>
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reported-by: Cong Wang <cwang@twopensource.com>
      Tested-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
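The compensation scheme described above can be illustrated with a toy counter model. Nothing here is kernel code: `Module`, `generic_create` and `generic_release` are hypothetical stand-ins for the module object and for the regular socket create and release calls, which are assumed to take and drop one module reference each.

```python
class Module:
    """Toy module object carrying only a reference count."""

    def __init__(self):
        self.refcnt = 0


def generic_create(mod):
    """Stand-in for the regular create path: takes a module reference."""
    mod.refcnt += 1
    return object()


def generic_release(mod, sock):
    """Stand-in for the regular release path: drops a module reference."""
    mod.refcnt -= 1


def create_local_socket(mod):
    sock = generic_create(mod)
    mod.refcnt -= 1  # compensate, so an idle module stays at zero
    return sock


def release_local_socket(mod, sock):
    mod.refcnt += 1  # re-take the reference that release will drop
    generic_release(mod, sock)
```

In this model the count is zero both while the module-local socket exists and after it is released, which is the property that keeps "rmmod" possible on an idle module.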
    • David S. Miller's avatar
      Merge branch 'listener_refactor_part_12' · 52841430
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      inet: tcp listener refactoring, part 12
      
      By adding a pointer back to the listener, we are preparing SYNACK
      retransmit handling to no longer be governed by the listener keepalive
      timer, as this is the most problematic source of contention on the
      listener spinlock. Note that TCP FastOpen had such a pointer anyway,
      so we make it generic.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Eric Dumazet's avatar
      inet: fix request sock refcounting · 0470c8ca
      Eric Dumazet authored
      While testing the last patch series, I found req sock refcounting was wrong.
      
      We must set skc_refcnt to 1 for all request socks added in hashes,
      but also on request sockets created by FastOpen or syncookies.
      
      It is tricky because we need to defer this initialization so that
      future RCU lookups do not try to take a refcount on a not yet
      fully initialized request socket.
      
      Also get rid of ireq_refcnt alias.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Fixes: 13854e5a ("inet: add proper refcounting to request sock")
      Signed-off-by: David S. Miller <davem@davemloft.net>
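Why the refcount must be set last can be shown with a small model. It assumes that lockless lookups take their reference with an increment-if-not-zero operation, so an object whose count is still zero is invisible to them; that assumption, and every name below, is illustrative rather than taken from the patch.

```python
class RequestSock:
    """Toy request socket: refcnt stays 0 until setup is complete."""

    def __init__(self):
        self.refcnt = 0
        self.fields_ready = False


def try_get(req) -> bool:
    """Model of an increment-if-not-zero reference grab by a lookup."""
    if req.refcnt == 0:
        return False  # not published yet, the lookup must skip it
    req.refcnt += 1
    return True


def finish_init(req):
    """Fill in the object first, then publish it with refcnt = 1."""
    req.fields_ready = True
    req.refcnt = 1
```

A lookup that runs before `finish_init()` fails to take a reference, so it can never observe a half-initialized request socket.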
    • Eric Dumazet's avatar
      inet: avoid fastopen lock for regular accept() · e3d95ad7
      Eric Dumazet authored
      Just because a TCP listener is FastOpen ready does not mean that
      all incoming sockets actually used FastOpen.
      
      Avoid taking queue->fastopenq->lock if not needed.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
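The idea of skipping the lock on the common path can be sketched like this. The classes, the `used_fastopen` flag and the counter are hypothetical; the sketch only shows the shape of the check, with the lock taken solely for connections that actually went through FastOpen.

```python
import threading


class FastOpenQueue:
    """Toy FastOpen queue with a lock that counts its acquisitions."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0
        self.qlen = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, *exc):
        self._lock.release()
        return False


def accept_one(fastopenq, used_fastopen: bool) -> None:
    """Touch the FastOpen lock only for connections that used FastOpen."""
    if not used_fastopen:
        return  # regular connection: no FastOpen bookkeeping needed
    with fastopenq:
        fastopenq.qlen -= 1
```

With this shape, accepting a regular connection on a FastOpen-ready listener leaves `acquisitions` unchanged.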
    • Eric Dumazet's avatar
      tcp: rename struct tcp_request_sock listener · 9439ce00
      Eric Dumazet authored
      The listener field in struct tcp_request_sock is a pointer
      back to the listener. We now have req->rsk_listener, so TCP
      only needs one boolean and not a full pointer.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Eric Dumazet's avatar
      inet: add rsk_listener field to struct request_sock · 4e9a578e
      Eric Dumazet authored
      Once we are able to look up request sockets in the ehash table,
      we will need access to the listener that created the request.

      This avoids a lookup to find the listener, which makes
      SO_REUSEPORT more robust, and is needed once we no
      longer queue request socks into a listener-private queue.
      
      Note that 'struct tcp_request_sock'->listener could be reduced
      to a single bit, as TFO listener should match req->rsk_listener.
      TFO will no longer need to hold a reference on the listener.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
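The data-structure change can be pictured with a simplified model. Only `rsk_listener` comes from the commit text; the `tfo_listener` flag name, the `Listener` type and the helper are assumptions made for illustration. The point is that once every request carries a back-pointer, the TCP-specific part needs a boolean rather than a second pointer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Listener:
    port: int


@dataclass
class RequestSock:
    rsk_listener: Listener  # back-pointer to the listener that created us


@dataclass
class TcpRequestSock(RequestSock):
    tfo_listener: bool = False  # hypothetical flag replacing a full pointer


def fastopen_listener(req: TcpRequestSock) -> Optional[Listener]:
    """Return the listener for a FastOpen request, found via the back-pointer."""
    return req.rsk_listener if req.tfo_listener else None
```

No separate lookup is needed to find the listener: it is reached directly from the request.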