1. 11 Mar, 2014 15 commits
    • Eric Dumazet's avatar
      tcp: tcp_release_cb() should release socket ownership · c3f9b018
      Eric Dumazet authored
      Lars Persson reported following deadlock :
      
      -000 |M:0x0:0x802B6AF8(asm) <-- arch_spin_lock
      -001 |tcp_v4_rcv(skb = 0x8BD527A0) <-- sk = 0x8BE6B2A0
      -002 |ip_local_deliver_finish(skb = 0x8BD527A0)
      -003 |__netif_receive_skb_core(skb = 0x8BD527A0, ?)
      -004 |netif_receive_skb(skb = 0x8BD527A0)
      -005 |elk_poll(napi = 0x8C770500, budget = 64)
      -006 |net_rx_action(?)
      -007 |__do_softirq()
      -008 |do_softirq()
      -009 |local_bh_enable()
      -010 |tcp_rcv_established(sk = 0x8BE6B2A0, skb = 0x87D3A9E0, th = 0x814EBE14, ?)
      -011 |tcp_v4_do_rcv(sk = 0x8BE6B2A0, skb = 0x87D3A9E0)
      -012 |tcp_delack_timer_handler(sk = 0x8BE6B2A0)
      -013 |tcp_release_cb(sk = 0x8BE6B2A0)
      -014 |release_sock(sk = 0x8BE6B2A0)
      -015 |tcp_sendmsg(?, sk = 0x8BE6B2A0, ?, ?)
      -016 |sock_sendmsg(sock = 0x8518C4C0, msg = 0x87D8DAA8, size = 4096)
      -017 |kernel_sendmsg(?, ?, ?, ?, size = 4096)
      -018 |smb_send_kvec()
      -019 |smb_send_rqst(server = 0x87C4D400, rqst = 0x87D8DBA0)
      -020 |cifs_call_async()
      -021 |cifs_async_writev(wdata = 0x87FD6580)
      -022 |cifs_writepages(mapping = 0x852096E4, wbc = 0x87D8DC88)
      -023 |__writeback_single_inode(inode = 0x852095D0, wbc = 0x87D8DC88)
      -024 |writeback_sb_inodes(sb = 0x87D6D800, wb = 0x87E4A9C0, work = 0x87D8DD88)
      -025 |__writeback_inodes_wb(wb = 0x87E4A9C0, work = 0x87D8DD88)
      -026 |wb_writeback(wb = 0x87E4A9C0, work = 0x87D8DD88)
      -027 |wb_do_writeback(wb = 0x87E4A9C0, force_wait = 0)
      -028 |bdi_writeback_workfn(work = 0x87E4A9CC)
      -029 |process_one_work(worker = 0x8B045880, work = 0x87E4A9CC)
      -030 |worker_thread(__worker = 0x8B045880)
      -031 |kthread(_create = 0x87CADD90)
      -032 |ret_from_kernel_thread(asm)
      
      Bug occurs because __tcp_checksum_complete_user() enables BH, assuming
      it is running from softirq context.
      
      Lars trace involved a NIC without RX checksum support but other points
      are problematic as well, like the prequeue stuff.
      
      Problem is triggered by a timer, that found socket being owned by user.
      
      tcp_release_cb() should call tcp_write_timer_handler() or
      tcp_delack_timer_handler() in the appropriate context :
      
      BH disabled and socket lock held, but 'owned' field cleared,
      as if they were running from timer handlers.
      
      Fixes: 6f458dfb ("tcp: improve latencies of timer triggered events")
      Reported-by: default avatarLars Persson <lars.persson@axis.com>
      Tested-by: default avatarLars Persson <lars.persson@axis.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c3f9b018
    • David S. Miller's avatar
      Merge branch 'skb_frags' · c7b76f85
      David S. Miller authored
      Michael S. Tsirkin says:
      
      ====================
      skbuff: fix skb_segment with zero copy skbs
      
      This fixes a bug in skb_segment where it moves frags
      between skbs without orphaning them.
      This causes userspace to assume it's safe to
      reuse the buffer, and receiver gets corrupted data.
      This further might leak information from the
      transmitter on the wire.
      
      To fix track which skb does a copied frag belong
      to, and orphan frags when copying them.
      
      As we are tracking multiple skbs here, using
      short names (skb,nskb,fskb,skb_frag,frag) becomes confusing.
      So before adding another one, I refactor these names
      slightly.
      
      Patch is split out to make it easier to
      verify that all trasformations are trivially correct.
      
      The problem was observed in the field,
      so I think that the patch is necessary on stable
      as well.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      c7b76f85
    • Michael S. Tsirkin's avatar
      skbuff: skb_segment: orphan frags before copying · 1fd819ec
      Michael S. Tsirkin authored
      skb_segment copies frags around, so we need
      to copy them carefully to avoid accessing
      user memory after reporting completion to userspace
      through a callback.
      
      skb_segment doesn't normally happen on datapath:
      TSO needs to be disabled - so disabling zero copy
      in this case does not look like a big deal.
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1fd819ec
    • Michael S. Tsirkin's avatar
      skbuff: skb_segment: s/fskb/list_skb/ · 1a4cedaf
      Michael S. Tsirkin authored
      fskb is unrelated to frag: it's coming from
      frag_list. Rename it list_skb to avoid confusion.
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1a4cedaf
    • Michael S. Tsirkin's avatar
      skbuff: skb_segment: s/skb/head_skb/ · df5771ff
      Michael S. Tsirkin authored
      rename local variable to make it easier to tell at a glance that we are
      dealing with a head skb.
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      df5771ff
    • Michael S. Tsirkin's avatar
      skbuff: skb_segment: s/skb_frag/frag/ · 4e1beba1
      Michael S. Tsirkin authored
      skb_frag can in fact point at either skb
      or fskb so rename it generally "frag".
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4e1beba1
    • Michael S. Tsirkin's avatar
      skbuff: skb_segment: s/frag/nskb_frag/ · 8cb19905
      Michael S. Tsirkin authored
      frag points at nskb, so name it appropriately
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8cb19905
    • David S. Miller's avatar
      Merge branch 'stmmac' · 9d79b3c7
      David S. Miller authored
      Giuseppe Cavallaro says:
      
      ====================
      stmmac fixes: EEE and chained mode
      
      These patches are to fix some new problems in the STMMAC driver.
      
      Mandatory changes are for EEE that needs to be disabled if not supported
      and for the chain mode that is broken and the kernel panics if this mode
      is enabled.
      
      v3: removed a patch from my previous set that touched the stmmac_tx path
          that has not to be applied. Other patches for cleaning-up will be
          sent on top of net-next git repo.
      
      v4: do not surround the defaul buffer selection using Koption and adopt
          a default to 1536bytes.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9d79b3c7
    • Giuseppe CAVALLARO's avatar
      stmmac: dwmac-sti: fix broken STiD127 compatibility · c5e9103d
      Giuseppe CAVALLARO authored
      This is to fix the compatibility to the STiD127 SoC.
      Signed-off-by: default avatarGiuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Srinivas Kandagatla <srinivas.kandagatla@st.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c5e9103d
    • Giuseppe CAVALLARO's avatar
      stmmac: fix chained mode · 29896a67
      Giuseppe CAVALLARO authored
      This patch is to fix the chain mode that was broken
      and generated a panic. This patch reviews the chain/ring
      modes now shaing the same structure and taking care
      about the pointers and callbacks.
      Signed-off-by: default avatarGiuseppe Cavallaro <peppe.cavallaro@st.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      29896a67
    • Giuseppe CAVALLARO's avatar
      stmmac: fix and better tune the default buffer sizes · d916701c
      Giuseppe CAVALLARO authored
      This patch is to fix and tune the default buffer sizes.
      It reduces the default bufsize used by the driver from
      4KiB to 1536 bytes.
      
      Patch has been tested on both ARM and SH4 platform based.
      Signed-off-by: default avatarGiuseppe Cavallaro <peppe.cavallaro@st.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d916701c
    • Giuseppe CAVALLARO's avatar
      stmmac: disable at run-time the EEE if not supported · 83bf79b6
      Giuseppe CAVALLARO authored
      This patch is to disable the EEE (so HW and timers)
      for example when the phy communicates that the EEE
      can be supported anymore.
      Signed-off-by: default avatarGiuseppe Cavallaro <peppe.cavallaro@st.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      83bf79b6
    • Neil Horman's avatar
      vmxnet3: fix netpoll race condition · d25f06ea
      Neil Horman authored
      vmxnet3's netpoll driver is incorrectly coded.  It directly calls
      vmxnet3_do_poll, which is the driver internal napi poll routine.  As the netpoll
      controller method doesn't block real napi polls in any way, there is a potential
      for race conditions in which the netpoll controller method and the napi poll
      method run concurrently.  The result is data corruption causing panics such as this
      one recently observed:
      PID: 1371   TASK: ffff88023762caa0  CPU: 1   COMMAND: "rs:main Q:Reg"
       #0 [ffff88023abd5780] machine_kexec at ffffffff81038f3b
       #1 [ffff88023abd57e0] crash_kexec at ffffffff810c5d92
       #2 [ffff88023abd58b0] oops_end at ffffffff8152b570
       #3 [ffff88023abd58e0] die at ffffffff81010e0b
       #4 [ffff88023abd5910] do_trap at ffffffff8152add4
       #5 [ffff88023abd5970] do_invalid_op at ffffffff8100cf95
       #6 [ffff88023abd5a10] invalid_op at ffffffff8100bf9b
          [exception RIP: vmxnet3_rq_rx_complete+1968]
          RIP: ffffffffa00f1e80  RSP: ffff88023abd5ac8  RFLAGS: 00010086
          RAX: 0000000000000000  RBX: ffff88023b5dcee0  RCX: 00000000000000c0
          RDX: 0000000000000000  RSI: 00000000000005f2  RDI: ffff88023b5dcee0
          RBP: ffff88023abd5b48   R8: 0000000000000000   R9: ffff88023a3b6048
          R10: 0000000000000000  R11: 0000000000000002  R12: ffff8802398d4cd8
          R13: ffff88023af35140  R14: ffff88023b60c890  R15: 0000000000000000
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #7 [ffff88023abd5b50] vmxnet3_do_poll at ffffffffa00f204a [vmxnet3]
       #8 [ffff88023abd5b80] vmxnet3_netpoll at ffffffffa00f209c [vmxnet3]
       #9 [ffff88023abd5ba0] netpoll_poll_dev at ffffffff81472bb7
      
      The fix is to do as other drivers do, and have the poll controller call the top
      half interrupt handler, which schedules a napi poll properly to recieve frames
      
      Tested by myself, successfully.
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      CC: Shreyas Bhatewara <sbhatewara@vmware.com>
      CC: "VMware, Inc." <pv-drivers@vmware.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: stable@vger.kernel.org
      Reviewed-by: default avatarShreyas N Bhatewara <sbhatewara@vmware.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d25f06ea
    • Peter Boström's avatar
      vlan: Set correct source MAC address with TX VLAN offload enabled · dd38743b
      Peter Boström authored
      With TX VLAN offload enabled the source MAC address for frames sent using the
      VLAN interface is currently set to the address of the real interface. This is
      wrong since the VLAN interface may be configured with a different address.
      
      The bug was introduced in commit 2205369a
      ("vlan: Fix header ops passthru when doing TX VLAN offload.").
      
      This patch sets the source address before calling the create function of the
      real interface.
      Signed-off-by: default avatarPeter Boström <peter.bostrom@netrounds.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dd38743b
    • Annie Li's avatar
      Xen-netback: Fix issue caused by using gso_type wrongly · 5bd07670
      Annie Li authored
      Current netback uses gso_type to check whether the skb contains
      gso offload, and this is wrong. Gso_size is the right one to
      check gso existence, and gso_type is only used to check gso type.
      
      Some skbs contains nonzero gso_type and zero gso_size, current
      netback would treat these skbs as gso and create wrong response
      for this. This also causes ssh failure to domu from other server.
      
      V2: use skb_is_gso function as Paul Durrant suggested
      Signed-off-by: default avatarAnnie Li <annie.li@oracle.com>
      Acked-by: default avatarWei Liu <wei.liu2@citrix.com>
      Reviewed-by: default avatarPaul Durrant <paul.durrant@citrix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5bd07670
  2. 10 Mar, 2014 4 commits
  3. 09 Mar, 2014 1 commit
    • Michael Chan's avatar
      bnx2: Fix shutdown sequence · a8d9bc2e
      Michael Chan authored
      The pci shutdown handler added in:
      
          bnx2: Add pci shutdown handler
          commit 25bfb1dd
      
      created a shutdown down sequence without chip reset if the device was
      never brought up.  This can cause the firmware to shutdown the PHY
      prematurely and cause MMIO read cycles to be unresponsive.  On some
      systems, it may generate NMI in the bnx2's pci shutdown handler.
      
      The fix is to tell the firmware not to shutdown the PHY if there was
      no prior chip reset.
      Signed-off-by: default avatarMichael Chan <mchan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a8d9bc2e
  4. 07 Mar, 2014 1 commit
  5. 06 Mar, 2014 19 commits
    • Sabrina Dubroca's avatar
      ipv6: don't set DST_NOCOUNT for remotely added routes · c88507fb
      Sabrina Dubroca authored
      DST_NOCOUNT should only be used if an authorized user adds routes
      locally. In case of routes which are added on behalf of router
      advertisments this flag must not get used as it allows an unlimited
      number of routes getting added remotely.
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c88507fb
    • Amir Vadai's avatar
      net/mlx4_core: mlx4_init_slave() shouldn't access comm channel before PF is ready · 97989356
      Amir Vadai authored
      Currently, the PF call to pci_enable_sriov from the PF probe function
      stalls for 10 seconds times the number of VFs probed on the host. This
      happens because the way for such VFs to determine of the PF
      initialization finished, is by attempting to issue reset on the
      comm-channel and get timeout (after 10s).
      
      The PF probe function is called from a kenernel workqueue, and therefore
      during that time, rcu lock is being held and kernel's workqueue is
      stalled. This blocks other processes that try to use the workqueue
      or rcu lock.  For example, interface renaming which is calling
      rcu_synchronize is blocked, and timedout by systemd.
      
      Changed mlx4_init_slave() to allow VF probed on the host to immediatly
      detect that the PF is not ready, and return EPROBE_DEFER instantly.
      
      Only when the PF finishes the initialization, allow such VFs to
      access the comm channel.
      
      This issue and fix are relevant only for probed VFs on the hypervisor,
      there is no way to pass this information to a VM until comm channel is
      ready, so in a VM, if PF is not ready, the first command will be timedout
      after 10 seconds and return EPROBE_DEFER.
      Signed-off-by: default avatarAmir Vadai <amirv@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      97989356
    • Amir Vadai's avatar
      net/mlx4_core: Fix memory access error in mlx4_QUERY_DEV_CAP_wrapper() · 57352ef4
      Amir Vadai authored
      Fix a regression introduced by [1]. outbox was accessed instead of
      outbox->buf. Typo was copy-pasted to [2] and [3].
      
      [1] - cc1ade9 mlx4_core: Disable memory windows for virtual functions
      [2] - 4de65803 mlx4_core: Add support for steerable IB UD QPs
      [3] - 7ffdf726 net/mlx4_core: Add basic support for TCP/IP offloads under
            tunneling
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarAmir Vadai <amirv@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      57352ef4
    • Sasha Levin's avatar
      bonding: correctly handle out of range parameters for lp_interval · 5bd4e4c1
      Sasha Levin authored
      We didn't correctly check cases where the value for lp_interval is not
      within the legal range due to a missing table terminator.
      
      This would let userspace trigger a kernel panic by specifying a value out
      of range:
      
      	echo -1 > /sys/devices/virtual/net/bond0/bonding/lp_interval
      
      Introduced by commit 4325b374 ("bonding: convert lp_interval to use
      the new option API").
      Acked-by: default avatarNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5bd4e4c1
    • Anton Nayshtut's avatar
      ipv6: Fix exthdrs offload registration. · d2d273ff
      Anton Nayshtut authored
      Without this fix, ipv6_exthdrs_offload_init doesn't register IPPROTO_DSTOPTS
      offload, but returns 0 (as the IPPROTO_ROUTING registration actually succeeds).
      
      This then causes the ipv6_gso_segment to drop IPv6 packets with IPPROTO_DSTOPTS
      header.
      
      The issue detected and the fix verified by running MS HCK Offload LSO test on
      top of QEMU Windows guests, as this test sends IPv6 packets with
      IPPROTO_DSTOPTS.
      Signed-off-by: default avatarAnton Nayshtut <anton@swortex.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d2d273ff
    • Anton Blanchard's avatar
      ibmveth: Fix endian issues with MAC addresses · d746ca95
      Anton Blanchard authored
      The code to load a MAC address into a u64 for passing to the
      hypervisor via a register is broken on little endian.
      
      Create a helper function called ibmveth_encode_mac_addr
      which does the right thing in both big and little endian.
      
      We were storing the MAC address in a long in struct ibmveth_adapter.
      It's never used so remove it - we don't need another place in the
      driver where we create endian issues with MAC addresses.
      Signed-off-by: default avatarAnton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d746ca95
    • Anton Blanchard's avatar
      net: unix socket code abuses csum_partial · 0a13404d
      Anton Blanchard authored
      The unix socket code is using the result of csum_partial to
      hash into a lookup table:
      
      	unix_hash_fold(csum_partial(sunaddr, len, 0));
      
      csum_partial is only guaranteed to produce something that can be
      folded into a checksum, as its prototype explains:
      
       * returns a 32-bit number suitable for feeding into itself
       * or csum_tcpudp_magic
      
      The 32bit value should not be used directly.
      
      Depending on the alignment, the ppc64 csum_partial will return
      different 32bit partial checksums that will fold into the same
      16bit checksum.
      
      This difference causes the following testcase (courtesy of
      Gustavo) to sometimes fail:
      
      #include <sys/socket.h>
      #include <stdio.h>
      
      int main()
      {
      	int fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);
      
      	int i = 1;
      	setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &i, 4);
      
      	struct sockaddr addr;
      	addr.sa_family = AF_LOCAL;
      	bind(fd, &addr, 2);
      
      	listen(fd, 128);
      
      	struct sockaddr_storage ss;
      	socklen_t sslen = (socklen_t)sizeof(ss);
      	getsockname(fd, (struct sockaddr*)&ss, &sslen);
      
      	fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);
      
      	if (connect(fd, (struct sockaddr*)&ss, sslen) == -1){
      		perror(NULL);
      		return 1;
      	}
      	printf("OK\n");
      	return 0;
      }
      
      As suggested by davem, fix this by using csum_fold to fold the
      partial 32bit checksum into a 16bit checksum before using it.
      Signed-off-by: default avatarAnton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0a13404d
    • Andrew Lutomirski's avatar
      net: Improve SO_TIMESTAMPING documentation and fix a minor code bug · adca4767
      Andrew Lutomirski authored
      The original documentation was very unclear.
      
      The code fix is presumably related to the formerly unclear
      documentation: SOCK_TIMESTAMPING_RX_SOFTWARE has no effect on
      __sock_recv_timestamp's behavior, so calling __sock_recv_ts_and_drops
      from sock_recv_ts_and_drops if only SOCK_TIMESTAMPING_RX_SOFTWARE is
      set is pointless.  This should have no user-observable effect.
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Acked-by: default avatarRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      adca4767
    • Bjorn Helgaas's avatar
      phy: fix compiler array bounds warning on settings[] · 4ae6e50c
      Bjorn Helgaas authored
      With -Werror=array-bounds, gcc v4.7.x warns that in phy_find_valid(), the
      settings[] "array subscript is above array bounds", I think because idx is
      a signed integer and if the caller supplied idx < 0, we pass the guard but
      still reference out of bounds.
      
      Fix this by making idx unsigned here and elsewhere.
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Acked-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4ae6e50c
    • Florian Westphal's avatar
      inet: frag: make sure forced eviction removes all frags · e588e2f2
      Florian Westphal authored
      Quoting Alexander Aring:
        While fragmentation and unloading of 6lowpan module I got this kernel Oops
        after few seconds:
      
        BUG: unable to handle kernel paging request at f88bbc30
        [..]
        Modules linked in: ipv6 [last unloaded: 6lowpan]
        Call Trace:
         [<c012af4c>] ? call_timer_fn+0x54/0xb3
         [<c012aef8>] ? process_timeout+0xa/0xa
         [<c012b66b>] run_timer_softirq+0x140/0x15f
      
      Problem is that incomplete frags are still around after unload; when
      their frag expire timer fires, we get crash.
      
      When a netns is removed (also done when unloading module), inet_frag
      calls the evictor with 'force' argument to purge remaining frags.
      
      The evictor loop terminates when accounted memory ('work') drops to 0
      or the lru-list becomes empty.  However, the mem accounting is done
      via percpu counters and may not be accurate, i.e. loop may terminate
      prematurely.
      
      Alter evictor to only stop once the lru list is empty when force is
      requested.
      Reported-by: default avatarPhoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
      Reported-by: default avatarAlexander Aring <alex.aring@gmail.com>
      Tested-by: default avatarAlexander Aring <alex.aring@gmail.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e588e2f2
    • David S. Miller's avatar
      Merge branch 'tipc' · 409e1456
      David S. Miller authored
      Eric Hugne says:
      
      ====================
      tipc: refcount and memory leak fixes
      
      v3: Remove error logging from data path completely. Rebased on top of
          latest net merge.
      
      v2: Drop specific -ENOMEM logging in patch #1 (tipc: allow connection
          shutdown callback to be invoked in advance) And add a general error
          message if an internal server tries to send a message on a
          closed/nonexisting connection.
      
      In addition to the fix for refcount leak and memory leak during
      module removal, we also fix a problem where the topology server
      listening socket where unexpectedly closed. We also eliminate an
      unnecessary context switch during accept()/recvmsg() for nonblocking
      sockets.
      
      It might be good to include this patchset in stable aswell. After the
      v3 rebase on latest merge from net all patches apply cleanly on that
      tree.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      409e1456
    • Erik Hugne's avatar
      tipc: don't log disabled tasklet handler errors · 2892505e
      Erik Hugne authored
      Failure to schedule a TIPC tasklet with tipc_k_signal because the
      tasklet handler is disabled is not an error. It means TIPC is
      currently in the process of shutting down. We remove the error
      logging in this case.
      Signed-off-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2892505e
    • Erik Hugne's avatar
      tipc: fix memory leak during module removal · 1bb8dce5
      Erik Hugne authored
      When the TIPC module is removed, the tasklet handler is disabled
      before all other subsystems. This will cause lingering publications
      in the name table because the node_down tasklets responsible to
      clean up publications from an unreachable node will never run.
      When the name table is shut down, these publications are detected
      and an error message is logged:
      tipc: nametbl_stop(): orphaned hash chain detected
      This is actually a memory leak, introduced with commit
      993b858e ("tipc: correct the order
      of stopping services at rmmod")
      
      Instead of just logging an error and leaking memory, we free
      the orphaned entries during nametable shutdown.
      Signed-off-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1bb8dce5
    • Erik Hugne's avatar
      tipc: drop subscriber connection id invalidation · edcc0511
      Erik Hugne authored
      When a topology server subscriber is disconnected, the associated
      connection id is set to zero. A check vs zero is then done in the
      subscription timeout function to see if the subscriber have been
      shut down. This is unnecessary, because all subscription timers
      will be cancelled when a subscriber terminates. Setting the
      connection id to zero is actually harmful because id zero is the
      identity of the topology server listening socket, and can cause a
      race that leads to this socket being closed instead.
      Signed-off-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Acked-by: default avatarYing Xue <ying.xue@windriver.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      edcc0511
    • Ying Xue's avatar
      tipc: avoid to unnecessary process switch under non-block mode · fe8e4649
      Ying Xue authored
      When messages are received via tipc socket under non-block mode,
      schedule_timeout() is called in tipc_wait_for_rcvmsg(), that is,
      the process of receiving messages will be scheduled once although
      timeout value passed to schedule_timeout() is 0. The same issue
      exists in accept()/wait_for_accept(). To avoid this unnecessary
      process switch, we only call schedule_timeout() if the timeout
      value is non-zero.
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Reviewed-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fe8e4649
    • Ying Xue's avatar
      tipc: fix connection refcount leak · 4652edb7
      Ying Xue authored
      When tipc_conn_sendmsg() calls tipc_conn_lookup() to query a
      connection instance, its reference count value is increased if
      it's found. But subsequently if it's found that the connection is
      closed, the work of sending message is not queued into its server
      send workqueue, and the connection reference count is not decreased.
      This will cause a reference count leak. To reproduce this problem,
      an application would need to open and closes topology server
      connections with high intensity.
      
      We fix this by immediately decrementing the connection reference
      count if a send fails due to the connection being closed.
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Acked-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4652edb7
    • Ying Xue's avatar
      tipc: allow connection shutdown callback to be invoked in advance · 6d4ebeb4
      Ying Xue authored
      Currently connection shutdown callback function is called when
      connection instance is released in tipc_conn_kref_release(), and
      receiving packets and sending packets are running in different
      threads. Even if connection is closed by the thread of receiving
      packets, its shutdown callback may not be called immediately as
      the connection reference count is non-zero at that moment. So,
      although the connection is shut down by the thread of receiving
      packets, the thread of sending packets doesn't know it. Before
      its shutdown callback is invoked to tell the sending thread its
      connection has been closed, the sending thread may deliver
      messages by tipc_conn_sendmsg(), this is why the following error
      information appears:
      
      "Sending subscription event failed, no memory"
      
      To eliminate it, allow connection shutdown callback function to
      be called before connection id is removed in tipc_close_conn(),
      which makes the sending thread know the truth in time that its
      socket is closed so that it doesn't send message to it. We also
      remove the "Sending XXX failed..." error reporting for topology
      and config services.
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6d4ebeb4
    • Guillaume Nault's avatar
      l2tp: fix userspace reception on plain L2TP sockets · 9e9cb622
      Guillaume Nault authored
      As pppol2tp_recv() never queues up packets to plain L2TP sockets,
      pppol2tp_recvmsg() never returns data to userspace, thus making
      the recv*() system calls unusable.
      
      Instead of dropping packets when the L2TP socket isn't bound to a PPP
      channel, this patch adds them to its reception queue.
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9e9cb622
    • Guillaume Nault's avatar
      l2tp: fix manual sequencing (de)activation in L2TPv2 · bb5016ea
      Guillaume Nault authored
      Commit e0d4435f "l2tp: Update PPP-over-L2TP driver to work over L2TPv3"
      broke the PPPOL2TP_SO_SENDSEQ setsockopt. The L2TP header length was
      previously computed by pppol2tp_l2t_header_len() before each call to
      l2tp_xmit_skb(). Now that header length is retrieved from the hdr_len
      session field, this field must be updated every time the L2TP header
      format is modified, or l2tp_xmit_skb() won't push the right amount of
      data for the L2TP header.
      
      This patch uses l2tp_session_set_header_len() to adjust hdr_len every
      time sequencing is (de)activated from userspace (either by the
      PPPOL2TP_SO_SENDSEQ setsockopt or the L2TP_ATTR_SEND_SEQ netlink
      attribute).
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bb5016ea