1. 28 Jul, 2013 40 commits
    • Sreekanth Reddy's avatar
      SCSI: mpt3sas: fix for kernel panic when driver loads with HBA conected to non... · e90c1dc2
      Sreekanth Reddy authored
      SCSI: mpt3sas: fix for kernel panic when driver loads with HBA conected to non LUN 0 configured expander
      
      commit b65cfedf upstream.
      
      With some enclosures when LUN 0 is not created but LUN 1 or LUN X is created
      then SCSI scan procedure calls target_alloc, slave_alloc call back functions
      for LUN 0 and slave_destory() for same LUN 0.
      
      In these kind of cases within slave_destroy, pointer to scsi_target in
      _sas_device structure is set to NULL, following which when slave_alloc for LUN
      1 is called then starget would not be set properly for this LUN.  So,
      scsi_target pointer pointing to NULL value would lead to a crash later in the
      discovery procedure.
      
      To solve this issue set the sas_device's scsi_target pointer to scsi_device's
      scsi_target if it is NULL earlier in slave_alloc callback function.
      Signed-off-by: default avatarSreekanth Reddy <Sreekanth.Reddy@lsi.com>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Parallels.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      e90c1dc2
    • Sreekanth Reddy's avatar
      SCSI: mpt3sas: Infinite loops can occur if MPI2_IOCSTATUS_CONFIG_INVALID_PAGE is not returned · 72ffadba
      Sreekanth Reddy authored
      commit 14be49ac upstream.
      
      Infinite loop can occur if IOCStatus is not equal to
      MPI2_IOCSTATUS_CONFIG_INVALID_PAGE value in the while loops in functions
      _scsih_search_responding_sas_devices,
      _scsih_search_responding_raid_devices and
      _scsih_search_responding_expanders
      
      So, Instead of checking for MPI2_IOCSTATUS_CONFIG_INVALID_PAGE value,
      in this patch code is modified to check for IOCStatus not equals to
      MPI2_IOCSTATUS_SUCCESS to break the while loop.
      Signed-off-by: default avatarSreekanth Reddy <Sreekanth.Reddy@lsi.com>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Parallels.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72ffadba
    • Borislav Petkov's avatar
      EDAC: Fix lockdep splat · f46ef77d
      Borislav Petkov authored
      commit 88d84ac9 upstream.
      
      Fix the following:
      
      BUG: key ffff88043bdd0330 not in .data!
      ------------[ cut here ]------------
      WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
      DEBUG_LOCKS_WARN_ON(1)
      Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
      CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
      Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
       0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
       ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
       ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
      Call Trace:
        dump_stack
        warn_slowpath_common
        warn_slowpath_fmt
        lockdep_init_map
        ? trace_hardirqs_on_caller
        ? trace_hardirqs_on
        debug_mutex_init
        __mutex_init
        bus_register
        edac_create_sysfs_mci_device
        edac_mc_add_mc
        sbridge_probe
        pci_device_probe
        driver_probe_device
        __driver_attach
        ? driver_probe_device
        bus_for_each_dev
        driver_attach
        bus_add_driver
        driver_register
        __pci_register_driver
        ? 0xffffffffa0010fff
        sbridge_init
        ? 0xffffffffa0010fff
        do_one_initcall
        load_module
        ? unset_module_init_ro_nx
        SyS_init_module
        tracesys
      ---[ end trace d24a70b0d3ddf733 ]---
      EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
      EDAC sbridge: Driver loaded.
      
      What happens is that bus_register needs a statically allocated lock_key
      because the last is handed in to lockdep. However, struct mem_ctl_info
      embeds struct bus_type (the whole struct, not a pointer to it) and the
      whole thing gets dynamically allocated.
      
      Fix this by using a statically allocated struct bus_type for the MC bus.
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Acked-by: default avatarMauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f46ef77d
    • Kent Overstreet's avatar
      bcache: Journal replay fix · 5e20e8b3
      Kent Overstreet authored
      commit faa56736 upstream.
      
      The journal replay code starts by finding something that looks like a
      valid journal entry, then it does a binary search over the unchecked
      region of the journal for the journal entries with the highest sequence
      numbers.
      
      Trouble is, the logic was wrong - journal_read_bucket() returns true if
      it found journal entries we need, but if the range of journal entries
      we're looking for loops around the end of the journal - in that case
      journal_read_bucket() could return true when it hadn't found the highest
      sequence number we'd seen yet, and in that case the binary search did
      the wrong thing. Whoops.
      Signed-off-by: default avatarKent Overstreet <kmo@daterainc.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5e20e8b3
    • Kent Overstreet's avatar
      bcache: Fix GC_SECTORS_USED() calculation · 99e56bf5
      Kent Overstreet authored
      commit 29ebf465 upstream.
      
      Part of the job of garbage collection is to add up however many sectors
      of live data it finds in each bucket, but that doesn't work very well if
      it doesn't reset GC_SECTORS_USED() when it starts. Whoops.
      
      This wouldn't have broken anything horribly, but allocation tries to
      preferentially reclaim buckets that are mostly empty and that's not
      gonna work with an incorrect GC_SECTORS_USED() value.
      Signed-off-by: default avatarKent Overstreet <kmo@daterainc.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      99e56bf5
    • Kent Overstreet's avatar
      bcache: Fix a sysfs splat on shutdown · fe1b7105
      Kent Overstreet authored
      commit c9502ea4 upstream.
      
      If we stopped a bcache device when we were already detaching (or
      something like that), bcache_device_unlink() would try to remove a
      symlink from sysfs that was already gone because the bcache dev kobject
      had already been removed from sysfs.
      
      So keep track of whether we've removed stuff from sysfs.
      Signed-off-by: default avatarKent Overstreet <kmo@daterainc.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fe1b7105
    • Kent Overstreet's avatar
      bcache: Shutdown fix · 63a53870
      Kent Overstreet authored
      commit 5caa52af upstream.
      
      Stopping a cache set is supposed to make it stop attached backing
      devices, but somewhere along the way that code got lost. Fixing this
      mainly has the effect of fixing our reboot notifier.
      Signed-off-by: default avatarKent Overstreet <kmo@daterainc.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      63a53870
    • Kent Overstreet's avatar
      bcache: Advertise that flushes are supported · 3fcbc176
      Kent Overstreet authored
      commit 54d12f2b upstream.
      
      Whoops - bcache's flush/FUA was mostly correct, but flushes get filtered
      out unless we say we support them...
      Signed-off-by: default avatarKent Overstreet <kmo@daterainc.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3fcbc176
    • Kent Overstreet's avatar
      bcache: Fix a dumb race · c0d8f455
      Kent Overstreet authored
      commit 6aa8f1a6 upstream.
      
      In the far-too-complicated closure code - closures can have destructors,
      for probably dubious reasons; they get run after the closure is no
      longer waiting on anything but before dropping the parent ref, intended
      just for freeing whatever memory the closure is embedded in.
      
      Trouble is, when remaining goes to 0 and we've got nothing more to run -
      we also have to unlock the closure, setting remaining to -1. If there's
      a destructor, that unlock isn't doing anything - nobody could be trying
      to lock it if we're about to free it - but if the unlock _is needed...
      that check for a destructor was racy. Argh.
      Signed-off-by: default avatarKent Overstreet <kmo@daterainc.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c0d8f455
    • Miklos Szeredi's avatar
      fuse: readdirplus: sanity checks · 223828d8
      Miklos Szeredi authored
      commit a28ef45c upstream.
      
      Add sanity checks before adding or updating an entry with data received
      from readdirplus.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      223828d8
    • Miklos Szeredi's avatar
      fuse: readdirplus: fix instantiate · dc2a6c2d
      Miklos Szeredi authored
      commit 2914941e upstream.
      
      Fuse does instantiation slightly differently from NFS/CIFS which use
      d_materialise_unique().
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dc2a6c2d
    • Niels de Vos's avatar
      fuse: readdirplus: fix dentry leak · 05ac7b3a
      Niels de Vos authored
      commit 53ce9a33 upstream.
      
      In case d_lookup() returns a dentry with d_inode == NULL, the dentry is not
      returned with dput(). This results in triggering a BUG() in
      shrink_dcache_for_umount_subtree():
      
        BUG: Dentry ...{i=0,n=...} still in use (1) [unmount of fuse fuse]
      
      [SzM: need to d_drop() as well]
      Reported-by: default avatarJustin Clift <jclift@redhat.com>
      Signed-off-by: default avatarNiels de Vos <ndevos@redhat.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      Tested-by: default avatarBrian Foster <bfoster@redhat.com>
      Tested-by: default avatarNiels de Vos <ndevos@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      05ac7b3a
    • Ralf Baechle's avatar
      RAPIDIO: IDT_GEN2: Fix build error. · df18f5f9
      Ralf Baechle authored
      commit 27f62b9f upstream.
      
        CC      drivers/rapidio/switches/idt_gen2.o
      drivers/rapidio/switches/idt_gen2.c: In function ‘idtg2_show_errlog’:
      drivers/rapidio/switches/idt_gen2.c:379:30: error: ‘PAGE_SIZE’ undeclared (first use in this function)
      drivers/rapidio/switches/idt_gen2.c:379:30: note: each undeclared identifier is reported only once for each function it appears in
      Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Acked-by: default avatarAlexandre Bounine <alexandre.bounine@idt.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      df18f5f9
    • Ralf Baechle's avatar
      MIPS: Oceton: Fix build error. · cf6a37e7
      Ralf Baechle authored
      commit 39205750 upstream.
      
      If CONFIG_CAVIUM_OCTEON_LOCK_L2_TLB, CONFIG_CAVIUM_OCTEON_LOCK_L2_EXCEPTION,
      CONFIG_CAVIUM_OCTEON_LOCK_L2_LOW_LEVEL_INTERRUPT and
      CONFIG_CAVIUM_OCTEON_LOCK_L2_INTERRUPT are all undefined:
      
      arch/mips/cavium-octeon/setup.c: In function ‘prom_init’:
      arch/mips/cavium-octeon/setup.c:715:12: error: unused variable ‘ebase’ [-Werror=unused-variable]
      Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cf6a37e7
    • Eric Dumazet's avatar
      vlan: fix a race in egress prio management · 5110890c
      Eric Dumazet authored
      [ Upstream commit 3e3aac49 ]
      
      egress_priority_map[] hash table updates are protected by rtnl,
      and we never remove elements until device is dismantled.
      
      We have to make sure that before inserting an new element in hash table,
      all its fields are committed to memory or else another cpu could
      find corrupt values and crash.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5110890c
    • Eric Dumazet's avatar
      vlan: mask vlan prio bits · 37b25f3f
      Eric Dumazet authored
      [ Upstream commit d4b812de ]
      
      In commit 48cc32d3
      ("vlan: don't deliver frames for unknown vlans to protocols")
      Florian made sure we set pkt_type to PACKET_OTHERHOST
      if the vlan id is set and we could find a vlan device for this
      particular id.
      
      But we also have a problem if prio bits are set.
      
      Steinar reported an issue on a router receiving IPv6 frames with a
      vlan tag of 4000 (id 0, prio 2), and tunneled into a sit device,
      because skb->vlan_tci is set.
      
      Forwarded frame is completely corrupted : We can see (8100:4000)
      being inserted in the middle of IPv6 source address :
      
      16:48:00.780413 IP6 2001:16d8:8100:4000:ee1c:0:9d9:bc87 >
      9f94:4d95:2001:67c:29f4::: ICMP6, unknown icmp6 type (0), length 64
             0x0000:  0000 0029 8000 c7c3 7103 0001 a0ae e651
             0x0010:  0000 0000 ccce 0b00 0000 0000 1011 1213
             0x0020:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223
             0x0030:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233
      
      It seems we are not really ready to properly cope with this right now.
      
      We can probably do better in future kernels :
      vlan_get_ingress_priority() should be a netdev property instead of
      a per vlan_dev one.
      
      For stable kernels, lets clear vlan_tci to fix the bugs.
      Reported-by: default avatarSteinar H. Gunderson <sesse@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      37b25f3f
    • Jason Wang's avatar
      macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS · 7d9e6dd8
      Jason Wang authored
      [ Upstream commit ece793fc ]
      
      We try to linearize part of the skb when the number of iov is greater than
      MAX_SKB_FRAGS. This is not enough since each single vector may occupy more than
      one pages, so zerocopy_sg_fromiovec() may still fail and may break the guest
      network.
      
      Solve this problem by calculate the pages needed for iov before trying to do
      zerocopy and switch to use copy instead of zerocopy if it needs more than
      MAX_SKB_FRAGS.
      
      This is done through introducing a new helper to count the pages for iov, and
      call uarg->callback() manually when switching from zerocopy to copy to notify
      vhost.
      
      We can do further optimization on top.
      
      This bug were introduced from b92946e2
      (macvtap: zerocopy: validate vectors before building skb).
      
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7d9e6dd8
    • Jason Wang's avatar
      tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS · 05464d21
      Jason Wang authored
      [ Upstream commit 88529176 ]
      
      We try to linearize part of the skb when the number of iov is greater than
      MAX_SKB_FRAGS. This is not enough since each single vector may occupy more than
      one pages, so zerocopy_sg_fromiovec() may still fail and may break the guest
      network.
      
      Solve this problem by calculate the pages needed for iov before trying to do
      zerocopy and switch to use copy instead of zerocopy if it needs more than
      MAX_SKB_FRAGS.
      
      This is done through introducing a new helper to count the pages for iov, and
      call uarg->callback() manually when switching from zerocopy to copy to notify
      vhost.
      
      We can do further optimization on top.
      
      The bug were introduced from commit 0690899b
      (tun: experimental zero copy tx support)
      
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      05464d21
    • Paolo Valente's avatar
      pkt_sched: sch_qfq: remove a source of high packet delay/jitter · c1d220fb
      Paolo Valente authored
      [ Upstream commit 87f40dd6 ]
      
      QFQ+ inherits from QFQ a design choice that may cause a high packet
      delay/jitter and a severe short-term unfairness. As QFQ, QFQ+ uses a
      special quantity, the system virtual time, to track the service
      provided by the ideal system it approximates. When a packet is
      dequeued, this quantity must be incremented by the size of the packet,
      divided by the sum of the weights of the aggregates waiting to be
      served. Tracking this sum correctly is a non-trivial task, because, to
      preserve tight service guarantees, the decrement of this sum must be
      delayed in a special way [1]: this sum can be decremented only after
      that its value would decrease also in the ideal system approximated by
      QFQ+. For efficiency, QFQ+ keeps track only of the 'instantaneous'
      weight sum, increased and decreased immediately as the weight of an
      aggregate changes, and as an aggregate is created or destroyed (which,
      in its turn, happens as a consequence of some class being
      created/destroyed/changed). However, to avoid the problems caused to
      service guarantees by these immediate decreases, QFQ+ increments the
      system virtual time using the maximum value allowed for the weight
      sum, 2^10, in place of the dynamic, instantaneous value. The
      instantaneous value of the weight sum is used only to check whether a
      request of weight increase or a class creation can be satisfied.
      
      Unfortunately, the problems caused by this choice are worse than the
      temporary degradation of the service guarantees that may occur, when a
      class is changed or destroyed, if the instantaneous value of the
      weight sum was used to update the system virtual time. In fact, the
      fraction of the link bandwidth guaranteed by QFQ+ to each aggregate is
      equal to the ratio between the weight of the aggregate and the sum of
      the weights of the competing aggregates. The packet delay guaranteed
      to the aggregate is instead inversely proportional to the guaranteed
      bandwidth. By using the maximum possible value, and not the actual
      value of the weight sum, QFQ+ provides each aggregate with the worst
      possible service guarantees, and not with service guarantees related
      to the actual set of competing aggregates. To see the consequences of
      this fact, consider the following simple example.
      
      Suppose that only the following aggregates are backlogged, i.e., that
      only the classes in the following aggregates have packets to transmit:
      one aggregate with weight 10, say A, and ten aggregates with weight 1,
      say B1, B2, ..., B10. In particular, suppose that these aggregates are
      always backlogged. Given the weight distribution, the smoothest and
      fairest service order would be:
      A B1 A B2 A B3 A B4 A B5 A B6 A B7 A B8 A B9 A B10 A B1 A B2 ...
      
      QFQ+ would provide exactly this optimal service if it used the actual
      value for the weight sum instead of the maximum possible value, i.e.,
      11 instead of 2^10. In contrast, since QFQ+ uses the latter value, it
      serves aggregates as follows (easy to prove and to reproduce
      experimentally):
      A B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 A A A A A A A A A A B1 B2 ... B10 A A ...
      
      By replacing 10 with N in the above example, and by increasing N, one
      can increase at will the maximum packet delay and the jitter
      experienced by the classes in aggregate A.
      
      This patch addresses this issue by just using the above
      'instantaneous' value of the weight sum, instead of the maximum
      possible value, when updating the system virtual time.  After the
      instantaneous weight sum is decreased, QFQ+ may deviate from the ideal
      service for a time interval in the order of the time to serve one
      maximum-size packet for each backlogged class. The worst-case extent
      of the deviation exhibited by QFQ+ during this time interval [1] is
      basically the same as of the deviation described above (but, without
      this patch, QFQ+ suffers from such a deviation all the time). Finally,
      this patch modifies the comment to the function qfq_slot_insert, to
      make it coherent with the fact that the weight sum used by QFQ+ can
      now be lower than the maximum possible value.
      
      [1] P. Valente, "Extending WF2Q+ to support a dynamic traffic mix",
      Proceedings of AAA-IDEA'05, June 2005.
      Signed-off-by: default avatarPaolo Valente <paolo.valente@unimore.it>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c1d220fb
    • Haiyang Zhang's avatar
      hyperv: Fix the NETIF_F_SG flag setting in netvsc · 98bec4a1
      Haiyang Zhang authored
      [ Upstream commit f4570820 ]
      
      SG mode is not currently supported by netvsc, so remove this flag for now.
      Otherwise, it will be unconditionally enabled by commit ec5f0615
          "Kill link between CSUM and SG features"
      Previously, the SG feature is disabled because CSUM is not set here.
      Signed-off-by: default avatarHaiyang Zhang <haiyangz@microsoft.com>
      Reviewed-by: default avatarK. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      98bec4a1
    • Sarveshwar Bandi's avatar
      be2net: Fix to avoid hardware workaround when not needed · e0ca176c
      Sarveshwar Bandi authored
      [ Upstream commit 52fe29e4 ]
      
      Hardware workaround requesting hardware to skip vlan insertion is necessary
      only when umc or qnq is enabled. Enabling this workaround in other scenarios
      could cause controller to stall.
      Signed-off-by: default avatarSarveshwar Bandi <sarveshwar.bandi@emulex.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e0ca176c
    • Eric Dumazet's avatar
      ipv4: set transport header earlier · b3923f82
      Eric Dumazet authored
      [ Upstream commit 21d1196a ]
      
      commit 45f00f99 ("ipv4: tcp: clean up tcp_v4_early_demux()") added a
      performance regression for non GRO traffic, basically disabling
      IP early demux.
      
      IPv6 stack resets transport header in ip6_rcv() before calling
      IP early demux in ip6_rcv_finish(), while IPv4 does this only in
      ip_local_deliver_finish(), _after_ IP early demux.
      
      GRO traffic happened to enable IP early demux because transport header
      is also set in inet_gro_receive()
      
      Instead of reverting the faulty commit, we can make IPv4/IPv6 behave the
      same : transport_header should be set in ip_rcv() instead of
      ip_local_deliver_finish()
      
      ip_local_deliver_finish() can also use skb_network_header_len() which is
      faster than ip_hdrlen()
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3923f82
    • Neil Horman's avatar
      atl1e: unmap partially mapped skb on dma error and free skb · da7e35ce
      Neil Horman authored
      [ Upstream commit 584ec435 ]
      
      Ben Hutchings pointed out that my recent update to atl1e
      in commit 352900b5
      ("atl1e: fix dma mapping warnings") was missing a bit of code.
      
      Specifically it reset the hardware tx ring to its origional state when
      we hit a dma error, but didn't unmap any exiting mappings from the
      operation.  This patch fixes that up.  It also remembers to free the
      skb in the event that an error occurs, so we don't leak.  Untested, as
      I don't have hardware.  I think its pretty straightforward, but please
      review closely.
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      CC: Ben Hutchings <bhutchings@solarflare.com>
      CC: Jay Cliburn <jcliburn@gmail.com>
      CC: Chris Snook <chris.snook@gmail.com>
      CC: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      da7e35ce
    • Neil Horman's avatar
      atl1e: fix dma mapping warnings · dc419b2d
      Neil Horman authored
      [ Upstream commit 352900b5 ]
      
      Recently had this backtrace reported:
      WARNING: at lib/dma-debug.c:937 check_unmap+0x47d/0x930()
      Hardware name: System Product Name
      ATL1E 0000:02:00.0: DMA-API: device driver failed to check map error[device
      address=0x00000000cbfd1000] [size=90 bytes] [mapped as single]
      Modules linked in: xt_conntrack nf_conntrack ebtable_filter ebtables
      ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek iTCO_wdt
      iTCO_vendor_support snd_hda_intel acpi_cpufreq mperf coretemp btrfs zlib_deflate
      snd_hda_codec snd_hwdep microcode raid6_pq libcrc32c snd_seq usblp serio_raw xor
      snd_seq_device joydev snd_pcm snd_page_alloc snd_timer snd lpc_ich i2c_i801
      soundcore mfd_core atl1e asus_atk0110 ata_generic pata_acpi radeon i2c_algo_bit
      drm_kms_helper ttm drm i2c_core pata_marvell uinput
      Pid: 314, comm: systemd-journal Not tainted 3.9.0-0.rc6.git2.3.fc19.x86_64 #1
      Call Trace:
       <IRQ>  [<ffffffff81069106>] warn_slowpath_common+0x66/0x80
       [<ffffffff8106916c>] warn_slowpath_fmt+0x4c/0x50
       [<ffffffff8138151d>] check_unmap+0x47d/0x930
       [<ffffffff810ad048>] ? sched_clock_cpu+0xa8/0x100
       [<ffffffff81381a2f>] debug_dma_unmap_page+0x5f/0x70
       [<ffffffff8137ce30>] ? unmap_single+0x20/0x30
       [<ffffffffa01569a1>] atl1e_intr+0x3a1/0x5b0 [atl1e]
       [<ffffffff810d53fd>] ? trace_hardirqs_off+0xd/0x10
       [<ffffffff81119636>] handle_irq_event_percpu+0x56/0x390
       [<ffffffff811199ad>] handle_irq_event+0x3d/0x60
       [<ffffffff8111cb6a>] handle_fasteoi_irq+0x5a/0x100
       [<ffffffff8101c36f>] handle_irq+0xbf/0x150
       [<ffffffff811dcb2f>] ? file_sb_list_del+0x3f/0x50
       [<ffffffff81073b10>] ? irq_enter+0x50/0xa0
       [<ffffffff8172738d>] do_IRQ+0x4d/0xc0
       [<ffffffff811dcb2f>] ? file_sb_list_del+0x3f/0x50
       [<ffffffff8171c6b2>] common_interrupt+0x72/0x72
       <EOI>  [<ffffffff810db5b2>] ? lock_release+0xc2/0x310
       [<ffffffff8109ea04>] lg_local_unlock_cpu+0x24/0x50
       [<ffffffff811dcb2f>] file_sb_list_del+0x3f/0x50
       [<ffffffff811dcb6d>] fput+0x2d/0xc0
       [<ffffffff811d8ea1>] filp_close+0x61/0x90
       [<ffffffff811fae4d>] __close_fd+0x8d/0x150
       [<ffffffff811d8ef0>] sys_close+0x20/0x50
       [<ffffffff81725699>] system_call_fastpath+0x16/0x1b
      
      The usual straighforward failure to check for dma_mapping_error after a map
      operation is completed.
      
      This patch should fix it, the reporter wandered off after filing this bz:
      https://bugzilla.redhat.com/show_bug.cgi?id=954170
      
      and I don't have hardware to test, but the fix is pretty straightforward, so I
      figured I'd post it for review.
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      CC: Jay Cliburn <jcliburn@gmail.com>
      CC: Chris Snook <chris.snook@gmail.com>
      CC: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dc419b2d
    • Hannes Frederic Sowa's avatar
      ipv6: only static routes qualify for equal cost multipathing · d6245ef7
      Hannes Frederic Sowa authored
      [ Upstream commit 307f2fb9 ]
      
      Static routes in this case are non-expiring routes which did not get
      configured by autoconf or by icmpv6 redirects.
      
      To make sure we actually get an ecmp route while searching for the first
      one in this fib6_node's leafs, also make sure it matches the ecmp route
      assumptions.
      
      v2:
      a) Removed RTF_EXPIRE check in dst.from chain. The check of RTF_ADDRCONF
         already ensures that this route, even if added again without
         RTF_EXPIRES (in case of a RA announcement with infinite timeout),
         does not cause the rt6i_nsiblings logic to go wrong if a later RA
         updates the expiration time later.
      
      v3:
      a) Allow RTF_EXPIRES routes to enter the ecmp route set. We have to do so,
         because an pmtu event could update the RTF_EXPIRES flag and we would
         not count this route, if another route joins this set. We now filter
         only for RTF_GATEWAY|RTF_ADDRCONF|RTF_DYNAMIC, which are flags that
         don't get changed after rt6_info construction.
      
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6245ef7
    • Alexander Duyck's avatar
      gre: Fix MTU sizing check for gretap tunnels · 6afbcb59
      Alexander Duyck authored
      [ Upstream commit 8c91e162 ]
      
      This change fixes an MTU sizing issue seen with gretap tunnels when non-gso
      packets are sent from the interface.
      
      In my case I was able to reproduce the issue by simply sending a ping of
      1421 bytes with the gretap interface created on a device with a standard
      1500 mtu.
      
      This fix is based on the fact that the tunnel mtu is already adjusted by
      dev->hard_header_len so it would make sense that any packets being compared
      against that mtu should also be adjusted by hard_header_len and the tunnel
      header instead of just the tunnel header.
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Reported-by: default avatarCong Wang <amwang@redhat.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6afbcb59
    • dingtianhong's avatar
      ifb: fix oops when loading the ifb failed · f84ddbc5
      dingtianhong authored
      [ Upstream commit f2966cd5 ]
      
      If __rtnl_link_register() return faild when loading the ifb, it will
      take the wrong path and get oops, so fix it just like dummy.
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f84ddbc5
    • dingtianhong's avatar
      dummy: fix oops when loading the dummy failed · 629b1d3d
      dingtianhong authored
      [ Upstream commit 2c8a0189 ]
      
      We rename the dummy in modprobe.conf like this:
      
      install dummy0 /sbin/modprobe -o dummy0 --ignore-install dummy
      install dummy1 /sbin/modprobe -o dummy1 --ignore-install dummy
      
      We got oops when we run the command:
      
      modprobe dummy0
      modprobe dummy1
      
      ------------[ cut here ]------------
      
      [ 3302.187584] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      [ 3302.195411] IP: [<ffffffff813fe62a>] __rtnl_link_unregister+0x9a/0xd0
      [ 3302.201844] PGD 85c94a067 PUD 8517bd067 PMD 0
      [ 3302.206305] Oops: 0002 [#1] SMP
      [ 3302.299737] task: ffff88105ccea300 ti: ffff880eba4a0000 task.ti: ffff880eba4a0000
      [ 3302.307186] RIP: 0010:[<ffffffff813fe62a>]  [<ffffffff813fe62a>] __rtnl_link_unregister+0x9a/0xd0
      [ 3302.316044] RSP: 0018:ffff880eba4a1dd8  EFLAGS: 00010246
      [ 3302.321332] RAX: 0000000000000000 RBX: ffffffff81a9d738 RCX: 0000000000000002
      [ 3302.328436] RDX: 0000000000000000 RSI: ffffffffa04d602c RDI: ffff880eba4a1dd8
      [ 3302.335541] RBP: ffff880eba4a1e18 R08: dead000000200200 R09: dead000000100100
      [ 3302.342644] R10: 0000000000000080 R11: 0000000000000003 R12: ffffffff81a9d788
      [ 3302.349748] R13: ffffffffa04d7020 R14: ffffffff81a9d670 R15: ffff880eba4a1dd8
      [ 3302.364910] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 3302.370630] CR2: 0000000000000008 CR3: 000000085e15e000 CR4: 00000000000427e0
      [ 3302.377734] DR0: 0000000000000003 DR1: 00000000000000b0 DR2: 0000000000000001
      [ 3302.384838] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [ 3302.391940] Stack:
      [ 3302.393944]  ffff880eba4a1dd8 ffff880eba4a1dd8 ffff880eba4a1e18 ffffffffa04d70c0
      [ 3302.401350]  00000000ffffffef ffffffffa01a8000 0000000000000000 ffffffff816111c8
      [ 3302.408758]  ffff880eba4a1e48 ffffffffa01a80be ffff880eba4a1e48 ffffffffa04d70c0
      [ 3302.416164] Call Trace:
      [ 3302.418605]  [<ffffffffa01a8000>] ? 0xffffffffa01a7fff
      [ 3302.423727]  [<ffffffffa01a80be>] dummy_init_module+0xbe/0x1000 [dummy0]
      [ 3302.430405]  [<ffffffffa01a8000>] ? 0xffffffffa01a7fff
      [ 3302.435535]  [<ffffffff81000322>] do_one_initcall+0x152/0x1b0
      [ 3302.441263]  [<ffffffff810ab24b>] do_init_module+0x7b/0x200
      [ 3302.446824]  [<ffffffff810ad3d2>] load_module+0x4e2/0x530
      [ 3302.452215]  [<ffffffff8127ae40>] ? ddebug_dyndbg_boot_param_cb+0x60/0x60
      [ 3302.458979]  [<ffffffff810ad5f1>] SyS_init_module+0xd1/0x130
      [ 3302.464627]  [<ffffffff814b9652>] system_call_fastpath+0x16/0x1b
      [ 3302.490090] RIP  [<ffffffff813fe62a>] __rtnl_link_unregister+0x9a/0xd0
      [ 3302.496607]  RSP <ffff880eba4a1dd8>
      [ 3302.500084] CR2: 0000000000000008
      [ 3302.503466] ---[ end trace 8342d49cd49f78ed ]---
      
      The reason is that when loading dummy, if __rtnl_link_register() return failed,
      the init_module should return and avoid take the wrong path.
      Signed-off-by: default avatarTan Xiaojun <tanxiaojun@huawei.com>
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      629b1d3d
    • Hannes Frederic Sowa's avatar
      ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF · b4f1489e
      Hannes Frederic Sowa authored
      [ Upstream commit afc154e9 ]
      
      This is a follow-up patch to 3630d400
      ("ipv6: rt6_check_neigh should successfully verify neigh if no NUD
      information are available").
      
      Since the removal of rt->n in rt6_info we can end up with a dst ==
      NULL in rt6_check_neigh. In case the kernel is not compiled with
      CONFIG_IPV6_ROUTER_PREF we should also select a route with unkown
      NUD state but we must not avoid doing round robin selection on routes
      with the same target. So introduce and pass down a boolean ``do_rr'' to
      indicate when we should update rt->rr_ptr. As soon as no route is valid
      we do backtracking and do a lookup on a higher level in the fib trie.
      
      v2:
      a) Improved rt6_check_neigh logic (no need to create neighbour there)
         and documented return values.
      
      v3:
      a) Introduce enum rt6_nud_state to get rid of the magic numbers
         (thanks to David Miller).
      b) Update and shorten commit message a bit to actualy reflect
         the source.
      Reported-by: default avatarPierre Emeriaud <petrus.lt@gmail.com>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b4f1489e
    • Maarten Lankhorst's avatar
      alx: fix lockdep annotation · 61b6f128
      Maarten Lankhorst authored
      [ Upstream commit a8798a5c ]
      
      Move spin_lock_init to be called before the spinlocks are used, preventing a lockdep splat.
      Signed-off-by: default avatarMaarten Lankhorst <maarten.lankhorst@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      61b6f128
    • Sasha Levin's avatar
      9p: fix off by one causing access violations and memory corruption · 53effe16
      Sasha Levin authored
      [ Upstream commit 110ecd69 ]
      
      p9_release_pages() would attempt to dereference one value past the end of
      pages[]. This would cause the following crashes:
      
      [ 6293.171817] BUG: unable to handle kernel paging request at ffff8807c96f3000
      [ 6293.174146] IP: [<ffffffff8412793b>] p9_release_pages+0x3b/0x60
      [ 6293.176447] PGD 79c5067 PUD 82c1e3067 PMD 82c197067 PTE 80000007c96f3060
      [ 6293.180060] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      [ 6293.180060] Modules linked in:
      [ 6293.180060] CPU: 62 PID: 174043 Comm: modprobe Tainted: G        W    3.10.0-next-20130710-sasha #3954
      [ 6293.180060] task: ffff8807b803b000 ti: ffff880787dde000 task.ti: ffff880787dde000
      [ 6293.180060] RIP: 0010:[<ffffffff8412793b>]  [<ffffffff8412793b>] p9_release_pages+0x3b/0x60
      [ 6293.214316] RSP: 0000:ffff880787ddfc28  EFLAGS: 00010202
      [ 6293.214316] RAX: 0000000000000001 RBX: ffff8807c96f2ff8 RCX: 0000000000000000
      [ 6293.222017] RDX: ffff8807b803b000 RSI: 0000000000000001 RDI: ffffea001c7e3d40
      [ 6293.222017] RBP: ffff880787ddfc48 R08: 0000000000000000 R09: 0000000000000000
      [ 6293.222017] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001
      [ 6293.222017] R13: 0000000000000001 R14: ffff8807cc50c070 R15: ffff8807cc50c070
      [ 6293.222017] FS:  00007f572641d700(0000) GS:ffff8807f3600000(0000) knlGS:0000000000000000
      [ 6293.256784] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 6293.256784] CR2: ffff8807c96f3000 CR3: 00000007c8e81000 CR4: 00000000000006e0
      [ 6293.256784] Stack:
      [ 6293.256784]  ffff880787ddfcc8 ffff880787ddfcc8 0000000000000000 ffff880787ddfcc8
      [ 6293.256784]  ffff880787ddfd48 ffffffff84128be8 ffff880700000002 0000000000000001
      [ 6293.256784]  ffff8807b803b000 ffff880787ddfce0 0000100000000000 0000000000000000
      [ 6293.256784] Call Trace:
      [ 6293.256784]  [<ffffffff84128be8>] p9_virtio_zc_request+0x598/0x630
      [ 6293.256784]  [<ffffffff8115c610>] ? wake_up_bit+0x40/0x40
      [ 6293.256784]  [<ffffffff841209b1>] p9_client_zc_rpc+0x111/0x3a0
      [ 6293.256784]  [<ffffffff81174b78>] ? sched_clock_cpu+0x108/0x120
      [ 6293.256784]  [<ffffffff84122a21>] p9_client_read+0xe1/0x2c0
      [ 6293.256784]  [<ffffffff81708a90>] v9fs_file_read+0x90/0xc0
      [ 6293.256784]  [<ffffffff812bd073>] vfs_read+0xc3/0x130
      [ 6293.256784]  [<ffffffff811a78bd>] ? trace_hardirqs_on+0xd/0x10
      [ 6293.256784]  [<ffffffff812bd5a2>] SyS_read+0x62/0xa0
      [ 6293.256784]  [<ffffffff841a1a00>] tracesys+0xdd/0xe2
      [ 6293.256784] Code: 66 90 48 89 fb 41 89 f5 48 8b 3f 48 85 ff 74 29 85 f6 74 25 45 31 e4 66 0f 1f 84 00 00 00 00 00 e8 eb 14 12 fd 41 ff c4 49 63 c4 <48> 8b 3c c3 48 85 ff 74 05 45 39 e5 75 e7 48 83 c4 08 5b 41 5c
      [ 6293.256784] RIP  [<ffffffff8412793b>] p9_release_pages+0x3b/0x60
      [ 6293.256784]  RSP <ffff880787ddfc28>
      [ 6293.256784] CR2: ffff8807c96f3000
      [ 6293.256784] ---[ end trace 50822ee72cd360fc ]---
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      53effe16
    • Hannes Frederic Sowa's avatar
      ipv6: in case of link failure remove route directly instead of letting it expire · a025e28a
      Hannes Frederic Sowa authored
      [ Upstream commit 1eb4f758 ]
      
      We could end up expiring a route which is part of an ecmp route set. Doing
      so would invalidate the rt->rt6i_nsiblings calculations and could provoke
      the following panic:
      
      [   80.144667] ------------[ cut here ]------------
      [   80.145172] kernel BUG at net/ipv6/ip6_fib.c:733!
      [   80.145172] invalid opcode: 0000 [#1] SMP
      [   80.145172] Modules linked in: 8021q nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables
      +snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer virtio_balloon snd soundcore i2c_piix4 i2c_core virtio_net virtio_blk
      [   80.145172] CPU: 1 PID: 786 Comm: ping6 Not tainted 3.10.0+ #118
      [   80.145172] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [   80.145172] task: ffff880117fa0000 ti: ffff880118770000 task.ti: ffff880118770000
      [   80.145172] RIP: 0010:[<ffffffff815f3b5d>]  [<ffffffff815f3b5d>] fib6_add+0x75d/0x830
      [   80.145172] RSP: 0018:ffff880118771798  EFLAGS: 00010202
      [   80.145172] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011350e480
      [   80.145172] RDX: ffff88011350e238 RSI: 0000000000000004 RDI: ffff88011350f738
      [   80.145172] RBP: ffff880118771848 R08: ffff880117903280 R09: 0000000000000001
      [   80.145172] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88011350f680
      [   80.145172] R13: ffff880117903280 R14: ffff880118771890 R15: ffff88011350ef90
      [   80.145172] FS:  00007f02b5127740(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000
      [   80.145172] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [   80.145172] CR2: 00007f981322a000 CR3: 00000001181b1000 CR4: 00000000000006e0
      [   80.145172] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   80.145172] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [   80.145172] Stack:
      [   80.145172]  0000000000000001 ffff880100000000 ffff880100000000 ffff880117903280
      [   80.145172]  0000000000000000 ffff880119a4cf00 0000000000000400 00000000000007fa
      [   80.145172]  0000000000000000 0000000000000000 0000000000000000 ffff88011350f680
      [   80.145172] Call Trace:
      [   80.145172]  [<ffffffff815eeceb>] ? rt6_bind_peer+0x4b/0x90
      [   80.145172]  [<ffffffff815ed985>] __ip6_ins_rt+0x45/0x70
      [   80.145172]  [<ffffffff815eee35>] ip6_ins_rt+0x35/0x40
      [   80.145172]  [<ffffffff815ef1e4>] ip6_pol_route.isra.44+0x3a4/0x4b0
      [   80.145172]  [<ffffffff815ef34a>] ip6_pol_route_output+0x2a/0x30
      [   80.145172]  [<ffffffff81616077>] fib6_rule_action+0xd7/0x210
      [   80.145172]  [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30
      [   80.145172]  [<ffffffff81553026>] fib_rules_lookup+0xc6/0x140
      [   80.145172]  [<ffffffff81616374>] fib6_rule_lookup+0x44/0x80
      [   80.145172]  [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30
      [   80.145172]  [<ffffffff815edea3>] ip6_route_output+0x73/0xb0
      [   80.145172]  [<ffffffff815dfdf3>] ip6_dst_lookup_tail+0x2c3/0x2e0
      [   80.145172]  [<ffffffff813007b1>] ? list_del+0x11/0x40
      [   80.145172]  [<ffffffff81082a4c>] ? remove_wait_queue+0x3c/0x50
      [   80.145172]  [<ffffffff815dfe4d>] ip6_dst_lookup_flow+0x3d/0xa0
      [   80.145172]  [<ffffffff815fda77>] rawv6_sendmsg+0x267/0xc20
      [   80.145172]  [<ffffffff815a8a83>] inet_sendmsg+0x63/0xb0
      [   80.145172]  [<ffffffff8128eb93>] ? selinux_socket_sendmsg+0x23/0x30
      [   80.145172]  [<ffffffff815218d6>] sock_sendmsg+0xa6/0xd0
      [   80.145172]  [<ffffffff81524a68>] SYSC_sendto+0x128/0x180
      [   80.145172]  [<ffffffff8109825c>] ? update_curr+0xec/0x170
      [   80.145172]  [<ffffffff81041d09>] ? kvm_clock_get_cycles+0x9/0x10
      [   80.145172]  [<ffffffff810afd1e>] ? __getnstimeofday+0x3e/0xd0
      [   80.145172]  [<ffffffff8152509e>] SyS_sendto+0xe/0x10
      [   80.145172]  [<ffffffff8164efd9>] system_call_fastpath+0x16/0x1b
      [   80.145172] Code: fe ff ff 41 f6 45 2a 06 0f 85 ca fe ff ff 49 8b 7e 08 4c 89 ee e8 94 ef ff ff e9 b9 fe ff ff 48 8b 82 28 05 00 00 e9 01 ff ff ff <0f> 0b 49 8b 54 24 30 0d 00 00 40 00 89 83 14 01 00 00 48 89 53
      [   80.145172] RIP  [<ffffffff815f3b5d>] fib6_add+0x75d/0x830
      [   80.145172]  RSP <ffff880118771798>
      [   80.387413] ---[ end trace 02f20b7a8b81ed95 ]---
      [   80.390154] Kernel panic - not syncing: Fatal exception in interrupt
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a025e28a
    • Jason Wang's avatar
      macvtap: correctly linearize skb when zerocopy is used · bd31fdd2
      Jason Wang authored
      [ Upstream commit 61d46bf9 ]
      
      Userspace may produce vectors greater than MAX_SKB_FRAGS. When we try to
      linearize parts of the skb to let the rest of iov to be fit in
      the frags, we need count copylen into linear when calling macvtap_alloc_skb()
      instead of partly counting it into data_len. Since this breaks
      zerocopy_sg_from_iovec() since its inner counter assumes nr_frags should
      be zero at beginning. This cause nr_frags to be increased wrongly without
      setting the correct frags.
      
      This bug were introduced from b92946e2
      (macvtap: zerocopy: validate vectors before building skb).
      
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bd31fdd2
    • Jason Wang's avatar
      tuntap: correctly linearize skb when zerocopy is used · d09ec76a
      Jason Wang authored
      [ Upstream commit 3dd5c330 ]
      
      Userspace may produce vectors greater than MAX_SKB_FRAGS. When we try to
      linearize parts of the skb to let the rest of iov to be fit in
      the frags, we need count copylen into linear when calling tun_alloc_skb()
      instead of partly counting it into data_len. Since this breaks
      zerocopy_sg_from_iovec() since its inner counter assumes nr_frags should
      be zero at beginning. This cause nr_frags to be increased wrongly without
      setting the correct frags.
      
      This bug were introduced from 0690899b
      (tun: experimental zero copy tx support)
      
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d09ec76a
    • dingtianhong's avatar
      ifb: fix rcu_sched self-detected stalls · 6683151a
      dingtianhong authored
      [ Upstream commit 440d57bc ]
      
      According to the commit 16b0dc29
      (dummy: fix rcu_sched self-detected stalls)
      
      Eric Dumazet fix the problem in dummy, but the ifb will occur the
      same problem like the dummy modules.
      
      Trying to "modprobe ifb numifbs=30000" triggers :
      
      INFO: rcu_sched self-detected stall on CPU
      
      After this splat, RTNL is locked and reboot is needed.
      
      We must call cond_resched() to avoid this, even holding RTNL.
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6683151a
    • Dave Kleikamp's avatar
      sunvnet: vnet_port_remove must call unregister_netdev · c51a7a30
      Dave Kleikamp authored
      [ Upstream commit aabb9875 ]
      
      The missing call to unregister_netdev() leaves the interface active
      after the driver is unloaded by rmmod.
      Signed-off-by: default avatarDave Kleikamp <dave.kleikamp@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c51a7a30
    • Michael S. Tsirkin's avatar
      vhost-net: fix use-after-free in vhost_net_flush · f5ce1d25
      Michael S. Tsirkin authored
      [ Upstream commit c38e39c3 ]
      
      vhost_net_ubuf_put_and_wait has a confusing name:
      it will actually also free it's argument.
      Thus since commit 1280c27f
          "vhost-net: flush outstanding DMAs on memory change"
      vhost_net_flush tries to use the argument after passing it
      to vhost_net_ubuf_put_and_wait, this results
      in use after free.
      To fix, don't free the argument in vhost_net_ubuf_put_and_wait,
      add an new API for callers that want to free ubufs.
      Acked-by: default avatarAsias He <asias@redhat.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5ce1d25
    • Michael S. Tsirkin's avatar
      virtio_net: fix race in RX VQ processing · 2b0e8a4f
      Michael S. Tsirkin authored
      [ Upstream commit cbdadbbf ]
      
      virtio net called virtqueue_enable_cq on RX path after napi_complete, so
      with NAPI_STATE_SCHED clear - outside the implicit napi lock.
      This violates the requirement to synchronize virtqueue_enable_cq wrt
      virtqueue_add_buf.  In particular, used event can move backwards,
      causing us to lose interrupts.
      In a debug build, this can trigger panic within START_USE.
      
      Jason Wang reports that he can trigger the races artificially,
      by adding udelay() in virtqueue_enable_cb() after virtio_mb().
      
      However, we must call napi_complete to clear NAPI_STATE_SCHED before
      polling the virtqueue for used buffers, otherwise napi_schedule_prep in
      a callback will fail, causing us to lose RX events.
      
      To fix, call virtqueue_enable_cb_prepare with NAPI_STATE_SCHED
      set (under napi lock), later call virtqueue_poll with
      NAPI_STATE_SCHED clear (outside the lock).
      Reported-by: default avatarJason Wang <jasowang@redhat.com>
      Tested-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b0e8a4f
    • Michael S. Tsirkin's avatar
      virtio: support unlocked queue poll · c23b1ece
      Michael S. Tsirkin authored
      [ Upstream commit cc229884 ]
      
      This adds a way to check ring empty state after enable_cb outside any
      locks. Will be used by virtio_net.
      
      Note: there's room for more optimization: caller is likely to have a
      memory barrier already, which means we might be able to get rid of a
      barrier here.  Deferring this optimization until we do some
      benchmarking.
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c23b1ece
    • Jongsung Kim's avatar