1. 31 Oct, 2014 40 commits
    • David S. Miller's avatar
      sparc64: Implement __get_user_pages_fast(). · 13957783
      David S. Miller authored
      [ Upstream commit 06090e8e ]
      
      It is not sufficient to only implement get_user_pages_fast(), you
      must also implement the atomic version __get_user_pages_fast()
      otherwise you end up using the weak symbol fallback implementation
      which simply returns zero.
      
      This is dangerous, because it causes the futex code to loop forever
      if transparent hugepages are supported (see get_futex_key()).
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      13957783
    • David S. Miller's avatar
      sparc64: Fix register corruption in top-most kernel stack frame during boot. · 797df813
      David S. Miller authored
      [ Upstream commit ef3e035c ]
      
      Meelis Roos reported that kernels built with gcc-4.9 do not boot, we
      eventually narrowed this down to only impacting machines using
      UltraSPARC-III and derivitive cpus.
      
      The crash happens right when the first user process is spawned:
      
      [   54.451346] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
      [   54.451346]
      [   54.571516] CPU: 1 PID: 1 Comm: init Not tainted 3.16.0-rc2-00211-gd7933ab7 #96
      [   54.666431] Call Trace:
      [   54.698453]  [0000000000762f8c] panic+0xb0/0x224
      [   54.759071]  [000000000045cf68] do_exit+0x948/0x960
      [   54.823123]  [000000000042cbc0] fault_in_user_windows+0xe0/0x100
      [   54.902036]  [0000000000404ad0] __handle_user_windows+0x0/0x10
      [   54.978662] Press Stop-A (L1-A) to return to the boot prom
      [   55.050713] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
      
      Further investigation showed that compiling only per_cpu_patch() with
      an older compiler fixes the boot.
      
      Detailed analysis showed that the function is not being miscompiled by
      gcc-4.9, but it is using a different register allocation ordering.
      
      With the gcc-4.9 compiled function, something during the code patching
      causes some of the %i* input registers to get corrupted.  Perhaps
      we have a TLB miss path into the firmware that is deep enough to
      cause a register window spill and subsequent restore when we get
      back from the TLB miss trap.
      
      Let's plug this up by doing two things:
      
      1) Stop using the firmware stack for client interface calls into
         the firmware.  Just use the kernel's stack.
      
      2) As soon as we can, call into a new function "start_early_boot()"
         to put a one-register-window buffer between the firmware's
         deepest stack frame and the top-most initial kernel one.
      Reported-by: default avatarMeelis Roos <mroos@linux.ee>
      Tested-by: default avatarMeelis Roos <mroos@linux.ee>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      797df813
    • Dave Kleikamp's avatar
      sparc64: Increase size of boot string to 1024 bytes · 07ddd137
      Dave Kleikamp authored
      [ Upstream commit 1cef94c3 ]
      
      This is the longest boot string that silo supports.
      Signed-off-by: default avatarDave Kleikamp <dave.kleikamp@oracle.com>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      07ddd137
    • David S. Miller's avatar
      sparc64: Kill unnecessary tables and increase MAX_BANKS. · e7153bb8
      David S. Miller authored
      [ Upstream commit d195b71b ]
      
      swapper_low_pmd_dir and swapper_pud_dir are actually completely
      useless and unnecessary.
      
      We just need swapper_pg_dir[].  Naturally the other page table chunks
      will be allocated on an as-needed basis.  Since the kernel actually
      accesses these tables in the PAGE_OFFSET view, there is not even a TLB
      locality advantage of placing them in the kernel image.
      
      Use the hard coded vmlinux.ld.S slot for swapper_pg_dir which is
      naturally page aligned.
      
      Increase MAX_BANKS to 1024 in order to handle heavily fragmented
      virtual guests.
      
      Even with this MAX_BANKS increase, the kernel is 20K+ smaller.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      e7153bb8
    • bob picco's avatar
      sparc64: sparse irq · cc46c9ff
      bob picco authored
      [ Upstream commit ee6a9333 ]
      
      This patch attempts to do a few things. The highlights are: 1) enable
      SPARSE_IRQ unconditionally, 2) kills off !SPARSE_IRQ code 3) allocates
      ivector_table at boot time and 4) default to cookie only VIRQ mechanism
      for supported firmware. The first firmware with cookie only support for
      me appears on T5. You can optionally force the HV firmware to not cookie
      only mode which is the sysino support.
      
      The sysino is a deprecated HV mechanism according to the most recent
      SPARC Virtual Machine Specification. HV_GRP_INTR is what controls the
      cookie/sysino firmware versioning.
      
      The history of this interface is:
      
      1) Major version 1.0 only supported sysino based interrupt interfaces.
      
      2) Major version 2.0 added cookie based VIRQs, however due to the fact
         that OSs were using the VIRQs without negoatiating major version
         2.0 (Linux and Solaris are both guilty), the VIRQs calls were
         allowed even with major version 1.0
      
         To complicate things even further, the VIRQ interfaces were only
         actually hooked up in the hypervisor for LDC interrupt sources.
         VIRQ calls on other device types would result in HV_EINVAL errors.
      
         So effectively, major version 2.0 is unusable.
      
      3) Major version 3.0 was created to signal use of VIRQs and the fact
         that the hypervisor has these calls hooked up for all interrupt
         sources, not just those for LDC devices.
      
      A new boot option is provided should cookie only HV support have issues.
      hvirq - this is the version for HV_GRP_INTR. This is related to HV API
      versioning.  The code attempts major=3 first by default. The option can
      be used to override this default.
      
      I've tested with SPARSE_IRQ on T5-8, M7-4 and T4-X and Jalap?no.
      Signed-off-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      cc46c9ff
    • David S. Miller's avatar
      sparc64: Adjust vmalloc region size based upon available virtual address bits. · 83ca786f
      David S. Miller authored
      [ Upstream commit bb4e6e85 ]
      
      In order to accomodate embedded per-cpu allocation with large numbers
      of cpus and numa nodes, we have to use as much virtual address space
      as possible for the vmalloc region.  Otherwise we can get things like:
      
      PERCPU: max_distance=0x380001c10000 too large for vmalloc space 0xff00000000
      
      So, once we select a value for PAGE_OFFSET, derive the size of the
      vmalloc region based upon that.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      83ca786f
    • David S. Miller's avatar
      sparc64: Increase MAX_PHYS_ADDRESS_BITS to 53. · d93d8235
      David S. Miller authored
      [ Upstream commit 7c0fa0f2 ]
      
      Make sure, at compile time, that the kernel can properly support
      whatever MAX_PHYS_ADDRESS_BITS is defined to.
      
      On M7 chips, use a max_phys_bits value of 49.
      
      Based upon a patch by Bob Picco.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      d93d8235
    • David S. Miller's avatar
      sparc64: Use kernel page tables for vmemmap. · c44eb16d
      David S. Miller authored
      [ Upstream commit c06240c7 ]
      
      For sparse memory configurations, the vmemmap array behaves terribly
      and it takes up an inordinate amount of space in the BSS section of
      the kernel image unconditionally.
      
      Just build huge PMDs and look them up just like we do for TLB misses
      in the vmalloc area.
      
      Kernel BSS shrinks by about 2MB.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      c44eb16d
    • David S. Miller's avatar
      sparc64: Fix physical memory management regressions with large max_phys_bits. · 6057f359
      David S. Miller authored
      [ Upstream commit 0dd5b7b0 ]
      
      If max_phys_bits needs to be > 43 (f.e. for T4 chips), things like
      DEBUG_PAGEALLOC stop working because the 3-level page tables only
      can cover up to 43 bits.
      
      Another problem is that when we increased MAX_PHYS_ADDRESS_BITS up to
      47, several statically allocated tables became enormous.
      
      Compounding this is that we will need to support up to 49 bits of
      physical addressing for M7 chips.
      
      The two tables in question are sparc64_valid_addr_bitmap and
      kpte_linear_bitmap.
      
      The first holds a bitmap, with 1 bit for each 4MB chunk of physical
      memory, indicating whether that chunk actually exists in the machine
      and is valid.
      
      The second table is a set of 2-bit values which tell how large of a
      mapping (4MB, 256MB, 2GB, 16GB, respectively) we can use at each 256MB
      chunk of ram in the system.
      
      These tables are huge and take up an enormous amount of the BSS
      section of the sparc64 kernel image.  Specifically, the
      sparc64_valid_addr_bitmap is 4MB, and the kpte_linear_bitmap is 128K.
      
      So let's solve the space wastage and the DEBUG_PAGEALLOC problem
      at the same time, by using the kernel page tables (as designed) to
      manage this information.
      
      We have to keep using large mappings when DEBUG_PAGEALLOC is disabled,
      and we do this by encoding huge PMDs and PUDs.
      
      On a T4-2 with 256GB of ram the kernel page table takes up 16K with
      DEBUG_PAGEALLOC disabled and 256MB with it enabled.  Furthermore, this
      memory is dynamically allocated at run time rather than coded
      statically into the kernel image.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      6057f359
    • David S. Miller's avatar
      sparc64: Adjust KTSB assembler to support larger physical addresses. · a79b0734
      David S. Miller authored
      [ Upstream commit 8c82dc0e ]
      
      As currently coded the KTSB accesses in the kernel only support up to
      47 bits of physical addressing.
      
      Adjust the instruction and patching sequence in order to support
      arbitrary 64 bits addresses.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      a79b0734
    • David S. Miller's avatar
      sparc64: Define VA hole at run time, rather than at compile time. · fca31a28
      David S. Miller authored
      [ Upstream commit 4397bed0 ]
      
      Now that we use 4-level page tables, we can provide up to 53-bits of
      virtual address space to the user.
      
      Adjust the VA hole based upon the capabilities of the cpu type probed.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      fca31a28
    • David S. Miller's avatar
      sparc64: Switch to 4-level page tables. · ca3d2276
      David S. Miller authored
      [ Upstream commit ac55c768 ]
      
      This has become necessary with chips that support more than 43-bits
      of physical addressing.
      
      Based almost entirely upon a patch by Bob Picco.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ca3d2276
    • bob picco's avatar
      sparc64: T5 PMU · b34d4622
      bob picco authored
      [ Upstream commit 05aa1651 ]
      
      The T5 (niagara5) has different PCR related HV fast trap values and a new
      HV API Group. This patch utilizes these and shares when possible with niagara4.
      
      We use the same sparc_pmu niagara4_pmu. Should there be new effort to
      obtain the MCU perf statistics then this would have to be changed.
      
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      b34d4622
    • Allen Pais's avatar
      sparc64: cpu hardware caps support for sparc M6 and M7 · 205da5eb
      Allen Pais authored
      [ Upstream commit 40831625 ]
      Signed-off-by: default avatarAllen Pais <allen.pais@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      205da5eb
    • Allen Pais's avatar
      sparc64: support M6 and M7 for building CPU distribution map · 379ca504
      Allen Pais authored
      [ Upstream commit 9bd3ee33 ]
      
      Add M6 and M7 chip type in cpumap.c to correctly build CPU distribution map that spans all online CPUs.
      Signed-off-by: default avatarAllen Pais <allen.pais@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      379ca504
    • Allen Pais's avatar
      sparc64: correctly recognise M6 and M7 cpu type · 8a12544e
      Allen Pais authored
      [ Upstream commit cadbb580 ]
      
      The following patch adds support for correctly
      recognising M6 and M7 cpu type.
      Signed-off-by: default avatarAllen Pais <allen.pais@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8a12544e
    • David S. Miller's avatar
      sparc64: Fix hibernation code refrence to PAGE_OFFSET. · df2a98c2
      David S. Miller authored
      [ Upstream commit 9d0713ed ]
      
      We changed PAGE_OFFSET to be a variable rather than a constant,
      but this reference here in the hibernate assembler got missed.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      df2a98c2
    • David S. Miller's avatar
      sparc64: Do not define thread fpregs save area as zero-length array. · 21113443
      David S. Miller authored
      [ Upstream commit e2653143 ]
      
      This breaks the stack end corruption detection facility.
      
      What that facility does it write a magic value to "end_of_stack()"
      and checking to see if it gets overwritten.
      
      "end_of_stack()" is "task_thread_info(p) + 1", which for sparc64 is
      the beginning of the FPU register save area.
      
      So once the user uses the FPU, the magic value is overwritten and the
      debug checks trigger.
      
      Fix this by making the size explicit.
      
      Due to the size we use for the fpsaved[], gsr[], and xfsr[] arrays we
      are limited to 7 levels of FPU state saves.  So each FPU register set
      is 256 bytes, allocate 256 * 7 for the fpregs area.
      Reported-by: default avatarMeelis Roos <mroos@linux.ee>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      21113443
    • David S. Miller's avatar
      sparc64: Fix FPU register corruption with AES crypto offload. · 974b59b0
      David S. Miller authored
      [ Upstream commit f4da3628 ]
      
      The AES loops in arch/sparc/crypto/aes_glue.c use a scheme where the
      key material is preloaded into the FPU registers, and then we loop
      over and over doing the crypt operation, reusing those pre-cooked key
      registers.
      
      There are intervening blkcipher*() calls between the crypt operation
      calls.  And those might perform memcpy() and thus also try to use the
      FPU.
      
      The sparc64 kernel FPU usage mechanism is designed to allow such
      recursive uses, but with a catch.
      
      There has to be a trap between the two FPU using threads of control.
      
      The mechanism works by, when the FPU is already in use by the kernel,
      allocating a slot for FPU saving at trap time.  Then if, within the
      trap handler, we try to use the FPU registers, the pre-trap FPU
      register state is saved into the slot.  Then at trap return time we
      notice this and restore the pre-trap FPU state.
      
      Over the long term there are various more involved ways we can make
      this work, but for a quick fix let's take advantage of the fact that
      the situation where this happens is very limited.
      
      All sparc64 chips that support the crypto instructiosn also are using
      the Niagara4 memcpy routine, and that routine only uses the FPU for
      large copies where we can't get the source aligned properly to a
      multiple of 8 bytes.
      
      We look to see if the FPU is already in use in this context, and if so
      we use the non-large copy path which only uses integer registers.
      
      Furthermore, we also limit this special logic to when we are doing
      kernel copy, rather than a user copy.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      974b59b0
    • David S. Miller's avatar
      sparc64: Fix lockdep warnings on reboot on Ultra-5 · d7e30385
      David S. Miller authored
      [ Upstream commit bdcf81b6 ]
      
      Inconsistently, the raw_* IRQ routines do not interact with and update
      the irqflags tracing and lockdep state, whereas the raw_* spinlock
      interfaces do.
      
      This causes problems in p1275_cmd_direct() because we disable hardirqs
      by hand using raw_local_irq_restore() and then do a raw_spin_lock()
      which triggers a lockdep trace because the CPU's hw IRQ state doesn't
      match IRQ tracing's internal software copy of that state.
      
      The CPU's irqs are disabled, yet current->hardirqs_enabled is true.
      
      ====================
      reboot: Restarting system
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3536 check_flags+0x7c/0x240()
      DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
      Modules linked in: openpromfs
      CPU: 0 PID: 1 Comm: systemd-shutdow Tainted: G        W      3.17.0-dirty #145
      Call Trace:
       [000000000045919c] warn_slowpath_common+0x5c/0xa0
       [0000000000459210] warn_slowpath_fmt+0x30/0x40
       [000000000048f41c] check_flags+0x7c/0x240
       [0000000000493280] lock_acquire+0x20/0x1c0
       [0000000000832b70] _raw_spin_lock+0x30/0x60
       [000000000068f2fc] p1275_cmd_direct+0x1c/0x60
       [000000000068ed28] prom_reboot+0x28/0x40
       [000000000043610c] machine_restart+0x4c/0x80
       [000000000047d2d4] kernel_restart+0x54/0x80
       [000000000047d618] SyS_reboot+0x138/0x200
       [00000000004060b4] linux_sparc_syscall32+0x34/0x60
      ---[ end trace 5c439fe81c05a100 ]---
      possible reason: unannotated irqs-off.
      irq event stamp: 2010267
      hardirqs last  enabled at (2010267): [<000000000049a358>] vprintk_emit+0x4b8/0x580
      hardirqs last disabled at (2010266): [<0000000000499f08>] vprintk_emit+0x68/0x580
      softirqs last  enabled at (2010046): [<000000000045d278>] __do_softirq+0x378/0x4a0
      softirqs last disabled at (2010039): [<000000000042bf08>] do_softirq_own_stack+0x28/0x40
      Resetting ...
      ====================
      
      Use local_* variables of the hw IRQ interfaces so that IRQ tracing sees
      all of our changes.
      Reported-by: default avatarMeelis Roos <mroos@linux.ee>
      Tested-by: default avatarMeelis Roos <mroos@linux.ee>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      d7e30385
    • David S. Miller's avatar
      sparc64: Fix reversed start/end in flush_tlb_kernel_range() · f5a045e4
      David S. Miller authored
      [ Upstream commit 473ad7f4 ]
      
      When we have to split up a flush request into multiple pieces
      (in order to avoid the firmware range) we don't specify the
      arguments in the right order for the second piece.
      
      Fix the order, or else we get hangs as the code tries to
      flush "a lot" of entries and we get lockups like this:
      
      [ 4422.981276] NMI watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [expect:117032]
      [ 4422.996130] Modules linked in: ipv6 loop usb_storage igb ptp sg sr_mod ehci_pci ehci_hcd pps_core n2_rng rng_core
      [ 4423.016617] CPU: 12 PID: 117032 Comm: expect Not tainted 3.17.0-rc4+ #1608
      [ 4423.030331] task: fff8003cc730e220 ti: fff8003d99d54000 task.ti: fff8003d99d54000
      [ 4423.045282] TSTATE: 0000000011001602 TPC: 00000000004521e8 TNPC: 00000000004521ec Y: 00000000    Not tainted
      [ 4423.064905] TPC: <__flush_tlb_kernel_range+0x28/0x40>
      [ 4423.074964] g0: 000000000052fd10 g1: 00000001295a8000 g2: ffffff7176ffc000 g3: 0000000000002000
      [ 4423.092324] g4: fff8003cc730e220 g5: fff8003dfedcc000 g6: fff8003d99d54000 g7: 0000000000000006
      [ 4423.109687] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000000000003 o3: 00000000f0000000
      [ 4423.127058] o4: 0000000000000080 o5: 00000001295a8000 sp: fff8003d99d56d01 ret_pc: 000000000052ff54
      [ 4423.145121] RPC: <__purge_vmap_area_lazy+0x314/0x3a0>
      [ 4423.155185] l0: 0000000000000000 l1: 0000000000000000 l2: 0000000000a38040 l3: 0000000000000000
      [ 4423.172559] l4: fff8003dae8965e0 l5: ffffffffffffffff l6: 0000000000000000 l7: 00000000f7e2b138
      [ 4423.189913] i0: fff8003d99d576a0 i1: fff8003d99d576a8 i2: fff8003d99d575e8 i3: 0000000000000000
      [ 4423.207284] i4: 0000000000008008 i5: fff8003d99d575c8 i6: fff8003d99d56df1 i7: 0000000000530c24
      [ 4423.224640] I7: <free_vmap_area_noflush+0x64/0x80>
      [ 4423.234193] Call Trace:
      [ 4423.239051]  [0000000000530c24] free_vmap_area_noflush+0x64/0x80
      [ 4423.251029]  [0000000000531a7c] remove_vm_area+0x5c/0x80
      [ 4423.261628]  [0000000000531b80] __vunmap+0x20/0x120
      [ 4423.271352]  [000000000071cf18] n_tty_close+0x18/0x40
      [ 4423.281423]  [00000000007222b0] tty_ldisc_close+0x30/0x60
      [ 4423.292183]  [00000000007225a4] tty_ldisc_reinit+0x24/0xa0
      [ 4423.303120]  [0000000000722ab4] tty_ldisc_hangup+0xd4/0x1e0
      [ 4423.314232]  [0000000000719aa0] __tty_hangup+0x280/0x3c0
      [ 4423.324835]  [0000000000724cb4] pty_close+0x134/0x1a0
      [ 4423.334905]  [000000000071aa24] tty_release+0x104/0x500
      [ 4423.345316]  [00000000005511d0] __fput+0x90/0x1e0
      [ 4423.354701]  [000000000047fa54] task_work_run+0x94/0xe0
      [ 4423.365126]  [0000000000404b44] __handle_signal+0xc/0x2c
      
      Fixes: 4ca9a237 ("sparc64: Guard against flushing openfirmware mappings.")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      f5a045e4
    • Andreas Larsson's avatar
      sparc: Let memset return the address argument · 8d206bf0
      Andreas Larsson authored
      [ Upstream commit 74cad25c ]
      
      This makes memset follow the standard (instead of returning 0 on success). This
      is needed when certain versions of gcc optimizes around memset calls and assume
      that the address argument is preserved in %o0.
      Signed-off-by: default avatarAndreas Larsson <andreas@gaisler.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8d206bf0
    • Sowmini Varadhan's avatar
      sparc64: Move request_irq() from ldc_bind() to ldc_alloc() · 11342eb0
      Sowmini Varadhan authored
      [ Upstream commit c21c4ab0 ]
      
      The request_irq() needs to be done from ldc_alloc()
      to avoid the following (caught by lockdep)
      
       [00000000004a0738] __might_sleep+0xf8/0x120
       [000000000058bea4] kmem_cache_alloc_trace+0x184/0x2c0
       [00000000004faf80] request_threaded_irq+0x80/0x160
       [000000000044f71c] ldc_bind+0x7c/0x220
       [0000000000452454] vio_port_up+0x54/0xe0
       [00000000101f6778] probe_disk+0x38/0x220 [sunvdc]
       [00000000101f6b8c] vdc_port_probe+0x22c/0x300 [sunvdc]
       [0000000000451a88] vio_device_probe+0x48/0x60
       [000000000074c56c] really_probe+0x6c/0x300
       [000000000074c83c] driver_probe_device+0x3c/0xa0
       [000000000074c92c] __driver_attach+0x8c/0xa0
       [000000000074a6ec] bus_for_each_dev+0x6c/0xa0
       [000000000074c1dc] driver_attach+0x1c/0x40
       [000000000074b0fc] bus_add_driver+0xbc/0x280
      Signed-off-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: default avatarDwight Engen <dwight.engen@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      11342eb0
    • bob picco's avatar
      sparc64: find_node adjustment · a92d48f7
      bob picco authored
      [ Upstream commit 3dee9df5 ]
      
      We have seen an issue with guest boot into LDOM that causes early boot failures
      because of no matching rules for node identitity of the memory. I analyzed this
      on my T4 and concluded there might not be a solution. I saw the issue in
      mainline too when booting into the control/primary domain - with guests
      configured.  Note, this could be a firmware bug on some older machines.
      
      I'll provide a full explanation of the issues below. Should we not find a
      matching BEST latency group for a real address (RA) then we will assume node 0.
      On the T4-2 here with the information provided I can't see an alternative.
      
      Technically the LDOM shown below should match the MBLOCK to the
      favorable latency group. However other factors must be considered too. Were
      the memory controllers configured "fine" grained interleave or "coarse"
      grain interleaved -  T4. Also should a "group" MD node be considered a NUMA
      node?
      
      There has to be at least one Machine Description (MD) "group" and hence one
      NUMA node. The group can have one or more latency groups (lg) - more than one
      memory controller. The current code chooses the smallest latency as the most
      favorable per group. The latency and lg information is in MLGROUP below.
      MBLOCK is the base and size of the RAs for the machine as fetched from OBP
      /memory "available" property. My machine has one MBLOCK but more would be
      possible - with holes?
      
      For a T4-2 the following information has been gathered:
      with LDOM guest
      MEMBLOCK configuration:
       memory size = 0x27f870000
       memory.cnt  = 0x3
       memory[0x0]    [0x00000020400000-0x0000029fc67fff], 0x27f868000 bytes
       memory[0x1]    [0x0000029fd8a000-0x0000029fd8bfff], 0x2000 bytes
       memory[0x2]    [0x0000029fd92000-0x0000029fd97fff], 0x6000 bytes
       reserved.cnt  = 0x2
       reserved[0x0]  [0x00000020800000-0x000000216c15c0], 0xec15c1 bytes
       reserved[0x1]  [0x00000024800000-0x0000002c180c1e], 0x7980c1f bytes
      MBLOCK[0]: base[20000000] size[280000000] offset[0]
      (note: "base" and "size" reported in "MBLOCK" encompass the "memory[X]" values)
      (note: (RA + offset) & mask = val is the formula to detect a match for the
      memory controller. should there be no match for find_node node, a return
      value of -1 resulted for the node - BAD)
      
      There is one group. It has these forward links
      MLGROUP[1]: node[545] latency[1f7e8] match[200000000] mask[200000000]
      MLGROUP[2]: node[54d] latency[2de60] match[0] mask[200000000]
      NUMA NODE[0]: node[545] mask[200000000] val[200000000] (latency[1f7e8])
      (note: "val" is the best lg's (smallest latency) "match")
      
      no LDOM guest - bare metal
      MEMBLOCK configuration:
       memory size = 0xfdf2d0000
       memory.cnt  = 0x3
       memory[0x0]    [0x00000020400000-0x00000fff6adfff], 0xfdf2ae000 bytes
       memory[0x1]    [0x00000fff6d2000-0x00000fff6e7fff], 0x16000 bytes
       memory[0x2]    [0x00000fff766000-0x00000fff771fff], 0xc000 bytes
       reserved.cnt  = 0x2
       reserved[0x0]  [0x00000020800000-0x00000021a04580], 0x1204581 bytes
       reserved[0x1]  [0x00000024800000-0x0000002c7d29fc], 0x7fd29fd bytes
      MBLOCK[0]: base[20000000] size[fe0000000] offset[0]
      
      there are two groups
      group node[16d5]
      MLGROUP[0]: node[1765] latency[1f7e8] match[0] mask[200000000]
      MLGROUP[3]: node[177d] latency[2de60] match[200000000] mask[200000000]
      NUMA NODE[0]: node[1765] mask[200000000] val[0] (latency[1f7e8])
      group node[171d]
      MLGROUP[2]: node[1775] latency[2de60] match[0] mask[200000000]
      MLGROUP[1]: node[176d] latency[1f7e8] match[200000000] mask[200000000]
      NUMA NODE[1]: node[176d] mask[200000000] val[200000000] (latency[1f7e8])
      (note: for this two "group" bare metal machine, 1/2 memory is in group one's
      lg and 1/2 memory is in group two's lg).
      
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      a92d48f7
    • David S. Miller's avatar
      sparc64: Fix corrupted thread fault code. · 6c558d67
      David S. Miller authored
      [ Upstream commit 84bd6d8b ]
      
      Every path that ends up at do_sparc64_fault() must install a valid
      FAULT_CODE_* bitmask in the per-thread fault code byte.
      
      Two paths leading to the label winfix_trampoline (which expects the
      FAULT_CODE_* mask in register %g4) were not doing so:
      
      1) For pre-hypervisor TLB protection violation traps, if we took
         the 'winfix_trampoline' path we wouldn't have %g4 initialized
         with the FAULT_CODE_* value yet.  Resulting in using the
         TLB_TAG_ACCESS register address value instead.
      
      2) In the TSB miss path, when we notice that we are going to use a
         hugepage mapping, but we haven't allocated the hugepage TSB yet, we
         still have to take the window fixup case into consideration and
         in that particular path we leave %g4 not setup properly.
      
      Errors on this sort were largely invisible previously, but after
      commit 4ccb9272 ("sparc64: sun4v TLB
      error power off events") we now have a fault_code mask bit
      (FAULT_CODE_BAD_RA) that triggers due to this bug.
      
      FAULT_CODE_BAD_RA triggers because this bit is set in TLB_TAG_ACCESS
      (see #1 above) and thus we get seemingly random bus errors triggered
      for user processes.
      
      Fixes: 4ccb9272 ("sparc64: sun4v TLB error power off events")
      Reported-by: default avatarMeelis Roos <mroos@linux.ee>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      6c558d67
    • bob picco's avatar
      sparc64: sun4v TLB error power off events · 5aa329df
      bob picco authored
      [ Upstream commit 4ccb9272 ]
      
      We've witnessed a few TLB events causing the machine to power off because
      of prom_halt. In one case it was some nfs related area during rmmod. Another
      was an mmapper of /dev/mem. A more recent one is an ITLB issue with
      a bad pagesize which could be a hardware bug. Bugs happen but we should
      attempt to not power off the machine and/or hang it when possible.
      
      This is a DTLB error from an mmapper of /dev/mem:
      [root@sparcie ~]# SUN4V-DTLB: Error at TPC[fffff80100903e6c], tl 1
      SUN4V-DTLB: TPC<0xfffff80100903e6c>
      SUN4V-DTLB: O7[fffff801081979d0]
      SUN4V-DTLB: O7<0xfffff801081979d0>
      SUN4V-DTLB: vaddr[fffff80100000000] ctx[1250] pte[98000000000f0610] error[2]
      .
      
      This is recent mainline for ITLB:
      [ 3708.179864] SUN4V-ITLB: TPC<0xfffffc010071cefc>
      [ 3708.188866] SUN4V-ITLB: O7[fffffc010071cee8]
      [ 3708.197377] SUN4V-ITLB: O7<0xfffffc010071cee8>
      [ 3708.206539] SUN4V-ITLB: vaddr[e0003] ctx[1a3c] pte[2900000dcc800eeb] error[4]
      .
      
      Normally sun4v_itlb_error_report() and sun4v_dtlb_error_report() would call
      prom_halt() and drop us to OF command prompt "ok". This isn't the case for
      LDOMs and the machine powers off.
      
      For the HV reported error of HV_ENORADDR for HV HV_MMU_MAP_ADDR_TRAP we cause
      a SIGBUS error by qualifying it within do_sparc64_fault() for fault code mask
      of FAULT_CODE_BAD_RA. This is done when trap level (%tl) is less or equal
      one("1"). Otherwise, for %tl > 1,  we proceed eventually to die_if_kernel().
      
      The logic of this patch was partially inspired by David Miller's feedback.
      
      Power off of large sparc64 machines is painful. Plus die_if_kernel provides
      more context. A reset sequence isn't a brief period on large sparc64 but
      better than power-off/power-on sequence.
      
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      5aa329df
    • Daniel Hellstrom's avatar
      sparc32: dma_alloc_coherent must honour gfp flags · bbf7a79b
      Daniel Hellstrom authored
      [ Upstream commit d1105287 ]
      
      dma_zalloc_coherent() calls dma_alloc_coherent(__GFP_ZERO)
      but the sparc32 implementations sbus_alloc_coherent() and
      pci32_alloc_coherent() doesn't take the gfp flags into
      account.
      
      Tested on the SPARC32/LEON GRETH Ethernet driver which fails
      due to dma_alloc_coherent(__GFP_ZERO) returns non zeroed
      pages.
      Signed-off-by: default avatarDaniel Hellstrom <daniel@gaisler.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      bbf7a79b
    • David S. Miller's avatar
      sparc64: Fix pcr_ops initialization and usage bugs. · 226c864d
      David S. Miller authored
      [ Upstream commit 8bccf5b3 ]
      
      Christopher reports that perf_event_print_debug() can crash in uniprocessor
      builds.  The crash is due to pcr_ops being NULL.
      
      This happens because pcr_arch_init() is only invoked by smp_cpus_done() which
      only executes in SMP builds.
      
      init_hw_perf_events() is closely intertwined with pcr_ops being setup properly,
      therefore:
      
      1) Call pcr_arch_init() early on from init_hw_perf_events(), instead of
         from smp_cpus_done().
      
      2) Do not hook up a PMU type if pcr_ops is NULL after pcr_arch_init().
      
      3) Move init_hw_perf_events to a later initcall so that it we will be
         sure to invoke pcr_arch_init() after all cpus are brought up.
      
      Finally, guard the one naked sequence of pcr_ops dereferences in
      __global_pmu_self() with an appropriate NULL check.
      Reported-by: default avatarChristopher Alexander Tobias Schulze <cat.schulze@alice-dsl.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      226c864d
    • David S. Miller's avatar
      sparc64: Do not disable interrupts in nmi_cpu_busy() · 3554b332
      David S. Miller authored
      [ Upstream commit 58556104 ]
      
      nmi_cpu_busy() is a SMP function call that just makes sure that all of the
      cpus are spinning using cpu cycles while the NMI test runs.
      
      It does not need to disable IRQs because we just care about NMIs executing
      which will even with 'normal' IRQs disabled.
      
      It is not legal to enable hard IRQs in a SMP cross call, in fact this bug
      triggers the BUG check in irq_work_run_list():
      
      	BUG_ON(!irqs_disabled());
      
      Because now irq_work_run() is invoked from the tail of
      generic_smp_call_function_single_interrupt().
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      3554b332
    • Per Hurtig's avatar
      tcp: fixing TLP's FIN recovery · 3d552f5f
      Per Hurtig authored
      [ Upstream commit bef1909e ]
      
      Fix to a problem observed when losing a FIN segment that does not
      contain data.  In such situations, TLP is unable to recover from
      *any* tail loss and instead adds at least PTO ms to the
      retransmission process, i.e., RTO = RTO + PTO.
      Signed-off-by: default avatarPer Hurtig <per.hurtig@kau.se>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarNandita Dukkipati <nanditad@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      3d552f5f
    • Vlad Yasevich's avatar
      sctp: handle association restarts when the socket is closed. · 27233e7e
      Vlad Yasevich authored
      [ Upstream commit bdf6fa52 ]
      
      Currently association restarts do not take into consideration the
      state of the socket.  When a restart happens, the current assocation
      simply transitions into established state.  This creates a condition
      where a remote system, through a the restart procedure, may create a
      local association that is no way reachable by user.  The conditions
      to trigger this are as follows:
        1) Remote does not acknoledge some data causing data to remain
           outstanding.
        2) Local application calls close() on the socket.  Since data
           is still outstanding, the association is placed in SHUTDOWN_PENDING
           state.  However, the socket is closed.
        3) The remote tries to create a new association, triggering a restart
           on the local system.  The association moves from SHUTDOWN_PENDING
           to ESTABLISHED.  At this point, it is no longer reachable by
           any socket on the local system.
      
      This patch addresses the above situation by moving the newly ESTABLISHED
      association into SHUTDOWN-SENT state and bundling a SHUTDOWN after
      the COOKIE-ACK chunk.  This way, the restarted associate immidiately
      enters the shutdown procedure and forces the termination of the
      unreachable association.
      Reported-by: default avatarDavid Laight <David.Laight@aculab.com>
      Signed-off-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      27233e7e
    • Joe Lawrence's avatar
      team: avoid race condition in scheduling delayed work · 08e2bf25
      Joe Lawrence authored
      [ Upstream commit 47549650 ]
      
      When team_notify_peers and team_mcast_rejoin are called, they both reset
      their respective .count_pending atomic variable. Then when the actual
      worker function is executed, the variable is atomically decremented.
      This pattern introduces a potential race condition where the
      .count_pending rolls over and the worker function keeps rescheduling
      until .count_pending decrements to zero again:
      
      THREAD 1                           THREAD 2
      
      ========                           ========
      team_notify_peers(teamX)
        atomic_set count_pending = 1
        schedule_delayed_work
                                         team_notify_peers(teamX)
                                         atomic_set count_pending = 1
      team_notify_peers_work
        atomic_dec_and_test
          count_pending = 0
        (return)
                                         schedule_delayed_work
                                         team_notify_peers_work
                                         atomic_dec_and_test
                                           count_pending = -1
                                         schedule_delayed_work
                                         (repeat until count_pending = 0)
      
      Instead of assigning a new value to .count_pending, use atomic_add to
      tack-on the additional desired worker function invocations.
      Signed-off-by: default avatarJoe Lawrence <joe.lawrence@stratus.com>
      Acked-by: default avatarJiri Pirko <jiri@resnulli.us>
      Fixes: fc423ff0 ("team: add peer notification")
      Fixes: 492b200e ("team: add support for sending multicast rejoins")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      08e2bf25
    • Nicolas Dichtel's avatar
      ip6_gre: fix flowi6_proto value in xmit path · fd496bbb
      Nicolas Dichtel authored
      [ Upstream commit 3be07244 ]
      
      In xmit path, we build a flowi6 which will be used for the output route lookup.
      We are sending a GRE packet, neither IPv4 nor IPv6 encapsulated packet, thus the
      protocol should be IPPROTO_GRE.
      
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Reported-by: default avatarMatthieu Ternisien d'Ouville <matthieu.tdo@6wind.com>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      fd496bbb
    • KY Srinivasan's avatar
      hyperv: Fix a bug in netvsc_start_xmit() · f40ce87b
      KY Srinivasan authored
      [ Upstream commit dedb845d ]
      
      After the packet is successfully sent, we should not touch the skb
      as it may have been freed. This patch is based on the work done by
      Long Li <longli@microsoft.com>.
      
      In this version of the patch I have fixed issues pointed out by David.
      David, please queue this up for stable.
      Signed-off-by: default avatarK. Y. Srinivasan <kys@microsoft.com>
      Tested-by: default avatarLong Li <longli@microsoft.com>
      Tested-by: default avatarSitsofe Wheeler <sitsofe@yahoo.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      f40ce87b
    • Eric Dumazet's avatar
      gro: fix aggregation for skb using frag_list · 960166f3
      Eric Dumazet authored
      [ Upstream commit 73d3fe6d ]
      
      In commit 8a29111c ("net: gro: allow to build full sized skb")
      I added a regression for linear skb that traditionally force GRO
      to use the frag_list fallback.
      
      Erez Shitrit found that at most two segments were aggregated and
      the "if (skb_gro_len(p) != pinfo->gso_size)" test was failing.
      
      This is because pinfo at this spot still points to the last skb in the
      chain, instead of the first one, where we find the correct gso_size
      information.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Fixes: 8a29111c ("net: gro: allow to build full sized skb")
      Reported-by: default avatarErez Shitrit <erezsh@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      960166f3
    • Soren Brinkmann's avatar
      Revert "net/macb: add pinctrl consumer support" · e2b27cfe
      Soren Brinkmann authored
      [ Upstream commit 9026968a ]
      
      This reverts commit 8ef29f8a.
      The driver core already calls pinctrl_get() and claims the default
      state. There is no need to replicate this in the driver.
      Acked-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Acked-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      e2b27cfe
    • Vlad Yasevich's avatar
      macvtap: Fix race between device delete and open. · 7eaa5736
      Vlad Yasevich authored
      [ Upstream commit 40b8fe45 ]
      
      In macvtap device delete and open calls can race and
      this causes a list curruption of the vlan queue_list.
      
      The race intself is triggered by the idr accessors
      that located the vlan device.  The device is stored
      into and removed from the idr under both an rtnl and
      a mutex.  However, when attempting to locate the device
      in idr, only a mutex is taken.  As a result, once cpu
      perfoming a delete may take an rtnl and wait for the mutex,
      while another cput doing an open() will take the idr
      mutex first to fetch the device pointer and later take
      an rtnl to add a queue for the device which may have
      just gotten deleted.
      
      With this patch, we now hold the rtnl for the duration
      of the macvtap_open() call thus making sure that
      open will not race with delete.
      
      CC: Michael S. Tsirkin <mst@redhat.com>
      CC: Jason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      7eaa5736
    • Steffen Klassert's avatar
      xfrm: Generate queueing routes only from route lookup functions · 0f398045
      Steffen Klassert authored
      [ Upstream commit b8c203b2 ]
      
      Currently we genarate a queueing route if we have matching policies
      but can not resolve the states and the sysctl xfrm_larval_drop is
      disabled. Here we assume that dst_output() is called to kill the
      queued packets. Unfortunately this assumption is not true in all
      cases, so it is possible that these packets leave the system unwanted.
      
      We fix this by generating queueing routes only from the
      route lookup functions, here we can guarantee a call to
      dst_output() afterwards.
      
      Fixes: a0073fe1 ("xfrm: Add a state resolution packet queue")
      Reported-by: default avatarKonstantinos Kolelis <k.kolelis@sirrix.com>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      0f398045
    • Steffen Klassert's avatar
      xfrm: Generate blackhole routes only from route lookup functions · 7705ff41
      Steffen Klassert authored
      [ Upstream commit f92ee619 ]
      
      Currently we genarate a blackhole route route whenever we have
      matching policies but can not resolve the states. Here we assume
      that dst_output() is called to kill the balckholed packets.
      Unfortunately this assumption is not true in all cases, so
      it is possible that these packets leave the system unwanted.
      
      We fix this by generating blackhole routes only from the
      route lookup functions, here we can guarantee a call to
      dst_output() afterwards.
      
      Fixes: 2774c131 ("xfrm: Handle blackhole route creation via afinfo.")
      Reported-by: default avatarKonstantinos Kolelis <k.kolelis@sirrix.com>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      7705ff41
    • Vlad Yasevich's avatar
      tg3: Allow for recieve of full-size 8021AD frames · 9a337cdb
      Vlad Yasevich authored
      [ Upstream commit 7d3083ee ]
      
      When receiving a vlan-tagged frame that still contains
      a vlan header, the length of the packet will be greater
      then MTU+ETH_HLEN since it will account of the extra
      vlan header.  TG3 checks this for the case for 802.1Q,
      but not for 802.1ad.  As a result, full sized 802.1ad
      frames get dropped by the card.
      
      Add a check for 802.1ad protocol when receving full
      sized frames.
      Suggested-by: default avatarPrashant Sreedharan <prashant@broadcom.com>
      CC: Prashant Sreedharan <prashant@broadcom.com>
      CC: Michael Chan <mchan@broadcom.com>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      9a337cdb