1. 21 Oct, 2013 13 commits
    • ipv6: fill rt6i_gateway with nexthop address · 550bab42
      Julian Anastasov authored
      Make sure rt6i_gateway contains nexthop information in
      all routes returned from lookup or when routes are directly
      attached to skb for generated ICMP packets.
      
      The effect of this patch should be a faster version of
      rt6_nexthop() and the consideration of local addresses as
      nexthop.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: always prefer rt6i_gateway if present · 96dc8095
      Julian Anastasov authored
      In v3.9, commit 6fd6ce20 ("ipv6: Do not depend on rt->n in
      ip6_finish_output2().") changed the behaviour of ip6_finish_output2()
      such that the recently introduced rt6_nexthop() is used
      instead of an assigned neighbor.
      
      As rt6_nexthop() prefers rt6i_gateway only for gatewayed
      routes, this causes a problem for users like IPVS, xt_TEE and
      RAW(hdrincl) if they want to use a different address for routing
      than the destination address.
      
      Another case is when a redirect creates an RTF_DYNAMIC
      route without the RTF_GATEWAY flag: rt6_nexthop() then ignores
      the rt6i_gateway.
      
      Fix the above problems by considering the rt6i_gateway if
      present, so that traffic routed to an address on the local subnet
      is not wrongly diverted to the destination address.
      
      Thanks to Simon Horman and Phil Oester for spotting the
      problematic commit.
      
      Thanks to Hannes Frederic Sowa for his review and help in testing.
      Reported-by: Phil Oester <kernel@linuxace.com>
      Reported-by: Mark Brooks <mark@loadbalancer.org>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
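The selection rule described in this commit can be sketched in userspace as follows. This is a Python model, not the kernel code: the function name `pick_nexthop` and its string arguments are hypothetical, and the unspecified address `::` is assumed to stand for "no rt6i_gateway present".

```python
import ipaddress


def pick_nexthop(gateway: str, daddr: str) -> ipaddress.IPv6Address:
    """Return the route's gateway if one is recorded, else the destination."""
    gw = ipaddress.IPv6Address(gateway)
    # "::" models an empty rt6i_gateway (an assumption of this sketch).
    if not gw.is_unspecified:
        return gw
    return ipaddress.IPv6Address(daddr)
```

With this rule, a route that carries a gateway sends traffic to that gateway whether or not it is flagged as a gatewayed route, and only routes without one fall back to the destination address.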
    • Merge branch 'bnx2x' · 4440c6f7
      David S. Miller authored
      Yuval Mintz says:
      
      ====================
      bnx2x: Bug fixes patch series
      
      This patch series contains fixes for various flows - several SR-IOV issues
      are fixed, ethtool callbacks (coalescing and register dump) are corrected,
      null pointer dereference on error flows is prevented, etc.
      
      Changes from V1
      ---------------
       - Patch 2  "bnx2x: Prevent an illegal pointer dereference during panic"
         is revised, with improved handling of edge cases.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnx2x: Set NETIF_F_HIGHDMA unconditionally · edd31476
      Merav Sicron authored
      Current driver implementation incorrectly sets the flag only if 64-bit
      DMA mask succeeded.
      Signed-off-by: Merav Sicron <meravs@broadcom.com>
      Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
      Signed-off-by: Ariel Elior <ariele@broadcom.com>
      Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnx2x: Don't pretend during register dump · 4293b9f5
      Dmitry Kravkov authored
      As part of a register dump, the interface pretends to have the identity
      of other interfaces of the same physical device in order to perform
      HW configuration for them - specifically, it needs to prevent attentions
      from being generated on those functions, as the register dump accesses
      registers in common blocks whose reading might generate an attention.
      
      However, such pretension is unsafe - unlike other flows in which the driver
      uses pretend, during register dump there is no guarantee that no other HW
      access will take place (by other flows). If such an access takes place, the
      HW will be accessed by the wrong interface, leaving both functions in an
      incorrect state.
      
      This patch removes all pretensions from the register dump flow. Instead, it
      changes the initial configuration of attentions such that no fatal attention
      will be generated for other functions as a result of the register dump.
      (Note, however, that a debug print claiming an attention from other functions
      IS possible during the register dump.)
      Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
      Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
      Signed-off-by: Ariel Elior <ariele@broadcom.com>
      Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnx2x: Lock DMAE when used by statistic flow · 32316a46
      Ariel Elior authored
      bnx2x has several clients to its DMAE machines - all of them with the exception
      of the statistics flow used the same locking mechanisms to synchronize the DMAE
      machines' usage.
      
      Since statistics (which are periodically entered) use DMAE without taking the
      locks, they may erase the commands which were previously set -
      e.g., it may cause a VF to timeout while waiting for a PF answer on the VF-PF
      channel as that command header would have been overwritten by the statistics'
      header.
      
      This patch makes certain that all flows utilizing DMAE will use the same
      API, assuring that the locking scheme will be kept by all said flows.
      Signed-off-by: Ariel Elior <ariele@broadcom.com>
      Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
      Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
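The idea of routing every DMAE client through one locked entry point can be illustrated with a toy model. This is a Python sketch under stated assumptions, not driver code: `DmaeChannel`, `post`, and the `header`/`log` attributes are hypothetical names, and a single shared `header` slot stands in for the command header that the statistics flow could overwrite.

```python
import threading


class DmaeChannel:
    """Toy model of one shared command slot guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.header = None   # shared slot that an unlocked writer could clobber
        self.log = []        # (owner, header seen at completion) pairs

    def post(self, owner: str, header: str) -> None:
        # Single entry point: every client, statistics included, takes the lock.
        with self._lock:
            self.header = header
            # ... the command would be issued and completed here ...
            self.log.append((owner, self.header))
```

Because no caller can write `header` without holding the lock, each client always completes with the header it wrote itself.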
    • bnx2x: Prevent null pointer dereference on error flow · 6b991c37
      Yuval Mintz authored
      If debug messages are enabled and bnx2x_vfop_qdtor_cmd() were to fail,
      the resulting print would cause a null pointer dereference.
      Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
      Signed-off-by: Ariel Elior <ariele@broadcom.com>
      Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnx2x: Fix config when SR-IOV and iSCSI are enabled · 0907f34c
      Ariel Elior authored
      Starting with commit b9871bcf "bnx2x: VF RSS support - PF side", if a PF
      has SR-IOV supported in its PCI configuration space, storage drivers will not
      work for that interface.
      
      This patch fixes the resource calculation to allow such a configuration to
      properly work.
      Signed-off-by: Ariel Elior <ariele@broadcom.com>
      Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
      Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnx2x: Fix Coalescing configuration · 6802516e
      Dmitry Kravkov authored
      bnx2x drivers configure coalescing incorrectly (e.g., as a result of a call
      to 'ethtool -c'). Although this is almost invisible to the user (due to NAPI),
      designated tests will show the configuration is incorrect.
      Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
      Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
      Signed-off-by: Ariel Elior <ariele@broadcom.com>
      Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnx2x: Unlock VF-PF channel on MAC/VLAN config error · 31329afd
      Ariel Elior authored
      Current code returns upon failure, leaving the VF-PF channel in an unusable
      state. This patch adds the missing release so that further commands can pass
      between PF and VF.
      Signed-off-by: Ariel Elior <ariele@broadcom.com>
      Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
      Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
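The general pattern behind this fix, releasing a lock on every exit path including the error path, can be sketched as below. This is a Python illustration, not the driver's code: `channel_lock` and `configure_filters` are hypothetical names, and a `ValueError` is assumed to stand in for a failed MAC/VLAN configuration.

```python
import threading

channel_lock = threading.Lock()  # hypothetical stand-in for the VF-PF channel lock


def configure_filters(apply_config) -> bool:
    """Run apply_config() under the channel lock; release it on every exit path."""
    channel_lock.acquire()
    try:
        apply_config()
        return True
    except ValueError:
        # Error path: report the failure; the finally clause still releases the lock.
        return False
    finally:
        channel_lock.release()
```

An early return that skipped the release would leave the lock held, and every later command on the channel would block.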
    • bnx2x: Prevent an illegal pointer dereference during panic · 1a6974b2
      Yuval Mintz authored
      During a panic, the driver tries to print the Management FW buffer of recent
      commands. To do so, the driver reads the address of that buffer from a known
      address. If the buffer is unavailable (e.g., PCI reads don't work, MCP is
      failing, etc.), the driver will try to access the address it has read, possibly
      causing a kernel panic.
      
      This check 'sanitizes' the access, validating that the read value is indeed a
      valid address inside the management FW's buffers.
      The patch also removes a read outside the scope of the buffer, which resulted
      in some unrelated characters appearing in the log.
      Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
      Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
      Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
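The two safeguards named in this commit, validating an address before following it and never reading past the end of the buffer, can be sketched as a plain range check. This is a Python model under stated assumptions: `is_valid_trace_addr`, `read_trace`, and the flat `mem` byte string are hypothetical and only mimic the shape of the check.

```python
def is_valid_trace_addr(addr: int, buf_start: int, buf_size: int) -> bool:
    """Accept addr only if it lies inside [buf_start, buf_start + buf_size)."""
    return buf_start <= addr < buf_start + buf_size


def read_trace(mem: bytes, addr: int, buf_start: int, buf_size: int) -> bytes:
    """Return the trace bytes from addr to the end of the buffer, or b'' if addr is bogus."""
    # Refuse to follow a value that does not point into the buffer.
    if not is_valid_trace_addr(addr, buf_start, buf_size):
        return b""
    # Stop at the end of the buffer so no unrelated bytes are included.
    return mem[addr:buf_start + buf_size]
```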
    • bnx2x: Fix Maximum CoS estimation for VFs · b1239723
      Yuval Mintz authored
      bnx2x VFs do not support Multi-CoS; the current implementation
      erroneously sets the VF's maximal number of CoS to be > 1.
      
      This causes the driver to call alloc_etherdev_mqs() with
      a number of queues it cannot possibly support, and is reflected
      in 'odd' driver prints.
      Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
      Signed-off-by: Ariel Elior <ariele@broadcom.com>
      Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers: net: cpsw: fix kernel warn during iperf test with interrupt pacing · 49595b7b
      Mugunthan V N authored
      When interrupt pacing is enabled, receive/transmit statistics are not
      updated properly by hardware, which leads the ISR to return IRQ_NONE;
      in turn, the kernel disables the interrupt. This patch removes the checking
      of receive/transmit statistics from the ISR.
      
      This patch is verified with AM335x Beagle Bone Black and below is the
      kernel warn when interrupt pacing is enabled.
      
      [  104.298254] irq 58: nobody cared (try booting with the "irqpoll" option)
      [  104.305356] CPU: 0 PID: 1073 Comm: iperf Not tainted 3.12.0-rc3-00342-g77d4015b #3
      [  104.313284] [<c001bb84>] (unwind_backtrace+0x0/0xf0) from [<c0017db0>] (show_stack+0x10/0x14)
      [  104.322282] [<c0017db0>] (show_stack+0x10/0x14) from [<c0507920>] (dump_stack+0x78/0x94)
      [  104.330816] [<c0507920>] (dump_stack+0x78/0x94) from [<c0088c1c>] (__report_bad_irq+0x20/0xc0)
      [  104.339889] [<c0088c1c>] (__report_bad_irq+0x20/0xc0) from [<c008912c>] (note_interrupt+0x1dc/0x23c)
      [  104.349505] [<c008912c>] (note_interrupt+0x1dc/0x23c) from [<c0086d74>] (handle_irq_event_percpu+0xc4/0x238)
      [  104.359851] [<c0086d74>] (handle_irq_event_percpu+0xc4/0x238) from [<c0086f24>] (handle_irq_event+0x3c/0x5c)
      [  104.370198] [<c0086f24>] (handle_irq_event+0x3c/0x5c) from [<c008991c>] (handle_level_irq+0xac/0x10c)
      [  104.379907] [<c008991c>] (handle_level_irq+0xac/0x10c) from [<c00866d8>] (generic_handle_irq+0x20/0x30)
      [  104.389812] [<c00866d8>] (generic_handle_irq+0x20/0x30) from [<c0014ce8>] (handle_IRQ+0x4c/0xb0)
      [  104.399066] [<c0014ce8>] (handle_IRQ+0x4c/0xb0) from [<c000856c>] (omap3_intc_handle_irq+0x60/0x74)
      [  104.408598] [<c000856c>] (omap3_intc_handle_irq+0x60/0x74) from [<c050d8e4>] (__irq_svc+0x44/0x5c)
      [  104.418021] Exception stack(0xde4f7c00 to 0xde4f7c48)
      [  104.423345] 7c00: 00000001 00000000 00000000 dd002140 60000013 de006e54 00000002 00000000
      [  104.431952] 7c20: de345748 00000040 c11c8588 00018ee0 00000000 de4f7c48 c009dfc8 c050d300
      [  104.440553] 7c40: 60000013 ffffffff
      [  104.444237] [<c050d8e4>] (__irq_svc+0x44/0x5c) from [<c050d300>] (_raw_spin_unlock_irqrestore+0x34/0x44)
      [  104.454220] [<c050d300>] (_raw_spin_unlock_irqrestore+0x34/0x44) from [<c00868c0>] (__irq_put_desc_unlock+0x14/0x38)
      [  104.465295] [<c00868c0>] (__irq_put_desc_unlock+0x14/0x38) from [<c0088068>] (enable_irq+0x4c/0x74)
      [  104.474829] [<c0088068>] (enable_irq+0x4c/0x74) from [<c03abd24>] (cpsw_poll+0xb8/0xdc)
      [  104.483276] [<c03abd24>] (cpsw_poll+0xb8/0xdc) from [<c044ef68>] (net_rx_action+0xc0/0x1e8)
      [  104.492085] [<c044ef68>] (net_rx_action+0xc0/0x1e8) from [<c0048a90>] (__do_softirq+0x100/0x27c)
      [  104.501338] [<c0048a90>] (__do_softirq+0x100/0x27c) from [<c0048cd0>] (do_softirq+0x68/0x70)
      [  104.510224] [<c0048cd0>] (do_softirq+0x68/0x70) from [<c0048e8c>] (local_bh_enable+0xd0/0xe4)
      [  104.519211] [<c0048e8c>] (local_bh_enable+0xd0/0xe4) from [<c048c774>] (tcp_rcv_established+0x450/0x648)
      [  104.529201] [<c048c774>] (tcp_rcv_established+0x450/0x648) from [<c0494904>] (tcp_v4_do_rcv+0x154/0x474)
      [  104.539195] [<c0494904>] (tcp_v4_do_rcv+0x154/0x474) from [<c043d750>] (release_sock+0xac/0x1ac)
      [  104.548448] [<c043d750>] (release_sock+0xac/0x1ac) from [<c04844e8>] (tcp_recvmsg+0x4d0/0xa8c)
      [  104.557528] [<c04844e8>] (tcp_recvmsg+0x4d0/0xa8c) from [<c04a8720>] (inet_recvmsg+0xcc/0xf0)
      [  104.566507] [<c04a8720>] (inet_recvmsg+0xcc/0xf0) from [<c0439744>] (sock_recvmsg+0x90/0xb0)
      [  104.575394] [<c0439744>] (sock_recvmsg+0x90/0xb0) from [<c043b778>] (SyS_recvfrom+0x88/0xd8)
      [  104.584280] [<c043b778>] (SyS_recvfrom+0x88/0xd8) from [<c043b7e0>] (sys_recv+0x18/0x20)
      [  104.592805] [<c043b7e0>] (sys_recv+0x18/0x20) from [<c0013da0>] (ret_fast_syscall+0x0/0x48)
      [  104.601587] handlers:
      [  104.603992] [<c03acd94>] cpsw_interrupt
      [  104.608040] Disabling IRQ #58
      
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 19 Oct, 2013 6 commits
    • Merge branch 'ufo_fixes' · 77d4015b
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      UFO fixes
      
      Couple of patches fixing UFO functionality in different situations.
      
      v1->v2:
      - minor if{}else{} coding style adjustment suggested by Sergei Shtylyov
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip_output: do skb ufo init for peeked non ufo skb as well · e93b7d74
      Jiri Pirko authored
      Currently, if a user application does:
      sendto len<mtu flag MSG_MORE
      sendto len>mtu flag 0
      the skb is not treated as a fragmented one because it is not initialized
      that way. So move the initialization to fix this.
      
      introduced by:
      commit e89e9cf5 "[IPv4/IPv6]: UFO Scatter-gather approach"
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_output: do skb ufo init for peeked non ufo skb as well · c547dbf5
      Jiri Pirko authored
      Currently, if a user application does:
      sendto len<mtu flag MSG_MORE
      sendto len>mtu flag 0
      the skb is not treated as a fragmented one because it is not initialized
      that way. So move the initialization to fix this.
      
      introduced by:
      commit e89e9cf5 "[IPv4/IPv6]: UFO Scatter-gather approach"
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp6: respect IPV6_DONTFRAG sockopt in case there are pending frames · e36d3ff9
      Jiri Pirko authored
      If up->pending != 0, dontfrag is left with its default value of -1. This
      causes applications that do:
      sendto len>mtu flag MSG_MORE
      sendto len>mtu flag 0
      to receive an EMSGSIZE errno as the result of the second sendto.
      
      This patch fixes it by respecting IPV6_DONTFRAG socket option.
      
      introduced by:
      commit 4b340ae2 "IPv6: Complete IPV6_DONTFRAG support"
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fix cipso packet validation when !NETLABEL · f2e5ddcc
      Seif Mazareeb authored
      When CONFIG_NETLABEL is disabled, the cipso_v4_validate() function could loop
      forever in the main loop if opt[opt_iter + 1] == 0. This causes a kernel
      crash in an SMP system, since the CPU executing this function will
      stall and not respond to IPIs.
      
      This problem can be reproduced by running the IP Stack Integrity Checker
      (http://isic.sourceforge.net) using the following command on a Linux machine
      connected to DUT:
      
      "icmpsic -s rand -d <DUT IP address> -r 123456"
      wait (1-2 min)
      Signed-off-by: Seif Mazareeb <seif@marvell.com>
      Acked-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
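The failure mode here is generic to any parser that advances by a length byte taken from the packet: a zero length never advances the cursor. The sketch below shows one way to guard such a loop. It is a Python illustration, not cipso_v4_validate() itself; the function name `walk_tags` and the plain type/length/value layout are assumptions.

```python
def walk_tags(opt: bytes, start: int = 0) -> list:
    """Split opt into type/length/value tags; raise ValueError on a bad length."""
    tags = []
    i = start
    while i < len(opt):
        if i + 1 >= len(opt):
            raise ValueError(f"truncated tag at offset {i}")
        tag_len = opt[i + 1]
        # A length of 0 would leave i unchanged and loop forever; anything
        # below 2 cannot even cover the type and length bytes themselves.
        if tag_len < 2 or tag_len > len(opt) - i:
            raise ValueError(f"bad tag length at offset {i + 1}")
        tags.append(opt[i:i + tag_len])
        i += tag_len
    return tags
```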
    • net: unix: inherit SOCK_PASS{CRED, SEC} flags from socket to fix race · 90c6bd34
      Daniel Borkmann authored
      In the case of credentials passing in unix stream sockets (dgram
      sockets seem not affected), we get a rather sparse race after
      commit 16e57262 ("af_unix: dont send SCM_CREDENTIALS by default").
      
      We have a stream server on receiver side that requests credential
      passing from senders (e.g. nc -U). Since we need to set SO_PASSCRED
      on each spawned/accepted socket on server side to 1 first (as it's
      not inherited), it can happen that in the time between accept() and
      setsockopt() we get interrupted, the sender is being scheduled and
      continues with passing data to our receiver. At that time SO_PASSCRED
      is neither set on sender nor receiver side, hence in cmsg's
      SCM_CREDENTIALS we get eventually pid:0, uid:65534, gid:65534
      (== overflow{u,g}id) instead of what we actually would like to see.
      
      On the sender side, here nc -U, the tests in maybe_add_creds()
      invoked through unix_stream_sendmsg() would fail, as at that exact
      time, as mentioned, the sender has neither SO_PASSCRED on his side
      nor sees it on the server side, and we have a valid 'other' socket
      in place. Thus, sender believes it would just look like a normal
      connection, not needing/requesting SO_PASSCRED at that time.
      
      As reverting 16e57262 would not be an option due to the significant
      performance regression reported when having creds always passed,
      one way/trade-off to prevent that would be to set SO_PASSCRED on
      the listener socket and allow inheriting these flags to the spawned
      socket on server side in accept(). It seems also logical to do so
      if we'd tell the listener socket to pass those flags onwards, and
      would fix the race.
      
      Before, strace:
      
      recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
              msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
              cmsg_type=SCM_CREDENTIALS{pid=0, uid=65534, gid=65534}},
              msg_flags=0}, 0) = 5
      
      After, strace:
      
      recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
              msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
              cmsg_type=SCM_CREDENTIALS{pid=11580, uid=1000, gid=1000}},
              msg_flags=0}, 0) = 5
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
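From the server's point of view, the approach described in this commit amounts to requesting credentials on the listening socket instead of on each accepted socket. The following Python sketch (Linux only) shows that usage. The helper name `passcred_flags` and the temporary socket path are hypothetical, and whether the accepted socket reports the flag depends on the running kernel including this change.

```python
import os
import socket
import tempfile


def passcred_flags():
    """Set SO_PASSCRED on a listener; return the flag as seen on (listener, accepted)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "srv.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as listener, \
             socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            # Request credentials before any client can connect, so there is
            # no window between accept() and a later setsockopt().
            listener.setsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED, 1)
            listener.bind(path)
            listener.listen(1)
            client.connect(path)
            conn, _ = listener.accept()
            with conn:
                return (
                    listener.getsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED),
                    conn.getsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED),
                )
```

On a kernel with the inheritance in place, the second value is expected to match the first, so the server no longer needs a per-connection setsockopt() call.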
  3. 18 Oct, 2013 8 commits
    • qlcnic: Validate Tx queue only for 82xx adapters. · 66c562ef
      Himanshu Madhani authored
      Validate the Tx queue only for adapters which support
      multiple Tx queues.
      
      This patch fixes a regression introduced in commit
      aa4a1f7d
      "qlcnic: Enable Tx queue changes using ethtool for 82xx Series adapter"
      Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • be2net: pass if_id for v1 and V2 versions of TX_CREATE cmd · 0fb88d61
      Vasundhara Volam authored
      It is a required field for all TX_CREATE cmd versions > 0.
      This fixes a driver initialization failure, caused by recent SH-R Firmwares
      (versions > 10.0.639.0) failing the TX_CREATE cmd when if_id field is
      not passed.
      Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • wanxl: fix info leak in ioctl · 2b13d06c
      Salva Peiró authored
      The wanxl_ioctl() code fails to initialize the two padding bytes of
      struct sync_serial_settings after the ->loopback member. Add an explicit
      memset(0) before filling the structure to avoid the info leak.
      Signed-off-by: Salva Peiró <speiro@ai2.upv.es>
      Signed-off-by: David S. Miller <davem@davemloft.net>
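Why two padding bytes exist, and why zeroing the whole structure first closes the leak, can be shown with a userspace analogy. This Python sketch uses the `struct` module to mimic the C layout. The field list (two unsigned ints followed by an unsigned short named loopback) is an assumption about struct sync_serial_settings, and `fill_settings` is a hypothetical helper.

```python
import struct

# Native layout of {unsigned int, unsigned int, unsigned short}; the trailing
# "0I" pads the end to int alignment, as a C compiler would for the struct.
PADDED = struct.Struct("IIH0I")
PACKED = struct.Struct("=IIH")  # the same fields with no padding


def fill_settings(clock_rate: int, clock_type: int, loopback: int) -> bytes:
    """Fill the padded layout field by field after zeroing the whole buffer."""
    buf = bytearray(b"\xAA" * PADDED.size)  # stale bytes standing in for old stack contents
    buf[:] = bytes(PADDED.size)             # the memset(0) step
    # Writing only the fields leaves the padding bytes untouched.
    struct.pack_into("IIH", buf, 0, clock_rate, clock_type, loopback)
    return bytes(buf)
```

Without the zeroing line, the bytes after the last field would still hold the stale `0xAA` pattern when the buffer is copied out, which is the analogue of the info leak.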
    • Merge branch 'bridge_pvid' · b8bde1c4
      David S. Miller authored
      Toshiaki Makita says:
      
      ====================
      bridge: Fix problems around the PVID
      
      There seem to be some undesirable behaviors related to the PVID.
      1. Assigning a PVID to a port has no effect. The PVID cannot be applied
      to any frame regardless of whether we set it or not.
      2. FDB entries learned via frames to which the PVID is applied are
      registered with VID 0 rather than the VID value of the PVID.
      3. We can set 0 or 4095 as a PVID, which is not allowed in IEEE 802.1Q.
      This leads to interoperability problems such as sending frames with VID
      4095, which is not allowed in IEEE 802.1Q, and treating frames with VID
      0 as if they belong to VLAN 0, whereas they are expected to be handled as
      having no VID according to IEEE 802.1Q.
      
      Note: the 2nd and 3rd problems are latent and not exposed unless the 1st
      problem is fixed, because we cannot activate the PVID due to it.
      
      This is my analysis for each behavior.
      1. We are using VLAN_TAG_PRESENT bit when getting PVID, and not when
      adding/deleting PVID.
      It can be fixed in either way using or not using VLAN_TAG_PRESENT,
      but I think the latter is slightly more efficient.
      
      2. We are setting skb->vlan_tci with the value of PVID but the variable
      vid, which is used in FDB later, is set to 0 at br_allowed_ingress()
      when untagged frames arrive at a port with PVID valid. I'm afraid that
      vid should be updated to the value of PVID if PVID is valid.
      
      3. According to IEEE 802.1Q-2011 (6.9.1 and Table 9-2), we cannot use
      VID 0 or 4095 as a PVID.
      It looks like there is more to consider.
      
      - VID 0:
      VID 0 shall not be configured in any FDB entry and used in a tag header
      to indicate it is a 802.1p priority-tagged frame.
      Priority-tagged frames should be applied PVID (from IEEE 802.1Q 6.9.1).
      In my opinion, since we can filter incoming priority-tagged frames by
      deleting PVID, we don't need to filter them by vlan_bitmap.
      In other words, priority-tagged frames don't have VID 0 but have no VID,
      which is the same as untagged frames, and should be filtered by unsetting
      PVID.
      So, not only we cannot set PVID as 0, but also we don't need to add 0 to
      vlan_bitmap, which enables us to simply forbid to add vlan 0.
      
      - VID 4095:
      VID 4095 shall not be transmitted in a tag header. This VID value may be
      used to indicate a wildcard match for the VID in management operations or
      FDB entries (from IEEE 802.1Q Table 9-2).
      In current implementation, we can create a static FDB entry with all
      existing VIDs by not specifying any VID when creating it.
      I don't think this way to add wildcard-like entries needs to change,
      and VID 4095 looks no use and can be unacceptable to add.
      
      Consequently, I believe what we should do for 3rd problem is below:
      - Not allowing VID 0 and 4095 to be added.
      - Applying PVID to priority-tagged (VID 0) frames.
      
      Note: It has been discovered that another problem related to priority-tags
      remains. If we use vlan 0 interface such as eth0.0, we cannot communicate
      with another end station via a linux bridge.
      This problem exists regardless of whether this patch set is applied or not
      because we might receive untagged frames from another end station even if we
      are sending priority-tagged frames.
      This issue will be addressed by another patch set introducing an additional
      egress policy, on which Vlad Yasevich is working.
      See http://marc.info/?t=137880893800001&r=1&w=2 for detailed discussion.
      
      Patch set follows this mail.
      The order of patches is not the same as described above, because the way
      to fix 1st problem is based on the assumption that we don't use VID 0 as
      a PVID, which is realized by fixing 3rd problem.
      (1/4)(2/4): Fix 3rd problem.
      (3/4): Fix 1st problem.
      (4/4): Fix 2nd problem.
      
      v2:
      - Add descriptions about the problem related to priority-tags in cover letter.
      - Revise patch comments to reference the newest spec.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: Fix updating FDB entries when the PVID is applied · dfb5fa32
      Toshiaki Makita authored
      We currently set the value that variable vid points to, which will be
      used in FDB later, to 0 at br_allowed_ingress() when we receive untagged
      or priority-tagged frames, even though the PVID is valid.
      This leads to wrong FDB updates: entries are learned with
      VID 0.
      Update the value to that of the PVID if the PVID is applied.
      Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: Fix the way the PVID is referenced · d1c6c708
      Toshiaki Makita authored
      We are using the VLAN_TAG_PRESENT bit to detect whether the PVID is
      set or not at br_get_pvid(), while we don't care about the bit in
      adding/deleting the PVID, which makes it impossible to forward any
      incoming untagged frame with vlan_filtering enabled.
      
      Since vid 0 cannot be used for the PVID, we can use vid 0 to indicate
      that the PVID is not set, which is slightly more efficient than using
      the VLAN_TAG_PRESENT.
      
      Fix the problem by getting rid of using the VLAN_TAG_PRESENT.
      Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: Apply the PVID to priority-tagged frames · b90356ce
      Toshiaki Makita authored
      IEEE 802.1Q says that when we receive priority-tagged (VID 0) frames,
      we should use the PVID of the port as their VID.
      (See IEEE 802.1Q-2011 6.9.1 and Table 9-2)
      
      Apply the PVID to not only untagged frames but also priority-tagged frames.
      Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
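The ingress classification described across these bridge commits can be summarized in a small model. This is a Python sketch, not br_allowed_ingress(): the function `ingress_vid` is hypothetical, it ignores the per-port vlan_bitmap membership check, and it assumes that a PVID of 0 means "no PVID configured" and that frames needing a PVID are dropped when none is set.

```python
from typing import Optional


def ingress_vid(tag_vid: Optional[int], pvid: int) -> Optional[int]:
    """Return the VID a received frame is classified into, or None if it is dropped.

    tag_vid is None for an untagged frame and 0 for a priority-tagged one.
    pvid == 0 means that no PVID is configured on the port.
    """
    if tag_vid is None or tag_vid == 0:
        # Untagged and priority-tagged frames both take the port's PVID.
        return pvid if pvid != 0 else None
    return tag_vid
```

The returned value is also the VID that address learning would use, so an untagged frame on a port with PVID 10 is learned under VID 10 rather than VID 0.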
    • bridge: Don't use VID 0 and 4095 in vlan filtering · 8adff41c
      Toshiaki Makita authored
      IEEE 802.1Q says that:
      - VID 0 shall not be configured as a PVID, or configured in any Filtering
      Database entry.
      - VID 4095 shall not be configured as a PVID, or transmitted in a tag
      header. This VID value may be used to indicate a wildcard match for the VID
      in management operations or Filtering Database entries.
      (See IEEE 802.1Q-2011 6.9.1 and Table 9-2)
      
      Don't accept adding these VIDs in the vlan_filtering implementation.
      Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
      Acked-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
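The rule in this commit reduces to a range check on the 12-bit VID. A minimal sketch, assuming a hypothetical helper `vid_is_configurable` (the constant name mirrors the usual 4096-entry VID space but is defined locally here):

```python
VLAN_N_VID = 4096  # size of the 12-bit VID space


def vid_is_configurable(vid: int) -> bool:
    """True for VIDs usable as a PVID or in a filtering entry: 1..4094."""
    # 0 marks priority-tagged frames and 4095 is reserved, so both are refused.
    return 0 < vid < VLAN_N_VID - 1
```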
  4. 17 Oct, 2013 13 commits