1. 10 Feb, 2017 26 commits
    • Ido Schimmel's avatar
      ipv4: fib: Only flush FIB aliases belonging to currently flushed table · 58e3bdd5
      Ido Schimmel authored
      In case the MAIN table is flushed and its trie is shared with the LOCAL
      table, then we might be flushing FIB aliases belonging to the latter.
      This can lead to FIB_ENTRY_DEL notifications sent with the wrong table
      ID.
      
      The above doesn't affect current listeners, as the table ID is ignored
      during entry deletion, but this will change later in the patchset.
      
      When flushing a particular table, skip any aliases belonging to a
      different one.
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarJiri Pirko <jiri@mellanox.com>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Patrick McHardy <kaber@trash.net>
      Reviewed-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      58e3bdd5
    • David S. Miller's avatar
    • Jarno Rajahalme's avatar
      openvswitch: Pack struct sw_flow_key. · 316d4d78
      Jarno Rajahalme authored
      struct sw_flow_key has two 16-bit holes. Move the most matched
      conntrack match fields there.  In some typical cases this reduces the
      size of the key that needs to be hashed into half and into one cache
      line.
      Signed-off-by: default avatarJarno Rajahalme <jarno@ovn.org>
      Acked-by: default avatarJoe Stringer <joe@ovn.org>
      Acked-by: default avatarPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      316d4d78
    • Jarno Rajahalme's avatar
      openvswitch: Add force commit. · dd41d33f
      Jarno Rajahalme authored
      Stateful network admission policy may allow connections to one
      direction and reject connections initiated in the other direction.
      After policy change it is possible that for a new connection an
      overlapping conntrack entry already exists, where the original
      direction of the existing connection is opposed to the new
      connection's initial packet.
      
      Most importantly, conntrack state relating to the current packet gets
      the "reply" designation based on whether the original direction tuple
      or the reply direction tuple matched.  If this "directionality" is
      wrong w.r.t. to the stateful network admission policy it may happen
      that packets in neither direction are correctly admitted.
      
      This patch adds a new "force commit" option to the OVS conntrack
      action that checks the original direction of an existing conntrack
      entry.  If that direction is opposed to the current packet, the
      existing conntrack entry is deleted and a new one is subsequently
      created in the correct direction.
      Signed-off-by: default avatarJarno Rajahalme <jarno@ovn.org>
      Acked-by: default avatarPravin B Shelar <pshelar@ovn.org>
      Acked-by: default avatarJoe Stringer <joe@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dd41d33f
    • Jarno Rajahalme's avatar
      openvswitch: Add original direction conntrack tuple to sw_flow_key. · 9dd7f890
      Jarno Rajahalme authored
      Add the fields of the conntrack original direction 5-tuple to struct
      sw_flow_key.  The new fields are initially marked as non-existent, and
      are populated whenever a conntrack action is executed and either finds
      or generates a conntrack entry.  This means that these fields exist
      for all packets that were not rejected by conntrack as untrackable.
      
      The original tuple fields in the sw_flow_key are filled from the
      original direction tuple of the conntrack entry relating to the
      current packet, or from the original direction tuple of the master
      conntrack entry, if the current conntrack entry has a master.
      Generally, expected connections of connections having an assigned
      helper (e.g., FTP), have a master conntrack entry.
      
      The main purpose of the new conntrack original tuple fields is to
      allow matching on them for policy decision purposes, with the premise
      that the admissibility of tracked connections reply packets (as well
      as original direction packets), and both direction packets of any
      related connections may be based on ACL rules applying to the master
      connection's original direction 5-tuple.  This also makes it easier to
      make policy decisions when the actual packet headers might have been
      transformed by NAT, as the original direction 5-tuple represents the
      packet headers before any such transformation.
      
      When using the original direction 5-tuple the admissibility of return
      and/or related packets need not be based on the mere existence of a
      conntrack entry, allowing separation of admission policy from the
      established conntrack state.  While existence of a conntrack entry is
      required for admission of the return or related packets, policy
      changes can render connections that were initially admitted to be
      rejected or dropped afterwards.  If the admission of the return and
      related packets was based on mere conntrack state (e.g., connection
      being in an established state), a policy change that would make the
      connection rejected or dropped would need to find and delete all
      conntrack entries affected by such a change.  When using the original
      direction 5-tuple matching the affected conntrack entries can be
      allowed to time out instead, as the established state of the
      connection would not need to be the basis for packet admission any
      more.
      
      It should be noted that the directionality of related connections may
      be the same or different than that of the master connection, and
      neither the original direction 5-tuple nor the conntrack state bits
      carry this information.  If needed, the directionality of the master
      connection can be stored in master's conntrack mark or labels, which
      are automatically inherited by the expected related connections.
      
      The fact that neither ARP nor ND packets are trackable by conntrack
      allows mutual exclusion between ARP/ND and the new conntrack original
      tuple fields.  Hence, the IP addresses are overlaid in union with ARP
      and ND fields.  This allows the sw_flow_key to not grow much due to
      this patch, but it also means that we must be careful to never use the
      new key fields with ARP or ND packets.  ARP is easy to distinguish and
      keep mutually exclusive based on the ethernet type, but ND being an
      ICMPv6 protocol requires a bit more attention.
      Signed-off-by: default avatarJarno Rajahalme <jarno@ovn.org>
      Acked-by: default avatarJoe Stringer <joe@ovn.org>
      Acked-by: default avatarPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9dd7f890
    • Jarno Rajahalme's avatar
      openvswitch: Inherit master's labels. · 09aa98ad
      Jarno Rajahalme authored
      We avoid calling into nf_conntrack_in() for expected connections, as
      that would remove the expectation that we want to stick around until
      we are ready to commit the connection.  Instead, we do a lookup in the
      expectation table directly.  However, after a successful expectation
      lookup we have set the flow key label field from the master
      connection, whereas nf_conntrack_in() does not do this.  This leads to
      master's labels being inherited after an expectation lookup, but those
      labels not being inherited after the corresponding conntrack action
      with a commit flag.
      
      This patch resolves the problem by changing the commit code path to
      also inherit the master's labels to the expected connection.
      Resolving this conflict in favor of inheriting the labels allows more
      information be passed from the master connection to related
      connections, which would otherwise be much harder if the 32 bits in
      the connmark are not enough.  Labels can still be set explicitly, so
      this change only affects the default values of the labels in presense
      of a master connection.
      
      Fixes: 7f8a436e ("openvswitch: Add conntrack action")
      Signed-off-by: default avatarJarno Rajahalme <jarno@ovn.org>
      Acked-by: default avatarPravin B Shelar <pshelar@ovn.org>
      Acked-by: default avatarJoe Stringer <joe@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      09aa98ad
    • Jarno Rajahalme's avatar
      openvswitch: Refactor labels initialization. · 6ffcea79
      Jarno Rajahalme authored
      Refactoring conntrack labels initialization makes changes in later
      patches easier to review.
      Signed-off-by: default avatarJarno Rajahalme <jarno@ovn.org>
      Acked-by: default avatarPravin B Shelar <pshelar@ovn.org>
      Acked-by: default avatarJoe Stringer <joe@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6ffcea79
    • Jarno Rajahalme's avatar
      openvswitch: Simplify labels length logic. · b87cec38
      Jarno Rajahalme authored
      Since 23014011 ("netfilter: conntrack: support a fixed size of 128
      distinct labels"), the size of conntrack labels extension has fixed to
      128 bits, so we do not need to check for labels sizes shorter than 128
      at run-time.  This patch simplifies labels length logic accordingly,
      but allows the conntrack labels size to be increased in the future
      without breaking the build.  In the event of conntrack labels
      increasing in size OVS would still be able to deal with the 128 first
      label bits.
      Suggested-by: default avatarJoe Stringer <joe@ovn.org>
      Signed-off-by: default avatarJarno Rajahalme <jarno@ovn.org>
      Acked-by: default avatarPravin B Shelar <pshelar@ovn.org>
      Acked-by: default avatarJoe Stringer <joe@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b87cec38
    • Jarno Rajahalme's avatar
      openvswitch: Unionize ovs_key_ct_label with a u32 array. · cb80d58f
      Jarno Rajahalme authored
      Make the array of labels in struct ovs_key_ct_label an union, adding a
      u32 array of the same byte size as the existing u8 array.  It is
      faster to loop through the labels 32 bits at the time, which is also
      the alignment of netlink attributes.
      Signed-off-by: default avatarJarno Rajahalme <jarno@ovn.org>
      Acked-by: default avatarJoe Stringer <joe@ovn.org>
      Acked-by: default avatarPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cb80d58f
    • Jarno Rajahalme's avatar
      openvswitch: Do not trigger events for unconfirmed connections. · 193e3096
      Jarno Rajahalme authored
      Receiving change events before the 'new' event for the connection has
      been received can be confusing.  Avoid triggering change events for
      setting conntrack mark or labels before the conntrack entry has been
      confirmed.
      
      Fixes: 182e3042 ("openvswitch: Allow matching on conntrack mark")
      Fixes: c2ac6673 ("openvswitch: Allow matching on conntrack label")
      Signed-off-by: default avatarJarno Rajahalme <jarno@ovn.org>
      Acked-by: default avatarJoe Stringer <joe@ovn.org>
      Acked-by: default avatarPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      193e3096
    • Jarno Rajahalme's avatar
      openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted. · 9ff464db
      Jarno Rajahalme authored
      The conntrack lookup for existing connections fails to invert the
      packet 5-tuple for NATted packets, and therefore fails to find the
      existing conntrack entry.  Conntrack only stores 5-tuples for incoming
      packets, and there are various situations where a lookup on a packet
      that has already been transformed by NAT needs to be made.  Looking up
      an existing conntrack entry upon executing packet received from the
      userspace is one of them.
      
      This patch fixes ovs_ct_find_existing() to invert the packet 5-tuple
      for the conntrack lookup whenever the packet has already been
      transformed by conntrack from its input form as evidenced by one of
      the NAT flags being set in the conntrack state metadata.
      
      Fixes: 05752523 ("openvswitch: Interface with NAT.")
      Signed-off-by: default avatarJarno Rajahalme <jarno@ovn.org>
      Acked-by: default avatarJoe Stringer <joe@ovn.org>
      Acked-by: default avatarPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9ff464db
    • Jarno Rajahalme's avatar
      openvswitch: Fix comments for skb->_nfct · 5e17da63
      Jarno Rajahalme authored
      Fix comments referring to skb 'nfct' and 'nfctinfo' fields now that
      they are combined into '_nfct'.
      Signed-off-by: default avatarJarno Rajahalme <jarno@ovn.org>
      Acked-by: default avatarPravin B Shelar <pshelar@ovn.org>
      Acked-by: default avatarJoe Stringer <joe@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5e17da63
    • David S. Miller's avatar
      Merge branch 'ena-bug-fixes' · 9878f602
      David S. Miller authored
      Netanel Belgazal says:
      
      ====================
      Bug Fixes in ENA driver
      
      Changes from V3:
      * Rebase patchset to master and solve merge conflicts.
      * Remove redundant bug fix (fix error handling when probe fails)
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9878f602
    • Netanel Belgazal's avatar
    • Netanel Belgazal's avatar
      net/ena: change condition for host attribute configuration · dd8427a7
      Netanel Belgazal authored
      Move the host info config to be the first admin command that is executed.
      This change require the driver to remove the 'feature check'
      from host info configuration flow.
      The check is removed since the supported features bitmask field
      is retrieved only after calling ENA_ADMIN_DEVICE_ATTRIBUTES admin command.
      
      If set host info is not supported an error will be returned by the device.
      Signed-off-by: default avatarNetanel Belgazal <netanel@annapurnalabs.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dd8427a7
    • Netanel Belgazal's avatar
      net/ena: change driver's default timeouts · 7102a18a
      Netanel Belgazal authored
      The timeouts were too agressive and sometimes cause false alarms.
      Signed-off-by: default avatarNetanel Belgazal <netanel@annapurnalabs.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7102a18a
    • Netanel Belgazal's avatar
    • Netanel Belgazal's avatar
      net/ena: use READ_ONCE to access completion descriptors · a8496eb8
      Netanel Belgazal authored
      Completion descriptors are accessed from the driver and from the device.
      To avoid reading the old value, use READ_ONCE macro.
      Signed-off-by: default avatarNetanel Belgazal <netanel@annapurnalabs.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a8496eb8
    • Netanel Belgazal's avatar
      net/ena: use napi_complete_done() return value · b1669c9f
      Netanel Belgazal authored
      Do not unamsk interrupts if we are in busy poll mode.
      Signed-off-by: default avatarNetanel Belgazal <netanel@annapurnalabs.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b1669c9f
    • Netanel Belgazal's avatar
      net/ena: fix potential access to freed memory during device reset · 3f6159db
      Netanel Belgazal authored
      If the ena driver detects that the device is not behave as expected,
      it tries to reset the device.
      The reset flow calls ena_down, which will frees all the resources
      the driver allocates and then it will reset the device.
      
      This flow can cause memory corruption if the device is still writes
      to the driver's memory space.
      To overcome this potential race, move the reset before the device
      resources are freed.
      Signed-off-by: default avatarNetanel Belgazal <netanel@annapurnalabs.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3f6159db
    • Netanel Belgazal's avatar
      net/ena: refactor ena_get_stats64 to be atomic context safe · d81db240
      Netanel Belgazal authored
      ndo_get_stat64() can be called from atomic context, but the current
      implementation sends an admin command to retrieve the statistics from
      the device. This admin command can sleep.
      
      This patch re-factors the implementation of ena_get_stats64() to use
      the {rx,tx}bytes/count from the driver's inner counters, and to obtain
      the rx drop counter from the asynchronous keep alive (heart bit)
      event.
      Signed-off-by: default avatarNetanel Belgazal <netanel@annapurnalabs.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d81db240
    • Netanel Belgazal's avatar
      net/ena: fix NULL dereference when removing the driver after device reset failed · 22b331c9
      Netanel Belgazal authored
      If for some reason the device stops responding, and the device reset
      failes to recover the device, the mmio register read data structure
      will not be reinitialized.
      
      On driver removal, the driver will also try to reset the device, but
      this time the mmio data structure will be NULL.
      
      To solve this issue, perform the device reset in the remove function
      only if the device is runnig.
      
      Crash log
         54.240382] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [   54.244186] IP: [<ffffffffc067de5a>] ena_com_reg_bar_read32+0x8a/0x180 [ena_drv]
      [   54.244186] PGD 0
      [   54.244186] Oops: 0002 [#1] SMP
      [   54.244186] Modules linked in: ena_drv(OE-) snd_hda_codec_generic kvm_intel kvm crct10dif_pclmul ppdev crc32_pclmul ghash_clmulni_intel aesni_intel snd_hda_intel aes_x86_64 snd_hda_controller lrw gf128mul cirrus glue_helper ablk_helper ttm snd_hda_codec drm_kms_helper cryptd snd_hwdep drm snd_pcm pvpanic snd_timer syscopyarea sysfillrect snd parport_pc sysimgblt serio_raw soundcore i2c_piix4 mac_hid lp parport psmouse floppy
      [   54.244186] CPU: 5 PID: 1841 Comm: rmmod Tainted: G           OE 3.16.0-031600-generic #201408031935
      [   54.244186] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      [   54.244186] task: ffff880135852880 ti: ffff8800bb640000 task.ti: ffff8800bb640000
      [   54.244186] RIP: 0010:[<ffffffffc067de5a>]  [<ffffffffc067de5a>] ena_com_reg_bar_read32+0x8a/0x180 [ena_drv]
      [   54.244186] RSP: 0018:ffff8800bb643d50  EFLAGS: 00010083
      [   54.244186] RAX: 000000000000deb0 RBX: 0000000000030d40 RCX: 0000000000000003
      [   54.244186] RDX: 0000000000000202 RSI: 0000000000000058 RDI: ffffc90000775104
      [   54.244186] RBP: ffff8800bb643d88 R08: 0000000000000000 R09: cf00000000000000
      [   54.244186] R10: 0000000fffffffe0 R11: 0000000000000001 R12: 0000000000000000
      [   54.244186] R13: ffffc90000765000 R14: ffffc90000775104 R15: 00007fca1fa98090
      [   54.244186] FS:  00007fca1f1bd740(0000) GS:ffff88013fd40000(0000) knlGS:0000000000000000
      [   54.244186] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   54.244186] CR2: 0000000000000000 CR3: 00000000b9cf6000 CR4: 00000000001406e0
      [   54.244186] Stack:
      [   54.244186]  0000000000000202 0000005800000286 ffffc90000765000 ffffc90000765000
      [   54.244186]  ffff880135f6b000 ffff8800b9360000 00007fca1fa98090 ffff8800bb643db8
      [   54.244186]  ffffffffc0680b3d ffff8800b93608c0 ffffc90000765000 ffff880135f6b000
      [   54.244186] Call Trace:
      [   54.244186]  [<ffffffffc0680b3d>] ena_com_dev_reset+0x1d/0x1b0 [ena_drv]
      [   54.244186]  [<ffffffffc0678497>] ena_remove+0xa7/0x130 [ena_drv]
      [   54.244186]  [<ffffffff813d4df6>] pci_device_remove+0x46/0xc0
      [   54.244186]  [<ffffffff814c3b7f>] __device_release_driver+0x7f/0xf0
      [   54.244186]  [<ffffffff814c4738>] driver_detach+0xc8/0xd0
      [   54.244186]  [<ffffffff814c3969>] bus_remove_driver+0x59/0xd0
      [   54.244186]  [<ffffffff814c4fde>] driver_unregister+0x2e/0x60
      [   54.244186]  [<ffffffff810f0a80>] ? show_refcnt+0x40/0x40
      [   54.244186]  [<ffffffff813d4ec3>] pci_unregister_driver+0x23/0xa0
      [   54.244186]  [<ffffffffc068413f>] ena_cleanup+0x10/0xed1 [ena_drv]
      [   54.244186]  [<ffffffff810f3a47>] SyS_delete_module+0x157/0x1e0
      [   54.244186]  [<ffffffff81014fb7>] ? do_notify_resume+0xc7/0xd0
      [   54.244186]  [<ffffffff81793fad>] system_call_fastpath+0x1a/0x1f
      [   54.244186] Code: c3 4d 8d b5 04 01 01 00 4c 89 f7 e8 e1 5a 11 c1 48 89 45 c8 41 0f b7 85 00 01 01 00 8d 48 01 66 2d 52 21 66 41 89 8d 00 01 01 00 <66> 41 89 04 24 0f b7 45 d4 89 45 d0 89 c1 41 0f b7 85 00 01 01
      [   54.244186] RIP  [<ffffffffc067de5a>] ena_com_reg_bar_read32+0x8a/0x180 [ena_drv]
      [   54.244186]  RSP <ffff8800bb643d50>
      [   54.244186] CR2: 0000000000000000
      [   54.244186] ---[ end trace 18dd9889b6497810 ]---
      Signed-off-by: default avatarNetanel Belgazal <netanel@annapurnalabs.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      22b331c9
    • Netanel Belgazal's avatar
      net/ena: fix RSS default hash configuration · 422e21e7
      Netanel Belgazal authored
      ENA default hash configures IPv4_frag hash twice instead of
      configure non-IP packets.
      
      The bug caused IPv4 fragmented packets to be calculated based on
      L2 source and destination address instead of L3 source and destination.
      IPv4 packets can reach to the wrong Rx queue.
      Signed-off-by: default avatarNetanel Belgazal <netanel@annapurnalabs.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      422e21e7
    • Netanel Belgazal's avatar
      net/ena: fix ethtool RSS flow configuration · 6e2de20d
      Netanel Belgazal authored
      ena_flow_data_to_flow_hash and ena_flow_hash_to_flow_type
      treat the ena_flow_hash_to_flow_type enum as power of two values.
      
      Change the values of ena_admin_flow_hash_fields to be power of two values.
      
      This bug effect the ethtool set/get rxnfc.
      ethtool will report wrong values hash fields for get and will
      configure wrong hash fields in set.
      Signed-off-by: default avatarNetanel Belgazal <netanel@annapurnalabs.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6e2de20d
    • Netanel Belgazal's avatar
      net/ena: fix queues number calculation · 6a1ce2fb
      Netanel Belgazal authored
      The ENA driver tries to open a queue per vCPU.
      To determine how many vCPUs the instance have it uses num_possible_cpus()
      while it should have use num_online_cpus() instead.
      Signed-off-by: default avatarNetanel Belgazal <netanel@annapurnalabs.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6a1ce2fb
    • Netanel Belgazal's avatar
      net/ena: remove ntuple filter support from device feature list · fdeea0ad
      Netanel Belgazal authored
      Remove NETIF_F_NTUPLE from netdev->features.
      The ENA device driver does not support ntuple filtering.
      Signed-off-by: default avatarNetanel Belgazal <netanel@annapurnalabs.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fdeea0ad
  2. 09 Feb, 2017 14 commits