1. 17 Oct, 2016 9 commits
  2. 15 Oct, 2016 4 commits
    • Julia Lawall's avatar
      ipvlan: constify l3mdev_ops structure · ab530f63
      Julia Lawall authored
      This l3mdev_ops structure is only stored in the l3mdev_ops field of a
      net_device structure.  This field is declared const, so the l3mdev_ops
      structure can be declared as const also.  Additionally drop the
      __read_mostly annotation.
      
      The semantic patch that adds const is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @r disable optional_qualifier@
      identifier i;
      position p;
      @@
      static struct l3mdev_ops i@p = { ... };
      
      @ok@
      identifier r.i;
      struct net_device *e;
      position p;
      @@
      e->l3mdev_ops = &i@p;
      
      @bad@
      position p != {r.p,ok.p};
      identifier r.i;
      struct l3mdev_ops e;
      @@
      e@i@p
      
      @depends on !bad disable optional_qualifier@
      identifier r.i;
      @@
      static
      +const
       struct l3mdev_ops i = { ... };
      // </smpl>
      
      The effect on the layout of the .o file is shown by the following output
      of the size command, first before then after the transformation:
      
         text    data     bss     dec     hex filename
         7364     466      52    7882    1eca drivers/net/ipvlan/ipvlan_main.o
         7412     434      52    7898    1eda drivers/net/ipvlan/ipvlan_main.o
      Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ab530f63
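The reasoning in commit ab530f63 (an ops structure that is only ever stored through a pointer-to-const field can itself be const) can be sketched in plain C. This is a minimal userspace illustration, not the ipvlan code: `demo_ops`, `demo_device`, `demo_get_index` and `demo_lookup` are hypothetical names. In the size output above, the change moved 32 bytes out of `data` while `text` grew by 48.

```c
#include <stddef.h>

/* Hypothetical stand-in for an ops table such as l3mdev_ops. */
struct demo_ops {
    int (*get_index)(int ifindex);
};

/* Hypothetical stand-in for net_device: the field is a pointer to const. */
struct demo_device {
    const struct demo_ops *ops;
};

static int demo_get_index(int ifindex)
{
    return ifindex + 1;
}

/*
 * The table is only read through a pointer-to-const field, so it can be
 * declared const and placed in read-only data instead of writable data.
 */
static const struct demo_ops demo_l3_ops = {
    .get_index = demo_get_index,
};

/* Calls through the const ops pointer, as a driver user would. */
int demo_lookup(int ifindex)
{
    struct demo_device dev = { .ops = &demo_l3_ops };

    return dev.ops->get_index(ifindex);
}
```

Any later attempt to write through `dev.ops` is rejected at compile time, which is the property the semantic patch checks for before adding `const`.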
    • David S. Miller's avatar
      Merge branch 'ila-cached-route' · f9dbd5a3
      David S. Miller authored
      Tom Herbert says:
      
      ====================
      ila: Cache a route in ILA lwt structure
      
      Add a dst_cache to ila_lwt structure. This holds a cached route for the
      translated address. In ila_output we now perform a route lookup after
      translation and if possible (destination in original route is full 128
      bits) we set the dst_cache. Subsequent calls to ila_output can then use
      the cache to avoid the route lookup.
      
      This eliminates the need to set the gateway on ILA routes, as was
      previously done. Now we can do something like:
      
      ./ip route add 3333::2000:0:0:2/128 encap ila 2222:0:0:2 \
          csum-mode neutral-map dev eth0  ## No via needed!
      
      Also, add destroy_state to lwt ops. We need this to destroy the
      dst_cache.
      
      - v2
        - Fixed comparisons to fc_dst_len to make comparison against number
          of bits in data structure not bytes.
        - Move destroy_state under build_state (requested by Jiri)
        - Other minor cleanup
      
      Tested:
      
      Running 200 TCP_RR streams:
      
        Baseline, no ILA
      
          1730716 tps
          102/170/313 50/90/99% latencies
          88.11 CPU utilization
      
        Using ILA in both directions
      
          1680428 tps
          105/176/325 50/90/99% latencies
          88.16 CPU utilization
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f9dbd5a3
    • Tom Herbert's avatar
      ila: Cache a route to translated address · 79ff2fc3
      Tom Herbert authored
      Add a dst_cache to ila_lwt structure. This holds a cached route for the
      translated address. In ila_output we now perform a route lookup after
      translation and if possible (destination in original route is full 128
      bits) we set the dst_cache. Subsequent calls to ila_output can then use
      the cache to avoid the route lookup.
      
      This eliminates the need to set the gateway on ILA routes, as was
      previously done. Now we can do something like:
      
      ./ip route add 3333::2000:0:0:2/128 encap ila 2222:0:0:2 \
          csum-mode neutral-map dev eth0  ## No via needed!
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      79ff2fc3
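The caching rule described in commit 79ff2fc3 (look up the route after translation, and keep it only when the original route's destination is a full 128 bits) can be modelled in userspace. This is a sketch under assumptions, not the ila_lwt code: `demo_lwt`, `demo_route`, `demo_slow_lookup` and `demo_output` are hypothetical names, and the lookup counter exists only to make the effect visible.

```c
#include <stddef.h>

struct demo_route {
    int id;
};

/* Hypothetical per-route state, loosely modelled on the text above. */
struct demo_lwt {
    unsigned int dst_len;             /* prefix length of original route, in bits */
    const struct demo_route *cached;  /* NULL until a lookup result is stored */
    unsigned int lookups;             /* number of slow-path lookups performed */
};

static const struct demo_route demo_table_entry = { .id = 7 };

/* Stand-in for the full route lookup that the cache is meant to avoid. */
const struct demo_route *demo_slow_lookup(struct demo_lwt *st)
{
    st->lookups++;
    return &demo_table_entry;
}

/* Use the cached route when present; otherwise look up and maybe cache. */
const struct demo_route *demo_output(struct demo_lwt *st)
{
    const struct demo_route *rt = st->cached;

    if (rt)
        return rt;

    rt = demo_slow_lookup(st);

    /* Compare against the number of bits, not bytes (see the v2 note). */
    if (st->dst_len == 128)
        st->cached = rt;

    return rt;
}
```

With a /128 route the second call to `demo_output` skips the lookup; with a shorter prefix every call still takes the slow path.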
    • Tom Herbert's avatar
      lwtunnel: Add destroy state operation · 1104d9ba
      Tom Herbert authored
      Users of lwt tunnels may set up some secondary state in the build_state
      function. Add a corresponding destroy_state function to allow users to
      clean up that state. This destroy_state function is called from
      lwstate_free. Also, we now free lwstate using kfree_rcu, so users can
      assume the structure is not freed before an RCU grace period has elapsed.
      Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1104d9ba
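The build_state/destroy_state pairing from commit 1104d9ba can be sketched as an ops table with an optional teardown hook. This is a userspace model with hypothetical names (`demo_encap_ops`, `demo_state_alloc`, `demo_state_free`); it uses plain `free()` where the commit defers the final free with kfree_rcu.

```c
#include <stdlib.h>

struct demo_state;

/* Hypothetical ops table; destroy_state is optional and may be NULL. */
struct demo_encap_ops {
    int  (*build_state)(struct demo_state *st);
    void (*destroy_state)(struct demo_state *st);
};

struct demo_state {
    const struct demo_encap_ops *ops;
    void *priv;                 /* secondary state set up by build_state */
};

int demo_destroy_calls;         /* counts teardown calls, for illustration */

int demo_build(struct demo_state *st)
{
    st->priv = malloc(16);
    return st->priv ? 0 : -1;
}

void demo_destroy(struct demo_state *st)
{
    free(st->priv);
    st->priv = NULL;
    demo_destroy_calls++;
}

const struct demo_encap_ops demo_ops = {
    .build_state = demo_build,
    .destroy_state = demo_destroy,
};

struct demo_state *demo_state_alloc(const struct demo_encap_ops *ops)
{
    struct demo_state *st = calloc(1, sizeof(*st));

    if (!st)
        return NULL;
    st->ops = ops;
    if (ops->build_state && ops->build_state(st) != 0) {
        free(st);
        return NULL;
    }
    return st;
}

/* Give the owner a chance to clean up before the state itself is freed. */
void demo_state_free(struct demo_state *st)
{
    if (!st)
        return;
    if (st->ops->destroy_state)
        st->ops->destroy_state(st);
    free(st);   /* the kernel commit defers this step with kfree_rcu */
}
```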
  3. 14 Oct, 2016 27 commits
    • David S. Miller's avatar
      Merge branch 'fjes-next' · 02dc7658
      David S. Miller authored
      Taku Izumi says:
      
      ====================
      FUJITSU Extended Socket driver version 1.2
      
      This patchset updates the FUJITSU Extended Socket network driver to version 1.2.
      This includes the following enhancements:
        - ethtool -d support
        - ethtool -S enhancement
        - ethtool -w/-W support
        - Add some debugging features (tracepoints etc.)
      
      v1 -> v2:
        - Use u64 instead of phys_addr_t as TP_STRUCT__entry
        - Use ethtool facility to achieve debug mode instead of using debugfs
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      02dc7658
    • Taku Izumi's avatar
      8f87d775
    • Taku Izumi's avatar
      fjes: Add debugfs entry for EP status information in fjes driver · c753119e
      Taku Izumi authored
      This patch adds a debugfs entry to show EP status information.
      You can get each EP's status information as follows:
      
        # cat /sys/kernel/debug/fjes/fjes.0/status
      
      EPID    STATUS           SAME_ZONE        CONNECTED
      ep0     shared           Y                Y
      ep1     -                -                -
      ep2     unshared         N                N
      ep3     unshared         N                N
      ep4     unshared         N                N
      ep5     unshared         N                N
      ep6     unshared         N                N
      ep7     unshared         N                N
      Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c753119e
    • Taku Izumi's avatar
      fjes: ethtool -w and -W support for fjes driver · b6ba737d
      Taku Izumi authored
      This patch adds support for ethtool -w and -W
      to the fjes driver.
      
      You can enable and disable firmware debug mode by
      using ethtool -W, and also retrieve firmware
      activity information by using ethtool -w.
      
      This is useful for debugging.
      Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b6ba737d
    • Taku Izumi's avatar
      fjes: Add tracepoints in fjes driver · 82f6aea8
      Taku Izumi authored
      This patch adds tracepoints to the fjes driver.
      This is useful for debugging purposes.
      Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      82f6aea8
    • Taku Izumi's avatar
      fjes: Enhance ethtool -S for fjes driver · 21b7efbc
      Taku Izumi authored
      This patch enhances ethtool -S for fjes driver so that
      EP related statistics can be retrieved.
      
      The following statistics can be displayed via ethtool -S:
      
           ep%d_com_regist_buf_exec
           ep%d_com_unregist_buf_exec
           ep%d_send_intr_rx
           ep%d_send_intr_unshare
           ep%d_send_intr_zoneupdate
           ep%d_recv_intr_rx
           ep%d_recv_intr_unshare
           ep%d_recv_intr_stop
           ep%d_recv_intr_zoneupdate
           ep%d_tx_buffer_full
           ep%d_tx_dropped_not_shared
           ep%d_tx_dropped_ver_mismatch
           ep%d_tx_dropped_buf_size_mismatch
           ep%d_tx_dropped_vlanid_mismatch
      Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      21b7efbc
    • Taku Izumi's avatar
      fjes: ethtool -d support for fjes driver · 462d8074
      Taku Izumi authored
      This patch adds support for ethtool -d to the
      fjes driver. By using ethtool -d, you can get
      a register dump of the Extended Socket device.
      
        # ethtool -d es0
      
      Offset          Values
      ------          ------
      0x0000:         01 00 00 00 08 00 00 00 00 00 00 00 00 00 00 00
      0x0010:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0020:         02 00 00 80 02 00 00 80 64 a6 58 08 07 00 00 00
      0x0030:         00 00 00 00 28 80 00 00 00 00 f9 e3 06 00 00 00
      0x0040:         00 00 00 00 18 00 00 00 80 a4 58 08 07 00 00 00
      0x0050:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0060:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0070:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0080:         00 00 00 00 00 00 e0 7f 00 00 01 00 00 00 01 00
      0x0090:         00 00 00 00
      Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      462d8074
    • David S. Miller's avatar
      Merge branch 'qed-next' · 9c7664cb
      David S. Miller authored
      Manish Chopra says:
      
      ====================
      qed*: driver updates
      
      There are several new additions in this series;
      Most are connected to either Tx offloading or Rx classifications
      [either fastpath changes or supporting configuration].
      
      In addition, there's a single IOV enhancement.
      
      Please consider applying this series to `net-next'.
      
      V2->V3:
      Fixes below kbuild warning
      call to '__compiletime_assert_60' declared with
      attribute error: Need native word sized stores/loads for atomicity.
      
      V1->V2:
      Added a fix for the race in ramrod handling
      pointed out by Eric Dumazet [patch 7].
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9c7664cb
    • Manish Chopra's avatar
      qed: Fix possible race when reading firmware return code. · d5df7688
      Manish Chopra authored
      While handling SPQ ramrod completion, there is a possible race
      where the driver might read a stale fw return code even though it
      has seen the ramrod completion done flag. This patch ensures that
      the fw return code is written first and only then the completion
      done flag is updated, using appropriate memory barriers.
      Signed-off-by: Manish Chopra <manish.chopra@caviumnetworks.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d5df7688
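The ordering requirement in commit d5df7688 (publish the return code before the done flag, and read the flag before the return code) can be sketched with C11 release/acquire atomics. This is a userspace analogue under assumptions: the commit text does not say which kernel barrier primitives it uses, and `demo_completion`, `demo_complete` and `demo_poll` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical completion record shared by a writer and a polling reader. */
struct demo_completion {
    int fw_return_code;   /* payload, written with a plain store */
    atomic_bool done;     /* flag that publishes the payload */
};

/* Writer side: store the payload first, then set the flag with release. */
void demo_complete(struct demo_completion *c, int code)
{
    c->fw_return_code = code;
    atomic_store_explicit(&c->done, true, memory_order_release);
}

/*
 * Reader side: load the flag with acquire. If it is set, the payload store
 * that preceded the release store is guaranteed to be visible.
 */
bool demo_poll(struct demo_completion *c, int *code_out)
{
    if (!atomic_load_explicit(&c->done, memory_order_acquire))
        return false;

    *code_out = c->fw_return_code;
    return true;
}
```

Without the release/acquire pairing, a reader on another CPU could observe `done` as true while still seeing the old `fw_return_code`, which is the race the commit describes.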
    • Yuval Mintz's avatar
      qed: Handle malicious VFs events · 7eff82b0
      Yuval Mintz authored
      Malicious VFs might be caught in several different methods:
        - Misusing their bar permission and being blocked by hardware.
        - Misusing their fastpath logic and being blocked by firmware.
        - Misusing their interaction with their PF via hw-channel,
          and being blocked by PF driver.
      
      On the first two items, firmware would indicate to driver that
      the VF is to be considered malicious, but would sometimes still
      allow the VF to communicate with the PF [depending on the exact
      nature of the malicious activity done by the VF].
      The current existing logic on the PF side lacks handling of such events,
      and might allow the PF to perform some incorrect configuration on behalf
      of a VF that was previously indicated as malicious.
      
      The new scheme is simple -
      Once the PF determines a VF is malicious it would:
       a. Ignore any further requests on behalf of the VF-driver.
       b. Prevent any configurations initiated by the hyperuser for
          the malicious VF, as firmware isn't willing to serve such.
      
      The malicious indication would be cleared upon the VF's FLR,
      after which it would become usable once again.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7eff82b0
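The scheme in commit 7eff82b0 amounts to a per-VF flag that gates both VF requests and host-initiated configuration until the VF goes through FLR. A minimal sketch of that state machine follows; the structure and function names are hypothetical and do not come from the qed driver.

```c
#include <stdbool.h>

/* Hypothetical per-VF bookkeeping kept by the PF. */
struct demo_vf {
    bool malicious;          /* set on a firmware or PF indication */
    unsigned int handled;    /* VF requests actually served */
    unsigned int configured; /* host-initiated configurations applied */
};

void demo_mark_malicious(struct demo_vf *vf)
{
    vf->malicious = true;
}

/* (a) Ignore any further requests coming from the VF driver. */
int demo_handle_vf_request(struct demo_vf *vf)
{
    if (vf->malicious)
        return -1;
    vf->handled++;
    return 0;
}

/* (b) Refuse configuration initiated on the host for a malicious VF. */
int demo_configure_vf(struct demo_vf *vf)
{
    if (vf->malicious)
        return -1;
    vf->configured++;
    return 0;
}

/* FLR clears the indication, after which the VF is usable again. */
void demo_vf_flr(struct demo_vf *vf)
{
    vf->malicious = false;
}
```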
    • Yuval Mintz's avatar
      qed: Allow chance for fast ramrod completions · c59f5291
      Yuval Mintz authored
      Whenever a ramrod is being sent for some device configuration,
      the driver is going to sleep at least 5ms between each iteration
      of polling on the completion of the ramrod.
      
      However, in almost every configuration scenario the firmware
      would be able to comply and complete the ramrod in a matter of
      several usecs. This is especially important in cases where there
      might be a lot of sequential configurations applying to the hardware
      [e.g., RoCE], in which case the existing scheme might cause some
      visible user delays.
      
      This patch changes the completion scheme - instead of immediately
      starting to sleep for a 'long' period, allow the device to quickly
      poll on the first iteration after a couple of usecs.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c59f5291
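The polling change in commit c59f5291 can be sketched as a loop whose first wait is short and whose later waits use the long interval. This is an illustration under assumptions: only the 5 ms figure comes from the text above, while the 10 usec fast delay, the iteration limit and all names are hypothetical. The sleep function is injected so the logic can be exercised without real delays.

```c
#include <stdbool.h>

#define DEMO_FAST_DELAY_US  10u    /* assumed "couple of usecs" value */
#define DEMO_SLOW_DELAY_US  5000u  /* the 5 ms sleep named in the commit */
#define DEMO_MAX_ITERATIONS 100    /* assumed retry limit */

typedef bool (*demo_done_fn)(void *ctx);
typedef void (*demo_sleep_fn)(void *ctx, unsigned int usec);

/* Wait briefly before the first poll, then fall back to long sleeps. */
int demo_wait(demo_done_fn done, demo_sleep_fn sleep_us, void *ctx)
{
    for (int i = 0; i < DEMO_MAX_ITERATIONS; i++) {
        sleep_us(ctx, i == 0 ? DEMO_FAST_DELAY_US : DEMO_SLOW_DELAY_US);
        if (done(ctx))
            return 0;
    }
    return -1;  /* timed out */
}

/* Fake device for illustration: completes after a set number of polls. */
struct demo_fake_dev {
    int polls_left;
    unsigned int slept_us;
};

bool demo_fake_done(void *ctx)
{
    struct demo_fake_dev *d = ctx;

    return --d->polls_left <= 0;
}

void demo_fake_sleep(void *ctx, unsigned int usec)
{
    struct demo_fake_dev *d = ctx;

    d->slept_us += usec;  /* record instead of sleeping */
}
```

A device that completes on the first poll costs only the short delay here, whereas the earlier scheme described above would have slept at least 5 ms before looking.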
    • Yuval Mintz's avatar
      qed*: Allow unicast filtering · 7b7e70f9
      Yuval Mintz authored
      Apparently qede fails to set IFF_UNICAST_FLT, and as a result is not
      actually performing unicast MAC filtering.
      While we're at it - relax a hard-coded limitation that limits each
      interface to using at most 15 unicast MAC addresses before turning
      promiscuous. Instead utilize the HW resources to their limit.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7b7e70f9
    • Manish Chopra's avatar
      qede: Prevent GSO on long Geneve headers · 25695853
      Manish Chopra authored
      Due to a hardware limitation, when transmitting a geneve-encapsulated
      packet with more than 32 bytes worth of geneve options the hardware
      would not be able to crack the packet and would consider it a regular
      UDP packet.
      
      This implements the ndo_features_check() in qede in order to prevent
      GSO on said transmitted packets.
      Signed-off-by: Manish Chopra <manish.chopra@caviumnetworks.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      25695853
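The check described in commit 25695853 can be sketched as a feature filter that masks out GSO when the Geneve options exceed what the hardware can parse. This is a simplified model, not the qede ndo_features_check() implementation: the feature bits, the function signature and all names are hypothetical, and only the 32-byte threshold comes from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_F_GSO  (1u << 0)   /* hypothetical feature bit */
#define DEMO_F_CSUM (1u << 1)   /* hypothetical feature bit */

#define DEMO_MAX_GENEVE_OPT_LEN 32u  /* limit stated in the commit message */

/*
 * Return the feature set that may be used for this packet. GSO is dropped
 * for Geneve packets carrying more than 32 bytes of options, so the stack
 * segments them in software before handing them to the device.
 */
uint32_t demo_features_check(uint32_t features, bool is_geneve,
                             unsigned int opt_len)
{
    if (is_geneve && opt_len > DEMO_MAX_GENEVE_OPT_LEN)
        features &= ~DEMO_F_GSO;

    return features;
}
```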
    • Manish Chopra's avatar
      qede: GSO support for tunnels with outer csum · a150241c
      Manish Chopra authored
      This patch adds GSO support for GRE and UDP tunnels
      where outer checksums are enabled.
      Signed-off-by: Manish Chopra <manish.chopra@caviumnetworks.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a150241c
    • Yuval Mintz's avatar
      qed: Pass MAC hints to VFs · c3aaa403
      Yuval Mintz authored
      Some hypervisors can support MAC hints to their VFs.
      Even though we don't have such a hypervisor API in linux, we add
      sufficient logic for the VF to be able to receive such hints and
      set the mac accordingly - as long as the VF has not been set with
      a MAC already.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c3aaa403
    • David S. Miller's avatar
      Merge branch 'ingress-actions' · d0b3fbb2
      David S. Miller authored
      Shmulik Ladkani says:
      
      ====================
      act_mirred: Ingress actions support
      
      This patch series implements action mirred 'ingress' actions
      TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.
      
      This allows attaching filters whose target is to hand matching skbs into
      the rx processing of a specified device.
      
      v4:
        in 4/4, check ret code of netif_receive_skb, as suggested by Cong Wang
      v3:
        in 4/4, addressed non coherency due to reading m->tcfm_eaction multiple
        times, as spotted by Eric Dumazet
      v2:
        in 1/4, declare tcfm_mac_header_xmit as bool instead of int
      ====================
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d0b3fbb2
    • Shmulik Ladkani's avatar
      net/sched: act_mirred: Implement ingress actions · 53592b36
      Shmulik Ladkani authored
      Up until now, 'action mirred' supported only egress actions (either
      TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).
      
      This patch implements the corresponding ingress actions
      TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.
      
      This allows attaching filters whose target is to hand matching skbs into
      the rx processing of a specified device.
      Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      53592b36
    • Shmulik Ladkani's avatar
      net/sched: tc_mirred: Rename public predicates 'is_tcf_mirred_redirect' and 'is_tcf_mirred_mirror' · 5724b8b5
      Shmulik Ladkani authored
      These accessors are used in various drivers that support tc offloading,
      to detect properties of a given 'tc_action'.
      
      'is_tcf_mirred_redirect' tests that the action is TCA_EGRESS_REDIR.
      'is_tcf_mirred_mirror' tests that the action is TCA_EGRESS_MIRROR.
      
      As a prep towards supporting INGRESS redir/mirror, rename these
      predicates to reflect their true meaning:
        s/is_tcf_mirred_redirect/is_tcf_mirred_egress_redirect/
        s/is_tcf_mirred_mirror/is_tcf_mirred_egress_mirror/
      Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Cc: Hariprasad S <hariprasad@chelsio.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: Ido Schimmel <idosch@mellanox.com>
      Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5724b8b5
    • Shmulik Ladkani's avatar
      net/sched: act_mirred: Refactor detection whether dev needs xmit at mac header · dcf80034
      Shmulik Ladkani authored
      Move detection logic that tests whether device expects skb data to point
      at mac_header upon xmit into a function.
      Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dcf80034
    • Shmulik Ladkani's avatar
      net/sched: act_mirred: Rename tcfm_ok_push to tcfm_mac_header_xmit and make it a bool · 16577923
      Shmulik Ladkani authored
      'tcfm_ok_push' specifies whether a mac_len sized push is needed upon
      egress to the target device (if action is performed at ingress).
      
      Rename it to 'tcfm_mac_header_xmit' as this is actually an attribute of
      the target device (and use a bool instead of int).
      
      This allows decoupling the attribute from the action to be taken.
      Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      16577923
    • Allan W. Nielsen's avatar
      net: phy: Cleanup the Edge-Rate feature in Microsemi PHYs. · 4f58e6dc
      Allan W. Nielsen authored
      Edge-Rate cleanup includes the following:
      - Updated device tree bindings documentation for edge-rate
      - The edge-rate is now specified as a "slowdown", meaning that it is now
        being specified as positive values instead of negative (both
        documentation and implementation wise).
      - Only explicitly documented values for "vsc8531,vddmac" and
        "vsc8531,edge-slowdown" are accepted by the device driver.
      - Deleted include/dt-bindings/net/mscc-phy-vsc8531.h as it was not needed.
      - Read/validate devicetree settings in probe instead of init
      Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
      Signed-off-by: Raju Lakkaraju <raju.lakkaraju@microsemi.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4f58e6dc
    • stephen hemminger's avatar
      Revert "net: Add driver helper functions to determine checksum offloadability" · cf53b1da
      stephen hemminger authored
      This reverts commit 6ae23ad3.
      
      The code has been in the kernel since 4.4 but there is no in-tree
      code that uses it. Unused code is broken code, remove it.
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cf53b1da
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 29fbff86
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix various build warnings in tlan/qed/xen-netback drivers, from
          Arnd Bergmann.
      
       2) Propagate proper error code in strparser's strp_recv(), from Geert
          Uytterhoeven.
      
       3) Fix accidental broadcast of RTM_GETTFILTER responses, from Eric
          Dumazet.
      
       4) Need to use list_for_each_entry_safe() in qed driver, from Wei
          Yongjun.
      
       5) Openvswitch 802.1AD bug fixes from Jiri Benc.
      
       6) Cure BUILD_BUG_ON() in mlx5 driver, from Tom Herbert.
      
       7) Fix UDP ipv6 checksumming in netvsc driver, from Stephen Hemminger.
      
       8) stmmac driver fixes from Giuseppe CAVALLARO.
      
       9) Fix access to mangled IP6CB in tcp, from Eric Dumazet.
      
      10) Fix info leaks in tipc and rtnetlink, from Dan Carpenter.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
        net: bridge: add the multicast_flood flag attribute to brport_attrs
        net: axienet: Remove unused parameter from __axienet_device_reset
        liquidio: CN23XX: fix a loop timeout
        net: rtnl: info leak in rtnl_fill_vfinfo()
        tipc: info leak in __tipc_nl_add_udp_addr()
        net: ipv4: Do not drop to make_route if oif is l3mdev
        net: phy: Trigger state machine on state change and not polling.
        ipv6: tcp: restore IP6CB for pktoptions skbs
        netvsc: Remove mistaken udp.h inclusion.
        xen-netback: fix type mismatch warning
        stmmac: fix error check when init ptp
        stmmac: fix ptp init for gmac4
        qed: fix old-style function definition
        netvsc: fix checksum on UDP IPV6
        net_sched: reorder pernet ops and act ops registrations
        xen-netback: fix guest Rx stall detection (after guest Rx refactor)
        drivers/ptp: Fix kernel memory disclosure
        net/mlx5: Add MLX5_ARRAY_SET64 to fix BUILD_BUG_ON
        qmi_wwan: add support for Quectel EC21 and EC25
        openvswitch: add NETIF_F_HW_VLAN_STAG_TX to internal dev
        ...
      29fbff86
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs · c4a86165
      Linus Torvalds authored
      Pull NFS client updates from Anna Schumaker:
       "Highlights include:
      
        Stable bugfixes:
         - sunrpc: fix write space race causing stalls
         - NFS: Fix inode corruption in nfs_prime_dcache()
         - NFSv4: Don't report revoked delegations as valid in nfs_have_delegation()
         - NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid
         - NFSv4: Open state recovery must account for file permission changes
         - NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
      
        Features:
         - Add support for tracking multiple layout types with an ordered list
         - Add support for using multiple backchannel threads on the client
         - Add support for pNFS file layout session trunking
         - Delay xprtrdma use of DMA API (for device driver removal)
         - Add support for xprtrdma remote invalidation
         - Add support for larger xprtrdma inline thresholds
         - Use a scatter/gather list for sending xprtrdma RPC calls
         - Add support for the CB_NOTIFY_LOCK callback
         - Improve hashing sunrpc auth_creds by using both uid and gid
      
        Bugfixes:
         - Fix xprtrdma use of DMA API
         - Validate filenames before adding to the dcache
         - Fix corruption of xdr->nwords in xdr_copy_to_scratch
         - Fix setting buffer length in xdr_set_next_buffer()
         - Don't deadlock the state manager on the SEQUENCE status flags
         - Various delegation and stateid related fixes
         - Retry operations if an interrupted slot receives EREMOTEIO
         - Make nfs boot time y2038 safe"
      
      * tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (100 commits)
        NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
        fs: nfs: Make nfs boot time y2038 safe
        sunrpc: replace generic auth_cred hash with auth-specific function
        sunrpc: add RPCSEC_GSS hash_cred() function
        sunrpc: add auth_unix hash_cred() function
        sunrpc: add generic_auth hash_cred() function
        sunrpc: add hash_cred() function to rpc_authops struct
        Retry operation on EREMOTEIO on an interrupted slot
        pNFS: Fix atime updates on pNFS clients
        sunrpc: queue work on system_power_efficient_wq
        NFSv4.1: Even if the stateid is OK, we may need to recover the open modes
        NFSv4: If recovery failed for a specific open stateid, then don't retry
        NFSv4: Fix retry issues with nfs41_test/free_stateid
        NFSv4: Open state recovery must account for file permission changes
        NFSv4: Mark the lock and open stateids as invalid after freeing them
        NFSv4: Don't test open_stateid unless it is set
        NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid
        NFS: Always call nfs_inode_find_state_and_recover() when revoking a delegation
        NFSv4: Fix a race when updating an open_stateid
        NFSv4: Fix a race in nfs_inode_reclaim_delegation()
        ...
      c4a86165
    • Linus Torvalds's avatar
      Merge tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux · 27785564
      Linus Torvalds authored
      Pull nfsd updates from Bruce Fields:
       "Some RDMA work and some good bugfixes, and two new features that could
        benefit from user testing:
      
         - Anna Schumaker contributed a simple NFSv4.2 COPY implementation.
           COPY is already supported on the client side, so a call to
           copy_file_range() on a recent client should now result in a
           server-side copy that doesn't require all the data to make a round
           trip to the client and back.
      
         - Jeff Layton implemented callbacks to notify clients when contended
           locks become available, which should reduce latency on workloads
           with contended locks"
      
      * tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux:
        NFSD: Implement the COPY call
        nfsd: handle EUCLEAN
        nfsd: only WARN once on unmapped errors
        exportfs: be careful to only return expected errors.
        nfsd4: setclientid_confirm with unmatched verifier should fail
        nfsd: randomize SETCLIENTID reply to help distinguish servers
        nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies
        nfs: add a new NFS4_OPEN_RESULT_MAY_NOTIFY_LOCK constant
        nfsd: add a LRU list for blocked locks
        nfsd: have nfsd4_lock use blocking locks for v4.1+ locks
        nfsd: plumb in a CB_NOTIFY_LOCK operation
        NFSD: fix corruption in notifier registration
        svcrdma: support Remote Invalidation
        svcrdma: Server-side support for rpcrdma_connect_private
        rpcrdma: RDMA/CM private message data structure
        svcrdma: Skip put_page() when send_reply() fails
        svcrdma: Tail iovec leaves an orphaned DMA mapping
        nfsd: fix dprintk in nfsd4_encode_getdeviceinfo
        nfsd: eliminate cb_minorversion field
        nfsd: don't set a FL_LAYOUT lease for flexfiles layouts
      27785564
    • Linus Torvalds's avatar
      Merge tag 'xfs-reflink-for-linus-4.9-rc1' of... · 35a891be
      Linus Torvalds authored
      Merge tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
      
          < XFS has gained super CoW powers! >
           ----------------------------------
                  \   ^__^
                   \  (oo)\_______
                      (__)\       )\/\
                          ||----w |
                          ||     ||
      
      Pull XFS support for shared data extents from Dave Chinner:
       "This is the second part of the XFS updates for this merge cycle.  This
        pullreq contains the new shared data extents feature for XFS.
      
        Given the complexity and size of this change I am expecting - like the
        addition of reverse mapping last cycle - that there will be some
        follow-up bug fixes and cleanups around the -rc3 stage for issues that
        I'm sure will show up once the code hits a wider userbase.
      
        What it is:
      
        At the most basic level we are simply adding shared data extents to
        XFS - i.e. a single extent on disk can now have multiple owners. To do
        this we have to add new on-disk features to both track the shared
        extents and the number of times they've been shared. This is done by
        the new "refcount" btree that sits in every allocation group. When we
        share or unshare an extent, this tree gets updated.
      
        Along with this new tree, the reverse mapping tree needs to be updated
        to track each owner of a shared extent. This also needs to be updated
        on every share/unshare operation. These interactions at extent allocation
        and freeing time have complex ordering and recovery constraints, so
        there's a significant amount of new intent-based transaction code to
        ensure that operations are performed atomically from both the runtime
        and integrity/crash recovery perspectives.
      
        We also need to break sharing when writes hit a shared extent - this
        is where the new copy-on-write implementation comes in. We allocate
        new storage and copy the original data along with the overwrite data
        into the new location. We only do this for data as we don't share
        metadata at all - each inode has its own metadata that tracks the
        shared data extents, the extents undergoing CoW and its own private
        extents.
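
The break-sharing step can be sketched with a small in-memory model. Everything here is an assumption made for illustration (the `ToyFS` class, the 8-byte block size, writes confined to a single block); it shows only the idea of allocating new storage, copying the original data and applying the overwrite, not the XFS implementation.

```python
BLOCK = 8  # toy block size in bytes (hypothetical)


class ToyFS:
    def __init__(self):
        self.blocks = {}    # physical block number -> bytearray contents
        self.refs = {}      # physical block number -> owner count
        self.next_free = 0

    def alloc(self, data=b""):
        """Allocate a fresh block holding data, padded with zero bytes."""
        n = self.next_free
        self.next_free += 1
        self.blocks[n] = bytearray(data.ljust(BLOCK, b"\0"))
        self.refs[n] = 1
        return n

    def create(self, data):
        """A 'file' is just a list of physical block numbers."""
        return [self.alloc(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]

    def reflink(self, src):
        """Share every block of src with a new file."""
        for n in src:
            self.refs[n] += 1
        return list(src)

    def write(self, f, pos, data):
        """Write within one block, breaking sharing first if needed."""
        idx, off = divmod(pos, BLOCK)
        if off + len(data) > BLOCK:
            raise ValueError("toy model only handles writes within one block")
        old = f[idx]
        if self.refs[old] > 1:
            # Copy-on-write: new storage, original data copied over.
            f[idx] = self.alloc(bytes(self.blocks[old]))
            self.refs[old] -= 1
        self.blocks[f[idx]][off:off + len(data)] = data

    def read(self, f):
        return b"".join(bytes(self.blocks[n]) for n in f)
```

After a reflink, a write to one file leaves the other file's data untouched, and only the block that was written stops being shared.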
      
        Of course, being XFS, nothing is simple - we use delayed allocation
        for CoW similar to how we use it for normal writes. ENOSPC is a
        significant issue here - we build on the reservation code added in
        4.8-rc1 with the reverse mapping feature to ensure we don't get
        spurious ENOSPC issues part way through a CoW operation. These
        mechanisms also help minimise fragmentation due to repeated CoW
        operations. To further reduce fragmentation overhead, we've also
        introduced a CoW extent size hint, which indicates how large a region
        we should allocate when we execute a CoW operation.
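
One plausible way to picture the extent size hint is as an alignment unit: the region to allocate is widened outward to hint-sized boundaries. The function below is a sketch under that assumption; the name `cow_allocation_range` is hypothetical, and the default of 32 blocks is taken from the "set a default CoW extent size of 32 blocks" commit in this series.

```python
def cow_allocation_range(start_block, length, hint=32):
    """Widen [start_block, start_block + length) to hint-aligned boundaries.

    Returns (first_block, block_count) of the region to allocate.
    This is an illustrative model, not the kernel's allocation logic.
    """
    if length <= 0 or hint <= 0:
        raise ValueError("length and hint must be positive")
    first = (start_block // hint) * hint                 # round start down
    end = -(-(start_block + length) // hint) * hint      # round end up
    return first, end - first
```

Under this model a small overwrite allocates a whole hint-sized region, so later CoW writes to neighbouring blocks can land in the same contiguous allocation instead of creating many small extents.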
      
        With all this functionality in place, we can hook up .copy_file_range,
        .clone_file_range and .dedupe_file_range and we gain all the
        capabilities of reflink and other vfs provided functionality that
        enable manipulation of shared extents. We also added a fallocate mode
        that explicitly unshares a range of a file, which we implemented as an
        explicit CoW of all the shared extents in a file.
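
From userspace, the copy_file_range path can be exercised through Python's `os.copy_file_range` (Linux, Python 3.8+). The helper below is a minimal sketch: `copy_range` is a hypothetical name, and the plain read/write fallback is an assumption added so the example also works where the system call is unavailable. Whether the kernel shares extents or copies data is up to the filesystem.

```python
import os


def copy_range(src_path, dst_path, chunk=1 << 20):
    """Copy src to dst, preferring copy_file_range; return bytes copied."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    copied = 0
    try:
        remaining = os.fstat(src).st_size
        while remaining > 0:
            want = min(chunk, remaining)
            try:
                # Uses and advances both file offsets.
                n = os.copy_file_range(src, dst, want)
            except (OSError, AttributeError):
                # Fallback (assumption): ordinary read/write copy.
                buf = os.read(src, want)
                view = memoryview(buf)
                while view:
                    view = view[os.write(dst, view):]
                n = len(buf)
            if n == 0:
                break  # source shrank underneath us
            copied += n
            remaining -= n
    finally:
        os.close(src)
        os.close(dst)
    return copied
```

The caller sees the same result either way: the destination holds the same bytes as the source, and any extent sharing happens below the interface.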
      
        As such, it's a huge chunk of new functionality with new on-disk
        format features and internal infrastructure. It warns at mount time
        that it is an experimental feature and that it may eat data (as we do with all
        new on-disk features until they stabilise). We have not released
        userspace support for it yet - userspace support currently requires
        download from Darrick's xfsprogs repo and build from source, so the
        access to this feature is really developer/tester only at this point.
        Initial userspace support will be released at the same time the kernel
        with this code in it is released.
      
        The new code causes 5-6 new failures with xfstests - these aren't
        serious functional failures but things like the output of tests changing
        slightly due to perturbations in layouts, space usage, etc. OTOH,
        we've added 150+ new tests to xfstests that specifically exercise this
        new functionality so it's got far better test coverage than any
        functionality we've previously added to XFS.
      
        Darrick has done a pretty amazing job getting us to this stage, and
        special mention also needs to go to Christoph (review, testing,
        improvements and bug fixes) and Brian (caught several intricate bugs
        during review) for the effort they've also put in.
      
        Summary:
      
         - unshare range (FALLOC_FL_UNSHARE) support for fallocate
      
         - copy-on-write extent size hints (FS_XFLAG_COWEXTSIZE) for fsxattr
           interface
      
         - shared extent support for XFS
      
         - copy-on-write support for shared extents
      
         - copy_file_range support
      
         - clone_file_range support (implements reflink)
      
         - dedupe_file_range support
      
         - defrag support for reverse mapping enabled filesystems"
      
      * tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (71 commits)
        xfs: convert COW blocks to real blocks before unwritten extent conversion
        xfs: rework refcount cow recovery error handling
        xfs: clear reflink flag if setting realtime flag
        xfs: fix error initialization
        xfs: fix label inaccuracies
        xfs: remove isize check from unshare operation
        xfs: reduce stack usage of _reflink_clear_inode_flag
        xfs: check inode reflink flag before calling reflink functions
        xfs: implement swapext for rmap filesystems
        xfs: refactor swapext code
        xfs: various swapext cleanups
        xfs: recognize the reflink feature bit
        xfs: simulate per-AG reservations being critically low
        xfs: don't mix reflink and DAX mode for now
        xfs: check for invalid inode reflink flags
        xfs: set a default CoW extent size of 32 blocks
        xfs: convert unwritten status of reverse mappings for shared files
        xfs: use interval query for rmap alloc operations on shared files
        xfs: add shared rmap map/unmap/convert log item types
        xfs: increase log reservations for reflink
        ...
      35a891be