Commit 5025b99c authored by David S. Miller

Merge branch 'cake-qdisc'

Toke Høiland-Jørgensen says:

====================
sched: Add Common Applications Kept Enhanced (cake) qdisc

This patch series adds the CAKE qdisc and has been split up to ease
review.

I have attempted to split out each configurable feature into its own patch.
The first commit adds the base shaper and packet scheduler, while
subsequent commits add the optional features. The full userspace API and
most data structures are included in the first commit, but options not
understood in the base version will be ignored.
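The base version ignores options it does not understand. The userspace sketch below shows that behaviour. It assumes a simplified type-length-value layout that mirrors the netlink attribute header: a 4-byte header holding length then type, with attributes padded to 4 bytes.

`attr_hdr`, `put_attr`, `parse_opts` and `struct base_opts` are hypothetical names. They are not kernel or libmnl APIs. The attribute numbers 2 and 5 match `TCA_CAKE_BASE_RATE64` and `TCA_CAKE_FLOW_MODE` in the uapi enum added by this merge.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the netlink attribute header (struct nlattr):
 * total length in bytes (header + payload), then the attribute type. */
struct attr_hdr {
	uint16_t len;
	uint16_t type;
};

/* Attributes are padded to a 4-byte boundary, like NLA_ALIGN(). */
#define ATTR_ALIGN(n)	(((n) + 3u) & ~3u)
#define ATTR_HDRLEN	ATTR_ALIGN((unsigned int)sizeof(struct attr_hdr))

/* Assumed subset of the CAKE option numbering from the uapi enum. */
enum { OPT_BASE_RATE64 = 2, OPT_FLOW_MODE = 5 };

struct base_opts {
	uint64_t rate;		/* bytes per second */
	uint32_t flow_mode;
	int ignored;		/* attributes this version did not understand */
};

/* Append one attribute at 'off'; returns the next aligned offset. */
static size_t put_attr(uint8_t *buf, size_t off, uint16_t type,
		       const void *data, uint16_t len)
{
	struct attr_hdr h = { (uint16_t)(ATTR_HDRLEN + len), type };
	unsigned int pad = ATTR_ALIGN(h.len) - h.len;

	memcpy(buf + off, &h, sizeof(h));
	memcpy(buf + off + ATTR_HDRLEN, data, len);
	memset(buf + off + ATTR_HDRLEN + len, 0, pad);	/* zero the padding */
	return off + ATTR_HDRLEN + len + pad;
}

/* Walk a flat attribute stream: keep known options, skip the rest. */
static void parse_opts(const uint8_t *buf, size_t total, struct base_opts *o)
{
	size_t off = 0;

	while (off + ATTR_HDRLEN <= total) {
		struct attr_hdr h;
		const uint8_t *payload;
		size_t plen;

		memcpy(&h, buf + off, sizeof(h));
		if (h.len < ATTR_HDRLEN || h.len > total - off)
			break;	/* malformed length: never read past the end */

		payload = buf + off + ATTR_HDRLEN;
		plen = h.len - ATTR_HDRLEN;

		if (h.type == OPT_BASE_RATE64 && plen == sizeof(uint64_t))
			memcpy(&o->rate, payload, sizeof(uint64_t));
		else if (h.type == OPT_FLOW_MODE && plen == sizeof(uint32_t))
			memcpy(&o->flow_mode, payload, sizeof(uint32_t));
		else
			o->ignored++;	/* not understood: skip it, don't fail */

		off += ATTR_ALIGN(h.len);
	}
}
```

An older parser fed a stream that contains a newer option will still pick up the options it knows. It simply counts the unknown one instead of rejecting the whole request.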

The result of applying the entire series is identical to the out-of-tree
version that has seen extensive testing in previous deployments, most
notably as an out-of-tree patch to OpenWrt. However, note that I have only
compile-tested the individual patches, so the whole series should be
considered as a unit.

---
Changelog

v19:
  - Rebase to current net-next.
  - Don't rely on the value of sch->q.qlen to break loops; fixes possible
    infinite loop on multi-queue devices.
  - Don't overwrite NAT flag when setting flow mode.

v18:
  - Rework classification logic in the diffserv case to always hash if
    filter doesn't select a queue, and to run TC filters before
    selecting the diffserv tin (allowing filter to influence this).
  - Make sure we always call qdisc_watchdog_init() in cake_init(), so we
    don't crash in cake_destroy().

v17:
  - Rebase to newest net-next and move the conntrack callback to
    nf_ct_hook
  - Fix a compile error when NF_CONNTRACK is unset.

v16:
  - Move conntrack lookup function into conntrack core and read it via
    RCU so it is only active when the nf_conntrack module is loaded.
    This avoids the module dependency on conntrack for NAT mode. Thanks
    to Pablo for the idea.

v15:
  - Handle ECN flags in ACK filter

v14:
  - Handle seqno wraps and DSACKs in ACK filter

v13:
  - Avoid ktime_t to scalar compares
  - Add class dumping and basic stats
  - Fail with ENOTSUPP when requesting NAT mode and conntrack is not
    available.
  - Parse all TCP options in ACK filter and make sure to only drop safe
    ones. Also handle SACK ranges properly.

v12:
  - Get rid of custom time typedefs. Use ktime_t for time and u64 for
    duration instead.

v11:
  - Fix overhead compensation calculation for GSO packets
  - Change configured rate to be u64 (I ran out of bits before I ran out
    of CPU when testing the effects of the above)
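The v11 and v4 entries fix overhead compensation for GSO packets. The sketch below shows why this matters. A GSO super-packet leaves the device as several segments, and each segment pays the per-packet overhead, minimum packet unit and any ATM/PTM framing separately.

The framing rules used here are assumptions made for illustration:
- The ATM rule rounds up to whole 48-byte cell payloads at 53 bytes per cell, which is standard ATM cell arithmetic.
- The PTM rule adds one byte per 64 bytes or part thereof, which approximates 64b/65b encoding.

All names are hypothetical. This is not the `sch_cake.c` implementation.

```c
#include <stdint.h>

/* Hypothetical framing modes, numbered like CAKE_ATM_NONE/ATM/PTM. */
enum framing { FRAMING_NONE = 0, FRAMING_ATM, FRAMING_PTM };

/* Adjusted on-wire length of one packet of 'len' bytes. */
static uint32_t adjusted_len(uint32_t len, int32_t overhead, uint32_t mpu,
			     enum framing mode)
{
	int64_t l = (int64_t)len + overhead;	/* overhead may be negative */

	if (l < 0)
		l = 0;
	if (l < mpu)
		l = mpu;			/* minimum packet unit */

	if (mode == FRAMING_ATM)
		l = (l + 47) / 48 * 53;		/* 48-byte payload per 53-byte cell */
	else if (mode == FRAMING_PTM)
		l += (l + 63) / 64;		/* ~1 byte per 64 bytes or part */

	return (uint32_t)l;
}

/* Sum the adjusted length over every segment of a GSO super-packet.
 * 'hdr_len' is repeated in each segment, 'gso_size' is the payload of a
 * full segment and 'payload' is the total payload carried. */
static uint64_t gso_adjusted_len(uint32_t hdr_len, uint32_t gso_size,
				 uint32_t payload, int32_t overhead,
				 uint32_t mpu, enum framing mode)
{
	uint32_t full = payload / gso_size;
	uint32_t rest = payload % gso_size;
	uint64_t total = (uint64_t)full *
		adjusted_len(hdr_len + gso_size, overhead, mpu, mode);

	if (rest)	/* the last, shorter segment */
		total += adjusted_len(hdr_len + rest, overhead, mpu, mode);
	return total;
}
```

Consider a 2996-byte payload with 54-byte headers, 1448-byte segments and 18 bytes of overhead. Charging the overhead once gives 3068 bytes, while the per-segment sum is 3212 bytes. A shaper that undercounts like this would let more traffic through than the link can carry.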

v10:
  - Christmas tree gardening (fix variable declarations to be in reverse
    line length order)

v9:
  - Remove duplicated checks around kvfree() and just call it
    unconditionally.
  - Don't pass __GFP_NOWARN when allocating memory
  - Move options in cake_dump() that are related to optional features to
    later patches implementing the features.
  - Support attaching filters to the qdisc and use the classification
    result to select flow queue.
  - Support overriding diffserv priority tin from skb->priority

v8:
  - Remove inline keyword from function definitions
  - Simplify ACK filter; remove the complex state handling to make the
    logic easier to follow. This will potentially be a bit less efficient,
    but I have not been able to measure a difference.

v7:
  - Split up patch into a series to ease review.
  - Constify the ACK filter.

v6:
  - Fix 6in4 encapsulation checks in ACK filter code
  - Checkpatch fixes

v5:
  - Refactor ACK filter code and hopefully fix the safety issues
    properly this time.

v4:
  - Only split GSO packets if shaping at speeds <= 1Gbps
  - Fix overhead calculation code to also work for GSO packets
  - Don't re-implement kvzalloc()
  - Remove local header include from out-of-tree build (fixes kbuild-bot
    complaint).
  - Several fixes to the ACK filter:
    - Check pskb_may_pull() before deref of transport headers.
    - Don't run ACK filter logic on split GSO packets
    - Fix TCP sequence number compare to deal with wraparounds
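The wraparound fix above (and the v14 seqno-wrap handling) depends on comparing TCP sequence numbers modulo 2^32. A plain `<` gives the wrong answer once the counter wraps. For example, deciding whether one queued ACK acknowledges at least as much data as another needs this kind of comparison.

The sketch uses the same idea as the kernel's `before()`/`after()` helpers in `include/net/tcp.h`. The function names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* seq1 is "before" seq2 if the distance seq1 - seq2, taken modulo 2^32
 * and reinterpreted as signed, is negative. Converting an out-of-range
 * value to int32_t is implementation-defined before C23; GCC and Clang
 * wrap modulo 2^32, which the kernel relies on as well. */
static bool seq_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

static bool seq_after(uint32_t seq1, uint32_t seq2)
{
	return seq_before(seq2, seq1);
}
```

Suppose a sequence number just below the wrap point (0xFFFFFFF0) is compared with one just past it (0x10). The signed-distance test correctly treats 0xFFFFFFF0 as earlier, whereas the unsigned `<` does not.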

v3:
  - Use IS_REACHABLE() macro to fix compilation when sch_cake is
    built-in and conntrack is a module.
  - Switch the stats output to use nested netlink attributes instead
    of a versioned struct.
  - Remove GPL boilerplate.
  - Fix array initialisation style.

v2:
  - Fix kbuild test bot complaint
  - Clean up the netlink ABI
  - Fix checkpatch complaints
  - A few tweaks to the behaviour of cake based on testing carried out
    while writing the paper.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
parents 52b50921 0c850344
@@ -414,8 +414,17 @@ nf_nat_decode_session(struct sk_buff *skb, struct flowi *fl, u_int8_t family)
extern void (*ip_ct_attach)(struct sk_buff *, const struct sk_buff *) __rcu;
void nf_ct_attach(struct sk_buff *, const struct sk_buff *);
struct nf_conntrack_tuple;
bool nf_ct_get_tuple_skb(struct nf_conntrack_tuple *dst_tuple,
			 const struct sk_buff *skb);
#else
static inline void nf_ct_attach(struct sk_buff *new, struct sk_buff *skb) {}
struct nf_conntrack_tuple;
static inline bool nf_ct_get_tuple_skb(struct nf_conntrack_tuple *dst_tuple,
				       const struct sk_buff *skb)
{
	return false;
}
#endif
struct nf_conn;
@@ -424,6 +433,8 @@ enum ip_conntrack_info;
struct nf_ct_hook {
	int (*update)(struct net *net, struct sk_buff *skb);
	void (*destroy)(struct nf_conntrack *);
	bool (*get_tuple_skb)(struct nf_conntrack_tuple *,
			      const struct sk_buff *);
};
extern struct nf_ct_hook __rcu *nf_ct_hook;
......
@@ -955,4 +955,118 @@ enum {
#define TCA_ETF_MAX (__TCA_ETF_MAX - 1)
/* CAKE */
enum {
	TCA_CAKE_UNSPEC,
	TCA_CAKE_PAD,
	TCA_CAKE_BASE_RATE64,
	TCA_CAKE_DIFFSERV_MODE,
	TCA_CAKE_ATM,
	TCA_CAKE_FLOW_MODE,
	TCA_CAKE_OVERHEAD,
	TCA_CAKE_RTT,
	TCA_CAKE_TARGET,
	TCA_CAKE_AUTORATE,
	TCA_CAKE_MEMORY,
	TCA_CAKE_NAT,
	TCA_CAKE_RAW,
	TCA_CAKE_WASH,
	TCA_CAKE_MPU,
	TCA_CAKE_INGRESS,
	TCA_CAKE_ACK_FILTER,
	TCA_CAKE_SPLIT_GSO,
	__TCA_CAKE_MAX
};
#define TCA_CAKE_MAX (__TCA_CAKE_MAX - 1)
enum {
	__TCA_CAKE_STATS_INVALID,
	TCA_CAKE_STATS_PAD,
	TCA_CAKE_STATS_CAPACITY_ESTIMATE64,
	TCA_CAKE_STATS_MEMORY_LIMIT,
	TCA_CAKE_STATS_MEMORY_USED,
	TCA_CAKE_STATS_AVG_NETOFF,
	TCA_CAKE_STATS_MIN_NETLEN,
	TCA_CAKE_STATS_MAX_NETLEN,
	TCA_CAKE_STATS_MIN_ADJLEN,
	TCA_CAKE_STATS_MAX_ADJLEN,
	TCA_CAKE_STATS_TIN_STATS,
	TCA_CAKE_STATS_DEFICIT,
	TCA_CAKE_STATS_COBALT_COUNT,
	TCA_CAKE_STATS_DROPPING,
	TCA_CAKE_STATS_DROP_NEXT_US,
	TCA_CAKE_STATS_P_DROP,
	TCA_CAKE_STATS_BLUE_TIMER_US,
	__TCA_CAKE_STATS_MAX
};
#define TCA_CAKE_STATS_MAX (__TCA_CAKE_STATS_MAX - 1)
enum {
	__TCA_CAKE_TIN_STATS_INVALID,
	TCA_CAKE_TIN_STATS_PAD,
	TCA_CAKE_TIN_STATS_SENT_PACKETS,
	TCA_CAKE_TIN_STATS_SENT_BYTES64,
	TCA_CAKE_TIN_STATS_DROPPED_PACKETS,
	TCA_CAKE_TIN_STATS_DROPPED_BYTES64,
	TCA_CAKE_TIN_STATS_ACKS_DROPPED_PACKETS,
	TCA_CAKE_TIN_STATS_ACKS_DROPPED_BYTES64,
	TCA_CAKE_TIN_STATS_ECN_MARKED_PACKETS,
	TCA_CAKE_TIN_STATS_ECN_MARKED_BYTES64,
	TCA_CAKE_TIN_STATS_BACKLOG_PACKETS,
	TCA_CAKE_TIN_STATS_BACKLOG_BYTES,
	TCA_CAKE_TIN_STATS_THRESHOLD_RATE64,
	TCA_CAKE_TIN_STATS_TARGET_US,
	TCA_CAKE_TIN_STATS_INTERVAL_US,
	TCA_CAKE_TIN_STATS_WAY_INDIRECT_HITS,
	TCA_CAKE_TIN_STATS_WAY_MISSES,
	TCA_CAKE_TIN_STATS_WAY_COLLISIONS,
	TCA_CAKE_TIN_STATS_PEAK_DELAY_US,
	TCA_CAKE_TIN_STATS_AVG_DELAY_US,
	TCA_CAKE_TIN_STATS_BASE_DELAY_US,
	TCA_CAKE_TIN_STATS_SPARSE_FLOWS,
	TCA_CAKE_TIN_STATS_BULK_FLOWS,
	TCA_CAKE_TIN_STATS_UNRESPONSIVE_FLOWS,
	TCA_CAKE_TIN_STATS_MAX_SKBLEN,
	TCA_CAKE_TIN_STATS_FLOW_QUANTUM,
	__TCA_CAKE_TIN_STATS_MAX
};
#define TCA_CAKE_TIN_STATS_MAX (__TCA_CAKE_TIN_STATS_MAX - 1)
#define TC_CAKE_MAX_TINS (8)
enum {
	CAKE_FLOW_NONE = 0,
	CAKE_FLOW_SRC_IP,
	CAKE_FLOW_DST_IP,
	CAKE_FLOW_HOSTS,    /* = CAKE_FLOW_SRC_IP | CAKE_FLOW_DST_IP */
	CAKE_FLOW_FLOWS,
	CAKE_FLOW_DUAL_SRC, /* = CAKE_FLOW_SRC_IP | CAKE_FLOW_FLOWS */
	CAKE_FLOW_DUAL_DST, /* = CAKE_FLOW_DST_IP | CAKE_FLOW_FLOWS */
	CAKE_FLOW_TRIPLE,   /* = CAKE_FLOW_HOSTS | CAKE_FLOW_FLOWS */
	CAKE_FLOW_MAX,
};
enum {
	CAKE_DIFFSERV_DIFFSERV3 = 0,
	CAKE_DIFFSERV_DIFFSERV4,
	CAKE_DIFFSERV_DIFFSERV8,
	CAKE_DIFFSERV_BESTEFFORT,
	CAKE_DIFFSERV_PRECEDENCE,
	CAKE_DIFFSERV_MAX
};
enum {
	CAKE_ACK_NONE = 0,
	CAKE_ACK_FILTER,
	CAKE_ACK_AGGRESSIVE,
	CAKE_ACK_MAX
};
enum {
	CAKE_ATM_NONE = 0,
	CAKE_ATM_ATM,
	CAKE_ATM_PTM,
	CAKE_ATM_MAX
};
#endif
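The comments in the `CAKE_FLOW_*` enum show that the flow modes are bit combinations. Bit 0 selects source-host fairness, bit 1 selects destination-host fairness, and bit 2 selects per-flow isolation.

The sketch below checks those identities against a local copy of the enum, so it compiles without kernel headers. It also illustrates the v19 entry "Don't overwrite NAT flag when setting flow mode". That part assumes a hypothetical layout in which a NAT flag bit (here `0x40`) shares a field with the 3-bit mode. The mask values and helper names are assumptions, not taken from this diff.

```c
#include <stdbool.h>

/* Local copy of the CAKE_FLOW_* values from the uapi enum. */
enum {
	FLOW_NONE = 0,
	FLOW_SRC_IP,
	FLOW_DST_IP,
	FLOW_HOSTS,
	FLOW_FLOWS,
	FLOW_DUAL_SRC,
	FLOW_DUAL_DST,
	FLOW_TRIPLE,
	FLOW_MAX,
};

/* Hypothetical helpers: which parts of a packet a mode keys on. */
static bool mode_uses_src_host(int mode) { return mode & FLOW_SRC_IP; }
static bool mode_uses_dst_host(int mode) { return mode & FLOW_DST_IP; }
static bool mode_uses_flows(int mode)    { return mode & FLOW_FLOWS; }

/* Assumed layout: mode in the low 3 bits, NAT flag stored above them. */
#define FLOW_MASK	0x07u
#define FLOW_NAT_FLAG	0x40u

/* Replace only the mode bits, preserving a previously set NAT flag. */
static unsigned int set_flow_mode(unsigned int field, unsigned int mode)
{
	return (field & FLOW_NAT_FLAG) | (mode & FLOW_MASK);
}
```

Encoding the modes as bits lets the hashing code test each property independently rather than switching over all eight modes.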
@@ -603,6 +603,21 @@ void nf_conntrack_destroy(struct nf_conntrack *nfct)
}
EXPORT_SYMBOL(nf_conntrack_destroy);
bool nf_ct_get_tuple_skb(struct nf_conntrack_tuple *dst_tuple,
			 const struct sk_buff *skb)
{
	struct nf_ct_hook *ct_hook;
	bool ret = false;
	rcu_read_lock();
	ct_hook = rcu_dereference(nf_ct_hook);
	if (ct_hook)
		ret = ct_hook->get_tuple_skb(dst_tuple, skb);
	rcu_read_unlock();
	return ret;
}
EXPORT_SYMBOL(nf_ct_get_tuple_skb);
/* Built-in default zone used e.g. by modules. */
const struct nf_conntrack_zone nf_ct_zone_dflt = {
	.id = NF_CT_DEFAULT_ZONE_ID,
......
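`nf_ct_get_tuple_skb()` lets a caller such as sch_cake ask conntrack for a tuple without a hard module dependency. It reads the `nf_ct_hook` pointer under RCU and returns `false` when no provider is registered, as described in the v16 changelog entry.

Below is a userspace analogue of that optional-hook pattern. C11 atomics stand in for RCU: an acquire load plays the role of `rcu_dereference()` and a release store plays the role of publishing the hook. There is no grace-period handling, so the provider must outlive all callers. `struct tuple`, `ct_hook_ptr` and the register helpers are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nf_conntrack_tuple. */
struct tuple { unsigned int src, dst; };

/* Analogue of struct nf_ct_hook: a table of optional callbacks. */
struct ct_hook {
	bool (*get_tuple)(struct tuple *dst, const void *pkt);
};

/* Analogue of the __rcu nf_ct_hook pointer: NULL until registered. */
static _Atomic(const struct ct_hook *) ct_hook_ptr;

/* Caller side, mirroring nf_ct_get_tuple_skb(): no provider, no tuple. */
static bool get_tuple(struct tuple *dst, const void *pkt)
{
	const struct ct_hook *h =
		atomic_load_explicit(&ct_hook_ptr, memory_order_acquire);

	return h ? h->get_tuple(dst, pkt) : false;
}

/* Provider side: a fake lookup that always succeeds. */
static bool demo_get_tuple(struct tuple *dst, const void *pkt)
{
	(void)pkt;
	dst->src = 0x0a000001u;		/* 10.0.0.1 */
	dst->dst = 0x08080808u;		/* 8.8.8.8 */
	return true;
}

static const struct ct_hook demo_hook = { .get_tuple = demo_get_tuple };

static void register_hook(const struct ct_hook *h)
{
	atomic_store_explicit(&ct_hook_ptr, h, memory_order_release);
}

static void unregister_hook(void)
{
	atomic_store_explicit(&ct_hook_ptr, NULL, memory_order_release);
}
```

The caller compiles and links whether or not a provider exists. It simply takes the `false` path until one registers, which is the property that removes the module dependency.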
@@ -1683,6 +1683,41 @@ static int nf_conntrack_update(struct net *net, struct sk_buff *skb)
	return 0;
}
static bool nf_conntrack_get_tuple_skb(struct nf_conntrack_tuple *dst_tuple,
				       const struct sk_buff *skb)
{
	const struct nf_conntrack_tuple *src_tuple;
	const struct nf_conntrack_tuple_hash *hash;
	struct nf_conntrack_tuple srctuple;
	enum ip_conntrack_info ctinfo;
	struct nf_conn *ct;
	ct = nf_ct_get(skb, &ctinfo);
	if (ct) {
		src_tuple = nf_ct_tuple(ct, CTINFO2DIR(ctinfo));
		memcpy(dst_tuple, src_tuple, sizeof(*dst_tuple));
		return true;
	}
	if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
			       NFPROTO_IPV4, dev_net(skb->dev),
			       &srctuple))
		return false;
	hash = nf_conntrack_find_get(dev_net(skb->dev),
				     &nf_ct_zone_dflt,
				     &srctuple);
	if (!hash)
		return false;
	ct = nf_ct_tuplehash_to_ctrack(hash);
	src_tuple = nf_ct_tuple(ct, !hash->tuple.dst.dir);
	memcpy(dst_tuple, src_tuple, sizeof(*dst_tuple));
	nf_ct_put(ct);
	return true;
}
/* Bring out ya dead! */
static struct nf_conn *
get_next_corpse(int (*iter)(struct nf_conn *i, void *data),
@@ -2204,6 +2239,7 @@ int nf_conntrack_init_start(void)
static struct nf_ct_hook nf_conntrack_hook = {
	.update		= nf_conntrack_update,
	.destroy	= destroy_conntrack,
	.get_tuple_skb	= nf_conntrack_get_tuple_skb,
};
void nf_conntrack_init_end(void)
......
@@ -295,6 +295,17 @@ config NET_SCH_FQ_CODEL
	  If unsure, say N.
config NET_SCH_CAKE
	tristate "Common Applications Kept Enhanced (CAKE)"
	help
	  Say Y here if you want to use the Common Applications Kept Enhanced
	  (CAKE) queue management algorithm.
	  To compile this driver as a module, choose M here: the module
	  will be called sch_cake.
	  If unsure, say N.
config NET_SCH_FQ
	tristate "Fair Queue"
	help
......
@@ -50,6 +50,7 @@ obj-$(CONFIG_NET_SCH_CHOKE) += sch_choke.o
obj-$(CONFIG_NET_SCH_QFQ) += sch_qfq.o
obj-$(CONFIG_NET_SCH_CODEL) += sch_codel.o
obj-$(CONFIG_NET_SCH_FQ_CODEL) += sch_fq_codel.o
obj-$(CONFIG_NET_SCH_CAKE) += sch_cake.o
obj-$(CONFIG_NET_SCH_FQ) += sch_fq.o
obj-$(CONFIG_NET_SCH_HHF) += sch_hhf.o
obj-$(CONFIG_NET_SCH_PIE) += sch_pie.o
......