Commit 2a016d48 authored by Kirill Smelkov

Draft support for E-UTRAN IP Throughput KPI

The most interesting patches are

- d102ffaa (drb: Start of the package)
- 5bf7dc1c (amari.{drb,xlog}: Provide aggregated DRB statistics in the form of synthetic x.drb_stats message)
- 499a7c1b (amari.kpi: Teach LogMeasure to handle x.drb_stats messages)
- 2824f50d (kpi: Calc: Add support for E-UTRAN IP Throughput KPI)
- 4b2c8c21 (demo/kpidemo.*: Add support for E-UTRAN IP Throughput KPI + demonstrate it in the notebook)

The other patches introduce or adjust needed infrastructure. A byproduct
of particular note is that kpi.Measurement now supports QCI.

A demo can be seen in the last part of
https://nbviewer.org/urls/lab.nexedi.com/kirr/xlte/raw/43aac33e/demo/kpidemo.ipynb

Below we provide an overview of the implementation.

Overview of E-UTRAN IP Throughput computation
---------------------------------------------

Before we begin explaining how IP Throughput is computed, let's first refresh
what it is and have a look at what is required to compute it reasonably.

This KPI is defined in TS 32.450[1] and aggregates transmission volume and
time over bursts of transmissions from an average UE point of view. It should be
particularly noted that only the time during which transmission is actually
going on is accounted. For example if a UE receives 10KB over a 4ms burst, and
there is no transmission to it during the rest of, say, 1 minute, the downlink IP
Throughput for that UE over the minute is 20Mbit/s (= 8·10KB/4ms), not 1.3Kbit/s (= 8·10KB/60s).
This KPI basically shows what the speed would be to e.g. download a response for
an HTTP request issued from a mobile.

[1] https://www.etsi.org/deliver/etsi_ts/132400_132499/132450/16.00.00_60/ts_132450v160000p.pdf#page=13
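
To make the above arithmetic concrete, here is a minimal sketch of how volume
and time aggregate into the KPI (illustrative code, not part of xlte):

    # ip_throughput aggregates transmission volume and time over bursts.
    # bursts = [(tx_bytes, tx_time_in_seconds), ...]; only the time during
    # which transmission was going on is accounted, not the whole window.
    def ip_throughput(bursts):  # -> bit/s
        Σbytes = sum(b for b, _ in bursts)
        Σtime  = sum(t for _, t in bursts)
        return 8*Σbytes / Σtime

    ip_throughput([(10*1000, 4e-3)])  # ≈ 20e6, i.e. 20Mbit/s for 10KB over 4ms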

To compute IP Throughput we thus need to know the Σ of transmitted
bytes, and the Σ of the time of all transmission bursts.

The Σ of bytes is relatively easy to get. eNB already provides close values in
the overall `stats` and in per-UE `ue_get[stats]` messages. However there is
nothing readily available out of the box for the Σ of burst transmission time.
Thus we need to measure the time of transmission bursts ourselves somehow.

It turns out that with the current state of things the only practical way to
measure it to some degree is to poll eNB frequently with `ue_get[stats]` and
estimate transmission time based on the δ of `ue_get` timestamps.

Let's see how frequently we need to poll to reach reasonable accuracy of the resulting throughput.

A common situation for HTTP requests issued via LTE is that downloading the
response content takes only a few milliseconds. For example I used the chromium
network profiler to access various sites via internet tethered from my phone
and saw that for many requests the response content download time was e.g. 4ms,
5ms, 3.2ms, etc. The accuracy of measuring transmission time should thus be on
the order of a millisecond to cover that properly. It makes a real difference for
reported throughput whether a download sample with 10KB took 4ms, or it took
e.g. "something under 100ms". In the first case we know that for that sample
downlink throughput is 2500KB/s, while in the second case all we know is that
downlink throughput is "higher than 100KB/s" - a 25× difference, and not
certain. Similarly if we poll at a 10ms rate we would get that throughput is "higher
than 1000KB/s" - a 2.5× difference from the actual value. The accuracy of 1
millisecond coincides with the TTI duration and with how downlink/uplink
transmissions generally work in LTE.

With the above, the scheme to compute IP Throughput looks to be as
follows: poll eNB at a 1000Hz rate for `ue_get[stats]`, process the retrieved
information into per-UE and per-QCI streams, detect bursts on each UE/QCI pair,
and aggregate `tx_bytes` and `tx_time` from every burst.
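
A hedged sketch of that burst detection for one UE/QCI stream might look as
follows (illustrative names only; the real logic lives in amari/drb.py and is
more involved):

    # aggregate_bursts detects transmission bursts in a stream of samples
    # polled at ~1ms δt, and aggregates tx_bytes and tx_time over them.
    # samples = [(timestamp, total_tx_bytes_so_far), ...] for one UE/QCI pair.
    def aggregate_bursts(samples):  # -> (tx_bytes, tx_time)
        tx_bytes = tx_time = 0
        for (t_prev, b_prev), (t, b) in zip(samples, samples[1:]):
            δb = b - b_prev
            if δb > 0:               # transmission was going on in (t_prev, t]
                tx_bytes += δb
                tx_time  += t - t_prev
        return tx_bytes, tx_time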

This looks straightforward, but 1000Hz polling will likely create
non-negligible additional load on the system and disturb eNB itself,
introducing much jitter and harming its latency requirements. That's probably
why eNB actually rate-limits WebSocket requests not to go higher than 100Hz -
a frequency 10 times lower than what we need to reach reasonable accuracy for
IP throughput.

Fortunately there is additional information that provides a way to improve the
accuracy of measured `tx_time` even when polling every 10ms at a 100Hz rate:
that additional information is the number of transport blocks transmitted to/from
a UE. If we know that during a 10ms frame e.g. 4 transport blocks were transmitted
to the UE, that there were no retransmissions *and* that eNB is not congested, we can
reasonably estimate that it was actually a 4ms transmission. And if eNB is
congested we can still say that transmission time is somewhere in the `[4ms, 10ms]`
interval, because transmitting each transport block takes 1 TTI. Even if
imprecise, that still provides some information that could be useful.
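
The following sketch illustrates this estimation for one UE in one 10ms frame
(a sketch assuming 1 TTI = 1ms; names are illustrative, not the actual xlte
API):

    # tx_time_bounds estimates, in ms, how long transmission to a UE lasted
    # inside a 10ms frame, given the number of transmitted transport blocks
    # and whether eNB was congested during that frame.
    def tx_time_bounds(ntb, congested, frame_ms=10):  # -> (lo_ms, hi_ms)
        if ntb == 0:
            return (0, 0)
        lo = min(ntb, frame_ms)  # each transport block takes >= 1 TTI = 1ms
        hi = frame_ms if congested else lo
        return (lo, hi)

    tx_time_bounds(4, congested=False)  # -> (4, 4)   ~ "a 4ms transmission"
    tx_time_bounds(4, congested=True)   # -> (4, 10)  ~ "somewhere in [4ms, 10ms]"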

Also 100Hz polling turns out to be acceptable from a performance point of view
and does not disturb the system much. For example on the callbox machine the
process that issues polls takes only about 3% of CPU load on one core, the CPU
usage of eNB practically does not change, and its reported tx/rx latency does
not change either. For sure, there is some disturbance, but it appears to
be small. To get a better idea of what rate of polling is possible, I made
an experiment with the poller accessing my own websocket echo server quickly
implemented in python. Neither the poller nor the echo server is optimized,
but without rate-limiting they could go to 8000Hz frequency while reaching 100%
usage of one CPU core. That 8000Hz is 80× the 100Hz frequency actually allowed
by eNB. This shows what kind of polling frequency limit the system can handle,
if absolutely needed, and that 100Hz turns out to be not so high a frequency.
Also the Linux 5.6 kernel, installed on the callbox from Fedora32, is
configured with `CONFIG_HZ=1000`, which likely helps here.
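
For reference, such an echo server fits into a few lines. Below is a sketch
assuming the `websockets` package, not the exact script used in the
experiment:

    import asyncio, websockets

    async def echo(ws):
        async for msg in ws:    # send back every received message
            await ws.send(msg)

    async def main():
        async with websockets.serve(echo, "localhost", 9001):
            await asyncio.Future()  # serve forever

    asyncio.run(main())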

Implementation overview
~~~~~~~~~~~~~~~~~~~~~~~

The scheme to compute E-UTRAN IP Throughput is thus as follows: poll eNB at
100Hz frequency for `ue_get[stats]` and retrieve information about per-UE/QCI
streams and the number of transport blocks dl/ul-ed to the UE in question
during that 10ms frame. Estimate `tx_time` taking into account the number of
transmitted transport blocks. And estimate whether eNB is congested or not
based on `dl_use_avg`/`ul_use_avg` taken from `stats`. For the latter we also
need to poll for `stats` at 100Hz frequency and synchronize the
`ue_get[stats]` and `stats` requests in time, so that they both cover the same
time interval of a particular frame.
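
For the congestion estimate, a minimal check could look like the following
sketch (the 0.9 threshold and the exact layout of the `stats` reply are
assumptions here, not necessarily what xlte uses):

    # is_congested reports whether eNB looked congested during the sampled
    # frame, judging by average DL/UL cell utilization from `stats`.
    def is_congested(stats, thresh=0.9):
        return any(cell['dl_use_avg'] >= thresh or cell['ul_use_avg'] >= thresh
                   for cell in stats['cells'].values())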

Then organize the polling process to provide aggregated statistics in the form of
the new `x.drb_stats` message, and teach `xamari xlog` to save those messages to
`enb.xlog` together with `stats`. Then further adjust `amari.kpi.LogMeasure`
and the generic `kpi.Measurement` and `kpi.Calc` to handle DRB-related data.
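
With that in place, logging both regular and synthetic DRB statistics could be
invoked e.g. as follows (the URI and periods here are illustrative):

    xamari xlog ws://192.168.0.1:9001 stats/10s x.drb_stats/10s >enb.xlog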

That is how it is implemented.

The main part, which performs the 100Hz polling and flow aggregation, is in
amari/drb.py. There `Sampler` extracts bursts of data transmissions from the
stream of `ue_get[stats]` observations, and `x_stats_srv` organizes the whole
100Hz sampling process and provides aggregated `x.drb_stats` messages to
`amari.xlog`.

Even though the main idea is relatively straightforward, several aspects
deserve to be noted:

1. information about transmitted bytes and the corresponding transmitted
   transport blocks is emitted by eNB not synchronized in time. The reason is
   that, for example, for DL a block is transmitted via PDCCH+PDSCH during one
   TTI, and then the base station awaits HARQ ACK/NACK. That ACK/NACK comes
   later via PUCCH or PUSCH. The time window between the original transmission
   and the reception of the ACK/NACK is 4 TTIs for FDD and 4-13 TTIs for
   TDD(*). And Amarisoft LTEENB updates the counters for dl_total_bytes and
   dl_tx at different times:

       ue.erab.dl_total_bytes      - right after sending data on  PDCCH+PDSCH
       ue.cell.{dl_tx,dl_retx}     - after receiving ACK/NACK via PUCCH|PUSCH

   This way an update to dl_total_bytes might be seen in one frame (= 10·TTI),
   while the corresponding update to dl_tx/dl_retx might be seen either in the
   same, or the next, or the next-next frame.

   `Sampler` brings δ(tx_bytes) and #tx_tb in sync itself via `BitSync`.

2. when we see multiple transmissions related to a UE on different QCIs, we
   cannot directly use the corresponding global number of transport blocks to
   estimate transmission times, because we do not know how the eNB scheduler
   placed those transmissions onto the resource map. So without additional
   information we can only estimate corresponding lower and upper bounds (see
   the sketch after this list).

3. for output stability, and to avoid throughput being affected by partial
   fill of the tail TTI of a burst, E-UTRAN IP Throughput is required to be
   computed without taking into account the last TTI of every sample. We don't
   have that level of detail, since all we have is the total amount of
   transmitted bytes in a burst and an estimation of how long in time the
   burst is. Thus, once again, we can only provide an estimation such that the
   resulting E-UTRAN IP Throughput uncertainty window covers the right value
   required by the 3GPP standard.
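
Regarding aspect 2, the kind of bounds we can give might be sketched as
follows (illustrative names; the actual logic is in amari/drb.py):

    # tx_time_bounds_qci bounds per-QCI transmission time inside one 10ms
    # frame when only the UE-global number of transport blocks is known.
    def tx_time_bounds_qci(qci_bytes, ntb_total, frame_ms=10):
        # qci_bytes = {qci: δ(tx_bytes)} for one UE during the frame
        bounds = {}
        for qci, nbytes in qci_bytes.items():
            if nbytes == 0:
                bounds[qci] = (0, 0)
                continue
            lo = 1                          # at least 1 TTI went to this QCI
            hi = min(ntb_total, frame_ms)   # at most all TBs went to this QCI
            bounds[qci] = (lo, hi)
        return bounds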

A curious reader might be interested to look at the tests in `amari/drb_test.py`,
and at the whole set of changes that brought E-UTRAN IP Throughput alive.

Limitations
~~~~~~~~~~~

Current implementation has the following limitations:

- we account for whole PDCP traffic instead of only IP traffic.
- the KPI is computed with an uncertainty window instead of being precise,
  even when the connection to eNB is alive all the time. The shorter the
  bursts, the bigger the uncertainty.
- the implementation works correctly for FDD, but not for TDD. That's because
  BitSync currently supports only the "next frame" case, and support for the
  "next-next frame" case is marked as TODO.
- the eNB `t` monitor command practically stops working and now only reports
  ``Warning, remote API ue_get (stats = true) pending...`` instead of
  reporting useful information. This is because, contrary to `stats`, for
  `ue_get` eNB does not maintain per-connection state and uses global
  singleton counters.
- the performance overhead might be more noticeable on machines less powerful
  than the callbox.

To address these limitations I plan to talk to Amarisoft about eNB
improvements, so that E-UTRAN IP Throughput could be computed precisely from
DRB statistics provided directly by eNB itself.

However it is still useful to have the current implementation, even with all
its limitations, because it already works today with existing eNB versions.

Kirill
parents e1a5ceea 43aac33e
@@ -5,6 +5,7 @@
XLTE repository provides assorted tools and packages with functionality related to LTE:
- `kpi` - process measurements and compute KPIs from them.
- `amari.drb` - infrastructure to process flows on data radio bearers.
- `amari.kpi` - driver for Amarisoft LTE stack to retrieve KPI-related measurements from logs.
- `amari.xlog` - extra logging facilities for Amarisoft LTE stack.
- `xamari` - supplementary tool for managing Amarisoft LTE services.
# -*- coding: utf-8 -*-
# Copyright (C) 2022 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com>
# Copyright (C) 2022-2023 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com>
#
# This program is free software: you can Use, Study, Modify and Redistribute
# it under the terms of the GNU General Public License version 3, or (at your
@@ -45,21 +45,23 @@ class ConnClosedError(ConnError):
# connect connects to a service via WebSocket.
def connect(wsuri): # -> Conn
def connect(ctx, wsuri): # -> Conn
#websocket.enableTrace(True) # TODO on $XLTE_AMARI_WS_DEBUG=y ?
ws = websocket.WebSocket()
ws.settimeout(5) # reasonable default
try:
# FIXME handle ctx cancel (but it won't stuck forever due to ._ws own timeout)
ws.connect(wsuri)
except Exception as ex:
raise ConnError("connect") from ex
return Conn(ws)
return Conn(ws, wsuri)
# Conn represents WebSocket connection to a service.
#
# It provides functionality to issue requests, and (TODO) to receive notifications.
# Conn should be created via connect.
class Conn:
# .wsuri websocket uri of the service
# ._ws websocket connection to service
# ._srv_ready_msg message we got for "ready"
@@ -71,7 +73,7 @@ class Conn:
# ._rx_wg sync.WorkGroup for spawned _serve_recv
# ._down_once sync.Once
def __init__(conn, ws):
def __init__(conn, ws, wsuri):
try:
msg0_raw = ws.recv()
msg0 = json.loads(msg0_raw)
@@ -82,6 +84,7 @@ class Conn:
ws.close()
raise ConnError("handshake") from ex
conn.wsuri = wsuri
conn._ws = ws
conn._srv_ready_msg = msg0
@@ -167,13 +170,13 @@ class Conn:
# req sends request and waits for response.
def req(conn, msg, args_dict): # -> response
rx, _ = conn.req_(msg, args_dict)
def req(conn, ctx, msg, args_dict): # -> response
rx, _ = conn.req_(ctx, msg, args_dict)
return rx
@func
def req_(conn, msg, args_dict): # -> response, raw_response
rxq = conn._send_msg(msg, args_dict)
def req_(conn, ctx, msg, args_dict): # -> response, raw_response
rxq = conn._send_msg(ctx, msg, args_dict)
# handle rx timeout ourselves. We cannot rely on global rx timeout
# since e.g. other replies might be coming in again and again.
@@ -185,10 +188,13 @@ class Conn:
rxt = _.c
_, _rx = select(
rxt.recv, # 0
rxq.recv_, # 1
ctx.done().recv, # 0
rxt.recv, # 1
rxq.recv_, # 2
)
if _ == 0:
raise ctx.err()
if _ == 1:
raise websocket.WebSocketTimeoutException("timed out waiting for response")
_, ok = _rx
@@ -201,7 +207,7 @@ class Conn:
# _send_msg sends message to the service.
def _send_msg(conn, msg, args_dict): # -> rxq
def _send_msg(conn, ctx, msg, args_dict): # -> rxq
assert isinstance(args_dict, dict)
assert 'message' not in args_dict
assert 'message_id' not in args_dict
@@ -217,6 +223,7 @@ class Conn:
d.update(args_dict)
jmsg = json.dumps(d)
try:
# FIXME handle ctx cancel (but it won't stuck forever due to ._ws own timeout)
conn._ws.send(jmsg)
except Exception as ex:
raise ConnError("send") from ex
......
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Based on https://lab.nexedi.com/nexedi/zodbtools/blob/master/zodbtools/zodb.py
# Copyright (C) 2017-2022 Nexedi SA and Contributors.
# Copyright (C) 2017-2023 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com>
# Jérome Perrin <jerome@nexedi.com>
#
@@ -30,6 +30,10 @@ import getopt
import importlib
import sys
from golang import func, defer, chan, go
from golang import context, os as gos, syscall
from golang.os import signal
# command_name -> command_module
command_dict = {}
@@ -97,6 +101,7 @@ def help(argv):
sys.exit(2)
@func
def main():
try:
optv, argv = getopt.getopt(sys.argv[1:], "h", ["help"])
@@ -127,7 +132,24 @@ def main():
print("Run 'xamari help' for usage.", file=sys.stderr)
sys.exit(2)
return command_module.main(argv)
# SIGINT/SIGTERM -> ctx cancel
ctx, cancel = context.with_cancel(context.background())
sigq = chan(1, dtype=gos.Signal)
signal.Notify(sigq, syscall.SIGINT, syscall.SIGTERM)
def _():
signal.Stop(sigq)
sigq.close()
defer(_)
def _(cancel):
sig, ok = sigq.recv_()
if not ok:
return
print("# %s" % sig, file=sys.stderr)
cancel()
go(_, cancel)
defer(cancel)
return command_module.main(ctx, argv)
if __name__ == '__main__':
......
# -*- coding: utf-8 -*-
# Copyright (C) 2022 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com>
# Copyright (C) 2022-2023 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com>
#
# This program is free software: you can Use, Study, Modify and Redistribute
# it under the terms of the GNU General Public License version 3, or (at your
@@ -24,7 +24,7 @@
- use Reader to read logged information from xlog.
(*) for example result of stats, ue_get and erab_get queries.
(*) for example result of stats, ue_get, erab_get and synthetic queries.
"""
# XLog protocol
@@ -59,11 +59,12 @@
from xlte import amari
from xlte.amari import drb
import json
import traceback
from golang import func, defer
from golang import time
from golang import func, defer, chan, select
from golang import context, sync, time
from golang.gcompat import qq
import logging; log = logging.getLogger('xlte.amari.xlog')
@@ -124,7 +125,7 @@ class LogSpec:
# xlog queries service @wsuri periodically according to queries specified by
# logspecv and logs the result.
def xlog(wsuri, logspecv):
def xlog(ctx, wsuri, logspecv):
xl = _XLogger(wsuri, logspecv)
slogspecv = ' '.join(['%s' % _ for _ in logspecv])
@@ -132,8 +133,10 @@ def xlog(wsuri, logspecv):
while 1:
try:
xl.xlog1()
xl.xlog1(ctx)
except Exception as ex:
if ctx.err() is not None:
raise
if not isinstance(ex, amari.ConnError):
log.exception('xlog failure:')
try:
@@ -144,7 +147,7 @@ def xlog(wsuri, logspecv):
time.sleep(3)
# _XLogger serves xlog implementation.
class _XLogger:
def __init__(xl, wsuri, logspecv):
xl.wsuri = wsuri
@@ -152,6 +155,7 @@ class _XLogger:
# emit saves line to the log.
def emit(xl, line):
assert isinstance(line, str)
assert '\n' not in line, line
print(line, flush=True)
@@ -164,10 +168,10 @@ class _XLogger:
# xlog1 performs one cycle of attach/log,log,log.../detach.
@func
def xlog1(xl):
def xlog1(xl, ctx):
# connect to the service
try:
conn = amari.connect(xl.wsuri)
conn = amari.connect(ctx, xl.wsuri)
except Exception as ex:
xl.jemit("service connect failure", {"reason": str(ex)})
if not isinstance(ex, amari.ConnError):
@@ -180,22 +184,48 @@ class _XLogger:
"srv_type": conn.srv_type,
"srv_version": conn.srv_version}
xl.jemit("service attach", srv_info)
try:
xl._xlog1(conn)
except Exception as ex:
d = srv_info.copy()
d['reason'] = str(ex)
xl.jemit("service detach", d)
if not isinstance(ex, amari.ConnError):
def _():
try:
raise
except Exception as ex:
d = srv_info.copy()
d['reason'] = str(ex)
xl.jemit("service detach", d)
if not isinstance(ex, amari.ConnError):
raise
defer(_)
wg = sync.WorkGroup(ctx)
defer(wg.wait)
# spawn servers to handle queries with synthetic messages
xmsgsrv_dict = {}
for l in xl.logspecv:
if l.query in _xmsg_registry:
xsrv = _XMsgServer(l.query, _xmsg_registry[l.query])
xmsgsrv_dict[l.query] = xsrv
xsrv_ready = chan() # wait for xmsg._runCtx to be initialized
wg.go(xsrv.run, conn, xsrv_ready)
xsrv_ready.recv()
# spawn main logger
wg.go(xl._xlog1, conn, xmsgsrv_dict)
def _xlog1(xl, ctx, conn, xmsgsrv_dict):
# req_ queries either amari service directly, or an extra message service.
def req_(ctx, query, opts): # -> resp_raw
if query in xmsgsrv_dict:
query_xsrv = xmsgsrv_dict[query]
_, resp_raw = query_xsrv.req_(ctx, opts)
else:
_, resp_raw = conn.req_(ctx, query, opts)
return resp_raw
def _xlog1(xl, conn):
# emit config_get after attach
_, cfg_raw = conn.req_('config_get', {})
cfg_raw = req_(ctx, 'config_get', {})
xl.emit(cfg_raw)
# loop emitting requested logspecs
t0 = time.now()
tnextv = [0]*len(xl.logspecv) # [i] - next time to arm for logspecv[i] relative to t0
@@ -230,12 +260,90 @@ class _XLogger:
tarm = t0 + tmin
δtsleep = tarm - tnow
if δtsleep > 0:
time.sleep(δtsleep)
_, resp_raw = conn.req_(logspec.query, opts)
_, _rx = select(
ctx.done().recv, # 0
time.after(δtsleep).recv, # 1
)
if _ == 0:
raise ctx.err()
resp_raw = req_(ctx, logspec.query, opts)
xl.emit(resp_raw)
# _XMsgServer represents a server for handling particular synthetic requests.
#
# for example the server for synthetic x.drb_stats query.
class _XMsgServer:
def __init__(xsrv, name, f):
xsrv.name = name # str message name, e.g. "x.drb_stats"
xsrv._func = f # func(ctx, conn) to run the service
xsrv._reqch = chan() # chan<respch> to send requests to the service
xsrv._runCtx = None # context not done while .run is running
# run runs the extra server on amari service attached to via conn.
@func
def run(xsrv, ctx, conn: amari.Conn, ready: chan):
xsrv._runCtx, cancel = context.with_cancel(ctx)
defer(cancel)
ready.close()
# establish dedicated conn2 so that server does not semantically
# affect requests issued by main logger. For example if we do not and
# main logger queries stats, and x.drb_stats server also queries stats
# internally, then data received by main logger will cover only small
# random period of time instead of full wanted period.
conn2 = amari.connect(ctx, conn.wsuri)
defer(conn2.close)
xsrv._func(ctx, xsrv._reqch, conn2)
# req queries the server and returns its response.
@func
def req_(xsrv, ctx, opts): # -> resp, resp_raw
origCtx = ctx
ctx, cancel = context.merge(ctx, xsrv._runCtx) # need only merge_cancel
defer(cancel)
respch = chan(1)
_, _rx = select(
ctx.done().recv, # 0
(xsrv._reqch.send, (opts, respch)), # 1
)
if _ == 0:
if xsrv._runCtx.err() and not origCtx.err():
raise RuntimeError("%s server is down" % xsrv.name)
raise ctx.err()
_, _rx = select(
ctx.done().recv, # 0
respch.recv, # 1
)
if _ == 0:
if xsrv._runCtx.err() and not origCtx.err():
raise RuntimeError("%s server is down" % xsrv.name)
raise ctx.err()
resp = _rx
r = {'message': xsrv.name} # place 'message' first
r.update(resp)
resp = r
resp_raw = json.dumps(resp,
separators=(',', ':'), # most compact, like Amari does
ensure_ascii=False) # so that e.g. δt comes as is
return resp, resp_raw
# @_xmsg registers func f to provide server for extra messages with specified name.
_xmsg_registry = {} # name -> xsrv_func(ctx, reqch, conn)
def _xmsg(name, f, doc1):
assert name not in _xmsg_registry
f.xlog_doc1 = doc1
_xmsg_registry[name] = f
_xmsg("x.drb_stats", drb._x_stats_srv, "retrieve statistics about data radio bearers")
# ----------------------------------------
# Reader wraps IO reader to read information generated by xlog.
@@ -420,14 +528,21 @@ Example for <logspec>+:
stats[samples,rf]/30s ue_get[stats] erab_get/10s qos_flow_get
Besides queries supported by Amarisoft LTE stack natively, support for the
following synthetic queries is also provided:
%s
Options:
-h --help show this help
""" % LogSpec.DEFAULT_PERIOD, file=out)
""" % (LogSpec.DEFAULT_PERIOD,
'\n'.join(" %-14s %s" % (q, f.xlog_doc1)
for q, f in sorted(_xmsg_registry.items()))),
file=out)
def main(argv):
def main(ctx, argv):
try:
optv, argv = getopt.getopt(argv[1:], "h", ["help"])
except getopt.GetoptError as e:
@@ -450,4 +565,4 @@ def main(argv):
for arg in argv[1:]:
logspecv.append( LogSpec.parse(arg) )
xlog(wsuri, logspecv)
xlog(ctx, wsuri, logspecv)
@@ -11,7 +11,8 @@ from golang import func, defer
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime
from matplotlib import ticker
from datetime import datetime, timedelta
import sys
from urllib.request import urlopen
@@ -45,37 +46,58 @@ def main():
# The data, as contained in the measurement log, is kept there in the form
# of kpi.Measurement, which is driver-independent representation for
# KPI-related measurement data.
mlog = kpi.MeasurementLog()
while 1:
m = alogm.read()
if m is None:
break
mlog.append(m)
# Step 3. Compute E-RAB Accessibility KPI over MeasurementLog with
# specified granularity period. We partition entries in the measurement log
# by specified time period, and further use kpi.Calc to compute the KPI
# over each period.
def load_measurements(alogm: akpi.LogMeasure) -> kpi.MeasurementLog:
mlog = kpi.MeasurementLog()
while 1:
m = alogm.read()
if m is None:
break
mlog.append(m)
return mlog
mlog = load_measurements(alogm)
# Step 3. Compute KPIs over MeasurementLog with specified granularity
# period. We partition entries in the measurement log by specified time
# period, and further use kpi.Calc to compute the KPIs over each period.
# calc_each_period partitions mlog data into periods and yields kpi.Calc for each period.
def calc_each_period(mlog: kpi.MeasurementLog, tperiod: float): # -> yield kpi.Calc
τ = mlog.data()[0]['X.Tstart']
for m in mlog.data()[1:]:
τ_ = m['X.Tstart']
if (τ_ - τ) >= tperiod:
calc = kpi.Calc(mlog, τ, τ+tperiod)
τ = calc.τ_hi
yield calc
tperiod = float(sys.argv[1])
vτ = []
vInititialEPSBEstabSR = []
vAddedEPSBEstabSR = []
vIPThp_qci = []
for calc in calc_each_period(mlog, tperiod):
vτ.append(calc.τ_lo)
_ = calc.erab_accessibility() # E-RAB Accessibility
vInititialEPSBEstabSR.append(_[0])
vAddedEPSBEstabSR .append(_[1])
τ = mlog.data()[0]['X.Tstart']
for m in mlog.data()[1:]:
τ_ = m['X.Tstart']
if (τ_ - τ) >= tperiod:
calc = kpi.Calc(mlog, τ, τ+tperiod)
vτ.append(calc.τ_lo)
τ = calc.τ_hi
_ = calc.erab_accessibility()
vInititialEPSBEstabSR.append(_[0])
vAddedEPSBEstabSR .append(_[1])
_ = calc.eutran_ip_throughput() # E-UTRAN IP Throughput
vIPThp_qci.append(_)
vτ = np.asarray([datetime.fromtimestamp(_) for _ in vτ])
vInititialEPSBEstabSR = np.asarray(vInititialEPSBEstabSR)
vAddedEPSBEstabSR = np.asarray(vAddedEPSBEstabSR)
vIPThp_qci = np.asarray(vIPThp_qci)
# Step 4. Plot computed KPI.
# The E-RAB Accessibility KPI has two parts: initial E-RAB establishment
# Step 4. Plot computed KPIs.
# 4a) The E-RAB Accessibility KPI has two parts: initial E-RAB establishment
# success rate, and additional E-RAB establishment success rate. kpi.Calc
# provides both of them in the form of their confidence intervals. The
# lower margin of the confidence interval coincides with 3GPP definition of
@@ -94,37 +116,124 @@ def main():
#
# For each of the parts we plot both its lower margin and the whole
# confidence interval area.
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, layout='constrained')
pmin, psec = divmod(tperiod, 60)
fig.suptitle("E-RAB Accessibility / %s%s" % ("%d'" % pmin if pmin else '',
'%d"' % psec if psec else ''))
ax1.set_title("Initial E-RAB establishment success rate")
ax2.set_title("Added E-RAB establishment success rate")
vτ = [datetime.fromtimestamp(_) for _ in vτ]
def plot1(ax, v, label): # plot1 plots KPI data from vector v on ax.
v = np.asarray(v)
ax.plot(vτ, v['lo'], drawstyle='steps-post', label=label)
ax.fill_between(vτ, v['lo'], v['hi'],
step='post', alpha=0.1, label='%s\nuncertainty' % label)
# 4b) The E-UTRAN IP Throughput KPI provides throughput measurements for
# all QCIs and does not have uncertainty. QCIs for which throughput data is
# all zeros are said to be silent and are not plotted.
plot1(ax1, vInititialEPSBEstabSR, "InititialEPSBEstabSR")
plot1(ax2, vAddedEPSBEstabSR, "AddedEPSBEstabSR")
fig = plt.figure(constrained_layout=True, figsize=(12,8))
facc, fthp = fig.subfigures(1, 2)
figplot_erab_accessibility (facc, vτ, vInititialEPSBEstabSR, vAddedEPSBEstabSR, tperiod)
figplot_eutran_ip_throughput(fthp, vτ, vIPThp_qci, tperiod)
plt.show()
for ax in (ax1, ax2):
ax.set_ylabel("%")
ax.set_ylim([0-10, 100+10])
ax.set_yticks([0,20,40,60,80,100])
xloc = mdates.AutoDateLocator()
xfmt = mdates.ConciseDateFormatter(xloc)
ax.xaxis.set_major_locator(xloc)
ax.xaxis.set_major_formatter(xfmt)
# ---- plotting routines ----
ax.grid(True)
ax.legend(loc='upper left')
# figplot_erab_accessibility plots E-RAB Accessibility KPI data on the figure.
def figplot_erab_accessibility(fig: plt.Figure, vτ, vInititialEPSBEstabSR, vAddedEPSBEstabSR, tperiod=None):
ax1, ax2 = fig.subplots(2, 1, sharex=True)
fig.suptitle("E-RAB Accessibility / %s" % (tpretty(tperiod) if tperiod is not None else
vτ_period_pretty(vτ)))
ax1.set_title("Initial E-RAB establishment success rate")
ax2.set_title("Added E-RAB establishment success rate")
plt.show()
plot_success_rate(ax1, vτ, vInititialEPSBEstabSR, "InititialEPSBEstabSR")
plot_success_rate(ax2, vτ, vAddedEPSBEstabSR, "AddedEPSBEstabSR")
# figplot_eutran_ip_throughput plots E-UTRAN IP Throughput KPI data on the figure.
def figplot_eutran_ip_throughput(fig: plt.Figure, vτ, vIPThp_qci, tperiod=None):
ax1, ax2 = fig.subplots(2, 1, sharex=True)
fig.suptitle("E-UTRAN IP Throughput / %s" % (tpretty(tperiod) if tperiod is not None else
vτ_period_pretty(vτ)))
ax1.set_title("Downlink")
ax2.set_title("Uplink")
ax1.set_ylabel("Mbit/s")
ax2.set_ylabel("Mbit/s")
v_qci = (vIPThp_qci .view(np.float64) / 1e6) \
.view(vIPThp_qci.dtype)
plot_per_qci(ax1, vτ, v_qci[:,:]['dl'], 'IPThp')
plot_per_qci(ax2, vτ, v_qci[:,:]['ul'], 'IPThp')
_, dmax = ax1.get_ylim()
_, umax = ax2.get_ylim()
ax1.set_ylim(ymin=0, ymax=dmax*1.05)
ax2.set_ylim(ymin=0, ymax=umax*1.05)
# plot_success_rate plots success-rate data from vector v on ax.
# v is array with Intervals.
def plot_success_rate(ax, vτ, v, label):
ax.plot(vτ, v['lo'], drawstyle='steps-post', label=label)
ax.fill_between(vτ, v['lo'], v['hi'],
step='post', alpha=0.1, label='%s\nuncertainty' % label)
ax.set_ylabel("%")
ax.set_ylim([0-10, 100+10])
ax.set_yticks([0,20,40,60,80,100])
fmt_dates_pretty(ax.xaxis)
ax.grid(True)
ax.legend(loc='upper left')
# plot_per_qci plots data from per-QCI vector v_qci.
#
# v_qci should be array[t, QCI].
# QCIs, for which v[:,qci] is all zeros, are said to be silent and are not plotted.
def plot_per_qci(ax, vτ, v_qci, label):
ax.set_xlim((vτ[0], vτ[-1])) # to have correct x range even if we have no data
assert len(v_qci.shape) == 2
silent = True
propv = list(plt.rcParams['axes.prop_cycle'])
for qci in range(v_qci.shape[1]):
v = v_qci[:, qci]
if (v['hi'] == 0).all(): # skip silent QCIs
continue
silent = False
prop = propv[qci % len(propv)] # to have same colors for same qci in different graphs
ax.plot(vτ, v['lo'], label="%s.%d" % (label, qci), **prop)
ax.fill_between(vτ, v['lo'], v['hi'], alpha=0.3, **prop)
if silent:
ax.plot([],[], ' ', label="all QCI silent")
fmt_dates_pretty(ax.xaxis)
ax.grid(True)
ax.legend(loc='upper left')
# fmt_dates_pretty instructs axis to use concise dates formatting.
def fmt_dates_pretty(axis):
xloc = mdates.AutoDateLocator()
xfmt = mdates.ConciseDateFormatter(xloc)
axis.set_major_locator(xloc)
axis.set_major_formatter(xfmt)
axis.set_minor_locator(ticker.AutoMinorLocator(5))
# tpretty returns pretty form for time, e.g. 1'2" for 62 seconds.
def tpretty(t):
tmin, tsec = divmod(t, 60)
return "%s%s" % ("%d'" % tmin if tmin else '',
'%d"' % tsec if tsec else '')
# vτ_period_pretty returns pretty form for time period in vector vτ.
# for example [2,5,8,11] gives 3'.
def vτ_period_pretty(vτ):
if len(vτ) < 2:
return "?"
s = timedelta(seconds=1)
δvτ = (vτ[1:] - vτ[:-1]) / s # in seconds
min = δvτ.min()
avg = δvτ.mean()
max = δvτ.max()
std = δvτ.std()
if min == max:
return tpretty(min)
return "%s ±%s [%s, %s]" % (tpretty(avg), tpretty(std), tpretty(min), tpretty(max))
if __name__ == '__main__':
......
# -*- coding: utf-8 -*-
# Copyright (C) 2022 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com>
# Copyright (C) 2022-2023 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com>
#
# This program is free software: you can Use, Study, Modify and Redistribute
# it under the terms of the GNU General Public License version 3, or (at your
@@ -18,7 +18,7 @@
# See COPYING file for full licensing terms.
# See https://www.nexedi.com/licensing for rationale and options.
from xlte.kpi import Calc, MeasurementLog, Measurement, Interval, NA, isNA
from xlte.kpi import Calc, MeasurementLog, Measurement, Interval, NA, isNA, Σqci, Σcause, nqci
import numpy as np
from pytest import raises
@@ -29,12 +29,18 @@ def test_Measurement():
# verify that all fields are initialized to NA
def _(name):
assert isNA(m[name])
v = m[name]
if v.shape == ():
assert isNA(v) # scalar
else:
assert isNA(v).all() # array
# several fields explicitly
_('X.Tstart') # time
_('RRC.ConnEstabAtt.sum') # Tcc
_('DRB.PdcpSduBitrateDl.sum') # float32
_('DRB.IPThpVolDl.sum') # int64
_('DRB.IPVolDl.sum') # int64
_('DRB.IPTimeDl.7') # .QCI alias
_('DRB.IPTimeDl.QCI') # .QCI array
# everything automatically
for name in m.dtype.names:
_(name)
@@ -45,16 +51,29 @@ def test_Measurement():
assert m['S1SIG.ConnEstabAtt'] == 123
m['RRC.ConnEstabAtt.sum'] = 17
assert m['RRC.ConnEstabAtt.sum'] == 17
m['DRB.IPVolDl.QCI'][:] = 0
m['DRB.IPVolDl.5'] = 55
m['DRB.IPVolDl.7'] = NA(m['DRB.IPVolDl.7'].dtype)
m['DRB.IPVolDl.QCI'][9] = 99
assert m['DRB.IPVolDl.5'] == 55; assert m['DRB.IPVolDl.QCI'][5] == 55
assert isNA(m['DRB.IPVolDl.7']); assert isNA(m['DRB.IPVolDl.QCI'][7])
assert m['DRB.IPVolDl.9'] == 99; assert m['DRB.IPVolDl.QCI'][9] == 99
for k in range(len(m['DRB.IPVolDl.QCI'])):
if k in {5,7,9}:
continue
assert m['DRB.IPVolDl.%d' % k] == 0
assert m['DRB.IPVolDl.QCI'][k] == 0
# str/repr
assert repr(m) == "Measurement(RRC.ConnEstabAtt.sum=17, S1SIG.ConnEstabAtt=123)"
assert repr(m) == "Measurement(RRC.ConnEstabAtt.sum=17, DRB.IPVolDl.QCI={5:55 7:ø 9:99}, S1SIG.ConnEstabAtt=123)"
s = str(m)
assert s[0] == '('
assert s[-1] == ')'
v = s[1:-1].split(', ')
vok = ['ø'] * len(m.dtype.names)
vok = ['ø'] * len(m._dtype0.names)
vok[m.dtype.names.index("RRC.ConnEstabAtt.sum")] = "17"
vok[m.dtype.names.index("S1SIG.ConnEstabAtt")] = "123"
vok[m.dtype.names.index("DRB.IPVolDl.QCI")] = "{5:55 7:ø 9:99}"
assert v == vok
# verify that time fields has enough precision
@@ -420,9 +439,107 @@ def test_Calc_erab_accessibility():
_(InititialEPSBEstabSR, 100 * 2*3*4 / (7*8*9))
# verify Calc.eutran_ip_throughput .
def test_Calc_eutran_ip_throughput():
# most of the job is done by drivers collecting DRB.IPVol{Dl,Ul} and DRB.IPTime{Dl,Ul}
# here we verify final aggregation, that eutran_ip_throughput does, only lightly.
m = Measurement()
m['X.Tstart'] = 10
m['X.δT'] = 10
m['DRB.IPVolDl.5'] = 55e6
m['DRB.IPVolUl.5'] = 55e5
m['DRB.IPTimeDl.5'] = 1e2
m['DRB.IPTimeUl.5'] = 1e2
m['DRB.IPVolDl.7'] = 75e6
m['DRB.IPVolUl.7'] = 75e5
m['DRB.IPTimeDl.7'] = 1e2
m['DRB.IPTimeUl.7'] = 1e2
m['DRB.IPVolDl.9'] = 0
m['DRB.IPVolUl.9'] = 0
m['DRB.IPTimeDl.9'] = 0
m['DRB.IPTimeUl.9'] = 0
for qci in {5,7,9}:
m['XXX.DRB.IPTimeDl_err.QCI'][qci] = 0
m['XXX.DRB.IPTimeUl_err.QCI'][qci] = 0
# (other QCIs are left with na)
for qci in set(range(nqci)).difference({5,7,9}):
assert isNA(m['DRB.IPVolDl.QCI'][qci])
assert isNA(m['DRB.IPVolUl.QCI'][qci])
assert isNA(m['DRB.IPTimeDl.QCI'][qci])
assert isNA(m['DRB.IPTimeUl.QCI'][qci])
assert isNA(m['XXX.DRB.IPTimeDl_err.QCI'][qci])
assert isNA(m['XXX.DRB.IPTimeUl_err.QCI'][qci])
mlog = MeasurementLog()
mlog.append(m)
calc = Calc(mlog, 10,20)
thp = calc.eutran_ip_throughput()
def I(x): return Interval(x,x)
assert thp[5]['dl'] == I(55e4)
assert thp[5]['ul'] == I(55e3)
assert thp[7]['dl'] == I(75e4)
assert thp[7]['ul'] == I(75e3)
assert thp[9]['dl'] == I(0)
assert thp[9]['ul'] == I(0)
for qci in set(range(nqci)).difference({5,7,9}):
assert thp[qci]['dl'] == I(0)
assert thp[qci]['ul'] == I(0)
# verify Σqci.
def test_Σqci():
m = Measurement()
x = 'ERAB.EstabInitAttNbr'
def Σ():
return Σqci(m, x+'.QCI')
assert isNA(Σ())
m[x+'.sum'] = 123
assert Σ() == 123
m[x+'.17'] = 17
m[x+'.23'] = 23
m[x+'.255'] = 255
assert Σ() == 123 # from .sum
m[x+'.sum'] = NA(m[x+'.sum'].dtype)
assert isNA(Σ()) # from array, but NA values lead to sum being NA
v = m[x+'.QCI']
l = len(v)
for i in range(l):
v[i] = 1 + i
assert Σ() == 1*l + (l-1)*l/2
# verify Σcause.
def test_Σcause():
m = Measurement()
x = 'RRC.ConnEstabAtt'
def Σ():
return Σcause(m, x+'.CAUSE')
assert isNA(Σ())
m[x+'.sum'] = 123
assert Σ() == 123
# TODO sum over individual causes (when implemented)
def test_NA():
def _(typ):
return NA(typ(0).dtype)
na = NA(typ(0).dtype)
assert type(na) is typ
assert isNA(na)
return na
assert np.isnan( _(np.float16) )
assert np.isnan( _(np.float32) )
......
#!/usr/bin/env python
# Copyright (C) 2023 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com>
#
# This program is free software: you can Use, Study, Modify and Redistribute
# it under the terms of the GNU General Public License version 3, or (at your
# option) any later version, as published by the Free Software Foundation.
#
# You can also Link and Combine this program with other software covered by
# the terms of any of the Free Software licenses or any of the Open Source
# Initiative approved licenses and Convey the resulting work. Corresponding
# source of such a combination shall include the source code for all other
# software used.
#
# This program is distributed WITHOUT ANY WARRANTY; without even the implied
# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# See COPYING file for full licensing terms.
# See https://www.nexedi.com/licensing for rationale and options.
"""Program udpflood sends/floods packets via UDP.
It is useful to test how E-UTRAN IP Throughput KPI implementation handles bursts.
Usage: udpflood host:port npkt/period pause_ms
"""
import sys, time
from socket import socket, AF_INET, SOCK_DGRAM, IPPROTO_UDP
def main():
    addr = sys.argv[1]
    host, port = addr.split(':')
    port = int(port)
    npkt_period = 1
    pause_ms = 0
    if len(sys.argv) >= 3:
        npkt_period = int(sys.argv[2])
    if len(sys.argv) >= 4:
        pause_ms = int(sys.argv[3])

    print("# udpflood -> %s :%s %d pkt/period, %dms pause in between periods" %
          (host, port, npkt_period, pause_ms))

    sk = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)
    pkt = b'\xff'*1000

    while 1:
        for _ in range(npkt_period):
            sk.sendto(pkt, (host, port))
        if pause_ms:
            time.sleep(pause_ms*0.001)

if __name__ == '__main__':
    main()