1. 08 Mar, 2023 12 commits
    • amari.kpi: Teach LogMeasure to handle x.drb_stats messages · 499a7c1b
      Kirill Smelkov authored
      This patch provides the next building block for the E-UTRAN IP Throughput KPI
      and continues
      
          d102ffaa (drb: Start of the package)
          5bf7dc1c (amari.{drb,xlog}: Provide aggregated DRB statistics in the form of synthetic x.drb_stats message)
      
      Quoting those patches:
      
          The scheme to compute E-UTRAN IP Throughput is thus as follows: poll eNB at
          100Hz frequency for `ue_get[stats]` and retrieve information about per-UE/QCI
          streams and the number of transport blocks dl/ul-ed to the UE in question
          during that 10ms frame. Estimate `tx_time` taking into account
          the number of transmitted transport blocks. And estimate whether eNB is congested or
          not based on `dl_use_avg`/`ul_use_avg` taken from `stats`. For the latter we
          also need to poll for `stats` at 100Hz frequency and synchronize
          `ue_get[stats]` and `stats` requests in time so that they both cover the same
          time interval of particular frame.
      
          Then organize the polling process to provide aggregated statistics in the form of
          new `x.drb_stats` message, and teach `xamari xlog` to save that messages to
          `enb.xlog` together with `stats`.
      
          Then further adjust `amari.kpi.LogMeasure`						<-- NOTE
          and generic `kpi.Measurement` and `kpi.Calc` to handle DRB-related data.
      
      So here we implement the noted step:
      
      We teach LogMeasure to take x.drb_stats messages into account and to update
      IP Throughput-related fields in the appropriate Measurement from x.drb_stats
      data.
      
      This process is relatively straightforward except for one place: for stable
      output, E-UTRAN IP Throughput must be computed without taking into account
      the last TTI of every sample. We don't have that level of detail, since all
      we have is the total amount of bytes transmitted in a burst and an estimate
      of how long the burst lasts. Thus we can only provide an estimate of the
      E-UTRAN IP Throughput as follows:
      
          DRB.IPVol and DRB.IPTime are collected to compute throughput.
      
          thp = ΣB*/ΣT*  where B* is tx'ed bytes in the sample without taking last tti into account
                         and   T* is time of tx also without taking that sample's tail tti.
      
          we only know ΣB (whole amount of tx), ΣT and ΣT* with some error.
      
          -> thp can be estimated to be inside the following interval:
      
                   ΣB            ΣB
                  ───── ≤ thp ≤ ─────           (1)
                  ΣT_hi         ΣT*_lo
      
          the upper layer in xlte.kpi will use the following formula for
          final throughput calculation:
      
                        DRB.IPVol
                  thp = ──────────              (2)
                        DRB.IPTime
      
          -> set DRB.IPTime and its error to mean and δ of ΣT_hi and ΣT*_lo
          so that (2) becomes (1).
      
      For this to work we also need to introduce new fields to Measurement
      that represent the error of DRB.IPTime. The hope is that this is
      temporary and that those fields will be removed once we rework DRB stats
      to provide B* and T* directly.
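      The mapping from interval (1) to formula (2) can be checked numerically.
      The sketch below is a minimal illustration under stated assumptions, not the
      code from this patch: `ip_time_from_bounds` and `thp_bounds` are hypothetical
      helpers, times are in milliseconds, and δ is taken as the half-width of
      [ΣT*_lo, ΣT_hi] so that DRB.IPTime ± δ reproduces both ends of (1).

```python
import math


def ip_time_from_bounds(t_hi, t_star_lo):
    """Return (DRB.IPTime, δ) as the midpoint and half-width of [ΣT*_lo, ΣT_hi].

    Hypothetical helper: the actual Measurement field names for the error
    are not assumed here.
    """
    if t_star_lo > t_hi:
        raise ValueError("ΣT*_lo must not exceed ΣT_hi")
    ip_time = (t_hi + t_star_lo) / 2
    delta = (t_hi - t_star_lo) / 2
    return ip_time, delta


def thp_bounds(ip_vol, ip_time, delta):
    """Throughput interval IPVol / (IPTime ± δ), i.e. interval (1)."""
    lo = ip_vol / (ip_time + delta)               # = ΣB / ΣT_hi
    t_lo = ip_time - delta                        # = ΣT*_lo
    hi = ip_vol / t_lo if t_lo > 0 else math.inf  # = ΣB / ΣT*_lo
    return lo, hi


# ΣB = 10000 bytes, ΣT_hi = 10 ms, ΣT*_lo = 4 ms
ip_time, delta = ip_time_from_bounds(10, 4)
print(ip_time, delta)                     # 7.0 3.0
print(thp_bounds(10000, ip_time, delta))  # (1000.0, 2500.0) bytes/ms
```

      Using the midpoint keeps (2) a single ratio while the δ field carries the
      width of the uncertainty interval.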
    • amari.kpi: Rework LogMeasure to prepare Measurement incrementally · fd7870f4
      Kirill Smelkov authored
      We added LogMeasure in 71087f67 (amari.kpi: New package with driver for
      Amarisoft LTE stack to retrieve KPI-related measurements from logs), and
      its original logic is to read `stats` messages and to create a Measurement
      that covers [Sx, Sx+1) only after seeing Sx+1.
      
      However, in the next patch we will also need to take into account other,
      smaller messages besides stats, and for those messages we need the
      Measurement being prepared to already exist, so that we can amend it with
      the partial data we see. So we need to rework the process to create the
      Measurement that covers [Sx, Sx+1) right after seeing Sx, without waiting
      for Sx+1 to come in.
      
      This patch does that.
      
      Along the way it unifies how events and stats are handled. Previously
      events and stats were handled via different objects, and the code had many
      scattered places that tried to handle cases like event-event,
      event-stats, stats-event and stats-stats. In all those cases the
      intent was to still emit the corresponding Measurement, even if it has
      all-NA data besides timestamps. Thus it does not make sense to split
      events and stats into different flows, as we can handle all combinations
      by considering just one flow of "stats or events". This simplifies the
      logic and removes several sporadic branches of code that emitted M(ø)
      around events. It also uncovered several places where we were not
      emitting such M(ø) even though the intent was to do so. All this is fixed
      now, with updated tests.
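      A minimal sketch of the single "stats or events" flow, under simplifying
      assumptions: `Meas` and `measure` are hypothetical stand-ins (not the actual
      kpi.Measurement or LogMeasure API), messages are `(kind, time, payload)`
      tuples, counters are plain dicts, and init/fini correction is ignored. The
      Measurement for [Sx, Sx+1) is opened as soon as Sx is seen, amended by
      smaller messages, and closed when Sx+1 arrives.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Meas:
    """Hypothetical stand-in for kpi.Measurement covering [tstart, tend)."""
    tstart: float
    tend: Optional[float] = None
    data: dict = field(default_factory=dict)


def measure(msgs):
    """Process one flow of messages; stats and events both delimit periods."""
    out = []
    cur = None         # Measurement being prepared for [Sx, Sx+1)
    prev_stats = None  # counters at Sx, if Sx was a stats message
    for kind, t, payload in msgs:
        if kind in ("stats", "event"):
            if cur is not None:
                cur.tend = t
                # δ(counters) is known only for a stats-stats pair; otherwise
                # the Measurement is still emitted, with NA (here: empty) data.
                if kind == "stats" and prev_stats is not None:
                    for k, v in payload.items():
                        cur.data[k] = v - prev_stats.get(k, 0)
                out.append(cur)
            cur = Meas(tstart=t)  # open [Sx, Sx+1) right away
            prev_stats = payload if kind == "stats" else None
        elif cur is not None:
            # smaller message (e.g. x.drb_stats): amend the already-open Measurement
            for k, v in payload.items():
                cur.data[k] = cur.data.get(k, 0) + v
    return out
```

      Because events and stats share one branch, every stats/event combination
      yields exactly one Measurement per period without special cases.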
    • amari.{drb,xlog}: Provide aggregated DRB statistics in the form of synthetic x.drb_stats message · 5bf7dc1c
      Kirill Smelkov authored
      This patch provides the next building block for the E-UTRAN IP Throughput
      KPI and continues d102ffaa (drb: Start of the package). Quoting that patch:
      
          The scheme to compute E-UTRAN IP Throughput is thus as follows: poll eNB at
          100Hz frequency for `ue_get[stats]` and retrieve information about per-UE/QCI
          streams and the number of transport blocks dl/ul-ed to the UE in question
          during that 10ms frame. Estimate `tx_time` taking into account
          the number of transmitted transport blocks. And estimate whether eNB is congested or
          not based on `dl_use_avg`/`ul_use_avg` taken from `stats`. For the latter we
          also need to poll for `stats` at 100Hz frequency and synchronize
          `ue_get[stats]` and `stats` requests in time so that they both cover the same
          time interval of particular frame.
      
          Then organize the polling process to provide aggregated statistics in the form of	<-- NOTE
          new `x.drb_stats` message, and teach `xamari xlog` to save that messages to		<-- NOTE
          `enb.xlog` together with `stats`.							<-- NOTE
      
          Then further adjust `amari.kpi.LogMeasure` and generic
          `kpi.Measurement` and `kpi.Calc` to handle DRB-related data.
      
      So here we implement the noted step:
      
      - add the drv._x_stats_srv server that polls the eNB at 100Hz, uses Sampler
        to extract bursts, and aggregates information about those bursts.

      - teach xlog to organize servers for synthetic messages and communicate
        with them, and register drv._x_stats_srv as such a server to handle
        generation of the x.drb_stats message.
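      The 100Hz polling in the first item can be scheduled against absolute
      deadlines, so that the latency of one query does not accumulate as drift.
      This is a hypothetical sketch, not drv._x_stats_srv itself: `poll_at_rate`
      is an assumed name, `query` stands for whatever issues the `ue_get[stats]`
      and `stats` requests for one tick, and `clock`/`sleep` are injectable so the
      scheduling logic can be checked without real time.

```python
import time


def poll_at_rate(query, rate_hz, n, clock=time.monotonic, sleep=time.sleep):
    """Call query() n times at rate_hz, returning [(t_issue, result), ...].

    Ticks are scheduled at absolute deadlines t0 + i/rate_hz, so the latency
    of one query does not shift all following ticks. A tick that is already
    late is issued immediately instead of being skipped.
    """
    period = 1.0 / rate_hz
    t0 = clock()
    results = []
    for i in range(n):
        dt = t0 + i * period - clock()
        if dt > 0:
            sleep(dt)
        t_issue = clock()
        results.append((t_issue, query()))
    return results
```

      For 100Hz one would call e.g. `poll_at_rate(query, 100, n)`.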
    • amari.drb += _IncStats · 78f26e3a
      Kirill Smelkov authored
      A utility class to compute avg/std incrementally.
      
      Thanks to https://www.johndcook.com/blog/standard_deviation/ for the
      recipe of how to do it.
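      The referenced recipe is Welford's method: keep the count, the running mean
      and the running sum of squared deviations, and update them per sample. A
      minimal sketch follows; `IncStats` is a hypothetical stand-in whose interface
      is assumed, not a copy of `drb._IncStats`, and it reports the population
      standard deviation (the referenced page divides by n-1 for the sample
      variance).

```python
import math


class IncStats:
    """Incremental mean/std via Welford's method (hypothetical sketch)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # Σ (x - mean)^2, maintained incrementally

    def add(self, x):
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self._m2 += d * (x - self.mean)  # uses both the old and the new mean

    def std(self):
        """Population standard deviation; NaN if there is no data."""
        if self.n == 0:
            return math.nan
        return math.sqrt(self._m2 / self.n)
```

      Unlike accumulating Σx and Σx², this form avoids catastrophic cancellation
      when the variance is small relative to the mean.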
    • amari.drb: Start of the package · d102ffaa
      Kirill Smelkov authored
      This package will be used to implement E-UTRAN IP Throughput KPI.
      
      In this patch we add `drb.Sampler` that extracts samples of
      transmission bursts from `ue_get[stats]` observations.
      
      Let's go through what E-UTRAN IP Throughput KPI is and how it motivates
      functionality provided by this patch.
      
      Overview of E-UTRAN IP Throughput computation
      ---------------------------------------------
      
      This KPI is defined in TS 32.450 [1] and aggregates transmission volume and
      time over bursts of transmissions from an average UE point of view. It should be
      particularly noted that only the time during which transmission is going on
      should be accounted for. For example, if a UE receives 10KB over a 4ms burst and
      there is no transmission to it for the rest of the time, say, 1 minute, the downlink IP
      Throughput for that UE over the minute is 20Mbit/s (= 8·10KB/4ms), not 1.3Kbit/s (= 8·10KB/60s).
      This KPI basically shows what the speed would be to, e.g., download a response for
      an HTTP request issued from a mobile.
      
      [1] https://www.etsi.org/deliver/etsi_ts/132400_132499/132450/16.00.00_60/ts_132450v160000p.pdf#page=13
      
      To compute IP Throughput we thus need to know the Σ of transmitted bytes
      and the Σ of the time of all transmission bursts.
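      As a worked check of the example above, IP Throughput is ΣB/ΣT over bursts,
      not bytes over the wall-clock interval. The helper name `ip_throughput` and
      the `(bytes, seconds)` burst representation are assumptions made for
      illustration; KB is taken as 1000 bytes, matching the arithmetic in the text.

```python
def ip_throughput(bursts):
    """IP Throughput in bit/s: 8·Σ(bytes) / Σ(burst time).

    bursts: iterable of (tx_bytes, tx_time_seconds), one entry per burst.
    Returns None when there was no transmission time at all.
    """
    bursts = list(bursts)  # iterated twice below
    total_bytes = sum(b for b, _ in bursts)
    total_time = sum(t for _, t in bursts)
    if total_time == 0:
        return None
    return 8 * total_bytes / total_time


thp = ip_throughput([(10_000, 0.004)])  # one 10KB burst lasting 4ms: ≈ 20Mbit/s
naive = 8 * 10_000 / 60                 # dividing by the whole minute: ≈ 1.3Kbit/s
```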
      
      Σ of the bytes is relatively easy to get. eNB already provides close values in
      overall `stats` and in per-UE `ue_get[stats]` messages. However, nothing is
      readily available out of the box for the Σ of burst transmission time.
      Thus we need to measure the time of transmission bursts ourselves somehow.
      
      It turns out that with the current state of things the only practical way to
      measure it to some degree is to poll eNB frequently with `ue_get[stats]` and
      estimate transmission time based on δ of `ue_get` timestamps.
      
      Let's see how frequently we need to poll to reach reasonable accuracy of the resulting throughput.
      
      A common situation for HTTP requests issued via LTE is that downloading the
      response content takes only a few milliseconds. For example, I used the chromium
      network profiler to access various sites via internet tethered from my phone
      and saw that for many requests the response content downloading time was e.g. 4ms,
      5ms, 3.2ms, etc. The accuracy of measuring transmission time should thus be on
      the order of a millisecond to cover that properly. It makes a real difference for
      reported throughput whether, say, a download sample of 10KB took 4ms, or it took
      e.g. "something under 100ms". In the first case we know that for that sample
      downlink throughput is 2500KB/s, while in the second case all we know is that
      downlink throughput is "higher than 100KB/s" - a 25 times difference, and not
      certain. Similarly, if we poll every 10ms we would get that throughput is "higher
      than 1000KB/s" - a 2.5 times difference from the actual value. The accuracy of 1
      millisecond coincides with the TTI duration and with how downlink/uplink transmissions
      generally work in LTE.
      
      With the above, the scheme to compute IP Throughput looks as
      follows: poll eNB at 1000Hz rate for `ue_get[stats]`, process retrieved
      information into per-UE and per-QCI streams, detect bursts on each UE/QCI pair,
      and aggregate `tx_bytes` and `tx_time` from every burst.
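      The "detect bursts and aggregate" step of this idealized 1000Hz scheme can be
      sketched as grouping consecutive polling ticks with non-zero δ(tx_bytes) on
      one UE/QCI stream. `detect_bursts` is a hypothetical helper, not `drb.Sampler`:
      it assumes bytes and transmissions are already time-aligned per tick, which,
      as noted further below, eNB does not guarantee.

```python
def detect_bursts(samples, tick):
    """Group consecutive non-zero ticks into bursts.

    samples: per-tick δ(tx_bytes) for one UE/QCI stream.
    tick:    polling period (in whatever time unit the caller uses).
    Returns a list of (tx_bytes, tx_time), one entry per burst.
    """
    bursts = []
    nbytes = 0
    nticks = 0
    for b in samples:
        if b > 0:
            nbytes += b
            nticks += 1
        elif nticks:  # an idle tick closes the current burst
            bursts.append((nbytes, nticks * tick))
            nbytes = nticks = 0
    if nticks:  # stream ended in the middle of a burst
        bursts.append((nbytes, nticks * tick))
    return bursts
```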
      
      It looks straightforward, but 1000Hz polling will likely create
      non-negligible additional load on the system and disturb the eNB itself,
      introducing much jitter and harming its latency requirements. That's probably
      why eNB actually rate-limits WebSocket requests to at most 100Hz -
      a frequency 10 times lower than what we need to reach reasonable
      accuracy for IP throughput.
      
      Fortunately there is additional information that provides a way to improve
      accuracy of measured `tx_time` even when polled every 10ms at 100Hz rate:
      that additional information is the number of transport blocks transmitted to/from
      a UE. If we know that during a 10ms frame e.g. 4 transport blocks were transmitted
      to the UE, that there were no retransmissions *and* that eNB is not congested, we can
      reasonably estimate that it was actually a 4ms transmission. And if eNB is
      congested we can still say that transmission time is somewhere in the `[4ms, 10ms]`
      interval, because transmitting each transport block takes 1 TTI. Even if
      imprecise, that still provides some information that could be useful.
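      The reasoning above can be written as a small estimator that turns the number
      of transport blocks seen in one 10ms frame into a `[lo, hi]` interval for
      `tx_time`. This is a sketch under stated assumptions: `estimate_tx_time` is a
      hypothetical name, one transport block is taken to occupy one 1ms TTI (no
      spatial multiplexing), there were no retransmissions, and the `congested`
      flag is assumed to be derived elsewhere, e.g. from `dl_use_avg`/`ul_use_avg`.

```python
TTI_MS = 1     # one transport block takes one TTI (1ms in LTE)
FRAME_MS = 10  # polling period at 100Hz


def estimate_tx_time(n_tb, congested):
    """Return (lo, hi) bounds in ms for transmission time within one frame.

    n_tb:      number of transport blocks transmitted to/from the UE in the frame
    congested: whether eNB was congested during that frame
    """
    if n_tb <= 0:
        return (0, 0)
    lo = min(n_tb * TTI_MS, FRAME_MS)
    # not congested: blocks are assumed to go out back-to-back -> lo is exact;
    # congested: the blocks may be spread over the whole frame
    hi = FRAME_MS if congested else lo
    return (lo, hi)
```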
      
      Also 100Hz polling turns out to be acceptable from the performance point of view and
      does not disturb the system much. For example, on the callbox machine the process
      that issues polls takes only about 3% CPU, on one core only, and
      the CPU usage of eNB practically does not change, and neither does its reported
      tx/rx latency. For sure, there is some disturbance, but it appears to
      be small. To get a better idea of what polling rate is possible, I've made
      an experiment with the poller accessing my own websocket echo server quickly
      implemented in python. Both the poller and the echo server are not optimized,
      but without rate-limiting they could reach 8000Hz, at 100%
      CPU usage of one CPU core. That 8000Hz is 80 times more than the 100Hz
      frequency actually allowed by eNB. This shows what kind of polling
      frequency limit the system can handle, if absolutely needed, and that 100Hz
      turns out to be not so high a frequency. Also the Linux 5.6 kernel, installed
      on the callbox from Fedora32, is configured with `CONFIG_HZ=1000`, which is
      likely helping here.
      
      Implementation overview
      ~~~~~~~~~~~~~~~~~~~~~~~
      
      The scheme to compute E-UTRAN IP Throughput is thus as follows: poll eNB at
      100Hz frequency for `ue_get[stats]` and retrieve information about per-UE/QCI
      streams and the number of transport blocks dl/ul-ed to the UE in question
      during that 10ms frame. Estimate `tx_time` taking into account
      the number of transmitted transport blocks. And estimate whether eNB is congested or
      not based on `dl_use_avg`/`ul_use_avg` taken from `stats`. For the latter we
      also need to poll for `stats` at 100Hz frequency and synchronize
      `ue_get[stats]` and `stats` requests in time so that they both cover the same
      time interval of particular frame.
      
      Then organize the polling process to provide aggregated statistics in the form of
      new `x.drb_stats` message, and teach `xamari xlog` to save that messages to
      `enb.xlog` together with `stats`.  Then further adjust `amari.kpi.LogMeasure`
      and generic `kpi.Measurement` and `kpi.Calc` to handle DRB-related data.
      
      ----------------------------------------
      
      In this patch we provide the first building block - `Sampler` that extracts bursts
      of data transmissions from a stream of `ue_get[stats]` observations.

      Even though the main idea behind `Sampler` is relatively straightforward, several
      aspects deserve to be noted:
      
      1. information about transmitted bytes and corresponding transmitted transport
         blocks is emitted by eNB not synchronized in time. The reason here is that,
         for example, for DL a block is transmitted via PDCCH+PDSCH during one TTI, and
         then the base station awaits HARQ ACK/NACK. That ACK/NACK comes later via
         PUCCH or PUSCH. The time window between the original transmission and
         reception of the ACK/NACK is 4 TTIs for FDD and 4-13 TTIs for TDD (*).
         And Amarisoft LTEENB updates counters for dl_total_bytes and dl_tx at
         different times:
      
             ue.erab.dl_total_bytes      - right after sending data on  PDCCH+PDSCH
             ue.cell.{dl_tx,dl_retx}     - after receiving ACK/NACK via PUCCH|PUSCH
      
         this way an update to dl_total_bytes might be seen in one frame (= 10·TTI),
         while the corresponding update to dl_tx/dl_retx might be seen in either the same,
         the next, or the next-next frame.
      
         We bring `δ(tx_bytes)` and `#tx_tb` in sync ourselves via _BitSync.
      
         (*) see e.g. Figure 8.1 in "An introduction to LTE, 2nd ed."
      
      2. when we see multiple transmissions related to a UE on different QCIs, we
         cannot directly use the corresponding number of transport blocks to estimate
         transmission times, because we do not know how the eNB scheduler placed those
         transmissions onto the resource map. So without additional information we can only
         estimate the corresponding lower and upper bounds.
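      One plausible way to obtain such bounds, stated here as an assumption rather
      than the rule `Sampler` uses: if the per-QCI transport blocks were all
      scheduled in the same TTIs, the UE-level transmission time is the maximum of
      the per-QCI counts; if they never shared a TTI, it is their sum, capped at the
      10-TTI frame. `qci_tx_time_bounds` is a hypothetical name.

```python
FRAME_TTI = 10  # one 10ms frame = 10 TTIs


def qci_tx_time_bounds(tb_per_qci):
    """Bounds, in TTIs, on a UE's transmission time within one frame.

    tb_per_qci: dict qci -> number of transport blocks seen for that QCI.
    Lower bound: all QCIs shared the same TTIs -> max of per-QCI counts.
    Upper bound: no two QCIs shared a TTI      -> sum of counts, capped by frame.
    """
    counts = [n for n in tb_per_qci.values() if n > 0]
    if not counts:
        return (0, 0)
    lo = min(max(counts), FRAME_TTI)
    hi = min(sum(counts), FRAME_TTI)
    return (lo, hi)
```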
    • amari.xlog: Move main logger to a thread · 79d10eb9
      Kirill Smelkov authored
      We will soon need to run 2 threads:

      - one with the main logger, and
      - another one to serve requests for synthetic x.drb_stats queries.

      Both threads will be run via sync.WorkGroup so that they cancel
      each other in case of a failure somewhere. Since WorkGroup.wait(),
      like all pygolang operations, is not interrupted by signals(*),
      we need to pass ctx through all operations and arrange for that
      context to be canceled on SIGINT/SIGTERM.
      
      This patch:
      
      1. adjusts xlog to pass ctx through all call chains and moves ._xlog1()
         to be run in the thread.
      2. adjusts amari.Conn to take ctx as an argument on all operations and
         react reasonably to ctx cancellation. We need to do it here because
         xlog uses Conn internally.
      3. adjusts the xamari main driver to set up a root context that is canceled on
         SIGINT/SIGTERM, similarly to how e.g. nxdtest does it in
         nexedi/nxdtest@b0cf277d .
      
      (*) see nexedi/pygolang@e18adbab for details.
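      The root-context wiring of item 3 can be illustrated with the standard library
      alone. This is an analogy, not pygolang's API: a `threading.Event` stands in
      for the context's done channel, `root_context` and `worker` are hypothetical
      names, and the handlers must be installed from the main thread, which Python
      requires for `signal.signal`. The worker blocks on the event, so it is the
      cancellation, rather than signal interruption, that stops it.

```python
import signal
import threading


def root_context():
    """Return (done, cancel); `done` gets set on SIGINT/SIGTERM or on cancel().

    Must be called from the main thread: Python only allows installing
    signal handlers there.
    """
    done = threading.Event()

    def cancel(*_signal_args):
        done.set()

    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, cancel)
    return done, cancel


def worker(done, tick=0.01):
    """Do periodic work until canceled; returns the number of ticks done."""
    n = 0
    while not done.wait(tick):  # returns True as soon as `done` is set
        n += 1
    return n
```

      In a program, the main thread would call `root_context()`, start workers with
      `done`, and join them; Ctrl-C then sets `done` and every worker returns.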
    • amari.xlog: Rework "service detach" to be detected and done via defer instead of try/except · c967c8b5
      Kirill Smelkov authored
      We will soon add more levels of try to this part of the code, and
      linear defers are easier to follow than many levels of nested
      try/except.
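      The idea can be illustrated with the standard library's
      `contextlib.ExitStack`, which, like pygolang's `defer`, runs registered exit
      callbacks in LIFO order. This is an analogy, not the xlog code: `xlog1` and
      the event dicts are hypothetical, and the exit callback both records the
      "service detach" reason and lets the exception propagate.

```python
import contextlib


def xlog1(steps, emit):
    """Run `steps` while attached; always emit "service detach" on the way out."""
    with contextlib.ExitStack() as defer:
        emit({"event": "service attach"})

        def on_exit(exc_type, exc, tb):
            emit({"event": "service detach",
                  "reason": None if exc is None else str(exc)})
            return False  # do not swallow the exception

        defer.push(on_exit)  # callback with the __exit__ signature
        for step in steps:
            step()
```

      Further cleanup steps would be added as more `defer.push(...)` or
      `defer.callback(...)` calls at the same nesting level, instead of extra
      try/except layers.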
    • amari: Conn: Provide a way to retrieve websocket URI to where a Conn is connected · 749e1659
      Kirill Smelkov authored
      We will soon need this to know, at runtime, the address of the eNB service
      that a Conn is attached to, in order to establish another connection to the same eNB.
    • kpi: Add support for QCI to Measurements · ffffb933
      Kirill Smelkov authored
      Previously, for Measurement fields with .QCI or .CAUSE suffix we had only
      the .sum value and no per-QCI or per-CAUSE values. In other words,
      support for QCI and CAUSE was a stub. In this patch we add support for
      QCI: every field X.QCI is now automatically expanded into an X[256] array
      and X.sum . For convenience we also provide X.<qci> aliases that alias
      X[qci]. For example, field DRB.IPVolDl.9 aliases the 9th element of the
      DRB.IPVolDl array.
      
      We will need QCI support for the E-UTRAN IP Throughput KPI, which is required
      to provide resulting values for every QCI individually.
      
      CAUSE support remains a stub for now.
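      The expansion and aliasing scheme can be sketched with a small
      name-resolving container. This is a hypothetical illustration, not
      kpi.Measurement (which is numpy-based and has its own NA handling):
      `QCIMeas` is an assumed name, `None` stands in for NA, and only the
      X[256] / X.sum / X.<qci> naming is modeled.

```python
NQCI = 256


class QCIMeas:
    """Expand each "X.QCI" field into an X[256] array plus X.sum (sketch)."""

    def __init__(self, fields):
        self._arr = {}  # X -> list of NQCI per-QCI values
        self._sum = {}  # X -> value of X.sum
        for f in fields:
            if not f.endswith(".QCI"):
                raise ValueError("only .QCI fields are modeled: %r" % f)
            x = f[:-len(".QCI")]
            self._arr[x] = [None] * NQCI  # None = NA
            self._sum[x] = None

    def __getitem__(self, key):
        if key in self._arr:  # X -> the whole per-QCI array
            return self._arr[key]
        base, _, suffix = key.rpartition(".")
        if suffix == "sum" and base in self._sum:
            return self._sum[base]  # X.sum
        if suffix.isdigit() and base in self._arr:
            return self._arr[base][int(suffix)]  # X.<qci> aliases X[qci]
        raise KeyError(key)

    def __setitem__(self, key, value):
        base, _, suffix = key.rpartition(".")
        if suffix == "sum" and base in self._sum:
            self._sum[base] = value
        elif suffix.isdigit() and base in self._arr:
            self._arr[base][int(suffix)] = value
        else:
            raise KeyError(key)
```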
    • kpi: Fix Σqci and Σcause when invoked on NA data · 6eccb446
      Kirill Smelkov authored
      There was a thinko that led to returning 0 instead of NA when there are
      no .QCI or .CAUSE fields other than .sum . Without the added fix, e.g.
      
      	Σqci(Measurement(), 'ERAB.EstabInitAttNbr.QCI')
      
      was returning 0 instead of NA.
      
      -> Fix it.
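      The bug class is that a plain sum over nothing is 0. One simple NA-aware
      rule, assumed here for illustration and not the actual Σqci code, is to
      return NA when no value is available and otherwise sum what is present.
      `sum_or_na` is a hypothetical helper and `None` stands in for kpi's NA.

```python
NA = None  # stand-in for kpi's NA marker


def sum_or_na(values):
    """Σ that stays NA when there is nothing to sum (plain sum() gives 0)."""
    present = [v for v in values if v is not NA]
    if not present:
        return NA
    return sum(present)
```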
    • kpi: Fix naming of Measurement fields related to IP Throughput · 762153ab
      Kirill Smelkov authored
          DRB.IPThpVol   ->  DRB.IPVol	(no "Thp" inside)
    DRB.IPThpTime  ->  DRB.IPTime	(no "Thp" inside)
      
      TS 32.450 and TS 32.425 define those names as in the corrected variants -
      please see the corresponding references in the code for details.
      
      It was my thinko in dc1d5481 (kpi: Start of the package) to use "Thp" in
      the names.
    • kpi: Fix NA(dtype) to return object with the same dtype · c30ee6db
      Kirill Smelkov authored
      We were not taking care of that, and so previously e.g. NA(np.int16) was
      returning int instead of np.int16 .
      
      Fix it.
  2. 05 Jan, 2023 2 commits
    • setup: Make check-manifest happy · e1a5ceea
      Kirill Smelkov authored
      Before this patch it was complaining:
      
          (xlte3.venv) kirr@deca:~/src/wendelin/xlte$ check-manifest
          lists of files in version control and sdist do not match!
          missing from sdist:
            demo/kpidemo.ipynb
            demo/kpidemo.py
            xlte.py
          suggested MANIFEST.in rules:
            include *.py
            recursive-include demo *.ipynb
            recursive-include demo *.py
      
      Noticed during !2 review.
    • setup.py: Include CHANGELOG.rst in sdist · 6bd55e52
      Xavier Thompson authored
      Make `python setup.py sdist` include CHANGELOG.rst in the tar.gz.
      Without it, installing from the released tar.gz will fail because
      setup.py expects to read CHANGELOG.rst to generate the long description.
      
      /reviewed-by @kirr
      /reviewed-on !2
  3. 03 Jan, 2023 2 commits
  4. 12 Dec, 2022 12 commits
    • demo/kpidemo.ipynb: New JupyterLab notebook that demonstrates how to compute KPIs with xlte · b3370cd8
      Kirill Smelkov authored
      Add a JupyterLab notebook that shows how to build a KPI-computing pipeline
      and compute the KPIs. The notebook comes with extensive comments
      describing every step. Please see those comments for details.
    • demo/kpidemo.py: New program that demonstrates how to compute KPIs with xlte · 7934cb0e
      Kirill Smelkov authored
      Add a demo program that shows how to build a KPI-computing pipeline and
      compute the KPIs. It can be used e.g. as follows:
      
          $ ./demo/kpidemo.py 60 https://lab.nexedi.com/kirr/misc/raw/162307b9/lte/20221211-overload.xlog
      
      The program comes with extensive comments describing every step.
      Please see those comments for details.
      
      The next patch will also add an analogous JupyterLab notebook.
    • kpi += Calc · f3f3ec89
      Kirill Smelkov authored
      kpi.Calc is a calculator to compute KPIs. It can be instantiated on a
      MeasurementLog and a time interval over which to perform computations.
      It currently implements calculations for only one KPI: "E-RAB Accessibility".
      
      Please see added docstrings and tests for details.
      
      The next patch will also add a demo program that uses kpi.Calc and
      other parts of the KPI-computation pipeline to build and visualize E-RAB
      Accessibility from real data.
    • amari.kpi: New package with driver for Amarisoft LTE stack to retrieve KPI-related measurements from logs · 71087f67
      Kirill Smelkov authored
      amari.kpi provides LogMeasure that takes enb.xlog (TODO and enb.log) as
      input, and produces kpi.Measurements on output.
      
          enb.xlog     ─────────
          ─────────>  │   Log   │
                      │         │ ────> []kpi.Measurement
          ─────────>  │ Measure │
          enb.log      ─────────
      
      We read log data organizing periods around stats queries, and for now we
      build Measurement from stats' counters. To do so we take δ(stats_prev, stat)
      and process it, mapping Amarisoft counters to the 3GPP ones specified by
      kpi.Measurement.
      
      We emit measurement X after reading stats X+2 - i.e. we emit measurement
      for a period after reading data covering _next_ period. It is organized
      this way to account for init/fini correction:
      
                   fini adjust
                  -------------
                 '             '
           Sx    v     Sx+1    '   Sx+2
        ────|───────────|───────────|────
             Measurement Measurement
                  X          X+1
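      The emission order can be sketched with a three-stats window: the δ for
      period X is computed from Sx and Sx+1, but emitted only after Sx+2 is read,
      which is the point where a fini adjustment from the next period could be
      applied. `measurements` is a hypothetical generator, and the adjustment itself
      is deliberately left out.

```python
from collections import deque


def measurements(stats):
    """Yield δ(Sx, Sx+1) for period X only after Sx+2 has been read.

    stats: iterable of dicts with cumulative counters.
    The fini adjustment that would use Sx+2 is intentionally omitted.
    """
    window = deque(maxlen=3)  # Sx, Sx+1, Sx+2
    for s in stats:
        window.append(s)
        if len(window) == 3:
            s0, s1, _s2 = window  # _s2 would feed the fini adjustment
            yield {k: s1[k] - s0.get(k, 0) for k in s1}
```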
      
      This approach has the following limitations:
      
      - for most of the counters there is no direct mapping between
        Amarisoft and 3GPP. For example we currently use s1_erab_setup_request for
        ERAB.EstabAddAtt.sum, but this mapping is not strictly correct and will
        break if corresponding S1 E-RAB SETUP REQUEST message contains multiple
        ERABs. The code has corresponding FIXME marks where such approximations
        are used.
      
      - it is not possible to implement init/fini correction precisely. From
        aggregated statistics we only get the total fini value for a
        period, without knowing which part of it corresponds to init events
        from the previous period and which part to init events from the current one.
        With that it is only possible to make a reasonable guess and try to
        preserve statistical properties, but not more. See m_initfini in the
        code for details.
      
      - only an eNB with a single cell can be handled. This limitation
        comes from the fact that in Amarisoft LTE stack S1-related counters
        come as "global" ones, while e.g. RRC-related counters are "per-cell".
        It is thus not possible to see how many S1 connection establishments
        are associated with one particular cell if there are several of them.
      
      TODO also parse enb.log to fix those issues.
    • amari.xlog += Reader · 0633d26f
      Kirill Smelkov authored
      xlog.Reader can be used to parse and read back data previously saved
      by xlog. In the next patch we will use it in the Amarisoft driver for KPI
      measurements.
    • kpi: Start of the package · dc1d5481
      Kirill Smelkov authored
      Start the package to process measurements and compute KPIs from them.
      
      In this patch we add kpi.Measurement - a central part to represent
      measurement results in an intermediate generic form. kpi.Measurement will
      be used both by the KPI calculator and by drivers for particular LTE stacks
      to provide their KPI-related data in this uniform common format.
      
      kpi.Measurement also establishes semantics for such measurement results
      to be followed by drivers. The semantics are stated in the kpi.Measurement
      docstring and in the comment for every field. In particular, according
      to TS 32.401 and common sense, measurement data are required to
      correctly account for initiation/termination events to avoid
      discrepancies. Quoting kpi.Measurement documentation:
      
          Important note (init/fini correction):
      
            Termination events should be counted in the same granularity period, where
            corresponding initiation event occurred, even if termination event happens
            _after_ granularity period covering the initiation event. For example in the
            following illustration "ConnEstab Success" event should be counted in the
            same granularity period 1 as "ConnEstab Initiate" event:
      
                           -----------------------
                          '                       '
                  | p e r ' i o d 1       | p e r ' i o d 2    |
                  |       '               |       v            |
              ────'───────x───────────────'───────x────────────'────────────>
                      ConnEstab               ConnEstab                time
                      Initiate                 Success
      
            This preserves invariant that N(initiations) is always ≥ N(results) and
            goes in line with what TS 32.401 4.3.2 "Perceived accuracy -> Same period
            for the same two events" requires.
      
      kpi.Measurement comes accompanied by kpi.MeasurementLog, which in essence
      is an array of kpi.Measurements.
      
      We will use kpi.Measurement and kpi.MeasurementLog in later patches to
      both provide Amarisoft-specific data in this common format, and to
      compute KPIs from it.
    • amari.xlog: Mention it in readme · 949cc753
      Kirill Smelkov authored
    • amari.xlog: Clarify why we will need to implement log rotation ourselves · d20f64be
      Kirill Smelkov authored
      Upon rotation we want to emit the trailing part to the old file, and emit
      a new header into the new log file. All this is custom and cannot be handled
      reliably when rotation is done by an external tool.
    • amari.xlog: Document xlog protocol · 22292b62
      Kirill Smelkov authored
      Document the kinds of messages and events that could be emitted by xlog.

      The messages come from Amarisoft software directly, but events are
      xlog-specific, and without proper documentation it is easy to miss what
      they are and which information and semantics they carry.
    • amari.xlog: Switch time emitted in events to be seconds/epoch instead of RFC822 · 3175e9c6
      Kirill Smelkov authored
      The time emitted in messages by Amarisoft is in seconds. It also makes
      sense to emit meta/event times in seconds as well for uniformity.
      
      This is a backward-incompatible change, but it should be ok at this early time.
    • amari.xlog: Flush each emitted line · 9c8da27f
      Kirill Smelkov authored
      xlog logging is rather infrequent - usually an entry comes once per several seconds
      or once per minute. And without flushing, many entries can remain sitting
      in the userspace file buffer without being conveyed to the OS kernel.
      This is not very convenient, because in such a situation we cannot make good
      use of tools like `tail -f`.

      Since flushing is a relatively cheap operation - it is just one write
      syscall - let's do it after every emitted line. The write syscall does
      not force data to be synced to disk, so it should not slow things down,
      but it makes it convenient to have the latest logs right away in the filesystem
      view.
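      The per-line flush amounts to calling `flush()` after each write, without
      `os.fsync()`. A minimal sketch; `emit` is a hypothetical helper, and the
      one-JSON-object-per-line shape mirrors the xlog output shown elsewhere in
      this log.

```python
import json


def emit(f, obj):
    """Write one JSON line and hand it to the kernel right away.

    flush() pushes the userspace buffer out via a write syscall; it does not
    call os.fsync(), so nothing is forced to disk.
    """
    f.write(json.dumps(obj) + "\n")
    f.flush()
```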
    • amari: Fix Conn rx timeout handling · 8f94b47d
      Kirill Smelkov authored
      Conn multiplexes many requests/responses over single WebSocket
      connection. To do so it organizes dedicated receive thread that
      continuously receives messages from underlying websocket connection and
      dispatches received replies back to threads that issued corresponding requests.
      
      An rx timeout in that receive thread is thus not something unexpected -
      it can happen e.g. if simply no requests are sent. But I missed
      that in 61ad9032 (amari: Add functionality to interoperate with an
      Amarisoft LTE service via WebSocket) and implicitly did not ignore such
      a global rx timeout. As a result `amari xlog` does not work properly if
      the period of requests is greater than the timeout value, for example:
      
              $ xamari xlog ws://localhost:9001 ue_get/30s
              {"meta": {"event": "start", "time": 1670588996.0623107, "generator": "xlog ws://localhost:9001 ue_get[]/30.0s"}}
              {"meta": {"event": "service attach", "time": 1670588996.1852894, "srv_name": "ENB", "srv_type": "ENB", "srv_version": "2022-12-01"}}
              {"message":"config_get", ...}
        note  {"message":"ue_get","ue_list":[],"message_id":2,"time":3045.323,"utc":1670588996.423}
        ----> {"meta": {"event": "service detach", "time": 1670589026.3569217, "srv_name": "ENB", "srv_type": "ENB", "srv_version": "2022-12-01", "reason": "timed out"}}
              {"meta": {"event": "service attach", "time": 1670589029.485363, "srv_name": "ENB", "srv_type": "ENB", "srv_version": "2022-12-01"}}
              {"message":"config_get", ...}
              {"message":"ue_get","ue_list":[],"message_id":2,"time":3078.606,"utc":1670589029.706}
              ...
      
      -> Fix it by ignoring global rx timeout.
      
      NOTE: we must also add manual handling of per-request timeout when
      waiting for the corresponding reply. Otherwise a situation where a
      particular reply never comes back, while replies to other requests keep
      coming back fine, would never be detected.
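      Here is a sketch of both parts of the fix. It uses the same hypothetical
      transport interface; `RxTimeout` and `LossyTransport` are made up for
      illustration. The receive loop treats an rx timeout as mere idleness.
      Each waiter enforces its own timeout and unregisters itself when that
      timeout expires:

```python
import queue
import threading


class RxTimeout(Exception):
    """Hypothetical: raised by transport.recv() when nothing arrives in time."""


def recv_loop(transport, waiting, mu, stop):
    """Receive-thread body: dispatch replies to waiters by message_id.

    A global rx timeout only means the connection was idle (e.g. no requests
    were sent), so it is ignored instead of being treated as an error.
    """
    while not stop.is_set():
        try:
            msg = transport.recv()
        except RxTimeout:
            continue
        with mu:
            rxq = waiting.pop(msg.get("message_id"), None)
        if rxq is not None:
            rxq.put(msg)


def wait_reply(waiting, mu, mid, rxq, timeout):
    """Wait for the reply to request `mid`, enforcing a per-request timeout.

    Without it, a reply that never comes back would go unnoticed as long as
    replies to other requests keep arriving.
    """
    try:
        return rxq.get(timeout=timeout)
    except queue.Empty:
        with mu:
            waiting.pop(mid, None)  # stop waiting for this reply
        raise TimeoutError("no reply to message_id=%d within %gs" % (mid, timeout))


class LossyTransport:
    """Fake transport: recv() raises RxTimeout after 50ms of idleness, and
    requests whose message_id is in `drop` never get a reply."""

    def __init__(self, drop=()):
        self._q = queue.Queue()
        self._drop = set(drop)

    def send(self, msg):
        if msg["message_id"] not in self._drop:
            self._q.put(msg)

    def recv(self):
        try:
            return self._q.get(timeout=0.05)
        except queue.Empty:
            raise RxTimeout() from None
```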
      
      Here is how the fixed version works now:
      
              $ xamari xlog ws://localhost:9001 ue_get/30s
              {"meta": {"event": "start", "time": 1670589223.0339117, "generator": "xlog ws://localhost:9001 ue_get[]/30.0s"}}
              {"meta": {"event": "service attach", "time": 1670589223.1970558, "srv_name": "ENB", "srv_type": "ENB", "srv_version": "2022-12-01"}}
              {"message":"config_get", ...}
              {"message":"ue_get","ue_list":[],"message_id":2,"time":3272.292,"utc":1670589223.391}
              {"message":"ue_get","ue_list":[],"message_id":3,"time":3302.274,"utc":1670589253.373}
              {"message":"ue_get","ue_list":[],"message_id":4,"time":3332.266,"utc":1670589283.365}
              ...
      
      Note that ue_get messages now come one after another, every ~30 seconds
      as requested, and there is no "service detach" event, which was
      artificially popping up due to the wrong timeout handling.
      8f94b47d
  5. 02 Nov, 2022 1 commit
    • amari.xlog: Begin emitting queries right after startup · 134f3a1e
      Kirill Smelkov authored
      For example with `stats/100s` the stats query is emitted every 100
      seconds, but currently it is first emitted only after waiting 100
      seconds after `xamari xlog` startup, or after reconnection to the eNB if
      the eNB itself was restarted. This makes it hard to interpret the
      obtained numbers in relation to the last 100s time interval.
      
      -> Start emitting queries right after reconnection / xlog start, so that
      we see the values as observed at the beginning, and can compare the
      results of further queries to them.
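      Here is a minimal sketch of such scheduling. The names (`run_queries`,
      `FakeClock`) are hypothetical; this is not the actual xlog loop. Every
      query is first due at the start time t0 and then every `period` after
      that. Calling it again after a reconnection restarts the schedule with
      fresh immediate queries. The clock and sleep functions can be injected,
      so the schedule can be checked without real waiting:

```python
import time


def run_queries(specs, query, now=time.monotonic, sleep=time.sleep, rounds=None):
    """Issue periodic queries, emitting each one right at startup.

    Hypothetical sketch: `specs` is a list of (name, period_seconds) and
    `query(name)` performs one request. Every query is first due at t0
    (startup / reconnection) rather than at t0 + period.
    """
    t0 = now()
    period = dict(specs)
    due = {name: t0 for name, _ in specs}  # due immediately, not after one period
    n = 0
    while rounds is None or n < rounds:
        t = now()
        for name in sorted(due, key=due.get):
            if due[name] <= t:
                query(name)
                due[name] += period[name]
        n += 1
        dt = min(due.values()) - now()
        if dt > 0:
            sleep(dt)


class FakeClock:
    """Deterministic clock for trying run_queries without real waiting."""

    def __init__(self):
        self.t = 0

    def now(self):
        return self.t

    def sleep(self, dt):
        self.t += dt
```

With `stats/100s` and `ue_get/30s`, both queries go out at t=0. After that, ue_get follows at 30, 60, 90 and stats at 100.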
      134f3a1e
  6. 19 Oct, 2022 7 commits
    • amari.xlog: Initial draft · e0cc8a38
      Kirill Smelkov authored
      `xamari xlog` can be used to maintain an extra log for an Amarisoft LTE
      service: in addition to native logs, xlog contains results of periodic
      queries done via WebSocket. For example with the following arguments
      
      	xamari xlog <wsuri>  stats/10s ue_get/3s erab_get/3s
      
      xlog will emit results of the stats query every 10 seconds, results of
      the ue_get query every 3 seconds, and similarly for the erab_get query.
      
      The results are saved into xlog in JSON Lines format for easy future
      processing. For reference, below is a copy of the corresponding help
      entries:
      
          xlte$ xamari help xlog
          Usage: xamari xlog [OPTIONS] <wsuri> <logspec>+
          Maintain extra log for a service.
      
          The service is queried periodically according to logspec and results are saved
          in JSON format to a file (see 'xamari help jsonlog').
      
          <wsuri> is URI (see 'xamari help websock') of an Amarisoft-service.
          <logspec> is specification of what to log. It has the following parts:
      
              <query>[<options>]/<period>
      
          The query specifies a message, that should be used to query service. For
          example "stats", "ue_get", "erab_get", ... Query part is mandatory.
      
          Options specifies additional flags for the query. Options part can be omitted.
      
          Period specifies periodicity of how often the service should be queried.
          Period is optional and defaults to 60 seconds.
      
          Example for <logspec>+:
      
              stats[samples,rf]/30s  ue_get[stats]  erab_get/10s  qos_flow_get
      
          Options:
      
              -h  --help            show this help
      
          xlte$ xamari help jsonlog
          Some commands produce logs with JSON entries. Such logs are organized with JSON
          Lines format(*) with each entry occupying one line.
      
          Logs in JSON Lines format are handy to exchange in between programs, and with
          corresponding tools, e.g. jq(+), they can be also displayed in human-readable
          form and inspected quickly.
      
          (*) https://jsonlines.org/
          (+) https://stedolan.github.io/jq/
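      The <logspec> grammar from the help text can be parsed along the lines
      below. This is a sketch, not xlog's parser. It assumes periods are
      given in seconds with an "s" suffix, as in all examples here. It
      renders the canonical form that appears in the "generator" field of
      the example output (e.g. `ue_get[]/3.0s`):

```python
import re
from dataclasses import dataclass, field

# <query>[<options>]/<period>; options and period are optional.
# Assumption: period is written in seconds with an "s" suffix, as in the
# examples ("30s", "10s"); other units are not handled here.
_LOGSPEC_RE = re.compile(
    r"^(?P<query>\w+)"
    r"(?:\[(?P<opts>[^\]]*)\])?"
    r"(?:/(?P<period>\d+(?:\.\d*)?)s)?$"
)


@dataclass
class LogSpec:
    query: str
    optv: list = field(default_factory=list)
    period: float = 60.0  # default periodicity per the help text


def parse_logspec(text):
    m = _LOGSPEC_RE.match(text)
    if m is None:
        raise ValueError("invalid logspec: %r" % text)
    opts = m.group("opts")
    optv = [o for o in opts.split(",") if o] if opts else []
    period = float(m.group("period")) if m.group("period") else 60.0
    return LogSpec(m.group("query"), optv, period)


def format_logspec(spec):
    # canonical form, e.g. "ue_get[]/3.0s" as in the xlog "generator" field
    return "%s[%s]/%ss" % (spec.query, ",".join(spec.optv), spec.period)
```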
      
      Example output:
      
      (xlte3.venv) kirr@deca:~/src/wendelin/xlte$ xamari xlog ws://localhost:9001 ue_get/3s erab_get/3s
      {"meta": {"event": "start", "time": "Wed, 19 Oct 2022 15:05:50 -0000", "generator": "xlog ws://localhost:9001 ue_get[]/3.0s erab_get[]/3.0s"}}
      {"meta": {"event": "service attach", "time": "Wed, 19 Oct 2022 15:05:51 -0000", "srv_name": "ENB", "srv_type": "ENB", "srv_version": "2022-09-16"}}
      {"message":"config_get","version":"2022-09-16","type":"ENB","name":"ENB","license_id":"d9a961c166d2d4b15249fc559cdec925efbbe942d73b143aff","license_user":"rapid.space","logs":{"layers":{"PHY":{"level":"error","max_size":0,"key":false,"crypto":false,"payload":false},"MAC":{"level":"error","max_size":0,"key":false,"crypto":false,"payload":false},"RLC":{"level":"error","max_size":0,"key":false,"crypto":false,"payload":false},"PDCP":{"level":"error","max_size":0,"key":false,"crypto":false,"payload":false},"RRC":{"level":"debug","max_size":1,"key":false,"crypto":false,"payload":false},"NAS":{"level":"debug","max_size":1,"key":false,"crypto":false,"payload":false},"S1AP":{"level":"debug","max_size":1,"key":false,"crypto":false,"payload":false},"NGAP":{"level":"error","max_size":0,"key":false,"crypto":false,"payload":false},"GTPU":{"level":"error","max_size":0,"key":false,"crypto":false,"payload":false},"X2AP":{"level":"debug","max_size":1,"key":false,"crypto":false,"payload":false},"XnAP":{"level":"error","max_size":0,"key":false,"crypto":false,"payload":false},"M2AP":{"level":"error","max_size":0,"key":false,"crypto":false,"payload":false},"LPPa":{"level":"error","max_size":0,"key":false,"crypto":false,"payload":false},"NRPPa":{"level":"error","max_size":0,"key":false,"crypto":false,"payload":false},"TRX":{"level":"error","max_size":0,"key":false,"crypto":false,"payload":false}},"bcch":false,"mib":false,"signal":false,"cch":false,"rep":false,"dci_size":false,"csi":false,"cell_meas":false,"count":8192,"rotate":250000000,"path":"/var/log/lte"},"tai":9610.092,"global_enb_id":{"plmn":"00101","enb_id_type":"macro","enb_id":107216,"enb_name":"enb1a2d0"},"cells":{"1":{"n_antenna_dl":2,"n_antenna_ul":2,"n_layer_dl":2,"n_layer_ul":1,"gain":0,"ul_disabled":false,"rf_port":0,"dl_qam":256,"ul_qam":64,"ecgi":{"plmn":"00101","eci":27447297},"n_id_cell":1,"n_rb_dl":100,"n_rb_ul":100,"dl_earfcn":38350,"ul_earfcn":38350,"band":39,"dl_freq":1890000000,"ul_freq":1890000000,"mode":"TDD","uldl_config":2,"sp_config":7,"prach_sequence_index":204,"dl_cyclic_prefix":"normal","ul_cyclic_prefix":"normal","prach_config_index":4,"prach_freq_offset":11,"delta_pucch_shift":2,"n_rb_cqi":1,"n_cs_an":0,"pucch_allocation":[{"type":"2/2a/2b","rbs":1,"n":6}],"pucch_ack_nack_start":11,"pucch_reserved_rbs":[0,0,22,0,0,0,0,22,0,0],"sr_resource_count":44,"cqi_resource_count":48,"srs_resources":{"offsets":16,"freqs":20,"total":640},"gbr":{"dl_limit":7123840,"ul_limit":1797120},"tac":1,"plmn_list":[{"plmn":"00101","reserved":false}]}},"rx_channels":[{"gain":9,"freq":1890,"port":0},{"gain":9,"freq":1890,"port":0}],"tx_channels":[{"gain":60,"freq":1890,"port":0},{"gain":60,"freq":1890,"port":0}],"rf_ports":[{"sample_rate":30720000}],"message_id":1,"time":9610.087}
      {"message":"ue_get","ue_list":[],"message_id":2,"time":9613.347}
      {"message":"erab_get","erab_list":[],"timestamp":9613704,"message_id":3,"time":9613.704}
      {"message":"ue_get","ue_list":[],"message_id":4,"time":9616.355}
      {"message":"erab_get","erab_list":[],"timestamp":9616618,"message_id":5,"time":9616.618}
      {"message":"ue_get","ue_list":[],"message_id":6,"time":9619.272}
      {"message":"erab_get","erab_list":[],"timestamp":9619547,"message_id":7,"time":9619.547}
      {"message":"ue_get","ue_list":[],"message_id":8,"time":9622.319}
      {"message":"erab_get","erab_list":[],"timestamp":9622588,"message_id":9,"time":9622.588}
      {"message":"ue_get","ue_list":[],"message_id":10,"time":9625.263}
      {"message":"erab_get","erab_list":[],"timestamp":9625663,"message_id":11,"time":9625.663}
      {"message":"ue_get","ue_list":[],"message_id":12,"time":9628.403}
      {"message":"erab_get","erab_list":[],"timestamp":9628725,"message_id":13,"time":9628.725}
      {"message":"ue_get","ue_list":[],"message_id":14,"time":9631.381}
      {"message":"erab_get","erab_list":[],"timestamp":9631798,"message_id":15,"time":9631.798}
      {"message":"ue_get","ue_list":[],"message_id":16,"time":9634.303}
      {"message":"erab_get","erab_list":[],"timestamp":9634596,"message_id":17,"time":9634.596}
      {"message":"ue_get","ue_list":[],"message_id":18,"time":9637.268}
      {"message":"erab_get","erab_list":[],"timestamp":9637568,"message_id":19,"time":9637.568}
      
          (below a UE registers with the eNB)
      
      {"message":"ue_get","ue_list":[{"enb_ue_id":8,"mme_ue_id":107,"rnti":68,"cells":[{"cell_id":1}]}],"message_id":20,"time":9640.296}
      {"message":"erab_get","erab_list":[{"erab_id":5,"qci":9,"dl_total_bytes":0,"ul_total_bytes":0,"enb_ue_id":8}],"timestamp":9640831,"message_id":21,"time":9640.831}
      {"message":"ue_get","ue_list":[{"enb_ue_id":8,"mme_ue_id":107,"rnti":68,"cells":[{"cell_id":1}]}],"message_id":22,"time":9643.268}
      {"message":"erab_get","erab_list":[{"erab_id":5,"qci":9,"dl_total_bytes":0,"ul_total_bytes":0,"enb_ue_id":8}],"timestamp":9643555,"message_id":23,"time":9643.555}
      {"message":"ue_get","ue_list":[{"enb_ue_id":8,"mme_ue_id":107,"rnti":68,"cells":[{"cell_id":1}]}],"message_id":24,"time":9646.326}
      {"message":"erab_get","erab_list":[{"erab_id":5,"qci":9,"dl_total_bytes":0,"ul_total_bytes":0,"enb_ue_id":8}],"timestamp":9646850,"message_id":25,"time":9646.85}
      {"message":"ue_get","ue_list":[],"message_id":26,"time":9649.287}
      {"message":"erab_get","erab_list":[],"timestamp":9649646,"message_id":27,"time":9649.646}
      ^CTraceback (most recent call last):
        File "/home/kirr/src/wendelin/venv/xlte3.venv/bin/xamari", line 33, in <module>
          sys.exit(load_entry_point('xlte', 'console_scripts', 'xamari')())
        File "/home/kirr/src/wendelin/xlte/amari/xamari.py", line 130, in main
          return command_module.main(argv)
        File "/home/kirr/src/wendelin/xlte/amari/xlog.py", line 275, in main
          xlog(wsuri, logspecv)
        File "/home/kirr/src/wendelin/xlte/amari/xlog.py", line 108, in xlog
          xl.xlog1()
        File "/home/kirr/src/wendelin/venv/xlte3.venv/lib/python3.9/site-packages/decorator.py", line 232, in fun
          return caller(func, *(extras + args), **kw)
        File "/home/kirr/src/wendelin/venv/xlte3.venv/lib/python3.9/site-packages/golang/__init__.py", line 103, in _
          return f(*argv, **kw)
        File "/home/kirr/src/wendelin/xlte/amari/xlog.py", line 157, in xlog1
          xl._xlog1(conn)
        File "/home/kirr/src/wendelin/xlte/amari/xlog.py", line 208, in _xlog1
          time.sleep(δtsleep)
      KeyboardInterrupt
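      Because every entry is a single JSON object per line, such a log is
      easy to post-process. Here is a minimal Python sketch with hypothetical
      helper names. It separates xlog's own "meta" events from service
      replies and counts UEs per ue_get reply:

```python
import json


def read_xlog(lines):
    """Split xlog JSON Lines entries into meta events and service messages.

    Sketch based on the example output: entries with a top-level "meta" key
    are xlog's own events (start, service attach/detach); everything else is
    a reply from the service.
    """
    metav, msgv = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if "meta" in entry:
            metav.append(entry["meta"])
        else:
            msgv.append(entry)
    return metav, msgv


def ue_counts(msgv):
    """Number of UEs reported by each ue_get reply, in log order."""
    return [len(m["ue_list"]) for m in msgv if m.get("message") == "ue_get"]
```

With a real log file this would be e.g. `with open(path) as f: metav, msgv = read_xlog(f)`.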
      e0cc8a38
    • amari: Add functionality to interoperate with an Amarisoft LTE service via WebSocket. · 61ad9032
      Kirill Smelkov authored
      - amari.connect() connects to a service and returns Conn.
      - Conn can be used to issue requests, receive replies and (TODO) event
        notifications. Several requests could be issued simultaneously and
        handled in parallel.
      61ad9032
    • xamari: New supplementary tool for managing Amarisoft LTE services. · 95cd8912
      Kirill Smelkov authored
      Only the main driver framework here, without actual subcommands.
      
      Based on similar main driver from Zodbtools:
      
      https://lab.nexedi.com/nexedi/zodbtools/blob/master/zodbtools/zodb.py
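      For orientation, here is a minimal sketch of what such a driver does.
      It dispatches `xamari <command> ...` to the command's `main(argv)`,
      serves `xamari help <command>`, and otherwise prints a summary of known
      commands. The following are assumptions for illustration, not the exact
      zodbtools or xamari API: the `commands` mapping, its `summary`/`help`
      attributes, and passing argv starting with the command name.

```python
import sys
from types import SimpleNamespace


def driver(commands, argv, out=None):
    """Dispatch `prog <command> [args...]` to commands[<command>].main(argv).

    Hypothetical sketch: `commands` maps a name to an object with a one-line
    `summary`, an optional longer `help`, and a `main(argv)` function that
    receives argv starting with the command name.
    """
    out = sys.stdout if out is None else out
    prog = argv[0]
    if len(argv) < 2 or argv[1] in ("-h", "--help"):
        print("Usage: %s <command> [arguments]" % prog, file=out)
        for name in sorted(commands):
            print("    %-12s %s" % (name, commands[name].summary), file=out)
        return 2
    if argv[1] == "help" and len(argv) > 2:
        topic = commands.get(argv[2])
        if topic is None:
            print("%s: unknown help topic %r" % (prog, argv[2]), file=out)
            return 2
        print(getattr(topic, "help", topic.summary), file=out)
        return 0
    cmd = commands.get(argv[1])
    if cmd is None:
        print("%s: unknown command %r" % (prog, argv[1]), file=out)
        return 2
    return cmd.main(argv[1:])


# hypothetical: one dummy command to exercise the otherwise empty framework
COMMANDS = {
    "demo": SimpleNamespace(summary="dummy command", main=lambda argv: 0),
}
```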
      95cd8912
    • Top-level in-tree import redirector · 3c971d64
      Kirill Smelkov authored
      So that import xlte.abc resolves to xlte/abc.py
      
      Based on a similar top-level import redirector from wendelin.core.
      See nexedi/wendelin.core@e870781d for context, and the links in the
      added code for further details.
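      One way to get this effect is shown below as a hypothetical sketch, not
      the wendelin.core code. It registers a synthetic package module whose
      `__path__` points at the source tree root. Submodule lookups such as
      `xlte.abc` then search that directory:

```python
import importlib
import os
import sys
import tempfile
import types


def install_redirector(pkgname, topdir):
    """Make `import <pkgname>.abc` load <topdir>/abc.py.

    Hypothetical sketch: register a synthetic package module whose __path__
    points at the source tree root, so submodule lookups search topdir.
    """
    pkg = types.ModuleType(pkgname)
    pkg.__path__ = [topdir]  # having __path__ is what makes it a package
    sys.modules[pkgname] = pkg
    importlib.invalidate_caches()
    return pkg
```

The test uses a throwaway package name so that it does not shadow a real `xlte` installation.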
      3c971d64
    • Initial Administrivia · 3de908b0
      Kirill Smelkov authored
      Stub setup.py, README/CHANGELOG, .gitignore.
      3de908b0
    • Licensing · ede5d48e
      Kirill Smelkov authored
      Follow standard GPL3 + wide exception for Free and Open-source software.
      See https://www.nexedi.com/licensing for details.
      ede5d48e
    • Start of xlte.git · 12c09480
      Kirill Smelkov authored
      The project to implement assorted functionality related to LTE.
      
      In particular I intend for it to be the place where KPI-related
      functionality will live.
      12c09480