Commit 6a8f58c5 authored by Jérome Perrin

stack/erp5: remove httpd and use haproxy instead

The two main differences with haproxy are the file format for certificates and
the logs.

HAProxy also uses certificates in PEM format, but it expects its own server
certificate and the key to be in the same file (although recent versions seem
to accept separate files, we don't use this now) and the CRL and CA certificates
also all together in the same file.
We change to use the same file for certificate and key and for CA and CRL; in
the updater script we build PEM files containing all CA certificates and
all CRLs together.
Also, since haproxy needs to be reloaded when certificates change, we run it in
master-worker mode, with a pid file so that we can signal it to reload.
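
The certificate handling described above can be sketched roughly as follows. This is only an illustration, not the updater script from this commit: `build_pem_bundle` and `reload_haproxy` are hypothetical names, and the sketch assumes the pid file holds the PID of the haproxy master process, which reloads on SIGUSR2 in master-worker mode.

```python
import os
import signal


def build_pem_bundle(source_paths, bundle_path):
    """Concatenate several PEM files into one file.

    Returns True if the bundle content changed (so a reload is needed).
    """
    chunks = []
    for path in source_paths:
        with open(path) as f:
            # normalise trailing newlines so that PEM blocks do not get glued
            chunks.append(f.read().strip() + "\n")
    new_content = "".join(chunks)

    old_content = None
    if os.path.exists(bundle_path):
        with open(bundle_path) as f:
            old_content = f.read()
    if new_content == old_content:
        return False

    # write to a temporary file and rename it, so that haproxy never
    # reads a half-written bundle
    tmp_path = bundle_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(new_content)
    os.rename(tmp_path, bundle_path)
    return True


def reload_haproxy(pidfile, kill=os.kill):
    """Send SIGUSR2 to the process whose PID is stored in `pidfile`.

    `kill` is only a hook to make this sketch testable.
    """
    with open(pidfile) as f:
        pid = int(f.read().strip())
    kill(pid, signal.SIGUSR2)
    return pid
```

A caller would typically rebuild the certificate+key file and the CA and CRL files, and call `reload_haproxy` only when one of the `build_pem_bundle` calls returned True.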

For the logs, since haproxy does not log to file, we introduce a rsyslogd to
log to a file. The log format is the same as with httpd, except that timings are
not in microseconds but in milliseconds - this did not seem to be configurable.
This is a problem for apachedex reports on logs; for that we plan to use an
updated version of apachedex with support for `%{ms}T` for durations.
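
As an illustration of the unit change, here is a minimal sketch of reading the duration from an access log line, assuming the duration is the last field of the line and is expressed in milliseconds. The helper name and the sample line in the usage note are made up for this example.

```python
import re

# the request duration is assumed to be the last, all-digits field of the line
DURATION_RE = re.compile(r"(\d+)$")


def request_duration_seconds(access_line):
    """Return the request duration in seconds from a log line ending
    with a duration in milliseconds."""
    match = DURATION_RE.search(access_line.strip())
    if match is None:
        raise ValueError("no duration found in %r" % (access_line,))
    return int(match.group(1)) / 1000.0
```

With a hypothetical line such as `127.0.0.1 - - [05/Nov/2021:17:28:39 +0000] "GET /url_path HTTP/1.1" 200 2 "-" "python-requests" 2034`, this gives a bit more than 2 seconds, where the httpd logs would have carried a value around 2034000.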

HAProxy is configured with the same timeouts, except:
 - "connect" timeout has been increased a bit (from 5 to 10s), because the
   comment "The connection should be immediate on LAN" was no longer true, now
   that haproxy is accessed from frontend.
 - the server entries for testrunner have a very long timeout (8h) because some
   ERP5 functional tests exceed the 305s timeout.

The SSL configuration is the current "modern" config from https://ssl-config.mozilla.org/

Tests have been modified a bit, because haproxy uses HTTP/2.0 and not 1.1
like httpd was doing. Several haproxy features (keep alive and gzip
compression) are only available when backend uses HTTP/1.1, so we adjusted
tests to use a 1.1 backend.

There were also differences with logs, because of the time being in milliseconds.

TestPublishedURLIsReachableMixin._checkERP5IsReachable was also updated. It
was working by chance because when accessed behind httpd->haproxy->zope, zope
was producing a redirect URL that was the URL of haproxy, which happened to be
resolvable. This test was updated to access zope with a path that
contains VirtualHostMonster magic, as the shared frontend (with "zope" software
type) is supposed to set.
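
To make the VirtualHostMonster path concrete, here is a small sketch of how such a path is composed and which base URL zope is then expected to use for the URLs it generates. The two helper names are hypothetical and the `erp5` site id in the usage note is an assumption; the path layout and the expected redirect are the ones used in the updated test.

```python
def virtual_host_path(scheme, netloc, site_id, virtual_root):
    """Path requested on zope so that it generates URLs for the
    virtual host `scheme://netloc/virtual_root/`."""
    return (
        "/VirtualHostBase/{scheme}/{netloc}/{site_id}"
        "/VirtualHostRoot/_vh_{virtual_root}/"
    ).format(
        scheme=scheme,
        netloc=netloc,
        site_id=site_id,
        virtual_root=virtual_root,
    )


def expected_base_url(scheme, netloc, virtual_root):
    """Base of the URLs zope is expected to produce for that request."""
    return "{}://{}/{}/".format(scheme, netloc, virtual_root)
```

For example `virtual_host_path("https", "virtual-host-name:1234", "erp5", "virtual_host_root")` is the path to request, and a redirect to the login form should then point to `expected_base_url("https", "virtual-host-name:1234", "virtual_host_root") + "login_form"`, whatever address haproxy itself listens on.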

This should hopefully solve the "502 Proxy Error" that we are observing with httpd.
parent c82ad114
......@@ -48,10 +48,6 @@ def setUpModule():
class ERP5InstanceTestCase(SlapOSInstanceTestCase):
"""ERP5 base test case
"""
# ERP5 instanciation needs to run several times before being ready, as
# the root instance request more instances.
instance_max_retry = 7 # XXX how many times ?
def getRootPartitionConnectionParameterDict(self):
"""Return the output paramters from the root partition"""
return json.loads(
......
......@@ -50,7 +50,31 @@ class EchoHTTPServer(ManagedHTTPServer):
self.end_headers()
self.wfile.write(response)
log_message = logging.getLogger(__name__ + '.HeaderEchoHandler').info
log_message = logging.getLogger(__name__ + '.EchoHTTPServer').info
class EchoHTTP11Server(ManagedHTTPServer):
"""An HTTP/1.1 Server responding with the request path and incoming headers,
encoded in json.
"""
class RequestHandler(BaseHTTPRequestHandler):
protocol_version = 'HTTP/1.1'
def do_GET(self):
# type: () -> None
self.send_response(200)
self.send_header("Content-Type", "application/json")
response = json.dumps(
{
'Path': self.path,
'Incoming Headers': self.headers.dict
},
indent=2,
)
self.send_header("Content-Length", len(response))
self.end_headers()
self.wfile.write(response)
log_message = logging.getLogger(__name__ + '.EchoHTTP11Server').info
class CaucaseService(ManagedResource):
......@@ -105,6 +129,7 @@ class CaucaseService(ManagedResource):
shutil.rmtree(self.directory)
class BalancerTestCase(ERP5InstanceTestCase):
@classmethod
......@@ -120,7 +145,7 @@ class BalancerTestCase(ERP5InstanceTestCase):
# XXX what is this ? should probably not be needed here
'name': cls.__name__,
'monitor-passwd': 'secret',
'apachedex-configuration': '--erp5-base +erp5 .*/VirtualHostRoot/erp5(/|\\?|$) --base +other / --skip-user-agent Zabbix --error-detail --js-embed --quiet',
'apachedex-configuration': '--erp5-base +erp5 .*/VirtualHostRoot/erp5(/|\\?|$) --base +other / --skip-user-agent Zabbix --error-detail --js-embed --quiet --logformat=\\"%h %l %u %t "%r" %>s %O "%{Referer}i" "%{User-Agent}i" %{ms}T\\"',
'apachedex-promise-threshold': 100,
'haproxy-server-check-path': '/',
'zope-family-dict': {
......@@ -184,7 +209,7 @@ class TestAccessLog(BalancerTestCase, CrontabMixin):
access_line = access_log_file.read().splitlines()[-1]
self.assertIn('/url_path', access_line)
# last \d is the request time in micro seconds, since this SlowHTTPServer
# last \d is the request time in milli seconds, since this SlowHTTPServer
# sleeps for 2 seconds, it should take between 2 and 3 seconds to process
# the request - but our test machines can be slow sometimes, so we tolerate
# it can take up to 20 seconds.
......@@ -195,8 +220,8 @@ class TestAccessLog(BalancerTestCase, CrontabMixin):
self.assertTrue(match)
assert match
request_time = int(match.groups()[-1])
self.assertGreater(request_time, 2 * 1000 * 1000)
self.assertLess(request_time, 20 * 1000 * 1000)
self.assertGreater(request_time, 2 * 1000)
self.assertLess(request_time, 20 * 1000)
def test_access_log_apachedex_report(self):
# type: () -> None
......@@ -415,15 +440,34 @@ class TestTestRunnerEntryPoints(BalancerTestCase):
class TestHTTP(BalancerTestCase):
"""Check HTTP protocol
"""Check HTTP protocol with a HTTP/1.1 backend
"""
@classmethod
def _getInstanceParameterDict(cls):
# type: () -> Dict
parameter_dict = super(TestHTTP, cls)._getInstanceParameterDict()
# use a HTTP/1.1 server instead
parameter_dict['dummy_http_server'] = [[cls.getManagedResource("HTTP/1.1 Server", EchoHTTP11Server).netloc, 1, False]]
return parameter_dict
__partition_reference__ = 'h'
def test_http_version(self):
# type: () -> None
# https://stackoverflow.com/questions/37012486/python-3-x-how-to-get-http-version-using-requests-library/37012810
self.assertEqual(
requests.get(self.default_balancer_url, verify=False).raw.version, 11)
subprocess.check_output([
'curl',
'--silent',
'--show-error',
'--output',
'/dev/null',
'--insecure',
'--write-out',
'%{http_version}',
self.default_balancer_url,
]),
'2',
)
def test_keep_alive(self):
# type: () -> None
......@@ -451,24 +495,27 @@ class TestHTTP(BalancerTestCase):
class ContentTypeHTTPServer(ManagedHTTPServer):
"""An HTTP Server which reply with content type from path.
"""An HTTP/1.1 Server which reply with content type from path.
For example when requested http://host/text/plain it will reply
with Content-Type: text/plain header.
The body is always "OK"
"""
class RequestHandler(BaseHTTPRequestHandler):
protocol_version = 'HTTP/1.1'
def do_GET(self):
# type: () -> None
self.send_response(200)
if self.path == '/':
self.send_header("Content-Length", 0)
return self.end_headers()
content_type = self.path[1:]
body = "OK"
self.send_header("Content-Type", content_type)
self.send_header("Content-Length", len(body))
self.end_headers()
self.wfile.write("OK")
self.wfile.write(body)
log_message = logging.getLogger(__name__ + '.ContentTypeHTTPServer').info
......@@ -510,9 +557,9 @@ class TestContentEncoding(BalancerTestCase):
resp = requests.get(urlparse.urljoin(self.default_balancer_url, content_type), verify=False)
self.assertEqual(resp.headers['Content-Type'], content_type)
self.assertEqual(
resp.headers['Content-Encoding'],
resp.headers.get('Content-Encoding'),
'gzip',
'%s uses wrong encoding: %s' % (content_type, resp.headers['Content-Encoding']))
'%s uses wrong encoding: %s' % (content_type, resp.headers.get('Content-Encoding')))
self.assertEqual(resp.text, 'OK')
def test_no_gzip_encoding(self):
......
......@@ -43,23 +43,44 @@ setUpModule # pyflakes
class TestPublishedURLIsReachableMixin(object):
"""Mixin that checks that default page of ERP5 is reachable.
"""
def _checkERP5IsReachable(self, url):
def _checkERP5IsReachable(self, base_url, site_id, verify):
# We access ERP5 trough a "virtual host", which should make
# ERP5 produce URLs using https://virtual-host-name:1234/virtual_host_root
# as base.
virtual_host_url = urlparse.urljoin(
base_url,
'/VirtualHostBase/https/virtual-host-name:1234/{}/VirtualHostRoot/_vh_virtual_host_root/'
.format(site_id))
# What happens is that instanciation just create the services, but does not
# wait for ERP5 to be initialized. When this test run ERP5 instance is
# instanciated, but zope is still busy creating the site and haproxy replies
# with 503 Service Unavailable when zope is not started yet, with 404 when
# erp5 site is not created, with 500 when mysql is not yet reachable, so we
# retry in a loop until we get a succesful response.
for i in range(1, 60):
r = requests.get(url, verify=False) # XXX can we get CA from caucase already ?
if r.status_code != requests.codes.ok:
delay = i * 2
self.logger.warn("ERP5 was not available, sleeping for %ds and retrying", delay)
time.sleep(delay)
continue
r.raise_for_status()
break
# configure this requests session to retry.
# XXX we should probably add a promise instead
session = requests.Session()
session.mount(
base_url,
requests.adapters.HTTPAdapter(
max_retries=requests.packages.urllib3.util.retry.Retry(
total=60,
backoff_factor=.5,
status_forcelist=(404, 500, 503))))
r = session.get(virtual_host_url, verify=verify, allow_redirects=False)
self.assertEqual(r.status_code, requests.codes.found)
# access on / are redirected to login form, with virtual host preserved
self.assertEqual(r.headers.get('location'), 'https://virtual-host-name:1234/virtual_host_root/login_form')
# login page can be rendered and contain the text "ERP5"
r = session.get(
urlparse.urljoin(base_url, '{}/login_form'.format(site_id)),
verify=verify,
allow_redirects=False,
)
self.assertEqual(r.status_code, requests.codes.ok)
self.assertIn("ERP5", r.text)
def test_published_family_default_v6_is_reachable(self):
......@@ -67,14 +88,20 @@ class TestPublishedURLIsReachableMixin(object):
"""
param_dict = self.getRootPartitionConnectionParameterDict()
self._checkERP5IsReachable(
urlparse.urljoin(param_dict['family-default-v6'], param_dict['site-id']))
param_dict['family-default-v6'],
param_dict['site-id'],
verify=False,
)
def test_published_family_default_v4_is_reachable(self):
"""Tests the IPv4 URL published by the root partition is reachable.
"""
param_dict = self.getRootPartitionConnectionParameterDict()
self._checkERP5IsReachable(
urlparse.urljoin(param_dict['family-default'], param_dict['site-id']))
param_dict['family-default'],
param_dict['site-id'],
verify=False,
)
class TestDefaultParameters(ERP5InstanceTestCase, TestPublishedURLIsReachableMixin):
......@@ -93,7 +120,7 @@ class TestMedusa(ERP5InstanceTestCase, TestPublishedURLIsReachableMixin):
return {'_': json.dumps({'wsgi': False})}
class TestApacheBalancerPorts(ERP5InstanceTestCase):
class TestBalancerPorts(ERP5InstanceTestCase):
"""Instanciate with two zope families, this should create for each family:
- a balancer entry point with corresponding haproxy
- a balancer entry point for test runner
......@@ -151,33 +178,22 @@ class TestApacheBalancerPorts(ERP5InstanceTestCase):
3 + 5,
len([p for p in all_process_info if p['name'].startswith('zope-')]))
def test_apache_listen(self):
# We have 2 families, apache should listen to a total of 3 ports per family
def test_haproxy_listen(self):
# We have 2 families, haproxy should listen to a total of 3 ports per family
# normal access on ipv4 and ipv6 and test runner access on ipv4 only
with self.slap.instance_supervisor_rpc as supervisor:
all_process_info = supervisor.getAllProcessInfo()
process_info, = [p for p in all_process_info if p['name'] == 'apache']
apache_process = psutil.Process(process_info['pid'])
process_info, = [p for p in all_process_info if p['name'].startswith('haproxy-')]
haproxy_master_process = psutil.Process(process_info['pid'])
haproxy_worker_process, = haproxy_master_process.children()
self.assertEqual(
sorted([socket.AF_INET] * 4 + [socket.AF_INET6] * 2),
sorted([
c.family
for c in apache_process.connections()
for c in haproxy_worker_process.connections()
if c.status == 'LISTEN'
]))
def test_haproxy_listen(self):
# There is one haproxy per family
with self.slap.instance_supervisor_rpc as supervisor:
all_process_info = supervisor.getAllProcessInfo()
process_info, = [
p for p in all_process_info if p['name'].startswith('haproxy-')
]
haproxy_process = psutil.Process(process_info['pid'])
self.assertEqual([socket.AF_INET, socket.AF_INET], [
c.family for c in haproxy_process.connections() if c.status == 'LISTEN'
])
class TestDisableTestRunner(ERP5InstanceTestCase, TestPublishedURLIsReachableMixin):
"""Test ERP5 can be instanciated without test runner.
......@@ -199,20 +215,22 @@ class TestDisableTestRunner(ERP5InstanceTestCase, TestPublishedURLIsReachableMix
self.assertNotIn('runUnitTest', bin_programs)
self.assertNotIn('runTestSuite', bin_programs)
def test_no_apache_testrunner_port(self):
# Apache only listen on two ports, there is no apache ports allocated for test runner
def test_no_haproxy_testrunner_port(self):
# Haproxy only listen on two ports, there is no haproxy ports allocated for test runner
with self.slap.instance_supervisor_rpc as supervisor:
all_process_info = supervisor.getAllProcessInfo()
process_info, = [p for p in all_process_info if p['name'] == 'apache']
apache_process = psutil.Process(process_info['pid'])
process_info, = [p for p in all_process_info if p['name'].startswith('haproxy')]
haproxy_master_process = psutil.Process(process_info['pid'])
haproxy_worker_process, = haproxy_master_process.children()
self.assertEqual(
sorted([socket.AF_INET, socket.AF_INET6]),
sorted(
c.family
for c in apache_process.connections()
for c in haproxy_worker_process.connections()
if c.status == 'LISTEN'
))
class TestZopeNodeParameterOverride(ERP5InstanceTestCase, TestPublishedURLIsReachableMixin):
"""Test override zope node parameters
"""
......
......@@ -11,6 +11,7 @@ extends =
../../component/gzip/buildout.cfg
../../component/xz-utils/buildout.cfg
../../component/haproxy/buildout.cfg
../../component/rsyslogd/buildout.cfg
../../component/findutils/buildout.cfg
../../component/librsvg/buildout.cfg
../../component/imagemagick/buildout.cfg
......@@ -179,6 +180,7 @@ context =
key gzip_location gzip:location
key xz_utils_location xz-utils:location
key haproxy_location haproxy:location
key rsyslogd_location rsyslogd:location
key instance_common_cfg instance-common:rendered
key jsl_location jsl:location
key jupyter_enable_default erp5-defaults:jupyter-enable-default
......@@ -208,6 +210,7 @@ context =
key template_balancer template-balancer:target
key template_erp5 template-erp5:target
key template_haproxy_cfg template-haproxy-cfg:target
key template_rsyslogd_cfg template-rsyslogd-cfg:target
key template_jupyter_cfg instance-jupyter-notebook:rendered
key template_kumofs template-kumofs:target
key template_mariadb template-mariadb:target
......@@ -273,6 +276,9 @@ fontconfig-includes =
[template-haproxy-cfg]
<= download-base
[template-rsyslogd-cfg]
<= download-base
[erp5-bin]
<= erp5
repository = https://lab.nexedi.com/nexedi/erp5-bin.git
......
......@@ -70,7 +70,7 @@ md5sum = cc19560b9400cecbd23064d55c501eec
[template]
filename = instance.cfg.in
md5sum = 5c5250112b87a3937f939028f9594b85
md5sum = f16326000790ce18bb0f9d275cbdcd84
[monitor-template-dummy]
filename = dummy.cfg
......@@ -78,7 +78,7 @@ md5sum = 68b329da9893e34099c7d8ad5cb9c940
[template-erp5]
filename = instance-erp5.cfg.in
md5sum = 0920a53b10d3811a5f49930adffb62d8
md5sum = 6fdeb7f59d9f06b638cf7c81a4c38560
[template-zeo]
filename = instance-zeo.cfg.in
......@@ -90,8 +90,12 @@ md5sum = 2f3ddd328ac1c375e483ecb2ef5ffb57
[template-balancer]
filename = instance-balancer.cfg.in
md5sum = 4ba93d28d93bd066d5d19f4f74fc13d7
md5sum = 02cf39b7f0dd387f08fe73e1b6cbd011
[template-haproxy-cfg]
filename = haproxy.cfg.in
md5sum = fec6a312e4ef84b02837742992aaf495
md5sum = 8de18a61607bd66341a44b95640d293f
[template-rsyslogd-cfg]
filename = rsyslogd.cfg.in
md5sum = 7030e42b50e03f24e036b7785bd6159f
{# This file configures haproxy to redirect requests from ports to specific urls.
# It provides TLS support for server and optionnaly for client.
#
# All parameters are given through the `parameter_dict` variable, see the
# list entries :
#
# parameter_dict = {
# # Path of the PID file. HAProxy will write its own PID to this file
# # Sending USR2 signal to this pid will cause haproxy to reload
# # its configuration.
# "pidfile": "<file_path>",
#
# # AF_UNIX socket for logs. Syslog must be listening on this socket.
# "log-socket": "<file_path>",
#
# # AF_UNIX socket for statistics and control.
# # Haproxy will listen on this socket.
# "stats-socket": "<file_path>",
#
# # IPv4 to listen on
# # All backends from `backend-dict` will listen on this IP.
# "ipv4": "0.0.0.0",
#
# # IPv6 to listen on
# # All backends from `backend-dict` will listen on this IP.
# "ipv6": "::1",
#
# # Certificate and key in PEM format. All ports will serve TLS using
# # this certificate.
# "cert": "<file_path>",
#
# # CA to verify client certificates in PEM format.
# # If set, client certificates will be verified with these CAs.
# # If not set, client certificates are not verified.
# "ca-cert": "<file_path>",
#
# # An optional CRL in PEM format (the file can contain multiple CRL)
# # This is required if ca-cert is passed.
# "crl": "<file_path>",
#
# # Path to use for HTTP health check on backends from `backend-dict`.
# "server-check-path": "/",
#
# # The mapping of backends, keyed by family name
# "backend-dict": {
# "family-secure": {
# ( 8000, # port int
# 'https', # proto str
# True, # ssl_required bool
# [ # backends
# '10.0.0.10:8001', # netloc str
# 1, # max_connection_count int
# False, # is_web_dav bool
# ],
# ),
# },
# "family-default": {
# ( 8002, # port int
# 'https', # proto str
# False, # ssl_required bool
# [ # backends
# '10.0.0.10:8003', # netloc str
# 1, # max_connection_count int
# False, # is_web_dav bool
# ],
# ),
# },
#
# # The mapping of zope paths.
# # This is a Zope specific feature.
# # `enable_authentication` has same meaning as for `backend-list`.
# "zope-virtualhost-monster-backend-dict": {
# # {(ip, port): ( enable_authentication, {frontend_path: ( internal_url ) }, ) }
# ('[::1]', 8004): (
# True, {
# 'zope-1': 'http://10.0.0.10:8001',
# 'zope-2': 'http://10.0.0.10:8002',
# },
# ),
# },
# }
#
# This sample of `parameter_dict` will make haproxy listening to :
# From to `backend-list`:
# For "family-secure":
# - 0.0.0.0:8000 redirecting internaly to http://10.0.0.10:8001 and
# - [::1]:8000 redirecting internaly to http://10.0.0.10:8001
# only accepting requests from clients providing a verified TLS certificate
# emitted by a CA from `ca-cert` and not revoked in `crl`.
# For "family-default":
# - 0.0.0.0:8002 redirecting internaly to http://10.0.0.10:8003
# - [::1]:8002 redirecting internaly to http://10.0.0.10:8003
# accepting requests from any client.
#
# For both families, X-Forwarded-For header will be stripped unless
# client presents a certificate that can be verified with `ca-cert` and `crl`.
#
# From zope-virtualhost-monster-backend-dict`:
# - [::1]:8004 with some path based rewrite-rules redirecting to:
# * http://10.0.0.10/8001 when path matches /zope-1(.*)
# * http://10.0.0.10/8002 when path matches /zope-2(.*)
# with some VirtualHostMonster rewrite rules so zope writes URLs with
# [::1]:8004 as server name.
# For more details, refer to
# https://docs.zope.org/zope2/zope2book/VirtualHosting.html#using-virtualhostroot-and-virtualhostbase-together
-#}
{% set server_check_path = parameter_dict['server-check-path'] -%}
global
maxconn 4096
stats socket {{ parameter_dict['socket-path'] }} level admin
master-worker
pidfile {{ parameter_dict['pidfile'] }}
# SSL configuration was generated with mozilla SSL Configuration Generator
# generated 2020-10-28, Mozilla Guideline v5.6, HAProxy 2.1, OpenSSL 1.1.1g, modern configuration
# https://ssl-config.mozilla.org/#server=haproxy&version=2.1&config=modern&openssl=1.1.1g&guideline=5.6
ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
ssl-default-bind-options prefer-client-ciphers no-sslv3 no-tlsv10 no-tlsv11 no-tlsv12 no-tls-tickets
ssl-default-server-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
ssl-default-server-options no-sslv3 no-tlsv10 no-tlsv11 no-tlsv12 no-tls-tickets
stats socket {{ parameter_dict['stats-socket'] }} level admin
defaults
mode http
retries 1
option redispatch
maxconn 2000
cookie SERVERID rewrite
balance roundrobin
stats uri /haproxy
stats realm Global\ statistics
# it is useless to have timeout much bigger than the one of apache.
# By default apache use 300s, so we set slightly more in order to
# make sure that apache will first stop the connection.
timeout server 305s
# Stop waiting in queue for a zope to become available.
# If no zope can be reached after one minute, consider the request will
# never succeed.
timeout connect 10s
timeout queue 60s
# The connection should be immediate on LAN,
# so we should not set more than 5 seconds, and it could be already too much
timeout connect 5s
# As requested in haproxy doc, make this "at least equal to timeout server".
timeout server 305s
timeout client 305s
# Use "option httpclose" to not preserve client & server persistent connections
# while handling every incoming request individually, dispatching them one after
# another to servers, in HTTP close mode. This is really needed when haproxy
# is configured with maxconn to 1, without this option browsers are unable
# to render a page
option httpclose
{% for name, (port, backend_list) in sorted(parameter_dict['backend-dict'].iteritems()) -%}
listen {{ name }}
bind {{ parameter_dict['ip'] }}:{{ port }}
option http-server-close
# compress some content types
compression algo gzip
compression type application/font-woff application/font-woff2 application/hal+json application/javascript application/json application/rss+xml application/wasm application/x-font-opentype application/x-font-ttf application/x-javascript application/xml image/svg+xml text/cache-manifest text/css text/html text/javascript text/plain text/xml
log {{ parameter_dict['log-socket'] }} local0 info
{% set bind_ssl_crt = 'ssl crt ' ~ parameter_dict['cert'] ~ ' alpn h2,http/1.1' %}
{% for name, (port, _, certificate_authentication, backend_list) in sorted(parameter_dict['backend-dict'].iteritems()) -%}
listen family_{{ name }}
{%- if parameter_dict.get('ca-cert') -%}
{%- set ssl_auth = ' ca-file ' ~ parameter_dict['ca-cert'] ~ ' verify' ~ ( ' required' if certificate_authentication else ' optional' ) ~ ' crl-file ' ~ parameter_dict['crl'] %}
{%- else %}
{%- set ssl_auth = '' %}
{%- endif %}
bind {{ parameter_dict['ipv4'] }}:{{ port }} {{ bind_ssl_crt }} {{ ssl_auth }}
bind {{ parameter_dict['ipv6'] }}:{{ port }} {{ bind_ssl_crt }} {{ ssl_auth }}
cookie SERVERID rewrite
http-request set-header X-Balancer-Current-Cookie SERVERID
# remove X-Forwarded-For unless client presented a verified certificate
acl client_cert_verified ssl_c_used ssl_c_verify 0
http-request del-header X-Forwarded-For unless client_cert_verified
# set Remote-User if client presented a verified certificate
http-request del-header Remote-User
http-request set-header Remote-User %{+Q}[ssl_c_s_dn(cn)] if client_cert_verified
# logs
capture request header Referer len 512
capture request header User-Agent len 512
log-format "%{+Q}o %{-Q}ci - - [%trg] %r %ST %B %{+Q}[capture.req.hdr(0)] %{+Q}[capture.req.hdr(1)] %Tt"
  • I've run an ApacheDEX report on these logs, and %Tt is probably not what we want. From haproxy's doc:

    Timings events in HTTP mode:
    
                     first request               2nd request
          |<-------------------------------->|<-------------- ...
          t         tr                       t    tr ...
       ---|----|----|----|----|----|----|----|----|--
          : Th   Ti   TR   Tw   Tc   Tr   Td : Ti   ...
          :<---- Tq ---->:                   :
          :<-------------- Tt -------------->:
                    :<--------- Ta --------->:

    Th is connection acceptation, and Ti is time idle on the connection since any previous event. In the context of connection reuse in HTTP/1.1 (and certainly 2.0 too), the latter means that we are including the time since last response was sent.

    As a result, the logged time is not representative of corresponding request's duration, but includes some random amount of "dead air".

    I think either %Tr or %Ta would be closer to what we want: TR is from first to last request bytes, Tw is waiting for a server slot, Tr is a bit unclear as to when it starts (the graph above does not match the text explanation) but it ends at the last response header byte, and Td is even less clear in the context of connection reuse but it should correspond to the time needed for the client to acknowledge all data (and acknowledge connection closure if applicable). The difference between both being apparently whether we want to include client's acknowledgment of sent data in the measure and I believe there is no consensus on this topic.

  • /cc @luke as I know you worked/are working on apache-style logs for haproxy

  • Ah thanks, that's right, %Tt is not what we want. Isn't %Ta the closest to "time that client had to wait"? In my understanding this is what we are interested in measuring.

  • I have %Ta as an experiment at the moment, and at least it seems to match very closely the reports I can generate on frontend logs. So it looks like a better choice, yes. In my understanding this is also what apache2 was measuring (but I still have unexpected differences, not sure if these are real time differences or an artefact of how the times are measured exactly).

    I am not yet excluding %Tr: I think relying on it could produce cleaner reports, and all it removes should be the effect of bad bandwidth/high-latency in the connection between the users and the server... I'm not yet sure what I would prefer to measure exactly, as currently the max-times are just garbage (ex: always at 300s because of some outliers, probably lost connection during response, ...).

  • @vpelletier and I discussed this a bit, we made be8c2a39 to change ERP5 to use %Ta
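
    To illustrate the difference discussed in this thread, here is a rough sketch that follows the timing diagram quoted above: `%Tt` is taken as the sum of all timers including the idle time `Ti`, and `%Ta` as the sum from `TR` to `Td`. The numbers are invented, and the `-1` values haproxy reports for aborted phases are not handled.

```python
from collections import namedtuple

# timers in milliseconds, named as in the haproxy diagram
Timers = namedtuple("Timers", "Th Ti TR Tw Tc Tr Td")


def total_session_time(t):
    """What %Tt covers according to the diagram: everything, idle time included."""
    return t.Th + t.Ti + t.TR + t.Tw + t.Tc + t.Tr + t.Td


def active_time(t):
    """What %Ta covers according to the diagram: from the first request byte
    to the end of the response."""
    return t.TR + t.Tw + t.Tc + t.Tr + t.Td


# hypothetical second request on a kept-alive connection that stayed
# idle for 45 seconds before the client sent the request
second_request = Timers(Th=0, Ti=45000, TR=2, Tw=0, Tc=1, Tr=350, Td=20)
print(total_session_time(second_request))  # 45373
print(active_time(second_request))  # 373
```

    With these made-up values a request that took well under a second to serve would be logged as taking 45 seconds with `%Tt`, which is the "dead air" effect described above.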

  • I forgot to mention it in the commit message, but there's a promise checking apachedex score is above a threshold and I was seeing this promise fail most of the time:

    2021-11-05 17:28:39 slapos[12654] INFO Error with promises for the following partitions:
    2021-11-05 17:28:39 slapos[12654] INFO   slappart7[balancer]: b'Promise \'check-apachedex-result.py\' failed with output: ERROR \'"/srv/slapgrid/slappart3/srv/runner/software/cc0326f0dcb093f56c01291c300c8481/bin/check-apachedex-result" --apachedex_path "/srv/slapgrid/slappart3/srv/runner/instance/slappart7/srv/monitor/private/apachedex" --status_file /srv/slapgrid/slappart3/srv/runner/instance/slappart7/srv/monitor/private/apachedex.report.json --threshold "70"\' run with failure, output: \'Score too low: 0% - Threshold: 70.0%\\n\''

    this was also probably a consequence of not reporting the request times properly.

    Also, this is with slapos.core on python3 and having the promise error message as bytes seems like a bug to me, we could try to decode it, maybe with repr for errors (cc @xavier_thompson )

{% set has_webdav = [] -%}
{% for address, connection_count, webdav in backend_list -%}
{% if webdav %}{% do has_webdav.append(None) %}{% endif -%}
{% set server_name = name ~ '-' ~ loop.index0 -%}
{% set server_name = name ~ '-' ~ loop.index0 %}
server {{ server_name }} {{ address }} cookie {{ server_name }} check inter 3s rise 1 fall 2 maxqueue 5 maxconn {{ connection_count }}
{% endfor -%}
{%- endfor -%}
{%- if not has_webdav and server_check_path %}
option httpchk GET {{ server_check_path }}
{% endif -%}
{%- endif %}
{% endfor %}
{% for (ip, port), (_, backend_dict) in sorted(parameter_dict['zope-virtualhost-monster-backend-dict'].iteritems()) -%}
{% set group_name = 'testrunner_' ~ loop.index0 %}
frontend frontend_{{ group_name }}
bind {{ ip }}:{{ port }} {{ bind_ssl_crt }}
timeout client 8h
# logs
capture request header Referer len 512
capture request header User-Agent len 512
log-format "%{+Q}o %{-Q}ci - - [%trg] %r %ST %B %{+Q}[capture.req.hdr(0)] %{+Q}[capture.req.hdr(1)] %Tt"
{% for name in sorted(backend_dict.keys()) %}
use_backend backend_{{ group_name }}_{{ name }} if { path -m beg /{{ name }} }
{%- endfor %}
{% for name, url in sorted(backend_dict.items()) %}
backend backend_{{ group_name }}_{{ name }}
http-request replace-path ^/{{ name }}(.*) /VirtualHostBase/https/{{ ip }}:{{ port }}/VirtualHostRoot/_vh_{{ name }}\1
timeout server 8h
server {{ name }} {{ urlparse.urlparse(url).netloc }}
{%- endfor %}
{% endfor %}
This diff is collapsed.
......@@ -338,7 +338,7 @@ config-backend-path-dict = {{ dumps(zope_backend_path_dict) }}
config-ssl-authentication-dict = {{ dumps(ssl_authentication_dict) }}
config-apachedex-promise-threshold = {{ dumps(monitor_dict.get('apachedex-promise-threshold', 70)) }}
config-apachedex-configuration = {{ dumps(monitor_dict.get('apachedex-configuration',
'--erp5-base +erp5 .*/VirtualHostRoot/erp5(/|\\?|$) --base +other / --skip-user-agent Zabbix --error-detail --js-embed --quiet')) }}
'--erp5-base +erp5 .*/VirtualHostRoot/erp5(/|\\?|$) --base +other / --skip-user-agent Zabbix --error-detail --js-embed --quiet --logformat=\'%h %l %u %t "%r" %>s %O "%{Referer}i" "%{User-Agent}i" %{ms}T\'')) }}
[request-frontend-base]
{% if has_frontend -%}
......
......@@ -56,13 +56,16 @@ openssl-location = {{ openssl_location }}
[dynamic-template-balancer-parameters]
<= default-dynamic-template-parameters
apache = {{ apache_location }}
openssl = {{ openssl_location }}
haproxy = {{ haproxy_location }}
rsyslogd = {{ rsyslogd_location }}
apachedex-location = {{ bin_directory }}/apachedex
run-apachedex-location = {{ bin_directory }}/runApacheDex
promise-check-apachedex-result = {{ bin_directory }}/check-apachedex-result
template-haproxy-cfg = {{ template_haproxy_cfg }}
template-rsyslogd-cfg = {{ template_rsyslogd_cfg }}
# XXX: only used in software/slapos-master:
apache = {{ apache_location }}
template-apache-conf = {{ template_apache_conf }}
[dynamic-template-balancer]
......
module(
load="imuxsock"
SysSock.Name="{{ parameter_dict['log-socket'] }}")
# Just simply output the raw line without any additional information, as
# haproxy emits enough information by itself
# Also cut out first empty space in msg, which is related to rsyslogd
# internal and end up cutting on 8k, as it's default of $MaxMessageSize
template(name="rawoutput" type="string" string="%msg:2:8192%\n")
$ActionFileDefaultTemplate rawoutput
$FileCreateMode 0600
$DirCreateMode 0700
$Umask 0022
$WorkDirectory {{ parameter_dict['spool-directory'] }}
local0.=info {{ parameter_dict['access-log-file'] }}
local0.warning {{ parameter_dict['error-log-file'] }}