software/erp5: Expose mariadb max-connections.
Auto-compute a value suitable for the number of requested Zope processes and threads for default ERP5 needs.
-
Hello @vpelletier,
This looks like related to recent ERP5-MASTER generic test regression with error such as "OperationalError: (1040, 'Too many connections')" . Only one other commit was part of regression : 896cd97c . But this commit is probably not related.
I see here that it looks like you kept the default 1000, but since tests are failing, there must be an issue somewhere.
/cc @nexedi (to have information on the regression)
-
Owner
There's no such default value 1000 commit. I don't understand the purpose of
.get(..., 1000)
in:max-connection-count = {{ dumps(slapparameter_dict.get('max-connection-count', 1000)) }}
(rather than simply
[....]
, since the value is always there)- Do we want to support direct instanciation of mariadb software type ?
- Is it a problem if the instanciation of the mariadb partition fails temporarily because the value is not propagated yet from the root ?
-
Owner
I see here that it looks like you kept the default 1000, but since tests are failing, there must be an issue somewhere.
The default does not really matter in a "normal" instance tree, as the value is computed from requested zope partitions. How many zopes are requested in test setups ? With how many threads ?
A possible quick fix would be a
min(computed_value, 1000)
, but I would like to understand how our tests initiate more than 50 connections to mariadb (assuming 0 zopes are requested): even if we count 1 test thread, 1 zope thread (for UI tests), and all connections being used, we should not exceed 8 connections.- Do we want to support direct instanciation of mariadb software type ?
I would like to, yes. This is the intent behind splitting the input schemas: it allows declaring the software types in the base schema and link to their respective input and output schemas. I did not push further in this direction yet, so only the default software type is referenced.
-
Owner
A possible quick fix would be a
min(computed_value, 1000)
For reference, here are the fixes I thought of, by rough decreasing preference order:
- fix any easy connection leak in unittests
- provide a custom connection count parameter to test nodes if there is an identified but hard connection leak in unittests, so that we get time to eventually fix them
- increase default connection count offset if there are solid reasons to why we need so many connections, and hopefully the passing value is low enough (<=100)
- put back such non-linearity in the formula (
min(computed_value, 1000)
)
-
Owner
I could reproduce the issue with
testERP5Security
: this file has 15 test classes, each one gets its ownERP5Site
instance, each one with its own 4 SQL connectors. So as test progresses, more and more connections are established, and never deleted nor disconnected. In addition to these, for some reason each connection is established twice: once when creating the instance, and once more while running the test. So while running the 9th class, it exceeded 70 connections and test fails.Here are the class counts for the tests which regressed because of this change, which seems to confirm this is the issue (taken from latest result):
./product/ERP5/tests/testAccounting.py:11 ./product/ERP5Security/tests/testERP5Security.py:15 ./product/ERP5Form/tests/testFields.py:12 ./product/ERP5/tests/testTradeCondition.py:16
Some other tests have over 8 classes, so maybe there is something else allowing these to pass (no all classes get run in those tests ? or not in the same process ?).
I'm looking at how this can be fixed in unit test code, by closing connections in
tearDown
. -
mentioned in merge request erp5!911 (closed)
-
Owner
Issue should be fixed by latest push in ERP5 repository.
-
Owner
So the first "fix any easy connection leak in unittests" was done in erp5@5abb074d but we still observe sometimes some "Too many connections" problems on test nodes sometimes, often testBusinessTemplate is failing, but also sometimes some functional tests like this testPortalContributionsToolNewFile.
The configuration of
ERP5-MASTER
test suite is :{ "mariadb": { "relaxed-writes": true, "mariadb-relaxed-writes": true, "test-database-amount": 30 }, "cloudooo-url": "https://.....net" }
which gives use 70 max connections: 1 partition * 4 zserver_threads * 5 connections + 50 more.
We are running 3
runUnitTest
processes, some of them are using more connections (withgetExtraSqlConnectionStringList
), some or them are spawning other zope nodes (using--activity_node=
like conflict resolution test is doing). On the test instance, that zope is started, even though it is not really accessed except by haproxy and some monitoring promises, it probably open a few connections. Maybe also some mysql monitoring and other scripts are consuming connections.Even in the worst case, I'm not sure how this adds up to 70.
There's this
test-database-amount
parameter that we are using to allocate some extra databases ( forgetExtraSqlConnectionStringList
), we should probably add this in the computation of default value of max connections, because currently the connections used by tests have to fit in the extra 50 connections. We could addtest-database-amount
in connections in the computation of default value (probably multiplied by a factor of 5 for tests using--activity_node=
+ some safety), formula could become:X zopes partitions * Y zserver threads * 5 connections + test-database-amount * 5 connections + 50 extras
... or just use simple approach of
min(max-connection-count, 1000)
that was suggested earlier.As a temporary measure to verify this theory and so that we have less test failures, I reconfigured
ERP5-MASTER
test suite to setmax-connection-count
. -
Owner
We could add
test-database-amount
in connections in the computation of default valueI guess 30 can be decomposed as 3 * 10, so that one
runUnitTest
may use up to 10 databases. Is this correct ?In this case, it indeed makes perfect sense to involve this number in the maximum connection count, and would explain why I could not see this error when running this test on my own: I ran a single
runUnitTest
at a time.BTW, I'm surprised we would need 10 databases per test. I believe most tests use 1, hot-reindex test at least 2... But I doubt we use many more than that. But that's another topic...
-
Owner
It's a bit more complex. For reference, my understanding is that it works like this:
There are two level of parallelism in test nodes, first we have multiple test nodes and second is that each test node also runs more that one test in parallel (using "node-quantity" parameter), in practice our test nodes have 3.
ERP5 software release generates a
runTestSuite
wrapper where all connection strings are passed torunTestSuite --db_list
.Test nodes executes
runTestSuite
by passing it's configured "node-quantity".runTestSuite
is implemented here for ERP5. It will spawns 3 ("node-quantity")runUnitTest
processes, by distributing connection strings torunUnitTest
processes, each one receiving the number ofmysql_db_count
defined on the test suite class configured on Nexedi ERP5.For ERP5-MASTER suite, the test suite class is "ERP5", which is defined here, so we have 3 connection strings for each of of the 3 nodes (from "node-quantity").
When testnode executes the
runTestSuite
, it's like this:runTestSuite --db_list=cnx1,cnx2,cnx3,cnx4,..,cnx100 --node_quantity 3 --test_suite ERP5 ...
and
runTestSuite
will execute 3runUnitTest
processes: (BTW, thisrunUnitTest
is not the wrapper generated by ERP5 software release - therunUnitTest
wrapper is to run test "manually")runUnitTest --erp5_sql_connection_string=cnx1 --extra_sql_connection_string_list=cnx2,cnx3 .. testBusinessTemplate runUnitTest --erp5_sql_connection_string=cnx4 --extra_sql_connection_string_list=cnx5,cnx6 .. testERP5Core runUnitTest --erp5_sql_connection_string=cnx7 --extra_sql_connection_string_list=cnx8,cnx9 .. testSomethingElse
You are right that we don't need 10 database per tests and in the case of ERP5-MASTER tests, they will not be used.
One more thing is that requesting with
test-database-amount
like we do is deprecated ( since e4d1ea03 ... we can now update test suites configurations, at least for ERP5-MASTER ), ERP5 software release have "better" request parameters. There relevant part of instance schema is this:"test-runner": { "description": "Test runner parameters.", "properties": { "enabled": { "description": "Generate helper scripts to run test suite.", "default": true, "type": "boolean" }, "node-count": { "description": "Number of tests this instance can execute in parrallel. This must be at least equal to the number of nodes configured on testnode running the test", "default": 3, "type": "integer" }, "extra-database-count": { "description": "Number of extra databases this instance tests will need.", "default": 3, "type": "integer" } }, "type": "object" },
so in test suite definition we could set
test-runner.node-count
andtest-runner.extra-database-count
instead ofmysql.test-database-amount
. We could even not set it and just leave the default, as it provides more databases that what default ERP5 suite (because ERP5 tests uses 3 databases in total and default parameter is 3 extra database, so 4 in total).That's a long explanation, but in the end in erp5 instance we calculate
test_runner_total_database_count
and what I suggest adding it to the default calculation, maybe something like jerome/slapos@d12d1259 -
mentioned in merge request erp5!995 (merged)