Added a test for getCreationDate / getModificationDate
git-svn-id: https://svn.erp5.org/repos/public/erp5/trunk@4584 20353a03-c40f-0410-a6d1-a30d3c3de9de
-
Owner
@jerome, FYI this test fails sometimes on ERP5.UnitTest-Master -> erp5_core_test:testERP5Catalog, e.g.:
https://nexedijs.erp5.net/#/test_result_module/20210624-BB07CC5C/65
I've noticed this because I saw a testERP5Catalog failure on ERP5.UnitTest-Master.ZODB4-wc2 and wondered whether it was due to my ZODB changes, or whether this core test was indeed flaky before.
/cc @romain
-
Developer
No, I checked maybe a dozen of the test results, and this test just passed in all of them.
-
Owner
@romain, what could help here is if sorting by the "Result" column in the "Result history" query actually worked. Then it would be easy to check whether there were failures or not by sorting on the ("Result", "Launch date") pair. For now, because that sorting does not work on either nexedijs.erp5.net or nexedi.erp5.net, I have to dig through the result history manually and linearly to find those failures.
If Boxiang is doing some QA work on the tests, I believe that would help him too.
-
Owner
Sadly, Test Result Line documents are not indexed in the catalog, so the listbox query script does not directly search for this kind of document.
For now I don't see how to improve this without making Test Result Line indexable. But as it was explicitly defined as not indexable, I bet there is a good reason, and working on this would probably take too much time.
-
Owner
@romain, thanks for feedback.
@jerome, since it was you who disabled indexing for TestResultLine in 02729fdc and 30066147, would you please share why it was done? The current status is that both the old and the new UI allow clicking to sort by "Result", but the sorting is silently not performed at all. Sorting by "Result" is needed to quickly find test failures - please see f5ffa996 (comment 141467) for context.
-
Owner
I did not know this test was flaky. The problematic line is:
self.assertEqual(creation_date, result[0]['modification_date'].ISO())
we are comparing the creation date with the modification date. This is a bit questionable. If I remember correctly, the rationale is that when we don't modify a document, the creation date and modification date are supposed to be the same.
getCreationDate uses the time from the first entry of edit_workflow (code); getModificationDate uses the most recent date of any entry in any workflow (code). I tried a bit to create organisations and compare creation date and modification date. I tried on a site with the old workflow implementation and the dates seemed to be the same, and on another site with the new workflow implementation (from !1378 (merged)) the dates are slightly different. I could observe this:
(Pdb) pp [(wf_id, [entry['time'] for entry in wf_entries]) for wf_id, wf_entries in self.workflow_history.items()]
[('validation_workflow', [DateTime('2021/09/22 00:54:27.249143 UTC')]),
 ('edit_workflow', [DateTime('2021/09/22 00:54:27.247178 UTC')])]
(Pdb) self.getCreationDate()
DateTime('2021/09/22 00:54:27.247178 UTC')
(Pdb) self.getModificationDate()
DateTime('2021/09/22 00:54:27.249143 UTC')
It's almost the same, but it can put the creation date and the modification date in different seconds, so yes, that test is flaky.
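To make that mechanism concrete, here is a simplified model of the behaviour described above (based only on this description, not on the actual ERP5 implementation):

# workflow_history maps workflow id -> list of entries,
# each entry being a dict with (at least) a 'time' DateTime
def getCreationDate(workflow_history):
    # time of the first edit_workflow entry
    return workflow_history['edit_workflow'][0]['time']

def getModificationDate(workflow_history):
    # most recent time over all entries of all workflows
    return max(entry['time']
               for entries in workflow_history.values()
               for entry in entries)

With the two workflow entries written a couple of milliseconds apart, these two values can land on different sides of a second boundary, which is exactly when the .ISO() comparison fails.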
I'm not really sure what we should do.
It may not be a side effect of the new workflow, but I suspect it is, because performance tests showed a small performance regression when we merged it.
For that test, maybe we don't need to check that it's exactly the same date, but use something like "assertAlmostEqual"... Or maybe we should first try to improve workflow performance a bit. There are a few things in
_executeTransition
or updateRoleMappingsFor
that maybe we could make faster. The workflow definition is made of several ZODB documents and the code loads several of them - the state definition to see what the permissions are, etc. The workflow definition never changes, so maybe we can benefit from our dynamic classes to generate methods that would be a kind of "compiled" version - i.e. instead of loading the workflow, transition and state objects each time, we generate a method which does not need to load objects. This would be a bit similar to what we did in 58d4ab8e: instead of having a method looking up getTypeAcquireLocalRole
on the portal type definition, we generate a class with a method that is a simple return True
or return False
and does not need to load the portal type. All this is maybe not really clear; it's something I wanted to give a try, but I don't think I can find the time soon. Anyway, this seems like a big change.
About the non-indexable test result lines, I'm sorry about the lack of details in the commit messages. This was done as an optimisation. On Nexedi ERP5 we now have about 6 * 10⁵ Test Results, which account for 18% of the size of the mariadb catalog. It depends on the tests, but on average I guess we have something like 300 test result lines per test, so if they were indexed we would have about 2 * 10⁸ test result lines, which would be something like 98% of the catalog. We did not have any use case depending on test result lines being indexed (except the test result history, but it was possible to rewrite it in a way that does not require it), so it was rather easy to remove these documents from the catalog. It is a long time ago now and I don't remember how much this improved performance, but I think it was enough to make the data fit in RAM for mariadb.
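As a rough sanity check of these orders of magnitude (assuming catalog rows of roughly comparable size, and reusing the ~300 lines/test estimate above):

test_results = 6e5                        # ~18% of the current catalog
total_rows_now = test_results / 0.18      # ~3.3e6 rows in the catalog today
lines = test_results * 300                # ~1.8e8, i.e. about 2e8 test result lines
print(lines / (lines + total_rows_now))   # ~0.98, i.e. ~98% of the catalog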
Now if we want a report on flaky tests, another approach could also be to write a custom script that uses a catalog search to select the tests and ZODB to iterate over the test result lines (the searchAndActivate API makes it easy to write this in a way that the computation is distributed on the cluster; a rough sketch follows below). It may not be so easy to distinguish real failures from flaky tests, though. I feel the situation now is not so bad: if we look at 4 or 5 pages of test results and we don't see failures, it's already a sign that the test is not so flaky.
That said, I'm not against the idea of making test result lines indexable again, but this would increase the data size a lot, so if we do it, we should probably check the impact on performance on a clone instance.
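For illustration, such a script pair could look roughly like this (an untested sketch: the script names, the title filter and the way a failure is detected on a Test Result Line are assumptions, not existing code):

# Script: Base_reportFailedTestResultLines (hypothetical name)
# Select the Test Results via the catalog and fan out one activity per result,
# so the work is distributed on the cluster by searchAndActivate.
portal = context.getPortalObject()
portal.portal_catalog.searchAndActivate(
    portal_type='Test Result',
    title='ERP5.UnitTest-Master',             # hypothetical filter
    method_id='TestResult_reportFailedLines',
    activate_kw={'priority': 5},
)

# Script: TestResult_reportFailedLines (hypothetical name)
# Called on each Test Result; walks its (non-indexed) Test Result Line
# sub-documents through ZODB only.
failed_titles = [line.getTitle()
                 for line in context.contentValues(portal_type='Test Result Line')
                 if line.getSimulationState() == 'failed']  # state id is an assumption
# aggregate failed_titles somewhere, e.g. into a report document
return failed_titles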
-
Owner
@jerome, thanks for feedback.
About dates/time - I suggest not to use .ISO() at all and to just compare DateTime objects directly. This should work just fine, and it will compare the datetime objects precisely instead of rounding them to seconds:

In [1]: from time import time
In [2]: from DateTime import DateTime
In [4]: t1 = time(); t2 = time()
In [5]: t1
Out[5]: 1632297639.166148
In [6]: t2
Out[6]: 1632297639.166266
In [7]: t2 - t1
Out[7]: 0.00011801719665527344
In [8]: T1 = DateTime(t1, 'UTC'); T2 = DateTime(t2, 'UTC')
In [9]: T1
Out[9]: DateTime('2021/09/22 08:00:39.166148 UTC')
In [10]: T2
Out[10]: DateTime('2021/09/22 08:00:39.166266 UTC')
In [11]: T2 - T1
Out[11]: 1.3657407407407407e-09
In [12]: # ^^^ T2 - T1 is in days
In [13]: T1 == T1
Out[13]: True
In [14]: T2 == T1
Out[14]: False
This way the test code will be clearer, and the test itself will be more robust in catching differences.
About TestResultLine search: I now see and agree that it might not be such a good idea to enable cataloguing for them. Instead, as you say, if we can implement the needed queries directly via ZODB without the catalog - even if slow - that would probably be enough.
I feel the situation now is not so bad: if we look at 4 or 5 pages of test results and we don't see failures, it's already a sign that the test is not so flaky.
Unfortunately it was not the case for me. For example, for testCMFActivity (see !1334 (comment 141436)) I had to dig through ~50 pages to find out whether there was any failure, with loading each next page taking ~30 seconds on my side... I had to do this because with ZODB4-wc2 a test failed recently and I had to figure out whether the failure was related to the ZODB4-wc2 changes, or whether it was unrelated and was there before.
P.S. Having such "find failures" queries working could also allow us to automatically find all flaky/failing tests instead of manually inspecting each of them.
-
Owner
I tried a bit to create organisations and compare creation date and modification date; I tried on a site with the old workflow implementation and the dates seemed to be the same,
I investigated a bit more, because it was surprising that in my observations the dates were exactly the same on the "old site". I thought maybe this was evaluated only once or something similar, but it turns out that this old site was also using a custom validation workflow named something like "aaaa_validation_workflow", which sorts alphabetically before edit_workflow. So in this observation, on the old site we have aaaa_validation_workflow then edit_workflow, the modification date is taken from edit_workflow and is the same as the creation date. On the new site we had edit_workflow then validation_workflow, so the modification date is taken from the validation workflow and is slightly different from the creation date.
So I made the same experiment with another old site using the normal workflows (edit workflow then validation workflow), and the times are also slightly different:
2021/09/24 09:37:2.731914 UTC getCreationDate
2021/09/24 09:37:2.733560 UTC getModificationDate
So it's not really something related to the new workflow implementation; this test is flaky independently of the new workflows.
@kirr thanks for the feedback about comparing dates directly. I think we're not simply comparing DateTime objects because accessors such as getCreationDate and getModificationDate get the date from ZODB, while the attributes on catalog brains, .creation_date and .modification_date, are different date instances re-created from the data in mariadb; the date columns there are less precise, so we are losing some precision. Then I think .ISO() was a way to lose the precision down to the second (which is a recipe for a flaky test). I did not follow this closely, but since then mariadb and the Zope connector used to interface with it support DateTime with better precision; we are not using it here, maybe because it would use too much disk space, or because of compatibility issues with existing sites - or just because it's generally not a problem we are facing in production.
Anyway, to fix the flakiness, we can probably compare DateTime instances and check that the difference is less than 1 second or so (a sketch follows below), but comparing .ISO() was a bad idea.
BTW, about:
In [11]: T2 - T1
Out[11]: 1.3657407407407407e-09
In [12]: # ^^^ T2 - T1 is in days
one generally convenient thing is the .asdatetime() method, which returns a standard Python datetime, so we can do:
T2.asdatetime() - T1.asdatetime()
and we get a timedelta.
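Building on that, the tolerance check mentioned above could look like this (a minimal sketch: assertDateTimeAlmostEqual is not an existing helper, and the usage at the bottom is illustrative, comparing the two accessors directly):

import datetime

def assertDateTimeAlmostEqual(self, a, b, delta=datetime.timedelta(seconds=1)):
    # meant to live on the TestCase (or a mixin); a and b are Zope DateTime
    # instances, asdatetime() converts them to standard datetimes, whose
    # difference is a timedelta that we can bound by the tolerance
    self.assertLessEqual(abs(a.asdatetime() - b.asdatetime()), delta)

# in the test, instead of comparing .ISO() strings:
#   self.assertDateTimeAlmostEqual(document.getCreationDate(),
#                                  document.getModificationDate())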
I agree that something to find failures could be generally useful. The "Test Result History" is slow because it loads lots of documents from ZODB; maybe it also gets slower and slower when going back through the older pages.