Commit 924f20fa authored by Sebastien Robin's avatar Sebastien Robin

simulation: solve random solving issues in a cluster environment

Before, when solving a solver, parsing sub objects of solver_process was done directly synchronously
in solver_workflow. And we had the case where you have parallel transactions solving a solver,
each of theses transactions see remaining solver not in solved state, but once all transaction are
finished, all solver are solved. This could lead to the case where solver_process is never moved
to succeeded. Instead of using serialize (which may lead to conflicts), just use activities in queue
with a serialization tag.
parent 1b087745
solver_process = state_change['object'].getParentValue() solver_process = state_change['object'].getParentValue()
for solver in solver_process.objectValues( # Before parsing sub objects was done directly here. And we had the
portal_type=solver_process.getPortalObject().getPortalTargetSolverTypeList()): # case where you have parallel transactions solving a solver,
if solver.getValidationState() != 'solved': # each of theses transactions see remaining solver not in solved state,
return # but once all transaction are finished, all solver are solved. This
solver_process.succeed() # could lead to the case where solver_process is never moved to succeeded
# Instead of using serialize (which may lead to conflicts), just use
# activities in queue.
solver_process.activate(serialization_tag=solver_process.getRelativeUrl()
).SolverProcess_tryToSucceed()
solver_process = context
for solver in solver_process.objectValues(
portal_type=solver_process.getPortalObject().getPortalTargetSolverTypeList()):
if solver.getValidationState() != 'solved':
break
else:
portal = context.getPortalObject()
if portal.portal_workflow.isTransitionPossible(solver_process, 'succeed'):
solver_process.succeed()
<?xml version="1.0"?>
<ZopeData>
<record id="1" aka="AAAAAAAAAAE=">
<pickle>
<global name="PythonScript" module="Products.PythonScripts.PythonScript"/>
</pickle>
<pickle>
<dictionary>
<item>
<key> <string>Script_magic</string> </key>
<value> <int>3</int> </value>
</item>
<item>
<key> <string>_bind_names</string> </key>
<value>
<object>
<klass>
<global name="NameAssignments" module="Shared.DC.Scripts.Bindings"/>
</klass>
<tuple/>
<state>
<dictionary>
<item>
<key> <string>_asgns</string> </key>
<value>
<dictionary>
<item>
<key> <string>name_container</string> </key>
<value> <string>container</string> </value>
</item>
<item>
<key> <string>name_context</string> </key>
<value> <string>context</string> </value>
</item>
<item>
<key> <string>name_m_self</string> </key>
<value> <string>script</string> </value>
</item>
<item>
<key> <string>name_subpath</string> </key>
<value> <string>traverse_subpath</string> </value>
</item>
</dictionary>
</value>
</item>
</dictionary>
</state>
</object>
</value>
</item>
<item>
<key> <string>_params</string> </key>
<value> <string></string> </value>
</item>
<item>
<key> <string>_proxy_roles</string> </key>
<value>
<tuple>
<string>Manager</string>
</tuple>
</value>
</item>
<item>
<key> <string>id</string> </key>
<value> <string>SolverProcess_tryToSucceed</string> </value>
</item>
</dictionary>
</pickle>
</record>
</ZopeData>
  • serialization_tag looks useless, and even bad for performance.

    /cc @vpelletier

  • serialization_tag looks useless, and even bad for performance.

    serialization_tag costs time on validation node, yes. It even costs more than any other dependency when there is no grouping_method as will cost for each activity on each validation attempt for just producing a single executable activity. If you somehow get near or above a thousand of such activities pending validation, it is likely to slow down validation to more than one second per activity (as I saw recently on a customer project, with another activity method but same kinf of dependency pattern).

    It is a bad idea to use a bare path as serialization_tag, because indexation activities already use a bare path as serialisation_tag, so suddenly this code becomes dependent & depended on by indexation activities, which can cause unnecessary delay in indexation activities.

    I think the correct thing here is, in a single transaction:

    • set a tag on the indexation activity triggered for the solver when it gets to solved state - ex: solver_process_path + "_solve"
    • spawn a SolverProcess_tryToSucceed activity, with after_tag=solver_process_path + "_solve"

    This way, at least one such activity will be executed after all contained solvers are solved, without propagating the dependency to other activities nor risking avalance effect from serialization_tag.

    Side note:

    In general, I dislike isTransitionPossible. If an admin comes around and needs to undo user mistake by doing an unexpected state change, then tested transition may temporarily become possible from an unexpected state. If at that time such code runs and discovers the transition is possible, it will stab the admin in the back. I vastly prefer checking current state, even if it means somewhat duplicating workflow logic (but the code is anyway tightly intricated with workflow configuration, so I do not think this is a significant difference). Symetrically, it means that if the workflow is somehow damaged and the transition is not possible anymore, instead of getting failed activities signaling that something has gone wrong, suddently such activities will "succeed" without doing everything they are exepecting to do, which will also cause huge pain to recover from (much more than receiving some failure mails, fixing the issue and reinvoking activities).

    The only place where it would be legitimate would be type-agnostic code doing opportunistic workflow transitions, as it cannot possibly know which state(s) a transition is possible from, and it is better than doing a try..except which would risk intercepting and misinterpreting an exception raised from a deeper workflow change (ex: code starts object A, which triggers a validate on object B, but object B cannot be validated, code gets the same exception it would have had if it was object A that couldn't be started, and may take the wrong decision as a result).

    Also, here it would be better to first check whether the transition is expected to be possible, so that time is not spent loading and checking contained objects if it turns out nothing can be done anyway.

  • I think the correct thing here is, in a single transaction:

    • set a tag on the indexation activity triggered for the solver when it gets to solved state - ex: solver_process_path + "_solve"
    • spawn a SolverProcess_tryToSucceed activity, with after_tag=solver_process_path + "_solve"

    This way, at least one such activity will be executed after all contained solvers are solved, without propagating the dependency to other activities nor risking avalance effect from serialization_tag.

    What you suggest brings best performance, but it's also so complex that it's hard to get right and maintain. For me, it's not worth it. With a bare activate(), you just have a few more executions of SolverProcess_tryToSucceed while solvers are being solved. This looks negligible. And the last commit solving a solver spawns the SolverProcess_tryToSucceed activity that does the work (no reason this activity does nothing).

    About isTransitionPossible, I also thought about suggesting the simpler (and stricter):

    if solver_process.getValidationState() != 'solved':
      solver_process.succeed()
  • What you suggest brings best performance, but it's also so complex that it's hard to get right and maintain. For me, it's not worth it.

    From a performance point of view, it of course depends on how many solvers there are in a solver process - which I don't know.

    My concern was rather about any surrounding code which could be broken by finding the solver process in succeeded state in catalog while some of it children solvers are not in solved state. And this would itself probably not be enough, as what these solvers themselves do may have to be indexed too before whatever depends on the solver process becomming succeeded starts being executed. This, to me, seems like it could be more misleading for the code and make it harder to maintain, while I do not follow how the pattern I propose would be so complex (but I'm not familiar with surrounding code and execution conditions).

    About isTransitionPossible, I also thought about suggesting the simpler (and stricter):

    Yes, with of course the whole check-all-solver-children-states between those two lines.

Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment