1. 26 Dec, 2023 2 commits
    • core: fix error when downloading when mimetypes_registry only defines globs · 5ea313bb
      Jérome Perrin authored
      DownloadableMixin uses mimetypes_registry to guess an extension from
      the mimetype, but this failed when the registry entry only defines
      globs. An error like this was raised:
      
            Module erp5.component.mixin.erp5_version.DownloadableMixin, line 143, in index_html
              output_format = mimetype_object.globs.strip('*.')
          AttributeError: 'list' object has no attribute 'strip'
      
      Also blindly apply the same fix in OOoTemplate, as it used the same
      problematic pattern.
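The failure mode and the fix can be pictured with a minimal sketch. `FakeMimetype` and `guess_extension` are hypothetical names used for illustration only, not the actual DownloadableMixin code; the only detail taken from the traceback is that `globs` is a list of patterns, so `.strip()` has to be applied to one of its items rather than to the list.

```python
class FakeMimetype:
    """Hypothetical stand-in for a mimetypes_registry entry."""

    def __init__(self, extensions=(), globs=()):
        self.extensions = list(extensions)
        self.globs = list(globs)


def guess_extension(mimetype_object):
    """Return an extension for the entry, or None if none can be guessed."""
    if mimetype_object.extensions:
        return mimetype_object.extensions[0]
    if mimetype_object.globs:
        # globs is a list of patterns such as "*.csv": strip the first
        # pattern, not the list itself (the list has no strip method).
        return mimetype_object.globs[0].strip('*.')
    return None
```

With this sketch, an entry defining only `globs=["*.csv"]` yields `csv` instead of raising AttributeError.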
    • core: drop text/comma-separated-values mime type · 7e35d0e7
      Jérome Perrin authored
      We used to have two entries for csv:
      
        - text/comma-separated-values
           - mimes: ["text/comma-separated-values"]
           - extensions: ["csv"]
           - globs: []
        - CSV document
           - mimes: ["text/csv", "text/x-comma-separated-values", "text/x-csv"]
           - extensions: []
           - globs: ["*.csv"]
      
      but text/comma-separated-values does not really exist; RFC 4180 recommends
      text/csv.
      
      The problem with this configuration is that when ERP5 picks a mime type
      for the csv extension, it uses text/comma-separated-values, as this is
      the only entry with extensions set.
      
      Change the configuration to delete "text/comma-separated-values" and keep
      everything in "CSV document":
      
        - CSV document
           - mimes: ["text/csv", "text/x-comma-separated-values", "text/x-csv", "text/comma-separated-values"]
           - extensions: ["csv"]
           - globs: ["*.csv"]
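The effect of the two configurations can be sketched as follows. The registry is modelled as a plain list of dicts and `lookup_by_extension` is a hypothetical helper; it assumes, as the message describes, that the lookup only considers entries whose `extensions` list contains the extension, and that the first mime type of the entry is the one returned.

```python
REGISTRY_BEFORE = [
    {"name": "text/comma-separated-values",
     "mimes": ["text/comma-separated-values"],
     "extensions": ["csv"], "globs": []},
    {"name": "CSV document",
     "mimes": ["text/csv", "text/x-comma-separated-values", "text/x-csv"],
     "extensions": [], "globs": ["*.csv"]},
]

REGISTRY_AFTER = [
    {"name": "CSV document",
     "mimes": ["text/csv", "text/x-comma-separated-values", "text/x-csv",
               "text/comma-separated-values"],
     "extensions": ["csv"], "globs": ["*.csv"]},
]


def lookup_by_extension(registry, extension):
    """Return the first mime type of the first entry declaring `extension`."""
    for entry in registry:
        if extension in entry["extensions"]:
            return entry["mimes"][0]
    return None
```

Under these assumptions the old configuration resolves `csv` to text/comma-separated-values, while the new one resolves it to text/csv.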
  2. 20 Nov, 2023 1 commit
    • DMS: handle gracefully unauthorized ingestion scenarios · 5c601193
      Jérome Perrin authored
      Instead of using UnrestrictedMethod for the whole duration of
      discoverMetadata, revert to the original user permissions for the last
      part where we merge revision or change state, because we want this to
      fail when user is not allowed.
      
      This also adjusts the high-level Base_contribute script to catch these
      errors and report them nicely to the user. While doing this, some
      other problems were discovered and fixed as well.
      
      Some problems in ContributionTool.newContent were fixed. It was
      already trying to detect that a document is going to be merged and
      synchronously checking that the existing document can be replaced, in
      order to display an error to the user, but this part had two issues:
        - First, it was using getMatchedFilenamePatternDict to find the
        document coordinates, but this method only supports
        preferred_document_filename_regular_expression capturing reference,
        not the combination of node_reference and local_reference, so this
        was changed to use getPropertyDictFromFilename, which is what is
        actually used to compute the reference.
        - Second, this check was done with a standard restricted catalog
        search, but mergeRevision later uses an unrestricted search, so
        this was changed to use an unrestricted catalog search to match
        mergeRevision.
      
      The testing also revealed that the PDFDocument._setFile hack to clear
      _content_information only applied to _setFile, not to _setData, so it
      was extended to _setData as well, so that content information is also
      updated when the PDF content is changed through setData.
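The last point can be illustrated with a small sketch. `PDFLike` is a hypothetical class, not PDFDocument itself, and its "content information" is reduced to the data size; the point is only that every setter changing the data has to drop the cached information.

```python
class PDFLike:
    """Hypothetical document caching information derived from its data."""

    def __init__(self):
        self._data = b""
        self._content_information = None

    def _clear_content_information(self):
        self._content_information = None

    def _setFile(self, file_obj):
        self._clear_content_information()
        self._data = file_obj.read()

    def _setData(self, data):
        # Without this call, getContentInformation() would keep returning
        # the information computed for the previous data.
        self._clear_content_information()
        self._data = data

    def getContentInformation(self):
        if self._content_information is None:
            # Stand-in for the real (expensive) PDF inspection.
            self._content_information = {"size": len(self._data)}
        return self._content_information
```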
  3. 16 Nov, 2023 1 commit
  4. 03 Mar, 2023 5 commits
  5. 19 Dec, 2022 1 commit
    • *: rewrite with lib2to3.fixes.fix_asserts and ad-hoc assertin · 2e366054
      Jérome Perrin authored
      The ad-hoc assertin filter:
      
      --
      
      from typing import List
      
      import lib2to3
      
      from lib2to3.fixer_base import BaseFix
      from lib2to3.fixer_util import Comma, Name
      
      class FixAssertIn(BaseFix):
      
        PATTERN = """
            power< any+ trailer< '.' meth=("assertTrue" | "assertFalse")>
            trailer< '('
              comparison< (needle=any ( comp_op<'not' 'in'> | 'in' ) haystack=any) >
            ')' > >
        """
      
        def transform(self, node: lib2to3.pytree.Node,
                      results: List[lib2to3.pytree.Base]):
      
          needle = results['needle']
          haystack = results['haystack']
          meth = results["meth"][0]
      
          method_map = {True: 'assertIn', False: 'assertNotIn'}
          method_in = meth.value == 'assertTrue'
          if 'not' in str(needle.parent.children[1]):
            method_in = not method_in
          meth.replace(Name(method_map[method_in], prefix=meth.prefix))
      
          needle.parent.children = [needle, Comma(), haystack]
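For illustration, the four rewrites performed by the fixer above can be written out as an ordinary unittest case. `AssertInExample` is a hypothetical test written for this note, not code from the repository; each line of `test_after` is the form the fixer would produce for the corresponding line of `test_before`.

```python
import unittest


class AssertInExample(unittest.TestCase):

    haystack = [1, 2, 3]

    def test_before(self):
        # Forms matched by the PATTERN of the fixer.
        self.assertTrue(1 in self.haystack)
        self.assertFalse(4 in self.haystack)
        self.assertTrue(4 not in self.haystack)
        self.assertFalse(1 not in self.haystack)

    def test_after(self):
        # Same checks after the rewrite: a "not in" comparison flips
        # the method chosen for assertTrue / assertFalse.
        self.assertIn(1, self.haystack)
        self.assertNotIn(4, self.haystack)
        self.assertNotIn(4, self.haystack)
        self.assertIn(1, self.haystack)
```

The rewritten assertions check the same conditions, but on failure they report the needle and the haystack instead of a bare "False is not true".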
  6. 04 Nov, 2022 1 commit
  7. 05 Oct, 2022 1 commit
  8. 02 Jun, 2021 1 commit
    • dms: use ghostscript to convert PDF to text · f775724e
      Jérome Perrin authored
      For historical reasons, PDF to text conversion first converted the PDF to
      png, then this png to tiff, and the tiff was sent to tesseract. This works, but
      it consumes a lot of resources with large PDFs, especially because the
      intermediate png/tiff are created with a resolution of 300 DPI, which easily
      needs several GB of RAM and temporary disk space.
      This was observed with the PDFs created by erp5_document_scanner, which are
      usually high quality (1 or 2 MB per page); even a one-page PDF sometimes
      took more than one minute to OCR.
      
      Since version 9.53, ghostscript integrates the tesseract engine directly, so
      we don't need to prepare a tiff beforehand: we can send the PDF data directly
      to ghostscript.
      
      This change uses ghostscript if available and otherwise falls back to the same
      pipeline as before. This allows a transition period until all ERP5 instances
      are running a recent enough SlapOS with ghostscript 9.54. Fortunately, before
      SlapOS included ghostscript 9.54, the ERP5 software release did not have
      ghostscript in $PATH, so we don't have to check the ghostscript version: we
      assume that if gs is in $PATH, we have a recent enough SlapOS.
      
      This new approach is less tolerant of broken or password-protected PDFs,
      so we now check that the PDF is valid and not encrypted before
      trying to use OCR.
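The "use gs if it is in $PATH, otherwise fall back" decision can be sketched as below. `build_ocr_command` is a hypothetical helper, and the exact options used by ERP5 are not given in this message; `-sDEVICE=ocr` and `-sOCRLanguage` are assumed here from Ghostscript's OCR device documentation. The `which` parameter only exists so the lookup can be replaced in tests.

```python
import shutil


def build_ocr_command(pdf_path, language="eng", which=shutil.which):
    """Return a gs command line for OCR, or None to use the legacy pipeline."""
    gs = which("gs")
    if gs is None:
        # No ghostscript in $PATH: the caller should fall back to the
        # png -> tiff -> tesseract pipeline.
        return None
    # No version check, following the reasoning in the commit message:
    # if gs is in $PATH it is assumed to be recent enough for OCR.
    return [
        gs, "-q", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=ocr",                # assumed: text-producing OCR device
        "-sOCRLanguage=" + language,   # assumed: tesseract language option
        "-sOutputFile=-",              # write the result to stdout
        pdf_path,
    ]
```

A caller would pass the returned list to `subprocess.run` and read the text from stdout, after checking that the PDF is valid and not encrypted.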
  9. 27 May, 2021 2 commits
  10. 26 May, 2021 1 commit
    • ingestion: review publication_state argument · 015bc1c1
      Jérome Perrin authored
      Changing state directly in Base_contribute only worked for the case
      where metadata was discovered asynchronously. In the case of synchronous
      discovery, the state was changed first and then Document_convertToBaseFormatAndDiscoverMetadata
      was executed, but this was causing Unauthorized errors like this:
      
            Module script, line 10, in Document_convertToBaseFormatAndDiscoverMetadata
            - <PythonScript at /erp5/Document_convertToBaseFormatAndDiscoverMetadata used for /erp5/document_module/163>
            - Line 10
              return context.discoverMetadata(filename=filename,
          Unauthorized: You are not allowed to access 'discoverMetadata' in this context
      
      because once the state has been changed, regular users no longer have
      permission to access discoverMetadata, as that method needs the ModifyPortalContent
      permission.
      
      Instead of handling publication_state only in Base_contribute, treat it
      like other user input parameters and change state during discovery.
      
      Tests were also reorganised to move Base_contribute related tests to testIngestion
      and to run Base_contribute tests as a non-manager user.
  11. 23 Apr, 2021 1 commit
    • ERP5Workflow: DC Workflows are now ERP5 objects (!1378). · df85ef46
      Arnaud Fontaine authored
      This also moves all Configurator Workflows in workflow_module to portal_workflow
      (workflow_module was an implementation of Workflows based on ERP5 objects and
      not using DCWorkflow code).
      
      * Workflows are not defined on portal_workflow._chains_by_type anymore but,
        like everything else, on the Portal Type itself.
      * portal_workflow can contain and work at the same time with legacy and new
        Workflows (ERP5Type/patches/DCWorkflow.py monkey-patching DCWorkflow classes
        to provide the same API).
      * Existing Workflow Scripts should work as they are and the code can be updated
        later on to take advantage of the new API:
        + With legacy implementation Workflow {Scripts,Transitions,Worklists,States}
          were in a Folder ({scripts,transitions,worklists,states} attribute) but
          all of these are now in the Workflow itself and their IDs are prefixed
          (PropertySheet-style), for example `script_`. Legacy attributes are
          provided in new implementation to call the new API.
        + When calling a Workflow Script, `container` was bound to its parent, namely
          WF.scripts (Folder) and a Workflow Script could call another. Now `container`
          is bound to the WF itself and Workflow Scripts are in a Workflow directly.
          The `scripts` attribute of the new implementation handles this use case.
        + Override portal_workflow.__getattr__ so that a Workflow Script can call
          another one without prefix.
      * Worklists are Predicates: a Worklist filters objects based on given criteria,
        and thus it makes more sense for a Worklist to be a Predicate (albeit a
        Predicate with only Identity Criterion and nothing else).
        + Criterion Properties:
          * state_variable.
          * local_roles (SECURITY_PARAMETER_ID).
          * Any Workflow Variables with for_catalog == 1.
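The prefixed IDs and the `__getattr__` fallback described above can be pictured with a toy class. `WorkflowSketch` and `add_script` are hypothetical and far simpler than the real ERP5 objects; the sketch only shows how a sub-object stored under a `script_` prefixed ID can still be reached without the prefix.

```python
class WorkflowSketch:
    """Hypothetical workflow holding its scripts directly, with prefixed IDs."""

    def __init__(self):
        self._objects = {}

    def add_script(self, name, func):
        # Scripts live in the workflow itself, under a prefixed ID.
        self._objects["script_" + name] = func

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails. Going through
        # __dict__ avoids recursing into __getattr__ for _objects itself.
        objects = self.__dict__.get("_objects", {})
        for key in (name, "script_" + name):
            if key in objects:
                return objects[key]
        raise AttributeError(name)
```

A script registered as `notify` is then reachable both as `workflow.script_notify` and as `workflow.notify`.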
      
      erp5_performance_test:testWorkflowPerformance was run to compare the DCWorkflow
      and ERP5Workflow implementations, and the new implementation seems to be about
      4% slower (legacy: 7.547, 7.593, 7.618, 7.59, 7.514 and new: 7.842,
      7.723, 7.902, 7.837, 7.875).
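The slowdown can be recomputed from the quoted timings by comparing the means of the two series (the unit of the timings is not stated in the message):

```python
from statistics import mean

legacy = [7.547, 7.593, 7.618, 7.59, 7.514]
new = [7.842, 7.723, 7.902, 7.837, 7.875]

# Relative difference between the mean timings, in percent.
slowdown_percent = (mean(new) / mean(legacy) - 1) * 100
print(f"{slowdown_percent:.1f}% slower")
```

This gives roughly 3.5%, consistent with the "about 4%" stated above.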
      
      Work done by Wenjie Zheng, Isabelle Vallet, Sebastien Robin and myself.
  12. 15 Mar, 2021 1 commit
  13. 12 Mar, 2021 1 commit
    • erp5_core: Make two queries in Base_getRelatedDocumentList · aba2d822
      Georgios Dagkakis authored
      For two reasons:
      - In the previous version, the query would lose Embedded Documents, because it lacked a left join on follow_up_uid
      - On large instances the previous version was many times slower
      
      Also,
      erp5_dms: Add a test for Base_getRelatedDocumentList
      
      To check it works correctly with both related (follow_up)
      Documents and with sub-object Embedded Files
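The first reason can be reproduced on a toy schema. The `catalog` and `follow_up` tables below are hypothetical and much simpler than the real ERP5 catalog; the sketch only shows how an inner join on the follow-up relation drops sub-objects that have no follow-up row, and how two separate queries avoid the problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE catalog (uid INTEGER PRIMARY KEY, parent_uid INTEGER, title TEXT);
CREATE TABLE follow_up (uid INTEGER, follow_up_uid INTEGER);
INSERT INTO catalog VALUES
  (1, 0, 'context'), (2, 0, 'related document'), (3, 1, 'embedded file');
INSERT INTO follow_up VALUES (2, 1);
""")


def related_single_query(context_uid):
    # Inner join: the embedded file has no follow_up row, so it is lost.
    rows = conn.execute(
        "SELECT catalog.title FROM catalog "
        "JOIN follow_up ON follow_up.uid = catalog.uid "
        "WHERE follow_up.follow_up_uid = ? OR catalog.parent_uid = ?",
        (context_uid, context_uid))
    return sorted(title for (title,) in rows)


def related_two_queries(context_uid):
    # One query per kind of relation, merged in Python.
    followed = conn.execute(
        "SELECT catalog.title FROM catalog "
        "JOIN follow_up ON follow_up.uid = catalog.uid "
        "WHERE follow_up.follow_up_uid = ?", (context_uid,))
    embedded = conn.execute(
        "SELECT title FROM catalog WHERE parent_uid = ?", (context_uid,))
    return sorted({t for (t,) in followed} | {t for (t,) in embedded})
```

On this data the single query only returns the related document, while the two-query version returns both the related document and the embedded file.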
  14. 01 Mar, 2021 1 commit
  15. 26 Feb, 2021 1 commit
    • base: don't acquire local role on File · 614ac5e4
      Jérome Perrin authored
      File is a top-level module document, like PDF, Text Document etc, so it
      should have its own security definition and should not acquire local roles.
      
      For cases where files are embedded as sub documents, we are using Embedded
      File, which acquires local roles.
  16. 22 Feb, 2021 1 commit
  17. 20 Jul, 2020 1 commit
  18. 05 Jun, 2020 1 commit
  19. 27 May, 2020 1 commit
  20. 21 Feb, 2020 1 commit
  21. 10 Feb, 2020 1 commit
  22. 15 Jan, 2020 1 commit
  23. 18 Nov, 2019 1 commit
  24. 05 Sep, 2019 1 commit
  25. 24 Jul, 2018 1 commit
  26. 07 Jul, 2017 1 commit
    • tests: clean up code related to configuration of cloudooo/memcached connectors · f84f4cdb
      Julien Muchembled authored
      - The conversion server is supposed to be configured in a system preference,
        so do this instead of using a normal preference.
      - _getConversionServerDict -> _getConversionServerUrl, to make clear that
        cloudooo is now configured by a url, instead of a host/port pair.
      - Refactoring: now that setUpERP5Site() sets things up automatically,
        we don't need the same code duplicated throughout many tests to
        redo the cloudooo configuration.
      - In the promise file, the volatile/persistent memcached urls were swapped.
  27. 08 Mar, 2017 1 commit
  28. 27 Jan, 2017 1 commit
  29. 23 Dec, 2016 3 commits
  30. 09 Dec, 2016 1 commit
  31. 23 Aug, 2016 1 commit
  32. 22 Aug, 2016 1 commit