Commits · c1933ad1022a7ec9199fc84fe4894aca31011668 · nexedi / erp5

26 Dec, 2023 2 commits

core: fix error when downloading when mimetypes_registry only defines globs · 5ea313bb

Jérome Perrin authored 1 year ago

DownloadableMixin uses mimetypes_registry to guess an extension from
the mimetype, but this failed when the mimetypes_registry entry only defines
globs, raising an error like this:

      Module erp5.component.mixin.erp5_version.DownloadableMixin, line 143, in index_html
        output_format = mimetype_object.globs.strip('*.')
    AttributeError: 'list' object has no attribute 'strip'

Also blindly apply the same fix in OOoTemplate, as it used the same
problematic pattern.
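The sketch below shows the failure mode and a tolerant fix. It assumes a mimetype entry exposes `extensions` and `globs` as lists of strings; the traceback confirms that `globs` is a list. `MimeTypeEntry` and `guess_extension` are hypothetical names, not the actual DownloadableMixin code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MimeTypeEntry:
    # Hypothetical stand-in for a mimetypes_registry entry; only the two
    # attributes relevant to the bug are modelled, both as lists.
    extensions: List[str] = field(default_factory=list)
    globs: List[str] = field(default_factory=list)


def guess_extension(entry: MimeTypeEntry) -> Optional[str]:
    """Return an extension for the entry, preferring explicit extensions."""
    if entry.extensions:
        return entry.extensions[0]
    if entry.globs:
        # globs is a list such as ["*.csv"]: strip the first element,
        # not the list itself (list.strip() is what raised AttributeError).
        return entry.globs[0].lstrip("*.")
    return None
```

For example, `guess_extension(MimeTypeEntry(globs=["*.csv"]))` gives `"csv"` instead of raising.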

5ea313bb

core: drop text/comma-separated-values mime type · 7e35d0e7

Jérome Perrin authored 1 year ago

We used to have two entries for CSV:

  - text/comma-separated-values
     - mimes: ["text/comma-separated-values"]
     - extensions: ["csv"]
     - globs: []
  - CSV document
     - mimes: ["text/csv", "text/x-comma-separated-values", "text/x-csv"]
     - extensions: []
     - globs: ["*.csv"]

but text/comma-separated-values does not really exist; RFC 4180 recommends
text/csv.

The problem with this configuration is that when ERP5 picks a mime type
for csv extension, it uses text/comma-separated-values, as this one has
extensions set.

Change the configuration to delete "text/comma-separated-values" and keep
everything in "CSV document":

  - CSV document
     - mimes: ["text/csv", "text/x-comma-separated-values", "text/x-csv", "text/comma-separated-values"]
     - extensions: ["csv"]
     - globs: ["*.csv"]
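The message explains that ERP5 picks the entry that has `extensions` set. A small model of that lookup shows why the old configuration returned the wrong type. The tuple layout, the "primary type is the first of mimes" convention and `mimetype_for_extension` are hypothetical simplifications, not the actual mimetypes_registry code.

```python
# Each entry: (primary mime type, all mime types, extensions, globs).
OLD_REGISTRY = [
    ("text/comma-separated-values", ["text/comma-separated-values"], ["csv"], []),
    ("text/csv", ["text/csv", "text/x-comma-separated-values", "text/x-csv"],
     [], ["*.csv"]),
]

NEW_REGISTRY = [
    ("text/csv",
     ["text/csv", "text/x-comma-separated-values", "text/x-csv",
      "text/comma-separated-values"],
     ["csv"], ["*.csv"]),
]


def mimetype_for_extension(registry, extension):
    """Return the primary mime type of the first entry declaring `extension`."""
    for primary, _mimes, extensions, _globs in registry:
        if extension in extensions:
            return primary
    return None


def entry_for_mimetype(registry, mimetype):
    """Return the primary mime type of the entry listing `mimetype` as an alias."""
    for primary, mimes, _extensions, _globs in registry:
        if mimetype in mimes:
            return primary
    return None
```

With the old registry, `"csv"` resolves to `text/comma-separated-values`. With the merged entry it resolves to `text/csv`, and the old name still maps to the same entry as an alias.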

7e35d0e7

20 Nov, 2023 1 commit

DMS: handle gracefully unauthorized ingestion scenarios · 5c601193

Jérome Perrin authored 1 year ago

Instead of using UnrestrictedMethod for the whole duration of
discoverMetadata, revert to the original user permissions for the last
part where we merge revision or change state, because we want this to
fail when user is not allowed.

This also adjusts the high-level Base_contribute script to catch these
errors and report them nicely to the user; while doing this, some other
problems were discovered and are also fixed.

Some problems in ContributionTool.newContent were fixed. This method was
already trying to detect that a document is going to be merged and
synchronously checked that the existing document can be replaced, to
display an error to the user, but this part had two issues:
  - First, it was using getMatchedFilenamePatternDict to find the
    document coordinates, but this method only supports
    preferred_document_filename_regular_expression capturing reference,
    not the combination of node_reference and local_reference, so this
    was changed to use getPropertyDictFromFilename, which is what is
    actually used to compute the reference.
  - Second, this check was done with a standard restricted catalog
    search, but mergeRevision later uses an unrestricted search, so this
    was changed to use an unrestricted catalog search to match
    mergeRevision.

The testing also revealed that the PDFDocument._setFile hack to clear
_content_information was only applied in _setFile, not in _setData, so it
was extended to _setData as well. This way, when the PDF content is updated
by setData, the content information is also updated.
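The idea can be sketched with a document class that caches derived data in `_content_information`. Both entry points that replace the content must drop that cache. `BaseFile`, `ToyPDFDocument`, `_clearContentInformation` and the "size" metadata are hypothetical, not PDFDocument's real implementation.

```python
class BaseFile:
    """Toy base class (hypothetical, not ERP5's File): stores raw bytes."""

    def __init__(self):
        self.data = b""
        self.filename = None

    def _setData(self, data):
        self.data = data

    def _setFile(self, data, filename=None):
        # In this toy model _setFile does not go through _setData,
        # so each entry point needs its own cache reset.
        self.filename = filename
        self.data = data


class ToyPDFDocument(BaseFile):
    """Caches information derived from the content in _content_information."""

    def _clearContentInformation(self):
        # Forget the cached value so it is recomputed from the new content.
        self.__dict__.pop("_content_information", None)

    def _setFile(self, data, filename=None):
        self._clearContentInformation()
        super()._setFile(data, filename)

    def _setData(self, data):
        self._clearContentInformation()
        super()._setData(data)

    def getContentInformation(self):
        if "_content_information" not in self.__dict__:
            # Hypothetical stand-in for parsing PDF metadata.
            self._content_information = {"size": len(self.data)}
        return self._content_information
```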

5c601193

16 Nov, 2023 1 commit
- core: make sure base_data is bytes or Pdata · c633ecef
  Jérome Perrin authored 1 year ago

  This was working fine (although not Python 3 ready), but was not tested.

  c633ecef
03 Mar, 2023 5 commits

testDms: test checkConsistency and PropertyTypeValidity · 9f8a808f

Jérome Perrin authored 2 years ago

Checks that created document types match constraints.

9f8a808f

File: treat data as bytes, not str · 86981105

Jérome Perrin authored 2 years ago

86981105

DownloadableMixin: encode content disposition as RFC 6266 · 85681613

Jérome Perrin authored 2 years ago

Previously, we were using an encoded string, which worked in practice,
but with Python 3 the WSGI server encodes headers as latin-1 (because WSGI
headers are latin-1), and already with Zope 4 on Python 2 we had issues with
testing, as the fake WSGI server used by functional tests only accepts ASCII
headers [1].

1: https://github.com/zopefoundation/Zope/blob/cddecf7e/src/Testing/ZopeTestCase/functional.py#L125-L126
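RFC 6266, with the RFC 5987 `filename*` parameter, lets a header carry a non-ASCII filename using only ASCII bytes. That satisfies both the latin-1 WSGI constraint and the ASCII-only test server. The helper below is a generic, hypothetical sketch, not DownloadableMixin's code. Replacing non-ASCII characters with `?` in the plain fallback is just one possible choice.

```python
from urllib.parse import quote


def make_content_disposition(filename, disposition="attachment"):
    """Build an ASCII-only Content-Disposition value for any filename."""
    # Plain filename= fallback for clients that ignore filename*:
    # non-ASCII characters become '?', and quotes/backslashes are escaped.
    fallback = filename.encode("ascii", "replace").decode("ascii")
    fallback = fallback.replace("\\", "\\\\").replace('"', '\\"')
    # RFC 5987 ext-value: charset, empty language, percent-encoded UTF-8 bytes.
    encoded = quote(filename, safe="")
    return "%s; filename=\"%s\"; filename*=UTF-8''%s" % (disposition, fallback, encoded)
```

For example, `make_content_disposition("€ rates.csv")` gives `attachment; filename="? rates.csv"; filename*=UTF-8''%E2%82%AC%20rates.csv`, which encodes cleanly as ASCII.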

85681613

ERP5TypeTestCase: update .publish() patch with Zope4 version · 0f236fb0

Jérome Perrin authored 3 years ago

This slightly changes getOutput: `Status` is no longer printed as a header
(Status: 200 OK) but only in the status line (HTTP/1.1 200 OK).

0f236fb0

column name is case sensitive · 5742bf4b

Aurel authored 3 years ago

5742bf4b

19 Dec, 2022 1 commit

*: rewrite with lib2to3.fixes.fix_asserts and ad-hoc assertin · 2e366054

Jérome Perrin authored 2 years ago

The ad-hoc assertIn fixer:

```python
from typing import Any, Dict

from lib2to3 import pytree
from lib2to3.fixer_base import BaseFix
from lib2to3.fixer_util import Comma, Name


class FixAssertIn(BaseFix):
  # Matches e.g. self.assertTrue(x in y) or self.assertFalse(x not in y).
  PATTERN = """
      power< any+ trailer< '.' meth=("assertTrue" | "assertFalse")>
      trailer< '('
        comparison< (needle=any ( comp_op<'not' 'in'> | 'in' ) haystack=any) >
      ')' > >
  """

  def transform(self, node: pytree.Node, results: Dict[str, Any]):
    needle = results['needle']
    haystack = results['haystack']
    meth = results["meth"][0]

    # assertTrue(x in y) -> assertIn, assertFalse(x in y) -> assertNotIn;
    # a "not in" operator flips the choice.
    method_map = {True: 'assertIn', False: 'assertNotIn'}
    method_in = meth.value == 'assertTrue'
    comparison = needle.parent
    if 'not' in str(comparison.children[1]):
      method_in = not method_in
    meth.replace(Name(method_map[method_in], prefix=meth.prefix))

    # Replace the "in" / "not in" operator with a comma: (needle, haystack).
    # set_child also updates parent pointers and marks the tree as changed.
    comparison.set_child(1, Comma())
```
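For comparison, the same rewrite can be sketched with the standard `ast` module (Python 3.9+ for `ast.unparse`). This is a different technique from the lib2to3 fixer: `ast` keeps neither comments nor formatting. That is presumably why a concrete-syntax-tree fixer was preferred for rewriting the whole code base. `AssertInTransformer` and `rewrite` are illustrative names.

```python
import ast


class AssertInTransformer(ast.NodeTransformer):
    """Rewrite assertTrue/assertFalse(x [not] in y) into assertIn/assertNotIn."""

    def visit_Call(self, node):
        self.generic_visit(node)
        func = node.func
        if (isinstance(func, ast.Attribute)
                and func.attr in ("assertTrue", "assertFalse")
                and len(node.args) == 1 and not node.keywords):
            arg = node.args[0]
            if (isinstance(arg, ast.Compare) and len(arg.ops) == 1
                    and isinstance(arg.ops[0], (ast.In, ast.NotIn))):
                # assertTrue + in, or assertFalse + not in, means "is a member".
                positive = (func.attr == "assertTrue") == isinstance(arg.ops[0], ast.In)
                func.attr = "assertIn" if positive else "assertNotIn"
                node.args = [arg.left, arg.comparators[0]]
        return node


def rewrite(source):
    tree = AssertInTransformer().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

For example, `rewrite("self.assertFalse(a in b)")` returns `"self.assertNotIn(a, b)"`.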

2e366054

04 Nov, 2022 1 commit

Revert "Products.ERP5.ERP5Site: Install erp5_oauth2_{resource,authentication} by default." · 19327cc0

Vincent Pelletier authored 2 years ago

This reverts commit 035d099a.
Installing BTs which do not come from product/ERP5/bootstrap breaks site
creation, except in unit tests.
This commit is very desirable, but not ready, so unfortunately I have
to revert it.

19327cc0

05 Oct, 2022 1 commit

Products.ERP5.ERP5Site: Install erp5_oauth2_{resource,authentication} by default. · 035d099a

Vincent Pelletier authored 3 years ago

So every new instance is able to use self-contained OAuth2 authentication.
In turn, this triggers automated migration of a few portal types, which
causes the coding style tests to fail. So commit these as well.

035d099a

02 Jun, 2021 1 commit

dms: use ghostscript to convert PDF to text · f775724e

Jérome Perrin authored 4 years ago

For historical reasons, PDF to text involved first converting the PDF to
png, then this png to tiff, and the tiff was sent to tesseract. This works, but
it consumes a lot of resources with large PDFs, especially because the
intermediate png/tiff are created with a resolution of 300 DPI, which easily
needs several GB of RAM and temporary disk space.
This was observed with the PDFs created by erp5_document_scanner, which are
usually high quality (1 or 2 MB per page), and even a one-page PDF sometimes
took more than one minute to OCR.

Since version 9.53, ghostscript integrates the tesseract engine directly, so we
don't need to prepare a tiff beforehand: we can send the PDF data directly to ghostscript.

This change uses ghostscript if available and otherwise falls back to the same
pipeline as before. This will allow the transition until all ERP5 instances
are running a recent enough SlapOS with ghostscript 9.54. Fortunately, before
SlapOS included ghostscript 9.54, the ERP5 software release did not have ghostscript
in $PATH, so we don't have to check the ghostscript version: we assume that if gs
is in $PATH, we have a recent enough SlapOS.
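The selection rule can be sketched as follows. The gs flags reflect my understanding of Ghostscript's `ocr` text device and `-sOCRLanguage` option and should be checked against the installed version. `ghostscript_ocr_command` and `choose_ocr_pipeline` are hypothetical helpers, not the ERP5 code.

```python
import shutil
from typing import Callable, List, Optional


def ghostscript_ocr_command(pdf_path: str, txt_path: str,
                            language: str = "eng", dpi: int = 300) -> List[str]:
    """Build a gs command line using the OCR text output device (assumed flags)."""
    return [
        "gs", "-q", "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=ocr",                 # text output through the built-in tesseract
        "-sOCRLanguage=%s" % language,
        "-r%d" % dpi,
        "-sOutputFile=%s" % txt_path,
        pdf_path,
    ]


def choose_ocr_pipeline(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Mirror the commit's rule: gs on $PATH implies a recent enough gs."""
    return "ghostscript" if which("gs") else "legacy"
```

Injecting `which` keeps the rule testable without depending on what is installed.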

This new approach was less tolerant of broken or password-protected PDFs,
so we now check that the PDF is valid and not encrypted before trying to
use OCR.
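The commit does not say how the validity check is done. A crude byte-level pre-check might look like the sketch below. It only looks for the `%PDF-` header and an `/Encrypt` reference, so it can misjudge unusual files. A real check would use a PDF parser. `looks_like_usable_pdf` is a hypothetical name.

```python
def looks_like_usable_pdf(data: bytes) -> bool:
    """Crude pre-OCR check: PDF header present and no encryption dictionary."""
    if not data.startswith(b"%PDF-"):
        return False
    # Encrypted PDFs reference an /Encrypt dictionary from the trailer.
    # A raw byte search is only a heuristic and may give false positives.
    return b"/Encrypt" not in data
```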

f775724e

27 May, 2021 2 commits
- dms: test more cases of converting PDFs to images · 9ac96204
  Jérome Perrin authored 4 years ago
  
  9ac96204
- dms: run testERP5Base.TestImage with DMS installed · 1d0aeccb
  Jérome Perrin authored 4 years ago

  Because DMS extends image portal types with interaction workflows etc.,
  it's better to also cover the case where DMS is installed.

  1d0aeccb
26 May, 2021 1 commit

ingestion: review publication_state argument · 015bc1c1

Jérome Perrin authored 4 years ago

Changing state directly in Base_contribute only worked in the case
where metadata was discovered asynchronously. In the case of synchronous
discovery, the state was changed first and Document_convertToBaseFormatAndDiscoverMetadata
was executed afterwards, but this was causing an Unauthorized error like this:

      Module script, line 10, in Document_convertToBaseFormatAndDiscoverMetadata
      - <PythonScript at /erp5/Document_convertToBaseFormatAndDiscoverMetadata used for /erp5/document_module/163>
      - Line 10
        return context.discoverMetadata(filename=filename,
    Unauthorized: You are not allowed to access 'discoverMetadata' in this context

because once the state has changed, a regular user no longer has
permission to access discoverMetadata, since that method needs the
ModifyPortalContent permission.

Instead of handling publication_state only in Base_contribute, treat it
like other user input parameters and change state during discovery.
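The ordering problem can be modelled with a toy document that regular users may only modify while it is a draft. The fix treats `publication_state` as one more input to discovery and applies it last. `ToyDocument`, `discover_metadata` and `user_is_manager` are hypothetical names for this sketch.

```python
class Unauthorized(Exception):
    """Raised when modification is not allowed (toy model)."""


class ToyDocument:
    """Regular users may only modify this document while it is a draft."""

    def __init__(self):
        self.state = "draft"
        self.metadata = {}

    def check_modify(self, user_is_manager):
        if self.state != "draft" and not user_is_manager:
            raise Unauthorized("modification denied in state %r" % self.state)

    def discover_metadata(self, user_is_manager=False, publication_state=None,
                          **user_input):
        self.check_modify(user_is_manager)
        self.metadata.update(user_input)
        # publication_state is handled like any other user input, and
        # applied last, after the permission-sensitive work is done.
        if publication_state is not None:
            self.state = publication_state
```

Changing the state first and then calling `discover_metadata` as a regular user raises `Unauthorized`. Passing `publication_state` into discovery does not.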

Tests were also reorganised to move Base_contribute-related tests into testIngestion
and to run Base_contribute tests as a non-manager user.

015bc1c1

23 Apr, 2021 1 commit

ERP5Workflow: DC Workflows are now ERP5 objects (!1378). · df85ef46

Arnaud Fontaine authored 4 years ago

This also moves all Configurator Workflows in workflow_module to portal_workflow
(workflow_module was an implementation of Workflows based on ERP5 objects and
not using DCWorkflow code).

* Workflows are no longer defined in portal_workflow._chains_by_type but,
  as everything else, on the Portal Type itself.
* portal_workflow can contain and work at the same time with legacy and new
  Workflows (ERP5Type/patches/DCWorkflow.py monkey-patching DCWorkflow classes
  to provide the same API).
* Existing Workflow Scripts should work as they are and the code can be updated
  later on to take advantage of the new API:
  + With the legacy implementation, Workflow {Scripts,Transitions,Worklists,States}
    were in a Folder ({scripts,transitions,worklists,states} attribute), but
    all of these are now in the Workflow itself and their IDs are prefixed
    (PropertySheet-style), for example `script_`. Legacy attributes are
    provided in the new implementation to call the new API.
  + When calling a Workflow Script, `container` was bound to its parent, namely
    WF.scripts (Folder), and a Workflow Script could call another. Now `container`
    is bound to the WF itself and Workflow Scripts are directly in the Workflow.
    The new implementation's `scripts` attribute handles this use case.
  + Override portal_workflow.__getattr__ so that a Workflow Script can call
    another one without prefix.
* Worklists are Predicates: a Worklist filters objects based on given criteria,
  so it makes more sense for a Worklist to be a Predicate (albeit a Predicate
  with only Identity Criterion and nothing else).
  + Criterion Properties:
    * state_variable.
    * local_roles (SECURITY_PARAMETER_ID).
    * Any Workflow Variables with for_catalog == 1.
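The prefixed-ID storage and the `__getattr__` fallback described in the list can be illustrated with a toy container. It assumes scripts are stored under IDs like `script_<name>`. `ToyWorkflow` and its methods are hypothetical and do not reproduce portal_workflow's actual code.

```python
class ToyWorkflow:
    """Stores workflow items flat, with PropertySheet-style ID prefixes."""

    PREFIX = "script_"

    def __init__(self):
        self._items = {}

    def add_script(self, name, func):
        self._items[self.PREFIX + name] = func

    @property
    def scripts(self):
        # Legacy-style access: wf.scripts[name], without the prefix.
        return {key[len(self.PREFIX):]: value for key, value in self._items.items()
                if key.startswith(self.PREFIX)}

    def __getattr__(self, name):
        # Only called when normal lookup fails: also try the prefixed ID,
        # so a script can call another one without the prefix.
        items = self.__dict__.get("_items", {})
        for key in (name, self.PREFIX + name):
            if key in items:
                return items[key]
        raise AttributeError(name)
```

With this, a script stored as `script_double` can be reached as `wf.double`, `wf.script_double` or `wf.scripts["double"]`.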

erp5_performance_test:testWorkflowPerformance was run to compare the DCWorkflow
and ERP5Workflow implementations, and the new implementation seems to be about
4% slower (legacy: 7.547, 7.593, 7.618, 7.59, 7.514 and new: 7.842,
7.723, 7.902, 7.837, 7.875).
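The "about 4%" figure can be checked from the five timings quoted for each implementation; the message does not state units. Comparing the means gives roughly 3.5%, consistent with that rough estimate:

```python
from statistics import mean

legacy = [7.547, 7.593, 7.618, 7.59, 7.514]
new = [7.842, 7.723, 7.902, 7.837, 7.875]

# Relative slowdown of the new implementation, based on mean timings.
slowdown = mean(new) / mean(legacy) - 1
print("legacy mean %.4f, new mean %.4f, slowdown %.1f%%"
      % (mean(legacy), mean(new), slowdown * 100))
# legacy mean 7.5724, new mean 7.8358, slowdown 3.5%
```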

Work done by Wenjie Zheng, Isabelle Vallet, Sebastien Robin and myself.

df85ef46

15 Mar, 2021 1 commit

fixup! erp5_core: Make two queries in Base_getRelatedDocumentList · 33117bad

Jérome Perrin authored 4 years ago

My suggestion of using sorted in
nexedi/erp5!1373 (comment 129103) was wrong:
ERP5 documents do not sort in a deterministic order. Use a key function to make
sure we get results in the same order.
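A generic illustration of the fix: objects without a defined ordering cannot be sorted in Python 3, where `sorted()` raises `TypeError`. In Python 2 they sort by an implementation-defined criterion, so an explicit key is needed. `FakeDocument` and the choice of the relative URL as the key are hypothetical; the commit does not say which key was used.

```python
class FakeDocument:
    """Hypothetical stand-in: no __lt__, so instances have no natural order."""

    def __init__(self, relative_url):
        self.relative_url = relative_url

    def getRelativeUrl(self):
        return self.relative_url


def sort_documents(documents):
    # An explicit key gives the same order whatever order the queries return.
    return sorted(documents, key=lambda document: document.getRelativeUrl())
```

Note that string keys sort lexicographically, so `document_module/10` comes before `document_module/2`. That is deterministic, which is all the fix needs.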

33117bad

12 Mar, 2021 1 commit

erp5_core: Make two queries in Base_getRelatedDocumentList · aba2d822

Georgios Dagkakis authored 4 years ago

For two reasons:
- In previous version, query would lose Embedded Documents, lacking a left join in follow_up_uid
- In large instances the previous version was many times slower
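The idea of replacing one joined query with two simpler ones can be sketched with plain dicts standing in for catalog rows. The row fields (`uid`, `parent_uid`, `follow_up_uid_list`) and `get_related_documents` are hypothetical. The real script's queries and ordering are not reproduced here.

```python
def get_related_documents(document_uid, rows):
    """Union of two independent 'queries', de-duplicated by uid."""
    # Query 1: documents whose follow_up points at the document.
    related_by_follow_up = [row for row in rows
                            if document_uid in row.get("follow_up_uid_list", ())]
    # Query 2: sub-objects (e.g. Embedded Files) whose parent is the document,
    # found without any join on follow_up, so they cannot be dropped by one.
    embedded_in = [row for row in rows if row.get("parent_uid") == document_uid]

    seen = set()
    result = []
    for row in related_by_follow_up + embedded_in:
        if row["uid"] not in seen:
            seen.add(row["uid"])
            result.append(row)
    return result
```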

Also,
erp5_dms: Add a test for Base_getRelatedDocumentList

To check it works correctly with both related (follow_up)
Documents and with sub-object Embedded Files

aba2d822

01 Mar, 2021 1 commit
- erp5_dms: new test for search document by size · a25aed72
  Roque authored 4 years ago
  
  a25aed72
26 Feb, 2021 1 commit

base: don't acquire local role on File · 614ac5e4

Jérome Perrin authored 4 years ago

File is a top-level module document, like PDF, Text Document etc, so it
should have its own security definition and should not acquire local roles.

For cases where files are embedded as sub documents, we are using Embedded
File, which acquires local roles.

614ac5e4

22 Feb, 2021 1 commit

ingestion: extend contribute dialog to support "Publication State" · 14097487

Jérome Perrin authored 4 years ago

Extend "Attach Document" to have the same feature as in "Scan Document"
so that the user can directly choose the state when uploading a document.

This simplifies document_scanner a bit, as it can reuse the code that is
now in Base_contribute.

14097487

20 Jul, 2020 1 commit
- ZODB Components: Migrate Products.ERP5.Document.Document. · 7c72a354
  Arnaud Fontaine authored 4 years ago
  
  7c72a354
05 Jun, 2020 1 commit
- fixup! core: index component reference & text content in full text · 3700e8e1
  Jérome Perrin authored 4 years ago

  Adjust tests now that the test component itself is found by catalog.

  3700e8e1
27 May, 2020 1 commit
- ZODB Components: erp5_dms: Migrate ExternalSource and erp5_dms-related Unit Tests from filesystem. · 227402e3
  Arnaud Fontaine authored 5 years ago
  
  227402e3
21 Feb, 2020 1 commit
- Get rid of Products.CMFDefault.{File.File,Portal.CMFSite} · 12e4cc8b
  Bryton Lacquement authored 5 years ago
  
  12e4cc8b
10 Feb, 2020 1 commit
- ZODB Components: Migrate source files related to erp5_base. · 937500a1
  Arnaud Fontaine authored 5 years ago
  
  937500a1
15 Jan, 2020 1 commit
- ZODB Components: Preparation of erp5_base migration from FS: Fix pylint no-name-in-module on newTempXXX (04b49859). · 26e3c68b
  Arnaud Fontaine authored 5 years ago
  
  26e3c68b
18 Nov, 2019 1 commit
- PortalTransforms: safe_html: Changes in b255c894 were not actually applied so merge FS module and portal_transforms/safe_html. · 8515e0ac
  Arnaud Fontaine authored 5 years ago
  
  8515e0ac
05 Sep, 2019 1 commit

OCR: strip trailing spaces from outputs · bf9df74f

Julien Muchembled authored 5 years ago

With a minor update of ghostscript, we again had weird changes:

AssertionError: 'ERP5 is a free software.\n\x0c' != 'ERP5 is a free software.\n\n \n\x0c'
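The message does not say exactly where the stripping happens. If it is applied to the whole OCR output, Python's `str.rstrip()` is enough to make the two outputs above compare equal, because it also removes the form feed `\x0c`. `normalize_ocr_output` is a hypothetical name.

```python
expected = 'ERP5 is a free software.\n\x0c'
actual = 'ERP5 is a free software.\n\n \n\x0c'


def normalize_ocr_output(text):
    # Trailing newlines, spaces and the form-feed page separator are all
    # whitespace for str.rstrip(), so they are removed together.
    return text.rstrip()


assert normalize_ocr_output(expected) == normalize_ocr_output(actual)
```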

bf9df74f

24 Jul, 2018 1 commit
- testDms.py: Changes to fit the new Tesseract outputs · bfa62b51
  Valentin Benozillo authored 6 years ago

  /reviewed-on nexedi/erp5!713

  bfa62b51
07 Jul, 2017 1 commit

tests: clean up code related to configuration of cloudooo/memcached connectors · f84f4cdb

Julien Muchembled authored 7 years ago

- The conversion server is supposed to be configured in a system preference,
  so do this instead of using a normal preference.
- _getConversionServerDict -> _getConversionServerUrl, to make clear that
  cloudooo is now configured by a url, instead of a host/port couple.
- Refactoring: since setUpERP5Site() now sets things up automatically, we
  don't need the "same" duplicated code throughout many tests to redo the
  cloudooo configuration.
- In the promise file, the volatile/persistent memcached urls were swapped.

f84f4cdb

08 Mar, 2017 1 commit
- testDms: s/portal.getPortalObject()/portal/ · b8afaa14
  Vincent Pelletier authored 8 years ago
  
  b8afaa14
27 Jan, 2017 1 commit
- testDms: add test to check for visible text in presentation to image conversion · 8d82d382
  Tristan Cavelier authored 8 years ago

  /reviewed-on nexedi/erp5!229

  8d82d382
23 Dec, 2016 3 commits
- Reduce reliance on Person.reference as a user id. · fe27a9c3
  Vincent Pelletier authored 8 years ago

  To prepare for moving user id to a different property.
  Mostly replacing getReference on Persons with Person_getUserId, and
  catalog searches with PAS API when it is meant to search for a user and
  not really a person by reference.

  fe27a9c3
- tests: Use loginByUserName. · eaa72c7c
  Vincent Pelletier authored 8 years ago
  
  eaa72c7c
- use getId() instead of __str__. · 0eed9896
  Kazuhiko Shiozaki authored 8 years ago
  
  0eed9896
09 Dec, 2016 1 commit
- Use PAS API: take 2. · 62d8d3ac
  Vincent Pelletier authored 8 years ago
  
  62d8d3ac
23 Aug, 2016 1 commit
- *: remove references to old style of configuring cloudooo · be9835f1
  Jérome Perrin authored 8 years ago

  We now use URL only; the old address + port way is obsolete.

  be9835f1
22 Aug, 2016 1 commit
- update tests and configurators to handle cloudooo url preferences · c175a161
  Tristan Cavelier authored 8 years ago
  
  c175a161