    erp5_receipt_recognition: Execute OCR on receipt image to find total value · 4ba30106
    francois authored
    This commit contains the business template that takes a receipt image as its
    source, binarizes and segments it, and runs OCR on it. It then extracts the
    meaning (currently the total value) with regular expressions. The image must
    already be loaded into the image module before it can be read.
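
As an illustration of the regular-expression step described above, here is a minimal sketch in Python. The pattern, the `extract_total` name and the sample text are assumptions made for this example, not the code shipped in the extension. It looks for the word "total" followed by an amount and returns it as a decimal.

```python
import re
from decimal import Decimal

# Hypothetical pattern: the word "total", any non-digit filler on the same
# line (":", spaces, "TTC", ...), then an amount such as 12.50 or 12,50.
# The leading \b keeps "Subtotal" from matching.
TOTAL_RE = re.compile(r"\btotal\b[^\d\n]*(\d+(?:[.,]\d{2})?)", re.IGNORECASE)


def extract_total(ocr_text):
    """Return the last amount that follows a 'total' keyword, or None."""
    matches = TOTAL_RE.findall(ocr_text)
    if not matches:
        return None
    # Assumption: the grand total is the last "total" line on the receipt.
    return Decimal(matches[-1].replace(",", "."))


sample = "Bread 2.50\nSubtotal 10.00\nTOTAL: 12,50 EUR\nCash 20.00"
print(extract_total(sample))
```

Returning a `Decimal` rather than a float avoids rounding surprises when the value is later written to a monetary field.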
    
    The business template contains:
    	* The receipt recognition module
    	* An extension containing the code that binarizes, crops and
    	  segments the image, then analyzes it
    	* A new type "Receipt" that holds a source image and the
    	  field that stores the "total" value
    	* A portal skin folder containing the extension externalMethods
    	  as well as the conversion script that calls the recognition and
    	  updates the Receipt "total" field
    
    Possible improvements (not limited to this list):
    	- Easier loading of pictures: directly from the receipt page.
    	- Easier loading of pictures (2): from a phone with an OfficeJS
    	  (or any renderJS) application?
    	- Detect when images are sideways and rotate them upright
    	- Better "boxing" and segmentation: some lines are deleted from
    	  the original image during segmentation when they are too
    	  close to other lines
    	- Modify the neural network (LSTM) to increase the weight of
    	  symbols like $, euro, / and digits
    	- Use a faster/smaller neural network: most of the time is
    	  spent loading the neural network
    	- Cache the neural network: see the previous item.
    	- Extract the currency, date and receipt issuer.
    	- Use a neural network for the meaning extraction?