ocr ( string pathImage , string pathImageMagick , string _pathTesseract , string _debug ) : string
Reads the text in an image using OCR (optical character recognition).
Returns null if there is an OCR error.
Note: if text blocks sit side by side (for example, two facing columns), lines from the different blocks may come out interleaved in the result. To avoid this, partition the image into separate text blocks and run OCR on each one.
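As an illustration of partitioning, here is a minimal Python sketch that computes crop boxes for a page known to contain a fixed number of equal-width columns. It is not how partitionImage works (its behavior is not described here); the function name column_boxes and the equal-width assumption are hypothetical.

```python
from typing import List, Tuple

# A crop box in (left, top, right, bottom) pixel coordinates,
# the convention Pillow's Image.crop() uses.
Box = Tuple[int, int, int, int]


def column_boxes(width: int, height: int, columns: int) -> List[Box]:
    """Hypothetical helper: split a width x height page into `columns`
    side-by-side, full-height boxes so each column can be OCR'd alone.

    Assumes equal-width columns; the last column absorbs any pixels
    left over by integer division so the whole page is covered.
    """
    if width <= 0 or height <= 0 or columns <= 0:
        raise ValueError("width, height and columns must be positive")
    if columns > width:
        raise ValueError("more columns than pixels")
    step = width // columns
    boxes: List[Box] = []
    for i in range(columns):
        left = i * step
        right = width if i == columns - 1 else left + step
        boxes.append((left, 0, right, height))
    return boxes
```

Each box could then be cropped (for example with Pillow's Image.crop) and saved to its own file before calling ocr on it.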
Note 2: AI-based image processing, such as this function, can take a very long time on very large images. Protect your calls against oversized inputs by shrinking them first with resizeImage.
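Before shrinking an image, you need target dimensions that keep the aspect ratio. The Python sketch below computes them; the 3000-pixel cap and the name fit_within are assumptions for illustration, and the sketch does not model resizeImage's own parameters.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int = 3000) -> Tuple[int, int]:
    """Hypothetical helper: return (new_width, new_height) so that the
    longer side is at most max_side pixels, preserving aspect ratio.

    Only ever downscales: images already within the limit are
    returned unchanged. The 3000 px default is an assumed cap.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # max(1, ...) keeps a very thin image from collapsing to 0 pixels.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 6000 x 4000 scan would be brought down to 3000 x 2000 before OCR, while an 800 x 600 image would be left alone.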
Installation
To use this function, you must first install Tesseract (the UB Mannheim binaries) and ImageMagick.
Example
text = ocr(path("desktop")+"invoice.jpg", path("program_files")+"ImageMagick-7.1.0-Q16-HDRI\\magick.exe")
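To show what the two dependencies are for, here is a minimal Python sketch of one plausible ImageMagick + Tesseract pipeline. It is not this function's actual implementation: the grayscale preprocessing, the temporary PNG, and the names build_commands and ocr_sketch are assumptions. It relies only on the standard CLI forms `magick input -colorspace Gray output.png` and `tesseract image stdout`, and returns None on failure to mirror ocr's "returns null on error" behavior.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List, Optional


def build_commands(image: str, magick: str, tesseract_dir: Optional[str],
                   work_png: str) -> List[List[str]]:
    """Hypothetical helper: the two command lines of the pipeline.
    Step 1 converts the image to a grayscale PNG (assumed preprocessing);
    step 2 runs Tesseract on it and prints the text to stdout."""
    if tesseract_dir:
        # Like _pathTesseract, this is the installation FOLDER.
        tesseract = str(Path(tesseract_dir) / "tesseract")
    else:
        tesseract = "tesseract"  # rely on PATH, as on most Linux installs
    return [
        [magick, image, "-colorspace", "Gray", work_png],
        [tesseract, work_png, "stdout"],
    ]


def ocr_sketch(image: str, magick: str = "magick",
               tesseract_dir: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: return the recognized text, or None if
    either external program is missing or exits with an error."""
    with tempfile.TemporaryDirectory() as tmp:
        work_png = str(Path(tmp) / "page.png")
        convert_cmd, ocr_cmd = build_commands(image, magick, tesseract_dir, work_png)
        try:
            subprocess.run(convert_cmd, check=True, capture_output=True)
            result = subprocess.run(ocr_cmd, check=True,
                                    capture_output=True, text=True)
        except (OSError, subprocess.CalledProcessError):
            return None
        return result.stdout
```

Keeping command construction separate from execution makes the pipeline easy to inspect and log, which is roughly what a debug flag such as _debug is useful for.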
Problems
If you call this function in parallel, beware of CPU overload. When several Tesseract instances run at the same time, Tesseract may in some cases handle them poorly and keep switching back and forth between processes, so the whole job takes much longer. To avoid this, use getPerformanceStats to check that CPU usage is below 90% before starting another call. This problem appears to affect Linux but not Windows.
See also
partitionImage
straightenImage
pdfToImage
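For the parallel-execution problem described under Problems, the 90% CPU check can be sketched in Python as a wait loop. The CPU reading is injected as a callable because getPerformanceStats belongs to this platform; in plain Python you might supply a wrapper around psutil.cpu_percent from the third-party psutil package. The name wait_for_cpu and the polling and timeout values are assumptions.

```python
import time
from typing import Callable


def wait_for_cpu(get_cpu_percent: Callable[[], float],
                 threshold: float = 90.0,
                 poll_seconds: float = 1.0,
                 timeout_seconds: float = 60.0,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Hypothetical helper: block until CPU usage drops below
    `threshold` percent before launching another OCR job.

    Returns True when it is safe to start, or False if the CPU
    stayed at or above the threshold for the whole timeout.
    """
    waited = 0.0
    while True:
        if get_cpu_percent() < threshold:
            return True
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds
```

A caller would run `if wait_for_cpu(...)` before each new OCR call and skip or queue the job on False, rather than piling more Tesseract processes onto an already saturated CPU.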
Parameters
pathImage
The path of the image to read.
pathImageMagick
Required on Windows, optional on Linux. The full path to ImageMagick's magick.exe. Ex: C:\Program Files\ImageMagick-7.1.0-Q16-HDRI\magick.exe.
_pathTesseract (optional)
The path to the Tesseract installation folder (not to the executable). Ex: C:\Program Files\Tesseract-OCR.
_debug (optional)
Displays debug messages.