Building Portable Tesseract OCR Libraries on Linux

Is there a way to build the Tesseract library together with the Leptonica library it depends on, so that both can be shipped alongside an application, as can be done on Windows?

I have compiled these libraries according to their instructions, but libtesseract.so.3.0.2 seems to contain a fixed path to the Leptonica shared library:

$ ldd libtesseract.so.3.0.2

linux-vdso.so.1 =>  (0x00007fffbc5ff000)
liblept.so.4 => /usr/local/lib/liblept.so.4 (0x00007fa8400fd000)
libpng12.so.0 => /usr/lib64/libpng12.so.0 (0x00007fa83fcae000)
libjpeg.so.62 => /usr/lib64/libjpeg.so.62 (0x00007fa83fa5e000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa83f5e4000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007fa83f2de000)
libm.so.6 => /lib64/libm.so.6 (0x00007fa83f059000)
libc.so.6 => /lib64/libc.so.6 (0x00007fa83ecc5000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fa83eaaf000)
/lib64/ld-linux-x86-64.so.2 (0x0000003080200000)

This results in an OSError when running the application on a workstation where Leptonica is not installed:

OSError: liblept.so.4: cannot open shared object file: No such file or directory

A typical use case looks like this (the Tesseract and Leptonica libraries are in the same folder):

import ctypes
import os
import sys

lang = 'eng'

# os.putenv() does not update os.environ, so assign through os.environ instead
os.environ['TESSDATA_PREFIX'] = "."
tessdata = os.environ.get('TESSDATA_PREFIX')
tess_libpath = "."
tess_libname = "libtesseract.so.3.0.2"
# tess_libname = "libtesseract302.dll" works on Windows, with no need to add a Leptonica library file

# Extending PATH lets Windows find dependent DLLs; the Linux dynamic loader ignores PATH
os.environ["PATH"] += os.pathsep + tess_libpath
tesseract = None

try:
    tesseract = ctypes.cdll.LoadLibrary(os.path.join(tess_libpath, tess_libname))
except OSError:
    raise  # on the Linux workstation: "liblept.so.4: cannot open shared object file"

# Opaque handle type for the Tesseract C API
class _TessBaseAPI(ctypes.Structure): pass
TessBaseAPI = ctypes.POINTER(_TessBaseAPI)
tesseract.TessBaseAPICreate.restype = TessBaseAPI
tesseract.TessBaseAPIDelete.restype = None
tesseract.TessBaseAPIDelete.argtypes = [TessBaseAPI]
tesseract.TessBaseAPIInit3.restype = ctypes.c_int
tesseract.TessBaseAPIInit3.argtypes = [TessBaseAPI,
                                       ctypes.c_char_p,   # datapath
                                       ctypes.c_char_p]   # language
tesseract.TessBaseAPISetImage.restype = None
tesseract.TessBaseAPISetImage.argtypes = [TessBaseAPI,
                                          ctypes.c_void_p,  # image data
                                          ctypes.c_int,     # width
                                          ctypes.c_int,     # height
                                          ctypes.c_int,     # bytes per pixel
                                          ctypes.c_int]     # bytes per line
tesseract.TessBaseAPIGetUTF8Text.restype = ctypes.c_char_p
tesseract.TessBaseAPIGetUTF8Text.argtypes = [TessBaseAPI]
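Extending PATH, as the code above does, only helps on Windows: the Linux dynamic loader never consults PATH, and it reads LD_LIBRARY_PATH only when the process starts. A workaround often used with ctypes is to load liblept.so.4 from the bundle folder first, relying on glibc reusing an already-loaded library whose soname matches a NEEDED entry, so libtesseract no longer has to find Leptonica on the system search path. The helper below is a minimal sketch under that assumption; load_bundled is a hypothetical name, and the list must name dependencies before the libraries that need them.

```python
import ctypes
import os


def load_bundled(libdir, names):
    """Hypothetical helper: load shared libraries from `libdir` in the given order.

    Each library is opened by absolute path with RTLD_GLOBAL. A library loaded
    later (e.g. libtesseract) can then use an already-loaded dependency
    (e.g. liblept) whose soname matches its NEEDED entry, instead of searching
    the system library paths for it.
    """
    handles = []
    for name in names:
        # A path containing a slash makes dlopen load exactly this file
        path = os.path.join(os.path.abspath(libdir), name)
        handles.append(ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL))
    return handles
```

With the question's layout this would be called as load_bundled(".", ["liblept.so.4", "libtesseract.so.3.0.2"]), where "." is resolved against the current working directory; the restype/argtypes setup then applies unchanged to the second handle.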

I tried passing --disable-shared --enable-static when configuring Tesseract, but it didn't help.

In my case, the target OS is CentOS 6.5, but I would appreciate any general answer.



1 answer


ldd can't tell you whether an absolute path is stored in the library. Instead, it resolves each dependency through the standard shared library search path and prints what it finds.
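To see what the library itself records, you can read its dynamic section with readelf -d (from GNU binutils): NEEDED entries name dependencies, usually by soname only, and RPATH or RUNPATH entries, if present, list directories the loader searches. The Python helper below is a minimal sketch of that check; the function names are hypothetical, and it assumes readelf is installed and prints its usual "Shared library: [...]" line format.

```python
import subprocess


def dynamic_entries(readelf_output, tags=("NEEDED", "SONAME", "RPATH", "RUNPATH")):
    """Return (tag, value) pairs for the given tags from `readelf -d` output."""
    entries = []
    for line in readelf_output.splitlines():
        # A typical line looks like:
        #  0x0000000000000001 (NEEDED)   Shared library: [liblept.so.4]
        for tag in tags:
            if "(%s)" % tag in line and "[" in line and "]" in line:
                value = line[line.index("[") + 1:line.rindex("]")]
                entries.append((tag, value))
                break
    return entries


def inspect_library(path):
    """Run `readelf -d` (GNU binutils) on `path` and parse its dynamic section."""
    output = subprocess.check_output(["readelf", "-d", path],
                                     universal_newlines=True)
    return dynamic_entries(output)
```

For the library in the question, inspect_library("libtesseract.so.3.0.2") would be expected to report liblept.so.4 as a NEEDED entry, which is a bare soname rather than a path.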

To check whether loading will work from a different folder, try this:



> mkdir tmp
> cd tmp
> cp /usr/local/lib/liblept.so.4 .
> cp /path/to/libtesseract.so.3.0.2 .
> LD_LIBRARY_PATH=$PWD:$LD_LIBRARY_PATH ldd libtesseract.so.3.0.2

Now ldd should show liblept.so.4 as found in the tmp folder.
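The same LD_LIBRARY_PATH approach can be applied to the Python application itself. Because the loader reads the variable only at startup, assigning it in os.environ after Python has started is too late; one option, sketched here under that assumption, is to restart the script once with the variable set. The helper names and the _BUNDLED_LIBS_SET marker variable are hypothetical.

```python
import os
import sys


def env_with_libdir(libdir, environ):
    """Return a copy of `environ` with `libdir` prepended to LD_LIBRARY_PATH."""
    env = dict(environ)
    libdir = os.path.abspath(libdir)
    current = env.get("LD_LIBRARY_PATH")
    env["LD_LIBRARY_PATH"] = libdir + os.pathsep + current if current else libdir
    return env


def reexec_with_libdir(libdir):
    """Restart the running script with `libdir` on LD_LIBRARY_PATH.

    os.execve replaces the current process, so the new interpreter starts
    with the updated environment. The hypothetical _BUNDLED_LIBS_SET marker
    prevents an endless restart loop.
    """
    if os.environ.get("_BUNDLED_LIBS_SET") == "1":
        return
    env = env_with_libdir(libdir, os.environ)
    env["_BUNDLED_LIBS_SET"] = "1"
    os.execve(sys.executable, [sys.executable] + sys.argv, env)
```

Calling reexec_with_libdir(os.path.dirname(os.path.abspath(__file__))) at the very top of the script, before any library is loaded, would make the folder containing liblept.so.4 visible to the loader. A small shell wrapper that exports LD_LIBRARY_PATH and then starts Python achieves the same without re-executing.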
