Tesseract: OCR IndexOutOfBoundsException

I am working on a Spring MVC application that uses Tesseract (through Tess4J) for OCR. I get an IndexOutOfBoundsException for the file I pass to it. Any ideas?

Error log:

net.sourceforge.tess4j.TesseractException: java.lang.IndexOutOfBoundsException
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:215)
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:196)
    at com.tooltank.spring.service.GroupAttachmentsServiceImpl.testOcr(GroupAttachmentsServiceImpl.java:839)
    at com.tooltank.spring.service.GroupAttachmentsServiceImpl.lambda$addAttachment$0(GroupAttachmentsServiceImpl.java:447)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IndexOutOfBoundsException
    at javax.imageio.stream.FileCacheImageOutputStream.seek(FileCacheImageOutputStream.java:170)
    at net.sourceforge.tess4j.util.ImageIOHelper.getImageByteBuffer(ImageIOHelper.java:297)
    at net.sourceforge.tess4j.Tesseract.setImage(Tesseract.java:397)
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:290)
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:212)
    ... 4 more

      

Code:

    private String testOcr(String fileLocation, int attachId) {
        try {
            File imageFile = new File(fileLocation);
            BufferedImage img = ImageIO.read(imageFile);
            BufferedImage blackNWhite = new BufferedImage(img.getWidth(), img.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
            Graphics2D graphics = blackNWhite.createGraphics();
            graphics.drawImage(img, 0, 0, null);
            graphics.dispose(); // release the graphics context once drawing is done
            String identifier = new BigInteger(130, random).toString(32);
            String blackAndWhiteImage = previewPath + identifier + ".png";
            File outputfile = new File(blackAndWhiteImage);
            ImageIO.write(blackNWhite, "png", outputfile);

            ITesseract instance = new Tesseract();
            // Point to the folder one level above the tessdata directory, which must contain the training data
            instance.setDatapath("/usr/share/tesseract-ocr/");
            // Language code as defined by ISO 639-3
            instance.setLanguage("deu");
            String result = instance.doOCR(outputfile);
            result = result.replaceAll("[^a-zA-Z0-9öÖäÄüÜß@\\s]", "");
            Files.delete(new File(blackAndWhiteImage).toPath());
            GroupAttachments groupAttachments = this.groupAttachmentsDAO.getAttachmenById(attachId);
            System.out.println("OCR Result is "+result);
            if (groupAttachments != null) {
                saveIndexes(result, groupAttachments.getFileName(), null, groupAttachments.getGroupId(), false, attachId);
            }
            return result;
        } catch (Exception e) {
            e.printStackTrace();

        }
        return null;
    }

      

Thanks.

2 answers


Due to a bug in Java Image IO (fixed in Java 9), the current version of Tess4J, the Java wrapper for Tesseract (3.4.0 at the time of writing), does not work on Java versions below 9. To run on an older Java version, you can try the following replacement for the getImageByteBuffer method of Tess4J's ImageIOHelper class. Copy the class into your project and make this change; it should then work with both files and BufferedImages.

Note: this version omits the TIFF optimization used in the original class. Add it back if your project needs it.



public static ByteBuffer getImageByteBuffer(RenderedImage image) throws IOException {
    // A BufferedImage can be converted directly
    if (image instanceof BufferedImage) {
        return convertImageData((BufferedImage) image);
    }
    // Any other RenderedImage is first copied into a new BufferedImage
    ColorModel cm = image.getColorModel();
    int width = image.getWidth();
    int height = image.getHeight();
    WritableRaster raster = cm.createCompatibleWritableRaster(width, height);
    boolean isAlphaPremultiplied = cm.isAlphaPremultiplied();
    // Carry over the image properties, if there are any
    Hashtable<String, Object> properties = new Hashtable<>();
    String[] keys = image.getPropertyNames();
    if (keys != null) {
        for (String key : keys) {
            properties.put(key, image.getProperty(key));
        }
    }
    BufferedImage result = new BufferedImage(cm, raster, isAlphaPremultiplied, properties);
    image.copyData(raster);
    return convertImageData(result);
}
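
For illustration only: the stack trace in the question fails in FileCacheImageOutputStream.seek, called from ImageIOHelper.getImageByteBuffer, while the replacement above hands the image to convertImageData instead. I have not reproduced Tess4J's implementation here. The sketch below only shows the general idea of reading pixels straight into a direct ByteBuffer without any ImageIO stream. The class name PixelBufferSketch, the method toGrayBuffer, and the 8-bit grayscale layout are my own assumptions, not Tess4J API.

```java
import java.awt.image.BufferedImage;
import java.nio.ByteBuffer;

// Hypothetical helper class, not part of Tess4J.
class PixelBufferSketch {

    // Copies the image into a direct buffer, one grayscale byte per pixel, row by row.
    static ByteBuffer toGrayBuffer(BufferedImage image) {
        int width = image.getWidth();
        int height = image.getHeight();
        ByteBuffer buffer = ByteBuffer.allocateDirect(width * height);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of the Rec. 601 luma weights
                int gray = (299 * r + 587 * g + 114 * b) / 1000;
                buffer.put((byte) gray);
            }
        }
        buffer.flip(); // switch the buffer from writing to reading
        return buffer;
    }

    public static void main(String[] args) {
        BufferedImage image = new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB);
        ByteBuffer buffer = toGrayBuffer(image);
        System.out.println("Bytes ready for OCR: " + buffer.remaining());
    }
}
```

Because nothing is written through an ImageOutputStream here, the seek call from the stack trace never runs. Whether Tess4J accepts this exact pixel layout is something I have not verified, so treat it as a picture of the approach rather than a drop-in replacement.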

      



Try updating Tess4J to version 3.4.1. That solved the problem for me.


