PDF convert to black and white PNG

I am trying to compress PDFs using iTextSharp. Many pages contain color images stored as JPEG (DCTDecode) streams, so I convert them to black-and-white PNG and replace them in the document (PNG is much smaller than JPEG for black-and-white images).

I have the following methods:

    private static bool TryCompressPdfImages(PdfReader reader)
    {
        try
        {
            int n = reader.XrefSize;
            for (int i = 0; i < n; i++)
            {
                PdfObject obj = reader.GetPdfObject(i);
                if (obj == null || !obj.IsStream())
                {
                    continue;
                }

                var dict = (PdfDictionary)PdfReader.GetPdfObject(obj);
                var subType = (PdfName)PdfReader.GetPdfObject(dict.Get(PdfName.SUBTYPE));
                if (!PdfName.IMAGE.Equals(subType))
                {
                    continue;
                }

                var stream = (PRStream)obj;
                try
                {
                    var image = new PdfImageObject(stream);

                    Image img = image.GetDrawingImage();
                    if (img == null) continue;

                    using (img)
                    {
                        int width = img.Width;
                        int height = img.Height;

                        using (var msImg = new MemoryStream())
                        using (var bw = img.ToBlackAndWhite())
                        {
                            bw.Save(msImg, ImageFormat.Png);
                            msImg.Position = 0;
                            stream.SetData(msImg.ToArray(), false, PdfStream.NO_COMPRESSION);
                            stream.Put(PdfName.TYPE, PdfName.XOBJECT);
                            stream.Put(PdfName.SUBTYPE, PdfName.IMAGE);
                            stream.Put(PdfName.FILTER, PdfName.FLATEDECODE);
                            stream.Put(PdfName.WIDTH, new PdfNumber(width));
                            stream.Put(PdfName.HEIGHT, new PdfNumber(height));
                            stream.Put(PdfName.BITSPERCOMPONENT, new PdfNumber(8));
                            stream.Put(PdfName.COLORSPACE, PdfName.DEVICERGB);
                            stream.Put(PdfName.LENGTH, new PdfNumber(msImg.Length));
                        }
                    }
                }
                catch (Exception ex)
                {
                    Trace.TraceError(ex.ToString());
                }
                finally
                {
                    // may or may not help      
                    reader.RemoveUnusedObjects();
                }
            }
            return true;
        }
        catch (Exception ex)
        {
            Trace.TraceError(ex.ToString());
            return false;
        }
    }

    public static Image ToBlackAndWhite(this Image image)
    {
        image = new Bitmap(image);
        using (Graphics gr = Graphics.FromImage(image))
        {
            var grayMatrix = new[]
            {
                new[] {0.299f, 0.299f, 0.299f, 0, 0},
                new[] {0.587f, 0.587f, 0.587f, 0, 0},
                new[] {0.114f, 0.114f, 0.114f, 0, 0},
                new [] {0f, 0, 0, 1, 0},
                new [] {0f, 0, 0, 0, 1}
            };

            var ia = new ImageAttributes();
            ia.SetColorMatrix(new ColorMatrix(grayMatrix));
            ia.SetThreshold((float)0.8); // Change this threshold as needed
            var rc = new Rectangle(0, 0, image.Width, image.Height);
            gr.DrawImage(image, rc, 0, 0, image.Width, image.Height, GraphicsUnit.Pixel, ia);
        }
        return image;
    }

      

I've tried different COLORSPACE and BITSPERCOMPONENT values, but I always get "Not enough data for the image", "Not enough memory" or "An error exists on this page" when opening the resulting PDF, so I must be doing something wrong. I'm sure FLATEDECODE is the right filter to use.

Any help would be much appreciated.

1 answer


Question:

You have a PDF with a color JPG. For example: image.pdf

If you look inside this PDF, you will see that the filter of the image stream is /DCTDecode and the color space is /DeviceRGB.

Now you want to replace the image in the PDF so that the result looks like this: image_replaced.pdf

In this PDF, the filter has changed to /FlateDecode and the color space has changed to /DeviceGray.

In the conversion process, you want to use the PNG format.

Example:

I made you an example that does this conversion: ReplaceImage

I will explain this example step by step:

Step 1: find the image

In my example, I know there is only one image, so I take a shortcut to get the PRStream that holds the image dictionary and the image bytes.

    PdfReader reader = new PdfReader(src);
    // Walk from page 1 to its /Resources and then to the /XObject dictionary
    PdfDictionary page = reader.getPageN(1);
    PdfDictionary resources = page.getAsDict(PdfName.RESOURCES);
    PdfDictionary xobjects = resources.getAsDict(PdfName.XOBJECT);
    // Take the first XObject and assume it is the image
    PdfName imgRef = xobjects.getKeys().iterator().next();
    PRStream stream = (PRStream) xobjects.getAsStream(imgRef);

      

I go to the /XObject dictionary in the /Resources of the page dictionary of page 1. I take the first XObject I come across, assuming it is an image, and I get that image as a PRStream object.

Your code is better than mine, but this piece of code is irrelevant to your question and works in the context of my example, so let's ignore the fact that this won't work for other PDFs. What you really need are steps 2 and 3.

Step 2: convert color JPG to black and white PNG

Let's write a method that takes a PdfImageObject and converts it to an Image object that is changed to gray and saved as PNG:

    public static Image makeBlackAndWhitePng(PdfImageObject image) throws IOException, DocumentException {
        // Decode the original image
        BufferedImage bi = image.getBufferedImage();
        // Redraw it onto a gray canvas of the same size
        BufferedImage newBi = new BufferedImage(bi.getWidth(), bi.getHeight(), BufferedImage.TYPE_USHORT_GRAY);
        newBi.getGraphics().drawImage(bi, 0, 0, null);
        // Encode the result as PNG and wrap the bytes in an iText Image
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ImageIO.write(newBi, "png", baos);
        return Image.getInstance(baos.toByteArray());
    }

      

We convert the original image to black and white using standard BufferedImage manipulations: we draw the original image bi onto a new image newBi of type TYPE_USHORT_GRAY.



Once this is done, you need the bytes of the PNG image. This is also done using standard ImageIO functionality: we just write the BufferedImage to a byte array, telling ImageIO that we want "png".
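A note on terminology: TYPE_USHORT_GRAY gives a 16-bit grayscale image, not a strictly two-tone one. If a 1-bit result is what you are after, a minimal sketch using only the JDK could look like the following. The class BinaryPngSketch and its methods are hypothetical names made up for this illustration; they are not part of iText or of the ReplaceImage example, and the way Java2D maps intermediate colors to black or white is left at its defaults here.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper class, for illustration only.
final class BinaryPngSketch {

    /** Redraws src onto a 1-bit-per-pixel canvas (black/white palette). */
    static BufferedImage toBinary(BufferedImage src) {
        BufferedImage bw = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        Graphics2D g = bw.createGraphics();
        try {
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose(); // release the graphics context
        }
        return bw;
    }

    /** Encodes the image as PNG and returns the file bytes. */
    static byte[] toPng(BufferedImage img) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(img, "png", baos)) {
                throw new IllegalStateException("No PNG writer available");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) {
        // Synthetic test picture: white page with a black box.
        BufferedImage color = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = color.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, 200, 100);
        g.setColor(Color.BLACK);
        g.fillRect(20, 20, 60, 40);
        g.dispose();

        byte[] png = toPng(toBinary(color));
        System.out.println("1-bit PNG size in bytes: " + png.length);
    }
}
```

Whether such a 1-bit PNG passes cleanly through Image.getInstance() and PdfImage is not covered by the example in this answer and would need testing.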

We can use the resulting bytes to create an Image object.

    Image img = makeBlackAndWhitePng(new PdfImageObject(stream));

      

We now have an iText Image object, but note that the image bytes stored in this Image object are no longer in PNG format. As mentioned in the comments, PNG is not supported in PDF. iText will change the image bytes into a format that is supported in PDF (see section 4.2.6.2 of The ABC of PDF for details).
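To make the point about PNG concrete: a stream with /Filter /FlateDecode, /ColorSpace /DeviceGray and /BitsPerComponent 8 is expected to hold the zlib-compressed raw samples, one byte per pixel, not a complete PNG file with its signature and chunks. The sketch below uses only the JDK and hypothetical names (FlateGraySketch, graySamples, deflate, inflate) to show roughly what such data looks like. It is a simplified picture under those assumptions, not iText's actual code path for PNG images, which may differ.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class, for illustration only.
final class FlateGraySketch {

    /** Returns width * height bytes: one 8-bit gray sample per pixel, row by row. */
    static byte[] graySamples(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        byte[] samples = new byte[w * h];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Same luminance weights as the color matrix in the question.
                samples[y * w + x] = (byte) ((299 * r + 587 * g + 114 * b + 500) / 1000);
            }
        }
        return samples;
    }

    /** Compresses data in zlib format, which is what /FlateDecode reverses. */
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Reverses deflate(); roughly what a viewer does before reading samples. */
    static byte[] inflate(byte[] data) {
        try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(data))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(100, 50, BufferedImage.TYPE_INT_RGB);
        byte[] samples = graySamples(img);
        byte[] streamBytes = deflate(samples);
        System.out.println("raw samples: " + samples.length
                + ", deflated: " + streamBytes.length);
    }
}
```

A viewer that inflates the stream and finds PNG file bytes instead of the expected samples will typically complain about missing or invalid image data.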

Step 3: replacing the original image stream with a new image stream

We now have an Image object, but what we really need is to replace the original image stream with a new one, and we also need to adapt the image dictionary, as /DCTDecode will change to /FlateDecode, /DeviceRGB will change to /DeviceGray, and the value of /Length will also be different.

You are creating the image stream and its dictionary by hand. That's brave. I leave this job to iText's PdfImage object:

    PdfImage image = new PdfImage(makeBlackAndWhitePng(new PdfImageObject(stream)), "", null);

      

PdfImage extends PdfStream, so I can now replace the original stream with this new stream:

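    public static void replaceStream(PRStream orig, PdfStream stream) throws IOException {
        // Drop all entries of the original image dictionary
        orig.clear();
        // Copy the bytes of the new stream into the original stream object
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        stream.writeContent(baos);
        orig.setData(baos.toByteArray(), false);
        // Copy the dictionary entries last, after setData()
        for (PdfName name : stream.getKeys()) {
            orig.put(name, stream.get(name));
        }
    }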

      

The order in which you do this is important: call setData() first and copy the dictionary entries afterwards. You don't want the setData() method to tamper with the length and the filter.

Step 4: save the document after replacing the stream

I guess it's not hard to understand this part:

    replaceStream(stream, image);
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
    stamper.close();
    reader.close();

      

Problem:

I am not a C# developer. I know PDF inside out and I know Java.

  • If your problem is caused in step 2, you will have to post another question asking how to convert a color JPEG image to a black-and-white PNG image.
  • If your problem is caused in step 3 (for example, because you are using /DeviceRGB instead of /DeviceGray), then this answer will solve your problem.
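
The "Not enough data for the image" error is consistent with a mismatch between the image dictionary and the stream: a viewer derives the number of bytes it expects from /Width, /Height, the color space and /BitsPerComponent. As a rough sketch (ImageDataLength and expectedLength are hypothetical names, and rows are assumed to be padded to whole bytes, as the PDF specification describes for sampled images):

```java
// Hypothetical helper class, for illustration only.
final class ImageDataLength {

    /**
     * Number of decoded bytes a viewer expects for a sampled image.
     * components is 3 for /DeviceRGB and 1 for /DeviceGray.
     */
    static long expectedLength(int width, int height, int components, int bitsPerComponent) {
        long bitsPerRow = (long) width * components * bitsPerComponent;
        long bytesPerRow = (bitsPerRow + 7) / 8; // each row starts on a byte boundary
        return bytesPerRow * height;
    }

    public static void main(String[] args) {
        System.out.println(expectedLength(100, 50, 3, 8)); // /DeviceRGB, 8 bits: 15000
        System.out.println(expectedLength(100, 50, 1, 8)); // /DeviceGray, 8 bits: 5000
        System.out.println(expectedLength(100, 50, 1, 1)); // /DeviceGray, 1 bit: 650
    }
}
```

Declaring /DeviceRGB for data that holds one gray sample per pixel therefore leaves the viewer with only a third of the bytes it expects.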