PDF convert to black and white PNG
I am trying to compress PDFs using iTextSharp. Many pages contain color images stored as JPEG (/DCTDecode), so I convert them to black and white PNG and replace them in the document (PNG is much smaller than JPEG for black and white images).
I have the following methods:
private static bool TryCompressPdfImages(PdfReader reader)
{
    try
    {
        int n = reader.XrefSize;
        for (int i = 0; i < n; i++)
        {
            PdfObject obj = reader.GetPdfObject(i);
            if (obj == null || !obj.IsStream())
            {
                continue;
            }
            var dict = (PdfDictionary)PdfReader.GetPdfObject(obj);
            var subType = (PdfName)PdfReader.GetPdfObject(dict.Get(PdfName.SUBTYPE));
            if (!PdfName.IMAGE.Equals(subType))
            {
                continue;
            }
            var stream = (PRStream)obj;
            try
            {
                var image = new PdfImageObject(stream);
                Image img = image.GetDrawingImage();
                if (img == null) continue;
                using (img)
                {
                    int width = img.Width;
                    int height = img.Height;
                    using (var msImg = new MemoryStream())
                    using (var bw = img.ToBlackAndWhite())
                    {
                        bw.Save(msImg, ImageFormat.Png);
                        msImg.Position = 0;
                        stream.SetData(msImg.ToArray(), false, PdfStream.NO_COMPRESSION);
                        stream.Put(PdfName.TYPE, PdfName.XOBJECT);
                        stream.Put(PdfName.SUBTYPE, PdfName.IMAGE);
                        stream.Put(PdfName.FILTER, PdfName.FLATEDECODE);
                        stream.Put(PdfName.WIDTH, new PdfNumber(width));
                        stream.Put(PdfName.HEIGHT, new PdfNumber(height));
                        stream.Put(PdfName.BITSPERCOMPONENT, new PdfNumber(8));
                        stream.Put(PdfName.COLORSPACE, PdfName.DEVICERGB);
                        stream.Put(PdfName.LENGTH, new PdfNumber(msImg.Length));
                    }
                }
            }
            catch (Exception ex)
            {
                Trace.TraceError(ex.ToString());
            }
            finally
            {
                // may or may not help
                reader.RemoveUnusedObjects();
            }
        }
        return true;
    }
    catch (Exception ex)
    {
        Trace.TraceError(ex.ToString());
        return false;
    }
}
public static Image ToBlackAndWhite(this Image image)
{
    image = new Bitmap(image);
    using (Graphics gr = Graphics.FromImage(image))
    {
        var grayMatrix = new[]
        {
            new[] {0.299f, 0.299f, 0.299f, 0, 0},
            new[] {0.587f, 0.587f, 0.587f, 0, 0},
            new[] {0.114f, 0.114f, 0.114f, 0, 0},
            new[] {0f, 0, 0, 1, 0},
            new[] {0f, 0, 0, 0, 1}
        };
        var ia = new ImageAttributes();
        ia.SetColorMatrix(new ColorMatrix(grayMatrix));
        ia.SetThreshold((float)0.8); // Change this threshold as needed
        var rc = new Rectangle(0, 0, image.Width, image.Height);
        gr.DrawImage(image, rc, 0, 0, image.Width, image.Height, GraphicsUnit.Pixel, ia);
    }
    return image;
}
I've tried different COLORSPACE and BITSPERCOMPONENT values, but I always get "Not enough data for the image", "Not enough memory" or "An error exists on this page" when I open the resulting PDF, so I must be doing something wrong. I'm sure FLATEDECODE is the right filter to use.
Any help would be much appreciated.
The question:
You have a PDF with a color JPG. For example: image.pdf
If you look inside this PDF, you will see that the filter of the image stream is /DCTDecode and the color space is /DeviceRGB.
Now you want to replace the image in the PDF so that the result looks like this: image_replaced.pdf
In this PDF, the filter has changed to /FlateDecode and the color space has changed to /DeviceGray.
In the conversion process, you want to use the PNG format.
Example:
I made you an example that does this conversion: ReplaceImage
I will explain this example step by step:
Step 1: find the image
In my example, I know there is only one image, so I quickly retrieve the PRStream that holds the image dictionary and the image bytes.
PdfReader reader = new PdfReader(src);
PdfDictionary page = reader.getPageN(1);
PdfDictionary resources = page.getAsDict(PdfName.RESOURCES);
PdfDictionary xobjects = resources.getAsDict(PdfName.XOBJECT);
PdfName imgRef = xobjects.getKeys().iterator().next();
PRStream stream = (PRStream) xobjects.getAsStream(imgRef);
I go to the /XObject dictionary listed in the /Resources of the page dictionary of page 1. I take the first XObject I come across, assuming it is an image, and I get that image as a PRStream object.
Your code is better than mine, but this part of the code isn't relevant to your question and it works in the context of my example, so let's ignore the fact that it won't work for other PDFs. What you really need are steps 2 and 3.
Step 2: convert color JPG to black and white PNG
Let's write a method that takes a PdfImageObject and converts it into an Image object that has been changed to gray and stored as a PNG:
public static Image makeBlackAndWhitePng(PdfImageObject image) throws IOException, DocumentException {
    BufferedImage bi = image.getBufferedImage();
    BufferedImage newBi = new BufferedImage(bi.getWidth(), bi.getHeight(), BufferedImage.TYPE_USHORT_GRAY);
    newBi.getGraphics().drawImage(bi, 0, 0, null);
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ImageIO.write(newBi, "png", baos);
    return Image.getInstance(baos.toByteArray());
}
We convert the original image to black and white using standard BufferedImage manipulations: we draw the original image bi onto a new image newBi of type TYPE_USHORT_GRAY.
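The BufferedImage type chosen for the target image decides how many bits each pixel gets. Below is a minimal sketch using only the Java standard library, not part of the ReplaceImage example; the class and method names (GrayDepthSketch, redraw, bitsPerPixel) are made up for illustration. It redraws a small RGB image as 16-bit gray, 8-bit gray and 1-bit black and white, and reports the resulting pixel size. How iText maps each of these to /BitsPerComponent is not checked here.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

class GrayDepthSketch {

    /** Redraws src onto a new image of the given BufferedImage type. */
    static BufferedImage redraw(BufferedImage src, int type) {
        BufferedImage dst = new BufferedImage(src.getWidth(), src.getHeight(), type);
        Graphics2D g = dst.createGraphics();
        try {
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose(); // release the graphics context once drawing is done
        }
        return dst;
    }

    /** Bits used per pixel after redrawing a small RGB image as the given type. */
    static int bitsPerPixel(int type) {
        BufferedImage rgb = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        return redraw(rgb, type).getColorModel().getPixelSize();
    }

    public static void main(String[] args) {
        System.out.println("TYPE_USHORT_GRAY: " + bitsPerPixel(BufferedImage.TYPE_USHORT_GRAY));
        System.out.println("TYPE_BYTE_GRAY:   " + bitsPerPixel(BufferedImage.TYPE_BYTE_GRAY));
        System.out.println("TYPE_BYTE_BINARY: " + bitsPerPixel(BufferedImage.TYPE_BYTE_BINARY));
    }
}
```

If file size is the goal, TYPE_BYTE_GRAY or TYPE_BYTE_BINARY may be worth trying in place of TYPE_USHORT_GRAY, since they store fewer bits per pixel.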
Once this is done, you need the bytes of the image in PNG format. This is also done using standard ImageIO functionality: we just write the BufferedImage to a byte array, telling ImageIO that we want "png".
We can use the resulting bytes to create an Image object.
Image img = makeBlackAndWhitePng(new PdfImageObject(stream));
We now have an iText Image object, but note that the image bytes stored in this Image object are no longer in PNG format. As mentioned in the comments, PNG is not supported in PDF. iText converts the image bytes into a format that is supported in PDF (see section 4.2.6.2 of The ABC of PDF for details).
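To see why a PNG file cannot simply be labelled /FlateDecode, it helps to compare the two byte sequences. The sketch below uses only the Java standard library and is not what iText does internally; all names in it (PngVersusFlateSketch, toPngFile, toDeflatedSamples and so on) are hypothetical. It assumes an 8-bit gray image and shows that a PNG file starts with its own signature, whereas a /FlateDecode stream is expected to inflate directly to the raw samples.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;
import javax.imageio.ImageIO;

class PngVersusFlateSketch {

    /** Creates an empty 8-bit gray image to experiment with. */
    static BufferedImage grayImage(int w, int h) {
        return new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
    }

    /** Encodes the image as a complete PNG file (signature, chunks, checksums). */
    static byte[] toPngFile(BufferedImage img) {
        try {
            ImageIO.setUseCache(false); // keep everything in memory
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(img, "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deflates the raw 8-bit gray samples, row after row, with no container. */
    static byte[] toDeflatedSamples(BufferedImage gray) {
        int w = gray.getWidth();
        int h = gray.getHeight();
        int[] samples = gray.getRaster().getSamples(0, 0, w, h, 0, (int[]) null);
        byte[] raw = new byte[samples.length];
        for (int i = 0; i < samples.length; i++) {
            raw[i] = (byte) samples[i];
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream z = new DeflaterOutputStream(out)) {
            z.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Number of bytes obtained by inflating the given zlib data. */
    static int inflatedLength(byte[] deflated) {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(deflated))) {
            byte[] buf = new byte[4096];
            int total = 0;
            for (int r; (r = in.read(buf)) != -1; ) {
                total += r;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if the data begins with the PNG file signature. */
    static boolean isPngFile(byte[] data) {
        return data.length >= 4 && (data[0] & 0xFF) == 0x89
                && data[1] == 'P' && data[2] == 'N' && data[3] == 'G';
    }

    public static void main(String[] args) {
        BufferedImage img = grayImage(5, 3);
        System.out.println("PNG file starts with signature: " + isPngFile(toPngFile(img)));
        System.out.println("Deflated samples are a PNG file: " + isPngFile(toDeflatedSamples(img)));
        System.out.println("Inflated sample bytes: " + inflatedLength(toDeflatedSamples(img)));
    }
}
```

A viewer that inflates the stream expects only sample bytes, so handing it a whole PNG file under /FlateDecode is a plausible source of errors like the ones quoted in the question.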
Step 3: replacing the original image stream with a new image stream
We now have an Image object, but what we really need is to replace the original image stream with a new one, and we also need to adapt the image dictionary: /DCTDecode changes to /FlateDecode, /DeviceRGB changes to /DeviceGray, and the value of /Length will also be different.
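The dictionary entries have to agree with the decoded data. As a rough check, here is a small sketch that computes how many sample bytes a viewer expects from /Width, /Height, the number of color components and /BitsPerComponent; it is plain Java with a made-up helper name (expectedBytes), not an iText call. It relies on the PDF rule that every row of samples starts on a byte boundary.

```java
class ImageDataLengthSketch {

    /**
     * Decoded sample bytes implied by an image dictionary.
     * components is 3 for /DeviceRGB and 1 for /DeviceGray.
     */
    static long expectedBytes(int width, int height, int components, int bitsPerComponent) {
        long bitsPerRow = (long) width * components * bitsPerComponent;
        long bytesPerRow = (bitsPerRow + 7) / 8; // each row is padded to a whole byte
        return bytesPerRow * height;
    }

    public static void main(String[] args) {
        // A 100 x 50 image under three different dictionary settings.
        System.out.println(expectedBytes(100, 50, 3, 8)); // /DeviceRGB, 8 bits: 15000
        System.out.println(expectedBytes(100, 50, 1, 8)); // /DeviceGray, 8 bits: 5000
        System.out.println(expectedBytes(100, 50, 1, 1)); // /DeviceGray, 1 bit: 650
    }
}
```

If the stream holds gray samples while the dictionary still says /DeviceRGB, the viewer expects three times as many bytes as it gets, which may be what a message such as "Not enough data for the image" refers to.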
You are creating the image stream and its dictionary by hand. That's brave. I leave this job to iText's PdfImage object:
PdfImage image = new PdfImage(makeBlackAndWhitePng(new PdfImageObject(stream)), "", null);
PdfImage extends PdfStream, so I can now replace the original stream with the new one:
public static void replaceStream(PRStream orig, PdfStream stream) throws IOException {
    orig.clear();
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    stream.writeContent(baos);
    orig.setData(baos.toByteArray(), false);
    for (PdfName name : stream.getKeys()) {
        orig.put(name, stream.get(name));
    }
}
The order in which you do things here is important. You don't want the setData() method to tamper with the length and the filter.
Step 4: save the document after replacing the stream
I guess it's not hard to understand this part:
replaceStream(stream, image);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
stamper.close();
reader.close();
The problem:
I am not a C# developer. I know PDF inside out and I know Java.
- If your problem is caused in step 2, you will have to post another question asking how to convert a color JPEG image to a black and white PNG image.
- If your problem is caused in step 3 (for example, because you are using /DeviceRGB instead of /DeviceGray), then this answer solves your problem.