Combine many PDF files into one PDF file in a Java web application
I have a lot of PDFs and need to combine them into one big PDF and render it in the browser. I am using iText. With it I can merge the PDFs into one file on disk, but I cannot get the merge to work in the browser: only the last PDF is displayed there. My code is below; please help me with this.
Thanks in advance.
Document document = new Document();
List<PdfReader> readers =
new ArrayList<PdfReader>();
int totalPages = 0;
ServletOutputStream servletOutPutStream = response.getOutputStream();;
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();;
InputStream is=null;
List<InputStream> inputPdfList = new ArrayList<InputStream>();
System.err.println(imageMap.size());
for(byte[] imageList:imageMap)
{
System.out.println(imageList.toString()+" "+imageList.length);
byteArrayOutputStream.write(imageList);
byteArrayOutputStream.writeTo(response.getOutputStream());
is = new ByteArrayInputStream(byteArrayOutputStream.toByteArray());
inputPdfList.add(is);
}
response.setContentType("application/pdf");
response.setContentLength(byteArrayOutputStream.size());
System.out.println(inputPdfList.size()+""+inputPdfList.toString());
//Create pdf Iterator object using inputPdfList.
Iterator<InputStream> pdfIterator =
inputPdfList.iterator();
// Create reader list for the input pdf files.
while (pdfIterator.hasNext()) {
InputStream pdf = pdfIterator.next();
PdfReader pdfReader = new PdfReader(pdf);
readers.add(pdfReader);
totalPages = totalPages + pdfReader.getNumberOfPages();
}
// Create writer for the outputStream
PdfWriter writer = PdfWriter.getInstance(document, response.getOutputStream());
//Open document.
document.open();
//Contain the pdf data.
PdfContentByte pageContentByte = writer.getDirectContent();
PdfImportedPage pdfImportedPage;
int currentPdfReaderPage = 1;
Iterator<PdfReader> iteratorPDFReader = readers.iterator();
// Iterate and process the reader list.
while (iteratorPDFReader.hasNext()) {
PdfReader pdfReader = iteratorPDFReader.next();
//Create page and add content.
while (currentPdfReaderPage <= pdfReader.getNumberOfPages()) {
document.newPage();
pdfImportedPage = writer.getImportedPage(
pdfReader,currentPdfReaderPage);
pageContentByte.addTemplate(pdfImportedPage, 0, 0);
currentPdfReaderPage++;
}
currentPdfReaderPage = 1;
}
//Close document and outputStream.
servletOutPutStream.flush();
outputStream.flush();
document.close();
outputStream.close();
servletOutPutStream.close();
System.out.println("Pdf files merged successfully.");
There are many errors in your code:
Only write to the response output stream what you want to return to the browser
Your code writes a wild mix of data to the response output stream:
ServletOutputStream servletOutPutStream = response.getOutputStream();;
[...]
for(byte[] imageList:imageMap)
{
[...]
byteArrayOutputStream.writeTo(response.getOutputStream());
[...]
}
[...]
PdfWriter writer = PdfWriter.getInstance(document, response.getOutputStream());
[... merge PDFs into the writer]
servletOutPutStream.flush();
document.close();
servletOutPutStream.close();
As a result, many copies of the imageMap elements are written there, and the merged file is only added after them.
What do you expect the browser to do, ignore all the leading copies of the source PDFs until the merged PDF finally appears?
Thus, please only write the merged PDF to the response output stream.
Don't write the wrong content length
Setting the content length of the response is a good idea ... but only if you use the correct value!
In your code, you set the content length:
response.setContentLength(byteArrayOutputStream.size());
but at this point byteArrayOutputStream contains only a wild mix of copies of the source PDFs, not the final merged PDF. Thus, this confuses the browser even more.
Thus, do not add false headers to the response.
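The two points above (write only the merged PDF, and declare only its real length) can be sketched as follows. This is a minimal illustration, not servlet code: PdfResponse is a hypothetical stand-in for the three HttpServletResponse methods involved, introduced only so the sketch runs without a servlet container, and sendMergedPdf is likewise an illustrative name. It assumes the merged PDF has already been built completely in memory.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class SingleWriteSketch {

    // Hypothetical stand-in for the few HttpServletResponse methods used here,
    // so the sketch can run without a servlet container.
    interface PdfResponse {
        void setContentType(String type);
        void setContentLength(int length);
        OutputStream getOutputStream() throws IOException;
    }

    // Sends exactly one body: the finished merged PDF, with a matching length header.
    static void sendMergedPdf(byte[] mergedPdf, PdfResponse response) throws IOException {
        response.setContentType("application/pdf");
        response.setContentLength(mergedPdf.length); // length of the final bytes only
        OutputStream out = response.getOutputStream();
        out.write(mergedPdf);                        // the only write to the response
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        final ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PdfResponse fake = new PdfResponse() {
            public void setContentType(String type) { System.out.println("type=" + type); }
            public void setContentLength(int length) { System.out.println("length=" + length); }
            public OutputStream getOutputStream() { return sink; }
        };
        // Placeholder bytes standing in for a real merged PDF.
        sendMergedPdf(new byte[] {'%', 'P', 'D', 'F'}, fake);
        System.out.println("bytes written=" + sink.size());
    }
}
```

In a real servlet the same order applies to the actual response object: finish the merge first, then set the headers from the finished byte array, then write it once.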
Don't interfere with the input
In the loop
for(byte[] imageList:imageMap)
{
System.out.println(imageList.toString()+" "+imageList.length);
byteArrayOutputStream.write(imageList);
byteArrayOutputStream.writeTo(response.getOutputStream());
is = new ByteArrayInputStream(byteArrayOutputStream.toByteArray());
inputPdfList.add(is);
}
you take byte arrays that I assume each contain a single source PDF, pollute the response stream with them (as mentioned before), and create a collection of input streams in which the first contains the first source PDF, the second contains the concatenation of the first two source PDFs, the third the concatenation of the first three source PDFs, and so on.
Because you never reset or re-instantiate byteArrayOutputStream, it just keeps growing.
Thus, please reset byteArrayOutputStream at the start or end of each loop iteration.
(Actually you don't need this loop at all: PdfReader has a constructor that takes a byte[] directly, so there is no need to wrap it in a byte stream.)
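The accumulation described above is easy to reproduce with plain JDK classes. The following sketch uses dummy arrays of 10, 20 and 30 bytes in place of real PDFs (the class and method names are illustrative only) and records how large each snapshot of the buffer is, once without and once with a reset per iteration.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BufferGrowthDemo {

    // Mimics the loop from the question: one shared buffer that is never reset.
    static List<Integer> snapshotSizesWithoutReset(List<byte[]> inputs) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        List<Integer> sizes = new ArrayList<Integer>();
        for (byte[] input : inputs) {
            buffer.write(input, 0, input.length);
            sizes.add(buffer.toByteArray().length); // snapshot holds everything written so far
        }
        return sizes;
    }

    // Same loop, but the buffer is emptied at the start of each iteration.
    static List<Integer> snapshotSizesWithReset(List<byte[]> inputs) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        List<Integer> sizes = new ArrayList<Integer>();
        for (byte[] input : inputs) {
            buffer.reset();
            buffer.write(input, 0, input.length);
            sizes.add(buffer.toByteArray().length); // snapshot holds only the current input
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<byte[]> inputs = Arrays.asList(new byte[10], new byte[20], new byte[30]);
        System.out.println(snapshotSizesWithoutReset(inputs)); // prints [10, 30, 60]
        System.out.println(snapshotSizesWithReset(inputs));    // prints [10, 20, 30]
    }
}
```

Without the reset, the second and third snapshots contain the earlier inputs as well, which is exactly why the later input streams in the question hold concatenated PDFs.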
Don't merge PDFs using a plain PdfWriter, use PdfCopy
You merge the PDFs using the PdfWriter / getImportedPage / addTemplate approach. There are tons of Stack Overflow questions and answers (many of them answered by iText developers) explaining that this is usually a bad idea and that you should use PdfCopy instead.
Thus, please make use of the many good answers that already exist on this topic and use PdfCopy to merge.
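For orientation, a PdfCopy-based merge might look roughly like the sketch below. It is written under assumptions the question does not confirm: that iText 5 is on the classpath (package com.itextpdf.text; older iText 2 releases use com.lowagie.text instead), that every byte[] is one complete source PDF, and that there is at least one page in total, since closing an empty document fails. The class and method names are illustrative only.

```java
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfCopy;
import com.itextpdf.text.pdf.PdfReader;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.List;

class PdfMergeSketch {

    // Merges the given PDFs (one byte[] per source file) into a single in-memory PDF.
    static byte[] merge(List<byte[]> sourcePdfs) throws IOException, DocumentException {
        ByteArrayOutputStream merged = new ByteArrayOutputStream();
        Document document = new Document();
        PdfCopy copy = new PdfCopy(document, merged);
        document.open();
        for (byte[] sourcePdf : sourcePdfs) {
            // PdfReader accepts the byte[] directly; no stream wrapper is needed.
            PdfReader reader = new PdfReader(sourcePdf);
            for (int page = 1; page <= reader.getNumberOfPages(); page++) {
                copy.addPage(copy.getImportedPage(reader, page));
            }
            copy.freeReader(reader);
            reader.close();
        }
        // Closing the document finishes the PDF; 'merged' is complete only after this call.
        document.close();
        return merged.toByteArray();
    }
}
```

In the servlet, the returned array would then be the only thing written to the response output stream, and its length the value passed to setContentLength.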
Don't flush or close streams just because you can
You finish the response output by flushing and closing multiple streams:
//Close document and outputStream.
servletOutPutStream.flush();
outputStream.flush();
document.close();
outputStream.close();
servletOutPutStream.close();
I have not seen the line where you declare or set the outputStream variable, but even if it holds the response output stream, there is no need to close it because you already close that stream via the servletOutPutStream variable.
Thus, remove unnecessary calls like these.