Node.js: how to use a URL as the PDF path with pdf2json
I am using Node.js and the pdf2json parser to parse a PDF file. It currently works with a local PDF file, but I want to fetch a PDF over HTTP (for example with the Node.js url / http modules) and parse that instead.
Is it possible to parse a PDF that is hosted online?
let query = url.parse(req.url, true).query;
let pdfLink = query.pdf;
...
pdfParser.loadPDF(pdfLink + "");
So the PDF location must be passed in the query string, like: https://localhost:8080/?pdf=http://whale-cms.de/pdf.pdf
Is there a way to parse a PDF directly from a link like this?
Thanks in advance.
1 answer
I just ran into the same problem and found a solution:
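// Note: the "request" package has since been deprecated, but this still shows the idea.
const request = require("request");
const PDFParser = require("pdf2json");
const pdfUrl = "http://localhost:3000/cdn/storage/PDFFiles/sk87bAfiXxPre428b/original/sk87bAfiXxPre428b";
const pdfParser = new PDFParser();
// encoding: null keeps the response body as raw bytes instead of decoding it to a string.
// pipe() returns its destination, so pdfPipe is the same object as pdfParser.
const pdfPipe = request({ url: pdfUrl, encoding: null }).pipe(pdfParser);
pdfPipe.on("pdfParser_dataError", err => console.error(err));
pdfPipe.on("pdfParser_dataReady", pdf => {
  // The parsed document is available here; as an example, list its form field types.
  const usedFieldsInTheDocument = pdfParser.getAllFieldsTypes();
  console.log(usedFieldsInTheDocument);
});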
Source:
https://github.com/modesty/pdf2json/issues/65
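If you would rather not depend on the request package, the same idea can be sketched with Node's built-in http/https modules: download the PDF into a Buffer first, then hand it to pdf2json's parseBuffer. This is only a minimal sketch, not the approach from the linked issue. The names fetchBuffer and parseRemotePdf are hypothetical helpers made up for illustration, and the sketch assumes the server answers with a plain 200 response (redirects are not followed).

```javascript
const http = require("http");
const https = require("https");

// Hypothetical helper: download a URL into a single Buffer
// using only Node's built-in modules.
function fetchBuffer(url) {
  return new Promise((resolve, reject) => {
    // Pick the client that matches the URL scheme.
    const client = url.startsWith("https:") ? https : http;
    client
      .get(url, (res) => {
        if (res.statusCode !== 200) {
          res.resume(); // discard the body so the socket is released
          reject(new Error(`Request failed with status ${res.statusCode}`));
          return;
        }
        const chunks = [];
        res.on("data", (chunk) => chunks.push(chunk));
        res.on("end", () => resolve(Buffer.concat(chunks)));
        res.on("error", reject);
      })
      .on("error", reject);
  });
}

// Hypothetical helper: fetch the PDF, then pass the raw bytes to pdf2json.
async function parseRemotePdf(url) {
  // Loaded lazily so fetchBuffer can be used on its own;
  // assumes pdf2json is installed.
  const PDFParser = require("pdf2json");
  const buffer = await fetchBuffer(url);
  return new Promise((resolve, reject) => {
    const pdfParser = new PDFParser();
    pdfParser.on("pdfParser_dataError", reject);
    pdfParser.on("pdfParser_dataReady", resolve);
    pdfParser.parseBuffer(buffer);
  });
}
```

In the question's setup you would call parseRemotePdf(pdfLink) with the value read from the query string and handle the resolved data or the rejected error there. Since that URL comes from the client, it is probably worth validating it before fetching.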
Greetings