Node.js: How to use a URL as the PDF path with pdf2json

I am using Node.js and the pdf2json parser to parse a PDF file. This currently works with a local PDF file, but I want to fetch a PDF over HTTP (using the Node.js url / http modules) and parse that file instead.

Is it possible to parse a PDF that is hosted online?

        let query   = url.parse(req.url, true).query;
        let pdfLink = query.pdf;
        ...
        pdfParser.loadPDF(pdfLink + "");

      

So the PDF URL must be passed as a query parameter, like: https://localhost:8080/?pdf=http://whale-cms.de/pdf.pdf
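
To illustrate how the `pdf` parameter could be read and checked before it is handed to the parser, here is a minimal sketch using the WHATWG `URL` class that ships with Node.js. The function name `extractPdfUrl` and the placeholder base origin are assumptions made for this example and are not part of the original code.

```javascript
// Hypothetical helper: pull the "pdf" query parameter out of a request path
// and accept it only if it is an absolute http(s) URL.
function extractPdfUrl(requestPath) {
    // req.url is only a path such as "/?pdf=...", so a base origin is needed
    // to parse it. The origin below is a placeholder and is never contacted.
    const parsed = new URL(requestPath, "http://localhost:8080");
    const raw = parsed.searchParams.get("pdf");
    if (!raw) {
        return null; // parameter missing or empty
    }
    try {
        const target = new URL(raw);
        // Reject schemes such as file: so only remote documents are accepted.
        if (target.protocol !== "http:" && target.protocol !== "https:") {
            return null;
        }
        return target.href;
    } catch (err) {
        return null; // not a valid absolute URL
    }
}
```

Inside a request handler this would be called as `extractPdfUrl(req.url)`; a `null` result means the request carried no usable PDF link.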

Is there a way to parse a PDF directly from an online link?

Thanks in advance.


1 answer


I just ran into the same problem and found a solution:

        var request = require('request');
        var PDFParser = require("pdf2json");

        var pdfUrl = "http://localhost:3000/cdn/storage/PDFFiles/sk87bAfiXxPre428b/original/sk87bAfiXxPre428b";
        var pdfParser = new PDFParser();

        // Attach the listeners before any data starts flowing into the parser.
        pdfParser.on("pdfParser_dataError", err => console.error(err));
        pdfParser.on("pdfParser_dataReady", pdf => {
            let usedFieldsInTheDocument = pdfParser.getAllFieldsTypes();
            console.log(usedFieldsInTheDocument);
        });

        // encoding: null makes request deliver raw Buffers instead of strings,
        // which keeps the binary PDF data intact while it is piped to the parser.
        request({url: pdfUrl, encoding: null}).pipe(pdfParser);
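
Since the `request` package has been deprecated, the same idea can also be sketched with only Node's built-in `http` / `https` modules: download the PDF into a Buffer, then hand that Buffer to the parser. This is an alternative sketch and does not come from the issue linked below. The helper names `fetchBuffer` and `parseRemotePdf` are made up for this example, and it assumes the installed pdf2json version exposes `parseBuffer`, as its README describes. Redirects are not followed.

```javascript
const http = require("http");
const https = require("https");

// Hypothetical helper: download a URL and resolve with the body as one Buffer.
function fetchBuffer(url) {
    return new Promise((resolve, reject) => {
        const client = url.startsWith("https:") ? https : http;
        client.get(url, res => {
            if (res.statusCode !== 200) {
                res.resume(); // discard the body so the socket is released
                reject(new Error("Unexpected status code: " + res.statusCode));
                return;
            }
            const chunks = [];
            res.on("data", chunk => chunks.push(chunk));
            res.on("end", () => resolve(Buffer.concat(chunks)));
            res.on("error", reject);
        }).on("error", reject);
    });
}

// Hypothetical helper: fetch a remote PDF and resolve with the parsed data.
function parseRemotePdf(url) {
    // Loaded lazily so fetchBuffer can be used without pdf2json installed.
    const PDFParser = require("pdf2json");
    return fetchBuffer(url).then(buffer => new Promise((resolve, reject) => {
        const pdfParser = new PDFParser();
        pdfParser.on("pdfParser_dataError", reject);
        pdfParser.on("pdfParser_dataReady", resolve);
        pdfParser.parseBuffer(buffer); // assumed API, see the note above
    }));
}
```

A call would then look like `parseRemotePdf(pdfUrl).then(pdf => console.log(pdf)).catch(console.error)`.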

      



Source: https://github.com/modesty/pdf2json/issues/65
Greetings
