How to extract data from PDF in Python using PDFrw

I'm trying to use PDFrw to get data out of a specific PDF (say, the one linked in the top right corner of the page HERE). I went through the documentation they provide (I couldn't find much) and looked at the example code they posted on git, but I couldn't find enough information to do what I want. How can I write a simple program that opens a PDF using PDFrw (or another library, if there is a better one) and extracts a specific piece of text? I was thinking about converting it to HTML first. Would that be easier? Taking the PDF above as an example, I would like to get (say) the wattage, which is 600W in the PDF. What is the simplest way to do this? I couldn't find any other Stack Overflow questions about this, so hopefully someone who has used it before can help!

Thanks!

+3




1 answer


I am the author of pdfrw and it is not intended for this. You should probably take a look at pdfminer.
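
For anyone landing here, a minimal sketch of the pdfminer route, assuming the maintained pdfminer.six fork (`pip install pdfminer.six`) and its `extract_text` helper. The helper names `extract_pdf_text` and `find_power_ratings`, the file name `datasheet.pdf`, and the regular expression are hypothetical choices for illustration; a real datasheet may lay the value out differently and need a different pattern.

```python
import re


def extract_pdf_text(path):
    """Return all text in the PDF at `path` as one string, via pdfminer.six."""
    # Imported lazily so the regex helper below also works without pdfminer.
    from pdfminer.high_level import extract_text
    return extract_text(path)


def find_power_ratings(text):
    """Return every number directly followed by a 'W' unit, e.g. '600W'."""
    # Assumption: the value is written as '600W' or '600 W' in the text.
    pattern = r"\b(\d+(?:\.\d+)?)\s*W\b"
    return [f"{m.group(1)}W" for m in re.finditer(pattern, text)]


# Usage (hypothetical file name):
#   text = extract_pdf_text("datasheet.pdf")
#   print(find_power_ratings(text))
```

Extracting the whole text first and searching it afterwards keeps the PDF handling and the pattern matching separate, so the pattern can be adjusted without touching the extraction step.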



+9








