How to download a PDF file in Ruby without .pdf in the link

Question

I need to download a PDF, using Ruby, from a site whose download link does not end in .pdf. When I click the link manually, it takes me to a new page, and after a while a dialog box opens offering to save or open the file.

Please help me download the file.

Link

0

ruby pdf download

Sushil Jul 24 13 at 19:03

2 answers

Will this do?

require 'open-uri'

# Kernel#open no longer accepts URLs as of Ruby 3.0; use URI.open instead.
File.open('my_file_name.pdf', 'wb') do |file|
  file.write URI.open('http://someurl.com/2013-1-2/somefile/download').read
end

I do this for my projects and it works.

+2

roxxypoxxy 07 Sep At 4:45 am

Peter Klipfel · Accepted Answer · 2013-07-25T00:14:31+0000

If you just want a simple Ruby script, I would just run wget, like this:

exec 'wget "http://path.to.the.file/and/some/params"'

At that point, though, you could just run wget directly.
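One caveat with exec is that it replaces the running Ruby process, so nothing after it in the script will run. If the script needs to continue after the download, something like the sketch below may fit better. The helper names wget_command and download_with_wget are made up for illustration, and it assumes wget is installed and on the PATH.

```ruby
# Build the argument list for wget. The -O option names the output file,
# so no rename step is needed afterwards.
def wget_command(url, output_path)
  ['wget', '-O', output_path, url]
end

# Run wget and return true only if it exited successfully.
# Passing separate arguments bypasses the shell, which avoids quoting
# problems with URLs that contain characters such as "&" or "?".
def download_with_wget(url, output_path)
  system(*wget_command(url, output_path)) == true
end
```

Usage would look like download_with_wget('http://path.to.the.file/and/some/params', 'thefile.pdf').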

Another way is to just run a GET on the page where you know the file is:

require 'net/http'
source = Net::HTTP.get("the.website.com", "/and/some/params") # host name only, no "http://" scheme

There are several other HTTP clients you could use, but as long as you make a GET request to the endpoint where the PDF file resides, it should give you the raw data. Then you can just rename the file and you will have a PDF.
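As a rough sketch of the GET-then-save idea: the code below fetches the endpoint with Net::HTTP and writes the body in binary mode, but only if the data starts with the "%PDF-" signature that PDF files begin with. That check is my own addition, since a misbehaving server could return an HTML page instead. The helper names are hypothetical, and note that Net::HTTP.get_response does not follow redirects on its own.

```ruby
require 'net/http'
require 'uri'

# PDF files start with the bytes "%PDF-", whatever the URL looks like.
def pdf_data?(bytes)
  bytes.b.start_with?('%PDF-')
end

# Write the bytes in binary mode, refusing anything that is not a PDF.
def save_pdf(bytes, path)
  raise ArgumentError, 'response body is not a PDF' unless pdf_data?(bytes)
  File.binwrite(path, bytes)
end

# Fetch the endpoint and save the body (performs a network request).
def download_pdf(url, path)
  response = Net::HTTP.get_response(URI(url))
  raise "request failed with HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  save_pdf(response.body, path)
end
```

Usage would look like download_pdf('http://the.website.com/and/some/params', 'thefile.pdf').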

In your case, I ran the following commands to get the PDF:

wget http://www.lawcommission.gov.np/en/documents/prevailing-laws/constitution/func-download/129/chk,d8c4644b0f086a04d8d363cb86fb1647/no_html,1/
mv index.html thefile.pdf

Then open the PDF file. Note that these are Linux commands. If you want to get the file with a Ruby script, you can use something like what I described earlier.

Update:

There is an additional complication that was not originally mentioned, namely that the PDF URL changes every time the PDF is updated. To make this work, you probably want to do something that involves web scraping. I suggest Nokogiri. That way, you can look at the page where the download link is located, extract the URL you want, and then make a request to it. Also, the server hosting the PDF file is misconfigured and crashes Chrome within seconds of opening the page.

How I got around this: I went to the site and refreshed it, then stopped the page load (press the X where the refresh button would otherwise be). Then right-click near the download link and select "Inspect element". Then scan the DOM for something that uniquely identifies the link (such as an id). Luckily, I found <strong id="telecharger"> Download</strong>. This means you can use something like

page.css('strong#telecharger')[0].parent['href']

This should give you the URL. Then you can make the GET request as described above. I don't have time to write the script for you (too much work), but this should be enough to solve the problem.
