How to download a PDF file in Ruby without .pdf in the link
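If you just want a simple Ruby script, I would just run wget, like this:
exec 'wget "http://path.to.the.file/and/some/params"'
Or run wget however else you prefer; exec replaces the running Ruby process, so use system or backticks if the script needs to keep going afterwards.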
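For what it's worth, here is a rough sketch of the wget route that does not go through the shell. The helper names wget_command and download_with_wget are my own, and it assumes wget is installed and on the PATH:

```ruby
# Build the wget argument list. -O names the output file directly
# instead of letting wget derive a name from the URL.
def wget_command(url, output)
  ['wget', '-O', output, url]
end

# Run wget and report whether it succeeded. Passing the arguments
# separately bypasses the shell, so "&" or "?" in the URL need no quoting.
def download_with_wget(url, output)
  system(*wget_command(url, output)) == true
end

# Usage (placeholder URL):
#   download_with_wget('http://path.to.the.file/and/some/params', 'thefile.pdf')
```

Unlike exec, system returns control to the script, so you can check the result and carry on.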
Another way is to make a GET request for the page directly from Ruby (the first argument is the bare host name, without http://):
require 'net/http'
source = Net::HTTP.get("the.website.com", "/and/some/params")
There are several other HTTP clients you could use, but as long as you make a GET request to the endpoint that serves the PDF, the response body will be the raw PDF data. Then you can just save it under a .pdf name and you will have a PDF.
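Here is a minimal sketch of that idea using only the standard library. The helper names (fetch_body, pdf?, save_pdf) and the redirect limit of 5 are my own choices, not anything the site requires:

```ruby
require 'net/http'
require 'uri'

# Fetch the body at `url`, following up to `limit` redirects.
def fetch_body(url, limit = 5)
  raise ArgumentError, 'too many redirects' if limit.zero?

  response = Net::HTTP.get_response(URI(url))
  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    fetch_body(URI.join(url, response['location']).to_s, limit - 1)
  else
    raise "request failed: #{response.code} #{response.message}"
  end
end

# PDF files start with the "%PDF-" magic bytes, whatever the URL looks like.
def pdf?(data)
  data.start_with?('%PDF-')
end

# Write the raw bytes in binary mode so nothing gets re-encoded.
def save_pdf(data, path)
  raise ArgumentError, 'response does not look like a PDF' unless pdf?(data)

  File.binwrite(path, data)
end

# Usage (placeholder URL):
#   save_pdf(fetch_body('http://path.to.the.file/and/some/params'), 'thefile.pdf')
```

Checking the first bytes is a cheap way to notice when the server sent back an HTML error page instead of the document.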
In your case, I ran the following commands to get the PDF:
wget http://www.lawcommission.gov.np/en/documents/prevailing-laws/constitution/func-download/129/chk,d8c4644b0f086a04d8d363cb86fb1647/no_html,1/
mv index.html thefile.pdf
Then open the PDF file. Note that these are Linux shell commands. If you want to fetch the file from a Ruby script, you can use one of the approaches I mentioned earlier.
Update:
There is an additional complication that was not originally mentioned, namely that the PDF URL changes every time the PDF is updated. To make this work, you probably want to do some web scraping. I suggest Nokogiri. This way, you can parse the page where the download link lives, extract the current URL, and then make a request to it. Also, the server hosting the PDF is misconfigured and breaks Chrome within seconds of opening the page.
How I got around this: I went to the site and refreshed the page, then stopped the page load (press the X where the refresh button would otherwise be). Then I right-clicked next to the download link and selected Inspect Element. Then I scanned the DOM for something that uniquely identifies the link (such as an id). Luckily I found this:
<strong id="telecharger"> Download</strong>
This means you can use something like
page.css('strong#telecharger')[0].parent['href']
This should give you the URL. Then you can make the GET request as described above. I don't have time to write the whole script for you (too much work), but this should be enough to solve the problem.