How can I get the date of a file hosted online (from Python)?
I have a Python application that relies on a file the client downloads from a website.
The website is not under my control and does not have an API to check the "latest version" of the file.
Is there an easy way to access a file (in Python) via a URL and check its date (or size) without downloading it to the client machine every time?
Update: Thanks to those who mentioned the last-modified date; that is the right value to check.
I don't think I formulated the question well enough. How can I do this from a Python script? I want the application to check the file and download it only if the remote last-modified date is newer than the local file's date.
Check the Last-Modified HTTP header.
EDIT: try urllib2 (urllib.request in Python 3).
EDIT 2: This short tutorial should give you a pretty good idea of how to achieve your goal.
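A minimal Python 3 sketch of the header check, assuming the server answers HEAD requests and sends Last-Modified (and optionally Content-Length). It uses urllib.request, the Python 3 successor of urllib2; the function names are hypothetical, not from any library.

```python
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_http_date(value):
    """Parse an HTTP date such as 'Sat, 10 Jan 2009 16:17:08 GMT' into an aware UTC datetime."""
    dt = parsedate_to_datetime(value)
    if dt.tzinfo is None:  # a '-0000' zone yields a naive datetime; treat it as UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def head_info(url):
    """Send a HEAD request (no body is transferred); return (last_modified, size), either may be None."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        last_modified = resp.headers.get("Last-Modified")
        size = resp.headers.get("Content-Length")
    return (parse_http_date(last_modified) if last_modified else None,
            int(size) if size is not None else None)


def remote_is_newer(last_modified, local_mtime):
    """Decide whether to download.

    local_mtime is os.path.getmtime() of the local copy, or None if there is none yet.
    A missing Last-Modified header is treated as "changed", so the file gets fetched.
    """
    if last_modified is None or local_mtime is None:
        return True
    return last_modified > datetime.fromtimestamp(local_mtime, tz=timezone.utc)
```

Typical use: call head_info(url), pass os.path.getmtime(local_path) (or None if the file does not exist yet) to remote_is_newer, and download only when it returns True. Comparing the returned size with os.path.getsize(local_path) is a cheap extra check when Last-Modified is absent.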
Note that "last-modified" may be missing:
>>> from urllib import urlopen
>>> f = urlopen('http://google.com/')
>>> i = f.info()
>>> i.keys()
['set-cookie', 'expires', 'server', 'connection', 'cache-control', 'date', 'content-type']
>>> i.getdate('date')
(2009, 1, 10, 16, 17, 8, 0, 1, 0)
>>> i.getheader('date')
'Sat, 10 Jan 2009 16:17:08 GMT'
>>> i.getdate('last-modified')
>>>
Now you can compare:
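import os, time
# Fall back to the server's Date header when Last-Modified is absent (both as GMT tuples)
remote = i.getdate('last-modified') or i.getdate('date')
if remote[:6] > time.gmtime(os.path.getmtime('file'))[:6]:
    open('file', 'wb').write(f.read())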
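In Python 3 the comparison can also be left to the server with a conditional GET: send the local file's modification time in an If-Modified-Since header, and a server that honours it replies 304 Not Modified without a body (urllib.request surfaces that as an HTTPError with code 304). This is a sketch assuming such a server; one that ignores the header simply returns 200 and the file is downloaded every time. Function names are hypothetical.

```python
import os
import urllib.error
import urllib.request
from email.utils import formatdate


def conditional_headers(local_mtime):
    """Build an If-Modified-Since header from a local mtime; None means no local copy yet."""
    if local_mtime is None:
        return {}
    return {"If-Modified-Since": formatdate(local_mtime, usegmt=True)}


def download_if_modified(url, local_path):
    """Fetch url into local_path unless the server answers 304 Not Modified.

    Returns True if the file was (re)downloaded, False if the local copy is current.
    """
    mtime = os.path.getmtime(local_path) if os.path.exists(local_path) else None
    req = urllib.request.Request(url, headers=conditional_headers(mtime))
    try:
        with urllib.request.urlopen(req) as resp:
            data = resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports 304 as an HTTPError
            return False
        raise
    with open(local_path, "wb") as fh:
        fh.write(data)
    return True
```

Writing the file sets its mtime to the download time, which is later than the server's Last-Modified, so the next check still works as long as the client's clock is roughly correct.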
In HTTP 1.1, the Content-Disposition header can carry this kind of information in a creation-date parameter (see RFC 2183).
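If a server does send a creation-date parameter, the standard library's email.message.Message can extract it; the function name below is hypothetical, and it returns None when the parameter is absent. The resp.headers object returned by urllib.request is itself an email.message.Message subclass, so the same get_param call also works on it directly.

```python
from email.message import Message
from email.utils import collapse_rfc2231_value, parsedate_to_datetime


def disposition_creation_date(content_disposition):
    """Return the creation-date parameter of a Content-Disposition value as a datetime, or None."""
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    value = msg.get_param("creation-date", header="content-disposition")
    if value is None:
        return None
    # get_param may return an RFC 2231 tuple; collapse it to a plain string first
    return parsedate_to_datetime(collapse_rfc2231_value(value))
```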
I have built a tool that does this based on ETags. It sounds very similar to what you describe:
pfetch is a Twisted tool that does this on a schedule, can handle many URLs, and triggers events when a file changes (after download). It's fairly simple, but it may still be more than you need.
This code is exactly what you are asking for.
So, take your pick. :)
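The ETag idea on its own needs only the standard library. This sketch is not pfetch's code: it is a plain urllib.request conditional GET that sends the previously stored ETag in If-None-Match, assuming the server sends an ETag and honours that header (a 304 reply means the stored copy is current). Function names are hypothetical; persisting the returned ETag between runs is left to the caller.

```python
import urllib.error
import urllib.request


def etag_headers(previous_etag):
    """Request headers for a conditional GET keyed on a stored ETag (None = nothing stored)."""
    return {"If-None-Match": previous_etag} if previous_etag else {}


def fetch_if_changed(url, local_path, previous_etag=None):
    """Download url to local_path unless the server says the ETag is unchanged.

    Returns the ETag to remember for next time (None if the server sent none).
    """
    req = urllib.request.Request(url, headers=etag_headers(previous_etag))
    try:
        with urllib.request.urlopen(req) as resp:
            data = resp.read()
            new_etag = resp.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep the local file and the old ETag
            return previous_etag
        raise
    with open(local_path, "wb") as fh:
        fh.write(data)
    return new_etag
```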