How can I get the generated date of a file online (from Python)?

I have a python application that relies on a file that is downloaded by the client from a website.

The website is not under my control and does not have an API to check the "latest version" of the file.

Is there an easy way to access a file (in python) via a url and check its date (or size) without having to download it to the client machine every time?

update: Thanks to those who mentioned the last modified date. This is the correct setting to view.

I don't think I formulated the question well enough. How can I do this using a python script? I want the application to check the file and then download it if (last modified date and current file date).

+1


source to share


5 answers


Check the Last-Modified header .

EDIT: try urllib2 .



EDIT 2: This short tutorial should give you a pretty good idea of ​​how to achieve your goal.

+4


source


There is no reliable way to do this. As far as you know, the file can be created "on the fly" by the web server, and the question "how old is this file" does not make sense. The web server can choose to provide the Last-Modified header, but it can tell you what it wants.



+5


source


Note that "last-modified" may be missing:

>>> from urllib import urlopen
>>> f = urlopen ('http://google.com/')
>>> i = f.info ()
>>> i.keys ()
['set-cookie', 'expires', 'server', 'connection', 'cache-control', 'date', 'content-type']
>>> i.getdate ('date')
(2009, 1, 10, 16, 17, 8, 0, 1, 0)
>>> i.getheader ('date')
'Sat, 10 Jan 2009 16:17:08 GMT'
>>> i.getdate ('last-modified')
>>>

Now you can compare:

if (i.getdate ('last-modified') or i.getheader ('date'))> current_file_date:
  open ('file', 'w'). write (f.read ())
+3


source


In HTTP 1.1, the Content-Disposition section is intended to store this kind of information in a parameter creation-date

(see RFC 2183 ).

+2


source


I have built a tool that does this based on etags. Sounds very similar to what you describe:

pfetch is a twisted tool that does this on a schedule and can work with many, many URLs and trigger events on change (after download). It's pretty simple, but it can still be more difficult than you want.

This code is exactly what you are asking for.

So, take your pick. :)

0


source







All Articles