How do I change the file extension?
I am trying to scrape an .xlsx file from the Tax Foundation website. Unfortunately, I keep getting an error message that reads: Excel cannot open the file '2017-FF-For-Website-7-10-2017.xlsx' because the file format or file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file.
I did some research, and it suggests the fix is to change the file extension to ".xls" instead of ".xlsx". Can anyone help?
from bs4 import BeautifulSoup
import urllib.request
import os

url = urllib.request.urlopen("https://taxfoundation.org/facts-figures-2017/")
soup = BeautifulSoup(url, from_encoding=url.info().get_param('charset'))
FHFA = os.chdir('C:/US_Census/Directory')
seen = set()
for link in soup.find_all('a', href=True):
    href = link.get('href')
    if not any(href.endswith(x) for x in ['.xlsx']):
        continue
    file = href.split('/')[-1]
    filename = file.rsplit('.', 1)[0]
    if filename not in seen:  # only retrieve file if it has not been seen before
        seen.add(filename)  # add the file to the set
        url = urllib.request.urlretrieve('https://taxfoundation.org/' + href, file)
        print(filename)
        print(' ')
print("All files successfully downloaded.")
P.S. I know you can just download the file manually, but I am using this to automate a process.
The problem is the line url = urllib.request.urlretrieve('https://taxfoundation.org/' + href, file). If you go to the website and hover over the Excel download button, you will see that the link is much longer: https://files.taxfoundation.org/20170710170238/2017-FF-For-Website-7-10-2017.xlsx (notice the 2017....238 part?). The href is already a complete URL, so prepending 'https://taxfoundation.org/' to it produces an address that does not point at the workbook. As a result, you never actually downloaded an Excel file, and changing the extension would not help. Here is the corrected line:
url = urllib.request.urlretrieve(href, file)
Everything else worked correctly.
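If you want the script to keep working whether a page uses relative or absolute links, a rough sketch along these lines might help. It is not part of the original code: the helper names resolve_link and looks_like_xlsx are made up for illustration, and PAGE_URL is just the page from the question. urljoin leaves an already absolute href unchanged, and since .xlsx workbooks are ZIP archives, zipfile.is_zipfile gives a cheap check that what you saved is not an HTML page with an .xlsx name.

```python
import zipfile
from urllib.parse import urljoin

# Page being scraped (taken from the question).
PAGE_URL = "https://taxfoundation.org/facts-figures-2017/"


def resolve_link(href, page_url=PAGE_URL):
    """Hypothetical helper: return an absolute URL for href.

    urljoin returns href unchanged when it is already absolute and
    resolves it against page_url when it is relative.
    """
    return urljoin(page_url, href)


def looks_like_xlsx(path):
    """Hypothetical helper: cheap sanity check on a downloaded file.

    An .xlsx workbook is a ZIP archive, so a saved HTML page fails
    this test. It does not prove the file is a valid workbook.
    """
    return zipfile.is_zipfile(path)
```

Inside the loop you could then call urllib.request.urlretrieve(resolve_link(href), file) and print a warning when looks_like_xlsx(file) is False.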