Simple web crawler: how to remove duplicate URLs from an array

Question

I am using an array to store URLs, and I need to eliminate any URL that appears more than once in it, because I don't need to crawl the same URL twice:

self.level = []  # list holding the URLs found on the page
for link in self.soup.find_all('a'):
    self.level.append(link.get('href'))
    print(self.level)

I need to eliminate the duplicate URLs before crawling them.

python web-crawler web-scraping

mans Dec 31. 15 at 5:51 am

1 answer

alecxe · Answer 1 · 2014-12-31T05:52:25+0000

Maintain a set of URLs:

self.level = set()
for link in self.soup.find_all('a'):
    self.level.add(link.get('href'))
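A set discards duplicates but does not remember the order in which links were found, and it will also store None for any <a> tag that has no href. If either of those matters, the following is a minimal sketch of one way to handle it. The function name unique_links, the base_url parameter, and the example.com URLs are hypothetical, chosen only for illustration; hrefs stands in for the values returned by link.get('href'). Resolving relative links and dropping fragments is an extra assumption about what should count as "the same URL", not something the question asks for.

```python
from urllib.parse import urljoin, urldefrag


def unique_links(hrefs, base_url):
    """Return absolute URLs from hrefs, in first-seen order, without duplicates."""
    seen = set()    # fast membership test
    ordered = []    # keeps the order in which links were found
    for href in hrefs:
        if not href:
            # link.get('href') returns None for <a> tags without an href
            continue
        # Resolve relative links and drop the #fragment, so that
        # "/a" and "/a#top" are treated as the same URL (an assumption).
        url, _fragment = urldefrag(urljoin(base_url, href))
        if url not in seen:
            seen.add(url)
            ordered.append(url)
    return ordered


# Hypothetical input standing in for the hrefs collected from the page.
hrefs = ["/a", "/b", "/a", None, "/a#top", "http://example.com/b"]
links = unique_links(hrefs, "http://example.com/")
print(links)
```

In the crawler from the question, this could be called as unique_links((link.get('href') for link in self.soup.find_all('a')), page_url), where page_url is whatever variable holds the address of the page being parsed.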
