Simple web crawler: how do I remove duplicate URLs from an array?

I am using an array to store URLs, and I need to eliminate any URL that appears more than once, because I don't want to crawl the same URL twice:

self.level = []  # list holding the URLs found on the page
for link in self.soup.find_all('a'):
    self.level.append(link.get('href'))
    print(self.level)

      

I need to eliminate the duplicate URLs before crawling them.



1 answer


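Use a set to store the URLs. A set holds each value at most once, so adding a URL that is already present has no effect: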



self.level = set()
for link in self.soup.find_all('a'):
    self.level.add(link.get('href'))
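
One caveat with the snippet above: link.get('href') returns None for anchors without an href, and a plain set does not remember the order in which links were found. If either matters, something along these lines might help. This is only a sketch: the helper name unique_links and the base_url parameter are my own inventions, not part of the question's code, and I am assuming you want relative links resolved and #fragments ignored when comparing URLs.

```python
from urllib.parse import urldefrag, urljoin


def unique_links(hrefs, base_url):
    """Return absolute URLs from hrefs, without duplicates, in first-seen order.

    unique_links and base_url are hypothetical names used for illustration.
    """
    seen = set()      # fast membership test for URLs already collected
    ordered = []      # keeps the order in which new URLs were found
    for href in hrefs:
        if not href:  # skip None (anchor without href) and empty strings
            continue
        # Resolve relative links against the page URL, then drop any #fragment
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute not in seen:
            seen.add(absolute)
            ordered.append(absolute)
    return ordered


# Example input; in the crawler this would be
# (link.get('href') for link in self.soup.find_all('a'))
sample = ["/a", "/a#top", None, "http://example.com/a", "/b"]
result = unique_links(sample, "http://example.com/")
print(result)
```

In the crawler you could then assign the returned list to self.level. If order does not matter and every href is already absolute, the plain set version is simpler.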

      
