How to filter articles by category when using newspaper?

I am trying to use the newspaper library for a simple news scraper: http://newspaper.readthedocs.org/

After getting the list of articles as follows:

cnn_paper = newspaper.build('http://cnn.com')

      

I would like to get only the articles from a specific category. I can see the available categories, but I cannot find a way to filter the articles I already have by the category they were downloaded from.

How can I do this?



1 answer


If I understand correctly, you want the articles for a given category. In that case, something like this should work (apologies if I have misunderstood):



import newspaper

cnn_paper = newspaper.build('http://cnn.com')

for category in cnn_paper.category_urls():
    # Build a separate source from each category URL
    cat_paper = newspaper.build(category)
    print(cat_paper.articles)  # all Article objects found for this category
    for article in cat_paper.articles:
        print(article.url)  # URL of each article in this category
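
If you would rather filter the articles you already have instead of building a new source per category, one option is to match on the URLs themselves. The following is only a rough sketch: `path_segments` and `group_by_category` are hypothetical helpers (not part of newspaper), the sample URLs are made up, and it assumes that an article's URL path contains the last path segment of its category URL, which will not hold for every site.

```python
from urllib.parse import urlparse


def path_segments(url):
    """Return the non-empty path segments of a URL, lowercased."""
    return [seg.lower() for seg in urlparse(url).path.split("/") if seg]


def group_by_category(article_urls, category_urls):
    """Map each category URL to the article URLs that appear to belong to it.

    Heuristic (an assumption, not newspaper behaviour): an article belongs
    to a category if the category's last path segment, e.g. "politics",
    occurs among the article URL's path segments.
    """
    grouped = {}
    for category_url in category_urls:
        segments = path_segments(category_url)
        if not segments:
            continue  # skip the site root, which has no category slug
        slug = segments[-1]
        grouped[category_url] = [
            url for url in article_urls if slug in path_segments(url)
        ]
    return grouped


# Made-up sample data standing in for cnn_paper.category_urls()
# and [article.url for article in cnn_paper.articles].
sample_categories = [
    "http://cnn.com",
    "http://cnn.com/politics",
    "http://cnn.com/tech",
]
sample_articles = [
    "http://cnn.com/2015/02/03/politics/some-story/index.html",
    "http://cnn.com/2015/02/03/tech/another-story/index.html",
]

for category_url, urls in group_by_category(sample_articles, sample_categories).items():
    print(category_url, urls)
```

With a real source you would pass `[article.url for article in cnn_paper.articles]` and `cnn_paper.category_urls()` instead of the sample lists. Categories hosted on a separate subdomain have no path segment to match on, so they are skipped here.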

      
