How to filter articles by category when using newspaper?
I am trying to use the newspaper library (http://newspaper.readthedocs.org/) for a simple news scraper.
After getting the list of articles as follows:
cnn_paper = newspaper.build('http://cnn.com')
I would like to get only the articles from a specific category. I can see the available categories, but I cannot find a way to filter the articles I already have by the category they were downloaded from.
How can I do this?
1 answer
If I understand correctly, you want the articles for a given category. I think something like this should work (apologies if I have misunderstood):
    import newspaper

    cnn_paper = newspaper.build('http://cnn.com')

    for category in cnn_paper.category_urls():
        # Build a separate source from each category URL
        cat_paper = newspaper.build(category)
        print(cat_paper.articles)  # all articles found for this category
        for article in cat_paper.articles:
            print(article.url)  # URL of each article in this category
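If rebuilding a source per category is too slow, another option is to group the article URLs you already have by category URL. The following is only a sketch: group_by_category is a hypothetical helper, not part of newspaper, and it assumes an article's URL path starts with its category's path, which many sites do not guarantee.

```python
from urllib.parse import urlparse


def group_by_category(article_urls, category_urls):
    """Map each category URL to the article URLs nested under its path.

    Hypothetical helper: assumes article paths begin with the category path.
    """
    groups = {category: [] for category in category_urls}
    for article_url in article_urls:
        article = urlparse(article_url)
        for category in category_urls:
            cat = urlparse(category)
            cat_path = cat.path.rstrip('/')
            # Skip the site root, which would otherwise match every article
            if not cat_path:
                continue
            same_host = article.netloc == cat.netloc
            nested = (article.path == cat_path
                      or article.path.startswith(cat_path + '/'))
            if same_host and nested:
                groups[category].append(article_url)
    return groups
```

With a built source this could be called as group_by_category([a.url for a in cnn_paper.articles], cnn_paper.category_urls()). Articles whose URLs do not sit under any category path end up in no group, so the per-category build above remains the more reliable approach.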