Get the main category of an article using the Wikipedia API

I have a list of articles and I want to find the main category of each article.

Wikipedia lists the main categories here: http://en.wikipedia.org/wiki/Portal:Contents/Categories

I can find the categories of each article using:

http://en.wikipedia.org/w/api.php?action=query&prop=categories&titles=%s&format=xml
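
As a minimal sketch of how the %s placeholder might be filled in from Python, using only the standard library. The helper names categories_url and category_titles are made up for illustration, and format=json is swapped in for format=xml to keep parsing simple:

```python
import json
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def categories_url(title, limit=50):
    """Build a prop=categories query URL for one page title (hypothetical helper)."""
    params = {
        "action": "query",
        "prop": "categories",
        "titles": title,
        "cllimit": limit,   # number of categories returned per request
        "format": "json",   # assumption: JSON instead of the XML used above
    }
    return API + "?" + urlencode(params)

def category_titles(response_text):
    """Extract the category titles from a format=json response body."""
    data = json.loads(response_text)
    pages = data.get("query", {}).get("pages", {})
    titles = []
    for page in pages.values():
        for cat in page.get("categories", []):
            titles.append(cat["title"])
    return titles
```

The response body could then be fetched with urllib.request.urlopen and passed to category_titles. Results beyond cllimit require the API's continuation mechanism, which this sketch does not handle.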

I can also check whether an article belongs to a given category:

http://en.wikipedia.org/w/api.php?action=query&titles=Dog&prop=categories&clcategories=Domesticated animals&format=xml

This will tell me whether "Dog" is in the category "Domesticated animals", but that's not exactly what I want. I want to find out which main category "Domesticated animals" falls under. Is this possible with the API?



1 answer


First, there is no such thing as a "Wikipedia API". There is the MediaWiki web API. Knowing the right name will help you find documentation and existing tools: https://www.mediawiki.org/wiki/API:Main_Page

That documentation shows that there is no API module that will do all the category recursion for you. Why? Because 1) it would be extremely inefficient, and 2) the recursion can lead anywhere or never end.
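
If you do the recursion yourself on the client side, both problems need guarding against. Here is a rough sketch, not a definitive implementation: a breadth-first walk up the parent categories with a depth cap and a "seen" set. The function name find_main_categories and the get_parents callback are hypothetical. In practice get_parents would wrap a prop=categories query, and main_categories would be the set of titles taken from the Portal:Contents/Categories page.

```python
from collections import deque

def find_main_categories(start, get_parents, main_categories, max_depth=5):
    """Breadth-first walk up the category graph from `start`.

    get_parents: callable mapping a page or category title to a list of its
                 parent category titles (assumption: supplied by the caller).
    Returns {main_category: depth at which it was first reached}.
    """
    found = {}
    seen = {start}                 # guards against cycles in the category graph
    queue = deque([(start, 0)])
    while queue:
        title, depth = queue.popleft()
        if depth >= max_depth:     # cap the walk so it always terminates
            continue
        for parent in get_parents(title):
            if parent in seen:
                continue
            seen.add(parent)
            if parent in main_categories:
                found[parent] = depth + 1   # stop climbing at a main category
            else:
                queue.append((parent, depth + 1))
    return found
```

Because the walk is breadth-first, the depth recorded for each main category is the shortest path to it, which gives one possible way to rank candidates. Every get_parents call would be a separate API request, so a small max_depth keeps the cost bounded.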



However, there is now a tool by Magnus Manske: https://tools.wmflabs.org/catscan2/reverse_tree.php?doit=1&language=en&project=wikipedia&title=Dog&namespace=0 It reports "Maximum depth: 61 levels. Total categories in path: 7988". By that definition, the "root" category for [[Dog]], i.e. the most distant ancestor category, is "Industry by country". Probably not what you expected! From the point of view of the English Wikipedia, however, the root category for any article is always the same: [[Category:Contents]].
