How do I get all images in one category that are not in another from the MediaWiki API?
I'm completely new to the API, so sorry if the question is stupid.
I would like to get all the images in a category on Commons, let's say X, but exclude those that are also in another category (Y). I don't know whether this is actually possible.
will get all of them, how to exclude some of them?
Also, I would like the result to include a description of each image, not just the filename.
AFAIK, there is no way to get this directly using the API. But assuming both categories are small enough, you can get all the images from both of them and then compute the difference (the images in X that are not in Y) in your code.
To get a description, you can use prop=imageinfo&iiprop=extmetadata&iiextmetadatafilter=ImageDescription.
In the context of your example request, it would look like this:
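Putting both steps together, here is a minimal Python sketch of the approach (not the exact request from this answer). It assumes the `categorymembers` module is used as a generator so that `imageinfo` can be requested in the same call; the function names, the `User-Agent` string, and the limit of 500 are my own choices for illustration.

```python
import json
import urllib.parse
import urllib.request

API = "https://commons.wikimedia.org/w/api.php"


def category_files_params(category, cont=None):
    """Build query parameters for one batch of files in `category`."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "categorymembers",  # assumption: generator form of list=categorymembers
        "gcmtitle": category,
        "gcmtype": "file",
        "gcmlimit": "500",
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "iiextmetadatafilter": "ImageDescription",
    }
    params.update(cont or {})  # continuation values returned by the previous batch
    return params


def fetch_category_files(category):
    """Return {title: description} for every file in `category` (network call)."""
    files, cont = {}, {}
    while True:
        url = API + "?" + urllib.parse.urlencode(category_files_params(category, cont))
        req = urllib.request.Request(url, headers={"User-Agent": "category-diff-sketch/0.1"})
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        for page in data.get("query", {}).get("pages", {}).values():
            info = (page.get("imageinfo") or [{}])[0]
            meta = info.get("extmetadata") or {}
            desc = (meta.get("ImageDescription") or {}).get("value", "")
            files.setdefault(page["title"], "")
            if desc:  # a later batch may repeat a page without its metadata
                files[page["title"]] = desc
        if "continue" not in data:
            return files
        cont = data["continue"]


def difference(in_x, in_y):
    """Keep the entries of `in_x` whose title does not appear in `in_y`."""
    return {title: desc for title, desc in in_x.items() if title not in in_y}
```

Usage would be something like `difference(fetch_category_files("Category:X"), fetch_category_files("Category:Y"))`. The description values come back as HTML fragments, so you may want to strip tags before displaying them.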
By default, MediaWiki has no built-in support for category union and intersection queries. To accomplish this task, you need extensions, external tools, or multiple API requests whose results you process yourself.
CirrusSearch API
On Wikimedia Commons, as on the Wikimedia wiki farm in general, CirrusSearch provides search filters, including category intersection, and these are also accessible via the API (action=query&list=search&srsearch=incategory:A+-incategory:B, that is, Category:A minus Category:B).
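As a rough illustration, the search request could be built from Python like this. The helper names are hypothetical, and I am assuming you want to restrict the search to the File namespace (`srnamespace=6`) and quote the category names so that names containing spaces work.

```python
import json
import urllib.parse
import urllib.request

API = "https://commons.wikimedia.org/w/api.php"


def search_params(include, exclude, limit=50, offset=0):
    """Parameters for 'in category `include` but not in category `exclude`'."""
    return {
        "action": "query",
        "format": "json",
        "list": "search",
        # Category names are given without the "Category:" prefix.
        "srsearch": f'incategory:"{include}" -incategory:"{exclude}"',
        "srnamespace": "6",  # assumption: only files are wanted
        "srlimit": str(limit),
        "sroffset": str(offset),
    }


def search_titles(include, exclude, limit=50, offset=0):
    """Run one search request and return the matching page titles (network call)."""
    url = API + "?" + urllib.parse.urlencode(search_params(include, exclude, limit, offset))
    req = urllib.request.Request(url, headers={"User-Agent": "category-diff-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return [hit["title"] for hit in data.get("query", {}).get("search", [])]
```

For more results than one request returns, increase `offset` and call `search_titles` again.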
FastCCI
One tool I can recommend (because it is a dedicated high-performance solution and actually works) is fastcci, developed by Daniel Schwen. A database and a running web service already exist specifically for Wikimedia Commons, but it can be set up for any wiki, as long as the tool has a host to run on and access to the database.
Query
Consider the following request url:
https://fastcci.wmflabs.org/?c1=3302993&c2=15516712&d1=0&d2=0&s=200&a=not&t=js
- https://fastcci.wmflabs.org/ - Wikimedia Commons host fastcci runs on
- c1 - category 1 identifier
- c2 - category 2 identifier
- d1 - depth of category 1 to search (fastcci considers subcategories by default)
- d2 - depth of category 2 to search (fastcci considers subcategories by default)
- s - the number of results to return
- o - Offset
- a - the operation used to combine the two categories (the example above uses a=not)
- t - connection type (t=js for a JSONP response; otherwise it is assumed to be used as a websocket)
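If you want to build such a URL programmatically, a small Python helper along these lines should do. The function name and the keyword argument names are made up for this sketch; only the query parameter names come from the list above.

```python
from urllib.parse import urlencode

FASTCCI = "https://fastcci.wmflabs.org/"


def fastcci_url(c1, c2, d1=0, d2=0, size=200, offset=0, action="not"):
    """Build a fastcci JSONP request URL for two category page IDs."""
    query = {"c1": c1, "c2": c2, "d1": d1, "d2": d2, "s": size}
    if offset:
        query["o"] = offset  # only sent when paging past the first batch
    query["a"] = action
    query["t"] = "js"  # ask for a JSONP response
    return FASTCCI + "?" + urlencode(query)


print(fastcci_url(3302993, 15516712))
```

With the default arguments this reproduces the example URL shown earlier.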
Response
fastcciCallback( [ 'RESULT 27572680,0,0|1675043,0,0|27577015,0,0|27577043,0,0|27577106,0,0|27576896,0,0|27576790,0,0|23481936,0,0|17560964,0,0|11009066,0,0', 'OUTOF 10', 'DBAGE 378310', 'DONE'] );
RESULT is followed by a |-separated list of up to 50 integer triplets of the form pageId,depth,tag. Each triplet represents one image or category.
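To make the format concrete, here is a small Python sketch that pulls the fields out of a response like the one above. It assumes the callback arguments are always single-quoted strings as in the sample; `parse_fastcci` and the keys of the returned dictionary are names I picked for this example.

```python
import re


def parse_fastcci(jsonp):
    """Parse a fastcciCallback([...]) payload into a dictionary."""
    parsed = {"items": [], "outof": None, "dbage": None, "done": False}
    # Assumption: every message is a single-quoted string, as in the sample.
    for line in re.findall(r"'([^']*)'", jsonp):
        key, _, value = line.partition(" ")
        if key == "RESULT":
            for triplet in value.split("|"):
                if triplet:
                    page_id, depth, tag = (int(part) for part in triplet.split(","))
                    parsed["items"].append((page_id, depth, tag))
        elif key == "OUTOF":
            parsed["outof"] = int(value)
        elif key == "DBAGE":
            parsed["dbage"] = int(value)
        elif key == "DONE":
            parsed["done"] = True
    return parsed


sample = ("fastcciCallback( [ 'RESULT 27572680,0,0|1675043,0,0|27577015,0,0', "
          "'OUTOF 10', 'DBAGE 378310', 'DONE'] );")
page_ids = [page_id for page_id, _depth, _tag in parse_fastcci(sample)["items"]]
```

The resulting `page_ids` list is what you would then translate into page names.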
Resources
- Example client-side implementation - to see it in action, visit any category page and look next to the "Good pictures" button.
- Example: FilesOf('Category:Saaleck') - FilesOf('Category:Rapeseed fields in Saxony-Anhalt')
- Example
- Server application
- YouTube presentation
- Slides
Note on pageIDs
- page identifiers -> page names: GET /w/api.php?action=query&pageids=page_IDs_separated_by_pipe
- page names -> page identifiers: GET /w/api.php?action=query&titles=Titles_separated_by_pipe
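For example, the ID-to-name lookup could be sketched in Python as follows. The helper names are hypothetical, and the batch size of 50 reflects the usual per-request limit for ordinary accounts, which you should treat as an assumption for your setup.

```python
import json
import urllib.parse
import urllib.request

API = "https://commons.wikimedia.org/w/api.php"


def chunked(items, size=50):
    """Split `items` into lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def pageids_params(page_ids):
    """Query parameters for looking up a batch of page IDs."""
    return {
        "action": "query",
        "format": "json",
        "pageids": "|".join(str(page_id) for page_id in page_ids),
    }


def titles_from_response(data):
    """Map page ID -> title, skipping pages the API reports without a title."""
    pages = data.get("query", {}).get("pages", {})
    return {int(pid): page["title"] for pid, page in pages.items() if "title" in page}


def lookup_titles(page_ids):
    """Resolve page IDs to titles in batches (network calls)."""
    titles = {}
    for batch in chunked(list(page_ids)):
        url = API + "?" + urllib.parse.urlencode(pageids_params(batch))
        req = urllib.request.Request(url, headers={"User-Agent": "category-diff-sketch/0.1"})
        with urllib.request.urlopen(req) as resp:
            titles.update(titles_from_response(json.load(resp)))
    return titles
```

The reverse direction works the same way with `titles=` in place of `pageids=`.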