How do I get from the Mediawiki API all images in a category that are not in another?

I'm completely new to the API, so sorry if the question is stupid.

I would like to get all the images in a Commons category, let's say X, but exclude those that are also in another category (Y). I don't understand whether I can actually do this.

https://commons.wikimedia.org/w/api.php?action=query&list=categorymembers&cmtype=file&cmtitle=Category:X

will get all of them, but how do I exclude some of them?

Also, I would like to get a description of each image in the result, not just the filename.



2 answers


AFAIK, there is no way to get this directly using the API. But assuming both categories are small enough, you can get all the images from both of them and then compute the set difference in your code.
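A minimal Python sketch of that approach, under a few assumptions: the helper names (`fetch_json`, `category_files`, `files_only_in`) and the User-Agent string are made up for illustration, and the loop relies on the API's standard `continue` mechanism to page through larger categories.

```python
import json
import urllib.parse
import urllib.request

API = "https://commons.wikimedia.org/w/api.php"


def fetch_json(params):
    """Send one GET request to the Commons API and decode the JSON reply."""
    url = API + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent; replace it with something that identifies your tool.
    req = urllib.request.Request(url, headers={"User-Agent": "category-diff-example/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def category_files(category, fetch=fetch_json):
    """Return the set of file titles in a category, following continuation."""
    params = {
        "action": "query",
        "format": "json",
        "list": "categorymembers",
        "cmtype": "file",
        "cmtitle": category,
        "cmlimit": "max",
    }
    titles = set()
    while True:
        data = fetch(params)
        titles.update(m["title"] for m in data["query"]["categorymembers"])
        if "continue" not in data:
            return titles
        # Merge the continuation tokens into the next request.
        params = {**params, **data["continue"]}


def files_only_in(cat_x, cat_y, fetch=fetch_json):
    """Files that are in cat_x but not in cat_y (plain set difference)."""
    return category_files(cat_x, fetch) - category_files(cat_y, fetch)
```

Calling `files_only_in("Category:X", "Category:Y")` would issue the real requests. The `fetch` parameter exists only so the logic can be exercised without network access.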

To get a description, you can use the parameters prop=imageinfo&iiprop=extmetadata&iiextmetadatafilter=ImageDescription.



In the context of your example request, it would look like this:

https://commons.wikimedia.org/w/api.php?action=query&generator=categorymembers&gcmtype=file&gcmtitle=Category:X&prop=imageinfo&iiprop=extmetadata&iiextmetadatafilter=ImageDescription
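As a rough sketch, the descriptions could be pulled out of the decoded JSON reply like this. It assumes the request also carries `format=json`, so that `query.pages` is an object keyed by page ID. The function name `descriptions` is made up for illustration.

```python
def descriptions(response):
    """Map each file title to its ImageDescription value (None if absent)."""
    result = {}
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        # imageinfo is a list with one entry per requested file revision.
        info = (page.get("imageinfo") or [{}])[0]
        desc = info.get("extmetadata", {}).get("ImageDescription", {})
        result[page["title"]] = desc.get("value")
    return result
```

The description value may contain HTML markup, so it may need cleaning before display.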



MediaWiki has, by default, no built-in support for category union and intersection queries. To accomplish this task, you need extensions, external tools, or multiple API requests followed by processing of the results.

CirrusSearch API

On Wikimedia Commons, as on the Wikimedia wiki farm in general, CirrusSearch provides filtered searches, including category intersection searches, and these are also accessible via the API (action=query&list=search&srsearch=incategory:A+-incategory:B, that is, Category:A minus Category:B).
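A small Python sketch of building such a search request. The helper name `search_url` is made up for illustration. Two additions go beyond the text above: `srnamespace=6` restricts results to the File namespace, and the category names are quoted so that names containing spaces work.

```python
import urllib.parse

API = "https://commons.wikimedia.org/w/api.php"


def search_url(include, exclude, limit=50):
    """Build a CirrusSearch API URL for files in `include` but not in `exclude`."""
    def quoted(name):
        # Quote the category name so that spaces are handled.
        return '"' + name + '"'

    params = {
        "action": "query",
        "format": "json",
        "list": "search",
        "srsearch": f"incategory:{quoted(include)} -incategory:{quoted(exclude)}",
        "srnamespace": 6,  # File namespace
        "srlimit": limit,
    }
    return API + "?" + urllib.parse.urlencode(params)


print(search_url("A", "B"))
```

Category names are passed without the `Category:` prefix, as in the `incategory:A` example.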

FastCCI

One tool I can recommend (because it is a dedicated high-performance solution and actually works) is FastCCI, developed by Daniel Schwen. For Wikimedia Commons specifically, a database already exists and a web service is running, but it can be set up for any wiki, as long as the tool has a host to run on and access to the database.

FastCCI in action

Query

Consider the following request URL:

https://fastcci.wmflabs.org/?c1=3302993&c2=15516712&d1=0&d2=0&s=200&a=not&t=js



  • https://fastcci.wmflabs.org/ - the host FastCCI runs on for Wikimedia Commons
  • c1 - page ID of category 1
  • c2 - page ID of category 2
  • d1 - depth of category 1 to search (FastCCI considers subcategories by default)
  • d2 - depth of category 2 to search (FastCCI considers subcategories by default)
  • s - the number of results to return
  • o - offset
  • a - the operation used to combine the two categories (a=not in this example, i.e. category 1 minus category 2)
  • t - connection type (t=js for a JSONP response; otherwise it is assumed to be used as a WebSocket)
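For illustration, the request URL above could be assembled from these parameters as follows. The function name `fastcci_url` and its keyword names are made up for this sketch; only the query parameters themselves come from the list above.

```python
import urllib.parse

FASTCCI = "https://fastcci.wmflabs.org/"


def fastcci_url(c1, c2, action="not", depth1=0, depth2=0, size=200, offset=None):
    """Build a FastCCI JSONP request URL from two category page IDs."""
    params = {
        "c1": c1,
        "c2": c2,
        "d1": depth1,
        "d2": depth2,
        "s": size,
        "a": action,
        "t": "js",  # request a JSONP response
    }
    if offset is not None:
        params["o"] = offset
    return FASTCCI + "?" + urllib.parse.urlencode(params)


print(fastcci_url(3302993, 15516712))
```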

Response

fastcciCallback( [ 'RESULT 27572680,0,0|1675043,0,0|27577015,0,0|27577043,0,0|27577106,0,0|27576896,0,0|27576790,0,0|23481936,0,0|17560964,0,0|11009066,0,0', 'OUTOF 10', 'DBAGE 378310', 'DONE'] );

      

RESULT is followed by a |-separated list of up to 50 integer triplets of the form pageId,depth,tag. Each triplet represents one image or category.
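A hedged sketch of parsing such a JSONP response in Python. The function name `parse_fastcci` and the shape of the returned dictionary are illustrative choices. Because the array uses single-quoted strings, it is read with `ast.literal_eval` instead of a JSON parser. The OUTOF and DBAGE numbers are kept as raw integers, since their meaning is not described here.

```python
import ast
import re


def parse_fastcci(jsonp):
    """Parse a FastCCI JSONP reply into triplets plus the raw status values."""
    # Take everything between the callback's parentheses.
    match = re.search(r"\((.*)\)\s*;?\s*$", jsonp, re.DOTALL)
    if match is None:
        raise ValueError("not a JSONP response")
    lines = ast.literal_eval(match.group(1).strip())

    result = {"items": [], "outof": None, "dbage": None, "done": False}
    for line in lines:
        key, _, rest = line.partition(" ")
        if key == "RESULT":
            for triplet in rest.split("|"):
                if triplet:
                    page_id, depth, tag = (int(x) for x in triplet.split(","))
                    result["items"].append((page_id, depth, tag))
        elif key == "OUTOF":
            result["outof"] = int(rest)
        elif key == "DBAGE":
            result["dbage"] = int(rest)
        elif key == "DONE":
            result["done"] = True
    return result
```

The page IDs in `items` can then be passed to the regular API (for example via `action=query&pageids=...`) to look up the corresponding titles.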

Resources

Note on pageIDs
