How can I query Wikidata to create a Siri-like service?

I would like to discuss the first part of this Siri-like service.

Ideally, I would like to be able to query things like:

  • "social network"
  • "Beethoven"
  • "bad blood spill quickly"

And get the following results:

{type:"film"}

{type:"composer"}

{type:"song"}


I don't care about anything else; I find descriptions, images, and general information completely useless outside of Wikipedia. I see Wikidata as a metadata service that can give me the semantics of the text I'm looking for.

Do all data structures have "types" or some property related to their value? Is there a list of all types? Is there a suggestion function for objects with double meanings, like "apple"? Finally, how can I send a text request and read the "type" of the returned data structure?

I know I am not providing any code, but I really cannot wrap my head around the Wikidata API. I've searched everywhere, and all I can find are some garbled examples and broken Objective-C HTML parsers. I can't even get the "sample request" page to work because of an error I don't understand.

It is really not newbie-friendly, and it is full of heavy terminology.



1 answer


The problem with the Wikidata API is that it doesn't have a query interface. All it does is return information about a specific item, if you already know its ID. So far we have not been able to build a query interface that is both powerful enough and scalable. There is an early beta of a SPARQL endpoint, though: https://tools.wmflabs.org/ppp-sparql/ .
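A SPARQL endpoint is what turns "what type is this item?" into a single request. The sketch below rests on assumptions: it targets a standard SPARQL 1.1 endpoint that returns the SPARQL JSON results format, using the current Wikidata Query Service URL (https://query.wikidata.org/sparql) rather than the beta tool linked above, and the helper names (`types_query`, `build_sparql_url`, `parse_bindings`, `fetch_types`) are hypothetical.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

# Assumption: a standard SPARQL 1.1 endpoint; this is today's Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"
HEADERS = {
    "Accept": "application/sparql-results+json",  # ask for the SPARQL JSON results format
    "User-Agent": "siri-like-demo/0.1 (hypothetical; put your contact info here)",
}


def types_query(qid: str) -> str:
    """SPARQL that asks for every 'instance of' (P31) value of one item."""
    return (
        "PREFIX wd: <http://www.wikidata.org/entity/>\n"
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        f"SELECT ?type WHERE {{ wd:{qid} wdt:P31 ?type . }}"
    )


def build_sparql_url(query: str) -> str:
    """Encode the query as a GET parameter, as the SPARQL protocol allows."""
    return ENDPOINT + "?" + urllib.parse.urlencode({"query": query})


def parse_bindings(results: dict, var: str) -> list[str]:
    """Collect one variable from a SPARQL JSON result, keeping only the trailing ID."""
    values = []
    for row in results["results"]["bindings"]:
        if var in row:
            # Entity URIs look like http://www.wikidata.org/entity/Q5; keep "Q5".
            values.append(row[var]["value"].rsplit("/", 1)[-1])
    return values


def fetch_types(qid: str) -> list[str]:
    """Network call: run the query and return the IDs of the item's classes."""
    req = urllib.request.Request(build_sparql_url(types_query(qid)), headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return parse_bindings(json.load(resp), "type")
```

Only `fetch_types` needs network access; the query builder and the result parser can be tried offline.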

Once that endpoint is stable, we hope to provide easier-to-use services on top of it, such as Magnus's WDQ: http://magnusmanske.de/wordpress/?p=72 .


(Edit to answer specific questions about the API :)

I've searched everywhere, and all I can find are some garbled examples

The documentation could be nicer, but https://www.wikidata.org/wiki/Wikidata:Data_access is a good start. Also note that https://www.wikidata.org/w/api.php is self-documenting. In particular, take a look at https://www.wikidata.org/w/api.php?action=help&modules=wbgetentities and https://www.wikidata.org/w/api.php?action=help&modules=wbsearchentities .
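Both modules are ordinary HTTP GET requests against api.php. A minimal standard-library sketch of wbgetentities for IDs you already know: `ids`, `props`, `languages` and `format` are parameters documented on the module's help page, while the function names and the User-Agent string are hypothetical (Wikimedia asks API clients to send a descriptive User-Agent).

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

API = "https://www.wikidata.org/w/api.php"
HEADERS = {"User-Agent": "siri-like-demo/0.1 (hypothetical; put your contact info here)"}


def getentities_url(ids: list[str], props: str = "labels|claims", language: str = "en") -> str:
    """Build a wbgetentities request for items whose IDs are already known."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(ids),   # several IDs go in one request, separated by "|"
        "props": props,         # which parts of each entity to return
        "languages": language,  # restrict labels/descriptions to one language
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)


def get_entities(ids: list[str], **kwargs) -> dict:
    """Network call: return the 'entities' object, keyed by item ID."""
    req = urllib.request.Request(getentities_url(ids, **kwargs), headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["entities"]
```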

Do all data structures have "types" or some kind of property that is associated with its value?

All statements about a data item describe its meaning. Many items have an "instance of" (P31) or "subclass of" (P279) statement, which I think is pretty close to what you want.
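In the JSON returned by wbgetentities, statements sit under `claims`, keyed by property ID, and an item-valued statement keeps its target in `mainsnak.datavalue.value.id`. A sketch that pulls the P31 and P279 targets out of one entity dict; the function names are hypothetical, and skipping deprecated-rank statements and "somevalue"/"novalue" snaks is a design choice of this sketch rather than anything the API requires.

```python
from __future__ import annotations


def claim_targets(entity: dict, prop: str) -> list[str]:
    """Return the item IDs that property `prop` (e.g. "P31") points to."""
    targets = []
    for statement in entity.get("claims", {}).get(prop, []):
        if statement.get("rank") == "deprecated":
            continue  # deprecated statements are kept for history, not current truth
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # "somevalue"/"novalue" snaks carry no datavalue
        value = snak.get("datavalue", {}).get("value")
        if not isinstance(value, dict):
            continue
        if "id" in value:
            targets.append(value["id"])
        elif value.get("entity-type") == "item" and "numeric-id" in value:
            # Fallback for JSON that only carries the numeric part of the ID.
            targets.append(f"Q{value['numeric-id']}")
    return targets


def types_of(entity: dict) -> dict[str, list[str]]:
    """Summarize the 'instance of' and 'subclass of' targets of one entity."""
    return {
        "instance_of": claim_targets(entity, "P31"),
        "subclass_of": claim_targets(entity, "P279"),
    }
```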

Is there a list of all types?

No. Wikidata does not use a closed, predefined ontology to describe the world. It is a platform for describing the world in aggregate, in a machine-readable way; from this emerges a fluid ontology that is never complete or consistent.

Any data item can serve as a class or superclass of another item, and an item can be an instance or a subclass of several classes at once. The resulting relationships can get pretty complicated.
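Because an item can point to several classes and the class graph is neither complete nor guaranteed to be acyclic, finding all broader classes of an item is a graph traversal that needs a visited set. The sketch walks a hypothetical in-memory map from each class to its "subclass of" parents; in practice you would fill that map from wbgetentities calls, or let a SPARQL property path such as `wdt:P31/wdt:P279*` do the walk on the server. The `superclasses` name and the `max_depth` cutoff are assumptions.

```python
from __future__ import annotations

from collections import deque


def superclasses(start: list[str], parents: dict[str, list[str]],
                 max_depth: int = 10) -> dict[str, int]:
    """Breadth-first walk up 'subclass of' links; maps each class to its distance."""
    depth = {cls: 0 for cls in start}
    queue = deque(start)
    while queue:
        cls = queue.popleft()
        if depth[cls] >= max_depth:
            continue  # stop expanding; the real hierarchy can be very deep
        for parent in parents.get(cls, []):
            if parent not in depth:  # the visited check also protects against cycles
                depth[parent] = depth[cls] + 1
                queue.append(parent)
    return depth
```

For example, `superclasses(["film"], {"film": ["visual work"], "visual work": ["work"], "work": ["film"]})` returns `{"film": 0, "visual work": 1, "work": 2}` even though that made-up graph contains a cycle.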



Is there a suggestion function for objects that have a double meaning, like "apple"?

There is a search interface that lists all data items relevant to a given term. It's called wbsearchentities; for example https://www.wikidata.org/w/api.php?action=wbsearchentities&search=apple&language=en (add format=json for machine-readable JSON).
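The same search from code, as a minimal sketch: `search`, `language`, `limit` and `format` are parameters of wbsearchentities, and the hits come back in a `search` array whose entries carry `id`, `label` and, when one exists, `description`. The helper names and User-Agent string are hypothetical.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

API = "https://www.wikidata.org/w/api.php"
HEADERS = {"User-Agent": "siri-like-demo/0.1 (hypothetical; put your contact info here)"}


def search_url(term: str, language: str = "en", limit: int = 7) -> str:
    """Build a wbsearchentities request for a free-text term."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,  # language the term is matched in
        "limit": limit,        # how many hits to return
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)


def candidates(response: dict) -> list[tuple[str, str, str]]:
    """(id, label, description) per hit, in the API's own ranking order."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in response.get("search", [])
    ]


def search(term: str, **kwargs) -> list[tuple[str, str, str]]:
    """Network call: search Wikidata and return the candidate items."""
    req = urllib.request.Request(search_url(term, **kwargs), headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return candidates(json.load(resp))
```

For an ambiguous word like "apple", the result is a list of several candidates, and nothing in the response says which one the user meant.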

However, the ranking of the results is very naive. And without the semantic context of the original sentence, there is no way to tell which meaning of a word is intended. This is an interesting area of research called "word sense disambiguation".
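To illustrate why sentence context matters, here is a toy re-ranker; it is not something Wikidata provides. It swaps in a simplified Lesk-style heuristic that orders search hits by how many words of the surrounding sentence also occur in each hit's label and description. The stop-word list, the `(id, label, description)` tuple format and the function names are assumptions, and real word sense disambiguation needs far more than word overlap.

```python
from __future__ import annotations

import re

# Assumption: a tiny illustrative stop-word list, not a linguistic resource.
STOP_WORDS = {"a", "an", "the", "of", "is", "i", "to", "and", "in", "on", "for"}


def tokens(text: str) -> set[str]:
    """Lower-cased word set without stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def rerank(sentence: str, hits: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Sort (id, label, description) hits by word overlap with the sentence.

    sorted() is stable, so hits with equal overlap keep the API's original order.
    """
    context = tokens(sentence)
    return sorted(hits, key=lambda hit: -len(context & tokens(hit[1] + " " + hit[2])))
```

If one hit for "apple" were described as a fruit and another as a technology company, a sentence mentioning "company" would move the second hit to the top, while a sentence with no overlapping words leaves the API's order unchanged.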

Finally, how can I send a text request and read the "type" of the response data structure?

At the moment, you will need to make two API calls: one to wbsearchentities to get the ID of the item you are interested in, and one to wbgetentities to get the "instance of" statements for that item. It would be nice to combine this into one call; a ticket is open for this: https://phabricator.wikimedia.org/T90693
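The two calls can be chained as in this sketch. It takes the top search hit at face value, so it inherits the naive ranking described above, and it returns the raw P31 item IDs. `guess_types`, `http_get_json` and the injectable `fetch` parameter are hypothetical names; `fetch` exists only so the logic can be exercised without network access.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request
from typing import Callable

API = "https://www.wikidata.org/w/api.php"
HEADERS = {"User-Agent": "siri-like-demo/0.1 (hypothetical; put your contact info here)"}


def http_get_json(params: dict) -> dict:
    """Send one GET request to api.php and decode the JSON answer."""
    url = API + "?" + urllib.parse.urlencode({**params, "format": "json"})
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def guess_types(term: str, fetch: Callable[[dict], dict] = http_get_json,
                language: str = "en") -> list[str]:
    """Text in, P31 ('instance of') item IDs of the top search hit out."""
    # Call 1: wbsearchentities maps the text to candidate items.
    hits = fetch({"action": "wbsearchentities", "search": term,
                  "language": language}).get("search", [])
    if not hits:
        return []
    qid = hits[0]["id"]  # naive: trust the API's first-ranked hit
    # Call 2: wbgetentities returns that item's statements.
    entity = fetch({"action": "wbgetentities", "ids": qid,
                    "props": "claims"})["entities"][qid]
    types = []
    for statement in entity.get("claims", {}).get("P31", []):
        snak = statement.get("mainsnak", {})
        value = snak.get("datavalue", {}).get("value", {})
        if snak.get("snaktype") == "value" and isinstance(value, dict) and "id" in value:
            types.append(value["id"])
    return types
```

The result is a list of item IDs, not words like "film"; getting readable names needs one more wbgetentities call with `props=labels` for those IDs.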


As for Siri-like services, an early prototype called "wiri" by Magnus Manske has been around for a while. It uses very simple templates: https://tools.wmflabs.org/magnus-toolserver/thetalkpage/

Bene* is working on a more advanced approach to natural-language question answering; see the Platypus demo: https://projetpp.github.io/demo.html

Just yesterday, he presented a new prototype, developed together with Tpt, that generates SPARQL queries from natural-language input: https://tools.wmflabs.org/ppp-sparql/

All of these projects are open source and created by enthusiastic volunteers. Take a look at the code and talk to them. :)
