Using Freebase vectors with gensim

I am trying to use the Freebase word vectors released by Google, but I am having a hard time getting the actual words behind the Freebase identifiers.

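from gensim.models import KeyedVectors  # gensim >= 1.0 loads this format via KeyedVectors
model = KeyedVectors.load_word2vec_format('freebase-vectors-skipgram1000.bin', binary=True)
list(model.vocab.keys())[:10]  # gensim 4 removed .vocab; use model.index_to_key[:10] there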

Out[22]:
[u'/m/026tg5z',
 u'/m/018jz8',
 u'/m/04klsk',
 u'/m/08gd39',
 u'/m/0kt94',
 u'/m/05mtf0t',
 u'/m/05tjjb',
 u'/m/01m3vn',
 u'/m/0h7p35',
 u'/m/03ggvg3']

      

Does anyone know if there is a table that maps these Freebase IDs to the words they represent?

Regards,

Hedi



2 answers


Someone really did a nice thing for all of us and mapped the IDs to names in a pretrained model. You can download this model here.

from gensim.models import KeyedVectors

# In gensim >= 1.0, load_word2vec_format is a KeyedVectors method
model = KeyedVectors.load_word2vec_format('freebase-vectors-skipgram1000-en.bin.gz',
                                          binary=True)

      



Note the additional -en before .bin in the file name. Here are some sample words:

>>> list(model.vocab.keys())[:10] 
['/en/the_final_country', '/en/independent_curators_international', 
'/en/coney_reyes', '/en/scalr', '/en/everyman_palace_theatre', 
'/m/0g55w3s', '/en/waltershausen', '/en/river_frome_stroud', 
'/en/grzegorz_turnau']
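Because many keys in this model are Freebase /en/ keys, a rough display name can often be read straight off the key. Below is a minimal sketch that assumes only the key shapes in the sample above; the helper readable_name is hypothetical, it does not restore capitalization or punctuation, and bare MIDs such as /m/0g55w3s still need an external lookup.

```python
def readable_name(key):
    """Turn a Freebase '/en/...' key into a rough display string.

    Hypothetical helper: it only reformats the key text, so it returns
    None for bare MIDs such as '/m/0g55w3s'.
    """
    prefix = "/en/"
    if not key.startswith(prefix):
        return None  # a MID; it needs an external lookup
    return key[len(prefix):].replace("_", " ")


sample_keys = ["/en/coney_reyes", "/m/0g55w3s", "/en/river_frome_stroud"]
for key in sample_keys:
    print(key, "->", readable_name(key))
```

With gensim up to 3.x, something like {k: readable_name(k) for k in model.vocab} would apply it to the whole vocabulary.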

      



These strings are Freebase identifiers, specifically MIDs (machine IDs), not names. You can look up their names with the Freebase MQLRead or Search API, and the names are also included in the Freebase data dumps.
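Since Google has retired the Freebase web APIs, the data dumps are the more durable route. The sketch below builds a MID-to-name dictionary from the RDF dump, assuming its tab-separated N-Triples layout in which names appear as type.object.name triples, e.g. <http://rdf.freebase.com/ns/m.026tg5z>, <http://rdf.freebase.com/ns/type.object.name>, "Jack Gold"@en. The function names and the dump path are hypothetical placeholders, not part of any library.

```python
import gzip

NS = "<http://rdf.freebase.com/ns/"
NAME_PREDICATE = NS + "type.object.name>"


def mid_to_name(lines, lang="en"):
    """Map MIDs such as '/m/026tg5z' to names found in Freebase dump lines.

    Assumes tab-separated triples: subject, predicate, object, '.'.
    N-Triples escape sequences inside names are left undecoded.
    """
    names = {}
    suffix = '"@' + lang
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3 or parts[1] != NAME_PREDICATE:
            continue  # not a name triple
        subject, obj = parts[0], parts[2]
        if not subject.startswith(NS + "m.") or not obj.endswith(suffix):
            continue  # not a MID, or a name in another language
        # '<http://rdf.freebase.com/ns/m.026tg5z>' -> '/m/026tg5z'
        mid = "/m/" + subject[len(NS) + 2:-1]
        # '"Jack Gold"@en' -> 'Jack Gold'
        names[mid] = obj[1:-len(suffix)]
    return names


def load_names(path, lang="en"):
    """Stream a gzipped dump (hypothetical path) through mid_to_name."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return mid_to_name(f, lang)
```

Something like names = load_names('freebase-rdf-latest.gz') followed by names.get('/m/026tg5z') would then resolve the first ID from the question, assuming that is the file name of your downloaded dump.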

The first identifier in your example represents British filmmaker Jack Gold. https://www.freebase.com/m/026tg5z



This API call returns JSON with its name:

https://www.googleapis.com/freebase/v1/mqlread?query=[{"id":"/m/026tg5z","name":null}] 
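For reference, the same query can be built programmatically. In MQL, a property given as null is filled in by the response, so this query asks for the name of the given id. This is only a sketch: the endpoint has been shut down, so the URL is for illustration, and the helper mqlread_url is hypothetical.

```python
import json
import urllib.parse

# Endpoint used in the answer; it no longer serves requests.
MQLREAD_URL = "https://www.googleapis.com/freebase/v1/mqlread"


def mqlread_url(mid):
    """Build an MQLRead URL asking for the name of one Freebase MID."""
    # A null value tells MQL to fill that property in the response.
    query = [{"id": mid, "name": None}]
    params = urllib.parse.urlencode({"query": json.dumps(query)})
    return MQLREAD_URL + "?" + params


print(mqlread_url("/m/026tg5z"))
```

Encoding the query with urlencode avoids relying on the browser to escape the brackets, braces and quotes in the raw URL.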

      
