Using freebase vectors with gensim

Question


I am trying to use the Freebase word embeddings released by Google, but I am having a hard time getting the words from the Freebase names.

model = gensim.models.Word2Vec.load_word2vec_format('freebase-vectors-skipgram1000.bin',binary=True)
model.vocab.keys()[:10]

Out[22]:
[u'/m/026tg5z',
 u'/m/018jz8',
 u'/m/04klsk',
 u'/m/08gd39',
 u'/m/0kt94',
 u'/m/05mtf0t',
 u'/m/05tjjb',
 u'/m/01m3vn',
 u'/m/0h7p35',
 u'/m/03ggvg3']

Does anyone know if there is a table that maps the Freebase identifiers to the words they represent?

Regards,

Hedi


python gensim word2vec freebase

HediBY May 27 '15 at 10:36


2 answers

Jason · Answer 1 · 2016-08-10T21:27:07+0000

Someone did a nice thing for all of us and mapped the IDs to names in a pretrained model. You can download this model here.

from gensim.models import Word2Vec
model = Word2Vec.load_word2vec_format('freebase-vectors-skipgram1000-en.bin.gz',
                                       binary=True)

Note the additional -en before .bin. Then some sample words:

>>> list(model.vocab.keys())[:10] 
['/en/the_final_country', '/en/independent_curators_international', 
'/en/coney_reyes', '/en/scalr', '/en/everyman_palace_theatre', 
'/m/0g55w3s', '/en/waltershausen', '/en/river_frome_stroud', 
'/en/grzegorz_turnau']
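
The sample above mixes readable /en/ keys with raw /m/ MIDs, so it can help to separate the two before working with them. The following is a minimal sketch, assuming the keys are plain strings like those shown. `split_keys` and `readable_label` are hypothetical helper names, not part of gensim, and reading the text after /en/ as an underscore-separated name is an assumption based only on the sample output.

```python
def split_keys(keys):
    """Partition vocabulary keys into readable /en/ keys and raw /m/ MIDs."""
    named, mids = [], []
    for key in keys:
        if key.startswith('/en/'):
            named.append(key)
        elif key.startswith('/m/'):
            mids.append(key)
    return named, mids


def readable_label(key):
    """Turn '/en/coney_reyes' into 'coney reyes' (assumed naming convention)."""
    return key[len('/en/'):].replace('_', ' ')


# A few keys copied from the sample output above.
sample = ['/en/the_final_country', '/en/coney_reyes', '/m/0g55w3s']
named, mids = split_keys(sample)
print([readable_label(k) for k in named])
print(mids)
```

With a loaded model, the keys would come from model.vocab (or model.key_to_index in gensim 4.x) instead of the hand-written list.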

Tom Morris · Answer 2 · 2015-06-10T16:00:58+0000

These strings are Freebase identifiers, specifically MIDs, not names. You can look up their names using the Freebase MQLRead or Search APIs, and they are also included in the Freebase data dumps.
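
For the data dump route, a MID-to-name table could be built roughly as follows. This is a sketch under stated assumptions: it assumes the dump is a set of tab-separated RDF triples, that names are stored under a `type.object.name` predicate in the `http://rdf.freebase.com/ns/` namespace, and that MIDs appear as `m.026tg5z` rather than `/m/026tg5z`. The function `mid_to_name` is a hypothetical helper, the sample lines are hand-written for illustration, and escaped quotes inside names are not handled.

```python
NS = 'http://rdf.freebase.com/ns/'               # assumed namespace prefix
NAME_PREDICATE = '<' + NS + 'type.object.name>'  # assumed name predicate
MID_PREFIX = '<' + NS + 'm.'                     # assumed form of a MID subject


def mid_to_name(lines, lang='en'):
    """Build {'/m/...': name} from tab-separated triple lines."""
    names = {}
    suffix = '"@' + lang
    for line in lines:
        parts = line.rstrip('\n').split('\t')
        if len(parts) < 3 or parts[1] != NAME_PREDICATE:
            continue  # not a name triple
        subject, obj = parts[0], parts[2]
        if not subject.startswith(MID_PREFIX):
            continue  # subject is not a MID
        if not (obj.startswith('"') and obj.endswith(suffix)):
            continue  # literal is in another language
        mid = '/m/' + subject[len(MID_PREFIX):-1]
        names[mid] = obj[1:-len(suffix)]
    return names


# Hand-written lines in the assumed format, for illustration only.
sample_lines = [
    '<http://rdf.freebase.com/ns/m.026tg5z>\t'
    '<http://rdf.freebase.com/ns/type.object.name>\t"Jack Gold"@en\t.\n',
    '<http://rdf.freebase.com/ns/m.026tg5z>\t'
    '<http://rdf.freebase.com/ns/type.object.type>\t'
    '<http://rdf.freebase.com/ns/people.person>\t.\n',
]
print(mid_to_name(sample_lines))
```

For a real dump, the lines could be streamed from a compressed file opened with gzip.open(path, 'rt', encoding='utf-8'), so the whole file never has to fit in memory.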

The first identifier in your example represents British filmmaker Jack Gold. https://www.freebase.com/m/026tg5z

This API call returns JSON with its name:

https://www.googleapis.com/freebase/v1/mqlread?query=[{"id":"/m/026tg5z","name":null}]
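
The same request could be put together from Python. The sketch below only builds the URL and parses a response body; it makes no network request. `build_mqlread_url` and `extract_name` are hypothetical helper names, and the response shape (a top-level "result" list of objects with "id" and "name") is an assumption rather than something shown in the answer.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the URL in the answer above.
MQLREAD_URL = 'https://www.googleapis.com/freebase/v1/mqlread'


def build_mqlread_url(mid):
    """Return the mqlread URL asking for the name of one MID."""
    query = [{'id': mid, 'name': None}]  # None serializes to JSON null
    return MQLREAD_URL + '?' + urlencode({'query': json.dumps(query)})


def extract_name(response_text):
    """Pull the name out of a response body (assumed shape, see above)."""
    result = json.loads(response_text).get('result') or []
    return result[0]['name'] if result else None


print(build_mqlread_url('/m/026tg5z'))

# Hand-written response body in the assumed shape, for illustration only.
fake_body = '{"result": [{"id": "/m/026tg5z", "name": "Jack Gold"}]}'
print(extract_name(fake_body))
```

Using urlencode keeps the brackets, braces, and quotes in the query properly escaped, which is easy to get wrong when the URL is assembled by hand.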
