What does the TensorFlow bow_encoder do / return?

Can someone explain what the TensorFlow BoW encoder does and returns? I would expect a vector of word counts per document (as in sklearn), but it seems to do something a bit quirkier.

In this example:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/learn/text_classification.py

features = encoders.bow_encoder(
    features, vocab_size=n_words, embed_dim=EMBEDDING_SIZE)

embed_dim is passed in, and I also don't understand what it does in the context of BoW encoding. The documentation is unfortunately not very helpful. I could try to work through the TensorFlow code, but I would appreciate a high-level explanation.
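For reference, this is the kind of per-document word-count output I had in mind from sklearn (a minimal CountVectorizer example):

from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat", "the cat sat on the mat"]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

print(sorted(vectorizer.vocabulary_))  # ['cat', 'mat', 'on', 'sat', 'the']
print(counts.toarray())
# [[1 0 0 1 1]
#  [1 1 1 1 2]]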



1 answer


In the classical BoW model, each word is represented by an ID, i.e. a sparse one-hot vector. bow_encoder maps these sparse vectors into a dense layer whose size is specified by embed_dim. bow_encoder is used to learn a dense vector representation for a word or a text (as in the word2vec model).
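In other words, embed_dim sets the width of a learned lookup table with one row per vocabulary entry. A minimal sketch of that mapping (random weights standing in for learned ones, names are illustrative):

import numpy as np

vocab_size, embed_dim = 1000, 50    # embed_dim as passed to bow_encoder

# Learned embedding matrix: one dense row of length embed_dim per word id.
embeddings = np.random.randn(vocab_size, embed_dim).astype(np.float32)

word_id = 42                        # the "sparse" representation is just an id
dense_vector = embeddings[word_id]  # the dense representation, shape (50,)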

From the TensorFlow documentation for bow_encoder: "Maps a sequence of symbols into a vector per example by averaging embeddings."



Thus: if the input to bow_encoder is a single word, it is simply mapped to its embedding. If the input is a sentence (or longer text), each word is embedded individually and the final vector is the average of the word embeddings.
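As a sketch of that computation in TensorFlow 1.x style (illustrative shapes and variable names, not the actual bow_encoder internals):

import tensorflow as tf

vocab_size, embed_dim = 1000, 50

# Word ids per document, padded to a common length: [batch, seq_len].
word_ids = tf.placeholder(tf.int64, shape=[None, None])

# One trainable dense vector per vocabulary entry.
embeddings = tf.get_variable("embeddings", [vocab_size, embed_dim])

# Look up each word's embedding: [batch, seq_len, embed_dim].
word_vectors = tf.nn.embedding_lookup(embeddings, word_ids)

# Average over the sequence: one [batch, embed_dim] vector per document.
doc_vectors = tf.reduce_mean(word_vectors, axis=1)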
