Theano graph computation is slower than NumPy
I am learning to use Theano. I want to populate a term-document matrix (a sparse matrix) by calculating a binary TF-IDF for each element in it:
import theano
import theano.tensor as T
import numpy as np
from time import perf_counter
def tfidf_gpu(appearance_in_documents, num_documents, document_words):
    start = perf_counter()
    APP = T.scalar('APP', dtype='int32')
    N = T.scalar('N', dtype='int32')
    SF = T.scalar('S', dtype='int32')
    F = (T.log(N) - T.log(APP)) / SF
    TFIDF = theano.function([N, APP, SF], F)
    ret = TFIDF(num_documents, appearance_in_documents, document_words)
    end = perf_counter()
    print("\nTFIDF_GPU ", end - start, " secs.")
    return ret

def tfidf_cpu(appearance_in_documents, num_documents, document_words):
    start = perf_counter()
    tfidf = (np.log(num_documents) - np.log(appearance_in_documents)) / document_words
    end = perf_counter()
    print("TFIDF_CPU ", end - start, " secs.\n")
    return tfidf
But the NumPy version is much faster than the Theano implementation:
Progress 1/43
TFIDF_GPU 0.05702276699594222 secs.
TFIDF_CPU 1.454801531508565e-05 secs.
Progress 2/43
TFIDF_GPU 0.023830442980397493 secs.
TFIDF_CPU 1.1073017958551645e-05 secs.
Progress 3/43
TFIDF_GPU 0.021920352999586612 secs.
TFIDF_CPU 1.0738993296399713e-05 secs.
Progress 4/43
TFIDF_GPU 0.02303648801171221 secs.
TFIDF_CPU 1.1675001587718725e-05 secs.
Progress 5/43
TFIDF_GPU 0.02359767400776036 secs.
TFIDF_CPU 1.4385004760697484e-05 secs.
....
I read that it could be due to overhead, which for small operations can lead to poor performance.
Is my code bad or should I avoid using the GPU due to overhead?
The problem is that you compile your Theano function on every call, and compilation takes time. Compile it once and pass the compiled function in, like this:
def tfidf_gpu(appearance_in_documents, num_documents, document_words, TFIDF):
    start = perf_counter()
    ret = TFIDF(num_documents, appearance_in_documents, document_words)
    end = perf_counter()
    print("\nTFIDF_GPU ", end - start, " secs.")
    return ret

APP = T.scalar('APP', dtype='int32')
N = T.scalar('N', dtype='int32')
SF = T.scalar('S', dtype='int32')
F = (T.log(N) - T.log(APP)) / SF
TFIDF = theano.function([N, APP, SF], F)
tfidf_gpu(appearance_in_documents, num_documents, document_words, TFIDF)
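If threading the compiled function through every call site is inconvenient, one alternative is to cache it so that compilation still happens only once. The following is a minimal sketch, not the only way to do this: `get_tfidf_function` is a hypothetical helper name, the graph is the same one as above, and `functools.lru_cache` is used only as a simple "build once" mechanism.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_tfidf_function():
    """Build and compile the Theano graph once; later calls return the cached function."""
    # Imported lazily so the import and compile costs are paid on first use only.
    import theano
    import theano.tensor as T

    APP = T.scalar('APP', dtype='int32')
    N = T.scalar('N', dtype='int32')
    SF = T.scalar('SF', dtype='int32')
    F = (T.log(N) - T.log(APP)) / SF
    return theano.function([N, APP, SF], F)


def tfidf_gpu(appearance_in_documents, num_documents, document_words):
    # The first call compiles the graph; every later call reuses the compiled function.
    TFIDF = get_tfidf_function()
    return TFIDF(num_documents, appearance_in_documents, document_words)
```

With this variant, only the first call to `tfidf_gpu` should include the compilation time, so it is worth timing the first call separately from the rest.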
Also, your TF-IDF computation is bandwidth-bound rather than compute-bound. Theano, and GPUs in general, are best suited to compute-intensive work.
For this task, moving the data to and from the GPU is a significant overhead, because each element is only read O(1) times. If you do more computation per element, using a GPU starts to make sense.
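To make the "each element is read O(1) times" point concrete, here is a minimal NumPy sketch that computes the same formula for many terms in one call instead of one scalar at a time. It assumes the per-term counts are already available as arrays; `tfidf_batch`, `appearances` and `words` are hypothetical names and the numbers are made up for illustration.

```python
import numpy as np


def tfidf_batch(appearance_in_documents, num_documents, document_words):
    """Binary TF-IDF for many terms at once: (log N - log APP) / SF."""
    app = np.asarray(appearance_in_documents, dtype=np.float64)
    sf = np.asarray(document_words, dtype=np.float64)
    # Each element is read once and takes one log, one subtraction and one
    # division, so the work per element is tiny compared to moving the data.
    return (np.log(num_documents) - np.log(app)) / sf


# Hypothetical inputs: three terms in a corpus of 100 documents.
appearances = np.array([1, 10, 100])
words = np.array([5, 5, 10])
scores = tfidf_batch(appearances, 100, words)
print(scores)
```

Batching like this removes the per-call overhead on the CPU. The same arithmetic on a GPU would still have to pay for the transfer in both directions.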