What is the difference between `lda[doc_bow]` and `lda.inference(corpus)`?
I ran some tests where my LDA model has 8 topics. Here are my results for 2 unseen docs used to predict a topic:
list_unseenTw=[['hope', 'miley', 'blow', 'peopl', 'mind', 'tonight', 'gain', 'million', 'fan'],['@mileycyrustour', "we'r", 'think', "it'", 'pretti', 'cool', 'miley', 'saturday', 'night', 'live', 'tonight', '#prettycool']]
-
Prediction with `lda[doc_bow]` (it already gives a probability for each topic):
doc_bow = [dictionary.doc2bow(text) for text in list_unseenTw]
predictions = ldamodel[doc_bow]
predictions[0]: [(0, 0.02509002728802024), (1, 0.0250114373070437), (2, 0.025040162139306051), (3, 0.82462688228515812), (4, 0.025150924341817767), (5, 0.025000027675139792), (6, 0.025000024127660267), (7, 0.025080514835853926)]
predictions[1]: [(0, 0.031250011319462589), (1, 0.031250013721820222), (2, 0.031250019639505598), (3, 0.031250015093378707), (4, 0.031250019670816337), (5, 0.0312500248607396), (6, 0.78124988084026048), (7, 0.031250014854016454)]
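For reference, picking the predicted topic out of one of these rows is just a max over the (topic_id, probability) pairs. A minimal sketch, reusing the `predictions[0]` values printed above (plain Python, no gensim needed):

```python
# Topic distribution for doc1, copied from the ldamodel[doc_bow] output above.
predictions_0 = [
    (0, 0.02509002728802024), (1, 0.0250114373070437),
    (2, 0.025040162139306051), (3, 0.82462688228515812),
    (4, 0.025150924341817767), (5, 0.025000027675139792),
    (6, 0.025000024127660267), (7, 0.025080514835853926),
]

# The predicted topic is the pair with the highest probability.
best_topic, best_prob = max(predictions_0, key=lambda pair: pair[1])
print(best_topic, best_prob)  # → 3 0.82462688228515812
```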
-
Prediction with `ldamodel.inference` (results are weights, not probabilities):
pred = ldamodel.inference(doc_bow)
print(pred)
(array([[0.12545023, 0.1250572, 0.12520085, 4.12309694, 0.12579184, 0.12500014, 0.12500012, 0.12540268],
       [0.12500005, 0.12500005, 0.12500008, 0.12500006, 0.12500008, 0.1250001, 3.12499952, 0.12500006]]), None)
As you can see, the top topic for the first prediction (doc1) is the same (topic 3), and normalizing the weights by their sum reproduces the probability:
total = 0
for w in pred[0][0]:
    total += w
print(4.12309694 / total)  # ≈ 0.82462, same as in predictions[0]
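The same normalization works for every row at once: dividing each weight by its row sum recovers the probabilities that `lda[doc_bow]` reports. A small sketch using the gamma values printed above (plain Python, no gensim needed):

```python
# Gamma weights for both docs, copied from the ldamodel.inference(doc_bow) output above.
gamma = [
    [0.12545023, 0.1250572, 0.12520085, 4.12309694,
     0.12579184, 0.12500014, 0.12500012, 0.12540268],
    [0.12500005, 0.12500005, 0.12500008, 0.12500006,
     0.12500008, 0.1250001, 3.12499952, 0.12500006],
]

# Dividing each weight by its row sum turns weights into probabilities.
topic_probs = [[w / sum(row) for w in row] for row in gamma]
print(topic_probs[0][3])  # ≈ 0.8246, matching predictions[0]
print(topic_probs[1][6])  # ≈ 0.7812, matching predictions[1]
```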