Sentiment Analysis Using OpenNLP

I am using Apache OpenNLP to analyze sentiment on Yammer threads. The idea is to classify each conversation as positive, negative, or neutral. A conversation can be a single sentence or a group of sentences.

I have two models: a short sentence classification model and a long sentence classification model. The short model is trained on shorter sentences (fewer than 10 words) with a cutoff of 2, and the long model is trained on longer sentences with a cutoff of 5.
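Roughly, the training code looks like this (a simplified sketch using the OpenNLP 1.9-style API; the file name is just an example, and the file uses the standard doccat format of one sample per line, label first):

    import java.io.File;
    import java.nio.charset.StandardCharsets;

    import opennlp.tools.doccat.DoccatFactory;
    import opennlp.tools.doccat.DoccatModel;
    import opennlp.tools.doccat.DocumentCategorizerME;
    import opennlp.tools.doccat.DocumentSample;
    import opennlp.tools.doccat.DocumentSampleStream;
    import opennlp.tools.util.MarkableFileInputStreamFactory;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;
    import opennlp.tools.util.TrainingParameters;

    public class TrainSentimentModel {
        public static void main(String[] args) throws Exception {
            // One "label text..." sample per line, e.g. "positive great job team"
            ObjectStream<String> lines = new PlainTextByLineStream(
                    new MarkableFileInputStreamFactory(new File("short-sentences.train")),
                    StandardCharsets.UTF_8);
            ObjectStream<DocumentSample> samples = new DocumentSampleStream(lines);

            TrainingParameters params = TrainingParameters.defaultParams();
            params.put(TrainingParameters.CUTOFF_PARAM, "2"); // 2 for the short model, 5 for the long one

            DoccatModel model = DocumentCategorizerME.train(
                    "en", samples, params, new DoccatFactory());
            // model.serialize(...) to save, then repeat with the long-sentence data
        }
    }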

Here is my approach:

  • Read every conversation.
  • Clean it up to remove HTTP URLs, special characters, add a space after a period, etc.
  • Use SentenceDetector to split the conversation into sentences.
  • Classify each sentence. If the sentence is short, the short sentence classification model is invoked; otherwise, the long sentence classification model is invoked. The result of classifying a sentence is positive, negative, or neutral.
  • Aggregate the sentence-level classifications, i.e., if most sentences are positive, negative, or neutral, classify the conversation accordingly. (A sketch of this pipeline follows the list.)
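The pipeline looks roughly like this (a simplified sketch: the cleanup regexes are illustrative rather than my exact ones, it assumes OpenNLP 1.8+ where the categorizer takes a token array, and it uses whitespace tokenization):

    import java.util.HashMap;
    import java.util.Map;

    import opennlp.tools.doccat.DoccatModel;
    import opennlp.tools.doccat.DocumentCategorizerME;
    import opennlp.tools.sentdetect.SentenceDetectorME;
    import opennlp.tools.sentdetect.SentenceModel;
    import opennlp.tools.tokenize.WhitespaceTokenizer;

    public class ConversationSentiment {

        private static final int SHORT_LIMIT = 10; // words; boundary between the two models

        private final SentenceDetectorME sentenceDetector;
        private final DocumentCategorizerME shortCategorizer;
        private final DocumentCategorizerME longCategorizer;

        public ConversationSentiment(SentenceModel sentModel,
                                     DoccatModel shortModel, DoccatModel longModel) {
            this.sentenceDetector = new SentenceDetectorME(sentModel);
            this.shortCategorizer = new DocumentCategorizerME(shortModel);
            this.longCategorizer = new DocumentCategorizerME(longModel);
        }

        // Step 2: strip URLs and special characters, ensure a space after each period
        private static String clean(String text) {
            return text.replaceAll("https?://\\S+", " ")
                       .replaceAll("[^\\p{L}\\p{N}.!?' ]", " ")
                       .replaceAll("\\.(?=\\S)", ". ")
                       .replaceAll("\\s+", " ")
                       .trim();
        }

        // Steps 3-5: split into sentences, classify each with the right model, majority vote
        public String classify(String conversation) {
            Map<String, Integer> votes = new HashMap<>();
            for (String sentence : sentenceDetector.sentDetect(clean(conversation))) {
                String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize(sentence);
                DocumentCategorizerME categorizer =
                        tokens.length < SHORT_LIMIT ? shortCategorizer : longCategorizer;
                String label = categorizer.getBestCategory(categorizer.categorize(tokens));
                votes.merge(label, 1, Integer::sum);
            }
            return votes.entrySet().stream()
                    .max(Map.Entry.comparingByValue())
                    .map(Map.Entry::getKey)
                    .orElse("neutral");
        }
    }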

I have a couple of questions related to this approach:

  • Do I need two models, a short sentence model and a long sentence model? The reason I split them is that the cutoff for shorter sentences and longer sentences is different.
  • Is it good to follow a sentence-level classification approach and then aggregate the per-sentence results to get the conversation result?
  • Is there a standard / better approach to this problem?

1 answer


I think your approach is reasonable ... trying to create sentiment models on large chunks of text is problematic, so a sentence-based approach seems like a good idea to me.

For the long and short sentence models, this also seems like a good idea, assuming there is a fairly large difference between the content of short sentences and long sentences ("usually"). You might also consider a different feature generator for the longer sentence model ... sometimes ngrams (word bigrams) work well to help contextualize content a little more than a plain bag of words would.
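For example (a sketch assuming OpenNLP 1.9+, where DoccatFactory accepts an array of feature generators), you could add word bigrams to the factory used to train the long-sentence model:

    import opennlp.tools.doccat.BagOfWordsFeatureGenerator;
    import opennlp.tools.doccat.DoccatFactory;
    import opennlp.tools.doccat.FeatureGenerator;
    import opennlp.tools.doccat.NGramFeatureGenerator;

    // Bag of words plus word bigrams; pass this factory to
    // DocumentCategorizerME.train(...) when building the long-sentence model.
    static DoccatFactory bigramFactory() throws Exception {
        FeatureGenerator[] generators = {
                new BagOfWordsFeatureGenerator(),
                new NGramFeatureGenerator(2, 2) // word bigrams only
        };
        return new DoccatFactory(generators);
    }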



In terms of output, a simple rollup by vote count may be tricky to normalize because of the unknown number of sentences in each thread, so I would consider generating basic statistics (min, max, sum, avg, stdev, mode) over the scores of each class from each model. That lets you ask better-quality questions of the results (for example, you could write the results into a search index, which would allow for multiple use cases).
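Something like this sketch (reusing the question's categorizer and whitespace tokenization; java.util.DoubleSummaryStatistics covers min/max/sum/avg, and stdev/mode would need an extra pass):

    import java.util.DoubleSummaryStatistics;
    import java.util.HashMap;
    import java.util.Map;

    import opennlp.tools.doccat.DocumentCategorizerME;
    import opennlp.tools.tokenize.WhitespaceTokenizer;

    // Per-class score statistics over all sentences in one thread.
    static Map<String, DoubleSummaryStatistics> perClassStats(
            DocumentCategorizerME categorizer, String[] sentences) {
        Map<String, DoubleSummaryStatistics> stats = new HashMap<>();
        for (String sentence : sentences) {
            double[] scores = categorizer.categorize(
                    WhitespaceTokenizer.INSTANCE.tokenize(sentence));
            for (int i = 0; i < categorizer.getNumberOfCategories(); i++) {
                stats.computeIfAbsent(categorizer.getCategory(i),
                        k -> new DoubleSummaryStatistics()).accept(scores[i]);
            }
        }
        return stats; // e.g. stats.get("positive").getAverage(), .getMax(), .getMin()
    }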

HTH
