Text analysis. Unable to write Python program output to csv or xls file.
Hi, I am trying to do sentiment analysis using a Naive Bayes classifier in Python 2.x. It is trained from a txt file and then classifies the input as positive or negative based on the contents of that sample txt file. I want the output to be in the same form as the input. For example, I have a text file containing 1000 raw sentences, and I want the output to show a positive or negative label for each of them. Please help. Below is the code I am using:
import math
import string

def Naive_Bayes_Classifier(positive, negative, total_negative, total_positive, test_string):
    y_values = [0,1]
    prob_values = [None, None]
    for y_value in y_values:
        posterior_prob = 1.0
        for word in test_string.split():
            word = word.lower().translate(None,string.punctuation).strip()
            if y_value == 0:
                if word not in negative:
                    posterior_prob *= 0.0
                else:
                    posterior_prob *= negative[word]
            else:
                if word not in positive:
                    posterior_prob *= 0.0
                else:
                    posterior_prob *= positive[word]
        if y_value == 0:
            prob_values[y_value] = posterior_prob * float(total_negative) / (total_negative + total_positive)
        else:
            prob_values[y_value] = posterior_prob * float(total_positive) / (total_negative + total_positive)
    total_prob_values = 0
    for i in prob_values:
        total_prob_values += i
    for i in range(0,len(prob_values)):
        prob_values[i] = float(prob_values[i]) / total_prob_values
    print prob_values
    if prob_values[0] > prob_values[1]:
        return 0
    else:
        return 1

if __name__ == '__main__':
    sentiment = open(r'C:/Users/documents/sample.txt')
    #Preprocessing of training set
    vocabulary = {}
    positive = {}
    negative = {}
    training_set = []
    TOTAL_WORDS = 0
    total_negative = 0
    total_positive = 0
    for line in sentiment:
        words = line.split()
        y = words[-1].strip()
        y = int(y)
        if y == 0:
            total_negative += 1
        else:
            total_positive += 1
        for word in words:
            word = word.lower().translate(None,string.punctuation).strip()
            if word not in vocabulary and word.isdigit() is False:
                vocabulary[word] = 1
                TOTAL_WORDS += 1
            elif word in vocabulary:
                vocabulary[word] += 1
                TOTAL_WORDS += 1
            #Training
            if y == 0:
                if word not in negative:
                    negative[word] = 1
                else:
                    negative[word] += 1
            else:
                if word not in positive:
                    positive[word] = 1
                else:
                    positive[word] += 1
    for word in vocabulary.keys():
        vocabulary[word] = float(vocabulary[word])/TOTAL_WORDS
    for word in positive.keys():
        positive[word] = float(positive[word])/total_positive
    for word in negative.keys():
        negative[word] = float(negative[word])/total_negative
    test_string = raw_input("Enter the review: \n")
    classifier = Naive_Bayes_Classifier(positive, negative, total_negative, total_positive, test_string)
    if classifier == 0:
        print "Negative review"
    else:
        print "Positive review"
I checked out the GitHub repository you posted in the comments. I tried to run the project, but I ran into some errors.
Anyway, I looked at the project structure and at the file used to train the Naive Bayes algorithm, and I think the following piece of code can be used to write your results to an Excel file:
import xlsxwriter

data_to_be_written = []
with open("test11.txt") as f:
    for line in f:
        line = line.strip()
        classifier = Naive_Bayes_Classifier(positive, negative, total_negative, total_positive, line)
        # The classifier returns 0 for negative and 1 for positive.
        result = 'Negative' if classifier == 0 else 'Positive'
        data_to_be_written.append((line, result))

# Create a workbook and add a worksheet.
# xlsxwriter produces the .xlsx format, so the file is named accordingly.
workbook = xlsxwriter.Workbook('test.xlsx')
worksheet = workbook.add_worksheet()

# Start from the first cell. Rows and columns are zero indexed.
row = 0
col = 0

# Iterate over the data and write it out row by row.
for item, cost in data_to_be_written:
    worksheet.write(row, col, item)
    worksheet.write(row, col + 1, cost)
    row += 1

workbook.close()
In short, for each line of the file containing the sentences to be tested, I call the classifier and build a structure holding the results to be written to the output file.
Then I loop over that structure and write the Excel file.
For this, I used a Python package named xlsxwriter.
As I said, I had some problems running the project, so this code is untested. It should work, but if you run into trouble, let me know.
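Since the question also asks about csv, here is a minimal sketch of the same idea using the standard csv module instead of xlsxwriter. It assumes Python 3 (on Python 2 the file would be opened with 'wb' and without the newline argument). The names label_lines, write_csv and toy_classifier are made up for illustration, and toy_classifier is only a stand-in for the trained Naive_Bayes_Classifier. The sketch records 'Unknown' when the classifier raises ZeroDivisionError, which the question's code does when both class scores come out as zero (for example, when a line contains a word missing from both the positive and the negative dictionaries).

```python
import csv


def label_lines(lines, classify):
    """Return (text, label) pairs for every non-empty line.

    `classify` is any callable returning 0 for negative and 1 for
    positive, as Naive_Bayes_Classifier in the question does.
    """
    rows = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        try:
            label = 'Negative' if classify(text) == 0 else 'Positive'
        except ZeroDivisionError:
            # The question's classifier divides by the sum of both class
            # scores, which is zero when neither class matches the line.
            label = 'Unknown'
        rows.append((text, label))
    return rows


def write_csv(rows, path):
    """Write (text, label) pairs to a CSV file with a header row."""
    # newline='' is the Python 3 idiom for csv files.
    with open(path, 'w', newline='') as handle:
        writer = csv.writer(handle)
        writer.writerow(['text', 'sentiment'])
        writer.writerows(rows)


def toy_classifier(text):
    """Hypothetical stand-in for the trained classifier (illustration only)."""
    return 1 if 'good' in text.lower() else 0
```

With the question's variables in scope, toy_classifier could be replaced by something like lambda s: Naive_Bayes_Classifier(positive, negative, total_negative, total_positive, s), and the lines of test11.txt passed to label_lines before calling write_csv with an output name of your choice.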
> import xlsxwriter
>
> data_to_be_written = []
> # test11.txt is opened read-only, so the labelled lines go to a separate
> # file (the name test11_labelled.txt is only a placeholder).
> with open("test11.txt") as f, open("test11_labelled.txt", "w") as out:
>     for line in f:
>         line = line.strip()
>         classifier = Naive_Bayes_Classifier(positive, negative, total_negative, total_positive, line)
>         # The classifier returns 0 for negative and 1 for positive.
>         result = 'Negative' if classifier == 0 else 'Positive'
>         out.write(line + '\t' + result + '\n')
>         data_to_be_written.append((line, result))
>
> # Create a workbook and add a worksheet.
> workbook = xlsxwriter.Workbook('test.xlsx')
> worksheet = workbook.add_worksheet()
>
> # Start from the first cell. Rows and columns are zero indexed.
> row = 0
> col = 0
>
> # Iterate over the data and write it out row by row.
> for item, cost in data_to_be_written:
>     worksheet.write(row, col, item)
>     worksheet.write(row, col + 1, cost)
>     row += 1
>
> workbook.close()