Counting the most used words in a table, dropping stop words
I created a spreadsheet filled with answers that people stated as the first thing that came to their mind when they viewed the photo. I have ~ 1400 entries. Now I want to see what is the most common description.
CREATE TABLE descript (
wordID int NOT NULL AUTO_INCREMENT PRIMARY KEY,
wordText TEXT(50)
)
ENGINE=MyISAM;
INSERT INTO descript VALUES(0,"Big");
INSERT INTO descript VALUES(0,"blue");
INSERT INTO descript VALUES(0,"blue");
INSERT INTO descript VALUES(0,"fast");
INSERT INTO descript VALUES(0,"impressive");
INSERT INTO descript VALUES(0,"big");
INSERT INTO descript VALUES(0,"big");
INSERT INTO descript VALUES(0,"red");
INSERT INTO descript VALUES(0,"his");
INSERT INTO descript VALUES(0,"her");
INSERT INTO descript VALUES(0,"His");
INSERT INTO descript VALUES(0,"Black");
INSERT INTO descript VALUES(0,"black");
INSERT INTO descript VALUES(0,"black");
INSERT INTO descript VALUES(0,"blue");
INSERT INTO descript VALUES(0,"a black");
INSERT INTO descript VALUES(0,"his");
INSERT INTO descript VALUES(0,"her");
INSERT INTO descript VALUES(0,"pleasant");
INSERT INTO descript VALUES(0,"the fast");
INSERT INTO descript VALUES(0,"blue");
and on and on ...
I need to make it a line example that is done with this:
select LOWER(wordText) descript;
How can I tell the most common answer and display it? I have some stop words (which I don't want to include in the number, like "a" or "the". How can I not count them?
+3
source to share
3 answers
Basic query:
SELECT lower(wordText) as word, count(*)
FROM descript
GROUP BY lower(wordText)
ORDER BY count(*) DESC
LIMIT 1;
You can remove stop words with not in
if you want to include them in your query:
SELECT lower(wordText) as word, count(*)
FROM descript
WHERE lower(wordText) not in ('a', 'the', . . . )
GROUP BY lower(wordText)
ORDER BY count(*) DESC
LIMIT 1;
Alternatively, if they are in the table:
SELECT lower(sw.wordText) as word, count(*)
FROM descript d left join
stopwords sw
on d.wordText = sw.word
WHERE sw.word is not null
GROUP BY lower(sw.wordText)
ORDER BY count(*) DESC
LIMIT 1;
You can read about stop words included in MySQL here .
+1
source to share