Counting the most used words in a table, dropping stop words

I created a spreadsheet filled with answers that people stated as the first thing that came to their mind when they viewed the photo. I have ~ 1400 entries. Now I want to see what is the most common description.

CREATE TABLE descript (
wordID int NOT NULL AUTO_INCREMENT PRIMARY KEY,
wordText TEXT(50)
)
ENGINE=MyISAM;

INSERT INTO descript VALUES(0,"Big");
INSERT INTO descript VALUES(0,"blue");
INSERT INTO descript VALUES(0,"blue");
INSERT INTO descript VALUES(0,"fast");
INSERT INTO descript VALUES(0,"impressive");
INSERT INTO descript VALUES(0,"big");
INSERT INTO descript VALUES(0,"big");
INSERT INTO descript VALUES(0,"red");
INSERT INTO descript VALUES(0,"his");
INSERT INTO descript VALUES(0,"her");
INSERT INTO descript VALUES(0,"His");
INSERT INTO descript VALUES(0,"Black");
INSERT INTO descript VALUES(0,"black");
INSERT INTO descript VALUES(0,"black");
INSERT INTO descript VALUES(0,"blue");
INSERT INTO descript VALUES(0,"a black");
INSERT INTO descript VALUES(0,"his");
INSERT INTO descript VALUES(0,"her");
INSERT INTO descript VALUES(0,"pleasant");
INSERT INTO descript VALUES(0,"the fast");
INSERT INTO descript VALUES(0,"blue");

      

and on and on ...

I need to make it a line example that is done with this:

select LOWER(wordText) descript;

      

How can I tell the most common answer and display it? I have some stop words (which I don't want to include in the number, like "a" or "the". How can I not count them?

+3


source to share


3 answers


Basic query:

SELECT lower(wordText) as word, count(*)
FROM descript
GROUP BY lower(wordText)
ORDER BY count(*) DESC
LIMIT 1;

      

You can remove stop words with not in

if you want to include them in your query:

SELECT lower(wordText) as word, count(*)
FROM descript
WHERE lower(wordText) not in ('a', 'the', . . . )
GROUP BY lower(wordText)
ORDER BY count(*) DESC
LIMIT 1;

      



Alternatively, if they are in the table:

SELECT lower(sw.wordText) as word, count(*)
FROM descript d left join
     stopwords sw
     on d.wordText = sw.word
WHERE sw.word is not null
GROUP BY lower(sw.wordText)
ORDER BY count(*) DESC
LIMIT 1;

      

You can read about stop words included in MySQL here .

+1


source


IF you do

SELECT COUNT(LOWER(wordText)) FROM descript GROUP BY LOWER(wordText);

      

you should be able to see how much of each word is there.



You can add

ORDER BY

      

to order them based on the count of each result

0


source


According to getting the most frequent value, you can use this query.

   SELECT wordText, count(*) FROM descript GROUP BY wordText  ORDER BY count(*) DESC LIMIT 1;

      

0


source







All Articles