I need to count the number of different rows in which a word appears.
So far I have:
SELECT
word, count(*)
FROM
(SELECT
regexp_split_to_table(ColDescription, '\s') as word
FROM tblCollection
) a
GROUP BY word
ORDER BY count(*) desc
This produces a good list of all words and how many times each appears in the entire description column.
I also need a way to show how many rows contain each word at least once.
For example, if my data was:
hello hello test
hello test test test
test hi
it would show
word    count   # of rows it appears in
hello   3       2
test    5       3
hi      1       1
I am very much a beginner with databases, so any help is appreciated!
Example table:
CREATE TABLE tblCollection ( ColDescription varchar(500) NOT NULL PRIMARY KEY);
Sample data:
"hello hello test"
"hello test test test"
"test hi"
Each line is its own row.
1 answer
The main obstacle is that your subquery does not store any information about where it found each instance of the word. This is easily fixed:
SELECT
regexp_split_to_table(ColDescription, '\s') as word,
ColDescription
FROM tblCollection
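You now have the source row listed alongside each word, so it's just a matter of counting: count(*) gives the total number of occurrences, and count(distinct ColDescription) gives the number of different rows that contain the word: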
SELECT
word, count(*), count(distinct ColDescription)
FROM
...
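For reference, here is one way the two pieces might be put together. This is a sketch rather than the answer's exact query. The setup statements reuse the table and sample data from the question. The column aliases total_count and rows_containing are made up for illustration. Splitting on '\s+' and filtering out empty strings are extra assumptions, added so that repeated or leading whitespace does not produce blank "words". Counting distinct ColDescription values only equals counting rows because ColDescription is the primary key here and therefore unique.

```sql
-- Setup taken from the question (skip if the table already exists).
CREATE TABLE tblCollection (ColDescription varchar(500) NOT NULL PRIMARY KEY);

INSERT INTO tblCollection (ColDescription) VALUES
    ('hello hello test'),
    ('hello test test test'),
    ('test hi');

-- total_count and rows_containing are illustrative alias names.
SELECT
    word,
    count(*)                       AS total_count,      -- every occurrence of the word
    count(DISTINCT ColDescription) AS rows_containing   -- each source row counted once
FROM (
    SELECT
        -- Assumption: '\s+' treats a run of whitespace as a single separator.
        regexp_split_to_table(ColDescription, '\s+') AS word,
        ColDescription
    FROM tblCollection
) a
WHERE word <> ''   -- drop empty tokens caused by leading/trailing whitespace
GROUP BY word
ORDER BY total_count DESC;
```

With the three sample rows, this should give test 5/3, hello 3/2 and hi 1/1, matching the table in the question. If the description column were not unique, counting a distinct row identifier instead (for example a separate id column, which this table does not have) would be the safer choice.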