I need to count the number of different lines where a word appears

So far I have

SELECT
    word, count(*)
FROM
    (SELECT
            regexp_split_to_table(ColDescription, '\s') as word
    FROM tblCollection
    ) a
GROUP BY word
ORDER BY count(*) desc

      

This produces a list of every word and how many times it appears across the entire description column.

I also need a way to show how many rows contain each word at least once.

For example, if my data was:

hello hello test 
hello test test test
test hi

      

it would show

word    count   # of rows it appears in
hello     3        2
test      5        3
hi        1        1

      

I am very much a beginner with databases, so any help is appreciated!

Example table:

CREATE TABLE tblCollection ( ColDescription varchar(500) NOT NULL PRIMARY KEY);

      

Sample data:

"hello hello test"
"hello test test test"
"test hi"

      

Each line is its own row.



1 answer


The main obstacle is that your subquery does not store any information about where it found each instance of the word. This is easily fixed:

SELECT
  regexp_split_to_table(ColDescription, '\s') as word,
  ColDescription
FROM tblCollection

      



You now have a source field listed along with each word, and it's just a matter of counting them:

SELECT
  word, count(*), count(distinct ColDescription)
FROM
...
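
For reference, here is one way the two pieces above might fit together into a single query. This is only a sketch: the aliases total_count and row_count are made up for illustration, and the '\s+' pattern and the WHERE filter are additions meant to drop the empty strings produced by leading, trailing, or repeated whitespace. Counting distinct ColDescription values works as a row count only because that column is the primary key, so every row's value is unique.

```sql
-- Sketch for PostgreSQL; total_count and row_count are illustrative aliases.
SELECT
  word,
  count(*)                       AS total_count,  -- every occurrence of the word
  count(DISTINCT ColDescription) AS row_count     -- rows containing the word at least once
FROM (
  SELECT
    -- '\s+' treats a run of whitespace as one separator (assumption; the original used '\s')
    regexp_split_to_table(ColDescription, '\s+') AS word,
    ColDescription
  FROM tblCollection
) a
WHERE word <> ''            -- skip empty strings left by leading/trailing whitespace
GROUP BY word
ORDER BY total_count DESC;
```

With the three sample rows from the question, this should give 3 and 2 for hello, 5 and 3 for test, and 1 and 1 for hi.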

      
