How to count occurrences of a substring within a character string in R
I know this is a very basic question, but I've tried a lot and haven't found a way to count the number of occurrences of a given substring within a character string in R.
For example:
str <- "Hello this is devavrata! here, say again hello"
Now I want to find the number of occurrences of hello, ignoring case. In this example, the answer should be 2.
EDIT: When I search for "ello th", str_count returns 1, but I only want to count exact matches that are surrounded by spaces, so in this case the result should be 0.
For example, if I search for "very good" in this line:
It is very good to speak like thevery good
the count should be 1, not 2. I hope that makes sense.
Perhaps the simplest way would be to use str_count from the stringr package:
str <- "Hello this is devavrata! here, say again hello"
library(stringr)
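# ignore.case() is deprecated in current stringr releases; regex(..., ignore_case = TRUE) replaces it
str_count(str, regex("hello", ignore_case = TRUE))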
# [1] 2
Two base R methods:
length(grep("hello", strsplit(str, " ")[[1]], ignore.case = TRUE))
# [1] 2
and
sum(gregexpr("hello", str, ignore.case = TRUE)[[1]] > 0)
# [1] 2
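The approaches above count substring matches, so a search for "ello th" would still return 1. For the whole-word requirement in the question's edit, here is a minimal base R sketch using regex word boundaries. The helper name count_whole is hypothetical, and it assumes a single input string and a phrase with no regex metacharacters:

```r
# Hypothetical helper: count whole-word occurrences of a phrase, ignoring case.
# Assumes `text` is a single string and `phrase` has no regex special characters.
count_whole <- function(text, phrase) {
  # \\b is a word boundary, so "very good" cannot match inside "thevery good"
  pattern <- paste0("\\b", phrase, "\\b")
  hits <- gregexpr(pattern, text, ignore.case = TRUE, perl = TRUE)[[1]]
  # gregexpr returns -1 when nothing matches, so count only positive positions
  sum(hits > 0)
}

count_whole("Hello this is devavrata! here, say again hello", "hello")
# [1] 2
count_whole("It is very good to speak like thevery good", "very good")
# [1] 1
count_whole("Hello this is devavrata! here, say again hello", "ello th")
# [1] 0
```

Word boundaries also treat punctuation and the ends of the string as delimiters, which is slightly more permissive than requiring literal spaces on both sides.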
I'm late to the party, but I think the function termco from the package qdap does exactly what you want. You use leading and/or trailing spaces to control word boundaries, as shown in the example below:
x <- c("Hello this is devavrata! here, say again hello",
"It is very good to speak like thevery good")
library(qdap)
(out <- termco(x, id(x), list("hello", "very good", " very good ")))
##   x word.count     hello very good  very good 
## 1 1          8 2(25.00%)         0          0
## 2 2          9         0 2(22.22%)  1(11.11%)
## To get a data frame of pure counts:
out %>% counts()
##   x word.count hello very good  very good 
## 1 1          8     2         0          0
## 2 2          9     0         2          1
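If installing qdap is not an option, the same space-delimited idea can be approximated in base R. This is only a rough sketch, not qdap's implementation: the helper name count_terms is hypothetical, and it assumes that lowercasing both sides and padding each text with one space at each end is an acceptable stand-in for the boundary handling.

```r
# Hypothetical helper: literal, case-insensitive counts for several terms
# across several texts. Leading/trailing spaces in a term act as boundaries.
count_terms <- function(texts, terms) {
  # Pad with spaces so a space-delimited term can match at either end
  padded <- paste0(" ", tolower(texts), " ")
  out <- matrix(0L, nrow = length(texts), ncol = length(terms),
                dimnames = list(NULL, terms))
  for (j in seq_along(terms)) {
    # fixed = TRUE matches the term literally rather than as a regex
    hits <- gregexpr(tolower(terms[j]), padded, fixed = TRUE)
    out[, j] <- vapply(hits, function(m) sum(m > 0), integer(1))
  }
  out
}

x <- c("Hello this is devavrata! here, say again hello",
       "It is very good to speak like thevery good")
res <- count_terms(x, c("hello", "very good", " very good "))
res[1, ]  # counts for the first string:  2, 0, 0
res[2, ]  # counts for the second string: 0, 2, 1
```

Because the match is literal, a word followed directly by punctuation (for example "hello,") does not match a term written with a trailing space, and two adjacent matches that share a single space are counted once.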