Cut a string at the complete word closest to a specific character position

I am trying to split a vector of strings into two parts (I only want to keep the first bit) based on the following criteria:

  • it must split after the full word (i.e. where the whitespace occurs)
  • it should cut at the space closest to the 12th character

Example:

textvec <- c("this is an example", "I hope someone can help me", "Thank you in advance")

      

The expected result is a vector like this:

"this is an", "I hope someone", "Thank you in"

      

What I've tried so far: I can get full words that occur before or on the 12th character like this:

t13 <- substr(textvec, 1, 13)  # first 13 characters of each string
# position of the last space at or before the 13th character
lastspace <- sapply(gregexpr(" ", t13), function(x) x[length(x)])
result <- substr(t13, start = 1, stop = lastspace - 1)  # cut just before that space

      

But what I want is to include the word closest to the 12th character (e.g. "someone" in the example above), not necessarily one that ends before or on the 12th character. In case of a tie, I would like to include the word after the 12th character. I hope that is clear :)


2 answers


Using cumsum:

sapply(strsplit(textvec, ' '), function(i) paste(i[cumsum(nchar(i)) <= 12], collapse = ' '))

#[1] "this is an"     "I hope someone" "Thank you in"
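
To see why this works on the example, it may help to look at the running totals directly. The sketch below (variable names are illustrative, not from the answer) compares the letters-only totals that the one-liner uses with the word end positions in the original string, assuming words are separated by single spaces.

```r
words <- strsplit("I hope someone can help me", " ")[[1]]

# Running total of word lengths only; the separating spaces are not counted
letters_only <- cumsum(nchar(words))

# End position of each word in the original string (assumes single spaces)
end_pos <- cumsum(nchar(words) + 1) - 1

words[letters_only <= 12]  # "I" "hope" "someone"
words[end_pos <= 12]       # "I" "hope"
```

So the one-liner keeps "someone" because the letters up to and including it total exactly 12, not because the space after it is the one nearest to position 12. On other inputs the two criteria can give different results.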

      



We can use gregexpr to find the space nearest to position 12 and then cut the string with substr:


substr(textvec, 1, sapply(gregexpr("\\s+", textvec), 
            function(x) x[which.min(abs(12 - x))])-1)
#[1] "this is an"     "I hope someone" "Thank you in"  
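
Two cases are not covered by the one-liner above: which.min returns the first minimum, so a tie goes to the earlier space, whereas the question asks for the later word to be included; and for a string with no whitespace gregexpr returns -1. A minimal sketch that handles both follows. cut_nearest_space is a hypothetical helper name, and returning whitespace-free strings unchanged is an assumption, not something the question specifies.

```r
textvec <- c("this is an example", "I hope someone can help me", "Thank you in advance")

# Hypothetical helper: cut each string just before the whitespace nearest to `target`
cut_nearest_space <- function(x, target = 12) {
  stops <- vapply(gregexpr("\\s+", x), function(pos) {
    # gregexpr() returns -1 when there is no match: nothing to cut at
    if (pos[1] == -1) return(NA_integer_)
    d <- abs(target - pos)
    # take the LAST position with minimal distance, so a tie goes to the later space
    as.integer(pos[max(which(d == min(d)))] - 1L)
  }, integer(1))
  # assumption: strings without whitespace are returned unchanged
  ifelse(is.na(stops), x, substr(x, 1, stops))
}

cut_nearest_space(textvec)
#[1] "this is an"     "I hope someone" "Thank you in"

# spaces at positions 10 and 14 are equally far from 12; the later one is used
cut_nearest_space("aaaaaaaaa bbb cc")
#[1] "aaaaaaaaa bbb"
```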

      
