Extract character before first point in string

Question

I would like to extract the character preceding the first dot in each string of a column. I can do it with the code below, but the code seems overly complex and I had to resort to a for-loop. Is there an easier way? I am especially interested in a regex solution.

Note that finding the last number in each line will not work with my real data, although this approach will work with this example.

Thanks for any advice.

my.data <- read.table(text = '
     my.string  state
     .........    A
     1........    B
     112......    C
     11111....    D
     1111113..    E
     111111111    F
     111111111    G
', header = TRUE, stringsAsFactors = FALSE)

desired.result <- c(NA,1,2,1,3,NA,NA)

Determine the position of the first dot:

my.data$first.dot <- apply(my.data, 1, function(x) {     
                                as.numeric(gregexpr("\\.", x['my.string'])[[1]])[1]
                          })

Split each string into individual characters:

split.strings <- t(apply(my.data, 1, function(x) { (strsplit(x['my.string'], '')[[1]]) } ))

my.data$revised.first.dot <- ifelse(my.data$first.dot < 2, NA, my.data$first.dot-1)

Extract the character preceding the first dot:

for(i in 1:nrow(my.data)) {
     my.data$character.before.dot[i] <- split.strings[i,my.data$revised.first.dot[i]]
}

my.data

#   my.string state first.dot revised.first.dot character.before.dot
# 1 .........     A         1                NA                 <NA>
# 2 1........     B         2                 1                    1
# 3 112......     C         4                 3                    2
# 4 11111....     D         6                 5                    1
# 5 1111113..     E         8                 7                    3
# 6 111111111     F        -1                NA                 <NA>
# 7 111111111     G        -1                NA                 <NA>

Here is a related post:

find the location of a character in a string

+3

string regex r

Mark miller Dec 11. 14 at 11:59


6 answers

Use the regex below, and don't forget to include the parameter perl=TRUE.

^[^.]*?\K[^.](?=\.)

In R, the regex would look like

^[^.]*?\\K[^.](?=\\.)

DEMO

> library(stringr)
> as.numeric(str_extract(my.data$my.string, perl("^[^.]*?\\K[^.](?=\\.)")))
[1] NA  1  2  1  3 NA NA
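The perl() wrapper belongs to older stringr releases and may not be available in the version you have installed. As a minimal sketch, assuming base R only, the same pattern can be passed to regexpr with perl = TRUE. The names x, m and out are illustrative and not part of the original answer.

```r
# Sample strings from the question
x <- c(".........", "1........", "112......", "11111....",
       "1111113..", "111111111", "111111111")

# perl = TRUE is needed because \K and lookaheads are PCRE features
m <- regexpr("^[^.]*?\\K[^.](?=\\.)", x, perl = TRUE)

# regmatches() returns only the matched elements, so start from all NA
out <- rep(NA_character_, length(x))
out[m > 0] <- regmatches(x, m)

as.numeric(out)
```

Filling a vector of NA by position keeps the result the same length as the input, which matters when it is assigned back to a data frame column.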

Explanation:

^ asserts that we are at the start of the string.

[^.]*? lazily (non-greedily) matches any characters other than a dot, up to the first dot.

\K discards the characters matched so far from the reported match.

[^.] matches the character we want, which must not be a dot.

(?=\.) requires that this character be followed by a dot. The pattern therefore matches the character immediately before the first dot.
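To see what the ^ anchor contributes, the sketch below compares the anchored pattern with an unanchored [^.](?=\.) on a few made-up strings (they are not from the question). It assumes base R with perl = TRUE, and the helper name first_match is hypothetical.

```r
# Hypothetical helper: first regex match per string, NA where nothing matches
first_match <- function(pattern, x) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m > 0] <- regmatches(x, m)
  out
}

# Made-up inputs: dot in the middle, leading dot, two dots, no dot
x <- c("ab.c", ".a.b", "a.b.c", "abc")

anchored   <- first_match("^[^.]*?\\K[^.](?=\\.)", x)
unanchored <- first_match("[^.](?=\\.)", x)

anchored    # "b" NA  "a" NA
unanchored  # "b" "a" "a" NA
```

The two patterns differ only for ".a.b": the first dot has nothing before it, so the anchored pattern gives NA, while the unanchored one moves on and returns the character before the second dot.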

+4

Avinash Raj Dec 11. 14 at 12:07


The simplest regular expression would be ^([^.])+(?=\.):

^      # Start of string
(      # Start of group 1
 [^.]  # Match any character except .
)+     # Repeat as many times as needed, overwriting the previous match
(?=\.) # Assert the next character is a .

Test it live at regex101.com.

The content of group 1 will be your desired character. I don't really know R, but according to RegexBuddy the following should work:

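# Pass the character column, not the whole data frame
matches <- regexpr("^([^.])+(?=\\.)", my.data$my.string, perl = TRUE)
# Reuse the position and length of capture group 1 as match data
result <- attr(matches, "capture.start")[, 1]
attr(result, "match.length") <- attr(matches, "capture.length")[, 1]
regmatches(my.data$my.string, result)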
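Note that regmatches drops the elements that did not match, so its result is shorter than the input. Here is a minimal sketch of one way to keep NA in those positions, assuming base R only and the sample strings from the question. The variable names are illustrative.

```r
x <- c(".........", "1........", "112......", "11111....",
       "1111113..", "111111111", "111111111")

m <- regexpr("^([^.])+(?=\\.)", x, perl = TRUE)

# Start of the last repetition of group 1; -1 where the pattern did not match
start <- attr(m, "capture.start")[, 1]

# Take the single character at that position, or NA for non-matches
out <- ifelse(start > 0, substring(x, start, start), NA_character_)
out
```

Because substring is vectorized over x and start, every input string yields exactly one output element.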

+3

Tim Pietzcker Dec 11. 14 at 12:02


In this example, every string consists only of digits and dots:

library(stringr)
as.numeric(str_extract(my.data$my.string, perl('\\d(?=\\.)')))
#[1] NA  1  2  1  3 NA NA

Or using stringi

library(stringi)
as.numeric(stri_extract(my.data$my.string, regex='\\d(?=\\.)'))
#[1] NA  1  2  1  3 NA NA

For the general case:

as.numeric(str_extract(my.data$my.string, perl('[^.](?=\\.)')))

+2

akrun Dec 11. 14 at 12:02


[^.](?=\\.)

You can just do this. See demo.

https://regex101.com/r/qB0jV1/26

+2

vks Dec 11. 14 at 12:17


Using rex can make this type of task a little easier.

my.data <- read.table(text = '
     my.string  state
     .........    A
     1........    B
     112......    C
     11111....    D
     1111113..    E
     111111111    F
     111111111    G
', header = TRUE, stringsAsFactors = FALSE)

library(rex)

re_matches(my.data$my.string,
  rex(capture(except(".")), "."))$'1'

#> [1] NA  "1" "2" "1" "3" NA  NA
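For readers without rex installed, here is a rough base R counterpart. It assumes the rex call builds a pattern along the lines of ([^.])\. (one captured non-dot character followed by a literal dot); that correspondence is an assumption and is not taken from the rex documentation.

```r
x <- c(".........", "1........", "112......", "11111....",
       "1111113..", "111111111", "111111111")

# regexec() reports the full match plus each capture group for every string
groups <- regmatches(x, regexec("([^.])\\.", x))

# Element 2 is capture group 1; strings without a match give character(0)
out <- vapply(groups,
              function(g) if (length(g) >= 2) g[2] else NA_character_,
              character(1))
out
```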

+1

Jim Dec 12. 14 at 21:08


Sven Hohenstein · Accepted Answer · 2014-12-11T12:34:34+0000

Here is a base R solution using ifelse:

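res <- regexpr("[^.](?=\\.)", my.data$my.string, perl = TRUE)
# substr() keeps one element per row; regmatches() would return only the
# matched elements, which ifelse() would then recycle out of position
ifelse(res < 1, NA, as.integer(substr(my.data$my.string, res, res)))
# [1] NA  1  2  1  3 NA NA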
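
The desired result can also be reached without lookarounds, by locating the first dot and taking the character one position earlier. This is a minimal sketch, assuming base R and the sample strings from the question; x, pos and out are illustrative names.

```r
x <- c(".........", "1........", "112......", "11111....",
       "1111113..", "111111111", "111111111")

# Position of the first literal dot; -1 if the string contains none
pos <- regexpr(".", x, fixed = TRUE)

# A dot at position 1 has nothing before it, so require pos > 1
out <- ifelse(pos > 1, substr(x, pos - 1, pos - 1), NA_character_)

as.integer(out)
```

Using fixed = TRUE treats the dot as a literal character, so no escaping is needed.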
