Extracting a decimal number from a string

I've been going through half of Stack Overflow looking for this, but nothing seems to fit exactly; sorry if this is a duplicate.

I have a string with the format:

fname <-'FS1_SCN0.83_axg3.csv'

I would like to extract the second number, which is a decimal here but could also be an integer, and get 0.83 as the result (or 3 if it is an integer). The closest I got was:

gsub("[^0-9.]","\\2",fname)

      

which returns all the digits and decimal points in fname (10.833.), but as one concatenated string.

Thanks in advance, p.



5 answers


To get the second number,

regmatches(x, regexpr("^\\D*\\d+\\D*\\K\\d+(?:\\.\\d+)?", x, perl=TRUE))

      


or

sub("^\\D*\\d+\\D*(\\d+(?:\\.\\d+)?).*", "\\1", x, perl=TRUE)

      

Example:



> x <-'FS1_SCN0.83_axg3.csv'
> regmatches(x, regexpr("^\\D*\\d+\\D*\\K\\d+(?:\\.\\d+)?", x, perl=TRUE))
[1] "0.83"
> sub("^\\D*\\d+\\D*(\\d+(?:\\.\\d+)?).*", "\\1", x, perl=TRUE)
[1] "0.83"

      

For a more general case, where the first number may itself be a decimal:

regmatches(x, regexpr("^\\D*\\d+(?:\\.\\d+)?\\D*\\K\\d+(?:\\.\\d+)?", x, perl=TRUE))
sub("^\\D*\\d+(?:\\.\\d+)?\\D*(\\d+(?:\\.\\d+)?).*", "\\1", x, perl=TRUE)
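
A small sketch of why the more general pattern can matter. The filename y below is a hypothetical example (not from the question) whose first number is itself a decimal; the variable names simple and general are only labels for the two patterns above.

```r
# Hypothetical filename whose first number is a decimal (assumption, not from the question)
y <- 'FS1.5_SCN0.83_axg3.csv'

simple  <- "^\\D*\\d+\\D*\\K\\d+(?:\\.\\d+)?"
general <- "^\\D*\\d+(?:\\.\\d+)?\\D*\\K\\d+(?:\\.\\d+)?"

# The simple pattern treats "1" as the first number, skips the dot,
# and then matches the "5" that follows it
res_simple <- regmatches(y, regexpr(simple, y, perl = TRUE))

# The general pattern consumes "1.5" as the first number,
# so the match is the actual second number
res_general <- regmatches(y, regexpr(general, y, perl = TRUE))

print(res_simple)
print(res_general)
```

With this input the simple pattern should give "5" and the general one "0.83", which is the case the second form is meant to handle.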

      

OR

Just specify the index of the number you want.

> regmatches(fname, gregexpr("\\d+(?:\\.\\d+)?", fname))[[1]][2]
[1] "0.83"
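
The index approach can be wrapped in a small helper, sketched below. The function name extract_nth_number and its n argument are hypothetical, and perl = TRUE is added here only so the (?:...) group is handled by PCRE; the result is converted to numeric so that both the decimal and the integer case come back as numbers.

```r
# Hypothetical helper (not from the answer): n-th number of each string, as numeric
extract_nth_number <- function(x, n = 2) {
  # One character vector of matched numbers per input string
  matches <- regmatches(x, gregexpr("\\d+(?:\\.\\d+)?", x, perl = TRUE))
  # Indexing past the end gives NA, so strings with fewer than n numbers yield NA
  vapply(matches, function(m) as.numeric(m[n]), numeric(1))
}

extract_nth_number('FS1_SCN0.83_axg3.csv')   # decimal case
extract_nth_number('FS1_SCN3_axg3.csv')      # integer case
extract_nth_number('no_digits_here.csv')     # fewer than two numbers
```

Because regmatches() and gregexpr() are vectorised over x, the same helper also accepts a whole vector of filenames.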

      



Regex

.+_SCN(\d+(?:\.\d+)?)_.+\.csv

      


Sample code

sub(".+_SCN(\\d+(?:\\.\\d+)?)_.+\\.csv", "\\1", fname)
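
One thing to keep in mind with sub(): when the pattern does not match, the input comes back unchanged rather than as NA. A minimal sketch of a guard, assuming a hypothetical second filename (other) without the _SCN prefix and a hypothetical wrapper name extract_scn; perl = TRUE is an addition here so the (?:...) group is handled by PCRE.

```r
fname <- 'FS1_SCN0.83_axg3.csv'
other <- 'FS1_XYZ0.83_axg3.csv'   # hypothetical name without the _SCN prefix
pattern <- ".+_SCN(\\d+(?:\\.\\d+)?)_.+\\.csv"

matched   <- sub(pattern, "\\1", fname, perl = TRUE)  # the captured number
unmatched <- sub(pattern, "\\1", other, perl = TRUE)  # input returned as-is

# Guard with grepl() so non-matching names become NA instead
extract_scn <- function(x) {
  ifelse(grepl(pattern, x, perl = TRUE),
         sub(pattern, "\\1", x, perl = TRUE),
         NA_character_)
}

print(extract_scn(c(fname, other)))
```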

      



^.*?(?:\\d+(?:\\.\\d+)?).*?\\K\\d+(?:\\.\\d+)?

      

You can use this with the parameter perl=TRUE and get a match. See the demo:

https://www.regex101.com/r/fJ6cR4/8

or

gsub("^.*?(?:\\d+(?:\\.\\d+)?).*?(\\d+(?:\\.\\d+)?).*$","\\1",fname,perl=TRUE)

      



You can use str_extract_all() from the stringr package to match all numbers in the input, and then take the second element of the resulting list:

library(stringr)

str_extract_all(fname, "([0-9]+(?:\\.[0-9]+)?)")[[1]][2]

      



As per your comment, you can use this: _[A-Z]+(\d+(\.\d+)?)

As a minor note, this answer does nothing that the others have not already covered. I just think it is a bit more readable and easier to follow.

If you know the exact characters, it might make sense to replace the [A-Z] section with those specific characters. This would make the expression even more intuitive.
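
As a sketch of how this pattern might be applied in R (the answer gives only the regex itself), regexec() together with regmatches() returns the full match followed by each capture group, so the number is the second element. The variable names m and m_scn are illustrative only.

```r
fname <- 'FS1_SCN0.83_axg3.csv'

# Element 1 is the full match, element 2 the first capture group (the number)
m <- regmatches(fname, regexec("_[A-Z]+(\\d+(\\.\\d+)?)", fname))[[1]]
second_number <- as.numeric(m[2])

# Variant with the exact prefix spelled out instead of [A-Z]+
m_scn <- regmatches(fname, regexec("_SCN(\\d+(\\.\\d+)?)", fname))[[1]]

print(m[2])
print(second_number)
```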







