Convert part of a string value to a number and combine it with the numeric part
Some details are below. As you can see, the MarketCap column contains both string and numeric components. I want to replace every B with 10^9 and every M with 10^6, combined with the existing numeric value (so that, for example, $9B becomes 9 * 10^9).
I tried: dframe$MarketCap <- replace(dframe$MarketCap, "B", 10^6)
but I get an error
Error in `$<-.data.frame`(`*tmp*`, "MarketCap", value = c("$9B", "$987.15M" : replacement has 908 rows, data has 907
     Symbol                                 Name LastSale MarketCap IPOyear            Sector
904     DLR           Digital Realty Trust, Inc.     66.3       $9B    2004 Consumer Services
2745   SWAY  Starwood Waypoint Residential Trust    25.86  $987.15M    2014 Consumer Services
3140    WNC          Wabash National Corporation    14.45  $981.39M    1991     Capital Goods
2102    NOA North American Energy Partners, Inc.     2.89   $98.24M    2006            Energy
3115     VG                Vonage Holdings Corp.     4.57  $976.09M    2006  Public Utilities
273    ATTO                          Atento S.A.    13.21  $972.51M    2014  Public Utilities
2541    RMP           Rice Midstream Partners LP    16.79  $965.55M    2014  Public Utilities
Output of str() on the data frame, taken before the error occurred:
'data.frame': 907 obs. of 9 variables:
$ Symbol : Factor w/ 3285 levels "A","AA","AA^B",..: 844 2811 3170 2128 3127 245 2563 528 2171 2586 ...
$ Name : Factor w/ 2657 levels "3D Systems Corporation",..: 735 2214 2478 1689 2602 205 2048 635 1650 2055 ...
$ LastSale : Factor w/ 2572 levels "0.02","0.22",..: 2192 1153 412 758 1664 316 560 877 1872 1049 ...
$ MarketCap : chr "$9B" "$987.15M" "$981.39M" "$98.24M" ...
$ IPOyear : Factor w/ 33 levels "1984","1985",..: 21 31 8 23 23 31 31 31 30 27 ...
$ Sector : Factor w/ 13 levels "Basic Industries",..: 5 5 2 6 11 11 11 2 6 13 ...
$ industry : Factor w/ 130 levels "Accident &Health Insurance",..: 109 109 32 89 123 123 83 19 86 87 ...
$ Summary.Quote: Factor w/ 3285 levels "http://www.nasdaq.com/symbol/a",..: 844 2811 3170 2128 3127 245 2563 528 2171 2586 ...
$ X : logi NA NA NA NA NA NA ...
1 answer
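On the error itself: replace(x, list, values) is essentially x[list] <- values, so "B" is used as an index (an element name), not as a pattern to search for inside the strings. A minimal sketch of the effect, assuming MarketCap is an unnamed character vector as the str() output suggests (the short vector x below is a stand-in built from the question's sample rows):

```r
# Stand-in for dframe$MarketCap, using values from the question's sample rows
x <- c("$9B", "$987.15M", "$981.39M")

# "B" is treated as an element name; no element has that name,
# so a new element called "B" is appended to the vector.
y <- replace(x, "B", 10^6)

length(x)   # 3
length(y)   # 4: one more element than the original
names(y)    # the appended element is the one named "B"
```

With the full column this gives 908 values for a data frame of 907 rows, which is what the error message reports. The original strings are left unchanged.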
You may try
library(gsubfn)
dframe$MarketCap <- as.numeric(gsubfn('[BMK$]', list(K='e3', M='e6',
B='e9', "$"=''), dframe$MarketCap))
Or using base R
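# v1: the unit suffix of each value ("K", "M" or "B")
v1 <- sub('[$0-9.]+', '', dframe$MarketCap)
# v2: lookup table from suffix to exponent string
v2 <- c(K='e3', M='e6', B='e9')
# strip "$" and the suffix, append the exponent (e.g. "9e9"), then convert
dframe$MarketCap <- as.numeric(paste0(gsub('\\$|[A-Z]+', '',
dframe$MarketCap), v2[v1]))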
dframe$MarketCap
#[1] 9000000000 987150000 981390000 98240000 976090000 972510000 965550000
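One caveat with the lookup approach: a value with no K/M/B suffix would index v2 with an empty string and end up as NA. A variant that multiplies instead of pasting exponents could look like the sketch below. The function name parse_market_cap and the "$500" test value are hypothetical and not from the original post; it assumes every value is an optional "$", a number, and at most one of the suffixes K, M, B.

```r
# Hypothetical helper (not part of the original answer): converts strings such
# as "$9B" or "$987.15M" to numbers by multiplying by the suffix's factor.
parse_market_cap <- function(x) {
  mult   <- c(K = 1e3, M = 1e6, B = 1e9)             # suffix -> multiplier
  suffix <- gsub("[^KMB]", "", x)                    # keep only the suffix letter
  number <- as.numeric(gsub("[^0-9.]", "", x))       # keep only digits and "."
  factor <- ifelse(suffix == "", 1, mult[suffix])    # no suffix: leave as-is
  number * factor
}

caps <- parse_market_cap(c("$9B", "$987.15M", "$98.24M", "$500"))
caps
```

For the three sample values this should agree with the output above, up to floating-point rounding, and the suffix-free value is returned unchanged rather than as NA.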