Convert the B/M suffix of a string value into a number and combine it with the numeric part

Below are some details. As you can see, the MarketCap column (dframe$MarketCap) contains both string and numeric components. I want to replace every B with 10^9 and every M with 10^6, combined with the existing numeric values, so that "$9B" becomes 9000000000.

I tried: dframe$MarketCap <- replace(dframe$MarketCap, "B", 10^6)

but I get an error:

Error in `$<-.data.frame`(`*tmp*`, "MarketCap", value = c("$9B", "$987.15M",  : 
  replacement has 908 rows, data has 907
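The extra row comes from how base R's replace() works: replace(x, list, values) essentially runs x[list] <- values and returns x. Passing the string "B" as list indexes the vector by element name instead of searching for "B" inside the strings. Since no element is named "B", R appends a new one, so the vector grows from 907 to 908 elements and the data frame rejects it. A small sketch using three of the values from the question (the names x and y are just for illustration):

```r
# Three MarketCap values copied from the question
x <- c("$9B", "$987.15M", "$981.39M")

# replace(x, "B", 10^6) runs x["B"] <- 10^6: "B" is used as an element
# name, not as a pattern to look for inside the strings
y <- replace(x, "B", 10^6)

length(x)  # 3
length(y)  # 4, because a new element named "B" was appended
names(y)   # "" "" "" "B"
```

Searching inside the strings needs a regular-expression function such as sub() or gsub(), which is what the answer uses.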

              Symbol                                   Name  LastSale MarketCap IPOyear                Sector
904             DLR              Digital Realty Trust, Inc.     66.3       $9B    2004     Consumer Services
2745           SWAY     Starwood Waypoint Residential Trust    25.86  $987.15M    2014     Consumer Services
3140            WNC             Wabash National Corporation    14.45  $981.39M    1991         Capital Goods
2102            NOA    North American Energy Partners, Inc.     2.89   $98.24M    2006                Energy
3115             VG                   Vonage Holdings Corp.     4.57  $976.09M    2006      Public Utilities
273            ATTO                             Atento S.A.    13.21  $972.51M    2014      Public Utilities
2541            RMP              Rice Midstream Partners LP    16.79  $965.55M    2014      Public Utilities


Output of str() for the data frame before the error:

'data.frame':    907 obs. of  9 variables:
 $ Symbol       : Factor w/ 3285 levels "A","AA","AA^B",..: 844 2811 3170 2128 3127 245 2563 528 2171 2586 ...
 $ Name         : Factor w/ 2657 levels "3D Systems Corporation",..: 735 2214 2478 1689 2602 205 2048 635 1650 2055 ...
 $ LastSale     : Factor w/ 2572 levels "0.02","0.22",..: 2192 1153 412 758 1664 316 560 877 1872 1049 ...
 $ MarketCap    : chr  "$9B" "$987.15M" "$981.39M" "$98.24M" ...
 $ IPOyear      : Factor w/ 33 levels "1984","1985",..: 21 31 8 23 23 31 31 31 30 27 ...
 $ Sector       : Factor w/ 13 levels "Basic Industries",..: 5 5 2 6 11 11 11 2 6 13 ...
 $ industry     : Factor w/ 130 levels "Accident &Health Insurance",..: 109 109 32 89 123 123 83 19 86 87 ...
 $ Summary.Quote: Factor w/ 3285 levels "http://www.nasdaq.com/symbol/a",..: 844 2811 3170 2128 3127 245 2563 528 2171 2586 ...
 $ X            : logi  NA NA NA NA NA NA ...



1 answer


You may try

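library(gsubfn)
# Each match of [BMK$] is replaced by the list element with the same name:
# "$" is dropped and the suffix becomes an exponent, e.g. "$9B" -> "9e9"
dframe$MarketCap <- as.numeric(gsubfn('[BMK$]', list(K='e3', M='e6', 
                            B='e9', "$"=''), dframe$MarketCap))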
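The gsubfn() call works because its output is an ordinary scientific-notation string, which as.numeric() can parse without any explicit multiplication. A minimal sketch of that last step, assuming strings in the form the call produces (the vector s is hypothetical sample data):

```r
# Strings in the form the gsubfn() call produces before as.numeric():
# "$" removed and the suffix replaced by an exponent
s <- c("9e9", "987.15e6", "98.24e6")

# as.numeric() parses scientific notation directly, so there is no need
# to multiply by 10^9 or 10^6 by hand
v <- as.numeric(s)
print(v)
```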


Or, using base R:

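# Suffix letter of each value ("B", "M", ...): delete the "$", digits and dot
v1 <- sub('[$0-9.]+', '', dframe$MarketCap)
# Exponent to append for each suffix
v2 <- c(K='e3', M='e6', B='e9')
# Numeric part plus its exponent, e.g. "9" + "e9" -> "9e9", then parse
dframe$MarketCap <- as.numeric(paste0(gsub('\\$|[A-Z]+', '', 
                     dframe$MarketCap), v2[v1]))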
dframe$MarketCap
#[1] 9000000000  987150000  981390000   98240000  976090000  972510000  965550000
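A variation, not part of the original answer, is to look up a numeric multiplier for each suffix and multiply, instead of building an "e9"-style string and parsing it again. This is only a sketch: the sample vector mc and the names mult, num and suffix are illustrative, and in practice mc would be dframe$MarketCap.

```r
# Hypothetical sample; in the question this would be dframe$MarketCap
mc <- c("$9B", "$987.15M", "$981.39M", "$98.24M")

# Multiplier for each suffix letter
mult <- c(K = 1e3, M = 1e6, B = 1e9)

# Numeric part: drop the "$" and the suffix letter
num <- as.numeric(gsub("[$KMB]", "", mc))

# Suffix letter: the final K, M or B of each string
suffix <- sub(".*([KMB])$", "\\1", mc)

# Scale each number by the multiplier matching its suffix
mc_num <- num * unname(mult[suffix])
mc_num
```

Like the base R version above, this returns NA for a value that has no K/M/B suffix, because the lookup finds no matching name.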
