Formatting "Kilo", "Mega", "Giga" data in numpy record array
I am trying to work with data stored in a CSV file in the format timestamp, value. The values are not plain numbers but abbreviations of large values (k = 1,000, M = 1,000,000, etc.):
2012-02-24 09:07:01, 8.1M
2012-02-24 09:07:02, 64.8M
2012-02-24 09:07:03, 84.8M
2012-02-24 09:07:04, 84.8M
2012-02-24 09:07:05, 84.8M
2012-02-24 09:07:07, 84.8M
2012-02-24 09:07:08, 84.8M
2012-02-24 09:07:09, 84.8M
2012-02-24 09:07:10, 84.8M
I usually load the CSV into a numpy record array with matplotlib.mlab.csv2rec(infile), but that only works if the values are not given in abbreviated form. Is there an easy way to do this without having my program read each value and look for an "M" in order to convert 84.8M to 84800000?
2 answers
Another possibility is the following conversion function:
    # Map each suffix to its power of ten.
    conv = dict(zip('kMGT', (3, 6, 9, 12)))

    def parse_number(value):
        # Rewrite e.g. '8.1M' as '8.1e6' so float() can parse it.
        if value[-1] in conv:
            value = '{}e{}'.format(value[:-1], conv[value[-1]])
        return float(value)
Example:
>>> parse_number('1337')
1337.0
>>> parse_number('8.1k')
8100.0
>>> parse_number('8.1M')
8100000.0
>>> parse_number('64.367G')
64367000000.0
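The sample file has a space after the comma, and a field could in principle be empty, in which case value[-1] would raise an IndexError. Below is a minimal sketch of a slightly more defensive variant of the same idea. The names SUFFIX_EXPONENT and parse_abbreviated are made up for this illustration and are not part of the answer above.

```python
# Hypothetical helper names; the suffix table mirrors the answer above.
SUFFIX_EXPONENT = {'k': 3, 'M': 6, 'G': 9, 'T': 12}


def parse_abbreviated(text):
    """Convert strings such as ' 84.8M' or '1337' to a float."""
    text = text.strip()  # tolerate the space that follows the comma
    if not text:
        raise ValueError('empty value')
    suffix = text[-1]
    if suffix in SUFFIX_EXPONENT:
        # Build scientific notation ('84.8e6') and let float() parse it.
        text = '{}e{}'.format(text[:-1], SUFFIX_EXPONENT[suffix])
    return float(text)


print(parse_abbreviated(' 84.8M'))  # 84800000.0
print(parse_abbreviated('2.5k'))    # 2500.0
print(parse_abbreviated('1337'))    # 1337.0
```

Like the original, this builds a string in scientific notation rather than multiplying by a power of ten, so the result is whatever float() returns for that literal. An unknown suffix falls through to float(), which raises ValueError.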
You can pass Niklas B.'s function to csv2rec through its convertd argument:
    >>> from matplotlib import mlab
    >>> data = mlab.csv2rec(infile, names=['datetime', 'values'],
    ...                     convertd={'values': parse_number})
    >>> data
    rec.array([(datetime.datetime(2012, 2, 24, 9, 7, 1), 8100000.0),
               (datetime.datetime(2012, 2, 24, 9, 7, 2), 64800000.0),
               (datetime.datetime(2012, 2, 24, 9, 7, 3), 84800000.0),
               (datetime.datetime(2012, 2, 24, 9, 7, 4), 84800000.0),
               (datetime.datetime(2012, 2, 24, 9, 7, 5), 84800000.0),
               (datetime.datetime(2012, 2, 24, 9, 7, 7), 84800000.0),
               (datetime.datetime(2012, 2, 24, 9, 7, 8), 84800000.0),
               (datetime.datetime(2012, 2, 24, 9, 7, 9), 84800000.0),
               (datetime.datetime(2012, 2, 24, 9, 7, 10), 84800000.0)],
              dtype=[('datetime', '|O8'), ('values', '<f8')])
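Note that csv2rec was deprecated and later removed from Matplotlib, so the call above may not be available in a current installation. As a rough sketch of the same per-column conversion using only the standard library, the following reads the rows with the csv module. The function name load_rows and the in-memory sample are assumptions made for this illustration, and the timestamp format is inferred from the sample data in the question.

```python
import csv
import io
from datetime import datetime

SUFFIX_EXPONENT = {'k': 3, 'M': 6, 'G': 9, 'T': 12}


def parse_number(value):
    """Same idea as the answer above: '8.1M' -> float('8.1e6')."""
    value = value.strip()
    if value and value[-1] in SUFFIX_EXPONENT:
        value = '{}e{}'.format(value[:-1], SUFFIX_EXPONENT[value[-1]])
    return float(value)


def load_rows(fileobj):
    """Return a list of (datetime, float) tuples from 'timestamp, value' lines."""
    rows = []
    # skipinitialspace drops the blank that follows each comma.
    for row in csv.reader(fileobj, skipinitialspace=True):
        if not row:
            continue  # ignore blank lines
        stamp, value = row
        when = datetime.strptime(stamp, '%Y-%m-%d %H:%M:%S')
        rows.append((when, parse_number(value)))
    return rows


# In-memory stand-in for the real file (hypothetical sample).
sample = io.StringIO(
    "2012-02-24 09:07:01, 8.1M\n"
    "2012-02-24 09:07:02, 64.8M\n"
)
rows = load_rows(sample)
print(rows[0])  # (datetime.datetime(2012, 2, 24, 9, 7, 1), 8100000.0)
```

If a record array is still wanted, the resulting list of tuples should be usable as input to numpy, for example with numpy.rec.fromrecords(rows, names='datetime,values').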