Sorting numbers using SI scale factors
I have a CSV file that contains a column whose values are specified with SI scale factors. I need to do a numeric sort on this column. Specifically, the CSV file contains a list of known astronomical objects (Messier objects) and I need to sort by distance. The kicker is that the distance is given by numbers using SI unit prefixes, so a simple sort won't work. Is there an easy way to do this?
Here's a very abbreviated version of the file:
"Messier Number","Distance"
"M1","6.5 kly"
"M2","33 kly"
"M7","980 ly"
"M16","7 kly"
"M19","29 kly"
"M31","2.5 Mly"
"M49","56 Mly"
Here's what I have so far:
from csv import DictReader

with open('m.csv') as f:
    messier = sorted(DictReader(f), key=lambda e: e['Distance'])
for entry in messier:
    print('{Messier Number:>5s} {Distance}'.format(**entry))
But this sorts the values as strings rather than as numbers:
M31 2.5 Mly
M19 29 kly
M2 33 kly
M49 56 Mly
M1 6.5 kly
M16 7 kly
M7 980 ly
I could try to split the distance and interpret k and M myself, but that seems like the wrong approach. After all, the use of metric prefixes is very common. There should be some support for this already. Any pointers would be much appreciated.
The easiest way to do this is to use QuantiPhy. It is a package that reads and writes numbers with SI scale factors and units. QuantiPhy provides Quantity, a class that subclasses float. It converts your string to an object that behaves like a float, which lets you sort numerically. The string can include a scale factor and units, and the scale factor is interpreted when the string is parsed. In this case the units are not needed for the sort and are effectively ignored.
Modifying the code to the following should work.
from csv import DictReader
from quantiphy import Quantity

with open('m.csv') as f:
    messier = sorted(DictReader(f), key=lambda e: Quantity(e['Distance']))
for entry in messier:
    print('{Messier Number:>5s} {Distance}'.format(**entry))
With this code, the sort comes out correctly:
M7 980 ly
M1 6.5 kly
M16 7 kly
M19 29 kly
M2 33 kly
M31 2.5 Mly
M49 56 Mly
Probably overkill compared to the other answer in this case, but here's the code using the units library:
from units import unit, predefined, scaled_unit
from csv import DictReader

predefined.define_units()

# units defines `ly` as a multiple of metres, but doesn't define
# any of the SI prefixes for it, so we define those units by hand
scaled_unit('kly', 'ly', 1000)
scaled_unit('Mly', 'ly', 1000 * 1000)
scaled_unit('Gly', 'ly', 1000 * 1000 * 1000)

def make_distance(d):
    number, un = d.split()
    return unit(un)(float(number))

with open('m.csv') as f:
    messier = sorted(DictReader(f), key=lambda e: make_distance(e['Distance']))
for entry in messier:
    print('{Messier Number:>5s} {Distance}'.format(**entry))
Output:
M7 980 ly
M1 6.5 kly
M16 7 kly
M19 29 kly
M2 33 kly
M31 2.5 Mly
M49 56 Mly
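For comparison, the split-and-scale approach mentioned in the question can be sketched with only the standard library, in case adding a dependency is not an option. This is a rough sketch, not a replacement for a real units package: the SCALE table and the parse_distance helper are hypothetical names introduced here, only the prefixes that appear in the sample data (plus G) are handled, and the sample rows are inlined as a string instead of being read from m.csv.

```python
import csv
import io

# Hypothetical lookup table: only the prefixes needed for the sample data.
SCALE = {'': 1.0, 'k': 1e3, 'M': 1e6, 'G': 1e9}


def parse_distance(text, base_unit='ly'):
    """Convert a string such as '6.5 kly' to a float in light-years."""
    number, unit = text.split()
    if not unit.endswith(base_unit):
        raise ValueError(f'unexpected unit: {unit!r}')
    # Whatever precedes the base unit is the SI prefix ('' if there is none).
    prefix = unit[:-len(base_unit)]
    return float(number) * SCALE[prefix]


# The abbreviated file from the question, inlined so the sketch is self-contained.
SAMPLE = '''"Messier Number","Distance"
"M1","6.5 kly"
"M2","33 kly"
"M7","980 ly"
"M16","7 kly"
"M19","29 kly"
"M31","2.5 Mly"
"M49","56 Mly"
'''

messier = sorted(
    csv.DictReader(io.StringIO(SAMPLE)),
    key=lambda e: parse_distance(e['Distance']),
)
order = [e['Messier Number'] for e in messier]

for entry in messier:
    print('{Messier Number:>5s} {Distance}'.format(**entry))
```

An unknown prefix raises KeyError here, which is deliberate: a silent fallback would put the row in the wrong place in the sort.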