Sorting numbers using SI scale factors

I have a CSV file that contains a column whose values are specified with SI scale factors. I need to do a numeric sort on this column. Specifically, the CSV file contains a list of known astronomical objects (Messier objects) and I need to sort them by distance. The kicker is that the distances are given as numbers with SI unit prefixes, so a simple sort won't work. Is there an easy way to do this?

Here's a very abbreviated version of the file:

"Messier Number","Distance"
"M1","6.5 kly"
"M2","33 kly"
"M7","980 ly"
"M16","7 kly"
"M19","29 kly"
"M31","2.5 Mly"
"M49","56 Mly"

      

Here's what I have so far:

from csv import DictReader

with open('m.csv') as f:
    messier = sorted(DictReader(f), key=lambda e: e['Distance'])

for entry in messier:
    print('{Messier Number:>5s} {Distance}'.format(**entry))

      

But this sorts the distances as strings rather than as numbers:

 M31 2.5 Mly
 M19 29 kly
  M2 33 kly
 M49 56 Mly
  M1 6.5 kly
 M16 7 kly
  M7 980 ly

      

I could try to split the distance and interpret k and M myself, but that seems like the wrong approach. After all, the use of metric prefixes is very common. There should be some support for this already. Any pointers would be much appreciated.

2 answers


The easiest way to do this is to use QuantiPhy. It is a nice package that reads and writes numbers with SI scale factors and units. QuantiPhy provides Quantity, which subclasses float. It converts your string into something that behaves like a float, which lets you do a numeric sort. The string can include a scale factor and units, and the scale factor is interpreted correctly. In this case the units are not needed and are effectively ignored.

Modifying the code to the following should work.

from csv import DictReader
from quantiphy import Quantity

with open('m.csv') as f:
    messier = sorted(DictReader(f), key=lambda e: Quantity(e['Distance']))

for entry in messier:
    print('{Messier Number:>5s} {Distance}'.format(**entry))

      



With this code, the sort comes out correctly:

  M7 980 ly
  M1 6.5 kly
 M16 7 kly
 M19 29 kly
  M2 33 kly
 M31 2.5 Mly
 M49 56 Mly
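
If installing a package is not an option, the same idea can be approximated with the standard library alone. The following is only a minimal sketch, not how QuantiPhy works internally: the names `SI_PREFIXES` and `parse_distance` are made up for illustration, it assumes every value is a number, a space, and then `ly` with an optional one-letter prefix, and it embeds the sample data inline instead of reading m.csv.

```python
import csv
import io

# Hypothetical lookup table: only the prefixes needed for this data set.
SI_PREFIXES = {'k': 1e3, 'M': 1e6, 'G': 1e9}


def parse_distance(text, base_unit='ly'):
    """Convert a string such as '6.5 kly' to a float in base units."""
    number, unit = text.split()
    if unit == base_unit:
        # No prefix, e.g. '980 ly'.
        return float(number)
    prefix, rest = unit[0], unit[1:]
    if rest != base_unit or prefix not in SI_PREFIXES:
        raise ValueError('unrecognized unit: {!r}'.format(unit))
    return float(number) * SI_PREFIXES[prefix]


# Inline copy of the sample data so the sketch runs without m.csv.
SAMPLE = '''"Messier Number","Distance"
"M1","6.5 kly"
"M2","33 kly"
"M7","980 ly"
"M16","7 kly"
"M19","29 kly"
"M31","2.5 Mly"
"M49","56 Mly"
'''

messier = sorted(csv.DictReader(io.StringIO(SAMPLE)),
                 key=lambda e: parse_distance(e['Distance']))

for entry in messier:
    print('{Messier Number:>5s} {Distance}'.format(**entry))
```

This gives the same ordering as above for the sample data, but it handles far fewer input formats than a dedicated package, which is the main reason to prefer QuantiPhy.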

      


Probably overkill compared to the other answer in this case, but here's the code using the units library:

from units import unit, predefined, scaled_unit
from csv import DictReader

predefined.define_units()

# units defines `ly` as a multiple of metres, but doesn't define
# any of the SI-prefixed variants, so we define those by hand

scaled_unit('kly', 'ly', 1000)
scaled_unit('Mly', 'ly', 1000 * 1000)
scaled_unit('Gly', 'ly', 1000 * 1000 * 1000)


def make_distance(d):
    number, un = d.split()
    return unit(un)(float(number))


with open('m.csv') as f:
    messier = sorted(DictReader(f), key=lambda e: make_distance(e['Distance']))


for entry in messier:
    print('{Messier Number:>5s} {Distance}'.format(**entry))

      



Output:

   M7 980 ly
   M1 6.5 kly
  M16 7 kly
  M19 29 kly
   M2 33 kly
  M31 2.5 Mly
  M49 56 Mly

      
