Matching floats in two lists
I have two CSV files.
The first, when viewed as a list, looks like this:
('Rubus idaeus', '10.0', '56.0')
('Neckera crispa', '9.8785', '56.803')
('Dicranum polysetum', '9.1919', '56.0456')
('Sphagnum subnitens', '9.1826', '56.6367')
('Taxus baccata', '9.61778', '55.68833')
('Sphagnum papillosum', '9.1879', '56.0442')
The columns contain Species, Longitude, and Latitude. These are observations made in the field.
The other file is also a CSV file: a test set similar to the real climate data. It looks like this:
{'y': '58.1', 'x': '22.1', 'temp': '14'}
{'y': '58.2', 'x': '22.2', 'temp': '10'}
{'y': '58.3', 'x': '22.3', 'temp': '1'}
{'y': '58.4', 'x': '22.4', 'temp': '12'}
{'y': '58.5', 'x': '22.5', 'temp': '1'}
{'y': '58.6', 'x': '22.6', 'temp': '6'}
{'y': '58.7', 'x': '22.7', 'temp': '0'}
{'y': '58.8', 'x': '22.8', 'temp': '13'}
{'y': '58.9', 'x': '22.9', 'temp': '7'}
Both files are very long.
For each observation, I want to find the closest lower number in the file that contains the climate data and then concatenate that line with the observation, so the output looks like this:
('Dicranum polysetum', '9.1919', '56.0456', 'y': '9.1', 'x': '56.0', 'temp': '7')
I tried to create nested loops that iterate through the CSV files with DictReader, but the result is very heavily nested, and it would take a huge number of iterations to get through it all.
Does anyone know a better method?
The code I have at the moment is poor, but I have tried looping in a couple of ways and I suspect there is something fundamentally wrong with my general approach.
import csv
fil = csv.DictReader(open("TestData.csv"), delimiter=';')
navn = "nyDK_OVER_50M.csv"
occu = csv.DictReader(open(navn), delimiter='\t')
for row in fil:
    print 'x=',row['x']
    for line in occu:
        print round(float(line['decimalLongitude']),1)
        if round(float(line['decimalLongitude']),1) == row['x']:
            print 'You did it, found one dam match'
Here are the links to my two files, so you don't need to make up any data if you know something that might push me forward.
https://www.dropbox.com/s/lmstnkq8jl71vcc/nyDK_OVER_50M.csv?dl=0 https://www.dropbox.com/s/v22j61vi9b43j78/TestData.csv?dl=0
Regards, Mathias
Since you say there are no temperature points missing, the problem is much easier to solve:
import csv
# temperatures
fil = csv.DictReader(open("TestData.csv"), delimiter=';')
# species
navn = "nyDK_OVER_50M.csv"
occu = csv.DictReader(open(navn), delimiter='\t')
d = {}
for row in fil:
    x = '{:.1f}'.format(float(row['x']))
    y = '{:.1f}'.format(float(row['y']))
    try:
        d[x][y] = row['temp']
    except KeyError:
        d[x] = {y:row['temp']}
for line in occu:
    x = '{:.1f}'.format(round(float(line['decimalLongitude']),1))
    y = '{:.1f}'.format(round(float(line['decimalLatitude']),1))
    temp = d[x][y]
    line['temp'] = temp
    line['x'] = x
    line['y'] = y
    print(line)
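A variation on the same lookup, as a minimal sketch only. The question asks for the closest lower grid value, so this version floors the coordinates instead of rounding them, keys the dictionary on an (x, y) tuple, and returns None for a missing cell rather than raising KeyError. The helper names grid_key and join and the in-memory sample rows are made up for illustration. It is written for Python 3 and assumes that x is the longitude and y the latitude, as in the code above.

```python
from decimal import Decimal, ROUND_FLOOR

def grid_key(value, step=Decimal("0.1")):
    """Floor a coordinate string to the grid step, e.g. '9.1919' -> '9.1'."""
    return str(Decimal(value).quantize(step, rounding=ROUND_FLOOR))

# Hypothetical in-memory rows standing in for the two CSV files.
climate_rows = [
    {"x": "9.1", "y": "56.0", "temp": "7"},
    {"x": "10.0", "y": "56.0", "temp": "14"},
]
observations = [
    ("Dicranum polysetum", "9.1919", "56.0456"),
    ("Rubus idaeus", "10.0", "56.0"),
    ("Taxus baccata", "9.61778", "55.68833"),  # no matching cell in climate_rows
]

# One dictionary keyed by (x, y), built in a single pass over the climate data.
temps = {(grid_key(r["x"]), grid_key(r["y"])): r["temp"] for r in climate_rows}

def join(obs):
    """Attach the temperature of the grid cell that contains the observation."""
    species, lon, lat = obs
    x, y = grid_key(lon), grid_key(lat)
    # .get returns None instead of raising KeyError when the cell is missing
    return {"species": species, "x": x, "y": y, "temp": temps.get((x, y))}

joined = [join(o) for o in observations]
for row in joined:
    print(row)
```

Each file is read once here, so the work grows with the sum of the two file lengths rather than their product, which is what the nested loops in the question would cost.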
This is a solution that uses numpy to compute the Euclidean distance from each data item to every x,y point, and then joins the item with the data from the x,y tuple that has the smallest distance to it.
import numpy
import operator
# read the data into numpy arrays
testdata = numpy.genfromtxt('TestData.csv', delimiter=';', names=True)
nyDK = numpy.genfromtxt('nyDK_OVER_50M.csv', names=True, delimiter='\t',
                        dtype=[('species','|S64'),
                               ('decimalLongitude','float32'),
                               ('decimalLatitude','float32')])
# extract the x,y tuples into a numpy array of [(x,y), ...]
xy = numpy.array(map(operator.itemgetter('x', 'y'), testdata))
# this is a function which returns a function which computes the distance
# from an arbitrary point to an origin
distance = lambda origin: lambda point: numpy.linalg.norm(point-origin)
# methods to extract the (lat, lon) from a nyDK entry
latlon = operator.itemgetter('decimalLatitude', 'decimalLongitude')
getlatlon = lambda item: numpy.array(latlon(item))
# this will transform a single element of the nyDK array into
# a union of it with its closest climate data
def transform(item):
    # compute distance from each x,y point to this item location
    # and find the position of the minimum
    idx = numpy.argmin( map(distance(getlatlon(item)), xy) )
    # return the union of the item and the closest climate data
    return tuple(list(item)+list(testdata[idx]))
# transform all the entries in the input data set
result = map(transform, nyDK)
print result[0:3]
Outputs:
[('Rubus idaeus', 10.0, 56.0, 15.0, 51.0, 14.0),
('Neckera crispa', 9.8785, 56.803001, 15.300000000000001, 51.299999999999997, 2.0),
('Dicranum polysetum', 9.1919003, 56.045601, 14.6, 50.600000000000001, 10.0)]
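Note: the matched points are not very close, but this is probably because the .csv file does not have a full mesh of x,y points. It is also worth checking the coordinate order: xy holds (x, y) pairs while getlatlon returns (latitude, longitude), so if x is the longitude the two are compared in opposite order.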
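The nearest-point join can also be sanity-checked on a handful of rows in plain Python. This is a rough sketch, not a replacement for the numpy code: the function name nearest and the climate points are made up for illustration, it is written for Python 3, and it assumes x is the longitude and y the latitude. Distances are measured in plain degrees, which is only an approximation of the distance on the ground.

```python
import math

def nearest(point, candidates):
    """Return the candidate (x, y, temp) closest to point = (x, y)."""
    px, py = point
    return min(candidates, key=lambda c: math.hypot(c[0] - px, c[1] - py))

# Hypothetical climate points as (x=longitude, y=latitude, temp).
climate = [(9.1, 56.0, 7.0), (9.2, 56.0, 10.0), (10.0, 56.0, 14.0)]

# Observations as (species, longitude, latitude), taken from the question.
observations = [
    ("Rubus idaeus", 10.0, 56.0),
    ("Dicranum polysetum", 9.1919, 56.0456),
]

# Pass (longitude, latitude) so the order matches the (x, y) climate points.
result = [obs + nearest((obs[1], obs[2]), climate) for obs in observations]
for row in result:
    print(row)
```

Every observation is compared against every climate point, so this is only practical for small samples, but it makes it easy to see whether the coordinates are being compared in the same order.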