Loop through arrays to find Euclidean distance in Python

This is what I have so far:

Stats2003 = np.loadtxt('/DataFiles/2003.txt') 
Stats2004 = np.loadtxt('/DataFiles/2004.txt') 
Stats2005 = np.loadtxt('/DataFiles/2005.txt') 
Stats2006 = np.loadtxt('/DataFiles/2006.txt')
Stats2007 = np.loadtxt('/DataFiles/2007.txt') 
Stats2008 = np.loadtxt('/DataFiles/2008.txt')
Stats2009 = np.loadtxt('/DataFiles/2009.txt') 
Stats2010 = np.loadtxt('/DataFiles/2010.txt') 
Stats2011 = np.loadtxt('/DataFiles/2011.txt') 
Stats2012 = np.loadtxt('/DataFiles/2012.txt') 

Stats = Stats2003, Stats2004, Stats2005, Stats2006, Stats2007, Stats2008, Stats2009, Stats2010, Stats2011, Stats2012

      

I am trying to calculate the Euclidean distance between each of these arrays and every other array, but I am having a hard time doing it.

The output I want comes from calculating distances like this:

dist1 = np.linalg.norm(Stats2003-Stats2004)
dist2 = np.linalg.norm(Stats2003-Stats2005)
dist11 = np.linalg.norm(Stats2004-Stats2005)

      

and so on, but I would like to do these calculations with a loop.

I am displaying the results in a table using PrettyTable.

Can anyone point me in the right direction? I haven't found any previous solutions that worked.


2 answers


Take a look at scipy.spatial.distance.cdist.

From the documentation:

Calculates the distance between each pair of two sets of inputs.



So, you can do something like the following:

import numpy as np
from scipy.spatial.distance import cdist

# start year to stop year (2003 through 2012 inclusive)
years = range(2003, 2013)
# this will yield an n_years x n_features array; ravel() flattens each file
# so that cdist, which needs 2-D input, also works if a file holds a 2-D table
features = np.array([np.loadtxt('/DataFiles/%s.txt' % year).ravel() for year in years])
# compute the Euclidean distance from each year to every other year
distance_matrix = cdist(features, features, metric='euclidean')

      

If you know the starting year and you are not missing data for any years, it is easy to determine which two years are compared at coordinate (m, n) in the distance matrix: row m corresponds to years[m] and column n to years[n].
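The index-to-year mapping described above can be sketched as follows. This is only a minimal sketch: it uses small made-up arrays in place of the data files, and it builds the pairwise matrix with plain NumPy broadcasting rather than cdist so that it runs without SciPy. The helper name years_at and the stand-in data are assumptions for illustration.

```python
import numpy as np

# Hypothetical stand-in data: one row of features per year (not the real files).
years = list(range(2003, 2013))
rng = np.random.default_rng(0)
features = rng.random((len(years), 5))

# Pairwise Euclidean distances via broadcasting; for 2-D input this is the
# same matrix that cdist(features, features, metric='euclidean') computes.
diff = features[:, None, :] - features[None, :, :]
distance_matrix = np.sqrt((diff ** 2).sum(axis=-1))

def years_at(m, n, start_year=2003):
    """Return the two years compared at entry (m, n) of the matrix."""
    return start_year + m, start_year + n

# Entry (0, 4) compares 2003 with 2007.
print(years_at(0, 4), distance_matrix[0, 4])
```

The matrix is symmetric with zeros on the diagonal, so only the entries with m < n carry distinct information.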



To write a loop, you need to separate the data from your variable names. A simple solution is to use dictionaries instead. The loops are implicit in the dict comprehensions:

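import itertools as it
import numpy as np

years = range(2003, 2013)
# map each year to its loaded array
stats = {y: np.loadtxt('/DataFiles/{}.txt'.format(y)) for y in years}
# map each unordered pair of years to the Euclidean distance between them
dists = {(y1, y2): np.linalg.norm(stats[y1] - stats[y2]) for (y1, y2) in it.combinations(years, 2)}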

      



You can now access the statistics for a specific year, e.g. 2007, with stats[2007], and the distances with tuples, e.g. dists[(2007, 2011)].

...
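Because it.combinations yields each pair only once, with the earlier year first, a lookup such as dists[(2011, 2007)] raises a KeyError. Below is a minimal sketch of one way around that, again with made-up arrays in place of the files; the helper name distance_between is hypothetical.

```python
import itertools as it
import numpy as np

# Hypothetical stand-in data instead of np.loadtxt on the real files.
years = range(2003, 2013)
rng = np.random.default_rng(0)
stats = {y: rng.random(5) for y in years}
dists = {(y1, y2): np.linalg.norm(stats[y1] - stats[y2])
         for y1, y2 in it.combinations(years, 2)}

def distance_between(a, b):
    """Look up the distance for two years given in either order."""
    if a == b:
        return 0.0
    return dists[(min(a, b), max(a, b))]

# Print one line per pair, e.g. as rows to feed into a table.
for (y1, y2), d in sorted(dists.items()):
    print(f"{y1}  {y2}  {d:.4f}")
```

Each printed line holds the two years and their distance, which could then be passed row by row to whatever table library is in use.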
