Min-Max DataPoint Normalization

I have a list of DataPoint objects like this:

List<DataPoint> newpoints=new List<DataPoint>(); 


where DataPoint is a class consisting of nine double properties, A to I, and

newpoints.Count = 100000 (i.e. 100,000 points, each with nine double features from A to I).


I need to normalize the points in this list to the range 0 to 1 using the min-max (scale-range) normalization method.

I followed the steps below:

  • Each DataPoint feature is copied into a one-dimensional array. For example, the code for feature A:

    for (int i = 0; i < newpoints.Count; i++)
    {
        array_A[i] = newpoints[i].A;
    }

    and so on for all nine double features.

  • I applied the min-max normalization method. For example, the code for feature A:

    normilized_featureA = (((array_A[i] - array_A.Min()) * (1 - 0)) /
                          (array_A.Max() - array_A.Min())) + 0;

The method works, but it is slow: it takes 3 minutes and 45 seconds.

How can I apply min-max normalization using LINQ in C# to cut the time down to a few seconds? I found the Stack Overflow question "How to normalize a list of int values", but my problem is that I need the maximum and minimum of each feature:

double valueMax = list.Max(); // I need the max of feature A across all 100000 points
double valueMin = list.Min(); // I need the min of feature A across all 100000 points


and so on for all nine features. Your help would be much appreciated.



3 answers

As an alternative to modeling your 9 features as double properties in the DataPoint class, you can model each data point as an array of 9 doubles, so that all 9 calculations can be done in one go, again using LINQ:

var newpoints = new List<double[]>
{
    new []{1.23, 2.34, 3.45, 4.56, 5.67, 6.78, 7.89, 8.90, 9.12},
    new []{2.34, 3.45, 4.56, 5.67, 6.78, 7.89, 8.90, 9.12, 12.23},
    new []{3.45, 4.56, 5.67, 6.78, 7.89, 8.90, 9.12, 12.23, 13.34},
    new []{4.56, 5.67, 6.78, 7.89, 8.90, 9.12, 12.23, 13.34, 15.32}
};

var featureStats = newpoints
// We make the assumption that all 9 data points are present on each row,
// so the first row is enough to enumerate the column indexes.
.First()
// 2 Anon Projections - first to determine min / max as a function of column
.Select((np, idx) => new
{
   Idx = idx,
   Max = newpoints.Max(x => x[idx]),
   Min = newpoints.Min(x => x[idx])
})
// Second to add in the dynamic Range
.Select(x => new {
  x.Idx,
  x.Max,
  x.Min,
  Range = x.Max - x.Min
})
// Back to array for O(1) lookups.
.ToArray();

// Do the normalization for the columns, for each row.
var normalizedFeatures = newpoints
   .Select(np => np.Select(
      (i, idx) => (i - featureStats[idx].Min) / featureStats[idx].Range));

foreach(var datapoint in normalizedFeatures)
{
  Console.WriteLine(string.Join(",", datapoint.Select(x => x.ToString("0.00"))));
}







Stop recalculating the maximum and minimum over and over again; they don't change.

// once, before the loop:
double maxInFeatureA = array_A.Max();
double minInFeatureA = array_A.Min();

// somewhere in the loop:
normilized_featureA = (((array_A[i] - minInFeatureA) * (1 - 0)) /
                      (maxInFeatureA - minInFeatureA)) + 0;
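
For reference, here is a minimal sketch of the whole loop with the bounds hoisted out. It assumes the feature has already been copied into a double array, as in the question; the class name `HoistedMinMax`, the method name `Normalize` and the zero-range guard are my own additions for illustration, not something from the question.

```csharp
using System;
using System.Linq;

public static class HoistedMinMax
{
    // Normalizes one feature column to [0, 1].
    // Min and max are each computed once, so the total work is O(n)
    // instead of O(n^2) when Min()/Max() are called for every element.
    public static double[] Normalize(double[] values)
    {
        double min = values.Min();   // one scan of the array (throws if the array is empty)
        double max = values.Max();   // one more scan
        double range = max - min;

        var result = new double[values.Length];
        for (int i = 0; i < values.Length; i++)
        {
            // Assumption: a constant column (range == 0) is mapped to 0
            // rather than dividing by zero.
            result[i] = range == 0 ? 0 : (values[i] - min) / range;
        }
        return result;
    }
}
```

Calling `HoistedMinMax.Normalize(array_A)` once per feature array scans each array a fixed number of times, no matter how many points it holds.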


Calling Max() / Min() on an array with many elements is very expensive when it is done inside a for/foreach loop.

I suggest you take the code from "Normalizing Array Data"

and use it like this:

// scale to the range 0..1 (arguments assumed to be the target min and max)
var normalizedPoints = newpoints.Select(x => x.A)
            .NormalizeData(0, 1);




double min = newpoints.Min(p => p.A);
double max = newpoints.Max(p => p.A);
double normalizer = 1 / (max - min); // computed once; locals cannot be declared readonly in C#

var normalizedFeatureA = newpoints.Select(p => (p.A - min) * normalizer);
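
To avoid repeating those four lines for each of the nine features, the same idea can be wrapped in a small helper. The following is only a sketch: `MinMaxExtensions` and `NormalizeFeature` are hypothetical names, and mapping a constant feature to 0 is an assumption about how a zero range should be handled.

```csharp
using System;
using System.Collections.Generic;

public static class MinMaxExtensions
{
    // Hypothetical helper: min-max normalizes the feature picked by `selector` to [0, 1].
    public static double[] NormalizeFeature<T>(
        this IReadOnlyList<T> points, Func<T, double> selector)
    {
        if (points.Count == 0) return Array.Empty<double>();

        // First pass: find both bounds in a single scan.
        double min = double.PositiveInfinity;
        double max = double.NegativeInfinity;
        foreach (var p in points)
        {
            double v = selector(p);
            if (v < min) min = v;
            if (v > max) max = v;
        }

        // Second pass: scale every value using the precomputed bounds.
        double range = max - min;
        var result = new double[points.Count];
        for (int i = 0; i < points.Count; i++)
        {
            // Assumption: a constant feature (range == 0) is mapped to 0.
            result[i] = range == 0 ? 0 : (selector(points[i]) - min) / range;
        }
        return result;
    }
}
```

With this in place, `newpoints.NormalizeFeature(p => p.A)` returns the normalized values for feature A, and the same call with `p => p.B` through `p => p.I` covers the other features. Each call makes two passes over the list.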



