Performing searches on sparse Javascript arrays

I have an interesting Javascript task (executed in Node.js, FWIW): I need to take the "weighted median" of a dataset for which I have values โ€‹โ€‹(income in this case) and a weight for each one. For example:

income  #people
0   5
16000   3
20000   8
32000   4
40000   3
41000   1
50000   2
90000   1

      

In other words, 8 people are $ 20 thousand, 2 are $ 50 thousand, etc. I want a "weighted median" - the median of all 27 people.

A naive way to do this is to make an array and seed it with each value, like this:

var incomes = [0, 0, 0, 0, 0, 16000, 16000, 16000, 20000, 20000, 20000, 20000, 20000, 20000, 20000, 20000, 32000, 32000, 32000, 32000, 40000, 40000, 40000, 41000, 50000, 50000, 90000];

      

Then you can easily take the median of this array (which is $ 20,000). In fact, I have data from 7000 to 14000 people per sample. While I'm pretty sure Node can handle an array this large, it feels incredibly sloppy.

My current solution is to compute the index of the median value in a hypothetical verbose matrix - 13, in this case - and loop through the array of incomes and weights, adding up the cumulative weight until it reaches or exceeds halfway. Here's a simplified example. (Obviously medians require slightly different rules for even numbered lists. It's just a POC.)

var halfway = 13,
    progress = 0;

var vals = [[0,5], [16000,3], [20000,8], [32000,4], [40000,3], [41000,1], [50000,2], [90000,1]];

for (var v = 0; v < vals.length; v += 1) {
    progress += vals[v][1];

    if (progress >= halfway) {
        var median = vals[v][0];
        break;
    }
}

      

This works fine, but it gets messy when you want to start calculating quartiles and so on. It would be easier for me to create a sparse array of values โ€‹โ€‹at the appropriate location in the verbose array without populating all the intermediate values, and then search that array for any index up to the maximum. But I need an efficient mechanism to find the previous known index in the sparse array, if (as far as possible) the index I'm looking for in the spare array is not populating.

This seems to be a fairly common problem.

+3


source to share


1 answer


In terms of computational efficiency, I think you have as good as you do, but I'm not sure what kind of difficulties your quartics are facing (sorry too low to ask for clarification on this).

Let's start by examining the effectiveness of what you have. You have an array of length n, and you jump to it, adding a counter and breaking halfway through (I guess half the information was provided, again too sorry to ask). So it was a good idea to look at simple O (n).



Now, what you are suggesting is some form of search that, given the index, knows the nearest populated index, O (1). Ok, that would be better, so let's see what we need for this. Well, we would need to move the data into some new data structure by going through it ..... Oh, that means they are back in O (n).

The moral of the story is that you have good, good work.

+1


source







All Articles