Split a list of objects into two lists by a size ratio and by per-class ratios

I have a list of objects, call it dataset, and I want to split it into two new lists, call them trainSet and testSet, while reading the input dataset line by line. The list object inherits from the Node.js read stream object (in this particular case).

We then define a split ratio, split_ratio=0.75, to divide dataset into a trainSet holding about 75% of dataset and a testSet holding the remaining 25% of the list length (for example):

var dataSize = dataset.length,
    lim_75 = 0,
    lim_25 = 0;

lim_75 = Math.floor(dataSize * split_ratio); // e.g.: .75
if (lim_75 < 1) lim_75 = 1;
lim_25 = Math.floor(dataSize * (1 - split_ratio)); // eg: 1-.75=.25
if (lim_25 < 1) lim_25 = 1;
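
As a quick check of the limit arithmetic above, here is a minimal sketch. The helper name computeLimits is hypothetical, and deriving the test limit as the remainder (instead of flooring it separately) is an assumption, made so that the two limits always add up to the dataset size.

```javascript
// Hypothetical helper, not part of the original code.
function computeLimits(dataSize, splitRatio) {
  // Train limit: floor of the requested share, but at least one row.
  const trainLimit = Math.max(1, Math.floor(dataSize * splitRatio));
  // Assumption: the test set takes whatever is left, so no row is unaccounted for.
  const testLimit = Math.max(0, dataSize - trainLimit);
  return { trainLimit, testLimit };
}

// With 10 rows, flooring both shares separately would give 7 and 2 (one row short).
console.log(computeLimits(10, 0.75)); // { trainLimit: 7, testLimit: 3 }
```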

      

While reading line by line, I do:

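var lim_75_count = 0, // rows written to trainSet so far
    lim_25_count = 0, // rows written to testSet so far
    row;

while (row = dataset.next()) {
    var r = Math.random(); // random draw deciding which set this row prefers
    // Write to trainSet if the draw favors it and it still has room,
    // or if the draw favors testSet but testSet is already full.
    if (r < split_ratio && lim_75_count < lim_75 || r >= split_ratio && lim_25_count >= lim_25) {
        lim_75_count += 1;
        trainSet.write(row);
    } else {
        lim_25_count += 1;
        testSet.write(row);
    }
}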
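
For comparison, here is a sketch of a different technique, selection sampling, which picks each row for training with probability (rows still needed) / (rows still to come) and therefore always ends with exact set sizes. It is not the loop above: the helper name splitExact is hypothetical, and it works on an in-memory array rather than on a stream, which is an assumption made to keep the example runnable.

```javascript
// Hypothetical helper: exact-size random split of an in-memory array.
function splitExact(rows, splitRatio, rng = Math.random) {
  const trainLimit = Math.floor(rows.length * splitRatio);
  const trainSet = [];
  const testSet = [];
  rows.forEach((row, i) => {
    const remainingRows = rows.length - i;
    const stillNeeded = trainLimit - trainSet.length;
    // Pick this row for training with probability stillNeeded / remainingRows.
    if (rng() * remainingRows < stillNeeded) {
      trainSet.push(row);
    } else {
      testSet.push(row);
    }
  });
  return { trainSet, testSet };
}

const sampleRows = Array.from({ length: 100 }, (_, i) => i);
const exactSplit = splitExact(sampleRows, 0.75);
console.log(exactSplit.trainSet.length, exactSplit.testSet.length); // 75 25
```

Like the original code, this needs the dataset length before the first row is read.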

      

This works: at the end I get two lists, trainSet with 0.75 and testSet with 0.25 of the length of dataset.

Now, suppose the objects in dataset have a structure like

{
 objectId: 12345,
 objectClass: 'CLASS_A',
 objectValue: 'The quick brown fox jumps over the lazy dog'
}

      

where objectClass belongs to an enumeration of values such as CLASS_A, CLASS_B, etc.

I want to keep the above split_ratio for the ratio between the lengths of trainSet and testSet, but add a new condition that also divides dataset by a second ratio, keyed on the object property objectClass, according to this partition map:

[
    {
     objectClass: 'CLASS_A',
     ratio: 0.30
    },
    {
     objectClass: 'CLASS_B',
     ratio: 0.20
    },
    {
     objectClass: 'CLASS_C',
     ratio: 0.50
    }
]
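
Before using such a map in a split condition, it may help to turn it into a lookup and check that the ratios add up to 1. This is only a sketch: the helper name toRatioLookup and the tolerance parameter are hypothetical, and the class names assume the map keys match the objectClass values of the objects.

```javascript
// Hypothetical helper: convert the partition map to { objectClass: ratio }
// and verify that the ratios sum to 1 within a small tolerance.
function toRatioLookup(partitionMap, tolerance = 1e-9) {
  const lookup = {};
  let total = 0;
  for (const { objectClass, ratio } of partitionMap) {
    lookup[objectClass] = ratio;
    total += ratio;
  }
  if (Math.abs(total - 1) > tolerance) {
    throw new Error('Partition ratios must sum to 1, got ' + total);
  }
  return lookup;
}

const ratioLookup = toRatioLookup([
  { objectClass: 'CLASS_A', ratio: 0.30 },
  { objectClass: 'CLASS_B', ratio: 0.20 },
  { objectClass: 'CLASS_C', ratio: 0.50 },
]);
console.log(ratioLookup.CLASS_C); // 0.5
```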

      

What is the correct splitting condition that will also work together (AND) with the first, size-based one?

Example

A concrete example to clarify: suppose that split_ratio=.70; then you must have trainSet.length=70 and testSet.length=30 if dataset.length=100.

Now, given two objectClass types with ratio values CLASS_A=0.20 and CLASS_B=0.80, we must have 0.20*70 = 14 CLASS_A elements in trainSet and 0.20*30 = 6 in testSet, while we will have 0.80*70 = 56 CLASS_B elements in trainSet and 0.80*30 = 24 CLASS_B elements in testSet.
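
The arithmetic of this example can be sketched as a small quota calculation. It assumes that each class should keep its ratio inside both trainSet and testSet; the helper name classQuotas is hypothetical, and Math.round is used only to absorb floating-point noise, so the rounded quotas are not guaranteed to sum to the set size for every input.

```javascript
// Hypothetical helper: per-class row counts for the train and test sets.
function classQuotas(dataSize, splitRatio, classRatios) {
  const trainSize = Math.round(dataSize * splitRatio);
  const testSize = dataSize - trainSize;
  const quotas = {};
  for (const [objectClass, ratio] of Object.entries(classRatios)) {
    quotas[objectClass] = {
      train: Math.round(trainSize * ratio),
      test: Math.round(testSize * ratio),
    };
  }
  return quotas;
}

const exampleQuotas = classQuotas(100, 0.70, { CLASS_A: 0.20, CLASS_B: 0.80 });
console.log(exampleQuotas);
// { CLASS_A: { train: 14, test: 6 }, CLASS_B: { train: 56, test: 24 } }
```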

In this case, it is assumed that the partition map contains only the types CLASS_A and CLASS_B, such as:

[
    {
     objectClass: 'CLASS_A',
     ratio: 0.20
    },
    {
     objectClass: 'CLASS_B',
     ratio: 0.80
    }
]
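
One possible way to combine both conditions is to keep a remaining quota per class and per output set, and to let the random draw only express a preference. The following is a sketch under stated assumptions, not a definitive answer: stratifiedSplit is a hypothetical name, the rows are held in an array instead of a stream, the dataset length is known up front, and rows whose class is missing from the map (or whose quotas are both full) are skipped.

```javascript
// Hypothetical helper combining the size limit and the per-class ratios.
function stratifiedSplit(rows, splitRatio, partitionMap, rng = Math.random) {
  const trainSize = Math.round(rows.length * splitRatio);
  const testSize = rows.length - trainSize;
  // Remaining capacity per class in each output set.
  const room = {};
  for (const { objectClass, ratio } of partitionMap) {
    room[objectClass] = {
      train: Math.round(trainSize * ratio),
      test: Math.round(testSize * ratio),
    };
  }
  const trainSet = [];
  const testSet = [];
  for (const row of rows) {
    const left = room[row.objectClass];
    if (!left) continue; // class not in the partition map: skipped here
    const preferTrain = rng() < splitRatio;
    if (left.train > 0 && (preferTrain || left.test === 0)) {
      left.train -= 1;
      trainSet.push(row);
    } else if (left.test > 0) {
      left.test -= 1;
      testSet.push(row);
    }
    // Otherwise both quotas for this class are full and the row is dropped.
  }
  return { trainSet, testSet };
}

// Usage with the example above: 100 rows, 20 of CLASS_A and 80 of CLASS_B.
const demoRows = Array.from({ length: 100 }, (_, i) => ({
  objectId: i,
  objectClass: i < 20 ? 'CLASS_A' : 'CLASS_B',
}));
const demoSplit = stratifiedSplit(demoRows, 0.70, [
  { objectClass: 'CLASS_A', ratio: 0.20 },
  { objectClass: 'CLASS_B', ratio: 0.80 },
]);
const countA = (set) => set.filter((row) => row.objectClass === 'CLASS_A').length;
console.log(demoSplit.trainSet.length, countA(demoSplit.trainSet)); // 70 14
console.log(demoSplit.testSet.length, countA(demoSplit.testSet));   // 30 6
```

The quotas are filled exactly only when the class mix of the input matches the map, as it does here; otherwise some rows are dropped or some quotas stay partly empty.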

      

javascript node.js dataset partitioning

