Removing rows at distinct indices from a numpy array
In my dataset I have close to 200 rows, but as a minimal working example, let's say I have the following array:
arr = np.array([[1,2,3,4], [5,6,7,8],
[9,10,11,12], [13,14,15,16],
[17,18,19,20], [21,22,23,24]])
I can take a random sample of 3 rows like this:
indexes = np.random.choice(np.arange(arr.shape[0]), int(arr.shape[0]/2), replace=False)
Using these indices, I can select my test cases like this:
testing = arr[indexes]
I want to delete the rows at these indices so that I can use the leftover rows as my training set.
From the post here, it seems like training = np.delete(arr, indexes) should work, but I end up with a 1d array instead.
I also tried the suggestion here, using training = arr[indexes.astype(np.bool)], but it didn't give a clean separation: the row [5,6,7,8] ends up in both the training and test sets.
training = arr[indexes.astype(np.bool)]
testing
Out[101]:
array([[13, 14, 15, 16],
[ 5, 6, 7, 8],
[17, 18, 19, 20]])
training
Out[102]:
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
Any idea what I am doing wrong? Thanks.
One approach is to get the remaining row indices with np.setdiff1d and then use those row indices to get the desired result:
out = arr[np.setdiff1d(np.arange(arr.shape[0]), indexes)]
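For completeness, here is a minimal end-to-end sketch of the np.setdiff1d approach using the array from the question. The seeded generator `rng` and the name `remaining` are my own additions for illustration and are not part of the original code.

```python
import numpy as np

# Same 6x4 array as in the question
arr = np.arange(1, 25).reshape(6, 4)

# Seeded generator is an assumption, used only to make the sample repeatable
rng = np.random.default_rng(0)
indexes = rng.choice(arr.shape[0], size=arr.shape[0] // 2, replace=False)

# Rows picked for testing
testing = arr[indexes]

# Row indices that were NOT sampled (setdiff1d returns them sorted)
remaining = np.setdiff1d(np.arange(arr.shape[0]), indexes)
training = arr[remaining]

print(testing)
print(training)
```

Because `remaining` holds exactly the row indices missing from `indexes`, the two subsets share no rows and together cover the whole array.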
Or use np.in1d with boolean indexing:
out = arr[~np.in1d(np.arange(arr.shape[0]), indexes)]
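As a rough sketch of why the two attempts in the question misbehave, and how the boolean mask differs, consider the sample that the question's output implies (rows 3, 1 and 4). I use np.isin here as the newer spelling of np.in1d, which, as far as I know, recent NumPy releases deprecate. The variable names are illustrative only.

```python
import numpy as np

arr = np.arange(1, 25).reshape(6, 4)

# Row indices inferred from the `testing` output shown in the question
indexes = np.array([3, 1, 4])

# Boolean mask with one entry per row: True means "keep this row"
mask = ~np.isin(np.arange(arr.shape[0]), indexes)
training_mask = arr[mask]
# training_mask is [[1, 2, 3, 4], [9, 10, 11, 12], [21, 22, 23, 24]]

# np.delete also works once axis=0 is given; without an axis it
# flattens the array first, which explains the 1d result
training_delete = np.delete(arr, indexes, axis=0)
flattened = np.delete(arr, indexes)

# astype(bool) only marks each index as zero / nonzero, so it yields
# a length-3 array of True values, not a per-row mask
as_bool = indexes.astype(bool)

print(training_mask)
print(flattened.ndim)  # 1
print(as_bool)
```

The key point is that a boolean mask must have one entry per row of arr, whereas `indexes.astype(bool)` has one entry per sampled index and carries no information about which rows were chosen.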