Creating a one-hot vector from indices specified as a tensor
I have a tensor of size 4 x 6, where 4 is the batch size and 6 is the sequence length. Each element of the sequences is an index (from 0 to n). I want to create a 4 x 6 x n tensor
where the vectors along the 3rd dimension are the one-hot encodings of the indices, i.e. a 1 at the specified index and zeros everywhere else.
For example, I have the following tensor:
[[5, 3, 2, 11, 15, 15],
[1, 4, 6, 7, 3, 3],
[2, 4, 7, 8, 9, 10],
[11, 12, 15, 2, 5, 7]]
Here all values are between 0 and n, where n = 15. So I want to convert the tensor to a 4 x 6 x 16
tensor whose third dimension holds the one-hot encoding vectors.
How can I do this using PyTorch functions? Right now, I am doing this with a loop, but I want to avoid the loop!
NEW ANSWER: As of PyTorch 1.1, there is a one_hot function in torch.nn.functional. Given an index tensor indices and the number of classes n, you can create a one-hot version like this:
import torch

n = 5
indices = torch.randint(0, n, size=(4, 7))
one_hot = torch.nn.functional.one_hot(indices, n)  # size=(4, 7, n)
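Applied to the tensor from the question, this would look like the sketch below (assuming num_classes = 16, since the indices run from 0 to 15):
import torch
import torch.nn.functional as F

indices = torch.tensor([[5, 3, 2, 11, 15, 15],
                        [1, 4, 6, 7, 3, 3],
                        [2, 4, 7, 8, 9, 10],
                        [11, 12, 15, 2, 5, 7]])
one_hot = F.one_hot(indices, num_classes=16)  # size=(4, 6, 16)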
Very old answer
At this point, in my experience, slicing and indexing can be a little painful in PyTorch. I assume you don't want to convert your tensors to numpy arrays. The most elegant way I can think of at the moment is to use sparse tensors and then convert them to dense ones. It works like this:
import torch
from torch.sparse import FloatTensor as STensor

batch_size = 4
seq_length = 6
feat_dim = 16

batch_idx = torch.LongTensor([i for i in range(batch_size) for s in range(seq_length)])
seq_idx = torch.LongTensor(list(range(seq_length)) * batch_size)
feat_idx = torch.LongTensor([[5, 3, 2, 11, 15, 15], [1, 4, 6, 7, 3, 3],
                             [2, 4, 7, 8, 9, 10], [11, 12, 15, 2, 5, 7]]).view(24,)

my_stack = torch.stack([batch_idx, seq_idx, feat_idx])  # indices must be nDim x nEntries
my_final_array = STensor(my_stack, torch.ones(batch_size * seq_length),
                         torch.Size([batch_size, seq_length, feat_dim])).to_dense()
print(my_final_array)
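On recent PyTorch versions, the torch.sparse.FloatTensor constructor has been superseded by torch.sparse_coo_tensor; a minimal sketch of the same idea, reusing the index tensors built above:
# Sketch of a modern equivalent of the sparse construction above.
one_hot = torch.sparse_coo_tensor(
    my_stack,                             # 3 x 24 indices (batch, seq, feature)
    torch.ones(batch_size * seq_length),  # one 1.0 per index triple
    size=(batch_size, seq_length, feat_dim),
).to_dense()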
Note: the PyTorch team is currently working on adding numpy-style broadcasting and other features over the next two or three weeks, so better solutions may appear in the near future.
Hope this helps you a little.
This can be done in PyTorch using the scatter_ in-place method of any Tensor object.
import torch

num_classes = 3
labels = torch.LongTensor([[[2, 1, 0]], [[0, 1, 0]]]).permute(0, 2, 1)  # let this be your current batch
batch_size, k, _ = labels.size()
labels_one_hot = torch.FloatTensor(batch_size, k, num_classes).zero_()
labels_one_hot.scatter_(2, labels, 1)
For num_classes = 3 (the indices must lie in the range [0, 3)), this gives you:
(0 ,.,.) =
0 0 1
0 1 0
1 0 0
(1 ,.,.) =
1 0 0
0 1 0
1 0 0
[torch.FloatTensor of size 2x3x3]
Please note that labels must be a torch.LongTensor, since scatter_ expects integer indices.
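For the 4 x 6 tensor from the question, the same pattern would look roughly like this (a sketch, assuming num_classes = 16):
import torch

indices = torch.LongTensor([[5, 3, 2, 11, 15, 15],
                            [1, 4, 6, 7, 3, 3],
                            [2, 4, 7, 8, 9, 10],
                            [11, 12, 15, 2, 5, 7]])
num_classes = 16

# Add a trailing dimension so the index tensor has the same rank as the
# output, then scatter 1s along the class dimension (dim 2).
one_hot = torch.zeros(indices.size(0), indices.size(1), num_classes)
one_hot.scatter_(2, indices.unsqueeze(2), 1)  # size=(4, 6, 16)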
PyTorch Docs Reference: torch.Tensor.scatter_