Creating one-hot vectors from indices specified as a tensor

I have a tensor of size 4 x 6, where 4 is the batch size and 6 is the sequence length. Each element of a sequence is an index in the range 0 to n. I want to create a tensor of size 4 x 6 x (n+1), where the vectors along the third dimension are one-hot encodings of those indices, meaning a 1 at the specified index and zeros everywhere else.

For example, I have the following tensor:

[[ 5,  3,  2, 11, 15, 15],
 [ 1,  4,  6,  7,  3,  3],
 [ 2,  4,  7,  8,  9, 10],
 [11, 12, 15,  2,  5,  7]]

Here all values are between 0 and n, where n = 15, so I want to convert this to a 4 x 6 x 16 tensor in which the third dimension holds the one-hot encoding vectors.

How can I do this using PyTorch functions? Right now, I am doing this with a loop, but I want to avoid the loop!



2 answers


NEW ANSWER

As of PyTorch 1.1, there is a one_hot function in torch.nn.functional. Given any tensor of indices indices and a maximum index n, you can create a one-hot version of it like this:

import torch

n = 5
indices = torch.randint(0, n, size=(4, 7))
one_hot = torch.nn.functional.one_hot(indices, n)  # size = (4, 7, n)
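For the question's concrete case, the indices run from 0 to 15, so num_classes must be 16 (the maximum index plus one). A minimal sketch using the same function:

import torch
import torch.nn.functional as F

# The 4 x 6 index tensor from the question; maximum index is 15,
# so the one-hot dimension is 15 + 1 = 16.
t = torch.tensor([[5, 3, 2, 11, 15, 15],
                  [1, 4, 6, 7, 3, 3],
                  [2, 4, 7, 8, 9, 10],
                  [11, 12, 15, 2, 5, 7]])
one_hot = F.one_hot(t, num_classes=16)  # size = (4, 6, 16), dtype torch.int64
one_hot = one_hot.float()               # cast if a float tensor is needed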

Very old answer

At this point, in my experience, slicing and indexing can be a little painful in PyTorch. I assume you don't want to convert your tensors to numpy arrays. The most elegant way I can think of at the moment is to use sparse tensors and then convert them to a dense tensor. It will work like this:



import torch
from torch.sparse import FloatTensor as STensor

batch_size = 4
seq_length = 6
feat_dim = 16

# Coordinates of every 1 in the output: one (batch, seq, feature) triple
# per element of the input index tensor.
batch_idx = torch.LongTensor([i for i in range(batch_size) for s in range(seq_length)])
seq_idx = torch.LongTensor(list(range(seq_length)) * batch_size)
feat_idx = torch.LongTensor([[5, 3, 2, 11, 15, 15], [1, 4, 6, 7, 3, 3],
                             [2, 4, 7, 8, 9, 10], [11, 12, 15, 2, 5, 7]]).view(24,)

my_stack = torch.stack([batch_idx, seq_idx, feat_idx])  # indices must be nDim * nEntries
my_final_array = STensor(my_stack, torch.ones(batch_size * seq_length),
                         torch.Size([batch_size, seq_length, feat_dim])).to_dense()

print(my_final_array)

Note: PyTorch is currently doing work that will add numpy-style broadcasting and other features over the next two or three weeks, so better solutions may appear in the near future.
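For what it's worth, once that broadcasting support is available, a loop-free one-hot can also be written as a plain comparison against torch.arange. This sketch is not part of the original answer and assumes a modern PyTorch with broadcasting:

import torch

t = torch.tensor([[5, 3, 2, 11, 15, 15],
                  [1, 4, 6, 7, 3, 3],
                  [2, 4, 7, 8, 9, 10],
                  [11, 12, 15, 2, 5, 7]])
n = 16  # number of classes: maximum index 15, plus one

# Broadcasting compares each index against 0..n-1, producing a (4, 6, 16)
# boolean tensor that is True exactly at each index position.
one_hot = (t.unsqueeze(-1) == torch.arange(n)).float()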

Hope this helps you a little.



This can be done in PyTorch using the in-place scatter_ method of any Tensor object.

import torch

num_classes = 3
labels = torch.LongTensor([[[2, 1, 0]], [[0, 1, 0]]]).permute(0, 2, 1)  # let this be your current batch
batch_size, k, _ = labels.size()
labels_one_hot = torch.FloatTensor(batch_size, k, num_classes).zero_()
labels_one_hot.scatter_(2, labels, 1)  # write a 1 at each label position along dim 2

For num_classes = 3 (indices must lie in the range [0, 3)), this will give you



(0 ,.,.) = 
  0  0  1
  0  1  0
  1  0  0
(1 ,.,.) = 
  1  0  0
  0  1  0
  1  0  0
[torch.FloatTensor of size 2x3x3]

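As an aside not in the original answer: the same scatter_ trick applies directly to the question's 4 x 6 tensor with num_classes = 16, by giving the index tensor a trailing singleton dimension:

import torch

t = torch.LongTensor([[5, 3, 2, 11, 15, 15],
                      [1, 4, 6, 7, 3, 3],
                      [2, 4, 7, 8, 9, 10],
                      [11, 12, 15, 2, 5, 7]])

# unsqueeze adds a trailing dimension of size 1 so the index tensor has the
# same number of dimensions as the output, as scatter_ requires.
one_hot = torch.zeros(4, 6, 16).scatter_(2, t.unsqueeze(2), 1.0)  # size = (4, 6, 16)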

Please note that labels must be a torch.LongTensor, since scatter_ expects integer indices.

PyTorch Docs Reference: torch.Tensor.scatter_







