Seemingly conflicting results from topk / sort and pick

Question

I am predicting one of about 100K possible outputs with an MXNet model that uses a fairly standard softmax output. I want to compare the probability assigned to the true label with the model's top predictions. To get the former, I use the pick operator; for the latter, I have tried both the cheap version (the topk operator) and the expensive version (sort / argsort + slice).
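
For reference, the two quantities described in the paragraph above can be mirrored in plain NumPy, which gives an independent cross-check for the MXNet results. This is a minimal sketch, assuming the predictions have been copied out with `.asnumpy()` into an (N, C) array and the labels into a length-N array of class indices; `pick_np` and `topk_values_np` are hypothetical helper names, not MXNet functions.

```python
import numpy as np

def pick_np(probs, labels):
    """NumPy analogue of mx.nd.pick(probs, labels, axis=1): the score of each row's label."""
    idx = np.asarray(labels, dtype=np.int64).reshape(-1, 1)  # (N, 1) column of class indices
    return np.take_along_axis(probs, idx, axis=1)[:, 0]      # back to shape (N,)

def topk_values_np(probs, k):
    """NumPy analogue of topk(..., ret_typ='value'): the k largest scores per row, descending."""
    # "Cheap" route: partition so the k largest scores land in the last k columns (unordered)...
    part = np.partition(probs, -k, axis=1)[:, -k:]
    # ...then sort only those k columns, largest first.
    return -np.sort(-part, axis=1)

probs = np.array([[0.1, 0.7, 0.2],
                  [0.5, 0.3, 0.2]])
labels = np.array([2, 0])
label_score = pick_np(probs, labels)   # [0.2, 0.5]
top = topk_values_np(probs, k=2)       # [[0.7, 0.2], [0.5, 0.3]]

# The true-label score can never exceed the row maximum.
assert np.all(label_score <= top[:, 0])
```

Because both helpers read the same array, the final assertion holds for any input; if the MXNet version of the same check fails while this one passes on the same `.asnumpy()` data, the discrepancy lies in the MXNet calls or their inputs rather than in the logic.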

In both cases, I get conflicting results. In particular, there are many cases where the probability of the true label (obtained with pick) is significantly higher than the highest output probability (obtained with topk / sort). I think this means I am doing something wrong, but I don't understand what. It does not happen for every prediction, but it does for a significant proportion of them.

Can someone give me a hint as to what is going on?

Here is the code:

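import mxnet as mx
import numpy as np

# `model` (presumably a bound mx.mod.Module) and `data_iter` (an mx.io.DataIter) are defined elsewhere
for batch in data_iter:
    model.forward(batch, is_train=False)
    predictions = model.get_outputs()[0]   # softmax output, shape (batch_size, num_classes)
    labels = batch.label[0].as_in_context(predictions.context)

    # Cheap version: top-6 probabilities per row
    # scores = mx.nd.topk(predictions, axis=1, k=6, ret_typ='value')
    # Expensive version: full descending sort, then keep the first 6 columns
    scores = mx.nd.sort(predictions, axis=1, is_ascend=0)
    scores = mx.nd.slice_axis(scores, axis=1, begin=0, end=6)

    # Probability assigned to each row's true label
    label_score = mx.nd.pick(predictions, labels, axis=1)
    equal = label_score.asnumpy() <= scores.asnumpy()[:, 0]

    if not np.all(equal):
        # I think this should never happen, but it does frequently
        print("pick returned a higher score than the top-1 score")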
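
One thing worth ruling out, offered only as a possibility to check and not as a diagnosis of this particular case: the final comparison is done in NumPy, which broadcasts silently. If either side has shape (N, 1) instead of (N,), for example because the labels or the picked scores carry a trailing axis of size 1, then `a <= b` becomes an (N, N) all-pairs comparison and `np.all` fails even when every row is consistent. A small demonstration with made-up numbers:

```python
import numpy as np

top1 = np.array([0.9, 0.4, 0.6])               # row-wise maximum, shape (3,)
label_score = np.array([[0.9], [0.4], [0.5]])  # true-label scores with a stray axis, shape (3, 1)

wrong = label_score <= top1                    # broadcasts to (3, 3): every row against every row
right = label_score.reshape(-1) <= top1        # flatten first: one comparison per row, shape (3,)

print(wrong.shape, np.all(wrong))   # (3, 3) False
print(right.shape, np.all(right))   # (3,) True
```

Printing `label_score.shape` and `scores.shape` inside the loop is a cheap way to confirm whether this applies.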

python mxnet

Ben Allison · Jul 4, 2017 at 16:38

1 answer

Sina Afrooze · Answer 1 · 2018-03-01T03:38:09+0000

Testing with MXNet 1.1.0 on random data, the following code shows that the problem does not occur:

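import mxnet as mx
import numpy as np

for _ in range(10):
    # 100 rows of random scores over 100,000 classes
    predictions = mx.nd.random.uniform(shape=(100, 100000))
    # One true label per row, drawn from the full class range [0, 100000)
    labels = mx.nd.array(np.random.randint(0, 100000, size=(100,)))

    # Top-6 scores per row via a full descending sort
    scores = mx.nd.sort(predictions, axis=1, is_ascend=0)
    scores = mx.nd.slice_axis(scores, axis=1, begin=0, end=6)

    # Score of the true label; it can never exceed the row maximum
    label_score = mx.nd.pick(predictions, labels, axis=1)
    equal = label_score.asnumpy() <= scores.asnumpy()[:, 0]

    if not np.all(equal):
        print("ERROR")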
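
If the check does fail, printing "ERROR" alone says little. A slightly more informative variant is sketched below using a hypothetical helper, `report_violations`, which takes the two NumPy arrays produced by `.asnumpy()` (for example `label_score.asnumpy()` and `scores.asnumpy()[:, 0]`). It lists the offending rows with both scores and refuses to compare arrays of different shapes, which helps separate a genuine ordering bug from a shape mismatch.

```python
import numpy as np

def report_violations(label_score, top1):
    """Return (row, label_score, top1) for every row where the true-label score exceeds the top-1 score."""
    label_score = np.asarray(label_score).reshape(-1)  # force shape (N,) so nothing broadcasts
    top1 = np.asarray(top1).reshape(-1)
    if label_score.shape != top1.shape:
        raise ValueError(f"shape mismatch: {label_score.shape} vs {top1.shape}")
    bad = np.flatnonzero(label_score > top1)           # indices of inconsistent rows
    return [(int(i), float(label_score[i]), float(top1[i])) for i in bad]

print(report_violations([0.2, 0.8, 0.5], [0.3, 0.6, 0.5]))  # [(1, 0.8, 0.6)]
```

Ties (row 2 above) are not reported, since a true label that is also the top prediction should score exactly the same in both.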
