How to improve prediction of rare labels in a Keras named-entity recognition task?

I am working with a dataset of ~90,000 tokens (not huge), of which only ~600 are labeled with something other than the standard (uninteresting) label. So in IOB format there are over 89,000 "O" tokens, and the rest are tagged with one of five different categories that I'd like to recognize.

I use a bi-LSTM network, which works well on a sample dataset where the "interesting" categories are relatively frequent (about 5,000 of 12,000 tokens are non-"O").

But when I train the same network on my own data, it predicts practically no non-"O" tokens at all; it seems the rare non-"O" events are simply not frequent enough for the network to give them a chance.

With this in mind, I was considering a custom loss function that penalizes missing a non-"O" token (falsely predicting "O" where the true tag is non-"O") more heavily than the opposite error.
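A minimal sketch of what such a class-weighted loss could look like, assuming a TensorFlow/Keras backend and one-hot tag targets (the weight values here are placeholders that would need tuning; the function name is mine):

```python
import tensorflow as tf

def weighted_categorical_crossentropy(class_weights):
    """Categorical cross-entropy where each token's loss is scaled by the
    weight of its true class, so rare non-"O" tags count for more."""
    weights = tf.constant(class_weights, dtype=tf.float32)

    def loss(y_true, y_pred):
        # Avoid log(0) for numerically saturated predictions.
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0)
        # Per-token weight = weight of the true (one-hot) class.
        token_weights = tf.reduce_sum(y_true * weights, axis=-1)
        # Standard cross-entropy per token.
        ce = -tf.reduce_sum(y_true * tf.math.log(y_pred), axis=-1)
        return ce * token_weights

    return loss

# Usage (illustrative): up-weight the single rare class 10x vs "O".
# model.compile(optimizer="adam",
#               loss=weighted_categorical_crossentropy([1.0, 10.0]))
```

This would be passed as the `loss` argument to `model.compile` in place of plain `categorical_crossentropy`.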

Are there other recommended approaches? (600 out of 90,000 doesn't seem all that rare, but at the moment I'm not getting any non-"O" predictions at all.)
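One alternative I've seen mentioned, instead of a fully custom loss, is Keras's built-in per-sample weighting: compiling with `sample_weight_mode="temporal"` lets you pass a per-token weight matrix to `fit`. A sketch, assuming integer tag indices with "O" at `o_index` (the helper name and weight values are my own illustration):

```python
import numpy as np

def make_token_weights(y_labels, o_index=0, o_weight=1.0, rare_weight=20.0):
    """Build a (batch, timesteps) weight matrix for sample_weight:
    tokens with a rare (non-"O") tag get a larger training weight.

    y_labels: integer tag indices, shape (batch, timesteps).
    """
    return np.where(y_labels == o_index, o_weight, rare_weight).astype("float32")

# Usage (illustrative):
# model.compile(optimizer="adam", loss="categorical_crossentropy",
#               sample_weight_mode="temporal")
# model.fit(X, y_onehot, sample_weight=make_token_weights(y_int))
```

Oversampling sentences that contain rare tags (so each batch sees them more often) is another option that needs no model changes at all.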

The model I'm using is Adam Atkinson's nice Keras implementation ( https://github.com/aatkinson/deep-named-entity-recognition ), which, as I said, works fine on a comparable but less sparse dataset.
