How do I create a dataset in the same format as an FSNS dataset?
I'm working with a TensorFlow-based project. I want to train the Attention OCR model on my own dataset, but I don't know how to store my images and ground truth in the same format as the FSNS dataset.
Does anyone working on this project know how to solve this problem?
The data format for storing the training/test data is defined in the FSNS paper https://arxiv.org/pdf/1702.03970.pdf (Table 4).
To write tfrecord files containing tf.Example protos, you can use tf.python_io.TFRecordWriter . There is a good existing tutorial at fooobar.com/questions/171476 / ... and a short gist.
Suppose you have a numpy ndarray img that holds num_of_views images stored side by side (see figure 3 in the paper), and the corresponding transcription in a variable text . You will need to define a function that converts a unicode string into a list of fixed-length padded and a list of unpadded character ids. For example:
char_ids_padded, char_ids_unpadded = encode_utf8_string(
    text='abc',
    charset={'a':0, 'b':1, 'c':2},
    length=5,
    null_char_id=3)
the result should be:
char_ids_padded = [0,1,2,3,3]
char_ids_unpadded = [0,1,2]
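The function itself is not part of TensorFlow, so you have to write it yourself. A minimal sketch that produces exactly the padded/unpadded lists above (the signature follows the call shown here; treat it as an illustration, not the canonical implementation):

```python
def encode_utf8_string(text, charset, length, null_char_id):
    # Map each character of the transcription to its id in the charset.
    char_ids_unpadded = [charset[c] for c in text]
    # Pad the id list with the null-character id up to the fixed length.
    char_ids_padded = char_ids_unpadded + \
        [null_char_id] * (length - len(char_ids_unpadded))
    return char_ids_padded, char_ids_unpadded
```

A real version should also decide what to do with characters missing from the charset and with texts longer than length (e.g. raise an error or truncate).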
If you use the functions _int64_feature and _bytes_feature defined in the gist, you can create an FSNS-compatible tf.Example proto with the following snippet:
char_ids_padded, char_ids_unpadded = encode_utf8_string(
    text, charset, length, null_char_id)
example = tf.train.Example(features=tf.train.Features(
    feature={
        'image/format': _bytes_feature(b'PNG'),
        # 'image/encoded' must hold PNG-encoded bytes to match 'image/format';
        # raw pixel bytes (img.tostring()) will not decode as PNG
        'image/encoded': _bytes_feature(png_encoded_image),
        'image/class': _int64_feature(char_ids_padded),
        'image/unpadded_class': _int64_feature(char_ids_unpadded),
        'height': _int64_feature(img.shape[0]),
        'width': _int64_feature(img.shape[1]),
        # integer division: orig_width is the width of a single view
        'orig_width': _int64_feature(img.shape[1] // num_of_views),
        # in Python 3, encode the text to bytes: text.encode('utf-8')
        'image/text': _bytes_feature(text)
    }
))
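Putting the pieces together, here is a self-contained sketch that builds one such proto and writes it to a tfrecord file. The helper definitions, the dummy 150x600 image with 4 views, the hardcoded character ids, and the output path are all assumptions for illustration; tf.io.TFRecordWriter and tf.io.encode_png are the TF 2.x names of the APIs mentioned above:

```python
import numpy as np
import tensorflow as tf

# Sketches in the spirit of the gist's helpers (exact definitions assumed).
def _int64_feature(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

# Hypothetical example: one 150x600 image holding 4 views side by side.
num_of_views = 4
img = np.zeros((150, 600, 3), dtype=np.uint8)
png_encoded_image = tf.io.encode_png(img).numpy()  # actual PNG bytes

# Ids as produced by encode_utf8_string('abc', ...) above.
char_ids_padded, char_ids_unpadded = [0, 1, 2, 3, 3], [0, 1, 2]

example = tf.train.Example(features=tf.train.Features(feature={
    'image/format': _bytes_feature(b'PNG'),
    'image/encoded': _bytes_feature(png_encoded_image),
    'image/class': _int64_feature(char_ids_padded),
    'image/unpadded_class': _int64_feature(char_ids_unpadded),
    'height': _int64_feature([img.shape[0]]),
    'width': _int64_feature([img.shape[1]]),
    'orig_width': _int64_feature([img.shape[1] // num_of_views]),
    'image/text': _bytes_feature(b'abc'),
}))

# One serialized Example per image goes into the tfrecord file.
with tf.io.TFRecordWriter('/tmp/fsns_example.tfrecord') as writer:
    writer.write(example.SerializeToString())
```

In practice you would loop this over your whole dataset, sharding the output into several tfrecord files the way the original FSNS dataset does.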