How do I create a dataset in the same format as an FSNS dataset?

I'm working on this project, which is built on TensorFlow.

I just want to train an Attention OCR model on my own dataset, but I don't know how to store my images and ground truth in the same format as the FSNS dataset.

Is anyone else working on this project, or does anyone know how to solve this problem?



3 answers

The data format for storing the training/test data is defined in the FSNS paper (Table 4).

To store tf.Example protos in tfrecord files, you can use tf.python_io.TFRecordWriter. There is a good tutorial, an existing ..., and a short gist.

Suppose you have a numpy ndarray img that has num_of_views images stored side by side (see figure 3 in the paper) and the corresponding text in a variable text. You will need to define some function to convert the unicode string to two lists of character IDs: one padded to a fixed length and one unpadded. For example:

char_ids_padded, char_ids_unpadded = encode_utf8_string(
   text='abc', charset={'a':0, 'b':1, 'c':2},
   length=5, null_char_id=3)


the result should be:

char_ids_padded = [0,1,2,3,3]
char_ids_unpadded = [0,1,2]
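The answer does not show the body of encode_utf8_string. A minimal sketch that reproduces the result above might look like the following; the error handling for over-long text and the KeyError on characters missing from the charset are assumptions, not part of the original answer.

```python
def encode_utf8_string(text, charset, length, null_char_id):
    """Map each character of `text` to its ID and pad the result to `length`.

    Returns (char_ids_padded, char_ids_unpadded).
    Assumption: a character missing from `charset` raises KeyError.
    """
    char_ids_unpadded = [charset[c] for c in text]
    if len(char_ids_unpadded) > length:
        # Assumption: over-long text is treated as an error, not truncated.
        raise ValueError("text is longer than the fixed length %d" % length)
    padding = [null_char_id] * (length - len(char_ids_unpadded))
    return char_ids_unpadded + padding, char_ids_unpadded


char_ids_padded, char_ids_unpadded = encode_utf8_string(
    text="abc", charset={"a": 0, "b": 1, "c": 2}, length=5, null_char_id=3)
```

Here null_char_id is simply an ID that no real character uses, so the padded list always has exactly length entries.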


If you are using the functions _int64_feature and _bytes_feature defined in the gist, you can create an FSNS-compatible tf.Example proto using the following snippet:

char_ids_padded, char_ids_unpadded = encode_utf8_string(
   text, charset, length, null_char_id)
example = tf.train.Example(features=tf.train.Features(
  feature={
    'image/format': _bytes_feature("PNG"),
    'image/encoded': _bytes_feature(img.tostring()),
    'image/class': _int64_feature(char_ids_padded),
    'image/unpadded_class': _int64_feature(char_ids_unpadded),
    'height': _int64_feature(img.shape[0]),
    'width': _int64_feature(img.shape[1]),
    # Integer division keeps orig_width an int.
    'orig_width': _int64_feature(img.shape[1] // num_of_views),
    'image/text': _bytes_feature(text)
  }
))




You shouldn't use the following line directly, because img.tostring() stores the raw pixel buffer rather than an encoded image:

'image/encoded': _bytes_feature(img.tostring()),


In my code, I wrote this:

_,jpegVector = cv2.imencode('.jpeg',img)
imgStr = jpegVector.tostring()
'image/encoded': _bytes_feature(imgStr)




When I read the tfrecords, the values of 'image/class': _int64_feature(char_ids_padded) are empty. Has anyone come across this issue?


