How do I create a dataset in the same format as an FSNS dataset?

I'm working on this project, which is built on TensorFlow.

I just want to train an Attention OCR model on my own dataset, but I don't know how to store my images and ground truth in the same format as the FSNS dataset.

Does anyone else work on this project, or know how to solve this problem?



3 answers


The data format for storing the training and test data is defined in the FSNS paper https://arxiv.org/pdf/1702.03970.pdf (Table 4).

To write tf.Example protos to TFRecord files, you can use tf.python_io.TFRecordWriter. There is a good existing tutorial (fooobar.com/questions/171476 / ...) and a short gist.

Suppose you have a NumPy ndarray img that holds num_of_views images stored side by side (see Figure 3 in the paper), and the corresponding text in a variable text. You will need to define a function that converts the Unicode string into two lists of character ids: one padded to a fixed length and one unpadded. For example:

char_ids_padded, char_ids_unpadded = encode_utf8_string(
   text='abc', 
   charset={'a':0, 'b':1, 'c':2},
   length=5,
   null_char_id=3)

The result should be:

char_ids_padded = [0,1,2,3,3]
char_ids_unpadded = [0,1,2]
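One possible implementation of encode_utf8_string, as a sketch only: it assumes charset is a dict mapping single characters to integer ids, and it rejects text longer than length (truncating would be another option). The helper build_charset is hypothetical; it numbers the distinct characters of your texts so that len(charset) is free to use as null_char_id, which matches the example above where the null id is 3.

```python
def build_charset(texts):
    """Give each distinct character in `texts` an id 0..N-1 (sorted order)."""
    return {ch: i for i, ch in enumerate(sorted(set(''.join(texts))))}


def encode_utf8_string(text, charset, length, null_char_id):
    """Return (padded, unpadded) lists of character ids for `text`.

    `padded` is right-filled with `null_char_id` up to `length`. Unknown
    characters raise KeyError; text longer than `length` raises ValueError.
    """
    unpadded = [charset[ch] for ch in text]  # iterates over Unicode code points
    if len(unpadded) > length:
        raise ValueError(f'{len(unpadded)} characters do not fit in length={length}')
    padded = unpadded + [null_char_id] * (length - len(unpadded))
    return padded, unpadded


charset = build_charset(['abc', 'cab'])
null_char_id = len(charset)  # first id not used by a real character
char_ids_padded, char_ids_unpadded = encode_utf8_string(
    text='abc', charset=charset, length=5, null_char_id=null_char_id)
print(charset)            # {'a': 0, 'b': 1, 'c': 2}
print(char_ids_padded)    # [0, 1, 2, 3, 3]
print(char_ids_unpadded)  # [0, 1, 2]
```

Raising on unknown characters is deliberate: silently skipping them would produce labels that no longer match the text.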

If you use the functions _int64_feature and _bytes_feature defined in the gist, you can create an FSNS-style tf.Example proto with the following snippet:

import tensorflow as tf

# _int64_feature and _bytes_feature come from the gist mentioned above.
char_ids_padded, char_ids_unpadded = encode_utf8_string(
   text, charset, length, null_char_id)
example = tf.train.Example(features=tf.train.Features(
  feature={
    # BytesList fields need bytes in Python 3, hence b"PNG".
    'image/format': _bytes_feature(b"PNG"),
    'image/encoded': _bytes_feature(img.tostring()),
    'image/class': _int64_feature(char_ids_padded),
    'image/unpadded_class': _int64_feature(char_ids_unpadded),
    'height': _int64_feature(img.shape[0]),
    'width': _int64_feature(img.shape[1]),
    # Integer division: an Int64List cannot hold a float.
    'orig_width': _int64_feature(img.shape[1] // num_of_views),
    'image/text': _bytes_feature(text.encode('utf-8'))
  }
))
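The snippet assumes img already contains num_of_views views placed side by side. A minimal sketch of building such an array with NumPy, assuming every view has the same height, width and channel count; stack_views is a hypothetical helper, and the 150×150 views are just illustrative sizes.

```python
import numpy as np


def stack_views(views):
    """Concatenate equally sized HxWxC views along the width axis."""
    shapes = {v.shape for v in views}
    if len(shapes) != 1:
        raise ValueError(f'all views must have the same shape, got {shapes}')
    return np.concatenate(views, axis=1)


num_of_views = 4
# View i is filled with the value i so its position in the result is easy to check.
views = [np.full((150, 150, 3), i, dtype=np.uint8) for i in range(num_of_views)]
img = stack_views(views)
print(img.shape)                     # (150, 600, 3)
print(img.shape[1] // num_of_views)  # 150, the per-view 'orig_width'
```

Checking the shapes up front matters because orig_width is derived by dividing the total width by num_of_views, which is only meaningful when all views are equally wide.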



You shouldn't use the following line as-is, because img.tostring() stores the raw pixel buffer rather than an encoded PNG image:

'image/encoded': _bytes_feature(img.tostring()),

Instead, in my code I encode the image first:

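import cv2

# Encode as PNG so the bytes match 'image/format': "PNG" above.
_, pngVector = cv2.imencode('.png', img)
imgStr = pngVector.tobytes()
'image/encoded': _bytes_feature(imgStr),  # entry inside the feature dict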
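If OpenCV isn't available, TensorFlow can encode the array to PNG as well. A minimal sketch assuming TensorFlow 2.x eager mode and an HxWxC uint8 array in RGB order; encode_png_bytes is a hypothetical helper. Note that arrays loaded with cv2.imread are BGR, so convert them with cv2.cvtColor(img, cv2.COLOR_BGR2RGB) before using this.

```python
import numpy as np
import tensorflow as tf


def encode_png_bytes(img):
    """Encode an HxWxC uint8 array (assumed RGB) as PNG bytes using TensorFlow."""
    return tf.io.encode_png(tf.convert_to_tensor(img, dtype=tf.uint8)).numpy()


img = np.random.default_rng(0).integers(0, 256, size=(150, 600, 3), dtype=np.uint8)
png_bytes = encode_png_bytes(img)
decoded = tf.io.decode_png(png_bytes, channels=3).numpy()
print(png_bytes[:8] == b'\x89PNG\r\n\x1a\n')  # True: PNG file signature
print(np.array_equal(decoded, img))           # True: PNG is lossless
```

The round trip through decode_png is a cheap way to confirm that what goes into 'image/encoded' is a real PNG rather than a raw pixel dump.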



When I read the TFRecords back, the values of 'image/class' (written with _int64_feature(char_ids_padded)) are empty. Has anyone come across this issue?
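One way to narrow this down is to look at what was actually written, without going through any parsing spec. A minimal sketch, assuming TensorFlow 2.x eager mode; dump_class_ids is a hypothetical helper that deserializes each record with tf.train.Example.FromString and reads the int64 lists directly.

```python
import tensorflow as tf


def dump_class_ids(path, max_records=3):
    """Return the raw 'image/class' and 'image/unpadded_class' ids of the first records."""
    rows = []
    for record in tf.data.TFRecordDataset(path).take(max_records):
        example = tf.train.Example.FromString(record.numpy())
        feature = example.features.feature  # map from key to tf.train.Feature
        rows.append({
            'image/class': list(feature['image/class'].int64_list.value),
            'image/unpadded_class': list(feature['image/unpadded_class'].int64_list.value),
        })
    return rows
```

If the lists are already empty here, the problem is on the writing side (for example, the encoding function returned an empty list). If they are populated here but empty in your input pipeline, check the feature spec you parse them with.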
