LMDB files and how they are used with the Caffe deep learning framework

I am completely new to deep learning and am having some problems using the Caffe deep learning framework. I have not found any documentation that answers the questions and problems I am dealing with right now.

Please let me explain my situation first.

I have thousands of images and I have to run a series of preprocessing operations on them. For each preprocessing operation, I have to save the preprocessed images as 4D matrices and also save the vector of image labels. I want to store this information as LMDB files to be used as input to GoogLeNet in Caffe.

I tried to save my images as HDF5 files, but the final file size is 80GB, which cannot be processed with the memory I have.

So the other option is using LMDB files, right? I am new to this format and would appreciate your help in understanding how to create them in Matlab. My newbie questions:

1- These LMDB files have the extension .MDB, right? Is this the same extension that Microsoft Access uses, or is the correct extension .lmdb and the two formats are different?

2- I found this tool for generating .mdb files ( https://github.com/kyamagu/matlab-leveldb ). Does it create the file format that Caffe needs?

3- For Caffe, should I create one .mdb file for the labels and another for the images, or can both be fields of the same .mdb file?

4- When I create the .mdb file, I have to tag the database fields. Is it possible to mark one field as an image and another as a label? Does Caffe understand which field is which?

5- What do the calls database.put('key1', 'value1') and database.put('key2', 'value2') (from https://github.com/kyamagu/matlab-leveldb ) do? Should I store my 4D matrices in one field and the label vector in another?

2 answers


There is no connection between LMDB files and MS Access files; they only happen to share the .mdb extension.

As I see it, you have two options:

  • Use the "convert_imageset" tool, located in Caffe's tools folder, to convert a list of image files and labels to LMDB.
  • Instead of the "Data" layer, use the "ImageData" layer as the input to the network. This type of layer takes as its source a file listing image filenames and labels, so you don't need to create a database. (Another benefit for training: you can use the shuffle option and get slightly better training results.)
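As a rough sketch of the first option, the call to convert_imageset could be assembled from Python as below. The tool path, folder names, and the 256x256 resize are placeholders, not values from the question. The flags shown (--backend, --shuffle, --resize_height, --resize_width) are ones the tool accepts, but it is worth checking the tool's help output on your own build.

```python
import subprocess


def convert_imageset_cmd(image_root, list_file, db_path, resize=None,
                         shuffle=True,
                         tool="build/tools/convert_imageset"):
    """Build the argument list for Caffe's convert_imageset tool.

    `tool` is an assumed location; adjust it to wherever Caffe was built.
    """
    cmd = [tool, "--backend=lmdb"]
    if shuffle:
        cmd.append("--shuffle")
    if resize is not None:
        height, width = resize
        cmd += ["--resize_height={}".format(height),
                "--resize_width={}".format(width)]
    # Positional arguments: image root folder, list file, output database.
    # The root folder is prepended to each path in the list file, so it
    # should end with a path separator.
    cmd += [image_root, list_file, db_path]
    return cmd


cmd = convert_imageset_cmd("/path/to/images/", "train.txt", "train_lmdb",
                           resize=(256, 256))
print(" ".join(cmd))

# To actually run the conversion (requires a built Caffe):
# subprocess.check_call(cmd)
```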


To use an ImageData layer, simply change the layer type from Data to ImageData. Its source is the path to a text file in which each line contains the path to an image file and its label, separated by a space. For example:

/path/to/filename.png 23
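If the images are sorted into one folder per class, a list file in this format could be generated with a short script. This is only a sketch: the folder layout (root/<class name>/<image>) and the rule that labels follow the sorted class names are assumptions, and both function names are made up for illustration.

```python
import os


def samples_from_class_folders(root):
    """Return (image path, label) pairs for a root/<class>/<image> layout.

    Labels are integers assigned in sorted order of the class folder names.
    """
    classes = sorted(name for name in os.listdir(root)
                     if os.path.isdir(os.path.join(root, name)))
    samples = []
    for label, class_name in enumerate(classes):
        class_dir = os.path.join(root, class_name)
        for filename in sorted(os.listdir(class_dir)):
            samples.append((os.path.join(class_dir, filename), label))
    return samples


def write_image_list(samples, list_path):
    """Write one "<image path> <label>" line per sample."""
    with open(list_path, "w") as f:
        for image_path, label in samples:
            f.write("{} {}\n".format(image_path, int(label)))


# Example usage (paths are placeholders):
# write_image_list(samples_from_class_folders("/path/to/images"), "train.txt")
```

Because the path and label are separated by a space, image paths that themselves contain spaces are best avoided.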

      

If you want to do some preprocessing on the data without saving the preprocessed files to disk, you can use the transformations that Caffe provides (mirroring and cropping; see http://caffe.berkeleyvision.org/tutorial/data.html for details) or implement your own DataTransformer.
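To make the two built-in transformations concrete, here is a small pure-Python illustration of what mirroring and a center crop do to an image stored as a list of rows. It is not Caffe's code, only a sketch of the operations. In Caffe itself they are switched on through the layer's transform_param (mirror and crop_size), with mirroring applied at random and the crop position chosen at random during training and centered during testing.

```python
def mirror(image):
    """Flip an H x W image (a list of rows) left to right."""
    return [row[::-1] for row in image]


def center_crop(image, crop_size):
    """Cut a crop_size x crop_size window from the middle of the image."""
    height, width = len(image), len(image[0])
    top = (height - crop_size) // 2
    left = (width - crop_size) // 2
    return [row[left:left + crop_size]
            for row in image[top:top + crop_size]]


# A 4 x 4 test image whose pixel values are 0..15, row by row.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
cropped = center_crop(image, 2)
flipped = mirror(image)
```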



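Note that the wrapper you linked is for LevelDB, which is a different format from LMDB, the "Lightning" memory-mapped database from Symas. Caffe's Data layer can read either one through its backend setting, but the database you create has to match the backend you select.

You can try using a Matlab LMDB wrapper. I personally have no experience using LMDB with Matlab, but for Python there is a good library: py-lmdb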

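An LMDB database is a key/value store (similar to a HashMap in Java or a dict in Python); there are no tagged fields. To store 4D matrices, you need to follow the convention Caffe uses for storing images in LMDB: each value is a serialized Datum protocol buffer message, which holds the image bytes, its dimensions (channels, height, width) and an integer label. The image and its label therefore live in the same record.
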
This means that the best way to convert images to LMDB for Caffe is with Caffe.

Caffe ships with examples of how to convert images to LMDB. I would try to reproduce them first and then modify the scripts to use your own images.
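For reference, writing records by hand from Python might look roughly like the sketch below. It assumes py-lmdb and Caffe's Python bindings (caffe.proto.caffe_pb2) are installed and that each image is a uint8 NumPy array shaped (channels, height, width). The function names, the key format and the map_size value are illustrative choices, not something Caffe requires.

```python
def make_key(index):
    """Zero-padded keys, so LMDB's sorted key order matches insertion order."""
    return "{:08d}".format(index).encode("ascii")


def write_lmdb(db_path, images, labels, map_size=1 << 40):
    """Store (image, label) pairs as serialized Datum messages.

    images: iterable of uint8 NumPy arrays shaped (channels, height, width)
    labels: iterable of integer class labels
    map_size: upper bound on the database size in bytes (assumed value)
    """
    # Imported here so make_key can be used without these packages installed.
    import lmdb
    from caffe.proto import caffe_pb2

    env = lmdb.open(db_path, map_size=map_size)
    with env.begin(write=True) as txn:
        for index, (image, label) in enumerate(zip(images, labels)):
            datum = caffe_pb2.Datum()
            datum.channels, datum.height, datum.width = image.shape
            datum.data = image.tobytes()   # raw pixel bytes
            datum.label = int(label)       # label stored in the same record
            txn.put(make_key(index), datum.SerializeToString())
    env.close()


# Example usage (requires lmdb, caffe and NumPy; names are placeholders):
# write_lmdb("train_lmdb", list_of_arrays, list_of_labels)
```

For a data set in the tens of gigabytes it would probably be better to commit in batches rather than in one transaction; the single transaction here only keeps the sketch short.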
