How do I write and read binary files representing objects?

I am new to Java programming and I ran into this problem:

I am creating a program that reads a CSV file, converts its strings to objects, and then manipulates those objects. More specifically, the application reads each row and assigns it an index, and it also reads specific values from those rows and stores them in tries. The application can then look up row indices by the values stored in the tries and retrieve the complete information for the corresponding row.

My problem is that, although I've been researching for the past couple of days, I don't know how to write these structures to binary files or how to read them back. I want to write the rows (with their indices) to a binary indexed file and then read only the row at the exact index I got from the trie.

For writing a tree, I was looking for something like this (in C):

fwrite(tree, sizeof(struct TrieTree), 1, file);

For the "binary indexed file", I was thinking about writing the rows as objects, the same way as the trie, and then reading objects one by one until I reach the corresponding index, but this is probably not very efficient.
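For comparison, Java has no direct equivalent of fwrite on a struct, but when every record has the same size you can skip the one-by-one scan entirely: record i starts at byte i * RECORD_SIZE, so RandomAccessFile.seek can jump straight to it. The sketch below assumes a hypothetical record layout (an int id plus a name of at most 32 UTF-8 bytes); FixedRecordFile and its methods are illustrative names, not an existing API.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical fixed-size record layout: a 4-byte int id followed by a name
// stored in exactly NAME_BYTES bytes (UTF-8, zero-padded).
// Because every record has the same size, record i starts at i * RECORD_SIZE.
class FixedRecordFile implements AutoCloseable {
    static final int NAME_BYTES = 32;
    static final int RECORD_SIZE = Integer.BYTES + NAME_BYTES;

    private final RandomAccessFile file;

    FixedRecordFile(String path) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
    }

    // Writes (id, name) into slot `index`; names longer than NAME_BYTES are rejected.
    void write(long index, int id, String name) throws IOException {
        byte[] encoded = name.getBytes(StandardCharsets.UTF_8);
        if (encoded.length > NAME_BYTES) {
            throw new IllegalArgumentException("name too long: " + name);
        }
        file.seek(index * RECORD_SIZE);
        file.writeInt(id);
        file.write(Arrays.copyOf(encoded, NAME_BYTES)); // pads with zero bytes
    }

    // Seeks directly to slot `index`; no other records are read.
    int readId(long index) throws IOException {
        file.seek(index * RECORD_SIZE);
        return file.readInt();
    }

    String readName(long index) throws IOException {
        file.seek(index * RECORD_SIZE + Integer.BYTES);
        byte[] raw = new byte[NAME_BYTES];
        file.readFully(raw);
        int end = 0;
        while (end < NAME_BYTES && raw[end] != 0) end++; // strip zero padding
        return new String(raw, 0, end, StandardCharsets.UTF_8);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

This only works when every record fits in a fixed width; rows of varying length (like arbitrary CSV lines) need a separate table of byte offsets instead.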

To recap: I need help writing and reading objects in binary files, and suggestions on how to create an indexed file.

2 answers


For starters, I think you are best off doing this with serialization.

Here's just one example from Stack Overflow: What is object serialization?



(I don't think copying and pasting the code here makes sense; please follow the link to read it.)
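As a rough illustration of serialization applied to the question's trie, here is a minimal sketch assuming a hypothetical Trie class whose nodes map a key to the row index of a CSV line. Implementing Serializable lets ObjectOutputStream write the whole object graph (every node, following references) in one writeObject call, and ObjectInputStream rebuild it with readObject. The class layout and method names are assumptions, not the linked answer's code.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical trie mapping a key (e.g. a CSV field value) to a row index.
class Trie implements Serializable {
    private static final long serialVersionUID = 1L;

    private static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final Map<Character, Node> children = new HashMap<>();
        Integer rowIndex; // null if no key ends at this node
    }

    private final Node root = new Node();

    void put(String key, int rowIndex) {
        Node node = root;
        for (char c : key.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.rowIndex = rowIndex;
    }

    Integer get(String key) {
        Node node = root;
        for (char c : key.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return null;
        }
        return node.rowIndex;
    }

    // Writes the trie and all of its nodes to `file`.
    void save(File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(this);
        }
    }

    // Reads back a trie previously written by save().
    static Trie load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return (Trie) in.readObject();
        }
    }
}
```

Note that Java serialization reads the whole trie back into memory at once; it does not let you load a single entry, which is why the row data itself still calls for an indexed file.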

Admittedly, this does not yet solve the problem of creating your index.
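One common way to build such an index is to record the byte offset at which each row starts. A minimal sketch, assuming rows are plain strings (for example the question's CSV lines) under 64 KB each, because RandomAccessFile.writeUTF uses a 2-byte length prefix; IndexedRowFile and its methods are hypothetical names. The offsets are saved to a second file, so a row index found in the trie maps straight to a seek position.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.List;

// Hypothetical indexed file: rows are stored back to back in a data file,
// and a separate index file records the byte offset where each row starts.
class IndexedRowFile {

    // Writes every row and returns its starting offset in the data file.
    static long[] writeRows(File dataFile, List<String> rows) throws IOException {
        long[] offsets = new long[rows.size()];
        try (RandomAccessFile raf = new RandomAccessFile(dataFile, "rw")) {
            raf.setLength(0); // discard any previous contents
            for (int i = 0; i < rows.size(); i++) {
                offsets[i] = raf.getFilePointer();
                raf.writeUTF(rows.get(i)); // 2-byte length + modified UTF-8 bytes
            }
        }
        return offsets;
    }

    // Saves the offsets: a count followed by one 8-byte long per row.
    static void writeIndex(File indexFile, long[] offsets) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(indexFile)))) {
            out.writeInt(offsets.length);
            for (long offset : offsets) {
                out.writeLong(offset);
            }
        }
    }

    static long[] readIndex(File indexFile) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(indexFile)))) {
            long[] offsets = new long[in.readInt()];
            for (int i = 0; i < offsets.length; i++) {
                offsets[i] = in.readLong();
            }
            return offsets;
        }
    }

    // Seeks straight to row `index`; earlier rows are never read.
    static String readRow(File dataFile, long[] offsets, int index) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(dataFile, "r")) {
            raf.seek(offsets[index]);
            return raf.readUTF();
        }
    }
}
```

A lookup then costs one seek and one read, instead of deserializing every earlier record.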

Here is an alternative to native Java serialization: Google Protocol Buffers.

Most of this answer quotes the documentation directly, so be sure to follow the link at the end if you're interested in more details.

What it is:

Protocol Buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data: think XML, but smaller, faster, and simpler.

In other words, you can serialize your structures in Java and deserialize them in .NET, Python, etc. Java serialization does not offer this.

Performance:

This can vary depending on the use case, but in principle GPB should be faster, since it was built with performance and interoperability in mind. Here is a link comparing native Java serialization with GPB:

High performance serialization: Java vs Google Protocol Buffers vs ...?



How it works:

You specify how you want the information you're serializing to be structured by defining protocol buffer message types in .proto files. Each protocol buffer message is a small logical record of information containing a series of name-value pairs. Here is a very simple example of a .proto file that defines a message containing information about a person:

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

Once you've defined your messages, you run the protocol buffer compiler for your application's language on your .proto file to generate data access classes. These provide simple accessors for each field (like name() and set_name() in the generated C++ code) as well as methods to serialize/parse the whole structure to/from raw bytes.
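To make "raw bytes" more concrete, here is a hand-written sketch of how the protobuf wire format encodes two of the Person fields. It is not the protobuf library and is not needed when you use generated classes; WireFormatSketch and its methods are hypothetical names. Each field starts with a tag, (field number << 3) | wire type, written as a base-128 varint, followed by either a varint value (wire type 0, used for int32) or a length-prefixed byte string (wire type 2, used for string).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hand-written illustration of the protobuf wire format (not the library).
class WireFormatSketch {
    static final int WIRETYPE_VARINT = 0;
    static final int WIRETYPE_LENGTH_DELIMITED = 2;

    // Base-128 varint: 7 bits per byte, least significant group first,
    // high bit set on every byte except the last.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Every field is prefixed by a tag: (field number << 3) | wire type.
    static void writeTag(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, ((long) fieldNumber << 3) | wireType);
    }

    static void writeInt32Field(ByteArrayOutputStream out, int fieldNumber, int value) {
        writeTag(out, fieldNumber, WIRETYPE_VARINT);
        writeVarint(out, value); // a negative int32 sign-extends to 10 bytes
    }

    static void writeStringField(ByteArrayOutputStream out, int fieldNumber, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        writeTag(out, fieldNumber, WIRETYPE_LENGTH_DELIMITED);
        writeVarint(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }
}
```

For name = "John Doe" (field 1) followed by id = 1234 (field 2), this produces 0A 08, the 8 bytes of "John Doe", then 10 D2 09: 13 bytes in total. Only field numbers are stored, never field names, which is one reason the encoding is compact.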

You can then use these classes in your application to populate, serialize, and retrieve Person protocol buffer messages. For example, in Java you can write code like this:

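import java.io.FileOutputStream;

Person john = Person.newBuilder()
    .setId(1234)
    .setName("John Doe")
    .setEmail("jdoe@example.com")
    .build();
// Serialize the message to the file named by the first command-line argument
try (FileOutputStream output = new FileOutputStream(args[0])) {
    john.writeTo(output);
}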

Read all about it here: https://developers.google.com/protocol-buffers/

You can think of a .proto file as playing a role similar to an XSD describing an XML structure, except that the serialized data is more compact and faster to produce.
