Better data structure / db / binary format for storing hierarchical items

Question

Better data structure / db / binary format for storing hierarchical items

I have the following hierarchical objects:

Country, City, Street.
Every country has cities, every city has streets.

I wrote pseudocode for what I want to run:

handle_country_before(country)
for city in country:
    handle_city_before(city)
    for street in city:
        handle_street_before(street)
        handle_street_after(street)
    handle_city_after(city)
handle_country_after(country)

I've tried the following approaches:

Approach document

(noSql):

I have saved all my data tightly:

{
    country : { 
        # Country "x1" info corresponding with the street
    },
    city : {
        # City "y1" info corresponding with the street
    },
    street : {
        # street info...
    }
}

{
    country : { 
        # Country "x1" info corresponding with the street
    },
    city : {
        # City "y1" info corresponding with the street
    },
    street : {
        # street info...
    }
}

{
    country : { 
        # Country "x1" info corresponding with the street
    },
    city : {
        # City "y1" info corresponding with the street
    },
    street : {
        # street info...
    }
}

{
    country : { 
        # Country "x1" info corresponding with the street
    },
    city : {
        # City "y2" info corresponding with the street
    },
    street : {
        # street info...
    }
}

{
    country : { 
        # Country "x1" info corresponding with the street
    },
    city : {
        # City "y2" info corresponding with the street
    },
    street : {
        # street info...
    }
}

Using this method, I had to use the following pseudocode:

    last_country = 0
    last_city = 0
    last_street = 0
    for element in elements:
        if element.country.id != last_country_id:
            if (0 != last_country) :
                handle_country_after(last_country)
            handle_country_before(element.country)
        if element.city.id != city:
            if (0 != last_city) :
                handle_country_after(last_city)
            handle_country_before(element.city)     
        if element.street.id != street:
            if (0 != last_street) :
                handle_country_after(last_street)
            handle_country_before(element.street)

Disadvantage: I feel like this approach is a little overwhelmed, and using a flat structure is not suitable for my case, and it was also very slow and inefficient.

SQL approach:

I saved each entity in a table: Country, City, Street and repeated it with the following code:

    country_cursor = query('select * from countries')
    for country in country_cursor:
        handle_country_before(country)
        city_cursor = query('select * from cities where parent_country_ref=%s' % (country.id))
        for city in city_cursor:
            street_cursor = query('select * from streets where parent_city_ref=%s' % (city.id))
            ...
            ...
        ...
        handle_country_after(country)

in the beginning, it looked like the best approach. But since I added more metadata tables and had to use JOIN statements, it got slower and I tried to use materialized view to speed things up a bit, but got the same result as when using docs.

Custom format method:

I tried to store the information in my own binary serialization format:

<number of countries>[1st-country-data]<number of citieis>[1nd-city-data]<number of streets>[1st-street-data][2nd-street-data][3rd-street...]...

Disadvantage: it couldn't scale, I couldn't update information, I couldn't get a specific city / street, each search was O (n).

I'm looking for a serialization / DB format that would look like this:

Ability to add / update fields for existing elements
Efficiency in speed, space and memory
C compatible (no CPP)

+3

c sql database design-patterns data-structures

Aviv May 22 '17 at 16:34

source to share

1 answer

Ilya Bursov · Answer 1 · 2017-05-22T16:44:46+0000

The fastest approach in your case is to use unnormalized data, create a flat file / table with the following information:

country    city    street    current mayor    additional information

of course the data must be sorted, of course you won't be using heavy parsing formats like json / xml, only plain text

then you will be able to use a single loop to iterate over that array

to speed up the iteration a bit, you can try to split one file into multiple fixed-width lines.

Better data structure / db / binary format for storing hierarchical items

I've tried the following approaches:

More articles: