Better data structure / db / binary format for storing hierarchical items

I have the following hierarchical objects:

Country, City, Street.
Every country has cities, every city has streets.

      

I wrote pseudocode for what I want to run:

handle_country_before(country)
for city in country:
    handle_city_before(city)
    for street in city:
        handle_street_before(street)
        handle_street_after(street)
    handle_city_after(city)
handle_country_after(country)

      

I've tried the following approaches:

Approach document

(noSql):

I have saved all my data tightly:

{
    country : { 
        # Country "x1" info corresponding with the street
    },
    city : {
        # City "y1" info corresponding with the street
    },
    street : {
        # street info...
    }
}

{
    country : { 
        # Country "x1" info corresponding with the street
    },
    city : {
        # City "y1" info corresponding with the street
    },
    street : {
        # street info...
    }
}

{
    country : { 
        # Country "x1" info corresponding with the street
    },
    city : {
        # City "y1" info corresponding with the street
    },
    street : {
        # street info...
    }
}

{
    country : { 
        # Country "x1" info corresponding with the street
    },
    city : {
        # City "y2" info corresponding with the street
    },
    street : {
        # street info...
    }
}

{
    country : { 
        # Country "x1" info corresponding with the street
    },
    city : {
        # City "y2" info corresponding with the street
    },
    street : {
        # street info...
    }
}

      

Using this method, I had to use the following pseudocode:

    last_country = 0
    last_city = 0
    last_street = 0
    for element in elements:
        if element.country.id != last_country_id:
            if (0 != last_country) :
                handle_country_after(last_country)
            handle_country_before(element.country)
        if element.city.id != city:
            if (0 != last_city) :
                handle_country_after(last_city)
            handle_country_before(element.city)     
        if element.street.id != street:
            if (0 != last_street) :
                handle_country_after(last_street)
            handle_country_before(element.street)           

      

Disadvantage: I feel like this approach is a little overwhelmed, and using a flat structure is not suitable for my case, and it was also very slow and inefficient.

SQL approach:

I saved each entity in a table: Country, City, Street and repeated it with the following code:

    country_cursor = query('select * from countries')
    for country in country_cursor:
        handle_country_before(country)
        city_cursor = query('select * from cities where parent_country_ref=%s' % (country.id))
        for city in city_cursor:
            street_cursor = query('select * from streets where parent_city_ref=%s' % (city.id))
            ...
            ...
        ...
        handle_country_after(country)

      

in the beginning, it looked like the best approach. But since I added more metadata tables and had to use JOIN statements, it got slower and I tried to use materialized view to speed things up a bit, but got the same result as when using docs.

Custom format method:

I tried to store the information in my own binary serialization format:

<number of countries>[1st-country-data]<number of citieis>[1nd-city-data]<number of streets>[1st-street-data][2nd-street-data][3rd-street...]...

Disadvantage: it couldn't scale, I couldn't update information, I couldn't get a specific city / street, each search was O (n).

I'm looking for a serialization / DB format that would look like this:

  • Ability to add / update fields for existing elements

  • Efficiency in speed, space and memory

  • C compatible (no CPP)

+3


source to share


1 answer


The fastest approach in your case is to use unnormalized data, create a flat file / table with the following information:

country    city    street    current mayor    additional information

      

of course the data must be sorted, of course you won't be using heavy parsing formats like json / xml, only plain text



then you will be able to use a single loop to iterate over that array

to speed up the iteration a bit, you can try to split one file into multiple fixed-width lines.

0


source







All Articles