Osmosis - removing businesses from OSM data for geocoding use

I am trying to set up a Nominatim database to geocode addresses. The database will be used by komoot Photon, but I don't think that matters much here.

The problem is that the OSM XML / PBF files I have contain not only addresses but also a whole bunch of other things, such as bars and various offices, which I am trying to remove.

My plan is to start with something like this and refine it until I get the results I want:

osmosis --read-xml us-northeast-latest.osm.bz2 \
    --tf reject-nodes 'landuse=*' \
    --tf reject-nodes 'amenity=*' \
    --tf reject-nodes 'office=*' \
    --tf reject-nodes 'shop=*' \
    --tf reject-nodes place=house \
    --write-xml output.osm

      

However, after importing the resulting file, I still get these nodes (which should have been excluded) in the search results:

{
    properties: {
        osm_key: "office",
        osm_value: "ngo",
        extent: [
            -73.9494926,
            40.6998938,
            -73.9482012,
            40.6994192
        ],
        street: "Flushing Avenue",
        name: "Public Lab NYC",
        state: "New York",
        osm_id: 250328718,
        osm_type: "W",
        housenumber: "630",
        postcode: "11206",
        city: "New York City",
        country: "United States of America"
    },
    type: "Feature",
    geometry: {
        type: "Point",
        coordinates: [
            -73.9490215989286,
            40.699639649999995
        ]
    }
}

      

Note the osm_key and osm_value fields.

I'm not sure what I am doing wrong here. Any help would be appreciated.

1 answer


I suspect you are not yet familiar enough with OSM elements and tags.

Removing nodes (or ways or relations) that carry certain tags is definitely not what you want. Instead, you want to either discard certain tags, or keep only certain tags and drop the rest, rather than discarding complete objects.

To understand the difference, you need to know that addresses in OSM are modeled in two different ways: either as a separate address node, or attached to an already existing feature such as a building, shop or restaurant. The second case is the important one here, because your approach will drop all of those addresses.
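
To check how many addresses sit on such features in a given extract, a small diagnostic can list every element that carries both an addr:* tag and one of the tags rejected in the question. This is a sketch, not an osmosis feature: addresses_on_features is a hypothetical shell function built on awk, and it assumes line-oriented OSM XML (one element or tag per line, double-quoted attributes). The Public Lab NYC result in the question appears to be exactly this case: its osm_type is "W", so it is a way, apparently carrying office=ngo together with address tags, and a reject-nodes filter does not touch ways at all.

```shell
# addresses_on_features (hypothetical helper): reads OSM XML on stdin and prints
# "type id" for each node, way or relation that has at least one addr:* tag AND
# one of the tags rejected in the question (landuse, amenity, office, shop, place=house).
# Assumes line-oriented XML with double-quoted attributes; not a general XML parser.
addresses_on_features() {
  awk '
    /^[ \t]*<(node|way|relation)[ \t]/ {
      # A self-closing element has no tags, so it cannot carry an address.
      if ($0 ~ /\/>[ \t]*$/) next
      match($0, /<(node|way|relation)/)
      type = substr($0, RSTART + 1, RLENGTH - 1)
      match($0, /[ \t]id="[^"]*"/)
      id = substr($0, RSTART + 5, RLENGTH - 6)
      has_addr = 0; has_feature = 0
      next
    }
    /<tag[ \t]+k="addr:/ { has_addr = 1; next }
    /<tag[ \t]+k="(landuse|amenity|office|shop)"/ { has_feature = 1; next }
    /<tag[ \t]+k="place"[ \t]+v="house"/ { has_feature = 1; next }
    # At the end of the element, report it if it mixes an address with a rejected tag.
    /^[ \t]*<\/(node|way|relation)>/ {
      if (has_addr && has_feature) print type, id
    }
  '
}
```

For example, bzcat us-northeast-latest.osm.bz2 | addresses_on_features | wc -l would give a rough count of the addresses that reject filters on those tags would throw away.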



Therefore, you want to keep elements even if they are "just" a shop or restaurant, because they can still carry an address. You can, however, drop all non-address tags from those elements and remove every element that has no address tags at all. This should be possible with osmosis, but I am not familiar enough with osmosis to give you the exact parameters you need.
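
The idea above (keep elements, strip their non-address tags, drop elements left without an address) can be sketched as a line-oriented awk filter over OSM XML. This is not an osmosis command; strip_to_addresses is a hypothetical helper, and it assumes XML written one element or tag per line, as osmosis and osmium typically do. Nodes are only stripped, never dropped, so surviving ways keep their geometry; ways and relations without any addr:* tag are removed, which also breaks any relation that referenced them. Note that it follows the idea literally: Nominatim also relies on place, boundary and named street objects to fill in the street, city and state of an address, so a real filter would likely have to keep those as well.

```shell
# strip_to_addresses (hypothetical helper): filters OSM XML from stdin to stdout.
# Assumes line-oriented XML: each element start/end and each <tag>/<nd>/<member>
# on its own line, with double-quoted attributes. Not a general XML parser.
strip_to_addresses() {
  awk '
    # Start of a way or relation that has children: begin buffering it.
    /^[ \t]*<(way|relation)[ \t]/ && !/\/>[ \t]*$/ {
      buf = $0; keep = 0; inside = 1; next
    }
    # Inside a buffered element: keep addr:* tags, silently drop all other tags.
    inside && /^[ \t]*<tag[ \t]/ {
      if ($0 ~ /k="addr:/) { buf = buf "\n" $0; keep = 1 }
      next
    }
    # End of the buffered element: emit it only if an address tag survived.
    inside && /^[ \t]*<\/(way|relation)>/ {
      if (keep) print buf "\n" $0
      inside = 0; next
    }
    # Other children (<nd>, <member>) are buffered unchanged.
    inside { buf = buf "\n" $0; next }
    # Outside ways and relations (nodes, headers): drop non-address tags only,
    # so every node survives and way geometry stays intact.
    /^[ \t]*<tag[ \t]/ && !/k="addr:/ { next }
    { print }
  '
}
```

Usage would look like bzcat us-northeast-latest.osm.bz2 | strip_to_addresses > addresses.osm, assuming the extract is line-oriented XML as described above.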

But I'm not sure this is really a good idea, because more than one object can have the same name. Imagine a river, a mountain peak, a small village and a large village that all share the same name. If you drop all the extra tags needed to tell a river from a peak, or a small village from a large one, you will have trouble deciding which of the search results to pick.
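
One way to address this concern is to filter with an allowlist that keeps, besides addr:*, the name and the few tags that say what kind of feature something is. The sketch below works on a simplified, hypothetical representation (one key=value per line for a single element), and the allowlist of name, place, natural and waterway is an illustrative choice based on the examples above, not a vetted Nominatim or Photon configuration. The name Springfield is made-up example data.

```shell
# keep_geocoding_tags (hypothetical helper): reads one element's tags on stdin,
# one key=value per line, and keeps addr:* plus a small illustrative allowlist
# (name, place, natural, waterway) so that same-named features stay distinguishable.
keep_geocoding_tags() {
  grep -E '^(addr:[^=]+|name|place|natural|waterway)='
}

# Example with hypothetical data: a river named Springfield.
printf 'name=Springfield\nwaterway=river\nwidth=20\n' | keep_geocoding_tags
# name=Springfield
# waterway=river
```

Applied to a peak tagged name=Springfield, natural=peak and ele=1200, the same filter keeps name=Springfield and natural=peak, so the two results can still be told apart.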
