Geospatial search in Cassandra

I want to use Cassandra for proximity search: given my lon / lat coordinates, I want to get the closest points. I don't need 100% precision, so I'm comfortable using a bounding box instead of a circle (better performance), but I can't find specific instructions (ideally an example) on how to implement a bounding-box search.



2 answers


In my experience, there is no easy way to get a general-purpose geospatial index on top of Cassandra. I believe you have only two options:



  • Geohash-style gridding: split your dataset into square / rectangular cells, for example by using the whole-number parts of lat / lon as grid indices. At query time, you load all the elements in the enclosing grid cell and do a full scan for neighbors inside your application.

    • Works well if you have an evenly distributed dataset, like the grid points in NWP (numerical weather prediction) that I had.
    • Performs very poorly on datasets such as "restaurants in the US", where most points are clustered around major cities. You will have unbalanced, high load on a few grid cells, for example in the New York area, and completely empty index buckets somewhere in the Atlantic Ocean.
  • External indexes like Elasticsearch / Solr / Sphinx / etc.

    • They all have geospatial indexing support out of the box, so there is no need to develop your own in your application layer.
    • You need to set up a separate indexing service and keep Cassandra and the index in sync. There are some Cassandra / search integrations out there, like DSE (commercial) or stargate-core (I've never heard of anyone using it in production), or you can roll your own, but it all takes time and effort.
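
A minimal sketch of the first option, with a plain Python dict standing in for Cassandra. It assumes whole-degree cells whose key would become the partition key in a real table. The function names are hypothetical, and scanning the eight surrounding cells is an addition of mine so that points just across a cell border are not missed. It ignores wrap-around at the antimeridian and at the poles.

```python
import math
from collections import defaultdict


def cell_key(lat, lon):
    """Whole-degree grid cell that contains the point."""
    return (math.floor(lat), math.floor(lon))


def build_index(points):
    """Group (name, lat, lon) tuples by grid cell (stand-in for partitions)."""
    index = defaultdict(list)
    for name, lat, lon in points:
        index[cell_key(lat, lon)].append((name, lat, lon))
    return index


def candidates(index, lat, lon):
    """All points in the enclosing cell and its eight neighbouring cells."""
    clat, clon = cell_key(lat, lon)
    found = []
    for dlat in (-1, 0, 1):
        for dlon in (-1, 0, 1):
            found.extend(index.get((clat + dlat, clon + dlon), []))
    return found


def closest(index, lat, lon, limit=3):
    """Full scan of the candidates in the application, nearest first."""
    def squared_distance(point):
        _, plat, plon = point
        # Shrink longitude differences by cos(latitude): rough, not exact.
        dx = (plon - lon) * math.cos(math.radians(lat))
        dy = plat - lat
        return dx * dx + dy * dy

    return sorted(candidates(index, lat, lon), key=squared_distance)[:limit]
```

With Cassandra, each `index.get(...)` would be one partition read, so a lookup costs at most nine small queries; the skew problem described above shows up as a few very large partitions.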


This topic was covered at Cassandra Summit Europe 2014:

RedHat: Scalable Geospatial Indexing with Cassandra



The presenter explains how he created a spatial index using user-defined types, which is well suited to querying geospatial data with region or bounding-box searches.

The general idea is to split your data into regions that are defined by bounding boxes. Each region is then represented by a row, which you can use to access any data associated with that region. If you have an area of interest, you query the keyspace for the regions that fall within that area.
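
A rough sketch of the lookup side of that idea, not the presenter's actual schema: it assumes fixed-size square regions keyed by (row, column), and all names are hypothetical. It turns a point and radius into an approximate bounding box, lists the region keys that box overlaps, and filters the rows read from those regions. The box calculation uses about 111.32 km per degree of latitude and breaks down near the poles.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude


def bounding_box(lat, lon, radius_km):
    """Approximate (min_lat, min_lon, max_lat, max_lon) around a point."""
    dlat = radius_km / KM_PER_DEGREE
    # A degree of longitude shrinks with cos(latitude).
    dlon = radius_km / (KM_PER_DEGREE * math.cos(math.radians(lat)))
    return (lat - dlat, lon - dlon, lat + dlat, lon + dlon)


def region_keys(box, size_deg=1.0):
    """Keys of every fixed-size region that overlaps the box."""
    min_lat, min_lon, max_lat, max_lon = box
    rows = range(math.floor(min_lat / size_deg),
                 math.floor(max_lat / size_deg) + 1)
    cols = range(math.floor(min_lon / size_deg),
                 math.floor(max_lon / size_deg) + 1)
    return [(r, c) for r in rows for c in cols]


def in_box(box, lat, lon):
    """True if the point lies inside the box (edges included)."""
    min_lat, min_lon, max_lat, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
```

Each key from `region_keys` would map to one row lookup, and `in_box` discards the points in those regions that fall outside the area of interest.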







