SolrCloud Indexing / Query without Smart-Client

I'm having trouble understanding how indexing and querying would work if I don't have a smart client available. I am using SolrNet with C#, which does not currently integrate with ZooKeeper.

As a basic example, let's say I have one collection split into two shards, replicated to two separate nodes / servers, and I have a standard HTTP load balancer in front of the servers (the scenario mentioned here). If I'm using the standard compositeId router, I believe indexing will work seamlessly and documents will be distributed to both nodes behind the scenes. I don't need to worry about which node receives the "update" command: ZooKeeper handles the routing and document replication automatically.

However, in this same scenario, will ZooKeeper properly handle routing of query requests behind the scenes? Given that I am using automatic sharding and not custom sharding, will a load-balanced query request be directed to the correct shard, or will I have to include all known shards in the shards parameter (see here) to make sure I don't miss anything? Obviously, this would be cumbersome to maintain as the number of shards grows.

It seems that custom sharding would provide the most efficiency for indexing and querying, although you then risk unequal shard sizes. Any thoughts on these matters would be appreciated.

+3




1 answer


Let's take an example of a collection with two shards, with each shard on a separate node / server.

10.x.x.100:8983/solr/ -> shard 1 / node 1

10.x.x.101:8983/solr/ -> shard 2 / node 2

Using the standard routing, suppose you index 100 documents. They are split across these two servers, so each server now holds about 50 documents.
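To illustrate why the split comes out roughly even, here is a minimal sketch of hash-range routing. It is only an illustration: `zlib.crc32` is a stand-in hash chosen because it ships with Python (Solr's compositeId router uses MurmurHash3), and the `shard_for` helper and the `doc-N` ids are hypothetical.

```python
import zlib
from collections import Counter

HASH_SPACE = 2 ** 32  # zlib.crc32 returns an unsigned 32-bit value


def shard_for(doc_id, num_shards=2):
    """Map a document id to a shard by splitting the hash space into equal ranges."""
    # Stand-in hash for illustration only; this is NOT the hash Solr uses.
    h = zlib.crc32(doc_id.encode("utf-8"))
    range_size = HASH_SPACE // num_shards
    # Clamp so the top of the hash space always falls into the last shard.
    index = min(h // range_size, num_shards - 1)
    return "shard" + str(index + 1)


# Route 100 hypothetical document ids and count how many land on each shard.
counts = Counter(shard_for("doc-" + str(i)) for i in range(100))
print(dict(counts))
```

Because each shard owns an equal slice of the hash space, a reasonably uniform hash tends to spread documents close to evenly, though not exactly 50/50.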

If you query either of the two servers for documents, Solr will perform a distributed search by default. You don't need to specify anything in the shards parameter.



So a query to

10.x.x.100:8983/solr/collection/select?q=solr rocks

will also run the same query on 10.x.x.101:8983/solr/, and the results returned will be a combination of the results from both shards, sorted and ranked by score.
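From the client side, this means the request only needs to reach one node (directly or through the load balancer). A minimal sketch of building that request URL, assuming the placeholder host and collection name from the example above; `build_select_url` is a hypothetical helper, not part of any Solr client:

```python
from urllib.parse import urlencode


def build_select_url(host, collection, query, port=8983):
    """Build a /select URL for a single node; no shards parameter is included."""
    # urlencode escapes the query, e.g. the space in "solr rocks" becomes '+'.
    params = urlencode({"q": query})
    return f"http://{host}:{port}/solr/{collection}/select?{params}"


url = build_select_url("10.x.x.100", "collection", "solr rocks")
print(url)  # http://10.x.x.100:8983/solr/collection/select?q=solr+rocks
```

Pointing the same URL at the load balancer's address instead of a specific node should behave the same way, since whichever node receives the request fans it out to the other shard.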

The &shards parameter comes into the picture when you know which "group" of data lives in which shard. For example, building on the example above, suppose you have custom routing enabled and you are using the city field to route documents, and that there are only two values for the city field. Your documents will be routed to one of the shards based on this field.

On the application side, if you want to query specifically for documents belonging to one city, you can specify the &shards parameter, and all query results will be fetched from that shard only.
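As a rough sketch of that application-side logic: the city values, the city-to-shard mapping, and the `build_city_query_url` helper below are all hypothetical, and the sketch assumes the logical shard names are `shard1` and `shard2`.

```python
from urllib.parse import urlencode

# Hypothetical mapping: which logical shard holds which city's documents.
CITY_TO_SHARD = {"london": "shard1", "paris": "shard2"}


def build_city_query_url(host, collection, city, port=8983):
    """Build a /select URL restricted to the shard that holds the given city."""
    params = urlencode({
        "q": f"city:{city}",             # match documents for this city
        "shards": CITY_TO_SHARD[city],   # search only the shard that holds them
    })
    return f"http://{host}:{port}/solr/{collection}/select?{params}"


print(build_city_query_url("10.x.x.100", "collection", "london"))
```

The request can still be sent to any node; the shards value, not the receiving host, decides where the search runs. The trade-off is that the application has to keep this mapping in sync with how documents were routed at index time.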

+3








