How does Erlang access huge shared data structures like the BTree in CouchDB?

CouchDB has a huge BTree data structure and multiple processes (one for each request).

Erlang processes cannot share state, so it seems there should be a dedicated process responsible for accessing the BTree and communicating with other processes via messages. But that would be inefficient, because only one process could access the data.

So how are such cases handled in Erlang, and how is it handled in this particular case with CouchDB?


1 answer


Good question. If you want an authoritative answer, the best place to ask about couchdb internals is the couchdb mailing list. They are very fast, and one of the main developers can probably give you a better answer. I will try to answer this as best I can, but keep in mind that I could be wrong :)

The first hint is provided by the couchdb config file. Start couchdb in shell mode

couchdb -i

and point the browser to

http://localhost:5984/_utils/config.html

You will see that in the daemons section there is a key/value pair

index_server {couch_index_server, start_link, []}

Aha! So the index is served by a server. What kind of server? We will need to dive into the code:

It is a gen_server. All operations on a couchdb view are handled by this gen_server. gen_server is the standard Erlang implementation of the client-server model. A gen_server handles the messages sent to it one at a time, so your observation is correct. Each view request runs in its own process, and those processes are coordinated by the gen_server.
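To make the client-server model concrete, here is a minimal gen_server sketch. It is not taken from CouchDB: the module name demo_server and the functions store/2 and fetch/1 are made up for illustration. The state lives in one process, and every caller goes through gen_server:call/2.

```erlang
-module(demo_server).
-behaviour(gen_server).

%% Client API (hypothetical names, for illustration only)
-export([start_link/0, store/2, fetch/1]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Both functions run in the caller's process and block until
%% the server process replies.
store(Key, Value) -> gen_server:call(?MODULE, {store, Key, Value}).
fetch(Key) -> gen_server:call(?MODULE, {fetch, Key}).

%% The state is a map that is private to the server process.
init([]) -> {ok, #{}}.

%% Calls are taken from the mailbox and handled one at a time.
handle_call({store, Key, Value}, _From, State) ->
    {reply, ok, State#{Key => Value}};
handle_call({fetch, Key}, _From, State) ->
    {reply, maps:find(Key, State), State}.

handle_cast(_Msg, State) -> {noreply, State}.
```

Any number of processes can call store/2 and fetch/1 at the same time, but the server answers them sequentially, which is exactly the bottleneck the question worries about.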



The index_server creates three ets tables. You can check this by typing

ets:i()

in the erlang shell we started earlier, and you should see:

 couchdb_indexes_by_db couchdb_indexes_by_db bag     1      320      couch_index_server
 couchdb_indexes_by_pid couchdb_indexes_by_pid set   1      316      couch_index_server
 couchdb_indexes_by_sig couchdb_indexes_by_sig set   1      316      couch_index_server
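The third column of that listing is the table type (bag or set). As a small sketch of the difference, assuming made-up table names and placeholder atoms rather than anything CouchDB stores:

```erlang
-module(ets_demo).
-export([run/0]).

run() ->
    %% Hypothetical tables; the names are not the ones CouchDB uses.
    Bag = ets:new(demo_by_db, [bag, protected]),
    Set = ets:new(demo_by_sig, [set, protected]),
    %% bag: several distinct objects may share the same key
    true = ets:insert(Bag, {<<"db1">>, sig_a}),
    true = ets:insert(Bag, {<<"db1">>, sig_b}),
    %% set: a second insert with the same key replaces the first
    true = ets:insert(Set, {sig_a, pid_1}),
    true = ets:insert(Set, {sig_a, pid_2}),
    {lists:sort(ets:lookup(Bag, <<"db1">>)), ets:lookup(Set, sig_a)}.
```

A bag suits a "one database, many indexes" mapping, while a set suits a "one signature, one entry" mapping. With the protected option, only the owning process can write, but any process that knows the table can read it.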

      

When index_server gets a call to get_index, it adds the caller to the list of Waiters stored in the ets table couchdb_indexes_by_sig. Or, if a process for the index already exists, it just sends a reply with the location of the index.

When index_server gets a call to async_open, it just iterates over the list of Waiters and sends each of them a reply with the location of the index.

Likewise, there are calls such as reset_indexes and other operations on indexes, which again send a reply with the location of the index.

When the index is first created, couchdb calls async_open to serve the index to all waiting processes. Subsequently, each process is granted access to the index.
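The get_index / async_open exchange described above can be sketched roughly as follows. This is a simplified illustration, not CouchDB's actual code: the module name waiter_demo, the helper opened/2, and the table name demo_by_sig are assumptions, and the "location of the index" is modelled as a plain pid.

```erlang
-module(waiter_demo).
-behaviour(gen_server).

-export([start_link/0, get_index/1, opened/2]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Blocks the caller until the index for Sig is available.
get_index(Sig) ->
    gen_server:call(?MODULE, {get_index, Sig}, infinity).

%% Hypothetical helper: called by whoever finished opening the index.
opened(Sig, Pid) ->
    gen_server:call(?MODULE, {async_open, Sig, Pid}).

init([]) ->
    {ok, ets:new(demo_by_sig, [set, protected])}.

handle_call({get_index, Sig}, From, Tab) ->
    case ets:lookup(Tab, Sig) of
        [] ->
            %% First request: remember the caller, reply later.
            ets:insert(Tab, {Sig, [From]}),
            {noreply, Tab};
        [{Sig, Waiters}] when is_list(Waiters) ->
            %% Index still opening: queue one more waiter.
            ets:insert(Tab, {Sig, [From | Waiters]}),
            {noreply, Tab};
        [{Sig, Pid}] when is_pid(Pid) ->
            %% Index already open: answer immediately.
            {reply, {ok, Pid}, Tab}
    end;
handle_call({async_open, Sig, Pid}, _From, Tab) ->
    Waiters = case ets:lookup(Tab, Sig) of
                  [{Sig, Ws}] when is_list(Ws) -> Ws;
                  _ -> []
              end,
    %% Wake up every queued caller with the index location.
    lists:foreach(fun(W) -> gen_server:reply(W, {ok, Pid}) end, Waiters),
    ets:insert(Tab, {Sig, Pid}),
    {reply, ok, Tab}.

handle_cast(_Msg, Tab) -> {noreply, Tab}.
```

Returning {noreply, State} from handle_call and answering later with gen_server:reply/2 is what lets the server keep accepting other requests while an index is still being opened. Once a caller holds the pid, it talks to the index process directly and no longer goes through this server.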

It is important to note that the index server does nothing special except make the index available to other processes (for example, couch_mr_view_util.erl). In this respect, it acts as a gateway. The index write operations are handled by couch_index.erl, couch_index_updater.erl, and couch_index_compactor.erl, which are (unsurprisingly) all gen_servers!

When a view is first created, only one process can access it: the query_server process (couchjs by default). Once the view has been built, it can be read and updated concurrently. The actual view request is handled by couch_mr_view, which is exposed to us as an HTTP API.
