How does Erlang access huge shared data structures such as the BTree in CouchDB?
CouchDB has a huge BTree data structure and many processes (one for each request).
Erlang processes cannot share state, so it seems there should be a dedicated process responsible for accessing the BTree and communicating with the other processes via messages. But that would be inefficient, because only one process could access the data.
So how are such cases handled in Erlang in general, and how is it handled in this particular case in CouchDB?
Good question. If you want an authoritative answer, the best place to ask about couchdb internals is the couchdb mailing list. They are very fast, and one of the core developers can probably give you a better answer. I will try to answer this as best I can, but keep in mind that I could be wrong :)
The first hint is provided by the couchdb config file. Start couchdb in shell mode
couchdb -i
and point the browser at
http://localhost:5984/_utils/config.html
You will see that the daemons section contains this key-value pair:
index_server {couch_index_server, start_link, []}
Aha! So the index is maintained by a server. Which server? We need to dive into the code of couch_index_server:
This is a gen_server. All operations on couchdb views are handled by this gen_server. A gen_server is the standard Erlang implementation of the client-server model: clients run concurrently in their own processes and talk to the server through messages. So your observation is correct. All view requests come from different processes that are coordinated by the gen_server.
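To make the client-server model concrete, here is a minimal gen_server sketch. It is not couchdb code: the module name kv_server and the functions store/2 and lookup/1 are made up for illustration. It only shows the shape: any number of client processes call in, and the server handles one request at a time.

```erlang
-module(kv_server).
-behaviour(gen_server).

%% Client API (hypothetical names, for illustration only)
-export([start_link/0, store/2, lookup/1]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    %% Register the server under the module name so clients can find it.
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% These run in the *calling* process and block until the server replies.
store(Key, Value) ->
    gen_server:call(?MODULE, {store, Key, Value}).

lookup(Key) ->
    gen_server:call(?MODULE, {lookup, Key}).

%% The server state is a plain map owned by the server process alone.
init([]) ->
    {ok, #{}}.

handle_call({store, Key, Value}, _From, State) ->
    {reply, ok, State#{Key => Value}};
handle_call({lookup, Key}, _From, State) ->
    %% maps:find/2 returns {ok, Value} or the atom error.
    {reply, maps:find(Key, State), State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

After compiling the module, kv_server:start_link() followed by kv_server:store(a, 1) and kv_server:lookup(a) from any process goes through the same server process.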
index_server defines three ETS tables. You can check this by typing
ets:i()
in the Erlang shell we started earlier, and you should see:
couchdb_indexes_by_db couchdb_indexes_by_db bag 1 320 couch_index_server
couchdb_indexes_by_pid couchdb_indexes_by_pid set 1 316 couch_index_server
couchdb_indexes_by_sig couchdb_indexes_by_sig set 1 316 couch_index_server
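Why ETS matters here: a named ETS table is owned by one process, but other processes can read it directly without sending a message. A small sketch, assuming a protected table (owner writes, everyone reads); the module name ets_owner_demo and the table name demo_by_sig are made up and are not couchdb's:

```erlang
-module(ets_owner_demo).
-export([run/0]).

run() ->
    Parent = self(),
    %% The owner process creates a named, protected table:
    %% only the owner may write to it, any process may read it.
    Owner = spawn(fun() ->
        ets:new(demo_by_sig, [set, protected, named_table]),
        ets:insert(demo_by_sig, {{<<"db">>, <<"sig">>}, self()}),
        Parent ! ready,
        %% Keep the owner alive; the table is deleted when its owner exits.
        receive stop -> ok end
    end),
    receive ready -> ok end,
    %% A different process (this one) reads the table directly.
    [{_Key, Pid}] = ets:lookup(demo_by_sig, {<<"db">>, <<"sig">>}),
    Result = (Pid =:= Owner),
    Owner ! stop,
    Result.
```

Calling ets_owner_demo:run() reads an entry that another process wrote and checks that the stored pid is the owner's.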
When index_server receives a call to get_index, it adds the caller to the Waiters list stored in the ETS table couchdb_indexes_by_sig. Or, if the index is already available, it just sends a reply with the location of the index.
When index_server receives a call to async_open, it just iterates over the Waiters list and sends each of them a reply with the location of the index.
Likewise, there are calls such as reset_indexes and other operations on indexes, which again send a reply indicating the location of the index.
When an index is created for the first time, couchdb calls async_open to serve the index to all waiting processes. Subsequently, each process is granted access to the index.
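The get_index / async_open flow described above can be sketched roughly like this. This is a simplified illustration written from the description, not the real couch_index_server: the module name index_gateway, the table name, and the open_index/2 helper (which just spawns an idle process in place of a real index) are assumptions.

```erlang
-module(index_gateway).
-behaviour(gen_server).

-export([start_link/0, get_index/1]).
-export([init/1, handle_call/3, handle_cast/2]).

-define(BY_SIG, index_gateway_by_sig).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Blocks the caller until the index for Sig is open, then returns {ok, Pid}.
get_index(Sig) ->
    gen_server:call(?MODULE, {get_index, Sig}, infinity).

init([]) ->
    ets:new(?BY_SIG, [set, protected, named_table]),
    {ok, nostate}.

handle_call({get_index, Sig}, From, State) ->
    case ets:lookup(?BY_SIG, Sig) of
        [] ->
            %% First request: open the index in a helper process and
            %% park the caller instead of replying now.
            Server = self(),
            spawn_link(fun() -> open_index(Server, Sig) end),
            ets:insert(?BY_SIG, {Sig, [From]}),
            {noreply, State};
        [{Sig, Waiters}] when is_list(Waiters) ->
            %% Still opening: add this caller to the waiters list.
            ets:insert(?BY_SIG, {Sig, [From | Waiters]}),
            {noreply, State};
        [{Sig, Pid}] when is_pid(Pid) ->
            %% Already open: reply at once with the index location.
            {reply, {ok, Pid}, State}
    end;
handle_call({async_open, Sig, Pid}, _From, State) ->
    %% The index is ready: answer every parked caller, then store the pid.
    [{Sig, Waiters}] = ets:lookup(?BY_SIG, Sig),
    [gen_server:reply(W, {ok, Pid}) || W <- Waiters],
    ets:insert(?BY_SIG, {Sig, Pid}),
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Hypothetical stand-in for opening an index: a process that just idles.
open_index(Server, Sig) ->
    Pid = spawn(fun() -> receive stop -> ok end end),
    gen_server:call(Server, {async_open, Sig, Pid}).
```

The point of returning noreply and answering later with gen_server:reply/2 is that the server never blocks while an index is being opened, so it stays free to take other requests. Once callers hold the pid, they talk to the index process directly.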
It is important to note that the index server does nothing special except make the index available to other processes (for example, couch_mrview_util.erl). In this respect, it acts as a gateway. The index write operations are handled by couch_index.erl, couch_index_updater.erl, and couch_index_compactor.erl, which are (unsurprisingly) all gen_servers!
When a view is first created, only one process can access it: the query_server process (couchjs by default). After the view has been built, it can be read and updated at the same time. The actual view request is handled by couch_mrview, which is exposed to us as an HTTP API.