Choosing a NoSQL database to store events in a CQRS application
I'm looking for good, up-to-date, and well-explained guidance on how to choose a NoSQL database engine to store all the events of an application built with CQRS.
I'm new to everything NoSQL (but learning), so please be clear and feel free to explain your points in an (almost overly) precise manner. This post might serve other newbies like me.
This database will:
- Be able to insert 2 to 10 rows per update coming from the front end (in my case, updates happen frequently). Think thousands of updates per minute; how would that scale?
- Be consistent and resilient, which is critical since the events are the application's source of truth.
- Require no relationships between objects (as in an RDBMS), except maybe a user id / GUID (I don't yet know whether this is critical or needed).
- Receive events containing 3 to 10 "columns" (sequence id, event name, date-time, a JSON/binary-encoded parameter packet, some context data, ...). Don't read "columns" as a bias toward a column-oriented database; it can be document-oriented as long as it meets all the other requirements.
- Be used as a queue, or be written to / read by an external AMQP system such as RabbitMQ or ZeroMQ (this part isn't worked out yet, so feel free to reason about / explain it as well), since projections will be generated from the events.
- Support some kind of filtering by sequence id, such as
SELECT * FROM events WHERE sequence_id > last_sequence_id
so that subscribers (or queuing systems) can sync from a given point.
I've heard about HBase as an event store for CQRS, but maybe MongoDB could fit? Or even Elasticsearch (I wouldn't insist on that one...)? I'm also open to an RDBMS for consistency and availability. But what about the partition-tolerance part?
In short, I'm lost; I need arguments to make the right choice.
https://geteventstore.com/ is a database designed specifically for event streams.
They take the consistency and validity of the source of truth (your events) very seriously, and I use it myself to read/write thousands of events per second.
I have a production implementation working with MongoDB as the Event store. It is used by a CRM web app built with CQRS + Event sourcing.
To provide a 100% transactional guarantee when storing multiple events in one go (all events or none of them), I use one MongoDB document as the events commit, with the events as nested documents; as you may know, MongoDB has document-level locking, and a single-document write is atomic.
For concurrency, I use optimistic locking, via a version property on each Aggregate stream. An Aggregate stream is identified by the pair (Aggregate class x Aggregate id).
The event store also keeps commits in relative order, using a sequence on each commit, incremented for every commit and protected by optimistic locking.
Each commit contains the following:
- aggregateId: string, possibly a GUID
- aggregateClass: string
- version: integer, incremented for each aggregateId x aggregateClass
- sequence: integer, incremented for each commit
- createdAt: UTCDateTime
- authenticatedUserId: string or null
- events: a list of EventWithMetadata
Each EventWithMetadata in turn contains the event class/type and the payload as a string (a serialized version of the actual event).
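As an illustrative sketch (not the author's actual code), this is what one commit document could look like and how it gets stored atomically, in Python with pymongo; the database/collection names, the aggregate/event names, and the payload serialization are my assumptions:

from datetime import datetime, timezone
import json
import uuid

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
commits = client["eventstore"]["commits"]  # names are illustrative

# One commit = one document, so "all events or none" comes for free:
# a single-document insert in MongoDB is atomic.
commit = {
    "aggregateId": str(uuid.uuid4()),      # e.g. a GUID
    "aggregateClass": "Customer",          # illustrative aggregate type
    "version": 1,                          # per aggregateId x aggregateClass
    "sequence": 1,                         # global, per commit
    "createdAt": datetime.now(timezone.utc),
    "authenticatedUserId": None,
    "events": [                            # the EventWithMetadata list
        {
            "eventClass": "CustomerRegistered",            # event class/type
            "payload": json.dumps({"name": "John Doe"}),   # serialized event
        },
    ],
}

commits.insert_one(commit)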
The MongoDB collection has the following indexes:
- aggregateId, aggregateClass, version (unique)
- events.eventClass, sequence
- sequence
- other indexes for query optimization
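As a sketch, assuming the same pymongo commits collection as above, those indexes could be created like this; whether sequence should itself be unique is my assumption, but it is what lets two concurrent writers collide on the same sequence number (see the append sketch below):

from pymongo import ASCENDING

# Unique index: rejects a second commit for the same aggregate version,
# which is what the optimistic locking relies on.
commits.create_index(
    [("aggregateId", ASCENDING), ("aggregateClass", ASCENDING), ("version", ASCENDING)],
    unique=True,
)

# Lets a client select certain event types across all streams, in order.
commits.create_index([("events.eventClass", ASCENDING), ("sequence", ASCENDING)])

# Global ordering / catch-up reads; unique is assumed here so that two
# concurrent commits cannot claim the same sequence number.
commits.create_index([("sequence", ASCENDING)], unique=True)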
These indexes are used to enforce the general rules of the event store (no two commits are stored for the same version of an Aggregate) and to optimize queries (a client can select only certain events, by type, across all streams).
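To make the optimistic locking concrete, here is a hedged sketch of an append that retries on conflict, plus the subscriber catch-up query from the question, again against the commits collection from above; the retry loop and helper shape are mine, not the author's code:

from datetime import datetime, timezone
from pymongo.errors import DuplicateKeyError

def append_commit(commits, aggregate_id, aggregate_class, events, user_id=None):
    # Read the next version for this stream and the next global sequence,
    # then try to insert; the unique indexes reject the write if a
    # concurrent writer got there first, in which case re-read and retry.
    while True:
        last = commits.find_one(
            {"aggregateId": aggregate_id, "aggregateClass": aggregate_class},
            sort=[("version", -1)],
        )
        tail = commits.find_one({}, sort=[("sequence", -1)])
        try:
            commits.insert_one({
                "aggregateId": aggregate_id,
                "aggregateClass": aggregate_class,
                "version": (last["version"] + 1) if last else 1,
                "sequence": (tail["sequence"] + 1) if tail else 1,
                "createdAt": datetime.now(timezone.utc),
                "authenticatedUserId": user_id,
                "events": events,
            })
            return
        except DuplicateKeyError:
            continue  # lost the race: another commit took this version/sequence

# Subscriber catch-up -- the MongoDB equivalent of the question's
# SELECT * FROM events WHERE sequence_id > last_sequence_id:
last_sequence_id = 0  # e.g. loaded from the subscriber's checkpoint
for commit in commits.find({"sequence": {"$gt": last_sequence_id}}).sort("sequence", 1):
    for event in commit["events"]:
        pass  # dispatch event["eventClass"] / event["payload"] to a projection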
You can shard on aggregateId for scaling, if you give up the global event ordering (the sequence property) and move that responsibility to the event publisher, but this complicates things, as the event publisher needs to stay synchronized (even in case of failure!) with the Event store. I recommend doing this only if you need it.
Benchmarks for this implementation (Intel i7 with 8 GB of RAM):
- total aggregate write time: 7.99 s, a rate of 12,516 events written per second
- total aggregate read time: 1.43 s, a rate of 35,036 events read per second
- total sequential read time: 3.26 s, a rate of 30,679 events read per second
I noticed that MongoDB was slow at counting the number of events in the event store. I don't know why, but I don't care, as I don't need that feature.
I recommend using MongoDB as an Event store.