Choosing a NoSQL database to store events in an application developed with CQRS

I'm looking for good, up-to-date, and well-explained guidance on how to choose a NoSQL database engine to store all the events of an application developed with CQRS.

I am new to everything about NoSQL (but learning), so be clear and feel free to explain your point in an (almost too) precise manner. This post might serve other newbies like me.

This database will:

  • Need to insert 2 to 10 rows per update coming from the front end (in my case, updates happen frequently). Think thousands of updates per minute; how would it scale?

  • Need to be consistent and resilient, as the events are the application's source of truth

  • Need no relations between records (unlike an RDBMS), except maybe a user id / GUID (I don't know whether this is critical or needed yet)

  • Receive events containing 3 to 10 "columns" (sequence id, event name, date-time, JSON / binary encoded parameter packet, some context data, ...). Without steering your answer toward a column-oriented database: it can be document-oriented as long as it meets all the other requirements.

  • Be usable as a queue, or feed an external AMQP system such as RabbitMQ or ZeroMQ (I haven't worked out this part yet; feel free to reason about / explain it as well), since projections will be generated from the events

  • Support some kind of filtering on a sequence id, such as SELECT * FROM events WHERE sequence_id > last_sequence_id

    so that subscribers (or queuing systems) can sync from a given point

I've heard about HBase being used to store events for CQRS, but maybe MongoDB can fit? Or even Elasticsearch (I wouldn't argue with this ...)? I am also open to an RDBMS for consistency and availability. But what about the partition-tolerance part??

Indeed, I am lost; I need arguments to make the right choice.

+1




2 answers


https://geteventstore.com/ is a database designed specifically for event streams.



They take the consistency and validity of the source of truth (your events) very seriously, and I use it myself to read / write thousands of events per second.

+2




I have a working production implementation of MongoDB as an event store. It is used by a CRM web app built with CQRS + event sourcing.

To provide a 100% transactional guarantee when storing multiple events in one go (all events or none of them), I use a single MongoDB document as the events commit, with the events as nested documents; as you may know, MongoDB has document-level locking, so a single-document write is atomic.
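
A minimal sketch of that commit write in Python with pymongo (the connection, collection, and field values here are my own hypothetical examples, not the author's actual schema); the single insert_one call stores the whole commit, so all nested events land atomically or not at all:

    from datetime import datetime, timezone

    from pymongo import MongoClient

    # Hypothetical connection and collection names, for illustration only.
    commits = MongoClient("mongodb://localhost:27017")["crm"]["event_commits"]

    # One commit = one document. MongoDB writes a single document
    # atomically, so either all nested events are stored or none.
    commits.insert_one({
        "aggregateId": "3f2c9a1e-0000-0000-0000-000000000001",
        "aggregateClass": "Customer",
        "version": 1,
        "sequence": 1,
        "createdAt": datetime.now(timezone.utc),
        "authenticatedUserId": None,
        "events": [
            {"eventClass": "CustomerRegistered", "payload": "{\"name\": \"Ada\"}"},
            {"eventClass": "EmailConfirmed", "payload": "{}"},
        ],
    })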

For concurrency, I use optimistic locking, with a version property on each aggregate stream. An aggregate stream is identified by the pair (aggregate class x aggregate id).

The event store also keeps commits in relative order, using a sequence number on each commit, incremented on each commit and protected by optimistic locking.

Each commit contains the following:

  • aggregateId: string, possibly a GUID,
  • aggregateClass: string,
  • version: integer, incremented for each aggregateId x aggregateClass,
  • sequence: integer, incremented for each commit,
  • createdAt: UTCDateTime,
  • authenticatedUserId: string or null,
  • events: list of EventWithMetadata.
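
A sketch of how that optimistic-locking write could work in Python with pymongo (the helper and collection names are my own assumptions); it relies on the unique index on (aggregateId, aggregateClass, version) listed below, so a concurrent commit for the same stream version raises DuplicateKeyError and the caller can reload and retry:

    from pymongo import MongoClient
    from pymongo.errors import DuplicateKeyError

    commits = MongoClient("mongodb://localhost:27017")["crm"]["event_commits"]

    def append_commit(aggregate_id, aggregate_class, expected_version, events):
        """Append one commit, failing if another writer got there first."""
        # Next global sequence: one above the highest stored so far.
        # (For a race on sequence itself to be caught, the sequence index
        # would also have to be unique; that is an assumption on my part.)
        last = commits.find_one(sort=[("sequence", -1)]) or {"sequence": 0}
        try:
            commits.insert_one({
                "aggregateId": aggregate_id,
                "aggregateClass": aggregate_class,
                "version": expected_version + 1,   # unique per stream
                "sequence": last["sequence"] + 1,  # global order
                "events": events,
            })
        except DuplicateKeyError:
            # A concurrent commit won the race on this stream version;
            # reload the aggregate, re-check invariants, and retry.
            raise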

Each EventWithMetadata contains the event class/type and the payload as a string (a serialized version of the actual event).

The MongoDB collection has the following indexes:

  • aggregateId, aggregateClass, version (unique)
  • events.eventClass, sequence
  • sequence
  • other indexes for query optimization

These indexes are used to enforce the general rules of event storage (no two commits are stored for the same version of an aggregate) and to optimize queries (a client can select only certain events, by type, across all streams).
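
For reference, a sketch of creating those indexes with pymongo (the collection name is again my hypothetical one):

    from pymongo import ASCENDING, MongoClient

    commits = MongoClient("mongodb://localhost:27017")["crm"]["event_commits"]

    # Rejects a second commit for the same version of a stream
    # (this is what makes the optimistic locking work).
    commits.create_index(
        [("aggregateId", ASCENDING),
         ("aggregateClass", ASCENDING),
         ("version", ASCENDING)],
        unique=True,
    )
    # Serves "events of type X, in global order" queries across all streams.
    commits.create_index([("events.eventClass", ASCENDING), ("sequence", ASCENDING)])
    # Serves subscriber catch-up scans by global order.
    commits.create_index([("sequence", ASCENDING)])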

You can shard on aggregateId for scaling, if you sacrifice the global event ordering (the sequence property) and move that responsibility to the event publisher, but this complicates things because the event publisher must stay synchronized (even in case of failure!) with the event store. I recommend doing this only if you need it.
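
If you do keep the global sequence, a subscriber can sync from its checkpoint, which is exactly the SELECT * FROM events WHERE sequence_id > last_sequence_id pattern the question asks for. A sketch, using the same hypothetical collection as above:

    def commits_since(commits, last_sequence):
        """Stream commits the subscriber has not seen yet, in global order."""
        return commits.find({"sequence": {"$gt": last_sequence}}).sort("sequence", 1)

    # Usage: replay everything after checkpoint 42.
    for commit in commits_since(commits, 42):
        for event in commit["events"]:
            print(event["eventClass"], event["payload"])  # e.g. publish to RabbitMQ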

Tests for this implementation (Intel i7 with 8GB of RAM):

  • total aggregate write time: 7.99 s, rate: 12516 events written per second
  • total aggregate read time: 1.43 s, rate: 35036 events read per second
  • total read-model read time: 3.26 s, rate: 30679 events read per second

I noticed that MongoDB was slow at counting the number of events in the event store. I don't know why, but I don't care, as I don't need that feature.

I recommend using MongoDB as an Event store.

+3








