How to store and compress data for real-time logging?

When developing software that records input signals (numbers) in real time, how is this data best stored and compressed? Would an SQL engine be a good fit, allowing fast data retrieval later on, or are there other suitable (possibly compressed) data formats for up to 1000 samples per second?

I don't mind building it in VC++, but ideas applicable to C# would be ideal.

2 answers


It's hard to say without more information, such as what the source is, whether you will need to query the stored data, and so on.

But for 1000 samples per second, you should definitely consider buffering a few seconds of data in memory and then writing it to persistent storage in bulk on another thread (a multiprocessor machine is recommended).
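
A minimal C++ sketch of that buffering scheme, assuming samples are doubles and that the storage back end is hidden behind a caller-supplied sink function (the class name BatchedLogger and the BatchSink signature are hypothetical, not from any library). The sampling thread only appends to an in-memory batch; each full batch is handed to a background thread, which passes it to the sink in one bulk call, so slow I/O never blocks sampling.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sink: receives one contiguous batch of samples (file, database, ...).
using BatchSink = std::function<void(const double* data, std::size_t count)>;

class BatchedLogger {
public:
    BatchedLogger(BatchSink sink, std::size_t batchSize)
        : sink_(std::move(sink)), batchSize_(batchSize) {
        assert(batchSize_ > 0);
        filling_.reserve(batchSize_);
        writer_ = std::thread([this] { writerLoop(); });  // started after all members exist
    }

    // Hands over any partial batch and waits for the writer to finish.
    ~BatchedLogger() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!filling_.empty()) pending_.push_back(std::move(filling_));
            stop_ = true;
        }
        cv_.notify_one();
        writer_.join();
    }

    // Called only from the sampling thread; allocates once per batch, not per sample.
    void add(double sample) {
        filling_.push_back(sample);
        if (filling_.size() < batchSize_) return;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push_back(std::move(filling_));
        }
        cv_.notify_one();
        filling_ = std::vector<double>();
        filling_.reserve(batchSize_);
    }

private:
    void writerLoop() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return stop_ || !pending_.empty(); });
            std::vector<std::vector<double>> batches;
            batches.swap(pending_);
            const bool stopping = stop_;
            lock.unlock();  // do the slow I/O without holding the lock
            for (const auto& batch : batches) sink_(batch.data(), batch.size());
            if (stopping) return;
            lock.lock();
        }
    }

    BatchSink sink_;
    std::size_t batchSize_;
    std::vector<double> filling_;               // touched only by the sampling thread
    std::vector<std::vector<double>> pending_;  // guarded by mutex_
    bool stop_ = false;                         // guarded by mutex_
    std::mutex mutex_;
    std::condition_variable cv_;
    std::thread writer_;
};
```

In practice the sink could write the batch to a file with std::ofstream::write or issue a bulk insert to a database; with a batch size of 2000 at 1000 samples per second, the writer wakes up roughly every two seconds.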

If you choose to do this in a managed language, reuse the same data structures for storing samples so the GC doesn't need to collect memory too often. To squeeze out the very best performance, you can use pointers and the unsafe keyword (this gives direct access to the memory and eliminates the bounds-checking code for arrays).
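
C++ has no garbage collector, but the same advice (allocate once, then reuse) protects an unmanaged sampler from allocator latency. A minimal sketch, assuming exactly one sampling thread and one writer thread: a fixed-capacity single-producer/single-consumer ring buffer whose storage is allocated once and never grows. SpscRing is a hypothetical name; with this head/tail scheme one slot is always left empty, so the usable capacity is Capacity - 1.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2, "need at least one usable slot");

public:
    // Producer (sampling thread) only. Returns false when full, so the caller
    // can decide whether to drop the sample or count an overrun.
    bool push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % Capacity;
        if (next == tail_.load(std::memory_order_acquire)) return false;
        slots_[head] = value;
        head_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer (writer thread) only. Returns std::nullopt when empty.
    std::optional<T> pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return std::nullopt;
        T value = slots_[tail];
        tail_.store((tail + 1) % Capacity, std::memory_order_release);  // free the slot
        return value;
    }

private:
    std::array<T, Capacity> slots_{};   // all storage lives here, allocated once
    std::atomic<std::size_t> head_{0};  // next slot the producer writes
    std::atomic<std::size_t> tail_{0};  // next slot the consumer reads
};
```

For a few seconds of data (for example Capacity = 4096 doubles, 32 KiB), create the ring once with std::make_unique or as a static object rather than on a thread's stack.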



I don't know how much CPU time you need to collect each sample, or how critical the timing of each read is (will the samples be buffered in the device you are reading from?). If sampling is time-critical, you have 1 ms per sample, and you probably cannot afford the risk of a garbage collection, since it will block your thread for a while. In that case, I would go for an unmanaged approach.
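
Whichever language you use, a 1 ms sampling loop should schedule against absolute deadlines rather than sleeping a fixed 1 ms after each read; otherwise the time spent reading accumulates as drift. A minimal C++ sketch using std::chrono::steady_clock and std::this_thread::sleep_until; runFixedRate and the readSample callback are hypothetical stand-ins for your device code. A general-purpose OS does not guarantee 1 ms wake-ups (on Windows the default timer resolution is usually coarser than 1 ms), which is one more reason to let the device buffer samples if it can.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>

// Calls readSample(i) once per `period`, `count` times. Each deadline is
// start + k * period, so a slow read or a late wake-up does not shift later samples.
// Returns how many deadlines had already passed when they were reached (overruns).
template <typename ReadFn>
std::size_t runFixedRate(std::chrono::microseconds period, std::size_t count,
                         ReadFn&& readSample) {
    using Clock = std::chrono::steady_clock;
    assert(period.count() > 0);
    Clock::time_point next = Clock::now();
    std::size_t overruns = 0;
    for (std::size_t i = 0; i < count; ++i) {
        readSample(i);
        next += period;
        if (Clock::now() > next) {
            ++overruns;  // running late: skip the sleep and try to catch up
        } else {
            std::this_thread::sleep_until(next);
        }
    }
    return overruns;
}
```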

SQL Server can easily store your data, or you can write it to a file; it mainly depends on what you need to do with the data later. I don't know how large each sample is, but let's say it's 8 bytes. Then you have 8000 bytes per second of raw data to write; with some overhead, that might be 10 kB/s. Most storage engines I can think of will be able to write data at this rate. Just make sure you are writing on a different thread than the one that is sampling.
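
If you go with a plain file, fixed-size binary records keep both writing and later reading simple: at 8 bytes per sample and 1000 samples per second that is 8000 B/s, about 28.8 MB per hour or 691.2 MB per day before any compression. A minimal C++ sketch, assuming each sample is a single double written in the machine's native byte order with no file header; appendSamples and readSampleAt are hypothetical helper names. Because every record is the same size, sample i starts at byte offset 8 * i, so reading any sample back takes one seek instead of a scan.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

static_assert(sizeof(double) == 8, "sketch assumes 8-byte samples");

// Appends one batch of samples to a raw binary file (native byte order, no header).
bool appendSamples(const std::string& path, const std::vector<double>& samples) {
    std::ofstream out(path, std::ios::binary | std::ios::app);
    out.write(reinterpret_cast<const char*>(samples.data()),
              static_cast<std::streamsize>(samples.size() * sizeof(double)));
    out.close();  // close explicitly so a failed flush shows up in the result
    return !out.fail();
}

// Reads sample number `index` by seeking straight to byte offset index * 8.
bool readSampleAt(const std::string& path, std::uint64_t index, double& value) {
    std::ifstream in(path, std::ios::binary);
    in.seekg(static_cast<std::streamoff>(index * sizeof(double)));
    in.read(reinterpret_cast<char*>(&value), sizeof(double));
    return static_cast<bool>(in);  // false if the file is missing or index is past the end
}
```

Opening the file on every call keeps the sketch short; a real logger would keep one std::ofstream open on the writer thread.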

You might want to look at time-series databases rather than relational ones. They are optimized for the kind of data handling and usage you are describing.



Kx and Fame are popular choices.
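
If you store the samples yourself instead of using a time-series database, one generic way to compress slowly varying integer samples (for example raw ADC counts) is to store the difference from the previous sample and write each difference as a variable-length integer, so small changes take a single byte. This is a common general-purpose technique, not a description of how Kx or Fame store data. The sketch below assumes int32 samples, uses zigzag encoding so negative differences also stay small, and its function names are hypothetical. It only pays off when consecutive samples are close; noisy signals or floating-point samples need a different scheme or a general-purpose compressor.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Zigzag maps signed differences to unsigned values so that small magnitudes
// stay small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
inline std::uint64_t zigzag(std::int64_t v) {
    return (static_cast<std::uint64_t>(v) << 1) ^ static_cast<std::uint64_t>(v >> 63);
}

inline std::int64_t unzigzag(std::uint64_t u) {
    return static_cast<std::int64_t>(u >> 1) ^ -static_cast<std::int64_t>(u & 1);
}

// Stores each sample as the zigzagged difference from the previous one, written
// as a varint: 7 payload bits per byte, high bit set means more bytes follow.
std::vector<std::uint8_t> encodeDeltas(const std::vector<std::int32_t>& samples) {
    std::vector<std::uint8_t> out;
    std::int64_t prev = 0;
    for (std::int32_t s : samples) {
        std::uint64_t u = zigzag(static_cast<std::int64_t>(s) - prev);
        prev = s;
        while (u >= 0x80) {
            out.push_back(static_cast<std::uint8_t>(u | 0x80));
            u >>= 7;
        }
        out.push_back(static_cast<std::uint8_t>(u));
    }
    return out;
}

// Inverse of encodeDeltas; assumes well-formed input produced by it.
std::vector<std::int32_t> decodeDeltas(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::int32_t> samples;
    std::int64_t prev = 0;
    std::uint64_t u = 0;
    int shift = 0;
    for (std::uint8_t b : bytes) {
        u |= static_cast<std::uint64_t>(b & 0x7F) << shift;
        if (b & 0x80) {  // more bytes belong to this value
            shift += 7;
            continue;
        }
        prev += unzigzag(u);
        samples.push_back(static_cast<std::int32_t>(prev));
        u = 0;
        shift = 0;
    }
    assert(shift == 0 && "input ended in the middle of a varint");
    return samples;
}
```

For the six samples 1000, 1001, 1003, 1002, 1002, 998 this produces 7 bytes instead of the 24 bytes needed to store them as raw int32 values.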
