How does NiFi store flow data: in memory or on disk?
Can someone explain in detail how NiFi processors like GetFile or QueryDatabaseTable store rows when the next processor is not available to receive or process any data? Is the data loaded into memory and then spilled to disk once its size exceeds a certain threshold? Could this lead to running out of memory, or to data loss?
I would recommend reading the Apache NiFi documentation, in particular the Apache NiFi In Depth document, to understand how data is stored and moved through NiFi:
https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html
The short answer is that data is always written to disk, in NiFi's internal repositories. A FlowFile consists of attributes, which are stored in the FlowFile repository, and content, which is stored in the content repository. The content is not held in memory unless a processor chooses to read the entire content into memory to do its processing.
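To make the split between attributes and content concrete, here is a toy model in Python. It is a sketch under stated assumptions, not NiFi's actual classes: ContentClaim, FlowFileRecord and ContentRepository are hypothetical names, and the "repository" is a single append-only file. The point it illustrates is that the FlowFile object only carries a small attribute map plus a pointer (file, offset, length), and the bytes are read from disk only when something explicitly asks for them.

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContentClaim:
    """Hypothetical pointer into the content repository: which file, where, how many bytes."""
    path: str
    offset: int
    length: int


@dataclass
class FlowFileRecord:
    """Hypothetical FlowFile: a small attribute map plus a claim, never the content bytes."""
    claim: ContentClaim
    attributes: dict = field(default_factory=dict)


class ContentRepository:
    """Toy append-only content repository backed by one file on disk."""

    def __init__(self, directory: str):
        self.path = os.path.join(directory, "content.bin")
        open(self.path, "ab").close()  # create an empty file if it does not exist yet

    def write(self, data: bytes) -> ContentClaim:
        offset = os.path.getsize(self.path)  # new content is appended at the end
        with open(self.path, "ab") as f:
            f.write(data)
        return ContentClaim(self.path, offset, len(data))

    def read(self, claim: ContentClaim) -> bytes:
        # Bytes are pulled into memory only when a processor explicitly asks for them.
        with open(claim.path, "rb") as f:
            f.seek(claim.offset)
            return f.read(claim.length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        repo = ContentRepository(d)
        ff = FlowFileRecord(repo.write(b"id,name\n1,alice\n"), {"filename": "users.csv"})
        print(ff.attributes, ff.claim.length, repo.read(ff.claim))
```

Passing FlowFileRecord objects between components costs a few hundred bytes each regardless of whether the content is 1 KB or 1 GB, which is why queuing does not by itself consume memory proportional to the data.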
While FlowFiles sit in a queue, none of their content is held in memory; the queue holds only the FlowFile objects, which know where their content lives on disk. When the number of FlowFiles in a queue exceeds a configurable threshold (nifi.queue.swap.threshold in nifi.properties, 20,000 by default), the excess FlowFile objects are swapped out to disk. This lets a queue hold millions of FlowFiles without keeping millions of FlowFile objects in memory.
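The swapping behaviour can be sketched as a FIFO queue that keeps at most a fixed number of records in memory and writes the overflow to swap files in batches. This is a simplified illustration, not NiFi's implementation: SwappingQueue, swap_threshold and swap_batch are hypothetical names, the thresholds are tiny so the effect is visible, and records are plain dicts of attributes serialized as JSON. Ordering is preserved by always draining the in-memory records first, then swap files oldest-first, then the not-yet-swapped overflow.

```python
import json
import os
import tempfile
from collections import deque


class SwappingQueue:
    """Hypothetical queue keeping at most swap_threshold FlowFile records in memory.

    Records stand in for FlowFile objects (attributes + content pointer); content
    bytes never live here. Overflow records are written to swap files on disk.
    """

    def __init__(self, swap_dir, swap_threshold=4, swap_batch=2):
        self.swap_dir = swap_dir
        self.swap_threshold = swap_threshold
        self.swap_batch = swap_batch
        self.active = deque()      # oldest records, held in memory
        self.overflow = []         # newest records, fewer than one swap batch
        self.swap_files = deque()  # (path, record_count), oldest first
        self.swapped_count = 0
        self._counter = 0

    def offer(self, record):
        # Only append to memory if nothing older is waiting on disk or in overflow.
        if not self.swap_files and not self.overflow and len(self.active) < self.swap_threshold:
            self.active.append(record)
            return
        self.overflow.append(record)
        if len(self.overflow) >= self.swap_batch:
            self._swap_out()

    def _swap_out(self):
        path = os.path.join(self.swap_dir, f"{self._counter}.swap")
        self._counter += 1
        with open(path, "w") as f:
            json.dump(self.overflow, f)
        self.swap_files.append((path, len(self.overflow)))
        self.swapped_count += len(self.overflow)
        self.overflow = []

    def _swap_in(self):
        if self.swap_files:
            path, n = self.swap_files.popleft()
            with open(path) as f:
                self.active.extend(json.load(f))
            os.remove(path)
            self.swapped_count -= n
        elif self.overflow:
            self.active.extend(self.overflow)
            self.overflow = []

    def poll(self):
        if not self.active:
            self._swap_in()
        return self.active.popleft() if self.active else None

    def __len__(self):
        return len(self.active) + len(self.overflow) + self.swapped_count


if __name__ == "__main__":
    q = SwappingQueue(tempfile.mkdtemp())
    for i in range(10):
        q.offer({"id": str(i)})
    print(len(q), len(q.active), len(q.swap_files))
```

With 10 records, a threshold of 4 and a batch of 2, only 4 records stay in memory and the other 6 end up in 3 swap files, yet polling still returns them in arrival order.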
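There is also the concept of back pressure, which caps a queue based on the number of FlowFiles in it, the total size of their content, or both. Once a connection reaches either threshold, the processor feeding it is no longer scheduled to run, so a source such as GetFile or QueryDatabaseTable simply stops pulling new data until the queue drains.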
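A minimal sketch of the back-pressure check, assuming a simplified model: Connection and run_source_once are hypothetical names, and the default thresholds are assumed to mirror NiFi's usual per-connection defaults (10,000 FlowFiles, 1 GB), which you should confirm for your version. The check happens before the source is triggered, so a single run can push the queue past a threshold (the limits are soft), but once it is full the source is skipped and the unread data stays where it is, for example files in the input directory or rows in the database.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Hypothetical connection tracking queued FlowFile count and content size."""
    object_threshold: int = 10_000
    size_threshold_bytes: int = 1024 ** 3  # assumed 1 GB default
    queued_count: int = 0
    queued_bytes: int = 0

    def is_back_pressured(self) -> bool:
        # Either threshold being reached is enough to apply back pressure.
        return (self.queued_count >= self.object_threshold
                or self.queued_bytes >= self.size_threshold_bytes)

    def enqueue(self, size_bytes: int) -> None:
        self.queued_count += 1
        self.queued_bytes += size_bytes

    def dequeue(self, size_bytes: int) -> None:
        self.queued_count -= 1
        self.queued_bytes -= size_bytes


def run_source_once(conn: Connection, produce_sizes) -> int:
    """Simulate one scheduling opportunity for a source processor.

    Returns how many FlowFiles were produced; 0 means the source was not
    triggered because its outgoing connection is back-pressured.
    """
    if conn.is_back_pressured():
        return 0
    for size in produce_sizes:
        conn.enqueue(size)
    return len(produce_sizes)
```

For example, with Connection(object_threshold=3, size_threshold_bytes=100), two runs producing two FlowFiles each leave 4 queued (one over the soft limit), and a third run produces nothing until the downstream side dequeues.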