Choosing a long-term storage / analysis system?

A quick overview of the project I am working on:

I was hired as a web designer for a small company (part of a large corporation) near the public college I attend. Over the past couple of months, two other interns and I have worked on both the front end and the back end. The company builds prototypes by adding sensors to its products (oil and gas industry); we were tasked with creating a portal that customers can log into to see data from their machines, even when they are not near them.

Basically, we collect sensor data (~10 sensors per machine) and the machines send it back to us. Where we are stuck is determining the best way to store and analyze long-term data. We have a Redis cache configured for quick access by the interface, but it only stores the most recent dataset for each machine. For historical data, my colleagues and I are having difficulty choosing the best route. Our entire project is built in Visual Studio (C# / Razor) with Azure integration (which is amazing, by the way), so I would like to keep long-term storage there too. As far as I can tell, HDInsight with the data in Blob storage seems to be the best option, but I'm pretty green when it comes to backend solutions. I would just like to get some input from senior developers who have more experience in this area, since we are the only developers here besides a couple of senior staff who are more involved in the engineering side of things than in software development.

So, professionals, what would be your recommendation for long-term data retention and analytics?

PS: Sorry if I have HDInsight wrong. As I understand it, it maps data in Blob storage into HBase for easier analytics? Hadoop / HBase confuses me.

+3




1 answer


My first recommendation is Azure Table storage. It provides a scalable, low-cost solution for archiving data. If you design your tables correctly, you can also get very decent query performance. See the Azure Storage Table Design Guide for details.
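To make "design correctly" a bit more concrete, here is a minimal sketch of one possible key scheme for time-series sensor readings. It is written in plain Python for brevity (no Azure SDK calls), and the same idea ports directly to C#. Everything specific in it is an assumption for illustration: the machine id `pump-07`, the one-partition-per-machine-per-day split, and the helper names. The reversed-timestamp RowKey means the newest reading sorts first within a partition, so fetching the latest N rows is a cheap range read.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Upper bound used to invert millisecond timestamps (13 digits), so that
# newer readings get lexicographically smaller RowKeys and sort first.
MAX_MILLIS = 10**13 - 1


def partition_key(machine_id: str, ts: datetime) -> str:
    """One partition per machine per UTC day (an assumed granularity)."""
    return f"{machine_id}_{ts.astimezone(timezone.utc):%Y%m%d}"


def row_key(ts: datetime) -> str:
    """Zero-padded reversed timestamp; ts must be timezone-aware."""
    millis = (ts - EPOCH) // timedelta(milliseconds=1)
    return f"{MAX_MILLIS - millis:013d}"


def make_entity(machine_id: str, ts: datetime, readings: dict) -> dict:
    """Build a flat entity: the two keys plus one property per sensor."""
    entity = {
        "PartitionKey": partition_key(machine_id, ts),
        "RowKey": row_key(ts),
    }
    entity.update(readings)
    return entity


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical machine id and sensor names.
    print(make_entity("pump-07", now, {"pressure": 101.3, "temp_c": 64.2}))
```

With this layout, "all readings for machine X on day Y" is a single-partition query, and a date range becomes a handful of partition queries. Whether a day is the right partition size depends on how often each machine reports.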

My second option would be Azure DocumentDB, which is a NoSQL document database. It costs a little more, but querying the data is much more flexible.



You should only go with HDInsight when you have a specific need for it, as it is a resource-intensive and expensive service. Once you have identified a specific requirement for big data analytics, that is the point at which to import your data and process it with HDInsight.

+1








