Using Hadoop as a MySQL storage engine?

Also, for SQL programmers, is it a good idea to use Hive to run ad hoc queries on large-scale log data stored in HDFS?

Is there a similar open source implementation?



2 answers

Technically, it shouldn't be that hard to implement. The conceptual problem I see is that the performance behavior of NoSQL engines is fundamentally different from what MySQL expects from a storage engine. In particular, they are good at random access but not efficient at full or range scans. The question is whether it is possible to convey all these costs to the optimizer. This applies to any RDBMS engine; in fact, many of them have the concept of storage plugins, with varying levels of flexibility and documentation.
I believe that for efficient integration we need the ability to push predicates down to the NoSQL engine for full and range scans. I'm not 100% sure that MySQL supports this at the storage engine interface level.
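To make the predicate pushdown point concrete, here is a toy Python sketch, assuming a hypothetical key-sorted store standing in for a NoSQL engine. The names `ToySortedStore`, `query_with_pushdown` and `query_without_pushdown` are illustrative only; this is not MySQL's storage engine API (which is a C++ handler interface). It just counts how many rows the engine has to hand back to the "server" with and without pushing the key range down.

```python
import bisect
from typing import Iterator, List, Optional, Tuple

Row = Tuple[int, str]  # hypothetical (key, payload) pair


class ToySortedStore:
    """Hypothetical stand-in for a NoSQL engine that keeps rows sorted by key."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = sorted(rows)
        self.keys = [k for k, _ in self.rows]
        self.rows_returned = 0  # rows shipped from the engine to the "server"

    def scan(self, lo: Optional[int] = None, hi: Optional[int] = None) -> Iterator[Row]:
        """Full scan when lo/hi are None, otherwise a range scan over [lo, hi)."""
        start = 0 if lo is None else bisect.bisect_left(self.keys, lo)
        end = len(self.rows) if hi is None else bisect.bisect_left(self.keys, hi)
        for row in self.rows[start:end]:
            self.rows_returned += 1
            yield row


def query_without_pushdown(store: ToySortedStore, lo: int, hi: int) -> List[Row]:
    # The "server" pulls every row and applies the WHERE clause itself.
    return [r for r in store.scan() if lo <= r[0] < hi]


def query_with_pushdown(store: ToySortedStore, lo: int, hi: int) -> List[Row]:
    # The key range is handed to the engine, which returns only matching rows.
    return list(store.scan(lo, hi))


if __name__ == "__main__":
    data = [(k, f"event-{k}") for k in range(1000)]
    a, b = ToySortedStore(data), ToySortedStore(data)
    assert query_without_pushdown(a, 100, 110) == query_with_pushdown(b, 100, 110)
    print("without pushdown:", a.rows_returned)  # 1000
    print("with pushdown:", b.rows_returned)     # 10
```

Both queries return the same 10 rows, but without pushdown the engine has to ship all 1000 rows, which is the cost that matters when the "engine" is a remote store over HDFS.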
Another major problem I see with this approach is that MySQL does not have parallel query execution, so it cannot be very good at big data processing.
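As a rough illustration of what parallel query execution means here, below is a minimal Python sketch, assuming the data is already split into partitions. `scan_partition` and `parallel_avg` are hypothetical names, not part of MySQL or Hadoop. Each partition is scanned concurrently and returns a partial aggregate, and a coordinator merges the partials.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def scan_partition(partition: List[int]) -> Tuple[int, int]:
    """Scan one partition and return a partial aggregate: (row count, sum)."""
    return len(partition), sum(partition)


def parallel_avg(partitions: List[List[int]], workers: int = 4) -> float:
    """Hypothetical parallel AVG: scan partitions concurrently, then merge partials."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(scan_partition, partitions))
    total_rows = sum(n for n, _ in partials)
    total_sum = sum(s for _, s in partials)
    return total_sum / total_rows if total_rows else 0.0


if __name__ == "__main__":
    print(parallel_avg([list(range(0, 50)), list(range(50, 100))]))  # 49.5
```

Merging (count, sum) pairs rather than averaging per-partition averages keeps the result correct when partitions differ in size: `parallel_avg([[1, 2, 3], [10]])` gives 16 / 4 = 4.0, not (2 + 10) / 2 = 6.0. CPython threads share the GIL, so this shows the split-and-merge structure rather than a real speedup; a system built for big data would spread the partition scans across processes or nodes.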



I searched for this question in 2014 and found InfiniDB and a blog post about it. It combines Hadoop and MySQL, exposing data stored in Hadoop through its own MySQL-protocol front end.

I haven't read much about it, though I have doubts about its compatibility (with existing MySQL applications) and its performance (compared to well-tuned indexes and data partitioning).

Still, it might be the simplest HA solution for a really large dataset that can't fit on a few drives (it relies on HDFS's built-in replication, so no SAN or RAID is required).

BTW, the InfiniDB site is currently affected by the Heartbleed bug. I wonder whether their product is still maintained, given that a patch has been available for over 5 months.


