F# GPU programming vs kdb+ for data crunching: which is fastest?

Hi, I would like to ask someone who knows: what is the more economical and efficient way of crunching huge amounts of data, F# on GPUs (for example via an NVIDIA C-style GPU API such as CUDA) or programming against kdb+?

I know the two approaches are completely different; I just want advice from people who have worked with them before I invest in one or both technologies.

For the GPU side, I am planning to work with either a relational DB or a NoSQL DB like MongoDB, using separate tables and simple joins across 2-3 other tables.
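
To make the join shape concrete, here is a minimal F# sketch over in-memory records. The Trade/Instrument names and fields are invented for illustration; in practice the tables would live in the relational or MongoDB store:

    // Two hypothetical tables as F# record types.
    type Trade = { Sym: string; Price: float; Qty: int }
    type Instrument = { Sym: string; Name: string }

    let trades =
        [ { Sym = "AAPL"; Price = 190.0; Qty = 100 }
          { Sym = "MSFT"; Price = 410.0; Qty = 50 } ]

    let instruments =
        [ { Sym = "AAPL"; Name = "Apple" }
          { Sym = "MSFT"; Name = "Microsoft" } ]

    // Index the smaller table by key, then join trade rows against it.
    let bySym = instruments |> List.map (fun i -> i.Sym, i) |> Map.ofList
    let joined =
        trades
        |> List.choose (fun t ->
            Map.tryFind t.Sym bySym
            |> Option.map (fun i -> i.Name, t.Price, t.Qty))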

Does anyone know of any metrics or comparisons (mostly on speed) between the two approaches?



2 answers


As others have said, which is faster depends very much on your use case. I previously helped build a test framework of 15 queries, plus some algorithmic strategies, run against several different stock-market databases:

  • PostgreSQL
  • MySQL (in-memory version)
  • MongoDB (for the queries it supported)
  • kdb+
  • a few other newer NoSQL and column-oriented databases



kdb+ was significantly faster than the databases above on most queries. One database came close in performance, but it was significantly harder to get it to do the calculations I wanted.

Note: I cannot give hard numbers because doing so is against the terms of some of the database vendors. But I would emphasize that if you are going to build a system, your team's skills should influence the choice, as should how quickly you can change the system and its programming later.



In my honest opinion, it is much easier to write complex queries in kdb+ (and to understand them later) than in something like MongoDB.

I'm an F# fan too.

Now, either F# or kdb+ can help you think in a GPU-friendly way (arrays, whole-dataset operations, less linear control flow, parallelism). Whichever you choose, think about the path that will get you there and whether it locks you into one particular worldview.
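
To make that concrete, here is what the whole-array, data-parallel style looks like in plain F#. This is a sketch only: Array.Parallel runs on CPU cores, not a GPU, but the shape, one pure function applied across an entire array, is exactly what GPU libraries (Alea GPU, for example) want from you. The data is synthetic.

    // A million synthetic prices; the point is the shape, not the data.
    let prices = Array.init 1_000_000 (fun i -> 100.0 + float i * 0.01)

    // One pure function applied across the whole array at once:
    // the "whole task" mindset that maps well onto a GPU kernel.
    let returns =
        prices
        |> Array.Parallel.mapi (fun i p ->
            if i = 0 then 0.0 else (p - prices.[i - 1]) / prices.[i - 1])

The same computation written as a loop over mutable state would be much harder to hand to a GPU; written this way, swapping the execution target is mostly a library question.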

When it comes to modeling, context matters a lot. It really depends on what models you want to run and how bandwidth factors in.



The agility and speed of kdb+ are amazing. Likewise, F# is great for type safety and for research-driven fields such as the life sciences.

Nothing prevents you from using both together. Oh, and the 32-bit version of kdb+ can now be used both commercially and non-commercially.
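
As a rough sketch of what "both together" can look like, here is F# querying a kdb+ process over IPC. This assumes Kx's open-source c.cs .NET client is referenced; the class kx.c and its k method are from that client as I remember it (verify against the source), and the host, port and query are invented:

    // Open a connection to a kdb+ process (hypothetical host/port).
    let conn = new kx.c("localhost", 5001)

    // Run a q query server-side: kdb+ does the heavy lifting,
    // and F# receives the result as a boxed .NET object to unpack.
    let result = conn.k("select avg price by sym from trade")
    printfn "%A" result
    conn.Close()

The division of labor is the point: set-oriented crunching stays in kdb+, while the typed, compositional layer on top lives in F#.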

Like John, I have also tried many alternatives, from BerkeleyDB upwards. In particular, the non-kdb+ columnar options fell short in several ways, not just performance. I looked at them from a kernel perspective, and even spoke to some of the engineers who worked on those kernels once the sales teams had bowed out. There are underlying reasons, beyond benchmarks, why kdb+ is a smart way forward.

Speed is a factor whose weight varies from application to application. The other factors, and how they weigh against each other, are probably more universal.







