Processing CSV files from the internet using an embedded Java database
Short version: given that I don't need to keep the data around, how can I create a database programmatically in HSQLDB and load some CSV data into it? My schema will match the files exactly, and the files have matching column names. The whole thing is a throwaway process.
More details:
I need to apply some simple SQL techniques to three CSV files downloaded over the internet, then build some DTOs that I can feed into existing code for further processing and saving via REST. I don't really want to mess with a real database, but the CSV files are related by foreign keys, so I thought about using an embedded in-memory database to do the job and then throwing it all away.
I was thinking of a command-line application that works like this:
- Create a new database in HSQLDB.
- Run three HTTP GETs on three threads using Apache HttpClient.
- Import each CSV file into an HSQLDB MEMORY table (three tables in total).
- Run some SQL.
- Parse the results into my existing DTOs.
- Etc ...
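Step 2 above can be sketched with the JDK's built-in `java.net.http.HttpClient` standing in for Apache HttpClient (so the sketch needs no dependencies); the three remote endpoints are faked with a local `com.sun.net.httpserver` stub, and all file and URL names are invented for illustration:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class CsvDownloader {
    public static void main(String[] args) throws Exception {
        List<String> names = List.of("a.csv", "b.csv", "c.csv");

        // Local stub standing in for the three remote endpoints;
        // a real run would use the actual URLs instead.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        for (String name : names) {
            server.createContext("/" + name, ex -> {
                byte[] body = ("ID,NAME\n1," + name + "\n").getBytes();
                ex.sendResponseHeaders(200, body.length);
                ex.getResponseBody().write(body);
                ex.close();
            });
        }
        server.start();
        int port = server.getAddress().getPort();

        // Step 2: three GETs in flight at once; sendAsync runs each
        // download on the client's background executor.
        HttpClient client = HttpClient.newHttpClient();
        List<CompletableFuture<Path>> downloads = new ArrayList<>();
        for (String name : names) {
            HttpRequest req = HttpRequest.newBuilder(
                URI.create("http://localhost:" + port + "/" + name)).build();
            downloads.add(client
                .sendAsync(req, HttpResponse.BodyHandlers.ofFile(Path.of(name)))
                .thenApply(HttpResponse::body));
        }

        // Wait for all three and report the line count of each file.
        for (CompletableFuture<Path> d : downloads) {
            System.out.println(Files.readString(d.join()).lines().count());
        }

        server.stop(0);
        for (String name : names) Files.deleteIfExists(Path.of(name));
    }
}
```

The saved files can then be handed to the import step; Apache HttpClient's async API follows the same pattern.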
I could use pointers to code and utilities for steps 1 and 3. Also, is there an alternative to HSQLDB that I should consider?
A command-line application you could use is the SqlTool utility that ships with HSQLDB. Your procedure can be carried out as follows:
- Create a new in-memory HSQLDB database (just connect to in-memory database).
- Run three HTTP GETs using Apache HttpClient to get CSV files.
- Create three HSQLDB TEXT tables and set the SOURCE of each table to the corresponding CSV file.
- Run some SQL. Parse the results into your existing DTOs.
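A minimal sketch of steps 1, 3 and 4, assuming HSQLDB 2.x is on the classpath and the CSV files have already been downloaded to disk (the table name, columns, file name, and the `Customer` record are all invented for illustration; real code would map onto the existing DTOs):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class CsvToHsqldb {
    // Stand-in DTO; the real code would use the existing DTO classes.
    record Customer(int id, String name) {}

    public static void main(String[] args) throws Exception {
        // Required for TEXT tables in an all-in-memory database (2.x).
        System.setProperty("textdb.allow_full_path", "true");

        // Hypothetical CSV, written locally in place of the HTTP GET step.
        Path csv = Path.of("customers.csv").toAbsolutePath();
        Files.writeString(csv, "ID,NAME\n1,Alice\n2,Bob\n");

        // Step 1: connecting to a mem: URL creates the in-memory database.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hsqldb:mem:scratch", "SA", "");
             Statement st = conn.createStatement()) {

            // Step 3: a TEXT table whose schema matches the file;
            // ignore_first=true skips the CSV header row.
            st.execute("CREATE TEXT TABLE customers (id INT, name VARCHAR(100))");
            st.execute("SET TABLE customers SOURCE '" + csv + ";ignore_first=true'");

            // Step 4: run SQL, then map each row onto a DTO.
            List<Customer> result = new ArrayList<>();
            try (ResultSet rs = st.executeQuery(
                    "SELECT id, name FROM customers ORDER BY id")) {
                while (rs.next()) {
                    result.add(new Customer(rs.getInt(1), rs.getString(2)));
                }
            }
            System.out.println(result.size());

            // Release the source file before deleting it.
            st.execute("SHUTDOWN");
        } finally {
            Files.deleteIfExists(csv);
        }
    }
}
```

Joins across the three TEXT tables work like ordinary SQL, so the foreign-key relationships between the files can be resolved with plain SELECT ... JOIN queries before mapping to DTOs.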
Creating TEXT tables in all-in-memory databases was not possible when the question was asked; it is fully supported in HSQLDB 2.x versions.