How to store data with a tree structure in Julia
I want to keep high-frequency financial data in memory while I work with it in Julia.
My data sits in many Float64 arrays. Each array stores high-frequency data for one day, for one security, in one market. For example, there is one Float64 array for IBM, listed on the NYSE (New York Stock Exchange), on 2010-01-04.
As mentioned, I have many of these arrays spanning multiple dates, markets and securities. I want to store them all in one object so that it is easy to retrieve any given array (perhaps using a metadata tree structure).
In Matlab, I did this with a structure where the first level is the market, the next level is the security, the next level is the date, and the leaf of the tree is the corresponding array. At each level, I also kept a list of the fields at that level.
Julia doesn't really have an equivalent to Matlab structures, so what's the best way to do this in Julia?
Currently, the best I can find is a sequence of nested composite types, each with two fields. For example:
using Dates

mutable struct HighFrequencyData
    dateList::Array{Date, 1}
    dataArray::Array{Any, 1}
end
where dateList stores the dates corresponding to the sequence of Float64 arrays stored in dataArray (i.e. dateList and dataArray have the same length). Then:
mutable struct securitiesData
    securityList::Array{String, 1}
    highFrequencyArray::Array{Any, 1}
end
where securityList stores the securities corresponding to the sequence of HighFrequencyData values stored in highFrequencyArray. Then:
mutable struct marketsData
    marketList::Array{String, 1}
    securitiesArray::Array{Any, 1}
end
where marketList stores the markets corresponding to the sequence of securitiesData values stored in securitiesArray.
With this in place, all the data can be stored in a single variable of type marketsData and looked up using marketList, securityList and dateList at each nesting level.
But that seems a little cumbersome ...
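To make the lookup concrete, here is a minimal sketch of how retrieval would work with this layout. It assumes current Julia syntax (struct, using Dates) and concrete element types in place of Any; the helpers locate and lookup and the sample values are hypothetical, introduced only for illustration.

```julia
using Dates

# Same three-level layout as described above, with concrete element types.
struct HighFrequencyData
    dateList::Vector{Date}
    dataArray::Vector{Vector{Float64}}
end

struct securitiesData
    securityList::Vector{String}
    highFrequencyArray::Vector{HighFrequencyData}
end

struct marketsData
    marketList::Vector{String}
    securitiesArray::Vector{securitiesData}
end

# Hypothetical helper: index of `key` in `list`, or a KeyError if it is absent.
function locate(list, key)
    i = findfirst(isequal(key), list)
    i === nothing && throw(KeyError(key))
    return i
end

# Hypothetical helper: walk market -> security -> date and return that day's array.
function lookup(m::marketsData, market, security, date)
    secs = m.securitiesArray[locate(m.marketList, market)]
    hf = secs.highFrequencyArray[locate(secs.securityList, security)]
    return hf.dataArray[locate(hf.dateList, date)]
end

# Made-up sample data for a single market, security and date.
hf = HighFrequencyData([Date(2010, 1, 4)], [[1.05, 10.6]])
all_markets = marketsData(["NYSE"], [securitiesData(["IBM"], [hf])])
ibm_day = lookup(all_markets, "NYSE", "IBM", Date(2010, 1, 4))
```

Each level costs a linear search through its list, and the list and the array at each level have to be kept in sync by hand, which is where the cumbersome feeling comes from.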
Your type hierarchy looks ok, but maybe dictionaries are all you need?
all_data = Dict(
    "Market1" => Dict(
        "Sec1" => ([20140827, 20140825], [1.05, 10.6]),
        "Sec2" => ([20140827, 20140825], [1.05, 10.6])),
    "Market2" => Dict(
        "Sec1" => ([20140827, 20140825], [1.05, 10.6]),
        "Sec2" => ([20140827, 20140825], [1.05, 10.6])),
    # ...
)
# Each security maps to a (dates, values) pair; compare the values element-wise
println(all_data["Market1"]["Sec1"][2] ./ all_data["Market2"]["Sec1"][2])
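If the dictionary route appeals, one possible variant is to type the nesting explicitly as market => security => date => array, so that the "list of fields" at each level is simply the keys of that dictionary. The sketch below assumes current Julia syntax; the aliases DayData, SecurityMap and MarketMap, the helper store!, and the sample values are all hypothetical.

```julia
using Dates

# Assumed layout: market => security => date => one day's Float64 data.
const DayData     = Vector{Float64}
const SecurityMap = Dict{Date,DayData}          # all days for one security
const MarketMap   = Dict{String,SecurityMap}    # all securities in one market

all_data = Dict{String,MarketMap}()

# Hypothetical helper: create any missing level on the way down, then store the array.
function store!(data, market, security, date, day_values)
    securities = get!(() -> MarketMap(), data, market)
    days = get!(() -> SecurityMap(), securities, security)
    days[date] = day_values
    return data
end

# Made-up sample values.
store!(all_data, "NYSE", "IBM", Date(2010, 1, 4), [1.05, 10.6])
store!(all_data, "NYSE", "IBM", Date(2010, 1, 5), [1.10, 10.2])

ibm = all_data["NYSE"]["IBM"][Date(2010, 1, 4)]        # one day's array
markets = collect(keys(all_data))                       # names at the top level
dates = sort(collect(keys(all_data["NYSE"]["IBM"])))    # dates stored for one security
```

Because the keys are kept by the dictionaries themselves, there is no separate list to keep in sync with the data at each level.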
If you could post what the MATLAB code looks like, that might be helpful too.
I would reformulate your types a bit, maybe something simpler like
using Dates

mutable struct TimeSeries
    dates::Vector{Date}
    data::Vector{Any}
end
const Security = Tuple{String,TimeSeries}
const Market = Vector{Security}
markets = Market[]
push!(markets, [("Sec1",TimeSeries(...)), ("Sec2",TimeSeries(...))])
Also, don't forget to check out https://github.com/JuliaStats/TimeSeries.jl
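For the reformulated types, retrieving one day's data could look roughly like the sketch below. It assumes current Julia syntax and a concrete Vector{Vector{Float64}} for the data field instead of Vector{Any}; the helper day_data and the sample values are hypothetical.

```julia
using Dates

struct TimeSeries
    dates::Vector{Date}
    data::Vector{Vector{Float64}}   # assumption: one Float64 array per date
end

const Security = Tuple{String,TimeSeries}
const Market = Vector{Security}

# Hypothetical helper: find the named security in a market, then the requested date.
function day_data(market::Market, name, date)
    for (sec_name, ts) in market
        sec_name == name || continue
        i = findfirst(isequal(date), ts.dates)
        i === nothing && throw(KeyError(date))
        return ts.data[i]
    end
    throw(KeyError(name))
end

# Made-up sample data.
ts = TimeSeries([Date(2014, 8, 27), Date(2014, 8, 25)],
                [[1.05, 10.6], [1.07, 10.4]])
market1 = Security[("Sec1", ts)]

sec1_day = day_data(market1, "Sec1", Date(2014, 8, 25))
```

Lookups here are linear scans over the securities and the dates, which is usually fine for a handful of securities; for many of them, a Dict keyed by name would avoid the scan.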