Duplicate Key in Deedle Series Index

I have a list of events that occur on the system. My goal is to take this list and build a sliding window over a series of the events. The events are added to the list by application code that is outside the scope of this question.

Since the system can receive events from multiple sources at the same time, some of the event timestamps (the value I use as the key for the series) are the same. What's the correct way to achieve this?

This is the error I am getting:

An unhandled exception of type 'System.ArgumentException' occurred in Deedle.dll

Additional information: Duplicate key '6/12/2015 3:14:43 AM'. Duplicate keys are not allowed in the index.

      

My code:

open System
open System.Collections.Generic
open Deedle

let events = new ResizeArray<StreamEvent>()
// Throws ArgumentException when two events share the same OccuredAt timestamp
let getSeries () =
    let eventsKvp = events |> Seq.map (fun e -> KeyValuePair<DateTime, StreamEvent>(e.OccuredAt, e))
    let series = Series(eventsKvp)
    series |> Series.windowDist (TimeSpan(0, 0, 0, 30))

      

Update #1

What is not shown here is some C# code that creates a few F# Stream objects and adds events through the Stream.ProcessEvent method. That code is irrelevant to the problem I'm having here.

I no longer get the duplicate key issue, but now I get this error:

Additional information: Floating window aggregation and chunking is only supported on ordered indices.

Update #2

I needed to use sortByKey instead of sort.

Here is my F# code:

namespace Storck.Data
open System
open System.Collections.Generic
open Deedle

type EventType =
    | ClientConnected
    | ClientDisconnect

type Edge(id:string,streamId:string) = 
    member this.Id = id
    member this.StreamId = streamId
    member this.Edges =  new ResizeArray<Edge>() 

type StreamEvent(id:string,originStreamId:string,eventType:EventType,ocurredAt:DateTime) = 
    member this.Id = id
    member this.Origin = originStreamId
    member this.EventType = eventType
    member this.OccuredAt = ocurredAt
    override this.Equals(o) =
        match o with
        | :? StreamEvent as sc -> this.Id = sc.Id
        | _ -> false
    override this.GetHashCode() =
        id.GetHashCode()
    interface System.IComparable with
        member this.CompareTo(o) =
            match o with
            | :? StreamEvent as sc -> compare this.Id sc.Id
            | _ -> -1

type Client(id:string) = 
    member this.Id=id
type Key = 
  | Key of DateTime * string
  static member (-) (Key(a, _), Key(b, _)) = a - b
  override x.ToString() = let (Key(d, s)) = x in d.ToString() + ", " + s

type Stream(id:string, origin:string) = 
    let mutable clients = new ResizeArray<Client>()
    let mutable events = new ResizeArray<StreamEvent>()

    member this.Events = events.AsReadOnly()
    member this.Clients = clients.AsReadOnly()
    member this.Id = id
    member this.Origin = origin
    member this.Edges =  new ResizeArray<Edge>() 
    member this.ProcessEvent(client:Client,event:StreamEvent)  =  
        match event.EventType with
            |EventType.ClientConnected -> 
                events.Add(event)
                clients.Add(client)
                true
            |EventType.ClientDisconnect -> 
                events.Add(event)
                let clientToRemove = clients |> Seq.find(fun(f)-> f.Id = client.Id)
                clients.Remove(clientToRemove)
    member this.GetSeries() =       
        let ts = series [ for e in events -> Key(e.OccuredAt, e.Id) => e ]
        ts |> Series.sortByKey |> Series.windowDist (TimeSpan(0, 0, 0,30))

      


1 answer


One of the design decisions we made in Deedle is that a series is treated as a continuous time series rather than a sequence of events. For that reason Deedle does not allow duplicate keys (duplicates make sense for events, but not for time series).

I wish there was good support for scenarios like yours. It is something we are considering for the next version, but I'm not yet sure how best to do it.

As Fedor suggests in the comments, you can use a unique index that consists of the date along with something (either a source or just an ordinal index).
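The ordinal variant of this idea could be sketched roughly as follows. This is only an illustration in plain F# with no Deedle calls; the names `OrdinalKey` and `withOrdinalKeys` are hypothetical and do not come from the library.

```fsharp
open System

/// Hypothetical key type: a timestamp plus a running number that breaks ties.
/// The (-) operator returns the TimeSpan between the two timestamps.
type OrdinalKey =
  | OrdinalKey of DateTime * int
  static member (-) (OrdinalKey(a, _), OrdinalKey(b, _)) = a - b

/// Pairs every item with a unique key; `getTime` extracts the timestamp.
/// Items are sorted by timestamp first, so the resulting keys are ordered.
let withOrdinalKeys (getTime : 'T -> DateTime) (items : seq<'T>) =
  items
  |> Seq.sortBy getTime   // Seq.sortBy is a stable sort, so ties keep input order
  |> Seq.mapi (fun i item -> OrdinalKey(getTime item, i), item)
  |> List.ofSeq

// Two of the three timestamps are identical, yet every key is distinct.
let stamps =
  [ DateTime(2015, 6, 12, 3, 14, 43)
    DateTime(2015, 6, 12, 3, 14, 43)
    DateTime(2015, 6, 12, 3, 15, 0) ]

let keyed = withOrdinalKeys id stamps
let distinctKeys = keyed |> List.map fst |> List.distinct |> List.length
```

A list of key/value pairs like `keyed` should be usable as input to Deedle's `series` function, with `OrdinalKey` playing the role of the key type.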

If you define the - operator on your key type, you can even use the windowDist function:

type StreamEvent = { OccuredAt : DateTime; Source : string; Value : int }

/// A key combines date with the source and defines the 
/// (-) operator which subtracts the dates returning TimeSpan
type Key = 
  | Key of DateTime * string
  static member (-) (Key(a, _), Key(b, _)) = a - b
  override x.ToString() = let (Key(d, s)) = x in d.ToString() + ", " + s

      

We can now create a few sample events:



let events = 
  [ { OccuredAt = DateTime(2015,1,1,12,0,0); Source = "one"; Value = 1 }
    { OccuredAt = DateTime(2015,1,1,12,0,0); Source = "two"; Value = 2 }
    { OccuredAt = DateTime(2015,1,1,13,0,0); Source = "one"; Value = 3 } ]

      

Here, I'll use the built-in series function with the Deedle => operator to create a series that maps keys to values:

let ts = series [ for e in events -> Key(e.OccuredAt, e.Source) => e ]

      

And we can use the windowDist function, because the key type supports the - operator!

ts |> Series.windowDist (TimeSpan(0, 0, 0,30))
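As a possible next step, each window can be reduced to a single number, for example the count of events it contains. The script below is a sketch under a few assumptions: it is run with `dotnet fsi` so that the `#r "nuget: ..."` reference resolves, and it sorts by key first (as the question's second update does) so that the ordered-index requirement is met. The name `counts` is just an illustrative choice.

```fsharp
#r "nuget: Deedle"   // assumption: executed as an F# script
open System
open Deedle

type StreamEvent = { OccuredAt : DateTime; Source : string; Value : int }

type Key =
  | Key of DateTime * string
  static member (-) (Key(a, _), Key(b, _)) = a - b

let events =
  [ { OccuredAt = DateTime(2015,1,1,12,0,0); Source = "one"; Value = 1 }
    { OccuredAt = DateTime(2015,1,1,12,0,0); Source = "two"; Value = 2 }
    { OccuredAt = DateTime(2015,1,1,13,0,0); Source = "one"; Value = 3 } ]

let ts = series [ for e in events -> Key(e.OccuredAt, e.Source) => e ]

// Every value of the windowed series is itself a series holding one window,
// so counting its keys gives the number of events in that window.
let counts =
  ts
  |> Series.sortByKey
  |> Series.windowDist (TimeSpan(0, 0, 0, 30))
  |> Series.mapValues Series.countKeys

// Print one line per window key.
counts
|> Series.observations
|> Seq.iter (fun (key, n) -> printfn "%O: %d event(s)" key n)
```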

      
