KDB: select rows matching two updated columns

Consider financial quote data where the bid and ask are not always updated at the same time. I would like to select only the rows in which both the bid and the ask reflect new market levels. In the table below, this would correspond to selecting rows t1, t5, t7, t9. Is there an elegant way to do this? (Put another way, I would like to exclude rows such as t2, t3, t4, which correspond to times when only one of the bid / ask was updated.)

time bid ask
t1 12 13
t2 12 14
t3 12 14
t4 12 14
t5 13 14
t6 13 14
t7 14 15
t8 14 15
t9 13 14

      



2 answers


This should do the trick (I haven't tested it a huge amount):

tab:([] time:`t1`t2`t3`t4`t5`t6`t7`t8`t9;bid:12 12 12 12 13 13 14 14 13;ask:13 14 14 14 14 14 15 15 14)

q)select from tab where differ {$[all x<y;y;x]}\[flip sums each differ each (bid;ask)]
time bid ask
------------
t1   12  13 
t5   13  14 
t7   14  15 
t9   13  14 
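To see why this works, it may help to split the one-liner into its intermediate values. The following is only a sketch of the same expression run against the first table; the names `counts`, `pairs`, `checkpoint` and `keep` are made up for illustration and are not part of the answer.

```q
tab:([] time:`t1`t2`t3`t4`t5`t6`t7`t8`t9;bid:12 12 12 12 13 13 14 14 13;ask:13 14 14 14 14 14 15 15 14)

/ running count of changes seen so far in each column
/ bid counts: 1 1 1 1 2 2 3 3 4
/ ask counts: 1 2 2 2 2 2 3 3 4
counts:sums each differ each tab`bid`ask

/ one (bid count;ask count) pair per row
pairs:flip counts

/ scan: x is the last accepted pair, y is the current pair
/ y is accepted only when both counts have moved past x
checkpoint:{$[all x<y;y;x]}\[pairs]

/ the rows to keep are those where the accepted pair changes
keep:differ checkpoint
select from tab where keep
```

The first row is always kept, because `differ` flags the first item of a list as changed.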

Another example which contains more edge cases:

tab:([] time:`g`b`b`b`b`g`b`b`g`b`g`g`b`g;bid:12 12 12 12 12 13 13 14 13 13 14 13 14 14;ask:13 13 14 15 14 14 14 14 15 16 16 15 15 16)

q)select from tab where differ {$[all x<y;y;x]}\[flip sums each differ each (bid;ask)]
time bid ask
------------
g    12  13 
g    13  14 
g    13  15 
g    14  16 
g    13  15 
g    14  16 

      

There might be a slightly cleaner way to do this, but I would test this method for now.



EDIT: It's more efficient to flip after sums - changed above.

The previous approach I was looking at used only the boolean values from differ on each column. This method works (and may be more intuitive), but it is less efficient in both time and memory, so I'll stick with the first approach above.

scanner:{if[all x;x:not x];$[(y&z)|(x[0]&z)|x[1]&y;11b;x|(y;z)]}

q)select from tab where all each scanner\[00b;differ bid;differ ask]
time bid ask
------------
t1   12  13
t5   13  14
t7   14  15
t9   13  14
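For readers who find the one-line scanner hard to follow, here is a commented restatement of the same logic. It is meant as a reading aid under the assumption that it is loaded from a script file; the name `scannerCommented` is hypothetical and the body is unchanged apart from layout.

```q
/ x: state carried from the previous row, as (bid moved;ask moved)
/ y: differ bid for the current row
/ z: differ ask for the current row
scannerCommented:{[x;y;z]
  / a state of 11b means the previous row was selected, so reset to 00b
  if[all x;x:not x];
  / select the row (11b) once both sides have moved since the last selection,
  / otherwise remember which side has moved so far
  $[(y&z)|(x[0]&z)|x[1]&y;11b;x|(y;z)]}

tab:([] time:`t1`t2`t3`t4`t5`t6`t7`t8`t9;bid:12 12 12 12 13 13 14 14 13;ask:13 14 14 14 14 14 15 15 14)
select from tab where all each scannerCommented\[00b;differ bid;differ ask]
```

The scan starts from `00b` and passes one `differ bid` and one `differ ask` flag per row, so `all each` picks out exactly the rows where the state reached `11b`.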

      



I've tried a different approach that takes less time but more memory. It looks like this:

Step 1: Build a table of the rows where ask changes. Then remove from this staging table the rows where bid = previous bid.

Step 2: Build a table of the rows where bid changes. Then remove from that staging table the rows where ask = previous ask. (For Step 1, the query is: select from tab where differ ask, bid<>prev bid)

Step 3: Join the tables from Steps 1 and 2.



I used the table from @terrylench's example. I used actual time values because I need this column for sorting purposes.

     q)tab:([] time:.z.T+til 14;bid:12 12 12 12 12 13 13 14 13 13 14 13 14 14;ask:13 13 14 15 14 14 14 14 15 16 16 15 15 16)
     q)`time xasc distinct (select from tab where differ ask,bid<>prev bid) upsert (select from tab where differ bid,ask<>prev ask)

      


time         bid ask
--------------------
10:45:02.530 12  13
10:45:02.535 13  14
10:45:02.538 13  15
10:45:02.540 14  16
10:45:02.541 13  15
10:45:02.543 14  16
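The one-liner above can also be written as three separate statements that follow the steps one by one. This is a sketch only; `step1`, `step2` and `res` are illustrative names. It relies on the fact that constraints in a q `where` clause cascade, so the second constraint (and its `prev`) is evaluated only over the rows that passed the first one, which is what "previous" within the staging table means here.

```q
tab:([] time:.z.T+til 14;bid:12 12 12 12 12 13 13 14 13 13 14 13 14 14;ask:13 13 14 15 14 14 14 14 15 16 16 15 15 16)

/ Step 1: rows where ask changed, then drop those whose bid equals
/ the bid of the previous row in this filtered set
step1:select from tab where differ ask,bid<>prev bid

/ Step 2: rows where bid changed, then drop those whose ask equals
/ the ask of the previous row in this filtered set
step2:select from tab where differ bid,ask<>prev ask

/ Step 3: append, remove duplicate rows, restore time order
res:`time xasc distinct step1 upsert step2
```

Keeping the steps separate makes it easier to inspect each staging table, at the cost of holding both in memory at once.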

      
