Efficient Datomic Query for Filtering on Partitioned Sets

Given that Datomic does not support pagination, I am wondering how to efficiently support a query such as: take the first 30 entities that have a :history/body, then find those whose :history/body matches some regexp.

This is how I would do the regex match on its own:

{:find [?e]
 :where [[?e :history/body ?body]
         [(re-find #"foo.*bar$" ?body)]]}

      

Remarks:

  • I could then (take ...) from the result, but this is not the same as matching against the first 30 entities.
  • I could get all the entities, take 30, and then filter manually with re-find, but if I have 30M entities, fetching all of them just to take 30 seems wildly inefficient. Also: what if I wanted to take 20M of my 30M entities and filter them through re-find?
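To make the difference between the two remarks concrete, here is a small in-memory sketch in plain Clojure (no Datomic involved). The `bodies` vector is made-up sample data standing in for :history/body values in page order, and the page size is 2 instead of 30 to keep it short.

```clojure
;; Hypothetical sample data standing in for :history/body values, in page order.
(def bodies ["foo 1 bar" "nothing" "foo 2 bar" "also nothing" "foo 3 bar"])

(def pattern #"foo.*bar$")

;; Filter first, then take: the first 2 matches found anywhere in the data.
(def filter-then-take
  (->> bodies
       (filter #(re-find pattern %))
       (take 2)
       vec))

;; Take first, then filter: only the matches among the first 2 items.
(def take-then-filter
  (->> bodies
       (take 2)
       (filter #(re-find pattern %))
       vec))
```

The second form is the one I am after: the page is fixed first, and the regexp only ever sees the items on that page.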

The Datomic docs talk about how queries run locally, but I tried doing in-memory transformations on a set of 52913 entities (granted, they are completely touched) and it takes ~5 seconds. Imagine how bad it would be with millions or tens of millions.



1 answer


(Just brainstorming, here)

First of all, if you find yourself reaching for a regexp, you might want to consider a fulltext index on :history/body so that you can do:

[(fulltext $ :history/body "foo*bar") [[?e]]]

      

(Note: you cannot change :db/fulltext true/false on an existing attribute.)

Sorting is something you need to do outside of the query. But depending on your data, you may be able to restrict your query to a single "page" and then apply your predicate to only those entities.



For example, if we were only paging :history entities using an auto-incrementing :history/id, then we know in advance that "Page 3" is :history/id between 61 and 90.

[:find ?e
 :in $ ?min-id ?max-id
 :where
 [?e :history/id ?id]
 [(<= ?min-id ?id ?max-id)]
 [(fulltext $ :history/body "foo*bar") [[?e]]]]

      

Perhaps something like this:

(defn get-filtered-history-page [page-n match]
  (let [per-page 30
        ;; Bounds are inclusive, so page 3 covers ids 61 through 90.
        min-id (inc (* (dec page-n) per-page))
        max-id (* page-n per-page)]
    (d/q '[:find ?e
           :in $ ?min-id ?max-id ?match
           :where
           [?e :history/id ?id]
           [(<= ?min-id ?id ?max-id)]
           [(fulltext $ :history/body ?match) [[?e]]]]
      (get-db) min-id max-id match)))

      

But of course the problem is that constraining a paged set usually depends on an ordering you don't know in advance, so this isn't all that useful in general.
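One possible way around that, still in brainstorming mode: work out the page outside the query, then pass just those entity ids into the query through a collection binding so the predicate only runs on them. The sketch below assumes :history/id is indexed (so it can be walked through the AVET index in value order); `page-slice`, `filter-page` and `filtered-history-page` are hypothetical helper names, and if your ordering comes from somewhere else, the entity ids would come from there instead.

```clojure
(require '[datomic.api :as d])

(defn page-slice
  "Returns the items of page `page-n` (1-based) from an ordered sequence."
  [per-page page-n coll]
  (->> coll
       (drop (* (dec page-n) per-page))
       (take per-page)))

(defn filter-page
  "Applies the regex only to the given entity ids, via a collection binding."
  [db page-eids pattern]
  (d/q '[:find ?e
         :in $ [?e ...] ?pattern
         :where
         [?e :history/body ?body]
         [(re-find ?pattern ?body)]]
       db page-eids pattern))

(defn filtered-history-page
  "Takes one page of entity ids in :history/id order, then filters that page."
  [db page-n pattern]
  ;; Assumption: :history/id has :db/index true (or is unique), so AVET is available.
  (let [eids (->> (d/datoms db :avet :history/id)
                  (map :e)
                  (page-slice 30 page-n)
                  vec)]
    (filter-page db eids pattern)))
```

Note that the drop still walks the index past the skipped entries, so deep pages are not free, but only the 30 entities on the page are ever run through re-find.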
