Find objects, ref-to-many attribute contains all input elements

Question

Find objects, ref-to-many attribute contains all input elements

Suppose I have an object entry

with ref-to-many attribute :entry/groups

. How do I create a query to find objects whose attribute :entry/groups

contains all of my input external IDs?

The following pseudocode illustrates my question better:

[2 3] ; having this as input foreign ids

;; and having these entry entities in db
[{:entry/id "A" :entry/groups  [2 3 4]}  
 {:entry/id "B" :entry/groups  [2]}     
 {:entry/id "C" :entry/groups  [2 3]}  
 {:entry/id "D" :entry/groups  [1 2 3]}
 {:entry/id "E" :entry/groups  [2 4]}] 

;; only A, C, D should be pulled

Being new to Datomic / Datalog I've exhausted all possibilities, so any help is appreciated. Thank!

+3

clojure datomic datalog

Twice_Twice May 04 '17 at 13:28

source to share

2 answers

You can see an example of this in the James Bond example from the Tupelo-Datomic library. You just specify 2 sentences, one for each desired value in the set:

; Search for people that match both {:weapon/type :weapon/guile} and {:weapon/type :weapon/gun}
(let [tuple-set   (td/find :let    [$ (live-db)]
                           :find   [?name]
                           :where  {:person/name ?name :weapon/type :weapon/guile }
                                   {:person/name ?name :weapon/type :weapon/gun } ) ]
  (is (= #{["Dr No"] ["M"]} tuple-set )))

In pure Datomic it will look similar but using something like Entity ID:

[?eid :entry/groups 2]
[?eid :entry/groups 3]

while Datomic will perform an implicit operation AND

(i.e. both clauses must match, any redundant entries are ignored). This is logically the "union" of the operation, even if it is the same object that is being requested for both values. You can find more information in the Datomic docs .

+2

Alan thompson 04 May '17 at 16:08

source to share

Valentin Waeselynck · Accepted Answer · 2017-05-05T15:04:54+0000

TL; DR

You are solving the general "dynamic join" problem in the Datomic Datalog.

3 strategies:

Write a dynamic Datalog query that uses 2 negatives and 1 disjunction or recursive rule (see below).
Create request code (equivalent to Alan Thompson's answer): flaws are the usual flaws of generating Datalog clauses dynamically, i.e. you are not using query caching .
Use indexes directly (EAVT or AVET).

Dynamic query request

Datalog has no direct way to express dynamic conjunction (boolean AND / 'for all ...' / set intersection). However, you can achieve this in a pure Datalog by combining one disjunction (logical OR / 'exists ...' / set union) and two negatives, i.e.(For all ?g in ?Gs p(?e,?g)) <=> NOT(Exists ?g in ?Gs, such that NOT(p(?e, ?g)))

In your case, this can be expressed as:

[:find [?entry ...] :in $ ?groups :where
  ;; these 2 clauses are for restricting the set of considered datoms, which is more efficient (and necessary in Datomic Datalog, which will refuse to scan the whole db)
  ;; NOTE: this imposes ?groups cannot be empty!
  [(first ?groups) ?group0]
  [?entry :entry/groups ?group0]
  ;; here comes the double negation
  (not-join [?entry ?groups]
    [(identity ?groups) [?group ...]]
    (not-join [?entry ?group]
      [?entry :entry/groups ?group]))]

Good news: This can be expressed as a very general Datalog rule (which I can add to Datofu ):

[(matches-all ?e ?a ?vs)
 [(first ?vs) ?v0]
 [?e ?a ?v0]
 (not-join [?e ?a ?vs]
   [(seq ?vs) [?v ...]]
   (not-join [?e ?a ?v]
     [?e ?a ?v]))]

... which means your request can now be expressed as:

[:find [?entry ...] :in % $ ?groups :where
 (matches-all ?entry :entry/groups ?groups)]

NOTE: there is an alternative implementation using the recursive rule :

[[(matches-all ?e ?a ?vs)
  [(seq ?vs)]
  [(first ?vs) ?v]
  [?e ?a ?v]
  [(rest ?vs) ?vs2]
  (matches-all ?e ?a ?vs2)]
 [(matches-all ?e ?a ?vs)
  [(empty? ?vs)]]]

This has the advantage of accepting an empty collection ?vs

(so far ?e

and ?a

have been linked in some other way in the request).

Generating the request code

The advantage of generating the query code is that it is relatively simple in this case and can possibly make the query execution more efficient than the more dynamic alternative. The disadvantage of generating Datalog queries in Datomic is that you might lose the benefits of query plan caching; so even if you are going to generate queries you still want to make them as general as possible (depending on the number of values v

)

(defn q-find-having-all-vs 
  [n-vs]
  (let [v-syms (for [i (range n-vs)]
                 (symbol (str "?v" i)))]
    {:find '[[?e ...]]
     :in (into '[$ ?a] v-syms)
     :where 
     (for [?v v-syms]
       ['?e '?a ?v])}))

;; examples    
(q-find-having-all-vs 1)
=> {:find [[?e ...]], 
    :in [$ ?a ?v0],
    :where 
    ([?e ?a ?v0])}
(q-find-having-all-vs 2)
=> {:find [[?e ...]],
    :in [$ ?a ?v0 ?v1], 
    :where
    ([?e ?a ?v0] 
     [?e ?a ?v1])}
(q-find-having-all-vs 3)
=> {:find [[?e ...]], 
    :in [$ ?a ?v0 ?v1 ?v2], 
    :where 
    ([?e ?a ?v0] 
     [?e ?a ?v1]
     [?e ?a ?v2])}


;; executing the query: note that we're passing the attribute and values!
(apply d/q (q-find-having-all-vs (count groups))
  db :entry/group groups)

Use indexes directly

I'm not sure how efficient the above approaches are in the current Datomic Datalog implementation. If your benchmarking shows it to be slow, you can always go back to direct access to the index.

Here's an example in Clojure using the AVET index:

(defn find-having-all-vs
  "Given a database value `db`, an attribute identifier `a` and a non-empty seq of entity identifiers `vs`,
  returns a set of entity identifiers for entities which have all the values in `vs` via `a`"
  [db a vs]
  ;; DISCLAIMER: a LOT can be done to improve the efficiency of this code! 
  (apply clojure.set/intersection 
    (for [v vs]
      (into #{} 
        (map :e)
        (d/datoms db :avet a v)))))

Find objects, ref-to-many attribute contains all input elements

TL; DR

Dynamic query request

Generating the request code

Use indexes directly

More articles: