Iterating over all nodes and comparing each one to every other

I am working on a small project with a dataset of about 60k nodes and about 500k relationships between them. There are two types of nodes: recipes and ingredients. Recipes are made up of ingredients, like this:

    (ingredient)-[:IS_PART_OF]->(recipe)

      

My goal is to find how many common ingredients two recipes share. I was able to get this information with the following query, which compares one recipe to all the others (the first one to all the others):

    MATCH (recipe:RECIPE { ID: 1000000 }), (other)
    WHERE (other.ID >= 1000001 AND other.ID <= 1057690)
    OPTIONAL MATCH (recipe:RECIPE)<-[:IS_PART_OF]-(ingredient:INGREDIENT)-[:IS_PART_OF]->(other)
    WITH ingredient, other
    RETURN other.ID, count(DISTINCT ingredient.name)
    ORDER BY other.ID DESC

      

My first question: how can I get the total number of ingredients across two recipes so that shared ingredients are counted only once (the union R1 ∪ R2)?

My second question: is it possible to write a loop that iterates over all the recipes and checks the common ingredients? The goal is to compare each recipe with all the others. I think this should return (n - 1) * (n / 2) rows.
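
To make the row count concrete, here is a minimal Python sketch (not Cypher) that enumerates unordered pairs and checks them against the formula. The recipe IDs are made-up sample values and `pair_count` is a hypothetical helper name.

```python
from itertools import combinations


def pair_count(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Hypothetical recipe IDs, for illustration only.
recipe_ids = [1000000, 1000001, 1000002, 1000003]

# Each unordered pair appears exactly once, e.g. (1000000, 1000001)
# but never (1000001, 1000000) or (1000000, 1000000).
pairs = list(combinations(recipe_ids, 2))
```

The count grows quadratically, so even a moderate number of recipes produces a very large number of pairs.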

I tried this and the problem persists. Even with LIMIT and SKIP I can't run the query across the entire set. I modified my query so that it splits the set into manageable ID ranges:

    MATCH (recipe1)<-[:IS_PART_OF]-(ingredient:INGREDIENT)-[:IS_PART_OF]->(recipe2)
    WHERE (recipe2.ID >= 1000000 AND recipe2.ID <= 1000009) AND (recipe1.ID >= 1000000 AND recipe1.ID <= 1000009) AND (recipe1.ID < recipe2.ID)
    RETURN recipe1.ID, count(DISTINCT ingredient.name) AS MutualIngredients, recipe2.ID
    ORDER BY recipe1.ID
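
One thing to watch with this kind of split: if both recipes are restricted to the same ID range, pairs that span two different ranges are never compared. A rough Python sketch of how the batches could be enumerated so that every pair is covered exactly once is below. It assumes the recipe IDs are contiguous integers, and `id_blocks` and `block_pairs` are hypothetical helper names; each resulting batch would supply the bounds for recipe1 (first block) and recipe2 (second block), keeping the `recipe1.ID < recipe2.ID` condition.

```python
from itertools import combinations_with_replacement


def id_blocks(first_id, last_id, size):
    """Split the inclusive ID range [first_id, last_id] into inclusive (lo, hi) blocks."""
    return [
        (lo, min(lo + size - 1, last_id))
        for lo in range(first_id, last_id + 1, size)
    ]


def block_pairs(blocks):
    """Pair every block with itself and with every later block.

    recipe1 is drawn from the first block and recipe2 from the second, so
    together with recipe1.ID < recipe2.ID each recipe pair falls into
    exactly one batch.
    """
    return list(combinations_with_replacement(blocks, 2))


# Small illustrative range: 30 IDs in blocks of 10.
blocks = id_blocks(1000000, 1000029, 10)
batches = block_pairs(blocks)
```

With b blocks this gives b * (b + 1) / 2 batches, so smaller blocks mean cheaper queries but more of them.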

      

Until I get a better machine, that will be enough.

I still haven't solved my first question: how can I get the total number of ingredients across two recipes so that shared ingredients are counted only once (the union R1 ∪ R2)?

2 answers


You will need to play with this, but it will be something similar to this:

    MATCH (recipe1:RECIPE)<-[:IS_PART_OF]-(ingred:INGREDIENT)-[:IS_PART_OF]->(recipe2:RECIPE)
    WHERE ID(recipe1) < ID(recipe2)
    RETURN recipe1, collect(ingred.name), recipe2
    ORDER BY recipe1.ID

      

The MATCH pattern gives you all the ingredients that the two recipes have in common. The WHERE clause ensures that you don't compare a recipe to itself (because it would share all of its ingredients with itself). The RETURN clause just gives you the two recipes being compared and what they have in common.
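
To illustrate what that pattern computes, here is a small Python model of it (a sketch on made-up data, not how Neo4j evaluates the query). It walks from each ingredient to the recipes that use it and collects the shared ingredient names per recipe pair; `common_ingredients` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations

# Made-up sample data: recipe ID -> set of ingredient names.
recipes = {
    1: {"flour", "egg", "milk"},
    2: {"flour", "egg", "sugar"},
    3: {"rice", "milk"},
}


def common_ingredients(recipes):
    """Map each (recipe1, recipe2) pair with recipe1 < recipe2 to the ingredients they share."""
    by_ingredient = defaultdict(set)  # ingredient name -> recipes that use it
    for recipe_id, ingredients in recipes.items():
        for name in ingredients:
            by_ingredient[name].add(recipe_id)

    shared = defaultdict(set)
    for name, users in by_ingredient.items():
        # Only recipes that share this ingredient are paired up here.
        for r1, r2 in combinations(sorted(users), 2):
            shared[(r1, r2)].add(name)
    return dict(shared)


result = common_ingredients(recipes)
```

Note that recipes 2 and 3 share nothing, so the pair (2, 3) does not appear at all. The Cypher MATCH behaves the same way: pairs with no common ingredient produce no row.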



It will be O(n^2), so it will be very slow.

UPDATE: I took Nicole's suggestion, which is a good one. It ensures that each pair is considered only once.



SOLVED: Just to share it in case anyone needs it:



    MATCH (recipe1)<-[:IS_PART_OF]-(ingredient:INGREDIENT)-[:IS_PART_OF]->(recipe2)
    MATCH (recipe1)<-[:IS_PART_OF]-(ingredient1:INGREDIENT)
    MATCH (recipe2)<-[:IS_PART_OF]-(ingredient2:INGREDIENT)
    WHERE (recipe2.ID >= 1000000 AND recipe2.ID <= 1000009) AND (recipe1.ID >= 1000000 AND recipe1.ID <= 1000009) AND (recipe1.ID < recipe2.ID)
    RETURN recipe1.ID, count(DISTINCT ingredient1.name) + count(DISTINCT ingredient2.name) - count(DISTINCT ingredient.name) AS RecipesUnion, recipe2.ID
    ORDER BY recipe1.ID
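
The RETURN expression is the inclusion-exclusion rule for two sets: |R1 ∪ R2| = |R1| + |R2| - |R1 ∩ R2|. A minimal Python check of that arithmetic on made-up ingredient sets (`union_size` is a hypothetical name):

```python
def union_size(a: set, b: set) -> int:
    """Size of the union, computed as |A| + |B| - |A ∩ B|."""
    return len(a) + len(b) - len(a & b)


# Made-up ingredient sets for two recipes.
r1 = {"flour", "egg", "milk"}
r2 = {"flour", "egg", "sugar"}

# 3 + 3 - 2 shared ingredients = 4 distinct ingredients in total.
total = union_size(r1, r2)
```

Because the query starts from a shared ingredient, it only returns pairs that have at least one ingredient in common.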

      
