Cypher batch update takes forever

I was doing a POC on a publicly available Twitter dataset for our project. I was able to create a Neo4j database for it using Michael Hunger's Batch Inserter utility, and it was relatively fast (it took only 2 hours and 53 minutes to complete). In total there were 15,203,731 nodes with 2 properties (name, address) and 256,147,121 relationships with 1 property.

I have now written a Cypher query to update the Twitter database. I added a new node property (age) and a new relationship property (FollowedSince) to the CSV files. Now things are starting to look bad: the relationship update query (see below) runs forever.

USING PERIODIC COMMIT 100000
LOAD CSV WITH HEADERS FROM {csvfile} AS row FIELDTERMINATOR '\t'
MATCH (u1:USER {name:row.`name:string:user`}), (u2:USER {name:row.`name:string:user2`})
MERGE (u1)-[r:Follows]->(u2)
ON CREATE SET r.Property=row.Property, r.FollowedSince=row.FollowedSince
ON MATCH SET r.Property=row.Property, r.FollowedSince=row.FollowedSince;
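To make the per-row work of this statement concrete, here is a minimal in-memory Python model of it. It assumes a set of user names stands in for the indexed `:USER` nodes and a dict keyed by `(follower, followee)` stands in for the `Follows` relationships; `merge_follows` and `commit_count` are hypothetical helpers, not Neo4j APIs. It shows that the identical `ON CREATE SET` and `ON MATCH SET` clauses behave like a single plain `SET`, and how many transactions `USING PERIODIC COMMIT 100000` implies for 10,000,747 rows, assuming one commit per 100,000 CSV rows.

```python
import csv
import io
import math


def merge_follows(follows, users, rows):
    """Model MATCH (u1), (u2) / MERGE (u1)-[r:Follows]->(u2) / SET for each row.

    `users` is a set of names (stand-in for indexed :USER nodes) and `follows`
    maps (follower, followee) to a dict of relationship properties.
    Returns (created, matched) counts.
    """
    created = matched = 0
    for row in rows:
        u1, u2 = row["name:string:user"], row["name:string:user2"]
        # MATCH: if either user does not exist, the row produces no result.
        if u1 not in users or u2 not in users:
            continue
        key = (u1, u2)
        if key in follows:
            matched += 1  # the ON MATCH branch would run
        else:
            follows[key] = {}
            created += 1  # the ON CREATE branch would run
        # Both branches assign the same values, so one unconditional
        # update is equivalent to the ON CREATE SET / ON MATCH SET pair.
        follows[key].update(Property=row["Property"],
                            FollowedSince=row["FollowedSince"])
    return created, matched


def commit_count(total_rows, batch_size):
    """Number of transactions if a commit happens every `batch_size` CSV rows."""
    return math.ceil(total_rows / batch_size)


if __name__ == "__main__":
    # Tab-separated sample, read the way FIELDTERMINATOR '\t' splits fields.
    sample = (
        "name:string:user\tname:string:user2\tProperty\tFollowedSince\n"
        "alice\tbob\tp1\t2014-01-01\n"
        "bob\tcarol\tp2\t2014-02-01\n"
        "dave\tbob\tp3\t2014-03-01\n"
    )
    users = {"alice", "bob", "carol"}
    follows = {("alice", "bob"): {"Property": "old"}}
    rows = csv.DictReader(io.StringIO(sample), delimiter="\t")
    print(merge_follows(follows, users, rows))  # (1, 1): dave does not exist
    print(commit_count(10_000_747, 100_000))   # 101
```

The sketch only models the logic of each row; it says nothing about how Neo4j executes the two `MATCH` lookups or the `MERGE` internally.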

I had already created the index beforehand by running:

CREATE INDEX ON :USER(name); 

My neo4j.properties:

allow_store_upgrade=true
dump_configuration=false
cache_type=none
use_memory_mapped_buffers=true
neostore.propertystore.db.index.keys.mapped_memory=260M
neostore.propertystore.db.index.mapped_memory=260M
neostore.nodestore.db.mapped_memory=768M
neostore.relationshipstore.db.mapped_memory=12G
neostore.propertystore.db.mapped_memory=2048M
neostore.propertystore.db.strings.mapped_memory=2048M
neostore.propertystore.db.arrays.mapped_memory=260M

node_auto_indexing=true
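As a rough sanity check on the settings above, this Python sketch adds up the `*.mapped_memory` values. It assumes the `M`/`G` suffixes mean megabytes/gigabytes in powers of 1024; `mapped_memory_mb` is a hypothetical helper, not part of Neo4j. For this configuration it comes to 17,932 MB, roughly 17.5 GB of memory-mapped buffers in total.

```python
import re

# The settings quoted above, copied as-is.
CONFIG = """\
allow_store_upgrade=true
dump_configuration=false
cache_type=none
use_memory_mapped_buffers=true
neostore.propertystore.db.index.keys.mapped_memory=260M
neostore.propertystore.db.index.mapped_memory=260M
neostore.nodestore.db.mapped_memory=768M
neostore.relationshipstore.db.mapped_memory=12G
neostore.propertystore.db.mapped_memory=2048M
neostore.propertystore.db.strings.mapped_memory=2048M
neostore.propertystore.db.arrays.mapped_memory=260M
node_auto_indexing=true
"""

# Assumed unit convention: binary multiples, expressed in megabytes.
UNIT_TO_MB = {"K": 1 / 1024, "M": 1, "G": 1024}


def mapped_memory_mb(config_text):
    """Sum every key ending in '.mapped_memory' in properties-style text (MB)."""
    total = 0.0
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and lines without a value
        key, value = (part.strip() for part in line.split("=", 1))
        if not key.endswith(".mapped_memory"):
            continue
        match = re.fullmatch(r"(\d+)([KMG])", value, re.IGNORECASE)
        if match is None:
            raise ValueError(f"unrecognised size for {key}: {value!r}")
        total += int(match.group(1)) * UNIT_TO_MB[match.group(2).upper()]
    return total


if __name__ == "__main__":
    mb = mapped_memory_mb(CONFIG)
    print(f"{mb:.0f} MB = {mb / 1024:.1f} GB")  # 17932 MB = 17.5 GB
```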

I would like to know what I can do to speed up the Cypher query. At the time of writing, more than an hour and a half has passed, and the relationship update (10,000,747 relationships) has not yet finished. The node update (15,203,731 nodes), which finished earlier, took 34 minutes, which I think is too long. (The Batch Inserter processed all the nodes in just 5 minutes!)
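The timings quoted above can be turned into throughput figures with a little arithmetic. This Python sketch uses only the numbers from this post and treats "more than an hour and a half" as a lower bound of 90 minutes, so the relationship figure is an upper bound on the rate achieved so far; the helper name `rows_per_second` is illustrative.

```python
NODES = 15_203_731              # nodes updated by the Cypher node query
RELATIONSHIPS = 10_000_747      # rows in the relationship update (unfinished)

CYPHER_NODE_SECONDS = 34 * 60       # node update via Cypher: 34 minutes
BATCH_NODE_SECONDS = 5 * 60         # same nodes via the Batch Inserter: 5 minutes
REL_ELAPSED_MIN_SECONDS = 90 * 60   # relationship update: still running after 90+ min


def rows_per_second(rows, seconds):
    """Average throughput over the given wall-clock time."""
    return rows / seconds


cypher_node_rate = rows_per_second(NODES, CYPHER_NODE_SECONDS)
batch_node_rate = rows_per_second(NODES, BATCH_NODE_SECONDS)
# The relationship job had not finished, so its true rate is below this bound.
rel_rate_upper_bound = rows_per_second(RELATIONSHIPS, REL_ELAPSED_MIN_SECONDS)

if __name__ == "__main__":
    print(f"Cypher nodes:  {cypher_node_rate:,.0f} rows/s")             # 7,453
    print(f"Batch nodes:   {batch_node_rate:,.0f} rows/s")              # 50,679
    print(f"Slowdown:      {batch_node_rate / cypher_node_rate:.1f}x")  # 6.8x
    print(f"Relationships: < {rel_rate_upper_bound:,.0f} rows/s")       # < 1,852
```

By this rough measure, the Cypher node update ran at about a seventh of the Batch Inserter's rate, and the relationship update is running at under a quarter of the Cypher node update's rate.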

I tested my queries on a small dataset first, before tackling the larger one, and they worked.

My Neo4j instance runs on a server machine, so hardware isn't an issue here.

Any advice, please? Thanks.
