Parsing a large N-Triples file in Python

I am trying to parse a fairly large N-Triples file using the code from Parsing Large RDF in Python.

I have installed Raptor and the Redland bindings for Python.

import RDF

# "ntriples" here; other parser names include "turtle", "rdfxml", ...
parser = RDF.Parser(name="ntriples")
model = RDF.Model()
stream = parser.parse_into_model(model, "file:./mybigfile.nt")
for triple in model:
    print(triple.subject, triple.predicate, triple.object)


However, the program freezes, and since it does not start printing right away, I suspect it is trying to load the entire file into memory.

Does anyone know how to solve this?

1 answer


This is slow because you are reading into an in-memory store (what `RDF.Model()` gives you by default) that is not indexed, so each insertion gets slower and slower as the model grows. The N-Triples parsing itself does stream from the file; it never pulls the whole file into memory.

See the Redland storage modules documentation for an overview of the available storage models. What you probably want here is the "hashes" storage type with hash-type "memory":



# hashed (indexed) in-memory storage instead of the default unindexed model
s = RDF.HashStorage("abc", options="hash-type='memory'")
model = RDF.Model(s)


(untested)
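Putting the two pieces together, a minimal sketch might look like the following. This assumes the Redland Python bindings (`import RDF`) are installed; `load_ntriples` and the storage name `"db"` are illustrative, not part of the Redland API.

```python
# Sketch: parse an N-Triples file into a hash-indexed in-memory model,
# so insertions stay fast as the model grows.
try:
    import RDF  # Redland Python bindings; assumed installed
except ImportError:
    RDF = None  # bindings missing; treat this file as a sketch only

def load_ntriples(path):
    """Parse the N-Triples file at `path` into an indexed in-memory model."""
    # "hashes" storage with hash-type 'memory' is indexed, unlike RDF.Model()
    storage = RDF.HashStorage("db", options="hash-type='memory'")
    model = RDF.Model(storage)
    parser = RDF.Parser(name="ntriples")
    parser.parse_into_model(model, "file:" + path)
    return model

if RDF is not None:
    model = load_ntriples("./mybigfile.nt")
    for triple in model:
        print(triple.subject, triple.predicate, triple.object)
```

The only change from the question's code is where the model's statements live: a hash-indexed store instead of the default unindexed one.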
