Parsing a large N-Triples file in Python

I am trying to parse a fairly large N-Triples file using the code from Parsing Large RDF in Python.

I have installed Raptor and the Redland bindings for Python.

import RDF

# "ntriples" here; other parser names include "turtle", "rdfxml", ...
parser = RDF.Parser(name="ntriples")
model = RDF.Model()
stream = parser.parse_into_model(model, "file:./mybigfile.nt")
for triple in model:
    print(triple.subject, triple.predicate, triple.object)


However, the program freezes, and since it does not start printing right away, I suspect it is trying to load the entire file into memory.

Does anyone know how to solve this?

1 answer


This is slow because you are reading into an in-memory store (what `RDF.Model()` gives you by default) that is not indexed, so each insertion gets slower and slower as the model grows. The N-Triples parsing itself does stream from the file; it never pulls the whole file into memory.

See the Redland storage modules documentation for an overview of the available storage models. What you probably want here is the "hashes" storage type with hash-type "memory":



# hashed (indexed) in-memory storage instead of the default unindexed model
s = RDF.HashStorage("abc", options="hash-type='memory'")
model = RDF.Model(s)


(untested)
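Putting the two pieces together, a minimal sketch might look like the following. This assumes the Redland Python bindings (`import RDF`) are installed; `load_ntriples` and the storage name `"db"` are illustrative, not part of the Redland API.

```python
# Sketch: parse an N-Triples file into a hash-indexed in-memory model,
# so insertions stay fast as the model grows.
try:
    import RDF  # Redland Python bindings; assumed installed
except ImportError:
    RDF = None  # bindings missing; treat this file as a sketch only

def load_ntriples(path):
    """Parse the N-Triples file at `path` into an indexed in-memory model."""
    # "hashes" storage with hash-type 'memory' is indexed, unlike RDF.Model()
    storage = RDF.HashStorage("db", options="hash-type='memory'")
    model = RDF.Model(storage)
    parser = RDF.Parser(name="ntriples")
    parser.parse_into_model(model, "file:" + path)
    return model

if RDF is not None:
    model = load_ntriples("./mybigfile.nt")
    for triple in model:
        print(triple.subject, triple.predicate, triple.object)
```

The only change from the question's code is where the model's statements live: a hash-indexed store instead of the default unindexed one.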
