This is done unsubtly at the moment, and there is plenty of room to
optimise where assertions are made so that they are not repeated as
they are now.
I intend to make this configurable, but there is a tight connection
between db.Load and db.Open that is getting in the way of that.
Tested on the data set 30kmoviedata.cq.gz, created by running:
zcat 30kmoviedata.nq.gz | sed 's/[<>]//g' | gzip -c > 30kmoviedata.cq.gz
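The sed step only deletes angle brackets, so the same transformation can
be reproduced without shell tools. The following is a minimal Go sketch
of that step alone; stripAngles and the sample line are illustrative
names and data, not part of the code base or the data set.

```go
package main

import (
	"fmt"
	"strings"
)

// angleStripper deletes every '<' and '>' character, mirroring
// sed 's/[<>]//g'. Like the sed expression, it also removes angle
// brackets that happen to occur inside literals.
var angleStripper = strings.NewReplacer("<", "", ">", "")

// stripAngles is a hypothetical helper used only for this sketch.
func stripAngles(line string) string {
	return angleStripper.Replace(line)
}

func main() {
	// Illustrative input line, not taken from the data set.
	in := `</en/fight_club> <name> "Fight Club" .`
	fmt.Println(stripAngles(in))
}
```

For a whole file, the same replacer could be applied line by line
between a gzip reader and a gzip writer.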
The following query is successful:
[{
  "type": "/film/film",
  "name": null,
  "/film/film/directed_by": {
    "name": "David Fincher"
  },
  "/film/film/starring": [{
    "/film/performance/actor": {
      "name": null
    }
  }]
}]
TODO: fix up naming for quads and make strict parsing an option.
The G2 code generation style was chosen after benchmarking:

style  benchmark        old ns/op  new ns/op  delta
T0     BenchmarkParser  672        5631       +737.95%
T1     BenchmarkParser  672        5579       +730.21%
G0     BenchmarkParser  672        4049       +502.53%
G1     BenchmarkParser  672        3868       +475.60%
G2     BenchmarkParser  672        3543       +427.23%

F0 and F1 create massive Go source (6.0M) and so were not tested.
Invalid tests removed, additional tests for invalid input to be added
later.
This is not only the right thing to do, as per the documentation of the
latest release (yesterday), but it should also now be backed by git
rather than bzr, which is a big plus and should break our build less
often.
Fixes issue #72
This change simplifies interactions with parsing N-Quads and makes
reading datasets more robust. Changes made while here also improve
performance:
benchmark        old ns/op  new ns/op  delta
BenchmarkParser  1058       667        -36.96%
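Figures of this kind come from a Go benchmark function, run before and
after a change. The sketch below shows the general shape only: parseLine
is a hypothetical stand-in for the real parser, and the input line is
made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// parseLine is a hypothetical stand-in for the real parser: it splits a
// line into whitespace-separated fields and drops a trailing ".".
func parseLine(line string) []string {
	f := strings.Fields(line)
	if n := len(f); n > 0 && f[n-1] == "." {
		f = f[:n-1]
	}
	return f
}

// BenchmarkParser has the signature that go test -bench looks for.
func BenchmarkParser(b *testing.B) {
	const line = `<a> <b> <c> <d> .` // illustrative input
	for i := 0; i < b.N; i++ {
		parseLine(line)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of go test
	// and reports, among other things, the time per operation.
	fmt.Println(testing.Benchmark(BenchmarkParser))
}
```

In a real package the benchmark would live in a _test.go file and be run
with go test -bench on both revisions, with the two results compared to
obtain the delta column.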
We still use string concatenation, which I'm not wildly happy about, but
I think this can be left for a later change.
Initial changes towards idiomatic error handling have been made. More
significant changes are needed, but these have subtle design
implications and need to be thought about more.
30kmoviesdata.nt.gz has been altered to properly escape double quotes.
This was done mechanically, with manual curation to pick up
stragglers.
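As an illustration of what a mechanical pass of this kind might look
like, here is a minimal Go sketch. It is not the procedure that was
actually used: escapeInnerQuotes is a hypothetical helper, and it
assumes that the first and last double quote on a line delimit the
literal.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeInnerQuotes is a hypothetical helper. It treats the first and
// last '"' on the line as the literal's delimiters and backslash-escapes
// any unescaped '"' found between them.
func escapeInnerQuotes(line string) string {
	first := strings.IndexByte(line, '"')
	last := strings.LastIndexByte(line, '"')
	if first < 0 || last <= first {
		return line // no literal, or only one quote: leave untouched
	}
	inner := line[first+1 : last]
	var b strings.Builder
	escaped := false // true when the previous byte was an active backslash
	for i := 0; i < len(inner); i++ {
		c := inner[i]
		if c == '"' && !escaped {
			b.WriteByte('\\') // add the missing escape
		}
		escaped = c == '\\' && !escaped
		b.WriteByte(c)
	}
	return line[:first+1] + b.String() + line[last:]
}

func main() {
	// Illustrative line, not taken from the data set.
	fmt.Println(escapeInnerQuotes(`<a> <b> "say "hi"" .`))
}
```

The delimiter assumption does not hold for every line, which is one
reason a pass like this still needs manual review afterwards.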