Commit graph

13 commits

Author SHA1 Message Date
kortschak
6acfdcc5d6 Use concrete value for quad.Quad
Comparison of -short benchmarks in cayley.

$ benchcmp pointer.bench concrete.bench
benchmark                                   old ns/op     new ns/op     delta
BenchmarkNamePredicate                      1673276       1655093       -1.09%
BenchmarkLargeSetsNoIntersection            318985907     261499984     -18.02%
BenchmarkNetAndSpeed                        104403743     41516981      -60.23%
BenchmarkKeanuAndNet                        17309258      16857513      -2.61%
BenchmarkKeanuAndSpeed                      20159161      19282833      -4.35%

The comparison of pathological cases is not so happy.

benchmark                                   old ns/op       new ns/op       delta
BenchmarkVeryLargeSetsSmallIntersection     55269775527     246084606672    +345.24%
BenchmarkHelplessContainsChecker            23436501319     24308906949     +3.72%

Profiling the worst case:

Pointer:
Total: 6121 samples
    1973  32.2%  32.2%     1973  32.2% runtime.findfunc
     773  12.6%  44.9%      773  12.6% readvarint
     510   8.3%  53.2%      511   8.3% step
     409   6.7%  59.9%      410   6.7% runtime.gentraceback
     390   6.4%  66.2%      391   6.4% pcvalue
     215   3.5%  69.8%      215   3.5% runtime.funcdata
     181   3.0%  72.7%      181   3.0% checkframecopy
     118   1.9%  74.6%      119   1.9% runtime.funcspdelta
      96   1.6%  76.2%       96   1.6% runtime.topofstack
      76   1.2%  77.5%       76   1.2% scanblock

Concrete:
Total: 25027 samples
    9437  37.7%  37.7%     9437  37.7% runtime.findfunc
    3853  15.4%  53.1%     3853  15.4% readvarint
    2366   9.5%  62.6%     2366   9.5% step
    2186   8.7%  71.3%     2186   8.7% runtime.gentraceback
    1816   7.3%  78.5%     1816   7.3% pcvalue
    1016   4.1%  82.6%     1016   4.1% runtime.funcdata
     859   3.4%  86.0%      859   3.4% checkframecopy
     506   2.0%  88.1%      506   2.0% runtime.funcspdelta
     410   1.6%  89.7%      410   1.6% runtime.topofstack
     303   1.2%  90.9%      303   1.2% runtime.newstack
2014-08-05 23:25:02 +09:30
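One semantic difference between handling quad.Quad by pointer and by concrete value is equality: struct values compare field by field, so equal quads collapse to a single map key, while pointers compare by address. The Quad type below is a hypothetical stand-in (four string fields assumed), not cayley's definition, and the sketch makes no claim about the performance numbers in the commit above.

```go
package main

import "fmt"

// Quad is a hypothetical stand-in for quad.Quad; the four string fields
// are assumed for illustration.
type Quad struct {
	Subject, Predicate, Object, Label string
}

// countDistinctValues counts distinct quads held as concrete values.
// Struct values are compared field by field, so equal quads collapse.
func countDistinctValues(qs []Quad) int {
	seen := make(map[Quad]struct{}, len(qs))
	for _, q := range qs {
		seen[q] = struct{}{}
	}
	return len(seen)
}

// countDistinctPointers counts distinct quads held as pointers.
// Pointers compare by address, so equal contents stored twice count twice.
func countDistinctPointers(qs []*Quad) int {
	seen := make(map[*Quad]struct{}, len(qs))
	for _, q := range qs {
		seen[q] = struct{}{}
	}
	return len(seen)
}

func main() {
	a := Quad{"keanu", "acted_in", "the_matrix", ""}
	b := a // an independent copy with identical fields
	fmt.Println(countDistinctValues([]Quad{a, b}))      // 1
	fmt.Println(countDistinctPointers([]*Quad{&a, &b})) // 2
}
```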
kortschak
41f6d3fd84 Temporarily use cquads only
I intend to make this configurable, but there is a tight connection
between db.Load and db.Open that is getting in the way of that.

Testing on data set 30kmoviedata.cq.gz created by doing:

zcat 30kmoviedata.nq.gz | sed 's/[<>]//g' | gzip -c > 30kmoviedata.cq.gz

The following query is successful:

[{
  "type": "/film/film",
  "name": null,
  "/film/film/directed_by": {
    "name": "David Fincher"
  },
  "/film/film/starring": [{
    "/film/performance/actor": {
      "name": null
    }
  }]
}]

TODO: fix up naming for quads and make strict parsing an option.
2014-07-28 21:56:32 +09:30
kortschak
401c58426f Create quads hierarchy
* Move nquads into quad.
* Create cquads simplified parser in quad.
* Move Triple (renamed Quad) to quad.

Also made sure mongo actually implements BulkLoader.
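A common Go idiom for making sure a type actually implements an interface is a compile-time assertion: assigning a typed nil to a blank variable of the interface type. A minimal sketch; BulkLoader's method set, the Quad type and mongoStore are hypothetical stand-ins, not cayley's definitions.

```go
package main

import "fmt"

// Quad is a hypothetical stand-in for quad.Quad.
type Quad struct{ Subject, Predicate, Object, Label string }

// BulkLoader is a hypothetical interface; the real method set may differ.
type BulkLoader interface {
	BulkLoad(quads []Quad) error
}

// mongoStore is a hypothetical backend that just counts loaded quads.
type mongoStore struct{ n int }

func (m *mongoStore) BulkLoad(quads []Quad) error {
	m.n += len(quads)
	return nil
}

// Compile-time check: the build fails if *mongoStore stops satisfying
// BulkLoader, for example after a method signature changes.
var _ BulkLoader = (*mongoStore)(nil)

func main() {
	var bl BulkLoader = &mongoStore{}
	if err := bl.BulkLoad([]Quad{{"a", "b", "c", ""}}); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(bl.(*mongoStore).n) // 1
}
```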
2014-07-28 21:36:22 +09:30
kortschak
e6ed23ef7c Merge branch 'master' into parse
Conflicts:
	db/load.go
2014-07-23 06:39:56 +09:30
kortschak
0e0e382d2b Use error returns and interface type for parsing
Fixes issue #72

This change simplifies interactions with parsing N-Quads and makes
reading datasets more robust. Changes made while here also improve
performance:

benchmark           old ns/op     new ns/op     delta
BenchmarkParser     1058          667           -36.96%

We still use string concatenation which I'm not wildly happy about, but
I think this can be left for a later change.

Initial changes towards idiomatic error handling have been made. More
significant changes are needed, but these have subtle design implications
and need to be thought about more.

30kmoviesdata.nt.gz has been altered to properly escape double quotes.
This was done mechanically and with manual curation to pick up
stragglers.
2014-07-22 20:34:37 +09:30
kortschak
9bf09a5db5 Add transparent input decompression
This supports gzip and bzip2 by magic number determination.

Trailing whitespace differences in the documentation are due to an
opinionated editor.
2014-07-19 12:49:55 +09:30
Jeremy Jay
f9c60a5f30 update names per discussion at google/cayley#38 2014-07-18 11:17:57 -04:00
Jeremy Jay
d808d9347c move to registry interface for backends 2014-07-16 16:49:55 -04:00
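A registry interface for backends is commonly built in Go as a package-level map from name to constructor, filled by each backend's init function, in the same spirit as database/sql.Register. A minimal sketch; TripleStore, NewStoreFunc, RegisterTripleStore, NewTripleStore, TripleStores and memStore are hypothetical names, not necessarily the ones chosen in these commits.

```go
package main

import (
	"fmt"
	"sort"
)

// TripleStore is a hypothetical minimal backend interface.
type TripleStore interface {
	Size() int64
}

// NewStoreFunc is a hypothetical backend constructor signature.
type NewStoreFunc func(path string) (TripleStore, error)

// registry maps backend names to constructors; in this sketch backends
// add themselves from init functions.
var registry = make(map[string]NewStoreFunc)

// RegisterTripleStore records a constructor under name. Like
// database/sql.Register, it panics on a duplicate name.
func RegisterTripleStore(name string, f NewStoreFunc) {
	if _, dup := registry[name]; dup {
		panic("triplestore: duplicate registration of " + name)
	}
	registry[name] = f
}

// NewTripleStore looks up a backend by name and constructs it.
func NewTripleStore(name, path string) (TripleStore, error) {
	f, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("triplestore: unknown backend %q", name)
	}
	return f(path)
}

// TripleStores lists registered backend names in sorted order.
func TripleStores() []string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// memStore is a toy backend used only to exercise the registry.
type memStore struct{}

func (memStore) Size() int64 { return 0 }

func init() {
	RegisterTripleStore("memstore", func(string) (TripleStore, error) {
		return memStore{}, nil
	})
}

func main() {
	fmt.Println(TripleStores()) // [memstore]
	if _, err := NewTripleStore("nosuch", ""); err != nil {
		fmt.Println(err)
	}
}
```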
Alex Peters
4d34ea50cc Fix typos and minor cleanup 2014-07-10 13:19:30 +02:00
kortschak
40f3363cde Destutter graph/... 2014-06-28 13:29:16 +09:30
kortschak
388618bfa7 Destutter cayley/config 2014-06-28 12:42:15 +09:30
kortschak
c4a19a4e35 Simplify names in cmd source 2014-06-28 12:38:51 +09:30
kortschak
639559544d Reorganise to make cmd code more prominent 2014-06-28 02:14:09 +09:30