Improve findDocuments performance #2935

Closed
opened 2016-06-28 15:36:03 +00:00 by wjt · 4 comments

In #2890 I added support for querying by entity, but it turns out this performs really badly when there are many document↔entity relationships. So I've made a stab at fixing this (basically convincing PostgreSQL not to construct the full document, documentproperties, entity join before filtering), plus making position queries a bit faster. Both patches are sort of proof-of-concept-y; particularly the position one is a bit half-hearted.

In #2890 I added support for querying by entity, but it turns out this performs really badly when there are many document↔entity relationships. So I've made a stab at fixing this (basically convincing PostgreSQL not to construct the full document, documentproperties, entity join before filtering), plus making position queries a bit faster. Both patches are sort of proof-of-concept-y; particularly the position one is a bit half-hearted.
wjt added the
backend
label 2016-06-28 15:36:03 +00:00
wjt added this to the 14.04 milestone 2016-06-28 15:36:03 +00:00
j was assigned by wjt 2016-06-28 15:36:03 +00:00
wjt added the
normal
enhancement
labels 2016-06-28 15:36:03 +00:00
Author

(https://gitlab.com/wjt/pandora.git) branch 2935-findDocuments-performance

There's a typo fix I had lying around in my working copy, too.

(https://gitlab.com/wjt/pandora.git) branch 2935-findDocuments-performance There's a typo fix I had lying around in my working copy, too.
Will Thompson commented 2016-06-30 13:44:58 +00:00
Owner

In 8d25e3b/pandora:

#!CommitTicketReference repository="pandora" revision="8d25e3be78e751f65375ae6cbe1a8001b83f3f32"
findDocuments: improve entity query performance

When I implemented this in 9a4c24c, there were not many rows in
entity_documentproperties in the database here. Now that there are,
computing the document_document -> entity_documentproperties ->
entity_entity join and then filtering is really, really slow. Postgres
seems to materialize the whole join and then scan it.

If we get a set of matching document IDs for the entity query in a
subquery, and then just filter with IN on that, things are much faster:
scan entity_entity; in a nested loop, get the document_ids via
entity_documentproperties; hash this set; and then scan
document_document.

Searching for a single character, this brings the query from ~1.1s to
~400ms. Searching for a full word, ~800ms to 120ms

This condition is getting really ugly -- I am sorry!

References #2935
In [8d25e3b/pandora](https://code.0x2620.org/0x2620/pandora/commit/8d25e3be78e751f65375ae6cbe1a8001b83f3f32): ``` #!CommitTicketReference repository="pandora" revision="8d25e3be78e751f65375ae6cbe1a8001b83f3f32" findDocuments: improve entity query performance When I implemented this in 9a4c24c, there were not many rows in entity_documentproperties in the database here. Now that there are, computing the document_document -> entity_documentproperties -> entity_entity join and then filtering is really, really slow. Postgres seems to materialize the whole join and then scan it. If we get a set of matching document IDs for the entity query in a subquery, and then just filter with IN on that, things are much faster: scan entity_entity; in a nested loop, get the document_ids via entity_documentproperties; hash this set; and then scan document_document. Searching for a single character, this brings the query from ~1.1s to ~400ms. Searching for a full word, ~800ms to 120ms This condition is getting really ugly -- I am sorry! References #2935 ```
Will Thompson commented 2016-06-30 13:45:00 +00:00
Owner

In 09ebbc9/pandora:

#!CommitTicketReference repository="pandora" revision="09ebbc9cc6ccf03d3b24c842042ac575749b3abc"
findDocuments: improve performance of positions queries

References #2935
In [09ebbc9/pandora](https://code.0x2620.org/0x2620/pandora/commit/09ebbc9cc6ccf03d3b24c842042ac575749b3abc): ``` #!CommitTicketReference repository="pandora" revision="09ebbc9cc6ccf03d3b24c842042ac575749b3abc" findDocuments: improve performance of positions queries References #2935 ```
Author

I guess since these patches are merged we can close this ☺

I guess since these patches are merged we can close this ☺
wjt added the
fixed
label 2016-07-01 17:02:47 +00:00
wjt closed this issue 2016-07-01 17:02:47 +00:00
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: 0x2620/pandora#2935
No description provided.