Commit graph

2768 commits

Author SHA1 Message Date
8d25e3be78
findDocuments: improve entity query performance
When I implemented this in 9a4c24c, there were not many rows in
entity_documentproperties in the database here. Now that there are,
computing the document_document -> entity_documentproperties ->
entity_entity join and then filtering is really, really slow. Postgres
seems to materialize the whole join and then scan it.

If we get a set of matching document IDs for the entity query in a
subquery, and then just filter with IN on that, things are much faster:
scan entity_entity; in a nested loop, get the document_ids via
entity_documentproperties; hash this set; and then scan
document_document.

Searching for a single character, this brings the query from ~1.1s to
~400ms. Searching for a full word, ~800ms to ~120ms.
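
A minimal sketch of the two query shapes described above, using an in-memory SQLite database rather than Postgres. The table names come from the message; the column names (`name`, `document_id`, `entity_id`) and the sample rows are assumptions for illustration. SQLite's planner differs from Postgres's, so this only shows that both shapes return the same rows, not the timing difference.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory schema; column names are assumed, not pandora's."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE document_document (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE entity_entity (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE entity_documentproperties (
            id INTEGER PRIMARY KEY,
            document_id INTEGER REFERENCES document_document(id),
            entity_id INTEGER REFERENCES entity_entity(id)
        );
    """)
    db.executemany("INSERT INTO document_document VALUES (?, ?)",
                   [(1, "report.pdf"), (2, "notes.pdf"), (3, "map.png")])
    db.executemany("INSERT INTO entity_entity VALUES (?, ?)",
                   [(1, "Alice"), (2, "Bob")])
    db.executemany("INSERT INTO entity_documentproperties VALUES (?, ?, ?)",
                   [(1, 1, 1), (2, 2, 1), (3, 3, 2)])
    return db


# Shape 1: join all three tables, then filter on the entity name.
JOIN_QUERY = """
SELECT DISTINCT d.id
FROM document_document d
JOIN entity_documentproperties p ON p.document_id = d.id
JOIN entity_entity e ON e.id = p.entity_id
WHERE e.name LIKE ?
ORDER BY d.id
"""

# Shape 2: collect matching document ids in a subquery, then filter with IN.
IN_SUBQUERY = """
SELECT d.id
FROM document_document d
WHERE d.id IN (
    SELECT p.document_id
    FROM entity_documentproperties p
    JOIN entity_entity e ON e.id = p.entity_id
    WHERE e.name LIKE ?
)
ORDER BY d.id
"""


def find_document_ids(db, sql, term):
    """Run one of the two queries with a substring match on the entity name."""
    return [row[0] for row in db.execute(sql, ("%" + term + "%",))]
```

On Postgres, comparing the `EXPLAIN ANALYZE` output of the two shapes is the way to confirm which plan is actually chosen.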

This condition is getting really ugly -- I am sorry!

References #2935
2016-06-28 16:33:01 +01:00
j
5aeffcfb6a check first audio track 2016-06-27 16:51:18 +02:00
j
adfcc1cb27 never set display aspect ratio to 0:0 2016-06-27 16:08:30 +02:00
j
8ac78f3bd6 remove unused force flag from make_poster, update_timeline 2016-06-26 23:24:11 +02:00
j
0f9e80e1e6 avoid saving item twice 2016-06-26 23:22:27 +02:00
j
de9b062d63 make sure existing index is using gin 2016-06-26 16:55:58 +02:00
j
ab0dfddf31 set SECURE_PROXY_SSL_HEADER by default 2016-06-26 15:34:19 +02:00
j
0d89ad640b ignore some broken audio codecs 2016-06-26 15:33:52 +02:00
j
92f642cbac pcm sound can have no codec 2016-06-26 14:41:58 +02:00
j
2cec1b9ad5 s/import Image/from PIL import Image/g 2016-06-25 20:39:29 +02:00
j
4785f314cb Add VP9/Opus support, use VP8 by default
- support vp9 and opus
- switch to 2 pass encoding
- use ffmpeg -movflags +faststart instead of qtfaststart
2016-06-23 17:36:41 +02:00
j
aaacc48259 only save if update_external fails 2016-06-20 18:28:05 +02:00
j
d83647c4a5 don't hide oxtimelines errors 2016-06-20 18:27:31 +02:00
j
6dcbcdd19c don't update timeline in update_selected, remove unused async get_item case 2016-06-16 14:48:54 +02:00
j
0486d62ec9 use absolute path 2016-06-16 14:48:09 +02:00
j
f25218466b formatting 2016-06-16 14:48:01 +02:00
j
70f34bfde9 typo 2016-06-15 19:13:00 +02:00
j
e3c5ab18c7 only update itemsort if name is changed 2016-06-15 18:31:40 +02:00
j
22f83288c5 avoid looking up item twice 2016-06-15 18:29:09 +02:00
j
7c53dca65b less async item creation 2016-06-15 18:12:59 +02:00
j
b2a9a5f711 space 2016-06-15 17:56:31 +02:00
j
3c1f4a8c95 don't call module 2016-06-15 17:55:57 +02:00
j
b010aca0a9 s/taskId/id/ 2016-06-15 15:45:51 +02:00
j
a0fc6ffadc typo 2016-06-15 14:55:45 +02:00
j
f4cbe6a114 return empty sequences if no data timeline exists 2016-06-15 14:48:02 +02:00
j
af0e0cffe8 person can be removed again, let async itemsort fail without exception 2016-06-15 14:34:46 +02:00
j
fd9d3bdabf flake8 + map->[] 2016-06-15 14:34:46 +02:00
j
05c4cfcbc8 add space and other flake8 cleanups 2016-05-28 11:30:43 +02:00
j
5e149a5cb8 add space and other flake8 cleanups 2016-05-28 11:26:46 +02:00
j
225259e521 add space and other flake8 cleanups 2016-05-28 11:18:51 +02:00
j
f21e8413fb use get_random_string 2016-05-28 11:18:51 +02:00
j
7fdaf6d1ce include Access-Control-Allow-Origin in 404 not found response 2016-05-27 11:51:47 +02:00
05e6118a88
findAnnotations: include duration alongside result count
fixes #2921
2016-05-05 15:54:25 +01:00
j
41cc8e3573 expose encoding status via api 2016-05-05 10:49:34 +02:00
j
be163826ef Merge remote-tracking branch 'wjt/fix-migrations' 2016-05-05 10:48:24 +02:00
39b9b48be2
archive: fix migrations for upload_to function renamings
9c75526 renamed these functions. The function doesn't affect the DB
schema so it should be safe to just modify the migration.
2016-05-04 17:01:44 +01:00
e29ea230fb
Add migration for Document.documentproperties ref
This should have been included with a8dcbbb, which changed the
related_name to access DocumentProperties from Document. (There's no
actual change to the database.)
2016-05-04 16:55:11 +01:00
j
0f28a2b7d5 fix queue status 2016-04-30 14:15:13 +02:00
j
9c7552699f fix upload_to callbacks 2016-04-29 13:46:55 +02:00
2812834ce3
findAnnotations: don't lowercase ids (fixes #2916)
Without this fix, a condition like:

     {key: 'id', operator: '==', value: 'A/B'}

gets mapped to:

     public_id__exact=('A/B'.lower())

which is wrong.

I introduced this bug in b3df5b8. I didn't catch it because I was
mostly interested in the 'layer' key -- but layer names are
conventionally lowercase anyway so lowercasing them had no effect.
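
A minimal sketch of the intended behaviour, assuming a hypothetical helper that turns one `==` condition into Django-style filter keyword arguments. `FIELD_FOR_KEY`, `condition_to_lookup` and the fallback to `__iexact` are illustrative names and choices, not the actual pandora code.

```python
# Hypothetical mapping from API keys to model fields that must be
# matched case-sensitively (ids and layer names).
FIELD_FOR_KEY = {
    "id": "public_id",
    "item": "item__public_id",
    "layer": "layer",
}


def condition_to_lookup(condition):
    """Translate a single '==' condition into filter kwargs.

    Case-sensitive keys keep their value untouched; every other key is
    lowercased and matched case-insensitively.
    """
    if condition.get("operator", "==") != "==":
        raise ValueError("this sketch only handles the '==' operator")
    key = condition["key"]
    value = condition["value"]
    if key in FIELD_FOR_KEY:
        # Do not lowercase: 'A/B' and 'a/b' are different ids.
        return {FIELD_FOR_KEY[key] + "__exact": value}
    return {key + "__iexact": value.lower()}
```

With this shape, `{key: 'id', operator: '==', value: 'A/B'}` maps to `public_id__exact='A/B'`, while a free-text key is still lowercased.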
2016-04-29 11:03:45 +01:00
aa40a40595
Annotation.json: only include entity id & name
Fetching documents for each entity in turn is expensive. (I have tried
using ArrayAgg to fetch them in the same query as the Entity — no
improvement. It's possible that being able to join to entity_entity,
and then use ArrayAgg, would be better.)

Even once you've fetched them all, if the same entity appears many
times in an item, then get(..., keys=['layers']) duplicates the whole
JSON for the entity many times: expensive to serialize, expensive to
send over the wire.

Pandora's own web interface only depends on the 'id' key of 'entity' in
each annotation, and refetches the rest of the entity to show the pop-up
dialog when you press E. So by just not bothering to fetch and send any
other keys, get(..., keys=['layers']) on an item with many entity
annotations is substantially faster.

(I experimented with splitting the full entities off to one side, so
you'd have:

    {
        "layers": {
            somelayer: [...,
              {..., "entity": {"id": ABC}},
            ], ...
        },
        "entities": {
            ABC: {...},
            ...
        }
    }

This is quicker than the status quo, but obviously not as fast as not
fetching & sending the rest at all!)
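
The two response shapes discussed above can be sketched on plain dictionaries. This is an illustration only: `slim_annotation` and `split_entities` are hypothetical helpers, and the annotation and entity fields other than `id` and `name` are assumptions.

```python
def slim_annotation(annotation):
    """Return a copy of the annotation whose entity keeps only id and name."""
    slim = dict(annotation)
    entity = annotation.get("entity")
    if entity is not None:
        slim["entity"] = {key: entity[key] for key in ("id", "name") if key in entity}
    return slim


def split_entities(layers):
    """Move full entities into a side table keyed by entity id.

    Each annotation keeps only {"id": ...} for its entity, so an entity
    that appears many times is serialized once.
    """
    entities = {}
    slim_layers = {}
    for layer, annotations in layers.items():
        slim_layers[layer] = []
        for annotation in annotations:
            annotation = dict(annotation)  # leave the caller's data untouched
            entity = annotation.get("entity")
            if entity is not None:
                entities[entity["id"]] = entity
                annotation["entity"] = {"id": entity["id"]}
            slim_layers[layer].append(annotation)
    return {"layers": slim_layers, "entities": entities}
```

`slim_annotation` corresponds to the approach this commit takes; `split_entities` corresponds to the experiment, which still has to fetch and send every full entity once.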
2016-04-28 14:15:23 +01:00
aa0fbc9d4a
Entity.json: get document ids from join table
This is a bit quicker because it's just a lookup in a single table, not
a join.
2016-04-28 14:15:12 +01:00
400b6650a2
Annotation.json: document empty-subtitle special case 2016-04-19 13:52:52 +01:00
af0d87b569
Annotation.json: reduce repeated layer lookups
It's actually quite costly to look up keys in CONFIG, particularly
inside a loop: this trims ~5% off get(keys=['layers']) for
annotation-heavy items.
2016-04-19 13:52:47 +01:00
3f5be0bd27
findClips: look up entity names (fixes #2804) 2016-04-19 12:28:58 +01:00
d0129a4416
findClips: avoid O(n²) lookup of clip from annotation
This doesn't make much difference for small ranges, of course.
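
A minimal sketch of the general technique, assuming clips and annotations are plain dictionaries linked by a hypothetical `clip_id` field. The real code works on model instances; the point here is only the change from a nested scan to a dictionary lookup.

```python
def attach_linear(clips, annotations):
    """O(n * m): scan the whole clip list for every annotation."""
    for annotation in annotations:
        for clip in clips:
            if clip["id"] == annotation["clip_id"]:
                clip.setdefault("annotations", []).append(annotation["value"])
                break
    return clips


def attach_indexed(clips, annotations):
    """O(n + m): build an id -> clip index once, then look each clip up."""
    clips_by_id = {clip["id"]: clip for clip in clips}
    for annotation in annotations:
        clip = clips_by_id.get(annotation["clip_id"])
        if clip is not None:
            clip.setdefault("annotations", []).append(annotation["value"])
    return clips
```

Both functions produce the same result; the indexed version only pays off once the number of clips in the requested range grows.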
2016-04-19 11:25:12 +01:00
ba00bcbf7b
findClips: select_related('item') / ('item__sort')
Clip.public_id uses self.item.public_id.

Clip.json() uses self.item.sort, so we should select_related on that
rather than the clip's own sort field. (They are identical objects. Is
Clip.sort ever used directly?)

With this change, findClips() issues one query to fetch clips plus one
query per flavour of annotation; before, it issued two extra queries per
clip.
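
The effect of `select_related` can be illustrated without Django by counting queries against an in-memory SQLite database. This is a sketch under assumptions: the `item` and `clip` tables, their columns, and the `CountingDB` wrapper are invented for the example, and it models only one extra query per clip rather than the two mentioned above.

```python
import sqlite3


class CountingDB:
    """Thin wrapper that counts how many SELECTs are issued through it."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def build_db(n_clips=4):
    """One item with n_clips clips; schema is hypothetical."""
    db = CountingDB()
    db.conn.executescript("""
        CREATE TABLE item (id INTEGER PRIMARY KEY, public_id TEXT);
        CREATE TABLE clip (
            id INTEGER PRIMARY KEY,
            item_id INTEGER REFERENCES item(id),
            start_time REAL
        );
    """)
    db.conn.execute("INSERT INTO item VALUES (1, 'ABC')")
    db.conn.executemany("INSERT INTO clip VALUES (?, 1, ?)",
                        [(i, float(i)) for i in range(1, n_clips + 1)])
    db.conn.commit()
    return db


def clips_lazy(db):
    """Fetch clips, then look up each clip's item separately (1 + N queries)."""
    result = []
    for clip_id, item_id, start_time in db.query(
            "SELECT id, item_id, start_time FROM clip ORDER BY id"):
        rows = db.query("SELECT public_id FROM item WHERE id = ?", (item_id,))
        result.append((rows[0][0], start_time))
    return result


def clips_joined(db):
    """Fetch clips and their items in one JOIN, as select_related would."""
    return db.query("""
        SELECT item.public_id, clip.start_time
        FROM clip JOIN item ON item.id = clip.item_id
        ORDER BY clip.id
    """)
```

In Django terms, the joined variant corresponds to adding `select_related('item')` to the clip queryset so that accessing `clip.item` does not trigger a further query.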
2016-04-19 11:25:06 +01:00
6dbb7f921a
findClips: only scan layers once 2016-04-19 11:14:25 +01:00
b3df5b8d56
findAnnotations: match some fields case-sensitively
Requiring layer to have the right case is consistent with
addAnnotation(), and means the _layer[_like] index can be used. In my
testing, if itemsQuery specifies a single item, then postgres doesn't
bother with the layer index anyway; but if not, it makes a pretty big
(~3×) difference.

Matching public_id and item__public_id case-sensitively also seems
reasonable (it's consistent with get() and getAnnotation()).

(Is lower() redundant for the case-insensitive comparisons? i.e. is
UPPER(x.lower()) == UPPER(x)? I'm not sure, it's cheap, let's leave it.)
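
The question above can be probed on the Python side with a small check. This sketch uses Python's own `str.upper()` as a stand-in for the database's `UPPER()`, which is an assumption: Postgres's result depends on the database locale and may differ. The helper name `roundtrip_matches` is made up for the example.

```python
def roundtrip_matches(text):
    """True if uppercasing after lower() gives the same result as uppercasing directly."""
    return text.lower().upper() == text.upper()


# ASCII input always round-trips.
ascii_ok = roundtrip_matches("A/B")

# U+1E9E (capital sharp s) lowercases to U+00DF, which uppercases to 'SS',
# while U+1E9E itself is already uppercase, so the two sides differ.
sharp_s_ok = roundtrip_matches("\u1e9e")
```

For ASCII input the two expressions agree; for a few non-ASCII characters Python's case mappings do not round-trip, so the extra `lower()` is not strictly a no-op in general.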
2016-04-05 12:19:32 +01:00
8d1b4de337
findAnnotations(): make 'findvalue' the default key
Annotations have no 'name' field, so

     findAnnotations({query: {conditions: [{value: 'foo'}]}})

would previously raise an exception.
2016-04-05 12:19:31 +01:00