Commit graph

6476 commits

Author SHA1 Message Date
aa40a40595
Annotation.json: only include entity id & name
Fetching documents for each entity in turn is expensive. (I have tried
using ArrayAgg to fetch them in the same query as the Entity — no
improvement. It's possible that being able to join to entity_entity,
and then use ArrayAgg, would be better.)

Even once you've fetched them all, if the same entity appears many
times in an item, then get(..., keys=['layers']) duplicates the whole
JSON for the entity many times: expensive to serialize, expensive to
send over the wire.

Pandora's own web interface only depends on the 'id' key of 'entity' in
each annotation, and refetches the rest of the entity to show the pop-up
dialog when you press E. So by just not bothering to fetch and send any
other keys, get(..., keys=['layers']) on an item with many entity
annotations is substantially faster.

(I experimented with splitting the full entities off to one side, so,
you'd have:

    {
        "layers": {
            somelayer: [...,
              {..., "entity": {"id": ABC}},
            ], ...
        },
        "entities": {
            ABC: {...},
            ...
        }
    }

This is quicker than the status quo, but obviously not as fast as not
fetching & sending the rest at all!)
2016-04-28 14:15:23 +01:00
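
A minimal sketch of the trimming described in aa40a40595, assuming annotations are plain dicts grouped by layer. The helper names and the sample data are hypothetical and are not taken from the pandora source:

```python
import json


def slim_entity(entity):
    # Keep only the two keys named in the commit title.
    return {"id": entity["id"], "name": entity["name"]}


def slim_annotations(layers):
    # layers: {layer_name: [annotation dicts]}; this shape is an assumption.
    out = {}
    for layer, annotations in layers.items():
        out[layer] = [
            dict(a, entity=slim_entity(a["entity"])) if "entity" in a else dict(a)
            for a in annotations
        ]
    return out


# One entity with a large payload, repeated across many annotations.
full_entity = {"id": "ABC", "name": "Example", "documents": list(range(50))}
layers = {"somelayer": [{"value": "x", "entity": full_entity} for _ in range(100)]}

slim = slim_annotations(layers)
full_size = len(json.dumps(layers))
slim_size = len(json.dumps(slim))
```

With the entity repeated 100 times, the serialized slim form is much smaller than the full form, which is the effect the commit message describes for serialization and transfer.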
aa0fbc9d4a
Entity.json: get document ids from join table
This is a bit quicker because it's just a lookup in a single table, not
a join.
2016-04-28 14:15:12 +01:00
j
c149c5c42e use xenial repository if installing on 16.04 2016-04-28 13:27:46 +02:00
400b6650a2
Annotation.json: document empty-subtitle special case 2016-04-19 13:52:52 +01:00
af0d87b569
Annotation.json: reduce repeated layer lookups
It's actually quite costly to look up keys in CONFIG, particularly
inside a loop: this trims ~5% off get(keys=['layers']) for
annotation-heavy items.
2016-04-19 13:52:47 +01:00
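
A rough illustration of the idea in af0d87b569: do the configuration lookup once, outside the loop. The CONFIG layout and both function names here are assumptions made for the example, not the real pandora settings:

```python
# Hypothetical configuration; the real CONFIG structure may differ.
CONFIG = {
    "layers": [
        {"id": "subtitles", "type": "text"},
        {"id": "entities", "type": "entity"},
    ]
}


def layer_types_repeated(annotations):
    # Scans CONFIG["layers"] again for every annotation.
    out = []
    for a in annotations:
        layer = [l for l in CONFIG["layers"] if l["id"] == a["layer"]][0]
        out.append(layer["type"])
    return out


def layer_types_hoisted(annotations):
    # Builds the id -> layer mapping once, then does dict lookups in the loop.
    layers = {l["id"]: l for l in CONFIG["layers"]}
    return [layers[a["layer"]]["type"] for a in annotations]


annotations = [{"layer": "entities"}, {"layer": "subtitles"}, {"layer": "entities"}]
```

Both functions return the same list; only the amount of repeated work per annotation differs.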
3f5be0bd27
findClips: look up entity names (fixes #2804) 2016-04-19 12:28:58 +01:00
d0129a4416
findClips: avoid O(n²) lookup of clip from annotation
This doesn't make much difference for small ranges, of course.
2016-04-19 11:25:12 +01:00
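
One common way to avoid the O(n²) lookup mentioned in d0129a4416 is to index clips by id before walking the annotations. This is a sketch under assumed data shapes (dicts with "id" and "clip_id" keys), not the actual findClips code:

```python
def attach_annotations(clips, annotations):
    # Build the index once: O(len(clips)).
    clips_by_id = {clip["id"]: clip for clip in clips}
    # Each annotation then finds its clip in O(1) instead of scanning clips.
    for annotation in annotations:
        clip = clips_by_id.get(annotation["clip_id"])
        if clip is not None:
            clip.setdefault("annotations", []).append(annotation)
    return clips


clips = [{"id": 1}, {"id": 2}]
annotations = [
    {"clip_id": 2, "value": "a"},
    {"clip_id": 1, "value": "b"},
    {"clip_id": 2, "value": "c"},
]
result = attach_annotations(clips, annotations)
```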
ba00bcbf7b
findClips: select_related('item') / ('item__sort')
Clip.public_id uses self.item.public_id.

Clip.json() uses self.item.sort, so we should select_related on that
rather than the clip's own sort field. (They are identical objects. Is
Clip.sort ever used directly?)

With this change, findClips() issues one query to fetch clips plus one
query per flavour of annotation; before, it issued two extra queries per
clip.
2016-04-19 11:25:06 +01:00
6dbb7f921a
findClips: only scan layers once 2016-04-19 11:14:25 +01:00
j
27830d7c58 use markdown in readme 2016-04-15 14:21:24 +02:00
b3df5b8d56 findAnnotations: match some fields case-sensitively
Requiring layer to have the right case is consistent with
addAnnotation(), and means the _layer[_like] index can be used. In my
testing, if itemsQuery specifies a single item, then postgres doesn't
bother with the layer index anyway; but if not, it makes a pretty big
(~3×) difference.

Matching public_id and item__public_id case-sensitively also seems
reasonable (it's consistent with get() and getAnnotation()).

(Is lower() redundant for the case-insensitive comparisons? ie. is
UPPER(x.lower()) == UPPER(x)? I'm not sure, it's cheap, let's leave it.)
2016-04-05 12:19:32 +01:00
8d1b4de337 findAnnotations(): make 'findvalue' the default key
Annotations have no 'name' field, so

     findAnnotations({query: {conditions: [{value: 'foo'}]}})

would previously raise an exception.
2016-04-05 12:19:31 +01:00
284caf03c3 get_by_key: short-circuit
This is about 30% faster, presumably because it avoids allocation and/or
closing over variables is slow(?). It's not hugely significant (I
misread a line_profile report) but why not.
2016-04-05 12:19:31 +01:00
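
The kind of change 284caf03c3 describes might look like the following. The "before" version is a guess at a filter-based implementation (the log does not show the original code), and the signatures are assumptions:

```python
def get_by_key_filter(objects, key, value):
    # Allocates a list of all matches and a closure before taking the first.
    matches = list(filter(lambda o: o.get(key) == value, objects))
    return matches[0] if matches else None


def get_by_key(objects, key, value):
    # Short-circuit: return as soon as the first match is seen.
    for o in objects:
        if o.get(key) == value:
            return o
    return None


layers = [{"id": "subtitles"}, {"id": "entities"}, {"id": "notes"}]
```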
j
7ac68697d4 update pdf.js 2016-04-04 15:50:07 +02:00
j
e1967e96bc fix pdf zoom 2016-04-04 15:50:07 +02:00
j
652df88342 return 404 2016-04-04 15:50:07 +02:00
j
1bff4aa0e9 avoid storing invalid poster frames, only show videos with video 2016-04-01 16:40:20 +02:00
j
b8beb51480 fix multipart audio only timelines 2016-03-31 14:54:38 +02:00
j
30ce422452 disable apt translations 2016-03-26 22:56:09 +01:00
j
94b940436f fix timelines for items with many parts
- use durations from streams not from timelines
- don't accumulate timeline drift
2016-03-19 18:58:48 +01:00
j
f0b8b2b81e check that range is [int, int] 2016-03-17 16:06:08 +01:00
j
e536dcb3b0 <= 2016-03-17 10:47:08 +01:00
j
7761cf9ec2 update celery package and prompt to install new init files for workers 2016-03-17 10:38:15 +01:00
7554b0c105 init: restart celery workers on 'reload' (fixes #2904)
Sending HUP to the parent of a family of celery workers causes the
parent to re-exec itself, spawning a new set of child workers without
terminating the old ones.

So instead we send TERM to the parent on 'reload', which cleans up the
children, and rely on systemd/upstart to respawn the whole family.
2016-03-17 10:32:58 +01:00
j
e16310062b fix vm build on Ubuntu 16.04 2016-03-15 23:04:53 +01:00
eeaeda3970 Support WebVTT subtitle export 2016-03-11 14:16:23 +01:00
j
36463a8120 fix typo in README 2016-03-11 10:33:48 +01:00
j
697e501a4f only update item timeline once all parts are done 2016-03-11 10:33:48 +01:00
j
f6cebcaec9 fix user/group api 2016-03-08 20:14:05 +05:30
j
3a56a8138d only install systemd if /bin/systemctl is available 2016-03-08 19:51:19 +05:30
j
bff4a9553d needs daemon-reload after replacing systemd service file 2016-03-08 13:20:33 +05:30
j
4fd865efeb fix pid 2016-03-08 13:18:35 +05:30
j
29204b6fb5 move gunicorn configuration from init script to config file 2016-03-07 14:25:24 +05:30
j
7ec1e9f6da update django version 2016-03-06 21:59:49 +05:30
j
9d0b50bced build vm with vmdebootstrap 2016-03-05 18:36:39 +05:30
j
4f28c2c548 fix annotation import, values are decoded in Django 1.9 2016-03-05 15:36:47 +05:30
a8dcbbbe89 Include DocumentProperties.data in Document.json() 2016-03-05 15:07:47 +05:30
a55cbcfb9f DocumentProperties: add data field 2016-03-05 15:07:47 +05:30
j
42ac4a88b8 Only show Find: Entity if config defines entities
Followup to 9a4c24
2016-03-05 14:49:51 +05:30
0c98cd080e Entity.alternativeNames: default to () not [] (fixes #2896)
Otherwise this:

    self.name_find = '||' + '||'.join((self.name,) + self.alternativeNames) + '||'
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

fails because () + [] is an error. I guess this must have been
introduced by the DictField/TupleField rewrite.

Without this fix, it is impossible to create a new entity.

Basically the same logic is used for Event and Place too so I've made
the same change to those, and, in passing, fix another copy of the bug
fixed for Entity.name_find in fe7f961.
2016-03-04 17:11:36 +00:00
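
The failure described in 0c98cd080e can be reproduced in plain Python: concatenating a tuple with a list raises TypeError, while a tuple default works. The name_find function below is a standalone stand-in for the model code, written only for illustration:

```python
name = "Ada"

# A list default breaks the concatenation used to build name_find.
try:
    (name,) + []
    raised = False
except TypeError:
    raised = True


def name_find(name, alternative_names=()):
    # Same expression as in the commit message, with a tuple default.
    return "||" + "||".join((name,) + alternative_names) + "||"


no_alternatives = name_find("Ada")
with_alternatives = name_find("Ada", ("Lovelace",))
```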
9a4c24cdb4 Support searching documents by entities 2016-03-04 12:41:41 +00:00
738a9282b4 Document: fix negating id queries 2016-03-04 12:41:41 +00:00
8c23bdff6d Implement DocumentProperties.__unicode__ 2016-03-04 12:41:41 +00:00
j
4613005b83 use geoip2 api to fix ipv6 lookups 2016-03-04 12:50:44 +05:30
340277db1a Raise Error.stackTraceLimit, if it exists (fixes #2894) 2016-03-03 18:15:37 +05:30
c6f9f87c8e Fix autocompleteSort with multiple keys (fixes #2893)
QuerySet.order_by() takes each key as a separate argument, not as a
single comma-separated string.
2016-03-03 18:15:37 +05:30
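
To illustrate c6f9f87c8e without needing a database, the sketch below uses a hypothetical stand-in with the same call shape as Django's QuerySet.order_by(*field_names). The sort string is made up for the example:

```python
def order_by(*field_names):
    # Hypothetical stand-in: reports which field names it was asked to sort on.
    return list(field_names)


sort = "name,-year"

# Passing the comma-separated string yields a single, nonexistent field name.
as_one_string = order_by(sort)

# Splitting and unpacking passes each key as its own argument.
as_separate_args = order_by(*sort.split(","))
```

With a real QuerySet the equivalent call would be qs.order_by(*keys), where keys is the list of sort fields.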
2a07e2a1ab Remove redundant overrides of Model.delete
Both of these models have pre_delete handlers which do the same things,
so I think these are unnecessary.
2016-03-03 18:10:29 +05:30
d69a8efd97 Don't save other file-owning models on delete, either 2016-03-03 18:10:29 +05:30
6e0049a20c Don't save Document in pre_delete handler (fixes #2889)
FileField.delete() will, by default, save() the model instance it is
attached to. This is pointless if we're in the process of deleting the
Document -- and since Document.save() calls Document.update_matches(),
this scans all annotations every time a document is deleted.
2016-03-03 18:10:29 +05:30
7d99950942 Only setInterval once to animate the loading icon (fixes #2888)
(On Chrome, at least,) window.onload() is called once by hand, and once
by the browser. This ends up calling setInterval() twice. When
stopAnimation() is called later, only the second interval is cleared; so
the first one keeps firing forever. Mostly harmless but unnecessary.

Only the first hunk of this patch is really needed, but making
startAnimation() / stopAnimation() idempotent can't hurt.
2016-03-03 18:08:46 +05:30