Fetching documents for each entity in turn is expensive. (I have tried
using ArrayAgg to fetch them in the same query as the Entity — no
improvement. It's possible that being able to join to entity_entity,
and then use ArrayAgg, would be better.)
Even once you've fetched them all, if the same entity appears many
times in an item, then get(..., keys=['layers']) duplicates the whole
JSON for the entity many times: expensive to serialize, expensive to
send over the wire.
Pandora's own web interface only depends on the 'id' key of 'entity' in
each annotation, and refetches the rest of the entity to show the pop-up
dialog when you press E. So by just not bothering to fetch and send any
other keys, get(..., keys=['layers']) on an item with many entity
annotations is substantially faster.
(I experimented with splitting the full entities off to one side, so,
you'd have:
{
"layers": {
somelayer: [...,
{..., "entity": {"id": ABC}},
], ...
},
"entities": {
ABC: {...},
...
}
}
This is quicker than the status quo, but obviously not as fast as not
fetching & sending the rest at all!)
Clip.public_id uses self.item.public_id.
Clip.json() uses self.item.sort, so we should select_related on that
rather than the clip's own sort field. (They are identical objects. Is
Clip.sort ever used directly?)
With this change, findClips() issues one query to fetch clips plus one
query per flavour of annotation; before, it issued two extra queries per
clip.
Requiring layer to have the right case is consistent with
addAnnotation(), and means the _layer[_like] index can be used. In my
testing, if itemsQuery specifies a single item, then postgres doesn't
bother with the layer index anyway; but if not, it makes a pretty big
(~3×) difference.
Matching public_id and item__public_id case-sensitively also seems
reasonable (it's consistent with get() and getAnnotation()).
(Is lower() redundant for the case-insensitive comparisons? ie. is
UPPER(x.lower()) == UPPER(x)? I'm not sure, it's cheap, let's leave it.)
This is about 30% faster, presumably because it avoids allocation and/or
closing over variables is slow(?). It's not hugely significant (I
misread a line_profile report) but why not.
Sending HUP to the parent of a family of celery workers causes the
parent to re-exec itself, spawning a new set of child workers without
terminating the old ones.
So instead we send TERM to the parent on 'reload', which cleans up the
children, and rely on systemd/upstart to respawn the whole family.
Otherwise this:
self.name_find = '||' + '||'.join((self.name,) + self.alternativeNames) + '||'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
fails because () + [] is an error. I guess this must have been
introduced by the DictField/TupleField rewrite.
Without this fix, it is impossible to create a new entity.
Basically the same logic is used for Event and Place too so I've made
the same change to those, and, in passing, fix another copy of the bug
fixed for Entity.name_find in fe7f961.
FileField.delete() will, by default, save() the model instance it is
attached to. This is pointless if we're in the process of deleting the
Document -- and since Document.save() calls Document.update_matches(),
this scans all annotations every time a document is deleted.