Don't leak celery workers on reload #2904

Closed
opened 2016-03-16 16:34:03 +00:00 by wjt · 2 comments

As discussed on IRC, sending `HUP` to the main PID of each celery worker group causes it to restart itself and its children, without cleaning up its old children.

Here's a patch with the simplest possible fix: send it `TERM`. This way it cleans up its children and terminates, and then the init system respawns it.

The approach upstream takes is to use `celery multi`: https://github.com/celery/celery/blob/3.1/extra/systemd/celery.service

But I couldn't make it work! `celery multi start` would claim to have spawned its workers, but they terminated immediately. Running the command shown by `celery multi show` -- which looked right! -- worked fine. stracing `celery multi start` and its children showed one of the children calling `exit(1)` but I didn't have the patience to work out why.
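
For illustration only, the TERM-based reload described above could be sketched in Python roughly as follows. The attached patch changes the init scripts, not Python code, so this is not the patch itself; the function name `reload_worker_group` and the idea of reading the parent PID from a pid file are assumptions made for this sketch.

```python
import os
import signal


def reload_worker_group(pid_file: str) -> int:
    """Send TERM to the parent of a celery worker group (hypothetical helper).

    Assumes `pid_file` contains the PID of the parent worker process and
    that the init system (systemd/upstart) is configured to respawn it.
    """
    with open(pid_file) as f:
        parent_pid = int(f.read().strip())

    # TERM rather than HUP: per the report above, HUP makes the parent
    # restart itself without cleaning up its old children, whereas TERM
    # makes it clean up its children and exit.
    os.kill(parent_pid, signal.SIGTERM)
    return parent_pid
```

This only covers the signalling half; the restart itself relies entirely on the init system bringing the worker group back up after the parent exits.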

0x2620 added the general label 2016-03-16 16:34:03 +00:00
0x2620 added this to the 14.04 milestone 2016-03-16 16:34:03 +00:00
0x2620 self-assigned this 2016-03-16 16:34:03 +00:00
0x2620 added the normal and defect labels 2016-03-16 16:34:03 +00:00
Author

Attachment 0001-init-restart-celery-workers-on-reload.patch (2394 bytes) added

Owner

In 7554b0c/pandora:

#!CommitTicketReference repository="pandora" revision="7554b0c1058da6236386ff0809818dea3326e3fd"
init: restart celery workers on 'reload' (fixes #2904)

Sending HUP to the parent of a family of celery workers causes the
parent to re-exec itself, spawning a new set of child workers without
terminating the old ones.

So instead we send TERM to the parent on 'reload', which cleans up the
children, and rely on systemd/upstart to respawn the whole family.
0x2620 added the fixed label 2016-03-17 09:38:21 +00:00
Reference: 0x2620/pandora#2904