[Supervisor-users] memmon not restarting oversize process?
Chris McDonough
chrism at plope.com
Sat Oct 3 13:38:58 EDT 2009
Hi Chris,
For the following supervisor config (memmon limit set low intentionally):
[inet_http_server]
port=127.0.0.1:9036
[supervisord]
logfile=%(here)s/../log/supervisord.log
logfile_maxbytes=50MB
logfile_backups=10
loglevel=info
pidfile=%(here)s/../var/supervisord.pid
nodaemon=false
[rpcinterface:supervisor]
supervisor.rpcinterface_factory=supervisor.rpcinterface:make_main_rpcinterface
[supervisorctl]
serverurl=http://127.0.0.1:9036
[program:osi]
command = %(here)s/../bin/paster serve %(here)s/../etc/karl.ini
redirect_stderr = true
[program:zeo]
command = %(here)s/../bin/runzeo -C %(here)s/zeo.conf
redirect_stderr = true
[eventlistener:memmon]
command=%(here)s/../bin/memmon -a 20MB
events=TICK_60
When I run it under "supervisord -n -edebug" (using supervisor 3.0a7) on MacOS
10.6, I see this (see lines that being "restarting"):
[chrism at snowpro bfgenv]$ bin/supervisord -n -edebug
2009-10-03 13:34:41,101 INFO Increased RLIMIT_NOFILE limit to 1024
2009-10-03 13:34:41,221 INFO RPC interface 'supervisor' initialized
2009-10-03 13:34:41,221 WARN cElementTree not installed, using slower XML
parser for XML-RPC
2009-10-03 13:34:41,221 CRIT Server 'inet_http_server' running without any HTTP
authentication checking
2009-10-03 13:34:41,222 INFO supervisord started with pid 49246
2009-10-03 13:34:42,226 INFO spawned: 'memmon' with pid 49248
2009-10-03 13:34:42,231 INFO spawned: 'osi' with pid 49249
2009-10-03 13:34:42,239 INFO spawned: 'zeo' with pid 49250
2009-10-03 13:34:43,241 INFO success: memmon entered RUNNING state, process has
stayed up for > than 1 seconds (startsecs)
2009-10-03 13:34:43,241 INFO success: osi entered RUNNING state, process has
stayed up for > than 1 seconds (startsecs)
2009-10-03 13:34:43,250 INFO success: zeo entered RUNNING state, process has
stayed up for > than 1 seconds (startsecs)
2009-10-03 13:34:43,370 DEBG 'memmon' stdout output:
READY
2009-10-03 13:34:43,371 DEBG memmon: ACKNOWLEDGED -> READY
2009-10-03 13:34:43,705 DEBG 'zeo' stdout output:
------
2009-10-03T13:34:43 INFO ZEO.runzeo (49250) created PID file './var/ZEO.pid'
2009-10-03 13:34:43,706 DEBG 'zeo' stdout output:
------
2009-10-03T13:34:43 INFO ZEO.runzeo (49250) opening storage '1' using BlobStorage
2009-10-03 13:34:43,749 DEBG 'zeo' stdout output:
------
2009-10-03T13:34:43 INFO ZODB.blob (49250) Blob directory `./var/blobs` has
layout marker set. Selected `bushy` layout.
2009-10-03 13:34:43,793 DEBG 'zeo' stdout output:
------
2009-10-03T13:34:43 INFO ZEO.StorageServer (49250) StorageServer created RW
with storages: 1:RW:./var/osi.db
2009-10-03 13:34:44,156 DEBG 'zeo' stdout output:
------
2009-10-03T13:34:44 INFO ZEO.zrpc (49250) listening on ('', 8884)
2009-10-03 13:34:45,279 DEBG 'zeo' stdout output:
------
2009-10-03T13:34:45 INFO ZEO.StorageServer (49250) new connection ('127.0.0.1',
59853): <ManagedServerConnection ('127.0.0.1', 59853)>
2009-10-03 13:34:45,280 DEBG 'zeo' stdout output:
------
2009-10-03T13:34:45 INFO ZEO.zrpc.Connection(S) (127.0.0.1:59853) received
handshake 'Z308'
2009-10-03 13:34:45,414 DEBG 'zeo' stdout output:
------
2009-10-03T13:34:45 INFO ZEO.StorageServer (49250) new connection ('127.0.0.1',
59854): <ManagedServerConnection ('127.0.0.1', 59854)>
2009-10-03 13:34:45,415 DEBG 'zeo' stdout output:
------
2009-10-03T13:34:45 INFO ZEO.zrpc.Connection(S) (127.0.0.1:59854) received
handshake 'Z308'
2009-10-03 13:35:01,422 DEBG event 0 sent to listener memmon
2009-10-03 13:35:01,423 DEBG 'memmon' stderr output:
Checking any=20971520
2009-10-03 13:35:01,441 DEBG 'memmon' stderr output:
RSS of memmon:memmon is 5775360
2009-10-03 13:35:01,447 DEBG 'memmon' stderr output:
RSS of osi:osi is 28737536
Restarting osi:osi
2009-10-03 13:35:01,448 DEBG fd 10 closed, stopped monitoring
<POutputDispatcher at 7294408 for <Subprocess at 7054920 with name osi in state
RUNNING> (stdout)>
2009-10-03 13:35:01,449 DEBG killing osi (pid 49249) with signal SIGTERM
2009-10-03 13:35:01,456 INFO stopped: osi (terminated by SIGTERM)
2009-10-03 13:35:01,457 DEBG received SIGCHLD indicating a child quit
2009-10-03 13:35:01,457 DEBG 'zeo' stdout output:
------
2009-10-03T13:35:01 INFO ZEO.StorageServer (49250/127.0.0.1:59854) disconnected
2009-10-03 13:35:01,457 DEBG 'zeo' stdout output:
------
2009-10-03T13:35:01 INFO ZEO.StorageServer (49250/127.0.0.1:59853) disconnected
2009-10-03 13:35:02,465 INFO spawned: 'osi' with pid 49256
2009-10-03 13:35:03,468 INFO success: osi entered RUNNING state, process has
stayed up for > than 1 seconds (startsecs)
2009-10-03 13:35:03,477 DEBG 'memmon' stdout output:
RESULT 2
OKREADY
2009-10-03 13:35:03,477 DEBG memmon: BUSY -> ACKNOWLEDGED (processed)
2009-10-03 13:35:03,477 DEBG memmon: ACKNOWLEDGED -> READY
2009-10-03 13:35:03,478 DEBG 'memmon' stderr output:
RSS of zeo:zeo is 10936320
2009-10-03 13:35:05,143 DEBG 'zeo' stdout output:
------
2009-10-03T13:35:05 INFO ZEO.StorageServer (49250) new connection ('127.0.0.1',
59856): <ManagedServerConnection ('127.0.0.1', 59856)>
2009-10-03 13:35:05,144 DEBG 'zeo' stdout output:
------
2009-10-03T13:35:05 INFO ZEO.zrpc.Connection(S) (127.0.0.1:59856) received
handshake 'Z308'
2009-10-03 13:35:05,276 DEBG 'zeo' stdout output:
------
2009-10-03T13:35:05 INFO ZEO.StorageServer (49250) new connection ('127.0.0.1',
59857): <ManagedServerConnection ('127.0.0.1', 59857)>
2009-10-03 13:35:05,277 DEBG 'zeo' stdout output:
------
2009-10-03T13:35:05 INFO ZEO.zrpc.Connection(S) (127.0.0.1:59857) received
handshake 'Z308'
Chris Shenton wrote:
> I've got supervisor with a memmon eventlistener from a buildout:
>
>> [supervisor]
>> recipe = collective.recipe.supervisor
>> user=nfw
>> password=nfw
>> eggs = supervisor
>> programs =
>> 10 zeo ${zeo:location}/bin/runzeo ${zeo:location}
>> 20 instance ${instance:location}/bin/runzope ${instance:location}
>> true
>> plugins =
>> superlance
>> eventlisteners =
>> MemmonAll TICK_60 ${buildout:bin-directory}/memmon [-a 100MB]
>
> It generates a plausible parts/supervisor/supervisord.conf:
>
>> ...
>> [program:instance]
>> command = /usr/local/cshenton/Projects/cjp/svn-buildout/parts/
>> instance/bin/runzope
>> process_name = instance
>> directory = /usr/local/cshenton/Projects/cjp/svn-buildout/parts/
>> instance
>> priority = 20
>> redirect_stderr = true
>>
>> [eventlistener:MemmonAll]
>> command = /usr/local/cshenton/Projects/cjp/svn-buildout/bin/memmon -
>> a 100MB
>> events = TICK_60
>> process_name=MemmonAll
>> environment
>> =
>> SUPERVISOR_USERNAME
>> =nfw,SUPERVISOR_PASSWORD=nfw,SUPERVISOR_SERVER_URL=http://localhost:9001
>
> I start up supervisor, and the plone instance stabilizes at 97MB,
> then after connecting to plone with a browser, jumps to 113MB:
>
>> 2009-10-03 10:30:34,798 INFO spawned: 'MemmonAll' with pid 11608
>> 2009-10-03 10:30:34,816 INFO spawned: 'zeo' with pid 11609
>> 2009-10-03 10:30:34,826 INFO spawned: 'instance' with pid 11610
>> chris at Mackeral:~$ ps -orss 11610
>> RSS
>> 97444
>> chris at Mackeral:~$ ps -orss 11610
>> RSS
>> 113512
>
> But supervisor never restarts the instance. I don't see any info in
> the logs that its checking all processes or anything else.
>
> I'm not sure how to debug it with pdb since supervisor spawns memmon.
> Sticking print and logging statements into superlance/memmon.py
> indicates to me that it never comes back from this wait call:
>
>> def runforever(self, test=False):
>> while 1:
>> # we explicitly use self.stdin, self.stdout, and
>> self.stderr
>> # instead of sys.* so we can unit test this code
>> headers, payload = childutils.listener.wait(self.stdin,
>> self.stdout)
>
> Any suggestions for debugging this? Thanks.
>
>
>
> _______________________________________________
> Supervisor-users mailing list
> Supervisor-users at lists.supervisord.org
> http://lists.supervisord.org/mailman/listinfo/supervisor-users
>
More information about the Supervisor-users
mailing list