Repository: walt-python-packages; trackexec tooling.¶
This section explains how to enable execution tracking of
walt-server-daemon
sub-processes, how to replay or analyse such a
tracked execution, and how to improve trackexec timestamping.
Purpose¶
Trackexec allows to track the execution of walt-server-daemon
sub-processes, for post-mortem analysis, understanding an unexpected
behavior, or analysing code performance.
It provides a tool to replay the execution trace, with an interface similar to a debugger, post mortem or while the process is still running. It also provides a tool to analyse the trace and find “hot points”, i.e. instructions taking a long time to execute and possibly blocking the process more than expected.
Enabling trackexec¶
To enable trackexec, just create the directory
/var/log/walt/trackexec
and restart walt-server-daemon
(or on a
production server, call systemctl restart walt-server
). Each
sub-process will then create a new subdirectory at
/var/log/walt/trackexec/<YYMMDD-HHMMSS>/<sub-process-name>/
, and
start populating it with execution traces.
For convenience, a symlink /var/log/walt/trackexec/latest
pointing
to the last <YYMMDD-HHMMSS>
sub-directory is also updated.
Enabling trackexec should not slow down WALT server code too much, so do not hesitate to activate it, even on a production system.
Note: execution traces are not human-readable, use
walt-server-trackexec-replay
.
Replaying execution traces¶
To replay, use for instance:
$ walt-server-trackexec-replay /var/log/walt/trackexec/latest/server-main
This will use the last execution of server-main
subprocess. The fact
it may still be running is not a problem, the tool will just allow to
browse past execution traces, up to the point the tool was called.
One can obviously replay an older execution, and possibly the execution of another sub-process, for instance:
$ walt-server-trackexec-replay /var/log/walt/trackexec/240416-162041/server-db
The command opens a command line interface similar to a debugger. You
can press <h>
for help. Press <h>
again to leave the help
screen.
Similarly to a debugger, you can press <n>
, <s>
, <c>
for
next
, step
, continue
. You can also insert a breakpoint
using b <line>
, and let the tool stop at this place by pressing
<c>
(i.e., continue
). The tool also provides ways to jump in
time, in a relative or absolute manner. See help screen for more info.
This is very handy for debugging around a specific timestamp seen in
walt or systemd logs.
Keep in mind that we are just browsing execution traces, so even if the process is still running, stopping on a breakpoint for instance has no impact on it.
Analysing execution traces¶
To analyse, use for instance:
$ walt-server-trackexec-analyse /var/log/walt/trackexec/latest/server-main
The tool will display a table indicating which python instructions consumed the most execution time.
Trackexec in the source code, details about timestamps¶
Trackexec is activated in the startup code of walt-server-daemon
subprocesses, in file server/walt/server/process.py
. For reducing
the disk usage and preserving performance, trackexec is configured to
monitor only the code of the walt
package: other libraries are not
monitored. Moreover, timestamps are recorded sparsely.
This initialization code also installs a hook on the event loop for recording a precise timestamp before and after the event loop is idle.
For efficient indexing, the traces are recorded as blocks. When a block is started, a timestamp is recorded. The estimated timestamp of each instruction is computed linearly based on the two closest timestamps recorded.
For more precise timestamping on a given section of the source code, the developer may use the following construct:
from walt.server.trackexec import precise_timestamping
[...]
def <function>(args):
[...]
with precise_timestamping():
<do-something>
<do-something-else>
[...]
In this case, a timestamp is recorded before and after each instruction of the section (including instructions of involved function calls if any). This can be handy for inspecting a slow function in more details. Without this, the delay caused by a slow function is spread linearly between the two closest timestamps recorded.
The hook installed on the event loop actually uses this function
precise_timestamping
.
The code of trackexec itself is at server/walt/server/trackexec/
.