[hatari-devel] Profiling Hatari code with Valgrind (was: Major changes in DSP code)

Eero Tamminen oak at helsinkinet.fi
Thu Dec 30 00:27:26 CET 2010


Hi,

On Wednesday 29 December 2010, Laurent Sallafranque wrote:
> Thanks Eero to remind me valgrind, I forgot it.
> Can you also remind me the commands to use to analyse % of time each
> instruction takes ?

First make sure you've configured CMake to produce a Hatari binary that is
both optimized and contains debug symbols (use the RELWITHDEBINFO build
type and change its flags to use -O3, like the RELEASE build type does).
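
A minimal sketch of that configure step, assuming an out-of-source build
directory and GCC-style flags. The source path and the function name are
placeholders of mine, not anything from the Hatari tree:

```shell
# Placeholder path to the Hatari sources; adjust to your checkout.
HATARI_SRC="${HATARI_SRC:-../hatari}"

# RelWithDebInfo typically means "-O2 -g" with GCC; overriding its C flags
# gives -O3 while keeping the debug symbols that Kcachegrind needs.
configure_for_profiling() {
    cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo \
          -DCMAKE_C_FLAGS_RELWITHDEBINFO="-O3 -g" \
          "$HATARI_SRC"
}
```

Run "configure_for_profiling && make" in an empty build directory.  Note
that replacing the flags this way also drops anything else CMake would put
there by default (e.g. -DNDEBUG), so add those back if you want them.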

Then just prefix the hatari command with "valgrind --tool=callgrind".


If you're interested only in the DSP side, it's better to minimize
the effect of display updates and sound & mic threading with options like:
	-z 1 --frameskips 8 --sound off --mic off

And if you don't want the profile to contain TOS bootup & starting the DSP
program, use a Hatari memory snapshot for profiling.
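
Put together, the command line could look roughly like the sketch below.
It only prints the command so that you can check it first.  The snapshot
file name is a made-up example, and I'm assuming the snapshot is loaded
with Hatari's --memstate option (check "hatari --help" for your version):

```shell
# Options that keep display, sound and mic work out of the profile.
QUIET_OPTS="-z 1 --frameskips 8 --sound off --mic off"

# Print (rather than run) the full profiling command line.
# $1 is an optional memory snapshot file.
hatari_profile_cmd() {
    cmd="valgrind --tool=callgrind hatari $QUIET_OPTS"
    if [ -n "${1:-}" ]; then
        cmd="$cmd --memstate $1"
    fi
    echo "$cmd"
}

hatari_profile_cmd dsp-demo.sav
```

If the printed line looks right, run it e.g. with
'eval "$(hatari_profile_cmd dsp-demo.sav)"'.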


After you've profiled long enough, use:
	callgrind_control --dump

Then close Hatari.  (Closing Hatari seems to do something funny in SDL that
stops callgrind from working properly at exit, which is why you need the
explicit "callgrind_control --dump" above instead of relying on the dump
at exit.)

Then give the last/largest callgrind.out.* file to kcachegrind.
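
Picking the largest dump by hand gets tedious after a few runs, so here is
a small helper sketch for it.  The function name is my own invention, and
it assumes the dump file names contain no spaces or newlines:

```shell
# Print the largest callgrind.out.* file in a directory
# (default: the current directory).  "ls -S" sorts by size, largest first.
largest_profile() {
    ls -S "${1:-.}"/callgrind.out.* 2>/dev/null | head -n 1
}
```

Then 'kcachegrind "$(largest_profile)"' opens it.  If you'd rather have
a plain text summary, 'callgrind_annotate "$(largest_profile)"' should
work on the same file.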


In the Kcachegrind configuration dialog "Annotations" tab you can add
Hatari source directories so that Kcachegrind can correctly show the sources
annotated with profiling results.  This works only if you have debug symbols
in the Hatari binary.

From the Kcachegrind output, besides callgraphs etc, it makes sense to check
/ sort functions by how much they:
- are called
- use CPU themselves
- use CPU cumulatively (with functions called from them)

I.e. whether one could look into optimizing the number of calls, the function
itself, or the other functions it calls.


Attached is a screenshot of Kcachegrind output on 30l_coke.prg.
With some other program the hotspots could be elsewhere, so it's always
a good idea to profile several programs.


	- Eero

PS. Would the attached patch to dsp.c make sense?  The asm code GCC
generates for it, according to "objdump -d", changes and is a couple of
lines shorter (it doesn't use a register for the "i" value), but I have no
idea whether it's any faster in practice.  Anyway, the difference would be
too small to be visible e.g. in "top".
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 30l_coke.png
Type: image/png
Size: 80795 bytes
Desc: not available
URL: <https://lists.berlios.de/pipermail/hatari-devel/attachments/20101230/3f19830b/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dsp.diff
Type: text/x-patch
Size: 967 bytes
Desc: not available
URL: <https://lists.berlios.de/pipermail/hatari-devel/attachments/20101230/3f19830b/attachment.bin>

