[hatari-devel] Profiling Hatari code with Valgrind

Eero Tamminen oak at helsinkinet.fi
Fri Jan 7 22:28:34 CET 2011


Hi,

On Friday 07 January 2011, Laurent Sallafranque wrote:
> I don't agree with you here.
> 
> Update_e_u_n_z is called for nearly every other instruction in the DSP
> (every instruction that is decoded in the else branch of the main DSP
> instruction decoder).
> 
> This means mac, mpy, add, sub, test, cmp, ...
> Nearly all DSP programs use these instructions a lot (and use them
> millions of times).
> 
> I've done a quick Valgrind run tonight to see the difference before this
> update and after it.
> The difference is not negligible (compared to the PNG you sent last
> time).
> 
> I'm running Hatari under Valgrind without this optimization. I'll send
> you the 2 PNGs tonight.

In general, while profiler output can indicate performance changes, it's
better to have something that actually measures performance[1], as I
suggested in my previous mail (memory snapshot, --run-vbls & --frame-skips
etc).  That's especially so if the change isn't localized within a single
function, but also affects how functions are called.

[1] Performance measuring and profiling/analysing are two different things
(I've done that kind of stuff at work for years); you typically need to use
different means for each.
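
To illustrate the measuring side, here is a minimal sketch (not part of
Hatari) that runs the same command several times and reports the wall-clock
spread.  The helper names time_command and summarize are made up for this
example, and it assumes the command exits on its own.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times; return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the program's output so terminal speed does not skew timing.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return lowest, median and highest duration, to show the fluctuation."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

The command list would be the Hatari invocation with options that make the
run repeatable and self-terminating (the memory snapshot and --run-vbls
mentioned above).  Comparing the summaries of the old and new binary gives a
number that does not depend on how a profiler attributes costs.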


Looking at the diff for your changes, they seem to be localized though, so
in this case a profiler (especially one like Valgrind) could be reliable.

Note though that while the boxed view gives a nice overall picture,
if you want to know the total percentage for a given function, look
at the callgraph or the inclusive % column in the table on the left.
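
As a rough illustration of why the total matters: assuming the boxed view
splits a function's cost into one box per caller, the total is the sum over
all callers.  The sketch below uses invented function names and invented
cost numbers, not real Hatari profile data, and ignores recursion.

```python
def total_share(edges, func, total_cost):
    """Percentage of total_cost spent in func, summed over every caller.

    edges is a list of (caller, callee, inclusive_cost) tuples, one per
    caller/callee pair, as a call graph would attribute them.
    """
    cost = sum(c for _caller, callee, c in edges if callee == func)
    return 100.0 * cost / total_cost


# Invented example data: update_flags is reached from three callers.
edges = [
    ("mac", "update_flags", 40),
    ("add", "update_flags", 25),
    ("cmp", "update_flags", 10),
    ("mac", "multiply", 60),
]

share = total_share(edges, "update_flags", total_cost=500)
```

No single box in this example holds more than 8% of the total, while the
summed share for update_flags is 15%.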


> I'm sure I can't see the difference on my fast computer (I never go up
> to 92 % cpu), but for a slower machine, these small optimizations can make
> a difference (that's my point of view).

I was comparing the numbers that "top" was giving me before and after
the optimizations.  In both cases they were fluctuating quite a bit, but
the numbers I was seeing (highest, lowest, average) were very close
(within 0-2%) to each other at the same places in the demos.

It might be possible that my i3 CPU scales its speed according to
the load[2] though, and that's why I don't see the differences.  I hadn't
thought of that earlier.  If it does scale, then "top" isn't a valid way
to measure anything.
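
One way to check for scaling on Linux is to look at the cpufreq files in
sysfs.  This is only a sketch: it assumes the kernel exposes cpufreq for
cpu0 under the usual path, and the helper names read_cpufreq and may_scale
are made up for the example.

```python
from pathlib import Path

# Assumption: Linux with the cpufreq subsystem enabled; absent on other setups.
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")

FIELDS = ("scaling_governor", "scaling_min_freq",
          "scaling_cur_freq", "scaling_max_freq")


def read_cpufreq(base=CPUFREQ_DIR):
    """Return the governor and min/cur/max frequencies (kHz, as strings).

    Returns None if any of the files cannot be read.
    """
    info = {}
    for name in FIELDS:
        try:
            info[name] = (Path(base) / name).read_text().strip()
        except OSError:
            return None
    return info


def may_scale(info):
    """True if the allowed minimum and maximum frequency differ."""
    return int(info["scaling_min_freq"]) < int(info["scaling_max_freq"])
```

If may_scale() returns True, the CPU percentage shown by "top" is relative
to a clock speed that can change between samples, so percentages from two
runs are not directly comparable.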


	- Eero

[2] My earlier 7-year-old computer with an AMD Athlon XP CPU didn't do
    stuff like that.


