[hatari-devel] Hatari frozen after it's been paused for few hours

Eero Tamminen eerot at users.berlios.de
Wed Jul 13 23:51:07 CEST 2011


Hi,

On keskiviikko 13 heinäkuu 2011, Nicolas Pomarède wrote:
> What happens if you let Hatari in STE mode idle under GEM without
> starting any program ? Does it freeze too?

Yes.  I also didn't notice it earlier, but when Hatari freezes,
it starts taking all CPU.


> Does it freeze in STF mode ?

Yes.  It's enough just to do:
	hatari -d stmp3/

And wait 10-60 minutes.  Maybe one needs to be in desktop workspace so
that Hatari doesn't even accidentally receive any input events?

Running two Hatari instances at the same time would seem to trigger
the issue more easily.


> Is the freeze a consequence of playing STE DMA sound ?

Apparently not, but normal sound could, as:

> > After an hour, I have these warnings on terminal and Hatari is
> > frozen:
> > Your system is too slow, some sound samples were not correctly emulated
...
> This message will happen if the audio buffer is not updated fast enough,
> so it's consistent with the fact that hatari is frozen.

If I use "--sound off" option, I get those messages constantly.

And with sound disabled, I couldn't trigger the freeze.


This is where frozen Hatari spends its time according to Oprofile:
-----
# opcontrol --init
# opcontrol --no-vmlinux --separate=kernel
# opcontrol --start
<wait couple of minutes>
# opcontrol --stop
$ opreport -l src/hatari
...
samples  %        image name               symbol name
1475718  79.6711  no-vmlinux               /no-vmlinux
192327   10.3834  [vdso]                   [vdso]
118832    6.4155  [vdso]                   [vdso]
51229     2.7658  hatari                   Main_WaitOnVbl
11215     0.6055  libasound.so.2.0.0       /usr/lib/libasound.so.2.0.0
2763      0.1492  libc-2.11.2.so           gettimeofday
37        0.0020  libSDL-1.2.so.0.11.3     /usr/lib/libSDL-1.2.so.0.11.3
35        0.0019  libc-2.11.2.so           memcpy
26        0.0014  libc-2.11.2.so           memset
13       7.0e-04  libc-2.11.2.so           poll
11       5.9e-04  libc-2.11.2.so           ioctl
10       5.4e-04  libc-2.11.2.so           __i686.get_pc_thunk.bx
9        4.9e-04  libpthread-2.11.2.so     pthread_mutex_lock
7        3.8e-04  hatari                   Audio_CallBack
-----
I.e. almost all of the time goes to doing syscalls, and some audio stuff.
I think the [vdso] stuff is the kernel provided use-space part of syscall
calling.

This means that Hatari isn't spending any time in emulating e.g.
the m68k CPU where normally most CPU is spent.  Even the memcpy & memset
is also audio stuff, not e.g. SDL screen blitting (I verified that
separately).


Strace tells this about the syscalls:
-----
strace -c -f -p $(pidof hatari|cut -d' ' -f1)
Process 3395 attached with 2 threads - interrupt to quit
^C
Process 3395 detached
Process 3396 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.89    6.264392        4381      1430           poll
  0.11    0.007004           0   1753130           gettimeofday
  0.00    0.000142           0     19317           ioctl
  0.00    0.000005           0      1431           read
  0.00    0.000000           0         1           restart_syscall
  0.00    0.000000           0      2862           semop
------ ----------- ----------- --------- --------- ----------------
100.00    6.271543               1778171           total
-----
I.e. Hatari does a huge amount of gettimeofday() calls, but polling
may actually be taking most of the time spent for syscalls on kernel
side (strace isn't very accurate tool for measuring that).


If one ignores gettimeofday() calls which are done from the
Main_WaitOnVbl(), Hatari is repeating this:
-----
[pid  3396] poll([{fd=5, events=POLLIN|POLLERR|POLLNVAL}], 1, -1) = 1 
([{fd=5, revents=POLLIN}])
[pid  3396] ioctl(6, 0x4122, 0xb6a54f08) = 0
[pid  3396] ioctl(6, 0x4122, 0)         = 0
[pid  3396] ioctl(6, 0x4122, 0)         = 0
[pid  3396] ioctl(6, 0x4122, 0xb6a54fa8) = 0
[pid  3396] ioctl(6, 0x4122, 0xb6a54f38) = 0
[pid  3396] ioctl(6, 0x4122, 0xb6a54ee8) = 0
[pid  3396] ioctl(6, 0x4122, 0xb7441164) = 0
[pid  3396] ioctl(6, 0x4122, 0x40)      = 0
[pid  3396] ioctl(7, USBDEVFS_IOCTL or USBDEVFS_IOCTL32, 0x9fd94d0) = 0
[pid  3396] ioctl(6, 0x4122, 0x10)      = 0
[pid  3396] semop(458752, {{0, 0, 0}, {0, 1, SEM_UNDO}}, 2) = 0
[pid  3396] semop(458752, {{0, -1, SEM_UNDO|IPC_NOWAIT}}, 1) = 0
[pid  3396] read(5, "\1\0\0\0\2346\0\0\24\1\312\33\1\0\0\0", 64) = 16
[pid  3396] ioctl(6, 0x4122, 0xb6a54ff8) = 0
[pid  3396] ioctl(6, 0x4122, 0xb6a54f2c) = 0
[pid  3396] ioctl(6, 0x4122, 0x400)     = 0
[pid  3396] poll([{fd=5, events=POLLIN|POLLERR|POLLNVAL}], 1, -1) = 1 
([{fd=5, revents=POLLIN}])
-----


As to where all of these functions are called from:
functracer -s -a poll,gettimeofday,semop,read,ioctl -o . -p $(pidof hatari)
sp-rtrace-postproc -c < 3395-0.rtrace.txt |sp-rtrace-resolve > 
hatari.rtrace.txt
rtrace-calltree --type=count --limit=1 hatari.rtrace.txt > hatari.dot
dot -Tsvg hatari.dot > hatari.svg

-> As can be seen from the attached callgraph, most gettimeofday() system
   calls are done from the Main_WaitOnVbl() function, rest of the syscalls,
   including poll()s, are all done from the audio thread.


When looking at the htop output (which shows threads separately),
audio thread seems to be taking more CPU than normally.  Normally
(when no sound is playing), audio thread takes ~10% of the CPU that
Hatari takes, but after Hatari freezes, audio thread takes nearly
half of Hatari's 100% CPU usage.

When comparing Hatari instances emulating STF & STE, the STF one seems
to spend more time in sound thread for some reason.


But note that behavior of both sound thread & Hatari main loop is
strange.  When I step through the code in Main_WaitOnVbl() with Gdb,
the code runs within this loop:
        /* Now busy-wait for the right tick: */
        while (nDelay > 0)
        {
                CurrentTicks = Time_GetTicks();
                nDelay = DestTicks - CurrentTicks;
        }

But Gdb will always tell nDelay to have the same value:
----
(gdb) info locals
CurrentTicks = <value optimized out>
DestTicks = 4294978206
FrameDuration_micro = 19979
nDelay = 18403
----
Despite the Time_GetTicks() calls which Gdb stepping shows just as:
        gettimeofday ( &now , NULL );


If somebody knows x86 assembly, I believe this to be the above loop
in Hatari binary (output of "objdump -d"):
-----------
 80b51cb:       e8 f0 f3 ff ff          call   80b45c0 <Time_Delay>
Main_WaitOnVbl+0x190:
 80b51d0:       8d 5d e0                lea    -0x20(%ebp),%ebx
 80b51d3:       89 1c 24                mov    %ebx,(%esp)
 80b51d6:       c7 44 24 04 00 00 00    movl   $0x0,0x4(%esp)
 80b51dd:       00 
 80b51de:       e8 2d ee fc ff          call   8084010 <gettimeofday at plt>
 80b51e3:       8b 35 c0 3b 2a 08       mov    0x82a3bc0,%esi
 80b51e9:       31 d2                   xor    %edx,%edx
 80b51eb:       69 45 e0 40 42 0f 00    imul   $0xf4240,-0x20(%ebp),%eax
 80b51f2:       8b 3d c4 3b 2a 08       mov    0x82a3bc4,%edi
 80b51f8:       03 45 e4                add    -0x1c(%ebp),%eax
 80b51fb:       89 f1                   mov    %esi,%ecx
 80b51fd:       29 c1                   sub    %eax,%ecx
 80b51ff:       89 fb                   mov    %edi,%ebx
 80b5201:       19 d3                   sbb    %edx,%ebx
 80b5203:       83 fb 00                cmp    $0x0,%ebx
 80b5206:       7f c8                   jg     80b51d0 
<Main_WaitOnVbl+0x190>
 80b5208:       0f 8c eb fe ff ff       jl     80b50f9 <Main_WaitOnVbl+0xb9>
 80b520e:       83 f9 00                cmp    $0x0,%ecx
 80b5211:       77 bd                   ja     80b51d0 
<Main_WaitOnVbl+0x190>
-----------

How the value Hatari calculates from gettimeofday() results
can always be same?

The only reasonable error that gettimeofday() manual page reports
is if the args for it are at invalid address, but at least in
Hatari source code "now" is in stack:
        Sint64          ticks_micro;
        struct timeval  now;
        gettimeofday ( &now , NULL );
        ticks_micro = now.tv_sec * 1000000UL + now.tv_usec;
        return ticks_micro;

I don't know enough x86 asm to see from above disassembly whether
it's in stack still in the optimized code?


	- Eero


> Nicolas
> 
> >> Backtrace was following:
> >> -------
> >> #0  0xb77f1424 in __kernel_vsyscall ()
> >> #1  0xb74690e6 in gettimeofday () at
> >> ../sysdeps/unix/syscall-template.S:82 #2  0x080b52a3 in Time_GetTicks
> >> () at /home/eero/work/hatari/src/main.c:122 #3  Main_WaitOnVbl () at
> >> /home/eero/work/hatari/src/main.c:342
> >> #4  0x0810dc21 in m68k_run_1 (may_quit=1) at
> >> /home/eero/work/hatari/src/uae- cpu/newcpu.c:1776
> >> #5  m68k_go (may_quit=1) at
> >> /home/eero/work/hatari/src/uae-cpu/newcpu.c:1872 #6  0x080b40d4 in
> >> M68000_Start () at /home/eero/work/hatari/src/m68000.c:241 #7
> >> 0x080b4ece in main (argc=6, argv=0xbf804994) at
> >> /home/eero/work/hatari/src/main.c:776
> >> ...
> >> #3  Main_WaitOnVbl () at /home/eero/work/hatari/src/main.c:342
> >> 342                     CurrentTicks = Time_GetTicks();
> >> (gdb) info locals
> >> CurrentTicks =<value optimized out>
> >> DestTicks = 4294977719
> >> FrameDuration_micro = 19979
> >> nDelay = 18449
> >> -------
> >> 
> >> I.e. Main loop seems to be still running, but keys aren't being
> >> processed.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hatari.svg
Type: image/svg+xml
Size: 43935 bytes
Desc: not available
URL: <https://lists.berlios.de/pipermail/hatari-devel/attachments/20110714/f9828600/attachment.svg>


More information about the hatari-devel mailing list