[hatari-devel] Long mail : new cpu cores + 2 questions.

Nicolas Pomarède npomarede at corp.free.fr
Mon Oct 31 17:16:58 CET 2011


On 31/10/2011 16:44, laurent.sallafranque at free.fr wrote:
> Hi Mikro,
>
>
>>> But in the motorola 68030 doc, I can read :
>
>>> Bcc (Taken) 6 0 6(0/0/0) 8(0/2/0)
>>> Bcc.B (Not Taken) 4 0 4(0/0/0) 4(0/1/0)
>>> Bcc.W (Not Taken) 6 0 6(0/0/0) 6(0/1/0)
>>> Bcc.L (Not Taken) 6 0 6(0/0/0) 8(0/2/0)
>
>>> It's in the file I've sent to Mikro, maybe he could confirm this.
>
>> Honestly, I've never measured branch cycles (who cares, they are only on the end of the loop :)) but you see how it's possible -- head is 6 cycles, so in case you find a nice instruction+addressing mode with such overlap, you can have really zero cycles instruction.
>
> I agree, but there's not any head/tail mechanism for now in the new winuae core.
> I can add it (I'd like to do it).
> But again, it'll derive from winuae code.
> Maybe it's time to derivate anyway.
> But you seem to tell that the most important is the falcon bus, not the processor instructions cycles for now (in a second time for better accuracy ?)
>
>>> I think the 68020 and 68030 CPU's in WinUae are not complete nor accurate.
> They're quite good, but not perfect.
>
>> Actually, it's all you need. We're in a little bit delicate situation because we've got the DSP and no sync in some loops but in most cases, you need only the bus cycles to get right.
>
> Ok, but as the 68030 is the main clock of the emulator, it needs to be accurate (sound, DSP, video, ... are based on the CPU cycles).
>
> Let's say I write a program that plays DMA sound.
> I do a DIV loop forever in the main code (DMA plays the music).
> Hatari will render it badly, as the div is given as 46, 70 or 90 cycles, but this is a maximum (it's frequently much less).
> DMA won't play the correct number of samples, Video won't compute the VBL correctly.
> (I agree videl is still not here).
>
> The same applies if all instructions are approximate. And, as you say, the same if bus cycles are not right.
>

I agree that using the cpu as the main clock is an approximation, but I 
think it should be tested, and it would give correct results in >95% of 
the cases.

Of course, a div would mess with the timings, but practically, are there 
many demos/programs that use the DSP with div/mul while communicating 
with the DSP ? I really doubt it, because even on a real falcon, the div 
would not take the same amount of cycles, so it would be very hard to 
have a cpu<->dsp link with no sync with such instructions.

Let's be pragmatic: I think there are a lot of programs that can be fixed 
without needing absolute cycle counts (on the Falcon, cycles vary with 
the cache too, and I don't think 100% of demos take the cache into 
account, yet they still work).

>
>
> Nicolas, if you have any idea on how to take into account the falcon bus nicely, I'd be interested to understand all of this better (it's hard for me to understand this for now).

This would add some complexity to the code but, more importantly, it 
would slow everything down by a huge amount.

In order to simulate proper bus cycles, you would need to split every 
680x0 instruction into "micro code" steps taking 2 or 4 cycles each and, 
every 4 cycles for example, call "videl_update_bus" and "dsp_update_bus".

This means that you "refresh" DSP/videl/dma/floppy several times per 
68000 instruction to check if each component needs to update its state 
(depending on how many cycles elapsed).
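To make the cost concrete, here is a minimal sketch in C of such a sliced 
update. Only the names "videl_update_bus" and "dsp_update_bus" come from 
the text above; their signatures, the counters, BUS_SLICE and 
run_instruction_sliced() are assumptions made for illustration, not 
existing Hatari or WinUAE code.

```c
/* Hypothetical stand-ins for the per-component hooks; real ones would
 * advance the Videl and DSP state instead of just counting. */
int videl_cycles, dsp_cycles;   /* cycles seen by each component */
int videl_calls, dsp_calls;     /* number of refreshes performed */

void videl_update_bus(int cycles) { videl_cycles += cycles; videl_calls++; }
void dsp_update_bus(int cycles)   { dsp_cycles += cycles;   dsp_calls++;   }

#define BUS_SLICE 4             /* assumed slice size, in CPU cycles */

/* Run one instruction of `total_cycles` cycles in slices of at most
 * BUS_SLICE cycles, refreshing every component after each slice. */
void run_instruction_sliced(int total_cycles)
{
    while (total_cycles > 0) {
        int step = total_cycles < BUS_SLICE ? total_cycles : BUS_SLICE;

        /* ... one "micro code" step of the instruction would run here ... */
        videl_update_bus(step);
        dsp_update_bus(step);
        total_cycles -= step;
    }
}
```

With these assumptions, a single 90 cycle instruction triggers 23 
refreshes per component (22 slices of 4 cycles plus one of 2) instead of 
one, which is where the slowdown comes from.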

WinUAE already has such a mode in 68000, where DMA cycles are shared 
exactly during each instruction.

But even then, this would not solve the "div" problem: as long as you 
don't know the exact cycles for a div, your code can eventually lose sync 
with the dsp, no matter how the bus is handled.

The problem is that div is one of the most complex arithmetic operations 
at the cpu level; it took Ijor a lot of time to reverse-engineer the 
68000's algorithm, and I guess the 68030's can be as complex (and I don't 
think we will find any doc on this, it's certainly covered by some 
patents and not publicly available).

I'm really convinced that very few demos/programs need "sub 
instruction" bus behaviour; a correct cycle count can be needed in some 
cases, but updating each component every 2 or 4 cycles should not be 
necessary.

I would vote against forking winuae's code too much, unless every 
modification comes with a very simple test program that can be compiled 
and run on both a real Falcon and Hatari and gives reproducible results.

I really think this is the key to proper Falcon emulation: some very 
simple test programs must be written to analyse very precise points; 
until this is done, I really think emulation will not evolve correctly.

This is not a criticism of the work you did, I know it's a complex 
matter, it's just that I don't see any other solution.
Additionally, I don't have any falcon experience (and not a lot of spare 
time at the moment), so I'm afraid I can't help with this point.

Combining dsp + videl + cpu is too hard to analyse properly as a whole.

What would be great would be to write some very small programs (such as 
the first simple program you usually write when you learn asm or a new 
computer) and check if those programs run the same under Hatari and on a 
Falcon.


Nicolas


