[hatari-devel] Hatari patches for bitplane conversion

Kåre Andersen kareandersen at gmail.com
Tue Jul 21 16:22:27 CEST 2009


2009/7/20 Thomas Huth <huth at users.berlios.de>:
> On Mon, 20 Jul 2009 22:25:42 +0300
> Eero Tamminen <oak at helsinkinet.fi> wrote:
>
>> CC'ing hatari-devel as this is of generic interest and discussed
>> earlier on the list.
>>
>> On Monday 20 July 2009, Kåre Andersen wrote:
>> > On Sun, Jul 19, 2009 at 4:57 PM, Eero Tamminen<oak at helsinkinet.fi>
>> > wrote:
>> > > On Saturday 18 July 2009, Thomas Huth wrote:
>> >> since you've asked for them in another mail, here are the
>> >> conversion functions from Kaare.
>> > >> For me, they are slower than the old functions, so I didn't
>> > >> include them yet (and I am still waiting for some feedback from
>> > >> Kaare, so I also don't include them as an alternative yet).
>> >
>> > Just remember, they are not hand optimized - if you want speed out
>> > of them you will want to fine tune GCC optimization flags. There is
>> > no reason they should be slower in the actual conversion, because
>> > they fit CPU caches a lot better than the old code...
>
> Sorry, but if you're talking about recent CPUs with 2 MB cache or even
> more, this is not as valid as it might be for older CPUs (where the
> old code seems to be faster anyway): The screen buffers which are used
> by the old code take about 800 kB and should fit into a 2 MB cache as
> well!

Ok, I am no expert on Intel architecture, but I can tell you this
much: a 2 MB cache does not mean 2 MB that you can freely fill with
whatever code and data you wish. The level-one caches are usually
64 KB, split half and half between data and instructions. That means
we have about 32 KB to fill with lookup tables before a miss occurs.
Also, please remember we are running under pre-emptive multitasking
operating systems, which means a context switch can happen at _any_
time, discarding our cached data.
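
To put rough numbers on it, here is a tiny illustrative sketch - the
table sizes are made up to show the scale, they are not Hatari's
actual converter tables:

/* Illustrative only: how quickly lookup tables outgrow a 32 KB L1
   data cache. Sizes are hypothetical, not Hatari's real tables. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    size_t l1_dcache  = 32u * 1024;                    /* typical L1 D-cache size */
    size_t word_table = 65536u * 16 * sizeof(uint8_t); /* 16-bit plane word -> 16 pixels */
    size_t byte_table = 256u * 8 * sizeof(uint8_t);    /* 8-bit plane byte -> 8 pixels */

    printf("word-indexed table: %7zu bytes, fits L1: %s\n",
           word_table, word_table <= l1_dcache ? "yes" : "no");
    printf("byte-indexed table: %7zu bytes, fits L1: %s\n",
           byte_table, byte_table <= l1_dcache ? "yes" : "no");
    return 0;
}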

> I also don't think that this problem is related to CPU caches - since
> it does not occur on Linux or Windows with modern CPUs! The problem
> must have something to do with Mac OS X exclusively.

That is not what I have been saying at all. What I am saying is that
the new code fits the caches better - that is all.

>> I haven't found these extra GCC options to help much, at least if you
>> apply them to the whole of Hatari (for example, although -funroll-loops
>> helps sometimes, generally it just slows things down as it makes the
>> code larger so that it doesn't fit as well into the cache).
>
> That's also too system specific. We shouldn't include such specific
> optimization flags in the default Makefile.

Indeed. Which brings up another point: why are the converters
#include'd into one source file rather than compiled as separate
translation units, which would make much more sense? It really messes
up the whole structure of the code, as they are neither inline
functions nor simple headers...
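
Something along these lines is what I mean - the file and function
names here are made up for illustration, they are not the actual
Hatari ones:

/* convert.h - hypothetical header exposing one converter */
#ifndef CONVERT_H
#define CONVERT_H
#include <stdint.h>
void Convert_LowResToHost(const uint16_t *pSTScreen, uint32_t *pHostScreen,
                          int nLines);
#endif

/* convert_low.c - built as its own object file, with its own flags
   if need be */
#include "convert.h"
void Convert_LowResToHost(const uint16_t *pSTScreen, uint32_t *pHostScreen,
                          int nLines)
{
    /* ...the actual bitplane-to-chunky loop lives here... */
    (void)pSTScreen; (void)pHostScreen; (void)nLines;
}

/* screen.c then only needs #include "convert.h" and a normal call. */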

>> > (Point being,
>> > most demos and games, like i have already stressed, most of the time
>> > don't leave the frame buffer alone from frame to frame).
>>
>> Note that many games and demos don't write to the whole (overscan)
>> screen. Or they can update screen only every other frame.  Game menus
>> can also be static or only partly animated.
>>
>> So the comparison might still be faster with many/most demos & games.
>> And it preloads the line contents to CPU cache for conversion. :)
>
> Right. And I still can't think of a really good reason why code should
> suddenly execute slower on newer CPUs. I _really_ don't think this
> problem has something to do with CPU caches. It must be a problem with
> SDL on Mac OS X or something similar.

Again, this is the case, as I have tried to explain before: SDL on
OS X is crap due to compositing. But helping it suck a bit less when
it does not hurt the other platforms can't be a bad thing, can it? Of
course, for this statement to be true we need to do the memcmp()
thing so as not to write unnecessary amounts of data.
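
For what it's worth, this is roughly the per-line compare I have in
mind - the buffer names and the copy-to-SDL step are just my own
assumption about how it would slot in, not the actual Hatari code:

/* Sketch: only copy lines that actually changed since the last
   frame; pPrevFrame is a shadow copy of the previous frame. */
#include <string.h>
#include <stdint.h>

void Screen_CopyChangedLines(const uint8_t *pNewFrame, uint8_t *pPrevFrame,
                             uint8_t *pSdlPixels, int nLines,
                             int nBytesPerLine, int nSdlPitch)
{
    int y;

    for (y = 0; y < nLines; y++)
    {
        const uint8_t *src = pNewFrame + y * nBytesPerLine;

        /* Skip the blit (and the shadow update) if nothing changed */
        if (memcmp(src, pPrevFrame + y * nBytesPerLine, nBytesPerLine) != 0)
        {
            memcpy(pPrevFrame + y * nBytesPerLine, src, nBytesPerLine);
            memcpy(pSdlPixels + y * nSdlPitch, src, nBytesPerLine);
        }
    }
}

The unchanged lines could then also be left out of the
SDL_UpdateRects() call, so neither the conversion nor the upload is
paid for static parts of the screen.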

>> > > * On platforms where SDL updates don't incur Vsync, the screen
>> > > updates could be done so that screen-blitting is skipped for
>> > > lines that haven't changed.
>
> How do you want to detect that SDL feature? Hard-coding it with #ifdefs
> is a bad idea, IMHO.

I guess there are several ways to do this, including the check for
hardware surfaces. The safest way should nevertheless be to do a bit
of profiling at buffer creation (that is, at program start _and_ on
screen mode changes). You can wait for vsync and see how much time
passes between flips. If the interval is shorter than a given
threshold - say, the frame period of a 50 Hz display - then you don't
have any vsync... A similar test is already done to get fine-grained
cycle timing (and the comments about OS X in that part of the code
are wrong, mind you - we have HPET just as much as Linux does).
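
Roughly like this - a sketch only, using SDL 1.2's SDL_Flip() and
SDL_GetTicks(); the function name and the threshold are my own
assumptions, not existing Hatari code:

/* Guess whether SDL_Flip() waits for vertical retrace by timing a
   handful of consecutive flips at start-up / mode change. */
#include <SDL.h>
#include <stdbool.h>

bool Screen_DetectVsync(SDL_Surface *sdlscrn)
{
    const int nFlips = 8;
    Uint32 start, elapsed;
    int i;

    SDL_Flip(sdlscrn);              /* sync up to the next retrace first */
    start = SDL_GetTicks();
    for (i = 0; i < nFlips; i++)
        SDL_Flip(sdlscrn);
    elapsed = SDL_GetTicks() - start;

    /* Eight flips synced to a 50-85 Hz display take on the order of
       100 ms or more; if they complete much faster than that, the
       flips are clearly not tied to vsync. */
    return (elapsed > (Uint32)(nFlips * 11));
}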

>
> Another question, Kåre, did you already have some spare time to do the
> tests that I asked you to do? (Testing Spec512 screens etc.)

Nope, sorry, I have not been in OS X much lately, to be honest - I
have been using real Ataris, going to Atari/demoscene parties and
been busy fathering my 4-month-old son :)

But the holidays are coming to an end, so here I am again :)

-Kåre


