[hatari-devel] Hatari patches for bitplane conversion

Thomas Huth huth at users.berlios.de
Mon Jul 20 22:58:32 CEST 2009


On Mon, 20 Jul 2009 22:25:42 +0300
Eero Tamminen <oak at helsinkinet.fi> wrote:

> CC'ing hatari-devel as this is of generic interest and discussed
> earlier on the list.
> 
> On Monday 20 July 2009, Kåre Andersen wrote:
> > On Sun, Jul 19, 2009 at 4:57 PM, Eero Tamminen<oak at helsinkinet.fi>
> > wrote:
> > > On Saturday 18 July 2009, Thomas Huth wrote:
> > >> since you've asked for them in another mail, here are the
> > >> conversion functions from Kaare.
> > >> For me, they are slower than the old functions, so I haven't
> > >> included them yet (and I am still waiting for some feedback from
> > >> Kaare, so I also haven't included them as an alternative yet).
> >
> > Just remember, they are not hand optimized - if you want speed out
> > of them you will want to fine tune GCC optimization flags. There is
> > no reason they should be slower in the actual conversion, because
> > they fit CPU caches a lot better than the old code...

Sorry, but for recent CPUs with 2 MB of cache or more, that argument
doesn't hold as well as it might for older CPUs (where the old code
seems to be faster anyway): the screen buffers used by the old code
take about 800 kB, so they should fit into a 2 MB cache as well!

I also don't think that this problem is related to CPU caches, since
it does not occur on Linux or Windows with modern CPUs! The problem
must be something specific to Mac OS X.

> I haven't found these extra GCC options to help much, at least if you
> apply them to the whole of Hatari (for example, although -funroll-loops
> helps sometimes, generally it just slows things down, as it makes the
> code larger so that it doesn't fit as well into the cache).

That's also too system-specific. We shouldn't include such specific
optimization flags in the default Makefile.

> > (Point being, as I have already stressed, most demos and games,
> > most of the time, don't leave the frame buffer untouched from one
> > frame to the next.)
> 
> Note that many games and demos don't write to the whole (overscan)
> screen, or they may update the screen only every other frame. Game
> menus can also be static or only partly animated.
> 
> So the comparison might still be faster with many, if not most, demos
> and games. And it preloads the line contents into the CPU cache for
> the conversion. :)
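A minimal C sketch of the line-comparison idea, assuming ST low
resolution (320 pixels, 4 interleaved bitplanes, so 160 bytes per line)
and a per-line copy of the previous frame. The function names are
hypothetical and this is not Hatari's actual conversion code. A line is
converted only when its bytes differ from the cached copy, and the
memcmp() has just read the source line, so it is already in the cache
when the conversion runs.

```c
#include <stdint.h>
#include <string.h>
#include <stdbool.h>

#define ST_LINE_BYTES  160  /* 320 pixels * 4 bitplanes / 8 bits per byte */
#define ST_LINE_PIXELS 320

/* Hypothetical helper: decode one ST low-res line (groups of four
 * big-endian plane words, 16 pixels per group) into one 8-bit
 * palette index per pixel. */
static void convert_line_4planes(const uint8_t *src, uint8_t *dst)
{
    for (int group = 0; group < ST_LINE_PIXELS / 16; group++) {
        const uint8_t *g = src + group * 8;     /* 4 planes * 2 bytes */
        uint16_t plane[4];
        for (int p = 0; p < 4; p++)
            plane[p] = (uint16_t)((g[2 * p] << 8) | g[2 * p + 1]);
        for (int bit = 0; bit < 16; bit++) {
            uint8_t color = 0;
            /* Leftmost pixel is the most significant bit; plane p
             * supplies bit p of the color index. */
            for (int p = 0; p < 4; p++)
                color |= (uint8_t)(((plane[p] >> (15 - bit)) & 1) << p);
            dst[group * 16 + bit] = color;
        }
    }
}

/* Hypothetical helper: convert the line only if it differs from the
 * copy cached during the previous frame. Returns true when the line
 * changed and was converted (i.e. it needs to be blitted). */
static bool convert_line_if_changed(const uint8_t *st_line,
                                    uint8_t *cached_line,
                                    uint8_t *chunky_line)
{
    if (memcmp(st_line, cached_line, ST_LINE_BYTES) == 0)
        return false;                   /* unchanged: skip the work */
    memcpy(cached_line, st_line, ST_LINE_BYTES);
    convert_line_4planes(st_line, chunky_line);
    return true;
}
```

Overscan lines are longer than 160 bytes, so a real implementation
would pass the line length in, and Spec512 screens would also need
their mid-line palette changes taken into account, since comparing
only the bitplane data does not see them.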

Right. And I still can't think of a really good reason why code should
suddenly execute more slowly on newer CPUs. I _really_ don't think this
problem has anything to do with CPU caches. It must be a problem with
SDL on Mac OS X or something similar.

> > > * On platforms where SDL updates don't incur Vsync, the screen
> > > updates could be done so that screen-blitting is skipped for
> > > lines that haven't changed.

How would you detect that SDL feature? Hard-coding it with #ifdefs
is a bad idea, IMHO.
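The quoted idea of skipping the blit for unchanged lines could look
roughly like this C sketch, which merges runs of changed lines (for
example the per-line results of a comparison like the one discussed
above) into full-width rectangles. ScreenRect and collect_dirty_rects()
are hypothetical names; ScreenRect only mirrors the x/y/w/h fields of
SDL 1.2's SDL_Rect so that the sketch builds without SDL. It does not
answer how to detect whether SDL updates wait for Vsync.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for SDL_Rect (x, y, w, h). */
typedef struct {
    int x, y, w, h;
} ScreenRect;

/* Merge each run of consecutive changed lines into one full-width
 * rectangle. rects must hold at least (num_lines + 1) / 2 entries,
 * the maximum possible number of separate runs.
 * Returns the number of rectangles written. */
static size_t collect_dirty_rects(const bool *changed, int num_lines,
                                  int width, ScreenRect *rects)
{
    size_t count = 0;
    int line = 0;
    while (line < num_lines) {
        if (!changed[line]) {
            line++;
            continue;
        }
        int start = line;
        while (line < num_lines && changed[line])
            line++;                 /* extend the run of changed lines */
        rects[count].x = 0;
        rects[count].y = start;
        rects[count].w = width;
        rects[count].h = line - start;
        count++;
    }
    return count;
}
```

With SDL 1.2, the resulting rectangles would be copied into SDL_Rect
structs and passed to SDL_UpdateRects(screen, numrects, rects) instead
of updating the whole screen.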


Another question, Kåre: have you already had some spare time to do the
tests that I asked you to do? (Testing Spec512 screens etc.)

 Regards,
  Thomas


