[hatari-devel] Hatari patches for bitplane conversion
Eero Tamminen
oak at helsinkinet.fi
Mon Jul 20 21:25:42 CEST 2009
Hi,
CC'ing hatari-devel as this is of general interest and was discussed
earlier on the list.
On Monday 20 July 2009, Kåre Andersen wrote:
> On Sun, Jul 19, 2009 at 4:57 PM, Eero Tamminen <oak at helsinkinet.fi> wrote:
> > On Saturday 18 July 2009, Thomas Huth wrote:
> >> since you've asked for them in another mail, here are the conversion
> >> functions from Kaare.
> >> For me, they are slower than the old functions, so I didn't include
> >> them yet (and I am still waiting for some feedback from Kaare, so I
> >> also don't include them as an alternative yet).
>
> Just remember, they are not hand-optimized - if you want speed out of
> them you will want to fine-tune the GCC optimization flags. There is no
> reason they should be slower in the actual conversion, because they
> fit CPU caches a lot better than the old code... With a bit of
> -funroll-loops and such, they should be killer... 8)
I haven't found these extra GCC options to help much, at least when applied
to the whole of Hatari (for example, although -funroll-loops sometimes helps,
generally it just slows things down, as it makes the code larger so that it
doesn't fit as well into the cache).
Changing the Hatari build to apply such flags only to specific source files
would need to be done in a way that allows specifying the flags separately
in the top-level Makefile.cnf (so that other compilers can override them).
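Something along these GNU Make lines could work (a sketch only; the
variable and file names are made up):
-------
# default in the top-level Makefile.cnf, overridable for other compilers
CONVERSION_CFLAGS ?= -funroll-loops

# in the sub-Makefile: apply the extra flags just to the conversion code
screen-convert.o: CFLAGS += $(CONVERSION_CFLAGS)
-------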
> > Some comments on the code design.
> >
> > Nowadays more and more of the computers will be battery powered so
> > reducing the *average* CPU consumption is IMHO more important in the
> > long run than the occasional peak loads.
>
> But when the peak loads are not occasional, but constant, this makes
> very little sense. Are you doing Hatari for the sake of enabling
> people to run GEM applications, or to run games and demos?
Both. There are people using Hatari to run GEM programs. I do it
occasionally too.
(Another use-case for lower average CPU usage is that --fast-forward will
then forward things faster. I do a lot of testing for Hatari, and being able
to get past the Atari boot & start things from GEM faster is nice.)
> Because i think Aranym has some head start on the former... ;)
...But point taken. :-)
> (Point being,
> most demos and games, like I have already stressed, most of the time
> don't leave the frame buffer alone from frame to frame).
Note that many games and demos don't write to the whole (overscan) screen.
Or they may update the screen only every other frame. Game menus can also
be static or only partly animated.
So the comparison might still be faster with many/most demos & games.
And it preloads the line contents into the CPU cache for the conversion. :)
> One thought
> that springs to mind, is to detect if we are in GEM or not, say by
> comparing screen address changes (GEM won't do that, most other stuff
> will), and switch comparison modes by that way - if the user enables
> it as an option naturally.
If going this route, I think it would be simplest just to use the comparison
always for the monochrome and VDI resolutions.
> > The new code will always do conversion whereas current Hatari code
> > converts only the changed parts of the screen. I.e. it optimizes for
> > the worst case instead of the general case.
> >
> > I think there could be a better compromise between these two
> > extremes...
>
> Sure there can - it just needs to be coded :)
Basically you would just need to add, at the beginning of the loop:
-------
+ for (y = 0; y < no_lines; y++)
+ {
-------
something like this (note that memcmp() takes a byte count, hence the sizeof scaling):
-------
if (memcmp(stplanes, stplanes_copy, no_words * sizeof(*stplanes)) == 0)
{
	/* line unchanged since the previous frame -> skip converting it */
	stplanes_copy += no_words;
	stplanes += no_words;
	continue;
}
-------
To make it easier to measure the effect of this, it could be a macro that's
easy to disable:
#define CONTINUE_IF_NO_SCREEN_CHANGES(stplanes,stplanes_copy,no_words) \
...
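One possible expansion, with its usage (a sketch with the same hypothetical
variable names as above; note that a do/while(0) wrapper can't be used here,
as it would swallow the continue meant for the enclosing for loop):
-------
/* needs <string.h> for memcmp() */
#define CONTINUE_IF_NO_SCREEN_CHANGES(stplanes, stplanes_copy, no_words) \
	if (memcmp((stplanes), (stplanes_copy), \
	           (no_words) * sizeof(*(stplanes))) == 0) \
	{ \
		(stplanes_copy) += (no_words); \
		(stplanes) += (no_words); \
		continue; \
	}

/* usage at the top of the conversion loop */
for (y = 0; y < no_lines; y++)
{
	CONTINUE_IF_NO_SCREEN_CHANGES(stplanes, stplanes_copy, no_words);
	/* ... convert the changed line as before ... */
}
-------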
> > Instead of Hatari checking word-by-word whether the areas have been
> > changed, it could be done line-by-line. This allows several
> > significant optimizations:
> > * Checking gets out of the conversion inner loop.
> > * Checking can be done with the standard C-library memcmp() which
> > hopefully has been hand/ASM-optimized for most platforms.
> > * On platforms where SDL updates don't incur Vsync, the screen updates
> > could be done so that screen-blitting is skipped for lines that
> > haven't changed.
>
> Yeap, this is a good idea. As already suggested by me.... ;)
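For the last point, the per-line change information could directly drive
partial updates. With SDL 1.2 it could look something like this sketch
(line_changed[] being a hypothetical per-line dirty flag array filled
during the conversion):
-------
#include <SDL.h>

/* merge runs of changed lines into single SDL_UpdateRect() calls */
static void update_changed_lines(SDL_Surface *screen,
                                 const Uint8 *line_changed, int no_lines)
{
	int y = 0;

	while (y < no_lines)
	{
		int start;

		/* skip over the unchanged lines */
		while (y < no_lines && !line_changed[y])
			y++;
		start = y;
		/* extend the rectangle over consecutive changed lines */
		while (y < no_lines && line_changed[y])
			y++;
		if (y > start)
			SDL_UpdateRect(screen, 0, start, screen->w, y - start);
	}
}
-------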
>
> > The palette stuff is also line based, so I think this would fit into
> > the conversion routines quite nicely.
>
> This is what I am doing for OpenGL rendering - which I will get back
> to hacking at any day now... Having scaling for free is super
> enjoyable - and as a bonus, the SDL mouse grab problems just go
> away... :)
How does it help with the mouse grabs?
> > Other comments...
...
> Sure, but converting once and then scaling should be a lot more
> efficient than converting several times for the same line. And for
> OpenGL, this will just go away. Oh, and memcpy() won't do for
> horizontal scaling anyway. What would work, would be a scaler in a
> separate routine from the converters, so as to avoid even more
> code duplication and allow more scaling than just double ST low...
> Hint: I use 1920x1200 on my main screen, and this is not uncommon any
> more... High res (HD) monitors are really cheap now.
For best effect one should run things fullscreen. Then the scaling would be
done by your LCD monitor, i.e. it's already HW-accelerated. :-)
If one uses non-fullscreen, then something like what you propose would be
nice. Some devices have really high DPI values (e.g. my Nokia N810 has a DPI
of 226, but doesn't provide other X server resolutions or display HW-scaling
for SDL, which is pretty annoying, as the N810 is a bit underpowered for
running Hatari even with the heaviest Hatari options turned off).
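A separate nearest-neighbour scaler along these lines could then handle the
horizontal direction without touching the converters (a sketch under my own
assumptions: 32bpp output, one already-converted line at a time):
-------
#include <SDL.h>	/* for Uint32 */

/* scale one converted 32bpp line from src_w to dst_w pixels */
static void scale_line32(const Uint32 *src, int src_w,
                         Uint32 *dst, int dst_w)
{
	int x;

	for (x = 0; x < dst_w; x++)
		dst[x] = src[x * src_w / dst_w];
}
-------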
> > * Why this change:
> > ----------
> > /* Bits per pixel */
> > - if (STRes == ST_HIGH_RES || bUseVDIRes)
> > + if (bUseVDIRes)
> > ----------
>
> Because I have implemented my conversion routines for high res as
> well, so there is no reason for high res screen conversion to try and
> force an 8bpp frame buffer any more - this is generally a very bad
> idea anyway, since people don't run indexed modes on X (and definitely
> not GDI/Cocoa) anymore. I hope...
SDL converts it automatically from 8bpp to whatever is needed, but the
direct route is surely nicer.
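For the direct route, the ST palette just needs to be mapped to the surface
format whenever the palette changes, along these lines (a sketch assuming the
STE 4-bit-per-gun palette; the STE bit shuffle is ignored for simplicity):
-------
#include <SDL.h>

/* map one STE palette entry (xxxx RRRR GGGG BBBB) to the native
 * pixel format of the SDL surface */
static Uint32 map_st_color(const SDL_Surface *surf, Uint16 stcolor)
{
	Uint8 r = (stcolor >> 8) & 0xf;
	Uint8 g = (stcolor >> 4) & 0xf;
	Uint8 b = stcolor & 0xf;

	/* expand 4 bits per gun to 8 bits and let SDL do the packing */
	return SDL_MapRGB(surf->format, r * 17, g * 17, b * 17);
}
-------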
> I will write some test programs to run from the AUTO folder right now.
> Sitting in linux atm,
Linux is fine. I think most people on Hatari mailing lists have Linux
machines (I don't have others, even the N810 runs Linux), so I think having
the routines faster there is more important than having them faster on OSX
(or Windows).
> so I won't be testing current Hg on OS X until
> tomorrow or so. Has it settled down again now? (freeze bugs that is).
The easily triggerable asserts are fixed. One rare one is still loose,
as we don't yet know why/how the HBL goes wrong in that case.
> Anyway, if the tests turn out ok, I will ship them tonight. If not,
> expect them soon :) Got a small demo (noise intro really) off my chest
> this weekend, so Hatari is getting more of my attention for a while
> now :)
Great, and thanks for the patches!
I'll try to test your stuff tomorrow.
- Eero