Some notes about 386video....

IT'S LIMITED
The current 386video module does not exploit the full power
of the XVD drivers.
I coded 386video with a generic interface (the interface won't change
in future releases), but the underlying code targets
320x200 256-color screen modes only.
If you want to exploit the full power of the XVD drivers you'll have to
enhance 386video yourself (sorry, I'm under too much pressure to do it now).

RAM BUFFERING and the VRAM BOTTLENECK

I use RAM BUFFERING to render each frame: first I compose the next
image in SYSTEM RAM and then blit it to DISPLAY RAM.
The main reason to use RAM BUFFERING is that display ram is usually SLOWER
than system ram, and usually offers lower i/o bandwidth
to the processor.
What's more, caching works better on system ram than on display ram
(again the i/o bottleneck).
So if you have to access the frame you are composing several times,
it is better to render it in the faster system ram and then copy it once
to vram.
Another reason to use ram buffering is that if you have only one visible
display page you can't use the double display page trick, so when you
update the display you have to be as fast as you can.
If all you have to do is copy the buffered image, you are using the fastest
update method available.
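
A minimal C sketch of the compose-then-copy idea above. This is not the
386video code (which is assembly); the names frame, put_pixel and
blit_frame are made up for illustration, and vram is just a pointer that
stands in for the display memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SCREEN_W 320
#define SCREEN_H 200

/* The frame being composed lives in system RAM (hypothetical name). */
static uint8_t frame[SCREEN_W * SCREEN_H];

/* Compose freely here: repeated reads and writes only hit system RAM. */
static void put_pixel(int x, int y, uint8_t color)
{
    assert(x >= 0 && x < SCREEN_W && y >= 0 && y < SCREEN_H);
    frame[y * SCREEN_W + x] = color;
}

/* One sequential copy to display RAM per frame.
   'vram' is an assumption: any buffer of SCREEN_W * SCREEN_H bytes. */
static void blit_frame(uint8_t *vram)
{
    memcpy(vram, frame, sizeof frame);
}
```

However many times a pixel is overwritten while composing, the slow
display ram sees exactly one write per byte per frame.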

On some systems, a big cache and a good bus interface make vram
look as fast as system ram, but your program has to run even on
"weak" systems with vram bandwidth bottlenecks.

DELTA BLITTING:

Plain ram buffering works well if system ram is a lot faster than vram
AND you don't have bus bandwidth bottlenecks.
The plain ISA bus is 16 bits wide (8 bits for some cards)
with a standard 8 MHz clock; this translates to 1..2 Mbyte/sec of bandwidth
when copying from memory to memory, while a plain 386/25 has at least
4..8 Mbyte/sec of bandwidth available when accessing system ram.
Some systems support a "speeded up" ISA bus (mine can run ISA at
12 MHz, others support internal buffering and "fast cycling"), but
even if you run a 120 MIPS Pentium, with an ISA (8bit or 16bit) card
you can't go far.
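
To put numbers on this: a 320x200 256-color frame is 64000 bytes, so the
bus bandwidth alone caps how many full frames per second you can copy.
The helper below is a rough sketch with an illustrative name; it assumes
1 Mbyte = 1,000,000 bytes and ignores every other cost.

```c
#include <assert.h>

/* Upper bound on full-frame copies per second over a bus that moves
   bytes_per_sec bytes each second (illustrative helper). */
static double max_full_blits_per_sec(double bytes_per_sec,
                                     int width, int height,
                                     int bytes_per_pixel)
{
    assert(bytes_per_sec > 0 && width > 0 && height > 0
           && bytes_per_pixel > 0);
    return bytes_per_sec / ((double)width * height * bytes_per_pixel);
}
```

With the 1..2 Mbyte/sec ISA figures above this gives roughly 15.6 to 31.25
full frames per second, however fast the processor is. The 23..24 FPS
reported further down for simple ram buffering falls inside that range.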

The answer to these bottlenecks is DELTA BLITTING: instead of
blitting the whole display page, blit only the differences between
the previous display frame and the next.

Usually there is a strong correlation between the image already displayed
and the next one still in system ram, so it is possible to boost
animation speed a lot.

The speed of my 'test program' was 23..24 Frames Per Second (FPS)
with "simple ram buffering", and it skyrocketed to 56..60 FPS
when I turned on delta blitting.

Of course your mileage may vary: it is up to the programmer to
set up the appropriate "delta clipping" methods for the kind
of animation being performed.

Maybe you are thinking "HA! Now there are VL-BUS and PCI,
I don't need to program for fucking old ISA ...".
Well, given the current trend, the VL-BUS and PCI buses you think are fast now
are gonna be a bottleneck for a 300 MHz SSPHARK
(SuperScalar Processor from Hell with Advanced Risky Killpower ;) )
driving a 4096x4096 24bit color mode on a 100 inches display.

THE TOUCHMAP

The bitmap you render in system ram contains the image you want on screen;
if you want to "blit only the differences" you have to store
some information to "remember" from one frame to the next what has changed.
I call this structure a TOUCHMAP: every time you modify (touch) the bitmap,
store some info in the touchmap, so that when you delta-blit
you can use the touchmap to find the altered pixels to blit.

If you want speed, the "touchmap composition" has to be an algorithm
of O(n) (linear) computational complexity, and its overhead has to be
as low as possible.

I evaluated various touchmapping methods; here is the one I chose:

        A bit-equivalent mask of the display bitmap where
        ONE BIT in the TOUCHMAP
        "marks" FOUR PIXELS (A DWORD) on the BITMAP.

The bitmap/touchmap size ratio is around 32/1 (quite good)
and the touchbits are packed into DWORDS (so, when you "touch", you use
the massive speed of 32bit transfers instead of slow bit-by-bit things).
Using loop unrolling you can pump data to the video card at full speed.
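
As an illustration, here is a C sketch of "touching" a horizontal run of
pixels with whole-dword operations. It is not taken from 386video.asm:
the name touch_span, the 320x200 constants and the bit order (the lowest
bit of a dword marks the leftmost group of four pixels) are all
assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SCREEN_W   320
#define SCREEN_H   200
#define ROW_DWORDS 3    /* 320/4 = 80 touchbits, rounded up to 96 */

static uint32_t touchmap[SCREEN_H * ROW_DWORDS];

/* Mark pixels x0..x1 (inclusive) of row y as modified. */
static void touch_span(int y, int x0, int x1)
{
    assert(y >= 0 && y < SCREEN_H);
    assert(x0 >= 0 && x0 <= x1 && x1 < SCREEN_W);

    uint32_t *row = &touchmap[y * ROW_DWORDS];
    int b0 = x0 >> 2, b1 = x1 >> 2;   /* first and last touchbit */
    int d0 = b0 >> 5, d1 = b1 >> 5;   /* dwords holding those bits */
    uint32_t first = 0xFFFFFFFFu << (b0 & 31);
    uint32_t last  = 0xFFFFFFFFu >> (31 - (b1 & 31));

    if (d0 == d1) {
        row[d0] |= first & last;      /* whole span fits in one dword */
    } else {
        row[d0] |= first;
        for (int d = d0 + 1; d < d1; d++)
            row[d] = 0xFFFFFFFFu;     /* fully covered dwords */
        row[d1] |= last;
    }
}
```

For example, touch_span(2, 124, 131) sets the top bit of the first dword
of row 2 and the bottom bit of the second one. The cost depends on the
number of dwords crossed, not on the number of pixels in the span.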

Nota bene:

The touchmap is an ARRAY OF DWORDS; each bit in a dword is a flag
for four consecutive pixels.
The touchmap has as many rows as the logical display screen height in pixels,
and each row has as many BITS as logical_display_screen_width/4,
rounded up to a multiple of 32 (so the length of a touchmap row can be
expressed in dwords).
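
The sizing rule above, written out as a small C sketch (the function
names are illustrative, not part of 386video):

```c
#include <assert.h>

/* Dwords per touchmap row: one bit per four pixels, with the bit count
   rounded up to a multiple of 32. */
static unsigned touchmap_row_dwords(unsigned width)
{
    assert(width > 0);
    unsigned bits = (width + 3) / 4;
    return (bits + 31) / 32;
}

/* Total touchmap size in bytes for a width x height logical screen. */
static unsigned long touchmap_bytes(unsigned width, unsigned height)
{
    return (unsigned long)touchmap_row_dwords(width) * height * 4;
}
```

For 320x200 this gives 3 dwords per row (80 bits used out of 96) and
2400 bytes in total against a 64000 byte bitmap. The rounding is why the
ratio is "around" 32/1 rather than exactly 32/1.
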
When you manipulate the touchmap ALWAYS USE DWORD ACCESS; this is
an absolute need to minimize "touchmap updating" overhead.
I've tested various methods: the "dword sized" touchmap is faster
than anything else on a 386 class processor animating lots of independent
objects.
This is due to the 32 to 1 ratio between actual pixel data and touchmap data
and to the "always aligned dword" access you can use with this method.
To further reduce memory usage I use a self-compiling "loop unroller";
this way, instead of checking each bit I check a byte and call
the appropriate "unrolled loop" for it.
With this method I perform only one compare and call
instead of eight compares and branches (this keeps my 386 happy,
because the fewer the jumps, the more the pipeline stays filled and running).
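
A C sketch of a delta blit for one row, checking the touchmap a dword
and then a byte at a time. This only illustrates the idea: the inner
bit loop is a plain stand-in for the generated per-pattern unrolled
routines described above, and the name delta_blit_row, the constants and
the bit order (lowest bit = leftmost four pixels) are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define SCREEN_W   320
#define ROW_DWORDS 3    /* touchmap dwords per row for 320 pixels */

/* Copy the touched dwords of one bitmap row from src (system ram) to
   dst (display ram) and clear the row's touchbits for the next frame.
   Returns the number of dwords copied. */
static int delta_blit_row(uint32_t *dst, const uint32_t *src,
                          uint32_t *touch_row)
{
    assert(dst && src && touch_row);
    int copied = 0;

    for (int d = 0; d < ROW_DWORDS; d++) {
        uint32_t bits = touch_row[d];
        if (bits == 0)
            continue;               /* 128 untouched pixels, one compare */
        touch_row[d] = 0;

        for (int byte = 0; byte < 4; byte++, bits >>= 8) {
            uint8_t m = (uint8_t)bits;
            if (m == 0)
                continue;           /* 32 untouched pixels, one compare */
            int base = d * 32 + byte * 8;
            /* Stand-in for the unrolled routine selected by 'm'. */
            for (int i = 0; i < 8; i++) {
                if (m & (1u << i)) {
                    assert(base + i < SCREEN_W / 4);
                    dst[base + i] = src[base + i];
                    copied++;
                }
            }
        }
    }
    return copied;
}
```

Calling this once per row after composing a frame moves only the touched
dwords across the bus and leaves the touchmap empty for the next frame.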

WHY 256 COLORS ONLY

The 8bit/pixel modes are the least processor intensive you can find
(this means lots of speed), and 256 colors are good enough for most games.
You can blit 4 dots in a single memory access, mask quickly
and implement fast compression/decompression methods if needed.
If you think 16 or 24 bit/pixel modes are nicer to look at, you are right,
but the most common display cards use dynamic ram, and this means that the
higher the video refresh bandwidth, the lower the cpu/blitter bandwidth.
What's more, I want things that can run in 4 Mbyte; having bitmaps
two to four times the size of plain 256 color ones is no good.




For further info and explanations look into 
386video.asm, 386video.inc, driver.txt, xvd.txt, makefile
and the XVD driver sources (for example chips450.asm)

Ciao,
   Lorenzo Micheletto   knight@maya.dei.unipd.it