Safbench v2.13
                        Compiled 6 July 1997
         32bit protected mode computer, main and video memory
          speed test program for 80386+ CPUs and 387+ FPUs

		 (C) Copyright 1997 by Sami Farin
			All Rights Reserved
		      mailto:sfarin@ratol.fi
		   http://www.ratol.fi/~sfarin/




    Index
      1.   What's this?
      2.   Requirements
      3.   Usage
      4.   About Safbench
      5.   Command line arguments
      6.   Technical (?) stuff
      7.   Tables
      8.   VESA info abbrevs
      9.   Errorlevels
      10.  Final words





[1.] - What's this?

      Safbench v2.13 is a computer speed test program. It runs in 32bit
   protected mode (using DOS32 v3.3 by Adam Seychell). Safbench is
   programmed entirely in assembly because, for example, interleaving
   integer instructions with FPU code the way Safbench does can't be
   controlled from high-level languages ("Simple arithmetic FPU" and "Mixed
   FPU&CPU" contain the same FPU instructions, but the Mixed test has some
   integer instructions between the FPU operations).
   CPU, FPU, main memory and video memory speeds can be tested.
   Safbench hooks the timer interrupt to achieve stable results with
   every BIOS and (D)OS configuration; TSRs don't get called while Safbench
   runs. Timing accuracy is about one microsecond (assuming you don't have a
   buggy or shitty 8253/8254 timer chip).
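
   The 8253/8254 counts at about 1.193182 MHz on PC compatibles, so one tick
   is roughly 0.84 microseconds, which is presumably where the "about one
   microsecond" figure comes from. Here's a minimal C sketch of turning tick
   counts into time, assuming a mode 2 (rate generator) down-counter that
   wraps at most once between two reads. The function names are
   hypothetical, not Safbench's own routines.

```c
#include <stdint.h>

/* Input clock of the 8253/8254 PIT on a PC, in Hz. */
#define PIT_HZ 1193182.0

/* Convert a number of PIT ticks to microseconds (about 0.838 us/tick). */
double pit_ticks_to_us(uint32_t ticks)
{
    return ticks * 1e6 / PIT_HZ;
}

/* Ticks elapsed between two reads of a mode 2 down-counter with the given
   reload value (65536 when the counter is programmed with 0). Assumes the
   counter wrapped at most once and less than a full period passed. */
uint32_t pit_elapsed(uint32_t start, uint32_t end, uint32_t reload)
{
    return (start >= end) ? start - end : start + reload - end;
}
```
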



[2.] - Requirements

      Safbench needs an 80386 or better CPU with a 387 or better FPU.
   The 287 lacks some instructions needed by the tests.
   About 4 MB of free memory is needed if you use option 'M' to test
   memory speed. If you use option 'V' or 'F', Safbench needs as much free
   memory as the largest video mode consumes video memory (for example,
   1600x1200x16M needs 7500 kB).
   VBE v2.0+ is needed for 'V' and 'F'.



[3.] - Usage

      This is a VERY user-friendly program. You only have to type 'safbench'
   and press enter (assuming you are using a command interpreter).
   See chapter [5.] for more info.



[4.] - About Safbench

      Safbench tests your CPU and FPU very thoroughly. It uses almost every
   instruction your processor has, and they are balanced so that the
   instructions used most in applications are also the ones used most in
   Safbench.
      But every application uses the CPU and memory differently, so there
   can't be a single computer speed test program which tells the absolute
   and correct speed of your computer. Safbench tries to get close by testing
   the CPU and FPU with different kinds of code. It also tests memory with
   different block sizes: it reads, writes and moves in sequential, reverse
   and butterfly order using block sizes from 1 kB to 4 MB.
   So you can test how fast your new SDRAM (or something) is.
   About the memory move routine: when it says 4 MB, it means that it
   uses about 4 MB of memory. It copies 2 MB from the start of the buffer
   to the middle of the buffer, so it reads 2 MB and writes 2 MB.
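
   A minimal C sketch of the move test as described above, assuming a
   buffer of 32bit words: the first half is copied into the second half,
   either sequentially or in reverse order. The function names are
   hypothetical; Safbench itself does this in assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy the first half of buf into the second half, first dword first.
   total_bytes is the whole buffer size, so half of it is read and half
   of it is written. */
void move_half_sequential(uint32_t *buf, size_t total_bytes)
{
    size_t n = total_bytes / 2 / sizeof(uint32_t);   /* dwords per half */
    const uint32_t *src = buf;
    uint32_t *dst = buf + n;
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Same copy, but starting from the last dword and working downwards. */
void move_half_reverse(uint32_t *buf, size_t total_bytes)
{
    size_t n = total_bytes / 2 / sizeof(uint32_t);
    const uint32_t *src = buf;
    uint32_t *dst = buf + n;
    for (size_t i = n; i-- > 0; )
        dst[i] = src[i];
}
```

   For the "4 MB" case, total_bytes would be 4194304: 2 MB read and 2 MB
   written.
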
      Butterfly accessing (when reading and writing) means that it first
   reads 32 bits from the start of the memory block, then from the end of
   the block, then from start+4 bytes and end-4 bytes, start+8 bytes and
   end-8 bytes, and so on. If you see butterfly read speeds which are faster
   than sequential read speeds, it's because your CPU fills the whole cache
   line when the requested data item is not present in the CPU's internal
   cache (the line is 32 bytes on Pentiums). Butterfly routine:
    1) read from start of buffer
    2) write to end of buffer
    3) write to start of buffer+4bytes
    4) read from end of buffer-4bytes
    5) increase start pointer by 8 bytes and decrease end pointer by 8 bytes
    6) goto 1
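
   The numbered steps above can be written as a short C loop. This is a
   sketch assuming a buffer of n 32bit words with n a multiple of 4; the
   function name and the value written are hypothetical, and the values read
   are XORed together only so the reads have a visible effect. Every dword
   gets touched exactly once: even-indexed dwords are read, odd-indexed ones
   are written.

```c
#include <stddef.h>
#include <stdint.h>

/* Butterfly access over n dwords (n a multiple of 4, at least 4).
   Writes value v; returns the XOR of all values read. */
uint32_t butterfly(uint32_t *buf, size_t n, uint32_t v)
{
    uint32_t acc = 0;
    size_t s = 0, e = n - 1;        /* start and end dword indices */
    while (s < e) {
        acc ^= buf[s];              /* 1) read from start */
        buf[e] = v;                 /* 2) write to end */
        buf[s + 1] = v;             /* 3) write to start + 4 bytes */
        acc ^= buf[e - 1];          /* 4) read from end - 4 bytes */
        s += 2;                     /* 5) start += 8 bytes ... */
        e -= 2;                     /*    ... end -= 8 bytes */
    }                               /* 6) repeat until they meet */
    return acc;
}
```
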
      Random accessing first builds a table which contains randomly
   generated pointers to different places in memory. The 'Mixed' test does
   complex memory accessing; it reads, writes and performs "read/modify"
   and "read/modify/write" instructions (for example additions, shifts,
   and bit operations). For example, when testing 'Mixed' 4MB, Safbench
   reads one 32bit pointer (random) from memory, then reads/writes/etc.
   at that position (and around it), and then gets the next pointer...
   The pointers are STORED sequentially, but they point to random places
   in memory. The 4MB test actually uses 8MB. The MB/s value for "Random
   accessing" is calculated from the ACTUAL number of BYTES accessed,
   including reading the pointers. The MB/s value for Reads and Writes is
   based on the bytes read or written, pointers not included. For Reads and
   Writes, the whole cache line (32 bytes) is accessed, so write speeds to
   memory are fair on CPUs which allocate a new cache line on a write miss
   (such as M1 and PPro/PII).
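
   A minimal C sketch of the pointer-table idea, under several assumptions:
   the generator is a small LCG with the well-known Numerical Recipes
   constants (Safbench's own generator isn't documented here), each pointer
   is a dword offset aligned to a 32-byte line, and only the "Read" variant
   is shown. It keeps two byte counts, one without the pointer reads (as for
   Reads and Writes) and one including them (as for the "Random accessing"
   figure). All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_DWORDS 8   /* 32-byte cache line = 8 dwords */

/* Tiny LCG (Numerical Recipes constants); an assumption for this sketch. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Store `count` pointers sequentially; each is a random dword offset of the
   start of a 32-byte line inside a data block of `data_lines` lines. The
   high bits are used because an LCG's low bits are weak. */
void make_pointer_table(uint32_t *table, size_t count,
                        uint32_t data_lines, uint32_t seed)
{
    for (size_t i = 0; i < count; i++)
        table[i] = ((lcg_next(&seed) >> 8) % data_lines) * LINE_DWORDS;
}

/* For each pointer, read the whole 32-byte line it points to.
   data_bytes excludes the pointer reads; total_bytes includes them. */
uint32_t random_read(const uint32_t *table, size_t count,
                     const uint32_t *data,
                     uint64_t *data_bytes, uint64_t *total_bytes)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < count; i++) {
        const uint32_t *line = data + table[i];   /* one pointer read */
        for (int k = 0; k < LINE_DWORDS; k++)
            acc += line[k];                       /* 32 bytes of data */
    }
    *data_bytes = (uint64_t)count * LINE_DWORDS * sizeof(uint32_t);
    *total_bytes = *data_bytes + (uint64_t)count * sizeof(uint32_t);
    return acc;
}
```
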
      If you want to see how fast your main memory is (for example,
   after upgrading from FPM to SDRAM), disable the L2 cache and run
   'Qsort32 16M' and 'Safbench R'.

      If you've been using CacheChk or something similar to test memory read
   speeds, it gives lower results (at least for the 1st level cache), because
   it uses string instructions to read data (REP LODSD). Safbench uses a
   faster method. Your CPU might need less than one cycle per read or write
   to its 1st level data cache, i.e. it can do as many as two reads in
   parallel; Pentium-class and newer machines are capable of this. REP LODSD
   is a stupid way to check cache speed, since NO PROGRAM uses REP LODSD to
   read from memory (if one does, it's a braindead program anyway). LODSD
   takes at least one cycle on every CPU and it isn't pairable, so you CAN'T
   read two double words in one cycle with it.
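
   The point about pairing can be illustrated in C: the loop below gives the
   CPU two independent loads per iteration with separate accumulators,
   instead of one serial LODSD per double word. This is only a sketch of the
   idea, not Safbench's actual method; the function name is hypothetical,
   and the compiler decides the exact instructions emitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum a block of dwords using two independent accumulators, so the two
   loads in each iteration don't depend on each other (on a Pentium such
   loads can pair). n_dwords is assumed to be even. */
uint32_t read_block(const uint32_t *p, size_t n_dwords)
{
    uint32_t a = 0, b = 0;
    for (size_t i = 0; i < n_dwords; i += 2) {
        a += p[i];
        b += p[i + 1];
    }
    return a + b;
}
```
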
      Pentium Pro has a 256 or 512 kB 2nd level INTERNAL cache, which is damn
   fast, a lot faster than any PB-cache, because it runs at core speed.
   Pentium II's 2nd level cache is separate SRAM that runs at half the core
   speed (a 266 MHz CPU has its L2 cache running at 133 MHz...).

      "Simple integers" tests simple CPU instructions, such as additions,
   logical operations, jumps, memory and bit manipulation. Most of them
   execute in one cycle and in parallel on a Pentium-class machine.
      "Complex integers" tests multiplications, divisions, bit scans, string
   instructions, complex memory addressing and manipulation etc. If you get
   poor values on a Cyrix 6x86 in this test, try running 6X86OPT; it enables
   various options, such as N_LOCK.
      "Simple arithmetic FPU" tests additions, subtractions, multiplication,
   division, comparisons, sign changes, and loads and stores with different
   memory sizes and values. These kinds of instructions are the most used in
   most applications and games (such as Quake :) ).
      "Mixed FPU&CPU" tests the same instructions as the previous test, but
   does some work with the integer unit while the FPU is processing data.
   A 386 can do nothing but wait for the 287/387 to complete its task; a 486+
   can use the CPU while the FPU is processing data.
      "Transcendental FPU" tests transcendental instructions. They include
   sine, cosine, tangent, arctangent, sine and cosine, 2^x-1, Y*log2(X) and
   Y*log2(X+1). 24 different values are tested for every instruction. These
   are the most time-consuming instructions your FPU has.
      "Non-transcendental FPU" tests the other arithmetic instructions not
   covered in the "Simple arithmetic FPU", "Transcendental FPU" and
   "Processor control FPU" tests, for example square root, scale, extract,
   remainder and integer part. 16 different valid values are tested.
      "Processor control FPU" contains some system-managing stuff, such as
   FPU initialization, FPU state store and restore, environment save and
   restore, control word store/load, status word store, exception clearing,
   constant loading (0, 1, pi, different log values), BCD loads and stores,
   and examine (FXAM). The speed of these instructions isn't a big deal when
   comparing FPU speeds. They ARE used, though not very much. For example,
   FPU state store/restore is used in multitasking environments to save the
   FPU state when switching tasks.
      "16bit speed" encodes binary data to MIME64 and does a lot of work
   with mixed 8, 16 and 32 bit registers. This test shows how well your CPU
   can execute 16bit code with prefixes and such things. PPro has serious
   trouble with this test, as does PII: "partial register stalls" get
   generated on them quite a lot.
      "Average speed" is the average of the eight tests compared to a selected
   microprocessor family (386, 486, Pentium, Cyrix 6x86, PPro, PII, K6).

      64bit precision is used in the calculations (precision control affects
   only the instructions FADD, FSUB, FMUL, FDIV and FSQRT). On Pentiums, only
   FDIV executes faster when lower precision is used. With Cyrix 6x86, every
   FPU instruction takes the same number of clock cycles, whatever the
   precision.
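
   Precision is selected by the precision-control (PC) field, bits 8-9 of
   the x87 control word: 00 = 24 bits, 10 = 53 bits, 11 = 64 bits. FINIT
   sets the control word to 037Fh, i.e. 64bit precision. Here's a C sketch
   of computing the value you would load with FLDCW; the enum and function
   names are hypothetical, and the FLDCW itself (which needs inline
   assembly) is not shown.

```c
#include <stdint.h>

/* Values of the x87 precision-control field (control word bits 8-9). */
enum x87_precision {
    PC_24 = 0x0,   /* 24-bit significand (single) */
    PC_53 = 0x2,   /* 53-bit significand (double) */
    PC_64 = 0x3    /* 64-bit significand (extended), the FINIT default */
};

/* Return cw with its precision-control field replaced by pc. */
uint16_t x87_set_precision(uint16_t cw, enum x87_precision pc)
{
    return (uint16_t)((cw & ~0x0300u) | ((unsigned)pc << 8));
}
```

   For example, x87_set_precision(0x037F, PC_24) gives 0x007F, the kind of
   setting the Quake discussion in chapter [6.] refers to.
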

      The video memory speed test measures your video card memory's write
   speed and moves from main memory (or cache) to video memory. Only
   graphics linear framebuffer (LFB) modes are tested. If your card doesn't
   support them, that's your problem. Without LFB, bank switching must be
   performed after every 64 kB. Using S3VBE20 or UniVbe is a wise idea to
   increase performance, since bank switching is not very fast. The write
   speed test measures the absolute maximum write speed of your video card
   in every possible mode. The screen might look complex and colorful, but
   the written data is changed only after the whole screen is filled. If you
   want to see, for example, which video card is "the best for DOS games",
   test the cards and buy the one which has the best "Write" values at the
   resolution and color depth you use in the game. Safbench doesn't test 3D
   speed or GUI-acceleration speeds.
   For example (again...), Quake calculates the screen in main memory
   (+caches) and then moves it to video memory. That's where the write speed
   of the video card is needed.
      If you don't like the CLICKs when switching video modes, turn off your
   monitor...



[5.] - Command line arguments

   Here are the possible options...

   ?: Help help help help help...

   M: Test memory speed (needs 4112 kB free mem; if 8224 kB is available,
      Random access speed is also tested)
      See Chapter [4.] for more info (line 68).

   R: Test only Random access speed (needs 8224 kB)

   !: Use 31 different block sizes for Random accessing test when using
      'M' or 'R'

   B: Test BUS speed (MHz), multiplier, MHz-rate and main mem read/write speed.
      Use this on Pentiums (with or without MMX). Sorry, but the Bus value
      isn't correct, even though it counts the number of cycles during which
      the processor's external memory bus is in use (according to Intel).
      Could you please tell me how to make it calculate the REAL bus speed?
      A 4MB block size is used for reading.
      The number of bus cycles and CPU cycles needed to read/write one
      quadword (64 bits) is also displayed (a RAM bank is 64 bits wide):
       -bus cycles/quadword = bus speed (MHz) / (MB/s * 1.048576 / 8)
       -cycles/quadword     = CPU speed (MHz) / (MB/s * 1.048576 / 8)
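
      A worked version of the two formulas above, as a C sketch (the
      function name is hypothetical). MB/s is in units of 1048576 bytes, so
      MB/s * 1.048576 is millions of bytes per second, and dividing by 8
      gives millions of quadwords per second; a clock rate in MHz divided
      by that is cycles per quadword. Pass the bus speed to get bus cycles,
      or the CPU speed to get CPU cycles.

```c
/* Cycles needed per 64-bit quadword, given a clock rate in MHz and a
   transfer speed in MB/s (1 MB = 1048576 bytes). */
double cycles_per_quadword(double clock_mhz, double mb_per_s)
{
    double mquads_per_s = mb_per_s * 1.048576 / 8.0;  /* millions of quadwords/s */
    return clock_mhz / mquads_per_s;
}
```

      For example, a 65.536 MHz clock and 100 MB/s give
      65.536 / 13.1072 = 5 cycles per quadword.
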

      Use 'B' or 'P' with option 'C' to test cache accessing (256 kB block
      size). I hope nobody has 128 kB (or less) L2 cache on a Pentium!

   P: Same as 'B', but for Pentium Pros and Pentium II's.

   V: Test video memory speed (needs VBE v2.0+ with at least one LFB mode)

   F: Same as 'V', but use 64bit accessing with FPU (intended for Pentiums)

   F:xxxx or V:xxxx tests VESA mode (in hex), for example 'F:105'

   I: Display info about every VESA mode

   S: Show every graphics linear framebuffer mode which can be tested

   W: Wait one second after mode set (default is no wait)

   0: Compare your CPU and FPU speeds to Intel 386SX 20 MHz with Cyrix FasMath

   1: Intel 486DX 33 MHz

   2: Intel Pentium 90 MHz

   3: Cyrix 6x86-P120+ 100 MHz

   4: Intel Pentium Pro 180 MHz

   5: Intel Pentium II 266 MHz

   6: AMD K6 200 MHz

      If you've got a Pentium MMX, PPro, PII or Cyrix 6x86MX, I'd be pleased
   to receive the test results! Run the MAKERES batch file for your CPU and
   send me the results so I can include them in the next release of
   Safbench. Run it with no stupid Windows in the background. ('MAKERES1.BAT'
   is for Pentiums, 'MAKERES2.BAT' is for PPro and PII CPUs, 'MAKERES.BAT'
   is for 6x86, 6x86MX, K5, K6 etc.)
     Run the "usual" speedup programs when getting the results: 6X86OPT (ALSO
   with option '-l'), S3VBE20, MCLK, FASTVID and so on.



[6.] - Technical (?) stuff

      All eight tests in Safbench use less than 4 kB of memory each, so
   neither the 2nd level cache nor main memory speed affects the results. To
   see how fast your memory subsystem is, try command line option 'M' or run
   Qsort32.

      The video copy test in Safbench moves data from main memory or cache to
   video memory. Depending on how much memory the video mode uses, the READ
   speed of your 1st level cache, 2nd level cache (if any) or main memory
   limits the copy speed. If you have 256 kB of cache, you probably get
   constant copy speeds for video modes which use less than 256 kB of memory.
   640x480x256c needs 300 kB of memory, so main memory speed limits the
   maximum moving speed, and the speed of the 2nd level cache doesn't matter
   (very much) in that case. On my S3, 16M modes are a lot slower than other
   modes. Read speeds from video memory are not tested, because you DON'T
   NEED TO KNOW THEM. If a program reads from your video card, it's a stupid
   program or it reads just a little, for example when updating the mouse
   cursor.

     If your processor can write to video mem in 64bit chunks (PPro with write
   combining enabled, Cyrix 6x86 with write gathering enabled, or Pentium when
   the FPU is used to write 64 bits per instruction), you get better results
   for both write and copy speeds in Safbench. If you get much better values
   with option 'F' than with 'V', your processor doesn't support write
   gathering/combining (or it isn't enabled) and the datapath to video mem is
   64 bits. With a PPro or Cyrix 6x86 (or newer...), you should get about the
   same write speeds with both 'F' and 'V'. Copy speeds can be a bit
   different; if 'F' is faster, you've got a relatively quick FPU, which also
   means that reading in 64bit chunks with the FPU is faster than reading in
   32bit chunks with the CPU. If 'F' is slower, you've got a slow FPU and/or
   the datapath to memory is 32 bits.
   FLD [QWORD ESI+i] [1] and FSTP [QWORD EDI+i] are used for copying from main
   memory or cache to video mem, and FST [QWORD EDI+i] [2] is used for writing.
   [1]: ST(0) contains a valid non-zero value
   [2]: ST(0) contains a valid non-zero value and it's increased by a big
        number after every screen written
   Option 'F' is intended to be used with Pentiums (not PPros), since the
   Pentium can't do write gathering. The video write test in Safbench is
   supposed to tell you the maximum throughput of a particular video card
   with every processor. You don't necessarily get better results with 'F'
   on Pentiums, though. FILD and FIST(P) aren't used to copy/write data,
   since they are a lot slower than FLD and FST(P). (FLD and FSTP can't be
   used in the "real world" to copy data, since a loaded value can be NaN,
   Denormal etc.)

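   To illustrate copying in 64bit chunks, here's a C sketch that copies a
   frame 8 bytes at a time using 64bit integer loads and stores (the function
   name is hypothetical). Unlike FLD/FSTP, integer moves pass NaN and
   denormal bit patterns through unchanged. Note that on a 32bit x86 target
   a compiler may well split each uint64_t move into two 32bit moves, which
   is exactly why Safbench uses the FPU for 64bit accesses on Pentiums.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `bytes` bytes from src to dst in 64-bit chunks. Any tail of fewer
   than 8 bytes is left untouched (screen sizes are multiples of 8). */
void copy_frame_64(void *dst, const void *src, size_t bytes)
{
    const unsigned char *s = src;
    unsigned char *d = dst;
    for (size_t i = 0; i + 8 <= bytes; i += 8) {
        uint64_t q;
        memcpy(&q, s + i, 8);   /* one 64-bit load */
        memcpy(d + i, &q, 8);   /* one 64-bit store */
    }
}
```
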
   To get max speed out of your video card, I suggest the following:
    -First, use S3VBE20 or UniVbe if VBE v2.0 isn't already supported
    -If you've got a Cyrix 6x86, run 6X86OPT -linbuf. It modifies the Address
     Region Register (enables write gathering for the linear frame buffer).
     It can almost double the write speed to the LFB. VBE v2.0+ needed.
    -Try MCLK; overclocking might give speedups. I overclocked from 60 to 85
     MHz and it sped up writes by 25 to 120%! Don't overclock too much, since
     it might cause lockups in Windows etc. when the driver doesn't get the
     data it expects. Speeds over 85 MHz didn't give me any benefit at all.
     I noticed that using option /1 3 (use FPM timing) was faster than the
     default 2-cycle EDO (with my video card!).
    -Use FASTVID or a similar program on a PPro if your BIOS doesn't turn on
     some bits which affect performance (DON'T use it if you have a buggy
     chipset, such as Orion 450GX rev. A2).

      I use MCLK and 6X86OPT and they speed up writes to video mem by 1.7
   to 2.6 times (1.95 times on average). Now my S3 Trio64 PCI runs at 85 MHz
   and writes 67 MB/s in 320x200x256c mode. That's 17 MB/s more than my main
   memory write speed. Windows speed also increased (due to the overclocking)
   by 10-85%.

      About the 'V' and 'F' display:
   Mode = VESA mode (hexadecimal). BPP means Bits Per Pixel: 4=16 colors,
   8=256 colors, 15=32K colors, 16=64K colors, 24=16M colors, 32=16M too.
   BPSL means Bytes Per Scan Line; multiplied by Height, it gives the number
   of bytes of memory the video mode takes. FPS is Frames Per Second (the
   MB/s rate converted to bytes per second, divided by BPSL*Height). Write
   and Copy speeds are in megabytes per second (1 MB = 1048576 bytes). Hz
   means vertical refresh rate, in case you didn't guess it!
   Windows(tm) Screws Up(tm) that result, too.
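
   The numbers in this display are related by simple arithmetic. Here's a C
   sketch with hypothetical function names, assuming BPSL and Height as
   reported by the VESA mode info. For example, a 1600x1200 mode at 32 bits
   per pixel with no padding at the end of each scan line has BPSL 6400, so
   one screen takes 7680000 bytes = 7500 kB, the figure given in chapter [2.].

```c
#include <stdint.h>

/* Bytes one screen of a video mode occupies. */
uint32_t mode_bytes(uint32_t bpsl, uint32_t height)
{
    return bpsl * height;
}

/* Frames per second from a speed in MB/s (1 MB = 1048576 bytes). */
double frames_per_second(double mb_per_s, uint32_t bpsl, uint32_t height)
{
    return mb_per_s * 1048576.0 / mode_bytes(bpsl, height);
}
```

   With the 67 MB/s in 320x200x256c mentioned above (BPSL 320, Height 200),
   that's about 1097 frames per second.
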

   The test goes like this:
   1) Switch to the video mode to be tested
   2) Fill video memory with some data
   3) If option 'W' is used, go to 2 until one second has elapsed (waits for
      the screen to "settle down")
   4) Write to the video memory (clears the whole screen with different colors)
      This test lasts for about 0.3 seconds. That's enough; testing for one
      minute doesn't give any better results.
   5) Move from main memory (or cache, depending on the mode&cache size) to
      video memory. Data is moved once before starting the timer to flush
      caches etc. This takes 0.3 seconds, too.
   6) Check Hz-rate (0.1 seconds?)
   7) Store results (0.01 microseconds?)
   8) Goto 1
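
   Steps 4 and 5 boil down to "repeat a full-screen operation until a time
   budget is used up, then divide bytes by time". Here's a C sketch of the
   write step, assuming a plain memory buffer stands in for the LFB and
   using the standard clock() function instead of Safbench's own timer code
   (clock() measures processor time, which is close enough for a sketch).
   The function name is hypothetical; seconds must be greater than zero.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Fill the screen repeatedly for about `seconds`, changing the fill value
   only after each full screen, and return the speed in MB/s
   (1 MB = 1048576 bytes). */
double write_speed_mbps(uint8_t *screen, size_t bytes, double seconds)
{
    uint64_t written = 0;
    uint8_t color = 0;
    clock_t start = clock(), now;
    do {
        memset(screen, color++, bytes);   /* one full screen */
        written += bytes;
        now = clock();
    } while ((double)(now - start) / CLOCKS_PER_SEC < seconds);
    double elapsed = (double)(now - start) / CLOCKS_PER_SEC;
    return written / 1048576.0 / elapsed;
}
```

   For example, write_speed_mbps(lfb, 320 * 200, 0.3) would time about 0.3
   seconds of 320x200x256c screen fills, if lfb pointed to a mapped frame
   buffer.
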

   MCLK homepage: http://www.oac.uci.edu/~rliao
   S3VBE20 homepage: http://www.uni-muenster.de/math/u/mesched
   UniVbe (Display Doctor) homepage: http://www.scitechsoft.com/down_sdd.html

      You probably get better values when the computer is "cold", just turned
   on. When it warms up, write speed of every video mode decreases by ~0-6%.
   If someone has a nice technical explanation for this little thing, I'd
   be pleased to hear it.


      FPU tests in Safbench measure the _RAW_ power of your FPU; "real world
   apps" ALWAYS include some integer and memory accessing. So the FPU results
   are almost-worst-case results.

      Let's take an example. PovRay (the version 3 _I_ have) uses
   Pentium-optimized code. This means that it uses the FXCH instruction to
   swap FPU registers so that the FPU can overlap instructions better, since
   most FPU instructions need ST(0) (TOS, Top Of Stack).
   8087+ FPUs have eight registers, ST(0) to ST(7). If an instruction
   needs the result of the previous instruction, overlapping is not
   possible. FXCH is optimized so well on Pentiums that it can be
   executed even while the register is in use, so Pentium-optimized FPU
   programs use FXCH quite a lot. That's a big speed decrease for CPUs other
   than Pentiums. For example, a 387 needs 18 cycles for FXCH, and addition
   takes 23-37 cycles. A 486 needs 4 for FXCH and 8-20 for addition; a Cyrix
   6x86 needs 3 and 4-9. A Pentium CAN execute them BOTH, or just the
   ADDITION, in 3 cycles. Pentiums are also able to overlap FPU instructions,
   so it's possible to execute FADD, FSUB or FMUL in one cycle (assuming the
   result of those operations is not needed within the next three cycles).
   Safbench uses FXCH only a little.
      So... PovRay would run faster on 387, 486 and Cyrix 6x86 if it didn't
   use FXCH to achieve speedups on Pentiums (yes, I know there are still
   compilers which support 486 optimizations).

      Oh well. Then there's Quake. That's also an example of a program that
   uses the FPU very extensively, and it's optimized for Pentiums. So you are
   probably not surprised that it performs poorly on Cyrix 6x86. Please
   note that Pentium 120 runs at 120 MHz, but Cyrix 6x86-P120+ runs at 100 MHz.
   If Quake were optimized for, let's say, 486, Cyrix 6x86-P120+ wouldn't seem
   so slow when compared to Pentium 100. Quake also does integer calculation.
   Quake is so fast on Pentium Pros because the PPro has a VERY fast 2nd level
   cache (read speed about 400 MB/s on PPro/180), and its CPU and FPU are
   about 20% faster than on Pentiums (at the same MHz rate). PPro has a nice
   technique which allows 64bit moves in one cycle.
      I've read articles about Quake and Cyrix 6x86 which stated that Quake
   uses the FPU only to BUFFER data, but I don't agree. After debugging
   Quake, I found that it actually did something 'useful', such as
   multiplication and addition. I didn't see it move data via the FPU in
   64bit chunks. In fact, it doesn't spend much time moving data, except to
   video mem. Cyrix 6x86 does write gathering (so data is moved in 64bit
   chunks anyway, even when software uses 32 or 16bit moves), so it's best
   to use the CPU to move the data (REP MOVSD)... And the same goes for
   Pentium Pros. But the Pentium doesn't do write gathering, so it might be
   faster to use the FPU to move data. Quake is optimized purely for
   Pentiums; it takes advantage of FXCH being pairable with some FPU
   instructions, so FXCH can be executed at no extra cost. On a 6x86 FXCH
   takes 3 clocks.
      Also, the main routine in Quake is Pentium-optimized assembly code,
   using for example FDIV with the lowest precision. That FDIV is
   parallelized with integer ops (C compilers can't produce that kind of
   optimized code). FDIV takes 39 cycles on a Pentium at the highest
   precision (the default, 64 bits) and 34 cycles on a Cyrix 6x86. But
   Quake uses 24 bit precision. With Cyrix 6x86, that FDIV still takes 34
   cycles, but with Pentium only 19 cycles. Quake is 100% FPU down to the
   8 or 16 pixel subdivisions. Quake is optimized so that the FPU and integer
   units complete roughly at the same time, but that piece of code is timed
   for Pentiums! Quake is not a GOOD benchmark of FPU speed; it's a GOOD
   benchmark of Quake speed.
   See http://www.gamers.org/dEngine/quake/papers/mikeab-cgdc.html for more
   information.

      My Cyrix 6x86-P120+ at 100 MHz runs PovRay as fast as a Pentium at
   75 MHz, so the Cyrix's FPU speed isn't that bad. Its integer performance,
   in contrast, is a lot better. Quake isn't the only program in the world.

     Excel also uses the FPU (long reals, 64 bits) in its calculations...
   But there's so much overhead that FPU speed doesn't matter very much.

   Some size statistics of Safbench.EXE:
    - Main code  4.8  kB (timing routines, number displaying etc.)
    - Mem speeds 9.0  kB (main and video memory speed tests)
    - Test code  7.2  kB (Simple Int., Complex Int., FPU stuffs etc.)
    - Data       6.5  kB (FPU test data (long reals etc), texts etc.)
   The executable is compressed and it contains the stub loader...



[7.] - Tables

   Here are the results I've got from different computers
   (Cyrix 6x86MX and Pentium MMX results still missing!).

Nr Test
1: Simple integers
2: Complex integers
3: Simple arithmetic FPU
4: Mixed FPU&CPU
5: Transcendental FPU
6: Non-transcendental FPU
7: Processor control FPU
8: 16bit speed

     386/20    486/33     P5/90  CxM1/100  PPro/180   PII/266    K6/200
1:   6319.5   37555.5  183882.0  220270.9  407299.4  607075.1  409915.5
2:   4553.1   15640.6   58318.9   84934.8  162444.4  245181.8  142768.1
3:   2797.3   13836.4  108689.0   69841.8  255503.7  383141.8  207291.2
4:   2122.9   12229.5   82968.0   66460.9  182355.0  271606.9  158958.5
5:    453.1     912.2    4774.9    3395.4    9733.3   14334.9    9253.9
6:   2922.9    7123.7   26361.4   32720.5   56377.8   83696.8   72311.5
7:   4854.7   15134.1   58890.6   70579.9   87067.2  134227.8  101341.5
8:   2631.2    8595.4   45247.5   56439.4   53113.3   79437.0   86928.4

   If you've got an Intel 486DX, I'd like to have the results for it.
   I changed FPU tests after I got the results from that 486...


Power Per MHz
        386      486  Pentium  Cyrix M1      PPro      PII   AMD K6
1:    316.0   1126.7   2043.1    2222.2    2262.8   2276.5*  2049.6
2:    227.7    469.2    648.0     837.3     902.5    919.4*   713.8
3:    139.9    415.1   1207.7     698.4    1419.5   1436.8*  1036.5
4:    106.1    366.9    921.9     664.6    1013.1   1018.5*   794.8
5:     22.7     27.4     53.1      34.0      54.1*    53.8     46.3
6:    146.1    213.7    292.9     327.2     313.2    313.9    361.6*
7:    242.7    454.0    654.3     705.8*    483.7    503.4    506.7
8:    131.6    257.9    502.8     563.9*    295.1    297.9    434.6
*=most powerful
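
   Each "Power Per MHz" entry appears to be the score from the previous
   table divided by the clock rate; for example, 6319.5 / 20 = 316.0 for the
   386/20 and 183882.0 / 90 = 2043.1 for the P5/90. A small C sketch of that
   arithmetic and of picking the starred column in a row, with hypothetical
   function names:

```c
#include <stddef.h>

/* Score per MHz, as in the "Power Per MHz" table. */
double power_per_mhz(double score, double mhz)
{
    return score / mhz;
}

/* Index of the largest value in a table row (the column marked '*'). */
size_t most_powerful(const double *row, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (row[i] > row[best])
            best = i;
    return best;
}
```

   For row 1 (386, 486, Pentium, Cyrix M1, PPro, PII, AMD K6),
   most_powerful returns 5, the PII column.
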

   Intel Processor families, processor speed improvements
                  From...  386      486     Pentium    PPro
                  To.....  486    Pentium     PPro      PII
Simple integers           3.60x    1.81x     1.13x    1.01x
Complex integers          2.08x    1.38x     1.39x    1.02x
Simple arithmetic FPU     3.00x    2.91x     1.18x    1.01x
Mixed FPU&CPU             3.49x    2.51x     1.10x    1.01x
Transcendental FPU        1.22x    1.94x     1.02x    1.00x
Non-transcendental FPU    1.48x    1.37x     1.07x    1.00x
Processor control FPU     1.89x    1.44x     0.74x    1.04x
16bit speed               1.99x    1.95x     0.59x    1.01x
-------------------------------------------------------------
Average                   2.34x    1.91x     1.03x    1.01x

   As you can see, the difference between PPro and PII is very small,
   but the PII has 32 kB of 1st level cache (the PPro has 16 kB), and MMX.
   PII's L1 data cache is 4-way set-associative (16 kB), PPro's L1
   data cache is 2-way set-associative (8 kB).
   PII's 2nd level cache runs at half the core speed (133 MHz for 266
   MHz CPU), PPro's 2nd level cache runs as fast as the CPU (but it
   can't be accessed as fast as 1st level cache, see below).


   Memory speed results (MB/s), speeds taken from cache boundary
   (8kB and 64kB for 486/33 etc.) If the result is in brackets, it's
   not necessarily correct and I want results for that particular CPU!

   M1/100: Cyrix 6x86-P120+ 100 MHz
   M1/133: Cyrix 6x86-P166+ 133 MHz (no L2 cache)
   M1/133t: Cyrix 6x86-P166+ 133 MHz (ASUS mobo, 512 kB PB-cache, 64 MB EDO)
   M1/150:  Cyrix 6x86-P200+ 150 MHz (ASUS mobo, 512 kB PB-cache, 64 MB EDO)
   AMD K6/200: 66.666*3 MHz
   AMD K6/208: 83.333*2.5 = 208.333 MHz
   AMD K6/208t: same as above, but memory timings tweaked

                          Sequential accessing
                   read                write               move
   CPU        1st   2nd  main     1st   2nd  main     1st   2nd  main
   486/33    70.8  35.9  12.9    61.4  62.4  30.7    41.5  29.5   7.9
  [P54/90   313.0 107.6  64.0   448.8  47.2  28.8   246.3  37.4  18.7]
   P54/120  718.2 178.1  89.3   389.7  78.2  73.3   396.5  61.0  34.9
   P54/166 1002.2 202.3 100.5   473.7  86.7  83.5   540.4  67.6  42.7
   M1/100   607.3 194.1  72.2   288.8 109.2  50.4   376.8  66.1  26.4
   M1/133  *638.6 -----  74.7   376.3 -----  41.2   494.4 -----  26.6
   M1/133t  815.1 256.2 111.7   384.0 150.6  64.8   500.4  96.8  37.5
   M1/150   896.4 287.7 125.4   429.8 169.1  72.8   562.0 108.7  42.2
   K6/200   731.6 250.1 125.4   725.9 127.0  69.3   748.0  84.3  42.3
   K6/208   763.0 282.2 125.3   758.1 154.0  73.8   781.2 101.1  44.2
   K6/208t  763.0 282.2 147.8   758.1 154.0  86.8   781.2 101.1  51.0
  [P6/180   675.0 395.5 183.6   600.7 341.3  73.4   746.9 234.8  43.9]

   *: 847.6 with block size of 8 kB

                          Reverse accessing
                   read                write               move
   CPU        1st   2nd  main     1st   2nd  main     1st   2nd  main
   486/33    71.0  37.3  13.1    55.4  62.5  30.7    40.6  29.5   8.1
  [P54/90   313.0 107.7  65.7   447.9  47.2  28.9   278.9  37.4  19.1]
   P54/120  716.9 178.1  89.3   382.2  78.2  74.9   406.2  61.0  34.9
   P54/166  998.2 202.3 100.5   471.9  86.7  83.5   546.9  67.6  42.7
   M1/100   624.2 123.0  53.7   285.0 109.1  50.4   284.2  54.6  23.9
   M1/133   719.9 -----  57.3   371.5 -----  41.2   382.9 -----  24.0
   M1/133t  819.8 177.2  79.3   379.2 145.2  64.8   377.9  81.3  34.8
   M1/150   897.0 199.4  89.1   424.4 163.0  72.8   424.6  91.3  39.1
   K6/200   746.3 254.0 125.4   732.0 127.0  69.3   747.7  84.3  42.3
   K6/208   778.0 281.5 125.4   763.7 154.0  73.8   781.2 101.1  44.2
   K6/208t  778.0 282.2 147.9   763.7 154.0  86.8   781.2 101.1  51.0
  [P6/180   673.6 409.7 185.7   599.6 343.0  73.8   291.0 229.4  37.2]

                          Butterfly accessing
                   read                write               move
   CPU        1st   2nd  main     1st   2nd  main     1st   2nd  main
   486/33    70.4  40.8  10.5    60.7  60.5  15.4    34.7  21.2   6.2
  [P54/90   299.0 110.8  53.3   365.3  47.0  20.3   194.1  38.1  16.6]
   P54/120  415.7 157.1  82.0   316.0  78.1  23.7   310.4  57.1  25.7
   P54/166  578.3 192.4  98.0   384.0  86.6  29.2   430.7  63.3  30.3
   M1/100   670.6 151.0  59.6   190.0  92.6  47.0   186.2  47.8  22.8
   M1/133   833.3 -----  69.1   249.1 -----  41.2   238.7 -----  19.0
   M1/133t  894.4 222.6 100.5   252.7 127.0  64.5   248.6  75.3  32.2
   M1/150  1035.5 250.1 112.9   283.2 142.7  72.5   277.5  84.5  36.2
   K6/200   734.8 233.8 124.8   726.6 126.0  68.7   577.7  66.3  34.4
   K6/208   766.2 279.6 129.6   758.7 152.8  73.2   604.6  80.3  36.7
   K6/208t  766.2 279.6 153.9   758.7 152.8  85.8   604.6  80.3  43.0
  [P6/180   671.3 469.5 114.3   503.8 246.4  55.6   350.6 107.1  29.8]


   These are for Safbench v1.32:
                         Random accessing
   Cyrix 6x86-P120+ (256 kB PB, 16 MB FPM RAM, MG i430VX mobo)
   With SADS enabled/disabled, speedup displayed for Mixed:
   Block (kB)    Mixed                     Read             Write
   1        195.428/195.425          423.692/423.692   263.923/263.923
   2        197.336/197.339          425.273/425.273   261.743/261.743
   4        199.942/199.923          425.411/425.411   261.158/261.162
   8        188.310/188.359          343.730/343.541   260.116/260.135
   16       135.166/137.681  1.86%   191.103/191.387   146.147/147.752
   32        78.132/ 84.335  7.94%   146.816/146.911   102.537/104.844
   64        61.998/ 68.398 10.32%   131.345/131.507    89.535/ 91.973
   128       55.776/ 62.020 11.19%   124.991/125.132    84.321/ 86.866
   256       52.862/ 58.861 11.35%   120.015/120.203    81.484/ 84.133
   512       34.213/ 36.142  5.64%    84.212/ 84.256    58.376/ 59.909
   1024      27.671/ 28.684  3.66%    68.967/ 69.064    49.333/ 50.458
   2048      25.149/ 25.946  3.17%    62.399/ 62.573    45.292/ 46.308
   4096      22.576/ 23.281  3.12%    58.526/ 58.778    42.170/ 43.392
   Avg.:     98.043/100.484  2.49%   200.499/200.510   134.318/135.584


   AMD K6/83.333*2.5 MHz, ASUS P/I P55T2P4, 512 kB PB-cache, 64 MB EDO
   Block (kB)    Mixed         Read        Write
   1           373.015      629.963      773.795
   2           377.418      630.913      773.285
   4           379.344      630.875      775.466
   8           377.372      631.245      775.632
   16          372.540      623.789      759.802
   32          237.125      327.785      278.042
   64          112.521      233.931      174.797
   128          93.360      211.812      156.690
   256          86.757      203.742      150.902
   512          83.660      198.390      147.052
   1024         57.710      130.714       99.605
   2048         49.193      111.404       85.822
   4096         43.246      103.595       80.371
   Avg.:       203.328      359.089      387.020
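   The Avg. row appears to be the plain arithmetic mean of the 13 block size
   rows above it. Recomputing it from the Read column of this K6 table gives
   359.089 again, and the Mixed column likewise gives 203.328. Below is a
   small Python check. The function name is made up for illustration and is
   not Safbench code.

```python
from statistics import fmean  # Python 3.8+

# Read column of the K6 random access table above, block sizes 1..4096 kB
k6_read = [629.963, 630.913, 630.875, 631.245, 623.789, 327.785, 233.931,
           211.812, 203.742, 198.390, 130.714, 111.404, 103.595]

def table_average(column):
    """Arithmetic mean of one column, as printed in the 'Avg.' row."""
    return fmean(column)

print(f"{table_average(k6_read):.3f}")  # 359.089
```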


   Cyrix 6x86-P166+ (133 MHz), ASUS P/I P55T2P4, 512 kB PB-cache, 64 MB EDO
   Block size    Mixed         Read        Write
   1           259.858      563.387      350.938
   2           262.393      565.466      348.041
   4           265.744      565.676      347.261
   8           250.740      457.401      345.923
   16          186.419      266.773      206.649
   32          115.155      206.706      147.252
   64           92.936      185.025      128.697
   128          84.131      176.046      121.262
   256          80.032      171.752      117.875
   512          78.066      168.261      115.423
   1024         47.724      115.500       80.337
   2048         38.604       98.192       67.480
   4096         33.115       89.544       61.107
   Avg.:       138.071      279.210      187.557


   Cyrix 6x86-P200+ (150 MHz), ASUS P/I P55T2P4 mobo, 512 kB PB-cache, 64 MB EDO
   Block size    Mixed         Read        Write
   1           291.884      632.785      394.173
   2           294.726      635.164      390.932
   4           298.562      635.384      390.042
   8           281.628      512.774      388.527
   16          208.419      293.247      228.372
   32          128.588      223.527      163.267
   64          103.892      202.029      143.035
   128          94.048      194.296      134.942
   256          89.664      190.943      131.739
   512          87.718      188.968      129.843
   1024         53.730      129.747       90.740
   2048         43.418      110.273       76.088
   4096         37.219      100.591       67.861
   Avg.:       154.884      311.518      209.966


   These are for Safbench v2.13:
                           Random accessing
   Cyrix 6x86-P120+ (256 kB PB, 16 MB FPM RAM, MG i430VX mobo)
   With SADS enabled/disabled, speedup displayed for Mixed and Write:
   Block size    Mixed                    Read              Write
   1        171.830/171.835         423.692/423.680    264.402/264.402
   2        164.746/165.080  0.20%  426.327/426.327    263.845/263.841
   4        153.398/155.446  1.34%  426.399/426.217    263.618/263.618
   8        134.603/137.894  2.44%  401.759/401.753    262.177/262.221
   16        97.326/102.613  5.43%  265.927/266.133    179.422/182.025 1.45%
   32        71.016/ 77.150  8.64%  165.790/166.132    112.048/114.557 2.24%
   64        59.432/ 65.381 10.01%  138.675/138.837     93.133/ 95.572 2.62%
   128       54.611/ 60.417 10.63%  128.084/128.294     85.975/ 88.511 2.95%
   256       51.437/ 56.844 10.51%  120.194/120.320     81.453/ 84.080 3.23%
   512       33.234/ 34.956  5.18%   84.216/ 84.261     58.384/ 59.922 2.63%
   1024      27.318/ 28.220  3.30%   69.113/ 69.212     48.956/ 50.145 2.43%
   2048      24.884/ 25.565  2.74%   62.672/ 62.826     44.975/ 46.046 2.38%
   4096      23.626/ 24.228  2.55%   58.548/ 58.797     42.190/ 43.413 2.90%
   Avg.:     82.112/ 85.048  3.58%  213.184/213.291    138.506/139.873 0.99%
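   The percentages are computed from the two figures on the same line as
   (disabled / enabled - 1) * 100. The Avg. line uses the two averages, not
   the mean of the per-row percentages. Averaging the Mixed percentages
   would give about 4.8% instead of 3.58%. The same formula reproduces the
   Speedup column of the Windows tests further down (70.1% for Winbench).
   A minimal sketch follows. The function name is hypothetical.

```python
def speedup_percent(before, after):
    """Percentage gain of `after` over `before` (e.g. SADS enabled -> disabled)."""
    return (after / before - 1.0) * 100.0

# Avg. line of the v2.13 table: Mixed, SADS enabled / disabled
print(f"{speedup_percent(82.112, 85.048):.2f}%")       # 3.58%
# PC Labs Winbench v3.1, Test 1 -> Test 2
print(f"{speedup_percent(14644561, 24906850):.1f}%")   # 70.1%
```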

   BTW: what the hell does "Slow ADS" mean?


   In the Mixed test, a whole cache line is often fetched from cache or main
   memory and then processed in the CPU's cache.
   With Random reads/writes, 32 bytes are read from/written to a random
   place, and then the test proceeds to the next random position. When a
   cache read miss occurs, Pentium+ CPUs fetch the whole cache line (32 bytes)
   from main memory. When a cache write miss occurs, the Pentium doesn't
   allocate a new cache line for it (it doesn't fetch the line from memory),
   but the PPro does.
   That's why 32-byte reads AND writes are used.
   The randomization routine gives the same values on every run with
   every CPU, so memory is accessed at the same places every time you
   run the Random access test.
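   The randomization routine isn't described here beyond being
   deterministic. As an illustration only, a fixed-seed linear congruential
   generator has the same property: every run visits the same
   32-byte-aligned positions. The seed, the constants (the common
   1664525/1013904223 pair) and the function name are assumptions, not
   Safbench's own.

```python
LINE_SIZE = 32   # bytes per access: one Pentium/6x86/K6 cache line
SEED = 12345     # hypothetical fixed seed, NOT Safbench's actual value

def random_line_offsets(block_bytes, count, seed=SEED):
    """Yield `count` cache-line-aligned offsets inside a block of `block_bytes`.

    A 32-bit linear congruential generator with a fixed seed is used, so
    the sequence is identical on every run and on every machine.
    """
    lines = block_bytes // LINE_SIZE
    state = seed
    for _ in range(count):
        state = (state * 1664525 + 1013904223) & 0xFFFFFFFF
        # use the higher bits, the low bits of an LCG have short periods
        yield ((state >> 8) % lines) * LINE_SIZE

first = list(random_line_offsets(4096 * 1024, 5))
second = list(random_line_offsets(4096 * 1024, 5))
print(first == second)  # True
```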

   Main memory speeds are taken from the 4096 kB block size.

   Does anyone know why butterfly reads from main memory on PPros are
   so much slower than sequential/reverse reads? And why is reverse reading
   from main memory and L2 cache much slower than sequential reading on the
   M1, but not on other CPUs?



   Speed of my Diamond Stealth 64 DRAM PCI, 1 MB 2-cycle EDO, Trio64 (764).
   Main memory is 2*8 MB 60 ns FPM RAM, 256 kB PB-cache, MG i430VX mobo.
   Fastest possible settings for RAM in BIOS. Cyrix 6x86-P120+ (100 MHz).
   The shitty monitor is an ADI MicroScan 2E, about 15 inches.

---Clear boot---
S3VBE20, no MCLK run (S3's DRAM clock generator running at 59.96 MHz):
Mode   Width  Height  BPP   BPSL   Write      FPS    Copy      FPS      Hz
 163     320     200    8    320   35.40   580.02   33.23   544.50   69.94
 164     320     240    8    320   31.79   434.08   30.71   419.25   67.27
 165     320     400    8    320   35.36   289.69   33.23   272.23   69.94
 166     320     480    8    320   31.72   216.57   30.67   209.34   67.27
 14F     400     300    8    400   37.16   324.75   33.23   290.37   73.40
 12D     512     384    8    512   36.75   195.99   33.23   177.22   81.60
 100     640     400    8    640   36.76   150.55   33.20   136.00   69.87
 101     640     480    8    640   36.74   125.39   30.84   105.26   60.16
 103     800     600    8    800   33.96    74.18   26.82    58.60   56.15
 105    1024     768    8   1024   25.25    33.66   22.84    30.45   87.10
 110     640     480   15   1280   30.18    51.50   26.32    44.92   60.04
 111     640     480   16   1280   30.17    51.48   26.32    44.92   60.04
 113     800     600   15   1600   22.71    24.80   20.04    21.89   56.15
 114     800     600   16   1600   22.72    24.82   20.04    21.89   56.15
 211     640     400   32   2560   15.98    16.36   12.24    12.53   70.09
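The FPS columns follow from the transfer rates. Judging by the figures, MB
means 2^20 bytes here, and one frame is BPSL * Height bytes. The padded scan
line counts, which is why 800x600x8 on the Matrox below (BPSL 1024) uses
614400 bytes per frame. For mode 163 above, 35.40 MB/s gives about 580 FPS.
The last digits differ slightly because the MB/s column is itself rounded.
The Python sketch below has a made-up function name.

```python
MB = 1 << 20  # 1048576 bytes, which is what the tables appear to use

def frames_per_second(mb_per_s, bytes_per_scan_line, height):
    """Frames per second reachable at a given video memory write/copy rate."""
    frame_bytes = bytes_per_scan_line * height
    return mb_per_s * MB / frame_bytes

print(f"{frames_per_second(35.40, 320, 200):.2f}")  # 579.99 (table: 580.02)
```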


---MCLK used---
S3VBE20, MCLK /0 93 2 2 /1 3 (85.01 MHz):
Mode   Width  Height  BPP   BPSL   Write      FPS    Copy      FPS      Hz
 163     320     200    8    320   37.30   611.12   33.23   544.50   69.94
 164     320     240    8    320   37.30   509.26   33.23   453.75   67.27
 165     320     400    8    320   37.30   305.56   33.23   272.23   69.94
 166     320     480    8    320   37.30   254.63   33.23   226.86   67.27
 14F     400     300    8    400   37.36   326.45   33.23   290.37   73.40
 12D     512     384    8    512   37.30   198.93   33.23   177.22   81.60
 100     640     400    8    640   37.30   152.78   33.20   136.00   69.87
 101     640     480    8    640   37.30   127.32   30.84   105.26   60.16
 103     800     600    8    800   37.30    81.48   26.84    58.63   56.14
 105    1024     768    8   1024   37.30    49.73   26.32    35.09   87.09
 110     640     480   15   1280   37.30    63.66   26.32    44.92   60.04
 111     640     480   16   1280   37.30    63.66   26.32    44.92   60.04
 113     800     600   15   1600   34.98    38.21   26.31    28.74   56.15
 114     800     600   16   1600   34.96    38.18   26.31    28.74   56.15
 211     640     400   32   2560   21.28    21.79   17.48    17.90   70.09
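The MCLK frequency agrees with the usual S3 Trio64 PLL formula,
f = 14.31818 MHz * (M + 2) / ((N + 2) * 2^R). This check assumes the three
numbers after /0 on the MCLK command line are M, N and R. That reading is
not documented here, but 93, 2 and 2 do give exactly 85.01 MHz. Treat the
sketch as an assumption, not as the MCLK utility's own code.

```python
F_REF_MHZ = 14.31818  # usual reference clock of S3 Trio-family PLLs

def s3_pll_mhz(m, n, r):
    """Output of an S3 Trio-style PLL: Fref * (M+2) / ((N+2) * 2**R)."""
    return F_REF_MHZ * (m + 2) / ((n + 2) * 2 ** r)

# assumed reading of "MCLK /0 93 2 2": M=93, N=2, R=2
print(f"{s3_pll_mhz(93, 2, 2):.2f} MHz")  # 85.01 MHz
```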


---6X86OPT used---
6X86OPT -LINBUF, S3VBE20, no MCLK run (59.96 MHz):
Mode   Width  Height  BPP   BPSL   Write      FPS    Copy      FPS      Hz
 163     320     200    8    320   43.52   713.03   40.11   657.24   69.94
 164     320     240    8    320   38.54   526.21   35.89   489.96   67.27
 165     320     400    8    320   43.27   354.43   39.93   327.08   69.94
 166     320     480    8    320   38.36   261.88   35.67   243.49   67.27
 14F     400     300    8    400   43.85   383.14   41.62   363.67   73.40
 12D     512     384    8    512   46.78   249.51   42.21   225.14   81.60
 100     640     400    8    640   44.76   183.34   41.29   169.13   69.87
 101     640     480    8    640   43.82   149.56   39.48   134.76   60.16
 103     800     600    8    800   39.88    87.12   34.24    74.81   56.15
 105    1024     768    8   1024   30.42    40.56   25.23    33.64   87.07
 110     640     480   15   1280   37.35    63.75   30.09    51.35   60.04
 111     640     480   16   1280   37.37    63.78   30.10    51.36   60.04
 113     800     600   15   1600   28.52    31.15   22.60    24.69   56.14
 114     800     600   16   1600   28.57    31.20   22.60    24.69   56.15
 211     640     400   32   2560   26.70    27.34   16.05    16.44   70.09


---6X86OPT and MCLK used---
6X86OPT -LINBUF, S3VBE20, MCLK /0 93 2 2 /1 3 (85.01 MHz):
Mode   Width  Height  BPP   BPSL   Write      FPS    Copy      FPS      Hz
 163     320     200    8    320   69.15  1132.97   52.70   863.46   69.94
 164     320     240    8    320   65.63   896.07   52.70   719.59   67.27
 165     320     400    8    320   69.03   565.46   52.71   431.77   69.94
 166     320     480    8    320   65.49   447.09   52.71   359.84   67.27
 14F     400     300    8    400   63.29   553.01   52.70   460.53   73.40
 12D     512     384    8    512   63.61   339.26   52.70   281.08   81.60
 100     640     400    8    640   62.18   254.68   52.64   215.63   69.87
 101     640     480    8    640   61.81   210.97   46.94   160.23   60.16
 103     800     600    8    800   58.09   126.89   38.27    83.61   56.15
 105    1024     768    8   1024   64.62    86.16   37.12    49.50   87.07
 110     640     480   15   1280   53.06    90.55   37.22    63.52   60.04
 111     640     480   16   1280   53.06    90.56   37.22    63.52   60.04
 113     800     600   15   1600   50.83    55.52   34.79    38.00   56.15
 114     800     600   16   1600   51.25    55.97   34.77    37.98   56.15
 211     640     400   32   2560   33.79    34.60   21.25    21.76   70.09


---6X86OPT and MCLK used, Cyrix 6x86 overclocked to 120 MHz---
----(Main memory, 1st and 2nd level cache now 20% faster)-----
6X86OPT -LINBUF, S3VBE20, MCLK /0 93 2 2 /1 3 (85.01 MHz):
Mode   Width  Height  BPP   BPSL   Write      FPS    Copy      FPS      Hz
 163     320     200    8    320   74.29  1217.20   63.20  1035.39   69.94
 164     320     240    8    320   70.15   957.76   63.18   862.66   67.27
 165     320     400    8    320   74.07   606.77   63.19   517.68   69.94
 166     320     480    8    320   69.91   477.25   63.19   431.40   67.27
 14F     400     300    8    400   67.56   590.37   60.54   528.96   73.40
 12D     512     384    8    512   68.97   367.87   60.39   322.09   81.60
 100     640     400    8    640   67.05   274.63   59.99   245.71   69.87
 101     640     480    8    640   66.28   226.24   54.47   185.92   60.15
 103     800     600    8    800   61.33   133.97   45.52    99.44   56.14
 105    1024     768    8   1024   65.31    87.08   43.23    57.65   87.06
 110     640     480   15   1280   57.31    97.81   44.66    76.22   60.04
 111     640     480   16   1280   57.34    97.86   44.66    76.23   60.04
 113     800     600   15   1600   51.40    56.15   37.13    40.56   56.14
 114     800     600   16   1600   51.40    56.14   37.11    40.53   56.14
 211     640     400   32   2560   39.15    40.09   24.06    24.64   70.09


Pentium 166 MHz (66.666666*2.5) with Matrox Millennium 4 MB
With option 'F', main mem sequential read speed 100.5 MB/s
Mode   Width  Height  BPP   BPSL   Write      FPS    Copy      FPS      Hz
 100     640     400    8    640   77.18   316.11   66.99*  274.39   70.17
 101     640     480    8    640   77.30   263.84   63.32*  216.12   60.02
 11D    1600    1200   16   3200   73.13    19.97   55.81    15.24   60.26
 11E    1600    1200   16   3200   73.13    19.97   55.81    15.24   60.26
 118    1024     768   32   4096   73.92    24.64   55.81    18.60   60.10
 119    1280    1024   16   2560   74.55    29.82   55.72    22.29   60.13
 11A    1280    1024   16   2560   74.55    29.82   55.71    22.28   60.14
 11C    1600    1200    8   1664   74.61    39.18   55.65    29.22   60.26
 107    1280    1024    8   1280   75.52    60.41   55.77    44.61   60.14
 116    1024     768   16   2048   75.52    50.35   55.75    37.17   60.10
 117    1024     768   16   2048   75.52    50.35   55.75    37.17   60.10
 115     800     600   32   3200   75.76    41.38   55.83    30.49   60.47
 114     800     600   16   1920   76.23    69.39   55.90    50.88   60.47
 113     800     600   16   1920   76.24    69.39   55.90    50.88   60.47
 112     640     480   32   2560   76.36    65.16   55.94    47.73   60.01
 105    1024     768    8   1024   76.83   102.44   55.96    74.62   60.10
 103     800     600    8   1024   76.89   131.23   55.96    95.51   60.47
 110     640     480   16   1280   76.99   131.40   55.96    95.51   60.01
 111     640     480   16   1280   76.99   131.39   55.96    95.51   60.02
*: other modes don't fit into the 256 kB L2 cache


AMD K6 208.333 MHz (83.333*2.5 MHz), 512 kB PB-cache, 64 MB EDO,
Asus P/I-P55T2P4 (HX chipset), Matrox Millennium 4 MB WRAM (41.666 MHz PCI bus)
With option 'F', main mem sequential read speed 125 MB/s.
Mode   Width  Height  BPP   BPSL   Write      FPS    Copy      FPS      Hz
 100     640     400    8    640  116.78   478.35   84.60   346.51   70.18
 101     640     480    8    640  116.75   398.52   84.48   288.37   60.03
 103     800     600    8   1024  116.22   198.34   76.58   130.70   60.48
 105    1024     768    8   1024  115.84   154.45   68.73    91.64   60.11
 107    1280    1024    8   1280  113.58    90.86   62.94    50.35   60.15
 110     640     480   16   1280  115.99   197.95   76.56   130.66   60.03
 111     640     480   16   1280  115.99   197.95   76.56   130.66   60.03
 112     640     480   32   2560  115.27    98.37   62.93    53.70   60.03
 113     800     600   16   1920  114.73   104.43   62.94    57.29   60.48
 114     800     600   16   1920  114.74   104.44   62.93    57.28   60.48
 115     800     600   32   3200  113.54    62.01   62.93    34.37   60.48
 116    1024     768   16   2048  114.01    76.01   62.93    41.96   60.11
 117    1024     768   16   2048  114.02    76.01   62.93    41.96   60.11
 11C    1600    1200    8   1664  111.50    58.55   62.93    33.05   60.27
 118    1024     768   32   4096  110.99    37.00   62.93    20.98   60.11
 119    1280    1024   16   2560  111.59    44.64   62.92    25.17   60.15
 11A    1280    1024   16   2560  111.60    44.64   62.92    25.17   60.15
 11D    1600    1200   16   3200  108.44    29.61   62.93    17.18   60.27
 11E    1600    1200   16   3200  108.41    29.60   62.93    17.18   60.27


The same as above, but AMD K6 200 MHz (66.666*3 MHz, 33.333 MHz PCI-bus)
Mode   Width  Height  BPP   BPSL   Write      FPS    Copy      FPS      Hz
 100     640     400    8    640   93.78   384.13   71.92   294.57   70.18
 101     640     480    8    640   93.75   320.00   71.84   245.20   60.03
 103     800     600    8   1024   93.32   159.27   67.51   115.21   60.48
 105    1024     768    8   1024   93.06   124.08   62.99    83.99   60.11
 107    1280    1024    8   1280   91.96    73.56   59.45    47.56   60.15
 110     640     480   16   1280   93.44   159.47   67.53   115.26   60.03
 111     640     480   16   1280   93.44   159.47   67.53   115.26   60.03
 112     640     480   32   2560   92.88    79.26   59.45    50.73   60.03
 113     800     600   16   1920   92.73    84.41   59.45    54.12   60.48
 114     800     600   16   1920   92.73    84.41   59.45    54.12   60.48
 115     800     600   32   3200   91.30    49.86   59.46    32.47   60.48
 116    1024     768   16   2048   92.02    61.34   59.46    39.64   60.11
 117    1024     768   16   2048   92.01    61.34   59.45    39.64   60.11
 11C    1600    1200    8   1664   90.73    47.64   59.45    31.22   60.27
 118    1024     768   32   4096   90.90    30.30   59.46    19.82   60.11
 119    1280    1024   16   2560   90.79    36.32   59.46    23.78   60.15
 11A    1280    1024   16   2560   90.82    36.33   59.46    23.78   60.15
 11D    1600    1200   16   3200   89.27    24.38   59.46    16.24   60.27
 11E    1600    1200   16   3200   89.25    24.37   59.46    16.24   60.27


Same as above, but memory timings tweaked and 83.333*2.5 MHz (208.333 MHz)
Mode   Width  Height  BPP   BPSL   Write      FPS    Copy      FPS      Hz
 100     640     400    8    640  116.78   478.35   84.63   346.63   70.18
 101     640     480    8    640  116.76   398.54   84.55   288.59   60.03
 103     800     600    8   1024  116.22   198.35   79.20   135.17   60.48
 105    1024     768    8   1024  115.85   154.46   73.68    98.25   60.11
 107    1280    1024    8   1280  113.60    90.88   69.35    55.48   60.15
 110     640     480   16   1280  115.97   197.93   79.19   135.14   60.03
 111     640     480   16   1280  115.99   197.96   79.19   135.14   60.03
 112     640     480   32   2560  115.27    98.36   69.38    59.21   60.03
 113     800     600   16   1920  114.74   104.44   69.37    63.14   60.48
 114     800     600   16   1920  114.73   104.43   69.37    63.14   60.48
 115     800     600   32   3200  113.53    62.00   69.31    37.85   60.48
 116    1024     768   16   2048  114.01    76.00   69.33    46.22   60.11
 117    1024     768   16   2048  114.01    76.01   69.33    46.22   60.11
 11C    1600    1200    8   1664  111.52    58.56   69.24    36.36   60.27
 118    1024     768   32   4096  111.00    37.00   69.32    23.11   60.11
 119    1280    1024   16   2560  111.58    44.63   69.14    27.65   60.15
 11A    1280    1024   16   2560  111.59    44.63   69.14    27.66   60.15
 11D    1600    1200   16   3200  108.42    29.61   69.35    18.94   60.27
 11E    1600    1200   16   3200  108.44    29.61   69.35    18.94   60.27


Same as above, but with Cyrix 6x86-P200+
Mode   Width  Height  BPP   BPSL   Write      FPS    Copy      FPS      Hz
 100     640     400    8    640   98.46   403.31   92.98   380.85   70.18
 101     640     480    8    640   98.50   336.21   92.95   317.28   60.03
 103     800     600    8   1024   98.20   167.60   85.14   145.30   60.48
 105    1024     768    8   1024   97.98   130.64   77.18   102.91   60.11
 107    1280    1024    8   1280   96.85    77.48   71.05    56.84   60.15
 110     640     480   16   1280   98.16   167.53   85.14   145.31   60.03
 111     640     480   16   1280   98.16   167.52   85.15   145.32   60.03
 112     640     480   32   2560   97.56    83.25   71.17    60.73   60.03
 113     800     600   16   1920   97.49    88.74   71.14    64.75   60.48
 114     800     600   16   1920   97.49    88.74   71.14    64.75   60.48
 115     800     600   32   3200   96.59    52.75   70.98    38.76   60.48
 116    1024     768   16   2048   96.94    64.63   71.02    47.35   60.11
 117    1024     768   16   2048   96.94    64.63   71.02    47.35   60.11
 11C    1600    1200    8   1664   95.69    50.25   70.81    37.18   60.27
 118    1024     768   32   4096   95.65    31.88   70.97    23.66   60.11
 119    1280    1024   16   2560   95.62    38.25   70.65    28.26   60.15
 11A    1280    1024   16   2560   95.63    38.25   70.66    28.26   60.15
 11D    1600    1200   16   3200   94.44    25.79   71.00    19.39   60.27
 11E    1600    1200   16   3200   94.43    25.79   71.00    19.39   60.27


Same as above, but with Cyrix 6x86-P166+
Mode   Width  Height  BPP   BPSL   Write      FPS    Copy      FPS      Hz
 100     640     400    8    640   87.95   360.26   83.03   340.11   70.18
 101     640     480    8    640   87.97   300.27   83.05   283.46   60.03
 103     800     600    8   1024   88.05   150.27   76.13   129.94   60.48
 105    1024     768    8   1024   87.47   116.63   68.74    91.65   60.11
 107    1280    1024    8   1280   86.90    69.52   63.38    50.70   60.15
 110     640     480   16   1280   87.90   150.01   76.03   129.75   60.03
 111     640     480   16   1280   87.90   150.02   76.03   129.75   60.03
 112     640     480   32   2560   87.38    74.57   63.41    54.11   60.03
 113     800     600   16   1920   87.45    79.60   63.40    57.71   60.48
 114     800     600   16   1920   87.45    79.60   63.40    57.70   60.48
 115     800     600   32   3200   86.96    47.49   63.32    34.58   60.48
 116    1024     768   16   2048   86.79    57.86   63.37    42.25   60.11
 117    1024     768   16   2048   86.79    57.86   63.37    42.25   60.11
 11C    1600    1200    8   1664   86.19    45.26   63.26    33.22   60.27
 118    1024     768   32   4096   85.77    28.59   63.35    21.12   60.11
 119    1280    1024   16   2560   86.17    34.47   63.29    25.32   60.15
 11A    1280    1024   16   2560   86.17    34.47   63.29    25.32   60.15
 11D    1600    1200   16   3200   84.34    23.03   63.36    17.30   60.27
 11E    1600    1200   16   3200   84.33    23.03   63.36    17.30   60.27


Cyrix 6x86-P166+, no L2 cache, 32MB FPM RAM, S3 Trio64V+ 2MB EDO (33.333 MHz PCI)
Mode   Width  Height  BPP   BPSL   Write      FPS    Copy      FPS      Hz
 170     320     200    8    320   88.94  1457.21   50.39   825.67   69.94
 171     320     240    8    320   86.58  1182.11   50.36   687.58   67.27
 172     320     400    8    320   88.89   728.21   50.37   412.61   69.94
 173     320     480    8    320   86.52   590.66   50.36   343.81   67.27
 174     400     300    8    400   89.93   785.78   50.39   440.27   73.40
 175     512     384    8    512   91.06   485.66   50.35   268.56   81.60
 100     640     400    8    640   90.69   371.45   50.35   206.22   69.79
 101     640     480    8    640   88.15   300.89   50.35   171.85   72.84
 103     800     600    8    800   81.02   176.99   50.33   109.94   72.27
 105    1024     768    8   1024   70.90    94.53   50.34    67.12   69.92
 107    1280    1024    8   1280   59.73    47.79   50.22    40.17   60.20
 10D     320     200   15    640   91.21   747.17   50.37   412.63   69.78
 10E     320     200   16    640   91.21   747.21   50.37   412.63   69.78
 10F     320     200   32   1280   82.01   335.93   50.35   206.23   69.78
 110     640     480   15   1280   77.12   131.61   50.34    85.92   72.84
 111     640     480   16   1280   77.14   131.65   50.34    85.92   72.84
 112     640     480   32   2560   55.81    47.63   44.66    38.11   72.84
 113     800     600   15   1600   65.13    71.14   50.32    54.97   71.99
 114     800     600   16   1600   65.14    71.15   50.32    54.96   71.99
 115     800     600   32   3200   36.66    20.02   27.11    14.81   72.27
 116    1024     768   15   2048   45.08    30.06   36.83    24.55   70.04
 117    1024     768   16   2048   45.08    30.06   36.83    24.55   70.04
 120    1600    1200    8   1600   54.21    29.61   42.57    23.25   96.02



Windows tests were run under Win95 with the newest S3 video driver at 800x600x16bit.
(Cyrix 6x86-P120, Trio64, 16 MB FPM...)
Test 1: 59.96 MHz, 2-cycle EDO timings
Test 2: 85.01 MHz, FPM timings

                          Test 1     Test 2   Speedup
PC Labs Winbench v3.1   14644561   24906850     70.1%
Wintach v1.0: Text          53.4       79.8     49.4%
              Cad          236.4      275.9     16.7%
              Spreadsheet   57.5       89.2     55.1%
              Paint        105.2      143.3     36.2%



[8.] - VESA info

   Mode      Hexadecimal VESA mode ('!' displayed after this if mode is not
             supported)
   Size      Width and Height of the video mode in pixels (graphics)
             or chars (text)
   Clrs      Number of different colours in the video mode
   BPP       Bits Per Pixel
   BPSL      Bytes Per Scan Line
   Size/B    Size of the video mode in bytes
   GCLB      G=graphics mode; Y=yes, -=no (text mode)
             C=color; Y=color, -=monochrome
             L=Linear Framebuffer supported for this mode, Y=yes, -=no
             B=Bank-switched mode supported for this mode, Y=yes, -=no
   Memtype   "    text" = text
             " CGA gfx" = CGA graphics
             " HGC gfx" = HGC graphics
             " 16c EGA" = 16-color (EGA) graphics
             "  packed" = packed pixel graphics
             "sequ 256" = non-chain 4, 256-color graphics
             "  direct" = direct color (HiColor, 24-bit color)
             " YUV/YIQ" = YUV (luminance-chrominance, also called YIQ)
             "reserved" = reserved for VESA
             " OEM mem" = OEM memory models
   BO        BIOS output (TTY functions) supported, Y=yes, -=no
D: Red       Red mask size/red field position
D: Grn       Green mask size/green field position
D: Blu       Blue mask size/blue field position
D: Rsv       Reserved mask size/reserved field position
D: P         Color ramp is programmable, Y=yes, -=no
D: A         Bits in reserved field may be used by application, Y=yes, -=no
T: Char cell Char cell size in pixels

   D: only in Direct modes
   T: only in Text modes
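   In direct color modes, the mask size and field position tell where each
   component sits inside a pixel. The sketch below packs an 8-bit-per-channel
   colour into such a pixel. The 5:6:5 layout in the example
   (Red 5/11, Green 6/5, Blue 5/0) is assumed as the common 16 bpp layout,
   not taken from any particular card. The function name is hypothetical.

```python
def pack_direct_pixel(r, g, b, red, green, blue):
    """Pack 8-bit R, G, B values into one direct color pixel.

    red/green/blue are (mask_size, field_position) pairs, like the
    D: Red, D: Grn and D: Blu columns. Assumes mask sizes of at most 8 bits.
    """
    pixel = 0
    for value, (size, pos) in ((r, red), (g, green), (b, blue)):
        pixel |= (value >> (8 - size)) << pos   # keep the top `size` bits
    return pixel

RGB565 = ((5, 11), (6, 5), (5, 0))  # assumed typical 16 bpp layout
print(hex(pack_direct_pixel(255, 0, 0, *RGB565)))  # 0xf800
```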

   If reported VESA version is less than 2.0, '?' is displayed at the
   place of L and B, since they are officially available with VBE 2.0+.
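   The GCLB flags and the '!' marker correspond to the ModeAttributes word of
   the VBE ModeInfoBlock: bit 0 mode supported, bit 3 color, bit 4 graphics,
   bit 6 windowed (bank-switched) memory NOT available, bit 7 linear
   framebuffer available. Memtype corresponds to its MemoryModel byte. Below
   is a sketch of how these columns could be derived. Safbench's own code
   isn't shown here, and the function names are illustrative only.

```python
MEMORY_MODELS = {0x00: "    text", 0x01: " CGA gfx", 0x02: " HGC gfx",
                 0x03: " 16c EGA", 0x04: "  packed", 0x05: "sequ 256",
                 0x06: "  direct", 0x07: " YUV/YIQ"}

def yn(flag):
    return "Y" if flag else "-"

def gclb(attributes, vbe_version):
    """4-char GCLB field from ModeAttributes; vbe_version as 0xMMmm."""
    g = yn(attributes & 0x10)            # bit 4: graphics mode
    c = yn(attributes & 0x08)            # bit 3: color mode
    if vbe_version < 0x0200:             # bits 6 and 7 are defined in VBE 2.0+
        return g + c + "??"
    l = yn(attributes & 0x80)            # bit 7: linear framebuffer available
    b = yn(not (attributes & 0x40))      # bit 6 set: NO windowed (banked) access
    return g + c + l + b

def mode_supported(attributes):
    return bool(attributes & 0x01)       # bit 0 clear -> '!' after the mode

def memtype(model):
    if model in MEMORY_MODELS:
        return MEMORY_MODELS[model]
    return "reserved" if model < 0x10 else " OEM mem"

print(gclb(0x9B, 0x0200), memtype(0x04))  # YYYY   packed
```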

   Yes, the output might seem 'messy', but at least it fits in 80 chars...



[9.] - Errorlevels

   0   No errors

   1   No FPU present

   2   Help screen displayed

   3   Not enough memory

   4   No 387+ FPU present (287 present)

   5   No VBE present

   6   Bad command line options

   7   No LFB modes found to test (try UniVbe or S3VBE20)

   8   Failed to map linear framebuffer

   9   Not enough memory to test video copy speeds

   10  A key was pressed

   12  Specified mode isn't a graphics LFB mode (or it was invalid)

   13  VESA info displayed

   14  Could not set video mode

   15  CPL not 0 (needed for options 'B' and 'P'), try without multitasking

   16  CPUID and/or RDTSC not supported (for 'B' and 'P')

   17  Machine Specific Registers not supported ('B' and 'P' again)

   18  Under 8224 kB free memory, random access test skipped



[10.] - Final words

   If you encounter bugs (bugs? where?), don't hesitate to tell me.
   Improvements are also welcome. See line #10 for more info.
   Thanks to Ralf Brown for the Interrupt List. It's also available from
   my home page, even though index.html doesn't link to it.
   So long.