Safbench v2.13                                  Compiled 6 July 1997

32bit protected mode computer, main and video memory speed test program for 80386+ CPUs and 387+ FPUs

(C) Copyright 1997 by Sami Farin. All Rights Reserved.
mailto:sfarin@ratol.fi
http://www.ratol.fi/~sfarin/

Index

 1. What's this?
 2. Requirements
 3. Usage
 4. About Safbench
 5. Command line arguments
 6. Technical (?) stuff
 7. Tables
 8. VESA info abbrevs
 9. Errorlevels
10. Final words

[1.] - What's this?

Safbench v2.13 is a computer speed test program. It runs in 32bit protected mode (using DOS32 v3.3 by Adam Seychell). Safbench is written entirely in assembly because, FOR EXAMPLE, interleaving integer and FPU code the way Safbench does can't be controlled that precisely in other languages ("Simple arithmetic FPU" and "Mixed FPU&CPU" contain the same FPU instructions, but the Mixed test has some integer instructions between the FPU operations).

CPU, FPU, main memory and video memory speeds can be tested. Safbench hooks the timer interrupt to get stable results with every BIOS and (D)OS configuration; TSRs don't get called while Safbench runs. Timing accuracy is about one microsecond (assuming you don't have a buggy or shitty 8253/8254 timer chip).

[2.] - Requirements

Safbench needs an 80386 or better CPU with a 387 or better FPU. The 287 lacks some instructions that the tests need.

About 4 MB of free memory is needed if you use option 'M' to test memory speed. If you use option 'V' or 'F', Safbench needs as much memory as the largest video mode uses video memory (for example, 1600x1200x16M needs 7500 kB). VBE v2.0+ is needed for 'V' and 'F'.

[3.] - Usage

This is a VERY user-friendly program. You only have to type 'safbench' and press Enter (assuming you are using a command interpreter). See chapter [5.] for more info.

[4.] - About Safbench

Safbench tests your CPU and FPU very thoroughly. It uses almost every instruction you've got, and they are balanced so that the instructions used most in applications are also used most in Safbench.
But every application uses the CPU and memory differently, so no single computer speed test program can tell the absolute and correct speed of your computer. Safbench tries to get close by testing the CPU and FPU with different kinds of code. It also tests memory with different block sizes: it reads, writes and moves in sequential, reverse and butterfly order using block sizes from 1 kB to 4 MB. So you can test how fast your new SDRAM (or whatever) is.

About the memory move routine: when it says 4 MB, it means that it uses about 4 MB of memory. It copies 2 MB from the start of the buffer to the middle of the buffer, so it reads 2 MB and writes 2 MB.

Butterfly accessing (when reading and writing) means that it first reads 32 bits from the start of the memory block and then from the end of the block, then from start+4 bytes and end-4 bytes, then start+8 bytes and end-8 bytes, and so on. If you see butterfly read speeds which are faster than sequential reads, it's because your CPU fills the whole cache line when the requested data item is not present in the CPU's internal cache (the line is 32 bytes on Pentiums).

Butterfly routine:

1) read from start of buffer
2) write to end of buffer
3) write to start of buffer+4 bytes
4) read from end of buffer-4 bytes
5) increase start by 8 bytes and decrease end by 8 bytes
6) goto 1

Random accessing first builds a table which contains randomly generated pointers to different places in memory.

The 'Mixed' test does complex memory accessing: it reads, writes and performs "read/modify" and "read/modify/write" instructions (for example additions, shifts and bit operations). For example, when testing 'Mixed' 4 MB, Safbench reads one (random) 32bit pointer from memory, then reads/writes/etc. at that position (and around it), and then gets the next pointer. The pointers are STORED sequentially, but they point to random places in memory. The 4 MB test actually uses 8 MB.
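The access orders described earlier in this chapter can be sketched in Python. This is a minimal model, not Safbench's assembly. It assumes 32bit words, block sizes that are multiples of 8 bytes (true for 1 kB to 4 MB) and 32-byte cache lines. The function names and the seed are hypothetical, and Python's `random.Random` only stands in for Safbench's own generator, which isn't documented here.

```python
import random


def butterfly_read_offsets(block_size, step=4):
    """Byte offsets in butterfly read order: start, end, start+4, end-4, ..."""
    offsets = []
    lo, hi = 0, block_size - step          # first and last 32bit word
    while lo <= hi:
        offsets.append(lo)
        if hi != lo:                       # middle word of an odd count: visit once
            offsets.append(hi)
        lo += step
        hi -= step
    return offsets


def butterfly_rw_ops(block_size, step=4):
    """(operation, offset) pairs following the six-step butterfly routine."""
    ops = []
    start, end = 0, block_size - step
    while start < end:
        ops += [("R", start),              # 1) read from start
                ("W", end),                # 2) write to end
                ("W", start + step),       # 3) write to start+4
                ("R", end - step)]         # 4) read from end-4
        start += 2 * step                  # 5) start += 8 ...
        end -= 2 * step                    #    ... end -= 8
    return ops                             # 6) repeat until the two ends meet


def make_pointer_table(buffer_size, count, line=32, seed=1):
    """Sequentially stored pointers to random, cache-line-aligned offsets.

    A fixed seed makes every run touch the same addresses.
    """
    rng = random.Random(seed)
    return [rng.randrange(buffer_size // line) * line for _ in range(count)]


print(butterfly_read_offsets(16))   # [0, 12, 4, 8]
print(butterfly_rw_ops(16))         # [('R', 0), ('W', 12), ('W', 4), ('R', 8)]
```

For every block size that is a multiple of 8 bytes, the read/write routine touches each 32bit word exactly once. The pointer table is read front to back, so only the data accesses jump around.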
The MB/s value for "Random accessing" is calculated from the ACTUAL number of BYTES accessed, including the pointer reads. The MB/s values for Reads and Writes are based on the bytes read or written, pointers not included. For Reads and Writes, the whole cache line (32 bytes) is accessed, so write speeds to memory are fair on CPUs which allocate a new cache line on a write miss (such as the M1 and PPro/PII).

If you want to see how fast your main memory is (for example, after upgrading from FPM to SDRAM), disable the L2 cache and run 'Qsort32 16M' and 'Safbench R'.

If you've been using CacheChk or something similar to test memory read speeds, it gives lower results (at least for the 1st level cache), because it uses a string instruction to read data (REP LODSD). Safbench uses a faster method. Your CPU might be able to access its 1st level data cache in less than one cycle per read or write, i.e. it can do (as many as) two reads in parallel; Pentium+ class machines are capable of this. REP LODSD is a stupid way to check cache speed, since NO PROGRAM uses REP LODSD to read from memory (if one does, it's a braindead program anyway). LODSD takes at least one cycle on every CPU and it isn't pairable, so you CAN'T read two doublewords in one cycle with it.

The Pentium Pro has 256 or 512 kB of 2nd level INTERNAL cache, which is damn fast, a lot faster than any PB-cache, because it runs at core speed. The Pentium II's 2nd level cache is SRAM on the processor cartridge and runs at half the core speed (a 266 MHz CPU has its L2 cache running at 133 MHz).

"Simple integers" tests simple CPU instructions, such as additions, logical operations, jumps, and memory and bit manipulation. Most of them execute in one cycle, and in parallel, on a Pentium-class machine.

"Complex integers" tests multiplications, divisions, bit scans, string instructions, complex memory addressing and manipulation, etc. If you get poor values on a Cyrix 6x86 in this test, try running 6X86OPT; it enables various options, such as N_LOCK.

"Simple arithmetic FPU" tests additions, subtractions, multiplications, divisions, comparisons, sign changes, and loads and stores with different memory operand sizes and values. These are the instructions used most in most applications and games (such as Quake :) ).

"Mixed FPU&CPU" tests the same instructions as the previous test, but does some work with the integer unit while the FPU is processing data. A 386 can do nothing but wait for the 287/387 to complete its task; a 486+ can use the CPU while the FPU is processing data.

"Transcendental FPU" tests the transcendental instructions: sine, cosine, tangent, arctangent, simultaneous sine and cosine, 2^x-1, Y*log2(X) and Y*log2(X+1). 24 different values are tested for every instruction. These are the most time-consuming instructions you've got on your FPU.

"Non-transcendental FPU" tests the other arithmetic instructions not covered by the "Simple arithmetic FPU", "Transcendental FPU" and "Processor control FPU" tests, for example square root, scale, extract, remainder and integer part. 16 different valid values are tested.

"Processor control FPU" contains system-management instructions, such as FPU initialization, FPU state save and restore, environment save and restore, control word store/load, status word store, exception clearing, constant loading (0, 1, pi, different log values), BCD loads and stores, and examine. The speed of these instructions isn't a big deal when comparing FPU speeds. They ARE used, though not very much; for example, FPU state save/restore is used in multitasking environments to store the FPU state when switching tasks.

"16bit speed" encodes binary data to MIME64 and does a lot of work with mixed 8, 16 and 32 bit registers. This test shows how well your CPU can execute 16bit code with prefixes and such. The PPro has serious trouble with this test, as does the PII: "Partial Register Stalls" get generated on them quite a lot.

"Average speed" is the average of the eight tests compared to a selected microprocessor family (386, 486, Pentium, Cyrix 6x86, PPro, PII, K6).

64bit precision is used in the calculations (the precision setting affects only FADD, FSUB, FMUL, FDIV and FSQRT). On Pentiums, only FDIV executes quicker when a lower precision is used. On the Cyrix 6x86, every FPU instruction takes the same number of clock cycles, no matter what the precision is.

The video memory speed test measures your video card memory's write speed and the speed of moves from main memory (or cache) to video memory. Only graphics modes with a linear framebuffer (LFB) are tested; if your card doesn't support them, that's your problem. Without an LFB, bank switching must be performed every 64 kB. Using S3VBE20 or UniVBE is a wise idea to increase performance, since bank switching is not very fast.

The write speed test measures the absolute maximum write speed of your video card in every possible mode. The screen might look complex and colorful, but the written data is changed only after the whole screen is filled. If you want to see, for example, which video card is the best for DOS games, test the cards and buy the one which has the best "Write" values at the resolution and color depth your games use. Safbench doesn't test 3D speed or GUI acceleration speed. For example (again...), Quake calculates the screen in main memory (+caches) and then moves it to video memory. That is where the write speed of the video card matters.

If you don't like the CLICKs when switching video modes, turn off your monitor...

[5.] - Command line arguments

Here are the possible options:

?: Help help help help help...
M: Test memory speed (needs 4112 kB free mem; if 8224 kB is available, Random access speed is also tested). See chapter [4.] for more info.
R: Test only Random access speed (needs 8224 kB)
!: Use 31 different block sizes for the Random access test when using 'M' or 'R'
B: Test BUS speed (MHz), multiplier, MHz rate and main memory read/write speed. Use this on Pentiums (with or without MMX). Sorry, but the Bus value isn't correct, even though it counts the number of cycles during which the processor's external memory bus is in use (according to Intel). Could you please tell me how to make it calculate the REAL bus speed! A 4 MB block size is used for reading. The number of bus cycles and CPU cycles needed to read/write one quadword (64 bits) is also displayed (a RAM bank is 64 bits wide):
   bus cycles/quadword = bus speed (MHz) / (MB/s * 1.048576 / 8)
   cycles/quadword     = CPU speed (MHz) / (MB/s * 1.048576 / 8)
   Use 'B' or 'P' with option 'C' to test cache accessing (256 kB block size). I hope nobody has 128 kB (or less) of L2 cache on a Pentium!
P: Same as 'B', but for Pentium Pros and Pentium IIs.
V: Test video memory speed (needs VBE v2.0+ with at least one LFB mode)
F: Same as 'V', but use 64bit accesses with the FPU (intended for Pentiums)
   F:xxxx or V:xxxx tests the given VESA mode (in hex), for example 'F:105'
I: Display info about every VESA mode
S: Show every graphics linear framebuffer mode which can be tested
W: Wait one second after mode set (default is no wait)
0: Compare your CPU and FPU speeds to an Intel 386SX 20 MHz with Cyrix FasMath
1: Intel 486DX 33 MHz
2: Intel Pentium 90 MHz
3: Cyrix 6x86-P120+ 100 MHz
4: Intel Pentium Pro 180 MHz
5: Intel Pentium II 266 MHz
6: AMD K6 200 MHz

If you've got a Pentium MMX, PPro, PII or Cyrix 6x86MX, I'd be pleased to receive the test results! Run the batch file for your CPU ('MAKERES1.BAT' is for Pentiums, 'MAKERES2.BAT' is for PPro and PII CPUs, 'MAKERES.BAT' is for 6x86, 6x86MX, K5, K6 etc.) and send me the results, so I can include them in the next release of Safbench. No stupid Windows in the background, please.
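A quick check of the 'B'/'P' formulas above, as a Python sketch (`per_quadword` is a hypothetical helper name, not part of Safbench). The factor is 1.048576 because Safbench's MB is 1048576 bytes. Multiplying by 1.048576 gives millions of bytes per second, and dividing by 8 gives millions of quadwords per second, i.e. quadwords per microsecond.

```python
def per_quadword(clock_mhz, mb_per_s):
    """Clock cycles per 64bit transfer: clock (MHz) / (MB/s * 1.048576 / 8)."""
    quadwords_per_us = mb_per_s * 1.048576 / 8
    return clock_mhz / quadwords_per_us


# Pentium 166 (66.666 MHz bus * 2.5), main memory sequential read 100.5 MB/s,
# as listed in the memory tables of chapter [7.]
bus = per_quadword(66.666, 100.5)    # bus cycles per quadword
cpu = per_quadword(166.666, 100.5)   # CPU cycles per quadword
print(f"{bus:.2f} bus cycles, {cpu:.2f} CPU cycles per quadword")
```

For that Pentium 166, the formulas give roughly 5 bus cycles and about 12.7 CPU cycles per quadword read from main memory.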
When producing results with the MAKERES batch files, run the "usual" speedup programs: 6X86OPT (ALSO with option '-l'), S3VBE20, MCLK, FASTVID and so on.

[6.] - Technical (?) stuff

All eight tests in Safbench use less than 4 kB of memory each, so neither the speed of the 2nd level cache nor that of main memory affects the results. To see how fast your memory subsystem is, try command line option 'M' or run Qsort32.

The video copy test in Safbench moves data from main memory or cache to video memory. Depending on how much memory the video mode uses, the READ speed of your 1st level cache, your 2nd level cache (if any) or main memory limits the copy speed. If you have a 256 kB cache, you probably get constant copy speeds for video modes which use less than 256 kB of memory. 640x480x256c needs 300 kB of memory, so main memory speed limits the maximum moving speed and the speed of the 2nd level cache doesn't matter (very much) in that case. On my S3, 16M color modes are a lot slower than the other modes.

Read speeds from video memory are not tested, because you DON'T NEED TO KNOW THEM. If some program reads from your video card, it's a stupid program or it reads just a little, for example when updating the mouse cursor.

If your processor can write to video memory in 64bit chunks (a PPro with write combining enabled, a Cyrix 6x86 with write gathering enabled, or a Pentium when the FPU is used to write 64 bits per instruction), you get better results for both write and copy speeds in Safbench. If you get a lot better values with option 'F' than with 'V', your processor doesn't support write gathering/combining (or it isn't enabled) and the datapath to video memory is 64 bits. With a PPro or Cyrix 6x86 (or newer...), you should get about the same write speeds with both 'F' and 'V'. Copy speeds can differ a bit: if 'F' is faster, you've got a relatively quick FPU, which also means that reading in 64bit chunks with the FPU is faster than reading in 32bit chunks with the CPU. If 'F' is slower, you've got a slow FPU and/or the datapath to memory is 32 bits.
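The cache reasoning above depends only on how many bytes one screen occupies. That number is also what turns the Write and Copy speeds into frames per second (see the BPSL and FPS columns of the 'V'/'F' display). Below is a Python sketch under stated assumptions. The function names are hypothetical. `copy_source` is a deliberately crude guess that considers only whether the whole frame fits in a 256 kB L2 cache and ignores the 1st level cache.

```python
MB = 1_048_576  # Safbench reports Write and Copy speeds in units of 1048576 bytes


def frame_bytes(bpsl, height):
    """Video memory one screen occupies: Bytes Per Scan Line * Height."""
    return bpsl * height


def fps(mb_per_s, bpsl, height):
    """Frames per second implied by a Write or Copy speed."""
    return mb_per_s * MB / frame_bytes(bpsl, height)


def copy_source(bpsl, height, l2_kb=256):
    """Rough guess at what limits the copy test: the L2 cache if the whole
    frame fits into l2_kb kilobytes, otherwise main memory (L1 is ignored)."""
    if frame_bytes(bpsl, height) <= l2_kb * 1024:
        return "L2 cache"
    return "main memory"


# 640x480x256c: 640 bytes per line * 480 lines = 300 kB, more than 256 kB
print(frame_bytes(640, 480) // 1024, copy_source(640, 480))
# Mode 163 (320x200x256c) written at 35.40 MB/s
print(f"{fps(35.40, 320, 200):.1f} frames per second")
```

The second line gives about 580 frames per second, which agrees with the FPS column for mode 163 in the S3 tables of chapter [7.].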
FLD [QWORD ESI+i] [1] and FSTP [QWORD EDI+i] are used for copying from main memory or cache to video memory, and FST [QWORD EDI+i] [2] is used for writing.

[1]: ST(0) contains a valid non-zero value
[2]: ST(0) contains a valid non-zero value, and it's increased by a big number after every screen written

Option 'F' is intended for Pentiums (not PPros), since the Pentium can't do write gathering. The video write test in Safbench is supposed to tell you the maximum throughput of a particular video card with every processor. You don't necessarily get better results with 'F' on Pentiums, though. FILD and FIST(P) aren't used to copy/write data, because they are a lot slower than FLD and FST(P). (FLD and FSTP can't be used in the "real world" to copy data, since a loaded value can be a NaN, a denormal etc.)

To get the maximum speed out of your video card, I suggest the following:

-First, use S3VBE20 or UniVBE if VBE v2.0 isn't already supported.
-If you've got a Cyrix 6x86, run 6X86OPT -linbuf. It modifies an Address Region Register (enables write gathering for the linear frame buffer). It can almost double the write speed to the LFB. VBE v2.0+ needed.
-Try MCLK; overclocking might give speedups. I overclocked from 60 to 85 MHz and it sped up writes by 25 to 120%! Don't overclock too much, since it might cause lockups in Windows etc. when the driver doesn't get the data it expects. Speeds over 85 MHz don't give any benefit in anything. I noticed that using option /1 3 (use FPM timing) was faster than the default 2-cycle EDO (with my video card!).
-Use FASTVID or a similar program on a PPro if your BIOS doesn't turn on some bits which affect performance (DON'T use it if you have a buggy chipset, such as the Orion 450GX rev. A2).

I use MCLK and 6X86OPT, and together they speed up writes to video memory by 1.7 to 2.6 times (1.95 on average). Now my S3 Trio64 PCI runs at 85 MHz and writes 67 MB/s in 320x200x256c mode. That's 17 MB/s more than my main memory write speed.
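The earlier remark that FLD and FSTP can't copy arbitrary data can be illustrated for one case, signaling NaNs. With the invalid-operation exception masked (the default after FINIT), an x87 FLD of a QWORD SNaN delivers a quiet NaN, so FSTP writes back different bits. The Python below only models that single case. It is not x87 code, the helper names are hypothetical, and it doesn't cover the other pitfalls the text mentions.

```python
import struct

EXP_MASK = 0x7FF0_0000_0000_0000    # exponent field of an IEEE double
FRAC_MASK = 0x000F_FFFF_FFFF_FFFF   # fraction (mantissa) field
QUIET_BIT = 0x0008_0000_0000_0000   # top fraction bit: set means quiet NaN


def is_snan(bits):
    """True if a 64bit pattern is a signaling NaN when read as a double."""
    exponent_all_ones = (bits & EXP_MASK) == EXP_MASK
    fraction = bits & FRAC_MASK
    return exponent_all_ones and fraction != 0 and not (fraction & QUIET_BIT)


def fld_fstp_copy(bits):
    """Model of copying one QWORD with FLD/FSTP, invalid exception masked:
    FLD turns an SNaN into a QNaN, so the stored bits differ from the source."""
    return bits | QUIET_BIT if is_snan(bits) else bits


def double_bits(x):
    """Bit pattern of a Python float as a little-endian IEEE double."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


src = 0x7FF0_0000_0000_0001                  # an SNaN pattern any buffer could contain
print(hex(fld_fstp_copy(src)))               # 0x7ff8000000000001: the copy differs
print(fld_fstp_copy(double_bits(1.5)) == double_bits(1.5))   # True
```

Safbench avoids the problem by only ever loading valid non-zero values (footnotes [1] and [2]), which is exactly what a general-purpose copy routine can't guarantee.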
The MCLK overclocking also speeds up Windows by 10-85%.

About the 'V' and 'F' results display:

Mode  = VESA mode (hexadecimal).
BPP   = Bits Per Pixel: 4 = 16 colors, 8 = 256 colors, 15 = 32K colors, 16 = 64K colors, 24 = 16M colors, 32 = 16M too.
BPSL  = Bytes Per Scan Line; multiplied by Height, it gives the number of bytes of video memory the mode takes.
Write, Copy = speeds in megabytes per second (1 MB = 1048576 bytes).
FPS   = Frames Per Second (the Write or Copy rate in bytes per second divided by BPSL*Height).
Hz    = vertical refresh rate, in case you didn't guess it! Windows(tm) Screws Up(tm) that result, too.

The test goes like this:

1) Switch to the video mode to be tested.
2) Fill video memory with some data.
3) If option 'W' is used, goto 2 until one second has elapsed (waits for the screen to "settle down").
4) Write to the video memory (clears the whole screen with different colors). This takes about 0.3 seconds. That's enough; testing for a minute doesn't give any better results.
5) Move from main memory (or cache, depending on the mode and cache size) to video memory. The data is moved once before starting the timer, to flush caches etc. This takes 0.3 seconds, too.
6) Check the Hz rate (0.1 seconds?)
7) Store the results (0.01 microseconds?)
8) Goto 1

MCLK homepage: http://www.oac.uci.edu/~rliao
S3VBE20 homepage: http://www.uni-muenster.de/math/u/mesched
UniVBE (Display Doctor) homepage: http://www.scitechsoft.com/down_sdd.html

You probably get better values when the computer is "cold", just turned on. When it warms up, the write speed of every video mode decreases by ~0-6%. If someone has a nice technical explanation for this, I'd be pleased to hear it.

The FPU tests in Safbench measure the _RAW_ power of your FPU; "real world apps" ALWAYS include some integer work and memory accesses. So the FPU results are almost-worst-case results. Let's take an example. PovRay (the version 3 _I_ have) uses Pentium-optimized code.
That means it uses the FXCH instruction to swap FPU registers so that the FPU can overlap instructions better, since most FPU instructions need ST(0) (TOS, Top Of Stack). 8087+ FPUs have eight registers, ST(0) to ST(7). If an instruction needs the result of the previous instruction, overlapping is not possible. FXCH is optimized so well on Pentiums that it can be executed even while the register is in use, so Pentium-optimized FPU programs use FXCH quite a lot. That's a big speed decrease for CPUs other than Pentiums. For example, a 387 needs 18 cycles for FXCH, and an addition takes 23-37 cycles. A 486 needs 4 cycles for FXCH and 8-20 for an addition; a Cyrix 6x86 needs 3 and 4-9. A Pentium CAN execute them BOTH, or just the ADDITION, in 3 cycles. Thus it's a big speed decrease to use Pentium-optimized code on other CPUs. Pentiums are also able to overlap FPU instructions, so FADD, FSUB or FMUL can effectively take one cycle (assuming the result of the operation is not needed within three cycles). Safbench uses FXCH only a little. So... PovRay would run faster on a 387, 486 and Cyrix 6x86 if it didn't use FXCH to achieve speedups on Pentiums (yes, I know there are still compilers which support 486 optimizations). Oh well.

Then there's Quake. It's another example of a program which uses the FPU very extensively, and it's optimized for Pentiums. So you are probably not surprised that it performs poorly on the Cyrix 6x86. Please note that a Pentium 120 runs at 120 MHz, but a Cyrix 6x86-P120+ runs at 100 MHz. If Quake were optimized for, let's say, the 486, the Cyrix 6x86-P120+ wouldn't seem so slow compared to a Pentium 100. Quake also does integer calculations. Quake is so fast on Pentium Pros because they have a VERY fast 2nd level cache (read speed about 400 MB/s on a PPro/180), and their CPU and FPU are about 20% faster than a Pentium's at the same MHz rate. The PPro also has a nice technique which allows 64bit moves in one cycle.
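The FXCH and FADD cycle counts quoted above can be turned into percentages with a small Python sketch. It is a simplification that ignores pipelining and assumes the FXCH can't overlap with anything. The Pentium entry just encodes the "both in 3 cycles" case from the text. `CYCLES` and `fxch_overhead` are hypothetical names.

```python
# (FXCH cycles, fastest FADD, slowest FADD), as quoted in the text above
CYCLES = {
    "387": (18, 23, 37),
    "486": (4, 8, 20),
    "Cyrix 6x86": (3, 4, 9),
    "Pentium": (0, 3, 3),   # FXCH pairs with the FADD: no extra cycles
}


def fxch_overhead(cpu):
    """Extra time an FXCH adds to one FADD, as a fraction.

    Returns (best case, worst case): the FXCH cost relative to the slowest
    and to the fastest FADD timing.
    """
    fxch, fadd_fast, fadd_slow = CYCLES[cpu]
    return fxch / fadd_slow, fxch / fadd_fast


for cpu in CYCLES:
    best, worst = fxch_overhead(cpu)
    print(f"{cpu:10}  +{best:.0%} .. +{worst:.0%}")
```

Each FXCH slows an addition by roughly 49-78% on a 387, 20-50% on a 486 and 33-75% on a Cyrix 6x86. On a Pentium it costs nothing.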
I've read articles about Quake and the Cyrix 6x86 which stated that Quake uses the FPU only to BUFFER data, but I don't agree. After debugging Quake, I found that it actually did something 'useful', such as multiplications and additions. I didn't see it move data via the FPU in 64bit chunks. Actually, it doesn't move data all day long, except to video memory. The Cyrix 6x86 does write gathering (so data is moved in 64bit chunks anyway, even when the software uses 32 or 16bit moves), so it's best to use the CPU to move the data (REP MOVSD)... and the same goes for Pentium Pros. But the Pentium doesn't do write gathering, so it might be faster to use the FPU to move data.

Quake is optimized purely for Pentiums: it takes advantage of FXCH being pairable with some FPU instructions, so FXCH executes at no extra cost. On a 6x86, FXCH takes 3 clocks. Also, the main routine in Quake is Pentium-optimized assembly code, for example an FDIV at the lowest precision. That FDIV is parallelized with integer ops (C compilers can't produce that kind of optimized code). An FDIV takes 39 cycles on a Pentium at the highest precision (the default one, 64 bits) and 34 cycles on a Cyrix 6x86. But Quake uses 24 bit precision: on the Cyrix 6x86 that FDIV still takes 34 cycles, but on the Pentium only 19 cycles. Quake is 100% FPU down to the 8 or 16 pixel subdivisions. Quake is optimized so that the FPU and integer units finish at roughly the same time, but that piece of code is timed for Pentiums! Quake is not a GOOD benchmark of FPU speed; it's a GOOD benchmark of Quake speed. See http://www.gamers.org/dEngine/quake/papers/mikeab-cgdc.html for more information.

My Cyrix 6x86-P120+ at 100 MHz runs PovRay as fast as a Pentium at 75 MHz, so the FPU speed of the Cyrix isn't that bad. Its integer speed, in contrast, is a lot better. Quake isn't the only program in the world. Excel also uses the FPU (long reals, 64 bits) in its calculations... but there's so much overhead that the speed of the FPU doesn't matter very much.
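The "8 or 16 pixel subdivisions" refer to subdivided perspective-correct texture mapping, the scheme described in the Abrash paper linked above. This Python sketch shows the general idea, not Quake's code, and its names and parameters are hypothetical. Along a scanline, u/z and 1/z change linearly, so the exact texture coordinate u = (u/z)/(1/z) needs one divide per pixel. Subdividing does the divide only at the span boundaries and interpolates linearly in between. In Quake that per-span FDIV is what runs on the FPU while the integer unit draws the pixels.

```python
def span_texture_u(u_over_z0, inv_z0, du_over_z, dinv_z, width, step=16):
    """Texture coordinate u for each pixel of one scanline span.

    The exact perspective divide is done only every `step` pixels; pixels in
    between are linearly interpolated. Returns (u values, number of divides).
    """
    def exact_u(x):
        return (u_over_z0 + du_over_z * x) / (inv_z0 + dinv_z * x)

    ends = list(range(0, width, step)) + [width]   # subspan boundaries
    u_at = [exact_u(x) for x in ends]              # the only perspective divides
    us = []
    for (x0, x1), (u0, u1) in zip(zip(ends, ends[1:]), zip(u_at, u_at[1:])):
        du = (u1 - u0) / (x1 - x0)    # per-pixel step (a shift when x1 - x0 == 16)
        for x in range(x0, x1):
            us.append(u0 + du * (x - x0))
    return us, len(ends)


us, divides = span_texture_u(0.0, 1.0, 1.0, 0.01, width=64)
print(len(us), divides)   # 64 pixels, 5 divides instead of 64
```

The interpolated values are exact at every subspan boundary and only approximate in between. That is why the subspan length (8 or 16 pixels) is a trade-off between image quality and the number of divides.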
Some size statistics of Safbench.EXE:

- Main code   4.8 kB (timing routines, number display etc.)
- Mem speeds  9.0 kB (main and video memory speed tests)
- Test code   7.2 kB (Simple Int., Complex Int., FPU stuff etc.)
- Data        6.5 kB (FPU test data (long reals etc.), texts etc.)

The executable is compressed and it contains the stub loader...

[7.] - Tables

Here are the results I've got from different computers (Cyrix 6x86MX and Pentium MMX results are still missing!).

Nr Test
1: Simple integers
2: Complex integers
3: Simple arithmetic FPU
4: Mixed FPU&CPU
5: Transcendental FPU
6: Non-transcendental FPU
7: Processor control FPU
8: 16bit speed

      386/20    486/33     P5/90  CxM1/100  PPro/180   PII/266    K6/200
1:    6319.5   37555.5  183882.0  220270.9  407299.4  607075.1  409915.5
2:    4553.1   15640.6   58318.9   84934.8  162444.4  245181.8  142768.1
3:    2797.3   13836.4  108689.0   69841.8  255503.7  383141.8  207291.2
4:    2122.9   12229.5   82968.0   66460.9  182355.0  271606.9  158958.5
5:     453.1     912.2    4774.9    3395.4    9733.3   14334.9    9253.9
6:    2922.9    7123.7   26361.4   32720.5   56377.8   83696.8   72311.5
7:    4854.7   15134.1   58890.6   70579.9   87067.2  134227.8  101341.5
8:    2631.2    8595.4   45247.5   56439.4   53113.3   79437.0   86928.4

If you've got an Intel 486DX, I'd like to have the results for it. I changed the FPU tests after I got the results from that 486...

Power per MHz

       386      486  Pentium Cyrix M1     PPro      PII   AMD K6
1:   316.0   1126.7   2043.1   2222.2   2262.8   2276.5*  2049.6
2:   227.7    469.2    648.0    837.3    902.5    919.4*   713.8
3:   139.9    415.1   1207.7    698.4   1419.5   1436.8*  1036.5
4:   106.1    366.9    921.9    664.6   1013.1   1018.5*   794.8
5:    22.7     27.4     53.1     34.0     54.1*    53.8     46.3
6:   146.1    213.7    292.9    327.2    313.2    313.9    361.6*
7:   242.7    454.0    654.3    705.8*   483.7    503.4    506.7
8:   131.6    257.9    502.8    563.9*   295.1    297.9    434.6

* = most powerful

Intel processor families, processor speed improvements

From...                     386      486  Pentium     PPro
To.....                     486  Pentium     PPro      PII
Simple integers           3.60x    1.81x    1.13x    1.01x
Complex integers          2.08x    1.38x    1.39x    1.02x
Simple arithmetic FPU     3.00x    2.91x    1.18x    1.01x
Mixed FPU&CPU             3.49x    2.51x    1.10x    1.01x
Transcendental FPU        1.22x    1.94x    1.02x    1.00x
Non-transcendental FPU    1.48x    1.37x    1.07x    1.00x
Processor control FPU     1.89x    1.44x    0.74x    1.04x
16bit speed               1.99x    1.95x    0.59x    1.01x
----------------------------------------------------------
Average                   2.34x    1.91x    1.03x    1.01x

As you can see, the difference between the PPro and PII is very small, but the PII has 32 kB of 1st level cache (the PPro has 16 kB), and MMX. The PII's L1 data cache is 4-way set-associative (16 kB); the PPro's L1 data cache is 2-way set-associative (8 kB). The PII's 2nd level cache runs at half the core speed (133 MHz for a 266 MHz CPU); the PPro's 2nd level cache runs as fast as the CPU (but it can't be accessed as fast as the 1st level cache, see below).

Memory speed results (MB/s); speeds are taken at the cache boundaries (8 kB and 64 kB for the 486/33, etc.). If a result is in brackets, it's not necessarily correct and I want results for that particular CPU!
M1/100:  Cyrix 6x86-P120+ 100 MHz
M1/133:  Cyrix 6x86-P166+ 133 MHz (no L2 cache)
M1/133t: Cyrix 6x86-P166+ 133 MHz (ASUS mobo, 512 kB PB-cache, 64 MB EDO)
M1/150:  Cyrix 6x86-P200+ 150 MHz (ASUS mobo, 512 kB PB-cache, 64 MB EDO)
K6/200:  AMD K6, 66.666*3 MHz
K6/208:  AMD K6, 83.333*2.5 = 208.333 MHz
K6/208t: same as above, but memory timings tweaked

Sequential accessing
                 read                  write                   move
CPU          1st    2nd   main     1st    2nd   main     1st    2nd   main
486/33      70.8   35.9   12.9    61.4   62.4   30.7    41.5   29.5    7.9
[P54/90    313.0  107.6   64.0   448.8   47.2   28.8   246.3   37.4   18.7]
P54/120    718.2  178.1   89.3   389.7   78.2   73.3   396.5   61.0   34.9
P54/166   1002.2  202.3  100.5   473.7   86.7   83.5   540.4   67.6   42.7
M1/100     607.3  194.1   72.2   288.8  109.2   50.4   376.8   66.1   26.4
M1/133    *638.6  -----   74.7   376.3  -----   41.2   494.4  -----   26.6
M1/133t    815.1  256.2  111.7   384.0  150.6   64.8   500.4   96.8   37.5
M1/150     896.4  287.7  125.4   429.8  169.1   72.8   562.0  108.7   42.2
K6/200     731.6  250.1  125.4   725.9  127.0   69.3   748.0   84.3   42.3
K6/208     763.0  282.2  125.3   758.1  154.0   73.8   781.2  101.1   44.2
K6/208t    763.0  282.2  147.8   758.1  154.0   86.8   781.2  101.1   51.0
[P6/180    675.0  395.5  183.6   600.7  341.3   73.4   746.9  234.8   43.9]

*: 847.6 with a block size of 8 kB

Reverse accessing
                 read                  write                   move
CPU          1st    2nd   main     1st    2nd   main     1st    2nd   main
486/33      71.0   37.3   13.1    55.4   62.5   30.7    40.6   29.5    8.1
[P54/90    313.0  107.7   65.7   447.9   47.2   28.9   278.9   37.4   19.1]
P54/120    716.9  178.1   89.3   382.2   78.2   74.9   406.2   61.0   34.9
P54/166    998.2  202.3  100.5   471.9   86.7   83.5   546.9   67.6   42.7
M1/100     624.2  123.0   53.7   285.0  109.1   50.4   284.2   54.6   23.9
M1/133     719.9  -----   57.3   371.5  -----   41.2   382.9  -----   24.0
M1/133t    819.8  177.2   79.3   379.2  145.2   64.8   377.9   81.3   34.8
M1/150     897.0  199.4   89.1   424.4  163.0   72.8   424.6   91.3   39.1
K6/200     746.3  254.0  125.4   732.0  127.0   69.3   747.7   84.3   42.3
K6/208     778.0  281.5  125.4   763.7  154.0   73.8   781.2  101.1   44.2
K6/208t    778.0  282.2  147.9   763.7  154.0   86.8   781.2  101.1   51.0
[P6/180    673.6  409.7  185.7   599.6  343.0   73.8   291.0  229.4   37.2]

Butterfly accessing
                 read                  write                   move
CPU          1st    2nd   main     1st    2nd   main     1st    2nd   main
486/33      70.4   40.8   10.5    60.7   60.5   15.4    34.7   21.2    6.2
[P54/90    299.0  110.8   53.3   365.3   47.0   20.3   194.1   38.1   16.6]
P54/120    415.7  157.1   82.0   316.0   78.1   23.7   310.4   57.1   25.7
P54/166    578.3  192.4   98.0   384.0   86.6   29.2   430.7   63.3   30.3
M1/100     670.6  151.0   59.6   190.0   92.6   47.0   186.2   47.8   22.8
M1/133     833.3  -----   69.1   249.1  -----   41.2   238.7  -----   19.0
M1/133t    894.4  222.6  100.5   252.7  127.0   64.5   248.6   75.3   32.2
M1/150    1035.5  250.1  112.9   283.2  142.7   72.5   277.5   84.5   36.2
K6/200     734.8  233.8  124.8   726.6  126.0   68.7   577.7   66.3   34.4
K6/208     766.2  279.6  129.6   758.7  152.8   73.2   604.6   80.3   36.7
K6/208t    766.2  279.6  153.9   758.7  152.8   85.8   604.6   80.3   43.0
[P6/180    671.3  469.5  114.3   503.8  246.4   55.6   350.6  107.1   29.8]

These are for Safbench v1.32 (block sizes in kB):

Random accessing
Cyrix 6x86-P120+ (256 kB PB, 16 MB FPM RAM, MG i430VX mobo)
With SADS enabled/disabled; the speedup with SADS disabled is shown for Mixed:

Block            Mixed Speedup             Read            Write
    1  195.428/195.425          423.692/423.692  263.923/263.923
    2  197.336/197.339          425.273/425.273  261.743/261.743
    4  199.942/199.923          425.411/425.411  261.158/261.162
    8  188.310/188.359          343.730/343.541  260.116/260.135
   16  135.166/137.681   1.86%  191.103/191.387  146.147/147.752
   32   78.132/ 84.335   7.94%  146.816/146.911  102.537/104.844
   64   61.998/ 68.398  10.32%  131.345/131.507   89.535/ 91.973
  128   55.776/ 62.020  11.19%  124.991/125.132   84.321/ 86.866
  256   52.862/ 58.861  11.35%  120.015/120.203   81.484/ 84.133
  512   34.213/ 36.142   5.64%   84.212/ 84.256   58.376/ 59.909
 1024   27.671/ 28.684   3.66%   68.967/ 69.064   49.333/ 50.458
 2048   25.149/ 25.946   3.17%   62.399/ 62.573   45.292/ 46.308
 4096   22.576/ 23.281   3.12%   58.526/ 58.778   42.170/ 43.392
Avg.:   98.043/100.484   2.49%  200.499/200.510  134.318/135.584

AMD K6/83.333*2.5 MHz, ASUS P/I P55T2P4, 512 kB PB-cache, 64 MB EDO

Block     Mixed      Read     Write
    1   373.015   629.963   773.795
    2   377.418   630.913   773.285
    4   379.344   630.875   775.466
    8   377.372   631.245   775.632
   16   372.540   623.789   759.802
   32   237.125   327.785   278.042
   64   112.521   233.931   174.797
  128    93.360   211.812   156.690
  256    86.757   203.742   150.902
  512    83.660   198.390   147.052
 1024    57.710   130.714    99.605
 2048    49.193   111.404    85.822
 4096    43.246   103.595    80.371
Avg.:   203.328   359.089   387.020

Cyrix 6x86-P166+ (133 MHz), ASUS P/I P55T2P4, 512 kB PB-cache, 64 MB EDO

Block     Mixed      Read     Write
    1   259.858   563.387   350.938
    2   262.393   565.466   348.041
    4   265.744   565.676   347.261
    8   250.740   457.401   345.923
   16   186.419   266.773   206.649
   32   115.155   206.706   147.252
   64    92.936   185.025   128.697
  128    84.131   176.046   121.262
  256    80.032   171.752   117.875
  512    78.066   168.261   115.423
 1024    47.724   115.500    80.337
 2048    38.604    98.192    67.480
 4096    33.115    89.544    61.107
Avg.:   138.071   279.210   187.557

Cyrix 6x86-P200+ (150 MHz), ASUS P/I P55T2P4 mobo, 512 kB PB-cache, 64 MB EDO

Block     Mixed      Read     Write
    1   291.884   632.785   394.173
    2   294.726   635.164   390.932
    4   298.562   635.384   390.042
    8   281.628   512.774   388.527
   16   208.419   293.247   228.372
   32   128.588   223.527   163.267
   64   103.892   202.029   143.035
  128    94.048   194.296   134.942
  256    89.664   190.943   131.739
  512    87.718   188.968   129.843
 1024    53.730   129.747    90.740
 2048    43.418   110.273    76.088
 4096    37.219   100.591    67.861
Avg.:   154.884   311.518   209.966

These are for Safbench v2.13 (block sizes in kB):

Random accessing
Cyrix 6x86-P120+ (256 kB PB, 16 MB FPM RAM, MG i430VX mobo)
With SADS enabled/disabled; the speedup with SADS disabled is shown for Mixed and Write:

Block            Mixed Speedup             Read            Write Speedup
    1  171.830/171.835          423.692/423.680  264.402/264.402
    2  164.746/165.080   0.20%  426.327/426.327  263.845/263.841
    4  153.398/155.446   1.34%  426.399/426.217  263.618/263.618
    8  134.603/137.894   2.44%  401.759/401.753  262.177/262.221
   16   97.326/102.613   5.43%  265.927/266.133  179.422/182.025   1.45%
   32   71.016/ 77.150   8.64%  165.790/166.132  112.048/114.557   2.24%
   64   59.432/ 65.381  10.01%  138.675/138.837   93.133/ 95.572   2.62%
  128   54.611/ 60.417  10.63%  128.084/128.294   85.975/ 88.511   2.95%
  256   51.437/ 56.844  10.51%  120.194/120.320   81.453/ 84.080   3.23%
  512   33.234/ 34.956   5.18%   84.216/ 84.261   58.384/ 59.922   2.63%
 1024   27.318/ 28.220   3.30%   69.113/ 69.212   48.956/ 50.145   2.43%
 2048   24.884/ 25.565   2.74%   62.672/ 62.826   44.975/ 46.046   2.38%
 4096   23.626/ 24.228   2.55%   58.548/ 58.797   42.190/ 43.413   2.90%
Avg.:   82.112/ 85.048   3.58%  213.184/213.291  138.506/139.873   0.99%

BTW: what the hell does "Slow ADS" mean?

In the Mixed test, the whole cache line is often fetched from cache/main memory and then processed in the CPU's cache. With Random reads/writes, 32 bytes are read/written from/to a random place and then the test proceeds to the next random position. When a cache read miss occurs, Pentium+ CPUs fetch the whole cache line (32 bytes) from main memory. When a cache write miss occurs, the Pentium doesn't allocate a new cache line for it (doesn't fetch the line from memory), but the PPro does. That's why 32 byte reads AND writes are used.

The randomization routine gives the same values on every run with every CPU, so memory is accessed at the same places every time you run the Random access test. Main memory speeds are taken from the 4096 kB block size.

Does anyone know this: why is butterfly reading from main memory on PPros so much slower than sequential/reverse reading? And why is reverse reading from main memory and L2 cache much slower than sequential reading on the M1, but not on other CPUs?

Speed of my Diamond Stealth 64 DRAM PCI, 1 MB 2-cycle EDO, Trio64 (764). Main memory is 2*8 MB 60 ns FPM RAM, 256 kB PB-cache, MG i430VX mobo, with the fastest possible RAM settings in the BIOS. Cyrix 6x86-P120+ (100 MHz). The shitty monitor is an ADI MicroScan 2E, about 15 inches.
---Clear boot---
S3VBE20, no MCLK run (S3's DRAM clock generator running at 59.96 MHz):
Mode Width Height BPP BPSL   Write      FPS    Copy      FPS     Hz
163    320    200   8  320   35.40   580.02   33.23   544.50  69.94
164    320    240   8  320   31.79   434.08   30.71   419.25  67.27
165    320    400   8  320   35.36   289.69   33.23   272.23  69.94
166    320    480   8  320   31.72   216.57   30.67   209.34  67.27
14F    400    300   8  400   37.16   324.75   33.23   290.37  73.40
12D    512    384   8  512   36.75   195.99   33.23   177.22  81.60
100    640    400   8  640   36.76   150.55   33.20   136.00  69.87
101    640    480   8  640   36.74   125.39   30.84   105.26  60.16
103    800    600   8  800   33.96    74.18   26.82    58.60  56.15
105   1024    768   8 1024   25.25    33.66   22.84    30.45  87.10
110    640    480  15 1280   30.18    51.50   26.32    44.92  60.04
111    640    480  16 1280   30.17    51.48   26.32    44.92  60.04
113    800    600  15 1600   22.71    24.80   20.04    21.89  56.15
114    800    600  16 1600   22.72    24.82   20.04    21.89  56.15
211    640    400  32 2560   15.98    16.36   12.24    12.53  70.09

---MCLK used---
S3VBE20, MCLK /0 93 2 2 /1 3 (85.01 MHz):
Mode Width Height BPP BPSL   Write      FPS    Copy      FPS     Hz
163    320    200   8  320   37.30   611.12   33.23   544.50  69.94
164    320    240   8  320   37.30   509.26   33.23   453.75  67.27
165    320    400   8  320   37.30   305.56   33.23   272.23  69.94
166    320    480   8  320   37.30   254.63   33.23   226.86  67.27
14F    400    300   8  400   37.36   326.45   33.23   290.37  73.40
12D    512    384   8  512   37.30   198.93   33.23   177.22  81.60
100    640    400   8  640   37.30   152.78   33.20   136.00  69.87
101    640    480   8  640   37.30   127.32   30.84   105.26  60.16
103    800    600   8  800   37.30    81.48   26.84    58.63  56.14
105   1024    768   8 1024   37.30    49.73   26.32    35.09  87.09
110    640    480  15 1280   37.30    63.66   26.32    44.92  60.04
111    640    480  16 1280   37.30    63.66   26.32    44.92  60.04
113    800    600  15 1600   34.98    38.21   26.31    28.74  56.15
114    800    600  16 1600   34.96    38.18   26.31    28.74  56.15
211    640    400  32 2560   21.28    21.79   17.48    17.90  70.09

---6X86OPT used---
6X86OPT -LINBUF, S3VBE20, no MCLK run (59.96 MHz):
Mode Width Height BPP BPSL   Write      FPS    Copy      FPS     Hz
163    320    200   8  320   43.52   713.03   40.11   657.24  69.94
164    320    240   8  320   38.54   526.21   35.89   489.96  67.27
165    320    400   8  320   43.27   354.43   39.93   327.08  69.94
166    320    480   8  320   38.36   261.88   35.67   243.49  67.27
14F    400    300   8  400   43.85   383.14   41.62   363.67  73.40
12D    512    384   8  512   46.78   249.51   42.21   225.14  81.60
100    640    400   8  640   44.76   183.34   41.29   169.13  69.87
101    640    480   8  640   43.82   149.56   39.48   134.76  60.16
103    800    600   8  800   39.88    87.12   34.24    74.81  56.15
105   1024    768   8 1024   30.42    40.56   25.23    33.64  87.07
110    640    480  15 1280   37.35    63.75   30.09    51.35  60.04
111    640    480  16 1280   37.37    63.78   30.10    51.36  60.04
113    800    600  15 1600   28.52    31.15   22.60    24.69  56.14
114    800    600  16 1600   28.57    31.20   22.60    24.69  56.15
211    640    400  32 2560   26.70    27.34   16.05    16.44  70.09

---6X86OPT and MCLK used---
6X86OPT -LINBUF, S3VBE20, MCLK /0 93 2 2 /1 3 (85.01 MHz):
Mode Width Height BPP BPSL   Write      FPS    Copy      FPS     Hz
163    320    200   8  320   69.15  1132.97   52.70   863.46  69.94
164    320    240   8  320   65.63   896.07   52.70   719.59  67.27
165    320    400   8  320   69.03   565.46   52.71   431.77  69.94
166    320    480   8  320   65.49   447.09   52.71   359.84  67.27
14F    400    300   8  400   63.29   553.01   52.70   460.53  73.40
12D    512    384   8  512   63.61   339.26   52.70   281.08  81.60
100    640    400   8  640   62.18   254.68   52.64   215.63  69.87
101    640    480   8  640   61.81   210.97   46.94   160.23  60.16
103    800    600   8  800   58.09   126.89   38.27    83.61  56.15
105   1024    768   8 1024   64.62    86.16   37.12    49.50  87.07
110    640    480  15 1280   53.06    90.55   37.22    63.52  60.04
111    640    480  16 1280   53.06    90.56   37.22    63.52  60.04
113    800    600  15 1600   50.83    55.52   34.79    38.00  56.15
114    800    600  16 1600   51.25    55.97   34.77    37.98  56.15
211    640    400  32 2560   33.79    34.60   21.25    21.76  70.09

---6X86OPT and MCLK used, Cyrix 6x86 overclocked to 120 MHz---
----(Main memory, 1st and 2nd level cache now 20% faster)-----
6X86OPT -LINBUF, S3VBE20, MCLK /0 93 2 2 /1 3 (85.01 MHz):
Mode Width Height BPP BPSL   Write      FPS    Copy      FPS     Hz
163    320    200   8  320   74.29  1217.20   63.20  1035.39  69.94
164    320    240   8  320   70.15   957.76   63.18   862.66  67.27
165    320    400   8  320   74.07   606.77   63.19   517.68  69.94
166    320    480   8  320   69.91   477.25   63.19   431.40  67.27
14F    400    300   8  400   67.56   590.37   60.54   528.96  73.40
12D    512    384   8  512   68.97   367.87   60.39   322.09  81.60
100    640    400   8  640   67.05   274.63   59.99   245.71  69.87
101    640    480   8  640   66.28   226.24   54.47   185.92  60.15
103    800    600   8  800   61.33   133.97   45.52    99.44  56.14
105   1024    768   8 1024   65.31    87.08   43.23    57.65  87.06
110    640    480  15 1280   57.31    97.81   44.66    76.22  60.04
111    640    480  16 1280   57.34    97.86   44.66    76.23  60.04
113    800    600  15 1600   51.40    56.15   37.13    40.56  56.14
114    800    600  16 1600   51.40    56.14   37.11    40.53  56.14
211    640    400  32 2560   39.15    40.09   24.06    24.64  70.09

Pentium 166 MHz (66.666666*2.5) with Matrox Millennium 4MB.
With option 'F', main mem sequential read speed 100.5 MB/s.
Mode Width Height BPP BPSL   Write      FPS    Copy      FPS     Hz
100    640    400   8  640   77.18   316.11   66.99*  274.39  70.17
101    640    480   8  640   77.30   263.84   63.32*  216.12  60.02
11D   1600   1200  16 3200   73.13    19.97   55.81    15.24  60.26
11E   1600   1200  16 3200   73.13    19.97   55.81    15.24  60.26
118   1024    768  32 4096   73.92    24.64   55.81    18.60  60.10
119   1280   1024  16 2560   74.55    29.82   55.72    22.29  60.13
11A   1280   1024  16 2560   74.55    29.82   55.71    22.28  60.14
11C   1600   1200   8 1664   74.61    39.18   55.65    29.22  60.26
107   1280   1024   8 1280   75.52    60.41   55.77    44.61  60.14
116   1024    768  16 2048   75.52    50.35   55.75    37.17  60.10
117   1024    768  16 2048   75.52    50.35   55.75    37.17  60.10
115    800    600  32 3200   75.76    41.38   55.83    30.49  60.47
114    800    600  16 1920   76.23    69.39   55.90    50.88  60.47
113    800    600  16 1920   76.24    69.39   55.90    50.88  60.47
112    640    480  32 2560   76.36    65.16   55.94    47.73  60.01
105   1024    768   8 1024   76.83   102.44   55.96    74.62  60.10
103    800    600   8 1024   76.89   131.23   55.96    95.51  60.47
110    640    480  16 1280   76.99   131.40   55.96    95.51  60.01
111    640    480  16 1280   76.99   131.39   55.96    95.51  60.02
*: other modes don't fit into the 256 kB L2 cache

AMD K6 208.333 MHz (83.333*2.5 MHz), 512 kB PB-cache, 64 MB EDO, Asus P/I-P55T2P4 (HX chipset), Matrox Millennium 4 MB WRAM (41.666 MHz PCI-bus).
Main mem sequential read speed 125 MB/s. With option 'F'.
Mode Width Height BPP BPSL   Write      FPS    Copy      FPS     Hz
100    640    400   8  640  116.78   478.35   84.60   346.51  70.18
101    640    480   8  640  116.75   398.52   84.48   288.37  60.03
103    800    600   8 1024  116.22   198.34   76.58   130.70  60.48
105   1024    768   8 1024  115.84   154.45   68.73    91.64  60.11
107   1280   1024   8 1280  113.58    90.86   62.94    50.35  60.15
110    640    480  16 1280  115.99   197.95   76.56   130.66  60.03
111    640    480  16 1280  115.99   197.95   76.56   130.66  60.03
112    640    480  32 2560  115.27    98.37   62.93    53.70  60.03
113    800    600  16 1920  114.73   104.43   62.94    57.29  60.48
114    800    600  16 1920  114.74   104.44   62.93    57.28  60.48
115    800    600  32 3200  113.54    62.01   62.93    34.37  60.48
116   1024    768  16 2048  114.01    76.01   62.93    41.96  60.11
117   1024    768  16 2048  114.02    76.01   62.93    41.96  60.11
11C   1600   1200   8 1664  111.50    58.55   62.93    33.05  60.27
118   1024    768  32 4096  110.99    37.00   62.93    20.98  60.11
119   1280   1024  16 2560  111.59    44.64   62.92    25.17  60.15
11A   1280   1024  16 2560  111.60    44.64   62.92    25.17  60.15
11D   1600   1200  16 3200  108.44    29.61   62.93    17.18  60.27
11E   1600   1200  16 3200  108.41    29.60   62.93    17.18  60.27

The same as above, but AMD K6 200 MHz (66.666*3 MHz, 33.333 MHz PCI-bus):
Mode Width Height BPP BPSL   Write      FPS    Copy      FPS     Hz
100    640    400   8  640   93.78   384.13   71.92   294.57  70.18
101    640    480   8  640   93.75   320.00   71.84   245.20  60.03
103    800    600   8 1024   93.32   159.27   67.51   115.21  60.48
105   1024    768   8 1024   93.06   124.08   62.99    83.99  60.11
107   1280   1024   8 1280   91.96    73.56   59.45    47.56  60.15
110    640    480  16 1280   93.44   159.47   67.53   115.26  60.03
111    640    480  16 1280   93.44   159.47   67.53   115.26  60.03
112    640    480  32 2560   92.88    79.26   59.45    50.73  60.03
113    800    600  16 1920   92.73    84.41   59.45    54.12  60.48
114    800    600  16 1920   92.73    84.41   59.45    54.12  60.48
115    800    600  32 3200   91.30    49.86   59.46    32.47  60.48
116   1024    768  16 2048   92.02    61.34   59.46    39.64  60.11
117   1024    768  16 2048   92.01    61.34   59.45    39.64  60.11
11C   1600   1200   8 1664   90.73    47.64   59.45    31.22  60.27
118   1024    768  32 4096   90.90    30.30   59.46    19.82  60.11
119   1280   1024  16 2560   90.79    36.32   59.46    23.78  60.15
11A   1280   1024  16 2560   90.82    36.33   59.46    23.78  60.15
11D   1600   1200  16 3200   89.27    24.38   59.46    16.24  60.27
11E   1600   1200  16 3200   89.25    24.37   59.46    16.24  60.27

Same as above, but memory timings tweaked and 83.333*2.5 MHz (208.333 MHz):
Mode Width Height BPP BPSL   Write      FPS    Copy      FPS     Hz
100    640    400   8  640  116.78   478.35   84.63   346.63  70.18
101    640    480   8  640  116.76   398.54   84.55   288.59  60.03
103    800    600   8 1024  116.22   198.35   79.20   135.17  60.48
105   1024    768   8 1024  115.85   154.46   73.68    98.25  60.11
107   1280   1024   8 1280  113.60    90.88   69.35    55.48  60.15
110    640    480  16 1280  115.97   197.93   79.19   135.14  60.03
111    640    480  16 1280  115.99   197.96   79.19   135.14  60.03
112    640    480  32 2560  115.27    98.36   69.38    59.21  60.03
113    800    600  16 1920  114.74   104.44   69.37    63.14  60.48
114    800    600  16 1920  114.73   104.43   69.37    63.14  60.48
115    800    600  32 3200  113.53    62.00   69.31    37.85  60.48
116   1024    768  16 2048  114.01    76.00   69.33    46.22  60.11
117   1024    768  16 2048  114.01    76.01   69.33    46.22  60.11
11C   1600   1200   8 1664  111.52    58.56   69.24    36.36  60.27
118   1024    768  32 4096  111.00    37.00   69.32    23.11  60.11
119   1280   1024  16 2560  111.58    44.63   69.14    27.65  60.15
11A   1280   1024  16 2560  111.59    44.63   69.14    27.66  60.15
11D   1600   1200  16 3200  108.42    29.61   69.35    18.94  60.27
11E   1600   1200  16 3200  108.44    29.61   69.35    18.94  60.27

Same as above, but with Cyrix 6x86-P200+:
Mode Width Height BPP BPSL   Write      FPS    Copy      FPS     Hz
100    640    400   8  640   98.46   403.31   92.98   380.85  70.18
101    640    480   8  640   98.50   336.21   92.95   317.28  60.03
103    800    600   8 1024   98.20   167.60   85.14   145.30  60.48
105   1024    768   8 1024   97.98   130.64   77.18   102.91  60.11
107   1280   1024   8 1280   96.85    77.48   71.05    56.84  60.15
110    640    480  16 1280   98.16   167.53   85.14   145.31  60.03
111    640    480  16 1280   98.16   167.52   85.15   145.32  60.03
112    640    480  32 2560   97.56    83.25   71.17    60.73  60.03
113    800    600  16 1920   97.49    88.74   71.14    64.75  60.48
114    800    600  16 1920   97.49    88.74   71.14    64.75  60.48
115    800    600  32 3200   96.59    52.75   70.98    38.76  60.48
116   1024    768  16 2048   96.94    64.63   71.02    47.35  60.11
117   1024    768  16 2048   96.94    64.63   71.02    47.35  60.11
11C   1600   1200   8 1664   95.69    50.25   70.81    37.18  60.27
118   1024    768  32 4096   95.65    31.88   70.97    23.66  60.11
119   1280   1024  16 2560   95.62    38.25   70.65    28.26  60.15
11A   1280   1024  16 2560   95.63    38.25   70.66    28.26  60.15
11D   1600   1200  16 3200   94.44    25.79   71.00    19.39  60.27
11E   1600   1200  16 3200   94.43    25.79   71.00    19.39  60.27

Same as above, but with Cyrix 6x86-P166+:
Mode Width Height BPP BPSL   Write      FPS    Copy      FPS     Hz
100    640    400   8  640   87.95   360.26   83.03   340.11  70.18
101    640    480   8  640   87.97   300.27   83.05   283.46  60.03
103    800    600   8 1024   88.05   150.27   76.13   129.94  60.48
105   1024    768   8 1024   87.47   116.63   68.74    91.65  60.11
107   1280   1024   8 1280   86.90    69.52   63.38    50.70  60.15
110    640    480  16 1280   87.90   150.01   76.03   129.75  60.03
111    640    480  16 1280   87.90   150.02   76.03   129.75  60.03
112    640    480  32 2560   87.38    74.57   63.41    54.11  60.03
113    800    600  16 1920   87.45    79.60   63.40    57.71  60.48
114    800    600  16 1920   87.45    79.60   63.40    57.70  60.48
115    800    600  32 3200   86.96    47.49   63.32    34.58  60.48
116   1024    768  16 2048   86.79    57.86   63.37    42.25  60.11
117   1024    768  16 2048   86.79    57.86   63.37    42.25  60.11
11C   1600   1200   8 1664   86.19    45.26   63.26    33.22  60.27
118   1024    768  32 4096   85.77    28.59   63.35    21.12  60.11
119   1280   1024  16 2560   86.17    34.47   63.29    25.32  60.15
11A   1280   1024  16 2560   86.17    34.47   63.29    25.32  60.15
11D   1600   1200  16 3200   84.34    23.03   63.36    17.30  60.27
11E   1600   1200  16 3200   84.33    23.03   63.36    17.30  60.27

Cyrix 6x86-P166+, no L2 cache, 32 MB FPM RAM, S3 Trio64V+ 2 MB EDO (33.333 MHz PCI):
Mode Width Height BPP BPSL   Write      FPS    Copy      FPS     Hz
170    320    200   8  320   88.94  1457.21   50.39   825.67  69.94
171    320    240   8  320   86.58  1182.11   50.36   687.58  67.27
172    320    400   8  320   88.89   728.21   50.37   412.61  69.94
173    320    480   8  320   86.52   590.66   50.36   343.81  67.27
174    400    300   8  400   89.93   785.78   50.39   440.27  73.40
175    512    384   8  512   91.06   485.66   50.35   268.56  81.60
100    640    400   8  640   90.69   371.45   50.35   206.22  69.79
101    640    480   8  640   88.15   300.89   50.35   171.85  72.84
103    800    600   8  800   81.02   176.99   50.33   109.94  72.27
105   1024    768   8 1024   70.90    94.53   50.34    67.12  69.92
107   1280   1024   8 1280   59.73    47.79   50.22    40.17  60.20
10D    320    200  15  640   91.21   747.17   50.37   412.63  69.78
10E    320    200  16  640   91.21   747.21   50.37   412.63  69.78
10F    320    200  32 1280   82.01   335.93   50.35   206.23  69.78
110    640    480  15 1280   77.12   131.61   50.34    85.92  72.84
111    640    480  16 1280   77.14   131.65   50.34    85.92  72.84
112    640    480  32 2560   55.81    47.63   44.66    38.11  72.84
113    800    600  15 1600   65.13    71.14   50.32    54.97  71.99
114    800    600  16 1600   65.14    71.15   50.32    54.96  71.99
115    800    600  32 3200   36.66    20.02   27.11    14.81  72.27
116   1024    768  15 2048   45.08    30.06   36.83    24.55  70.04
117   1024    768  16 2048   45.08    30.06   36.83    24.55  70.04
120   1600   1200   8 1600   54.21    29.61   42.57    23.25  96.02

Windows tests were run under Win95 with the newest S3 video driver at 800x600x16bit (Cyrix 6x86-P120, Trio64, 16 MB FPM...).
Test 1: 59.96 MHz, 2-cycle EDO timings
Test 2: 85.01 MHz, FPM timings

                             Test 1      Test 2   Speedup
PC Labs Winbench v3.1      14644561    24906850     70.1%
Wintach v1.0: Text             53.4        79.8     49.4%
              Cad             236.4       275.9     16.7%
              Spreadsheet      57.5        89.2     55.1%
              Paint           105.2       143.3     36.2%

[8.] - VESA info

Mode          Hexadecimal VESA mode ('!' displayed after this if the mode is not supported)
Size          Width and height of the video mode in pixels (graphics) or chars (text)
Clrs          Number of different colours in the video mode
BPP           Bits Per Pixel
BPSL          Bytes Per Scan Line
Size/B        Size of the video mode in bytes
GCLB          G=graphics mode; Y=yes, -=no (text mode)
              C=color; Y=color, -=monochrome
              L=Linear Framebuffer supported for this mode, Y=yes, -=no
              B=Bank-switched mode supported for this mode, Y=yes, -=no
Memtype       " text" = text
              " CGA gfx" = CGA graphics
              " HGC gfx" = HGC graphics
              " 16c EGA" = 16-color (EGA) graphics
              " packed" = packed pixel graphics
              "sequ 256" = sequ 256 (non-chain 4) graphics
              " direct" = direct color (HiColor, 24-bit color)
              " YUV/YIQ" = YUV (luminance-chrominance, also called YIQ)
              "reserved" = reserved for VESA
              " OEM mem" = OEM memory models
BO            BIOS Output supported, Y=yes, -=no
D: Red        Red mask size/red field position
D: Grn        Green mask size/green field position
D: Blu        Blue mask size/blue field position
D: Rsv        Reserved mask size/reserved field position
D: P          Color ramp is programmable, Y=yes, -=no
D: A          Bytes in reserved field may be used by application, Y=yes, -=no
T: Char cell  Char cell size in pixels

D: only in Direct modes
T: only in Text modes

If the reported VESA version is less than 2.0, '?' is displayed in place of L and B, since they are officially available only with VBE 2.0+. Yes, the output might seem 'messy', but at least it fits in 80 chars...

[9.] - Errorlevels

 0  No errors
 1  No FPU present
 2  Help screen displayed
 3  Not enough memory
 4  No 387+ FPU present (287 present)
 5  No VBE present
 6  Bad command line options
 7  No LFB modes found to test (try UniVBE or S3VBE20)
 8  Failed to map linear framebuffer
 9  Not enough memory to test video copy speeds
10  Any key pressed
12  Specified mode isn't a graphical LFB mode (or it was invalid)
13  VESA info displayed
14  Could not set video mode
15  CPL not 0 (needed for options 'B' and 'P'), try without multitasking
16  CPUID and/or RDTSC not supported (for 'B' and 'P')
17  Machine Specific Registers not supported ('B' and 'P' again)
18  Under 8224 kB free memory, random access test skipped

[10.] - Final words

If you encounter bugs (bugs? where?), don't hesitate to tell me. Suggestions for improvements are also welcome. See line #10 for more info.

Thanks to Ralf Brown for the Interrupt List. It's also available from my home page, even though index.html doesn't link to it.

So long.