FastVid 1.03. Copyright 1996 by John Hinkley. 72466.1403@compuserve.com

--------------------------------------------------------------------------
 WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING
--------------------------------------------------------------------------

THIS PROGRAM WILL ONLY WORK ON PENTIUM PRO PROCESSORS. IT WILL NOT WORK,
AND IS NOT NEEDED, ON PENTIUM AND EARLIER CPUs.

According to Intel, enabling Write Posting (see below) on 82450 steppings
before B0 could result in "rare" problems on the PCI bus. If the program
indicates that Write Posting is enabled when you first run this program
(the "Before" message) then you have a B0 or later stepping of the 82450
and don't need to worry about the A2 bugs.

The problem will manifest itself when there are high levels of traffic on
the PCI bus -- multiple devices reading and writing at the same time. A
typical example where you might have problems is when playing multimedia
files like AVI and MPEG animations.

The write combining options of this program can be used without problems
on any version of the 82450.

If you have a pre-B0 motherboard you may want to play around with write
posting to see the difference it makes but you shouldn't enable it all the
time -- it will occasionally lock up your computer. If you really want or
need write posting you should consider getting a new motherboard.

Be forewarned: YOU USE THIS PROGRAM AT YOUR OWN RISK.

--------------------------------------------------------------------------
End of Warning.
--------------------------------------------------------------------------

This program enables Write Posting, banked VGA Write Combining and SVGA
linear frame buffer Write Combining on Pentium Pro motherboards based on
the 82450 chipset. This will significantly improve graphics performance
under DOS and Win95.

The program must execute privileged instructions so it must be run in
real mode. For the time being that means it must be run from DOS.
You cannot run it from a DOS window or a full screen DOS session from
Windows 3.x, Win95, WinNT or OS/2. For DOS, Windows 3.x and Win95 you can
include the program in your AUTOEXEC.BAT file (keep in mind that
DOS4GW.EXE must be in your search path). If you try to run the program
from a protected mode OS you will get a DOS4GW error message and register
dump.

--------------------------------------------------------------------------

Steppings of the 82450 chipset before B0 have bugs which forced Intel to
disable Write Posting -- in essence, cache writes have been disabled for
the PCI bus. The B0 stepping has been fixed and Write Posting is enabled
by default by the BIOS. The difference is easily visible in writing to
video memory. An A2 motherboard can only write about 8MB/sec to the
graphics card; a B0 motherboard gets about 18MB/sec.

But this is not the entire story. With the Pentium Pro, Intel also decided
that Write Combining (the combining of several writes into a cache line
that can be burst out over the PCI bus) should be the responsibility of
the O/S, not the BIOS or hardware. By enabling Write Combining the
throughput to video RAM can be further increased to 88MB/sec or more.

There are two mechanisms for which Write Combining needs to be enabled:
the banked VGA mechanism (the 128KB from A0000 to BFFFF) and the unbanked,
linear frame buffer that many of the newer cards support. I will
henceforth refer to linear frame buffer write combining as LFBWC and
banked VGA write combining as BVWC.

Most low resolution DOS graphics applications and games use the banked
mechanism. But since the VESA committee has defined a standard, and UNIVBE
and a few of the graphics card manufacturers have provided the VESA
services, some of the latest games are using the linear frame buffer (Duke
Nukem 3D and the Quake demo are two examples). The linear frame buffer
usually gives better performance since it removes the requirement to
switch banks in hires modes.
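
To make the bank switching cost concrete, here is a minimal Python sketch
that counts how many banks a full frame spans when it is written through a
banked window. It ASSUMES a 64KB window and a scan line pitch equal to
width times bytes per pixel; the real window size, granularity and pitch
depend on the card and the video mode. The function names are made up for
this illustration.

```python
# Sketch only: bank arithmetic for a banked video window.
# ASSUMPTIONS (not from the readme): 64KB window, pitch == width * bpp.

WINDOW_BYTES = 64 * 1024  # assumed size of the banked window


def banks_needed(width, height, bytes_per_pixel, window=WINDOW_BYTES):
    """Number of distinct banks touched when writing one whole frame."""
    frame_bytes = width * height * bytes_per_pixel
    return -(-frame_bytes // window)  # ceiling division


def bank_and_offset(x, y, width, bytes_per_pixel, window=WINDOW_BYTES):
    """Bank number and offset inside the window for pixel (x, y)."""
    linear = (y * width + x) * bytes_per_pixel
    return divmod(linear, window)


if __name__ == "__main__":
    print(banks_needed(320, 200, 1))        # 1: fits in a single window
    print(banks_needed(640, 480, 1))        # 5: bank switches are needed
    print(bank_and_offset(0, 103, 640, 1))  # (1, 384)
```

With a linear frame buffer the offset computed in bank_and_offset is used
directly as the address offset, so no bank register has to be touched.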
I have only personally tested this program with 2MB and 4MB Matrox MGA
Millennium cards. It has been run by others with other cards (S3 964
based, S3 968 based, Tseng 6000 based) and most benefit to some extent.
The Number Nine Imagine 128 card does not seem to benefit.

With a 2MB Millennium there were problems with BVWC -- most hires VESA
modes would result in vertical stripes over the entire screen. This
appears to be either a hardware or software bug on the part of Matrox. I
found a workaround which eliminates the stripes but you don't get the full
speed enhancement from BVWC. The LFB will still run at full speed. Using a
negative value for the number of megabytes (for example FASTVID x11 -2)
will enable this workaround.

I haven't tested FASTVID on any 8MB graphics cards but I think it will
work properly. Please let me know if you find otherwise.

On many graphics cards, enabling BVWC results in problems with some
programs that use VGA mode 0x12 (640x480, 16 colors). This appears to be
either a hardware or software problem on the part of Matrox. The problem
stems from BVWC so you can run with that disabled if necessary. Users of
other graphics cards have indicated the same problem. Note that this is
not the same as the "vertical stripe" problem mentioned above.

Unfortunately, I have found that EMM386 interferes in some way with LFBWC
(tested with DOS 6.2 and 7.0). I also have reports from beta testers that
QEMM can interfere with LFBWC under DOS. When running DOS you must remove
EMM386 from your CONFIG.SYS file for LFBWC to work (BVWC is not affected
by EMM386). If EMM386 is loaded you will see no increase in speed of the
linear frame buffer.

Early on I wasn't able to get LFBWC with Win95. After removing EMM386,
LFBWC started working from both DOS and Win95 sessions. At some point I
re-enabled EMM386 and found that LFBWC continued to work from Win95.
I don't know what caused this "permanent" change (maybe re-installing the
graphics driver with EMM386 removed and LFBWC turned on did it) -- let me
know if you find out...

LFB write combining requires that FASTVID know where the linear frame
buffer is located. Different graphics card manufacturers put it at
different addresses. The LFBWC code in FASTVID currently queries any
installed VESA BIOS Extension driver for the LFB address so you should
install your VESA driver before FASTVID. If you don't have a VESA driver
loaded (keep in mind that many cards have the driver in BIOS so you don't
need to explicitly load one) or your VESA driver doesn't support the LFB,
you will have to supply an address.

Theoretically this program will work for any LFB address above 0x80000000
but I have only tested and verified that it works for the Matrox MGA
Millennium at 0xFF000000. (Others have successfully used it with other
graphics cards at other addresses.) If you supply an incorrect LFB address
you will not see any increase in speed from LFBWC.

If this program can't automatically detect the LFB address, you can
determine its location from Win95. Select Start, Settings, Control Panel,
(or My Computer, Control Panel), System, Device Manager, Display adapters,
your graphics card, Resources. Scroll to the bottom of the Resource
Settings box and you will see a line that reads:
"Memory Range XXXXXXXX - YYYYYYYY". The first value is the location of the
linear frame buffer. For the Matrox MGA Millennium it is 0xFF000000. If
you have another address take note of it and input it into FASTVID when
asked.

--------------------------------------------------------------------------

Usage: FASTVID4 XYZ N ADDRESS

X controls Write Posting.
Y controls VGA (banked) Write Combining.
Z controls SVGA (linear frame buffer) Write Combining.

For all three, 0 disables, 1 enables, any other value results in no change
from the current setting.

N indicates the amount of video memory in megabytes.
Valid values are 2, 4, and 8. Also valid are -2, -4, and -8 to apply the
special "vertical stripe" patch.

ADDRESS is the address of the linear frame buffer in hex. The Matrox MGA
Millennium has it at FF000000.

Example 1: FASTVID4

    If no arguments are supplied you run through a question and answer
    dialogue and the program sets up the environment. It will also tell
    you what the equivalent command line is for the options you chose.

Example 2: FASTVID4 111 4 FF000000

    Write posting is enabled
    VGA Write Combining is enabled
    SVGA Write Combining is enabled for 4MB video memory at FF000000

Example 3: FASTVID4 x01 4 FF000000

    The write posting setting is not changed by FASTVID
    VGA Write Combining is disabled
    SVGA Write Combining is enabled for 4MB video memory at FF000000

Example 4: FASTVID4 111 -2 FF000000

    Write posting is enabled
    VGA Write Combining is enabled. "Vertical stripe" patch applied.
    SVGA Write Combining is enabled for 2MB video memory at FF000000

Example 5: FASTVID4 111 -4 FF000000

    Write posting is enabled
    VGA Write Combining is enabled. "Vertical stripe" patch applied.
    SVGA Write Combining is enabled for 4MB video memory at FF000000

--------------------------------------------------------------------------

Included is a test program called VSPEED.EXE that reports the video
throughput for bit blit operations from DRAM to VRAM for both the banked
VGA and linear frame buffer mechanisms. If you experience difficulties
with VSPEED try using -l or -L on the command line to eliminate the linear
frame buffer test. For example:

    VSPEED -l   will test only the banked VGA mechanism.
    VSPEED      will test both the banked VGA and the linear frame buffer
                (assuming the card and VESA driver support it).
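
The FASTVID4 argument conventions from the Usage section can be sketched
in a few lines of Python. This is only an illustration of how the XYZ, N
and ADDRESS fields are read; it is not the program's actual code. The
names parse_flag and parse_fastvid_args and the dictionary layout are
hypothetical, and treating N and ADDRESS as optional is an assumption
based on command lines such as "FASTVID x11 -2" that appear in this file.

```python
# Sketch only: interpret FASTVID4-style arguments as the readme describes.
# parse_flag / parse_fastvid_args are hypothetical names for illustration.

VALID_MEGABYTES = {2, 4, 8}


def parse_flag(ch):
    """'0' disables, '1' enables, anything else means 'no change' (None)."""
    return {"0": False, "1": True}.get(ch)


def parse_fastvid_args(args):
    """Interpret ['XYZ', 'N', 'ADDRESS']; N and ADDRESS may be omitted."""
    if not 1 <= len(args) <= 3 or len(args[0]) != 3:
        raise ValueError("expected: XYZ [N [ADDRESS]]")
    flags = args[0]
    megabytes = int(args[1]) if len(args) > 1 else None
    address = int(args[2], 16) if len(args) > 2 else None  # hex, no prefix
    if megabytes is not None and abs(megabytes) not in VALID_MEGABYTES:
        raise ValueError("N must be 2, 4, 8, -2, -4 or -8")
    return {
        "write_posting": parse_flag(flags[0]),
        "banked_vga_wc": parse_flag(flags[1]),
        "linear_fb_wc": parse_flag(flags[2]),
        "video_megabytes": None if megabytes is None else abs(megabytes),
        # A negative N selects the "vertical stripe" patch.
        "stripe_patch": megabytes is not None and megabytes < 0,
        "lfb_address": address,
    }


if __name__ == "__main__":
    print(parse_fastvid_args(["x01", "4", "FF000000"]))
    print(parse_fastvid_args(["111", "-2", "FF000000"]))
```

A value of None for any of the three flags corresponds to "no change from
the current setting", as in Example 3.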
--------------------------------------------------------------------------

Sample VSPEED results from an Intel Aurora motherboard with the B0
stepping of the 82450 and a 4MB Matrox MGA Millennium:

FASTVID 000
    Copy DRAM to banked VGA:           8.07 million bytes per second
    Copy DRAM to linear framebuffer:   8.14 million bytes per second

FASTVID 100
    Copy DRAM to banked VGA:          18.72 million bytes per second
    Copy DRAM to linear framebuffer:  18.91 million bytes per second

FASTVID 011
    Copy DRAM to banked VGA:          37.95 million bytes per second
    Copy DRAM to linear framebuffer:  39.60 million bytes per second

FASTVID 111
    Copy DRAM to banked VGA:          87.72 million bytes per second
    Copy DRAM to linear framebuffer:  93.46 million bytes per second

FASTVID 111 -2
    Copy DRAM to banked VGA:          49.20 million bytes per second
    Copy DRAM to linear framebuffer:  93.46 million bytes per second

--------------------------------------------------------------------------

Sample VSPEED results from other cards with FASTVID 111 (unverified):

    STB Powergraph (S3 Trio64)    48 million bytes/sec
    Spea/V7 (S3 Trio64)           78 million bytes/sec
    GXE64Pro (S3-964)             22 million bytes/sec

--------------------------------------------------------------------------

The following tests were run on an Intel Aurora motherboard with the B0
stepping of the 82450, 64MB DRAM (all four SIMM sockets populated), and a
4MB Matrox MGA Millennium. The "000" setting simulates an A2 motherboard
where Write Posting is disabled.

--------------------------------------------------------------------------
program:                              fastvid setting:
                                      000    100    011    111
--------------------------------------------------------------------------
VSPEED (LFB, million bytes/sec)         8     19     40     93
Duke Nukem 3D (640x480, fps)           14     25     18     31
Doom Benchmark (fps)                   38     70     48     74
640x480 FLC animation (fps)            25     48     88    121
Chris's 3D benchmark (SVGA)            21     38     66     77

Note that differences in motherboard and graphics card design may lead to
different results.
Most notably, some cards cannot sustain 93MB/sec in the VSPEED test.

The above are all DOS applications. If you have an A2 motherboard, turning
on write posting will increase the WinBench96 Graphic Winmark score by
about 25 percent. The write combining features don't make much difference
to the Graphic Winmark score but there _are_ circumstances where write
combining can make a big difference. One example is using the Media Player
to play an animation to a high resolution, highcolor or truecolor window.
For example:

Run Win95 in a high resolution, direct color mode; say 1024x768, 24 bits
per pixel. Start the Media Player. Open \FUNSTUFF\VIDEOS\WEEZER.AVI from
the Win95 CD-ROM. Enlarge the playback window to nearly full screen (do
not use the Media Player's "full screen" option -- if you do it will
change the screen to a lower resolution 8 bit mode for playback). Press
the Play button.

With write posting and write combining turned off you will get very poor
results, about 2 frames per second. With write posting on and write
combining off that will improve to about 4 frames per second. With both
write posting and write combining on you will get very smooth playback
with the frame rate too fast to count.

You can see similar effects with the Hover! game on the Win95 CD-ROM.
Again, with Win95 in a hires direct color mode, enlarge the game window as
large as it will let you (about 640x480). With write posting and write
combining off you will get poor performance. With write posting on the
game will be playable. With write posting and write combining on the
action will be very smooth.

If you have a pre-B0 motherboard you can still benefit from write
combining (without fear of encountering the 82450 bugs) in the above DOS
and Win95 situations.
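
As a rough cross-check of the Media Player figures, the Python sketch
below computes the highest frame rate that raw bus throughput alone would
allow when every frame rewrites a full 1024x768, 24 bits per pixel screen.
It uses the rounded VSPEED linear frame buffer figures from the table
above (8, 19 and 93 million bytes/sec). It is an upper bound under the
ASSUMPTION that writing pixels is the only cost; decoding, scaling and
driver overhead are ignored, so observed rates will be lower. The function
name max_fps is made up for this illustration.

```python
# Sketch only: frame rate ceiling imposed by write throughput to video RAM.
# ASSUMPTION: each frame rewrites every pixel once and nothing else costs
# any time.


def max_fps(width, height, bits_per_pixel, million_bytes_per_sec):
    """Upper bound on frames per second for a given write throughput."""
    frame_bytes = width * height * bits_per_pixel // 8
    return million_bytes_per_sec * 1_000_000 / frame_bytes


if __name__ == "__main__":
    # Settings and rounded VSPEED (LFB) throughput from the benchmark table.
    for setting, throughput in (("000", 8), ("100", 19), ("111", 93)):
        fps = max_fps(1024, 768, 24, throughput)
        print(f"FASTVID {setting}: at most {fps:.1f} frames per second")
```

The ceilings come out at a few frames per second without write combining
and several tens with it, which is in line with the behavior described
above.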
--------------------------------------------------------------------------
Further descriptions:
--------------------------------------------------------------------------

1) Write Posting:

Write Posting is where the processor "posts" data to the PCI bus and then
goes on its way without waiting for the write operation to complete.
Because of bugs in the pre-B0 stepping of the 82450 chipset Write Posting
is disabled on early Pentium Pro motherboards. This severely limits the
PCI throughput to about 8MB/sec. Most Pentium motherboards these days can
get over 80MB/sec, 10 times faster. FASTVID can enable Write Posting on
these motherboards, increasing PCI throughput to about 18MB/sec. You don't
want to do this routinely because the bugs in the chipset will eventually
cause the PCI bus to hang, forcing a reboot of the machine. Motherboards
with the B0 stepping have this bug fixed and Write Posting enabled by
default.

2) Banked VGA Write Combining (BVGAWC):

This function allows separate writes to the banked VGA mechanism to be
combined into a cache line that can be burst out to video memory via the
PCI bus. I believe this used to be handled in hardware but Intel decided
to make it a programmable function with the Pentium Pro to make the
motherboard architecture more general. If you enable BVGAWC with FASTVID,
PCI throughput will increase from 18MB/sec (B0 motherboard) to 90MB/sec
for programs that use the banked VGA mechanism (most DOS games). If you
enable only BVGAWC on an early motherboard (Write Posting remains off) the
bus bandwidth increases from 8MB/sec to about 40MB/sec. Some of the newer
motherboards (ASUS for instance) have this as a BIOS setup option.

3) Linear Frame Buffer Write Combining (LFBWC):

Many newer graphics cards have their graphics memory mapped linearly at
very high physical addresses beyond the 2GB mark (in addition to the
banked VGA mechanism at A000:0000 and B000:0000).
The reason for doing this is to make access to video memory simpler and
faster -- programs (and Windows drivers) don't have to switch banks all
the time to access all of video memory. I believe Pentium motherboards
enable Write Combining for all high addresses but the Pentium Pro design
requires the use of the processor's MSR registers to enable Write
Combining. Again, this was done to generalize the motherboard design. You
can theoretically have multiple devices mapped in high address space with
different cacheability options. Intel believes that the proper place for
this to be handled is within a PNP operating system. Unfortunately, no
operating system yet supports this. As with BVGAWC, LFBWC will increase
PCI throughput from 18MB/sec to 90MB/sec (or 8MB/sec to 45MB/sec with
Write Posting off) for programs that use the linear frame buffer (some of
the new hires DOS games, Windows drivers).

Exactly how much difference any of these functions makes depends on the
applications being run and the graphics card you're using. If you are
using a very slow graphics card you won't see much difference. Programs
that do very little graphics output will show little or no difference.
Programs that do lots of graphics output (realtime 3D games, multimedia
animations) can show a large difference. There are even circumstances
under Win95, OS/2 and NT where the difference can be huge.

Here are some results with my Matrox MGA Millennium:

                                       A2     B0   FASTVID
----------------------------------------------------------------
Duke Nukem 3D (640x480, fps)           14     25        31
Doom Benchmark (fps)                   38     70        74
640x480 FLC animation (fps)            25     48       121
Chris's 3D benchmark (SVGA)            21     38        77
Win95 Media Player* (fps)               2      5        15**

*  WEEZER.AVI from the Win95 CD-ROM enlarged to _nearly_ full screen at
   1152x864, 32 bits-per-pixel.
** the frame rate was too fast to count, 15fps is an estimate -- the
   animation played fairly smoothly.
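
To put the table in relative terms, this small Python sketch turns the
measured rows into speedup factors: FASTVID versus the simulated A2 board,
and FASTVID versus a stock B0 board. The numbers are copied from the table
above; the Media Player row is left out because its FASTVID value is only
an estimate. The names RESULTS and speedups are made up for this
illustration.

```python
# Sketch only: speedup factors derived from the benchmark table above.

RESULTS = {
    # benchmark: (A2, B0, FASTVID)
    "Duke Nukem 3D (640x480, fps)": (14, 25, 31),
    "Doom Benchmark (fps)": (38, 70, 74),
    "640x480 FLC animation (fps)": (25, 48, 121),
    "Chris's 3D benchmark (SVGA)": (21, 38, 77),
}


def speedups(a2, b0, fastvid):
    """Return (FASTVID / A2, FASTVID / B0), rounded to two decimals."""
    return round(fastvid / a2, 2), round(fastvid / b0, 2)


if __name__ == "__main__":
    for name, row in RESULTS.items():
        over_a2, over_b0 = speedups(*row)
        print(f"{name:32s} {over_a2:5.2f}x over A2  {over_b0:5.2f}x over B0")
```

The spread is wide: the Doom benchmark gains only a few percent over a
stock B0 board, while the FLC animation runs about two and a half times
faster.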
--------------------------------------------------------------------------

Notes for SuperMicro P6SNF and P6DNF motherboard users, and other 82440
(Natoma chipset) based motherboards:

Several FASTVID users have reported that one BIOS setting on these
motherboards conflicts with FASTVID, resulting in a system crash. If you
experience these crashes try turning off "USWC write combining"
(Uncacheable Speculative Write Combining) using the BIOS setup procedure.

FASTVID's controls for write posting don't seem to have any effect on
82440 motherboards. Presumably this means that write posting is controlled
by a different mechanism. If you have one of these motherboards, I suggest
using "FASTVID x11" so that FASTVID doesn't attempt to change the write
posting option.

--------------------------------------------------------------------------