386POWER (sort of) PROGRAMMER GUIDE
------------------------------------------------------------------------------
Introduction, warnings and credits

SMILEY WARNING:
I use "smileys" (the faces drawn with punctuation marks and other chars that
you must tilt your head to the left to see clearly) a lot, because I like to
write as I talk and I rely heavily on voice inflection.
(read: the things I say, written down as they are, sometimes look like an
offense or a threat; the smiley corrects this).

  ;-) or ;)  == it's a joke, you know I don't mean exactly this.
  :-) or :)  == happy
  :-} or :}  == oops!
  :-(        == I'm not happy with this, will make it better next time

PG-A386 WARNING:
386POWER HAS BEEN RATED Programmer Grade - Assembly 386 !!!!! ;)
I assume you know enough about assembly and 386 programming.
If you don't, try the following crash program:

a) Get access to the Internet (direct or by way of a friend)
   (the Internet is the big international net where I put 386P 2.00 for
   public distribution)
b) Learn how to use the FTP program to reach the huge resources available
   on the ftp servers connected to the Internet.
   If you can, look into the x2ftp.oulu.fi ftp server, it's an ibm pc
   programmer heaven. There, in the /pub/msdos/programming directory, you
   can find all the info and tools you need:
   1) A zipped file containing a complete 386 programming manual by Intel
      (nearly perfect, if you can forgive some typos here and there).
   2) The docs describing VCPI, DPMI and VDS.
   3) The Ralf Brown Interrupt List (lots of info about all the function
      calls you can find working with a pc compatible)
   4) Dos-extenders better than mine :}.
      Just in case you think mine is not good enough, look for pmode305.zip
      You can "attach" it to the basic 386Power and get complete DPMI
      support for most of the ms-dos configurations you can find
      (raw, XMS, VCPI, DPMI).

ASM warning:
This thing is like a shotgun, don't look into its barrel when it's loaded.
Running under VCPI this baby is capable of tunneling through the protection
levels and doing nasty things if programmed to do so.
Be also aware that most current assemblers perform "automatic" code
optimizations. I usually test my programs with no optimizations allowed, so
it is easier to find my bugs or the assembler's bugs (believe me, assemblers
have their own bugs).
I take no responsibility for this. This code is not harmful, since it is
intended mostly for videogames, but intentionally or not you can modify it to
do very dangerous things.
Be also aware that even cpus have their bugs and quirks, so test your program
on different cpus if you can.

LM WARNING:
THIS will scare you ;).
As you can guess from my name and address, I'm Italian and, to make it worse,
I'm from the northern Italian region called Veneto, more precisely from a
place 40km away from Verona (you know, Romeo and Juliet, the Arena). Here in
Veneto there is an old popular motto saying "Veronesi tutti matti" (Verona
citizens are all crazy) and I'm no exception :).
This means I think in a very weird way (for example: I prefer well documented
386 assembly to C) and hate writing documentation like this (I prefer putting
comments inside the include files).
I'm so crazy I'd like to find a job as a game programmer/designer (nearly
zero probability here in Italy).
What's more, I hate to overtest things, I test them only for what I need
(don't use this code for an x-ray machine control program if you don't like
the idea of seeing people glowing in the dark ;) ) and I'm never sure I have
eradicated all the bugs (but I try hard to do this, quality control and bug
eradication are planned before coding starts).
If you find some comments or documentation quite hard to understand, remember
some of it was written while under "berserker coding fury" (picture me typing
like a maniac at 00:30 a.m. and yelling
"Ah! This code is so neat that it explains itself!"
while the radio volume is set to 'sonic boom' level) (not to mention I don't
have lots of free time).
And remember that sometimes I write in italo-venetian-english too (when I
write English quickly, I keep using phrase structures similar to Italian or
Venetian).

Thomas Pytel's PMODE:
I owe lots of thanks to Thomas Pytel for distributing the source code of his
dos-extender.
Since 386Power is sort of a PMODE stepbrother, I initially based this sort of
tech manual on the pmode.doc file included in pmode.zip by Thomas "Tran"
Pytel; this way it was easier to see the differences between the two
extenders.
Now ... well, the dos-extender has grown a lot (and degenerated a lot, ulp!)
and tech.txt too.
You might say I mention Tran too many times (or too few times if you are
Tran ;) ). The fact is that I never read a book about 386 protected mode
programming before starting to code this (don't get me wrong, I had years of
experience programming everything from the Z80 to the MC68000), but I read a
lot about other related things and I already knew real mode assembly
programming very well. I just looked into Tran's docs and played with pmode,
then I found the DPMI docs and the Ralf Brown Interrupt List on the Internet
and decided to try to build 386Power.
What's more, I found on the net a complete book in electronic format about
386 programming with all the info one may need (look on the x2ftp.oulu.fi
site).
Anyway, Tran's docs together with the pmode sources were the most complete
thing I found about protected mode programming, with some things you hardly
find in the docs (like how to make VCPI paging work).
The fact was that Tran's coding style was a lot different from mine and used
self modifying code. PMODE was a wonderful thing but wasn't exactly what I
wanted, so I restarted from scratch keeping an eye on compatibility.
If you are familiar with the OLD pmode dos-extender OR with the old 386power
dos-extender, read carefully: some things look similar but are quite
different.
By the way, as far as I know, Tran has produced new and very powerful
dos-extenders; if you just need a dos-extender, look for the things he
recently distributed on the 'net.
If you think the 386Power dos-extender is "not enough" for you but you still
want to keep the support routines, get a dos-extender providing a DPMI
interface (like Tran's PMODE 3.05) and use 386Power in DPMI mode together
with it (but you'll have to modify the 386P startup code a little, to set
some system vars correctly) (by the way, if you choose to do this, you can
remove the VCPI stuff from 386power.asm and make your program slimmer).

VERSION WARNING:
386P is still evolving (look at the XGE driver docs), so remember to take a
look at the code (AND at the include files) if something doesn't work as you
expect from the docs.

------------------------------------------------------------------------------
Introduction

386POWER (386P for short) has been conceived and coded by Lorenzo Micheletto
(that's me) but it is not all "my stuff" (though most of it is).
It is based on the PMODE dos-extender by Tran (a.k.a. Thomas Pytel): 386P
uses modified algorithms found in the old PMODE 2.24.
I'm not the guy who reinvents the wheel every time (just NEARLY every
time ;) ).

I wanted to build a GAME ENGINE capable of running in 32bit protected mode.
A game engine is the game equivalent of a data base engine: a set of software
modules you can use as a base to build lots of different games with the same
underlying "animation and sound model".
Think about games like Commander Keen 1,2,3,4,5,6, Duke Nukem 1,2,3,4,5,
Bio Menace, Cosmo, etc. etc.
They ALL follow the same "model": if they were designed using the same set of
basic modules handling sound, video and animation, once the first game was
built the others would have needed only different sound and graphics plus
minor game engine updates.
(Well, if you look at Commander K. you will notice 1,2,3 use the same engine
and 4,5,6 use an evolution of the previous engine)

From MY point of view (well, I bet somebody has a better point of view), to
build a game engine running in protected mode the following things were
needed:

a) dos-extender module (386Power), to get access to full 32bit power.
b) timekeeping/graphics/text/blitter/game_control_input/sound modules (XGE)
c) finite state automata engine (to program object motion and the "rules of
   action" of moving objects using a table driven automata engine).
   < getting here you can easily build a Commander Keen-like game
     or a cool board game like Settlers >
d) geometry engine (polygon or bitmap oriented)
   < getting here you can build a 3d game >

Included with 386P are the XGE modules ( a) and b) ), the other things are up
to you.

------------------------------------------------------------------------------
Why use it?

This little thing lets you run a 32bit protected mode program on nearly any
computer with a 386 or better.
It can run ...

On MS-DOS with a VCPI/DPMI server (like EMM386, QEMM and others).
[NO! It won't run on plain real mode, I think it is better to run on a
trusted "mode switch manager" to smooth out hardware differences]
AND WITH XMS and "raw 386", now! (but these latter modes are not that robust
in this "forced to be early" release).

In a full-screen MS-Windows dos-box (or dos session) using the DPMI server
included with MS-Windows running in 386 enhanced mode (see the included *.PIF
files to see how to configure a 386powered program to run under Windows
without "warnings").

In an OS/2 dos-box (maybe this will need a little tweaking of the dos-box
setup file).
In any other OS with a decent 386+MS-DOS emulator (even on risc
workstations!!!!).

Supporting "only" VCPI and DPMI, this baby does not need complex
configuration procedures, because EMM386 (the ms-dos VCPI server) is
installed by default by the ms-dos install program and the Windows DPMI
server is always present.
I'm also adding XMS and "raw 386" (HARD) support. I'm not happy about it, but
I just saw that some programs need those extra kbytes that only an XMS or
"raw" setup can make available.

Running in 32bit (nearly) flat protected mode you will get access to
a) linearly addressable memory up to 4Gbyte
b) 32bit addressing and data (this means you will never have to worry about
   loading segment registers and other things like that).
This means a lot more speed, if you understand the strong points and the weak
points of protected mode.

------------------------------------------------------------------------------
ENVIRONMENT INITIALIZATION:

386Power
   boots from a ms-dos environment (even if emulated)
   checks if there is a 32bit (or more) intel compatible CPU
   (386,486,Pentium,..)
   checks if a VCPI or DPMI manager is present
   (read: it needs EMM386, QEMM, Windows running in 386 Enhanced Mode or
   OS/2 or at least a VCPI or DPMI server)
   checks for XMS or "nothing can help me, I'll have to go 'raw 386'"
and if it finds one of them it initializes the protected mode environment for
32bit FLAT PROTECTED mode (one big segment spanning 4Gbyte).

Once in Protected Mode, IRQs are active and redirected to their real mode
default handlers.
ALL "ms-dos interrupt vectors" get copied to the _OldInt table (and restored
on program termination), so you don't need to save them, and if something
goes crazy there are good chances that the _Exit routine will be able to
bring the system back to a "safe" situation.
The IRQ mask gets saved too, and once the program terminates it gets restored
(this should take care of berserker irqs).
------------------------------------------------------------------------------
THE ENVIRONMENT

First of all, the memory layout, as you can guess, is nearly the same as
PMODE's:

Your program will have at least 3 segments

code16    A 16bit segment that holds all the real mode and 16bit protected
          mode init/exit code and the real mode irq handlers.
          It has to be the first segment of the EXE.

code32    The huge 32bit segment. You can throw in as much code as will fit
          in low memory (no 64k fixup overflows). If you need more code
          space, you're gonna have to load it into extended memory at
          runtime and use it there. Under protected mode, addresses (code
          and data, they're the same memory space) are offsets from the
          beginning of this segment. I'll explain later how to access
          things outside this segment if you really need to.

codeend   This MUST be the last segment in the EXE. It is the base for the
          stack and low memory allocation.

If you want to add other segments, insert them between code32 and codeend.

The space used by code16+code32+codeend can be as big as the space available
in low memory (the free memory available to plain ms-dos programs); this
usually means between 400k and 610k (it's not a real limit, as you will see,
because you can put all the data into extended memory).

Two things seem to be missing ...

1) The STACK...
   The stack is shared by your protected mode program AND real mode calls.
   This means it can be at most 64k wide (don't worry, it's wide enough) and
   must be in low memory (where dos code can access it directly).
   The stack region always begins at codeend and goes on for STACKSIZE
   paragraphs (a value declared in 386power.inc), but the stack the main
   program can use is at most STACKUSER paragraphs, because the rest is
   reserved for the "temporary stacks" needed to switch from protected mode
   to real mode and back (more on this later).

2) The HEAP....
   There are TWO HEAPS you can allocate memory from at run time.
   The "Low memory" heap, which covers all conventional memory below A0000h
   (the memory accessible to 16bit ms-dos code).
   The "Extended memory" heap, which covers that big and fat block of memory
   above the first megabyte of memory (where only your 386powered program
   can go).
   There's a reason for keeping them separate (other than that big hole
   between A0000h and 100000h which I did not want to fill in or rearrange
   with paging): MS-DOS real mode can only see the low memory heap, so low
   memory is "precious". In calls to ms-dos where you have to pass buffer
   addresses you must pass only buffers located in low memory. Low memory is
   the place where you would allocate any critical disk or DMA buffers (more
   on this later).

------------------------------------------------------------------------------
VIRTUAL REGISTERS

You call real mode interrupts and far routines with "virtual registers".
They are memory images of the registers as they will be set for the real mode
interrupt.
When you call real mode, the current register values are pushed onto the
stack, then the mode switch occurs and the NEW register VALUES are LOADED
FROM the virtual registers table.
So.. if you set the V86eax variable to 10h, when you switch to real mode the
eax register will contain 10h.
Then, when you are back, you can read the last value of "eax in real mode" by
looking into V86eax.

------------------------------------------------------------------------------
Details of runtime:

After initialization of the protected mode environment 386POWER will call a
label called _Main located in the code32 segment (so place _Main: where you
want to start your code and declare it as a PUBLIC symbol, i.e:

        public _Main
_Main:
)

When _Main is reached you can assume that:

1) The stack is set up and it is STACKUSER*16 bytes wide.

2) The interrupts are disabled (and have been all the way from real mode)
   just in case there's something you want to do before enabling them.

3) CS points to the code segment you're running in (code32) by way of a
   selector set up by 386POWER.

4) DS, ES, FS and SS point to an alias of the code segment (same memory, but
   write permissions activated) (in protected mode the memory referenced
   through CS cannot be written, so we need another selector pointing to the
   same memory but with read/write permissions).
   N.B. Both selectors in 3) and 4) are set with a 4Gigabyte limit so you
   can stuff anything you want into them.

5) GS is a segment that's 4Gigabyte wide but starts at absolute address 0.
   This is useful for accessing the real mode dos data area, or the BIOS
   data area, or the PSP, etc...
   To access something with real-mode address r_segment:r_offset just use
   gs:(r_offset+(r_segment*16)).

6) ALL interrupt vectors are copied to the _OldInt table (they will be
   restored on program termination), so you can change the dos interrupt
   vectors without having to worry about saving 'em.

7) If you run under DPMI and you need locked memory (i.e. for IRQ handling)
   use the DPMI functions (check the _386Man var. and call the DPMI lock
   function if _386Man ANDed with IS_DPMI is different from zero).
   386P tries to lock the memory it allocates, but if the lock fails it goes
   ahead because maybe you don't need it.

8) If you run under VCPI, 386P goes for overkill, because it runs at CPL0
   where no operating system can stop it.
   Man/woman/person/alien warned, half saved, the other half is up to you.
   I usually test my programs under DPMI, then if nothing lethal happens I
   try to see how they work under VCPI (where they run faster and have more
   memory available).

9) If you run under DPMI, an IRQ that happens while in "real mode" gets
   reflected to the protected mode handler.
   IF YOU DON'T RUN UNDER DPMI, an irq that happens in real mode goes to the
   real mode handler, but 386Power substitutes the real mode handler with an
   "irq reflector" that switches control to the protected mode handler and
   back.
   If an irq is not "redirected", the real mode handler is called "directly"
   (so we can spare two mode switches and go faster).
   There are two exceptions to this rule: IRQ 0 (int 8) and IRQ 1 (int 9)
   are not "automatically reflected" (see the description of _SetIRQ in
   386Power.inc for more info)

Selectors:

There are three main selectors you have to know. _SelCode, _SelData, and
_SelZero are 16bit word vars you can read to get the selector values for the
code, data, and zero (GS) segments respectively.
When 386power passes control to _Main you can assume CS=_SelCode,
DS=ES=FS=SS=_SelData, and GS=_SelZero.
You can change the segment registers if you wish (for example to do a REP
MOVS in the zero seg), but the 386POWER routines and ints expect the segregs
to hold these values, and these MUST be the values when you jump to '_Exit'
to return to DOS.
Another thing that is assumed by 386POWER is DF=0 (direction flag clear, as
after the CLD instruction). You can perform STD, but before calling 386POWER
stuff you have to clear the direction flag again with CLD.

Memory layout:

Linear address  Selector:            Usage:

00000000h       _SelZero             ## MS-DOS code/data
                                     ##
                                     ##
_Code16Base                          @@ Your program code starts here
                                     @@ and here you will find
                                     @@ the 16 bit code16 segment
                                     @@ that initializes
                                     @@ your program to prot. mode
_Code32Base     _SelCode,_SelData    || code32 segment starts here
                                     || the lower part of it contains
                                     || the 386P dos extender
                                     || static data and
                                     || 32bit system code
                                     ||
                                     || After that there will be your
                                     || program's code
                                     || and static data
                                     ||
                                     ||
                                     $$ codeend segment
                                     $$ here there will be your stack
                                     $$
                                     $$
                                     $$
                                     LL Here starts the "low" heap
                                     LL
                                     LL
                                     LL
                                     LL
000A0000h                            VV Here starts the VGA video
                                     VV memory "address window"
                                     VV
                                     VV
000C0000h                            RR Here starts the 384k block
                                     RR used to map EMS pages and
                                     RR UMBs (remapped blocks of ram)
                                     RR where you place the things
                                     RR you "loadhigh"
                                     RR and ROMs
                                     RR
                                     RR
00100000h                            XX Extended memory starts here
                                     XX and here you will usually find
                                     XX other operating system code
                                     XX
                                     XX
                                     .. Here starts the "high" heap
                                     .. (actually extended memory)
                                     .. it is usually more than
                                     .. one megabyte, here you should
                                     .. place all the big chunks
                                     .. of data you need.
                                     ..
                                     ..

-------------------------------------------------------------------------------
Linear & relative addresses:

All addresses of things declared in code32 are RELATIVE to the beginning of
the code32 segment (which could start anywhere in low memory).
Nearly all the system code expects addresses that are code32 relative.
For this reason, you must adjust any absolute (linear) memory pointers before
you use them.
That is, to access something at linear address A0000h (A000:0000, in 16 bit
seg:ofs notation) you can use GS (it is loaded with the _SelZero selector)
and write to GS:0A0000h.
If you want to write to A0000h using the default data segment (because it is
faster to do so) it will not be at DS:A0000h, but at
DS:(A0000h - linear_address_of_the_beginning_of_Code32).
This linear_address_of_the_beginning_of_Code32 is stored in a variable called
'_Code32Base'.
For example, if the real mode segment of code32 is 1F43h, the linear address
stored in _Code32Base will be 1F430h.

To get a code32-relative pointer to linear address A0000h you will have to do
something like:

        mov eax,0A0000h
        sub eax,_Code32Base
        ; now ds:eax points to the same location as linear address 0A0000h

To convert the code32-relative offset of MyVar to a linear address you will
have to do something like:

        mov eax,offset MyVar
        add eax,_Code32Base
        ; now gs:eax points to the same location as code32:MyVar

The linear address of code16 is also provided, in '_Code16Base', as well as
the linear address of the PSP, in '_PSPBase'.
The linear addresses of code16 and the PSP will always be lower than that of
code32.
To access the memory they point to (the vars themselves are in code32) you
can use one of two methods.
One is easy enough: just use the GS segment.
Or you could use negative indexes from the normal segment, causing a 4Gbyte
wraparound (a lot like the 64kbyte wraparound under real mode).
However, it is rare to access things outside code32, so my advice is to use
GS to make things clear and as a "marker" of things "done outside code32".
Only when you really need speed (i.e. to access the vga memory) translate the
base address once and store it into a variable.

Whew! Confused? Look into 386file.asm to see how this works, it is easier to
see than to explain.

-------------------------------------------------------------------------------
THE MEMORY HEAPS:

Memory, where is my memory?
You have two heaps, in low and in high (extended) memory.
Each of them is guaranteed to be at least as big as you specified in LOWMIN
and EXTMIN in 386power.inc. The startup code will grab all low memory for you
(because it's meant to run standalone), and it will attempt to grab all the
high memory it can.

Two dword variables hold information about each memory area.
_LoMemBase and _LoMemTop specify the base and top of the low memory pool as
code32-relative addresses (ready to use, no adjustment needed).
The total amount of low memory available in bytes is _LoMemTop - _LoMemBase
(notice _LoMemTop points to one byte beyond the last available byte).
The _GetLoMem routine is a very simple routine that takes a length in EAX and
checks to see if there is enough low memory. If there is enough, it adds the
length to _LoMemBase and returns a pointer (code32 relative, ready to use) in
EAX to the low memory block, along with the carry flag clear.
If there is not enough memory, it returns with the carry flag set.
_HiMemBase, _HiMemTop, and _GetHiMem are the same thing for high memory.
Then there is the _GetMem routine, which first tries to allocate low memory
and, if this fails, tries high memory.

-------------------------------------------------------------------------------
Calling real (actually quite virtual) mode:

"Real mode" here is used as equivalent to "virtual 8086 mode" (emulated real
mode): when you run under VCPI and DPMI you cannot get to "true" real mode,
but your 16bit "real mode" code won't notice the difference.

You can call real mode, and come back. This is only provided so that you can
call real mode interrupts, and routines that you can't recode in protected
mode (like the _SetCPUWindows vesa bios function).
Keep these cross-mode calls to the minimum, because they eat lots of cpu time
and can raise some nasty "interaction" bugs.

You can call real mode interrupts or procedures from protected mode through
CALL _ExecReal (execute real mode far proc) and CALL _ExecINT (execute real
mode int).
These function calls are only available to the PROTECTED MODE part of your
program.
To pass register values from protected mode to real mode and back you use
'virtual registers'.
These 'virtual registers' are merely memory images of EAX,EBX,ECX,EDX,ESI,
EDI,EBP,DS,ES,FS,and GS.
AL and AH and AX and BL ... etc ...
are there too, and they share the appropriate memory space with each other,
so if you change the 'virtual' AH register, the 'virtual' AX and EAX
registers will change accordingly.
The virtual registers are called V86eax,V86ax,V86ah,V86al and so on...
There are no SS,ESP,CS,EIP registers.
CS:EIP is taken from the real mode interrupt vector table (call _ExecINT) or
from the processor's CX:DX registers (call _ExecReal).
SS:ESP is set up by 386POWER together with the extended memory manager
providing protected mode services.

@ Call _ExecINT : Do a real mode interrupt.
  AL=interrupt you want to do.
  All the virtual registers will be passed to the real mode handler. They
  will also be passed back as the return values into the virtual register
  table. The carry, zero, aux, parity, sign, and overflow flags will be
  passed back as the actual CPU flags. The real mode interrupt will be called
  with interrupts disabled (as it usually is). Keep in mind, no CPU registers
  will be modified (except the flags mentioned). Only their "virtual" images
  will be changed by the real mode int handler.
  REMEMBER that _ExecINT uses the "real mode INT" table found at program
  start (the one copied to the _OldInt table); this looks weird, but it is
  very useful under VCPI when you need "reflected IRQ" service routines.

@ Call _ExecReal: Call a real mode far procedure with interrupts disabled.
  CX:DX=seg:off you want to call.
  The register passing works just like _ExecINT.

------------------------------------------------------------------------------
Things to know about IRQs:

Upon startup, all the interrupt vectors for IRQs point to routines that
redirect the IRQs to their default real mode handlers.
You can hook into any IRQ you want.
There are two dword pointers that allow you to get and set IRQ vectors.
_GetIRQ and _SetIRQ point to (32bit near) routines to get and set the
relative address of the handler for specific IRQs within the code32 segment.
To get the address of a handler, just do a 'call _GetIRQ' with BL set to the
IRQ num you want (0-15). EDX will be returned pointing to its current
handler.
To set an IRQ, pass BL again as the IRQ number, and EDX as the offset of the
new handler.
You can chain to the old handler if you want, just by jumping to the old
address when your handler is done processing (see the specific things about
irqs under VCPI in the VCPI specific section some pages after this).
If you have to enable or disable an IRQ line use the _GetIRQMask and
_SetIRQMask functions, this way your application will work on anything
supported by 386P.

Irq handlers in 32bit protected mode are terminated by IRETD.

When your IRQ handler is called, you can be sure of ONLY TWO THINGS:
the IF flag is clear and CS is loaded with _SelCode.
All the general regs and segregs should be treated as undefined.
Even SS cannot be trusted, because under DPMI it is set to a little
"interrupt stack" set up by DPMI.
In other words ... DON'T CHANGE STACK!!!!
And don't assume SS=DS=CS: if you want to access DS, reload it from
cs:_SelData, and if you access things using ebp (i.e. mov eax,[ebp] )
remember SS may not be equal to DS (so use mov eax,ds:[ebp]).

Another consideration for DPMI is the IF flag. According to the DPMI specs,
only CLI, STI, and INT 31h functions AX=900h and AX=901h should be counted on
to modify the interrupt flag (POPF(D) and IRET(D) should not).
This is because certain DPMI systems might have to virtualize the interrupt
flag, and keep the real flag enabled at all times (but don't worry, if the
'virtual' flag is clear, your program will not get any IRQs).
In practice, certain DPMIs do allow IRET(D)s and POPF(D)s to modify the
virtual interrupt flag, but this is inconsistent across them.
So you should follow these rules:

@ CLI and STI are allowed, and do their functions.

@ Don't assume anything about POPF(D) and IRET(D) and the interrupt flag.

@ Don't assume the interrupt flag PUSHF(D) stores on the stack is correct, it
  might be the real flag or the 'virtual' flag.

@ These DPMI INT 31h functions are supported under VCPI too (by way of the
  386POWER interface).
  ) AX=900h: Get state of IF and disable it. Returns AL set to the IF flag.
  ) AX=901h: Get state of IF and enable it. Returns AL set to the IF flag.
  ) AX=902h: Only returns AL set to the IF flag (0=disabled, 1=enabled).

@ At the end of an IRQ handler, put a STI. When the handler is called,
  interrupts are automatically disabled. If you do not reenable them, and
  neither does the IRETD... Well... on some systems it will run and on
  others it will hang.

-----------------------------------------------------------------------------
How the stack is shared between modes:

386POWER uses the same stack for both pmode and real mode.
This stack is always located in low memory (always locked under DPMI).
The total size of the stack is set as STACKSIZE in 386power.asm.
There are other values there, called STACKUSER, STACKSWTR & STACKSWTP, that
need further explanation.

Your program starts running from the _Main label with a stack width of
STACKUSER paragraphs, so your program can use up to STACKUSER paragraphs of
stack space.

When a mode switch occurs FROM PROTECTED MODE TO REAL MODE the new stack is
the old stack base minus STACKSWTR paragraphs.
(STACKSWTR == STACK SWitch To Real mode)
This happens when you execute _ExecINT or _ExecReal, or when an irq handler
"chains back" to the real mode irq handler.
The stack base is the stack location when your program starts, and it is only
modified by mode switches.

When a mode switch occurs FROM REAL MODE TO PROTECTED MODE the new stack is
the old stack base minus STACKSWTP paragraphs.
(STACKSWTP == STACK SWitch To Protected mode)
This happens under VCPI when an irq handler gets reflected to prot. mode.

When a routine "switches back", the stack it used is destroyed and the
previous stack is selected again.
That is, the whole stack structure uses STACKSIZE paragraphs, but when the
program is running it can see only a "local stack" (let's call it a slot) of
variable width, depending on "what is running" (main program or "mode
switched stuff").
In each slot you can safely allocate local variables on the stack,
referencing them with esp and ebp, without having to worry about mode
switches or reflected irqs triggering a mode switch.
Every time you switch mode a new slot is used, and when you "get back" to the
caller, the caller's slot becomes the current stack again.

Let's make an example.
Imagine the following "chunks of code":
  A is the main program
  B is a real mode routine
  C is a real mode timer driven interrupt service routine
  D is a protected mode routine

To describe the stack status we will use the following notation:

   XYZ
   123---    a stack structure with six slots and slots 1,2,3 "in use"
   MRP       where 3 is the "active stack" and is in use by the Z subroutine
             slot 1 is the "MAIN" slot (STACKUSER wide)
             slot 2 is a "real mode switch" slot (STACKSWTR wide)
             and slot 3 is a "prot mode switch" slot (STACKSWTP wide)

1] A is running

   A
   1-----
   M

2] A calls B (mode switch)
   Now B can switch stack if it wants to, but the "current slot" from the
   point of view of 386P is still SLOT 2

   AB
   12----
   MR

3] B calls D (mode switch)
   As I said, B may have changed its stack, but the mode-switch code doesn't
   care about this and doesn't have to check what B did, it just says
   "well, slot 2 is in use, better get to slot 3".

   ABD
   123---
   MRP

4] routine C is triggered by the timer interrupt, and the irq handler is in
   real mode (unexpected mode switch you cannot control)
   As usual, another switch happens.

   ABDC
   1234--
   MRPR

5] interrupt handler returns

   ABD
   123---
   MRP

6] D returns

   AB
   12----
   MR

7] B returns

   A
   1-----
   M

As you can guess, you'd better keep enough slots available, because if too
many mode switches "accumulate" the stack structure will break.
386P up to release 1.01 used only one "slot size", now there are three slot
sizes:

  STACKUSER for the main program and the routines in protected mode it
            calls; usually I set it to 16..32kbytes
  STACKSWTR for every call from protected mode TO real mode.
  STACKSWTP for every call from real mode TO protected mode.
            usually I set the "mode switch" stacks to 1..4kbytes

Usually the allocation of STACKSWTR/P slots is caused by IRQs and "mode
switch" calls, so these don't use lots of stack space
(256..512 bytes --> 10h .. 20h para) and can be small.

Under DPMI things are different, but the rule "the fewer mode switches, the
better" is still true.
DPMI handles stack switching on its own: any IRQ causes a switch to a totally
different stack provided by DPMI (argh!).

To summarize:

@ In an IRQ handler, DON'T switch away from the stack it is entered with,
  and remember that SS=_SelData is not guaranteed there.

@ Don't do too many nested calls across modes.

@ You CAN safely assume SS=_SelData in protected mode only in your main
  stream of execution (read: not inside an Interrupt Service Routine)

@ Consider your maximum effective stack size to be STACKUSER while executing
  "non mode switched" code.

@ You CAN call across modes using INT32/33 from an IRQ handler, but remember
  this takes time (giving IRQs enough time to stockpile if they "accumulate"
  fast enough) and if you do it the wrong way, the calls get into an
  infinite loop.

@ You'd better handle IRQs from prot mode only and let 386P do the nasty
  irq-reflection work for you.

@ If you call ms-dos/bios functions from an IRQ handler, remember most of
  them are NOT reentrant and the actual virtual-machine switches can be
  different from what you expect.

@ Remember to use IRETD to return control to the main program: IRET is the
  "16 bit" interrupt return and you are in 32bit mode.

-------------------------------------------------------------------------------
Exception handling:

Under DPMI, exceptions are handled entirely by the DPMI host.
Under VCPI, exceptions cause immediate termination (maybe in future releases
i will include some debug messages when terminating due to an exception).

------------------------------------------------------------------------------
Potential DMA and address mapping problems:

As you know, the DMA controllers in the PC use physical addresses. Nothing
but the processor itself knows how linear memory is arranged in the physical
memory banks. When paging is disabled, the relationship is very simple: the
linear address is always the same as the physical address. But when you
enable paging, that could get all screwed up.
Under VCPI and DPMI paging IS enabled (bad luck, uh?).
You can almost definitely count on extended memory addresses not being
consistent with their physical addresses. Low memory, however, will usually
map perfectly to its physical addresses. Usually, i said, NOT always.
The point is that you shouldn't use "raw" DMA under VCPI and DPMI.

To handle DMA you should follow the Virtual DMA Specification (VDS).
This is the recommended way of handling DMA under VCPI and DPMI.
See the Ralf Brown interrupt list for INT 4Bh for more info.

I've provided a simple VIRTUAL DMA MANAGER [included into 386P] to handle
DMA with or without a Virtual DMA Server.
The current release handles any DMA controller up to ISA bus systems; i plan
to add EISA and PCI support (to get faster DMA transfers) as soon as someone
sends me some docs.
Handling "just ISA" means it is not possible to do DMA to/from memory above
the first 16Mbyte (EISA and PCI can support up to 4Gbyte of addresses).
I hope the current VDS servers included into EMM managers and Windows can
"remap" correctly if you try to do dma above the ISA addressing limit.
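To make the ISA limit concrete, here is a small C sketch (not part of 386P)
that checks whether a buffer, given by its PHYSICAL address and length, can
be reached by an ISA DMA channel. The 16Mbyte limit comes from the text
above. The 64Kbyte page check is an addition of this sketch: it is the usual
constraint of the 8-bit ISA DMA channels and is assumed here, not taken from
the 386P sources. The function name isa_dma_ok is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ISA_DMA_LIMIT 0x1000000u /* 16 Mbyte: top of the ISA DMA address range */
#define DMA_PAGE_8BIT 0x10000u   /* 64 Kbyte page (assumption: 8-bit channel) */

/* Returns 1 if the physical block [phys, phys+len) lies below 16 Mbyte and
   does not cross a 64 Kbyte page boundary, 0 otherwise.
   The address must be a PHYSICAL one: with paging enabled it is not the
   address your program sees. */
int isa_dma_ok(uint32_t phys, uint32_t len)
{
    uint32_t last;

    assert(ISA_DMA_LIMIT % DMA_PAGE_8BIT == 0);
    if (len == 0)
        return 0;                        /* nothing to transfer */
    if (len > ISA_DMA_LIMIT || phys > ISA_DMA_LIMIT - len)
        return 0;                        /* block ends above the ISA limit */
    last = phys + len - 1;               /* cannot overflow after the check */
    return (phys / DMA_PAGE_8BIT) == (last / DMA_PAGE_8BIT);
}
```

For example, a 4Kbyte block at physical 20000h passes the check, while an
8Kbyte block at 2F000h fails it because it runs across the 30000h page
boundary. Under VCPI/DPMI the physical address has to come from something
that really knows the mapping (a VDS server, for instance).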
The main reason to use dma is to make sound. I've included the EXPERIMENTAL
sources of a virtual dma module and chunks of my sound system "still under
development", and in the dos-extender there is a function to "map-in"
physical addresses (i still have to test it, because to do so i need a board
with ram addresses to map in; for example a super vga board with a "linear
mapping" addressing mode like the ET4000).

------------------------------------------------------------------------------
And now to discuss some of the finer points of different protected mode
environments:

VCPI:

VCPI is nearly "raw" mode. The CPL is 0, and no op-sys routine can trap into
your code if you don't want it to. Paging is enabled, but there is no
"virtual memory manager" (i.e. a thing that can "swap" blocks of memory
to/from disk and handle sharing of address spaces and other things like that)
(read: nothing will try to "move your memory" under your feet). If you want,
you can mess directly with the page tables (terrific, uh?!?).

The problem comes with the way VCPI compatibility works. To call a real mode
interrupt or procedure, we have to pass protected mode control back to the
VCPI server. This comes down to one thing: IRQs that occur in a real mode
call MAY NOT make it to your protected mode handler. It's just the way VCPI
works.

Under VCPI the real mode interrupt table and the protected mode interrupt
table belong to different Task Segment Descriptors.
386P CAN PLACE an "irq redirector" handler into the real mode task, so if
the irq happens in real mode and a "new" irq handler is installed in
protected mode, the prot mode handler will be called.
Well, it looks perfect, doesn't it?
Not so perfect: under VCPI and DPMI there can be MULTIPLE PROTECTED MODE
APPLICATIONS RUNNING!!!
And the irq redirector works only from the "real mode task" to the "386P
task" and back. If the IRQ happens while inside another VCPI application you
have to hope that "the unknown task" is a friendly one (one that reflects to
real mode the irqs it doesn't know how to handle), and what's more, the
VCPI/DPMI server may "look at what's happening" and virtualize things you
don't expect to be virtualized.
Anyway, worry not! This is the worst case scenario.

Is that all? No. There are two other things you have to remember:

a) IRQs need to be serviced as quickly as you can, or you'll get irq
   overruns and you'll have to send SEOI (specific EOI) codes to tame the
   beast.
   If an irq gets reflected from real to protected mode (to let the p. mode
   handler give a look at it) and then it is reflected back to real mode (to
   let the ms-dos irq handler give a look at it too) and then you step back
   where you started .... Well, (BAAAAAD!) it takes FOUR mode-switches to go
   back and forth!!!

b) When you switch, the "mode switch" code must not touch the hardware that
   caused the irq.... this is really hard if the interrupting device was the
   keyboard controller (that little chip that handles the keyboard and other
   little things INCLUDING THE A20 LINE that every mode-switch has to
   handle).
   So forget about reflecting directly to protected mode if the irq was from
   the keyboard controller.

To avoid these problems, when you call _SetIRQ under VCPI the 386Power system
code "places" in the real mode interrupt table a routine that redirects the
irq to the protected mode handler ONLY IF THE IRQ IS NOT one of the
"VCPI most wanted" irqs (IRQ 0 and IRQ 1, also known in real mode as INT 8
and INT 9).
If you do want to handle irq 0/1 in real mode too, you can, but you need to
set the real mode handler yourself, calling the ms-dos get/set irq functions
by way of _ExecINT.

N.B. Look into 386timer.asm to see how to handle an irq directly in both
     protected mode and real mode.
ANYWAY, when you "restore" the "default" irq handler under protected mode the
386Power system code AUTOMATICALLY restores the "default" real mode handler
too!!!

------------------------------------------------------------------------------
DPMI:

DPMI is not that bad. It could let you select CPL0 for "trusted"
applications. I don't like the overhead imposed by CPL3 (in CPL3, certain
instructions have to be emulated by software). Multitasking virtual machines
in general are not that hot when you're trying to do a timing critical action
game.

One really annoying problem with DPMI is that current implementations are far
from perfect. QDPMI 1.01, for example, dies when an IRQ occurs in a prot.
mode call from a real mode IRQ. The DPMI docs say this shouldn't happen, and
it doesn't under the Windows 3.1 DPMI implementation.
The Windows 3.1 DPMI is a little better but has lethal quirks that caused me
quite big headaches.

Hmm, another little problem is that I'm not sure how many DPMIs out there
actually reflect IRQs to real mode if they occur in protected mode.
Windows 3.1 seems to send them all over as it should.
According to Tran, QDPMI 1.01 sends IRQ1, but not IRQ0. It also doesn't seem
to pass IRQs that occur in real mode through their protected mode handlers,
while Windows 3.1 does.

Under DPMI, the dpmi server should reflect interrupts for you (as happens
under Win 3.1).

------------------------------------------------------------------------------
XMS / HARD

As i said elsewhere, these two "nearly raw" (XMS) and "totally raw" (HARD)
"operating modes" are activated when 386P does not detect a protected mode
interface.
This means "linear" memory addresses are equivalent to physical memory
addresses, paging is not active .... AND WE HAVE TO "talk" to hardware and
processor directly.
This also means that hardware incompatibilities cannot be "masked out" by
system software.
There is more extended memory available, but there are more possibilities of
error.
There are no exception handlers (this would require P.I.C. remapping and may
break on "new" computers featuring "improved" P.I.C.s), and if an exception
happens this usually means your system hangs up.

------------------------------------------------------------------------------
Some misc notes:

@ Under VCPI, 386Power will map as much extended memory as it can, up to
  56M, without letting the page tables use so much memory that less than
  LOWMIN is left.
  Allocating up to 56M means that extended memory under VCPI can be at most
  55M (even if there is more available) plus up to 4M of "mapped ram"
  (ram mapped into the address space from a physical device)

@ Before exiting your program, you do NOT need to restore any vectors
  (protected or real mode) nor the typematic rate. And you do not have to
  restore the IRQ masks at 21h and A1h (PMODE stores them before jumping to
  _main, and restores them before exiting).
  BUT if you reprogrammed the irq8/int70h timer interval, set it back to its
  initial value (the irq0/int8 is automatically set back to 18.2 Hz by the
  _Exit routine).

@ If you're gonna add other 16bit segments, put them in between code32 and
  codeend.

@ Remember that upon reaching '_Main', interrupts are still disabled. Don't
  forget to do the STI.

INTERSEGMENT ACCESS:

If you look into 386power.asm you will see that the 16bit initialization
code uses the 32bit segment as a data segment. I did it to prevent
"automatic grouping" and other "smarts" that current assemblers perform and
that sometimes can disrupt what your code is designed to do.
Anyway, the data accessed by the "little" code16 segment must come first in
the "big" code32 segment, so WHEN LINKING put 386power.obj FIRST on the list
of obj files you pass to the linker (this way you are sure the 32bit code in
386power is the nearest to the code16 segment) and check that the segment
ordering is SEQUENTIAL (it is possible to turn on alphabetic ordering on
some assemblers/linkers).
If you look into 386P remember:

   d16_ means DPMI stuff in code16
   d32_ means DPMI stuff in code32
   v16_ means VCPI stuff in code16
   v32_ means VCPI stuff in code32
   s16_ means "raw" stuff in code16
   s32_ means "raw" stuff in code32

"raw" == shared between VCPI/DPMI xor "dos-extending" code
handle it with care, it bytes!!!

------------------------------------------------------------------------------
Here's a list of other vars provided by PMODE to your program:

_LoMemBase:dword    Low mem base for allocation (first free byte).
_LoMemTop:dword     Top of low mem (last free byte +1).
_HiMemBase:dword    High mem base for allocation (first free byte).
_HiMemTop:dword     Top of high mem (last free byte +1).
_PSPBase:dword      Absolute linear address of start of PSP.
_Code16Base:dword   Absolute linear address of start of code16.
_Code32Base:dword   Absolute linear address of start of code32
                    (32bit code offsets are relative to this).
_SelCode:word       Code segment selector.
_SelData:word       Data segment alias selector for code.
_SelZero:word       Data segment starting at absolute 0.

_386Man:byte        386 manager type:

                    0=VCPI == program runs in CPL0
                              real mode irqs are not reflected by default
                              to protected mode.
                              Paging enabled, but no virtual memory.

                    1=DPMI == program runs in CPL3
                              real mode irqs are always reflected to prot.
                              mode
                              at least the full DPMI 0.9 is available and
                              you can use the int 31h functions to see if
                              DPMI 1.0 is present.
                              Paging enabled, virtual memory enabled
                              (memory pages may be swapped to/from disk)

                    2=XMS  == program runs in CPL0
                              Initialization is performed using the XMS
                              functions, then the program "drives the 386"
                              directly. (still under testing)

                    3=HARD == Absolutely no extended memory or mode-switch
                              support from ms-dos, the program runs at CPL0
                              and has total control. (still under testing)

_CPUPower:byte      Contains the CPU type detected at initialization:
                    3 = 386
                    4 = 486sx/dx
                    5 = Pentium
                    N.B. No checks are done about the FLOATING POINT
                         capabilities!!!!
_386Return:dword    Contains the code32 relative offset of the first message
                    string the _Exit function displays when terminating a
                    386Power program.
                    The end of the string pointed to by _386Return must be
                    marked with '$' because it is written to screen using
                    the dos function int 21h, ah=09h.

_386Terminator:byte This is the first byte of the standard "termination"
                    message. If you modified _386Return and want to set it
                    back to the standard message, simply use the following
                    instruction:
                         mov _386Return, offset _386Terminator

_GetIRQ:dword       A pointer to the get IRQ function appropriate for the
                    mode. The function takes arguments as follows:
                    In:  BL  == IRQ num (0-0fh)
                    Out: EDX == offset of current IRQ handler

_SetIRQ:dword       A pointer to the set IRQ function.
                    In:  BL  == bit 0..5: IRQ num
                                (0-15 are the only allowed values)
                                bit 6: IRQ REFLECTOR STRATEGY
                                  0 = "KILL" REAL MODE IRQ
                                      (only if BIT7 is set and not DPMI)
                                  1 = REFLECT REAL MODE IRQ TO PROT. MODE
                                      HANDLER
                                      (only if BIT7 is set and not DPMI)
                                bit 7: ACTIVATE CUSTOM IRQ REFLECTOR STRATEGY
                                  0 = Let the extended mode server handle
                                      irqs from real mode as it likes
                                      (usually VCPI servers do not reflect
                                      irqs)
                                  1 = Enable "direct control" of this irq
                                      when it happens in real mode
                                      ("activate" bit6)
                         EDX == offset of new IRQ handler to set.

_OldInt:table of 256 dwords LOCATED IN THE CODE32 SEGMENT
                    It contains all the original real mode int vectors found
                    at 386P startup.
                    Instead of storing "old" real mode interrupts yourself,
                    386P does it for you.

------------------------------------------------------------------------------
And now some 'functions'. Remember, they ALL need:
CS=_SelCode, DS=ES=FS=_Seldata, GS=_SelZero, DF=0 (CLD).
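A side note before the function list, about the BL flag byte of _SetIRQ
described in the variable list above. The bit layout is easy to get wrong, so
here is a small C sketch that builds the byte from the IRQ number and the
wanted strategy. It only models the encoding as documented (bits 0..5 =
IRQ num, bit 6 = reflect, bit 7 = custom strategy on); the names
(setirq_bl, IrqStrategy and the SETIRQ_ constants) are made up for this
example and are not 386P symbols.

```c
#include <assert.h>
#include <stdint.h>

#define SETIRQ_NUM_MASK 0x3Fu /* bits 0..5: IRQ number                     */
#define SETIRQ_REFLECT  0x40u /* bit 6: 1 = reflect real mode irq to pmode */
#define SETIRQ_CUSTOM   0x80u /* bit 7: 1 = bit 6 is taken into account    */

typedef enum {
    IRQ_SERVER_DEFAULT,  /* bit7=0: let the extended mode server decide    */
    IRQ_KILL_REAL,       /* bit7=1, bit6=0: "kill" the real mode irq       */
    IRQ_REFLECT_TO_PROT  /* bit7=1, bit6=1: reflect to the prot. mode
                            handler                                        */
} IrqStrategy;

/* Builds the value to load into BL before calling the set IRQ function.
   Only IRQ numbers 0..15 are allowed. */
uint8_t setirq_bl(unsigned irq, IrqStrategy strategy)
{
    unsigned bl;

    assert(irq <= 15);
    bl = irq & SETIRQ_NUM_MASK;
    if (strategy == IRQ_KILL_REAL)
        bl |= SETIRQ_CUSTOM;
    else if (strategy == IRQ_REFLECT_TO_PROT)
        bl |= SETIRQ_CUSTOM | SETIRQ_REFLECT;
    return (uint8_t)bl;
}
```

With this encoding IRQ 5 gives 05h for the server default, 85h for "kill"
and C5h for "reflect to protected mode". Remember that, according to the
description, bits 6 and 7 have no effect under DPMI.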
_GetMem:      Allocate any mem (first checks low, then high)
              In:  EAX == size requested
              Out: CF=0 memory allocated
                   CF=1 not enough mem
                   EAX == relative pointer to mem,
                          or undefined if not enough

_GetLoMem:    Allocate some low mem
              In:  EAX == size requested
              Out: CF=0 memory allocated
                   CF=1 not enough mem
                   EAX == relative pointer to mem,
                          or undefined if not enough

_GetHiMem:    Allocate some high mem
              In:  EAX == size requested
              Out: CF=0 memory allocated
                   CF=1 not enough mem
                   EAX == linear pointer to mem,
                          or undefined if not enough

_MapPhysmem:  Maps into the address space a block of physical memory
              locations used by a physical device.
              In:  EAX == physical address base
                   EDX == size in bytes
              Out: if CARRY CLEAR then
                      EAX == equivalent code32 relative offset
                   else
                      error, no mapping performed.
              This function is useful when you need to map into the
              program's address space the "linear addressable video-ram"
              provided by some supervga graphics cards.
              Usually those "memory windows" are enabled by the programmer,
              so they are not visible to the vcpi or dpmi manager; this
              function lets you access the "new addresses" using the code32
              segment.
              N.B. When running under VCPI the initialization code leaves
                   space for a maximum of 4Mbytes of "new ram" and 56Mbyte
                   of already allocated ram.

_GetIRQMask:  Get status of IRQ mask bit (at port 21h or A1h)
              In:  BL == IRQ num (0-15)
              Out: AL == status: 0=enabled, 1=disabled

_SetIRQMask:  Set status of IRQ mask bit
              In:  BL == IRQ num (0-15)
                   AL == status: 0=enabled, 1=disabled

_OnExit:      Appends a subroutine BEFORE the current exit code
              In:  EAX == code32 relative offset of routine to call
              N.B. the "appended" routine must terminate with a RET and
                   preserve all the general purpose registers EAX..EBP and
                   the segment registers, and it can only assume
                   CS=_SelCode, DS=_SelData.
_Exit:        Exit to real mode,
              restore the initial IRQ mask,
              restore the timer 0 frequency to 18.2 Hz,
              restore ALL interrupt vectors
              and get out setting video mode 03 (color text 80x25)

If you really need to deal directly with DMA, check into 386power.inc for
the "dma support" routines
   _DMAInit, _DMAInfo, _DMAMap, _DMAUnMap,
   _DMALock, _DMAUnLock, _DMASend, _DMAReceive

------------------------------------------------------------------------------
PROGRAM SUPPORT

I'm not going to go commercial with 386Power and its companion code.
Maybe i will go commercial with the games i will produce with it.
Anyway i will distribute the sources of this and ALL the future releases of
386power and its companion code (the XGE stuff).

If 386Power doesn't work on your system i'd like to know (so i can fix it),
you can reach me at the addresses listed at the end.
Call me if you have suggestions about new improvements and things like that.

BUT REMEMBER, 386power is NOT TOTALLY FREE (gosh!) (actually it is nearly
free). See the DISTRIBUTION AGREEMENT in the LICENSE.DOC file.
If you modify 386P (not just cosmetic adjustments of course) you should use
a different name (to avoid confusion) BUT you have to include at least the
following message:
  "based on the 386Power dos-extender by Lorenzo Micheletto MCHLNZ67T19C890A
   and on the PMODE dos-extender by Thomas 'Tran' Pytel".
What's more, if you make a commercial thing with it (NOT a commercial
dos-extender library !!) ... send me a free copy :).
I won't ask anything more from you.
It is not a free lunch ... it is a nearly free lunch.

The main reason i coded 386P was for pure fun and to get skilled at
protected mode programming before coding the real game (if i just wanted a
dos-extender i would have bought one from FlashTek).

I distribute it because: (in priority order)

   a) I needed a wider test base (just my 386 and the 486s of some friends
      is too restricted a test base).
   b) I owe it to all the people who distributed their sources on the net
      and let me learn how to handle the pc hardware and more.
   c) I need to get a reputation as a programmer: sooner or later, when i
      try to publish/sell my games, it could be useful.

BY THE WAY, 386P 2.00 has to be considered a "partial alpha" release because
it wasn't my intention to distribute it. Some people asked me about some
routines i included into it, so i decided it was a good thing to send
"all the package" instead of a bit here and there.

------------------------------------------------------------------------------
Final things:

386Power has been designed with a restricted goal to reach (kick my code into
32bit protected mode :} and be sure that on program termination no bad
things will happen).

You are free to use it and change it any way you want, but please don't make
protected mode viruses or trojan horses, i really hate those losers that
think they are smart because they can destroy things.
If you think you are smart, design and code a terrific piece of software.

And as i said, 386P is limited to VCPI and DPMI (if you call it a limit).
If you want a "real" dos-extender capable of handling anything, contact the
FlashTek guys and buy it, or search for the latest pmode release by Tran
(Thomas Pytel) on internet, or get Watcom C++ with the DOS4GW dos-extender,
or anything else you like.

To reach FlashTek try to ask about them in the rec.games.programmer internet
newsgroup.

To find Tran, do the same or ...
....as far as i know, in May 1993 he could be reached
   on Creativity Demo Net or SBCnet as 'Tom Tran'
   or at the Sound Barrier: (718)979-6629, (718)979-9406
   or on Internet: tran@phantom.com

If you want to contact me (for bug reports or other things) you can reach
me ....
On the internet as:
   knight@maya.dei.unipd.it
   (I heard mail from Germany may fail sometimes. If you fail to send e-mail
    from there, retry two or three times, it is a stochastic bug i think)

By plain mail at the following address:
   Lorenzo Micheletto
   Via Piazza Miega 10/A
   Veronella (VERONA)
   ITALY 37040

N.B. Maybe in 1995 i will be reachable only by plain mail (i will be in the
     italian army for a year) but after that (february 1996) i will surely
     be back in Padova WITH A "FULL POWER" 386P 3.00 release that hopefully
     will take care of all the incomplete things in release 2.00 and will
     give you all you need to make game engines.