SYSTEM64 HARDWARE MAINTENANCE SERVICE v1.210
by Desmond Germans (Simm / Analogue)


INTRODUCTION
------------

SYSTEM64 is Analogue's proprietary protected mode system, tailor-made for
demo programming. It supports raw, XMS and VCPI (EMM) protected mode as well
as simple datafile loading, IRQ handling, heap management and things like
that. Together with SYSTEM64 comes DEBUG64, which lets you interactively
view your code in action.

Currently SYSTEM64 and DEBUG64 are under development and for Analogue
internal use, so please don't spread them around too much. I don't ask that
because my code is supposed to be a secret or something, but because I
don't want to put in all the effort to really release this. Also, use this
at your own risk. If your computer hangs or if Windows 95 crashes, DON'T
call me, DON'T ask me for a bugfix and DON'T blame it on my code; at least
that is what you agree to when you use this product.

The only reason I released this far-from-perfect prerelease is that many
people asked for it. I may make a decent public domain version with the
music player and an example demo (maybe even the source of something we
released), but right now I don't feel like it.

This documentation is not available in German, Dutch, Finnish or Swedish.


SYSTEM REQUIREMENTS
-------------------

To be able to use SYSTEM64, you need at least a 386 compatible PC with,
let's say, 4MB of memory. For a demo coded with SYSTEM64 you'd most
certainly need a Gravis Ultrasound and a fast VGA card, and probably
something much better than a 386 PC :)


PROGRAMMING STYLE WITH SYSTEM64
-------------------------------

Programming in protected mode is easy... Very easy. For instance, you don't
need to worry about segment registers, you have 32-bit register power at
your fingertips and, most of all, you can access as much memory as you wish
at once. However, switching from real mode to protected mode properly is a
menace, so SYSTEM64 does that for you.
What is important for the programmer is that after initialization of
SYSTEM64, the processor is running in CPL0 (Current Privilege Level 0 - the
system is yours) protected mode. The architecture used can be called a
'semi-flat' memory model. There is one big code segment called 'pcode' and
one big data segment called 'data'. All pointers are near pointers relative
to one of these. After initialization, SYSTEM64 jumps to an external label
called 'main', which is the starting point of your code. To exit to DOS
again, jump to SYSTEM64's label called 'exit'.

In a simple overview, this all means that after initialization the
registers are set to:

    EAX = 00000000      CS = pcode
    EBX = 00000000      DS = data
    ECX = 00000000      ES = data
    EDX = 00000000      FS = data
    ESI = 00000000      GS = pcode (for SMC)
    EDI = 00000000      SS = pstack
    ESP = points to empty stacktop
    EBP = 00000000
    EIP = main

WARNING: Because SYSTEM64 uses these segment register settings internally
when processing IRQs and exceptions, it is very hazardous to change any of
them !!


A MODEL SOURCE FILE
-------------------

Here is a source file which can serve as a model for SYSTEM64 programming
(MODEL.ASM):

            .386
            locals

            include s64.ash

            public  main,datafile

    pcode   segment public use32
            assume  cs:pcode,ds:data

    main    proc near

            ; your code comes here

            jmp     exit

    main    endp

    pcode   ends

    data    segment public use32

    datafile db 0

    data    ends

            end


USING DATAFILES WITH SYSTEM64
-----------------------------

When you want to use a datafile with your project, change the variable
called 'datafile' in 'data' to an ASCIIZ string containing the name of the
datafile. This datafile must be present in the current directory together
with the executable. So if you have a file called "dope2.dat" in the current
directory, you change the datafile declaration into:

    datafile db "dope2.dat",0


ASSEMBLING AND LINKING WITH SYSTEM64
------------------------------------

When you use TASM and TLINK (as I do), you can supply them with several
options.
Currently I use the following TASM options:

    /s      - source code segment ordering
    /ml     - case sensitivity on all symbols
    /m2     - 2-pass mode
    /p      - check code for CS overrides in protected mode
    /q      - suppress OBJ records not needed for linkage
    /zn     - no symbolic debugging info

And the following TLINK options:

    /x      - no map file
    /c      - case sensitivity on symbols
    /3      - enable 32-bit records

If you are going to use these options, you can put them in TASM.CFG and
TLINK.CFG response files so you don't have to type them all again each
time.

If you have a one-file project as in the examples above, you can TLINK the
objects like this:

    TLINK S64 ,RUN

and then run the project by starting RUN.EXE. Of course, projects with more
files can be linked the same way, like this:

    TLINK S64 ... ,RUN

It is also quite easy to use a makefile or some other utility for this. I
don't believe it's possible to link S64 with compiled C or Pascal (certainly
not 16-bit C or Pascal), but who knows what wonders are still awaiting the
world.


SYSTEM64 SERVICES
-----------------

Apart from simple initialization, SYSTEM64 also supports some features of a
demo-oriented system. Following is a reference list of all service functions
supported in SYSTEM64 v1.000:

stub            This is a routine that does absolutely nothing!

exit            Exit to DOS. This just switches back to real mode and
                returns to the crippled world of DOS.

panic           Panic to DOS with a message. If something goes wrong in the
                demo you can call 'panic' with the pointer to a
                '$'-terminated error message in eax. Note that this error
                message must reside within 64Kb from the start of 'data'
                because the error is displayed using a real mode routine.

set_exit        Sets up a custom exit routine which is called just before
                returning to DOS. This routine is called each time an
                exception occurs or a jump/call to 'panic' or 'exit' occurs.
                This is useful to turn off annoying soundcard tones or loops
                just before exceptions or panic situations occur.
                The exit routine address needs to be in eax.

heap_avail      Returns the number of bytes left on the heap in eax.

heap_alloc      Allocates eax bytes on the heap and returns a pointer to the
                allocated region in eax. If there is not enough space on the
                heap, 'panic' is called with an appropriate error message.

heap_mark       Returns the current state of the heap in eax. This can be
                used together with heap_release to allocate and deallocate
                things on the heap easily.

heap_release    Restores the heap to the state given in eax. Everything
                allocated AFTER the matching 'heap_mark' is deallocated.
                This is the only way of deallocating something on the heap
                because the heap is not as sophisticated as heaps in higher
                programming languages.

heap_reset      Deallocates EVERYTHING on the heap.

set_irq         Modifies the IDT to have the appropriate vector point to a
                new IRQ handler. This IRQ handler must acknowledge the
                interrupt, may not modify the registers and must exit using
                an IRETD instruction. The IRQ number is in bl and the
                address of the routine in eax.

get_irq         Returns a pointer to the current IRQ handler for a certain
                IRQ. The IRQ number is in bl and the returned pointer is in
                eax.

int_rm          Performs a real mode interrupt call. This can be useful on
                very rare occasions. Note that a true protected mode to real
                mode switch occurs, so interrupts are set back to their
                original real mode handlers. This means that when playing
                music or something like that, the music will stop during the
                real mode interrupt. The register values for the interrupt
                are in 'rm_ax'..'rm_es' and the interrupt number is in al.

Functions supported in SYSTEM64 v1.001 are:

set_color       Change a VGA DAC palette entry. This function is supported
                by SYSTEM64 to make the debugger aware of color changes so
                it can properly show the screen when pressing F5. The color
                needs to be in EAX like this:

                    EAX = 0BBGGRRXXh

                where BB is the blue value (0..3F), GG is the green value
                (0..3F) and RR is the red value (0..3F).
                XX is the color index number (0..FF).

set_palette     Change the entire VGA DAC palette. ESI points to a 768-byte
                array of R,G,B records, one for each color.

These functions were added so that DEBUG64 sets the colors to the right
values each time. Not too elegant, but you wanted a prerelease :)

Functions supported in SYSTEM64 v1.203 are:

enable_irq      Hardware enable the IRQ signal for the IRQ in bl.

disable_irq     Hardware disable the IRQ signal for the IRQ in bl.

Functions supported in SYSTEM64 v1.205 are:

unchain_vga     Initialize mode X (and instruct debugger if needed).

chain_vga       Deinitialize mode X (and instruct debugger if needed).

Functions supported in SYSTEM64 v1.209 are:

heap_alloc4     Allocate a memory block on the heap, dword aligned.

Note: chain_vga and unchain_vga are removed in v1.210 because mode X is not
for me and it's also not faster than mode 13h if you do what I do with it.
(Of course nobody says that you can't put them back in :))

Global variables in SYSTEM64 v1.000 are:

sys_type        A dword indicating the type of protected mode system.
                0 = raw, 1 = XMS and 2 = VCPI (EMM).

data_base       When using datafiles, this points to a memory region
                (relative to 'data') where the datafile is loaded.

psp             This points to the PSP (relative to 'data') if you need
                information from there.

rm_al           These are the real mode register copies for the
rm_ah           'int_rm' function call.
rm_ax
rm_bl
rm_bh
rm_bx
rm_cl
rm_ch
rm_cx
rm_dl
rm_dh
rm_dx
rm_bp
rm_si
rm_di
rm_ds
rm_es

rm_data         Segment value corresponding to 'data'.

Global variables in SYSTEM64 v1.001 are:

page_base       Base of a 64K area which can be used as a VGA double buffer
                for animation purposes. With animation, write everything you
                want to page_base and then, when you are finished, copy
                page_base to vid_base to update the screen.

vid_base        Base of the 64K VRAM window at A0000 in system memory.
rm_flags        This should fix a nasty bug in the int_rm routine, but I'm
                not sure yet :)

Global variables in SYSTEM64 v1.205 are:

page_base       Base of a 256K (!!!) area which can be used as a VGA double
                buffer for animation purposes.


THE HEAP UNDER SYSTEM64
-----------------------

SYSTEM64 supports a very simple heap-like memory management. The
SYSTEM64-compliant memory map looks like this:

    +--------+ <- heap top
    |        |
    :        :
       FREE
       HEAP
    :        :
    |        |
    +--------+ <- heap pointer
    |  USED  |
    |  HEAP  |
    +--------+ <- heap base

    +--------+
    |        |
    |  DATA  |
    |  FILE  |
    |        |
    +--------+ <- 'data_base'

    +--------+
    |  PAGE  |
    | BUFFER |
    +--------+ <- 'page_base'
                                                            above 1MB
    ....................................................................
    +--------+                                              below 1MB
    |        |
    | STACK  |
    |        |
    +--------+ <- 'pstack' (ss)
    |        |
    |  DATA  |
    |        |
    +--------+ <- 'data' (ds, es, fs)
    |        |
    |  CODE  |
    |        |
    +--------+ <- 'pcode' (cs, gs)

Where all these regions are physically located is unknown because of the
different protected mode systems. SYSTEM64 calculates all pointers relative
to 'data', so no segment register reloading is necessary.

Allocating something on the heap with 'heap_alloc' is very simple. Let's say
you want to allocate 1MB of data. You put 00100000h in eax and call
'heap_alloc':

    heap before                     heap after

    +--------+ <- heap top          +--------+ <- heap top
    |        |                      |        |
    :        :                      :        :
       FREE                            FREE
       HEAP                            HEAP
    :        :                      |        |
    :        :                      +--------+ <- heap pointer
    |        |                      |        |
    +--------+ <- heap pointer      +  USED  + <- returned eax
    |  USED  |                      |  HEAP  |
    |  HEAP  |                      |        |
    +--------+ <- heap base         +--------+ <- heap base

Allocating can go on until the heap is full, so a primitive way of saving
and restoring the heap state is also supported. Let's say you want to
allocate 1MB on the heap and then deallocate it again. You mark the state of
the heap in a variable with 'heap_mark' before you allocate anything. This
goes as follows:

    'heap_mark' returns the current heap pointer in eax, which you can
    store in a variable.

    'heap_alloc' allocates the wanted 1MB and returns a pointer in eax.
    You use the allocated region.

    'heap_release' restores the current heap pointer to its saved state,
    in effect deallocating the 1MB region.

For demos, it's maybe best to do the following:

    Allocate some general stuff on the heap, things you are always going
    to use during the whole demo.

    Mark the state of the heap.

    For each event:

        Allocate event-specific stuff on the heap.

        Use it.

        Release the state of the heap with the earlier marked value.

It is not necessary to deliver an empty heap before exiting to DOS, because
DOS doesn't understand what you're doing with a protected mode heap, and all
memory is 'physically' deallocated before exiting to DOS anyway.


CUSTOM IRQ HANDLERS UNDER SYSTEM64
----------------------------------

By default SYSTEM64 has simple empty IRQ handlers for IRQ0,IRQ2..IRQF which
only acknowledge the interrupt. IRQ1 triggers a 'keyboard break fault'
exception which returns to DOS. This can be considered the protected mode
ctrl-break in SYSTEM64.

To use custom IRQ handlers, you can use 'set_irq' and 'get_irq'. 'set_irq'
sets up a new IRQ handler for a specific IRQ number and 'get_irq' returns a
pointer to the current IRQ handler.

A new IRQ handler must do the following things:

    1. NOT modify ANY of the registers (use PUSHAD/POPAD or something)

    2. Acknowledge the interrupt by outputting 020h to I/O-port 020h
       (IRQ8..IRQF come in through the second interrupt controller, which
       needs the same 020h on I/O-port 0A0h as well)

    3. Return to normal processing with an IRETD instruction

These rules are not SYSTEM64 specific, they're just the way Intel and IBM
want their IRQs to be handled, and IRQs in SYSTEM64 are mapped DIRECTLY
into the IDT (so no safe and slow routine shells around them) :)


REAL MODE INTERRUPTS UNDER SYSTEM64
-----------------------------------

Sometimes, you want to call a real mode interrupt. I advise against this,
but there are some situations where a real mode interrupt is easier than
something else (setting a video mode, waiting for a key when you're too
lazy to write your own IRQ1 handler, etc.). SYSTEM64 supports this as well.
A group of real mode register copies is available in 'data' to be used for
this purpose. A simple example which sets the VGA to 320x200 256 color mode:

            mov     [rm_ax],00013h  ; real mode ax = 00013h
            mov     al,010h         ; interrupt number 010h
            call    int_rm          ; do it

Actually setting the VGA to 320x200x256 is quite useless because S64 does
that already.


SYSTEM64 FUNCTIONS - QUICK REFERENCE
------------------------------------

stub            Do nothing.
                in:     -
                out:    -

exit            Return to DOS.
                in:     -
                out:    -

panic           Panic to DOS with message.
                in:     eax = pointer to message
                out:    -

set_exit        Setup custom exit routine.
                in:     eax = pointer to custom exit routine
                out:    -

heap_avail      Return free heap size.
                in:     -
                out:    eax = free heap size

heap_alloc      Allocate heap block.
                in:     eax = size in bytes
                out:    eax = pointer to allocated heap block

heap_alloc4     Allocate dword-aligned heap block.
                in:     eax = size in bytes
                out:    eax = pointer to allocated heap block

heap_mark       Mark heap state.
                in:     -
                out:    eax = current heap state

heap_release    Release heap state.
                in:     eax = heap state
                out:    -

heap_reset      Reset heap.
                in:     -
                out:    -

set_irq         Setup custom IRQ handler.
                in:     eax = pointer to custom IRQ handler
                        bl = IRQ number
                out:    -

get_irq         Get address of current IRQ handler.
                in:     bl = IRQ number
                out:    eax = pointer to current IRQ handler

int_rm          Real mode interrupt.
                in:     al = interrupt number
                        rm_?? = real mode register values
                out:    rm_?? = real mode register result values

set_color       Set VGA DAC palette color.
                in:     eax = color data (bbggrrxx)
                out:    -

set_palette     Set entire VGA DAC palette.
                in:     esi = pointer to palette array
                out:    -

enable_irq      Hardware enable IRQ signal.
                in:     bl = IRQ number
                out:    -

disable_irq     Hardware disable IRQ signal.
                in:     bl = IRQ number
                out:    -


USING DEBUG64
-------------

Using DEBUG64 is pretty simple and straightforward. When you want to use
DEBUG64, simply link S64D.OBJ instead of S64.OBJ with the project and watch
the debugger unfold after running the executable.
To be able to do this, DEBUG64.EXE must be in the same directory as the
executable. A quick overview of the screen:

    +-----------+----+
    |           |regs|
    |           +----+
    |   code    |flgs|
    |           +----+
    |           |heap|
    +-----------+----+
    |   data    |stck|
    +-----------+----+

The keys that can be used here are:

    F1 = switch data window to BYTE or ASCII BYTE mode
    F2 = switch data window to WORD mode
    F3 = switch data window to DWORD mode
    F4 = enter address to have data window point to
    F5 = show contents of page_base on full screen
    F6 = force step over
    F7 = trace into
    F8 = step over
    F9 = run
    Q  = quit to DOS
    A  = about
    cursor keys = scroll data window

From some version number on, you can capture hicolor images by pressing C
during the user-screen mode. This only works on S3 chips, and actually I
never tested or finished it anywhere but at home. Also, it could be that you
have a special version with some Pentium messages flying by. Ignore these,
they are tests of mine.

Note: The instruction disassembly is far from complete, so don't use the F8
key too freely, because chances are you'll hit an "unsupported" instruction
and desync the debugger with the processor. The program will then run on
until it is supposed to end...

...without any debugger information.


PRACTICAL USE IN DEMOS
----------------------

I'm currently working on a small demo shell which uses SYSTEM64 as a base
(there is actually nothing very difficult to do). This demo shell should do
the following:

    initialize soundcards (GUS preferred)
    initialize music player
    initialize user shit
    start playing music
    do all the events
    stop playing music
    deinitialize user shit
    deinitialize music player
    deinitialize soundcards

For that I have made XMGUS.ASM and TIMER.ASM. XMGUS controls the GUS and
plays an XM soundtrack; TIMER uses IRQ0 to drive the playing of this music.
XMGUS and TIMER will be released with the public domain version, so wait
for that or write your own stuff. For reference, see the TEST.ASM program.
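
The shell order listed above could be laid out on top of MODEL.ASM roughly
like this. This is only a sketch, not the actual demo shell: the nine
routine names are hypothetical placeholders (they are NOT the real entry
points of XMGUS.ASM or TIMER.ASM, which are not documented here), and their
bodies are empty so that the file has the same shape as MODEL.ASM. Only
'main', 'exit', 'datafile' and s64.ash come from SYSTEM64 itself.

```asm
; Sketch of the demo shell order, built on MODEL.ASM.
; Hypothetical names (not part of SYSTEM64, XMGUS or TIMER):
;   sound_init, player_init, user_init, music_start, run_events,
;   music_stop, user_done, player_done, sound_done

        .386
        locals

        include s64.ash

        public  main,datafile

pcode   segment public use32
        assume  cs:pcode,ds:data

main    proc near
        call    sound_init      ; initialize soundcards (GUS preferred)
        call    player_init     ; initialize music player
        call    user_init       ; initialize user data
        call    music_start     ; start playing music
        call    run_events      ; do all the events
        call    music_stop      ; stop playing music
        call    user_done       ; deinitialize user data
        call    player_done     ; deinitialize music player
        call    sound_done      ; deinitialize soundcards
        jmp     exit            ; SYSTEM64 service: back to DOS
main    endp

; Empty placeholders: all nine labels share one RET.
; Replace each of them with a real routine.
sound_init:
player_init:
user_init:
music_start:
run_events:
music_stop:
user_done:
player_done:
sound_done:
        ret

pcode   ends

data    segment public use32
datafile db 0                   ; empty name, as in MODEL.ASM
data    ends

        end
```

Since 'set_exit' is described as the place to silence the soundcard on
exceptions and panics, a real shell would probably also register its
music-stopping routine there, so that a crash halfway through an event
doesn't leave a note hanging.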
Because all the demo data is loaded before the demo is started, it is not
necessary to use disk access. I thought of the following scheme for doing
the events.

For each event, do:

    initialize the timing routine (see below)
    wait until event has to start by checking music_state in XMGUS
    do the event loop:
        update values from the timing routine
        process the data and build the screen
        see if the event has to stop by checking music_state
    deinitialize the timing routine

Each event can have its own timing routine which updates special volatile
event-specific variables every 1/70th of a second. The event loop must copy
these in order to use them.

To process the data and build the screen, I am used to doing this:

    clear page_base (or not if you want some nice effects with it :) )
    draw your stuff on page_base
    wait for the vertical retrace
    copy page_base to vid_base to update the VRAM

Again, this is just what I do, so if you have better ideas, use them
instead.


PROTECTED MODE PROGRAMMING TIPS
-------------------------------

Protected mode optimization is a little different from real mode or Motorola
32-bit optimization. Below is a set of small tips to improve the performance
of your low-level code. All these tips help a little, but you have to
understand that when coding an algorithm, slowness is caused more by poor
design of the algorithm itself than by slow instruction sequences. Also,
when developing an algorithm that you have trouble understanding, first try
it out in some higher level language. This might seem like a big lump of
wasted time, but you'll see that it works much faster than coding it
directly in assembly and sitting at your workstation for 4 weeks trying to
find out why it doesn't work.

* Use either 32-bit or 8-bit registers where possible. Native mode Intel
  Protected mode simply swaps all word-opcodes with dword-opcodes.
  To use a word (which sounds faster because it's 16 bits), you need to tell
  the processor to switch to Compatible mode (8088, real mode, V86 mode or
  whatever). This is done by a prefix byte called OPSIZE: which switches the
  way the processor interprets the operands. Using this prefix will COST you
  a cycle rather than GIVING you one. Example:

    mov     ax,[eax]        codes as: 66 8B 00      2 cycles
    mov     eax,[eax]       codes as: 8B 00         1 cycle

* Use Native mode memory operands wherever possible. The same thing as above
  applies to memory locations. If you use [di] or [bx + si] or some other
  familiar 8088 address, the processor needs to briefly switch to Compatible
  mode to interpret the address operand. This also costs a cycle. Example:

    mov     eax,[bx]        codes as: 67 8B 07      2 cycles
    mov     eax,[ebx]       codes as: 8B 03         1 cycle
    mov     ax,[bx]         codes as: 66 67 8B 07   3 cycles !!!

* Align your data to a 4-byte boundary in speed-critical routines.

* Use DEC/JNZ rather than LOOP.

* Use MOV/INC rather than MOVSx/STOSx/LODSx. Actually, use only MOV/MOV
  instead of MOVSx.

* Avoid AGIs (Address Generation Interlocks) by putting instructions that
  have nothing to do with each other after each other. Uhm, example:

            shld    ebx,ecx,8
            mov     bh,ch           ; depends on last instruction
            add     ebx,[boing]     ; depends on last instruction
            mov     al,[ebx]        ; depends on last instruction
            mov     [edi],al        ; depends on last instruction
            inc     edi             ; depends on last instruction
            add     ecx,edx
            dec     ecx
            jnz     @@ome_willem    ; depends on last instruction

  better is:

            shld    ebx,ecx,8
            mov     bh,ch           ; depends on last instruction
            add     ecx,edx
            add     ebx,[boing]
            mov     al,[ebx]        ; depends on last instruction
            mov     [edi],al        ; depends on last instruction
            inc     edi             ; depends on last instruction
            dec     ecx
            jnz     @@ome_willem    ; depends on last instruction

* Use Self Modifying Code to save registers, because registers on the Intel
  processors are not as general-purpose as everyone thinks.
  So instead of:

            :
            add     ebx,ecx
            add     edx,ebp
            inc     edi
            dec     esi
            jnz     @@bolke_de_beer

  you can self-modify the adders for ebx and edx, making ecx and ebp
  available again:

            add     ebx,????????
            add     edx,????????
            inc     edi
            dec     esi
            jnz     @@bolke_de_beer

  You'll need an i86 data sheet with the instruction format for these
  things.

* Unroll loops when possible. It is often faster to make a jump table and an
  unrolled loop instead of a real loop. In many cases this saves registers
  as well as a lot of cycles (any type of jump is a menace, even for the
  P5). So instead of:

    @@next: mov     [edi],al
            inc     edi
            dec     ecx
            jnz     @@next

  Do:

            jmp     [jump_table + ecx * 4]
            :
            :
            mov     [edi + ????],al
            mov     [edi + ????],al
            mov     [edi + ????],al
            :
            :


PROBLEMS
--------

If there are any problems with SYSTEM64, any undocumented features (bugs) or
any north-Finnish girls who desperately want to get to know a lonely coder
cowboy, don't hesitate to contact me ASAP using one of the following means:

    snail:  Desmond Germans
            Jaagweg 15
            1452 PB Ilpendam
            The Netherlands

    phone:  (+31)-(0)20-4363364 (between 18:00 and 22:00 CET)

    email:  germans@cs.vu.nl, npgerman@nat.vu.nl, nigerman@nat.vu.nl

    irc:    'Simm' on #coders, #nlcoders, #3dcoders, #finland, #lapland,
            #metsa and #analogue.