-================================================================-
  "Poor Man's Guide to Anti-Debugging on the Intel 80x86 Family"
                         (release 1.00)

               (Mostly ;) Copyright 1997 (C) MHK

                  Written and compiled by MHK
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
-================================================================-

NOTE: This document was inspired by and is mostly based on the article 'Anti Debugging Tricks' written by _Inbar Raz_ (with assistance from _Eden Shochat_ and _Yossi Gottlieb_). I wish to thank them, especially Mr. Raz, who generously permitted me to add the tricks from his own article as a part of this document!

About this document
-------------------

In my article I intend to concentrate on introducing you deeper into the basis of anti-debugging and presenting you with some _ideas_ rather than concrete code examples. I will also try to explain things in a detailed way and make everything easy to understand. However, some assembler knowledge is still required.

The information in this document has been gathered from various other documents and books, as donations from people all over the world, by experimenting, examining code execution of several "armored" programs, and plainly as ideas (in case of tricks).

Anyway, let's just hope this document will give you new, fresh ideas even if you are a pro in the subject. Be creative and use your imagination!

DISCLAIMER
==========

I cannot guarantee that _any_ of the information contained within this document is accurate, functional or otherwise suitable for any purpose other than just reading for pure happy-happy-joy-joy. :) If you decide to use the information for your own purposes, you will do it AT YOUR OWN RISK. If you run into any trouble, undesired program behaviour occurs or in the worst case, any data is lost due to experimenting with the information offered by this document, you're on your own. So don't bother complaining about your screwed-up hard disk, if you should happen to produce one.

So much for legal issues.
Now, feel free to read on...

IMPORTANT NOTES:
----------------

1) All of the numbers used in conjunction with assembler instructions or as register values are in hexadecimal; others are decimal numbers unless otherwise stated. To clarify some spots, an extra 'h' has been added to indicate that the value is in hex.

2) The code examples are written in a kind of "pseudo-assembler". It is a mixture of the code Debug accepts and the code that assemblers require. If entered into DOS Debug, code examples should almost certainly be accepted (if no other than 8086 instructions are used). Nevertheless, example code has also been included as bytes corresponding to the instructions.

3) Although each section should be self-explanatory enough to understand without needing to browse through the others, I advise you to read the _whole_ document from beginning to end, referring to the explanations section (appendix A) when needed.

4) There may be some inaccuracies in the text due to my misunderstandings (sorry!), incorrect information, or processor development (ie. new CPUs have things implemented in a slightly different way). I have marked some spots with (*)'s to indicate info that I'm not 100% sure of and that needs to be confirmed. I need help with correcting errors and if you see something here that definitely is wrong, please do contact me! Also, you may notice this document is a bit sketchy. I may have taken a bit too large a bite when trying to include all the "might-be-useful" info here...

5) ...and then a message to people who just want to criticize others' work without ever accomplishing anything themselves: This document has required a lot of research with debuggers (mainly finding bugs in their routines ;) and into the way an 80x86 processor's debugging support works.
All of the tricks have been thoroughly tested and are fully working on an "IBM PC compatible" (only those requiring a PC/XT haven't been verified, and some tricks using system-specific features, such as some I/O ports, may not work on PS/2-series machines). So, in case you feel like complaining about this document not containing enough methods which can trick a 386 debugger, as _some_ people do, _you_ invent a trick not included here and send it to me with the complaint!

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

INDEX OF CONTENTS
=================

SECTION 1: GENERAL STUFF
           ----------------
  1.1 - Introduction to anti-debugging
  1.2 - Protecting your programs with anti-debugging code
  1.3 - Armored programs and virii

SECTION 2: DEBUGGING
           ------------
  2.1 - Intel processor debugging support
    2.1.1 - 8086 debugging capabilities
    2.1.2 - 80286 debugging capabilities
    2.1.3 - 80386 debugging capabilities
    2.1.4 - Intel Pentium debugging capabilities
  2.2 - How a debugger works
    2.2.1 - Real Mode debuggers
    2.2.2 - Protected Mode (VM86) debuggers
  2.3 - Debugger info
    2.3.1 - Miscellaneous info: Soft-ICE
  2.4 - Hardware debuggers
    2.4.1 - PC cards
    2.4.2 - Action Replay PC card, a useless toy

SECTION 3: INVENTING NEW TRICKS
           -----------------------
  3.1 - Things that can be assumed of normal code execution
  3.2 - Requirements for a new trick
  3.3 - Hints and tips
    3.3.1 - Exceptions and interrupts
    3.3.2 - Prefetch Instruction Queue (PIQ)
    3.3.3 - Special EFLAGS register flags
    3.3.4 - Special software interrupts
    3.3.5 - "Double-stepping"
    3.3.6 - Intel 8259, Programmable Interrupt Controller (PIC)
    3.3.7 - Common Interrupt Request lines used by a PC machine
    3.3.8 - Back door commands in Soft-ICE!

SECTION 4: ANTI-DEBUGGING TRICK IDEAS AND EXAMPLE CODE
           ----------------------------------------------
  4.1 - General ideas
    4.1.1 - Causing CPU to execute two instructions at a time
    4.1.2 - Hiding "real" code from the user
    4.1.3 - Assuming that a certain condition exists
    4.1.4 - Loopy loops...
  4.2 - Standard tricks
    4.2.1 - Modifying interrupt vectors
    4.2.2 - Masking out hardware interrupts
    4.2.3 - Reprogramming Intel 8259 Programmable Int. Controller
    4.2.4 - Disabling keyboard
    4.2.5 - Forcing a debugger stop execution
    4.2.6 - Using regular INT nn -style INT 3 calls
    4.2.7 - Checking FLAGS
    4.2.8 - Modifying original interrupt handler routine
    4.2.9 - Foiling 'Step Over'/'Proceed' debugger commands, 1 of 2
    4.2.10 - Faking a procedure call
    4.2.11 - Comparing INT 01 and 3 interrupt table entries
    4.2.12 - Using stack to fool a debugger
    4.2.13 - Generating a General Protection Fault or a Stack Fault
    4.2.14 - Exploiting rapidly changing memory areas
    4.2.15 - Storing data in the interrupt table area
  4.3 - Special tricks
    4.3.1 - Jumping to a location within an instruction
    4.3.2 - Exploiting Turbo Debugger's weak point
    4.3.3 - Fooling TD386 Virtual-86 Driver
    4.3.4 - Using INT 01's to make Soft-ICE gag
    4.3.5 - Using self-tracing to fool Soft-ICE
    4.3.6 - Screwing up Soft-ICE with back door commands
    4.3.7 - Unloading Soft-ICE!
    4.3.8 - Cause Soft-ICE to abort program
  4.4 - Self-modifying code
    4.4.1 - Simple self-modification
    4.4.2 - Foiling 'Step Over'/'Proceed' debugger commands, 2 of 2
    4.4.3 - Playing with Prefetch Instruction Queue (PIQ)
    4.4.4 - Code encryption
    4.4.5 - Hooking a decryption routine to an interrupt
    4.4.6 - The Running Line
  4.5 - Checksum generators
    4.5.1 - Sum of bytes
    4.5.2 - Number of bits
    4.5.3 - Multiplication and division
    4.5.4 - Calculating CRC-16 and CRC-32
  4.6 - Simple code encryptors
    4.6.1 - XOR en-/decryption
    4.6.2 - NOT en-/decryption
    4.6.3 - Bitwise rotation
    4.6.4 - NEG en-/decryption
    4.6.5 - Basic arithmetic operations as en-/decryption algorithms
    4.6.6 - En-/decryption using translation tables
    4.6.7 - Scrambling original byte order
  4.7 - Polymorphic encryptors

SECTION 5: APPENDICES
           ------------
  APPENDIX A: Explanations/Glossary
  APPENDIX B: Suggested reading for info
  APPENDIX C: Useful E-mail addresses
  APPENDIX D: Useful Internet sites

]=============================================================================[

SECTION 1: GENERAL STUFF
========================

NOTE: This section is good reading for both beginners and average users.

1.1 - Introduction to anti-debugging
------------------------------------

You probably know what debuggers are, so... What? Never heard of them?!? OK, if you're running DOS, you should already have a debugger called DEBUG.EXE which provides you with the simplest debugging tools (yet very ineffective and quite difficult to use).

The operation called "debugging" originally meant "removing software programming errors", but nowadays it can also mean just "examining code". It involves playing with breakpoints, tracing program code, and otherwise examining code execution in order to find and remove bugs from the source.
(for explanations of the terms see Appendix A, the 'Explanations/Glossary' section at the bottom of this document)

In order to make it harder for others to examine how the code works, programmers have developed methods called anti-debugging tricks for their own programs. Having finished a year's project only to find out the next day that your masterpiece's copy protection scheme has been cracked is quite frustrating, isn't it?

Programs protected with anti-debugging code are often called "armored" programs. Most often this technology is used in intros, small utilities like cracks, and even virii, but there's no limit on the type or size of programs which could be protected, as far as DOS is concerned.

1.2 - Protecting your programs with anti-debugging code
-------------------------------------------------------

Anti-debugging code is quite an effective way of protecting programs from prying eyes. In addition to preventing unauthorized debugging of a program, anti-debugging tricks can also be used to make program disassembler utilities and generic executable file decompressors useless, for example. Beginners, who want to learn your coding techniques, often are unable to find a way to bypass the tricky anti-debugging code.

Do NOT, however, rely on this kind of protection because _any_ protection implemented with only software is NOT secure! Such a protection scheme just raises "the edge" and it is _only_ a matter of time before a skilled coder/cracker finds a way to defeat it.

There are some dedicated utilities to encrypt an executable and add anti-debugging code to it for you, such as Protect! EXE/COM and HackStop, but as good as they are, generic deprotectors for them are out already. It's just better to do it by hand and give the cracker something to think about...

Usually debugger traps are located at the beginning of the armored program, but one should also be aware that the traps could be even more powerful when put all over the program at random locations or mixed with a specific piece of code you wish to protect. Why? Well, isn't it slower and harder to trace the code when you have to worry about the debugger hanging at any minute? Multiple different tricks should be combined to make their removal more difficult.

Also, remember to fit the tricks to suit your own purposes. That is, you should make the anti-debugging code an integral part of your program, so that it won't work without the code. Neither should it be possible to bypass the code by simply jumping to an address after the end of the code, to the start of the actual program. Therefore, you should make the code jump around a bit and surround the program itself with anti-debugging code, for example, so that determining the actual program start address would not be an easy task. Anyway, doing this may be futile if the code hasn't otherwise been protected. What I mean is that you should consider encrypting your code and booby-trapping the decryption routine.

1.3 - Armored programs and virii
--------------------------------

Because the same methods are widely used in virii to make disassembly and analysis more difficult, some virus scanners with heuristic capabilities may report that the executable file has been "armored" against analysis. Good examples of such scanners are F-Prot, which in that case will report that an "armoured" program has been found, and TBAV (Thunderbyte Anti-Virus), which uses several "heuristic flags" to indicate anti-debugging code among other tricks virii use.

No need to panic, however: most of the time the files found armored are just protected programs, _not_ new virii! But still be cautious if unusually many warning messages start appearing at those programs you frequently run...

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

SECTION 2: DEBUGGING
====================

NOTE: This section aims at explaining extensively how debugging is made possible. I recommend this section for everyone interested in knowing how it all works, but also for anyone trying to invent new anti-debugging tricks, since the basis of _anti-debugging_ is in _debugging_ (self-evident ;)!

2.1 - Intel processor debugging support
---------------------------------------

This section covers all the features of an Intel 80x86 family processor useful to debuggers. Note though, that 80186/80188 and 80486 processors aren't included since no significant enhancements were made.

2.1.1 - 8086 debugging capabilities

In the early 8086/8088 processors there was only one hardware supported debugging feature: Trap Enable Flag (TF) aka. Single-Step Flag, bit 8 of FLAGS register. When this bit is set, after executing the following instruction the processor generates exception #1 (or to be precise, if this bit is found set at the end of an instruction exception #1 will be generated), which in practice will execute the code pointed to by INT 01 vector (single-step interrupt). Note though, that setting or clearing the TF bit will have _no_ effect on the instruction that changed it, the first instruction to be affected will be the instruction _following_ the instruction that changed the TF bit status.

The first generation 8086 processors also had (and still have) a single-byte breakpoint interrupt, INT 3 (opcode 0CCh), supporting software code debugging. The purpose of this instruction was that a debugger would point INT 03 vector to itself and then could insert this small interrupt instruction in the program at a desired breakpoint location in order to regain control of program execution at that point. However, the hardware breakpoint support of 80386 processors has made this instruction obsolete on newer processors but it still is there.
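
The two 8086 mechanisms above can be illustrated with a small model. The sketch below is written in Python rather than assembler and is only a toy: the names TF_MASK, run_with_trap_flag, plant_int3 and remove_int3 are made up for this example, and the "program" handed to the tracer is just a list of labels, not real machine code. It shows that TF is bit 8 of FLAGS (mask 0100h), that the first single-step exception arrives after the instruction _following_ the one that set TF, and how a debugger might swap a program byte for the INT 3 opcode (0CCh) and put it back later.

```python
TF_MASK = 0x0100      # Trap Enable Flag is bit 8 of FLAGS
INT3_OPCODE = 0xCC    # single-byte breakpoint instruction


def run_with_trap_flag(program, set_tf_at):
    """Toy model: return the labels after which exception #1 would fire.

    `program` is a list of instruction labels. The instruction at index
    `set_tf_at` is assumed to set TF (for example a POPF). The INT 01
    handler is assumed to return with TF still set, as a tracing
    debugger would do.
    """
    flags = 0
    traps = []
    for index, label in enumerate(program):
        tf_at_start = flags & TF_MASK      # TF state when the instruction begins
        if index == set_tf_at:
            flags |= TF_MASK               # this instruction turns TF on
        # The instruction that changed TF is not affected itself: only an
        # instruction that started with TF set is followed by a trap.
        if tf_at_start:
            traps.append(label)
    return traps


def plant_int3(memory, address):
    """Replace the byte at `address` with INT 3 and return the original byte."""
    original = memory[address]
    memory[address] = INT3_OPCODE
    return original


def remove_int3(memory, address, original):
    """Put the saved byte back so the real instruction can be executed."""
    memory[address] = original


if __name__ == "__main__":
    print(run_with_trap_flag(["MOV", "POPF", "INC", "DEC"], set_tf_at=1))
    code = bytearray([0x90, 0x40, 0x48, 0xC3])   # NOP / INC AX / DEC AX / RET
    saved = plant_int3(code, 1)
    print(code.hex(), hex(saved))
    remove_int3(code, 1, saved)
    print(code.hex())
```

Note that while the breakpoint is planted, the 0CCh byte really is part of the program image in memory.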

2.1.2 - 80286 debugging capabilities

This generation of x86 processors doesn't have many new features for debuggers to benefit from. Protected Mode was first introduced with 286 processors, but since the 286 cannot execute Real Mode code from within Protected Mode (there is no Virtual 8086 Mode yet), no memory protection can be achieved for Real Mode programs. Switching from Real to Protected Mode is possible, but getting back to Real Mode requires resetting the processor. (*) Need some info on what this allows a debugger to do!

Also the I/O Privilege Level (IOPL) bits in the FLAGS register were new, but their purpose was not exactly the same as in 386's. Someone tell me what it was originally for...

2.1.3 - 80386 debugging capabilities

New Intel 80x86 family processors starting from 386's have far more advanced support for hardware debugging and task protection, thus enabling debuggers to gain very strict control over code execution: up to four hardware breakpoints, privilege levels to perform I/O, lots of memory protection capabilities, etc. ... Only those features useful to DOS debuggers are listed here, though.

1) Hardware breakpoints:

So, now that the processor has four Debug Registers (DR0-DR3) reserved for user defined breakpoints, software based INT 3 instructions, which mess up the code being debugged, aren't needed any more. Hardware capabilities include breakpoint on instruction execution, data R/W or just write, and breakpoint length can be up to four bytes; all of these can be set in the Debug Control Register (DR7).

Breakpoints are checked for at instruction boundaries, generating exception #1 either before executing the instruction at breakpoint (as a fault) in case of a code execution breakpoint, or right after the instruction that accessed memory (as a trap) at a data breakpoint. The reason that caused the occurrence of an exception #1 will be saved in the Debug Status Register (DR6).
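
As a rough illustration of how these settings fit into DR7, the sketch below builds a control value for a single breakpoint and decodes a DR6 status value. It is Python, not assembler, and the helper names (dr7_for_breakpoint, dr6_hits) are invented for this example. The bit positions follow Intel's published 386 register layout as far as I know it (local/global enable bits in bits 0-7, the R/W and LEN fields from bit 16 upward, B0-B3 in bits 0-3 of DR6), so check them against the processor manual before relying on them.

```python
# R/W field values: what kind of access triggers the breakpoint
RW_EXECUTE = 0b00
RW_WRITE = 0b01
RW_READ_WRITE = 0b11

# LEN field values, keyed by breakpoint length in bytes
LEN_BITS = {1: 0b00, 2: 0b01, 4: 0b11}


def dr7_for_breakpoint(n, rw, length=1, use_global=False):
    """Return the DR7 bits that enable breakpoint n (0-3).

    Only the bits belonging to breakpoint n are set; a real debugger
    would OR this into the current DR7 contents.
    """
    if not 0 <= n <= 3:
        raise ValueError("only DR0-DR3 exist")
    if length not in LEN_BITS:
        raise ValueError("length must be 1, 2 or 4 bytes")
    if rw == RW_EXECUTE and length != 1:
        raise ValueError("execution breakpoints use a length of 1")
    enable_bit = 2 * n + (1 if use_global else 0)    # L0,G0,L1,G1,...
    value = 1 << enable_bit
    value |= rw << (16 + 4 * n)                      # R/W field of breakpoint n
    value |= LEN_BITS[length] << (18 + 4 * n)        # LEN field of breakpoint n
    return value


def dr6_hits(dr6):
    """Return the numbers of the breakpoints whose B0-B3 bits are set in DR6."""
    return [n for n in range(4) if dr6 & (1 << n)]


if __name__ == "__main__":
    # 4-byte write breakpoint in slot 1, locally enabled
    print(hex(dr7_for_breakpoint(1, RW_WRITE, length=4)))
    print(dr6_hits(0x5))
```

An INT 01 handler would read DR6 first to tell a hardware breakpoint apart from a single-step trap, since both arrive as exception #1.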

It is also possible to lock out _any_ access to Debug Registers (even in Real Mode and at Privilege Level of 0) and make the CPU generate exception #1 as fault upon a read or write access by setting the Global Debug Register access detect (GD) bit in DR7. When exception #1 handler is invoked, the GD bit is cleared to allow the routine full access to Debug Registers.

Hardware breakpoints have several advantages over putting INT 3's in code, and one of them is that they can be placed even in ROM memory. It must be noted, though, that all hardware breakpoints generate exception #1, _not_ exception #3 as one could suppose!

2) I/O Privilege Level (IOPL):

In EFLAGS register (bits 12-13) you can also define the maximum Privilege Level value (the level of least privilege) at which a task will be permitted to perform I/O instructions such as IN, OUT, etc. in Protected Mode without generating exception #13 or consulting the I/O Permission Bitmap (the bitmap can be used to disallow I/O to each of the I/O ports separately from a non-privileged task (ie. whose Current Privilege Level (CPL) value is higher than 0)). This is the reason why trying to disable the keyboard, for example, won't work in 386 debuggers. Any decent debugger would always intercept direct access to certain I/O ports.

3) Memory protection and Virtual 8086 Mode:

Since 386 debuggers run in Protected Mode instead of Real Mode, they can use memory protection to isolate the memory used by the debugger from the memory the debugged program has access to. To simulate a Real Mode environment these Protected Mode debuggers use Virtual 8086 Mode (VM86). Virtual 8086 Mode is not, however, entirely compatible with Real Mode since a VM86 task's Privilege Level is 3 (the least privileged) and Real Mode's implicitly 0 (the most privileged).
Therefore Virtual 8086 Mode is subject to all of the protection schemes the CPU uses: no execution of so called "privileged instructions" that require a PL of 0, some instructions that are IOPL sensitive, etc. Because of the many protections that are available in Virtual 8086 Mode, exception #13 (General Protection Fault) is generated when privilege level permissions have been exceeded. This happens often with actions that normally in Real Mode wouldn't cause any problems.

2.1.4 - Intel Pentium debugging capabilities

To extend 80x86 family processors' debugging capabilities even further, Intel has equipped Pentiums with a super-debugging mode called the Probe Mode. This processor operating mode is a very powerful companion for any debugger. It allows a user to suspend code execution and bus activity (a complete system freeze) at any given moment, enter the Probe Mode, modify _anything_ in the system, and finally return to code execution with the modifications made as if nothing had happened. This includes, but is not limited to, changing the contents of all registers, memory and I/O ports.

The mode can only be triggered by _hardware_, therefore it is impossible to bypass it with _any_ software. However, because of this, using Probe Mode would require special motherboard design to support the mode or to allow such controlling hardware to be added.

[* need more precise information to fill in *]

2.2 - How a debugger works
--------------------------

Even though this is a document on _anti-debugging_, we _must_ know the enemy we are trying to fight. :)

There are two types of debuggers available nowadays: older 8086 debuggers (Microsoft Debug, Borland Turbo Debugger, etc.), that cannot be considered very effective any more, and way more powerful 80386 hardware-assisted debuggers (Nu-Mega Soft-ICE, Borland Turbo Debugger 386, etc.) that are recommended for any task more demanding than just experimenting with debuggers.

2.2.1 - Real Mode debuggers

The older generation of debuggers that operate in Real Mode are based on the single-stepping capability of the 8086 (and better) processor, and the single-byte software breakpoint interrupt, INT 3 (opcode 0CCh).

These debuggers, when single-stepping, set the Trap Enable Flag (TF, bit 8 of FLAGS) of a flag register image in the debugger's stack. Having done this the debugger will transfer control to the program being debugged with an IRET instruction (Return from Interrupt), thus loading the flags image with the TF bit set into the register. Now, only one instruction of the user program will be executed before exception #1 occurs. This exception calls the interrupt routine pointed to by INT 01 vector, which the debugger has modified to point to itself.

The INT 3 software breakpoint is used in a somewhat similar fashion: INT 3 vector points to the debugger, and planting these instructions at every desired breakpoint location will jump back to the debugger when the user program execution has reached any of these locations. However, a weakness of using INT 3's as breakpoints is that the single-byte instruction has to be physically there, thus messing up the program code. This prevents programs that, for example, use a self-check from running, and also makes it impossible to debug ROM code.

There are two types of single-stepping available in debuggers. The first one is 'Trace into Calls/Interrupts' which will execute code instruction by instruction just as the CPU does. Trap Enable Flag is used to (T)race into CALLs, but since interrupts disable the TF bit after PUSHing (E)FLAGS into stack, an INT 3 breakpoint is placed at the location where the interrupt vector points to. The second method is to 'Step over Calls' ('Proceed') which won't trace into any calls or interrupts but rather "steps over" each CALL and INT instruction, invisibly executing the sub-routine.
Other such instructions that will be stepped over are LOOPs and the ones with a REP prefix, to name some. When (P)roceeding, an INT 3 breakpoint is placed immediately after one of these instructions and the TF bit is cleared to run code until the INT 3. It would also be possible to implement this simply by ignoring the INT 01 calls caused by the Single-Step Trap, but I don't think any debugger does it. Some problems may arise, though, such as executing the following instruction as well in case of an INT.

Real Mode debuggers always use the same stack as the user program and thus single-stepping always modifies the program's stack, too, but it may be possible to use special tricks to retain stack in some cases. When single-stepping, the debugger could save a minimum of three words for data destroyed by an INT 01 call plus some reserves for possible PUSHes. Also, any instruction modifying the first three unused words in stack area should also be taken into account just to make sure things won't get too easy. Anyhow, trying to retain stack with a Real Mode debugger takes a lot more effort than using traditional methods. The best would be a code execution simulator which doesn't actually execute instructions but rather emulates them. But... well, who would care to write such a complex program anyway...

Real Mode debuggers may, in addition to INT 01 and INT 3 vectors, also grab INT 00 (Divide Error), INT 02 (Non-Maskable Interrupt) and some other interrupts (available only in DOS), such as INT 09 (Keyboard) to allow breaking out of code back to the debugger, and INTs 20 (Program Terminate) and 21 (DOS Function Request Interrupt) to detect termination of user program.

Most such debuggers (also 386 debuggers) expand their limited capabilities by checking in advance the instruction they will execute next. A debugger only needs to add an additional, user-invisible routine before executing the next instruction (when single-stepping).
Especially Real Mode debuggers use this method to offer memory access breakpoints among other things, but it also makes it possible to, for example, deny I/O and possible conflicts by modifying the code just a bit. For instance, when single-stepping, Turbo Debugger detects INT 3's (opcode 0CCh) and redirects them to itself by jumping directly to the breakpoint handler routine without actually executing an INT 3. It likewise detects PUSHF instructions in order to mask off the TF bit from the FLAGS image pushed into stack. However, this slows down code execution quite a bit, so not many features are emulated this way.

2.2.2 - Protected Mode (VM86) debuggers

The second generation of debuggers, known as 386 hardware debuggers, run in Protected Mode. A 386 hardware debugger utilizes Virtual 8086 Mode to execute the user program in a simulated Real Mode environment while still gaining all the benefits from memory and I/O protection.

These debuggers use the same method for single-stepping as 8086 debuggers. Alternatively a 386 debugger could also accomplish single-stepping by defining a hardware (execution) breakpoint to the address just after the next instruction to be executed; I just don't know if any of them does it this way.

Breakpoints, on the other hand, are implemented by entering the 32-bit linear addresses of desired breakpoints into the four Debug Registers (DR0-DR3) of a 386 or better processor, selecting breakpoint access types and lengths, and enabling them, each of them separately, in the Debug Control Register (DR7). When the processor determines that CS:EIP matches one of the specified code execution breakpoints or that a memory access to a data breakpoint range takes place, it will generate exception #1. As no user program code modifications are needed, using hardware breakpoints makes debugger behaviour more reliable and anti-debugging harder.

Hardware debuggers can also deny all access to I/O ports they consider dangerous to manipulate.
This is done by setting the appropriate bits corresponding to the I/O ports in I/O Permission Bitmap. After this, all access to the ports defined from a VM86 task will result in generating exception #13. Note that in Virtual 8086 Mode the bitmap will be consulted in case of an I/O instruction regardless of the IOPL bits, unlike in Protected Mode, where the bitmap only comes into play when the task's privilege level doesn't satisfy IOPL!

Since a 386 debugger operates in Protected Mode and the user program in Virtual 8086 Mode, exception #1 in Virtual 8086 Mode causes a switch to the debugger's more privileged code (Privilege Level 0). In such a case, the new stack segment SS and pointer ESP of the Protected Mode debugger are first loaded from the VM86 task's Task State Segment (TSS); next, the GS, FS, DS and ES segment registers (as 32-bit quantities) and the old stack SS:ESP pointer are pushed onto the _new_ stack, not the user program's, along with the regular info any interrupt call would store on stack. Therefore single-stepping will not modify the user program's stack, unlike with Real Mode debuggers.

Though 386 debuggers are much more powerful and reliable than any 8086 debugger, they also have disadvantages. Since debuggers of this kind need Protected Mode to run their own code so that they can use the processor's additional protection mechanisms and Virtual 8086 Mode, another Protected Mode program cannot coexist. Examples of such programs are software EMS memory managers (emulators) like EMM386, QEMM386, etc. (they need to be executed in Protected Mode in order to use the 386 paging system to switch EMS pages in page frame). After loading one such driver DOS will be running in Virtual 8086 Mode!

Since in Protected Mode Interrupt Descriptor Table (IDT) is used instead of an interrupt table to vector interrupts to their service routines in VM86, all exceptions and interrupts are first vectored through the debugger itself if it has taken over the interrupt.
Though all interrupts must pass through the debugger to reroute them to their respective VM86 routines, 386 debuggers often point many exceptions to their own routines to get total control over a user program interrupt request or caused exception first.

INT 01 is always vectored to a debugger routine since both hardware breakpoints and single-stepping cause it to occur. INT 3 may also be pointed to the debugger as a breakpoint, but it should be possible to set its breakpoint function off in the debugger. It is only needed to support more breakpoints than four. Since exception #13 is a common exception in VM86 for many reasons, especially because all interrupt calls and IOPL-sensitive instructions cause this exception if I/O Privilege Level value indicated by the IOPL bits in EFLAGS register is less than 3 (the Privilege Level of a VM86 task), it will also launch an error/V86 monitor handler routine in the debugger.

Some other exceptions may also be mapped to a 386 debugger, especially those reporting a "medium" or severe error. In cases of a severe error, the debugger may ask the user whether to continue user program execution by giving control to VM86 interrupt routines (!!!) or to stop execution and return to the debugger.

2.3 - Debugger info
-------------------

[* no info available, will be covered soon *]

2.3.1 - Miscellaneous info: Soft-ICE

As funny as it is, Soft-ICE uses primarily INT 3 breakpoints as execution breakpoints (BPX) unless the breakpoint is in ROM! Hardware breakpoints are only used by memory breakpoints set on execution (BPM X).

When INT 3 is called, Soft-ICE checks the request in the following order: back door commands (SI and DI set to the "magic values") have the highest priority, then enabled Soft-ICE INT 3 breakpoints (BPX) are checked for a match (this is where 'ACTION' takes place), after these it's handled as a normal INT 3 call in program code (this is where 'I3HERE' affects further execution) and if 'I3HERE' is OFF, the VM86 task INT 3 handler is launched.

If a Soft-ICE back door command is used, even a hardware breakpoint set immediately after the INT 3 instruction that triggered the back door will not stop execution (any back door command starts normal execution after returning). Hardware breakpoints will be regarded after executing one instruction after the back door.

Soft-ICE hardware breakpoints act oddly: it seems as if they aren't entered in Debug Registers at the same time. If multiple memory breakpoints are set, they may not all function. Need more info on this peculiarity!

2.4 - Hardware debuggers
------------------------

2.4.1 - PC cards

There are also hardware debugger _cards_ available but because they're only used by professionals (costly goodies...), not "normal" coders or people programming just for fun, I will _not_ cover the topic unless someone sends me extensive documentation on those cards (the basics of how such cards generally work and more detailed, card-specific hardware data. Especially information on Intel In-Circuit Emulator/Debugger (ICE/ICD) boards/modules would be useful!)

An average programmer would just rely on good _software-based_ debuggers (not meaning 8086 debuggers here!) like Soft-ICE. The cards shouldn't be any threat anyway since they are quite rare...

2.4.2 - Action Replay PC card, a useless toy

(based on Werner Zsolt's article 'The Action Replay card for the PC')

Remember the freezer module for Commodore machines? Some years ago it finally made it to PC users, but unfortunately it is no good for serious cracking or debugging.
It cannot work under Windows 3.x Enhanced Mode, Windows 95 or any other 32-bit environment for that matter, and lacks support for unofficial display modes with different screen refresh rates. Those, among other things, render it just a useless toy!

The card's slowdown function doesn't work at all as one would expect. It slows down the machine in a similar manner as all those small utilities available, which means operation is sluggish. Because slowing down depends on a working timer (ie. IRQ 0 not masked and external interrupts enabled), whose handler routine is padded with some extra loops or similar CPU power consuming stuff, the function will not work properly most of the time. Not to mention the limited and unreliable interrupt tracking option, which works in much the same way; that is, it misses interrupt calls when lots of them occur in a short length of time.

_If_ the card had Protected Mode drivers for DOS and other systems, it might have been somewhat more useful.

And now to the part that interests us: The card uses interrupts 0-3 to control code execution (just like normal debuggers). To function, the card also needs software drivers that contain the required interrupt handler routines. (*) This makes it vulnerable to anti-debugging code designed against 8086 debuggers, since in Real Mode or Virtual 8086 Mode, under which the drivers _only_ work, it is no problem at all to change the vectors to point elsewhere. After this the card, although still sitting there with TSRs loaded, would be as useless as without its drivers...

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

SECTION 3: INVENTING NEW TRICKS
===============================

NOTE: This section is recommended reading even for experienced coders. However, if you only want to use existing tricks listed in Section 4 there's no need to check out this one.

3.1 - Things that can be assumed of normal code execution
---------------------------------------------------------

-It is assumed that INT 01 (Single-Step Trap/Hardware Breakpoint) and INT 03 (Software Breakpoint) interrupts never occur under normal conditions. They are _only_ used by debuggers (and by a few other programs, but those only use these interrupts internally and thus never cause any conflicts).

-The Trap Enable Flag (TF, bit 8 of FLAGS) should never be set in normal code execution.

-One must assume that program code is executed instruction after instruction without any additional interrupts between program instructions (not counting external hardware interrupts, which can be disabled with a CLI).

-The anti-debugging code can access the same I/O ports as the program it is attached to.

3.2 - Requirements for a new trick
----------------------------------

In order for a trick to be at all effective or useful, the following requirements must be met:

3.2.1 - Anti-debugging code must not be easy to defeat.

It must not be possible to bypass the code simply by overwriting it with some NOPs (No Operation instructions) or by directly jumping to an address after the code. Code that is this easy to defeat isn't worth the few bytes it takes... :)

3.2.2 - Anti-debugging code must cause no problems (or only very rarely) when executing normally.

Since the impact of anti-debugging code must be on debuggers, which (usually) execute your code a single instruction at a time and pop up between instructions (interrupt execution), the added protection code must be invisible to the user when a protected program is executed under normal conditions.

3.2.3 - Anti-debugging code must be compatible with 8086's, present and future processors!

In order to get your anti-debugging code running on any machine, even the old 8086's, the code must not contain any specific 386 instructions, for example, or undocumented opcodes.
Unless, of course, the code first checks whether the CPU could run it or not. Using instructions that earlier or future processors can't run will cause unexpected misbehaviour and exception #6's (Invalid OP-Code) to be generated. Also, relying on an undocumented opcode retaining its purpose and functionality, or on an undefined opcode staying undefined, only increases the risk of incompatibility. However, for testing exception #6 Intel has reserved an opcode, 0Fh 0Bh (UD2) (*), which will never be defined for other use. On AMD processors a similar reserved opcode is 0Fh 0FFh (UD). Both of these are/were undefined on both manufacturers' processors as of Pentium II release in 1997, but because of the now obsolete 'POP CS' instruction (opcode 0Fh) they won't "work" on 8086 or 80186 CPUs.

3.2.4 - The trick _must_ work properly in both Real Mode and Virtual 8086 Mode under normal conditions in DOS!

The trick shouldn't cause any unexpected processor exceptions. It's no good using a trick that only reduces compatibility with systems running in Protected Mode (OS/2, Windows NT, etc.), since they use Virtual 8086 Mode to provide a Real Mode environment to an application.

3.2.5 - Actions performed by anti-debugging code should be well "hidden".

The purpose of each instruction in your anti-debugging code must not be obvious, since the user _must_ first trace your code up to some point and try to understand code execution before placing any breakpoints (only guessing a location often isn't enough). Loading values directly into registers or putting all of the instructions used to perform something in a single bunch only makes it easier for a cracker to determine what's going on.

3.3 - Hints and tips
--------------------

There are some hardware and software qualities which could be handy to know about. You'll find such information gathered up here.

3.3.1 - Exceptions and interrupts

Exceptions are basically interrupts generated internally by the CPU, often because of an operation error. The first 32 interrupts (00h-1Fh) are reserved by Intel for exceptions. Each of the exceptions is of one of the three types available: faults, which occur _before_ the instruction which causes the exception, so the return address pushed on stack will point to the faulting instruction; traps, which occur _after_ the instruction causing the exception, so the return address will point to the instruction following it; and aborts, which are only used to report severe errors and do not allow the precise location of the instruction causing the abort to be determined. An exception to this is the Non-Maskable Interrupt (NMI), which is a type of its own because it is caused by a signal on the processor's NMI pin. Software interrupts (INT, INTO and BOUND instructions) are also handled as exceptions, but are used to allow user-generated interrupts.

Here is a list of exceptions sorted by interrupt number, the exception's type (fault/trap/abort), name with the processor model(s) it can occur on, and the instruction(s) which can cause the exception:

Interrupt  Type   Exception name                         Instruction(s)
---------  ----   --------------                         --------------
  0        Fault  Divide Error                           DIV, IDIV
  1 *)     F/T    Debug Exception
  2        NMI    NMI Interrupt                          INT 02 or NMI pin signal
  3        Trap   One Byte Interrupt                     INT 3
  4        Trap   Interrupt on Overflow                  INTO
  5        Fault  Array Bounds Check (186+)              BOUND
  6        Fault  Invalid OP-Code (186+)
  7        Fault  Device Not Available (286+)            ESC, WAIT
  8        Abort  Double Fault (286+)
  9        ???    Coproc. Seg. Overrun (286, 386) (*)
 10        Fault  Invalid TSS (286+)                     JMP, CALL, IRET, INT
 11        Fault  Segment Not Present (286+)
 12        Fault  Stack Fault (286+)
 13        Fault  General Protection Fault (286+)
 14        Fault  Page Fault (386+)
 15        -      -
 16        Fault  Floating Point Error (286+)            FPU instructions, WAIT
 17        Fault  Alignment Check Interrupt (486+)
 18-31     -      -
 0-255     Trap   Two Byte Interrupt                     INT nn

*) Some debug exceptions occur as faults (execution breakpoint, for example), others as traps (single-step as an example).

Note that this is just a simple list of exceptions as defined by Intel. Since most of these exceptions have multiple causes, only the basic cause of each is included. For a complete list, refer to Intel documentation (an "x86 Programmer's Reference Manual" is fine).

3.3.2 - Prefetch Instruction Queue (PIQ)

All Intel 80x86 family processors have a tiny memory area within the processor to hold some instructions fetched from system RAM in advance. This Prefetch Instruction Queue is as small as 4 bytes on the first generation 8088 processors (6 bytes on the 8086) and can be as large as 32 bytes on 486's, or even larger on newer models. A user cannot directly access this memory but its effects can be seen. The idea is to speed up execution by reducing the need to get the next instruction from "slow" memory every time one instruction has been executed.

The way a prefetcher works is very complicated and may differ considerably between processor models because of the different size of the PIQ and different memory organization (for example, 486 processors have a 32-byte PIQ organized as 2x16 bytes). The prefetcher unit is very inadequately documented, usually only the size of the PIQ is mentioned, and I couldn't get much info even from the engineers at Intel tech support.
I have tested a PIQ of a _486_ with self-modifying code a bit, and it looks like any (or at least most) instruction modifying memory, including PUSHes, causes the prefetcher to refill the PIQ just before the actual memory modification takes place. The queue seems to be aligned on 16 byte boundaries (00h, 10h, 20h, etc.). If the instruction modifying memory doesn't start on a boundary, the first 16-byte paragraph stored in the PIQ will start at the last 16 byte boundary before it, and the amount of memory loaded into the queue will be 32 bytes from that address. Here is an example to clarify this:

CS:0160  mov  byte ptr [017F],CC  ; Replace a byte at CS:017F, the last
                                  ; byte of memory stored in the queue,
                                  ; with INT 3 opcode. (memory stored in
                                  ; PIQ starts here also)
CS:017F  ...                      ; (memory stored in PIQ ends here)

Based on the same example above: If the INT 3 was written to CS:0180, it would not be contained within the queue, and if the MOV instruction was located at CS:015F the PIQ area would start 16 bytes earlier and thus the INT 3 wouldn't be contained within the queue either. Moving the MOV up to CS:016B wouldn't affect "the location" of the queue.

There are some exceptions to this rule, though. Sometimes, on long instructions containing immediate address or data values, the instruction may extend backwards to an area not stored within the PIQ (shifting the MOV of the above example four bytes left to start at CS:015C will not affect the PIQ at all). Why this happens can perhaps be explained by the CPU reading the immediate values separately _while_ executing the instruction, I don't know...

Even though the prefetcher is nearly invisible to a user, it can be instructed to flush and refill the queue. Most instructions which cause instructions not to be executed in sequence (ie. various jump and return instructions) implicitly cause the prefetcher to flush and refill the PIQ.
The instructions which will always flush the queue are the unconditional JMP, LOOP, CALL, RET, INT and variations including exceptions and external interrupts, and IRET. The ones that flush the queue _only_ when all jump conditions are met ("jump conditions" here refer to set or clear flags), are all the conditional jump instructions (JZ, JAE, JNS, etc.), except for J(E)CXZ, and conditional LOOPs (LOOPZ/E and LOOPNZ/NE) instructions. It's important to note that J(E)CXZ instruction does _not_ flush the PIQ (and otherwise acts weirdly as regards PIQ...). Similarly, though a LOOP instruction just falls through if CX=0001, it doesn't affect flushing in any way. For example, if Zero Flag is set but CX=0001, a jump will not be taken by a LOOPZ but it still flushes the prefetched instruction queue because the flags set match the conditions set for a jump. Also note that no REP prefix itself will cause a flush. For more detailed information on PIQ, check out Robert Collins' X86 website, the address is in Appendix D, 'Useful Internet sites'. 3.3.3 - Special EFLAGS register flags Trap Enable Flag (TF, bit 8): This flag (aka. Single-Step Flag) is used for single-stepping, ie. executing only one instruction at a time. If the TF bit is found set at the beginning of an instruction, exception #1 (Single-Step) will be generated by the CPU after the instruction has been executed, regardless of the state of the TF bit after execution (the same principle applies to a cleared TF bit also). This means that INT 01 handler routine is executed between two instructions. This flag has been available since the 8086's, and any program is allowed to modify this bit. The bit will not be cleared automatically by the CPU. Resume Flag (RF, bit 16): This flag can be used in conjunction with hardware execution breakpoints (they will also generate exception #1) to suppress them. Since 386 hardware execution breakpoints are treated as faults by the CPU (ie. 
they occur _before_ the actual instruction at breakpoint is executed), the next instruction's CS:EIP, in this case the instruction's at breakpoint, is PUSHed on stack. Returning to code execution normally would therefore retrigger the same breakpoint, but setting the RF bit of the EFLAGS image (will be loaded with an IRET) on stack in INT 01 handler routine can prevent this from happening. (Note that CPU sets this flag after a debug breakpoint has occurred as a fault, thus the flag will be pushed on stack in the EFLAGS image automatically.) This flag was first introduced in 80386 processors, and any program is allowed to modify this bit. After _successful_ execution of one instruction the bit is cleared. 3.3.4 - Special software interrupts INT 1: This is an undocumented single-byte instruction (opcode 0F1h), mainly used by In-Circuit Emulators (ICE) as breakpoint instructions. This instruction is available on 386 processors and above, and it has become official with the introduction of Pentium Pros (*). This instruction mostly functions identically to an INT 01 (opcode 0CDh 01h) with the most important exception that INT 1 is never sensitive to the IOPL bits in EFLAGS while INT nn instructions are. INT 02: This instruction calls INT 02 (NMI) routine, but unlike any other INT nn instruction it ignores any further NMI requests while in the service routine until an IRET is executed or the processor is reset. (*) INT 3: This single-byte instruction (opcode 0CCh), used to plant breakpoints into code being debugged, functions identically compared to INT 03 (opcode 0CDh 03h) with the exception that this INT 3 is never IOPL sensitive. INTO: This instruction calls INT 04 (Interrupt on Overflow) routine if Overflow Flag (OF) is set, but unlike INT 04 it is never IOPL sensitive. BOUND: This instruction calls INT 05 (Array Bounds Check) routine if a Value Out of Range is detected, but unlike INT 05 it is never IOPL sensitive. 
Being IOPL insensitive will only affect execution of INT instructions in Virtual 8086 or Protected Mode if IOPL is less than the Current Privilege Level of the interrupt caller. As an example, the execution procedure of an INT 3 (single-byte opcode) and an INT 03 under VM86 (CPL=3) while IOPL=0 will differ by the interrupt handler number which will be called. An INT 3 will always call the routine pointed to by the appropriate IDT entry but INT 03 would raise a General Protection Fault, exception #13 (if IOPL was 3, the appropriate routine would be called). 3.3.5 - "Double-stepping" A feature worth noticing is that loading a value into stack segment register SS causes CPU to disable _all_ external interrupts (including NMI) and prevent debug exception #1's from occurring, triggered by a Single-Step Trap as well as a hardware breakpoint, until the next instruction (following the instruction loading SS register). The instructions affected are 'MOV SS,xxxx' and 'POP SS'. This is so that stack pointer SP could also be loaded safely, otherwise any interrupt could cause the computer to hang due to an inconsistent SS:SP pointer. However, all other exceptions occurring as faults will be generated anyway, examples would be Divide Error (INT 00) and Invalid OP-Code (INT 06). Note though, that two consecutive instructions loading SS register will only double-step once, chaining multiple such instructions won't work. Also, 386 processors and above have an LSS instruction to load SS, but since it is possible to load SP register with the same instruction, no interrupts need to be disabled and thus no "double-step" will occur. You may also notice a similar effect while tracing code in Virtual 8086 Mode but not in Real Mode. The instructions causing a double-step (not necessarily!) are either privileged instructions requiring CPL of 0 or instructions accessing 386 special registers because they cause exception #13 when executed in VM86. 
This is _not_ a feature of an 80x86 processor, but rather due to the exception handler implementation of the VM86 control program (EMM386, Windows in 386 Enhanced Mode, etc.). If the previous instruction just caused an exception, why not execute the following instruction also? Therefore these _could_ be chained (but who would want to?).

3.3.6 - Intel 8259, Programmable Interrupt Controller (PIC)

An Intel 8259 Programmable Interrupt Controller (PIC) monitors all IRQ lines (see also section 3.3.7, 'Common Interrupt Request (IRQ) lines used by a PC machine') and sends a signal to the CPU's INTR pin whenever an IRQ occurs. Programmability includes the interrupt handler number to be called upon receipt of a certain IRQ. This is the most important (and useful to anti-debugging) programmable feature of a PIC and therefore only it will be discussed here.

[* need detailed info on this *]

3.3.7 - Common Interrupt Request (IRQ) lines used by a PC machine

Interrupt Requests (IRQs) are used by external hardware, such as a hard disk controller, to interrupt other processing when it needs attention. In a PC/XT class machine there are eight possible IRQ lines (IRQs 0-7). An additional 8259 Programmable Interrupt Controller (PIC) has been added to ATs and cascaded to IRQ 2 as a slave (IRQs 8-15) of the master PIC to support up to 15 IRQs. Many of those are required by vital system peripherals, and which device reserves which IRQ is nearly standardized. Here is a list of IRQs sorted by priority, the interrupt handler number in DOS (note that this varies from one operating system to another since the PIC _is_ programmable) that will be called upon an interrupt request signal, and the owner of each IRQ line:

IRQ #     Interrupt  Owner
-------   ---------  -----
IRQ0      08h        System timer
IRQ1      09h        Keyboard
IRQ2      0Ah        EGA/VGA vertical retrace (PC/XT) or slave 8259 (AT)
  IRQ8    70h        Real-Time Clock (RTC) (AT)
  IRQ9    71h        (AT)
  IRQ10   72h        (AT)
  IRQ11   73h        (AT)
  IRQ12   74h        PS/2 mouse (AT)
  IRQ13   75h        Floating Point Unit (FPU) error (AT)
  IRQ14   76h        Hard Disk Controller (HDC) (AT)
  IRQ15   77h        (AT)
IRQ3      0Bh        COM2 or COM4
IRQ4      0Ch        COM1 or COM3
IRQ5      0Dh        Hard Disk Controller (HDC) (PC/XT) or LPT2 (AT)
IRQ6      0Eh        Floppy Disk Controller (FDC)
IRQ7      0Fh        LPT1

3.3.8 - Back door commands in Soft-ICE!

Beginning from Soft-ICE version 2.50, it is possible to control Soft-ICE operation with the software being debugged! Nu-Mega thought this feature would prove to be useful to provide easy hardware debugging capabilities to programs, but actually they only made it possible for coders to develop their anti-debugging code against Soft-ICE. These commands are _only_ described in an addendum with versions 2.xx since the manual is printed for version 2.0. Therefore an unsuspecting user could even crash his Soft-ICE when examining a program protected with anti-Soft-ICE code.

The documented back door commands let _any_ program execute _any_ Soft-ICE command in addition to manipulating breakpoints (getting info, creating and even disabling them) without restrictions. Since there is no way to disable this feature, at least not in versions up to 2.80, one could easily take advantage of it as a nice anti-debugging trick.

But there's more: Soft-ICE uses similar back door commands to allow its companion utilities, such as LDR.EXE (program loader), to have _total_ control over the Protected Mode portion of Soft-ICE.
Ever come to your mind that if Soft-ICE utilities can control the debugger, then why couldn't any anti-debugging code, too? The undocumented back door commands, that are only supposed to be used by Soft-ICE utilities, include modifying ACTION, for example, and even executing Protected Mode code (this is used when unloading Soft-ICE from memory)! These commands may even exist in versions of Soft-ICE older than 2.50. (for examples of unloading Soft-ICE, see section 4.3.7) Using Soft-ICE back door commands is pretty simple: SI and DI registers are set to fixed values, AH=09 when using documented back door commands, AL will indicate the function to be performed and when sub-function-specific registers and data are ready, the back door is activated with an INT 3 instruction. Giving Soft-ICE unauthorized orders is made even easier by the fact that the user program does not necessarily have to issue an INT 3. Running to a Soft-ICE INT 3 breakpoint (BPX), when all needed values are ready set in the registers, will also trigger the back door no matter what 'ACTION' Soft-ICE is supposed to take at a breakpoint! (for an example of screwing up Soft-ICE, see section 4.3.6) It must also be noted that any back door command will continue with normal VM86 code execution after returning from the routine, it will _not_ return to Soft-ICE screen if the user was tracing over the INT 3 trigger. 
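
The register setup described above can be illustrated with a small sketch. Python is used here purely as a calculator: nothing below talks to Soft-ICE, it only shows which values a program would have in AX, SI and DI just before the INT 3. The helper names (`backdoor_registers`, `magic_as_text`) are made up for this illustration, and AL=11 is simply one of the documented sub-functions listed further on.

```python
# Hypothetical helpers: they model register values only, no debugger is involved.
MAGIC_SI = 0x4647  # the fixed value for SI
MAGIC_DI = 0x4A4D  # the fixed value for DI

def magic_as_text(value):
    """Return the two ASCII characters packed in a 16-bit value (high byte first)."""
    return bytes([(value >> 8) & 0xFF, value & 0xFF]).decode("ascii")

def backdoor_registers(al, ah=0x09):
    """Build the register image for a back door call (AH=09 for documented ones)."""
    if not 0 <= al <= 0xFF or not 0 <= ah <= 0xFF:
        raise ValueError("AH and AL must each fit in one byte")
    return {"AX": (ah << 8) | al, "SI": MAGIC_SI, "DI": MAGIC_DI}

regs = backdoor_registers(0x11)
print({name: f"{value:04X}" for name, value in regs.items()})
print(magic_as_text(MAGIC_SI), magic_as_text(MAGIC_DI))
```

Read as text, the two "magic values" are just the letter pairs 'FG' and 'JM', which is how they are annotated in the command tables.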

DOCUMENTED SOFT-ICE BACK DOOR COMMANDS
--------------------------------------

Register input:                 Sub-functions:
AH=09                           AL=10  Display information in the Soft-ICE window
AL=Sub-function code            AL=11  Do a Soft-ICE command
SI=4647 ('FG')                  AL=12  Get breakpoint information
DI=4A4D ('JM')                  AL=13  Set Soft-ICE breakpoint
                                AL=14  Remove Soft-ICE breakpoint

To activate: INT 3

Sub-function AL=10: Display information in the Soft-ICE window
--------------------------------------------------------------
Register input:                 Returned values:
DS:DX  Pointer to ASCIZ string  (none)

Notes: ASCIZ string consists of up to 100 text characters including carriage returns (character 0Dh). A null-character (00h) terminates string.

Sub-function AL=11: Do a Soft-ICE command
-----------------------------------------
Register input:                 Returned values:
DS:DX  Pointer to ASCIZ string  (none)

Notes: ASCIZ string consists of up to 100 text characters including carriage returns (character 0Dh). A null-character (00h) terminates string.

Sub-function AL=12: Get breakpoint information
----------------------------------------------
Register input:                 Returned values:
(none)                          BH  Entry # of last breakpoint set
                                BL  Type of last breakpoint set
                                DH  Entry # of last BP that went off
                                DL  Type of last BP that went off

Notes: Entry number is the same as displayed by 'BL' command. Type is one of the following:

   0 - Breakpoint on memory access (BPM)
   1 - Breakpoint on I/O port access (BPIO)
   2 - Breakpoint on interrupt (BPINT)
   3 - Breakpoint on execution (BPX)
   4 - (reserved)
   5 - Breakpoint on memory range (BPR)

Sub-function AL=13: Set Soft-ICE breakpoint
-------------------------------------------
Register input:                 Returned values:
DS:DX  Pointer to BP structure  AX  Error code
                                BX  Breakpoint entry number

Notes: Entry number is the same as displayed by 'BL' command. Error code is one of the following (in decimal):

   0 - No errors
   3 - Breakpoint table is full
   6 - Limit on memory BP's reached
   7 - Limit on I/O BP's reached
   9 - Limit on range BP's reached
  16 - Duplicate breakpoint

For breakpoint structure, see file "BPSTRUCT.ASM".

Sub-function AL=14: Remove Soft-ICE breakpoint
----------------------------------------------
Register input:                 Returned values:
BX  Breakpoint entry number     BX  ??? when set (whatever it means)

Notes: Entry number is the same as displayed by 'BL' command.

One could experiment with the undocumented commands by examining one of the Soft-ICE companion utilities such as LDR.EXE (uses AX=0000 and, while AH=09, AL values within the range 00h-17h, and perhaps others) or S-ICE.EXE itself, or simply by modifying the AL register value to find out new commands. Of course, some other registers may have to be set properly before the commands work (some function so "invisibly" that one can't notice the change, if any). Also, if AX is set to random values, it may cause Soft-ICE to pop up, DOS to go nuts or other similar stuff. SI and DI just have to be set to the "magic values" so that Soft-ICE will grab INT 3's. However, since this erratic Soft-ICE behaviour occurs with more than one AX value, I haven't included a list of them. Now, I will need your help to list the...

UNDOCUMENTED SOFT-ICE BACK DOOR COMMANDS!!!
-------------------------------------------

Register input:                 Commands:
AX=Back door code               AX=0000  ???
SI=4647 ('FG')                  AX=0915  Set ACTION after breakpoint
DI=4A4D ('JM')                  AX=10xx  Execute code in Protected Mode

To activate: INT 3

Command AX=0000: ???
--------------------
Register input:                 Returned values:
??? (none?)                     SI  ???

Command AX=0915: Set ACTION after breakpoint
--------------------------------------------
Register input:                 Returned values:
BL  Interrupt number            ??? (none?)

Command AX=10xx: Execute code in Protected Mode
-----------------------------------------------
Register input:                 Returned values:
??? (none?)                     ??? (none?)

Notes: Code execution starts immediately after triggering back door!

[* need help to complete this *]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

SECTION 4: ANTI-DEBUGGING TRICK IDEAS AND EXAMPLE CODE
======================================================

NOTE: Since the tricks listed here are effective against different kinds of debuggers, I have provided three attributes on each of the tricks to indicate what it is effective against:

   R=Real Mode software debuggers
   P=Protected Mode/VM86 hardware debuggers
   U=User analyzing the code (confusion code)
   x=(attribute not set)

4.1 - General ideas
-------------------

These are ideas that might help you achieve your goal: to prevent examination of your program code. They are almost equally effective when debugging is attempted with either an 8086 or a 386 debugger.

4.1.1 - Causing CPU to execute two instructions at a time

When stack segment SS is loaded with a value, any external interrupts and also debug exception #1 traps at the instruction loading SS and faults of the following instruction are disabled until the next instruction. This could be used to force execution of an instruction. Just put any instruction you want after a 'MOV SS,xxxx' or a 'POP SS' and it will be executed regardless of a single-step trap or even a hardware execution breakpoint set to the particular instruction (a hardware data breakpoint set to break on memory access done by a 'MOV SS,[xxxx]' or 'POP SS' will also be disabled but I can't think of any use for that... :)! If one wants to stop before executing the instruction following a pop to SS, he has to set an INT 3 -style breakpoint there (software interrupt instructions naturally work).

ABOUT THE EXAMPLE:
------------------
In this example, an INT 03 instruction is forced but it could be any other.

CS:0100 8C D0     mov  ax,ss      ; Move original SS to AX
CS:0102 8E D0     mov  ss,ax      ; Copy value from AX back to SS and...
CS:0104 CD 03     int  03         ; ...force execution of INT 03

4.1.2 - Hiding "real" code from the user

The _real_ instructions used by your protection code should be buried underneath lots of useless "garbage" instructions so that the person debugging can't easily determine the ones which perform something vital for the code. Alternatively, if your code allows it, you could divide a trick into smaller chunks, or even into single instructions, and execute them in the middle of some other trick(s), a chunk at a time. However, the stack or the registers the broken down trick uses must _not_ be modified until all of its code has been executed (or values should at least be restored before the next chunk)!

ABOUT THE EXAMPLE:
------------------
[* will be supplied soon *]

4.1.3 - Assuming that a certain condition exists

Do _not_ compare bytes or test any bits in your data or registers if it's used in conjunction with a conditional jump. That method is extremely vulnerable to code modification. Rather, assume that a condition exists (or doesn't exist) and build up your code on it. There are many possibilities to do this, a simple example is included to demonstrate one.

ABOUT THE EXAMPLE:
------------------
[* will be supplied soon *]

4.1.4 - Loopy loops...

It's advisable to use very long loops in a decryption routine, for example, and fill it with lots of anti-debugging code. This will slow down a cracker considerably since he will have to get the program code descrambled before any examination is possible. Having to avoid multiple debugger traps on every pass of the loop will ensure that he'll have _lots_ of fun and spend _lots_ of time trying to decrypt the code he wants.

4.2 - Standard tricks
---------------------

These tricks are a bunch of traditional and quite well-known debugger traps. They should work against almost any standard debugger. Please note though, that most of them are only useful against Real Mode debuggers, not those using 386 hardware debugging support.
4.2.1 - Modifying interrupt vectors The easiest way to fool a Real Mode debugger is to change the interrupt vector entries for INTs 01 (used for single-stepping) and 3 (software breakpoint) in the interrupt table. Other interrupts, that you won't be expecting to occur, could be vectored to an IRET to disable their functionality and restored afterwards. Anti-debugging code could change the CS:IP point to the next instruction to be executed (only INT 01...) to continue execution without the debugger breaking in, or to the BIOS reboot address at F000:FFF0, for instance. It would be even better if either one of these interrupts were used for proper code activation. By using INT 3 for code activation you could also fool Soft-ICE just by setting SI=4647 and DI=4A4D before calling INT 3, this makes it run a back door handler instead of the user routine (for more information see 3.3.8, 'Back door commands in Soft-ICE!', and 4.3.6, 'Screwing up Soft-ICE with back door commands')! Redirecting other interrupts to INT 01 or 3 routines might be fun to try out, too. Pointing INT 00 (Divide Error) to INT 01 routine and then dividing by zero causes a funny effect of executing the (I)DIV instruction over and over again. This is because a Divide Error saves the address of the faulting instruction on stack instead of the next one's as the Single-Step Trap does. Still, the nastiest trick would be to use INTs 01 and 3 as substitutes for often called interrupts, INT 21 (DOS Function Request interrupt), the most frequently accessed interrupt by DOS programs, as a good example. If INT 3 (the single-byte opcode) is used, it would be virtually _impossible_ to replace any of them with another because of the fact that INT 3 instruction only takes one byte while all others require two. Also you could point INT 21 elsewhere, your program for instance, which would bug some Real Mode debuggers. 
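
The vector modifications discussed in this section rely on some simple Real Mode address arithmetic, which the following sketch spells out. It is written in Python as a calculator only (it touches no memory), and the function names `ivt_entry` and `physical` are made up for this illustration. It assumes the standard Real Mode layout: the interrupt table starts at 0000:0000 and every vector takes four bytes, the IP word first and the CS word after it.

```python
def ivt_entry(int_no):
    """Return (offset of IP word, offset of CS word) in segment 0000 for a vector."""
    if not 0 <= int_no <= 0xFF:
        raise ValueError("interrupt number must be 00h-FFh")
    base = int_no * 4          # each vector is 4 bytes: IP word, then CS word
    return base, base + 2

def physical(segment, offset):
    """Real Mode segment:offset translation on a 20-bit address bus."""
    return ((segment << 4) + offset) & 0xFFFFF

for n in (0x00, 0x01, 0x03, 0x21):
    ip_off, cs_off = ivt_entry(n)
    print(f"INT {n:02X}: IP at 0000:{ip_off:04X}, CS at 0000:{cs_off:04X}")

# Two ways of writing the BIOS reboot address, both the same physical location:
print(f"{physical(0xF000, 0xFFF0):05X} {physical(0xFFFF, 0x0000):05X}")
```

So the INT 3 vector lives at 0000:000C (IP) and 0000:000E (CS), INT 01 at 0000:0004 and 0000:0006, and the reboot address F000:FFF0 can equally well be written as FFFF:0000, a form that is convenient to store because both words are trivial to produce.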

Other interrupt vectors, which are also used by several debuggers (INTs 00 (Divide Error) and 02 (NMI)), may come in handy. If the INT 00 vector is pointed to the start of "real" code, it could be called with a simple DIV AX while AX has a value of zero (for an example, see 4.3.3, 'Fooling TD386 Virtual-86 Driver').

Just remember that it's not advisable to change vectors with the DOS Set Interrupt Vector service (INT 21, AH=25), use manual vector modification instead. If a debugger has taken over INT 21 it could prevent any interrupt vector change via DOS services. Also note that only the Real Mode interrupt table or VM86 virtualized interrupt vectors can be modified. The Interrupt Descriptor Table is out of the reach of a VM86 program and attempted access will only cause a General Protection Fault (exception #13).

ABOUT THE EXAMPLE:
------------------
This example modifies the INT 3 entry to point to the BIOS reboot vector. It is done in two steps, first IP will be moved and then the segment. After doing this, the next INT 3 breakpoint will result in a cold reboot. If 386 instructions are used, changing CS:IP would only take one instruction and no conflicts with the debugger could occur: tracing through code which changes the INT 01 vector will cause unpredictable behaviour after changing IP while the code segment is still intact...

CS:0100 31 C0     xor  ax,ax      ; Zero AX
CS:0102 8E D8     mov  ds,ax      ; Load 0000 as DS segment
CS:0104 A3 0C 00  mov  [000C],ax  ; Set INT 3 pointer IP to 0000,...
CS:0107 F7 D0     not  ax         ; ...invert AX (0000 -> FFFF) and...
CS:0109 A3 0E 00  mov  [000E],ax  ; ...set CS to FFFF (FFFF:0000)

4.2.2 - Masking out hardware interrupts

Disabling external hardware (maskable) interrupts, such as the keyboard's, is possible via the Intel 8259 Programmable Interrupt Controller. Performing I/O to the 8259 Interrupt Mask Register ports 21h (IRQs 0-7) and A1h (IRQs 8-15) allows us to disable any Interrupt Request (IRQ) line we want and therefore any device allocating an IRQ on the system. This is done by writing an 8-bit value to either of these I/O ports. The value is bitmapped so that bit 0 stands for the first IRQ served by that port, bit 1 for the second and so on (IRQs 0-7 on port 21h, IRQs 8-15 on port A1h). Setting a bit means that the corresponding IRQ will be disabled, clearing a bit will enable the IRQ.

The most important IRQs are IRQs 0 and 1 since they are the system timer and keyboard IRQs respectively. Masking off IRQ 1 will lock any keyboard input but advanced debuggers usually re-enable it before returning from code execution. A complete list of standard PC IRQs is found in section 3.3.7, 'Common Interrupt Request (IRQ) lines used by a PC machine'.

Non-Maskable Interrupts (NMI) can also be disabled. It is done on PC/XT class machines by writing 00h to port A0h (writing 80h enables NMI). On ATs and up NMI masking is done via the CMOS RAM/Real-Time Clock (RTC) port 70h. To disable NMI a value with bit 7 set must be written to port 70h (clearing bit 7 enables it). Note that port 71h should be read immediately after enabling or disabling NMI via port 70h, or the RTC may be left in an unknown state!

ABOUT THE EXAMPLES:
-------------------
The following examples demonstrate masking IRQs and NMI. In example #1 the keyboard is disabled using the Programmable Interrupt Controller. In example #2, NMI is disabled via the CMOS RAM/RTC port 70h on AT class machines. Since the I/O port is write-only, we'll simply write 80h.

Example #1 (masking off keyboard IRQ):

CS:0100 E4 21     in   al,21      ; Read current value
CS:0102 0C 02     or   al,02      ; Set bit 1 (keyboard IRQ)
CS:0104 E6 21     out  21,al      ; Write new value to disable keyboard

Example #2 (disabling NMI):

CS:0100 B0 80     mov  al,80      ; Move a value with bit 7 set into AL
CS:0102 E6 70     out  70,al      ; Write AL to port 70h and mask NMI
CS:0104 E4 71     in   al,71      ; To ensure proper RTC operation

4.2.3 - Reprogramming Intel 8259 Programmable Interrupt Controller (PIC)

As the name of the chip tells us, it's user-programmable.
One of the things set during the initialization procedure is the interrupt handler number that will be called upon receipt of a certain interrupt request (IRQ). By reprogramming the 8259, any current IRQ interrupt handler can be made obsolete. This allows us to fool at least any Real Mode debugger, for example by redirecting IRQ 1 (Keyboard) to some interrupt handler other than the default INT 09, thus locking the keyboard from any program that assumes INT 09 will be called on every keystroke.

[* need detailed info on this *]


4.2.4 - Disabling keyboard

Sometimes it may prove useful to disable the keyboard until it is needed again, to disallow any further tracing. There are several ways of doing it and here they are, "all-in-one".

The first and most often used method is masking the keyboard interrupt, IRQ 1. It can be done via the Interrupt Mask Register, I/O port 21h, but unfortunately advanced Real Mode debuggers usually re-enable it before returning from code execution and it never works on Soft-ICE. For more information and an example, see 4.2.2, 'Masking out hardware interrupts'.

Note that issuing a CLI instruction (clearing the IF bit of FLAGS) will disable all external interrupts, thus effectively locking the keyboard as well. However, that only works for running code, disabling the debugger hotkey. A CLI locks out the keyboard even from Soft-ICE unless the 'BREAK ON' command is entered in the debugger.

[* info on Intel 8255 PPI will be added here some time *]

The next method is to give commands to the Intel 8042 Keyboard Controller to disable the keyboard interface, which works on ATs and up. Real Mode debuggers usually don't expect _this_, so it should be used instead to disable the keyboard! Disabling the keyboard via the 8042 chip is done by writing ADh (Disable Keyboard Interface) to the 8042 Command Register, port 64h, which then drives the keyboard clock line low. The keyboard interface of the 8042 can be re-enabled by writing AEh (Enable Keyboard Interface) to port 64h.
However, once again, Soft-ICE is able to override this, too. Another way of doing the same trick is to first write 60h (Write 8042 Command Byte) to port 64h and then write a byte with bit 4 set to port 60h. Since the byte written is a bitmapped parameter for 8042 operation, the command byte should first be read, then the bit set, and finally the byte written back (see the example below).

The most effective method, which completely freezes the keyboard in most applications including Soft-ICE, whether tracing or running code, is to program the keyboard directly (all bytes written to port 60h will be passed on to the keyboard if the Keyboard Controller isn't expecting any data input for an 8042 command written to port 64h). All you need for this is an AT keyboard; PC/XT keyboards cannot be programmed.

The trick is simply to disallow scanning for a keypress, which can be done by making the keyboard wait for further data from the system. While waiting for parameters for a command, the keyboard won't accept any keystrokes. The "official" way of doing this is to write F5h (Set Default w/Disable) to port 60h, which resets the keyboard to default values and waits for another command. To enable the keyboard, F4h (Enable Keyboard) must be written to port 60h. Writing one of the keyboard commands EDh, F0h, F3h, FBh, FCh or FDh will also make the keyboard stop scanning and wait for data input. Writing 00h to port 60h in any of those cases will restart scanning, but since that byte is supposed to be a parameter for the command issued before, the keyboard may behave oddly after writing it (keyboard LEDs flashing, remapped keys, etc.).

Not only direct keyboard commands but also some 8042 commands will lock the keyboard without any obvious reason. They're not listed here, however; you may test them out yourself by writing a byte to port 64h.
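
As a rough sketch of the parameter-wait idea described above (this is not one of the original examples, so treat it with care): the addresses are arbitrary, the NOP merely stands in for whatever code is to be protected, and it is assumed, as in the other keyboard examples, that the keyboard is ready to accept a byte at port 60h without checking the 8042 status first. The 00h written at the end is taken by the keyboard as the parameter of F3h, so the typematic rate will most likely change as a side effect.

CS:0100  B0 F3      mov al,F3       ; F3h is 'Set Typematic Rate/Delay'
CS:0102  E6 60      out 60,al       ; Keyboard stops scanning, waits for data
CS:0104  90         nop             ; (code to be protected would go here)
CS:0105  B0 00      mov al,00       ; 00h will be taken as the parameter
CS:0107  E6 60      out 60,al       ; Send it and restart key scanning
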
The funny thing is that if a 386 debugger, or any Protected Mode program, isn't aware of these commands, it is quite impossible for it to prevent this kind of lock-out (in Real Mode the trick will always work). And who would expect an innocent keyboard command like 'Set Typematic Rate/Delay' to hang the debugger anyway? One might think that a breakpoint on I/O to ports 60h or 64h could help, but no, think again! Pressing a key while the keyboard is enabled invokes the INT 09 (Keyboard) handler, which in turn will determine the key pressed by performing some I/O to ports 60h and 64h... or a program expecting keyboard input will anyway. Thus access to those ports from a VM86 task cannot be permanently denied either. Some advanced operating systems, such as Windows NT, intercept keyboard I/O and this won't work, though.

For more information on those keyboard commands I suggest you get the HelpPC utility (see Appendix B, 'Suggested reading for info' at the bottom of this document).

ABOUT THE EXAMPLES:
-------------------
All of these examples are devoted to disabling the keyboard.

Example #1 disables keyboard I/O through the Intel 8255 Programmable Peripheral Interface and thus _only_ works on PC/XT machines.

In example #2, the Intel 8042 Keyboard Controller command 'Disable Keyboard Interface' is used to lock the keyboard.

Example #3 shows another way of doing the same trick as in example #2. It directly modifies bit 4 in the 8042 Command Byte by first reading in its original value, then setting the bit and writing the byte back. I can't guarantee that this will work, but according to HelpPC's database it should be the proper way. Therefore I strongly recommend using the direct command (example #2) instead.

Example #4 demonstrates forcing the keyboard to stop scanning for keystrokes by setting defaults and disabling scanning. Tracing through will not succeed with _any_ debugger! The keyboard will stop responding at the first OUT instruction and will be disabled until the second OUT.

Example #1 (disabling 8255 keyboard interface):

-----
CS:0100  E4 61      in al,61        ; Read current value from port 61h
CS:0102  0C 80      or al,80        ; Set bit 7
CS:0104  E6 61      out 61,al       ; Write the value back to port 61h
-----

Example #2 (using a 8042 control command):

CS:0100  B0 AD      mov al,AD       ; ADh is 'Disable Keyboard Interface'
CS:0102  E6 64      out 64,al       ; Write it to 8042 Command Register

Example #3 (writing 8042 Command Byte):

CS:0100  B0 20      mov al,20       ; 20h is 'Read 8042 Command Byte'
CS:0102  E6 64      out 64,al       ; Write it to 8042 Command Register
CS:0104  E4 60      in al,60        ; Read value from port 60h to AL and...
CS:0106  0C 10      or al,10        ; ...set bit 4
CS:0108  88 C4      mov ah,al       ; Move byte in AL to AH
CS:010A  B0 60      mov al,60       ; 60h is 'Write 8042 Command Byte'
CS:010C  E6 64      out 64,al       ; Write it to 8042 Command Register
CS:010E  88 E0      mov al,ah       ; Copy byte in AH back to AL and...
CS:0110  E6 60      out 60,al       ; ...write AL to port 60h

Example #4 (stopping key scanning):

CS:0100  B0 F5      mov al,F5       ; F5h is 'Set Default w/Disable'
CS:0102  E6 60      out 60,al       ; Set defaults and stop key scanning
CS:0104  B0 F4      mov al,F4       ; F4h is 'Enable Keyboard'
CS:0106  E6 60      out 60,al       ; Restart key scanning


4.2.5 - Forcing a debugger stop execution

This method is a simple and easy one: you just have to put an INT 01 (used for single-stepping) or INT 3 (software breakpoint) instruction in the middle of your code. Every Real Mode debugger has these interrupts hooked to the debugger itself, and some 386 debuggers also recognize INT 3's in addition to hardware breakpoints. During normal execution no interruptions or problems will occur, but if a debugger is running the code, the program will be stopped at each of those instructions. Naturally the highest effectiveness is gained when they are used inside a loop, as the debugger would stop at every round.

No examples given. You can decide yourself where to drop one of these instructions.
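
Still, for what it's worth, a minimal sketch of the loop idea mentioned above could look something like this. The addresses and the loop count of 10h are arbitrary choices made only for illustration, and it is assumed that without a debugger each INT 3 simply returns through the default handler.

CS:0100  B9 10 00   mov cx,0010     ; Going to loop 16 times (arbitrary)
CS:0103  CC         int 3           ; A debugger would stop here every round
CS:0104  E2 FD      loop 0103       ; Decrement CX, jump back while CX<>0
CS:0106  ...        ...             ; (code continues here)
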


4.2.6 - Using regular INT nn -style INT 3 calls

Even though the single-byte INT 3 breakpoint instruction (opcode 0CCh) is a lot more common than the two-byte INT 03 (opcode 0CDh 03h) regular interrupt call, the two-byte version is also of some use. Real Mode debuggers seldom take possible INT 03's into account: most software breakpoint routines seem to fail on two-byte INT 03's.

Many Real Mode debuggers only subtract _one_ byte from the IP pushed onto the stack by the interrupt call when returning to the debugger. Therefore, if an INT 03 is used instead of an INT 3, for whatever purpose, the debugger will continue by executing false instructions after the INT 03 if IP isn't fixed. This happens on both Debug and Turbo Debugger; even the advanced Soft-ICE has this bug when 'I3HERE' is set ON.

Watcom Debugger also has INT 03 handler problems, but much worse ones: if an INT 03 instruction is executed while single-stepping, it will start running code _continuously_. Furthermore, setting a (software) breakpoint immediately after the INT 03 doesn't work!

No examples this time. You could drop an INT 03 in a random location or do just about anything your heart desires with them, there are no limits!


4.2.7 - Checking FLAGS

Because single-stepping with a debugger requires the Trap Enable Flag (TF), bit 8 of FLAGS, to be set, we could simply check its current state. If it is set, some debugger is tracing the code, since normally the TF bit is always clear. Please note, though, that a debugger can fake the PUSHF (and most really do) while single-stepping and therefore this may not work, but there are ways... (see also section 4.1.1, 'Causing CPU to execute two instructions at a time')

This works for Soft-ICE too, unless a 'BREAK ON' was issued. It allows Soft-ICE to intercept PUSHFs with the help of the CPU.

ABOUT THE EXAMPLE:
------------------
This is a plain example to show you how it works.

CS:0100  9C         pushf           ; Push FLAGS onto stack
CS:0101  58         pop ax          ; Pop FLAGS image to AX
CS:0102  25 00 01   and ax,0100     ; Mask all other bits but bit 8 and...
CS:0105  74 02      jz 0109         ; ...if result is AX=0000, proceed
CS:0107  CD 20      int 20          ; Otherwise terminate and return to DOS
CS:0109  ...        ...             ; (code continues here)


4.2.8 - Modifying original interrupt handler routine
        (idea from Varicella-][, DOS virus)

This is close kin to 'Modifying interrupt vectors' but no actual interrupt table modification is involved. What we will do is to replace a part of the original interrupt handler routine with that of our own. Inserting a simple IRET, for example, to the address the vector points to would effectively kill the whole interrupt routine. If an IRET is put on INTs 01 and 3, tracing and setting breakpoints with a Real Mode debugger would call these interrupts only to execute the IRET instruction and return to code. Very nice and effective way of getting rid of a debugger.

ABOUT THE EXAMPLE:
------------------
This example plainly replaces the first byte of INT 01 handler routine with an IRET. As simple as that!

CS:0100  31 C0      xor ax,ax               ; Zero AX
CS:0102  8E D8      mov ds,ax               ; Load 0000 as DS segment
CS:0104  8B 1E 04   mov bx,[0004]           ; Load INT 01 pointer IP into BX...
         00
CS:0108  8E 1E 06   mov ds,[0006]           ; ...and CS to DS
         00
CS:010C  C6 07 CF   mov byte ptr [bx],CF    ; Replace first byte of INT 01
                                            ; handler with 0CFh (IRET)


4.2.9 - Foiling 'Step Over'/'Proceed' debugger commands, part 1 of 2

While (P)roceeding, all debuggers must insert a breakpoint after certain instructions in order to return to debugger after executing "one" of them. These instructions are INTs, CALLs, LOOPs, instructions with a REP prefix, etc. to name some but generally they are instructions that would require executing multiple others if simply traced through. This offers anti-debugging code ways of making a debugger's job a bit harder. One of them is to adjust the return address saved by CALLs and INTs a bit.
When the saved offset (IP) is incremented by one, the RET or IRET will not jump back to the instruction immediately following the subroutine call, but rather skips over the location where a debugger would put the breakpoint. Thereafter code can execute undisturbed. It is recommended that a NOP be placed after the CALL (or whatever instruction is used to fool the debugger), just to hold either a single-byte INT 3 or a hardware execution breakpoint. Also, you must make sure not to jump back to the address where the debugger has set the breakpoint, unless code overwrites the location later and the location is no longer the first byte of any opcode (prefixes included), so that not even a hardware breakpoint would go off there. Since no ranges can be set for a hardware execution breakpoint defined in debug registers, it doesn't work unless the address is exactly at the start of an opcode.

For more tricks that can fool a 'Step Over'/'Proceed' command, see 4.2.10, 'Faking a procedure call' and the second part of this trick, section 4.4.2.

ABOUT THE EXAMPLE:
------------------
This example basically just adjusts the return address on stack for a CALL by one byte. It makes the CALL avoid the breakpoint set by the debugger on return from the subroutine, if the user (P)roceeds instead of (T)racing.

CS:0100  E8 03 00   call 0106               ; Call subroutine
CS:0103  90         nop                     ; Reserve a byte for a breakpoint
CS:0104  CD 20      int 20                  ; Terminate and return to DOS
CS:0106  89 E5      mov bp,sp               ; Copy value from SP to BP
CS:0108  FF 46 00   inc word ptr [bp+00]    ; Increment return IP value by one
CS:010B  C3         ret                     ; Return from subroutine (to CS:0104)


4.2.10 - Faking a procedure call

There are a number of tricks that can fool the 'Step Over'/'Proceed' debugger command.
Because a debugger expects execution to proceed at the instruction following a procedure, such as a loop or subroutine call, it just places a breakpoint (either an INT 3 instruction or a hardware execution breakpoint) after the instruction that calls a procedure, and then runs code until the breakpoint. Therefore it is very easy to fool _any_ debugger into actually running the rest of the code.

One method is to use a LOOP, CALL or an INT as a substitute for a regular jump instruction, never returning to the instruction following the jump. The only difference between a JMP and a LOOP is that a LOOP decrements (E)CX by one and, if (E)CX is zero after the operation, no jump will take place. It could thus be used as a combined DEC (E)CX and JNZ instruction. A CALL just stores the return address (IP if near, CS:IP if far) on stack. An INT instruction will save FLAGS and the CS:IP pointer; in addition it will clear the IF and TF bits in FLAGS, thus disabling external interrupts and single-stepping.

Remember though not to jump back to the location where a debugger would put its breakpoint unless you want to get caught by the debugger.

For more tricks that can fool a 'Step Over'/'Proceed' command, see both parts of 'Foiling 'Step Over'/'Proceed' debugger commands', sections 4.2.9 and 4.4.2.

ABOUT THE EXAMPLE:
------------------
This is a simple example of snatching control from the debugger if the person debugging (P)roceeds instead of full (T)racing. The LOOP here is not used as it is intended to be, but just to jump away.

CS:0100  E2 01        loop 0103       ; Fake "loop"
CS:0102  90           nop             ; Reserve a byte for a breakpoint
CS:0103  B0 F5        mov al,F5       ; F5h is 'Set Default w/Disable'
CS:0105  E6 60        out 60,al       ; Set defaults and stop key scanning
CS:0107  CD 20        int 20          ; Terminate and return to DOS


4.2.11 - Comparing INT 01 and 3 interrupt table entries
         (idea from Lock-Master, DOS executable encryption utility)

Although the handlers for INTs 01 and 3 are similar in all debuggers (both return back to the debugger), they usually are separate routines because of the slightly different purposes they serve. Of course, an intelligent routine suitable for both interrupts could be written, but the issue here is that it is much easier to write two separate routines than one with a couple of additional checks, and therefore the INT 01 and 3 handler CS:IPs will also differ. However, since DOS simply points both of these interrupts to the same IRET, it is possible to check whether the IPs of these interrupts match. If they differ, there is most certainly a Real Mode debugger or a similar program running. Alternatively the first instruction of the handler routine could be checked against an IRET (opcode 0CFh), but I believe comparing IPs is enough.

Note that this could be very useful in conjunction with encrypted code. When the handlers' IPs are subtracted from each other, the result should be zero if no Real Mode debugger is running (see the example below). You could try adding the result to the decryption key, thus causing incorrect decryption if a debugger is present.

ABOUT THE EXAMPLE:
------------------
This example loads the IPs of INTs 01 and 3 into AX and BX. After this BX is subtracted from AX (used as a substitute for CMP here). A non-zero result indicates that they have separate handler routines, and in such a case the program is terminated.

CS:0100  31 C0        xor ax,ax       ; Zero AX
CS:0102  8E D8        mov ds,ax       ; Load 0000 as DS segment
CS:0104  A1 04 00     mov ax,[0004]   ; Load INT 01 pointer IP into AX and...
CS:0107  8B 1E 0C 00  mov bx,[000C]   ; ...INT 3 pointer IP into BX
CS:010B  29 D8        sub ax,bx       ; Subtract BX from AX and...
CS:010D  74 02        jz 0111         ; ...if result is AX=0000, proceed
CS:010F  CD 20        int 20          ; Otherwise terminate and return to DOS
CS:0111  ...          ...             ; (code continues here)


4.2.12 - Using stack to fool a debugger

Now _this_ is a powerful multi-purpose trick. It's based on the fact that single-stepping and software breakpoints (generally all interrupts and exceptions) use the same stack as the user program to store data, and it cannot be overridden in Real Mode. Therefore, using the stack to decrypt encrypted data or to move code to the location where it will be executed, for example, will not work properly if a Real Mode debugger traces through the code.

You could also set the stack in the middle of your code and move the stack pointer SP to point to a location a few instructions ahead every little while. If SS:SP points to code but the location isn't in the immediate vicinity of CS:IP, the user could actually go for this one without noticing that something vital is missing... Possibilities are once again limitless.

The critical addresses in stack are the next three words (6 bytes), where the Single-Step Trap will store flags and return address. This is the 16-bit mode, default in Real Mode/VM86, and SP-6 will be the last byte affected. Some debuggers, such as Turbo Debugger, may also store some junk of their own on the program's stack in addition to the data the interrupt call writes.

Unless a debugger uses special tricks to retain the stack, this kind of anti-debugging trick works fine. You just have to remember to disable all external interrupts with a CLI, since they also use the stack when calling interrupt handlers and thus may interfere as well. Also, the stack should not be used while SS:SP is in code unless the code is supposed to be self-modifying.

Unfortunately this only works against 8086 debuggers, once again.
Because of the slightly different way interrupts are handled in Protected Mode, no user program stack modifications take place, thus rendering all these tricks useless.

ABOUT THE EXAMPLE:
------------------
This is one possibility of using stack. The example first saves the original SS:SP, sets up the stack in the middle of code and then restores the original stack pointer. Normally this code would jump over the 'INT 20', but tracing will overwrite the JMP instruction, so the program would return to DOS. The NOPs are there only to stuff some bytes in case the data written by the interrupt call would form a long instruction. Remember a CLI, too!

CS:0100  8C D0      mov ax,ss       ; Save original SS in AX and...
CS:0102  89 E3      mov bx,sp       ; ...SP in BX (stack cannot be used!)
CS:0104  0E         push cs         ; Save code segment CS on stack and...
CS:0105  17         pop ss          ; ...load it as new stack segment SS
CS:0106  BC 0B 01   mov sp,010B     ; Next PUSH to stack will overwrite...
CS:0109  EB 04      jmp 010F        ; ...this instruction.
CS:010B  90         nop             ; Stuff a byte
CS:010C  90         nop             ; Stuff a byte
CS:010D  CD 20      int 20          ; Terminate and return to DOS
CS:010F  8E D0      mov ss,ax       ; Restore original SS and...
CS:0111  89 DC      mov sp,bx       ; ...then SP


4.2.13 - Generating a General Protection Fault or a Stack Fault

Exception #13 (General Protection Fault) is actually a Segment Overrun Exception in Real Mode and VM86. The segment limit in both of these modes is 0FFFFh, and if either a non-byte (eg. word) memory reference beyond the limit is made or execution is attempted beyond the limit (for example, first byte of an instruction is at 0FFFFh and the last byte at 0000h), exception #13 (0Dh) will be generated. Therefore, we could point INT 0D to a location where we'd like to continue code execution after generating a GPF. Of course, the vector should also be restored or else... This works well in Real Mode, but any VM86 control program (a 386 debugger would be one) will catch this exception and usually aborts execution with a severe error.
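
Since the vector should be restored afterwards, here is a minimal sketch of one way it might be done. It is not taken from any of the sources of this document: the addresses, the use of the stack for saving the old vector and the CLI/STI pair are all my own assumptions. It assumes a 286 or better in true Real Mode with interrupts enabled on entry. The CLI is there because IRQ 5 also arrives as INT 0D by default, and the ADD SP throws away the three words (IP, CS and FLAGS) the exception left on the stack.

CS:0100  FA         cli                         ; Keep IRQs out while vector is changed
CS:0101  31 C0      xor ax,ax                   ; Zero AX
CS:0103  8E D8      mov ds,ax                   ; Load 0000 as DS segment
CS:0105  FF 36 34   push word ptr [0034]        ; Save original INT 0D pointer IP
         00                                     ; and...
CS:0109  FF 36 36   push word ptr [0036]        ; ...CS on stack
         00
CS:010D  C7 06 34   mov word ptr [0034],011A    ; Set INT 0D pointer IP to 011A
         00 1A 01                               ; and...
CS:0113  8C 0E 36   mov [0036],cs               ; ...CS to current CS (CS:011A)
         00
CS:0117  A1 FF FF   mov ax,[FFFF]               ; GPF! (tries to read 0FFFFh-0000h)
CS:011A  83 C4 06   add sp,06                   ; Discard IP, CS and FLAGS image
CS:011D  8F 06 36   pop word ptr [0036]         ; Restore original INT 0D pointer CS
         00                                     ; and...
CS:0121  8F 06 34   pop word ptr [0034]         ; ...IP
         00
CS:0125  FB         sti                         ; Re-enable external interrupts
CS:0126  ...        ...                         ; (code continues here)
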

Exception #12 (Stack Fault) also works the same way: POPping beyond the segment (upper) limit will generate INT 0C. This could be used similarly to a GPF but it also has the exact same problems.

Remember that only 286 processors and above will generate these exceptions; an 8086 CPU just happily executes the instructions causing them...

This is pretty much the same as 4.3.3, 'Fooling TD386 Virtual-86 Driver', but unlike that division by zero trick, these variants do not work properly with _any_ VM86 environment. Read the warning below!

WARNING: Although no trouble will be encountered using this trick in a true Real Mode environment, not only 386 debuggers but also most software requiring Protected Mode (and running DOS in Virtual 8086 Mode) will intercept exception #13 before the VM86 routine is run and thus it's not advisable to use this trick (but it's here anyway for those who wish to reduce compatibility of their programs ;). Examples of such software are QEMM386, EMM386 and Windows in 386 Enhanced Mode (MS-DOS Prompt).

ABOUT THE EXAMPLES:
-------------------
These examples mainly show how to generate exception #12 or #13 in Real Mode or Virtual 8086 Mode. None of the examples restore interrupt vectors, but it should be done to prevent possible machine lock-up.

Example #1 causes a General Protection Fault with a memory reference.

Example #2 causes a GPF by executing code beyond the 0FFFFh limit. However, interrupt vector modification isn't shown.

Example #3 causes a Stack Fault.

Example #1 (memory reference beyond limit):

CS:0100  31 C0      xor ax,ax                   ; Zero AX
CS:0102  8E D8      mov ds,ax                   ; Load 0000 as DS segment
CS:0104  C7 06 34   mov word ptr [0034],0111    ; Set INT 0D pointer IP to 0111
         00 11 01                               ; and...
CS:010A  8C 0E 36   mov [0036],cs               ; ...CS to current CS (CS:0111)
         00
CS:010E  A1 FF FF   mov ax,[FFFF]               ; GPF! (tries to read 0FFFFh-0000h)
CS:0111  ...        ...                         ; (code continues here)

Example #2 (executing beyond limit):

CS:FFFF  31 C0      xor ax,ax                   ; GPF!
                                                ; (instruction at 0FFFFh-0000h)

Example #3 (causing a Stack Fault):

CS:0100  31 C0      xor ax,ax                   ; Zero AX
CS:0102  8E D8      mov ds,ax                   ; Load 0000 as DS segment
CS:0104  C7 06 30   mov word ptr [0030],0112    ; Set INT 0C pointer IP to 0112
         00 12 01                               ; and...
CS:010A  8C 0E 32   mov [0032],cs               ; ...CS to current CS (CS:0112)
         00
CS:010E  BC FF FF   mov sp,FFFF                 ; Set SP to FFFF (next POP will wrap!)
CS:0111  58         pop ax                      ; Stack Fault!
                                                ; (tries POPping from 0FFFFh-0000h)
CS:0112  ...        ...                         ; (code continues here)


4.2.14 - Exploiting rapidly changing memory areas

Taking advantage of such memory areas as display RAM (located at A0000-BFFFF), especially the EGA/VGA text mode memory at B8000-BFFFF, or the timer counter dword at 0040:006C updated by INT 08 (system timer handler) 18.2 times/sec, may be of some help. Moving program code first into such a memory area and then decrypting or executing it there, for example, probably causes at least some trouble with any Real Mode debugger when code is traced through. Perhaps copying a custom interrupt handler routine into video memory would be something to try out?

Using display memory as a data or code storage area doesn't work with any debugger that does not restore the original program screen while executing code, because instructions accessing video memory would only "see" the debugger's screen, not the one they're supposed to. Even Soft-ICE doesn't swap video memory unless told to; swapping can be enabled with 'FLASH ON'. Remember though that if a video RAM region is not in use by the current display mode, those memory locations cannot be modified!

If the internal timer memory location is modified (after either disabling the timer (IRQ 0), or any INTR signal with a CLI), the time spent in the debugger will be enough for the timer to update memory unless the debugger also disables the timer. Soft-ICE (in debugger screen) always masks all interrupts but the keyboard's, without the user being able to re-enable the timer interrupt.
Knowing this it's possible to build a trick that relies on the timer counter being updated.

ABOUT THE EXAMPLE:
------------------
These examples demonstrate taking advantage of display memory and the timer counter dword.

Example #1 uses video RAM to try to detect a debugger. It writes a word to the beginning of text mode display memory and then checks for any changes in the word. If the word read back doesn't equal the original one, some debugger is present.

[* another example will be supplied soon *]

Example #1 (testing with display card memory):

CS:0100  B8 00 B8     mov ax,B800     ; Load B800 into AX and...
CS:0103  8E D8        mov ds,ax       ; ...then as DS segment (text mode)
CS:0105  A3 00 00     mov [0000],ax   ; Move a word to B800:0000
CS:0108  3B 06 00 00  cmp ax,[0000]   ; Compare read value to original and...
CS:010C  74 02        je 0110         ; ...if they match, proceed with code
CS:010E  CD 20        int 20          ; Otherwise terminate and return to DOS
CS:0110  ...          ...             ; (code continues here)

[* another example will be supplied soon *]


4.2.15 - Storing data in the interrupt table area

This is pretty similar to 4.2.14, 'Exploiting rapidly changing memory areas'. The contents of the interrupt table could be copied to a safe place and then the 1KB memory area could be used as data storage space for decryption, for instance. Assuming that no interrupts are called while making use of the table area, everything runs fine. Messing with the table will make the computer hang sooner or later if any interrupts are called, thus the table _has to_ be restored after use.

No examples will be shown this time.


4.3 - Special tricks
--------------------
These attacks are aimed against specific debuggers, and each will usually not work against any debugger other than the one it is meant for, because the tricks are based on implementation bugs or otherwise weak points found in debuggers. Therefore certain debugger programmers may be interested in reading this section...


4.3.1 - Jumping to a location within an instruction

We could confuse the person debugging a bit by jumping to a location in the middle of an opcode while another instruction is hidden there. Note that this only works on full screen debuggers such as Turbo Debugger, which decode instructions on screen in advance (although Watcom Debugger is a full screen debugger, it always re-decodes instructions if a jump to an undecoded address is encountered). Jumping to an address within an already decoded instruction (that is on screen) usually causes the debugger _not_ to re-decode the instruction CS:IP points to, and thus the user won't know what's going on until the debugger screen is reset. On "step"-type debuggers, which only decode the next instruction to be executed, this will not do anything.

It's important to note that this method will _not_ affect program execution in any way! The only purpose of this is to try to confuse the user, not the debugger. BUT (!), if you find a good pair of instructions that fit within each other, using such a pair might make their removal more difficult than if they were used separately. Usually the size of the opcode is not the problem (some 32-bit instructions which have both a memory pointer and an immediate as 32-bit operands can be 10 bytes long or even longer); the trouble arises from how to put the instructions together. Just remember that if the instruction that contains the hidden instructions gets executed, any memory reference as the destination operand could mess up some vital code.

A bit more sophisticated method of using nested instructions would be register or memory value modification. Let's assume we have to XOR a value and we also want to hide another instruction within the XOR instruction. In such a case the other opcode could be disguised as an immediate value of a XOR and then only a jump there would be needed.
This makes the hidden instruction more difficult to change because changing it would also affect the value that will be XORed. And if this is a part of a decryption loop... Well, you know what would happen. ;)

ABOUT THE EXAMPLES:
-------------------
All of these are ways of concealing an instruction within another opcode.

Example #1 is just to demonstrate the principle of how it works. This example will first clear the IF bit in FLAGS and then halt the CPU until an NMI or a reset signal occurs. A signal to the CPU's INTR pin would also bring the processor out of halt, but since such signals are ignored because of the CLI, only those mentioned before will do it.

Example #2 is a bit more complicated since it first executes the visible layer of instructions (MOV and JMP), then jumps back into the middle of the MOV opcode and starts running code from the second instruction layer hidden inside the first one. If used as a trick, all of the instructions should do something essential for further execution of the code. In this example, the instructions are a _bit_ far fetched, just some trivial debugger "traps"...

Example #3 is a small application of this technique and shows a nice way of hiding a jump in a MOV instruction (could be any other as well). Here the register value is retained but you could make your program depend on the value loaded into AX.

Example #1 (the principle):

 CS:0100  EB 01      jmp 0103                   ; Jump to the hidden instructions
 CS:0102  A3 FA F4   mov [F4FA],ax              ; A fake instruction, see below
 -------
*CS:0103  FA         cli                        ; Disable INTR signals
*CS:0104  F4         hlt                        ; Halt CPU

Example #2 (executing instructions in two layers):

 CS:0100  C7 06 CD   mov word ptr [01CD],C2CC   ; A fake instruction, see below
          01 CC C2
 CS:0106  EB FA      jmp 0102                   ; Jump to the hidden instructions
 -------
*CS:0102  CD 01      int 01                     ; Single-step interrupt
*CS:0104  CC         int 3                      ; Software breakpoint interrupt
*CS:0105  C2 EB FA   ret FAEB                   ; Add FAEBh to SP and return from call

Example #3 (hiding a jump smoothly):

 CS:0100  50         push ax                    ; Save AX
 CS:0101  B8 EB 03   mov ax,03EB                ; A fake instruction, see below
 CS:0104  58         pop ax                     ; Restore AX
 CS:0105  EB FB      jmp 0102                   ; Jump to the hidden instruction
 CS:0107  ...        ...                        ; (code continues here)
 -------
*CS:0102  EB 03      jmp 0107                   ; Proceed with code


4.3.2 - Exploiting Turbo Debugger's weak point

As odd as it sounds, Turbo Debugger doesn't retain Interrupt Mask Register status for IRQs 0 (timer) and 1 (keyboard). When code is traced, Turbo Debugger masks out those IRQs at each step, but running will leave them as they are. However, since TD always enables them when returning from code, they will inevitably be enabled... The 'Step Over' function combines both of these, so when a subroutine call is encountered, 'Step Over' will act as a "normal" 'Run' command, otherwise as 'Trace', masking IRQs 0 and 1.

This could be useful if code checks whether or not the Interrupt Mask Register (primary, port 21h) contents have been changed since the last update. Of course, since _running_ code usually leaves timer and keyboard IRQs enabled only once, when started, and then they could be masked out freely during execution, it's better to test if the _tracing_ condition is true (ie. IRQs 0 and 1 have been masked out). But anyway, both of them can be detected if wanted.
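
A minimal sketch of testing only the _tracing_ condition mentioned above might look like this. It is an illustration of my own rather than a tested routine: it assumes the program itself never masks out IRQs 0 and 1 (otherwise it would trip over its own doing), and the addresses are arbitrary.

CS:0100  E4 21      in al,21        ; Read value from port 21h to AL
CS:0102  24 03      and al,03       ; Mask all other bits but bits 0 and 1
CS:0104  3C 03      cmp al,03       ; Are both timer and keyboard masked?
CS:0106  75 02      jne 010A        ; If not, proceed with code
CS:0108  CD 20      int 20          ; Otherwise terminate and return to DOS
CS:010A  ...        ...             ; (code continues here)
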

ABOUT THE EXAMPLE:
------------------
In this example, the value of the primary Interrupt Mask Register is read, bit 0 is cleared and bit 1 set to enable/disable their respective IRQs, and then the value is written back. The final step is to read the port 21h value again (by now both timer and keyboard would be masked out if someone was tracing with Turbo Debugger) and compare it with the value written. As an extra feature this example disables the keyboard.

CS:0100  E4 21      in al,21        ; Read value from port 21h to AL
CS:0102  24 FE      and al,FE       ; Clear bit 0 and...
CS:0104  0C 02      or al,02        ; ...set bit 1
CS:0106  E6 21      out 21,al       ; Write the value back to port 21h
CS:0108  86 C4      xchg al,ah      ; Exchange AL and AH register contents
CS:010A  E4 21      in al,21        ; Read the value of port 21h again
CS:010C  38 C4      cmp ah,al       ; Compare AL value to orig. AH and...
CS:010E  74 02      je 0112         ; ...if they match, proceed with code
CS:0110  CD 20      int 20          ; Otherwise terminate and return to DOS
CS:0112  ...        ...             ; (code continues here)


4.3.3 - Fooling TD386 Virtual-86 Driver

This method is based on the fact that Turbo Debugger's V8086 module (TD386) does not use the INT 00 routine pointed to by the VM86 task's interrupt table whenever a division by zero takes place, but rather its own handler routines. TD386's own routine just aborts execution (returns back to debugger) and reports a faulty division, ignoring the INT 00 handler in VM86! CS:IP will therefore have to be manually set to the proper value.

Normally in Real Mode or Virtual 8086 Mode (usually the Protected Mode program that set up VM86 gives control to VM86 routines when an interrupt handler is called either by an exception, if it isn't a fatal one, or an INT nn instruction) the INT 00 routine will be called after dividing by zero. So, what we could do is to point the INT 00 vector to the next instruction, for example, to recover from a division fault. This is a good way of exploiting TD386's weakness without degrading compatibility with other VM86 environments.
Remember to restore the original INT 00 vector though, or the next INT 00 call will hang the computer. This makes a nice trick on Real Mode debuggers, too, but please note that stack pointer SP is modified when INT 00 is called. The next three POPs will yield the IP and CS of the faulty (I)DIV instruction (on an 8086/8088, of the instruction following it) and the FLAGS image; after that the stack will be as it was before the division.

It is to be noted that some other interrupts will also act like the divide error. These are INTs 02 (NMI) and 3 (software breakpoint), and TD386 will under no circumstances run the actual VM86 routine. The INT 01 VM86 routine will be called if the source of the call was an INT instruction, but exception #1 caused by a set Trap Enable Flag will return you to TD386... There are also some other exceptions, such as #6 (Invalid OP-Code), #12 (Stack Fault) and #13 (General Protection Fault), which will make TD386 pop up if an INT instruction was _not_ their original cause.

ABOUT THE EXAMPLE:
------------------
This example changes the INT 00 vector to point to the instruction following the faulty division, and then divides by zero. Nothing more! What this doesn't show is restoring the original vector.

CS:0100 31 C0       xor ax,ax                ; Zero AX
CS:0102 8E D8       mov ds,ax                ; Load 0000 as DS segment
CS:0104 C7 06 00    mov word ptr [0000],0110 ; Set INT 00 pointer IP to 0110
        00 10 01                             ; and...
CS:010A 8C 0E 02    mov [0002],cs            ; ...CS to current CS (CS:0110)
        00
CS:010E F7 F0       div ax                   ; Divide by zero
CS:0110 ...         ...                      ; (code continues here)

4.3.4 - Using INT 01's to make Soft-ICE gag

When an INT 01 is issued, either as an INT 01 instruction or as the undocumented 0F1h opcode, Soft-ICE will beep a couple of times. Note that if INT 01 was used, Soft-ICE will beep only if 'BREAK' is OFF (if a 'BREAK ON' command is entered, it won't happen). However, the undocumented 0F1h opcode will make the beep occur even when 'BREAK' is ON.
This is because the single-byte opcode is IOPL insensitive _and_ the "bug" that causes Soft-ICE to beep is only in the INT 01 handler, the V86 monitor INT 01 routine is OK (can't imagine why). It would be easy to put an INT 01 in a loop, possibly a self-checking loop repeating hundreds of times. Doing this would keep the computer busy beeping for a long time because each INT 01 will cause about 0.5 sec. delay... Computer speed has no effect on the time each beep takes, and since beeping the beeper will halt any other processing the user would have hard time trying to get back to his dear Soft-ICE if he accidentally launches the loop. :) Not to mention that Real Mode debuggers stop at each of the INT 01's... Note that exception #1 caused by a single-step trap doesn't have this effect when Soft-ICE is loaded. ABOUT THE EXAMPLE: ------------------ This is just a (too) simple example of using INT 01 to beep. This loop must be embedded in some other routine which, for example, checks the integrity of the code to make removal harder. Otherwise it's of no use... CS:0100 B9 FF FF mov cx,FFFF ; Whoa! Going to loop 65535 times! CS:0103 CD 01 int 01 ; Beep-beep (could also be opcode 0F1h) CS:0105 E2 FC loop 0103 ; Looping to CS:0103 until CX=0001 4.3.5 - Using self-tracing to fool Soft-ICE How about using self-tracing code as an anti-debugging feature against Soft-ICE? When tracing in the debugger screen, Soft-ICE clears the Trap Enable Flag after each instruction, including the exception #1 handler IRET. Even when manually set in Soft-ICE, the Trap Enable Flag will not generate an INT 01 trap and neither will the TF bit be retained if Soft-ICE is exited to run code at an active (enabled) execution breakpoint (otherwise TF bit will be set)! If INT 01 routine is designed to modify the code being executed, the code wouldn't run properly under Soft-ICE if debugger screen is entered while self-tracing. 
To trace such code will require multiple commands for one single instruction and thus would be _very_ annoying to trace lengthy code (the commands can be defined as macros but using a decryption routine supporting variable-length instructions makes it harder to use even them since there is no way to make Soft-ICE set a breakpoint after the instruction at CS:IP)... Sounds like fun? ;) (for an example, see 4.4.6, 'The Running Line') There's also another quirk to Soft-ICE: without any special tricks self-tracing code will disable _any_ memory access hardware breakpoint that would otherwise be triggered by the code being self-traced. Note that it's possible to set a breakpoint to be triggered by the INT 01 routine if it's used for self-modification, though. This quirk is an implementation bug in Soft-ICE and may not work in other 386 debuggers. Since the CPU checks for exception #1 traps caused by either a single-step via Trap Enable Flag or data hardware breakpoints at the same time, the condition for both of these is true. The proper procedure would be to invoke the hardware breakpoint handler before the VM86 Single-Step Trap, but how unfortunate for anyone debugging that Soft-ICE chooses to serve the Single-Step Trap first thus suppressing the data breakpoint... Therefore simply pointing INT 01 to an IRET and setting the TF bit in FLAGS would be sufficient to disable any memory access breakpoint in the code being traced! 4.3.6 - Screwing up Soft-ICE with back door commands Since Soft-ICE versions 2.50 and up offer a program back door commands to execute Soft-ICE commands and manipulate breakpoints among other things, they could be used against Soft-ICE itself. Disabling all breakpoints or even unloading Soft-ICE wouldn't be a problem to anti-debugging code. 
(for a more detailed description see section 3.3.8, 'Back door commands in Soft-ICE!') If you decide to execute commands via a back door command, note that trying to lock up keyboard in Soft-ICE won't work because the routine, which returns control to Soft-ICE when either the key combination is pressed or a breakpoint has occurred, enables keyboard from the Programmable Interrupt Controller. To lock keyboard (and everything else) in Virtual 8086 Mode, enter 'OB 21 FF' (only this works), and typing 'OW 21 FFFF' would lock up Soft-ICE screen. But as already mentioned, keyboard cannot be locked out from Soft-ICE. Even if the VM86 task can't use the keyboard, the correct key combination will always pop up the debugger screen. Any other Soft-ICE commands should work, though, especially those switching operation modes such as 'ACTION' (specifies action after breakpoint has been reached, valid parameters are any interrupt number in Virtual 8086 Mode or 'HERE' to return to Soft-ICE) or rebooting computer with 'HBOOT'. Calling a back door won't bring up the debugger screen, but when entering commands, remember that they also appear on the debugger screen when the user returns... Another way of crashing Soft-ICE with the 'Do a Soft-ICE command' is to put a single null character (00h) where the command to be executed should reside. This totally hangs Soft-ICE, probably because the command string gets terminated before even one Carriage Return (0Dh) is detected (which would issue the command preceding it). As a side notice, there is an "undocumented" command in Soft-ICE which may cause the debugger to act erratically and lock up. This powerful command is 'CMx' (replace 'x' with a hex value in the range 0-F). The original purpose of this command was to change the megabyte under which the memory dump window works, and it probably is a remnant from Soft-ICE development phase. However, the command doesn't check for the amount of RAM installed. 
Therefore, exceeding the physical memory limit will make Soft-ICE try to read from a non-existent address, thus causing a General Protection Fault. The fatal thing here is that Soft-ICE _itself_ causes the GPF, therefore recovery is impossible (the exception handler routines are not prepared for this). The "command" can _only_ crash Soft-ICE when the memory dump window ('WD' enables/disables) is enabled and the target machine has less than 16MB of system RAM installed, but when it works, Soft-ICE will be _totally_ screwed up and a cold boot is required to get out of the debugger screen! To encompass most systems, with a maximum of 16MB RAM, issuing command 'CMF' is recommended. By executing this command through a back door command, Soft-ICE could be crashed (see the example below).

ABOUT THE EXAMPLE:
------------------
Since this is just a simplified example, it must be noted that setting those AX, SI and DI register values just before INT 3 looks very suspicious. They should be loaded, if at all possible, way before executing the INT 3 (and well hidden) to make it look as if the INT 3 instruction is just a regular Real Mode debugger trap. Also, if an instruction is put immediately after the INT 3 which triggers the back door, no execution breakpoint set to that address will interrupt code execution.

CS:0100 B8 11 09    mov ax,0911    ; Back door function number into AX
CS:0103 BE 47 46    mov si,4647    ; "Magic value" #1 into SI
CS:0106 BF 4D 4A    mov di,4A4D    ; "Magic value" #2 into DI
CS:0109 0E          push cs        ; Just to make sure we have...
CS:010A 1F          pop ds         ; ...the correct DS
CS:010B BA 11 01    mov dx,0111    ; Command string starts at DS:DX
CS:010E CC          int 3          ; Go for it and screw up Soft-ICE!
CS:010F CD 20       int 20         ; Terminate and return to DOS
CS:0111 43 4D 46 0D db 43 4D 46 0D ; 'CMF' plus a Carriage Return and...
CS:0115 00          db 00          ; ...a null-character string terminator

4.3.7 - Unloading Soft-ICE!
Unloading Soft-ICE isn't much of a challenge; tracing its removal procedure was enough to find out how it is done. Basically, returning to Real Mode and restoring the original Interrupt Descriptor Table (IDT) value is enough to disable Soft-ICE, but to do this we must use an undocumented back door command. Note that even though installing Soft-ICE as a device driver leaves a small stub in conventional memory, it won't stop us from disabling its functionality...

ABOUT THE EXAMPLES:
-------------------
Here are examples of removing the Protected Mode portion hiding in extended memory and disabling (*) the device driver stub in conventional memory. (tested with Soft-ICE version 2.80)

In example #1, the processor is simply returned into Real Mode. But since all segment registers are cleared in the process, we should first do an intersegment jump, and at least the stack segment (SS) _must_ be restored. If we don't, the system will become unstable at the next interrupt (which PUSHes the flags plus CS:IP to somewhere at 0000:xxxx, messing up system data areas!). Therefore interrupts should also be disabled with a CLI until the correct stack segment has been loaded. But I guess you've already done that at the very beginning of your code...

In example #2, a character device 'SOFTICE1' is opened for read and write operations and a byte (02h) is written to the device. This device name is reserved, in addition to 'NU-MEGA', by Soft-ICE but only when loaded as a device driver. Frankly speaking, I don't know what the heck this piece of code is supposed to do. It was just a part of the Soft-ICE unloading sequence, but it causes weird trouble when tracing code with a Real Mode debugger up to the interrupt call writing to this device (why?). If someone knows better about its purpose, please E-mail!
Example #1 (returning to Real Mode): CS:0100 2E 8C 0E mov cs:[011D],cs ; Set correct CS for intersegment jump 1D 01 CS:0105 B4 10 mov ah,10 ; Back door function number into AH CS:0107 BE 47 46 mov si,4647 ; "Magic value" #1 into SI CS:010A BF 4D 4A mov di,4A4D ; "Magic value" #2 into DI CS:010D CC int 3 ; Start running code in Protected Mode CS:010E 0F 20 C0 mov eax,cr0 ; Read Control Register 0 into EAX CS:0111 66 25 FE and eax,7FFFFFFE ; Mask out PG and PE bits and... FF FF 7F CS:0117 0F 22 C0 mov cr0,eax ; ...load into CR0. Back to Real Mode! CS:011A EA 1F 01 jmp hhll:011F ; Load CS and flush decode queue ll hh ; (CS: ll=low byte, hh=high byte) CS:011F 2E 0F 01 lidt cs:[012B] ; Load IDT from CS:012B 1E 2B 01 CS:0125 8C C8 mov ax,cs ; Get CS into AX and... CS:0127 8E D0 mov ss,ax ; ...load it as the stack segment CS:0129 CD 20 int 20 ; Terminate and return to DOS CS:012B FF 03 dw 03FF ; IDT limit (length) CS:012D 00 00 00 dd 00000000 ; IDT base address 00 Example #2 (disabling device driver stub): CS:0100 B8 02 3D mov ax,3D02 ; DOS service Open Handle for R/W CS:0103 0E push cs ; Just to make sure we have... CS:0104 1F pop ds ; ...the correct DS CS:0105 BA 1F 01 mov dx,011F ; Device name starts at DS:DX CS:0108 CD 21 int 21 ; Open file and allocate handle CS:010A 72 11 jb 011D ; Jump if file not found CS:010C 89 C3 mov bx,ax ; Save returned file handle into BX CS:010E B8 03 44 mov ax,4403 ; DOS IOCtl Character, write to device CS:0111 BA 28 01 mov dx,0128 ; Send buffer starts at DS:DX CS:0114 B9 01 00 mov cx,0001 ; Send one byte CS:0117 CD 21 int 21 ; Write to device CS:0119 B4 3E mov ah,3E ; DOS service Close Handle CS:011B CD 21 int 21 ; Close file and deallocate handle CS:011D CD 20 int 20 ; Terminate and return to DOS CS:011F 53 4F 46 54 db 53 4F 46 54 ; 'SOFTICE1'... CS:0123 49 43 45 31 db 49 43 45 31 ; ...and... 
CS:0127 00 db 00 ; ...a null-character string terminator CS:0128 02 db 02 ; Character 02h 4.3.8 - Cause Soft-ICE to abort program (idea by Inbar Raz) When Soft-ICE is loaded as a device driver, a stub will remain in conventional memory reserving device names 'NU-MEGA' and 'SOFTICE1' for its own use. They both can be opened for read and write operations, but if writing to them is attempted using DOS function 40h (Write Handle), used for writing to _files_, DOS will pop up with a critical error. Note though that unlike 'SOFTICE1' which will always be reserved, Soft-ICE only uses the device name 'NU-MEGA' if it was _not_ loaded as an EMS manager with /EMM switch! So, to make a program incompatible with Soft-ICE, it would only need to create a file called 'NU-MEGA' or 'SOFTICE1' for its own purposes. If Soft-ICE was loaded as a device driver the program refuses to run, otherwise it will run fine. Additionally INT 24 routine (DOS Critical-Error-Handler) could be replaced with a custom handler which would be executed after the critical error has occurred. It could be a simple IRET to make the program pop back to DOS, a routine to recover from the error combined with unloading Soft-ICE (see 4.3.7, 'Unloading Soft-ICE!'), or anything you can think of! No examples here, you _should_ be able to do your own implementations yourself. 4.4 - Self-modifying code ------------------------- Self-modification is a useful method in some cases. Since code segment CS address varies every time a program is run, self-modification could be used to set the correct segment address of a jump (EXE-files have relocation items for this matter), but self-modifying code could also be used as an anti-debugging trick. There are different levels of self-modification but they all are based on replacing a whole instruction with another. 4.4.1 - Simple self-modification Changing an instruction before executing it is self-modification in its simplest form. 
It prevents correct disassembly if just a dumb code disassembler is used, but doesn't do much else. ABOUT THE EXAMPLE: ------------------ The example here shows a basic code modification... Beforehand it looks like this piece of code is going to lock up the computer, but the subroutine overwrites the CLI/HLT pair with an 'INT 20'. NOP isn't needed, but see also 4.4.2, 'Foiling 'Step Over'/'Proceed' debugger commands, part 2 of 2'. CS:0100 0E push cs ; Save code segment CS on stack and... CS:0101 1F pop ds ; ...load it as new data segment DS CS:0102 E8 03 00 call 0108 ; Call the subroutine modifying code CS:0105 90 nop ; (not necessarily needed) CS:0106 FA cli ; Disabling interrupts and... CS:0107 F4 hlt ; ...halting the CPU? Nope! CS:0108 C7 06 06 mov word ptr [0106],20CD ; Replace CLI/HLT with INT 20 01 CD 20 CS:010E C3 ret ; Return from subroutine ------- *CS:0106 CD 20 int 20 ; Terminate and return to DOS 4.4.2 - Foiling 'Step Over'/'Proceed' debugger commands, part 2 of 2 This is a quite nice thing to do with self-modifying code. While (P)roceeding, all Real Mode debuggers must insert an INT 3 software breakpoint instruction after certain instructions in order to return to debugger after executing "one" of them. These instructions are INTs, CALLs, LOOPs, instructions with a REP prefix, etc. to name some. This information could easily be used to our advantage by modifying the opcode or just the byte following one of those instructions mentioned above before it gets executed. After pressing the 'Step Over'/'Proceed' key, the debugger sets an INT 3 after the instruction if needed and lets the program run freely until the INT 3 is executed (a debugger could also single-step through the code to prevent this trick, though not very likely). The debugger entirely relies on the INT 3 being there, therefore overwriting the same byte within the call, loop, interrupt handler routine, etc. 
would allow unrestricted code execution without the debugger popping up again afterwards. This is extremely effective in very long loops because the user would have to trace through the whole loop, and _that_ sure isn't a wonderful thing to do.

Although this method works best on Real Mode debuggers, some 386 debuggers may also be affected. Once all four 386 hardware breakpoints have been defined, the debugger will be forced to use INT 3-style software breakpoints, enabling this trick. A good example is Soft-ICE, which is dumb enough _not_ to release the hardware breakpoints for the debugger's internal use _even_ if they're disabled. No breakpoints are necessary while tracing anyway...

For more tricks that can fool a 'Step Over'/'Proceed' command, see the first part of this trick, section 4.2.9, and 4.2.10, 'Faking a procedure call'.

ABOUT THE EXAMPLES:
-------------------
This is a loop which would run 65535 times for the sole purpose of overwriting the byte following the LOOP instruction. A debugger would put the INT 3 (opcode 0CCh) at CS:010C if the LOOP is stepped over, but since the loop itself restores the byte a debugger would overwrite, stepping over the LOOP will actually run all of the code coming after it. If you're going to put this kind of a loop in your code, remember to make it a _long_ one.

CS:0100 0E          push cs                ; Save code segment CS on stack and...
CS:0101 1F          pop ds                 ; ...load it as new data segment DS
CS:0102 B9 FF FF    mov cx,FFFF            ; Going to loop 65535 times
CS:0105 C6 06 0C    mov byte ptr [010C],CD ; Restore byte 0CDh of INT 20
        01 CD
CS:010A E2 F9       loop 0105              ; Looping to CS:0105 until CX=0001
CS:010C CD 20       int 20                 ; Terminate and return to DOS

4.4.3 - Playing with Prefetch Instruction Queue (PIQ)

Invisible to the user, every 80x86 processor has a tiny memory area, as small as 4 bytes on the 8088 (6 bytes on the 8086) and as large as 32 bytes on 486's, which quickens code execution because instructions need to be fetched only seldom from "slow" memory.
The Prefetch Instruction Queue isn't updated when system memory is changed within the range that the PIQ currently holds, which makes this a useful trick against _any_ program tracing code (single-stepping), including 386 debuggers! The trick is that if the program is executed normally and no interruptions take place (external interrupts or a Single-Step Trap, for example), changing the following opcode or its operands won't affect execution in any way because the code already in the PIQ will be executed instead. If code is being traced, on the other hand, the changed code will be executed instead of the correct one.

This trick can also be used against "intelligent" program decompressors such as TRON, CUP, etc. They all single-step through code while trying to find the unpacking/decryption routine and since no human is controlling this, they could easily be confused with a PIQ trick. One way is to replace the following instruction with an 'INT 20' (or any other interrupt call). This would terminate the process because generic program decompressors usually will not allow any interrupts to be executed, and dumber decompressors will execute the INT 20, thus quitting anyway. Another would be to replace an instruction with a 'JMP $' (where '$' is the address of the same jump instruction), which will make the decompressor _never_ stop. One final use of this trick is to lead the decompressor onto a false track with a jump, for example. The jump could be to an infinite loop, another routine designed for debuggers or any location outside of program code, the BIOS reboot vector at FFFF:0000, for example...

To make a PIQ trick work in about 70% of all machines at about 85% of the time (not to be trusted... the only certain thing is that everything is uncertain when trying a PIQ trick ;), there should first be an instruction flushing the PIQ, such as a jump far enough or an interrupt call.
Second, not only the opcode modifying code should be as short as possible but also the target opcode. Finally you should make sure not to modify code too far away from the instruction which does it so that a prefetch wouldn't occur before reaching the modified instruction. Remember also that the modified instruction can only be _ahead_ of the instruction modifying, that the modified code should only be executed once (otherwise it must be restored) and that before the modified instruction there must not be any "abnormal" changes to CS:IP (a LOOP, RET, etc.) which could cause a PIQ flush. A CLI is in order before modification, too, so that no external interrupts mess up the trick. For more information on PIQ, see 3.3.2, 'Prefetch Instruction Queue (PIQ)'. ABOUT THE EXAMPLE: ------------------ This example replaces an instruction with an 'INT 20'. Tracing would therefore lead to program termination. Note that no precautions have been taken to prevent premature prefetch. CS:0100 0E push cs ; Save code segment CS on stack and... CS:0101 1F pop ds ; ...load it as new data segment DS CS:0102 C7 06 08 mov word ptr [0108],20CD ; Replace code with INT 20 01 CD 20 CS:0108 ... ... ; (code continues here) ------- *CS:0108 CD 20 int 20 ; Terminate program if tracing code 4.4.4 - Code encryption To protect your code from prying eyes, code encryption is one way. Usually the whole program is encrypted with only a small decryption routine in the beginning and once the routine has finished decryption the rest of the code can be run. Code encryption not only inhibits code examination before decryption, but can also trick debuggers if the encrypted code immediately follows the decryption routine (ie. execution proceeds with the decrypted code after the decryption loop has ended). 
If you put an INT 3-style breakpoint (which is used by any 8086 debugger) after the decryption loop, where a breakpoint would usually be put so that you wouldn't have to trace through the whole operation, you'd end up overwriting the first byte of the area on which the decryption routine operates... and unfortunately lose the INT 3 opcode in favour of whatever the decryption algorithm makes of it.

To make tracing even harder you should start decryption from the end of the encrypted code instead of the beginning. This way a breakpoint couldn't be set after decrypting just the first byte, and the annoying decryption operation would have to be traced in whole. Triggering hardware execution breakpoints can also be avoided if the decryption routine transforms the LOOP, JMP, or whatever instruction was used for looping, as its final task so that the breakpoint would fall in the middle of the new opcode. This is, however, only possible by using a fixed decryption key which would produce the new opcode from the old one; separate modification code would simply be too easy to bypass. On the other hand, a hardware memory access breakpoint set to be triggered by the last decryptions wouldn't be affected by these tricks.

You'll find examples of code en-/decryptors in section 4.6, 'Simple code encryptors'.

4.4.5 - Hooking a decryption routine to an interrupt

This is useful in code containing more than one encrypted region. Basically you must hook a code decryption routine as an interrupt handler and then use the interrupt for calling the decryption subroutine. It may save you a few bytes if used instead of CALLing the routine, especially if called with a single-byte interrupt opcode (INT 3, INTO for INT 04 but Overflow Flag (OF) must be set, or INT 1 (opcode 0F1h) available in 386+ processors). If the key is hard-coded, the decryption routine only needs to be given the size of the code to decrypt; otherwise the key must be supplied as well.
No examples here, but you may want to check out 4.4.6, 'The Running Line' and section 4.6, 'Simple code encryptors'. 4.4.6 - The Running Line (idea presented by Serge Pachkovsky) Ever thought of decrypting code on-the-fly? Code self-decryption one instruction in advance at a time, or virtually any self-modification (though from here on only decryption will be discussed), can be achieved by single-stepping through code in a similar fashion as Real Mode debuggers do it. This is a very advanced anti-debugging method and quite resistive to various hacking attempts, too. It never exposes long fragments of code to analysis and makes debugging with any Real Mode debugger nearly impossible, it even hinders tracing and suppresses memory access breakpoints on Soft-ICE (see 4.3.5, 'Using self-tracing to fool Soft-ICE')! In addition to those, any execution hardware breakpoint set in the code being decrypted can be disabled at runtime. It is done by setting the Resume Flag (RF, bit 16) of the EFLAGS image on stack and then loading it with a 32-bit IRET (an IRET with the Operand Size Prefix (opcode 66h)), and only in one case this will _not_ work, that's when the 386 debugger has set IOPL to less than 3 in VM86 task and checks for a set RF bit in the image on stack before running the IRET. However, simulating a 32-bit INT call is required for this addition: first increment SP by 6, then do 32-bit PUSHes for the EFLAGS image, CS and finally EIP. It's important to note that although setting 'BREAK ON' in Soft-ICE will cause a General Protection Fault at the 32-bit IRET, it is _only_ because Soft-ICE is not expecting any 32-bit IRETs (IRET is an IOPL-sensitive instruction). If a VM86 control program's V86 monitor is implemented correctly, no problems will arise at IOPL<3. In theory, only setting the Trap Enable Flag in FLAGS and replacing INT 01 handler with a decryption routine would be enough. However, in practice it isn't as easy as it sounds. 
The main problem will be how to determine the number of bytes to be decrypted at a time. There are some solutions but each of them has their disadvantages as well. One is to actually mark the length of the next opcode before the opcode itself. However, it means that code size will increase by one byte for each instruction used. Another method would be to assume that every opcode is, say, 4 bytes in length, extend shorter opcodes with NOPs, for example, and always decrypt 4 bytes at a time. This is the worst way because it not only wastes lots of memory but also requires a complicated routine to check for NOPs and so on... The last solution, and which I consider the best, is to always decrypt a certain number of bytes and after executing one instruction, re-encrypting them. The previous methods do not require re-encryption, but this _does_ to properly decrypt the next opcode (mainly needed to support longer than 8-bit decryption keys). The only disadvantage with this method is that the user has to determine the maximum length of code that will be executed until the next INT 01 call occurs (a NOP instruction in case of INTs (see the notes below) and double-steps with stack segment SS register loads must be taken into account! For more information see sections 3.3.5, '"Double-stepping"', and 4.1.1, 'Causing CPU to execute two instructions at a time'). Even though too much code will be decrypted in advance nearly always, the extra "code" will _not_ reveal _anything_ to anyone debugging if the decryption key is modified by a fixed value after processing each opcode. Please note though that exception #1 will not be generated until executing one instruction after setting TF bit in FLAGS. This includes any INT instructions and exceptions generated within the encrypted, self-traced code portion! 
Therefore you should _always_ keep external interrupts disabled (CLI instruction) while tracing (VERY IMPORTANT!), and add a NOP after any INT instruction (INT nn+NOP must be treated as a single opcode, meaning that both of them must be decrypted at the same time, with the same key). Having done with the self-tracing portion, remember to restore the INT 01 vector! ABOUT THE EXAMPLE: ------------------ This is an over-simplified example of how self-tracing works. It merely assumes that every opcode is only two bytes long (the worst solution used). Decryption key is not modified, neither does the example re-encrypt code after execution or check for "stuff bytes". INT 01 vector isn't restored either. CS:0100 31 C0 xor ax,ax ; Zero AX CS:0102 8E D8 mov ds,ax ; Load 0000 as DS segment CS:0104 C7 06 04 mov word ptr [0004],011C ; Set INT 01 pointer IP to 011C 00 1C 01 ; and... CS:010A 8C 0E 06 mov [0006],cs ; ...CS to current CS (CS:011C) 00 CS:010E 9C pushf ; Push FLAGS onto stack CS:010F 5B pop bx ; Pop FLAGS image to BX CS:0110 80 CF 01 or bh,01 ; Set bit 8 (TF) of FLAGS image CS:0113 53 push bx ; Push FLAGS image onto stack CS:0114 9D popf ; Pop FLAGS from stack setting TF bit CS:0115 90 nop ; A plain NOP, not yet encrypted CS:0116 C3 C2 db C3 C2 ; Encrypted instructions,... CS:0118 D2 33 db D2 33 ; ...wonder what they... CS:011A F9 32 db F9 32 ; ...really are? See below! CS:011C 55 push bp ; Save BP CS:011D 89 E5 mov bp,sp ; Copy value from SP to BP CS:011F 8B 6E 02 mov bp,[bp+02] ; Load IP of next instruction into BP CS:0122 2E 81 76 xor word ptr cs:[bp+00],1234 ; Decrypt next instruction 00 34 12 CS:0128 5D pop bp ; Restore BP CS:0129 CF iret ; Return from decryptor routine ------- *CS:0116 F7 D0 not ax ; Invert AX (0000 -> FFFF) and... 
*CS:0118 E6 21      out 21,al     ; ...mask IRQs 0-7 (including keyboard)
*CS:011A CD 20      int 20        ; Terminate and return to DOS

4.5 - Checksum generators
-------------------------

Using self-checking code is a good idea when trying to prevent code modifications made by a user debugging. Even though a code integrity checking routine itself is vulnerable to attacks, if implemented properly a checksum generator could be of immeasurable help against debuggers. One could, for example, use self-checking in conjunction with a decryption routine and use the output (checksum) of the self-checking routine as the input (decryption key) of the decryption routine. If someone changes bytes (either a user patching code with NOPs or a debugger inserting INT 3's), the rest of the code will be improperly decoded, making further execution impossible.

There is an unlimited number of ways in which a checksum generator could be implemented. Here you'll find some of the most common (and the simplest yet pretty effective) error detection algorithms, CRC-16 and CRC-32 for example.

4.5.1 - Sum of bytes

The term "checksum" originally probably referred to early summing formulas, but later has widened to encompass other error detection algorithms as well. The summing method is one of the simplest algorithms nowadays. To detect a change in data, all the bytes/words/etc. could be summed together when we're absolutely sure of the consistency of the data, and the checksum then re-calculated later to verify the data. Simple, huh? Well, almost too simple! If the sum is calculated in an 8-bit register (sum MOD 256), there is a 1/256 chance that an error goes undetected (this is "a blind spot"). If a 16-bit register was used there would only be a 1/65536 chance, but this wouldn't help if the data was short, only a few bytes.
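As a rough illustration of the summing method and its blind spot, here is a small sketch in Python. The function name and the sample byte strings are made up for this illustration; nothing here is taken from a real program.

```python
def byte_sum(data: bytes, bits: int = 8) -> int:
    """Sum all bytes, keeping only the low `bits` bits (sum MOD 2**bits)."""
    return sum(data) % (1 << bits)


# Arbitrary sample data standing in for a piece of code.
original = bytes([0xB8, 0x11, 0x09, 0xCC, 0xCD, 0x20])

# Two bytes changed so that the changes cancel out (+1 and -1):
# the checksum stays the same although the data differs.
patched = bytes([0xB9, 0x10, 0x09, 0xCC, 0xCD, 0x20])

# One byte changed (0CCh replaced by a NOP, 90h): a change to a single
# byte always alters the sum, so this one is detected.
single = bytes([0xB8, 0x11, 0x09, 0x90, 0xCD, 0x20])

print(hex(byte_sum(original)), hex(byte_sum(patched)), hex(byte_sum(single)))
```

Widening the register (bits=16) does not help against the cancelling pair either, since the plain sum of the bytes is identical in both cases.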
There is also the possibility that when one byte changes, another error would compensate for the first one, thus resulting in undetectable errors in data while the checksum remains the same.

ABOUT THE EXAMPLE:
------------------
The example here sums all the bytes of the code to be checksummed into AL, after clearing the register first. Note that this example doesn't include the routine to compare a calculated checksum to the original.

CS:0100 0E          push cs       ; Just to make sure we have...
CS:0101 1F          pop ds        ; ...the correct DS
CS:0102 BB ll hh    mov bx,hhll   ; BX is current (and start) data offset
CS:0105 B9 ll hh    mov cx,hhll   ; CX defines length of checksummed code
CS:0108 30 C0       xor al,al     ; Zero AL
CS:010A 02 07       add al,[bx]   ; Add byte at DS:BX to AL
CS:010C 43          inc bx        ; Increment byte offset value by one
CS:010D 49          dec cx        ; Decrement count value by one
CS:010E 75 FA       jnz 010A      ; Jump to CS:010A if CX!=0000
CS:0110 ...         ...           ; (code continues here)

4.5.2 - Number of bits

Another method would be to count the number of set (1) or clear (0) bits in data. However good this sounds, it has the same weaknesses as summing bytes. Actually, this is even more susceptible to undetected errors, even a great number of them, because the errors always occur on bit level. Isn't it very much more likely that one bit gets cleared while another gets set, than that an 8-bit value changes while another transforms to compensate for the inconsistency in checksum?

ABOUT THE EXAMPLE:
------------------
This is a piece of code that counts all the bits set (1) in a given length of memory. It uses BL as a bit pointer, BH to store and rotate a value with only one bit set, DL to hold a byte for testing and the bit count is stored in AX. Note that the listing never clears AX, so AX must be zero on entry. Note also that this example _only_ checks how many bits there are set; the logic required to make use of this as an integrity checker is missing.

CS:0100 0E          push cs       ; Just to make sure we have...
CS:0101 1F       pop ds          ; ...the correct DS
CS:0102 BE ll hh mov si,hhll     ; SI is current (and start) data offset
CS:0105 B9 ll hh mov cx,hhll     ; CX defines length of checksummed code
CS:0108 31 C0    xor ax,ax       ; Zero AX (the bit count)
CS:010A BB 08 80 mov bx,8008     ; BL points to current bit in test, BH
                                 ; contains a value with bit 7 set
CS:010D 8A 14    mov dl,[si]     ; Load byte at DS:SI into DL
CS:010F 84 D7    test dl,bh      ; Test BLth bit and...
CS:0111 74 01    jz 0114         ; ...if clear (0), skip over...
CS:0113 40       inc ax          ; ...increment of bit count by one
CS:0114 D0 CF    ror bh,1        ; Rotate bits in BH right by one bit
CS:0116 FE CB    dec bl          ; Decrement bit pointer by one and...
CS:0118 74 02    jz 011C         ; ...if result is BL=00, skip over...
CS:011A EB F3    jmp 010F        ; ...jump to test next bit
CS:011C 46       inc si          ; Increment byte offset value by one
CS:011D E2 EB    loop 010A       ; Loop to CS:010A until CX=0000
CS:011F ...      ...             ; (code continues here)


4.5.3 - Multiplication and division

While summing and bit counting methods are not very good, multiplication
and especially division ones are. One could multiply all the bytes in the
data with each other, for example, and take the checksum from the result.
Since the result increases very quickly, at least a 16-bit register should
be used, which also reduces the possibility of "a blind spot".

An even more reliable method than multiplication is division. The data a
checksum is calculated from can be treated as an enormously long stream of
bits, as one huge number, and then divided by a fixed number, which will
also be used later to verify that the checksum of the data and the
re-calculated checksum match. The quotient will be discarded as being
totally useless since it would become nearly as large as the bitstream
(data) itself, but the remainder of the division is useful as a checksum
because it will not be any wider than the divisor in bits.

Since the bitstream can be as long as one megabit, a single DIV instruction
cannot be used on its own.
Rather, we must use the process of dividing taught at school, only with the
numbers represented in 1's and 0's. Remember these figures:

                        01111   = quotient  = 0Fh (will be discarded)
                   +---------
   divisor = 1011  | 10100111   = dividend  = A7h (the bitstream)
     Bh =           -0000
   (fixed value)    -----
                     10100
                    -01011
                    ------
                      10011
                     -01011
                     ------
                       10001
                      -01011
                      ------
                         1101
                        -1011
                        -----
                         0010   = remainder = 2h (the checksum!)

This division method is also the basis of the many CRC algorithms.

ABOUT THE EXAMPLE:
------------------
This example uses a 16-bit value in BX as the divisor (just don't use
BX=0000...). Two bytes are read from memory at a time and placed in AX, the
high and low bytes are swapped because of the reverse byte ordering 80x86
processors use, and the result is then divided by BX. The DIV instruction
is handy for the dividing scheme because it returns the remainder of the
division in DX, which actually is, if you look at the division process
above, one of the sub-remainders! And since DX will be the higher two bytes
of the dividend, we just have to get some more of the bitstream we want to
divide, in AX.

CS:0100 0E       push cs         ; Just to make sure we have...
CS:0101 1F       pop ds          ; ...the correct DS
CS:0102 BE ll hh mov si,hhll     ; SI is current (and start) data offset
CS:0105 B9 ll hh mov cx,hhll     ; CX defines length of checksummed code
                                 ; (in words)
CS:0108 BB ll hh mov bx,hhll     ; BX is 16-bit divisor
CS:010B 31 D2    xor dx,dx       ; Zero DX
CS:010D 8B 04    mov ax,[si]     ; Load word at DS:SI into AX
CS:010F 86 C4    xchg al,ah      ; Swap AL and AH for correct byte order
CS:0111 F7 F3    div bx          ; Divide DX:AX by BX
CS:0113 46       inc si          ; Increment byte offset value...
CS:0114 46       inc si          ; ...by two (a word)
CS:0115 E2 F6    loop 010D       ; Loop to CS:010D until CX=0000
CS:0117 ...      ...
                                 ; (code continues here)

In case you need to convert this for either an 8- or a 32-bit divisor, here
is some info (the dividend is always twice as wide as the divisor):

Divisor   Dividend   Quotient   Remainder
-------   --------   --------   ---------
8-bit     AX         AL         AH
16-bit    DX:AX      AX         DX
32-bit    EDX:EAX    EAX        EDX


4.5.4 - Calculating CRC-16 and CRC-32

While all Cyclic Redundancy Code (CRC) algorithms are based on the same
method of dividing a data bitstream as presented before in 4.5.3,
'Multiplication and division', the main difference between the algorithms
is the value by which the bitstream will be divided.

A thing called polynomial arithmetic is closely related to CRC algorithms.
It is binary arithmetic with no carries, ie. bit positions have no effect
on each other, which basically means that addition and subtraction both act
like a XOR operation.

I'll describe the basic steps of any CRC algorithm here, the
algorithm-specific parameters for CRC-16/32 will be revealed later.

Let's suppose we have a W bits wide divisor, or a polynomial as it should
be called. It's important to note that the width is the position of the
highest set (1) bit. Therefore the width of 11011 is 4, not 5!

Having chosen a good polynomial (some are better than others depending on
the placement of set (1) bits) to start with, W zero bits must be added at
the end of the data bitstream, which in case of a polynomial 11011 (W=4)
would be 0000. For example, bitstream 10100111 would become 1010-01110000
(dashes are to separate groups of eight bits to improve readability).

Many CRC algorithms reflect (swap bits around the center, or simply reverse
bit order, bit 0 becoming first processed instead of bit 7) each _byte_
before processing them.
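
To make the preparation steps above a bit more concrete, here is a minimal
sketch in Python (not one of the original examples). The helper names
poly_width, augment and reflect are made up for illustration only. It shows
how the width W is read off a polynomial, how W zero bits are appended to a
bitstream held as an integer, and what reflecting a value means.

```python
def poly_width(poly):
    """Width W = position of the highest set bit (11011b -> 4, not 5)."""
    return poly.bit_length() - 1


def augment(bits, poly):
    """Append W zero bits at the end of a bitstream held as an integer."""
    return bits << poly_width(poly)


def reflect(value, nbits=8):
    """Reverse the bit order of an nbits-wide value (bit 0 becomes the top bit)."""
    result = 0
    for _ in range(nbits):
        result = (result << 1) | (value & 1)  # move the lowest bit to the other end
        value >>= 1
    return result


if __name__ == "__main__":
    print(poly_width(0b11011))                      # 4
    print(format(augment(0xA7, 0b11011), "012b"))   # 101001110000
    print(format(reflect(0xA7), "08b"))             # 11100101
```

Reflecting works on any width: with nbits=16 the same (hypothetical) helper
turns 8005h into A001h, and with nbits=32 it turns 04C11DB7h into EDB88320h.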
Next we just divide the bitstream by the polynomial using the same
principle as before, only remembering that a number is equal to or greater
than another if its highest set (1) bit position is equal to or higher than
the other's, and that the subtraction phase is actually XOR'ing:

                          11011110   = quotient  = DEh (will be discarded)
                    +-------------
   divisor = 11011  | 101001110000   = dividend  = A70h (the bitstream + 0000)
     1Bh =           -11011
   (fixed value)     ------
                       11111
                      -11011
                      ------
                        01001
                       -00000
                       ------
                         10011
                        -11011
                        ------
                          10000
                         -11011
                         ------
                           10110
                          -11011
                          ------
                            11010
                           -11011
                           ------
                             00010
                            -00000
                            ------
                             00010   = remainder = 02h (the checksum!)

After reaching the final value of _the register_ (this refers to the
location, whether a CPU register or a memory area, where all operations on
the bitstream take place) the bits of the whole value will usually be
reversed. The last step would be to XOR the final remainder with a value.
Most algorithms don't do this, though.

This will conclude the "theory" portion here. For more theory and detailed
information on CRC algorithms, read Ross Williams' 'A Painless Guide to CRC
Error Detection Algorithms'.

CRC-16 and CRC-32 algorithms have been named after the width of the
polynomial. CRC-16 uses the polynomial 1-10000000-00000101 (8005h), the
initial register value is 0000h and the final register value will be XOR'ed
with 0000h, ie. no XOR is needed. CRC-32 uses
1-00000100-11000001-00011101-10110111 (04C11DB7h) as the polynomial, the
register is initialized to FFFFFFFFh and its final value will be XOR'ed
with FFFFFFFFh also. Both of these algorithms require each byte to be
reflected before processing and also reflecting the final register value.

If you want to check that a routine works OK, checksum the ASCII test
string "123456789" (31h 32h 33h... in hex). CRC-16 for it is BB3Dh and
CRC-32 is CBF43926h.

You may have heard of an algorithm called "CRC-16/CCITT" but don't confuse
it with "CRC-16", which is the one discussed here.
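
For those who like to verify things on a bigger machine first, here is a
minimal sketch in Python (not one of the original examples, and the function
names are made up for illustration). mod2_remainder repeats the XOR long
division on a bitstream with W zero bits appended. crc_reflected is an
assumption on my part: it uses the common shortcut of XOR'ing each byte
straight into the low end of the register and shifting right with the
bit-reversed polynomial (A001h for 8005h, EDB88320h for 04C11DB7h) instead
of appending zero bits, which should give the same results for the
parameters listed above.

```python
def mod2_remainder(bits, nbits, poly):
    """Remainder of the XOR long division of an nbits-wide bitstream,
    with W zero bits appended, by the polynomial poly."""
    width = poly.bit_length() - 1          # W = position of highest set bit
    stream = bits << width                 # append W zero bits
    reg = 0
    for i in range(nbits + width - 1, -1, -1):
        reg = (reg << 1) | ((stream >> i) & 1)   # bring down the next bit
        if reg >> width:                   # top bit set: "subtract" (XOR) divisor
            reg ^= poly
    return reg


def crc_reflected(data, poly_reflected, init, xor_out):
    """Bit-by-bit CRC, everything reflected (bit 0 of each byte goes first)."""
    reg = init
    for byte in data:
        reg ^= byte                        # feed the byte into the low end
        for _ in range(8):
            if reg & 1:
                reg = (reg >> 1) ^ poly_reflected
            else:
                reg >>= 1
    return reg ^ xor_out


def crc16(data):
    return crc_reflected(data, 0xA001, 0x0000, 0x0000)


def crc32(data):
    return crc_reflected(data, 0xEDB88320, 0xFFFFFFFF, 0xFFFFFFFF)


if __name__ == "__main__":
    print(format(mod2_remainder(0xA7, 8, 0b11011), "04b"))  # 0010
    print(format(crc16(b"123456789"), "04X"))               # BB3D
    print(format(crc32(b"123456789"), "08X"))               # CBF43926
```

The first line of output is the remainder from the figure, the other two are
the check values for the test string. A table-driven version would be
faster, but this form keeps the shift-and-XOR principle visible.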
There are two ways of checking that data is consistent, according to its
CRC. You can either re-calculate the CRC for the data and compare it to the
original one, or you could add the CRC checksum at the end of the data it
was calculated from (low byte first, high byte last) and check if
calculating a CRC from the whole lot (no zeroes appended!) gives a result
of zero. Note that this holds as such only when the final XOR value is
zero, as in CRC-16. The last method is a bit nicer and cleaner...

ABOUT THE EXAMPLES:
-------------------
These examples are not as efficient as table-driven implementations of CRC
algorithms but they do show the CRC calculation principle. They are very
clean code fragments and thus quite fast, and if you're lacking the space
for the pre-calculated tables I recommend using one of these. They're also
quite flexible: it isn't too hard to transform them to support other
polynomial or register widths.

Caveat though! These examples will unconditionally overwrite W bits with
zeroes when appending after the data to be checksummed. This could also
lead to a system hang if the segment limit is reached.

Example #1 is a basic CRC-16 implementation, and all needed data is stored
and manipulated in CPU registers. Everything has already been reflected,
including the polynomial (CRC-16 polynomial's reflection is A001h, CRC-32's
EDB88320h), so that no memory-consuming reflection routine is needed.
CRC-16 polynomial's reflection is stored in AX, each new byte is loaded to
BL, BP works as a bit pointer and the dividend, and finally the CRC, will
be stored in DX.

Example #2, a CRC-32 checksummer, stores "the register" in memory.
Operation is almost identical to example #1 but the CRC-32 polynomial is
stored in EAX and the dividend, finally the CRC reversed, will be stored in
memory. The last stage of calculating CRC-32 is to reverse the register and
NOT it (XOR with FFFFFFFFh).

Example #1 (calculating CRC-16 in CPU registers, reflected):

CS:0100 0E       push cs         ; Just to make sure we have...
CS:0101 1F       pop ds          ; ...the correct DS
CS:0102 BE ll hh mov si,hhll     ; SI is current (and start) data offset
CS:0105 B9 09 00 mov cx,0009     ; CX defines length of checksummed code
CS:0108 31 D2    xor dx,dx       ; Set initial register value (zero DX)
CS:010A 89 CB    mov bx,cx       ; Copy value from CX to BX
CS:010C 89 10    mov [bx+si],dx  ; Append 16 zero bits to DS:SI+BX
CS:010E 41       inc cx          ; Increment length for the zeroes...
CS:010F 41       inc cx          ; ...by two (a word)
CS:0110 B8 01 A0 mov ax,A001     ; AX is reversed CRC-16 polynomial
                                 ; (divisor)
CS:0113 BD 08 00 mov bp,0008     ; BP is current bit pointer
CS:0116 8A 1C    mov bl,[si]     ; Load byte at DS:SI into BL
CS:0118 D0 DB    rcr bl,1        ; Rotate bits in BL right by one bit,
                                 ; in/out through Carry Flag
                                 ;  RCL: Bytes normal (bit 7 first)
                                 ;  RCR: Bytes reflected (bit 0 first)
CS:011A D1 DA    rcr dx,1        ; Rotate bits in DX right by one bit,
                                 ; in/out through Carry Flag and...
CS:011C 73 02    jnb 0120        ; ...if result is CF=0, skip over...
CS:011E 31 C2    xor dx,ax       ; ..."subtraction" of AX from DX
CS:0120 4D       dec bp          ; Decrement bit pointer by one and...
CS:0121 74 02    jz 0125         ; ...if result is BP=0000, skip over...
CS:0123 EB F3    jmp 0118        ; ...jump to test next bit
CS:0125 46       inc si          ; Increment byte offset value by one
CS:0126 E2 EB    loop 0113       ; Loop to CS:0113 until CX=0000
CS:0128 ...      ...             ; (code continues here)

Example #2 (calculating CRC-32 with register stored in memory):

[* not enough info on CRC-32 specification -> problems with
   implementation *]


4.6 - Simple code encryptors
----------------------------

Equipping your program with lots of debugger traps may be useless if the
program code itself isn't protected in any way. You should consider
encryption as a part of your program. Of course, the decryption routine
should also be heavily booby-trapped to benefit from code encryption.

The world is full of different code encryptor implementations, you just
pick one!
The methods of code encryption described here are just some of the simplest
ones, XOR-encryption being the most common. But of course, if anybody sends
me one of his own, I'll be happy to put it here.

Note though, that the examples here will not only show how to use the
different types of encryption but also some of the many ways of handling
memory, so watch them closely. Depending on what you need an encryptor for,
you'll have lots of choices since some of these are very fast, others very
small in size.


4.6.1 - XOR en-/decryption

XOR encryption is the most popular form of code scrambling, at least
amongst the virus writer community, because it offers variable encryption
with only a few instructions, yet it is not very effective. It is very easy
to use, though: because the same XOR instruction can be used for both
encryption and decryption, only one code fragment is needed for both
operations.

Since XORing inverts the bits of the destination value that were set in the
value XORing was done with, the strongest encryption will be gained using
encryption keys with most bits set.

ABOUT THE EXAMPLE:
------------------
This just XORs the code, byte by byte from the beginning. The type of
memory handling is direct manipulation. It allows very short decryption
routines if used with a LOOP instruction as shown here. Hardly more than 10
bytes and very fast, too!

CS:0100 BB 0D 01 mov bx,010D             ; BX is current (and start) data offset
CS:0103 B9 ll hh mov cx,hhll             ; CX defines length of encrypted code
CS:0106 2E 80 37 xor byte ptr cs:[bx],12 ; XOR byte at CS:BX with 12h
        12                               ; (decrypt)
CS:010A 43       inc bx                  ; Increment byte offset value by one
CS:010B E2 F9    loop 0106               ; Loop to CS:0106 until CX=0000
CS:010D ...      ...                     ; (encrypted code starts here)


4.6.2 - NOT en-/decryption

NOT is also a considerable encryption scheme. Just like XOR, the same
instruction can be used for both encryption and decryption.
Actually, NOT is the same as XORing with a key all the bits of which are
set, thus inverting all of the bits in the destination value. Therefore,
nothing is gained from using both of these Boolean operations to encrypt
code.

ABOUT THE EXAMPLE:
------------------
This performs a NOT operation on the encrypted code, word by word. Here a
word is read from memory into the AX register with a MOV instruction before
inverting its bits, and then written back with MOV. Instead of LOOP, a
combination of 'DEC CX' and JNZ instructions is used (a bit quicker than a
single LOOP!).

CS:0100 0E       push cs         ; Just to make sure we have...
CS:0101 1F       pop ds          ; ...the correct DS
CS:0102 BB 13 01 mov bx,0113     ; BX is current (and start) data offset
CS:0105 B9 ll hh mov cx,hhll     ; CX defines length of encrypted code
                                 ; (in words)
CS:0108 8B 07    mov ax,[bx]     ; Load word at DS:BX into AX,...
CS:010A F7 D0    not ax          ; ...invert AX (decrypt) and...
CS:010C 89 07    mov [bx],ax     ; ...store AX back to DS:BX
CS:010E 43       inc bx          ; Increment byte offset value...
CS:010F 43       inc bx          ; ...by two (a word)
CS:0110 49       dec cx          ; Decrement count value by one
CS:0111 75 F5    jnz 0108        ; Jump to CS:0108 if CX!=0000
CS:0113 ...      ...             ; (encrypted code starts here)


4.6.3 - Bitwise rotation

There are some instructions available which rotate the bits in a value in
the specified direction, either through the Carry Flag or not. This offers
a pretty good type of encryption. It is recommended for use in conjunction
with any other encryption method, especially the Boolean ones, to increase
their otherwise low security.

ROL and ROR instructions rotate bits left or right and the last bit rotated
out is pushed in from the opposite side, and also saved in Carry Flag. RCL
and RCR instructions are similar, but they rotate through Carry Flag (ie.
Carry Flag is an additional bit in the rotation). All of these can scramble
data, but to use the same instruction for decryption (possible with ROL and
ROR only, since with RCL and RCR the Carry Flag takes part in the rotation)
the number of bits rotated must be half the total number of bits in the
value (eg.
4 bits for a byte). However, rotating 8 bits in a word, for example, will
only result in exchanging the low and high bytes, thus not really
encrypting them at all.

ABOUT THE EXAMPLE:
------------------
This example rotates a byte by 4 bits. Note that copying CS to the DS
segment register takes one byte more than using a simple CS: segment
override prefix, and is slower also.

CS:0100 0E       push cs              ; Just to make sure we have...
CS:0101 1F       pop ds               ; ...the correct DS
CS:0102 BB 0E 01 mov bx,010E          ; BX is current (and start) data offset
CS:0105 B9 ll hh mov cx,hhll          ; CX defines length of encrypted code
CS:0108 C0 07 04 rol byte ptr [bx],04 ; Rotate byte at DS:BX left by 4
                                      ; bits (decrypt)
CS:010B 43       inc bx               ; Increment byte offset value by one
CS:010C E2 FA    loop 0108            ; Loop to CS:0108 until CX=0000
CS:010E ...      ...                  ; (encrypted code starts here)


4.6.4 - NEG en-/decryption

The NEG (Negate) instruction is intended for negating signed values but it
can be used for encryption as well. Negating a value is done by subtracting
the value from zero. Therefore the value itself works as the encryption
key. Two successive negations on the same value produce the original, so
the same piece of code can be used for both encryption and decryption, here
also.

For an example, see 4.6.2, 'NOT en-/decryption'. Just replace the 'NOT AX'
with a 'NEG AX'.


4.6.5 - Basic arithmetic operations as en-/decryption algorithms

Addition and subtraction (ADD and SUB) are one way of scrambling data or
code into unrecognizable form. Simply by adding a value to or subtracting
it from a byte, word, etc. code can be effectively encrypted and easily
decrypted, too. Since ADD and SUB are reverse operations, naturally they
can also be treated as encryption and decryption. Carrying or borrowing
won't cause any trouble because values wrap at their low and high limits.

No examples given. The others here should be sufficient.


4.6.6 - En-/decryption using translation tables

This is quite an inconvenient encryption scheme.
There is an instruction called XLAT for hardware support of translating a
character into another. It may be used for encrypting data, too, but the
required 256-byte array for the translation table makes the routine quite
large.

XLAT requires BX to point to the DS segment offset of the start of the
translation table and uses the value in the AL register as an index to the
table. Finally, AL is loaded with the byte fetched from the table.

The same piece of code can be used for both encryption and decryption,
provided that the bytes are arranged as pairs, pointing to each other's
location in the table. For example, if in location 09h of the table there
is 02h, then location 02h should contain 09h. Otherwise two separate
translation tables are needed, consuming another 256 bytes.

ABOUT THE EXAMPLE:
------------------
This example decrypts code, byte by byte, using a translation table. It
uses string instructions LODSB and STOSB to load a byte from memory
temporarily into AL for translation and then to write it back (a MOV from
and to memory would be faster but takes a few bytes more space). Note that
the translation table is _not_ shown!

CS:0100 0E       push cs         ; Save code segment CS on stack and...
CS:0101 1F       pop ds          ; ...load it as new data segment DS
CS:0102 BE 16 01 mov si,0116     ; DS:SI will be the source
CS:0105 0E       push cs         ; Save code segment CS on stack and...
CS:0106 07       pop es          ; ...load it as new extra segment ES
CS:0107 89 F7    mov di,si       ; ES:DI will be the destination
CS:0109 BB ll hh mov bx,hhll     ; Translation table starts at DS:BX
CS:010C B9 ll hh mov cx,hhll     ; CX defines length of encrypted code
CS:010F FC       cld             ; Direction is from start towards end
CS:0110 AC       lodsb           ; Read byte at DS:SI into AL register
CS:0111 D7       xlat            ; Translate byte in AL (decrypt)
CS:0112 AA       stosb           ; Write byte in AL register to ES:DI
CS:0113 49       dec cx          ; Decrement count value by one
CS:0114 75 FA    jnz 0110        ; Jump to CS:0110 if CX!=0000
CS:0116 ...      ...
                                 ; (encrypted code starts here)


4.6.7 - Scrambling original byte order

This is not actually an encryption method at all. It just produces
unintelligible code, and simply by putting the bytes in the correct
sequence the real code can be seen.

ABOUT THE EXAMPLE:
------------------
Here a word is read from memory to AX, the high and low bytes are swapped
and then the word is written back.

CS:0100 0E       push cs         ; Save code segment CS on stack and...
CS:0101 1F       pop ds          ; ...load it as new data segment DS
CS:0102 BE 13 01 mov si,0113     ; DS:SI will be the source
CS:0105 0E       push cs         ; Save code segment CS on stack and...
CS:0106 07       pop es          ; ...load it as new extra segment ES
CS:0107 89 F7    mov di,si       ; ES:DI will be the destination
CS:0109 B9 ll hh mov cx,hhll     ; CX defines length of encrypted code
                                 ; (in words)
CS:010C FC       cld             ; Direction is from start towards end
CS:010D AD       lodsw           ; Read word at DS:SI into AX register
CS:010E 86 C4    xchg al,ah      ; Exchange AL and AH contents (decrypt)
CS:0110 AB       stosw           ; Write word in AX register to ES:DI
CS:0111 E2 FA    loop 010D       ; Loop to CS:010D until CX=0000
CS:0113 ...      ...             ; (encrypted code starts here)


4.7 - Polymorphic code encryptors
---------------------------------

[* will be covered some time if any need to *]


- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

SECTION 5: APPENDICES
=====================

APPENDIX A: Explanations/Glossary
---------------------------------

-Breakpoint:
  Breakpoints are used by debuggers to stop code execution at a certain
  location and return control to the debugger.

-Debugger:
  Debuggers are utilities which provide tools to debug programs. 8086
  software debuggers (aka. 8086/software/Real Mode debuggers) rely on using
  a 8086 processor's debugging capabilities (ie. only single-stepping and
  using INT 3 instructions as breakpoints), hence the name. 386 hardware
  debuggers (aka. 386/hardware/Protected Mode/VM86 debuggers) take
  advantage of the hardware of a 386 or better processor for debugging
  purposes.
  (for more information on these two types see section 2.2, 'How a debugger
  works')

-Debugging:
  Debugging is, as the word itself tells us, removing bugs from code or
  simply examining code. It involves examining code execution and thus
  trying to find a known error in the source.

-Exception:
  Exceptions are interrupts internally generated by the CPU, often due to
  an error condition. When the CPU detects an invalid opcode fault, for
  example, it will generate exception #6 (186+ processors only). A note
  though: unlike hardware interrupts, exceptions are synchronous to code
  execution and thus can always be reproduced under the same conditions.

-Interrupt:
  Interrupts are used to literally interrupt code execution. Software
  interrupts, ie. caused by an exception or an INT instruction (which is
  handled as an exception, too), have an interrupt table (in Real Mode) or
  an Interrupt Descriptor Table (IDT) entry (in Protected Mode) which
  contains the address of the interrupt service routine. Hardware
  (external) interrupts are caused by an external signal to the processor's
  INTR (Interrupt Request) or NMI (Non-Maskable Interrupt Request) pins.
  When a signal comes to the INTR pin, the signaller will supply the number
  of the interrupt service routine to be called. An NMI signal is always
  hooked to INT 02.

-Interrupt Descriptor Table (IDT):
  See 'Interrupt table' below!

-Interrupt table:
  The interrupt table contains address pointers (vectors) to the interrupt
  handler routines called by INT instructions, exceptions and external
  hardware interrupts. In Real Mode the interrupt table is always located
  in the very beginning of memory, the area's size is 0400h bytes (1KB) and
  each INT entry takes four bytes, two for the IP word and another two for
  the CS word (an example entry: 78563412 points to 1234:5678). In
  Protected Mode, Interrupt Descriptor Table (IDT) is used instead.
  An IDT can be located anywhere in memory, its size can vary (but is
  usually 0800h bytes (2KB) to contain all 256 possible interrupt vectors)
  and both of these can be modified with an LIDT (Load IDT) instruction.
  Each entry in IDT consumes 8 bytes. (need some info on how an IDT entry
  is formed!)

  In a VM86 task, don't mistake the virtualized interrupt table for the
  Protected Mode IDT. Since Virtual 8086 Mode is just a sub-system of
  Protected Mode, all interrupts attempt to call the handler routine
  pointed to by the appropriate IDT entry. This will succeed if IOPL bits
  in FLAGS are set to 3 (which is the CPL of a VM86 task), but otherwise
  (IOPL<3) they all fail and cause exception #13 instead. However, because
  of their simulated Real Mode nature, VM86 tasks have a "virtual"
  interrupt table in the beginning of the VM86 task memory which is
  consulted either by a Protected Mode interrupt handler when IOPL=3, or a
  V86 monitor if IOPL<3, to redirect the interrupt to its VM86 routine.
  (see also 'V86 monitor')

-I/O Permission Bitmap:
  The I/O Permission Bitmap allows an operating system, for example, to
  mask off access to certain I/O ports from non-privileged tasks (this only
  applies to Protected Mode and Virtual 8086 Mode). The bitmap consists of
  up to 64Kbits, each of which represents a single byte-wide I/O port (two
  bits for a word-wide and four bits for a dword-wide port). Setting any of
  the bits corresponding to the I/O port will disable I/O to and from it
  from a non-privileged task. In Protected Mode the bitmap will be used if
  the task's Current Privilege Level value is greater than the one
  specified in the IOPL bits of FLAGS register. In VM86 mode, the IOPL
  value is not checked but rather all tasks are subject to the bitmap.
  Exception #13 will be generated if access to a port is denied from the
  task trying to perform I/O.

-Opcode (Operational Code):
  Opcodes control a whole processor's operation.
  They are instructions encoded as a flow of bits so that the processor can
  understand them. Don't confuse symbolic instructions with opcodes,
  though. For example, the opcode for a 'NOP' instruction would be 90h.

-Privilege Level (PL):
  Intel processors starting from 80286's use internal privilege levels
  (effective only in Protected Mode) to protect execution of certain
  instructions and memory access from tasks that aren't privileged enough.
  Privilege Level varies from 0, the most privileged, to 3 being the least
  privileged. Default (and unmodifiable) value for Real Mode tasks is 0 and
  for VM86 tasks 3. A Protected Mode task's PL is one of the four possible
  values and can be changed to the preference of an operating system.

-Single-Stepping:
  Single-stepping is a method used by debuggers to execute only one
  instruction of code at a time. This way a user may watch the code
  execute, step by step.

-Task State Segment (TSS):
  TSS was introduced with Protected Mode in 286 processors and is used for
  multi-tasking to save system state for each Protected Mode task
  separately. In a 286 style TSS, contents of all registers including LDT,
  initial SS:SP for returning to Privilege Levels 0-2 and a back link to
  previous task is saved, in 16 bits. 386 style TSS's also contain CR3
  register and the two extra segment registers, FS and GS, a T (Debug Trap)
  bit (for supporting breakpoints on task switches), offset of the I/O
  Permission Bitmap and the bitmap itself. In addition to these,
  free-formed extra info about the task can also be entered in a 386 TSS.
  All registers are saved in 32 bits, stuffing 00's in the high word of
  segment registers.

-Tracing:
  Tracing simply means single-stepping through code and examining code
  execution at the same time.

-V86 monitor:
  When called from a VM86 task with an IOPL of less than 3, all software
  interrupts (except for the single-byte INT 3, INTO and BOUND
  instructions, which are not sensitive to the IOPL bits) generate
  exception #13.
Therefore interrupt #13 handler must contain (in Protected Mode) a suitable monitor routine to determine whether the exception was caused by a VM86 task calling interrupt services or by a General Protection Fault. APPENDIX B: Suggested reading for info [+] = Printed on paper -------------------------------------- [-] = In "electronical" form +i486 Microprocessor (Intel, order #240440): An Intel 80486 processor databook Available from your local Intel representative for free! (for an Adobe Acrobat-format copy see Appendix D, 'Useful Internet sites') -HelpPC (David Jurgens): A program with a large database of 80x86 instructions and other useful stuff concerning PCs. -Interrupt List (Ralf Brown): A most complete database of DOS interrupts and other valuable PC hardware info. -A Painless Guide to CRC Error Detection Algorithms (Ross Williams): A thorough guide to error detection algorithms APPENDIX C: Useful E-mail addresses ----------------------------------- inbar@glx.chief.co.il : Inbar Raz (the author of 'Anti Debugging Tricks') (Internet - FidoNet gate: Inbar.Raz@p42.f100.n403.z2.fidonet.org) (FidoNet: Inbar Raz, 2:403/100.42) support@intel.com : Support (questions/comments/etc.) for Intel products rcollins@x86.org : Robert Collins (the sys. admin. of the X86 website) NOTE: This is _only_ for personal mail! For questions, use one of the following addresses instead. 
intelcpu@x86.org : Questions about Intel microprocessors othercpu@x86.org : Questions about non-Intel microprocessors support@x86.org : Technical support on anything else ralf@cs.cmu.edu : Ralf Brown (the author of the 'Interrupt List') (Alternate address: ralf@pobox.com) ross@guest.adelaide.edu.au : Ross Williams (the author of 'A Painless Guide to CRC Error Detection Algorithms') APPENDIX D: Useful Internet sites --------------------------------- http://developer.intel.com/design/product.htm * Intel Developer Home and On-Line Literature website: Offers free data sheets and manuals of many Intel products, especially 80x86 processors, in Adobe Acrobat format. Also available as FTP (see below). The full contents of this site are also available as a _FREE CD_, 'Developers' Insight CD-ROM'! ftp://download.intel.com/design/ * Intel On-Line Literature FTP site: Contains all the documents as available through WWW (see above), but due to scrambled filenames and no descriptions, it's recommended to use the website instead. http://www.x86.org/ * Intel Secrets (X86) website: Lots of information on undocumented Intel 80x86 CPU features as well as bugs. Also available through FTP (see below). ftp://ftp.x86.org/ * Intel Secrets (X86) FTP site: Has all the articles, demonstration source code and other stuff as the website (see above). http://www.cs.cmu.edu/afs/cs/user/ralf/pub/WWW/files.html * Ralf Brown's Home Page, primary website: Here you can get the most recent 'Interrupt List' and other software he's written. If this doesn't work, try the one below. http://www.pobox.com/~ralf/files.html * Ralf Brown's Home Page, alternate website: Plainly redirects all requests to the current address. ]=============================================================================[ Special thanks to: ------------------ People who have actively helped with this project: -Inbar Raz: The author of the interesting article, 'Anti Debugging Tricks'. 
It was a very good introduction into the world of anti-debugging! -Warren Ellis: The engineer at Intel overloaded with questions concerning 80x86 debugging features... :] People whose work has passively helped with this project: -Michael Forrest: Hm? Why did _your_ name end up here? Don't know... Anyway, read the reply to your 'Anti-Anti Debugging Tricks' article in file TO_MFORR.TXT, if you ever see this. -Werner "Dirk Gently" Zsolt: The author of 'The Action Replay card for the PC', an article about the card. (see section 2.4.2) -Ross Williams: The author of 'A Painless Guide to CRC Error Detection Algorithms', an article about error detection and algorithms. It enabled me to fill in the checksum generator section. Contacting me: -------------- E-mail: mhk@sci.fi (to get some more info on contacting me, please finger my account!) Please, do E-mail me if you have any questions or comments, or even something to contribute to this document. Also, don't hesitate to contact me if you'd like to make any suggestions or want something to be discussed here. I _do_ need your help to develop this "whitepaper" further. Information in demand: ---------------------- General info: -New (or old ;) ideas for tricks, preferably with code examples. Anything will do varying from user confusion code to code encryption and CRC checks. -Corrections and additions to info/code examples -...whatever you think might be of use! Info especially needed about: -Other debuggers than the very limited number already listed. Needed are the name and version of the debugger, if it uses hardware capabilities for debugging (primarily/optionally), whether it uses Real Mode or Virtual 8086 Mode to execute code, interrupt vectors grabbed for its own use (required with 8086 debuggers) and possibly other information. -Undocumented Soft-ICE back door commands (see section 3.3.8) -Hardware debugger cards -In-Circuit Emulators/Debuggers (ICE/ICD) (what they are, how they work...) SiGNED:MHK