
      -================================================================-

        "Poor Man's Guide to Anti-Debugging on the Intel 80x86 Family"
                                (release 1.00)
                       (Mostly ;) Copyright 1997 (C) MHK

                          Written and compiled by MHK
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^

      -================================================================-

NOTE: This document was inspired by and is mostly based on the article 'Anti
      Debugging Tricks' written by _Inbar Raz_ (with assistance from _Eden
      Shochat_ and _Yossi Gottlieb_). I wish to thank them, especially Mr. Raz,
      who generously permitted me to add the tricks from his own article as a
      part of this document!


About this document
-------------------
 In this article I intend to concentrate on giving you a deeper introduction to
the basics of anti-debugging and on presenting _ideas_ rather than concrete
code examples. I will also try to explain things in detail and make everything
easy to understand. However, some knowledge of assembler is still required.
 The information in this document has been gathered from various other
documents and books, from contributions by people all over the world, by
experimenting, by examining the code execution of several "armored" programs,
and, in the case of the tricks, plainly from ideas.
 Anyway, let's just hope this document will give you new, fresh ideas even if
you are a pro in the subject. Be creative and use your imagination!

DISCLAIMER
==========
 I cannot guarantee that _any_ of the information contained within this
document is accurate, functional or otherwise suitable for any purpose other
than just reading for pure happy-happy-joy-joy. :) If you decide to use the
information for your own purposes, you will do it AT YOUR OWN RISK. If you run
into any trouble, undesired program behaviour occurs or in the worst case, any
data is lost due to experimenting with the information offered by this
document, you're on your own. So don't bother complaining about your screwed-up
hard disk, if you should happen to produce one.
 So much for legal issues. Now, feel free to read on...

IMPORTANT NOTES:
----------------
 1) All of the numbers used in conjunction with assembler instructions or as
register values are in hexadecimal, others are decimal numbers unless otherwise
stated. To clarify some spots, an extra 'h' has been added to indicate that the
value is in hex.
 2) The code examples are written in a kind of "pseudo-assembler", a mixture of
the syntax DOS Debug accepts and the syntax assemblers require. If entered into
DOS Debug, the code examples should almost certainly be accepted (as long as
only 8086 instructions are used). Nevertheless, the example code has also been
included as the bytes corresponding to the instructions.
 3) Although each section should be self-explanatory enough to be understood
without browsing through the others, I advise you to read the _whole_ document
from beginning to end, referring to the explanations section (Appendix A) when
needed.
 4) There may be some inaccuracies in the text due to my misunderstandings
(sorry!), incorrect information, or processor development (ie. new CPUs have
things implemented in a slightly different way). I have marked some spots with
(*)'s to indicate info that I'm not 100% sure of and that needs to be
confirmed. I need help with correcting errors, so if you see something here
that is definitely wrong, please do contact me! Also, you may notice this
document is a bit sketchy. I may have taken a bit too large a bite when trying
to include all the "might-be-useful" info here...
 5) ...and then a message to people who just want to criticize others' work
without ever accomplishing anything themselves: This document has required a
lot of research with debuggers (mainly finding out bugs in their routines ;)
and into how an 80x86 processor's debugging support works. All of the tricks
have been thoroughly tested and are fully working on an "IBM PC compatible"
(only those requiring a PC/XT haven't been verified, and some tricks using
system-specific features, such as some I/O ports, may not work on PS/2-series
machines). So, in case you feel like complaining about this document not
containing enough methods which can trick a 386 debugger, as _some_ people do,
_you_ invent a trick not included here and send it to me with the complaint!

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

INDEX OF CONTENTS
=================

SECTION 1: GENERAL STUFF
        ----------------
        1.1 - Introduction to anti-debugging
        1.2 - Protecting your programs with anti-debugging code
        1.3 - Armored programs and virii
SECTION 2: DEBUGGING
        ------------
        2.1 - Intel processor debugging support
               2.1.1 - 8086 debugging capabilities
               2.1.2 - 80286 debugging capabilities
               2.1.3 - 80386 debugging capabilities
               2.1.4 - Intel Pentium debugging capabilities
        2.2 - How a debugger works
               2.2.1 - Real Mode debuggers
               2.2.2 - Protected Mode (VM86) debuggers
        2.3 - Debugger info
               2.3.1 - Miscellaneous info: Soft-ICE
        2.4 - Hardware debuggers
               2.4.1 - PC cards
               2.4.2 - Action Replay PC card, a useless toy
SECTION 3: INVENTING NEW TRICKS
        -----------------------
        3.1 - Things that can be assumed of normal code execution
        3.2 - Requirements for a new trick
        3.3 - Hints and tips
               3.3.1 - Exceptions and interrupts
               3.3.2 - Prefetch Instruction Queue (PIQ)
               3.3.3 - Special EFLAGS register flags
               3.3.4 - Special software interrupts
               3.3.5 - "Double-stepping"
               3.3.6 - Intel 8259, Programmable Interrupt Controller (PIC)
               3.3.7 - Common Interrupt Request lines used by a PC machine
               3.3.8 - Back door commands in Soft-ICE!
SECTION 4: ANTI-DEBUGGING TRICK IDEAS AND EXAMPLE CODE
        ----------------------------------------------
        4.1 - General ideas
               4.1.1 - Causing CPU to execute two instructions at a time
               4.1.2 - Hiding "real" code from the user
               4.1.3 - Assuming that a certain condition exists
               4.1.4 - Loopy loops...
        4.2 - Standard tricks
               4.2.1 - Modifying interrupt vectors
               4.2.2 - Masking out hardware interrupts
               4.2.3 - Reprogramming Intel 8259 Programmable Int. Controller
               4.2.4 - Disabling keyboard
               4.2.5 - Forcing a debugger stop execution
               4.2.6 - Using regular INT nn -style INT 3 calls
               4.2.7 - Checking FLAGS
               4.2.8 - Modifying original interrupt handler routine
               4.2.9 - Foiling 'Step Over'/'Proceed' debugger commands, 1 of 2
               4.2.10 - Faking a procedure call
               4.2.11 - Comparing INT 01 and 3 interrupt table entries
               4.2.12 - Using stack to fool a debugger
               4.2.13 - Generating a General Protection Fault or a Stack Fault
               4.2.14 - Exploiting rapidly changing memory areas
               4.2.15 - Storing data in the interrupt table area
        4.3 - Special tricks
               4.3.1 - Jumping to a location within an instruction
               4.3.2 - Exploiting Turbo Debugger's weak point
               4.3.3 - Fooling TD386 Virtual-86 Driver
               4.3.4 - Using INT 01's to make Soft-ICE gag
               4.3.5 - Using self-tracing to fool Soft-ICE
               4.3.6 - Screwing up Soft-ICE with back door commands
               4.3.7 - Unloading Soft-ICE!
               4.3.8 - Cause Soft-ICE to abort program
        4.4 - Self-modifying code
               4.4.1 - Simple self-modification
               4.4.2 - Foiling 'Step Over'/'Proceed' debugger commands, 2 of 2
               4.4.3 - Playing with Prefetch Instruction Queue (PIQ)
               4.4.4 - Code encryption
               4.4.5 - Hooking a decryption routine to an interrupt
               4.4.6 - The Running Line
        4.5 - Checksum generators
               4.5.1 - Sum of bytes
               4.5.2 - Number of bits
               4.5.3 - Multiplication and division
               4.5.4 - Calculating CRC-16 and CRC-32
        4.6 - Simple code encryptors
               4.6.1 - XOR en-/decryption
               4.6.2 - NOT en-/decryption
               4.6.3 - Bitwise rotation
               4.6.4 - NEG en-/decryption
               4.6.5 - Basic arithmetic operations as en-/decryption algorithms
               4.6.6 - En-/decryption using translation tables
               4.6.7 - Scrambling original byte order
        4.7 - Polymorphic encryptors
SECTION 5: APPENDICES
        ------------
        APPENDIX A: Explanations/Glossary
        APPENDIX B: Suggested reading for info
        APPENDIX C: Useful E-mail addresses
        APPENDIX D: Useful Internet sites

]=============================================================================[

SECTION 1: GENERAL STUFF
========================
NOTE: This section is good reading for both beginners and average programmers.

1.1 - Introduction to anti-debugging
------------------------------------
 You probably know what debuggers are, so... What? Never heard of them?!? OK,
if you're running DOS, you should already have a debugger called DEBUG.EXE,
which provides you with the simplest debugging tools (though they are not very
powerful and are quite difficult to use). "Debugging" originally meant
"removing programming errors from software", but nowadays it can also mean
simply "examining code". It involves playing with breakpoints, tracing program
code, and otherwise examining code execution in order to find and remove bugs
from the source. (For explanations of the terms see Appendix A, the
'Explanations/Glossary' section at the bottom of this document.)
 In order to make it harder for others to examine how the code works,
programmers have developed methods called anti-debugging tricks for their own
programs. Having finished a year's project only to find out the next day that
your masterpiece's copy protection scheme has been cracked is quite
frustrating, isn't it? Programs protected with anti-debugging code are often
called "armored" programs. Most often this technology is used in intros, small
utilities like cracks, and even virii, but there's no limit on the type or size
of programs which could be protected, as far as DOS is concerned.

1.2 - Protecting your programs with anti-debugging code
-------------------------------------------------------
 Anti-debugging code is quite an effective way of protecting programs from
prying eyes. In addition to preventing unauthorized debugging of a program,
anti-debugging tricks can also be used to make program disassembler utilities
and generic executable file decompressors useless, for example. Beginners, who
want to learn your coding techniques, often are unable to find a way to bypass
the tricky anti-debugging code. Do NOT, however, rely on this kind of
protection because _any_ protection implemented in software only is NOT
secure! Such a protection scheme just raises the bar, and it is _only_ a
matter of time before a skilled coder/cracker finds a way to defeat it. There
are some dedicated utilities that encrypt an executable and add anti-debugging
code to it for you, such as Protect! EXE/COM and HackStop, but as good as they
are, generic deprotectors for them are out already. It's just better to do it
by hand and give the cracker something to think about...
 Usually debugger traps are located at the beginning of the armored program,
but the traps can be even more powerful when scattered all over the program at
random locations or mixed into a specific piece of code you wish to protect.
Why? Well, tracing the code is slower and harder when you have to worry about
the debugger hanging at any minute, don't you think? Multiple different tricks
should be combined to make their removal more difficult. Also, remember to make
the tricks suit your own purposes. That is, you should make the anti-debugging
code an integral part of your program, so that the program won't work without
it. Neither should it be possible to bypass the code by simply jumping to an
address after its end, to the start of the actual program. Therefore, you
should make the code jump around a bit and, for example, surround the program
itself with anti-debugging code, so that determining the actual program start
address is not an easy task. Even so, doing this may be futile if the code
hasn't otherwise been protected. What I mean is that you should consider
encrypting your code and booby-trapping the decryption routine.

1.3 - Armored programs and virii
--------------------------------
 Because the same methods are widely used in virii to make disassembly and
analysis more difficult, some virus scanners with heuristic capabilities may
report that the executable file has been "armored" against analysis. Good
examples of such scanners are F-Prot, which will claim in that case that an
"armoured" program has been found, and TBAV (Thunderbyte Anti-Virus), which
uses several "heuristic flags" to indicate anti-debugging code among other
tricks virii use. No need to panic, however: most of the time the files found
armored are just protected programs, _not_ new virii! But still be cautious
if unusually many warning messages start appearing for programs you run
frequently...

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

SECTION 2: DEBUGGING
====================
NOTE: This section aims at explaining extensively how debugging is made
      possible. I recommend this section for everyone interested in knowing
      how it all works, but also for anyone trying to invent new anti-debugging
      tricks, since the basis of _anti-debugging_ is in _debugging_
      (self-evident ;)!

2.1 - Intel processor debugging support
---------------------------------------
 This section covers all the features of Intel 80x86 family processors that are
useful to debuggers. Note, though, that the 80186/80188 and 80486 processors
aren't included, since they brought no significant debugging enhancements.

2.1.1 - 8086 debugging capabilities

 In the early 8086/8088 processors there was only one hardware supported
debugging feature: Trap Enable Flag (TF) aka. Single-Step Flag, bit 8 of FLAGS
register. When this bit is set, after executing the following instruction the
processor generates exception #1 (or to be precise, if this bit is found set at
the end of an instruction exception #1 will be generated), which in practice
will execute the code pointed to by INT 01 vector (single-step interrupt). Note
though, that setting or clearing the TF bit will have _no_ effect on the
instruction that changed it, the first instruction to be affected will be the
instruction _following_ the instruction that changed the TF bit status.
 The first-generation 8086 processors also introduced a single-byte breakpoint
interrupt, INT 3 (opcode 0CCh), which all later processors still support, to
aid software debugging. The idea behind this instruction is that a debugger
points the INT 03 vector to itself and can then insert this small interrupt
instruction into the program at a desired breakpoint location in order to
regain control of program execution at that point. The hardware breakpoint
support of 80386 processors has made this instruction largely unnecessary on
newer processors, but it is still there (and still used).
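
 The delayed effect of the TF bit described above can be modelled in a few
lines. The following is only a minimal Python sketch, not processor-accurate
code: the instruction names and the run() helper are made up for illustration,
and the exception handler is assumed to return with the FLAGS image untouched.

```python
TF_MASK = 0x0100  # Trap Enable Flag is bit 8 of FLAGS


def set_tf(flags):
    """Return the 16-bit FLAGS value with the Trap Enable Flag set."""
    return (flags | TF_MASK) & 0xFFFF


def clear_tf(flags):
    """Return the 16-bit FLAGS value with the Trap Enable Flag cleared."""
    return flags & ~TF_MASK & 0xFFFF


def run(program, flags=0x0002):
    """Simulate a list of (name, effect) steps.

    effect is "set", "clear" or None and says what the instruction does to
    TF. Returns the names of the instructions after which exception #1
    fires. An instruction traps only if TF was already set when it started,
    so the instruction that changes TF is never affected by its own change.
    """
    trapped = []
    for name, effect in program:
        tf_at_start = bool(flags & TF_MASK)
        if effect == "set":
            flags = set_tf(flags)
        elif effect == "clear":
            flags = clear_tf(flags)
        if tf_at_start:
            trapped.append(name)
    return trapped
```

 In this model an instruction that sets TF is not trapped itself, the one
following it is the first to be trapped, and an instruction that clears TF is
still trapped one last time.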

2.1.2 - 80286 debugging capabilities

 This generation of x86 processors doesn't have many new features for debuggers
to benefit from. Protected Mode was first introduced with 286 processors, but
since the 286 cannot run code written for Real Mode from within Protected Mode
(there is no Virtual 8086 Mode yet), no memory protection can be achieved for
such programs. Switching between Real and Protected Mode is possible, however,
although getting back to Real Mode requires resetting the processor. (*) Need
some info on what this allows a debugger to do!
 The I/O Privilege Level (IOPL) bits in FLAGS were also new, but their purpose
was not exactly the same as in 386's. Someone tell me what they were originally
for...

2.1.3 - 80386 debugging capabilities

 New Intel 80x86 family processors starting from 386's have far more advanced
support for hardware debugging and task protection, thus enabling debuggers to
gain very strict control over code execution: up to four hardware
breakpoints, privilege levels to perform I/O, lots of memory protection
capabilities, etc. ... Here are only those features useful to DOS debuggers,
though.

 1) Hardware breakpoints:
     Now that the processor has four Debug Registers (DR0-DR3) reserved for
    user-defined breakpoints, software-based INT 3 instructions, which mess up
    the code being debugged, aren't needed any more. Hardware capabilities
    include breakpoints on instruction execution, on data read/write or on data
    write only, and a breakpoint length of one, two or four bytes; all of these
    are set in the Debug Control Register (DR7). Breakpoints are checked for at
    instruction boundaries, generating exception #1 either before executing the
    instruction at the breakpoint (as a fault) in case of a code execution
    breakpoint, or right after the instruction that accessed memory (as a trap)
    at a data breakpoint. The reason for an exception #1 is recorded in the
    Debug Status Register (DR6).
     It is also possible to lock out _any_ access to Debug Registers (even in
    Real Mode and at Privilege Level 0) and make the CPU generate exception #1
    as a fault upon a read or write access by setting the Global Debug Register
    access detect (GD) bit in DR7. When the exception #1 handler is invoked,
    the GD bit is cleared to allow the routine full access to the Debug
    Registers.
     Hardware breakpoints have several advantages over putting INT 3's in code,
    and one of them is that they can be placed even in ROM memory. It must be
    noted, though, that all hardware breakpoints generate exception #1, _not_
    exception #3 as one could suppose!
 2) I/O Privilege Level (IOPL):
     In the EFLAGS register (bits 12-13) you can also define the maximum
    Privilege Level value (the level of least privilege) at which a task will
    be permitted to perform I/O instructions such as IN, OUT, etc. in Protected
    Mode without generating exception #13 or consulting the I/O Permission
    Bitmap. The bitmap can be used to disallow I/O to each of the I/O ports
    separately for a task whose Current Privilege Level (CPL) value is higher
    than the IOPL.
     This is the reason why trying to disable the keyboard, for example, won't
    work under 386 debuggers. Any decent debugger will intercept direct access
    to certain I/O ports.
 3) Memory protection and Virtual 8086 Mode:
     Since 386 debuggers run in Protected Mode instead of Real Mode, they can
    use memory protection to isolate the memory used by the debugger from the
    memory the debugged program has access to. To simulate a Real Mode
    environment these Protected Mode debuggers use Virtual 8086 Mode (VM86).
    Virtual 8086 Mode is not, however, entirely compatible with Real Mode since
    a VM86 task's Privilege Level is 3 (the least privileged) and Real Mode's
    implicitly 0 (the most privileged). Therefore Virtual 8086 Mode is subject
    to all of the protection schemes the CPU uses: no execution of so called
    "privileged instructions" that require a PL of 0, some instructions that
    are IOPL sensitive, etc.
     Because of the many protections that are available in Virtual 8086 Mode,
    exception #13 (General Protection Fault) is generated when privilege level
    permissions have been exceeded. This happens often with actions that
    normally in Real Mode wouldn't cause any problems.
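
 To make the hardware breakpoint settings of item 1 more concrete, here is a
minimal Python sketch that builds a DR7 value. The bit positions (enable bits
in bits 0-7, GD in bit 13, a four-bit type/length field per breakpoint starting
at bit 16) are taken from Intel's 386 documentation, not from this text, and
the function and constant names are made up for illustration.

```python
RW_EXECUTE = 0b00      # break on instruction execution
RW_WRITE = 0b01        # break on data write
RW_READ_WRITE = 0b11   # break on data read or write
LEN_BITS = {1: 0b00, 2: 0b01, 4: 0b11}   # breakpoint length in bytes
GD_BIT = 1 << 13       # Global Debug Register access detect


def dr7_enable(dr7, index, rw, length, is_global=True):
    """Return dr7 with breakpoint `index` (0-3) configured and enabled."""
    if index not in range(4):
        raise ValueError("only DR0-DR3 hold breakpoint addresses")
    if length not in LEN_BITS:
        raise ValueError("length must be 1, 2 or 4 bytes")
    if rw == RW_EXECUTE and length != 1:
        raise ValueError("execution breakpoints use a length of 1")
    # Each breakpoint has a local (even bit) and a global (odd bit) enable.
    enable_bit = 1 << (index * 2 + (1 if is_global else 0))
    # R/W (2 bits) and LEN (2 bits) for breakpoint n start at bit 16 + 4*n.
    shift = 16 + index * 4
    dr7 &= ~(0xF << shift)
    dr7 |= (rw | (LEN_BITS[length] << 2)) << shift
    return (dr7 | enable_bit) & 0xFFFFFFFF
```

 For example, dr7_enable(0, 1, RW_WRITE, 4, is_global=False) describes a
four-byte write breakpoint on the address held in DR1. The address itself
would go into DR1 separately; this sketch only computes the control value.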

2.1.4 - Intel Pentium debugging capabilities

 To extend 80x86 family processors' debugging capabilities even further, Intel
has equipped Pentiums with a super-debugging mode called the Probe Mode. This
processor operating mode is a very powerful companion for any debugger. It
allows a user to suspend code execution and bus activity (a complete system
freeze) at any given moment, enter the Probe Mode, modify _anything_ in the
system, and finally return to code execution with the modifications made as if
nothing had happened. This includes, but is not limited to, changing the
contents of all registers, memory and I/O ports.
 The mode can only be triggered by _hardware_, therefore it is impossible to
bypass it with _any_ software. However, because of this, using Probe Mode
requires a special motherboard design that supports the mode or allows such
controlling hardware to be added.

 [* need more precise information to fill in *]

2.2 - How a debugger works
--------------------------
 Even though this is a document on _anti-debugging_, we _must_ know the enemy
we are trying to fight. :) There are two types of debuggers available nowadays:
older 8086 debuggers (Microsoft Debug, Borland Turbo Debugger, etc.), that
cannot be considered very effective any more, and way more powerful 80386
hardware-assisted debuggers (Nu-Mega Soft-ICE, Borland Turbo Debugger 386,
etc.) that are recommendable for any task more demanding than just
experimenting with debuggers.

2.2.1 - Real Mode debuggers

 The older generation of debuggers that operate in Real Mode are based on the
single-stepping capability of the 8086 (and better) processor, and the
single-byte software breakpoint interrupt, INT 3 (opcode 0CCh). These
debuggers, when single-stepping, set the Trap Enable Flag (TF, bit 8 of FLAGS)
of a flag register image in the debugger's stack. Having done this the debugger
will transfer control to the program being debugged with an IRET instruction
(Return from Interrupt) thus loading the flags image with the TF bit set into
the register. Now, only one instruction of the user program will be executed
before exception #1 occurs. This exception calls the interrupt routine pointed
to by INT 01 vector, which the debugger has modified to point to itself. The
INT 3 software breakpoint is used in a somewhat similar fashion: the INT 3
vector points to the debugger, and an INT 3 instruction planted at each desired
breakpoint location will jump back to the debugger when user program execution
reaches that location. However, a weakness of using INT 3's as breakpoints is
that the single-byte instruction has to be physically there, thus messing up
the program code. This prevents programs that, for example, use a self-check
from running, and also makes it impossible to debug ROM code.
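
 Why a planted INT 3 upsets a self-check can be shown with a small model. This
is just a Python sketch under simple assumptions: the "program" is a byte
array, the self-check is a plain sum of bytes, and the helper names are made up
for illustration.

```python
INT3 = 0xCC  # opcode of the single-byte breakpoint interrupt


def byte_sum(code):
    """A very simple self-check: 16-bit sum of all code bytes."""
    return sum(code) & 0xFFFF


def plant_breakpoint(code, offset):
    """Overwrite the byte at offset with INT 3 and return the original byte."""
    original = code[offset]
    code[offset] = INT3
    return original


def remove_breakpoint(code, offset, original):
    """Put the saved byte back where the INT 3 was."""
    code[offset] = original


# MOV AX,0001 / INC AX / RET
code = bytearray([0xB8, 0x01, 0x00, 0x40, 0xC3])
clean_sum = byte_sum(code)
saved = plant_breakpoint(code, 3)      # breakpoint on the INC AX
patched_sum = byte_sum(code)           # differs from clean_sum
remove_breakpoint(code, 3, saved)
```

 As long as the breakpoint byte is in place the sum differs from the clean
value, which is all a self-checking program needs to notice the debugger.
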
 There are two types of single-stepping available in debuggers. The first one
is 'Trace into Calls/Interrupts', which executes code instruction by
instruction just as the CPU does. The Trap Enable Flag is used to (T)race into
CALLs, but since an interrupt clears the TF bit after PUSHing (E)FLAGS onto the
stack, an INT 3 breakpoint is placed at the location the interrupt vector
points to. The second method is 'Step over Calls' ('Proceed'), which won't
trace into any calls or interrupts but rather "steps over" each CALL and INT
instruction, invisibly executing the sub-routine. Other instructions that are
stepped over include LOOPs and the ones with a REP prefix, to name some. When
(P)roceeding, an INT 3 breakpoint is placed immediately after one of these
instructions and the TF bit is cleared to run code until the INT 3. It would
also be possible to implement this simply by ignoring the INT 01 calls caused
by the Single-Step Trap, but I don't think any debugger does it. Some problems
may arise, though, such as the instruction following an INT being executed as
well.
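
 The 'Proceed' logic boils down to finding the address right after the current
instruction. Below is a minimal Python sketch of that decision, assuming 16-bit
8086 code and covering only direct CALLs, INT nn, the LOOP family and REP-
prefixed string instructions; a real debugger needs a full instruction decoder.
The function names are made up for illustration.

```python
# MOVS, CMPS, STOS, LODS and SCAS in their byte and word forms
STRING_OPS = {0xA4, 0xA5, 0xA6, 0xA7, 0xAA, 0xAB, 0xAC, 0xAD, 0xAE, 0xAF}


def step_over_length(code, ip):
    """Return the length of the instruction at ip if it is stepped over.

    Returns None for everything else (those would simply be single-stepped).
    """
    opcode = code[ip]
    if opcode == 0xE8:                 # CALL rel16
        return 3
    if opcode == 0x9A:                 # CALL far ptr16:16
        return 5
    if opcode == 0xCD:                 # INT nn
        return 2
    if opcode in (0xE0, 0xE1, 0xE2):   # LOOPNE / LOOPE / LOOP rel8
        return 2
    if opcode in (0xF2, 0xF3):         # REPNE / REP prefix + string op
        if ip + 1 < len(code) and code[ip + 1] in STRING_OPS:
            return 2
    return None


def proceed_breakpoint(code, ip):
    """Return the offset where the temporary INT 3 would be planted."""
    length = step_over_length(code, ip)
    return None if length is None else ip + length
```

 With the bytes CD 21 90 (INT 21 followed by NOP) at offset 0, the temporary
breakpoint lands on offset 2, the NOP.
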
 Real Mode debuggers always use the same stack as the user program and thus
single-stepping always modifies the program's stack, too, but it may be
possible to use special tricks to preserve the stack in some cases. When
single-stepping, the debugger could save a minimum of three words to cover the
data destroyed by an INT 01 call, plus some reserve for possible PUSHes. Also,
any instruction modifying the first three unused words in the stack area should
be taken into account, just to make sure things won't get too easy. Anyhow,
trying to preserve the stack with a Real Mode debugger takes a lot more effort
than using traditional methods. The best would be a code execution simulator
which doesn't actually execute instructions but rather emulates them. But...
well, who would care to write such a complex program anyway...
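
 The "three words destroyed by an INT 01 call" are FLAGS, CS and IP, pushed in
that order below the current stack pointer. A minimal Python sketch of this,
with a small byte array standing in for the stack segment (the helper names and
the example values are made up for illustration):

```python
def push_word(memory, sp, value):
    """Push a 16-bit word the way the CPU does: decrement SP, then store."""
    sp = (sp - 2) & 0xFFFF
    memory[sp] = value & 0xFF
    memory[sp + 1] = (value >> 8) & 0xFF
    return sp


def read_word(memory, addr):
    """Read a little-endian 16-bit word."""
    return memory[addr] | (memory[addr + 1] << 8)


def interrupt_entry(memory, sp, flags, cs, ip):
    """Model what a Real Mode interrupt stores on the stack.

    Returns the new SP. FLAGS is pushed first, then CS, then IP.
    """
    for value in (flags, cs, ip):
        sp = push_word(memory, sp, value)
    return sp


stack = bytearray(0x100)
sp = 0x80
stack[0x7E] = 0x34                   # the program keeps the word 1234h
stack[0x7F] = 0x12                   # just below SP, in "unused" stack space
before = read_word(stack, 0x7E)
new_sp = interrupt_entry(stack, sp, flags=0x0302, cs=0x1000, ip=0x0105)
after = read_word(stack, 0x7E)       # now holds the pushed FLAGS image
```

 Here the word the program left below SP is replaced by the FLAGS image as soon
as a single-step interrupt occurs, while without a debugger it would survive.
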
 Real Mode debuggers may, in addition to INT 01 and INT 3 vectors, also grab
INT 00 (Divide Error), INT 02 (Non-Maskable Interrupt) and some other
interrupts (available only in DOS), such as INT 09 (Keyboard) to allow breaking
out of code back to the debugger, and INTs 20 (Program Terminate) and 21 (DOS
Function Request Interrupt) to detect termination of user program.
 Most such debuggers (386 debuggers too) expand their limited capabilities by
checking in advance the instruction they will execute next. A debugger only
needs to add an additional, user-invisible routine before executing the next
instruction (when single-stepping). Real Mode debuggers especially use this
method to offer memory access breakpoints among other things, but it also makes
it possible to, for example, deny I/O and avoid possible conflicts by modifying
the code just a bit. For instance, when single-stepping, Turbo Debugger detects
INT 3's (opcode 0CCh) and redirects them to itself by jumping directly to the
breakpoint handler routine without actually executing an INT 3. It also detects
PUSHF instructions in order to mask off the TF bit from the FLAGS image pushed
onto the stack. However, this slows down code execution quite a bit, so not
many features are emulated this way.

2.2.2 - Protected Mode (VM86) debuggers

 The second generation of debuggers, known as 386 hardware debuggers, run in
Protected Mode. A 386 hardware debugger utilizes Virtual 8086 Mode to execute
the user program in a simulated Real Mode environment while still gaining all
the benefits from memory and I/O protection. These debuggers use the same
method for single-stepping as 8086 debuggers. Alternatively a 386 debugger
could also accomplish single-stepping by defining a hardware (execution)
breakpoint at the address just after the next instruction to be executed; I
just don't know if any of them does it this way. Breakpoints, on the other
hand, are implemented by entering the 32-bit linear addresses of desired
breakpoints into the four Debug Registers (DR0-DR3) of a 386 or better
processor, selecting breakpoint access types and lengths, and enabling them,
each of them separately, in the Debug Control Register (DR7). When the
processor determines that CS:EIP matches one of the specified code execution
breakpoints or that a memory access to a data breakpoint range takes place, it
will generate exception #1. As no user program code modifications are needed,
using hardware breakpoints makes debugger behaviour more reliable and
anti-debugging harder. Hardware debuggers can also deny all access to I/O ports
they consider dangerous to manipulate. This is done by setting the bits
corresponding to those I/O ports in the I/O Permission Bitmap. After this, any
access to those ports from a VM86 task will generate exception #13. Note that
in Virtual 8086 Mode the bitmap is consulted for every I/O instruction
regardless of the IOPL bits, unlike in Protected Mode, where it is only
consulted when CPL is greater than IOPL!
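
 The I/O permission check can be sketched as follows. This is a minimal Python
model, assuming one bit per port in the bitmap (a set bit means "denied") and
ignoring everything else the processor checks; the function names are made up,
and port 60h (keyboard data port) is used purely as an example.

```python
def deny_port(bitmap, port):
    """Set the bitmap bit for a port, so that access to it is denied."""
    bitmap[port >> 3] |= 1 << (port & 7)


def io_allowed(bitmap, port, vm86, cpl=3, iopl=0, size=1):
    """Return True if the I/O access goes through, False for exception #13.

    size is the access width in bytes; every port covered by the access must
    have its bit clear. In Protected Mode the bitmap is skipped when CPL is
    not greater than IOPL. In Virtual 8086 Mode it is always consulted.
    """
    if not vm86 and cpl <= iopl:
        return True
    for p in range(port, port + size):
        if (bitmap[p >> 3] >> (p & 7)) & 1:
            return False
    return True


bitmap = bytearray(8192)          # 65536 ports, one bit each
deny_port(bitmap, 0x60)           # the debugger protects the keyboard port
```

 With this setup, io_allowed(bitmap, 0x60, vm86=True, iopl=3) is False: the
VM86 program gets exception #13 even though IOPL is 3.
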
 Since a 386 debugger operates in Protected Mode and the user program in
Virtual 8086 Mode, exception #1 in Virtual 8086 Mode causes a switch to a more
privileged level. In such a case, the new stack segment SS and pointer ESP of
the Protected Mode debugger are first loaded from the VM86 task's Task State
Segment (TSS). Next, the GS, FS, DS and ES segment registers (as 32-bit
quantities) and the old stack pointer SS:ESP are pushed onto the _new_ stack,
not the user program's, along with the regular info any interrupt call would
store on the stack. Therefore single-stepping will not modify the user
program's stack, unlike with Real Mode debuggers.
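
 The stack switch described above can be summarised in a short model. This is
only a Python sketch: the dictionary keys and the function name are made up,
"ss0"/"esp0" stand for the privilege level 0 stack fields of the TSS, and the
"regular info" is assumed to be EFLAGS, CS and EIP with no error code.

```python
PUSH_ORDER = ("gs", "fs", "ds", "es", "ss", "esp", "eflags", "cs", "eip")


def vm86_interrupt_frame(regs, tss):
    """Model the frame built when an interrupt leaves Virtual 8086 Mode.

    regs holds the VM86 program's registers, tss the debugger's stack
    pointer for privilege level 0. Returns (new_ss, new_esp, pushed), where
    pushed lists the 32-bit values in the order they are pushed. The user
    program's own stack memory is never written to.
    """
    pushed = [regs[name] & 0xFFFFFFFF for name in PUSH_ORDER]
    new_esp = (tss["esp0"] - 4 * len(pushed)) & 0xFFFFFFFF
    return tss["ss0"], new_esp, pushed
```

 Nine doublewords (36 bytes) end up on the debugger's stack, and the user
program's SS:ESP is merely recorded there, not used.
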
 Though 386 debuggers are much more powerful and reliable than any 8086
debugger, they also have disadvantages. Since this kind of debugger needs
Protected Mode to run its own code, so that it can use the processor's
additional protection mechanisms and Virtual 8086 Mode, another Protected Mode
program cannot coexist with it. Examples of such programs are software EMS
memory managers (emulators) like EMM386, QEMM386, etc. (they need to run in
Protected Mode in order to use the 386 paging system to switch EMS pages into
the page frame). After loading one such driver, DOS will be running in Virtual
8086 Mode!
 Since in Protected Mode the Interrupt Descriptor Table (IDT) is used instead
of an interrupt table to vector interrupts to their service routines in VM86,
all exceptions and interrupts are first vectored through the debugger itself if
it has taken over the interrupt. Though all interrupts must pass through the
debugger anyway in order to be rerouted to their respective VM86 routines, 386
debuggers often point many exceptions to their own routines to get first
control over an interrupt requested or an exception caused by the user program.
INT 01 is always vectored to a debugger routine since both hardware breakpoints
and single-stepping cause it to occur. INT 3 may also be pointed to the
debugger as a breakpoint, but it should be possible to turn its breakpoint
function off in the debugger. It is only needed to support more than four
breakpoints. Exception #13 is a common exception in VM86 for many reasons,
especially because all interrupt calls and IOPL-sensitive instructions cause it
if the I/O Privilege Level indicated by the IOPL bits in the EFLAGS register is
less than 3 (the Privilege Level of a VM86 task), so it will also launch an
error/V86 monitor handler routine in the debugger. Some other exceptions may
also be mapped to a 386 debugger, especially those reporting a "medium" or
severe error. In case of a severe error, the debugger may ask the user whether
to continue user program execution by giving control to the VM86 interrupt
routines (!!!) or to stop execution and return to the debugger.

2.3 - Debugger info
-------------------
 [* no info available, will be covered soon *]

2.3.1 - Miscellaneous info: Soft-ICE

 Funnily enough, Soft-ICE primarily uses INT 3 breakpoints as execution
breakpoints (BPX) unless the breakpoint is in ROM! Hardware breakpoints are
only used by memory breakpoints set on execution (BPM X).
 When INT 3 is called, Soft-ICE checks the request in the following order:
back door commands (SI and DI set to the "magic values") have the highest
priority, then enabled Soft-ICE INT 3 breakpoints (BPX) are checked for a match
(this is where 'ACTION' takes place), after these it's handled as a normal
INT 3 call in program code (this is where 'I3HERE' affects further execution)
and if 'I3HERE' is OFF, the VM86 task INT 3 handler is launched.
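
 The checking order can be written down as a tiny decision function. This is
only a paraphrase of the order described above in Python, not Soft-ICE's actual
code; the function name, parameter names and result strings are made up for
illustration.

```python
def softice_int3(backdoor_magic, bpx_match, i3here_on):
    """Return which of the four cases handles an INT 3.

    backdoor_magic: SI and DI hold the back door "magic values"
    bpx_match:      an enabled BPX breakpoint sits at this address
    i3here_on:      the 'I3HERE' option is ON
    """
    if backdoor_magic:
        return "back door command"
    if bpx_match:
        return "BPX breakpoint (ACTION applies)"
    if i3here_on:
        return "break into Soft-ICE (I3HERE)"
    return "VM86 task INT 3 handler"
```

 Note that a back door request wins even when a BPX breakpoint is set on the
very same INT 3.
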
 If a Soft-ICE back door command is used, even a hardware breakpoint set
immediately after the INT 3 instruction that triggered the back door will not
stop execution (any back door command resumes normal execution after
returning). Hardware breakpoints are honoured again once one instruction after
the back door has been executed.
 Soft-ICE hardware breakpoints act oddly: it seems as if they aren't all
entered in the Debug Registers at the same time. If multiple memory breakpoints
are set, they may not all function. Need more info on this peculiarity!

2.4 - Hardware debuggers
------------------------

2.4.1 - PC cards

 There are also hardware debugger _cards_ available but because they're only
used by professionals (costly goodies...), not "normal" coders or people
programming just for fun, I will _not_ cover the topic unless someone sends me
extensive documentation on those cards (the basics of how such cards generally
work and more detailed, card-specific hardware data; information on Intel
In-Circuit Emulator/Debugger (ICE/ICD) boards/modules would be especially
useful!). The average programmer would just rely on good _software-based_
debuggers (not meaning 8086 debuggers here!) like Soft-ICE.
 The cards shouldn't be any threat anyway since they are quite rare...

2.4.2 - Action Replay PC card, a useless toy
        (based on Werner Zsolt's article 'The Action Replay card for the PC')

 Remember the freezer module for Commodore machines? Some years ago it finally
made it to PC users, but unfortunately it is no good for serious cracking or
debugging. It cannot work under Windows 3.x Enhanced Mode, Windows 95 or any
other 32-bit environment for that matter, and lacks support for unofficial
display modes with different screen refresh rates. These shortcomings, among
other things, render it just a useless toy!
 The card's slowdown function doesn't work at all as one would expect. It slows
down the machine in the same manner as all those small utilities available,
which means operation is sluggish. Because slowing down depends on a working
timer (ie. IRQ 0 not masked and external interrupts enabled), into whose
handler routine some extra loops or similar CPU power consuming stuff is put,
the function will not work properly most of the time. Not to mention the
limited and unreliable interrupt tracking option, which works in much the same
way and therefore misses interrupt calls when lots of them occur in a short
length of time. _If_ the card had Protected Mode drivers for DOS and other
systems, it might have been somewhat more useful.
 And now to the part that interests us: The card uses interrupts 0-3 to
control code execution (just like normal debuggers). To function, the card also
needs software drivers that contain the required interrupt handler routines. (*)
This makes it vulnerable to anti-debugging code designed against 8086 debuggers
since in Real Mode or Virtual 8086 Mode, under which the drivers _only_ work,
it is no problem at all to change the vectors to point elsewhere. After this the
card, although still sitting there with TSRs loaded, would be as useless as
without its drivers...

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

SECTION 3: INVENTING NEW TRICKS
===============================
NOTE: This section is recommended reading even for experienced coders. However,
      if you only want to use existing tricks listed in Section 4 there's no
      need to check out this one.

3.1 - Things that can be assumed of normal code execution
---------------------------------------------------------
 -It is assumed that INT 01 (Single-Step Trap/Hardware Breakpoint) and INT 03
   (Software Breakpoint) interrupts never occur under normal conditions. They
   are _only_ used by debuggers (and by a few other programs, but those only
   use these interrupts internally and thus never cause any conflicts).
 -The Trap Enable Flag (TF, bit 8 of FLAGS) should never be set in normal code
   execution.
 -One must assume that program code is executed instruction after instruction
   without any additional interrupts between program instructions. (not
   counting external hardware interrupts which can be disabled with a CLI)
 -The anti-debugging code can access the same I/O ports as the program it is
   attached to.

3.2 - Requirements for a new trick
----------------------------------
In order for a trick to be at all effective or useful, the following
requirements must be met:

3.2.1 - Anti-debugging code must not be easy to defeat.

 It must not be possible to bypass the code simply by overwriting it with some
NOPs (No Operation instructions) or by jumping directly to an address after
the code. Code that is this easy to defeat isn't worth the few bytes it
takes... :)

3.2.2 - Anti-debugging code must cause no (or only very rare) problems when
        executing normally.

 Since the impact of anti-debugging code must be on debuggers, which (usually)
execute your code a single instruction at a time and pop up between
instructions (interrupting execution), the added protection code must be
invisible to the user when a protected program is executed under normal
conditions.

3.2.3 - Anti-debugging code must be compatible with 8086's, present and future
        processors!

 In order to get your anti-debugging code running on any machine, even the old
8086's, the code must not contain any specific 386 instructions, for example,
or undocumented opcodes. Unless, of course, the code first checks whether the
CPU could run it or not.
 Using instructions that earlier or future processors can't run will cause
unexpected misbehaviour and exception #6's (Invalid OP-Code) to be generated.
Also, relying on an undocumented opcode retaining its purpose and
functionality, or on an undefined opcode staying undefined, only increases
incompatibility risk. However, for testing exception #6 Intel has reserved an
opcode, 0Fh 0Bh (UD2) (*), which will never be defined for other use. On AMD
processors a similar reserved opcode is 0Fh 0FFh (UD). Both of these are/were
undefined on both manufacturers' processors as of Pentium II release in 1997,
but because of the now obsolete 'POP CS' instruction (opcode 0Fh) they won't
"work" on 8086 or 80186 CPUs.

3.2.4 - The trick _must_ work properly in both Real Mode and Virtual 8086 Mode
        under normal conditions in DOS!

 The trick shouldn't cause any unexpected processor exceptions. It's no good
using a trick that only reduces compatibility with systems running in Protected
Mode (OS/2, Windows NT, etc.), since they use Virtual 8086 Mode to provide a
Real Mode environment to an application.

3.2.5 - Actions performed by anti-debugging code should be well "hidden".

 The purpose of each instruction in your anti-debugging code must not be
obvious, since the user _must_ first trace your code up to some point and try
to understand code execution before placing any breakpoints (only guessing a
location often isn't enough). Loading values directly into registers or putting
all of the instructions used to perform something in a single bunch only makes
it easier for a cracker to determine what's going on. 

3.3 - Hints and tips
--------------------
There are some hardware and software qualities which could be handy to know
about. You'll find such information gathered here.

3.3.1 - Exceptions and interrupts

 Exceptions are basically interrupts generated internally by the CPU, often
because of an operation error. The first 32 interrupts (00h-1Fh) are reserved
by Intel for exceptions. Each exception is one of three types: faults, which
occur _before_ the instruction which causes the exception, so the return
address pushed on stack points to the faulting instruction; traps, which occur
_after_ the instruction causing the exception, so the return address points to
the instruction following it; and aborts, which are only used to report severe
errors and do not allow the precise location of the instruction causing the
abort to be determined. An exception to this is the Non-Maskable Interrupt
(NMI), which is in a class of its own because it is caused by a signal on the
processor's NMI pin. Software interrupts (INT, INTO and BOUND instructions) are
also handled as exceptions, but are used to allow user-generated interrupts.
 Here is a list of exceptions sorted by interrupt number, the exception's type
(fault/trap/abort), name with the processor model(s) it can occur on, and the
instruction(s) which can cause the exception:

 Interrupt   Type     Exception name                     Instruction(s)
 ---------   ----     --------------                     --------------
     0       Fault    Divide Error                       DIV, IDIV
     1       *) F/T   Debug Exception                    <any instruction>
     2       NMI      NMI Interrupt                      INT 02 or <NMI signal>
     3       Trap     One Byte Interrupt                 INT 3
     4       Trap     Interrupt on Overflow              INTO
     5       Fault    Array Bounds Check (186+)          BOUND
     6       Fault    Invalid OP-Code (186+)             <any illegal
                                                           instruction>
     7       Fault    Device Not Available (286+)        ESC, WAIT
     8       Abort    Double Fault (286+)                <any instruction that
                                                           can generate an
                                                           exception>
     9       ???      Coproc. Seg. Overrun (286, 386)    <floating point> (*)
    10       Fault    Invalid TSS (286+)                 JMP, CALL, IRET, INT
    11       Fault    Segment Not Present (286+)         <segment register
                                                           instructions>
    12       Fault    Stack Fault (286+)                 <stack references>
    13       Fault    General Protection Fault (286+)    <any memory reference>
    14       Fault    Page Fault (386+)                  <any memory access or
                                                           code fetch>
    15       -        <reserved>                         -
    16       Fault    Floating Point Error (286+)        <floating point>, WAIT
    17       Fault    Alignment Check Interrupt (486+)   <unaligned memory
                                                           access>
    18       Abort    Machine Check (Pentium+)           <model-dependent>
   19-31     -        <reserved>                         -
   0-255     Trap     Two Byte Interrupt                 INT nn

 *) Some debug exceptions occur as faults (execution breakpoint, for example),
    others as traps (single-step as an example).

 Note that this is just a simple list of exceptions as defined by Intel. Since
most of these exceptions have multiple reasons, only the basic cause of them is
included. For a complete list, refer to Intel documentation (an "x86
Programmer's Reference Manual" is fine).

3.3.2 - Prefetch Instruction Queue (PIQ)

 All Intel 80x86 family processors have a tiny memory area within the processor
to hold some instructions fetched from system RAM in advance. This Prefetch
Instruction Queue is as small as 4 bytes on the first generation 8088 (6 bytes
on the 8086) and can be as large as 32 bytes on 486's, or even larger on newer
models. A user cannot directly access this memory but its effects can be seen.
The idea is to speed up execution by reducing the need to get the next
instruction from "slow" memory every time one instruction has been executed.
 The way a prefetcher works is very complicated and may work pretty differently
on different processor models because of the different size of the PIQ and
different memory organization (for example, 486 processors have a 32-byte PIQ
organized as 2x16 bytes). The prefetcher unit is very inadequately documented,
usually only the size of the PIQ is mentioned, and I couldn't get much info
even from the engineers at Intel tech support. I have tested the PIQ of a
_486_ with self-modifying code a bit, and it looks as if any (or at least
most) instruction modifying memory, including PUSHes, causes the prefetcher to
refill the PIQ just before the actual memory modification takes place. The
queue seems to be aligned on 16 byte boundaries (00h, 10h, 20h, etc.). The
first 16-byte paragraph stored in PIQ will start at the last 16 byte boundary
if the instruction modifying memory doesn't start at the border, and the memory
amount loaded into the queue will be 32 bytes from that address. Here is an
example to clarify this:

 CS:0160   mov byte ptr [017F],CC       ; Replace a byte at CS:017F, the last
                                        ; byte of memory stored in the queue,
                                        ; with INT 3 opcode. (memory stored in
                                        ; PIQ starts here also)
 CS:017F   ...                          ; (memory stored in PIQ ends here)

 Based on the same example above: If the INT 3 was written to CS:0180, it would
not be contained within the queue, and if the MOV instruction was located at
CS:015F the PIQ area would start 16 bytes earlier and thus the INT 3 wouldn't
be contained within the queue either. Moving the MOV up to CS:016B wouldn't
affect "the location" of the queue. There are some exceptions to this rule,
though. Sometimes, on long instructions containing immediate address or data
values, the instruction may extend backwards to an area not stored within the
PIQ (shifting the MOV of the above example four bytes back to start at CS:015C
will not affect the PIQ at all). Perhaps this happens because the CPU reads the
immediate values separately _while_ executing the instruction, I don't know...
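
 The simple alignment rule observed above can be written down as a small model.
This is only a sketch of the behaviour seen on the tested 486, assuming a
32-byte queue aligned on a 16-byte boundary; it ignores the exceptions just
mentioned, and the function names are made up for illustration.

```python
# Model of the observed 486 PIQ window (an assumption based on the tests
# described in the text, not documented processor behaviour).
PARAGRAPH = 16    # the queue appears to be aligned on 16-byte boundaries
QUEUE_SIZE = 32   # 486 queue organized as 2 x 16 bytes


def piq_window(instruction_offset):
    """Return (first, last) offset held in the queue when the memory-modifying
    instruction starts at instruction_offset."""
    start = instruction_offset & ~(PARAGRAPH - 1)  # round down to a boundary
    return start, start + QUEUE_SIZE - 1


def in_piq(instruction_offset, target_offset):
    """True if a byte written at target_offset is already prefetched."""
    first, last = piq_window(instruction_offset)
    return first <= target_offset <= last
```

 With the MOV at CS:0160 the window is 0160-017F, so the byte at CS:017F is in
the queue and the one at CS:0180 is not; with the MOV at CS:015F the window
moves down to 0150-016F.
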
 Even though the prefetcher is nearly invisible to a user, it can be instructed
to flush and refill the queue. Most instructions which cause instructions not
to be executed in sequence (ie. various jump and return instructions)
implicitly cause the prefetcher to flush and refill the PIQ. The instructions
which will always flush the queue are the unconditional JMP, LOOP, CALL, RET,
INT and variations including exceptions and external interrupts, and IRET. The
ones that flush the queue _only_ when all jump conditions are met ("jump
conditions" here refer to set or clear flags) are all the conditional jump
instructions (JZ, JAE, JNS, etc.), except for J(E)CXZ, and the conditional LOOP
instructions (LOOPZ/E and LOOPNZ/NE). It's important to note that the J(E)CXZ
instruction does _not_ flush the PIQ (and otherwise acts weirdly as regards
PIQ...). Similarly, though a LOOP instruction just falls through if CX=0001,
this doesn't affect flushing in any way. For example, if Zero Flag is set but
CX=0001, a jump will not be taken by a LOOPZ but it still flushes the
prefetched instruction queue because the flags match the conditions set for a
jump. Also note that no REP prefix itself will cause a flush.
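
 The flushing rules listed above can be summed up as a lookup. This is a rough
sketch of those observations only (not of documented CPU behaviour); the
function name and its parameters are hypothetical.

```python
# Which instructions flush the PIQ, according to the observations in the text.
ALWAYS_FLUSH = {"jmp", "loop", "call", "ret", "int", "iret"}
NEVER_FLUSH = {"jcxz", "jecxz"}
CONDITIONAL_LOOPS = {"loopz", "loope", "loopnz", "loopne"}


def flushes_piq(mnemonic, flag_condition_met=False):
    """Return True if the instruction is expected to flush the queue.

    flag_condition_met describes the flag part of the condition only; the
    value of CX is deliberately not an input, since it was not observed to
    matter for flushing.
    """
    name = mnemonic.lower()
    if name in ALWAYS_FLUSH:
        return True
    if name in NEVER_FLUSH:
        return False
    if name in CONDITIONAL_LOOPS or name.startswith("j"):
        return flag_condition_met   # Jcc and LOOPcc flush only if flags match
    return False                    # everything else, including REP prefixes
```

 Note that a LOOPZ with Zero Flag set is reported as flushing even when CX=0001
and the jump itself is not taken.
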
 For more detailed information on PIQ, check out Robert Collins' X86 website,
the address is in Appendix D, 'Useful Internet sites'.

3.3.3 - Special EFLAGS register flags

Trap Enable Flag (TF, bit 8): This flag (aka. Single-Step Flag) is used for
   single-stepping, ie. executing only one instruction at a time. If the TF bit
   is found set at the beginning of an instruction, exception #1 (Single-Step)
   will be generated by the CPU after the instruction has been executed,
   regardless of the state of the TF bit after execution (the same principle
   applies to a cleared TF bit also). This means that INT 01 handler routine is
   executed between two instructions.
    This flag has been available since the 8086's, and any program is allowed
   to modify this bit. The bit will not be cleared automatically by the CPU.
Resume Flag (RF, bit 16): This flag can be used in conjunction with hardware
   execution breakpoints (they will also generate exception #1) to suppress
   them. Since 386 hardware execution breakpoints are treated as faults by the
   CPU (ie. they occur _before_ the actual instruction at breakpoint is
   executed), the CS:EIP PUSHed on stack is that of the instruction at the
   breakpoint itself. Returning to code execution normally would therefore
   retrigger the same breakpoint, but setting the RF bit of the EFLAGS image
   on stack (it will be loaded with an IRET) in the INT 01 handler routine can
   prevent this from happening. (Note that CPU sets this flag after
   a debug breakpoint has occurred as a fault, thus the flag will be pushed on
   stack in the EFLAGS image automatically.)
    This flag was first introduced in 80386 processors, and any program is
   allowed to modify this bit. After _successful_ execution of one instruction
   the bit is cleared.
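
 As a quick reference, the bit positions given above translate into the
following masks. The sketch only does the bit arithmetic on a flags value; the
helper names are made up for illustration, and what a real INT 01 handler does
with the stacked EFLAGS image is up to the debugger.

```python
# Bit masks for the two flags described above.
TF = 1 << 8    # Trap Enable Flag, bit 8 of (E)FLAGS
RF = 1 << 16   # Resume Flag, bit 16 of EFLAGS (386 and above)


def is_single_stepping(flags):
    """True if the Trap Enable Flag is set in the given flags value."""
    return bool(flags & TF)


def set_resume_flag(eflags_image):
    """Return the stacked EFLAGS image with RF set, as a handler would do to
    avoid retriggering an execution breakpoint on return."""
    return eflags_image | RF
```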

3.3.4 - Special software interrupts

INT 1: This is an undocumented single-byte instruction (opcode 0F1h), mainly
   used by In-Circuit Emulators (ICE) as breakpoint instructions. This
   instruction is available on 386 processors and above, and it has become
   official with the introduction of Pentium Pros (*). This instruction mostly
   functions identically to an INT 01 (opcode 0CDh 01h) with the most important
   exception that INT 1 is never sensitive to the IOPL bits in EFLAGS while
   INT nn instructions are.
INT 02: This instruction calls INT 02 (NMI) routine, but unlike any other
   INT nn instruction it ignores any further NMI requests while in the service
   routine until an IRET is executed or the processor is reset. (*)
INT 3: This single-byte instruction (opcode 0CCh), used to plant breakpoints
   into code being debugged, functions identically to INT 03 (opcode
   0CDh 03h) with the exception that INT 3 is never IOPL sensitive.
INTO: This instruction calls INT 04 (Interrupt on Overflow) routine if Overflow
   Flag (OF) is set, but unlike INT 04 it is never IOPL sensitive.
BOUND: This instruction calls INT 05 (Array Bounds Check) routine if a Value
   Out of Range is detected, but unlike INT 05 it is never IOPL sensitive.

 Being IOPL insensitive will only affect execution of INT instructions in
Virtual 8086 or Protected Mode if IOPL is less than the Current Privilege Level
of the interrupt caller. As an example, the execution procedure of an INT 3
(single-byte opcode) and an INT 03 under VM86 (CPL=3) while IOPL=0 will differ
by the interrupt handler number which will be called. An INT 3 will always call
the routine pointed to by the appropriate IDT entry but INT 03 would raise a
General Protection Fault, exception #13 (if IOPL was 3, the appropriate routine
would be called).
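
 The VM86 example above can be expressed as a tiny decision function. This is a
minimal sketch that only covers the single-byte INT 3, the single-byte INT 1
and the two-byte INT nn forms as described in this section; the function name
is hypothetical.

```python
# Which handler gets called for an interrupt instruction executed under VM86
# (CPL=3), following the description in the text.
VM86_CPL = 3
GENERAL_PROTECTION_FAULT = 13


def vm86_vector(opcode, iopl):
    """opcode is the raw instruction bytes, iopl the IOPL value (0-3)."""
    if opcode == b"\xCC":            # INT 3, never IOPL sensitive
        return 3
    if opcode == b"\xF1":            # INT 1, never IOPL sensitive
        return 1
    if len(opcode) == 2 and opcode[0] == 0xCD:   # INT nn, IOPL sensitive
        return opcode[1] if iopl >= VM86_CPL else GENERAL_PROTECTION_FAULT
    raise ValueError("not an interrupt instruction covered by this sketch")
```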

3.3.5 - "Double-stepping"

 A feature worth noticing is that loading a value into stack segment register
SS causes CPU to disable _all_ external interrupts (including NMI) and prevent
debug exception #1's from occurring, triggered by a Single-Step Trap as well as
a hardware breakpoint, until the next instruction (following the instruction
loading SS register). The instructions affected are 'MOV SS,xxxx' and 'POP SS'.
This is so that stack pointer SP could also be loaded safely, otherwise any
interrupt could cause the computer to hang due to an inconsistent SS:SP
pointer. However, all other exceptions occurring as faults will be generated
anyway, examples would be Divide Error (INT 00) and Invalid OP-Code (INT 06).
Note though, that two consecutive instructions loading SS register will only
double-step once, chaining multiple such instructions won't work. Also, 386
processors and above have an LSS instruction to load SS, but since it is
possible to load SP register with the same instruction, no interrupts need to
be disabled and thus no "double-step" will occur.
 You may also notice a similar effect while tracing code in Virtual 8086 Mode
but not in Real Mode. The instructions causing a double-step (not necessarily!)
are either privileged instructions requiring CPL of 0 or instructions accessing
386 special registers because they cause exception #13 when executed in VM86.
This is _not_ a feature of an 80x86 processor, but rather due to the exception
handler implementation of the VM86 control program (EMM386, Windows in 386
Enhanced Mode, etc.). If the previous instruction just caused an exception, so
why not execute the following instruction also? Therefore these _could_ be
chained (but who would want to?).

3.3.6 - Intel 8259, Programmable Interrupt Controller (PIC)

 An Intel 8259 Programmable Interrupt Controller (PIC) monitors all IRQ lines
(see also section 3.3.7, 'Common Interrupt Request (IRQ) lines used by a PC
machine') and sends a signal to the CPU's INTR pin whenever an IRQ occurs.
Programmability includes the interrupt handler number to be called upon receipt
of a certain IRQ. This is the most important (and useful to anti-debugging)
programmable feature of a PIC and therefore only it will be discussed here.

 [* need detailed info on this *]

3.3.7 - Common Interrupt Request (IRQ) lines used by a PC machine

 Interrupt Requests (IRQs) are used by external hardware, such as a hard disk
controller, to interrupt other processing when it needs attention. In a PC/XT
class machine there are eight possible IRQ lines (IRQs 0-7). An additional 8259
Programmable Interrupt Controller (PIC) has been added to ATs and cascaded to
IRQ 2 as a slave (IRQs 8-15) of the master PIC to support up to 15 IRQs. Many of
those are required by vital system peripherals, and which peripheral reserves
which IRQ is nearly a standard.
 Here is a list of IRQs sorted by priority, with the interrupt handler number
in DOS (note that this varies from one operating system to another since the
PIC _is_ programmable) that will be called upon an interrupt request signal,
and the owner of each IRQ line:

  IRQ #   Interrupt   Owner
 -------  ---------   -----
  IRQ0       08h      System timer
  IRQ1       09h      Keyboard
  IRQ2       0Ah      EGA/VGA vertical retrace (PC/XT) or slave 8259 (AT)
  IRQ8       70h      Real-Time Clock (RTC) (AT)
  IRQ9       71h      <redirected IRQ2> (AT)
  IRQ10      72h      <reserved> (AT)
  IRQ11      73h      <reserved> (AT)
  IRQ12      74h      PS/2 mouse (AT)
  IRQ13      75h      Floating Point Unit (FPU) error (AT)
  IRQ14      76h      Hard Disk Controller (HDC) (AT)
  IRQ15      77h      <reserved> (AT)
  IRQ3       0Bh      COM2 or COM4
  IRQ4       0Ch      COM1 or COM3
  IRQ5       0Dh      Hard Disk Controller (HDC) (PC/XT) or LPT2 (AT)
  IRQ6       0Eh      Floppy Disk Controller (FDC)
  IRQ7       0Fh      LPT1
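
 The interrupt numbers in the table follow a simple pattern, which the sketch
below reproduces. It only reflects the DOS mapping listed above (other systems
reprogram the PIC), and the names are chosen for illustration.

```python
# DOS mapping of IRQ lines to interrupt handler numbers, as in the table above.
MASTER_BASE = 0x08   # IRQ0-IRQ7  -> INT 08h-0Fh
SLAVE_BASE = 0x70    # IRQ8-IRQ15 -> INT 70h-77h (AT only)

# Priority order, highest first: the slave's lines sit in IRQ2's slot.
PRIORITY_ORDER = [0, 1, 2, 8, 9, 10, 11, 12, 13, 14, 15, 3, 4, 5, 6, 7]


def dos_irq_vector(irq):
    """Return the interrupt handler number called for the given IRQ in DOS."""
    if not 0 <= irq <= 15:
        raise ValueError("IRQ must be in the range 0-15")
    return MASTER_BASE + irq if irq < 8 else SLAVE_BASE + (irq - 8)
```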

3.3.8 - Back door commands in Soft-ICE!

 Beginning from Soft-ICE version 2.50, it is possible to control Soft-ICE
operation with the software being debugged! Nu-Mega thought this feature would
prove to be useful to provide easy hardware debugging capabilities to programs,
but actually they only made it possible for coders to develop their
anti-debugging code against Soft-ICE. These commands are _only_ described in an
addendum with versions 2.xx since the manual is printed for version 2.0.
Therefore an unsuspecting user could even crash his Soft-ICE when examining a
program protected with anti-Soft-ICE code.
 The documented back door commands let _any_ program execute _any_ Soft-ICE
command in addition to manipulating breakpoints (getting info, creating and
even disabling them) without restrictions. Since there is no way to disable
this feature, at least in versions up to 2.80, one could easily take
advantage of it as a nice anti-debugging trick.
 But there's more: Soft-ICE uses similar back door commands to allow its
companion utilities, such as LDR.EXE (program loader), to have _total_ control
over the Protected Mode portion of Soft-ICE. Has it ever come to your mind that
if Soft-ICE utilities can control the debugger, then so could any
anti-debugging code? The undocumented back door commands, which are only
supposed to be used by Soft-ICE utilities, include modifying ACTION, for
example, and even executing Protected Mode code (this is used when unloading
Soft-ICE from memory)! These commands may even exist in versions of Soft-ICE
older than 2.50. (for examples of unloading Soft-ICE, see section 4.3.7)
 Using Soft-ICE back door commands is pretty simple: SI and DI registers are
set to fixed values, AH=09 when using documented back door commands, AL will
indicate the function to be performed and when sub-function-specific registers
and data are ready, the back door is activated with an INT 3 instruction.
Giving Soft-ICE unauthorized orders is made even easier by the fact that the
user program does not necessarily have to issue an INT 3. Running to a Soft-ICE
INT 3 breakpoint (BPX), when all needed values are ready set in the registers,
will also trigger the back door no matter what 'ACTION' Soft-ICE is supposed to
take at a breakpoint! (for an example of screwing up Soft-ICE, see section
4.3.6) It must also be noted that any back door command will continue with
normal VM86 code execution after returning from the routine, it will _not_
return to Soft-ICE screen if the user was tracing over the INT 3 trigger.

 DOCUMENTED SOFT-ICE BACK DOOR COMMANDS
 --------------------------------------
 Register input:           Sub-functions:

  AH=09                     AL=10   Display information in the Soft-ICE window
  AL=Sub-function code      AL=11   Do a Soft-ICE command
  SI=4647 ('FG')            AL=12   Get breakpoint information
  DI=4A4D ('JM')            AL=13   Set Soft-ICE breakpoint
                            AL=14   Remove Soft-ICE breakpoint
  To activate: INT 3

  Sub-function AL=10: Display information in the Soft-ICE window
  --------------------------------------------------------------
  Register input:                       Returned values:
   DS:DX   Pointer to ASCIZ string       (none)

  Notes: ASCIZ string consists of up to 100 text characters including carriage
         returns (character 0Dh). A null-character (00h) terminates string.

  Sub-function AL=11: Do a Soft-ICE command
  -----------------------------------------
  Register input:                       Returned values:
   DS:DX   Pointer to ASCIZ string       (none)

  Notes: ASCIZ string consists of up to 100 text characters including carriage
         returns (character 0Dh). A null-character (00h) terminates string.

  Sub-function AL=12: Get breakpoint information
  ----------------------------------------------
  Register input:                       Returned values:
   (none)                                BH   Entry # of last breakpoint set
                                         BL   Type of last breakpoint set
                                         DH   Entry # of last BP that went off
                                         DL   Type of last BP that went off

  Notes: Entry number is the same as displayed by 'BL' command.
         Type is one of the following: 0 - Breakpoint on memory access (BPM)
                                       1 - Breakpoint on I/O port access (BPIO)
                                       2 - Breakpoint on interrupt (BPINT)
                                       3 - Breakpoint on execution (BPX)
                                       4 - (reserved)
                                       5 - Breakpoint on memory range (BPR)

  Sub-function AL=13: Set Soft-ICE breakpoint
  -------------------------------------------
  Register input:                       Returned values:
   DS:DX   Pointer to BP structure       AX   Error code
                                         BX   Breakpoint entry number

  Notes: Entry number is the same as displayed by 'BL' command.
         Error code is one of the following: 0 - No errors
                               (in decimal)  3 - Breakpoint table is full
                                             6 - Limit on memory BP's reached
                                             7 - Limit on I/O BP's reached
                                             9 - Limit on range BP's reached
                                            16 - Duplicate breakpoint
         For breakpoint structure, see file "BPSTRUCT.ASM".

  Sub-function AL=14: Remove Soft-ICE breakpoint
  ----------------------------------------------
  Register input:                       Returned values:
   BX   Breakpoint entry number          BX   ??? when set (whatever it means)

  Notes: Entry number is the same as displayed by 'BL' command.
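
 The register setup shared by all documented commands can be summarized as
follows. This is just a sketch that builds the register values as numbers; it
assumes the AL codes above are hexadecimal like the SI and DI values, and the
function name is made up for illustration. Actually triggering the back door
still takes an INT 3 from real program code.

```python
# Register values for the documented Soft-ICE back door commands listed above.
MAGIC_SI = 0x4647   # 'FG'
MAGIC_DI = 0x4A4D   # 'JM'
BACK_DOOR_AH = 0x09

DOCUMENTED_SUB_FUNCTIONS = {
    0x10: "Display information in the Soft-ICE window",
    0x11: "Do a Soft-ICE command",
    0x12: "Get breakpoint information",
    0x13: "Set Soft-ICE breakpoint",
    0x14: "Remove Soft-ICE breakpoint",
}


def back_door_registers(sub_function):
    """Return the AX, SI and DI values to load before issuing INT 3."""
    if sub_function not in DOCUMENTED_SUB_FUNCTIONS:
        raise ValueError("not a documented sub-function")
    return {
        "AX": (BACK_DOOR_AH << 8) | sub_function,  # AH=09, AL=sub-function
        "SI": MAGIC_SI,
        "DI": MAGIC_DI,
    }
```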

 One could experiment with these undocumented commands by examining one of
the Soft-ICE companion utilities such as LDR.EXE (uses AX=0000 and, while
AH=09, AL values within the range 00h-17h, and perhaps others) or S-ICE.EXE
itself, or simply by modifying the AL register value to find out new commands,
but of course, some other registers may have to be set properly before the
commands work (some function so "invisibly" that one can't notice the change,
if any). But also, if AX is set to random values, it may cause Soft-ICE to pop
up, DOS to go nuts or other similar stuff. SI and DI just have to be set to the
"magic values" so that Soft-ICE will grab INT 3's. However, since this erratic
Soft-ICE behaviour occurs with more than one AX value, I haven't included a
list of them. Now, I will need your help to list the...

 UNDOCUMENTED SOFT-ICE BACK DOOR COMMANDS!!!
 -------------------------------------------
 Register input:           Commands:

  AX=Back door code         AX=0000   ???
  SI=4647 ('FG')            AX=0915   Set ACTION after breakpoint
  DI=4A4D ('JM')            AX=10xx   Execute code in Protected Mode

  To activate: INT 3

  Command AX=0000: ???
  --------------------
  Register input:                       Returned values:
   ??? (none?)                           SI   ???

  Command AX=0915: Set ACTION after breakpoint
  --------------------------------------------
  Register input:                       Returned values:
   BL   Interrupt number                 ??? (none?)

  Command AX=10xx: Execute code in Protected Mode
  -----------------------------------------------
  Register input:                       Returned values:
   ??? (none?)                           ??? (none?)

  Notes: Code execution starts immediately after triggering back door!

 [* need help to complete this *]

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

SECTION 4: ANTI-DEBUGGING TRICK IDEAS AND EXAMPLE CODE
======================================================
NOTE: Since the tricks listed here are effective against different kinds of
      debuggers, I have provided three attributes on each of the tricks to
      indicate what it is effective against:
        R=Real Mode software debuggers
        P=Protected Mode/VM86 hardware debuggers
        U=User analyzing the code (confusion code)
        x=(attribute not set)

4.1 - General ideas
-------------------
 These are ideas that might help you achieve your goal: to prevent examination
of your program code. They are almost equally effective when debugging is
attempted with either an 8086 or a 386 debugger.

4.1.1 - Causing CPU to execute two instructions at a time

 When stack segment SS is loaded with a value, any external interrupts and also
debug exception #1 traps at the instruction loading SS and faults of the
following instruction are disabled until the next instruction. This could be
used to force execution of an instruction. Just put any instruction you want
after a 'MOV SS,xxxx' or a 'POP SS' and it will be executed regardless of a
single-step trap or even a hardware execution breakpoint set to the particular
instruction (hardware data breakpoint set to break on memory access done by a
'MOV SS,[xxxx]' or 'POP SS' will also be disabled but can't think of any use
for that... :)! If one wants to stop before executing the instruction following
a pop to SS, he has to set an INT 3 -style breakpoint there (software interrupt
instructions naturally work).

 ABOUT THE EXAMPLE:
 ------------------
 In this example, an INT 03 instruction is forced but it could be any other.

  CS:0100 8C D0      mov ax,ss          ; Move original SS to AX
  CS:0102 8E D0      mov ss,ax          ; Copy value from AX back to SS and...
  CS:0104 CD 03      int 03             ; ...force execution of INT 03
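
 To see what a single-stepping debugger gets to observe, the behaviour can be
modelled as below. This is a simplified sketch based on the description in
section 3.3.5: it only tracks where a single-step trap would hand control to
the debugger, treats 'MOV SS,xxxx' and 'POP SS' as the only instructions that
suppress the trap, and does not let the effect chain. The function name is
hypothetical.

```python
# After which instructions does a single-step trap reach the debugger?
SS_LOADS = ("mov ss,", "pop ss")


def single_step_stops(instructions):
    """Return the indexes of the instructions after which the debugger gets
    control when tracing the given list one step at a time."""
    stops = []
    suppressed = False   # True if the previous SS load swallowed its trap
    for index, text in enumerate(instructions):
        normalized = " ".join(text.lower().split())
        if normalized.startswith(SS_LOADS) and not suppressed:
            suppressed = True    # no trap until the next instruction is done
            continue
        suppressed = False       # a second SS load in a row does not chain
        stops.append(index)
    return stops
```

 For the example above the debugger stops after 'mov ax,ss' and again only
after the INT 03 has been executed, never between 'mov ss,ax' and 'int 03'.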

4.1.2 - Hiding "real" code from the user

 The _real_ instructions used by your protection code should be buried
underneath lots of useless "garbage" instructions so that the person debugging
cannot easily determine which ones perform something vital for the code.
 Alternatively, if your code allows it, you could divide a trick into smaller
chunks, or even into single instructions, and execute them in the middle of
some other trick(s), a chunk at a time. However, the stack and the registers
used by the broken-down trick must _not_ be modified until all of its code has
been executed (or their values must at least be restored before the next
chunk)!

 ABOUT THE EXAMPLE:
 ------------------
 [* will be supplied soon *]
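
 In the meantime, here is a minimal sketch of the first idea. It is not taken
 from any real protection; the choice of BX and DX as scratch registers and of
 the TF check (see 4.2.7) as the "real" code are only assumptions made for
 illustration. Just three instructions do the actual work, the rest is garbage.

  CS:0100 9C         pushf              ; REAL: push FLAGS onto stack
  CS:0101 87 DB      xchg bx,bx         ; Garbage: does nothing at all
  CS:0103 F9         stc                ; Garbage: FLAGS are already saved
  CS:0104 11 D2      adc dx,dx          ; Garbage: DX assumed to be unused
  CS:0106 58         pop ax             ; REAL: pop FLAGS image to AX
  CS:0107 43         inc bx             ; Garbage: changes BX and...
  CS:0108 4B         dec bx             ; ...changes it right back
  CS:0109 F6 C4 01   test ah,01         ; REAL: test bit 8 (TF) of the image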

4.1.3 - Assuming that a certain condition exists

 Do _not_ compare bytes or test bits in your data or registers and then act on
the result with a conditional jump. That method is extremely vulnerable to
code modification, since patching a single jump defeats it. Rather, assume that
a condition exists (or doesn't exist) and build your code upon it. There are
many possibilities to do this; the example is meant to demonstrate a simple
one.

 ABOUT THE EXAMPLE:
 ------------------
 [* will be supplied soon *]
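
 Until then, a rough sketch of one possibility. Instead of testing the TF bit
 and jumping conditionally, the bit is simply added to a jump address. The
 target offset 010A is an assumption for this listing only, and remember that
 a debugger faking the PUSHF (see 4.2.7) will not be caught by this either.

  CS:0100 9C         pushf              ; Push FLAGS onto stack
  CS:0101 58         pop ax             ; Pop FLAGS image to AX
  CS:0102 25 00 01   and ax,0100        ; AX=0000 if TF is clear, 0100 if set
  CS:0105 05 0A 01   add ax,010A        ; Add offset of the code to continue at
  CS:0108 FF E0      jmp ax             ; TF clear: 010A, TF set: 020A (garbage)
  CS:010A ...        ...                ; (code continues here)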

4.1.4 - Loopy loops...

 It's advisable to use very long loops, in a decryption routine for example,
and fill them with lots of anti-debugging code. This will slow down a cracker
considerably since he will have to get the program code descrambled before any
examination is possible. Having to get past multiple debugger traps on every
round of the loop will ensure that he'll have _lots_ of fun and spend _lots_ of
time trying to decrypt the code he wants.
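
 As an illustration only, a decryption loop with a single trap inside could
 look like the sketch below. The start offset 0120, the length 1000 and the key
 5A are hypothetical values, DS is assumed to equal CS (as in a .COM program),
 and a real routine would naturally use stronger traps than a plain INT 3.

  CS:0100 BE 20 01   mov si,0120        ; SI = start of encrypted code (assumed)
  CS:0103 B9 00 10   mov cx,1000        ; CX = number of bytes (assumed)
  CS:0106 80 34 5A   xor byte ptr [si],5A   ; Decrypt one byte with key 5A
  CS:0109 CC         int 3              ; Debugger trap on every round
  CS:010A 46         inc si             ; Advance to next byte and...
  CS:010B E2 F9      loop 0106          ; ...repeat until CX is zero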

4.2 - Standard tricks
---------------------
 These tricks are a bunch of traditional and quite well-known debugger traps.
They should work against almost any standard debugger. Please note though, that
most of them are only useful against Real Mode debuggers, not those using 386
hardware debugging support.

4.2.1 - Modifying interrupt vectors

 The easiest way to fool a Real Mode debugger is to change the interrupt vector
entries for INTs 01 (used for single-stepping) and 3 (software breakpoint) in
the interrupt table. Other interrupts that you won't be expecting to occur
could be vectored to an IRET to disable their functionality, and restored
afterwards. Anti-debugging code could point the vector's CS:IP to the next
instruction to be executed (only INT 01...) to continue execution without the
debugger breaking in, or to the BIOS reboot address at F000:FFF0, for instance.
 It would be even better if either one of these interrupts were used for proper
code activation. By using INT 3 for code activation you could also fool
Soft-ICE just by setting SI=4647 and DI=4A4D before calling INT 3. This makes
it run a back door handler instead of the user routine (for more information
see 3.3.8, 'Back door commands in Soft-ICE!', and 4.3.6, 'Screwing up Soft-ICE
with back door commands')!
 Redirecting other interrupts to INT 01 or 3 routines might be fun to try out,
too. Pointing INT 00 (Divide Error) to the INT 01 routine and then dividing by
zero causes a funny effect of executing the (I)DIV instruction over and over
again. This is because a Divide Error saves the address of the faulting
instruction on stack instead of that of the next one as the Single-Step Trap
does.
 Still, the nastiest trick would be to use INTs 01 and 3 as substitutes for
often called interrupts, with INT 21 (DOS Function Request interrupt), the
interrupt most frequently accessed by DOS programs, as a good example. If
INT 3 (the single-byte opcode) is used, it would be virtually _impossible_ to
replace the calls with any other interrupt because the INT 3 instruction only
takes one byte while all others require two. You could also point INT 21
elsewhere, into your own program for instance, which would bug some Real Mode
debuggers.
 Other interrupt vectors which are also used by several debuggers, INTs 00
(Divide Error) and 02 (NMI), may come in handy. If the INT 00 vector is pointed
to the start of "real" code, it could be called with a simple DIV AX while AX
has a value of zero (for an example, see 4.3.3, 'Fooling TD386 Virtual-86
Driver').
 Just remember that it's not advisable to change vectors with the DOS Set
Interrupt Vector service (INT 21, AH=25); use manual vector modification
instead. If a debugger has taken over INT 21 it could prevent any interrupt
vector change via DOS services.
 Also note that only the Real Mode interrupt table or VM86 virtualized
interrupt vectors can be modified. The Interrupt Descriptor Table is out of
the reach of a VM86 program and attempted access will only cause a General
Protection Fault (exception #13).

 ABOUT THE EXAMPLE:
 ------------------
 This example modifies the INT 3 entry to point to the BIOS reboot vector. It
 is done in two steps: first the IP is changed and then the segment. After
 this, the next INT 3 breakpoint will result in a cold reboot. If 386
 instructions are used, changing CS:IP only takes one instruction and no
 conflicts with the debugger can occur. With two steps, tracing through code
 which changes the INT 01 vector will cause unpredictable behaviour once IP has
 been changed while the segment is still intact...

  CS:0100 31 C0      xor ax,ax          ; Zero AX
  CS:0102 8E D8      mov ds,ax          ; Load 0000 as DS segment
  CS:0104 A3 0C 00   mov [000C],ax      ; Set INT 3 pointer IP to 0000,...
  CS:0107 F7 D0      not ax             ; ...invert AX (0000 -> FFFF) and...
  CS:0109 A3 0E 00   mov [000E],ax      ; ...set CS to FFFF (FFFF:0000)
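
 The single-instruction variant mentioned above might be sketched as follows,
 assuming a 386 or better CPU. The 66h operand size prefix turns the word move
 into a dword move, so IP and segment are written at once.

  CS:0100 31 C0      xor ax,ax          ; Zero AX
  CS:0102 8E D8      mov ds,ax          ; Load 0000 as DS segment
  CS:0104 66 C7 06   mov dword ptr [000C],FFFF0000  ; Set INT 3 pointer to
          0C 00 00                                  ; FFFF:0000 in one go
          00 FF FF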

4.2.2 - Masking out hardware interrupts

 Disabling external hardware (maskable) interrupts, such as the keyboard's, is
possible via the Intel 8259 Programmable Interrupt Controller. Performing I/O
to the 8259 Interrupt Mask Register ports 21h (IRQs 0-7) and A1h (IRQs 8-15)
allows us to disable any Interrupt Request (IRQ) line we want and therefore any
device allocating an IRQ on the system. This is done by writing an 8-bit value
to either of these I/O ports, and the 8-bit value is bitmapped so that bit 0
stands for IRQ 0, bit 1 for IRQ 1 and so on. Setting a bit means that the
corresponding IRQ will be disabled, clearing a bit will enable the IRQ.
 The most important IRQs are IRQs 0 and 1 since they are the system timer and
keyboard IRQs respectively. Masking off IRQ 1 will lock any keyboard input but
advanced debuggers usually re-enable it before returning from code execution. A
complete list of standard PC IRQs is found in section 3.3.7, 'Common Interrupt
Request (IRQ) lines used by a PC machine'.
 Non-Maskable Interrupts (NMI) can also be disabled. On PC/XT class machines it
is done by writing 00h to port A0h (writing 80h enables NMI). On ATs and up,
NMI masking is done via the CMOS RAM/Real-Time Clock (RTC) port 70h. To disable
NMI a value with bit 7 set must be written to port 70h (clearing bit 7 enables
it). Note that port 71h should be read immediately after enabling or disabling
NMI via port 70h, or the RTC may be left in an unknown state!

 ABOUT THE EXAMPLES:
 -------------------
 The following examples demonstrate masking IRQs and NMI.
  In example #1 the keyboard is disabled using the Programmable Interrupt
 Controller.
  In example #2, NMI is disabled via the CMOS RAM/RTC port 70h on AT class
 machines. Since the I/O port is write-only, we'll simply write 80h.

 Example #1 (masking off keyboard IRQ):

  CS:0100 E4 21      in al,21           ; Read current value
  CS:0102 0C 02      or al,02           ; Set bit 1 (keyboard IRQ)
  CS:0104 E6 21      out 21,al          ; Write new value to disable keyboard

 Example #2 (disabling NMI):

  CS:0100 B0 80      mov al,80          ; Move a value with bit 7 set into AL
  CS:0102 E6 70      out 70,al          ; Write AL to port 70h and mask NMI
  CS:0104 E4 71      in al,71           ; To ensure proper RTC operation
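
 To undo the two examples, the same ports are written with the bits cleared.
 This is only a sketch of the reverse operation described in the text above.

  CS:0100 E4 21      in al,21           ; Read current mask value
  CS:0102 24 FD      and al,FD          ; Clear bit 1 (keyboard IRQ)
  CS:0104 E6 21      out 21,al          ; Write new value to enable keyboard
  CS:0106 B0 00      mov al,00          ; Move a value with bit 7 clear into AL
  CS:0108 E6 70      out 70,al          ; Write AL to port 70h and unmask NMI
  CS:010A E4 71      in al,71           ; To ensure proper RTC operation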

4.2.3 - Reprogramming Intel 8259 Programmable Interrupt Controller (PIC)

 As the name of the chip tells us, it's user-programmable. One of the things
set during the initialization procedure is the interrupt handler number that
will be called upon receipt of a certain interrupt request (IRQ). By
reprogramming the 8259, any current IRQ interrupt handler can be made obsolete.
This allows us to fool at least any Real Mode debugger, by redirecting IRQ 1
(Keyboard) to some other interrupt handler than the default INT 09 thus locking
keyboard from any program that assumes INT 09 to be called on any keystroke,
for example.

 [* need detailed info on this *]
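
 Until better information arrives, here is a rough, untested sketch. It assumes
 an AT class machine and uses the initialization sequence commonly documented
 for the master 8259 (ICW1-ICW4 written to ports 20h and 21h). Note that the
 chip takes one base interrupt number for all eight lines, so IRQ 1 cannot be
 moved alone: IRQs 0-7 all move together. The new base 60h is an arbitrary
 choice, and valid handlers must already be installed at INTs 60-67 or the
 machine will hang on the next timer tick.

  CS:0100 FA         cli                ; No interrupts while reprogramming
  CS:0101 E4 21      in al,21           ; Read current Interrupt Mask Register
  CS:0103 88 C4      mov ah,al          ; Save it in AH
  CS:0105 B0 11      mov al,11          ; ICW1: edge triggered, cascade, ICW4
  CS:0107 E6 20      out 20,al          ; Start initialization sequence
  CS:0109 B0 60      mov al,60          ; ICW2: base interrupt number (assumed)
  CS:010B E6 21      out 21,al          ; IRQs 0-7 now call INTs 60-67
  CS:010D B0 04      mov al,04          ; ICW3: slave 8259 attached to IRQ 2
  CS:010F E6 21      out 21,al          ; Write it
  CS:0111 B0 01      mov al,01          ; ICW4: 8086 mode
  CS:0113 E6 21      out 21,al          ; Write it
  CS:0115 88 E0      mov al,ah          ; Copy saved mask back to AL and...
  CS:0117 E6 21      out 21,al          ; ...restore Interrupt Mask Register
  CS:0119 FB         sti                ; Interrupts back on

 Writing the same sequence with 08h as ICW2 should restore the default
 mapping.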

4.2.4 - Disabling keyboard

 Sometimes it may prove useful to disable the keyboard until it is needed
again, to disallow any further tracing. There are several ways of doing it and
here they are, "all-in-one".
 The first method, and the most often used, is masking the keyboard interrupt,
IRQ 1.
It can be done via the Interrupt Mask Register, I/O port 21h, but unfortunately
advanced Real Mode debuggers usually re-enable it before returning from code
execution and it never works on Soft-ICE. For more information and an example,
see 4.2.2, 'Masking out hardware interrupts'. Note that issuing a CLI
instruction (clearing IF bit of FLAGS) will disable all external interrupts
thus effectively locking the keyboard also. However, that only works for
running code, disabling the debugger hotkey. A CLI locks out keyboard even from
Soft-ICE unless 'BREAK ON' command is entered in the debugger.
 [* info on Intel 8255 PPI will be added here some time *]
 The next method is to give commands to the Intel 8042 Keyboard Controller to
disable the keyboard interface, which works on ATs and up. Real Mode debuggers
usually don't expect _this_, so it should be used instead to disable keyboard!
Disabling keyboard via the 8042 chip is done by writing ADh (Disable Keyboard
Interface) to the 8042 Command Register, port 64h, which then drives the
keyboard clock line low. Keyboard interface of the 8042 can be re-enabled by
writing AEh (Enable Keyboard Interface) to port 64h. However, once again,
Soft-ICE is able to override this, too. Another way of doing the same trick is
to first write 60h (Write 8042 Command Byte) to port 64h and next write a byte
with bit 4 set to port 60h. Since the byte written is a bitmapped parameter for
8042 operation, the command byte should first be read, then the bit set and
finally the byte written back (see the example below).
 The most effective method, which completely freezes keyboard in most
applications including Soft-ICE whether tracing or running code, is to program
the keyboard directly (all bytes written to port 60h will be passed on to the
keyboard if the Keyboard Controller isn't expecting any data input for an 8042
command written to port 64h). All you need for this is an AT keyboard, PC/XT
keyboards cannot be programmed. The trick is simply to disallow scanning for a
keypress, which can be done by making the keyboard wait for further data from
the system. While waiting for parameters for a command, the keyboard won't
accept any keystrokes. The "official" way of doing this is to write F5h (Set
Default w/Disable) to port 60h which resets keyboard to default values and
waits for another command. To enable the keyboard, F4h (Enable Keyboard) must
be written to port 60h. Also writing one of the keyboard commands EDh, F0h,
F3h, FBh, FCh or FDh will make the keyboard stop scanning and wait for data
input. Writing 00h to port 60h in any of those cases will restart scanning, but
since that byte is supposed to be a parameter for the command issued before,
keyboard may behave oddly after writing it (keyboard LEDs flashing, remapped
keys, etc.). Not only direct keyboard commands but also some 8042 commands will
lock the keyboard without any obvious reason. However, they're not listed here;
you may test them out yourself by writing a byte to port 64h. The funny thing
is that if a 386 debugger, or any Protected Mode program, isn't aware of these
commands, it is quite impossible for it to prevent these kinds of lock-outs (in
Real Mode they will always work). And who would expect an innocent keyboard
command like 'Set Typematic Rate/Delay' to hang the debugger anyway? One might
think that a breakpoint on I/O to ports 60h or 64h could help, but no, think
back! Pressing a key while keyboard is enabled invokes the INT 09 (Keyboard)
handler, which in turn will determine the key pressed by performing some I/O to
ports 60h and 64h... or a program expecting keyboard input will anyway. Thus
access to those ports from a VM86 task cannot be permanently denied either.
Some advanced operating systems, such as Windows NT, intercept keyboard I/O and
this won't work there, though. For more information on those keyboard commands
I suggest you get the HelpPC utility (see Appendix B, 'Suggested reading for
info' at the bottom of this document).

 ABOUT THE EXAMPLES:
 -------------------
 All of these examples are devoted to disabling keyboard.
  Example #1 is for disabling keyboard I/O through the Intel 8255 Programmable
 Peripheral Interface and thus _only_ works on PC/XT machines.
  In example #2, Intel 8042 Keyboard Controller command 'Disable Keyboard
 Interface' is used to lock the keyboard.
  Example #3 shows another way of doing the same trick as in example #2. This
 directly modifies bit 4 in the 8042 Command Byte by first reading in its
 original value, then setting the bit and writing the byte back. However, I
 can't guarantee that this will work but according to HelpPC's database it
 should be the proper way. Therefore I strongly recommend using the direct
 command (example #2) instead.
  Example #4 demonstrates forcing keyboard to stop scanning for keystrokes by
 setting defaults and disabling scanning. Tracing through will not succeed with
 _any_ debugger! Keyboard will stop responding at the first OUT instruction and
 will be disabled until the second OUT.

 Example #1 (disabling 8255 keyboard interface):

  CS:0100 E4 61      in al,61           ; Read current 8255 Port B value
  CS:0102 0C 80      or al,80           ; Set bit 7 (PC/XT keyboard clear bit)
  CS:0104 E6 61      out 61,al          ; Write new value to disable keyboard

 Example #2 (using an 8042 control command):

  CS:0100 B0 AD      mov al,AD          ; ADh is 'Disable Keyboard Interface'
  CS:0102 E6 64      out 64,al          ; Write it to 8042 Command Register

 Example #3 (writing 8042 Command Byte):

  CS:0100 B0 20      mov al,20          ; 20h is 'Read 8042 Command Byte'
  CS:0102 E6 64      out 64,al          ; Write it to 8042 Command Register
  CS:0104 E4 60      in al,60           ; Read value from port 60h to AL and...
  CS:0106 0C 10      or al,10           ; ...set bit 4
  CS:0108 88 C4      mov ah,al          ; Move byte in AL to AH
  CS:010A B0 60      mov al,60          ; 60h is 'Write 8042 Command Byte'
  CS:010C E6 64      out 64,al          ; Write it to 8042 Command Register
  CS:010E 88 E0      mov al,ah          ; Copy byte in AH back to AL and...
  CS:0110 E6 60      out 60,al          ; ...write AL to port 60h

 Example #4 (stopping key scanning):

  CS:0100 B0 F5      mov al,F5          ; F5h is 'Set Default w/Disable'
  CS:0102 E6 60      out 60,al          ; Set defaults and stop key scanning
  CS:0104 B0 F4      mov al,F4          ; F4h is 'Enable Keyboard'
  CS:0106 E6 60      out 60,al          ; Restart key scanning
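
 One detail the examples above leave out: the 8042 may still be busy with a
 previous byte when a command is written. A common precaution, sketched here as
 an assumption rather than as something the examples require, is to poll bit 1
 (input buffer full) of the 8042 Status Register at port 64h first. This sketch
 re-enables the keyboard interface disabled in example #2.

  CS:0100 E4 64      in al,64           ; Read 8042 Status Register
  CS:0102 A8 02      test al,02         ; Bit 1 set = input buffer still full
  CS:0104 75 FA      jnz 0100           ; Wait until the 8042 is ready
  CS:0106 B0 AE      mov al,AE          ; AEh is 'Enable Keyboard Interface'
  CS:0108 E6 64      out 64,al          ; Write it to 8042 Command Register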

4.2.5 - Forcing a debugger to stop execution

 This method is a simple and easy one, you just have to put an INT 01 (used for
single-stepping) or INT 3 (software breakpoint) instruction in the middle of
your code. Every Real Mode debugger has these interrupts hooked to the debugger
itself and some 386 debuggers also recognize INT 3's in addition to hardware
breakpoints. During normal execution, no interruptions or problems will occur
but if a debugger is running the code, program will be stopped at each of those
instructions. Naturally the highest effectiveness is gained when used inside a
loop as the debugger would stop at every round.
 No examples given. You can decide yourself where to drop one of these
instructions.
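
 Should you want a starting point anyway, this sketch (the round count 0100 is
 arbitrary) stops a Real Mode debugger 256 times in a row. Under plain DOS the
 INT 01 vector points to an IRET, so nothing happens.

  CS:0100 B9 00 01   mov cx,0100        ; CX = number of rounds (assumed)
  CS:0103 CD 01      int 01             ; Debugger breaks in on every round
  CS:0105 E2 FC      loop 0103          ; Repeat until CX is zero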

4.2.6 - Using regular INT nn -style INT 3 calls

 Even though the single-byte INT 3 breakpoint instruction is a lot more common
than the two-byte INT 03 (opcode 0CDh 03h) regular interrupt call, the two-byte
version is also of some use. Real Mode debuggers seldom take possible INT 03's
into account since most software breakpoint routines seem to fail on two-byte
INT 03's. Many Real Mode debuggers only subtract _one_ byte from the IP pushed
into stack by the interrupt call when returning to debugger. Therefore, if an
INT 03 is used instead of an INT 3 for whatever purpose wanted, the debugger
would continue executing false instructions after the INT 03 if IP isn't fixed.
This happens on both Debug and Turbo Debugger, even the advanced Soft-ICE has
this bug when 'I3HERE' is set ON. Watcom Debugger also has INT 03 handler
problems but much worse: if an INT 03 instruction is executed while
single-stepping, it will start running code _continuously_. Furthermore,
setting a (software) breakpoint immediately after the INT 03 doesn't work!
 No examples this time. You could drop an INT 03 in a random location or do
just about anything your heart desires with them, there are no limits!
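
 Still, a tiny illustration of the IP problem might help. The RET after the
 INT 03 is just an assumed next instruction. The interrupt call pushes IP=0102,
 a debugger expecting the single-byte form subtracts one, and execution resumes
 in the middle of the INT 03.

  CS:0100 CD 03      int 03             ; Two-byte form, pushes IP=0102
  CS:0102 C3         ret                ; What should be executed next

 What the CPU decodes when the debugger resumes at 0101 instead:

  CS:0101 03 C3      add ax,bx          ; Operand byte 03 plus the RET opcode
  CS:0103 ...        ...                ; (whatever happens to follow)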

4.2.7 - Checking FLAGS

 Because single-stepping with a debugger requires the Trap Enable Flag (TF),
bit 8 of FLAGS, to be set, we could simply check its current state. If it is
set, some debugger is tracing the code since normally the TF bit is always
clear. Please note, though, that a debugger can fake the PUSHF (and most really
do) while single-stepping and therefore this may not work, but there are
ways... (see also section 4.1.1, 'Causing CPU to execute two instructions at a
time'). This works for Soft-ICE too, unless a 'BREAK ON' was issued; that
command allows Soft-ICE to intercept PUSHFs with the help of the CPU.

 ABOUT THE EXAMPLE:
 ------------------
 This is a plain example to show you how it works.

  CS:0100 9C         pushf              ; Push FLAGS onto stack
  CS:0101 58         pop ax             ; Pop FLAGS image to AX
  CS:0102 25 00 01   and ax,0100        ; Mask all other bits but bit 8 and...
  CS:0105 74 02      jz 0109            ; ...if result is AX=0000, proceed
  CS:0107 CD 20      int 20             ; Otherwise terminate and return to DOS
  CS:0109 ...        ...                ; (code continues here)
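
 One of those "ways" could be combining the check with the trick from 4.1.1, as
 in this sketch. Since no single-step trap occurs between a load of SS and the
 following instruction, the debugger gets no chance to fake the PUSHF. This has
 not been verified against any particular debugger.

  CS:0100 16         push ss            ; Push current SS onto stack and...
  CS:0101 17         pop ss             ; ...reload it (trap is held off)
  CS:0102 9C         pushf              ; Executed together with the POP SS
  CS:0103 58         pop ax             ; Pop FLAGS image to AX
  CS:0104 F6 C4 01   test ah,01         ; Test bit 8 (TF) of the image and...
  CS:0107 74 02      jz 010B            ; ...if it is clear, proceed
  CS:0109 CD 20      int 20             ; Otherwise terminate and return to DOS
  CS:010B ...        ...                ; (code continues here)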

4.2.8 - Modifying original interrupt handler routine
         (idea from Varicella-][, DOS virus)

 This is close kin to 'Modifying interrupt vectors' but no actual interrupt
table modification is involved. What we will do is to replace a part of the
original interrupt handler routine with that of our own. Inserting a simple
IRET, for example, to the address the vector points to would effectively kill
the whole interrupt routine. If an IRET is put on INTs 01 and 3, tracing and
setting breakpoints with a Real Mode debugger would call these interrupts only
to execute the IRET instruction and return to code. Very nice and effective way
of getting rid of a debugger.

 ABOUT THE EXAMPLE:
 ------------------
 This example plainly replaces the first byte of INT 01 handler routine with an
 IRET. As simple as that!

  CS:0100 31 C0      xor ax,ax          ; Zero AX
  CS:0102 8E D8      mov ds,ax          ; Load 0000 as DS segment
  CS:0104 8B 1E 04   mov bx,[0004]      ; Load INT 01 pointer IP into BX...
          00
  CS:0108 8E 1E 06   mov ds,[0006]      ; ...and CS to DS
          00
  CS:010C C6 07 CF   mov byte ptr [bx],CF   ; Replace first byte of INT 01
                                            ; handler with 0CFh (IRET)

4.2.9 - Foiling 'Step Over'/'Proceed' debugger commands, part 1 of 2

 While (P)roceeding, all debuggers must insert a breakpoint after certain
instructions in order to return to debugger after executing "one" of them.
These instructions are INTs, CALLs, LOOPs, instructions with a REP prefix, etc.
to name some but generally they are instructions that would require executing
multiple others if simply traced through. This offers anti-debugging code ways
of making a debugger's job a bit harder. One of them is to adjust the return
address saved by CALLs and INTs a bit. When the saved offset (IP) is
incremented by one, the RET or IRET will not jump back to the instruction
immediately following the subroutine call, but rather skips over the location
where a debugger would put the breakpoint. Thereafter code can execute
undisturbed. It is recommended that a NOP be placed after the CALL, or
whatever instruction is set to fool the debugger, just to hold either a
single-byte INT 3 or a hardware execution breakpoint. Also, you must make sure
not to jump back to the address where the debugger has set the breakpoint,
unless code overwrites the location later so that it is no longer the first
byte of any opcode (prefixes included); then not even a hardware breakpoint
would go off there. Since no ranges can be set for a hardware execution
breakpoint defined in debug registers, it doesn't trigger unless the address is
exactly at the start of an opcode.
 For more tricks that can fool a 'Step Over'/'Proceed' command, see 4.2.10,
'Faking a procedure call' and the second part of this trick, section 4.4.2.

 ABOUT THE EXAMPLE:
 ------------------
 This example basically just adjusts the return address on stack for a CALL by
 one byte. It makes the CALL avoid the breakpoint set by the debugger on return
 from the subroutine, if the user (P)roceeds instead of (T)racing.

  CS:0100 E8 03 00   call 0106          ; Call subroutine
  CS:0103 90         nop                ; Reserve a byte for a breakpoint
  CS:0104 CD 20      int 20             ; Terminate and return to DOS
  CS:0106 89 E5      mov bp,sp          ; Copy value from SP to BP
  CS:0108 FF 46 00   inc word ptr [bp+00]   ; Increment return IP value by one
  CS:010B C3         ret                ; Return from subroutine (to CS:0104)

4.2.10 - Faking a procedure call

 There are a number of tricks that can fool the 'Step Over'/'Proceed' debugger
command. Because a debugger expects execution to proceed at the instruction
following a procedure, such as a loop or subroutine call, it just places a
breakpoint (either an INT 3 instruction or a hardware execution breakpoint)
after the instruction that calls the procedure, and then runs code until the
breakpoint. Therefore it is very easy to fool _any_ debugger into actually
running the rest of the code.
 One method is to use a LOOP, CALL or an INT as a substitute for a regular jump
instruction, never returning to the instruction following the jump. The only
difference between a JMP and a LOOP is that a LOOP decrements (E)CX by one and
if (E)CX is zero after the operation, no jump will take place, so it could be
used as a combined DEC (E)CX and JNZ instruction. A CALL just stores the return
address (IP if near, CS:IP if far) on stack. An INT instruction will save FLAGS
and the CS:IP pointer, and it will also clear the IF and TF bits in FLAGS,
thus disabling external interrupts and single-stepping. Remember though not to
jump back to the location where a debugger would put its breakpoint unless you
want to get caught by the debugger.
 For more tricks that can fool a 'Step Over'/'Proceed' command, see both parts
of 'Foiling 'Step Over'/'Proceed' debugger commands', sections 4.2.9 and 4.4.2.

 ABOUT THE EXAMPLE:
 ------------------
 This is a simple example of snatching control from the debugger if the person
 debugging (P)roceeds instead of full (T)racing. The LOOP here is not used as
 it is intended to, but just to jump away.

  CS:0100 E2 01      loop 0103          ; Fake "loop"
  CS:0102 90         nop                ; Reserve a byte for a breakpoint
  CS:0103 B0 F5      mov al,F5          ; F5h is 'Set Default w/Disable'
  CS:0105 E6 60      out 60,al          ; Set defaults and stop key scanning
  CS:0107 CD 20      int 20             ; Terminate and return to DOS
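
 A CALL can be used the same way. In this sketch (the addresses are only for
 illustration) the return address is simply thrown away, so execution never
 comes back to the byte where a debugger would put its breakpoint.

  CS:0100 E8 01 00   call 0104          ; Fake "call", really just a jump
  CS:0103 90         nop                ; Reserve a byte for a breakpoint
  CS:0104 58         pop ax             ; Discard return address (0103)
  CS:0105 ...        ...                ; (code continues here)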

4.2.11 - Comparing INT 01 and 3 interrupt table entries
         (idea from Lock-Master, DOS executable encryption utility)

 Although the handlers for INTs 01 and 3 are similar in all debuggers (both
return back to the debugger), they usually are separate routines because of the
slightly different purpose they're for. Of course, an intelligent routine
suitable for both interrupts could be written, but the issue here is that it is
much easier to write two separate routines than one with a couple of additional
checks, therefore INT 01 and 3 handler CS:IPs must also differ. However, since
DOS simply points both of these interrupts to the same IRET, it is possible to
check if the IPs of these interrupts match. If they are unequal, there is most
certainly a Real Mode debugger or a similar program running. Alternatively the
first instruction of the handler routine could be checked against an IRET
(opcode 0CFh), but I believe comparing IPs is enough.
 As a side note, this could be very useful in conjunction with encrypted code.
When the handlers' IPs are subtracted from each other the result should be zero
if no Real Mode debugger is running (see the example below). You could try
adding the result to the decryption key thus causing incorrect decryption if a
debugger is present.

 ABOUT THE EXAMPLE:
 ------------------
 This example loads the IPs of INTs 01 and 3 into AX and BX. After this BX is
 subtracted from AX (used as a substitute for CMP here). The result being
 non-zero indicates that they have separate handler routines and in such a case
 program is terminated.

  CS:0100 31 C0         xor ax,ax       ; Zero AX
  CS:0102 8E D8         mov ds,ax       ; Load 0000 as DS segment
  CS:0104 A1 04 00      mov ax,[0004]   ; Load INT 01 pointer IP into AX and...
  CS:0107 8B 1E 0C 00   mov bx,[000C]   ; ...INT 3 pointer IP into BX
  CS:010B 29 D8         sub ax,bx       ; Subtract BX from AX and...
  CS:010D 74 02         jz 0111         ; ...if result is AX=0000, proceed
  CS:010F CD 20         int 20          ; Otherwise terminate and return to DOS
  CS:0111 ...           ...             ; (code continues here)
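
 The decryption key idea could be sketched like this. The key location
 CS:0150 is hypothetical. No comparison or conditional jump is involved: if
 the IPs match, zero is added and the key stays valid.

  CS:0100 31 C0         xor ax,ax       ; Zero AX
  CS:0102 8E D8         mov ds,ax       ; Load 0000 as DS segment
  CS:0104 A1 04 00      mov ax,[0004]   ; Load INT 01 pointer IP into AX and...
  CS:0107 2B 06 0C 00   sub ax,[000C]   ; ...subtract INT 3 pointer IP from it
  CS:010B 2E 01 06 50   add cs:[0150],ax    ; Add result to decryption key
          01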

4.2.12 - Using stack to fool a debugger

 Now _this_ is a powerful multi-purpose trick. It's based on the fact that
single-stepping and software breakpoints (generally all interrupts and
exceptions) use the same stack as the user program to store data, and it cannot
be overridden in Real Mode. Therefore, using stack to decrypt encrypted data or
moving code to the location where it will be executed, for example, will not
work properly if a Real Mode debugger traces through the code. You could also
set the stack in the middle of your code and make the stack pointer SP point to
a location a few instructions ahead every little while. If SS:SP points to code
but the location isn't in the immediate vicinity of CS:IP, the user could
actually go for this one without noticing that something vital is missing...
Possibilities are once again limitless. The critical addresses in stack are the
next three words (6 bytes), where the Single-Step Trap will store flags and
return address. This is the 16-bit mode, default in Real Mode/VM86, and SP-6
will be the last byte affected. Some debuggers, such as Turbo Debugger, may
also store some junk of their own on the program's stack in addition to the
data the interrupt call writes. Unless a debugger uses special tricks to retain
the stack, these kinds of anti-debugging tricks work fine. You just have to remember
to disable all external interrupts with a CLI since they also use stack when
calling interrupt handlers and thus may interfere as well. Also, stack should
not be used while SS:SP is in code unless it is supposed to be self-modifying.
 Unfortunately this only works against 8086 debuggers, once again. Because of
the slightly different way interrupts are handled in Protected Mode, no user
program stack modifications take place thus rendering all these tricks useless.

 ABOUT THE EXAMPLE:
 ------------------
 This is one possibility of using stack. The example first saves original
 SS:SP, sets up stack in the middle of code and then restores original stack
 pointer. Normally this code would jump over the 'INT 20' but tracing will
 overwrite the JMP instruction, so program would return to DOS. The NOPs are
 there only to stuff some bytes in case the data written by the interrupt call
 would form a long instruction. Remember a CLI, too!

  CS:0100 8C D0      mov ax,ss          ; Save original SS in AX and...
  CS:0102 89 E3      mov bx,sp          ; ...SP in BX (stack cannot be used!)
  CS:0104 0E         push cs            ; Save code segment CS on stack and...
  CS:0105 17         pop ss             ; ...load it as new stack segment SS
  CS:0106 BC 0B 01   mov sp,010B        ; Next PUSH to stack will overwrite...
  CS:0109 EB 04      jmp 010F           ; ...this instruction.
  CS:010B 90         nop                ; Stuff a byte
  CS:010C 90         nop                ; Stuff a byte
  CS:010D CD 20      int 20             ; Terminate and return to DOS
  CS:010F 8E D0      mov ss,ax          ; Restore original SS and...
  CS:0111 89 DC      mov sp,bx          ; ...then SP

4.2.13 - Generating a General Protection Fault or a Stack Fault

 Exception #13 (General Protection Fault) is actually a Segment Overrun
Exception in Real Mode and VM86. The segment limit in both of these modes is
0FFFFh, and if either a non-byte (eg. word) memory reference beyond the limit
is made or execution is attempted beyond the limit (for example, first byte of
an instruction is at 0FFFFh and the last byte at 0000h), exception #13 (0Dh)
will be generated. Therefore, we could point INT 0D to a location where we'd
like to continue code execution after generating a GPF. Of course, the vector
should also be restored or else... This works well in Real Mode, but any VM86
control program (a 386 debugger would be one) will catch this exception and
usually aborts execution with a severe error.
 Exception #12 (Stack Fault) also works the same way: POPping beyond segment
(upper) limit will generate INT 0C. This could be used similarly to a GPF but
it also has the exact same problems.
 Remember that only 286 processors and above will generate these exceptions; an
8086 CPU just happily executes the instructions causing them...
 This is pretty much the same as 4.3.3, 'Fooling TD386 Virtual-86 Driver', but
unlike that division by zero trick, these variants do not work properly with
_any_ VM86 environment. Read the warning below!

WARNING: Although no trouble will be encountered using this trick in a true
         Real Mode environment, not only 386 debuggers but also most software
         requiring Protected Mode (and running DOS in Virtual 8086 Mode) will
         intercept exception #13 before the VM86 routine is run and thus it's
         not advisable to use this trick (but it's here anyway for those who
         wish to reduce compatibility of their programs ;). Examples of such
         software are QEMM386, EMM386 and Windows in 386 Enhanced Mode (MS-DOS
         Prompt).

 ABOUT THE EXAMPLES:
 -------------------
 These examples mainly show how to generate exception #12 or #13 in Real Mode
 or Virtual 8086 Mode. None of the examples restore interrupt vectors but it
 should be done to prevent possible machine lock-up.
  Example #1 causes a General Protection Fault with a memory reference.
  Example #2 causes a GPF by executing code beyond 0FFFFh limit. However,
 interrupt vector modification isn't shown.
  Example #3 causes a Stack Fault.

 Example #1 (memory reference beyond limit):

  CS:0100 31 C0      xor ax,ax          ; Zero AX
  CS:0102 8E D8      mov ds,ax          ; Load 0000 as DS segment
  CS:0104 C7 06 34   mov word ptr [0034],0111   ; Set INT 0D pointer IP to 0111
          00 11 01                              ; and...
  CS:010A 8C 0E 36   mov [0036],cs      ; ...CS to current CS (CS:0111)
          00
  CS:010E A1 FF FF   mov ax,[FFFF]      ; GPF! (tries to read 0FFFFh-0000h)
  CS:0111 ...        ...                ; (code continues here)

 Example #2 (executing beyond limit):

  CS:FFFF 31 C0      xor ax,ax          ; GPF! (instruction at 0FFFFh-0000h)

 Example #3 (causing a Stack Fault):

  CS:0100 31 C0      xor ax,ax          ; Zero AX
  CS:0102 8E D8      mov ds,ax          ; Load 0000 as DS segment
  CS:0104 C7 06 30   mov word ptr [0030],0112   ; Set INT 0C pointer IP to 0112
          00 12 01                              ; and...
  CS:010A 8C 0E 32   mov [0032],cs      ; ...CS to current CS (CS:0112)
          00
  CS:010E BC FF FF   mov sp,FFFF        ; Set SP to FFFF (next POP will wrap!)
  CS:0111 58         pop ax             ; Stack Fault!
                                        ; (tries POPping from 0FFFFh-0000h)
  CS:0112 ...        ...                ; (code continues here)

4.2.14 - Exploiting rapidly changing memory areas

 Taking advantage of such memory areas as display RAM (located at A0000-BFFFF),
especially the EGA/VGA text mode memory at B8000-BFFFF, or the timer counter
dword at 0040:006C updated by INT 08 (system timer handler) 18.2 times/sec, may
be of some help. Moving program code first into such a memory area and then
decrypting or executing it there, for example, probably causes at least some
trouble with any Real Mode debugger when code is traced through. Perhaps
copying a custom interrupt handler routine into video memory would be something
to try out?
 Code that uses display memory as a data or code storage area will not work
under any debugger that doesn't restore the original program screen while
executing code, because instructions accessing video memory would only "see"
the debugger's screen, not the one they're supposed to. Even Soft-ICE doesn't
swap video memory unless told to; swapping can be enabled with 'FLASH ON'.
Remember though that if a video RAM region is not in use by the current display
mode, those memory locations cannot be modified!
 If the timer counter memory location is modified (after disabling either the
timer (IRQ 0) or all INTR signals with a CLI), the time spent in a debugger
will be enough for the timer to update the counter again, unless the debugger
also disables the timer. Soft-ICE (in its debugger screen) always masks all
interrupts but the keyboard's, and the user cannot re-enable the timer
interrupt. Knowing this, it's possible to build a trick that relies on the
timer counter being updated.

 ABOUT THE EXAMPLE:
 ------------------
 These examples demonstrate taking advantage of display memory and the timer
 counter dword.
  Example #1 uses video RAM to try to detect a debugger. It writes a word to
 the beginning of text mode display memory and then checks for any changes in
 the word. If the word read back doesn't equal the original one, some debugger
 is present.
  [* another example will be supplied soon *]

 Example #1 (testing with display card memory):

  CS:0100 B8 00 B8      mov ax,B800     ; Load B800 into AX and...
  CS:0103 8E D8         mov ds,ax       ; ...then as DS segment (text mode)
  CS:0105 A3 00 00      mov [0000],ax   ; Move a word to B800:0000
  CS:0108 3B 06 00 00   cmp ax,[0000]   ; Compare read value to original and...
  CS:010C 74 02         je 0110         ; ...if they match, proceed with code
  CS:010E CD 20         int 20          ; Otherwise terminate and return to DOS
  CS:0110 ...           ...             ; (code continues here)

 [* another example will be supplied soon *]
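
 Until then, the following is only a rough sketch of how the timer counter
 could be used; it is not the example promised above and has not been tested
 against any particular debugger. Instead of modifying the counter, it simply
 reads the low word at 0000:046C (the same location as 0040:006C) twice with
 interrupts disabled. In a normal run INT 08 can't be served between the two
 reads, so the values match. If someone traces through the reads with a
 debugger that lets the timer run while its own screen is up, at least one tick
 (about 55 ms) will most likely have passed. Debuggers that mask the timer,
 such as Soft-ICE in its debugger screen, would not be caught by this.

 Sketch (testing with the timer counter):

  CS:0100 31 C0         xor ax,ax       ; Zero AX
  CS:0102 8E D8         mov ds,ax       ; Load 0000 as DS segment
  CS:0104 FA            cli             ; INT 08 can't be served from now on
  CS:0105 A1 6C 04      mov ax,[046C]   ; Read timer counter low word
  CS:0108 3B 06 6C 04   cmp ax,[046C]   ; Read it again and compare
  CS:010C FB            sti             ; Enable interrupts (ZF is unaffected)
  CS:010D 74 02         je 0111         ; If no tick was seen, proceed with code
  CS:010F CD 20         int 20          ; Otherwise terminate and return to DOS
  CS:0111 ...           ...             ; (code continues here)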

4.2.15 - Storing data in the interrupt table area

 This is pretty similar to 4.2.14, 'Exploiting rapidly changing memory areas'.
The contents of the interrupt table could be copied to a safe place and then
the 1KB memory area could be used as data storage space for decryption, for
instance. Assuming that no interrupts are called while making use of the table
area, everything runs fine. Messing with the table will make the computer hang
sooner or later if any interrupts are called, thus the table _has to_ be
restored after use.
 No examples will be shown this time.
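
 For readers who want a starting point anyway, the save and restore part could
 be sketched roughly as below. This is an untested illustration under several
 assumptions: the program is a small .COM file, the 1KB at CS:0200-05FF is free
 to be used as a save buffer, and the routine at CS:0130 (purely hypothetical,
 not shown) is the one that uses 0000:0000-03FF as storage and never enables
 interrupts. Note that a CLI does not stop an NMI or a CPU exception, so those
 would still crash the machine while the table is in use.

  CS:0100 FA         cli                ; No INTR signals while table is in use
  CS:0101 FC         cld                ; Make string operations count upwards
  CS:0102 31 C0      xor ax,ax          ; Zero AX
  CS:0104 8E D8      mov ds,ax          ; Source segment 0000 (interrupt table)
  CS:0106 0E         push cs            ; Destination segment will be...
  CS:0107 07         pop es             ; ...the current CS
  CS:0108 31 F6      xor si,si          ; Source offset 0000
  CS:010A BF 00 02   mov di,0200        ; Save buffer at CS:0200 (assumed free)
  CS:010D B9 00 02   mov cx,0200        ; 0200h words = 1KB
  CS:0110 F3 A5      rep movsw          ; Copy the table into the save buffer
  CS:0112 E8 1B 00   call 0130          ; Hypothetical user of 0000:0000-03FF
  CS:0115 FC         cld                ; (the routine may have changed DF)
  CS:0116 0E         push cs            ; Source segment will be...
  CS:0117 1F         pop ds             ; ...the current CS
  CS:0118 31 C0      xor ax,ax          ; Zero AX
  CS:011A 8E C0      mov es,ax          ; Destination segment 0000
  CS:011C BE 00 02   mov si,0200        ; Source is the save buffer
  CS:011F 31 FF      xor di,di          ; Destination offset 0000
  CS:0121 B9 00 02   mov cx,0200        ; 0200h words = 1KB
  CS:0124 F3 A5      rep movsw          ; Copy the saved table back
  CS:0126 FB         sti                ; Table is valid again, allow interrupts
  CS:0127 CD 20      int 20             ; Terminate and return to DOS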

4.3 - Special tricks
--------------------
 These attacks are aimed against specific debuggers, and each will not usually
work on any debugger other than the one it was designed for, because the tricks
are based on implementation bugs or otherwise weak points found in debuggers.
Therefore certain debugger programmers may be interested in reading this
section...

4.3.1 - Jumping to a location within an instruction

 We could confuse the person debugging a bit by jumping to a location in the
middle of an opcode while another instruction is hidden there. Note that this
only works on full screen debuggers such as Turbo Debugger, which decode
instructions on screen in advance (although Watcom Debugger is a full screen
debugger, it always re-decodes instructions if a jump to an undecoded address
is encountered). Jumping to an address within an already decoded instruction
(that is on screen) usually causes the debugger _not_ to re-decode the
instruction CS:IP points to, and thus the user won't know what's going on until
the debugger screen is reset. On "step"-type debuggers, which only decode the
next instruction to be executed, this will not do anything. It's important to
note that this method will _not_ affect program execution in any way! The only
purpose of this is to try to confuse the user, not the debugger.
 BUT (!), if you find a good pair of instructions that fit within each other,
using such a pair might make their removal more difficult than if they were
used separately. Usually the size of the opcode is not the problem: some 32-bit
instructions which have both a memory pointer and an immediate as 32-bit
operands can be 10 bytes long or even longer. The trouble arises from how to
put the two instructions together. Just remember that if the instruction that
contains the hidden instructions gets executed, any memory reference as its
destination operand could mess up some vital code.
 A bit more sophisticated method of using nested instructions would be register
or memory value modification. Let's assume we have to XOR a value and we also
want to hide another instruction within the XOR instruction. In such a case the
other opcode could be disguised as an immediate value of a XOR and then only a
jump there would be needed. This makes the hidden instruction more difficult to
be changed because changing it would also affect the value that will be XORed.
And if this is a part of a decryption loop... Well, you know what would happen.
;)

 ABOUT THE EXAMPLES:
 -------------------
 All of these are ways of concealing an instruction within another opcode.
  Example #1 just demonstrates the principle. It will first clear the IF bit
 in FLAGS and then halt the CPU until an NMI or a reset signal occurs. A signal
 to the CPU's INTR pin would also bring the processor out of halt, but since
 such signals are ignored because of the CLI, only those mentioned before will
 do it.
  Example #2 is a bit more complicated since it first executes the visible
 layer of instructions (MOV and JMP), then jumps back into the middle of the
 MOV opcode and starts running code from the second instruction layer hidden
 inside the first one. If used as a trick, all of the instructions should do
 something essential for further execution of the code. In this example, the
 instructions are a _bit_ far fetched, just some trivial debugger "traps"...
  Example #3 is a small application of this technique and shows a nice way of
 hiding a jump in a MOV instruction (could be any other as well). Here the
 register value is retained but you could make your program depend on the value
 loaded into AX.

 Example #1 (the principle):

  CS:0100 EB 01      jmp 0103           ; Jump to the hidden instructions
  CS:0102 A3 FA F4   mov [F4FA],ax      ; A fake instruction, see below
  -------
 *CS:0103 FA         cli                ; Disable INTR signals
 *CS:0104 F4         hlt                ; Halt CPU

 Example #2 (executing instructions in two layers):

  CS:0100 C7 06 CD   mov word ptr [01CD],C2CC   ; A fake instruction, see below
          01 CC C2
  CS:0106 EB FA      jmp 0102           ; Jump to the hidden instructions
  -------
 *CS:0102 CD 01      int 01             ; Single-step interrupt
 *CS:0104 CC         int 3              ; Software breakpoint interrupt
 *CS:0105 C2 EB FA   ret FAEB           ; Add FAEBh to SP and return from call

 Example #3 (hiding a jump smoothly):

  CS:0100 50         push ax            ; Save AX
  CS:0101 B8 EB 03   mov ax,03EB        ; A fake instruction, see below
  CS:0104 58         pop ax             ; Restore AX
  CS:0105 EB FB      jmp 0102           ; Jump to the hidden instruction
  CS:0107 ...        ...                ; (code continues here)
  -------
 *CS:0102 EB 03      jmp 0107           ; Proceed with code
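
 The XOR idea described earlier in this section could look something like the
 sketch below. The values are picked for illustration only. The immediate
 operand of the XOR (02EBh) is at the same time the hidden 'jmp 0105', so
 patching the hidden jump would also change the value AX gets XORed with. The
 code following CS:0105 is assumed to depend on the resulting AX.

 A further sketch (hiding a jump in a XOR immediate):

  CS:0100 35 EB 02   xor ax,02EB        ; Does real work, but see below
  CS:0103 EB FC      jmp 0101           ; Jump to the hidden instruction
  CS:0105 ...        ...                ; (code continues here)
  -------
 *CS:0101 EB 02      jmp 0105           ; Proceed with code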

4.3.2 - Exploiting Turbo Debugger's weak point

 As odd as it sounds, Turbo Debugger doesn't retain Interrupt Mask Register
status for IRQs 0 (timer) and 1 (keyboard). When code is traced, Turbo Debugger
masks out those IRQs at each step but running will leave them as they are.
However, since TD enables them always when returning from code, they will
inevitably be enabled... The 'Step Over' function combines both of these, so
when a subroutine call is encountered, 'Step Over' will act as a "normal" 'Run'
command, otherwise as 'Trace' masking IRQs 0 and 1.
 This could be useful if code checks whether or not Interrupt Mask Register
(primary, port 21h) contents have been changed since last update. Of course,
since _running_ code usually leaves timer and keyboard IRQs enabled only once,
when started, and then they could be masked out freely during execution, it's
better to test if the _tracing_ condition is true (ie. IRQs 0 and 1 have been
masked out). But anyway, both of them can be detected if wanted.

 ABOUT THE EXAMPLE:
 ------------------
 In this example, the value of the primary Interrupt Mask Register is read,
 bit 0 is cleared and bit 1 set to enable/disable their respective IRQs, and
 then written back. The final step is to read port 21h value again (by now both
 timer and keyboard would be masked out if someone was tracing with Turbo
 Debugger) and test the status of bits 0 and 1. As an extra feature this
 example disables the keyboard.

  CS:0100 E4 21      in al,21           ; Read value from port 21h to AL
  CS:0102 24 FE      and al,FE          ; Clear bit 0 and...
  CS:0104 0C 02      or al,02           ; ...set bit 1
  CS:0106 E6 21      out 21,al          ; Write the value back to port 21h
  CS:0108 86 C4      xchg al,ah         ; Exchange AL and AH register contents
  CS:010A E4 21      in al,21           ; Read the value of port 21h again
  CS:010C 38 C4      cmp ah,al          ; Compare AL value to orig. AH and...
  CS:010E 74 02      je 0112            ; ...if they match, proceed with code
  CS:0110 CD 20      int 20             ; Otherwise terminate and return to DOS
  CS:0112 ...        ...                ; (code continues here)

4.3.3 - Fooling TD386 Virtual-86 Driver

 This method is based on the fact that Turbo Debugger's V8086 module (TD386)
does not use the INT 00 routine pointed to by the VM86 task's interrupt table
whenever a division by zero takes place, but rather its own handler routines.
TD386's own routine just aborts execution (returns back to debugger) and
reports about a faulty division ignoring the INT 00 handler in VM86! CS:IP will
therefore have to be manually set to the proper value. Normally in Real Mode or
Virtual 8086 Mode (usually the Protected Mode program that set up VM86 gives
control to VM86 routines when an interrupt handler is called either by an
exception, if it isn't a fatal one, or an INT nn instruction) the INT 00
routine will be called after dividing by zero. So, what we could do is to point
INT 00 vector to the next instruction, for example, to recover from a division
fault. This is a good way of exploiting TD386's weakness without degrading
compatibility with other VM86 environments. Remember to restore the original
INT 00 vector though, or the next INT 00 call will hang the computer.
 This makes a nice trick on Real Mode debuggers, too, but please note that
stack pointer SP is modified when INT 00 is called. The next three POPs will be
the CS:IP of the faulty (I)DIV instruction and FLAGS image, then the stack will
be as it was before the division.
 It is to be noted that some other interrupts will also act like the divide
error. These are INTs 02 (NMI) and 3 (software breakpoint) and TD386 will under
no circumstances run the actual VM86 routine. INT 01 VM86 routine will be
called if the source of the call was an INT instruction, but exception #1
caused by a set Trap Enable Flag will return you to TD386... There are also
some other exceptions, such as #6 (Invalid OP-Code), #12 (Stack Fault) and #13
(General Protection Fault), which will make TD386 pop up if an INT instruction
was _not_ their original cause.

 ABOUT THE EXAMPLE:
 ------------------
 This example changes the INT 00 vector to point to the instruction following
 the faulty division and then divides by zero. Nothing more! What this doesn't
 show is restoring the original vector.

  CS:0100 31 C0      xor ax,ax          ; Zero AX
  CS:0102 8E D8      mov ds,ax          ; Load 0000 as DS segment
  CS:0104 C7 06 00   mov word ptr [0000],0110   ; Set INT 00 pointer IP to 0110
          00 10 01                              ; and...
  CS:010A 8C 0E 02   mov [0002],cs      ; ...CS to current CS (CS:0110)
          00
  CS:010E F7 F0      div ax             ; Divide by zero
  CS:0110 ...        ...                ; (code continues here)
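
 Restoring the vector could be done along the lines of the sketch below. It is
 only one possible way and untested: the original INT 00 pointer is kept on the
 stack, and the three words left on the stack by the divide error are simply
 discarded. It is assumed that interrupts were enabled before the division,
 which is why an STI is used at the end.

 A possible variation (saving and restoring the INT 00 vector):

  CS:0100 31 C0      xor ax,ax          ; Zero AX
  CS:0102 8E D8      mov ds,ax          ; Load 0000 as DS segment
  CS:0104 FF 36 00   push word ptr [0000]   ; Save original INT 00 pointer IP
          00                                ; and...
  CS:0108 FF 36 02   push word ptr [0002]   ; ...CS on stack
          00
  CS:010C C7 06 00   mov word ptr [0000],0118   ; Set INT 00 pointer IP to 0118
          00 18 01                              ; and...
  CS:0112 8C 0E 02   mov [0002],cs      ; ...CS to current CS (CS:0118)
          00
  CS:0116 F7 F0      div ax             ; Divide by zero
  CS:0118 83 C4 06   add sp,06          ; Discard IP, CS and FLAGS of INT 00
  CS:011B 8F 06 02   pop word ptr [0002]    ; Restore original INT 00 pointer CS
          00                                ; and...
  CS:011F 8F 06 00   pop word ptr [0000]    ; ...IP
          00
  CS:0123 FB         sti                ; INT 00 cleared IF, so set it again
  CS:0124 ...        ...                ; (code continues here)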

4.3.4 - Using INT 01's to make Soft-ICE gag

 When an INT 01 is issued, either as the INT 01 instruction or as the
undocumented 0F1h opcode, Soft-ICE will beep a couple of times. Note that if
the INT 01 instruction was used, Soft-ICE will beep only if 'BREAK' is OFF (if
a 'BREAK ON' command is entered, it won't happen). However, the undocumented
0F1h opcode will make the beep occur even when 'BREAK' is ON. This is because
the single-byte opcode is IOPL insensitive _and_ the "bug" that causes Soft-ICE
to beep is only in the INT 01 handler; the V86 monitor INT 01 routine is OK
(can't imagine why).
 It would be easy to put an INT 01 in a loop, possibly a self-checking loop
repeating hundreds of times. Doing this would keep the computer busy beeping
for a long time because each INT 01 will cause about 0.5 sec. delay... Computer
speed has no effect on the time each beep takes, and since beeping the beeper
will halt any other processing the user would have hard time trying to get back
to his dear Soft-ICE if he accidentally launches the loop. :) Not to mention
that Real Mode debuggers stop at each of the INT 01's...
 Note that exception #1 caused by a single-step trap doesn't have this effect
when Soft-ICE is loaded.

 ABOUT THE EXAMPLE:
 ------------------
 This is just a (too) simple example of using INT 01 to beep. This loop must be
 embedded in some other routine which, for example, checks the integrity of the
 code to make removal harder. Otherwise it's of no use...

  CS:0100 B9 FF FF   mov cx,FFFF        ; Whoa! Going to loop 65535 times!
  CS:0103 CD 01      int 01             ; Beep-beep (could also be opcode 0F1h)
  CS:0105 E2 FC      loop 0103          ; Loop to CS:0103 until CX becomes 0

4.3.5 - Using self-tracing to fool Soft-ICE

 How about using self-tracing code as an anti-debugging feature against
Soft-ICE? When tracing in the debugger screen, Soft-ICE clears the Trap Enable
Flag after each instruction, including the exception #1 handler IRET. Even when
manually set in Soft-ICE, the Trap Enable Flag will not generate an INT 01 trap
and neither will the TF bit be retained if Soft-ICE is exited to run code at an
active (enabled) execution breakpoint (otherwise TF bit will be set)! If INT 01
routine is designed to modify the code being executed, the code wouldn't run
properly under Soft-ICE if debugger screen is entered while self-tracing. To
trace such code will require multiple commands for one single instruction and
thus would be _very_ annoying to trace lengthy code (the commands can be
defined as macros but using a decryption routine supporting variable-length
instructions makes it harder to use even them since there is no way to make
Soft-ICE set a breakpoint after the instruction at CS:IP)... Sounds like fun?
;) (for an example, see 4.4.6, 'The Running Line')
 There's also another quirk to Soft-ICE: without any special tricks
self-tracing code will disable _any_ memory access hardware breakpoint that
would otherwise be triggered by the code being self-traced. Note that it's
possible to set a breakpoint to be triggered by the INT 01 routine if it's used
for self-modification, though. This quirk is an implementation bug in Soft-ICE
and may not work in other 386 debuggers. Since the CPU checks for exception #1
traps caused by either a single-step via Trap Enable Flag or data hardware
breakpoints at the same time, the condition for both of these is true. The
proper procedure would be to invoke the hardware breakpoint handler before the
VM86 Single-Step Trap, but how unfortunate for anyone debugging that Soft-ICE
chooses to serve the Single-Step Trap first thus suppressing the data
breakpoint... Therefore simply pointing INT 01 to an IRET and setting the TF
bit in FLAGS would be sufficient to disable any memory access breakpoint in the
code being traced!
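
 A minimal sketch of that last idea follows. It is untested and makes the usual
 simplifications: the original INT 01 vector is not saved or restored, and the
 handler is nothing but an IRET placed at CS:0102.

 Sketch (INT 01 pointed to an IRET, then TF set):

  CS:0100 EB 01      jmp 0103           ; Skip over the INT 01 handler
  CS:0102 CF         iret               ; INT 01 handler: return immediately
  CS:0103 31 C0      xor ax,ax          ; Zero AX
  CS:0105 8E D8      mov ds,ax          ; Load 0000 as DS segment
  CS:0107 C7 06 04   mov word ptr [0004],0102   ; Set INT 01 pointer IP to 0102
          00 02 01                              ; and...
  CS:010D 8C 0E 06   mov [0006],cs      ; ...CS to current CS (CS:0102)
          00
  CS:0111 9C         pushf              ; Get FLAGS image...
  CS:0112 58         pop ax             ; ...into AX
  CS:0113 80 CC 01   or ah,01           ; Set TF (bit 8 of FLAGS)
  CS:0116 50         push ax            ; Put the modified image back and...
  CS:0117 9D         popf               ; ...load it into FLAGS
  CS:0118 ...        ...                ; (self-traced code continues here)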

4.3.6 - Screwing up Soft-ICE with back door commands

 Since Soft-ICE versions 2.50 and up offer programs back door commands to
execute Soft-ICE commands and manipulate breakpoints among other things, these
could be used against Soft-ICE itself. Disabling all breakpoints or even
unloading Soft-ICE wouldn't be a problem for anti-debugging code. (For a more
detailed description see section 3.3.8, 'Back door commands in Soft-ICE!')
 If you decide to execute commands via a back door command, note that trying to
lock up keyboard in Soft-ICE won't work because the routine, which returns
control to Soft-ICE when either the key combination is pressed or a breakpoint
has occurred, enables keyboard from the Programmable Interrupt Controller. To
lock keyboard (and everything else) in Virtual 8086 Mode, enter 'OB 21 FF'
(only this works), and typing 'OW 21 FFFF' would lock up Soft-ICE screen. But
as already mentioned, keyboard cannot be locked out from Soft-ICE. Even if the
VM86 task can't use the keyboard, the correct key combination will always pop
up the debugger screen. Any other Soft-ICE commands should work, though,
especially those switching operation modes such as 'ACTION' (specifies action
after breakpoint has been reached, valid parameters are any interrupt number in
Virtual 8086 Mode or 'HERE' to return to Soft-ICE) or rebooting computer with
'HBOOT'. Calling a back door won't bring up the debugger screen, but when
entering commands, remember that they also appear on the debugger screen when
the user returns... Another way of crashing Soft-ICE with the 'Do a Soft-ICE
command' is to put a single null character (00h) where the command to be
executed should reside. This totally hangs Soft-ICE, probably because the
command string gets terminated before even one Carriage Return (0Dh) is
detected (which would issue the command preceding it).
 As a side note, there is an "undocumented" command in Soft-ICE which may
cause the debugger to act erratically and lock up. This powerful command is
'CMx' (replace 'x' with a hex value in the range 0-F). The original purpose of
this command was to change the megabyte under which the memory dump window
works, and it probably is a remnant from Soft-ICE development phase. However,
the command doesn't check for the amount of RAM installed. Therefore, exceeding
the physical memory limit will make Soft-ICE try to read from a non-existent
address thus causing a General Protection Fault. The fatal thing here is that
Soft-ICE _itself_ causes the GPF, therefore recovery is impossible (exception
handler routines not prepared for this).
 The "command" can _only_ crash Soft-ICE when the memory dump window ('WD'
enables/disables) is enabled and the target machine has less than 16MB of
system RAM installed, but when it works, Soft-ICE will be _totally_ screwed up
and a cold boot is required to get out of the debugger screen! To cover most
systems with less than 16MB of RAM, issuing the command 'CMF' is recommended.
By executing this command through a back door command, a program could crash
Soft-ICE (see the example below).

 ABOUT THE EXAMPLE:
 ------------------
 Since this is just a simplified example, it must be noted that setting those
 AX, SI and DI register values just before INT 3 looks very suspicious. They
 should be loaded, if at all possible, well before executing the INT 3 (and
 well hidden) to make it look as if the INT 3 instruction is just a regular
 Real Mode debugger trap. Also, if an instruction is put immediately after the
 INT 3 which triggers the back door, no execution breakpoint set to that
 address will interrupt code execution.

  CS:0100 B8 11 09      mov ax,0911     ; Back door function number into AX
  CS:0103 BE 47 46      mov si,4647     ; "Magic value" #1 into SI
  CS:0106 BF 4D 4A      mov di,4A4D     ; "Magic value" #2 into DI
  CS:0109 0E            push cs         ; Just to make sure we have...
  CS:010A 1F            pop ds          ; ...the correct DS
  CS:010B BA 11 01      mov dx,0111     ; Command string starts at DS:DX
  CS:010E CC            int 3           ; Go for it and screw up Soft-ICE!
  CS:010F CD 20         int 20          ; Terminate and return to DOS
  CS:0111 43 4D 46 0D   db 43 4D 46 0D  ; 'CMF' plus a Carriage Return and...
  CS:0115 00            db 00           ; ...a null-character string terminator

4.3.7 - Unloading Soft-ICE!

 Unloading Soft-ICE is not much of a challenge; tracing its removal procedure
was enough to find out how it is done. Basically, returning to Real Mode and
restoring the original Interrupt Descriptor Table (IDT) value is enough to
disable Soft-ICE, but to do this we must use an undocumented back door command.
Note that even though installing Soft-ICE as a device driver leaves a small
stub in conventional memory, it won't stop us disabling its functionality...

 ABOUT THE EXAMPLES:
 -------------------
 Here are examples of removing the Protected Mode portion hiding in extended
 memory and disabling (*) the device driver stub in conventional memory.
 (tested with Soft-ICE version 2.80)
  In example #1, the processor is simply returned to Real Mode. But since all
 segment registers are cleared in the process, we should first do an
 intersegment jump, and at least the stack segment (SS) _must_ be restored. If
 we don't, the system will become unstable at the next interrupt (which PUSHes
 CS:IP plus flags somewhere into 0000:xxxx, messing up system data areas!).
 Therefore interrupts should also be disabled with a CLI until the correct
 stack segment has been loaded. But I guess you've already done that at the
 very beginning of your code...
  In example #2, a character device 'SOFTICE1' is opened for read and write
 operations and a byte (02h) is written to the device. This device name is
 reserved, in addition to 'NU-MEGA', by Soft-ICE but only when loaded as a
 device driver. Frankly speaking, I don't know what the heck this piece of code
 is supposed to do. It was just a part of the Soft-ICE unloading sequence, but
 it causes weird trouble when tracing code with a Real Mode debugger up to the
 interrupt call writing to this device (why?). If someone knows better about
 its purpose, please E-mail!

 Example #1 (returning to Real Mode):

  CS:0100 2E 8C 0E   mov cs:[011D],cs   ; Set correct CS for intersegment jump
          1D 01
  CS:0105 B4 10      mov ah,10          ; Back door function number into AH
  CS:0107 BE 47 46   mov si,4647        ; "Magic value" #1 into SI
  CS:010A BF 4D 4A   mov di,4A4D        ; "Magic value" #2 into DI
  CS:010D CC         int 3              ; Start running code in Protected Mode
  CS:010E 0F 20 C0   mov eax,cr0        ; Read Control Register 0 into EAX
  CS:0111 66 25 FE   and eax,7FFFFFFE   ; Mask out PG and PE bits and...
          FF FF 7F
  CS:0117 0F 22 C0   mov cr0,eax        ; ...load into CR0. Back to Real Mode!
  CS:011A EA 1F 01   jmp hhll:011F      ; Load CS and flush decode queue
          ll hh                         ;  (CS: ll=low byte, hh=high byte)
  CS:011F 2E 0F 01   lidt cs:[012B]     ; Load IDT from CS:012B
          1E 2B 01
  CS:0125 8C C8      mov ax,cs          ; Get CS into AX and...
  CS:0127 8E D0      mov ss,ax          ; ...load it as the stack segment
  CS:0129 CD 20      int 20             ; Terminate and return to DOS
  CS:012B FF 03      dw 03FF            ; IDT limit (length)
  CS:012D 00 00 00   dd 00000000        ; IDT base address
          00

 Example #2 (disabling device driver stub):

  CS:0100 B8 02 3D      mov ax,3D02     ; DOS service Open Handle for R/W
  CS:0103 0E            push cs         ; Just to make sure we have...
  CS:0104 1F            pop ds          ; ...the correct DS
  CS:0105 BA 1F 01      mov dx,011F     ; Device name starts at DS:DX
  CS:0108 CD 21         int 21          ; Open file and allocate handle
  CS:010A 72 11         jb 011D         ; Jump if file not found
  CS:010C 89 C3         mov bx,ax       ; Save returned file handle into BX
  CS:010E B8 03 44      mov ax,4403     ; DOS IOCtl Character, write to device
  CS:0111 BA 28 01      mov dx,0128     ; Send buffer starts at DS:DX
  CS:0114 B9 01 00      mov cx,0001     ; Send one byte
  CS:0117 CD 21         int 21          ; Write to device
  CS:0119 B4 3E         mov ah,3E       ; DOS service Close Handle
  CS:011B CD 21         int 21          ; Close file and deallocate handle
  CS:011D CD 20         int 20          ; Terminate and return to DOS
  CS:011F 53 4F 46 54   db 53 4F 46 54  ; 'SOFTICE1'...
  CS:0123 49 43 45 31   db 49 43 45 31  ; ...and...
  CS:0127 00            db 00           ; ...a null-character string terminator
  CS:0128 02            db 02           ; Character 02h

4.3.8 - Cause Soft-ICE to abort program
         (idea by Inbar Raz)

 When Soft-ICE is loaded as a device driver, a stub will remain in conventional
memory reserving device names 'NU-MEGA' and 'SOFTICE1' for its own use. They
both can be opened for read and write operations, but if writing to them is
attempted using DOS function 40h (Write Handle), used for writing to _files_,
DOS will pop up with a critical error. Note though that unlike 'SOFTICE1' which
will always be reserved, Soft-ICE only uses the device name 'NU-MEGA' if it was
_not_ loaded as an EMS manager with /EMM switch!
 So, to make a program incompatible with Soft-ICE, it would only need to create
a file called 'NU-MEGA' or 'SOFTICE1' for its own purposes. If Soft-ICE was
loaded as a device driver the program refuses to run, otherwise it will run
fine. Additionally INT 24 routine (DOS Critical-Error-Handler) could be
replaced with a custom handler which would be executed after the critical error
has occurred. It could be a simple IRET to make the program pop back to DOS, a
routine to recover from the error combined with unloading Soft-ICE (see 4.3.7,
'Unloading Soft-ICE!'), or anything you can think of!
 No examples here; you _should_ be able to write your own implementation.

4.4 - Self-modifying code
-------------------------
 Self-modification is a useful method in some cases. Since code segment CS
address varies every time a program is run, self-modification could be used to
set the correct segment address of a jump (EXE-files have relocation items for
this matter), but self-modifying code could also be used as an anti-debugging
trick. There are different levels of self-modification but they all are based
on replacing a whole instruction with another.

4.4.1 - Simple self-modification

 Changing an instruction before executing it is self-modification in its
simplest form. It prevents correct disassembly if just a dumb code disassembler
is used, but doesn't do much else.

 ABOUT THE EXAMPLE:
 ------------------
 The example here shows a basic code modification... Beforehand it looks like
 this piece of code is going to lock up the computer, but the subroutine
 overwrites the CLI/HLT pair with an 'INT 20'. NOP isn't needed, but see also
 4.4.2, 'Foiling 'Step Over'/'Proceed' debugger commands, part 2 of 2'.

  CS:0100 0E         push cs            ; Save code segment CS on stack and...
  CS:0101 1F         pop ds             ; ...load it as new data segment DS
  CS:0102 E8 03 00   call 0108          ; Call the subroutine modifying code
  CS:0105 90         nop                ; (not necessarily needed)
  CS:0106 FA         cli                ; Disabling interrupts and...
  CS:0107 F4         hlt                ; ...halting the CPU? Nope!
  CS:0108 C7 06 06   mov word ptr [0106],20CD   ; Replace CLI/HLT with INT 20
          01 CD 20
  CS:010E C3         ret                ; Return from subroutine
  -------
 *CS:0106 CD 20      int 20             ; Terminate and return to DOS

4.4.2 - Foiling 'Step Over'/'Proceed' debugger commands, part 2 of 2

 This is quite a nice thing to do with self-modifying code. While (P)roceeding,
all Real Mode debuggers must insert an INT 3 software breakpoint instruction
after certain instructions in order to return to the debugger after executing
"one" of them. These instructions include INTs, CALLs, LOOPs and instructions
with a REP prefix, to name a few. This information could easily be used to our
advantage by modifying the opcode or just the byte following one of those
instructions mentioned above before it gets executed. After pressing the 'Step
Over'/'Proceed' key, the debugger sets an INT 3 after the instruction if needed
and lets the program run freely until the INT 3 is executed (a debugger could
also single-step through the code to prevent this trick, though not very
likely). The debugger entirely relies on the INT 3 being there, therefore
overwriting the same byte within the call, loop, interrupt handler routine,
etc. would allow unrestricted code execution without the debugger popping up
again afterwards. This is extremely effective in very long loops because the
user would have to trace through the whole loop, and _that_ sure isn't a
wonderful thing to do.
 Although this method works best on Real Mode debuggers, some 386 debuggers may
also be affected. After all four 386 hardware memory breakpoints have already
been defined, the debugger will be forced to use INT 3 -style software
breakpoints enabling this trick. A good example is Soft-ICE, which is dumb
enough _not_ to release the hardware breakpoints for the debugger's internal
use _even_ if they're disabled. No breakpoints are necessary while tracing
anyway...
 For more tricks that can fool a 'Step Over'/'Proceed' command, see the first
part of this trick, section 4.2.9, and 4.2.10, 'Faking a procedure call'.

 ABOUT THE EXAMPLES:
 -------------------
 This is a loop which would run 65535 times for the only purpose of overwriting
 the byte following the LOOP instruction. A debugger would put the INT 3
 (opcode 0CCh) at CS:010C if the LOOP is stepped over, but since the loop
 itself restores the byte a debugger would overwrite, stepping over the LOOP
 will actually run all of the code coming after. If you're going to put this
 kind of a loop in your code, remember to make it a _long_ one.

  CS:0100 0E         push cs            ; Save code segment CS on stack and...
  CS:0101 1F         pop ds             ; ...load it as new data segment DS
  CS:0102 B9 FF FF   mov cx,FFFF        ; Going to loop 65535 times
  CS:0105 C6 06 0C   mov byte ptr [010C],CD   ; Restore byte 0CDh of INT 20
          01 CD
  CS:010A E2 F9      loop 0105          ; Loop to CS:0105 until CX becomes 0
  CS:010C CD 20      int 20             ; Terminate and return to DOS
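
 The same idea applies to the other instruction types mentioned above. The
 sketch below, which is only an illustration and not taken from any real
 program, uses a CALL: the subroutine itself rewrites the byte following the
 CALL, wiping out any INT 3 a debugger may have put there when the CALL was
 stepped over.

 A similar sketch with a CALL instead of a LOOP:

  CS:0100 0E         push cs            ; Save code segment CS on stack and...
  CS:0101 1F         pop ds             ; ...load it as new data segment DS
  CS:0102 E8 02 00   call 0107          ; Stepping over this sets INT 3 at 0105
  CS:0105 CD 20      int 20             ; Terminate and return to DOS
  CS:0107 C6 06 05   mov byte ptr [0105],CD   ; Restore byte 0CDh of INT 20
          01 CD
  CS:010C C3         ret                ; Return to CS:0105, no INT 3 there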

4.4.3 - Playing with Prefetch Instruction Queue (PIQ)

 Invisible to the user, every 80x86 processor has a tiny memory area, as small
as 4 bytes (*) on the 8088 (6 bytes on the 8086) and as large as 32 bytes on
486's, which speeds up code execution by fetching instructions from "slow"
memory ahead of time.
Prefetch Instruction Queue memory isn't updated whenever the actual system
memory is changed within the range that PIQ currently holds, which makes it a
useful trick against _any_ program tracing code (single-stepping), including
386 debuggers! The trick is that if the program is executed normally and no
interruptions take place (external interrupts or a Single-Step Trap, for
example), changing the following opcode or its operands won't affect execution
in any way because the code already in the PIQ will be executed instead. If the
code is being traced, on the other hand, the changed code will be executed
instead of the original.
 This trick can also be used against "intelligent" program decompressors such
as TRON, CUP, etc. They all single-step through code while trying to find the
unpacking/decryption routine, and since no human is controlling the process
they are easily confused by a PIQ trick. One option is to replace the next
instruction with an 'INT 20' (or any other interrupt call). Generic program
decompressors usually do not allow any interrupts to be executed and will
terminate the process, while dumber decompressors will execute the INT 20 and
quit anyway. Another option is to replace an instruction with a 'JMP $' (where
'$' is the address of the jump instruction itself), which makes the
decompressor _never_ stop. A final use of this trick is to lead the
decompressor onto a false track, with a jump for example. The jump could go to
an infinite loop, to another routine designed for debuggers, or to any location
outside the program code, such as the BIOS reboot vector at FFFF:0000...
 To make a PIQ trick work on about 70% of all machines about 85% of the time
(not to be trusted... the only certain thing is that everything is uncertain
when trying a PIQ trick ;), there should first be an instruction that flushes
the PIQ, such as a sufficiently far jump or an interrupt call. Second, both the
code that modifies the opcode and the target opcode itself should be as short
as possible. Finally, make sure the modified code is not too far away from the
instruction that modifies it, so that the target is already in the PIQ when it
is modified rather than being prefetched afterwards. Remember also that the
modified instruction can only be _ahead_ of the modifying instruction, that the
modified code should only be executed once (otherwise it must be restored), and
that between the two there must not be any "abnormal" changes to CS:IP (a LOOP,
RET, etc.) which would flush the PIQ. A CLI is in order before the
modification, too, so that no external interrupts mess up the trick.
 For more information on PIQ, see 3.3.2, 'Prefetch Instruction Queue (PIQ)'.

 ABOUT THE EXAMPLE:
 ------------------
 This example replaces an instruction with an 'INT 20'. Tracing would therefore
 lead to program termination. Note that no precautions have been taken to
 prevent premature prefetch.

  CS:0100 0E         push cs            ; Save code segment CS on stack and...
  CS:0101 1F         pop ds             ; ...load it as new data segment DS
  CS:0102 C7 06 08   mov word ptr [0108],20CD   ; Replace code with INT 20
          01 CD 20
  CS:0108 ...        ...                ; (code continues here)
  -------
 *CS:0108 CD 20      int 20             ; Terminate program if tracing code
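
 A rough Python model of why tracing changes the outcome. It assumes, purely
 for illustration, that the bytes after the MOV are already sitting in the
 queue when the MOV writes to memory, and that the unknown original code at
 CS:0108 is two NOPs; real prefetch timing is far less predictable. The
 function name is hypothetical.

```python
BASE = 0x0102
# The MOV from the listing, followed by two NOPs standing in for the original
# code at 0108h (the NOPs are an assumption, the listing only shows "...").
IMAGE = bytes.fromhex("C7 06 08 01 CD 20 90 90")


def opcode_executed_at_0108(traced, queue_len=4):
    """Return the two bytes the CPU executes at CS:0108 in this toy model."""
    mem = bytearray(IMAGE)
    target = 0x0108 - BASE
    # Assumption: the queue was filled before the MOV's write reached memory.
    queue = bytes(mem[target:target + queue_len])
    mem[target:target + 2] = b"\xCD\x20"          # mov word ptr [0108],20CD
    if traced:
        # The single-step trap after the MOV flushes the queue, so the next
        # fetch comes from the (already modified) memory.
        queue = bytes(mem[target:target + queue_len])
    return queue[:2]


print(opcode_executed_at_0108(traced=False).hex())   # stale copy from the queue
print(opcode_executed_at_0108(traced=True).hex())    # modified bytes from memory
```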

4.4.4 - Code encryption

 Code encryption is one way to protect your code from prying eyes. Usually the
whole program is encrypted with only a small decryption routine at the
beginning, and once the routine has finished the rest of the code can be run.
Code encryption not only inhibits code examination before decryption, but can
also trick debuggers if the encrypted code immediately follows the decryption
routine (ie. execution proceeds with the decrypted code after the decryption
loop has ended). An INT 3 -style breakpoint (which is used by any 8086
debugger) would usually be put right after the decryption loop so that the
whole operation wouldn't have to be traced through. Here, however, that would
overwrite the first byte of the area the decryption routine operates on...
and the INT 3 opcode is unfortunately lost in favour of whatever the decryption
algorithm makes of it. To make tracing even harder you should start decryption
from the end of the encrypted code instead of the beginning. This way a
breakpoint can't be set after just the first byte has been decrypted, and the
annoying decryption operation has to be traced in whole. Triggering hardware
execution breakpoints can also be avoided if the decryption routine, as its
final task, transforms the LOOP, JMP, or whatever instruction was used for
looping so that the breakpoint falls in the middle of the new opcode. This is,
however, only possible with a fixed decryption key which produces the new
opcode from the old one; separate modification code would simply be too easy
to bypass. On the other hand, a hardware memory access breakpoint set to be
triggered by the last decryptions wouldn't be affected by these tricks.
 You'll find examples of code en-/decryptors in section 4.6, 'Simple code
encryptors'.
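
 The lost breakpoint can be illustrated with a short Python sketch. The key,
the sample bytes and the function names are assumptions made up for the
example, and a plain XOR decryptor stands in for whatever algorithm is
actually used.

```python
KEY = 0x12                                    # hypothetical fixed 8-bit key
INT3 = 0xCC                                   # opcode of the INT 3 breakpoint
PLAIN = bytes.fromhex("F7 D0 E6 21 CD 20")    # arbitrary sample code


def xor_bytes(data, key):
    return bytes(b ^ key for b in data)


def decrypt_with_breakpoint():
    """Plant INT 3 on the first encrypted byte, then run the decryptor."""
    mem = bytearray(xor_bytes(PLAIN, KEY))    # encrypted image in memory
    mem[0] = INT3                             # debugger's breakpoint
    for i in range(len(mem)):                 # the decryption loop
        mem[i] ^= KEY
    return bytes(mem)


result = decrypt_with_breakpoint()
print(result.hex())
```

 The first byte comes out as CCh XOR key, which is neither the breakpoint nor
the original opcode, so the debugger never regains control there.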

4.4.5 - Hooking a decryption routine to an interrupt

 This is useful in code containing more than one encrypted region. Basically
you hook a code decryption routine as an interrupt handler and then use the
interrupt to call it. This may save you a few bytes compared to CALLing the
routine, especially if it is called with a single-byte interrupt opcode (INT 3;
INTO for INT 04, but the Overflow Flag (OF) must be set; or INT 1 (opcode
0F1h), available in 386+ processors). If the key is hard-coded, the decryption
routine only needs to be given the size of the code to decrypt; otherwise the
key must be supplied as well.
 No examples here, but you may want to check out 4.4.6, 'The Running Line' and
section 4.6, 'Simple code encryptors'.

4.4.6 - The Running Line
         (idea presented by Serge Pachkovsky)

 Ever thought of decrypting code on-the-fly? Self-decryption of code one
instruction in advance at a time, or virtually any self-modification (though
from here on only decryption will be discussed), can be achieved by
single-stepping through the code in the same fashion as Real Mode debuggers do.
This is a very advanced anti-debugging method and quite resistant to various
hacking attempts, too. It never exposes long fragments of code to analysis and
makes debugging with any Real Mode debugger nearly impossible. It even hinders
tracing and suppresses memory access breakpoints on Soft-ICE (see 4.3.5, 'Using
self-tracing to fool Soft-ICE')!
 In addition, any hardware execution breakpoint set in the code being
decrypted can be disabled at runtime. This is done by setting the Resume Flag
(RF, bit 16) in the EFLAGS image on the stack and then loading it with a 32-bit
IRET (an IRET with the Operand Size Prefix, opcode 66h). There is only one
case in which this will _not_ work: when the 386 debugger has set IOPL to less
than 3 in the VM86 task and checks for a set RF bit in the image on the stack
before running the IRET. This addition does require simulating a 32-bit INT
call, however: first increment SP by 6, then do 32-bit PUSHes for the EFLAGS
image, CS and finally EIP. It's important to note that although setting
'BREAK ON' in
Soft-ICE will cause a General Protection Fault at the 32-bit IRET, it is _only_
because Soft-ICE is not expecting any 32-bit IRETs (IRET is an IOPL-sensitive
instruction). If a VM86 control program's V86 monitor is implemented correctly,
no problems will arise at IOPL<3.
 In theory, setting the Trap Enable Flag in FLAGS and replacing the INT 01
handler with a decryption routine would be enough. In practice it isn't as easy
as it sounds. The main problem is how to determine the number of bytes to be
decrypted at a time. There are some solutions, but each of them has its
disadvantages. One is to store the length of each opcode just before the
opcode itself. However, this means that code size will increase by one byte for
each instruction used. Another method is to assume that every opcode is, say,
4 bytes in length, pad shorter opcodes with NOPs, for example, and always
decrypt 4 bytes at a time. This is the worst way because it not only wastes
lots of memory but also requires a complicated routine to check for NOPs and so
on... The last solution, which I consider the best, is to always decrypt a
certain number of bytes and re-encrypt them after executing one instruction.
The previous methods do not require re-encryption, but this one _does_ in
order to decrypt the next opcode properly (mainly needed to support decryption
keys longer than 8 bits). The only disadvantage of this method is that the
user has to determine the maximum length of code that will be executed before
the next INT 01 call occurs (the NOP instruction following an INT (see the
notes below) and double-steps with stack segment SS register loads must be
taken into account! For more information see sections 3.3.5,
'"Double-stepping"', and 4.1.1, 'Causing CPU to execute two instructions at a
time'). Even though too much code will nearly always be decrypted in advance,
the extra "code" will _not_ reveal _anything_ to anyone debugging if the
decryption key is modified by a fixed value after processing each opcode.
 Please note though that exception #1 will not be generated until executing one
instruction after setting TF bit in FLAGS. This includes any INT instructions
and exceptions generated within the encrypted, self-traced code portion!
Therefore you should _always_ keep external interrupts disabled (CLI
instruction) while tracing (VERY IMPORTANT!), and add a NOP after any INT
instruction (INT nn+NOP must be treated as a single opcode, meaning that both
of them must be decrypted at the same time, with the same key). Once you are
done with the self-tracing portion, remember to restore the INT 01 vector!

 ABOUT THE EXAMPLE:
 ------------------
 This is an over-simplified example of how self-tracing works. It merely
 assumes that every opcode is only two bytes long (the worst solution used).
 Decryption key is not modified, neither does the example re-encrypt code after
 execution or check for "stuff bytes". INT 01 vector isn't restored either.

  CS:0100 31 C0      xor ax,ax          ; Zero AX
  CS:0102 8E D8      mov ds,ax          ; Load 0000 as DS segment
  CS:0104 C7 06 04   mov word ptr [0004],011C   ; Set INT 01 pointer IP to 011C
          00 1C 01                              ; and...
  CS:010A 8C 0E 06   mov [0006],cs      ; ...CS to current CS (CS:011C)
          00
  CS:010E 9C         pushf              ; Push FLAGS onto stack
  CS:010F 5B         pop bx             ; Pop FLAGS image to BX
  CS:0110 80 CF 01   or bh,01           ; Set bit 8 (TF) of FLAGS image
  CS:0113 53         push bx            ; Push FLAGS image onto stack
  CS:0114 9D         popf               ; Pop FLAGS from stack setting TF bit
  CS:0115 90         nop                ; A plain NOP, not yet encrypted
  CS:0116 C3 C2      db C3 C2           ; Encrypted instructions,...
  CS:0118 D2 33      db D2 33           ; ...wonder what they...
  CS:011A F9 32      db F9 32           ; ...really are? See below!
  CS:011C 55         push bp            ; Save BP
  CS:011D 89 E5      mov bp,sp          ; Copy value from SP to BP
  CS:011F 8B 6E 02   mov bp,[bp+02]     ; Load IP of next instruction into BP
  CS:0122 2E 81 76   xor word ptr cs:[bp+00],1234   ; Decrypt next instruction
          00 34 12
  CS:0128 5D         pop bp             ; Restore BP
  CS:0129 CF         iret               ; Return from decryptor routine
  -------
 *CS:0116 F7 D0      not ax             ; Invert AX (0000 -> FFFF) and...
 *CS:0118 E6 21      out 21,al          ; ...mask IRQs 0-7 (including keyboard)
 *CS:011A CD 20      int 20             ; Terminate and return to DOS
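
 The decryption done by the INT 01 handler above can be replayed in Python.
 This sketch models only the handler's XOR (no CPU, no Trap Flag); the names
 int01_handler and trace are hypothetical, while the bytes and the key 1234h
 are taken from the listing.

```python
import struct

KEY = 0x1234
ENCRYPTED = bytes.fromhex("C3 C2 D2 33 F9 32")    # bytes at CS:0116..011B


def int01_handler(mem, ip):
    """xor word ptr cs:[bp+00],1234 with BP = IP of the next instruction."""
    (word,) = struct.unpack_from("<H", mem, ip)
    struct.pack_into("<H", mem, ip, word ^ KEY)


def trace(mem):
    """Run the handler once per instruction, assuming two-byte opcodes."""
    for ip in range(0, len(mem), 2):
        int01_handler(mem, ip)
        yield bytes(mem[ip:ip + 2])


decoded = list(trace(bytearray(ENCRYPTED)))
print([op.hex() for op in decoded])    # ['f7d0', 'e621', 'cd20']
```

 Because the key is a 16-bit word stored low byte first, the first byte of
 each pair is XORed with 34h and the second with 12h.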

4.5 - Checksum generators
-------------------------
 Using self-checking code is a good idea when trying to prevent code
modifications made by a user debugging. Even though a code integrity checking
routine itself is vulnerable to attacks, if implemented properly a checksum
generator could be of immeasurable help against debuggers. One could, for
example, use self-checking in conjunction with a decryption routine and try to
use the output (checksum) of the self-checking routine as the input (decryption
key) of the decryption routine. If someone changes bytes (either a user
patching code with NOPs or a debugger inserting INT 3's), the rest of the code
will be improperly decoded making further execution impossible.
 There is an unlimited number of ways in which a checksum generator could be
implemented. Here you'll find some of the most common (and the simplest yet
pretty effective) error detection algorithms, CRC-16 and CRC-32 for example.
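
 The idea of feeding a checksum into the decryptor as its key can be sketched
in Python as follows. Everything named here (checksum8, xor_bytes, run and the
sample byte strings) is an assumption for illustration; a sum of bytes and a
one-byte XOR are used only because they are the simplest choices.

```python
def checksum8(data):
    """Sum of bytes modulo 256."""
    return sum(data) & 0xFF


def xor_bytes(data, key):
    return bytes(b ^ key for b in data)


GUARDED = bytes.fromhex("0E 1F B9 FF FF")      # hypothetical self-checked code
SECRET = bytes.fromhex("F7 D0 E6 21 CD 20")    # hypothetical protected code
ENCRYPTED = xor_bytes(SECRET, checksum8(GUARDED))


def run(guarded):
    """Decrypt using the checksum of the guarded code as the key."""
    return xor_bytes(ENCRYPTED, checksum8(guarded))


intact = run(GUARDED)
patched = run(b"\xCC" + GUARDED[1:])           # an INT 3 was planted in it
print(intact == SECRET, patched == SECRET)
```

 With the guarded code intact the secret decodes correctly; with a single byte
replaced the key changes and the result is garbage.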

4.5.1 - Sum of bytes

 The term "checksum" originally probably referred to early summing formulas,
but later has widened to encompass other error detection algorithms as well.
The summing method is one of the simplest algorithms nowadays. To detect a
change in data all the bytes/words/etc. could be summed together when we're
absolutely sure of the consistency of the data, and then just re-calculate the
checksum later to verify the data. Simple, huh?
 Well, almost too simple! If the sum is calculated in an 8-bit register (sum
MOD 256), there is a 1/256 chance that an error goes undetected (this is "a
blind spot"). If a 16-bit register was used there would only be a 1/65536
chance, but this wouldn't help if the data was short, only a few bytes.
There is also the possibility that when one byte changes, another error
compensates for the first one, leaving the checksum the same and the errors in
the data undetected.

 ABOUT THE EXAMPLE:
 ------------------
 The example here clears AL and then sums into it all the bytes of the code
 the checksum is to be calculated from. Note that this example doesn't include
 the routine to compare a calculated checksum to the original.

  CS:0100 0E         push cs            ; Just to make sure we have...
  CS:0101 1F         pop ds             ; ...the correct DS
  CS:0102 BB ll hh   mov bx,<START>     ; BX is current (and start) data offset
  CS:0105 B9 ll hh   mov cx,<LENGTH>    ; CX defines length of checksummed code
  CS:0108 30 C0      xor al,al          ; Zero AL
  CS:010A 02 07      add al,[bx]        ; Add byte at DS:BX to AL
  CS:010C 43         inc bx             ; Increment byte offset value by one
  CS:010D 49         dec cx             ; Decrement count value by one
  CS:010E 75 FA      jnz 010A           ; Jump to CS:010A if CX!=0000
  CS:0110 ...        ...                ; (code continues here)
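
 A Python model of the same loop, with a small demonstration of the
 compensating-error weakness described above. The function name sum8 and the
 sample data are made up for the example.

```python
def sum8(data):
    """Model of the listing: add every byte into AL (sum modulo 256)."""
    al = 0                          # xor al,al
    for byte in data:
        al = (al + byte) & 0xFF     # add al,[bx]
    return al


original = bytes([0x10, 0x20, 0x30])
one_error = bytes([0x11, 0x20, 0x30])     # a single changed byte
two_errors = bytes([0x11, 0x1F, 0x30])    # a second change cancels the first

print(hex(sum8(original)), hex(sum8(one_error)), hex(sum8(two_errors)))
```

 The single error changes the sum, but the pair of errors leaves it at the
 original value and so goes unnoticed.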

4.5.2 - Number of bits

 Another method would be to count the number of set (1) or clear (0) bits in
data. However good this sounds, it has the same weaknesses as summing bytes.
Actually, this is even more susceptible to undetected errors, even a great
number of them, because the errors always occur on bit level. Isn't it very
much more likely that one bit gets cleared while another gets set, than that an
8-bit value changes while another transforms to compensate for the
inconsistency in checksum?

 ABOUT THE EXAMPLE:
 ------------------
 This is a piece of code that counts all the bits set (1) in a given length of
 memory. It uses BL as a bit pointer, BH to store and rotate a value with only
 one bit set, DL to hold a byte for testing and the bit count is stored in AX.
 Note though that this example _only_ checks how many bits there are set, the
 logic required to make use of this as an integrity checker is missing.

  CS:0100 0E         push cs            ; Just to make sure we have...
  CS:0101 1F         pop ds             ; ...the correct DS
  CS:0102 BE ll hh   mov si,<START>     ; SI is current (and start) data offset
  CS:0105 B9 ll hh   mov cx,<LENGTH>    ; CX defines length of checksummed code
  CS:0108 31 C0      xor ax,ax          ; Zero AX (the bit count)
  CS:010A BB 08 80   mov bx,8008        ; BL points to current bit in test, BH
                                        ; contains a value with bit 7 set
  CS:010D 8A 14      mov dl,[si]        ; Load byte at DS:SI into DL
  CS:010F 84 D7      test dl,bh         ; Test BLth bit and...
  CS:0111 74 01      jz 0114            ; ...if clear (0), skip over...
  CS:0113 40         inc ax             ; ...increment of bit count by one
  CS:0114 D0 CF      ror bh,1           ; Rotate bits in BH right by one bit
  CS:0116 FE CB      dec bl             ; Decrement bit pointer by one and...
  CS:0118 74 02      jz 011C            ; ...if result is BL=00, skip over...
  CS:011A EB F3      jmp 010F           ; ...jump to test next bit
  CS:011C 46         inc si             ; Increment byte offset value by one
  CS:011D E2 EB      loop 010A          ; Looping to CS:010A until CX=0001
  CS:011F ...        ...                ; (code continues here)
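
 The same counting scheme modelled in Python, assuming the count starts from
 zero. The function name count_set_bits is hypothetical. The two sample bytes
 show the weakness mentioned above: one bit cleared and another set leaves the
 count unchanged.

```python
def count_set_bits(data):
    """Test each byte with a rotating one-bit mask, as the listing does."""
    ax = 0
    for dl in data:                               # mov dl,[si]
        bh = 0x80                                 # mask with bit 7 set
        for _ in range(8):                        # BL counts the 8 bits
            if dl & bh:                           # test dl,bh
                ax += 1                           # inc ax
            bh = ((bh >> 1) | (bh << 7)) & 0xFF   # ror bh,1
    return ax & 0xFFFF                            # AX is a 16-bit register


print(count_set_bits(b"\xA7"), count_set_bits(b"\x67"))
```

 A7h (10100111) and 67h (01100111) differ in two bits yet both contain five
 set bits.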

4.5.3 - Multiplication and division

 While summing and bit counting methods are not very good, multiplication and
especially division ones are. One could multiply all the bytes, for example, in
the data with each other and have the checksum from the result. Since the
result increases very quickly, at least a 16-bit register should be used, to
reduce the possibility of "a blind spot", too.
 An even more reliable method than multiplication is division. The data a
checksum is calculated from can be treated as an enormously long stream of
bits, as one huge number, and then divided by a fixed number, which will also
be used later when re-calculating the checksum to verify the data. The
quotient is discarded as totally useless since it would become nearly as large
as the bitstream (data) itself, but the remainder of the division is useful as
a checksum because it will never be wider than the divisor in bits. Since the
bitstream can be as long as one megabit, a single DIV instruction cannot do
the job. Rather, we must use the process of long division taught at school,
only with the numbers represented in 1's and 0's. Remember these figures:

                          01111  = quotient = 0Fh (will be discarded)
                     +---------
     divisor =  1011 | 10100111  = dividend = A7h (the bitstream)
          Bh =        -0000
 (fixed value)        -----
                       10100
                      -01011
                      ------
                        10011
                       -01011
                       ------
                         10001
                        -01011
                        ------
                           1101
                          -1011
                          -----
                           0010  = remainder = 2h (the checksum!)

This division method is also the basis of the many CRC algorithms.

 ABOUT THE EXAMPLE:
 ------------------
 This example uses a 16-bit value in BX as the divisor (just don't use
 BX=0000...). Two bytes are read from memory at a time and placed in AX, the
 high and low bytes are swapped because of the reverse byte ordering 80x86
 processors use, and DX:AX is then divided by BX. The DIV instruction is handy
 for this dividing scheme because it returns the remainder of the division in
 DX, which actually is, if you look at the process of dividing above, one of
 the sub-remainders! And since DX is the higher two bytes of the next dividend,
 we just have to fetch some more of the bitstream we want to divide into AX.

  CS:0100 0E         push cs            ; Just to make sure we have...
  CS:0101 1F         pop ds             ; ...the correct DS
  CS:0102 BE ll hh   mov si,<START>     ; SI is current (and start) data offset
  CS:0105 B9 ll hh   mov cx,<LENGTH>    ; CX defines length of checksummed code
                                        ; (in words)
  CS:0108 BB ll hh   mov bx,<DIVISOR>   ; BX is 16-bit divisor
  CS:010B 31 D2      xor dx,dx          ; Zero DX
  CS:010D 8B 04      mov ax,[si]        ; Load word at DS:SI into AX
  CS:010F 86 C4      xchg al,ah         ; Swap AL and AH for correct byte order
  CS:0111 F7 F3      div bx             ; Divide DX:AX by BX
  CS:0113 46         inc si             ; Increment byte offset value...
  CS:0114 46         inc si             ; ...by two (a word)
  CS:0115 E2 F6      loop 010D          ; Looping to CS:010D until CX=0001
  CS:0117 ...        ...                ; (code continues here)

  In case you need to convert this for either an 8- or a 32-bit divisor, here
 is some info (the dividend is always twice as wide as the divisor):

  Divisor   Dividend   Quotient   Remainder
  -------   --------   --------   ---------
   8-bit       AX         AL         AH
  16-bit     DX:AX        AX         DX
  32-bit    EDX:EAX      EAX         EDX
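
 The word-by-word scheme of the 16-bit listing can be modelled in Python to
 see that it really yields the remainder of the whole bitstream. The function
 name div_checksum and the sample data are assumptions for illustration.

```python
def div_checksum(data, divisor):
    """Divide DX:AX by BX word by word, keeping only the remainder in DX."""
    if len(data) % 2 or not 0 < divisor <= 0xFFFF:
        raise ValueError("need whole words and a non-zero 16-bit divisor")
    dx = 0                                      # xor dx,dx
    for i in range(0, len(data), 2):
        ax = (data[i] << 8) | data[i + 1]       # mov ax,[si] + xchg al,ah
        dx = ((dx << 16) | ax) % divisor        # div bx (remainder in DX)
    return dx


data = b"Poor Man's Guide"                      # 16 bytes, ie. 8 words
print(hex(div_checksum(data, 0x8005)))
print(div_checksum(b"\x00\xA7", 0x0B))          # the figure above: A7h / Bh
```

 The result equals the data read as one huge big-endian number modulo the
 divisor, which is exactly the school-style division shown in the figure.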

4.5.4 - Calculating CRC-16 and CRC-32

 While all Cyclic Redundancy Code (CRC) algorithms are based on the same method
of dividing a data bitstream as presented before in 4.5.3, 'Multiplication and
division', the main difference between the algorithms is the value by which the
bitstream will be divided. CRC algorithms are closely related to a thing
called polynomial arithmetic. It is binary arithmetic with no carries, ie. no
bit position has any effect on the others, which basically means that both
addition and subtraction act like a XOR operation.
 I'll describe the basic steps of any CRC algorithm here, algorithm-specific
parameters to CRC-16/32 will be revealed later. Let's suppose we have a W bits
wide divisor, or a polynomial as it should be called. It's important to note
that the width will be the position of the highest set (1) bit. Therefore the
width of 11011 is 4, not 5! Having chosen a good polynomial (some are better
than others depending on the placement of set (1) bits) to start with, W zero
bits must be added at the end of the data bitstream, which in case of a
polynomial 11011 (W=4) would be 0000. For example, bitstream 10100111 would
become 1010-01110000 (dashes are to separate groups of eight bits to improve
readability). Many CRC algorithms reflect (swap bits around the center, or
simply reverse bit order, bit 0 becoming first processed instead of bit 7) each
_byte_ before processing them. Next we just divide the bitstream with the
polynomial using the same principle as before only remembering that a number is
equal to or greater than another if its highest set (1) bit position equals to
or is higher than the other's, and that the subtraction phase is actually
XOR'ing:

                           11011110  = quotient = DEh (will be discarded)
                     +-------------
    divisor =  11011 | 101001110000  = dividend = A70h (the bitstream + 0000)
        1Bh =         -11011
(fixed value)         ------
                        11111
                       -11011
                       ------
                         01001
                        -00000
                        ------
                          10011
                         -11011
                         ------
                           10000
                          -11011
                          ------
                            10110
                           -11011
                           ------
                             11010
                            -11011
                            ------
                              00010
                             -00000
                             ------
                              00010  = remainder = 02h (the checksum!)

After reaching the final value of _the register_ (this refers to the location
where, whether a CPU register or a memory area, all operations on the bitstream
take place) the bits of the whole value will usually be reversed. The last step
would be to XOR the final remainder with a value. Most algorithms don't do
this, though. This will conclude the "theory" portion here. For more theory and
detailed information on CRC algorithms, read Ross Williams' 'A Painless Guide
to CRC Error Detection Algorithms'.
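
 The XOR-based long division from the figure above can be reproduced with a
few lines of Python. This is a minimal sketch; the function name poly_divmod
is made up, and plain integers stand in for the bitstream and the polynomial.

```python
def poly_divmod(dividend, divisor):
    """Carry-less (XOR) long division of two non-negative integers."""
    width = divisor.bit_length()
    quotient = 0
    remainder = dividend
    for shift in range(dividend.bit_length() - width, -1, -1):
        quotient <<= 1
        if remainder >> (shift + width - 1) & 1:    # top bit set?
            remainder ^= divisor << shift           # "subtract" the divisor
            quotient |= 1
    return quotient, remainder


q, r = poly_divmod(0xA70, 0b11011)    # bitstream 10100111 plus 0000, W=4
print(hex(q), hex(r))                 # 0xde 0x2
```

 The quotient DEh and the remainder 2h match the figure.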
 CRC-16 and CRC-32 algorithms have been named after the width of the
polynomial. CRC-16 uses the polynomial 1-10000000-00000101 (8005h), the
initial register value is 0000h and final register value will be XOR'ed with
0000h, ie. not needed. CRC-32 uses 1-00000100-11000001-00011101-10110111
(04C11DB7h) as the polynomial, register is initialized to FFFFFFFFh and its
final value will be XOR'ed with FFFFFFFFh also. Both of these algorithms
require each byte to be reflected before processing and also reflecting the
final register value. If you want to check that a routine works OK, checksum
ASCII test string "123456789" (31h 32h 33h... in hex). CRC-16 for it is BB3Dh
and CRC-32 is CBF43926h. You may have heard of an algorithm called
"CRC-16/CCITT" but don't confuse it for "CRC-16" which is the one discussed
here.
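
 The parameters above can be checked with a short bit-at-a-time sketch in
Python. Note that this is the common "direct" formulation, which folds the W
appended zero bits into the loop by XORing each byte into the register first,
and not the append-zeroes form described in this section. The function names
are made up for the example.

```python
def crc_reflected(data, poly, width, init, xorout):
    """Bit-at-a-time CRC, bytes processed bit 0 first (reflected form).

    poly must already be bit-reversed (A001h for CRC-16, EDB88320h for
    CRC-32), which also leaves the final register value reflected.
    """
    reg = init
    for byte in data:
        reg ^= byte
        for _ in range(8):
            reg = (reg >> 1) ^ poly if reg & 1 else reg >> 1
    return (reg ^ xorout) & ((1 << width) - 1)


def crc16(data):
    return crc_reflected(data, 0xA001, 16, 0x0000, 0x0000)


def crc32(data):
    return crc_reflected(data, 0xEDB88320, 32, 0xFFFFFFFF, 0xFFFFFFFF)


print(f"{crc16(b'123456789'):04X} {crc32(b'123456789'):08X}")   # BB3D CBF43926
```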
 There are two ways of checking that data is consistent, according to its CRC.
You can either re-calculate the CRC for the data and compare it to the original
one, or you could add the CRC checksum at the end of the data it was calculated
from (low byte first, high byte last) and check if calculating a CRC from the
whole lot (no zeroes appended!) gives a result of zero. The last method is a
bit nicer and cleaner...

 ABOUT THE EXAMPLES:
 -------------------
 These examples are not as efficient as table-driven implementations of CRC
 algorithms but they do show the CRC calculation principle. They are very clean
 code fragments and thus quite fast, and if you're lacking the space for the
 pre-calculated tables I recommend using one of these. They're also quite
 flexible: it isn't too hard to transform them to support other polynomial or
 register widths. Caveat though! These examples unconditionally overwrite the
 W bits that follow the data to be checksummed with zeroes. This could also
 lead to a system hang if the segment limit is reached.
  Example #1 is a basic CRC-16 implementation, and all needed data is stored
 and manipulated in CPU registers. Everything has already been reflected,
 including the polynomial (CRC-16 polynomial's reflection is A001h, CRC-32's
 EDB88320h), so that no memory-consuming reflection routine is needed. The
 reflected CRC-16 polynomial is stored in AX, each new byte is loaded into BL,
 BP works as a bit pointer, and DX holds the dividend and finally the CRC.
  Example #2, a CRC-32 checksummer, stores "the register" in memory. Operation
 is almost identical to example #1 but CRC-32 polynomial is stored in EAX and
 the dividend, finally the CRC reversed, will be stored in memory. The last
 stage of calculating CRC-32 is to reverse the register and NOT it (XOR with
 FFFFFFFFh).

 Example #1 (calculating CRC-16 in CPU registers, reflected):

  CS:0100 0E         push cs            ; Just to make sure we have...
  CS:0101 1F         pop ds             ; ...the correct DS
  CS:0102 BE ll hh   mov si,<START>     ; SI is current (and start) data offset
  CS:0105 B9 ll hh   mov cx,<LENGTH>    ; CX defines length of checksummed code
  CS:0108 31 D2      xor dx,dx          ; Set initial register value (zero DX)
  CS:010A 89 CB      mov bx,cx          ; Copy value from CX to BX
  CS:010C 89 10      mov [bx+si],dx     ; Append 16 zero bits to DS:SI+BX
  CS:010E 41         inc cx             ; Increment length for the zeroes...
  CS:010F 41         inc cx             ; ...by two (a word)
  CS:0110 B8 01 A0   mov ax,A001        ; AX is reversed CRC-16 polynomial
                                        ; (divisor)
  CS:0113 BD 08 00   mov bp,0008        ; BP is current bit pointer
  CS:0116 8A 1C      mov bl,[si]        ; Load byte at DS:SI into BL
  CS:0118 D0 DB      rcr bl,1           ; Rotate bits in BL right by one bit,
                                        ; in/out through Carry Flag
                                        ; RCL: Bytes normal (bit 7 first)
                                        ; RCR: Bytes reflected (bit 0 first)
  CS:011A D1 DA      rcr dx,1           ; Rotate bits in DX right by one bit,
                                        ; in/out through Carry Flag and...
  CS:011C 73 02      jnb 0120           ; ...if result is CF=0, skip over...
  CS:011E 31 C2      xor dx,ax          ; ..."subtraction" of AX from DX
  CS:0120 4D         dec bp             ; Decrement bit pointer by one and...
  CS:0121 74 02      jz 0125            ; ...if result is BP=0000, skip over...
  CS:0123 EB F3      jmp 0118           ; ...jump to test next bit
  CS:0125 46         inc si             ; Increment byte offset value by one
  CS:0126 E2 EB      loop 0113          ; Looping to CS:0113 until CX=0001
  CS:0128 ...        ...                ; (code continues here)
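
 A register-level Python model of Example #1, written to check the listing
 against the test string and to try the second verification method described
 earlier (CRC appended low byte first, no zeroes appended). The function name
 and the append_zeroes parameter are assumptions of this sketch.

```python
def crc16_listing(data, append_zeroes=True):
    """Model of Example #1: RCR BL,1 / RCR DX,1 / XOR DX,AX per bit."""
    if append_zeroes:
        data = bytes(data) + b"\x00\x00"    # mov [bx+si],dx / inc cx / inc cx
    ax, dx = 0xA001, 0x0000                 # reflected polynomial, register
    for bl in data:                         # mov bl,[si]
        for _ in range(8):                  # BP counts the 8 bits
            cf, bl = bl & 1, bl >> 1        # rcr bl,1: bit 0 goes to CF
            out = dx & 1                    # bit that RCR DX,1 rotates out
            dx = (dx >> 1) | (cf << 15)     # rcr dx,1: CF enters at bit 15
            if out:                         # jnb skips the XOR when CF=0
                dx ^= ax                    # xor dx,ax
    return dx


crc = crc16_listing(b"123456789")
check = crc16_listing(b"123456789" + crc.to_bytes(2, "little"),
                      append_zeroes=False)
print(f"{crc:04X} {check:04X}")             # BB3D 0000
```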

 Example #2 (calculating CRC-32 with register stored in memory):

  [* not enough info on CRC-32 specification -> problems with implementation *]

4.6 - Simple code encryptors
----------------------------
 Equipping your program with lots of debugger traps may be useless if program
code itself isn't protected in any way. You should consider encryption as a
part of your program. Of course, the decryption routine should also be heavily
booby-trapped to benefit from code encryption.
 The world is full of different code encryptor implementations, you just pick
one! The methods of code encryption described here are just some of the
simplest ones, XOR-encryption being the most common. But of course, if anybody
sends me one of his own, I'll be happy to put it here.
 Note though, that the examples here will not only show how to use the
different types of encryption but also some of the many ways of handling
memory, so watch them closely. Depending on what you need an encryptor for,
you'll have lots of choices since some of these are very fast, others very
small in size.

4.6.1 - XOR en-/decryption

 XOR encryption is the most popular form of code scrambling, at least amongst
the virus writer community, because it offers variable encryption with only a
few instructions, even though it is not very strong. It is very easy to use,
too: because the same XOR instruction can be used for both encryption and
decryption, only one code fragment is needed for both operations. Since XORing
inverts those bits of the destination value that are set in the key, the
strongest encryption will be gained using encryption keys with most bits set.

 ABOUT THE EXAMPLE:
 ------------------
 This just XORs the code, byte by byte from the beginning. The type of memory
 handling is direct manipulation. It allows very short decryption routines if
 used with a LOOP instruction as shown here. Only 13 bytes and very fast, too!

  CS:0100 BB 0D 01   mov bx,010D        ; BX is current (and start) data offset
  CS:0103 B9 ll hh   mov cx,<LENGTH>    ; CX defines length of encrypted code
  CS:0106 2E 80 37   xor byte ptr cs:[bx],12   ; XOR byte at CS:BX with 12h
          12                                   ; (decrypt)
  CS:010A 43         inc bx             ; Increment byte offset value by one
  CS:010B E2 F9      loop 0106          ; Looping to CS:0106 until CX=0001
  CS:010D ...        ...                ; (encrypted code starts here)
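
 The symmetry of the operation is easy to verify with a Python model of the
 loop. The function name xor_crypt and the sample bytes are made up; the key
 12h is the one from the listing.

```python
def xor_crypt(code, key=0x12):
    """xor byte ptr cs:[bx],12 applied to every byte of the code."""
    return bytes(b ^ key for b in code)


plain = bytes.fromhex("F7 D0 E6 21 CD 20")    # sample code bytes
cipher = xor_crypt(plain)                     # first pass encrypts...
restored = xor_crypt(cipher)                  # ...second pass decrypts

# One known plaintext byte (here the CDh of a trailing INT 20) gives away
# the key, which is why the scheme is weak on its own.
recovered_key = cipher[-2] ^ 0xCD

print(cipher.hex(), restored == plain, hex(recovered_key))
```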

4.6.2 - NOT en-/decryption

 NOT is also an encryption scheme worth considering. Just like XOR, the same
instruction can be used for both encryption and decryption. Actually, NOT is
the same as XORing with a key that has all of its bits set, thus inverting all
of the bits in the destination value. Therefore, nothing is gained from using
both of these Boolean operations to encrypt code.

 ABOUT THE EXAMPLE:
 ------------------
 This performs a NOT operation on the encrypted code, word by word. Here a word
 is read from memory into AX register with a MOV instruction before inverting
 its bits, and then written back with MOV. Instead of LOOP, a combination of
 'DEC CX' and JNZ instructions is used (a bit quicker than a single LOOP!).

  CS:0100 0E         push cs            ; Just to make sure we have...
  CS:0101 1F         pop ds             ; ...the correct DS
  CS:0102 BB 13 01   mov bx,0113        ; BX is current (and start) data offset
  CS:0105 B9 ll hh   mov cx,<LENGTH>    ; CX defines length of encrypted code
                                        ; (in words)
  CS:0108 8B 07      mov ax,[bx]        ; Load word at DS:BX into AX,...
  CS:010A F7 D0      not ax             ; ...invert AX (decrypt) and...
  CS:010C 89 07      mov [bx],ax        ; ...store AX back to DS:BX
  CS:010E 43         inc bx             ; Increment byte offset value...
  CS:010F 43         inc bx             ; ...by two (a word)
  CS:0110 49         dec cx             ; Decrement count value by one
  CS:0111 75 F5      jnz 0108           ; Jump to CS:0108 if CX!=0000
  CS:0113 ...        ...                ; (encrypted code starts here)
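
 The equivalence between NOT and XOR claimed in this section is easy to check
 outside the processor. The following is only a Python model of the 16-bit
 operations, not code meant to run under DOS; the names not16 and xor16 and
 the sample key are made up for this illustration.

```python
MASK16 = 0xFFFF  # a word is 16 bits wide

def not16(value):
    """Model of 'NOT AX': invert all 16 bits."""
    return ~value & MASK16

def xor16(value, key):
    """Model of 'XOR AX,key'."""
    return (value ^ key) & MASK16

word = 0x21CD   # the bytes CD 21 (INT 21h) read as a little-endian word
key = 0x1234    # arbitrary sample key (an assumption, not from the text)

# NOT is XOR with an all-ones key, and applying it twice restores the word
assert not16(word) == xor16(word, 0xFFFF)
assert not16(not16(word)) == word

# XOR followed by NOT collapses into a single XOR with the inverted key
assert not16(xor16(word, key)) == xor16(word, not16(key))
```

 The last assertion is the reason why stacking NOT on top of XOR adds nothing:
 anyone who can break a single XOR breaks the combination as well.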

4.6.3 - Bitwise rotation

 There are some instructions available which rotate the bits in a value in the
specified direction, either through the Carry Flag or not. This offers a pretty
good type of encryption. It is recommended for use in conjunction with any
other encryption method, especially the Boolean ones, to increase their
otherwise low security.
 ROL and ROR instructions rotate bits left or right so that each bit rotated
out is pushed back in from the opposite side; the last bit rotated out is also
saved in Carry Flag. RCL and RCR instructions are similar, but they rotate
through Carry Flag (ie. Carry Flag is an additional bit in the rotation, which
makes it 9 bits wide for a byte and 17 bits for a word). All of these can
encrypt data, and normally the opposite instruction with the same count is
used for decryption. To use the same ROL or ROR instruction for decryption,
the number of bits rotated must be half the total number of bits in the value
(eg. 4 bits for a byte). However, rotating 8 bits in a word, for example, will
only result in exchanging low and high bytes thus not really encrypting them
at all. This shortcut doesn't work with RCL and RCR, because of the extra bit
and because the result depends on what Carry Flag happened to contain.

 ABOUT THE EXAMPLE:
 ------------------
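 This example rotates a byte by 4 bits. An immediate rotation count other than
 1 requires a 186 or better processor; on an 8086 the count must be loaded in
 CL instead. Note that copying CS to DS segment takes one byte more than using
 a simple CS: segment override prefix. Whether it is slower also depends on the
 loop count: the PUSH/POP pair is executed only once, the prefix on every pass.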

  CS:0100 0E         push cs            ; Just to make sure we have...
  CS:0101 1F         pop ds             ; ...the correct DS
  CS:0102 BB 0E 01   mov bx,010E        ; BX is current (and start) data offset
  CS:0105 B9 ll hh   mov cx,<LENGTH>    ; CX defines length of encrypted code
  CS:0108 C0 07 04   rol byte ptr [bx],04   ; Rotate byte at DS:BX left by 4
                                            ; bits (decrypt)
  CS:010B 43         inc bx             ; Increment byte offset value by one
  CS:010C E2 FA      loop 0108          ; Dec CX, jump to CS:0108 if CX!=0000
  CS:010E ...        ...                ; (encrypted code starts here)
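
 Why a 4-bit ROL undoes itself on a byte while a rotation through Carry Flag
 does not can be seen from a small Python model. This is a sketch only: the
 helper names rol8, ror8 and rcl8 are invented here, and Carry Flag is passed
 around as a plain 0/1 value.

```python
def rol8(value, count):
    """Model of 'ROL byte,count': bits leaving bit 7 re-enter at bit 0."""
    count %= 8
    return ((value << count) | (value >> (8 - count))) & 0xFF

def ror8(value, count):
    """Model of 'ROR byte,count': bits leaving bit 0 re-enter at bit 7."""
    count %= 8
    return ((value >> count) | (value << (8 - count))) & 0xFF

def rcl8(value, count, carry):
    """Model of 'RCL byte,count': a 9-bit rotation through Carry Flag.
    Returns the new byte and the new Carry Flag."""
    wide = (carry << 8) | value          # Carry Flag sits above bit 7
    count %= 9
    wide = ((wide << count) | (wide >> (9 - count))) & 0x1FF
    return wide & 0xFF, wide >> 8

# Rotating a byte by half its width swaps the nibbles, so doing it twice
# restores every possible byte value
assert rol8(0xB4, 4) == 0x4B
assert all(rol8(rol8(b, 4), 4) == b for b in range(256))

# Any other count needs the opposite instruction for decryption
assert all(ror8(rol8(b, 3), 3) == b for b in range(256))

# Two 4-bit RCLs are an 8-bit rotation of a 9-bit value: not the original
byte, carry = rcl8(0xB4, 4, 0)
byte, carry = rcl8(byte, 4, carry)
assert byte != 0xB4
```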

4.6.4 - NEG en-/decryption

 NEG (Negate) instruction is intended for negating signed values but it can be
used for encryption as well. Negating a value is done by subtracting the value
from zero. Therefore the value itself works as the encryption key. Two
successive negations on the same value produce the original, so the same piece
of code can be used for both encryption and decryption, here also. Note that
the values 0000h and 8000h (00h and 80h for bytes) are their own negatives and
thus remain unchanged.
 For an example, see 4.6.2, 'NOT en-/decryption'. Just replace the 'NOT AX'
(F7 D0) with a 'NEG AX' (F7 D8).
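
 The behaviour of NEG on words can be checked with a short Python model. It is
 only an illustration (neg16 is a made-up helper name), assuming plain 16-bit
 two's complement arithmetic as performed by 'NEG AX'.

```python
def neg16(value):
    """Model of 'NEG AX': subtract the word from zero, wrapping at 16 bits."""
    return (0 - value) & 0xFFFF

# Two successive negations give the original word back
for word in (0x0001, 0x21CD, 0x7FFF, 0xFFFF):
    assert neg16(neg16(word)) == word

# NEG is NOT plus one (two's complement), so the two are closely related
assert neg16(0x21CD) == ((~0x21CD & 0xFFFF) + 1) & 0xFFFF

# 0000h and 8000h are their own negatives and pass through unchanged
assert neg16(0x0000) == 0x0000
assert neg16(0x8000) == 0x8000
```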

4.6.5 - Basic arithmetic operations as en-/decryption algorithms

 Addition and subtraction (ADD and SUB) are one way of scrambling data or code
into unrecognizable form. Simply by adding a value to or subtracting it from a
byte, word, etc., code can be effectively encrypted and easily decrypted, too.
Since ADD and SUB are inverse operations, one of them is used for encryption
and the other, with the same value, for decryption. Carrying or borrowing
won't cause any trouble because values wrap around at their low and high
limits.
 No examples given. The others here should be sufficient.
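
 Although no decryptor listing is given for this one, the wrap-around argument
 can be checked with a Python model of byte-wide ADD and SUB. This is a sketch
 under assumptions: the function names, the sample code bytes and the key 77h
 are all chosen just for this illustration.

```python
def add_encrypt(data, key):
    """Model of 'ADD byte ptr [bx],key' applied to every byte."""
    return bytes((b + key) & 0xFF for b in data)

def sub_decrypt(data, key):
    """Model of 'SUB byte ptr [bx],key' applied to every byte."""
    return bytes((b - key) & 0xFF for b in data)

code = bytes([0xB4, 0x4C, 0xCD, 0x21])   # MOV AH,4Ch / INT 21h
key = 0x77                               # arbitrary sample key

scrambled = add_encrypt(code, key)
assert scrambled != code
assert sub_decrypt(scrambled, key) == code   # SUB undoes ADD

# 0CDh + 77h overflows a byte; the carry is simply lost when encrypting
# and the borrow brings the original value back when decrypting
assert (0xCD + key) > 0xFF
assert scrambled[2] == 0x44
```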

4.6.6 - En-/decryption using translation tables

 This is quite an inconvenient encryption scheme. There is an instruction
called XLAT for hardware support of translating a character into another. It
may be used for encrypting data, too, but the required 256-byte array for the
translation table makes the routine quite large. XLAT requires BX to hold the
offset of the start of the translation table in the DS segment and uses the
value in the AL register as an index to the table. AL is then loaded with the
byte fetched from the table (ie. AL = DS:[BX+AL]). The same piece of code and
the same table can be used for both encryption and decryption, provided that
the bytes are arranged as pairs, pointing to each other's location in the
table. For example, if location 09h of the table contains 02h, then location
02h should contain 09h. Otherwise two separate translation tables are needed,
consuming another 256 bytes.

 ABOUT THE EXAMPLE:
 ------------------
 This example decrypts code, byte by byte, using a translation table. It uses
 string instructions LODSB and STOSB to load a byte from memory into AL for
 translation and to write it back afterwards (a MOV from and to memory would be
 faster but takes a few bytes more space). Note that the translation table is
 _not_ shown!

  CS:0100 0E         push cs            ; Save code segment CS on stack and...
  CS:0101 1F         pop ds             ; ...load it as new data segment DS
  CS:0102 BE 16 01   mov si,0116        ; DS:SI will be the source
  CS:0105 0E         push cs            ; Save code segment CS on stack and...
  CS:0106 07         pop es             ; ...load it as new extra segment ES
  CS:0107 89 F7      mov di,si          ; ES:DI will be the destination
  CS:0109 BB ll hh   mov bx,<TBL_START> ; Translation table starts at DS:BX
  CS:010C B9 ll hh   mov cx,<LENGTH>    ; CX defines length of encrypted code
  CS:010F FC         cld                ; Direction is from start towards end
  CS:0110 AC         lodsb              ; Read byte at DS:SI into AL register
  CS:0111 D7         xlat               ; Translate byte in AL (decrypt)
  CS:0112 AA         stosb              ; Write byte in AL register to ES:DI
  CS:0113 49         dec cx             ; Decrement count value by one
  CS:0114 75 FA      jnz 0110           ; Jump to CS:0110 if CX!=0000
  CS:0116 ...        ...                ; (encrypted code starts here)
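
 Since the translation table itself is not shown, here is one possible way of
 generating a table whose entries point at each other in pairs, written as a
 Python sketch. The function names and the seed value are assumptions made for
 this illustration; any pairing of the 256 byte values would do.

```python
import random

def make_pair_table(seed):
    """Build a 256-byte table in which the entries point at each other
    in pairs, so that table[table[i]] == i for every i."""
    rng = random.Random(seed)
    order = list(range(256))
    rng.shuffle(order)
    table = bytearray(256)
    # Take the shuffled values two at a time and cross-link each pair
    for a, b in zip(order[0::2], order[1::2]):
        table[a], table[b] = b, a
    return bytes(table)

def xlat_all(data, table):
    """Model of the LODSB / XLAT / STOSB loop: AL = table[AL] per byte."""
    return bytes(table[b] for b in data)

table = make_pair_table(1997)
code = bytes([0xB4, 0x4C, 0xCD, 0x21])   # MOV AH,4Ch / INT 21h

encrypted = xlat_all(code, table)
assert encrypted != code
assert xlat_all(encrypted, table) == code     # same table works both ways
assert all(table[table[i]] == i for i in range(256))
```

 With a table built like this no byte ever translates to itself, because every
 value is paired with a different one.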

4.6.7 - Scrambling original byte order

 This is not actually an encryption method at all. It just produces
unintelligible code, and the real code can be seen simply by putting the bytes
back in the correct sequence.

 ABOUT THE EXAMPLE:
 ------------------
 Here a word is read from memory to AX, the high and low bytes are swapped and
 then the word is written back.

  CS:0100 0E         push cs            ; Save code segment CS on stack and...
  CS:0101 1F         pop ds             ; ...load it as new data segment DS
  CS:0102 BE 13 01   mov si,0113        ; DS:SI will be the source
  CS:0105 0E         push cs            ; Save code segment CS on stack and...
  CS:0106 07         pop es             ; ...load it as new extra segment ES
  CS:0107 89 F7      mov di,si          ; ES:DI will be the destination
  CS:0109 B9 ll hh   mov cx,<LENGTH>    ; CX defines length of encrypted code
                                        ; (in words)
  CS:010C FC         cld                ; Direction is from start towards end
  CS:010D AD         lodsw              ; Read word at DS:SI into AX register
  CS:010E 86 C4      xchg al,ah         ; Exchange AL and AH contents (decrypt)
  CS:0110 AB         stosw              ; Write word in AX register to ES:DI
  CS:0111 E2 FA      loop 010D          ; Dec CX, jump to CS:010D if CX!=0000
  CS:0113 ...        ...                ; (encrypted code starts here)
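
 The effect of the loop above on a block of code can be modelled in Python as
 follows. This is just a sketch (swap_words is a made-up name) and it assumes
 that the block consists of a whole number of words.

```python
def swap_words(data):
    """Model of the LODSW / XCHG AL,AH / STOSW loop: swap the two bytes
    of every word. The length must be even."""
    if len(data) % 2:
        raise ValueError("length must be a whole number of words")
    out = bytearray(data)
    # Even positions receive the odd bytes and vice versa
    out[0::2], out[1::2] = data[1::2], data[0::2]
    return bytes(out)

code = bytes([0xB4, 0x4C, 0xCD, 0x21])   # MOV AH,4Ch / INT 21h

scrambled = swap_words(code)
assert scrambled == bytes([0x4C, 0xB4, 0x21, 0xCD])
assert swap_words(scrambled) == code      # the same loop restores the code
```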

4.7 - Polymorphic code encryptors
---------------------------------
 [* will be covered some time if there is any need *]

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

SECTION 5: APPENDICES
=====================

APPENDIX A: Explanations/Glossary
---------------------------------
 -Breakpoint: Breakpoints are used by debuggers to stop code execution at a
   certain location and return control to the debugger.
 -Debugger: Debuggers are utilities which provide tools to debug programs.
    8086 software debuggers (aka. 8086/software/Real Mode debuggers) rely on
   using an 8086 processor's debugging capabilities (ie. only single-stepping
   and using INT 3 instructions as breakpoints), hence the name.
    386 hardware debuggers (aka. 386/hardware/Protected Mode/VM86 debuggers)
   take advantage of the hardware of a 386 or better processor for debugging
   purposes.
   (for more information on these two types see section 2.2, 'How a debugger
   works')
 -Debugging: Debugging is, as the word itself tells us, removing bugs from
   code or simply examining code. It involves examining code execution and thus
   trying to find a known error in the source.
 -Exception: Exceptions are interrupts internally generated by the CPU, often
   due to an error condition. When the CPU detects an invalid opcode, for
   example, it will generate exception #6 (186+ processors only). A note
   though: unlike hardware interrupts, exceptions are synchronous to code
   execution and thus can always be reproduced under the same conditions.
 -Interrupt: Interrupts are used to literally interrupt code execution.
    Software interrupts, ie. those caused by an exception or an INT instruction
   (which is handled as an exception, too), have an interrupt table entry (in
   Real Mode) or an Interrupt Descriptor Table (IDT) entry (in Protected Mode)
   which contains the address of the interrupt service routine.
    Hardware (external) interrupts are caused by an external signal to the
   processor's INTR (Interrupt Request) or NMI (Non-Maskable Interrupt Request)
   pins. When a signal comes to the INTR pin, the signaller will supply the
   number of the interrupt service routine to be called. An NMI signal is
   always hooked to INT 02.
 -Interrupt Descriptor Table (IDT): See 'Interrupt table' below!
 -Interrupt table: Interrupt table contains address pointers (vectors) to the
   interrupt handler routines called by INT instructions, exceptions and
   external hardware interrupts.
    In Real Mode the interrupt table is always located in the very beginning of
   memory, the area's size is 0400h bytes (1KB, offsets 0000h-03FFh) and each
   INT entry takes four bytes, two for the IP word and another two for the CS
   word (an example entry: the bytes 78 56 34 12 point to 1234:5678).
    In Protected Mode, Interrupt Descriptor Table (IDT) is used instead. An IDT
   can be located anywhere in memory, size can vary (but usually 0800h bytes
   (2KB, ie. a limit of 07FFh) to contain all 256 possible interrupt vectors)
   and both of these can be modified with an LIDT (Load IDT) instruction. Each
   entry in IDT consumes 8 bytes. (need some info on how an IDT entry is
   formed!)
    In a VM86 task, don't mistake the virtualized interrupt table for the
   Protected Mode IDT. Since Virtual 8086 Mode is just a sub-system of
   Protected Mode, all interrupts attempt to call the handler routine pointed
   to by the appropriate IDT entry. This will succeed if IOPL bits in FLAGS are
   set to 3 (which is the CPL of a VM86 task), but otherwise (IOPL<3) they all
   fail and cause exception #13 instead. Because of their simulated Real Mode
   nature, however, VM86 tasks have a "virtual" interrupt table in the
   beginning of the VM86 task memory which is consulted either by a Protected
   Mode interrupt handler when IOPL=3, or a V86 monitor if IOPL<3, to redirect
   the interrupt to its VM86 routine. (see also 'V86 monitor')
 -I/O Permission Bitmap: I/O Permission Bitmap allows an operating system, for
   example, to mask off access to certain I/O ports from non-privileged tasks
   (this only applies to Protected Mode and Virtual 8086 Mode). The bitmap
   consists of up to 64Kbits, each of which represents a single byte-wide I/O
   port (two bits for a word-wide and four bits for a dword-wide port). Setting
   any of the bits corresponding to the I/O port will deny a non-privileged
   task I/O to and from it.
    In Protected Mode the bitmap will be used if the task's Current Privilege
   Level value is greater than the one specified in the IOPL bits of FLAGS
   register. In VM86 mode, the IOPL value is not checked but rather all tasks
   are subject to the bitmap. Exception #13 will be generated if access to a
   port is denied from the task trying to perform I/O.
 -Opcode (Operational Code): Opcodes control a whole processor's operation.
   They are instructions encoded as a flow of bits so that the processor could
   understand them. Don't confuse symbolic instructions with opcodes, though.
   For example, the opcode for a 'NOP' instruction would be 90h.
 -Privilege Level (PL): Intel processors starting from 80286's use internal
   privilege levels (effective only in Protected Mode) to protect execution of
   certain instructions and memory access from tasks that aren't privileged
   enough. Privilege Level varies from 0, the most privileged, to 3 being the
   least privileged. Default (and unmodifiable) value for Real Mode tasks is 0
   and for VM86 tasks 3. A Protected Mode task's PL is one of the four possible
   values and can be changed to the preference of an operating system.
 -Single-Stepping: Single-stepping is a method used by debuggers to execute
   only one instruction of code at a time. This way a user may watch the code
   execute, step by step.
 -Task State Segment (TSS): TSS was introduced with Protected Mode in 286
   processors and is used for multi-tasking to save system state for each
   Protected Mode task separately.
    In a 286 style TSS, contents of all registers including LDT, initial SS:SP
   for entering Privilege Levels 0-2 and a back link to previous task are
   saved, in 16 bits.
    386 style TSS's also contain CR3 register and the two extra segment
   registers, FS and GS, a T (Debug Trap) bit (for supporting breakpoints on
   task switches), offset of the I/O Permission Bitmap and the bitmap itself.
   In addition to these, free-formed extra info about the task can also be
   entered in a 386 TSS. All registers are saved in 32 bits stuffing 00's in
   the high word of segment registers.
 -Tracing: Tracing simply means single-stepping through code and examining code
   execution at the same time.
 -V86 monitor: When called from a VM86 task with an IOPL of less than 3, all
   software interrupts (except for the single-byte INT 3, INTO and BOUND
   instructions, which are not sensitive to the IOPL bits) generate exception
   #13. Therefore the interrupt #13 handler must contain (in Protected Mode) a
   suitable monitor routine to determine whether the exception was caused by a
   VM86 task calling interrupt services or by a General Protection Fault.
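
 Two of the layouts described in the glossary above, the Real Mode interrupt
 table entry and the I/O Permission Bitmap, can be modelled in a few lines of
 Python. This is a sketch only: read_vector, io_allowed and the sample memory
 contents are made up for illustration, and the bitmap is assumed to keep port
 N in bit (N mod 8) of byte (N div 8).

```python
import struct

def read_vector(memory, int_no):
    """Return (segment, offset) stored for interrupt int_no in a Real Mode
    interrupt table held in 'memory' (a buffer starting at 0000:0000)."""
    # Each entry is four bytes: IP word first, then CS word, little-endian
    offset, segment = struct.unpack_from("<HH", memory, int_no * 4)
    return segment, offset

def io_allowed(bitmap, port, width=1):
    """Return True if every bit covering 'width' consecutive byte-wide
    ports starting at 'port' is clear in the I/O Permission Bitmap."""
    for p in range(port, port + width):
        if bitmap[p // 8] & (1 << (p % 8)):
            return False
    return True

# The example entry from the glossary, placed at INT 03
ivt = bytearray(0x400)                         # 256 entries of 4 bytes
ivt[3 * 4:3 * 4 + 4] = bytes([0x78, 0x56, 0x34, 0x12])
assert read_vector(ivt, 3) == (0x1234, 0x5678)

# A full bitmap of 64Kbits takes 8192 bytes; deny port 03F9h only
bitmap = bytearray(8192)
bitmap[0x3F9 // 8] |= 1 << (0x3F9 % 8)
assert io_allowed(bitmap, 0x3F8)               # byte access to 03F8h is fine
assert not io_allowed(bitmap, 0x3F9)           # byte access to 03F9h is not
assert not io_allowed(bitmap, 0x3F8, width=2)  # a word access covers both
```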

APPENDIX B: Suggested reading for info   [+] = Printed on paper
--------------------------------------   [-] = In electronic form
 +i486 Microprocessor (Intel, order #240440): An Intel 80486 processor databook
    Available from your local Intel representative for free! (for an Adobe
   Acrobat-format copy see Appendix D, 'Useful Internet sites')
 -HelpPC (David Jurgens): A program with a large database of 80x86 instructions
   and other useful stuff concerning PCs.
 -Interrupt List (Ralf Brown): A most complete database of DOS interrupts and
   other valuable PC hardware info.
 -A Painless Guide to CRC Error Detection Algorithms (Ross Williams): A
   thorough guide to error detection algorithms

APPENDIX C: Useful E-mail addresses
-----------------------------------
 inbar@glx.chief.co.il : Inbar Raz (the author of 'Anti Debugging Tricks')
   (Internet - FidoNet gate: Inbar.Raz@p42.f100.n403.z2.fidonet.org)
   (FidoNet: Inbar Raz, 2:403/100.42)
 support@intel.com : Support (questions/comments/etc.) for Intel products
 rcollins@x86.org : Robert Collins (the sys. admin. of the X86 website)
                    NOTE: This is _only_ for personal mail! For questions, use
                          one of the following addresses instead.
   intelcpu@x86.org : Questions about Intel microprocessors
   othercpu@x86.org : Questions about non-Intel microprocessors
   support@x86.org : Technical support on anything else
 ralf@cs.cmu.edu : Ralf Brown (the author of the 'Interrupt List')
   (Alternate address: ralf@pobox.com)
 ross@guest.adelaide.edu.au : Ross Williams (the author of 'A Painless Guide to
                                            CRC Error Detection Algorithms')

APPENDIX D: Useful Internet sites
---------------------------------
 http://developer.intel.com/design/product.htm
   * Intel Developer Home and On-Line Literature website:
      Offers free data sheets and manuals of many Intel products, especially
      80x86 processors, in Adobe Acrobat format. Also available as FTP (see
      below).
       The full contents of this site are also available as a _FREE CD_,
      'Developers' Insight CD-ROM'!

 ftp://download.intel.com/design/
   * Intel On-Line Literature FTP site:
      Contains all the documents as available through WWW (see above), but due
      to scrambled filenames and no descriptions, it's recommended to use the
      website instead.

 http://www.x86.org/
   * Intel Secrets (X86) website:
      Lots of information on undocumented Intel 80x86 CPU features as well as
      bugs. Also available through FTP (see below).

 ftp://ftp.x86.org/
   * Intel Secrets (X86) FTP site:
      Has all the articles, demonstration source code and other stuff as the
      website (see above).

 http://www.cs.cmu.edu/afs/cs/user/ralf/pub/WWW/files.html
   * Ralf Brown's Home Page, primary website:
      Here you can get the most recent 'Interrupt List' and other software he's
      written. If this doesn't work, try the one below.

 http://www.pobox.com/~ralf/files.html
   * Ralf Brown's Home Page, alternate website:
      Plainly redirects all requests to the current address.

]=============================================================================[

Special thanks to:
------------------
 People who have actively helped with this project:
  -Inbar Raz: The author of the interesting article, 'Anti Debugging Tricks'.
              It was a very good introduction to the world of anti-debugging!
  -Warren Ellis: The engineer at Intel overloaded with questions concerning
                 80x86 debugging features... :]

 People whose work has passively helped with this project:
  -Michael Forrest: Hm? Why did _your_ name end up here? Don't know... Anyway,
                    read the reply to your 'Anti-Anti Debugging Tricks' article
                    in file TO_MFORR.TXT, if you ever see this.
  -Werner "Dirk Gently" Zsolt: The author of 'The Action Replay card for the
                               PC', an article about the card. (see section
                               2.4.2)
  -Ross Williams: The author of 'A Painless Guide to CRC Error Detection
                  Algorithms', an article about error detection and algorithms.
                  It enabled me to fill in the checksum generator section.

Contacting me:
--------------
 E-mail: mhk@sci.fi

  (to get some more info on contacting me, please finger my account!)

 Please, do E-mail me if you have any questions or comments, or even something
 to contribute to this document. Also, don't hesitate to contact me if you'd
 like to make any suggestions or want something to be discussed here. I _do_
 need your help to develop this "whitepaper" further.

Information in demand:
----------------------
 General info:
  -New (or old ;) ideas for tricks, preferably with code examples. Anything
    will do, ranging from user confusion code to code encryption and CRC checks.
  -Corrections and additions to info/code examples
  -...whatever you think might be of use!

 Info especially needed about:
  -Debuggers other than the very limited number already listed.
    Needed are the name and version of the debugger, whether it uses hardware
    capabilities for debugging (primarily/optionally), whether it uses Real
    Mode or Virtual 8086 Mode to execute code, interrupt vectors grabbed for
    its own use (required with 8086 debuggers) and possibly other information.
  -Undocumented Soft-ICE back door commands (see section 3.3.8)
  -Hardware debugger cards
  -In-Circuit Emulators/Debuggers (ICE/ICD) (what they are, how they work...)

                                                 SiGNED:MHK
