From : Ed Beroset                                   1:3641/1.250
To   : Bob Kohl
Subj : reading object code?
────────────────────────────────────────────────────────────────────────────────
In a msg on , to All, Bob Kohl writes:

 BK> file for this program. In the old days, I learned how to read an
 BK> object file. That was when I was learning to program ASM on a
 BK> mainframe. I'm a bit out of practice, and I'm not sure the same
 BK> method holds true for the PC.

It does, but the object module format (OMF) for 80XXX linkers is wickedly complex. First, a bit of background. The OMF was first promulgated by Intel for their own compilers and linkers. It was created way back in the dark old days when memory was at a premium, so all sorts of gimmicks were used to reduce the amount of memory required to store OMF records. Later, other companies, including Microsoft, IBM, Borland, and PharLap added to the original OMF and, unfortunately, didn't always use the same conventions.

Microsoft's documentation, while incomplete, is still the most comprehensive I've seen and is an essential reference for anybody needing to understand OMF as it is currently used. It is available for download from Microsoft's BBS, and probably via their ftp site as SS0288.ZIP. It's also commonly available on programmers' BBSs around the world. Until you download that file, here's a rough sketch so you'll have some idea how to read these things.

The first thing to realize is that OMF files consist of a series of records. Every record consists of a one byte Record Type, followed by a word-sized Record Length. What follows that is data bytes and lastly the checksum. The Record Length includes the data bytes and checksum but not the Record Type or Record Length bytes. The checksum byte is calculated to make the sum of all bytes in the record (modulo 256) equal zero, although some language products simply put a zero in this byte instead of calculating a checksum.
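For anyone who would rather let a script do the counting, the record framing just described can be sketched in a few lines of Python. This is only an illustration, not something taken from Microsoft's document: the function name split_records is made up, the word-sized length is read low byte first (Intel order), and the sample bytes are the first two records of Bob's object file as dumped below.

```python
def split_records(blob: bytes):
    """Yield (record_type, data, checksum_ok) for each record in an OMF byte stream."""
    pos = 0
    while pos < len(blob):
        if pos + 3 > len(blob):
            raise ValueError(f"truncated record header at offset {pos:#x}")
        rec_type = blob[pos]
        # Record Length is a 16-bit word, low byte first; it counts the
        # data bytes plus the checksum, but not the type or length bytes.
        rec_len = int.from_bytes(blob[pos + 1:pos + 3], "little")
        end = pos + 3 + rec_len
        if rec_len < 1 or end > len(blob):
            raise ValueError(f"truncated record at offset {pos:#x}")
        body = blob[pos + 3:end]
        data, checksum = body[:-1], body[-1]
        # A record is accepted if all of its bytes sum to 0 modulo 256,
        # or if the checksum byte is a plain 0 (some tools skip the math).
        ok = checksum == 0 or sum(blob[pos:end]) % 256 == 0
        yield rec_type, data, ok
        pos = end


# First two records of the object file discussed in this message.
sample = bytes.fromhex(
    "80 0E 00 0C 6B 65 79 69 6E 67 30 33 2E 61 73 6D 0D "
    "96 25 00 00 06 44 47 52 4F 55 50 04 44 41 54 41 "
    "04 43 4F 44 45 05 53 54 41 43 4B 05 5F 44 41 54 "
    "41 05 5F 54 45 58 54 8F"
)

records = list(split_records(sample))
for rec_type, data, ok in records:
    status = "ok" if ok else "BAD"
    print(f"{rec_type:02X}h  {len(data):3d} data bytes  checksum {status}")
```

Run against the sample, this reports an 80h record with 13 data bytes and a 96h record with 36 data bytes, both with good checksums. Feeding it a whole .OBJ file read in binary mode should walk every record the same way.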

Given just that much information, your object file breaks into the following records:

 0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F

80 0E 00 0C 6B 65 79 69 6E 67 30 33 2E 61 73 6D
0D

96 25 00 00 06 44 47 52 4F 55 50 04 44 41 54 41
04 43 4F 44 45 05 53 54 41 43 4B 05 5F 44 41 54
41 05 5F 54 45 58 54 8F

98 07 00 48 21 00 07 04 01 EC

98 07 00 48 2D 00 06 03 01 E2

98 07 00 74 00 01 05 05 01 E1

9A 06 00 02 FF 02 FF 03 5B

88 04 00 00 A2 00 D2

A0 06 00 02 00 00 19 00 3F

A0 16 00 02 1B 00 4F 75 74 70 75 74 20 66 69 6C
65 6E 61 6D 65 3A 20 24 BD

A0 25 00 01 00 00 B8 00 00 8E D8 B4 0A BA 00 00
CD 21 BA 00 00 B4 09 CD 21 BB 01 00 BA 00 00 B4
09 CD 21 B4 4C CD 21 42

9C 19 00 C8 01 15 01 01 C4 08 14 01 02 C4 0D 10
01 02 1B 00 C4 17 10 01 02 02 00 99

8A 07 00 C1 00 01 01 00 00 AC

Record type 80h is THEADR (Translator HEADer Record) which contains the name of the source module. In this case, it reads "keying03.asm" followed by the checksum byte 0dh.

Record type 96h is an LNAMES record, which specifies a list of names to be used subsequently in the file. Each name consists of a length byte, followed by the actual name. Null names (i.e. names with zero length) are valid. In this case, you've got the following names:

    ''  'DGROUP'  'DATA'  'CODE'  'STACK'  '_DATA'  '_TEXT'

Record type 98h is a SEGDEF record. The format for this one is quite complex, but basically it contains fields which describe the name (by reference to the previous LNAMES record), the size, the class, combine type and other attributes. In your case there are three SEGDEF records which describe the following segments:

    _TEXT   WORD   PUBLIC   Class 'CODE'    Length: 0021
    _DATA   WORD   PUBLIC   Class 'DATA'    Length: 002d
    STACK   PARA   STACK    Class 'STACK'   Length: 0100

Record type 9Ah is a GRPDEF record, which is about as complicated as the SEGDEF records. This one specifies that the _DATA and STACK segments are both in the group named DGROUP.

Record type 88h is a COMENT record.
This particular one is an "A2h" class comment, which is a marker indicating that a two-pass linker can stop the first pass at this point. All of the tables required internally by the linker (e.g. the above lists of segment names, combine classes, etc.) have been read in by this point, so the linker can proceed to pass two. This record type isn't technically required but it can theoretically speed up the link process. I can tell from this that you're probably using MASM 5.10 -- later versions of MASM don't seem to generate records of this type.

Record type A0h is an LEDATA record. LEDATA (Logical Enumerated DATA records) and LIDATA (Logical Iterated DATA records) are where the "meat" of the program actually lies. For that reason, I'll describe LEDATA in a bit more detail.

A0 06 00 02 00 00 19 00 3F

A0 is the record type, and 06 00 means that six more bytes follow in the record (the data bytes plus the checksum). The 02 means that this is segment #2 (the _DATA segment as described above by the SEGDEF records). The following word (00 00) specifies the address of this data relative to the beginning of the segment. Successive data records occupy higher addresses. In this case, it's the first data in this segment, so it starts at 0. What follows is the actual data. In this case, we only have two bytes, 19 00, which correspond to the following two assembly language source lines:

 BK> mlbuff db 25 ;Maximum # of byes to read
 BK> albuff db 0 ;Number of bytes read by DOS

The checksum is 3Fh.

The next LEDATA record follows the same format and indicates segment #2 (_DATA), an offset of 01bh (leaving room for the 2 bytes of data in the previous record + 25 bytes of strbuff), and 12h data bytes which correspond to the following source line:

 BK> input lit db "Output filename: $" ; setup for input

The next LEDATA record contains the actual code for your program. If you generated a listing file when you assembled the file, you should be able to match up the data bytes in this record with the code bytes generated by the assembler.
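The LEDATA layout described above (segment index, then a word offset, then the raw data) is simple enough to pick apart mechanically. Here is a hedged Python sketch of that: parse_ledata and the SEGMENTS table are names invented for this example, the input is the data portion of each record with the type, length and checksum bytes already stripped, and only one-byte segment indexes are handled.

```python
def parse_ledata(data: bytes):
    """Split the data portion of a 16-bit LEDATA (A0h) record.

    Returns (segment_index, offset, payload).  This sketch only handles
    one-byte segment indexes, i.e. values below 80h.
    """
    seg_index = data[0]
    if seg_index & 0x80:
        raise ValueError("two-byte segment index not handled in this sketch")
    # The offset within the segment is a 16-bit word, low byte first.
    offset = int.from_bytes(data[1:3], "little")
    return seg_index, offset, data[3:]


# Segment numbers follow the order of the SEGDEF records in this file.
SEGMENTS = {1: "_TEXT", 2: "_DATA", 3: "STACK"}

# Data portions of the three LEDATA records from the dump in this message.
ledata_records = [
    bytes.fromhex("02 00 00 19 00"),
    bytes.fromhex(
        "02 1B 00 4F 75 74 70 75 74 20 66 69 6C "
        "65 6E 61 6D 65 3A 20 24"
    ),
    bytes.fromhex(
        "01 00 00 B8 00 00 8E D8 B4 0A BA 00 00 "
        "CD 21 BA 00 00 B4 09 CD 21 BB 01 00 BA 00 00 B4 "
        "09 CD 21 B4 4C CD 21"
    ),
]

for rec in ledata_records:
    seg, off, payload = parse_ledata(rec)
    hex_bytes = " ".join(f"{b:02X}" for b in payload)
    print(f"{SEGMENTS[seg]}:{off:04X}  {len(payload):2d} bytes  {hex_bytes}")
```

The three lines it prints start at _DATA:0000, _DATA:001B and _TEXT:0000, with 2, 18 (12h) and 33 (21h) bytes respectively, the last figure matching the 0021 length given for _TEXT in its SEGDEF record.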
For example, in your program:

 BK> mov ax,@data
 BK> mov ds,ax

... corresponds to B8 00 00 8E D8...

Finally we get to record type 9Ch, which is a FIXUPP record. This is one of the most complicated records commonly used in OMF files, so I won't go into any detail here, but in a nutshell, it specifies the locations of things that need to be "fixed up" by the linker. As an example, the mov ax,@data instruction is encoded as though @data = 0. This isn't necessarily true, since the _DATA segment might actually be anywhere. In this case, the first FIXUPP subrecord specifies that at location 0001h in the previous LEDATA or LIDATA record, a logical segment base address specified by Group Index #1 (the DGROUP group, which contains the _DATA segment) may need a fix-up. (whew!)

The last record is type 8Ah, a MODEND (MODule END) record. It is what its name implies, specifying the end of a module. This particular MODEND record also indicates that this is the main module and that the starting location (relative to the start of the _TEXT segment) is 0.

Now aren't you sorry you asked? :-)

There used to be (and probably still is) a utility that Microsoft provided to dump out object files. While you're on Microsoft's BBS, you might look for such a utility there. There are a number of freeware utilities that do the same thing, and the utility I use (IMHO the most complete and helpful) is one called TDUMP that is shipped with Borland's language products (I used TDUMP's output to help create this message).

-> Ed <-

--- Squish v1.01
 * Origin: = Psychotronic BBS // 919-286-4542 // Durham, NC = (1:3641/1.250)