This file contains a write-up of the more technical aspects of uuencoding and uudecoding. First read the file UUSER.TXT, then read this for more details.

Documentation for UUENCODE/DECODE 96 (v56)

UU-encoding is a way to code a file which may contain any characters into a standard character set that can be reliably sent over diverse networks.

THE CHARACTER ENCODING:

The basic scheme is to break groups of 3 eight bit characters (24 bits) into 4 six bit characters and then add 32 (a space) to each six bit character, which maps it into a readily transmittable character. Another way of phrasing this is to say that the encoded 6 bit characters are mapped into the set:

    `!"#$%&'()*+,-./0123456789:;<=>?@ABC...XYZ[\]^_

for transmission over communications lines.

As some transmission mechanisms compress or remove spaces, spaces are changed into back-quote characters (a 96). (A better scheme might have been to use a bias of 33 so a space is not created, but this is not done.)

A newer, less popular, encoding method, called XX-encoding, uses the set:

    +-01..89ABC...XYZabc...xyz

In my opinion, XX-encoding is superior to UU-encoding because it uses more "normal" characters that are less likely to get corrupted. In fact, several of the special characters in the UU set do not get through an EBCDIC to ASCII translation correctly. Conversely, an advantage of the UU set is that it does not use lower case characters. Nowadays both upper and lower case are sent with no problems; maybe in the communications dark ages there was a problem with lower case.

This "UU" encode/decode pair can handle either XX or UU encoding. The encode program defaults to creating a UU encoded file, but can be run with a "-x" option to create an XX encoding. The decode program defaults to autodetect. However, the program can get confused by comment lines preceding the actual encoded data. The decode mode can be forced to UU or XX with the "-u" or "-x" parameter.
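To make the 3-to-4 mapping described under THE CHARACTER ENCODING concrete, here is a minimal sketch in Python. It is not taken from the encode program itself. The function name encode_triple and the xx flag are made up for illustration, and it assumes that the XX set is indexed in the order it is listed above ("+" is 0, "-" is 1, and so on).

```python
# Hypothetical helper, for illustration only; not the package's actual code.
# Assumption: the XX character set is indexed in the order it is listed.
XX_SET = "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def encode_triple(b0, b1, b2, xx=False):
    """Split three 8-bit values into four 6-bit values and map each to a character."""
    sixes = [
        b0 >> 2,                         # top 6 bits of byte 0
        ((b0 & 0x03) << 4) | (b1 >> 4),  # low 2 bits of byte 0, top 4 of byte 1
        ((b1 & 0x0F) << 2) | (b2 >> 6),  # low 4 bits of byte 1, top 2 of byte 2
        b2 & 0x3F,                       # low 6 bits of byte 2
    ]
    if xx:
        return "".join(XX_SET[v] for v in sixes)
    # UU: add the bias of 32; a value of 0 would be a space, so use back-quote (96).
    return "".join("`" if v == 0 else chr(v + 32) for v in sixes)


print(encode_triple(ord("C"), ord("a"), ord("t")))           # UU form
print(encode_triple(ord("C"), ord("a"), ord("t"), xx=True))  # XX form
```

For the three bytes of "Cat" the six bit values are 16, 54, 5 and 52, so the UU form is 0V%T.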
Another option is for the character mapping table to be inserted at the front of the file. The format for this is discussed later. A table is automatically detected and used by this decode program. (A table will override the "-x" or "-u" parameters.) The encode program can be run with a "-t" option which tells it to put the table into the encoded file.

A third encode mapping is the one used by Brad Templeton's ABE program. This is not handled by these programs, as the check and control information surrounding the actual encoded data is in a different form.

From a theoretical view, this encoding is breaking down 24 bits modulo 64. Note that 64**4 = 2**24. The result is 24 bits in for 32 bits out, a 33% size increase. Note that 85**5 > 2**32. Also note that there are 94 transmittable ASCII characters (from 0x21 through 0x7e). Thus modulo 85 encoding (the atob encoder) transforms 32 bits to 5 ASCII chars, or 40 bits, for a 25% size increase. The trade off in the modulo 85 encoding is that many communications systems do not reliably transmit 85 ASCII characters. The tilde, caret, brackets, and sometimes upper or lower case, may get corrupted.

There are three other popular encoding techniques. One is BinHex, used on Apple computers. The current version is BinHex 4.0. BinHex uses another mapping into 64 characters. The first encoded line in a BinHex file is an encoded structure that contains the file name, size, checksum, date and time. The remaining lines are encoded data.

Another technique that I have seen is BinMail, used on Unisys A-series.

The most recent mechanism is incorporated into MIME and is called Base64 encoding. I support this as a $20 shareware option. It uses another character mapping of 64 characters with no checksums and no special "end" indicator.

COMPOSING A LINE OF ENCODED CHARACTERS:

A small number of eight bit characters are encoded into a single line and a count is put at the start of the line.
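As a rough illustration of how one UU line is put together, here is a short Python sketch. The name uu_line is hypothetical, and the sketch makes two assumptions that this write-up does not state: the count character is mapped with the same bias of 32 as the data, and a final group of fewer than 3 bytes is padded with zero bytes before encoding.

```python
# Hypothetical sketch of composing one UU-encoded line; not the package's code.
def uu_line(chunk):
    """Return the count character followed by the encoded form of up to 45 bytes."""
    if not 0 < len(chunk) <= 45:
        raise ValueError("a line carries between 1 and 45 original bytes")

    def ch(v):
        # Bias of 32; a zero value becomes a back-quote instead of a space.
        return "`" if v == 0 else chr(v + 32)

    # Assumption: a short final group is padded with zero bytes.
    padded = chunk + b"\0" * (-len(chunk) % 3)
    out = [ch(len(chunk))]  # the count of ORIGINAL bytes, not encoded characters
    for i in range(0, len(padded), 3):
        b0, b1, b2 = padded[i:i + 3]
        out.append(ch(b0 >> 2))
        out.append(ch(((b0 & 0x03) << 4) | (b1 >> 4)))
        out.append(ch(((b1 & 0x0F) << 2) | (b2 >> 6)))
        out.append(ch(b2 & 0x3F))
    return "".join(out)


print(uu_line(b"Cat"))
print(uu_line(b"A" * 45)[0])  # count character for a full 45 byte line
```

Because the count is of original bytes, a decoder can tell how many of the decoded bytes on a short line are real and how many are padding.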
(Most lines in an encoded file have 60 encoded characters (45 original bytes). When you look at a UU-encoded file, note that most lines start with the letter "M". "M" is decimal 77 in ASCII which, minus the 32 bias, is 45.)

BinHex does not use a count character; every encoded line contains 64 characters, except the last, which is limited by the file size.

Base64, like BinHex, does not use a count character. An encoded line can be any length that is a multiple of 4. The last line is also a multiple of 4 in length, but the "=" character indicates no character, so the actual file length can be recreated.

This encode program optionally puts a check character at the end of each line. The check is the sum of all the encoded characters, before adding the mapping, modulo 64. Note: Horton 9/1/87 UUENCODE has a bug in the line check algorithm; it uses the sum of the original, not the encoded, characters. This decode program accepts either form of line check character.

In previous versions of my package (4.13 and lower) the line check character was generated by default by the encode program and was suppressed with the "-L" option. One reason to suppress it is if the decode will be done by one of the old Horton decoders. Most modern decoders either accept this form of line level check or will simply stop looking after the line length is exhausted.

My feelings are mixed about the current need for line checksums, because errors of this type essentially never occur. Given modern, error-free communications systems and the CRC checks on the entire file (see below), I have made the default for uuencoding to have NO line level check characters, effective version 4.21. The "-L" option on uuencode turns on generation of line checksums. If you have a problem communications system and want to isolate the trouble, turn them on.

Uudecode automatically checks for the presence of line checksums, so the default for uudecode is to leave line level checks on.
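The two forms of line check described above can be sketched as follows in Python. The name line_check and the horton flag are made up for illustration. The sketch assumes that the count character is not part of the sum and that a short final group is padded with zero bytes; neither detail is stated in this write-up.

```python
# Hypothetical sketch of the line check value; not the package's actual code.
def line_check(chunk, horton=False):
    """Return the check value (0..63) for one line's worth of original bytes."""
    if horton:
        # The Horton 9/1/87 variant: sum of the ORIGINAL bytes, modulo 64.
        return sum(chunk) % 64
    # Otherwise: sum of the 6-bit values before the character mapping is added.
    # Assumptions: zero padding of a short group, count character not summed.
    padded = chunk + b"\0" * (-len(chunk) % 3)
    total = 0
    for i in range(0, len(padded), 3):
        b0, b1, b2 = padded[i:i + 3]
        total += b0 >> 2
        total += ((b0 & 0x03) << 4) | (b1 >> 4)
        total += ((b1 & 0x0F) << 2) | (b2 >> 6)
        total += b2 & 0x3F
    return total % 64


print(line_check(b"Cat"))               # sum of the encoded 6-bit values
print(line_check(b"Cat", horton=True))  # sum of the original bytes
```

For "Cat" the six bit values 16, 54, 5 and 52 sum to 127, giving 63, while the original bytes 67, 97 and 116 sum to 280, giving 24. A decoder that accepts either form could compare the received check value against both results.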
If there are some problems, the "-L" option for uudecode turns them off; sometimes there is junk at the end of the line which causes spurious line checksum errors.

I have encountered various other ways that encoders end lines. One encoder put an "M" at both the start and end of the line. Another used a line count character. This decode program checks all of these. I would not be surprised if some encoder out there ends lines with sequential astrological symbols. If you encounter some other weird form of encoded file, let me know. (The -L option turns line level checking off.)

PACKAGING THE LINES INTO FILES:

The lines of encoded data can be preceded by comments and by network addressing information. The encoded data is directly preceded by a line containing:

    begin

This line is created by the encode program. The decode program scans the file looking for "begin" in column 1. If the following line is encoded data, the decoding process begins.

Some encoders put file time and date information on the begin line:

    begin