Microsoft MS-DOS CD-ROM Extensions

MS-DOSifying your CD-ROM

23 March, 1990

Most of the following caveats apply to the present version of the Microsoft CD-ROM Extensions. Future versions of the Extensions are expected to support many of the features listed below that are at present best avoided. The behavior of the Extensions with fields and records that are presently ignored may change at any time.

- Correctness

Make sure that your disc is in valid High Sierra format. Nothing is guaranteed if your disc is not in valid format. Surprisingly enough, we have received several discs with one or more illegally formatted data areas: directories sorted incorrectly, incorrect path table sizes, incorrect directory file sizes, directories missing from the path table, invalid directory names, and so on. In almost every case the Extensions will behave incorrectly, and at worst the system will crash.

In addition to running validation software to verify the High Sierra image, you should also verify that the Extensions work with your CD-ROM disc and application software before distributing it. Unfortunately, a correct disc is of little use if a fault in the Extensions keeps the two from working together. Please report any and all problems you think are in the Extensions to Microsoft so that they can be fixed.

- Path table and Directory sizes

This bears repeating because many people have gotten it wrong. Directory file sizes are always a multiple of the logical sector size (2 kilobytes). Path table sizes are always the exact number of bytes contained in the path table, which is typically not a multiple of 2K. You must not have blank directory sectors, and the directory length must reflect the correct length of the directory file. Directory sectors always begin on a logical sector boundary.

- 8.3 File names

MS-DOS cannot handle filenames longer than 8.3. If the CD-ROM filename is longer than 8.3, the filename will be truncated.
If this happens, two files that are not unique within 8.3 characters will map to the same filename. For example:

    filename1.txt will appear as filename.txt
    filename2.txt will also appear as filename.txt

Kanji filenames are also limited to 8.3 bytes, that is, 4.1 kanji characters. Only shift-kanji filenames are recognized at present. To get kanji, you must specify a supplementary volume descriptor indicating that you have kanji filenames. Contact Microsoft to find out how this is done.

- Record Formats

The Extensions do not support any record formats, so if the RECORD bit is set in the file flags byte in the directory entry for a file, it will be ignored.

- Interleaving

In the present version, the Extensions do not support interleaving. If the Interleave size and Interleave factor are non-zero, the Extensions will ignore these fields and return erroneous data for the file.

- Multi-Extent Files

Multi-extent files are not supported in the present version. Each extent of a multi-extent file will appear as a separate file with the same name.

- Multi-volume

Multi-volume disc sets are not supported in the present version. Directories that are located on another volume could cause the Extensions to crash if searched, and erroneous data will be returned for files that are located on another volume.

- Coded Character Sets

Only one coded character set or supplementary volume descriptor is recognized in the latest version. This is for shift-kanji.

- Version numbers

Version numbers are not supported by the Extensions. The Extensions strip the version string off the end of the filename, so two identical filenames with different versions will appear to have the same name. There is no way to ask specifically for any but the first instance of that filename. Two files with the same name and different version numbers have the same access problem as two files whose names are longer than 8.3 and have been truncated to the same filename.

- Protection

Protection bits are not used on MS-DOS.
If the protection bit is set in the file flags byte in the directory entry for a file, it is ignored and normal access is allowed.

- No XAR support

At present, the Extensions ignore the contents of any XAR record.

- Motorola format tables

The additional copies of the path table and any values in "Motorola" format (most significant bytes at the lowest addresses) are ignored at present. MSCDEX only pays attention to "Intel" formatted values. The Motorola values should still be included for portability's sake.

- Multiple copies of the path table

The Extensions presently read and use only the first copy of the path table. Later versions may check that the copies of the path table agree.

- Additional Volume Descriptor Records

Boot records and unspecified volume descriptors are ignored. The first standard volume descriptor found is the one that is used. Additional copies are ignored at present.

- File flags

The existence bit is treated the same as the hidden bit on MS-DOS. Some other operating systems may not handle the existence bit, so you may not want to use it if you are targeting those systems. The directory bit for High Sierra is treated the same as the directory bit in MS-DOS. Files with the protection bit set are not found when searched for or opened. None of the remaining bits (Associated/Record/Multi-extent/Reserved) are handled at present. Using files with these bits set will have undefined behavior.

- Unique Volume Identifiers

It is highly recommended that the volume identifier be unique. The Extensions use the volume identifier to do volume tracking and to double-check whether the disc has changed. The more likely it is that users will have two discs with the same volume identifier, the more likely it is that the Extensions will be confused into believing that the disc has not changed when in fact it has.

It is also highly recommended that application programs use the volume label to tell if the CD-ROM disc has changed.
The volume label for a CD-ROM on MS-DOS is obtained from the volume identifier field in the primary volume descriptor. The call to get the volume label is very inexpensive once the CD-ROM has been initialized and causes no disc I/O unless the media has changed. This is the best way for an application to tell if the disc it wants to work with is in the drive. The application software should not communicate with the driver or drive to determine if the media has changed; if it does, the Extensions may not learn that the disc has changed and will not reinitialize what they know about the new disc.

- Many Small Directories or A Few Large Directories

As a rule, it is better to have many small directories that each contain fewer files than one very large directory. The answer depends on your application's behavior, because if you try very hard you can thrash almost as badly with many small subdirectories as you can with one large subdirectory. Reading further will help explain.

What makes the difference? Suppose you have 100 subdirectories with 40 files each. For each file open you will, on average, read about one directory sector and scan half of it. On the other hand, you could have 1 directory with 4000 files. On average, each file open in this large directory (about 100 sectors) will involve scanning about 50 sectors to open that one file. As long as it is very inexpensive to get to each directory through the path table, clearly it is much better to have many small directories. Further improvements can be made by grouping files that are related and will be opened together in the same subdirectory, so that as you open each successive file the directory sector is very likely already in the disc cache. This helps minimize hitting the CD-ROM disc.
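
The comparison above can be worked through as simple arithmetic. The following is only a rough sketch in Python, not part of the Extensions: the function name is hypothetical, and the figure of 40 directory entries per 2K sector is an assumption taken from the example (4000 files in about 100 sectors). It assumes a linear scan that, on average, stops halfway through the directory.

```python
import math

# Assumption: about 40 directory entries fit in one 2K sector, as implied
# by the example of a 4000-file directory occupying about 100 sectors.
ENTRIES_PER_SECTOR = 40


def sectors_scanned_per_open(files_in_directory,
                             entries_per_sector=ENTRIES_PER_SECTOR):
    """Average number of directory sectors scanned to open one file,
    assuming a linear search that stops halfway through on average."""
    directory_sectors = math.ceil(files_in_directory / entries_per_sector)
    return directory_sectors / 2


small = sectors_scanned_per_open(40)     # one 40-file subdirectory
large = sectors_scanned_per_open(4000)   # one 4000-file directory
print(small, large)                      # 0.5 50.0
```

Under these assumptions the large directory costs roughly 100 times as many sector scans per open as the small one, before any benefit from caching is counted.
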
Putting each file in a separate subdirectory is extreme and will cost you, because you will never gain the benefit of locating the next file in a directory sector that has already been cached, and you will needlessly enlarge the path table. There is also a limit to how many subdirectories you may want, because if there are too many you may end up thrashing on the path table sectors. Each path table sector holds pointers to approximately 100 to 200 directory files, depending on the directory name lengths. If you have a path table that is 10 sectors long, you will want at least 10 sectors of memory buffers to hold it, or you risk re-reading sections of the path table on every file open, which is very costly.

The most important point is that you can vastly improve your file open speed by making sure you have enough memory buffers. If you are repeatedly scanning a 10-sector directory file (approximately 400 entries) and you have only 4 sectors in the sector cache, the cache will work against you because you will end up churning it to death. If you allocate 14 sectors, for example (/M:14), the whole directory file will find its way into the cache and you will stop hitting the disc. The difference in speed may be several orders of magnitude. A safe bet is to reserve as many sectors as are in the path table, plus the number of sectors in the largest directory, plus 2. The last two are reserved for data sectors and internal dynamic data. This formula becomes more complicated with multiple drives, because the buffers are shared rather than tied to specific drives and because not all drives are active at the same time.

Another rule: do not rely on the file system to do your searching for you. If you are performance conscious, finding a chunk of data by looking for it with a file name through the file system is expensive.
99% of the time, locating data through the file system is fine because the cost is a single one-time operation, but if this is repeated often enough, it may pay to do some of the work yourself. It can be better to lump everything into one big file and keep your own hierarchy, index, binary tree, or whatever search scheme you choose to get to the data you need, rather than asking the file system to tell you where it is.

MSCDEX - Microsoft MS-DOS CD-ROM Extensions Version 2.10
MS-DOSifying your CD-ROM - Copyright (C) Microsoft Corp. 1989. All rights reserved