CXT (TM) C EXPLORATION TOOLS

CFT (TM)      C FUNCTION TREE GENERATOR
CFT386 (TM)   C FUNCTION TREE GENERATOR
CST (TM)      C STRUCTURE TREE GENERATOR
CST386 (TM)   C STRUCTURE TREE GENERATOR
CFTN (TM)     C FUNCTION TREE NAVIGATOR
CSTN (TM)     C STRUCTURE TREE NAVIGATOR

Version 2.12, July 1993

Copyright (C) Juergen Mueller (J.M.) 1988-1993, Federal Republic of Germany (GER).
All rights reserved world-wide.

DISCLAIMER OF WARRANTY

THIS SOFTWARE AND THE ACCOMPANYING WRITTEN MATERIALS (INCLUDING INSTRUCTIONS FOR USE) ARE PROVIDED "AS IS" AND WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE RESULTS AND PERFORMANCE OF THE SOFTWARE IS WITH YOU.

IN NO EVENT WILL THE AUTHOR AND COPYRIGHT HOLDER BE LIABLE FOR DAMAGES, INCLUDING ANY LOST PROFITS, LOST MONIES, OR OTHER DIRECT, INDIRECT, GENERAL, SPECIAL, INCIDENTAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OR INABILITY TO USE THIS PROGRAM (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES, BUSINESS INTERRUPTION, LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS) AND ON ANY THEORY OF LIABILITY, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES, OR FOR ANY CLAIM BY ANY OTHER PARTY.

ACKNOWLEDGEMENT

BY USING THIS SOFTWARE YOU ACKNOWLEDGE THAT YOU HAVE READ THIS LIMITED WARRANTY AND ACCOMPANYING REMARKS, UNDERSTAND IT, AND AGREE TO BE BOUND BY ITS TERMS AND CONDITIONS. YOU ALSO AGREE THAT THIS IS THE COMPLETE AND EXCLUSIVE STATEMENT OF AGREEMENT BETWEEN THE PARTIES AND THAT IT SUPERSEDES ALL PROPOSALS OR PRIOR AGREEMENTS, ORAL OR WRITTEN, AND ANY OTHER COMMUNICATIONS BETWEEN THE PARTIES RELATING TO THE SUBJECT MATTER OF THE LIMITED WARRANTY.
You are expressly prohibited from selling this software or parts of it in any form, circulating it in any incomplete or modified form, distributing it with another product (except on CD-ROM) or removing this notice. No one may modify or patch any of the executable files in any way, including, but not limited to, decompiling, disassembling or otherwise reverse engineering this software in whole or in part.

This software and documentation is Copyright (C) by

    Juergen Mueller
    Aldingerstrasse 22
    D-70806 Kornwestheim
    Federal Republic of Germany (GER)

The documentation may be distributed verbatim, but changing it is not allowed. The information and specifications in this document are subject to change without notice.

THIS VERSION OF THE DOCUMENTATION, SOFTWARE AND COPYRIGHT SUPERSEDES ALL PREVIOUS VERSIONS.

LICENCE

This version of CFT and CST is NOT public domain or free software, but is distributed as SHAREWARE. Non-registered users of this software are granted a limited licence for a 30-day evaluation period, starting from the day of first use, to make an evaluation copy for trial use for the express purpose of determining whether this software is suitable for their needs. At the end of this trial period you should either register your copy or discontinue using CFT and CST.

The use of unregistered copies of this software, outside of the initial 30-day trial, by any person, business, corporation, government agency or any other entity is strictly prohibited. This means that if you use this software, then you should pay for your copy. This software is NOT free, but you have the opportunity to try it before you buy it. Either pay for it, or quit using it.

A registration entitles you to use your copy of this software on any and all computers available to you. If other people have access to this software or may use it, then additional copies or a site licence should be purchased.
All users are granted a limited licence to copy CFT and CST only for the trial use of others and subject to the above limitations. This licence does NOT include distribution, selling or copying of this software package in connection with any other product or service, or distribution in any incomplete or modified form. Operators of electronic bulletin board systems and software servers (like Internet FTP servers) are encouraged to post CFT and CST for downloading by their users, as long as the above conditions are met.

This package is expected to be distributed through shareware and freeware channels, but any fees paid for "distribution" costs are strictly exchanged between the distributor and the recipient, and the author makes no express or implied warranties about the quality or integrity of such indirectly acquired copies. Distributors and users may obtain the package directly from the author by following the ordering procedures in the REGISTER.DOC file.

REGISTRATION REMINDER

Unregistered copies of CXT programs like CFT or CST are 100% fully functional. I make them this way so that you can have a real look at them and then decide whether they fit your needs or not. This work depends on your honesty. If you use it, I expect you to pay for it. When you pay for the shareware you like, you are voting with your pocketbook, and you will encourage me and others to develop more products of this kind.
THANK YOU FOR SUPPORTING THE SHAREWARE CONCEPT

TABLE OF CONTENTS

 1  INTRODUCTION
 2  PROGRAM DESCRIPTION
 3  C-LANGUAGE IMPLEMENTATION AND C-PREPROCESSOR
 4  C++ SOURCE CODE
 5  ASSEMBLER SOURCE CODE
 6  DATABASE GENERATION
 7  PROGRAM LIMITATIONS
 8  IMPROVING EXECUTION SPEED
 9  COMMAND LINE SYNTAX DESCRIPTION
10  OUTPUT DESCRIPTION AND INTERPRETATION
11  INTEGRATION INTO PROGRAMMING ENVIRONMENTS
12  TOOLS FOR DATABASE PROCESSING
13  TROUBLE SHOOTING
14  REFERENCES
15  TRADEMARKS
APPENDIX 1: PRECOMPILER DEFINES
APPENDIX 2: RESERVED KEYWORDS
APPENDIX 3: EFFICIENCY
APPENDIX 4: SYSTEM REQUIREMENTS
APPENDIX 5: INSTALLATION

1 INTRODUCTION

CFT and CST, and their 32 bit protected mode versions CFT386 and CST386, are powerful program development, maintenance and documentation tools. They give the programmer the ability to analyse the C source code of applications, no matter how big or complex they are. CFT and CST are also very useful for exploring unknown source code and getting a complete overview of its internal structure. With CFT and CST, the re-engineering of old and/or undocumented source code becomes an easy task. These tools help the programmer to analyse, identify, locate and access all parts of a large software system. They are designed to support software reuse, maintenance and reliability.

By preprocessing, scanning and analysing the entire program source code as a single unit, these programs build an internal representation of the function call hierarchy (CFT) and of the data structure relations (CST). The resulting output shows, from a global perspective, the interdependencies and the hierarchical structure of the functions or data types of the whole multi-file software project. Several features and options allow the user to customize the generated hierarchy tree chart and to obtain a wide range of useful information about the source code.
The hierarchy structure is always up-to-date because it relies on the original source code as the primary source of information. Written software documentation often differs from what has actually been coded, so the source code itself is the ultimate documentation.

An important feature is the database generation. It allows information to be recalled without reprocessing the source code. The database can be read in again by CFT and CST to produce different outputs or to add new files to the database. Special recall programs called CFTN and CSTN allow fast searching for items in the database. These programs can be used in any environment, for example on the DOS command line or from inside editors like BRIEF, QEDIT or MicroEMACS (DOS and WINDOWS), to provide a full software project management system with access to all functions and data types at a keystroke. These features turn your editor into a comfortable "hypertext source code browser and locator". A project consisting of several files appears to the developer as if it were a single, whole piece of software.

A list of all functions/data types and source files can be written as formatted ASCII text files, which can be used as input for other programs like word processors or spreadsheets.

A useful option of CST is the generation of a source file that, when compiled, performs size and byte offset calculations for structures/unions and their members. This is especially useful for any kind of bug hunting or hardware debugging, for example with an ICE, or if data structures have to be exchanged between different hardware platforms.

CFT can also be used to analyse "C"-like languages as they are used by several commercial programs. The macro programming languages of the BRIEF, EPSILON and ME editors are such languages and can be handled by CFT.
CFT and CST have been used and tested since 1989 in several projects, with applications ranging from single source files through medium-sized projects (like CFT and CST themselves) to very large software projects with hundreds of source and include files (mixed C and assembler code), more than 6 MB of source code, more than 200000 lines, 2000 functions and 500 data types. Many publicly available sources (e.g. the GNU C compiler, the GNU C library, GNU EMACS, MicroEMACS, the NCSA TCP/IP communication software package, SUIT (The Simple User Interface Toolkit), NIHCL (The National Institutes of Health C++ class library), the F2C Fortran-to-C translator and several projects from Dr. Dobb's Journal like DFLAT, BOB and XSCHEME) were processed during development (with sometimes surprising results!) and have been used to test and improve the features, reliability, correctness, robustness and execution speed of CFT, CST and their related utilities.

2 PROGRAM DESCRIPTION

CFT builds a hierarchy tree chart of every function, showing the functions it calls in its own function block. These called functions are in turn used as starting points for subsequent function blocks. If the tree chart starts with the "main" function, it displays the complete function flow chart and the function hierarchy of the whole application, with all user-defined functions and the called library functions. Functions that are prototyped but never defined or never called are also detected. Recursive calls of functions are recognized and displayed, even over several call levels. Repeated calls of functions already displayed in the output tree chart are detected, and a message is given with a reference to their first appearance. This avoids printing complete subtrees that were already displayed earlier. Overloaded C++ functions and operators are recognized and displayed with the number of overloadings.

CST works similarly to CFT, but on data types like basic types, structures, unions, enumerations and C++ classes.
CST builds a hierarchy tree chart of every structure and union data type, showing its internal elements and their related data types. If these data types are themselves structures, unions or classes, the substructures are displayed as well. CST recognizes data types defined by 'typedef' and derived from other data types. The type names that correspond to the same basic type are displayed in the output file as 'alias' names of their common basic data type name. The features of CFT, like the detection of recursion (here: recursively declared structures and unions) and references to previously displayed items, are also available in CST and work in the same way.

Every function (CFT) and data type (CST) can be displayed with the name of the source file and the line number where it is defined. The output can be customized to display the tree chart as a call tree ("CALLER-CALLEE" relation: "WHO CALLS WHOM") or as a caller tree ("CALLEE-CALLER" relation: "WHO IS CALLED BY WHOM"). This allows the user to determine which functions are called by a specific function, or which functions call a specific function.

Functions and data types are extracted by scanning and parsing the source code. There is absolutely no need for the programmer to mark functions or data types of interest, for example with special keywords, by starting definitions at the beginning of a line or by using comments containing special marks, as is necessary with other source code analysers and browsers. CFT and CST need no such work-arounds; any source code can be processed without preparation. These tools are also compiler independent, because they can be customized to support any kind of compiler.
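The following sketch illustrates, in C, the kind of call tree output described above: recursion is detected by tracking the functions on the current call path, and a subtree that was already expanded is replaced by a reference to its first appearance. The call graph, the function names and the output format are hypothetical; they are not CFT's internal data structures or its actual output format.

```c
#include <stdio.h>

/* Hypothetical call graph of a tiny program:
   calls[i][j] != 0 means function i calls function j. */
enum { N = 5 };
static const char *names[N] = { "main", "parse", "expr", "term", "printf" };
static const int calls[N][N] = {
    /* main   */ { 0, 1, 1, 0, 1 },
    /* parse  */ { 0, 0, 1, 0, 1 },
    /* expr   */ { 0, 0, 0, 1, 0 },
    /* term   */ { 0, 0, 1, 0, 0 },   /* term -> expr: indirect recursion */
    /* printf */ { 0, 0, 0, 0, 0 },
};

static int on_path[N];     /* 1 while the function is on the current call path */
static int first_line[N];  /* output line where its subtree was expanded, 0 = not yet */
static int line_no;        /* number of output lines printed so far */

/* Print the call tree below function f, indented by depth. */
static void print_tree(int f, int depth)
{
    int has_callees = 0;
    for (int j = 0; j < N; j++)
        has_callees |= calls[f][j];

    printf("%3d  %*s%s", ++line_no, 2 * depth, "", names[f]);
    if (on_path[f]) {                    /* f is already on the path: recursion */
        printf("  (recursive)\n");
        return;
    }
    if (has_callees && first_line[f]) {  /* subtree shown before: reference it */
        printf("  -> see line %d\n", first_line[f]);
        return;
    }
    printf("\n");
    if (!has_callees)
        return;
    first_line[f] = line_no;
    on_path[f] = 1;
    for (int j = 0; j < N; j++)
        if (calls[f][j])
            print_tree(j, depth + 1);
    on_path[f] = 0;
}
```

Calling print_tree(0, 0) from a 'main' function prints:

```
  1  main
  2    parse
  3      expr
  4        term
  5          expr  (recursive)
  6      printf
  7    expr  -> see line 3
  8    printf
```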
CFT and CST can generate a variety of useful information and software metrics about the processed source code and the included files, such as:

- file size and comment size in bytes for every file,
- number of source code lines for every file,
- number of included files for every source file,
- total effective number of scanned bytes and lines for every source file and its included files,
- for every defined function: the number of lines, the code and comment size in bytes, the number of bytes per line, the number of functions called, the number of flow control statements (if, else, for, while, case, default, goto, return, exit), the maximum brace nesting level, and whether the function is used only inside its file,
- for every defined structure/union: the total number of elements and the number of elements which are themselves structures/unions,
- function or data type reference list for every file,
- total number of displayed, defined, undefined or multiply defined functions and data types,
- location of all multiply defined functions and data types,
- location of all overloaded C++ functions,
- source file/include file dependencies for every source file,
- final statistical summary for all files,
- cross reference of every occurrence of every function or data type,
- parent/children relationship for every function and data type,
- critical function call path/structure nesting with the deepest non-recursive nesting level (unlimited tree depth),
- C++ class inheritance graph,

and much more ...

The resulting hierarchy structure chart is another representation of a directed call graph. A directed call graph consists of nodes (functions or data types) and connections (call relations) between these nodes. The number of nodes and connections needed to transform the hierarchy structure chart into a directed call graph is also calculated, as an additional measure of system complexity.
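As a rough illustration of this node and connection count, the sketch below counts the distinct functions (nodes) and the distinct caller/callee pairs (connections) in a list of call pairs. It assumes the tree chart has already been reduced to such pairs with functions numbered from 0; the type 'struct call', the limit MAX_FUNCS and the function graph_size() are hypothetical, and this is not necessarily how CFT counts them.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FUNCS 256   /* hypothetical limit: function numbers must be < MAX_FUNCS */

/* One call relation: function 'caller' calls function 'callee'. */
struct call { int caller, callee; };

/* Count the nodes and distinct connections of the directed call graph
   described by n call pairs. A caller that calls the same callee several
   times (repeated entries in the tree) contributes one connection only. */
static void graph_size(const struct call *c, size_t n, int *nodes, int *edges)
{
    bool seen[MAX_FUNCS] = { false };

    *nodes = 0;
    *edges = 0;
    for (size_t i = 0; i < n; i++) {
        /* count each function once, whether it appears as caller or callee */
        int ends[2] = { c[i].caller, c[i].callee };
        for (int k = 0; k < 2; k++)
            if (!seen[ends[k]]) {
                seen[ends[k]] = true;
                ++*nodes;
            }
        /* count a connection only at its first occurrence in the list */
        bool dup = false;
        for (size_t j = 0; j < i && !dup; j++)
            dup = c[j].caller == c[i].caller && c[j].callee == c[i].callee;
        if (!dup)
            ++*edges;
    }
}
```

For the pairs {0,1}, {0,2}, {1,2}, {0,1}, {2,2} this yields 3 nodes and 4 connections, because the second {0,1} is a repeated call.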
A large number of options to control program execution and output generation are available. They can be given on the command line, in command files or in an environment variable read by the program. CFT and CST can be invoked directly from inside editors or integrated development environments like the Borland C++ IDE. Detailed integration examples, together with the necessary macro or batch files, are given.

CFT and CST are command line driven programs; no interactive menu driven versions are available. For this kind of application, command line versions are, in my opinion, the best choice, for several reasons. SAA-like versions would add too much overhead and use memory that the programs can put to better use. Another reason is that CFT and CST have also been designed to run in batch mode, e.g. from within editors or MAKE files, to automate the analysis.

3 C-LANGUAGE IMPLEMENTATION AND C-PREPROCESSOR

The current ANSI C language standard X3.159-1989 (ANSI C), or its equivalent ISO/IEC 9899:1990 (E), as described in several books about the C language (see references), was used as the development base. The recognized reserved keywords include not only the original ANSI C keywords but also keywords taken from several compiler implementations like Microsoft, Borland or GNU and their own special language extensions. The books "The C++ Programming Language" and "The Annotated C++ Reference Manual" (ARM), together with information about the work of the ANSI C++ committee X3J16 and the ISO/IEC working group SC22 WG21, were used for the C++ keywords. Another major source was AT&T C++ release 2.1. Compiler specific extensions, especially from GNU, are also recognized. Proposed extensions to C++ like additional keywords (e.g. wchar_t) and the so-called 'digraphs' will be supported if they are introduced into the C++ language standard. A complete list of all reserved keywords is shown in appendix 2.
The large set of keywords may lead to slight problems in situations where a keyword is not used as a keyword but as an identifier name, for example a C++ keyword used as an identifier in C (such as a C variable named 'new' or 'class').

During a normal file scan, precompiler defines are, if possible, handled as if a real preprocessor were present, but this can cause trouble with '#if', '#ifdef' and other precompiler controls, which are not evaluated. For the same reason, the block nesting level tracked by the source code scanner may not return to 0 at the end of a file (for example, if both branches of an '#if'/'#else' block contain an opening brace that is closed by a single common closing brace). To avoid such problems, a built-in C preprocessor allows complete preprocessing of the source code and include files for several compiler types as an additional option (-P).

Whether or not to preprocess is somewhat controversial. Preprocessing can result in a loss of information if macros are used to change the program behaviour and hide function calls, and the function and data type information obtained may then not exactly correspond to the visible source code; without preprocessing, on the other hand, unevaluated precompiler controls can lead to errors during file scanning. Preprocessing has advantages and disadvantages, so the user has to decide whether to use it.

The preprocessor handles the predefined macros of Microsoft C 5.1, Microsoft C/C++ 7.0, Turbo C++ 1.0, Borland C++ 2.0, Borland C++ 3.1, GNU-C and the Intel 80960 C compiler iC960 3.0, including all memory models (not applicable to GNU-C and i960) and the CPU architectures of the Intel 80960 32 bit RISC processor (KA, KB, SA, SB, MC, CA). Other compiler types can be customized with the -B and -D options.

The default ANSI C predefined macros '__FILE__', '__LINE__', '__DATE__' and '__TIME__' are generated for preprocessing. The macro '__STDC__' is NOT defined (some compilers test it with '#ifndef __STDC__'), so that non-ANSI C extensions in the processed code are accepted. Defining '-D__STDC__=1' forces ANSI C conforming output (if the scanned source code makes use of it, of course!).
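The fragment below shows the kind of conditional code affected by '__STDC__'. A standard C compiler defines '__STDC__' and compiles the prototype-style definition; the CFT/CST preprocessor (option -P), which leaves '__STDC__' undefined unless '-D__STDC__=1' is given, would instead see the old-style (K&R) definition. The function add() is only a hypothetical example.

```c
/* Portable pre-ANSI/ANSI code: which definition is seen depends on
   whether the preprocessor defines __STDC__. */
#ifdef __STDC__
static int add(int a, int b)   /* ANSI C: prototype-style definition */
{
    return a + b;
}
#else
static int add(a, b)           /* K&R C: old-style parameter declarations */
int a, b;
{
    return a + b;
}
#endif
```

Both branches define the same function, so the call tree is unaffected here, but in real code the two branches may call different functions.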
A list of the precompiler defines for the supported compiler types is shown in appendix 1. The preprocessor also replaces trigraphs and recognizes C++ comments ('//...').

The precompiler recognizes several errors and possible sources of problems and displays diagnostic messages with an exact description of the error or warning and its location in the source file. These include:

- the use of undefined variables in precompiler controls,
- misbalanced '#if...' control block(s), including the exact location (file, line) where the failing block started,
- recursively included files,
- wrong number of macro arguments (missing ones or too many).

4 C++ SOURCE CODE

Although CFT and CST were not originally developed to process C++ code, they can do so. In that case, however, some restrictions and limitations should be considered.

The recognition of C++ classes by CST is limited, because handling the internal class structure is too complex to fit into the CST program. Classes are therefore only referenced by name; their internal structure is not scanned and displayed. The C++ class inheritance relationships are recognized and shown in a class hierarchy graph listing (option -b). C++ structures that contain functions as members will not be processed correctly.

The use of overloaded functions (with equal names but different parameters) in C++ programs may lead to incorrect calling relationships. Correct handling of this feature would require a complete C++ source code analyser to keep track of the different calling parameters.

If more precise information about C++ code is needed, utilities like 'class hierarchy browsers' or 'class viewers', which usually are (or should be) part of C++ compiler environments, should be used instead. For the reasons described above, some care should be taken when C++ code is processed and displayed.
5 ASSEMBLER SOURCE CODE

As an additional feature, CFT can process assembler source code for the Intel 80x86 processors (MASM 5.1, TASM) and for the Intel 80960 RISC processors (or any other "AT&T UNIX-like assembler" like GNU) to get information about assembler procedures and the functions called from the assembler source files. The assembler source code scanner also detects and handles include file directives. This feature is useful for mixed language programming. The processing of assembler macros, however, is not supported; the preprocessing option (-P) works only with C source code.

Assembler source files are recognized by their file extensions '.ASM' and '.S'; there is no other way to force a file to be processed as an assembler file. The following naming convention is used: for '.ASM' assembler files (MASM, TASM), all identifiers are treated case-insensitively and are converted to lower case, but identifiers in '.S' assembler files (GNU, I960) are treated case-sensitively. This means that an assembler function 'func1' defined in an '.ASM' file can be called from C source by 'func1', 'FUNC1', 'Func1' or any other combination of lower and upper case characters. If 'func1' is defined in an '.S' file, the name must match exactly. The first leading underscore of a function name is removed to get exact name matches (so '_func1' matches 'func1'). Type modifiers in C source code like 'cdecl' or 'pascal' are not considered. Keep these conventions in mind when processing C and assembler files.

Assembler code statements (inline code) inside C source code are not processed but skipped, because it is too difficult to handle the various kinds of syntax used for this, like 'asm ...', 'asm "..."' or 'asm(...)', and the different keywords ('asm', '_asm', '__asm', '__asm__', ...) used by various compiler implementations.
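The naming convention above can be sketched as a small matching function. This is an illustration under assumptions, not CFT's code: the helper names strip_underscore() and names_match() are hypothetical, and the sketch assumes that one leading underscore is removed from both the C name and the assembler name before they are compared.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: drop one leading underscore, if present. */
static const char *strip_underscore(const char *name)
{
    return name[0] == '_' ? name + 1 : name;
}

/* Does the name used in a C call match an assembler definition?
   asm_is_masm != 0: definition comes from an '.ASM' file (case-insensitive),
   asm_is_masm == 0: definition comes from an '.S' file (case-sensitive). */
static int names_match(const char *c_name, const char *asm_name, int asm_is_masm)
{
    const char *a = strip_underscore(c_name);
    const char *b = strip_underscore(asm_name);

    if (!asm_is_masm)
        return strcmp(a, b) == 0;
    for (; *a && *b; a++, b++)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
    return *a == *b;   /* both names must end at the same position */
}
```

With this sketch, 'FUNC1' in C matches '_func1' from an '.ASM' file, but does not match 'func1' from an '.S' file.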
6 DATABASE GENERATION

One of the most important features provided by CFT and CST is database generation, which can be enabled with the -G option. It is performed after the output file has been written and saves all information about the processed files in a set of dBASE compatible database files (extension '.DBF') for later use. These database files contain all necessary information, like function or data type names, the locations where they are defined, their caller/callee relationships, all scanned files with statistical information, include files and so on. The information is stored in a database structure designed to be as compact and efficient as possible, to save disk space. Note that if the contents of the database files are manipulated by external tools like dBASE or others, their internal consistency will be corrupted and wrong or unexpected results will occur!

The database can be used to recall information, for example to find out whether, and in which file and on which line, a specific function or data type is defined. A previously generated database can be read into CFT and CST (option -g) to add new files to it and/or to produce another output file with new configuration options, for example a reverse call tree or a tree showing only one selected item of interest.

Such incremental database generation is also useful if large projects can be divided into a set of commonly used files and project specific files. A good example of this is the GNU C compiler, which consists of a set of language independent files and three language dependent file sets for C, C++ and Objective-C. To analyse this software with CFT or CST, the language independent part can be stored in a database, which is later reused for the language dependent parts to build the complete set of information.

The ability to retrieve information about the sources from the database is quite useful in many cases.
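Because the database files are dBASE compatible, their fixed header can be inspected read-only with a few lines of C, for example to check how many records a table holds. The sketch below assumes the common dBASE III header layout (record count as a little-endian 32 bit value at byte offset 4, header length and record length as little-endian 16 bit values at offsets 8 and 10); the names dbf_info and dbf_parse_header() are hypothetical, and the table and field layout used by CFT and CST is not covered. In line with the warning above, nothing is written back to the files.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed header fields of a dBASE III style .DBF file. */
struct dbf_info {
    uint32_t records;     /* number of data records                 */
    uint16_t header_len;  /* bytes before the first data record     */
    uint16_t record_len;  /* bytes per record, incl. deletion flag  */
};

/* Parse the fixed 32-byte header part from buf (e.g. read with fread).
   Returns 0 on success, -1 if the buffer is too short. */
static int dbf_parse_header(const unsigned char *buf, size_t len,
                            struct dbf_info *out)
{
    if (len < 32)
        return -1;
    out->records = (uint32_t)buf[4] | (uint32_t)buf[5] << 8
                 | (uint32_t)buf[6] << 16 | (uint32_t)buf[7] << 24;
    out->header_len = (uint16_t)(buf[8] | buf[9] << 8);
    out->record_len = (uint16_t)(buf[10] | buf[11] << 8);
    return 0;
}
```

The values are assembled byte by byte, so the sketch gives the same result on little-endian and big-endian hosts.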
Recalling information from a database is much faster than processing all the sources again to find a specific item of interest. The documentation and maintenance of large software projects is much easier and more effective if the developer has a tool to navigate through the source code that helps in understanding the program and its internal structure. Such a tool is also useful for reverse engineering source code to get an overview of the internal program structure. With user programmable editors, database recall functions can be integrated into the editor to offer the user a source code browser with a hypertext-like feel. Two utility programs, called CFTN and CSTN, retrieve information from the databases. They are available with supporting macros for integration into the BRIEF, QEDIT and MicroEMACS editors and are described in a later section of this manual.

7 PROGRAM LIMITATIONS

First of all, CFT and CST cannot replace a compiler or a syntax checker like 'LINT' for detecting errors in the source code. The source code should therefore compile without fatal errors before it is analysed with CFT and CST; otherwise the processing results may be incorrect. However, there are some situations where CFT and CST can help to detect bugs and inconsistencies in the source code, like

- multiple definitions of functions or data types,
- different function return types,
- implicitly declared functions with no prototype,
- function definitions used as prototypes,
- recursive, nested, hidden and frequent inclusion of include files,
- unclosed strings or character constants,
- nested comments,
- misbalanced braces,
- unexpected end-of-file characters inside files,
- illegal characters in the source code,
- wrong number of macro arguments,
- missing macro arguments,
- misbalanced '#if...' control blocks.
These code checks are done across multiple files, so inconsistencies between different files can be found and displayed. Conventional compilers, which work on only one file at a time, cannot provide this capability and will therefore miss such inconsistencies (the linker may find some of them).

Some statistical information about the source code may not be correct if preprocessing is enabled (-P). This affects all options that produce statistics, like the -p or -s option. The size of the 'pure' source code may not be correct because of macro expansion or the removal of unnecessary blanks. The file size, however, is always correct because it is taken from the source file itself.

Most program limitations are caused by the limited amount of available memory: the more conventional main memory you have, the better. The real mode versions of CFT and CST do not use expanded or extended memory, virtual memory management or disk file swapping, so keep your conventional memory free of memory-consuming TSR programs and other utilities if you want to process a large number of files. Using operating systems like MS-DOS 5.0 or DR-DOS 6.0 and memory managers like QEMM or 386MAX to get more free conventional memory may help to handle big applications with a large number of files. If memory problems still occur during processing, there is an easy way to break the memory limits: use the 32 bit protected mode versions of CFT and CST, called CFT386 and CST386. These programs run in protected mode, so they are not restricted to conventional memory, and they are faster than the real mode versions.

The number and size of the files to be processed is practically unlimited: up to 2^14 (16384) files, with a maximum file length of 2^31 bytes (2 GB). Each file can have up to 2^16 (65536) lines. The number of functions and data types that can be handled is limited to 2^14 (16384). Note that these values are given for the real mode versions; the protected mode versions exceed them.
These limits should be sufficient even for the biggest projects. The nesting of include files is limited by the number of files that can be open simultaneously (which depends on the operating system and the compiler). The ANSI C minimum of 8 nesting levels for include files is met by CFT and CST.

The integrated C preprocessor and the scanner have the following limits (the ANSI C minimum requirement is given for comparison):

  Item                                     CFT/CST         ANSI C minimum
  size of an expanded macro                6 Kbytes        -
  macros defined simultaneously            unlimited (1)   1024
  parameters per macro                     31              31
  significant characters in a name         31              31
  nesting levels of '#if...' blocks        32              8
  line length                              unlimited       509 (logical line)
  characters in a string (incl. '\0')      2048            509
  members in one structure/union           unlimited       127
  structure/union nesting levels           unlimited       15

  (1) limited only by the available memory

The calculation depth of the critical function call path or structure nesting level is unlimited. The calculation is a heavily recursive function and was tested with more than 110 nesting levels (protected mode version).

CFT cannot recognize and reference a function if it is used by its bare name, without parentheses. This happens if a function name is assigned to a function pointer variable or used as a function pointer argument in a function call. In some rare cases, CFT will be confused by extensive type-casting operations like 'void __based(void) * __cdecl ... ()' and will display unexpected messages. A function prototype declared inside a function block (block scope) will not be recognized by CFT. In assembler source code, some definitions of local variables look like function or label definitions and are treated as such by CFT, although this may be wrong in some cases.
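The function pointer limitation mentioned above can be illustrated with a short, hypothetical example. In demo(), 'twice' and 'cmp_int' are named without parentheses, so following the description above CFT would not record them as called by demo(); only the calls of 'qsort' and, presumably, 'op' appear as calls in the source text.

```c
#include <stdlib.h>

/* Comparison function for qsort: ascending order of ints. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

static int twice(int v) { return 2 * v; }

/* Sorts v and returns twice its smallest element. */
static int demo(int *v, size_t n)
{
    int (*op)(int) = twice;            /* function name assigned to a pointer   */
    qsort(v, n, sizeof *v, cmp_int);   /* function name passed as an argument   */
    return op(v[0]);                   /* the actual call goes through 'op'     */
}
```

At run time 'twice' and 'cmp_int' are of course called, so a call tree produced from such code can be incomplete.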
It is also not always possible to detect a call of a local label correctly.

CFT sometimes displays 'return type mismatch' warnings although the code is correct in that particular case, because the different type names were defined as equivalent by an earlier 'typedef' declaration. The reason is simply that CFT doesn't recognize these 'typedef's (but CST does!); it looks only for function names.

A frequently requested feature for CST is the calculation of structure/union sizes with byte offset information for every structure/union member. This feature is not implemented in CST, although it would be possible because all the necessary information is present. The reason is that it would mean too much overhead for CST to handle the various compiler implementations with their different basic type sizes (sizeof(int), sizeof(long double)) for different processor types (16 bit, 32 bit, 64 bit, ...) and data type alignment requirements (by default and also controlled with #pragma's like 'align' or 'pack'). It would be possible to do this for one selected compiler implementation or processor type, but not for a large number of them. Compilers for advanced architectures like RISC processors in particular have very complicated type alignment rules, depending on the data types, alignment pragmas, compiler switches, type sizes, number and size of available registers and the resulting structure/union/class sizes, in order to generate highly optimized code. This usually includes the insertion of 'fill' bytes inside a structure/union and sometimes 'padding' bytes at the end of a structure/union to force aligned sizes on specific byte boundaries (for examples see the reference manual of the Intel 80960 C compiler iC960, release 3.0). For these reasons, an integrated 'byte offset calculation' is not implemented in CST.
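The compiler dependence described above can be observed with 'sizeof' and the standard 'offsetof' macro from <stddef.h>, which is the kind of calculation a compiled size/offset source file can perform. The structure and the function sample_slack() below are hypothetical examples; the layout is chosen by the compiler, not by the C standard.

```c
#include <stddef.h>

/* Hypothetical structure whose layout depends on compiler and target. */
struct sample {
    char   tag;    /* 1 byte, usually followed by fill bytes        */
    double value;  /* aligned on a 4- or 8-byte boundary, typically */
    short  count;  /* may be followed by padding bytes at the end   */
};

/* Number of fill and padding bytes the compiler inserted. */
static size_t sample_slack(void)
{
    return sizeof(struct sample)
         - (sizeof(char) + sizeof(double) + sizeof(short));
}
```

With GCC on x86-64 Linux, 'struct sample' typically occupies 24 bytes (13 fill and padding bytes), while 'gcc -m32' on i386 Linux typically gives 16 bytes (5 fill and padding bytes), because 'double' is aligned on a 4-byte boundary inside structures there.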
Instead, option -O generates a source file for selected data types that performs these calculations when you compile it with your C compiler. For further information see the description of option -O.

SUMMARY

In some situations, the limitations described above can lead to misinterpretation or loss of information about the scanned source code. The only way to avoid these shortcomings would be to include parts of a 'real compiler' to handle the complete C (and C++) syntax in every possible situation. But this was not the intention when the development of these programs as 'little', easy-to-use programming support tools began. Nevertheless, I hope that CFT and CST will in most cases prove to be powerful development and documentation tools!

8 IMPROVING EXECUTION SPEED

CFT and CST are disk storage based programs, because the source and include files, the intermediate precompiler file and the output file must be read from and written to the hard disk. This means that the execution speed of CFT and CST depends first of all on the speed of the physical storage medium and not (only) on the speed of the CPU. There are several ways to improve program performance:

- install a RAM-disk and
  a) start CFT and CST from there, so that the intermediate file and the resulting output file are stored there (but don't forget to copy the output file to the hard disk before power-off), or
  b) use the -v option to redirect only the precompiler output file (scanner input file) to the RAM-disk, wherever the program is started from (the RAM-disk must be large enough to hold the largest possible temporary file, otherwise a disk-write error will occur),
- use a hard disk cache program like SmartDrive, HyperDisk or PC-Cache,
- use a faster hard disk,
- and finally, of course, use a faster and more powerful CPU.

The most effective combination is option -v with a RAM-disk as destination path, plus hard disk caching and a fast hard disk drive.
If the disk cache is large enough to hold most of the frequently used include files, execution is about 2.5 to 3 times faster than without it. This is a significant speed-up, especially for projects with a large number of files and many included files in each source file.

During program execution with preprocessing (option -P), most of the time is spent preprocessing the given input files and their include files and generating the preprocessor output file. Scanning for functions (CFT) or data types (CST) takes only a small amount of time. The function/data type relations are computed while the output is generated and written to disk; no precomputation is necessary. The time needed to detect the critical call path/maximum nesting level depends only on the number of functions or structures, not on the complexity of the call/declaration nesting. It grows linearly with the number of items (functions/structures) to process and is very fast!

Be aware that processing a large number of files can take quite a long time (from several minutes up to hours on slower machines!), especially if preprocessing (option -P) is enabled. Generating the output file and writing it to disk can also take some time if the number of items to display is large and the nesting structure is complex, or if no cross reference option is enabled (see -x and -r for further information). If the number of items is very large, one of the most time-consuming options is the function/data type file reference (option -z). Writing and reading the database files (options -G and -g) also takes some time because of the large amount of information involved. Don't panic if there seems to be no disk access for a long time: there may simply be time-consuming computations going on, and the output is buffered internally to reduce the number of disk accesses and thus speed up the output!
For more detailed information about program efficiency see appendix 3.

9 COMMAND LINE SYNTAX DESCRIPTION

CFT and CST are command-line driven programs. This section gives a complete overview of all command line options and their syntax, together with remarks on their use and several examples with detailed descriptions. There are no differences between the real mode and the protected mode versions of these programs. All users should read this section very carefully to get a complete overview of the features provided by CFT and CST.

SYNTAX:

CFT [options [$cmdfile]] <[+]file> <@filelist>
CST [options [$cmdfile]] <[+]file> <@filelist>

OPTIONS: (valid for)

-R (CFT, CST)
By default, CFT and CST generate the hierarchy tree chart of the called functions/data types ("CALLER:CALLEE relation", "WHO CALLS WHOM"). The -R option produces an inverted listing showing the callers/users of each function/data type: the function/data type hierarchy tree chart is generated in reverse order, as a list of the items calling the referenced basic item ("CALLEE:CALLER relation", "WHO IS CALLED BY WHOM"). This option is useful to find the relations between functions/data types and their callers/users.

-x (CFT, CST)
Cross reference in case of multiple use. Every function and data type is given a unique reference number, which is then used to identify the function or data type whenever it is displayed again.

-r (CFT, CST)
Almost the same as option -x, but an additional file reference with the file name and the line number of the declaration is given (includes -x).
Option -r or -x is strongly recommended and should be used by default, because without it every function/data type is completely redisplayed, including the underlying subtree of functions or data types, whenever it occurs in the output tree chart, so the resulting output file can grow enormously, up to several megabytes, provided there is enough disk space to write it.

-m[name] (CFT)
Start the function tree chart dump with function 'main' (-m) or 'name' (-mname); name is case sensitive. If -m+ is specified, the output starts with the topmost function, i.e. the function in the highest level of the hierarchy tree chart. If this option is not set, the functions are displayed in lexicographical order (default). Usually the complete function tree chart should start with the 'main' function, so that every subfunction is a (sub-)member of 'main'. This option is useful for Windows programs, to start the output with the initial 'WinMain' function (-mWinMain) instead of 'main'. It can also be used to start the output with the assembler start-up code that is executed before the 'main' function is called.

-mtype (CST)
Start the data type tree chart with data type 'type' (-mtype). If -m+ is specified, the output starts with the topmost data type, i.e. the data type in the highest level of the hierarchy tree chart. By default the data types are displayed in lexicographical order. This is useful if a selected structure/union should be displayed at the beginning of the output file.

-a (CFT, CST)
List every function/data type, including previously referenced functions/data types. This generates a complete list of every function/data type in lexicographical order, with references to their first location.

-f (CFT, CST)
Generate the output list in short form, with only the function/data type names and no further description of the internal function/data type elements.
-iname (CFT)
Ignore function member 'name' in the output tree chart: it is not displayed and is skipped if found as a function member. This option can be useful if, for example, functions are used only for test purposes, are of no further interest and should be left out of the output tree chart.

-u (CFT)
List undefined functions. These functions are probably library functions, functions defined in other files which have not been scanned, or unresolved externals that the linker would report.

-V (CFT)
List prototyped functions which are neither called nor defined (options -a and -u). This option is useful to find unused function prototypes which could be removed from the source code.

-l (CFT)
List a function only once in case of repeated consecutive calls. If a function is called more than once inside a function without any other call in between, only one reference to that call appears in the output tree chart. This option results in shorter output files.

-n[a] (CFT, CST)
Display the most critical function call path (CFT) or the data structure/union with the maximum nesting level (CST). The modifier 'a' displays every function/structure with its callers/users. This option helps to determine the complexity of the function call/data structure hierarchy and finds recursions across several call/nesting levels. Note that for functions the maximum call path displayed is the result of static source code analysis. During program execution the call path can be even deeper if functions are called indirectly through function pointers.

-c[s] (CFT, CST)
Display the number of calls to each function/data type; 's' sorts by the number of calls (default order: lexicographical). Useful to find out which functions/data types are never called/used (possibly unnecessary and deletable) and which are called/used most frequently (together with profiler results, candidates for further optimization).
-Z[s] (CFT, CST)
Display every caller and member of each function/data type; 's' sorts by the number of calls (default order: lexicographical). This is an extension of the -c option. The relations are shown in the following form:

List of parent functions/data types:
1. caller (reference #) <# of calls from>
...
n. caller
...
function/data type (reference #) <# of calls from parents, # of calls to children>
List of child functions/data types:
1. called member (reference #) <# of calls to>
...
m. called member
...

This compact form lists all callers and members with the number of their calls; recursions are detected and displayed.

-z (CFT, CST)
Generate a function/data type call cross reference table. For every function/data type, the location of its definition (file, line) and a complete list of its calls/references, sorted by file and line number, is given in the following form:

1. function/data type (reference #) [file #], line #
   [file #]: line #, ...
   ...
2. ...
...

The functions/data types are displayed in lexicographical order. The cross reference file list follows at the end of the section.

-b (CST)
Display the C++ class inheritance relationships. This option generates two listings. The first displays the complete C++ class hierarchy graph(s). The second shows, for each class, first the superclasses from which the class inherits together with the access restrictions (public, protected, virtual, ...), and second the subclasses which inherit from the given class, also with access restrictions. This option is useful to find out things like class dependencies or multiple inheritance.

-C[s] (CFT, CST)
List the functions/data types contained in every processed file; 's' sorts by line number (default order: lexicographical). Additional information is available with option -s. CFT reports if none of the functions defined in a file is called from functions defined in other files (internal versus external linkage).
Functions with no caller outside their file are marked [INTERNAL]; such functions are candidates for being declared 'static'. Attention: calls through a function pointer are not noticed! This information is also useful to find out whether the contents of a file are unnecessary for the project, so that the file need not be linked. This option gives useful source code metrics for every defined function.

-y (CFT, CST)
Display a cross link list of the files containing functions/data types that reference, or are referenced by, the functions/data types of each specific file. The relations are shown in the following form:

1. referencing file
...
n. referencing file
file
1. referenced file
...
m. referenced file

This option is useful to find out the relationships between files. This information can be used to isolate specific files from a project, e.g. library files. It is also useful if you want to separate out a function and need to know which other files are required because they contain called functions.

-S[..] (CFT, CST)
Specify a name (-Sname) or a file with names (-S@namelist) of functions/data types to search for and to dump if present; names are case sensitive. These items are listed first in the output tree chart file. When using -S on the command line, a data type name that consists of two words must be enclosed in double quotation marks, like "struct _iobuf", to connect the two words. This is not necessary inside a list file, but there every search name must be on a separate line.

-D[..] (CFT, CST)
Specifies macro name(s) (-Dname or -Dname1=name2) or a file with macro names (-D@namelist) of functions/data types which should be predefined and linked together; these are also used as preprocessor defines if the integrated preprocessor is called (-P). The defined names are case sensitive and trigraph translation is performed on them.
A string used as replacement for a macro name must be written differently on the command line and inside a macro definition file or command file (marked with '$'). On the command line, the double quotation marks must be 'escaped' and the whole string quoted, like '-DXYZ="\"123\""' (similar to C strings), to work correctly; the reason is the DOS wildcard expansion of the command line. Inside a macro definition or command file, the double quotation marks need not be 'escaped', so the definition can be written as '-DXYZ="123"'. This option cannot be used in environment variable definitions if the equal sign '=' is used, because DOS reports a syntax error when a 'SET ...=...' command contains a second equal sign on one line. If a define item consists of two words, see the notes for option -S. Keep these differences and exceptions in mind to avoid unexpected results when using the -D option.

-U[..] (CFT, CST)
Specifies a predefined macro name (-Uname) or a file with predefined macro names (-U@namelist) to be undefined for preprocessing. Note that the default predefined macro names '__FILE__', '__LINE__', '__DATE__' and '__TIME__' cannot be undefined. All other predefined names for the various compiler types can be undefined. As with -D, the names are case sensitive, but trigraph translation is not performed because the internal representation cannot contain trigraphs.

-O[..] (CST)
Specifies name(s) (-Oname) or a file with names (-O@namelist) of data types for which the structure/union sizes and the byte offsets of every data type member should be calculated. Additionally specifying -O+ enables the recursive collection of sub-structures during expansion, so that they are included without being named with -O. This means that if a structure/union has members which are themselves structures or unions, and so on, it is not necessary to specify all these data type names with -O to enable byte offset calculation for them.
Instead, you only need to specify the topmost data type with -O, plus -O+ to make CST select all related sub-types. If -O+ is set but no names are specified, all structures and unions are used for byte offset calculations. As the result of this option, CST generates a C source file called 'CST_OFFS.C'. This file needs some additional editing to declare the necessary include files, data types, defines or pragmas before it can be compiled with the C compiler for which it was generated (be sure to use the same include files!). The resulting executable prints, for every structure/union member, the byte offset relative to the beginning of the structure/union (decimal and hexadecimal) and the size of the member, as well as the resulting structure/union size and whether a member has been aligned (= compiler dependent insertion of fill bytes before that member) or the structure/union was padded with fill bytes at its end to align its size to a specific length. To obtain this information and perform the necessary calculations, 'CST_OFFS.C' makes extensive use of C macro programming, which in some rare cases may cause the compilation to fail because of internal limitations of some C compilers. The -O option is very useful if you need detailed information about structures/unions for error searching and debugging, especially for hardware debugging with an ICE. It is also useful for finding differences in the internal layout of a structure/union when porting C source code between different compilers and/or operating systems, or when data structures are exchanged between different hardware platforms, for example in data communication. You can verify whether the target compiler really produces the expected structure/union layout and size.
-I[path] (CFT, CST)
This option enables the scanning of include files declared with '#include "..."' or '#include <...>'. The path for the include files is taken from the INCLUDE environment variable (default) or can be user defined by 'path'. Paths defined with -I are searched before any paths taken from environment variables specified by -E or -P, so this option should be used with care. Using -I or -E without -P allows the source file and the included files to be scanned without preprocessing. In that case, preprocessor controls like '#if ...' are not evaluated, which can lead to unexpected results.

-Ename (CFT, CST)
Almost the same as -I, but the path for the include files is taken from the environment variable 'name'. Typing -EINCLUDE produces the same results as -I alone.

-P[name] (CFT, CST)
Run the integrated C preprocessor before the file scan. In this case the default include path is taken from the INCLUDE environment variable and from the user defined environment variable 'name', and the additional paths from the -I and -E options are used. If special paths should be searched before the default paths, they must be specified with -I or -E, and these options must be placed on the command line before -P so that they are processed first. The -D and -U preprocessor defines, the -T compiler type and memory model and the -B size information are also used, if defined. The path for the preprocessor output file can be specified with the -v option; the default is the current working directory. Comments in the source and included files are kept unless -q is specified to remove them; they are used for the statistics of option -p. If option -C++ is set, the macro '__cplusplus' is predefined before preprocessing to enable C++ macros and C++ comment recognition.
If you are using a compiler which is not supported by CFT and CST, or if the built-in preprocessing doesn't satisfy your needs because its results seem to differ from those of your preprocessor, you can preprocess the files you want to analyse with your own compiler's preprocessor and use these preprocessed files as input for CFT and CST.

-Ttype,m (CFT, CST)
Use this option to set the compiler type for source code preprocessing to one of the following types:

MSC51   Microsoft C 5.1
MSC70   Microsoft C/C++ 7.0
TC10    Borland Turbo C++ 1.0
BC20    Borland C++ 2.0
BC31    Borland C++ 3.1
GNU     GNU-C
I960    Intel 80960 iC960 3.0

The supported memory models are T(iny) (valid only for MSC70, TC10, BC20 and BC31), S(mall), M(edium), C(ompact), L(arge) and H(uge); 'L' is assumed as default if no model is specified. GNU-C and Intel iC960 do not need a memory model because they generate true 32 bit code, but the Intel iC960 compiler requires the 80960 RISC processor architecture, which is one of KA, KB, SA, SB, MC, CA (default is KB). This option causes several compiler dependent preprocessor macros (as far as they are known to me) to be defined before preprocessing starts. It is only effective together with the -P option. If your compiler is not supported, you can proceed as follows: find out which preprocessor defines are necessary (manual, help file) and declare them with option -D; then declare the type sizes for the selected memory model or processor architecture with option -B.

-Bsizes (CFT, CST)
Redefine the basic type sizes and pointer type sizes (all values in bytes) for conditional preprocessor controls using the 'sizeof()' keyword, like '#if sizeof(int) == 4'. This option is only valid with the -P option.
The required format for this option is

-Bv,c,s,i,l,f,d,ld*data,code

(the delimiter between data type sizes and pointer sizes is '*'), with the following types and their default sizes in bytes (the pointer sizes depend on the memory model):

v    : void (default 0; sizeof(void) is not valid in ISO C, but GNU-C accepts it and yields 1)
c    : char (1 byte)
s    : short (2 bytes)
i    : int (hardware dependent, 2 or 4 bytes)
l    : long (4 bytes)
f    : float (4 bytes, IEEE format)
d    : double (8 bytes, IEEE format)
ld   : long double (10 bytes, IEEE format; some compilers assume long double == double (= 8 bytes), and some CPUs and their compilers have special alignment requirements, like the Intel 80960, where sizeof(long double) is 16 bytes due to register, memory access and structure alignment requirements)
data : data pointer (pointers to types, 2 or 4 bytes, memory model dependent)
code : code pointer (function pointers, 2 or 4 bytes, memory model dependent)

The sizes of the signed and unsigned variants of the same basic type are considered equal; for example, the following is true:

sizeof(unsigned int) == sizeof(int) && sizeof(signed int) == sizeof(int)

Pointers to different data types are also considered equal in size, and so are pointers to different function types; for example, the following expressions are true:

sizeof(int *) == sizeof(float *)
sizeof(int (*)()) == sizeof(float (*)())

A 64 bit (8 byte) integer type like 'long long' or 'bigint' (or similar) is not supported, because no C compilers known to me use such a type, although some (co-)processors and their assemblers can handle it (see the Intel 80960 assembler manual for examples). If the -B option is not set, the default values for the various memory models and compiler types (as far as they are known to me) are used; the assumed target hardware is an Intel 80x86 microprocessor. Note that type modifiers like "near" or "far" are not recognized during preprocessing.
If neither -B nor -T is set, data pointers and code pointers are always considered equal in size:

sizeof(int *) == sizeof(int (*)())   (= 4, large model)

For example, -B0,1,2,2,4,4,8,10*4,4 is the correct declaration for MS-C 7.0, large/huge memory model, with the values for the data types (void = 0, char = 1, short = 2, int = 2, long = 4, float = 4, double = 8 and long double = 10 bytes) and for pointers to data types and function pointers (all 4 bytes). These values are set automatically by specifying -TMSC70,L (or -TMSC70,H) as compiler type and memory model for preprocessing.

-q (CFT, CST)
Remove comments from preprocessed files; by default they are not removed. This option is only valid with option -P. It also affects option -p, because comments can then no longer be counted and no calculations can be done on them.

-M (CFT, CST)
Generate a source file/include file dependency table for every processed file. This table shows the include files a source file depends on and can be used to build a MAKE file. It is also useful to check whether the included files are taken from the correct directories.

-p (CFT, CST)
Calculate the program code/file size ratio for every file and give a final summary. This option gives a short overview of the 'real' file contents versus complexity. The computed value is in the range from 0.000 (only comments, no code) to 1.000 (only code, no comments). Used together with -P, the results may not be absolutely correct because of macro expansion and the removal of parts of the source code by '#if...' control blocks. If preprocessing (-P) is enabled, comment bytes in included files are not counted. If option -q is set, -p does not calculate values related to comments.

-s (CFT, CST)
Used with -C, this option gives additional information.
For CFT, for every function: the number of lines of the function body, the maximum brace nesting level, the number of bytes of the function body and the number of comment bytes inside the function body; the average values for every source file are computed and displayed. For CST, for every data type: the number of type elements and the number of subelements (nested structures/unions).

-dn (CFT, CST)
Set the maximum function/structure/union nesting level for output generation to 'n' (the maximum value n = 999 is the default). Requests to display a deeper level are rejected and the output tree chart is truncated at the given level.

-vpath (CFT, CST)
Set a specific path for the intermediate precompiler output file. This option is useful to speed up execution by storing the intermediate file on a RAM-disk, where access to the precompiled file is much faster than on a hard disk. Environment variables like 'TMP' or 'TEMP' for temporary file paths are not evaluated.

-ofile (CFT, CST)
Write the generated analysis results to file 'file'. The default file names are 'CFT.LST' for CFT/CFT386 and 'CST.LST' for CST/CST386. If an existing output file with a name other than the default would be overwritten, this is detected and the user is asked for confirmation. The resulting output file is an ASCII text file without formatting characters, which can be printed on any printer, viewed and/or edited with any text editor and used as input to word processors, for example for documentation purposes.

-N (CFT, CST)
Disable the writing of an output file. This option can be useful if, for example, only a database (option -G) should be generated with CFT or CST and no output file is required. In that case the sometimes very time-consuming writing of the output file is skipped. Note that for CST the writing of the byte offset file "CST_OFFS.C" is not affected by this option.
-L[L][+] (CFT, CST)
Redirect the screen output to a file called 'CFT.LOG' or 'CST.LOG' respectively. If '+' is set, the output is written both to the screen and to the log file, so that the messages can be viewed as they appear and analysed later. Finally, -LL or -LL+ appends the output to an existing log file; this can be useful if CFT and CST run in batch jobs.

-Wlevel (CFT, CST)
Set the error and warning message level. Higher warning levels include lower ones. Possible levels are:

0 : all error and warning messages are suppressed except absolutely catastrophic fatal errors,
1 : display serious errors or warnings,
2 : includes level 1 plus additional errors and warnings,
3 : includes level 2 plus errors/warnings/remarks,
4 : includes level 3 plus warnings about implicitly declared functions and missing types or storage classes.

The following levels affect only preprocessing:

5 : includes level 4 plus warnings and errors during preprocessing (otherwise non-fatal errors and warnings during preprocessing are not displayed; the preprocessor runs in 'silent mode'),
6 : includes level 5 plus remarks/slight warnings during preprocessing.

The output format for messages during the file scan is

file name(line): error: description
file name(line): warning: description

and during preprocessing (warning levels 5 and 6)

preprocessor: file name(line): error: description
source line
preprocessor: file name(line): warning: description
source line

-C++ (CFT, CST)
Enable C++ source code processing. This includes the handling of C++ comments '//...', the recognition of C++ keywords and the definition of the macro name '__cplusplus' for preprocessing. If a supported compiler defines additional macro names like '__TCPLUSPLUS__' for Turbo C, they are also defined before preprocessing. Option -C++ is required to process C++ code correctly.

-X (CFT, CST)
Assume a UNIX-style text file: no CR, only LF. The default is a DOS-style text file with CR+LF.
Any other combination, like CR in UNIX files, or CR without a following LF or LF without a preceding CR in DOS files, causes a warning message. This option is useful to detect possible conversion errors between different operating systems or incorrect editor configuration settings.

-Y (CFT, CST)
Ignore CR+LF checks. This option disables all checks for unexpected CR+LF combinations in DOS or UNIX files. If option -Y is set, option -X is ignored. This option can be useful if there would be too many of these messages or if they are of no interest to the user.

-F (CFT, CST)
Use only ASCII characters for the tree chart output instead of the default semigraphic characters. This option is useful if the generated output file is to be printed on a printer which does not support semigraphic characters as defined in the IBM character set. It can also be used to prepare the output file for a WINDOWS application like MicroEMACS if no font with semigraphics is available.

-e[char] (CFT, CST)
Generate formatted ASCII text files with the function/data type list and the file list. All entries are separated by the optional character 'char'; if 'char' is not given, the tab character is used. If spaces are wanted as separators, write -e" ". Such files can be used directly as input to other programs like word processors (e.g. MS-WORD for WINDOWS) or spreadsheets (e.g. MS-EXCEL), for example for documentation purposes.
The following files are created:

CFTITEMS.TXT:
Contents: function name, return type, file name, line #, total # of function bytes, # of function comment bytes, # of function lines, # of control statements, # of brace levels

CSTITEMS.TXT:
Contents: data type name, file name, line #

CFTFILES.TXT and CSTFILES.TXT:
Contents: file name, # of lines, file size in bytes, # of comment bytes, # of functions/data types

-G[name] (CFT, CST)
Generate a database containing the complete set of information about the processed sources. The optional parameter 'name' (path and file name) is used as a unique base name for the set of database files (up to 6 significant characters); the default name 'CXT' is used if no name is specified. The generated database files (extension '.DBF') are dBASE compatible. Two additional files are created, one with the command line options (extension '.CMD') and one with the list of source files (extension '.SRC') used for the database generation. They can be used as command line definition files with '$' (command list) and '@' (file list). As a result of the database generation you will find files named 'CXTxy.ext' (default name 'CXT') or 'namexy.ext' (user defined 'name'), where 'x' is 'F' for CFT or 'S' for CST and 'y' is an internally used character that identifies the different database files and their contents.

-g[name] (CFT, CST)
Read a previously generated database (see option -G). The optional parameter 'name' (path and file name) is used as a unique base name for the set of database files (up to 6 significant characters); the default name 'CXT' is used if no name is specified. Every source file is checked for changes of its file creation time and file size, and a warning message is given to inform the user.

-? (CFT, CST)
-h[elp]
-H[elp]
Show the command line syntax and give short but complete help information about the accepted commands and their syntax.
cmdfile (CFT, CST)
Specifies a file with (additional) command line options. This can be useful if the command line would be too long because of the number of options and files, or if you usually use the same options, which can then be stored in a command file. The initial '$' character is required to mark a command file.

[+]file (CFT, CST)
The name of a source file to be processed. More than one file can be specified on the command line. By default, the given files are assumed to contain C source code. Assembler source files are recognized only by the file extension '.ASM' (80x86 MASM/TASM) or '.S' (Intel 80960, GNU). The '+' sign indicates that, starting from the given directory, all subdirectories should be searched recursively for the given file name pattern. This is useful if a large software project is divided into several modules with separate subdirectories for each module. In that case only the starting (root) directory with the requested file name pattern needs to be specified to search the current directory and all its subdirectories. If the file name, or an include file specification inside a file, contains a relative path ('./', '.\', '../' or '..\'), it is translated into an absolute path starting from the current working directory or, for include files, from the directory of the parent file. The command line wildcards '*' and '?' are accepted.

filelist (CFT, CST)
A file containing a list of source files to be processed; wildcards are accepted. The list file should contain every file on a separate line. The rules for files containing assembler code and for path translation are described above. The initial '@' character is required to mark a filelist file. The '+' sign for subdirectory processing can also be used inside the filelist file.
REMARKS ON USING OPTIONS

None of the options described above is enabled by default, so it is up to users to customize their preferred processing and output style by adding the control options they need. This seems to be the best way to give users the freedom to decide which features they really need for their work.

Some of the options described above should be regarded and used as 'default' options to generate a readable, complete and useful output file without unexpected side effects. The minimum default command lines should therefore be

  CFT -m -ra
  CST -ra

Both command sets generate a complete listing containing all items (-a) with file name and line references and a cross reference number for items that appear repeatedly (-r). The option -m for CFT forces the output to start with the 'main' function (if found). The precompile option -P is not strictly necessary, though for exact results it should be set together with the -T option. A standard default command line might be

  CFT -m -rauspMP -T -cs -Cs -na -Zs -G
  CST -rapMP -T -cs -Cs -na -Zs -G

If you start using CFT and CST for your own work, take these options as a basic set and try other options to get a feeling for what they are useful for and how they affect the output. The large number of options may be confusing for beginners, but this is the only way to give users the flexibility to customise their own output. Therefore, take some time to learn about CFT and CST and their features, read this manual carefully and gain your own experience with this software.

It is possible to declare more than one source file, command file and list file on the command line. In that case they are processed in the order they appear. Files and options can be placed in any order on the command line; there is no recommended order because all options (including those inside command files!) are processed before any source files are scanned.
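Because all options are processed before any source file is scanned, the position of options among the file names does not matter. A minimal C sketch of that two-pass idea, assuming only the prefixes '-' (option), '$' (command file) and '@' (file list) described above, could look like this; reading the contents of command and list files is left out, and all names are hypothetical.

```c
#include <stddef.h>

enum arg_kind { ARG_OPTION, ARG_CMDFILE, ARG_FILELIST, ARG_SOURCE };

/* Classify one command line argument by its first character. */
enum arg_kind classify_arg(const char *arg)
{
    switch (arg[0]) {
    case '-': return ARG_OPTION;      /* -x...       */
    case '$': return ARG_CMDFILE;     /* $cmdfile    */
    case '@': return ARG_FILELIST;    /* @filelist   */
    default:  return ARG_SOURCE;      /* [+]file     */
    }
}

/* Reorder the arguments (argv[1..argc-1]) into 'out': first all options
 * and command files, then all list files and source files, each group
 * keeping its original order. Returns the number of entries written. */
size_t order_args(int argc, char *argv[], const char *out[])
{
    size_t n = 0;
    int i;

    for (i = 1; i < argc; i++) {      /* pass 1: options */
        enum arg_kind k = classify_arg(argv[i]);
        if (k == ARG_OPTION || k == ARG_CMDFILE)
            out[n++] = argv[i];
    }
    for (i = 1; i < argc; i++) {      /* pass 2: files */
        enum arg_kind k = classify_arg(argv[i]);
        if (k == ARG_FILELIST || k == ARG_SOURCE)
            out[n++] = argv[i];
    }
    return n;
}
```

With the arguments of command line example 4 below (y.c -R -Dmain=main_entry z.c -P x.c), order_args returns -R, -Dmain=main_entry and -P first, followed by y.c, z.c and x.c in their original order.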
The maximum command line length for DOS is 127 characters, which is a system-dependent 'natural' limit for the options and file names being declared. If you have more items to declare, place them into command files and file list files, which do not have this limitation. Options can also be defined by the environment variables CFT and CST (also used for CFT386 and CST386), like

  SET CFT=...
  SET CST=...

Single options in the environment string must be separated by spaces. See also the description of the -D option for remarks on environment variable definitions.

The rules for the interpretation of options are:

  1. if defined, all options in the environment variable CFT (for CFT and CFT386) or CST (for CST and CST386) are taken first,
  2. the command line options and the option files are then interpreted in the order they appear.

If an option is declared more than once with different values, earlier declarations are overwritten by later ones.

Options represented by a single character with no optional values, like -r or -a, can be grouped together behind a single leading '-', like '-rasM', which is the same as '-r -a -s -M'. The last option of a group, however, can have additions, for example '-rasMmWinMain', which is evaluated as '-r -a -s -M -mWinMain'. If an option has an additional parameter, the parameter must follow the option character without a space. Leaving a space means that no additional parameter is given for this option.
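The grouping rule can be sketched in C. The set of option letters that accept an attached parameter is a hypothetical assumption made for this example (the option descriptions define the real set); once such a letter is reached, the rest of the group is taken as its parameter, as in '-rasMmWinMain'.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical set of option letters that may carry an attached
 * parameter; note that 'm' is listed but 'M' is not. */
static const char *PARAM_OPTIONS = "cCDdGgImnOoPSTvWZ";

/* Expand a grouped option such as "-rasMmWinMain" into separate,
 * space-separated options in 'out'. Returns the number of options,
 * 0 if 'arg' is not an option, or -1 if 'out' is too small. */
int expand_group(const char *arg, char *out, size_t size)
{
    int count = 0;
    size_t used = 0;
    const char *p;

    if (size == 0)
        return -1;
    out[0] = '\0';
    if (arg[0] != '-')
        return 0;

    for (p = arg + 1; *p != '\0'; p++) {
        int has_param = strchr(PARAM_OPTIONS, *p) != NULL;
        /* an option with a parameter consumes the rest of the group */
        int n = snprintf(out + used, size - used, "%s-%c%s",
                         count ? " " : "", *p, has_param ? p + 1 : "");
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
        count++;
        if (has_param)
            break;
    }
    return count;
}
```

Called with "-rasMmWinMain", expand_group writes "-r -a -s -M -mWinMain" and returns 5.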
File names composed of drive letter, directory name, file name and file extension, referred to below simply as 'path names', are treated by some special procedures to enforce a unique style of their internal representation:

  - path names are never considered case sensitive, so there is no difference between upper case, lower case and mixed case path names (the reason is that DOS does not make any difference),
  - path names containing './', '.\', '../' and '..\' (so-called 'relative paths') are expanded and transformed into absolute paths,
  - the recommended directory delimiter is '/' (UNIX style); if a '\' (DOS style) is recognized in a path name, it is replaced by '/',
  - path names are always expanded and transformed into the default style :/ to get a unique representation for every file name that must be handled during processing.

These actions are applied to every path name during file processing. File names given on the command line are also transformed.

If you want to perform database generation (option -G) for different projects, you are responsible for separating them and for avoiding overwriting existing databases. This can be done either by giving the databases different names, so that the database files can all be placed in the same directory, or by writing every database into its own directory. When you access the databases, be sure to use the correct name and/or path, also within the BRIEF or MicroEMACS editors.

COMMAND LINE EXAMPLES

1.
   CFT -m -rau *.c

   This invocation of CFT processes all files with the extension ".c" in the current directory and generates an output file whose output tree starts with the "main" function (option -m). Every function is displayed with file and line number reference and a cross reference number (option -r). All functions are shown in lexicographical order (-a), including undefined ones (-u).

2.
   CFT -mWinMain -rausMP -TMSC70,L -Id: -cs -Cs -na -ve: -C++ *.c ..\*.c *.cpp

   This invocation is similar to the one described above, with some extensions. The source files from the current directory (*.c, *.cpp) and from the parent directory (..\*.c) are preprocessed (-P) with the MS-C 7.0 defines for the large memory model (-TMSC70,L). The include file path is taken from the environment variable "INCLUDE" (default for -P), and the path "d:" (-Id:) is searched as well. The precompiler output is stored in path "e:" (-ve:). C++ extensions and keywords are recognized if they occur (-C++). The output starts with the "WinMain" function (-mWinMain). There is a sorted call statistic (-cs) and a function summary for every scanned file (-Cs) with additional information for every function (-s). The critical function call path for all functions is calculated and displayed (-na), and the included files of every source file are shown (-M).

3.
   CST -S"struct _test" -r *.h -W2 -C++

   Start CST to scan all files with the extension ".h" in the current directory for data types. They are displayed with file name and line number reference and cross reference number (-r). The output is done for the data type 'struct _test' (-S"struct _test"). The warning level is set to "2" (-W2), and C++ extensions and keywords are recognized (-C++).

4.
   CFT y.c -R -Dmain=main_entry z.c -P x.c

   Start CFT to produce a reverse calling tree (-R) of the functions found in the files "x.c", "y.c" and "z.c" in the current directory. The files are preprocessed (-P) before the file scan, and the name "main" is replaced by "main_entry" during preprocessing (-Dmain=main_entry).

5.
   CST $cst1.cmd $cst2.cmd -ve:\tmp @cstfiles +*.h -olist.v1a

   This invocation of CST receives its options from the command files "cst1.cmd" and "cst2.cmd" and stores the preprocessor output in path "e:\tmp" (-ve:\tmp). The files to be processed are defined in the source list file "cstfiles" and on the command line by "+*.h".
   The "+*.h" file specification searches the current directory and all subdirectories for files with the extension ".h". The output file is named "list.v1a" (-olist.v1a).

6.
   CFT -ra -PGNUINC -TGNU -M c:\gnu\src\*.c c:\gnu\src\*.s -d10

   CFT scans all files with the extensions ".c" and ".s" in the directory "c:\gnu\src". They are preprocessed with an include file path defined in the environment variable "GNUINC" (-PGNUINC) for compiler type "GNU" (-TGNU). The output contains all functions (-a) with complete reference information (-r) and a list of all included files for every source file (-M). The output tree is truncated if the nesting level is higher than 10 (-d10).

7.
   CST *.c

   CST processes all files with the extension ".c" in the current working directory. No options are specified, so only the options set by the environment variable 'CST', if present, are used to customize the program execution. As an example, the command line options used in example 6 can be defined in the environment variable CST by 'SET CST=-raMKPGNUINC -TGNU -d10'.

8.
   CFT -ra -PI960INC -TI960,KB *.c *.s

   CFT scans all files with the extensions ".c" and ".s" in the current directory. They are preprocessed with an include file path defined in the environment variable "I960INC" (-PI960INC) for compiler type "I960", 'KB' architecture (-TI960,KB). The output contains all functions (-a) with complete reference information (-r).

9.
   CFT -rRM -gproj40 -Gproj41

   CFT reads the database named 'proj40' (-g) and produces as output the reverse function call tree (-R) with complete reference information (-r), the (include) file interdependencies (-M) and a new database named 'proj41' (-G).

10.
   CST -g -Gnew -N

   CST reads the default database (-g) and produces as output another database named 'new' (-Gnew). No other output file is generated (-N).

11.
   CST -N -OTEST -O+ test.h

   CST reads the file "test.h" and generates no output file (-N), but a byte offset calculation file for data type 'TEST' (-OTEST) and its enclosed type members (-O+).

10 OUTPUT DESCRIPTION AND INTERPRETATION

This section gives an overview of the files generated by CFT and CST and of the interpretation of the results. Different files are produced as output depending on the options set by the user. Usually, if -N is not set, all information is written to the default output file CFT.LST or CST.LST, or to the file specified by the -o option. The internal structure of these files and its meaning are described below.

If database generation is enabled with option -G, several files are produced. They all share a common database name that identifies the files related to a project. The file extension '.DBF' marks the dBASE compatible database files, the file with the extension '.CMD' contains the command line options, and the file with the extension '.SRC' lists all source files that were processed. For further information refer to the corresponding section in the syntax description.

CFT OUTPUT

The output file is divided into several sections. Some of the listed sections are generated by default (-), others are optional (o) and are only displayed if they are enabled by a command line option. The default sections can also be customized to produce the desired output.
The sections generated for CFT are (in the order they appear):

  - file header
  - function calltree/called-by hierarchy listing (-r, -R, -x, -a, -m, -f, -dn, -V, -l)
  - function summary
  - multiply defined functions and their location (only if detected)
  - overloaded functions and their location (only if detected)
  o undefined functions (-u)
  o function call statistics (-c[s])
  o function caller/member relations (-Z[s])
  o function call cross reference table (-z)
  o critical function call path (-n[a])
  o source file - include file dependency (-M)
  o function tables for source files (-C[s], -s, -q)
  - file information summary (-p, -q)

Each function is displayed like:

  int test() (1)

with the following meanings:

  - int    : function return type
  - test() : function name
  - (1)    : function reference number
  - : found as (one or more of) D = definition, M = macro, P = prototype, C = function call, A = assembler function
  - : file name, line number

The line number is the line where the function definition block starts with its initial '{', not the line where the function name resides. I think this is the best solution because it is the point where we really enter the function block. This convention is also used by source level debuggers, which point to the line with the opening brace on function entry.

CST OUTPUT

The output file is divided into several sections. Some of the listed sections are generated by default (-), others are optional (o) and are only displayed if they are enabled by a command line option. The default sections can also be customized to produce the desired output.
The sections generated for CST are (in the order they appear):

  - file header
  - data structure calltree/called-by hierarchy listing (-r, -R, -x, -a, -m, -f, -dn)
  - data type summary
  - multiply defined data types and their location (only if detected)
  o data type call statistics (-c[s])
  o data type caller/member relations (-Z[s])
  o data type call cross reference table (-z)
  o maximum data type nesting (-n[a])
  o source file - include file dependency (-M)
  o data type tables for source files (-C[s], -s, -q)
  - file information summary (-p, -q)

Each data type is displayed like:

  struct _test (1)

with the following meanings:

  - struct _test : type specifier
  - (1)          : reference number
  - : data type (one/none of): B = basic type (void, char, int, ...), S = struct, U = union, C = class, E = enum
  - : file name, line number of type definition (only printed if necessary)
  - : file name, line number of basic type definition

The two locations for the data type can occur if the data type is first defined and later assigned via 'typedef' or by '#define' (if -P is not set) to another data type name:

  test.c:
  ...
  line 60: struct xyz {...};
  ...
  line 90: typedef struct xyz struct _test;
  ...

The definitions are on different lines, but both data type names refer to the same data structure. As with the convention used for functions, the line number is the line where the structure, union, enumeration or class type definition block starts with its initial '{', not the line where the type name resides.

For an example session and more detailed information about the output generated by CFT and CST see the file EXAMPLE.DOC.

OUTPUT INTERPRETATION

Besides the hierarchical structure chart of the function and data type relationships, the resulting output contains much useful information about the program that can be used for optimization, reuse or maintenance purposes. Identifying the most frequently called functions is a good way to find candidates for further optimization.
Low-level functions with many callers but no called subfunctions are ideal candidates for reuse. Functions with no callers may be useless if they are also not called via function pointers, and can then be discarded. The chance of finding errors in complex functions with many lines of source code, many called functions and a lot of control statements is much higher than in simple functions.

11 INTEGRATION INTO PROGRAMMING ENVIRONMENTS

Invoking CFT and CST directly from inside editors or integrated programming environments (IDEs) and displaying the results can be a very useful feature during program development. With advanced IDEs such as Borland C++ or Microsoft PWB this is an easy task.

The Borland IDE has a section with 'transfer items' in its system menu. It contains programs that can be invoked from inside the IDE, like TASM or GREP. To add CFT and CST as new entries, go to the OPTIONS menu and open 'TRANSFERS...'. Choose a free entry in the table and select EDIT. A window with 3 edit lines opens.

In the first line, called 'Program Title', write 'C~FT' or 'C~ST' as the name displayed in the transfer section. The '~' marks the following character ('F' or 'S') as the hot-key.

In the second line, called 'Program Path', write 'CFTIDE' or 'CSTIDE', with the complete path if necessary. 'CFTIDE' and 'CSTIDE' are two batch files that invoke CFT or CST together with the necessary options. These batch files are part of the CXT package; you can change the options defined there if you need other ones.

In the third line, called 'Command Line', write the macro commands '$EDNAME $NOSWAP $CAP EDIT'. These macros pass the file name in the current edit window ($EDNAME) to the batch file, suppress window swapping ($NOSWAP) and capture the processing results in a separate edit window ($CAP EDIT).
The last step is to save these entries; the integration is then complete, and CFT and CST can be used as if they were built-in functions. The processing results are shown in an edit window which can be scrolled, resized or moved. Adding CFT and CST to the IDE makes these tools much easier to use.

12 TOOLS FOR DATABASE PROCESSING

To access information stored in a database, the following utilities are available for 'CFT' and 'CST':

  CFTN   C Function Tree Navigator
  CSTN   C Structure Tree Navigator

They can be used to recall the file name and line number of a specific item (function or data type) from the database. If the requested item is found in the database, it is displayed with the location where it is defined or, if no definition was found during processing, where it was found for the first time. As an additional feature, editors like BRIEF 3.0, QEDIT 2.1 or MicroEMACS 3.11 can be invoked directly with this information to open the target file and to move the cursor to the line where the searched item is located.

For BRIEF there are several macros available to perform searching inside the editor. A new edit window with the file at the location of the requested item is opened if the search was successful. Both MicroEMACS editor versions for DOS and WINDOWS are supported as well. Some of these actions are also possible for QEDIT, with slight limitations due to its macro programming capabilities. Other user-programmable editors which should be able to work with CFTN and CSTN are EPSILON, ME, KEDIT, Multi-Edit, GNU-EMACS ports like DEMACS or OEMACS, the Microsoft editor M, or integrated development environments like Borland or Microsoft PWB (this list may not be complete). You can try to integrate CFTN and CSTN into these systems by using the BRIEF, QEDIT or MicroEMACS macro files as examples for your own integration.
PRECOMPILED SOURCE FILES

If the precompile option -P was used to process the source files related to the database, the results of searches sometimes seem to be wrong. This can happen if an identifier in the source code is in fact defined as a macro and has been replaced during preprocessing, so that the source processed by the analyser differs from the original source; the cursor then points to an obviously wrong location, or the search fails. An identifier which is in fact a macro name is unknown and not accessible after precompiling. It is also possible that a function used in the original source cannot be found in the database. The reason is that the function is in fact a 'function-like' macro and was replaced during preprocessing. If macros with different names are defined identically, a search for an item may point to a location other than the requested one. If the -P option is not set, the same item can have several 'alias' names due to macro definitions. If the source code contains explicit #line numbers, searching for a specific line may also fail. Keep these exceptions in mind to interpret the results correctly when using the database.

IMPORTANT NOTICE

Information recalled from the database may not be valid if files processed by CFT or CST were edited and changed after the database was generated. Possible errors are pointers to wrong files and/or lines if source lines have been deleted or inserted, failed searches if names have changed, or failed accesses to files which may have been renamed, moved or deleted. To detect these errors, CFTN and CSTN perform a consistency check of the file creation date/time and the file size. If inconsistencies are recognized, the user is informed that the database is not up-to-date and should be updated by processing the source files again.
SYNTAX:

  CFTN [options] pattern
  CSTN [options] pattern

OPTIONS

-Eeditor
  Specifies the editor command line for option -e; overrides the default and the environment values. See the section about environment variables for further information about the required format.

-F
  Print all file names related to the database. This option is useful to get a complete overview of all files of the project.

-a
  Print all function/data type names. Useful to generate a list of items, for example as input to other programs.

-B
  Same as -a, but additionally prints the internal database record number. Used by BRIEF macros.

-bform
  Run the search in batch mode: if the requested item was found, its location is displayed on a single line as "file name line number" (default style); otherwise nothing is output to indicate that the search failed. The output style can be changed by specifying 'form' to override the default style. As for option -E, you can specify the exact places where the file name and line number should be inserted by defining a format string with %s and %d (see also the section about environment variables). For example, the format to generate a command line for invoking BRIEF, QEDIT or MicroEMACS would look like

    cstn -b"b -m\"goto_line %d\" %s" ...   (BRIEF)
    cstn -b"q %s -n%d" ...                 (QEDIT)
    cstn -b"me -G%d %s" ...                (MicroEMACS)

  This option gives you great flexibility in generating output for your own purposes, for example to write a batch file or for further use in other programs.

-e
  If the requested item is found, an editor is invoked to display the file containing it. There are three ways to specify the editor command line (evaluated in this order): 1) use option -E, 2) define one of the environment variables CFTNEDIT, CSTNEDIT or CXTNEDIT, 3) if nothing is specified, BRIEF is invoked as the default editor (if present) with the file name and line number of the item, to move the cursor to its location.
  Ensure that the PATH environment variable is set correctly, including the path of the BRIEF directory.

-fname
  Use 'name' as base name (path and file name) for the database files. The database names can also be defined with environment variables (CFTNBASE, CSTNBASE, CXTNBASE). If neither -f nor the environment variables are set, a default name is used (see also option -G in the CFT and CST syntax description). This allows the use of different databases, for example generated for different projects. See also the section about environment variables for further information.

-r#
  Prints the location of the item that matches the pattern and has record number #. This option requires -b. Used by BRIEF macros.

-Ritem
  Print a cross reference list of every occurrence of 'item' with complete file name and line number.

-Dfile
  Print a list of the contents of 'file'.

pattern
  The item to search for in the database. This can either be a function name (CFTN) or a data type name (CSTN). There are three ways of searching, depending on how 'pattern' is given:

    pattern    exact search
    pattern*   the beginning of the item must match the pattern
    *pattern   a substring of the item must match the pattern

  If the item to search for consists of more than one word (contains spaces), the search pattern must be quoted, like "struct _iobuf", to ensure that the words are interpreted as a single pattern.

RETURN VALUES

The following values are returned to DOS or the calling program to report the result of the database search:

  - 0  searched item not found,
  - 1  searched item found,
  - 2  searched item found, but the source file may have been changed (creation date and/or file size differ) since the creation of the database (database is not up-to-date).

The returned value can be used to decide which action to take for different results, for example if the database is not up-to-date.
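The three pattern forms and the return values can be combined into a small C sketch. An in-memory array stands in for the dBASE database files, the record layout is hypothetical, and the first matching entry is taken; how CFTN and CSTN choose among several matches is not modelled here. A pattern consisting of '*' alone matches every entry.

```c
#include <stddef.h>
#include <string.h>

/* One database entry as assumed by this sketch (hypothetical layout). */
struct item {
    const char *name;      /* function or data type name            */
    const char *file;      /* file name of the definition            */
    int         line;      /* line number of the definition          */
    int         changed;   /* nonzero if the source file has changed */
};

/* "pat" exact match, "pat*" prefix match, "*pat" substring match */
int match_pattern(const char *pattern, const char *name)
{
    size_t len = strlen(pattern);
    if (len > 0 && pattern[0] == '*')
        return strstr(name, pattern + 1) != NULL;
    if (len > 0 && pattern[len - 1] == '*')
        return strncmp(name, pattern, len - 1) == 0;
    return strcmp(name, pattern) == 0;
}

/* Returns 0 = not found, 1 = found, 2 = found but source changed. */
int search(const struct item *db, size_t count, const char *pattern,
           const struct item **hit)
{
    size_t i;
    for (i = 0; i < count; i++) {
        if (match_pattern(pattern, db[i].name)) {
            *hit = &db[i];
            return db[i].changed ? 2 : 1;
        }
    }
    *hit = NULL;
    return 0;
}
```

With entries for "main", "main_entry" and "test", the pattern "mai" finds nothing (0), "mai*" finds "main", and "*entry" finds "main_entry".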
ENVIRONMENT VARIABLES

CFTNEDIT, CSTNEDIT, CXTNEDIT:

The editor to invoke can be defined either by option -E or by the environment variables CFTNEDIT (for CFTN), CSTNEDIT (for CSTN) or the common variable CXTNEDIT (for both CFTN and CSTN), containing the format string of the editor of your choice. The format string specifies where the file name and the line number should be inserted to pass additional information to the editor. Use %s for the file name and %d for the line number. For example, the invocation of the default editor BRIEF could be defined like

  SET CFTNEDIT=b -m"goto_line %d" %s
  SET CSTNEDIT=b -m"goto_line %d" %s
  SET CXTNEDIT=b -m"goto_line %d" %s

where 'b' is the BRIEF editor, '-m' specifies the macro invoked when BRIEF starts, 'goto_line' is the macro name with '%d' as the place to insert the line number, and '%s' is the place for the file name. Note that this example cannot be used on the command line with the -E option because of the quotes. The order of %d and %s can be changed if another editor is used. Here are additional configuration examples for other popular editors (given for CFTN, similar for CSTN):

  EDIT (MS-DOS 5.0):  SET CFTNEDIT=edit %s       or  -E"edit %s"
                      SET CFTNEDIT=edit          or  -Eedit
  VDE 1.62:           SET CFTNEDIT=vde %s        or  -E"vde %s"
                      SET CFTNEDIT=vde           or  -Evde
  QEDIT 2.1:          SET CFTNEDIT=q %s -n%d     or  -E"q %s -n%d"
  MicroEMACS 3.11:    SET CFTNEDIT=me -G%d %s    or  -E"me -G%d %s"

The described notation allows users to customize CFTN and CSTN for their preferred editor and to perform additional actions during invocation. If your editor supports macro programming like BRIEF, you are free to write your own macros that do similar things to the CXT.CM macro for BRIEF 3.0. I think this is the most flexible way to give users control over this option and to help them work with their preferred programming environment and development tools.
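The evaluation order of the editor definition and the substitution of %s and %d can be sketched in C. The sketch assumes that the tool-specific variable (for example CFTNEDIT) takes precedence over CXTNEDIT, and its BRIEF default string is modelled on the example above; both are assumptions, not documented behaviour. Because %s and %d may appear in either order, the format is expanded by hand instead of being passed to printf. How a format without %s (like 'SET CFTNEDIT=edit') is handled is not modelled.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Pick the editor format: option -E, then the tool-specific variable
 * (e.g. "CFTNEDIT"), then CXTNEDIT, then an assumed BRIEF default. */
const char *editor_format(const char *opt_E, const char *tool_var)
{
    const char *s;
    if (opt_E != NULL && *opt_E != '\0')
        return opt_E;
    if ((s = getenv(tool_var)) != NULL && *s != '\0')
        return s;
    if ((s = getenv("CXTNEDIT")) != NULL && *s != '\0')
        return s;
    return "b -m\"goto_line %d\" %s";
}

/* Replace %s with the file name and %d with the line number, in any
 * order. Returns 0 on success, -1 if 'out' is too small. */
int expand_format(const char *form, const char *file, int line,
                  char *out, size_t size)
{
    size_t used = 0;
    char num[16];

    if (size == 0)
        return -1;
    for (; *form != '\0'; form++) {
        const char *piece = NULL;
        size_t len;

        if (form[0] == '%' && form[1] == 's') {
            piece = file;                             /* file name */
            form++;
        } else if (form[0] == '%' && form[1] == 'd') {
            snprintf(num, sizeof num, "%d", line);    /* line number */
            piece = num;
            form++;
        }
        if (piece == NULL) {                          /* plain character */
            if (used + 1 >= size)
                return -1;
            out[used++] = *form;
            continue;
        }
        len = strlen(piece);
        if (used + len >= size)
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

For example, expand_format("me -G%d %s", "test.c", 60, buf, sizeof buf) produces "me -G60 test.c", while the QEDIT form "q %s -n%d" produces "q test.c -n60".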
CFTNBASE, CSTNBASE, CXTNBASE:

These environment variables can be used to specify the name of the database. As with the editor environment variables, CFTNBASE and CSTNBASE relate to CFTN and CSTN respectively, and CXTNBASE is used for both. For example, to specify the database 'proj1' located in directory 'd:\develop\projects', type

  SET CFTNBASE=d:\develop\projects\proj1
  SET CSTNBASE=d:\develop\projects\proj1

for a separate definition, or

  SET CXTNBASE=d:\develop\projects\proj1

for a common definition of the database name.

COMMAND LINE EXAMPLES

1) CFTN *
   Displays all functions in lexicographical order with their return types, file names and line numbers. Gives a short overview of all functions found.

2) CSTN -e *
   Edit all data types in lexicographical order, using the default editor or the one defined by the environment variable CSTNEDIT or CXTNEDIT.

3) CFTN -fproject1 -Evde -e main
   Search the database named 'project1' for function 'main' and edit it with the editor 'vde'.

4) CSTN -b "union REGS"
   Search for data type 'union REGS' and, if found, display its file name and line number.

5) CSTN -e -E"q %s -n%d" -fcft tmbuf
   Search database 'cft' for data type 'tmbuf' and, if found, invoke the editor 'q' (QEDIT V2.1) with the file name and line number.

SEARCHING INSIDE BRIEF (Version 3.0)

This feature is one of the most powerful enhancements for the BRIEF editor and gives the user full control over the complete source code of software projects, no matter how big they are and how many files they include. It extends the BRIEF editor into a comfortable hypertext source code browser and locator system. The browser allows its user to find and read important program constructs like functions and data types in several files simultaneously and to move between them. The complete project with its many source and include files appears as if it were a single whole.
The browser helps programmers learn about existing program structures and supports them in developing new code and maintaining existing code. The programmer can use the generated output files CFT.LST or CST.LST (or the one created with the -o option) to walk along the hierarchy tree chart and select from there the function or data type to be displayed in detail.

The following features are implemented as macros:

  - searching for a specific item, tagged or marked
  - building menus of all defined items
  - building menus of all references to a specific item
  - building menus of all processed files
  - building menus of all items defined in the current file
  - searching for a specific item cross reference number
  - changing the database name

Every function and data type can be accessed with a single keystroke by moving the cursor onto it ("tagging") and executing a macro that locates the item and zooms into the file where it is defined. The user no longer has to remember the file names and locations where the functions and data types are defined, nor to change files, directories and drives to access the files manually. It is possible to build interactive dialog menus with all functions or data types in lexicographical order and to select an item to display. This is very useful to get a quick overview of all accessible functions and data types of the whole project. It is also possible to build an interactive dialog menu with all file names stored in the database, in lexicographical order, and to select a file to open for editing. Other menus are available for file contents lists and item cross references. All information needed for these actions is stored in the databases generated by processing the files related to the project.

To invoke CFTN and CSTN inside BRIEF, the macro file CXT.CM must be loaded (with CXT.CM), which makes the implemented macros available.
These macros are

  MACRO NAME          KEY ASSIGNMENT (defined in CXTKEYS.CM)
  cft                 Shift F1
  cftmenu             Shift F2
  cftxrefmenu         Shift F3
  cftxrefmenuagain    Shift F4
  cftdefmenu          Shift F7
  cftfilemenu         Shift F8
  cftfind             Shift F11
  cftbase             Shift F12
  cst                 Ctrl F1
  cstmenu             Ctrl F2
  cstxrefmenu         Ctrl F3
  cstxrefmenuagain    Ctrl F4
  cstdefmenu          Ctrl F7
  cstfilemenu         Ctrl F8
  cstfind             Ctrl F11
  cstbase             Ctrl F12
  cxtbase             Alt Tab
  cxtsearchxref       Ctrl Tab
  cxthelp

This macro key assignment list is also available within BRIEF as a help screen, which can be invoked with the macro 'cxthelp'. The CXT help information is not part of the BRIEF help system because this would require modifications of the original BRIEF help files.

Instead of loading the file CXT.CM and typing the macro names manually, you can load the macro file CXTKEYS.CM, which automatically loads CXT.CM when any of the macros listed above is invoked with a hot-key. To simplify working with this package, CXTKEYS.CM also contains key assignments for the macros. These hot-keys offer a "point and shoot", hypertext-like feeling. The macro source file CXTKEYS.CB contains the source code of CXTKEYS.CM, so that you can make changes such as adapting the key assignments to your personal needs or moving the initialization function to the BRIEF start-up macro file (for further information about BRIEF macros see the BRIEF manuals).

To load these macros and to execute CFTN and CSTN, which are invoked from inside BRIEF, be sure to set the directory path correctly. It is also necessary to allow access to the macro file DIALOG.CM, which contains the functions for building and processing dialog menus.
A search can be started by simply moving the cursor onto the item to search for, or by marking a block with the item (necessary if the search pattern contains more than one word, like 'struct xyz'), and then running one of the following macros (or pressing their hot-keys):

  cft    (function search)
  cst    (data type search)

It is also possible to type the name of the item to search for manually. To do this, run one of the following macros:

  cftfind    (function search)
  cstfind    (data type search)

If the search was successful, a new window with the file containing the item is opened and the cursor is placed on the line where the item is located. If inconsistencies have been detected, the user is informed. If the requested item or the source file containing the item is not found, a message is given.

The macros for building the function and data type dialog menus are

  cftmenu    (function menu)
  cstmenu    (data type menu)

You can scroll through the entries and select an item to be displayed.

To access databases other than the default ones, there are two ways to change the base names:

  1) Set the environment variables CFTNBASE, CSTNBASE or CXTNBASE (see description above). When the macro file CXT.CM is loaded, these variables are used for initialization.
  2) To change the base names from inside BRIEF, there are three macros. They override the initial values given by the environment variables:

       cftbase    change base name for function search
       cstbase    change base name for data type search
       cxtbase    change both CFT and CST base names

With these features it is possible to set default values for the database files or to switch between different databases without leaving BRIEF, which gives the user maximum flexibility.

You can display a menu list with all source files scanned for the database by typing

  cftfilemenu    (CFT file menu)
  cstfilemenu    (CST file menu)

This gives you a quick overview of all files related to the database.
Other menu-driven options display all cross references to a specific item (see macro 'cst' for information about marking) with the macros

   cftxrefmenu        (CFT cross reference menu)
   cftxrefmenuagain   (show previous menu again)
   cstxrefmenu        (CST cross reference menu)
   cstxrefmenuagain   (show previous menu again)

and display a list of the contents of the current source file with the macros

   cftdefmenu         (CFT file menu)
   cstdefmenu         (CST file menu)

To search for the first appearance of a specific cross reference number like '(123)' in a CFT or CST output listing file, move the cursor to the reference number and type

   cxtsearchxref      (search cross reference)

The macro extracts the complete number and searches for its first occurrence, starting from the beginning of the output file. With this macro you can move quickly from any reference to its initial description.

All the macro functions described above are defined in the BRIEF macro file CXT.CB. These macros make extensive use of the various options of CFTN and CSTN, which are described in detail earlier.

SEARCHING INSIDE QEDIT (Version 2.1)

Like the BRIEF editor, the popular shareware editor QEDIT, with its macro programming capabilities, allows functions and data types to be searched from inside the editor.
The following QEDIT macro examples act, with slight limitations, like the BRIEF macros 'cft' and 'cst':

CFT function searching, assigned to F9:

   #f9 MacroBegin MarkWord Copy Dos 'cftn -b ' Paste '>tmp' Return Return EditFile 'tmp' Return AltWordSet MarkWord Copy DefaultWordSet EditFile Paste Return EditFile 'tmp' Return EndLine CursorLeft MarkWord Copy Quit NextFile GotoLine Paste Return

CST data type searching, assigned to F10:

   #f10 MacroBegin MarkWord Copy Dos 'cstn -b ' Paste '>tmp' Return Return EditFile 'tmp' Return AltWordSet MarkWord Copy DefaultWordSet EditFile Paste Return EditFile 'tmp' Return EndLine CursorLeft MarkWord Copy Quit NextFile GotoLine Paste Return

These QEDIT macro definitions can be placed in the 'qconfig.dat' configuration file and added to 'q.exe' with the 'qconfig.exe' configuration utility (for additional details about QEDIT macro programming, see the QEDIT documentation).

The two macros perform the following actions:
- mark the current word,
- execute the CFTN or CSTN database search for the marked word via DOS and redirect the output to the file 'tmp',
- read the target file name from 'tmp' and open the target file,
- read the line number from 'tmp' and go to the selected line.
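For readers who want to reproduce this lookup outside an editor, the parsing step of the two QEDIT macros can be sketched in C. This sketch is not part of CXT: the function name parse_location is hypothetical, and it assumes, as the macros' cursor movements imply, that the first line written by 'cftn -b' or 'cstn -b' begins with the target file name and ends with the line number, with words separated by blanks. Running the search itself (for example with system("cftn -b main >tmp")) and opening the file are left out.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of CXT): split one line of 'cftn -b' or
   'cstn -b' output into the target file name (first word) and the line
   number (last word), mirroring the QEDIT macros' MarkWord at the start
   and at the end of the line. Words are assumed to be separated by blanks.
   Returns 1 on success, 0 if either part is missing or too long. */
int parse_location(const char *line, char *file, size_t file_size, long *lineno)
{
    const char *p = line;
    const char *start;
    const char *end;
    size_t len;

    /* Skip leading blanks, then take the first word as the file name. */
    while (*p != '\0' && isspace((unsigned char)*p))
        p++;
    start = p;
    while (*p != '\0' && !isspace((unsigned char)*p))
        p++;
    len = (size_t)(p - start);
    if (len == 0 || len >= file_size)
        return 0;
    memcpy(file, start, len);
    file[len] = '\0';

    /* Trim trailing blanks and the newline (roughly EndLine + CursorLeft). */
    end = line + strlen(line);
    while (end > p && isspace((unsigned char)end[-1]))
        end--;

    /* Walk back over the digits of the last word; it must be all digits
       and separated from the rest of the line by a blank. */
    start = end;
    while (start > p && isdigit((unsigned char)start[-1]))
        start--;
    if (start == end || !isspace((unsigned char)start[-1]))
        return 0;

    *lineno = strtol(start, NULL, 10);
    return 1;
}
```

Called on a hypothetical line such as "SRC\MAIN.C 123", the function stores "SRC\MAIN.C" in file and 123 in *lineno and returns 1; a line without a trailing number yields 0, which is the kind of error check the QEDIT macros themselves cannot perform.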
These macros work much like those used from BRIEF, but their functionality is limited by the restricted capabilities of the QEDIT macro programming language:
- there is no check for a correct cursor location,
- the searched item must always be a single word like 'main' or 'size_t'; a combined pattern like 'struct iobuf' cannot be searched,
- there is no check whether the search succeeded or failed, or whether the database is up-to-date,
- if the target file is the same as the one from which the search started and other files are also open (QEDIT ring buffer), the wrong file will probably be accessed,
- the name of the database cannot be changed; searches are performed either with the default database or with the databases defined by the environment variables.

SEARCHING INSIDE MicroEMACS (Version 3.11, DOS & WINDOWS)

The latest editor supported with macros for database access is MicroEMACS 3.11. The macro file is named CXT_ME.CMD and should be placed in the MicroEMACS directory. It works with both the DOS and the WINDOWS version of MicroEMACS 3.11. The following macros are available:
- cft         function search for tagged item
- cst         data type search for tagged item
- cftmark     function search for marked item
- cstmark     data type search for marked item
- cftfind     function search for user defined item
- cstfind     data type search for user defined item
- cftfile     list of all CFT files
- cstfile     list of all CST files
- cftbase     set CFT database name
- cstbase     set CST database name
- cxtbase     set both CFT and CST database names

They can be used by loading the macro file CXT_ME.CMD with

   ESC CTRL+S CXT_ME.CMD

and running the desired macro with

   ESC CTRL+E

If the macros are used with the MicroEMACS WINDOWS version, you may have to change the DOSEXEC.PIF file, which is part of the MicroEMACS 3.11 distribution package.
During CXT macro execution, the shell command may stop after it has finished and wait for a keystroke before continuing. To avoid this interruption, edit the PIF file and select "Close window after execution". The environment variables CFTNBASE, CSTNBASE and CXTNBASE are used in the same way as in the BRIEF version. No key assignments are made for the macro procedure names; if you prefer hot-keys, you are free to define them yourself. In the MicroEMACS WINDOWS version, however, the user-accessible macros can be integrated into the "Miscellaneous" pull-down menu (thanks to the incredible macro programming capabilities of MicroEMACS!). To view the generated output file with its semigraphic frames, change the font and select, for example, the 'TERMINAL' font from the OEM font list, which supports semigraphic characters.

13 TROUBLE SHOOTING

This section describes problems that may occur while using CFT, CFT386, CST, CST386, CFTN and CSTN, together with their possible causes. Users are strongly recommended to read the complete documentation to get an overview of the features before they start using CFT and CST and run into unexpected trouble. See also the chapter 'PROGRAMMING LIMITATIONS'.

THE CFT AND CST PROGRAMS CANNOT BE EXECUTED
The program path is not specified in the environment variable PATH, the programs are not yet installed in the specified directory, or an attempt was made to start the 386 protected mode versions on a non-386 computer.

EXECUTION STOPPED WITH MESSAGE "OUT OF MEMORY"
An attempt to allocate memory has failed. Try to remove unnecessary memory-resident TSR programs, or use the protected mode versions CFT386 and CST386 if you have a 386/486. If this message appears with the protected mode versions, there is not enough free disk space for the swap file. If possible, set the temporary directory, defined by the 'TMP' or 'TEMP' environment variable, to another drive.
WRITING THE CFT AND CST OUTPUT FILE TAKES A LONG TIME
A large amount of information must be handled, option -x or -r is not set so that the output tree chart is very large, or the CPU and/or hard disk is slow. Use option -v to redirect intermediate files to a faster RAM-disk (if one is present).

THE RESULTING OUTPUT IS DEEPLY NESTED AND EXCEEDS THE SCREEN SIZE
There are two possible reasons: the -r or -x option is not specified (use it if not already done), or the source code/data types are indeed deeply nested.

THE BRIEF MACROS CANNOT BE EXECUTED
The macro file is not loaded, or other macros with the same names or key assignments already exist.

THE BRIEF OR MICROEMACS MACROS CANNOT BE LOADED
If the macro files are not in the editor's default directory, their path must be specified when loading the macros.

THE BRIEF MACROS DO NOT FIND ANY FUNCTIONS OR DATA TYPES
CFTN and CSTN cannot be accessed because of an incorrect path specification, no database is present, the path to the database files is incorrect, or the database name is incorrect.

THE BYTE OFFSET CALCULATION FILE "CST_OFFS.C" CANNOT BE COMPILED
There are several possible reasons: necessary data types or include files are not specified, or CST processing was done with include files other than those used for compiling. If the amount of data type information is too large, some compilers cannot compile the large number of statements that CST generates in a single file ('out of heap space', 'code segment too large' or similar messages). In that case you may have to split the file into several smaller files or reduce the number of data types to display.

LOCATING ITEMS IN THE BRIEF EDITOR POINTS TO WRONG PLACES
Searching for items from within the BRIEF editor leads to wrong lines, the requested item is not present there, or the file seems to be corrupted. This can have several reasons: the file is not up-to-date and has been changed since the database was generated, so the line references are no longer valid.
Another reason can be that the source file contains explicit #line directives, as is usual for files produced by source code generators like YACC/BISON or LEX/FLEX. A third reason may be that the source file was created on a UNIX system and therefore uses only LF instead of CR+LF as the end-of-line delimiter, so that BRIEF cannot display the file correctly and the file appears to be written on a single line.

UNEXPECTED RESULTS WHILE RUNNING CFT AND CST UNDER WINDOWS 3.1
The 386 versions of CFT and CST cannot run under Windows 3.1: they use the CPU exclusively and therefore cannot co-exist with Windows. Only the real mode versions can. In Windows enhanced mode (virtual 386 mode), CFT/CST cannot run simultaneously in several independent DOS windows if they work in the same directory or use the same temporary directory, because the temporary intermediate files may have the same names and conflict through multiple accesses to the same file. This may also happen if the same files are scanned.

MICROEMACS FOR WINDOWS SEEMS TO HANG DURING DATABASE ACCESS
The reason is simple: the shell call to DOS through DOSEXEC.PIF waits for a keystroke before it continues execution and returns to WINDOWS. You can change this behaviour by editing the DOSEXEC.PIF file (see the MicroEMACS section for further information).

14 REFERENCES

Brian W. Kernighan, Dennis M. Ritchie: "The C Programming Language", Prentice Hall, Englewood Cliffs, Second Edition 1988

Samuel P. Harbison, Guy L. Steele Jr.: "C: A Reference Manual", Prentice Hall, Englewood Cliffs, Third Edition 1991

Bjarne Stroustrup: "The C++ Programming Language", Addison-Wesley, Second Edition 1992

Margaret A. Ellis, Bjarne Stroustrup: "The Annotated C++ Reference Manual" (ARM), Addison-Wesley, Second Edition 1991

"Working Paper for Draft Proposed International Standard for Information Systems - Programming Language C++", AT&T, ANSI committee X3J16, ISO working group WG21, January 28, 1993

Bjarne Stroustrup, Keith Gorlen, Phil Brown, Dennis Mancl, Andrew Koenig: "UNIX System V - AT&T C++ Language System, Release 2.1 - Selected Readings", AT&T, 1989

A. Goldberg: "Programmer as Reader", IEEE Software, September 1987

L.W. Cannon, R.A. Elliot, L.W. Kirchhoff, J.H. Miller, J.M. Milner, R.W. Mitze, E.P. Schan, N.O. Whittington, H. Spencer, D. Keppel, M. Brader: "Recommended C Style and Coding Standards", Technical Report, in the Public Domain, Revision 6.0, July 1991 (revised and updated version of the 'AT&T Indian Hill style guide'; can be obtained via anonymous FTP from cs.washington.edu in '~ftp/pub/cstyle.tar.Z')

A. Dolenc, A. Lemmke, D. Keppel, G.V. Reilly: "Notes on Writing Portable Programs in C", Technical Report, in the Public Domain, Revision 8, November 1990 (can be obtained via anonymous FTP from cs.washington.edu in '~ftp/pub/cport.tar.Z')

M. Henricson, E. Nyquist: "Programming in C++, Rules and Recommendations", Technical Report, in the Public Domain, Ellemtel Telecommunication Systems Laboratories, Alvsjo/Sweden, Document No. M 90 0118 Uen, Rev. C (can be obtained via anonymous FTP from various sites as 'rules.ps.Z' or 'c++rules.ps.Z')

Compiler reference manuals and related documentation (language references, language implementations and extensions):
- Microsoft C 5.1
- Microsoft C 6.0
- Microsoft C/C++ 7.0
- Microsoft C for SCO UNIX System V Rel. 3.2
- Microsoft Macro Assembler MASM 5.1
- Borland Turbo C++ 1.0
- Borland C++ 2.0
- Borland C++ 3.1
- Borland Turbo Assembler TASM 2.0
- Intel 80960 C-Compiler (ic960, ec960)
- Intel 80960 Assembler (asm960)
- Intel 80860 Metaware High C i860 APX (UNIX-hosted)
- GNU-960 Tools (UNIX-hosted)
- GNU-C Compiler 2.2.2 (C, C++, Objective-C)
- GNU Assembler
- AT&T C++ 2.1 CFRONT (C++ to C translator) for SCO UNIX System V Rel. 3.2
- IBM C-Compilers (CC, XLC) for IBM RS 6000 RISC stations, AIX 3.15
- HP C-Compilers (CC, C89) for HP Apollo 9000 RISC stations, HP-UX 9.0
- VAX C

15 TRADEMARKS

All brand or product names are trademarks (TM) or registered trademarks (R) of their respective owners.

The following products and names are Copyright (C) Juergen Mueller (J.M.), Federal Republic of Germany (GER), all rights reserved world-wide:

CXT (TM) C EXPLORATION TOOLS
CFT (TM) C FUNCTION TREE GENERATOR
CFT386 (TM) C FUNCTION TREE GENERATOR
CST (TM) C STRUCTURE TREE GENERATOR
CST386 (TM) C STRUCTURE TREE GENERATOR
CFTN (TM) C FUNCTION TREE NAVIGATOR
CSTN (TM) C STRUCTURE TREE NAVIGATOR

The name 'CXT' is used as the collective term for all the C tools described above (and maybe some more in the future ...). The CXT tools themselves are part of the PXT (TM) PROGRAM EXPLORATION TOOLS, which provide a similar set of functions for the source code analysis of other programming languages.

APPENDIX 1: PRECOMPILER DEFINES

The following list shows the precompiler defines for the supported compiler types. It contains the default defines and the optional memory model and architecture defines. Other default compiler defines that some of the compilers usually declare are not automatically defined by the -T option.
These include defines related to compilation, like WINDOWS, __WINDOWS__, _Windows, DLL or __DLL__, defines for optimization, like __OPTIMIZE__ or __FASTCALL__, and defines for target (operating) systems, like NT, MIPS, UNIX, unix, __unix__, i386, __i386__, GNUDOS, BSD, VMS, USG, DGUX or hpux. Other macros that are sometimes predefined are __STRICT_ANSI__, __CHAR_UNSIGNED__ and __TIMESTAMP__. If necessary, they can be defined by the user on the command line with the -D option. The macro name __cplusplus is defined if the command line option '-C++' is set to enable C++ processing.

1. MSC51 (Microsoft C 5.1):
   Default defines:      MSDOS, M_I86
   C++ specific defines: (none)
   Memory model defines: M_I86SM, M_I86MM, M_I86CM, M_I86LM, M_I86HM

2. MSC70 (Microsoft C/C++ 7.0):
   Default defines:      MSDOS, M_I86, _MSC_VER (=700)
   C++ specific defines: (none)
   Memory model defines: M_I86TM, M_I86SM, M_I86MM, M_I86CM, M_I86LM, M_I86HM

3. TC10 (Borland Turbo C++ 1.0):
   Default defines:      __MSDOS__, __TURBOC__
   C++ specific defines: __TCPLUSPLUS__
   Memory model defines: __TINY__, __SMALL__, __MEDIUM__, __COMPACT__, __LARGE__, __HUGE__

4. BC20 (Borland C++ 2.0):
   Default defines:      __MSDOS__, __BORLANDC__ (=0x0200), __TURBOC__ (=0x0297)
   C++ specific defines: __BCPLUSPLUS__ (=0x0200), __TCPLUSPLUS__ (=0x0200)
   Memory model defines: __TINY__, __SMALL__, __MEDIUM__, __COMPACT__, __LARGE__, __HUGE__

5. BC31 (Borland C++ 3.1):
   Default defines:      __MSDOS__, __BORLANDC__ (=0x0410), __TURBOC__ (=0x0410)
   C++ specific defines: __BCPLUSPLUS__ (=0x0310), __TCPLUSPLUS__ (=0x0310)
   Memory model defines: __TINY__, __SMALL__, __MEDIUM__, __COMPACT__, __LARGE__, __HUGE__

6. GNU (GNU C 2.2.2):
   Default defines:      __GNUC__ (=2)
   C++ specific defines: __GNUG__ (=2)
   Memory model defines: (not necessary)

7. I960 (Intel iC960 3.0):
   Default defines:      __i960
   C++ specific defines: (none)
   Memory model defines: (not necessary)
   Architecture defines: __i960KA, __i960KB, __i960SA, __i960SB, __i960MC, __i960CA

APPENDIX 2: RESERVED KEYWORDS

The following list shows the keywords recognized by CFT and CST: the standard C keywords, the C++ keywords, and the non-standard keywords that are compiler-dependent extensions to the C or C++ language. Standard C keywords are always also C++ keywords. The C++ keywords are recognized only if option '-C++' is set; otherwise they are treated as identifiers. This list may be incomplete or inaccurate, because new releases of the supported compilers may introduce new extensions and the language standard itself may be extended. C++, for which no 'real' language standard exists yet (apart from the AT&T CFRONT implementation), differs among implementations, especially for the newly introduced exception and template concepts (try, catch, throw, template). Undocumented but (obviously) present keywords, especially in GNU C (e.g. __alignof, __classof, ...) or in Microsoft C/C++ 7.0, are ignored (even if they are listed here).
KEYWORDS Standard compiler-specific extension C C++ MSC TC/BC GNU C 7.0 3.0 2.2.2 asm x auto x break x case x catch x x cdecl x x char x class x classof x const x continue x default x delete x do x double x dynamic x else x enum x except x exception x extern x far x x float x for x fortran x x friend x goto x huge x x if x inline x int x interrupt x x long x near x x new x operator x overload x x pascal x x private x protected x public x register x return x short x signed x sizeof x static x struct x switch x template x this x throw x try x x typedef x typeof x union x unsigned x virtual x void x volatile x while x __alignof x __alignof__ x __asm x x __asm__ x __attribute x __attribute__ x __based x __cdecl x __classof x __classof__ x __const x x __const__ x __emit x __except x __export x __extension__ x __far x __fastcall x __finally x __fortran x __headof x __headof__ x __huge x __inline x __inline__ x __interrupt x __label__ x __loadds x __near x __saveregs x __segment x __segname x __self x __signed x __signed__ x __stdcall x __syscall x __try x __typeof x __typeof__ x __volatile x __volatile__ x _asm x _based x _cdecl x _emit x _export x x _far x _fastcall x _fortran x _huge x _interrupt x _loadds x x _near x _pascal x _saveregs x x _seg x _segment x _segname x _self x

APPENDIX 3: EFFICIENCY

To give some figures on the speed and efficiency of the programs, tests were performed with CFT386 running on a 33 MHz 80486 with 8 MB RAM, 256 KB cache and a 15 ms hard disk (no disk cache or RAM-disk installed). The source code for the first test was the C++ part of the GNU-C compiler (version 2.2.2), which is the largest of the three compiler parts (C, C++, Objective-C).
The following results were obtained:
- 139 files (71 source files and 68 include files) were scanned
- a total of 2330 functions were found, 2248 of which were defined in the 71 source files
- the directed call graph would have 2314 nodes and 10301 connections
- the critical function call path has a maximum nesting level of 115
- the total size of the 139 files is 6.54 MB with 209100 lines (about 35 bytes/line), the source code/filesize ratio is 0.739, and the average function size is 1951 bytes or 63 lines
- the effective size of the preprocessed and scanned source code (source files and their included files) is 20.71 MB with 591000 lines
- the resulting output file (options -m -rauspP -TGNU -cs -Cs -n) has about 3.94 MB and 36100 lines
- the resulting 6 database files have a size of 727 KB (source code/database ratio is about 9 : 1)
- inside BRIEF, a database search for the location of a function is performed in less than 4 seconds
- the complete processing took 36'40'' (minutes'seconds''), with 32'20'' for analysis, 2'50'' for output file writing and 1'30'' for database writing
- the average analysis speed for this source code was about 641 KB/min or 18300 lines/min.

The CFT386 results for a large commercial project are:
- 190 files (132 source files (C and assembler) and 58 include files) were scanned
- a total of 1223 functions were found, 1177 of which were defined in the 132 source files and in 3 include files (some include files contain inline functions)
- the directed call graph would have 1223 nodes and 2366 connections
- the total size of the 190 files is 6.22 MB with 145550 lines (about 42 bytes/line), the source code/filesize ratio is 0.533, and the average function size is 1805 bytes or 66 lines
- the effective size of the preprocessed and scanned source code (source files and their included files) is 48.42 MB with 959100 lines
- the resulting output file (options -m -rauspP -cs -Cs -na) has about 907 KB and 24700 lines
- the resulting 6 database files have a size of 306 KB (source code/database ratio is about 20 : 1)
- the complete processing took 35'25'', with 34'15'' for analysis, 0'45'' for output file writing and 0'25'' for database writing
- the average analysis speed for this source code was about 1.41 MB/min or 28000 lines/min.

To obtain some efficiency figures for CST386, the include files from another commercial project were analysed for data types:
- 54 include files were scanned
- a total of 589 data types were found, 568 of which were structures/unions defined in 43 of the 54 include files
- the directed call graph would have 589 nodes and 1787 connections
- the total size of the 54 files is 1.389 MB with 25600 lines (about 54 bytes/line), and the source code/filesize ratio is 0.343
- the resulting output file (options -rasp -cs -Cs -n) has about 299 KB and 8580 lines
- the resulting 6 database files have a size of 326 KB (source code/database ratio is about 4.3 : 1)
- the complete processing took 1'22'', with 0'47'' for analysis, 0'20'' for output file writing and 0'15'' for database writing
- the average analysis speed for this source code was about 1.77 MB/min or 32600 lines/min (note: no preprocessing performed!).

The calculated average analysis speeds differ because the amount of actual source code relative to the comments differs between the projects, as the source code/filesize ratio shows.
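The speed figures above are the effectively scanned size divided by the analysis time alone (not the total processing time). The following C sketch, with hypothetical function names, reproduces that arithmetic. Note that the quoted 641 KB/min for the GNU C++ test is matched when 20.71 MB is taken as 20710 KB, that is, with 1 MB counted as 1000 KB.

```c
/* Convert a time given as minutes'seconds'' into minutes. */
double to_minutes(int minutes, int seconds)
{
    return minutes + seconds / 60.0;
}

/* Average analysis speed: amount scanned (KB, MB or lines) divided by
   the analysis time given as minutes and seconds. */
double speed_per_minute(double amount, int minutes, int seconds)
{
    return amount / to_minutes(minutes, seconds);
}
```

For example, speed_per_minute(20710.0, 32, 20) gives about 640.5 KB/min (quoted as 641), speed_per_minute(591000.0, 32, 20) gives about 18278 lines/min (quoted as 18300), and speed_per_minute(48.42, 34, 15) gives about 1.41 MB/min for the commercial project.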
The speed values do not take into account that, if the preprocessing option -P is set, the source code is first preprocessed into a temporary file and then analysed in a second step, so that large parts of the source code are read twice (original and preprocessed code) and written once (preprocessor output). With these facts in mind, the analysis speed of CFT and CST seems quite acceptable!

APPENDIX 4: SYSTEM REQUIREMENTS

Real mode versions CFT, CST, CFTN, CSTN:
- IBM-AT or 100% compatible with Intel 80286 or higher, 512 KB RAM, hard disk, DOS 3.3 or higher

Protected mode versions CFT386 and CST386:
- IBM-AT or 100% compatible with Intel 80386+80387 or higher, 2 MB RAM, hard disk, DOS 3.3 or higher

APPENDIX 5: INSTALLATION

To install this software, copy all files into one directory, for example your own 'utility' directory or any other directory such as 'C:\CXT', and add this directory to your system path. This is necessary so that you can invoke the programs directly from other directories. Note that the BRIEF macros in particular need this path to access CFTN and CSTN. You should also copy the BRIEF macro files CXT.CM and CXTKEYS.CM to the BRIEF macro directory, so that you can use the autoload function of CXTKEYS.CM and do not have to specify the complete path when you load the macros. The same applies to the MicroEMACS macro file CXT_ME.CMD.

(THIS DOCUMENT HAS 63 PAGES)