CONVERT.DOC 1 Revised: 01-27-97 The CONVERT.EXE program converts data between several basic data formats. Features: * You can read data in from any of the following data formats: ASCII-delimited fixed field dBase-compatible * You can write data out in any of the following data formats: ASCII-delimited fixed field dBase-compatible WKS (Lotus 1-2-3 release 1 compatible) * For ASCII-delimited files, you can specify the delimiters used between fields as well as around numeric or character data. * For dBase input files, you can retain deleted records if you want. * You can resize variables or drop them entirely if desired. * You can add fields as desired and assign initial values to them. * You can specify up to 10 include filters; all records processed must meet at least one of these filter conditions. * You can specify up to 10 exclude filters; any records which meet these filter conditions are dropped. * For WKS output files, you can process input files bigger than Lotus 1-2-3 itself can handle (1-2-3 limits input records to being 240 characters or less). * Cell filters can be created, ignoring, say, any record where the value in cell 4 is less than 10. * The program can only handle files with 255 fields or fewer. (Those obsessed with handling more fields can ask for a version that does so.) * Handles DOS text files (lines end with CR/LF), Mac text files (lines end with CR), or Unix text files (lines end with LF). CONVERT.DOC 2 Revised: 01-27-97 Data format types: An ASCII-delimited file is one which typically has double quotes around each character field (the quotes are optional in CONVERT) and typically commas between fields. Leading and trailing spaces are removed from character as well as all other values. A fixed-field file places each field in the same column positions on each record. The lengths of the fields are the same from record to record. As an example, these might be an ASCII-delimited records: "Economic Bulletin Board","202 482-3870",35 "EBB High-Speed","202 482-2584",100 Fixed-field file records might look like this for the same data: Economic Bulletin Board 202 482-3870 35 EBB High-Speed 202 482-2584 100 (On files generated on a PC, most fixed-field file records end with a CR/LF combination. Although these two characters actually add two characters to each line, most people and programming languages ignore them. Files generated elsewhere may not have these line terminators. Use the /BINARY option if this is the case.) WKS files are supported directly by Lotus 1-2-3 (all versions) as well as most other spreadsheet programs. DBF files are supported by dBaseIII, dBaseIII+, dBaseIV, and most other data base management programs (Paradox etc). CONVERT.DOC 3 Revised: 01-27-97 Field-definition file: Unless you are reading a dBase file, this program requires a field-definition file to figure out the characteristics for each field and also to set certain file characteristics. If you're processing an ASCII-delimited input file, the routine can try to create a field-definition file for you if desired. The field-definition file can be created with any text editor. The field-definition file consists of several records with the following fields separated by spaces. Except for the record type indicator (which must begin in column 1), all other fields can be placed in any columns: (1) record type (see below) (2) length of field on input (3) number of decimal places for numeric data (if you don't know, put a "?" here; for non-numeric data, a "0" is fine) on output (4) length of field on output Any characters after the field length are treated as comment fields. You would typically use this to enter the field name or column position or any other information of use to you. If you'd like, you can leave out both decimal place count *and* the length of the output field. (You cannot leave off one and include the other however.) If they are left out, the number of decimal places is presumed to be "0" and the length of the output field is presumed to be the same as the length of the input field. The data record types accepted by this routine are as follows: type C = character data (leading spaces are trimmed) V = verbatim character data (no leading spaces are trimmed) N = numeric L = logical (T or F) D = date (in yyyymmdd format) M = memo fields (only for dBase input files; ignored on output) The data fields should be in the order the fields are found in the source file. Note that for fixed field files, you have to account for every byte in the file. If you have something like this: 12345678_1_2345678_2_2345678_3 (column positions) APPLE X Y 12 BANANAS Even though you may think you only have five fields, the following .DEF file will NOT work: ; Bad .DEF file: Note does not account for blank spaces ; NOTE: Input and output lengths are the same (mistakenly) so left ; out number of decimal places and output length for each record. C 8 Fruit1 C 1 Class1 C 1 Class2 N 5 Value C 11 Fruit2 CONVERT.DOC 4 Revised: 01-27-97 You may want the Fruit1 field to be in columns 1 through 8 and Class1 to be in column 10 but the routine will not know to skip column 9 so it will start reading Class1 beginning in column 9, Class2 beginning in column 10, etc. To drop the blank positions, you have to add dummy fields on input and ask for them to be dropped on output: ; Good .DEF file: Spaces between fields are accounted for C 8 Fruit1 C 1 0 0 Filler C 1 Class1 C 1 0 0 Filler C 1 Class2 C 1 0 0 Filler N 5 Value C 1 0 0 Filler C 11 Fruit2 You can also use the input field length and output field lengths to either drop fields using other formats (by specifying a zero length for the output field length) or for creating fields on output (by specifying a zero length for the input field length). You can also use this to expand on contract a field. For example, if Fruit1 is 8 characters long but you only want it to occupy 4 characters on output (thus the field would be truncated), specify 8 for the input field length and 4 for the output field length: C 8 0 4 Fruit If the output field length is wider than the input field length, the data values will be shifted right or left depending on the data type. In general, numeric fields are shifted right (so extra spaces show up in front of the number) and all other field types (character, logical, or date) are shifted left. You can have the routine create the field-definition file it's using for dBase and ASCII-delimited files. This is controlled by the /OUTDEF=deffile and /-OUTDEF parameters. ASCII-delimited file? Letting the program guess the fields: Let's say you have some sort of ASCII-delimited file. Do you have to create a field-definition file? Nope. The program will try to guess the format for you if you want. For example, let's say you have a file called MYFILE.PRN and the first records look like this: ACIN,970114, 1.49 , 1.5 , 1.49 , 1.495 , 2312512 , 0 AGRO,970114, 1.3 , 1.4 , 1.3 , 1.4 , 68319 , 0 ALPA,970114, .875 , .89 , .875 , .885 , 1805075 , 0 CONVERT.DOC 5 Revised: 01-27-97 Well, looking at it, you could probably guess that it has 8 fields per record, the first of which is character and the others are probably numeric. So you could create a field-definition file and go with that. But let's see how the program does on its own. Looking at the fields, there are apparently no special characters (like quotation marks) around the character fields. There are also no special characters around the numeric fields. There are, however, commas between the fields. A standard ASCII-delimited file would have quotation marks around the alphabetic fields. Your's does not have this. As a result, you will have to specify a /DELIMS= parameter. The format for this is: /DELIMS=aroundstrings,aroundnums,betweenfields In your case, there is nothing around the alphabetic fields so "aroundstrings" is null. There is nothing around numeric fields so "aroundnums" is null. There are commas between fields so "betweenfields" is a comma. Thus, your /DELIMS= parameter is: /DELIMS=,,, So convert this file to a fixed-field file without having already created a field-definition file, you would issue the following command: CONVERT MYFILE.PRN /FROM ASCII /TO FIXED /DELIMS=,,, /-INDEF /OUTDEF When the program runs, it will, in this case, create a fixed-field file called MYFILE.FIX as well a field-definition file called MYFILE.DEF. That file would might like this: ; Field definition file: C:\HOME\PRECIOS.DEF ; Created by CONVERT for FROM ASCII/FROM FIXED ; Created at 13:04:30 on 01-27-1997 ; Positions in far left are positions on output N 10 4 10 COL_001 In( 1: 10) Out( 1: 10) N 6 0 6 COL_002 In( 11: 16) Out( 11: 16) N 7 4 7 COL_003 In( 17: 23) Out( 17: 23) N 8 4 8 COL_004 In( 24: 31) Out( 24: 31) N 7 4 7 COL_005 In( 32: 38) Out( 32: 38) N 8 4 8 COL_006 In( 39: 46) Out( 39: 46) N 11 9 11 COL_007 In( 47: 57) Out( 47: 57) N 1 0 1 COL_008 In( 58: 58) Out( 58: 58) You can modified this field-definition file if you'd like. Maybe you want to change the number of decimal places shown or expand some columns on output or perhaps eliminate others. Just edit the field-definition file with a text editor and resave it. Then rerun the CONVERT command, this time specifying that there *is* a field-definition file: CONVERT MYFILE.PRN /FROM ASCII /TO FIXED /DELIMS=,,, CONVERT.DOC 6 Revised: 01-27-97 Filters: CONVERT supports two types of filters; record filters and cell (field) filters. Record filters: In general, record filters apply to the record as a whole. They are specified as unique types of records in the field-definition file. You may specify up to 10 include record filters and up to 10 exclude record filters in the field-definition file. If an include record filter is specified, the input record must contain at least one of the specified character strings in order to be processed. If an exclude record filter is specified, any input record which contains any of the specified character strings will be ignored. Record filters are case sensitive (capitalization matters) and processed as "or" items (if any filter is met, the condition is met; record filters are not combined to determine fulfillment). Record filters are specified in the field-definition file in the following ways: /+=string include filter, the string "string" can appear anywhere /S+=string include filter, the string "string" is at the beginning of the record /+S=string include filter, the string "string" is at the end of the record /-=string exclude filter, the string "string" can appear anywhere /S-=string exclude filter, the string "string" is at the beginning of the record /-S=string exclude filter, the string "string" is at the end of the record For example, if you want to specify in your control file that you only want records that contain either "Japan" or "France" *and* you want to exclude any records that begin with an underscore character, you would need to include the following three filter statements: /+=Japan /+=France /S-=_ The character string can include hexadecimal codes (in the &Hxx format) or decimal codes (in the \ddd format) if necessary. See BRUCEHEX.DOC file. CONVERT.DOC 7 Revised: 01-27-97 Cell filters: Cell filters are applied to individual cells (or fields) in the data. If any cell fails the test for that cell, the entire record is skipped for additional processing. Each variable in the field-definition file can include a cell filter. Filters are specified on the field type records as the last parameters on the record and are immediately preceded with "||" indicators: N 8 Weight || > 100 N 4 Height || <= 6 C 10 Name || = Banana Cell filters can be specified as any one of six relations: "=" is equal to "<" is less than ">" is greater than "<=" is less than or equal to ">=" is greater than or equal to "<>" is not equal to You cannot specify ranges. The item to the right of the relation is treated as a string if the field is non-numeric, otherwise, it's treated as a number. Do not include quotes around the strings unless you want that as part of the condition. The value can include hexadecimal codes (in the &Hxx format) or decimal codes (in the \ddd format) if necessary (see BRUCEHEX.DOC file). It can also contain spaces. The cell filter is case sensitive (capitalization matters). If the field is being created, you can specify an assignment cell filter (using "=") which will set the value of that cell as something. For example: C 10 First Name C 0 0 1 Middle Initial || = ? C 10 0 20 Last Name Field-definition file for SimTel archives: People who use the SimTel archives (ftp.coast.net or http://www.coast.net/SimTel) may find the enclosed SIMIBM.DEF file useful. It provides the field definitions for the standard SIMIBM.IDX file (from subdirectory SimTel/filedocs, download simindex.zip). The file includes some hints for dropping fields that don't seem to be useful as well as restricting the listing to just those files that have been added since a given date. Specifying parameters: Parameters for this program can be set in the following ways. The last setting encountered always wins: - Read from an *.INI file (see BRUCEINI.DOC file), - Through the use of an environmental variable (SET CONVERT=whatever), or - From the command line (see "Syntax" below) CONVERT.DOC 8 Revised: 01-27-97 Syntax: CONVERT infile [ outfile [ deffile ] ] [ /INDEF=deffile | /-INDEF ] [ /OUTDEF=deffile | /OUTDEF | /-OUTDEF ] [ /OVERWRITE | /-OVERWRITE | /APPEND | /OVERASK ] [ /FROM FIXED | /FROM ASCII | /FROM DBF ] [ /DELETED | /-DELETED ] [ /TO FIXED | /TO ASCII | /TO WKS | /TO DBF ] [ /HEADER | /-HEADER ] [ /DELIMS=aroundstrings,aroundnums,betweenfields ] [ /BEEP ] [ /INMISS=val ] [ /INMISSC=val ] [ /OUTMISS=val ] [ /OUTMISSC=val ] [ /SKIP | /MISSING | /ABORT ] [ /FIRSTOBS=n ] [ /LASTOBS=n ] [ /-VER ] [ /-NULLS ] [ /BINARY ] [ /SCAN=n ] [ /GAP=n ] [ /-CFILTERS ] [ /MONO ] [ /Iinitfile | /-I ] [ /Q | /Qn ] [ /? ] [ /?&H ] "infile" is the file specification for the ASCII-delimited or fixed-field file you want converted. You can specify a drive and path specification if necessary. This parameter is required. "outfile" is the file specification of the file you want to create. You can specify a drive and path specification if necessary. If no outfile is provided, the routine will presume you want the output file called the same thing as the infile but you want the extension to be ".FIX" (if the output file is a fixed-field file), ".PRN" (if the output file is ASCII-delimited), or ".WKS" (if the output file is to be in a WKS format). "deffile" is the file specification for the input field-definition file. You can specify a drive and path specification if necessary. If no deffile is specifically provided, the routine will presume it is called the same thing as the infile but it has the extension of ".DEF". Note that the deffile can only be provided this way if you also specify the outfile name; the use of /INDEF=deffile is recommended instead. "/INDEF=deffile" provides the name of the field-definition file to be read by the program. If no deffile is specifically provided and you're using FROM ASCII, the routine will presume the field-definition file exists and is named the same thing as the infile but it has the extension of ".DEF". A deffile has to be specifically provided for FROM FIXED files if one is desired. "/-INDEF" says there is no field-definition file. This is the default if you're using FROM DBF or FROM FIXED. "/OUTDEF=deffile" provides the name of the output file that you want the program to write the field-definition file to. This is useful in cases where you did *not* use a field-definition file on input since it allows you to see and possibly modify the field-definition file for next time. "/OUTDEF" says to create the field-definition file and automatically name it for you. The file name is the same as the infile with an extension of ".DEF". "/-OUTDEF" says to skip the creation of the field-definition file. This is the default for ASCII-delimited and fixed-field files. CONVERT.DOC 9 Revised: 01-27-97 "/OVERWRITE" says to overwrite the output file if it exists already. "/-OVERWRITE" says to abort if the output file exists already. "/APPEND" says to append (add) to the output file if it exists already. This option is only available if you're creating either a fixed-field or ASCII- delimited output file. "/OVERASK" says to ask if the output file exists already. This is initially the default. "/FROM FIXED", "/FROM ASCII", and "/FROM DBF" specifies the format for the input file. The routine usually reads the input file and guesses its format for you. If the routine guesses incorrectly, use the /-VER parameter (below) to overrule it. "/DELETED" applies to dBase input files only. It says you want to retain records tagged as "deleted". Otherwise, they're dropped in the output file. "/-DELETED" applies to dBase input files only. It says to drop records tagged as "deleted". This is initially the default. "/TO FIXED", "/TO ASCII", "/TO WKS", and "/TO DBF" tells the routine what sort of output file you'd like to create. The input and output file formats can be the same if desired but you have to explicity specify an output file name in that case. Initially defaults to "/TO FIX". "/HEADER" is used in conjunction with WKS output files. If /HEADER is in effect, the first row of the spreadsheet will contain the variable name for the cell as provided in your control file. If none are provided, the field names will be FIELD_01 onward. "/-HEADER" turns this off and is the default. Note that the header line (if any) will show up in the output counts. "/DELIMS=aroundstrings,aroundnums,betweenfields" allows you to specify the delimiters (in sequence) around string fields, around numeric fields (any fields that isn't a character field), and between fields. Defaults to: /DELIMS=",,, (Use quotes around character strings, nothing around numeric data, and the third comma indicates that there is a comma between fields.) The replacement string can include hexadecimal codes (in the &Hxx format) or decimal codes (in the \ddd format) if necessary (see BRUCEHEX.DOC file) so either of the following would put a tab between fields: /DELIMS=",,&H09 /DELIMS=",,\009 "/BEEP" beeps when the program is finished. "/-BEEP" reverses /BEEP. Initially the default. CONVERT.DOC 10 Revised: 01-27-97 "/INMISS=val" specifies that any numeric value that has the character string representation of "val" will be considered missing. Note that this is an exact character string comparison so /INMISS=1 will not compare to a value of "1.00". Defaults to /INMISS=NULL (which translates as spaces). "/INMISSC=val" specifies that any character string value that has the value of "val" will be considered missing. Defaults to /INMISSC=NULL (which translates as spaces). "/OUTMISS=val" specifies that any missing numeric input value will be translated on output as "val". For example, "/OUTMISS=N.A." would fill in "N.A." for each missing value. Defaults to /OUTMISS=NULL (which translates as spaces). "/OUTMISSC=val" specifies that any missing character input value will be translated on output as "val". Defaults to /OUTMISSC=NULL (which translates as spaces). "/SKIP" says to skip records with bad data values; otherwise the routine aborts when it runs into any. /SKIP, /MISSING, and /ABORT are mutually exclusive. "/MISSING" says to presume any missing fields in an ASCII-delimited record should be filled in with blanks (for character fields) and 0 for numeric fields. Incomplete records are written out (unlike in /SKIP). /SKIP, /MISSING, and /ABORT are mutually exclusive. Note that the program will only print out the first "bad" record. There may be others that show up after this one. "/ABORT" says to abort when you run into bad records. Initially the default. /SKIP, /MISSING, and /ABORT are mutually exclusive. "/FIRSTOBS=n" says to start reading the data beginning with record number n. Initially defaults to "/FIRSTOBS=1". "/LASTOBS=n" says to stop reading the data after record number n. Initially defaults to "/LASTOBS=2000000000" (2 billion). "/-VER" is used when the program verifies your input file and mistakenly determines that it is a file type other than what it is. This sometimes happens with fixed-field input files even when /FROM FIXED is specified. "/NULLS" allows fields that begin with decimal 0 to be left in the output file as valid values. Otherwise, they're treated as being missing. /NULLS is initially the default. "/-NULLS" translates any field which begins with the decimal 0 value as missing. For character fields, it's translated to the value of INMISSC, numeric fields are switched to the INMISS value. /NULLS is initially the default. CONVERT.DOC 11 Revised: 01-27-97 "/BINARY" says that you have a fixed-field input file and that the physical records in this file do not end with the normal CR/LF combination. This is fairly typical of files created on a mainframe or copied from a tape. If /BINARY is used, every byte must be precisely accounted for. /BINARY automatically invokes the /FROM FIXED option. "/-BINARY" says that every record ends with a CR/LF combination. This is fairly standard on files created on a PC, even fixed-field files. When /-BINARY is in effect, trailing fields are ignored. "/-BINARY" is initially the default. "/SCAN=n" says to read the first n-records when determining the input file characteristics. This is only relevant for /FROM ASCII files which use the /-INDEF option. The field types are based on the first record read but the maximum field widths are determined by reading the first n-records. Initially defaults to "/SCAN=100". "/GAP=n" specifies to add n-characters to each output field width. This is only relevant for /TO FIXED fields which use the /-INDEF option. In this case, n-characters are added to each output field length for non-numeric fields and extra blank fields are added before every numeric field. This is an easy way of splitting up columns on display. If you want non-uniform gaps, you can always process the file twice; specify /-INDEF with /OUTDEF the first time, modify the field-definition file by hand, and then reprocess the file with /INDEF=filename. Initially defaults to "/GAP=0". "/CFILTERS" says that if cell filters are specified in the field-definition file (for example, "N 3 Age || > 5"), these cell filters are to be respected. This is initially the default. (If cell filters do not appear in the field-definition file at all, the switch is ignored.) "/-CFILTERS" says that if cell filters are specified in the field-definition file, they are to be ignored. This is useful sometimes if you want the whole file instead of just getting a subset of it. Note that this switch will also turn off initializing new fields (as in "N 0 0 3 Age || = 12"). "/MONO" (or "/-COLOR") does not try to override screen colors. Initially defaults to "/COLOR". "/COLOR" (or "/-MONO") allows screen colors to be overridden. This is initially the default. "/Iinitfile" says to read an initialization file with the file name "initfile". The file specification *must* contain a period. Initfiles are described in the BRUCEINI.DOC file. Initially defaults to "/ICONVERT.INI". "/-I" (or "/INULL") says to skip loading the initialization file. "/Q" turns off the record-by-record status report. "/Qn" shows a status message every n-number of records. The default is "/Q25". "/?" or "/HELP" or "HELP" shows you the syntax for the command. "/?&H" gives you a hexadecimal and decimal conversion table. CONVERT.DOC 12 Revised: 01-27-97 Return codes: CONVERT returns the following ERRORLEVEL codes: 0 = no problems, file converted 250 = operation aborted by pressing Escape 251 = other problems 252 = problems with input data and /ABORT specified 253 = problems with INDEF or OUTDEF file 254 = could not find a decent temporary output subdirectory 255 = syntax problems, file(s) not found, output file already exists, /? requested Restrictions and Caveats: * The program skips all Memo fields in dBase files on input. * Most spreadsheet programs restrict a given field length to being 240 characters or less. * Date fields which are in the form "yy-mm-dd" (instead of "yyyymmdd") should be declared as character fields instead of date fields. * Date fields converted for WKS files are changed into strings of the "yy-mm-dd" format. * Some users have reported problems using this program when their default drive is a network drive instead of a local hard drive. If you get a "path/file access error", make C: your default drive and try again. Author: Bruce Guthrie Room H-4885 U.S. Dept of Commerce/ESA/STAT-USA Washington, DC 20230 voice: (202) 482-3234 e-mail: bguthrie@doc.gov You may freely copy and re-distribute this program; however, the U.S. Department of Commerce neither guarantees nor assures compatibility of the program with all computer software or hardware. Additional information about this and other Bruce Guthrie programs can be found in the file BRUCE.DOC which should be included in the original ZIP file. The recent change history for this and the other programs is provided in the HISTORY.ymm file which should be in the same ZIP file where "y" is replaced by the last digit of the year and "mm" is the two digit month of the release; HISTORY.611 came out in November 1996. This same naming convention is used in naming the ZIP file (CONVRymm.ZIP) that this program was included in. Please provide an Internet e-mail address on all correspondence.