7 Mar 1995 - Preliminary Information

SpHyDir Program Logic

Program Structure

Bugs can be fixed most easily when they are reported with enough information to reproduce, or at least localize, the problem. Suggestions will be most helpful if the user understands which changes are simple and which are difficult. For these reasons, it is a good idea to provide at least a high-level review of SpHyDir program logic.

SpHyDir is driven by the abilities and limitations of the OS/2 operating system, the Rexx language, and the VX-Rexx development environment. It would not have been possible to develop SpHyDir so quickly with anything like its current function in any other environment.

Rexx is an interpreted language. This means that execution works directly off a version of the source. Development is fast, and debugging is easy. However, some errors that in other languages would be detected by the compiler are, in Rexx, not found until execution time.

An important limit of Rexx is that programs in two separate source files cannot share global variables. A program like SpHyDir works only if it can be packaged as one massive source file, and in a normal editor such a large program becomes unmanageable. VX-Rexx addresses this problem by dividing the code into "sections." Each section has a name and appears to the programmer as if it were a separate file. In the end, VX-Rexx combines all the sections into a single file so that variables can be shared.

Although IBM is distributing a test version of Object-Oriented Rexx, SpHyDir is written in the traditional SAA OS/2 Rexx language. Object Rexx would probably have been useful, but many current Warp users will not have access to the Object Rexx package, and PCLT cannot redistribute it.

VX-Rexx creates its own objects to manage the Graphical User Interface. A Rexx program such as SpHyDir cannot define new types of objects, but a few general-purpose VX-Rexx objects are quite flexible. In particular, the Document is expanded in the Workarea as a sequence of Record objects within a Container. The Container and Record types are provided by VX-Rexx (more precisely, by OS/2 PM). Records have attributes (the icon displayed, the caption, and the parent and sibling order that forms the tree). VX-Rexx provides functions that SpHyDir can call to set or interrogate these attributes or to create new instances of Records. The rest of the VX-Rexx objects (windows, buttons, lists, and menus) are defined statically during development using the VX-Rexx tools.

The objects exist in memory managed by the VX-Rexx system, outside the scope of Rexx internal variables. Statically defined objects have a name which is set at design time and can be added to any program. Dynamic objects, such as Records, have an internally generated id (the "handle"). Handles can be passed as arguments or stored in global variables, but often that is unnecessary. For example, the currently selected record in the Workarea can be determined at any time by calling a VX-Rexx function (essentially asking the Workarea which record is selected). Much of the code interrogates the VX-Rexx objects for information rather than relying on global variables or other ugly interface options.

The Main Window contains the Toolbar, Workarea, and entry fields. It is loaded when SpHyDir starts and goes away when the program ends. The other windows (Link Manager, Table of Contents, Text Edit, and Hotword Selection) are loaded as needed. Because the Main Window is always loaded, the Workarea object and its Records are available to every part of the code.

In VX-Rexx, a secondary window is created during program design and is populated with lists, fields, buttons, and menus. The Rexx code that will be called when the user types data into or uses controls on a secondary window can be a separate file, but SpHyDir chooses to make it part of the one large common code routine.

However, the objects on a secondary window do not exist until that window is "loaded." SpHyDir loads secondary windows in response to actions in the Main Window. The Table of Contents and Link Manager are loaded from the "Window" menu, the Clipboard loads when the user puts something in it, and the Text Edit window loads when the user double-clicks on a Paragraph, Point, or other record containing text content. In all cases, the objects in the secondary window do not exist until it has been loaded.

Secondary windows are created by default as "children" of the main window. This means that when the secondary and main windows overlap, the secondary window is always "on top." A secondary window can also be created as a child of the desktop, which would allow the main window to be on top. SpHyDir doesn't use the latter option (it didn't seem to look right, but that is a value judgement).

The Workarea is a container permanently set to a Tree-Name view. The Toolbar is a Value Set whose contents are icons. Normally a value set is a kind of radio button (one value is selected) but that is not how the Toolbar is used. Rather, sometimes the icons are double-clicked and sometimes they are drag-dropped. The Value Set was chosen simply because it is a convenient way to arrange a bunch of icons.

VX-Rexx provides a function that returns the handle of the first record in a Container. In SpHyDir, this is the Document record. Other functions can find the first child of any parent record, or the next record among a collection of siblings. All the information about the document (text, links, structure, etc.) can be globally accessed by any function in the SpHyDir program by simply "walking" the tree of container records.
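The tree walk can be sketched in Python. SpHyDir itself is VX-Rexx code calling container functions on record handles; here a hypothetical `Record` class and the helpers `first_child` and `next_sibling` stand in for those VX-Rexx calls, so their names and signatures are assumptions, not the VX-Rexx API.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Record:
    """Hypothetical stand-in for a VX-Rexx container Record (handle)."""
    caption: str
    children: list = field(default_factory=list)
    # repr/compare disabled so parent <-> child links do not recurse forever
    parent: Optional["Record"] = field(default=None, repr=False, compare=False)

    def add(self, child: "Record") -> "Record":
        child.parent = self
        self.children.append(child)
        return child


def first_child(rec: Record) -> Optional[Record]:
    """Analogue of asking the container for a record's first child."""
    return rec.children[0] if rec.children else None


def next_sibling(rec: Record) -> Optional[Record]:
    """Analogue of asking the container for the next record at the same level."""
    if rec.parent is None:
        return None
    sibs = rec.parent.children
    # compare by identity: two records with equal captions are still different records
    i = next(k for k, r in enumerate(sibs) if r is rec)
    return sibs[i + 1] if i + 1 < len(sibs) else None


def walk(rec: Optional[Record], depth: int = 0) -> Iterator[tuple]:
    """Visit rec, then its descendants, then its later siblings (pre-order)."""
    while rec is not None:
        yield depth, rec.caption
        yield from walk(first_child(rec), depth + 1)
        rec = next_sibling(rec)


doc = Record("Document")            # plays the role of the first record in the container
sec = doc.add(Record("Section: Program Structure"))
sec.add(Record("Paragraph"))
doc.add(Record("Section: Input HTML Processing"))

for depth, caption in walk(doc):
    print("  " * depth + caption)
```

Only two navigation primitives are needed, which is why any routine can reach any part of the document without shared variables.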

Records have attributes and data fields.

Input HTML Processing

  • SpHyDir loads an HTML file when one is dropped onto the Workarea, or when one is passed as an argument when SpHyDir starts (usually by dropping an HTML file on a Workplace Shell program object configured to run SpHyDir). To process a new file, SpHyDir first clears all existing objects from the Workarea. Then the HTML file is processed in three stages.

    The first stage reads the file into memory and breaks it into sections of Text and Tags. Tags are the part between "<" and ">" characters. Text is outside the tags. The process creates a stem variable called _Token. Each entry in the stem contains either tag information or the text string.
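A minimal Python sketch of this first stage follows. A list of `(kind, body)` tuples stands in for the Rexx `_Token.` stem; the function name `tokenize` and the handling of an unterminated `<` are assumptions, not SpHyDir's actual code.

```python
def tokenize(html: str) -> list:
    """Split HTML into ("tag", body) and ("text", string) entries.

    A tag body is whatever lies between '<' and '>' (e.g. "A HREF=x.htm" or "/H1").
    """
    tokens = []
    pos = 0
    while pos < len(html):
        lt = html.find("<", pos)
        if lt == -1:                      # no more tags: the rest is text
            tokens.append(("text", html[pos:]))
            break
        if lt > pos:                      # text preceding the tag
            tokens.append(("text", html[pos:lt]))
        gt = html.find(">", lt + 1)
        if gt == -1:                      # assumption: an unterminated '<' is kept as text
            tokens.append(("text", html[lt:]))
            break
        tokens.append(("tag", html[lt + 1:gt]))
        pos = gt + 1
    return tokens


print(tokenize("<H1>Title</H1>Some text"))
```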

    When the _Token. table has been built, a second pass scans the tags for matching Start and End tags. SpHyDir has a vocabulary of tag names that pair off: PRE HTML HEAD BODY TITLE H1 H2 H3 H4 H5 H6 A B I U DL UL OL ADDRESS. Every time it encounters a starting tag in this list, it pushes an element on a stack. When it encounters the matching end tag, it pops the stack. Ordinary <P> tags are not on the list because the </P> tag is often missing. Normally, the tag that ends should be on the top of the stack, but a large number of real Web documents are sloppy. SpHyDir allows two adjacent tags to end in the wrong order, since "<H1><B> ... </H1> </B>" has been observed in many documents. Otherwise, if an ending tag doesn't seem valid given the current stack, SpHyDir stops and pops up the HTML edit window for manual correction.
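The pairing pass can be sketched as below, assuming tokens shaped like the `(kind, body)` entries of the first stage. It records the index of each start tag's matching end tag, tolerates one swap of two adjacent end tags, and raises an error at the point where SpHyDir would open the HTML edit window. `pair_tags` and `tag_name` are hypothetical names.

```python
PAIRED = {"PRE", "HTML", "HEAD", "BODY", "TITLE", "H1", "H2", "H3", "H4",
          "H5", "H6", "A", "B", "I", "U", "DL", "UL", "OL", "ADDRESS"}


def tag_name(body: str) -> str:
    """First word of a tag body, upper-cased: 'a href=x' -> 'A', '/h1' -> '/H1'."""
    parts = body.split()
    return parts[0].upper() if parts else ""


def pair_tags(tokens: list) -> dict:
    """Map the token index of each paired start tag to its end tag's index."""
    stack = []                            # (tag name, token index) of open tags
    pairs = {}
    for i, (kind, body) in enumerate(tokens):
        if kind != "tag":
            continue
        name = tag_name(body)
        if name in PAIRED:                # <P> is deliberately absent from PAIRED
            stack.append((name, i))
        elif name.startswith("/") and name[1:] in PAIRED:
            want = name[1:]
            if stack and stack[-1][0] == want:
                _, start = stack.pop()    # normal case: closes the innermost tag
            elif len(stack) >= 2 and stack[-2][0] == want:
                _, start = stack.pop(-2)  # tolerate <H1><B> ... </H1></B>
            else:
                # SpHyDir would stop here and open the HTML edit window
                raise ValueError(f"unexpected </{want}> at token {i}")
            pairs[start] = i
    return pairs


tokens = [("tag", "H1"), ("tag", "B"), ("text", "Hi"), ("tag", "/H1"), ("tag", "/B")]
print(pair_tags(tokens))
```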

    Once the scope of the tags has been determined, it is now possible to build the objects that structurally represent the document. The first step is to scan through the HTML HEAD TITLE ... /TITLE /HEAD BODY part at the front. The title goes on the document object. The rest of the header is discarded by the current version of SpHyDir but may be supported in a later release. The body of the document is then processed, with H1...H6 tags turned into sections, IMG tags turned into images, OL/UL/DL tags turned into lists, and text between paragraph breaks turned into paragraphs.

    Tag parsing is a big SELECT ("case") statement. Anything that is not recognized is turned back into a tag (though the "<" and ">" are replaced by 0x1E and 0x1F during SpHyDir editing). This is intentional for formatting tags (B, I, CITE, CODE) and seemed like a good fallback for any unrecognized experimental, obsolete, or exotic stuff.
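A rough Python analogue of that fallback: recognized structural tags are classified as objects, and anything else stays inline with 0x1E/0x1F in place of the angle brackets. The `STRUCTURAL` set is a simplified assumption, and `restore_tags` assumes the markers are turned back into `<` and `>` when HTML is written out.

```python
TAG_OPEN, TAG_CLOSE = "\x1e", "\x1f"   # stand-ins for '<' and '>' while editing

# Simplified assumption: tags that become objects rather than inline text.
STRUCTURAL = {"H1", "H2", "H3", "H4", "H5", "H6", "IMG", "UL", "OL", "DL", "P"}


def classify(body: str) -> tuple:
    """Rough analogue of the big SELECT: structural tags become objects;
    anything unrecognized (B, I, CITE, CODE, exotic tags) stays inline."""
    parts = body.split()
    name = parts[0].upper().lstrip("/") if parts else ""
    if name in STRUCTURAL:
        return ("object", name)
    return ("text", TAG_OPEN + body + TAG_CLOSE)


def restore_tags(text: str) -> str:
    """Assumed output step: turn the markers back into real angle brackets."""
    return text.replace(TAG_OPEN, "<").replace(TAG_CLOSE, ">")


para = "Use " + classify("CODE")[1] + "SAY" + classify("/CODE")[1] + " here"
print(classify("IMG SRC=hood.gif"), repr(para), restore_tags(para))
```

Using control characters keeps hidden tags out of the way of the editor's own "<" and ">" handling while still preserving them byte for byte.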

    Object construction is a recursive procedure. When it encounters an object that has contents (SECTION, ULIST, OLIST, DLIST), it benefits from the previous scan that paired off starting and ending tag locations. It can create a new parent object, push a new level into the tree, and recursively call itself to process all the tags between the start and the end of the current structural component.
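The recursion can be sketched using the start-to-end index map produced by the pairing pass. For brevity this hypothetical `build` function handles only UL/OL/DL containers and plain text; sections, images, and list items are omitted, and nodes are plain dicts rather than container records.

```python
LIST_TYPES = {"UL": "ULIST", "OL": "OLIST", "DL": "DLIST"}


def build(tokens: list, pairs: dict, lo: int, hi: int, parent: dict) -> dict:
    """Append objects for tokens[lo:hi] to parent["children"].

    pairs maps a start tag's token index to its end tag's index (from the
    pairing pass), so a list's contents are exactly tokens[start+1:end].
    """
    i = lo
    while i < hi:
        kind, body = tokens[i]
        words = body.split()
        name = words[0].upper() if kind == "tag" and words else ""
        if name in LIST_TYPES and i in pairs:
            node = {"type": LIST_TYPES[name], "children": []}
            parent["children"].append(node)
            end = pairs[i]
            build(tokens, pairs, i + 1, end, node)   # recurse one level deeper
            i = end + 1                              # resume after the end tag
            continue
        if kind == "text" and body.strip():
            parent["children"].append({"type": "TEXT", "text": body.strip()})
        i += 1                                       # other tags ignored in this sketch
    return parent


tokens = [("tag", "UL"), ("tag", "LI"), ("text", "one"),
          ("tag", "OL"), ("tag", "LI"), ("text", "two"), ("tag", "/OL"),
          ("tag", "/UL"), ("text", "after")]
pairs = {0: 7, 3: 6}
doc = build(tokens, pairs, 0, len(tokens), {"type": "DOCUMENT", "children": []})
print(doc)
```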

    Filenames

    The issue of file names should have been more carefully thought out. SpHyDir has tried to go back and fix things systematically, but problems may still arise.

    SpHyDir is typically used to edit files on a personal machine. They need to be locally tested with Web Explorer. Then they will be transferred to the production server, which can be a Unix, NT, or OS/2 machine. Three problems can occur in the transfer:

    1. A Unix system requires the use of "/" between directory levels. Although NT and OS/2 use "\" normally, they will tolerate the forward slash. Therefore, a forward slash is generated in all HTML references to files. However, Rexx is not quite so forgiving. The Rexx functions that parse and validate file names require that the "\" character be used. Thus SpHyDir is always internally jumping back and forth between the internal "\" and external "/" version of file names.
    2. Unix file names are case sensitive. OS/2 and NT are not. If the file names are left in mixed case, then Rexx string comparisons will not find that "stuff.htm" and "STUFF.HTM" are the same file. If everything is folded to uppercase, then the Unix server may complain. This is currently categorized as an outstanding bug that needs to be fixed sometime soon.
    3. The target HTML library on the server may be a subdirectory of some larger structure. For example, on the author's machine SpHyDir operates against the F:\PCLT directory, but on the server the same tree is stored under D:\HTTP\PCLT. All the file references generated in the HTML are relative to the current document. This document is SPHYDIR/LOGIC.HTM when viewed from the main PCLT directory. Within the SPHYDIR subdirectory, a reference to ../ICONS/HOOD.GIF refers to \PCLT\ICONS\HOOD.GIF (the ".." goes up one level from \PCLT\SPHYDIR). Again, although this particular syntax has to be religiously generated in HTML, Rexx won't accept any of it. Every such relative reference has to be convertible to an absolute path name on the current machine ("F:\PCLT\SPHYDIR\LOGIC.HTM") in order for Rexx to open the file or find its extended attributes.
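The separator flip in item 1 and the case problem in item 2 can be illustrated with Python's `ntpath` module, which applies Windows-style path rules on any platform. The function names are hypothetical. Note that the case-folded comparison is only safe for local names; it is not what should be written into HTML bound for a Unix server.

```python
import ntpath


def to_internal(ref: str) -> str:
    """HTML form ('sphydir/logic.htm') -> Rexx/OS/2 form ('sphydir\\logic.htm')."""
    return ref.replace("/", "\\")


def to_external(path: str) -> str:
    """Rexx/OS/2 form -> the '/'-separated form written into HTML."""
    return path.replace("\\", "/")


def same_file(a: str, b: str) -> bool:
    """True if two local names denote the same OS/2 or NT file.

    ntpath.normcase lowercases and turns '/' into '\\', so both the
    separator and the case difference are ignored.
    """
    return ntpath.normcase(a) == ntpath.normcase(b)


print(to_internal("sphydir/logic.htm"), same_file("stuff.htm", "STUFF.HTM"))
```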

    The "solution" was to create the general-purpose Parse_Filename subroutine. It takes a file name in one of two formats: a fully qualified OS/2 path ("F:\PCLT\WINWORLD\OS2.HTM") or a Unix path relative to the current document ("../author.htm"). It produces three output forms for the file name: the OS/2 path, the document-relative path, and the library-relative path ("winworld/os2.htm"). To get the document-relative path, you have to pass the library-relative path of the current document. If this argument is not supplied, then there is no current document, and all Unix-style parameters are treated as library-relative.
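A Python sketch of what Parse_Filename computes, under stated assumptions: the library root (the HTMLLIB setting) is passed in explicitly, an OS/2 path is recognized by its drive letter, and references that climb above the library root are rejected. The function and field names are hypothetical; the original is a Rexx subroutine whose exact interface is not shown here.

```python
import ntpath
import posixpath
from typing import NamedTuple, Optional


class FileForms(NamedTuple):
    os2_path: str                 # e.g. F:\PCLT\WINWORLD\OS2.HTM (file I/O only)
    lib_relative: str             # e.g. WINWORLD/OS2.HTM
    doc_relative: Optional[str]   # relative to the current document, if one is given


def parse_filename(name: str, library_root: str,
                   current_doc: Optional[str] = None) -> FileForms:
    """Hypothetical analogue of Parse_Filename.

    name is either an OS/2 path with a drive letter or a '/'-separated
    reference relative to current_doc (library-relative if current_doc is None).
    current_doc is the library-relative name of the current document.
    """
    root = ntpath.normpath(library_root)
    base = posixpath.dirname(current_doc) if current_doc else ""
    if ntpath.splitdrive(name)[0]:                      # OS/2 form: F:\...
        full = ntpath.normpath(name)
        prefix = ntpath.normcase(root).rstrip("\\") + "\\"
        if not ntpath.normcase(full).startswith(prefix):
            raise ValueError(f"{name} is outside the library {root}")
        lib_rel = full[len(prefix):].replace("\\", "/")
    else:                                               # Unix-style reference
        lib_rel = posixpath.normpath(posixpath.join(base, name))
        if lib_rel == ".." or lib_rel.startswith("../"):
            raise ValueError(f"{name} climbs above the library root")
    os2_path = ntpath.join(root, lib_rel.replace("/", "\\"))
    doc_rel = None
    if current_doc:
        # leading '/' makes relpath independent of the process's working directory
        doc_rel = posixpath.relpath("/" + lib_rel, "/" + base)
    return FileForms(os2_path, lib_rel, doc_rel)


print(parse_filename("../ICONS/HOOD.GIF", "F:\\PCLT", "SPHYDIR/LOGIC.HTM"))
```

Only `lib_relative` and `doc_relative` would ever be stored; `os2_path` is recomputed from the library root whenever a file must be opened, which is what lets the library move without breaking links.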

    The PC path format is only used for system interface subroutines and file I/O. It is never stored in any record field, kept in a shared variable, or written to any file. As a result, you can copy the entire library to another disk or directory without affecting any logical links.

    The Unix-style reference to the position of another file relative to the current document is used for the GIF file that is the source of an Image object, as a Hotlink to another library file, and when referencing a subdocument.

    The Unix-style reference from the start of the library (or more properly from the HTMLLIB environment setting) is used to refer to the current document itself. As a consequence, it is also stored in the Parent Extended Attribute for a subdocument file.

    If care is taken to remember the rules, then things will come out all right. Sloppy thinking can embed the wrong type of file reference and cause trouble. After a bit of thinking about the problem, the various forms become somewhat natural. It should be noted, however, that while the Subdoc Extended Attribute from the parent file to the subdocuments is relative to the parent location, the Parent Extended Attribute from the subdocuments back to the parent is a library-relative expression.
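The asymmetry between the two extended attributes can be made concrete with a small sketch. Given library-relative names for a parent and a subdocument, it derives the value for each direction. `ea_values` and the file names in the example are hypothetical, and the sketch does not read or write real OS/2 extended attributes.

```python
import posixpath


def ea_values(parent_lib: str, subdoc_lib: str) -> tuple:
    """Given library-relative names of a parent document and one of its
    subdocuments, return (Subdoc EA value on the parent,
    Parent EA value on the subdocument)."""
    parent_dir = posixpath.dirname(parent_lib)
    # Subdoc EA: relative to the parent's own location
    subdoc_ea = posixpath.relpath("/" + subdoc_lib, "/" + parent_dir)
    # Parent EA: library-relative, i.e. the parent's name unchanged
    parent_ea = parent_lib
    return subdoc_ea, parent_ea


print(ea_values("SPHYDIR/SPHYDIR.HTM", "SPHYDIR/LOGIC.HTM"))
```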

    Document Editing

    There are five ways to edit a document. Some attributes of objects can be changed by selecting the object and typing new values in the entry fields that appear at the top of the Workarea. Image and Subdocument objects are associated with a file by dropping the icon of the file on top of the object. Text-y objects are edited by double-clicking the object and opening the Text Editor Window. New objects are created by dropping icons from the Toolbar into the document. Existing objects can be moved by dragging them around or through the SpHyDir Clipboard.

    Fields at top of Workarea

    At design time, the top of the Workarea is configured to have a set of fields:


    Copyright 1995 PCLT -- SpHyDir Web Document Manager -- H. Gilbert
    May be distributed with SpHyDir program

    This document was generated by SpHyDir, another fine product of PC Lube and Tune.