TROUBLESHOOTING OPERATING SYSTEM ABENDS

DISCLAIMER: THE ORIGIN OF THIS INFORMATION MAY BE INTERNAL OR EXTERNAL TO NOVELL. NOVELL MAKES EVERY EFFORT WITHIN ITS MEANS TO VERIFY THIS INFORMATION. HOWEVER, THE INFORMATION PROVIDED IN THIS DOCUMENT IS FOR YOUR INFORMATION ONLY. NOVELL MAKES NO EXPLICIT OR IMPLIED CLAIMS TO THE VALIDITY OF THIS INFORMATION.

Contents of this Document

What Is A Server Abend
Troubleshooting an Abend - Preliminary Steps
Troubleshooting An Abend - A Process
Troubleshooting An Abend - Data Collection
Troubleshooting An Abend - Isolation & Duplication
Using Novell Technical Support
Getting a Core Dump
Appendix A - Check list/Summary
Appendix B - Dealing With An NMI Error
Appendix C - How To Access The NetWare OS Patches And Updated Files
Appendix D - Troubleshooting Tools
Appendix E - Using the Internal Debugger
Appendix F - Faxback Service
Fax Form for Feedback on this Document

What Is A Server Abend

"When an Abend message appears on the server console, either NetWare or the server CPU has detected a critical error condition (fault) and jumped into the NetWare's fault handler. This handler idles NetWare and displays the Abend message on the server console for immediate action by the server administrator....."

An error condition, or fault, that is detected by the CPU is called a "Processor Exception." An error condition that is detected by NetWare is called a "Software Exception."

"NetWare's fault handler (function) is declared as a public function by the operating system so it can be used by any operating system module or Novell NLM." (Abend Recovery Techniques for NetWare Servers. June 1995 Application Notes. Page 79.)

"The NetWare 3 and 4 operating systems continually monitor the status of various server activities to ensure proper operation.
If NetWare detects a condition that threatens the integrity of its internal data (such as an invalid parameter being passed in a function call, or certain hardware errors), it abruptly halts the active process and displays an "Abend" message on the screen. ("Abend" is a computer science term signifying an ABnormal END of program.) The primary reason for Abends in NetWare is to ensure the stability and integrity of the internal operating system data. For example, if the operating system detected invalid pointers to cache buffers and yet continued to run, data would soon become unusable or corrupted. Thus an Abend is NetWare's way of protecting itself and users against the unpredictable effects of data corruption." (Resolving Critical Server Issues. Feb. 1995 Application Notes. Page 37.)

The term Abend is used in its most generic sense in this document: it means any Software or Processor Exception, or any condition where the server hangs or locks.

Troubleshooting An Abend - Preliminary Steps

Anytime you experience a server Abend (or almost any other server problem), consider the following maintenance steps before you do anything else. Experience has shown that a high percentage of Abends are corrected by applying current versions of the Operating System (OS) patches and by updating LAN and disk drivers. See the "patch.doc" document included in TABND2.exe. Novell Technical Support (NTS) therefore strongly recommends that, regardless of the problem you are experiencing, you apply the current patches and update drivers. After this is done, consider the other items on the list that follows. The list is in no particular order.

NOTE: Be aware that an NMI Parity error (Abend: Non-Maskable Interrupt) is a hardware error. (See Appendix B - Dealing With An NMI Error.)

1. Load ALL the current patches that are available for your version of NetWare. The patches are written to resolve known issues.
The patch download file name contains "PT" for the "PT" set and "IT" for the "IT" set (for example, 410pt3.exe and 410it6.exe). Load both the "PT" patches and the "IT" patches. (See "Patch.doc," included in the TABND2.EXE file, for a detailed explanation of the patches.)
2. Update Drivers. Each manufacturer of LAN and disk cards must develop their own drivers. The only way to assure that you have the latest version of these drivers is to download them from the respective vendor. Even new hardware does not usually ship with the most current drivers. Be certain that drivers are the newest available from the respective vendor!
3. Look for NLMs that may be outdated. Remember, every time we update a file it is to make it more robust and/or more usable. Commonly updated NetWare NLMs are compressed into self-extracting download files. Get a list of the current revision of files and patches from the file "patlst.txt." This list can be viewed or downloaded from Novell's Internet Web site. (See Appendix C for more on downloading files from Novell.) Also, don't forget to check with 3rd party vendors to update their NLMs as well.
4. Virus scan the DOS partition on the server to make certain that a virus has not crept in.
5. Clean and re-seat the cards and cables. (ALWAYS use static precautions.)
6. Double check termination, SCSI IDs, etc.
7. Check fans for proper operation.
8. Use an anti-static air can to blow dust off of the system board and other cards and components.

Troubleshooting An Abend - A Process

Figure 1, below, is a flow chart which suggests a logical process that could be used when troubleshooting a server Abend. Some Abends have simple solutions, but more often you will have to pull a little bit of hair before you come to a solution. The name of the troubleshooting game is usually "Experience," and a "Little Bit of Luck." Don't forget to draw on other people's experience. When troubleshooting, don't expect a magic fix from the first thing that you try.
Do expect to have to narrow in on the problem by trying different things.

Troubleshooting An Abend - Data Collection

This step, if done well, can be the most valuable step toward identifying your problem. The most common shortcoming when doing data collection is not looking far enough beyond the immediate symptoms. Following is a partial list of things to look for and questions to ask yourself. Don't feel embarrassed to raise a question that seems completely unrelated. Sometimes it IS unrelated, but often it helps to get a more complete picture of the problem. The questions and suggestions that follow should help you gain a higher level view of the problem. Sometimes your data collection will reveal more than one potential problem and you'll have to perform "Computer Triage." However, without a complete picture you could waste time troubleshooting in a meaningless direction. Use this list to get you thinking about the kinds of things that could be happening on your server / network to cause problems. These are in no particular order:

1. Keep a record of the Abend messages and watch for trends. Abend messages that are consistent sometimes indicate that software is at fault, while messages that are not consistent may indicate a hardware failure. There is NO hard and fast rule here; you want the data to point you in some troubleshooting direction.
2. Look through the system error log (sys:system\sys$log.err) for clues that haven't surfaced anywhere else: an error on a certain node just before the Abend, an error on a certain file, a print queue, volume dismounts, etc.
3. What resources are being used at the time of the Abend? For example: printing, file, tape, com port, memory access, etc.
4. What time of day does the Abend occur? Is it consistent?
5. Is the room air conditioning off when the machine Abends? This may indicate a heat problem.
6. What is the environment like? Dry, dusty, or hot environments contribute to heat problems and static.
7. Are there power problems either at the power source or at the power supply?
8. Is there a certain database function running, such as reindexing?
9. Is LAN traffic high? Are disk reads or writes high?
10. With the Abend still on the screen, break into the debugger and record basic information such as the EIP (instruction pointer), running NLM, and running process. (See Appendix E - Using the Internal Debugger.)
11. Use conlog.nlm to capture console messages you would otherwise miss. Note: Conlog has to be unloaded in order to close the log file and make it available to read. (See conlog.nlm in the 4.1 Utilities Reference Manual.)
12. Is there a certain user, segment, application, server process (like a backup), or anything else that is common or consistent when the Abend occurs?
13. Is the client software current?
14. Ask, "What has changed in my server environment?" Consider these questions:
    - Has the number of users increased?
    - Is there new software, or have any software upgrades been put on?
    - Is someone using software in a way different than it had been used, such as database indexing, etc.?
    - Is there new or different hardware?
    - Have there been changes to the LAN, the routers, or the cabling?
    - Have workstations or the file server been physically moved?
    - Are there new printers on the LAN?
    - Have there been any power outages?
    - Have SET parameters been changed?
15. Is there any new or strong electromagnetic field (EMF) near the server or cabling? For example: large motors, cabling run across fluorescent lights, vacuum cleaners, transmitters, etc.
16. Has the hardware been handled without static protection?
17. "Set DStrace = on" and then watch the DSTRACE screen for errors, or for an "All processed = NO" message. Be sure to give any DS errors time to go away before you worry too much. Fifteen minutes to several hours is usually adequate.
18. Are any users dropping their connection?
19. File corruption?
20. Drive deactivation?
If partitions are mirrored, the drive can deactivate without bringing the server down. Check the error log.
21. Printing problems?
22. Is power filtered? If so, has the filtering hardware been tested recently? That is, is it still functioning?
23. Monitor.nlm and Install.nlm are valuable NetWare utilities for checking your server's health. Use them to find information such as:
    - Climbing packet receive buffers.
    - A "No ECB available" count that continues to climb.
    - Low server memory. Cache buffer percentage should usually be around 60 - 70%.
    - LRU sitting time.
    - Dirty cache buffers that stay high.
    - A high number of LAN errors (more than 10% of the total packets sent or received).
    - High utilization (if it stays high for more than 10 or 20 minutes at a time).
    - Service Processes that have max'd out.
    - Partition, volume, and mirroring information.
    - View and edit NCF files.

Config.nlm: An invaluable tool for data collection is Config.nlm (included in TABND2.exe). When run at the server, config.nlm will create a file which includes information about your server's configuration. You may notice something here that raises a "red flag" that you hadn't noticed before. Use it to document your configuration before you make any changes to the server. Also, if you place a call to Tech Support, you will often be asked for this information.

IMPORTANT NOTE: Sometimes understanding the data you've collected will require you to find out from other sources if what you are seeing is normal. For example, it's common for the server's utilization to stay at 100% for a few seconds or even a few minutes, or more. Likewise, packet receive buffers and the directory entry table are allocated dynamically, on an as-needed basis, up to the settable maximum. It is often only through experience that you'll determine if what you're seeing is normal or if it is indicating a problem.
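The numeric rules of thumb given for Monitor.nlm in item 23 (cache buffer percentage, LAN error rate, sustained utilization) can be written down as a small check. The following is a minimal Python sketch, not a Novell tool: the function name, parameter names, and flag texts are hypothetical, the values are assumed to be copied by hand from the Monitor screens, and the 20-minute utilization limit is simply the upper end of the "10 or 20 minutes" range quoted above. Treat the thresholds as starting points and adjust them to what is normal for your own server.

```python
from typing import List

# Rules of thumb quoted in the Monitor.nlm list; they are not hard limits.
MIN_CACHE_BUFFER_PCT = 60.0    # "should usually be around 60 - 70%"
MAX_LAN_ERROR_RATIO = 0.10     # "more than 10% of the total packets"
MAX_HIGH_UTIL_MINUTES = 20.0   # "more than 10 or 20 minutes at a time"


def health_flags(cache_buffer_pct: float,
                 lan_errors: int,
                 total_packets: int,
                 high_util_minutes: float) -> List[str]:
    """Return a list of warnings for values read manually from Monitor.nlm.

    All parameters are hypothetical names for numbers the administrator
    transcribes by hand; nothing here talks to a NetWare server.
    """
    flags: List[str] = []
    if cache_buffer_pct < MIN_CACHE_BUFFER_PCT:
        flags.append("low cache buffers")
    # Guard against division by zero on an idle LAN card.
    if total_packets > 0 and lan_errors / total_packets > MAX_LAN_ERROR_RATIO:
        flags.append("high LAN error rate")
    if high_util_minutes > MAX_HIGH_UTIL_MINUTES:
        flags.append("sustained high utilization")
    return flags


if __name__ == "__main__":
    # Example: 45% cache buffers, 150 errors in 1000 packets, 25 minutes at high utilization.
    print(health_flags(45.0, 150, 1000, 25.0))
```

A flagged value is only a prompt to look closer; as the note says, a reading that looks alarming may be normal in your environment.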
Watch YOUR server to determine what is normal in YOUR environment and then tweak SET parameters or make other changes as needed. It is also OK to get DSTRACE errors. If DS is trying to process a request, and another request is already in process, you can get a DS error until the first process completes. It is important to establish what is normal for your environment so that you can accurately determine when you have a real problem and when you have simply hit against the limitations of your hardware and/or software.

Troubleshooting An Abend - Narrowing (Isolation) & Duplication

If the problem is not solved by now, it's time to roll up your sleeves and troubleshoot. Now that the preliminary steps have been covered and the initial data collected, troubleshooting is primarily a matter of going back and forth between "Problem Isolation" and "Problem Duplication": trying to narrow in on the problem while at the same time trying to discover a sequence of events that will reproduce the problem.

In simplest terms, an Abend is caused either by a hardware failure or by a misbehaving NLM. In either case the result is usually corrupted memory. Remember from the introduction that a software exception occurs when NetWare fails a consistency check (performed in memory, on memory), and that a processor exception occurs when the processor encounters an address or machine instruction that does not comply with the rules. Again, corrupted memory.

Problem Isolation and Problem Duplication are almost the same. The main difference is that in Problem Duplication, you are specifically trying to reproduce a problem. There may be an NLM which causes the server to Abend every time it loads. Or perhaps the server Abends if someone does a copy while someone else is logging in. If you are able to find a reproducible problem like this, you can now eliminate variables one at a time, try the test again, and see if the problem goes away.
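The "eliminate variables one at a time" procedure can be sketched as a simple loop. This is only an illustration in Python under stated assumptions: in practice each trial is a person restarting the server without one item and repeating the test by hand, so the `abend_reproduces` callback and the NLM names below are hypothetical stand-ins for that manual step.

```python
from typing import Callable, List, Sequence


def eliminate_one_at_a_time(
    variables: Sequence[str],
    abend_reproduces: Callable[[List[str]], bool],
) -> List[str]:
    """Return the variables whose removal, on its own, makes the problem go away.

    `variables` lists everything in the reproducible scenario (NLMs, cards,
    workstations, ...). `abend_reproduces` stands in for the manual test:
    it is given the variables still present and reports whether the
    Abend still occurs.
    """
    suspects: List[str] = []
    for candidate in variables:
        # Remove exactly one variable, keep everything else as it was.
        remaining = [v for v in variables if v != candidate]
        if not abend_reproduces(remaining):
            suspects.append(candidate)
    return suspects


if __name__ == "__main__":
    # Simulated scenario: the Abend only happens while BACKUP.NLM is loaded.
    loaded = ["LANDRV.LAN", "BACKUP.NLM", "VIRUSCAN.NLM"]
    print(eliminate_one_at_a_time(loaded, lambda present: "BACKUP.NLM" in present))
```

Changing only one variable per trial is the point of the sketch: if two items are removed at once and the Abend stops, you cannot tell which one was responsible.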
In the other case, Problem Isolation (Narrowing), the data may not have given you a clue, so you have to probe around, trying different things to see if you can narrow the problem down to a system or component. You may be able to determine that the problem is isolated to the disk channel because it only happens when the disk is being accessed. Or, you may be able to relate the problem to a certain NLM such as a backup. Consider these systems when trying to isolate a problem:

- Disk channel,
- LAN channel,
- System board,
- Com port,
- the NetWare OS,
- a 3rd party NLM product,
- the cabling,
- a certain type of workstation (Win95 vs. Win 3.1 vs. DOS),
- a certain type of shell (netx, vlm, client32, 3rd party client).

Remember that the objective is to find a sequence of events that will reproduce the problem, or at least narrow down to a system, an NLM, or a piece of hardware that is always involved when the Abend occurs. The following troubleshooting ideas should help you to "Divide & Conquer". This list is in no particular order, but it is grouped somewhat by LAN, disk, and general troubleshooting ideas.

Here are some Problem Isolation & Problem Duplication Ideas.

1. Use "server -ns" or "server -na" to bring the server up without executing the startup.ncf or the autoexec.ncf, respectively. Loading "server -ns" will allow you to bring up the server without the volume mounting automatically. These parameters also work for SFT3.
2. Does the Abend message itself suggest anything? LAN channel, disk channel, memory corruption, system board problem, a certain NLM, printing, a certain piece of hardware, a certain LAN segment, a workstation, a router, an environmental condition, etc.
3. Use "server -na" to prevent the autoexec.ncf from running. Then load NLMs manually, one at a time.
4. Use "server -ndb" to prevent the DS database from loading and thereby eliminate directory services. Note that you won't be able to log in without the database loaded.
5. What is the age of the hardware? If nothing has changed in the environment then the hardware may have simply failed. Don't assume that new hardware is always good.
6. Could hardware have received static shock? Static is not always immediately destructive; often it will cause degenerative damage to your hardware, allowing it to continue to work for a time.
7. Check with the vendor of 3rd party products to see if they are aware of the kind of problem you are experiencing.
8. Temporarily unload any 3rd party NLMs.
9. Unload virus scan and server/LAN monitoring NLMs.
10. Could there be power problems either at the power source or from the power supply?
11. Check the cooling fan in the power supply, the case, and on the CPU. Heat will cause hardware failure.
12. A dry, hot, or dusty environment can cause hardware degradation due to static electric discharge. It also increases the chance of NMI errors.
13. Avoid interrupts 15, 2, and 9, in that order.
14. Try to isolate the problem to a hardware subsystem, i.e., LAN Channel, Disk Channel, or System Board. You can swap hardware or try a different interrupt or slot.
15. Although the Abend message is very generic, it can still be used to point you in a direction. Most Abends indicate memory corruption. Some will be disk related, others LAN related. Often an Abend will include a function name, for example: "Abend: DeallocateMappedPage was supplied an invalid memory pointer." The word that appears without spaces (DeallocateMappedPage) is the name of a function in NetWare's code. In this Abend a memory pointer was sent into the DeallocateMappedPage function. During a consistency check the function determined that the pointer was not a valid address. When an Abend message mentions "...interrupted...", take a look at the LAN, simply because the LAN does more interrupting than anything else in the server. As is always the case, this becomes more intuitive with experience.
16. Clean and re-seat the cards and cables. Remember static protection.
17. How long has the server been installed? If it is a new install (less than a month) you may still have configuration and setup issues, or you could have faulty hardware, even if it is new.
18. If you have a 16-bit card in a machine with more than 16 MB of RAM, are there load parameters for the drivers?
19. Go back to basics. Every technician's tool kit should have an NE2000 LAN card and a basic no-nonsense disk card to use for testing purposes.
20. If the machine has PCI cards, try a non-PCI card if possible.
21. Server.exe, like any other file, can become corrupt. A corrupt server.exe can be difficult to track down. If you can reduce the environment to not much more than a server, a disk, and a LAN, and you still have the problem, try a fresh copy of server.exe. The same idea applies to any other NLM on the server. Remember, the server.exe in NetWare v3.x contains the server license number. Don't copy the wrong server.exe file.
22. Run a bindfix, a dsrepair, or a vrepair.
23. Load install and view partition or mirroring information. Install retrieves this information dynamically. There can be partition table corruption, for example, that is not surfacing as such. When you try to access partition information in install, you'll get a specific error because the table cannot be read.
24. Always double and triple check termination, interrupt settings, SCSI ID, drive translation (should be off), etc.
25. Run vrepair. If vrepair runs clean (zero errors) and you still suspect disk corruption, change the vrepair options. Option 2 from the main menu will change the vrepair options. Set them to "Write changes immediately...," "Write all changes....," and "Purge deleted files." (This is not exact syntax.) Don't be alarmed if you see a LOT of errors after these changes. Note: Because of the nature of what vrepair is doing, Novell recommends that you always have a verified good backup before you run vrepair.
26. Try different workstations.
27. Try workstations on different segments.
28. Can you isolate the LAN completely by attaching a single workstation directly to the server?
29. Category 5 cabling is usually required on faster LAN cards. Running the wrong cable can cause problems.
30. Heavy I/O will sometimes stress the server enough to force an error. Try doing a copy *.* or an xcopy continuously. Try it from several workstations.
31. Use ncopy to copy a file where the source and destination are the same server. Ncopy will not send any packets across the LAN in this scenario. If you still have the problem, you know the LAN is not part of the problem. Any other form of copy (copy, xcopy) where the server is the source or destination will send the file across the wire, even if it is just going to be sent back a second time across the wire. If copy has the problem but ncopy does not, then the problem is probably on the LAN.

Using Novell Technical Support

If you need further troubleshooting direction, your first step should be to call a Novell Authorized Service Center (NASC). These Gold and Platinum dealers are Novell NetWare trained and willing to help you. To find the service center closest to you call 1-800-NET-WARE (638-9273) between 7:30am and midnight CST, choose option 1, 2.

If you still need to contact Novell Technical Support, do the following before you call.

1. Find config.nlm, included in the TABND2.exe file, and run it at your server. Read the associated read-me file for specific instructions. The file that config.nlm creates will contain important server information that we can use to help troubleshoot your Abend. Note: Update drivers and load the current OS patches before you run config; we will not want to look at your config information until that has been done.
2. Next, fill out the form in Appendix A. Have this available for reference or to fax to us if needed.
3. At this point open an incident with Novell Technical Support.
4. Consider the possibility that you may need to get a core memory dump from your server. A core memory dump takes a "snapshot" of the server's RAM as it looks at the time of the Abend. The core dump is discussed further on in this document. DO NOT automatically take a core dump. Wait until a Technical Support Engineer instructs you to do so. Also, do not send us core dumps from servers that do not have the patches and current LAN and disk drivers loaded. We cannot spend time troubleshooting a problem that has already been resolved by current patches or updated software. Make sure you have the current patches and current LAN and disk drivers!

Getting a Core Dump

If nothing to this point has solved the problem, it may be necessary to get an image of your memory (Core Memory Dump) for us to analyze. DO NOT automatically take a core dump. Wait until a Technical Support Engineer instructs you to do so. Also, do not send us core dumps from servers that do not have the current patches and drivers loaded. Current patches and drivers fix known problems. We cannot spend time looking at a problem we've already solved. Make sure you have the current patches and drivers!

Dump to Floppy: This is the simplest method, but it is not practical if you have more than 16 MB of RAM. When the server Abends you are asked if you want to ".. copy the diagnostic image to disk." In NW3.12 and NW4.x the image will go to the C: drive by default, with an option to write it to floppy. To copy the image to floppy you need to have enough blank, formatted floppies to hold your entire server memory.

Dump to Hard Drive: This method is much faster than dumping the image to floppy. In this case the core dump is copied to the C:\ partition of the server as a file called coredump.img. To copy the image to the C: drive you must have enough free space to hold the entire server memory.
If you have not planned this extra space into your C: partition you may be able to add an inexpensive IDE drive and use the entire drive as a DOS partition. After the image is copied to the drive the server can be brought back up and users can log in. You will then need imgcopy.nlm. This NLM is contained in the download file imgcpy.exe. When imgcopy.nlm is loaded it will allow you to copy coredump.img from the C: partition to the NetWare partition. Once on the NetWare partition, the file can be ftp'd to us, or it can be backed up on tape and the tape can be sent to us. Before you do this, make sure that we have the software needed to restore the file.

Forcing a Core Dump: Occasionally you'll need to force a core dump during a certain condition other than after an Abend. For example, the server may be experiencing high utilization and a core dump is needed at the time the utilization is high. A core dump can be taken at any time by breaking into the OS's internal debugger. NOTE that if you break into the debugger while the server is in use, the workstations will not be able to communicate with the server and will probably time out. Break into the debugger by holding, simultaneously, <Alt>, <Left Shift>, <Right Shift>, <Esc>. Here are a few commands to be used while in the debugger.

.c    Force a core dump.
q     Quit to DOS.
g     Go back to the point where you came into the debugger.
h     Help

Dump to Server: This is the faster way to write the dump. It is called the network method. The problem server must have an additional LAN card (must be Ethernet) that gives it a client connection to a second server that will be the destination for the image file. When the problem server Abends the core dump is sent across the LAN connection to the destination server. This is how to set up for this type of core dump:

1. Down the problem server and power it off.
2. Put in an additional LAN card (Ethernet only).
3. Bring the server back up to a DOS prompt.
4. Set up an nwclient directory, and load the client drivers.
5. Log in to the server that will be the destination server, and map a drive to the place you want the dump to be sent to (for example: f:\dumpdir).
6. Go to the destination server and write down your connection number (get this from Monitor). Also, write down the drive mapping (e.g., F:\dumpdir).
7. Bring up the problem server.
8. You should test the connection by going into the debugger and initiating a core dump. Press the following keys all at once: <Alt>, <Left Shift>, <Right Shift>, <Esc>.
9. At the # prompt type .c
10. Follow the prompts with a "Y" to dump to a hard drive. When you are prompted for a path, enter f:\dumpdir\coredump.img (or the mapping you wrote down if it's different). Press <Enter>. The image should start copying.
11. If this works go on to the next step. If it doesn't, here are some of the problems we've seen:
    - Hardware configuration problem on the additional LAN card.
    - Using Netx with some versions of LAN drivers produces bad packets.
    - Connection to the second server, routing, network, etc.
    - Other normal client-to-server troubleshooting issues.
12. If you're able to duplicate the Abend, take a core dump. Most likely, however, you will have to wait until the next time it Abends on its own. Your default connection time to the destination server will be 15 minutes. In order to hold the connection longer you have two choices:
    - Increase the watchdog time out parameters, or
    - Load netalive.nlm (included in the TABND2.EXE file). You will load netalive with two parameters: the name of the problem server and the name of the destination server. (See the readme with netalv.) You can use monitor.nlm on the destination server to see if your connection is being maintained.

You will need an open support incident with Novell Technical Support before sending the core dump. At that time you will be given a location to ftp the core dump to, or an address where you can mail a tape.
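Both the floppy and hard drive methods in this section require enough room to hold the entire server memory, so it can help to work the numbers out before an Abend occurs. The following is a rough Python sketch of that arithmetic. It assumes the image is about the same size as installed RAM and that a formatted 1.44 MB floppy holds 1,457,664 usable bytes; neither figure comes from this document, and the function names are hypothetical.

```python
# Assumption: usable bytes on a formatted 1.44 MB 3.5-inch floppy.
FLOPPY_BYTES = 1_457_664


def image_bytes(ram_mb: int) -> int:
    """Approximate core dump size, assuming it equals installed RAM."""
    return ram_mb * 1024 * 1024


def floppies_needed(ram_mb: int, floppy_bytes: int = FLOPPY_BYTES) -> int:
    """Number of blank, formatted floppies needed to hold the image."""
    # Integer ceiling division avoids floating point rounding surprises.
    return -(-image_bytes(ram_mb) // floppy_bytes)


def fits_on_dos_partition(ram_mb: int, free_bytes: int) -> bool:
    """True if the free space on C: can hold coredump.img."""
    return free_bytes >= image_bytes(ram_mb)


if __name__ == "__main__":
    for ram in (16, 32, 64):
        print(f"{ram} MB RAM -> about {floppies_needed(ram)} floppies")
    # 32 MB server with 40 MB free on the DOS partition.
    print(fits_on_dos_partition(32, 40 * 1024 * 1024))
```

Under these assumptions a 16 MB server already needs a dozen floppies, which illustrates why the floppy method stops being practical beyond that size.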
Appendix A - Check list/Summary

Incident Number: ________________
Name: __________________________  Phone: ______________________
O/S version _______________  DS version ________________  Amount of RAM ____________
Make/Model of Machine (indicate if a clone)/Bus Type __________________________
LAN card, driver name, driver date & version __________, __________, ___________
LAN card, driver name, driver date & version __________, __________, ___________
HBA (controller), driver name, driver date & version _______, ________, __________
List the devices on this HBA: __________________________________
HBA (controller), driver name, driver date & version _______, ________, __________
List the devices on this HBA: __________________________________
Are your drives mirrored? Y / N   Duplexed? Y / N   Total volume space? __________

1. Have you updated the LAN and disk drivers? Y / N
2. Have you applied all the appropriate OS patches? Y / N
3. Have you tried a fresh copy of Server.exe? Y / N
4. Is your clib.nlm current? Y / N
5. Have you virus scanned the DOS and NetWare Partitions? Y / N
6. What other information do you have that may help troubleshoot this problem?
7. What changes have been made to the server recently? (Increased number of users, new software, upgraded software, new or different hardware, LAN or router changes, workstations or file server physically moved, power outages, set parameter changes, etc...)
8. What hardware has been swapped out already?
9. Do you have config.txt ready to upload?

Appendix B - Dealing With An NMI Error

As mentioned in the main body of this document, an NMI error is a hardware problem. There are three types of interrupts that a processor can handle: a maskable hardware interrupt (INTR), a non-maskable hardware interrupt (NMI), and a software interrupt (INT). The processor has a dedicated line on the system board bus that handles only non-maskable hardware interrupts.
According to Intel's i486 Microprocessor Hardware Reference Manual, this NMI line can be asserted as a result of one of three catastrophic events: 1) an imminent power loss, 2) a bus-transfer parity error, or 3) a memory-data parity error. When this NMI line is asserted the processor generates an NMI error. This error is received by the NetWare operating system and then reported to the console screen.

There are two flavors of NMI errors: "Abend: NMI parity error generated by IO check," and "Abend: NMI parity error generated by System Board." If the NMI is generated by the system board there is a fairly good chance the problem is with the system board or its memory, although it can still be elsewhere. If the NMI is generated by an IO check, the problem could be anywhere.

Here is a list of hardware related items that we have found to cause NMIs. These ideas should help you as you troubleshoot an NMI error.

1. Faulty RAM.
2. Faulty system board.
3. Any I/O card, especially cards with on-board memory.
4. Low or fluctuating power at the power source. Remember, a UPS can go bad too.
5. Power supply going bad.
6. Memory extension boards.
7. System board memory that is mismatched in either speed or brand.
8. Conflicting interrupts.
9. Try cleaning and reseating cards, cables, and memory modules.
10. Incompatibility between hardware pieces.
11. Look at the environment and how the equipment is handled. NMIs can often be traced back to static electric discharge. A sometimes overlooked point is that static does not always cause immediate failure; the damage can be degenerative. The hard failure may not occur until sometime in the future.
12. This is rare, but in two separate cases NTS has seen a hard drive and a printer cause NMI errors.

Appendix C - How To Access The NetWare OS Patches And Updated Files

What file to download?

For Novell's official statement on OS patches see the document Patch.doc.
The file "Patlst.txt" is updated regularly to reflect current patches and files that are available from Novell's Technical Support. This list can be downloaded from the online services listed below. You can view the current version of this file from Novell's Internet site: http://netwire.novell.com (choose Technical Support / File Updates / Current minimum OS & NLM updates).

Where to get updated files and patches?

1) CompuServe:
   Go Netwire
   Go NWOSFiles (OS Files)
   Go NWGenFiles (General Files)
   Go NovFF (Novell File Finder)
   Go NSD
2) Internet:
   Novell Web Page: http://www.novell.com (choose Technical Support / File Updates)
   Patlst.txt: http://netwire.novell.com/FileUpdt/patlst.html
   Files in the NSD area use "ftp://ftp.novell.com/pub/netwire/nsd/"
3) MS Network Online: Go Netwire
4) Space Works: 800-577-2235
5) Novell BBS: 801-373-6999

Appendix D - Troubleshooting Tools

Each of the files and documents included in the TABND2.EXE file is provided as a troubleshooting aid. Some are for specific problems and some are for troubleshooting in general. Here is a list of the diagnostics and documents found in TABND2.exe. See readme.txt, or the documents associated with each troubleshooting tool, for a more detailed description. With the exception of Hdump.nlm, each of these utilities will run on a 3.x or a 4.x server.

README.TXT      Readme file for TABND2.EXE.
HDUMP.NLM       Used to aid you in taking a core dump on a NW3.11 server.
CONFIG.NLM      Used to document a server's configuration.
FCONSOLE.EXE    Used to down a file server from a workstation.
IMGCOPY.NLM     Used to transfer a core dump from a DOS partition to the NetWare volume.
NETALIVE.NLM    Used to send your core dump to a volume on another NetWare server.
410PBOFF.NLM    Allows you to disable packet burst during troubleshooting.
HIGHUTIL.CMP    Technical Information Document (TID) 1005736. Troubleshooting high utilization vs. file compression at the NW4.1 server.
HIGHUTIL.SUB    TID1005436. Troubleshooting high utilization vs. NW4.1 file system suballocation.
HIGHUTIL.TRB    TID1005963. Troubleshooting high utilization issues in general.
HIGHUTIL.ADD    TID2905856 is an addendum to TID1005963. Troubleshooting high utilization.
PATCH.DOC       TID1007561. A statement explaining the use of NW OS patches.
RCSI.APP        February 1995 Application Note, "Resolving Critical Server Issues."
RECOVERY.APP    June 1995 Application Note, "Abend Recovery Techniques for NetWare 3 and 4 Servers."
RECOVERY.BMP    Flow chart graphic that goes with the document, "Abend Recovery Technique ..."
TABEND.WP6      This document.
TABEND.TXT      Text version of this document.
TABENDS.WPG     Graphic of the Troubleshooting Abends flow chart.

Appendix E - Using the Internal Debugger

You can break into the NetWare debugger at any time. NOTE that if you break into the debugger while the server is in use, the workstations will not be able to communicate with the server and will probably time out. Break into the debugger by holding, simultaneously, <Alt>, <Left Shift>, <Right Shift>, <Esc>. Here are a few commands to be used while in the debugger. These commands may be different or unavailable depending on your version of NetWare.

.c    Force a core dump.
.r    Current running process
v     View open screens
?     Current running NLM
h     Help
.h    More help
q     Quit to DOS.
g     Go back to the point where you came into the debugger.

Appendix F - Faxback Service

Novell Technical Support (NTS) maintains a Faxback service with documents on known issues, top ten current issues, and commonly asked general information. You can reach the NTS Faxback service by calling 801-861-5350, or call 800-NetWare (638-9273) and choose option 2 (for Technical Support), then option 2 (for the Faxback service), then option 1 (to skip the Faxback introduction), then option 1 (if you have the document number) or option 2 (for a catalog). Below is a listing of some of the documents available on the Faxback as of May 29, 1996.

Document Title                                          Document number
What is Drive Deactivation?                             5000513
Trouble Mounting CDROM's as NetWare Volumes             5000522
Register Memory in NetWare 3.x and 4.x                  5000572
4.x Backup/Restore                                      5000743
Drive Deactivation Troubleshooting Tips                 5000893
Troubleshooting Tips for NetWare Directory Services     30065
Partition Troubleshooting Guide                         30072
Patch List and NLM Updates                              30082
Top Server Issues (February 1996)                       1001198
Support Tips For Adaptec EISA, VL, PCI HBA'S            5003623
Intel Pentium Floating Point Flaw on Netware            5004623
HCSSIT.EXE -- HCSS NLM Update                           5005622
SFT III 4.1 Features/Changes Document                   02197415
NetWare v4.1 Multimedia CD Q & A                        10025532
Replacing Failed Hard Drive in Mirrored Group           10024983
High Utilization Recommendations                        10059634
High Utilization and Compression Document               10057362
High Utilization and Suballocation Document             10054363
Directory Entries Document                              12020466
PCI Technology Document                                 12024722

File Listings
NetWare 3.11 File Listing                               40004
NetWare 3.12 File Listing                               40012
NetWare 4.01 File Listing                               40023

Novell Technical Support

To: TABND Feedback
Fax Number: 1-801-861-5988
From:

Use this form to influence what goes into future updates of this troubleshooting document, or send your comments by email to Dshaver@Novell.com. This is for feedback only. Novell will not be able to respond to you personally. Thanks for your feedback.

Some things we would like to know include:

1) Were you able to solve your problem without opening an incident with Technical Support?
2) How can we make this document or the included files more useful to you?
3) Are there other issues that might lend themselves to this type of document?
4) What other comments / suggestions do you have for us?

Number of Pages (including cover sheet): ____

Novell, Inc. / MS E31-1 / 122 East 1700 South / Provo, UT 84606
Telephone 800-638-9273 / Telex 37895941 / Alternate Fax 801-861-5200