TROUBLESHOOTING OPERATING SYSTEM ABENDS
DISCLAIMER: THE ORIGIN OF THIS INFORMATION MAY BE INTERNAL OR
EXTERNAL TO NOVELL. NOVELL MAKES EVERY EFFORT WITHIN ITS
MEANS TO VERIFY THIS INFORMATION. HOWEVER, THE INFORMATION
PROVIDED IN THIS DOCUMENT IS FOR YOUR INFORMATION ONLY.
NOVELL MAKES NO EXPLICIT OR IMPLIED CLAIMS TO THE VALIDITY OF
THIS INFORMATION.
Contents of this Document
What Is A Server Abend
Troubleshooting an Abend - Preliminary Steps
Troubleshooting An Abend - A Process
Troubleshooting An Abend - Data Collection
Troubleshooting An Abend - Isolation & Duplication
Using Novell Technical Support
Getting a Core Dump
Appendix A - Check list/Summary
Appendix B - Dealing With An NMI Error
Appendix C - How To Access The NetWare OS Patches And Updated Files
Appendix D - Troubleshooting Tools
Appendix E - Using the Internal Debugger
Appendix F - Faxback Service
Fax Form for Feedback on this Document
What Is A Server Abend
"When an Abend message appears on the server console, either NetWare or the server
CPU has detected a critical error condition (fault) and jumped into the NetWare's fault
handler. This handler idles NetWare and displays the Abend message on the server
console for immediate action by the server administrator....."
An error condition, or fault, that is detected by the CPU is called a "Processor
Exception."
An error condition that is detected by NetWare is called a "Software Exception."
"NetWare's fault handler (function) is declared as a public function by the operating
system so it can be used by any operating system module or Novell NLM." (Abend
Recovery Techniques for NetWare Servers. June 1995 Application Notes. Page 79.)
"The NetWare 3 and 4 operating systems continually monitor the status of various server
activities to ensure proper operation. If NetWare detects a condition that threatens the
integrity of its internal data (such as an invalid parameter being passed in a function call,
or certain hardware errors), it abruptly halts the active process and displays an "Abend"
message on the screen. ("Abend" is a computer science term signifying an ABnormal
END of program.)
The primary reason for Abends in NetWare is to ensure the stability and integrity of the
internal operating system data. For example, if the operating system detected invalid
pointers to cache buffers and yet continued to run, data would soon become unusable or
corrupted. Thus an Abend is NetWare's way of protecting itself and users against the
unpredictable effects of data corruption." (Resolving Critical Server Issues. Feb. 1995
Application Notes. Page 37.)
The term Abend is used in its most generic sense in this document: any Software or
Processor Exception, or any condition where the server hangs or locks.
Troubleshooting An Abend - Preliminary Steps
Anytime you experience a server Abend (or almost any other server problem) consider
the following maintenance steps before you do anything else. Experience has shown that
a high percentage of Abends are corrected by applying current versions of the Operating
System (OS) patches, and by updating LAN and disk drivers. See the "patch.doc"
document included in TABND2.exe. Novell Technical Support (NTS) therefore
strongly recommends that, regardless of the problem you are experiencing, you apply the
current patches and update drivers. After this is done, consider the other items on the list
that follows. The list is in no particular order.
NOTE: Be aware that an NMI Parity error (Abend: Non-Maskable Interrupt) is a
hardware error. (See Appendix B - Dealing With An NMI Error.)
1. Load ALL the current patches that are available for your version of NetWare. The
patches are written to resolve known issues. The patch download file will be called
<OS version>PT<file revision number or letter>.EXE for the "PT" set, and <OS
version>IT<file revision>.EXE for the "IT" set (e.g., 410pt3.exe and 410it6.exe).
Load both the "PT" patches and the "IT" patches. (See "Patch.doc," included in the
TABND2.EXE file, for a detailed explanation of the patches.)
2. Update drivers. Each manufacturer of LAN and disk cards must develop its own
drivers. The only way to be sure that you have the latest version of these drivers is to
download them from the respective vendor. Even new hardware does not usually
ship with the most current drivers. Be certain that your drivers are the newest available
from the respective vendor!
3. Look for NLMs that may be outdated. Remember, every time we update a file it is to
make it more robust and/or usable. Commonly updated NetWare NLMs are
compressed into self-extracting download files. Get a list of the current revision of
files and patches from the file "patlst.txt." This list can be viewed or downloaded
from Novell's Internet Web site. (See Appendix C for more on downloading files
from Novell.) Also, don't forget to check with 3rd party vendors to update their
NLMs as well.
4. Virus scan the dos partition on the server to make certain that a virus has not crept
in.
5. Clean and re-seat the cards and cables (ALWAYS use static precautions.)
6. Double check termination and SCSI ID's, etc.
7. Check fans for proper operation.
8. Use an anti-static air can to blow dust off of the system board and other cards and
components.
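The naming convention described in step 1 can be checked mechanically. The Python sketch below is illustrative only: the regular expression is an assumption inferred from the two example names given above (410pt3.exe and 410it6.exe), and the function names parse_patch_name and missing_sets are hypothetical, not part of any Novell tool.

```python
import re

# Assumed pattern, inferred from the examples 410pt3.exe and 410it6.exe:
# <OS version digits><PT or IT><revision number or letter>.EXE
PATCH_NAME = re.compile(r"^(\d+)(PT|IT)([0-9A-Z]+)\.EXE$", re.IGNORECASE)

def parse_patch_name(filename):
    """Return (os_version, patch_set, revision), or None if the name does not fit."""
    match = PATCH_NAME.match(filename.strip())
    if match is None:
        return None
    os_version, patch_set, revision = match.groups()
    return os_version, patch_set.upper(), revision.upper()

def missing_sets(filenames):
    """Report which of the PT and IT sets are absent from a list of downloads."""
    found = {parsed[1] for parsed in map(parse_patch_name, filenames) if parsed}
    return sorted({"PT", "IT"} - found)

print(parse_patch_name("410pt3.exe"))
print(missing_sets(["410pt3.exe"]))
```

A non-empty result from missing_sets is a reminder that both the "PT" and the "IT" set should be loaded.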
Troubleshooting An Abend - A Process
Figure 1, below, is a flow chart which suggests a logical process that could be used when
troubleshooting a server Abend. Some Abends have simple solutions, but more often you
will have to pull a little bit of hair before you come to a solution. The name of the
troubleshooting game is usually "Experience" and a "Little Bit of Luck." Don't
forget to draw on other people's experience. When troubleshooting, don't expect a magic
fix from the first thing that you try. Do expect to have to narrow in on the problem by trying
different things.
Troubleshooting An Abend - Data Collection
This step, if done well, can be the most valuable step toward identifying your problem.
The most common shortcoming when doing data collection is not looking far enough
beyond the immediate symptoms. Following is a partial list of things to look for and
questions to ask yourself. Don't feel embarrassed to raise a question that seems
completely unrelated. Sometimes it IS unrelated, but often times it helps to get a more
complete picture of the problem. The questions and suggestions that follow should help
you gain a higher level view of the problem. Sometimes your data collection will reveal
more than one potential problem and you'll have to perform "Computer Triage."
However, without a complete picture you could waste time troubleshooting in a
meaningless direction. Use this list to get you thinking about the kinds of things that
could be happening on your server / network to cause problems. These are in no
particular order:
1. Keep a record of the Abend messages and watch for trends. Abend messages that
are consistent sometimes indicate that software is at fault, while messages that are
not consistent may indicate a hardware failure. There is NO hard and fast rule here;
you want the data to point you in some troubleshooting direction.
2. Look through the system error log (sys:system\sys$log.err) for clues that haven't
surfaced anywhere else, such as an error on a certain node just before the Abend, an
error on a certain file or print queue, volume dismounts, etc.
3. What resources are being used at the time of the Abend? For example: printing, file,
tape, com port, memory access, etc.
4. What time of day does the Abend occur? Is it consistent?
5. Is the room air conditioning off when the machine Abends? This may indicate a
heat problem.
6. What is the environment like? Dry, dusty, or hot environments contribute to heat
problems and static.
7. Are there power problems either at the power source or at the power supply?
8. Is there a certain database function running, such as reindexing?
9. Is lan traffic high? Are disk reads or writes high?
10. With the Abend still on the screen, break into the debugger and record basic
information such as the EIP (instruction pointer), running nlm, running process.
(See Appendix E - Using the Internal debugger.)
11. Use conlog.nlm to capture console messages you would otherwise miss. Note:
Conlog has to be unloaded in order to close the log file and make it available to read.
(See conlog.nlm in the 4.1 Utilities Reference Manual.)
12. Is there a certain user, segment, application, server process (like a backup), or
anything else that is common or consistent when the Abend occurs?
13. Is the client software current?
14. Ask, "What has changed in my server environment?" Consider these questions:
- Have the number of users increased?
- Is there new software or any software upgrades that have been put on?
- Is someone using software in a way different than it had been used, such as
database indexing, etc.?
- Is there new or different hardware?
- Have there been changes to the LAN, the routers, or the cabling?
- Have workstations or the file server been physically moved?
- Are there new printers on the LAN?
- Have there been any power outages?
- Have SET parameters been changed?
15. Is there any new or strong electromagnetic field (EMF) near the server or cabling?
Look for large motors, cabling run across fluorescent lights, vacuum cleaners,
transmitters, etc.
16. Has the hardware been handled without Static Protection?
17. "Set DStrace = on" and then watch the DSTRACE screen for errors, or for an "All
processed = NO" message. Be sure to give any DS errors time to go away before
you worry too much. Fifteen minutes to several hours is usually adequate.
18. Are any users dropping their connection?
19. File corruption?
20. Drive deactivation? If partitions are mirrored, the drive can deactivate without
bringing the server down. Check the error log.
21. Printing problems?
22. Is power filtered? If so, has the filtering hardware been tested recently? That is, is
it still functioning?
23. Monitor.nlm and Install.nlm are valuable NetWare utilities for checking your server's
health. Use them to find information such as:
- Climbing packet receive buffers.
- No ECB available count that continues to climb.
- Low server memory. Cache buffer percentage should usually be around 60 -
70%.
- LRU sitting time.
- Dirty cache buffers that stay high.
- A high number of LAN errors (more than 10% of the total packets sent or
received).
- High utilization (if it stays high for more than 10 or 20 minutes at a time).
- Check Service Processes to see if they have max'd out.
- Partition, volume, and mirroring information.
- View and edit NCF files.
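Items 1 and 2 above (recording Abend messages and watching for trends) can be partly automated once the messages are in a text file, for example a console log captured with conlog.nlm and copied to a workstation. The Python sketch below is only an illustration. It assumes that each Abend appears on a single line containing the text "Abend:", as in the messages quoted in this document; the real layout of your log may differ. The function names are hypothetical.

```python
from collections import Counter

def abend_messages(lines):
    """Collect the text following 'Abend:' from each line that contains it."""
    messages = []
    for line in lines:
        marker = line.find("Abend:")
        if marker != -1:
            messages.append(line[marker + len("Abend:"):].strip())
    return messages

def summarize(lines):
    """Count each distinct Abend message so that trends stand out."""
    return Counter(abend_messages(lines)).most_common()

# Sample input built from messages quoted in this document (assumed log layout).
sample = [
    "Abend: NMI parity error generated by IO check",
    "some other console message",
    "Abend: DeallocateMappedPage was supplied an invalid memory pointer.",
    "Abend: NMI parity error generated by IO check",
]
for message, count in summarize(sample):
    print(count, message)
```

One message that dominates the tally may point toward software, and a scattering of different messages may point toward hardware, but as noted in item 1 there is no hard and fast rule.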
Config.nlm: An invaluable tool for data collection is Config.nlm (included in
TABND2.exe). When run at the server, config.nlm will create a file which includes
information about your server's configuration. You may notice something here that
raises a "red-flag" that you hadn't noticed before. Use it to document your configuration
before you make any changes to the server. Also, if you place a call to Tech Support,
you will often be asked for this information.
IMPORTANT NOTE: Sometimes understanding the data you've collected will require you
to find out from other sources if what you are seeing is normal. For example, it's
common for the server's utilization to stay at 100% for a few seconds or even a few
minutes, or more. Likewise, the number of packet receive buffers and the size of the
directory entry table are dynamic up to the settable maximum; they are allocated
on an as-needed basis. It is often only through experience that you'll
determine if what you're seeing is normal or if it is indicating a problem. Watch YOUR
server to determine what is normal in YOUR environment and then tweak SET
parameters or make other changes as needed. It is also ok to get DSTRACE errors. If DS
is trying to process a request, and another request is already in process, you can get a DS
error until the first process completes. It is important to establish what is normal for your
environment so that you can accurately determine when you have a real problem and
when you have simply hit against the limitations of your hardware and/or software.
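The rules of thumb listed under item 23 above can be written down as simple checks. This is a minimal Python sketch, assuming you have copied the counters from monitor.nlm by hand. The function and parameter names are hypothetical, and the thresholds are only the approximate figures quoted in this document; treating a cache buffer percentage below 60 as a warning is an interpretation, not a Novell rule. Adjust the numbers to what is normal in YOUR environment.

```python
def health_warnings(total_packets, lan_errors, cache_buffer_pct, high_util_minutes):
    """Return a list of warnings based on the rules of thumb in this document."""
    warnings = []
    # More than 10% of total packets in error is described as a high number.
    if total_packets > 0 and lan_errors / total_packets > 0.10:
        warnings.append("LAN errors exceed 10% of total packets")
    # Cache buffer percentage should usually be around 60 - 70% (assumed floor: 60).
    if cache_buffer_pct < 60:
        warnings.append("cache buffers below 60%: server memory may be low")
    # Utilization is a concern if it stays high for more than 10 or 20 minutes.
    if high_util_minutes > 10:
        warnings.append("utilization has stayed high for more than 10 minutes")
    return warnings

for warning in health_warnings(total_packets=200000, lan_errors=30000,
                               cache_buffer_pct=45, high_util_minutes=3):
    print(warning)
```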
Troubleshooting An Abend - Narrowing (Isolation) & Duplication
If the problem is not solved by now it's time to roll up your sleeves and troubleshoot.
Now that the preliminary steps have been covered, and the initial data collected,
troubleshooting is primarily a matter of going back and forth between "Problem
Isolation" and "Problem Duplication": trying to narrow in on the problem, while at the
same time trying to discover a sequence of events that will reproduce the problem. In
simplest terms, an Abend is caused either by a hardware failure or by a misbehaved
NLM. In either case the result is usually corrupted memory. Remember from the
introduction, a software exception occurs when NetWare fails a consistency check
(performed in memory, on memory), and that a processor exception occurs when the
processor encounters an address or machine instruction that does not comply with the
rules. Again, corrupted memory.
Problem Isolation & Problem Duplication are almost the same. The main difference is
that in Problem Duplication, you are specifically trying to reproduce a problem. There
may be an nlm that Abends the server every time it loads. Or perhaps the server Abends
whenever someone does a copy while someone else is logging in. If you are able to find a
reproducible problem like this you can now eliminate variables one at a time, try the test
again, and see if the problem goes away. In the other case, Problem Isolation
(Narrowing), the data may not have given you a clue so you have to probe around, trying
different things to see if you can narrow the problem down to a system or component.
You may be able to determine that the problem is isolated to the disk channel because it
only happens when the disk is being accessed. Or, you may be able to relate the problem
to a certain nlm such as a backup. Consider these systems when trying to isolate a
problem:
- Disk channel, - Lan channel,
- System board, - Com port,
- the NetWare OS, - a 3rd party nlm product,
- the cabling, - a certain type of workstation ( Win95 vs. Win 3.1 vs. dos),
- a certain type of shell (netx, vlm, client32, 3rd party client).
Remember that the objective is to find a sequence of events that will reproduce the
problem, or at least narrow down to a system, an nlm, or a piece of hardware that is
always involved when the Abend occurs. The following troubleshooting ideas should
help you to "Divide & Conquer". This list is in no particular order, but it is grouped
somewhat by lan, disk, and general troubleshooting ideas.
Here are some Problem Isolation & Problem Duplication Ideas.
1. Use "server -ns," or "server -na" to bring the server up without executing the
startup.ncf, or the autoexec.ncf respectively. Loading "server -ns" will allow you to
bring up the server without the volume mounting automatically. These parameters
also work for SFT3.
2. Does the Abend message itself suggest anything? LAN channel, disk channel,
memory corruption, system board problem, a certain nlm, printing, a certain piece of
hardware, a certain lan segment, a workstation, a router, an environmental condition,
etc.
3. Use "server -na" to prevent the autoexec.ncf from running. Then load nlm's
manually, one at a time.
4. Use "server -ndb" to prevent the DS database from loading and thereby eliminate
directory services. Note that you won't be able to log in without the database
loaded.
5. What is the age of the hardware? If nothing has changed in the environment then
the hardware may have simply failed. Don't assume that new hardware is always
good.
6. Could hardware have received static shock? Static is not always destructive, often it
will cause degenerative damage to your hardware allowing it to continue to work for
a time.
7. Check with the vendor of 3rd party products to see if they are aware of the kind of
problem you are experiencing.
8. Temporarily unload any 3rd party nlms.
9. Unload virus scan, and server/lan monitoring nlms.
10. Could there be power problems either at the power source or from the power
supply?
11. Check the cooling fan in the power supply, the case, and on the cpu. Heat will cause
hardware failure.
12. A dry, hot or dusty environment can cause hardware degradation due to static
electric discharge. It also increases the chance of NMI errors.
13. Avoid interrupts 15, 2, and 9, in that order.
14. Try to isolate the problem to a hardware subsystem, i.e., LAN Channel, Disk
Channel, or System Board. You can swap hardware or try a different interrupt or
slot.
15. Although the Abend message is very generic it can still be used to point you in a
direction. Most Abends indicate memory corruption. Some will be disk related,
others lan related. Often an Abend will include a function name, for example:
"Abend: DeallocateMappedPage was supplied an invalid memory pointer." The
word that appears without spaces (DeallocateMappedPage) is the name of a function
in NetWare's code. In this Abend a memory pointer was passed to the
DeallocateMappedPage function. During a consistency check the function
determined that the pointer was not a valid address. When an Abend mentions
"...interrupted...", take a look at the lan, simply because the lan does more interrupting
than anything else in the server. As is always the case, this becomes more intuitive
with experience.
16. Clean and re-seat the cards and cables. Remember static protection.
17. How long has the server been installed? If it is a new install (less than a month) you
may still have configuration and set up issues, or you could have faulty hardware
...even if it is new.
18. If you have a 16bit card in a machine with more than 16 meg of Ram, are there load
parameters for the drivers?
19. Go back to basics. Every technician's tool kit should have an NE2000 lan card, and
a basic no-nonsense disk card to use for testing purposes.
20. If the machine has PCI cards try a non-PCI card if possible.
21. Server.exe, like any other file, can become corrupt. A corrupt server.exe can be
difficult to track down. If you can reduce the environment to not much more than a
server, a disk and a lan, and you still have the problem, try a fresh copy of server.exe.
The same idea applies to any other nlm on the server. Remember, the server.exe in
NetWare v3.x contains the server license number. Don't copy the wrong server.exe
file.
22. Run bindfix, dsrepair, or vrepair.
23. Load install and view partition or mirroring information. Install retrieves this
information dynamically. There can be partition table corruption, for example, that
is not surfacing as such. When you try to access partition information in install,
you'll get a specific error because the table cannot be read.
24. Always double and triple check termination, interrupt settings, SCSI ID, drive
translation (should be off), etc.
25. Run vrepair. If vrepair runs clean (zero errors) and you still suspect disk corruption,
change the vrepair options. Option 2 from the main menu will change the vrepair
options. Set them to "Write changes immediately...," "Write all changes....," and
"Purge deleted files." (This is not exact syntax.) Don't be alarmed by a LOT of
errors after these changes. Note: Because of the nature of what vrepair is doing,
Novell recommends that you always have a verified good back up before you run
vrepair.
26. Try different workstations.
27. Try workstations on different segments.
28. Can you isolate the lan completely by attaching a single workstation directly to the
server?
29. Category 5 cabling is usually required on faster lan cards. Running the wrong cable
can cause problems.
30. Heavy I/O will sometimes stress the server enough to force an error. Try doing a
copy *.* or an xcopy continuously. Try it from several workstations.
31. Use ncopy to copy a file where the source and destination are the same server.
Ncopy will not send any packets across the lan in this scenario. If you still have the
problem, you know the lan is not part of the problem. Any other form of copy
(copy, xcopy), where the server is the source or destination, will send the file across
the wire even if it is just going to be sent back a second time across the wire. If copy
has the problem but ncopy does not, then the problem is probably on the lan.
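The reasoning in the last item above (ncopy versus copy) amounts to a small decision table. The Python sketch below restates it; the function name lan_verdict and the wording of the returned strings are hypothetical, and the verdicts are only as strong as the "probably" in the text.

```python
def lan_verdict(ncopy_fails, copy_fails):
    """Interpret the ncopy/copy comparison.

    ncopy_fails: the problem occurs during an ncopy whose source and
                 destination are the same server (no packets cross the lan).
    copy_fails:  the problem occurs during copy or xcopy, which sends the
                 file across the wire.
    """
    if ncopy_fails:
        return "problem occurs without lan traffic: the lan is not part of the problem"
    if copy_fails:
        return "only the over-the-wire copy fails: the problem is probably on the lan"
    return "neither test reproduced the problem: try other isolation ideas"

print(lan_verdict(ncopy_fails=False, copy_fails=True))
```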
Using Novell Technical Support
If you need further troubleshooting direction, your first step should be to call a Novell
Authorized Service Center (NASC). These Gold and Platinum dealers are Novell
NetWare trained and willing to help you. To find the service center closest to you, call
1-800-NET-WARE (638-9273) between 7:30am and Midnight CST, and choose option 1, 2.
If you still need to contact Novell Technical Support, do the following before you
call.
1. Find config.nlm included in TABND2.exe file, and run it at your server. Read the
associated read-me file for specific instructions. The file that config.nlm creates will
contain important server information that we can use to help troubleshoot your
Abend.
Note: Update your drivers and load the current OS patches before you run config.
We will want to look at your config information only after this has been done.
2. Next, fill out the form in Appendix A. Have this available for reference or to fax to
us if needed.
3. At this point open an incident with Novell Technical Support.
4. Consider the possibility that you may need to get a core memory dump from your
server. A core memory dump takes a "snapshot" of the server's RAM as it looks at
the time of the Abend. The core dump is discussed further on in this document.
DO NOT automatically take a core dump. Wait until a Technical Support Engineer
instructs you to do so. Also, Do not send us core dumps from servers that do not
have the patches and current LAN and disk drivers loaded. We cannot spend
time troubleshooting a problem that has already been resolved by current patches or
updated software. Make sure you have the current patches and current LAN and
disk drivers!
Getting a Core Dump
If nothing to this point has solved the problem it may be necessary to get an image of your
memory (Core Memory Dump) for us to analyze.
DO NOT automatically take a core dump. Wait until a Technical Support Engineer
instructs you to do so. Also, Do not send us core dumps from servers that do not have
the current patches and drivers loaded. Current patches and drivers fix known
problems. We cannot spend time looking at a problem we've already solved. Make sure
you have the current patches and drivers!
Dump to Floppy: This is the simplest method, but is not practical if you have more than
16 MB of RAM. When the server Abends you are asked if you want to
".. copy the diagnostic image to disk." In NW3.12 and NW4.x the
image will go to the c: drive by default, with an option to write it to
floppy.
To copy the image to floppy you need to have enough blank, formatted
floppies to hold your entire server memory.
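To see why the floppy method quickly becomes impractical, the arithmetic can be sketched in Python. This assumes that the image is roughly the size of installed RAM and that each 1.44 MB floppy holds 1,457,664 bytes of file data once formatted under DOS. Both are assumptions made for illustration, and the function name is hypothetical.

```python
import math

# Assumed usable file space on a DOS-formatted 1.44 MB floppy, in bytes.
FLOPPY_BYTES = 1_457_664

def floppies_needed(ram_megabytes, floppy_bytes=FLOPPY_BYTES):
    """Estimate how many floppies are needed to hold an image of all server RAM."""
    image_bytes = ram_megabytes * 1024 * 1024
    return math.ceil(image_bytes / floppy_bytes)

for ram in (8, 16, 32, 64):
    print(ram, "MB of RAM ->", floppies_needed(ram), "floppies")
```

The same calculation, with the free space on C: in place of the floppy size, tells you whether the DOS partition can hold the image.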
Dump to Hard Drive: This method is much faster than dumping the image to floppy. In this
case the core dump is copied to the C:\ partition of the server as a file
called coredump.img. To copy the image to the C: drive you must
have enough free space to hold the entire server memory. If you have
not planned this extra space into your c: partition you may be able to
add an inexpensive IDE drive and use the entire drive as a dos
partition.
After the image is copied to the drive the server can be brought back
up and users can log in. You will then need imgcopy.nlm. This nlm is
contained in the download file imgcpy.exe. When imgcopy.nlm is
loaded it will allow you to copy coredump.img from the c: partition to
the NetWare partition. Once on the NetWare partition, the file can be
ftp'd to us or it can be backed up on tape and the tape can be sent to
us. Before you do this, make sure that we have the software needed to
restore the file.
Forcing a Core Dump: Occasionally you'll need to force a core dump during a certain
condition other than after an Abend. For example, the server may be
experiencing high utilization and a core dump is needed at the time the
utilization is high. A core dump can be taken at any time by breaking
into the OS's internal debugger. NOTE that if you break into the
debugger while the server is in use, workstations will not be able to
communicate with the server and will probably time out.
Break into the debugger by holding, simultaneously, <left shift>,
<right shift>, <alt>, <esc>. Here are a few commands to be used
while in the debugger.
.c Force a core dump.
q Quit to dos.
g Go back to the point where you came into the debugger.
h Help
Dump to Server: This is the fastest way to write the dump. It is called the network
method. The problem server must have an additional lan card (must
be ethernet) that gives it a client connection to a second server that
will be the destination for the image file. When the problem server
Abends the core dump is sent across the lan connection to the
destination server. This is how to setup for this type of coredump:
1. Down the problem server and power it off.
2. Put in an additional Lan Card (Ethernet Only).
3. Bring the server back up to a dos prompt.
4. Set up an nwclient directory, and load the client drivers.
5. Log in to the server that will be the destination server, and
map a drive to the place you want the dump to be sent to (for
example: f:\dumpdir).
6. Go to the destination server and write down your connection
number (Get this from monitor). Also, write down the drive
mapping (ie: F:\dumpdir).
7. Bring up the problem server.
8. You should test the connection by going into the debugger
and initiating a core dump. Press the following keys all at
once (Alt , L-Shift, R-Shift, Esc).
9. At the # prompt type .c <return>.
10. Follow the prompts with a "Y" to dump to a hard drive.
When you are prompted for a path, enter,
f:\dumpdir\coredump.img (or the mapping you wrote
down if it's different). Press <return>. The image should
start copying.
11. If this works, go on to the next step. If it doesn't, here are
some of the problems we've seen:
- Hardware configuration problem on the additional Lan
Card.
- Using Netx with some versions of lan drivers produces
bad packets.
- Connection to the second server, routing, network, etc.
- Other normal client-to-server troubleshooting issues.
12. If you're able to duplicate the Abend, take a core dump.
Most likely, however, you will have to wait until the next
time it Abends on its own. Your default connection time
to the destination server will be 15 minutes. In order to
hold the connection longer you have two choices:
- Increase the watchdog time out parameters, or
- Load netalive.nlm (included in the TABND2.EXE file).
You will load netalive with two parameters, the name of
the problem server, and the name of the destination server
(See the readme with netalv.)
You can use monitor.nlm on the destination server to see if
your connection is being maintained.
You will need an open support incident with Novell Technical Support
before sending the core dump. At that time you will be given a
location to ftp the core dump to, or an address where you can mail a
tape.
Appendix A - Check list/Summary
Incident Number: ________________ Name: __________________________
Phone: ______________________
O/S version _______________ DS version ________________
Amount of RAM ____________
Make/Model of Machine (indicate if a clone)/Bus Type __________________________
LAN card, driver name, driver date & version __________, __________, ___________
LAN card, driver name, driver date & version __________, __________, ___________
HBA (controller), driver name, driver date & version _______, ________, __________
List the devices on this HBA: __________________________________
HBA (controller), driver name, driver date & version _______, ________, __________
List the devices on this HBA: __________________________________
Are your drives mirrored? Y / N Duplexed? Y / N Total volume space? __________
1. Have you updated the LAN and disk drivers? Y / N
2. Have you applied all the appropriate OS patches? Y / N
3. Have you tried a fresh copy of Server.exe? Y / N
4. Is your clib.nlm current? Y / N
5. Have you virus scanned the DOS and NetWare Partitions? Y / N
6. What other information do you have that may help troubleshoot this problem?
7. What changes have been made to the server recently? (Increased number of users, new
software, upgraded software, new or different hardware, LAN or router changes,
workstations or file server physically moved, power outages, set parameter changes,
etc...)
8. What hardware has been swapped out already?
9. Do you have config.txt ready to upload?
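Questions 1 through 5 above are yes/no prerequisites, and this document asks that drivers and patches be current before any data is sent in. A minimal Python sketch of that check follows. The item wording is paraphrased from the form, and the names CHECKLIST and open_items are hypothetical.

```python
# Yes/no items paraphrased from questions 1 through 5 of the form above.
CHECKLIST = [
    "Updated the LAN and disk drivers",
    "Applied all the appropriate OS patches",
    "Tried a fresh copy of Server.exe",
    "clib.nlm is current",
    "Virus scanned the DOS and NetWare partitions",
]

def open_items(answers):
    """Return the checklist items not yet answered Y.

    answers maps an item's text to True (Y) or False (N);
    an item missing from the mapping counts as N.
    """
    return [item for item in CHECKLIST if not answers.get(item, False)]

answers = {CHECKLIST[0]: True, CHECKLIST[1]: True}
for item in open_items(answers):
    print("Still to do:", item)
```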
Appendix B - Dealing With An NMI Error
As mentioned in the main body of this document, an NMI error is a hardware problem.
There are three types of interrupts that a processor can handle: a maskable hardware
interrupt (INTR), a non-maskable hardware interrupt (NMI), and a software interrupt
(INT). The processor has a dedicated line on the system board bus that handles only
non-maskable hardware interrupts. According to Intel's i486 Microprocessor Hardware
Reference Manual, this NMI line can be asserted as a result of one of three catastrophic
events: 1) an imminent power loss, 2) a bus-transfer parity error, or 3) a memory-data
parity error. When this NMI line is asserted the processor generates an NMI error. This
error is received by the NetWare operating system and then reported to the console
screen. There are two flavors of NMI errors, "Abend: NMI parity error generated by IO
check," and "Abend: NMI parity error generated by System Board." If the NMI is
generated by the system board there is a fairly good chance the problem is with the
system board or its memory, although it can still be elsewhere. If the NMI is generated
by an IO check, the problem could be anywhere. Here is a list of hardware related items
that we have found to cause NMIs. These ideas should help you as you troubleshoot an
NMI error.
1. Faulty RAM.
2. Faulty system board
3. Any I/O card. Especially cards with on-board memory.
4. Low or fluctuating power at the power source. Remember, a UPS can go bad too.
5. Power supply going bad.
6. Memory extension boards.
7. System board memory that is mismatched in either speed or brand.
8. Conflicting interrupts.
9. Try cleaning and reseating cards/cables/and memory modules.
10. Incompatibility between hardware pieces.
11. Look at the environment and how the equipment is handled. NMI's can often
be traced back to static electric discharge. A sometimes overlooked point is that
static does not always cause immediate failure, the damage can be degenerative.
The hard failure may not occur until sometime in the future.
12. This is rare, but in two separate cases NTS has seen a hard drive and a printer
cause NMI errors.
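The distinction between the two NMI messages described above can be captured in a few lines of Python. This is a sketch only: it matches on the message wording quoted in this appendix, which is an assumption about what appears on your console, and the function name nmi_hint is hypothetical.

```python
def nmi_hint(abend_message):
    """Suggest a starting point for an NMI Abend, or None for other Abends."""
    text = abend_message.lower()
    if "nmi" not in text and "non-maskable" not in text:
        return None
    if "system board" in text:
        return ("Start with the system board and its memory, "
                "although the problem can still be elsewhere.")
    if "io check" in text:
        return "The problem could be anywhere; work through the whole hardware list."
    return "NMI reported: treat it as a hardware problem."

print(nmi_hint("Abend: NMI parity error generated by System Board"))
print(nmi_hint("Abend: NMI parity error generated by IO check"))
```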
Appendix C - How To Access The NetWare OS Patches And Updated Files
What file to download?
For Novell's official statement on OS patches see the document Patch.doc.
The file "Patlst.txt" is updated regularly to reflect current patches and files that are
available from Novell's Technical Support. This list can be downloaded from the online
services listed below. You can view the current version of this file from Novell's
Internet site: http://netwire.novell.com (choose Technical Support / File Updates /
Current minimum OS & NLM updates).
Where to get updated files and patches?
1) CompuServe: Go Netwire
Go NWOSFiles (OS Files)
Go NWGenFiles (General Files)
Go NovFF (Novell File Finder)
Go NSD
2) Internet: Novell Web Page: http://www.novell.com (choose Technical Support / File Updates)
Patlst.txt: http://netwire.novell.com/FileUpdt/patlst.html
Files in the NSD area use "ftp://ftp.novell.com/pub/netwire/nsd/<filename>"
3) MS Network Online: Go Netwire
4) Space Works: 800-577-2235
5) Novell BBS 801-373-6999
Appendix D - Troubleshooting Tools
Each of the files and documents included in the TABND2.EXE file is provided as a
troubleshooting aid. Some are for specific problems and some are for troubleshooting in
general. Here is a list of the diagnostics and documents found in TABND2.EXE. See
README.TXT, or the documents associated with each troubleshooting tool, for a more
detailed description. With the exception of HDUMP.NLM, each of these utilities will run on
a 3.x or a 4.x server.
README.TXT      Readme file for TABND2.EXE.
HDUMP.NLM       Used to aid you in taking a core dump on a NW3.11 server.
CONFIG.NLM      Used to document a server's configuration.
FCONSOLE.EXE    Used to down a file server from a workstation.
IMGCOPY.NLM     Used to transfer a core dump from a DOS partition to the NetWare
                volume.
NETALIVE.NLM    Used to send your core dump to a volume on another NetWare
                server.
410PBOFF.NLM    Allows you to disable packet burst during troubleshooting.
HIGHUTIL.CMP    Technical Information Document (TID) 1005736. Troubleshooting
                high utilization vs. file compression at the NW4.1 server.
HIGHUTIL.SUB    TID1005436. Troubleshooting high utilization vs. NW4.1 file
                system suballocation.
HIGHUTIL.TRB    TID1005963. Troubleshooting high utilization issues in general.
HIGHUTIL.ADD    TID2905856. An addendum to TID1005963, troubleshooting high
                utilization.
PATCH.DOC       TID1007561. A statement explaining the use of NW OS patches.
RCSI.APP        February 1995 Application Note, "Resolving Critical Server Issues."
RECOVERY.APP    June 1995 Application Note, "Abend Recovery Techniques for
                NetWare 3 and 4 Servers."
RECOVERY.BMP    Flow chart graphic that goes with the document, "Abend Recovery
                Technique ..."
TABEND.WP6      This document.
TABEND.TXT      Text version of this document.
TABENDS.WPG     Graphic of the Troubleshooting Abends flow chart.
Appendix E - Using the Internal Debugger
You can break into the NetWare debugger at any time. NOTE: if you break into the
debugger while the server is in use, workstations will not be able to communicate with
the server and will probably time out. Break into the debugger by holding down
<left shift>, <right shift>, <alt>, and <esc> simultaneously. Here are a few commands
to use while in the debugger. These commands may be different or unavailable depending
on your version of NetWare.
.c    Force a core dump.
.r    Current running process.
v     View open screens.
?     Current running NLM.
h     Help.
.h    More help.
q     Quit to DOS.
g     Go back to the point where you came into the debugger.
Appendix F - Faxback Service
Novell Technical Support (NTS) maintains a Faxback service with documents on known
issues, top ten current issues, and commonly asked general information. You can reach the
NTS Faxback service by calling 801-861-5350, or call 800-NetWare (638-9273). Choose
option 2 (for Technical Support), then option 2 (for the Faxback service), then option 1
(to skip the Faxback introduction), then option 1 (if you have the document number) or
option 2 (for a catalog). Below is a listing of some of the documents available on the
Faxback as of May 29, 1996.
Document Title                                        Document Number
What is Drive Deactivation?                           5000513
Trouble Mounting CD-ROMs as NetWare Volumes           5000522
Register Memory in NetWare 3.x and 4.x                5000572
4.x Backup/Restore                                    5000743
Drive Deactivation Troubleshooting Tips               5000893
Troubleshooting Tips for NetWare Directory Services   30065
Partition Troubleshooting Guide                       30072
Patch List and NLM Updates                            30082
Top Server Issues (February 1996)                     1001198
Support Tips For Adaptec EISA, VL, PCI HBAs           5003623
Intel Pentium Floating Point Flaw on NetWare          5004623
HCSSIT.EXE -- HCSS NLM Update                         5005622
SFT III 4.1 Features/Changes Document                 02197415
NetWare v4.1 Multimedia CD Q & A                      10025532
Replacing Failed Hard Drive in Mirrored Group         10024983
High Utilization Recommendations                      10059634
High Utilization and Compression Document             10057362
High Utilization and Suballocation Document           10054363
Directory Entries Document                            12020466
PCI Technology Document                               12024722
File Listings
NetWare 3.11 File Listing                             40004
NetWare 3.12 File Listing                             40012
NetWare 4.01 File Listing                             40023
Novell Technical Support
To: TABND Feedback
Fax Number: 1-801-861-5988
From:
Use this form to influence what goes into future updates of this troubleshooting
document, or send your comments by email to Dshaver@Novell.com. This is for
feedback only. Novell will not be able to respond to you personally. Thanks for your
feedback.
Some things we would like to know include:
1) Were you able to solve your problem without opening an incident with Technical
Support?
2) How can we make this document or the included files more useful to you?
3) Are there other issues that might lend themselves to this type of document?
4) What other comments / suggestions do you have for us?
Number of Pages (including cover sheet): ____
Novell, Inc. / MS E31-1 / 122 East 1700 South / Provo, UT 84606
Telephone 800-638-9273 / Telex 37895941 / Alternate Fax 801-861-5200