NetWare Directory Services Troubleshooting Tips
May 5, 1995

Disclaimer

Novell, Inc. makes no representations or warranties with respect to the contents or use of this manual, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. Further, Novell, Inc. reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes.

Novell, Inc.
122 East 1700 South
Provo, UT 84606

Trademarks

Novell and NetWare are registered trademarks of Novell, Inc.

Troubleshooting Tips for NetWare Directory Services

Introduction

NetWare Directory Services (NDS) is a distributed database and, as such, is loosely consistent. Errors that come and go without intervention are therefore normal and no cause for alarm. When an error occurs consistently for more than an hour, however, something is generally wrong and we recommend further investigation.

This document outlines general strategies for troubleshooting NDS problems. The intent is to help System Administrators gather pertinent information and isolate their problem before placing a service call with Technical Support. By following these guidelines, System Administrators can help Technical Support resolve their issue quickly and perhaps even avoid placing a service call.

Preventing Problems

It is important to stay current with patches, to read the release notes and readme files whenever upgrading or installing, and to avoid experimenting on production trees. If there is a warning notice on an operation in NWAdmin, Partition Manager, or DSRepair, rest assured that it is there for a good reason! Many options available in DSRepair are for advanced troubleshooting and should only be used in very specific situations. Partition Manager should be used for ALL partition operations (e.g., deleting a replica from a server, changing a read/write replica to a master).
If a utility such as Partition Manager or NWAdmin returns an error during an operation, resolve the error instead of forcing the operation through DSRepair. When in doubt, place a service call and take no action until advised to do so by a Technical Support Engineer.

The current NDS files can be found on the NSEPro, on CompuServe, and on the Internet. The NetWare 4.x forum on NetWire is NOVLIB 14. Information on current issues is available from the same sources and from the FaxBack service. To obtain FaxBack documents, call 1-800-NetWare, press 2, then 2 again, and follow the instructions, or call FaxBack directly at (801) 429-5350.

Identifying the Problem

If you are experiencing a problem with NDS, you are generally having trouble with one or more partitions, not with the entire tree. The first step in identifying the problem is to determine which partition is having trouble and then check the status of the servers holding replicas of that partition. If multiple partitions have problems, take a top-down approach: start at the top of the tree, resolve one partition's errors, and then move down to the next partition.

DSRepair and DSTrace are the best tools for finding NDS error conditions. In a tree consisting only of 4.1 servers, load DSRepair on a server holding a replica of the problem partition and select Replica Synchronization from the Available Options menu. This synchronizes all replicas held on this server with all other replicas of the same partitions stored on other servers. The log file reports the partition being synchronized, the server on which the synchronization was performed, and the status of the synchronization according to that server. The status column is especially helpful because it shows an error code whenever a synchronization fails.

The information shown on the DSRepair screen after a Replica Synchronization comes from a file in SYS:SYSTEM named DSREPAIR.LOG.
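The top-down rule can be sketched in a few lines of Python. This is a minimal illustration, assuming partition names are written leaf-first with periods separating components (e.g. "OU=Sales.O=Novell") and the top of the tree written as [Root]. The partition names and the helpers `partition_depth` and `top_down_order` are hypothetical, and escaped periods inside a name component are not handled.

```python
# Hypothetical helpers: the partition names are illustrative, not from a real tree.
# Names are assumed to be leaf-first with periods separating components
# (e.g. "OU=Sales.O=Novell"); escaped periods inside a component are NOT handled.


def partition_depth(name: str) -> int:
    """Distance from the top of the tree: 0 for [Root], else the component count."""
    if name == "[Root]":
        return 0
    return len(name.split("."))


def top_down_order(problem_partitions: list[str]) -> list[str]:
    """Order partitions so that those nearest [Root] are worked on first.

    A child always has more components than its parent, so sorting by depth
    puts every parent ahead of its children. Ties at the same depth are broken
    alphabetically so the order is stable from run to run.
    """
    return sorted(problem_partitions, key=lambda name: (partition_depth(name), name))


if __name__ == "__main__":
    problems = ["OU=East.OU=Sales.O=Novell", "OU=Sales.O=Novell", "[Root]", "O=Novell"]
    for step, partition in enumerate(top_down_order(problems), start=1):
        print(f"{step}. Resolve errors in {partition}, then move down")
```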
To manage the log file, go to the DSRepair main menu and choose Advanced Options, then Log file and login configuration. You can also view the file with a text editor and, if necessary, send it to Technical Support. An example is shown below:

    /****************************************************************************/
    Netware 4.1 Directory Services Repair 4.25 , DS 4.77
    Log file for server "Saturn.novell" in tree "Galaxy"
    Start: Thursday, May 4, 1995 10:32:00 pm Local Time

    Synchronizing Replica: [Root]
    Performed on server: Mars.Novell

    Servers that contain a replica            Replica Type      Status
    -----------------------------------------+-----------------+----------------
    Mars.Novell                               Subordinate       Host
    Saturn.Novell                             Master            OK
    Earth.Servers.Novell                      Read Write        OK
    Mercury.Novell                            Read Write        -625

In the above example, the [Root] partition is being synchronized and the replica being read is the one stored on the Mars server. According to the Mars server, synchronization succeeded with both Saturn and Earth but failed with a -625 error on the Mercury server.

It is normal to see -683 (INVALID_API_VERSION) errors during replica synchronization in a mixed tree (4.0x and 4.1 servers in the same tree).

In mixed environments and in 4.0x trees, you may need to use DSTrace to identify the problem. DSTrace is a debugging screen on the server console. To enable the screen, type SET DSTRACE=ON. To force an immediate synchronization, type SET DSTRACE=*H. Then toggle to the Directory Services screen, where the synchronization of the replicas stored on that server is displayed.
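Because DSREPAIR.LOG is plain text, the replica table in a Replica Synchronization report can be scanned for failures with a short script. The Python sketch below assumes the layout of the sample report shown earlier (a dashed rule directly above the rows, the server name as the first column, the status as the last); the names `parse_replica_sync` and `failures` are hypothetical, and other DSRepair versions may format the log differently. Statuses of OK and Host are treated as normal, and -683 can optionally be ignored for mixed trees.

```python
import re
from dataclasses import dataclass

# Sample copied from the Replica Synchronization report in the text (banner lines omitted).
SAMPLE_LOG = """\
Synchronizing Replica: [Root]
Performed on server: Mars.Novell

Servers that contain a replica            Replica Type      Status
-----------------------------------------+-----------------+----------------
Mars.Novell                               Subordinate       Host
Saturn.Novell                             Master            OK
Earth.Servers.Novell                      Read Write        OK
Mercury.Novell                            Read Write        -625
"""

ERROR_CODE = re.compile(r"^-\d+$")  # a status such as -625 is an NDS error code


@dataclass
class ReplicaStatus:
    partition: str
    performed_on: str
    server: str
    replica_type: str
    status: str


def parse_replica_sync(log_text: str) -> list[ReplicaStatus]:
    """Collect one ReplicaStatus per row of each replica table in the log."""
    rows = []
    partition = performed_on = ""
    in_table = False
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("Synchronizing Replica:"):
            partition = line.split(":", 1)[1].strip()
            in_table = False
        elif line.startswith("Performed on server:"):
            performed_on = line.split(":", 1)[1].strip()
        elif line.startswith("---") and "+" in line:
            in_table = True  # the dashed rule sits directly above the rows
        elif not line or line.startswith("*"):
            in_table = False  # a blank line or a *** marker ends the table
        elif in_table:
            parts = line.split()
            if len(parts) >= 3:
                # Server name first, status last, replica type (one or two words) between.
                rows.append(ReplicaStatus(partition, performed_on, parts[0],
                                          " ".join(parts[1:-1]), parts[-1]))
    return rows


def failures(rows: list[ReplicaStatus], mixed_tree: bool = False) -> list[ReplicaStatus]:
    """Rows whose status is an error code; -683 is skipped when mixed_tree is True."""
    return [r for r in rows
            if ERROR_CODE.match(r.status) and not (mixed_tree and r.status == "-683")]


if __name__ == "__main__":
    for r in failures(parse_replica_sync(SAMPLE_LOG)):
        print(f"{r.partition}: {r.performed_on} -> {r.server} failed with {r.status}")
```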
An example is shown below:

    Line
    00  2FA9A1A3:152:FB018000 (95/05/04 21:55:15)
    01  SYNC: Start sync of partition <[Root]> state:[0] type:[0]
    02  SYNC: Start outbound sync with (2) [6B00023C]
    03  SYNC: sending updates to server
    04  2FA9A1A3:355:FB018000 SYNC: update to server successfully completed
    05  2FA9A1A3:370:FB018000 SYNC: Start outbound sync with (3) [010001A3]
    06  SYNC: sending updates to server
    07  2FA9A1A4:529:FB018000 SYNC: update to server successfully completed
    08  2FA9A1A4:542:FB018000 SYNC: Start outbound sync with (1) [01000134]
    09  2FA9A1A4:549:FB018000 (21:55:16) SYNC: failed to communicate with server ERROR:
    10  -625
    11  SYNC: End sync of partition <[Root]> All processed = NO.

Line 01 shows the partition being synchronized ([Root]), the state of the replica (0, which means "on"), and the replica type (0=Master, 1=Read/Write, 2=Read Only, 3=Subordinate Reference). Line 02 shows the start of the first outbound synchronization for the partition, to server Mars, and line 04 indicates that the synchronization with Mars was successful. Lines 05 through 07 show the outbound synchronization to server Earth completing successfully. Line 08 shows the start of an outbound synchronization to server Mercury, and lines 09 and 10 show that synchronization failing with a -625 error. The end result, on line 11, is that the partition could not complete synchronization with all replicas. The goal is to see "All processed = YES" for every partition.

To log the DSTrace information to a file, enter the following at the console prompt:

    SET DSTRACE=ON    (Turns on the Directory Services screen)
    SET TTF=ON        (Turns on Trace To File)
    SET DSTRACE=*R    (Resets the trace log file to zero bytes)
    SET DSTRACE=*H    (Forces an immediate synchronization)

Then toggle to the DS screen and wait for the synchronization cycle to complete before entering:

    SET TTF=OFF       (Turns off Trace To File, closing the log file)

This sends the DSTrace screen output to a file in SYS:SYSTEM named DSTRACE.DBG, which you can view with a text editor and, if necessary, send to Technical Support.
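Once DSTRACE.DBG has been copied off the server, it can be checked for partitions that did not reach "All processed = YES". The sketch below is an illustration built on assumptions: it matches the message text of the sample trace above, allows an error code to be wrapped onto the following line (as with lines 09 and 10), and decodes the replica type using the codes listed above. The regular expression and the name `summarize_trace` are hypothetical, and traces from other DS versions may word these messages differently.

```python
import re

# Replica type codes as listed in the text.
REPLICA_TYPES = {0: "Master", 1: "Read/Write", 2: "Read Only", 3: "Subordinate Reference"}

# One alternative per event of interest. The error code may be wrapped onto
# the next line, so "ERROR:" may be followed by any whitespace, newlines included.
EVENT = re.compile(
    r"Start sync of partition <(?P<start>[^>]*)> state:\[(?P<state>\d+)\] type:\[(?P<rtype>\d+)\]"
    r"|Start outbound sync with \(\d+\) \[(?P<target>[0-9A-Fa-f]+)\]"
    r"|ERROR:\s*(?P<error>-\d+)"
    r"|End sync of partition <(?P<end>[^>]*)> All processed = (?P<done>YES|NO)"
)

# The sample trace from the text, without the "Line" numbering column.
SAMPLE_TRACE = """\
2FA9A1A3:152:FB018000 (95/05/04 21:55:15)
SYNC: Start sync of partition <[Root]> state:[0] type:[0]
SYNC: Start outbound sync with (2) [6B00023C]
SYNC: sending updates to server
2FA9A1A3:355:FB018000 SYNC: update to server successfully completed
2FA9A1A3:370:FB018000 SYNC: Start outbound sync with (3) [010001A3]
SYNC: sending updates to server
2FA9A1A4:529:FB018000 SYNC: update to server successfully completed
2FA9A1A4:542:FB018000 SYNC: Start outbound sync with (1) [01000134]
2FA9A1A4:549:FB018000 (21:55:16) SYNC: failed to communicate with server ERROR:
-625
SYNC: End sync of partition <[Root]> All processed = NO.
"""


def summarize_trace(text: str) -> list[dict]:
    """Return one summary per partition sync cycle found in the trace text."""
    summaries = []
    current = None
    target = None  # ID of the server currently being synchronized to
    for m in EVENT.finditer(text):
        if m.group("start") is not None:
            current = {
                "partition": m.group("start"),
                "state": int(m.group("state")),
                "replica_type": REPLICA_TYPES.get(int(m.group("rtype")), "unknown"),
                "errors": [],  # (target server ID, error code) pairs
                "all_processed": None,
            }
            target = None
        elif m.group("target") is not None:
            target = m.group("target")
        elif m.group("error") is not None and current is not None:
            current["errors"].append((target, int(m.group("error"))))
        elif m.group("end") is not None and current is not None:
            current["all_processed"] = m.group("done") == "YES"
            summaries.append(current)
            current = None
    return summaries


if __name__ == "__main__":
    for s in summarize_trace(SAMPLE_TRACE):
        status = "YES" if s["all_processed"] else "NO"
        print(f"{s['partition']} ({s['replica_type']}): All processed = {status}; errors = {s['errors']}")
```

To check a real trace, one option is to read the copied file with `open("DSTRACE.DBG", errors="replace").read()` and pass the text to `summarize_trace`.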
Other Information to Gather

Time Synchronization and DS Version

Time synchronization problems can prevent partition operations from completing successfully. To check time synchronization for the entire tree in a 4.1 environment, load DSRepair on a 4.1 server holding a replica of the [Root] partition and choose Time Synchronization from the main menu. This writes time synchronization information to SYS:SYSTEM\DSREPAIR.LOG, listing each server's name, timesync type, DS version, and timesync status. A sample is shown below:

    /****************************************************************************/
    Netware 4.1 Directory Services Repair 4.25 , DS 4.77
    Log file for server "Saturn.Novell" in tree "Galaxy"
    Time synchronization and server status information
    Start: Friday, May 5, 1995 9:51:40 am Local Time

                                DS.NLM  Replica    Time        Time is  Time
    Server name                 Version Depth      Source      in sync  +/-
    ---------------------------+-------+----------+-----------+--------+-------
    Mars.Novell                 4.77    0          Secondary   No       +1
    Saturn.Novell               4.77    0          Secondary   Yes      0
    Earth.Servers.Novell        4.77    0          Secondary   Yes      0
    Mercury.Novell              3.10    0          Single      Yes      0
    *** END ***

In a 4.0x tree, check the time synchronization status by typing TIME at the server console of each server. Time synchronization should be active, and the time should be synchronized to the network. To find the DS.NLM version for each server, type MODULES at the server console, locate DS.NLM in the list, and note the number in parentheses on the line immediately after it (e.g., "NetWare Directory Services (310)" means that the DS version is 310).

Server Status

The status for each server in a replica list should be UP.
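The DSRepair time synchronization table can be checked the same way. This sketch assumes the six-column layout of the sample report in the text, with single-word time source names; `parse_time_report` and `summarize` are hypothetical names. It lists servers whose time is not in sync and the distinct DS.NLM versions reported, so each server's version can be noted.

```python
from typing import NamedTuple

# The table from the sample time synchronization report in the text.
SAMPLE_REPORT = """\
                            DS.NLM  Replica    Time        Time is  Time
Server name                 Version Depth      Source      in sync  +/-
---------------------------+-------+----------+-----------+--------+-------
Mars.Novell                 4.77    0          Secondary   No       +1
Saturn.Novell               4.77    0          Secondary   Yes      0
Earth.Servers.Novell        4.77    0          Secondary   Yes      0
Mercury.Novell              3.10    0          Single      Yes      0
*** END ***
"""


class TimeRow(NamedTuple):
    server: str
    ds_version: str
    replica_depth: int
    time_source: str
    in_sync: bool
    offset: str  # the "Time +/-" column, kept as text (e.g. "+1")


def parse_time_report(text: str) -> list[TimeRow]:
    """Parse the rows between the dashed rule and the *** END *** marker."""
    rows = []
    in_table = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("---") and "+" in line:
            in_table = True
        elif line.startswith("***"):
            break
        elif in_table and line:
            parts = line.split()
            if len(parts) == 6:  # skip anything that is not a six-column row
                name, version, depth, source, in_sync, offset = parts
                rows.append(TimeRow(name, version, int(depth), source,
                                    in_sync.lower() == "yes", offset))
    return rows


def summarize(rows: list[TimeRow]) -> tuple[list[str], list[str]]:
    """Servers whose time is not in sync, and the distinct DS.NLM versions seen."""
    out_of_sync = [r.server for r in rows if not r.in_sync]
    versions = sorted({r.ds_version for r in rows})
    return out_of_sync, versions


if __name__ == "__main__":
    out_of_sync, versions = summarize(parse_time_report(SAMPLE_REPORT))
    print("Time not in sync:", out_of_sync)
    print("DS.NLM versions seen:", versions)
```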
Directory Services is designed to function normally when a server in the tree is down for short periods. However, if a server will not be up and available for synchronization for an extended period (more than a few days, depending on how busy the tree is), remove the "down" server from the tree by loading INSTALL on that server and removing Directory Services from it.

If the network is having communication problems, Directory Services may have trouble synchronizing. Checking the status of the servers in the tree can therefore pinpoint LAN issues that adversely affect NDS.

To check the status of servers in a 4.1 environment, load DSRepair on one server, choose Advanced Options, and then select Servers Known to this Database. All the servers found in that server's database are displayed along with their status and ID number (according to the server you are viewing). Any information gathered this way reflects the perspective of the server on which you ran DSRepair, so to get a complete picture of the state of a partition, run this option on most, if not all, servers in the partition's replica list. Any 4.x server showing as "DOWN" or "UNKNOWN" is a problem and should be noted in the Partition Troubleshooting Guide (PARTGUID.xxx).

To check the status of servers in a 4.0x environment, you need an enhanced version of the 4.0x DSRepair, which can be found in 4X241.EXE on NetWire in NOVLIB 14. Load this DSRepair on a 4.0x server as follows:

    LOAD DSREPAIR -UR

This loads DSRepair in unattended mode, gathers the replica ring information, and sends it to DSREPAIR.LOG in SYS:SYSTEM. Running this option does NOT lock the database, so users should not be affected. You can then print or view this text file and send it to Technical Support if necessary.
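Because each server's view of Servers Known to this Database can differ, it helps to record what each server reports and compare the results side by side. The sketch below assumes the statuses have been transcribed by hand into a dictionary keyed by the server on which DSRepair was run; the statuses shown are invented for illustration, and `problem_servers` and `disagreements` are hypothetical helpers. It flags any server reported as DOWN or UNKNOWN and any server whose status depends on which server is asked.

```python
# Hypothetical hand transcription of "Servers Known to this Database" as seen
# from two servers. These statuses are invented for illustration only.
VIEWS: dict[str, dict[str, str]] = {
    "Saturn.Novell": {"Mars.Novell": "UP", "Earth.Servers.Novell": "UP", "Mercury.Novell": "DOWN"},
    "Earth.Servers.Novell": {"Mars.Novell": "UP", "Saturn.Novell": "UP", "Mercury.Novell": "UNKNOWN"},
}

PROBLEM_STATES = {"DOWN", "UNKNOWN"}


def problem_servers(views: dict[str, dict[str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Map each server reported DOWN or UNKNOWN to the (viewer, status) pairs reporting it."""
    problems: dict[str, list[tuple[str, str]]] = {}
    for viewer, statuses in views.items():
        for server, status in statuses.items():
            if status.upper() in PROBLEM_STATES:
                problems.setdefault(server, []).append((viewer, status.upper()))
    return problems


def disagreements(views: dict[str, dict[str, str]]) -> list[str]:
    """Servers whose reported status differs depending on which server is asked."""
    seen: dict[str, set[str]] = {}
    for statuses in views.values():
        for server, status in statuses.items():
            seen.setdefault(server, set()).add(status.upper())
    return sorted(server for server, states in seen.items() if len(states) > 1)


if __name__ == "__main__":
    for server, reports in problem_servers(VIEWS).items():
        print(f"{server}: {reports}")
    print("Views disagree on:", disagreements(VIEWS))
```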
The replica ring information reflects the view of the server on which you ran DSRepair, so to get a good idea of the state of a partition, run this DSRepair option on several, if not all, servers in the partition's replica list and compare the output.

Summary

With the information gathered through the above steps, you can fill out the Partition Troubleshooting Guide (found in PARTGUID.xxx) to summarize partition errors and NDS troubleshooting status. With that information in hand, System Administrators will be well equipped to solve NDS problems or, if necessary, to help Technical Support resolve their issues quickly.

The Partition Troubleshooting Guide (PARTGUID.xxx) contains a chart that helps outline a partition's replica list, answer relevant NDS questions, and spot problem areas. It also includes a sample chart filled in with the information gathered in the examples above. Use this chart to organize your troubleshooting efforts.
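As a rough illustration of organizing the gathered information, the sketch below combines the sample results from this document (replica types and sync status from the DSRepair Replica Synchronization report, time sync status and DS versions from the time synchronization report) into one row per replica. The layout is a hypothetical stand-in, not the actual chart in PARTGUID.xxx, and `ChartRow`, `problem_flags`, and `render_chart` are invented names.

```python
from dataclasses import dataclass


@dataclass
class ChartRow:
    server: str
    replica_type: str   # from the Replica Synchronization report
    sync_status: str    # "OK", "Host", or an error code such as "-625"
    time_in_sync: bool  # from the Time Synchronization report
    ds_version: str


def problem_flags(row: ChartRow) -> list[str]:
    """Short notes for anything in the row that needs attention."""
    flags = []
    if row.sync_status not in ("OK", "Host"):
        flags.append(f"sync error {row.sync_status}")
    if not row.time_in_sync:
        flags.append("time not in sync")
    return flags


def render_chart(partition: str, rows: list[ChartRow]) -> str:
    """Format the rows as a fixed-width text chart for one partition."""
    lines = [f"Partition: {partition}",
             f"{'Server':<22}{'Type':<13}{'Sync':<7}{'Time':<6}{'DS':<6}Problems"]
    for r in rows:
        time_text = "Yes" if r.time_in_sync else "No"
        problems = "; ".join(problem_flags(r)) or "-"
        lines.append(f"{r.server:<22}{r.replica_type:<13}{r.sync_status:<7}"
                     f"{time_text:<6}{r.ds_version:<6}{problems}")
    return "\n".join(lines)


# Values taken from the sample reports in this document.
EXAMPLE_ROWS = [
    ChartRow("Mars.Novell", "Subordinate", "Host", False, "4.77"),
    ChartRow("Saturn.Novell", "Master", "OK", True, "4.77"),
    ChartRow("Earth.Servers.Novell", "Read Write", "OK", True, "4.77"),
    ChartRow("Mercury.Novell", "Read Write", "-625", True, "3.10"),
]

if __name__ == "__main__":
    print(render_chart("[Root]", EXAMPLE_ROWS))
```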