IBM OSD Components for Oracle OPS on Windows NT 4.0
                        Version 1.1
           IBM Netfinity Cluster Enabler README File

This README file contains the latest hints and tips to enhance
reliability and performance of your Netfinity Cluster. Refer to the
"IBM Netfinity Cluster Enabler Hardware and Software Installation
Guide for Oracle Parallel Server" for complete installation and
configuration instructions.

MAJOR CHANGES FROM LAST RELEASE
_______________________________

 o   This version of the IBM Netfinity Cluster Enabler Software supports
     Oracle Version 8.0.5 on Windows NT Service Pack 4.

 o   The IBMGSCFG.exe configuration utility now allows input of a
     database name other than the default name of "OPS". The database
     name specified when configuring Oracle (e.g., using the OPSCONF.exe
     utility) must match the database name specified when using the
     IBMGSCFG.exe utility.


CONTENTS
________


1.0  Tips and Troubleshooting Hints for Installing and Configuring the
     Netfinity Cluster Enabler Software

2.0  How to Obtain the Oracle Patch Set

3.0  Trademarks and Notices




1.0  Tips and Troubleshooting Hints for Installing and Configuring the
     Netfinity Cluster Enabler Software
______________________________________________________________________


     o  Whenever Oracle is reinstalled on a node, you must reinstall the 
        IBM Netfinity Cluster Enabler software. This ensures that 
        dependencies between the IBM and Oracle services are set correctly.

     o  Before updating or reconfiguring the IBM Netfinity Cluster Enabler
        software, the IBMCoreClusterService service must be stopped on
        all nodes.

     o  Symptom: After first installing and configuring the IBM Netfinity
            Cluster Enabler Software, the IBMCoreClusterService fails to
            start. The %installation_dir%\config directory contains
            files named cscomputer.cfg.0 through cscomputer.cfg.n-1 where 
            n is the number of nodes.
        Explanation: During the configuration step, the configuration
            files were not properly distributed to each of the nodes.
        Action: Ensure that all nodes are connected to the interconnect
            and can ping each other. Ensure that at least one free drive
            letter exists on the node from which the IBMGSCFG.exe
            configuration utility is run.
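
        As an illustration of the symptom above, the following minimal
        Python sketch lists any cscomputer.cfg.N files found in a config
        directory. It is not part of the IBM software; the function name
        and the directory argument are assumptions made for this example.

```python
import re
from pathlib import Path

# Matches the file names described in the symptom: cscomputer.cfg.0,
# cscomputer.cfg.1, and so on. A bare "cscomputer.cfg" does not match.
LEFTOVER = re.compile(r"^cscomputer\.cfg\.(\d+)$", re.IGNORECASE)


def undistributed_configs(config_dir):
    """Return the sorted node indexes of cscomputer.cfg.N files found."""
    found = []
    for entry in Path(config_dir).iterdir():
        match = LEFTOVER.match(entry.name)
        if match and entry.is_file():
            found.append(int(match.group(1)))
    return sorted(found)
```

        A non-empty result for the %installation_dir%\config directory
        corresponds to the symptom described in this item.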
  
     o  Symptom: "net stop ibmcoreclusterservice" indicates that
            OracleServiceOPSn is not started, and IBMCoreClusterService
            is not stopped.
        Explanation: When the IBMCoreClusterService is installed, it
            makes itself a dependency of OraclePGMSService.
            OraclePGMSService is itself a dependency of OracleServiceOPS.
            When stopping IBMCoreClusterService from a command line,
            the user is prompted that the two other Oracle services will
            be stopped in order. The order is:

                 1.  OraclePGMSService
                 2.  OracleServiceOPSn

            As a byproduct of step 1, OracleServiceOPSn is stopped.
            Then when step 2 is attempted, the indication that
            OracleServiceOPSn is not started is seen. This terminates
            the "net stop" command, and IBMCoreClusterService is not
            stopped.  This is normal Windows NT behavior.
        Action:  Reissue the "net stop ibmcoreclusterservice" command.
            Alternatively, stop the services sequentially in the
            following order:

                 1.  OracleServiceOPSn
                 2.  OraclePGMSService
                 3.  IBMCoreClusterService

            Alternatively, use the Windows NT Services window to stop
            IBMCoreClusterService.
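
        The sequential stop order listed in this item can be sketched in
        Python as follows. This is a minimal sketch under stated
        assumptions: the function names are hypothetical, and the only
        command used is the "net stop" command already shown in this
        item. The sequence halts at the first command that reports a
        failure, which includes the case where a service is already
        stopped.

```python
import subprocess


def stop_commands(instance_number):
    """Build the "net stop" commands in the order listed in this item."""
    services = [
        f"OracleServiceOPS{instance_number}",
        "OraclePGMSService",
        "IBMCoreClusterService",
    ]
    return [["net", "stop", name] for name in services]


def stop_all(instance_number, run=subprocess.run):
    """Run each command in turn; return the first command that fails.

    Returns None when every command reports a zero return code.
    """
    for command in stop_commands(instance_number):
        result = run(command)
        if result.returncode != 0:
            return command
    return None
```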

     o  Symptom: OPSCONF does not create Net8 configuration to support
            an OPS cluster with more than one public network card.
            The user cannot select instances to start or stop from the
            Oracle Enterprise Manager Console.
        Explanation: Oracle Enterprise Manager Version 1 does not
            support multiple network cards on the agent machine. This
            can affect some operations in Oracle Enterprise Manager
            Console, Oracle Intelligent Agent, and the OPSCONF utility.
            Oracle plans to address this with the next version of these
            programs. Check with Oracle for details of the availability
            of the next version.
        Action:  Use only one public network card on the Agent machine.

     o  Symptom: The Net8 Assistant program does not start on the Oracle
            Enterprise Manager Console.
        Explanation: When the Net8 Assistant program is selected from
            the Windows NT Start-Programs menu, the program may fail to
            start.
        Action:  Ensure that JRE 1.1.6 or later is installed. There are
            specific instructions for installing JRE with Oracle.
            Contact Oracle for instructions to acquire and install the
            JRE program with Net8.

     o  Symptom: After installation of Oracle and the IBM Netfinity
            Cluster Enabler Software, a service or a database cannot be 
            started. 
        Explanation: The symbolic links for the shared disk partitions 
            may not be set up correctly. The symbolic links are set up
            using the SETLINKS program as described in the Oracle 
            Parallel Server "Getting Started" guide book (page 5-3). 
            If the links have not been set up correctly, the problem 
            could be in the input .tbl file for the SETLINKS program.
        Action:  Ensure that there is a Carriage-Return character after
            the last line in the .tbl file used with SETLINKS.
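
        The check in the Action above can be sketched in Python. This is
        a minimal sketch; the function name is hypothetical, and treating
        a bare line feed as acceptable is an assumption, since this
        README only mentions a Carriage-Return character.

```python
def ends_with_line_break(path):
    """Return True if the file ends with CR LF, CR, or LF."""
    with open(path, "rb") as handle:
        data = handle.read()
    # An empty file has no last line and therefore returns False.
    return data.endswith((b"\r\n", b"\r", b"\n"))
```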

     o  Symptom: The Oracle Installer program reports an incorrect
            amount of disk storage on the installation drive.
        Explanation: The actual amount of available disk storage can
            be checked by using Windows NT commands.  There is no
            functional problem due to the reported value.
        Action:  None.

     o  Symptom: Nodes are unable to communicate with each other or
            clients are unable to connect to a node. PING and/or
            TNSPING80 report different IP addresses or fail when
            pinging a node.
        Explanation: PING and/or TNSPING80 against the local node may
            return a different IP address than a PING or TNSPING80 from
            a remote node. This is due to how host names and IP addresses
            are resolved by Windows NT. The result is that two or more
            nodes may be unable to communicate.  When a node pings
            itself, the returned IP address is that of the first
            network adapter card in the Windows NT list. When a node
            pings a remote node, the returned IP address is that of the
            public network.  If the public network is not connected to
            the lowest numbered network adapter card, then the results
            of the two pings can be different.
        Action:  Ensure that the lowest numbered network adapter card in
            the machine is connected to the public network. The private
            network should be connected to a higher numbered network
            adapter card. After installation, the easiest way to correct
            this problem is to switch the adapter cables and IP addresses
            for the installed network adapter cards. Ensure that the
            network adapter properties (e.g., duplex, data rates) are
            changed accordingly.
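
        To illustrate the comparison described in this item, the
        following minimal Python sketch reports host names whose IP
        address differs between two views, for example the address a
        node returns when pinging itself and the address a remote node
        returns when pinging it. The function name and the dictionary
        inputs are assumptions made for this example; the addresses are
        gathered by hand from PING or TNSPING80 output.

```python
def address_mismatches(local_view, remote_view):
    """Return {host: (local_ip, remote_ip)} for hosts that differ.

    Each view maps a host name to the IP address reported for it.
    Hosts missing from remote_view are skipped rather than reported.
    """
    mismatches = {}
    for host, local_ip in local_view.items():
        remote_ip = remote_view.get(host)
        if remote_ip is not None and remote_ip != local_ip:
            mismatches[host] = (local_ip, remote_ip)
    return mismatches
```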
 
     o  Symptom: "SELECT * FROM v$active_instances;" returns invalid
            information. The response may include an incorrect list
            of instances, a message that no rows were found, or
            random characters.
        Explanation: This SELECT statement is valid only when the
            database instances are in a stable state. If a database
            instance is in the process of being shut down, the response
            may be invalid.
        Action:  Reissue the statement after the database instance
            shutdown has completed and the remaining database instances
            are stable.

     o  Symptom: The OraclePGMSService fails to start with error 1067.
        Explanation: This error normally indicates that there is an
            error in the software configuration.
        Action: Ensure that IBMCoreClusterService has been started.
            Ensure that only simple computer names were specified when
            using the IBMGSCFG.exe configuration utility rather than
            Fully Qualified Domain Names. For example, a name such as
            "ops1" should be used rather than "ops1.yourcompany.com".
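
        The naming rule in the Action above can be sketched in Python.
        This is a minimal sketch; the function name is hypothetical, and
        it only tests for a domain part (a dot in the name). It does not
        validate the name in any other way.

```python
def is_simple_computer_name(name):
    """Return True when the name is non-empty and has no domain part."""
    name = name.strip()
    return bool(name) and "." not in name
```

        With the names from this item, is_simple_computer_name("ops1")
        is True and is_simple_computer_name("ops1.yourcompany.com") is
        False.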
         
     o  Symptom: The OraclePGMSService service terminates when
            attempting to start a database instance.  The messages
            "ORA-29702: Error occurred in Group Membership Services
            operation" or "ORA-03113: end-of-file in communication
            channel" might be seen.
        Explanation: When a node joins or leaves the cluster or when a
            database instance is started or stopped, Oracle must perform
            additional processing to complete the startup or shutdown
            of the new node or database instance. During this processing,
            additional membership changes may not be able to complete
            successfully. This is particularly so after a node failure
            when recovery actions are required by the database.
        Action: After a cluster membership change, it is recommended
            that time be allowed for the database state to stabilize before
            initiating another change. For example, when starting
            OraclePGMSService, after receiving the message 
            "The OraclePGMSService service was started successfully" 
            or observing the service status change to "Started" in the 
            Windows NT Services panel, wait at least 30 seconds after the
            service has been reported as started before attempting to 
            start the "OraclePGMSService" service on another node. 
            Similarly, when stopping OraclePGMSService, wait at least 30
            seconds after the service has been stopped before attempting
            to start or stop the service on another node. While 30 seconds
            is usually sufficient, the time can vary depending on
            the database load on the other nodes that have already joined 
            the cluster.
            When starting or stopping a database instance, it may be necessary
            to wait several minutes before performing a similar action on
            another node. These times may be longer if a service or database
            instance was stopped due to a failure on one of the nodes.
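
        The pacing recommended in this item can be sketched in Python.
        This is a minimal sketch under stated assumptions: the function
        name and its parameters are hypothetical, and the caller supplies
        the action that starts or stops the service on a node. The
        30-second default comes from this item and may need to be
        increased, as noted above.

```python
import time


def staggered(nodes, action, settle_seconds=30, sleep=time.sleep):
    """Apply action to one node at a time, pausing between nodes.

    The pause happens after each action completes and before the
    next node is touched; there is no pause before the first node.
    """
    for index, node in enumerate(nodes):
        if index > 0:
            sleep(settle_seconds)
        action(node)
```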

     o  Symptom: When the Oracle Enterprise Manager is used to start or
            stop all database instances together (as opposed to selecting
            an individual database instance), the operation does not
            complete successfully.
        Explanation: It is recommended that Oracle services and database
            instances not be started simultaneously on different nodes.
        Action: Select only individual instances when starting or stopping
            databases on different nodes.

     o  Symptom: The Oracle "shutdown immediate" command does not
            complete within 15 minutes.
        Explanation: After the "shutdown immediate" command is issued,
            it is recommended that the OracleServiceOPSn service also
            be stopped.  In some cases, "shutdown immediate" may take
            several minutes to complete.
        Action: Use the Windows NT Services window to stop
            OracleServiceOPSn or enter "net stop OracleServiceOPSn"
            from a command prompt, where n is the OPS instance number.
            If "shutdown immediate" reports that the database was closed
            and dismounted, then the OracleServiceOPSn may be stopped to
            free up resources of that database. If a message does not
            indicate that the database was closed and dismounted, then
            stopping OracleServiceOPSn may result in the loss of uncommitted
            changes but will not affect the integrity of committed data.

     o  Symptom: The manual startup of a service fails when performed
            immediately after starting up a node and logging on.
            This occurs with one of the following services:
            IBMCoreClusterService, OraclePGMSService, or OracleServiceOPSn.
        Explanation: The system is still performing startup tasks when
            the attempt is made to start up the IBMCoreClusterService.
            This may slow the startup of this service to the point where
            it times out and stops.  Since the Oracle services are
            dependent upon IBMCoreClusterService, they also do not
            start.
        Action: Any of the following actions can be taken:
            - Wait a minute and retry the command to start the service.
            - After logging on to a system that is still starting up,
              wait a minute before attempting to start these services.
            - Set these services to "automatic" startup. This allows
              the system startup processes to complete before the
              services are started. This is the default setting for
              OraclePGMSService when Oracle is installed.
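
        The first action above (wait a minute and retry) can be sketched
        in Python. This is a minimal sketch: the function name, the
        limit of three attempts, and the caller-supplied start function
        are assumptions made for this example. The 60-second delay
        reflects the one-minute wait suggested in this item.

```python
import time


def start_with_retry(start, attempts=3, delay_seconds=60, sleep=time.sleep):
    """Call start() until it returns True or the attempts run out.

    Returns the number of the successful attempt, or None when every
    attempt failed. No delay is added after the final attempt.
    """
    for attempt in range(1, attempts + 1):
        if start():
            return attempt
        if attempt < attempts:
            sleep(delay_seconds)
    return None
```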

     o  Symptom: Excessive shared drive activity on Mondays or Tuesdays.
        Explanation: The Symplicity Storage Manager software used to manage
            the shared storage is set up to do an automatic parity check of 
            all LUNs on the shared storage every Sunday night by default.
            This can take quite some time. Since Symplicity Storage Manager
            needs to be installed on each node and the scheduling of the
            parity check is done by the Symplicity Storage Manager software
            on each node, the result is that the parity check will be 
            scheduled to run from all six nodes (i.e., six times) every
            Sunday night.
        Action: Start the Symplicity Storage Manager Maintenance and Tuning
            application on one of the nodes.  Go into the Options menu and 
            select Auto Parity Settings.  Uncheck the Automatic Parity 
            Check/Repair.  Repeat this process for all but one node (it is
            only necessary for the parity check to run from one node).

     o  Symptom: A node runs very slowly, with process Oracle80.exe
            consuming the majority of the processing time. Another symptom
            may be an ORA-00600 error in an Oracle instance's
            instanceLCK0.trc file.
            The specific error in that file might be
            ORA-00600: internal error code, arguments: [ksires_1],
                       [KJUSERSTAT_SHUTDOWN], [], [], [], [], [], []
        Explanation: The database is thrashing on that node due to redo logs
            or rollback segments that are either too small or too few in
            number. This is more likely to occur after a failure which causes
            database recovery operations to run.
        Action: The system should be tuned by increasing the number of redo
            logs, increasing the size of the redo logs, and increasing the
            initial size of the rollback segments. As an example, in a test
            configuration this problem was resolved by increasing the number
            of redo logs per thread from 2 to 4 and by increasing their 
            sizes from 20MB to 100MB. The initial size of the rollback
            segments was increased from 2MB to 20MB with 2MB increments for
            the extents.


2.0  How to Obtain the Oracle Patch Set
_______________________________________

     - Go to the Oracle Web site http://www.oracle.com
     - Click on "Support".
     - If you have already registered for a Metalink ID,
       then click on "Visit Metalink".
       Otherwise, click on the link to register for a Metalink ID.
     - Click on "Download".
     - From the "product" pull-down, select Parallel Server Option.
       From the "platform" pull-down, select MS Windows NT.
     - Download the Patch Set for OPS Version 8.0.5.1.a

3.0  Trademarks and Notices
___________________________

The following terms are trademarks of the IBM Corporation in the
United States or other countries or both:

     IBM
     Netfinity


Windows NT is a trademark or registered trademark of Microsoft Corporation.

Oracle and Oracle OPS are trademarks or registered trademarks of Oracle
Corporation.

Any other company, product, and service names may be trademarks or
service marks of others.



THIS DOCUMENT IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND. IBM
DISCLAIMS ALL WARRANTIES, WHETHER EXPRESS OR IMPLIED, INCLUDING WITHOUT
LIMITATION, THE IMPLIED WARRANTIES OF FITNESS FOR PARTICULAR PURPOSE AND
MERCHANTABILITY WITH RESPECT TO THE INFORMATION IN THIS DOCUMENT. BY
FURNISHING THIS DOCUMENT, IBM GRANTS NO LICENSES TO ANY PATENTS OR
COPYRIGHTS.

Copyright (C) 1998, 1999 IBM Corporation.  All rights reserved.


Note to U.S. Government Users -- Documentation related to restricted
rights -- Use, duplication or disclosure is subject to restrictions set
forth in GSA ADP Schedule Contract with IBM Corp.