Datacenter Internals: Veritas Cluster

Overview

Veritas Cluster Server (VCS) lets one system fail over to the other. All related software processes are moved from one system to the other with minimal downtime.

Cluster Startup :-

Here is what the cluster does at startup:-

-The node checks whether the other node has already started; if so, it stays OFFLINE.
-If no other machine is running, it checks cluster communication (GAB, configured with gabconfig). If the cluster is set up to require both nodes before it starts, a system administrator may need to seed it manually: /sbin/gabconfig -c -x

-Once communication between the machines is open (or GAB has been seeded with gabconfig), it starts the cluster server (HAD) and sets up the network (NIC and IP address).

-It then brings up the volume manager, the file systems, and finally the applications.
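To see how far a node has got in that sequence, gabconfig -a lists the GAB port memberships: port a is GAB itself and port h is the cluster engine (HAD). Below is a minimal sketch, assuming the usual "Port a gen ... membership ..." output lines. The function name gab_status and the GABCONFIG override variable are hypothetical, added only so the check can be illustrated and stubbed.

```shell
#!/bin/sh
# Sketch: report whether GAB (port a) and HAD (port h) have membership
# on this node, based on the output of "gabconfig -a".
# GABCONFIG is a hypothetical override so the command can be stubbed.
GABCONFIG="${GABCONFIG:-/sbin/gabconfig}"

gab_status() {
    # Capture the port membership list; treat a failed command as no output.
    gab_out=`$GABCONFIG -a 2>/dev/null || true`
    if echo "$gab_out" | grep '^Port a ' >/dev/null; then
        echo "GAB: seeded (port a)"
    else
        echo "GAB: not seeded (port a missing)"
    fi
    if echo "$gab_out" | grep '^Port h ' >/dev/null; then
        echo "HAD: running (port h)"
    else
        echo "HAD: not running (port h missing)"
    fi
}
```

If port a is missing on a lone node, that is the case where the manual /sbin/gabconfig -c -x seed applies.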

File Locations (Logs, Conf, Executables):-

Log location: /var/VRTSvcs/log

There are several logs in this directory:-

engine.log_A: the primary log; this is usually the one you will be reading when debugging.
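When the engine log is long, it helps to pull out only the serious messages first. A minimal sketch, assuming the log lines carry a severity word such as ERROR or CRITICAL (check a few lines of your own log before relying on it); the function name vcs_recent_errors is hypothetical.

```shell
#!/bin/sh
# Sketch: print the most recent ERROR/CRITICAL lines from the engine log.
# Arguments (both optional): log file, number of lines to show.
vcs_recent_errors() {
    log_file="${1:-/var/VRTSvcs/log/engine.log_A}"
    max_lines="${2:-20}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # Keep only lines with the assumed severity words, newest last.
    grep -E 'ERROR|CRITICAL' "$log_file" | tail -n "$max_lines"
}
```

Usage: vcs_recent_errors (defaults to the log above and the last 20 matches).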

Conf files:-

LLT (heartbeat network) conf: /etc/llttab [you should NOT need to touch this]
GAB (cluster membership) conf: /etc/gabtab
Cluster conf: /etc/VRTSvcs/conf/config/main.cf (describes exactly what the cluster contains: systems, service groups, and resources)
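For reference, /etc/gabtab on a two-node cluster typically holds a single gabconfig line like the one below. The -n2 tells GAB to seed only once two nodes are present, which is why a lone node may need the manual /sbin/gabconfig -c -x seed described under Cluster Startup. This is a typical example, not a copy of any particular system; check your own file, since the node count is site-specific.

```shell
# /etc/gabtab on a two-node cluster (typical contents; node count is site-specific)
/sbin/gabconfig -c -n2
```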

Most executables are in: /opt/VRTSvcs/bin or /sbin

Changing Configurations :-

ALWAYS be very careful when changing the cluster configuration.

There are two ways to change the configuration.

If the cluster is up (running on at least one node, preferably on both):

haconf -makerw
run the needed commands (e.g. hasys ...)
haconf -dump -makero
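The three steps above can be wrapped so that the configuration is always saved and made read-only again, even when the middle command fails. A minimal sketch, assuming HAD is running; the function name vcs_with_rw_config and the HACONF override variable are hypothetical.

```shell
#!/bin/sh
# Sketch: open the configuration, run one command, then always close it.
# HACONF is a hypothetical override so the command can be stubbed.
HACONF="${HACONF:-/opt/VRTSvcs/bin/haconf}"

vcs_with_rw_config() {
    # Step 1: make the configuration writable.
    $HACONF -makerw || return 1
    # Step 2: run the caller's command and remember its exit status.
    if "$@"; then
        cmd_status=0
    else
        cmd_status=$?
    fi
    # Step 3: always write main.cf and make it read-only again.
    $HACONF -dump -makero || return 1
    return $cmd_status
}
```

Usage: vcs_with_rw_config hagrp -enableresources groupname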

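If the cluster is down on both systems:

hastop -all   (you shouldn't need this, as the cluster is already down)

cd /etc/VRTSvcs/conf/config

cp main.cf main.cf.`date +%Y%m%d`

vi main.cf

hacf -verify /etc/VRTSvcs/conf/config

hacf -generate /etc/VRTSvcs/conf/config

hastart   (start it first on the node where you edited main.cf)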
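The backup and verify steps of the offline procedure can be sketched as two functions, so that hastart only runs when hacf accepts the edited file. This is a minimal sketch, assuming the paths used above; the function names and the VCS_CONF_DIR, HACF and HASTART override variables are hypothetical. The backup name includes the time as well as the date so that two edits on the same day don't overwrite each other.

```shell
#!/bin/sh
# Sketch of the offline edit procedure. The override variables are
# hypothetical and exist only so the commands can be stubbed.
VCS_CONF_DIR="${VCS_CONF_DIR:-/etc/VRTSvcs/conf/config}"
HACF="${HACF:-/opt/VRTSvcs/bin/hacf}"
HASTART="${HASTART:-/opt/VRTSvcs/bin/hastart}"

# Copy main.cf to a date/time-stamped backup and print the backup path.
vcs_backup_main_cf() {
    backup="$VCS_CONF_DIR/main.cf.`date +%Y%m%d%H%M%S`"
    cp -p "$VCS_CONF_DIR/main.cf" "$backup" || return 1
    echo "$backup"
}

# Verify the edited configuration; only then generate and start VCS.
vcs_verify_and_start() {
    if $HACF -verify "$VCS_CONF_DIR"; then
        $HACF -generate "$VCS_CONF_DIR" && $HASTART
    else
        echo "hacf -verify failed; fix main.cf before running hastart" >&2
        return 1
    fi
}
```

Usage: vcs_backup_main_cf, then vi main.cf, then vcs_verify_and_start.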

Veritas Cluster Debugging Tips :-

The normal debugging steps are: check the status, restart the cluster if there are no faults, check the licenses, clear faults if needed, and check the logs.

To find out the current status:-
/opt/VRTSvcs/bin/hastatus -summary   gives the general status of each machine and its processes.
/opt/VRTSvcs/bin/hares -display   gives much more detail, down to the resource level.
If hastatus fails on both machines (it reports that the cluster is not up, or returns nothing), try to start the cluster:
/opt/VRTSvcs/bin/hastart
Then run /opt/VRTSvcs/bin/hastatus -summary again to see whether the processes started properly. Note that hastart will NOT start processes on a FAULTED system; clear the faults first.
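The status check and restart can be combined into one helper. A minimal sketch, assuming that a non-zero exit status or empty output from hastatus -summary means the engine is not up (check what your VCS version actually prints when HAD is down); the function name vcs_ensure_running and the HASTATUS/HASTART override variables are hypothetical. The engine can take a while to bring everything up, so the second summary may not show the final state yet.

```shell
#!/bin/sh
# Sketch: show the cluster summary, starting VCS once if it looks down.
HASTATUS="${HASTATUS:-/opt/VRTSvcs/bin/hastatus}"
HASTART="${HASTART:-/opt/VRTSvcs/bin/hastart}"

vcs_ensure_running() {
    # Assumption: failure or empty output means the engine is not running.
    if summary=`$HASTATUS -summary 2>/dev/null` && [ -n "$summary" ]; then
        printf '%s\n' "$summary"
        return 0
    fi
    echo "hastatus failed or returned nothing; running hastart" >&2
    $HASTART || return 1
    # Show the summary again so you can see whether processes started.
    $HASTATUS -summary
}
```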

To check licenses:

vxlicense -p

Make sure all licenses are current and NOT expired! If any are expired, that is your problem; call VERITAS to get temporary licenses.

There is a BUG with Veritas licenses: Veritas will not run if there are ANY expired licenses, even if you have the valid ones you need. To get Veritas to run, you need to MOVE the expired licenses out of the way:

vxlicense -p

Note the NUMBER after each expired license (e.g. Feature name: DATABASE_EDITION [100])

cd /etc/vx/elm

mkdir old

mv lic.number old   [do this for all expired licenses]

vxlicense -p [Make sure there are no expired licenses AND your good licenses are there]

hastart
If it still fails, call Veritas for temporary licenses. Otherwise, be certain to do the same on your second machine.
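The move step can be scripted for several license numbers at once. A minimal sketch, assuming the license files follow the "lic.number" naming used above (list /etc/vx/elm first and adjust if your files are named differently); the function name vcs_move_expired_licenses and the ELM_DIR override variable are hypothetical.

```shell
#!/bin/sh
# Sketch: move the license files for the given numbers into ELM_DIR/old.
# File naming (lic.NUMBER) follows the steps above; verify it on your system.
ELM_DIR="${ELM_DIR:-/etc/vx/elm}"

vcs_move_expired_licenses() {
    mkdir -p "$ELM_DIR/old" || return 1
    for lic_num in "$@"; do
        if [ -f "$ELM_DIR/lic.$lic_num" ]; then
            mv "$ELM_DIR/lic.$lic_num" "$ELM_DIR/old/" || return 1
            echo "moved lic.$lic_num"
        else
            echo "no such file: $ELM_DIR/lic.$lic_num" >&2
        fi
    done
}
```

Usage: vcs_move_expired_licenses 100, then vxlicense -p to confirm that only the good licenses remain.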

To clear FAULTS:

hares -display

For each faulted resource, run:

hares -clear resource-name -sys faulted-system
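When several resources are faulted on the same system, the clear command can be run in a loop. A minimal sketch, assuming you pass the resource names you found with hares -display; the function name vcs_clear_faults and the HARES override variable are hypothetical. It returns non-zero if any resource failed to clear, which is your cue to try the group-level steps.

```shell
#!/bin/sh
# Sketch: clear each listed faulted resource on one system.
# Arguments: faulted-system resource-name [resource-name ...]
HARES="${HARES:-/opt/VRTSvcs/bin/hares}"

vcs_clear_faults() {
    faulted_sys="$1"
    shift
    failed=0
    for res in "$@"; do
        if $HARES -clear "$res" -sys "$faulted_sys"; then
            echo "cleared $res on $faulted_sys"
        else
            echo "could not clear $res on $faulted_sys" >&2
            failed=1
        fi
    done
    return $failed
}
```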

Once they have all cleared, run hastatus -summary and confirm that no faults remain. If some don't clear, you MAY be able to clear them at the group level. Only do this as a last resort:

hagrp -disableresources groupname

hagrp -flush group -sys sysname

hagrp -enableresources groupname
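The group-level sequence above can be sketched so that the resources are re-enabled even if the flush fails. This is a minimal sketch; the function name vcs_flush_group and the HAGRP override variable are hypothetical. Because disabling and enabling resources changes their attributes, VCS may refuse while the configuration is read-only; in that case, wrap the calls in haconf -makerw and haconf -dump -makero.

```shell
#!/bin/sh
# Sketch: disable a group's resources, flush it on one system, re-enable.
# Arguments: groupname sysname
HAGRP="${HAGRP:-/opt/VRTSvcs/bin/hagrp}"

vcs_flush_group() {
    grp="$1"
    sys="$2"
    $HAGRP -disableresources "$grp" || return 1
    if $HAGRP -flush "$grp" -sys "$sys"; then
        flush_status=0
    else
        flush_status=$?
    fi
    # Re-enable the resources even if the flush itself failed.
    $HAGRP -enableresources "$grp" || return 1
    return $flush_status
}
```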

To bring a group online:

hagrp -online group -sys desired-system
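Bringing a group online takes time, so it can help to poll its state afterwards. A minimal sketch, assuming hagrp -state group -sys system prints a state string containing ONLINE once the group is up and FAULTED if it faults (check the exact output on your release); the function name vcs_online_and_wait and the HAGRP override variable are hypothetical. FAULTED is tested first because a faulted group can report a combined state such as OFFLINE|FAULTED.

```shell
#!/bin/sh
# Sketch: online a group on one system, then poll until ONLINE or FAULTED.
# Arguments: group system [number-of-checks] [seconds-between-checks]
HAGRP="${HAGRP:-/opt/VRTSvcs/bin/hagrp}"

vcs_online_and_wait() {
    grp="$1"
    sys="$2"
    tries="${3:-30}"
    interval="${4:-10}"
    $HAGRP -online "$grp" -sys "$sys" || return 1
    while [ "$tries" -gt 0 ]; do
        state=`$HAGRP -state "$grp" -sys "$sys"`
        case "$state" in
            *FAULTED*) echo "$grp FAULTED on $sys" >&2; return 1 ;;
            *ONLINE*)  echo "$grp ONLINE on $sys"; return 0 ;;
        esac
        tries=$((tries - 1))
        sleep "$interval"
    done
    echo "timed out waiting for $grp on $sys" >&2
    return 1
}
```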

Thursday, April 2, 2009
