Politics and Technology.

Monday, April 28, 2008

RHEL 5 Cluster and GFS, the saga begins

Well, my next techie project is to get GFS up and running. To do this, we needed to reinstall several servers with RHEL 5. That was no sweat. Now, to use GFS, we need to install RHEL Cluster first. Ok, I can understand that: GFS is a part of RHEL Cluster and draws on its components to do its job.

First problem: how on earth do you install RHEL Cluster? Our first victim is registered with our Satellite server and subscribed to the appropriate channels (don't even get me started on how much effort it took to get the Satellite server to subscribe to the RHEL 5 cluster channels in the first place). The manual for RHEL 5 Cluster is not very helpful. It basically says to install it like you would RHEL 5.

"...secure and install the software as you would with Red Hat Enterprise Linux software."

Swell.

I want to avoid running yum a dozen or so times just to suck down the right RPMs. Apparently, in the olden days you could instruct up2date to pull down the full cluster suite. The man page for yum on RHEL 5 doesn't allude to similar functionality, though there are these curiously undocumented switches with the word "group" in them.
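
If those group switches do what I think they do, the incantation might be something along these lines (the group names are a guess until I actually try it):

# yum grouplist | grep -i cluster                    # see what cluster-related groups the channels expose
# yum groupinstall "Clustering" "Cluster Storage"    # guessed group names for Cluster Suite and GFS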

Man pages. Yet another reason why I was a big fan of OpenBSD. It packages the best damn man pages a Unix guy could ask for. RHEL's man pages were written by people who apparently work the Help Desk phones at my office.

I suspect this will be a long thread, so rather than just wait for the end and blog the results, I'll blog the experience and finish with a how-to. Hopefully the next entry will be a no-hassle how-to for installing Cluster with yum.

Friday, April 18, 2008

VVR and DD

It's time to start this blog back up.

Recently, I've had the pleasure of having to wipe clean disk storage arrays that hold Veritas volumes in an RVG. The pleasure did not come from executing "vxdg destroy"; it came from having to figure out a way to preserve the data on both sides of the rlink without performing a full resynchronization.

In this case a full resynchronization would have taken 22 days, a time period not acceptable to the client. One site is in South Brunswick, NJ, and the other is located somewhere near the Kingdom of the Rat; the bandwidth and latency between the two are not the best that money can buy.

The answer was to do a block-level copy of the volumes to portable USB drives and ship those drives between Orlando and South Brunswick.

I created a wiki page on our internal site that outlines my experience, and I have copied the content below. The setup involved:

Veritas Cluster Server (VCS)
Veritas Global Cluster Manager (GCM)
Veritas File System (VxFS)
Veritas Volume Manager (VxVM)
Veritas Volume Replicator (VVR)
1 pair of RHEL on Dell servers (RHEL 4 on Dell 2950s) in Orlando, connected to a single AX150 (a rebranded Clariion)
1 single RHEL on Dell server in South Brunswick, connected to a single AX150.

The AX150's handled the RAID setup, and EMC Powerpath handled the redundant HBA connections.

We had to convert the storage at both ends from RAID 5 to RAID 1+0. Since the AX150 ONLY allows you to present disks as RAID 5 or RAID 1+0, and not as a JBOD, we couldn't just let VxVM handle the RAID. Furthermore, VVR does not support replicating RAID 5 Veritas volumes, so when the client wanted to go cheap and do a RAID 5 (against our advice, of course), we had to use an external RAID solution.

After everything was built and right before the go-live date, we demonstrated how poor the performance of the whole solution was with RAID 5. The client then coughed up the money for the extra disks needed for RAID 1+0 and we set about trying to do the conversion as fast as possible.

1. Make the vxmake files

Create vxmake input files for each volume at both sites. Do this for every volume in the RVG, including the SRL volume. Keep these files safe and make copies.


# vxprint -hmvpsQq -g DISKGROUP VOLUMENAME > /var/tmp/VOLUMENAME.vxprint

Though it won't be used later, it is probably also a good idea to capture the RVG and RLink information for reference.


# vxprint -Vm -g DISKGROUP > /var/tmp/RVGNAME.vxprint
# vxprint -Pm -g DISKGROUP > /var/tmp/RLINKNAME.vxprint



2. Prep the systems

Mount the portable USB drives.

# mount /dev/DEVICE /mnt

Stop Veritas Cluster and unmount the RVG volumes. You may have to stop service groups and completely turn off VCS on both the primary and secondary. Ensure that the volumes are unmounted but in a 'started' state within VxVM.


# vxinfo -g DISKGROUP
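
As a rough sketch, stopping the cluster and unmounting might look like the following; the service group, system, and mount point names are placeholders.

# hagrp -offline SERVICEGROUP -sys SYSTEMNAME    # take the service group offline
# hastop -all                                    # stop VCS on every node in the cluster
# umount /MOUNTPOINT                             # unmount each RVG volume by hand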

3. Backup the data

On the primary, perform a block-level copy of each volume to the portable disk using 'dd'. The block size is set high to speed the copy process.


# dd if=/dev/vx/rdsk/DISKGROUP/VOLUMENAME of=/mnt/VOLUMENAME.dd bs=1M

4. Reconfigure VxVM

Destroy the disk groups at both sites.


# vxdg destroy DISKGROUP

Add the new hardware to the hosts. You may have to reboot (ensure that VCS won't restart), run fdisk, move the "/etc/vx/*info" files aside, run vxdctl enable, rerun vxdiskadm, or even run vxinstall. Bring the new hardware into VxVM under a disk group with the same name as before. It is important to use the same disk names and disk sizes as before; if not, some heavy manual editing of the vxprint files will be necessary.
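
A hedged sketch of that dance is below; the device and disk names are placeholders, and which of these steps you actually need depends on how the new LUNs show up.

# mv /etc/vx/disk.info /etc/vx/disk.info.old      # one of the *info files; yours may differ
# vxdctl enable                                   # have VxVM rescan for the new devices
# vxdisksetup -i DEVICENAME                       # initialize each new disk for VxVM use
# vxdg init DISKGROUP DISKNAME=DEVICENAME         # recreate the disk group with the first disk
# vxdg -g DISKGROUP adddisk DISKNAME=DEVICENAME   # add the remaining disks, reusing the old disk names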

Recreate the volumes at both sites. Run vxmake using the vxprint files you created. You will need to edit the vxprint output files to remove the "rvg" references and set the "path" stanza to the correct device names.


# vxmake -g DISKGROUP -d /var/tmp/VOLUMENAME.vxprint
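
Before running vxmake as above, the RVG association has to come out of each file. Assuming the association shows up as an "rvg=" attribute line in the vxprint -m output (check your own files first), something like this would find and strip it; the "path" stanzas still need to be fixed by hand.

# grep -n "rvg=" /var/tmp/VOLUMENAME.vxprint      # confirm where the rvg reference lives
# sed -i '/rvg=/d' /var/tmp/VOLUMENAME.vxprint    # strip the rvg attribute line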

Initialize the volumes at both sites. For the RVG data volumes, merely enable each of them, since you will dd the data back onto them.


# vxvol -g DISKGROUP init enable VOLUMENAME

For the SRL volumes, zero them out. This process will take a long time, so you may wish to hold off until there is a suitable time to run it. A good time is while waiting for the portable USB drive to arrive at the secondary site.


# vxvol -g DISKGROUP init zero SRLVOLUME

5. Restore the data

Primary Site

Mount the portable USB disks on the primary server and copy each volume's data from the portable disk back onto the primary. Use "rdsk" in the path, NOT "dsk".


# dd if=/mnt/VOLUMENAME.dd of=/dev/vx/rdsk/DISKGROUP/VOLUMENAME bs=1M


Once the data is copied over, activate the volumes. DO NOT MOUNT THE VOLUMES YET, AS THAT WILL TAINT THE DATA!


# vxvol -g DISKGROUP init active VOLUMENAME

Secondary Site

Unmount the portable drive and transport it to the secondary machine.


# umount /mnt

Mount the portable USB disks on the secondary server and copy each volume's data from the portable disk back onto the secondary. Use "rdsk" in the path, NOT "dsk".


# dd if=/mnt/VOLUMENAME.dd of=/dev/vx/rdsk/DISKGROUP/VOLUMENAME bs=1M

Once the data is copied over on the secondary, activate the volumes. DO NOT MOUNT THE VOLUMES YET, AS THAT WILL TAINT THE DATA!

# vxvol -g DISKGROUP init active VOLUMENAME


6. Recreate the RVG

Prep the VVR network. Plumb up and activate the VVR IPs by hand on both the primary and secondary.
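
A minimal sketch, assuming the VVR IP rides as an alias on eth0 (substitute your own interface, address, and netmask); run this on the primary and repeat it on the secondary with its own address.

# ifconfig eth0:1 VVR_IP_ADDRESS netmask 255.255.255.0 up    # bring up the replication IP by hand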

On the primary, create the RVG with vradmin.


# vradmin -g DISKGROUP createpri RVGNAME VOLUMENAME1,VOLUMENAME2,...VOLUMENAMEX SRLVOLUMENAME

On the primary, add the secondary to the replicated data set with vradmin.


# vradmin -g DISKGROUP addsec RVGNAME PRIMARYVVRNAME SECONDARYVVRNAME prlink=PRIMARY_to_SECONDARY_RLINK_NAME srlink=SECONDARY_to_PRIMARY_RLINK_NAME


7. Verify and Start Replication

On the primary, run syncrvg with the -verify option against the whole RVG. With the "verify" option, the systems exchange and compare checksums but do not update any data. This should take a fraction of the time a full resynchronization would take. For instance, an RVG set that would normally take three weeks to synchronize should take only about six hours to verify.

When completed, the output should report that "the volumes are verified as identical." The RVG will then be ready to be started.


# vradmin -g DISKGROUP -verify syncrvg RVGNAME SECONDARYVVRNAME


Forcibly start the replication.


# vradmin -g DISKGROUP -f startrep RVGNAME SECONDARYVVRNAME


At this point you can mount the volumes on the primary host to check the data.
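
A quick sanity-check mount might look like the following, with MOUNTPOINT as a placeholder; note that mounting uses the "dsk" path, not the "rdsk" path used for dd.

# mount -t vxfs /dev/vx/dsk/DISKGROUP/VOLUMENAME /MOUNTPOINT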

8. Clean up and reboot

Ensure that VCS will start at boot, unmount the USB drives, and so on, then reboot both sides. Sometimes VVR will take several minutes to recognize both sides of the link after the first time VCS starts it up.
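
A rough sketch of that cleanup, assuming the VCS init script is registered under the name "vcs" (yours may differ):

# chkconfig --list vcs    # confirm VCS is set to start at boot
# chkconfig vcs on        # re-enable it if it was turned off earlier
# umount /mnt             # release the portable USB drive
# reboot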

9. Special Note

With VVR pairs that have high network latency, low bandwidth, or a high number of hops in between, it may be necessary to keep the VVR packets from fragmenting. This is easily accomplished with vradmin by setting the VVR packet_size to something small enough to stay under the 1500-byte threshold once header information is added.


# vradmin -g DISKGROUP pauserep RVGNAME
# vradmin -g DISKGROUP set RVGNAME packet_size=1480
# vradmin -g DISKGROUP resumerep RVGNAME


Setting packet_size can only be done when UDP is the transport of choice.