Wednesday, February 6, 2013

Metro Storage Cluster

Metro Storage Clusters can be designed to maintain data availability beyond a single physical or logical site. In simple terms, this is a stretched VMware ESXi cluster between two sites with a stretched storage system. A MetroCluster configuration consists of two storage controllers, residing either in the same datacenter or in two different physical locations, clustered together. It provides recovery from any single storage component or multiple-point failure, and single-command recovery in case of a complete site disaster. Metro storage clusters can be created with different storage systems from NetApp, EMC, HP, IBM, etc.
In this article we will discuss MetroCluster with NetApp storage systems, its specific requirements, and a solution overview.
Metro Cluster using NetApp:
MetroCluster leverages NetApp HA CFO functionality to automatically protect against controller failures. Additionally, MetroCluster layers local SyncMirror, cluster failover on disaster (CFOD), hardware redundancy, and geographical separation to achieve extreme levels of availability. No data loss occurs because of synchronous mirroring. Hardware redundancy is put in place for all MetroCluster components. Controllers, storage, cables, switches (fabric MetroCluster), and adapters are all redundant.
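To illustrate the CFO/CFOD behaviour, here is a minimal sketch, assuming a MetroCluster running Data ONTAP in 7-Mode and console access to the surviving controller (use these commands only as described in the NetApp documentation for your release). The cluster failover state can be checked, and a cluster failover on disaster initiated, from the controller CLI:

cf status
cf forcetakeover -d

The cf forcetakeover -d command is the site disaster takeover that an administrator (or the tiebreaker) would trigger when an entire site is lost.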
A VMware HA/DRS cluster is created across the two sites using ESXi 5.0 or 5.1 hosts and managed by vCenter Server 5.0 or 5.1. The vSphere Management, vMotion, and virtual machine networks are connected using a redundant network between the two sites. It is assumed that the vCenter Server managing the HA/DRS cluster can connect to the ESXi hosts at both sites.
Based on the distance considerations, NetApp MetroCluster can be deployed in two different configurations:


  • Stretch MetroCluster:
This setup is ideal for two sites within a 500 m range.
  • Fabric MetroCluster:
This setup is for sites separated by up to 100 km.

Configuration Requirements
These requirements must be satisfied to support this configuration:
  • For distances under 500 m, stretch MetroCluster configurations can be used; for distances over 500 m but under 160 km (for systems running Data ONTAP 8.1.1), a Fabric MetroCluster configuration can be used.
  • The maximum round-trip latency for Ethernet networks between the two sites must be less than 10 ms, and for SyncMirror replication it must be less than 3 ms.
  • The storage network must provide a minimum of 1 Gbps of throughput between the two sites for ISL connectivity.
  • ESXi hosts in the vMSC configuration should be configured with at least two different IP networks, one for storage and the other for management and virtual machine traffic. The Storage network will handle NFS and iSCSI traffic between ESXi hosts and NetApp Controllers. The second network (VM Network) will support virtual machine traffic as well as management functions for the ESXi hosts. End users can choose to configure additional networks for other functionality such as vMotion/Fault Tolerance. This is recommended as a best practice but is not a strict requirement for a vMSC configuration.
  • FC Switches are used for vMSC configurations where datastores are accessed via FC protocol, and ESX management traffic will be on an IP network. End users can choose to configure additional networks for other functionality such as vMotion/Fault Tolerance. This is recommended as a best practice but is not a strict requirement for a vMSC configuration.
  • For NFS/iSCSI configurations, a minimum of two uplinks per controller must be used. An interface group (ifgroup) should be created using the two uplinks in a multimode configuration (see the sketch after this list).
  • The VMware datastores and NFS volumes configured for the ESX servers must be provisioned on mirrored aggregates.
  • The vCenter Server must be able to connect to ESX servers on both the sites.
  • The number of hosts in an HA cluster must not exceed 32.
  • A MetroCluster Tiebreaker machine should be deployed at a third site and must be able to access the storage controllers at Site 1 and Site 2 in order to initiate a CFOD in case of an entire site failure. (The MetroCluster Tiebreaker (MCTB) solution is a plug-in that runs in the background as a Windows service or UNIX daemon on an OnCommand Unified Manager (OC UM) host.)
  • vMSC certification testing was conducted on vSphere 5.0 and NetApp Data ONTAP 8.1 operating in 7-Mode.
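As a minimal sketch of the ifgroup requirement above (assuming Data ONTAP operating in 7-Mode, two free controller ports e0a and e0b, and the example address 192.168.10.50, all placeholders for your environment), a multimode interface group can be created from the controller console like this:

ifgrp create multi ifgrp0 -b ip e0a e0b
ifconfig ifgrp0 192.168.10.50 netmask 255.255.255.0

To make the configuration persistent across reboots, the same commands would also be added to the controller's /etc/rc file.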

Disaster Recovery vs Disaster Avoidance

One thing is for sure: you cannot mix up disaster recovery with disaster avoidance; they are separate concepts even though they sound similar.
Let’s try to understand the difference between these concepts using VMware Stretched clusters and VMware Site Recovery Manager.
Stretched clusters consist of two or more physical ESXi hosts deployed in separate sites that are less than 100 kilometres apart and are contained within a single vSphere cluster. Put simply, it is a single VMware cluster whose hosts are located at different sites and are managed by the same vCenter Server.
By using a stretched cluster you can vMotion a VM from one site to another without any downtime; that's disaster avoidance, a solution for when you cannot afford even minimal downtime. In short, it's a very good active site-balancing solution.
In the case of VMware SRM, the VM starts at the recovery site once the replicated storage snapshot is attached to it by a sequence of automated steps. This is a solution that requires a restart of the VM, so downtime is mandatory. That's a normal disaster recovery situation. Moreover, you need two vCenter Servers for an SRM DR solution.
Now, let’s see the difference in a tabular format
However it’s important to note that both the solutions have their benefits according to the requirement. Both solutions enhance service availability, while stretched clusters focus on data availability and service mobility Site Recovery Manager focuses on controlled and repeatable disaster recovery processes to recover from outages.

Thursday, January 31, 2013

Virtualizing Active Directory

Now it's time to virtualize a major component of the IT infrastructure: Active Directory!

There are several considerations we need to take into account before making the plunge to the virtual world.
First we need to analyse our own environment and check the Active Directory topology, its various sites and services, bandwidth, the virtualization platform to be used, disaster recovery, etc.

In this article we would like to focus on the major deployment consideration whose failure would be disastrous for a virtualized Active Directory on a VMware ESXi platform. The issue is known as clock drift; think what would happen if Active Directory were running 20 minutes behind the actual time!

What is clock drift?
In a virtualized environment, virtual machines that don't require CPU cycles don't get CPU cycles. For the typical Windows application in a virtual machine, this is not normally a major problem. When virtualizing Microsoft's Active Directory, however, these idle cycles can cause significant time drift for domain controllers. Active Directory is critically dependent on accurate timekeeping, and one of the most important challenges you must address is how to prevent clock drift. In fact, a large part of a successful Active Directory implementation lies in the proper planning of time services. Accurate timekeeping is also essential for the replication process in a multi-master directory environment.

How to fix it?
The solution is to use the Windows Time Service, not VMware Tools time synchronization, for the forest root PDC Emulator. This requires configuring the forest root PDC Emulator to use an external time source. The procedure for defining an alternative external time source for this "master" time server is as follows:

1. Modify Registry settings on the PDC Emulator for the forest root domain:

In the key HKLM\System\CurrentControlSet\Services\W32Time\Parameters\Type,
change the Type REG_SZ value from NT5DS to NTP.
This determines from which peers W32Time will accept synchronization. When the REG_SZ value is changed from NT5DS to NTP, the PDC Emulator synchronizes from the list of reliable time servers specified in the NtpServer registry key.

HKLM\SYSTEM\CurrentControlSet\Services\W32Time\Parameters\NtpServer

Change the NtpServer value from time.windows.com,0x1 to an external stratum 1 time source—for example, clock.us.navy.com,0x1. This entry specifies a space-delimited list of stratum 1 time servers from which the local computer can obtain reliable time stamps. The list can use either fully-qualified domain names or IP addresses. (If DNS names are used, you must append ,0x1 to the end of each DNS name.)


In the key HKLM\System\CurrentControlSet\Services\W32Time\Config,
change the AnnounceFlags REG_DWORD value from 10 to 5. This entry controls whether the local computer is marked as a reliable time server, which is only possible if the Type entry is set to NTP as described above.
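
For reference, the same three registry changes from step 1 can be applied from an elevated command prompt with reg.exe (a sketch; the NTP server value is just the example used above and should be replaced with your own list):

reg add HKLM\SYSTEM\CurrentControlSet\Services\W32Time\Parameters /v Type /t REG_SZ /d NTP /f
reg add HKLM\SYSTEM\CurrentControlSet\Services\W32Time\Parameters /v NtpServer /t REG_SZ /d "clock.us.navy.com,0x1" /f
reg add HKLM\SYSTEM\CurrentControlSet\Services\W32Time\Config /v AnnounceFlags /t REG_DWORD /d 5 /f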

2. Stop and restart the time service:
net stop w32time
net start w32time

3. Manually force an update:

w32tm /resync /rediscover
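
Alternatively, the whole configuration can be applied in one step with w32tm instead of editing the registry by hand (a sketch; replace the peer list with your own stratum 1 time sources):

w32tm /config /manualpeerlist:"clock.us.navy.com,0x1" /syncfromflags:manual /reliable:yes /update
net stop w32time && net start w32time
w32tm /resync /rediscover

Whichever method you use, also make sure VMware Tools periodic time synchronization stays disabled for the domain controller VM so the two mechanisms do not fight each other.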

Monday, January 28, 2013

IBM Server Firmware Update


There are two ways to update the firmware: a manual process and an automated process using a USB key.

Automated process: For automated firmware deployment, download the IBM Bootable Media Creator "ibm_utl_bomc_2.30_windows_i386.exe", run the utility, and follow the wizard by providing the proxy details and the server details through which the updates will be downloaded for the selected server type. Once the updates are downloaded you can create a bootable USB key. The next step is to plug the USB key into the server and follow the instructions to update the firmware. This automatically updates all the firmware, which includes IMM, UEFI, FPGA and DSA. Once done, reboot the system.

Manual process:
Alternatively, we can update the firmware components manually, one after another. However, there is a specific sequence to follow when updating the firmware, provided below:
  • IMM
  • UEFI
  • FPGA
  • DSA
Open a browser, enter the IP address of the RSA, and log in to the RSA with your username and password.


Check the System Status and ensure that the System Health summary does not show any errors or inconsistencies. If the server is healthy, the icon should be green.

Check whether the server is turned on. If not, go to the Power/Restart option to power on or restart the server.

Go to the Remote Control option and select "Start Remote Control in Multi-User Mode".

Then go to the Firmware Update section and click Browse.

Browse to the IMM update (*.upd file) downloaded from the IBM support site. Once done, click Update.

It will take a while to upload the update file to the server.

Once the upload is complete, the new firmware build number will be displayed along with the existing build number. To find out when the new build was released, you can check the release notes on the IBM support site. Finally, click Continue.

Once you click Continue, the firmware will be updated to the version that was downloaded.

When the firmware update is successful, click on OK.

Repeat the process for UEFI, FPGA and DSA.

Then go to Vital Product Data to validate that the firmware has been updated.

VMware ESXi 4.1 to ESXi 5.0 Upgrade

It Is Time to Upgrade to vSphere 5!!!

This document presents the detailed steps to upgrade an ESXi 4.1 host to ESXi 5.0.

First of all let's download the ISO image for ESXi 5.0 from the VMware download site at
http://www.vmware.com/download/

To begin with, shut down the ESXi 4.1 host:

Press F12 at the DCUI logon screen to open the shutdown/restart options.

Press F2 to shut down the host.

Shutdown in progress...


Go to the BIOS setup and set the CD-ROM drive as Boot Option 1.


It's time to insert the ESXi 5.0 ISO into the CD-ROM drive.
ESXi 5.0 will start loading.
Accept the license agreement by pressing F11.
The installer will start scanning for available devices...
Now select a disk to install or upgrade.

After selecting the disk, the installer will scan the selected device again.
If the installer finds an existing ESX or ESXi installation and VMFS datastore, you can choose from the following options:
  • Upgrade ESXi, preserve VMFS datastore
  • Install ESXi, preserve VMFS datastore
  • Install ESXi, overwrite VMFS datastore

If an existing VMFS datastore cannot be preserved, you can choose only to install ESXi and overwrite the existing VMFS datastore, or to cancel the installation. If you choose to overwrite the existing VMFS datastore, back up the datastore first.
If the existing ESX or ESXi installation contains custom VIBs that are not included in the ESXi installer ISO, the option Upgrade ESXi, preserve VMFS datastore is replaced with Force Migrate ESXi, preserve VMFS datastore.

We always need to analyse our requirements carefully before selecting any of these options. In this case we have selected the option to force migrate ESXi and preserve the VMFS datastore; we will upgrade VMFS later on. Hit Enter to proceed.
Now press F11 to Force Migrate.

Fingers crossed! Upgrading to ESXi 5.0 in progress...
Bingo! Upgraded to ESXi 5.0 successfully!!!
Hit Enter to reboot the host after the upgrade.

Now the DCUI screen shows the expected upgraded version, ESXi 5.0!
Well, that's not the end of the story; we still have to upgrade the datastores to the latest VMFS version.

Let's log on with the vSphere Client 5.0 and select the ESXi host, then go to the Configuration tab and select Storage from the Hardware section.
Now, to upgrade the datastores from VMFS 3.46 (ESXi 4.1) to VMFS 5, simply select the datastore and click "Upgrade to VMFS 5..."


Please click OK in the warning popup for VMFS upgrade.

Now the file system format is displayed as VMFS 5.54!!!

NOTE: For testing purposes some folders and data were kept in this datastore; the VMFS 5.54 upgrade process did not delete any data.
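
As an alternative to the vSphere Client, the same upgrade can be performed from the ESXi Shell with vmkfstools (a sketch, assuming the ESXi Shell is enabled and a datastore named DatastoreName, which is a placeholder):

vmkfstools -T /vmfs/volumes/DatastoreName

As with the GUI method, the existing data on the datastore is preserved.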


Checking the VMFS version using the command line:

Log on to the DCUI, then press Alt+F1 to enter the console:

To check the VMFS version and partitions,
type: cd /vmfs/volumes/
then list the volumes and the datastores.

Command to check vmfs version:

vmkfstools -P "/vmfs/volumes/DatastoreName/"


Before upgrading:
After upgrading:


Simplifying Disaster Recovery


Ever thought of a DR solution where you can press the RED button and DR is initiated seamlessly?
Here's the solution for all VMware ESXi virtual machines named VMware Site Recovery Manager, also known as SRM in short. VMware vCenter Site Recovery Manager makes disaster recovery rapid, reliable, manageable, and affordable. SRM is a suite of tools that help to automate and test a disaster recovery plan. SRM works by integrating tightly with storage array-based replication. Most of the major storage vendors have made their products compatible with SRM. Testing the DR plan is one of the major challenges of DR. Typically, it is not performed often enough, and sometimes not at all. SRM allows you to perform a non-disruptive failover test to your remote site and revise and fine-tune your DR plan as necessary. IP addresses of the virtual machines in the remote site can be automatically reconfigured with IP addresses that match the IP scheme of the remote DR site. The failover plan can be executed with a single click that launches user-defined scripts during the recovery process. SRM can automate a series of complex manual steps to simplify the failover process.


The best feature of SRM is that a non-disruptive test DR can be conducted at any point in time! A snapshot of the LUNs is presented to the DR ESXi hosts and the placeholder VMs are powered on in a test bubble network that has no connectivity with the existing VLANs. This helps to test the functionality of the DR solution using SRM seamlessly. During a real disaster the storage replication is broken automatically and the replicated LUNs are presented to the ESXi hosts at the DR site. Once the LUNs are available, the VMs are powered on one by one with their actual IPs and hostnames. SRM can also be automated to power off some non-production VMs at the recovery site to avoid capacity constraints there.

To get started for the deployment of VMware SRM, you can refer to the VMware documentation at http://www.vmware.com/pdf/srm_admin_4_1.pdf