
Part 2 – EMC VPLEX Experiences

Hi

Thanks for coming back, or for reading on from my previous post on EMC’s VPLEX kit and my experiences.

The first thing I’ll cover in this post is the most common commands I’ve been using and what they’re for. This is as much for my own ease of reference as it is to touch on them for anyone else reading.

To begin with, there are two main places where you use the CLI:

the cluster’s management server and the VPLEXcli.

To log on to the cluster management interface, open an SSH connection using your preferred SSH client with the following settings:

  • IP address of the Cluster Management Ethernet interface
  • Port 22
  • SSH Protocol 2
  • Scrollback lines to 20000

 

When you get to the logon prompt you need to log on with the appropriate account; during the implementation phase this is probably going to be the service account. As always, please be smart about security: get this password changed early and don’t leave the default passwords on the default accounts.

When you are logged in you will come to the following interface:

service@ManagementServer:~>

 

The second place you perform most of your CLI work is the VPLEXcli. You need to log on to the management server first and then enter the VPLEXcli using the command vplexcli (wow, who would have guessed it would be that easy!)

service@ManagementServer:~> vplexcli

 

Several messages are displayed, and a username prompt appears:

Trying 127.0.0.1...

Connected to localhost.

Escape character is '^]'.

Enter User Name:

 

Again, you will need to authenticate with the appropriate username and password, probably still the service account you used to log on to the management server. When logged on successfully you’ll see the following:

creating logfile:/var/log/VPlex/cli/session.log_service_localhost_T28921_20101020175912

 

VPlexcli:/>

 

I found it highly useful to run up two sessions of PuTTY and log on with both: one that stays logged onto just the cluster management server, and the other logged into the VPLEXcli. This allows you to quickly flick back and forth.

 

Commands

 

 

You can confirm that the product version running matches the version required in the VPLEX release notes and your expectations.

version -a

 

 

Verify the VPLEX directors

 

From the VPlexcli prompt, type the following command:

ll /engines/**/directors

Verify that the output lists all directors in the cluster, and that all directors show the following:

  • Commissioned status: true
  • Operational status: ok
  • Communication status: ok
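In a dual- or quad-engine cluster that’s a fair few lines to eyeball, so the check is easy to script once you’ve scraped the output into tuples. A rough sketch (the real column layout of the `ll` output varies by release, so treat the parsing side as an exercise):

```python
def verify_directors(rows):
    """Return the directors that fail any of the three checks above.

    `rows` is a list of (name, commissioned, operational, communication)
    tuples scraped from the 'll /engines/**/directors' output.
    """
    failed = []
    for name, commissioned, operational, communication in rows:
        healthy = (commissioned == "true"
                   and operational == "ok"
                   and communication == "ok")
        if not healthy:
            failed.append(name)
    return failed

rows = [
    ("director-1-1-A", "true", "ok", "ok"),
    ("director-1-1-B", "true", "error", "ok"),  # would need investigating
]
```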

Output example in a dual-engine cluster:


 
 

 

Verify storage volume availability

From the VPlexcli prompt, type the following commands to rediscover the back-end storage:

cd /clusters/cluster-1/storage-elements/storage-arrays/EMC-*

array re-discover <array_name>

Type the following command to verify availability of the provisioned storage:

storage-volume summary

 
 


 
 

Resume-at-loser

This is probably one of the most important commands to know after you’ve had an outage of some type and you need to get your data re-synced.

 

During an inter-cluster link failure, you or your client can allow I/O to resume at one of the two clusters: the “winning” cluster.

I/O remains suspended on the “losing” cluster. When the inter-cluster link heals, the winning and losing clusters reconnect, and the losing cluster discovers that the winning cluster has resumed I/O without it. Unless explicitly configured otherwise (using the auto-resume-at-loser property), I/O remains suspended on the losing cluster. This prevents applications at the losing cluster from experiencing a spontaneous data change; the delay allows the administrator to shut down applications and get into a clean state. After stopping the applications, the administrator can use this command to resynchronise the data image on the losing cluster with the data image on the winning cluster and resume servicing I/O operations. The administrator may then safely restart the applications at the losing cluster.

 

Without the --force option, this command asks for confirmation to proceed, since its accidental use while applications are still running at the losing cluster could cause applications to misbehave.

 

 

cd /clusters/cluster-n/consistency-groups/group-name

resume-at-loser
			

 

 

One of the important things to check is the Rx and Tx power of your FC modules. The following command takes you to where you can bring this up and look for discrepancies or anything out of the ordinary:

cd /engines/engine-1-1/directors/director-1-1-A/hardware/sfps/
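Once you’ve pulled the readings, comparing Rx/Tx power against thresholds is easy to script. A sketch with placeholder numbers (the field names and acceptable ranges here are assumptions; use the limits from your SFP vendor’s spec sheet, not these):

```python
def check_sfp_power(sfps, rx_min=-11.0, rx_max=0.0, tx_min=-8.0, tx_max=0.5):
    """Flag SFPs whose Rx/Tx power (dBm) falls outside a given window.

    The default thresholds are placeholders only; real limits come
    from the SFP vendor's specifications.
    """
    bad = []
    for name, rx, tx in sfps:
        if not (rx_min <= rx <= rx_max):
            bad.append((name, "rx", rx))
        if not (tx_min <= tx <= tx_max):
            bad.append((name, "tx", tx))
    return bad

readings = [
    ("A0-FC00", -3.2, -2.9),   # healthy
    ("A0-FC01", -18.5, -3.0),  # weak receive signal, worth investigating
]
```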

 

 

 

Next up is some information around VPLEX and storage, and then VPLEX and VMware vSphere.

 

 


Part 1 – EMC VPLEX Experiences

 

Welcome everyone to Part One of my EMC VPLEX Metro Experiences

 

I recently designed and deployed a VPLEX Metro for a client to enable them to meet business requirements around disaster recovery and, in some cases, to avoid even the recovery aspect of a disaster by automating and replicating their data and services across multiple data centres.

The first thing I want to say about the EMC VPLEX in relation to the design and architecture phases is: do not be fooled by the pretty web interface. Yes, there is a pretty web interface, and yes, I believe the client will use it as the first touch point moving forward; however, the EMC VPLEX is very CLI intensive and this needs to be taken into account. You need a skilled resource involved and engaged in the design.

 

Overview

Simply put, the EMC VPLEX federates data located on heterogeneous (i.e. different vendors and types of) storage arrays to create dynamic, distributed, highly available data centers. You use this to achieve a number of tasks and objectives. This is a very powerful capability; however, it needs to be used correctly for that power to be realised. The primary and most valuable uses for VPLEX are centred around mobility, availability and collaboration.

VPLEX comes in three flavours: VPLEX Local, VPLEX Metro and VPLEX Geo.

  • VPLEX Local is for intra-data-center use or across a campus, and can be used to federate data on SANs from multiple vendors
  • VPLEX Metro is for regional or metropolitan areas up to approx. 100 km apart and within 5 ms RTT for latency
  • VPLEX Geo is for when you start to look at going across far greater distances (up to 50 ms in latency) and asynchronous replication.
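As a quick sanity check during design, you can turn a measured inter-site round-trip time into the flavour it could support. A trivial sketch based on the rough limits above (5 ms for Metro, 50 ms for Geo; always confirm against the current EMC support matrix):

```python
def vplex_flavour_for_rtt(rtt_ms):
    """Map a measured inter-site round-trip time (ms) to the VPLEX
    flavour it could support, using the rough limits above."""
    if rtt_ms <= 5:
        return "Metro (synchronous)"
    if rtt_ms <= 50:
        return "Geo (asynchronous)"
    return "outside supported latency"
```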

In my experience the most valuable feature of the EMC VPLEX is its ability to protect data in the event of disasters striking your business facilities or data center; however, it also protects you from failure of components within your data centers.

Using the EMC VPLEX you can move data without interruption or downtime to hosts between EMC storage arrays or between EMC and non-EMC storage arrays. As the storage is presented through the virtual volumes it retains the same identities and access points for the hosts.

Collaboration is critical to many of today’s businesses, driven by the highly competitive nature of so many industries. Collaboration over distance is achieved with Access Anywhere, which provides cache-consistent, active-active access to your critical data across VPLEX clusters.

 

EMC has a nice infographic that shows how this looks.

VPLEX Active-Active

There is also a VPLEX management server with Ethernet connectivity, which provides cluster management services when connected to your client’s network. This Ethernet port also provides the point of access for communications with the VPLEX Witness.

Witness Server

To help control where things land during a disaster or failure, a Witness server is used. This is a VMware virtual machine located in a separate site, network or location (a separate failure domain) to provide a witness between VPLEX clusters that are part of a distributed solution. This additional site needs only IP connectivity to the VPLEX sites, and a three-way VPN is established between the VPLEX management servers and the VPLEX Witness. I’ve utilised the client’s head office or secondary sites with existing network and infrastructure to facilitate this. While it’s not something I’ve implemented, some customers require a third site with an FC LUN acting as the quorum disk; this must be accessible from the solution’s node in each site, resulting in additional storage and link costs.

 

So what physically is it?

 

Below is an image of the front and back of a VPLEX VS2 engine

One thing that should be kept in mind from the beginning is that the VPLEX hardware is designed and locked with a standard preconfigured port arrangement; this is not reconfigurable. The VS2 hardware must be ordered as a Local, Metro or Geo and is pre-configured with FC or 10 Gigabit Ethernet WAN connectivity from the factory. You cannot currently purchase a VPLEX with both IP and FC connectivity; I hope EMC changes this in the future, as being able to have redundant paths or multiple paths to different arrays could be very valuable.

The VPLEX cluster sits in your racks and is connected between your storage array and your compute. It consists of:

  • 1, 2 or 4 VPLEX engines
  • 2 directors per engine
  • A management server
  • In dual- or quad-engine designs, 1 pair of FC switches for communication between the directors and 2 UPSs providing battery backup to the FC switches and management server.

 

As a solution architect I’ve been frustrated by customers with 2 or 3 types of storage array in their environment: they don’t have the budget to swap out multiple SANs at the same time, so it limits the solutions that can be presented, or it requires yet another storage array to specifically address a requirement. VPLEX can really slot in, fill certain needs and utilise existing storage at the same time.

The VPLEX’s connectivity is split between front and back end connectivity (FE and BE). The FE ports will log in to the fabrics and present themselves as targets for zoning to the hosts. The BE ports will log in to the fabrics as initiators to be used for zoning to the storage arrays.

Each director will connect to both SAN fabrics with both FE and BE ports. It should be noted that direct attaching can be done and is supported; however, it is limiting and might not meet the customer’s requirements.

The WAN connectivity ports are configured as either 4 port FC modules or dual port 10GigE modules.

The FC WAN COM ports should be connected to dual separate backbone fabrics or networks that span the two sites. If the VPLEX is an IP version, the 10GigE connections will need to be connected to dual networks with the same QoS. The networking / site connectivity can be very complex, and I would strongly recommend having a service provider who is experienced in successful VPLEX deployments involved, or engaging EMC to work with you.

 

The CLI

The VPLEX CLI is divided into command contexts. Some commands are accessible from all contexts, and are referred to as ‘global commands’. The remaining commands are arranged in a hierarchical context tree. These commands can only be executed from the appropriate location in the context tree. Understanding the command context tree is critical to using the VPLEX command line interface effectively.

The root context contains ten sub-contexts:

  • clusters – Create and manage links between clusters, devices, extents, system volumes and virtual volumes. Register initiator ports, export target ports, and storage views.
  • data-migrations – Create, verify, start, pause, cancel, and resume data migrations of extents or devices.
  • distributed-storage – Create and manage distributed devices and rule sets.
  • engines – Configure and manage directors, fans, management modules, and power.
  • management-server – Manage the Ethernet ports.
  • monitoring – Create and manage performance monitors.
  • notifications – Create and manage call-home events.
  • recoverpoint – Manage RecoverPoint options.
  • security – Configure and view authentication password-policy settings. Create, delete, import and export security certificates. Set and remove login banners. (The authentication sub-context was added under the security context.)
  • system-defaults – Display systems default settings.

Except for the system-defaults context, each of the sub-contexts contains one or more sub-contexts to configure, manage, and display sub-components.

Command contexts have commands that can be executed only from that context. The topmost context is the root context, or “/”.

The commands that make up the CLI fall into two groups:

  • Global commands that can be used in any context. For example: cd, date, ls, exit, user, and security.
  • Context-specific commands that can be used only in specific contexts. For example, to use the copy command, the context must be /distributed-storage/rule-sets.

Use the help command to display a list of all commands (including the global commands) available from the current context.

Use the help -G command to display a list of available commands in the current context, excluding the global commands.

As with most half-decent CLIs these days, you can use the Tab key to complete commands, display command arguments, and display valid contexts and commands.

 

The VPLEX command line interface includes the following wildcards:

* – matches any number of characters.

** – matches all contexts and entities between two specified objects.

? – matches any single character.

[a|b|c] – matches any of the single characters a, b or c.

 

* wildcard

Use the * wildcard to apply a single command to multiple objects of the same type (directors or ports). For example, to display the status of ports on each director in a cluster, without using wildcards:

ll engines/engine-1-1/directors/director-1-1-A/hardware/ports

ll engines/engine-1-1/directors/director-1-1-B/hardware/ports

ll engines/engine-1-2/directors/director-1-2-A/hardware/ports

ll engines/engine-1-2/directors/director-1-2-B/hardware/ports

.

.

.

Alternatively:

Use one * wildcard to specify all engines, and a second * wildcard to specify all directors:

ll engines/engine-1-*/directors/*/hardware/ports

 

** wildcard

Use the ** wildcard to match all contexts and entities between two specified objects. For example, to display all director ports associated with all engines without using wildcards:

ll /engines/engine-1-1/directors/director-1-1-A/hardware/ports

ll /engines/engine-1-1/directors/director-1-1-B/hardware/ports

.

.

.

Alternatively, use a ** wildcard to specify all contexts and entities between /engines and ports:

ll /engines/**/ports

 

? wildcard

Use the ? wildcard to match a single character (number or letter).

ls /storage-elements/extents/0x1?[8|9]

Returns information on multiple extents.

 

[a|b|c] wildcard

Use the [a|b|c] wildcard to match any one of the characters listed in the brackets. The following example displays only ports with names starting with A and a second character of 0 or 1:

ll engines/engine-1-1/directors/director-1-1-A/hardware/ports/A[0-1]  
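To make the matching rules concrete, here’s a small Python sketch of how these wildcards behave (an illustration only, not how the CLI actually implements them): * stays within one context level, ** crosses levels, ? matches one character, and [a|b|c] matches any one of the listed characters.

```python
import re

def vplex_pattern_to_regex(pattern):
    """Translate the VPLEX CLI wildcards into a Python regex.

    *        -> any characters within one context level
    **       -> any characters across context levels
    ?        -> a single character within a level
    [a|b|c]  -> any one of the listed characters
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        elif pattern[i] == "[":
            j = pattern.index("]", i)
            chars = pattern[i + 1:j].replace("|", "")
            out.append("[" + re.escape(chars) + "]")
            i = j + 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return "^" + "".join(out) + "$"

paths = [
    "/engines/engine-1-1/directors/director-1-1-A/hardware/ports",
    "/engines/engine-1-2/directors/director-1-2-B/hardware/ports",
]
rx = re.compile(vplex_pattern_to_regex("/engines/**/ports"))
matches = [p for p in paths if rx.match(p)]
# matches == paths: the ** spans the intervening director/hardware contexts
```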

 

Clusters – VPLEX Local™ configurations have a single cluster, with a cluster ID of 1. VPLEX Metro™ and VPLEX Geo™ configurations have two clusters, with cluster IDs of 1 and 2.

VPlexcli:/clusters/cluster-1/

 

Engines are named <engine-n-n> where the first value is the cluster ID (1 or 2) and the second value is the engine ID (1-4).

VPlexcli:/engines/engine-1-2/

 

Directors are named <director-n-n-n> where the first value is the cluster ID (1 or 2), the second value is the engine ID (1-4), and the third is A or B.

VPlexcli:/engines/engine-1-1/directors/director-1-1-A

 

For objects that can have user-defined names, those names must comply with the following rules:

  • Can contain uppercase and lowercase letters, numbers, and underscores
  • No spaces
  • Cannot start with a number
  • No more than 63 characters
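Those rules are easy to capture in a quick validator. A sketch (whether a leading underscore is permitted isn’t stated above, so this version allows it):

```python
import re

# Letters, numbers and underscores; no spaces; must not start with a
# number; at most 63 characters total. Leading underscore is an
# assumption on my part.
NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]{0,62}$")

def is_valid_vplex_name(name):
    """Check a user-defined object name against the rules above."""
    return bool(NAME_RE.match(name))
```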

Common and handy commands from my experience are:

 

 

 

 


Zerto Replication

So I’ve been exposed to Zerto for some time now, and I’ve been wanting to put a quick write-up on it here, as it’s a great product that can play a critical role in your DR and BCP planning.

Zerto is a hypervisor-based replication software product that integrates with your VMware vSphere virtual platform to provide replication and advanced features for your DR and BCP plans. From the business perspective, it enables I.T. to align their software and systems with the Business Continuity Planning (BCP) and Disaster Recovery (DR) strategies. The biggest thing for me is that when I’ve used Zerto for hypervisor-based data replication, I have been able to reduce DR complexity and costs and still protect my clients’ mission-critical virtualised applications. The downside is that it doesn’t really touch or allow for your legacy or dedicated hardware, or SAN-based CIFS shares, to be replicated. Both of these have given me grief and in some cases can be a major issue.

Depending on who’s reading this, you may or may not know how insanely complex, quirky and annoying most legacy BCP and DR solutions can be. I’ve seen people quit their jobs over the frustrations and problems caused by systems that just don’t hold up in today’s world of high-density, highly virtualised production environments. I’ve also seen numerous clients where there is a substantial difference between what the I.T. department can actually provide during a disaster and what the business thinks it can do. This gap is a major stressor for I.T. staff and is not normally easy to resolve without major capex and opex costs.

Zerto provides a number of benefits and advantages over dedicated hardware solutions or hosted options.

Reduced hardware costs: Zerto is a very powerful product, and it is well positioned to take over and leverage existing hardware and slack storage for replication, as well as not needing tier 1 high-end storage at the replication site.

Reduced complexity and streamlined IT operations: normally I think companies throw this one out there with anything they sell; however, I’ve seen and experienced how much Zerto can reduce the day-to-day management of DR replication. I’d also never before had clients come back to me months after an implementation just to say it’s great and just works. Zerto really puts a polished and simple interface over the top of what is a very complex process. One of the features I think people need to have a play with as soon as they start testing is Virtual Protection Groups (VPGs). I’m frequently working with systems that must be aligned throughout the stack, sometimes as many as 8 to 12 servers for a single instance and frequently with dozens of instances; VPGs can make life amazingly simple. (Let’s not even talk about how well it plugs into VMware and vSphere: vMotion, DRS, HA and SVM.)

Powerful BC/DR for mission-critical applications: replication, backups, RoboCopy, cloning, snapshotting; it’s all about making a copy of data, and nearly anyone and anything can do this. However, I.T. and business are becoming a lot smarter and more intelligent about how they do business. Zerto is really good at not only getting the right data to the right place, but also getting it there in a usable and valuable state. The number of organisations I’ve worked with that could not show that their disaster recovery system worked is staggering. When working with Zerto you quickly become used to having the ability to run up an isolated version of your SharePoint or CRM environment, applying patches and updates, testing, then destroying it and moving on to update your production environment with confidence that you’re in a good state.

In short, give Zerto a look if you need replication and recovery, or if you have requirements around testing patches and updates to your systems.


Handy Tips – VMware

 

The other day I was working with a junior engineer to deploy a greenfield infrastructure for a client: pretty simple, all new infrastructure consisting of 2 switches, 3 servers and a SAN. However, the engineer was struggling to get iSCSI working. When I started asking him questions I quickly found he’d never thought of trying to ping the iSCSI interfaces to check that side of the network was all functional.

I got him to ping the iSCSI interface from the VMkernel interface “vmk2”; the command looked something like this:

esxcli network diag ping -I vmk2 -H 10.10.200.52

Below I’ve included the options for this command to help with specifying things like the outgoing interface, selecting IPv4 or IPv6, and the size of the payload.

  • --count | -c: Specify the number of packets to send.
  • --debug | -D: VMKPing debug mode.
  • --df | -d: Set the DF bit on IPv4 packets.
  • --host | -H: Specify the host to send packets to. (required)
  • --interface | -I: Specify the outgoing interface.
  • --interval | -i: Set the interval between packets, in seconds.
  • --ipv4 | -4: Ping with ICMPv4 echo requests.
  • --ipv6 | -6: Ping with ICMPv6 echo requests.
  • --nexthop | -N: Override the system’s default route selection, in dotted quad notation. (IPv4 only; requires the interface option.)
  • --size | -s: Set the payload size of the packets to send.
  • --ttl | -t: Set the IPv4 Time To Live or IPv6 Hop Limit.
  • --wait | -W: Set the timeout, in seconds, to wait if no responses are received.
  • --help: Show the help message.
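A common use of these options is checking jumbo-frame connectivity end to end: set the DF bit (-d) and size the payload just under the MTU (for a 9000-byte MTU, 9000 minus 20 bytes of IP header and 8 of ICMP gives 8972). Here’s a small helper to assemble the command line (illustrative only; the interface name and address are examples):

```python
def vmkping_cmd(interface, host, size=None, df=False, count=None):
    """Build an 'esxcli network diag ping' command line.

    interface and host map to -I and -H; df, size and count map to
    -d, -s and -c from the option list above.
    """
    cmd = ["esxcli", "network", "diag", "ping", "-I", interface, "-H", host]
    if df:
        cmd.append("-d")
    if size is not None:
        cmd += ["-s", str(size)]
    if count is not None:
        cmd += ["-c", str(count)]
    return " ".join(cmd)

# Jumbo-frame check: DF bit set, payload sized for a 9000-byte MTU.
print(vmkping_cmd("vmk2", "10.10.200.52", size=8972, df=True))
# -> esxcli network diag ping -I vmk2 -H 10.10.200.52 -d -s 8972
```

If the jumbo-sized ping fails while a default-sized one succeeds, some hop in the path isn’t passing jumbo frames.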

 

I hope this helps with iSCSI troubleshooting for anyone who runs into issues.


Adopting the Cloud – The People side of things

 

Hi

Recently I’ve run into numerous problems getting some staff to engage with cloud adoption projects; I’m frequently brought in to consult or work as a cloud architect, and I need to engage with these staff. Thinking back, this has actually happened for quite a number of years, and while it’s not as prevalent as it used to be, I’ve found myself trying to sell the cloud not to customers, but to the engineers, consultants and internal IT staff who seem to think it would do them out of a job or reduce their ownership and power in their own environments. To be honest, I don’t think the cloud would do any of these guys out of work; in fact, I see it building their importance to the business and the need for their skills. Sure, rack-and-stack and some hardware-specialised guys would hurt initially, but these are by far the minority, and with their experience they would have a far deeper understanding of the cloud than most.

So out of those conversations I thought it was worth discussing the skills that I talked about with both these individuals and their management; I think these are key to the adoption and successful implementation of private, hybrid or public cloud infrastructures. Having the right people at the right time is a key goal for most managers and businesses. However, I don’t believe that cloud skill sets stand entirely on their own, and building up existing staff and their skills will give you a stronger employee and longer-term engagement. I’ve also found staff that I’ve invested time and effort in have been more loyal and steadfast.

The cloud is an interesting beast, and everything I’ve seen comes down to a few key areas in which it is critical to build or maintain both your own and your employees’ skills.

Design and Architecture

    Any cloud project must be designed and architected correctly. We are seeing more and more businesses come and ask how to get out of, or manage, their sprawling, uncontrolled and out-of-hand cloud adoption. When 4 different departments go to different cloud providers, swipe their credit cards and start to do stuff, it’s a recipe for disaster. I.T. needs to get in front of the business and talk about cloud services: what they mean and what they can and can’t do. Get involved with and engage business stakeholders who might need cloud; don’t let them go out and ask others, make them want to come to you and talk to you. This allows you to direct and control cloud adoption, and to be closely involved in the design and architecture of a cloud solution in your organisation, rather than being told 6 months down the track that your contracts department has been using a cloud storage provider to host files and a webserver and it’s offline for some reason.

Backup / Recovery / High Availability / Resiliency / Disaster Recovery / Business Continuity Planning

    Ok, this area is a big bucket and I could have split it out; however, in general no business can run successfully if its services aren’t available. What this means is that staff need to be completely across what offerings their cloud provider has around meeting their needs for backup, HA, DR and BCP. When you boil down a cloud environment, it is still compute, RAM and storage sitting in a data center connected to the internet somewhere in the world. It can still have power outages, link failures, hardware failures, flood, fire and other natural disasters strike. Keep across what is available from the provider and ask the questions: how do you back up our data? Where do you store it? How accessible is it? How do we get it during a disaster? What are your SLAs for recovery? What are the RPO and RTO of this service? Etc.

Automation

    While I’ve seen cloud providers offer solutions with almost zero automation, for the best cloud providers automation isn’t just offered, it’s built into the very core of their offering. Cloud is all about self-service, self-provisioning, and the trust that when you ask for another virtual machine, extra CPUs, more RAM, or try to expand a HDD, it will just work. Encourage your staff to build their awareness and understanding of these tools and systems, and to build into their daily routine the review and automation of repetitive and mundane tasks. Why do something manually when you can automate it? When dealing with the cloud, automation is key: get across it early and as completely as possible.

Security / User Access Control / Compliance / Auditing

    Security and compliance questions are probably up there with the most common, and easily the most complex, questions I’ve had in regard to the cloud: everything from whether we are allowed to do something, to what happens if data is lost or, in a lot of cases more importantly, compromised. The cloud is based on systems built on platforms designed (and sometimes not designed) for multi-tenancy. This means that as a customer of a cloud service you are sharing your CPU, RAM, storage, network and virtualisation platform with hundreds and possibly thousands of the cloud provider’s other customers. If something goes wrong and there is a breach of security, or just human error, your data or someone else’s data could end up being accessible to or impacted by others. Knowing how to respond to these events is a key skill.

Cross functional / Inter silo / cross divisional

    Cloud engagements require interaction and communication between the IT silos of responsibility to a level rarely seen before; normally a server administrator couldn’t add new disk, CPU or network without engaging and working with the storage, virtualisation or network areas of the IT department. This can cause challenges in almost any organisation; having a person or group who specialises in engaging the disparate areas and working with both established and new processes, and within the change management process, is critical to success.

 

There are so many areas for your staff to skill up in, and new responsibilities to take on, that it should be seen as a good thing and an advancement. When it is weighed against the loss of managing and owning the infrastructure or a particular area, it is something that really can be managed with your staff both proactively and positively.

 

I hope this helps and shows that there is a huge amount of complexity in adopting the cloud and managing your staff and their experience and expectations from the whole endeavour.


Decommissioning old Blog and migrating to WordPress

So after playing with about 5 different platforms and screwing around with Office 365, I’ve decided to settle on WordPress to host my blog and posts. I’m in the process of migrating the worthwhile posts and cleaning up all the old redundant stuff.


Hello World. Welcome to my Blog.

WELCOME!

Well, I’ve finally gone and done it… started a blog on this domain name. I’ve had this domain for quite a number of years, and while it’s been fairly quiet and the original ideas I had didn’t pan out, I thought I’d give this a crack.

A little about me, I work for a mid-sized ICT Integrator in Australia and my primary focus is in corporate and educational ICT Architecture and system design as well as getting to implementation.

What will I be focussing on in this blog? That’s a great question, and I think we’ll see how things go, but I expect this will be a dumping ground for my own knowledgebase as well as anything I run across that I really like.

Things I’d think you will see…

  • Microsoft
  • VMware
  • LANDesk
  • Manage Engine ServiceDesk Plus
  • Switching
  • Routing
  • General IP networking
  • Remote access
  • Local access
  • Interactive White Boards (IWB’s)
  • HP Hardware (Blades, Laptops, Servers, Desktops)
  • Lenovo (Blades, Laptops and Desktops)
  • Cisco Hardware (OCS, Switches, etc)
  • Pretty much anything else I run across.
Anyway, I hope you enjoy the blog, and if you do stumble across it, that it has the info you’re looking for.
Dallas