Cloud Backup as a Service (BaaS)

As my job is to design tailored and strategic technology solutions for school environments, I naturally talk to schools every day. When it comes to backups, I keep hearing ‘I just want it to work’, ‘I don’t want to worry about it’, and more frequently now, ‘I just want someone else to deal with it’. I hear your pain; that’s why Backup as a Service exists!

These days it seems like everyone is getting onto the Backup as a Service (or BaaS) bandwagon! With the ongoing costs of managing, repairing, maintaining, replacing, supporting and troubleshooting LTO tapes, tape libraries, offsite storage providers, disk arrays, backup software, licensing, agents, time, warranties, manpower, power, rack space and air-conditioning… it’s all just getting a bit much, and the idea of a simple Backup as a Service solution is very compelling.

While BaaS may not be the right solution for everyone, things have certainly come a long way and are moving forward very quickly with the improvements in technology, bandwidth and cloud services. Historically, one of the biggest roadblocks to adopting a cloud backup solution was the connectivity between the school and the cloud provider. Over recent years schools have seen demand for online services go through the roof, and as a result their internet connections have had to grow and scale to provide connectivity to the students in the classrooms. After school hours, however, the links are frequently underutilised and have a great deal of bandwidth available.
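As a rough back-of-the-envelope illustration (the link speed and window here are hypothetical), a 500 Mbps connection that sits largely idle for a ten-hour overnight window can move roughly:

500 Mbps × 36,000 s ÷ 8 bits/byte ≈ 2.25 TB

That is often more than enough headroom for a nightly incremental backup.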

As could be expected, many schools I speak with still have a lot of questions around how BaaS works, so I’ll endeavour to answer some here.

Backup Problems vs Cloud Backup Problems

I frequently get asked questions about cloud backup and what to do when something goes wrong or breaks. Most of the time the answer is that we do the same as if it was an on-premises backup.

I normally treat the cloud as another site: it exists somewhere, it is connected via a link of some type, and it has certain capabilities and capacities. All of these things are very similar to what a remote site may look like.

How do I know if my backups are successful?

You should always have regular reports on failed backups (and successful ones as well); however, the only real way to know is to test them!

On site or in the cloud, it is always good practice to test your backups and I strongly recommend that you do so on a regular basis to confirm and enforce good practices around your backups. Data is very valuable and in today’s environment the value of data is only growing.

Who should I use for Backup as a Service?

This is a simple one: you should use the company that best suits your needs, understands your business, provides you with value-added services, can assure you that your data is secure, and can help you understand the how, where, when and why of their cloud solution.

What will happen if I hit my maximum storage allocation?

Generally this will cause problems if your Windows guest machine runs out of storage; however, this is not a uniquely cloud, IaaS or BaaS problem and shouldn’t be considered as one. If you are close to, or have run out of, space in your cloud environment, you can easily fix this by logging onto the admin console or contacting your cloud provider and requesting additional space be added to your account. I’ve done this a number of times and it is generally provisioned either instantaneously or within a matter of minutes. This scalability is one of the key benefits of cloud computing.

Where are my backups?

This is always a great question, and here at Computelec we do tours of the data centres where our cloud is located, to show off both the facilities and the local capabilities. Where your data is stored can be critically important to some customers for a number of different reasons, such as legal requirements, data sovereignty and risk mitigation.

I find that a lot of cloud backup services do not address recovery timelines, and the impact of these times is not fully considered. When you need a critical virtual machine back from a cloud backup, you need to restore it, and if that machine is hundreds of gigabytes and you are restoring from the cloud, this can take days or even weeks.
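As a rough illustration of why this matters (the figures are hypothetical), restoring a 500 GB virtual machine over a dedicated 100 Mbps link at full utilisation takes:

500 GB × 8 bits/byte ÷ 100 Mbps = 40,000 s ≈ 11 hours

And that assumes an optimistic, uncontended link; a multi-terabyte file server at real-world throughput stretches into days.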

To address the above problem, I would recommend a provider that can recover your backed-up virtual machine into their cloud IaaS environment, allowing you very quick access to the data, or one who can facilitate backups being copied to removable storage that can be securely transported to your site when you need a large amount of data quickly.

How frequently should I be backing up?

Frequency of backups can be a long and interesting discussion, and comes back to what the business’s Recovery Point Objectives (RPOs) and Recovery Time Objectives (RTOs) are. For example, a 4-hour RPO means the business can tolerate losing at most 4 hours of data, which implies backups at least every 4 hours, while a 4-hour RTO means the service must be restored within 4 hours of an outage. I’m a strong believer that ICT should not dictate these times; they should be part of a greater discussion, with ICT providing guidance and support to the decision-making process.

What is Value Add?

Value add is the differentiation that a lot of cloud providers are striving for. Computelec, for instance, is purely focused on education; that is value we offer above and beyond the norm. Our understanding of, and close alignment with, education gives us insight into schools and means our solutions are suited to today’s educational providers.

Summary

I hope this article has shed some light on Backup as a Service, and hopefully answered a few of your frequently thought questions. Again, BaaS may not be something that will fit your school’s ICT plan, but for schools that find backup to be a real pain point in their operations, I implore you to investigate further.

If you have any questions that I haven’t addressed above, please don’t hesitate to comment or email me, I’d love to continue the conversation.


Infrastructure as a Service (IaaS) …what does it mean?

Most schools in Australia are now moving beyond asking “what is cloud computing?” and on to “how can we implement cloud computing?”. Now that cloud is a popular solution, the major decisions become what kind of service to use and when.

Cloud services come in many different shapes and forms, such as Software as a Service, Platform as a Service, Disaster Recovery as a Service, Backup as a Service and Database as a Service. One of the most common on offer is Infrastructure as a Service (IaaS).

What is IaaS?

Infrastructure as a Service (IaaS) is a model where the IaaS provider hosts infrastructure components on behalf of the school. There are many different possibilities, including virtual servers, firewalls, load balancers and network connections. I want to go through a few of the major advantages that come from IaaS, as well as some disadvantages.

Scalability

In my opinion, the single greatest strength of IaaS is its scalability. Schools often need a rapid response from ICT, and historically ICT has not been able to respond as quickly as the school needs. With IaaS, resources can be made available almost instantly for sudden spikes or drops in demand, effectively eliminating the downtime traditionally needed to add to or adjust infrastructure.

Costs

One of the biggest drivers for schools to adopt cloud computing in general, and Infrastructure as a Service specifically, is the way it can cut costs. Since cloud computing is delivered over the internet, schools don’t have to spend the time, manpower or money investing in hardware and infrastructure of their own; all of this is handled by the IaaS provider. That means schools don’t have to worry about all of the problems that come with maintaining, powering or cooling their own equipment.

One of the other ways that costs are lowered is the reduced need for highly skilled IT personnel; the focus moves from IT infrastructure to learning and educational outcomes, and to working with teachers and staff to provide a better service. With more time on their hands, don’t be surprised to see IT come up with some creative ways to contribute to the school’s projects and day-to-day operations.

Another big advantage IaaS brings to the table is its pay-as-you-go model. This pay-for-what-you-consume model has gained a lot of attention and interest, and has helped reduce ICT wastage and over-provisioning.

Disadvantages

As with any technology, there are a number of disadvantages with IaaS. One concern is security. It’s the same concern that surrounds all cloud services: that the school’s data stored in the cloud can be stolen or lost. It should be noted, though, that data is frequently equally or even more at risk on school premises. This remains one of the main reasons that some schools avoid cloud computing.

Another disadvantage comes from vendor outages, where IaaS vendors suffer network and service crashes, leaving customers unable to access their systems. Vendors can only address this problem by promising quick recovery times and by giving assurances around data recoverability and security.

Choosing an IaaS Vendor

The choice of IaaS service provider is important, as schools are seeking not only high quality, reliable service and support, but also a partner that understands the school and the business of education.
It is wise to select a partner who understands education, teaching and learning, and who also understands IT services. Having a close, intimate relationship with, and understanding of, education is paramount when selecting the school’s cloud service provider.

Summary

To summarise, the movement we have seen in the education sector towards IaaS is due to schools feeling that its benefits make a school more flexible and efficient. IaaS can in many cases cut costs and enable ICT to deliver more services more quickly, while leaving room for innovation and a tighter focus on the educational outcomes the school is driving towards.


Part 2 – EMC VPLEX Experiences

Hi, and thanks for coming back (or reading on) from my previous post on EMC’s VPLEX kit and my experiences.

The first thing I’ll cover in this post is the most common commands that I’ve been using and what they’re for. This is as much for my own ease of reference as it is for anyone else reading along!

To begin with, there are two main places where you use the CLI: the cluster’s management server and the VPlexcli.

To log on to the cluster management interface, open an SSH connection using your preferred SSH client with the following settings (an example command follows the list):

  • IP address of the Cluster Management Ethernet interface
  • Port 22
  • SSH Protocol 2
  • Scrollback lines to 20000
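With a standard OpenSSH client, for example, that connection looks something like this (the IP address shown is illustrative only):

ssh -p 22 service@192.168.1.50

PuTTY users would simply enter the same details into the session settings.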

 

When you get to the logon interface you need to log on as the appropriate account; during the implementation phase this is probably going to be the service account. As always, please be smart about security: get this password changed early and don’t leave the default passwords on the default accounts.

When you are logged in you will come to the following interface:

service@ManagementServer:~>

 

The second place you perform most of your CLI work is the VPlexcli. You need to log on to the management server first and then enter the VPlexcli using the command vplexcli (wow, who would have guessed it would be that easy!).

service@ManagementServer:~> vplexcli

 

Several messages are displayed, and a username prompt appears:

Trying 127.0.0.1...

Connected to localhost.

Escape character is '^]'.

Enter User Name:

 

Again, you will need to authenticate with the appropriate username and password, probably still the service account you used to log on to the management server. When logged on successfully you’ll see the following:

creating logfile:/var/log/VPlex/cli/session.log_service_localhost_T28921_20101020175912

 

VPlexcli:/>

 

I found it highly useful to run up two sessions of PuTTY and log on with both: one that stays logged onto just the cluster management server, and the other logged into the VPlexcli. This allows you to quickly flick back and forth.

 

Commands

You can confirm that the product version running matches the required version in the VPLEX release notes and your expectations:

version -a

 

 

Verify the VPLEX directors

 

From the VPlexcli prompt, type the following command:

ll /engines/**/directors

Verify that the output lists all directors in the cluster, and that all directors show the following:

  • Commissioned status: true
  • Operational status: ok
  • Communication status: ok

Output example in a dual-engine cluster: (screenshot omitted; check the three statuses above for each director.)
Verify storage volume availability

From the VPlexcli prompt, type the following commands to rediscover the back-end storage:

cd /clusters/cluster-1/storage-elements/storage-arrays/EMC-*

array re-discover <array_name>

Type the following command to verify availability of the provisioned storage:

storage-volume summary

Resume-at-loser

This is probably one of the most important commands to know for after you’ve had an outage of some type and need to get your data re-synced.

 

During an inter-cluster link failure, you or your client can allow I/O to resume at one of the two clusters: the “winning” cluster.

I/O remains suspended on the “losing” cluster. When the inter-cluster link heals, the winning and losing clusters re-connect, and the losing cluster discovers that the winning cluster has resumed I/O without it. Unless explicitly configured otherwise (using the auto-resume-at-loser property), I/O remains suspended on the losing cluster. This prevents applications at the losing cluster from experiencing a spontaneous data change, and the delay allows the administrator to shut down applications and get into a clean state. After stopping the applications, the administrator can use this command to resynchronize the data image on the losing cluster with the data image on the winning cluster and resume servicing I/O operations. The administrator may then safely restart the applications at the losing cluster.

 

Without the ‘--force’ option, this command asks for confirmation to proceed, since its accidental use while applications are still running at the losing cluster could cause applications to misbehave.

 

 

cd /clusters/cluster-n/consistency-groups/group-name
resume-at-loser
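For example, to force the resynchronisation of a hypothetical consistency group cg_finance at a losing cluster-2 without the confirmation prompt (the names here are illustrative):

cd /clusters/cluster-2/consistency-groups/cg_finance
resume-at-loser --force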

One of the important things to check is the Rx and Tx power of your FC modules. The following command takes you to the SFP contexts, where an ll will list their attributes so you can look for discrepancies or anything out of the ordinary:

cd /engines/engine-1-1/directors/director-1-1-A/hardware/sfps/

 

 

 

Next up is some information around VPLEX and storage, and then VPLEX and VMware vSphere.

 

 


Part 1 – EMC VPLEX Experiences

 

Welcome, everyone, to Part One of my EMC VPLEX Metro experiences.

 

I recently designed and deployed a VPLEX Metro for a client to enable them to achieve some business requirements around disaster recovery and, in some cases, to avoid even the recovery aspect of a disaster by automating and replicating their data and services across multiple data centres.

The first thing I want to say about the EMC VPLEX in relation to the design and architecture phases is: do not be fooled by the pretty web interface. Yes, there is a pretty web interface, and yes, I believe the client will use this as the first touch point moving forward. However, the EMC VPLEX is very CLI intensive and this needs to be taken into account; you need a skilled resource involved and engaged in the design.

 

Overview

Simply put, the EMC VPLEX federates data located on heterogeneous (i.e. different vendors and types of) storage arrays to create dynamic, distributed, highly available data centres. You can use this to achieve a number of tasks and objectives. This is a very powerful capability; however, it needs to be used correctly for that power to be realised. The primary and most valuable uses for VPLEX are centred around mobility, availability and collaboration.

VPLEX comes in three flavours: VPLEX Local, VPLEX Metro and VPLEX Geo.

  • VPLEX Local is for use within a data centre or across a campus, and can be used to federate data on SANs from multiple vendors
  • VPLEX Metro is for regional or metropolitan areas up to approx. 100 km apart and within 5 ms RTT of latency
  • VPLEX Geo is for when you start to look at going across far greater distances (up to 50 ms of latency) and asynchronous replication.

In my experience the most valuable feature of the EMC VPLEX is its ability to protect data in the event of disasters striking your business facilities or data centre; however, it also protects you from failure of components within your data centres.

Using the EMC VPLEX you can move data without interruption or downtime to hosts between EMC storage arrays or between EMC and non-EMC storage arrays. As the storage is presented through the virtual volumes it retains the same identities and access points for the hosts.

Collaboration is critical to many of today’s businesses and is driven by the highly competitive nature of so many industries. Collaboration over distance is achieved with Access Anywhere, which provides cache-consistent, active-active access to your critical data across VPLEX clusters.

 

EMC has a nice infographic, “VPLEX Active-Active”, that shows how this looks (not reproduced here).

There is also a VPLEX management server with Ethernet connectivity, which provides cluster management services when connected to your client’s network. This Ethernet port also provides the point of access for communications with the VPLEX Witness.

Witness Server

To help control where things land during a disaster or failure, a Witness server is used. This is a VMware virtual machine located in a separate site, network or location (a separate failure domain) to provide a witness between VPLEX clusters that are part of a distributed solution. This additional site needs only IP connectivity to the VPLEX sites, and a three-way VPN is established between the VPLEX management servers and the VPLEX Witness. I’ve utilised the client’s head office or secondary sites with existing network and infrastructure to facilitate this. While not something that I’ve implemented, some customers require a third site with an FC LUN acting as the quorum disk; this must be accessible from the solution’s node in each site, resulting in additional storage and link costs.

 

So what physically is it?

 

Below is an image of the front and back of a VPLEX VS2 engine (not reproduced here).

One thing that should be kept in mind from the beginning is that the VPLEX hardware is designed and locked with a standard preconfigured port arrangement; this is not reconfigurable. The VS2 hardware must be ordered as a Local, Metro or Geo, and it is pre-configured with FC or 10 Gigabit Ethernet WAN connectivity from the factory. You cannot currently purchase a VPLEX with both IP and FC connectivity; I hope that EMC changes this in the future, as being able to have redundant paths or multiple paths to different arrays could be very valuable.

The VPLEX cluster sits in your racks and is connected between your storage array and your compute. It consists of:

  • 1, 2 or 4 VPLEX Engines
  • Each engine contains 2 directors
  • Management Server
  • In Dual or Quad Engine designs there is also 1 pair of FC switching for communication between the directors and 2 UPSs for battery backup to the FC switching and Management Server.

 

As a solution architect I’ve been frustrated by customers with 2 or 3 types of storage array in their environment: as they don’t have the budget to swap out multiple SANs at the same time, it has limited the solutions that can be presented, or it has required another storage array to specifically address a requirement. VPLEX can really slot in, fill certain needs and utilise existing storage at the same time.

The VPLEX’s connectivity is split between front and back end connectivity (FE and BE). The FE ports will log in to the fabrics and present themselves as targets for zoning to the hosts. The BE ports will log in to the fabrics as initiators to be used for zoning to the storage arrays.

Each director will connect to both SAN fabrics with both FE and BE ports. It should be noted that direct attaching can be done and is supported; however, it is limiting and might not meet the customer’s requirements.

The WAN connectivity ports are configured as either 4 port FC modules or dual port 10GigE modules.

The FC WAN COM ports should be connected to dual separate backbone fabrics or networks that span the two sites. If the VPLEX is an IP version, the 10GigE connections will need to be connected to dual networks with the same QoS. The networking / site connectivity can be very complex, and I would strongly recommend having a service provider who is experienced in successful VPLEX deployments involved, or engaging EMC to work with you.

 

The CLI

The VPLEX CLI is divided into command contexts. Some commands are accessible from all contexts, and are referred to as ‘global commands’. The remaining commands are arranged in a hierarchical context tree. These commands can only be executed from the appropriate location in the context tree. Understanding the command context tree is critical to using the VPLEX command line interface effectively.

The root context contains ten sub-contexts:

  • clusters – Create and manage links between clusters, devices, extents, system volumes and virtual volumes. Register initiator ports, export target ports, and storage views.
  • data-migrations – Create, verify, start, pause, cancel, and resume data migrations of extents or devices.
  • distributed-storage – Create and manage distributed devices and rule sets.
  • engines – Configure and manage directors, fans, management modules, and power.
  • management-server – Manage the Ethernet ports.
  • monitoring – Create and manage performance monitors.
  • notifications – Create and manage call-home events.
  • recoverpoint – Manage RecoverPoint options.
  • security – Configure and view authentication password-policy settings. Create, delete, import and export security certificates. Set and remove login banners. The authentication sub context was added to the security context.
  • system-defaults – Display systems default settings.

Except for the system-defaults context, each of the sub-contexts contains one or more sub-contexts to configure, manage, and display sub-components.

Command contexts have commands that can be executed only from that context. The command contexts are arranged in a hierarchical context tree. The topmost context is the root context, or “/”.

The commands that make up the CLI fall into two groups:

  • Global commands that can be used in any context. For example: cd, date, ls, exit, user, and security.
  • Context-specific commands that can be used only in specific contexts. For example, to use the copy command, the context must be /distributed-storage/rule-sets.
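As a quick illustrative sketch of how this feels in practice (output omitted), you can move around the tree with the global commands and run context-specific commands from wherever they live:

VPlexcli:/> ls clusters
VPlexcli:/> cd /clusters/cluster-1
VPlexcli:/clusters/cluster-1> ll
VPlexcli:/clusters/cluster-1> cd /engines/engine-1-1/directors
VPlexcli:/engines/engine-1-1/directors> cd /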

Use the help command to display a list of all commands (including the global commands) available from the current context.

Use the help -G command to display a list of available commands in the current context, excluding the global commands.

As with most half-decent CLIs these days, you can use the Tab key to complete commands, display command arguments, and display valid contexts and commands.

 

The VPLEX command line interface includes 3 wildcards:

* – matches any number of characters.

? – matches any single character.

[a|b|c] – matches any of the single characters a or b or c.

 

* wildcard

Use the * wildcard to apply a single command to multiple objects of the same type (directors or ports). For example, to display the status of ports on each director in a cluster, without using wildcards:

ll engines/engine-1-1/directors/director-1-1-A/hardware/ports

ll engines/engine-1-1/directors/director-1-1-B/hardware/ports

ll engines/engine-1-2/directors/director-1-2-A/hardware/ports

ll engines/engine-1-2/directors/director-1-2-B/hardware/ports

...

Alternatively:

Use one * wildcard to specify all engines, and a second * wildcard to specify all directors:

ll engines/engine-1-*/directors/*/hardware/ports

 

** wildcard

Use the ** wildcard to match all contexts and entities between two specified objects. For example, to display all director ports associated with all engines without using wildcards:

ll /engines/engine-1-1/directors/director-1-1-A/hardware/ports

ll /engines/engine-1-1/directors/director-1-1-B/hardware/ports

...

Alternatively, use a ** wildcard to specify all contexts and entities between /engines and ports:

ll /engines/**/ports

 

? wildcard

Use the ? wildcard to match a single character (number or letter).

ls /storage-elements/extents/0x1?[8|9]

Returns information on multiple extents.

 

[a|b|c] wildcard

Use the [a|b|c] wildcard to match any one of the characters in the brackets. The following example displays only ports with names starting with A and a second character of 0 or 1:

ll engines/engine-1-1/directors/director-1-1-A/hardware/ports/A[0-1]  

 

Clusters – VPLEX Local™ configurations have a single cluster, with a cluster ID of cluster 1. VPLEX Metro™ and VPLEX Geo™ configurations have two clusters with cluster IDs of 1 and 2.

VPlexcli:/clusters/cluster-1/

 

Engines are named <engine-n-n> where the first value is the cluster ID (1 or 2) and the second value is the engine ID (1-4).

VPlexcli:/engines/engine-1-2/

 

Directors are named <director-n-n-n> where the first value is the cluster ID (1 or 2), the second value is the engine ID (1-4), and the third is A or B.

VPlexcli:/engines/engine-1-1/directors/director-1-1-A

 

For objects that can have user-defined names, those names must comply with the following rules:

  • Can contain uppercase and lowercase letters, numbers, and underscores
  • No spaces
  • Cannot start with a number
  • No more than 63 characters
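For instance, applying those rules, Finance_SQL_01 and finance_sql_01 are both valid (and distinct) names, while 1_finance (starts with a number) and finance sql (contains a space) would be rejected.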



Zerto Replication

So I’ve been exposed to Zerto for some time now, and I’ve been wanting to put a quick write-up on it here, as it’s a great product that can play a critical role in your DR and BCP planning.

Zerto is a hypervisor-based replication software product that can integrate with your VMware vSphere virtual platform to provide replication and advanced features for your DR and BCP plans. From the business perspective it enables I.T. to align their software and systems with the Business Continuity Planning (BCP) and Disaster Recovery (DR) strategies. The biggest thing for me is that when I’ve used Zerto for hypervisor-based data replication, I have been able to reduce DR complexity and costs and still protect my client’s mission-critical virtualised applications. The downside is that it doesn’t really touch or allow for your legacy or dedicated hardware or SAN-based CIFS shares to be replicated. Both of these have given me grief and in some cases can be a major issue.

Depending on who’s reading this, you may or may not know how insanely complex, quirky and annoying most legacy BCP and DR solutions can be. I’ve seen people quit their jobs over the frustrations and problems caused by systems that just don’t hold up to today’s world of high-density, highly virtualised production environments. I’ve also seen numerous clients where there is a substantial difference between what the I.T. department can actually provide during a disaster and what the business thinks it can do. This gap is a major stressor for I.T. staff and is not normally easy to resolve without major capex and opex costs.

Zerto provides a number of benefits and advantages over dedicated hardware solutions or hosted options.

Reduced hardware costs: Zerto is a very powerful product and is well positioned to take over and leverage existing hardware and slack storage for replication, as well as not needing tier 1 high-end storage at the replication site.

Reduced complexity and streamlined IT operations: Normally I think that companies throw this one out there with anything they sell; however, I’ve seen and experienced how much Zerto can reduce the day-to-day management of DR replication, and I’d never before had clients come back to me months after an implementation just to say it’s great and it just works. Zerto really puts a polished and simple interface over the top of what is a very complex process. One of the features people need to have a play with as soon as they start testing is Virtual Protection Groups (VPGs). I’m frequently working with systems that must be aligned throughout the stack, sometimes as many as 8 to 12 servers for a single instance and frequently with dozens of instances; VPGs can make life amazingly simple. (Let’s not even talk about how well it plugs into VMware and vSphere, vMotion, DRS, HA and SVM.)

Powerful BC/DR for mission-critical applications: Replication, backups, RoboCopy, cloning, snapshotting; it’s all about making a copy of data, and nearly anyone and anything can do this. However, I.T. and business are becoming a lot smarter and more intelligent about how they do business. Zerto is really good at not only getting the right data in the right place, but also getting it there in a useable and valuable state. The number of organisations I’ve worked with that could not show that their disaster recovery system worked is staggering. When working with Zerto you quickly become used to being able to run up an isolated version of your SharePoint or CRM environment, apply patches and updates, test, then destroy it and move on to update your production environment with confidence that you’re in a good state.

In short, give Zerto a look if you need replication and recovery, or if you have requirements around testing patches and updates to your systems.


Handy Tips – VMware

 

The other day I was working with a junior engineer to deploy a greenfield infrastructure for a client; pretty simple, all-new infrastructure consisting of 2 switches, 3 servers and a SAN. However, the engineer was struggling to get iSCSI working. When I started asking him questions, I quickly found he’d never thought of trying to ping the iSCSI interfaces to check that side of the network was functional.

I got him to ping the iSCSI interface from the VMkernel interface “vmk2”; the command looked something like this:

esxcli network diag ping -I vmk2 -H 10.10.200.52

Below I’ve included the options for this command, to help with specifying things like the outgoing interface, selecting IPv4 or IPv6, and the size of the payload.

--count | -c
    Specify the number of packets to send.

--debug | -D
    VMKPing debug mode.

--df | -d
    Set DF bit on IPv4 packets.

--host | -H
    Specify the host to send packets to. (required)

--interface | -I
    Specify the outgoing interface.

--interval | -i
    Set the interval for sending packets in seconds.

--ipv4 | -4
    Ping with ICMPv4 echo requests.

--ipv6 | -6
    Ping with ICMPv6 echo requests.

--nexthop | -N
    Override the system's default route selection, in dotted quad notation. (IPv4 only. Requires interface option.)

--size | -s
    Set the payload size of the packets to send.

--ttl | -t
    Set IPv4 Time To Live or IPv6 Hop Limit.

--wait | -W
    Set the timeout to wait if no responses are received, in seconds.

--help
    Show the help message.
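Combining a few of these options gives you one of the handiest iSCSI checks there is: a jumbo-frame test. Assuming a 9000-byte MTU on the storage network (a payload of 8972 bytes leaves room for the 28 bytes of ICMP/IP headers), something like the following will quickly show whether every device in the path is passing jumbo frames:

esxcli network diag ping -I vmk2 -H 10.10.200.52 -d -s 8972 -c 5

If this fails while a default-sized ping succeeds, look for a switch port or interface that hasn’t had its MTU raised.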

 

I hope this helps with iSCSI troubleshooting for anyone who runs into issues.


Adopting the Cloud – The People side of things

 

Hi

Recently I’ve run into numerous problems getting some staff to engage with cloud adoption projects. I’m frequently brought in to consult or work as a cloud architect, and I need to engage with these staff. Thinking back, this has actually happened for quite a number of years, and while it’s not as prevalent as it used to be, I’ve found myself trying to sell the cloud not to customers, but to the engineers, consultants and internal IT staff who seem to think it would do them out of a job or reduce their ownership and power in their own environments. To be honest, I don’t think cloud would do any of these guys out of work; in fact, I see it building their importance to the business and the need for their skills. Sure, rack-and-stack and some hardware-specialised guys would hurt initially, but these are by far the minority, and with their experience they would have a far deeper understanding of the cloud than most.

So out of those conversations, I thought it was worth discussing the skills that I talked about with both these individuals and their management; I think these are key to the adoption and successful implementation of private, hybrid or public cloud infrastructures. Having the right people at the right time is a key goal for most managers and businesses. However, I don’t believe that cloud skill sets stand entirely on their own; building on existing staff and their skills will give you a stronger employee and longer-term engagement. I’ve also found staff that I’ve invested time and effort in have been more loyal and steadfast.

The cloud is an interesting beast, and everything I’ve seen comes down to a few key areas in which it is critical to build or maintain both your own and your employees’ skills.

Design and Architecture

    Any cloud project must be designed and architected correctly. We are seeing more and more businesses come and ask how to get out of, or manage, their sprawling, uncontrolled and out-of-hand cloud adoption. When 4 different departments go to different cloud providers, swipe their credit cards and start to do stuff, it’s a recipe for disaster. I.T. needs to get in front of the business and talk about cloud services: what they mean and what they can and can’t do. Get involved with and engage business stakeholders who might need cloud; don’t let them go out and ask others, make them want to come to you and talk to you. This allows you to direct and control cloud adoption, and to be closely involved in the design and architecture of a cloud solution in your organisation, rather than being told 6 months down the track that your contracts department has been using a cloud storage provider to host files and a webserver, and it’s offline for some reason.

Backup / Recovery / High Availability / Resiliency / Disaster Recovery / Business Continuity Planning

    Ok, this area is a big bucket and I could have split it out; however, in general no business can run successfully if its services aren’t available. What this means is that staff need to be completely across what their cloud provider offers to meet their needs for backup, HA, DR and BCP. When you boil down a cloud environment, it is still compute, RAM and storage sitting in a data centre connected to the internet somewhere in the world. It can still suffer power outages, link failures, hardware failure, flood, fire and other natural disasters. Keep across what is available from the provider and ask the questions: how do you back up our data, where do you store it, how accessible is it, how do we get it during a disaster, what are your SLAs for recovery, what is the RPO and RTO of this service? And so on.

Automation

    While I’ve seen cloud providers offer solutions with almost zero automation, the best cloud providers not only offer automation, it’s built into the very core of their offering. Cloud is all about self-service, self-provisioning and the trust that when you ask for another virtual machine, extra CPUs, more RAM, or try to expand a disk, it will just work. Encourage your staff to build their awareness and understanding of these tools and systems, and to build the review and automation of repetitive and mundane tasks into their daily work. Why do something manually when you can automate it? When dealing with the cloud, automation is key; get across it early and as completely as possible.

Security / User Access Control / Compliance / Auditing

    Security and compliance questions are probably up there with the most common, and easily the most complex, questions that I’ve had in regards to the cloud: everything from whether we are allowed to do something, to what happens if data is lost or, in a lot of cases more importantly, compromised. The cloud is based on systems built on platforms designed (and sometimes not designed) for multi-tenancy. This means that as a customer of a cloud service you are sharing your CPU, RAM, storage, network and virtualisation platform with hundreds and possibly thousands of the provider’s other customers. If something goes wrong and there is a breach of security, or just human error, your data or someone else’s data could end up being accessible to, or impacted by, others. Knowing how to respond to these events is a key skill.

Cross functional / Inter silo / cross divisional

    Cloud engagements require interaction and communication between the IT silos of responsibility to a level rarely seen before; normally a server administrator couldn’t add new disk, CPU or network without engaging and working with the storage, virtualisation or network areas of the IT department. This can cause challenges in almost any organisation, so having a person or group who specialises in engaging the disparate areas and working both with established or new processes and within the change management process is critical to success.

 

There are so many areas for your staff to skill up in, and new responsibilities to take on; it should be seen as a good thing and an advancement. Weighed against the loss of managing and owning the infrastructure or a particular area, it is something that really can be managed with your staff both proactively and positively.

 

I hope this helps and shows that there is a huge amount of complexity in adopting the cloud and in managing your staff and their experience and expectations of the whole endeavour.


EMC VNX2 SAN Storage

Recently EMC released their upgraded storage platform, the VNX2 or Next Generation VNX. There are actually quite a few changes to the hardware and capabilities of this platform that are really impressive and worth a look. There is also a lot of marketing spiel, with numbers that will need to be proven in our real-world environments to be believed.

Let’s start where it began to get a good understanding of where it’s going and to have something to measure the improvements against.

The EMC VNX Series SAN has been a pretty damn reliable and solid platform that my customers have been very happy with over the last several years, and EMC has some impressive numbers around how many they’ve sold:

  • More than 3,600 PB shipped
  • Over 500 customers purchasing more than a PB of storage
  • Over 71,000 units shipped
  • SSDs have gained traction in the enterprise, with over 200,000 SSDs shipped; this boils down to over 60% of VNX Series arrays shipping with at least some flash

All very impressive and substantial numbers. I have been involved in the architecture, design and deployment of about 30 VNX Series SANs, from the VNXe 3300 to the higher end of the scale. In all cases I’ve found that building an understanding of the customer’s requirements, and then ensuring they are met by the recommended solution, is key. Storage isn’t just about TBs and drives anymore, and virtualisation changed so many aspects of how the data centre is designed and operated that in essence the storage array can be thought of as the foundation of the data centre. If that foundation doesn’t meet requirements, then everything that sits on top of it will be impacted: the hypervisors, operating systems, applications, business services, and so on. Once these are impacted, blame gets thrown around to the network, wireless, internet and other IT services.

Today’s Storage Area Network Arrays have to support a large number of physical hosts, an even larger number of Virtual machines and possibly thousands of applications, with workloads and demands that can fluctuate and change significantly with no warning. It will be interesting to see how EMC’s Next Generation VNX series adapts to and handles this workload.

EMC, like a lot of vendors, has adopted flash as the key to the future of storage design. This works to ensure that the focus is on optimising for both performance and cost. EMC is leveraging their investment in the FAST suite of technologies from the original VNX, and this is a great place to start, as it’s proven that a very small amount of flash can serve a very high percentage of the overall IOPS in typical workloads. Hybrid arrays are a great balance.

As with all things new and shiny and ‘Next Generation’, almost everything is faster: faster CPUs with more cores, more RAM, better I/O. It all works together to get things done quicker. The new VNXs scale up to over 1 million IOPS and up to 3 PB, and can do 200,000 IOPS in a 3U package. All in all, pretty damn impressive!

I’ll detail the biggest changes in the VNX2 below:

  • Vault drives now use 300 GB per disk for the operating system, e.g. if you put four 300 GB drives in as the vault, you will not be able to provision anything on them.
  • System cache is now one pool; there is no assigning read cache/write cache or watermarks, and the page size is set to 8 KB. Cache allocated for writes is still mirrored to the other service processor.
  • FAST Cache disks can be used for FAST VP, but FAST VP disks cannot be used as FAST Cache.
  • FAST Cache will now promote single-page read requests until it hits 80% capacity, then it will defer to three-read promotion.
  • Hot spares are now managed by the system; you don’t manually assign disks as hot spares, you apply a policy to a type of disk and the system will automatically select hot spares based on that policy.
  • Drive mobility: you can move disks between any DAE or slot in the system and they will still be members of the same RAID group/pool. This allows you to move DAEs between buses if you are doing an array expansion. Also, if a disk reports a failure, you can’t remove it and re-add it; the system won’t allow the disk to be re-used.
  • Permanent sparing: when an array rebuilds to an assigned hot spare, that disk becomes a permanent replacement.
  • Rebuilding RAID groups now uses write logging to LUN metadata. When a disk fails or goes offline, writes are logged like a journal to the RAID group, so when a rebuild takes place less time is required, as parity is not being calculated to rebuild.
  • Symmetrical LUN access (replacing ALUA) is only available on classic/traditional LUNs with the initial VNX2 release; it is not available on storage pool LUNs, and the host has to have updated native multipathing or PowerPath software to support it.
  • By default, LUNs in storage pools will now be provisioned thin.
  • The array now supports block-level deduplication:
    • Deduplication is scheduled, rather than in-line.
    • Chunk sizes are 8 KB, so host filesystem cluster sizes should be sized accordingly.
    • A storage pool can have one deduplication container, but the storage pool can contain mixed thin/thick LUNs and dedupe volumes.
    • A dedupe volume is tied to a single service processor, so when creating multiple dedupe pools they should be balanced between processors.
    • The dedupe container within a pool is basically a private LUN containing every chunk from every deduped LUN in it. FAST Cache and FAST VP tiering policies are applied to the whole dedupe container, not individual deduped LUNs, so this needs design consideration when mixing fast/slow LUNs.

VNX2 Data Movers now support SMB 3.0. This may not seem like a big thing, except that Windows Server 2012 Hyper-V allows you to use SMB 3.0 to provision shared storage for VMs on CIFS rather than needing block shared storage, so for a Hyper-V setup you can look at hosting your VMs through CIFS folders via VDMs rather than via FC or iSCSI.


Decommissioning old Blog and migrating to WordPress

So after playing with about 5 different platforms and screwing around with Office 365, I’ve decided to settle on WordPress to host my blog and posts. I’m in the process of migrating the worthwhile posts and cleaning up all the old redundant stuff.


What is Infrastructure Architecture?

I will be the first to say that it’s a very broad role title. I’ve asked a number of people what it is, and every one of them gave me a different answer.

Adding complexity to what is already a complex area, the growth in cloud infrastructure has changed, and will continue to change, the role of the infrastructure architect. I’ve heard this referred to by a colleague as a “virtual architect”, a comment that I thought was very fitting.

Architecting a data centre deployment today is a very different task than it was even 3 years ago, let alone 5 to 10 years ago. Virtualisation came along and turned everything on its head, changing an entire profession in a staggeringly short period of time, and now cloud is coming along and could do the same thing for many people and organisations. Careful and accurate sizing, location, power consumption, rack location, connection type, etc. are now unnecessary for applications in a data centre where you are consuming IaaS or SaaS offerings. Even in your own data centre or infrastructure, you may be able to increase your compute or memory capacity by adding blades to an existing blade chassis without any additional rack space consumed.

In today’s infrastructure it’s not at all uncommon for an infrastructure architect to build something that runs entirely on a virtual infrastructure and doesn’t consume or require the addition of any new hardware. Recently I architected and deployed an infrastructure consisting of firewalls, routers, switching, load balancers, multi-tiered applications, domain services and client access, all without touching a physical piece of kit or even walking into the data centre. I believe this will continue to become more common.

So what is an Infrastructure Architect?

An infrastructure architect is the person (or people) who takes the requirements and constraints defined by the business, collaborates with the key stakeholders and staff, and designs the supporting environment for the solution. Infrastructure architects will normally work very closely with enterprise and solution architects to architect an infrastructure that will support the solution the solution architect puts forward. They work at a high level, with the functional and non-functional requirements set by the enterprise and solution architects.

I always try to keep it clearly defined and differentiated that architecture is not engineering or delivery. When doing engineering I am highly focussed on the how of a solution: how to deliver it, how to power it, how to configure it, how to set up a storage pool or resource cluster, how to connect x to y, and so on.

Architecture can be seen as the philosophy that underlies any system.  It defines the purpose, intent, and structure of a system.  Architecture is the discipline of addressing business needs and requirements with people, process, and technology.  I also try and maintain awareness of what domain I’m working in, There are various perspectives or kinds of architecture, including enterprise architecture, business architecture, data architecture and application architecture, all of these are very different and yet very similar to infrastructure architecture.