Service Assurance in the Cloud

In my last article, I spoke of seeking advice and almost portrayed a level of caution when approaching cloud. The reality is that we all want what the cloud promises: an easier and more profitable platform for delivering services at reduced capital and operating expense. We cannot cower in a corner, yet we also cannot sprint into the fray!

Service Assurance (SA) enables a balance between the two extremes. By using your application and network performance management (APM & NPM) tool sets, you can gain broad insight both outside and inside your cloud while reducing the overall risk. These tool sets have evolved with customer demand to provide service visibility from your customer’s front door to your cloud provider’s environment. Combining methodology from taps, active tests, and remote devices, I will show you a full-blown strategy for your cloud monitoring.

Private, Public, Hybrid, SaaS, Stratus, Cumulonimbus

Prior to implementing service assurance for your cloud, you need to understand the available strategies so you can map the best fit to your needs. However, the options are defined by the cloud architecture your organization has chosen. So let's take a moment and consider each option.

Private Cloud

Private clouds have classically been defined as a cloud architecture built within your own data center. As technology evolved, we saw the adoption of Virtual Private Clouds (VPCs), whereby a logical portion of the public cloud is segmented for your corporation’s private use. Examples include, but are not limited to, Amazon’s VPC, Microsoft Azure, and vCloud Express by VMware. In practice, VPCs are now being called Private Cloud. Good, bad, or ugly, any further mention of Private Cloud in this post should be interpreted as VPC.

It seems that the infrastructure costs incurred by developing a true private cloud (cloud technology inside your own data center) have led to the definition shifting toward VPC. VPCs are exceptionally cheap and lend themselves well to “shadow IT”, thus driving the adoption rate.

Private Cloud Visibility Strategy

Physical Appliance Deployment

Some VPCs permit you to install traditional APM/NPM solutions inside their facilities at a cost. Typically, the cost for a single data point, over a three-year depreciation, could easily refresh your entire enterprise-wide APM/NPM environment multiple times over.

Because of the cost, this strategy usually doesn’t get approved by the finance department and becomes the sacrificial goat of the design. Many teams will throw their hands up and say “I tried to monitor it, but you wouldn’t pay for it!“. There are alternatives below so you don’t have to be “that guy“. 

Agent Based Deployment 

I can imagine readers falling out of chairs, spitting out their coffee and grabbing pitchforks because I just wrote ‘Agent’. Hopefully, you’re still reading and I can convince you to put the pitchfork down.

APM/NPM vendors see the need to expand their offerings into the cloud. The issue is that not many providers permit physical appliance deployments, in an effort to preserve their other customers’ privacy. Thus, agents become the tool of the trade in the cloud space. Some VPCs offer what equates to ‘enhanced syslog’ to provide performance data. At the end of the day, it is better than nothing, but it is still lacking detail. What if you have a cloud offering dispersed across multiple vendors, with only one providing this ‘enhanced syslog’? Luckily, your APM solution is there to provide identical metrics across your various providers.

So now that you have put the pitchfork down, let's try to put it away. By utilizing an agent within the VPC, we now have options, because some vendors offer the capability to essentially turn this ‘agent’ into a remote span. The agent sends data back to your data centers, where it feeds your traditional APM/NPM tool set to glean performance metrics.

But hold on! You typically pay for every ‘bit’ that gets transmitted externally from the VPC, you say!?! That’s right, many VPCs charge based on usage, which includes bandwidth leaving the cloud. This charge immediately takes us into our next step. ‘Spanning’ the monitored traffic consumes large amounts of bandwidth, and an agent in the cloud that simply spits out metadata consumes bandwidth too. What should we do? Consult your vendor, and ask the question: Is your solution going to drive my cloud costs up due to increased bandwidth utilization?
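To put rough numbers on that question before you call the vendor, a back-of-the-envelope estimate helps. The Python sketch below converts a sustained traffic rate into monthly egress volume and cost. The price per GB and both traffic rates are placeholder assumptions for illustration, not any provider's actual pricing or any vendor's measured overhead, so substitute figures from your own bill and your own vendor.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def monthly_egress_gb(avg_mbps: float) -> float:
    """Convert a sustained average rate (megabits/s) to gigabytes per month."""
    megabits = avg_mbps * SECONDS_PER_MONTH
    return megabits / 8 / 1000  # megabits -> megabytes -> gigabytes (decimal)


def monthly_egress_cost(avg_mbps: float, price_per_gb: float) -> float:
    """Estimate the monthly egress charge for traffic leaving the cloud."""
    return monthly_egress_gb(avg_mbps) * price_per_gb


if __name__ == "__main__":
    price_per_gb = 0.10  # hypothetical price, replace with your provider's rate
    # Hypothetical sustained rates for the two agent modes discussed above.
    scenarios = [("remote span", 50.0), ("metadata only", 0.5)]
    for label, mbps in scenarios:
        gb = monthly_egress_gb(mbps)
        cost = monthly_egress_cost(mbps, price_per_gb)
        print(f"{label}: {gb:,.0f} GB/month, about ${cost:,.2f}/month")
```

Whatever numbers you plug in, the point is to have the vendor tell you the sustained rate their agent generates so you can run this arithmetic yourself.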

Now at this point, we need to talk scalability. Not all vendors offer it, so be mindful. Select vendors can scale their solution so that you place an APM/NPM server in the cloud itself. All agents inside the cloud then transmit their data to that cloud-based server, so the collection process does not incur additional bandwidth costs. BOOM! Now we are saving OPEX while serving up service assurance functionality! Bandwidth is conserved, and because the solution scales from on-premises applications to cloud-based ones, you have the same style of metrics across the board.

Because you can combine these cloud-based metrics with your existing edge/data center APM and NPM deployment, you paint a full picture of the performance of your VPC-delivered service. Where this agent falls short is in proactively alerting you when no user traffic is present. For example, assume your service is used 8am-5pm, and an issue occurs at 6:30am. As your agent and APM/NPM solution rely upon active user traffic, no issue is detected. We can solve this issue further down, so keep reading!

So as you look at protecting and preserving ROI and services hosted in the virtual private cloud, remember to look at all options on the table, meaning a physical appliance or an agent deployment. Remember to weigh cost and scalability in the overall solution value. In the next section, we will address active testing.

Active Testing Deployment

I know what you are thinking: first agents, and now active testing? I can see you loyal readers turning into the Governor from season 4 of AMC’s The Walking Dead, more than willing to burn down my Woodbury. For those unfamiliar with AMC’s The Walking Dead, look here. Season 7 starts 10/23/16!

For those unfamiliar with active testing, you may know it by its other name, “Synthetic Agents”. The network engineers reading this will likely think ‘IP SLA’. Active tests alone aren’t overly useful. They tell you there is a performance issue, but stop short of triaging the problem. In my humble opinion, triage is the most important aspect of service assurance.

Active testing does provide constant user traffic, albeit synthetic. It also removes variables such as the local user's workstation, browser, and so on. If you have a large, geographically dispersed environment with independent internet POPs, then synthetic testing provides a cheap alternative for widespread APM. End User Response Time (EURT) is good information, even without triage functionality.

Placing synthetic agents in your remote offices gives you the ability to measure and report back response times. For larger sites, we are able to place traditional APM/NPM deployments where it makes sense, which enables triage. The added benefit of the synthetic agent is that these tests can run 24x7x365. Should there be a bump in the night, teams can start investigating before the end customer feels the impact.
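For readers who have never seen one, a synthetic test can be as simple as timing a request and comparing it to a threshold. The Python sketch below is a minimal illustration using only the standard library, not any vendor's agent. The function names, the result fields, and the idea of passing in a `fetch` function (so the probe can be exercised without a network) are my own assumptions for the example.

```python
import time
import urllib.request
from typing import Callable, Dict


def http_get(url: str, timeout: float = 10.0) -> int:
    """Fetch a URL, read the full body, and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
        return response.status


def run_probe(url: str, threshold_s: float,
              fetch: Callable[[str], int] = http_get) -> Dict[str, object]:
    """Time one synthetic request and flag it if it fails or is too slow."""
    start = time.perf_counter()
    status = None
    error = None
    try:
        status = fetch(url)
    except Exception as exc:  # timeouts, DNS failures, HTTP 4xx/5xx errors
        error = str(exc)
    elapsed = time.perf_counter() - start
    ok = error is None and status == 200 and elapsed <= threshold_s
    return {"url": url, "status": status, "elapsed_s": elapsed,
            "ok": ok, "error": error}
```

Run something like `run_probe(url, threshold_s=2.0)` on a schedule from each remote office, and the 6:30am outage shows up even though no real user has logged in yet. Note that this only tells you a problem exists; it does nothing to triage it.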

As we look at active agents, we gain the cloud-based agent's visibility and ease of deployment but lose all the advanced triage necessary for cloud-based deployments. We cannot overlook the advantages active tests bring to the table, but we must deploy them in a smart manner. Our next section looks at this exact topic.

Public Cloud

I define public clouds as provider infrastructure shared among numerous entities. Each entity has its own systems, which can be exposed to the internet, even though the entities are separated from one another.

Personally, I run my own private websites in Amazon’s ECS cloud offering, among other services with a few smaller players. What I learned is how easy it is to deploy a service at such a low cost and with such a small learning curve. That very ease of deployment is what I believe drives adoption. No change controls, no design discussions, just a few clicks and I have servers deployed with IPs. Boom! Done!

Public Cloud Visibility Strategy

Physical Appliance Deployment

I have yet to even hear rumors of public clouds permitting the deployment of physical appliances. The reason is undoubtedly to protect the privacy of customers from one another.

Agent Based Deployment

Now everyone should have warm, fuzzy feelings about agent-based deployments, since we covered this once with private cloud. As in the private cloud scenario, we leverage a lightweight agent to generate metadata and offload that information to a server in the cloud. By leveraging the solution’s scalability, we unite metrics from our data center edge with metrics generated from within the cloud. At this point, each environment can provide relevant and consistent metrics.

What must be considered now is which problem we are really attempting to solve. We could easily deploy agents across each and every server host, but the question is, “do we need to?” What is the criticality of this service to the business? What if this application goes down? How does an issue impact my internal business customer?

Active Testing and SaaS Deployment

Active testing has no better deployment model than with public cloud offerings! Many folks believe active tests fall inside a very small and defined box. In reality, these active tests can be run from ANYWHERE!

Wondering where else we could run these tests? Our data center? We already do it. From our backup data center? Yup, been there, done that, got the t-shirt! Your customer’s location? Whaaaa!?!

That’s right! While it’s important to understand how applications run from your data center, there is no better place to test than from an external customer’s location. Imagine detecting a problem with your cloud-based application before your customer even arrives at work! If you are running active agents at your customer’s location and adding the previously mentioned agents for triage, then, my friend, you have a solution!
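Once the same test runs from several places, simply comparing the results gives you a first hint about where to look. The Python sketch below is a rough illustration of that comparison and not a substitute for real triage. The vantage point names, the single shared baseline, and the wording of the verdicts are all assumptions I made up for the example.

```python
from typing import Dict


def compare_vantage_points(response_times_s: Dict[str, float],
                           baseline_s: float) -> str:
    """Summarize which vantage points saw response times above the baseline."""
    slow = sorted(name for name, seconds in response_times_s.items()
                  if seconds > baseline_s)
    if not slow:
        return "all vantage points within baseline"
    if len(slow) == len(response_times_s):
        # Every location is slow, so the service side is the first suspect.
        return "slow from every vantage point: look at the cloud service first"
    # Only some locations are slow, so start with their paths to the cloud.
    return "slow only from: " + ", ".join(slow)


if __name__ == "__main__":
    # Hypothetical readings (seconds) from three test locations.
    readings = {"data center": 0.4, "backup data center": 0.5,
                "customer site": 3.2}
    print(compare_vantage_points(readings, baseline_s=1.0))
```

If only the customer site is slow, you start with that site's path to the cloud rather than the application itself, and the triage agents described earlier take it from there.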

These active agents could then be pivoted to also test “Software as a Service” (SaaS) offerings such as Office 365, Salesforce.com, etc. The beauty is that these active agents can exist as a hardware or software deployment! So, if you install an agent on a mobile device connected to corporate Wi-Fi, you effectively have wireless testing!

With the proliferation of handheld tablets used for wireless troubleshooting in the marketplace, it is plausible to add this active agent to them as well! Imagine the power of being able to triage the wireless environment while simultaneously troubleshooting a SaaS/cloud-based app from a single device! That is true ROI, my friend!

Bringing it all Together

In today’s complex business environment, the reality is that your organization is not simply looking at one of these cloud based offerings. Likely, individuals in many different departments, across your organization, even outside of IT, are looking at cloud. As such, you must evaluate the playing field of vendors and experts.

Ask questions like these. Who has a solution that scales? Who can combine metrics from both cloud and on-premises environments (read: hybrid)? Who can triage wired and wireless offerings? It is the tough questions like these that will ensure the ROI for your cloud-based offering.

I hope you enjoyed this posting and leave commentary below!
