Office365 Ate My Cloud Homework ?? (Part 2)

Problem to Solve – My company transitioned to a Cloud-based Office365 deployment. How can we assure the application service is working properly if it is not in our Data Center?

In the past year, many customers have asked us this very same question: “We are rolling out Office365 and are concerned about how to address any potential performance issues.” Cloud or no Cloud, your end users just want their “stuff to work”, period. As I said in the original Part 1 article, they are not going to accept excuses like “it runs in the cloud”, which sounds an awful lot like “the dog ate my homework” from your grade school days.

Don’t Care for the Cute Excuses option???

Then consider putting in a solution that will help answer application performance challenges. This post continues the list started in the previous Part 1 article:

https://problemsolverblog.czekaj.org/troubleshooting/office365-ate-cloud-performance-homework-part-1/

5) SNI Support for Categorizing HTTPS based Applications – Server Name Indication (SNI), an extension to the TLS protocol, indicates which hostname the client is attempting to connect to as the handshake begins. Your APM/NPM solution should support this functionality so you can easily identify and properly categorize HTTPS/443 based applications. On the server side, SNI lets a server present multiple certificates on the same IP address and port number, which allows multiple secure (HTTPS) websites (or any other service over TLS) to be served from the same IP address without requiring all those sites to use the same certificate. So, for clients and servers that support SNI, a single IP address can serve a group of domain names for which it is impractical to get a common certificate.

Why is this Valuable? Using the SNI field makes HTTPS applications recognizable to the APM/NPM solution by hostname. The hostname is normally sent unencrypted in the client's first handshake message, so this works without ever decrypting or exposing data inside these sessions. It is especially valuable for individual Office365 product URLs whose IP addresses keep changing. Each application can then be viewed and reported on individually (top talkers, all talkers, response time information, packet capture) with an easy method of administration.
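To make the SNI idea concrete, here is a minimal sketch of how a monitoring tool could pull the hostname out of a raw TLS ClientHello and map it to an application label. It is an illustration only, not how any particular APM/NPM product does it. The function names, the `build_client_hello` test helper, and the hostname-to-application table are all assumptions made up for this example, and the parser only handles a single, unfragmented ClientHello record.

```python
import struct
from typing import Optional

# Illustrative mapping only (an assumption for this sketch, not an official list).
APP_BY_SUFFIX = {
    "outlook.office365.com": "Exchange Online",
    "sharepoint.com": "SharePoint Online",
}


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI hostname from one raw TLS ClientHello record, or None."""
    try:
        # Byte 0: record type 0x16 (handshake). Byte 5: message type 0x01 (ClientHello).
        if record[0] != 0x16 or record[5] != 0x01:
            return None
        pos = 5 + 4                                    # record header + handshake header
        pos += 2 + 32                                  # client_version + random
        pos += 1 + record[pos]                         # session_id
        (suites_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + suites_len                          # cipher_suites
        pos += 1 + record[pos]                         # compression_methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = min(pos + ext_total, len(record))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:                          # server_name extension
                # Layout: list length (2), name type (1), name length (2), name.
                name_type = record[pos + 2]
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                if name_type != 0:                     # 0 = host_name
                    return None
                return record[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
        return None
    except (IndexError, struct.error, UnicodeDecodeError):
        return None                                    # truncated or malformed input


def categorize(hostname: str) -> str:
    """Map a hostname to an application label by exact or suffix match."""
    for suffix, app in APP_BY_SUFFIX.items():
        if hostname == suffix or hostname.endswith("." + suffix):
            return app
    return "Other HTTPS"


def build_client_hello(hostname: str) -> bytes:
    """Build a synthetic, minimal ClientHello carrying only an SNI extension (test helper)."""
    name = hostname.encode("ascii")
    entry = b"\x00" + struct.pack("!H", len(name)) + name
    ext_data = struct.pack("!H", len(entry)) + entry
    ext = struct.pack("!HH", 0, len(ext_data)) + ext_data
    body = (
        b"\x03\x03" + bytes(32) + b"\x00"              # version, random, empty session_id
        + struct.pack("!H", 2) + b"\x13\x01"           # one cipher suite
        + b"\x01\x00"                                  # null compression only
        + struct.pack("!H", len(ext)) + ext
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


if __name__ == "__main__":
    hello = build_client_hello("acme.sharepoint.com")
    host = extract_sni(hello)
    print(host, "->", categorize(host or ""))
```

Notice that nothing here touches the encrypted payload. The categorization relies only on the hostname the client announces, which is why it keeps working when the underlying IP addresses change.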

6) Locate HTTP / HTTPS Status Codes – Depending on your actual Office365 deployment model, you may need the ability not only to define applications by their individual URL, but also to look deep into the application to find failures. Because Office365 is a URL-based application accessed through a browser, this is important functionality. If a URL is not available to service the user request, the server will respond with an error code (e.g., 404 – Not Found). There are various reasons why a server might respond with error codes, but the more important focus is to know about it so it can be solved. For example, if an Office365 user at your location were unable to access “MS Word” via Office365 because of a browser error “404 not found”, this would most likely result in a call to Microsoft to address the issue. A specific example could be http://my365.acmecompany.com as a front-end portal for accessing Office365.

Why is this Valuable? When a user experiences a service issue accessing the Office365 suite or complains about response times, it is in your best interest to see that the issue gets resolved quickly. A necessary step in restoring service is triaging “where” failures and issues are occurring. Something as specific as an Office365 “404 – Not Found” message would be an indicator that the URL set up by Microsoft for ACME company was not available on the MICROSOFT side of the equation. Your solution should identify HTTP / HTTPS (if certificates are available) failures by their error code, which will reduce the time it takes to make this assessment and get to probable cause.
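As a rough sketch of what “identify failures by their error code” can look like, the snippet below tallies error responses per URL from a list of observed (URL, status code) pairs. The function names and the sample observations are invented for illustration, and a real APM/NPM solution would collect these from the wire rather than from a hand-written list. The reason phrases come from Python's standard `http.HTTPStatus` enum.

```python
from collections import Counter
from http import HTTPStatus
from typing import Iterable, Tuple


def describe(status: int) -> str:
    """Return a code with its standard reason phrase, e.g. '404 Not Found'."""
    try:
        return f"{status} {HTTPStatus(status).phrase}"
    except ValueError:
        return f"{status} Unknown"      # code not defined in the standard enum


def summarize_failures(observations: Iterable[Tuple[str, int]]) -> Counter:
    """Count responses with a 4xx or 5xx status code, keyed by (url, status)."""
    failures: Counter = Counter()
    for url, status in observations:
        if status >= 400:
            failures[(url, status)] += 1
    return failures


if __name__ == "__main__":
    # Hypothetical observations for the portal used as an example in the text.
    observed = [
        ("http://my365.acmecompany.com", 200),
        ("http://my365.acmecompany.com", 404),
        ("http://my365.acmecompany.com", 404),
        ("http://my365.acmecompany.com/word", 503),
    ]
    for (url, status), count in summarize_failures(observed).most_common():
        print(f"{count:3d}  {describe(status):28s} {url}")
```

Grouping by URL and code is the point here: a burst of one code against one URL tells you where to start looking much faster than a single overall error count does.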

7) Identify Application Response Time Issues – Most APM/NPM solutions monitor response time for individual applications. This is typically geared toward application servers located mainly in data centers. With the onset of Cloud based applications like Office365, which run over the Internet on infrastructure that cloud providers usually share among customers, there are several more variables which add to the complexity. It is simply not enough to say XYZ server is “slow” today, as your internal customers will demand acceptable response time. The real challenge is that your company will not “own the application servers”, because Microsoft is hosting the Office365 servers directly. Furthermore, other variables local to your Internet link, such as heavy bandwidth usage, DNS and authentication issues, etc., will sometimes “mask” the root cause of a response time issue. You will need the ability to monitor not only the response time of individual applications (www.acmecompany.com/word, for example) but also deeper layer metrics (TCP resets, Zero Window Size, etc.). This functionality is applicable even if the server resides in the cloud. The response time thresholds can be tailored to fit your SLAs.

Why is this Valuable? When a group of users is complaining about application performance, the key value is determining whether this is a “Microsoft related issue” or an “Internal company related issue”. If your APM/NPM solution can separate out and measure response time for individual applications and servers, it becomes very easy to decide the next step in the triage process. For example, if you were seeing multitudes of “TCP resets” for an Office365 server providing Word or Excel, this is an indicator that the server (on the Microsoft hosted side) may be experiencing issues and needs attention.
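The triage step described above can be sketched in a few lines. The example below assumes your monitoring tool can export, per flow, a network round-trip time, a server response time, and counts of TCP resets and zero-window events. The field names, the 500 ms SLA threshold, and the sample numbers are all hypothetical, and the “dominant” label is a simple heuristic to point you in a direction, not a diagnosis.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, Iterable, List


@dataclass
class FlowSample:
    app: str                    # application label, e.g. a URL
    network_rtt_ms: float       # e.g. measured from the TCP handshake
    server_response_ms: float   # request to first response byte, RTT excluded
    tcp_resets: int
    zero_windows: int


def triage(samples: Iterable[FlowSample], sla_ms: float = 500.0) -> Dict[str, dict]:
    """Summarize each application and flag which side dominates the delay."""
    by_app: Dict[str, List[FlowSample]] = {}
    for s in samples:
        by_app.setdefault(s.app, []).append(s)

    report = {}
    for app, group in by_app.items():
        rtt = median(s.network_rtt_ms for s in group)
        srv = median(s.server_response_ms for s in group)
        report[app] = {
            "total_ms": rtt + srv,
            "slow": rtt + srv > sla_ms,                  # compared against the SLA target
            "dominant": "server" if srv >= rtt else "network",
            "tcp_resets": sum(s.tcp_resets for s in group),
            "zero_windows": sum(s.zero_windows for s in group),
        }
    return report


if __name__ == "__main__":
    # Hypothetical measurements for two applications.
    samples = [
        FlowSample("www.acmecompany.com/word", 40.0, 900.0, 3, 0),
        FlowSample("www.acmecompany.com/word", 50.0, 1100.0, 5, 0),
        FlowSample("www.acmecompany.com/excel", 30.0, 100.0, 0, 0),
    ]
    for app, row in triage(samples).items():
        print(app, row)
```

Medians are used rather than averages so that one unusually slow transaction does not skew the picture. Splitting the total into a network part and a server part is what lets you separate “our Internet link” from “their server” before you pick up the phone.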

8) Service Dependency Map – Most APM/NPM solutions have the ability to discover application dependencies as they traverse the network. This is a critical function, as it is often very difficult to get accurate logical application flow diagrams from the respective application teams. This functionality will locate servers and their dependencies (i.e. other servers and other protocols) and display them in a diagram for reference. Usually, performance metrics are also provided for each dependency, with visibility into sessions and failures on a per server basis.

Why is this Valuable? The main value of viewing the Service Dependency Map is to validate that applications are working the way their respective solution was designed. For example, it can be extremely difficult to see that various application servers have dependencies on LDAP authentication or a Windows (SMB) file share. In many customer environments we have seen an application that is understood to work “one way” and talk to a discrete set of servers, only to find that it sometimes has dependencies on other servers. From the perspective of keeping the Office365 services running optimally, this is a significant “value”, as services can be validated against the original solution design.
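Validating observed dependencies against the design can be sketched as a simple set comparison. In the example below, the flow records, server names, and the “designed” map are all made up for illustration, and a real tool would discover the flows from network traffic rather than from a hard-coded list.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Dependency = Tuple[str, str]            # (destination server, protocol)
DependencyMap = Dict[str, Set[Dependency]]


def build_dependency_map(flows: Iterable[Tuple[str, str, str]]) -> DependencyMap:
    """Group observed (source, destination, protocol) flows by source server."""
    deps: DependencyMap = defaultdict(set)
    for src, dst, proto in flows:
        deps[src].add((dst, proto))
    return dict(deps)


def unexpected_dependencies(observed: DependencyMap, designed: DependencyMap) -> DependencyMap:
    """Return the observed dependencies that are missing from the design."""
    extras: DependencyMap = {}
    for src, targets in observed.items():
        diff = targets - designed.get(src, set())
        if diff:
            extras[src] = diff
    return extras


if __name__ == "__main__":
    # Hypothetical flows seen on the network.
    flows = [
        ("portal01", "ldap01", "LDAP"),
        ("portal01", "outlook.office365.com", "HTTPS"),
        ("portal01", "fileserver02", "SMB"),    # not part of the documented design
    ]
    # Hypothetical dependencies taken from the original solution design.
    designed = {
        "portal01": {("ldap01", "LDAP"), ("outlook.office365.com", "HTTPS")},
    }
    observed = build_dependency_map(flows)
    for src, extra in unexpected_dependencies(observed, designed).items():
        print(src, "has undocumented dependencies:", sorted(extra))
```

Anything this comparison returns is either a gap in the documentation or a dependency nobody planned for, and both are worth knowing about before they show up as an outage.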

Points to Ponder

Everyone seems to be deploying Office365 these days. How has your implementation gone so far?
Any juicy “war stories” or classic user conundrums to share with the class?
Found any other real world use cases for your APM/NPM solution to help in your Office365 deployment?

Problem Solver Blog

Troubleshooting Cloud, Virtualization, Service Delivery, Application, Network, Unified Communications, Cyber Security, VoIP, Video Challenges
