Problem to Solve – My company transitioned to a cloud-based Office365 deployment. How can we verify that the application service is working properly if it is not in our data center?

In the past year, we have had many customers ask us this very same question. Obviously, there can be significant cost savings in migrating from a traditional desktop deployment of Microsoft Office to a cloud-based version of Office365. But if your users (or better yet your executives) experience issues with application performance with Office365, it is always good to have a “good excuse” waiting in the wings. I am partial to the timeless classics like … “the dog ate my homework” for these types of situations. For you “Fletch” fans out there, tell your users that you will check the “Fetzer Valve” on your network for proper application performance. Or you could try a more modern approach of “Sometimes sun spots and moon phases affect computers, just like people…” as an option. 😆 😆 😆

Don’t Care for the Cute Excuses option???

Then consider putting in a solution that will help answer application performance challenges. The first decision should focus on “where” your APM/NPM instrumentation will physically reside. I covered the topic of cloud instrumentation in a previous article here: https://problemsolverblog.czekaj.org/cloud-virtualization/cloudsso/. From my experience, these are the types of functionality and requirements that I would specifically look for in this type of deployment.

1) Service Dashboard – it seems pretty straightforward, but if you can give your “non-technical” concerned parties an easy-to-read view that shows red/green status and failure percentages, you will make your life easier. Ideally, you want a solution that can measure and collect the relevant protocols, applications, servers, locations, etc. for an Office365 deployment. When you peel the onion on how Office365 operates in and out of your environment, you will find many protocols that are key to its performance. These include basic functionality from DNS, DHCP, LDAP, HTTP, and HTTPS, and some of them run inside your data center.

Why is this Valuable? The main value of a services dashboard is that it lets IT staff quickly triage the underlying applications and protocols, as well as see which regional locations are affected. The added benefit is a reduction in the time needed to triage service-impacting events, without having to be a network engineer.
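To make the red/green idea concrete, here is a minimal Python sketch of the kind of rollup a service dashboard performs. It is only an illustration: the record layout, the location names, and the 5% threshold are my own assumptions, not something a particular APM/NPM product defines.

```python
from collections import defaultdict

# Hypothetical threshold: show a protocol as red when more than 5% of
# its transactions fail. Tune this to whatever your dashboard uses.
RED_THRESHOLD_PCT = 5.0


def rollup(records):
    """Summarize (location, protocol, ok) records into failure % and status."""
    totals = defaultdict(lambda: [0, 0])  # key -> [transactions, failures]
    for location, protocol, ok in records:
        counts = totals[(location, protocol)]
        counts[0] += 1
        if not ok:
            counts[1] += 1

    summary = {}
    for key, (count, failures) in totals.items():
        pct = 100.0 * failures / count
        status = "red" if pct > RED_THRESHOLD_PCT else "green"
        summary[key] = (round(pct, 1), status)
    return summary


# Made-up sample data, purely for illustration.
sample = [
    ("Chicago", "DNS", True),
    ("Chicago", "DNS", True),
    ("Chicago", "LDAP", False),
    ("Chicago", "LDAP", True),
    ("Dallas", "HTTPS", True),
]

for (location, protocol), (pct, status) in sorted(rollup(sample).items()):
    print(f"{location:8} {protocol:6} {pct:5.1f}% failed  {status}")
```

Grouping by both location and protocol is what lets a non-network engineer see at a glance that, in this made-up data, LDAP in Chicago is the place to start.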

2) Triage of Key Service Protocols – The APM/NPM solution should be able to triage service-impacting issues. A service is defined as the protocols, applications, interfaces, and servers that comprise a particular offering like Office365. This functionality provides a view into a service for “failures” in the various application protocols. For Office365, this means that IT staff can quickly triage and troubleshoot any issue and determine whether it is specific to an application failure. As an example, Office365 leverages the following protocols and applications in order to function as a “service”:

DNS – for name resolution to www.acmecompany.com
LDAP – for authentication into the company Active Directory structure
HTTP / HTTPS – for URL Browser application access into the Office365 suite for Word, Project, PowerPoint, Lync, Excel, etc.
SharePoint – HTTP URL browser access into SharePoint locations for file sharing and collaboration

Why is this Valuable? When there is an Office365 “issue”, you will want to be able to view all of the respective Office365 applications from a single vantage point. This allows you to triage by pivoting on metrics such as application failures, slowdowns, TCP metrics indicative of server issues, affected user communities, etc. From there, IT staff can assess whether the issue is “on premise” or a potential cloud provider issue at Microsoft.
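The “on premise versus cloud” decision can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions: the observation fields (`protocol`, `side`, `failures`) are hypothetical, and the rule of checking on-premise dependencies first is a rule of thumb, not how any specific tool decides.

```python
def triage(observations):
    """Suggest where to look first, given per-protocol failure counts.

    Each observation is a dict with hypothetical keys:
      'protocol' - e.g. "DNS", "LDAP", "HTTPS"
      'side'     - "on_premise" or "cloud" (where the server lives)
      'failures' - number of failed transactions observed
    """
    failing = {"on_premise": [], "cloud": []}
    for obs in observations:
        if obs["failures"] > 0:
            failing[obs["side"]].append(obs["protocol"])

    # Assumption: on-premise dependencies (DNS, LDAP) sit in front of
    # the cloud service, so rule them out before calling the provider.
    if failing["on_premise"]:
        return "check on premise first: " + ", ".join(failing["on_premise"])
    if failing["cloud"]:
        return "possible cloud provider issue: " + ", ".join(failing["cloud"])
    return "no failures observed"


print(triage([
    {"protocol": "DNS", "side": "on_premise", "failures": 0},
    {"protocol": "LDAP", "side": "on_premise", "failures": 12},
    {"protocol": "HTTPS", "side": "cloud", "failures": 3},
]))
```

In this made-up case the sketch points at LDAP first, since an authentication problem inside the data center would also explain failures further downstream.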

3) Deep Dive view into DNS Performance – While it is recommended to monitor your company’s DNS servers as a whole, Office365 has specific DNS entries that would be service impacting if they became unavailable or slow. For that reason, the DNS entries for each Office365 URL and/or server should be individually monitored for error codes and response time issues. Specific examples might include:

*.microsoftonline.com
My365.acmecompany.com
*.office365.com
*.office.com
Portal.Office.com

Why is this Valuable? When there is an Office365 “specific DNS issue” (i.e., a “Name Failure”), users will not be able to access the application via their browser. The key to restoring the user’s access is to quickly determine whether the issue is local to your internal DNS structure, your Internet DNS links, a potential denial of service, or a Microsoft cloud issue. Your APM/NPM solution should identify DNS failures by their error code, which will reduce the time it takes to make this assessment and get to probable cause.
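As a rough stand-in for what a monitoring tool does here, the following Python sketch times name resolution for a list of hostnames and records any failure. Some assumptions to flag: it uses the operating system resolver through `socket.getaddrinfo`, so it reports resolver errors rather than the raw DNS response codes an APM/NPM tool would see on the wire, and cached answers will look fast. The 200 ms “slow” threshold and the function name `check_dns` are my own choices.

```python
import socket
import time


def check_dns(names, resolver=socket.getaddrinfo, slow_ms=200.0):
    """Resolve each name and report (name, outcome, elapsed_ms).

    'resolver' is injectable so the check can be exercised without a
    network; by default it is the OS resolver via socket.getaddrinfo.
    """
    results = []
    for name in names:
        start = time.perf_counter()
        try:
            resolver(name, 443)
            outcome = "ok"
        except socket.gaierror as err:
            # The name could not be resolved (for example, no such name).
            outcome = f"failed: {err}"
        elapsed_ms = (time.perf_counter() - start) * 1000.0

        if outcome == "ok" and elapsed_ms > slow_ms:
            outcome = "slow"
        results.append((name, outcome, round(elapsed_ms, 1)))
    return results
```

Wildcard entries such as *.office365.com cannot be resolved directly, so you would pass concrete hostnames that fall under them, for example `check_dns(["portal.office.com", "my365.acmecompany.com"])`. Running the same list from more than one location helps separate an internal DNS problem from one further out.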

4) Deep Dive view into LDAP Authentication Performance – Your APM/NPM solution should be able to take a deep dive look into LDAP and RADIUS for network authentication. For many cloud-based applications, there is an interface or “shim”-like protocol that interfaces with your company’s authentication servers (i.e., ADFS). Office365 can also be deployed in this manner, where it interfaces directly with your company’s authentication servers. The underlying protocol for that authentication is LDAP. When there are LDAP failures or response time issues, users cannot authenticate into the Office365 application.

Why is this Valuable? When there is an Office365 “specific authentication” error (i.e., an “LDAP Error 16 – Requested attribute does not exist”), users will not be able to authenticate properly into the application. The key to restoring the user’s authentication and access is to quickly determine whether the issue is local to the internal authentication structure itself or lies in how Office365 interfaces with the authentication infrastructure. Your APM/NPM solution should be able to identify authentication-related service-impacting events.
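To show what “identify by error code” can look like for LDAP, here is a small Python sketch that turns a list of observed LDAP result codes into a readable summary. The code names come from the LDAP specification (RFC 4511), but the table is deliberately partial, and the split between “directory server” and “request or account” problems is my own rough grouping for illustration.

```python
from collections import Counter

# A few LDAP result codes from RFC 4511 (not the full list).
LDAP_RESULT_CODES = {
    0: "success",
    16: "noSuchAttribute",
    32: "noSuchObject",
    49: "invalidCredentials",
    50: "insufficientAccessRights",
    51: "busy",
    52: "unavailable",
}

# Hypothetical grouping: codes suggesting the directory itself is in trouble.
SERVER_SIDE = {51, 52}


def summarize(codes):
    """Return (code, name, count, scope) for each non-success code,
    most frequent first."""
    report = []
    for code, count in Counter(codes).most_common():
        if code == 0:
            continue  # successes are not interesting for triage
        name = LDAP_RESULT_CODES.get(code, "unknown")
        scope = "directory server" if code in SERVER_SIDE else "request or account"
        report.append((code, name, count, scope))
    return report


# Made-up sample of result codes, as a tool might collect them.
for row in summarize([0, 0, 49, 49, 49, 16, 52]):
    print(row)
```

A burst of code 52 points toward the authentication servers themselves, while a run of code 49 or 16 looks more like a credential or integration problem on the requesting side.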

Continue on to Part 2 of this Article …..

https://problemsolverblog.czekaj.org/troubleshooting/office365-ate-my-cloud-homework-part-2/

Reference to Microsoft TechNet article for Office365 URLs and IP Address Ranges

https://technet.microsoft.com/en-us/library/hh373144.aspx

Points to Ponder

Everyone seems to be deploying Office365 these days. How has your implementation gone so far?
Any juicy “war stories” or classic user conundrums to share with the class?
Found any other real-world use cases for your APM/NPM solution to help in your Office365 deployment?

Comments

Office365 Ate My Cloud Homework ?? (Part 1) — 5 Comments

brian philips on February 16, 2015 at 1:02 pm said:

This is really good and relevant information. As we all become more dependent on SaaS applications it is hard to know who to call about your problem and what to say if you do get a hold of someone.
- Ken Czekaj on February 16, 2015 at 2:22 pm said:
  
  Appreciate the feedback, thanks. I agree with you, while cloud and SaaS based applications can be great application models, they can be quite complex to triage and troubleshoot performance issues. The biggest challenge that I have seen at my customers is really the notion of …”Is this a problem being caused on my side, the Internet service provider, or at my Cloud application?”
KarlSchaub on February 19, 2015 at 4:45 pm said:

Nice job and very well written.
- Ken Czekaj on February 19, 2015 at 7:51 pm said:
  
  Thanks Karl, appreciate the feedback.