Applying APM to Healthcare: A PCPM Odyssey (Part 3)

In the first article in this series, we reviewed how NPM statistics provide details on the operation and performance of internal network super highways. The second article introduced the role of service enablers and how their poor performance can drastically impact application performance. This article focuses on flow-based, APM-level metrics that are available in traffic flow data.

There are two primary methods for this type of data collection: the first is to use synthetic agents, and the other is to collect traffic flow data. Synthetic agents have their place in the enterprise, and I have recommended them to clients to address certain use cases. However, if the only source of data is a synthetic agent, and the agent has alerted on a condition, then what? What do you do now? Is this alert showing me that a single transaction has gone “off the rails”, or do I have an enterprise-wide issue? Quite simply, this is very difficult to determine. I believe that if you combine the two methods (agents and traffic flow), then you have a real-world solution. In my opinion, synthetic transactions alone typically represent a product, not a total solution.
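To make the “single transaction or enterprise-wide?” question concrete, here is a minimal Python sketch of combining the two methods. It assumes a hypothetical flow record (`FlowTxn`) that tags each real-user transaction with the facility it came from and whether it failed, and it uses illustrative thresholds (`min_failure_rate`, `enterprise_share`) that you would tune for your own environment; neither the record nor the thresholds come from a specific product.

```python
from dataclasses import dataclass


@dataclass
class FlowTxn:
    # Hypothetical flow summary for one real-user transaction.
    facility: str  # site the client traffic came from
    failed: bool   # True if the transaction errored or timed out


def alert_scope(flows, min_failure_rate=0.2, enterprise_share=0.5):
    """Use real-user flow data to scope an alert raised by a synthetic agent.

    min_failure_rate: share of failed transactions that marks a facility as impacted.
    enterprise_share: share of impacted facilities that counts as enterprise-wide.
    Both thresholds are illustrative assumptions.
    """
    if not flows:
        return "no real-user traffic: rely on the synthetic result"
    counts = {}  # facility -> (total transactions, failed transactions)
    for txn in flows:
        total, failed = counts.get(txn.facility, (0, 0))
        counts[txn.facility] = (total + 1, failed + int(txn.failed))
    impacted = sorted(
        fac for fac, (total, failed) in counts.items()
        if failed / total >= min_failure_rate
    )
    if not impacted:
        # The agent saw a problem, but real users did not.
        return "isolated: real users are not seeing failures"
    if len(impacted) / len(counts) >= enterprise_share:
        return "enterprise-wide"
    return "localized: " + ", ".join(impacted)
```

When the synthetic agent fires, feeding the recent real-user flows for the same application into a function like this separates a lone synthetic transaction that went “off the rails” from a problem your users are actually feeling, and tells you whether it is confined to one facility.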

Before continuing, I would like to clarify that my use of the term “agent” in this article refers to synthetic agents, not to a component management agent that keeps track of hardware components such as disk drives, CPU, and memory usage. I will touch on component management and its role in article 5.

Looking at Traffic Flow Data

So why look at the flow data (packets) traversing your internal network? Isn’t that data set just for the packet jockey/Packet Ninja/Packet Head on staff? NO! From this extremely rich data set you can determine the individual performance of PCs and servers, and detect and alert on errors. If the tools are properly deployed, key information is available in metadata that can be presented to an untapped audience.
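As a rough illustration of how per-PC and per-server performance falls out of flow data, the sketch below groups flow records by client or by server. The record fields (`client`, `server`, `response_ms`, `error`) and the alert threshold are assumptions for the example, not the schema of any particular tool.

```python
from collections import defaultdict
from statistics import mean


def per_host_stats(flows, key):
    """Group flow records by 'client' (PCs) or 'server' and summarize each host.

    Each flow is a dict with hypothetical keys: client, server,
    response_ms, and error (bool).
    """
    groups = defaultdict(list)
    for flow in flows:
        groups[flow[key]].append(flow)
    return {
        host: {
            "transactions": len(items),
            "mean_response_ms": mean(f["response_ms"] for f in items),
            "errors": sum(1 for f in items if f["error"]),
        }
        for host, items in groups.items()
    }


def hosts_to_alert(stats, max_errors=0):
    """Return hosts whose error count exceeds an illustrative threshold."""
    return sorted(host for host, s in stats.items() if s["errors"] > max_errors)
```

Calling `per_host_stats(flows, "client")` gives the PC view and `per_host_stats(flows, "server")` gives the server view of the same records, which is the point: one data set, several audiences.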

Because we are looking at packet flow data, we are reviewing live client/customer data, which is exactly the rich data set you should review! This approach is the direct opposite of synthetic transactions. The common challenge with so much data is figuring out how best to find the “diamond in the rough”. As speeds, network convergence, and adoption of the Internet of Things (IoT) increase, a true strategy is required. A strategy will help address security and other considerations so that you are able to keep up with this raging data flow. But more on that strategy in article 5!

What can you expect from adding network-based application performance management to your tool set? First, we gain visibility into what the clients are experiencing. As users access resources, encounter errors, and/or attempt malicious acts, we will be able to view that activity at the layer 7 (application) protocol level.
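What “viewing it at layer 7” means in practice can be sketched with plain HTTP/1.x. The helpers below are hypothetical and assume the tool has already reassembled the TCP stream and that the traffic is unencrypted; for HTTPS, a passive tool typically sees only TLS metadata unless it has access to decryption keys.

```python
def _first_line(payload: bytes) -> str:
    # HTTP/1.x start lines end with CRLF; decode leniently so odd bytes don't crash parsing.
    return payload.split(b"\r\n", 1)[0].decode("ascii", errors="replace")


def parse_request_line(payload: bytes):
    """Return method, target and version from an HTTP/1.x request, or None."""
    parts = _first_line(payload).split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None
    method, target, version = parts
    return {"method": method, "target": target, "version": version}


def parse_status_line(payload: bytes):
    """Return version, status code and reason phrase from an HTTP/1.x response, or None."""
    parts = _first_line(payload).split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/") or not parts[1].isdigit():
        return None
    reason = parts[2] if len(parts) == 3 else ""
    return {"version": parts[0], "status": int(parts[1]), "reason": reason}
```

Pairing each parsed request with the response on the same connection is what lets a flow-aware tool say which resource a user asked for and what the server answered.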

For example, if you successfully access www.epic.com, the network-based APM tool set will show an HTTP 200 status code. Failed attempts may show a range of HTTP errors, from 404s (client errors) to 500s (server errors). By watching these flows, we can keep track of the resources being accessed (URLs, in the case of HTTP), which can be monitored individually or as a group. In addition, we can monitor the response time of individual URLs along with their corresponding network metrics and response times. This functionality is a key building block of a solution that accounts for multiple disciplines and perspectives rather than a single, narrowly targeted viewpoint.
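A minimal sketch of the per-URL tracking described above, assuming the tool exports one `(url, status, response_ms)` tuple per transaction. Status classes follow the HTTP semantics specification (1xx informational through 5xx server error); counting both 4xx and 5xx as failures is an assumption you may want to change, for example if 404s on your network are expected.

```python
from collections import defaultdict


def status_class(code: int) -> str:
    """Map an HTTP status code to its class (first digit)."""
    if 100 <= code < 200:
        return "informational"
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client_error"
    if 500 <= code < 600:
        return "server_error"
    return "unknown"


def per_url_summary(transactions):
    """Summarize (url, status, response_ms) tuples per URL.

    Treating 4xx and 5xx responses as failures is a hypothetical policy choice.
    """
    acc = defaultdict(lambda: {"count": 0, "failures": 0, "total_ms": 0.0})
    for url, status, response_ms in transactions:
        entry = acc[url]
        entry["count"] += 1
        entry["total_ms"] += response_ms
        if status_class(status) in ("client_error", "server_error"):
            entry["failures"] += 1
    return {
        url: {
            "count": e["count"],
            "failure_pct": 100.0 * e["failures"] / e["count"],
            "mean_ms": e["total_ms"] / e["count"],
        }
        for url, e in acc.items()
    }
```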

The Problem Scenario
Scenario 1: A nurse calls and reports that he/she is unable to access the EHR application.
NPM Facts: NPM tools report that the link (MAN/DWDM/WAN/T1/etc.) is running at 20% utilization with zero errors; the top application appears to be Citrix.
NPM+ Facts: NPM+ tool sets show that DNS queries for “ehr.corporate.na.loca” are 100% successful. DHCP (a problem two weeks ago) now shows a 100% success rate for the facility the nurse is calling from.
APM Facts: The APM tool set shows increased response time, with numerous timeouts, for the Citrix-delivered application ‘Epic’. The failure percentage associated with ‘Epic’ is zero. Other Citrix-delivered applications also show zero failures and have reasonable response times.
Question: Can you help solve this problem? Yes. By reviewing the APM data, we can see that the Citrix-delivered application is experiencing a slowdown in response time. We will need to dig deeper to determine whether the slow response time is the result of improper load balancing, TCP zero windows, or other events. However, we are able to lock onto where we should look next: the delivery of Citrix.
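The reasoning in Scenario 1 can be expressed as a small triage function. This is a sketch, not a product feature: the 80% utilization cut-off, the `slow_factor`, and the `AppStats` fields are hypothetical. Timeouts are tracked separately from `failure_pct`, which mirrors the scenario, where ‘Epic’ shows numerous timeouts yet a zero percent failure rate.

```python
from dataclasses import dataclass


@dataclass
class AppStats:
    # Hypothetical per-application metrics from a flow-based APM tool.
    baseline_ms: float   # typical response time for this application
    current_ms: float    # response time in the current interval
    timeouts: int        # transactions that never received a response
    failure_pct: float   # share of transactions that returned an error


def next_place_to_look(link_util_pct, link_errors, dns_success_pct,
                       dhcp_success_pct, apps, slow_factor=1.5):
    """Walk the NPM, NPM+ and APM facts in order; thresholds are illustrative."""
    # NPM: is the link itself congested or erroring?
    if link_util_pct > 80 or link_errors > 0:
        return "network: investigate the link (NPM)"
    # NPM+: are the service enablers healthy?
    if dns_success_pct < 100 or dhcp_success_pct < 100:
        return "service enablers: investigate DNS/DHCP (NPM+)"
    # APM: which applications return errors, and which are slow or timing out?
    failing = sorted(name for name, s in apps.items() if s.failure_pct > 0)
    if failing:
        return "application errors: investigate " + ", ".join(failing)
    slow = sorted(
        name for name, s in apps.items()
        if s.current_ms > slow_factor * s.baseline_ms or s.timeouts > 0
    )
    if len(slow) == 1:
        return f"application delivery: investigate delivery of {slow[0]}"
    if slow:
        return "shared platform: investigate " + ", ".join(slow)
    return "no flow-level evidence yet: gather more data"
```

With the scenario's NPM and NPM+ facts (20% utilization, zero errors, 100% DNS and DHCP success) and made-up `AppStats` in which only ‘Epic’ is slow and timing out, the function returns "application delivery: investigate delivery of Epic", which is the same conclusion reached above.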
 
Now some will argue that this is NOT true APM, and that the fact that I am talking directly about response times further discredits the claim. I personally deem this flow-based APM because I am able to isolate a single Citrix-delivered application (Epic) by tracking layer 7 data, and then generate a response time from only those flows hosting the application that I am troubleshooting.
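A sketch of the isolation step described above: compute response time only over the flows whose layer 7 metadata names the application under investigation. The `app` and `response_ms` keys are hypothetical; how a tool decodes the published application name from a Citrix session varies by product and is not shown here.

```python
from statistics import median, quantiles


def app_response_times(flows, app):
    """Summarize response time over only the flows carrying one application.

    Each flow is a dict with hypothetical keys: 'app' (the application name a
    flow-aware tool decoded from layer 7 data) and 'response_ms'.
    """
    samples = [f["response_ms"] for f in flows if f.get("app") == app]
    if not samples:
        return None
    summary = {"flows": len(samples), "median_ms": median(samples)}
    if len(samples) >= 2:
        # n=20 yields 19 cut points; the last one is the 95th percentile.
        summary["p95_ms"] = quantiles(samples, n=20, method="inclusive")[-1]
    return summary
```

Running the same function for ‘Epic’ and for the other Citrix-delivered applications gives the side-by-side comparison the scenario relies on, without the healthy applications diluting the numbers.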
 
Summary
Flow-based APM can take us in many directions, from analysis of radiology and telemetry services to EMR layer 7 data and individual HL7 messages. None of these items will be visible unless our tool set is “flow aware”. As we expand the series into APM+ and strategy, additional examples will be provided to take full advantage of APM. But APM by itself is lackluster; it covers a single application. Will a single application take down your enterprise hospital? No, but something else will… Curious what “IT” is? Stay tuned for the next article, “Applying Advanced APM to Healthcare”.
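As a small example of what “flow aware” can mean for HL7, the sketch below pulls the message type (MSH-9) and message control ID (MSH-10) out of an HL7 v2 message. It assumes the messages travel over MLLP framing (a 0x0B start byte, then 0x1C 0x0D at the end), which is common for HL7 v2 interfaces, and that the TCP stream has already been reassembled into a single frame; the function names and sample message are hypothetical.

```python
START_BLOCK = b"\x0b"      # MLLP start-of-block (VT)
END_BLOCK = b"\x1c\x0d"    # MLLP end-of-block (FS) followed by carriage return


def unwrap_mllp(frame: bytes) -> bytes:
    """Strip MLLP framing from one complete HL7 v2 message."""
    if not frame.startswith(START_BLOCK) or not frame.endswith(END_BLOCK):
        raise ValueError("not a complete MLLP frame")
    return frame[len(START_BLOCK):-len(END_BLOCK)]


def hl7_message_type(message: bytes):
    """Return (MSH-9 message type, MSH-10 control ID), or None if no usable MSH."""
    text = message.decode("ascii", errors="replace")
    msh = text.split("\r", 1)[0]  # HL7 v2 segments are separated by carriage returns
    if len(msh) < 4 or not msh.startswith("MSH"):
        return None
    field_sep = msh[3]  # MSH-1 is the field separator character itself
    fields = msh.split(field_sep)
    # fields[0] is "MSH" and fields[1] is MSH-2, so MSH-n lives at fields[n - 1].
    if len(fields) < 10:
        return None
    return fields[8], fields[9]
```

Counting these message types per interface over time is one way a flow-aware tool could show, for instance, that ADT traffic from a given system has stopped flowing.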

Continue on to the next article in the series, “Applying Advanced APM to Healthcare”:

https://problemsolverblog.czekaj.org/troubleshooting/applying-advanced-apm-healthcare-part-4/