Applying NPM to Healthcare; a PCPM Odyssey (Part 1)

Network Performance Management’s (NPM) roots date back to the 1990s with the emergence of SNMP, MIB-II, and NetFlow. Quickly every IT shop, small and large, had various toolsets graphing all sorts of network metrics. Many network managers lived and died by these tools and made significant financial decisions based on them alone. Does the age of these technologies mean that their datasets are obsolete? By no means! But like any good doctor, we will review the instruments at our disposal and use the ones that make the most sense for the outcome we are seeking.

Resolution of Time

Many modern vehicles have a digital fuel economy readout. If your dashboard is like mine, you probably have three separate readings: instant fuel economy, Trip A, and Trip B. Each has its own resolution, that is, its own averaging window. If you are going up a steep hill, your instant fuel economy may read 7 MPG while Trip A shows a modest 30 MPG. Why? Averages. The instant reading may be an average over the past 5 seconds, while Trip A is the average over your entire hour-long trip. NPM data behaves the same way.

Across the industry, the most common resolution is 5 minutes for SNMP and 1 minute for flow-based technologies (NetFlow, J-Flow, and sFlow). Each toolset, open source or commercial, may differ slightly, but on the whole 1 minute and 5 minutes are the most common timeframes. So what is wrong with the picture SNMP and NetFlow are drawing for us?

1. We measure data communication speeds in bits per second, e.g., megabits per second (Mbps), yet these tools report averages over 60 or 300 seconds.
2. Problems often begin and end within a single 1 or 5 minute poll cycle, so the average hides them.
3. These metrics are primarily based on traffic utilization/consumption; some tools expand further into values such as discards or CRC error rates.
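To make the poll-cycle average concrete, here is a minimal sketch of how a utilization figure can be derived from two readings of an interface octet counter taken one poll cycle apart. The function names, the sample counter values, and the 25 Mbps link speed are assumptions made up for illustration; a real NPM tool may compute this differently.

```python
def average_bps(octets_start, octets_end, interval_seconds, counter_bits=64):
    """Average bits per second between two octet-counter readings.

    The modulo keeps the delta correct if the counter wrapped once
    between the two polls (counter_bits=32 for older 32-bit counters).
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    modulus = 2 ** counter_bits
    delta_octets = (octets_end - octets_start) % modulus
    return delta_octets * 8 / interval_seconds


def utilization_percent(bps, link_speed_bps):
    """Average utilization of a link as a percentage of its speed."""
    return 100.0 * bps / link_speed_bps


# Hypothetical readings taken 300 seconds (one 5 minute poll cycle) apart.
rate = average_bps(1_000_000_000, 1_187_500_000, interval_seconds=300)
util = utilization_percent(rate, link_speed_bps=25_000_000)  # assumed 25 Mbps link

print(f"{rate / 1_000_000:.1f} Mbps average, {util:.0f}% utilization")
```

Everything that happened inside those 300 seconds collapses into one number, which is the limitation the list above describes.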

The Role of Sample Rates in Modern Troubleshooting

So open your favorite NPM toolset and look at the data and its resolution or sampling rate. Let’s assume it shows 5 Mbps at a 5 minute resolution. What does that mean? You have a reading that is an average over the past 5 minutes; the peaks and valleys within that period are absent. With that information available to you, what do you do now?
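As a rough illustration of what a 5 Mbps reading at 5 minute resolution can hide, the sketch below builds 300 hypothetical per-second samples: a mostly quiet link with a 30 second burst in the middle. The sample values, the `summarize` helper, and the 40 Mbps "busy" threshold are all invented for this example.

```python
from statistics import mean


def summarize(samples_mbps, busy_threshold_mbps):
    """Summarize per-second throughput samples from one poll cycle."""
    return {
        "average_mbps": mean(samples_mbps),
        "peak_mbps": max(samples_mbps),
        "min_mbps": min(samples_mbps),
        # Seconds at or above the threshold, invisible in the average.
        "busy_seconds": sum(1 for s in samples_mbps if s >= busy_threshold_mbps),
    }


# Hypothetical 5 minute window: 270 quiet seconds at 1 Mbps
# and a 30 second burst at 41 Mbps in the middle.
quiet = [1.0] * 270
burst = [41.0] * 30
samples = quiet[:135] + burst + quiet[135:]

summary = summarize(samples, busy_threshold_mbps=40.0)
print(summary)
```

The 5 minute average for this window is 5 Mbps, yet for 30 seconds the link carried 41 Mbps. A user who happened to be working during the burst would have a very different experience from what the graph suggests.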

Problem Scenario 1: A nurse calls and reports periodic slowness, just moments ago, in the delivery of her EHR application.
NPM Facts: NPM tools report that your MAN/DWDM/WAN/T1/etc. is at 20% utilization; zero errors; top application is Citrix.
Question: Can you solve this problem with only utilization metrics? Can you provide additional insight?
My take: No. However, you are able to explain that bandwidth does not appear to be a contributing factor.

If your healthcare organization is afflicted with heavy bandwidth consumption for an extended period, then metrics from NPM will be beneficial. If you are experiencing CRC errors on your WAN circuit, NPM will be beneficial. Do these problems happen? Yes, they do, but they are not the only problems you are likely to experience.

Diamond in the Rough

For those who have performed packet analysis with tools such as Wireshark or Sniffer, you know that going “packet by packet” is quite a daunting task. Now imagine all those hundreds or thousands of packets being summarized by a single data value for a five minute period. That summary has its purpose, but it lacks much detail, and honestly that is by design: the methodology was built at a time when system resources were limited.

To close this article, I want the reader to know that NPM provides actionable data, but the problems it will help solve are limited. That doesn’t discredit these solutions or suggest you shouldn’t invest time or money in them. But know what you can and can’t do with this data.

Where will NPM help your medical facilities?

* Capacity planning for remote medical facilities (best used to monitor link usage statistics)
* Identifying badly behaving applications as it relates to traffic volumes (e.g., broken anti-virus)
* Insight into what applications remote facilities rely upon the most

Continue on to the next article in the series:

Applying Advance NPM (aka NPM+) to Healthcare; a PCPM Odyssey (Part 2)

