Problem to Solve: How can I use my APM NPM solution to better detect Unknown Anomaly conditions related to Cyber and/or Hacktivism?
“You have to learn the rules of the game. And then you have to play better than anyone else.” – Unknown
No matter what you call yourself (cyber-enthusiast, InfoSec analyst, security practitioner, white hat, hacker, etc.), your playing field has very few rules. Yet the field requires its participants to play harder and better than their competition. This isn’t new to you security folks; you know it to be true. It is this odd paradox of frustration and pure joy that, if taken too seriously, creates an environment where burnout is unavoidable. You live for the discovery of the unknown, and the quest to understand who, what, when, where and why.
It always starts with the “unknown”…
Anomaly Detection
Before there are signatures, white papers, and long marketing-driven blogs, someone like you must first detect and classify the anomaly. This article is dedicated to that detection process, with an APM/NPM solution set in mind.
The ability to detect unknowns is in the APM NPM solution’s very nature. There are many possible methodologies and processes to use, and the approach below represents my own perspective. Your process may be better, and most likely is, based on your experience. Why? Because over the years you have probably found the tips, tricks and processes that work best for you. The approach outlined below is what works for me, and maybe we can share ideas and grow both our knowledge sets? I encourage your feedback! Note that my method is very focused on OSI layers 3 and 4, and is very TCP driven. A future article will touch on the layers above OSI layer 4.
Statistical Deviation
There are many approaches to detecting anomalies, and much can be made of statistical deviation metrics (and rightfully so). If I have a data set, why not review it and look for the outliers? Easy enough if you have an APM/NPM tool already available at your fingertips. Below follows a very manual and rudimentary approach, but one I find to be quite effective.
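To make "look for the outliers" concrete, here is a minimal sketch. It is not taken from any APM/NPM product. It assumes you have exported a series of average link-load samples, and it swaps in one common technique, the modified z-score based on the median absolute deviation (MAD), with the conventional 3.5 cutoff. The function name and the sample values are hypothetical.

```python
from statistics import median

def mad_outliers(samples, threshold=3.5):
    """Return the indexes of samples whose modified z-score exceeds threshold."""
    med = median(samples)
    mad = median(abs(x - med) for x in samples)
    if mad == 0:
        # At least half the samples sit exactly on the median; this method cannot score them.
        return []
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return [i for i, x in enumerate(samples)
            if 0.6745 * abs(x - med) / mad > threshold]

# Hypothetical 5-minute average link load, in Mbps.
load_mbps = [41, 39, 44, 40, 42, 43, 38, 97, 41, 40]
print(mad_outliers(load_mbps))  # [7]
```

The median-based score is used here instead of mean and standard deviation because a single large spike drags the mean toward itself and can hide the very sample you are looking for.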
Tip 1: Tap, not SPAN, for your traffic sources
- Taps provide a more reliable data feed, as they are not as easily changed as feeds from SPAN ports.
Tip 2: Scale your solution for data collection
- Sizing your solution to properly collect all of the relevant data feeds is critical; otherwise your statistics will likely be skewed, without your knowledge, by missed or uncollected traffic.
My Process
I like to use a combination of a link analysis module and an application analysis module to review what traffic exists on the link today. Better yet, if your tool set allows traffic to be defined, start placing friendly names around it. If it is unknown, simply label it by the protocol.
After we have established and understand what traffic currently traverses the link, we can then focus on our unknown traffic (here called “IP_OTHER”). As pictured below, we now have a representation of the total average traffic load, and our “unknown” traffic on this link.
By looking a little deeper into this unknown traffic, we can then classify it appropriately. This pays off the next time undefined traffic (IP_OTHER) occurs: once the known traffic is defined properly, anything that shows up as “unknown” from that point on is potentially anomalous or dangerous traffic.
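The classification idea can be sketched in a few lines. This is only an illustration under assumptions: the application table, the flow record fields (proto, dst_port, bytes) and the sample flows are all hypothetical, and a real tool will define applications with far more than a protocol and port pair.

```python
# Hypothetical application definitions: (protocol, server port) -> friendly name.
APP_DEFINITIONS = {
    ("tcp", 443): "HTTPS",
    ("tcp", 80): "HTTP",
    ("udp", 53): "DNS",
    ("tcp", 53): "DNS",
    ("tcp", 25): "SMTP",
}

def classify(flow):
    """Return the friendly name for a flow, or IP_OTHER if it is undefined."""
    return APP_DEFINITIONS.get((flow["proto"], flow["dst_port"]), "IP_OTHER")

def bytes_by_app(flows):
    """Sum the bytes seen per application name."""
    totals = {}
    for flow in flows:
        app = classify(flow)
        totals[app] = totals.get(app, 0) + flow["bytes"]
    return totals

def unknown_share(flows):
    """Fraction of all bytes that fell into IP_OTHER."""
    totals = bytes_by_app(flows)
    total = sum(totals.values())
    return totals.get("IP_OTHER", 0) / total if total else 0.0

# Hypothetical flow records for one reporting interval.
flows = [
    {"proto": "tcp", "dst_port": 443, "bytes": 6000},
    {"proto": "udp", "dst_port": 53, "bytes": 500},
    {"proto": "tcp", "dst_port": 6667, "bytes": 2500},
    {"proto": "tcp", "dst_port": 80, "bytes": 1000},
]
print(bytes_by_app(flows))
print(unknown_share(flows))  # 0.25
```

Each time you investigate an IP_OTHER flow and add it to the table, the remaining unknown share shrinks, so whatever still lands there deserves a closer look.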
Detecting Strange Behavior
Strange is a word that has many definitions. In this particular case we will define it as normal application traffic that begins to show oddities at OSI layer 7. For example, HTTP traffic that begins to return “500” error responses would be considered strange.
Logging
Collecting syslog data is a normal and required task for many security people and organizations. However, anyone who has dealt with logs will agree that it can quickly go from a great idea to a great pain in the rump. So, what if we looked at the user’s traffic as an alternative approach?
Traffic Analysis of Layer 7 Errors
It has been said, “a picture is worth a thousand words”. Maybe then I should change my blog posts to include only one picture and call it a day? (I kid… I kid). In truth, we can learn much from statistical renderings based on packet flow data. Let’s consider the example graph below. In this case, we can see that our service, which includes an external-facing DNS server, is suddenly experiencing errors (Name Errors).
Options to Proceed?
We could simply go to the log files to review the information! But we could also use the APM NPM tool set, which is very aware of these service enabler protocols (DNS/DHCP/LDAP/RADIUS/etc). This type of solution will also provide a deeper analysis of what was requested or referenced, without a lot of manual time invested. Take a look at this previous article, “You can’t handle the DNS Truth.” https://problemsolverblog.czekaj.org/troubleshooting/cant-handle-the-dns-truth/
So why not leverage the solution that provides the most information in the most efficient manner?
Above, we can clearly see the DNS request for a site named “malware8.malware8.com”. Hmm, I wonder if it is bad? Kidding, this is some lab traffic I often utilize for demonstration purposes (Again, I kid). Not only have we seen the anomaly error itself, but we also went a few levels deeper by exposing the actual request that caused the anomaly. Way to go APM NPM tools.
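For readers who want to reproduce this kind of drill-down by hand, here is a rough sketch. It assumes you can export decoded DNS responses as simple records; the field names (qname, rcode) and the sample records are hypothetical and do not represent any particular tool's export format. The only fixed fact used is that DNS response code 3 means Name Error (NXDOMAIN).

```python
from collections import Counter

NXDOMAIN = 3  # DNS RCODE 3, "Name Error" (RFC 1035)

def name_error_rate(responses):
    """Fraction of DNS responses that came back as Name Error."""
    if not responses:
        return 0.0
    errors = sum(1 for r in responses if r["rcode"] == NXDOMAIN)
    return errors / len(responses)

def top_failing_names(responses, n=3):
    """Most frequently requested names among the Name Error responses."""
    counts = Counter(r["qname"] for r in responses if r["rcode"] == NXDOMAIN)
    return counts.most_common(n)

# Hypothetical decoded DNS responses from one interval.
responses = [
    {"qname": "www.example.com", "rcode": 0},
    {"qname": "malware8.malware8.com", "rcode": 3},
    {"qname": "www.example.com", "rcode": 0},
    {"qname": "malware8.malware8.com", "rcode": 3},
    {"qname": "mail.example.com", "rcode": 0},
    {"qname": "typo.example.com", "rcode": 3},
    {"qname": "malware8.malware8.com", "rcode": 3},
    {"qname": "www.example.com", "rcode": 0},
]
print(name_error_rate(responses))       # 0.5
print(top_failing_names(responses, 1))  # [('malware8.malware8.com', 3)]
```

The first function answers "is something wrong?" and the second answers "which request is causing it?", which mirrors the two levels of detail described above.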
Key Takeaways
Here are the key takeaways from this APM NPM approach. I was quickly able…
- to determine that there was a bunch of undefined traffic traversing the network.
- (After initial configuration of Applications) to determine how much anomalous traffic there was and when it started. A positive by-product is that unidentified traffic is recorded as well as trended, which makes it easier to report this information.
- to get to a Layer 7 view of anomalies. We saw the increase in DNS traffic itself, but more importantly the increase in actual DNS error codes.
- to see the actual DNS request that made the call to the malware8.com site.
- to use this collected information to create and use baseline alarms to detect other such anomaly conditions in my critical network and application services.
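As a rough idea of what a baseline alarm does under the hood, here is a minimal sketch. It assumes a simple rule that is my own illustration and not any vendor's algorithm: raise an alarm when the current sample exceeds the mean of a trailing window by more than k standard deviations. The window size, the multiplier k and the sample data are hypothetical.

```python
from statistics import mean, pstdev

def baseline_alarms(series, window=6, k=3.0):
    """Return indexes where a sample exceeds its trailing baseline by k deviations."""
    alarms = []
    for i in range(window, len(series)):
        history = series[i - window:i]  # the samples just before this one
        limit = mean(history) + k * pstdev(history)
        if series[i] > limit:
            alarms.append(i)
    return alarms

# Hypothetical DNS Name Error counts per minute.
dns_errors_per_min = [2, 3, 2, 4, 3, 2, 3, 40, 3, 2]
print(baseline_alarms(dns_errors_per_min))  # [7]
```

One caveat with this simple rule: once a spike enters the trailing window it inflates the baseline, so a second spike shortly afterwards may go unnoticed until the window rolls past it.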
Net Gain to You
You can triage a cyber anomaly quickly with an efficient workflow, trend and baseline your traffic, and complement other cyber solutions with packet-based evidence.
Continue on to the next article in the series, “SIEM Bat Time .. SIEM Bat Channel”