Problem to Solve: When we put PCI compliance mechanisms in place, such as anti-virus, anti-malware, and timeout values, how can we be sure things are … (gulp) working?
Contributing Author – Robert Wright, Network Engineer with 15+ years experience
When “best practices” cripple your network, who are you going to call?
At first glance, PCI DSS 3.0 requirements 5 and 6 will not strike you as problems that APM/NPM (application and network performance monitoring) products have anything to do with, let alone can solve. Yet these requirements cause trouble for nearly every enterprise with a wide area network (WAN).
The very first requirement, 5.1, tells us that anti-virus software must be deployed on all systems commonly affected by malware. While our APM/NPM solution should be able to detect abnormal network traffic patterns through analytics, its primary function is not malware detection. Even so, we shouldn’t turn our backs on APM/NPM, because it plays a valuable role in requirement 5.
After our anti-malware solution is deployed, its signatures must be kept up to date. How is this accomplished? Over the network, using common protocols such as HTTP, HTTPS, FTP, NFS, and SMB. More often than not, this update traffic will at some point consume 100% of the available bandwidth at your remote locations. When that happens, people will not call your help desk to say that anti-virus updates are consuming your WAN bandwidth; instead, they will report the dreaded “everything is slow!”
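A back-of-the-envelope estimate shows how a single signature push can flatten a branch link. This minimal sketch assumes every PC at a remote site downloads its own copy of the update over one shared WAN link at the same time, and it ignores protocol overhead; the update size, host count, and link speed in the example are hypothetical, not measured values.

```python
def saturation_seconds(update_mb: float, hosts: int, link_mbps: float) -> float:
    """Estimate how long a WAN link stays saturated while `hosts` clients
    each download an `update_mb` megabyte signature update over that link.

    Assumption: every client fetches its own copy (no local cache or relay),
    and protocol overhead is ignored, so the real figure will be higher.
    """
    total_megabits = update_mb * 8 * hosts  # megabytes -> megabits, all clients
    return total_megabits / link_mbps


# Hypothetical branch: 40 PCs, a 100 MB update, a 10 Mb/s WAN link.
seconds = saturation_seconds(update_mb=100, hosts=40, link_mbps=10)
print(f"Link saturated for roughly {seconds / 60:.1f} minutes")
```

Setting `hosts=1` shows the same link's load when the branch downloads the update only once, for example through a local cache.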
How Do We Avoid This Problem?
How do we avoid this problem? Our APM/NPM tools should let us identify this specific traffic by matching IP addresses and ports. With that traffic defined, we can perform capacity planning and understand the impact of complying with 5.1. This is also the time to consider deploying Quality of Service (QoS) to protect our critical business applications. When evaluating APM/NPM vendors, pick one that understands QoS so you can better see how each queue is being used.
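The IP-and-port matching step, followed by a per-class byte total for capacity planning, can be sketched as follows. The update-server subnet, the port set, and the flow tuples are hypothetical stand-ins for whatever your APM/NPM tool exports; this illustrates the matching logic, not any vendor's configuration format.

```python
import ipaddress
from collections import defaultdict

# Hypothetical rule: anti-virus update servers live in 10.50.0.0/24 and
# serve signatures over HTTP/HTTPS. Replace with your own inventory.
AV_UPDATE_RULES = [
    (ipaddress.ip_network("10.50.0.0/24"), {80, 443}),
]


def classify(server_ip: str, server_port: int) -> str:
    """Label a flow 'av-update' if it matches an IP/port rule, else 'other'."""
    addr = ipaddress.ip_address(server_ip)
    for network, ports in AV_UPDATE_RULES:
        if addr in network and server_port in ports:
            return "av-update"
    return "other"


def bytes_by_class(flows):
    """Sum bytes per traffic class from (server_ip, server_port, bytes) tuples."""
    totals = defaultdict(int)
    for server_ip, server_port, nbytes in flows:
        totals[classify(server_ip, server_port)] += nbytes
    return dict(totals)


# Hypothetical flow records from one remote site.
flows = [
    ("10.50.0.12", 443, 80_000_000),  # signature download
    ("10.50.0.12", 22, 5_000),        # admin SSH to the same server: not an update
    ("172.16.4.9", 1433, 2_000_000),  # database traffic
]
totals = bytes_by_class(flows)
share = totals["av-update"] / sum(totals.values())
print(totals, f"anti-virus share of bytes: {share:.1%}")
```

Matching on port as well as address keeps unrelated traffic to the same server (the SSH flow above) out of the anti-virus total.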
So are we done? No. Requirement 5.2 requires that all anti-virus mechanisms be maintained: they must be kept current, and appropriate audit logs must be maintained. Here we leverage the APM side of our APM/NPM solution to monitor the protocols mentioned earlier. What value does this bring you? Let us count the ways.
| Protocol | What monitoring can reveal |
|---|---|
| HTTP | Error codes (e.g., 404 Not Found) when attempting to retrieve an update |
| HTTPS | HTTP error codes if we can decrypt the traffic; TLS handshake failures with or without decryption |
| FTP | Login failures, files not found, incorrect permissions |
| NFS | Authentication failures, files not found, incorrect permissions |
| SMB | Login failures, files not found, incorrect permissions, memory errors |
| DNS | Failure to resolve the hostnames where application dependencies (such as update servers) reside |
If we boil this table down, what we are really talking about is the ability to perform service triage on a critical business service. If the anti-virus component of that service is not covered, we expose ourselves and our customers to known threats and may fail to comply with PCI.
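The protocol table amounts to a lookup from status codes to failure categories, counted per site. The sketch below assumes the monitoring tool exports one (site, protocol, code) record per update transaction; the record format and site names are hypothetical, while the codes themselves are standard values (HTTP status codes, FTP reply codes, NFSv3 status codes, SMB NTSTATUS values, and DNS RCODEs). HTTPS is left out because its HTTP status codes are only visible after decryption.

```python
from collections import Counter

# (protocol, code) -> triage category. Codes not listed are ignored here.
FAILURE_CODES = {
    ("HTTP", 404): "file not found",
    ("HTTP", 403): "permission denied",
    ("FTP", 530): "login failure",             # Not logged in
    ("FTP", 550): "file unavailable",          # not found or no access
    ("NFS", 2): "file not found",              # NFS3ERR_NOENT
    ("NFS", 13): "permission denied",          # NFS3ERR_ACCES
    ("SMB", 0xC000006D): "login failure",      # STATUS_LOGON_FAILURE
    ("SMB", 0xC0000034): "file not found",     # STATUS_OBJECT_NAME_NOT_FOUND
    ("SMB", 0xC0000022): "permission denied",  # STATUS_ACCESS_DENIED
    ("SMB", 0xC0000017): "memory error",       # STATUS_NO_MEMORY
    ("DNS", 3): "name does not resolve",       # RCODE NXDOMAIN
    ("DNS", 2): "DNS server failure",          # RCODE SERVFAIL
}


def triage(transactions):
    """Count known failure categories per site from (site, protocol, code)."""
    problems = Counter()
    for site, protocol, code in transactions:
        category = FAILURE_CODES.get((protocol, code))
        if category is not None:
            problems[(site, category)] += 1
    return problems


# Hypothetical update transactions seen at two stores.
sample = [
    ("store-114", "HTTP", 200),
    ("store-114", "HTTP", 404),
    ("store-114", "HTTP", 404),
    ("store-207", "DNS", 3),
    ("store-207", "SMB", 0xC000006D),
]
for (site, category), count in triage(sample).most_common():
    print(site, category, count)
```

Grouping by site as well as category points straight at the branch whose anti-virus updates are failing, and why.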
Requirement 6 pushes the enterprise to better understand how it develops applications, including the separation of production and development environments (6.4.1). It further makes secure communications a consideration during application development (6.5.4). The final requirement we will discuss is 6.5.10. This requirement goes into effect on June 30, 2015, and requires applications to incorporate “time-outs” for sessions following a successful login.
We can address 6.4.1 and 6.5.4 together. As we discussed in previous blog posts, your APM/NPM solution likely includes the ability to raise an alarm when “certain protocols” or “certain IP addresses” communicate with one another across the network. This is typically configured as a policy which, if violated, generates an alarm or trap. This doesn’t replace a firewall or good development practices, but it does provide a passive third set of eyes should such traffic occur.
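Such a policy reduces to a per-flow check. The sketch assumes hypothetical production and development subnets and treats a few well-known cleartext ports (FTP 21, Telnet 23, HTTP 80) as a stand-in for an insecure-communications rule; the rule names loosely echo 6.4.1 and 6.5.4 and are not taken from the PCI DSS text.

```python
import ipaddress

# Hypothetical address plan: adjust to your own environments.
PROD = ipaddress.ip_network("10.10.0.0/16")
DEV = ipaddress.ip_network("10.20.0.0/16")

# Ports commonly associated with cleartext protocols.
CLEARTEXT_PORTS = {21: "FTP", 23: "Telnet", 80: "HTTP"}


def policy_violations(src: str, dst: str, dst_port: int) -> list[str]:
    """Return the policy rules a single flow breaks (empty list if none)."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    violations = []
    # Separation rule (cf. 6.4.1): production and development must not talk directly.
    if (s in PROD and d in DEV) or (s in DEV and d in PROD):
        violations.append("prod/dev boundary crossed")
    # Insecure-communications rule (cf. 6.5.4): cleartext protocols touching production.
    if dst_port in CLEARTEXT_PORTS and (s in PROD or d in PROD):
        violations.append(f"cleartext {CLEARTEXT_PORTS[dst_port]} in production")
    return violations


print(policy_violations("10.10.3.7", "10.20.1.5", 1433))  # prod app -> dev database
print(policy_violations("10.10.3.7", "10.10.9.2", 23))    # Telnet inside production
print(policy_violations("10.10.3.7", "10.10.9.2", 443))   # compliant flow
```

Because the check is passive, a non-empty result is a prompt to investigate, not a block; enforcement still belongs to the firewall.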
Another capability commonly offered today is dependency mapping. Because your APM/NPM tool set already holds all the conversation data, it can map out your application and show whether production and test environments are talking to one another. This has obvious uses beyond compliance with PCI 6.4.1: should your organization experience turnover within its development team, the map will become quite valuable.
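A conversation-based dependency map is essentially a directed graph. The sketch assumes the tool can export (client, server) pairs and that each host carries an environment label; the host names and labels are hypothetical. Following edges transitively matters: a production web tier may never talk to a test host directly, yet still depend on one through the application tier.

```python
from collections import defaultdict, deque


def build_map(conversations):
    """Build a client -> set(servers) adjacency map from (client, server) pairs."""
    graph = defaultdict(set)
    for client, server in conversations:
        graph[client].add(server)
    return graph


def reachable(graph, start):
    """Breadth-first search: every host `start` depends on, directly or not."""
    seen, queue = set(), deque([start])
    while queue:
        host = queue.popleft()
        for nxt in graph.get(host, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical hosts and environment labels.
env = {"web-prod": "prod", "app-prod": "prod", "db-prod": "prod", "db-test": "test"}
conversations = [
    ("web-prod", "app-prod"),
    ("app-prod", "db-prod"),
    ("app-prod", "db-test"),  # a leftover test dependency
]
graph = build_map(conversations)
leaks = sorted(h for h in reachable(graph, "web-prod") if env.get(h) != "prod")
print(leaks)
```

Unlabeled hosts are treated as non-production here, so anything missing from the inventory also surfaces for review.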
Timeouts Timeouts Timeouts
6.5.10 makes us consider that keeping hour-, day-, or even week-long persistent connections open may “not” be the best idea. These long-winded sessions often make problems extremely difficult to troubleshoot, because much of the useful detail, such as network round-trip time, is gathered during the initial TCP three-way handshake, which a long-lived session performs only once. We see long sessions in database connections, but also in the web tier of poorly written applications that lack true session management.
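To make the handshake point concrete: a passive probe sitting between client and server can split round-trip time using the three handshake timestamps. The sketch assumes the probe records packet times in milliseconds; the `Handshake` record is hypothetical, not any vendor's format. A week-long session yields this measurement once, and not at all if it was opened before monitoring began.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Handshake:
    """Probe timestamps (ms) for one TCP handshake; None if the packet was not seen."""
    syn: Optional[int]
    syn_ack: Optional[int]
    ack: Optional[int]


def handshake_rtt(h: Handshake) -> Optional[dict]:
    """Split round-trip time at the probe's position in the path.

    server_side: SYN -> SYN-ACK (probe to server and back)
    client_side: SYN-ACK -> ACK (probe to client and back)
    Returns None when the handshake was not fully observed.
    """
    if None in (h.syn, h.syn_ack, h.ack):
        return None
    server_side = h.syn_ack - h.syn
    client_side = h.ack - h.syn_ack
    return {"server_side": server_side, "client_side": client_side,
            "total": server_side + client_side}


print(handshake_rtt(Handshake(syn=1000, syn_ack=1048, ack=1050)))
print(handshake_rtt(Handshake(syn=None, syn_ack=None, ack=None)))  # session predates capture
```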
Many APM/NPM products let you configure response-time alarms. If we configure our timeout bucket on a per-application basis to match our documented “reasonable timeout”, we will be able to quickly identify whether our application timeouts are working. This approach won’t work for overly chatty applications with keep-alives enabled, but to ensure compliance with 6.5.10, those mechanisms should not be deployed in the first place.
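A minimal sketch of the per-application timeout bucket, assuming your tool can report the longest idle gap observed in each session that is still open. The application names, timeout values, and 60-second grace period are hypothetical placeholders; substitute the timeouts your organization has actually documented.

```python
# Hypothetical documented "reasonable timeouts" per application, in seconds.
DOCUMENTED_TIMEOUT = {"pos-web": 15 * 60, "inventory-db": 30 * 60}


def timeout_breaches(sessions, grace=60):
    """Flag open sessions that stayed idle longer than their application's
    documented timeout (plus a grace period) without being closed.

    `sessions` holds (app, session_id, longest_idle_seconds) tuples.
    Applications with no documented timeout are flagged as well, since
    this check depends on having one.
    """
    breaches = []
    for app, session_id, idle in sessions:
        limit = DOCUMENTED_TIMEOUT.get(app)
        if limit is None:
            breaches.append((app, session_id, "no documented timeout"))
        elif idle > limit + grace:
            breaches.append((app, session_id, f"idle {idle}s > {limit}s"))
    return breaches


# Hypothetical sessions still open at the time of the check.
sessions = [
    ("pos-web", "s1", 300),         # within its timeout
    ("pos-web", "s2", 7200),        # idle two hours: timeout not enforced
    ("inventory-db", "c9", 1900),   # just past 30 minutes plus grace
    ("legacy-app", "x1", 50),       # no documented timeout at all
]
for breach in timeout_breaches(sessions):
    print(breach)
```

A session that survives well past its documented limit is exactly the signal that the application's timeout is not working.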