Networks Are Not Always Up or Down
I recently spent time travelling internationally for work. During my trip to one of the countries, I caught a nasty bug. I won’t give you the details, but suffice it to say I was not working at 100% physically or mentally. Of course, I had spent a lot of time planning this trip, speaking at certain events, and meeting different teams and customers. I had to find a way to perform and meet everyone’s expectations. I had to identify the cause of my sub-par state and then find a solution to fix it, which included hydration, vitamins, and medicine.
Our networks are often affected in a similar manner. Usually, they are working properly and everything is happy. Sometimes there is a catastrophic failure of some sort and applications or data are not accessible. Usually, though, if there is a problem, the network and application infrastructure has not failed to the point that things are inaccessible. Instead, it is functioning and the applications work, just not at 100%.
It works. …kind of
Network and application performance degradation is hard to identify and resolve when there is no actual outage. Traditional troubleshooting tools like network ping tests and iostat server metrics may not show an issue, and they cannot tell you the actual performance of the application delivery.
Not all applications are created equal. Some are more important than others. End user expectations are different for different applications. Performance requirements depend on the nature of the application. Corporate email access is more important than Candy Crush. Video conferencing requires higher bandwidths and lower latencies than the human resources portal.
Insight into the bottlenecks
The end user is concerned about the application responding within an acceptable timeframe with the correct information. To measure the impact of the different components within the IT infrastructure on the application delivery performance, we can look at several tools and metrics:
- Network utilization – Is the network pipe saturated and not allowing all the traffic to flow through it?
- Network latency – How quickly can content get from point A to point B?
- Session table management – How utilized are the state tables on the stateful network devices like firewalls, load balancers, and the application servers themselves?
- Client performance – How much CPU, memory, disk I/O, and network capacity does the client have to receive and render the information to the end user’s display?
- Front-end server performance – How much CPU, memory, disk I/O, and network capacity does the server have to compile and send the information to the client?
- Back-end server performance – How responsive are the databases, application engines and other components that deliver the content to be formatted by the front-end servers?
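As a rough illustration of the network latency metric above, here is a minimal sketch (not from the article, and the function name `tcp_connect_latency` is my own) that times TCP connection setup to a service. This is a useful stand-in for ping when ICMP is blocked, though it measures only connection setup, not full application response:

```python
import socket
import time

def tcp_connect_latency(host: str, port: int = 443, samples: int = 5) -> float:
    """Average TCP connect time in milliseconds to host:port.

    A rough proxy for network latency between point A and point B
    when ICMP ping is filtered by firewalls along the path.
    """
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        # Open and immediately close a TCP connection, timing the handshake.
        with socket.create_connection((host, port), timeout=3):
            pass
        times.append((time.perf_counter() - start) * 1000)
    return sum(times) / len(times)
```

A measurement like this only covers one of the parameters listed; the client- and server-side metrics would come from their respective operating systems or monitoring agents.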
All of these parameters affect the performance of the application delivery and determine whether that performance meets the expected application service level agreement (SLA). By monitoring them and incorporating them into a tool, we can translate the collected statistics into three broad, easy-to-understand categories:
- Server to load balancer response time – the time the front-end server spends collecting and sending the data within the data center
- Network latency – the time it takes for the server response to reach the end user
- Client rendering time – the time it takes the client to render and display the content for the end user
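The three categories above can be sketched as a simple breakdown of end-to-end response time. This is an illustrative model of my own, not a real monitoring tool's API; the class name and fields are assumptions:

```python
from dataclasses import dataclass

@dataclass
class ResponseBreakdown:
    """End-to-end application response time split into the three
    broad categories: server, network, and client."""
    server_ms: float   # server to load balancer response time
    network_ms: float  # network latency to the end user
    client_ms: float   # client rendering time

    @property
    def total_ms(self) -> float:
        # Total time the end user waits for the content.
        return self.server_ms + self.network_ms + self.client_ms

    def bottleneck(self) -> str:
        # The category contributing the most time points to where
        # troubleshooting should start.
        parts = {
            "server": self.server_ms,
            "network": self.network_ms,
            "client": self.client_ms,
        }
        return max(parts, key=parts.get)
```

For example, a breakdown of 120 ms server, 45 ms network, and 300 ms client time would point at client rendering, suggesting the fix lies on the endpoint rather than in the data center or on the wire.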
Visibility and policy
By looking at these parameters, we can start to identify when applications are not at optimal performance levels. Depending on which parameters are impacting the performance, we gain immediate insight into the nature of the problem and the method for quick resolution.
Not everything is black and white, or in this case, red and green. Often something is yellow, and it is critical that we acknowledge these degradation scenarios and give identifying and resolving them the same attention we give outright failures.