Performance Improvement
Often a critical web application needs to scale to handle more traffic than the original development team anticipated. We can often reorganize your existing system to handle much higher loads without significant new hardware expenditures. We do this by assessing your architecture, measuring your system's performance, and making a series of incremental improvements, measuring our impact at each stage.
Initial Assessment
The first step is to thoroughly review the current system: both its layout and the business results the system is failing to achieve.
Once the architecture is understood and the business goals are defined for the cluster, we benchmark the system's behavior by installing performance monitors (either perfmon for Windows or the Big Brother agent for Linux). If the system is not currently in production, we will perform a basic load test using JMeter. This baseline lets us test our improvements and verify that we have enhanced the system. If a staging or integration system is available, we will preflight our changes on that system to give us a stable testing environment that won't break the site for live customers. We can also install the application in our test lab to test basic performance characteristics.
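As a rough illustration of how a baseline run can be compared with a later one, the sketch below reduces a list of response times to a few summary numbers. It is a minimal sketch, not our actual tooling: the names summarize and improvement are hypothetical, and it assumes the latencies have already been exported from whatever load-testing or monitoring tool is in use.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    requests: int
    throughput: float  # requests per second
    mean_ms: float
    p95_ms: float


def summarize(latencies_ms, wall_seconds):
    """Reduce one test run to a few comparable numbers."""
    if not latencies_ms or wall_seconds <= 0:
        raise ValueError("need at least one sample and a positive duration")
    ordered = sorted(latencies_ms)
    count = len(ordered)
    # Nearest-rank 95th percentile, computed as ceil(0.95 * count)
    # with integer arithmetic to avoid floating-point surprises.
    rank = (95 * count + 99) // 100
    return Summary(
        requests=count,
        throughput=count / wall_seconds,
        mean_ms=sum(ordered) / count,
        p95_ms=ordered[rank - 1],
    )


def improvement(baseline, current):
    """Percentage change against the baseline; positive means better."""
    return {
        "throughput_pct": 100.0 * (current.throughput - baseline.throughput)
        / baseline.throughput,
        "p95_pct": 100.0 * (baseline.p95_ms - current.p95_ms) / baseline.p95_ms,
    }


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    before = summarize([120, 130, 150, 400, 110], wall_seconds=2.0)
    after = summarize([60, 65, 70, 90, 55], wall_seconds=1.0)
    print(before)
    print(after)
    print(improvement(before, after))
```

Recording the same summary after every change makes it easy to see whether a given step helped, hurt, or made no difference.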
Enhancement
When baselining is completed, we go through a series of upgrades and improvements to the core system. At each step, the baseline performance monitors are checked and verified, so we can make sure we are fixing the issues instead of creating new ones. We will also run functional regression tests against the server to make sure the system is functionally equivalent.
Typical steps:
- Load Balancer
The first step is to make sure that a quality load balancer has been deployed to split the load between multiple servers. This also allows administrators to reconfigure the system dynamically without disrupting ongoing operations.
We have successfully deployed the open-source balance TCP load balancer (available from http://www.inlab.de/balance.html). When used in conjunction with our open-source balance_ctl project, it forms a reasonably capable load balancing system for web clusters.
- Fault Tolerance
The next step is to make sure the load balancer is equipped to catch the application's failure modes. Often we build custom scripts or web pages that test the performance of the application, database, or remote servers so the load balancer can intelligently route around failures.
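The kind of check described here can be sketched as follows. This is an illustrative outline only: probe_backend and healthy_backends are hypothetical names, and the checks themselves are passed in as plain callables, since the real tests depend entirely on the application, its database, and its remote dependencies.

```python
import time


def probe_backend(check, max_seconds):
    """Run one health check; exceptions and slow answers count as failures.

    `check` is any zero-argument callable that returns True when the
    backend and its dependencies respond correctly. The check is timed
    after the fact, not interrupted, so each check should set its own
    network timeouts.
    """
    started = time.monotonic()
    try:
        ok = bool(check())
    except Exception:
        return False
    elapsed = time.monotonic() - started
    return ok and elapsed <= max_seconds


def healthy_backends(checks, max_seconds=2.0):
    """Return the names of the backends that should keep receiving traffic."""
    return [
        name
        for name, check in checks.items()
        if probe_backend(check, max_seconds)
    ]


if __name__ == "__main__":
    def broken():
        raise ConnectionError("database unreachable")

    # Stand-in checks, for illustration only.
    print(healthy_backends({"web1": lambda: True, "web2": broken}))
```

A list like the one returned here could then be used to decide which servers stay in rotation; how that decision is pushed to the load balancer depends on the balancer in use.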
- Reverse Caching Proxy
Squid is better known as an HTTP proxy server, but it also has a very capable HTTP acceleration mode. It is most useful for caching large static binary assets: images, Flash movies, and QuickTime and Windows Media movies. We've successfully scaled machines to handle millions of daily hits by implementing transparent caching systems in front of the web tier.
- Dynamic Page Caching
It's simple math: your web server can probably handle more than 450 static hits per second, while your application server and database can probably render 10 pages per second at most. By caching these dynamic hits, we convert them to static pages. This is a trade-off: usually there is a reason why the page was dynamic in the first place, so cached pages need to be selected with care. We have developed techniques that allow these caches to be implemented transparently, so the application is unaware that some of the site is cached.
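To make the arithmetic concrete, here is a small sketch under a deliberately simple model, which is our assumption for illustration and not a measurement: every request passes through the web server, and only the uncached fraction reaches the application server and database. The function name sustainable_rate is hypothetical, and the 450 and 10 figures are the rough estimates from the text.

```python
def sustainable_rate(static_capacity, dynamic_capacity, cached_fraction):
    """Estimate total requests per second the site can sustain.

    Assumed model: the web server caps total traffic at `static_capacity`,
    and the dynamic tier caps the uncached share at `dynamic_capacity`.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    dynamic_share = 1.0 - cached_fraction
    if dynamic_share == 0.0:
        return float(static_capacity)
    return min(float(static_capacity), dynamic_capacity / dynamic_share)


if __name__ == "__main__":
    for fraction in (0.0, 0.5, 0.75, 1.0):
        print(fraction, sustainable_rate(450, 10, fraction))
```

Under this model, capacity stays pinned to the dynamic tier until most of the traffic is served from cache, which is why choosing the few highest-traffic pages tends to matter more than caching many rarely visited ones.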
On Linux / Unix systems, we have successfully used a combination of Apache's mod_rewrite, rsync, and a retrieval script run from cron. The retrieval script runs periodically and takes a static snapshot of the page in question. Rsync is used to distribute the page, and mod_rewrite is used to serve the static version of the page rather than the dynamic version. This gives a high degree of granularity: pages can be moved in and out of cache at the URL level. Additionally, the retrieval script can be set to check the response headers of the dynamically generated page, so that if the response is a server error, the previous version is left alone until the server recovers.
This technique is especially powerful on sites that are generated via a dynamic content management system. CMS-generated pages can be slow to render, but converting to a purely static export of the site's content will rarely satisfy the needs of the business managers. Identifying key high-traffic pages and installing transparent caches for them can greatly improve the system's load handling capability.
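The retrieval step might look roughly like the sketch below. It is not the script we deploy: refresh_snapshot and http_fetch are hypothetical names, the fetch function is passed in so the logic can be exercised without a network, and the rule "only replace the snapshot on a 200 response" is a slightly stricter assumption than skipping server errors alone. Distribution with rsync and the mod_rewrite rules are outside the scope of the sketch.

```python
import os
import tempfile
import urllib.error
import urllib.request


def http_fetch(url, timeout=10):
    """Fetch a URL; HTTP error statuses are returned instead of raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        return err.code, b""


def refresh_snapshot(fetch, url, snapshot_path):
    """Refresh one cached page; keep the old copy if the origin is failing.

    `fetch(url)` must return (status_code, body_bytes). Returns True when
    the snapshot was replaced, False when the previous version was kept.
    """
    try:
        status, body = fetch(url)
    except OSError:
        return False  # origin unreachable: leave the previous snapshot alone
    if status != 200:
        return False  # error response: leave the previous snapshot alone
    directory = os.path.dirname(os.path.abspath(snapshot_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(body)
        os.chmod(tmp_path, 0o644)  # mkstemp files are private by default
        # Rename into place so readers never see a half-written page.
        os.replace(tmp_path, snapshot_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return True
```

A cron job could call refresh_snapshot(http_fetch, url, path) for each cached URL. Writing to a temporary file and renaming it means the web server only ever sees the old page or the complete new one.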
- Database Tuning
Many database systems are installed with their default settings. These settings are intended to give reasonable performance across a wide range of tasks. It is often possible to analyze the real workload of a system and tune the operating parameters, providing a significant increase in speed with no impact on the system's running code. Similarly, adding indexes on frequently queried columns will often help. If a change to the underlying codebase is acceptable, we have seen significant increases in performance by moving to a stored procedure model for critical functions.
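The effect of an index can be seen by asking the database how it plans to run a query before and after the index exists. The sketch below uses SQLite only because it ships with Python; the orders table, its columns, and the index name are made up for illustration, and production databases have their own equivalents of EXPLAIN QUERY PLAN.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The fourth column of each row holds the human-readable detail text.
    return "; ".join(row[3] for row in rows)


def demo():
    """Show the plan for the same query before and after adding an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(n % 100, n * 1.5) for n in range(1000)],
    )
    sql = "SELECT id, total FROM orders WHERE customer_id = ?"
    before = query_plan(conn, sql, (42,))
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    after = query_plan(conn, sql, (42,))
    conn.close()
    return before, after


if __name__ == "__main__":
    plan_before, plan_after = demo()
    print("before:", plan_before)
    print("after: ", plan_after)
```

Without the index the plan reports a scan of the whole table; with it, the plan reports a search using idx_orders_customer. The exact wording of the plan text varies between SQLite versions.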
- Hardware Upgrades
If all else fails, we can help you determine whether it makes more sense to scale your application horizontally (more servers) or vertically (more RAM and more CPUs in your existing servers). Hardware expenditures should be based on locating and fixing the actual system hotspot, rather than blindly throwing money at the problem and hoping it works.
Institutionalizing Improvements
While many of the improvements above are architectural, we commonly identify changes to application design and implementation that can have a significant impact on the long-term performance of the system. Where possible, we identify steps that can be taken in development to certify applications against the expected loads, and we assist with developing rigorous QA procedures that can prevent these problems from recurring.
Our final deliverable from this type of engagement (in addition to a better performing, more stable website) is thorough and usable documentation describing the steps we took and the incremental performance gain we realized at each one. This will help guide your development team as they work to improve and enhance the system over time, and it provides your management with tangible information documenting what the issues were, what was done to resolve them, and how they will be prevented from recurring in the future.








