Tier III Support
Many of our customers put datacenter support responsibilities on the shoulders of the development team. Unfortunately, this usually breaks down. Critical problems we've seen are:
- Developers hate wearing pagers: often, the support team hands the pager to the most junior, least qualified developer, who is not very invested in the system's success and routinely sleeps through alerts.
- Developers are undisciplined system administrators: too often, a developer makes an expedient configuration change as a quick fix to an issue. This change is never propagated to the configuration management system and is obliterated during the next build, causing unnecessary downtime.
- Developers have difficulty implementing a rigorous build and test regimen: generally speaking, developers shouldn't be doing their own final QA. Similarly, they cannot be relied on to manage a build process from integration to staging to production without modifying the codebase as the files are transitioned. In a rigorous build process, only external configuration files should be modified as the codebase is moved. When developers run the build process, they will often compile special application versions for each environment. This means that the version going to production has never actually been tested.
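The last point, that the codebase should move between environments unchanged while only external configuration differs, can be illustrated with a minimal Python sketch. It assumes the build is available as bytes and that a SHA-256 digest was recorded when the build passed testing; the function names and the configuration keys are hypothetical, chosen only for illustration.

```python
import hashlib


def artifact_digest(artifact: bytes) -> str:
    """Return the SHA-256 hex digest of a build artifact's bytes."""
    return hashlib.sha256(artifact).hexdigest()


def is_tested_artifact(candidate: bytes, tested_digest: str) -> bool:
    """True only if the candidate is byte-for-byte the build that was tested."""
    return artifact_digest(candidate) == tested_digest


def environment_config(base: dict, overrides: dict) -> dict:
    """Combine shared settings with per-environment overrides.

    The overrides live outside the codebase, so the artifact never changes.
    """
    merged = dict(base)
    merged.update(overrides)
    return merged


if __name__ == "__main__":
    # Hypothetical build contents and settings, for illustration only.
    build = b"example application build"
    tested_digest = artifact_digest(build)

    # A rebuilt "special version" fails the check even if the change is small.
    rebuilt = b"example application build (production tweak)"
    print(is_tested_artifact(build, tested_digest))
    print(is_tested_artifact(rebuilt, tested_digest))

    base = {"log_level": "info", "db_host": "db.integration.example"}
    print(environment_config(base, {"db_host": "db.production.example"}))
```

A check like this makes the failure mode visible: a version recompiled for production no longer matches the digest of the version that was tested.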
The bottom line is that developers usually do a bad job as administrators, and they quickly become so miserable that they turn into bad developers as well. Using an external system administration team has several benefits:
- Responsive team when the inevitable incidents happen.
- Proactive maintenance and automated scripts can prevent most outages.
- Experienced team knows how to construct a solid build pipeline.
- Solid process documents system configurations and manages their change over time.
- Comprehensive suite of support tools puts system status and historical reporting at your fingertips.
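To show what documenting configurations and managing their change can look like in practice, here is a rough Python sketch of a drift check. It assumes both the recorded configuration (what the configuration management system holds) and the live configuration (what is actually on the server) can be read as flat key/value dictionaries; `find_drift` and the sample keys are hypothetical.

```python
MISSING = "<missing>"  # placeholder for a key present on only one side


def find_drift(recorded: dict, live: dict) -> dict:
    """Return {key: (recorded_value, live_value)} for every setting that differs."""
    drift = {}
    for key in sorted(set(recorded) | set(live)):
        want = recorded.get(key, MISSING)
        have = live.get(key, MISSING)
        if want != have:
            drift[key] = (want, have)
    return drift


if __name__ == "__main__":
    # Hypothetical settings: someone raised a limit and enabled debugging by hand.
    recorded = {"max_connections": "200", "timeout": "30"}
    live = {"max_connections": "500", "timeout": "30", "debug": "on"}
    for key, (want, have) in find_drift(recorded, live).items():
        print(f"{key}: recorded={want} live={have}")
```

Run on a schedule, a check along these lines would surface a quick fix that was never propagated before the next build obliterates it.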
A solid development and production process can make all the difference. Developers should have full access to the integration servers, be able to configure them as needed, and document those changes. The operations team should apply the new codebase and configuration changes to the staging system, verify correct operation, and perform regression testing. At this point, it is reasonably safe to migrate the code and configuration changes to production during a maintenance window.
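The promotion path described above can be sketched as a small gate in Python. This is a simplified illustration under stated assumptions, not a description of any particular toolchain: the `Release` fields, the `next_stage` function, and the maintenance window times are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class Release:
    version: str
    stage: str = "integration"       # integration -> staging -> production
    staging_verified: bool = False   # operations confirmed correct operation
    regression_passed: bool = False  # regression tests passed on staging


def in_window(now: datetime, start: time, end: time) -> bool:
    """True if `now` falls inside the maintenance window [start, end)."""
    current = now.time()
    if start <= end:
        return start <= current < end
    return current >= start or current < end  # window crosses midnight


def next_stage(release: Release, now: datetime, start: time, end: time) -> str:
    """Return the stage the release may move to, or raise if it is blocked."""
    if release.stage == "integration":
        return "staging"
    if release.stage == "staging":
        if not (release.staging_verified and release.regression_passed):
            raise ValueError("staging verification or regression testing incomplete")
        if not in_window(now, start, end):
            raise ValueError("outside the maintenance window")
        return "production"
    raise ValueError(f"no stage after {release.stage!r}")


if __name__ == "__main__":
    # Hypothetical window from 02:00 to 04:00.
    ready = Release("1.4.2", "staging", staging_verified=True, regression_passed=True)
    print(next_stage(ready, datetime(2024, 1, 1, 3, 0), time(2, 0), time(4, 0)))
```

The point of the gate is that production is reachable only through staging, and only after both checks have passed and the clock is inside the agreed window.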
When the inevitable call comes in, a robust process kicks into gear. Alerts are usually generated by automated systems, but we also provide a 24x7 staffed 800 number to reach an on-call engineer.








