Remediation Services
During the remediation phase, we will install the tools and systems required to keep your installation operational, address any security or operational issues discovered during the audit phase, develop standard operating procedures (SOPs) for incidents, maintenance, and upgrades, and provision supporting external systems (issue tracking, client document libraries, toll-free 800 numbers, and so on).
We pay special attention to the development and installation of the monitoring system. Monitoring systems typically provide decent value out of the box, but can be greatly improved through custom tests specific to your installation. An e-commerce provider will typically need to monitor backend connectivity to a fulfillment system, while a community-based site will often need to keep a close watch on the number of posts coming from overseas IP addresses (to fight comment spam). During this phase we implement the basic monitoring system and develop any customer-specific tests required.
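As an illustration of what a customer-specific test can look like, here is a minimal sketch of a backend connectivity check written in the style of a Nagios plugin. It uses the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the script name, the function names, and the thresholds are assumptions made for this example, not part of any particular installation.

```python
import socket
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_backend(host, port, warn_seconds=1.0, timeout_seconds=5.0):
    """Attempt a TCP connection and return (exit_code, status_line).

    warn_seconds and timeout_seconds are illustrative defaults; real
    thresholds would come from the operational requirements.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            elapsed = time.monotonic() - start
    except OSError as exc:  # refused, unreachable, DNS failure, or timeout
        return CRITICAL, f"CRITICAL - cannot reach {host}:{port} ({exc})"
    if elapsed > warn_seconds:
        return WARNING, f"WARNING - {host}:{port} connected in {elapsed:.2f}s"
    return OK, f"OK - {host}:{port} connected in {elapsed:.2f}s"


def main(argv):
    """Hypothetical entry point: check_backend.py HOST PORT."""
    if len(argv) != 3:
        print("UNKNOWN - usage: check_backend.py HOST PORT")
        return UNKNOWN
    code, message = check_backend(argv[1], int(argv[2]))
    print(message)
    return code

# When installed as a plugin, the script would end with:
#     sys.exit(main(sys.argv))
```

A check like this only proves that the fulfillment system accepts connections. A production test would more likely issue a real request against the backend and validate the response.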
Once this step is complete, we are ready to move into the operational phase of the project.
Tools:
- Nagios - Network Monitoring System
- Big Brother - Agent-based Monitoring System
- Nessus - Network Vulnerability Scanner
- Bugzilla - Issue Tracking System
- Client Portal - documentation repository, interface to other online services
- Integrit - host-based intrusion detection scanner
- Subversion - version control system to manage document versions
- VQWiki - System documentation and configuration reference
Activities:
- Configuration updates - change the system configuration as required to remediate security or performance issues isolated during the audit phase.
- Monitoring System Installation / Customization / Configuration - we will customize the selected monitoring system to precisely meet your installation's needs.
- Failover Testing - we will test fault tolerance by forcing physical faults and verifying continuing operation.
- Replication Testing - if the database system relies on replication as a failover strategy, we will test that replication is working by making small DDL and data changes, and testing that they appear in the remote system within the time specified by the operational requirements.
- SOP Development - we will develop the standard operating procedures as a sitewide runbook for the system.
- Backup System Testing - we will perform intensive tests of the backup system to make sure the projected storage and recovery requirements are met.
- Alarm System Testing - we will generate synthetic outages and escalations and test compliance with the standard operating procedures.
- Load Balancer Testing - we will force application faults and verify that the load balancer both detects the issue and correctly handles it.
- Intrusion Detection System Setup - we will set up a network-based intrusion detection system, as well as a host-based intrusion detection scanner on each critical system.
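The replication test described above can be sketched roughly as follows: write a unique marker to the primary, then poll the remote system until the marker appears or the allowed lag is exceeded. This is a database-agnostic sketch. The `write_primary` and `read_replica` callables are hypothetical hooks that would wrap whatever database client the installation actually uses, and the sketch covers only the data-change half of the test, not DDL changes.

```python
import time
import uuid


def check_replication(write_primary, read_replica, max_lag_seconds,
                      poll_seconds=0.5, clock=time.monotonic,
                      sleep=time.sleep):
    """Write a unique marker on the primary and wait for it on the replica.

    write_primary(marker) stores the marker on the primary.
    read_replica(marker) returns True once the replica can see it.
    Returns the observed lag in seconds, or None if the marker did not
    appear within max_lag_seconds.
    """
    marker = uuid.uuid4().hex  # unique per run, so old rows cannot match
    write_primary(marker)
    start = clock()
    while True:
        if read_replica(marker):
            return clock() - start
        if clock() - start >= max_lag_seconds:
            return None
        sleep(poll_seconds)
```

The `clock` and `sleep` parameters exist so the check can be exercised against a fake clock. In normal use they would be left at their defaults, and `max_lag_seconds` would be set from the operational requirements.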
Documents:
- Incident Response Escalation Tree
- Incident Response Procedure
  - Incident Triage
  - Routine Incident
  - Critical Incident
  - Security Incident
- Daily / Weekly / Monthly Maintenance Plan
- Archival Strategy (retention, rotation, restoration procedure)
- Build Procedure
- Configuration Change Control Procedure








