Testing c-tree Server Failure
Simulating c-tree Server system failure situations is valuable because the results of such tests provide insight into the expected behavior of the system in the event of an actual catastrophic failure. Monitoring the behavior of the system in such cases provides the following benefits:
- Practice using the monitoring tools
- Familiarity with the patterns that indicate a developing problem
- An understanding of the state of the system following a catastrophic failure, which informs recovery plans
Below are some suggested tests that exhaust system resources in an attempt to cause system failure. In each of these cases, use system monitoring tools as described in the next chapter to observe the effect of the test on the system.
- Simulate Saturated CPU: Write a CPU-intensive program and use it to exhaust available CPU resources while the server is running. A likely result of this test is reduced system throughput as threads wait for CPU time. This reduced throughput may trigger second-order effects (such as increased memory use due to increased system queue backlog).
- Simulate Insufficient Disk Space: Write a program that creates large files and use it to exhaust available disk space while the server is running. A likely result of this test is that the c-tree Server shuts down because it is unable to write to its transaction logs.
- Simulate Insufficient Memory: Write a program that allocates large amounts of memory and use it to exhaust available memory while the server is running. A likely result of this test is reduced system throughput due to swapping of virtual memory to disk.
- Simulate Saturated Network: Write a program that performs significant network I/O and use it to consume network bandwidth while the server is running. A likely result of this test is reduced system throughput due to slower communication between clients and the c-tree Server.
- Simulate Abnormal Server Termination: Some of the above tests may cause the server to terminate abnormally. It is also possible to force abnormal server termination directly using the operating system’s ability to forcibly terminate a process. For example, the command ‘kill -9 <process_id>’ can be used on many Unix systems; it sends SIGKILL, which the process cannot catch, so the server gets no opportunity to shut down cleanly. Use this approach to test automatic recovery and system recovery procedures following an abnormal server termination.
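
The CPU, disk, memory, and forced-termination tests above can be driven by small helper programs. The following Python sketch is one possible starting point and is not part of the c-tree Server distribution. The function names (burn_cpu, fill_disk, hold_memory, force_kill) and their parameters are hypothetical, chosen only for illustration. Every helper is bounded by a caller-supplied limit so that a test can be escalated gradually. The force_kill helper assumes a Unix system where SIGKILL is available.

```python
import os
import signal
import time


def burn_cpu(seconds: float) -> int:
    """Spin one core in a tight loop for `seconds`; return the iteration count.

    Run one copy per core (for example, in separate processes) to
    saturate a multi-core machine.
    """
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations


def fill_disk(path: str, total_bytes: int, chunk_bytes: int = 1 << 20) -> int:
    """Write up to `total_bytes` of filler data to `path`; return bytes written.

    Stops early if the volume fills up. Random data is used because some
    file systems compress or deduplicate runs of identical bytes.
    """
    chunk = os.urandom(chunk_bytes)
    written = 0
    # buffering=0 gives a raw file object, so each write() goes straight
    # to the operating system and reports how many bytes it accepted.
    with open(path, "wb", buffering=0) as f:
        try:
            while written < total_bytes:
                want = min(chunk_bytes, total_bytes - written)
                written += f.write(chunk[:want])
        except OSError:
            pass  # typically "no space left on device"
    return written


def hold_memory(total_bytes: int, block_bytes: int = 1 << 20) -> list:
    """Allocate `total_bytes` in blocks and return them.

    The memory stays allocated for as long as the caller keeps the
    returned list alive. Blocks are filled with a non-zero byte so the
    pages are actually touched.
    """
    blocks = []
    remaining = total_bytes
    while remaining > 0:
        size = min(block_bytes, remaining)
        blocks.append(b"\x01" * size)
        remaining -= size
    return blocks


def force_kill(pid: int) -> None:
    """Unix only: send SIGKILL to `pid`, the equivalent of 'kill -9 <pid>'."""
    os.kill(pid, signal.SIGKILL)
```

For example, calling fill_disk with a path on the volume that holds the transaction logs and a byte count larger than the free space on that volume would approximate the insufficient disk space test. Start with small limits on a non-production system, and remove the filler file afterward to release the space.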