Measuring System and Software Reliability using an Automated Data Collection Process

The customer’s computing environment is undergoing a revolution away from a server- or timeshare-centric model toward a client/server or distributed model, and the factors that affect its behaviour can no longer be identified solely through traditional methods of data collection. Digital Equipment Corporation has developed an automated process that collects on-system data logging information from customer sites and has yielded consistent, quantitative, high-integrity information. This information has been used proactively to focus on direct product and process improvements. This paper describes the on-system data logging process and the analysis methodology used by Digital to measure system, product and operating system reliability, with examples of how the techniques provide insight into the causes of failures.