I am a Senior Researcher at the Microsoft Research Centre in Cambridge, UK, focusing on measurement and System Dependability. I work on Empirical Software Engineering and Measurement (ESM) activities in Microsoft, focussing on software reliability, quality and process issues.

Prior to my current position at Microsoft I was at Compaq Corporation (previously Digital) in Ayr, Scotland, until August 1999, where I ran the DPP program, which collected and analyzed dependability data from customer sites (see here for more details). Prior to working in Scotland I worked for Digital in Galway, Ireland, UNISYS (Scotland and US) and ICL (West Gorton, Manchester). I graduated from Newcastle University.

I serve on the steering committee of ISSRE (IEEE International Symposium on Software Reliability Engineering). I am the general chair for ISSRE 2008 in Redmond/Seattle.

Research Interests

My research interests lie in the area of System Dependability, which encompasses Measurement, Reliability and Availability. My areas of focus, using data currently available, are:
Previous analysis of the data logged by applications identified bursts of activity which appear to be indicative of potentially catastrophic problems.
Developing metrics that characterize the way software systems are built.
Fault management architectures have not advanced in the last 10 years, but I believe opportunities exist in the areas of:
o NT management of hardware failures. Traditionally, operating systems manage hardware failures through a knowledge of the hardware architecture, resulting in versions of the OS being tied to specific hardware products. This method cannot be applied to NT, whose interface to the hardware is through an abstraction layer, resulting in the operating system being independent of the hardware. The challenge is therefore to develop management of hardware failures through this abstraction layer (i.e. the best of both worlds).
o Total system fault architecture. An architecture to manage hardware, operating system, application, cluster, etc. faults.

Published Papers (selected)

One of the main problems with this area of research is the lack of opportunities for publication of papers.

Selected Talks

I take a different approach from a lot of System Dependability research, which I feel has traditionally addressed the needs of the fault tolerant and telecommunications world, excluding the vast majority of users, whose businesses are increasingly becoming mission critical. My views on the current status of system reliability research were explored in a talk I gave at the French research laboratory LAAS, titled "System Reliability Research: Solutions to Problems that no longer exist?" (also in PowerPoint format). The notes embedded within the slides explore this underlying question. I also recently gave a talk at both Stanford and Berkeley on the role of Fault Tolerance in this high availability world (also available in PowerPoint format), which discusses the need for fault tolerance architecture but not for the products themselves.

Prior work measuring system dependability at Compaq (Digital)

My previous work in Digital highlighted that the dependability drivers that impact the end customer are extremely difficult to measure, or replicate, in a lab environment, and that identifying them requires access to data collected from systems running in production environments. At Compaq I developed and managed the DPP program (now renamed CARS). DPP was based around an automated data collection process, continuously capturing behavioural information from production systems on customer sites. The process was originally developed to understand why improvements in hardware reliability were not resulting in corresponding improvements in system behaviour on the customer sites. DPP subsequently became the corporate program measuring the reliability and availability of customer systems (hardware and operating system), providing information for product and process improvements.
Initially the data proved very difficult to analyze, as the focus was on identifying technical solutions to the technical problems affecting computer systems on the customer sites. In reality, dependability drivers are complex, and understanding them requires taking a holistic view of the total customer experience. The program's architecture was flexible, allowing it to be applied to monitor systems running OpenVMS (running on VAX and Alpha systems), Digital Unix (running on Alpha systems) and also Windows NT (running on Intel and Alpha systems). Analysis of the behaviour of systems running multiple operating systems on multiple hardware architectures highlighted that the main system dependability drivers were largely independent of the technology. Sadly, the main system dependability drivers (installation of hardware, upgrading of the operating system, configuration change) have not changed in decades. Other areas investigated whilst I was at Compaq were
Cambridge Group's home page.