Software Analytics

A wealth of data exists across the software lifecycle, including source code, feature specifications, bug reports, test cases, execution traces and logs, and real-world user feedback. Data plays an essential role in modern software development because hidden in it is information about the quality of software and services as well as the dynamics of software development. Drawing on analytical and computing technologies such as pattern recognition, machine learning, data mining, information visualization, and large-scale data computing and processing, software analytics aims to enable software practitioners to explore and analyze data effectively and efficiently, so that they obtain insightful and actionable information for data-driven tasks in engineering software and services.

The mission of the Software Analytics Group at MSR Asia is to advance the state of the art in software analytics and to use our technologies to help improve the quality of software and services, as well as development productivity, for both Microsoft and the software industry.

Software Systems

Depending on scale and complexity, software systems span a spectrum from operating systems for devices to large networked systems that consist of thousands of servers. System quality, such as reliability, performance, and security, is the key to the success of modern software systems. As system scale and complexity increase, larger amounts of data, e.g., run-time traces and logs, are generated, and data has become a critical medium for monitoring, analyzing, understanding, and improving system quality.

Our group has ongoing efforts in this challenging yet promising area. For example, we are mining huge operating system traces to help domain experts quickly identify performance issues. We are also researching analysis techniques and developing easy-to-use tools to assist performance diagnosis for large-scale networked systems.

Software Users

Users are (almost) always right because ultimately they pay for the software and services in various ways. Therefore, it is important to continuously create the best user experience. Usage data collected from the real world reveals how users interact with software and services. This data is valuable for software practitioners who want to better understand their customers and gain insights into how to improve the user experience accordingly.

Our group works on various usage data exploration and analysis projects. The goal is to provide advanced techniques that enable the acquisition of actionable information and drive informed decisions for creating the best user experience.

Development Process

Software development has evolved from its traditional form to exhibit different characteristics. The process is more agile and engineers are more collaborative. Analytics on software development data provides a powerful mechanism that we can leverage to achieve higher development productivity.

Our group is looking into interesting topics using a data-driven approach, such as "how to find bad smells in source code", "how to support efficient code review and test selection", and "how to measure and predict the health of the development process".

Selected Publications

Meng-Hui Lim, Jian-Guang Lou, Hongyu Zhang, Qiang Fu, Andrew Beng Jin Teoh, Qingwei Lin, Rui Ding and Dongmei Zhang. Identifying Recurrent and Unknown Performance Issues. To appear in Proceedings of IEEE International Conference on Data Mining 2014 (ICDM 2014), Shenzhen, China, December 14-17, 2014.

Chengnian Sun, Haidong Zhang, Jian-Guang Lou, Hongyu Zhang, Qiang Wang, Dongmei Zhang, and Siau-Cheng Khoo. Querying Sequential Software Engineering Data. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2014), Hong Kong, China, November 2014.

Yu Cao, Hongyu Zhang, Sun Ding. SymCrash: Selective Recording for Reproducing Crashes. In Proceedings of the 29th IEEE/ACM International Conference on Automated Software Engineering (ASE 2014), Västerås, Sweden, September 2014.

Chen Luo, Jian-Guang Lou, Qingwei Lin, Qiang Fu, Rui Ding, Dongmei Zhang, Zhe Wang. Correlating Events with Time Series for Incident Diagnosis. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge Discovery and Data Mining (SIGKDD 2014), Industry & Government Track, New York City, August 2014.

Rongxin Wu, Hongyu Zhang, Shing-Chi Cheung, and Sunghun Kim. CrashLocator: Locating Crashing Faults Based on Crash Stacks. In Proceedings of the 2014 International Symposium on Software Testing and Analysis (ISSTA 2014), Bay Area, California, July 2014.

Rui Ding, Qiang Fu, Jian-Guang Lou, Qingwei Lin, Dongmei Zhang, Tao Xie. Mining Historical Issue Repositories to Heal Large-Scale Online Service Systems. In Proceedings of the 44th annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2014), Atlanta, Georgia, June 2014.

Qiang Fu, Jieming Zhu, Wenlu Hu, Jian-Guang Lou, Rui Ding, Qingwei Lin, Dongmei Zhang, and Tao Xie. Where Do Developers Log? An Empirical Study on Logging Practices in Industry. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014), Software Engineering In Practice, Hyderabad, India, May 2014.

Xiao Yu, Shi Han, Dongmei Zhang, and Tao Xie. Comprehending Performance from Real-World Execution Traces: A Device-Driver Case. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2014), Salt Lake City, Utah, March 2014.

Jian-Guang Lou, Qingwei Lin, Rui Ding, Qiang Fu, Dongmei Zhang, and Tao Xie. Software Analytics for Incident Management of Online Services: An Experience Report. In Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering (ASE 2013), Experience Papers, Palo Alto, California, November 2013.

Dongmei Zhang, Shi Han, Yingnong Dang, Jian-Guang Lou, Haidong Zhang, and Tao Xie. Software Analytics in Practice. IEEE Software, Special Issue on the Many Faces of Software Analytics, September/October 2013.

Xusheng Xiao, Shi Han, Tao Xie, and Dongmei Zhang, Context-Sensitive Delta Inference for Identifying Workload-Dependent Performance Bottlenecks, in Proceedings of the 2013 International Symposium on Software Testing and Analysis (ISSTA 2013), Lugano, Switzerland, July 2013.

Jue Wang, Yingnong Dang, Hongyu Zhang, Kai Chen, Tao Xie, and Dongmei Zhang, Mining Succinct and High-Coverage API Usage Patterns from Source Code, in Proceedings of the 10th Working Conference on Mining Software Repositories (MSR 2013), San Francisco, California, May 2013.

Qiang Fu, Jian-Guang Lou, Qingwei Lin, Rui Ding, Dongmei Zhang and Tao Xie, Contextual Analysis of Program Logs for Understanding System Behaviors, short paper, in Proceedings of the 10th Working Conference on Mining Software Repositories (MSR 2013), San Francisco, California, May 2013.

Yingnong Dang, Dongmei Zhang, Song Ge, Chengyun Chu, Yingjun Qiu, Tao Xie, XIAO: Tuning Code Clones at Hands of Engineers in Practice, in Proceedings of Annual Computer Security Applications Conference 2012, (ACSAC 2012), Orlando, Florida, USA, December, 2012.

Yida Tao, Yingnong Dang, Tao Xie, Dongmei Zhang, and Sunghun Kim, How Do Software Engineers Understand Code Changes? An Exploratory Study in Industry, in Proceedings of the 18th International Symposium on the Foundations of Software Engineering (FSE 2012), Cary, North Carolina, USA, November, 2012. 

Qiang Fu, Jian-Guang Lou, Qingwei Lin, Rui Ding, Dongmei Zhang, Zihao Ye, Tao Xie, Performance Issue Diagnosis for Online Service Systems, in Proceedings of 31st International Symposium on Reliable Distributed Systems (SRDS 2012), October, 2012.

Rui Ding, Qiang Fu, Jian-Guang Lou, Qingwei Lin, Dongmei Zhang, Jiajun Shen, Tao Xie, Healing Online Service Systems via Mining Historical Issue Repositories, short paper, in Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering (ASE 2012), Essen, Germany, September, 2012.

Xiaoyin Wang, Yingnong Dang, Lu Zhang, Dongmei Zhang, Erica Lan, Hong Mei, “Can I Clone This Piece of Code Here?”, in Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering (ASE 2012), Essen, Germany, September, 2012.

Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie, Performance Debugging in the Large via Mining Millions of Stack Traces, in Proceedings of the 34th International Conference on Software Engineering (ICSE 2012), Zurich, Switzerland, June 2012.

Yingnong Dang, Rongxin Wu, Hongyu Zhang, Dongmei Zhang and Peter Nobel, ReBucket – A Method for Clustering Duplicate Crash Reports based on Call Stack Similarity, in Proceedings of the 34th International Conference on Software Engineering (ICSE 2012), Software Engineering in Practice, Zurich, Switzerland, June 2012.

Dongmei Zhang, Yingnong Dang, Jian-Guang Lou, Shi Han, Haidong Zhang, and Tao Xie, Software Analytics as a Learning Case in Practice: Approaches and Experiences, in Proceedings of International Workshop on Machine Learning Technologies in Software Engineering (MALETS 2011), Lawrence, Kansas, November 2011.

Jian-Guang Lou, Qiang Fu, Shengqi Yang, Jiang Li, and Bin Wu, Mining Program Workflow from Interleaved Traces, in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2010), July 25, 2010.

Jian-Guang Lou, Qiang Fu, Shengqi Yang, Ye Xu, and Jiang Li, Mining Invariants from Console Logs for System Problem Detection, in Proceedings of the 2010 USENIX Annual Technical Conference, USENIX, June 2010.

Qiang Fu, Jian-Guang Lou, Yi Wang, and Jiang Li, Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis, in Proceedings of the 2009 IEEE International Conference on Data Mining, (ICDM 2009), December 9, 2009. 

Selected Talks and Presentations

Dongmei Zhang and Tao Xie, Software Analytics - Achievements and Challenges, Tutorial at the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2014), Hong Kong, China, November 2014.

Dongmei Zhang, Software Analytics in Practice - Approaches and Experiences, Keynote at the 12th National Software Application Conference (NASAC 2013), Tianjin, China, November 9, 2013.

Dongmei Zhang and Tao Xie, Pathways to Technology Transfer and Adoption: Achievements and Challenges, mini-tutorial at the 35th International Conference on Software Engineering (ICSE 2013), Software Engineering in Practice (SEIP), San Francisco, USA, May 23, 2013.

Dongmei Zhang, Data-Driven Performance Management in Practice for Online Services, invited talk at the second International Symposium on High Confidence Software (ISHCS 2012), Qingdao, Shandong, China, October 29, 2012.

Dongmei Zhang, Software Analytics in Practice – Approaches and Experiences, Keynote at the 9th Working Conference on Mining Software Repositories (MSR 2012), Zurich, Switzerland, June 2, 2012.

Dongmei Zhang and Tao Xie, Software Analytics in Practice, mini-tutorial at the 34th International Conference on Software Engineering (ICSE 2012), Zurich, Switzerland, June 6, 2012.

Hiring

Here at Microsoft, rich data and real problems give us great opportunities to do excellent research in software analytics. We welcome new members to our group as well as collaborators, both internal and external. If you are interested, please contact Dr. Dongmei Zhang at dongmeiz@microsoft.com.

Members
Dongmei Zhang
Haidong Zhang
Qiang Wang
Ray Huang
Song Ge