A wealth of data exists across the software lifecycle, including source code, feature specifications, bug reports, test cases, execution traces and logs, and real-world user feedback. Data plays an essential role in modern software development because it holds information about the quality of software and services as well as the dynamics of software development. Drawing on analytical and computing technologies such as pattern recognition, machine learning, data mining, information visualization, and large-scale data computing and processing, software analytics aims to enable software practitioners to explore and analyze data effectively and efficiently, in order to obtain insightful and actionable information for data-driven tasks in engineering software and services.
The mission of the Software Analytics Group at MSR Asia is to advance the state of the art in software analytics and to use our technologies to help improve the quality of software and services, as well as development productivity, for both Microsoft and the software industry.

Software development has evolved from its traditional form: the process is more agile, and engineers are more collaborative. Analytics on software development data provides a powerful mechanism for achieving higher development productivity.
Our group takes a data-driven approach to topics such as "how to find bad smells in source code", "how to support efficient code review and test selection", and "how to measure and predict the health of the development process".
Here at Microsoft, rich data and real problems give us great opportunities to do excellent research in software analytics. We welcome new members to our group as well as collaborators from both inside and outside Microsoft. If you are interested, please contact us at dongmeiz@microsoft.com.
Selected Publications
Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie, Performance Debugging in the Large via Mining Millions of Stack Traces, to appear in Proceedings of the 34th International Conference on Software Engineering (ICSE 2012), Zurich, Switzerland, June 2012.
Yingnong Dang, Rongxin Wu, Hongyu Zhang, Dongmei Zhang and Peter Nobel, ReBucket – A Method for Clustering Duplicate Crash Reports based on Call Stack Similarity, to appear in Proceedings of the 34th International Conference on Software Engineering (ICSE 2012), Software Engineering in Practice, Zurich, Switzerland, June 2012.
Jian-Guang Lou, Qiang Fu, Shengqi Yang, Jiang Li, and Bin Wu, Mining Program Workflow from Interleaved Traces, in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'10), July 25, 2010.
Jian-Guang Lou, Qiang Fu, Shengqi Yang, Ye Xu, and Jiang Li, Mining Invariants from Console Logs for System Problem Detection, in Proceedings of the 2010 USENIX Annual Technical Conference, USENIX, June 2010.
Qiang Fu, Jian-Guang Lou, Yi Wang, and Jiang Li, Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis, in Proceedings of the 2009 IEEE International Conference on Data Mining (ICDM’2009), December 9, 2009.



