A wealth of data exists across the software lifecycle, including source code, feature specifications, bug reports, test cases, execution traces and logs, and real-world user feedback. Data plays an essential role in modern software development, because it holds information about the quality of software and services as well as the dynamics of software development. Using analytical and computing technologies such as pattern recognition, machine learning, data mining, information visualization, and large-scale data computing and processing, software analytics aims to enable software practitioners to explore and analyze data effectively and efficiently, so that they obtain insightful and actionable information for data-driven tasks in engineering software and services.
The mission of the Software Analytics Group at MSR Asia is to advance the state of the art in software analytics and to apply our technologies to improve the quality of software and services, as well as development productivity, for both Microsoft and the software industry.
Software development has evolved beyond its traditional form: the process is more agile, and engineers collaborate more closely. Analytics on software development data provides a powerful mechanism for achieving higher development productivity in this setting.
Using a data-driven approach, our group is investigating topics such as "how to find bad smells in source code", "how to support efficient code review and test selection", and "how to measure and predict the health of the development process".
Qiang Fu, Jieming Zhu, Wenlu Hu, Jian-Guang Lou, Rui Ding, Qingwei Lin, Dongmei Zhang, and Tao Xie. Where Do Developers Log? An Empirical Study on Logging Practices in Industry. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014), Software Engineering in Practice, Hyderabad, India, May 2014.
Xiao Yu, Shi Han, Dongmei Zhang, and Tao Xie. Comprehending Performance from Real-World Execution Traces: A Device-Driver Case. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2014), Salt Lake City, Utah, March 2014.
Jian-Guang Lou, Qingwei Lin, Rui Ding, Qiang Fu, Dongmei Zhang, and Tao Xie. Software Analytics for Incident Management of Online Services: An Experience Report. In Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering (ASE 2013), Experience Papers, Palo Alto, California, November 2013.
Dongmei Zhang, Shi Han, Yingnong Dang, Jian-Guang Lou, Haidong Zhang, and Tao Xie. Software Analytics in Practice. IEEE Software, Special Issue on the Many Faces of Software Analytics, September/October 2013.
Xusheng Xiao, Shi Han, Tao Xie, and Dongmei Zhang. Context-Sensitive Delta Inference for Identifying Workload-Dependent Performance Bottlenecks. In Proceedings of the 2013 International Symposium on Software Testing and Analysis (ISSTA 2013), Lugano, Switzerland, July 2013.
Jue Wang, Yingnong Dang, Hongyu Zhang, Kai Chen, Tao Xie, and Dongmei Zhang. Mining Succinct and High-Coverage API Usage Patterns from Source Code. In Proceedings of the 10th Working Conference on Mining Software Repositories (MSR 2013), San Francisco, California, May 2013.
Qiang Fu, Jian-Guang Lou, Qingwei Lin, Rui Ding, Dongmei Zhang, and Tao Xie. Contextual Analysis of Program Logs for Understanding System Behaviors. In Proceedings of the 10th Working Conference on Mining Software Repositories (MSR 2013), Short Paper, San Francisco, California, May 2013.
Yingnong Dang, Dongmei Zhang, Song Ge, Chengyun Chu, Yingjun Qiu, and Tao Xie. XIAO: Tuning Code Clones at Hands of Engineers in Practice. In Proceedings of the Annual Computer Security Applications Conference (ACSAC 2012), Orlando, Florida, December 2012.
Yida Tao, Yingnong Dang, Tao Xie, Dongmei Zhang, and Sunghun Kim. How Do Software Engineers Understand Code Changes? An Exploratory Study in Industry. In Proceedings of the 20th International Symposium on the Foundations of Software Engineering (FSE 2012), Cary, North Carolina, November 2012.
Qiang Fu, Jian-Guang Lou, Qingwei Lin, Rui Ding, Dongmei Zhang, Zihao Ye, and Tao Xie. Performance Issue Diagnosis for Online Service Systems. In Proceedings of the 31st International Symposium on Reliable Distributed Systems (SRDS 2012), October 2012.
Rui Ding, Qiang Fu, Jian-Guang Lou, Qingwei Lin, Dongmei Zhang, Jiajun Shen, and Tao Xie. Healing Online Service Systems via Mining Historical Issue Repositories. In Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering (ASE 2012), Short Paper, Essen, Germany, September 2012.
Xiaoyin Wang, Yingnong Dang, Lu Zhang, Dongmei Zhang, Erica Lan, and Hong Mei. “Can I Clone This Piece of Code Here?” In Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering (ASE 2012), Essen, Germany, September 2012.
Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie. Performance Debugging in the Large via Mining Millions of Stack Traces. In Proceedings of the 34th International Conference on Software Engineering (ICSE 2012), Zurich, Switzerland, June 2012.
Yingnong Dang, Rongxin Wu, Hongyu Zhang, Dongmei Zhang, and Peter Nobel. ReBucket: A Method for Clustering Duplicate Crash Reports Based on Call Stack Similarity. In Proceedings of the 34th International Conference on Software Engineering (ICSE 2012), Software Engineering in Practice, Zurich, Switzerland, June 2012.
Dongmei Zhang, Yingnong Dang, Jian-Guang Lou, Shi Han, Haidong Zhang, and Tao Xie. Software Analytics as a Learning Case in Practice: Approaches and Experiences. In Proceedings of the International Workshop on Machine Learning Technologies in Software Engineering (MALETS 2011), Lawrence, Kansas, November 2011.
Jian-Guang Lou, Qiang Fu, Shengqi Yang, Jiang Li, and Bin Wu. Mining Program Workflow from Interleaved Traces. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2010), July 25, 2010.
Jian-Guang Lou, Qiang Fu, Shengqi Yang, Ye Xu, and Jiang Li. Mining Invariants from Console Logs for System Problem Detection. In Proceedings of the 2010 USENIX Annual Technical Conference, June 2010.
Qiang Fu, Jian-Guang Lou, Yi Wang, and Jiang Li. Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis. In Proceedings of the 2009 IEEE International Conference on Data Mining (ICDM 2009), December 9, 2009.
Selected Talks and Presentations
Dongmei Zhang, Software Analytics in Practice: Approaches and Experiences, Keynote at the 12th National Software Application Conference (NASAC 2013), Tianjin, China, November 9, 2013.
Dongmei Zhang and Tao Xie, Pathways to Technology Transfer and Adoption: Achievements and Challenges, mini-tutorial at the 35th International Conference on Software Engineering (ICSE 2013), Software Engineering in Practice (SEIP), San Francisco, USA, May 23, 2013.
Dongmei Zhang, Data-Driven Performance Management in Practice for Online Services, invited talk at the second International Symposium on High Confidence Software (ISHCS 2012), Qingdao, Shandong, China, October 29, 2012.
Dongmei Zhang, Software Analytics in Practice: Approaches and Experiences, Keynote at the 9th Working Conference on Mining Software Repositories (MSR 2012), Zurich, Switzerland, June 2, 2012.
Here at Microsoft, rich data and real problems give us great opportunities to do excellent research in software analytics. We welcome new members to our group, as well as collaborators from both inside and outside Microsoft. If you are interested, please contact Dr. Dongmei Zhang at email@example.com.