Facilitating Semantic Research
May 20, 2009 9:30 AM PT

On May 20, at the Open Repositories Conference, held May 18-21 at Atlanta’s Georgia Institute of Technology, Tony Hey, corporate vice president of Microsoft Research’s External Research group, will announce the public availability of two tools: Zentity and version 2 of the Article Authoring Add-in for Word 2007. Both are part of the scholarly-communication tools life cycle announced during Faculty Summit 2008, and they are the latest in a steady series of releases from the Scholarly Communication team within Hey’s group. Hey will make the announcement at a special session of the conference, an annual, worldwide meeting that gathers those responsible for the conception, development, implementation, and management of digital repositories to address theoretical, practical, and strategic issues.

Dale Heenan, Web project manager at the United Kingdom’s Economic and Social Research Council (ESRC), has implemented an early version of Zentity, a research-output repository platform that enables researchers to store, archive, and preserve their work more efficiently, in a pilot investigating ways to improve and extend the organization’s Social Sciences Repository.

“We have been impressed with Zentity’s flexibility and ease of use,” Heenan says. “Our pilot project has demonstrated that this innovative software could form the basis of a new ESRC Social Sciences Repository which would better meet the needs of the organization and our users. This is a promising solution from Microsoft, and we look forward to implementing Zentity version 1 over the coming months.”

In the days leading up to the opening of the conference, for which Microsoft Research is a leading sponsor, Lee Dirks, director of the Education and Scholarly Communication team; Alex Wade, director of Scholarly Communication; and Savas Parastatidis, now an architect for Live Search after serving as the primary architect for the Zentity project, took a few minutes to discuss the team’s efforts:

Q: What does the Scholarly Communication team do, and what is your vision for this work?

Dirks: When we look at the broad field of academic research, there are multiple steps in the life cycle. There is the initial stage of collecting data, doing analysis, and putting forward a hypothesis. Following this is an authoring, or writing, stage. Then there’s the publication and dissemination phase, which ranges from blogging to publishing an article or a book to delivering a paper at a conference. And then, finally, one stores or archives the material, with the goal of preserving it for posterity.

Those four steps form the basic scholarly-communication life cycle. But we feel it is important to augment those phases with two additional concepts. One is discoverability, the pervasive need to search and find information during the process. The other concept is collaboration, the need to partner and work with other authors, editors, librarians, or research scientists over the course of the entire life cycle.

Scholarly Communication life cycle

At Microsoft Research, we looked at that entire life cycle, mapped the multitude of tools, resources, and technologies across Microsoft, and saw that we could, and should, be adding more value to the process. Most academic institutions around the globe have licensed Microsoft software, so it is already available to them, but they aren’t actively using it as part of this life cycle. Our team is proactively building add-ins and accelerators that go the “last mile” to help academics get their jobs done more efficiently. We’re taking software that institutions already own and make available across their campuses and using it to help researchers become more productive.

In this effort, I want to stress that all of the software, accelerators, and add-ins we are making available are free; they are intended to demonstrate the value of the Microsoft platform and of the various products institutions have already licensed.

Q: What will Tony be announcing today?

Wade: We are announcing version 1.0 of Zentity, our research-output repository platform, and version 2.0 of the Article Authoring Add-in for Word 2007.

Zentity, previously called Research-Output Repository Platform and code-named Famulus, is a platform that allows institutions to store all of their digital scholarship: papers, lectures, presentations, videos, anything that might be collected by the university as part of the digital output of its researchers and scholars. We have released two betas over the past nine months, one last fall and one in the winter. This release refreshes the user interfaces, adds new UI controls, and rounds out the set of services we provide in that package.

With regard to the Article Authoring Add-in for Word 2007, we’ve added a lot of great functionality since version 1.0, which was released last summer. You can now upload documents directly into a repository, ours or someone else’s, via the SWORD [Simple Web Operation for Repository Deposit] protocol. We have also added support for authoring Object Reuse and Exchange [ORE] resource maps within the Word environment, along with the ability to perform literature searches and import the bibliographic information into Word with one click, which makes it very simple to add citations to a paper.
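For readers unfamiliar with SWORD: it is a profile of the Atom Publishing Protocol in which a client deposits a package by sending an HTTP POST to a repository collection’s URI, usually after reading the repository’s service document to discover which collections exist. The following Python sketch only builds such a request with the standard library and does not send it. It is not the add-in’s implementation; the collection URI, credentials, file name, and packaging identifier are hypothetical placeholders, and the exact headers a given repository expects should be checked against its documentation.

```python
import base64
import urllib.request


def build_sword_deposit(collection_uri: str, package: bytes, filename: str,
                        packaging: str, username: str,
                        password: str) -> urllib.request.Request:
    """Build (but do not send) a SWORD-style deposit: an HTTP POST of a
    packaged file to a repository collection URI."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        collection_uri,
        data=package,
        method="POST",
        headers={
            "Content-Type": "application/zip",              # the package is a ZIP archive
            "Content-Disposition": f"filename={filename}",  # name of the deposited file
            "X-Packaging": packaging,                       # identifies the package format
            "Authorization": f"Basic {token}",              # HTTP Basic credentials
        },
    )


# Hypothetical values, for illustration only.
request = build_sword_deposit(
    "https://repository.example.edu/sword/deposit/articles",
    b"...zip bytes...",
    "article.zip",
    "http://purl.org/net/sword-types/METSDSpaceSIP",
    "researcher",
    "secret",
)
print(request.get_method(), request.full_url)
# A real client would now call urllib.request.urlopen(request).
```

Building the request separately from sending it keeps the sketch testable offline; a real client would send it and then read the repository’s response, which in SWORD is an Atom entry describing the deposit.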

Q: You’re planning to release these as open source?

Dirks: Yes. Initially, we’re releasing the binaries, but soon thereafter we’ll release both of these as open source. Once they are available, our big push over the next 12 to 18 months will be to build a worldwide community around these assets.

Q: Savas, what was the motivation for the architectural approach you took with Zentity?

Parastatidis: The main idea was to investigate the use of a semantic store: a store that is rich in entities and in the relationships between those entities, yet flexible enough to let developers build new, interesting, and engaging applications on top of it. We also wanted to demonstrate how easy it is to build interesting solutions on top of existing Microsoft products and technologies using Microsoft’s development tools.

With Zentity version 1.0, we are primarily focusing on the value that can be added to a particular community, the scholarly-communication community. However, the approach we have taken is general, so we are eager to see what the developers out there will do with our semantic store/platform in other domains. We believe in the value of semantics. Zentity is an example of how semantics-rich applications could be made possible with Microsoft technologies.

Dirks: This is a tool, architected by Savas and realized by Alex, that is vaulting us into this semantic space in a very positive way.

We’ve built a Web interface that sits on top of SQL Server® and the ADO.NET Entity Framework that lets anyone say: “I co-authored this book with this colleague, and we gave a presentation at this conference. Here’s the video of it. Here’s the audio file. Here’s the data set I used and some images.” Then you can navigate your way through all of those things; they’re all connected by rich relationships between the entities.
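The kind of navigation described here can be illustrated with a small in-memory graph of entities joined by typed relationships. This Python sketch is hypothetical: it is not Zentity’s API (Zentity itself sits on SQL Server and the ADO.NET Entity Framework), and the class, method, and relationship names are invented for illustration. Relationship types are stored as plain strings, so a new kind of link can be recorded without changing any schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    id: str
    kind: str   # e.g. "Person", "Book", "Presentation", "Video", "Dataset"
    title: str


class ResourceGraph:
    """Hypothetical store of entities linked by typed, directed relationships."""

    def __init__(self) -> None:
        self.entities = {}                # entity id -> Entity
        self.outgoing = defaultdict(set)  # subject id -> {(predicate, object id)}
        self.incoming = defaultdict(set)  # object id -> {(predicate, subject id)}

    def add(self, entity: Entity) -> Entity:
        self.entities[entity.id] = entity
        return entity

    def relate(self, subject_id: str, predicate: str, object_id: str) -> None:
        # Predicates are plain strings: a new relationship type needs no schema change.
        for entity_id in (subject_id, object_id):
            if entity_id not in self.entities:
                raise KeyError(f"unknown entity: {entity_id}")
        self.outgoing[subject_id].add((predicate, object_id))
        self.incoming[object_id].add((predicate, subject_id))

    def _resolve(self, pairs, predicate: Optional[str]):
        # Keep pairs matching the predicate (if given), as (predicate, Entity), in a stable order.
        hits = [(p, self.entities[e]) for p, e in pairs if predicate is None or p == predicate]
        return sorted(hits, key=lambda hit: (hit[0], hit[1].id))

    def neighbors(self, entity_id: str, predicate: Optional[str] = None):
        """Entities this entity points to."""
        return self._resolve(self.outgoing[entity_id], predicate)

    def inverse(self, entity_id: str, predicate: Optional[str] = None):
        """Entities that point to this entity."""
        return self._resolve(self.incoming[entity_id], predicate)


g = ResourceGraph()
for e in [Entity("me", "Person", "Researcher"),
          Entity("colleague", "Person", "Co-author"),
          Entity("book", "Book", "Co-authored book"),
          Entity("talk", "Presentation", "Conference presentation"),
          Entity("video", "Video", "Recording of the talk"),
          Entity("data", "Dataset", "Data set used")]:
    g.add(e)
g.relate("me", "authored", "book")
g.relate("colleague", "authored", "book")
g.relate("me", "presented", "talk")
g.relate("talk", "hasRecording", "video")
g.relate("book", "usesDataset", "data")

print([e.title for _, e in g.neighbors("me")])             # ['Co-authored book', 'Conference presentation']
print([e.title for _, e in g.inverse("book", "authored")])  # ['Co-author', 'Researcher']
```

Keeping both an outgoing and an incoming index lets a user navigate in either direction, for example from a person to their book and from the book back to all of its authors.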

Our approach follows the ideas behind the Semantic Web as envisioned by Tim Berners-Lee. This is our team’s first contribution to help realize that vision for the academic community, and we are very excited about it.

Q: You will be hosting a workshop as part of the Open Repositories Conference. What do you have planned for that?

Wade: We are going to host a post-conference workshop on May 21 where we’ll take half a day and walk participants through Zentity and demonstrate some of the new functionality that they haven’t seen in the betas. We’ll also be demonstrating interoperability and integration with several of the other tools, such as the Article Authoring Add-in.

Q: Have you had any significant collaborators on this work?

Parastatidis: Yes, we worked closely with people from the SQL Server organization. We wanted to make sure that we were making the best use of their technologies, such as SQL Server 2008 and the Entity Framework, in a way that was suited to the task at hand, efficient, and easy for people to understand, given our intention to make the source code available at some point. Furthermore, we wanted to build on top of their technologies as if we were not part of Microsoft. Remember that one of our primary goals was to demonstrate how we can add value on top of existing Microsoft technologies, and we wanted to do so without using anything that is not already available to outside developers. Everything we’ve done uses publicly available APIs and well-documented functionality.

We also worked closely with the community in order to better understand its requirements, its use cases, and the emerging standards in the domain. I’ll let Lee talk more about that.

Dirks: Absolutely. Reaching out and engaging the community has been crucial. In the academic community, there are thousands of people already building and managing institutional repositories and research-output systems. Very early on, we made a concerted effort to inform them about what we were doing. Specifically, for those people who already have access to Microsoft software, we’re simply trying to provide added value on top of what they’ve already licensed. To be very clear, we’re not open-sourcing the entire stack; we’re open-sourcing this one component—the final abstraction layer. This is part of our team’s Open Edge strategy.

As part of engaging with the repository community, we have been able to create a very constructive dialogue with the key players in the space to ensure that we’re participating productively. We want our work with the community to benefit everyone involved, and, in that spirit, because Microsoft tools and products are widely used in this space, we are initiating a number of side projects to demonstrate how we can improve the situation. We start our dialogues by asking: “Are you having any issues trying to do your work? What can we do to facilitate a solution? How can we all interoperate? What learnings can we share?”

Q: What sort of response are you getting from the academic community?

Dirks: It is still early days for our team’s work. There are certain things we’re still working through, but I foresee a time in the near future when we will have tremendous impact. We sit down regularly with key influencers and innovators in fields ranging from physics to chemistry to biology.

We stress to the researchers: “Instead of building a whole new stack from scratch, why don’t you focus on what you need to do, the research part, and not worry about the rest of the stack? Microsoft already has that built out, and it is a solid, supported platform.”

Wade: We have received lots of good feedback, and it has shaped the way we’ve evolved version 1.0 of Zentity. What we set out to do is make Zentity very easy to install in an environment an institution already has. We have been working with universities that already have, for example, a SQL Server instance configured within their department, their library, or their university. They can install Zentity right on top of that; it creates a new database for them, and they don’t have to touch SQL Server directly. Our installer takes care of the configuration for them.

With the Article Authoring Add-in, we’ve had great success working directly with a number of publishers, most significantly the National Library of Medicine. One of the primary features, present since version 1.0, lets authors write and edit articles using the National Library of Medicine’s DTD [document-type definition], an XML format.

Dirks: We have engaged with the right people in the space. They’re incredibly smart, practical people who simply want to get their jobs done. They have a vision for their field, and they are working with and through us to help realize their objectives. Their strategy for evolving their field likely has many other elements, and Microsoft is only one of them, but in that process, we can play a constructive role to advance scholarly communication and speed innovation. We feel that this is the primary role of the research function at Microsoft.

That said, we’re seeing different levels of uptake across the phases of the life cycle we work with and the assets we’ve made available. It is necessary to work in different ways with different communities. And in this process, we’re slowly seeing the perception of Microsoft shift. We’re now invited to participate in meetings that we might not have been invited to before. We’re being asked to be on advisory boards and task forces. We’re being invited to a dialogue that Microsoft would not have been invited to two or three years ago. This is very positive progress, but we still have a long way to go. The academic time frame is a lot different from the corporate time frame, and Microsoft Research’s External Research group bridges the gap in many ways. We have to be prepared for our efforts to take five or even 10 years. But we’ve prepared for the long term. This is a significant effort for us.

Q: Savas, what do you find interesting about Zentity from a research perspective?

Parastatidis: We believe that it is a great platform for research organizations to capture their research output and the relationships among the items in it. But going forward, we believe that it could also become a great platform on which domain-specific developers build their own applications.

Imagine, for example, a museum, a patent office, or any other application that requires storing not only information but also the relationships between the various entities, with those relationships dynamically stored, navigated, visualized, and so on. The application architects and developers don’t have to think through and model all the relationships in advance. They can build an application that is flexible enough to evolve as their data evolves and as new concepts and new relationships emerge. Zentity is designed to be a great tool for such scenarios by exposing a very simple-to-use API.
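The point that relationships need not be modeled in advance can be illustrated with a generic traversal that follows links of any type. In this hypothetical Python sketch (not Zentity’s API), a museum’s records are plain (subject, predicate, object) triples, and a breadth-first search collects every entity within a given number of hops, following links in their stored direction. When a new relationship type such as `restoredBy` appears, the traversal picks it up with no code change. All entity and predicate names are invented for illustration.

```python
from collections import deque


def reachable(triples, start, max_hops):
    """Breadth-first search over (subject, predicate, object) triples.

    Follows edges of every predicate and returns {entity: hop distance}
    for all entities within max_hops of start."""
    adjacency = {}
    for subject, _predicate, obj in triples:
        adjacency.setdefault(subject, []).append(obj)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


museum = [
    ("vase", "createdBy", "potter"),
    ("vase", "foundAt", "site"),
    ("site", "excavatedBy", "team"),
]
print(reachable(museum, "vase", 2))  # {'vase': 0, 'potter': 1, 'site': 1, 'team': 2}

# A new kind of relationship arrives later; the traversal code is unchanged.
museum.append(("vase", "restoredBy", "conservator"))
print(reachable(museum, "vase", 1))  # {'vase': 0, 'potter': 1, 'site': 1, 'conservator': 1}
```

Because the predicate is treated as data rather than as part of a fixed schema, the same query keeps working as the collection’s vocabulary of relationships grows.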

Q: What are the bigger goals of your group?

Dirks: At the end of the day, we have two larger goals and a trailing one. First and foremost—and this has been the charge that Craig Mundie gave us when Tony was hired—it’s to advance science, to advance the state of the art in academic research. Second is, in that process, to fold our learnings and experiences back into the product groups at Microsoft. Third is to engage with the community to improve the perception of Microsoft in the academic realm. But at the core, our focus is to facilitate discovery and to accelerate innovation.

Q: What have you found most exciting about working on these projects?

Zentity and the Article Authoring Add-in for Word 2007 are brought to you by (from left) Alex Wade, Lee Dirks, and Savas Parastatidis (Photo by Kathleen Kennedy Knies)

Parastatidis: It’s been fantastic working with Lee and Alex and the rest of the development team. Doing an investigation into the semantic space made it even more exciting. Interacting with the community, analyzing its requirements, and delivering something to meet them was very educational and very, very rewarding. I learned a lot from this process, and the community consists of some great people.

Wade: I’ll echo some of the same sentiments. It’s very enriching to have ongoing conversations with the rich community of participants in digital scholarship and to work with folks who are developing the standards and protocols that define how systems like Zentity can talk to each other.

Rather than us cloistering ourselves and coming up with communication protocols between, for example, the Article Authoring Add-in and our repository, we can adopt the standards being developed and participate in the system interoperability that is going to be important for these things to succeed.

Dirks: When I started with Microsoft Research two years ago, walking onto a college campus as a representative of my company was daunting because our dialogue with academics was only just resuming. But we’ve made tremendous progress, and I feel very confident that Microsoft will dramatically evolve the academic research process as we currently know it in a very positive way. Microsoft has the ability to operate at a large scale and help the community have a substantial impact.

Overall, we are moving in the right direction regarding how Microsoft engages with the open-source community. Tony Hey’s entire group is on the leading edge in making our case that we can engage in a positive dialogue, that all sides can benefit, and that we can all learn something in that process. It is critical when Microsoft initiates any engagement with academics that we listen and understand. We need to observe and adapt. I can tell you that, for many academics we work with, we represent a new Microsoft.