Technologies for the Scholarly Communications Lifecycle

Education and Scholarly Communication

A key element of the Microsoft Research Connections vision is to support the scholarly communications lifecycle with software and services so that data and information flow in a coordinated and seamless fashion.

By working with members of the research community, Microsoft Research Connections develops technologies that are designed for researchers and academics.  

Frequently Asked Questions

Q. How can I convert 2007 Office MathML (OMML) to MathML?

A. Beta versions of the Office Word 2007 MathML Transforms (XSLT) are now available for download from the Microsoft Connect site.

Join the beta program (sign in with your Windows Live ID)

Q. How can I extract OMML from the equations bitmap?

A. See Murray Sargent’s post on how one can extract the 2007 Office system MathML (OMML) from math-zone images stored in .doc files that have been converted for use in Office Word 2003 and earlier versions of Office Word.

Learn how to extract OMML from the equations bitmap

Q. What are the names of the clipboard slots for MathML?

A. We write to and read from two clipboard slots entirely devoted to MathML. These are "MathML" and "Presentation MathML" (without the quotation marks). Note that we always sniff the text slot for Presentation MathML, and if we detect it, we will convert it to an equation on paste into Microsoft Office Word. We can also write Presentation MathML to the text slot, depending on the setting of the clipboard option under Equations | Tools | Equations Options.
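As a rough illustration of what "sniffing" plain text for Presentation MathML could look like, here is a minimal Python sketch. It is not Word's detection logic, which is not described here. The function name `looks_like_presentation_mathml` is hypothetical, and accepting an un-namespaced `math` root is a lenient assumption made for the example.

```python
import xml.etree.ElementTree as ET

# The W3C MathML namespace URI.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def looks_like_presentation_mathml(text: str) -> bool:
    """Heuristic check (hypothetical helper): does the text parse as XML
    whose root element is a MathML <math> element?"""
    try:
        root = ET.fromstring(text.strip())
    except ET.ParseError:
        # Not well-formed XML (or empty), so it cannot be MathML.
        return False
    # Accept the namespaced form and, as a lenient assumption,
    # a bare <math> root with no namespace declaration.
    return root.tag in (f"{{{MATHML_NS}}}math", "math")


sample = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<msup><mi>x</mi><mn>2</mn></msup>"
    "</math>"
)
print(looks_like_presentation_mathml(sample))
```

A tool that places text like `sample` on the clipboard's text slot could use a check of this kind before handing the text to a paste target, though the slot names and paste behavior themselves are defined by Word, not by this sketch.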

Q. What is not allowed in an equation?

A. The following list does not cover every limitation of equations in Microsoft Office Word 2007, but it captures the most important ones from both a schema and a typography perspective.

  • You can have only one font per equation. (The single-font limitation pertains to math fonts only. You can use other fonts, for example, for characters in other languages.)
  • You can have only one font size per equation. Note that we automatically scale the script and script-script level, so that items such as superscripts and numerators of "small fractions" appear smaller than the regular text size. However, these characters are considered the same font size as the rest of the text in the equation.
  • We do not support TeX-style tweaks to positioning.
  • We do not support insertion of Office Word tables inside of equations. However, you are permitted to have equations inside of tables.
  • You cannot insert clip art, shapes, charts, WordArt, drop caps, or any breaks other than line breaks (page breaks, section breaks, and column breaks are disallowed).
  • You cannot specify the default vertical spacing between wrapped lines of the same equation. For a series of adjacent equations in the same paragraph, you cannot override the default spacing between the equations.
  • Only a limited set of TeX-style tweaks is allowed: we support thin and other positive spaces, along with phantoms and smashes.

Q. How can I identify a .docx file by file signature rather than by file extension?

A. Take the following steps:

  1. Check that the file is a .zip file
  2. Check for a file called [Content_Types].xml at the root of the .zip
  3. Check for a file in /_rels/ called .rels
  4. Follow additional steps from the following article:
    Building Word 2007 Documents Using Office Open XML Formats
  5. Get the corresponding target attribute's value
  6. Open [Content_Types].xml
  7. Look for the Override element with the PartName attribute equal to the target attribute you obtained in step 5
  8. Check the ContentType attribute—it must be one of the following:
    • application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml
    • application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml
    • application/vnd.ms-word.document.macroEnabled.main+xml
    • application/vnd.ms-word.template.macroEnabled.main+xml
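The steps above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not a definitive implementation: the function name `is_word_package` is hypothetical, and the sketch assumes that the relationship whose target is needed in step 5 is the one with the transitional `officeDocument` relationship type shown in `OFFICE_DOC_REL` (the steps above defer that detail to the linked article). Part names are compared case-insensitively.

```python
import zipfile
import xml.etree.ElementTree as ET

# Open Packaging Conventions namespaces.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
# Assumption: relationship type that points at the main document part.
OFFICE_DOC_REL = (
    "http://schemas.openxmlformats.org/officeDocument/2006/"
    "relationships/officeDocument"
)

# The four content types listed in step 8.
WORD_MAIN_TYPES = {
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml",
    "application/vnd.ms-word.document.macroEnabled.main+xml",
    "application/vnd.ms-word.template.macroEnabled.main+xml",
}


def is_word_package(path_or_file) -> bool:
    """Return True if the file looks like a Word Open XML package."""
    # Step 1: the file must be a .zip archive.
    if not zipfile.is_zipfile(path_or_file):
        return False
    with zipfile.ZipFile(path_or_file) as pkg:
        names = set(pkg.namelist())
        # Steps 2 and 3: both package-level parts must exist.
        if "[Content_Types].xml" not in names or "_rels/.rels" not in names:
            return False
        try:
            rels = ET.fromstring(pkg.read("_rels/.rels"))
            types = ET.fromstring(pkg.read("[Content_Types].xml"))
        except ET.ParseError:
            return False

    # Steps 4 and 5: find the main-document relationship and read its Target.
    target = None
    for rel in rels.findall(f"{{{RELS_NS}}}Relationship"):
        if rel.get("Type") == OFFICE_DOC_REL:
            target = rel.get("Target")
            break
    if target is None:
        return False
    part_name = ("/" + target.lstrip("/")).lower()

    # Steps 6 to 8: find the matching Override and check its ContentType.
    for override in types.findall(f"{{{CT_NS}}}Override"):
        if (override.get("PartName") or "").lower() == part_name:
            return override.get("ContentType") in WORD_MAIN_TYPES
    return False
```

Calling `is_word_package("paper.docx")` (the file name is only an example) returns `True` for a regular Word 2007 document and `False` for other .zip-based files such as spreadsheets, because their main part has a different content type.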

Q. Do moves in Microsoft Office Word 2007 show up as insertions and deletions when the file is opened in an earlier version of Office Word?

A. Yes. When Track Changes is on in an Office Word 2007 document and the document is opened in an earlier version of Word, moves are displayed as insertions and deletions.

Q. Does the Document Inspector scrub document variables along with document properties?

A. Yes. Document variables are removed when the Document Properties and Personal Information setting is checked and the Document Inspector is run.

Q. Can the Document Inspector be automated?

A. Yes. The Document Inspector can be automated whenever Winword.exe is running.

Microsoft External Presentations

Word 2007 and Scholarly Publishing (inera.com)
Pre-meeting seminar for: Society for Scholarly Publishing (SSP) 30th Annual Meeting (sspnet.org)
May 2008, Boston, MA, United States

Association of Learned and Professional Society Publishers (alpsp.org.uk)
International Scholarly Communications Conference
April 2007, London, England

The Future of Research Communication (alpsp.org) (PDF file, 3.52 MB)

External Links

arXiv (arxiv.org)
Hosted by Cornell University Library, the arXiv is an e-print service for physics, mathematics, non-linear science, computer science, quantitative biology, and statistics. As of February 16, 2008, arXiv accepts submissions of Microsoft Office Word .docx files and other Office Open XML (OOXML) documents. Microsoft External Research has provided support to arXiv to develop the facilities for handling OOXML documents.

View the arXiv.org DOCX submission page

myExperiment (myexperiment.org)
myExperiment makes it easy to share scientific workflows that define, to varying levels of detail, procedures for specific types of experiments. These workflow specifications take the form of files that can be executed by workflow tools, such as the Taverna workbench. This is a University of Manchester and University of Southampton project.

SWORD (Simple Web Service Offering Repository Deposit) (jisc.ac.uk)
SWORD is a lightweight Web service protocol for a "smart deposit" tool to make it easier to populate repositories. SWORD's goal is to improve the efficiency and quality of repository deposit and to diversify and expedite the options for timely population of repositories with content. SWORD also promotes a common deposit interface and supports the principles of interoperability.

Open Archives Initiative–Object Reuse and Exchange (OAI-ORE) (openarchives.org/ore/)
Microsoft External Research has been a key contributor to the initial "alpha" version of Open Archives Initiative Object Reuse and Exchange (OAI-ORE). OAI-ORE defines standards for the identification, description, and exchange of aggregations of Web resources. The structure and semantics of each Aggregation are described by a Resource Map (ReM), which is a network-accessible resource that encapsulates a set of Resource Description Framework (RDF) statements. These statements describe an Aggregation as a resource with a URI, and enumerate the constituents of the Aggregation and the relationships among those constituents.

PubMed Central (pubmedcentral.nih.gov/)
PubMed Central is a free digital archive of biomedical and life sciences journal literature at the United States National Institutes of Health (NIH) and is developed and managed by NIH's National Center for Biotechnology Information (NCBI) in the National Library of Medicine (NLM).

Murray Sargent's blog (blogs.msdn.com)
Murray Sargent is a software development engineer on the 2007 Microsoft Office system team. He has been working on the RichEdit editor since 1994. In his MSDN blog, he focuses on mathematics in the 2007 Office system, along with some posts on RichEdit and other related topics.

The Open University Mathematics Online Project Guide: Using the mathematical features of Word 2007 (mcs.open.ac.uk)
The Mathematics Online Project at the Open University (UK) has worked actively with the Microsoft Office Word 2007 team to develop a tool for electronic marking of student mathematics assignments. As part of this ongoing work, Gaynor Arrowsmith has authored a quick guide to the use of the mathematical features of Office Word 2007. The guide is available for download from the Open University Web site.

CT Watch Quarterly (August 2007 issue) (ctwatch.org)
CT Watch Quarterly is an online journal that focuses on cyberinfrastructure-related research that is critical to collaboration and information dissemination within the science community as a whole. This issue of CT Watch Quarterly ("The Coming Revolution in Scholarly Communications & Cyberinfrastructure") centered on recent developments and directions in scholarly communication.

Nature Magazine's Nascent blog (blogs.nature.com)
The "Nascent" blog (by Howard Ratner, Chief Technology Officer of the Nature Publishing Group) is a helpful source for insights and updates about the scientific publishing industry.

Inera (inera.com)
Inera's "NLM DTD Resources" page provides useful details about the National Library of Medicine (NLM) Journal Archiving and Interchange Tag Suite, the de facto standard full-text DTD for scholarly publishing.

Design Science (dessci.com)
The MathType Software Development Kit (SDK) is provided "as-is," free of charge. Design Science does not provide support for the SDK. The MathType API allows you to call functions used by the MathType Commands For Word. On Windows, this API is split between MathPage.WLL and MT5.DLL.

HighWire Press (highwire.stanford.edu)
Visit the Stanford University HighWire Press page for publisher support.