Leveraging Human Intelligence: Semi-automated Processing in Assuring Access to Digital Content

  • Mohammad Raza
  • Natasa Milic-Frayling

Published by Open Research Challenges in Digital Preservation (ORC-iPres)

The need for standardization in the content production industry has led producers of popular authoring and publishing applications to adopt structured mark-up languages, such as XML, to implement their content file formats. As part of our effort to ensure long-term access to such content, we need to consider the properties of the mark-up schemas and devise methods that enable effective mapping among them. The methods may range from fully automated mapping between two formats to semi-automated transformation of individual artifacts through human intervention. The former is the ideal scenario and is achievable when full specifications of the original and target formats are available and when the development of a full converter is economically feasible. However, these resources are often lacking, which creates challenges and requires exploration of alternative approaches.