Presentable Document Format: Improved On-demand PDF to HTML Conversion

Search engines such as Google and MSN Search crawl and index files in Adobe’s Portable Document Format (PDF) alongside material in HTML. Google furthermore offers a View as HTML option for PDF files that includes query term highlighting. The visual appearance of these HTML files converted from PDF is very poor. In this paper we claim that significant improvements to the quality of on-demand PDF to HTML conversion can be achieved at insignificant cost in terms of increased file size and processing time. In particular, we show that slightly more sophisticated HTML coding can easily compensate for the increase in file size caused by including line graphics and images.

tr-2004-119.pdf
PDF file

Details

Type: Inproceedings
Pages: 8
Number: MSR-TR-2004-119
Institution: Microsoft Research