Simple PDF to HTML converter
Converts PDF files into simple HTML.
This code has been written by Joel Yliluoma, a.k.a. Bisqwit,
and is distributed under the following terms:
GNU make is probably required.
swftools is required at build time -- specifically, the source code of its "pdf2swf/xpdf" directory and the "libpdf.a" file that is produced when that source code is compiled. Edit the Makefile and change the XPDF setting to point to the xpdf directory inside your swftools source tree.
xpdf and swftools are not required at runtime.
Full reconstruction of HTML pages from PDF documents is not possible. PDF is merely a sequence of rendering instructions; i.e., it is all about layout, nothing about structure. In contrast, HTML is all about structure, little about layout.
The usability of all-about-layout HTML pages is usually very low. Thus, to make the resulting pages usable and readable, pdf2simplehtml ignores most of the layout and concentrates only on making the text content readable.
In a PDF document, spaces are not always saved as text. Sometimes each word (or even each letter!) is individually positioned on the page. pdf2simplehtml tries to guess word breaks, line breaks, and new paragraphs by analyzing the distances between lines and words. Sometimes this does not work well, but most of the time it does. It does not understand indentation yet, and it cannot detect headers either. It does try to convert fonts and text sizes.
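The distance-based guessing described above can be illustrated with a minimal sketch. This is not pdf2simplehtml's actual code: the Fragment type, the function names, and the threshold values (y_tol, gap_ratio, para_ratio) are all assumptions chosen for illustration. It assumes each text fragment comes with a position, a width, and a font size, with y growing downward.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    """One positioned piece of text (hypothetical type, for illustration)."""
    x: float      # left edge
    y: float      # baseline position, growing downward
    width: float  # rendered width of the text
    text: str
    size: float   # font size


def group_lines(frags: List[Fragment], y_tol: float = 0.5) -> List[List[Fragment]]:
    """Group fragments whose baselines are close together into lines."""
    lines: List[List[Fragment]] = []
    for f in sorted(frags, key=lambda f: (f.y, f.x)):
        if lines and abs(f.y - lines[-1][0].y) <= y_tol * f.size:
            lines[-1].append(f)
        else:
            lines.append([f])
    for line in lines:
        line.sort(key=lambda f: f.x)  # left-to-right order within a line
    return lines


def join_line(line: List[Fragment], gap_ratio: float = 0.25) -> str:
    """Insert a space wherever the horizontal gap is wide enough (assumed threshold)."""
    out = line[0].text
    for prev, cur in zip(line, line[1:]):
        gap = cur.x - (prev.x + prev.width)
        if gap > gap_ratio * cur.size:
            out += " "
        out += cur.text
    return out


def to_paragraphs(frags: List[Fragment], para_ratio: float = 1.5) -> List[str]:
    """Start a new paragraph when the vertical distance between lines is large."""
    lines = group_lines(frags)
    if not lines:
        return []
    paragraphs = [[join_line(lines[0])]]
    for prev, cur in zip(lines, lines[1:]):
        if cur[0].y - prev[0].y > para_ratio * cur[0].size:
            paragraphs.append([])
        paragraphs[-1].append(join_line(cur))
    return [" ".join(p) for p in paragraphs]


frags = [
    Fragment(0, 10, 15, "Hel", 10),
    Fragment(15, 10, 10, "lo", 10),     # no gap: same word
    Fragment(30, 10, 25, "world", 10),  # gap of 5 units: new word
    Fragment(0, 22, 25, "again", 10),   # normal line spacing: same paragraph
    Fragment(0, 50, 15, "New", 10),     # large vertical gap: new paragraph
]
print(to_paragraphs(frags))
```

Fixed ratios like these are the simplest possible choice; a real converter would likely need to adapt them per font or per page, which is one reason such guessing sometimes fails.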
In the author's opinion, pdf2simplehtml converts PDF documents into more readable HTML pages than Google's cache does.
Date (Y-md-Hi)  acc  Size    Name
2006-0705-0726  r--  442726  pdf2simplehtml-0.9.0-x86linuxbinary-static.zip
2006-0705-0726  r--    8294  pdf2simplehtml-0.9.0.tar.bz2