* No warranty. You are free to modify this source and to
* distribute the modified sources, as long as you keep the
* existing copyright messages intact and as long as you
* remember to add your own copyright markings.
* You are not allowed to distribute the program or modified versions
* of the program without including the source code (or a reference to
* the publicly available source) and this notice with it.
HTML pages that try to reproduce a document's layout exactly are usually hard to use. To keep the resulting pages usable and readable, pdf2simplehtml ignores most of the layout and concentrates on making the text content readable.
In a PDF document, spaces are not always stored as text. Sometimes each word (or even each letter!) is positioned individually on the page. pdf2simplehtml tries to guess word breaks, line breaks, and paragraph breaks by analyzing the distances between lines and between words. This does not always work well, but most of the time it does. It does not yet understand indentation, and it cannot detect headers either. It tries to convert fonts and text sizes.
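The distance-based guessing described above can be illustrated with a minimal sketch. This is not pdf2simplehtml's actual code: the `Fragment` type, the function names, and the thresholds `word_gap` and `para_gap` are all hypothetical, and the sketch assumes the text fragments have already been extracted from the PDF in reading order, with the y coordinate growing downward.

```python
import html
from dataclasses import dataclass


@dataclass
class Fragment:
    """One positioned piece of text (hypothetical structure, not from the program)."""
    text: str
    x: float      # left edge of the fragment
    y: float      # baseline position, growing downward
    width: float  # horizontal extent of the fragment
    size: float   # font size, used to scale the thresholds


def rebuild_text(fragments, word_gap=0.25, para_gap=1.5):
    """Guess word, line, and paragraph breaks from distances alone.

    word_gap and para_gap are assumed thresholds, expressed as
    multiples of the previous fragment's font size.
    """
    paragraphs = []  # finished paragraphs
    lines = []       # finished lines of the current paragraph
    current = ""     # line currently being built
    prev = None
    for frag in fragments:
        if prev is None:
            current = frag.text
        elif abs(frag.y - prev.y) > 0.5 * prev.size:
            # The baseline moved noticeably: the previous line has ended.
            lines.append(current)
            if abs(frag.y - prev.y) > para_gap * prev.size:
                # An unusually large vertical gap: start a new paragraph.
                paragraphs.append(" ".join(lines))
                lines = []
            current = frag.text
        else:
            # Same line: insert a space only if the horizontal gap is wide enough.
            gap = frag.x - (prev.x + prev.width)
            current += (" " if gap > word_gap * prev.size else "") + frag.text
        prev = frag
    if prev is not None:
        lines.append(current)
        paragraphs.append(" ".join(lines))
    return paragraphs


def to_simple_html(paragraphs):
    """Emit plain <p> elements, ignoring all layout information."""
    return "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)


sample = [
    Fragment("H", 0, 0, 6, 10),
    Fragment("i", 6, 0, 3, 10),        # touches "H": same word
    Fragment("there", 13, 0, 25, 10),  # gap of 4 units: new word
    Fragment("friend", 0, 12, 30, 10), # next line, same paragraph
    Fragment("Bye", 0, 40, 15, 10),    # large vertical gap: new paragraph
]
print(to_simple_html(rebuild_text(sample)))
```

Fixed thresholds like these are fragile, which matches the observation that the guessing sometimes fails. A sketch this simple also has no notion of indentation or headers.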
In the author's opinion, pdf2simplehtml converts PDF documents into more readable HTML pages than Google's cache does.
Generated from
progdesc.php (last updated: Wed, 05 Jul 2006 07:26:02 +0300)
with docmaker.php (last updated: Sun, 12 Jun 2005 06:08:02 +0300)
at Wed, 05 Jul 2006 07:27:41 +0300