Character set and encoding converter
1. Purpose
Converts a text stream from one character set and encoding to another.
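
The usage text in section 3 says the conversion goes "via unicode". A minimal sketch of that pivot idea follows, written in Python rather than in charconv's own C++. The function name `convert` is hypothetical, and the codec names (`latin-1`, `utf-7`, `utf-8`) are Python's, not charconv's; this is not how charconv itself is implemented.

```python
def convert(data: bytes, incharset: str, outcharset: str) -> bytes:
    """Convert bytes from one charset to another by pivoting through Unicode.

    Hypothetical helper: it relies on Python's built-in codecs instead of
    charconv's own character set tables.
    """
    text = data.decode(incharset)    # input bytes -> Unicode code points
    return text.encode(outcharset)   # Unicode code points -> output bytes


if __name__ == "__main__":
    # The Finnish sentence from the first example in section 5, with the
    # trailing newline that echo would add.
    latin1_bytes = "Äiti tykkää oliiviöljystä\n".encode("latin-1")
    utf7_bytes = convert(latin1_bytes, "latin-1", "utf-7")
    print(utf7_bytes.decode("ascii"), end="")
```

Pivoting through Unicode means each character set needs only one table to and from Unicode, instead of one table per pair of character sets. The printed line can be compared with the UTF-7 output of the first example in section 5.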
This program is obsolete. I am no longer maintaining it, as it has been superseded by GNU recode and htmlrecode.
2. PHP and character sets
Did you come here looking for character set conversions using PHP? I'm planning to write a sort of howto on that some day. Meanwhile, you can try my japcharset.php, an extempore character set converter for use in PHP scripts that handle multinational text. It depends on the htmlrecode program, which you can get here too. You can also try the recode extension of PHP (php4-recode).
3. Supported character sets and encodings
Usage: charconv [-h] <incharset> <outcharset>
Reads stdin, outputs stdout.
Does incharset->outcharset conversion via unicode.
-h = Input is html (THIS BUGS)
Available character sets/encodings:
- unihtml (&#number; codes)
- utf8linux (with vt100 escape codes)
- utf7mod (imap modified)
- koi8r
- jis-x-0201
- shift_jis
- big5
- iso-8859-1
- iso-8859-2
- iso-8859-3
- iso-8859-4
- iso-8859-5
- iso-8859-6
- iso-8859-7
- iso-8859-8
- iso-8859-9
- iso-8859-10
- iso-8859-13
- iso-8859-14
- iso-8859-15
- cp437
- cp737
- cp775
- cp850
- cp852
- cp855
- cp857
- cp860
- cp861
- cp862
- cp863
- cp864
- cp865
- cp866
- cp869
- cp874
- cp1250
- cp1252
- cp1254
- cp1256
- cp1258
- cp1251
- cp1253
- cp1255
- cp1257
- cp856
- cp1006
- cp424
- roman
- romanian
- iso-2022-jp
- utf8
- utf7
- euc-jp
Typos are allowed to some degree in the character set names, and some general aliases like latin* and iso* are known.
4. Copying
charconv was written by Joel Yliluoma, a.k.a. Bisqwit, and is distributed under the terms of the General Public License (GPL). If you want to make your own converter or just study how something works, you might still want to download this program. The package contains plain TXT files describing the character sets, and there is a .cc file for each encoding.
5. Examples
oktober:~/src/charconv$ echo 'Äiti tykkää oliiviöljystä'|charconv latin1 utf7
+AMQ-iti tykk+AOQA5A oliivi+APY-ljyst+AOQ
oktober:~/src/charconv$ echo '+AMQ-iti tykk+AOQA5A oliivi+APY-ljyst+AOQ'|charconv utf7 unihtml
Äiti tykkää oliiviöljystä
oktober:~/src/charconv$ echo 'pikachu' | sed -f /WWW/src/kr2k.sed | charconv sjis utf8
ぴかちゅ
oktober:~/src/charconv$ echo -e '\33$B$P$+\33(B' | charconv iso-2022-jp unihmtl
Charconv: Warning: Assuming 'unihmtl' means 'unihtml'
ばか
oktober:~/src/charconv$ echo '�������' | charconv cp1251 koi8r
Charconv: Warning: Assuming 'koi8r' means 'koi8-r'
���������
6. Requirements
charconv is written in C++ and uses the Standard Template Library. The hashes the program uses have been heavily optimized for both size and speed, with size being the top priority. As a result, compilation takes a lot of memory and time. GNU make is required. I have g++ version 3.0.1, and charconv compiles without warnings (except some signed/unsigned mismatches). Some parts of the makefiles have been generated with a PHP script (included in the archive). If you want to regenerate them, you need PHP 4 too.
7. See also
GNU Recode:
This recoding library converts files between various coded character
sets and surface encodings. When this cannot be achieved exactly, it
may get rid of the offending characters or fall back on approximations.
The library recognises or produces more than 300 different character
sets and is able to convert files between almost any pair.
Most RFC 1345 character sets, and all `libiconv' character sets, are
supported.
The `recode' program is a handy front-end to the library. I have made an online version of it available for converting short amounts of data between encodings. If you are converting HTML pages, use htmlrecode instead. It handles them (and changes the character set) losslessly. If you are _not_ converting HTML, use GNU recode. It might be more efficient than charconv.
8. Downloading
Date (Y-md-Hi)  acc    Size  Name
2002-0902-0124  ---    1816  patch-charconv-1.1.2.6-1.1.2.7.sh.bz2
2002-0902-0125  ---  251018  charconv-1.1.2.7.tar.bz2
2002-0902-0124  ---    1329  patch-charconv-1.1.2.6-1.1.2.7.bz2
2002-0827-2209  ---    6132  patch-charconv-1.1.2.5-1.1.2.6.sh.bz2
2002-0827-2209  ---  251120  charconv-1.1.2.6.tar.bz2
2002-0827-2209  ---    5983  patch-charconv-1.1.2.5-1.1.2.6.bz2
2002-0812-0205  ---  244250  charconv-1.1.2.5.tar.bz2
2002-0812-0205  ---    2452  patch-charconv-1.1.2.4-1.1.2.5.bz2
2002-0811-0016  ---  244922  charconv-1.1.2.4.tar.bz2
2002-0811-0016  ---    2296  patch-charconv-1.1.2.3-1.1.2.4.bz2
2002-0802-1002  ---  244659  charconv-1.1.2.3.tar.bz2
2002-0802-1002  ---    1620  patch-charconv-1.1.2.2-1.1.2.3.bz2
2002-0712-1431  ---  244449  charconv-1.1.2.2.tar.bz2
2002-0712-1431  ---    2675  patch-charconv-1.1.2.1-1.1.2.2.bz2
2002-0602-0953  ---  243994  charconv-1.1.2.1.tar.bz2
2002-0602-0953  ---    1316  patch-charconv-1.1.2-1.1.2.1.bz2
2002-0601-0035  ---  244002  charconv-1.1.2.tar.bz2
2002-0601-0035  ---    1774  patch-charconv-1.1.1.1-1.1.2.bz2
2002-0601-0035  ---   17357  patch-charconv-1.0.0-1.1.2.bz2
2002-0527-1546  ---    4852  patch-charconv-1.1.1-1.1.1.1.bz2
2002-0428-1144  ---    2620  patch-charconv-1.1.0-1.1.1.bz2
2002-0314-2042  ---    7239  patch-charconv-1.0.3-1.1.0.bz2
2002-0130-0845  ---    6004  patch-charconv-1.0.2-1.0.3.bz2
2002-0122-2358  ---    6971  patch-charconv-1.0.1-1.0.2.bz2
2002-0121-0227  ---    1180  patch-charconv-1.0.0-1.0.1.bz2
2002-0121-0208  ---  221861  charconv-1.0.0.rar
2002-0121-0208  ---  234350  charconv-1.0.0.tar.bz2
2002-0121-0208  ---    4734  patch-charconv-0.0.13-1.0.0.bz2
2002-0121-0208  ---   13439  patch-charconv-0.0.10-1.0.0.bz2
2002-0112-1858  ---    7100  patch-charconv-0.0.12-0.0.13.bz2
2001-1017-0131  ---    2633  patch-charconv-0.0.11-0.0.12.bz2
2001-1008-0830  ---    5858  patch-charconv-0.0.10-0.0.11.bz2
2001-1008-0501  ---  229891  charconv-0.0.10.tar.bz2
2001-1008-0501  ---   18696  patch-charconv-0.0.9-0.0.10.bz2
2001-1008-0501  ---  117514  patch-charconv-0.0.2-0.0.10.bz2
2001-1006-1704  ---    7617  patch-charconv-0.0.8-0.0.9.bz2
2001-1006-0322  ---   97222  patch-charconv-0.0.7-0.0.8.bz2
2001-1005-0427  ---    4099  patch-charconv-0.0.6-0.0.7.bz2
2001-1005-0202  ---    2985  patch-charconv-0.0.5-0.0.6.bz2
2001-1003-1220  ---    3777  patch-charconv-0.0.4-0.0.5.bz2
2001-0927-2334  ---    4783  patch-charconv-0.0.3-0.0.4.bz2
2001-0925-0149  ---    1406  patch-charconv-0.0.2-0.0.3.bz2
2001-0925-0112  ---  124076  charconv-0.0.2.tar.bz2
2001-0925-0112  ---    5441  patch-charconv-0.0.1-0.0.2.bz2