.NET Components
Chilkat .NET Components for Email, Zip Compression, Encryption, XML, S/MIME, and Character Encoding Conversion

Chilkat::CharsetConvert Class

Character encoding conversion class. Converts text or HTML documents from one charset to another. All major character encodings are supported, including utf-8, ucs-2 (Unicode), iso-8859-*, windows-*, shift-jis, iso-2022-jp/kr/cn, euc-jp/kr/cn, gb2312, big5, tis-620, and more.

Base Classes: Object
  Properties Description  
    FromCharset Character encoding of source data, such as "iso-8859-1", "utf-8", or "euc-kr".

 
    LastError XML error information for the last failed method call.

 
    LastInputAsHex The input data for the most recent character encoding conversion in hexadecimal format. Useful for debugging.

 
    LastInputAsQP The input data for the most recent character encoding conversion in quoted-printable format. Useful for debugging.

 
    LastMethodFailed True if the last method call failed.

 
    LastOutputAsHex The output data for the most recent character encoding conversion in hexadecimal format. Useful for debugging.

 
    LastOutputAsQP The output data for the most recent character encoding conversion in quoted-printable format. Useful for debugging.

 
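    SaveLast If true, the component saves the input and output data for the last conversion, which can then be retrieved in hexadecimal or quoted-printable format through the LastInputAsHex, LastInputAsQP, LastOutputAsHex, and LastOutputAsQP properties.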

 
    ToCharset Character encoding of destination, such as "iso-8859-1", "utf-8", or "euc-kr".

 
    Version The version of the component, such as "9.0.0".
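
The hexadecimal and quoted-printable views exposed by the LastInputAsHex, LastInputAsQP, LastOutputAsHex, and LastOutputAsQP properties can be illustrated outside the component. The following is a minimal Python sketch using only the standard library; it is not the Chilkat API, and the helper name debug_views is hypothetical. It shows why both views help when debugging a conversion: each byte of the input and output is visible.

```python
import binascii
import quopri


def debug_views(data: bytes) -> tuple:
    """Return (hexadecimal, quoted-printable) renderings of raw bytes.

    Hypothetical helper for illustration only; not part of the component.
    """
    hex_view = binascii.hexlify(data).decode("ascii").upper()
    qp_view = quopri.encodestring(data).decode("ascii")
    return hex_view, qp_view


# "cafe" with an acute accent on the final letter, encoded as iso-8859-1.
source = "caf\u00e9".encode("iso-8859-1")
# The same text re-encoded as utf-8.
converted = source.decode("iso-8859-1").encode("utf-8")

print(debug_views(source))     # input bytes of the conversion
print(debug_views(converted))  # output bytes of the conversion
```

In the hexadecimal view the single iso-8859-1 byte E9 becomes the two utf-8 bytes C3 A9, which is the kind of difference these properties make easy to spot.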

 
  Methods Description  
    ConvertData Converts text data from one charset to another. Returns the converted data. The text is converted according to the FromCharset and ToCharset properties.
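
Conceptually, a conversion of this kind decodes the source bytes using the "from" charset and re-encodes the resulting text using the "to" charset. The following Python sketch shows that idea with the standard library codecs. It is an illustration under stated assumptions, not the component's implementation: the function name convert_data and its parameters are hypothetical stand-ins for the ConvertData method and the FromCharset and ToCharset properties.

```python
def convert_data(data: bytes, from_charset: str, to_charset: str) -> bytes:
    """Re-encode bytes from one charset to another.

    Hypothetical stand-in for illustration. Raises UnicodeDecodeError if the
    input is not valid in from_charset, and UnicodeEncodeError if a character
    cannot be represented in to_charset.
    """
    text = data.decode(from_charset)   # bytes -> Unicode text
    return text.encode(to_charset)     # Unicode text -> bytes


# Two Hangul syllables, first encoded as euc-kr, then converted to utf-8.
korean = "\ud55c\uae00".encode("euc-kr")
as_utf8 = convert_data(korean, "euc-kr", "utf-8")
print(as_utf8.decode("utf-8") == "\ud55c\uae00")
```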

 
    ConvertFile Converts a file from one character encoding to another. The file is converted according to the FromCharset and ToCharset properties. Returns true if the conversion is successful.

 
    ConvertFromUnicode Convert Unicode wide-character data to a multibyte character encoding such as iso-8859-1, utf-8, shift-jis, big5, gb2312, tis-620, etc.

 
    ConvertHtml Converts HTML text to another charset. The "from" charset is determined by parsing out the charset information found in a META-tag, or if that is not present, it uses the FromCharset property. The HTML is converted to the charset specified by the ToCharset property. The META-tag is updated, or added if it previously did not exist. The converted HTML is returned. If the HTML could not be converted, a NULL is returned.

 
    ConvertHtmlFile Converts an HTML file to another charset. The "from" charset is determined by parsing out the charset information found in a META-tag, or if that is not present, it uses the FromCharset property. The HTML is converted to the charset specified by the ToCharset property. The META-tag is updated, or added if it previously did not exist. Returns true if successful, otherwise false.
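
The HTML conversion steps described for ConvertHtml and ConvertHtmlFile (read the charset from the META-tag, fall back to a default, convert, then update or add the META-tag) can be sketched in Python as follows. This is a simplified illustration, not the component's code: the function name convert_html, the regular expressions, and the placement of a newly added META-tag directly after the head tag are all assumptions, and the byte-level search only works for ASCII-compatible charsets.

```python
import re

# Matches the charset value inside a META tag, e.g.
#   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
#   <meta charset="utf-8">
META_CHARSET = re.compile(
    rb'(<meta[^>]*charset\s*=\s*["\']?)([A-Za-z0-9_.:-]+)', re.IGNORECASE
)
HEAD_TAG = re.compile(rb"<head(\s[^>]*)?>", re.IGNORECASE)


def convert_html(html: bytes, from_charset: str, to_charset: str) -> bytes:
    """Hypothetical sketch of META-tag driven HTML charset conversion."""
    match = META_CHARSET.search(html)
    # Prefer the charset declared in the document; otherwise use the default.
    source = match.group(2).decode("ascii") if match else from_charset
    converted = html.decode(source).encode(to_charset)

    new_name = to_charset.encode("ascii")
    if match:
        # Update the existing META tag to name the new charset.
        return META_CHARSET.sub(lambda m: m.group(1) + new_name, converted, count=1)

    # No META tag found: add one (assumed position: right after <head>).
    tag = b'<meta charset="' + new_name + b'">'
    head = HEAD_TAG.search(converted)
    if head:
        return converted[: head.end()] + tag + converted[head.end():]
    return tag + converted


page = (b'<html><head><meta http-equiv="Content-Type" '
        b'content="text/html; charset=iso-8859-1"></head>'
        b"<body>caf\xe9</body></html>")
print(convert_html(page, "windows-1252", "utf-8"))
```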

 
    ConvertToUnicode Convert multibyte character data to a Unicode wide character string. Multibyte character encodings include iso-8859-*, euc-*, iso-2022-*, windows-125*, utf-8, and anything else except for ucs-2 or Unicode.

 
    DetectCharset Tries to detect the charset by examining the character data. This method is more accurate when more data is available to examine, but is never 100% accurate.
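
One simple way to see why detection can never be fully accurate is trial decoding: try candidate charsets in order and accept the first one in which the bytes are valid. The Python sketch below uses that technique purely as an illustration. It is not the algorithm used by DetectCharset, and the function name detect_charset and the candidate list are assumptions.

```python
def detect_charset(data: bytes, candidates=("us-ascii", "utf-8", "iso-8859-1")) -> str:
    """Return the first candidate charset in which data decodes cleanly.

    Illustrative trial-decoding heuristic only. Returns "unknown" if no
    candidate accepts the data.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return "unknown"


print(detect_charset(b"hello"))             # plain ASCII bytes
print(detect_charset(b"caf\xc3\xa9"))       # valid utf-8 sequence
print(detect_charset(b"caf\xe9"))           # invalid as utf-8
```

Short inputs are often valid in several charsets at once (every ASCII string is also valid utf-8 and iso-8859-1), so the answer depends on candidate order. More data gives a heuristic more evidence, but some ambiguity always remains.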

 
    DownloadHtml A convenient method to download the HTML from a URL. It does not download all external referenced parts of the Web page -- it only downloads the HTML text of a page. A NULL is returned if the Web page could not be downloaded.

 
    From_BIG5 Sets the FromCharset property to Chinese Big5

 
    From_EUC_CN Sets the FromCharset property to euc-cn

 
    From_EUC_JP Sets the FromCharset property to euc-jp

 
    From_EUC_KR Sets the FromCharset property to euc-kr

 
    From_GB2312 Sets the FromCharset property to Chinese GB2312

 
    From_ISO_2022_JP Sets the FromCharset property to iso-2022-jp

 
    From_ISO_2022_KR Sets the FromCharset property to iso-2022-kr

 
    From_ISO_8859_1 Sets the FromCharset property to iso-8859-1

 
    From_ISO_8859_2 Sets the FromCharset property to iso-8859-2

 
    From_ISO_8859_3 Sets the FromCharset property to iso-8859-3

 
    From_ISO_8859_4 Sets the FromCharset property to iso-8859-4

 
    From_ISO_8859_5 Sets the FromCharset property to iso-8859-5

 
    From_ISO_8859_6 Sets the FromCharset property to iso-8859-6

 
    From_ISO_8859_7 Sets the FromCharset property to iso-8859-7

 
    From_ISO_8859_8 Sets the FromCharset property to iso-8859-8

 
    From_ISO_8859_9 Sets the FromCharset property to iso-8859-9

 
    From_KOI8_R Sets the FromCharset property to koi8-r

 
    From_KOI8_U Sets the FromCharset property to koi8-u

 
    From_SHIFT_JIS Sets the FromCharset property to shift-jis (sjis)

 
    From_US_ASCII Sets the FromCharset property to us-ascii

 
    From_UTF_8 Sets the FromCharset property to utf-8

 
    From_Windows_1250 Sets the FromCharset property to Windows-1250

 
    From_Windows_1251 Sets the FromCharset property to Windows-1251

 
    From_Windows_1252 Sets the FromCharset property to Windows-1252

 
    From_Windows_1253 Sets the FromCharset property to Windows-1253

 
    From_Windows_1254 Sets the FromCharset property to Windows-1254

 
    From_Windows_1255 Sets the FromCharset property to Windows-1255

 
    From_Windows_1256 Sets the FromCharset property to Windows-1256

 
    From_Windows_1257 Sets the FromCharset property to Windows-1257

 
    From_Windows_1258 Sets the FromCharset property to Windows-1258

 
    GetHtmlCharset Parses HTML text and returns the charset, such as "iso-8859-1", found in the META-tag that specifies the document's charset. The string "unknown" is returned if the charset was not specified. Unlike DetectCharset, this method does not try to detect the charset by examining the character data; it simply looks for the META-tag containing the charset information.

 
    GetHtmlFileCharset Parses an HTML file and returns the charset, such as "iso-8859-1", found in the META-tag that specifies the document's charset. The string "unknown" is returned if the charset was not specified. Unlike DetectCharset, this method does not try to detect the charset by examining the character data; it simply looks for the META-tag containing the charset information.

 
    ReadFile Convenience method for reading the entire contents of a file into memory.

 
    SaveLastError Save the XML error log for the last failed method call.

 
    To_BIG5 Sets the ToCharset property to Chinese Big5

 
    To_EUC_CN Sets the ToCharset property to euc-cn

 
    To_EUC_JP Sets the ToCharset property to euc-jp

 
    To_EUC_KR Sets the ToCharset property to euc-kr

 
    To_GB2312 Sets the ToCharset property to Chinese GB2312

 
    To_ISO_2022_JP Sets the ToCharset property to iso-2022-jp

 
    To_ISO_2022_KR Sets the ToCharset property to iso-2022-kr

 
    To_ISO_8859_1 Sets the ToCharset property to iso-8859-1

 
    To_ISO_8859_2 Sets the ToCharset property to iso-8859-2

 
    To_ISO_8859_3 Sets the ToCharset property to iso-8859-3

 
    To_ISO_8859_4 Sets the ToCharset property to iso-8859-4

 
    To_ISO_8859_5 Sets the ToCharset property to iso-8859-5

 
    To_ISO_8859_6 Sets the ToCharset property to iso-8859-6

 
    To_ISO_8859_7 Sets the ToCharset property to iso-8859-7

 
    To_ISO_8859_8 Sets the ToCharset property to iso-8859-8

 
    To_ISO_8859_9 Sets the ToCharset property to iso-8859-9

 
    To_KOI8_R Sets the ToCharset property to koi8-r

 
    To_KOI8_U Sets the ToCharset property to koi8-u

 
    To_SHIFT_JIS Sets the ToCharset property to shift-jis (sjis)

 
    To_US_ASCII Sets the ToCharset property to us-ascii

 
    To_UTF_8 Sets the ToCharset property to utf-8

 
    To_Windows_1250 Sets the ToCharset property to Windows-1250

 
    To_Windows_1251 Sets the ToCharset property to Windows-1251

 
    To_Windows_1252 Sets the ToCharset property to Windows-1252

 
    To_Windows_1253 Sets the ToCharset property to Windows-1253

 
    To_Windows_1254 Sets the ToCharset property to Windows-1254

 
    To_Windows_1255 Sets the ToCharset property to Windows-1255

 
    To_Windows_1256 Sets the ToCharset property to Windows-1256

 
    To_Windows_1257 Sets the ToCharset property to Windows-1257

 
    To_Windows_1258 Sets the ToCharset property to Windows-1258

 
    UnlockComponent Unlocks the component allowing for the full functionality to be used. Returns true if the unlock code is valid.

 
    VerifyData Verify that a memory buffer contains only characters that are valid for the specified character set. Returns true if valid.

 
    VerifyFile Verify that a file contains only characters that are valid for the specified character set. Returns true if the file is valid.

 
    WriteFile Convenience function for writing text to a file.
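
The validity check described for VerifyData and VerifyFile above can be approximated by attempting a strict decode and reporting whether it succeeds. The Python sketch below illustrates that idea with the standard library. The function names verify_data and verify_file and their parameter order are hypothetical, and the component may apply different rules for what counts as valid.

```python
def verify_data(charset: str, data: bytes) -> bool:
    """Return True if every byte sequence in data is valid for charset.

    Hypothetical illustration. An unrecognized charset name raises
    LookupError rather than returning False.
    """
    try:
        data.decode(charset)  # strict decoding rejects invalid sequences
    except UnicodeDecodeError:
        return False
    return True


def verify_file(charset: str, path: str) -> bool:
    """Apply the same check to the entire contents of a file."""
    with open(path, "rb") as handle:
        return verify_data(charset, handle.read())


print(verify_data("utf-8", b"caf\xc3\xa9"))  # well-formed utf-8
print(verify_data("utf-8", b"caf\xe9"))      # truncated multibyte sequence
```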

 

Copyright 2002, Chilkat Software, Inc. All Rights Reserved.