HP-UX Reference Volume 4 of 5

iconv(3C)

NAME

iconv, iconv_open, iconv_close — codeset conversion routines

SYNOPSIS

#include <iconv.h>

iconv_t iconv_open(const char *tocode, const char *fromcode);

size_t iconv(iconv_t cd, const char **inbuf, size_t *inbytesleft,
             char **outbuf, size_t *outbytesleft);

int iconv_close(iconv_t cd);

Remarks

These interfaces conform to the XPG4 standard and should be used instead of the 9.0 iconv interfaces, such as iconvopen(), iconvclose(), iconvsize(), iconvlock(), ICONV(), ICONV1(), and ICONV2().

Refer to the white paper iconv Customization for an explanation of the conversion process. It describes how iconv uses tables and methods to perform conversions and shows how to customize your own conversions. The white paper is available as /usr/share/doc/iconv.ps (PostScript file) and /usr/share/doc/iconv.txt (plain ASCII text file).

DESCRIPTION

The entries in the config.iconv file define the set of conversions supported by iconv(3C). The first two columns contain the fromcode and tocode names. Either these names or their corresponding aliases may be passed as parameters to iconv_open().

iconv_open()

Returns a conversion descriptor that describes a conversion from the codeset specified by the string pointed to by the fromcode argument to the codeset specified by the tocode argument.

A conversion descriptor remains valid in a process until that process closes it.

The fromcode and tocode arguments must have a corresponding entry in the configuration file /usr/lib/nls/iconv/config.iconv. (See FILES section.)
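Since iconv_open() reports an unsupported tocode/fromcode pair by returning (iconv_t)-1 with errno set to [EINVAL] (see ERRORS), an application can probe for a conversion before committing to it. The following C sketch is illustrative only; the helper name conversion_supported() is hypothetical, and the codeset names passed to it must be ones the local implementation accepts (on HP-UX, names or aliases listed in config.iconv).

```c
#include <errno.h>
#include <iconv.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if the fromcode -> tocode conversion
 * can be opened, 0 if it is unsupported (EINVAL), and -1 on any other
 * failure (for example, ENOMEM). The argument order matches
 * iconv_open(): the target codeset comes first.
 */
int conversion_supported(const char *tocode, const char *fromcode)
{
    iconv_t cd = iconv_open(tocode, fromcode);

    if (cd == (iconv_t)-1)
        return (errno == EINVAL) ? 0 : -1;

    /* The descriptor was only needed for the probe; release it. */
    if (iconv_close(cd) == -1)
        perror("iconv_close");
    return 1;
}
```

For instance, conversion_supported("roman8", "iso88591") would ask whether iso88591 text can be converted to roman8, using the names from the EXAMPLES section.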

iconv()

Converts a sequence of characters in one codeset, contained in the array specified by inbuf, into a sequence of corresponding characters in another codeset, stored in the array specified by outbuf. The codesets are those specified in the iconv_open() call that returned the conversion descriptor cd. The inbuf argument points to a variable that points to the first character in the input buffer, and inbytesleft indicates the number of bytes remaining to be converted in that buffer. The outbuf argument points to a variable that points to the first available byte in the output buffer, and outbytesleft indicates the number of bytes still available in that buffer.
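When the whole input is already in memory and the output buffer is known to be large enough, a single iconv() call suffices. The sketch below (the name convert_buffer() is hypothetical) opens a descriptor, converts one buffer, and reports how many output bytes were produced. It assumes the POSIX prototype, in which inbuf is declared char **; with the const char ** prototype shown in SYNOPSIS, the input pointer would be declared const char * instead.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: converts insize bytes at in from fromcode to
 * tocode, storing the result in out (capacity outsize). Returns the
 * number of output bytes written, or (size_t)-1 if the descriptor
 * could not be opened or iconv() stopped early; errno then holds the
 * reason reported by the failing call.
 */
size_t convert_buffer(const char *tocode, const char *fromcode,
                      const char *in, size_t insize,
                      char *out, size_t outsize)
{
    iconv_t cd = iconv_open(tocode, fromcode);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inptr = (char *)in;   /* iconv() advances this pointer... */
    size_t inleft = insize;     /* ...and decrements this count */
    char *outptr = out;         /* first free byte of the output */
    size_t outleft = outsize;   /* free output bytes remaining */

    size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    int saved_errno = errno;    /* keep iconv_close() from masking it */
    iconv_close(cd);

    if (rc == (size_t)-1) {
        errno = saved_errno;
        return (size_t)-1;
    }
    /* Success: inleft is 0 and outptr == out + (outsize - outleft). */
    return outsize - outleft;
}
```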

If a sequence of input bytes does not form a valid character in the specified codeset, conversion stops after the previous successfully converted character. If the input buffer ends with an incomplete character or shift sequence (see Special Usage), conversion stops after the previous successfully converted character. If the output buffer is not large enough to hold the entire converted output, conversion stops just before the character that would cause the output buffer to overflow. The variable pointed to by inbuf is updated to point to the byte following the last byte successfully used in the conversion. The value pointed to by inbytesleft is reduced to reflect the number of bytes still not converted in the input buffer. The variable pointed to by outbuf is updated to point to the byte following the last byte of converted output data. The value pointed to by outbytesleft is reduced to reflect the number of bytes still available in the output buffer.
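Because iconv() updates *inbuf, *inbytesleft, *outbuf and *outbytesleft even when it stops early, a caller can resume exactly where conversion stopped. The following sketch, assuming a hypothetical helper convert_in_chunks() and the POSIX char ** form of inbuf, forces [E2BIG] with a 4-byte output buffer, saves whatever was produced, and calls iconv() again with the updated pointers.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical sketch: converts inleft bytes at in through a deliberately
 * small output buffer. Whenever iconv() stops with E2BIG, the bytes it
 * did produce are appended to dst and the call is repeated from where
 * the input pointer was left. Returns the total number of bytes stored
 * in dst, or (size_t)-1 on any other error or if dst (capacity dstsize)
 * would overflow.
 */
size_t convert_in_chunks(iconv_t cd, char *in, size_t inleft,
                         char *dst, size_t dstsize)
{
    char chunk[4];              /* tiny on purpose, to force E2BIG */
    size_t total = 0;

    while (inleft > 0) {
        char *outptr = chunk;
        size_t outleft = sizeof chunk;
        size_t rc = iconv(cd, &in, &inleft, &outptr, &outleft);
        int err = errno;        /* only meaningful if rc == (size_t)-1 */
        size_t produced = sizeof chunk - outleft;

        if (total + produced > dstsize)
            return (size_t)-1;
        for (size_t i = 0; i < produced; i++)
            dst[total++] = chunk[i];

        if (rc == (size_t)-1) {
            if (err != E2BIG)
                return (size_t)-1;  /* EILSEQ, EINVAL or EBADF */
            if (produced == 0)
                return (size_t)-1;  /* one character needs more than 4 bytes */
        }
        /* in and inleft already point past the consumed input bytes. */
    }
    return total;
}
```

A real application would use a much larger output buffer; the 4-byte size only makes the stop-and-resume behavior easy to observe.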

If iconv() encounters a character in the input buffer that is legal but for which an identical character does not exist in the target codeset, iconv() maps this character to a pre-defined character, called the "galley character" that is defined at the time of table generation. (See genxlt(1)).

iconv_close()

Deallocates the conversion descriptor cd and all other associated resources allocated by iconv_open().

APPLICATION USAGE

iconv_open(), iconv() and iconv_close() are thread-safe. These interfaces are not async-cancel-safe. A cancellation point may occur when a thread is executing these interfaces.

Portable applications must assume that conversion descriptors are not valid after calls to any of the exec functions.

Special Usage

In state-dependent encodings, characters are interpreted according to the "state" of the input. State shifts occur when a specific sequence of bytes is seen in the input. Such a sequence changes the way subsequent characters are interpreted (for example, characters may initially be single-byte characters, and after a state shift subsequent characters may be interpreted as two-byte characters). For state-dependent encodings, the conversion descriptor returned by iconv_open() is in a codeset-dependent initial shift state, ready for immediate use with iconv().

For state-dependent encodings, the conversion descriptor cd is placed into its initial shift state by a call to iconv() for which inbuf is a null pointer or points to a null pointer. When iconv() is called in this way, and outbuf is neither a null pointer nor a pointer to a null pointer, and outbytesleft points to a positive value, iconv() places into the output buffer the byte sequence that changes the output to its initial shift state. If the output buffer is not large enough to hold the entire reset sequence, iconv() fails and sets errno to [E2BIG]. Subsequent calls with inbuf set to other than a null pointer or a pointer to a null pointer cause the conversion to take place from the current state of the conversion descriptor.

For state-dependent encodings, the conversion descriptor is updated to reflect the shift state in effect at the end of the last successfully converted byte sequence.
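At the end of input, an application converting into a state-dependent encoding can issue the reset call described above so that its output ends in the initial shift state; for a stateless target codeset the call writes nothing. A minimal sketch, with the hypothetical helper name finish_conversion():

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: after the last input has been converted, ask
 * iconv() to emit whatever byte sequence returns the output to its
 * initial shift state. Passing a null inbuf does this and also resets
 * cd. *outptr and *outleft are updated as for a normal call. Returns
 * the number of bytes written, or (size_t)-1 on error (for example,
 * E2BIG if the reset sequence does not fit).
 */
size_t finish_conversion(iconv_t cd, char **outptr, size_t *outleft)
{
    size_t before = *outleft;

    if (iconv(cd, NULL, NULL, outptr, outleft) == (size_t)-1)
        return (size_t)-1;
    return before - *outleft;
}
```

The bytes written by this call, if any, still have to be copied to the final destination just like ordinary converted output.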

RETURN VALUE

iconv_open()

Upon successful completion, iconv_open() returns a conversion descriptor for use on subsequent calls to iconv(). Otherwise iconv_open() returns (iconv_t)-1 and sets errno to indicate the error.

iconv()

iconv() updates the variables pointed to by the arguments to reflect the extent of conversion, and returns the number of non-identical conversions performed. If the entire string in the input buffer is converted, the value pointed to by inbytesleft is zero. If an error occurs, iconv() returns (size_t)-1 and sets errno to indicate the error.
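The sketch below, using a hypothetical wrapper named convert_step() and the POSIX char ** form of inbuf, separates the two outcomes: a count of non-identical conversions on success, or the errno value when iconv() returns (size_t)-1. It reads errno immediately after the call, before any other library function can change it.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical wrapper: performs one iconv() call. On success it stores
 * the number of non-identical conversions in *nonidentical and returns
 * 0; on failure it returns the errno value iconv() reported. In both
 * cases the buffer pointers and counts show how far conversion got.
 */
int convert_step(iconv_t cd, char **in, size_t *inleft,
                 char **out, size_t *outleft, size_t *nonidentical)
{
    size_t rc = iconv(cd, in, inleft, out, outleft);

    if (rc == (size_t)-1)
        return errno;           /* E2BIG, EINVAL, EILSEQ or EBADF */
    *nonidentical = rc;
    return 0;
}
```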

iconv_close()

Upon successful completion, iconv_close() returns a value of zero. Otherwise it returns -1 and sets errno to indicate the error.

ERRORS

iconv_open() fails if any of the following conditions are encountered:

[ENOMEM]

Insufficient storage space is available.

[EINVAL]

The conversion specified by the fromcode and tocode is not supported, or the table or method specified in the configuration file could not be read or loaded correctly. This error will also occur if the configuration file itself is faulty.

iconv() fails if any of the following conditions are encountered:

[EILSEQ]

Input conversion stopped because an input character does not belong to the input codeset, or because the conversion table does not contain an entry for this input character and no galley character was defined for that table.

[E2BIG]

Input conversion stopped due to lack of space in the output buffer.

[EINVAL]

Input conversion stopped due to an incomplete character or shift sequence at the end of the input buffer.

[EBADF]

The cd argument is not a valid open conversion descriptor.

iconv_close() fails if any of the following conditions are encountered:

[EBADF]

The conversion descriptor is invalid.
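An application typically reacts differently to each iconv() error listed above. The following sketch maps a saved errno value to the usual remedy; the function name iconv_error_action() and its message strings are hypothetical, and the remedies summarize the descriptions in this section and the handling in the EXAMPLES code.

```c
#include <errno.h>
#include <iconv.h>

/*
 * Hypothetical helper: describes what a caller would typically do
 * after iconv() returns (size_t)-1, given the errno value it saved.
 */
const char *iconv_error_action(int err)
{
    switch (err) {
    case E2BIG:
        return "output buffer full: write it out, then call iconv() again";
    case EINVAL:
        return "incomplete input: keep the unconverted tail, append more input";
    case EILSEQ:
        return "invalid input at *inbuf: stop and report it";
    case EBADF:
        return "cd is not a valid open conversion descriptor";
    default:
        return "unexpected error";
    }
}
```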

EXAMPLES

The following example shows how the iconv(3C) interfaces may be used for conversions.

#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define GOOD 0
#define BAD  (-1)

int convert(const char *tocode, const char *fromcode, int Input);

int main(void)
{
    /* ... */
    /* convert iso88591 text on standard input (fd 0) to roman8 */
    return convert("roman8", "iso88591", 0) == GOOD ? 0 : 1;
}

int convert(const char *tocode,     /* tocode name */
            const char *fromcode,   /* fromcode name */
            int Input)              /* input file descriptor */
{
    iconv_t cd;             /* conversion descriptor */
    ssize_t bytesread;      /* num bytes read into input buffer */
    char inbuf[BUFSIZ];     /* input buffer */
    const char *inchar;     /* ptr to input character */
    size_t inbytesleft;     /* num bytes left in input buffer */
    char outbuf[BUFSIZ];    /* output buffer */
    char *outchar;          /* ptr to output character */
    size_t outbytesleft;    /* num bytes free in output buffer */
    size_t ret_val;         /* number of non-identical conversions */
    int err;                /* errno saved right after iconv() */
    int eof = 0;            /* set once read() reports end of file */

    /* Initiate conversion -- get conversion descriptor */
    if ((cd = iconv_open(tocode, fromcode)) == (iconv_t)-1) {
        perror("prog: iconv_open");
        return BAD;
    }

    inbytesleft = 0;        /* no bytes carried over yet */

    /* translate the characters */
    for (;;) {
        /*
         * If any bytes are left over, they are at the beginning
         * of the buffer; the next read() appends to them.
         */
        if (!eof && inbytesleft < (size_t)BUFSIZ) {
            bytesread = read(Input, inbuf + inbytesleft,
                             (size_t)BUFSIZ - inbytesleft);
            if (bytesread < 0) {
                perror("prog");
                iconv_close(cd);
                return BAD;
            }
            if (bytesread == 0)
                eof = 1;
            inbytesleft += (size_t)bytesread;
        }
        if (inbytesleft == 0)
            break;          /* end of conversions */

        inchar = inbuf;             /* points to input buffer */
        outchar = outbuf;           /* points to output buffer */
        outbytesleft = BUFSIZ;      /* space available for output */

        ret_val = iconv(cd, &inchar, &inbytesleft, &outchar, &outbytesleft);
        err = errno;        /* write() below may change errno */

        if (write(1, outbuf, (size_t)BUFSIZ - outbytesleft) < 0) {
            perror("prog");
            iconv_close(cd);
            return BAD;
        }

        /* iconv() returns the number of non-identical conversions
         * performed. If the entire input buffer was converted,
         * inbytesleft is zero. If conversion stopped for any reason,
         * iconv() returns (size_t)-1, inbytesleft is non-zero, and
         * errno indicates the condition.
         */
        if (ret_val == (size_t)-1) {
            if (err == E2BIG || (err == EINVAL && !eof)) {
                /* E2BIG: no space left in the output buffer.
                 * EINVAL: incomplete character or shift sequence
                 * at the end of the input buffer.
                 * Either way, move the inbytesleft unconverted bytes
                 * to the start of the buffer (the regions may
                 * overlap, so use memmove()). */
                memmove(inbuf, inchar, inbytesleft);
            } else {
                /* EILSEQ: input byte not in the input codeset;
                 * EINVAL at end of file is treated the same way
                 * (see WARNINGS); EBADF: bad descriptor. */
                fprintf(stderr, "prog: conversion failed: %s\n",
                        strerror(err));
                iconv_close(cd);
                return BAD;
            }
        }
        /* Go back and read from the input file. */
    }

    /* Return state-dependent output to its initial shift state. */
    outchar = outbuf;
    outbytesleft = BUFSIZ;
    if (iconv(cd, NULL, NULL, &outchar, &outbytesleft) == (size_t)-1 ||
        write(1, outbuf, (size_t)BUFSIZ - outbytesleft) < 0) {
        perror("prog");
        iconv_close(cd);
        return BAD;
    }

    /* end conversion & get rid of the conversion table */
    if (iconv_close(cd) == BAD) {
        perror("prog: iconv_close");
        return BAD;
    }
    return GOOD;
}

WARNINGS

If you use iconv(3C) and compile and link your application with archive libraries, note that iconv(3C) depends on libdld.sl, which requires a change to the compile/link command:

Compile:

cc -Wl,-a,archive -Wl,-E -Wl,+n -l:libdld.sl -o outfile source

Or compile with CCOPTS and LDOPTS:

export CCOPTS="-Wl,-a,archive options -Wl,-E -l:libdld.sl"
export LDOPTS="options -E +n -l:libdld.sl"
cc -o outfile source

The option -Wl,-a,archive is position-dependent and should appear at the beginning of the compile line. For the best compatibility with future releases, avoid using archive libc with shared libraries other than libdld.sl, which is required as shown above.

iconv(3C) does not correctly handle one corner case involving multibyte characters: if the last character in the file being converted is an invalid multibyte character, iconv(3C) sets errno to [EINVAL] instead of [EILSEQ]. An application can work around this by checking whether end of file has been reached, that is, whether the buffer being converted is the last one; in that case, [EINVAL] should be treated as [EILSEQ].
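The workaround above can be expressed as a small filter applied to the errno value saved after iconv() returns (size_t)-1. This is a sketch; the function name effective_iconv_error() is hypothetical, and the caller is assumed to know whether the buffer just converted was the last one.

```c
#include <errno.h>

/*
 * Hypothetical helper implementing the workaround: given the errno
 * saved after iconv() returned (size_t)-1 and whether the buffer just
 * converted was the last one (end of file reached), return the
 * condition the application should act on.
 */
int effective_iconv_error(int err, int last_buffer)
{
    /* An incomplete character can never be completed once the input
       is exhausted, so it is really an invalid sequence. */
    if (err == EINVAL && last_buffer)
        return EILSEQ;
    return err;
}
```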

AUTHOR

iconv was developed by HP.

FILES

/usr/lib/nls/iconv/tables

Directory containing tables used for conversion.

/usr/lib/nls/iconv/methods

Directory containing methods used for conversion.

/usr/lib/nls/iconv/config.iconv

Configuration file used by iconv_open() to check whether the requested conversion is supported and, if so, to determine which table and/or method is used for the conversion.

STANDARDS CONFORMANCE

iconv_open(): XPG4

iconv(): XPG4

iconv_close(): XPG4

© Hewlett-Packard Development Company, L.P.