[gtkada] UTF-8 in GtkAda 2.0

Jacob Sparre Andersen sparre at nbi.dk
Thu Mar 27 13:20:06 CET 2003


Preben Randhol wrote:

> But if I have a program which accepts input in several
> different encodings and I get/set the string from a GEntry
> widget, what is returned/expected? Is it utf-8 strings?

Yes (if I am not mistaken).

> Second question: With a normal Latin1 (or Latin7 etc...)
> string : "the house", I can say noun (5 .. noun'last) to
> only get "house", but if it is in utf-8 then I must
> convert the string from utf-8 to Latin1 (or Latin7 etc...)
> before I can do this, right?

In this concrete case you wouldn't have to do any encoding
conversion, since ISO-646 (ASCII) encoded strings are a proper
subset of UTF-8 encoded ISO-10646 strings: each ASCII character
occupies exactly one byte. In general, though, you would have to
either convert your string to an array of ISO-10646 characters
or count the ISO-10646 characters to split the string properly.
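
The character counting mentioned above can be sketched roughly as
follows. This is only a minimal illustration in plain Ada, assuming
the input is already well-formed UTF-8. The helper Byte_Index is a
hypothetical name and not part of GtkAda. It relies on the fact that
UTF-8 continuation bytes always lie in the range 16#80# .. 16#BF#,
so every other byte starts a new character.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Split_UTF8 is

   --  Hypothetical helper (not part of GtkAda): returns the index in
   --  Item of the first byte of character number Char_Pos, counting
   --  characters from 1.  Returns Item'Last + 1 if Item holds fewer
   --  characters.  Assumes Item is well-formed UTF-8.
   function Byte_Index
     (Item : String; Char_Pos : Positive) return Positive
   is
      Count : Natural := 0;
   begin
      for I in Item'Range loop
         --  Continuation bytes look like 10xxxxxx; any other byte
         --  is the first byte of a character.
         if Character'Pos (Item (I)) not in 16#80# .. 16#BF# then
            Count := Count + 1;
            if Count = Char_Pos then
               return I;
            end if;
         end if;
      end loop;
      return Item'Last + 1;
   end Byte_Index;

   --  "cafe noir" with an e-acute, which UTF-8 encodes as the two
   --  bytes 16#C3# 16#A9#.  The phrase has 9 characters in 10 bytes.
   Phrase : constant String :=
     "caf" & Character'Val (16#C3#) & Character'Val (16#A9#) & " noir";

   --  The sixth character is the 'n', which sits at byte index 7.
   Start : constant Positive := Byte_Index (Phrase, 6);

begin
   Put_Line (Phrase (Start .. Phrase'Last));  --  prints "noir"
end Split_UTF8;
```

Slicing with the character number directly, as in Phrase (6 ..
Phrase'Last), would give " noir" here, and Phrase (5 .. Phrase'Last)
would start in the middle of the e-acute.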

> If I split a utf-8 string in the wrong place the pieces
> won't make any sense. Have I understood it correctly?

Yes.

> I need more or less to make a :
>
>    type Word_String is
>       record
>          String : Unbounded_String;
>          Encoding : Encoding_Type;
>       end record;
>
> to keep track of which encoding the string is in?

Yes.

> Third question:
>
> What happens if I have two utf-8 strings and concatenate them?
> I mean: "the " and "house" and you do
>
>    noun : String := "the " & "house";
>
> will this always produce a valid utf-8 string or can one
> risk that it is invalid?

That will always produce a new valid UTF-8 encoded string, since
concatenation never splits the byte sequence of an individual
character.
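
As a rough illustration of the concatenation case, the sketch below
checks that a string consists only of complete UTF-8 byte sequences.
Is_Complete is a hypothetical helper, not a GtkAda routine, and it
is a structural check only: it does not reject overlong forms or
surrogate code points.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Concat_UTF8 is

   --  Hypothetical helper: True if Item contains only complete
   --  UTF-8 byte sequences (lead byte plus the right number of
   --  continuation bytes).  Structural check only.
   function Is_Complete (Item : String) return Boolean is
      Pending : Natural := 0;  --  continuation bytes still expected
      B       : Natural;
   begin
      for I in Item'Range loop
         B := Character'Pos (Item (I));
         if Pending > 0 then
            if B not in 16#80# .. 16#BF# then
               return False;  --  sequence was cut short
            end if;
            Pending := Pending - 1;
         elsif B <= 16#7F# then
            null;             --  single-byte (ASCII) character
         elsif B in 16#C0# .. 16#DF# then
            Pending := 1;     --  lead byte of a 2-byte sequence
         elsif B in 16#E0# .. 16#EF# then
            Pending := 2;     --  lead byte of a 3-byte sequence
         elsif B in 16#F0# .. 16#F7# then
            Pending := 3;     --  lead byte of a 4-byte sequence
         else
            return False;     --  stray continuation or invalid byte
         end if;
      end loop;
      return Pending = 0;
   end Is_Complete;

   --  "cafe " with an e-acute (bytes 16#C3# 16#A9#): 6 bytes.
   Left  : constant String :=
     "caf" & Character'Val (16#C3#) & Character'Val (16#A9#) & " ";
   --  "creme" with an e-grave (bytes 16#C3# 16#A8#).
   Right : constant String :=
     "cr" & Character'Val (16#C3#) & Character'Val (16#A8#) & "me";

begin
   --  Two complete strings joined: still complete.
   Put_Line (Boolean'Image (Is_Complete (Left & Right)));          --  TRUE
   --  Left cut in the middle of the e-acute, then joined: broken.
   Put_Line (Boolean'Image (Is_Complete (Left (1 .. 4) & Right))); --  FALSE
end Concat_UTF8;
```

The second call fails only because the left operand was sliced in
the middle of a character before the concatenation, not because of
the concatenation itself.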

Jacob
-- 
LDraw.org Parts Tracker FAQ:
               http://www.ldraw.org/library/tracker/ref/faq/


