[ldapvi] ldapvi and utf-8

Stefan Pfetzing dreamind at dreamind.de
Sun Jul 30 17:00:35 CEST 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi David,

On 30.07.2006 at 16:45, David Lichteblau wrote:
> Quoting Stefan Pfetzing (dreamind at dreamind.de):
>> I've just discovered that when I add entries with UTF-8 multibyte
>> characters in them (with vim and :set encoding=utf8), ldapvi only
>> shows the DN correctly; any attribute that contains UTF-8
>> characters is displayed only as Base 64 encoded data.
>>
>> Is this behaviour intended or just a bug?
>
> well, the strict use of Base 64 for all non-ASCII data is what is
> currently implemented, but I am aware that it is not a particularly
> convenient thing to do from a user's perspective.  Fixing it is
> slightly tricky though, so I had postponed that and decided to wait
> for someone to turn up and complain about it.  I suppose that day has
> come. :-)
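One way to express "Base 64 for all non-ASCII data" is the SAFE-STRING rule from RFC 2849 (LDIF): a value is written as "attr: value" only if it is plain ASCII without NUL, LF or CR and does not start with a space, colon or '<'; otherwise it is written as "attr:: " followed by its Base 64 encoding. The C sketch below illustrates that rule. It is not ldapvi's actual code, and the names base64_encode, is_safe_string and format_ascii_mode are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Standard Base 64 alphabet (RFC 4648). */
static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode len bytes into out, which must hold 4 * ((len + 2) / 3) + 1 bytes. */
static void base64_encode(const unsigned char *in, size_t len, char *out)
{
    size_t i;
    for (i = 0; i + 2 < len; i += 3) {
        *out++ = b64[in[i] >> 2];
        *out++ = b64[((in[i] & 0x03) << 4) | (in[i + 1] >> 4)];
        *out++ = b64[((in[i + 1] & 0x0f) << 2) | (in[i + 2] >> 6)];
        *out++ = b64[in[i + 2] & 0x3f];
    }
    if (i < len) {                         /* one or two bytes left over */
        *out++ = b64[in[i] >> 2];
        if (i + 1 < len) {
            *out++ = b64[((in[i] & 0x03) << 4) | (in[i + 1] >> 4)];
            *out++ = b64[(in[i + 1] & 0x0f) << 2];
        } else {
            *out++ = b64[(in[i] & 0x03) << 4];
            *out++ = '=';
        }
        *out++ = '=';
    }
    *out = '\0';
}

/* RFC 2849 SAFE-STRING: ASCII only, no NUL/LF/CR, and no leading space,
 * colon or '<'.  Trailing spaces are rejected too, as RFC 2849 advises. */
static int is_safe_string(const unsigned char *v, size_t len)
{
    if (len == 0)
        return 1;
    if (v[0] == ' ' || v[0] == ':' || v[0] == '<' || v[len - 1] == ' ')
        return 0;
    for (size_t i = 0; i < len; i++)
        if (v[i] == '\0' || v[i] == '\n' || v[i] == '\r' || v[i] > 0x7f)
            return 0;
    return 1;
}

/* Write "attr: value" or "attr:: <base64>" into out.
 * Returns 0 on success, -1 on allocation failure or truncation. */
static int format_ascii_mode(const char *attr, const unsigned char *v,
                             size_t len, char *out, size_t outsz)
{
    int n;
    if (is_safe_string(v, len)) {
        n = snprintf(out, outsz, "%s: %.*s", attr, (int)len, (const char *)v);
    } else {
        char *enc = malloc(4 * ((len + 2) / 3) + 1);
        if (enc == NULL)
            return -1;
        base64_encode(v, len, enc);
        n = snprintf(out, outsz, "%s:: %s", attr, enc);
        free(enc);
    }
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

With this rule, "Stefan" is written as "cn: Stefan", while the UTF-8 bytes of "Müller" become "sn:: TcO8bGxlcg==", which is exactly the kind of output Stefan is seeing.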

Yep. :)

> ldapvi syntax already deviates from standard LDIF to allow more data
> to be represented directly instead of as Base 64.  Currently, files
> can even contain arbitrary binary data as far as ldapvi is concerned
> if the appropriate attribute encoding is used.

Yes, I know that; I already have some binary "blobs" in my LDAP tree.

> First of all, however, a braindump of the problems I can think of:
>
> The moment we start using UTF-8, we should actually look at the POSIX
> locale.  And if we do that, we would have to not just send UTF-8, but
> would also have to recode data we get from the server into whatever
> encoding the user selected with his locale.
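Recoding of this kind would most likely go through iconv(3). A minimal C sketch, assuming the value is already known to be UTF-8 and that the target charset is taken from nl_langinfo(CODESET) after setlocale(); recode_from_utf8 and recode_for_locale are hypothetical helpers, not ldapvi functions. The sketch refuses, rather than approximates, characters the target charset cannot represent, which is exactly where recoding starts to lose data:

```c
#define _POSIX_C_SOURCE 200809L
#include <iconv.h>
#include <langinfo.h>
#include <stdlib.h>
#include <string.h>

/* Convert inlen bytes of UTF-8 to charset `tocode`.  Returns a malloc'd,
 * NUL-terminated string, or NULL if the charset is unknown or some
 * character cannot be converted reversibly. */
static char *recode_from_utf8(const char *tocode, const char *in, size_t inlen)
{
    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1)
        return NULL;

    size_t outsz = 4 * inlen + 1;          /* ample for common targets */
    char *out = malloc(outsz);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *inp = (char *)in;                /* iconv() takes char ** */
    char *outp = out;
    size_t inleft = inlen, outleft = outsz - 1;

    /* 0 means every character was converted reversibly; (size_t)-1 is an
     * error such as EILSEQ (character not representable in tocode). */
    int ok = iconv(cd, &inp, &inleft, &outp, &outleft) == 0;
    if (ok && iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1)
        ok = 0;                            /* flushing shift state failed */
    iconv_close(cd);

    if (!ok) {
        free(out);
        return NULL;
    }
    *outp = '\0';
    return out;
}

/* Recode into the charset of the current locale.  Assumes the program has
 * already called setlocale(LC_CTYPE, ""). */
static char *recode_for_locale(const char *in, size_t inlen)
{
    return recode_from_utf8(nl_langinfo(CODESET), in, inlen);
}
```

For example, "café" converts cleanly to ISO-8859-1, but the euro sign does not, so a user in a Latin-1 locale could not see such a value in recoded form at all.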

Hm, is this recoding in ldapvi really necessary? IMHO an editor
(like vim) can already do this.
For vim, one would use:

:let &termencoding = &encoding
:set encoding=utf-8

This would let vim recode your current file (it needs iconv support
in vim).

> Having debugged too many bugs due to incorrectly recoded characters in
> my life already, the idea of producing ldapvi files in random
> character sets of the user's choosing horrifies me.  I do not even
> want to think about the possibility of an editor starting to
> second-guess the choice of encoding (watch your entire DIT being
> recoded) or of a user putting unrelated binary data into the same
> file.

Well, vim does not automatically "recode" your data; you have to do
that yourself.
Maybe one could add an option for passing extra command line
parameters to the vim ($EDITOR) call, so that the above vim snippet
could be added if wanted.
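Such an option could be as simple as splicing user-supplied arguments between the editor name and the file name before starting the editor. A C sketch, assuming a hypothetical ldapvi option that collects the extra arguments into a NULL-terminated array; build_editor_argv and run_editor are illustrative names, not part of ldapvi:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/* Build an argv for execvp(): editor, then the (hypothetical) extra
 * arguments, then the temporary file.  Returns a malloc'd, NULL-terminated
 * array; the strings themselves are not copied. */
static char **build_editor_argv(const char *editor, char *const extra[],
                                const char *file)
{
    size_t n = 0;
    while (extra != NULL && extra[n] != NULL)
        n++;
    char **argv = malloc((n + 3) * sizeof *argv);
    if (argv == NULL)
        return NULL;
    size_t k = 0;
    argv[k++] = (char *)editor;
    for (size_t i = 0; i < n; i++)
        argv[k++] = extra[i];
    argv[k++] = (char *)file;
    argv[k] = NULL;
    return argv;
}

/* Replace the current process with the editor; returns only on failure.
 * Meant to be called in a fork()ed child. */
static int run_editor(const char *editor, char *const extra[], const char *file)
{
    char **argv = build_editor_argv(editor, extra, file);
    if (argv == NULL)
        return -1;
    execvp(argv[0], argv);
    free(argv);                            /* reached only if execvp failed */
    return -1;
}
```

With extra = { "-c", "set encoding=utf-8", NULL }, vim would be started as vim -c 'set encoding=utf-8' FILE, and vim runs each -c command after reading the first file. Since ldapvi has to read the file back once the editor exits, a real version would call run_editor in a child process and waitpid() for it.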

> Also, recoding characters from UTF-8 to something else can only be
> done if the input is actually known to be UTF-8 in the first place.
> To do that correctly, ldapvi would have to read the schema from the
> server and look up every attribute type's syntax before being able to
> write an attribute value into the file.  Not a good plan.

Hm, sounds complicated...

> So much for the theory.  Here's a possible workaround:
>
>   * New command line option --encoding with possible values:
>       `ascii', `utf-8', or `binary'.
>   * The current behaviour would be retained as mode `ascii'.
>   * New mode `binary' would put all bytes into the file exactly as
>     received from libldap.
>   * New mode `utf-8' would use a heuristic and put attribute values
>     into the file verbatim if they happen to be valid UTF-8 and
>     revert to Base 64 otherwise.
>   * For safety, if either `binary' or `utf-8' is specified, look at
>     the encoding specified by the user's locale.  In mode `binary',
>     proceed only if it is one of the encodings that are effectively
>     8 bit clean, like ISO-8859-*.  In mode `utf-8', allow UTF-8 in
>     addition to ISO-8859.  Otherwise, quit immediately with a fatal
>     error.
>   * For extra safety, ldapvi will put -*- coding: utf-8 -*- into the
>     files for the benefit of emacs.  And of course, whatever syntax
>     VIM has for the same thing, too.
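The `utf-8' heuristic needs a strict UTF-8 check. A minimal C sketch (hypothetical function names, not ldapvi's code): is_valid_utf8 follows RFC 3629 and rejects overlong forms, surrogates and code points above U+10FFFF, and utf8_mode_verbatim additionally keeps the LDIF restrictions on NUL, CR, LF and leading or trailing characters, which is an assumption of this sketch rather than part of the proposal:

```c
#include <stddef.h>
#include <string.h>

/* RFC 3629 well-formedness: rejects stray continuation bytes, truncated
 * sequences, overlong forms, surrogates and code points above U+10FFFF. */
static int is_valid_utf8(const unsigned char *v, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = v[i];
        size_t n;                          /* continuation bytes expected */
        unsigned long cp;                  /* decoded code point */
        if (c < 0x80) {
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            n = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 3; cp = c & 0x07;
        } else {
            return 0;                      /* 0x80-0xC1 or 0xF5-0xFF */
        }
        if (len - i <= n)
            return 0;                      /* sequence cut off */
        for (size_t k = 1; k <= n; k++) {
            if ((v[i + k] & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (v[i + k] & 0x3F);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000))
            return 0;                      /* overlong encoding */
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;                      /* surrogate or out of range */
        i += n + 1;
    }
    return 1;
}

/* Mode `utf-8': 1 = write the value verbatim, 0 = fall back to Base 64.
 * Keeping LDIF's restrictions on NUL, CR, LF and on leading/trailing
 * characters is an assumption of this sketch. */
static int utf8_mode_verbatim(const unsigned char *v, size_t len)
{
    if (len == 0)
        return 1;
    if (v[0] == ' ' || v[0] == ':' || v[0] == '<' || v[len - 1] == ' ')
        return 0;
    if (memchr(v, '\0', len) || memchr(v, '\n', len) || memchr(v, '\r', len))
        return 0;
    return is_valid_utf8(v, len);
}
```

The check cannot distinguish a Latin-1 value whose bytes happen to form valid UTF-8 from a genuine UTF-8 one, which is why the proposal calls it a heuristic.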
>
>
> Would this command line option fix the problem for you?
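The locale safety check from the proposal could look roughly like the following C sketch. It assumes that nl_langinfo(CODESET) reports names such as "ISO-8859-1" (glibc) or "ISO8859-1" (BSD) and "UTF-8"; the enum and function names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <langinfo.h>
#include <locale.h>
#include <string.h>
#include <strings.h>

enum encoding_mode { ENC_ASCII, ENC_UTF8, ENC_BINARY };

/* Parse the proposed --encoding argument; returns -1 if it is unknown. */
static int parse_encoding_mode(const char *arg)
{
    if (strcmp(arg, "ascii") == 0)  return ENC_ASCII;
    if (strcmp(arg, "utf-8") == 0)  return ENC_UTF8;
    if (strcmp(arg, "binary") == 0) return ENC_BINARY;
    return -1;
}

/* Is the locale's codeset acceptable for the given mode? */
static int codeset_allowed(enum encoding_mode mode, const char *codeset)
{
    int is_8859 = strncasecmp(codeset, "ISO-8859-", 9) == 0
               || strncasecmp(codeset, "ISO8859-", 8) == 0;
    int is_utf8 = strcasecmp(codeset, "UTF-8") == 0
               || strcasecmp(codeset, "UTF8") == 0;
    switch (mode) {
    case ENC_ASCII:  return 1;             /* Base 64 output is pure ASCII */
    case ENC_BINARY: return is_8859;
    case ENC_UTF8:   return is_8859 || is_utf8;
    }
    return 0;
}

/* Adopt the user's locale and check it; 0 means "exit with a fatal error". */
static int locale_allows(enum encoding_mode mode)
{
    setlocale(LC_CTYPE, "");               /* honours LC_ALL, LC_CTYPE, LANG */
    return codeset_allowed(mode, nl_langinfo(CODESET));
}
```

ldapvi would call locale_allows() once, right after parsing --encoding, before writing anything into the file.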

I think so, but I'm still not happy with having ldapvi do the
recoding.

> d.
>
> PS In case you are wondering, if DNs are actually put into the file
> without Base 64 currently, that is a "bug" which ISTR "fixing" in my
> git archive recently.  So now even DNs would end up as Base 64.  I
> suppose that makes adding the --encoding option even more urgent...

Huh! That definitely doesn't sound like something one would want.
And I already have entries with a UTF-8 DN in my LDAP tree.

bye

Stefan

- --
         http://www.dreamind.de/
Oroborus and Debian GNU/Linux Developer.



-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.4 (Darwin)

iD8DBQFEzMmUi50xCpfDmMsRAiCsAKCuZa5qeBvJTM/DkjGOrSVLSxUEEACbBF1e
+AqVKa8ITAMWUVeX3sKZBnE=
=aLnJ
-----END PGP SIGNATURE-----


