[ldapvi] ldapvi and utf-8

David Lichteblau david at lichteblau.com
Sun Jul 30 16:45:20 CEST 2006


Hi,

Quoting Stefan Pfetzing (dreamind at dreamind.de):
> I've just discovered that when I add entries with utf-8 multibyte  
> characters in it (with vim and :set encoding=utf8) ldapvi will only  
> show the DN correctly, but any attribute that contains utf-8  
> characters, it will only display base64 encoded stuff.
> 
> Is this behaviour intended or just a bug?

well, the strict use of Base 64 for all non-ASCII data is what is
currently implemented, but I am aware that it is not a particularly
convenient thing to do from a user's perspective.  Fixing it is slightly
tricky though, so I had postponed that and decided to wait for someone
to turn up and complain about it.  I suppose that day has come. :-)

ldapvi syntax already deviates from standard LDIF to allow more data to
be represented directly instead of as Base 64.  Currently, files can
even contain arbitrary binary data as far as ldapvi is concerned if the
appropriate attribute encoding is used.
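For comparison, standard LDIF (RFC 2849) allows a value to be written
in plain form (`attr: value`) only if it is a SAFE-STRING: 7-bit ASCII
without NUL, CR or LF, not starting with a space, colon or less-than
sign, and (as a SHOULD) not ending in a space.  Anything else goes out
as `attr:: <base64>`.  The C sketch below illustrates that rule only; it
is not code from ldapvi or libldap, and `ldif_is_safe_string` is a
hypothetical name.  It shows why a UTF-8 value such as "café" always
falls back to Base 64 under the strict rule.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if the len bytes at value form an RFC 2849
 * SAFE-STRING, i.e. may be written as "attr: value" rather than
 * "attr:: <base64>". */
bool ldif_is_safe_string(const unsigned char *value, size_t len)
{
    if (len == 0)
        return true;  /* the empty string is a SAFE-STRING */

    /* SAFE-INIT-CHAR: the first byte may not be SPACE, ':' or '<'. */
    if (value[0] == ' ' || value[0] == ':' || value[0] == '<')
        return false;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = value[i];
        /* SAFE-CHAR: 0x01..0x7F except LF and CR.  NUL and every byte
         * >= 0x80 (hence every multibyte UTF-8 sequence) fail here. */
        if (c == 0 || c == '\n' || c == '\r' || c > 0x7F)
            return false;
    }

    /* RFC 2849: values ending in SPACE SHOULD be base64-encoded. */
    if (value[len - 1] == ' ')
        return false;

    return true;
}
```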


First of all, however, a braindump of the problems I can think of:

The moment we start using UTF-8, we should actually look at the POSIX
locale.  And if we do that, we would not only have to send UTF-8, but
would also have to recode data we get from the server into whatever
encoding the user selected with his locale.

Having debugged too many bugs due to incorrectly recoded characters in
my life already, the idea of producing ldapvi files in random character
sets of the user's choosing horrifies me.  I do not even want to think
about the possibility of an editor starting to second-guess the choice
of encoding (watch your entire DIT being recoded) or of a user putting
unrelated binary data into the same file.

Also, recoding characters from UTF-8 to something else can only be done
if the input is actually known to be UTF-8 in the first place.  To do
that correctly, ldapvi would have to read schema from the server and
look up every attribute type's syntax before being able to write an
attribute value into the file.  Not a good plan.


So much for the theory.  Here's a possible workaround:

  * New command line option --encoding with possible values:
      `ascii', `utf-8', or `binary'.
  * The current behaviour would be retained as mode "ascii".
  * New mode `binary' would put all bytes into the file exactly as
    received from libldap.
  * New mode `utf-8' would use a heuristic and put attribute values into
the file verbatim if they happen to be valid UTF-8 and revert to
    Base 64 otherwise.
  * For safety, if either `binary' or `utf-8' is specified, look at the
    encoding specified by the user's locale.  In mode `binary', proceed
only if it is one of the encodings that are effectively 8-bit clean,
    like ISO-8859-*.  In mode `utf-8', allow UTF-8 in addition to
    ISO-8859.  Otherwise, quit immediately with a fatal error.
  * For extra safety, ldapvi will put -*- coding: utf-8 -*- into the
    files for the benefit of emacs.  And of course, whatever syntax VIM
    has for the same thing, too.
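The decision logic of the proposed `utf-8' mode, together with the
locale safety check, might look roughly like the C sketch below.  It is
not ldapvi code, and several things are assumptions: the names
`is_valid_utf8`, `choose_repr_utf8_mode`, `codeset_allowed` and the
enums are hypothetical; "valid UTF-8" is taken to mean well-formed per
RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF);
and codeset names are compared after uppercasing and stripping `-` and
`_`, since C libraries spell them differently (e.g. "ISO-8859-1" versus
"ISO8859-1").  Only UTF-8 validity is checked, as the proposal
describes; any further restrictions ldapvi's own syntax places on
verbatim values are not modelled.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the proposed --encoding values. */
enum encoding_mode { ENC_ASCII, ENC_UTF8, ENC_BINARY };
enum value_repr { REPR_VERBATIM, REPR_BASE64 };

/* True if buf[0..len) is well-formed UTF-8 (RFC 3629): no overlong
 * forms, no surrogates (U+D800..U+DFFF), nothing above U+10FFFF. */
bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t n;          /* number of continuation bytes expected */
        unsigned long cp;  /* code point being decoded */

        if (c < 0x80) {
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            n = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 3; cp = c & 0x07;
        } else {
            return false;  /* 0x80..0xC1, 0xF5..0xFF never start a sequence */
        }

        if (len - i <= n)
            return false;  /* sequence truncated at end of value */
        for (size_t k = 1; k <= n; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;  /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;  /* overlong, surrogate, or out of range */
        i += n + 1;
    }
    return true;
}

/* utf-8 mode: write the value verbatim if it is valid UTF-8,
 * otherwise fall back to Base 64. */
enum value_repr choose_repr_utf8_mode(const unsigned char *v, size_t len)
{
    return is_valid_utf8(v, len) ? REPR_VERBATIM : REPR_BASE64;
}

/* Uppercase cs into out and drop '-' and '_', so that spellings such
 * as "UTF-8"/"utf8" and "ISO-8859-1"/"ISO8859-1" compare equal. */
static void normalize_codeset(const char *cs, char *out, size_t outlen)
{
    size_t j = 0;
    for (; *cs != '\0' && j + 1 < outlen; cs++) {
        if (*cs == '-' || *cs == '_')
            continue;
        out[j++] = (char)toupper((unsigned char)*cs);
    }
    out[j] = '\0';
}

/* Locale safety check: binary mode needs an ISO-8859-* codeset,
 * utf-8 mode accepts UTF-8 or ISO-8859-*, ascii mode needs nothing. */
bool codeset_allowed(const char *codeset, enum encoding_mode mode)
{
    char norm[64];
    normalize_codeset(codeset, norm, sizeof norm);
    bool iso8859 = strncmp(norm, "ISO8859", 7) == 0;
    bool utf8 = strcmp(norm, "UTF8") == 0;

    switch (mode) {
    case ENC_ASCII:  return true;
    case ENC_BINARY: return iso8859;
    case ENC_UTF8:   return utf8 || iso8859;
    }
    return false;
}
```

In a real program the codeset string would presumably come from
`nl_langinfo(CODESET)` after `setlocale(LC_CTYPE, "")`, with a fatal
error when `codeset_allowed` returns false.  Checking well-formedness
strictly, rather than merely looking for high bits, keeps Latin-1 data
(such as a lone 0xE9 byte) in Base 64 instead of passing it through as
broken UTF-8.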


Would this command line option fix the problem for you?


d.

PS In case you are wondering, if DNs are actually put into the file
without Base 64 currently, that is a "bug" which ISTR "fixing" in my git
archive recently.  So now even DNs would end up as Base 64.  I suppose
that makes adding the --encoding option even more urgent...


