Tuesday, December 22, 2009

2.1 LDIF



[ Team LiB ]










2.1 LDIF



Most system
administrators prefer to use plain-text files for server
configuration information, as opposed to some binary store of bits.
It is more comfortable to deal with data in vi, Emacs, or notepad
than to dig though raw bits and bytes. Therefore, it seems fitting to
begin an exploration of LDAP internals with a discussion of
representing directory data in text form.



The LDAP Interchange Format
(LDIF), defined in RFC 2849, is a standard text file format for
storing LDAP configuration information and directory contents. In its
most basic form, an LDIF file is:




  • A collection of entries separated from each other by blank lines


  • A mapping of attribute names to values


  • A collection of directives that instruct the parser how to process
    the information



The first two characteristics provide exactly what is needed to
describe the contents of an LDAP directory. We'll
return to the third characteristic when we discuss modifying the
information in the directory in Chapter 4.



LDIF files are often used to import new data into your directory or
make changes to existing data. The data in the LDIF file must obey
the schema rules of your LDAP directory. You can think of the schema
as a data definition for your directory. Every item that is added or
changed in the directory is checked against the schema for
correctness. A schema
violation
occurs if the data does not
correspond to the existing rules.



Figure 2-1 shows a simple directory information
tree. Each entry in the directory is represented by an entry in the
LDIF file. Let's begin with the topmost entry in the
tree labeled with the distinguished name (DN)
dc=plainjoe,dc=org:



# LDIF listing for the entry dn: dc=plainjoe,dc=org
dn: dc=plainjoe,dc=org
objectClass: domain
dc: plainjoe



Figure 2-1. An LDAP directory tree



We can make a few observations about LDIF syntax
on the basis of this short listing:




  • Comments in an LDIF file begin with a pound
    character (#) at position one and continue to the
    end of the current line.


  • Attributes are listed on the lefthand side of the
    colon (:), and values
    are presented on the righthand side. The colon character is separated
    from the value by a space.


  • The dn
    attribute uniquely identifies the DN of the entry.




2.1.1 Distinguished Names and Relative Distinguished Names



It is important to realize that the full DN of an entry does not
actually need to be stored as an attribute within that entry, even
though this seems to be implied by the previous LDIF extract; it can
be generated on the fly as needed. This is analogous to how a
filesystem is organized. A file or directory does not store the
absolute path to itself from the root of the filesystem. Think how
hard it would be to move files if this were true.



If the DN is like the absolute path between the root of a filesystem
and a file, a relative distinguished name (RDN) is like a filename.
We've already seen that a DN is formed by stringing
together the RDNs of every entity from the element in question to the
root of the directory tree. In this sense, an RDN works similarly to
a filename. However, unlike a filename, an RDN can be made up of
multiple attributes. This is similar to a compound index in a
relational database system in which two or more fields are used in
combination to generate a unique index key.



While a multivalued RDN is not shown in our example,
it is not hard to imagine. Suppose that there are two employees named
Jane Smith in your company: one in the Sales Department and one in
the Engineering Department. Now suppose the entries for these
employees have a common parent. Neither the
common name (cn)
nor the organizational unit
(ou) attribute is unique in its own right.
However, both can be used in combination to generate a unique RDN.
This would look like:



# Example of two entries with a multivalued RDN
dn: cn=Jane Smith+ou=Sales,dc=plainjoe,dc=org
cn: Jane Smith
ou: Sales
<...remainder of entry deleted...>

dn: cn=Jane Smith+ou=Engineering,dc=plainjoe,dc=org
cn: Jane Smith
ou: Engineering
<...remainder of entry deleted...>


For both of these entries, the first component of the DN is an RDN
composed of two values: cn=Jane
Smith+ou=Sales and cn=Jane
Smith+ou=Engineering.



In the multivalued RDN, the plus character
(+) separates the two attribute values used to form the RDN. What if
one of the attributes used in the RDN contained the + character? To
prevent the + character from being interpreted as a special
character, we need to escape it using a
backslash (\). The
other special characters that require a backslash-escape if used
within an attribute value are:




  • A space or pound (#) character occurring at the beginning of the
    string


  • A space occurring at the end of the string


  • A comma (,), a plus character (+), a
    double quote ("), a
    backslash (\), angle brackets (< or >),
    or a semicolon (;)



Although multivalued RDNs have their place, using them excessively
can become confusing, and can often be avoided by a better namespace
design. In the previous example, it is obvious that the multivalued
RDN could be avoided by creating different
organizationalUnits (ou) in the
directory for both Sales and Engineering, as illustrated in Figure 2-2. Using this strategy, the DN for the first
entry would be cn=Jane
Smith,ou=Sales,dc=plainjoe,dc=org. This design
does not entirely eliminate the need for multivalued RDNs; we could
still have two people named Jane Smith in the Engineering
organization. But that will occur much less frequently than having
two Jane Smiths in the company. Look for ways to organize namespaces
to avoid multivalued RDNs as much as is possible and logical.




Figure 2-2. A namespace that represents Jane Smith with a unique, multivalued RDN



One final note about DNs. RFC 2253
defines a method of unambiguously representing a DN using a UTF-8
string representation. This normalization process boils down to:




  • Removing all nonescaped whitespace surrounding the equal sign (=) in
    each RDN


  • Making sure the appropriate characters are escaped


  • Removing all nonescaped spaces surrounding the multi-value RDN join
    character (+)


  • Removing all nonescaped trailing spaces on RDNs



Therefore, the normalized version of:



cn=gerald carter + ou=sales,  dc=plainjoe ,dc=org


would be:



cn=gerald carter+ou=sales,dc=plainjoe,dc=org


Without getting ahead of ourselves, I should mention that the string
representation of a distinguished name is normally case-preserving,
and the logic used to determine if two
DNs are equal is usually a
case-insensitive match. Therefore:



cn=Gerald Carter,ou=People,dc=plainjoe,dc=org


would be equivalent to:



cn=gerald carter,ou=people,dc=plainjoe,dc=org


However, this case-preserving, case-insensitive behavior is based
upon the syntax and matching rules (see Section 2.2 later in this chapter) of
the attribute type used in each relative component of the complete
DN. So while DNs are often case-insensitive, do not assume that they
will always be so.



Subsequent examples use the normalized versions of all DNs to prevent
confusion, although I may tend to be lax on capitalization.





2.1.2 Back to Our Regularly Scheduled Program . . .



Going back to Figure 2-1, your next question is
probably, "Where did the extra lines in the LDIF
listing come from?" After all, the top entry in
Figure 2-1 is simply
dc=plainjoe,dc=org. But the LDIF lines
corresponding to this entry also contain an
objectClass: line and a dc:
line. These extra lines provide additional information stored inside
each entry. The next few sections answer the following questions:




  • What is an attribute?


  • What does the value of the objectClass attribute
    mean?


  • What is the dc attribute?


  • If dc=plainjoe,dc=org is the top entry in the
    directory, where is the entry for dc=org?









    [ Team LiB ]



    No comments: