XML Directories, New Trends and Opportunities

XML.org Newsletter, Volume 1, Issue 3

By Moshe Shadmon and Neal Sample - RightOrder, Inc.

Who Needs Directories?

Directories are the hub around which virtually all middleware services spin. They have the important task of storing and delivering critical information to people, processes, resources and groups. Keeping this information in a common storage area means that distributed users and applications can all access a consistent and comprehensive source of critical data. Directories differ somewhat from general-purpose databases: they are optimized for reads rather than for transactions, and they frequently hold institutional and personal information used by myriad applications. Directories will be among the most critical services offered in future information technology environments.

Directories or Relational Databases?

When should you choose a directory implementation over a full-fledged relational database (RDBMS)? Directories are usually the right choice for hierarchical information such as Human Resources systems, UDDI, pervasive computing, product catalogs, etc. There are still times when an RDBMS is the better choice. A comparison of the features of each shows their respective strengths.

Relational Databases | Directories
-------------------- | -----------
Strongly typed and structured | Strongly typed and structured
Objects have complex relationships to each other | Objects are nested in hierarchies
Read/write transaction performance is critical | Directory entries are “read mostly”
The database is generally centralized; expensive to distribute and to query/update | Can be highly distributed; reasonable cost of distribution and replication
Schema is completely user defined for flexibility | Fixed “core schema” controls the directory hierarchy (e.g., country/organization/people); schema for individual objects is highly extensible
Can deal with complex relationships between objects | Representing non-hierarchical relationships is expensive
Good for data analysis and report generation | Good for top-down searches of logical hierarchies
Relationships are known to the query processor | Relationships can be explored in the query processing

These features indicate that some applications are not suitable for directories, especially those that need to link information across entries. Examples of such applications are Enterprise Resource Planning (ERP) and accounting systems.
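The “top-down search” strength listed for directories can be illustrated with a small Python sketch. It assumes a toy in-memory directory keyed by LDAP-style distinguished names (DNs) arranged as country/organization/people; the entries, the `rdn_path` and `subtree_search` helpers, and the naive comma splitting (which ignores the escaping rules of RFC 4514) are illustrative assumptions, not how a production directory server indexes its data.

```python
from typing import Dict, List


def rdn_path(dn: str) -> List[str]:
    """Split a DN into its RDNs, root first, case-folded for comparison.

    Naive on purpose: splitting on "," ignores the escaping rules of
    RFC 4514, so this is for illustration only.
    """
    return [rdn.strip().casefold() for rdn in reversed(dn.split(","))]


def subtree_search(entries: Dict[str, dict], base_dn: str) -> List[str]:
    """Return the DNs of all entries at or below base_dn, sorted."""
    base = rdn_path(base_dn)
    # An entry lies in the subtree if its root-first path starts with the base path.
    return sorted(dn for dn in entries if rdn_path(dn)[: len(base)] == base)


# Hypothetical country/organization/people hierarchy.
directory = {
    "c=US": {"objectClass": ["country"]},
    "o=Acme,c=US": {"objectClass": ["organization"]},
    "cn=Jane Doe,o=Acme,c=US": {"objectClass": ["person"], "mail": ["jane@acme.example"]},
    "cn=John Roe,o=Acme,c=US": {"objectClass": ["person"]},
    "o=Globex,c=US": {"objectClass": ["organization"]},
    "cn=Ann Lee,o=Globex,c=US": {"objectClass": ["person"]},
}

if __name__ == "__main__":
    for dn in subtree_search(directory, "o=Acme,c=US"):
        print(dn)
```

A real directory answers such queries from indexes rather than by scanning every entry; the linear scan here only keeps the idea of matching a path prefix visible.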


Directories are Power

The Lightweight Directory Access Protocol (LDAP) is currently integrated into many products, from mail directories to public-key infrastructures to network components, and much more is coming. Applications that integrate directories have been successful for many reasons. One reason is that a core schema enables a common access protocol, so many applications can leverage the same data source. Clients can have basic directory knowledge “built in.”

However, LDAP as the central component of directories is somewhat limiting. It is supposed to be an access protocol for directory data, but it demands that input and output data conform to a fairly strict construction. Directories are therefore limited to some degree by the prosaic protocol used to access the underlying information. What is missing from directories is the fundamentally enabling nature of a self-describing system.

The combination of directory services and XML is the final step in creating directory-enabled applications within web service architectures. XML is a self-describing language for data of any type. But XML implies much more than just the ability to wrap data into a convenient bundle. XML technologies have been developed to deal with ragged and incomplete data in a robust manner. For instance, validating parsers can make guarantees to the application about the elements in a document. Is the question of validity relevant to LDAP? Perhaps, but peripherally at best.

Querying and using directory data packaged as XML opens new opportunities for applications. Applications no longer have to rely even on the “core schema” of a directory to be effective: directory schemas may change, but XML-enabled applications are relatively immune to the effects. Also, applications written for one directory can be used with another directory by using common tools (such as XSLT) to bridge the gap between their schemas. These approaches are natural to XML applications, but similar ideas are not part of LDAP.
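As a rough illustration of bridging two directories, the following Python sketch renames elements from one hypothetical directory export into the vocabulary an application expects, and the application then reads only the fields it knows. XSLT would be the more usual tool for this mapping; plain Python with `xml.etree.ElementTree` is swapped in here only because the standard library has no XSLT processor. The element names and the `RENAME` mapping are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical export from "directory B", whose element names differ from
# the ones the application was written against.
DIRECTORY_B_XML = """<people>
  <employee>
    <fullName>Jane Doe</fullName>
    <email>jane@acme.example</email>
    <badgeColor>blue</badgeColor>
  </employee>
</people>"""

# Directory B name -> name the application expects (illustrative only).
# In practice an XSLT stylesheet would usually express this mapping.
RENAME = {"employee": "person", "fullName": "cn", "email": "mail"}


def bridge(root: ET.Element) -> ET.Element:
    """Rename mapped elements in place; unmapped elements keep their tags."""
    for elem in root.iter():
        elem.tag = RENAME.get(elem.tag, elem.tag)
    return root


def read_people(root: ET.Element) -> list:
    """The application: read only the fields it knows and ignore the rest.

    findtext() returns None for a field that is absent, so a missing
    element does not raise an error.
    """
    return [
        {"cn": person.findtext("cn"), "mail": person.findtext("mail")}
        for person in root.iter("person")
    ]


if __name__ == "__main__":
    print(read_people(bridge(ET.fromstring(DIRECTORY_B_XML))))
```

The unknown `badgeColor` element survives the bridge untouched, and the application simply never looks at it; that tolerance is what lets the directory's schema grow without breaking the reader.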


First and foremost, directory technology needs to efficiently support the hierarchical nature of the data. It should also support schema extensibility and evolution, and it should be scalable. None of these requirements (save perhaps scalability) is central to LDAP. Directory services have been driven by the access protocol for those directories, putting the proverbial cart before the horse.

With a more flexible framework surrounding the directory and driving its development, the way is clear for robust directory applications. For example, XPath can be used to address parts of an XML document, and it provides the mechanism needed to support hierarchical directory paths.
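A minimal sketch of XPath addressing a directory hierarchy, using the limited XPath subset supported by Python's `xml.etree.ElementTree`. The `<directory>`, `<country>`, `<organization>` and `<person>` element layout is a hypothetical encoding chosen for illustration, not a standard one.

```python
import xml.etree.ElementTree as ET

# Hypothetical directory laid out as country/organization/person,
# mirroring a typical directory's core hierarchy.
DIRECTORY_XML = """<directory>
  <country name="US">
    <organization name="Acme">
      <person cn="Jane Doe" mail="jane@acme.example"/>
      <person cn="John Roe"/>
    </organization>
    <organization name="Globex">
      <person cn="Ann Lee"/>
    </organization>
  </country>
</directory>"""

root = ET.fromstring(DIRECTORY_XML)

# A hierarchical directory path written as an XPath location path.
# ElementTree implements only a subset of XPath 1.0, but that subset
# includes child steps and [@attr='value'] predicates.
path = "./country[@name='US']/organization[@name='Acme']/person"
acme_people = [p.get("cn") for p in root.findall(path)]
print(acme_people)

# ".//person" matches at any depth, much like a directory subtree search.
everyone = [p.get("cn") for p in root.findall(".//person")]
print(everyone)
```

Full XPath 1.0 (axes, functions, boolean predicates) would need a complete XPath engine; the subset above is enough to show how a path such as country/organization/person maps directly onto an XML location path.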

Why XML Directories?

Directory Services Markup Language (DSML) is a markup language for representing directory services in XML. DSML is a key enabler for the next stage of flexible and robust directory applications. DSML is being established as an open standard, so that developers and vendors will be able to adopt it into their systems. Two questions remain for DSML:

- Are there clear reasons DSML is superior to LDAP for directories?

- Even if DSML is a superior option, is LDAP too entrenched?

By now, the answer to the first question should be clear. DSML, as a flavor of XML, benefits from both XML’s inherent flexibility and the plethora of available XML tools. On the flexibility side, DSML directories can evolve, change, and grow without significantly impacting dependent applications. Likewise, applications can remain broadly applicable because they accept a flexible input set.
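To make the flexibility argument concrete, here is a sketch of an application reading DSML with Python's `xml.etree.ElementTree`. The document is modeled on the DSML 1.0 entry layout (`entry` elements carrying a `dn` attribute, `objectclass`/`oc-value`, and `attr`/`value`) under the namespace `http://www.dsml.org/DSML`; treat those names as assumptions to check against the specification. The hypothetical `read_entries` function extracts only the attributes it needs, so an attribute added to the directory, such as `roomNumber` below, does not break it.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for DSML 1.0 documents (verify against the specification).
NS = {"dsml": "http://www.dsml.org/DSML"}

# Hypothetical DSML-style document. The second entry carries an attribute
# ("roomNumber") that the reading application knows nothing about.
DSML_DOC = """<dsml:dsml xmlns:dsml="http://www.dsml.org/DSML">
  <dsml:directory-entries>
    <dsml:entry dn="cn=Jane Doe,o=Acme,c=US">
      <dsml:objectclass><dsml:oc-value>person</dsml:oc-value></dsml:objectclass>
      <dsml:attr name="cn"><dsml:value>Jane Doe</dsml:value></dsml:attr>
      <dsml:attr name="mail"><dsml:value>jane@acme.example</dsml:value></dsml:attr>
    </dsml:entry>
    <dsml:entry dn="cn=John Roe,o=Acme,c=US">
      <dsml:objectclass><dsml:oc-value>person</dsml:oc-value></dsml:objectclass>
      <dsml:attr name="cn"><dsml:value>John Roe</dsml:value></dsml:attr>
      <dsml:attr name="roomNumber"><dsml:value>42</dsml:value></dsml:attr>
    </dsml:entry>
  </dsml:directory-entries>
</dsml:dsml>"""


def read_entries(doc: str, wanted=("cn", "mail")) -> dict:
    """Map each entry's DN to the attributes the application cares about.

    Attributes outside `wanted` are skipped and missing ones are simply
    absent, so schema additions in the directory do not break the reader.
    """
    root = ET.fromstring(doc)
    result = {}
    for entry in root.iterfind("dsml:directory-entries/dsml:entry", NS):
        attrs = {}
        for attr in entry.iterfind("dsml:attr", NS):
            name = attr.get("name")
            if name in wanted:
                attrs[name] = [v.text for v in attr.iterfind("dsml:value", NS)]
        result[entry.get("dn")] = attrs
    return result


if __name__ == "__main__":
    for dn, attrs in read_entries(DSML_DOC).items():
        print(dn, attrs)
```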


In terms of tools, there’s no question that developing with a ubiquitous standard such as XML has clear benefits. There are myriad tools to choose from, at all levels of deployment, for XML application developers, and XML libraries are available for every significant platform and language. Because these tools are designed for a much larger, more general set of developers, XML is naturally better supported than LDAP.

The second question remains: is LDAP too entrenched? A quick look at the founding partners of the DSML initiative shows that they include the major directory product vendors. Sun, Novell, IBM, Oracle, and Microsoft are among that founding group, which now includes more than 25 members [http://www.dsml.org/participants.html].

With strong backers clearly in place, the final question concerns existing installations of LDAP directories. Will they continue to hobble future development because of the cost of abandoning or reengineering them? It is doubtful that even the installed base of LDAP servers can stem the tide of DSML. Wrappers that transform LDAP sources into DSML servers already exist; one example can be found at [http://www.dsmltools.org/].
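The core of such a wrapper can be sketched in a few lines. The hypothetical `entries_to_dsml` function below takes `(dn, {attribute: [values]})` pairs, a shape many LDAP client libraries can produce from a search, and serializes them with element names and a namespace modeled on DSML 1.0 (assumptions to verify against the specification). It does not talk to an LDAP server itself; the sample rows stand in for search results.

```python
import xml.etree.ElementTree as ET

DSML_NS = "http://www.dsml.org/DSML"  # assumed DSML 1.0 namespace
ET.register_namespace("dsml", DSML_NS)


def q(local: str) -> str:
    """Qualified tag name in the DSML namespace ({uri}local form)."""
    return f"{{{DSML_NS}}}{local}"


def entries_to_dsml(entries) -> str:
    """Serialize (dn, {attr: [values]}) pairs, e.g. LDAP search results, as DSML.

    objectClass values go into <dsml:objectclass>/<dsml:oc-value>;
    every other attribute becomes <dsml:attr name="...">/<dsml:value>.
    """
    root = ET.Element(q("dsml"))
    container = ET.SubElement(root, q("directory-entries"))
    for dn, attrs in entries:
        entry = ET.SubElement(container, q("entry"), dn=dn)
        for name, values in attrs.items():
            if name.lower() == "objectclass":
                oc = ET.SubElement(entry, q("objectclass"))
                for value in values:
                    ET.SubElement(oc, q("oc-value")).text = value
            else:
                attr = ET.SubElement(entry, q("attr"), name=name)
                for value in values:
                    ET.SubElement(attr, q("value")).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical rows standing in for the results of an LDAP search.
    ldap_results = [
        ("cn=Jane Doe,o=Acme,c=US",
         {"objectClass": ["top", "person"], "cn": ["Jane Doe"], "sn": ["Doe"]}),
    ]
    print(entries_to_dsml(ldap_results))
```

Registering the `dsml` prefix makes the output use `dsml:` element names rather than ElementTree's default `ns0:` prefix; note that `register_namespace` changes module-wide state, which is acceptable in a small wrapper but worth knowing in a larger program.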


Summary

XML is a proven technology that has been used successfully at all tiers of the business application hierarchy. Likewise, directory services have been an important component in many applications. Current directory components are frequently LDAP servers, which are not XML-enabled.

DSML bridges the gap between directory services and XML applications in a clean, robust way. DSML for directories, as an alternative to LDAP, enables a new level of application flexibility and allows mature XML tools and practices to add to the value of directory components.

The coupling of directories and XML imposes new requirements on the storage and retrieval of data. This coupling clears the way for new types of applications that will demand larger data volumes with better performance and scalability. Today, LDAP directories thrive with a fixed, predefined schema. The use of XML will force directories to deal with the non-trivial issues of schema evolution and change. All of this suggests a need for new technologies to support directory/hierarchical/XML-based data structures.

In the next article we will approach these issues by discussing and comparing some of the existing index technologies with some of the new approaches to indexing structured and semi-structured data.