stpeter@jabber.org
2001-03-11
An overview of the architecture of the Jabber instant messaging system and server.
The first application of Jabber technology is an instant messaging and presence system that originated in and continues to be developed by the open-source community. The Jabber instant messaging (IM) system is distinguished from existing IM services by several key features:
XML foundation
distributed network
open protocol and codebase
modular, extensible architecture
This document provides a high-level overview of the architecture of Jabber, focusing especially on the design of the Jabber open-source server, which is now at version 1.4. For information regarding Jabber's XML protocol, please refer to the companion document entitled Jabber Protocol Overview.
(Note: this document incorporates content from Jeremie Miller's "Jabber Architecture Overview" of 1999-11-19, Peter Millard's earlier version of this document dated 2000-04-25, and Peter Saint-Andre's "Jabber Technical White Paper" dated 2000-11-06.)
Jabber was designed in large measure along the same lines as the most successful messaging system on the Internet: namely, email. Thus Jabber communications are made possible by a distributed network of servers that use a common protocol, to which specialized clients connect to receive messages as well as to send messages to users of the same server or any other Jabber server that is connected to the Internet.
However, whereas email is a store-and-forward system, Jabber delivers messages in close to real time because the Jabber server (and, by extension, other Jabber users) knows when a particular user is online. This knowledge of availability is called presence and is the key enabler of instant messaging. Jabber combines these standard IM characteristics with two additional features that make Jabber unique. The first is an open protocol which enables interoperability among messaging systems. The second is a strong foundation in XML, which makes structured, intelligent messaging possible not only between human users but also between software applications.
Each of these key features is described in a bit more detail below, then expanded where appropriate within the body of this document.
Jabber uses a client-server architecture, not a client-to-client architecture as some instant messaging systems do. All Jabber messages and data from one client to another must go through the server. Any client is free to negotiate a direct connection to another client, but those connections are for application-specific usage only. There are even specific instances where this is encouraged, such as file transfers, but those instances are negotiated first within the context of a client-server framework.
Jabber's network architecture is modeled after that of the email system. Each user has a local server which receives information for them, and the servers transfer messages and presence information among themselves. There can exist any number of Jabber servers which accept connections from clients as well as communicate with other Jabber servers. Each server functions independently of the others, and maintains its own user list. Any Jabber server can talk to any other Jabber server that is accessible via the Internet. A particular user is associated with a specific server (either through registration with a service provider or administrative setup within an enterprise), and Jabber addresses are of the same form as email addresses, e.g., stpeter@jabber.org (further information about addressing in Jabber is provided in the Jabber ID section below).
The Jabber server plays two primary roles:
Listening for client connections and communicating directly with client applications.
Communicating with other Jabber servers.
The Jabber open-source server is designed to be modular, with specific code packages that handle functionality such as user authentication, data storage (offline messages, rosters, user info), and the like. In addition, the server can be extended with additional services, such as integrated security, special connections for server-side components or alternative clients, and gateways to other messaging systems.
As an example of such modularity, the exchange of messages and presence information between Jabber and any given non-Jabber messaging system is made possible by means of a separate "transport" that translates Jabber XML into the foreign protocol. Such transports are not part of the core server. Instead, they are server-side programs that can be added rather easily to the core server to provide enhanced functionality to the end user.
One of the design criteria for the Jabber system was that it must be capable of supporting simple clients (e.g., even something as simple as a telnet connection). Indeed, the Jabber architecture imposes very few restrictions on clients. The only features which a client must support are:
Communicate with the Jabber server via TCP sockets.
Parse and interpret well-formed XML packets.
Understand message data types.
The preference in Jabber is to move complexity from clients to the server. This makes it relatively easy to write clients (as witness the wide variety of Jabber clients available today) as well as to update the functionality of the system (i.e., without forcing users to download new clients). Jabber clients communicate with the server in XML through TCP sockets over port 5222, and do not normally communicate with each other directly. In practice, many of the low-level functions of the client (e.g., parsing XML and understanding basic Jabber XML such as <message/>, <presence/>, and <iq/>) are handled by Jabber client libraries, enabling client developers to focus on the user interface.
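The three client requirements listed above are small enough to sketch with a standard library alone. The following is a minimal illustration in Python, not a real client library: the function names are hypothetical, authentication is left out, and send_one_message is shown only to make the TCP step concrete.

```python
import socket
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

STREAM_NS = "http://etherx.jabber.org/streams"
CLIENT_PORT = 5222  # client port named in the text


def open_stream(host: str) -> str:
    """Return the opening tag of a client-to-server XML stream."""
    return (
        f"<stream:stream to={quoteattr(host)} "
        f"xmlns='jabber:client' xmlns:stream='{STREAM_NS}'>"
    )


def build_message(to: str, body: str, msg_type: str = "chat") -> str:
    """Serialize a <message/> packet; ElementTree escapes the body text."""
    msg = ET.Element("message", {"to": to, "type": msg_type})
    ET.SubElement(msg, "body").text = body
    return ET.tostring(msg, encoding="unicode")


def packet_kind(packet_xml: str) -> str:
    """Parse one complete packet and report which basic element it is."""
    tag = ET.fromstring(packet_xml).tag
    if tag not in ("message", "presence", "iq"):
        raise ValueError(f"not a basic Jabber element: {tag}")
    return tag


def send_one_message(host: str, to: str, body: str) -> None:
    """Open a TCP socket, start a stream, and send a single message.

    Authentication is omitted, so a real server would not deliver this.
    """
    with socket.create_connection((host, CLIENT_PORT), timeout=10) as sock:
        sock.sendall(open_stream(host).encode("utf-8"))
        sock.sendall(build_message(to, body).encode("utf-8"))
        sock.sendall(b"</stream:stream>")
```

Everything beyond these few functions (rosters, presence handling, the user interface) is where a client library or the server takes over.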
XML is an integral part of the Jabber architecture because it is of utmost importance that the architecture be fundamentally extensible and able to express almost any structured data. (Specifically, Jabber utilizes XML Streams for client-server and server-server communication. The XML Stream is always initiated by the client to the server, and the lifetime of the XML Stream is directly associated with the lifetime of that user's online session.)
While Jabber is strongly committed to XML, it is at the same time agnostic with regard to the delivery medium: there are no inherent restrictions to the delivery system, and no knowledge within the architecture of the delivery system. This is to enable, among other things, the building of transports that provide transparent messaging to third-party services (e.g., IRC, ICQ, AIM). However, within the Jabber system, the transport speaks XML, as does every other component in the Jabber system. Further information about the Jabber XML protocol may be found in the companion Jabber Protocol Overview document.
The Jabber server consists of multiple components that handle logically separate functions within the Jabber system. At the heart of the server lies a deliver component whose sole function is to direct deserialized XML from one base component to another. There are four such base components: accept, connect, exec, and load. All of the base components deserialize XML input for delivery to other base components and reserialize XML for use by components that are downstream from the base components. Here is a high-level view of the architecture just described:
On server start-up, the components of the Jabber server register callbacks for their responsibilities with the main Jabber daemon (as defined in the server configuration file), and then handle packets that are associated with those responsibilities (thereby defining the Delivery Logic for all packets). The core Jabber server includes components that handle the following common tasks:
session management
client-to-server communication
server-to-server communication
DNS resolution
user authentication
user registration
database lookups
storing messages for offline users
storing and retrieving vCards
filtering messages based on user preferences
group chat (many-to-many communication)
system logging
In addition, the core server can be supplemented with "transports" designed to handle protocols that are foreign to Jabber's open XML format (see Transports for details). These transports function naturally as components within the overall server architecture. The transports currently in existence translate to and from the following protocols:
AOL Instant Messenger (AIM)
ICQ
Internet Relay Chat (IRC)
MSN Messenger
Rich Site Summary (RSS 0.9)
Yahoo! Messenger
(Note: Additional transports will be added to Jabber as needed, for example to handle the IMUnified format when it is made public, but future transports are not addressed in this document.)
A good entrée to Jabber architecture is to look at the flow of a typical message through the server. (While the XML 'message' element is only one of the three main elements in Jabber's open XML protocol, it is the one most central to the purpose of Jabber: to route information from one point to another using XML.)
Here is a diagram of that flow:
The Jabber server (here represented by the term 'jabberd', short for "Jabber daemon") expects to receive packets of type 'message' in the context of a user session on that host, which normally will take place through a dedicated TCP socket on port 5222 (or port 5223 if SSL is enabled and in use). If a session does not exist, jabberd will initiate the authentication flow as described below under Authentication. If a session does exist, the message packet will be sent to the Jabber session manager component ('JSM' for short).
Here is a sample of what the XML might look like:
<message to='psaintandre@aim.jabber.org' type='chat'>
  <body>Hey, the AIM transport is working great!</body>
</message>
Next, the JSM checks the hostname of the destination server against the list of names contained in the Jabber server's internal configuration file. Often the hostname will be defined; for example, aim.jabber.org is defined in the config file on Jabber.com's server to point to the AIM Transport for that host (which could be on a separate machine). If the hostname is not defined in the config file, the 'dnsrv' component will resolve the hostname to an IP number and port. Either way, the message packet will next be sent on to the server-to-server ('s2s') component for the host in question, in this example jabber.org. From the server-to-server component, the packet will be sent directly to the appropriate external Jabber server (e.g., jabber.org) or to a transport on this host. In this example, the message packet is intended for delivery to an address on aim.jabber.org, so the packet will be sent to the AIM Transport on jabber.org for subsequent delivery to an AOL Instant Messenger account (see Transports below). In either case, the end result is that a message has flowed from a Jabber client through a Jabber server to either another Jabber server or a foreign IM system.
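The hostname decision described above can be sketched as a small lookup. This is an illustration only: the function names, the shape of the configuration table, and the resolver callback are assumptions, not the actual jabberd or dnsrv interfaces.

```python
from typing import Callable, Dict, Tuple

# Hypothetical resolver signature: hostname -> (ip, port).
Resolver = Callable[[str], Tuple[str, int]]


def host_of(jid: str) -> str:
    """Extract the hostname from an address such as user@host/resource."""
    return jid.split("/", 1)[0].split("@")[-1]


def next_hop(host: str, configured: Dict[str, str],
             resolve: Resolver) -> Tuple[str, str]:
    """Decide where a packet for `host` goes next.

    Hostnames defined in the configuration map straight to a named
    destination (for example a transport); anything else is resolved
    to an IP number and port, as the dnsrv component is described doing.
    """
    if host in configured:
        return ("configured", configured[host])
    ip, port = resolve(host)
    return ("resolved", f"{ip}:{port}")
```

For example, with a configuration table of {"aim.jabber.org": "aim-transport"}, a packet addressed to psaintandre@aim.jabber.org yields ("configured", "aim-transport"), while any other hostname falls through to the resolver.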
As mentioned already under Basic Message Flow, messages and presence notification are sent in Jabber within the context of a user's session on a host machine running the Jabber server. In the terms of the Jabber protocol, this session is maintained by means of two XML streams, one from the client to the server and one from the server to the client. Here is what the XML might look like for a session:
SEND:<stream:stream
SEND:to='jabber.org'
SEND:xmlns='jabber:client'
SEND:xmlns:stream='http://etherx.jabber.org/streams'>
RECV:<stream:stream
RECV:xmlns:stream='http://etherx.jabber.org/streams'
RECV:id='39ABA7D2'
RECV:xmlns='jabber:client'
RECV:from='jabber.org'>
SEND:<iq id='1' type='set'>
SEND:<query xmlns='jabber:iq:auth'>
SEND:<username>stpeter</username>
SEND:<resource>Gabber</resource>
SEND:<digest>f1e881517e9917bb815fed112d81d32b4e4b3aed</digest>
SEND:</query>
SEND:</iq>
RECV:<iq id='6' type='result'/>
(XML for user session goes here)
SEND:</stream:stream>
RECV:</stream:stream>
However, in order for the server to create a session, it must first authenticate the user. The following diagram captures the activity flow for authentication:
The authentication flow begins when the client connects to the host and initiates an XML stream. Immediately, the Jabber server checks for a packet of type 'iq' (short for info/query) and subtype 'query' in the 'jabber:iq:auth' namespace, containing authenticating information for the user. This authenticating information must consist of a username and resource along with a plaintext password (for obvious reasons this is discouraged), a password scrambled using the SHA1 algorithm (this authentication scheme, a.k.a. "digest authentication", is the default), or appropriate data for zero-knowledge authentication.
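As a rough illustration of digest authentication, the sketch below computes a SHA1 digest and wraps it in a jabber:iq:auth packet. The text above says only that the password is scrambled with SHA1; hashing the stream ID concatenated with the password is an assumption here, following the convention later documented for this namespace. The function names are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def auth_digest(stream_id: str, password: str) -> str:
    """Hex SHA1 of stream ID + password (assumed concatenation order)."""
    return hashlib.sha1((stream_id + password).encode("utf-8")).hexdigest()


def build_auth_iq(username: str, resource: str, stream_id: str,
                  password: str, iq_id: str = "1") -> str:
    """Build an <iq type='set'/> packet carrying digest credentials."""
    iq = ET.Element("iq", {"id": iq_id, "type": "set"})
    query = ET.SubElement(iq, "query", {"xmlns": "jabber:iq:auth"})
    ET.SubElement(query, "username").text = username
    ET.SubElement(query, "resource").text = resource
    ET.SubElement(query, "digest").text = auth_digest(stream_id, password)
    return ET.tostring(iq, encoding="unicode")
```

Because the stream ID changes with every connection, a digest captured from one session is of no use for replaying into another, which is the main advantage over sending the plaintext password.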
Once this information is received, the XML parser passes control to the 'deliver' component of the Jabber server, which will begin to buffer incoming XML if the client continues to send XML without waiting for authentication. The host (usually, but not always, in the form of the JSM) will then pass the authentication packet to the 'xdb' component of the Jabber server. The xdb component ('xdb' stands for "Xml Data Base") will send the packet to whichever sub-component has registered for that type of authentication packet: for example, plaintext authentication packets might be checked against XML files on the filesystem using the 'xdb_file' sub-component, whereas digest authentication packets might be checked against LDAP using the 'xdb_ldap' sub-component. All that the deliver component needs to do is hand off the authentication packet to the xdb component, which will send it to the appropriate sub-component. In addition, to improve performance, the xdb_ldap component has its own thread pool, which functions in a way similar to the model used for Threading in the session manager.
The xdb component will return the result of the authentication query to the host (again, usually the JSM). If the authentication failed, the server will return error code 401 to the client and will not initiate a session. If the authentication succeeded, the JSM will start a session (and free up the XML buffer if necessary). From that point forward, all presence, message, and iq elements will be passed back and forth in the context of a user session until the client or server terminates the session by sending a closing stream tag (</stream:stream>).
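The hand-off from xdb to a registered sub-component can be pictured as a simple registry. The sketch below is a model under stated assumptions: the Xdb class, its method names, and the in-memory backend are hypothetical stand-ins and do not reflect the real xdb_file or xdb_ldap code.

```python
from typing import Callable, Dict

# A backend takes (username, credential) and reports success or failure.
Backend = Callable[[str, str], bool]


class Xdb:
    """Routes authentication requests to whichever backend registered."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, auth_type: str, backend: Backend) -> None:
        self._backends[auth_type] = backend

    def authenticate(self, auth_type: str, username: str,
                     credential: str) -> bool:
        backend = self._backends.get(auth_type)
        if backend is None:
            raise LookupError(f"no sub-component registered for {auth_type}")
        return backend(username, credential)


def make_memory_backend(accounts: Dict[str, str]) -> Backend:
    """Stand-in for a file-based store: compares against a dict."""
    def check(username: str, credential: str) -> bool:
        return accounts.get(username) == credential
    return check
```

The caller never needs to know which store sits behind a given authentication type, which matches the division of labour between deliver and xdb described above.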
Here is the activity flow of the Jabber session manager:
As mentioned, the Jabber session manager component (often shortened to 'JSM') handles packets of type message, presence, and iq to and from a Jabber user who is connected to a Jabber host. However, the JSM will also handle packets intended for a user while that user is offline. For example, let us say that you send a message to me via my Jabber ID (stpeter@jabber.org) even though I am not online. The JSM will handle that message appropriately, most likely by storing it until I am again online.
The JSM differentiates between online and offline users by looking for the 'resource' element in the XML stream (the 'resource' is the device, client, or location with or from which I am connected; examples of these might be 'laptop', 'Gabber', and 'home'). Normally, a user is offline if the packet does not contain a resource element. However, sometimes the resource element is left off in error, so the JSM checks to see if the user really is offline before sending a packet to the 'offline' component, which might (for example) store a message or retrieve a vCard.
If the user is online, the message, presence, or iq packet is not sent to the offline component but instead is handled by the JSM. In essence, any such packet can have only one of two possible states: either it is intended to be delivered to the user or it is being sent from the user. So the JSM contains two listeners that listen for packets "to" or "from" the user and then route them to the appropriate module within the Jabber server. Once the appropriate module has handled the packet, the packet is sent back to the listeners for further processing by more modules or, if all processing has been completed, the packet is sent out to or from the user.
It may be helpful to look at an example. Let's say that I receive a message from foobar@jabber.org. I am online, so the message is sent to the JSM. The "to listener" hears about a packet that is intended for me and sends out a call to the modules that have registered with the JSM. The first module that responds is mod_filter, which sorts through incoming messages according to criteria set by the user. In this case (since I never seem to get anything critically important from our friend foobar), I have configured mod_filter to forward all messages from foobar@jabber.org to my email box using the hypothetical but planned SMTP Transport. Let us say that mod_filter reformats the message so that the host of the intended recipient is now smtp.jabber.org instead of jabber.org, then sends the packet back to the "to listener". Another call goes out to the registered modules, but none reply, so the packet is sent to stpeter@smtp.jabber.org, which duly forwards it to my email inbox.
The important thing to note is that this process is iterative, so that multiple modules may handle a packet before it is finally sent to or from a user. This gives the JSM a great deal of flexibility and extensibility, since new functionality can be added to the server simply through the addition of a new module (and appropriate changes to the server configuration file) without making changes to the JSM itself or to existing modules.
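The iterative hand-off between a listener and its modules can be sketched as a loop that keeps offering the packet to the registered modules until none of them replies. This is a simplified model: packets are plain dictionaries, and run_listener and this toy mod_filter are hypothetical, not the real JSM module API.

```python
from typing import Callable, Dict, List, Optional

Packet = Dict[str, str]
# A module returns a changed packet if it handled it, or None to pass.
Module = Callable[[Packet], Optional[Packet]]


def mod_filter(packet: Packet) -> Optional[Packet]:
    """Redirect messages from foobar@jabber.org to the SMTP transport."""
    node, _, host = packet["to"].partition("@")
    if packet["from"] == "foobar@jabber.org" and host == "jabber.org":
        return {**packet, "to": f"{node}@smtp.jabber.org"}
    return None


def run_listener(packet: Packet, modules: List[Module],
                 max_rounds: int = 10) -> Packet:
    """Offer the packet to the modules until no module replies."""
    for _ in range(max_rounds):
        for module in modules:
            changed = module(packet)
            if changed is not None:
                packet = changed
                break  # back to the listener for another round
        else:
            return packet  # nobody replied: ready for delivery
    raise RuntimeError("packet still being rewritten after max_rounds")
```

On the second round mod_filter no longer matches, because the recipient host is already smtp.jabber.org, so the loop ends and the packet is delivered. The max_rounds guard is an addition of this sketch to keep two badly configured modules from bouncing a packet forever.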
The Jabber session manager uses threading to improve performance. On server start-up, a number of threads are assigned to the thread pool (the exact number is determined in the configuration file). As message packets are fed to the session manager through the base load component from other parts of the system, the session manager dynamically pulls unused threads from the thread pool and associates them with the message ports for which queued packets are intended. (A "message port" is a data structure that supports a client connection.) If no threads are available in the pool, the session manager may (but is not required to) create a new thread and associate it with the appropriate message port. Here is a visual representation of the process:
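The pool behaviour described above can also be modelled in a few lines of code. The sketch below tracks thread assignments by name rather than starting real threads, so that the bookkeeping is easy to follow; the class and method names are hypothetical, and reusing a thread already bound to a message port is an assumption of this model.

```python
from typing import Dict, List, Optional


class SessionThreadPool:
    """Bookkeeping model: which pooled thread serves which message port."""

    def __init__(self, size: int, allow_growth: bool = True) -> None:
        self.idle: List[str] = [f"thread-{i}" for i in range(size)]
        self.assigned: Dict[str, str] = {}  # message port -> thread name
        self.allow_growth = allow_growth
        self.created = size

    def dispatch(self, port: str) -> Optional[str]:
        """Return the thread handling `port`, or None if it must wait."""
        if port in self.assigned:
            return self.assigned[port]
        if self.idle:
            thread = self.idle.pop()
        elif self.allow_growth:
            # Pool exhausted: the session manager may create a new thread.
            thread = f"thread-{self.created}"
            self.created += 1
        else:
            return None  # packet stays queued until a thread is released
        self.assigned[port] = thread
        return thread

    def release(self, port: str) -> None:
        """Return the port's thread to the pool once its queue is empty."""
        self.idle.append(self.assigned.pop(port))
```

The allow_growth flag corresponds to the "may (but is not required to)" wording above: with growth disabled, a packet for a new message port simply waits until some thread is released.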
The deliver component is the heart of the server, since it moves data from one base component to another. The logic for handling data at this level is shown below:
Once a packet is delivered to one of the base components (accept, connect, exec, or load), it may be sent to a sub-component such as jpolld or xdb_ldap for further processing.
An example of a precondition might be an xdb result (e.g., from a database get) that needs to be handled. An example of a process condition might be the addition of a route namespace for use within JSM. And an example of changing a packet for delivery might be a change in the format of the message, e.g. by adding a from address.
Although the construction of a robust, XML-based messaging system is the core goal of the Jabber project, an important sub-goal is the achievement of interoperability between messaging systems. Fundamentally, the Jabber project contributes to interoperability by making its protocol completely open. However, it also contributes by enabling communication between Jabber's open XML format and numerous non-Jabber formats through the use of what in the Jabber world are called "transports".
When a Jabber user sends a message to a user on a foreign system, the delivery of that message involves the work of a transport component. The user's Jabber client sends a message to the Jabber server intended for a user on a foreign IM system, denoted by a Jabber ID that contains the name of the foreign system (e.g., psaintandre@aim.jabber.org). The Jabber server then routes the data to the appropriate transport application. If the transport is local (running on the same machine), the Jabber server communicates directly with it. If the transport is running remotely (on another machine), then the local server passes the packet to the remote server, which then passes it to the appropriate transport. Once the transport receives the XML packet, it "transforms" the message (or instructions) into a native packet which is readable by the other IM network, and passes it to that IM network.
Here is a high-level view of what Jabber transports do:
In essence, a transport implements the proxy pattern. Most transports contain their own small session manager, which translates Jabber XML into and out of the "foreign" (non-Jabber) protocol for presence, messaging, and (in some cases) info/query requests. In general, when a user logs onto Jabber, a thread is created in the transport to handle all communications to and from that user.
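To make the translation step concrete, here is a minimal sketch of the message half of an IRC-style transport. It assumes that the node part of the Jabber address maps directly to an IRC nickname and that a transport is reachable at a hostname such as irc.jabber.org; both are illustrative assumptions, and the function names are hypothetical. Presence and info/query translation are left out.

```python
import xml.etree.ElementTree as ET


def jabber_to_irc(message_xml: str) -> str:
    """Translate a Jabber <message/> into an IRC PRIVMSG line."""
    msg = ET.fromstring(message_xml)
    nick = msg.get("to", "").split("@", 1)[0]
    # IRC messages are single lines, so fold any newlines in the body.
    body = msg.findtext("body", default="").replace("\n", " ")
    return f"PRIVMSG {nick} :{body}\r\n"


def irc_to_jabber(line: str, transport_host: str, to_jid: str) -> str:
    """Translate an incoming ':nick!user@host PRIVMSG target :text' line."""
    prefix, command, rest = line.rstrip("\r\n").split(" ", 2)
    if command != "PRIVMSG" or not prefix.startswith(":"):
        raise ValueError("only prefixed PRIVMSG lines are handled here")
    _target, text = rest.split(" :", 1)
    nick = prefix[1:].split("!", 1)[0]
    msg = ET.Element("message", {
        "from": f"{nick}@{transport_host}",
        "to": to_jid,
        "type": "chat",
    })
    ET.SubElement(msg, "body").text = text
    return ET.tostring(msg, encoding="unicode")
```

The Jabber side of the transport only ever sees XML, and the foreign side only ever sees its native lines, which is what lets the rest of the server treat a transport like any other component.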
In some cases the translation to and from the Jabber protocol is fairly straightforward, for instance when the foreign protocol is well-documented (e.g., the IRC protocol as well as the so-called "Oscar" version of the AIM protocol). In other cases, the translation is made more difficult by the closed or undocumented nature of the foreign protocol (e.g., the Yahoo! Messenger protocol). It is hoped that the IMUnified initiative (http://www.imunified.org) will be successful in opening up some of the messaging protocols that are now closed, or at least creating an open protocol into which such closed protocols can be translated.
While most transports exist to communicate with non-Jabber services, there are exceptions to this rule. For example, the Groupchat Transport enables communications with other Jabber users within a chat room or IRC-like interface. The Groupchat Transport keeps track of all the users who are currently subscribed to each room and within that room acts as a reflection server, sending each message to all room members. It will also create and destroy rooms as needed; that is, if I join a room which does not exist, the transport will create that room, and if I am the last person to leave a room, the transport will then destroy that room. A single room is identified as groupname@groupchatserver, and each participant is identified with a unique resource representing their nickname. For example, the witches' "groupchat" in Shakespeare's Macbeth might take place in a room whose address is cauldron@conference.witches.org, and the witches might be identified as cauldron@conference.witches.org/firstwitch and so on. Here is what a user might see:
A Jabber entity can subscribe to the presence of any other entity (i.e., anything with a Jabber ID). A subscription is essentially an agreement by the "subscribee" to send presence changes to the subscriber. This information is stored in both the subscriber's roster and the subscribee's roster. When I authenticate and create a session on the server, my presence information is stored within the Jabber Session Manager. Then whenever I change my presence information, the <presence/> packet is handled by the server, which does a lookup in my roster and then forwards the presence packet to all the Jabber entities that have subscribed to my presence.
Subscriptions fall into the following categories, which are stored in the rosters of the entities involved:
to -- another entity sends presence information to you
from -- another entity receives presence information from you
both -- both you and the other entity send and receive presence information from one another
none -- neither you nor the other entity sends presence information to or receives it from the other
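Given these four categories, the roster lookup that the server performs when my presence changes reduces to a filter over my roster. The sketch below assumes a roster represented as a mapping from Jabber ID to subscription category; the representation and function names are hypothetical.

```python
from typing import Dict, List


def presence_recipients(roster: Dict[str, str]) -> List[str]:
    """Entities that should receive my presence changes.

    'from' and 'both' mean the other entity receives presence from me;
    'to' and 'none' mean it does not.
    """
    return sorted(jid for jid, sub in roster.items()
                  if sub in ("from", "both"))


def presence_sources(roster: Dict[str, str]) -> List[str]:
    """Entities whose presence changes are sent to me ('to' or 'both')."""
    return sorted(jid for jid, sub in roster.items()
                  if sub in ("to", "both"))
```

With a roster of {"romeo@montague.net": "both", "nurse@capulet.com": "from", "feed@news.example": "to"}, a change in my presence is forwarded to nurse@capulet.com and romeo@montague.net only.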
The entity sending presence does not have to be another Jabber user, but instead can be an outside service such as a data feed or a non-Jabber IM system. In the latter case, subscriptions to users on the non-Jabber system are handled through a transport, and the Jabber user registers with the appropriate transport (e.g., icq.jabber.org) in order for presence to be passed on to users of the non-Jabber system. Once the Jabber user has successfully registered, the transport needs to know whenever the owner comes online, so it sends a presence subscription request to the submitter. A special presence subscription packet is sent with a "from" attribute generated by the transport, with embedded data needed to log in to the native protocol.
The Jabber server maintains a list of each user's subscriptions (usually in a spool directory on the filesystem, although this information may also be stored in a database). This list is called a roster and is similar to what in other IM systems is called a "buddy list". The roster in Jabber is stored on the server and thus can "follow" the user from location to location and computer to computer. The Jabber server automatically adjusts the roster to reflect subscription types when people authorize or refuse subscription requests. Rosters can also contain other information about specific users, such as user nicknames and the "groups" to which that user belongs. This information can be used by the client to display the roster in an appropriate interface, e.g., a treeview.
Within Jabber there are many different entities that need to communicate with each other. These entities can represent transports, groupchat rooms, or a single Jabber user. Jabber IDs are used both externally and internally to express ownership or routing information. Key characteristics of Jabber IDs include:
They uniquely identify individual objects or entities for communicating instant messages and presence information.
They are easy for users to remember and express in the real world.
They are flexible enough to enable the inclusion of other IM and presence schemes.
Each Jabber ID (or "JID") contains a set of ordered elements. The JIDs are formed of a domain, node, and resource in the following format:
[node@]domain[/resource]
The Jabber ID elements are defined as follows:
The Domain Name is the primary identifier. It represents the Jabber server to which the entity connects. Every usable Jabber domain should resolve to a Fully Qualified Domain Name.
The Node is the secondary identifier. It represents the "user". All Nodes live within a specific Domain. However, the Node is optional, and a specific Domain (e.g., conference.jabber.org) is a valid Jabber ID.
The Resource is an optional third identifier. All Resources belong to a Node. Within Jabber the Resource is used to identify specific objects that belong to a user, such as devices or locations. Resources enable a single user to maintain several simultaneous connections to the same Jabber Server; examples might be juliet@capulet.com/balcony vs. juliet@capulet.com/chamber.
A Jabber user always connects to a server by means of a particular resource and therefore has an address of the form node@domain/resource while connected (e.g., juliet@capulet.com/balcony). However, since the resource is session-specific, the user's address can be communicated as node@domain (e.g., juliet@capulet.com), which is familiar to people since it is of the same form as most email addresses.
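The [node@]domain[/resource] format can be split mechanically. The following is a minimal sketch that does no character validation; the JID type and parse_jid function are hypothetical names used for illustration.

```python
from typing import NamedTuple, Optional


class JID(NamedTuple):
    node: Optional[str]
    domain: str
    resource: Optional[str]

    def bare(self) -> str:
        """Return the address without its session-specific resource."""
        return f"{self.node}@{self.domain}" if self.node else self.domain


def parse_jid(text: str) -> JID:
    """Split [node@]domain[/resource] into its three parts."""
    # The resource is everything after the first slash.
    rest, slash, resource = text.partition("/")
    node, at, domain = rest.partition("@")
    if not at:
        # No '@' present: the whole remainder is the domain.
        node, domain = "", rest
    if not domain:
        raise ValueError(f"Jabber ID has no domain: {text!r}")
    return JID(node or None, domain, resource if slash else None)
```

For example, parse_jid("juliet@capulet.com/balcony").bare() gives "juliet@capulet.com", and parse_jid("conference.jabber.org") yields a JID whose node and resource are both None.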
Note that in some circumstances messages may be sent directly to a specific resource, but in general, a message destined for juliet@capulet.com is routed based on some rules in the Jabber server, since each connection instance can have its own priority setting. Thus, if a message is just sent to juliet@capulet.com (i.e. without specifying a resource), the message is routed to the resource which has the highest priority, e.g. juliet@capulet.com/balcony.
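The priority rule can be sketched as a selection over a user's connected resources. The function name and the mapping from resource to priority are hypothetical, and falling back to the highest-priority resource when the requested one is not connected is an assumption of this sketch rather than documented server behaviour.

```python
from typing import Dict, Optional


def pick_resource(sessions: Dict[str, int],
                  requested: Optional[str] = None) -> Optional[str]:
    """Choose which connected resource receives a message.

    `sessions` maps each connected resource to its priority.
    Returns None when the user has no connected resources.
    """
    if requested is not None and requested in sessions:
        return requested  # addressed directly to a connected resource
    if not sessions:
        return None  # user is offline
    # Otherwise deliver to the resource with the highest priority.
    return max(sessions, key=lambda resource: sessions[resource])
```

With sessions of {"balcony": 5, "chamber": 1}, a message to juliet@capulet.com goes to balcony, while one addressed to juliet@capulet.com/chamber goes to chamber.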
The 1.2 server added a feature called server dialback. This feature is designed to discourage server spoofing, thus adding an extra measure of security to server-server interactions. Detailed information about this feature will be provided in a future version of this document. Some preliminary documentation is provided at http://docs.jabber.org/draft-proto/html/dialback.html.
This document has provided a high-level overview of the architecture of Jabber. If you have any questions about this document, feel free to contact its author (Peter Saint-Andre) via email or Jabber at stpeter@jabber.org.
This document is copyright 2001 by Peter Saint-Andre.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation, with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. You may obtain a copy of the GNU Free Documentation License from the Free Software Foundation by visiting http://www.fsf.org/ or by writing to:
The Free Software Foundation, Inc.
59 Temple Place - Suite 330
Boston, MA 02111-1307
USA