BLOB stands for Binary Large OBject: typically an image, video, or audio file, though it can be any binary data. I'm often asked how Jabber handles BLOBs and why it doesn't encode and transfer MIME objects within the protocol, so I'd like to explain clearly why I think BLOBs are inappropriate in a diverse messaging infrastructure.
First, let's forget about email. Jabber is a new architecture, and doesn't need to carry forward the baggage that email has accumulated. Email was designed at a time when the network was a very different place and very few protocols existed for transferring data in a standard way, so let's clear the slate and not incorrectly assume that we must operate like email.
The problem, quite simply, is that from a human point of view it is easy to think that a message is a message, whether that be simple text, marked-up text, a word processing document, a picture, an audio clip, or any other data. But from a technical point of view, there are some distinct differences.
Textual data is a base common denominator: it can inherently be presented on any human medium (viewed on any video display, or spoken on any audio medium). Textual data is also the base common denominator across software, since every programming environment supports characters at its most fundamental level. Any data beyond simple text varies in support across human mediums and software environments.
Let me also define the "diverse" in diverse messaging infrastructure. By diverse, I mean any two entities communicating without any predefined or agreed-upon format or protocol for that conversation, while the infrastructure manages the differences between their environments and merges the conversation.
So with that background, what are the reasons for not sending BLOBs across a diverse messaging infrastructure?
The appropriate way to handle BLOBs in this environment is by reference. Jabber passes a reference to a BLOB around as part of a message, and the BLOB is retrieved out of band via HTTP on demand at the will of the recipient. This model solves the issues I've pointed out above: all data is textual, the recipient doesn't incur the costs associated with the BLOB until they choose to, the recipient isn't forced to accept a copy of the BLOB before knowing what it is, and the receiving application only retrieves the BLOB if it can understand or present that data type.
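The pass-by-reference model described above can be sketched roughly as follows. This is a minimal illustration, not Jabber's actual implementation: it assumes the reference travels in an `x` element under the `jabber:x:oob` namespace, and the helper names (`build_message`, `extract_reference`, `can_present`, `fetch_blob`), the example address and URL, and the extension-based check are all hypothetical choices made for the example.

```python
import posixpath
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Assumed namespace for the out-of-band reference element.
OOB_NS = "jabber:x:oob"


def build_message(to, body, url, desc):
    """Build a textual message that carries the BLOB only as a URL reference."""
    msg = ET.Element("message", {"to": to})
    ET.SubElement(msg, "body").text = body
    x = ET.SubElement(msg, "{%s}x" % OOB_NS)
    ET.SubElement(x, "{%s}url" % OOB_NS).text = url
    ET.SubElement(x, "{%s}desc" % OOB_NS).text = desc
    return ET.tostring(msg, encoding="unicode")


def extract_reference(xml_text):
    """Return (url, description) from a message, or None if it has no reference."""
    msg = ET.fromstring(xml_text)
    x = msg.find("{%s}x" % OOB_NS)
    if x is None:
        return None
    return x.findtext("{%s}url" % OOB_NS), x.findtext("{%s}desc" % OOB_NS)


def can_present(url, supported_extensions):
    """Hypothetical policy: only retrieve types this client knows how to present."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    return ext in supported_extensions


def fetch_blob(url):
    """Retrieve the BLOB out of band via HTTP, only when the recipient asks for it."""
    with urllib.request.urlopen(url) as response:
        return response.read()


# The sender transmits text only; nothing binary crosses the messaging channel.
wire_text = build_message(
    "friend@example.org",
    "Here is the photo I mentioned.",
    "http://example.org/files/photo.png",
    "Holiday photo",
)
url, desc = extract_reference(wire_text)
# The recipient sees the description first and decides whether to call fetch_blob(url).
print(desc, "->", url, "| presentable:", can_present(url, {".png", ".jpg"}))
```

Note that `fetch_blob` is never called automatically: the cost of the transfer is incurred only if the recipient chooses to retrieve the data and the client can present it.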
Theoretically, any binary data can be encoded into textual data, which could be used to obscure the difference between a textual message and a BLOB. XML draws an excellent dividing line through the grey area between BLOBs and textual data, since it is text-oriented and BLOBs are difficult to encode in it. Using XML as the structure for the envelope and message builds an inherent test into the infrastructure: if the data isn't text intended to be read by a human, does XML add value to it? If the answer is no, it should be passed by reference.
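To make the cost of encoding binary data as text concrete, here is a small sketch using Base64. Base64 is an assumption chosen for illustration (the discussion above doesn't name a particular encoding), and `base64_overhead` is a hypothetical helper.

```python
import base64


def base64_overhead(blob: bytes) -> float:
    """Return the size of the Base64 text relative to the original binary data."""
    encoded = base64.b64encode(blob)
    return len(encoded) / len(blob)


# Every 3 bytes of binary become 4 text characters, so 3000 bytes become 4000.
blob = bytes(3000)
print(len(base64.b64encode(blob)), round(base64_overhead(blob), 2))
```

The encoded form is roughly a third larger than the original and is no more readable to a human or to an XML parser than the raw bytes, which is the sense in which wrapping it in XML adds no value.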
So, in a diverse messaging infrastructure such as Jabber, BLOBs are most appropriately handled by passing a reference to them within the protocol and allowing the endpoints to manage them.