jQuery in Some Detail: Data

I'm starting a project in which I discuss parts of the jQuery API in extensive detail, so that we can all understand it better. jQuery is a phenomenal piece of software, and underneath its very nice API, there's a lot going on. Some of it we can study and learn from, and some of it we can actually use for our own purposes.

I am starting with the Data API, one of my favourite bits of jQuery. Along with events and passing a function to jQuery, it's one of the three reasons I include jQuery on a Magento project despite Magento already including Prototype. (I would like to point out that I know the Prototype library very well, and that I'm not looking for a "Prototype for jQuery developers" guide. Take it from me that jQuery is better, and better enough to justify the cost of including it on your site. Your mileage may vary.)

Here's the thing, though. The Data API (hereafter Data) is actually two APIs, carefully stitched into one interface. It's pretty neat. One is for objects, and one is for elements, and they are almost, but not quite, the same.

I wrote tests for all the functionality I describe in this article. If you want to play with them, have a look at them on GitHub.

Data With Objects

We can use Data to attach arbitrary data to objects. Given that objects in JavaScript are in fact arbitrary collections of data, this might seem a little ridiculous. Indeed, I don't think this half of the API would exist if they hadn't gone so far as to build the elemental version as well.

But perhaps you're doing some sophisticated jQuery work, and you've got your own data structure that uses jQuery things inside it. Perhaps you wish to include metadata for this data structure. Then Data might be a good choice for you, because jQuery data doesn't show up when you serialise an object into JSON.

Storing data is easy:


var o = {}

$.data(o, 'foo', 'bar')

// 'bar'
$.data(o, 'foo')

jQuery data is stored under a special key on the object. The key is equal to $.expando, which is a value unique to every jQuery instance. It's 'jQuery' plus some mostly random number. Inside there, we have two things. The first is a toJSON function, which does nothing. This is what stops jQuery data from showing up in JSON.stringify.

JSON.stringify has a default way to serialise objects, but will delegate to an object's toJSON member function if it exists. Because this toJSON returns undefined, the whole property is left out of the output. Data clearly takes advantage of this, but it's not black magic. Anyone can do it.

The second thing is an object called "data". If we look inside here, we see:


// 'bar'
o[$.expando]['data']['foo']

So that's where Data is keeping its secrets.
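
The toJSON trick can be tried without jQuery at all. What follows is a minimal sketch in plain JavaScript, not jQuery's own code. The name hiddenKey is made up for illustration and stands in for $.expando. It relies only on the standard behaviour that JSON.stringify leaves out a property whose toJSON returns undefined.

```javascript
// hiddenKey is a hypothetical stand-in for $.expando
var hiddenKey = 'hidden' + String(Math.random()).replace(/\D/g, '')

var o = { name: 'widget' }

// a do-nothing toJSON returns undefined, so JSON.stringify skips this property
o[hiddenKey] = {
  toJSON: function () {},
  data: { foo: 'bar' }
}

// '{"name":"widget"}'
var json = JSON.stringify(o)

// 'bar'
var stored = o[hiddenKey].data.foo
```

The metadata is still an ordinary property, so it will show up in a for...in loop. Only serialisation is affected.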

Data with Elements

In much the same way as above, we can attach arbitrary data to elements. Unlike objects, this data is not stored directly on the element. This is because when you assign arbitrary objects as members to elements, Internet Explorers 6 and 7 get a bit confused about how to garbage collect things. So, it is worked around.

In addition to the API described above, jQuery also adds a data method to $.fn, so we can avoid passing the first parameter:


var $p = $('<p />')
var p = $p[0]

// same as $.data(p, 'foo', 'bar')
$p.data('foo', 'bar')

However, the data still belongs to the element, not to the jQuery object. The same element wrapped in a different jQuery object will still have its data, and setting data on a jQuery object containing multiple elements sets it separately on each of them. How does jQuery achieve this if it doesn't add the data to the element?

It does write something to the element. Rather than the arbitrary data, though, it's just a unique number. This number is then used to look up data in a central location. The number is generated by calling $.guid++, which is an internal thing jQuery uses to get a unique number. As a fun aside, you can mess up various things by setting $.guid to 0, or something. Don't ever do that on a real site: it plays havoc with event handling. Also, with Data.

The central data location is $.cache. If we look in there, we'll find our data:


// 'bar'
$.cache[p[$.expando]]['data']['foo']
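
The id-plus-central-store arrangement is easy to imitate. This is a rough sketch of the idea in plain JavaScript, not jQuery's implementation. Every name in it (miniExpando, miniCache, nextId, miniData) is invented for the example, and a plain object stands in for a DOM element.

```javascript
// all of these names are made up for this sketch
var miniExpando = 'mini' + String(Math.random()).replace(/\D/g, '')
var miniCache = {}
var nextId = 1

function miniData(elem, key, value) {
  // stamp the element with a unique id the first time we see it
  var id = elem[miniExpando] || (elem[miniExpando] = nextId++)
  var bucket = miniCache[id] || (miniCache[id] = { data: {} })

  if (value !== undefined) {
    bucket.data[key] = value
  }
  return bucket.data[key]
}

// a plain object standing in for a DOM element
var p = { nodeType: 1 }

miniData(p, 'foo', 'bar')

// 'bar'
var viaApi = miniData(p, 'foo')

// 'bar', found through the id written on the element
var viaCache = miniCache[p[miniExpando]].data.foo
```

The only thing written to the element is a number, which is the point of the workaround.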

Elemental Data has another nicety: you can initialise Data values using data-* attributes. Note that you can only initialise values; Data will not write data-* attributes to your elements. Reading from a data-* attribute causes jQuery to write the value to its internal store. If a value exists in the internal store, it will not search for a data-* attribute. I cannot repeat enough times that data-* attributes are not a storage backend to Data. However, they are improbably useful for initialising things.


var $p = $('<p data-foo="bar" />')

// 'bar'
$p.data('foo')

// write the data-* attr
$p.attr('data-foo', 'baz')

// still 'bar' (!)
$p.data('foo')
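
The read-once behaviour can be modelled in a few lines. This is a simplified sketch under my own assumptions, not jQuery's code: fakeElement and readData are hypothetical helpers, and a single store object stands in for one element's bucket.

```javascript
// fakeElement and readData are hypothetical helpers, not jQuery functions
function fakeElement(attrs) {
  return {
    nodeType: 1,
    getAttribute: function (name) {
      return attrs.hasOwnProperty(name) ? attrs[name] : null
    },
    setAttribute: function (name, value) {
      attrs[name] = String(value)
    }
  }
}

// stands in for the internal store of a single element
var store = {}

function readData(elem, key) {
  // only fall back to the attribute when the store has no value yet
  if (store[key] === undefined) {
    var attr = elem.getAttribute('data-' + key)
    if (attr !== null) {
      store[key] = attr
    }
  }
  return store[key]
}

var el = fakeElement({ 'data-foo': 'bar' })

// 'bar', and the value is now copied into the store
var first = readData(el, 'foo')

el.setAttribute('data-foo', 'baz')

// still 'bar': the store wins, and the attribute is not consulted again
var second = readData(el, 'foo')
```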


Miscellany

Like the rest of jQuery, Data has an intended API, and an implementation with a lot of public details. While not documented, these tend not to change, and are a nod to anyone who needs to do something a little bit more interesting. It betrays the stunning amount of work that has gone into jQuery, and is why I love using it so much. Here, I'll point out a few things I find interesting about the way Data does its thing. These are things I've learned by reading the source and trying things out. If you are interested, you should do the same. There's a lot to be learned.

camelCasing

The native tongue of Data is the camelCaseKey. Internally, Data converts all hyphen-separated keys into camelCased ones. Hence,


var $p = $('<p />')

$p.data('foo-bar', 'baz')

// 'baz'
$p.data('fooBar')

// 'baz'
$p.data('foo-bar')

// undefined
$.cache[$p[0][$.expando]]['data']['foo-bar']

// 'baz'
$.cache[$p[0][$.expando]]['data']['fooBar']

As a further nicety, jQuery will search for hyphen-separated data-* attributes when searching for a value.


var $p = $('<p data-foo-bar="baz" />')

// 'baz'
$p.data('fooBar')

Private Data

jQuery uses Data all over the place, but not the usual way. If you dive into the workings, you find a boolean flag on the (private) function that does the work. This flag sets the privacy of Data. Conveniently, $._data is an analogue of $.data which sets this flag.

Recall that Data generates a unique bucket for us for every element, or it attaches it directly to an object. Then it puts all data inside a data object inside the bucket. This seems redundant. Well, private data is outside the data cage, and just in the bucket directly.

I wouldn't advise writing to this, but there might be some information you need to grab, and I'd say this is preferable to sniffing around for the raw object in $.cache or object[$.expando]. Interestingly, you can access the public bucket through the private one.


var $p = $('<p />')

$p.data('foo', 'bar')

// { foo: 'bar' }
$._data($p[0], 'data')
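
The bucket layout is easier to see in isolation. This sketch shows the idea of a privacy flag choosing between the bucket itself and the data object inside it. The name accessBucket and its signature are assumptions made for the example, not jQuery's internal function.

```javascript
// accessBucket is a hypothetical stand-in for jQuery's internal function
function accessBucket(bucket, key, value, isPrivate) {
  var target = bucket

  // public data lives one level down, inside bucket.data
  if (!isPrivate) {
    target = bucket.data || (bucket.data = {})
  }
  if (value !== undefined) {
    target[key] = value
  }
  return target[key]
}

var bucket = {}

accessBucket(bucket, 'foo', 'bar')
accessBucket(bucket, 'secret', 42, true)

// undefined: the public side cannot see private keys
var fromPublic = accessBucket(bucket, 'secret')

// 42
var fromPrivate = accessBucket(bucket, 'secret', undefined, true)

// { foo: 'bar' }: the public store is just the private key 'data'
var publicStore = accessBucket(bucket, 'data', undefined, true)
```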

Telling the Difference Between Objects and Elements

Data uses nodeType to decide if an object should be treated as an object or an element. Any truthy value for nodeType will convince jQuery to store data in $.cache, rather than on the object itself.

Because of the way jQuery blocks certain elements from using Data (see next section), nodeType has certain other restrictions in order for Data to work. In short, the expression +nodeType || 1 has to be either 1 or 9. This means that nodeType must coerce to 0, 1, 9 or NaN. The following will all be treated as elements, and have their data stored in $.cache:


{ nodeType: 1 }
{ nodeType: 9 }
{ nodeType: 'string' }
{ nodeType: ['something', 'else'] }
{ nodeType: true }

You might feel a bit queasy at adding things like this to your objects. My advice to you is to not do this, but it's still useful, or at least interesting to know that you can. If you are doing this, remember that you can provide a custom implementation of toJSON to hide these if you have to serialise your objects.
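
To check a candidate object without loading jQuery, the two conditions can be combined into one small function. This is a sketch based on the description above. The name storesInCache is made up, and it only predicts where data would go rather than calling anything in jQuery.

```javascript
// storesInCache is a hypothetical helper combining the two checks described above
function storesInCache(obj) {
  var coerced = +obj.nodeType || 1
  return Boolean(obj.nodeType) && (coerced === 1 || coerced === 9)
}

var checks = {
  one: storesInCache({ nodeType: 1 }),                       // true
  nine: storesInCache({ nodeType: 9 }),                      // true
  string: storesInCache({ nodeType: 'string' }),             // true, +'string' is NaN
  array: storesInCache({ nodeType: ['something', 'else'] }), // true, also NaN
  bool: storesInCache({ nodeType: true }),                   // true, +true is 1
  plain: storesInCache({}),                                  // false, no nodeType at all
  textNode: storesInCache({ nodeType: 3 })                   // false, coerces to 3
}
```

A false result covers two different cases: a plain object, which keeps its data on itself, and a node type that Data refuses outright.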

$.noData

Certain element types cannot use Data. I don't know why, but it's probably something to do with not being able to write arbitrary properties to these element types in certain browsers. We don't need to worry about that, though. The function used to decide this is $.acceptData.

The first thing it does is test that +elem.nodeType || 1 is either 1 or 9, as we discussed above. If this test is passed, it consults the $.noData object. If we wish to disable Data for a certain element, we should set $.noData[(element.nodeName + ' ').toLowerCase()] = true. Appending a space to the nodeName coerces it to a string, and also keeps the keys from colliding with properties that every object inherits from Object.prototype. Lowercasing keeps the lookup consistent. While nodeNames for elements are generally strings, we might have a custom object with a custom nodeName property set.

For example, we might wish to disable Data for p tags. This makes the data-* attribute king, although we still cannot write to it:


$.noData['p '] = true

var $p = $('<p data-foo="bar" />')

// 'bar'
$p.data('foo')

$p.attr('data-foo', 'baz')

// 'baz'
$p.data('foo')

We may want slightly more granularity with this. Perhaps we don't want to disable Data for all paragraphs, just ones which have frequently changing data-* attributes. If the value that $.noData holds for a specific nodeName is not exactly true, jQuery will compare the element's classid attribute to it, and only disable Data for elements whose classid does not match. This is not what this feature was designed for, but we can exploit it:


$.noData['p '] = 'standard-data-behaviour'

var $p = $('<p data-foo="bar" classid="standard-data-behaviour" />')
var $q = $('<p data-foo="bar" />')

$p.data('foo', 'baz')
$q.data('foo', 'baz')

// 'baz'
$p.data('foo')

// 'bar'
$q.data('foo')
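
Put together, the decision described in this section looks roughly like the following. This is a sketch written from the description above, using plain objects instead of DOM elements. The names acceptsData, blockList and fakeElement are invented, with blockList playing the part of $.noData.

```javascript
// acceptsData, blockList and fakeElement are hypothetical names for this sketch
var blockList = {
  'p ': 'standard-data-behaviour'
}

function acceptsData(elem) {
  var rule = blockList[(elem.nodeName + ' ').toLowerCase()]
  var nodeType = +elem.nodeType || 1

  if (nodeType !== 1 && nodeType !== 9) {
    return false
  }
  // no rule: accept. true: always reject. anything else: accept only on a classid match
  return !rule || (rule !== true && elem.getAttribute('classid') === rule)
}

function fakeElement(nodeName, classid) {
  return {
    nodeType: 1,
    nodeName: nodeName,
    getAttribute: function (name) {
      return name === 'classid' && classid !== undefined ? classid : null
    }
  }
}

// true: the classid matches the rule
var matching = acceptsData(fakeElement('P', 'standard-data-behaviour'))

// false: there is a rule for p, and no classid to match it
var bare = acceptsData(fakeElement('P'))

// true: no rule for div at all
var div = acceptsData(fakeElement('DIV'))

blockList['div '] = true

// false: a rule of exactly true always rejects
var blockedDiv = acceptsData(fakeElement('DIV'))
```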

Elements Can Share Data

There is a fringe advantage to having Data store element data centrally. If we manipulate the id stored in the $.expando property of an element, we can make its data point elsewhere. This could save resources if you are looking up a lot of data for elements of a similar class, or really confuse a co-worker. The choice is yours:


var $p = $('<p />')
var $q = $('<p />')

$p.data('foo', 'bar')

$q[0][$.expando] = $p[0][$.expando]

// 'bar'
$q.data('foo')

You might run into problems if you do this with elements which have data-* attributes to initialise values. In this case, the first one it looks up will be written to Data, and that will be found by all other calls to .data(). Use at your own peril.

If we combine this with our tricks to convince jQuery that we are dealing with an element, we can share Data between multiple plain objects, too.
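
The sharing trick only depends on two things pointing at the same id. Here is a small model of it in plain JavaScript, with made-up names (idKey, sharedCache, counter, dataFor) and no jQuery involved. The objects carry a nodeType purely to echo the trick from earlier. The sketch itself does not inspect it.

```javascript
// idKey, sharedCache, counter and dataFor are made-up names for this sketch
var idKey = 'sketchId'
var sharedCache = {}
var counter = 1

function dataFor(obj) {
  var id = obj[idKey] || (obj[idKey] = counter++)
  return sharedCache[id] || (sharedCache[id] = {})
}

var a = { nodeType: 1 }
var b = { nodeType: 1 }

dataFor(a).foo = 'bar'

// point b at a's bucket by copying the id
b[idKey] = a[idKey]

// 'bar'
var seenFromB = dataFor(b).foo

dataFor(b).foo = 'changed'

// 'changed': both objects read and write the same bucket
var seenFromA = dataFor(a).foo
```

Note that the link goes both ways. A write through either object is visible through the other, which is exactly what makes it a good way to confuse a co-worker.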