Drupal Metadata Roadmap

John VanDyk

Glossary

Meta type: a kind of thing. A form in the Platonic sense. A specific kind of attribute. Examples: color (a string), author (a string), last-modified (a date), body text (a string).

Levels of Abstraction

  1. database (a MySQL or Postgres field)
  2. attribute type (PHP type corresponding to database type, e.g. string to varchar)
  3. attribute (a field type which may contain one or more attribute types, treated as a single unit, e.g. a current Flexinode field type)
  4. module (may use attributes to accomplish its goals)

Narrative

One of Drupal's strengths is that it has its backend firmly planted in a relational database. Relational databases have many ways of working with data that have been proven over the years. Many of the operations performed by relational databases are the same operations that individuals want to do with their websites, including searching, sorting, and filtering content.

Databases are most useful if the data in them is strictly typed, that is, strings are saved in string fields, dates in date fields, etc.

There is a wall between PHP and the database that PHP reaches across to get data in and out of the database. On one side of the wall sits the database; on the other side sits Drupal.

We are introducing the concept of attributes to Drupal. Attributes are database fields or derivations thereof that have been wrapped by Drupal and given behavior. For example, here is the story node type expressed as a composition of the attributes Title, Body, Creation Date and Modification Date:

Note that the composition of the story content type is described by its definition, or schema. It has a title field of type string, a body field of type string, a creation date field of type date, and a last modified field of type date.
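The composition just described can be written down as data. What follows is a minimal sketch in Python (not Drupal's PHP), offered only as an illustration; the names Attribute, STORY_SCHEMA and validate are hypothetical and are not part of Drupal or Flexinode.

```python
from dataclasses import dataclass
from datetime import datetime


# Hypothetical names throughout; nothing here is Drupal API.
@dataclass(frozen=True)
class Attribute:
    name: str            # e.g. "title"
    data_type: type      # the underlying attribute type, e.g. str or datetime
    required: bool = True


# The story schema as described in the text: four typed attributes.
STORY_SCHEMA = (
    Attribute("title", str),
    Attribute("body", str),
    Attribute("created", datetime),
    Attribute("modified", datetime),
)


def validate(schema, values):
    """Return a list of problems found when checking values against a schema."""
    problems = []
    for attr in schema:
        if attr.name not in values:
            if attr.required:
                problems.append(f"missing required attribute: {attr.name}")
            continue
        if not isinstance(values[attr.name], attr.data_type):
            problems.append(f"{attr.name} should be {attr.data_type.__name__}")
    return problems
```

Because the schema is plain data, strict typing can be checked before anything crosses the wall to the database.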

Let's look specifically at one attribute, the Title. It is stored in the database as a string. It comes across the wall via the vocabulary module (in the current Drupal this is taxonomy.module). This module takes care of reading from and writing to the database, attribute-specific queries, etc. The behavior of the vocabulary is modified by its metadata. For example, is it a closed vocabulary where no new entries are allowed, or is it open so that new terms can be defined? Is its default representation a textbox, radio buttons, or a select box?

The vocabulary produces an HTML form field according to the interpretation of the vocabulary's metadata. In this case it would incorporate the functions in stringfield.inc to create a text box.

The Title attribute is a string attribute, but it is more than that. It's a title. The behaviors specific to titles as opposed to generic strings are contained in the flexinode-title component. So attribute-specific translation from semantics to behavior happens here.

Lastly, the content type may have some specific behavior of its own, other than the default "get data in, read data out" behavior provided by Drupal's core framework. This lives in a module that has the same name as the content type. Here we have a content type called "story" that is composed of four attributes. This schema management is similar to that offered by Flexinode 1. Description/schema is handled by Flexinode; behavior comes from a module with the same name as the schema.

Now comes the fun part. Let's add another meta type to story. In addition to Title, Body, Creation Date, and Last Mod Date, which are the required attributes that make up a story, we're going to add an additional, optional meta type called Topic. In the real world, we might use the Topic meta type to organize a story index by topic. This is equivalent to assigning a taxonomy to a node type in present-day Drupal.

For our next step, we'll add a meta type that not only affects the composition of the story but also its behavior. Let's add a boolean meta type with the name "published":

The brown border has been extended to indicate that "published" is a required field. Drupal users will notice that this reflects the current situation in Drupal, where a series of checkboxes denotes behavior:

What I'm asking you to do is to think about these options differently. Don't think about them as "node options". Think of them as fundamental types (boolean, in this case) that are stored in vocabularies and are being rendered through a default format (checkbox, in this case) and are wired up to behavior. They're not just wired up to behavior, they're hard-wired to behavior. The benefit of the approach we are espousing is that the behavior is not hard-wired; it's pluggable.

Take workflow, for example. Suppose we have a workflow meta type. That means we have a vocabulary named "workflow" with the following terms: draft, review, published. This meta type carries behavior with it. The behavior might be something as simple as changing the published/unpublished status when the value of the meta type changes from review to published. The point is that behavior can be added to any node type simply by associating a meta type with it.
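To make "pluggable" concrete, here is a small sketch of the workflow example in Python. It assumes a hypothetical registry (BEHAVIORS) and hypothetical helpers (on_change, set_meta); none of these exist in Drupal, and a real implementation would likely use Drupal's hook system instead.

```python
# Hypothetical registry mapping a meta type name to its behavior callbacks.
BEHAVIORS = {}


def on_change(meta_type):
    """Decorator that registers a behavior for a meta type."""
    def register(func):
        BEHAVIORS.setdefault(meta_type, []).append(func)
        return func
    return register


@on_change("workflow")
def publish_when_approved(node, old, new):
    # The example from the text: moving from review to published flips status.
    if old == "review" and new == "published":
        node["status"] = "published"


def set_meta(node, meta_type, new_value):
    """Set a meta type value on a node and run any behaviors attached to it."""
    old_value = node.get(meta_type)
    node[meta_type] = new_value
    for behavior in BEHAVIORS.get(meta_type, []):
        behavior(node, old_value, new_value)
    return node
```

Under these assumptions, attaching the workflow meta type to another node type requires no change to that node type's own code: the behavior travels with the meta type.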

Composing a Node

The following steps are necessary to compose a node (we are talking about data only, not presentation):

  1. Find out which schema this node has (what type of node it is)
  2. Find out which fields the schema has
  3. Assemble the fields by letting the attributes get their data
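The three steps above can be sketched as follows. The dictionaries stand in for database lookups, and every name and value here (NODE_SCHEMAS, SCHEMA_FIELDS, STORED_VALUES, load_attribute, compose_node) is a hypothetical placeholder rather than part of the proposal.

```python
# All names and data below are hypothetical stand-ins for database lookups.
NODE_SCHEMAS = {1: "story"}  # node id -> schema name
SCHEMA_FIELDS = {"story": ["title", "body", "created", "modified"]}
STORED_VALUES = {
    (1, "title"): "Hello",
    (1, "body"): "First post",
    (1, "created"): "2004-10-28",
    (1, "modified"): "2005-02-16",
}


def load_attribute(nid, field):
    """Stand-in for an attribute fetching its own data from storage."""
    return STORED_VALUES.get((nid, field))


def compose_node(nid):
    schema = NODE_SCHEMAS[nid]                                   # step 1: which schema
    fields = SCHEMA_FIELDS[schema]                               # step 2: which fields
    data = {name: load_attribute(nid, name) for name in fields}  # step 3: assemble
    return {"nid": nid, "type": schema, "fields": data}
```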

Now that the node's data is composed, suppose we want to edit the node. There should be a default rendering mechanism which treats the fields as units. This default mechanism should be overridable, so that I can change the way the edit form looks for a given node type or have multiple forms for a single node type. For example, I may want to change the order in which the fields appear. Or I may want to render a boolean meta type as a checkbox in one situation, but as a Yes/No radio button in another.

It is important to remember the distinction between a meta type's type (boolean, string, etc.) and the way it is rendered (checkbox, dropdown, text field, etc.).

The point to remember is that the way Drupal renders a node should be only a default and should be overridable. It should not be hardwired into Drupal the way it is currently.
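One way to picture the separation of type from rendering is a table of default renderers keyed by data type, with per-field overrides supplied by the caller. This is only a sketch in Python; the function names and the generated HTML are assumptions made for illustration, not what Drupal emits.

```python
from html import escape


def checkbox(name, value):
    checked = " checked" if value else ""
    return f'<input type="checkbox" name="{escape(name)}"{checked}>'


def yes_no_radio(name, value):
    parts = []
    for label, option in (("Yes", True), ("No", False)):
        checked = " checked" if value is option else ""
        parts.append(
            f'<input type="radio" name="{escape(name)}" value="{label}"{checked}> {label}'
        )
    return " ".join(parts)


def textbox(name, value):
    return f'<input type="text" name="{escape(name)}" value="{escape(str(value))}">'


# Hypothetical defaults: the data type picks the renderer unless overridden.
DEFAULT_RENDERERS = {bool: checkbox, str: textbox}


def render_field(name, value, overrides=None):
    """Use the override for this field if one is given, else the type default."""
    renderer = (overrides or {}).get(name) or DEFAULT_RENDERERS[type(value)]
    return renderer(name, value)
```

Calling render_field("published", True) uses the checkbox default, while render_field("published", True, {"published": yes_no_radio}) renders the same boolean as radio buttons without touching the stored type.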

Comparing MFR and Current Flexinode

MFR is the name of the approach that we have taken (MetadataFRamework).

Flexinode uses three tables: flexinode_data, flexinode_field, and flexinode_type.

MFR uses six tables. The first three are duplicates of the taxonomy module's tables, as we intend to use taxonomy as the backend. So metatype_data, metatype_hierarchy and metatype_node come from that. In addition, we have metatype, mfr_schema, and a lookup table that maps schemas to metatypes (and vice versa): mfr_schema_map.

The flexinode_field and metatype tables are about the same. Here's a comparison of the tables used by the Iowa team and by the Flexinode team.

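Metatype table/Flexinode_field table:

| Description                  | Iowa         | Iowa comments                  | Flexinode     | Flexinode comments          |
|------------------------------|--------------|--------------------------------|---------------|-----------------------------|
| Key for this field type      | mtid         |                                | field_id      |                             |
| Foreign key for content type |              | schema table used instead      | ctype_id      |                             |
| Label                        | label        |                                | label         |                             |
| Default value                |              | stored in data table with id 0 | default_value | mediumtext                  |
| Weight for ordering          | weight       |                                | weight        |                             |
| Required                     | ???          |                                | required      |                             |
| Whether to show teaser       |              | not applicable to all          | show_teaser   |                             |
| Whether to show table rows   |              |                                |               |                             |
| Field type                   |              |                                | field_type    | mixes data and presentation |
| Field-specific options       | private_data |                                | options       |                             |
| Description of field         | description  |                                | description   | really help text?           |
| Data type (date, boolean...) | data_type    |                                |               | mixed in field_type         |
| Default input format         | input_format | textbox, radio, zipcode...     |               |                             |
| Help text                    | help         |                                |               |                             |

Schema table:

| Description              | Iowa        | Iowa comments    | Flexinode   | Flexinode comments |
|--------------------------|-------------|------------------|-------------|--------------------|
| Key for this schema      | sid         |                  | ctype_id    |                    |
| Name for this schema     | name        | should be label? | name        |                    |
| Node type schema extends | nodetype    |                  |             |                    |
| Description of schema    | description |                  | description |                    |
| Help text                | help        |                  | help        |                    |

Schema lookup table:

| Description              | Iowa | Iowa comments | Flexinode | Flexinode comments |
|--------------------------|------|---------------|-----------|--------------------|
| Foreign key schema id    | sid  |               |           |                    |
| Foreign key meta type id | mtid |               |           |                    |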

So the database structure that we are proposing looks, at a minimum, like this:

In this figure I have renamed weight and input_format to default_weight and default_input_format to emphasize that they are defaults. They could be overridden by code at several levels. This is the Iowa proposal.
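For readers without the figure, the three MFR-specific tables can be approximated as below. This sketch uses Python's sqlite3 module as a stand-in for MySQL or Postgres. The column names come from the comparison in this document (with weight and input_format renamed as described), but the SQL types, the keys, and the fields_for_schema helper are assumptions. The three taxonomy-derived tables are omitted.

```python
import sqlite3

# Column names follow the comparison in the text; the SQL types are assumptions.
DDL = """
CREATE TABLE metatype (
    mtid INTEGER PRIMARY KEY,
    label TEXT NOT NULL,
    data_type TEXT NOT NULL,
    default_weight INTEGER DEFAULT 0,
    default_input_format TEXT,
    private_data TEXT,
    description TEXT,
    help TEXT
);
CREATE TABLE mfr_schema (
    sid INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    nodetype TEXT,
    description TEXT,
    help TEXT
);
CREATE TABLE mfr_schema_map (
    sid INTEGER NOT NULL REFERENCES mfr_schema(sid),
    mtid INTEGER NOT NULL REFERENCES metatype(mtid),
    PRIMARY KEY (sid, mtid)
);
"""


def fields_for_schema(conn, schema_name):
    """Return the meta type labels mapped to a schema, in default weight order."""
    rows = conn.execute(
        "SELECT m.label FROM metatype m "
        "JOIN mfr_schema_map x ON x.mtid = m.mtid "
        "JOIN mfr_schema s ON s.sid = x.sid "
        "WHERE s.name = ? ORDER BY m.default_weight, m.mtid",
        (schema_name,),
    )
    return [label for (label,) in rows]


# Illustrative data only: a story schema with three meta types.
conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.execute("INSERT INTO mfr_schema (sid, name) VALUES (1, 'story')")
conn.executemany(
    "INSERT INTO metatype (mtid, label, data_type, default_weight) VALUES (?, ?, ?, ?)",
    [(1, "Title", "string", 0), (2, "Body", "string", 1), (3, "Published", "boolean", 2)],
)
conn.executemany("INSERT INTO mfr_schema_map (sid, mtid) VALUES (1, ?)", [(1,), (2,), (3,)])
```

Since the weight stored here is only a default, code that builds a particular form remains free to reorder the fields it gets back.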

Below is a figure of the schema from JonBob's sandbox. Asterisks denote primary keys.

Questions that spring to mind about this cck schema:

  1. It combines levels of abstraction 2 and 3 into one level (see above). This negates the benefits of having level 2 separate, namely, the ability to sort and filter as in present-day taxonomy.
  2. It does not allow derivative node types (see based_on in Iowa's Schema table).

References: JonBob, mathias, Dries, experience writing a metadata framework in Ruby.
Updated 10/28/04
Updated 16 Feb 2005 (added cck diagram)