The existing schema paradigm can only express constraints among data items in terms of the child and sibling axes. It is therefore natural to consider whether an alternative paradigm might allow a schema author to exploit the remaining relationships to define additional types of constraint among document elements. Tree patterns do just that, and XPath provides a convenient syntax in which to express those patterns.
Validation using tree patterns is a two-step process. Firstly, the candidate objects (in XPath terms, nodes) to be validated must be identified. Secondly, assertions must be made about those candidates. Both the candidate object selection and the assertions can be defined in terms of XPath expressions.
More formally, the nodes and arcs within a graph of data can be traversed first to identify nodes, and then to make assertions about the relationships of those nodes to others within the same graph. Assertions are therefore the mechanism for placing constraints on the relationships between nodes in a graph (elements and attributes in an XML document). For example, we may select all house nodes within a document using a single XPath expression.
Full use of tree pattern validation provides the maximum amount of freedom when modelling constraints for a schema.
This comes at very little cost: XPath is available in most XML environments. For example, the following types of constraint are hard, or impossible, to express with other schema languages.
Examples of 'difficult' constraints:
- Where attribute X has a value, attribute Y is also required
- Where the parent of element A is element B, it must have an attribute Y, otherwise an attribute Z
- The value of element P must be either "foo", "bar" or "baz"
Tree patterns are the schema paradigm underpinning Schematron as a validation language.
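As a rough illustration, the three 'difficult' constraints listed above might be written as tree patterns in Schematron syntax, which is introduced later in the paper. The names X, Y, Z, A, B and P come from the list itself. Everything else is an assumption of this sketch: that attributes Y and Z belong to element A, the wording of the messages, the pattern names, and the omission of the Schematron namespace declaration.

```xml
<!-- Illustrative sketch only. The Schematron namespace declaration is
     deliberately omitted; a real schema must declare it. -->
<schema>
  <!-- Where attribute X has a value, attribute Y is also required.
       (This version tests only that X is present.) -->
  <pattern name="Co-occurring attributes">
    <rule context="*[@X]">
      <assert test="@Y">An element with attribute X must also have attribute Y.</assert>
    </rule>
  </pattern>

  <!-- Where the parent of A is B, A needs attribute Y, otherwise attribute Z.
       Only the first matching rule in a pattern is applied to a node, so the
       more specific context is listed first. -->
  <pattern name="Attribute depends on parent">
    <rule context="B/A">
      <assert test="@Y">An A element whose parent is B must have attribute Y.</assert>
    </rule>
    <rule context="A">
      <assert test="@Z">An A element whose parent is not B must have attribute Z.</assert>
    </rule>
  </pattern>

  <!-- The value of P must be one of three strings. -->
  <pattern name="Enumerated value">
    <rule context="P">
      <assert test=". = 'foo' or . = 'bar' or . = 'baz'">The value of P must be foo, bar or baz.</assert>
    </rule>
  </pattern>
</schema>
```

Each constraint sits in its own pattern so that a node matched by one rule remains available to the rules of the other constraints.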
There are reasons to believe that tree-pattern validation may be more suitable in an environment where documents are constructed from elements in several namespaces (often termed 'data islands'). As many consider that the future of XML document interchange on the Internet will involve significant mixing of vocabularies, a flexible approach may bring additional benefits. Schematron combines powerful validation capabilities with a simple syntax and implementation framework.
Schematron is open source and is, at the time of writing, being migrated to SourceForge to better manage its development by a rapidly growing community of users.
A recent review of six current schema languages [Lee] supports this view, declaring Schematron to be unique in both its approach and intent. Before discussing the details of the Schematron language it is worth reviewing the design goals which have been highlighted by its author.
Design Goals
There are several aims which Rick Jelliffe believed were important during the design and specification of Schematron [Schematron], [Jelliffe]:
- Promote natural language descriptions of validation failures
- Document validation in software engineering, through the provision of interlocking constraints
- Mining data graphs for academic research or information discovery. Constraints may be viewed as hypotheses which are tested against the available data
- Automatic creation of external markup through the detection of patterns in data, and generation of links
- Use as a schema language for "hard" markup languages such as RDF
- Aid accessibility of documents, by allowing usage constraints to be applied to documents
How It Works
The implementation of Schematron derives from the observation that tree-pattern-based validators can be trivially constructed using XSLT stylesheets [Jelliffeb], [Norton].
For example, a simple stylesheet that validates that houses must have walls can be written in a few lines of XSLT.
Schematron takes this a natural step further by defining a schema language which, when transformed through a meta-stylesheet, produces a validating stylesheet of this kind. The following diagram summarises this process.
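To make the stylesheet-based approach concrete, a hand-written validator for the "houses must have walls" constraint might look like the sketch below. It is not the stylesheet from the original paper or the output of the Schematron meta-stylesheet; the message text and the choice of plain-text output are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Emit validation messages as plain text. -->
  <xsl:output method="text"/>

  <!-- Step 1: identify the candidate nodes (every house element). -->
  <xsl:template match="house">
    <!-- Step 2: make an assertion about each candidate. -->
    <xsl:if test="not(wall)">
      <xsl:text>A house must have walls.&#10;</xsl:text>
    </xsl:if>
    <!-- Continue into the children so nested houses are also checked. -->
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Suppress the built-in rule that would copy document text to the output. -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

The match pattern plays the role of candidate selection and the test attribute plays the role of the assertion, which is the same split that a Schematron rule and its assert elements express declaratively.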
Yet from a user perspective, the details of XSLT are hidden; the end-user need only grapple with the XPath expressions used to define constraints. The following section outlines the Schematron assertion language which is used to define Schematron schemas.
The last section in the paper provides information on the Schematron implementation. All following examples conform to a simple XML vocabulary introduced in the next section. While the examples could have been couched in terms of an existing schema language, the intention is to provide a simple vocabulary which does not assume any prior knowledge on the part of the reader.
It should be stressed that, while the examples themselves may be trivial, this should not be taken to indicate any specific limitation in Schematron, which is capable of handling much more complex schemas.
The following DTD defines the building project vocabulary:
The roof may not be present if the house is still under construction. A house has an address which consists of a street name, town and a postcode. A house should have either a builder who is currently assigned to its construction (and all builders must be certified), or an owner.
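As a rough reconstruction of the vocabulary described in this section, the sketch below gives a DTD (as an internal subset) together with a small conforming instance. It is not the DTD from the original paper. The element names follow the prose where it supplies them (house, wall, roof); the remaining element and attribute names, the content models, and all sample values are assumptions.

```xml
<?xml version="1.0"?>
<!-- Hypothetical DTD and instance for the building project vocabulary. -->
<!DOCTYPE house [
  <!-- A house has walls, an optional roof, an address,
       and either a builder or an owner. -->
  <!ELEMENT house    (wall+, roof?, address, (builder | owner))>
  <!ELEMENT wall     EMPTY>
  <!ELEMENT roof     EMPTY>
  <!ELEMENT address  (street, town, postcode)>
  <!ELEMENT street   (#PCDATA)>
  <!ELEMENT town     (#PCDATA)>
  <!ELEMENT postcode (#PCDATA)>
  <!-- Certification and telephone numbers are held as attributes (assumed). -->
  <!ELEMENT builder  (#PCDATA)>
  <!ATTLIST builder  certification CDATA #REQUIRED>
  <!ELEMENT owner    (#PCDATA)>
  <!ATTLIST owner    telephone CDATA #IMPLIED>
]>
<house>
  <wall/>
  <wall/>
  <wall/>
  <wall/>
  <!-- No roof yet: this house is still under construction. -->
  <address>
    <street>1 Example Street</street>
    <town>Exampleton</town>
    <postcode>EX1 1MP</postcode>
  </address>
  <builder certification="12345">Example Builder</builder>
</house>
```

Note that a DTD of this shape can say that walls are required, but not that there must be exactly four of them without spelling the element out four times, and it cannot tie the presence of a roof to the presence of an owner. Constraints of that kind are what the Schematron examples address.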
Certification numbers of builders, and telephone numbers of owners, are also recorded for administrative purposes. A sample document instance conforming to this schema is:
Assert and Report
The basic building blocks of the Schematron language are the assert and report elements.
These define the constraints which collectively form the basis of a Schematron schema. Constraints are assertions (boolean tests) that are made about patterns in an XML document; these patterns and tests are defined using XPath expressions. The best illustration is a simple example:
Recall that validation is a two-step process of identification followed by assertion.
The identification step generates the context in which assertions are made. This is covered in the next section. If there are not four walls then the assertion fails and a message, the content of the assert element, is displayed to the user. Asserts therefore operate in the conventional way: when the test evaluates to false, the assertion fails and action is taken. The report element works in the opposite manner: if the test in a report element evaluates to true then action is taken.
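The behaviour of the two elements can be sketched with the house vocabulary as follows. The enclosing rule element, which supplies the context, is covered in the next section; the message wording is an assumption, and the Schematron namespace declaration is omitted from this fragment.

```xml
<!-- Illustrative fragment; namespace declaration omitted. -->
<rule context="house">
  <!-- assert: the message is shown when the test evaluates to FALSE,
       i.e. when the house does not have exactly four wall children. -->
  <assert test="count(wall) = 4">A house should have 4 walls.</assert>

  <!-- report: the message is shown when the test evaluates to TRUE,
       i.e. when the house has no roof child. -->
  <report test="not(roof)">This house does not have a roof.</report>
</rule>
```

The same check could be phrased either way round, since a report on `not(roof)` and an assert on `roof` fire under the same condition. The choice signals intent: a missing roof is treated here as a feature worth noting rather than as a schema violation.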
While reports and asserts are effectively the inverse of one another, the intended uses of the two elements are quite different. An assert is used to test whether a document conforms to a particular schema, generating actions if deviations are encountered.
A report is used to highlight features of the underlying data. However, Schematron itself does not define the action which must be taken on a failed assert or a successful report; this is implementation specific. The default behaviour is simply to present the user with the supplied message. An implementation may choose to handle these two cases differently.
It is worth noting that there is a trade-off to be made when defining tests on these elements. In some cases a single complex XPath expression may accurately capture the desired constraint.
Yet it is closer to the 'spirit' of Schematron's design to use several smaller tests that collectively describe the same constraint. Specific tests can provide more accurate feedback to a user than a single general test and its associated message.
The assert and report elements may contain a name element, which has an optional path attribute. This element will be substituted with the name of the current element before a message is passed to the user. When supplied, the path attribute should contain an XPath expression referencing an alternate element. This is useful for giving additional feedback to the user about the specific element that is failing an assertion.
Assert and report messages should be simple declarative statements of what is, or should be.
Diagnostics can include detailed information that can be provided to the user as appropriate to the Schematron implementation. Diagnostic information is grouped separately from constraints, and is cross-referenced from a diagnostics attribute.
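The name element and the diagnostics mechanism might be combined as in the sketch below. The attribute that cross-references a diagnostic is spelled `diagnostics` here, which is how it appears in the Schematron 1.5 releases known to the editor; treat that spelling, the identifier `wall-count`, the `certification` attribute and all message text as assumptions. The namespace declaration is again omitted.

```xml
<!-- Illustrative sketch; namespace declaration omitted. -->
<schema>
  <pattern name="Feedback examples">
    <rule context="house">
      <!-- <name/> is replaced by the name of the context element ("house").
           The diagnostics attribute points at extra detail defined below. -->
      <assert test="count(wall) = 4" diagnostics="wall-count">A <name/> should have 4 walls.</assert>
    </rule>
    <rule context="builder">
      <!-- path selects a different node to name: here the parent element. -->
      <assert test="@certification">A <name/> working on a <name path=".."/> must be certified.</assert>
    </rule>
  </pattern>

  <!-- Diagnostics are grouped apart from the constraints that refer to them. -->
  <diagnostics>
    <diagnostic id="wall-count">Check that no wall elements have been omitted or duplicated in this house.</diagnostic>
  </diagnostics>
</schema>
```

Keeping the assert message short and declarative, and moving the longer advice into a diagnostic, lets an implementation decide how much detail to show.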
The context for constraints is defined by grouping them together to form rules. The context of a rule identifies the candidate nodes to which its constraints will be applied.
The above example checks that a house contains 4 wall child elements, and provides feedback to the user if it is missing a roof.
To share constraints between rules, a rule may be declared as 'abstract'. The contents of this rule may be included by other rules as necessary. This is achieved through the use of the extends element. Two assertions are associated with this abstract rule; they are imported by the other non-abstract rules and will be applied along with the other constraints specific to that element. An abstract rule may contain assert and report elements but it cannot have a context. Assertions from an abstract rule obtain their context from the importing rule.
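A minimal sketch of an abstract rule and two rules that extend it is shown below. The identifier `person`, the two shared assertions, and the `certification` and `telephone` attributes are hypothetical and are not taken from the original paper's listing; the namespace declaration is omitted.

```xml
<!-- Illustrative fragment; namespace declaration omitted. -->
<pattern name="People associated with a house">
  <!-- An abstract rule has an id but no context of its own. -->
  <rule abstract="true" id="person">
    <assert test="normalize-space(.) != ''">A <name/> must be named.</assert>
    <assert test="parent::house">A <name/> must belong to a house.</assert>
  </rule>

  <!-- Each concrete rule imports the shared assertions, which are then
       evaluated in that rule's own context. -->
  <rule context="builder">
    <extends rule="person"/>
    <assert test="@certification">A builder must have a certification number.</assert>
  </rule>

  <rule context="owner">
    <extends rule="person"/>
    <report test="not(@telephone)">No telephone number is recorded for this owner.</report>
  </rule>
</pattern>
```

Because the imported assertions take their context from the importing rule, `<name/>` in the shared messages resolves to "builder" in one case and "owner" in the other.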
Producing Patterns and Schemas
The next most important element in a Schematron schema is pattern. Patterns gather together a related set of rules. A particular schema may include several patterns that logically group the constraints. A Schematron implementation can then furnish the user with a link to supporting documentation.
Patterns defined within a schema will be applied sequentially in lexical order. Nodes in the input document are then matched against the contexts defined by the rules contained within each pattern. If a node is found to match the context of a particular rule, then the assertions which it contains will be applied.
Within a pattern a given node can only be matched against a single rule. Rules within separate patterns may match the same node, but only the first match within a pattern will be applied. An example of an incorrect schema is given below.
The last step in defining a Schematron schema is to wrap everything up in a schema element. Firstly, it introduces the namespace for Schematron documents, which is "http: Secondly, a schema may have a title; this is recommended.
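The first-match behaviour is easy to trip over, so a sketch of the mistake and one possible correction may help. This is not the incorrect schema from the original paper; the title, pattern names and messages are assumptions, and the namespace declaration that the schema element should carry is omitted here.

```xml
<!-- Illustrative sketch; the schema element's namespace declaration is omitted. -->
<schema>
  <title>Building project constraints (sketch)</title>

  <!-- INCORRECT: both rules have the same context. Within one pattern a node
       is matched against the first applicable rule only, so the address
       check below is never applied to any house. -->
  <pattern name="Incorrect grouping">
    <rule context="house">
      <assert test="count(wall) = 4">A house should have 4 walls.</assert>
    </rule>
    <rule context="house">
      <assert test="address">A house should have an address.</assert>
    </rule>
  </pattern>

  <!-- One correction: put both assertions in a single rule. Placing the
       second rule in a separate pattern would also work. -->
  <pattern name="Corrected grouping">
    <rule context="house">
      <assert test="count(wall) = 4">A house should have 4 walls.</assert>
      <assert test="address">A house should have an address.</assert>
    </rule>
  </pattern>
</schema>
```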
Schematron achieves this using the concept of phases. A phase allows constraints to be applied according to the state of a document within its lifecycle. A Schematron schema may define any number of phases, where a phase involves the processing of one or more patterns. This means that constraints will be applied selectively according to the active phase.
Identifying the active phase is an implementation-specific mechanism, but may be accomplished through command-line arguments to the XSLT engine. A schema may define a default phase which will be selected if not overridden.
The first phase is "underConstruction", and captures constraints that need to be checked when a house is being built.
This involves checks that the architectural plans are being followed (there are four walls!). The second phase, "built", captures constraints that are to be enforced once construction is completed.
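A schema with these two phases might be sketched as follows. The phase names come from the text above; the pattern identifiers, the decision to run the construction checks in both phases, the specific assertions, and the use of a `defaultPhase` attribute on the schema element are assumptions of this sketch. The namespace declaration is omitted.

```xml
<!-- Illustrative sketch; namespace declaration omitted. -->
<schema defaultPhase="underConstruction">
  <title>Building project phases (sketch)</title>

  <!-- While building, only the construction checks are active. -->
  <phase id="underConstruction">
    <active pattern="construction"/>
  </phase>

  <!-- Once built, the completion checks are applied as well. -->
  <phase id="built">
    <active pattern="construction"/>
    <active pattern="completion"/>
  </phase>

  <pattern name="Construction checks" id="construction">
    <rule context="house">
      <assert test="count(wall) = 4">A house should have 4 walls.</assert>
      <assert test="builder or owner">A house should have a builder or an owner.</assert>
    </rule>
  </pattern>

  <pattern name="Completion checks" id="completion">
    <rule context="house">
      <assert test="roof">A completed house should have a roof.</assert>
      <assert test="owner">A completed house should have an owner.</assert>
    </rule>
  </pattern>
</schema>
```

With this arrangement a roofless house validates cleanly in the "underConstruction" phase but fails in the "built" phase, without any change to the document or to the individual patterns.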