Project

  • Topic: XML, Document Type Definition, XML schema
  • Published : December 11, 2012
Objectives
• The purpose of using schemas
• The schema languages DTD and XML Schema (and DSD2 and RELAX NG)
• Regular expressions, a commonly used formalism in schema languages

An Introduction to XML and Web Technologies

Schema Languages

Anders Møller & Michael I. Schwartzbach © 2006 Addison-Wesley

Motivation
• We have designed our Recipe Markup Language
• ...but so far only informally described its syntax
• How can we make tools that check that an XML document is a syntactically correct Recipe Markup Language document (and thus meaningful)?
• Implementing a specialized validation tool for Recipe Markup Language is not the solution...

XML Languages
XML language:
a set of XML documents with some semantics

schema:
a formal definition of the syntax of an XML language

schema language:
a notation for writing schemas

Validation
[Diagram: a schema processor takes an instance document and a schema as input. If the document is valid, it outputs a normalized instance document; if it is invalid, it outputs an error message.]

Why use Schemas?
• Formal but human-readable descriptions
• Data validation can be performed with existing schema processors

General Requirements
• Expressiveness
• Efficiency
• Comprehensibility

Regular Expressions
Commonly used in schema languages to describe sequences of characters or elements
• Σ: an alphabet (typically Unicode characters or element names)
• σ ∈ Σ matches the string σ
• α? matches zero or one α
• α* matches zero or more α’s
• α+ matches one or more α’s
• α β matches any concatenation of an α and a β
• α | β matches the union of α and β

Examples
A regular expression describing integers:
0|-?(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*
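
The integer expression can be tried out directly with an ordinary regular expression engine. The following is a minimal sketch using Python's `re` module; the function name `is_integer` is chosen here for illustration and is not part of the slides.

```python
import re

# The slide's expression, copied verbatim. The digit alternations could
# equally be written as the character classes [1-9] and [0-9].
INTEGER = re.compile(r"0|-?(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*")


def is_integer(text):
    """Return True if the whole string matches the integer expression."""
    # fullmatch requires the entire string to match, not just a prefix.
    return INTEGER.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["0", "42", "-7", "007", "-0", ""]:
        print(repr(sample), is_integer(sample))
```

Note that the expression deliberately rejects leading zeros and a negative zero, so every integer has exactly one accepted spelling.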

A regular expression describing the valid contents of table elements in XHTML:
caption? ( col* | colgroup* ) thead? tfoot? ( tbody+ | tr+ )

DTD – Document Type Definition
• Defined as a subset of the DTD formalism from SGML
• Specified as an integral part of XML 1.0
• A starting point for development of more expressive schema languages
• Considers elements, attributes, and character data; processing instructions and comments are mostly ignored
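
The XHTML table expression, caption? ( col* | colgroup* ) thead? tfoot? ( tbody+ | tr+ ), ranges over element names rather than characters. One way to experiment with it is to encode a sequence of child element names as a string and reuse a character-level regular expression engine. The sketch below does this in Python; the encoding (each name followed by one space) and the function name `valid_table_children` are assumptions made for illustration, not how a schema processor is required to work.

```python
import re

# Content model: caption? ( col* | colgroup* ) thead? tfoot? ( tbody+ | tr+ )
# Each element name is encoded as the name plus a trailing space, so that
# "col" can never be mistaken for the beginning of "colgroup".
TABLE_CONTENT = re.compile(
    r"(caption )?((col )*|(colgroup )*)(thead )?(tfoot )?((tbody )+|(tr )+)"
)


def valid_table_children(names):
    """Check a list of child element names against the content model."""
    encoded = "".join(name + " " for name in names)
    return TABLE_CONTENT.fullmatch(encoded) is not None


if __name__ == "__main__":
    print(valid_table_children(["caption", "col", "col", "thead", "tbody"]))
    print(valid_table_children(["col", "colgroup", "tr"]))
```

The second call mixes col and colgroup, which the union ( col* | colgroup* ) does not allow.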

Document Type Declarations
Associates a DTD schema with the instance document
...

Conditional Sections
Allow parts of schemas to be enabled/disabled by a switch
Example:
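[Example not recoverable from this preview; only the closing ]]> delimiters of two conditional sections survive.]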

Checking Validity with DTD
A DTD processor (also called a validating XML parser)
• parses the input document (includes checking well-formedness)
• checks the root element name
• for each element, checks its contents and attributes
• checks uniqueness and referential constraints (ID/IDREF(S) attributes)
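
The last step, checking uniqueness and referential constraints, can be illustrated in isolation. The sketch below is not a DTD processor: Python's standard library parser checks well-formedness but does not validate against a DTD, so the set of ID-typed and IDREF-typed attributes is passed in by hand. The function name and the element and attribute names in the usage example (recipe, related, id, ref) are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_id_constraints(xml_text, id_attrs, idref_attrs):
    """Return error messages for ID/IDREF violations.

    id_attrs and idref_attrs are sets of (element name, attribute name)
    pairs. A real DTD processor would read these from the attribute-list
    declarations; supplying them by hand is an assumption of this sketch.
    """
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    errors = []
    seen_ids = set()
    references = []
    for element in root.iter():
        for name, value in element.attrib.items():
            key = (element.tag, name)
            if key in id_attrs:
                # Uniqueness: an ID value may occur only once per document.
                if value in seen_ids:
                    errors.append("duplicate ID: " + value)
                seen_ids.add(value)
            elif key in idref_attrs:
                # split() also covers IDREFS (space-separated references).
                references.extend(value.split())
    # Referential constraint: every reference must name an existing ID.
    for value in references:
        if value not in seen_ids:
            errors.append("IDREF without matching ID: " + value)
    return errors


if __name__ == "__main__":
    doc = """<collection>
      <recipe id="r1"><related ref="r2"/></recipe>
      <recipe id="r1"><related ref="r9"/></recipe>
    </collection>"""
    for message in check_id_constraints(
        doc, {("recipe", "id")}, {("related", "ref")}
    ):
        print(message)
```

References are collected first and resolved after the whole document has been traversed, because an IDREF may point forward to an ID that appears later in the document.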

RecipeML with DTD (1/2)
<!ELEMENT step (#PCDATA)>
<!ELEMENT comment (#PCDATA)>
<!ELEMENT nutrition EMPTY>
<!ATTLIST nutrition calories CDATA #REQUIRED
                    carbohydrates CDATA #REQUIRED
                    fat CDATA #REQUIRED
                    protein CDATA #REQUIRED
                    alcohol CDATA #IMPLIED>

Problems with the DTD description
• calories should contain a non-negative number
• protein should contain a value of the form N% where N is between 0 and 100
• comment should be allowed to appear anywhere in the contents of recipe
• unit should only be allowed in elements where amount is also present
• nested ingredient elements should only be allowed when amount is absent
Our DTD schema permits in some cases...
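
The first two constraints concern attribute values, which a DTD can only declare as CDATA. As a rough illustration of what a check outside the DTD might look like, the sketch below tests the two values with regular expressions. The exact readings are assumptions: "non-negative number" is taken to mean decimal digits with an optional fractional part, and N is taken to be an integer from 0 to 100 without leading zeros. The function name `check_nutrition` is hypothetical.

```python
import re

# Assumed reading of "non-negative number": digits, optional fraction.
NON_NEGATIVE = re.compile(r"[0-9]+(\.[0-9]+)?")
# Assumed reading of "N%": an integer 0..100 without leading zeros.
PERCENT = re.compile(r"(100|[1-9]?[0-9])%")


def check_nutrition(attrs):
    """Return error messages for a dict of nutrition attribute values."""
    errors = []
    calories = attrs.get("calories", "")
    if NON_NEGATIVE.fullmatch(calories) is None:
        errors.append("calories is not a non-negative number: " + repr(calories))
    protein = attrs.get("protein", "")
    if PERCENT.fullmatch(protein) is None:
        errors.append("protein is not of the form N%: " + repr(protein))
    return errors


if __name__ == "__main__":
    print(check_nutrition({"calories": "349", "protein": "32%"}))
    print(check_nutrition({"calories": "-5", "protein": "120%"}))
```

The remaining constraints relate elements and attributes to each other (unit with amount, nested ingredient without amount) and would need structural checks rather than value patterns.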