- Motivation:
- So far, we've looked at the structured data model: start with a schema
- Now we look at semi-structured data, often called "self-describing"
- This allows for more flexibility:
- can add new schema information whenever we want
- not every record has to have every field
- Example: movie data, in which we add a "want to see it?" field
- flexibility comes with overhead in query cost
- Data representation:
- Collection of nodes (leaf or interior)
- leaves have associated data
- interior nodes have outgoing edges
- edges are labeled
- one interior node (the root) has no incoming edges
- nodes and edges form a directed graph, not necessarily a tree
- Movie example: Root->star, Root->movie
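A minimal sketch of the node/edge model above, assuming a plain Python representation in which each node keeps a list of `(label, target)` edges. The names `Node`, `leaf`, `follow` and the sample labels (`name`, `title`, `starsIn`, `starredIn`, `wantToSee`) are hypothetical. Because a node can be reached along more than one path (and cycles are allowed), the result is a graph rather than a tree.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(eq=False)
class Node:
    """One node of the semi-structured graph.
    Leaves carry `data`; interior nodes carry labeled outgoing `edges`."""
    data: Any = None
    edges: list = field(default_factory=list)  # (label, target Node) pairs

    def add_edge(self, label: str, target: "Node") -> None:
        self.edges.append((label, target))


def leaf(value: Any) -> Node:
    return Node(data=value)


def follow(start: Node, *labels: str) -> list:
    """All nodes reachable from `start` by following the given edge labels in order."""
    frontier = [start]
    for label in labels:
        frontier = [dst for node in frontier for (lbl, dst) in node.edges if lbl == label]
    return frontier


# Hypothetical data in the spirit of the movie example: Root->star, Root->movie.
root, star, movie = Node(), Node(), Node()
root.add_edge("star", star)
root.add_edge("movie", movie)
star.add_edge("name", leaf("Some Star"))
movie.add_edge("title", leaf("Some Movie"))
movie.add_edge("starsIn", star)          # star now has two incoming edges
star.add_edge("starredIn", movie)        # this closes a cycle: a graph, not a tree
movie.add_edge("wantToSee", leaf(True))  # optional field that only this record has
```

For example, `follow(root, "movie", "starsIn", "name")[0].data` reaches the star's name through the movie, while `follow(root, "star", "wantToSee")` is simply empty because that record never got the optional field.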
- XML
- tags: <...>
- closing tag: </...>
- element: <...>...</...>
- text: <...> this is the text </...>
- tags may be nested
- a single self-closing tag can't have nested elements: <.../>
- attributes, name-value pairs in an opening tag: <... name="value">
...
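To make the pieces above concrete, here is a small sketch using Python's standard `xml.etree.ElementTree` module. The document and its tag names (`movies`, `movie`, `title`, `star`, `seen`) are made up for illustration; it shows a root element, nested elements, text content, attributes on an opening tag, and an empty self-closing element.

```python
import xml.etree.ElementTree as ET

# Hypothetical document illustrating tags, text, nesting, attributes, empty elements.
doc = """
<movies>
  <movie year="1999" genre="sci-fi">
    <title>Some Movie</title>
    <star>Some Star</star>
    <star>Another Star</star>
    <seen/>
  </movie>
</movies>
"""

root = ET.fromstring(doc)            # the root element, <movies>
movie = root.find("movie")           # a nested element
attrs = movie.attrib                 # name/value pairs from the opening tag
title = movie.find("title").text     # text between <title> and </title>
stars = [s.text for s in movie.findall("star")]
seen = movie.find("seen")            # <seen/>: no children and no text

print(attrs, title, stars, len(seen), seen.text)
```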
- Relational model vs. XML
- Structure: tables vs. hierarchical tree or graph
- Schema: fixed vs. flexible, "self-describing" (maybe)
- Queries: SQL (nice) vs. XPath, XQuery (not as nice)
- Ordering: none vs. implied
- Implementation: native vs. add-on
- "Correct" XML
- Well-formed XML: obeys the nesting rules; no predefined schema, so you can invent your own tags
- Valid XML: involves a DTD (Document Type Definition)
- The DTD specifies allowable tags and how tags may be nested
- Well-formed XML:
- There must be a root element
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
...
- opening tags must have a matching closing tag
- attribute names must be unique within an element
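A quick way to see the well-formedness rules in action is to hand small strings to a non-validating parser. This sketch assumes Python's standard `xml.etree.ElementTree`, which raises `ParseError` on any well-formedness violation; the example strings are made up. No schema is involved, so any tag names are accepted, and the same tag name may appear more than once.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """True if `text` parses as XML. This checks well-formedness only (no DTD/schema)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


examples = {
    "ok": '<?xml version="1.0"?><root><a x="1">hi</a><a/></root>',  # repeated tag <a> is fine
    "no single root": "<a/><b/>",
    "unclosed tag": "<root><a></root>",
    "bad nesting": "<root><a><b></a></b></root>",
    "duplicate attribute": '<root x="1" x="2"/>',
}

for name, text in examples.items():
    print(f"{name}: {is_well_formed(text)}")
```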
- Namespaces:
- data might come from different sources with different meanings
- use namespaces to distinguish sets of tags
- xmlns:name="URI"
Declare a namespace ns:
<... xmlns:ns="URI">
Use a namespace:
<ns:mytag>
...
</ns:mytag>
mytag is part of the namespace ns
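The sketch below shows how a namespace-aware parser keeps a plain `mytag` apart from `ns:mytag`. It assumes Python's `xml.etree.ElementTree`, which expands a prefixed name into `{URI}localname`; the URI `http://example.com/ns` is a placeholder, since any unique URI works.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://example.com/ns"  # placeholder URI for the namespace ns

doc = f"""
<root xmlns:ns="{NS_URI}">
  <mytag>plain tag, no namespace</mytag>
  <ns:mytag>same local name, namespace ns</ns:mytag>
</root>
"""

root = ET.fromstring(doc)

# The prefix is resolved to its URI, so the two elements get different names.
tags = [child.tag for child in root]

# Look up the namespaced element through a prefix -> URI mapping.
ns_elem = root.find("ns:mytag", {"ns": NS_URI})
plain_elem = root.find("mytag")

print(tags, ns_elem.text, plain_elem.text)
```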
- XML in a database:
(1) Store the XML data in parsed form and use tools to navigate the data
- XML Document -> XML Parser -> Parsed XML (DOM or SAX)
- SAX: Simple API for XML
- DOM: Document Object Model (see webpages)
(2) Represent documents and elements as relations
- Give each document and element a unique id
ex: DocRoot(docId, rootelementid)
SubElement(parentId, childId, position)
ElementAttribute(elementId, name, value)
ElementValue(elementId, value)
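Option (2) can be sketched as a recursive walk that assigns ids and emits rows for the four relations above. This is a minimal illustration, assuming Python's `xml.etree.ElementTree` and ids drawn from a simple counter (any unique id scheme would do); `shred` is a hypothetical name. Note that, as written in the notes, none of the four relations records an element's tag name, so this sketch drops it too, and it ignores text that follows a child element.

```python
import itertools
import xml.etree.ElementTree as ET


def shred(xml_text: str, doc_id: int) -> dict:
    """Map one XML document onto DocRoot, SubElement, ElementAttribute, ElementValue."""
    ids = itertools.count(1)  # element ids: 1, 2, 3, ... in document order
    doc_root, sub_element, element_attribute, element_value = [], [], [], []

    def visit(elem: ET.Element) -> int:
        elem_id = next(ids)
        for name, value in elem.attrib.items():
            element_attribute.append((elem_id, name, value))
        text = (elem.text or "").strip()
        if text:
            element_value.append((elem_id, text))
        # position records the child's order under its parent (1-based)
        for position, child in enumerate(elem, start=1):
            child_id = visit(child)
            sub_element.append((elem_id, child_id, position))
        return elem_id

    root_id = visit(ET.fromstring(xml_text))
    doc_root.append((doc_id, root_id))
    return {
        "DocRoot": doc_root,
        "SubElement": sub_element,
        "ElementAttribute": element_attribute,
        "ElementValue": element_value,
    }


tables = shred(
    '<movie year="1999"><title>Some Movie</title><star>Some Star</star></movie>',
    doc_id=1,
)
for name, rows in tables.items():
    print(name, rows)
```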
- Valid XML:
- XML Document + DTD or XSD -> XML Validator -> Parsed XML (DOM or SAX)
- the document conforms to the schema
- DTD (Document Type Definition): from SGML, not written in XML syntax, no namespaces
- XML Schema: uses XML syntax to describe schemas
- RELAX NG: XML syntax (also a compact non-XML syntax), simpler notation
- DTDs (Document Type Definitions)
- provides a schema for an xml document using grammar-like rules
- structure of a DTD:
<!DOCTYPE root-element [
<!ELEMENT element-name (components)>
... more elements ...
]>
- components are elements that may be nested under the element
- Special cases:
(1) <!ELEMENT element-name (#PCDATA)>: PCDATA (parsed character data) means text
(2) <!ELEMENT element-name EMPTY>: no subelements
- *, +, ?, | have their usual meaning for components
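- attributes are declared with <!ATTLIST element-name attribute-name CDATA ...>, where CDATA means character string data
- In XML: <element-name attribute-name="...">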
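Since `*`, `+`, `?`, `|` and `,` in a content model behave like regular-expression operators, one way to check an element's children is to translate the model into a regex over the sequence of child tags. This is a hand-rolled sketch, not a full DTD validator: it assumes Python's `re` and `xml.etree.ElementTree`, handles `EMPTY`, `(#PCDATA)` and element-only models (no mixed content), and the declaration `<!ELEMENT movie (title, year, star*)>` it mimics is hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def model_to_regex(model: str) -> str:
    """Translate a content model such as '(title, year, star*)' into a regex
    over a string of child tags written as 'title,year,star,star,'."""
    parts = []
    for tok in re.findall(r"[A-Za-z_][\w.-]*|[(),|*+?]", model):
        if tok == "(":
            parts.append("(?:")            # non-capturing group
        elif tok == ",":
            continue                       # ',' is sequence, i.e. concatenation
        elif tok in {")", "|", "*", "+", "?"}:
            parts.append(tok)              # same meaning as in regexes
        else:
            parts.append("(?:" + re.escape(tok) + ",)")  # exactly one child element
    return "".join(parts)


def children_ok(elem: ET.Element, model: str) -> bool:
    """Check one element's children against its declared content model."""
    if model == "EMPTY":
        return len(elem) == 0 and not (elem.text or "").strip()
    if model == "(#PCDATA)":
        return len(elem) == 0
    child_tags = "".join(child.tag + "," for child in elem)
    return re.fullmatch(model_to_regex(model), child_tags) is not None


movie = ET.fromstring(
    "<movie><title>T</title><year>Y</year><star>A</star><star>B</star></movie>"
)
print(children_ok(movie, "(title, year, star*)"))  # title, year, then any number of stars
print(children_ok(movie, "(title, star+)"))        # year is not allowed here
print(children_ok(movie.find("title"), "(#PCDATA)"))
```

Appending a comma after each tag keeps names from running together, so a model mentioning `star` cannot accidentally match a child called `starring`.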
- XML Schema
- allows restrictions on the number of occurrences of subelements
- allows declaration of types such as float, int
- allows declaration of keys and foreign keys
- allows for namespaces
- written in XML
Example:
- Elements:
... constraints and structure information...
- Complex types:
... element declarations...
- sequence requires the subelements in the order given
- minOccurs and maxOccurs attributes of elements control the number of occurrences
- all: each element must occur, in any order
- choice: exactly one of the elements must occur
- complex types can have attributes, declared with attribute elements
- Restricted simple types:
- example: an integer restricted to a certain range
... upper or lower bounds...
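The semantics of sequence, all, choice, minOccurs/maxOccurs and a bounded integer type can be emulated in a few lines. This is an illustrative sketch in plain Python, not how an XML Schema validator is built; the function names, the `UNBOUNDED` constant, and the sample type (title once, year once, zero or more stars) are all hypothetical. The greedy sequence matcher assumes neighboring particles have different names.

```python
import xml.etree.ElementTree as ET
from collections import Counter

UNBOUNDED = None  # stands in for maxOccurs="unbounded"


def matches_sequence(tags, particles):
    """sequence: particles appear in the given order.
    Each particle is (name, minOccurs, maxOccurs); matching is greedy."""
    i = 0
    for name, min_occurs, max_occurs in particles:
        count = 0
        while i < len(tags) and tags[i] == name and (max_occurs is None or count < max_occurs):
            i += 1
            count += 1
        if count < min_occurs:
            return False
    return i == len(tags)  # no leftover children allowed


def matches_all(tags, names):
    """all: each listed element occurs exactly once, in any order."""
    return Counter(tags) == Counter(names)


def matches_choice(tags, names):
    """choice: exactly one of the listed elements occurs."""
    return len(tags) == 1 and tags[0] in names


def in_int_range(text, min_inclusive, max_inclusive):
    """Restricted simple type: an integer with inclusive lower and upper bounds."""
    try:
        value = int(text)
    except (TypeError, ValueError):
        return False
    return min_inclusive <= value <= max_inclusive


movie = ET.fromstring(
    "<movie><title>T</title><year>1999</year><star>A</star><star>B</star></movie>"
)
tags = [child.tag for child in movie]
movie_type = [("title", 1, 1), ("year", 1, 1), ("star", 0, UNBOUNDED)]

print(matches_sequence(tags, movie_type))
print(matches_all(["year", "title"], ["title", "year"]))
print(matches_choice(["dvd"], ["dvd", "bluray"]))
print(in_int_range(movie.find("year").text, 1900, 2100))
```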
- Keys in XML schema
- associated with elements
- means that for each element in the class, the values of one or more fields taken together must be unique
- a field is a sub-element or an attribute
- selector defines the class C of elements
- field defines the sub-elements or attributes that make up the key
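The key rule can be sketched directly: pick out the class C of elements with a selector path, read each element's fields, and require that every field is present and no two elements share the same tuple of values. This assumes Python's `xml.etree.ElementTree`, whose `findall` supports only a limited XPath subset; the `@name` convention for attribute fields, the function names, and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def field_value(elem: ET.Element, field: str):
    """A field is a sub-element ('title') or an attribute ('@id'); None if absent."""
    if field.startswith("@"):
        return elem.get(field[1:])
    child = elem.find(field)
    return None if child is None else (child.text or "").strip()


def satisfies_key(root: ET.Element, selector: str, fields: list) -> bool:
    """Every element selected into class C has all fields, and no two
    elements share the same tuple of field values."""
    seen = set()
    for elem in root.findall(selector):
        values = tuple(field_value(elem, f) for f in fields)
        if None in values or values in seen:
            return False
        seen.add(values)
    return True


db = ET.fromstring("""
<db>
  <movie id="m1"><title>Some Movie</title></movie>
  <movie id="m2"><title>Some Movie</title></movie>
</db>
""")
print(satisfies_key(db, "movie", ["@id"]))           # ids are distinct
print(satisfies_key(db, "movie", ["title"]))         # titles repeat
print(satisfies_key(db, "movie", ["title", "@id"]))  # the (title, id) pairs are distinct
```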
- Foreign Keys in XML schema