- Motivation:
  - So far, we've looked at the structured data model: start with a schema
  - Now we look at semi-structured data, often called "self-describing"
  - This allows for some more flexibility
    - can add new schema information when we want
    - not all records have to have the data
    - Example: movie, in which we add a "want to see it?" field
  - flexibility comes with overhead in query cost
- Data representation:
  - Collection of nodes (leaf or interior)
    - leaves have associated data
    - interior nodes have outgoing edges
    - edges are labeled
    - one interior node (the root) has no incoming edges
  - Forms a directed graph, not necessarily a tree
  - Movie example: Root->star, Root->movie
- XML
  - tags: <...>
  - closing tag: </...>
  - element: <...>...</...>
  - text: <...> this is the text </...>
  - tags may be nested: <a> <b>..</b> </a>
  - single tag, can't have nested elements: <.../>
  - attributes, name-value pairs in an opening tag: <... name="value">...</...>
- Relational model vs. XML
  - Structure: tables vs. hierarchical tree, graph
  - Schema: fixed vs. flexible, "self describing" (maybe)
  - Queries: SQL (nice) vs. XPath, XQuery (not as nice)
  - Ordering: none vs. implied
  - Implementation: native vs. add-on
- "Correct" XML
  - Well-formed XML: obeys nesting rules, no predefined schema. Can invent your own tags
  - Valid XML: involves a DTD (Document Type Definition)
    - The DTD specifies allowable tags and how tags may be nested
- Well-formed XML:
  - There must be a single root element: <...> ... </...>
  - opening tags must have a matching closing tag
  - attribute names must be unique within a tag (tag names themselves may repeat)
- Namespaces:
  - data might come from different sources with different meanings
  - use namespaces to distinguish sets of tags
  - xmlns:name="URI"
  - Declare a namespace ns: <... xmlns:ns="URI">
  - Use a namespace: <ns:mytag> ... </ns:mytag>
    mytag is part of the namespace ns
- XML in a database:
  (1) Store the XML data in a parsed form, use tools to navigate the data
    - XML Document -> XML Parser -> Parsed XML (DOM or SAX)
    - SAX: Simple API for XML
    - DOM: Document Object Model (see webpages)
  (2) Represent documents and elements as relations
    - Give each document and each element a unique id
      ex: DocRoot(docId, rootElementId)
          SubElement(parentId, childId, position)
          ElementAttribute(elementId, name, value)
          ElementValue(elementId, value)
- Valid XML schema:
  - XML Document + DTD or XSD -> XML Validator -> Parsed XML (DOM or SAX)
  - conforms to schema
  - DTD (Document Type Definition): from SGML, no XML syntax, no namespaces
  - XML Schema: uses XML syntax for describing the schema
  - RelaxNG: XML syntax, simpler notation
- DTDs (Document Type Definitions)
  - provide a schema for an XML document using grammar-like rules
  - structure of DTD:
      <!DOCTYPE root-tag [
        <!ELEMENT element-name (components)>
        ... more elements ....
      ]>
  - components are elements that may be nested under the element
  - Special cases:
    (1) <!ELEMENT element-name (#PCDATA)>: PCDATA (parsed character data) means text
    (2) <!ELEMENT element-name EMPTY>: no subelements
  - *, +, ?, | have their usual meaning for components
  - attribute type CDATA: character string data
  - In XML:
- XML Schema
  - allows restrictions on the number of occurrences of subelements
  - allows declaration of types such as float, int
  - allows declaration of keys and foreign keys
  - allows for namespaces
  - written in XML
  - Example:
  - Elements: <xs:element ...> ... constraints and structure information ... </xs:element>
  - Complex types: <xs:complexType ...> ... element declarations ... </xs:complexType>
    - sequence requires the order given
    - minOccurs and maxOccurs attributes of elements control the number of occurrences
    - all: each element must occur, in any order
    - choice: exactly one of the elements must occur
    - complex types can have attributes: <xs:attribute ... />
  - Restricted simple types:
    - example: integer with a certain range
      <xs:simpleType ...> <xs:restriction base="xs:integer"> ... upper or lower bounds ... </xs:restriction> </xs:simpleType>
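The relational encoding listed under "XML in a database", option (2), can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name `shred`, the use of consecutive integers as element ids, and the sample movie document are all made up for the example; only the four relation names and their columns come from the notes.

```python
import xml.etree.ElementTree as ET
from itertools import count


def shred(xml_text, doc_id):
    """Encode one XML document as rows of the four relations from the notes.

    Assumption: element ids are handed out as 1, 2, 3, ... in document order.
    """
    root = ET.fromstring(xml_text)   # raises ParseError if not well-formed
    ids = count(1)                   # hypothetical id generator
    rels = {
        "DocRoot": [],           # (docId, rootElementId)
        "SubElement": [],        # (parentId, childId, position)
        "ElementAttribute": [],  # (elementId, name, value)
        "ElementValue": [],      # (elementId, value)
    }

    def visit(elem):
        elem_id = next(ids)
        # One row per attribute in the opening tag.
        for name, value in elem.attrib.items():
            rels["ElementAttribute"].append((elem_id, name, value))
        # Leaf text becomes the element's value (whitespace-only text is skipped).
        text = (elem.text or "").strip()
        if text:
            rels["ElementValue"].append((elem_id, text))
        # Children are numbered from 1 to preserve document order.
        for position, child in enumerate(elem, start=1):
            child_id = visit(child)
            rels["SubElement"].append((elem_id, child_id, position))
        return elem_id

    rels["DocRoot"].append((doc_id, visit(root)))
    return rels


# Illustrative document, not taken from the notes.
doc = '<movies><movie year="1977"><title>Star Wars</title></movie></movies>'
for name, rows in shred(doc, doc_id=1).items():
    print(name, rows)
```

Note that the four relations as listed do not record tag names, so this sketch does not store them either; a usable encoding would need an extra column or relation for that.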
- Keys in XML schema
  - associated with elements
  - means that for a class of elements, the combination of one or more fields must be unique
  - a field is a sub-element or an attribute
  - selector defines the class C of elements
  - field defines the sub-elements or attributes
- Foreign Keys in XML schema
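To make the key idea concrete, here is a minimal Python sketch of what a key constraint checks: pick the class of elements, read the key fields from each one, and report any combination that repeats. It is not how an XML Schema validator is implemented; the function name `key_violations`, the `"@name"` convention for attributes, and the sample document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def key_violations(root, selector, fields):
    """Return key tuples that occur more than once among the selected elements.

    selector: ElementTree path choosing the class C of elements
    fields:   list of field names; "@x" means attribute x, anything else
              means the text of the child element with that tag
    """
    seen, duplicates = set(), []
    for elem in root.findall(selector):
        key = tuple(
            elem.get(f[1:]) if f.startswith("@") else elem.findtext(f)
            for f in fields
        )
        if key in seen:
            duplicates.append(key)   # same field values seen before
        seen.add(key)
    return duplicates


# Illustrative data, not taken from the notes.
movies = ET.fromstring(
    "<movies>"
    '<movie year="1977"><title>Star Wars</title></movie>'
    '<movie year="1977"><title>Star Wars</title></movie>'
    '<movie year="1980"><title>Star Wars</title></movie>'
    "</movies>"
)
print(key_violations(movies, "./movie", ["title", "@year"]))
```

With (title, year) as the key, only the repeated 1977 entry is reported; with title alone as the key, every repeat of the title would count as a violation. A missing field simply shows up as `None` here, whereas a real key declaration would also require the field to be present.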