Master's Degree in Computer Science (Laurea Magistrale in Informatica), Technologies for Innovation

XML BASIC, Master's Degree in Computer Science, Chapter 02, module of the course Technologies for Innovation

Agenda
Syntax: elements and attributes
XML Prolog
Examples
Additional Resources
DTD and XML Schema: introduction
Well-Formed and Valid Documents
Validation
XML Basic

XML Document Syntax (I)
An XML document is a text file that contains a series of tags, attributes and text, written according to well-defined syntactic rules. An XML document is inherently hierarchical. It is made up of components called elements. Each element represents a logical component of the document and can contain other elements (subelements) or text.

XML Document Syntax (II)
Elements can have additional information associated with them that describes their properties; this information is called attributes. Elements are organized in a hierarchical tree with one main element, called the root element (or simply the root). The root contains all the other elements of the document. The structure of an XML document can be represented graphically as a tree, generally known as the document tree.

Document Tree Example (I)
<?xml version="1.0" ?>
<articolo titolo="Article title">
  <paragrafo titolo="Title of the first paragraph">
    <testo>
      Text block of the first paragraph
    </testo>
    <immagine file="immagine1.jpg">
    </immagine>
  </paragrafo>
  <paragrafo titolo="Title of the second paragraph">
    Text block of the second paragraph
    <codice> Code example </codice>
    Another text block
  </paragrafo>
  <paragrafo tipo="bibliografia">
    Reference to an article
  </paragrafo>
</articolo>
[Figure: document tree of the example, with element nodes articolo, paragrafo, testo, immagine and codice, and attribute nodes titolo and file]
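To make the tree visible, here is a minimal sketch using Python's standard xml.etree.ElementTree module: it parses the example above and prints one line per element, indented by depth, with its attributes. The helper name tree_lines is hypothetical.

```python
import xml.etree.ElementTree as ET

# The articolo example (element and attribute names as on the slide)
DOC = """<articolo titolo="Article title">
  <paragrafo titolo="Title of the first paragraph">
    <testo>Text block of the first paragraph</testo>
    <immagine file="immagine1.jpg"/>
  </paragrafo>
  <paragrafo titolo="Title of the second paragraph">
    Text block of the second paragraph
    <codice>Code example</codice>
    Another text block
  </paragrafo>
  <paragrafo tipo="bibliografia">Reference to an article</paragrafo>
</articolo>"""

def tree_lines(elem, depth=0):
    """Return the document tree as indented lines: tag name plus attributes."""
    attrs = " ".join(f'{k}="{v}"' for k, v in elem.attrib.items())
    lines = ["  " * depth + elem.tag + (f" [{attrs}]" if attrs else "")]
    for child in elem:  # iterating an Element yields its direct children in order
        lines.extend(tree_lines(child, depth + 1))
    return lines

root = ET.fromstring(DOC)
print("\n".join(tree_lines(root)))
```

Each recursive call goes one level deeper, so the indentation mirrors the parent/child nesting of the document tree.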

Document Tree Example (Newspaper)
<newspaper>
  <section>
    <page>
      <article>
        <headline>XML 8 Announced</headline>
        <byline>Jan Doe</byline>
        <body>The W5C today announced...</body>
      </article>
      <ad>
        <client>Crazy Ed's Cars</client>
        <size>1/4 page</size>
        <run>2 weeks</run>
      </ad>
    </page>
  </section>
</newspaper>
The structure of the document reflects the structure of the newspaper: the newspaper contains sections, which in turn have pages, and on each page there are articles and advertisements.

Trees and Relationships
As you can see from the preceding example, XML documents are structured as trees, and there are relationships between the elements in an XML document. For example, with these elements: <newspaper> <section> </section> </newspaper> the <newspaper> element is the parent of the <section> element, and the <section> element is the child of the <newspaper> element. These relationships become very important as you move into more advanced areas of XML, because you will use them to navigate and locate information within the XML tree with technologies such as XPath.
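The parent/child relationships can be explored with Python's standard ElementTree module, sketched below on an abridged newspaper document. ElementTree elements know their children but not their parent, so this sketch builds a child-to-parent table (parent_of, a hypothetical name) and uses ElementTree's limited XPath subset for searching.

```python
import xml.etree.ElementTree as ET

DOC = """<newspaper>
  <section>
    <page>
      <article><headline>XML 8 Announced</headline></article>
      <ad><client>Crazy Ed's Cars</client></ad>
    </page>
  </section>
</newspaper>"""

root = ET.fromstring(DOC)

# Walk every element once and record, for each child, which element contains it
parent_of = {child: parent for parent in root.iter() for child in parent}

headline = root.find(".//headline")   # ".//" searches all descendants
print(headline.text)                  # XML 8 Announced
print(parent_of[headline].tag)        # article
print([c.tag for c in root.find("section/page")])   # ['article', 'ad']
```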

ELEMENTS
The bulk of the actual data in your XML documents will be in the form of elements. Elements are tag pairs, which are case-sensitive, consisting of a start tag and an end tag. The name of the element is called the element type, whereas each occurrence of the element within a document is referred to as an instance of the element.
<example>An Example Element</example>
The element type here is "example"; the element itself is the entire string, with the start tag, content and end tag all together. The text contained between the tags is called the element content.
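The distinction between element type and element instance maps directly onto a parsed tree: in Python's ElementTree each Element object is one instance, and its .tag is the element type. A small sketch (the sample document is hypothetical) that counts instances per type:

```python
from collections import Counter
import xml.etree.ElementTree as ET

DOC = "<list><example>One</example><example>Two</example><note>Three</note></list>"
root = ET.fromstring(DOC)

instances = list(root.iter())          # every element instance, root included
types = Counter(e.tag for e in instances)

print(len(instances))      # 4
print(types["example"])    # 2 instances of the element type "example"
print(types["note"])       # 1
```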

ELEMENTS: different types of content
PCData (text): When elements have PCData or text content, they do not contain any child elements, only text. "PCData" stands for "Parsed Character Data", which is simply text that is read by the XML parser.
Element: If an element has only child elements as its content, as in <example><child>Some text...</child></example>, then the element is said to have element content.
Mixed: If an element has both text and element content, as in <example>Text and <bold>emphasized</bold> text.</example>, then the element is said to have mixed content.
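The three content types (plus empty elements) can be told apart mechanically. This minimal sketch uses Python's ElementTree, where elem.text is the text before the first child and each child's .tail is the text after it. content_kind is a hypothetical helper, and treating whitespace-only text as insignificant is an assumption that suits indented documents.

```python
import xml.etree.ElementTree as ET

def content_kind(elem):
    """Classify an element's content as 'empty', 'text', 'element' or 'mixed'.

    Assumption: whitespace-only text between child elements is ignored.
    """
    has_children = len(elem) > 0
    texts = [elem.text] + [child.tail for child in elem]
    has_text = any(t and t.strip() for t in texts)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "text"
    return "empty"

for doc in ["<example>Some text...</example>",
            "<example><child>Some text...</child></example>",
            "<example>Text and <bold>emphasized</bold> text.</example>",
            "<empty/>"]:
    print(content_kind(ET.fromstring(doc)), "<-", doc)
```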

Empty Tags
There are instances where you might have an element that is empty, i.e. it does not contain any text or child elements. In this case you can write the element with both start and end tags: <empty></empty>
However, there is also a shorthand that can be used for elements that have no content: <empty/>
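A quick way to confirm that the two spellings are equivalent is to parse both with Python's ElementTree: they produce the same element, and the serializer itself writes the short form.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<empty></empty>")
short_form = ET.fromstring("<empty/>")

# Both spellings give an element with no children, no text and no attributes
for e in (long_form, short_form):
    print(e.tag, len(e), e.text, e.attrib)        # empty 0 None {}

# ElementTree serializes empty elements with the shorthand
print(ET.tostring(long_form, encoding="unicode"))  # <empty />
```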

ATTRIBUTES
Not all data in XML documents is stored in element content; some information may be stored in attributes. Attributes are simply a means of associating named values with elements. For example, in the HTML tag <img src="myimage.gif">, src is an attribute.
Attributes are placed in the start-tag of the element, separated from the element name and from each other by whitespace. The value of an attribute is enclosed in quotation marks, either single or double, and an element can have any number of attributes, as long as each attribute name is unique within that element.
<shirt size="medium"/> <pants size="30">Bell Bottoms</pants>
As you can see, attributes can be used with empty elements as well as with elements that have text or mixed content.
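The attribute rules can be checked with Python's ElementTree, as in this sketch: single and double quotes are interchangeable, values always come back as strings, and repeating an attribute name is rejected. The color attribute and the parses helper are hypothetical additions for illustration.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if text is well-formed XML according to ElementTree."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

pants = ET.fromstring("<pants size='30' color=\"blue\">Bell Bottoms</pants>")
print(pants.attrib)        # {'size': '30', 'color': 'blue'} (quote style is not kept)
print(pants.get("size"))   # 30, as a string

# The same attribute name twice in one start-tag is a well-formedness error
print(parses('<shirt size="medium" size="large"/>'))   # False
```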

Structure: XML Declaration
Every XML document should begin with the XML declaration, which takes the following form: <?xml version="1.0" encoding="UTF-8" standalone="yes"?> The XML declaration always starts with "<?xml" and always ends with "?>". When present, it must be the very first thing in the document, with nothing (not even whitespace) before it.
version: The version attribute is required; it tells the XML processor which version of XML was used to author the document. The versions defined so far are "1.0" and "1.1", and "1.0" is by far the most widely used.
encoding: The encoding attribute specifies the character encoding used for the document, for example "UTF-8", "UTF-16" or "ISO-8859-1". This attribute is not required; without it (and without a byte order mark) the document is assumed to be UTF-8.
standalone: The standalone attribute ("yes" or "no") declares whether the document depends on markup declarations stored outside the document itself, for example in an external DTD. "yes" means that no external declarations affect the document's content; "no" means that they may. This attribute is not required; if it is absent and external markup declarations exist, "no" is assumed.
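The declaration's three parts can be inspected with Python's low-level expat binding (xml.parsers.expat), whose XmlDeclHandler receives version, encoding and standalone. A minimal sketch; read_xml_declaration is a hypothetical helper name.

```python
import xml.parsers.expat

def read_xml_declaration(text):
    """Return (version, encoding, standalone) from the XML declaration, or None.

    expat reports standalone as 1 for "yes", 0 for "no" and -1 when omitted.
    """
    result = []
    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = lambda version, encoding, standalone: result.append(
        (version, encoding, standalone))
    parser.Parse(text, True)
    return result[0] if result else None

print(read_xml_declaration('<?xml version="1.0" encoding="UTF-8" standalone="yes"?><doc/>'))
# ('1.0', 'UTF-8', 1)
print(read_xml_declaration('<?xml version="1.0"?><doc/>')[2])   # -1: standalone omitted
print(read_xml_declaration('<doc/>'))                           # None: no declaration
```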

Structure: XML Prolog
The XML prolog is everything that precedes the root element. It can consist of two parts: the XML declaration we have just discussed, and a DOCTYPE declaration (comments and processing instructions may also appear there). The DOCTYPE declaration is used to associate an internal set of declarations with the document, or to link the XML document to an external DTD file for validation. The XML prolog is not required to work with well-formed XML; however, to work with XML that is valid against a DTD you will need the DOCTYPE declaration.
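The DOCTYPE declaration in the prolog can also be read with expat's StartDoctypeDeclHandler, which receives the root element name, the system and public identifiers, and whether an internal subset is present. This sketch borrows the intro.xml document used later in this chapter; expat does not fetch the external intro.dtd here, so no file is needed.

```python
import xml.parsers.expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE myMessage SYSTEM "intro.dtd">
<myMessage><message>Welcome to XML!</message></myMessage>"""

doctype = {}

def on_doctype(name, system_id, public_id, has_internal_subset):
    # Called once, when expat reads the DOCTYPE declaration in the prolog
    doctype.update(name=name, system=system_id, public=public_id,
                   internal=bool(has_internal_subset))

parser = xml.parsers.expat.ParserCreate()
parser.StartDoctypeDeclHandler = on_doctype
parser.Parse(DOC, True)
print(doctype)
# {'name': 'myMessage', 'system': 'intro.dtd', 'public': None, 'internal': False}
```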

EXAMPLE (I)
In this example, we're going to create a simple XML document for a technical journal. This journal XML document will contain elements that describe the cover, a table of contents, the journal articles themselves, and an index. First, we need to start our XML document with the XML declaration and the root element: <?xml version="1.0"?> <journal> </journal> The XML declaration contains the mandatory version attribute, but because we are not going to do anything with special character sets or with validation, we can leave out the encoding and standalone attributes; when they are not specified, the default values are used. Next, we will create the element for the cover of our journal, and call it <cover>. <cover art="photo.jpg"> <slug>Learn the Secrets of XML</slug> </cover> The cover element has an art attribute, which is used to specify the cover art. The cover element has element content; that is, it contains another element, called slug, which holds the text for the slug, or teaser, that will also appear on the cover. The slug element has PCData content, which is just text.

EXAMPLE (II) <contents>
Next, we want to create the element for the table of contents. We'll call this element contents and, like the cover element, it will have element content, in the form of a title element. The title element will contain the title of the article as it would appear in the table of contents. The other piece of information we need in the table of contents is the page number on which the article appears, which we store in a page attribute: <contents> <title page="3">Authoring XML Documents</title> </contents>
For the articles, we're going to use a number of elements to describe the article:
• article: contains the child elements that hold the data for the article and its author.
• headline: the headline of the article.
• byline: the byline for the author of the article.
• body: the text of the article.
The resulting XML code looks like this: <article> <headline>Authoring XML Documents</headline> <byline>Joe Smith</byline> <body>So you want to work with XML...</body> </article>

EXAMPLE (III) <index>
Finally, we want to create an index to track references to technologies within the article. The index element will be used to store each reference that will appear in the index, and it will contain a child element for each reference. That reference element also needs a page number associated with it, so we can once again use a page attribute to track the page number of the reference. The resulting XML code is as follows: <index> <reference page="4">XML Prolog</reference> </index>

EXAMPLE complete listing
<?xml version="1.0"?>
<journal>
  <cover art="photo.jpg">
    <slug>Learn the Secrets of XML</slug>
    <slug>XSLT Transforms the Web</slug>
    <slug>Namespaces and Why You Need Them</slug>
  </cover>
  <contents>
    <title page="3">Authoring XML Documents</title>
    <title page="6">Transforming the Web with XSLT</title>
    <title page="9">What's in a Namespace?</title>
    <title page="12">Graphics and XML with SVG</title>
  </contents>
  <article>
    <headline>Authoring XML Documents</headline>
    <byline>Joe Smith</byline>
    <body>So you want to work with XML...</body>
  </article>
  <article>
    <headline>Transforming the Web with XSLT</headline>
    <byline>Jane Doe</byline>
    <body>XML can easily be turned into HTML...</body>
  </article>
  <article>
    <headline>What's in a Namespace?</headline>
    <byline>Jane Jones</byline>
    <body>When is a name not a name...</body>
  </article>
  <article>
    <headline>Graphics and XML with SVG</headline>
    <byline>Sally Smith</byline>
    <body>Drawing on the Web with SVG...</body>
  </article>
  <index>
    <reference page="4">XML Prolog</reference>
    <reference page="8">apply-templates</reference>
    <reference page="11">xmlns</reference>
    <reference page="15">SVG</reference>
  </index>
</journal>
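Once the journal is well-formed XML, a program can combine its parts. This sketch, using Python's ElementTree on an abridged copy of the listing, pairs each article with the page number given in the page attribute of the matching contents/title. The function table_of_contents is hypothetical, and matching on identical headline text is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the journal listing: two articles are enough here
DOC = """<?xml version="1.0"?>
<journal>
  <cover art="photo.jpg"><slug>Learn the Secrets of XML</slug></cover>
  <contents>
    <title page="3">Authoring XML Documents</title>
    <title page="6">Transforming the Web with XSLT</title>
  </contents>
  <article>
    <headline>Authoring XML Documents</headline>
    <byline>Joe Smith</byline>
    <body>So you want to work with XML...</body>
  </article>
  <article>
    <headline>Transforming the Web with XSLT</headline>
    <byline>Jane Doe</byline>
    <body>XML can easily be turned into HTML...</body>
  </article>
  <index><reference page="4">XML Prolog</reference></index>
</journal>"""

def table_of_contents(journal):
    """Return (page, headline, byline) for every article, in document order."""
    # headline text -> page number, read from the page attribute of each <title>
    pages = {t.text: int(t.get("page")) for t in journal.iterfind("contents/title")}
    return [(pages.get(a.findtext("headline")), a.findtext("headline"), a.findtext("byline"))
            for a in journal.iterfind("article")]

journal = ET.fromstring(DOC)
for page, headline, byline in table_of_contents(journal):
    print(f"p. {page}: {headline} ({byline})")
```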

Syntax Summary (I)
• The XML prolog, e.g. <?xml version="1.0" ?>, should appear at the start of every XML document.
• Every XML document must contain a single top-level element (the root) that contains all the other elements of the document.
• Every element must have a closing tag; empty elements may use the abbreviated form (/>).
• Elements must be properly nested: closing tags must follow the reverse order of the corresponding opening tags.
• XML is case-sensitive.
• Attribute values must always be enclosed in single or double quotes.

Syntax Summary (II)
Violating any one of these rules means that the resulting document is not considered well-formed. Although these rules may seem simple, you need to pay close attention to them if you use a plain text editor. Code such as <articolo titolo=test> ... </Articolo> will cause problems (the attribute value is unquoted and the end tag differs in case), and so will situations like the following, where the elements are not properly nested: <paragrafo> <testo>abcdefghi... </paragrafo> </testo>
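Any XML parser enforces these rules. As a sketch, Python's ElementTree raises ParseError for each of the faulty snippets above (plus a document with two root elements), while the correctly nested version parses. is_well_formed is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)

samples = {
    "unquoted attribute": "<articolo titolo=test></articolo>",
    "case mismatch": '<articolo titolo="test"></Articolo>',
    "bad nesting": "<paragrafo><testo>abcdefghi...</paragrafo></testo>",
    "two root elements": "<a/><b/>",
    "correct": "<paragrafo><testo>abcdefghi...</testo></paragrafo>",
}
for label, text in samples.items():
    ok, error = is_well_formed(text)
    print(f"{label:20} {'well-formed' if ok else 'NOT well-formed: ' + error}")
```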

Syntax Summary (III)
The text enclosed by the root tags may contain an arbitrary number of XML elements. The basic syntax for one element is:
<element_name attribute_name="attribute_value">Element Content</element_name>
Here <element_name attribute_name="attribute_value"> is the start-tag and </element_name> is the end-tag; both carry the same element_name. "Element Content" is text, which may itself contain further XML elements. A generic XML document therefore contains a tree-based data structure.
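The generic element can also be built programmatically. This sketch uses Python's ElementTree to create it and serialize it back, then nests a child (the name "nested" is hypothetical) to show content that contains further elements.

```python
import xml.etree.ElementTree as ET

elem = ET.Element("element_name", {"attribute_name": "attribute_value"})
elem.text = "Element Content"
print(ET.tostring(elem, encoding="unicode"))
# <element_name attribute_name="attribute_value">Element Content</element_name>

# Content may itself contain elements: adding a child turns it into a small tree
child = ET.SubElement(elem, "nested")
child.text = "inner"
print(ET.tostring(elem, encoding="unicode"))
# <element_name attribute_name="attribute_value">Element Content<nested>inner</nested></element_name>
```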

Recipe Data Structure
<recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <title>Basic bread</title>
  <ingredient amount="8" unit="dL">Flour</ingredient>
  <ingredient amount="10" unit="grams">Yeast</ingredient>
  <ingredient amount="4" unit="dL" state="warm">Water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
  <instructions>
    <step>Mix all ingredients together.</step>
    <step>Knead thoroughly.</step>
    <step>Cover with a cloth, and leave for one hour in warm room.</step>
    <step>Knead again.</step>
    <step>Place in a bread baking tin.</step>
    <step>Bake in the oven at 180 °C for 30 minutes.</step>
  </instructions>
</recipe>
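This structure splits data between attributes (quantities, units, optional state) and element text (names, steps). A short sketch with Python's ElementTree, on an abridged copy of the recipe, reads both and uses an attribute predicate to find the ingredient that must be warm. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the recipe (two steps instead of six)
RECIPE = """<recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <title>Basic bread</title>
  <ingredient amount="8" unit="dL">Flour</ingredient>
  <ingredient amount="10" unit="grams">Yeast</ingredient>
  <ingredient amount="4" unit="dL" state="warm">Water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
  <instructions>
    <step>Mix all ingredients together.</step>
    <step>Knead thoroughly.</step>
  </instructions>
</recipe>"""

recipe = ET.fromstring(RECIPE)
# Attributes carry the quantities; element text carries the ingredient names
shopping_list = [(i.text, float(i.get("amount")), i.get("unit"))
                 for i in recipe.iterfind("ingredient")]
warm = [i.text for i in recipe.iterfind("ingredient[@state='warm']")]
steps = [s.text for s in recipe.iterfind("instructions/step")]

print(recipe.findtext("title"), "/", recipe.get("cook_time"))   # Basic bread / 3 hours
for name, amount, unit in shopping_list:
    print(f"{amount:g} {unit} {name}")
print(warm, len(steps))   # ['Water'] 2
```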

Syntax Summary (IV)
The choice of tag names must also follow some rules. A tag name can begin with a letter or an underscore (_) and can contain letters, digits, periods, underscores (_) or hyphens (-). Spaces and other punctuation characters are not allowed (the colon is also permitted, but it is reserved for namespaces, and names beginning with "xml" are reserved). You may also need to insert special characters that could make an XML document not well-formed. For example, if we need to insert text containing the < symbol, we run the risk that it will be interpreted as the start of a new tag, as in the following example: <testo> the symbol < means less than </testo>
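The naming rule as stated on this slide can be expressed as a regular expression. This is a simplified, ASCII-only sketch: the full XML 1.0 name production also allows ":" and many non-ASCII letters, which this hypothetical is_simple_xml_name helper deliberately rejects.

```python
import re

# First character: letter or underscore; then letters, digits, '.', '_' or '-'
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def is_simple_xml_name(name):
    return NAME_RE.fullmatch(name) is not None

for n in ["paragrafo", "_id", "first-name", "v1.2", "2nd", "my name", "-x"]:
    print(f"{n!r:14} {is_simple_xml_name(n)}")
```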

Entity references
In markup languages, a character entity reference is a reference to a particular kind of named entity that has been predefined or explicitly declared in a Document Type Definition (DTD). The replacement text of the entity consists of a single character from the Universal Character Set (Unicode). The purpose of a character entity reference is to provide a way to refer to a character that is not universally encodable. XML actually has two related concepts: a "predefined entity reference" is a reference to one of the special characters, written &lt; (<), &gt; (>), &amp; (&), &quot; (") or &apos; ('); a "character reference" (or "numeric character reference") is a construct such as &#38; or &#x26; that refers to a character by means of its numeric Unicode code point.

Entity Reference Examples (I)
<testo> the symbol &lt; means less than </testo>
Here is an example using a predefined XML entity to represent the ampersand in the name "AT&T": <company_name>AT&amp;T</company_name>
An example of a numeric character reference is "&#x20AC;", which refers to the Euro symbol (€) by means of its Unicode code point in hexadecimal.
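In practice, programs rarely write these references by hand. This sketch uses xml.sax.saxutils.escape from Python's standard library to replace &, < and > with predefined entity references, shows that the parser turns them back into characters, and shows that a bare "&" is rejected.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = "AT&T: 3 < 5"
escaped = escape(raw)              # & -> &amp;, < -> &lt;, > -> &gt;
print(escaped)                     # AT&amp;T: 3 &lt; 5

doc = f"<company_name>{escaped}</company_name>"
print(ET.fromstring(doc).text)     # AT&T: 3 < 5 (references resolved by the parser)

# Without escaping, the bare ampersand makes the document not well-formed
try:
    ET.fromstring("<company_name>AT&T</company_name>")
    raw_ampersand_ok = True
except ET.ParseError:
    raw_ampersand_ok = False
print(raw_ampersand_ok)            # False
```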

Entity references: DTD declaration
Additional entities (beyond the predefined ones) can be declared in the document's Document Type Definition (DTD). Declared entities can describe single characters or pieces of text, and they can reference each other.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE example [
  <!ENTITY copy "&#xA9;">
  <!ENTITY copyright-notice "Copyright &copy; 2006, XYZ Enterprises">
]>
<example> &copyright-notice; </example>
When viewed in a suitable browser, the XML document above appears as: Copyright © 2006, XYZ Enterprises
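Even a non-validating parser expands entities declared in the internal DTD subset. This sketch feeds the example (with the encoding pseudo-attribute dropped, since the input is a Python string) to Python's ElementTree, which relies on expat to replace &copyright-notice; and the nested &copy; reference.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE example [
  <!ENTITY copy "&#xA9;">
  <!ENTITY copyright-notice "Copyright &copy; 2006, XYZ Enterprises">
]>
<example> &copyright-notice; </example>"""

root = ET.fromstring(DOC)
print(root.text.strip())   # Copyright © 2006, XYZ Enterprises
```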

Numeric character references
Numeric character references look like entity references, but instead of a name they contain the "#" character followed by a number. The number (in decimal, or in hexadecimal with an "x" prefix) represents a Unicode code point. They have typically been used to represent characters that are not easily encodable, such as an Arabic character in a document produced on a European computer. The ampersand in the "AT&T" example could also be escaped like this (decimal 38 and hexadecimal 26 both represent the Unicode code point for the "&" character):
<company_name>AT&#38;T</company_name>
<company_name>AT&#x26;T</company_name>
Similarly, in the previous example, notice that "&#xA9;" is used to generate the "©" symbol.

CDATA SECTION
In some situations there may be many characters to replace with entities, which risks making the text unreadable for a human. Consider the case where a block of text shows XML code itself: <codice> <libro> <capitolo> </capitolo> </libro> </codice> In this case, instead of replacing every occurrence of the special symbols with the corresponding entities, you can use a CDATA section.

Character Data
A CDATA (Character DATA) section is a block of text that is always treated as text, even if it contains XML code or other special characters. To mark a CDATA section, simply enclose it between the character sequences <![CDATA[ and ]]>. Our example becomes: <codice> <![CDATA[ <libro> <capitolo> </capitolo> </libro> ]]> </codice>
The only sequence that cannot appear inside a CDATA section is its own terminator, ]]>, so CDATA sections cannot be nested.
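The effect of a CDATA section is visible after parsing: the markup inside it arrives as plain text, not as child elements. This sketch uses Python's ElementTree, which also shows that the CDATA wrapper itself is not preserved; on output the text is simply escaped with entities.

```python
import xml.etree.ElementTree as ET

doc = "<codice><![CDATA[<libro><capitolo></capitolo></libro>]]></codice>"
codice = ET.fromstring(doc)
print(codice.text)   # <libro><capitolo></capitolo></libro>
print(len(codice))   # 0: the markup inside CDATA did not become child elements

# ElementTree does not keep the CDATA wrapper; it escapes the text instead
print(ET.tostring(codice, encoding="unicode"))
# <codice>&lt;libro&gt;&lt;capitolo&gt;&lt;/capitolo&gt;&lt;/libro&gt;</codice>
```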

Comments
Comments can be placed anywhere in the document outside other markup (for example, not inside a tag), including within the text of an element whose content is text or #PCDATA. XML comments start with <!-- and end with -->. Two consecutive dashes (--) may not appear anywhere in the text of the comment.
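Parsers usually discard comments, but they still enforce the double-dash rule. This sketch uses Python's ElementTree with TreeBuilder(insert_comments=True) (available since Python 3.8), which keeps comments that appear inside the root element, and then shows that "--" inside a comment is rejected.

```python
import xml.etree.ElementTree as ET

ok = "<a><!-- a note for human readers --><b/></a>"
bad = "<a><!-- a note -- with a double dash --><b/></a>"

parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(ok, parser=parser)
print([child.tag is ET.Comment for child in root])   # [True, False]
print(repr(root[0].text))                             # ' a note for human readers '

try:
    ET.fromstring(bad)
    bad_ok = True
except ET.ParseError:
    bad_ok = False
print(bad_ok)   # False: "--" is not allowed inside a comment
```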

Processing Instruction (PI)
XML provides the processing instruction as an alternative means of passing information to particular applications that may read the document. A processing instruction begins with <? and ends with ?>. Immediately following the <? is an XML name called the target, which may be the name of the application for which the processing instruction is intended or simply an identifier for this particular processing instruction. The rest of the processing instruction contains text in a format appropriate for the applications for which the instruction is intended.

PI EXAMPLE
The most common processing instruction, xml-stylesheet, is used to attach stylesheets to documents. It always appears before the root element. In this example, the xml-stylesheet processing instruction tells browsers to apply the CSS stylesheet person.css to this document before showing it to the reader.
<?xml-stylesheet href="person.css" type="text/css"?>
<person>
  Alan Turing
</person>
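A parser hands a PI to the application as a target plus an uninterpreted data string. This sketch uses expat's ProcessingInstructionHandler from Python's standard library; the XML declaration is not reported as a PI. Splitting the data into pseudo-attributes with a regular expression is an assumption of the sketch, since the parser does not do it.

```python
import re
import xml.parsers.expat

DOC = """<?xml version="1.0"?>
<?xml-stylesheet href="person.css" type="text/css"?>
<person>Alan Turing</person>"""

pis = []
parser = xml.parsers.expat.ParserCreate()
# Called with the PI target and the rest of the instruction as one string
parser.ProcessingInstructionHandler = lambda target, data: pis.append((target, data))
parser.Parse(DOC, True)

target, data = pis[0]
# xml-stylesheet data looks like attributes ("pseudo-attributes")
pseudo = dict(re.findall(r'(\w+)="([^"]*)"', data))
print(target, pseudo)   # xml-stylesheet {'href': 'person.css', 'type': 'text/css'}
```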

SCHEMAS
Document schema: you do not write documents "in XML" as such.
• You use XML to create specific custom markup languages (XML applications).
• You write documents in those languages.
• A specific language is defined by specifying which elements and attributes are allowed or required in a conforming document.
Document schema: a set of rules.
Example: a system for cataloguing endangered species, EndML, with the elements Animale (animal), Sottospecie (subspecies), Popolazione (population) and minacce (threats).

DTD & XML Schema
Both define rules for producing structured documents.
A DTD (Document Type Definition) contains the definitions of element types, attributes, entities and notations. A DTD declares which elements, types, entities and notations are legal, and in which part of the document they are legal.
XML Schema: the successor of DTDs. Being XML-based, it provides a more powerful alternative to DTDs.

Document Type Definition DTD
The oldest schema format for XML, inherited from SGML. It has no support for newer features of XML, most importantly namespaces. It lacks expressiveness: certain formal aspects of an XML document cannot be captured in a DTD (for example, it cannot require that an element's text be an integer or a date). It uses a custom, non-XML syntax to describe the schema. DTDs are still used in many applications because they are considered the easiest schema format to read and write.

XML Schema
The XML schema language described by the W3C as the successor of DTDs. Its schema documents are known by the initialism XSD (XML Schema Definition). XSDs are far more powerful than DTDs in describing XML languages: they use a rich datatyping system, allow more detailed constraints on an XML document's logical structure, and require a more robust validation framework. XSDs also use an XML-based format, so ordinary XML tools can help process them, although XSD implementations require much more than just the ability to read XML.

Well-formed and valid XML documents: two levels of correctness
Well-formed. A well-formed document conforms to all of XML's syntax rules. For example, if a start-tag appears without a corresponding end-tag, the document is not well-formed. A document that is not well-formed is not considered to be XML; a conforming parser must report a fatal error and stop normal processing.
Valid. A valid document is well-formed and additionally conforms to semantic rules, which are either user-defined or expressed in a schema, such as a DTD or an XML Schema. For example, if a document contains an element that its DTD does not declare, it is not valid; a validating parser reports the validity error. Unlike well-formedness errors, validity errors are not fatal by default, so it is up to the application whether to reject the document.

Validating and non-validating parsers
The heart of an XML application is the parser, the module that reads the XML document and builds an internal representation of it for further processing (such as display). A validating parser, when a DTD is present, can verify that the document is valid or report the markup errors it contains. A non-validating parser, on the other hand, can only verify that the document is syntactically well-formed, even when a DTD is present. A non-validating parser is much simpler and faster to write, but it performs fewer checks. In some applications, however, there is no need to validate documents, only to check that they are well-formed.
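Python's built-in xml.etree.ElementTree (backed by expat) is an example of a non-validating parser. In this sketch the internal DTD declares <a> as EMPTY and never declares <b>, so the document is well-formed but not valid, yet it parses without any error.

```python
import xml.etree.ElementTree as ET

# Well-formed but NOT valid: the DTD says <a> is EMPTY, and <b> is undeclared
DOC = """<!DOCTYPE a [
  <!ELEMENT a EMPTY>
]>
<a><b/></a>"""

root = ET.fromstring(DOC)   # no error: only well-formedness is checked
print(root.tag, [c.tag for c in root])   # a ['b']
```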

Markup Language Validation
XML document + XML schema linked to the document, checked by a validating parser: the document is valid if it conforms to all the rules.
XML document + XML schema linked to the document, checked by a non-validating parser: the document is well-formed if it is syntactically correct.

Well-formed verification: book markup
<?xml version = "1.0"?>
<book isbn = " X">
  <title> XML Primer</title>
  <author>
    <firstName>Paul</firstName>
    <lastName>Deitel</lastName>
  </author>
  <chapters>
    <preface num = "1" pages = "2">Welcome</preface>
    <chapter num = "1" pages = "4">Easy XML</chapter>
    <chapter num = "2" pages = "2">XML Elements</chapter>
    <appendix num = "1" pages = "9">Entities</appendix>
  </chapters>
  <media type = "CD"/>
</book>

Well-formed verification: book markup (II)
<?xml version = "1.0"?>
<book isbn = " X">
  <title> XML Primer</title>
  <author>
    <firstName>Paul</firstName>
    <lastName>Deitel</lastName>
  </author>
  <chapters>
    <preface num = "1" pages = "2">Welcome</preface>
    <chapter num = "1" pages = "4">Easy XML</chapter>
    <chapter num = "2" pages = "2">XML Elements</chapter>
    <appendix num = "1" pages = "9">Entities</appendix>
  <media type = "CD"/>
</book>
In this version the closing </chapters> tag is missing, so the document is not well-formed and a parser reports an error.
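Parsers also report where they noticed the problem. In this sketch, Python's ElementTree parses a shortened copy of version (II); the ParseError carries a (line, column) position, which points at </book>, the first place where the missing </chapters> becomes detectable. find_error is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Shortened version (II): the closing </chapters> tag is missing
BROKEN = """<book isbn="X">
  <title>XML Primer</title>
  <chapters>
    <chapter num="1" pages="4">Easy XML</chapter>
  <media type="CD"/>
</book>"""
FIXED = BROKEN.replace("  <media", "  </chapters>\n  <media")

def find_error(text):
    """Return the ParseError for text, or None if it is well-formed."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        return err

err = find_error(BROKEN)
line, column = err.position
print(f"not well-formed at line {line}, column {column}: {err}")
print(find_error(FIXED))   # None
```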

Book markup with output produced by a stylesheet
Applying the stylesheet usage.xsl to usage.xml produces the formatted output.
Processing instruction (PI): <?xml-stylesheet type="text/xsl" href="usage.xsl"?>
• <? and ?> delimit the PI.
• Target: xml-stylesheet
• Value: type="text/xsl" href="usage.xsl"

Validation walkthrough
XML document intro.xml:
<?xml version = "1.0"?>
<!DOCTYPE myMessage SYSTEM "intro.dtd">
<myMessage>
  <message>Welcome to XML!</message>
</myMessage>
• The first two lines are the document prolog.
• <!DOCTYPE myMessage ...> is the document type declaration; myMessage is the type name (the name of the root element).
• SYSTEM means that the declarations are external to the document and are found at the URI intro.dtd.
DTD document intro.dtd:
<!ELEMENT myMessage ( message )>
declares the element myMessage as the root, with a single child named message.
<!ELEMENT message ( #PCDATA )>
declares that the element message must contain character data parsed by the XML parser.


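<?xml version = "1.0"?>
<!DOCTYPE myMessage SYSTEM "intro.dtd">
<myMessage>
</myMessage>
This document is well-formed but not valid: intro.dtd requires myMessage to contain exactly one message element.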
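Real DTD validation needs a validating parser (for example, the third-party lxml library provides lxml.etree.DTD); Python's standard ElementTree does not validate. To make the two intro.dtd rules concrete, here is a hand-written check of just those rules. check_intro_dtd is a hypothetical helper and is NOT a general DTD validator.

```python
import xml.etree.ElementTree as ET

def check_intro_dtd(root):
    """Check only the two rules of intro.dtd and return a list of errors:
      <!ELEMENT myMessage ( message )>  root myMessage, exactly one child message
      <!ELEMENT message ( #PCDATA )>    message contains text only
    """
    errors = []
    if root.tag != "myMessage":
        errors.append(f"root element must be myMessage, not {root.tag}")
    child_tags = [c.tag for c in root]
    if child_tags != ["message"]:
        errors.append(f"myMessage must contain exactly one message, found {child_tags}")
    for msg in root.iter("message"):
        if len(msg):
            errors.append("message may contain only character data")
    return errors

valid = ET.fromstring("<myMessage><message>Welcome to XML!</message></myMessage>")
invalid = ET.fromstring("<myMessage>\n</myMessage>")
print(check_intro_dtd(valid))     # []
print(check_intro_dtd(invalid))   # ['myMessage must contain exactly one message, found []']
```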

Additional Resources (I)
XML 1.0 Recommendation: The XML 1.0 Recommendation from the W3C (cited here in its Second Edition; the current edition is the Fifth, published in 2008) is the final word on XML. If you have a question about a technical aspect of XML, this should be the first source you consult.
Annotated XML Recommendation: The Annotated XML Recommendation is an excellent resource for making sense of the sometimes difficult-to-read XML Recommendation. Written by Tim Bray (one of the XML 1.0 editors), it clarifies confusing areas of the Recommendation and offers some historical tidbits as well.

Additional Resources (II)
XML-DEV: The XML-DEV mailing list is a good resource for developers actively working with XML. Discussion ranges from debates about the Recommendation to practical tips. To subscribe, send an email with "subscribe" in the body of the message to the list's subscription address.
comp.text.xml: The comp.text.xml USENET newsgroup can also be a great resource for interacting with other XML developers.
The XML FAQ: The XML Frequently Asked Questions address issues such as why XML is structured the way it is, and when it might be appropriate to use XML as a solution.

Additional Resources (III)
XML.com: XML.com is a commercial Web site dedicated to tracking and reporting on XML and XML-related issues. The site covers not only XML 1.0 but also related activities, and it can be a great source of tutorials, articles and general XML information.
XML.org: XML.org is another commercial site, billing itself as the industry portal for XML. The site features the XML Cover Pages, Robin Cover's news column tracking developments in SGML/XML.

PARSER
• AltovaXML: free parser from Altova, also included in XMLSpy, MapForce and StyleVision.
• RomXML: embedded XML commercial toolkit written in ANSI C.
• XDOM: open-source XML parser (and DOM and XPath implementation) in Delphi/Kylix.
• XML resources at the Open Directory Project.
• TinyXml: simple and small C++ XML parser.
• FoX: fully validating XML parser library, written in Fortran.
• Intel_XSS: XML parsing, validation, XPath, XSLT.
• sw8t.xml: lightweight, high-performance, intuitive JavaScript XML parser; includes API docs and a developer's guide.
