newtFire {dh}
Creative Commons License Last modified: Friday, 15-Nov-2024 10:55:43 UTC. Maintained by: Elisa E. Beshero-Bondar (eeb4 at psu.edu). Powered by firebellies.


Options for this assignment

For this XSLT homework assignment, please choose one of the following options:

  1. Spiderman CBML option: Work on this if your team project involves namespaced TEI/CBML code.
  2. Behrend Travel Letters option: Work on this if your project involves your own (non-namespaced) XML code.

Processing a collection of Spiderman CBML Files

Accessing the Input collection

Download the zipped directory for the Spiderman CBML collection, prepared by students in the Fall 2024 Text Encoding class.

Find our class updates on these files in textEncoding-Hub

To view these files with edits / updates from class, find them on the textEncoding-Hub:

Document analysis: Plotting what we're processing

Take some time to study the CBML code for this project. For this assignment, you will be preparing a reading view of the CBML, as well as a table of contents listing each page and the characters found on it. Familiarize yourself with how these are encoded in the documents.

For the XSLT: notice that we have provided you with two different starter files. You should probably start with the simpler of the two, and then come back later to try out the other one. Just keep these files in the same location with respect to the input XML and web-output folders:

Now, go on to read Working with a Collection of Files in XSLT.

Processing a collection of XML letters

Accessing the Input collection

Download the zipped directory for the Behrend Travel Letters collection, prepared by students in the Fall 2024 Text Encoding class.

Find our class updates on these files in textEncoding-Hub

To view these files with edits / updates from class, find them on the textEncoding-Hub:

Document analysis: Plotting what we're processing

First take some time to study the XML code for this project. For this assignment, you will be preparing a reading view of the letters, as well as a table of contents listing each letter and the persons and locations found in it. Familiarize yourself with how these are encoded in the documents.

For the XSLT: notice that we have provided you with a starter file that we encourage you to use and develop. Just keep it in the same location with respect to the input XML and web-output folders: collectionToHTML-Starter.xsl.

Now, go on to read Working with a Collection of Files in XSLT.

Working with a Collection of Files in XSLT

We can process a whole directory of files using the collection() function in XSLT, so we can represent content from a whole collection of XML files in one or more output HTML files. One useful application for working with a collection is to process several short XML files and unify them on a single HTML page designed to merge their content. For this assignment, we will transform the small collection of XML files so that they output on one HTML page, which we will produce with a table of contents, followed by the full representations of each letter in the collection.

Since these documents are all encoded with the same structural elements, we use the collection() function to reach into them as a group, and output their content one by one based on their XML hierarchy. We are actually treating the collection itself as part of the hierarchy as we write our XSLT, so we move from the directory down into the document node of each file to do our XSLT processing.

We access the collection in the XSLT file using an xsl:variable, which looks like this (for the Spiderman collection):

<xsl:variable name="comicColl" as="document-node()+"
                   select="collection('cbml-spidey/?select=*.xml')"/>

The variable has a name (stored in the @name attribute), and the collection() value is stored in the @select attribute.

Using Modal XSLT

Besides working with a collection of files and an xsl:variable, the other interesting new application in this assignment is modal XSLT, which lets you process the same nodes in your document in two different ways. How can you output the same element contents to sit in a table of contents at the top of an HTML page, and also in another section of your document, below the table of contents? Wouldn’t it be handy to be able to have two completely different template rules that match exactly the same elements: one rule to output an element node selectively to preview in a table of contents, and the other to output the same node more fully in section or div elements? You can write two template rules that will match the same nodes (have the same value for their @match attribute), but how do you make sure that the correct template rule is handling the data in the correct place?

To permit us to write multiple template rules that process the same input nodes in different ways for different purposes, we write modal XSLT, and that is what you will be learning to write with this assignment. Modal XSLT allows you to output the same parts of the input XML document in multiple locations and treat them differently each time. That is, it lets you have two different template rules for processing the same elements or other nodes in different ways, and you use the @mode attribute to control how the elements are processed at a particular place in the transformation. Take a look at the overview and examples on Obdurodon’s tutorial on Modal XSLT before proceeding with the assignment, so you can see where and how to set the @mode attribute and how it works to control processing.
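As a minimal sketch of the idea, here is a tiny stylesheet with two template rules that match the same element, one in a mode and one without. The input it assumes (a chapters document with chapter, title, and p elements) is hypothetical and is not one of the assignment collections, and the mode name toc is only a label we chose.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">

    <xsl:output method="html" html-version="5" indent="yes"/>

    <!-- ASSUMPTION: hypothetical input shaped like
         <chapters><chapter><title>A</title><p>Text</p></chapter></chapters> -->
    <xsl:template match="/">
        <html>
            <body>
                <!-- First pass over the chapters: only rules with mode="toc" respond -->
                <ul>
                    <xsl:apply-templates select="//chapter" mode="toc"/>
                </ul>
                <!-- Second pass over the same chapters: only rules with no mode respond -->
                <xsl:apply-templates select="//chapter"/>
            </body>
        </html>
    </xsl:template>

    <!-- Table-of-contents treatment: just the title, as a list item -->
    <xsl:template match="chapter" mode="toc">
        <li><xsl:value-of select="title"/></li>
    </xsl:template>

    <!-- Full treatment of the same element: a section with heading and paragraphs -->
    <xsl:template match="chapter">
        <section>
            <h2><xsl:apply-templates select="title"/></h2>
            <xsl:apply-templates select="p"/>
        </section>
    </xsl:template>

    <xsl:template match="p">
        <p><xsl:apply-templates/></p>
    </xsl:template>
</xsl:stylesheet>
```

Both chapter rules have the same @match value; the @mode on the xsl:apply-templates call decides which one fires at each place in the output.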

Overview of the assignment

For this assignment you want to present your whole input collection of XML documents on one HTML page. That page needs to have a table of contents at the top. The table of contents will process an element to retrieve its heading (for the letters) or an attribute value providing a page number (for the comic book files).

For the reading view, you will be able to output the same nodes more fully, and you should be attentive to the document structure. CBML panels have balloons inside and you will need to make styling decisions to block and format the various kinds of markup you find to make your reading view.
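For the comic book option, a reading view might start from template rules like the ones sketched below. This is only a sketch under assumptions: we assume the files use the TEI namespace for most elements and bind the cbml: prefix to http://www.cbml.org/ns/1.0, and that balloons carry a @type attribute. Check the root element of your own files and adjust the namespaces, element names, and class values to match.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    xmlns:cbml="http://www.cbml.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="cbml">

    <!-- ASSUMPTION: the namespace URIs above match the declarations in your CBML files. -->

    <!-- Each panel becomes a block that CSS can outline or space apart -->
    <xsl:template match="cbml:panel">
        <div class="panel">
            <xsl:apply-templates/>
        </div>
    </xsl:template>

    <!-- Each balloon becomes a paragraph; the balloon's @type (if present) is
         copied into the class so different kinds can be styled differently -->
    <xsl:template match="cbml:balloon">
        <p class="balloon {@type}">
            <xsl:apply-templates/>
        </p>
    </xsl:template>
</xsl:stylesheet>
```

The xpath-default-namespace attribute matters here: without it, unprefixed element names in @match and @select would look for elements in no namespace and find nothing in namespaced TEI.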

Sample starter outputs

Here are starter views of some output HTML to give you an idea of the structure we're creating. But your files do not need to look like ours. Just concentrate on outputting the full text and the table of contents at the top.

Housekeeping with the stylesheet template and output line: From XML to XHTML

To ensure that the output is in the XHTML namespace, we add a default namespace declaration (the xmlns="http://www.w3.org/1999/xhtml" attribute below). To output the required DOCTYPE declaration, we also created an <xsl:output> element as the first child of our root <xsl:stylesheet> element, and we included an attribute there to omit the default XML declaration: if we output that XML line at the top of our XHTML, the result will not validate as HTML with the W3C validator and might produce quirky rendering problems in various web browsers. So our modified stylesheet template and xsl:output line look like this, and you should locate them in the starter XSLT we provided (or otherwise copy them into a new XSLT file for working with a collection):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    xmlns="http://www.w3.org/1999/xhtml">

    <xsl:output method="xhtml" html-version="5" omit-xml-declaration="yes"
        include-content-type="no" indent="yes"/>

</xsl:stylesheet>

How to begin

First of all, file locations will be important in this assignment. You may work with our starter XSLT file, or create a fresh copy, but be sure you read and follow all the comments in my starter file! The XSLT you're running should be sitting outside the XML input directory and the web-out directory. NOTE: If this were for a real project, you would save this output in the docs/ directory of your GitHub repo to publish on GitHub Pages.

Forget about the table of contents for the moment and concentrate now on just outputting the full text of the documents. Except for having to pull the contents from a collection of files, this is just like the XML-to-HTML transformations you have already written, and you’ll use regular template rules (without a @mode attribute) to perform the transformation.

The collection() function: Here is how we write and run XSLT to process a collection of files. Just ahead of the first template rule, after the <xsl:output> element, we define a variable in XSLT, which simply sets up a convenient shorthand for something complicated that we need to use more than once, so we don’t have to keep retyping it.

Here is what the xsl:output line and the global xsl:variable right after it look like (for the letters processing). Setting the variable there, at the top level, allows us to access it in any template rule where we need it. We don't need to reference that variable in many places, but we definitely need it in the template rule matching the document node, which controls what is being processed from the input!

<xsl:output method="xhtml" html-version="5" omit-xml-declaration="yes" include-content-type="no" indent="yes"/>
<xsl:variable name="travelColl" as="document-node()+" select="collection('xml-letters/?select=*.xml')"/>

An xsl:variable works by designating a @name, which holds any name you like to refer to it later (we have used "travelColl" here to refer to the 2021 Behrend Travel Letters collection of files), and a @select, which holds anything you wish: a complicated XPath expression or a function, or whatever it is that is easier to store or process in a variable rather than typing it out multiple times. We use variables to help keep our code easy to read!

In this case, we are using a variable to define our collection, using the collection() function in the @select attribute. The collection() function is set to designate the directory location of the collection of letters in relation to the stylesheet I am currently writing. My XSLT is saved in the directory immediately above the xml-letters/ directory, so I am simply instructing the XSLT processor to take a directory-path step down to that directory. Definitely keep the ?select=*.xml because it helps make sure that only XML files are included in the collection, screening out the Relax NG file and hidden files that Mac or Windows operating systems sometimes add to file directories.
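If you are unsure whether the path in collection() is finding your files, one way to check is to print a message while the transformation runs. The sketch below assumes the directory is named xml-letters/ as in the paragraph above; substitute whatever path your own collection() call uses.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    xmlns="http://www.w3.org/1999/xhtml">

    <xsl:output method="xhtml" html-version="5" omit-xml-declaration="yes"
        include-content-type="no" indent="yes"/>

    <!-- ASSUMPTION: the input directory sits directly below this stylesheet
         and is named xml-letters/ -->
    <xsl:variable name="travelColl" as="document-node()+"
        select="collection('xml-letters/?select=*.xml')"/>

    <xsl:template match="/">
        <!-- Diagnostic messages: how many documents were found, and where they live -->
        <xsl:message select="'Files in collection: ' || count($travelColl)"/>
        <xsl:message select="$travelColl ! base-uri()"/>
        <html>
            <head>
                <title>Collection check</title>
            </head>
            <body>
                <p>Check the transformation messages for the file list.</p>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>
```

Because the variable is typed as document-node()+ (one or more documents), a path that matches no files should raise a type error rather than quietly producing an empty page, which is a helpful early warning that the directory name or location is wrong.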

We will call this variable later in the XSLT file whenever we need it, to show how we are stepping into our collection of documents. That will happen in the first template rule, the one that matches on the document node. Open any one of the input XML files in the XML collection in <oXygen/> and you will see that the transcription contents are all coded within the <letter> element, so we can write this stylesheet to look through the whole collection of files and process only the elements below <letter>. You call or invoke the variable for the collection by signalling it first with a dollar sign $, giving the variable name, and then simply stepping down the descendant axis straight to the <letter> element in each file. Here is how the code looks to call or invoke the variable in our first template rule:

<xsl:apply-templates select="$travelColl//letter"/>

Note on running the transformation: Unlike other transformations we do on single XML files, when we run this XSLT in <oXygen/> it actually doesn’t matter what file we have selected in the XML input, because we have indicated in the stylesheet itself what we are processing, with the collection() function. We can even set a file that is outside of our collection as the input XML file (and we actually ran it successfully with the HTML file of the previous exercise selected). You do need to enter something in the input window, but when you work with the collection() function, your input file is just a dummy or placeholder that <oXygen/> needs to have entered so it can run your XSLT transformation.

In our HTML output (scroll down past the table of contents, to where the full text of the letters is rendered), the parts of each letter are held and spaced apart using HTML <p> elements. Here’s a sample of HTML output for one of our letter documents:

        
        <!-- I output the xml:id from the letter element as an id on an HTML div element to organize my letters on the page and prepare them for linking. -->
        <div class="letter" id="Greenwich-1955-07-18">
            <p><span class="placeName">Departure. On Queen Elizabeth.</span> At Noon, <span class="date">July 18, 1955.</span></p>

            <p>   We got on board about 10
               <span class="persName">Harriet</span> had stayed in <span class="placeName">Greenwich</span> over night so went in town with us. . . 
            </p>
        </div>
      
        

The fine print: Don’t worry if your HTML output isn’t structured exactly the same way ours is. But you should open your HTML output in <oXygen/> and simply check to make sure that what you’re producing is valid HTML and renders the text appropriately.

Remember to output span elements for interesting markup in the texts that you can style (later) with CSS.
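A sketch of template rules that could produce output like the sample above follows. The element names persName, placeName, date, and p are assumptions inferred from the class values in the sample HTML, so check them against the actual letter markup before relying on them.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    xmlns="http://www.w3.org/1999/xhtml">

    <!-- Each letter becomes a div; its xml:id is copied into the HTML id
         so that other parts of the page can link to it -->
    <xsl:template match="letter">
        <div class="letter" id="{@xml:id}">
            <xsl:apply-templates/>
        </div>
    </xsl:template>

    <!-- ASSUMPTION: paragraphs in the letters are encoded as <p> -->
    <xsl:template match="p">
        <p><xsl:apply-templates/></p>
    </xsl:template>

    <!-- ASSUMPTION: these three element names exist in the source XML.
         Each becomes a span whose class is the source element's name. -->
    <xsl:template match="persName | placeName | date">
        <span class="{name()}"><xsl:apply-templates/></span>
    </xsl:template>
</xsl:stylesheet>
```

Reusing the source element name as the class value keeps the rule short and gives CSS a predictable hook for each kind of markup.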

Once your documents are all being formatted correctly in HTML, you can add the functionality to create the table of contents at the top, using modal XSLT.

Adding the table of contents

For this portion we are outputting an HTML table to show a little preview of information from each file in the collection. It may help to orient yourself to HTML table coding. HTML tables are organized in rows, using <tr> elements, which contain <td> elements (which means table data). You control the columns in an HTML table by setting the <td> cells in an ordered sequence. Inside a <tr>, the first <td> is set in column 1, the second <td> in column 2, the third in column 3, and so on. The top row conventionally contains headings in <th> cells, which HTML will emphasize by default. Here is a simple HTML table (styled following our newtfire CSS, in which I’ve outlined the borders and given a background color to the table heading cells). Below it is a view of the HTML code that creates the table structure:

Heading 1     | Heading 2     | Heading 3
Row 1, cell 1 | Row 1, cell 2 | Row 1, cell 3
Row 2, cell 1 | Row 2, cell 2 | Row 2, cell 3

        <table>
           <tr>
              <th>Heading 1</th>
              <th>Heading 2</th>
              <th>Heading 3</th>
           </tr>
           <tr>
              <td>Row 1, cell 1</td>
              <td>Row 1, cell 2</td>
              <td>Row 1, cell 3</td>
           </tr>
           <tr>
              <td>Row 2, cell 1</td>
              <td>Row 2, cell 2</td>
              <td>Row 2, cell 3</td>
           </tr>
        </table>

The template rule for the document node in our solution, revised to output a table of contents with all the information we wish to show before the text of the letters, looks like this:

    <xsl:variable name="travelColl" as="document-node()+"
        select="collection('xml-letters/?select=*.xml')"/>

    <xsl:template match="/">
        <html>
            <head>
                <title>Behrend Travel Letters</title>
                <meta name="viewport" content="width=device-width, initial-scale=1.0" />
                <!--ebb: The line above helps your HTML scale to fit lots of different devices. -->
                <link rel="stylesheet" type="text/css" href="webstyle.css"/>
                
            </head>
            <body>
                <h1>The Behrends' Travel Adventures in Europe</h1>
                <section id="toc">
                    <h2>Contents</h2>
      
                <!-- ebb: Let's set up the HTML table here. -->
                    <table> 
                        <tr>
                            <th>Letter Date and opening</th><!--first column table heading-->
                            <th>Places Mentioned</th><!--second column table heading-->
                            <th>People Mentioned</th><!--third column table heading-->
                            
                        </tr>
                        
                        <!--ebb: Here we use our $travelColl variable pointing into the collection AND use modal XSLT to set the toc mode for the Table of Contents: -->
                      <xsl:apply-templates select="$travelColl//letter" mode="toc"/>
                       
                        
                    </table>
                </section>

                <section id="fulltext">
                    <xsl:apply-templates select="$travelColl//letter"/>
      
                </section>
            </body>
        </html>
    </xsl:template>  
        

The <section id="toc"> block is what we added to include a table of contents, and the important line is <xsl:apply-templates select="$travelColl//letter" mode="toc"/>. This is going to apply templates to the <letter> element of each document with the @mode attribute value set to toc. The value of the @mode attribute is up to you (we used toc for table of contents), but whatever you call it, setting the @mode to any value means that only template rules that also specify a @mode with that same value will fire in response to this <xsl:apply-templates> element. Now we have to go write those template rules!

What this means is that when you write template rules to process the <letter> elements to output the full text of the documents, you use <xsl:apply-templates> and <xsl:template> elements without any @mode attribute. To create the table of contents, though, you can have <xsl:apply-templates> and <xsl:template> elements that select or match the same elements, but that specify a mode and apply completely different rules. A template rule for <letter> elements in table-of-contents mode will start with <xsl:template match="letter" mode="toc">, and you need to tell it to create a <tr> element holding a <td> element that contains the text of the <date> element. From inside that rule, you can apply templates again with mode="toc" if you want to process descendants in the same mode. The rule for those same elements not in any mode will start with <xsl:template match="letter"> (without the @mode attribute). That rule can create a <div> element for each letter, and then output the full text of the document using <p> elements. In this way, you can have two sets of rules for the letters, one for the table of contents and one to output the full text, and we use modes to ensure that each is used only in the correct place.

Remember: both the <xsl:apply-templates>, which tells the system to process certain nodes, and the <xsl:template> that responds to that call and does the processing must agree on their mode values. For the main output of the full text of every letter, neither the <xsl:apply-templates> nor the <xsl:template> elements specifies a mode. To output the table of contents, both specify the same mode.
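To make the pairing concrete, here is a sketch of the two rules for <letter> side by side. It is one possible shape, not the assignment solution: the element names date, placeName, persName, and p are assumptions based on the sample output, and the link from the table row to the letter relies on the full-text rule writing the same @xml:id value as an HTML id.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    xmlns="http://www.w3.org/1999/xhtml">

    <!-- Table-of-contents mode: one table row per letter.
         Responds only to xsl:apply-templates calls with mode="toc". -->
    <xsl:template match="letter" mode="toc">
        <tr>
            <!-- Column 1: the first date, linked to the letter's div further down -->
            <td>
                <a href="#{@xml:id}">
                    <xsl:value-of select="(descendant::date)[1]"/>
                </a>
            </td>
            <!-- Columns 2 and 3: raw lists, comma-separated (not yet de-duplicated) -->
            <td><xsl:value-of select="descendant::placeName" separator=", "/></td>
            <td><xsl:value-of select="descendant::persName" separator=", "/></td>
        </tr>
    </xsl:template>

    <!-- No mode: the full reading view of the same element.
         Responds only to xsl:apply-templates calls without a mode. -->
    <xsl:template match="letter">
        <div class="letter" id="{@xml:id}">
            <xsl:apply-templates/>
        </div>
    </xsl:template>

    <xsl:template match="p">
        <p><xsl:apply-templates/></p>
    </xsl:template>
</xsl:stylesheet>
```

The column order of the <td> cells here follows the order of the <th> headings in the document-node rule: date first, then places, then people.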

Guidance / Suggestions for filling out the table of contents

In this assignment, we are inviting you to pull some data from the source files that provides a preview of what people will read about in the letters. Our starter XSLT files guide you to output lists of characters/persons and locations. But we also encourage you to reach in and feature different things: maybe some count() information, where you give a count of something interesting? Or how about adding the first 80 characters of the text of the letter to give people a quick view of how the letter begins! To do this, review our XPath Exercise on string functions, and specifically try reaching into the first paragraph of the letter and applying the XPath substring() function to take its string value only up to a certain length that you specify.
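Here is a sketch of what those two ideas could look like inside a table-of-contents rule. It assumes that paragraphs are encoded as <p> and person references as <persName>; both names are guesses to adapt to the real markup.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    xmlns="http://www.w3.org/1999/xhtml">

    <!-- ASSUMPTION: paragraphs are <p> and persons are <persName> in the source XML -->
    <xsl:template match="letter" mode="toc">
        <tr>
            <!-- Take the first paragraph, tidy its whitespace,
                 then keep characters 1 through 80 -->
            <td>
                <xsl:value-of
                    select="(descendant::p)[1] ! normalize-space() ! substring(., 1, 80)"/>
            </td>
            <!-- Count every person reference in this letter (repeats included) -->
            <td>
                <xsl:value-of select="count(descendant::persName)"/>
            </td>
        </tr>
    </xsl:template>
</xsl:stylesheet>
```

Normalizing the whitespace before taking the substring keeps indentation and line breaks in the source from using up part of the 80 characters.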

For the other table cell(s) showing string-joined lists of persons/characters and places, the XPath you want to try is a little pipeline. Try applying normalize-space() to each node (use the simple map ! operator for this) so you remove extra spaces from individual nodes first. Then use the arrow operator => to operate on the whole sequence: send it to distinct-values() first, then to sort(). Finally, bundle the values together tidily with a comma separator by sending the result to string-join() like this: string-join(',').
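Written out, that pipeline might look like the sketch below. The element names persName and placeName are assumptions, and we use a comma followed by a space as the separator so the list is easier to read.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    xmlns="http://www.w3.org/1999/xhtml">

    <!-- ASSUMPTION: persons are <persName> and places are <placeName> in the source XML -->
    <xsl:template match="letter" mode="toc">
        <tr>
            <!-- Step 1 (!): normalize-space() runs once per node.
                 Steps 2-4 (=>): de-duplicate, sort, and join the whole sequence. -->
            <td>
                <xsl:value-of select="(descendant::placeName ! normalize-space())
                    => distinct-values() => sort() => string-join(', ')"/>
            </td>
            <td>
                <xsl:value-of select="(descendant::persName ! normalize-space())
                    => distinct-values() => sort() => string-join(', ')"/>
            </td>
        </tr>
    </xsl:template>
</xsl:stylesheet>
```

The arrow operator passes whatever is on its left as the first argument of the function on its right, which is why string-join() is written here with only the separator.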

Completing and checking your work