Publishing Requirements for Industry Standard Metadata
Guide to the
PRISM Aggregator Message
for Web Content
January 7, 2015
Copyright and Legal Notices
Copyright (c) International Digital Enterprise Alliance, Inc. [IDEAlliance] (2001-2015).
All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to IDEAlliance, except as needed for the purpose of developing IDEAlliance specifications, in which case the procedures for copyrights defined in the IDEAlliance Intellectual Property Policy document must be followed, or as required to translate it into languages other than English. The limited permissions granted above are perpetual and will not be revoked by IDEAlliance or its successors or assigns.
IDEAlliance takes no position regarding the validity or scope of any intellectual property or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available. IDEAlliance does not represent that it has made any effort to identify any such rights. Information on IDEAlliance's procedures with respect to rights in IDEAlliance specifications can be found at the IDEAlliance website. Copies of claims of rights made available for publication, assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification, can be obtained from the President of IDEAlliance.
IDEAlliance requests interested parties to disclose any copyrights, trademarks, service marks, patents, patent applications, or other proprietary or intellectual property rights which may cover technology that may be required to implement this specification. Please address the information to the President of IDEAlliance.
The PRISM Aggregator Message for Web Content (PAMW) is a standard format that publishers use to capture Web and mobile content and to transmit XML-encoded content and associated metadata to aggregators and syndicators. This document describes PAMW in detail and provides some examples of how it is used.
Using the PRISM Aggregator Message for Web and mobile content is consistent with the existing flow of content between suppliers and aggregators. PAMW simply provides an alternative format. However, adapting your processes to conform to PAMW will provide many advantages and financial benefits to you and your business partners.
· The use of a single, industry-standard format for extraction and acquisition reduces the errors and costs of tracking and deploying multiple formats to communicate with multiple business partners.
· The use of a single format for all organizations speeds the processing of content and speeds the integration of new business partners into your workflow. If a new partner is using a format that you can already handle, little if any process change is necessary to transmit content between you. The value and accessibility of the content will be increased because time to market is reduced.
· The use of a common industry format reduces the barrier to entry for all publishers and content aggregators. This is especially valuable for smaller organizations.
· Aggregators manage content from a large number of sources. Today, they receive metadata in many different formats. By providing a common metadata standard, PAMW helps everyone in the electronic content business track, use and re-use their content.
· Providing content encoded in XML adds to the content’s value because it makes it possible to repurpose it for multiple opportunities:
· Tables of information marked up as tables can take advantage of more formatting capabilities, making them look more professional on output than the fixed-width font style that many are forced to use. Furthermore, the information within them now becomes accessible as data.
· The inline XML markup that lets you identify names, key phrases and other important data elements within an article or paragraph makes it easier to format them, search for them and turn them into links. This ability will also greatly contribute to search and display flexibility.
· Standardization of the use of special characters gives you wider access to more scientific symbols and foreign characters. Furthermore, they can be handled automatically.
All of these capabilities combine to enable publishers to use their content on a wider variety of output media and products, getting more value from their information assets.
By enabling the delivery of detailed information in a consistent format, the PAMW XSD allows publishers and other content-related companies to better communicate with a broader range of partners who are just now standardizing on XML.
The PAMW XSD supports two instance types.
Fully qualified PAMW XML: You can use the pamw.xsd to validate content without defaulting the xhtml: namespace. This is done with the following statement that calls to the pamw.xsd with “xmlns:xsi” and declares all namespaces:
PAMW XML with XHTML set as the tag default: If you wish to tag content using default XHTML tagging, you will need to call to the pamw.xsd and default the xhtml: namespace. This is done with the following statement that calls to the pamw.xsd with “xmlns:xsi”, declares all namespaces and defaults to the xhtml: namespace.
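As a hedged illustration of the two instance types, the Python sketch below embeds one possible root declaration for each and checks that both resolve to the same namespace-qualified element names. The pam:, prism: and xhtml: namespace URIs are the ones given in this guide. The xsi:schemaLocation pairing and the minimal article content are assumptions made for the example, not the guide's own statements.

```python
import xml.etree.ElementTree as ET

# Namespace declarations shared by both instance types. The pam and prism
# URIs come from this guide; the schemaLocation pairing is an assumption.
COMMON = (
    'xmlns:pam="http://prismstandard.org/namespaces/pam/2.2/" '
    'xmlns:prism="http://prismstandard.org/namespaces/basic/2.2/" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://prismstandard.org/namespaces/pam/2.2/ '
    'http://www.prismstandard.org/schemas/pamw/1.0/pamw.xsd"'
)

# Instance type 1: every XHTML element carries the xhtml: prefix.
FULLY_QUALIFIED = f"""<pam:message {COMMON}
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <pam:article>
    <xhtml:head/>
    <xhtml:body><xhtml:p>Sample text.</xhtml:p></xhtml:body>
  </pam:article>
</pam:message>"""

# Instance type 2: XHTML is the default namespace, so its tags need no prefix.
XHTML_DEFAULT = f"""<pam:message {COMMON}
    xmlns="http://www.w3.org/1999/xhtml">
  <pam:article>
    <head/>
    <body><p>Sample text.</p></body>
  </pam:article>
</pam:message>"""


def expanded_names(xml_text):
    """Return every element name in document order as '{namespace}local'."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


qualified_names = expanded_names(FULLY_QUALIFIED)
defaulted_names = expanded_names(XHTML_DEFAULT)
print(qualified_names == defaulted_names)
```

The comparison only shows that the two spellings are equivalent to a namespace-aware parser. It does not validate either instance against the pamw.xsd.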
PAMW uses the same namespaces as PAM for convenience. To call in specialized PAMW xsd modules, the import statements need to use pointers to the following PAMW schema modules:
· For xmlns:pam=http://prismstandard.org/namespaces/pam/2.2/ the schema is pamw.xsd
· For xmlns:prism=http://prismstandard.org/namespaces/basic/2.2/ the schema is pamw-prism.xsd
· For xmlns:xhtml=http://www.w3.org/1999/xhtml the schema is pamw-xhtml.xsd
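The namespace-to-module pairs above could be turned into XML Schema import elements along the following lines. This is a minimal sketch: the helper name build_imports is hypothetical, and the relative schemaLocation values assume the modules sit beside the importing schema.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Namespace-to-module pairs as listed in this guide.
PAMW_MODULES = {
    "http://prismstandard.org/namespaces/pam/2.2/": "pamw.xsd",
    "http://prismstandard.org/namespaces/basic/2.2/": "pamw-prism.xsd",
    "http://www.w3.org/1999/xhtml": "pamw-xhtml.xsd",
}


def build_imports(modules):
    """Create one xs:import element per namespace/module pair."""
    imports = []
    for namespace, schema_file in modules.items():
        element = ET.Element(f"{{{XS}}}import")
        element.set("namespace", namespace)
        # Relative file name (an assumption); a real schema may need a URL.
        element.set("schemaLocation", schema_file)
        imports.append(element)
    return imports


ET.register_namespace("xs", XS)
pamw_imports = build_imports(PAMW_MODULES)
for import_element in pamw_imports:
    print(ET.tostring(import_element, encoding="unicode"))
```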
The elements that are included in the PAMW Guide represent existing PAM elements combined with new and updated elements that represent multi-platform article content, including web-based content and mobile content.
The location of the PAMW Guide is http://www.prismstandard.org/guides/PAMWGuide_v1.0.pdf or http://www.prismstandard.org/guides/PAMWGuide_v1.0.htm.
The PRISM 3.0 Specification Documentation Package is referred to in this Guide because PAMW uses some of the new elements contained within the PRISM 3.0 Specification. Likewise, PAMW also refers to documentation for PRISM 2.1 in order to maintain its backward compatibility. PAMW truly is made up of a mix of elements crossing both of these specifications.
In this guide, the XML model is often illustrated by a model diagram. Each diagram was produced with the XML Spy product. These diagrams show the elements and attributes that make up a model and their order and frequency.
The legend for reading XML model diagrams is shown in Figure 1.1. Elements that are required by the model are shown in a solid box. Elements that are optional are shown in a dotted box. Likewise attributes may be required (solid box) or optional (dotted box). A repeatable occurrence of elements is indicated by numbers below each element box to the right.
The diagrams also indicate how elements are assembled. When building some models, elements may occur in a sequence with a specified order. Other models provide a choice from among a number of elements. The legend in Figure 1.1 shows the connectors for sequence and choice.
PAMW is the PRISM Aggregator Message for Web Content. The use cases for PAMW include:
· Automated harvest of Web and Mobile content in an XHTML and PRISM Metadata format
· Distribute content to aggregators, syndicators and other online and mobile revenue streams
· Establish publisher DAM Systems to use for editorial research
· Establish publisher DAM Systems to create new content by reusing content originally published on websites
· Establish publisher systems to manage usage/reuse rights for online and mobile content
· Provide a richly encoded alternative to static PDF replicas for EPUB2-based magazine newsstands
PAMW has special use cases, yet it is related to PRISM, PAM and PSV. See Figure 1.2.
PAMW is an XML tag set built on the foundation of PRISM metadata and controlled vocabularies. PAMW is an application of PRISM, but PAMW and PRISM are not synonymous. PAMW is an XML tag set that uses PRISM metadata for a very specific purpose while PRISM remains the core specification for metadata and controlled vocabularies. See Figure 1.2.
The use case for PAM is to encode online magazine content in XML to deliver content to aggregators. PAMW is the version of PAM that has been optimized for the capture of online magazine content and the interchange of that content with aggregators. PAMW is very similar to PAM, but it has the print publication elements removed and processing/formatting tagging included. The PAMW schema is located at: http://www.prismstandard.org/schemas/pamw/1.0/pamw.xsd.
PSV, like PAMW, is also built on the foundation of PRISM metadata and controlled vocabularies. But PSV and PAMW are not the same. Each has a very specific use case and each has a different XML tag set. PSV defines an architecture for content sources while PAMW is specific to Web and mobile content that has already been published and is being captured for archive and re-distribution.
The following is an alphabetical list of the metadata elements that are included in the PAM message. Following the element name is the namespace pointing to the document in the PRISM documentation package where that element appears. (XHTML elements are not listed here.)
These elements form the containers for PAMW metadata and text encoding elements. They themselves do not encode specific metadata fields. Figure 2.1 shows the message framework structure.
The PAM message begins with the pam:message tag. There is an optional attribute to specify the version of the schema used for this message. Each article is made up of an XHTML head element that carries numerous descriptive metadata fields followed by an XHTML body element that carries the text of the article coded in XML.
Example: The following example shows how to code the PAMW message framework. Note that this example uses the pamw.xsd as its schema and does not default to the xhtml: namespace.
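A minimal sketch of the framework, written as a Python check, is given here for orientation. It assumes that each pam:article holds exactly one XHTML head followed by one XHTML body, as described above. The function name framework_problems is hypothetical, and the check does not replace validation against the schema.

```python
import xml.etree.ElementTree as ET

PAM = "http://prismstandard.org/namespaces/pam/2.2/"
XHTML = "http://www.w3.org/1999/xhtml"

# Fully qualified skeleton: no default namespace, XHTML tags carry a prefix.
SKELETON = f"""<pam:message xmlns:pam="{PAM}" xmlns:xhtml="{XHTML}">
  <pam:article>
    <xhtml:head/>
    <xhtml:body/>
  </pam:article>
</pam:message>"""


def framework_problems(root):
    """List deviations from the message > article > head, body layout."""
    problems = []
    if root.tag != f"{{{PAM}}}message":
        problems.append("root element is not pam:message")
    articles = root.findall(f"{{{PAM}}}article")
    if not articles:
        problems.append("message contains no pam:article")
    for number, article in enumerate(articles, start=1):
        child_tags = [child.tag for child in article]
        if child_tags != [f"{{{XHTML}}}head", f"{{{XHTML}}}body"]:
            problems.append(f"article {number}: expected head followed by body")
    return problems


print(framework_problems(ET.fromstring(SKELETON)))
```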
There are many metadata elements in the article head. These can be grouped by function. The grouping of elements by function can be seen in the final order of elements prescribed by the XML schema.
Key Elements for Aggregators: These elements provide key identification and signal those receiving the content about the most important features of the article that follows. The dc:identifier is required for each article. The publication name is also required. Either the cover date or the publication date is required.
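The requirements above can be sketched as a simple presence check. The sketch assumes that the key elements are direct children of the article head, that the publication name is carried in prism:publicationName, and that the cover date is carried in an element named prism:coverDate. The dc: URI is the standard Dublin Core elements namespace, and the sample values are invented.

```python
import xml.etree.ElementTree as ET

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "prism": "http://prismstandard.org/namespaces/basic/2.2/",
}


def missing_key_elements(head):
    """Return the names of required key elements absent from an article head."""
    missing = []
    if head.find("dc:identifier", NS) is None:
        missing.append("dc:identifier")
    if head.find("prism:publicationName", NS) is None:
        missing.append("prism:publicationName")
    # Either date satisfies the rule; prism:coverDate is an assumed name.
    has_date = (
        head.find("prism:coverDate", NS) is not None
        or head.find("prism:publicationDate", NS) is not None
    )
    if not has_date:
        missing.append("prism:coverDate or prism:publicationDate")
    return missing


# Hypothetical head with invented values, XHTML as the default namespace.
SAMPLE_HEAD = """<head xmlns="http://www.w3.org/1999/xhtml"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:prism="http://prismstandard.org/namespaces/basic/2.2/">
  <dc:identifier>example-article-001</dc:identifier>
  <prism:publicationName>Example Magazine</prism:publicationName>
</head>"""

print(missing_key_elements(ET.fromstring(SAMPLE_HEAD)))
```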
The status of the article can be indicated as A (Add, this article is new), C (Correction, the original article is being resent with a published correction appended in prism:hasCorrection), U (Update, replace the entire article previously sent), and D (Delete, remove the article previously sent). The default is to add the article.
Elements Providing a Title: These elements provide a number of titles, some of which may vary by delivery platform.
Elements indicating Creative Origin: Elements identifying creators and contributors.
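A receiving system might interpret the status codes as sketched below. The sketch assumes that pam:status appears as a child of the article head and carries the one-letter code as its text content. The names ArticleStatus and article_status are hypothetical.

```python
import xml.etree.ElementTree as ET
from enum import Enum

PAM = "http://prismstandard.org/namespaces/pam/2.2/"


class ArticleStatus(Enum):
    ADD = "A"         # the article is new
    CORRECTION = "C"  # resent with a published correction appended
    UPDATE = "U"      # replaces the entire article previously sent
    DELETE = "D"      # removes the article previously sent


def article_status(head):
    """Read pam:status from an article head, defaulting to Add when absent."""
    element = head.find(f"{{{PAM}}}status")
    if element is None or not (element.text or "").strip():
        return ArticleStatus.ADD
    # An unknown code raises ValueError rather than being silently accepted.
    return ArticleStatus(element.text.strip())


update_head = ET.fromstring(
    f'<head xmlns:pam="{PAM}"><pam:status>U</pam:status></head>'
)
print(article_status(update_head))
```

Falling back to Add when the element is missing mirrors the default stated above.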
Elements providing Publication Information: Elements that help identify the publication to which this article belongs
Elements Identifying Position on a Website:
Element Identifying the Subject of an Article:
Rights and Usage
Best Practice is to employ the new rights description elements from the pur: namespace instead of the older, now deprecated elements from the original prism: namespace. The PRISM Usage Rights elements include:
The following is a list of the PAMW metadata elements that occur in the head element according to the specified PAMW structure. The order of elements in the message head reflects the needs of publishers preparing content for delivery, of aggregators receiving that content and, most importantly, of automated capture of online content, where formatting elements and style attributes are common. See Figure 2.2 for the head structural model.
Remember that you will not use all the elements in the article head. The example below shows how the article head might be coded with PAMW:
Example: This example shows a typical message head for an online article with the xhtml: namespace as the default.
The content of the article is coded within the XHTML body element. The body has been enhanced in several ways. A PRISM “class” attribute has been added to many elements so that we can specify what type of paragraph or heading they are. In addition a media element has been added to provide for special encoding of related media objects. PAMW differs from PAM because it contains many of the formatting elements that are used online but not transmitted from print editions to aggregators. See Figure 2.3.
Many body elements are what is called “block presentation elements.” These structures include paragraphs, block quotes and headings. These elements carry the prism:class attribute and can be used to code special structures such as pull quotes and side bars. If you need to specify a class for other xhtml elements, you can use the xhtml class attribute as an alternative.
Figure 2.4 shows the paragraph structure. It is made up of text and allows numerous other elements within the text and is an example of an HTML block presentation element.
Example: This example shows typical body markup with the xhtml: namespace as the default.
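As a hedged illustration of block elements carrying prism:class, the Python sketch below embeds a small body with XHTML as the default namespace and groups block text by class value. The body content is invented, and the class values "title", "pullQuote" and "sidebar" are assumed spellings that should be checked against the PAM class vocabulary.

```python
import xml.etree.ElementTree as ET

PRISM = "http://prismstandard.org/namespaces/basic/2.2/"
XHTML = "http://www.w3.org/1999/xhtml"

# Hypothetical body; the prism:class values are assumed spellings.
BODY = f"""<body xmlns="{XHTML}" xmlns:prism="{PRISM}">
  <h1 prism:class="title">Example headline</h1>
  <p>Opening paragraph of the article.</p>
  <p prism:class="pullQuote">A line worth repeating.</p>
  <p prism:class="sidebar">Related background material.</p>
</body>"""


def blocks_by_class(body):
    """Map each prism:class value to the text of the blocks that carry it."""
    grouped = {}
    for element in body.iter():
        block_class = element.get(f"{{{PRISM}}}class")
        if block_class is not None:
            text = "".join(element.itertext()).strip()
            grouped.setdefault(block_class, []).append(text)
    return grouped


classified = blocks_by_class(ET.fromstring(BODY))
print(classified)
```

Paragraphs without a prism:class attribute, such as the opening paragraph, are left out of the result.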
The standard XHTML body elements have been modified to allow for the inclusion of additional PRISM inline markup elements. These elements allow for coding the subjects of an article right inline with the content. This type of encoding facilitates more exact search capabilities. Not only can one locate an article with a particular subject, but the exact area of the text can be targeted as well. The PAM 2.2 inline markup elements include:
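Inline subject markup of this kind could be harvested as sketched below. The three element names are taken from the glossary in this guide and are only an illustrative selection. The sketch assumes they appear in the prism: namespace, and the paragraph content is invented.

```python
import xml.etree.ElementTree as ET

PRISM = "http://prismstandard.org/namespaces/basic/2.2/"
XHTML = "http://www.w3.org/1999/xhtml"

# Inline subject elements named in the glossary; an illustrative selection.
INLINE_ELEMENTS = ("person", "organization", "location")

# Hypothetical paragraph with invented names.
PARAGRAPH = f"""<p xmlns="{XHTML}" xmlns:prism="{PRISM}">
  <prism:person>Jane Doe</prism:person> of
  <prism:organization>Example Corp</prism:organization> spoke in
  <prism:location>Chicago</prism:location>.
</p>"""


def inline_subjects(block):
    """Collect (element name, text) pairs for inline subject markup."""
    found = []
    for element in block.iter():
        for name in INLINE_ELEMENTS:
            if element.tag == f"{{{PRISM}}}{name}":
                found.append((name, "".join(element.itertext()).strip()))
    return found


subjects = inline_subjects(ET.fromstring(PARAGRAPH))
print(subjects)
```

Because the pairs are collected in document order, a search tool can point to the exact place in the text where each subject occurs.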
A special pam:media element has been added to encode media related to the article content. Note that for PAMW, just like for PAM, the media is not included. Only the reference to the media element is captured. See Figure 2.5.
Example: This example shows a sample pam:media element.
Those who implement PAMW have a choice about how to encode images. Images can either be encoded with the <img> tag or as a pam:media element. Best Practice is to choose one or the other encoding based on the use case. If, for example, Web content is to be stored with PAM-encoded print content in a DAM for editorial research or reuse, then the pam:media encoding would be best. However, if the use case is to capture Web content and to deliver it as rich EPUB2 content for digital newsstands, then retaining the <img> tag would be preferable. Note that one encoding format can easily be transformed into the other at any time in the workflow.
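A transformation from the img encoding to the pam:media encoding might look like the sketch below. It is built on several assumptions that should be checked against the schema: the file reference is carried in a pam:refid attribute on pam:mediaReference, the img alt text maps to pam:textDescription, and the child order shown is acceptable. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

PAM = "http://prismstandard.org/namespaces/pam/2.2/"
XHTML = "http://www.w3.org/1999/xhtml"


def img_to_media(img):
    """Build a pam:media element from an XHTML img element."""
    media = ET.Element(f"{{{PAM}}}media")
    reference = ET.SubElement(media, f"{{{PAM}}}mediaReference")
    # Assumption: the file reference is carried in a pam:refid attribute.
    reference.set(f"{{{PAM}}}refid", img.get("src", ""))
    if img.get("alt"):
        description = ET.SubElement(media, f"{{{PAM}}}textDescription")
        description.text = img.get("alt")
    media.tail = img.tail  # keep any text that followed the image
    return media


def replace_images(body):
    """Swap every xhtml:img in the tree for an equivalent pam:media element."""
    for parent in list(body.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == f"{{{XHTML}}}img":
                parent[index] = img_to_media(child)
    return body


# Hypothetical body with an invented image reference.
SAMPLE_BODY = (
    f'<body xmlns="{XHTML}"><p><img src="photo.jpg" alt="A sample photo"/>'
    " Caption text.</p></body>"
)
converted = replace_images(ET.fromstring(SAMPLE_BODY))
ET.register_namespace("pam", PAM)
print(ET.tostring(converted, encoding="unicode"))
```

The reverse direction would follow the same pattern, reading the reference back out of pam:mediaReference into the src attribute of an img element.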
This appendix contains a glossary for the metadata elements within the PRISM Aggregator message. The elements are listed alphabetically. Following the element name is the namespace pointing to the document in the PRISM documentation package where that element appears.
academicField (prism:, pim:) Refines dc:subject by specifying an academic speciality.
adultContentWarning (pur:) Specifies an adult content warning for an article or media object.
aggregationType (prism:) The unit of aggregation such as a magazine or journal.
agreement (pur:) Specifies the contract, license or release for a media object.
alternateTitle (prism:) An alternate title or alternate headline for a resource that may be used in a table of contents, a popup etc. and can vary with platform.
article (pam:) Contains the metadata and markup for one article.
byteCount (prism:) The size of the article in bytes.
channel (prism:) Web channel assigned to the resource. A navigational aid. Has attributes for subchannel1 through subchannel4 to indicate finer navigation.
caption (pam:) Caption for a media object in PAM.
captureDate (prism:) Date this content was harvested or captured from the Web.
contributor (dc:) An entity responsible for making contributions to the content of a media resource.
copyright (pur:) Copyright statement for the resource.
corporateEntity (prism:) The name(s) of publisher’s organizational units related to the resource, either as the financial owner or group responsible for the resource, and at a lower hierarchical level than the corporate entity named in dc:publisher.
creator (dc:) An entity primarily responsible for creating the content of a media resource.
credit (pam:) A caption-style attribution for a media object as published.
creditLine (pur:) Specifies the credit line for a media asset required by an agreement. May be tied directly to an agreement by the agreementID attribute.
description (dc:) An account of the content of the resource.
doi (prism:) The Digital Object Identifier, DOI, for the article.
eIssn (prism:) The electronic ISSN for the publication in which the resource was published.
embargoDate (pur:) Earliest date (potentially including time) the resource may be made available to users or customers according to the rights agreement or to a clause in the rights agreement. May be specified by distribution platform.
event (prism:, pim:) An event (social gathering, phenomenon, or more generally something that happened at a specifiable place and time) referred to in order to indicate a subject of the resource.
exclusivityEndDate (pur:) The date (potentially including time) when exclusive rights to a resource end. May be specified by distribution platform.
expirationDate (prism:, pur:) The date (potentially including time) by which the resource must be removed from availability to users or customers according to a rights agreement. May be specified by distribution platform.
format (dc:) The physical or digital manifestation of the resource. Expressed as a MIME type.
genre (prism:) Describes the genre, or the intellectual content of the resource.
hasCorrection (prism:) Identifies any known corrections to the current resource.
hasPart (dcterms:) The described resource includes the referenced resource either physically or logically.
identifier (dc:) An unambiguous reference to the resource, within a given context. Required for each article sent within a PAM message.
imageSizeRestriction (pur:) Specifies restrictions on the usage size for an image. May be tied to agreement.
industry (prism:, pim:) An industry or industry sector, referred to in order to indicate a subject of the resource.
issueType (prism:) Defines the type of serial publication issue. Serial publications often have two different types of issues. Regular issues are part of the subscription while Special Issues have a unique focus and content. Special Issues are typically not included with the magazine subscription.
keyword (pim:, prism:) An element used to tag keywords that are likely to be used in search queries. Note that this differs from a subject or elements such as prism:person, prism:event, or prism:organization that are the subject of the article.
link (prism:, pim:) Describes a link to an outside resource such as a website, email or hash tag.
location (prism:, pim:) A geospatial location, referred to in order to indicate a subject of the resource.
media (pam:) An alternative to the XHTML img element. Permits referring to and providing metadata for a media object related to an article.
mediaReference (pam:) Links to the media file referred to by pam:media.
mediaTitle (pam:) Published title of the media element.
message (pam:) Root element for message from publisher to aggregator that contains one or more articles.
nonpublishedMediaTitle (pam:) Nonpublished title of the media element.
object (prism:, pim:) The name of a physical or virtual object, referred to in order to indicate a subject of the resource.
optionEndDate (pur:) The date (potentially including time) when the option to use a resource ends. May be specified by distribution platform.
organization (prism:, pim:) The name of an organization, referred to in order to indicate a subject of the resource.
originPlatform (prism:) The original platform where a resource’s intellectual content was delivered.
permissions (pur:) A free text field used to specify special permissions for the use of a media asset.
person (prism:, pim:) The proper name of a person, referred to in order to indicate a subject of the resource.
postDate (prism:) Date (and potentially the time) the identified resource is to be posted online. This includes both web and mobile content.
productCode (prism:) The product code for a publication. This may be a bipad or even a full UPC or Magazine Barcode.
publicationDate (prism:) This is the post date for digital content; suitable for storing in a database field with a 'date' data type. Because the publication date may vary by platform, it is the best practice to specify the platform using the PRISM Controlled Vocabulary for platform.
publicationDisplayDate (prism:) This is the close date in date time format for a print publication and the post date for digital content expressed as a text string. Because the publication date may vary by platform, it is the best practice to specify the platform using the PRISM Controlled Vocabulary for platform.
publicationName (prism:) Title of the magazine, or other publication, in which a resource was/will be published.
publisher (dc:) The entity responsible for making the resource available.
quote (pim:) Marks the words attributed to a specific person in the text.
restrictions (pur:) A free text field used to specify special restrictions for the use of a media asset.
reuseProhibited (pur:) Cannot be used.
rightsAgent (pur:) Can be used to specify the rights agent. This is a free text field so contact information may be included. The rights agent may not be the rights owner.
rightsOwner (pur:) Can be used to specify the rights owner. This is a free text field so contact information may be included. The rights owner may be different from the rights agent.
sport (prism:, pim:) Refines dc:subject. Describes a sport, or an athletic activity requiring skill or physical prowess and often of a competitive nature.
status (pam:) Defines the processing status of the article. The default is to add the article (A).
subchannel1 (prism:) First level Web sub channel assigned to the resource.
subchannel2 (prism:) Second level Web sub channel assigned to the resource.
subchannel3 (prism:) Third level Web sub channel assigned to the resource.
subchannel4 (prism:) Fourth level Web sub channel assigned to the resource.
subject (dc:) The main topic or topics of the content of the resource. Defines “aboutness”.
subtitle (prism:) The subtitle for the publication, typically a book.
teaser (prism:) A short description of the resource.
textDescription (pam:) Contains a textual description for the item referred to in a pam:media element.
ticker (pim:, prism:) Indicates a stock ticker symbol that is the subject of the article.
timePeriod (prism:, pim:) The temporal subject of the content of the resource.
title (dc:) The published name given to the resource.
type (dc:) The style of presentation of the resource’s content, such as an image or a table.
url (prism:) This element provides the url for an article or unit of content captured from the Web.
versionIdentifier (prism:) Provides an additional identifier, typically used to record a specific version of a resource. Best practice is to use a version identifier that implies sequence.
wordCount (prism:) The (approximate) count of the number of words in a textual resource.
This appendix contains a complete list of the class attributes that are allowed on elements within the body of the PRISM Aggregator Message.
The URI for the PRISM PAM Class Vocabulary is: http://prismstandard.org/vocabularies/2.0/pam.xml.
The principal component of the resource. [NewsML]
Ancillary content that is presented with an article and cannot stand alone.
The byline (author) of the story.
Text identifying or explaining, and printed in close proximity to, illustrations or other images. [AAT]
An acknowledgement, appearing in the style of a caption.
The geographical location where the story was filed, e.g., city, state, and/or country where the story originated.
A sub-head or secondary headline that generally is preceded by the article headline and precedes the body of the story.
Note above the footer of the page made up of the note and the reference to the note.
Eye catching beginning to a caption.
Eye catching quote pulled from the text of the body of an article.
A substantive piece of content that is presented with an article and can stand alone.
A subtitle of a resource.
A short description of the resource.
The title of a resource.
This example shows the coding for PAMW XML that is defaulted to the xhtml: namespace and verified against the pamw.xsd. Note that this example is for content originally published online and validated against an XSD.
This example shows the coding for fully qualified PAMW XML verified against the pamw.xsd. Note that this example is for content originally published online and validated against an XSD.