How to Create Open, Structured Content
Structured content refers to the concept of organizing and treating digital content like data. It’s a way of publishing content as modular, discrete pieces of information that are tagged with machine-readable descriptions. Structured content has the potential to transform how people find, understand, share, and use government information.
Why Structured Content Matters
Currently, most online federal government information is published on static HTML Web pages, which are usually unstructured and designed to be viewed on a PC. Unstructured content doesn’t always adapt well to smaller screens, and it’s harder to discover, share, or reuse. Given the rapid pace at which new devices are introduced, you can no longer publish content and assume your audience will view it only on a PC. With the proliferation of tablets, smartphones, and other mobile devices, you need to publish content that is separate from its presentation and structured so that it is available and consumable anytime, anywhere, on any device.
Create Once, Publish Everywhere
Structured content gives you the granular control over your information that you need to “Create Once, Publish Everywhere.” Read about National Public Radio’s COPE (Create Once, Publish Everywhere) method.
Share and Re-Use Content via RSS and APIs
Structured content enables others to aggregate and reuse information via Really Simple Syndication (RSS) feeds, which automatically publish frequently changing information, and Application Programming Interfaces (APIs), which enable websites, programs, and devices to interact with one another. RSS and APIs are great ways to share information, because they can automate many tasks and automatically present the latest information, even combining information from several sources. But you can’t use these technologies without structured content.
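As an illustration of how structured records can feed an RSS channel, here is a minimal Python sketch that uses only the standard library. The record fields (`title`, `link`, `summary`, `published`) and the example.gov URLs are hypothetical; a real CMS would supply these values from its own content types.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical structured records; the field names are illustrative, not a standard.
records = [
    {
        "title": "Open Data Workshop",
        "link": "https://www.example.gov/events/open-data-workshop",
        "summary": "A one-hour introduction to publishing open data.",
        "published": datetime(2014, 3, 4, 15, 0, tzinfo=timezone.utc),
    },
]


def build_rss(title, link, description, items):
    """Build an RSS 2.0 document from structured records and return it as a string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for record in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = record["title"]
        ET.SubElement(item, "link").text = record["link"]
        # The feed reuses the short "summary" field rather than the whole page.
        ET.SubElement(item, "description").text = record["summary"]
        # RSS 2.0 expects RFC 822-style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(record["published"])
    return ET.tostring(rss, encoding="unicode")


print(build_rss("Agency Events", "https://www.example.gov/events", "Upcoming agency events", records))
```

Because the feed is generated from fields rather than copied from a page, the same records could also back an API response without re-entering anything.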
As an example, if you publish information about an event as structured content, the same event information could be displayed as part of a calendar of events, published via a news feed, or aggregated with other related events via an API. A short description of the event could display on mobile devices, and a longer description could display on a PC. The same record can feed many other channels as well.
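To make the calendar case concrete, the sketch below renders one hypothetical event record as an iCalendar (RFC 5545) file that calendar applications can import. The field names, the UID, and the PRODID value are illustrative assumptions, and the simple line folding assumes ASCII text.

```python
from datetime import datetime, timezone

# One hypothetical event record; every field name here is illustrative.
event = {
    "uid": "open-data-workshop-20140304@example.gov",
    "title": "Open Data Workshop",
    "start": datetime(2014, 3, 4, 15, 0, tzinfo=timezone.utc),
    "end": datetime(2014, 3, 4, 16, 0, tzinfo=timezone.utc),
    "location": "Washington, DC",
    "long_description": (
        "A one-hour introduction to publishing open data, covering metadata, "
        "controlled vocabularies, and machine-readable formats."
    ),
}


def escape_text(value):
    """Escape characters that are special in iCalendar TEXT values."""
    return (
        value.replace("\\", "\\\\")
        .replace(";", "\\;")
        .replace(",", "\\,")
        .replace("\n", "\\n")
    )


def ics_time(dt):
    """Format an aware datetime as a UTC iCalendar DATE-TIME."""
    return dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def fold(line, limit=75):
    """Fold content lines longer than the limit; assumes ASCII text."""
    parts, rest = [line[:limit]], line[limit:]
    while rest:
        # Continuation lines start with a space, which counts toward the limit.
        parts.append(" " + rest[: limit - 1])
        rest = rest[limit - 1 :]
    return "\r\n".join(parts)


def to_icalendar(ev, stamp):
    """Render one event record as a minimal VCALENDAR holding one VEVENT."""
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//Example Agency//Events//EN",
        "BEGIN:VEVENT",
        f"UID:{ev['uid']}",
        f"DTSTAMP:{ics_time(stamp)}",
        f"DTSTART:{ics_time(ev['start'])}",
        f"DTEND:{ics_time(ev['end'])}",
        f"SUMMARY:{escape_text(ev['title'])}",
        f"LOCATION:{escape_text(ev['location'])}",
        f"DESCRIPTION:{escape_text(ev['long_description'])}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    return "\r\n".join(fold(line) for line in lines) + "\r\n"


print(to_icalendar(event, datetime.now(timezone.utc)))
```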
Search engines can also take advantage of structured content by offering more meaningful rich snippets, which are the descriptions that appear in search engine results. The more information you provide about your content, the more “machine-readable” it becomes, enabling Web services and helping search engines hone results to get your content to the people who need it.
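One common way to give search engines this extra information is Schema.org markup embedded as JSON-LD. The following hypothetical Python helper emits a Schema.org `Event` inside a `<script type="application/ld+json">` element; it is a sketch, not a complete implementation of any search engine's guidelines, and the function name and sample values are assumptions.

```python
import json


def event_json_ld(name, start_date, place_name, address, description):
    """Return a <script> element carrying Schema.org Event markup as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "startDate": start_date,  # ISO 8601, e.g. "2014-03-04T10:00-05:00"
        "location": {"@type": "Place", "name": place_name, "address": address},
        "description": description,
    }
    # Escape "</" so no value can close the surrounding <script> element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(event_json_ld(
    "Open Data Workshop",
    "2014-03-04T10:00-05:00",
    "Example Agency Headquarters",
    "123 Main Street, Washington, DC 20000",
    "A one-hour introduction to publishing open data.",
))
```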
Specific Policy or Legal Requirements
- Open Data Policy—Managing Information as an Asset (PDF, 6 MB, 12 pages, May 2013), in accordance with the May 9, 2013 Executive Order—Making Open and Machine Readable the New Default for Government Information.
- Digital Government Strategy (May 2012)
- OMB M-10-06, Open Government Directive (December 2009)
How to Implement
You can structure content by defining content types, and then publishing content so it conforms to the defined content type. You can also structure content by implementing a taxonomy, which is a way to classify information by attaching descriptive terms to each piece of content. Those terms might describe which section of your site a particular page belongs to, which topics a given blog post discusses, or in which region of the country an office is located. The taxonomy should include predefined controlled vocabularies to describe and tag related pieces of content. Read this explanation of taxonomies (it’s written for Drupal, but it is clear and easy to understand).
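A taxonomy with controlled vocabularies can be sketched in a few lines of Python. The facets and terms in `TAXONOMY`, and the `tag` and `find` helpers, are hypothetical; the point is that tags are checked against a fixed list instead of being typed freely.

```python
# Hypothetical controlled vocabularies, one per taxonomy facet.
TAXONOMY = {
    "section": {"news", "events", "data", "about"},
    "topic": {"health", "education", "small business", "open data"},
    "region": {"northeast", "southeast", "midwest", "southwest", "west"},
}


def tag(item, **terms):
    """Return a copy of item with taxonomy terms attached.

    Terms must come from the controlled vocabulary, so editors cannot
    invent near-duplicates such as "Small Biz".
    """
    tags = dict(item.get("tags", {}))
    for facet, values in terms.items():
        if facet not in TAXONOMY:
            raise KeyError(f"unknown taxonomy facet: {facet}")
        values = {values} if isinstance(values, str) else set(values)
        unknown = values - TAXONOMY[facet]
        if unknown:
            raise ValueError(f"not in the {facet} vocabulary: {sorted(unknown)}")
        tags[facet] = sorted(values)
    return {**item, "tags": tags}


def find(items, facet, term):
    """Return every item tagged with term under the given facet."""
    return [item for item in items if term in item.get("tags", {}).get(facet, [])]


post = tag({"title": "Grant deadlines"}, section="news", topic=["education", "small business"])
print(post["tags"])
```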
Web pages are typically composed of several common pieces of information, such as titles, dates, descriptions, or contributors. When you tag your content to identify and describe each of these elements, you’ve created structured content. This approach requires a significant shift from how federal agencies have traditionally managed content. The key is to think “building blocks” instead of “Web pages.”
How you approach implementation will depend on your strategy for making information available to your various audiences. Once you define your content types and tag your content, it will be open, shareable, and reusable through APIs, RSS feeds, or other distribution methods.
- Read Deblobbing Your Chunks: Building a Flexible Content Model for some great insights into how to structure content
- Content Modelling: A Master Skill discusses how to create a content model
You can structure content with content types, which are essentially predefined collections of related data fields. A single content type usually has many associated fields.
- An example of a content type is a contact list. The list itself is the content type, and the pieces of information about each person on the list (first name, last name, etc.) are the fields
- Events are another common example of a content type, and the pieces of information about the event (title, date, location, etc.) are the fields
- Fields can be designed to meet different needs. For example, an event could have a short description to use in an RSS feed or tweet, and a long description to use on your Web page. This is a key principle of “create once, publish everywhere.” You create the event only once, but you tag it so it’s machine-readable and include different pieces of information (e.g., both short and long descriptions), so you can use the same “event content” in many different places.
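The event example above could be modeled as a small Python dataclass, where each attribute is a field and each method is one output channel. This is a hypothetical sketch (`Event`, `as_tweet`, and `as_web_page` are not from any particular CMS) showing one record serving two places.

```python
import html
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """Hypothetical "event" content type: each attribute is one field."""

    title: str
    event_date: date
    location: str
    short_description: str  # sized for an RSS item or a tweet
    long_description: str   # full text for the Web page

    def as_tweet(self, url: str) -> str:
        # A tweet has room only for the short field plus a link back.
        return f"{self.title}, {self.event_date:%B %d, %Y}: {self.short_description} {url}"

    def as_web_page(self) -> str:
        # The Web page uses the same record but shows the long description.
        return (
            f"<h1>{html.escape(self.title)}</h1>\n"
            f"<p>{self.event_date.isoformat()}, {html.escape(self.location)}</p>\n"
            f"<p>{html.escape(self.long_description)}</p>"
        )


workshop = Event(
    title="Open Data Workshop",
    event_date=date(2014, 3, 4),
    location="Washington, DC",
    short_description="Learn how to publish open data.",
    long_description="A one-hour introduction to metadata, controlled vocabularies, and APIs.",
)
print(workshop.as_tweet("https://www.example.gov/events/1"))
print(workshop.as_web_page())
```

The record is entered once; adding a new channel means adding a method (or template), not retyping the event.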
To take advantage of content types, developers should consult with Web teams to configure the CMS to collect information about certain types of content in a structured way, perhaps by creating a form with fields where each piece of information is entered.
Here's a simplified example of an "event" content type, and some of the standardized fields that make up this event.
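To sketch how a CMS might collect these fields through a form, the Python below describes a content type as a tuple of field definitions and checks a form submission against it. `Field`, `EVENT_TYPE`, and `validate` are hypothetical names; real CMSs provide their own ways to define content types.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Field:
    """One field on a CMS entry form (hypothetical model, not a real CMS API)."""

    name: str
    kind: type = str
    required: bool = True
    max_length: Optional[int] = None


# A hypothetical definition of the "event" content type.
EVENT_TYPE = (
    Field("title", max_length=120),
    Field("start_date", kind=date),
    Field("location"),
    Field("short_description", max_length=140),
    Field("long_description", required=False),
)


def validate(fields, submission):
    """Return a list of problems with a form submission; empty means valid."""
    errors = []
    for field in fields:
        value = submission.get(field.name)
        if value is None or value == "":
            if field.required:
                errors.append(f"{field.name} is required")
            continue
        if not isinstance(value, field.kind):
            errors.append(f"{field.name} must be a {field.kind.__name__}")
        elif field.max_length is not None and len(value) > field.max_length:
            errors.append(f"{field.name} exceeds {field.max_length} characters")
    # Reject data that does not belong to this content type at all.
    known = {field.name for field in fields}
    errors.extend(f"{name} is not a field of this content type" for name in submission if name not in known)
    return errors


print(validate(EVENT_TYPE, {"title": "Open Data Workshop", "start_date": "next week"}))
```

With that submission, the sketch reports that `start_date` must be a date and that `location` and `short_description` are required.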
When you tag your content with common, standard metadata, it enables discovery and aggregation of common information types across .gov websites, and helps agencies, the public, and commercial entities access, expose, and re-use government information.
- Tagging content with machine-readable descriptions helps search engines and Web services understand meaning and relationships in your content, opening up government information for re-use via syndication, APIs, or other technologies, and improving search results
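The aggregation benefit depends on shared field names. In this hypothetical Python sketch, two sites publish records with the same Dublin Core-style fields (`title`, `subject`, `date`, `publisher`), so one function can merge them by subject; the records and the `aggregate` helper are illustrative only.

```python
# Hypothetical records from two agency sites. Both use the same field
# names, which is what makes a combined view possible.
site_a = [
    {"title": "Open data policy FAQ", "subject": "open data", "date": "2014-01-15", "publisher": "Agency A"},
    {"title": "Grant calendar", "subject": "grants", "date": "2014-02-01", "publisher": "Agency A"},
]
site_b = [
    {"title": "New data sets released", "subject": "open data", "date": "2014-03-04", "publisher": "Agency B"},
]


def aggregate(*sources, subject):
    """Merge records from several sources that share a subject, newest first."""
    merged = [record for source in sources for record in source if record.get("subject") == subject]
    # ISO 8601 dates (YYYY-MM-DD) sort chronologically as plain strings.
    return sorted(merged, key=lambda record: record["date"], reverse=True)


for record in aggregate(site_a, site_b, subject="open data"):
    print(record["date"], record["publisher"], record["title"])
```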
Use industry-standard, predefined metadata vocabularies to tag your content. Two of the most common are Dublin Core and Schema.org.
- Dublin Core Metadata Initiative—metadata vocabulary and syntax
- Getting Started with Schema.org, a guide to marking up HTML with the Schema.org vocabulary, which can improve the display of search results and make it easier for people to find your content
Incorporate tagging into your routine content creation process. Determine your content structure, then use the Dublin Core and Schema.org vocabularies to tag your content. Web teams should consult with developers to configure the content management system to apply the proper tags to content elements automatically.
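As a sketch of automatic tagging, a CMS template might render a page's Dublin Core metadata into HTML `<meta>` elements. The Python below assumes the widely used `DC.` naming convention with a `schema.DC` link; the `dublin_core_head` helper and the sample values are hypothetical.

```python
from html import escape

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dublin_core_head(metadata):
    """Render <head> tags for a page's Dublin Core metadata.

    A list value produces one <meta> element per term.
    """
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in metadata.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        terms = value if isinstance(value, list) else [value]
        for term in terms:
            lines.append(f'<meta name="DC.{element}" content="{escape(term, quote=True)}">')
    return "\n".join(lines)


print(dublin_core_head({
    "title": "Open Data Workshop",
    "subject": ["open data", "events"],
    "date": "2014-03-04",
    "publisher": "Example Agency",
}))
```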
See the example below from the Schema.org website on how the HTML code looks for an address that's been marked up with Schema.org tags.
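Separately from the Schema.org example, here is a hypothetical Python helper that marks up an address with Schema.org microdata, so a CMS template could apply the tags automatically. It uses the Schema.org `PostalAddress` type and its `streetAddress`, `addressLocality`, `addressRegion`, `postalCode`, and `addressCountry` properties; the function name and the sample address are made up.

```python
from html import escape

# Schema.org PostalAddress properties, in display order.
ADDRESS_PROPERTIES = ("streetAddress", "addressLocality", "addressRegion", "postalCode", "addressCountry")


def postal_address_microdata(address):
    """Wrap each known address part in a span with its Schema.org itemprop."""
    lines = ['<div itemscope itemtype="https://schema.org/PostalAddress">']
    for prop in ADDRESS_PROPERTIES:
        if prop in address:
            lines.append(f'  <span itemprop="{prop}">{escape(address[prop])}</span>')
    lines.append("</div>")
    return "\n".join(lines)


print(postal_address_microdata({
    "streetAddress": "123 Main Street",
    "addressLocality": "Washington",
    "addressRegion": "DC",
    "postalCode": "20000",
}))
```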
Adaptive content is content that’s structured so it can be delivered in a variety of ways, such as through mobile-first or responsive design.
A Mobile First approach means you design content for mobile devices first, focusing on the tasks most important for people to complete on those devices. You can then add features and functionality based on user priorities. Content designed mobile first can be scaled up so it also works on other devices and platforms.
Responsive design is a way to construct content so that the same content scales up or down to fit the device on which it’s viewed. Only core or abbreviated pieces of information show on the small screen of your phone, but if you view the same page on your PC, you’ll see additional content such as expanded descriptions, sidebars, or related resources.
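Responsive design usually makes these choices in the browser with CSS media queries. The Python below only illustrates the mobile-first layering idea: start from the core fields and add more as the screen widens. The breakpoints (768 and 1024 pixels) and the field names are hypothetical.

```python
# Hypothetical breakpoints in CSS pixels; each layer adds fields to the ones before it.
LAYERS = (
    (0, ("title", "date", "short_description")),  # core content, shown everywhere
    (768, ("long_description",)),                  # added on tablet-sized screens
    (1024, ("sidebar", "related_resources")),      # added on desktop screens
)


def fields_for(width):
    """Mobile first: start with the core fields and add layers as width grows."""
    selected = []
    for min_width, fields in LAYERS:
        if width >= min_width:
            selected.extend(fields)
    return selected


def render(item, width):
    """Keep only the fields of item that suit a screen of the given width."""
    return {name: item[name] for name in fields_for(width) if name in item}


print(fields_for(320))
print(fields_for(1280))
```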
Review the information and case studies below to learn more about structured content. Note that a content management system (CMS) makes it much easier to publish structured digital content because it offers a standard way to collect and present information. If your agency is not currently using a CMS, learn about the benefits of content management systems.
- Adapting Ourselves to Adaptive Content—engaging and informative presentation explaining the importance of structured content to multi-channel digital publishing
- COPE: Create Once, Publish Everywhere—National Public Radio’s (NPR) COPE model to create content once, but structure it in such a way that it adapts to display correctly in different venues such as feeds, mobile, tweets, etc.
- NPR’s COPE API—SXSW presentation about “create once, publish everywhere”
- Structured Content First—Slideshare presentation on how structured content enables content re-use
These case studies illustrate how configuring your CMS to structure certain types of content can save time and streamline business processes:
- Press releases—Automating publication of press releases—Department of Education
- Events—Transforming events into open content—DigitalGov University, GSA
Examples of Agencies Using Structured Content
- The Census Bureau provides API access to several data sets
- The National Library of Medicine provides APIs to find, use, and incorporate medical literature, consumer health information, clinical trials, and other information
- The Small Business Administration APIs provide several small business resources and geographic data
- The blog at open.NASA.gov provides “page tags” (a site-wide taxonomy) for all its content
- U.S. Government RSS Directory at USA.gov
- vocab.data.gov—Data.gov vocabulary and reference models for government information
- Project Open Data is an OMB and OSTP resource; an online repository of tools, best practices, and schema to help agencies adopt the framework presented in OMB Memorandum M-13-13: Open Data Policy—Managing Information as an Asset (PDF, 6 MB, 12 pages, May 2013)
- Structured Content: An Overview, a March 2012 article at MeetContent.com, argues for structuring content so that it can be used anywhere
- Demystifying Big Data (PDF, 3.5 MB, 40 pages, October 2012), a TechAmerica Foundation report, spells out the value big data holds for government agencies
- Developer resources at USA.gov, such as various APIs and RSS feeds
- EPA guide to using HTML, XHTML, and XML
- The Google Structured Data Testing Tool checks page tags and helps you improve rich snippets in search results
- The Data.gov Semantic Community highlights government work in the semantic Web/linked data space
- HTML5, a MobileGovWiki page, explains the HTML5 standard
- CSS3, a MobileGovWiki page, explains the CSS3 standard and links to Web design frameworks
- GitHub, the coding site where many government agencies share code
- SourceForge.net, where you can find, create, and publish open-source software for free
- W3C Data Catalog Vocabulary (DCAT), an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web
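To show what a DCAT description looks like, the sketch below builds a small `dcat:Catalog` in JSON-LD using DCAT and DCMI Metadata Terms (`dct:`) properties. The helper names, IRIs, and sample values are hypothetical, and the sketch uses DCAT terms directly rather than the Project Open Data metadata schema.

```python
import json

# Namespace prefixes for the W3C DCAT and DCMI Metadata Terms vocabularies.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def dcat_dataset(iri, title, description, keywords, modified, download_url, media_type):
    """Describe one dataset and a single downloadable distribution of it."""
    return {
        "@id": iri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:keyword": keywords,
        "dct:modified": modified,
        "dcat:distribution": [{
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": download_url},
            # DCAT 2 recommends identifying the media type by its IANA IRI.
            "dcat:mediaType": {"@id": f"https://www.iana.org/assignments/media-types/{media_type}"},
        }],
    }


def dcat_catalog(title, datasets):
    """Wrap dataset descriptions in a dcat:Catalog JSON-LD document."""
    return {"@context": CONTEXT, "@type": "dcat:Catalog", "dct:title": title, "dcat:dataset": datasets}


events = dcat_dataset(
    "https://www.example.gov/data/events",
    "Agency events",
    "All public events published by the agency.",
    ["events", "open data"],
    "2014-03-04",
    "https://www.example.gov/data/events.csv",
    "text/csv",
)
print(json.dumps(dcat_catalog("Example Agency data catalog", [events]), indent=2))
```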