Create SEO Juice From JSON LD Structured Data in Drupal

Google loves structured data and they prefer JSON LD. This article explains how Schema.org structured data works, and how to add it to your Drupal web pages.

TL;DR:

  • Structured data has become an important component of search engine optimization (SEO).
  • Schema.org has become the standard vocabulary for providing machines with an understanding of digital data.
  • Google prefers Schema.org data as JSON LD over the older methods using RDFa and microdata. Also, JSON LD might be a better solution for decoupled sites.
  • Google provides tools to validate structured data to ensure you’re creating the right results.
  • You can use the Schema.org Metatag module to add Schema.org structured data as JSON LD in Drupal and validate it using Google’s tools.

Why does structured data matter to SEO?

Humans can read a web page and understand who the author and publisher are, when it was posted, and what it is about. But machines, like search engine robots, can’t tell any of that automatically or easily. Structured data is a way to provide a summary, or TL;DR (Too long; didn't read), for machines, to ensure they accurately categorize the data that is being represented. Because structured data helps robots do their job, it should be a huge factor in improving SEO.

Google has a Structured Data Testing Tool that can provide a preview of what a page marked up with structured data will look like in search results. These enhanced results can make your page stand out, or at least ensure that the search results accurately represent the page. Pages that have AMP alternatives, as this example does, get extra benefits, but even non-AMP pages with structured data receive enhanced treatment in search results.

Structured data code example

Who is Schema.org and why should we care?

Schema.org has become the de-facto standard vocabulary for tagging digital data for machines. It’s used and recognized by Google and most or all of the other search engines.

If you go to the main Schema.org listing page, you’ll see a comprehensive list of all the types of objects that can be described, including articles, videos, recipes, events, people, organizations, and much much more. Schema.org uses an inheritance system for these object types. The basic type of object is a Thing, which is then subdivided into several top-level types of objects:

  • Thing
    • Action
    • CreativeWork
    • Event
    • Intangible
    • Organization
    • Person
    • Place
    • Product

These top-level Things are then further broken down. For example, a CreativeWork can be an Article, Book, Recipe, Review, WebPage, to name just a few options, and an Article can further be identified as a NewsArticle, TechArticle, or SocialMediaPosting.

Each object type has its own properties, like ‘name’, ‘description’, and ‘image’, and each type inherits the properties of its parents while adding properties of its own. For instance, a NewsArticle inherits ‘author’ and ‘description’ from its parents (Thing, CreativeWork, and Article) and adds a ‘dateline’ property that its parents don’t have.

NewsArticle Schema.org specification
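
To make the inheritance idea concrete, here is a minimal Python sketch. The SCHEMA_TYPES table is an assumption made for illustration: it holds only a handful of real Schema.org properties per type, not the full specification, and the helper names are invented for this example.

```python
# Illustrative subset of the Schema.org hierarchy: type -> (parent, own properties).
# The property lists are abbreviated for the example, not the full specification.
SCHEMA_TYPES = {
    "Thing": (None, ["name", "description", "image"]),
    "CreativeWork": ("Thing", ["author", "publisher", "datePublished"]),
    "Article": ("CreativeWork", ["articleBody", "wordCount"]),
    "NewsArticle": ("Article", ["dateline"]),
}


def ancestors(type_name):
    """Return the inheritance chain from Thing down to type_name."""
    chain = []
    while type_name is not None:
        chain.append(type_name)
        type_name = SCHEMA_TYPES[type_name][0]
    return list(reversed(chain))


def all_properties(type_name):
    """Collect inherited properties first, then the type's own."""
    props = []
    for name in ancestors(type_name):
        props.extend(SCHEMA_TYPES[name][1])
    return props


print(ancestors("NewsArticle"))
print(all_properties("NewsArticle"))
```

Walking the chain from Thing downward is what lets a NewsArticle carry ‘author’ without defining it itself.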

Some properties are simple key/value pairs, like description. Other properties are more complex, such as references to other objects. So a CreativeWork object may have a publisher property, which is a reference to a Person or Organization object.

Further complicating matters, an individual web page might be home to multiple Schema.org objects, related or unrelated. A page might have an article and also a video. There could be other elements on the page that are not part of the article itself, like a breadcrumb, or event information. Structured data can include as many objects as necessary to describe the page.

Because there’s no limit to the number of objects that might be described, there's also a property mainEntityOfPage, which can be used to indicate which of these objects is the primary object on the page.
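
As a rough sketch of a page with several objects, the Python below builds one structure holding an article, a video, and a breadcrumb, and marks the article as the primary object. The URLs, titles, and function names are hypothetical placeholders, not taken from a real site.

```python
import json

PAGE_URL = "https://www.example.com/reef-knot"  # hypothetical page URL


def build_page_graph(page_url):
    """Describe several objects found on one page in a single @graph."""
    article = {
        "@type": "Article",
        "headline": "How to Tie a Reef Knot",
        # Flags the article as the primary object of this page.
        "mainEntityOfPage": page_url,
    }
    video = {
        "@type": "VideoObject",
        "name": "Reef knot demonstration",
        "contentUrl": "https://www.example.com/reef-knot.mp4",
    }
    breadcrumb = {
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": 1, "name": "Knots",
             "item": "https://www.example.com/knots"},
        ],
    }
    return {"@context": "http://schema.org", "@graph": [article, video, breadcrumb]}


def primary_objects(data):
    """Return the objects that declare themselves the main entity of a page."""
    return [node for node in data["@graph"] if "mainEntityOfPage" in node]


page = build_page_graph(PAGE_URL)
print(json.dumps(page, indent=2))
```

Only the article carries mainEntityOfPage here, so a consumer can tell it apart from the supporting video and breadcrumb.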

What are JSON LD, RDFa, and Microdata, where do they go, and which is better?

Once you decide what Schema.org objects and properties you want to use, you have choices about how to represent them on a web page. There are three primary methods: JSON LD, RDFa, and Microdata.

RDFa and Microdata use slightly different methods of accomplishing the same end. They wrap individual items in the page markup with identifying information.

JSON LD takes a different approach. It collects all the Schema.org information into a single JSON object inside a script tag, typically placed in the head of the page. The markup around the actual content of the page is left alone.

Schema.org includes examples of each method. For instance, here’s how the author of an article would be represented in each circumstance:

RDFa

<div vocab="http://schema.org/" typeof="Article">
 <h2 property="name">How to Tie a Reef Knot</h2>
 by <span property="author">John Doe</span>
 The article text.
</div>

Microdata

<div itemscope itemtype="http://schema.org/Article">
 <h2 itemprop="name">How to Tie a Reef Knot</h2>
 by <span itemprop="author">John Doe</span>
 The article text.
</div>

JSON LD

<script type="application/ld+json">
{
 "@context": "http://schema.org",
 "@type": "Article",
 "author": "John Doe",
 "name": "How to Tie a Reef Knot",
 "description": "The article text"
}
</script>
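
If the JSON LD is generated by code rather than typed by hand, the wrapping step might look like the Python sketch below. The render_json_ld name is made up for this example, and escaping "<" is a defensive assumption of the sketch rather than something the format requires.

```python
import json

SCRIPT_OPEN = '<script type="application/ld+json">'
SCRIPT_CLOSE = "</script>"


def render_json_ld(data):
    """Serialize a dict and wrap it in a JSON LD script tag."""
    payload = json.dumps(data, indent=1)
    # Escape "<" so a value containing "</script>" cannot close the tag early.
    # "\u003c" is a valid JSON string escape, so the data decodes unchanged.
    payload = payload.replace("<", "\\u003c")
    return SCRIPT_OPEN + "\n" + payload + "\n" + SCRIPT_CLOSE


article = {
    "@context": "http://schema.org",
    "@type": "Article",
    "author": "John Doe",
    "name": "How to Tie a Reef Knot",
    "description": "The article text with a stray </script> tag",
}

html = render_json_ld(article)
print(html)
```

Generating the block from a data structure avoids the punctuation and quoting mistakes that are easy to make when writing JSON by hand.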

Which is better?

There are advantages and disadvantages to each of these. RDFa and Microdata add some complexity to the page markup and are a little less human-readable, but they avoid data duplication and keep the item's properties close to the item.

JSON LD is much more human-readable, but results in data duplication, since values already displayed in the page are repeated in the JSON LD array.

All of these are valid, and none is really “better” than the other. That said, there is some indication that Google may prefer JSON LD. JSON LD is the only method that validates for AMP pages, and Google indicates a preference for it in its guide to structured data.

From the standpoint of Drupal’s theme engine, the JSON LD method would be the easiest to implement, since there’s no need to inject changes into all the individual markup elements of the page. It also might be a better solution for decoupled sites, since you could theoretically use Drupal to create a JSON LD array that is not directly tied to Drupal’s theme engine, then add it to the page using a front-end framework.

What about properties that reference other objects?

As noted above, many properties in structured data are references to other objects. A WebPage has a publisher, which is either an Organization or a Person.

There are several ways to configure those references. You can indicate the author of a CreativeWork either by using a shortcut, the string name or URL of the author, or by embedding a Person or Organization object. That embedded object could include more information about the author than just the name, such as a URL to an image of the person or a web page about them. In the following example, you can see several embedded references: image, author, and publisher.

<script type="application/ld+json">{
    "@context": "http://schema.org",
    "@graph": [
         {
            "@type": "Article",
            "description": "Example description.",
            "image": {
                "@type": "ImageObject",
                "url": "https://www.example.com/582753085.jpg",
                "width": "2408",
                "height": "1600"
            },
            "headline": "Example Title",
            "author": {
                "@type": "Person",
                "name": "Example Person",
                "sameAs": [
                    "https://www.example-person.com"
                ]
            },
            "dateModified": "2017-06-03T21:38:02-0500",
            "datePublished": "2017-03-03T19:14:50-0600",
            "publisher": {
                "@type": "Organization",
                "name": "Example.com",
                "url": "https://www.example.com/",
                "logo": {
                    "@type": "ImageObject",
                    "url": "https://www.example.com/logo.png",
                    "width": "600",
                    "height": "60"
                }
            }
        }
    ]
}</script>

JSON LD provides a third way to reference other objects, called node identifiers. A node identifier is a globally unique IRI, usually an authoritative or canonical URL, and is represented in JSON LD by the @id keyword. In the case of the publisher of a web site, you would provide structured data about the publisher that includes the @id property for that Organization. Then instead of repeating the publisher data over and over when referencing that publisher elsewhere, you could just provide the @id property that points back to the publisher record. Using @id, the above JSON LD might look like this instead:

<script type="application/ld+json">{
    "@context": "http://schema.org",
    "@graph": [
         {
            "@type": "Article",
            "description": "Example description.",
            "image": {
                "@type": "ImageObject",
                "@id": "https://www.example.com/582753085.jpg"
            },
            "headline": "Example Title",
            "author": {
                "@type": "Person",
                "@id": "https://www.example-person.com"
            },
            "dateModified": "2017-06-03T21:38:02-0500",
            "datePublished": "2017-03-03T19:14:50-0600",
            "publisher": {
                "@type": "Organization",
                "@id": "https://www.example.com/"
            }
        }
    ]
}</script>
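
To illustrate how an @id reference points back to a fuller record, the Python sketch below expands reference stubs against the other nodes in the same @graph. It is a simplified illustration with a made-up function name, not the official JSON LD processing algorithms, and it assumes the full record lives in the same graph.

```python
import copy


def resolve_references(graph):
    """Replace {"@id": ...} stubs with the full node defined elsewhere in the graph."""
    # Index nodes that carry real data (more than @id/@type) by identifier.
    index = {}
    for node in graph:
        if "@id" in node and set(node) - {"@id", "@type"}:
            index[node["@id"]] = node

    def expand(value):
        if isinstance(value, list):
            return [expand(v) for v in value]
        if isinstance(value, dict):
            is_stub = "@id" in value and not (set(value) - {"@id", "@type"})
            if is_stub and value["@id"] in index:
                # Copy so the caller's graph is never modified.
                return copy.deepcopy(index[value["@id"]])
            return {k: expand(v) for k, v in value.items()}
        return value

    return [expand(node) for node in graph]


graph = [
    {"@type": "Organization", "@id": "https://www.example.com/", "name": "Example.com"},
    {
        "@type": "Article",
        "headline": "Example Title",
        "publisher": {"@type": "Organization", "@id": "https://www.example.com/"},
    },
]

resolved = resolve_references(graph)
print(resolved[1]["publisher"])
```

The publisher is described once and every article that references it stays small.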

How can we be sure that Google understands our structured data?

Once you’ve gone to the work of marking up your pages with structured data, you’ll want to be sure that Google and other search engines understand it the way you intended. Google has created a handy tool to validate structured markup. You can either paste the URL of a web page or the markup you want to evaluate into the tool. The second option is handy if you’re working on changes that aren't yet public.

Once you paste your code into the tool, Google provides its interpretation of your structured data. You can see each object, what type of object it is, and all its properties.

If you’re linking to a live page rather than just providing a snippet of code, you will also see a ‘Preview’ button you can click to see what your page will look like in search results. The image at the top of this article is an example of that preview.

Schema.org doesn’t require specific properties to be provided for structured data, but Google has some properties that it considers to be “required” or “recommended.” If required properties are missing, the tool reports errors; missing recommended properties are flagged as warnings.

You can see what Google expects on different types of objects. Click into the links for each type of content to see what properties Google is looking for.

Structured data testing tool
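
Before pasting markup into the tool, a rough local pre-check can catch obvious gaps. The Python sketch below is only an illustration: the RULES table is a hypothetical placeholder, not Google’s actual list, and treating missing required properties as errors and missing recommended ones as warnings is an assumption of this sketch.

```python
# Hypothetical rule table for illustration only. Consult Google's
# documentation for the properties it actually requires or recommends.
RULES = {
    "Article": {
        "required": ["headline", "image", "datePublished", "author", "publisher"],
        "recommended": ["dateModified", "mainEntityOfPage"],
    },
}


def check(node, rules=RULES):
    """Return (errors, warnings) listing properties that are missing or empty."""
    rule = rules.get(node.get("@type"), {})
    errors = [p for p in rule.get("required", []) if not node.get(p)]
    warnings = [p for p in rule.get("recommended", []) if not node.get(p)]
    return errors, warnings


article = {
    "@type": "Article",
    "headline": "Example Title",
    "datePublished": "2017-03-03T19:14:50-0600",
    "author": {"@type": "Person", "name": "Example Person"},
    "publisher": {"@type": "Organization", "name": "Example.com"},
    "mainEntityOfPage": "https://www.example.com/example-title",
}

errors, warnings = check(article)
print("errors:", errors)
print("warnings:", warnings)
```

A check like this does not replace Google’s tool, which remains the authority on how the markup is interpreted.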

How and where can we add structured data to Drupal?

The next logical question is what modules are available to accomplish the task of rendering structured data on the page in Drupal 8. Especially tricky is doing it in a way that is extensible enough to support that gigantic list of possible objects and properties instead of being limited to a simple subset of common properties.

Because of the complexity of the standards and the flexibility of Drupal’s entity type and field system, there is no one-size-fits-all solution for Drupal that will automatically map Schema.org properties to every kind of Drupal data.

The RDF module, which outputs RDFa, is included in core and seems like a logical first step. Unfortunately, the core solution doesn’t provide everything needed to create content that fully validates. It marks up some common properties on the page but has no way to indicate what type of object a page represents. Is it an Article? Person? Organization? Event? There is no way to flag that. And there is no way to support anything other than a few simple properties without writing code.

There is a Drupal Summer of Code project called RDF UI. It adds a way to link a content type to a Schema.org object type and to link fields to Schema.org properties. Though the module pulls the whole list of possible values from Schema.org, some linkages aren’t possible; for instance, there is no way to identify the title or creation date as anything other than standard values. I tried it out, but content created using this module didn’t validate for me on Google’s tool. The module is very interesting, and it is a great starting point, but it still creates RDFa rather than JSON LD.

The architecture of the Schema.org Metatag module

After looking for an existing solution for Drupal 8, I concluded there wasn’t a simple, valid, extensible solution available to create JSON LD, so I created a module to do it, Schema.org Metatag.

Most of the heavy lifting of Schema.org Metatag comes from the Metatag module. The Metatag module manages the mapping and storing of data, allowing you to either input hard-coded values or use tokens to define patterns that describe where the data originates. It also has a robust system of overrides so that you can define global patterns, then override some of them at the entity type level, at the individual content type level, or even per individual item, if necessary. There is no reason not to build on that framework, and any sites that care about SEO are probably using the Metatag module already. I considered it an ideal starting point for the Schema.org Metatag module.
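
The layered overrides and token patterns can be pictured with a small Python sketch. This is an analogy, not Metatag’s actual code: the layer names, pattern keys, and sample values are hypothetical, and only the [node:title]-style token syntax is borrowed from Drupal.

```python
import re
from collections import ChainMap

# Hypothetical pattern layers, from most general to most specific.
global_defaults = {"description": "[site:slogan]", "publisher_name": "[site:name]"}
content_type_defaults = {"headline": "[node:title]", "description": "[node:summary]"}
item_overrides = {"description": "A hand-written summary for this one page."}


def effective_patterns(*layers):
    """Most specific layer first; ChainMap returns the first match it finds."""
    return dict(ChainMap(*layers))


def replace_tokens(pattern, values):
    """Swap [token] placeholders for values; unknown tokens are left in place."""
    return re.sub(
        r"\[([^\[\]]+)\]",
        lambda match: values.get(match.group(1), match.group(0)),
        pattern,
    )


tokens = {
    "node:title": "How to Tie a Reef Knot",
    "node:summary": "A short guide to the reef knot.",
    "site:name": "Example.com",
    "site:slogan": "Knots for everyone",
}

patterns = effective_patterns(item_overrides, content_type_defaults, global_defaults)
resolved = {key: replace_tokens(value, tokens) for key, value in patterns.items()}
print(resolved)
```

The per-item description wins over both defaults, while the headline and publisher name fall through to the more general layers.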

The Schema.org Metatag module creates Metatag groups for each Schema.org object type and Metatag tags for the Schema.org properties that belong to that object.

The base classes created by the Schema.org Metatag module add a flag to groups and tags that can be used to identify those that belong to Schema.org, so they can be pulled out of the array that would otherwise be rendered as metatags, to be displayed as JSON LD instead.
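
The flag-and-filter idea can be sketched in Python as follows. The tag structure, the "schema" flag, and the group names are hypothetical stand-ins chosen for this example, not the module’s real data model.

```python
import json

# Hypothetical flat list of tags, roughly how a metatag-style system might hold them.
tags = [
    {"name": "description", "value": "Example description.", "schema": False},
    {"name": "@type", "group": "article", "value": "Article", "schema": True},
    {"name": "headline", "group": "article", "value": "Example Title", "schema": True},
    {"name": "@type", "group": "organization", "value": "Organization", "schema": True},
    {"name": "name", "group": "organization", "value": "Example.com", "schema": True},
]


def split_tags(all_tags):
    """Separate ordinary meta tags from the ones flagged as Schema.org."""
    meta = [tag for tag in all_tags if not tag.get("schema")]
    schema = [tag for tag in all_tags if tag.get("schema")]
    return meta, schema


def to_json_ld(schema_tags):
    """Group flagged tags by object and assemble one JSON LD structure."""
    groups = {}
    for tag in schema_tags:
        groups.setdefault(tag["group"], {})[tag["name"]] = tag["value"]
    return {"@context": "http://schema.org", "@graph": list(groups.values())}


meta_tags, schema_tags = split_tags(tags)
json_ld = to_json_ld(schema_tags)
print(json.dumps(json_ld, indent=2))
```

The ordinary description tag is left for normal meta tag output, and only the flagged tags end up in the JSON LD structure.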

Some Schema.org properties need more than the simple key/value pairs that Metatag provides, and this module creates a framework for creating complex arrays of values for properties like the Person/Organization relationship. These complex arrays are serialized down into the simple strings that Metatag expects and are unserialized when necessary to render the form elements or create the JSON LD array.
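
The round trip between a nested value and a flat string can be sketched like this in Python. JSON is used here purely as a stand-in serialization format, and the pack and unpack names are invented for the example; the module’s own storage format may differ.

```python
import json


def pack(value):
    """Flatten a nested property into the single string a key/value store expects."""
    return json.dumps(value, sort_keys=True)


def unpack(stored):
    """Restore a nested structure; plain strings pass through unchanged."""
    try:
        decoded = json.loads(stored)
    except ValueError:
        return stored
    # Only treat the value as packed if it decodes to a list or dict.
    return decoded if isinstance(decoded, (dict, list)) else stored


author = {
    "@type": "Person",
    "name": "Example Person",
    "sameAs": ["https://www.example-person.com"],
}

stored = pack(author)
print(stored)
print(unpack(stored))
```

Simple values such as a headline are stored as they are, so only the complex properties pay the cost of packing and unpacking.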

The primary goal was to make it easily and endlessly extensible. The initial module code focuses on the properties that Google notes as “Required” or “Recommended” for some basic object types. Other object types may be added in the future, but could also be added by other modules or in custom code. The module includes an example module as a model of how to add more properties to an existing type, and the existing modules provide examples of how to add other object types.

Also, there is a patch for the Metatag module to refactor it a bit to make it possible for a decoupled Drupal back end to share metatags with a front-end framework. Since this module is built on the Metatag model, hopefully, that change could be exploited to provide JSON LD to a decoupled front end as well.

This approach worked well enough in Drupal 8 that I am in the process of backporting it to Drupal 7 as well.

Enough talk, how do I get JSON LD on the page?

It’s helpful to understand how Schema.org objects and properties are intended to work, which is the reason for going into some detail about that here. It helps to figure out ahead of time what values you expect to see when you get done.

Start by scanning the Schema.org lists and Google’s requirements and recommendations to identify which objects and properties you want to define for the content on your site. If you’re doing this for SEO, spend some time reviewing Google's guide to structured data to see what interests Google. Not all content types are of interest to Google, and Google considers some properties to be essential while ignoring others.

Some likely scenarios are that you will have one or more types of Articles, each with images and relationships to the People that author them or the Organization that publishes them. You might have entity types that represent Events, or Organizations, or People or Places, or Products. Events might have connections to Organizations that sponsor them or People that perform in them. You should be able to create a map of the type of content you have and what kind of Schema.org object each represents.

Then install the Schema.org Metatag module and enable the sub-modules you need for the specific content types on your site. Use this module the same way you would use the Metatag module. If you understand how that works, you should find this relatively easy to do. See the detailed instructions for Metatag 8.x or Metatag 7.x. You can set up global default values using tokens, or override individual values on the node edit form.

In Conclusion

Providing JSON LD structured data on your website pages is bound to be good for SEO. But it takes a while to get comfortable with how structured data works and the somewhat confusing Schema.org standards, let alone Google’s unique set of requirements and recommendations.

No solution will automatically configure everything correctly out of the box, and you can’t avoid the need to know a little about structured data. Nevertheless, this article and the Schema.org Metatag module should enable you to generate valid JSON LD data on a Drupal site.
