When we built the Lullabot.com front-end using React, we needed a way to get data from Drupal. Setting up a REST API directly from Drupal was one way of doing this. We found, however, that the existing modules for implementing our own API were rather complex and not so performant. In addition, since the navigation at Lullabot.com was anonymous, we did not need the Drupal bootstrap for user sessions when querying the API. Instead, we decided to use CouchDB, a NoSQL database which stores its data as a flat collection of JSON documents. The best thing about CouchDB compared to other NoSQL offerings available out there is that it provides a REST API out of the box. This makes CouchDB the bee’s knees for this project. In this article, we will share a few recipes for querying and aggregating data with CouchDB.performant. In addition, since the navigation at Lullabot.com was anonymous, we did not need the Drupal bootstrap for user sessions when querying the API. Instead, we decided to use CouchDB, a NoSQL database which stores its data as a flat collection of JSON documents. The best thing about CouchDB compared to other NoSQL offerings available out there is that it provides a REST API out of the box. This makes CouchDB the bee’s knees for this project. In this article, we will share a few recipes for querying and aggregating data with CouchDB.
We will be using curl to access the API in the examples for this article, but you can use your preferred REST client.
Creating a Design Document
Since CouchDB is a NoSQL database, we can't write queries to join and aggregate data in the same way we would when using a relational database such as MySQL. There are no tables and no relationships. There is, instead, a JavaScript view engine to help us create and run queries using the map-reduce paradigm. The map portion of the paradigm takes a list of values and applies the same computation to each value to produce a new refined list. The reduce operation then collapses or combines those values into a single or given number of values. For example, the total sum of all the values. The views which act as the map part of the paradigm, are defined using JavaScript functions that take a single CouchDB document as an argument. These functions determine which documents are made available through the views.
View functions live in the database’s design documents. Design documents store application code that is run inside the database. The following example declares a view:
{
"views": {
"all": {
"map": "function(doc) { emit(doc.title, doc) }",
}
}
}
We can save this into a javascript file, upload it to CouchDB, and save it as design document to create the view. For example, using curl:
curl -X PUT @docs_view.js 'http://localhost:5984/lullabot/_design/[your_design_doc]'
In this case, we are making a PUT request because a new design document is being created at http://localhost:5984/lullabot/_design/docs. If the request was successful we get this response:
{"ok":true,"id":"_design/docs","rev":"1-485d01ba401d41a9e922780eca73a384"}
You can check on the design document by retrieving it:
curl -X GET 'http://localhost:5984/lullabot_1/_design/[your_design_doc]'
{
"_id":"_design/docs_1",
"_rev":"1-485d01ba401d41a9e922780eca73a384",
"views":{"all":{"map":"function(doc) { emit(doc.title, doc) }"}}
}
The Basic View
To start with, as an example, we will use the following document structure for storing a speaker's conference events:
{
events: [{
title: "DrupalCon Barcelona 2015",
subtitle: "Monday 21st-25th September 2015 — Barcelona",
role: "speaking",
link: "http://lanyrd.com/2015/drupalcon-barcelona/",
location: "Barcelona, Spain",
start_date: 1442808000,
end_date: 1443153600
}, {
title: "ZendCon 2015",
subtitle: "Monday 19th-22nd October 2015 — Las Vegas",
role: "attending",
link: "http://lanyrd.com/2015/zendcon/",
location: "Las Vegas, United States",
start_date: 1445227200,
end_date: 1445486400
}],
speaker: {
name: "Joe Shindelar",
headshot: {
uri: "bio_image/dsc_0272.jpg",
mime: "image/jpeg",
},
link: "who-we-are/joe-shindelar"
},
type: "speaker"
}
To get a list of all the speakers, the map function looks like this:
function(doc) {
if (doc.type && doc.type == 'speaker') {
emit(doc.speaker.name, doc);
}
}
When we query the database using a view for the first time, each record in the database is run through the map function. This only happens once when the view is first run so that the database can be indexed and stored in a B-tree (Further reading on performance can be found here). The emit() function outputs a key and the value, which we specify as the first and second parameter respectively. In this case, the key is the speaker's name and the value is the entire document. Here we also check if the document being returned is the type speaker so that only speaker documents are returned.
Each view comes with a neat way to access it via the REST API. The views can be retrieved at a URL that looks like this: /[your_database]/_design/[your_design_doc]/_view/[your_view]. The view we just created is accessed at http://localhost:5984/lullabot/_design/docs/_view/speakers.
To query the above view we can send a GET request using curl:
curl http://localhost:5984/lullabot/_design/docs/_view/speakers
This returns a list of speakers with their names as the key:
{
"total_rows": 22,
"offset": 0,
"rows": [{
"id": "84e77231cae12a572cc6724241004f43",
"key": "Angus Mak",
"value": {
"_id": "84e77231cae12a572cc6724241004f43",
"_rev": "6-5f9bee7b6b2c428657fdc8b2a7d5dcea",
"events": [...],
"username": "makangus",
"speaker": {... },
"type": "speaker"
}
}, {
"id": "84e77231cae12a572cc6724241001f47",
"key": "Jared Ponchot",
"value": {
"_id": "84e77231cae12a572cc6724241001f47",
"_rev": "6-87ee7de4b615a75dc786a85367c5e445",
"events": [...],
"username": "jponch",
"speaker": {... },
"type": "speaker",
}
}]
}
Querying a Document Set
Now that we've nailed down a simple view, we can look at an example of how to query it. Suppose we want to display a list of events a speaker is attending. We would need to query for their particular speaker record. As the map function above returns the speaker's name as the key, we can easily pass the in the query parameter when we are making our request:
curl http://localhost:5984/lullabot/_design/docs/_view/speakers?startkey=['Angus Mak']&&endkey=['Angus Mak']
Here we have used the startkey and endkey parameters. Under the hood, the view is stored in a tree structure, when we specify a startkey, it tells CouchDB the position to start looking. If we have an endkey defined, the lookup will stop when it hits the endkey. In this case, the start and end keys are the same, so we return just this record as there is only one match.
Building more complex queries
So a straightforward display of each record is simple enough, let’s complicate matters with a real scenario. For the Upcoming Talks page, we want to display a list of all talks that staff at Lullabot are presenting in chronological order.
The problem is that each event does not have its own record but is stored as an array as part of the speaker record.
Here is the map function:
function(doc) {
if (typeof doc.type !== 'undefined' && doc.type === 'speakers') {
doc.events.forEach(function(event) {
emit([event.role, event.session_date, event.title], {
event: event,
speaker: doc.speaker.name
});
});
}
}
To solve this problem, we iterate over all the events listed in each speaker record and output a result for each event. The resulting objects contain both the event and speaker objects. The key is a composite one made up of the role, event title, and session date. The fields in the key have been carefully chosen for the following reasons:
- We want to be able to order the sessions chronologically, so the session_date was included.
- We would also like to filter by whether or not a speaker is just attending or speaking at an event.
Here is a resulting row from the view with each event displayed as it’s own result row. A key/value pair is produced for each event:
{
"id": "84e77231cae12a572cc6724241004f43",
"key": ["attending", 1442808000, "DrupalCon Barcelona 2015"],
"value": {
"id": "84e77231cae12a572cc6724241004f43",
"event": {
"title": "DrupalCon Barcelona 2015",
"subtitle": "Monday 21st-25th September 2015 — Barcelona",
"role": "attending",
"link": "http://lanyrd.com/2015/drupalcon-barcelona/",
"location": "Barcelona, Spain",
"start_date": 1442808000,
"end_date": 1443153600
},
"speaker": "Angus Mak"
}
}
To display just the speaking events, the query looks like this:
curl http://localhost:5984/lullabot/_design/docs/_view/upcoming_talks?startkey=["speaking"]&endkey=["speaking",{},{}]
Since we are only using one part of the key to filter with, we need to also specify the values of the other two parts of the key in the end key. As we are only interested in filtering by role, we can use wildcards {} for the other parts.
As a further example of querying using a compound key, if we want to filter all the speaking events between a date range, the query is as follows:
curl http://localhost:5984/lullabot/_design/docs/_view/upcoming_talks?startkey=["speaking",1442808000]&endkey=["speaking",1444968000,{}]
Here we are filtering between 21st Sept 2015 - 16th Oct 2015. When we are querying using a compound key, the order of each key part is important. We can only filter in the order of each part of the key. For example in this case, we can filter by [role], [role, session date] and [role, session date, event title], but we cannot filter by [role, event title].
Sorting
On the Upcoming Talks page, we would like to display the most recent speaking session first. We can use the descending parameter.
curl http://localhost:5984/lullabot/_design/docs/_view/upcoming_talks?startkey=["speaking",1444968000,{}]&endkey=["speaking",1442808000]&descending=true
Notice that the start and end keys have been flipped compared to the previous query example. This is because when we sort in a descending order we reverse the position in which we begin searching. The example below displaying a list of keys illustrates this.
["speaking",1431359982,"DrupalCon Los Angeles 2015"]
["speaking",1442808000,"DrupalCon Barcelona 2015"] // End position here
["speaking",1443844800,"Open Source North"]
["speaking",1444241700,"HOW Interactive Design Conference: Chicago"]
["speaking",1444449600,"New England Drupalcamp (NEDCAMP)"]
["speaking",1444968000,"DrupalCamp Atlanta 2015"] // Start position here
["speaking",1445227200,"ZendCon 2015"]
When you have a compound key, the fields are sorted in the same order as they are in the key. So in this case, the results are sorted by role, date, and then title.
Using the Reduce Function
Suppose we want to display some statistics on how many sessions Lullabot staff are speaking at in total. We can use the reduce function to help aggregate the data. The reduce function is executed on the documents returned by the map function. Similar to the map function, the reduce function operates once on each document.
With a few modifications, the map function looks like this:
function(doc) {
if (typeof doc.type !== 'undefined' && doc.type === 'speakers') {
doc.events.forEach(function(event) {
emit([event.role, event.title], 1);
});
}
}
The resulting rows are of the following structure:
{"id": "84e77231cae12a572cc6724241004f43", "key": ["attending","DrupalCon Barcelona 2015"], "value": 1}
In order to count up all the speaking sessions, we can add a reduce function to our view in the design document:
{
"views": {
"upcoming_talks": {
"map": "function(doc){ //map function operations..}",
"reduce": "function(key, values){ //reduce function operations..}"
}
}
}
Here is the reduce function:
function(key, values) {
return sum(values);
}
The reduce function is called after the map function. The key parameter is an array containing all the keys returned by the map function and the id of the corresponding document. The values parameter contains an array of the values returned by the map function. When the function is run at index time, the values (in this case 1) are all added together using the the sum function to give us the number of sessions Lullabot are speaking at for each event.
The results are grouped using the whole key. In this case, we are adding the values where both parts of the key (role and title) are the same. For example:
["speaking", "DrupalCon Barcelona"]
["speaking", "DrupalCon Barcelona"]
["speaking", "DrupalCamp Atlanta 2015"]
The above contents of the key array will return the values 2 sessions at DrupalCon Barcelona and 1 session at DrupalCamp Atlanta 2015.
To return the number of total sessions overall, we can add the group_level parameter to the query.
curl http://localhost:5984/lullabot/_design/docs/_view/upcoming_talks?startkey=["speaking"]&endkey=["speaking",{}]&group_level=1
This will group the results by only the first part of the key and the reduce function will return the total number of sessions overall.
Conclusion
In this article, we discussed some basic techniques to get the data from couchDB. From our experience with using couchDB in the lullabot.com project, we found that it was a great alternative to building a Drupal REST API for powering the data layer of a decoupled site. The advantages are:
- If the Drupal database becomes unavailable, it does not affect the front end site.
- CouchDB provides a REST API out of the box.
- Increased security as only the content editors have access to the Drupal back end.
When considering if couchDB is a suitable solution for a decoupled site, the advantages need to be balanced with the overhead of managing how the data is exported from Drupal to couchDB, and maintaining a couchDB instance in addition to the back-end Drupal database.