Code reviews are an essential part of the software development process. Often a code review is considered to be a distinct process and kept separate from day-to-day development. At Lullabot, we consider code review to be a critical component of any development - just like QA, automated testing, and documentation. Code reviews are an acknowledgement that every developer is a human being, and humans make mistakes. No matter the skill or background of a developer, reviewing their code can only improve the final product.
Of course, code review is an integral part of Drupal development as well. The community convention is that at least two people (and often many more) should read and understand code for it to be considered for inclusion in Drupal. While contributed modules don't often have the resources for full code reviews of every patch, be sure that any module author would love reviews of their code.
What are the key components of a code review? While this is by no means a comprehensive list, here are some of the items I look for when reviewing code. Feel free to post your favourite code review tactics in the comments below.
The Story of the Code
All code committed to a project should be an atomic unit that describes:
- Why the change was made.
- What lines of code were changed, and how the new code works.
- How to verify that the change actually worked.
An easy way to figure out if code meets this criteria is to read through a changeset, and answer these questions as if you've never run the code and never talked to the developer who wrote it. Verify that the code does what it says it does (at least by reading it), and that additional functionality doesn't hitch along for the ride.
Much of this information might live in ticketing systems, which is fine as long as the commit messages reference the associated ticket number. I often use commit messages similar to the following:
Ticket #1071: Convert all tabs to spaces in template.php.
Of course, with commit-friendly systems like Git there might be many small commits, and the "atomic unit" becomes a merge, and not a single commit.
Appropriate use of system APIs
Websites and applications built with Drupal have a wide array of APIs and configuration options available at many layers of the system. Many of options available at each layer of the stack can be used to achieve the same result with varying levels of success. APIs and configuration options are usually available at these components in the system stack:
- Server software such as Apache, MySQL, and Varnish.
- PHP and any installed PHP modules.
- Drupal and any installed contributed modules.
- JavaScript APIs provided by browsers or JavaScript libraries.
- CSS for controlling visual aspects of a page.
It's important that code solve a problem at the right layer in a stack. For example, imagine you need to sort rows in a paged table. The sort could be executed:
- As a stored procedure in MySQL (but please, please don't do this).
- With ORDER BY and LIMIT statements manually added in the MySQL query that is executed.
- By fetching the unordered results and sorting them manually in PHP.
- As a call to db_query_range() that is provided by the Drupal API.
- By sorting and filtering the actual table rows in the browser with JavaScript.
The best solution for a given use case may not be clear. Novice developers often don't understand or aren't aware of all of the layers, and code reviews can help ensure that the right changes are made in the right layer of the stack.
Security
The security of a Drupal installation requires thought throughout each part of the system stack. Server configuration and access controls are critical components, but for code reviews it's important to focus just on the code itself. Items to watch for include:
- Any potential SQL injection exploits. Generally this is mitigated by properly using the Drupal Database API. It's still possible to misuse the API or ignore it entirely.
- Any potential cross-site scripting (XSS) exploits. Again, using Drupal's text APIs such as t(), check_plain(), and filter_xss() can mitigate these issues.
- Any potential cross-site request forgery (CSRF) exploits. For example, modifying data based on a GET request could be a security issue. Converting such code to use the Form API, or to use drupal_get_token() can mitigate these issues.
- Ensuring that user and content access controls are implemented properly. Missing node access grants on node queries is a common issue. Or, trusting user IDs passed from the client can expose data. Mitigating data exposure requires understanding both the logic of the code itself and understanding the minimum amount of data required to complete an operation.
These items are just a brief overview of potential security vulnerabilities. Drupal.org has an excellent guide to Writing secure code.
API-first design
All code should be composed of reusable functions that can be repurposed for other use without extensive refactoring. For example, most websites will have custom code that creates a custom menu item and shows a page or a form. The naïve approach is to implement hook_menu() and do something like this:
function example_menu() {
$items = array();
$items['account-balance'] = array(
'title' => 'Your account balance',
'description' => 'How much money you owe this awesome website.',
'page callback' => 'example_account_balance',
'access arguments' => array('access content'),
);
return $items;
}
function example_account_balance() {
global $user;
$output = "Your account balance
";
if ($user->uid > 0) {
$output .= t('Your account balance is %balance.', array('%balance' => $user->balance));
}
else {
$output .= t('You are not authorized to access this page.');
}
return $output;
}
There are several issues with this code:
- The example_account_balance() function is always tied to the currently logged in user. This means that other code would have to re-implement or refactor this function if they wanted to show the balance for a different user.
- Access control is both done improperly and tied to the page logic. Even if access is "denied," the menu system just sees a string to return. Even though the page text indicates that access is denied, the HTTP status code will still return 200 OK.
- The page logic (checking the account balance) is intertwined with the display itself. By not using the theme system, it's impossible to change the display without hacking the module. At most, this function should build an array of variables to pass into a theme() call.
To be "API first," this code should consist of the following functions:
- A page callback that accepts an $account parameter. If it's blank, it can fall back to global $user. The page URL itself should probably contain a user ID, just as a URL like user/[uid]/edit does.
- An access callback that checks for access to the given URL. In this case, it could probably just be set to 'user_is_logged_in', or a custom access callback if more complex logic is required.
- A theme function or template that would accept the string to print as a parameter. It would be responsible for setting the page heading and generating most of the HTML markup.
Documentation
Code isn't done until it's documented. At a basic level, that means that every function should have appropriate PHPDoc headings with information about what the function does, what parameters it accepts, and what it returns. It's just as important to verify that the documentation matches what the function actually does. Inline comments should be added as appropriate to describe particularly tweaky sections of code. Any hacks to work around bugs in other modules should include a comment describing the bug and a link to the upstream ticket or issue. If code is adding any new system variables (with variable_set() and variable_get(), those should be documented if they are not exposed through a UI. For new modules, a README.txt should be included to describe the overall purpose of the module as well as any installation instructions.
Unit and Functional Tests
If a project is using SimpleTest or Selenium, code should never be committed without the associated tests or changes to existing tests. "I'll add tests later" is a common phrase when a project is under a deadline, but those are exactly the times when automated testing is most important. Writing and validating automated tests is beyond the scope of this article, but for more information check out the SimpleTest tutorial on Drupal.org.
Code Style and Standards
Finally, code should follow whatever code standards have been decided for the project. Typically Drupal projects use Drupal's code standards to simplify integration of code from various sources. Following code standards reduces the effort required to read code and ensures that it's easy to identify components of code. It can also help reduce issues with merging code written by developers on different operating systems. If code is being submitted by developers that frequently breaks code standards, it can usually be resolved with a bit of education and text editor configuration. For more information about code standards, check out the Coding standards page on drupal.org, the Coder module, and the Drupal Code Sniffer module.
Next Steps
Code review is an ongoing process, and one that should be integrated into your development workflow. Code reviews don't just make for better code now; they push your team to write better code in the future. For more information about reviewing code, take a look at the How to review Full Project applications page on Drupal.org and the pages it links to.