API design: The problems with PUT and PATCH

Can we solve the usability problems of the humble update API?

At its heart an API must solve five core problems, affectionately known as CRUD plus querying. Your API allows developers to create, retrieve, update, and delete information — queries being a special type of retrieve action. Compared to the complexity of queries, the main CRUD methods are often considered trivial.

Are they, though? For a relational database, a single update can cause lots of writes and reads across multiple indexes. When multiple updates happen in sequence, the order of operations can cause unexpected consequences. Poorly constructed updates can even trigger data loss!

You have lots of choices when implementing updates, from PUT to PATCH to JSON Patch to GraphQL, and error handling options can range from silently overwriting to crashing to detecting version conflicts. Alternatively, some databases achieve high performance by disallowing updates entirely and constructing an immutable index!

Let’s discuss how we can design an update API that satisfies our needs for performance, reliability, and usability.

Types of update APIs

In REST-style APIs built on HTTP there are two very common types of update APIs:

PUT — A complete update method that replaces an existing record with a new one.
PATCH — A partial update method that changes the values of one or more fields on a record.
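
The difference can be sketched in a few lines. This is a minimal illustration that assumes records are plain dictionaries; apply_put and apply_patch are hypothetical helper names, not part of any framework.

```python
# Hypothetical helpers: records are modeled as plain dicts for illustration.

def apply_put(existing: dict, body: dict) -> dict:
    """PUT: the request body becomes the whole record."""
    return dict(body)


def apply_patch(existing: dict, body: dict) -> dict:
    """PATCH: only the fields present in the body change."""
    return {**existing, **body}


stored = {"id": 12345, "name": "Flange", "color": "red", "location": "Arizona"}
request_body = {"id": 12345, "name": "Sprocket", "color": "blue"}

after_put = apply_put(stored, request_body)
after_patch = apply_patch(stored, request_body)

print("location" in after_put)   # False: PUT dropped the field the caller omitted
print(after_patch["location"])   # Arizona: PATCH left the untouched field alone
```

The same request body produces two different records, which is why the choice between the two methods matters.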

When I built the AvaTax API I chose to use the PUT method. I favored this approach because PUT methods could be strongly typed, and automated Swagger / OpenAPI documentation would help users figure it out.

The PUT syntax also had the advantage of being a direct match for the POST method that was used to create the record in the first place. If you know how to create a record, you also know how to update one, correct?

I always intended to offer support for PATCH in the AvaTax API, but once it became successful I had my hands full growing the team and I never did it. But I always worried about risks like this example:

PUT /api/v1/widget/12345
{
  "id": 12345,
  "name": "Sprocket",
  "color": "blue"
}

In this tiny example, we are replacing the widget 12345 record with a new one. But why did the user make the call? Were they trying to change the name to Sprocket, or the color to blue, or both? What happens if the widget record has an optional field called location? Does the developer intend to set location to null? Did they just forget to pass in that value?

How can we know what they mean?

JSON itself distinguishes between an omitted field and an intentional null like "location": null. However, in some programming languages the absence of a location field within a JSON object can be misinterpreted as an attempt to set its value to null, because some serialization libraries map both an omitted field and an explicit null to the same empty value.
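
To make the ambiguity concrete, here is a small sketch. The WidgetNaive model, the MISSING sentinel, and the two parse functions are all hypothetical names chosen for illustration; the point is only that a typed model with an optional field can erase the difference, while a key-presence check keeps it.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical sentinel meaning "the caller did not send this field".
MISSING = object()


@dataclass
class WidgetNaive:
    # A typical typed model: an omitted field and an explicit null both become None.
    name: str
    location: Optional[str] = None


def parse_naive(raw: str) -> WidgetNaive:
    body = json.loads(raw)
    return WidgetNaive(name=body["name"], location=body.get("location"))


def parse_location(raw: str):
    """Return MISSING if the field was omitted, None if it was sent as null."""
    body = json.loads(raw)
    return body.get("location", MISSING)


omitted = '{"name": "Sprocket"}'
explicit_null = '{"name": "Sprocket", "location": null}'

# The naive model cannot tell the two requests apart.
print(parse_naive(omitted) == parse_naive(explicit_null))  # True

# Checking for key presence keeps the distinction.
print(parse_location(omitted) is MISSING)     # True
print(parse_location(explicit_null) is None)  # True
```

A server can make this check for itself, but it has no way to know whether the client library that built the request made the same distinction.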

Since your API cannot make assumptions about the programming languages used by your developers, I decided over time that I preferred the PATCH API. My team built the Lockstep API from the ground up using PATCH only to avoid this particular risk.

But PATCH APIs are less usable for novice developers. In the OpenAPI specification, a PATCH method that allows an open list of name-value pairs is very similar to an untyped object. The JSON Patch specification (RFC 6902) documents one approach to partial updates … and it’s useful, but a bit fiddly.
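
For a sense of what "fiddly" means, a JSON Patch document is a list of operations rather than a partial record. The sketch below handles only a small subset of RFC 6902 (the add, replace, remove, and test operations on top-level fields); apply_json_patch is a hypothetical helper written for this article, not a library function.

```python
import copy


def apply_json_patch(document: dict, operations: list) -> dict:
    """Apply a simplified subset of JSON Patch to a flat record."""
    result = copy.deepcopy(document)
    for operation in operations:
        path = operation["path"]
        # This sketch only supports single-segment pointers such as "/color".
        if not path.startswith("/") or "/" in path[1:]:
            raise ValueError(f"unsupported path: {path}")
        # Undo JSON Pointer escaping: "~1" is "/" and "~0" is "~".
        key = path[1:].replace("~1", "/").replace("~0", "~")
        kind = operation["op"]
        if kind == "add":
            result[key] = operation["value"]
        elif kind == "replace":
            if key not in result:
                raise KeyError(key)
            result[key] = operation["value"]
        elif kind == "remove":
            if key not in result:
                raise KeyError(key)
            del result[key]
        elif kind == "test":
            if key not in result or result[key] != operation["value"]:
                raise ValueError(f"test failed for {path}")
        else:
            raise ValueError(f"unsupported op: {kind}")
    return result


widget = {"id": 12345, "name": "Flange", "color": "red", "location": "Arizona"}
patch = [
    {"op": "replace", "path": "/name", "value": "Sprocket"},
    {"op": "replace", "path": "/color", "value": "blue"},
    {"op": "remove", "path": "/location"},
]
patched = apply_json_patch(widget, patch)
print(patched)
```

Because the operations run against a copy, a failure partway through leaves the original record untouched. The explicit remove operation also sidesteps the omitted-versus-null ambiguity, at the cost of a more verbose request.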

Are there other options to consider?

Considering GraphQL and the mutator pattern

The clever developers behind GraphQL considered this problem and addressed it with mutations, which I'll call mutators here. Although GraphQL requests and responses travel as JSON, every operation is declared in a typed schema, and that schema can define methods that are invoked on the server side to change data.

A mutator method is defined as a function that accepts certain parameters, and executes a defined set of changes. The official GraphQL website describes it using this example:

// An example mutation definition
mutation CreateReviewForEpisode($ep: Episode!, $review: ReviewInput!) {
  createReview(episode: $ep, review: $review) {
    stars
    commentary
  }
}

// An example mutation call with parameters
{
  "ep": "JEDI",
  "review": {
    "stars": 5,
    "commentary": "This is a great movie!"
  }
}

This mutator is an example of a server-side remote procedure call (RPC) system. It is neither a PUT nor a PATCH, and there is no strict expectation for how such a server-side RPC method must work. As a result, developers need to read the documentation for each RPC separately; experience with one tells you little about how another will behave.

We could do the same thing in REST with a call like POST /api/v1/createReviewForEpisode?episode=JEDI, passing the review itself in the request body. In fact, this isn’t a terrible approach! It has a few interesting advantages:

Each update API can be defined with a firm contract that is applicable for its specific use case.
The fields that can be updated can be strictly defined for each use case.
If we don’t want a particular field to be changeable, we can simply avoid creating a method that allows the field to be changed.

On the other hand, creating dozens of individual update methods can be a lot of work. It can be tedious for a developer to browse through a dozen update methods to find the one they want. The volume of documentation required is challenging.

I recommend reserving RPC-like update calls for complex business actions. For example, if your API allows a customer to upgrade from one subscription tier to another, make a specific UpgradeSubscriptionTier API.

When deciding on RPC vs PUT/PATCH, you should consider the side effects of a specific change. For lightweight changes that don’t cause lots of consequences, allow the users to change fields with a standard PUT or PATCH method. But if changing a specific field has huge side effects, don’t allow those fields to be changed via PUT and PATCH. Create a specific RPC call for it and explain why in the documentation.
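
One way to picture this split is sketched below. Everything in it is assumed for illustration: the account record, the list of patchable fields, the tier names, and the upgrade_subscription_tier function, which is loosely modeled on the UpgradeSubscriptionTier example above.

```python
# Hypothetical field policy: tier changes trigger billing side effects,
# so the generic PATCH handler refuses to touch that field.
PATCHABLE_FIELDS = {"name", "email"}
TIER_ORDER = ["free", "standard", "premium"]


def patch_account(account: dict, changes: dict) -> dict:
    """Generic partial update, limited to fields without heavy side effects."""
    blocked = set(changes) - PATCHABLE_FIELDS
    if blocked:
        raise ValueError(
            f"fields {sorted(blocked)} cannot be changed via PATCH; "
            "use a dedicated API call instead"
        )
    return {**account, **changes}


def upgrade_subscription_tier(account: dict, new_tier: str) -> dict:
    """Dedicated RPC-style action with its own contract and validation."""
    if new_tier not in TIER_ORDER:
        raise ValueError(f"unknown tier: {new_tier}")
    if TIER_ORDER.index(new_tier) <= TIER_ORDER.index(account["tier"]):
        raise ValueError("new tier must be higher than the current tier")
    # Side effects (proration, invoices, notifications) would be triggered here.
    return {**account, "tier": new_tier}


account = {"id": 1, "name": "Acme", "email": "ops@acme.example", "tier": "free"}
account = patch_account(account, {"name": "Acme Corp"})
account = upgrade_subscription_tier(account, "premium")
print(account["name"], account["tier"])
```

The error message from the generic handler points the developer toward the dedicated call, which does some of the work the documentation would otherwise have to do.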

The interesting case of create-or-update

How can a developer reliably trigger updates via your API whenever an event happens in an external system?

In some cases, the developer's existing system has no place to store the metadata needed to match its records with yours, such as the ID your API assigned. In other cases, the system may not be as reliable as you might like, and the records aren’t guaranteed to match on both sides.

In these cases, defensive programming might require three API calls:

Query to find if the data exists.
Retrieve the existing data.
If the data is missing, call create; if it is old, call update to change the data to its new value.

This approach can be extremely slow, and it can cause burdens for both you and your developers. If you find that your users face this problem regularly, you might want to consider a create-or-update method to allow these customers to work using a single API call.

This approach requires the ability to do the following:

Use a customer’s reference ID number for the record, rather than your own internal ID number (which can’t be reliably generated by the customer’s code).
Within a database transaction, detect whether the record exists and determine whether to execute a create or an update method.
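
The two requirements above can be sketched with SQLite from the Python standard library. The widgets table, the external_ref column, and the create_or_update function are hypothetical names; a production system would likely use its own database and its native upsert support.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so that the
# transaction boundaries below are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE widgets ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"   # our internal ID
    " external_ref TEXT UNIQUE NOT NULL,"      # the customer's reference ID
    " name TEXT NOT NULL)"
)


def create_or_update(conn: sqlite3.Connection, external_ref: str, name: str) -> str:
    """Create the record if the customer's reference is new, else update it."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id FROM widgets WHERE external_ref = ?", (external_ref,)
        ).fetchone()
        if row is None:
            conn.execute(
                "INSERT INTO widgets (external_ref, name) VALUES (?, ?)",
                (external_ref, name),
            )
            outcome = "created"
        else:
            conn.execute("UPDATE widgets SET name = ? WHERE id = ?", (name, row[0]))
            outcome = "updated"
        conn.execute("COMMIT")
        return outcome
    except Exception:
        conn.execute("ROLLBACK")
        raise


first = create_or_update(conn, "cust-001", "Flange")
second = create_or_update(conn, "cust-001", "Sprocket")
print(first, second)
```

The caller never needs to know the internal id column, and repeating the same call after a crash or retry is safe because the second attempt simply becomes an update.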

If your API requires create-or-update methods, you may want to standardize and document the behavior across all objects, or you may choose to enable it only on certain objects. One good example of the create-or-update pattern is the CreateOrAdjustTransaction API call, which happens to be one of my favorite AvaTax usability features.

The risks of multiple conflicting updates

One of the most frequently discussed risks of updating is also one of the least common: overlapping update calls. The story goes like this:

Alice creates a record A with value 123.
Bill fetches A, then updates the record to value 456.
Charlie fetches A, then updates the record to value 789.

After all three commands are executed, what is the value of the record? Of course, this depends on the order in which changes are executed. Bill’s API call may be delayed and execute after Charlie’s.

This type of order-of-operations problem may occur often when two people are typing into the same online document together, but it doesn’t happen very often when business users are updating customer records. In most cases, business records are created once and updated never — which makes multiple conflicting updates unlikely.

Should we try to solve this problem? Is it worth taking defensive action? If we choose PATCH for our update methods, one user can change field X while another user changes field Y without conflicts. This is better than PUT, but it doesn’t solve the problem completely since two users can still try to modify the same field.

Another approach is to add a version parameter to our update method. If the retrieve API returns the record’s version number, we can require that the update method take, as input, the version number the user is attempting to update. We can then throw an error if the record’s version number on disk does not match — indicating that the record has been changed by someone else in the meantime.

// When you fetch the record, it tells you the version number
GET /api/v1/widget/12345
{
  "id": 12345,
  "name": "Flange",
  "version": 78
}

// This PUT call will fail if the record's version number is not 78
// After the PUT call succeeds, version number is incremented to 79
PUT /api/v1/widget/12345
{
  "id": 12345,
  "name": "Sprocket",
  "color": "blue",
  "version": 78
}
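
On the server side, the version check might look something like the sketch below. The in-memory store, the VersionConflict exception, and update_widget are hypothetical; how the failure is reported to the caller (an HTTP 409 Conflict response is one common choice) is also an assumption, since the example above does not specify it.

```python
class VersionConflict(Exception):
    """Raised when the caller's version does not match the stored record."""


def update_widget(store: dict, widget_id: int, body: dict) -> dict:
    current = store[widget_id]
    if body.get("version") != current["version"]:
        raise VersionConflict(
            f"expected version {current['version']}, got {body.get('version')}"
        )
    # Full replacement (PUT semantics) with the version bumped by one.
    updated = {**body, "version": current["version"] + 1}
    store[widget_id] = updated
    return updated


store = {12345: {"id": 12345, "name": "Flange", "version": 78}}

# Bill and Charlie both fetched version 78 before either one saved.
bill = {"id": 12345, "name": "Sprocket", "color": "blue", "version": 78}
charlie = {"id": 12345, "name": "Gear", "version": 78}

print(update_widget(store, 12345, bill)["version"])  # 79

try:
    update_widget(store, 12345, charlie)
except VersionConflict as error:
    print("rejected:", error)
```

Charlie's call is rejected instead of silently overwriting Bill's change. In a real service the comparison and the write would need to happen atomically, for example inside a single database transaction.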

What about bulk updates?

Another interesting challenge is the concept of bulk updating. In the world of relational databases, it is straightforward to write a command like UPDATE users SET location='Arizona' WHERE location='Nevada', and our database will make the changes as efficiently as possible.

But it may be prohibitively expensive to do this with an API. If we allowed customers to make massive changes like this via a single API call, the resulting load could cause lag and slowdowns for other users working with our system at the same time.

I generally discourage developers from providing bulk update calls unless there is a strategic business action that makes this sort of thing useful. It is essentially a perfect use case for a custom RPC or mutator call.

Categories of use cases for an update API

Now that we know what types of update APIs we can create, let’s consider some real world use cases:

Fixup scripts — One of our support engineers discovers that a few thousand customers were updated with the wrong billing date. A developer will have to search for all the records with incorrect dates and update them with the corrected ones.
Extract, Transform, and Load — Our customer is trying to import a million records from a legacy system into our software. The ETL script crashes every once in a while and needs to restart itself and pick up where it left off. This means it may need to create some records and update others, if they were left in a broken state.
Ancient code — A customer wrote a program years ago that updates records. For some reason, the customer can’t update this program and we need to ensure that the update calls do not have any unexpected side effects.
Novice developers — Someone is trying to hack together a small program, and they’re experimenting with the API. They need guidance and clear error messages to explain what they’re doing wrong until they eventually figure it out.

Depending on the frequency of these scenarios for your developers, you may want to choose an architecture that provides your own unique combination of PUT, PATCH, RPC, and conflict resolution.

Whatever approach you pick, I encourage API developers to choose a standard pattern for update calls and to define their behavior for consistency, clarity, and conveyance. If your API uses a different update method for each type of object, your users will be frustrated. A standard and consistent approach will serve you well in the long run.