Kenneth Jenkins

A misleading error when using gRPC with Go and nginx

My first glimpse into the workings of HTTP/2

I wanted to document an issue I ran into while trying to write a gRPC service in Go, in case someone else just so happens to run into the same problem.

The scenario was this:

  1. I had a web service written in Go
  2. serving both HTTP traffic and gRPC traffic in plaintext on the same port
  3. using nginx as a reverse proxy to provide TLS.

This might seem a bit contrived, but it came about fairly naturally. I was working on a web service written in Go, and then I wanted to add a gRPC API to it as well, for use by a mobile client. Rather than setting up a new sub-domain or messing with firewall rules to expose another port, I had the Go service accept gRPC requests on the same port that it was already listening on. Then, some time later I decided it would be easier to let nginx handle TLS rather than figuring out how to set up certificate rotation in the Go service itself, and so I had nginx terminate the TLS connection and proxy requests to the Go service over plain HTTP.

The error message

However, when I attempted this, all my gRPC requests stopped working, and I started seeing errors something like this in the nginx logs:

2020/12/13 01:14:04 [error] 29#29: *3 upstream sent too large http2 frame:
4740180 while reading response header from upstream, client: 172.21.0.1,
server: , request: "POST /Service/Method HTTP/2.0", upstream:
"grpc://172.21.0.2:8080", host: "localhost:8080"

My attempts to Google this error message didn’t turn up much. I found a blog post about a similar-sounding error, but I wasn’t using Kubernetes, and changing the proxy_buffers settings in nginx (the recommended fix) didn’t seem to have any effect.

So what was going on here?

I didn’t make much progress towards understanding this until I came across the httputil.DumpRequest() function. When I used it to log an incoming request in my main handler, I saw this:

PRI * HTTP/2.0
Connection: close
 

I’d never heard of the PRI method, but at least this was something easy enough to search for online. I found that this comes from the HTTP/2 “client connection preface”. The preface is sent by the client at the start of an HTTP/2 connection, and it begins with a fixed byte sequence consisting of:

PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n

(where \r and \n indicate the carriage return and newline characters, respectively). This byte sequence was specifically chosen to look like an invalid HTTP/1.1 request, so that servers which don’t understand HTTP/2 will error out right away.

But why did my Go service start treating this as a request in its own right?

HTTP/2 and Go

When I added the gRPC service on the same port as the rest of the web service, I followed a technique mentioned in the grpc-go documentation. It comes down to a check that the incoming request is using HTTP/2 (as gRPC is implemented using HTTP/2), and has a Content-Type header whose value begins with application/grpc:

To share one port (such as 443 for https) between gRPC and an existing http.Handler, use a root http.Handler such as:

if r.ProtoMajor == 2 && strings.HasPrefix(
	r.Header.Get("Content-Type"), "application/grpc") {
	grpcServer.ServeHTTP(w, r)
} else {
	yourMux.ServeHTTP(w, r)
}
(from https://pkg.go.dev/google.golang.org/grpc#Server.ServeHTTP)

Unfortunately, I either skipped right over the previous paragraph in the documentation, or I simply forgot about it by the time I tried to move the TLS handling into the nginx reverse proxy:

The provided HTTP request must have arrived on an HTTP/2 connection. When using the Go standard library’s server, practically this means that the Request must also have arrived over TLS.

This is the core of the issue. The HTTP server in Go’s standard library does not support plaintext HTTP/2 by default.

(Why doesn’t Go support this by default? Well, HTTP/2 on the web effectively requires encryption, in the sense that currently all major web browsers will use HTTP/2 only with https:// URLs.1 If Go’s standard library HTTP server is mainly used for writing web services, then this seems like a reasonable design decision.)

But what does it mean to “support plaintext HTTP/2”? Why does the HTTP/2 protocol care if the underlying transport is encrypted or not?

Starting an HTTP/2 connection

Before all of this, I was only vaguely aware that HTTP/2 existed. I had a basic understanding of HTTP/1 from a networking course in college, but HTTP/2 is quite different. To begin with, while HTTP/1 is a text-based protocol, HTTP/2 is a binary protocol. One implication of this is that HTTP/2 is not backwards-compatible with HTTP/1, and so clients and servers need some way to signal to each other that they support HTTP/2. There are three main ways to do this.

The first way is specific to https. As part of any https connection, the client and server first engage in something called a “TLS handshake” to agree on encryption parameters and other connection settings. When HTTP/2 was standardized, an extension was also added to the TLS protocol called “Application-Layer Protocol Negotiation” (ALPN) to let the client and server decide which protocol to use for the connection. The client can use this extension to indicate that it supports both HTTP/1 and HTTP/2, and then the server will indicate back to the client which of these options it has decided to use.

However, for plaintext http, there is no corresponding negotiation step when starting a connection. This brings us to the second way to signal support for HTTP/2: using the connection “upgrade” feature of HTTP/1.1. A client can send an HTTP/1 request that includes the header Upgrade: h2c (along with a few other required headers2) to signal that it can speak HTTP/2.
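The example upgrade request from RFC 7540 §3.2 looks like this:

GET / HTTP/1.1
Host: server.example.com
Connection: Upgrade, HTTP2-Settings
Upgrade: h2c
HTTP2-Settings: <base64url encoding of HTTP/2 SETTINGS payload>

A server that supports HTTP/2 responds with 101 Switching Protocols and the connection continues as HTTP/2 from there; a server that doesn’t simply ignores the Upgrade header and answers in ordinary HTTP/1.1.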

The third way to use HTTP/2 is for a client to have “prior knowledge” that a particular server supports it. Essentially, if some out-of-band signaling mechanism exists, a client can start speaking HTTP/2 directly.

This last method is the one that nginx uses for gRPC requests. As the gRPC protocol is implemented using HTTP/2, this constitutes “prior knowledge” that any gRPC endpoint must support HTTP/2. When nginx connects to the Go service, it immediately begins speaking HTTP/2, starting with the client connection preface.
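For reference, the nginx side of this is the grpc_pass directive, which is what triggers the direct-HTTP/2 behavior toward the upstream. A minimal sketch of the relevant configuration (certificate paths and addresses here are hypothetical) might look like:

server {
    listen 443 ssl http2;
    ssl_certificate     /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    location / {
        # nginx speaks plaintext HTTP/2 to this upstream, starting
        # with the client connection preface.
        grpc_pass grpc://127.0.0.1:8080;
    }
}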

However, because the Go standard library HTTP server supports only the first method (ALPN), when it receives the HTTP/2 client connection preface, it parses it as an HTTP/1 request and responds with an HTTP/1 response. Nginx’s attempt to read that response as HTTP/2 is what leads to the original error.

But why does the error message read “upstream sent too large http2 frame”? What’s that about?

HTTP/2 message framing

To answer that question, we need to understand just a little bit more about the HTTP/2 protocol:

  • HTTP/2 messages consist of a number of “frames.”
  • Each frame begins with a fixed-length header.
  • The first 3 bytes of this header are a Length field.
  • Unless otherwise indicated, the maximum value of this Length field is 16,384. (cf. RFC 7540 §4.1)

When nginx starts reading the erroneous HTTP/1 response from the Go service, it will try to parse it as an HTTP/2 message frame. Now, the first line of an HTTP/1 response contains the HTTP version and status code, for example:

HTTP/1.1 400 Bad Request

Trying to parse these bytes as the start of an HTTP/2 frame results in reading the first three bytes (“HTT”) as the frame length. In hexadecimal, these bytes are 0x485454, or 4,740,180 in decimal. This far exceeds the maximum frame length of 16,384, which explains why nginx thinks the frame is too large.

The number 4740180 even shows up in the error message, and this could have been an important clue, but I did not realize its significance at the time. (This is the exact same idea as the magic number 1213486160, which is all 4 bytes of “HTTP” read as a 32-bit integer, rather than only the first 3.)

Putting it all together

To recap:

  • When nginx connects to an upstream gRPC server, it begins speaking HTTP/2 directly, starting with the HTTP/2 client connection preface. (The use of gRPC is the “prior knowledge” that the server must support HTTP/2.)
  • The connection preface begins with a fixed byte sequence chosen to look like an HTTP/1 request, but with an invalid method and version.
  • The Go http standard library package does not recognize plaintext HTTP/2 requests, and so the Go service misinterprets the client connection preface as the start of an HTTP/1 request, and responds with an HTTP/1 response.
  • Then nginx attempts to parse the HTTP/1 response as an HTTP/2 frame, which leads to the original error message.

The nginx error message is misleading because it talks about an HTTP/2 frame being too large, but the real problem is that the Go service wasn’t speaking HTTP/2 at all.

So what is the fix?

Luckily, there’s an easy way to add plaintext HTTP/2 support to the Go standard library HTTP server, using the golang.org/x/net/http2/h2c package.

If the default http2.Server options are acceptable, this can be as simple as replacing something like this:

http.ListenAndServe(addr, handler)

with this:

http.ListenAndServe(addr, h2c.NewHandler(handler, &http2.Server{}))

Of course, another option would be to move the gRPC portion of the Go service onto a different port, and use the gRPC library’s built-in server (which recognizes plaintext HTTP/2 requests just fine). If I had simply done that to begin with, I wouldn’t have run into this issue.


  1. cf. https://en.wikipedia.org/wiki/HTTP/2#Encryption ↩︎

  2. Specifically, the HTTP2-Settings and Connection headers (cf. RFC 7540 §3.2). ↩︎