Graceful shutdown of golang web server


I'm trying to find an optimal way to handle ongoing PostgreSQL transactions during the shutdown of a golang server running on Kubernetes.

Does it make sense to wait for transactions to finish when those transactions are serving requests that were initiated on a server that has already shut down? And even if a transaction completes within the graceful shutdown timeout, will the server still be able to send the response?
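
One way to check the second point: as documented, http.Server.Shutdown closes the listeners and idle connections and then waits for active connections to become idle, so a handler that is already running should still be able to write its response. The sketch below is only an experiment under assumptions: the /slow route, the 200 ms sleep standing in for a transaction, and the function name inFlightResponse are all made up for illustration.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"time"
)

// inFlightResponse starts a server with one slow handler, begins a request,
// calls Shutdown while the handler is still running, and returns the body
// the client eventually receives.
func inFlightResponse() (string, error) {
	started := make(chan struct{}, 1)
	mux := http.NewServeMux()
	mux.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
		select {
		case started <- struct{}{}: // tell the caller the request is in flight
		default:
		}
		time.Sleep(200 * time.Millisecond) // stand-in for a database transaction
		fmt.Fprint(w, "committed")
	})

	ln, err := net.Listen("tcp", "127.0.0.1:0") // any free local port
	if err != nil {
		return "", err
	}
	srv := &http.Server{Handler: mux}
	go srv.Serve(ln)

	type result struct {
		body string
		err  error
	}
	resc := make(chan result, 1)
	go func() {
		resp, err := http.Get("http://" + ln.Addr().String() + "/slow")
		if err != nil {
			resc <- result{err: err}
			return
		}
		defer resp.Body.Close()
		b, err := io.ReadAll(resp.Body)
		resc <- result{body: string(b), err: err}
	}()

	// Wait until the handler is running (or the request failed early).
	select {
	case <-started:
	case res := <-resc:
		return res.body, res.err
	}

	// Shut down while the request is still being served.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		return "", err
	}
	res := <-resc
	return res.body, res.err
}

func main() {
	body, err := inFlightResponse()
	fmt.Println(body, err)
}
```

If the client still receives the body here, then waiting for transactions inside the shutdown timeout is not wasted work, at least for requests whose connection is still open.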

Even if responding to ongoing requests during shutdown is not possible, I would prefer to cancel the context of all running transactions so that they don't keep running on the database after the server terminates, adding unnecessary load. But whenever I wait for transactions to finish, there seems to be a trade-off: the longer I wait for ongoing transactions to finish, the longer the container lives on with an unresponsive server that returns an error for every incoming request.

Here's some sample code that demonstrates this:

import (
    "github.com/jackc/pgx/v5/pgxpool"
    "os/signal"
    "context"
    "net/http"
    "syscall"
    "time"
)

func main() {
    ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGQUIT, syscall.SIGINT)
    defer cancel()

    // db is used by the API handler functions
    db, err := pgxpool.NewWithConfig(ctx, <some_config>)
    if err != nil {
        logger.Error("failed to create connection pool", err)
        return
    }


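    // Shutdown (below) takes a *http.Server, so keep a pointer.
    server := &http.Server{<some_values>}
    serverErr := make(chan error, 1)
    go func() {
        serverErr <- server.ListenAndServe()
    }()

    select {
    case <-ctx.Done():
        // ctx is already cancelled at this point, so the shutdown deadline
        // must derive from a fresh context.
        if err := Shutdown(context.Background(), time.Second*10, server, db); err != nil {
            logger.Error("server failed to Shutdown", err)
        }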

    case err := <-serverErr:
        logger.Error("server failed to ListenAndServe", err)
    }
}

func Shutdown(ctx context.Context, timeout time.Duration, server *http.Server, db *pgxpool.Pool) error {

    closeCtx, cancel := context.WithTimeout(ctx, timeout)
    defer cancel()

    // first, shutdown the server to stop accepting new requests
    if err := server.Shutdown(closeCtx); err != nil {
        return err
    }

    // allow running transactions to finish, but if they don't finish within
    // ten seconds, cancel the context of all running transactions so that they
    // are forced to finish (albeit, with error)
    transactionsComplete := waitForTransactions(time.Second*10, db)
    if !transactionsComplete {
        cancelContextOfEveryTransaction()
    }
    
    // Pool.Close takes no arguments and blocks until every acquired connection
    // has been released, so call it only once no transactions are still running.
    db.Close()

    return nil
}

Would the optimal graceful termination sequence be:

  • Shut down the server.
  • Immediately cancel context of all ongoing requests (killing the transaction as soon as the database driver tries to do anything with it).
  • Close the connection pool.
  • Exit.
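
A rough sketch of this first sequence, using only the standard library: every request context is derived from one shared base context (http.Server.BaseContext), and http.Server.RegisterOnShutdown cancels that base context as soon as Shutdown is called. The /tx route and the name cancelInFlight are assumptions for illustration, and the select in the handler merely stands in for a context-aware database call; the assumption is that a query issued with the request context would abort in the same way.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// cancelInFlight starts a request, shuts the server down, and reports whether
// the handler saw its request context cancelled by the shutdown.
func cancelInFlight() (bool, error) {
	baseCtx, cancelRequests := context.WithCancel(context.Background())
	defer cancelRequests()

	started := make(chan struct{}, 1)
	seen := make(chan error, 1)
	mux := http.NewServeMux()
	mux.HandleFunc("/tx", func(w http.ResponseWriter, r *http.Request) {
		started <- struct{}{}
		select {
		case <-r.Context().Done(): // a context-aware query would abort here
			seen <- r.Context().Err()
			http.Error(w, "shutting down", http.StatusServiceUnavailable)
		case <-time.After(5 * time.Second):
			seen <- nil
		}
	})

	ln, err := net.Listen("tcp", "127.0.0.1:0") // any free local port
	if err != nil {
		return false, err
	}
	srv := &http.Server{
		Handler: mux,
		// Every request context descends from baseCtx.
		BaseContext: func(net.Listener) context.Context { return baseCtx },
	}
	// Shutdown runs registered functions, so in-flight requests are cancelled
	// at the moment shutdown begins.
	srv.RegisterOnShutdown(cancelRequests)
	go srv.Serve(ln)

	clientErr := make(chan error, 1)
	go func() {
		resp, err := http.Get("http://" + ln.Addr().String() + "/tx")
		if err == nil {
			resp.Body.Close()
		}
		clientErr <- err
	}()

	select {
	case <-started:
	case err := <-clientErr:
		return false, fmt.Errorf("request ended before the handler ran: %v", err)
	}

	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		return false, err
	}
	return errors.Is(<-seen, context.Canceled), nil
}

func main() {
	cancelled, err := cancelInFlight()
	fmt.Println(cancelled, err)
}
```

With this wiring the pool can be closed right after Shutdown returns, because every handler has already been told to stop.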

[edit]: alternative termination sequence (more graceful):

  • Termination signal is received.
  • The pod is in 'terminating' state and is removed from the load balancer.
  • Shut down the server with some timeout N.
  • Shut down the connection pool with a short timeout. Reasoning: once server.Shutdown has returned, no more responses will be sent. The only reason to wait for ongoing transactions is to let background workers finish their work, such as writing logs to the database.
  • If there are still open transactions that prevent the connection pool from closing, kill these transactions and try to close the pool again.
  • Exit.

Answer by shadyyx:

Why reinvent the wheel instead of using one of the existing libraries that do the magic for you?

In our production services, we have used the github.com/TV4/graceful library a lot and never had issues with it. It waits until all in-flight HTTP requests are served (within a given timeout) and then shuts down.

The usage couldn't be simpler. After installing it

go mod download github.com/TV4/graceful

(or, alternatively:

go get -u github.com/TV4/graceful

)

you only need to import it:

import (
    // ...

    "github.com/TV4/graceful"
)

and then you can replace all your code after instantiating a server (including your Shutdown function) with this one-liner:

server := ...
graceful.LogListenAndServe(server, logger)

Answer by xpmatteo:

Not an answer to the OP's question, but a follow-up to @shadyyx's answer, which mentions a deprecated library.

The example provided by the Go documentation seems to work well.

I reworked it into a simple function that hides the complexity away. You can call it in place of http.ListenAndServe.

Disclaimer: I am a hobby Gopher and have never used this code in production.

// GracefulListenAndServe ensures that when the process receives SIGINT or
// SIGTERM, the server shuts down gracefully: it stops accepting new
// connections, closes idle ones, and waits for in-flight requests to finish
// before the function returns.
//
// There is no timeout enforced, because it is the job of the container to do that.
// For instance, Kubernetes will eventually forcefully kill a pod after waiting
// for a configured timeout for it to exit cleanly.
func GracefulListenAndServe(addr string, handler http.Handler) {
    server := &http.Server{Addr: addr, Handler: handler}

    idleConnsClosed := make(chan struct{})
    go func() {
        sigs := make(chan os.Signal, 1)
        // Kubernetes sends SIGTERM; os.Interrupt (SIGINT) covers Ctrl-C.
        // syscall.SIGTERM requires importing "syscall".
        signal.Notify(sigs, os.Interrupt, syscall.SIGTERM)
        <-sigs

        // We received a termination signal, shut down.
        if err := server.Shutdown(context.Background()); err != nil {
            // Error from closing listeners, or context timeout:
            log.Printf("HTTP server Shutdown: %v", err)
        }
        close(idleConnsClosed)
    }()

    err := server.ListenAndServe()
    if !errors.Is(err, http.ErrServerClosed) {
        // Error starting or closing listener
        log.Fatal(err)
    }

    <-idleConnsClosed
}