"pg: connection pool error" and the service never reconnect to db

42 Views Asked by At

I have two services, a backend and a DB. Backend is written in go, where i use go-pg v9 to manage DB connection; DB is a postgres DB. Those two services runs in two containers in a dedicated docker-network.

The services acts normally usually, but at some point, very not-deterministic and randomly, the backend returns an error error="pg: connection pool timeout", after trying for 30 seconds (the default time) to entabilish a new connection from the pool of DB.

After this error appears, then the backend will always return this error on every action on DB, and the only way I found to get rid of this is to restart it.

I don't have so many connections (about 1 at second, bu parallel no more than 10), so I don't understand why, even if the connection is not available in a certain moment, the connection is no more available after this.

Here my backend configurations and connection to DB:

func ConnectToDB(opts DBConfig) (*DBModel, error) {
    pgopts := pg.Options{
        User:               opts.User,
        Addr:               opts.Host + ":" + opts.Port,
        Password:           opts.Password,
        Database:           opts.Name,
        IdleCheckFrequency: 5 * time.Second,
        IdleTimeout:        30 * time.Second,
        PoolSize:           200,
    }

    creds := fmt.Sprintf("postgres://%s:%s@%s:%s/%s?sslmode=disable", opts.User, opts.Password, opts.Host, opts.Port, opts.Name)
    fmt.Println("DB Connect to:", creds)

    for i := 0; i < opts.RetryConnect; i++ {
        gdb := pg.Connect(&pgopts)
        rdb := &DBModel{gdb, creds}
        if err := rdb.Ping(); err != nil {
            fmt.Println("Cannot ping db: ", err.Error())
            time.Sleep(time.Duration(opts.RetryDelay) * time.Second)
        } else {
            fmt.Println("Connected to db succesfully")
            return rdb, nil
        }
    }

    return nil, errors.MakeDatabaseError("Can't connect to database")
}

All my functions have this sort of implementation in which they use DB connection:

func (s Service) PostMarpossData(_ context.Context, req *models.PostMarpossDataRequest) error {
    return s.DbRepository.DB().RunInTransaction(func(tx *pg.Tx) error {
        for i := 0; i < len(req.Data); i++ {
            err := s.DbRepository.PostMarpossData(tx, req.Data[i])
            if err != nil {
                return err
            }
        }
        return nil
    })
}

func (d DBModel) PostMarpossData(tx *pg.Tx, md models.MarpossData) error {

    q := `INSERT INTO marposs_data (id, dim, sid, date, result, batch_name, prop_name, prop_desc, code_desc, code_name, operator, 
           range, value, check_date, anomaly, created_at, ucl_rng, lcl_rng, ucl_avg, lcl_avg) VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) 
           ON CONFLICT (id, sid) DO UPDATE SET dim = ?, date = ?, result = ?, batch_name = ?, range = ?, value = ?, prop_desc = ?,
           code_desc = ?, prop_name = ?, code_name = ?, operator = ?, check_date = ?, anomaly = ?, created_at = ?, ucl_rng = ?, lcl_rng = ?, ucl_avg = ?, lcl_avg = ?`

    res, err := tx.Query(&md, q, md.Id, md.Dim, md.Sid, md.Date, md.Result, md.BatchName, md.PropName, md.PropDesc, md.CodeDesc, // insert values
        md.CodeName, md.Operator, md.Range, md.Value, md.CheckDate, md.Anomaly, time.Now().UTC(), md.UclRng, md.LclRng, md.UclAvg, md.LclAvg, // insert values

        md.Dim, md.Date, md.Result, md.BatchName, md.Range, md.Value, md.PropDesc, md.CodeDesc, // on conflict values
        md.PropName, md.CodeName, md.Operator, md.CheckDate, md.Anomaly, time.Now().UTC(), md.UclRng, md.LclRng, md.UclAvg, md.LclAvg) // on conflict values
    if err != nil {
        return errors.MakeErrBadQuery("error inserting marposs_data: " + err.Error())
    }
    if res.RowsAffected() != 1 {
        return errors.MakeErrBadQuery("error inserting marposs_data")
    }
    return nil
}

As I said, I don't get why the backend service stucks on the pool timeout and also further and future queries or connections to DB don't work.

Moreover, i check also from pgadmin and DB logs that, while the error was still there and new function calls return that error, in the DB there were only 3 active connections.

Some ideas?

I would expect that after some times, the pool return to be reachable. But this never happens.

0

There are 0 best solutions below