High Spanner CPU utilization and session count

Asked by Vijay Srinivasaraghavan on 03 March 2024 at 04:42

I am seeing high CPU utilization (~50% on a single node) on my Spanner instance even at a low query rate (~200 to 300 QPS). In my setup, I run 40 pods, and each pod subscribes to a PubSub Lite partition from which it pulls messages. Each message is a compressed batch of events (10 to 30 at most). From each batch I derive the primary keys (~20 events * 7 = 140 keys) and use them in an IN clause to query the database. Each pod creates a session pool (30 max sessions, 5 gRPC channels) and queries the database with a parameterized statement.

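        // Parameterized statement; query and paramsVal are built elsewhere.
        stmt := spanner.Statement{
            SQL:    query,
            Params: paramsVal,
        }
        // Single-use read-only transaction with a bounded staleness of up to one hour.
        readOnlyTxn := cc.client.Single()
        defer readOnlyTxn.Close()
        iter := readOnlyTxn.WithTimestampBound(spanner.MaxStaleness(time.Hour)).Query(cc.ctx, stmt)
        defer iter.Stop() // release the iterator once the rows have been read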
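
The question does not show how `query` is assembled, so the following is only a sketch of two common ways to bind a variable-length key list. It assumes a hypothetical table `Events` with a single STRING key column `EventId`; the `lookup` type is a stand-in for the `SQL` and `Params` fields of `spanner.Statement` so that the snippet needs no external packages. With one placeholder per key, the SQL text changes with the batch size; with a single array parameter and `IN UNNEST(@keys)`, the text stays the same for every batch.

```go
package main

import (
	"fmt"
	"strings"
)

// lookup mirrors the two spanner.Statement fields used in the question.
// It is a hypothetical stand-in, not part of the Spanner client library.
type lookup struct {
	SQL    string
	Params map[string]interface{}
}

// expandedIN builds "IN (@p0, @p1, ...)" with one placeholder per key,
// so the SQL text differs whenever the number of keys differs.
func expandedIN(keys []string) lookup {
	names := make([]string, len(keys))
	params := make(map[string]interface{}, len(keys))
	for i, k := range keys {
		name := fmt.Sprintf("p%d", i)
		names[i] = "@" + name
		params[name] = k
	}
	return lookup{
		SQL:    "SELECT * FROM Events WHERE EventId IN (" + strings.Join(names, ", ") + ")",
		Params: params,
	}
}

// arrayIN binds all keys as one array parameter, so the SQL text is
// identical for every batch regardless of how many keys it carries.
func arrayIN(keys []string) lookup {
	return lookup{
		SQL:    "SELECT * FROM Events WHERE EventId IN UNNEST(@keys)",
		Params: map[string]interface{}{"keys": keys},
	}
}
```

If the production query text varies from batch to batch, that would be one possible reason for the PoC (fixed keys) and prod (dynamic keys) to behave differently; comparing the distinct query texts in the query statistics views would help confirm or rule this out.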

I am seeing more active sessions in the Spanner metrics than the session pool is configured for, and I am not sure whether this is consuming additional CPU.

        config := spanner.ClientConfig{
            NumChannels:   numGRPCChannels,
            SessionLabels: labelMap,
            SessionPoolConfig: spanner.SessionPoolConfig{
                InactiveTransactionRemovalOptions: spanner.InactiveTransactionRemovalOptions{
                    ActionOnInactiveTransaction: spanner.WarnAndClose,
                },
                MaxOpened: uint64(sessionPoolSize),
            },
        }
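
For scale, the figures quoted above can be multiplied out. This is a rough sketch under two assumptions that the question does not confirm: that the session metric is aggregated over every client of the database rather than per pod, and that the ~200 to 300 QPS figure is the total across all pods. The constant and function names are illustrative only.

```go
package main

// Figures quoted in the question.
const (
	pods              = 40  // running pods, one session pool each
	maxSessionsPerPod = 30  // MaxOpened in SessionPoolConfig
	keysPerQuery      = 140 // ~20 events * 7 keys per IN clause
)

// fleetSessionCeiling is the most sessions all pods together may hold open.
func fleetSessionCeiling() int {
	return pods * maxSessionsPerPod
}

// keyLookupsPerSecond converts a query rate into primary-key lookups per second.
func keyLookupsPerSecond(qps int) int {
	return qps * keysPerQuery
}
```

Under those assumptions, the fleet-wide ceiling is 1,200 sessions rather than 30, and 200 to 300 QPS corresponds to 28,000 to 42,000 key lookups per second.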

Does an IN-clause query on the primary key have any effect on query performance?

I have looked at the query statistics views and don't see anything obvious that points to a problem on the query side.

P.S.: I did a PoC some time ago using test data with predefined primary keys and was able to get 22K QPS on a single node at 70% CPU usage. I am using the same code in the prod environment, where the data set is dynamic, and I suspect that the prepared statements are not getting cached here.

Sampled Query Plan

I am using the primary key to query the records, but the plan shows a full table scan. Is that expected? The plan also shows 20 RPC calls, and the total CPU time and response time are very high.
