Cassandra message deserialization exception on ALTER TABLE ADD

I run into trouble when I try to add a column to a table (column family) in Cassandra 4.0.5.

I use a 3-node cluster in one DC:

./nodetool -h cass-host status
Datacenter: PERF-DC
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load        Tokens  Owns (effective)  Host ID                               Rack         
UN  10.24.40.53   391.57 KiB  16      100.0%            2e8207df-1954-4ba6-b438-1c1215cd264f  PERF-DC-RACK1
UN  10.24.40.133  388.83 KiB  16      100.0%            faaa220b-83ec-4489-8814-83739cf0c6c5  PERF-DC-RACK7
UN  10.24.40.132  402.47 KiB  16      100.0%            ef89f057-a315-4451-83e3-a9c27a0ee0c9  PERF-DC-RACK8


10.24.40.53  - cass-host1    
10.24.40.133 - cass-host2
10.24.40.132 - cass-host3

I have one table (column family):

CREATE TABLE table_01
(
    tdsrId         BIGINT,
    id             BIGINT,
    contents       VARCHAR,
    PRIMARY KEY ((tdsrId),id) 
) WITH default_time_to_live = 3600 
AND gc_grace_seconds = 3600;

First, I started a read workload from my Java app:

HOST: cass-host3
Consistency: LOCAL_QUORUM
QUERY: SELECT tdsrId,id,contents FROM table_01 WHERE tdsrId=1234567
load: 1000 operations per second

Then, while that load was running, I tried to add a column to the table:

$cqlsh cass-host1

Connected to PERF-CLUSTER at cass-host1:9042
[cqlsh 6.0.0 | Cassandra 4.0.5 | CQL spec 3.4.5 | Native protocol v5]
Use HELP for help.

cassandra@cqlsh> alter table ks.table_01 add t01 int;

As a result:

1) On cass-host3 I got 5 errors in cassandra/log/debug.log:


    ERROR [Messaging-EventLoop-3-1] 2023-12-26 18:03:01,094 InboundMessageHandler.java:182 - /10.24.40.53:7000->/10.24.40.132:7000-SMALL_MESSAGES-ff4dd09b unexpected exception caught while deserializing a message
    java.lang.RuntimeException: Unknown column t01 during deserialization
        at org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:489)
        at org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserializeRegularAndStaticColumns(ColumnFilter.java:1072)
        at org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:1021)
        at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:928)
        at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:833)
        at org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782)
        at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642)
        at org.apache.cassandra.net.InboundMessageHandler.processSmallMessage(InboundMessageHandler.java:168)
        at org.apache.cassandra.net.InboundMessageHandler.processOneContainedMessage(InboundMessageHandler.java:151)
        at org.apache.cassandra.net.AbstractMessageHandler.processFrameOfContainedMessages(AbstractMessageHandler.java:242)
        at org.apache.cassandra.net.AbstractMessageHandler.processIntactFrame(AbstractMessageHandler.java:227)
        at org.apache.cassandra.net.AbstractMessageHandler.process(AbstractMessageHandler.java:218)
        at org.apache.cassandra.net.FrameDecoder.deliver(FrameDecoder.java:321)
        at org.apache.cassandra.net.FrameDecoder.channelRead(FrameDecoder.java:285)
        at org.apache.cassandra.net.FrameDecoder.channelRead(FrameDecoder.java:269)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
        at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
        at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:829)

2) In my Java app I got 5 errors:

2023-12-26 18:02:53.941 ERROR 1821233 --- [   scheduling-1] console_out     : err count: 5, message:Query; 
CQL [SELECT tdsrId,id,contents FROM table_01 WHERE tdsrId=?]; 
Cassandra failure during read query at consistency LOCAL_QUORUM (2 responses were required but only 1 replica responded, 1 failed); 
nested exception is com.datastax.oss.driver.api.core.servererrors.ReadFailureException:
 Cassandra failure during read query at consistency LOCAL_QUORUM (2 responses were required but only 1 replica responded, 1 failed)

After that, subsequent CQL queries work fine without any errors.

I suspect that ALTER TABLE is a dangerous operation and that I lost the results of some queries that were executing at the same time as the ALTER.

Can someone explain whether this behavior is correct, or whether I can tune Cassandra so that ALTER TABLE ADD COLUMN runs without these errors?

Answer by Madhavan:

Are you running the ALTER and SELECT statements from an automation job that issues them in quick succession? Could you run nodetool describecluster right after the ALTER statement to confirm that the schema change has propagated to all nodes in the cluster (in which case every node reports the same schema version UUID) before issuing a SELECT on the newly added column of that table? If the output lists multiple UUIDs, the schema change has not yet propagated to all nodes in the cluster, and you need to investigate by looking at the system.log and debug.log files.

As for the other error that you ran into:
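
If you want to script that check, here is a minimal Java sketch that counts the distinct schema version UUIDs in the text printed by nodetool describecluster. It assumes the "Schema versions:" section prints one line per version in the form `<uuid>: [ip, ip, ...]`, so verify that against the output of your own nodes. The class name, method names and sample UUIDs are made up for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SchemaAgreementCheck {

    // Assumed line layout inside "Schema versions:":
    //     <uuid>: [10.24.40.53, 10.24.40.133]
    // Lines that do not start with a UUID (for example an UNREACHABLE entry)
    // are ignored by this pattern, so check for those separately.
    private static final Pattern VERSION_LINE = Pattern.compile(
            "^\\s*([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}):\\s*\\[",
            Pattern.MULTILINE);

    // Returns the distinct schema version UUIDs found in the given output.
    static Set<String> schemaVersions(String describeClusterOutput) {
        Set<String> versions = new LinkedHashSet<>();
        Matcher m = VERSION_LINE.matcher(describeClusterOutput);
        while (m.find()) {
            versions.add(m.group(1).toLowerCase());
        }
        return versions;
    }

    // The nodes agree on the schema when exactly one version is reported.
    static boolean inAgreement(String describeClusterOutput) {
        return schemaVersions(describeClusterOutput).size() == 1;
    }

    public static void main(String[] args) {
        // Hypothetical output captured shortly after an ALTER TABLE:
        // one node already has the new schema, two still have the old one.
        String sample =
                "Cluster Information:\n"
              + "\tName: PERF-CLUSTER\n"
              + "\tSchema versions:\n"
              + "\t\t11111111-2222-3333-4444-555555555555: [10.24.40.53]\n"
              + "\t\taaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee: [10.24.40.133, 10.24.40.132]\n";

        System.out.println("Distinct schema versions: " + schemaVersions(sample).size());
        System.out.println("In agreement: " + inAgreement(sample));
    }
}
```

You would feed it the captured stdout of the nodetool command and only start querying the new column once inAgreement returns true.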

Cassandra failure during read query at consistency LOCAL_QUORUM (2 responses were required but only 1 replica responded, 1 failed);

This is a case where the C* cluster isn't properly sized for the load that you're attempting to put on it. Have you already gone through proper cluster sizing and tested the cluster with the anticipated load (plus some buffer as a cushion) during the initial planning stages? If not, I'd strongly encourage you to check out this documentation and perform the necessary testing and sizing of the cluster. Cheers!
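
For context on the "2 responses were required but only 1 replica responded" part of that message: a quorum is floor(RF / 2) + 1 replicas, and LOCAL_QUORUM applies that to the replication factor of the local datacenter. The sketch below assumes a replication factor of 3, which the question does not state but which would be consistent with each of the three nodes owning 100% of the data. The class and method names are made up for illustration.

```java
public class QuorumMath {

    // Number of replicas that must respond for a quorum read or write.
    static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor / 2 + 1; // integer division rounds down
    }

    // True when enough replicas answered successfully to satisfy the quorum.
    static boolean readSucceeds(int replicationFactor, int successfulResponses) {
        return successfulResponses >= quorum(replicationFactor);
    }

    public static void main(String[] args) {
        int rf = 3; // assumption, not stated in the question
        System.out.println("Required responses: " + quorum(rf));
        // The case from the error message: one replica responded, one failed.
        System.out.println("Read succeeds with 1 response: " + readSucceeds(rf, 1));
    }
}
```

Under that assumption, a single replica failing to process the read is enough to fail the request whenever only one other replica has answered.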