How does Google file system deal with write failures at replicas?


I was reading the Google File System paper and am not sure how it deals with write (not atomic record append) failures at replicas. If it returns success, then it will wait until the next heartbeat for the master to get the updated state of the world, detect the corrupted/stale chunk version, and delete the chunk. I guess the master could verify the validity of all the replicas whenever clients ask for replica locations, to prevent clients from ever getting stale/corrupted data. Is this how it deals with replica write failure?

There are 2 best solutions below

Invalid chunks fall into two categories: stale chunks and chunks with corrupted checksums. They are detected in two different ways.

Stale chunk. The chunk's version number is not up to date. The master detects stale chunks during its regular heartbeat exchanges with chunkservers.

Corrupted checksum. Since divergent replicas may be legal, replicas across GFS are not guaranteed to be byte-identical. In addition, for performance reasons, checksum verification is performed independently by the chunkservers themselves rather than by the master.

Checksums are verified in two situations:

  1. When clients or other chunkservers request the chunk
  2. During idle periods, when chunkservers scan and verify inactive chunks so that corrupted chunks are not mistaken for valid replicas

If a checksum mismatch is found, the chunkserver reports the problem to the master. The master clones the replica from another chunkserver that holds a healthy copy, and then instructs the reporting chunkserver to delete the corrupted chunk.
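The verify-on-read behavior described above can be sketched roughly like this. All names are hypothetical, and truncated SHA-1 stands in for the 32-bit per-block checksum the paper describes:

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # GFS checksums each 64 KB block of a chunk

def checksum(block: bytes) -> bytes:
    # GFS uses a 32-bit checksum per block; truncated SHA-1 is a stand-in here.
    return hashlib.sha1(block).digest()[:4]

class Chunkserver:
    """Toy chunkserver that verifies checksums before serving reads."""

    def __init__(self):
        self.data = bytearray()
        self.checksums = []            # one checksum per 64 KB block
        self.reported_corrupt = False

    def write(self, payload: bytes):
        self.data.extend(payload)
        # Recompute checksums for all blocks (a real server updates incrementally).
        self.checksums = [
            checksum(bytes(self.data[i:i + BLOCK_SIZE]))
            for i in range(0, len(self.data), BLOCK_SIZE)
        ]

    def read(self, offset: int, length: int) -> bytes:
        # Verify every block overlapping the requested range before returning
        # data, so corruption is never propagated to clients or other replicas.
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        for b in range(first, last + 1):
            block = bytes(self.data[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE])
            if checksum(block) != self.checksums[b]:
                self.reported_corrupt = True  # tell the master; it re-clones
                raise IOError(f"checksum mismatch in block {b}")
        return bytes(self.data[offset:offset + length])
```

Because only blocks overlapping the requested range are verified, a read stays cheap while still guaranteeing a corrupted block can never be returned.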

Back to your question: how does GFS deal with replica write failures?

If any error is encountered during replication, the mutation is reported to the client as failed. The client must handle the error and retry the mutation. The inconsistent chunks are garbage collected during the chunkservers' regular scans.
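A minimal sketch of that client-side retry loop, under stated assumptions — `MutationFailed`, `apply_mutation`, and `FlakyPrimary` are hypothetical stand-ins for the real client library and primary chunkserver:

```python
class MutationFailed(Exception):
    """Raised when any replica in the write chain fails to apply the mutation."""

class FlakyPrimary:
    """Toy primary that fails a fixed number of attempts before succeeding."""
    def __init__(self, failures_before_success: int):
        self.failures_left = failures_before_success

    def apply_mutation(self, data: bytes) -> str:
        if self.failures_left > 0:
            self.failures_left -= 1
            raise MutationFailed("replica write failed")
        return "ok"

def write_with_retry(primary, data: bytes, max_attempts: int = 5) -> str:
    # GFS clients retry a failed mutation themselves; the bytes left behind
    # by failed attempts make the region inconsistent until it is garbage
    # collected, as described above.
    last_error = None
    for _ in range(max_attempts):
        try:
            return primary.apply_mutation(data)
        except MutationFailed as e:
            last_error = e  # a real client would also back off and may re-fetch replica locations
    raise last_error
```

For example, `write_with_retry(FlakyPrimary(2), b"payload")` fails twice, then succeeds on the third attempt.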


Your question is not very clear, but I'll answer what I think you are asking. It seems related to how the chunk version number helps in stale replica detection:

  1. The client asks the master to write to a file.

  2. The master sends a metadata response to the client. That response includes the chunk version number and indicates that a lease has been granted on a chunkserver (the primary).

  3. When granting a new lease, the master increments the chunk version number and asks all the chunkservers holding up-to-date replicas of the chunk to do the same.

  4. All of this happens before the client starts writing data to the chunk.

  5. Now say a chunkserver crashed and missed the version update.

  6. Once the chunkserver restarts, the master and chunkserver compare chunk version numbers in their heartbeat messages. The point is: if the write had been accomplished, the chunk version number on every replica would match the master's. If it does not match, a failure occurred during writing.

  7. Replicas whose version number is behind the master's are marked stale, and all those stale replicas are removed during regular garbage collection. (If the master instead sees a replica with a higher version than its own record, it assumes it failed after granting the lease and adopts the higher version as current.)
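The version-number bookkeeping in the steps above can be sketched like this. Class and method names are hypothetical; the stale/adopt-higher behavior follows the paper's description:

```python
class Replica:
    """Toy chunkserver replica: records the version of each chunk it holds."""
    def __init__(self):
        self.version = {}            # chunk handle -> version number

class Master:
    def __init__(self):
        self.version = {}            # chunk handle -> latest version granted

    def grant_lease(self, handle: str, reachable_replicas) -> int:
        # Bump the version when granting a new lease; every reachable replica
        # records the new number. A crashed replica misses this update.
        new_v = self.version.get(handle, 0) + 1
        self.version[handle] = new_v
        for r in reachable_replicas:
            r.version[handle] = new_v
        return new_v

    def heartbeat(self, handle: str, replica: Replica) -> str:
        # Compare versions when a chunkserver reports in.
        mine = self.version.get(handle, 0)
        theirs = replica.version.get(handle, 0)
        if theirs < mine:
            return "stale"           # scheduled for garbage collection
        if theirs > mine:
            # The master must have failed after granting the lease,
            # so it takes the higher version to be current.
            self.version[handle] = theirs
        return "current"
```

So a replica that was down during a lease grant comes back with an old version number and is flagged as stale on its next heartbeat, then removed by garbage collection.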