GFS/Hadoop master's storage capacity


I'm reading the GFS paper but can't understand one point: does the master maintain 64 KB of metadata for each replica of a file too? Say the master's memory is 8 GB and I store 1000 files of 1 KB each, how much memory is that going to take if the replication factor is 3?


There are 2 best solutions below


No. The detailed metadata for each replica lives only in the chunkservers' memory. The master stores only two types of chunk metadata:

  1. the chunk handle, which amounts to less than 64 bytes of metadata for each 64 MB chunk
  2. the locations of each chunk's replicas, which are maintained through regular HeartBeat messages between the chunkservers and the master (see the sketch below)
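
As a rough illustration only (not GFS source code; the class and field names here are invented), the master's in-memory state can be pictured as two maps, one persistent and one soft:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS uses fixed 64 MB chunks

class Master:
    """Toy model of the GFS master's in-memory chunk metadata."""
    def __init__(self):
        # 1. File -> ordered chunk handles. This mapping (plus the file
        #    namespace) is persisted via the operation log and checkpoints.
        self.file_to_chunks = {}   # e.g. {"/logs/a": [0x1A2B, 0x3C4D]}
        # 2. Chunk handle -> replica locations. Soft state rebuilt from
        #    chunkserver HeartBeat messages, never logged persistently.
        self.chunk_locations = {}  # e.g. {0x1A2B: {"cs1", "cs7", "cs9"}}
```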

Here are the details from the paper:

The master stores three major types of metadata: the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk’s replicas. All metadata is kept in the master’s memory.

The master does not keep a persistent record of which chunkservers have a replica of a given chunk. It simply polls chunkservers for that information at startup. The master can keep itself up-to-date thereafter because it controls all chunk placement and monitors chunkserver status with regular HeartBeat messages.

The most important sentence:

a chunkserver has the final word over what chunks it does or does not have on its own disks.
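
To make that concrete, here is a hypothetical sketch (invented function and parameter names, not the actual protocol) of a master treating each HeartBeat report as authoritative for that server's replicas:

```python
def on_heartbeat(chunk_locations, server_id, reported_chunks):
    """chunk_locations: dict of chunk handle -> set of chunkserver ids.

    The report replaces the master's previous view of this server,
    so the chunkserver keeps the final word over its own disks.
    """
    # Forget everything we previously believed this server stored ...
    for locations in chunk_locations.values():
        locations.discard(server_id)
    # ... then record exactly the chunks it reports holding now.
    for handle in reported_chunks:
        chunk_locations.setdefault(handle, set()).add(server_id)
```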


GFS maintains less than 64 bytes of master metadata for each 64 MB chunk, not for each individual file, and extra replicas do not multiply that cost; each replica only adds a small location entry for the chunk. So how much memory your 1000 files take depends on how many chunks those files occupy in total, and since even a 1 KB file occupies one chunk handle, that is 1000 chunks here.
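
A back-of-envelope estimate for the numbers in the question, assuming the paper's upper bound of 64 bytes of chunk metadata and a made-up ~8 bytes per replica location entry (the paper gives no exact figure for locations):

```python
import math

CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB chunks
META_PER_CHUNK = 64             # bytes, upper bound from the paper
LOC_ENTRY = 8                   # bytes per replica location (assumed)

files, file_size, replication = 1000, 1024, 3

# Even a 1 KB file consumes one whole chunk handle.
chunks = files * math.ceil(file_size / CHUNK_SIZE)      # 1000 chunks
chunk_meta = chunks * META_PER_CHUNK                    # 64,000 bytes
location_meta = chunks * replication * LOC_ENTRY        # 24,000 bytes

print(chunk_meta + location_meta)  # ~88 KB: negligible next to 8 GB
```

Per-file namespace entries (under 64 bytes per file, per the paper) add roughly another 64 KB, still trivial for an 8 GB master.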