I encountered a problem similar to this:
My NameNode logs in $HADOOP_HOME/logs/ show the following while the storage policy is set to ALL_SSD:
2023-05-21 09:17:31,380 DEBUG org.apache.hadoop.net.NetworkTopology: Choosing random from 4 available nodes on node /default-rack, scope=/default-rack, excludedScope=null, excludeNodes=[192.168.132.41:9866]. numOfDatanodes=5.
2023-05-21 09:17:31,380 DEBUG org.apache.hadoop.net.NetworkTopology: nthValidToReturn is 0
2023-05-21 09:17:31,380 DEBUG org.apache.hadoop.net.NetworkTopology: Chosen node 192.168.132.44:9866 from first random
2023-05-21 09:17:31,380 DEBUG org.apache.hadoop.net.NetworkTopology: chooseRandom returning 192.168.132.44:9866
2023-05-21 09:17:31,380 DEBUG org.apache.hadoop.net.NetworkTopology: Failed to find datanode (scope="" excludedScope="/default-rack"). numOfDatanodes=0
2023-05-21 09:17:31,380 DEBUG org.apache.hadoop.net.NetworkTopology: No node to choose.
2023-05-21 09:17:31,380 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: [
Datanode None is not chosen since required storage types are unavailable for storage type DISK.
2023-05-21 09:17:31,380 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not enough replicas was chosen. Reason: {NO_REQUIRED_STORAGE_TYPE=1}
2023-05-21 09:17:31,380 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to choose remote rack (location = ~/default-rack), fallback to local rack
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:914)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRemoteRack(BlockPlacementPolicyDefault.java:774)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:566)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:478)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:524)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:350)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:170)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:195)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2307)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2960)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:904)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:593)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:604)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:572)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:556)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1093)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1043)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:971)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2976)
2023-05-21 09:17:31,380 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocate blk_1113051428_39314554, replicas=192.168.132.43:9866, 192.168.132.41:9866, 192.168.132.44:9866 for /Hakim/archive_players/partitions=1/date=2023-05-20/part-00003-dfce3f68-b1f5-45eb-8f91-43da0ae42139.c000.snappy.parquet
This error log occurs repeatedly for every write operation until the NameNode crashes.
I've done whatever I can, but the problem still exists. I've tried every storage policy: ONE_SSD, ALL_SSD, HOT, and COLD.
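For reference, here is how I set and verify the policies (a minimal sketch using the standard hdfs storagepolicies CLI; /Hakim is the path that appears in the log above):

hdfs storagepolicies -listPolicies
# set a policy on the target path, e.g. ALL_SSD
hdfs storagepolicies -setStoragePolicy -path /Hakim -policy ALL_SSD
# verify which policy is actually in effect on the path
hdfs storagepolicies -getStoragePolicy -path /Hakim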
When I run lsblk, this is the result on all nodes (NameNode and all DataNodes):

The /hdfs partition is the location of the DataNode and NameNode data on all cluster nodes.
The LVM volume shown in the picture below consists of physical SSDs.
Running lsblk -d -n -o name,rota, the result is:
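(For anyone repeating this check: in lsblk output, ROTA 0 marks a non-rotational device, i.e. an SSD, and ROTA 1 a rotational disk. Illustrative output only, not my actual devices:)

lsblk -d -n -o name,rota
# sda 0   <- non-rotational (SSD)
# sdb 1   <- rotational (HDD)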
My hdfs-site.xml configuration:
<configuration>
<property>
<name>dfs.replication.min</name>
<value>1</value>
</property>
<property>
<name>dfs.replication.max</name>
<value>3</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///hdfs/hadoop_data/hdfs/nameNode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///hdfs/hadoop_data/hdfs/dataNode</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.acls.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.storage.policy.enabled</name>
<value>true</value>
</property>
<property>
<name>hadoop.security.hdfs.umask-mode</name>
<value>000</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>536870912</value>
</property>
</configuration>
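For what it's worth, my understanding is that HDFS does not detect SSDs automatically: the storage type of each DataNode volume comes from an optional tag in front of the directory in dfs.datanode.data.dir, and an untagged directory is treated as DISK. A tagged version of the data dir above would look like this (a sketch, not my current config):

<property>
<name>dfs.datanode.data.dir</name>
<value>[SSD]file:///hdfs/hadoop_data/hdfs/dataNode</value>
</property>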
My core-site.xml configuration:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.132.37:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/hdfs/hadoop_data/hdfs/tempDir</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>hadoop.proxyuser.simra.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.simra.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.server.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.server.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.user.group.static.mapping.overrides</name>
<value>dr.who=dr.who,user1,user2,user3,user4,user5,user6;</value>
</property>
</configuration>
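To make sure these files are the ones actually being picked up, the effective values can be queried (a sketch using the standard hdfs getconf tool):

hdfs getconf -confKey fs.defaultFS
hdfs getconf -confKey dfs.blocksize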
Checking the DataNode ports:
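(The check itself was roughly the following, run on each DataNode; 9866 is the DataNode transfer port seen in the logs above:)

ss -tlnp | grep 9866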
My NameNode web UI overview:
Update: one more detail. When I write a Parquet file to HDFS using PySpark, the DataFrame is partitioned into n files according to PySpark's partitioning rules (by default, n = the number of Spark worker cores). Some of these files are written successfully, while others hit the error above.
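For the files that are written successfully, the chosen block locations can be inspected like this (a sketch; the path is the one from the log above):

hdfs fsck /Hakim/archive_players -files -blocks -locations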
I've spent more than a month on this with no result, and I'm at a loss as to what to do. Can anyone help?




I encountered a similar issue. Though I don't know whether OP has this exact issue, it may help others coming across this question.
In my case, HDFS had been configured to use two storage dirs, with a storage policy restricting a specific HDFS path to a specific storage dir.
HDFS writes failed because the disk holding the first data dir was full (only this dir was allowed due to the HDFS storage policies we had). Note that the other data dir was not full, so the HDFS UI showed a disk capacity of less than 100% (which was pretty confusing).
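A quick way to spot this situation is to compare the aggregate report with per-volume usage on each DataNode (a sketch; the data dir path is whatever your dfs.datanode.data.dir points at):

# aggregate view per DataNode; can hide one full volume among several
hdfs dfsadmin -report
# per-volume view, run on each DataNode
df -h /path/to/each/data/dir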