How to run a cluster of 3 datanodes at the same time?


I run one datanode with: ./bin/hdfs datanode -conf ./etc/hadoop/datanode1.xml. Only one works; when I try to run a second one, it fails with: "datanode is running as process. Stop it first and ensure /tmp/hadoop-user-datanode.pid file is empty before retry."

Hadoop 3.3.5

core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

hdfs-site.xml:

<configuration>
    <property>
       <name>dfs.replication</name>
       <value>3</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/home/user/code/hdfs/namenode/</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/home/user/code/hdfs/datanode1,/home/user/code/hdfs/datanode2,/home/user/code/hdfs/datanode3</value>
    </property>
</configuration>

datanode1.xml:

<configuration>
    <property>
        <name>dfs.datanode.address</name>
        <value>0.0.0.0:9011</value>
    </property>
    <property>
        <name>dfs.datanode.http.address</name>
        <value>0.0.0.0:9076</value>
    </property>
    <property>
        <name>dfs.datanode.ipc.address</name>
        <value>0.0.0.0:9021</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/home/user/code/hdfs/datanode1/</value>
    </property>
</configuration>

datanode2.xml:

<configuration>
    <property>
        <name>dfs.datanode.address</name>
        <value>0.0.0.0:9012</value>
    </property>
    <property>
        <name>dfs.datanode.http.address</name>
        <value>0.0.0.0:9077</value>
    </property>
    <property>
        <name>dfs.datanode.ipc.address</name>
        <value>0.0.0.0:9022</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/home/user/code/hdfs/datanode2/</value>
    </property>
</configuration>

datanode3.xml:

<configuration>
    <property>
        <name>dfs.datanode.address</name>
        <value>0.0.0.0:9013</value>
    </property>
    <property>
        <name>dfs.datanode.http.address</name>
        <value>0.0.0.0:9078</value>
    </property>
    <property>
        <name>dfs.datanode.ipc.address</name>
        <value>0.0.0.0:9023</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/home/user/code/hdfs/datanode3/</value>
    </property>
</configuration>

There is 1 answer below.

OneCricketeer answered:

The PID file location is controlled by the HADOOP_PID_DIR environment variable (by default every instance writes /tmp/hadoop-<username>-datanode.pid, so the second datanode sees the first one's PID file and refuses to start), but you really should not run three datanodes on one machine anyway, especially if they all share one physical disk.
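A minimal sketch of the workaround, assuming your launcher honors HADOOP_PID_DIR (it is a standard setting in hadoop-env.sh, though the exact behavior of a foreground hdfs datanode may vary by version): give each instance its own PID directory so the files no longer collide.

# Hypothetical launch sequence: one PID directory per datanode instance,
# so each writes its own hadoop-user-datanode.pid instead of sharing /tmp.
mkdir -p /tmp/dn1 /tmp/dn2 /tmp/dn3

HADOOP_PID_DIR=/tmp/dn1 ./bin/hdfs datanode -conf ./etc/hadoop/datanode1.xml &
HADOOP_PID_DIR=/tmp/dn2 ./bin/hdfs datanode -conf ./etc/hadoop/datanode2.xml &
HADOOP_PID_DIR=/tmp/dn3 ./bin/hdfs datanode -conf ./etc/hadoop/datanode3.xml &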

If you want to simulate a Hadoop cluster, use Docker or VMs. Otherwise, use a managed service like Amazon EMR or Google Dataproc.
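For example, a rough sketch of the Docker route; the image names below are placeholders, not real published images, so substitute whatever Hadoop images you actually use and follow their docs for configuration:

# Hypothetical: run three datanode containers against one namenode container.
# "example/hadoop-namenode" and "example/hadoop-datanode" are placeholder names.
docker network create hadoop

docker run -d --net hadoop --name namenode  example/hadoop-namenode
docker run -d --net hadoop --name datanode1 example/hadoop-datanode
docker run -d --net hadoop --name datanode2 example/hadoop-datanode
docker run -d --net hadoop --name datanode3 example/hadoop-datanode

Because each container gets its own network namespace and filesystem, the three datanodes no longer fight over ports or the PID file.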