To keep this brief: I edited Hadoop's core-site.xml as follows.
- core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.abfs.impl</name>
    <value>org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem</value>
  </property>
  <property>
    <name>fs.AbstractFileSystem.abfs.impl</name>
    <value>org.apache.hadoop.fs.azurebfs.Abfs</value>
  </property>
  <property>
    <name>fs.azure.account.auth.type.example.dfs.core.windows.net</name>
    <value>SharedKey</value>
  </property>
  <property>
    <name>fs.azure.account.key.example.dfs.core.windows.net</name>
    <value>{{STORAGEACCOUNT ACCESS KEY HERE}}</value>
  </property>
  <property>
    <name>fs.azure.account.auth.type</name>
    <value>OAuth</value>
  </property>
  <property>
    <name>fs.azure.account.oauth.provider.type</name>
    <value>org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider</value>
  </property>
  <property>
    <name>fs.azure.account.oauth2.client.endpoint</name>
    <value>https://login.microsoftonline.com/{{MY TENANT ID}}/oauth2/token</value>
  </property>
  <property>
    <name>fs.azure.account.oauth2.client.id</name>
    <value>{{MY SERVICE PRINCIPAL ID}}</value>
  </property>
  <property>
    <name>fs.azure.account.oauth2.client.secret</name>
    <value>{{MY SERVICE PRINCIPAL SECRET}}</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>abfs://[email protected]</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/data/tmp</value>
  </property>
  <property>
    <name>fs.azure.createRemoteFileSystemDuringInitialization</name>
    <value>true</value>
  </property>
</configuration>
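A side note on the config above: the global fs.azure.account.auth.type is OAuth, while the example account is pinned to SharedKey via its per-account entry, and I'm not sure the two coexist cleanly. As I understand the hadoop-azure ABFS docs, the OAuth properties can also be scoped per account with the same `.<account>.dfs.core.windows.net` suffix already used for SharedKey. An untested sketch of what that would look like, if I wanted OAuth for that account instead:

```xml
<!-- Untested sketch: OAuth scoped to a single storage account rather than
     globally. The per-account suffix follows the same naming pattern as the
     SharedKey entries above; account name is the placeholder "example". -->
<property>
  <name>fs.azure.account.auth.type.example.dfs.core.windows.net</name>
  <value>OAuth</value>
</property>
<property>
  <name>fs.azure.account.oauth.provider.type.example.dfs.core.windows.net</name>
  <value>org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider</value>
</property>
```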
I also configured the proxy settings as follows, following a guide I found online.
export DISTCP_PROXY_OPTS="-Dhttps.proxyHost=mycluster01 -Dhttps.proxyPort=9000"
hadoop distcp \
-D mapreduce.map.java.opts="$DISTCP_PROXY_OPTS" \
-D mapreduce.reduce.java.opts="$DISTCP_PROXY_OPTS" \
-D mapreduce.job.hdfs-servers.token-renewal.exclude=server \
hdfs://mycluster01:9000/ abfs://[email protected]/
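As far as I understand, the -D mapreduce.map.java.opts / mapreduce.reduce.java.opts settings above only reach the task JVMs; the DistCp driver itself runs in a separate client-side JVM, which the hadoop launcher scripts configure from the HADOOP_OPTS environment variable. A sketch of passing the proxy flags there as well (I'm not certain this relates to my failure, but noting it for completeness):

```shell
# Sketch: also hand the proxy flags to the client-side JVM via HADOOP_OPTS,
# which the hadoop launcher scripts add to the java command line.
export DISTCP_PROXY_OPTS="-Dhttps.proxyHost=mycluster01 -Dhttps.proxyPort=9000"
export HADOOP_OPTS="$HADOOP_OPTS $DISTCP_PROXY_OPTS"
echo "$HADOOP_OPTS"
```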
Nevertheless, the job fails with the errors below. I'd appreciate any advice on how to fix this.
2024-02-28 21:22:56,611 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=false, useRdiff=false, fromSnapshot=null, toSnapshot=null, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=0.0, copyStrategy='uniformsize', preserveStatus=[], atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[hdfs://mycluster01:9000/], targetPath=abfs://[email protected]/, filtersFile='null', blocksPerChunk=0, copyBufferSize=8192, verboseLog=false, directWrite=false, useiterator=false}, sourcePaths=[hdfs://mycluster01:9000/], targetPathExists=true, preserveRawXattrs=false
2024-02-28 21:22:56,703 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
2024-02-28 21:22:57,459 INFO tools.SimpleCopyListing: Starting: Building listing using multi threaded approach for hdfs://mycluster01:9000/
2024-02-28 21:22:57,651 INFO tools.SimpleCopyListing: Building listing using multi threaded approach for hdfs://mycluster01:9000/: duration 0:00.192s
2024-02-28 21:22:57,746 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 963; dirCnt = 246
2024-02-28 21:22:57,746 INFO tools.SimpleCopyListing: Build file listing completed.
2024-02-28 21:22:57,748 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
2024-02-28 21:22:57,748 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
2024-02-28 21:22:57,987 INFO tools.DistCp: Number of paths in the copy list: 963
2024-02-28 21:22:58,176 INFO tools.DistCp: Number of paths in the copy list: 963
2024-02-28 21:22:58,193 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
2024-02-28 21:22:58,569 INFO mapreduce.JobSubmitter: number of splits:21
2024-02-28 21:22:58,764 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1709106591979_0016
2024-02-28 21:22:58,764 INFO mapreduce.JobSubmitter: Executing with tokens: []
2024-02-28 21:22:58,939 INFO conf.Configuration: resource-types.xml not found
2024-02-28 21:22:58,939 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2024-02-28 21:22:58,987 INFO impl.YarnClientImpl: Submitted application application_1709106591979_0016
2024-02-28 21:22:59,012 INFO mapreduce.Job: The url to track the job: http://mycluster01:8088/proxy/application_1709106591979_0016/
2024-02-28 21:22:59,012 INFO tools.DistCp: DistCp job-id: job_1709106591979_0016
2024-02-28 21:22:59,012 INFO mapreduce.Job: Running job: job_1709106591979_0016
2024-02-28 21:23:01,024 INFO mapreduce.Job: Job job_1709106591979_0016 running in uber mode : false
2024-02-28 21:23:01,025 INFO mapreduce.Job: map 0% reduce 0%
2024-02-28 21:23:01,032 INFO mapreduce.Job: Job job_1709106591979_0016 failed with state FAILED due to: Application application_1709106591979_0016 failed 2 times due to AM Container for appattempt_1709106591979_0016_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2024-02-28 21:23:00.875]Exception from container-launch.
Container id: container_1709106591979_0016_02_000001
Exit code: 1
[2024-02-28 21:23:00.877]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
at org.apache.hadoop.service.AbstractService.<clinit>(AbstractService.java:44)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
... 1 more
[2024-02-28 21:23:00.877]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
at org.apache.hadoop.service.AbstractService.<clinit>(AbstractService.java:44)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
... 1 more
For more detailed output, check the application tracking page: http://mycluster01:8088/cluster/app/application_1709106591979_0016 Then click on links to logs of each attempt.
. Failing the application.
2024-02-28 21:23:01,044 INFO mapreduce.Job: Counters: 0
2024-02-28 21:23:01,047 ERROR tools.DistCp: Exception encountered
java.io.IOException: DistCp failure: Job job_1709106591979_0016 has failed: Application application_1709106591979_0016 failed 2 times due to AM Container for appattempt_1709106591979_0016_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2024-02-28 21:23:00.875]Exception from container-launch.
Container id: container_1709106591979_0016_02_000001
Exit code: 1
[2024-02-28 21:23:00.877]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
at org.apache.hadoop.service.AbstractService.<clinit>(AbstractService.java:44)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
... 1 more
[2024-02-28 21:23:00.877]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
at org.apache.hadoop.service.AbstractService.<clinit>(AbstractService.java:44)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
... 1 more
For more detailed output, check the application tracking page: http://mycluster01:8088/cluster/app/application_1709106591979_0016 Then click on links to logs of each attempt.
. Failing the application.
at org.apache.hadoop.tools.DistCp.waitForJobCompletion(DistCp.java:232)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:185)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:153)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:443)