I have a fairly small setup (HDP 2.6) with roughly 1429 blocks on a 15 TB HDD. The system has 512 GB RAM and 128 cores (256 threads). Over the last few days, I've watched the startup time of the entire HDP setup go from about 10 minutes to several hours, and the culprit turned out to be the NameNode.

When the box was first set up without any data, the entire HDP + HCP stack would start in about 10 minutes (including DataNodes and the NameNode). We then started testing with large volumes of data, and over time our block count went past 23 million. At that point the system took around 3 hours to start. This was mostly due to NameNode startup time, which seems understandable given the large number of blocks.
However, even after deleting all the folders/blocks and leaving behind just 1429 blocks, the system still takes over 50 minutes to start the NameNode and come out of Safe Mode automatically.
The DataNode log pauses after the "Replica Cache file" line below and then starts displaying "Detected pause in JVM or host machine (eg GC)".
2019-10-29 00:30:01,711 INFO datanode.DataNode (LogAdapter.java:info(47)) - STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: user = hdfs
STARTUP_MSG: host = xxxx.corp.com/scrambled.private.ip.address
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.7.3.2.6.5.1100-53
STARTUP_MSG: classpath = removed for brevity
STARTUP_MSG: build = [email protected]:hortonworks/hadoop.git -r 3091053c59a62c82d82c9f778c48bde5ef0a89a1; compiled by 'jenkins' on 2019-03-13T15:40Z
STARTUP_MSG: java = 1.8.0_112
************************************************************/
2019-10-29 00:30:02,253 INFO checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(122)) - Scheduling a check for [DISK]file:/hadoop/hdfs/data/
2019-10-29 00:30:04,189 INFO datanode.BlockScanner (BlockScanner.java:<init>(180)) - Initialized block scanner with targetBytesPerSec 1048576
2019-10-29 00:30:04,193 INFO common.Util (Util.java:isDiskStatsEnabled(111)) - dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2019-10-29 00:30:04,197 INFO datanode.DataNode (DataNode.java:<init>(444)) - File descriptor passing is enabled.
2019-10-29 00:30:04,197 INFO datanode.DataNode (DataNode.java:<init>(455)) - Configured hostname is xxxx.corp.com
2019-10-29 00:30:04,197 INFO common.Util (Util.java:isDiskStatsEnabled(111)) - dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2019-10-29 00:30:04,198 WARN conf.Configuration (Configuration.java:getTimeDurationHelper(1659)) - No unit for dfs.datanode.outliers.report.interval(1800000) assuming MILLISECONDS
2019-10-29 00:30:04,204 INFO datanode.DataNode (DataNode.java:startDataNode(1251)) - Starting DataNode with maxLockedMemory = 0
2019-10-29 00:30:04,221 INFO datanode.DataNode (DataNode.java:initDataXceiver(1028)) - Opened streaming server at /0.0.0.0:50010
2019-10-29 00:30:04,223 INFO datanode.DataNode (DataXceiverServer.java:<init>(78)) - Balancing bandwith is 6250000 bytes/s
2019-10-29 00:30:04,223 INFO datanode.DataNode (DataXceiverServer.java:<init>(79)) - Number threads for balancing is 5
2019-10-29 00:30:04,225 INFO datanode.DataNode (DataXceiverServer.java:<init>(78)) - Balancing bandwith is 6250000 bytes/s
2019-10-29 00:30:04,225 INFO datanode.DataNode (DataXceiverServer.java:<init>(79)) - Number threads for balancing is 5
2019-10-29 00:30:04,226 INFO datanode.DataNode (DataNode.java:initDataXceiver(1043)) - Listening on UNIX domain socket: /var/lib/hadoop-hdfs/dn_socket
2019-10-29 00:30:04,296 INFO mortbay.log (Slf4jLog.java:info(67)) - Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2019-10-29 00:30:04,304 INFO server.AuthenticationFilter (AuthenticationFilter.java:constructSecretProvider(296)) - Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2019-10-29 00:30:04,308 INFO http.HttpRequestLog (HttpRequestLog.java:getRequestLog(80)) - Http request log for http.requests.datanode is not defined
2019-10-29 00:30:04,313 INFO http.HttpServer2 (HttpServer2.java:addGlobalFilter(788)) - Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2019-10-29 00:30:04,315 INFO http.HttpServer2 (HttpServer2.java:addFilter(763)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode
2019-10-29 00:30:04,337 INFO http.HttpServer2 (HttpServer2.java:bindListener(986)) - Jetty bound to port 43272
2019-10-29 00:30:04,338 INFO mortbay.log (Slf4jLog.java:info(67)) - jetty-6.1.26.hwx
2019-10-29 00:30:04,511 INFO mortbay.log (Slf4jLog.java:info(67)) - Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:43272
2019-10-29 00:30:07,643 INFO web.DatanodeHttpServer (DatanodeHttpServer.java:start(233)) - Listening HTTP traffic on /0.0.0.0:50075
2019-10-29 00:30:07,647 INFO util.JvmPauseMonitor (JvmPauseMonitor.java:run(179)) - Starting JVM pause monitor
2019-10-29 00:30:08,366 INFO datanode.DataNode (DataNode.java:startDataNode(1277)) - dnUserName = hdfs
2019-10-29 00:30:08,366 INFO datanode.DataNode (DataNode.java:startDataNode(1278)) - supergroup = hdfs
2019-10-29 00:30:08,579 INFO ipc.CallQueueManager (CallQueueManager.java:<init>(75)) - Using callQueue: class java.util.concurrent.LinkedBlockingQueue scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2019-10-29 00:30:08,734 INFO ipc.Server (Server.java:run(821)) - Starting Socket Reader #1 for port 8010
2019-10-29 00:30:09,244 INFO datanode.DataNode (DataNode.java:initIpcServer(941)) - Opened IPC server at /0.0.0.0:8010
2019-10-29 00:30:09,258 INFO datanode.DataNode (BlockPoolManager.java:refreshNamenodes(152)) - Refresh request received for nameservices: null
2019-10-29 00:30:09,274 INFO datanode.DataNode (BlockPoolManager.java:doRefreshNamenodes(201)) - Starting BPOfferServices for nameservices: <default>
2019-10-29 00:30:09,430 INFO datanode.DataNode (BPServiceActor.java:run(761)) - Block pool <registering> (Datanode Uuid unassigned) service to xxxx.corp.com/scrambled.private.ip.address:8020 starting to offer service
2019-10-29 00:30:09,434 INFO ipc.Server (Server.java:run(1064)) - IPC Server Responder: starting
2019-10-29 00:30:09,434 INFO ipc.Server (Server.java:run(900)) - IPC Server listener on 8010: starting
2019-10-29 00:30:10,930 INFO common.Storage (DataStorage.java:getParallelVolumeLoadThreadsNum(384)) - Using 1 threads to upgrade data directories (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
2019-10-29 00:30:10,962 INFO common.Storage (Storage.java:tryLock(776)) - Lock on /hadoop/hdfs/data/in_use.lock acquired by nodename [email protected]
2019-10-29 00:30:11,121 INFO common.Storage (BlockPoolSliceStorage.java:recoverTransitionRead(250)) - Analyzing storage directories for bpid BP-814497463-127.0.0.1-1558792659773
2019-10-29 00:30:11,121 INFO common.Storage (Storage.java:lock(735)) - Locking is disabled for /hadoop/hdfs/data/current/BP-814497463-127.0.0.1-1558792659773
2019-10-29 00:30:11,139 INFO datanode.DataNode (DataNode.java:initStorage(1546)) - Setting up storage: nsid=875919329;bpid=BP-814497463-127.0.0.1-1558792659773;lv=-56;nsInfo=lv=-63;cid=CID-49b9105e-fc0d-4ea4-9d2f-caceb95ce4bb;nsid=875919329;c=0;bpid=BP-814497463-127.0.0.1-1558792659773;dnuuid=0aff4a22-3f1a-485b-9aec-46fd881dfab0
2019-10-29 00:30:11,523 INFO impl.FsDatasetImpl (FsVolumeList.java:addVolume(295)) - Added new volume: DS-ea7ed3be-90ad-4424-a00c-577601814d81
2019-10-29 00:30:11,523 INFO impl.FsDatasetImpl (FsDatasetImpl.java:addVolume(426)) - Added volume - /hadoop/hdfs/data/current, StorageType: DISK
2019-10-29 00:30:11,527 INFO impl.FsDatasetImpl (FsDatasetImpl.java:registerMBean(2203)) - Registered FSDatasetState MBean
2019-10-29 00:30:11,711 INFO checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(122)) - Scheduling a check for /hadoop/hdfs/data/current
2019-10-29 00:30:11,719 INFO checker.DatasetVolumeChecker (DatasetVolumeChecker.java:checkAllVolumes(210)) - Scheduled health check for volume /hadoop/hdfs/data/current
2019-10-29 00:30:11,721 INFO impl.FsDatasetImpl (FsDatasetImpl.java:addBlockPool(2686)) - Adding block pool BP-814497463-127.0.0.1-1558792659773
2019-10-29 00:30:11,722 INFO impl.FsDatasetImpl (FsVolumeList.java:run(392)) - Scanning block pool BP-814497463-127.0.0.1-1558792659773 on volume /hadoop/hdfs/data/current...
2019-10-29 00:30:11,898 INFO impl.FsDatasetImpl (BlockPoolSlice.java:loadDfsUsed(251)) - Cached dfsUsed found for /hadoop/hdfs/data/current/BP-814497463-127.0.0.1-1558792659773/current: 414855315456
2019-10-29 00:30:11,901 INFO impl.FsDatasetImpl (FsVolumeList.java:run(397)) - Time taken to scan block pool BP-814497463-127.0.0.1-1558792659773 on /hadoop/hdfs/data/current: 178ms
2019-10-29 00:30:11,901 INFO impl.FsDatasetImpl (FsVolumeList.java:addBlockPool(423)) - Total time to scan all replicas for block pool BP-814497463-127.0.0.1-1558792659773: 180ms
2019-10-29 00:30:11,906 INFO impl.FsDatasetImpl (FsVolumeList.java:run(188)) - Adding replicas to map for block pool BP-814497463-127.0.0.1-1558792659773 on volume /hadoop/hdfs/data/current...
2019-10-29 00:30:11,906 INFO impl.BlockPoolSlice (BlockPoolSlice.java:readReplicasFromCache(738)) - Replica Cache file: /hadoop/hdfs/data/current/BP-814497463-127.0.0.1-1558792659773/current/replicas doesn't exist
2019-10-29 00:31:24,607 INFO timeline.HadoopTimelineMetricsSink
The corresponding NameNode log shows the following and keeps repeating "The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428."
2019-10-29 00:30:20,165 INFO namenode.NameNode (LogAdapter.java:info(47)) - STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: user = hdfs
STARTUP_MSG: host = xxxx.corp.com/scrambled.private.ip.address
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.7.3.2.6.5.1100-53
STARTUP_MSG: classpath = removed for brevity
STARTUP_MSG: build = [email protected]:hortonworks/hadoop.git -r 3091053c59a62c82d82c9f778c48bde5ef0a89a1; compiled by 'jenkins' on 2019-03-13T15:40Z
STARTUP_MSG: java = 1.8.0_112
************************************************************/
2019-10-29 00:30:20,176 INFO namenode.NameNode (NameNode.java:createNameNode(1624)) - createNameNode []
2019-10-29 00:30:20,747 INFO namenode.NameNode (NameNode.java:setClientNamenodeAddress(450)) - fs.defaultFS is hdfs://xxxx.corp.com:8020
2019-10-29 00:30:20,748 INFO namenode.NameNode (NameNode.java:setClientNamenodeAddress(470)) - Clients are to use xxxx.corp.com:8020 to access this namenode/service.
2019-10-29 00:30:20,866 INFO util.JvmPauseMonitor (JvmPauseMonitor.java:run(179)) - Starting JVM pause monitor
2019-10-29 00:30:20,874 INFO hdfs.DFSUtil (DFSUtil.java:httpServerTemplateForNNAndJN(1803)) - Starting Web-server for hdfs at: http://xxxx.corp.com:50070
2019-10-29 00:30:20,923 INFO server.AuthenticationFilter (AuthenticationFilter.java:constructSecretProvider(296)) - Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2019-10-29 00:30:20,927 INFO http.HttpRequestLog (HttpRequestLog.java:getRequestLog(80)) - Http request log for http.requests.namenode is not defined
2019-10-29 00:30:20,931 INFO http.HttpServer2 (HttpServer2.java:addGlobalFilter(788)) - Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2019-10-29 00:30:20,933 INFO http.HttpServer2 (HttpServer2.java:addFilter(763)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context hdfs
2019-10-29 00:30:20,933 INFO http.HttpServer2 (HttpServer2.java:addFilter(771)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2019-10-29 00:30:20,933 INFO http.HttpServer2 (HttpServer2.java:addFilter(771)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2019-10-29 00:30:20,934 INFO security.HttpCrossOriginFilterInitializer (HttpCrossOriginFilterInitializer.java:initFilter(49)) - CORS filter not enabled. Please set hadoop.http.cross-origin.enabled to 'true' to enable it
2019-10-29 00:30:20,953 INFO http.HttpServer2 (NameNodeHttpServer.java:initWebHdfs(93)) - Added filter 'org.apache.hadoop.hdfs.web.AuthFilter' (class=org.apache.hadoop.hdfs.web.AuthFilter)
2019-10-29 00:30:20,954 INFO http.HttpServer2 (HttpServer2.java:addJerseyResourcePackage(687)) - addJerseyResourcePackage: packageName=org.apache.hadoop.hdfs.server.namenode.web.resources;org.apache.hadoop.hdfs.web.resources, pathSpec=/webhdfs/v1/*
2019-10-29 00:30:20,961 INFO http.HttpServer2 (HttpServer2.java:bindListener(986)) - Jetty bound to port 50070
2019-10-29 00:30:20,962 INFO mortbay.log (Slf4jLog.java:info(67)) - jetty-6.1.26.hwx
2019-10-29 00:30:20,986 WARN mortbay.log (Slf4jLog.java:warn(76)) - Can't reuse /tmp/Jetty_xxxx_corp_com_50070_hdfs____ggu70m, using /tmp/Jetty_xxxx_corp_com_50070_hdfs____ggu70m_2845921744604868870
2019-10-29 00:30:21,121 INFO mortbay.log (Slf4jLog.java:info(67)) - Started [email protected]:50070
2019-10-29 00:30:21,143 WARN common.Util (Util.java:stringAsURI(57)) - Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
2019-10-29 00:30:21,143 WARN common.Util (Util.java:stringAsURI(57)) - Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
2019-10-29 00:30:21,143 WARN namenode.FSNamesystem (FSNamesystem.java:checkConfiguration(690)) - Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!
2019-10-29 00:30:21,143 WARN namenode.FSNamesystem (FSNamesystem.java:checkConfiguration(695)) - Only one namespace edits storage directory (dfs.namenode.edits.dir) configured. Beware of data loss due to lack of redundant storage directories!
2019-10-29 00:30:21,148 WARN common.Util (Util.java:stringAsURI(57)) - Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
2019-10-29 00:30:21,148 WARN common.Util (Util.java:stringAsURI(57)) - Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
2019-10-29 00:30:21,153 WARN common.Storage (NNStorage.java:setRestoreFailedStorage(208)) - set restore failed storage to true
2019-10-29 00:30:21,172 INFO namenode.FSEditLog (FSEditLog.java:newInstance(225)) - Edit logging is async:false
2019-10-29 00:30:21,176 INFO namenode.FSNamesystem (FSNamesystem.java:<init>(759)) - No KeyProvider found.
2019-10-29 00:30:21,176 INFO namenode.FSNamesystem (FSNamesystem.java:<init>(765)) - Enabling async auditlog
2019-10-29 00:30:21,178 INFO namenode.FSNamesystem (FSNamesystem.java:<init>(769)) - fsLock is fair:false
2019-10-29 00:30:21,204 INFO blockmanagement.HeartbeatManager (HeartbeatManager.java:<init>(91)) - Setting heartbeat recheck interval to 30000 since dfs.namenode.stale.datanode.interval is less than dfs.namenode.heartbeat.recheck-interval
2019-10-29 00:30:21,207 INFO common.Util (Util.java:isDiskStatsEnabled(111)) - dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2019-10-29 00:30:21,214 INFO blockmanagement.DatanodeManager (DatanodeManager.java:<init>(274)) - dfs.block.invalidate.limit=1000
2019-10-29 00:30:21,214 INFO blockmanagement.DatanodeManager (DatanodeManager.java:<init>(280)) - dfs.namenode.datanode.registration.ip-hostname-check=true
2019-10-29 00:30:21,215 INFO blockmanagement.BlockManager (InvalidateBlocks.java:printBlockDeletionTime(71)) - dfs.namenode.startup.delay.block.deletion.sec is set to 000:01:00:00.000
2019-10-29 00:30:21,215 INFO blockmanagement.BlockManager (InvalidateBlocks.java:printBlockDeletionTime(76)) - The block deletion will start around 2019 Oct 29 01:30:21
2019-10-29 00:30:21,217 INFO util.GSet (LightWeightGSet.java:computeCapacity(395)) - Computing capacity for map BlocksMap
2019-10-29 00:30:21,217 INFO util.GSet (LightWeightGSet.java:computeCapacity(396)) - VM type = 64-bit
2019-10-29 00:30:21,218 INFO util.GSet (LightWeightGSet.java:computeCapacity(397)) - 2.0% max memory 1011.3 MB = 20.2 MB
2019-10-29 00:30:21,218 INFO util.GSet (LightWeightGSet.java:computeCapacity(402)) - capacity = 2^21 = 2097152 entries
2019-10-29 00:30:21,231 INFO blockmanagement.BlockManager (BlockManager.java:createBlockTokenSecretManager(409)) - dfs.block.access.token.enable=true
2019-10-29 00:30:21,231 INFO blockmanagement.BlockManager (BlockManager.java:createBlockTokenSecretManager(430)) - dfs.block.access.key.update.interval=600 min(s), dfs.block.access.token.lifetime=600 min(s), dfs.encrypt.data.transfer.algorithm=null
2019-10-29 00:30:21,354 INFO blockmanagement.BlockManager (BlockManager.java:<init>(395)) - defaultReplication = 1
2019-10-29 00:30:21,354 INFO blockmanagement.BlockManager (BlockManager.java:<init>(396)) - maxReplication = 50
2019-10-29 00:30:21,354 INFO blockmanagement.BlockManager (BlockManager.java:<init>(397)) - minReplication = 1
2019-10-29 00:30:21,354 INFO blockmanagement.BlockManager (BlockManager.java:<init>(398)) - maxReplicationStreams = 2
2019-10-29 00:30:21,354 INFO blockmanagement.BlockManager (BlockManager.java:<init>(399)) - replicationRecheckInterval = 3000
2019-10-29 00:30:21,354 INFO blockmanagement.BlockManager (BlockManager.java:<init>(400)) - encryptDataTransfer = false
2019-10-29 00:30:21,354 INFO blockmanagement.BlockManager (BlockManager.java:<init>(401)) - maxNumBlocksToLog = 1000
2019-10-29 00:30:21,360 INFO namenode.FSNamesystem (FSNamesystem.java:<init>(790)) - fsOwner = hdfs (auth:SIMPLE)
2019-10-29 00:30:21,360 INFO namenode.FSNamesystem (FSNamesystem.java:<init>(791)) - supergroup = hdfs
2019-10-29 00:30:21,360 INFO namenode.FSNamesystem (FSNamesystem.java:<init>(792)) - isPermissionEnabled = true
2019-10-29 00:30:21,360 INFO namenode.FSNamesystem (FSNamesystem.java:<init>(803)) - HA Enabled: false
2019-10-29 00:30:21,361 INFO namenode.FSNamesystem (FSNamesystem.java:<init>(843)) - Append Enabled: true
2019-10-29 00:30:21,388 INFO util.GSet (LightWeightGSet.java:computeCapacity(395)) - Computing capacity for map INodeMap
2019-10-29 00:30:21,388 INFO util.GSet (LightWeightGSet.java:computeCapacity(396)) - VM type = 64-bit
2019-10-29 00:30:21,388 INFO util.GSet (LightWeightGSet.java:computeCapacity(397)) - 1.0% max memory 1011.3 MB = 10.1 MB
2019-10-29 00:30:21,389 INFO util.GSet (LightWeightGSet.java:computeCapacity(402)) - capacity = 2^20 = 1048576 entries
2019-10-29 00:30:21,393 INFO namenode.FSDirectory (FSDirectory.java:<init>(256)) - ACLs enabled? false
2019-10-29 00:30:21,393 INFO namenode.FSDirectory (FSDirectory.java:<init>(260)) - XAttrs enabled? true
2019-10-29 00:30:21,393 INFO namenode.FSDirectory (FSDirectory.java:<init>(268)) - Maximum size of an xattr: 16384
2019-10-29 00:30:21,393 INFO namenode.NameNode (FSDirectory.java:<init>(321)) - Caching file names occuring more than 10 times
2019-10-29 00:30:21,399 INFO util.GSet (LightWeightGSet.java:computeCapacity(395)) - Computing capacity for map cachedBlocks
2019-10-29 00:30:21,399 INFO util.GSet (LightWeightGSet.java:computeCapacity(396)) - VM type = 64-bit
2019-10-29 00:30:21,400 INFO util.GSet (LightWeightGSet.java:computeCapacity(397)) - 0.25% max memory 1011.3 MB = 2.5 MB
2019-10-29 00:30:21,400 INFO util.GSet (LightWeightGSet.java:computeCapacity(402)) - capacity = 2^18 = 262144 entries
2019-10-29 00:30:21,402 INFO namenode.FSNamesystem (FSNamesystem.java:<init>(5582)) - dfs.namenode.safemode.threshold-pct = 1.0
2019-10-29 00:30:21,402 INFO namenode.FSNamesystem (FSNamesystem.java:<init>(5583)) - dfs.namenode.safemode.min.datanodes = 0
2019-10-29 00:30:21,402 INFO namenode.FSNamesystem (FSNamesystem.java:<init>(5584)) - dfs.namenode.safemode.extension = 30000
2019-10-29 00:30:21,405 INFO metrics.TopMetrics (TopMetrics.java:logConf(76)) - NNTop conf: dfs.namenode.top.window.num.buckets = 10
2019-10-29 00:30:21,405 INFO metrics.TopMetrics (TopMetrics.java:logConf(78)) - NNTop conf: dfs.namenode.top.num.users = 10
2019-10-29 00:30:21,405 INFO metrics.TopMetrics (TopMetrics.java:logConf(80)) - NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2019-10-29 00:30:21,408 INFO namenode.FSNamesystem (FSNamesystem.java:initRetryCache(971)) - Retry cache on namenode is enabled
2019-10-29 00:30:21,408 INFO namenode.FSNamesystem (FSNamesystem.java:initRetryCache(979)) - Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2019-10-29 00:30:21,410 INFO util.GSet (LightWeightGSet.java:computeCapacity(395)) - Computing capacity for map NameNodeRetryCache
2019-10-29 00:30:21,411 INFO util.GSet (LightWeightGSet.java:computeCapacity(396)) - VM type = 64-bit
2019-10-29 00:30:21,411 INFO util.GSet (LightWeightGSet.java:computeCapacity(397)) - 0.029999999329447746% max memory 1011.3 MB = 310.7 KB
2019-10-29 00:30:21,411 INFO util.GSet (LightWeightGSet.java:computeCapacity(402)) - capacity = 2^15 = 32768 entries
2019-10-29 00:30:21,456 INFO common.Storage (Storage.java:tryLock(776)) - Lock on /home/hadoop/hdfs/namenode/in_use.lock acquired by nodename [email protected]
2019-10-29 00:30:21,503 INFO namenode.FileJournalManager (FileJournalManager.java:recoverUnfinalizedSegments(388)) - Recovering unfinalized segments in /home/hadoop/hdfs/namenode/current
2019-10-29 00:30:21,527 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file /home/hadoop/hdfs/namenode/current/edits_inprogress_0000000000199282266 -> /home/hadoop/hdfs/namenode/current/edits_0000000000199282266-0000000000199282266
2019-10-29 00:30:21,532 INFO namenode.FSImage (FSImage.java:loadFSImageFile(745)) - Planning to load image: FSImageFile(file=/home/hadoop/hdfs/namenode/current/fsimage_0000000000199282232, cpktTxId=0000000000199282232)
2019-10-29 00:30:21,562 INFO namenode.FSImageFormatPBINode (FSImageFormatPBINode.java:loadINodeSection(257)) - Loading 1993 INodes.
2019-10-29 00:30:21,724 INFO namenode.FSImageFormatProtobuf (FSImageFormatProtobuf.java:load(184)) - Loaded FSImage in 0 seconds.
2019-10-29 00:30:21,725 INFO namenode.FSImage (FSImage.java:loadFSImage(911)) - Loaded image for txid 199282232 from /home/hadoop/hdfs/namenode/current/fsimage_0000000000199282232
2019-10-29 00:30:21,725 INFO namenode.FSImage (FSImage.java:loadEdits(849)) - Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@63fd4873 expecting start txid #199282233
2019-10-29 00:30:21,726 INFO namenode.FSImage (FSEditLogLoader.java:loadFSEdits(142)) - Start loading edits file /home/hadoop/hdfs/namenode/current/edits_0000000000199282233-0000000000199282265
2019-10-29 00:30:21,729 INFO namenode.RedundantEditLogInputStream (RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream '/home/hadoop/hdfs/namenode/current/edits_0000000000199282233-0000000000199282265' to transaction ID 199282233
2019-10-29 00:30:21,752 INFO namenode.FSImage (FSEditLogLoader.java:loadFSEdits(145)) - Edits file /home/hadoop/hdfs/namenode/current/edits_0000000000199282233-0000000000199282265 of size 1048576 edits # 33 loaded in 0 seconds
2019-10-29 00:30:21,752 INFO namenode.FSImage (FSImage.java:loadEdits(849)) - Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@1e11bc55 expecting start txid #199282266
2019-10-29 00:30:21,752 INFO namenode.FSImage (FSEditLogLoader.java:loadFSEdits(142)) - Start loading edits file /home/hadoop/hdfs/namenode/current/edits_0000000000199282266-0000000000199282266
2019-10-29 00:30:21,752 INFO namenode.RedundantEditLogInputStream (RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream '/home/hadoop/hdfs/namenode/current/edits_0000000000199282266-0000000000199282266' to transaction ID 199282233
2019-10-29 00:30:21,753 INFO namenode.FSImage (FSEditLogLoader.java:loadFSEdits(145)) - Edits file /home/hadoop/hdfs/namenode/current/edits_0000000000199282266-0000000000199282266 of size 1048576 edits # 1 loaded in 0 seconds
2019-10-29 00:30:21,753 INFO namenode.FSNamesystem (FSNamesystem.java:loadFSImage(1083)) - Need to save fs image? false (staleImage=false, haEnabled=false, isRollingUpgrade=false)
2019-10-29 00:30:21,754 INFO namenode.FSEditLog (FSEditLog.java:startLogSegment(1294)) - Starting log segment at 199282267
2019-10-29 00:30:21,880 INFO namenode.NameCache (NameCache.java:initialized(143)) - initialized with 8 entries 214 lookups
2019-10-29 00:30:21,881 INFO namenode.FSNamesystem (FSNamesystem.java:loadFromDisk(731)) - Finished loading FSImage in 465 msecs
2019-10-29 00:30:22,002 INFO namenode.NameNode (NameNodeRpcServer.java:<init>(428)) - RPC server is binding to xxxx.corp.com:8020
2019-10-29 00:30:22,007 INFO ipc.CallQueueManager (CallQueueManager.java:<init>(75)) - Using callQueue: class java.util.concurrent.LinkedBlockingQueue scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2019-10-29 00:30:22,015 INFO ipc.Server (Server.java:run(821)) - Starting Socket Reader #1 for port 8020
2019-10-29 00:30:22,049 INFO namenode.FSNamesystem (FSNamesystem.java:registerMBean(6517)) - Registered FSNamesystemState MBean
2019-10-29 00:30:22,050 WARN common.Util (Util.java:stringAsURI(57)) - Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
2019-10-29 00:30:22,064 INFO namenode.LeaseManager (LeaseManager.java:getNumUnderConstructionBlocks(139)) - Number of blocks under construction: 0
2019-10-29 00:30:22,065 INFO hdfs.StateChange (FSNamesystem.java:reportStatus(5952)) - STATE* Safe mode ON.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:30:22,075 INFO blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateHeartbeatState(401)) - Number of failed storage changes from 0 to 0
2019-10-29 00:30:22,077 INFO block.BlockTokenSecretManager (BlockTokenSecretManager.java:updateKeys(222)) - Updating block keys
2019-10-29 00:30:22,095 INFO ipc.Server (Server.java:run(1064)) - IPC Server Responder: starting
2019-10-29 00:30:22,095 INFO ipc.Server (Server.java:run(900)) - IPC Server listener on 8020: starting
2019-10-29 00:30:22,115 INFO namenode.NameNode (NameNode.java:startCommonServices(885)) - NameNode RPC up at: xxxx.corp.com/scrambled.private.ip.address:8020
2019-10-29 00:30:22,116 INFO namenode.FSNamesystem (FSNamesystem.java:startActiveServices(1191)) - Starting services required for active state
2019-10-29 00:30:22,116 INFO namenode.FSDirectory (FSDirectory.java:updateCountForQuota(708)) - Initializing quota with 4 thread(s)
2019-10-29 00:30:22,127 INFO namenode.FSDirectory (FSDirectory.java:updateCountForQuota(717)) - Quota initialization completed in 11 milliseconds
name space=1995
storage space=6473571992
storage types=RAM_DISK=0, SSD=0, DISK=0, ARCHIVE=0
2019-10-29 00:30:22,131 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(161)) - Starting CacheReplicationMonitor with interval 30000 milliseconds
2019-10-29 00:30:22,525 INFO fs.TrashPolicyDefault (TrashPolicyDefault.java:<init>(228)) - The configured checkpoint interval is 0 minutes. Using an interval of 60 minutes that is used for deletion instead
2019-10-29 00:31:52,817 INFO ipc.Server (Server.java:logException(2428)) - IPC Server handler 29 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs from scrambled.private.ip.address:55080 Call#143 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/anonymous/393adcb5-cd66-43a7-ab38-e759f5daf88e. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
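For my own sanity, here is my reading of the safe-mode arithmetic that message describes (this is a sketch based on the dfs.namenode.safemode.threshold-pct setting from my log, not the actual NameNode code; the 1429-vs-1428 discrepancy in the message itself is part of what confuses me):

```python
# Sketch of the HDFS safe-mode gate as I understand it: the NameNode stays
# in safe mode until DataNodes have reported at least
# threshold_pct * total_blocks blocks, then waits out the
# dfs.namenode.safemode.extension grace period (30000 ms in my log).

def blocks_still_needed(total_blocks: int, reported_blocks: int,
                        threshold_pct: float) -> int:
    """How many more block reports are needed before the threshold is met."""
    block_threshold = int(total_blocks * threshold_pct)
    return max(0, block_threshold - reported_blocks)

# Numbers from my NameNode log: threshold 1.0000, 1428 total, 0 reported.
print(blocks_still_needed(1428, 0, 1.0))  # 1428 by this reading, yet the
                                          # log says 1429 blocks are needed
```

By this arithmetic the NameNode should only be waiting on the DataNode's block report, so the hour-long delay would have to come from the DataNode side never finishing its report.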
What exactly is going on and how do I go about troubleshooting this? I tried increasing the HeapSize for both NameNode and DataNode as well. The GC message from DataNode disappear but I still see them in NameNode logs when it reads iNODES.
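In case it helps, this is roughly how I enabled GC logging to confirm the pauses (a hadoop-env.sh sketch; HADOOP_NAMENODE_OPTS and the Java 8 GC flags are standard, but the heap size and log path here are placeholders, not my real values):

```shell
# hadoop-env.sh fragment: turn on GC logging for the NameNode so the
# "Detected pause in JVM or host machine" warnings can be correlated
# with actual GC events. Heap size and log path are placeholders.
export HADOOP_NAMENODE_OPTS="-Xms4g -Xmx4g \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -Xloggc:/var/log/hadoop/hdfs/namenode-gc.log \
  ${HADOOP_NAMENODE_OPTS}"
```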
Any help will be greatly appreciated.