HDFS NameNode startup very slow with few blocks

1.3k Views Asked by At

I have a fairly small setup (HDP 2.6) with roughly 1429 blocks on a 15 TB HDD. The system has 512 GB RAM and 128 cores (256 threads). Over last few days, I've seen the startup of entire HDP setup go from about 10 minutes to several hours. The culprit turned out to be the NameNode.When the box was first setup without any data, the entire HDP + HCP setup would startup in about 10 minutes (including data and name nodes). We start testing with large volumes of data and over time our block went over 23 million. At this point the system took around 3 hours to start. This was mostly due to NameNode startup time, which seems understandable given the large number of blocks.

However, even after deleting all the folders/blocks and leaving behind just 1429 blocks, the system is still taking over 50 minutes to start name node and come out of Safe Mode automatically.

The DataNode logs pause after the Replica Cache line below and then start displaying "Detected pause in JVM or host machine (eg GC)".

************************************************************/
2019-10-29 00:30:01,711 INFO  datanode.DataNode (LogAdapter.java:info(47)) - STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   user = hdfs
STARTUP_MSG:   host = xxxx.corp.com/scrambled.private.ip.address
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.7.3.2.6.5.1100-53
STARTUP_MSG:   classpath = removed for brevity
STARTUP_MSG:   build = [email protected]:hortonworks/hadoop.git -r 3091053c59a62c82d82c9f778c48bde5ef0a89a1; compiled by 'jenkins' on 2019-03-13T15:40Z
STARTUP_MSG:   java = 1.8.0_112
************************************************************/
2019-10-29 00:30:02,253 INFO  checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(122)) - Scheduling a check for [DISK]file:/hadoop/hdfs/data/
2019-10-29 00:30:04,189 INFO  datanode.BlockScanner (BlockScanner.java:<init>(180)) - Initialized block scanner with targetBytesPerSec 1048576
2019-10-29 00:30:04,193 INFO  common.Util (Util.java:isDiskStatsEnabled(111)) - dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2019-10-29 00:30:04,197 INFO  datanode.DataNode (DataNode.java:<init>(444)) - File descriptor passing is enabled.
2019-10-29 00:30:04,197 INFO  datanode.DataNode (DataNode.java:<init>(455)) - Configured hostname is xxxx.corp.com
2019-10-29 00:30:04,197 INFO  common.Util (Util.java:isDiskStatsEnabled(111)) - dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2019-10-29 00:30:04,198 WARN  conf.Configuration (Configuration.java:getTimeDurationHelper(1659)) - No unit for dfs.datanode.outliers.report.interval(1800000) assuming MILLISECONDS
2019-10-29 00:30:04,204 INFO  datanode.DataNode (DataNode.java:startDataNode(1251)) - Starting DataNode with maxLockedMemory = 0
2019-10-29 00:30:04,221 INFO  datanode.DataNode (DataNode.java:initDataXceiver(1028)) - Opened streaming server at /0.0.0.0:50010
2019-10-29 00:30:04,223 INFO  datanode.DataNode (DataXceiverServer.java:<init>(78)) - Balancing bandwith is 6250000 bytes/s
2019-10-29 00:30:04,223 INFO  datanode.DataNode (DataXceiverServer.java:<init>(79)) - Number threads for balancing is 5
2019-10-29 00:30:04,225 INFO  datanode.DataNode (DataXceiverServer.java:<init>(78)) - Balancing bandwith is 6250000 bytes/s
2019-10-29 00:30:04,225 INFO  datanode.DataNode (DataXceiverServer.java:<init>(79)) - Number threads for balancing is 5
2019-10-29 00:30:04,226 INFO  datanode.DataNode (DataNode.java:initDataXceiver(1043)) - Listening on UNIX domain socket: /var/lib/hadoop-hdfs/dn_socket
2019-10-29 00:30:04,296 INFO  mortbay.log (Slf4jLog.java:info(67)) - Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2019-10-29 00:30:04,304 INFO  server.AuthenticationFilter (AuthenticationFilter.java:constructSecretProvider(296)) - Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2019-10-29 00:30:04,308 INFO  http.HttpRequestLog (HttpRequestLog.java:getRequestLog(80)) - Http request log for http.requests.datanode is not defined
2019-10-29 00:30:04,313 INFO  http.HttpServer2 (HttpServer2.java:addGlobalFilter(788)) - Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2019-10-29 00:30:04,315 INFO  http.HttpServer2 (HttpServer2.java:addFilter(763)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode
2019-10-29 00:30:04,337 INFO  http.HttpServer2 (HttpServer2.java:bindListener(986)) - Jetty bound to port 43272
2019-10-29 00:30:04,338 INFO  mortbay.log (Slf4jLog.java:info(67)) - jetty-6.1.26.hwx
2019-10-29 00:30:04,511 INFO  mortbay.log (Slf4jLog.java:info(67)) - Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:43272
2019-10-29 00:30:07,643 INFO  web.DatanodeHttpServer (DatanodeHttpServer.java:start(233)) - Listening HTTP traffic on /0.0.0.0:50075
2019-10-29 00:30:07,647 INFO  util.JvmPauseMonitor (JvmPauseMonitor.java:run(179)) - Starting JVM pause monitor
2019-10-29 00:30:08,366 INFO  datanode.DataNode (DataNode.java:startDataNode(1277)) - dnUserName = hdfs
2019-10-29 00:30:08,366 INFO  datanode.DataNode (DataNode.java:startDataNode(1278)) - supergroup = hdfs
2019-10-29 00:30:08,579 INFO  ipc.CallQueueManager (CallQueueManager.java:<init>(75)) - Using callQueue: class java.util.concurrent.LinkedBlockingQueue scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2019-10-29 00:30:08,734 INFO  ipc.Server (Server.java:run(821)) - Starting Socket Reader #1 for port 8010
2019-10-29 00:30:09,244 INFO  datanode.DataNode (DataNode.java:initIpcServer(941)) - Opened IPC server at /0.0.0.0:8010
2019-10-29 00:30:09,258 INFO  datanode.DataNode (BlockPoolManager.java:refreshNamenodes(152)) - Refresh request received for nameservices: null
2019-10-29 00:30:09,274 INFO  datanode.DataNode (BlockPoolManager.java:doRefreshNamenodes(201)) - Starting BPOfferServices for nameservices: <default>
2019-10-29 00:30:09,430 INFO  datanode.DataNode (BPServiceActor.java:run(761)) - Block pool <registering> (Datanode Uuid unassigned) service to xxxx.corp.com/scrambled.private.ip.address:8020 starting to offer service
2019-10-29 00:30:09,434 INFO  ipc.Server (Server.java:run(1064)) - IPC Server Responder: starting
2019-10-29 00:30:09,434 INFO  ipc.Server (Server.java:run(900)) - IPC Server listener on 8010: starting
2019-10-29 00:30:10,930 INFO  common.Storage (DataStorage.java:getParallelVolumeLoadThreadsNum(384)) - Using 1 threads to upgrade data directories (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
2019-10-29 00:30:10,962 INFO  common.Storage (Storage.java:tryLock(776)) - Lock on /hadoop/hdfs/data/in_use.lock acquired by nodename [email protected]
2019-10-29 00:30:11,121 INFO  common.Storage (BlockPoolSliceStorage.java:recoverTransitionRead(250)) - Analyzing storage directories for bpid BP-814497463-127.0.0.1-1558792659773
2019-10-29 00:30:11,121 INFO  common.Storage (Storage.java:lock(735)) - Locking is disabled for /hadoop/hdfs/data/current/BP-814497463-127.0.0.1-1558792659773
2019-10-29 00:30:11,139 INFO  datanode.DataNode (DataNode.java:initStorage(1546)) - Setting up storage: nsid=875919329;bpid=BP-814497463-127.0.0.1-1558792659773;lv=-56;nsInfo=lv=-63;cid=CID-49b9105e-fc0d-4ea4-9d2f-caceb95ce4bb;nsid=875919329;c=0;bpid=BP-814497463-127.0.0.1-1558792659773;dnuuid=0aff4a22-3f1a-485b-9aec-46fd881dfab0
2019-10-29 00:30:11,523 INFO  impl.FsDatasetImpl (FsVolumeList.java:addVolume(295)) - Added new volume: DS-ea7ed3be-90ad-4424-a00c-577601814d81
2019-10-29 00:30:11,523 INFO  impl.FsDatasetImpl (FsDatasetImpl.java:addVolume(426)) - Added volume - /hadoop/hdfs/data/current, StorageType: DISK
2019-10-29 00:30:11,527 INFO  impl.FsDatasetImpl (FsDatasetImpl.java:registerMBean(2203)) - Registered FSDatasetState MBean
2019-10-29 00:30:11,711 INFO  checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(122)) - Scheduling a check for /hadoop/hdfs/data/current
2019-10-29 00:30:11,719 INFO  checker.DatasetVolumeChecker (DatasetVolumeChecker.java:checkAllVolumes(210)) - Scheduled health check for volume /hadoop/hdfs/data/current
2019-10-29 00:30:11,721 INFO  impl.FsDatasetImpl (FsDatasetImpl.java:addBlockPool(2686)) - Adding block pool BP-814497463-127.0.0.1-1558792659773
2019-10-29 00:30:11,722 INFO  impl.FsDatasetImpl (FsVolumeList.java:run(392)) - Scanning block pool BP-814497463-127.0.0.1-1558792659773 on volume /hadoop/hdfs/data/current...
2019-10-29 00:30:11,898 INFO  impl.FsDatasetImpl (BlockPoolSlice.java:loadDfsUsed(251)) - Cached dfsUsed found for /hadoop/hdfs/data/current/BP-814497463-127.0.0.1-1558792659773/current: 414855315456
2019-10-29 00:30:11,901 INFO  impl.FsDatasetImpl (FsVolumeList.java:run(397)) - Time taken to scan block pool BP-814497463-127.0.0.1-1558792659773 on /hadoop/hdfs/data/current: 178ms
2019-10-29 00:30:11,901 INFO  impl.FsDatasetImpl (FsVolumeList.java:addBlockPool(423)) - Total time to scan all replicas for block pool BP-814497463-127.0.0.1-1558792659773: 180ms
2019-10-29 00:30:11,906 INFO  impl.FsDatasetImpl (FsVolumeList.java:run(188)) - Adding replicas to map for block pool BP-814497463-127.0.0.1-1558792659773 on volume /hadoop/hdfs/data/current...
2019-10-29 00:30:11,906 INFO  impl.BlockPoolSlice (BlockPoolSlice.java:readReplicasFromCache(738)) - Replica Cache file: /hadoop/hdfs/data/current/BP-814497463-127.0.0.1-1558792659773/current/replicas doesn't exist
2019-10-29 00:31:24,607 INFO  timeline.HadoopTimelineMetricsSink 

The corresponding NameNode log shows the following and keeps repeating "The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428."

***********************************************/
2019-10-29 00:30:20,165 INFO  namenode.NameNode (LogAdapter.java:info(47)) - STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   user = hdfs
STARTUP_MSG:   host = xxxx.corp.com/scrambled.private.ip.address
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.7.3.2.6.5.1100-53
STARTUP_MSG:   classpath = removed for brevity
STARTUP_MSG:   build = [email protected]:hortonworks/hadoop.git -r 3091053c59a62c82d82c9f778c48bde5ef0a89a1; compiled by 'jenkins' on 2019-03-13T15:40Z
STARTUP_MSG:   java = 1.8.0_112
***************/
2019-10-29 00:30:20,176 INFO  namenode.NameNode (NameNode.java:createNameNode(1624)) - createNameNode []
2019-10-29 00:30:20,747 INFO  namenode.NameNode (NameNode.java:setClientNamenodeAddress(450)) - fs.defaultFS is hdfs://xxxx.corp.com:8020
2019-10-29 00:30:20,748 INFO  namenode.NameNode (NameNode.java:setClientNamenodeAddress(470)) - Clients are to use xxxx.corp.com:8020 to access this namenode/service.
2019-10-29 00:30:20,866 INFO  util.JvmPauseMonitor (JvmPauseMonitor.java:run(179)) - Starting JVM pause monitor
2019-10-29 00:30:20,874 INFO  hdfs.DFSUtil (DFSUtil.java:httpServerTemplateForNNAndJN(1803)) - Starting Web-server for hdfs at: http://xxxx.corp.com:50070
2019-10-29 00:30:20,923 INFO  server.AuthenticationFilter (AuthenticationFilter.java:constructSecretProvider(296)) - Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2019-10-29 00:30:20,927 INFO  http.HttpRequestLog (HttpRequestLog.java:getRequestLog(80)) - Http request log for http.requests.namenode is not defined
2019-10-29 00:30:20,931 INFO  http.HttpServer2 (HttpServer2.java:addGlobalFilter(788)) - Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2019-10-29 00:30:20,933 INFO  http.HttpServer2 (HttpServer2.java:addFilter(763)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context hdfs
2019-10-29 00:30:20,933 INFO  http.HttpServer2 (HttpServer2.java:addFilter(771)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2019-10-29 00:30:20,933 INFO  http.HttpServer2 (HttpServer2.java:addFilter(771)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2019-10-29 00:30:20,934 INFO  security.HttpCrossOriginFilterInitializer (HttpCrossOriginFilterInitializer.java:initFilter(49)) - CORS filter not enabled. Please set hadoop.http.cross-origin.enabled to 'true' to enable it
2019-10-29 00:30:20,953 INFO  http.HttpServer2 (NameNodeHttpServer.java:initWebHdfs(93)) - Added filter 'org.apache.hadoop.hdfs.web.AuthFilter' (class=org.apache.hadoop.hdfs.web.AuthFilter)
2019-10-29 00:30:20,954 INFO  http.HttpServer2 (HttpServer2.java:addJerseyResourcePackage(687)) - addJerseyResourcePackage: packageName=org.apache.hadoop.hdfs.server.namenode.web.resources;org.apache.hadoop.hdfs.web.resources, pathSpec=/webhdfs/v1/*
2019-10-29 00:30:20,961 INFO  http.HttpServer2 (HttpServer2.java:bindListener(986)) - Jetty bound to port 50070
2019-10-29 00:30:20,962 INFO  mortbay.log (Slf4jLog.java:info(67)) - jetty-6.1.26.hwx
2019-10-29 00:30:20,986 WARN  mortbay.log (Slf4jLog.java:warn(76)) - Can't reuse /tmp/Jetty_xxxx_corp_com_50070_hdfs____ggu70m, using /tmp/Jetty_xxxx_corp_com_50070_hdfs____ggu70m_2845921744604868870
2019-10-29 00:30:21,121 INFO  mortbay.log (Slf4jLog.java:info(67)) - Started [email protected]:50070
2019-10-29 00:30:21,143 WARN  common.Util (Util.java:stringAsURI(57)) - Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
2019-10-29 00:30:21,143 WARN  common.Util (Util.java:stringAsURI(57)) - Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
2019-10-29 00:30:21,143 WARN  namenode.FSNamesystem (FSNamesystem.java:checkConfiguration(690)) - Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!
2019-10-29 00:30:21,143 WARN  namenode.FSNamesystem (FSNamesystem.java:checkConfiguration(695)) - Only one namespace edits storage directory (dfs.namenode.edits.dir) configured. Beware of data loss due to lack of redundant storage directories!
2019-10-29 00:30:21,148 WARN  common.Util (Util.java:stringAsURI(57)) - Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
2019-10-29 00:30:21,148 WARN  common.Util (Util.java:stringAsURI(57)) - Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
2019-10-29 00:30:21,153 WARN  common.Storage (NNStorage.java:setRestoreFailedStorage(208)) - set restore failed storage to true
2019-10-29 00:30:21,172 INFO  namenode.FSEditLog (FSEditLog.java:newInstance(225)) - Edit logging is async:false
2019-10-29 00:30:21,176 INFO  namenode.FSNamesystem (FSNamesystem.java:<init>(759)) - No KeyProvider found.
2019-10-29 00:30:21,176 INFO  namenode.FSNamesystem (FSNamesystem.java:<init>(765)) - Enabling async auditlog
2019-10-29 00:30:21,178 INFO  namenode.FSNamesystem (FSNamesystem.java:<init>(769)) - fsLock is fair:false
2019-10-29 00:30:21,204 INFO  blockmanagement.HeartbeatManager (HeartbeatManager.java:<init>(91)) - Setting heartbeat recheck interval to 30000 since dfs.namenode.stale.datanode.interval is less than dfs.namenode.heartbeat.recheck-interval
2019-10-29 00:30:21,207 INFO  common.Util (Util.java:isDiskStatsEnabled(111)) - dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2019-10-29 00:30:21,214 INFO  blockmanagement.DatanodeManager (DatanodeManager.java:<init>(274)) - dfs.block.invalidate.limit=1000
2019-10-29 00:30:21,214 INFO  blockmanagement.DatanodeManager (DatanodeManager.java:<init>(280)) - dfs.namenode.datanode.registration.ip-hostname-check=true
2019-10-29 00:30:21,215 INFO  blockmanagement.BlockManager (InvalidateBlocks.java:printBlockDeletionTime(71)) - dfs.namenode.startup.delay.block.deletion.sec is set to 000:01:00:00.000
2019-10-29 00:30:21,215 INFO  blockmanagement.BlockManager (InvalidateBlocks.java:printBlockDeletionTime(76)) - The block deletion will start around 2019 Oct 29 01:30:21
2019-10-29 00:30:21,217 INFO  util.GSet (LightWeightGSet.java:computeCapacity(395)) - Computing capacity for map BlocksMap
2019-10-29 00:30:21,217 INFO  util.GSet (LightWeightGSet.java:computeCapacity(396)) - VM type       = 64-bit
2019-10-29 00:30:21,218 INFO  util.GSet (LightWeightGSet.java:computeCapacity(397)) - 2.0% max memory 1011.3 MB = 20.2 MB
2019-10-29 00:30:21,218 INFO  util.GSet (LightWeightGSet.java:computeCapacity(402)) - capacity      = 2^21 = 2097152 entries
2019-10-29 00:30:21,231 INFO  blockmanagement.BlockManager (BlockManager.java:createBlockTokenSecretManager(409)) - dfs.block.access.token.enable=true
2019-10-29 00:30:21,231 INFO  blockmanagement.BlockManager (BlockManager.java:createBlockTokenSecretManager(430)) - dfs.block.access.key.update.interval=600 min(s), dfs.block.access.token.lifetime=600 min(s), dfs.encrypt.data.transfer.algorithm=null
2019-10-29 00:30:21,354 INFO  blockmanagement.BlockManager (BlockManager.java:<init>(395)) - defaultReplication         = 1
2019-10-29 00:30:21,354 INFO  blockmanagement.BlockManager (BlockManager.java:<init>(396)) - maxReplication             = 50
2019-10-29 00:30:21,354 INFO  blockmanagement.BlockManager (BlockManager.java:<init>(397)) - minReplication             = 1
2019-10-29 00:30:21,354 INFO  blockmanagement.BlockManager (BlockManager.java:<init>(398)) - maxReplicationStreams      = 2
2019-10-29 00:30:21,354 INFO  blockmanagement.BlockManager (BlockManager.java:<init>(399)) - replicationRecheckInterval = 3000
2019-10-29 00:30:21,354 INFO  blockmanagement.BlockManager (BlockManager.java:<init>(400)) - encryptDataTransfer        = false
2019-10-29 00:30:21,354 INFO  blockmanagement.BlockManager (BlockManager.java:<init>(401)) - maxNumBlocksToLog          = 1000
2019-10-29 00:30:21,360 INFO  namenode.FSNamesystem (FSNamesystem.java:<init>(790)) - fsOwner             = hdfs (auth:SIMPLE)
2019-10-29 00:30:21,360 INFO  namenode.FSNamesystem (FSNamesystem.java:<init>(791)) - supergroup          = hdfs
2019-10-29 00:30:21,360 INFO  namenode.FSNamesystem (FSNamesystem.java:<init>(792)) - isPermissionEnabled = true
2019-10-29 00:30:21,360 INFO  namenode.FSNamesystem (FSNamesystem.java:<init>(803)) - HA Enabled: false
2019-10-29 00:30:21,361 INFO  namenode.FSNamesystem (FSNamesystem.java:<init>(843)) - Append Enabled: true
2019-10-29 00:30:21,388 INFO  util.GSet (LightWeightGSet.java:computeCapacity(395)) - Computing capacity for map INodeMap
2019-10-29 00:30:21,388 INFO  util.GSet (LightWeightGSet.java:computeCapacity(396)) - VM type       = 64-bit
2019-10-29 00:30:21,388 INFO  util.GSet (LightWeightGSet.java:computeCapacity(397)) - 1.0% max memory 1011.3 MB = 10.1 MB
2019-10-29 00:30:21,389 INFO  util.GSet (LightWeightGSet.java:computeCapacity(402)) - capacity      = 2^20 = 1048576 entries
2019-10-29 00:30:21,393 INFO  namenode.FSDirectory (FSDirectory.java:<init>(256)) - ACLs enabled? false
2019-10-29 00:30:21,393 INFO  namenode.FSDirectory (FSDirectory.java:<init>(260)) - XAttrs enabled? true
2019-10-29 00:30:21,393 INFO  namenode.FSDirectory (FSDirectory.java:<init>(268)) - Maximum size of an xattr: 16384
2019-10-29 00:30:21,393 INFO  namenode.NameNode (FSDirectory.java:<init>(321)) - Caching file names occuring more than 10 times
2019-10-29 00:30:21,399 INFO  util.GSet (LightWeightGSet.java:computeCapacity(395)) - Computing capacity for map cachedBlocks
2019-10-29 00:30:21,399 INFO  util.GSet (LightWeightGSet.java:computeCapacity(396)) - VM type       = 64-bit
2019-10-29 00:30:21,400 INFO  util.GSet (LightWeightGSet.java:computeCapacity(397)) - 0.25% max memory 1011.3 MB = 2.5 MB
2019-10-29 00:30:21,400 INFO  util.GSet (LightWeightGSet.java:computeCapacity(402)) - capacity      = 2^18 = 262144 entries
2019-10-29 00:30:21,402 INFO  namenode.FSNamesystem (FSNamesystem.java:<init>(5582)) - dfs.namenode.safemode.threshold-pct = 1.0
2019-10-29 00:30:21,402 INFO  namenode.FSNamesystem (FSNamesystem.java:<init>(5583)) - dfs.namenode.safemode.min.datanodes = 0
2019-10-29 00:30:21,402 INFO  namenode.FSNamesystem (FSNamesystem.java:<init>(5584)) - dfs.namenode.safemode.extension     = 30000
2019-10-29 00:30:21,405 INFO  metrics.TopMetrics (TopMetrics.java:logConf(76)) - NNTop conf: dfs.namenode.top.window.num.buckets = 10
2019-10-29 00:30:21,405 INFO  metrics.TopMetrics (TopMetrics.java:logConf(78)) - NNTop conf: dfs.namenode.top.num.users = 10
2019-10-29 00:30:21,405 INFO  metrics.TopMetrics (TopMetrics.java:logConf(80)) - NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2019-10-29 00:30:21,408 INFO  namenode.FSNamesystem (FSNamesystem.java:initRetryCache(971)) - Retry cache on namenode is enabled
2019-10-29 00:30:21,408 INFO  namenode.FSNamesystem (FSNamesystem.java:initRetryCache(979)) - Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2019-10-29 00:30:21,410 INFO  util.GSet (LightWeightGSet.java:computeCapacity(395)) - Computing capacity for map NameNodeRetryCache
2019-10-29 00:30:21,411 INFO  util.GSet (LightWeightGSet.java:computeCapacity(396)) - VM type       = 64-bit
2019-10-29 00:30:21,411 INFO  util.GSet (LightWeightGSet.java:computeCapacity(397)) - 0.029999999329447746% max memory 1011.3 MB = 310.7 KB
2019-10-29 00:30:21,411 INFO  util.GSet (LightWeightGSet.java:computeCapacity(402)) - capacity      = 2^15 = 32768 entries
2019-10-29 00:30:21,456 INFO  common.Storage (Storage.java:tryLock(776)) - Lock on /home/hadoop/hdfs/namenode/in_use.lock acquired by nodename [email protected]
2019-10-29 00:30:21,503 INFO  namenode.FileJournalManager (FileJournalManager.java:recoverUnfinalizedSegments(388)) - Recovering unfinalized segments in /home/hadoop/hdfs/namenode/current
2019-10-29 00:30:21,527 INFO  namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file /home/hadoop/hdfs/namenode/current/edits_inprogress_0000000000199282266 -> /home/hadoop/hdfs/namenode/current/edits_0000000000199282266-0000000000199282266
2019-10-29 00:30:21,532 INFO  namenode.FSImage (FSImage.java:loadFSImageFile(745)) - Planning to load image: FSImageFile(file=/home/hadoop/hdfs/namenode/current/fsimage_0000000000199282232, cpktTxId=0000000000199282232)
2019-10-29 00:30:21,562 INFO  namenode.FSImageFormatPBINode (FSImageFormatPBINode.java:loadINodeSection(257)) - Loading 1993 INodes.
2019-10-29 00:30:21,724 INFO  namenode.FSImageFormatProtobuf (FSImageFormatProtobuf.java:load(184)) - Loaded FSImage in 0 seconds.
2019-10-29 00:30:21,725 INFO  namenode.FSImage (FSImage.java:loadFSImage(911)) - Loaded image for txid 199282232 from /home/hadoop/hdfs/namenode/current/fsimage_0000000000199282232
2019-10-29 00:30:21,725 INFO  namenode.FSImage (FSImage.java:loadEdits(849)) - Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@63fd4873 expecting start txid #199282233
2019-10-29 00:30:21,726 INFO  namenode.FSImage (FSEditLogLoader.java:loadFSEdits(142)) - Start loading edits file /home/hadoop/hdfs/namenode/current/edits_0000000000199282233-0000000000199282265
2019-10-29 00:30:21,729 INFO  namenode.RedundantEditLogInputStream (RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream '/home/hadoop/hdfs/namenode/current/edits_0000000000199282233-0000000000199282265' to transaction ID 199282233
2019-10-29 00:30:21,752 INFO  namenode.FSImage (FSEditLogLoader.java:loadFSEdits(145)) - Edits file /home/hadoop/hdfs/namenode/current/edits_0000000000199282233-0000000000199282265 of size 1048576 edits # 33 loaded in 0 seconds
2019-10-29 00:30:21,752 INFO  namenode.FSImage (FSImage.java:loadEdits(849)) - Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@1e11bc55 expecting start txid #199282266
2019-10-29 00:30:21,752 INFO  namenode.FSImage (FSEditLogLoader.java:loadFSEdits(142)) - Start loading edits file /home/hadoop/hdfs/namenode/current/edits_0000000000199282266-0000000000199282266
2019-10-29 00:30:21,752 INFO  namenode.RedundantEditLogInputStream (RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream '/home/hadoop/hdfs/namenode/current/edits_0000000000199282266-0000000000199282266' to transaction ID 199282233
2019-10-29 00:30:21,753 INFO  namenode.FSImage (FSEditLogLoader.java:loadFSEdits(145)) - Edits file /home/hadoop/hdfs/namenode/current/edits_0000000000199282266-0000000000199282266 of size 1048576 edits # 1 loaded in 0 seconds
2019-10-29 00:30:21,753 INFO  namenode.FSNamesystem (FSNamesystem.java:loadFSImage(1083)) - Need to save fs image? false (staleImage=false, haEnabled=false, isRollingUpgrade=false)
2019-10-29 00:30:21,754 INFO  namenode.FSEditLog (FSEditLog.java:startLogSegment(1294)) - Starting log segment at 199282267
2019-10-29 00:30:21,880 INFO  namenode.NameCache (NameCache.java:initialized(143)) - initialized with 8 entries 214 lookups
2019-10-29 00:30:21,881 INFO  namenode.FSNamesystem (FSNamesystem.java:loadFromDisk(731)) - Finished loading FSImage in 465 msecs
2019-10-29 00:30:22,002 INFO  namenode.NameNode (NameNodeRpcServer.java:<init>(428)) - RPC server is binding to xxxx.corp.com:8020
2019-10-29 00:30:22,007 INFO  ipc.CallQueueManager (CallQueueManager.java:<init>(75)) - Using callQueue: class java.util.concurrent.LinkedBlockingQueue scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2019-10-29 00:30:22,015 INFO  ipc.Server (Server.java:run(821)) - Starting Socket Reader #1 for port 8020
2019-10-29 00:30:22,049 INFO  namenode.FSNamesystem (FSNamesystem.java:registerMBean(6517)) - Registered FSNamesystemState MBean
2019-10-29 00:30:22,050 WARN  common.Util (Util.java:stringAsURI(57)) - Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
2019-10-29 00:30:22,064 INFO  namenode.LeaseManager (LeaseManager.java:getNumUnderConstructionBlocks(139)) - Number of blocks under construction: 0
2019-10-29 00:30:22,065 INFO  hdfs.StateChange (FSNamesystem.java:reportStatus(5952)) - STATE* Safe mode ON.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:30:22,075 INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateHeartbeatState(401)) - Number of failed storage changes from 0 to 0
2019-10-29 00:30:22,077 INFO  block.BlockTokenSecretManager (BlockTokenSecretManager.java:updateKeys(222)) - Updating block keys
2019-10-29 00:30:22,095 INFO  ipc.Server (Server.java:run(1064)) - IPC Server Responder: starting
2019-10-29 00:30:22,095 INFO  ipc.Server (Server.java:run(900)) - IPC Server listener on 8020: starting
2019-10-29 00:30:22,115 INFO  namenode.NameNode (NameNode.java:startCommonServices(885)) - NameNode RPC up at: xxxx.corp.com/scrambled.private.ip.address:8020
2019-10-29 00:30:22,116 INFO  namenode.FSNamesystem (FSNamesystem.java:startActiveServices(1191)) - Starting services required for active state
2019-10-29 00:30:22,116 INFO  namenode.FSDirectory (FSDirectory.java:updateCountForQuota(708)) - Initializing quota with 4 thread(s)
2019-10-29 00:30:22,127 INFO  namenode.FSDirectory (FSDirectory.java:updateCountForQuota(717)) - Quota initialization completed in 11 milliseconds
name space=1995
storage space=6473571992
storage types=RAM_DISK=0, SSD=0, DISK=0, ARCHIVE=0
2019-10-29 00:30:22,131 INFO  blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(161)) - Starting CacheReplicationMonitor with interval 30000 milliseconds
2019-10-29 00:30:22,525 INFO  fs.TrashPolicyDefault (TrashPolicyDefault.java:<init>(228)) - The configured checkpoint interval is 0 minutes. Using an interval of 60 minutes that is used for deletion instead

2019-10-29 00:31:52,817 INFO  ipc.Server (Server.java:logException(2428)) - IPC Server handler 29 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs from scrambled.private.ip.address:55080 Call#143 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/anonymous/393adcb5-cd66-43a7-ab38-e759f5daf88e. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.

What exactly is going on and how do I go about troubleshooting this? I tried increasing the HeapSize for both NameNode and DataNode as well. The GC message from DataNode disappear but I still see them in NameNode logs when it reads iNODES.

Any help will be greatly appreciated.

0

There are 0 best solutions below