How can I solve a problem in Apache Pig where data does not load from HDFS?


I want to load a simple text file from HDFS with Apache Pig, running on Windows 10, but the job fails with the following error:

```
C:\Users\Adar>pig -x mapreduce
2023-09-18 12:25:50,337 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
2023-09-18 12:25:50,338 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
2023-09-18 12:25:50,339 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2023-09-18 12:25:50,810 [main] INFO  org.apache.pig.Main - Apache Pig version 0.17.0 (r1797386) compiled Jun 02 2017, 15:41:58
2023-09-18 12:25:50,810 [main] INFO  org.apache.pig.Main - Logging error messages to: C:\hadoop-3.3.0\logs\pig_1695065150801.log
2023-09-18 12:25:50,837 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file C:\Users\Adar/.pigbootup not found
2023-09-18 12:25:51,256 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2023-09-18 12:25:51,256 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
2023-09-18 12:25:52,060 [main] INFO  org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-5f0d5718-93c7-497a-8587-3b79341f1a69
2023-09-18 12:25:52,060 [main] WARN  org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false
grunt> student = LOAD 'hdfs://localhost:9000/pig_data/s.txt' USING PigStorage(',')
>>    as (id:int,name:chararray,city:chararray);
grunt> DUMP student;
2023-09-18 12:27:08,934 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2023-09-18 12:27:08,966 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2023-09-18 12:27:09,001 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2023-09-18 12:27:09,057 [main] INFO  org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (PS Old Gen) of size 699400192 to monitor. collectionUsageThreshold = 489580128, usageThreshold = 489580128
2023-09-18 12:27:09,123 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2023-09-18 12:27:09,150 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2023-09-18 12:27:09,150 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2023-09-18 12:27:09,231 [main] INFO  org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider - Connecting to ResourceManager at localhost/127.0.0.1:8032
2023-09-18 12:27:09,473 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2023-09-18 12:27:09,487 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2023-09-18 12:27:09,494 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2023-09-18 12:27:09,494 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2023-09-18 12:27:09,497 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2023-09-18 12:27:09,500 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2023-09-18 12:27:09,511 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.submit.replication is deprecated. Instead, use mapreduce.client.submit.file.replication
2023-09-18 12:27:09,744 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/C:/pig-0.17.0/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp780055956/tmp868407826/pig-0.17.0-core-h2.jar
2023-09-18 12:27:09,774 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/C:/pig-0.17.0/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp780055956/tmp-1827640167/automaton-1.11-8.jar
2023-09-18 12:27:09,804 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/C:/pig-0.17.0/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp780055956/tmp-2061916448/antlr-runtime-3.4.jar
2023-09-18 12:27:09,835 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/C:/pig-0.17.0/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp780055956/tmp1959838005/joda-time-2.9.3.jar
2023-09-18 12:27:09,845 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2023-09-18 12:27:09,856 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2023-09-18 12:27:09,856 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2023-09-18 12:27:09,857 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2023-09-18 12:27:09,903 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2023-09-18 12:27:09,912 [JobControl] INFO  org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider - Connecting to ResourceManager at localhost/127.0.0.1:8032
2023-09-18 12:27:09,928 [JobControl] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2023-09-18 12:27:10,616 [JobControl] INFO  org.apache.hadoop.mapreduce.JobResourceUploader - Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/Adar/.staging/job_1695061523792_0004
2023-09-18 12:27:10,628 [JobControl] WARN  org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2023-09-18 12:27:10,646 [JobControl] INFO  org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2023-09-18 12:27:10,653 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input files to process : 1
2023-09-18 12:27:10,653 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2023-09-18 12:27:10,675 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2023-09-18 12:27:10,789 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2023-09-18 12:27:10,834 [JobControl] INFO  org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2023-09-18 12:27:10,964 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1695061523792_0004
2023-09-18 12:27:10,964 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Executing with tokens: []
2023-09-18 12:27:11,103 [JobControl] INFO  org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2023-09-18 12:27:11,156 [JobControl] INFO  org.apache.hadoop.conf.Configuration - resource-types.xml not found
2023-09-18 12:27:11,157 [JobControl] INFO  org.apache.hadoop.yarn.util.resource.ResourceUtils - Unable to find 'resource-types.xml'.
2023-09-18 12:27:11,231 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1695061523792_0004
2023-09-18 12:27:11,351 [JobControl] INFO  org.apache.hadoop.mapreduce.Job - The url to track the job: http://DESKTOP-6E4GPV0:8088/proxy/application_1695061523792_0004/
2023-09-18 12:27:11,352 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1695061523792_0004
2023-09-18 12:27:11,355 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases student
2023-09-18 12:27:11,358 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: student[1,10],student[-1,-1] C:  R:
2023-09-18 12:27:11,379 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2023-09-18 12:27:11,379 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1695061523792_0004]
2023-09-18 12:37:57,515 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2023-09-18 12:38:29,498 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1695061523792_0004 has failed! Stop running all dependent jobs
2023-09-18 12:38:29,501 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2023-09-18 12:38:29,505 [main] INFO  org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider - Connecting to ResourceManager at localhost/127.0.0.1:8032
2023-09-18 12:38:29,549 [main] INFO  org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider - Connecting to ResourceManager at localhost/127.0.0.1:8032
2023-09-18 12:38:29,566 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2023-09-18 12:38:29,569 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
3.3.0   0.17.0  Adar    2023-09-18 12:27:09     2023-09-18 12:38:29     UNKNOWN

Failed!

Failed Jobs:
JobId   Alias   Feature Message Outputs
job_1695061523792_0004  student MAP_ONLY        Message: Job failed!    hdfs://localhost:9000/tmp/temp780055956/tmp1291327947,

Input(s):
Failed to read data from "hdfs://localhost:9000/pig_data/s.txt"

Output(s):
Failed to produce result in "hdfs://localhost:9000/tmp/temp780055956/tmp1291327947"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1695061523792_0004


2023-09-18 12:38:29,570 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2023-09-18 12:38:29,578 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias student
Details at logfile: C:\hadoop-3.3.0\logs\pig_1695065150801.log
```

