Crunch SparkPipeline does not work as expected

99 Views Asked by qingpan At 26 November 2025 at 18:24

I am trying to migrate our code from Crunch MRPipeline to SparkPipeline. I tried a simple example like this

SparkConf sc = new SparkConf().setAppName("Crunch Spark Count").setMaster("local");
JavaSparkContext jsc = new JavaSparkContext(sc);
SparkPipeline p = new SparkPipeline(jsc, "Crunch Spark Count");
PCollection<String> lines = p.read(From.textFile(new Path(fileUrl)));
PCollection<String> words = lines.parallelDo(new Tokenizer(), Writables.strings());
PTable<String, Long> counts = words.count();

My input file is like file1: hello world hello hadoop file2: hello spark

After running spark program, the output result always is

[hello, 1]
[hadoop, 1]
[world, 1]
[spark, 1]

Actually, the count of hello should be 3

That is Crunch 'count' function bug ?

Original Q&A

Crunch SparkPipeline does not work as expected

There are 0 best solutions below

Related Questions in HADOOP

Related Questions in APACHE-SPARK

Related Questions in APACHE-CRUNCH

Trending Questions

Popular # Hahtags

Popular Questions