Can I create a SequenceFile using Spark DataFrames?


I have a requirement in which I need to create a SequenceFile. Right now we have a custom API written on top of the Hadoop API, but since we are moving to Spark, we have to achieve the same thing using Spark. Can this be done with Spark DataFrames?

Best answer:

AFAIK, there is no native API for this available directly on DataFrame, apart from the approach below.

Try something like the following (converting the DataFrame to an RDD, inspired by SequenceFileRDDFunctions.scala and its saveAsSequenceFile method), as in the examples below.

From the SequenceFileRDDFunctions scaladoc: "Extra functions available on RDDs of (key, value) pairs to create a Hadoop SequenceFile, through an implicit conversion."
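Since the question asks about DataFrames specifically, here is a minimal sketch of that implicit-conversion route; the DataFrame df, its single string column, and the output path are assumptions for illustration:

import org.apache.hadoop.io.NullWritable
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("df to seqfile").getOrCreate()
import spark.implicits._

// Hypothetical single-column DataFrame standing in for your real data.
val df = Seq("a", "b", "c").toDF("value")

// DataFrame -> RDD[(NullWritable, String)]; the implicit conversion to
// SequenceFileRDDFunctions then supplies saveAsSequenceFile.
df.rdd
  .map(row => (NullWritable.get(), row.getString(0)))
  .saveAsSequenceFile("/tmp/df-seqfile") // example path

And a fuller standalone driver in the same spirit: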

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.hadoop.io.NullWritable

// Stand-in for the custom generation logic referenced in the original code;
// replace with your own per-partition data source.
object Generator {
  def generate(ignored: Iterator[Any]): Iterator[String] =
    Iterator.fill(100)(scala.util.Random.alphanumeric.take(10).mkString)
}

object Driver extends App {

  val conf = new SparkConf().setAppName("HDFS writable test")
  val sc = new SparkContext(conf)

  // An empty RDD repartitioned to 10, just to fan the generator out.
  val empty = sc.emptyRDD[Any].repartition(10)

  // Pair every generated value with a NullWritable key; the implicit
  // conversion to SequenceFileRDDFunctions provides saveAsSequenceFile.
  val data = empty.mapPartitions(Generator.generate).map(value => (NullWritable.get(), value))

  // data.saveAsSequenceFile("/tmp/s1")
  data.saveAsSequenceFile(s"hdfs://localdomain/tmp/s1/${new scala.util.Random().nextInt()}")

  sc.stop()
}
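
To sanity-check the result, the file can be read back with sc.sequenceFile; this is a sketch, and the path must match what was actually written above:

import org.apache.hadoop.io.{NullWritable, Text}

// Values written from String are stored as Text; Hadoop reuses Writable
// instances, so convert to String before collecting.
val readBack = sc.sequenceFile("/tmp/s1", classOf[NullWritable], classOf[Text])
readBack.map { case (_, v) => v.toString }.take(5).foreach(println)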

For further information, please see the SequenceFileRDDFunctions scaladoc and the saveAsSequenceFile API documentation.