CSV to Avro without Apache Spark in Scala

Question

355 Views Asked by Explorer At 27 July 2025 at 13:58

Is there a way to convert a CSV file to Avro without using Apache Spark? Most of the posts I see suggest using Spark, which I cannot do in my case. I have the schema in a separate file. I was thinking of a custom serializer and deserializer that would use the schema to convert CSV to Avro. Any kind of reference would work for me. Thanks.

There are 2 best solutions below

**Setop** · Answer 1

Setop On 07 July 2017 at 14:50

Avro is an open format, and many languages support it.

Just pick one: Python, for example, which also supports CSV. Go would do, and so would Java.

**Dima** · Answer 2

If you only have strings and primitives, you could put together a crude implementation like this fairly easily:

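import scala.io.Source
import scala.jdk.CollectionConverters._ // Scala 2.13+; use scala.collection.JavaConverters._ on older versions
import org.apache.avro.Schema
import org.apache.avro.Schema.Type._
import org.apache.avro.generic.GenericData

// Crude sketch: reads the whole file and returns one record per CSV line.
def csvToAvro(file: String, schema: Schema): List[GenericData.Record] = {
  // Pair each field position with its Avro type
  val types = schema
    .getFields
    .asScala
    .map { f => f.pos -> f.schema.getType }

  val source = Source.fromFile(file)
  try {
    source
      .getLines()
      .map(_.split(",", -1).toSeq) // -1 keeps trailing empty fields
      .map { data =>
        val rec = new GenericData.Record(schema) // fresh record for every line
        (data zip types)
          .foreach {
            case (str, (idx, STRING)) => rec.put(idx, str)
            case (str, (idx, INT)) => rec.put(idx, str.toInt)
            case (str, (idx, LONG)) => rec.put(idx, str.toLong)
            case (str, (idx, FLOAT)) => rec.put(idx, str.toFloat)
            case (str, (idx, DOUBLE)) => rec.put(idx, str.toDouble)
            case (str, (idx, BOOLEAN)) => rec.put(idx, str.toBoolean)
            case (str, (idx, unknown)) => throw new IllegalArgumentException(s"Don't know how to convert $str to $unknown at $idx")
          }
        rec
      }
      .toList // materialize before the file is closed
  } finally source.close()
}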

Note this does not handle nullable fields: for those the type is going to be UNION, and you'll have to look inside the schema to find out the actual data type.
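One way to look inside such a union is sketched below. It assumes the common `["null", T]` shape, where exactly one branch is non-null; `effectiveType` is a hypothetical helper name, not part of the Avro API.

```scala
import org.apache.avro.Schema
import scala.jdk.CollectionConverters._ // Scala 2.13+

// Hypothetical helper: for a ["null", T] union return T's type,
// for any non-union schema return its own type.
def effectiveType(s: Schema): Schema.Type =
  if (s.getType == Schema.Type.UNION) {
    // Drop the NULL branch and expect exactly one remaining type
    s.getTypes.asScala.map(_.getType).filter(_ != Schema.Type.NULL).toList match {
      case single :: Nil => single
      case _ => throw new IllegalArgumentException(s"Unsupported union: $s")
    }
  } else s.getType
```

In `csvToAvro` this could stand in for `f.schema.getType`. How a missing value is spelled in the CSV (an empty string, for instance) is a separate decision that the conversion would still have to make.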

Also, the CSV "parsing" here is very crude: just splitting at the comma isn't really a good idea, because it will break if a string field happens to contain a comma in the data, or if fields are escaped with double quotes.
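As a rough illustration of what a quote-aware split involves, here is a minimal sketch using only the standard library. `splitCsvLine` is a hypothetical helper; it assumes fields are quoted with `"` and that a literal quote inside a quoted field is written as `""`. It does not handle quoted fields that span several lines, so a dedicated CSV library is likely the safer choice for real data.

```scala
// Hypothetical helper: splits one CSV line, honouring double-quoted fields.
def splitCsvLine(line: String, sep: Char = ','): Vector[String] = {
  val fields = Vector.newBuilder[String]
  val cur = new StringBuilder
  var inQuotes = false
  var i = 0
  while (i < line.length) {
    val c = line.charAt(i)
    if (inQuotes) {
      if (c == '"') {
        if (i + 1 < line.length && line.charAt(i + 1) == '"') {
          cur.append('"') // "" inside quotes is an escaped quote
          i += 1
        } else inQuotes = false // closing quote
      } else cur.append(c) // separators are literal inside quotes
    } else if (c == '"') inQuotes = true // opening quote
    else if (c == sep) { fields += cur.toString; cur.clear() }
    else cur.append(c)
    i += 1
  }
  fields += cur.toString // last field
  fields.result()
}
```

Swapping this in for the plain `split` call keeps commas inside quoted strings from shifting the remaining columns.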

And also, you'll probably want to add some sanity-checking to make sure, for example, that the number of fields in the csv line matches the number of fields in the schema etc.
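A field-count check could look like the sketch below. `checkFieldCount` is a hypothetical helper, and `expected` is assumed to come from something like `schema.getFields.size`. The check matters because `zip` silently truncates to the shorter of its two inputs, so a short or long line would otherwise go unnoticed.

```scala
// Hypothetical helper: accepts a parsed line only if it has the expected number of fields.
def checkFieldCount(fields: Seq[String], expected: Int, lineNo: Int): Either[String, Seq[String]] =
  if (fields.length == expected) Right(fields)
  else Left(s"line $lineNo: expected $expected fields but found ${fields.length}")
```

Returning an `Either` leaves it to the caller to decide whether a bad line should abort the conversion or just be logged and skipped.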

But the above considerations notwithstanding, this should be sufficient to illustrate the approach and get you started.
