How to refer to one Avro schema from inside another

I need to refer to this Student schema from inside another schema:

{
  "type": "record",
  "namespace": "data.add",
  "name": "Student",
  "fields": [
    {
      "name": "Name",
      "type": "string"
    },
    {
      "name": "Age",
      "type": "int"
    }
  ]
}

This is the parent Address schema which needs to refer to Student:

{
  "type": "record",
  "namespace": "data.add",
  "name": "Address",
  "fields": [
    {
      "name": "student",
      "type": "Student"
    }
  ]
}

The above throws an error when I build using Gradle with the Avro plugin. Both schemas are located in the same folder.

There are 4 best solutions below

Sorry if I'm late to the party, but it looks to me like neither the Avro Maven plugin nor the avro-tools compiler works out the dependency order when loading schemas. The compile does succeed, however, if you list the files in dependency order yourself on the command line. Here is an example using your sample files in a standard Maven directory structure.

When I put the schema with no dependencies first on the command line, it succeeds:

java -jar /path/to/avro-tools-1.11.0.jar \
    compile schema  \
    src/main/avro/student.avsc \
    src/main/avro/address.avsc \
    target/generated-sources/avro

ls target/generated-sources/avro/data/add/*
target/generated-sources/avro/data/add/Address.java target/generated-sources/avro/data/add/Student.java

When I put the schema that has dependencies first on the command line, it fails:

java -jar /path/to/avro-tools-1.11.0.jar \
    compile schema  \
    src/main/avro/address.avsc \
    src/main/avro/student.avsc \
    target/generated-sources/avro
Exception in thread "main" org.apache.avro.SchemaParseException: "data.add.Student" is not a defined name. The type of the "student" field must be a defined name or a {"type": ...} expression.
    at org.apache.avro.Schema.parse(Schema.java:1676)
    at org.apache.avro.Schema$Parser.parse(Schema.java:1433)
    at org.apache.avro.Schema$Parser.parse(Schema.java:1396)
    at org.apache.avro.tool.SpecificCompilerTool.run(SpecificCompilerTool.java:154)
    at org.apache.avro.tool.Main.run(Main.java:67)
    at org.apache.avro.tool.Main.main(Main.java:56)
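
If ordering the files by hand gets tedious, the order can be derived from the schemas themselves. The following is a minimal Python sketch, not part of avro-tools or either plugin: `full_name`, `scan`, and `order_files` are hypothetical helper names, and it assumes each file holds one top-level schema and that there are no circular references between files. It collects the named types each file defines and refers to, then sorts the files so that definitions come first.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED = {"record", "error", "enum", "fixed"}


def full_name(name, namespace):
    """Qualify a short name with the enclosing namespace; dotted names are already full."""
    return name if "." in name or not namespace else f"{namespace}.{name}"


def scan(schema, namespace, defined, referenced):
    """Walk one schema, collecting the type names it defines and the ones it refers to."""
    if isinstance(schema, str):
        if schema not in PRIMITIVES:
            referenced.add(full_name(schema, namespace))
    elif isinstance(schema, list):  # a union: scan every branch
        for branch in schema:
            scan(branch, namespace, defined, referenced)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind in NAMED:
            name = schema["name"]
            if "." in name:  # a dotted name carries its own namespace
                namespace = name.rpartition(".")[0]
            else:
                namespace = schema.get("namespace", namespace)
            defined.add(full_name(name, namespace))
            for field in schema.get("fields", []):
                scan(field["type"], namespace, defined, referenced)
        elif kind == "array":
            scan(schema["items"], namespace, defined, referenced)
        elif kind == "map":
            scan(schema["values"], namespace, defined, referenced)
        else:  # e.g. {"type": "string"} or a nested type expression
            scan(kind, namespace, defined, referenced)


def order_files(schemas):
    """Given {file name: parsed schema}, return file names with dependencies first."""
    defined_in, needs = {}, {}
    for path, schema in schemas.items():
        defined, referenced = set(), set()
        scan(schema, None, defined, referenced)
        for name in defined:
            defined_in[name] = path
        needs[path] = referenced - defined  # ignore references satisfied in the same file
    graph = {
        path: {defined_in[name] for name in refs if name in defined_in}
        for path, refs in needs.items()
    }
    return list(TopologicalSorter(graph).static_order())


# The two schemas from the question, as already-parsed JSON.
student = {
    "type": "record", "namespace": "data.add", "name": "Student",
    "fields": [{"name": "Name", "type": "string"}, {"name": "Age", "type": "int"}],
}
address = {
    "type": "record", "namespace": "data.add", "name": "Address",
    "fields": [{"name": "student", "type": "Student"}],
}

print(order_files({"address.avsc": address, "student.avsc": student}))
# prints ['student.avsc', 'address.avsc']
```

In practice each dictionary value would come from `json.load` on the corresponding .avsc file, and the resulting list could be passed to avro-tools in that order.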

We had a scenario where we needed to use an array of BasketItem in Basket.

So we defined the BasketItem schema as below:

{
    "type": "record",
    "name": "BasketItem",
    "namespace": "com.demo",
    "fields": [{
            "name": "id",
            "type": "string"
        },
        {
            "name": "shortDesc",
            "type": [
                "null",
                "string"
            ],
            "default": null
        },
        {
            "name": "longDesc",
            "type": [
                "null",
                "string"
            ],
            "default": null
        },
        {
            "name": "requestedQuantity",
            "type": [
                "null",
                "string"
            ],
            "default": null
        }
    ]
}

The BasketItem type is then used in the Basket schema, referenced by its fully qualified name (namespace plus name):

{
    "name": "Basket",
    "type": "record",
    "namespace": "com.demo",
    "fields": [{
            "name": "id",
            "type": "string"
        },
        {
            "name": "creationTimeStamp",
            "type": "string"
        },
        {
            "name": "basketItems",
            "type": {
                "type": "array",
                "items": "com.demo.BasketItem"
            }
        }
    ]
}

This is a simple way to define reusable Avro schemas and refer to them as needed, which avoids conflicts during model generation.

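For Maven, you can use the imports configuration of the avro-maven-plugin.

For example, add the following to your pom.xml under project/build/plugins, listing the schema that the others depend on (here student.avsc) under imports: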

<plugin>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-maven-plugin</artifactId>
    <version>${avro.version}</version>
    <executions>
        <execution>
            <phase>generate-sources</phase>
            <goals>
                <goal>schema</goal>
            </goals>
            <configuration>
                <imports>
                    <import>${project.basedir}/path/to/student.avsc</import>
                </imports>
                <sourceDirectory>${project.basedir}/path/to/schema/files</sourceDirectory>
                <includes>
                    <include>*.avsc</include>
                </includes>
                <outputDirectory>${project.basedir}/path/to/target/</outputDirectory>
            </configuration>
        </execution>
    </executions>
</plugin>

You should then be able to run a Maven build.

See this link for more details:

https://feitam.es/use-of-avro-maven-plugin-with-complex-schemas-defined-in-several-files-to-be-reused-in-different-typed-messages/

Referring to Student by its fully qualified name was successful:

{
  "type" : "record",
  "namespace" : "data.add",
  "name" : "Address",
  "fields" : [
    {
      "name": "student",
      "type": "data.add.Student"
    }
  ]
}
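
For context, the Avro specification's naming rules say that a reference without dots inherits the namespace of the enclosing named type, while a dotted reference is taken as a full name already. The sketch below illustrates that rule in Python; `resolve` is a hypothetical helper written for this illustration, not an Avro API. It suggests that, inside a record in namespace data.add, "Student" and "data.add.Student" name the same type, which matches the error message quoted earlier, where the parser reports the short reference as "data.add.Student".

```python
def resolve(name, enclosing_namespace=None):
    """Return the full name for a type reference.

    A dotted name is already a full name; an undotted name takes the
    namespace of the enclosing named type, if there is one.
    """
    if "." in name or not enclosing_namespace:
        return name
    return f"{enclosing_namespace}.{name}"


# Inside the Address record (namespace "data.add"), both spellings resolve identically.
short = resolve("Student", "data.add")
qualified = resolve("data.add.Student", "data.add")
print(short, qualified, short == qualified)
# prints data.add.Student data.add.Student True
```

Either spelling still requires the Student schema to be known to the parser before Address is parsed.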