How can I "pimp my library" with Scala in a future-proof way?


I use Scala implicit classes to extend objects I work with frequently. As an example, I have a method similar to this defined on Spark DataFrame:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

implicit class DataFrameExtensions(df: DataFrame) {
  def deduplicate: DataFrame =
    df.groupBy(df.columns.map(col): _*).count
}

But implicit extension methods are not invoked if the class itself already defines a method with the same name. What happens if I later upgrade to a new version of Spark that defines a DataFrame#deduplicate method? Client code will silently switch to the new implementation, which might cause subtle errors (or obvious ones, which are less problematic).
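This shadowing behavior can be demonstrated without Spark; the `Frame` class below is a hypothetical stand-in for DataFrame:

```scala
class Frame {
  // Suppose a library upgrade adds this method natively:
  def deduplicate: String = "native implementation"
}

object Extensions {
  implicit class FrameOps(f: Frame) {
    def deduplicate: String = "extension implementation"
  }
}

object Demo extends App {
  import Extensions._
  // A real member always wins over an implicit extension method,
  // so the extension is silently ignored:
  println(new Frame().deduplicate)  // prints "native implementation"
}
```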

Using reflection, I can throw a runtime error if DataFrame already defines deduplicate before my implicit defines it. Theoretically, then, if my implicit method conflicts with an existing one, I can detect it and rename my implicit version. However, once I upgrade Spark, run the app, and detect the issue, it's too late to use the IDE to rename the old method, since any references to df.deduplicate now refer to the native Spark version. I would have to revert my Spark version, rename the method through the IDE, and then upgrade again. Not the end of the world, but not a great workflow.
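A sketch of that runtime check using plain Java reflection (`ExtensionGuard` and its method name are hypothetical; the DataFrame call is shown commented out so the snippet runs without Spark):

```scala
object ExtensionGuard {
  // Throw at startup if the target class has grown a method that
  // would shadow one of our extension methods.
  def assertNotDefined(target: Class[_], methodName: String): Unit =
    if (target.getMethods.exists(_.getName == methodName))
      throw new IllegalStateException(
        s"${target.getName} now defines '$methodName'; rename the extension method")
}

object GuardDemo extends App {
  // e.g. ExtensionGuard.assertNotDefined(classOf[DataFrame], "deduplicate")
  ExtensionGuard.assertNotDefined(classOf[String], "deduplicate") // no clash: ok
}
```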

Is there a better way to deal with this scenario? How can I use the "pimp my library" pattern safely?

3 Answers

---

You could add a test to the test suite of DataFrameExtensions that ensures certain code snippets do not compile. Maybe something like this (using ScalaTest):

"(???: DataFrame).deduplicate" shouldNot compile

If it compiles without your implicit conversion in scope, then the method deduplicate must have been introduced by the Spark library itself. In that case the test fails, and you know you have to update your implicits.
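Spelled out as a complete suite (a sketch assuming ScalaTest 3.x's AnyFlatSpec and Matchers; the key point is that your own implicit must not be imported in this file):

```scala
import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

class DataFrameExtensionsSpec extends AnyFlatSpec with Matchers {
  // Deliberately NOT importing DataFrameExtensions here.
  "DataFrame" should "not define deduplicate natively" in {
    // Starts failing as soon as Spark ships its own DataFrame#deduplicate.
    "(??? : org.apache.spark.sql.DataFrame).deduplicate" shouldNot compile
  }
}
```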

---

The safe way is to ask explicitly for an extended data frame. To minimize the impact, you can still use an implicit, but only for the conversion itself, which gives you a nice syntax (like toJava/toScala, etc.):

implicit class DataFrameExtSyntax(df: DataFrame) {
  def toExtended: DataFrameExtensions = new DataFrameExtensions(df)
}

Your invocation will then look like this:

myDf.toExtended
  .deduplicate
  .someOtherExtensionMethod
  .andMore

That way you're future-proofing your extension methods without runtime checks, linting, or unit-test tricks. (You can even name the method ext if toExtended is too long :) )
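DataFrameExtensions itself then becomes an ordinary, non-implicit class holding the extension methods. A minimal sketch, assuming each method returns the wrapper so calls can chain as shown above (deduplicate's body and toDf are illustrative):

```scala
import org.apache.spark.sql.DataFrame

class DataFrameExtensions(val df: DataFrame) {
  // Illustrative body; return the wrapper so calls can be chained.
  def deduplicate: DataFrameExtensions =
    new DataFrameExtensions(df.dropDuplicates())

  // Unwrap at the end of the chain.
  def toDf: DataFrame = df
}
```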

---

If the extension method is enabled by an import, use -Xlint, which will warn when that import is no longer used (because the class now defines the method itself):

//class C
class C { def x = 17 }

trait T {
  import Extras._
  def f = new C().x
}

object Extras {
  implicit class X(val c: C) {
    def x = 42
  }
}

Another variation, where the implicit evidence must actually be used; under -Xlint -Xfatal-warnings the build breaks as soon as the import becomes unused:

//class C[A]
class C[A] { def x = 17 }

trait T {
  import Mine.ev
  val c = new C[Mine]
  def f = c.x
}

trait Mine
object Mine {
  implicit class X[A](val c: C[A]) {
    def x(implicit @deprecated("unused","") ev: Mine) = 42
  }
  implicit val ev: Mine = null
}

object Test {
  def main(args: Array[String]): Unit = println {
    val t = new T {}
    t.f
  }
}