Getting different results using the same library in spark-shell and an IntelliJ project

I am using Google's Guava library to extract the top private domain (the public suffix plus one label) from domain names. My implementation looks like this:

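import com.google.common.net.InternetDomainName
import org.apache.spark.sql.functions.{col, udf}

def getTopPrivateDomain(urlString: String): String = {
  try {
    val domain = InternetDomainName.from(urlString).topPrivateDomain().toString
    // println with two arguments auto-tuples them, hence the parentheses in the output
    println("domain from url: ", urlString + " is: " + domain)
    domain
  } catch {
    case e: Exception =>
      println("Exception occurred ", e)
      // Fallback: keep only the last two dot-separated labels
      val domain = urlString.split("\\.").takeRight(2).mkString(".")
      println("after exception" + domain)
      domain
  }
}

val hostedExtractUDF = udf((urlString: String) => getTopPrivateDomain(urlString))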
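
For reference, the fallback in the catch branch does not consult the public suffix list at all; it simply keeps the last two labels of the host name. A minimal sketch of that behavior in isolation (the helper name `lastTwoLabels` is hypothetical and introduced only for illustration):

```scala
// Hypothetical helper mirroring the catch-branch fallback; not part of Guava or Spark.
def lastTwoLabels(host: String): String =
  host.split("\\.").takeRight(2).mkString(".")

println(lastTwoLabels("remotedesktop-pa.googleapis.com")) // googleapis.com
println(lastTwoLabels("www.example.co.uk"))               // co.uk
```

So if the Guava call threw for this input, the fallback would return googleapis.com. Multi-label suffixes such as co.uk are not handled by this path.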

When I run the code from my project:

filteredRecords = filtered
              .withColumn("suffix", hostedExtractUDF(col("fullyQualifiedDomainName")))

For the domain "remotedesktop-pa.googleapis.com", this is the output when running from IntelliJ:

(domain from url: ,remotedesktop-pa.googleapis.com is: remotedesktop-pa.googleapis.com)

If I run the same function getTopPrivateDomain in spark-shell and pass it the same domain, I get a different answer:

def getTopPrivateDomain(urlString: String): String = {
  try {
    val domain = InternetDomainName.from(urlString).topPrivateDomain().toString
    println("domain from url: ", urlString + " is: " + domain)
    domain
  } catch {
    case e: Exception =>
      println("Exception occurred ", e)
      val domain = urlString.split("\\.").takeRight(2).mkString(".")
      println("after exception" + domain)
      domain
  }
}

// Exiting paste mode, now interpreting.

getTopPrivateDomain: (urlString: String)String

scala> println(getTopPrivateDomain("remotedesktop-pa.googleapis.com"))
(domain from url: ,remotedesktop-pa.googleapis.com is: InternetDomainName{name=googleapis.com})
InternetDomainName{name=googleapis.com}

scala> 

I get different results in the two environments. What could be the reason? I believe the output from spark-shell is the correct one.

EDIT: The Guava version I am using is:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>33.0.0-jre</version>
</dependency>
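
One way to narrow this down is to check which jar each environment actually loads the class from, since spark-shell puts the Spark distribution's own jars on the classpath and those may include a different Guava version than the one declared in the pom. The following is a minimal sketch, not a confirmed diagnosis; `codeSourceOf` is a hypothetical helper name, and the example calls use classes that are always available so the snippet runs without Guava:

```scala
// Hypothetical helper (not part of Guava or Spark): reports the jar or
// directory a class was loaded from, or a placeholder when the JVM does
// not expose a code source (e.g. for bootstrap classes).
def codeSourceOf(cls: Class[_]): String =
  Option(cls.getProtectionDomain)
    .flatMap(pd => Option(pd.getCodeSource))
    .flatMap(cs => Option(cs.getLocation))
    .map(_.toString)
    .getOrElse("<bootstrap or unknown>")

// With Guava on the classpath, the call of interest would be:
//   codeSourceOf(classOf[com.google.common.net.InternetDomainName])
// Demonstrated here on classes that need no extra dependency:
println(codeSourceOf(classOf[scala.Option[_]]))
println(codeSourceOf(classOf[String]))
```

Running the commented-out call both in spark-shell and inside the IntelliJ project would show whether the two environments resolve InternetDomainName from the same Guava jar.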