Adding Sparse Vectors 3.0.0 Apache Spark Scala

Question


623 Views Asked by Sajeed At 07 August 2020 at 20:03

I am trying to write a function like the following that adds two org.apache.spark.ml.linalg.Vector instances, i.e. two sparse vectors.

A vector could look like the following:

(28,[1,2,3,4,7,11,12,13,14,15,17,20,22,23,24,25],[0.13028398104008743,0.23648605632753023,0.7094581689825907,0.13028398104008743,0.23648605632753023,0.0,0.14218861229025295,0.3580566057240087,0.14218861229025295,0.13028398104008743,0.26056796208017485,0.0,0.14218861229025295,0.06514199052004371,0.13028398104008743,0.23648605632753023])

For example:

def add_vectors(x: org.apache.spark.ml.linalg.Vector, y: org.apache.spark.ml.linalg.Vector): org.apache.spark.ml.linalg.Vector = {
  ??? // to be implemented
}

Let's look at a use case:

val x = Vectors.sparse(2, Array(0), Array(1.0)) // [1.0, 0.0]
val y = Vectors.sparse(2, Array(1), Array(1.0)) // [0.0, 1.0]

I want the output to be

Vectors.sparse(2, Array(0, 1), Array(1.0, 1.0))

Here's another case, where both vectors share the same index:

val x = Vectors.sparse(2, Array(1), Array(1.0))
val y = Vectors.sparse(2, Array(1), Array(1.0))

Here the output should be

Vectors.sparse(2, Array(1), Array(2.0))
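In both cases the expected rule is the same: entries at distinct indices are copied, and entries that share an index are summed. A minimal Spark-free sketch of that rule, working on (index, value) pairs; SparsePairs and addPairs are hypothetical names, not part of Spark:

```scala
// Hypothetical helper (not a Spark API): sums two sparse vectors given as
// (index, value) pairs. Entries that share an index are added together.
object SparsePairs {
  def addPairs(x: Seq[(Int, Double)], y: Seq[(Int, Double)]): Seq[(Int, Double)] = {
    val summed = (x ++ y).foldLeft(Map.empty[Int, Double]) {
      case (acc, (i, v)) => acc.updated(i, acc.getOrElse(i, 0.0) + v)
    }
    // Return the pairs in ascending index order, as a sparse vector stores them
    summed.toSeq.sortBy(_._1)
  }
}
```

Spark's Vectors.sparse also has an overload that takes (index, value) pairs, sparse(size: Int, elements: Seq[(Int, Double)]), so the result could be passed to it directly, e.g. Vectors.sparse(2, SparsePairs.addPairs(Seq(1 -> 1.0), Seq(1 -> 1.0))). Entries whose values cancel to 0.0 are kept as explicit zeros in this sketch.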

I've realized this is harder than it seems. One possible solution, which I found in the question "Addition of two RDD[mllib.linalg.Vector]'s", is to convert the vectors to Breeze vectors, add them in Breeze, and convert the result back to a Spark vector. So I tried implementing this:

def add_vectors(x: org.apache.spark.ml.linalg.Vector, y: org.apache.spark.ml.linalg.Vector) = {

  val dense_x = x.toDense
  val dense_y = y.toDense

  val bv1 = new DenseVector(dense_x.toArray)
  val bv2 = new DenseVector(dense_y.toArray)

  val vectout = Vectors.dense((bv1 + bv2).toArray)
  vectout
}

However, this gave me an error on the last line:

val vectout = Vectors.dense((bv1 + bv2).toArray)

Cannot resolve the overloaded method 'dense'. Why is this error occurring, and how can I fix it?
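The question's imports aren't shown, so the cause given here is an assumption. A common one: if org.apache.spark.ml.linalg._ is imported, the name DenseVector refers to Spark's org.apache.spark.ml.linalg.DenseVector rather than Breeze's. Spark's vector class defines no + operator, so bv1 + bv2 does not type-check, and the IDE then reports that it cannot choose a Vectors.dense overload for that argument. A minimal sketch of a fix aliases Breeze's class on import (BDV is just a local alias, and BreezeAdd is a hypothetical name); Breeze is already on the classpath as a dependency of spark-mllib:

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
// Alias Breeze's DenseVector so it cannot be confused with
// org.apache.spark.ml.linalg.DenseVector, which has no `+` operator.
import breeze.linalg.{DenseVector => BDV}

object BreezeAdd {
  def addVectors(x: Vector, y: Vector): Vector = {
    require(x.size == y.size, s"size mismatch: ${x.size} vs ${y.size}")
    val bv1 = new BDV(x.toArray) // dense Breeze copy of x
    val bv2 = new BDV(y.toArray) // dense Breeze copy of y
    val sum = bv1 + bv2          // element-wise addition in Breeze
    // Back to a Spark vector; compressed picks sparse or dense storage,
    // whichever needs less space.
    Vectors.dense(sum.toArray).compressed
  }
}
```

This still densifies both vectors, which costs time and memory proportional to the vector size. For the 28-dimensional vector in the question that is negligible, but for very high-dimensional sparse vectors it can be wasteful.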

Original Q&A


**Sajeed** · Accepted Answer · 2020-08-08T01:19:49.307000

To answer my own question, I had to think about how sparse vectors are represented. Vectors.sparse takes three arguments: the number of dimensions, an array of indices, and an array of values. For example:

import org.apache.spark.ml.linalg.{Vector, Vectors}

val indices: Array[Int] = Array(1, 2)
val norms: Array[Double] = Array(0.5, 0.3)
val num_int = 4
val vector: Vector = Vectors.sparse(num_int, indices, norms)

If I convert this SparseVector to an array, I get the following.

Code:

 val choiced_array = vector.toArray

 println(choiced_array.mkString("[", ", ", "]"))

Output:

   [0.0, 0.5, 0.3, 0.0]
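The conversion above can be modelled without Spark. This hypothetical helper is only a sketch of what SparseVector.toArray returns: every position starts at 0.0, and each listed index receives its value.

```scala
// Hypothetical helper mirroring the result of SparseVector.toArray:
// every position starts at 0.0 and only the listed indices get a value.
object SparseToDense {
  def toDense(size: Int, indices: Array[Int], values: Array[Double]): Array[Double] = {
    require(indices.length == values.length, "indices and values must have the same length")
    val out = Array.fill(size)(0.0)
    indices.zip(values).foreach { case (i, v) => out(i) = v }
    out
  }
}
```

For example, SparseToDense.toDense(4, Array(1, 2), Array(0.5, 0.3)).mkString("[", ", ", "]") gives "[0.0, 0.5, 0.3, 0.0]", matching the output above.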

This is the dense representation of the same vector. Once both vectors are converted to arrays, you can add them element-wise (here vector_2 is the second vector, which must have the same size):

val add: Array[Double] = (vector.toArray, vector_2.toArray).zipped.map(_ + _)
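On Scala 2.12, the default for Spark 3.0, (a, b).zipped works, but it is deprecated in Scala 2.13, and like zip it silently stops at the end of the shorter array. A version-neutral sketch with an explicit length check (ElementwiseAdd is a hypothetical name):

```scala
object ElementwiseAdd {
  // Adds two dense arrays position by position. zip would silently truncate
  // to the shorter array, so the lengths are checked first.
  def add(a: Array[Double], b: Array[Double]): Array[Double] = {
    require(a.length == b.length, s"length mismatch: ${a.length} vs ${b.length}")
    a.zip(b).map { case (p, q) => p + q }
  }
}
```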

This gives a new array holding the element-wise sum of the two vectors. To build the new sparse vector, you next need an indices array, as in the construction above. Record the position of every nonzero entry and mark zeros with -1:

 var i = -1
 val new_indices_pre = add.map((element: Double) => {
   i = i + 1
   // != 0.0 (not > 0.0) so that negative sums are kept as well
   if (element != 0.0)
     i
   else
     -1
 })

Then filter out the -1 markers, which indicate a zero at that index:

 val new_indices = new_indices_pre.filter(element => element != -1)

Remember to also drop the zero values from the summed array, so that the remaining values line up with new_indices:

 val final_add = add.filter(element => element != 0.0)
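The counter-based steps above can also be written without a mutable variable. A sketch using zipWithIndex and collect that produces the indices and values together (NonZeroEntries is a hypothetical name):

```scala
object NonZeroEntries {
  // Returns the indices and values of all nonzero entries of a dense array.
  // Comparing with != 0.0 (rather than > 0.0) also keeps negative values.
  def extract(dense: Array[Double]): (Array[Int], Array[Double]) = {
    val pairs = dense.zipWithIndex.collect { case (v, i) if v != 0.0 => (i, v) }
    (pairs.map(_._1), pairs.map(_._2))
  }
}
```

The two arrays can then be passed as the indices and values arguments of Vectors.sparse(add.length, indices, values).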

Lastly, we can build the new sparse vector:

Vectors.sparse(num_int,new_indices,final_add)
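The approach above densifies both vectors first, so it costs time and memory proportional to the vector size, no matter how few entries are nonzero. An alternative merges the stored entries directly, like the merge step of merge sort. This sketch assumes both index arrays are sorted in strictly increasing order, which is what Spark expects of a SparseVector; SparseMerge is a hypothetical helper, not a Spark API:

```scala
// Hypothetical helper: adds two sparse vectors stored as sorted
// (indices, values) arrays without expanding them to dense arrays.
object SparseMerge {
  def add(xi: Array[Int], xv: Array[Double],
          yi: Array[Int], yv: Array[Double]): (Array[Int], Array[Double]) = {
    val outI = Array.newBuilder[Int]
    val outV = Array.newBuilder[Double]
    var p = 0
    var q = 0
    // Walk both index arrays in ascending order
    while (p < xi.length || q < yi.length) {
      if (q >= yi.length || (p < xi.length && xi(p) < yi(q))) {
        // Index only present in x (or y is exhausted)
        outI += xi(p); outV += xv(p); p += 1
      } else if (p >= xi.length || yi(q) < xi(p)) {
        // Index only present in y (or x is exhausted)
        outI += yi(q); outV += yv(q); q += 1
      } else {
        // Same index in both vectors: add the values
        outI += xi(p); outV += xv(p) + yv(q); p += 1; q += 1
      }
    }
    (outI.result(), outV.result())
  }
}
```

With Spark vectors, the inputs would come from the indices and values fields of x.toSparse and y.toSparse, and the output would go to Vectors.sparse(x.size, indices, values). The cost is proportional to the number of stored entries. An index where the two values cancel is kept with an explicit 0.0 entry.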
