I'm doing a program with big data and that's why I'm using Spark and Scala. I need to partition the database and for this I use
var data0 = conf.dataBase.repartition (8) .persist (StorageLevel.MEMORY_AND_DISK_SER)
but then I need to do things in the partition before proceeding to work with the piece of database corresponding to that partition and for that I use
var tester = data0.mapPartitions {x =>
configFuzzyPredProblem ()
Strategy.getStrategy.executeStrategy (conf.iterByRun, 5, GeneratorType.HillClimbing)
} .persist (StorageLevel.MEMORY_AND_DISK_SER)
Within the method executeStrategy()
I use the database but I do not know if it is the global one or the one corresponding to that partition. How can I know which one I am using and then only perform the partition processing with the database of that partition?
Here is a simple example using mapPartitionsWithIndex which follows the same rules of mapPartitions - excluding the index aspect.
You can see that inside mapPartitions you need to process an interable, an Interator Int in this example. In this case 3 partitions are processed, in your case 8, with either some entries or possibly zero entries.
I cannot see inside your function but I assume it is OK and it will process a partition - part of your database.