How can I generate N choose K combinations in a distributed way with Spark .NET?


I have a project with a very large number of combinations, C(100, 20) (about 5.4 × 10^20), and only a small amount of work is done for each combination.
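To get a feel for the scale, here is a quick calculation. The throughput of 10^9 combinations per second is a hypothetical figure chosen only for illustration. Even at that rate, visiting every combination once would take roughly 17,000 years, so distribution alone may not make a full enumeration practical.

```python
import math

# Number of 20-element subsets of a 100-element set.
total = math.comb(100, 20)

# Hypothetical aggregate throughput across the whole cluster (illustration only).
combos_per_second = 10**9
seconds_per_year = 365.25 * 24 * 3600

years = total / combos_per_second / seconds_per_year
print(f"C(100, 20) = {total:,}")
print(f"about {years:,.0f} years at {combos_per_second:,} combinations/second")
```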

I am using Spark .NET with Visual Studio (setup described here): https://learn.microsoft.com/en-us/dotnet/spark/tutorials/get-started

Spark .NET provides a DataFrame API with SQL-style operations. I assume I need a SQL-style operation to generate the N choose K combinations, plus a user-defined function (UDF) that does the work for each combination.

My question: what does the code look like using Spark .NET with a DataFrame? If DataFrames don't support an N choose K operation, what other options keep the generation of the combinations distributed?
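One possible alternative that keeps generation distributed without materializing large joins: number the combinations 0 to C(N, K) - 1 in lexicographic order (the combinatorial number system), give each worker a contiguous range of ranks, and let each worker "unrank" the first index of its range and then step to the next combination locally. This is a plain-Python sketch of the indexing idea, not Spark .NET code; the function names `unrank`, `next_combination`, `split_ranks` and `combinations_in_range` are hypothetical.

```python
import math

def unrank(rank, n, k):
    """Return the rank-th k-combination of range(n) in lexicographic order."""
    combo, start = [], 0
    for slot in range(k):
        remaining = k - slot - 1
        for x in range(start, n):
            # Number of combinations whose element at this slot is x.
            count = math.comb(n - x - 1, remaining)
            if rank < count:
                combo.append(x)
                start = x + 1
                break
            rank -= count
    return tuple(combo)

def next_combination(combo, n):
    """Return the lexicographic successor of combo, or None if it is the last."""
    c, k = list(combo), len(combo)
    i = k - 1
    while i >= 0 and c[i] == n - k + i:  # find rightmost element that can grow
        i -= 1
    if i < 0:
        return None
    c[i] += 1
    for j in range(i + 1, k):            # reset everything to its right
        c[j] = c[j - 1] + 1
    return tuple(c)

def split_ranks(total, parts):
    """Split [0, total) into `parts` contiguous, nearly equal [lo, hi) ranges."""
    step, extra = divmod(total, parts)
    ranges, lo = [], 0
    for p in range(parts):
        hi = lo + step + (1 if p < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges

def combinations_in_range(n, k, lo, hi):
    """Yield the combinations with ranks lo, lo + 1, ..., hi - 1 (one worker's share)."""
    if lo >= hi:
        return
    combo = unrank(lo, n, k)
    for _ in range(hi - lo - 1):
        yield combo
        combo = next_combination(combo, n)
    yield combo

# Small demo: 7 choose 3 split across 4 hypothetical workers.
n, k = 7, 3
for lo, hi in split_ranks(math.comb(n, k), 4):
    print(lo, hi, list(combinations_in_range(n, k, lo, hi))[:2])
```

One caveat: C(100, 20) exceeds the largest 64-bit signed integer (about 9.2 × 10^18), so if ranks were stored in Spark's 64-bit `long` columns, the space would first have to be split some other way (for example, by fixing the leading elements).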



Answer by CPGAdmin:

My basic question was answered in an issue on the dotnet/spark GitHub repository:

https://github.com/dotnet/spark/issues/627

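By cross-joining two DataFrames, I was able to create the combinations. A plain cross join returns every ordered pair, including each element paired with itself, so the result has to be filtered (for example, keeping only rows where the first ID is less than the second) so that each combination appears exactly once. This may not be the best approach, and perhaps others will follow up with a better solution.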
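The pair case can be illustrated locally with plain Python lists standing in for one-column DataFrames. The helper `cross_join` is hypothetical and only mimics a SQL CROSS JOIN; in Spark .NET the corresponding steps would presumably be a `CrossJoin` between two DataFrames followed by a `Filter` on the ID columns.

```python
from itertools import product

def cross_join(left, right):
    """Cartesian product of two lists of rows (tuples), like a SQL CROSS JOIN."""
    return [l + r for l, r in product(left, right)]

# A one-column "table" of element IDs, with N = 5.
ids = [(i,) for i in range(5)]

all_pairs = cross_join(ids, ids)                       # N * N ordered pairs, including (x, x)
pairs = [row for row in all_pairs if row[0] < row[1]]  # keep each unordered pair once

print(len(all_pairs), len(pairs))
```

The filter shrinks 25 ordered rows down to the 10 = C(5, 2) actual combinations.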

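For N choose K, that means cross-joining K copies of the N set (K - 1 cross joins), with the same kind of filter keeping only rows whose IDs are strictly increasing.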
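Here is a local sketch of the K-way version, again with plain Python lists standing in for DataFrames; `n_choose_k_by_cross_joins` is a hypothetical name. It applies the "strictly increasing IDs" filter after every join instead of once at the end, which keeps each intermediate result at C(N, j) rows rather than N^j.

```python
from itertools import combinations, product

def n_choose_k_by_cross_joins(n, k):
    """Build all k-combinations of range(n) using k - 1 cross joins plus a filter."""
    if k < 1:
        raise ValueError("k must be at least 1")
    base = [(i,) for i in range(n)]  # the N set as a one-column table
    rows = base
    for _ in range(k - 1):
        # Cross join with one more copy of the N set, keeping only rows whose
        # new ID is larger than the current last ID (strictly increasing IDs).
        rows = [r + b for r, b in product(rows, base) if b[0] > r[-1]]
    return rows

result = n_choose_k_by_cross_joins(6, 3)
print(len(result), result[:3])
```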