Getting Started with Mobius SparkClr (on Linux)

I want to try the C# driver with an existing standalone Spark cluster (on Ubuntu Linux), which I already use happily from Python and Scala.

I am unclear how to run a simple C# example, having downloaded the latest Mobius release to the Linux box. Specifically, I do not understand the two extra parameters that the CLR spark submit requires on top of the usual ones. I get various errors when I try to follow the submit arguments as documented (or I have misunderstood the instructions).

My questions are:

  • For --exe, does one simply point to the .exe file, or is it required to pass --exe [mono] [my_app.exe] [params]?
  • remote-spark-clr seems to insist on an HDFS path, but I am running Spark without HDFS. Is HDFS actually necessary?
  • Related to the previous question: when distributing the exe/packages to the workers, must these also be on an HDFS path, or can I put them somewhere sensible on the "regular" file system?

In short, I am looking for confirmation that HDFS is not required and a simple one-liner submit example that can run an exe in some location. The combinations I have tried are not working for me sadly.

There is 1 best solution below

Running Mobius on Linux requires a small trick:

  • Create shell scripts that launch your executables using mono.
  • Add the extension .exe to your shell scripts so that they are accepted by sparkclr-submit.
  • Make sure your shell scripts use Unix (LF) line endings; we had issues when they had CRLF line endings.
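The line-ending check in the last point can be sketched as follows. This is a minimal example under assumptions: it builds a sample wrapper with CRLF endings in a scratch directory (the `workdir` location is hypothetical) and strips the carriage returns with `tr`, which is only one of several ways to do it.

```shell
# Demo in a scratch directory; the file name follows the wrapper naming used in this answer.
workdir=$(mktemp -d)
script="$workdir/driver.sh.exe"

# Simulate a script saved with Windows (CRLF) line endings.
printf '#!/bin/sh\r\nexec mono ./Driver.exe "$@"\r\n' > "$script"

# Count the lines that contain a carriage return.
cr=$(printf '\r')
before=$(grep -c "$cr" "$script")

# Strip every carriage return; tr is POSIX, so no extra tool such as dos2unix is needed.
tr -d '\r' < "$script" > "$script.tmp" && mv "$script.tmp" "$script"

# grep -c exits non-zero when the count is 0, hence the "|| true".
after=$(grep -c "$cr" "$script" || true)
echo "lines with CR before: $before, after: $after"
```

On this sample the count drops from 2 to 0. For a real script, only the `tr` line is needed, pointed at your own file.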

If your application is called Driver.exe, I recommend creating a file named driver.sh.exe with the following content:

#!/bin/sh
exec mono ./Driver.exe "$@"

Similarly, create a file named CSharpWorker.sh.exe (the name must match the CSharpWorkerPath value in App.config) with the following content:

#!/bin/sh
exec mono ./CSharpWorker.exe "$@"
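Both wrappers can also be generated from the shell, which sidesteps editor line-ending problems. The sketch below is an illustration, not part of Mobius: `app_dir` and the helper `make_wrapper` are hypothetical names, the temporary directory stands in for the folder that holds Driver.exe and CSharpWorker.exe, and setting the executable bit is my assumption about what is needed for the wrappers to be launched. The worker wrapper is named CSharpWorker.sh.exe so that it matches the CSharpWorkerPath setting in App.config.

```shell
# Stand-in for the real application directory (hypothetical; use your own path).
app_dir=$(mktemp -d)

make_wrapper() {
  # $1 = wrapper file name, $2 = .NET executable that mono should run
  # \$@ is escaped so that a literal "$@" ends up in the generated file.
  cat > "$app_dir/$1" <<EOF
#!/bin/sh
exec mono ./$2 "\$@"
EOF
  # Assumption: the wrapper must be executable to be started directly.
  chmod +x "$app_dir/$1"
}

make_wrapper driver.sh.exe Driver.exe
make_wrapper CSharpWorker.sh.exe CSharpWorker.exe

# Show one generated wrapper for inspection.
cat "$app_dir/CSharpWorker.sh.exe"
```

Because the files are written by the shell itself, they get LF line endings on Linux.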

In your App.config set the following value in appSettings:

<add key="CSharpWorkerPath" value="CSharpWorker.sh.exe"/>

Finally, when submitting your application, use the following arguments:

$SPARKCLR_HOME/scripts/sparkclr-submit.sh \
--master yarn \
--deploy-mode client \
--exe driver.sh.exe \
/path/to/driver

Note that the --exe argument takes only the name of the file; the path is passed as the next argument.
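A small pre-flight check can catch the most common mistake with these two arguments before anything is submitted. This is a sketch under assumptions: `check_submit_args` is a hypothetical helper, not part of Mobius, and it assumes that the path argument is the directory containing the wrapper.

```shell
check_submit_args() {
  # $1 = value intended for --exe, $2 = value intended for the path argument
  case "$1" in
    */*) echo "--exe must be a bare file name, got: $1"; return 1 ;;
  esac
  # Assumption: the wrapper lives directly inside the given directory.
  if [ ! -f "$2/$1" ]; then
    echo "not found: $2/$1"
    return 1
  fi
  echo "ok: --exe $1 $2"
}

# Demo with a scratch directory standing in for /path/to/driver.
demo_dir=$(mktemp -d)
: > "$demo_dir/driver.sh.exe"

check_submit_args driver.sh.exe "$demo_dir"                      # accepted
check_submit_args "$demo_dir/driver.sh.exe" "$demo_dir" || true  # rejected: contains a path
```

If the first call succeeds for your real values, the same two values can be used after --exe in the submit command above.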

You can place your applications on the regular file system (you don't need HDFS for that), but in my experience Mobius internally uses HDFS to distribute the application to the workers. I don't know whether that can be avoided.