Python MapReduce on Sun Grid Engine
I am relatively new to distributed computing, so forgive me if I misunderstand some of the basic concepts here. I am looking for a (preferably) Python-based alternative to Hadoop for processing large data sets via MapReduce on a cluster that uses an SGE-based grid engine (e.g. OpenGrid or Sun Grid Engine). I have had good luck running basic distributed jobs with PythonGrid, but I'd really like a more feature-rich framework for running my jobs.
I have read up on tools like Disco and MinceMeatPy, both of which seem to offer true Map-Sort-Reduce job processing, but there does not seem to be any obvious support for SGE. This makes me wonder whether it is possible to achieve true MapReduce functionality using a grid scheduler, or whether these tools just don't support one out of the box because grid schedulers are not frequently used for this.
Can you perform Map-Sort-Reduce tasks on a grid engine? Are there Python tools that support this? How difficult would it be to rig existing MapReduce tools to use SGE job schedulers?
Asked by woemler
I've heard that Jug works. It uses the filesystem to coordinate the parallel tasks, so the working directory needs to be on a filesystem shared by all nodes. With that kind of framework, you write your code (e.g. primes.py), run "jug status primes.py" on the machine you're on to see the pending tasks, and then start a grid array job with as many workers as you like, all running "jug execute primes.py".
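Launching the Jug workers as a single SGE array job could look roughly like this. It is a sketch, assuming Jug is installed on every compute node and the jugfile's directory sits on a shared filesystem; the helper names jug_worker_submission and submit and the default job name are hypothetical, while -N, -cwd, -b y and -t are standard qsub options.

```python
import subprocess


def jug_worker_submission(jugfile, num_workers, job_name="jug_workers"):
    """Build a qsub command that starts num_workers Jug workers as one array job."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return [
        "qsub",
        "-N", job_name,             # job name shown by qstat
        "-cwd",                     # run each task in the submission directory
        "-b", "y",                  # treat the command line as a binary, not a script file
        "-t", f"1-{num_workers}",   # array job: one task per Jug worker
        "jug", "execute", jugfile,
    ]


def submit(cmd):
    # Hand the command to qsub; raises CalledProcessError if submission fails.
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

For example, submit(jug_worker_submission("primes.py", 8)) would start eight identical workers. Because Jug coordinates through its on-disk store, every task can run the same command and the workers still pick up different pieces of work; "jug status primes.py" then shows progress from your own machine.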
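mincemeat.py should be able to work in much the same way, but it appears to use the network for coordination: a server process runs the overall script and the workers connect to it. So whether it works for you may depend on whether your compute nodes can talk to the machine running that server.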
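Before trying mincemeat.py, a quick way to answer the "can the nodes talk to the server" question is a TCP reachability check run on a compute node (for instance as a tiny qsub job). This is a minimal standard-library sketch; the port constant 11235 is our assumption about mincemeat.py's default, so check it against the copy you install, and the host you pass is whatever machine will run the server.

```python
import socket

# Assumption: mincemeat.py's default listening port; verify in your version.
MINCEMEAT_DEFAULT_PORT = 11235


def can_reach(host, port=MINCEMEAT_DEFAULT_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds from this node."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

If this returns False from the compute nodes while the server is listening, firewall rules or network isolation on the cluster are likely to block a network-coordinated framework, and a filesystem-coordinated one such as Jug may be the easier route.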
Several release notes mention running actual Hadoop MapReduce and HDFS on SGE, but I haven't been able to find good documentation for it.
If you're used to Hadoop streaming with Python, it's not too hard to replicate on SGE. I've had some success with this at work. I run one array job that does map + shuffle, with one task per input file, and then a second array job that does sort + reduce, with one task per reducer number. The shuffle step just writes files to a shared network directory, one per (mapper, reducer) pair: mapper00000_reducer00000, mapper00000_reducer00001, and so on. Reducer 00001 then sorts all the files labeled reducer00001 together and pipes the result to the reducer code.
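The map + shuffle and sort + reduce phases described above can be sketched in Python as follows. This is a minimal illustration, not the answerer's actual code: the word-count mapper/reducer, the helper names (sge_task_index, partition, map_and_shuffle, sort_and_reduce), the tab-separated record format and the SHUFFLE_DIR location are all hypothetical. It also calls a Python reducer function directly rather than piping into a separate streaming-style program. It relies only on the standard library and on SGE exporting SGE_TASK_ID to each array task.

```python
import glob
import os
import zlib
from contextlib import ExitStack
from itertools import groupby
from operator import itemgetter

# Hypothetical shared network directory that every SGE node can read and write.
SHUFFLE_DIR = os.environ.get("SHUFFLE_DIR", "/shared/shuffle")


def sge_task_index():
    """0-based index of this array task (SGE_TASK_ID counts from 1 with -t 1-N)."""
    return int(os.environ["SGE_TASK_ID"]) - 1


def partition(key, num_reducers):
    # zlib.crc32 gives the same value in every process; str hash() does not.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def mapper(line):
    # Example map function (word count): emit (word, 1) for each token.
    for word in line.split():
        yield word, 1


def reducer(key, values):
    # Example reduce function: sum the counts for one word.
    yield key, sum(values)


def map_and_shuffle(input_path, mapper_id, num_reducers, shuffle_dir=SHUFFLE_DIR):
    """Map one input file and write one tab-separated file per reducer."""
    with ExitStack() as stack:
        outs = [
            stack.enter_context(open(
                os.path.join(shuffle_dir, f"mapper{mapper_id:05d}_reducer{r:05d}"),
                "w", encoding="utf-8"))
            for r in range(num_reducers)
        ]
        with open(input_path, encoding="utf-8") as f:
            for line in f:
                for key, value in mapper(line):
                    outs[partition(key, num_reducers)].write(f"{key}\t{value}\n")


def sort_and_reduce(reducer_id, shuffle_dir=SHUFFLE_DIR):
    """Collect every mapper's file for this reducer, sort by key, then reduce."""
    pattern = os.path.join(shuffle_dir, f"mapper*_reducer{reducer_id:05d}")
    pairs = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            for line in f:
                key, value = line.rstrip("\n").split("\t", 1)
                pairs.append((key, int(value)))  # assumes integer values
    pairs.sort(key=itemgetter(0))
    results = []
    for key, group in groupby(pairs, key=itemgetter(0)):
        results.extend(reducer(key, (value for _, value in group)))
    return results
```

A map task would call map_and_shuffle(inputs[sge_task_index()], sge_task_index(), R) and a reduce task sort_and_reduce(sge_task_index()). One detail matters because the mappers are separate processes: Python's built-in hash() for strings is randomized per process, so partitioning with it could send the same key to different reducers from different mappers, which is why the sketch uses zlib.crc32. To chain the two phases, the reduce array job can be submitted with qsub's -hold_jid option naming the map job, so it starts only after every map task has finished.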
Unfortunately, Hadoop streaming isn't very full-featured to begin with, so replicating it this way only gets you the basics.