Streaming k-means in Mahout produces only one output file


I'm running Mahout's streaming k-means algorithm on a cluster, and I'm getting only one file as output.

I'm new to Mahout/Hadoop, but if I understand correctly there should be more than one output file, since the job is split across multiple nodes. If that's right, why isn't it the case here?

Could it be that I have too little data, so all of the processing happens on one machine? Or did I misconfigure something when launching the job (the Hadoop paths, for example), so that it only runs on a single machine?


Hadoop (specifically HDFS) manages data chunking, i.e. splitting a file into multiple blocks.

This means that from your perspective (i.e. through HDFS) there is only one file. However, on the datanodes' local file systems, that file is stored as many pieces.
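To illustrate the chunking described above, here is a minimal Python sketch of how one logical HDFS file maps to fixed-size blocks. It assumes a block size of 128 MiB (the usual `dfs.blocksize` default in recent Hadoop versions; your cluster may use a different value), and the function name `hdfs_blocks` is hypothetical. It only models the arithmetic; it does not talk to a real cluster.

```python
# Assumption: 128 MiB block size (common dfs.blocksize default); clusters may differ.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024


def hdfs_blocks(file_size, block_size=DEFAULT_BLOCK_SIZE):
    """Hypothetical helper: return (offset, length) pairs for the blocks
    that one logical HDFS file of `file_size` bytes is split into."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    blocks = []
    offset = 0
    while offset < file_size:
        # Every block is full-sized except possibly the last one.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


if __name__ == "__main__":
    mib = 1024 * 1024
    # One 300 MiB output file: you see a single path, but it spans 3 blocks.
    for off, length in hdfs_blocks(300 * mib):
        print(f"block at {off // mib} MiB, {length // mib} MiB long")
```

So a single output file you see in HDFS can be spread over several blocks (each also replicated) on different datanodes. On a real cluster, `hdfs fsck <path> -files -blocks -locations` shows the actual block layout of a file.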