I have two RxXdfData data sources and I want to merge them on a column in the RxHadoopMR compute context.
Both XDF data sources are large and stored on HDFS. How can I merge them?
I tried the append option of rxDataStep, but Revolution R complains that it cannot take composite XDF files and suggests that I use rxExec instead.
I know this can be done with the rxMerge function in a local compute context, but then I have to go through the following steps:
- Copy the data to the edge node (local context)
- Make .xdf files
- Use rxMerge to merge .xdf files
- Convert output .xdf file to txt/csv format
- Transfer txt/csv files to hdfs
- Use rxImport again to convert these text files back to composite xdf files
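
For reference, the workaround above would look roughly like the sketch below. It is not tested end to end. All paths, the file names, and the key column "id" are placeholders, and it assumes the source data is also available as CSV on HDFS and that the script runs on the edge node, so RxHadoopMR() can be created with its defaults.

```r
# Sketch only: every path and the key column "id" are hypothetical placeholders.
library(RevoScaleR)

hdfs <- RxHdfsFileSystem()

# 1. Copy both data sets from HDFS to the edge node (assumes CSV copies exist).
rxHadoopCopyToLocal("/user/me/left.csv",  "/tmp/merge/left.csv")
rxHadoopCopyToLocal("/user/me/right.csv", "/tmp/merge/right.csv")

# 2. Build ordinary (non-composite) .xdf files in the local compute context.
rxSetComputeContext("local")
rxImport(inData = "/tmp/merge/left.csv",
         outFile = "/tmp/merge/left.xdf", overwrite = TRUE)
rxImport(inData = "/tmp/merge/right.csv",
         outFile = "/tmp/merge/right.xdf", overwrite = TRUE)

# 3. Merge the two .xdf files on the key column.
rxMerge(inData1 = "/tmp/merge/left.xdf",
        inData2 = "/tmp/merge/right.xdf",
        outFile = "/tmp/merge/merged.xdf",
        matchVars = "id", type = "inner", overwrite = TRUE)

# 4. Write the merged .xdf file out as CSV.
rxDataStep(inData = "/tmp/merge/merged.xdf",
           outFile = RxTextData("/tmp/merge/merged.csv"),
           overwrite = TRUE)

# 5. Copy the CSV back to HDFS.
rxHadoopCopyFromLocal("/tmp/merge/merged.csv", "/user/me/merged.csv")

# 6. Re-import the CSV as a composite xdf on HDFS.
rxSetComputeContext(RxHadoopMR())
rxImport(inData = RxTextData("/user/me/merged.csv", fileSystem = hdfs),
         outFile = RxXdfData("/user/me/mergedXdf", fileSystem = hdfs),
         overwrite = TRUE)
```
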
Such a long process seems like overkill for a simple merge.
Please help me find a better solution to this problem.
Edit: I have also asked the same question on the Revolution R support forum at https://revolutionanalytics.zendesk.com/entries/53777899-Merging-two-composite-xdf-files-
but I haven't received a reply so far.