Hadoop Integration with Document Capture Software


We have a requirement to send documents to Hadoop (Hortonworks) from our image capture software, which releases PDF documents with metadata. I'm not very familiar with HDP. Is there a REST service or tool that can add documents, along with their metadata, to Hadoop?

Please help

1 Answer

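HDFS offers two ways in that need no Hadoop client libraries: WebHDFS, a REST API, and the NFS Gateway, which lets you mount HDFS as a file system over NFSv3.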
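As a rough sketch of the WebHDFS route, the snippet below uploads a file with the two-step CREATE call: the NameNode answers the first PUT with a 307 redirect to a DataNode, and the second PUT carries the bytes. The host name, port (50070 is the usual NameNode HTTP port on Hadoop 2, 9870 on Hadoop 3), paths, and the `capture` user are all assumptions, and `user.name` only works on a cluster without Kerberos. The helper names `create_url`, `_put`, and `upload` are made up for illustration.

```python
import http.client
from urllib.parse import quote, urlencode, urlsplit


def create_url(namenode, hdfs_path, user, overwrite=False):
    """Build the WebHDFS CREATE URL for an absolute HDFS path."""
    query = urlencode({
        "op": "CREATE",
        "overwrite": str(overwrite).lower(),
        "user.name": user,  # simple (non-Kerberos) authentication only
    })
    return f"{namenode.rstrip('/')}/webhdfs/v1{quote(hdfs_path)}?{query}"


def _put(url, body=None):
    """Send one PUT without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=30)
    try:
        headers = {"Content-Type": "application/octet-stream"} if body else {}
        conn.request("PUT", f"{parts.path}?{parts.query}", body=body, headers=headers)
        resp = conn.getresponse()
        resp.read()  # drain the response before closing the connection
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def upload(namenode, hdfs_path, data, user):
    """Two-step WebHDFS upload: ask the NameNode, then write to the DataNode."""
    status, location = _put(create_url(namenode, hdfs_path, user))
    if status != 307 or not location:
        raise RuntimeError(f"expected a 307 redirect, got {status}")
    status, _ = _put(location, body=data)
    if status != 201:
        raise RuntimeError(f"upload failed with status {status}")
```

WebHDFS stores only the file itself, so one option (a convention, not a Hadoop feature) is to upload the metadata as a JSON sidecar file next to the PDF, for example `upload(nn, "/data/docs/scan1.pdf", pdf_bytes, "capture")` followed by `upload(nn, "/data/docs/scan1.json", json_bytes, "capture")`.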

However, it's generally better not to write raw data straight to HDFS when you can control how it gets there. Putting an ingestion layer in front of HDFS makes it easier to audit where and how data is written.

For example, you could use Apache NiFi: start a ListenHTTP processor to receive the documents, then parse, filter, and enrich the data before writing it to HDFS or to any of the other destinations NiFi supports.
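On the capture side, sending a document to such a flow could look roughly like the sketch below. It assumes a ListenHTTP processor reachable at a URL like `http://nifi.example:8081/contentListener` (the host and port are made up; `contentListener` is the processor's default base path) and that its "HTTP Headers to receive as Attributes (Regex)" property is set to match the hypothetical `X-Doc-` prefix, so each metadata header arrives as a FlowFile attribute. The function names are illustrative only.

```python
import urllib.request


def metadata_headers(metadata, prefix="X-Doc-"):
    """Turn document metadata into HTTP headers with a common prefix."""
    return {f"{prefix}{key}": str(value) for key, value in metadata.items()}


def send_to_nifi(url, pdf_bytes, metadata):
    """POST one PDF to a NiFi ListenHTTP endpoint and return the HTTP status."""
    headers = {"Content-Type": "application/pdf", **metadata_headers(metadata)}
    request = urllib.request.Request(url, data=pdf_bytes, headers=headers, method="POST")
    # urlopen raises urllib.error.HTTPError if NiFi rejects the request.
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

Header values have to be plain ASCII-safe strings, so longer or structured metadata may fit better in the request body or a separate JSON document that the flow parses.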