Data ingestion in Hadoop using Distcp
642 Views · Asked by bytebiscuit

I understand that distcp is used for inter- and intra-cluster transfer of data. Is it possible to use distcp to ingest data from the local file system into HDFS? I understand that you can use a file:///... URI to point to a local file outside of HDFS, but how reliable and fast is that compared to an inter- or intra-cluster transfer?
1 answer below
Distcp is a MapReduce job that executes inside the Hadoop cluster. From the cluster's perspective, your local machine is not a local file system, so you can't point distcp at it directly. An alternative is to configure an FTP server on your machine that the Hadoop cluster can read from. Performance then depends on the network and the protocol used; FTP with Hadoop performs very poorly.
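If you do go the FTP route, distcp can read from an FTP source because Hadoop ships an FTPFileSystem implementation. A minimal sketch, assuming placeholder host names, credentials, and NameNode address (replace all of these with values from your environment):

```shell
# Copy a directory from an FTP server on the ingest machine into HDFS.
# ftpuser, secret, ftp-host, and namenode:8020 are illustrative placeholders.
hadoop distcp \
  ftp://ftpuser:secret@ftp-host/data/incoming \
  hdfs://namenode:8020/user/ingest/incoming
```

Each map task of the distcp job pulls files over FTP independently, which is why throughput is limited by the FTP server and the network rather than by the cluster.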
For small amounts of data, the hdfs dfs -put command can be a better option, but unlike distcp it does not copy in parallel.
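For comparison, the single-client upload mentioned above looks like this (the local and HDFS paths are illustrative, not from the original question):

```shell
# Upload one local file to HDFS with a single client process.
hdfs dfs -put /data/local/events.csv /user/ingest/events.csv

# For many files, launching several puts in the background approximates
# some of the parallelism distcp would otherwise provide:
for f in /data/local/*.csv; do
  hdfs dfs -put "$f" /user/ingest/ &
done
wait
```

Note that all of these transfers still flow through the single machine running the client, whereas distcp spreads the copy work across the cluster's map tasks.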