hadoop - how would input splits be formed if a file has only one record and the file size is larger than the block size?


An example to explain the question:

I have a 500 MB file (input.csv).

The file contains only one line (record).

How will the file be stored in HDFS blocks, and how will the input splits be computed?

1 Answer

BEST ANSWER

You may want to check this link: How does Hadoop process records split across block boundaries? Pay particular attention to the 'remote read' it mentions.

The single record in your question will be stored across many blocks. If you read the file with TextInputFormat, the mapper that processes the first split has to perform remote reads across those blocks in order to consume the whole record; the mappers for the remaining splits find no record start within their ranges and emit nothing.
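To make the split arithmetic concrete, here is a minimal sketch that lists the splits TextInputFormat would compute for such a file. The 128 MB block size, the HDFS path, and the SplitInspector class name are assumptions for illustration, not taken from your setup; each split covers one block's byte range regardless of record boundaries.

```java
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class SplitInspector {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        // input.csv is the 500 MB, single-line file from the question;
        // the HDFS path below is just an example.
        FileInputFormat.addInputPath(job, new Path("hdfs:///user/demo/input.csv"));

        // TextInputFormat splits the file by byte ranges (normally one per block),
        // not by records.
        TextInputFormat inputFormat = new TextInputFormat();
        List<InputSplit> splits = inputFormat.getSplits(job);

        // With a 128 MB block size you would expect 4 splits:
        // 3 x 128 MB plus one ~116 MB tail. The record reader of the first
        // split keeps reading (remotely, if needed) past its boundary until it
        // reaches the end of the line; the readers for the other splits skip
        // to the next newline, find none, and therefore emit nothing.
        for (InputSplit split : splits) {
            System.out.println(split);
        }
    }
}
```

The ~116 MB figure is just 500 - 3 x 128; the actual split sizes also depend on settings such as mapreduce.input.fileinputformat.split.minsize and split.maxsize, which can make splits larger or smaller than a block.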