What's the difference between the two S3 source options that are available in Foundry Data Connection?
- S3 (through Hadoop)
- S3 (Direct)
Is one preferred for ingesting parquet files?
S3 through Hadoop is currently the best-tested and most flexible S3 option, but its performance with large numbers of files is very poor.
S3 Direct reads from S3 using the Amazon S3 SDK directly and performs significantly better than S3 through Hadoop, as it requires O(1) rather than O(number of files) network calls. We recommend using the S3 Direct source instead where possible.
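
To get a feel for why the call count matters, here is a rough back-of-the-envelope sketch. It is not how either source type is implemented; the 50 ms round-trip latency, the one-call-per-file figure, and the function names are all assumptions chosen purely for illustration, and calls are assumed to run sequentially.

```python
# Toy model: network overhead as (number of calls) x (latency per call).
# All numbers below are illustrative assumptions, not measurements.
ROUND_TRIP_SECONDS = 0.05  # assumed latency of a single network call


def per_file_overhead(num_files, calls_per_file=1, latency=ROUND_TRIP_SECONDS):
    """Overhead when the call count grows with the file count: O(number of files)."""
    return num_files * calls_per_file * latency


def constant_overhead(fixed_calls=1, latency=ROUND_TRIP_SECONDS):
    """Overhead when the call count is independent of the file count: O(1)."""
    return fixed_calls * latency


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(
            f"{n:>9} files: per-file ~{per_file_overhead(n):.1f}s, "
            f"constant ~{constant_overhead():.2f}s"
        )
```

Under these assumptions the per-file overhead grows linearly with the number of files while the constant-call overhead stays flat, which is why the gap becomes noticeable for sources with many small files, such as a directory of parquet part files.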