Loading any of Amazon's listed public data sets (http://aws.amazon.com/datasets) yourself would take a lot of resources and bandwidth. What's the best way to import them into AWS so you can start working with them quickly?
How do you import Big Data public data sets into AWS?
Asked by sheanineseven
Ashish Pancholi
FYI: SDBExplorer uses multithreaded BatchPutAttributes calls to achieve high write throughput when uploading bulk data to Amazon SimpleDB. It allows multiple parallel uploads: if you have the bandwidth, you can take full advantage of it by running a number of BatchPutAttributes processes at once in a parallel queue, which reduces the time spent processing. SDBExplorer supports importing data from MySQL and CSV into Amazon SimpleDB.
Disclosure: I am the developer of SDBExplorer.
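For anyone who wants to see the general idea in code, here is a minimal Python sketch of parallel batched writes. It is not SDBExplorer's implementation, only an illustration of the technique under a few assumptions: the helper names (`to_batches`, `upload_parallel`, `make_simpledb_putter`), the default worker count, and the default region are all made up for this example, and the boto3 part assumes boto3 is installed with credentials configured. The batch size of 25 reflects SimpleDB's per-request item limit for BatchPutAttributes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple

# SimpleDB accepts at most 25 items per BatchPutAttributes request.
MAX_ITEMS_PER_BATCH = 25

Row = Tuple[str, Dict[str, str]]  # (item name, {attribute name: value})


def to_batches(rows: Iterable[Row], size: int = MAX_ITEMS_PER_BATCH) -> List[List[dict]]:
    """Convert rows to BatchPutAttributes item dicts, grouped into batches."""
    items = [
        {
            "Name": name,
            "Attributes": [
                {"Name": key, "Value": value, "Replace": True}
                for key, value in attrs.items()
            ],
        }
        for name, attrs in rows
    ]
    return [items[i:i + size] for i in range(0, len(items), size)]


def upload_parallel(
    rows: Iterable[Row],
    put_batch: Callable[[List[dict]], object],
    workers: int = 8,  # hypothetical default; tune to your bandwidth
) -> int:
    """Send every batch through put_batch on a thread pool; return the batch count."""
    batches = to_batches(rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all batches and re-raises any worker exception.
        list(pool.map(put_batch, batches))
    return len(batches)


def make_simpledb_putter(domain: str, region: str = "us-east-1"):
    """Build a put_batch callable backed by boto3 (imported lazily)."""
    import boto3  # assumption: boto3 is installed and credentials are configured

    client = boto3.client("sdb", region_name=region)
    return lambda batch: client.batch_put_attributes(DomainName=domain, Items=batch)
```

With real data you would call something like `upload_parallel(rows, make_simpledb_putter("my-domain"))`, where `my-domain` is a placeholder for an existing SimpleDB domain. Passing the uploader in as a callable keeps the batching logic testable without touching AWS.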
You will need to create a new EBS volume from the snapshot ID of the public data set and attach it to your EC2 instance. That way you won't need to pay for transfer.
But be careful: some data sets are only available in one region, which is usually indicated by a note on the data set's page. In that case, launch your EC2 instance in that same region.
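As a rough sketch of those steps with boto3 (an assumption on my part; the console or the AWS CLI work just as well), something like the following would do. The function names, the `snapshot_region` parameter, and the default device name `/dev/sdf` are illustrative choices, not anything prescribed by AWS. The `ec2` argument is expected to be a boto3 EC2 client, e.g. `boto3.client("ec2", region_name=...)`.

```python
def region_of(availability_zone: str) -> str:
    """Derive the region from a standard zone name, e.g. 'us-east-1a' -> 'us-east-1'.

    Assumption: the zone name is a region followed by a letter suffix;
    other zone naming schemes are not handled here.
    """
    return availability_zone.rstrip("abcdefghijklmnopqrstuvwxyz")


def attach_public_dataset(
    ec2,
    snapshot_id: str,
    snapshot_region: str,
    instance_id: str,
    availability_zone: str,
    device: str = "/dev/sdf",  # hypothetical default device name
) -> str:
    """Create a volume from a public snapshot and attach it to an instance."""
    # The volume has to live in the snapshot's region, and in the same
    # availability zone as the instance it will be attached to.
    if region_of(availability_zone) != snapshot_region:
        raise ValueError(
            "snapshot is in %s but zone %s is in another region"
            % (snapshot_region, availability_zone)
        )

    volume = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,
    )
    volume_id = volume["VolumeId"]

    # Wait until the new volume is 'available' before attaching it.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id
```

After the volume is attached, you still need to mount it inside the instance before you can read the data. The snapshot ID comes from the data set's listing, and the instance ID and availability zone are those of your own running instance.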