I have a list of questions about AWS usage. I am not sure whether I have the correct answers or whether I am following best practice.
Before using AWS, I did the following on my MacBook:
- Maintained a small .odb database (around 100 MB), which is expected to grow to a few GB within a year.
- Ran a few R scripts that scrape the web and import the data into the database.
- Ran a few other R scripts that extract data from the database and do analysis.
Given the growing data volume and the more complex analytics I need to perform, my MacBook is always heavily loaded, so I decided to switch to AWS for more computing power when needed. I am using the AWS free tier, and below is what I have successfully done on AWS so far:
- I created an EC2 instance and can retrieve files from my S3 bucket.
- I can perform analysis using my R scripts and save the result in my S3 bucket.
And here is the list of my questions:
For maintaining a database of about 1 GB, is it reasonable to simply put it in S3 and load the whole file into R every time? Or should I use the RDS service instead?
Is there a charge for data transfer between EC2 instances and my S3 bucket (i.e., does it matter whether I transfer 10 GB or 1000 GB in and out between an instance and S3)? I am not sure where to find this information.
For web scraping from an EC2 instance, is there a charge for the internet connection? Or does the cost depend only on the instance type I choose, regardless of whether I am doing computation or web scraping?
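To show the kind of back-of-the-envelope calculation I would like to be able to do once I know the actual pricing, here is a rough Python sketch. The function name `transfer_cost` and the per-GB rate are placeholders I made up, not real AWS prices, and a flat rate with an optional free allowance is only my assumption about how such billing might be structured.

```python
def transfer_cost(gb_transferred, rate_per_gb, free_gb=0.0):
    """Estimate a transfer bill under an ASSUMED flat per-GB rate.

    rate_per_gb and free_gb are placeholders to be filled in from the
    real pricing page; they are NOT actual AWS figures.
    """
    if gb_transferred < 0 or rate_per_gb < 0 or free_gb < 0:
        raise ValueError("inputs must be non-negative")
    # Only the volume above the (assumed) free allowance is billed.
    billable_gb = max(0.0, gb_transferred - free_gb)
    return billable_gb * rate_per_gb


PLACEHOLDER_RATE = 0.05  # hypothetical $/GB, for illustration only

if __name__ == "__main__":
    for gb in (10, 1000):
        print(gb, "GB ->", transfer_cost(gb, PLACEHOLDER_RATE))
```

If the applicable rate turns out to be zero, the volume would not matter at all. Otherwise, under this flat-rate assumption, 1000 GB would cost 100 times as much as 10 GB.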
I have also read a few articles on AWS EBS, but I am quite confused about the difference between S3, EBS, and setting up RDS.
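To make the first question concrete, here is a minimal sketch of the two access patterns I am comparing: loading the whole dataset and filtering it in the script, versus asking a database for only the rows I need. It is written in Python with an in-memory SQLite database purely as a stand-in (my real data is an .odb file used from R, and RDS would be a separate database server). The table name `prices` and its columns are made up for illustration.

```python
import sqlite3


def build_demo_db(n_rows=1000):
    """Create an in-memory SQLite database with a hypothetical `prices` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE prices (id INTEGER PRIMARY KEY, symbol TEXT, value REAL)"
    )
    # Even ids get symbol "A", odd ids get symbol "B".
    rows = [(i, "A" if i % 2 == 0 else "B", float(i)) for i in range(n_rows)]
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def load_everything(conn):
    """Pattern 1: pull the whole table into memory, then filter in the script."""
    all_rows = conn.execute(
        "SELECT id, symbol, value FROM prices ORDER BY id"
    ).fetchall()
    return [r for r in all_rows if r[1] == "A" and r[2] >= 990]


def query_subset(conn):
    """Pattern 2: let the database filter, so only matching rows come back."""
    return conn.execute(
        "SELECT id, symbol, value FROM prices "
        "WHERE symbol = ? AND value >= ? ORDER BY id",
        ("A", 990.0),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    # Both patterns return the same rows; they differ in how much data is read.
    print(load_everything(conn) == query_subset(conn))
    print(len(query_subset(conn)))
```

Both functions return the same rows. What I am unsure about is whether, at around 1 GB, the first pattern (one file in S3, loaded in full each time) is still reasonable or whether the second pattern justifies setting up a database service.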
I expect my data volume to grow exponentially as I write more R scripts to scrape different publicly available data for analysis. In terms of computing power, I currently need more than my MacBook offers, mainly for parallel processing and analytics. I will also test some machine learning algorithms in the future.
Any advice would be useful.