We are planning a dynamic, scheduled data-fetching application that pulls from a number of data sources via REST API calls. The considerations are as follows:
The user shall be able to configure the API/web service endpoints, the fetch frequency, and the response content type (JSON or CSV).
Once the user completes the configuration part, a job queue will be created programmatically.
A scheduler framework shall be used to make requests to the endpoints and push each response into its respective queue. We want a queue here to preserve the order of the responses and to act as intermediate storage for the raw responses from the endpoints.
The items stored in the queues shall be processed using Python/pandas. We are planning to use a NoSQL database for storage.
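To make the idea more concrete, here is a rough sketch of the flow we have in mind, using only the Python standard library. It is not tied to Celery or RabbitMQ: the in-process `queue.Queue` just stands in for whatever broker queue we end up with, and the names `EndpointConfig`, `create_queues`, `parse_response` and `drain` are placeholders we made up for illustration.

```python
import csv
import io
import json
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointConfig:
    """Hypothetical user-supplied configuration for one data source."""
    name: str
    url: str
    interval_seconds: int
    content_type: str  # "json" or "csv"


def create_queues(configs):
    """Create one FIFO queue per configured endpoint, keyed by name."""
    return {cfg.name: queue.Queue() for cfg in configs}


def parse_response(raw, content_type):
    """Turn a raw response body into a list of records (dicts)."""
    if content_type == "json":
        data = json.loads(raw)
        # Wrap a single JSON object so the result is always a list.
        return data if isinstance(data, list) else [data]
    if content_type == "csv":
        return list(csv.DictReader(io.StringIO(raw)))
    raise ValueError(f"unsupported content type: {content_type}")


def drain(q, content_type):
    """Parse all queued raw responses in arrival order."""
    records = []
    while not q.empty():
        records.extend(parse_response(q.get(), content_type))
    return records
```

In the real application the scheduler would put the raw response bodies on the queues, and the list of records returned by `drain` could be handed to `pandas.DataFrame(...)` for processing before being written to the database.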
Question
For this purpose, is it better to use Celery or RabbitMQ? We are leaning towards Celery because it is relatively simple to implement.
Any thoughts on this are greatly appreciated.
Thank you.