Azure Data Lake Store Benchmarks


To Developers,

I am running benchmarks for Azure Data Lake and I am seeing roughly 7.5 MB/s for a read from an ADL Store and a write to a VHD, all in the same region. This is the case for both PowerShell and C#, with the code taken from the following examples:

PowerShell code: https://azure.microsoft.com/en-us/documentation/articles/data-lake-store-get-started-powershell/
C# code: https://azure.microsoft.com/en-us/documentation/articles/data-lake-store-get-started-net-sdk/
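
For context, the measurement boils down to timing a single download from the store to the VHD and dividing by the file size. The sketch below shows roughly how that looks in PowerShell; it is not the exact code from the linked article, the account name, file path, and drive letter are placeholders, and the cmdlet parameter names follow the AzureRM.DataLakeStore module and may differ between module versions:

    # Minimal timing sketch; account name, file path, and drive letter are placeholders.
    $account     = "myadls"
    $sourcePath  = "/benchmark/testfile.bin"
    $destination = "F:\benchmark\testfile.bin"

    Login-AzureRmAccount | Out-Null

    # Time a single download from the Data Lake Store to the locally attached VHD.
    $elapsed = Measure-Command {
        Export-AzureRmDataLakeStoreItem -Account $account -Path $sourcePath -Destination $destination
    }

    # Throughput in MB/s, based on the size of the file that landed on the VHD.
    $sizeMB = (Get-Item $destination).Length / 1MB
    "{0:N2} MB/s over {1:N1} s" -f ($sizeMB / $elapsed.TotalSeconds), $elapsed.TotalSeconds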

Are the above code samples acceptable for a benchmark test, or will a new SDK be delivered that improves throughput? Also, are there expected throughput numbers for when ADL Store becomes generally available?

Thanks, Marc

2 Answers

I started to write an Azure Data Lake Storage Throughput Analyzer and put the first code bits on GitHub.

You should run the tool on an Azure VM so that you are not measuring your internet connection.

Please feel free to add your thoughts and code contributions to my GitHub repo as well.

I hope this helps.

The code provided in the documentation can be used to build benchmark tests. The SDK will go through a few releases and updates before Azure Data Lake becomes generally available; these will include performance improvements in addition to new features.

On the topic of performance benchmarks, our general guidance is as follows. The Azure Data Lake services are currently in preview, and we are continually working to improve them, including their performance, throughout the preview phase. As we get closer to general availability, we will consider releasing additional guidance on the kind of performance results to expect. Performance results depend heavily on many factors, such as test topology, configuration, and workload, so it is difficult to comment on your observations without examining all of these. If you can reach us offline with the details, we will be happy to take a look.

Amit Kulkarni (Program Manager - Azure Data Lake)