How to specify a Take value with TableClient.QueryAsync

1.6k Views Asked by At

I'm updating my project to use Azure.Data.Tables 12.6.1, and I can't figure out where to specify a Take value to limit the number of entities returned from a query.

In other words, I want to do something like this:

var limit = 150;
var results = table.QueryAsync<T>(limit);
await foreach (var page in results.AsPages().ConfigureAwait(false)) {
    // Regardless of how the server pages the results, 
    // only the top [limit] items are returned.
}

In the old API, you could set a Take property on the query object. How do I do this in the new API?

3

There are 3 best solutions below

5
Christopher Scott On BEST ANSWER

As @Skin points out, the current SDK does not expose an explicit API for Take, but that was an intentional decision to ensure that it is clearer to developers what is really happening from the service perspective.

The old SDK supported a full IQueryable API, which made it easy to create very expensive queries that performed filtering client side after fetching the whole table from the service.

Although Take doesn't have the same problems as other Linq methods, the service doesn't really support it. It can only limit the number of results for a paged result (which is capped at 1000 by the service).

While I agree it is not as simple as a Take API, the current API makes it fairly straightforward to implement the equivalent functionality, while not hiding the fact that you may actually fetch more than your Take limit from the service.

This sample demonstrates how to iterate through the pages with a max items per page set.

0
Skin On

This may be a little controversial but I'm going to add this as an answer ... it looks like it's been raised as a feature request only a number of weeks ago and it's now been added to the backlog ...

https://github.com/Azure/azure-sdk-for-net/issues/30985

You're not the only one who has the same requirement.

0
Pawel Cioch On

When limit is under 1000 Azure max, use explicit limit, when limit is 1000+ calculate how many pages and stop when you reach the count, and then trim to precise count.

When a query found 5000 and your wanted limit is 1001 then maximum overhead would be 999 because you'll download 2000 items and cut off 999. When your limit is < 1000 you pull only 1 page with exact count of items (no ovearhaed).

You could in theory do sophisticated maxPerPage calculation but it makes no sense, the HTTP is more costly on latency side, especially on mobile, so it is better to make fewer requests of biggest possible pages, than more requests of smaller pages.

public async Task<IEnumerable<T>> QueryAsyncLimit<T>(string tableName, string filter, IEnumerable<string> select, int limit) where T : class, ITableEntity, new()
{
    var tableClient = ...;
    var entities = new List<T>();
    var maxPerPage = limit < 1000 ? limit : 1000; // Azure page default is 1000 and it's max
    var pages = tableClient.QueryAsync<T>(filter, maxPerPage, select).AsPages();
    var currentPage = 1;

    await foreach (var page in pages)
    {
        entities.AddRange(page.Values);
        if (currentPage == GetPageCount(limit))
            break; //stop "downloading" pages
        currentPage++;
    }
    return entities.Take(limit); //cut off the excess when more than 1 page
}


private int GetPageCount(int limit)
{
    int pageSize = 1000; // Azure page default is 1000 and it's max
    int pageCount = (int)Math.Ceiling((double)limit / pageSize);
    return pageCount;
}