Web API, C# - Call a document library API recursively to get all content


I am using a document library that can contain both folders and files. My requirement is to fetch all content from the document library using an API (https://some.domain.com/folder/{id}).

My current logic uses a stack, starting at the root folder:

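    // GetFolderAsync is a placeholder for the actual API call (not shown in the question);
    // resourceRegex is a Regex field defined elsewhere in the class.
    public async Task<List<string>> RetrieveDocuments(string url, string id)
    {
        var files = new List<string>();
        var stack = new Stack<string>();
        stack.Push(id);

        while (stack.Count > 0)
        {
            var roundId = stack.Pop();
            // One request per folder; assumes url is the base, e.g. https://some.domain.com/folder
            var response = await GetFolderAsync($"{url}/{roundId}");
            if (response != null)
            {
                response.Folders?.ForEach(f => stack.Push(f.FolderId));
                response.Files?.ForEach(f => files.Add(resourceRegex.Replace(f.Path, "/")));
            }
        }

        return files;
    }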

The problem is that when there are a lot of files in deeply nested folders, this becomes very time-consuming and the function call often times out.

Can anyone suggest a better way of doing this?


There is 1 solution below

Mik On

This problem is hard to solve, and the answer really depends on the folder structure. The biggest issue is that each request waits for the previous one to finish, so the total time grows with the number of folders.

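The algorithm you're implementing is DFS, because it uses a stack (a queue would give BFS), but in this case it makes no difference which one you use, since every folder has to be visited either way. You can try to make the traversal run in parallel, but that feels really hard to get right.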

I think you could try a hybrid solution, where you read the first folder, and then read all first-level folders in parallel. Here is a hint on how you can implement sending requests in parallel: https://www.michalbialecki.com/2018/04/19/how-to-send-many-requests-in-parallel-in-asp-net-core/
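A minimal sketch of that hybrid idea, under some assumptions: `FolderContent`, `FolderRef`, and `FileRef` are hypothetical types modelled on the properties used in the question, `HybridCrawler` is an illustrative name, and `fetchFolder` is a stand-in for the real HTTP call. The root folder is read first, then one sequential stack-based walk is started per first-level folder and all walks are awaited together with `Task.WhenAll`. The path normalization from the question is left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical response shapes, modelled on the properties used in the question.
public class FolderRef { public string FolderId { get; set; } }
public class FileRef { public string Path { get; set; } }
public class FolderContent
{
    public List<FolderRef> Folders { get; set; }
    public List<FileRef> Files { get; set; }
}

public static class HybridCrawler
{
    // fetchFolder stands in for the real call to https://some.domain.com/folder/{id}.
    public static async Task<List<string>> RetrieveDocumentsAsync(
        string rootId, Func<string, Task<FolderContent>> fetchFolder)
    {
        var files = new List<string>();

        // Step 1: read the root folder on its own.
        var root = await fetchFolder(rootId);
        if (root == null) return files;
        if (root.Files != null) files.AddRange(root.Files.Select(f => f.Path));

        // Step 2: start one walk per first-level folder; the walks run concurrently.
        var walks = (root.Folders ?? new List<FolderRef>())
            .Select(f => WalkAsync(f.FolderId, fetchFolder))
            .ToList();

        foreach (var subtreeFiles in await Task.WhenAll(walks))
            files.AddRange(subtreeFiles);

        return files;
    }

    // The stack-based loop from the question, applied to a single subtree.
    // Each walk fills its own list, so no locking is needed.
    private static async Task<List<string>> WalkAsync(
        string id, Func<string, Task<FolderContent>> fetchFolder)
    {
        var files = new List<string>();
        var stack = new Stack<string>();
        stack.Push(id);

        while (stack.Count > 0)
        {
            var response = await fetchFolder(stack.Pop());
            if (response == null) continue;
            response.Folders?.ForEach(f => stack.Push(f.FolderId));
            response.Files?.ForEach(f => files.Add(f.Path));
        }

        return files;
    }
}
```

The speed-up depends on how evenly the content is spread across the first-level folders: if one of them holds most of the tree, its walk still runs sequentially. The number of simultaneous requests equals the number of first-level folders, so some throttling may be needed if the API limits concurrent calls.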

Anyway, I don't think there is a good solution to this one when you don't have control over the API. You might just need to extend the timeout.
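Which timeout to extend depends on where it fires, and the question does not say. If it happens to be `HttpClient`'s own per-request timeout (100 seconds by default), it could be raised roughly like this; `CrawlerHttp` and `CreateClient` are illustrative names, not part of the original code. If the timeout comes from the caller or a gateway in front of the Web API instead, this setting will not help.

```csharp
using System;
using System.Net.Http;

public static class CrawlerHttp
{
    // Assumption: the timeout being hit is HttpClient's per-request timeout.
    // Timeout must be set before the client sends its first request.
    public static HttpClient CreateClient(TimeSpan timeout)
    {
        return new HttpClient { Timeout = timeout };
    }
}
```

For example, `CrawlerHttp.CreateClient(TimeSpan.FromMinutes(5))` would give each individual folder request up to five minutes.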

Good luck! :)