Microsoft Search API: How to limit the scope to just MS Word or PDF documents

Background

I'm writing logic that uses the Microsoft Search API to find SharePoint documentation that matches a query, using both phrase searches and keywords.

Problem

There are two issues:

  1. It seems that I'm getting back more than just SharePoint / OneDrive documents; the search also appears to check Teams message history and so on. I'd like to know how to control the scope so that it only checks Word documents or PDFs, for example. The following screenshot shows some of the data that is returned:

enter image description here

I used this article as a guide: https://learn.microsoft.com/en-us/graph/search-concept-files

  2. How is it that driveItems have no drive ID? Some of the data being returned is missing driveIds. I need the drive ID because, later on, I need to call Graph again to get the contents of the files.

The following screenshot shows you some of the data I'm getting back with no drive id:

enter image description here

The Environment / Setup

I'm using MS Graph 1.0. Here are the references from the csproj file:

<PackageReference Include="AutoMapper" Version="12.0.1" />
<PackageReference Include="AutoMapper.Extensions.Microsoft.DependencyInjection" Version="12.0.1" />
<PackageReference Include="Azure.Identity" Version="1.9.0" />
<PackageReference Include="Microsoft.Azure.Functions.Extensions" Version="1.1.0" />
<PackageReference Include="Microsoft.NET.Sdk.Functions" Version="4.1.1" />
<PackageReference Include="Microsoft.Azure.Services.AppAuthentication" Version="1.6.0" />
<PackageReference Include="Microsoft.Graph" Version="5.18.0" />
<PackageReference Include="Microsoft.Graph.Core" Version="3.0.9" />
<PackageReference Include="Microsoft.Identity.Client" Version="4.54.1" />  

The Code

The function app includes logic like this:

    public async Task<string> Search()
    {
        var ApplicationClientID = Environment.GetEnvironmentVariable("CLIENT_ID");
        var ApplicationClientSecret = Environment.GetEnvironmentVariable("CLIENT_SECRET");
        var AzureTenantID = Environment.GetEnvironmentVariable("TENANT_ID");
        string[] scopes = new[] { "https://graph.microsoft.com/.default" };

        // using Azure.Identity;
        var options = new ClientSecretCredentialOptions
        {
            AuthorityHost = AzureAuthorityHosts.AzurePublicCloud,
        };

        try
        {
            var clientSecretCredential = new ClientSecretCredential(
                AzureTenantID, ApplicationClientID, ApplicationClientSecret, options);
            var graphClient = new GraphServiceClient(clientSecretCredential, scopes, "https://graph.microsoft.com/v1.0");
            Console.WriteLine($"searchQuery :{searchQuery}");
            var requestBody = new Microsoft.Graph.Search.Query.QueryPostRequestBody
            {
                Requests = new List<SearchRequest>
                    {
                        new SearchRequest
                        {
                            EntityTypes = new List<EntityType?>
                            {
                                EntityType.DriveItem,
                            },
                            Query = new SearchQuery
                            {
                                QueryString = this.searchQuery,
                            },
                            Fields = new List<string>
                            {
                                "driveId",
                                "listItemId",
                                "author",
                                "title",
                                "url",
                                "rank"
                            },
                            QueryAlterationOptions = new SearchAlterationOptions
                            {
                                EnableSuggestion = true,
                                EnableModification = true,
                            },
                            Region = "US",
                        },
                    },
            };

         
            var searchResults = await graphClient.Search.Query.PostAsync(requestBody).ConfigureAwait(false);
            //return searchResults;
            var jsonResults = ExtractDriveItemDetails(searchResults);
            return jsonResults;
        }
        catch (Exception ex)
        {
            log.LogError(ex.Message);
            return null;
        }
    }

The Extraction method looks like this:

    private static string ExtractDriveItemDetails(Microsoft.Graph.Search.Query.QueryResponse searchResults)
    {
        List<System.Collections.Generic.Dictionary<string, string?>> listOfDriveItems = new List<System.Collections.Generic.Dictionary<string, string?>>();

        // Iterate through all the items in searchResults
        foreach (var hitContainer in searchResults.Value.SelectMany(result => result.HitsContainers))
        {
            foreach (var hit in hitContainer.Hits)
            {
                // Check if the resource is a DriveItem
                if (hit.Resource is DriveItem driveItem)
                {
                    var driveItemDetails = new Dictionary<string, string?>();

                    // Get the AdditionalData dictionary from the driveItem
                    var additionalData = driveItem.ListItem.Fields.AdditionalData;

                    // Use TryGetValue to safely access the values and handle null or missing keys
                    string? driveId;
                    if (additionalData.TryGetValue("driveId", out var driveIdValue) && driveIdValue is string driveIdString)
                    {
                        driveId = driveIdString;
                    }
                    else
                    {
                        driveId = null;
                    }
                    
                    driveItemDetails.Add("itemId", driveItem.ListItem.Id.ToString());
                    driveItemDetails.Add("title", additionalData["title"] as string);
                    driveItemDetails.Add("author", additionalData["author"] as string);
                    driveItemDetails.Add("url", additionalData["url"] as string);

                    listOfDriveItems.Add(driveItemDetails);
                }
            }
        }

        string json = JsonConvert.SerializeObject(listOfDriveItems);
        return json;
    }

Answer

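To scope the search query to docx and pdf files only, add filetype:docx OR filetype:pdf to the search query defined in this.searchQuery. When the query also contains other terms, wrap the restriction in parentheses so that the OR applies only to the file types.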

...
Query = new SearchQuery
{
    QueryString = this.searchQuery,
},
...
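
As a minimal sketch of how the restriction could be appended, the helper below builds the combined query string. The class and method names (SearchQueryBuilder, RestrictToFileTypes) are hypothetical and not part of the Graph SDK; the only thing taken from the answer is the filetype:docx OR filetype:pdf restriction itself.

```csharp
using System;
using System.Linq;

public static class SearchQueryBuilder
{
    // Hypothetical helper (not part of Microsoft.Graph): wraps the user's query
    // in parentheses and ANDs it with a KQL file type restriction.
    public static string RestrictToFileTypes(string userQuery, params string[] extensions)
    {
        // No extensions given: leave the query untouched.
        if (extensions == null || extensions.Length == 0)
        {
            return userQuery;
        }

        // Builds e.g. "filetype:docx OR filetype:pdf"; a leading dot is tolerated.
        var filter = string.Join(
            " OR ",
            extensions.Select(ext => $"filetype:{ext.TrimStart('.')}"));

        // An empty user query yields only the file type restriction.
        return string.IsNullOrWhiteSpace(userQuery)
            ? $"({filter})"
            : $"({userQuery}) AND ({filter})";
    }
}
```

With this in place, the request could use QueryString = SearchQueryBuilder.RestrictToFileTypes(this.searchQuery, "docx", "pdf"). For the input "budget report" that produces (budget report) AND (filetype:docx OR filetype:pdf).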

If hit.Resource is a DriveItem, you can read the drive ID from its parentReference:

if (hit.Resource is DriveItem driveItem)
{
    ...
    var driveId = driveItem.ParentReference.DriveId;
    ...
}
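
Since the drive ID is needed for a follow-up call that fetches the file contents, the following sketch shows one way the two steps might fit together with the Microsoft.Graph 5.x SDK referenced in the question. The class and method names (DriveItemContent, DownloadAsync) are hypothetical. The sketch also assumes that the DriveItem returned by the search carries a usable Id; if it does not, the item ID has to come from elsewhere in the hit.

```csharp
using System.IO;
using System.Threading.Tasks;
using Microsoft.Graph;
using Microsoft.Graph.Models;

public static class DriveItemContent
{
    // Hypothetical helper: returns the file contents as a stream,
    // or null when the hit lacks the IDs needed to address the file.
    public static async Task<Stream?> DownloadAsync(GraphServiceClient graphClient, DriveItem driveItem)
    {
        // ParentReference can be null on some hits, so read it defensively.
        var driveId = driveItem.ParentReference?.DriveId;

        // Assumption: the search hit populates DriveItem.Id.
        var itemId = driveItem.Id;

        if (string.IsNullOrEmpty(driveId) || string.IsNullOrEmpty(itemId))
        {
            return null;
        }

        // GET /drives/{drive-id}/items/{item-id}/content
        return await graphClient.Drives[driveId].Items[itemId].Content
            .GetAsync()
            .ConfigureAwait(false);
    }
}
```

Returning null instead of throwing lets the caller skip hits that cannot be downloaded and continue with the rest of the results.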