Background
I'm writing logic that uses the Microsoft Search API to find SharePoint documentation that matches a query, using both phrase searches and keywords.
Problem
There are two issues:
- It seems that I'm getting back more than just SharePoint / OneDrive documents; it's also checking Teams message history, etc. I'd like to know how to control the scope so that it only checks Word docs or PDFs, for example. The following screenshot shows some of the data that
I used this article as a guide: https://learn.microsoft.com/en-us/graph/search-concept-files
- How is it that driveItems have no drive id? Some of the data being returned is missing driveIds. I need the drive id because later on I need to call Graph again to get the contents of the files.
The following screenshot shows you some of the data I'm getting back with no drive id:
The Environment / Setup
I'm using MS Graph 1.0. Here are the references from the csproj file:
<PackageReference Include="AutoMapper" Version="12.0.1" />
<PackageReference Include="AutoMapper.Extensions.Microsoft.DependencyInjection" Version="12.0.1" />
<PackageReference Include="Azure.Identity" Version="1.9.0" />
<PackageReference Include="Microsoft.Azure.Functions.Extensions" Version="1.1.0" />
<PackageReference Include="Microsoft.NET.Sdk.Functions" Version="4.1.1" />
<PackageReference Include="Microsoft.Azure.Services.AppAuthentication" Version="1.6.0" />
<PackageReference Include="Microsoft.Graph" Version="5.18.0" />
<PackageReference Include="Microsoft.Graph.Core" Version="3.0.9" />
<PackageReference Include="Microsoft.Identity.Client" Version="4.54.1" />
The Code
The function app includes logic like this:
public async Task<string> Search()
{
    var ApplicationClientID = Environment.GetEnvironmentVariable("CLIENT_ID");
    var ApplicationClientSecret = Environment.GetEnvironmentVariable("CLIENT_SECRET");
    var AzureTenantID = Environment.GetEnvironmentVariable("TENANT_ID");
    string[] scopes = new[] { "https://graph.microsoft.com/.default" };

    // using Azure.Identity;
    var options = new ClientSecretCredentialOptions
    {
        AuthorityHost = AzureAuthorityHosts.AzurePublicCloud,
    };

    try
    {
        var clientSecretCredential = new ClientSecretCredential(
            AzureTenantID, ApplicationClientID, ApplicationClientSecret, options);
        var graphClient = new GraphServiceClient(clientSecretCredential, scopes, "https://graph.microsoft.com/v1.0");
        Console.WriteLine($"searchQuery :{searchQuery}");

        var requestBody = new Microsoft.Graph.Search.Query.QueryPostRequestBody
        {
            Requests = new List<SearchRequest>
            {
                new SearchRequest
                {
                    EntityTypes = new List<EntityType?>
                    {
                        EntityType.DriveItem,
                    },
                    Query = new SearchQuery
                    {
                        QueryString = this.searchQuery,
                    },
                    Fields = new List<string>
                    {
                        "driveId",
                        "listItemId",
                        "author",
                        "title",
                        "url",
                        "rank"
                    },
                    QueryAlterationOptions = new SearchAlterationOptions
                    {
                        EnableSuggestion = true,
                        EnableModification = true,
                    },
                    Region = "US"
                },
            },
        };

        var searchResults = await graphClient.Search.Query.PostAsync(requestBody).ConfigureAwait(false);
        //return searchResults;
        var jsonResults = ExtractDriveItemDetails(searchResults);
        return jsonResults;
    }
    catch (Exception ex)
    {
        log.LogError(ex.Message);
        return null;
    }
}
The Extraction method looks like this:
private static string ExtractDriveItemDetails(Microsoft.Graph.Search.Query.QueryResponse searchResults)
{
    List<System.Collections.Generic.Dictionary<string, string?>> listOfDriveItems = new List<System.Collections.Generic.Dictionary<string, string?>>();

    // Iterate through all the items in searchResults
    foreach (var hitContainer in searchResults.Value.SelectMany(result => result.HitsContainers))
    {
        foreach (var hit in hitContainer.Hits)
        {
            // Check if the resource is a DriveItem
            if (hit.Resource is DriveItem driveItem)
            {
                var driveItemDetails = new Dictionary<string, string?>();

                // Get the AdditionalData dictionary from the driveItem
                var additionalData = driveItem.ListItem.Fields.AdditionalData;

                // Use TryGetValue to safely access the values and handle null or missing keys
                string? driveId;
                if (additionalData.TryGetValue("driveId", out var driveIdValue) && driveIdValue is string driveIdString)
                {
                    driveId = driveIdString;
                }
                else
                {
                    driveId = null;
                }

                // Include the drive id (possibly null) so it is not silently dropped from the output
                driveItemDetails.Add("driveId", driveId);
                driveItemDetails.Add("itemId", driveItem.ListItem.Id.ToString());
                driveItemDetails.Add("title", additionalData["title"] as string);
                driveItemDetails.Add("author", additionalData["author"] as string);
                driveItemDetails.Add("url", additionalData["url"] as string);
                listOfDriveItems.Add(driveItemDetails);
            }
        }
    }

    string json = JsonConvert.SerializeObject(listOfDriveItems);
    return json;
}
To scope the search query to docx and pdf files only, add
filetype:docx OR filetype:pdf
to your search query, which is defined in this.searchQuery.
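One way to append that restriction without disturbing the user's own terms is sketched below. The class and method names (KqlQueryBuilder, RestrictToFileTypes) are hypothetical, and wrapping both parts in parentheses and joining them with AND is an assumption about how you want the KQL operators grouped, not something the Graph SDK does for you.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of the Graph SDK): builds a KQL string that
// keeps the user's query intact and restricts results to the given file types.
public static class KqlQueryBuilder
{
    public static string RestrictToFileTypes(string query, params string[] extensions)
    {
        if (string.IsNullOrWhiteSpace(query))
            throw new ArgumentException("Query must not be empty.", nameof(query));

        // No extensions supplied: leave the query unchanged.
        if (extensions == null || extensions.Length == 0)
            return query;

        // Build "filetype:docx OR filetype:pdf", tolerating a leading dot.
        var filter = string.Join(" OR ", extensions.Select(e => $"filetype:{e.TrimStart('.')}"));

        // Parenthesize both sides so the OR group cannot absorb the user's terms.
        return $"({query}) AND ({filter})";
    }
}
```

With this in place, the request could set QueryString = KqlQueryBuilder.RestrictToFileTypes(this.searchQuery, "docx", "pdf").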
If hit.Resource is a DriveItem, you can read driveId from the parentReference.
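As a rough sketch of that, assuming Microsoft.Graph v5 and that the search hit populates both the item id and parentReference.driveId, the drive id can be read from ParentReference and then used to download the file. DriveItemContentReader and TryGetContentAsync are hypothetical names for illustration.

```csharp
using System.IO;
using System.Threading.Tasks;
using Microsoft.Graph;
using Microsoft.Graph.Models;

// Hypothetical helper: reads the drive id from the hit's parentReference
// and uses it to fetch the file content.
public static class DriveItemContentReader
{
    public static async Task<Stream?> TryGetContentAsync(GraphServiceClient graphClient, SearchHit hit)
    {
        // Only driveItem hits carry a parentReference with a drive id.
        if (hit.Resource is not DriveItem driveItem)
            return null;

        var driveId = driveItem.ParentReference?.DriveId;
        var itemId = driveItem.Id;

        // Assumption: both ids are present on the hit; bail out if either is missing.
        if (string.IsNullOrEmpty(driveId) || string.IsNullOrEmpty(itemId))
            return null;

        // GET /drives/{driveId}/items/{itemId}/content
        return await graphClient.Drives[driveId].Items[itemId].Content
            .GetAsync()
            .ConfigureAwait(false);
    }
}
```

Returning null when an id is missing keeps the caller's loop simple; you may prefer to log those hits so you can see which results lack a parentReference.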