Azure AI Search using blob storage - can't get past a Base64 decode issue

I am successfully using Azure AI Search pointed at a container inside an Azure Storage account. Everything works as expected: the data source, index, indexer, and skillset.

The only issue I cannot solve (I have spent a lot of time searching for a solution and trying fixes recommended by others, but nothing resolves it) is decoding the Base64 strings in the results. My REST API search endpoint returns results successfully, and when I decode the Base64 strings manually on a Base64 decoding site, they convert correctly to valid URLs that point to my files in Azure Storage. Here is one of the Base64 strings:

aHR0cHM6Ly9yZG1jMDFkZXZhenVyZXNlYXJjaHNhLmJsb2IuY29yZS53aW5kb3dzLm5ldC9yZG1jMDEtZGV2LWRvY3MvMTAucG5n0

And here it is decoded manually:

https://rdmc01devazuresearchsa.blob.core.windows.net/rdmc01-dev-docs/10.png

Here are the full REST API search results:

{
    "@odata.context": "https://rdmc01-dev-azure-search-service.search.windows.net/indexes('azureblob-index')/$metadata#docs(*)",
    "@odata.count": 4,
    "value": [
        {
            "@search.score": 8.4224205,
            "language": "English",
            "organizations": [
                "Microsoft",
                "Open source",
                "FEDORA",
                "Centos",
                "Linux Foundation"
            ],
            "metadata_storage_path": "aHR0cHM6Ly9yZG1jMDFkZXZhenVyZXNlYXJjaHNhLmJsb2IuY29yZS53aW5kb3dzLm5ldC9yZG1jMDEtZGV2LWRvY3MvMTYuZG9jeA2",
            "metadata_storage_name": "16.docx"
        },
        {
            "@search.score": 6.806098,
            "language": "English",
            "organizations": [],
            "metadata_storage_path": "aHR0cHM6Ly9yZG1jMDFkZXZhenVyZXNlYXJjaHNhLmJsb2IuY29yZS53aW5kb3dzLm5ldC9yZG1jMDEtZGV2LWRvY3MvMTAucG5n0",
            "metadata_storage_name": "10.png"
        },
        {
            "@search.score": 6.806098,
            "language": "English",
            "organizations": [],
            "metadata_storage_path": "aHR0cHM6Ly9yZG1jMDFkZXZhenVyZXNlYXJjaHNhLmJsb2IuY29yZS53aW5kb3dzLm5ldC9yZG1jMDEtZGV2LWRvY3MvbW9sbGllLnBuZw2",
            "metadata_storage_name": "mollie.png"
        },
        {
            "@search.score": 6.7477694,
            "language": "English",
            "organizations": [],
            "metadata_storage_path": "aHR0cHM6Ly9yZG1jMDFkZXZhenVyZXNlYXJjaHNhLmJsb2IuY29yZS53aW5kb3dzLm5ldC9yZG1jMDEtZGV2LWRvY3MvMTQuanBn0",
            "metadata_storage_name": "14.jpg"
        }
    ]
}

However, when I use .NET C# to decode them I get the following error:

FormatException: The input is not a valid Base-64 string as it contains a 
non-base 64 character, more than two padding characters, 
or an illegal character among the padding characters.

Any help would be great as I have run out of ideas.

JayashankarGS On BEST ANSWER

The error is due to padding. The length of a standard Base64 string must be a multiple of 4, with '=' characters making up any shortfall.

Use the sample code below:

using System;
using System.Text;

public class Program
{
    public static void Main()
    {
        string base64String = "aHR0cHM6Ly9yZG1jMDFkZXZhenVyZXNlYXJjaHNhLmJsb2IuY29yZS53aW5kb3dzLm5ldC9yZG1jMDEtZGV2LWRvY3MvbW9sbGllLnBuZw2";

        // Base64 input must be a multiple of 4 characters long.
        int rem = base64String.Length % 4;

        // Pad only when needed: appending four '=' to an already aligned
        // string would itself cause a FormatException.
        if (rem != 0)
        {
            base64String += new string('=', 4 - rem);
        }

        Console.WriteLine(base64String);
        Console.WriteLine(Encoding.UTF8.GetString(Convert.FromBase64String(base64String)));
    }
}

This code appends '=' characters until the length of the string is a multiple of 4.

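It works for the file paths whose strings end in 2 (16.docx and mollie.png) but fails for 10.png and 14.jpg, whose strings end in 0. Those strings are not corrupted. By default, Azure AI Search encodes this value with a URL-safe Base64 variant (the format produced by HttpServerUtility.UrlTokenEncode), in which the '=' padding is replaced by a single trailing digit giving the number of padding characters: 0 means none, and 2 means the original string ended in '=='.

Removing the last character 0 therefore resolves the errors for both files.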
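
Since the trailing digit on these strings appears to be a padding count rather than Base64 data, a more general approach is to read that digit, drop it, and restore the padding it describes before decoding. The sketch below assumes the values are in the URL-token form (with '-' and '_' standing in for '+' and '/'); UrlTokenDecoder and Decode are hypothetical names used for illustration, not part of any Azure SDK. On .NET Framework, HttpServerUtility.UrlTokenDecode in System.Web should handle the same format.

```csharp
using System;
using System.Text;

public static class UrlTokenDecoder
{
    // Hypothetical helper: reverses the URL-token form of Base64, assuming
    // the last character is a digit giving the number of '=' padding
    // characters that were removed when the value was encoded.
    public static string Decode(string token)
    {
        if (string.IsNullOrEmpty(token))
            throw new ArgumentException("Token is empty.", nameof(token));

        // The last character is the padding count, not Base64 data.
        int padCount = token[token.Length - 1] - '0';
        if (padCount < 0 || padCount > 2)
            throw new FormatException("Trailing padding digit must be 0, 1, or 2.");

        // Drop the digit, undo the URL-safe substitutions, restore padding.
        string base64 = token.Substring(0, token.Length - 1)
            .Replace('-', '+')
            .Replace('_', '/')
            + new string('=', padCount);

        return Encoding.UTF8.GetString(Convert.FromBase64String(base64));
    }
}

public static class Program
{
    public static void Main()
    {
        // Two metadata_storage_path values from the question: one ends in 0, one in 2.
        Console.WriteLine(UrlTokenDecoder.Decode(
            "aHR0cHM6Ly9yZG1jMDFkZXZhenVyZXNlYXJjaHNhLmJsb2IuY29yZS53aW5kb3dzLm5ldC9yZG1jMDEtZGV2LWRvY3MvMTAucG5n0"));
        Console.WriteLine(UrlTokenDecoder.Decode(
            "aHR0cHM6Ly9yZG1jMDFkZXZhenVyZXNlYXJjaHNhLmJsb2IuY29yZS53aW5kb3dzLm5ldC9yZG1jMDEtZGV2LWRvY3MvbW9sbGllLnBuZw2"));
    }
}
```

With this approach the strings ending in 0 and the strings ending in 2 go through the same code path, and the digit is never treated as part of the encoded data.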
