I've been told I might not have access to the registry, or to the programs that usually install their IFilters on the system, so I have to include the IFilter DLLs in the application and load them directly from there. I'm currently using CodeProject's C# IFilter classes, but there are still a few things that are over my head when it comes to filterPersistClass, persistentHandlerClass and COM, so I'm a bit lost on how to get this to work.
I've done all the mundane stuff, like getting the DLLs and setting up a resource file with "Extension, DLL Path" entries, but I just can't seem to grasp how to load the IFilter DLL now. Maybe I should just start from scratch, but I thought I would ask for some help first.
EDIT (Partial Solution)
Well I figured out how to load query.dll using the code below in the FilterReader constructor in FilterReader.cs, though I'm having problems now loading the PDFFilter.dll file and am getting the following error:
Unable to find an entry point named 'LoadIFilter' in DLL 'C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin\PDFFilter.dll'
The problem I think I am now stuck at is that PDFFilter.dll uses STA and C# applications are MTA.
    [DllImport("query.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    static extern int LoadIFilter(string pwcsPath, [MarshalAs(UnmanagedType.IUnknown)] ref object pUnkOuter, ref IFilter ppIUnk);

    // --------------------------- constructor ----------------------------------
    var isFilter = false;
    object iUnknown = null;
    LoadIFilter(fileName, ref iUnknown, ref _filter);
    var persistFile = (_filter as IPersistFile);
    if (persistFile != null)
    {
        persistFile.Load(fileName, 0);
        IFILTER_FLAGS flags;
        IFILTER_INIT iflags =
            IFILTER_INIT.CANON_HYPHENS |
            IFILTER_INIT.CANON_PARAGRAPHS |
            IFILTER_INIT.CANON_SPACES |
            IFILTER_INIT.APPLY_INDEX_ATTRIBUTES |
            IFILTER_INIT.HARD_LINE_BREAKS |
            IFILTER_INIT.FILTER_OWNED_VALUE_OK;
        if (_filter.Init(iflags, 0, IntPtr.Zero, out flags) == IFilterReturnCode.S_OK)
            isFilter = true;
    }
    if (_filter != null && isFilter) return;
    if (_filter != null) Marshal.ReleaseComObject(_filter);
There is nothing magical about `IFilter` objects. They are housed in standard COM dlls. In the end, all you need is the `clsid` of the class which knows how to process `pdf` files.

The `LoadIFilter` function in `query.dll` is just a convenient helper function. Everything it does you can do yourself.

There is a standard way, in the registry, in which a file extension (e.g. `.pdf`) is resolved to a `clsid` (e.g. `{E8978DA6-047F-4E3D-9C78-CDBE46041603}`).

The algorithm to resolve an `.ext` to the `clsid` of an object that implements `IFilter` is a series of registry lookups.

Once you have the `clsid` of the appropriate object, you create it with `CoCreateInstance`.

You now have the entire guts of the `LoadIFilter` function from `query.dll`.

Now, all that still requires the registry, because you still have to be able to resolve an extension into a `clsid`. If you already know the `clsid`, then you don't need the registry to resolve it. And you're good to go.
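
The extension-to-`clsid` lookup described above can be sketched roughly as follows. This is a minimal sketch, not the code inside `query.dll`: it assumes the lookup order Microsoft documents for finding a filter handler (a `PersistentHandler` key on the extension, falling back to the document type's class, then the `PersistentAddinsRegistered` entry for the `IFilter` interface ID). `FilterResolver`, `ResolveFilterClsid` and the `readDefault` callback are hypothetical names introduced for illustration; the callback exists so the logic can be exercised without touching a real registry.

```csharp
using System;
using System.Collections.Generic;

public static class FilterResolver
{
    // IID_IFilter: the interface ID under which filter classes are registered.
    public const string IidIFilter = "{89BCB740-6119-101A-BCB7-00DD010655AF}";

    // readDefault (hypothetical helper): takes a key path relative to
    // HKEY_CLASSES_ROOT and returns that key's default value, or null
    // when the key does not exist.
    public static string ResolveFilterClsid(string extension, Func<string, string> readDefault)
    {
        // 1. Prefer a persistent handler registered directly on the extension.
        string handler = readDefault(extension + @"\PersistentHandler");

        // 2. Otherwise go extension -> document type (ProgID) -> class -> handler.
        if (handler == null)
        {
            string progId = readDefault(extension);
            string docClsid = progId == null ? null : readDefault(progId + @"\CLSID");
            handler = docClsid == null
                ? null
                : readDefault(@"CLSID\" + docClsid + @"\PersistentHandler");
        }
        if (handler == null) return null;

        // 3. The handler lists, per interface ID, the class that implements it.
        return readDefault(@"CLSID\" + handler + @"\PersistentAddinsRegistered\" + IidIFilter);
    }
}
```

On Windows, the callback could be something like `path => Registry.ClassesRoot.OpenSubKey(path)?.GetValue(null) as string` (from `Microsoft.Win32`). The resulting `clsid` can then be handed to `Type.GetTypeFromCLSID` and `Activator.CreateInstance`, which go through normal COM activation and therefore still depend on the class being registered.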
The important point is that the function you're trying to call, `LoadIFilter`, is not inside Adobe's dll (or any other IFilter dll provided by any other company, to crawl any other file types). The `LoadIFilter` function is exported by `query.dll`, and is simply a helper function for the steps I described above.

All `IFilter` dlls are COM dlls. The documented way to load a COM dll is through the `CoCreateInstance` function.

I'll leave it to you to find the correct way to create a COM object from C# managed code. I've forgotten.
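
For the case in the question, where the registry may not be available at all, one possible approach from managed code is to skip COM activation and ask the dll for its class factory directly. The sketch below is an assumption-laden illustration, not the answer's method: it relies on the standard `DllGetClassObject` export that in-process COM servers provide, and `ComDllLoader` and `CreateFromDll` are hypothetical names. It also assumes the process bitness matches the dll (a 64-bit filter needs a 64-bit process).

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

// Standard COM class factory interface.
[ComImport, Guid("00000001-0000-0000-C000-000000000046"),
 InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
internal interface IClassFactory
{
    void CreateInstance([MarshalAs(UnmanagedType.IUnknown)] object pUnkOuter,
                        ref Guid riid,
                        [MarshalAs(UnmanagedType.IUnknown)] out object ppvObject);
    void LockServer([MarshalAs(UnmanagedType.Bool)] bool fLock);
}

public static class ComDllLoader
{
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr LoadLibrary(string lpFileName);

    [DllImport("kernel32.dll", CharSet = CharSet.Ansi, ExactSpelling = true, SetLastError = true)]
    private static extern IntPtr GetProcAddress(IntPtr hModule, string procName);

    // Signature of the DllGetClassObject export of an in-process COM server.
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    private delegate int DllGetClassObject(ref Guid rclsid, ref Guid riid,
        [MarshalAs(UnmanagedType.Interface)] out IClassFactory ppv);

    // Hypothetical helper: creates the object identified by clsid from the
    // dll at dllPath and returns it as the interface identified by iid.
    public static object CreateFromDll(string dllPath, Guid clsid, Guid iid)
    {
        IntPtr module = LoadLibrary(dllPath);
        if (module == IntPtr.Zero)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        IntPtr proc = GetProcAddress(module, "DllGetClassObject");
        if (proc == IntPtr.Zero)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        var getClassObject = (DllGetClassObject)Marshal.GetDelegateForFunctionPointer(
            proc, typeof(DllGetClassObject));

        // Ask the dll for the class factory of the requested class.
        Guid factoryIid = typeof(IClassFactory).GUID;
        IClassFactory factory;
        Marshal.ThrowExceptionForHR(getClassObject(ref clsid, ref factoryIid, out factory));

        // Let the factory create the object; failures surface as exceptions.
        object instance;
        factory.CreateInstance(null, ref iid, out instance);
        Marshal.ReleaseComObject(factory);
        return instance;
    }
}
```

With the `IFilter` interface declaration from the CodeProject classes, a call might look like `(IFilter)ComDllLoader.CreateFromDll(path, new Guid("E8978DA6-047F-4E3D-9C78-CDBE46041603"), new Guid("89BCB740-6119-101A-BCB7-00DD010655AF"))`, followed by the same `IPersistFile.Load` and `Init` calls as in the question. Whether a particular vendor's filter works when loaded this way, and under which threading apartment, is something to verify per dll.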