I've been told I might not have access to the registry or to the programs that usually load their IFilters onto the system, so I have to include the IFilter dlls in the application and load them directly from there. I'm currently using CodeProject's C# IFilter classes, but there are still a few things that are over my head when it comes to the filterPersistClass, persistentHandlerClass and COM, so I am a bit lost on how to get this to work.
I've done all the mundane stuff, like getting the dlls and setting up a resource file with "Extension, DLL Path" entries, but I just can't seem to get a grasp on how to load the IFilter DLL now. Maybe I should just start from scratch, but I thought I would ask for some help first.
EDIT (Partial Solution)
Well I figured out how to load query.dll using the code below in the FilterReader constructor in FilterReader.cs, though I'm having problems now loading the PDFFilter.dll file and am getting the following error:
Unable to find an entry point named 'LoadIFilter' in DLL 'C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin\PDFFilter.dll'
The problem I think I am now stuck at is that PDFFilter.dll uses STA and C# applications are MTA.
    [DllImport("query.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    static extern int LoadIFilter(string pwcsPath, [MarshalAs(UnmanagedType.IUnknown)] ref object pUnkOuter, ref IFilter ppIUnk);

    // --------------------------- constructor ----------------------------------
    var isFilter = false;
    object iUnknown = null;
    LoadIFilter(fileName, ref iUnknown, ref _filter);
    var persistFile = (_filter as IPersistFile);
    if (persistFile != null)
    {
        persistFile.Load(fileName, 0);
        IFILTER_FLAGS flags;
        IFILTER_INIT iflags =
            IFILTER_INIT.CANON_HYPHENS |
            IFILTER_INIT.CANON_PARAGRAPHS |
            IFILTER_INIT.CANON_SPACES |
            IFILTER_INIT.APPLY_INDEX_ATTRIBUTES |
            IFILTER_INIT.HARD_LINE_BREAKS |
            IFILTER_INIT.FILTER_OWNED_VALUE_OK;
        if (_filter.Init(iflags, 0, IntPtr.Zero, out flags) == IFilterReturnCode.S_OK)
            isFilter = true;
    }
    if (_filter != null && isFilter) return;
    if (_filter != null) Marshal.ReleaseComObject(_filter);
There is nothing magical about IFilter objects. They are housed in standard COM dlls. In the end, all you need is the clsid of the class which knows how to process pdf files.

The LoadIFilter function in query.dll is just a convenient helper function. Everything it does you can do yourself.

There is a standard way, in the registry, in which a file extension (e.g. .pdf) is resolved to a clsid (e.g. {E8978DA6-047F-4E3D-9C78-CDBE46041603}). That lookup is the algorithm that resolves an .ext to the clsid of an object that implements IFilter.

Once you have the clsid of the appropriate object, you create an instance of it through COM. Those two steps, resolving the clsid and creating the object, are the entire guts of the LoadIFilter function from query.dll.

Now, all that still requires the registry, because you still have to be able to resolve an extension into a clsid. If you already know the clsid, then you don't need the registry for that lookup: you create the object from the clsid directly. And you're good to go.
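
As a rough illustration of the two steps above, here is a minimal C# sketch. It assumes the commonly documented registry layout, where the extension's PersistentHandler key names a handler GUID and that handler's PersistentAddinsRegistered key (under the IFilter interface ID) names the filter class. The class and method names (FilterClsidResolver, Resolve, Create) are made up for this example, and the real lookup has extra fallbacks that are not handled here.

```csharp
using System;
using Microsoft.Win32;

static class FilterClsidResolver
{
    // Interface ID of IFilter; persistent handlers register their filter class under it.
    public const string IidIFilter = "{89BCB740-6119-101A-BCB7-00DD010655AF}";

    // HKEY_CLASSES_ROOT\<.ext>\PersistentHandler
    public static string PersistentHandlerKey(string extension)
    {
        return extension + @"\PersistentHandler";
    }

    // HKEY_CLASSES_ROOT\CLSID\<handler>\PersistentAddinsRegistered\<IID_IFilter>
    public static string FilterClsidKey(string persistentHandler)
    {
        return @"CLSID\" + persistentHandler + @"\PersistentAddinsRegistered\" + IidIFilter;
    }

    // Returns the filter clsid for an extension such as ".pdf", or null if none is registered.
    public static Guid? Resolve(string extension)
    {
        string handler = ReadDefault(PersistentHandlerKey(extension));
        if (handler == null) return null;

        string clsid = ReadDefault(FilterClsidKey(handler));
        Guid result;
        return Guid.TryParse(clsid, out result) ? result : (Guid?)null;
    }

    // Creates the COM object for a known clsid; cast the result to your IFilter interface.
    public static object Create(Guid clsid)
    {
        return Activator.CreateInstance(Type.GetTypeFromCLSID(clsid, true));
    }

    // Reads the unnamed (default) value of a key under HKEY_CLASSES_ROOT.
    static string ReadDefault(string subKey)
    {
        using (RegistryKey key = Registry.ClassesRoot.OpenSubKey(subKey))
        {
            return key == null ? null : key.GetValue("") as string;
        }
    }
}
```

Note that Create still relies on COM finding the DLL for that clsid, so the filter has to be registered on the machine; only the extension lookup is skipped when the clsid is hard-coded. A 32-bit process also sees a different registry view than a 64-bit one, so the process bitness has to match the installed filter.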
The important point is that the function you're trying to call, LoadIFilter, is not inside Adobe's dll (or any other IFilter dll provided by any other company, to crawl any other file types). The LoadIFilter function is exported by query.dll, and is simply a helper function for the above steps I described.

All IFilter dlls are COM dlls. The documented way to load a COM dll is through the CoCreateInstance function.

I'll leave it to you to find the correct way to create a COM object from C# managed code. I've forgotten.
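
For the case in the question, where the filter dll ships with the application and may not be registered at all, one possible approach is to skip CoCreateInstance and call the dll's standard COM entry point, DllGetClassObject, yourself. The sketch below is only that, a sketch: ComDllLoader and CreateFromDll are hypothetical names, the clsid must be one the dll actually implements, and the dll must match the bitness of the process.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

// Standard COM class factory interface.
[ComImport, Guid("00000001-0000-0000-C000-000000000046"),
 InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
interface IClassFactory
{
    void CreateInstance([MarshalAs(UnmanagedType.IUnknown)] object pUnkOuter,
                        ref Guid riid,
                        [MarshalAs(UnmanagedType.IUnknown)] out object ppvObject);
    void LockServer([MarshalAs(UnmanagedType.Bool)] bool fLock);
}

static class ComDllLoader
{
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr LoadLibrary(string lpFileName);

    [DllImport("kernel32.dll", CharSet = CharSet.Ansi, ExactSpelling = true, SetLastError = true)]
    static extern IntPtr GetProcAddress(IntPtr hModule, string lpProcName);

    // Signature of the DllGetClassObject export that every in-process COM server provides.
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    delegate int DllGetClassObjectDelegate(ref Guid rclsid, ref Guid riid,
        [MarshalAs(UnmanagedType.Interface)] out IClassFactory ppv);

    // Loads dllPath and creates an instance of clsid, asking for interface iid.
    // The module is deliberately never freed: the returned object lives inside it.
    public static object CreateFromDll(string dllPath, Guid clsid, Guid iid)
    {
        IntPtr module = LoadLibrary(dllPath);
        if (module == IntPtr.Zero) throw new Win32Exception(Marshal.GetLastWin32Error());

        IntPtr proc = GetProcAddress(module, "DllGetClassObject");
        if (proc == IntPtr.Zero) throw new Win32Exception(Marshal.GetLastWin32Error());

        var getClassObject = (DllGetClassObjectDelegate)
            Marshal.GetDelegateForFunctionPointer(proc, typeof(DllGetClassObjectDelegate));

        Guid factoryIid = typeof(IClassFactory).GUID;
        IClassFactory factory;
        int hr = getClassObject(ref clsid, ref factoryIid, out factory);
        if (hr != 0) Marshal.ThrowExceptionForHR(hr);

        try
        {
            object instance;
            factory.CreateInstance(null, ref iid, out instance);
            return instance; // cast to your IFilter interface
        }
        finally
        {
            Marshal.ReleaseComObject(factory);
        }
    }
}
```

Usage would look something like `var filter = (IFilter)ComDllLoader.CreateFromDll(pathToDll, filterClsid, new Guid("89BCB740-6119-101A-BCB7-00DD010655AF"));`, where IFilter is the interface declared in the CodeProject classes, and pathToDll and filterClsid come from your own "Extension, DLL Path" resource table extended with a clsid column. Some filters also depend on other files or registry settings from their installer, so this is not guaranteed to work for every vendor's dll.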