Google Search Appliance - Best way to filter filetype


I am trying to set up a filter for users by file type.

Using the special query terms (File Type Filtering or File Extension Filter) appends text to the end of the query, which then displays Searched for "abc etx:pdf" and also adds that string to the suggestions. That is hardly ideal.

Setting up a separate front end for each file type, or using as_filetype, leads to a similar predicament.

I don't really want to set up separate collections for each file type, because I would end up with over 70 collections (there are 10 sites I am crawling).

Are there any alternatives that filter results by MIME type or extension without adding to the query term? What is the best way to filter by MIME type or extension?

If the file's extension appears in its URL, you can use Entity Recognition to add a special metadata entry with the file extension as its value. Alternatively, your web server can return a special HTTP response header for the file, which you can configure in GSA to be indexed as additional metadata for that file.
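Entity Recognition itself is configured in the GSA Admin Console, not in code, so the following is only a minimal sketch of the kind of regular expression such a rule might use: capture a trailing extension from the path part of a URL. The pattern and the `file_extension` helper are hypothetical illustrations, and the sketch assumes extensions of 1 to 5 alphanumeric characters.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical pattern: a dot followed by 1-5 alphanumerics at the very end
# of the URL path (query strings and fragments are excluded by urlparse).
EXT_PATTERN = re.compile(r"\.([A-Za-z0-9]{1,5})$")


def file_extension(url: str) -> Optional[str]:
    """Return the upper-cased extension of the URL's path, or None if absent."""
    path = urlparse(url).path
    match = EXT_PATTERN.search(path)
    return match.group(1).upper() if match else None


if __name__ == "__main__":
    print(file_extension("http://example.com/docs/report.pdf?v=2"))  # PDF
    print(file_extension("http://example.com/docs/"))                # None
```

Matching against the path instead of the full URL keeps query strings such as `?v=2` from hiding the extension.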

Once the file(s) have a dedicated metadata field, you can use the requiredfields parameter to filter on it without polluting the search terms. For example, if every PDF has a metadata field named "FileType" with the value "PDF", your search URL would look like:

...&q=<what user searched>&requiredfields=FileType:PDF
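The following minimal sketch shows how a front end might build just the `q` and `requiredfields` parts of that URL with proper encoding, so the file-type filter never becomes part of what the user typed. It assumes the "FileType" field from the example above; the `build_search_query` helper is hypothetical, and the rest of the search URL (elided as "..." above) is left out.

```python
from urllib.parse import urlencode


def build_search_query(user_query: str, file_type: str, field: str = "FileType") -> str:
    """Encode the user's terms in q and the file-type filter in requiredfields.

    The filter lives only in requiredfields, so q stays exactly what the
    user searched for.
    """
    params = {"q": user_query, "requiredfields": f"{field}:{file_type}"}
    # urlencode percent-encodes ':' as %3A and turns spaces into '+'.
    return urlencode(params)


if __name__ == "__main__":
    print(build_search_query("annual report", "PDF"))
    # q=annual+report&requiredfields=FileType%3APDF
```

Keeping the filter in its own parameter is the point of this approach: the displayed query and the suggestions only ever see the contents of `q`.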