According to GSA's documentation:
PDF or XPS documents typically have metadata such as:
<MT N="CreationDate" V="D:20040107111105Z"/>
<MT N="ModDate" V="D:20040209162220+01'00'"/>
The search appliance can automatically pick up these formats without any special formatting configuration.
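The CreationDate/ModDate values above use the PDF date syntax: an optional `D:` prefix, then YYYYMMDDHHmmSS (trailing fields optional), then an optional UTC offset written as `Z`, `+HH'mm'` or `-HH'mm'`. The sketch below illustrates what extracting such a value involves. It assumes Python 3 with only the standard library. `parse_pdf_date` is a hypothetical helper and is not part of GSA. It leaves the result naive when the string carries no offset.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date string: [D:]YYYY[MM[DD[HH[mm[SS[O[HH['][mm[']]]]]]]]], O is Z, + or -.
_PDF_DATE = re.compile(
    r"(?:D:)?(?P<year>\d{4})(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?:(?P<tz>[Zz+-])(?:(?P<tzh>\d{2})'?(?:(?P<tzm>\d{2})'?)?)?)?"
)


def parse_pdf_date(value: str) -> datetime:
    """Hypothetical helper: convert a PDF CreationDate/ModDate string to a datetime."""
    m = _PDF_DATE.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    g = m.groupdict()
    tzinfo = None  # no offset in the string: leave the datetime naive
    if g["tz"] is not None:
        sign = -1 if g["tz"] == "-" else 1
        offset = timedelta(hours=int(g["tzh"] or 0), minutes=int(g["tzm"] or 0))
        tzinfo = timezone(sign * offset)  # "Z" yields a zero offset
    return datetime(
        int(g["year"]), int(g["month"] or 1), int(g["day"] or 1),
        int(g["hour"] or 0), int(g["minute"] or 0), int(g["second"] or 0),
        tzinfo=tzinfo,
    )


print(parse_pdf_date("D:20040209162220+01'00'").isoformat())  # 2004-02-09T16:22:20+01:00
```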
But unfortunately this does not seem to work. We have PDFs, DOCs and other files on our site, and their last modified dates appear in the corresponding <MT> entries in the GSA search results. But <FS NAME="date"> has a blank value, which indicates that GSA could not extract the date. Even specifying the date format on the "Document Dates" page of the GSA console does not help.
So how can we make GSA "see" the documents' last modified dates? Please note: we cannot use the web server's Last-Modified HTTP header values, since they are not correct in our case (AEM dispatcher/caching interference).
GSA can extract metadata from Document Properties, but I am not sure whether GSA can use that ModDate/CreationDate to populate <FS NAME="date"> without a "Document Dates" configuration.

You mentioned that you cannot use the web server's Last-Modified HTTP header values because they are not correct in your case. Does that mean your web server is returning a Last-Modified header with incorrect values?
The Last-Modified response header takes precedence over all other metadata when GSA determines a document's date. So if your server cannot return correct values, you have to remove the Last-Modified header from the response.
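To make the precedence argument concrete, here is a toy model in Python of the rule as described above. It is not GSA's code. `effective_date` and `without_last_modified` are hypothetical names, and the header and ModDate values are made up for illustration. When a Last-Modified header is present, its value wins, so a stale header hides the document's own date. Removing the header lets the metadata date through.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def effective_date(headers: Mapping[str, str],
                   meta_date: Optional[datetime]) -> Optional[datetime]:
    """Toy model of the precedence rule described above (not GSA's code):
    a Last-Modified header, when present, wins over the metadata date."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "last-modified":
            return parsedate_to_datetime(value)
    return meta_date


def without_last_modified(headers: Mapping[str, str]) -> dict:
    """Models the suggested fix: the response no longer carries Last-Modified."""
    return {k: v for k, v in headers.items() if k.lower() != "last-modified"}


# Made-up example: a stale header from a cache vs. the document's ModDate.
stale = {"Content-Type": "application/pdf",
         "Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT"}
mod_date = datetime(2004, 2, 9, 16, 22, 20)
print(effective_date(stale, mod_date))                         # header wins
print(effective_date(without_last_modified(stale), mod_date))  # metadata wins
```

In a real deployment the removal would have to happen on the web server or dispatcher side, not in the crawler.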
I have come across many people using Java SimpleDateFormat patterns (e.g. yy-MM-dd) when specifying the format under Document Dates, but GSA only understands strptime formats. This is one of the main reasons why GSA fails to populate <FS NAME="date">. So make sure to specify the date format in strptime syntax (e.g. %y-%m-%d), or else leave it blank, as it is not a mandatory field.
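You can check this difference outside GSA. Python's `datetime.strptime` uses the same %-directive style as C strptime, and the sketch below relies on that, assuming Python 3. It shows that a SimpleDateFormat pattern like yy-MM-dd is matched literally and fails, while its strptime equivalent %y-%m-%d parses. `java_to_strptime` and `accepts` are hypothetical helpers. The token table covers only a few common pattern letters and ignores quoted literals. The sketch does not confirm which directives GSA itself supports.

```python
import re
from datetime import datetime

# Hypothetical helper table: a handful of common java.text.SimpleDateFormat
# pattern letters and their strptime equivalents. Not exhaustive, and quoted
# literals such as 'T' are not handled.
_JAVA_TO_STRPTIME = {
    "yyyy": "%Y", "yy": "%y",
    "MMMM": "%B", "MMM": "%b", "MM": "%m",
    "dd": "%d",
    "HH": "%H", "hh": "%I", "mm": "%M", "ss": "%S",
    "EEEE": "%A", "EEE": "%a", "a": "%p",
}

# Try longer tokens first so "yyyy" is not consumed as two "yy".
_TOKEN = re.compile("|".join(
    re.escape(t) for t in sorted(_JAVA_TO_STRPTIME, key=len, reverse=True)))


def java_to_strptime(pattern: str) -> str:
    """Rewrite a SimpleDateFormat-style pattern into strptime directives."""
    return _TOKEN.sub(lambda m: _JAVA_TO_STRPTIME[m.group(0)], pattern)


def accepts(value: str, fmt: str) -> bool:
    """True if strptime can parse value with fmt."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


print(accepts("04-02-09", "yy-MM-dd"))                    # False: letters are matched literally
print(java_to_strptime("yy-MM-dd"))                       # %y-%m-%d
print(accepts("04-02-09", java_to_strptime("yy-MM-dd")))  # True
```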