I need to update the metadata of millions of documents in an object store, so I wrote a simple stand-alone Java program using the approach below:
SearchSQL documentSearchSQL = new SearchSQL();
String selectQuery = "Id";
String classSymbolicName = "Document_Class_Name";
String myAlias1 = "r";
String whereClause = "r.Document_Type_Code='DIRMKTGDOC' and VersionStatus=1";
boolean subClassesToo = false;
documentSearchSQL.setSelectList(selectQuery);
documentSearchSQL.setFromClauseInitialValue(classSymbolicName, myAlias1, subClassesToo);
documentSearchSQL.setWhereClause(whereClause);
UpdatingBatch updatingBatch = null;
SearchScope searchScope = new SearchScope(p8ObjectStore);
// Page size of 10000, no property filter, continuable search
RepositoryRowSet rowSet = searchScope.fetchRows(documentSearchSQL, Integer.valueOf(10000), null, Boolean.TRUE);
PageIterator pageIterator = rowSet.pageIterator();
RepositoryRow row;
Document document = null;
while (pageIterator.nextPage()) {
    Object[] rowArray = pageIterator.getCurrentPage();
    // One updating batch per page of search results
    updatingBatch = UpdatingBatch.createUpdatingBatchInstance(p8ObjectStore.get_Domain(), RefreshMode.NO_REFRESH);
    for (int i = 0; i < rowArray.length; i++) {
        row = (RepositoryRow) rowArray[i];
        Properties documentProps = row.getProperties();
        document = Factory.Document.fetchInstance(p8ObjectStore, documentProps.getIdValue("Id"), null);
        // I have the metadata symbolic names and their values in a HashMap, so I iterate over the map to set the values
        for (Map.Entry<String, ArrayList<String>> documentMetadata : documentMetadataValues.entrySet()) {
            document.getProperties().putObjectValue(documentMetadata.getKey(), documentMetadata.getValue().get(1));
        }
        updatingBatch.add(document, null);
    }
    updatingBatch.updateBatch();
}
When I ran a query on docVersion, I found around 700K documents matching the criteria and expected all of them to be updated. When I ran the program, it updated about 390K documents and then failed with this error:
com.filenet.api.exception.EngineRuntimeException: FNRCA0031E: API_UNABLE_TO_USE_CONNECTION: The URI for server communication cannot be determined from the connection object http://server:port/wsi/FNCEWS40MTOM. Message was: Connection refused: connect
Is there a better way to achieve this? Also, I will be using a component queue to run this tool in production.
You actually have two better options for this: script-based bulk actions or sweeps.
Bulk Actions
You can apply bulk actions to the search results of a query. The application of these actions occurs either while the query runs or after the query runs.
For more on this, see the bulk actions documentation in the IBM Knowledge Center.
Custom Sweep Job
Alternatively, you can use a custom sweep job. A sweep is an instance of a background service that you configure to process objects in a database table. If an object meets the configured criteria, the sweep performs an action on the object. A sweep consists of a sweep action and a sweep job.
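To make the sweep idea concrete, here is a rough conceptual model in plain Java. It is not the FileNet API: `SweepSketch`, `Row`, `runSweep`, and the `Reviewed` property are all hypothetical names invented for illustration, and a real sweep runs inside the server rather than in your own process. It only shows the shape of the mechanism: walk the table in batches, test each row against the criteria, and apply the action to the matches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Conceptual model only: none of these types belong to the FileNet API.
class SweepSketch {

    /** Hypothetical stand-in for one row of the swept table. */
    static final class Row {
        final String id;
        final Map<String, String> props = new HashMap<>();

        Row(String id, String typeCode) {
            this.id = id;
            props.put("Document_Type_Code", typeCode);
        }
    }

    /**
     * Walks the table in fixed-size batches, applies the action to every row
     * that matches the criteria, and returns the number of rows acted on.
     */
    static int runSweep(List<Row> table, int batchSize,
                        Predicate<Row> criteria, Consumer<Row> action) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int processed = 0;
        for (int start = 0; start < table.size(); start += batchSize) {
            int end = Math.min(start + batchSize, table.size());
            for (Row row : table.subList(start, end)) {
                if (criteria.test(row)) {
                    action.accept(row);
                    processed++;
                }
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>();
        table.add(new Row("1", "DIRMKTGDOC"));
        table.add(new Row("2", "OTHER"));
        table.add(new Row("3", "DIRMKTGDOC"));

        // Criteria mirrors the filter from the question; the action sets a made-up property.
        int updated = runSweep(table, 2,
                row -> "DIRMKTGDOC".equals(row.props.get("Document_Type_Code")),
                row -> row.props.put("Reviewed", "true"));

        System.out.println("Rows updated: " + updated); // prints "Rows updated: 2"
    }
}
```

In this model the criteria and action together play the role of the sweep action, and the call to `runSweep` plays the role of the sweep job. The practical difference from your stand-alone program is that the server does the iteration, so there is no long-lived client connection to drop halfway through.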
For more on sweep jobs, see the linked documentation.