Get distinct values for a group of fields from a list of records

5.2k Views Asked by At

We are using Liferay (6.2 CE GA4) with Lucene to perform search on custom assets. Currently we can retrieve the proper hits and the full documents.

We want to return a unique combination of certain fields for our custom asset.

To make it more clear, we want to do something similar to the following SQL query but using Lucene in Liferay:

SELECT DISTINCT
    field01, field02, field03 
FROM
    FieldsTable
WHERE
    someOtherField04 LIKE "%test%";
ORDER BY
    field01 ASC, field02 ASC, field03 ASC;

How we are doing it currently

Currently we are manually fetching field values by iterating through all the documents and then filtering the duplicate combination. This process takes time when there are more than 5k records to process on each request. And the distinct field values would mostly be a few hundred records.

Any help is much appreciated. Thanks

P.S.: Also cross-posted on Liferay forums: https://www.liferay.com/community/forums/-/message_boards/message/55513210

1

There are 1 best solutions below

5
On

First you need to create the SearchContext for your query (just as reference):

SearchContext searchContext = new SearchContext();
searchContext.setAndSearch(true);

// Add any specific attributes for your use case below:
Map<String, Serializable> attributes = new HashMap<>();

attributes.put(Field.CLASS_NAME_ID, 0L);
attributes.put(Field.DESCRIPTION, null);
attributes.put(Field.STATUS, String.valueOf(WorkflowConstants.STATUS_APPROVED));
attributes.put(Field.TITLE, null);
attributes.put(Field.TYPE, null);
attributes.put("articleId", null);

attributes.put("ddmStructureKey", ...);
attributes.put("ddmTemplateKey", ...);

attributes.put("params", new LinkedHashMap<String, Object>());
searchContext.setAttributes(attributes);

searchContext.setCompanyId(... the ID of my portal instance ..);
searchContext.setGroupIds(new long[] { ... the ID of the site ... });
searchContext.setFolderIds(new long[] {});

Now you can find the list of all values for one or more specific fields:

// We don't need any result document, just the field values
searchContext.setStart(0);
searchContext.setEnd(0);

// A facet is responsible for collecting the values
final MultiValueFacet fieldFacet = new MultiValueFacet(searchContext);
String fieldNameInLucene = "ddm/" + structureId + "/" + fieldName + "_" + LocaleUtil.toLanguageId(locale);
fieldFacet.setFieldName(fieldNameInLucene);
searchContext.addFacet(fieldFacet);

// Do search
IndexerRegistryUtil.getIndexer(JournalArticle.class).search(searchContext);

// Retrieve all terms
final List<String> terms = new ArrayList<>();
for (final TermCollector collector : fieldFacet.getFacetCollector().getTermCollectors()) {
    terms.add(collector.getTerm());
}

At the end terms will contain all terms of your field from all found documents.