Is there an efficient way in Java to search for multiple words (not next to each other) in the same String value?


We pull a lot of our data from an authoritative source into various attributes in our environment.

One of those attributes we have is "jobTitle" - as the name suggests, it's an identity's respective job title at the organization.

One of my primary jobs is to create Assignment Rules for Roles at the organization, and I've run into an issue that I believe can be handled more efficiently, but I am missing the Java knowledge.

So here's the problem:

We use the following Java line to "get" any specific jobTitle in our org that has Nurse in it: identity.getAttribute("jobTitle").contains("Nurse");

My question for you Java experts: is there a way to use wildcards so that I can pull all job titles that match %Nursing%Specialist% or %Nursing%Coordinator%?

For example, I want to provision the role to all users who have a jobTitle of "Nursing Professional Development Specialist" or "Nursing Resource Coordinator", where the word Nursing and the word Coordinator or Specialist may be separated by other strings.

Is there an efficient way in Java to overcome this challenge?

For example, we currently return true if any of the following match:

return "Nursing Care Coordinator".equalsIgnoreCase(identity.getAttribute("jobTitle"))
    || "Nursing Resource Coordinator".equalsIgnoreCase(identity.getAttribute("jobTitle"))
    || "Nursing Practice Specialist".equalsIgnoreCase(identity.getAttribute("jobTitle"))
    || "Nursing Professional Development Specialist".equalsIgnoreCase(identity.getAttribute("jobTitle"));

In %Nursing%Specialist%, each % could stand for other strings, as in "Professional Nursing Development Specialist", so I'm just trying to make sure all the necessary job titles get the correct roles. I would like to avoid having to type out every specific job title!


Bohemian On BEST ANSWER

You can do it efficiently for the programmer (one expression) using regex. To match "Nurse" or "Nursing .... Specialist" or "Nursing .... Coordinator":

identity.getAttribute("jobTitle")
  .matches("^(?=Nurse$|Nursing.*(Coordinator|Specialist)$).*")

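and it's reasonably efficient for the machine. Note that, because the lookahead is anchored at the start and each alternative ends with $, this pattern only matches a title that is exactly "Nurse", or one that starts with "Nursing" and ends with "Coordinator" or "Specialist".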

See live demo for a full explanation of the regex.


If you just want to assert that the terms appear, in any order and at any position, use two lookaheads, one for each term:

matches("^(?=.*Nurs(e|ing))(?=.*(Coordinator|Specialist)).*")
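As a rough sketch of how the two-lookahead pattern could be wrapped for repeated use: `String.matches` recompiles the regex on every call, so compiling it once into a `Pattern` may help if the rule runs for many identities. The class name `JobTitleRegex`, the method name `isNursingRole`, the null check, and the `CASE_INSENSITIVE` flag are assumptions added for illustration, not part of the answer above.

```java
import java.util.List;
import java.util.regex.Pattern;

class JobTitleRegex {
    // Compiled once and reused. CASE_INSENSITIVE is an assumption added here.
    private static final Pattern NURSING_ROLE = Pattern.compile(
            "^(?=.*Nurs(e|ing))(?=.*(Coordinator|Specialist)).*",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: a missing (null) job title never matches.
    static boolean isNursingRole(String jobTitle) {
        return jobTitle != null && NURSING_ROLE.matcher(jobTitle).matches();
    }

    public static void main(String[] args) {
        List<String> titles = List.of(
                "Nursing Professional Development Specialist",
                "Professional Nursing Development Specialist",
                "Nursing Manager");
        for (String title : titles) {
            System.out.println(title + " -> " + isNursingRole(title));
        }
    }
}
```

Because each lookahead starts with `.*`, the two terms may appear anywhere in the title and in either order.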
Daniel Byrne On

Searching for multiple words not necessarily adjacent in a string can be handled using regular expressions.

You can use Java's 'Pattern' and 'Matcher' classes from java.util.regex to accomplish this. You can use regexr to help you build and test your regular expression.

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MultiWordSearch {
    public static boolean containsWords(String input, String[] words) {
        // Building a regex pattern like "(?=.*word1)(?=.*word2)(?=.*word3).*"
        StringBuilder regex = new StringBuilder();
        for (String word : words) {
            regex.append("(?=.*").append(Pattern.quote(word)).append(")");
        }
        regex.append(".*");

        Pattern pattern = Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(input);

        return matcher.matches();
    }

    public static void main(String[] args) {
        String input = "This is a sample string to search";
        String[] words = {"sample", "search"};

        System.out.println("Contains words: " + containsWords(input, words));
    }
}
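The question writes its patterns in a SQL-LIKE style such as %Nursing%Specialist%. A minimal sketch of translating that notation into a regex with the same `Pattern.quote` idea follows; unlike the lookahead version above, it keeps the words in the order given. The class name `LikePattern` and both method names are hypothetical, and treating `%` as the only wildcard is an assumption.

```java
import java.util.regex.Pattern;

class LikePattern {
    // Turns each '%' into ".*" and quotes everything else as a literal.
    static Pattern compile(String like) {
        StringBuilder regex = new StringBuilder();
        // Limit -1 keeps trailing empty strings, so a trailing '%' is not lost.
        String[] literals = like.split("%", -1);
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            if (!literals[i].isEmpty()) {
                regex.append(Pattern.quote(literals[i]));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    // Hypothetical convenience wrapper: null titles never match.
    static boolean matches(String like, String jobTitle) {
        return jobTitle != null && compile(like).matcher(jobTitle).matches();
    }

    public static void main(String[] args) {
        String title = "Professional Nursing Development Specialist";
        System.out.println(matches("%Nursing%Specialist%", title));
        System.out.println(matches("%Nursing%Coordinator%", title));
    }
}
```

A pattern without any `%` behaves like an exact, case-insensitive comparison, which mirrors how LIKE works without wildcards.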
Greg Fenton On

As @Dave Newton points out, the term "efficient" is open to fairly broad interpretation.

One approach would be simply to count how many "keywords" appear in a given jobTitle, return that count, and act on it. Something like:

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public static boolean containsMultipleKeywords(String jobTitle) {
  // for case-insensitive comparison
  jobTitle = jobTitle.toLowerCase();

  // Set of keywords to check
  Set<String> keywords = new HashSet<>(Arrays.asList("nurse", "coordinator", "director"));

  // Count the occurrences of keywords
  int count = 0;
  for (String keyword : keywords) {
    if (jobTitle.contains(keyword)) {
      count++;
    }
  }

  // Return true if at least two keywords are found
  return count >= 2;
}

Alternatively you could use a regular expression:

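  // Requires java.util.regex.Pattern and java.util.regex.Matcher.
  // CASE_INSENSITIVE so that "Nurse" and "nurse" both match.
  Pattern pattern = Pattern.compile("\\b(nurse|coordinator|director)\\b", Pattern.CASE_INSENSITIVE);

  // Matcher to find matches in the jobTitle
  Matcher matcher = pattern.matcher(jobTitle);

  // Count the number of matches (a keyword that appears twice is counted twice)
  int count = 0;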
  while (matcher.find()) {
    count++;
  }
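One difference between the two snippets above is that the regex loop counts every occurrence, so a title that repeats one keyword would reach a count of 2. A possible sketch that counts distinct keywords instead is shown here; the class name `KeywordCounter` and the method name `countDistinctKeywords` are hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordCounter {
    private static final Pattern KEYWORDS = Pattern.compile(
            "\\b(nurse|coordinator|director)\\b", Pattern.CASE_INSENSITIVE);

    // Collects each matched keyword in lower case so repeats are counted once.
    static int countDistinctKeywords(String jobTitle) {
        Set<String> found = new HashSet<>();
        Matcher matcher = KEYWORDS.matcher(jobTitle);
        while (matcher.find()) {
            found.add(matcher.group(1).toLowerCase(Locale.ROOT));
        }
        return found.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinctKeywords("Nurse Coordinator"));
        System.out.println(countDistinctKeywords("Nurse to Nurse Liaison"));
    }
}
```

The `\b` word boundaries mean that "Nursing" and "Nurses" are not counted as "nurse", which may or may not be what a given rule wants.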
WJS On

You could build a predicate to check for complete matches. Then apply it to the title. The more parameters, the fewer likely hits.

String[] titles = {"obstetrics nurse admin","Nurse Admin obstetrics",
        "obstetrics admin nurse","obstetrics nurse"};

String[] traits = {"nurse","admin","obstetrics"};

Predicate<String> check = buildPredicate(traits);
for (String title : titles) {
    System.out.println(title + " -> " + check.test(title.toLowerCase()));
}
    

static Predicate<String> buildPredicate(String[] info) {
     Predicate<String> pred = s -> s.contains(info[0]);
     for (int i = 1; i < info.length; i++) {
         final int k = i;
         pred = pred.and(s -> s.contains(info[k]));
     }
     return pred;
}

prints

obstetrics nurse admin -> true
Nurse Admin obstetrics -> true
obstetrics admin nurse -> true
obstetrics nurse -> false
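The predicate above requires every trait to be present. To express the rule from the question, Nursing AND (Specialist OR Coordinator), an all-of predicate can be combined with an any-of predicate. This is only a sketch; the class name `TraitPredicates` and the method names `containsAll` and `containsAny` are hypothetical, and lower-casing inside the predicate is an added assumption.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.function.Predicate;

class TraitPredicates {
    // True when the title contains every trait (case-insensitive).
    // With no traits, the identity predicate makes this always true.
    static Predicate<String> containsAll(String... traits) {
        return Arrays.stream(traits)
                .map(TraitPredicates::containsIgnoreCase)
                .reduce(s -> true, Predicate::and);
    }

    // True when the title contains at least one trait (case-insensitive).
    static Predicate<String> containsAny(String... traits) {
        return Arrays.stream(traits)
                .map(TraitPredicates::containsIgnoreCase)
                .reduce(s -> false, Predicate::or);
    }

    private static Predicate<String> containsIgnoreCase(String trait) {
        String needle = trait.toLowerCase(Locale.ROOT);
        return s -> s.toLowerCase(Locale.ROOT).contains(needle);
    }

    public static void main(String[] args) {
        Predicate<String> rule =
                containsAll("nursing").and(containsAny("specialist", "coordinator"));
        System.out.println(rule.test("Nursing Resource Coordinator"));
        System.out.println(rule.test("Nursing Manager"));
    }
}
```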
Basil Bourque On

tl;dr

Filter a list of strings, looking for those that (a) contain the word Nursing, AND (b) contain either the word Specialist or the word Coordinator.

Stream.of (
                "Nursing Professional Development Specialist" ,  // Hit
                "Accountant" ,  // Miss
                "Nursing Manager" ,  // Miss
                "Nursing Resource Coordinator"  // Hit
        )
        .filter (
                ( String job ) ->
                        Arrays.stream ( job.split ( " " ) ).anyMatch ( Set.of ( "Nursing" ) :: contains )
                        &&
                        Arrays.stream ( job.split ( " " ) ).anyMatch ( Set.of ( "Specialist" , "Coordinator" ) :: contains )
        )
        .toList ( );

[Nursing Professional Development Specialist, Nursing Resource Coordinator]

Filtering streams

You can make a Stream of the words in each job title. Calling String#split produces an array. Pass the array to Arrays.stream to produce a stream of strings, a stream of the words in the job title.

Arrays.stream ( job.split ( " " ) )

Some sample data:

Collection < String > jobs =
        List.of (
                "Nursing Professional Development Specialist" ,  // Hit
                "Accountant" ,  // Miss
                "Nursing Manager" ,  // Miss
                "Nursing Resource Coordinator"  // Hit
        );

Define our targets.

Set < String > x = Set.of ( "Nursing" );
Set < String > y = Set.of ( "Specialist" , "Coordinator" );

Apply each of those sets of targets as a predicate of a filter on the stream of job titles. Collect the job titles that pass our test.

Collection < String > hits =
        jobs
                .stream ( )
                .filter (
                        ( String job ) ->
                                Arrays.stream ( job.split ( " " ) ).anyMatch ( x :: contains )
                )
                .filter (
                        ( String job ) ->
                                Arrays.stream ( job.split ( " " ) ).anyMatch ( y :: contains )
                )
                .toList ( );

Dump to console.

System.out.println ( "hits = " + hits );

hits = [Nursing Professional Development Specialist, Nursing Resource Coordinator]


We could combine our predicate tests. But I might prefer the original longer version using separate filter calls.

Collection < String > hits =
        jobs
                .stream ( )
                .filter (
                        ( String job ) ->
                                Arrays.stream ( job.split ( " " ) ).anyMatch ( x :: contains )
                                        &&
                                        Arrays.stream ( job.split ( " " ) ).anyMatch ( y :: contains )
                )
                .toList ( );
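For a rule that tests one job title at a time, the same word-set idea might be written so that the title is split only once. This is a sketch under assumptions: the class name `WordSetMatch` and the method name `isHit` are hypothetical, and the null check and the split on any run of whitespace are additions not present in the code above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class WordSetMatch {
    private static final Set<String> X = Set.of("Nursing");
    private static final Set<String> Y = Set.of("Specialist", "Coordinator");

    // A hit needs at least one word from X and at least one word from Y.
    static boolean isHit(String job) {
        if (job == null) {
            return false;
        }
        // Split once, on any run of whitespace, and reuse the words for both checks.
        Set<String> words = new HashSet<>(Arrays.asList(job.trim().split("\\s+")));
        return !Collections.disjoint(words, X) && !Collections.disjoint(words, Y);
    }

    public static void main(String[] args) {
        System.out.println(isHit("Nursing Professional Development Specialist"));
        System.out.println(isHit("Nursing Manager"));
    }
}
```

As with the stream version, the comparison is whole-word and case-sensitive, so "nursing" or "Specialists" would not count.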