How to perform a search on Arabic text in Java?
I have Arabic text with diacritics in a database. When I type an Arabic search string, it has no diacritics, so it never matches the stored string. Searching works fine on text without diacritics. Is there any way to make it work on text with diacritics?
Asked by Baqer Naqvi

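Arabic diacritics are separate characters, so you can match them with a pattern. Plain LIKE has no "any number of" operator, so use a regular-expression match where your database supports one (for example MySQL's REGEXP or PostgreSQL's ~):
SELECT * FROM table WHERE name REGEXP 'a[cd]*b[cd]*'
This finds 'ab' with any number of c or d characters after each letter.
You can do the same for Arabic by putting all the Arabic diacritics between the square brackets after every letter.
Here you can find all of them with their Unicode code points.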
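To build such a pattern from whatever the user typed, you could generate it in Java. A minimal sketch, assuming the diacritics you care about are the harakat U+064B to U+065F plus the superscript alef U+0670 (the class and method names are hypothetical):

```java
import java.util.regex.Pattern;

class DiacriticPatternBuilder {
    // Assumed diacritic set: harakat U+064B to U+065F plus superscript alef U+0670.
    // The Java compiler turns these escapes into the literal characters, which Java regex
    // reads as a character range; a Unicode-aware database regex engine should too (worth checking).
    static final String DIACRITICS = "[\u064B-\u065F\u0670]*";

    // Appends the diacritic class after every letter of the search word.
    // Assumes the word holds only Arabic letters, so there are no regex metacharacters to escape.
    static String buildPattern(String searchWord) {
        StringBuilder sb = new StringBuilder();
        for (char ch : searchWord.toCharArray()) {
            sb.append(ch).append(DIACRITICS);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pattern = buildPattern("الذين");
        // Try the pattern locally with java.util.regex before sending it to the database
        System.out.println(Pattern.compile(pattern).matcher("صِرَاطَ الَّذِينَ").find());
    }
}
```

The resulting string is best bound as a query parameter (for example `WHERE name REGEXP ?`) rather than concatenated into the SQL.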

I hope I'm not too late to the party. My issue is a little different from the OP's: I wanted to search Arabic text with diacritics and highlight the matched text in a color, so I needed the indices of each match.
The problem is that stripping the diacritics shortens the text, so the match indices no longer line up with the original text.
I solved that by using a regex and a SpannableString:
import android.text.Spannable;
import android.text.SpannableString;
import android.text.style.ForegroundColorSpan;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/*
 * input: text containing Arabic diacritics (or letter variants) to ignore while searching
 * searchedWord: the word/text to search for in input
 * color: foreground color applied to every match in the returned Spannable
 */
public static Spannable searchArabicWithIgnoredDiacriticsOrLetters(String input, String searchedWord, int color) {
    // replaceLetters() swaps one char for one char, so the normalized text has the same
    // length and the match indices below are valid for the original input too
    String normalizedInput = replaceLetters(input);
    Spannable output = new SpannableString(input); // show the original text, with highlights
    StringBuilder sb = new StringBuilder();
    for (char ch : replaceLetters(searchedWord).toCharArray()) {
        sb.append(Pattern.quote(String.valueOf(ch))); // the letter itself, escaped
        // followed by any number of diacritics / Quranic marks
        sb.append("[\\u0655\\u0654\\u0670\\u065F\\u065E\\u065D\\u065C\\u065B\\u065A\\u0659\\u0658\\u0657\\u0656\\u06EC\\u06EB\\u06EA\\u06E4\\u061A\\u0619\\u0618\\u0617\\u0616\\u0615\\u064B\\u064C\\u064D\\u064E\\u064F\\u0650\\u0651\\u0652\\u0653\\u06DA\\u06D6\\u06D7\\u06D8\\u06D9\\u06DB\\u06DC\\u06DF\\u06E0\\u06E1\\u06E2\\u06E3\\u06E5\\u06E6\\u06E7\\u06E8\\u06EB\\u06EC\\u06ED]*");
    }
    Pattern pattern = Pattern.compile(sb.toString());
    // match against the normalized text, since the pattern was built from the normalized search word
    Matcher matcher = pattern.matcher(normalizedInput);
    while (matcher.find()) {
        output.setSpan(new ForegroundColorSpan(color),
                matcher.start(), matcher.end(), Spannable.SPAN_EXCLUSIVE_EXCLUSIVE);
    }
    return output;
}
public static String replaceLetters(String input) {
    // one-for-one letter folding, so the string length never changes
    return input.replace('أ', 'ا')   // alef with hamza above -> alef
            .replace('إ', 'ا')       // alef with hamza below -> alef
            .replace('ى', 'ي')       // alef maksura -> yeh
            .replace('ة', 'ه')       // teh marbuta -> heh
            .replace('آ', 'ا')       // alef with madda -> alef
            .replace('ٱ', 'ا');      // alef wasla -> alef
}
Another way to write replaceLetters(), using the Unicode code points:
public static String replaceLetters(String input) {
    String output;
    output = input.replaceAll("\\u0623", String.valueOf((char) Integer.parseInt("0627", 16))); // replace أ with ا
    output = output.replaceAll("\\u0625", String.valueOf((char) Integer.parseInt("0627", 16))); // replace إ with ا
    output = output.replaceAll("\\u0649", String.valueOf((char) Integer.parseInt("064A", 16))); // replace ى with ي
    output = output.replaceAll("\\u0629", String.valueOf((char) Integer.parseInt("0647", 16))); // replace ة with ه
    output = output.replaceAll("\\u0622", String.valueOf((char) Integer.parseInt("0627", 16))); // replace آ with ا
    output = output.replaceAll("\\u0671", String.valueOf((char) Integer.parseInt("0627", 16))); // replace ٱ with ا
    return output;
}
Note: you can refer to the accepted answer for the Unicode representation.
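Outside Android (for example in a plain Java service), the same idea can return index ranges instead of a Spannable. A minimal sketch; the class and method names are hypothetical, and the diacritic class is a shortened, assumed subset of the one in the answer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ArabicMatchRanges {
    // Assumed subset of diacritics: harakat U+064B to U+065F plus superscript alef U+0670
    private static final String DIACRITICS = "[\u064B-\u065F\u0670]*";

    // Same one-for-one mappings as replaceLetters(), so the length is preserved
    static String foldLetters(String s) {
        return s.replace('أ', 'ا').replace('إ', 'ا').replace('آ', 'ا').replace('ٱ', 'ا')
                .replace('ى', 'ي').replace('ة', 'ه');
    }

    // Returns {start, end} pairs for every match; they are valid indices into the original text
    static List<int[]> findMatches(String text, String word) {
        StringBuilder sb = new StringBuilder();
        for (char ch : foldLetters(word).toCharArray()) {
            sb.append(Pattern.quote(String.valueOf(ch))).append(DIACRITICS);
        }
        Matcher m = Pattern.compile(sb.toString()).matcher(foldLetters(text));
        List<int[]> ranges = new ArrayList<>();
        while (m.find()) {
            ranges.add(new int[] {m.start(), m.end()});
        }
        return ranges;
    }
}
```

Because folding never changes the length, `text.substring(start, end)` gives back the matched part with its original letters and diacritics.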

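import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArabicSearch {
    public static void main(String[] args) {
        String targetWord = "الذين";
        String text = "صِرَاطَ الَّذِينَ أَنْعَمْتَ عَلَيْهِمْ غَيْرِ الْمَغْضُوبِ عَلَيْهِمْ وَلَا الضَّالِّين";
        // Each letter of the target word may be followed by any number of
        // combining marks (\p{M}), which covers the Arabic diacritics
        StringBuilder regex = new StringBuilder();
        for (char c : targetWord.toCharArray()) {
            regex.append(Pattern.quote(String.valueOf(c)));
            regex.append("\\p{M}*");
        }
        Pattern searchRegEx = Pattern.compile(regex.toString());
        Matcher m = searchRegEx.matcher(text);
        if (m.find()) {
            int i = m.start();
            System.out.println("m.group(0):: " + i + " : " + m.group(0));
        }
    }
}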
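One limitation of this snippet is that diacritics typed into the search word itself end up in the pattern as literal characters, so they must then appear in the text in exactly that order. A sketch that strips marks from the query first and collects every match (names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MarkInsensitiveSearch {
    // Removes combining marks from the query, then lets every remaining letter
    // be followed by any number of marks in the searched text
    static Pattern buildPattern(String query) {
        String bare = query.replaceAll("\\p{M}", "");
        StringBuilder regex = new StringBuilder();
        for (char c : bare.toCharArray()) {
            regex.append(Pattern.quote(String.valueOf(c))).append("\\p{M}*");
        }
        return Pattern.compile(regex.toString());
    }

    // Start index of every match, as an index into the original text
    static List<Integer> findAll(String text, String query) {
        Matcher m = buildPattern(query).matcher(text);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }
}
```

With this, a query typed as "الَّذِينَ" and one typed as "الذين" find the same positions.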

I found a much better way to do this. All credit goes to joop for it:
import java.text.Normalizer;
import java.text.Normalizer.Form;

/**
 *
 * @author Ibbtek <http://ibbtek.altervista.org/>
 */
public class ArabicDiacritics {

    private String input;
    private final String output;

    /**
     * ArabicDiacritics constructor
     * @param input String
     */
    public ArabicDiacritics(String input){
        this.input=input;
        this.output=normalize();
    }

    /**
     * normalize Method
     * @return String
     */
    private String normalize(){
        input = Normalizer.normalize(input, Form.NFKD)
                .replaceAll("\\p{M}", "");
        return input;
    }

    /**
     * @return the output
     */
    public String getOutput() {
        return output;
    }

    public static void main(String[] args) {
        String test = "كَلَّا لَا تُطِعْهُ وَاسْجُدْ وَاقْتَرِبْ ۩";
        System.out.println("Before: "+test);
        test=new ArabicDiacritics(test).getOutput();
        System.out.println("After: "+test);
    }
}
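The same normalization can back a simple diacritic-insensitive check by running both the stored text and the query through it. A minimal sketch with hypothetical names; note that NFKD also splits hamza forms such as أ into a bare alef plus a combining hamza, so removing marks folds them to ا as well:

```java
import java.text.Normalizer;

class DiacriticInsensitive {
    // NFKD separates precomposed letters into base letter + combining mark;
    // the regex then drops every combining mark (\p{M})
    static String strip(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKD).replaceAll("\\p{M}", "");
    }

    // Both sides go through the same normalization before comparing
    static boolean containsIgnoringDiacritics(String text, String query) {
        return strip(text).contains(strip(query));
    }
}
```

Because stripping changes the string length, positions in the stripped text no longer correspond to the original one; that is why the regex-based answers are needed when matches must be highlighted.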

Please see the class I created below. It is for Android and returns a SpannableString. It is very basic and I did not bother about memory consumption, so feel free to optimize it yourselves.
http://freshinfresh.com/sample/ABHArabicDiacritics.java
https://gist.github.com/alierdogan7/11f9cfb24f5551c34191485fc764d4c0
To check whether an Arabic string contains a search term while ignoring diacritics (harakat):
ABHArabicDiacritics objSearchd = new ABHArabicDiacritics();
objSearchd.getDiacriticinsensitive("وَ اَشْهَدُ اَنْ لا اِلهَ اِلاَّ اللَّهُ").contains("اشهد");
To get the string back with the matched portion highlighted (e.g. colored red), use the following code:
ABHArabicDiacritics objSearch = new ABHArabicDiacritics("وَ اَشْهَدُ اَنْ لا اِلهَ اِلاَّ اللَّهُ", "اشهد");
SpannableString spoutput = objSearch.getSearchHighlightedSpan();
textView.setText(spoutput);
To get the start and end positions of the search text, use these methods:
// check whether the text contains the search term
objSearch.isContain();
objSearch.getSearchHighlightedSpan();
objSearch.getSearchTextStartPosition();
objSearch.getSearchTextEndPosition();
Copy the shared Java class and enjoy. I will spend more time adding features if you request them.
Thanks

Unfortunately, no. As MIE said, the diacritics are stored as separate characters, so as far as I know it's not really possible.
MIE's answer would be difficult to implement and impossible to keep up to date if you change anything in your database.
You could look at the Apache Lucene search library. I'm not sure, but it looks like it could solve your problem.
Otherwise, you'll need to strip all the diacritics from your database; then you'll be able to query it with or without diacritics simply by using a small Arabic normalizer like this one:
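A rough sketch of the strip-on-write idea, assuming you keep a second, normalized copy of each value for searching. The helper name is hypothetical; it reuses the letter mappings of replaceLetters() from an earlier answer and additionally drops tatweel (U+0640), which is an added assumption:

```java
class ArabicSearchNormalizer {
    // Hypothetical helper: apply the same function to the text you store for searching
    // and to every query, then compare the normalized forms
    static String normalizeForSearch(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if ((c >= '\u064B' && c <= '\u065F') || c == '\u0670' || c == '\u0640') {
                continue; // drop harakat, superscript alef and tatweel
            }
            switch (c) {
                case '\u0623': case '\u0625': case '\u0622': case '\u0671':
                    sb.append('\u0627'); // alef variants -> bare alef
                    break;
                case '\u0649':
                    sb.append('\u064A'); // alef maksura -> yeh
                    break;
                case '\u0629':
                    sb.append('\u0647'); // teh marbuta -> heh
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

You would then store normalizeForSearch(name) in a separate column (say, a hypothetical name_search) and query it with `WHERE name_search LIKE ?`, passing normalizeForSearch(userInput) plus any wildcards.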