I am trying to calculate edit distances of a string against a collection to find the closest match. My current problem is that the collection is very large (about 25000 items), so I had to narrow down the set to just strings of similar lengths but that still would only narrow it down to a few thousand strings and this still is very slow. Is there a datastructure that allows for a quick lookup of similar strings or is there another way I could address this problem?
Quickly compare a string against a Collection in Java
785 Views Asked by Lezan At
3
There are 3 best solutions below
0
Mark Rotteveel
On
If your criteria for 'similar' define a total ordering, you should be able to define a Comparator and use a TreeSet to find the closest matches (eg using the ceiling and floor methods).
0
kkm -still wary of SE promises
On
Levenshtein Automata allow for fast selection of a set of words from a large dictionary such that they are within the given Levenshtein distance from a given word.
See: Schulz K, Mihov S. (2002) Fast String Correction with Levenshtein-Automata.
Related Questions in JAVA
- I need the BIRT.war that is compatible with Java 17 and Tomcat 10
- Creating global Class holder
- No method found for class java.lang.String in Kafka
- Issue edit a jtable with a pictures
- getting error when trying to launch kotlin jar file that use supabase "java.lang.NoClassDefFoundError"
- Does the && (logical AND) operator have a higher precedence than || (logical OR) operator in Java?
- Mixed color rendering in a JTable
- HTTPS configuration in Spring Boot, server returning timeout
- How to use Layout to create textfields which dont increase in size?
- Function for making the code wait in javafx
- How to create beans of the same class for multiple template parameters in Spring
- How could you print a specific String from an array with the values of an array from a double array on the same line, using iteration to print all?
- org.telegram.telegrambots.meta.exceptions.TelegramApiException: Bot token and username can't be empty
- Accessing Secret Variables in Classic Pipelines through Java app in Azure DevOps
- Postgres && statement Error in Mybatis Mapper?
Related Questions in ALGORITHM
- MCNP 6 - Doubts about cells
- Given partially sorted array of type x<y => first apperance of x comes before first of y, sort in average O(n)
- What is the algorithm behind math.gcd and why it is faster Euclidean algorithm?
- Purpose of last 2 while loops in the merge algorithm of merge sort sorting technique
- Dots and Boxes with apha-beta pruning
- What is the average and worst-case time complexity of my string searching algorithm?
- Building a School Schedule Generator
- TC problem 5-2:how to calculate the probability of the indicator random variable?
- LCA of a binary tree implemented in Python
- Identify the checksum algorithm
- Algorithm for finding a subset of nodes in a weighted connected graph such that the distance between any pair nodes are under a postive number?
- Creating an efficent and time-saving algorithm to find difference between greater than and lesser than combination
- Algorithm to find neighbours of point by distance with no repeats
- Asking code suggestions about data structure and algorithm
- Heap sort with multithreading
Related Questions in DATA-STRUCTURES
- Why is the runtime for this O(n)?
- Purpose of last 2 while loops in the merge algorithm of merge sort sorting technique
- What is the problem in my "sumAtBis" code?
- Asking code suggestions about data structure and algorithm
- What would be the most efficient way to store multiple sets of fixed arrays (std::vector)?
- About Suffix Trees features
- Getting wrong answer in Binary Search solution
- Are there techniques to mathematically compute the amount of searching in greedy graph searching?
- AVL tree Nth largest operation - How to have all my tests pass? JAVA
- Why does the map size change?
- Complexity in Union of disjointed sets with lists
- Hash collisions in Golang map resolving
- C++ ordered map optimized with index access
- How to sort this list of strings along with the strings and output the result as expected?
- Why deleting an element in a linkedlist costs O(1)
Related Questions in PATTERN-MATCHING
- powershell where-object -cnotmatch filter unwanted lines
- How to pattern match nested records in Java 22
- Matching multi-language (latin extended) characters in lua
- How to use Elixir pattern matching to check if a list's item startswith a given string(in a variable)?
- Neo4J Subgraph Labelling and accessing it back
- 'Is not' operator with custom 'throw exception' helper method makes variable unnasigned
- Algorithm for comparing two sets of sets
- Pattern matching over entries of an array/split
- Match-3 gravity games solver?
- matching multiple patterns at once in ocaml
- Why 00 is a valid integer in Python?
- How to execute multiple other cases inside of a case
- Finding items that match multiple LIKE keywords
- How can data be cast when using patterns for destructuring in Dart?
- When Option pattern matching optimizes up to if statements in Scala?
Related Questions in EDIT-DISTANCE
- How does Oracle DB compute edit distance and similarity with non-ASCII characters?
- Why is Rust port of function 2x slower than C++?
- Why "Longest Common Subsequence" prohibits "substitution" using edit distance methodology
- Why do I get runtime error (NZEC) in EDIST in spoj in Python
- Finding the subset of a dictionary that has the minimum edit distance to a given string
- Find out edit distance between two strings
- Graph edit distance for connected components in a graph - considering the spatial distance
- group_by edit distance between rows over multiple columns
- Reducing time for minimum edit distance in python?
- solr fuzzy search with edit distance above 1
- Numpy implementation of Edit Distance Algorithm
- What is an idiomatic way to generate a sequence of graphs from networkx.optimal_edit_paths?
- How to interpret the output of networkx.optimal_edit_paths?
- Abbreviation similarity between strings
- Editing distance with limit (threshold)
Trending Questions
- UIImageView Frame Doesn't Reflect Constraints
- Is it possible to use adb commands to click on a view by finding its ID?
- How to create a new web character symbol recognizable by html/javascript?
- Why isn't my CSS3 animation smooth in Google Chrome (but very smooth on other browsers)?
- Heap Gives Page Fault
- Connect ffmpeg to Visual Studio 2008
- Both Object- and ValueAnimator jumps when Duration is set above API LvL 24
- How to avoid default initialization of objects in std::vector?
- second argument of the command line arguments in a format other than char** argv or char* argv[]
- How to improve efficiency of algorithm which generates next lexicographic permutation?
- Navigating to the another actvity app getting crash in android
- How to read the particular message format in android and store in sqlite database?
- Resetting inventory status after order is cancelled
- Efficiently compute powers of X in SSE/AVX
- Insert into an external database using ajax and php : POST 500 (Internal Server Error)
Popular # Hahtags
Popular Questions
- How do I undo the most recent local commits in Git?
- How can I remove a specific item from an array in JavaScript?
- How do I delete a Git branch locally and remotely?
- Find all files containing a specific text (string) on Linux?
- How do I revert a Git repository to a previous commit?
- How do I create an HTML button that acts like a link?
- How do I check out a remote Git branch?
- How do I force "git pull" to overwrite local files?
- How do I list all files of a directory?
- How to check whether a string contains a substring in JavaScript?
- How do I redirect to another webpage?
- How can I iterate over rows in a Pandas DataFrame?
- How do I convert a String to an int in Java?
- Does Python have a string 'contains' substring method?
- How do I check if a string contains a specific word?
Sounds like a BK-tree might be what you want. Here's an article discussing them: http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees. A quick Google yields some Java implementations.