I am building a custom keyboard for android, the one that atleast supports autocomplete suggestions. To achieve this, I am storing every word that user types (not password fields) in a Room database table which has a simple model, the word and its frequency. Now for showing the suggestions, I am using a Trie which is populated by words from this database table. My query is basically to order by the table based on frequency of the word and limit the results to 5K (I do not feel like overpopulating the Trie, these 5K words can be considered as the users' favourite words that he uses often and needs suggestions for). Now my actual problem is the ORDER BY clause, this is a rapidly growing data set, sorting lets say 0.1M words to get 5K words seems like an overkill. How can i rework this approach to improve the efficiency of this entire suggestions logic.
Android custom keyboard suggestions
677 Views Asked by Shubham Gajra At
1
There are 1 best solutions below
Related Questions in OPTIMIZATION
- bs-webapi - How to loop over Dom.nodeList?
- How can I use the output signature of a functor in an interface file?
- How to use npm packages with ReasonML?
- Why would it matter where a type declaration is located?
- Can I pattern match on JS objects?
- ReasonML vs TypeScript
- Limit recursion to the first three elements of list of floats
- Why are Reason Arrays Mutable?
- How to target subdirectories in BuckleScript bsconfig.json
- OOP - How does one create a class in ReasonML
Related Questions in ANDROID-ROOM
- bs-webapi - How to loop over Dom.nodeList?
- How can I use the output signature of a functor in an interface file?
- How to use npm packages with ReasonML?
- Why would it matter where a type declaration is located?
- Can I pattern match on JS objects?
- ReasonML vs TypeScript
- Limit recursion to the first three elements of list of floats
- Why are Reason Arrays Mutable?
- How to target subdirectories in BuckleScript bsconfig.json
- OOP - How does one create a class in ReasonML
Related Questions in TRIE
- bs-webapi - How to loop over Dom.nodeList?
- How can I use the output signature of a functor in an interface file?
- How to use npm packages with ReasonML?
- Why would it matter where a type declaration is located?
- Can I pattern match on JS objects?
- ReasonML vs TypeScript
- Limit recursion to the first three elements of list of floats
- Why are Reason Arrays Mutable?
- How to target subdirectories in BuckleScript bsconfig.json
- OOP - How does one create a class in ReasonML
Related Questions in CUSTOM-KEYBOARD
- bs-webapi - How to loop over Dom.nodeList?
- How can I use the output signature of a functor in an interface file?
- How to use npm packages with ReasonML?
- Why would it matter where a type declaration is located?
- Can I pattern match on JS objects?
- ReasonML vs TypeScript
- Limit recursion to the first three elements of list of floats
- Why are Reason Arrays Mutable?
- How to target subdirectories in BuckleScript bsconfig.json
- OOP - How does one create a class in ReasonML
Related Questions in ANDROID-CUSTOM-KEYBOARD
- bs-webapi - How to loop over Dom.nodeList?
- How can I use the output signature of a functor in an interface file?
- How to use npm packages with ReasonML?
- Why would it matter where a type declaration is located?
- Can I pattern match on JS objects?
- ReasonML vs TypeScript
- Limit recursion to the first three elements of list of floats
- Why are Reason Arrays Mutable?
- How to target subdirectories in BuckleScript bsconfig.json
- OOP - How does one create a class in ReasonML
Trending Questions
- UIImageView Frame Doesn't Reflect Constraints
- Is it possible to use adb commands to click on a view by finding its ID?
- How to create a new web character symbol recognizable by html/javascript?
- Why isn't my CSS3 animation smooth in Google Chrome (but very smooth on other browsers)?
- Heap Gives Page Fault
- Connect ffmpeg to Visual Studio 2008
- Both Object- and ValueAnimator jumps when Duration is set above API LvL 24
- How to avoid default initialization of objects in std::vector?
- second argument of the command line arguments in a format other than char** argv or char* argv[]
- How to improve efficiency of algorithm which generates next lexicographic permutation?
- Navigating to the another actvity app getting crash in android
- How to read the particular message format in android and store in sqlite database?
- Resetting inventory status after order is cancelled
- Efficiently compute powers of X in SSE/AVX
- Insert into an external database using ajax and php : POST 500 (Internal Server Error)
Popular # Hahtags
Popular Questions
- How do I undo the most recent local commits in Git?
- How can I remove a specific item from an array in JavaScript?
- How do I delete a Git branch locally and remotely?
- Find all files containing a specific text (string) on Linux?
- How do I revert a Git repository to a previous commit?
- How do I create an HTML button that acts like a link?
- How do I check out a remote Git branch?
- How do I force "git pull" to overwrite local files?
- How do I list all files of a directory?
- How to check whether a string contains a substring in JavaScript?
- How do I redirect to another webpage?
- How can I iterate over rows in a Pandas DataFrame?
- How do I convert a String to an int in Java?
- Does Python have a string 'contains' substring method?
- How do I check if a string contains a specific word?
If not already implemented, an index on the frequency
@ColumnInfo(index = true)
.Another could be to add a table that maintains the highest 5k. Supported by yet another table (the support table) that has 1 row, with columns for; the highest frequency (not really required), the lowest frequency in the current 5k, and a 3rd column for the number currently held. So you could then, after adding an existing word get whether or not the new/updated word should be added to the 5k table (perhaps a 4th column for the primary key of the lowest to facilitate efficient deletion).
So
Note that the 5K table would probably only need to store the rowid as a pointer/reference/map to the core table.
rowid is a column that virtually all tables will have in Room (virtual tables an exception as are table that have the WITHOUT ROWID attribute but Room does not facilitate (as far as I am aware) WITHOUT ROWID table).
The rowid can be up to twice as fast as other indexes. I would suggest using
@PrimaryKey Long id=null;
(java) or@PrimaryKey var id: Long?=null
(Kotlin) and NOT using@PrimaryKey(autogenerate = true)
.autogenerate = true
equates to SQLite'sAUTOINCREMENT
, about which the SQLite documentation says "The AUTOINCREMENT keyword imposes extra CPU, memory, disk space, and disk I/O overhead and should be avoided if not strictly needed. It is usually not needed."<column_name> INTEGER PRIMARY KEY
and no value or null for the primary key column's value then SQLite generates a value that is 1 greater than max(rowid).autogenerate=true
then the generated value is the greater of max(rowid) and the value stored, for that table, in the sqlite_sequence table (hence the overheads).Demonstration
The following is a demonstration albeit just using a basic Word table as the source.
First the 2 tables (@Entity annotated classes)
Word
WordSubset aka the table with the highest occurring 5000 frequencies, it simply has a reference/map/link to the underlying/actual word. :-
WordSubsetSupport this will be a single row table that contains the highest and lowest frequencies (highest is not really needed), the number of rows in the WordSubset table and a reference/map to the word with the lowest frequency.
For access an abstract class (rather than interface, as this, in Java, allows methods/functions with a body, a Kotlin interface allows these) CombinedDao :-
TheDatabase is a pretty standard @Database annotated class, other than that it allows use the main thread for the sake of convenience and brevity of the demo:-
Finally activity code that randomly generates and adds 10,000 words (or thereabouts as some could be duplicate words), each word having a frequency that is also randomly generated (between 1 and 10000) :-
Obviously the results would differ per run, also the demo is only really designed to be run once (although it could be run more).
After running, using AppInspection, then
The support table (in this instance) is :-
The subset table on it's own means little as it just contains a map to the actual word. So it's more informative to use a query to look at what it contains.
e.g.
As can be seen the query shows that the word who's wordId is 7412 is the one with the lowest frequency of 4690 (as expected according to the support table)
Going to the last page shows:-