String normalization to make a local CouchDB identical to its replica on Cloudant


When my app runs against the CouchDB on my laptop it works perfectly, but when I run it against the replica on Cloudant it breaks. I had to use the unorm JS library (https://github.com/walling/unorm) to make it work, but this adds too many kB of JS code for my liking (I am very obsessed with speed).

I would greatly prefer to make my local CouchDB use the same UTF-8 encoding that Cloudant uses. Is that possible, and what is the best way to do it?

My app is a client-side (all in the browser) mini search engine that gets its data from a CSV file included in the HTML of the page. The CSV is generated from CouchDB on a laptop running Ubuntu 14.10. The app is bilingual, English and French: bottinbio.com

I coded a suggestion feature (on a prototype, not the main website) to suggest words to the user as she types. The data for that comes from a Cloudant database made by replicating the laptop CouchDB database.

The problem is that accented words like "bière" retrieved from the Cloudant database are encoded differently than in my local CouchDB. Normally, clicking on the word "bière" would trigger a search in the CSV for that word, but the search fails even though "bière" is written in the CSV. This does not happen when the suggestions come from the CouchDB database on my localhost development server.
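To see where two strings that look the same actually differ, it helps to dump their code units. The following is only a diagnostic sketch: `toEscapes` is a hypothetical helper name, and the two sample strings are typed in by hand as stand-ins for a value read from the CSV and one returned by Cloudant.

```javascript
// Hypothetical helper: render every UTF-16 code unit of a string as a \uXXXX escape.
function toEscapes(str) {
  let out = '';
  for (let i = 0; i < str.length; i++) {
    const hex = str.charCodeAt(i).toString(16).toUpperCase();
    out += '\\u' + hex.padStart(4, '0');
  }
  return out;
}

// Stand-in values (assumptions for illustration, not read from a real database).
const precomposed = 'bi\u00E8re';  // "è" stored as a single code point
const decomposed = 'bie\u0300re';  // "e" followed by a combining grave accent

console.log(toEscapes(precomposed)); // \u0062\u0069\u00E8\u0072\u0065
console.log(toEscapes(decomposed));  // \u0062\u0069\u0065\u0300\u0072\u0065
console.log(precomposed === decomposed); // false
```

Both strings render as "bière" on screen, but they have different lengths (5 and 6 code units), so a plain string search for one will not find the other.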


There is 1 best solution below


I searched a lot on Google and found that Unicode normalization to 'NFC' using unorm is the simplest way to go. Since my localhost CouchDB and most browsers seem to use 'NFC' string normalization, it would be far easier and less bug-prone to find a way to make the Cloudant database conform to 'NFC'.

An example: "Bières" ("beers" in French),

CouchDB: "\u0042\u0069\u00E8\u0072\u0065\u0073"

Cloudant: "\u0042\u0069\u0065\u0300\u0072\u0065\u0073"
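The two escape sequences above are the composed (NFC) and decomposed (NFD) forms of the same text. As a minimal sketch of normalizing on the client, the built-in `String.prototype.normalize` from ES2015 can replace the library, assuming every targeted browser supports it (older browsers would still need a polyfill). The names `toNFC` and `csv` and the CSV content are made up for the example.

```javascript
// The two strings from the example above.
const fromCouchDB = '\u0042\u0069\u00E8\u0072\u0065\u0073';        // composed "è"
const fromCloudant = '\u0042\u0069\u0065\u0300\u0072\u0065\u0073'; // "e" + combining accent

// Hypothetical wrapper around the built-in ES2015 normalizer.
function toNFC(str) {
  return str.normalize('NFC');
}

// Made-up CSV content standing in for the data embedded in the page.
const csv = 'id,name\n1,' + fromCouchDB + '\n';

// Searching with the raw Cloudant string fails...
console.log(csv.indexOf(fromCloudant) !== -1);        // false
// ...but succeeds once the suggestion is normalized to NFC.
console.log(csv.indexOf(toNFC(fromCloudant)) !== -1); // true
console.log(toNFC(fromCloudant) === fromCouchDB);     // true
```

Normalizing each suggestion once, right after it arrives from Cloudant, keeps the rest of the search code unchanged.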

Another possibility would be to make a JSON file containing a list of all the word strings that differ between the two databases and use it to make checks. In my case this gives a small 25 kB file. The problem would be keeping it synchronized as more data is added to the database. That is not very complex to implement, but it could lead to errors because of the growing internationalization of the HTML5 app.
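One way the lookup-file idea could be sketched: build the table offline (for example in Node, where `String.prototype.normalize` is available), ship it as JSON, and do a plain object lookup in the browser so no normalization code is needed there. `buildTable` and `fixWord` are hypothetical names, and the word list is invented for the example; the real file might be structured differently.

```javascript
// Offline step: map each word whose stored form differs from its NFC form
// to that NFC form. Words that are already NFC are left out of the table.
function buildTable(words) {
  const table = {};
  for (const word of words) {
    const nfc = word.normalize('NFC');
    if (nfc !== word) {
      table[word] = nfc;
    }
  }
  return table;
}

// Client step: replace a suggestion with its table entry, if it has one.
function fixWord(word, table) {
  return Object.prototype.hasOwnProperty.call(table, word) ? table[word] : word;
}

// Invented sample of words as they might come back from Cloudant.
const cloudantWords = ['Bie\u0300res', 'pain', 'cafe\u0301'];
const table = buildTable(cloudantWords);

console.log(JSON.stringify(table).length);               // size of the file to ship
console.log(fixWord('Bie\u0300res', table) === 'Bi\u00E8res'); // true
console.log(fixWord('pain', table) === 'pain');                // true
```

The table has to be regenerated whenever new words are added, which is the synchronization cost mentioned above.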