I'm working on a PHP-based shopping application. I have lists of strings that I know represent the same product. Those strings are likely to contain the full product name or part of it (full product name usually being brand + model).
I wonder what the best approach is for extracting the product name from these strings.
For instance, here is a list of strings that represent the same product:
- Tkg BOUILLOIRE TKG - JK 1008 RWD
- Tkg Jk 1008 Rwd
- Tkg Kalorik - JK 1008 RWD - Bouilloire Électrique sans Fil 360°
- TKG Bouilloire électrique sans fil 1,7 litre 2000 watts Pois TKG Rouge et blanc
- Tkg Kalorik - JK 1008 RWD - Bouilloire Électrique sans Fil 360°
- Tkg JK 1008 RWD BOUILLOIRES
I expect to extract the product name "Tkg JK 1008 RWD". Please note that string 4 contains only partial information.
I've tried an approach where I counted the words repeated across all the strings, but from there it is difficult to go further.
Would you have any clue?
Cheers, Nicolas
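To make the word-counting attempt concrete, here is a minimal sketch of it rather than a definitive implementation. The function names `tokenize` and `commonTokens` and the 0.8 threshold are illustrative assumptions, not part of the real application. It counts in how many strings each word appears and keeps the words found in most of them.

```php
<?php
// Hypothetical helper: split a title into lowercase word tokens.
function tokenize(string $title): array
{
    // Use mbstring when available so accented letters are lowercased too.
    $lower = function_exists('mb_strtolower')
        ? mb_strtolower($title, 'UTF-8')
        : strtolower($title);
    // Split on anything that is not a letter or a digit (Unicode-aware).
    $tokens = preg_split('/[^\p{L}\p{N}]+/u', $lower, -1, PREG_SPLIT_NO_EMPTY);
    return $tokens === false ? [] : $tokens;
}

// Hypothetical helper: keep the tokens present in at least $minShare of the
// titles, in order of first appearance. The 0.8 default is an arbitrary guess.
function commonTokens(array $titles, float $minShare = 0.8): array
{
    $docFreq = []; // token => number of titles that contain it
    foreach ($titles as $title) {
        // array_unique so a word repeated inside one title counts once.
        foreach (array_unique(tokenize($title)) as $token) {
            $docFreq[$token] = ($docFreq[$token] ?? 0) + 1;
        }
    }
    $threshold = $minShare * count($titles);
    $kept = [];
    foreach ($docFreq as $token => $count) {
        if ($count >= $threshold) {
            // Numeric tokens such as "1008" become int keys; cast them back.
            $kept[] = (string) $token;
        }
    }
    return $kept;
}

$titles = [
    'Tkg BOUILLOIRE TKG - JK 1008 RWD',
    'Tkg Jk 1008 Rwd',
    'Tkg Kalorik - JK 1008 RWD - Bouilloire Électrique sans Fil 360°',
    'TKG Bouilloire électrique sans fil 1,7 litre 2000 watts Pois TKG Rouge et blanc',
    'Tkg Kalorik - JK 1008 RWD - Bouilloire Électrique sans Fil 360°',
    'Tkg JK 1008 RWD BOUILLOIRES',
];

echo implode(' ', commonTokens($titles)), PHP_EOL; // prints "tkg jk 1008 rwd"
```

On this sample the sketch returns the expected words, but only in lowercase and only because the threshold happens to fit: "bouilloire" appears in 4 of the 6 strings, so a slightly lower threshold would let that generic word through. That is where counting alone gets stuck.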
Here is a first stab at implementing some of the ideas you suggested.