I have a CSV file with each row representing the different components of an address, such as City, Street, House No, etc., and then a column with a combined address in one line, with a predefined format, e.g. Street House No, Zip Code, City.
What I want is to identify the different components in user-entered address text. For example, I would like to know whether the user has entered all the components, or just the street name and the city, etc., and then what the values of these components are.
Can I achieve this with a machine learning technique, so that I teach the model from my CSV file how an address text is split into its components, and then expect it to return the components of a new address based on that training?
Some more information: I'm implementing this in .NET, so a solution based on ML.NET, or something that can be easily integrated with .NET, would be preferable.
Also, we can look at this problem regardless of the address parsing context. Shouldn't we be able to teach a model how a text sentence is composed of different parts in any given context, and then expect the model to suggest the parts of a new text sentence?
Before developing a custom model yourself, I suggest you try the libpostal project.
(I am going to assume that you are developing in Python)
It already has several useful features built in, such as address parsing and address normalization.
The pylibpostal documentation includes an example of this.
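As a rough illustration (this is my own sketch, not the example from the documentation), here is how the parser output could be mapped back to the columns in your CSV. I am assuming the pypostal-style bindings, which expose `postal.parser.parse_address` and return a list of `(value, label)` pairs. The mapping `LABEL_TO_COMPONENT` and the helper names are hypothetical, chosen to match the column names in the question.

```python
try:
    # Assumption: pypostal-style bindings are installed (they need the libpostal C library).
    from postal.parser import parse_address
except ImportError:
    parse_address = None

# Hypothetical mapping from libpostal labels to the CSV column names in the question.
LABEL_TO_COMPONENT = {
    "road": "Street",
    "house_number": "House No",
    "postcode": "Zip Code",
    "city": "City",
}


def to_components(pairs):
    """Collect (value, label) pairs into {column name: value}; other labels are ignored."""
    found = {}
    for value, label in pairs:
        name = LABEL_TO_COMPONENT.get(label)
        if name is not None:
            found[name] = value
    return found


def missing_components(found):
    """List the expected columns that the user did not enter."""
    return [name for name in LABEL_TO_COMPONENT.values() if name not in found]


def parse_user_address(text):
    """Parse free text with libpostal and return (found, missing)."""
    if parse_address is None:
        raise RuntimeError("libpostal bindings are not installed")
    found = to_components(parse_address(text))
    return found, missing_components(found)
```

With this, `parse_user_address` tells you both which components were entered and their values, which is what the question asks for.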
However, libpostal is written purely in C, so it is not easy to install or to integrate with popular languages such as Python: you need to install additional dependencies.
If you limit your scope to US, Canadian, or British addresses, there are simpler alternatives such as pyap (Python Addresses Parser). It is based on regular expressions, so it is faster and easier to install and maintain, but it is not as general or as powerful as libpostal.
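To give an idea of what a regex-based approach looks like, here is a minimal sketch for the fixed format from the question (Street House No, Zip Code, City). It is not how pyap is implemented. The pattern, the group names, and the assumption that a zip code is 4 to 6 digits are all mine, so adjust them to your data.

```python
import re

# Assumed format: "Street House No, Zip Code, City"; everything after the street is optional.
ADDRESS_RE = re.compile(
    r"""^\s*
    (?P<street>[^\d,]+?)
    (?:\s+(?P<house_no>\d+\w*))?
    (?:\s*,\s*(?P<zip_code>\d{4,6}))?
    (?:\s*,\s*(?P<city>[^\d,]+?))?
    \s*$""",
    re.VERBOSE,
)


def split_address(text):
    """Return only the components that were actually entered, or None if nothing matches."""
    match = ADDRESS_RE.match(text)
    if match is None:
        return None
    return {
        name: value.strip()
        for name, value in match.groupdict().items()
        if value is not None
    }


print(split_address("Main Street 12, 12345, Springfield"))
print(split_address("Main Street, Springfield"))
```

The keys of the returned dictionary tell you which components the user entered. The obvious weakness is ambiguity: a lone word such as "Springfield" is treated as a street, which is the kind of case where a trained parser like libpostal does better.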