Data with semantic meaning is generally well suited to NLP models. However, when I look at the data for my own task, I don't know how to evaluate whether it is suitable for NLP models.
The following is my data format:
- BGP is used to exchange routing information among ASes (autonomous systems). I want to analyze AS_PATH data, which looks like "AS2449 AS3356 AS32934". Each record is an ordered list of AS numbers.
- An AS_PATH follows some rules: the first AS is called the vantage point, the last AS is called the origin AS, and the path contains no loops.
- The dataset contains millions of such paths.
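To make the format above concrete, here is a minimal sketch of how one path could be split into tokens and checked against the rules I listed. The names `ParsedPath` and `parse_as_path` are hypothetical, made up for this illustration. The loop check assumes that any repeated AS counts as a loop, which is a simplification: it would also flag paths where the same AS is repeated consecutively (prepending).

```python
from typing import List, NamedTuple


class ParsedPath(NamedTuple):
    tokens: List[str]    # AS numbers in order, e.g. ["AS2449", "AS3356", "AS32934"]
    vantage_point: str   # first AS in the path
    origin: str          # last AS in the path
    has_loop: bool       # True if any AS appears more than once (simplifying assumption)


def parse_as_path(line: str) -> ParsedPath:
    """Split a whitespace-separated AS_PATH string into tokens and basic fields."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty AS_PATH")
    has_loop = len(set(tokens)) != len(tokens)
    return ParsedPath(tokens, tokens[0], tokens[-1], has_loop)


example = parse_as_path("AS2449 AS3356 AS32934")
print(example.vantage_point, example.origin, example.has_loop)
# prints: AS2449 AS32934 False
```

Seen this way, each path is a short "sentence" whose "words" are AS numbers, which is the framing I have in mind when I ask about NLP models.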
Are there any criteria for judging whether data like this is suitable for an NLP model?