I want to use the resolution time in minutes and the client's description of Zendesk tickets to predict the resolution time of future tickets based on their description. I will use only these two values, but the description is a large text. I searched for a way to hash the feature values instead of the feature names in Vowpal Wabbit, but with no success. Which is the best approach for using large-text feature values for prediction with Vowpal Wabbit?
Prediction based on large texts using Vowpal Wabbit
Asked by André Muniz
Feature values in Vowpal Wabbit can only be real numbers. If you have a categorical feature with n possible values, you simply represent it as n binary features (so, e.g., color=red is the name of a binary feature and its value is 1 by default).
If you have a text description, you can use the individual words of the text as features (i.e. feature names). You only need to escape ":", "|" and whitespace characters in feature names; all other characters are allowed (including "="). So an example can look like:
9 |USER avg_time:11 |SUMMARY words:5 sentences:1 |TEXT I have a big problem
So this ticket with the text "I have a big problem" took 9 minutes to resolve, and previous tickets from the same user took 11 minutes on average to resolve. If you have enough training examples, I would recommend adding many more features (any details about the user, more summary features about the text, etc.). The time of day (morning, afternoon, evening) and the day of the week when the ticket was reported may also be good predictors (tickets reported on Friday evening tend to take longer), but maybe you intentionally don't want to model this and want to focus only on the "difficulty" of the ticket, irrespective of reporting time.
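As a rough sketch of how such a line could be produced from a ticket, the Python snippet below builds the example above. The helper names (vw_token, to_vw_line), the choice of replacing ":" and "|" with "_", and the naive sentence count are assumptions made for illustration, not part of Vowpal Wabbit itself.

```python
import re

def vw_token(word):
    # ":" and "|" have a special meaning in VW's text format,
    # so this sketch replaces them with "_" (an arbitrary choice).
    return word.replace(":", "_").replace("|", "_")

def to_vw_line(minutes, text, avg_time=None):
    # Splitting on any whitespace also removes newlines and tabs,
    # so each word becomes one feature name in the TEXT namespace.
    words = [vw_token(w) for w in text.split()]
    # Naive sentence count: non-empty chunks between ".", "!" or "?".
    sentences = len([s for s in re.split(r"[.!?]+", text) if s.strip()])
    parts = [str(minutes)]  # the label: resolution time in minutes
    if avg_time is not None:
        parts.append(f"|USER avg_time:{avg_time}")
    parts.append(f"|SUMMARY words:{len(words)} sentences:{sentences}")
    parts.append("|TEXT " + " ".join(words))
    return " ".join(parts)

line = to_vw_line(9, "I have a big problem", avg_time=11)
print(line)
```

Writing one such line per ticket to a file gives a training set in the same shape as the example; the USER namespace is simply left out when no history is available for the user.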
You can also try using word bigrams as features with --ngram T2, which means that 2-gram features will be created for all namespaces beginning with T (only the TEXT namespace in my example). Maybe the individual words "big" and "problem" are not strong predictors, but the bigram "big problem" will get a high positive weight (indicating it is a good predictor of long resolution time).
You mean the resolution time and the text of the ticket, am I right? But the resolution time is the (dependent) variable you want to predict, so it does not count as a feature (aka independent variable). Of course, if you know the identity of the user and have enough training examples for each user, you can include the average time of the user's previous tickets (excluding the current one, of course) as a feature, as I tried to show in the example.
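One way to compute such a per-user average without leaking the current ticket's own resolution time is a single chronological pass, sketched below in Python. The function name, the (user_id, minutes) tuple layout, and the sample data are hypothetical, and the sketch assumes that all earlier tickets were already resolved when the current one was reported.

```python
from collections import defaultdict

def previous_average_per_ticket(tickets):
    """tickets: iterable of (user_id, minutes) in chronological order.

    Returns, for each ticket, the mean resolution time of the same
    user's earlier tickets, or None if the user has no earlier ticket.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    result = []
    for user, minutes in tickets:
        # Compute the feature BEFORE adding the current ticket,
        # so its own label never leaks into its feature.
        if counts[user]:
            result.append(totals[user] / counts[user])
        else:
            result.append(None)
        totals[user] += minutes
        counts[user] += 1
    return result

# Hypothetical history: (user, resolution time in minutes)
history = [("alice", 10), ("bob", 30), ("alice", 12), ("alice", 9)]
averages = previous_average_per_ticket(history)
print(averages)
```

For tickets where the result is None, the avg_time feature can simply be omitted from the example.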