I want to use the resolution time in minutes and the client description of the tickets on Zendesk to predict the resolution time of next tickets based on their description. I will use only this two values, but the description is a large text. I searched about hashing the feature values instead of hash the feature name on Vowpal Wabbit but with no success. Wich is the better approach to use feature values that is large texts to predict using Vowpal?
Prediction based on large texts using Vowpal Webbit
166 Views Asked by André Muniz At
1
There are 1 best solutions below
Related Questions in MACHINE-LEARNING
- How to cluster a set of strings?
- Enforcing that inputs sum to 1 and are contained in the unit interval in scikit-learn
- scikit-learn preperation
- Spark MLLib How to ignore features when training a classifier
- Increasing the efficiency of equipment using Amazon Machine Learning
- How to interpret scikit's learn confusion matrix and classification report?
- Amazon Machine Learning for sentiment analysis
- What Machine Learning algorithm would be appropriate?
- LDA generated topics
- Spectral clustering with Similarity matrix constructed by jaccard coefficient
- Speeding up Viterbi execution
- Memory Error with Classifier fit and partial_fit
- How to find algo type(regression,classification) in Caret in R for all algos at once?
- Difference between weka tool's correlation coefficient and scikit learn's coefficient of determination score
- What are the approaches to the Big-Data problems?
Related Questions in VOWPALWABBIT
- How do I get the raw predictions (-r) from Vowpal Wabbit when running in daemon mode?
- How to return predictions in the [0, 1] interval for SVMs in vowpal wabbit
- Update 'vw' command line call to reference most current vowpal_wabbit installation
- Prediction based on large texts using Vowpal Webbit
- How to use vowpal wabbit for online prediction (streaming mode)
- bagging/boosting using vowel wabbit
- Vowpal Wabbit training and testing data formats
- Trouble installing vowpal wabbit/accessing boost program options
- how to retrain the model for sequence of files in vowpal wabbit
- Features in vowpal wabbit
- Dealing with class imbalance in multi-label classification
- Multi-dimensional regression with Vowpal Wabbit
- vowpal wabbit : multilable_oaa does not return label for all inputs
- Vowpal Wabbit: specifying the learning rate schedule
- VowpalWabbit: Differences and scalability
Trending Questions
- UIImageView Frame Doesn't Reflect Constraints
- Is it possible to use adb commands to click on a view by finding its ID?
- How to create a new web character symbol recognizable by html/javascript?
- Why isn't my CSS3 animation smooth in Google Chrome (but very smooth on other browsers)?
- Heap Gives Page Fault
- Connect ffmpeg to Visual Studio 2008
- Both Object- and ValueAnimator jumps when Duration is set above API LvL 24
- How to avoid default initialization of objects in std::vector?
- second argument of the command line arguments in a format other than char** argv or char* argv[]
- How to improve efficiency of algorithm which generates next lexicographic permutation?
- Navigating to the another actvity app getting crash in android
- How to read the particular message format in android and store in sqlite database?
- Resetting inventory status after order is cancelled
- Efficiently compute powers of X in SSE/AVX
- Insert into an external database using ajax and php : POST 500 (Internal Server Error)
Popular Questions
- How do I undo the most recent local commits in Git?
- How can I remove a specific item from an array in JavaScript?
- How do I delete a Git branch locally and remotely?
- Find all files containing a specific text (string) on Linux?
- How do I revert a Git repository to a previous commit?
- How do I create an HTML button that acts like a link?
- How do I check out a remote Git branch?
- How do I force "git pull" to overwrite local files?
- How do I list all files of a directory?
- How to check whether a string contains a substring in JavaScript?
- How do I redirect to another webpage?
- How can I iterate over rows in a Pandas DataFrame?
- How do I convert a String to an int in Java?
- Does Python have a string 'contains' substring method?
- How do I check if a string contains a specific word?
Values of features in Vowpal Wabbit can only be real numbers. If you have a categorical feature with n possible values you simply represent it as n binary features (so e.g.
color=redis a name of a binary feature and its value is 1 by default).If you have a text description you can use the individual words of the text as features (i.e. feature names). You only need to escape ":", "|" and whitespace characters in feature names, all other characters are allowed (including "="). So an example can look like
9 |USER avg_time:11 |SUMMARY words:5 sentences:1 |TEXT I have a big problemSo this ticket with text "I have a big problem" took 9 minutes to resolve and previous tickets from the same user took on average 11 minutes to resolve. If you have enough training examples, I would recommend to add many more features (any details about the user, more summary features about the text etc). Also the time of day (morning, afternoon, evening) and day of week when the ticket was reported may be a good predictor (tickets reported on Friday evening tend to take longer), but maybe you intentionally don't want to model this and focus only on the "difficulty" of the ticket irrelevant of reporting time.
You can also try using word bigrams as features with
--ngram T2, which means that 2-grams features will be created for all namespaces beginning with T (only TEXT namespace in my example). Maybe the individual words "big" and "problem" are not strong predictors, but the bigram "big problem" will get a high positive weight (indicating it is a good predictor of long resolution time).You mean resolution time and text of the ticket, am I right? But the resolution time is the (dependent) variable you want to predict, so this does not count as a feature (aka independent variable). Of course, if you know the identity of the user and have enough training examples for each user, you can include the average time of previous tickets (excluding the current one, of course) of the user as a feature as I tried to show in the example.