Recently, I've been interested in data analysis, so I researched how to do a machine-learning project and tried one myself.
I learned that scaling is important when handling features, so I scaled every feature even while using tree models like decision trees or LightGBM.
The results were worse when I scaled, though.
I searched the Internet, but all I found is that tree and ensemble algorithms are not sensitive to the variance of the data.
I also bought the O'Reilly book "Hands-On Machine Learning", but I couldn't find a sufficient explanation there.
Can I get a more detailed explanation of this?
Why don't tree- and ensemble-based algorithms need feature scaling?
650 views · Asked by yoon-seul

There are 2 best solutions below.

Answer by MikkiPython:
Do not confuse trees with ensembles (which may consist of models that do need scaling). Trees do not need scaled features, because at each node the entire set of observations is split on the value of a single feature: roughly speaking, everything less than some threshold goes to the left, and everything greater goes to the right. What difference, then, does the chosen scale make?
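A quick way to see this empirically is to fit the same tree on raw and standardized copies of the data and compare predictions. This is a minimal sketch using scikit-learn; the dataset and seeds are my own arbitrary choices, not from the answer:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# A toy dataset with features on very different scales.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One tree on the raw features.
tree_raw = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# An identical tree on standardized features.
scaler = StandardScaler().fit(X_train)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(
    scaler.transform(X_train), y_train
)

# Scaling only shifts the split thresholds, so the predictions
# should come out the same (barring floating-point ties).
print(np.array_equal(
    tree_raw.predict(X_test),
    tree_scaled.predict(scaler.transform(X_test)),
))
```

Because each split only asks "is this feature below some threshold?", any monotonic rescaling just moves the threshold; it never changes which observations go left or right.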
Another answer:
Though I don't know the exact notation and equations, the answer has to do with the Big O notation of the algorithms.
Big O notation is a way of expressing the theoretical worst-case time for an algorithm to complete over extremely large data sets. For example, a simple loop that goes over every item in a one-dimensional array of size n has an O(n) run time, which is to say its running time always grows in proportion to the size of the array.
Say you have a two-dimensional array of x, y coordinates and you loop across every potential combination of x/y locations, where x has size n and y has size m; then your Big O would be O(mn),
and so on. Big O is used to compare the relative speed of different algorithms in the abstract, so that you can try to determine which one is better to use.
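For concreteness, those two loops might look like this in Python (a hypothetical sketch; the arrays are made up for illustration):

```python
# O(n): one pass over a one-dimensional array of size n.
def sum_items(items):
    total = 0
    for item in items:          # runs n times
        total += item
    return total

# O(mn): every combination of x/y locations in a 2-D grid.
def count_cells(grid):
    count = 0
    for row in grid:            # runs m times
        for _cell in row:       # runs n times per row
            count += 1
    return count

print(sum_items([3, 1, 4, 1, 5]))        # time proportional to n
print(count_cells([[0, 1], [2, 3]]))     # time proportional to m * n
```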
If you graph O(n) over different potential sizes of n, you end up with a straight line.
As you get into more complex algorithms you can end up with O(n^2), O(log n), or something more complex. Generally, though, most algorithms fall into O(n), O(n^k) for some exponent k, O(log n), or O(sqrt(n)); there are obviously others, but most fall into these classes with some coefficient in front or behind that shifts where they sit on the graph. If you graph each of those curves, you'll see very quickly which ones are better for extremely large data sets, as sketched below.
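If you want to draw those curves yourself, a small matplotlib sketch (my own illustration, not from the answer) shows the comparison at a glance:

```python
import numpy as np
import matplotlib.pyplot as plt

n = np.linspace(1, 1_000, 500)

# The common growth classes mentioned above.
plt.plot(n, n, label="O(n)")
plt.plot(n, n ** 2, label="O(n^2)")
plt.plot(n, np.log(n), label="O(log n)")
plt.plot(n, np.sqrt(n), label="O(sqrt n)")

plt.xlabel("n")
plt.ylabel("operations")
plt.yscale("log")   # log scale so O(n^2) doesn't flatten the others
plt.legend()
plt.show()
```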
It would entirely depend on how well your algorithm is coded, but it might look something like this (don't trust me on this math; I tried to start doing it and then just googled it): fitting a decision tree over n samples is commonly cited as roughly O(n log n) per feature, while a lookup through the m levels of a balanced tree of depth m is about O(m), i.e. O(log n).
And a log n graph... well, it pretty much doesn't change at all even with sufficiently large values of n, does it?
So it doesn't matter how big your data set is: these algorithms are very efficient at what they do, and their cost barely climbs with n, because of the nature of a log curve on a graph (the worst increase in time per additional n is at the very beginning; then it levels off, with only extremely minor increases in time as n keeps growing).
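You can check how flat that log curve is with a couple of lines (again my own illustration, not part of the answer):

```python
import math

# n grows a billion-fold, but log2(n) barely moves.
for n in (10 ** 3, 10 ** 6, 10 ** 9, 10 ** 12):
    print(f"n = {n:>16,}   log2(n) = {math.log2(n):5.1f}")
# log2 climbs only from about 10 to about 40.
```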