I'm interested in using logistic regression to distinguish opera singing (n=100 audio files) from non-opera singing (n=300 audio files) (just an example). I have multiple features that I can use (e.g. MFCC, pitch, signal energy). I would like to use PCA to reduce dimensionality, which will drop the 'least important variables'. My question is: should I do my PCA on my whole dataset (both opera and non-opera)? If I do, wouldn't this drop the 'least important variables' for both opera and non-opera, rather than the variables least important for identifying opera?
How to use principal component analysis for logistic regression
Asked by sos.cott
Short answer:
You must do your PCA on the whole data.
Not so short answer:
Long answer:
PCA does not remove the 'least important variables'. PCA is a dimensionality reduction algorithm that finds linear combinations of the input features that retain as much of the information (inertia, i.e. variance) as possible using fewer coordinates.
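To make this concrete, here is a minimal sketch using scikit-learn. The data is synthetic and only stands in for real audio features; the feature count (20), the number of components (5) and the class means are assumptions chosen for illustration. A single PCA is fitted on the pooled opera and non-opera samples and feeds a logistic regression, and `components_` shows that every retained component is a weighted mix of all input features rather than a subset of them.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the audio features (hypothetical: 20 features per file).
n_opera, n_other, n_feats = 100, 300, 20
X_opera = rng.normal(loc=0.5, scale=1.0, size=(n_opera, n_feats))
X_other = rng.normal(loc=0.0, scale=1.0, size=(n_other, n_feats))

# Pool both classes: PCA never sees the labels, only the features.
X = np.vstack([X_opera, X_other])
y = np.concatenate([np.ones(n_opera, dtype=int), np.zeros(n_other, dtype=int)])

# Scale the features, project onto 5 components, then classify.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=5),
    LogisticRegression(max_iter=1000),
)
model.fit(X, y)

# Each row of components_ holds one weight per original feature.
pca = model.named_steps["pca"]
print(pca.components_.shape)  # (5, 20)
print(model.predict(X[:3]))
```

Keeping the PCA step inside the pipeline also means that, under cross-validation, the projection is refitted on each training fold instead of leaking information from held-out files.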
So if your data has N_Feats features, you can think of PCA as a matrix of dimension N_Feats x Projection_size, where Projection_size < N_Feats, that you multiply your data by to get a projection of lower dimension. This implies that you need all your features (variables) to compute your projection.
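The matrix view can be sketched with plain NumPy. This is only an illustration on random data: the sizes (400 samples, 6 features, 2 components) are assumptions, and the SVD is one common way to obtain the principal directions, not necessarily what any particular library does internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
N_Samples, N_Feats, Projection_size = 400, 6, 2
X = rng.normal(size=(N_Samples, N_Feats))

# Centre the data; PCA works on mean-centred features.
X_centered = X - X.mean(axis=0)

# Rows of Vt are the principal directions, ordered by decreasing variance.
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Projection matrix of shape (N_Feats, Projection_size).
W = Vt[:Projection_size].T

# One matrix product yields the lower-dimensional projection.
X_projected = X_centered @ W

print(W.shape)            # (6, 2)
print(X_projected.shape)  # (400, 2)
```

Every column of W has one entry per original feature, which is why all features are needed to compute the projection.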
If you think in terms of projections, it doesn't make sense to have a different projection for each class. Why? There are 2 reasons: