What are the different statistical tests used for feature selection in machine learning?

Feature Type	Test Name	Description	Use Case
Numerical	Pearson's Correlation Coefficient	Determines the strength and direction of linear relationships between numerical variables. High absolute values indicate strong correlations.	Measure linear correlation
Numerical	Mutual Information	Measures the amount of information gained about one variable by observing another. Unlike correlation, it also captures non-linear dependence, and a value of 0 indicates that the variables are independent.	Measure dependence between variables
Numerical	ANOVA	Tests whether the mean of a numerical feature differs across multiple groups, such as the classes of a categorical target. Helpful for selecting numerical features whose group means differ significantly.	Compare means between multiple groups
Numerical	t-test	Assesses whether the means of two groups are statistically different. Useful for binary classification tasks.	Compare means between two groups
Categorical	Chi-Square Test	Determines if two categorical variables are independent or related. Useful for feature selection with categorical data.	Test independence of categorical variables
Categorical	Fisher's Exact Test	Tests the association between two categorical variables in 2x2 contingency tables. Applicable when sample sizes are small.	Test independence in 2x2 contingency tables
Any	Gini Importance	Measures the total reduction in Gini impurity contributed by splits on a feature in decision tree algorithms. Higher values indicate more important features. It is a model-based score rather than a statistical test.	Assess feature importance in decision trees
Categorical	Information Gain	Calculates the reduction in entropy (uncertainty) achieved by using a feature to split data in decision trees or random forests.	Measure reduction in entropy
Categorical	Cramér's V	Quantifies the association between two categorical variables in contingency tables. Values range from 0 (no association) to 1 (complete association).	Measure association in contingency tables
Ordinal	Kendall's Tau and Spearman's Rank Correlation	Evaluate the strength and direction of monotonic relationships between ordinal or ranked data. Useful when the relationship is monotonic but not linear, or when the data is not normally distributed.	Measure rank correlation
Numerical	Point-Biserial Correlation	Assesses the relationship between a binary target variable and a continuous feature. It is equivalent to Pearson's correlation with the binary variable coded as 0/1, and helps identify features with strong associations.	Measure correlation with binary target
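
As a rough illustration of the numerical-feature rows, the sketch below scores two synthetic features against a binary target with the ANOVA F-test, a two-sample t-test, mutual information, and the point-biserial correlation. It assumes scikit-learn and SciPy are installed; the data and the names `informative` and `noise` are made up for the example and do not come from any real dataset.

```python
import numpy as np
from scipy.stats import pointbiserialr, ttest_ind
from sklearn.feature_selection import f_classif, mutual_info_classif

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 500
y = rng.integers(0, 2, size=n)               # binary target
informative = y + rng.normal(0.0, 0.5, n)    # mean shifts with the class
noise = rng.normal(0.0, 1.0, n)              # unrelated to the target
X = np.column_stack([informative, noise])

# ANOVA F-test: one F statistic and one p-value per feature.
f_stat, p_val = f_classif(X, y)

# Two-sample t-test (pooled variance) on the first feature.
# With exactly two groups, the ANOVA F statistic equals t squared.
t_stat, p_t = ttest_ind(informative[y == 0], informative[y == 1])

# Mutual information between each feature and the target (estimated, in nats).
mi = mutual_info_classif(X, y, random_state=0)

# Point-biserial correlation: binary variable first, continuous variable second.
r_pb, p_pb = pointbiserialr(y, informative)

print("F statistics:      ", f_stat)
print("ANOVA p-values:    ", p_val)
print("t^2 (feature 0):   ", t_stat ** 2)
print("Mutual information:", mi)
print("Point-biserial r:  ", r_pb)
```

In this setup the first feature should receive a much larger F statistic and mutual information score than the second, which is the kind of ranking a filter-based selector would use to keep or drop features.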

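The categorical rows can be sketched in a similar way. The example below runs the chi-square test and Fisher's exact test on a hypothetical 2x2 contingency table and derives Cramér's V from the chi-square statistic using V = sqrt(chi2 / (n * (min(rows, cols) - 1))). The counts are invented for illustration, and SciPy is assumed to be installed.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical counts: rows are feature categories, columns are target classes.
table = np.array([[30, 10],
                  [15, 25]])

# Chi-square test of independence (Yates' continuity correction disabled
# so that the statistic matches the textbook formula).
chi2, p_chi2, dof, expected = chi2_contingency(table, correction=False)

# Cramér's V: chi-square rescaled to the range 0 (no association) to 1.
n = table.sum()
k = min(table.shape) - 1
cramers_v = np.sqrt(chi2 / (n * k))

# Fisher's exact test, defined for 2x2 tables and suited to small samples.
odds_ratio, p_fisher = fisher_exact(table)

print("chi2:", chi2, "p:", p_chi2, "dof:", dof)
print("Cramér's V:", cramers_v)
print("Fisher odds ratio:", odds_ratio, "p:", p_fisher)
```

A small p-value from either test suggests that the feature and the target are not independent, while Cramér's V indicates how strong the association is, which a p-value alone does not convey.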