Feature Type | Test Name | Description | Use Case
Numerical | Pearson's Correlation Coefficient | Measures the strength and direction of the linear relationship between two numerical variables. Values range from -1 to 1; absolute values near 1 indicate a strong linear correlation. | Measure linear correlation
Numerical | Mutual Information | Measures the amount of information gained about one variable by observing another. Captures nonlinear as well as linear dependence, which makes it useful for feature selection with numerical data. | Measure dependence between variables
Numerical | ANOVA | Tests whether the mean of a numerical feature differs across multiple groups, such as the classes of a categorical target. Helpful for selecting numerical features whose group means differ significantly. | Compare means between multiple groups
Numerical | t-test | Assesses whether the means of two groups are statistically different. Useful for binary classification tasks. | Compare means between two groups |
Categorical | Chi-Square Test | Determines if two categorical variables are independent or related. Useful for feature selection with categorical data. | Test independence of categorical variables |
Categorical | Fisher's Exact Test | Tests the association between two categorical variables in 2x2 contingency tables. Preferred when sample sizes are too small for the chi-square approximation to be reliable. | Test independence in 2x2 contingency tables
Categorical | Gini Importance | Measures the total reduction in Gini impurity contributed by the splits made on a feature in decision tree algorithms. Higher values indicate more important features. | Assess feature importance in decision trees
Categorical | Information Gain | Calculates the reduction in entropy (uncertainty) achieved by using a feature to split data in decision trees or random forests. | Measure reduction in entropy |
Categorical | Cramér's V | Quantifies the association between two categorical variables in contingency tables. Values range from 0 (no association) to 1 (complete association). | Measure association in contingency tables |
Categorical | Kendall's Tau and Spearman's Rank Correlation | Evaluate the strength and direction of monotonic relationships between ordinal or ranked data. Useful when data is not normally distributed. | Measure rank correlation |
Categorical | Point-Biserial Correlation | Assesses the relationship between a binary target variable and a continuous feature; it is equivalent to Pearson's correlation with the binary variable coded as 0 and 1. Helps identify features strongly associated with the target. | Measure correlation with binary target
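
To make two of the categorical measures in the table concrete, here is a minimal sketch of the chi-square statistic, Cramér's V, and information gain, written with the Python standard library only. The function names and the toy data (`target`, `perfect`, `useless`) are hypothetical and chosen for illustration; in practice a library such as SciPy or scikit-learn would normally be used, and this sketch computes only the statistics, not p-values.

```python
import math
from collections import Counter


def contingency_table(xs, ys):
    """Count co-occurrences of two categorical sequences as a nested list."""
    rows = sorted(set(xs))
    cols = sorted(set(ys))
    counts = Counter(zip(xs, ys))
    return [[counts[(r, c)] for c in cols] for r in rows]


def chi_square_statistic(table):
    """Pearson chi-square statistic, without continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1))).

    Assumes both variables take at least two distinct values.
    """
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi_square_statistic(table) / (n * (k - 1)))


def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, target):
    """Entropy of the target minus its weighted entropy after splitting on the feature."""
    n = len(target)
    remainder = 0.0
    for value in set(feature):
        subset = [t for f, t in zip(feature, target) if f == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(target) - remainder


# Hypothetical toy data: one feature that fully determines the target,
# and one that is independent of it.
target = ["yes", "yes", "no", "no", "yes", "yes", "no", "no"]
perfect = ["a", "a", "b", "b", "a", "a", "b", "b"]
useless = ["x", "y", "x", "y", "x", "y", "x", "y"]

for name, feature in [("perfect", perfect), ("useless", useless)]:
    v = cramers_v(contingency_table(feature, target))
    ig = information_gain(feature, target)
    print(f"{name}: Cramér's V = {v:.2f}, information gain = {ig:.2f} bits")
```

On this toy data the feature that determines the target scores 1 on both measures and the independent feature scores 0, matching the ranges described in the table.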