Several factors matter when selecting the right ML model for a particular problem:
- Size of the training data
  - If the training data is small, or the dataset has few observations and a large number of features (as in genetics or text data), choose high-bias/low-variance algorithms such as linear regression, Naïve Bayes, or a linear SVM.
  - If the training data is sufficiently large and the number of observations is high relative to the number of features, low-bias/high-variance algorithms such as KNN, decision trees, or a kernel SVM work well (see the first sketch after this list).
- Speed or training time
  - Algorithms like Naïve Bayes, linear regression, and logistic regression are easy to implement and quick to train. Algorithms like SVMs (which involve parameter tuning), neural networks (with long convergence times), and random forests need much more time to train (see the timing sketch after this list).
- Linearity
  - Data is not always linear, so we need algorithms that can handle high-dimensional and complex data structures; examples include kernel SVMs, random forests, and neural networks.
  - The easiest way to check for linearity is to fit a linear model (or run a logistic regression or linear SVM) and inspect the residual errors: large errors suggest the data is not linear and needs a more flexible algorithm (see the residual-check sketch after this list).
- Number of features
  - When the number of features is large relative to the number of observations (again, text or genetics data), consider feature selection or dimensionality reduction, or stick to the high-bias/low-variance algorithms above.
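
As a rough illustration of the data-size point, here is a minimal sketch (assuming scikit-learn is installed; the synthetic dataset shapes and the two models are chosen purely for illustration). It compares a high-bias model (Naïve Bayes) with a high-variance one (KNN) on a small/wide dataset versus a large/narrow one, using cross-validation:

```python
# Sketch: how dataset shape (observations vs features) can influence which
# model family generalises better. Assumes scikit-learn is available.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

datasets = {
    # few observations, many features (genetics/text-like shape)
    "small_wide": make_classification(n_samples=100, n_features=500,
                                      n_informative=20, random_state=0),
    # many observations, few features
    "large_narrow": make_classification(n_samples=20_000, n_features=20,
                                        n_informative=10, random_state=0),
}

models = {"NaiveBayes (high bias)": GaussianNB(),
          "KNN (high variance)": KNeighborsClassifier()}

for data_name, (X, y) in datasets.items():
    for model_name, model in models.items():
        score = cross_val_score(model, X, y, cv=5).mean()
        print(f"{data_name:12s} {model_name:22s} accuracy={score:.3f}")
```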
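
For the speed point, a similarly hedged timing sketch: it measures `fit()` time for a few of the algorithms mentioned above on one synthetic dataset. Absolute numbers depend on hardware and data size; the point is only the relative gap between the quick Naïve Bayes/linear models and kernel SVM or random forest.

```python
# Sketch: rough training-time comparison on the same data (no tuning included).
# Assumes scikit-learn is available.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=5_000, n_features=50, random_state=0)

models = {
    "GaussianNB": GaussianNB(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "SVC (RBF kernel)": SVC(kernel="rbf"),
    "RandomForest": RandomForestClassifier(n_estimators=300),
}

for name, model in models.items():
    start = time.perf_counter()
    model.fit(X, y)  # training only; parameter tuning would add more time
    print(f"{name:20s} fit time: {time.perf_counter() - start:.2f}s")
```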
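
And for the residual-check point, a minimal sketch (assuming scikit-learn and NumPy) that fits a plain linear model to a deliberately non-linear target, inspects the residual errors, and compares them with a random forest's:

```python
# Sketch: the "linearity check" -- fit a linear model, look at the residuals,
# and compare against a non-linear model. Large residuals from the linear fit
# suggest the data needs a more flexible algorithm.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1_000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(1_000)  # clearly non-linear target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (LinearRegression(), RandomForestRegressor(random_state=0)):
    model.fit(X_train, y_train)
    residuals = y_test - model.predict(X_test)
    print(f"{type(model).__name__:22s} mean |residual| = {np.abs(residuals).mean():.3f}")
```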
Ref:
- https://blogs.sas.com/content/subconsciousmusings/2020/12/09/machine-learning-algorithm-use/
- https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html