Support Vector Machine (SVM)
Meaning: SVM is a supervised learning model used for classification and regression by finding the hyperplane that best separates different classes in the feature space.
Requirements for SVM Use:
Condition | Suitable for | Note |
Data Type | both linear and non-linear data. | Normalize data to improve performance |
Data Size | smaller to medium-sized datasets. | 300 to 10,000 samples |
Feature Engineering | Requires careful feature scaling and selection. | – |
Computational Resources | Both GPU and CPU | – |
Supervision/unsupervised learning | Supervised | |
Libraries | scikit-learn, libsvm, LIBLINEAR, TensorFlow
PyTorch |
scikit-learn is Most popular for SVM |
SVM Evaluation Metrics:
- Accuracy: Measures the ratio of correctly predicted instances to the total instances.
- Precision: Ratio of true positive predictions to the total predicted positives.
- Recall: Ratio of true positive predictions to the total actual positives.
- F1 Score: Harmonic mean of precision and recall.
Pros and Cons of SVM:
Pros | Cons | Notes |
High accuracy | Sensitive to feature scaling | Important to preprocess data |
Effective in high-dimensional spaces | Can be computationally intensive for large datasets | Kernel trick can help |
Comparison: SVM vs. Logistic Regression
Aspect | SVM | Logistic Regression |
Data Type | Both linear and non-linear | Linear |
Data Size | Small to Medium | Small to Medium |
Feature Engineering | Requires careful scaling and selection | Requires feature scaling |
Computational Resources | Can be intensive for large datasets | Computationally efficient |
Accuracy | High | Moderate to high |
Handling of Imbalanced Data | May struggle | Can handle with proper techniques |
Use Case | Classification and regression | Primarily classification |
Best for | High-dimensional spaces | Simpler and interpretable models |
SVM FORMULA
Linear SVM: Finding the hyperplane that best separates the classes in the feature space.
Hyperplane Equation:
Examples of SVM Projects
- Spam Email Detection
- Use labeled email datasets to classify emails as spam or not.
- Sentiment Analysis
- Analyze text data to determine the sentiment (positive, negative, neutral).
- Handwritten Digit Recognition
- Classify digits from images using the MNIST dataset.
- Face Detection
- Detect faces in images using labeled face datasets.
- Breast Cancer Prediction
- Predict breast cancer from medical data using the UCI ML Breast Cancer Wisconsin dataset.
- Image Classification
- Classify images into different categories using datasets like CIFAR-10.
- Voice Recognition
- Classify spoken words using labeled audio datasets.
- Credit Card Fraud Detection
- Detect fraudulent transactions using credit card transaction data.
- Stock Market Prediction
- Predict stock prices or trends using historical stock market data.
- Customer Churn Prediction
- Predict whether a customer will churn (leave) based on historical data.
Resources:
- Book: “Pattern Recognition and Machine Learning” by Christopher M. Bishop.
- Course: “Machine Learning” by Stanford University on Coursera.