AI / ML
ML Basics
Supervised, unsupervised, neural networks, and evaluation metrics — your machine learning starter reference.
01 ML Fundamentals
Supervised
Labeled data. Input+output pairs. Learns to predict.
Unsupervised
No labels. Finds patterns/clusters.
Reinforcement
Agent learns from rewards/penalties.
Overfitting
Model memorizes training data, fails on new data. Fix: more data, regularization, dropout.
Underfitting
Model too simple, misses patterns. Fix: add features or use a more complex model.
Bias-Variance
High bias leads to underfitting. High variance leads to overfitting. Goal: balance both.
💡 Bias-Variance Tradeoff: simple models have high bias, complex models have high variance. Cross-validation helps find the sweet spot.
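The tradeoff above can be sketched with a tiny k-nearest-neighbours regressor, where a small k behaves like a complex model and a large k like a simple one. This is a minimal illustration, not a reference implementation: the data (y = x^2 plus noise), the helper names `knn_predict`, `mse`, and `cross_val_mse`, and the interleaved fold scheme are all assumptions made for the example.

```python
import random

def knn_predict(train, x, k):
    """Predict by averaging the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, points, k):
    """Mean squared error of a k-NN model built from `train`, scored on `points`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in points) / len(points)

def cross_val_mse(data, k, folds=5):
    """Average held-out MSE over `folds` interleaved folds."""
    scores = []
    for f in range(folds):
        held_out = data[f::folds]
        rest = [p for i, p in enumerate(data) if i % folds != f]
        scores.append(mse(rest, held_out, k))
    return sum(scores) / folds

random.seed(0)
# Hypothetical noisy data: y = x^2 plus Gaussian noise
data = [(i / 40, (i / 40) ** 2 + random.gauss(0, 0.1)) for i in range(40)]

# k=1 memorizes the training set; k=32 averages every point in a training fold
for k in (1, 5, 32):
    train_error = mse(data, data, k)
    cv_error = cross_val_mse(data, k)
    print(f"k={k:2d}  train MSE={train_error:.4f}  cross-val MSE={cv_error:.4f}")
```

With k=1 the training error is exactly zero because each point is its own nearest neighbour, which is why training error alone cannot reveal overfitting. The cross-validated error is the number to compare across values of k.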
02 Supervised Algorithms
| Algorithm | Type | Best for |
|---|---|---|
| Linear Regression | Regression | Predicting continuous values |
| Logistic Regression | Classification | Binary classification |
| Decision Tree | Both | Interpretable models |
| Random Forest | Both | High accuracy, less overfitting |
| SVM | Classification | High-dimensional data |
| KNN | Both | Small datasets, simple |
| Neural Network | Both | Complex patterns, large data |
| Gradient Boosting | Both | Competitions, tabular data |
ML: Model evaluation
# Regression metrics
MAE  = mean(|actual - predicted|)
MSE  = mean((actual - predicted)^2)
RMSE = sqrt(MSE)
R^2  = 1 - SS_res/SS_tot
# Classification metrics
Accuracy  = correct/total
Precision = TP/(TP+FP)
Recall    = TP/(TP+FN)
F1        = 2*(Precision*Recall)/(Precision+Recall)
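As a worked example of the regression formulas above, the following sketch computes them in plain Python. The function name `regression_metrics` and the sample values are made up for illustration.

```python
import math

def regression_metrics(actual, predicted):
    """Compute MAE, MSE, RMSE and R^2 from two equal-length lists of numbers."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": 1 - ss_res / ss_tot}

# Hypothetical targets and predictions
metrics = regression_metrics([3.0, 5.0, 2.0, 7.0], [2.0, 5.0, 4.0, 8.0])
print(metrics)
```

The errors here are 1, 0, -2, and -1, so MAE is 4/4 = 1.0 and MSE is 6/4 = 1.5. Note that R^2 is undefined when every actual value is the same, because SS_tot is then zero.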
03 Neural Networks
Perceptron
Single neuron. Weighted sum of inputs plus bias, passed through an activation function.
Activation functions
ReLU: max(0,x). Sigmoid: 1/(1+e^-x). Tanh: (e^x-e^-x)/(e^x+e^-x).
Forward pass
Data flows input->hidden->output.
Backpropagation
Error flows backward. Gradients calculated via chain rule.
Gradient descent
Weights updated: w = w - learning_rate * gradient.
Epochs
One full pass through training data.
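The terms above fit together in a short training loop. The sketch below trains a single sigmoid neuron with full-batch gradient descent, so one epoch is one weight update. The toy OR-gate data, the squared-error loss, and the names `mean_loss` and `train` are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy data (an assumption for this example): logical OR of two inputs
data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]

def mean_loss(w, b):
    """Mean squared error of the neuron over the whole dataset."""
    return sum((sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y) ** 2
               for x, y in data) / len(data)

def train(epochs=2000, learning_rate=0.5):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):                      # one epoch = one full pass over the data
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in data:
            z = w[0] * x[0] + w[1] * x[1] + b    # forward pass
            a = sigmoid(z)
            # Backpropagation via the chain rule: dL/dz = dL/da * da/dz
            dz = 2.0 * (a - y) * a * (1.0 - a)
            grad_w[0] += dz * x[0] / len(data)   # dz/dw_i = x_i
            grad_w[1] += dz * x[1] / len(data)
            grad_b += dz / len(data)             # dz/db = 1
        # Gradient descent update: w = w - learning_rate * gradient
        w = [w[0] - learning_rate * grad_w[0], w[1] - learning_rate * grad_w[1]]
        b -= learning_rate * grad_b
    return w, b

initial_loss = mean_loss([0.0, 0.0], 0.0)
w, b = train()
final_loss = mean_loss(w, b)
print(f"loss before: {initial_loss:.4f}  after: {final_loss:.4f}")
for x, y in data:
    print(x, y, round(sigmoid(w[0] * x[0] + w[1] * x[1] + b), 3))
```

Real networks repeat the same chain-rule step layer by layer and usually update on mini-batches rather than the full dataset.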
ML: Simple neural net concept
Input layer -> Hidden layers -> Output layer
Each neuron: z = w1*x1 + w2*x2 + b
Activation: a = ReLU(z) = max(0, z)
Loss function: measures prediction error
Optimizer: Adam, SGD adjust weights to minimize loss
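The layered flow and the per-neuron formula above can be sketched as a forward pass through a two-input network with one hidden layer. The weights are hand-picked, hypothetical values, and `dense` and `forward` are illustrative helper names.

```python
def relu(z):
    return max(0.0, z)

def identity(z):
    return z

def dense(inputs, weights, biases, activation):
    """One layer: each neuron computes activation(sum(w_i * x_i) + b)."""
    return [activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
            for neuron_weights, b in zip(weights, biases)]

# Hypothetical parameters: 2 inputs -> 2 hidden neurons (ReLU) -> 1 output
hidden_weights = [[1.0, -1.0], [0.5, 0.5]]
hidden_biases = [0.0, -1.0]
output_weights = [[1.0, 2.0]]
output_biases = [0.5]

def forward(x):
    hidden = dense(x, hidden_weights, hidden_biases, relu)
    return dense(hidden, output_weights, output_biases, identity)

print(forward([2.0, 1.0]))
print(forward([0.0, 3.0]))
```

For the input [2.0, 1.0] the hidden activations are 1.0 and 0.5, so the output is 1.0*1.0 + 2.0*0.5 + 0.5 = 2.5. For [0.0, 3.0] the first hidden neuron computes -3.0, which ReLU clips to 0.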
04 Model Evaluation
ML: Train/Test split
# Split data (X, y: your feature matrix and labels)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Cross-validation (more reliable than a single split); model is any scikit-learn estimator
from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, cv=5)
# Confusion matrix
           Predicted +   Predicted -
Actual +   TP            FN
Actual -   FP            TN
❓ Quiz
What does overfitting mean?
Overfitting: model learns training data too well including noise, so it fails to generalize to unseen data. Fix: regularization, more data, simpler model.
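The confusion-matrix cells in this section feed directly into the classification metrics. The sketch below counts the cells from label lists and derives the metrics. The labels are invented, and the names `confusion_counts` and `classification_metrics` are illustrative. Returning 0.0 when a denominator is zero is a convention assumed here, not a universal rule.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels: 1 = positive class, 0 = negative class
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
report = classification_metrics(*counts)
print(counts, report)
```

Here there are 3 true positives, 1 false positive, 1 false negative, and 3 true negatives, so accuracy, precision, recall, and F1 all come out to 0.75.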
05 Feature Engineering
Normalization
Scale each feature to the [0, 1] range, e.g. with MinMaxScaler.
Standardization
Mean=0, std=1. Z-score scaling.
One-hot encoding
Convert categorical to binary columns.
Feature selection
Remove irrelevant features. Reduces overfitting.
PCA
Dimensionality reduction. Projects data onto the directions that capture the most variance.
Missing values
Drop, mean/median impute, or predict.
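To make the scaling and encoding entries above concrete, here is a minimal plain-Python sketch of min-max normalization, z-score standardization, and one-hot encoding. The function names and sample values are illustrative. The population standard deviation is used to match the "std=1" definition, and a constant column, which would cause division by zero, is assumed not to occur.

```python
import statistics

def min_max_scale(values):
    """Normalization: map values linearly onto the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Standardization: subtract the mean, divide by the population std."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def one_hot(labels):
    """One-hot encoding: one binary column per distinct category."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows

feature = [2.0, 4.0, 6.0, 10.0]          # hypothetical numeric feature
print(min_max_scale(feature))
print(standardize(feature))
print(one_hot(["red", "green", "red"]))  # hypothetical categorical feature
```

Min-max scaling sends 2.0 to 0.0 and 10.0 to 1.0, with 4.0 and 6.0 landing at 0.25 and 0.5. The standardized values have mean 0 and standard deviation 1 up to floating-point rounding.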
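ML: Preprocessing
import pandas as pd
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_train)   # fit on training data only
X_test_scaled = scaler.transform(X_test)   # reuse the training mean and std
# One-hot encoding (df: a DataFrame with a "category" column)
df_encoded = pd.get_dummies(df, columns=["category"])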
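The missing-values entry in this section lists dropping and median imputation as options. A minimal sketch of both follows, assuming missing entries are represented as `None`. The helper names `drop_missing` and `impute_median` are hypothetical.

```python
import statistics

def drop_missing(values):
    """Drop strategy: keep only the observed entries."""
    return [v for v in values if v is not None]

def impute_median(values):
    """Impute strategy: replace each missing entry with the median of the observed ones."""
    fill = statistics.median(drop_missing(values))
    return [fill if v is None else v for v in values]

column = [1.0, None, 3.0, 10.0]   # hypothetical feature with one missing value
print(drop_missing(column))
print(impute_median(column))
```

The median of the observed values 1.0, 3.0, and 10.0 is 3.0, so the missing entry becomes 3.0. The median is less sensitive to the outlier 10.0 than the mean, which would fill in about 4.67.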