Machine Learning Guidance
A collection of my Machine Learning and Data Science projects,
- covering both theoretical and practical (applied) ML,
- with references (papers, ebooks, repos, tools, etc.), ranging from beginner to advanced.
Generative vs Discriminative Model
Given the training data set D = {(x_i, y_i) ∣ 1 ≤ i ≤ N}, where y_i is the corresponding output for the input x_i.
| Aspect \ Model | Generative | Discriminative |
| --- | --- | --- |
| Learning objective | Joint probability P(x, y) | Conditional probability P(y ∣ x) |
| Formulation | Class prior P(y) and class-conditional likelihood P(x ∣ y) | Posterior P(y ∣ x), modeled directly |
| Result | P(y ∣ x) obtained indirectly via Bayes' rule | Direct classification |
| Examples | Naive Bayes, HMM | Logistic Regression, SVM, DNN |
Reference: Generative and Discriminative Models, Professor Andrew Ng
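As a minimal sketch of the contrast, assuming scikit-learn is available: GaussianNB fits the class prior P(y) and class-conditional P(x ∣ y) and recovers P(y ∣ x) via Bayes' rule, while LogisticRegression fits P(y ∣ x) directly (the synthetic data and settings below are illustrative).

```python
# Generative vs discriminative classifier on the same synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB            # generative: models P(y), P(x|y)
from sklearn.linear_model import LogisticRegression   # discriminative: models P(y|x)

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gen = GaussianNB().fit(X_tr, y_tr)                        # P(y|x) via Bayes' rule
disc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # P(y|x) fit directly

print("Generative (GaussianNB):", gen.score(X_te, y_te))
print("Discriminative (LogisticRegression):", disc.score(X_te, y_te))
```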
Types of Learning
- Learner L(X) = h ∈ H
  - Input: training data X, where x_i ∈ R^n
- Hypothesis h_ω : X ⊆ R^n → Y, with weights ω
  - maps attribute vectors X to labels/outputs Y = {y_1, ..., y_n}
  - For a NN, h(x) = f(ω; x), explicitly parameterized by ω
  - For a generative model, f : Z → X, where Z is the latent variable
| Output \ Type | Unsupervised | Supervised |
| --- | --- | --- |
| Continuous Y = R | Clustering & Dimensionality Reduction<br>○ SVD<br>○ PCA<br>○ K-means<br>○ GAN ○ VAE ○ Diffusion | Regression<br>○ Linear / Polynomial<br>○ Non-Linear Regression<br>○ Decision Trees<br>○ Random Forest |
| Discrete Y = {Categories} | Association / Feature Analysis<br>○ Apriori<br>○ FP-Growth<br>○ HMM | Classification<br>○ Bayesian ○ SVM<br>○ Logistic Regression ○ Perceptron<br>○ kNN / Trees |
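A tiny sketch of the continuous-output row above, assuming scikit-learn: K-means clusters unlabeled points, while linear regression learns a labeled mapping (the toy data is an illustrative assumption).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Unsupervised: cluster unlabeled points (no y is given)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print("cluster sizes:", np.bincount(labels))

# Supervised regression: learn the mapping x -> y from labeled pairs
x = rng.uniform(0, 10, (100, 1))
y = 3.0 * x.ravel() + rng.normal(0, 0.5, 100)
print("recovered slope (~3):", LinearRegression().fit(x, y).coef_[0])
```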
And more:

| Aspect \ Type | Semi-Supervised | Reinforcement |
| --- | --- | --- |
| Learns from | Partially labelled data | Rewards |
| Methods | ○ Pseudo-labelling, applied iteratively | ○ Q-learning (sketched below)<br>○ Markov Decision Process |
Reinforcement Learning
- The agent is in a state at each timestep
- When an action is performed, the agent moves to a new state and receives a reward
- There is no knowledge in advance of how actions affect either the new state or the reward
Goal
- Value-based V(s)
  - the agent learns the expected long-term return of the current state under policy π
- Policy-based
  - the policy is learned directly, so the action performed in every state helps to gain maximum reward in the future
  - Deterministic: for any state, the same action is produced by the policy π
  - Stochastic: every action has a certain probability
- Model-based
  - build a virtual model of the environment
  - the agent learns to perform in that specific environment
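A minimal tabular Q-learning sketch, as referenced in the table above; the chain environment and all hyperparameters are illustrative assumptions, not from the source.

```python
import random

# Toy chain world: states 0..4, actions 0 (left) / 1 (right);
# reaching state 4 gives reward 1 and ends the episode.
N_STATES, ACTIONS = 5, (0, 1)
alpha, gamma, eps = 0.1, 0.9, 0.1   # learning rate, discount, exploration rate
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, reward, s2 == N_STATES - 1

random.seed(0)
for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy: explore with probability eps, else act greedily
        a = random.choice(ACTIONS) if random.random() < eps else max(ACTIONS, key=lambda i: Q[s][i])
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print(Q)  # the "right" action should dominate in every state
```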
Geometric Deep Learning
The bigger picture of learning with invariances and symmetries:
| Domain | Structure | Symmetry / Bias | Example |
| --- | --- | --- | --- |
| Images | 2D grid | Translation equivariant | CNNs |
| Sequences | 1D sequence | Order-aware | RNNs, Transformers |
| Sets / Point Clouds | Unordered set | Permutation invariant | Deep Sets, PointNet |
| Graphs | Nodes + edges | Permutation equivariant | GNNs, Graph Isomorphism Networks |
| Manifolds / Spheres | 2D surface embedded in 3D | Rotation equivariant | Spherical CNNs |
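A minimal NumPy sketch of the permutation-invariance row (Deep Sets style); the weights and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W_phi = rng.normal(size=(3, 8))   # per-element encoder weights (illustrative)
W_rho = rng.normal(size=(8, 1))   # readout weights applied after pooling

def deep_set(X):
    """Deep Sets style: f(X) = rho(sum_i phi(x_i)).
    Summing over elements makes the output permutation invariant."""
    H = np.tanh(X @ W_phi)                 # phi: encode each element independently
    return np.tanh(H.sum(axis=0) @ W_rho)  # rho: applied after invariant pooling

X = rng.normal(size=(5, 3))        # a set of 5 points in R^3
perm = rng.permutation(5)
print(np.allclose(deep_set(X), deep_set(X[perm])))  # True: order does not matter
```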
Feature Engineering
- Feature Selection
  - After fitting, plot residuals vs. any predictor variable
  - Check for linearly-dependent feature vectors
- Imputation
- Handling Outliers
  - Removal, replacing values, capping, discretization
- Encoding
  - Integer Encoding
  - One-Hot Encoding (enum -> binary)
- Scaling
  - Normalization: min-max, to [0, 1]
  - Standardization: zero mean, unit variance
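A small sketch of the encoding and scaling steps, assuming scikit-learn >= 1.2 (for the sparse_output argument); the toy columns are illustrative.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler, StandardScaler

colors = np.array([["red"], ["green"], ["blue"], ["green"]])
print(OneHotEncoder(sparse_output=False).fit_transform(colors))  # enum -> binary columns

x = np.array([[1.0], [5.0], [10.0]])
print(MinMaxScaler().fit_transform(x))    # normalization: rescale to [0, 1]
print(StandardScaler().fit_transform(x))  # standardization: zero mean, unit variance
```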
Inference
| Aspect | Bayesianism | Frequentism |
| --- | --- | --- |
| Interpretation of probability | A measure of belief or uncertainty | The limit of relative frequencies in repeated experiments |
| Methods | Prior knowledge plus belief updates (Bayes' rule) to obtain posterior distributions | Hypothesis testing, MLE, confidence intervals |
| What is treated as random | Parameters | Data set |
| Handling of data | Useful when prior information is available or when the focus is on prediction intervals | Often requires larger sample sizes |
| Flexibility | Flexible modeling; models can be updated as new data arrives | More rigid; relies on specific statistical methods |
| Computational complexity | Can be computationally intensive, e.g. for models with high-dimensional parameter spaces | Simpler computation; often more straightforward in practice |
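A coin-flip sketch of the contrast, assuming SciPy: the frequentist MLE is a point estimate of the bias, while the Bayesian posterior (Beta-Bernoulli conjugacy, with an assumed Beta(2, 2) prior) is a full distribution over it.

```python
from scipy import stats

heads, n = 7, 10                  # observed data: 7 heads in 10 flips

# Frequentist: maximum-likelihood point estimate of the coin's bias
print(f"MLE: {heads / n:.2f}")

# Bayesian: Beta(2, 2) prior (an assumption) conjugate-updated to a Beta posterior
a0, b0 = 2, 2
posterior = stats.beta(a0 + heads, b0 + (n - heads))
print(f"Posterior mean: {posterior.mean():.2f}")
print(f"95% credible interval: {posterior.interval(0.95)}")
```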
Empiricism
Applied ML Best Practice
DNN Troubleshooting
Basic
- Initial test set + a single metric to improve
- Target performance
  - Human-level performance, published results, previous baselines, etc.
Intuition
- Results can be sensitive to small changes in hyperparameters and dataset makeup.
The troubleshooting loop:

Start simple -> Implement & Debug -> Evaluate -> met the target?
If not, tune hyperparameters or improve model & data, then re-evaluate.
- Start simple: the simplest model & data possible (e.g. LeNet on a subset of the data)
- Implement & Debug: once the model runs, overfit a single batch & reproduce a known result (see the sketch after this list)
- Evaluate: apply the bias-variance decomposition
- Tune: coarse-to-fine random search over hyperparameters
- Improve model/data
  - Make the model bigger if it underfits
  - Add data or regularize if it overfits
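A minimal PyTorch sketch of the overfit-a-single-batch debugging step referenced above; the model, shapes, and hyperparameters are illustrative assumptions. If the loss does not drive toward zero on one fixed batch, suspect a bug.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))  # toy model
x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))                 # one fixed batch
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

# Should be near zero; otherwise check the model/loss/optimizer wiring.
print(f"final loss on the single batch: {loss.item():.4f}")
```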
Figures: DNN troubleshooting decision chart (OpenAI talk); DNN improvement directions.
My Projects
Machine Learning and Real-world Data, University of Cambridge IA
- Text Classification; Naive Bayes; Cross-Validation
- HMM; Social Networks
Theoretical Machine Learning with Problem Sets, Stanford CS229
- Linear classifiers (Logistic Regression, GDA), SVM, etc.
- Stochastic Gradient Descent; L1/L2 Regularization
Deep Learning for Computer Vision with Problem Sets, Stanford CS231n
- Image Classification + Localization (x, y, w, h)
[ Supervised Learning, Discrete label + Regression ]
- kNN; Softmax classifier; SVM classifier; CNN
- Object Detection
- Semantic / Instance Segmentation
- Image Captioning
- RNN, Attention, Transformer
- Positional Encoding
- Video understanding
- Generative model (GAN, VAE)
- Self-Supervised Learning
See more: Visual Computing
More
- Data Science | Uni. of Cambridge, Undergraduate course.
- AI | Uni. of Cambridge, IB
  - Search, Games, CSPs, Knowledge Representation and Reasoning, Planning, NN.
- Machine Learning and Bayesian Inference | Uni. of Cambridge, Undergraduate course.
  - Linear classifiers (SVM), Unsupervised learning (K-means, EM), Bayesian networks
- Geometric Deep Learning | Cambridge & Oxford Master's courses.
Reference
📝 OpenAI Cookbook
Generative Pre-trained Transformer (GPT) from Scratch (Andrej Karpathy)
Paper
Library
- NumPy, Matplotlib, pandas, TensorFlow
- Caffe, Keras
- XGBoost, gensim