Machine Learning Guidance

A collection of my Machine Learning and Data Science projects, covering both theoretical and practical (applied) ML.

In addition, interesting and helpful references (papers, ebooks, repos, tools, etc.) are attached, ranging from beginner to advanced.

Methods

Data Modelling & Prediction

Generative vs Discriminative Model

Given the training data set $D = \{ (x_i, y_i) \mid i \le N,\ i \in \mathbb{Z} \}$, where $y_i$ is the corresponding output for the input $x_i$.

| Aspect \ Model | Generative | Discriminative |
| --- | --- | --- |
| Learning objective | Joint probability $P(x, y)$ | Conditional probability $P(y \vert x)$ |
| Formulation | Class prior $P(y)$ and likelihood $P(x \vert y)$ | $P(y \vert x)$ directly |
| Result | Not direct: $P(y \vert x)$ via Bayes' rule | Direct classification |
| Examples | Naive Bayes, HMM | Logistic Regression, SVM, DNN |
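As a concrete illustration of the generative route, the sketch below estimates class priors $P(y)$ and Gaussian likelihoods $P(x \vert y)$ on toy 1-D data, then classifies through Bayes' rule. All data and numbers are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data: two classes with different means (hypothetical numbers).
x0 = rng.normal(-2.0, 1.0, 100)   # class 0
x1 = rng.normal(+2.0, 1.0, 100)   # class 1
x = np.concatenate([x0, x1])
y = np.array([0] * 100 + [1] * 100)

def gaussian_pdf(v, mu, sigma):
    return np.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Generative step: estimate P(y) and the parameters of P(x|y).
priors = np.array([np.mean(y == c) for c in (0, 1)])
mus = np.array([x[y == c].mean() for c in (0, 1)])
sigmas = np.array([x[y == c].std() for c in (0, 1)])

def predict(x_new):
    # Bayes' rule: P(y|x) is proportional to P(x|y) * P(y); argmax classifies.
    post = [gaussian_pdf(x_new, mus[c], sigmas[c]) * priors[c] for c in (0, 1)]
    return int(np.argmax(post))

print(predict(-1.5), predict(1.5))  # → 0 1
```

A discriminative model (e.g. logistic regression) would instead fit $P(y \vert x)$ directly, without ever modelling how $x$ is generated.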

Reference: Generative and Discriminative Models, Professor Andrew Ng

Types of Learning

| Output \ Type | Unsupervised | Supervised |
| --- | --- | --- |
| Continuous $Y = \mathbb{R}$ | Clustering & Dim Reduction: SVD, PCA, K-means, GAN, VAE, Diffusion | Regression: Linear / Polynomial, Non-Linear Regression, Decision Trees, Random Forest |
| Discrete $Y = \{\text{Categories}\}$ | Association / Feature Analysis: Apriori, FP-Growth, HMM | Classification: Bayesian, SVM, Logistic Regression, Perceptron, kNN / Trees |
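To make the unsupervised column concrete, here is a minimal K-means sketch in plain NumPy (toy blobs and seeds are arbitrary): alternate nearest-centroid assignment with centroid re-estimation.

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain K-means: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points.
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = points[labels == c].mean(axis=0)
    return centroids, labels

# Two well-separated blobs (hypothetical data).
rng = np.random.default_rng(1)
blob_a = rng.normal([0, 0], 0.5, size=(50, 2))
blob_b = rng.normal([5, 5], 0.5, size=(50, 2))
centroids, labels = kmeans(np.vstack([blob_a, blob_b]), k=2)
```

No labels are used anywhere — the structure is recovered from the inputs alone, which is exactly what separates this column from the supervised one.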

And more:

| Aspect \ Type | Semi-Supervised | Reinforcement |
| --- | --- | --- |
| Learn from | Available labels | Rewards |
| Methods | Iterative pseudo-labeling | Q-learning, Markov Decision Process |
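A minimal tabular Q-learning sketch (the environment is a hypothetical 5-state chain; all hyperparameters are arbitrary): the agent behaves randomly, which is fine because Q-learning is off-policy, and the greedy policy is read off the learned Q-table at the end.

```python
import numpy as np

# Hypothetical 5-state chain MDP: actions 0 (left) / 1 (right);
# reaching the rightmost state pays reward 1 and ends the episode.
n_states, n_actions = 5, 2
alpha, gamma = 0.5, 0.9            # learning rate and discount (arbitrary)
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def step(s, a):
    s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    done = s_next == n_states - 1
    return s_next, (1.0 if done else 0.0), done

for _ in range(500):                      # episodes
    s = 0
    for _ in range(50):                   # step limit per episode
        a = int(rng.integers(n_actions))  # random behavior policy (off-policy)
        s_next, r, done = step(s, a)
        # Q-learning update: bootstrap from the best value of the next state.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if done:
            break

print(Q.argmax(axis=1)[:4])  # greedy policy for non-terminal states
```

After training, the greedy policy prefers "right" in every non-terminal state, since the discounted reward of heading toward the goal dominates.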

Reinforcement Learning

Goal

Feature Engineering

Inference

| Aspect | Bayesianism | Frequentism |
| --- | --- | --- |
| Interpretation of probability | A measure of belief or uncertainty | The limit of relative frequencies in repeated experiments |
| Methods | Prior knowledge updated with data (Bayes' rule) to obtain posterior distributions | Hypothesis testing, MLE, confidence intervals |
| Treatment of uncertainty | Parameters are random variables | The data set is random; parameters are fixed |
| Handling of data | Useful when prior information is available or when the focus is on prediction intervals | Often requires larger sample sizes |
| Flexibility | Flexible modeling; models can be updated as new data arrive | More rigid; relies on specific statistical methods |
| Computational complexity | Can be computationally intensive, especially in high-dimensional parameter spaces | Usually simpler and more straightforward in practice |
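As a worked contrast between the two columns (with made-up numbers): estimating a coin's heads probability from 7 heads in 10 flips.

```python
# Hypothetical data: 10 coin flips, 7 heads.
n, k = 10, 7

# Frequentist point estimate: maximize the binomial likelihood (MLE).
mle = k / n                         # 0.7

# Bayesian: a uniform Beta(1, 1) prior updated with the data gives a
# Beta(1 + k, 1 + n - k) posterior; its mean is the point estimate,
# pulled slightly toward the prior.
posterior_mean = (1 + k) / (2 + n)  # 8/12 ≈ 0.667

print(mle, posterior_mean)
```

The Bayesian answer is a full posterior distribution (here Beta(8, 4)), not just a point — which is where prediction intervals and belief updating come from.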

Empiricism

Applied ML Best Practice

DNN Troubleshooting

Basic

Intuition

```
                          Tune hyperparameters
                                  |
Start simple -> Implement & Debug -> Evaluate -> ?
                                  |
                         Improve model & data
```
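One common "Implement & Debug" sanity check in this loop is to overfit a single tiny batch before anything else. The NumPy sketch below stands in for a real framework (a linear model instead of a DNN, arbitrary data); the point is only that the training loss on the tiny batch must collapse toward zero — if it plateaus, suspect a bug rather than the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny batch of 8 points; a correct model + optimizer should overfit it.
X = rng.normal(size=(8, 3))
y = X @ np.array([1.0, -2.0, 0.5])   # targets from a known linear rule

w = np.zeros(3)                      # linear model as a minimal stand-in
lr = 0.2
losses = []
for _ in range(500):
    pred = X @ w
    losses.append(float(np.mean((pred - y) ** 2)))
    w -= lr * (2 / len(y)) * X.T @ (pred - y)   # gradient of the MSE

# Debugging signal: loss on the single batch should drive to ~0.
print(losses[0], losses[-1])
```

Only once this check passes is it worth moving on to "Evaluate" and the tune / improve branches.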

Troubleshooting

OpenAI Talk

DNN improvements

Improvement direction

My Projects

Foundation of Machine Learning (naive NLP, Network)

Machine Learning Real World Data, University of Cambridge IA

MLRD-Cambridge_IA

Theoretical Machine Learning

Theoretical Machine Learning with Problems Sets, Stanford CS229

ML-Stanford_CS229

Computer Vision

Theoretical Computer Vision with Problems Sets, Stanford CS231n

DL-for-CV-Stanford_CS231n

See more: Visual Computing

More

Reference

OpenAI cookbook

📝OpenAI cookbook

Generative Pre-trained Transformer (GPT) from Scratch (Andrej Karpathy)

Paper

Library Used

NumPy, Matplotlib, pandas, TensorFlow

Caffe, Keras

XGBoost, gensim