1. What is Machine Learning?
Two definitions of Machine Learning are offered.
Arthur Samuel described it as: “the field of study that gives computers the ability to learn without being explicitly programmed.” This is an older, informal definition.
Tom Mitchell provides a more modern definition: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
Example: playing checkers.
E = the experience of playing many games of checkers
T = the task of playing checkers.
P = the probability that the program will win the next game.
In general, any machine learning problem can be assigned to one of two broad classifications: supervised learning and unsupervised learning.
Beyond supervised and unsupervised learning, there are also reinforcement learning and recommender systems.
Using machines to simulate the human brain
- Humans can describe a problem but do not know how to solve it explicitly
- Give the machine inputs and outputs, and let it search for a way to solve the problem
- A field of computer science, applicable to industry and basic science. {autonomous helicopters, handwriting recognition, natural language, machine vision, recommender algorithms}
- Requires a sufficient amount of data
Different types of learning algorithms
- As tools/methods, the choice of method depends on the available data (discrete or continuous?) and on the problem to solve (prediction or classification?)
- 监督学习
- 无监督学习
- 强化学习
The problem of choice
- We must understand the algorithms themselves before we can decide, for a concrete practical problem, which method builds the system with the least time and effort
- A machine can reach a goal in many ways, but which one is optimal?
Classification problems
- Classification: discrete-valued output
- Classification (discrete) vs. Regression (continuous)
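The classification/regression split above can be made concrete with toy targets; the data sets below are hypothetical examples, not from the notes:

```python
# Toy illustration of the classification/regression distinction.
# All data points and labels here are made up for illustration.

# Regression: the target t is continuous (e.g., a price).
regression_data = [(50, 150.0), (80, 240.5), (120, 361.2)]  # (size, price)

# Classification: the target t is discrete (e.g., 0 = benign, 1 = malignant).
classification_data = [(2.1, 0), (5.7, 1), (3.3, 0)]  # (tumor size, label)

def target_type(dataset):
    """Label a dataset as 'classification' or 'regression' by its targets."""
    targets = [t for _, t in dataset]
    discrete = all(isinstance(t, int) for t in targets)
    return "classification" if discrete else "regression"

print(target_type(regression_data))      # regression
print(target_type(classification_data))  # classification
```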
2. What is Supervised Learning?
- the most important paradigm in machine learning
- assumes an unknown function 𝑓 that maps an input 𝑥 to an output 𝑡; the data set is D = {<x, t>}
Goal of SL
- The goal is to learn a good approximation of 𝑓
- Input variables 𝑥 are usually called features or attributes
- Output variables 𝑡 are also called targets or labels
Tasks
Classification if 𝑡 is discrete
Regression if 𝑡 is continuous
Probability estimation if 𝑡 is a probability
Elements in Supervised Learning
- Representation
- Examples: {linear models, instance-based methods, decision trees, sets of rules, graphical models, neural networks, Gaussian processes, support vector machines, model ensembles}
- Evaluation
- {Accuracy, precision and recall, squared error, likelihood, posterior probability, cost/utility, margin, entropy, KL divergence}
- Optimization
- {Combinatorial optimisation, e.g. greedy search; Convex optimisation, e.g. gradient descent; Constrained optimisation, e.g. linear programming}
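As a minimal sketch of the convex-optimisation entry above, here is gradient descent on a one-dimensional quadratic; the objective and step size are illustrative choices, not from the notes:

```python
# Gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3.
# Objective, learning rate, and iteration count are illustrative.

def grad(w):
    # Derivative of f(w) = (w - 3)^2
    return 2 * (w - 3)

w = 0.0          # initial guess
alpha = 0.1      # learning rate
for _ in range(200):
    w -= alpha * grad(w)   # step against the gradient

print(round(w, 4))  # 3.0
```

Because the objective is convex, any sufficiently small step size converges to the unique minimum; non-convex objectives offer no such guarantee.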
Process
- Training Set
- Learning Algorithms
- Function h (hypothesis) {Input: size of a house -> h -> Output: estimated price}
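The house-price hypothesis h above can be sketched as a linear function; the coefficients below are hypothetical, not fitted to real data:

```python
# Linear hypothesis h(x) = theta0 + theta1 * x mapping house size to price.
# theta0 and theta1 are made-up values for illustration.

theta0 = 50.0   # base price (e.g., in $1000s)
theta1 = 1.5    # price increase per unit of size (illustrative)

def h(x):
    """Hypothesis: estimated price for a house of size x."""
    return theta0 + theta1 * x

print(h(100))  # 200.0
```

A learning algorithm's job is to pick theta0 and theta1 from the training set; here they are simply fixed by hand.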
3. Unsupervised Learning
- Unsupervised learning allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don’t necessarily know the effect of the variables.
- We can derive this structure by clustering the data based on relationships among the variables in the data.
- With unsupervised learning there is no feedback based on the prediction results.
Example:
Clustering: Take a collection of 1,000,000 different genes, and find a way to automatically group these genes into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on.
Non-clustering: The “Cocktail Party Algorithm”, allows you to find structure in a chaotic environment. (i.e. identifying individual voices and music from a mesh of sounds at a cocktail party).
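The clustering idea above can be sketched with a minimal one-dimensional k-means; the data points, k = 2, and the initial centroids are all illustrative:

```python
# Minimal 1-D k-means: group unlabeled points into k = 2 clusters.
# Data and initial centroids are made up for illustration.

data = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
centroids = [0.0, 10.0]  # initial guesses for the two cluster centers

for _ in range(10):  # a few refinement iterations
    # Assignment step: attach each point to its nearest centroid.
    clusters = [[], []]
    for x in data:
        nearest = min(range(2), key=lambda i: abs(x - centroids[i]))
        clusters[nearest].append(x)
    # Update step: move each centroid to the mean of its cluster.
    centroids = [sum(c) / len(c) for c in clusters]

print(sorted(round(c, 2) for c in centroids))  # [1.0, 8.07]
```

No labels are ever provided; the two groups emerge purely from the structure of the data, which is exactly the unsupervised setting described above.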
4. Model and Cost Functions
Linear Regression:
- supervised learning: the training data gives the “right answer” for each example
- Regression : predict real-valued output
- Classification: get discrete-valued output {eg:0,1}
Training Models
- m = number of training examples
- x = input variables / features
- y = output variable / target
Workflow
- Training Set -> Learning Algorithm ->
…
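The section names cost functions without writing one; the usual squared-error cost for linear regression is sketched below, with a hypothetical data set and hand-picked parameters:

```python
# Squared-error cost J(theta) = (1/2m) * sum((h(x) - y)^2) for linear
# regression. Data points and parameter values are illustrative.

data = [(50, 150.0), (100, 200.0), (150, 250.0)]  # (size, price) pairs
theta0, theta1 = 100.0, 1.0                        # hypothetical parameters

def h(x):
    """Linear hypothesis h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

m = len(data)
J = sum((h(x) - y) ** 2 for x, y in data) / (2 * m)
print(J)  # 0.0: this h happens to fit all three points exactly
```

Training means searching for the theta0, theta1 that minimize J over the training set; here a perfect fit gives zero cost.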
5. Linear Regression
6. Linear Classification
7. Model Evaluation, Selection Ensembles
Bias Variance
- The bias-variance decomposition is a framework for analyzing the performance of models
- Definition:
- data
- model
- performance
- Thus we can decompose the expected squared error as:
- squared bias, model variance, and irreducible noise
- Case Study: Bias Variance for K-NN
We want the model to be accurate, with error as small as possible.
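A small simulation can illustrate the decomposition: the expected squared error of an estimator against the true value splits into squared bias plus variance. The data-generating process and "model" (a sample mean over noisy observations) are hypothetical:

```python
import random

random.seed(0)

# Hypothetical setup: a true target value observed with Gaussian noise.
true_value = 5.0
noise_std = 1.0

def sample_estimate(n=5):
    """Fit a 'model': the mean of n noisy observations of true_value."""
    obs = [true_value + random.gauss(0, noise_std) for _ in range(n)]
    return sum(obs) / n

# Draw many training sets and record the resulting estimates.
estimates = [sample_estimate() for _ in range(20000)]
mean_est = sum(estimates) / len(estimates)

bias_sq = (mean_est - true_value) ** 2
variance = sum((e - mean_est) ** 2 for e in estimates) / len(estimates)

# Expected squared error of the estimator against the true value.
mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)

# Check: MSE = bias^2 + variance. (The irreducible-noise term does not
# appear because we compare to the noiseless true value.)
print(abs(mse - (bias_sq + variance)) < 1e-9)  # True
```

Averaging more observations per estimate shrinks the variance term, which is the same effect larger k has in the k-NN case study.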
…….
8. Reference
- Material of prof. Marcello Restelli
- Pattern Recognition and Machine Learning, Bishop [PRML]
- Elements of Statistical Learning, Hastie et al. [ESL]
- Introduction to Statistical Learning, James et al. [ISL]
- The Lack of A Priori Distinctions Between Learning Algorithms, Wolpert, 1996