Introduction to Sigmoid functions - The Faculty of ...
Plan for today
Part I: Brief introduction to biological systems. Historical background. Deep belief learning procedure.
Part II: Theoretical considerations. Different interpretations.

Biological Neurons

The Retina
Most common in the preliminary parts of the data processing: retina, ears.

What is known about the learning process
Activation: every activity leads to the firing of a certain set of neurons.
Hebbian learning: when activities are repeated, the connections between those neurons strengthen. This repetition is what leads to the formation of memory.
In 1949 Donald Hebb introduced Hebbian learning: synchronous activation increases the synaptic strength; asynchronous activation decreases the synaptic strength.
Habituation: the psychological process in humans and other organisms in which the psychological and behavioral response to a stimulus decreases after repeated exposure to that stimulus over time.

A spectrum of machine learning tasks

Typical Statistics:
- Low-dimensional data (e.g. fewer than 100 dimensions).
- Lots of noise in the data.
- There is not much structure in the data, and what structure there is can be represented by a fairly simple model.
- The main problem is distinguishing true structure from noise.

Artificial Intelligence:
- High-dimensional data (e.g. more than 100 dimensions).
- The noise is not sufficient to obscure the structure in the data if we process it right.
- There is a huge amount of structure in the data, but the structure is too complicated to be represented by a simple model.
- The main problem is figuring out a way to represent the complicated structure so that it can be learned.

Artificial Neural Networks
Artificial neural networks have been applied successfully to speech recognition, image analysis, and adaptive control.
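As a rough illustration of the basic unit such networks are built from, the sketch below computes the output of one neuron: a weighted sum of the inputs passed through an activation function f(n). The logistic (sigmoid) activation, the function names and the example numbers are assumptions made for illustration; they are not taken from the slides.

```python
import math

def sigmoid(n):
    """Logistic activation: squashes any real n into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

def neuron_output(inputs, weights, bias=0.0, activation=sigmoid):
    """Weighted sum of the inputs, passed through the activation function f(n)."""
    n = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(n)

# Hypothetical example values, chosen only for illustration.
y = neuron_output(inputs=[1.0, 0.0, 1.0], weights=[0.5, -0.3, 0.8], bias=-0.2)
print(round(y, 3))
```

Swapping `activation` for a hard threshold turns the same unit into the Perceptron.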
[Figure: a neuron with inputs, weights W, activation function f(n), and outputs.]

The simplest model: the Perceptron
The Perceptron was introduced in 1957 by Frank Rosenblatt.
- It is a linear classifier.
- It can perfectly classify only linearly separable data.
- It is incapable of processing the Exclusive Or (XOR) circuit.
How to learn multiple layers?
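To make the linear-separability limit concrete, here is a minimal sketch of the classic perceptron learning rule applied to OR (separable) and XOR (not separable). The learning rate, epoch count, zero initialization and helper names are assumptions for illustration only.

```python
def predict(weights, bias, x):
    """Threshold unit: output 1 if the weighted sum is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron rule: w += lr * (target - output) * x, applied sample by sample."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

def accuracy(samples, weights, bias):
    """Fraction of samples the trained unit classifies correctly."""
    return sum(predict(weights, bias, x) == t for x, t in samples) / len(samples)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
or_data = [(x, int(x[0] or x[1])) for x in inputs]
xor_data = [(x, x[0] ^ x[1]) for x in inputs]

or_acc = accuracy(or_data, *train_perceptron(or_data))
xor_acc = accuracy(xor_data, *train_perceptron(xor_data))
print("OR accuracy:", or_acc)    # OR is linearly separable, so the rule converges
print("XOR accuracy:", xor_acc)  # no single linear boundary classifies all four points
```

However long the rule runs on XOR, at least one of the four points stays misclassified, which is the motivation for adding hidden layers.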
Second generation neural networks (~1985)
Back Propagation
- Compare outputs with the correct answer to get an error signal.
- Back-propagate the error signal to get derivatives for learning.
[Figure: input vector, hidden layers, outputs.]

BP-algorithm
[Figure: activations and errors; two plots over the range -5 to 5, one rising from 0 to 1 and one peaking at 0.25.]
Update weights: [formula not recovered]
Update the error: [formula not recovered]

Back Propagation Advantages
A multi-layer Perceptron network can, given enough hidden units, be trained by the back-propagation algorithm to approximate essentially any mapping between the input and the output.

What is wrong with back-propagation?
- It requires labeled training data, yet almost all data is unlabeled.
- The learning time does not scale well: it is very slow in networks with multiple hidden layers.
- It can get stuck in poor local optima.

A temporary digression
Vapnik and his co-workers developed a very clever type of perceptron called a Support Vector Machine. In the 1990s, many researchers abandoned neural networks with multiple adaptive hidden layers because Support Vector Machines worked better.
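The back-propagation steps described earlier (compare the output with the correct answer, propagate the error signal backwards, update the weights) can be sketched on the XOR problem as follows. This is a minimal illustration under stated assumptions, not the presenter's implementation: one hidden layer of 4 sigmoid units, squared error, plain online gradient descent, and a fixed random seed are all choices made here.

```python
import math
import random

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

def forward(x, w_hid, b_hid, w_out, b_out):
    """Return the hidden activations and the scalar output for one input vector."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hid, b_hid)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output

def mean_squared_error(data, params):
    return sum((t - forward(x, *params)[1]) ** 2 for x, t in data) / len(data)

def train(data, n_hidden=4, epochs=5000, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hid = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b_hid = [0.0] * n_hidden
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b_out = 0.0
    initial = mean_squared_error(data, (w_hid, b_hid, w_out, b_out))
    for _ in range(epochs):
        for x, target in data:
            hidden, output = forward(x, w_hid, b_hid, w_out, b_out)
            # Error signal at the output: compare with the correct answer.
            delta_out = (output - target) * output * (1.0 - output)
            # Back-propagate the error signal to each hidden unit.
            delta_hid = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                         for j in range(n_hidden)]
            # Gradient-descent updates for all weights and biases.
            for j in range(n_hidden):
                w_out[j] -= lr * delta_out * hidden[j]
                for i in range(n_in):
                    w_hid[j][i] -= lr * delta_hid[j] * x[i]
                b_hid[j] -= lr * delta_hid[j]
            b_out -= lr * delta_out
    return (w_hid, b_hid, w_out, b_out), initial

xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
params, initial_error = train(xor_data)
final_error = mean_squared_error(xor_data, params)
print("error before:", round(initial_error, 4), "after:", round(final_error, 4))
```

Note that every update needs a labeled target, which is exactly the first limitation listed above.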
Overcoming the limitations of back-propagation: Restricted Boltzmann Machines
- Keep the efficiency and simplicity of using a gradient method for adjusting the weights, but use it for modeling the structure of the sensory input.
- Adjust the weights to maximize the probability that a generative model would have produced the sensory input.
- Learn p(image), not p(label | image).

Restricted Boltzmann Machines (RBM)
An RBM is a layered network of stochastic units.
- The inference problem: infer the states of the unobserved variables.
- The learning problem: adjust the interactions between variables to make the network more likely to generate the observed data.
An RBM is a graphical model.
[Figure: input layer, hidden layer, output layer.]

Graphical models
Each arrow represents a mutual dependency between nodes.
- MRF (for example, a Boltzmann machine): undirected.
- Bayesian network (belief network): directed, acyclic.
- HMM: the simplest Bayesian network.
- Restricted Boltzmann Machine: symmetric connections between layers, no intra-layer connections.
[Figure: hidden and data nodes.]

Stochastic binary units (Bernoulli variables)
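These have a state of 1 or 0. The probability of turning on is determined by the weighted input from other units (plus a bias): $p(s_j = 1) = 1 / (1 + \exp(-b_j - \sum_i s_i w_{ij}))$

The Energy of a joint configuration (ignoring terms to do with biases)
The energy of the current state: $E(v, h) = -\sum_{i,j} v_i h_j w_{ij}$
The joint probability distribution: $p(v, h) = e^{-E(v, h)} / Z$
Probability distribution over the visible vector v: $p(v) = \sum_h e^{-E(v, h)} / Z$
Partition function: $Z = \sum_{v, h} e^{-E(v, h)}$
The derivative of the energy function: $\partial E(v, h) / \partial w_{ij} = -v_i h_j$

Maximum Likelihood method
Parameters (weights) update, at iteration $t$ with learning rate $\eta$: $w_{ij}^{(t+1)} = w_{ij}^{(t)} + \eta \, \partial \log p(v) / \partial w_{ij}$
The log-likelihood gradient: $\partial \log p(v) / \partial w_{ij} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}$
- $\langle \cdot \rangle_{\text{data}}$: average w.r.t. the data distribution, computed using the sample data x.
- $\langle \cdot \rangle_{\text{model}}$: average w.r.t. the model distribution, which cannot generally be computed.

Hinton's method: Contrastive Divergence
The maximum likelihood method minimizes the Kullback-Leibler divergence $\mathrm{KL}(p^0 \| p^\infty)$ between the data distribution $p^0$ and the model's equilibrium distribution $p^\infty$.
Intuitively: [figure not recovered]

Contrastive Divergence (CD) method
In 2002 Hinton proposed a new learning procedure.
CD follows approximately the difference of two divergences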
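(that is, its gradient): $\mathrm{KL}(p^0 \| p^\infty) - \mathrm{KL}(p^1 \| p^\infty)$, where $p^n$ is the distribution after $n$ steps of Gibbs sampling started from the data, $p^\infty$ is the model's equilibrium distribution, and $\mathrm{KL}(p^n \| p^\infty)$ is the "distance" of the distribution $p^n$ from $p^\infty$.
Practically: run the chain only for a small number of steps (one is often sufficient).
The update formula for the weights becomes, with learning rate $\eta$: $\Delta w_{ij} = \eta \left( \langle v_i h_j \rangle^0 - \langle v_i h_j \rangle^1 \right)$
This greatly reduces both the computation per gradient step and the variance of the estimated gradient. Experiments show good parameter estimation capabilities.

A picture of the maximum likelihood learning algorithm for an RBM
[Figure: alternating Gibbs sampling between visible units i and hidden units j at t = 0, t = 1, t = 2, ..., t = infinity (a fantasy, i.e. the model), with statistics $\langle v_i h_j \rangle^0$, $\langle v_i h_j \rangle^1$ and $\langle v_i h_j \rangle^\infty$.]
One Gibbs sample (CD): use $\langle v_i h_j \rangle^1$ in place of $\langle v_i h_j \rangle^\infty$.

Multi Layer Network
After Gibbs sampling for sufficiently long, the network reaches thermal equilibrium: the states of the units still change, but the probability of finding the system in any particular configuration does not.
Adding another layer always improves the variational bound on the log-likelihood, unless the top-level RBM is already a perfect model of the data it is trained on.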
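The one-step contrastive divergence procedure for a single RBM can be sketched as below. This is a toy illustration under assumptions, not the method as implemented by the author: the class name `TinyRBM`, the bias updates, the learning rate, the toy patterns and the use of hidden probabilities (rather than sampled states) in the statistics are all choices made here.

```python
import math
import random

def sigmoid(n):
    """Numerically stable logistic function."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)
    return e / (1.0 + e)

class TinyRBM:
    """Binary RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.w = [[self.rng.gauss(0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.a = [0.0] * n_visible  # visible biases
        self.b = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        # p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i w_ij)
        return [sigmoid(self.b[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b))]

    def visible_probs(self, h):
        # p(v_i = 1 | h) = sigmoid(a_i + sum_j h_j w_ij)
        return [sigmoid(self.a[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.a))]

    def sample(self, probs):
        # Stochastic binary units: each turns on with its given probability.
        return [1 if self.rng.random() < p else 0 for p in probs]

    def cd1_update(self, v0, lr=0.1):
        ph0 = self.hidden_probs(v0)
        h0 = self.sample(ph0)
        v1 = self.sample(self.visible_probs(h0))  # one full Gibbs step
        ph1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(ph0)):
                # <v_i h_j>^0 - <v_i h_j>^1
                self.w[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        for i in range(len(v0)):
            self.a[i] += lr * (v0[i] - v1[i])
        for j in range(len(ph0)):
            self.b[j] += lr * (ph0[j] - ph1[j])

# Two hypothetical binary patterns, repeated to form a toy training set.
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 10
rbm = TinyRBM(n_visible=6, n_hidden=2)
for _ in range(100):
    for v in data:
        rbm.cd1_update(v)
recon = rbm.visible_probs(rbm.hidden_probs(data[0]))
print([round(p, 2) for p in recon])
```

No labels appear anywhere in the update: the model is fitted to p(data) only, in contrast to back-propagation.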
[Figure: a stack of layers, from bottom to top: data, W1, h1, W2, h2, W3, h3.]

The network for the 4 squares task
[Figure, shown over several slides: 2 input units, 4 logistic units, 4 labels.]
Entirely unsupervised except for the colors.
Results
The network used to recognize handwritten binary digits from the MNIST database (layers from top to bottom):
- 10 labels (output vector)
- 2000 neurons
- 500 neurons
- 500 neurons
- 28x28 pixels
[Figure. Class: new test images from the digit class that the model was trained on. Non-class: images from an unfamiliar digit class (the network tries to see every image as a 2).]

Examples of correctly recognized handwritten digits
that the neural network had never seen before
Pros: good generalization capabilities.
Cons: only binary values permitted; no invariance (neither translation nor rotation).

How well does it discriminate on the MNIST test set with no extra information about geometric distortions?
Method                                   Test error
Generative model based on RBMs           1.25%
Support Vector Machine (Decoste et al.)  1.4%
Backprop with 1000 hiddens (Platt)       ~1.6%
Backprop with 500 --> 300 hiddens        ~1.6%
K-Nearest Neighbor                       ~3.3%

A non-linear generative model for human motion
- Data: CMU Graphics Lab Motion Capture Database.
- Motion sampled from video (30 Hz).
- Each frame is a 1x60 vector of the skeleton parameters (3D joint angles).
- The data does not need to be heavily preprocessed or dimensionality reduced.

Conditional RBM (cRBM)
Can model temporal dependencies by treating the visible variables in the past as additional biases.
Add two types of connections:
- from the past n frames of visible to the current visible;
- from the past n frames of visible to the current hidden.
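One way to read "treating the past as additional biases" is that the past n frames contribute a fixed offset to each unit's bias before the usual RBM computation. The sketch below illustrates only that step. The function names, the weight layout `past_weights[k][i][u]` and the toy numbers are hypothetical, and the sigmoid is applied to the hidden units only.

```python
import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

def dynamic_biases(static_bias, past_frames, past_weights):
    """Add the contribution of the past n visible frames to a static bias vector.

    past_weights[k][i][u] connects unit i of the k-th past frame to unit u
    (u is a current visible unit or a current hidden unit, depending on
    which of the two connection types is being computed).
    """
    bias = list(static_bias)
    for frame, weights in zip(past_frames, past_weights):
        for i, v in enumerate(frame):
            for u in range(len(bias)):
                bias[u] += v * weights[i][u]
    return bias

def hidden_probs(v_t, w, hidden_bias):
    """p(h_j = 1 | v_t, past), computed independently for each hidden unit j."""
    return [sigmoid(hidden_bias[j] + sum(v_t[i] * w[i][j] for i in range(len(v_t))))
            for j in range(len(hidden_bias))]

# Toy example: 2 visible units, 1 hidden unit, n = 2 past frames.
past = [[1.0, 0.0], [0.5, 0.5]]
past_to_hidden = [[[0.2], [0.4]], [[-0.2], [0.6]]]
b_dyn = dynamic_biases([0.0], past, past_to_hidden)
p_h = hidden_probs([0.0, 0.0], [[1.0], [1.0]], b_dyn)
print(b_dyn, p_h)
```

Once the dynamic biases are fixed, the model behaves like an ordinary RBM for the current frame.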
Given the past n frames, the hidden units at time t are conditionally independent, so we can still use CD for training cRBMs.
[Figure: frames t-2, t-1, t; visible units i, hidden units j.]

THANK YOU
[Figure: structured input (much easier to learn!) versus independent input.]

The Perceptron is a linear classifier

A B  OR(A,B)  XOR(A,B)
0 0  0        0
0 1  1        1
1 0  1        1
1 1  1        0
[Figure: the input points plotted on axes x0 and x1.]

A B  AND(A,B)  NAND(A,B)
0 0  0         1
0 1  0         1
1 0  0         1
1 1  1         0
[Figure: the input points plotted on axes x0 and x1.]
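The truth tables above suggest why a second layer helps: OR, AND and NAND are each linearly separable, and XOR can be composed from them. The following sketch hand-wires that composition with threshold units. The particular weights and biases are one convenient choice made here, not values from the slides.

```python
def threshold_unit(weights, bias, inputs):
    """Linear threshold unit: 1 if the weighted sum plus bias is positive, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

def or_gate(a, b):
    return threshold_unit([1, 1], -0.5, [a, b])

def and_gate(a, b):
    return threshold_unit([1, 1], -1.5, [a, b])

def nand_gate(a, b):
    return threshold_unit([-1, -1], 1.5, [a, b])

def xor_gate(a, b):
    # Hidden layer: OR and NAND. Output layer: AND of the two hidden units.
    return and_gate(or_gate(a, b), nand_gate(a, b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_gate(a, b))
```

Here the hidden-layer weights are set by hand; learning them from data is what back-propagation and the layer-wise RBM procedure address.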