DataLeaper

Idea Behind Bayesian Learning

CREATING A PROBABILITY DISTRIBUTION FOR THE PARAMETERS OF THE MODEL

We always start with Bayes' formula when introducing Bayesian learning. In my opinion this is not intuitive at all, because the actual crux of Bayesian learning comes from the following statement.

With the frequentist approach we update the parameters of a model (let's call it the model of the data) that models the data. Bayesian learning goes further: instead of updating the parameters of the data-modeling function directly, we update the distribution (let's call it the distribution of parameters) that models the parameters of that function. The difference is that each parameter of the function that models the data now gets its own probability distribution: each possible value of a parameter gets its own probability.


The distribution of parameters is created and updated by an iterative process as we get more and more data. In essence: the more likely we are to see the observed data under a parameter $\theta$, the more probable we make that $\theta$:

$P(\theta|data) \propto P(\theta)P(data|\theta)$

The simplest way to think of this is to imagine you have some arbitrary function F() with 10 possible input parameters $\theta_1,\theta_2,\dots,\theta_{10}$, and some data that this F() tries to model. (Don't think about what F() actually is; just assume it lets you compute $P(data|\theta_i)$, the likelihood of seeing the data given parameter $\theta_i$.)

You can now compute the likelihood of seeing the data under each of these parameters and update the distribution accordingly:


data = {..}
thetas = [..] # Lets assume 10 different thetas

# This is what P(theta) means: for each theta we have a probability.
# Start with a uniform prior so every theta is equally likely at first.
theta_distribution = [1 / len(thetas)] * len(thetas)

for idx, theta in enumerate(thetas):
    theta_distribution[idx] = \
        theta_distribution[idx] * get_data_likelihood_with_theta(theta, data)

Todo: insert some examples here
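As one possible example, here is a concrete, runnable version of the loop above for a coin whose bias $\theta$ we estimate from flips. The data, the candidate thetas, and the Bernoulli form of `get_data_likelihood_with_theta` are all assumptions made for illustration:

```python
# Hypothetical data: 8 heads out of 10 flips.
data = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
thetas = [i / 10 for i in range(1, 11)]  # candidate biases 0.1 .. 1.0

# Uniform prior: every theta starts equally likely.
theta_distribution = [1 / len(thetas)] * len(thetas)

def get_data_likelihood_with_theta(theta, data):
    """P(data | theta) for independent Bernoulli coin flips."""
    likelihood = 1.0
    for flip in data:
        likelihood *= theta if flip == 1 else (1 - theta)
    return likelihood

# The update loop from above: multiply prior by likelihood.
for idx, theta in enumerate(thetas):
    theta_distribution[idx] *= get_data_likelihood_with_theta(theta, data)

# Normalize so the posterior sums to 1 (this is the division by P(data)).
total = sum(theta_distribution)
theta_distribution = [p / total for p in theta_distribution]

best = thetas[theta_distribution.index(max(theta_distribution))]
print(f"Most probable theta: {best}")  # the grid point near 8/10
```

After normalizing, `theta_distribution` is a proper probability distribution over the candidate thetas, and the mass concentrates around 0.8, matching the observed 8-out-of-10 heads.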

Having made an example of updating the distribution of parameters based on the data we have seen, we can now look at Bayes' theorem.

$P(A_j|B) = \frac{P(B|A_j)P(A_j)}{\sum_i P(B|A_i)P(A_i)}=\frac{P(B|A_j)P(A_j)}{P(B)}=\frac{P(A_j \cap B)}{P(B)}$

  1. What is the probability of B? Assuming the events $A_i$ form a partition of the sample space ($\sum_i P(A_i)=1$ and the $A_i$ have no common area), we can get $P(B)$ by adding up the probabilities of B occurring together with each possible $A_i$: $P(B)=\sum_i P(B|A_i)P(A_i)$.
  2. What is the probability of $A_j$ happening, and what is the probability of B happening given $A_j$?
    1. If we multiply these together we get $P(A_j \cap B) = P(B|A_j)P(A_j)$.
  3. $\frac{P(A_j \cap B)}{P(B)} = P(A_j|B)$, which is just the formula for conditional probability.

Let's now formulate this with data and $\theta$.

EXAMPLE 1

You have an experiment of 10 coin flips, where 8 flips are successful.

Such coin-flip experiments are modeled with the binomial distribution.
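Under the binomial model, the likelihood of the data given a success probability $\theta$ is $P(data|\theta) = \binom{n}{k}\theta^k(1-\theta)^{n-k}$ with $n=10$, $k=8$. A small sketch evaluating this for a few candidate thetas (the candidate values are chosen arbitrarily for illustration):

```python
from math import comb

# Likelihood of k = 8 successes in n = 10 flips under the binomial model.
n, k = 10, 8

def binomial_likelihood(theta, n, k):
    """P(data | theta) = C(n, k) * theta^k * (1 - theta)^(n - k)"""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

for theta in (0.5, 0.7, 0.8, 0.9):
    print(f"theta={theta}: likelihood={binomial_likelihood(theta, n, k):.4f}")
```

The likelihood peaks at $\theta = k/n = 0.8$, so in a Bayesian update that value of $\theta$ would receive the largest boost.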

EXAMPLE 2

With the Bayesian approach we have distributions for both t and b. Those distributions can be whatever we choose (normal, beta, Poisson, or simply a separate probability for each value of t and b). We update the parameters of those distributions, i.e. the parameters of the distributions that model the parameters that model the data.

In essence we have distributions over the model's parameters instead of point estimates.

SIMPLE CASE

The simplest example I found on the internet was this link. (Hint: it's not that simple, but it allowed me to get the idea of one method of how Bayesian learning could be used.)

It explains doing regression with Bayesian Learning.
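As a rough sketch of that idea, here is a minimal grid-based Bayesian regression for $y = t \cdot x + b$. The data, the noise level sigma, and the parameter grids below are all invented for illustration; a real treatment (like the linked article's) uses continuous priors rather than a tiny grid:

```python
import math

# Hypothetical data, roughly following y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 7.1]
sigma = 0.5  # assumed known observation noise

# Small grids of candidate slopes t and intercepts b.
t_grid = [1.0, 1.5, 2.0, 2.5]
b_grid = [0.0, 0.5, 1.0, 1.5]

# Uniform prior over every (t, b) pair, updated by a Gaussian likelihood.
posterior = {}
for t in t_grid:
    for b in b_grid:
        log_lik = sum(-((y - (t * x + b)) ** 2) / (2 * sigma**2)
                      for x, y in zip(xs, ys))
        posterior[(t, b)] = math.exp(log_lik)

# Normalize: divide by the sum over the grid (the P(data) term).
total = sum(posterior.values())
posterior = {tb: p / total for tb, p in posterior.items()}

best_t, best_b = max(posterior, key=posterior.get)
print(f"Most probable (t, b): ({best_t}, {best_b})")
```

The result is not a single fitted line but a probability for every candidate (t, b) pair, which is exactly the "distribution of parameters" described above.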

NOW WE CAN TALK ABOUT BAYES THEOREM

The first thing you need to understand is conditional probability

$P(A|B)=\frac{P(A \cap B)}{P(B)}$

$P(B|A)=\frac{P(A \cap B)}{P(A)}$
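A quick numeric sanity check of these two formulas, using a fair six-sided die as a made-up example (A = "roll is even", B = "roll is at least 4"):

```python
# Sample space of one fair six-sided die.
outcomes = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # roll is even
B = {4, 5, 6}  # roll is at least 4

def p(event):
    """Probability of an event under equally likely outcomes."""
    return len(event) / len(outcomes)

p_A_given_B = p(A & B) / p(B)  # P(A|B) = P(A ∩ B) / P(B)
p_B_given_A = p(A & B) / p(A)  # P(B|A) = P(A ∩ B) / P(A)

print(f"P(A|B) = {p_A_given_B:.4f}")  # {4, 6} out of {4, 5, 6}
print(f"P(B|A) = {p_B_given_A:.4f}")  # {4, 6} out of {2, 4, 6}
```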

BAYES THEOREM

Also, in medicine there is the classic cancer-test example. The question is: if we have received a positive test result, what is the probability of cancer? We know that the test returns a positive result with probability 0.9 when cancer really is present. We also know that the test returns a negative result with probability 0.9 when no cancer is present.


|           | POSITIVE | NEGATIVE | SUM |
| --- | --- | --- | --- |
| CANCER | P(positive \| cancer)P(cancer) = 0.9 \* 0.00001 | P(negative \| cancer)P(cancer) = 0.1 \* 0.00001 | P(cancer) = 0.00001 |
| NO CANCER | P(positive \| no cancer)P(no cancer) = 0.1 \* 0.99999 | P(negative \| no cancer)P(no cancer) = 0.9 \* 0.99999 | P(no cancer) = 0.99999 |
| SUM | P(positive) = 0.9\*0.00001 + 0.1\*0.99999 | P(negative) = 0.1\*0.00001 + 0.9\*0.99999 | 1 |
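Plugging the numbers from the table into Bayes' theorem gives the answer to the question:

```python
# Numbers taken directly from the table above.
p_cancer = 0.00001
p_no_cancer = 0.99999
p_pos_given_cancer = 0.9
p_pos_given_no_cancer = 0.1

# P(positive) via the law of total probability (the SUM row).
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_no_cancer * p_no_cancer

# Bayes' theorem: P(cancer | positive) = P(positive | cancer)P(cancer) / P(positive)
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos

print(f"P(positive) = {p_pos:.6f}")
print(f"P(cancer | positive) = {p_cancer_given_pos:.6f}")
```

Even after a positive test, the probability of cancer is still only about 0.009%, because the disease is so rare that most positives come from the large healthy population.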

REFERENCE

Kalev Pärna, *Tõenäosusteooria algkursus* (Introductory Course in Probability Theory), pp. 37-42