Thursday, May 30, 2013

Statistics Tidbits

Bayes Rule Refresher

Here's another useful way to state Bayes' rule for conditional probability (it just expands on what the OP wrote): [equation image with Eqs. (1)-(4), not preserved]
Note that here (1) is just the definition, (2) is a simple application of Bayes' rule that we already know, and (3), (4) are various ways to rewrite (1) using factorization rules of the type P(ABC) = P(A|BC)P(B|C)P(C). Mentally, I find the following procedure useful:
  1. Pick the set of events that I want to always keep fixed as conditions (in Eq. (3) it's the event CD and in Eq. (4) it's D).
  2. Write Bayes' rule as if these events didn't exist (i.e. for Eq. (3) I would just write Bayes' rule for P(A|B)).
  3. Rewrite the result, putting my "always conditioned-on" events behind the conditioning bar of every P(...) expression that I have.
This makes sense intuitively if you think of conditioning as a procedure for renormalizing the sample space in various ways. It's reasonable that you should be able to use Bayes' rule in the same way whether or not the probability space has been renormalized by conditioning.
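The three-step procedure above can be checked numerically. The sketch below is not part of the original answer: it assumes a small, randomly generated joint distribution over three binary events A, B, D (the array `joint` and the helper `prob` are hypothetical names), keeps D as the "always conditioned-on" event, and compares P(A|B,D) computed from the definition with P(B|A,D)P(A|D)/P(B|D).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint distribution over three binary events, indexed joint[a, b, d].
joint = rng.random((2, 2, 2))
joint /= joint.sum()


def prob(a=None, b=None, d=None):
    """Probability of the given values; an argument left as None is summed out."""
    idx = tuple(slice(None) if v is None else v for v in (a, b, d))
    return joint[idx].sum()


# Definition: P(A|B,D) = P(A,B,D) / P(B,D)
direct = prob(a=1, b=1, d=1) / prob(b=1, d=1)

# Bayes' rule for P(A|B), with D placed behind every conditioning bar:
# P(A|B,D) = P(B|A,D) * P(A|D) / P(B|D)
p_b_given_ad = prob(a=1, b=1, d=1) / prob(a=1, d=1)
p_a_given_d = prob(a=1, d=1) / prob(d=1)
p_b_given_d = prob(b=1, d=1) / prob(d=1)
via_bayes = p_b_given_ad * p_a_given_d / p_b_given_d

print(direct, via_bayes)  # the two values should agree up to round-off
```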
answered 24 Oct '11, 15:35 by dnquark


Choosing a statistical test
http://imgur.com/Ctug4Dr

Akaike Information Criterion
Maximum Likelihood
$AIC = -2 \log L(\hat{\theta}|y) + 2k$
$\hat{\theta} =$ maximum likelihood estimate of the parameters
$k =$ total number of estimated parameters
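A minimal sketch of the maximum-likelihood form, assuming an i.i.d. normal model whose mean and variance are both fitted (so k = 2). The function names and the sample values are made up for illustration.

```python
import numpy as np


def aic_from_loglik(loglik, k):
    """AIC = -2 log L + 2k, with log L evaluated at the ML estimate."""
    return -2.0 * loglik + 2 * k


def gaussian_max_loglik(y):
    """Maximized log-likelihood of an i.i.d. normal model (mean and variance fitted)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    var_ml = np.mean((y - y.mean()) ** 2)  # ML variance divides by n, not n - 1
    return -0.5 * n * (np.log(2 * np.pi * var_ml) + 1.0)


# Illustrative data only.
y = np.array([2.1, 1.9, 3.2, 2.8, 2.5, 3.0])
aic = aic_from_loglik(gaussian_max_loglik(y), k=2)
print(aic)
```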

Least Squares
$AIC = n \log\left(\frac{RSS}{n}\right) + 2k$
$RSS = SSE = \sum_i (y_i - h(x_i))^2$
$n =$ number of samples
http://www4.ncsu.edu/~shu3/Presentation/AIC.pdf
http://en.wikipedia.org/wiki/Residual_sum_of_squares
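A short sketch of the least-squares form, assuming polynomial fits via `numpy.polyfit` on synthetic data (the data, the degrees tried, and the function name are illustrative). This form drops an additive constant that depends only on n, so only AIC differences between models fitted to the same data are meaningful.

```python
import numpy as np


def aic_least_squares(y, y_hat, k):
    """AIC = n * log(RSS / n) + 2k for a least-squares fit with k parameters."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + 2 * k


# Synthetic, roughly linear data (assumption for the example).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.size)

aics = {}
for degree in (1, 2, 5):
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    aics[degree] = aic_least_squares(y, y_hat, k=degree + 1)

for degree, value in aics.items():
    print(degree, value)  # the smallest AIC marks the preferred model
```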


Matrix Form Pointwise Distances
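$d_{ij} = \|x_i - y_j\|^2 = \|x_i\|^2 + \|y_j\|^2 - 2\langle x_i, y_j \rangle$
means that, with the points stored as the columns of X and Y,
$D = s_X 1' + 1 s_Y' - 2X'Y$
where $s_X$ and $s_Y$ are column vectors holding the squared norms of the columns of X and Y, i.e. diag(X'*X) and diag(Y'*Y), or equivalently sum(X.*X, 1)'.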
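A NumPy sketch of the expansion above, assuming the points are stored as columns (X is d x m, Y is d x n), which matches the X'Y term. The function name is made up for illustration, and the clipping step is an addition to guard against small negative values from round-off.

```python
import numpy as np


def pairwise_sq_dists(X, Y):
    """Squared Euclidean distances between the columns of X (d x m) and Y (d x n)."""
    sx = np.sum(X * X, axis=0)[:, None]  # squared column norms of X, shape (m, 1)
    sy = np.sum(Y * Y, axis=0)[None, :]  # squared column norms of Y, shape (1, n)
    D = sx + sy - 2.0 * (X.T @ Y)        # broadcasting builds the full (m, n) matrix
    return np.maximum(D, 0.0)            # clip tiny negatives caused by round-off


# Three 2-D points in X and two in Y, one point per column.
X = np.array([[0.0, 1.0, 2.0],
              [0.0, 1.0, 0.0]])
Y = np.array([[0.0, 3.0],
              [0.0, 4.0]])
D = pairwise_sq_dists(X, Y)
print(D)
```

The single matrix product replaces a double loop over point pairs, which is the main reason to prefer this form for large inputs.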

Normalize and calculate covariance
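A.H * A (Hermitian, i.e. conjugate transpose, for complex data!) ./ sqrt(diag(A.H * A) * diag(A.H * A).T), with the columns of A as mean-centered variables, diag(...) taken as a column vector, and the division done elementwise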
http://statinfer.wordpress.com/2011/11/14/efficient-matlab-i-pairwise-distances/
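A sketch of the normalization step in NumPy, assuming the columns of A are the variables and the rows are observations (the function name and the random data are illustrative). For real-valued data the result should match `numpy.corrcoef` with `rowvar=False`.

```python
import numpy as np


def correlation_matrix(A):
    """Normalized covariance (correlation) between the columns of A."""
    A = A - A.mean(axis=0)            # center each column
    G = A.conj().T @ A                # Gram matrix; conjugate transpose covers complex data
    d = np.sqrt(np.real(np.diag(G)))  # column norms after centering
    return G / np.outer(d, d)         # elementwise division by the outer product of norms


rng = np.random.default_rng(0)
A = rng.normal(size=(100, 4))         # 100 observations of 4 variables (made-up data)
C = correlation_matrix(A)
print(np.round(C, 3))
```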

Rolling stats
http://stackoverflow.com/questions/1058813/on-line-iterator-algorithms-for-estimating-statistical-median-mode-skewnes
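As a small illustration of the online (single-pass) idea, here is a sketch of Welford's algorithm for a running mean and variance. It is a standard technique chosen for this example rather than something taken from the linked thread, and the class name is made up.

```python
class RunningStats:
    """Running mean and sample variance via Welford's algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)  # uses the already updated mean

    @property
    def variance(self):
        """Sample variance (divides by n - 1); NaN until two values are seen."""
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)
print(stats.mean, stats.variance)
```

Each update costs constant time and memory, so the values never need to be stored.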


Current Links for Stats in Python
http://r.789695.n4.nabble.com/Ornstein-Uhlenbeck-td2991060.html

http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/

http://blog.yhathq.com/posts/estimating-user-lifetimes-with-pymc.html

http://robjhyndman.com/hyndsight/crossvalidation/