Simple Summary Statistics


Indicate the general size of the values

  • Mean: sensitive to extreme values
  • Median
  • Mode
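
As a small illustration of the three measures above, here is a sketch using Python's standard `statistics` module. The data are made up for this example (an assumption, not from the notes) and include one extreme value, so the mean is pulled upward while the median and mode are not.

```python
import statistics

# Made-up data; 100 is a deliberately extreme value.
values = [2, 3, 3, 4, 5, 100]

mean = statistics.mean(values)      # sum of the values divided by their count
median = statistics.median(values)  # middle of the sorted values
mode = statistics.mode(values)      # most frequent value

print(mean, median, mode)  # 19.5 3.5 3
```

Dropping the extreme value would move the mean to 3.4 but leave the median and mode at 3.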


How different are the values in the data set from each other

  • Range: difference between the largest and smallest values in the data set (ignores most of the data)
  • Variance (mean squared deviation)
  • Standard deviation: square root of variance
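
The three measures of spread above can be sketched with the standard `statistics` module. The data are invented for illustration. Note the assumption that "mean squared deviation" means dividing by the number of values (the population variance, `pvariance`); the sample variance, `statistics.variance`, divides by one less than that.

```python
import statistics

# Invented data with mean 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(values) - min(values)  # uses only the two extreme values
variance = statistics.pvariance(values)  # mean of the squared deviations from the mean
std_dev = statistics.pstdev(values)      # square root of the variance

print(value_range, variance, std_dev)
```

Here the range is 7, the variance is 4 and the standard deviation is 2, which is back in the same units as the data.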


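Shape of the distribution:

  • Right skewed: long tail toward the larger values
  • Left skewed: long tail toward the smaller values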


Quantiles:

  • Deciles: divide the sorted data into tenths
  • Percentiles: divide the sorted data into hundredths
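
A minimal sketch of computing these cut points with `statistics.quantiles` (Python 3.8+). The data set, the integers 1 to 99, is an assumption chosen so the answers are easy to check by eye; other interpolation methods can give slightly different cut points on real data.

```python
import statistics

# Illustrative data: the integers 1..99.
data = list(range(1, 100))

deciles = statistics.quantiles(data, n=10)       # 9 cut points -> 10 groups
percentiles = statistics.quantiles(data, n=100)  # 99 cut points -> 100 groups

print(deciles)
print(percentiles[49])  # the 50th percentile, which is also the median
```

For this data the deciles come out as 10, 20, ..., 90 and the 50th percentile is 50.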

Collecting Good Data

Incomplete Data:

Discard the incomplete records or insert substitute values.
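
Both options can be sketched in a few lines. The readings below are invented, `None` is used as a hypothetical marker for a missing value, and filling gaps with the mean of the observed values is only one simple choice of substitute among many.

```python
import statistics

# Invented readings; None marks a missing value (an assumption of this sketch).
readings = [4.1, None, 3.9, 4.4, None, 4.0]

# Option 1: discard the incomplete entries.
complete = [r for r in readings if r is not None]

# Option 2: substitute the mean of the observed values for each gap.
fill_value = statistics.mean(complete)
imputed = [fill_value if r is None else r for r in readings]

print(len(complete), len(imputed))
```

Discarding shrinks the data set from six values to four, while substitution keeps all six at the cost of inventing values that were never measured.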

Incorrect Data:

Errors arise when reading instruments or recording values.

Error propagation


Observational versus Experimental Data:

  • Observational: the researcher cannot interfere or intervene in the process that generates the data
  • Experimental: the researcher manipulates the objects in some way (effective at sorting out what causes what)


Experimental Design:

double blind, factorial

Survey Sampling:


  • Law of large numbers
  • Central limit theorem


The Essence of Chance

The world is full of uncertainty. Law of large numbers: as the number of trials grows, the observed proportion gets closer and closer to a particular value.
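
The law of large numbers can be seen in a quick simulation. This is a sketch under stated assumptions: a fair coin, simulated with Python's `random` module, and a helper `proportion_of_heads` that is a name made up for this example.

```python
import random

def proportion_of_heads(num_tosses, rng):
    """Toss a simulated fair coin num_tosses times; return the fraction of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (10, 1_000, 100_000):
    print(n, proportion_of_heads(n, rng))
```

With few tosses the proportion can be far from 0.5; with many tosses it typically settles very close to it.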

Understand Probability

degree of belief:

  • 1: certain
  • 0: impossible
  • Between 0 and 1: probability of happening

subjective/personal probability: depends on who is assessing the probability

frequentist interpretation: probability is the proportion of times an event occurs in a long run of repeated trials (frequencies/counts => probability)

classical approach: all events are composed of a collection of equally likely elementary events

Law of Chance


joint probability: the probability that two events will both occur

conditional probability: the probability that an event will happen given that another one has occurred

Bayes's theorem: relates the two conditional probabilities P(B|A) and P(A|B)


P(B|A)=P(A|B)*P(B) / ( P(A|B)*P(B)+P(A|~B)*P(~B) )
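
A worked example of the formula above, as a sketch. The function name `posterior` and all the numbers are assumptions made up for illustration: B is "has a condition" with P(B) = 0.01, A is "tests positive", with P(A|B) = 0.95 and P(A|~B) = 0.05.

```python
def posterior(p_a_given_b, p_b, p_a_given_not_b):
    """Return P(B|A) from P(A|B), P(B) and P(A|~B) using Bayes's theorem."""
    p_not_b = 1 - p_b
    numerator = p_a_given_b * p_b
    return numerator / (numerator + p_a_given_not_b * p_not_b)

# Hypothetical numbers: a rare condition and a fairly accurate test.
p_b_given_a = posterior(p_a_given_b=0.95, p_b=0.01, p_a_given_not_b=0.05)
print(round(p_b_given_a, 3))  # 0.161
```

Even though P(A|B) is 0.95, P(B|A) is only about 0.16 here, because B is rare to begin with. The two conditional probabilities are not interchangeable.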

Random Variables and Their Distributions

Sample: subset of the complete 'population' of values

Random Variables: e.g. outcome of a throw of a die

Describe Distribution:

  • cumulative probability distribution
  • probability density curve

Discrete Random Variables:

  • Bernoulli distribution: the outcome of a single toss of a coin
  • Binomial distribution: the number of heads in 100 tosses of a coin
  • Poisson distribution: the number of emails arriving at my computer in a given period (no upper limit)
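
The three discrete distributions above can be written down directly from their standard probability formulas. This is a sketch using only the `math` module (Python 3.8+ for `math.comb`); the function names and the average rate of 3 emails per period are assumptions for illustration.

```python
import math

def bernoulli_pmf(k, p):
    """P(X = k) for one trial with success probability p; k is 0 or 1."""
    return p if k == 1 else 1 - p

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, rate):
    """P(exactly k events) when events occur at the given average rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)

print(bernoulli_pmf(1, 0.5))       # heads on one fair toss
print(binomial_pmf(50, 100, 0.5))  # exactly 50 heads in 100 fair tosses, about 0.08
print(poisson_pmf(0, 3.0))         # no emails in a period that averages 3, about 0.05
```

The binomial count can never exceed the number of tosses, whereas the Poisson formula gives a positive (if tiny) probability to every count, which is the "no upper limit" point.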

Uniform Distribution: the random variable can take values only within some finite interval, and every value in that interval is equally likely (the postman arrives between 10am and 11am in a totally unpredictable way)

Exponential Distribution: lifetimes of glass vases

Normal/Gaussian Distribution: bell shaped

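Central Limit Theorem: the distribution of the mean of a sample approaches a normal distribution as the sample size grows, whatever the distribution of the individual values (provided it has finite variance). Its spread shrinks as the sample size grows, so the larger the samples we take, the better the estimate we make.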
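
A small simulation illustrates how the mean of a sample behaves as the sample grows. It is a sketch under assumptions: the individual values are uniform draws on [0, 1), and `sample_means` is a helper named for this example. The code reports only the centre and spread of the sample means; a histogram of them would be needed to see the bell shape.

```python
import random
import statistics

def sample_means(sample_size, num_samples, rng):
    """Return the means of num_samples samples, each of sample_size uniform draws."""
    return [
        statistics.fmean(rng.random() for _ in range(sample_size))
        for _ in range(num_samples)
    ]

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (1, 5, 30):
    means = sample_means(n, 5_000, rng)
    # The centre stays near 0.5 while the spread shrinks as n grows.
    print(n, round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```

The spread of the sample means should be close to the standard deviation of a single draw (about 0.289 for this uniform distribution) divided by the square root of the sample size.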



