Empirical Stuff

How to get started (thanks to Princeton University, "Data Analysis Notes: Links and General Guidelines")

(Scroll down if the 101 material is too easy for you and you want the datasets I have been working with.)

A very general guideline

Once you have defined the question and, hopefully, have a clear idea of what you want to know, you can apply the statistical technique suited to your data.

First, you need to answer two questions:

  1. What is your dependent variable?
  2. What is (are) your independent variable(s)?

There is no straightforward answer as to which technique you should use for your data. Three factors play a role:

  1. Your theory
  2. Your data
  3. Your knowledge of the topic

For practical purposes, the statistical technique you choose will depend mostly on the type of your dependent variable. See http://www.ats.ucla.edu/stat/mult_pkg/whatstat/default.htm for the types of analysis that suit different types of dependent variables.

In general, if your dependent variable is:

  1. Dichotomous (0, 1 / male, female): use logit or probit. Logit is the more commonly used of the two.
  2. Ordered (1, 2, 3, 4 / bad, not so bad, not so good, good), going from low to high or from negative to positive: use ordered logit (or ordered probit).
  3. Unordered categories (1, 2, 3 / democrat, independent, republican): use multinomial logit.
  4. Continuous (1, 1.01, 1.02, ...): use regression (simple or multiple).
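The rule of thumb above can be written down as a small lookup. This is only a sketch: the function name and the dictionary keys are made up for illustration, and the Stata command names (logit, probit, ologit, oprobit, mlogit, regress) are the usual commands for these models rather than something taken from the notes above.

```python
# Rule-of-thumb lookup: type of dependent variable -> suggested technique.
# Keys and function name are illustrative labels, not a standard API.
SUGGESTED_TECHNIQUE = {
    "dichotomous": {"technique": "logit or probit", "stata": ("logit", "probit")},
    "ordered": {"technique": "ordered logit or ordered probit", "stata": ("ologit", "oprobit")},
    "categorical": {"technique": "multinomial logit", "stata": ("mlogit",)},
    "continuous": {"technique": "regression", "stata": ("regress",)},
}


def suggest_technique(dependent_variable_type):
    """Return the suggested technique for a given dependent-variable type."""
    key = dependent_variable_type.strip().lower()
    if key not in SUGGESTED_TECHNIQUE:
        raise ValueError(f"unknown dependent variable type: {dependent_variable_type!r}")
    return SUGGESTED_TECHNIQUE[key]


if __name__ == "__main__":
    for kind in SUGGESTED_TECHNIQUE:
        choice = suggest_technique(kind)
        print(f"{kind}: {choice['technique']} (Stata: {', '.join(choice['stata'])})")
```

Treat the result as a starting point only; your theory and your data still decide the final model.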

Other things to consider:

  1. Is your data organized by groups or entities (panel data, cross-sectional data)?
  2. What about time (years, months, days, quarters, etc.)?

If one or both of these apply, you may need to control for variables that vary across time but not across entities (like public policies) or for variables that vary across entities but not over time (like cultural factors).
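One common way to handle variables that differ across entities but stay constant over time is to subtract each entity's own mean from every observation (often called the "within" or fixed-effects transformation). The notes above do not name a specific method, so take this as an assumption about what "controlling" could look like. The tiny panel, the variable names and the function name are all hypothetical.

```python
from collections import defaultdict


def demean_within_entity(rows, entity_key, value_key):
    """Subtract each entity's own mean from its values (within transformation)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[entity_key]] += row[value_key]
        counts[row[entity_key]] += 1
    means = {entity: totals[entity] / counts[entity] for entity in totals}
    return [row[value_key] - means[row[entity_key]] for row in rows]


# Hypothetical panel: two countries observed in two years.
# "culture" varies across countries but not over time.
panel = [
    {"country": "A", "year": 2000, "culture": 3.0, "income": 10.0},
    {"country": "A", "year": 2001, "culture": 3.0, "income": 12.0},
    {"country": "B", "year": 2000, "culture": 7.0, "income": 20.0},
    {"country": "B", "year": 2001, "culture": 7.0, "income": 26.0},
]

culture_within = demean_within_entity(panel, "country", "culture")
income_within = demean_within_entity(panel, "country", "income")

if __name__ == "__main__":
    print(culture_within)  # the time-constant variable is wiped out
    print(income_within)   # only the within-country variation remains
```

After demeaning, anything that is constant within a country drops out, which is why such factors no longer bias the estimates. The same idea applied to time periods instead of entities handles variables that change over time but are common to all entities.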

Once you have defined your dependent and independent variables, you can start exploring the relationships between them. To do this:

  1. Create a correlation matrix for all variables. This gives you an idea of the nature of the relationship not only between the dependent and independent variables but also among the latter. (In Stata, type spearman [list of variables], star(0.05) or pwcorr [list of variables], sig. Type help spearman or help pwcorr for more details.)
  2. Create a scatter plot of the dependent variable against each of the independent variables. (In Stata, type scatter [dep. var] [indep. var]. Type help scatter for more options, or visit the DSS help or training pages for examples: http://dss.princeton.edu/training/ or the general DSS help pages.)
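If you do not have Stata at hand, the correlation-matrix step can be sketched in plain Python. The example below computes Spearman rank correlations (Pearson correlation applied to ranks, with ties given their average rank). It does not compute the significance stars that star(0.05) adds in Stata, and the function names and the three toy variables are assumptions made up for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if a variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_matrix(data):
    """Return a nested dict of Spearman correlations for every pair of variables."""
    names = list(data)
    return {a: {b: spearman(data[a], data[b]) for b in names} for a in names}


# Hypothetical data: one dependent variable (y) and two independent variables.
data = {
    "y": [1, 4, 9, 16, 25],
    "x1": [1, 2, 3, 4, 5],
    "x2": [5, 3, 4, 1, 2],
}

if __name__ == "__main__":
    matrix = correlation_matrix(data)
    for name, row in matrix.items():
        print(name, {other: round(value, 2) for other, value in row.items()})
```

Because it works on ranks, the coefficient picks up any monotonic relationship (here y rises with x1 even though the relationship is not linear). A scatter plot is still worth drawing, since a single number can hide the shape of the relationship.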