
Sarah Soendergaard. Dealing with heteroskedasticity with the regress command. 17 Oct. However, when I check the model assumptions, heteroskedasticity appears as a consequence of differences between genders (cf. Stata paste-in I). Thus, I need to account for the heteroskedasticity somehow. I have considered -hetregress- (cf. Stata paste-in II), but I am uncertain how to check whether using this model reduces or eliminates the effect of heteroskedasticity. The Stata manual refers to the Wald test for heteroskedasticity, but does not contain information on its interpretation (my take is that heteroskedasticity is still present).

Thanks in advance. Best, Sarah.

Clyde Schechter. The use of -hetregress- does not eliminate heteroscedasticity. Rather, it fits a model that does not require homoscedasticity as an assumption. It is a different regression model altogether, with a different assumption about the distribution of the residuals.

Carlo Lazzaro. Sarah: welcome to this forum. As an aside to Clyde's helpful reply, you may want to consider logging the regressand of your OLS and checking whether the model still suffers from heteroskedasticity. Another recipe would be to include other predictors, if available, and check as above. If these fixes do not change the situation, you can invoke -robust- standard errors. By the way, you do not say whether -estat ovtest- has been performed and what outcome it gave you back: an incorrect model specification is far more serious than heteroskedasticity.

Dear Clyde and Carlo, thanks for the fast replies - it is really appreciated and helpful. Clyde - just to be entirely sure - are there any severe functional differences between the regress and hetregress commands that you think I need to be aware of, other than hetregress not requiring homoscedasticity?

In other words, is hetregress more or less a regress model without the need for homoscedasticity? I have read the information regarding the model provided by Stata, and have not come across anything that has caught my eye, but that may come down to my somewhat limited experience with statistical linguistics.

Carlo - If I have to be honest, I am not really sure I understand your proposal correctly.

The Gauss-Markov theorem Of all linear unbiased estimators, the OLS estimator has the smallest variance. Stated another way, any other linear unbiased estimator will have a larger variance than the OLS estimator. The Gauss-Markov assumptions are as follows. Here is a useful identity. We will be using all of the following distributions throughout the semester.

If we express X as a standard score, z, then z has a standard normal distribution, because the mean of z is zero and its standard deviation is one. We can generate a normal variable with many observations as follows. Several of these distributions have associated with them so-called degrees of freedom, which count the number of terms in the summation.
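Since the Stata session itself is not reproduced here, a quick Python sketch (with an arbitrary mean and standard deviation) shows the same idea: draw a normal variable, standardize it, and check that the z-scores have mean zero and standard deviation one.

```python
import random
import statistics

random.seed(1)

# Draw a large sample from a normal distribution with mean 10 and sd 2,
# then standardize each value into a z-score.
x = [random.gauss(10, 2) for _ in range(100_000)]
mu, sigma = statistics.mean(x), statistics.pstdev(x)
z = [(xi - mu) / sigma for xi in x]

# Standardizing by the sample mean and sd makes these (essentially) exact.
print(round(statistics.mean(z), 4))    # approximately 0
print(round(statistics.pstdev(z), 4))  # approximately 1
```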

Chi-square distribution If z is distributed standard normal, then its square is distributed as chi-square with one degree of freedom. Since chi-square is a distribution of squared values and, being a probability distribution, the area under the curve must sum to one, it has to be a skewed distribution of positive values, asymptotic to the horizontal axis.

Here is the graph. We can generate a chi-square variable by squaring a bunch of normal variables and summing. F distribution The F is the ratio of two chi-square quantities, each divided by its degrees of freedom, so it is positive and therefore skewed, like the chi-square. Here are graphs of some typical F distributions.

We can create a variable with an F distribution as follows: generate five standard normal variables, then sum their squares to make a chi-square variable v with five degrees of freedom; do the same with ten standard normals to make a chi-square variable w with ten degrees of freedom; finally, compute (v/5)/(w/10) to make a variable distributed as F with 5 degrees of freedom in the numerator and 10 degrees of freedom in the denominator.
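The same construction can be sketched in Python (simulated draws; the degrees of freedom 5 and 10 follow the example in the text):

```python
import random

random.seed(2)

def chi2(df):
    # The sum of df squared standard normals is chi-square with df degrees of freedom.
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))

# F(5, 10) = (chi-square(5)/5) / (chi-square(10)/10)
draws = [(chi2(5) / 5) / (chi2(10) / 10) for _ in range(50_000)]

# The mean of F(d1, d2) is d2/(d2 - 2) for d2 > 2, here 10/8 = 1.25.
mean = sum(draws) / len(draws)
print(round(mean, 2))  # close to 1.25
```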

t distribution It can be positive or negative and is symmetric, like the normal distribution. We can create a variable with a t distribution by dividing a standard normal variable by the square root of a chi-square variable divided by its degrees of freedom. See 10 above. If we want to actually use this distribution to make inferences, we need to make an assumption concerning the form of the function.
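A Python sketch of the same recipe (df = 10 is an arbitrary choice):

```python
import random
import statistics

random.seed(3)

def chi2(df):
    # Sum of df squared standard normals.
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))

# t with df degrees of freedom: z / sqrt(chi-square(df) / df)
df = 10
draws = [random.gauss(0, 1) / (chi2(df) / df) ** 0.5 for _ in range(50_000)]

# Symmetric around zero, with variance df/(df - 2) = 1.25 for df = 10.
print(round(statistics.mean(draws), 2))      # close to 0
print(round(statistics.variance(draws), 2))  # close to 1.25
```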

It is equal to the error sum of squares divided by the degrees of freedom, N-k, and is also known as the mean square error. By far the most common test is the significance test, namely that X and Y are unrelated. Degrees of freedom Pearson, and a number of other statisticians, kept requiring ad hoc adjustments to their chi-square applications.

Sir Ronald Fisher solved the problem and called the parameter degrees of freedom. Suppose we have a data set with N observations. We can use the data in two ways: to estimate the parameters or to estimate the variance. To take a simple example, suppose we have three observations whose sum is 6, so the mean is 2. Once we know the mean, we can find any data point knowing only the other two. That is, we used up one degree of freedom to estimate the mean and we only have two left to estimate the variance.

To take another example, suppose we are estimating a simple regression model with ordinary least squares. We estimate two parameters, the intercept and the slope. This leaves the remaining N-2 data points to estimate the variance. We can also look at the regression problem this way: it takes two points to determine a straight line. If we only have two points, we can solve for the slope and intercept, but this is a mathematical problem, not a statistical one.

With three points, there is one degree of freedom left that could be used to estimate the variance. Our simple rule is that the number of degrees of freedom is the sample size minus the number of parameters estimated. In the example of the mean, we have one parameter to estimate, yielding N-1 data points to estimate the variance.

To prove this, we have to demonstrate that dividing by N-1 yields an unbiased estimator of the variance of X, and that, therefore, dividing by N yields a biased estimator. In the regression case, we only have N-2 independent observations on which to base our estimate of the variance, so we divide by N-2 when we take the average. The new summation could not be larger than the original summation, which is the variance of x, because we are omitting some positive, squared terms.

Law of Large Numbers The law of large numbers states that if a situation is repeated again and again, the proportion of successful outcomes will tend to approach the constant probability that any one of the outcomes will be a success.

For example, suppose we have been challenged to guess how many thumbtacks will end up face down if a boxful is dropped off a table. How would we prepare for such a challenge? Toss a thumbtack into the air many times and record how many times it ends up face down. Divide that by the number of tosses and you have the proportion.

Multiply by the number of thumbtacks dropped and that is your guess. Suppose we are presented with a large box full of black and white balls. We want to know the proportion of black balls, but we are not allowed to look inside the box. What do we do? Answer: sample with replacement. The number of black balls that we draw, divided by the total number of balls drawn, will give us an estimate.
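The balls-in-a-box experiment can be sketched in Python (the box composition here is made up):

```python
import random

random.seed(4)

# A box whose proportion of black balls we pretend not to know;
# we estimate it by sampling with replacement.
box = ["black"] * 30 + ["white"] * 70   # true proportion of black = 0.30

def estimate(n_draws):
    # Proportion of black balls in n_draws draws with replacement.
    return sum(random.choice(box) == "black" for _ in range(n_draws)) / n_draws

# By the law of large numbers, the estimate improves as the draws grow.
for n in (10, 1_000, 100_000):
    print(n, round(estimate(n), 3))
```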

The more balls we draw, the better the estimate. Mathematically, the theorem says: let X1, X2, ... be a sequence of independent, identically distributed random variables with a finite common mean; then the mean of the first N of them converges to that common mean as N grows.

Central Limit Theorem This is arguably the most amazing theorem in mathematics. Let X-bar be the mean of a random sample of size N from f(x).

The incredible part of this theorem is that no restriction is placed on the distribution of x. No matter how x is distributed (as long as it has a finite variance), the sample mean of a large sample will be distributed normally. It used to be that a sample of 30 was thought to be enough; nowadays we usually require more. Infinity comes quickly for the normal distribution. The CLT is the reason that the normal distribution is so important.

The theorem was first proved by DeMoivre but was promptly forgotten. It was resurrected by Laplace, who derived the normal approximation to the binomial distribution, but it was still mostly ignored until Lyapunov generalized it and showed how it worked mathematically. Nowadays, the central limit theorem is considered to be the unofficial sovereign of probability theory. Sir Francis Galton wrote in Natural Inheritance, I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the "Law of Frequency of Error".

The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement, amidst the wildest confusion. The huger the mob, and the greater the apparent anarchy, the more perfect is its sway.

It is the supreme law of Unreason. Whenever a large sample of chaotic elements are taken in hand and marshaled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to have been latent all along. We cannot prove the theorem here, but we can demonstrate it for particular cases using Monte Carlo methods. For example, the binomial distribution is decidedly non-normal, since values must be either zero or one. However, according to the theorem, the means of a large number of samples will be distributed normally.

Here is the histogram of the binomial distribution with an equal probability of generating a one or a zero. I then computed the mean and variance of the resulting observations (Stata summary table omitted). The histogram of the resulting data follows. Next, consider the distribution of the population of cities in the United States above a certain size cutoff. There are very few very large cities and a great number of relatively small cities.
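The binomial Monte Carlo described above can be sketched in Python (the sample size and number of replications are arbitrary choices):

```python
import random
import statistics

random.seed(5)

# The 0/1 binomial distribution is decidedly non-normal, but the means of
# repeated samples pile up around p = 0.5 with sd sqrt(p(1-p)/N).
N = 100          # observations per sample
reps = 10_000    # number of sample means
means = [statistics.mean(random.randint(0, 1) for _ in range(N))
         for _ in range(reps)]

print(round(statistics.mean(means), 3))    # close to 0.5
print(round(statistics.pstdev(means), 3))  # close to sqrt(0.25/100) = 0.05
```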

Suppose we treat this distribution as the population. I then computed the mean and standard deviation of the resulting data. According to the Central Limit Theorem, the z-scores should have a zero mean and unit variance. Here is the histogram.

Maximum likelihood The sample consists of N observations. The likelihood of observing the sample is the probability of observing Y1 and Y2 and so on, multiplied together. Under normally distributed errors, maximizing this likelihood is nothing more than ordinary least squares; therefore, OLS is also a maximum likelihood estimator. As in the F-test, if a restriction is false, we expect that the residual sum of squares (ESS) for the restricted model will be higher than the ESS for the unrestricted model.

This means that the likelihood function will be smaller (not maximized) if the restriction is false. If the hypothesis is true, then the two values of ESS will be approximately equal and the two values of the likelihood function will be approximately equal.

To do this test, run two regressions, one with the restriction, saving the residual sum of squares, ESS R, and one without the restriction, saving ESS U. Like all large-sample tests, its significance in small samples is not well known. Usually we just assume that the small-sample significance is about the same as it would be in a large sample.

Multiple regression and instrumental variables Suppose we have a model with two explanatory variables. We could derive the least squares estimators for this multiple regression directly. However, there is another approach, known as instrumental variables, that is both illustrative and easier.

Since we do not observe ui, we have to make the assumptions operational. We therefore define the following empirical analogs of A1 and A2. Divide both sides by N. Now we can work in deviations. Even though these equations look complicated, with all the summations, the summations are just numbers derived from the sample observations on x, y, and z.

There are a variety of ways to solve two equations in two unknowns; back substitution from high-school algebra will work just fine. We might be able to make some sense out of these formulas if we translate them into simple regression coefficients. Perfect collinearity X and Y are said to be collinear when one is an exact function of the other, with no error term.
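A Python sketch of solving the two normal equations in deviations (the data-generating process and coefficients are made up for illustration):

```python
import random

random.seed(6)

# Generate data from y = 2 + 3x + 5z + u and recover the two slopes by
# solving the two normal equations in deviations from means (Cramer's rule,
# equivalent to back substitution).
n = 10_000
x = [random.gauss(0, 1) for _ in range(n)]
z = [0.5 * xi + random.gauss(0, 1) for xi in x]   # correlated with x, not collinear
y = [2 + 3 * xi + 5 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

def dev(v):
    # Deviations from the mean.
    m = sum(v) / len(v)
    return [vi - m for vi in v]

dx, dz, dy = dev(x), dev(z), dev(y)
Sxx = sum(a * a for a in dx)
Szz = sum(a * a for a in dz)
Sxz = sum(a * b for a, b in zip(dx, dz))
Sxy = sum(a * b for a, b in zip(dx, dy))
Szy = sum(a * b for a, b in zip(dz, dy))

# Normal equations: b1*Sxx + b2*Sxz = Sxy and b1*Sxz + b2*Szz = Szy.
det = Sxx * Szz - Sxz ** 2
b1 = (Sxy * Szz - Szy * Sxz) / det
b2 = (Szy * Sxx - Sxy * Sxz) / det
print(round(b1, 2), round(b2, 2))   # close to 3 and 5
```

Note that the determinant `det` goes to zero as x and z approach perfect collinearity, which is exactly why the estimator breaks down in that case.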

This could happen if X is total wages and Y is total income minus non-wage payments. If X and Z are perfectly collinear, then whenever X changes, Z has to change. It is therefore impossible to hold Z constant to find the separate effect of X; Stata will simply drop Z from the regression and estimate a simple regression of Y on X. The second term, rzy times rxz, is the effect of X on Y through Z. In other words, X is correlated with Z, so when X varies, Z varies.

This causes Y to change, because Y is correlated with Z. The multiple regression coefficient therefore measures the effect of X on Y while statistically holding Z constant, the so-called partial effect of X on Y: it partials out the effect of Z. If X and Z are uncorrelated, the multiple regression estimators collapse to simple regression estimators. Suppose we type the following data into the data editor in Stata. The problem is that we have an omitted variable, temperature.

The coefficients are not significant, probably because we only have eight observations. Nevertheless, remember that omitting an important explanatory variable can bias your estimates of the included variables. The omitted variable theorem The reason we do multiple regressions is that most things in economics are functions of more than one variable. If we make a mistake and leave one of the important variables out, we cause the remaining coefficients to be biased and inconsistent.

The bias arises when the omitted variable is correlated with the included explanatory variables, so that the included variables are no longer uncorrelated with the error term. This was the case with rainfall and temperature in our crop yield example in Chapter 7. If there are more omitted variables, add more equations. In the Data Editor, create the following variable. These numbers are exact because there is no random error in the model. You can see that the theorem works: both estimates are biased upward by the omitted variables.
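The omitted variable theorem can be checked by simulation; a Python sketch (all coefficients made up):

```python
import random

random.seed(7)

# Omitted variable theorem: if the true model is y = b0 + b1*x + b2*z + u
# and we omit z, the short-regression slope converges to b1 + b2*d, where d
# is the slope from regressing z on x. Here b1 = 2, b2 = 3, d = 0.5, so the
# biased slope should be near 2 + 3*0.5 = 3.5.
n = 50_000
x = [random.gauss(0, 1) for _ in range(n)]
z = [0.5 * xi + random.gauss(0, 1) for xi in x]
y = [1 + 2 * xi + 3 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

def slope(xs, ys):
    # Simple OLS slope of ys on xs.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

print(round(slope(x, y), 2))   # biased: close to 3.5, not the true 2
```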

Target and control variables: how many regressors? When we estimate a regression model, we usually have one or two parameters that we are primarily interested in. These variables associated with those parameters are called target variables. In a demand curve we are typically concerned with the coefficients on price and income.

These coefficients tell us whether the demand curve is downward sloping and whether the good is normal or inferior. In a demand-for-money equation, we are primarily interested in the coefficient on the interest rate. In a policy study, the target variable is frequently a dummy variable (see below) that is equal to one if the policy is in effect and zero otherwise. To get a good estimate of the coefficient on the target variable or variables, we want to avoid omitted variable bias. To that end we include a list of control variables.

For example, if we are studying the effect of a three-strikes law on crime, our target variable is the three-strikes dummy. We include in the crime equation all those variables that might cause crime aside from the three-strikes law. What happens if, in our attempt to avoid omitted variable bias, we include too many control variables?

Instead of omitting relevant variables, suppose we include irrelevant ones. The estimates remain unbiased, but they are inefficient relative to estimates including only relevant variables. Including irrelevant variables will also increase the standard errors and depress the t-ratios on all the coefficients in the model, including those on the target variables. Thus, including too many control variables will tend to make the target variables appear insignificant, even when they are truly significant.

We can summarize the effect of too many or too few variables as follows. Omitting a relevant variable biases the coefficients on all the remaining variables, but decreases their variance (increases their efficiency). Discarding a variable whose true coefficient is less than its true theoretical standard error decreases the mean square error (the sum of the variance plus the bias squared) of all the remaining coefficients.

What is the best practice? I recommend the general-to-specific modeling strategy. After doing your library research and reviewing all previous studies on the issue, you will have compiled a list of all the control variables that previous researchers have used. You may also come up with some new control variables. Start with a general model, including all the potentially relevant controls, and remove the insignificant ones. Use t-tests and F-tests to justify your actions.

You can proceed sequentially, dropping one or two variables at a time, if that is convenient. After you get down to a parsimonious model including only significant control variables, do one more F-test to make sure that you can go from the general model to the final model in one step.

Sets of dummy variables should be treated as groups, including all or none for each group, so that you might have some insignificant controls in the final model, but not a lot. At this point you should be able to do valid hypothesis tests concerning the coefficients on the target variables. Proxy variables It frequently happens that researchers face a dilemma. Data on a potentially important control variable is not available. However, we may be able to obtain data on a variable that is known or suspected to be highly correlated with the unavailable variable.

Such variables are known as proxy variables, or proxies. The dilemma is this: if we omit the proxy we get omitted variable bias; if we include the proxy we get measurement error. As we see in a later chapter, measurement error causes biased and inconsistent estimates, but so does omitting a relevant variable.

What is a researcher to do? Monte Carlo studies have shown that the bias tends to be smaller if we include a proxy than if we omit the variable entirely. However, the bias that results from including a proxy is inversely related to how highly correlated the proxy is with the unavailable variable.

It is better to omit a poor proxy. It might be possible, in some cases, to see how the two variables are related in other contexts (in other studies, for example), but generally we just have to hope. The bottom line is that we should include proxies for control variables, but drop them if they are not significant.

An interesting problem arises if the proxy variable is the target variable. In that case, we are stuck with measurement error. The variable we care about, X, is unavailable, so we use Z, which is available and is related to X. Unless we know exactly how Z is related to X, we have no measure of the magnitude of the effect of X on Y. However, we do know whether the coefficient on Z is significant and whether its sign is as expected.

Therefore, if the target variable is a proxy variable, the estimated coefficient can only be used to determine sign and significance. An illustration of this problem occurs in studies of the relationship between guns and crime. There is no good measure of the number of guns, so researchers have to use proxies.

As we have just seen, the coefficient on the proxy for guns cannot be used to make inferences concerning the elasticity of crime with respect to guns. Nevertheless two studies, one by Cook and Ludwig and one by Duggan, both make this mistake.

Dummy variables Dummy variables are also known as binary variables. They are extremely useful. For example, I happen to have a data set consisting of the salaries of the faculty of a certain nameless university (salaries).

The average salary at the time of the survey was as follows. We can then find the average female salary; the average male salary is somewhat higher. We can use the scalar command to find the difference in salary between males and females. Is this difference significant, given the variance in salary? It is somewhat more elegant to use regression analysis with dummy variables to achieve the same goal. The t-ratio on female tests the null hypothesis that this salary difference is equal to zero.
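The equivalence between the dummy-variable regression and the difference in group means can be sketched in Python (the eight salary figures are hypothetical, not the survey data):

```python
import statistics

# Hypothetical salaries: the OLS slope on a 0/1 female dummy equals the
# difference between the two group means, and the intercept equals the
# mean for the omitted (male) group.
salary = [52, 60, 55, 48, 45, 50, 43, 47]
female = [0,  0,  0,  0,  1,  1,  1,  1]

male_mean = statistics.mean(s for s, f in zip(salary, female) if f == 0)
female_mean = statistics.mean(s for s, f in zip(salary, female) if f == 1)

# Simple OLS slope and intercept of salary on the dummy.
mx, my = statistics.mean(female), statistics.mean(salary)
b = (sum((f - mx) * (s - my) for s, f in zip(salary, female))
     / sum((f - mx) ** 2 for f in female))
a = my - b * mx

print(a, b)                               # 53.75 -7.5
print(male_mean, female_mean - male_mean) # 53.75 -7.5
```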

These are exactly the same results we got using the standard t-test of the difference between two means. So, there appears to be significant salary discrimination against women at this university. If we include dummies for both male and female along with an intercept, the regression fails; this is the dummy variable trap. When one dummy is dropped, we get the same regression we got with male as the only independent variable. It is possible to force Stata to drop the intercept term instead of one of the other variables, but this formulation is less useful because it is more difficult to test whether the two salaries differ.

Perhaps this salary difference is due to a difference in the amount of experience of the two groups. Maybe the men have more experience, and that is what is causing the apparent salary discrimination. If so, the previous analysis suffers from omitted variable bias. Controlling for experience, however, the coefficient on female is still significant and negative.

It is also possible that women receive lower raises than men do. We can test this hypothesis by creating an interaction variable by multiplying experience by female and including it as an additional regressor. There is a significant salary penalty associated with being female, but it is not caused by discrimination in raises. Useful tests F-test We have seen many applications of the t-test. However, the F-test is also extremely useful.

Suppose we want to test the null hypothesis that the coefficients on female and fexp are jointly equal to zero. The way to test this hypothesis is to run the regression as it appears above, with both female and fexp included, and note the residual sum of squares. Then run the regression again without the two variables (assuming the null hypothesis is true) and see if the two residual sums of squares are significantly different.

If they are, then the null hypothesis is false. If they are not significantly different, then the two variables do not help explain the variance of the dependent variable and the null hypothesis cannot be rejected. Stata allows us to do this test very easily with the test command, which is used after the regress command. Refer to the coefficients by the corresponding variable names.
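The F-ratio itself is computed from the two residual sums of squares; a Python sketch with made-up numbers:

```python
# F-statistic for q restrictions, from the restricted and unrestricted
# residual sums of squares (all numbers here are invented for illustration).
ESS_R = 1.52e11   # restricted model (variables dropped)
ESS_U = 1.43e11   # unrestricted model
q = 2             # number of restrictions (variables tested)
N, k = 1161, 4    # observations and parameters in the unrestricted model

F = ((ESS_R - ESS_U) / q) / (ESS_U / (N - k))
print(round(F, 2))  # 36.41, far beyond any conventional critical value
```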

According to the F-ratio, we can firmly reject this hypothesis. Chow test This test is frequently referred to as a Chow test, after Gregory Chow, a Princeton econometrician who developed a slightly different version. However, because the t-tests on fexp and fadmin are not significant, we know that the difference is not due to raises or to differences paid to women administrators.

I have one more set of dummy variables to try: a dummy for each department. Maybe women tend to be over-represented in departments that pay lower salaries to everyone, males and females alike. This conclusion rests on the department dummy variables being significant as a group. We can test this hypothesis, using an F-test, with the testparm command.

The testparm command makes this easy. So there is no significant salary discrimination against women, once we control for field of study: female physicists, economists, chemists, computer scientists, etc., are paid in line with their male colleagues. Unfortunately, the same is true for female French teachers and phys ed instructors. By the way, the departments that are significantly overpaid are dept 3 (Chemistry), dept 10 (Computer Science), and dept 11 (Economics).

It is important to remember, when testing groups of dummy variables for significance, that you must drop or retain all members of the group. If the group is not significant, you should drop them all, even if one or two individual dummies are significant. If the group is significant, then all should be retained, even if only a few individual dummies are. Granger causality test Another very useful test is available only for time series.

C. Granger published an article suggesting a way to test causality. Suppose we want to know whether prison deters crime or crime drives prison populations. Granger suggested running a regression of prison on lagged prison and lagged crime. If lagged crime is significant, then crime causes prison. Then do a regression of crime on lagged crime and lagged prison.

If lagged prison is significant, then prison causes crime. I have data on crime (crmaj, major crime, the kind you get sent to prison for: murder, rape, robbery, assault, and burglary) and prison population per capita for Virginia (crimeva).

Does prison cause or deter crime? The sum of the coefficients on lagged prison is negative, so prison deters crime, provided the sum is significantly different from zero. Does crime cause prison? Apparently, prison deters crime, but crime does not cause prison, at least in Virginia. The Granger causality test is best done with control variables included in the regression, as well as the lags of the target variables.

However, if the control variables are unavailable or the analysis is just preliminary, then the lagged dependent variables will serve as proxies for the omitted control variables. J-test for non-nested hypotheses Suppose we are having a debate about crime with some sociologists.

We believe that crime is deterred by punishment, mainly imprisonment, while the sociologists think that putting people in prison does no good at all. The sociologists think that crime is caused by income inequality. We both agree that unemployment and income have an effect on crime.

How can we resolve this debate? The key is to create a supermodel with all of the variables included. We then test all the coefficients. If prison is significant and income inequality is not, then we win. If the reverse is true then the sociologists win. If neither or both are significant, then it is a tie. You can get ties in tests of non-nested hypotheses. By the way, the J in J-test is for joint hypothesis.

I have time series data for the US on major crime (murder, rape, robbery, assault, and burglary), real per capita income, the unemployment rate, and income inequality measured by the Gini coefficient (gini). The Gini coefficient is a number between zero and one: zero means perfect equality (everyone has the same level of income); one means perfect inequality (one person has all the money, everyone else has nothing).

We create the supermodel by regressing the log of major crime per capita on all these variables. It appears to be a draw. However, the coefficient on gini is negative, indicating that as the Gini coefficient goes up (more inequality), crime goes down. If there were more than one variable for each of the competing models, e.g. prison and arrests versus gini and divorce, we would test the joint significance of prison and arrests versus the joint significance of gini and divorce with F-tests.

LM test Suppose we want to test the joint hypothesis that rtpi and unrate are not significant in the above regression (even though we already know that they are highly significant). We know we can test that hypothesis with an F-test. However, an alternative is to use the LM (for Lagrange multiplier) test. The primary advantage of the LM test is that you only need to estimate the main equation once.

Under the F-test you have to estimate the model with all the variables, record the error sum of squares, then estimate the model with the variables being tested excluded, record the error sum of squares, and finally compute the F-ratio. The F-test is easy in Stata. For the LM test, estimate the restricted model, then regress its residuals on all the variables, including the ones left out. If the excluded variables are truly irrelevant, they will not be significant and the R-square for this auxiliary regression will be approximately zero. The LM statistic is N times the R-square, which is distributed as chi-square with, in this case, two degrees of freedom, because there are two restrictions (two variables left out).
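A Python sketch of the nR-square version (simulated data; regressing the residuals on just the excluded variable is a simplification, noted in the comments):

```python
import random

random.seed(10)

# LM test sketch: estimate the restricted model (y on x only), take the
# residuals, and regress them on the candidate variable that was left out;
# N times the R-squared of that auxiliary regression is asymptotically
# chi-square with one degree of freedom. Here w really is irrelevant, so
# the statistic should usually fall below the 5% critical value of 3.84.
n = 500
x = [random.gauss(0, 1) for _ in range(n)]
w = [random.gauss(0, 1) for _ in range(n)]            # truly irrelevant
y = [1 + 2 * xi + random.gauss(0, 1) for xi in x]

def ols_resid(ys, xs):
    # Residuals from a simple OLS regression of ys on xs.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = (sum((a - mx) * (c - my) for a, c in zip(xs, ys))
         / sum((a - mx) ** 2 for a in xs))
    a0 = my - b * mx
    return [c - a0 - b * a for a, c in zip(xs, ys)]

def r2_simple(ys, xs):
    # R-squared from a simple regression of ys on xs.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (c - my) for a, c in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((c - my) ** 2 for c in ys)
    return sxy ** 2 / (sxx * syy)

e = ols_resid(y, x)
# The full auxiliary regression would include x as well; since e is already
# orthogonal to x, and x and w are generated independently, this
# one-variable version is a close approximation.
LM = n * r2_simple(e, w)
print(round(LM, 2))
```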

The fact that both variables have very low t-ratios is a clue. Whenever we run regress, Stata creates a bunch of output that is available for further analysis; to see this output, use the ereturn list command. The prob-value is well above any conventional significance level.

We cannot reject the null hypothesis that the age groups are unrelated to crime. OK, so the F-test is easier. Nevertheless, we will use the LM test later, especially for diagnostic testing of regression models. One modification of this test that is easier to apply in Stata is the so-called F-form of the LM test, which corrects for degrees of freedom and therefore has somewhat better small-sample properties.

The results are also similar to the nR-square version of the test, as expected. I routinely use the F-form of the LM test because it is easy to do in Stata and corrects for degrees of freedom. Outliers and influential observations Suppose we generate two unrelated variables; we know that they are completely independent, so there should not be a significant relationship between them. It is sometimes easy to find outliers and influential observations, but not always. If you have a large data set, or lots of variables, you could easily miss an influential observation.

You do not want your conclusions to be driven by a single observation, or even a few observations. So how do we avoid this influential-observation trap? Stata has a command called dfbeta which helps you avoid this problem. It works like this: suppose we simply ran the regression with and without New Hampshire.

With NH we get a significant result and a relatively large coefficient on x; without NH we should get a smaller and insignificant coefficient. We could do this 51 times, dropping each state in turn and seeing what happens, but that would be tedious. Instead we can use matrix algebra and computer technology to achieve the same result. DFbetas Belsley, Kuh, and Welsch suggest dfbetas (scaled dfbeta) as the best measure of influence.

After regress, invoke the dfbeta command, then list the observations that might be influential. Clearly NH is influential, increasing the value of the coefficient by two standard errors. At this point it would behoove us to look more closely at the NH value to make sure it is not a typo or something.

What do we do about influential observations? We might find simple coding errors. If we have influential observations that are not coding errors, we might take logarithms (which has the advantage of squashing variance) or weight the regression to reduce their influence. If none of this works, we have to live with the results and confess the truth to the reader.

Multicollinearity We know that if one variable in a multiple regression is an exact linear combination of one or more other variables in the regression, the regression will fail.

Stata will be forced to drop the collinear variable from the model. But what happens if a regressor is not an exact linear combination of some other variable or variables, but almost? This is called multicollinearity.

It has the effect of increasing the standard errors of the OLS estimates. This means that variables that are truly significant appear to be insignificant in the collinear model. Also, the OLS estimates are unstable and fragile: small changes in the data or model can make a big difference in the regression estimates.

It would be nice to know if we have multicollinearity. Variance inflation factors Stata has a diagnostic known as the variance inflation factor, VIF, which indicates the presence of multicollinearity. The rule of thumb is that multicollinearity is a problem if the VIF is large (a common threshold is 10). Here is an example. All the VIFs are high. One variable is highly correlated with all the other variables; perhaps we should drop it and try again.

If we had not checked for multicollinearity, we might have concluded that none of the variables was related to y. So doing a VIF check is generally a good idea. However, dropping variables is not without its dangers: it could be that x4 is a relevant variable, and the result of dropping it is that the remaining coefficients are biased and inconsistent.
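For the two-regressor case the VIF has a simple closed form; a Python sketch with simulated, nearly collinear data:

```python
import random

random.seed(11)

# With two regressors, the variance inflation factor for each one is
# VIF = 1 / (1 - r^2), where r is the correlation between them.
n = 1_000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [xi + random.gauss(0, 0.1) for xi in x1]   # nearly collinear with x1

def corr(a, b):
    # Sample correlation coefficient.
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5

r = corr(x1, x2)
vif = 1 / (1 - r ** 2)
print(round(vif, 1))   # far above the rule-of-thumb threshold of 10
```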

Be careful.

Heteroskedasticity Under the identically distributed assumption, the error term ui for each observation comes from a distribution with exactly the same mean and variance; constant variance is called homoskedasticity. When that assumption is violated, the error variance differs across observations. If the data are heteroskedastic, ordinary least squares estimates are still unbiased, but they are not efficient.

This means that we have less confidence in our estimates. However, even worse, perhaps, our standard errors and t-ratios are no longer correct. Whether we over- or under-estimate our t-ratios depends on the exact kind of heteroskedasticity, but in either case we will make mistakes in our hypothesis tests. So, it would be nice to know if our data suffer from this problem.

Testing for heteroskedasticity

The first thing to do is not really a test, but simply good practice: look at the evidence.

The residuals from the OLS regression, ei, are estimates of the unknown error terms, ui, so the first thing to do is graph the residuals and see if they appear to be getting more or less spread out as Y and X increase.

Breusch-Pagan test

There are two widely used tests for heteroskedasticity. The first is due to Breusch and Pagan. It is an LM test. The basic idea is as follows. Consider the following multiple regression model. To test whether the error variance is constant, we need an estimate of the variance of the error term at each observation.

We use the OLS residual, e, as our estimate of the error term, u. Therefore our estimate of the variance of ui is ei^2, the square of the residual. If there is no heteroskedasticity, the squared residuals will be unrelated to any of the independent variables.
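The mechanics can be sketched in a few lines of Python. This is a toy illustration of the LM statistic n * R-squared from regressing the squared residuals on one suspect variable; the numbers are invented, and the chi-squared comparison is left out:

```python
# Breusch-Pagan idea in miniature: regress the squared OLS residuals
# on a suspect variable and form the LM statistic n * R^2, which is
# chi-squared (with df = number of auxiliary regressors) under
# homoskedasticity. Toy data, purely illustrative.

def r_squared(x, y):
    """R^2 of a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def breusch_pagan_lm(residuals, x):
    """LM = n * R^2 from regressing squared residuals on x."""
    e2 = [e ** 2 for e in residuals]
    return len(x) * r_squared(x, e2)

e = [1.0, -1.0, 2.0, -2.0]   # residuals that spread out with x
x = [1.0, 2.0, 3.0, 4.0]
print(round(breusch_pagan_lm(e, x), 6))   # 3.2
```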

Alternatively, we could test whether the squared residuals increase with the predicted value of y or some other suspected variable. As an example, consider the following regression of major crime per capita across 51 states (including DC) from the data set hetero. If we ignore the outlier, it looks like the variance of the residuals is getting larger with larger population.

The Breusch-Pagan test is invoked in Stata with the hettest command after regress. Population does seem to be a problem. We will consider what to do about it below.

White test

White modifies the Breusch-Pagan auxiliary regression as follows. To invoke the White test, use imtest, white. The im stands for information matrix.

The White test has a slight advantage over the Breusch-Pagan test in that it is less reliant on the assumption of normality. However, because it computes the squares and cross-products of all the variables in the regression, it sometimes runs out of space or out of degrees of freedom and will not compute.
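The degrees-of-freedom concern is easy to quantify. A small sketch (my own bookkeeping, not a Stata feature) counts the regressors in the White auxiliary regression: the k original variables plus their squares plus all pairwise cross-products.

```python
# White auxiliary regression size: k levels + k squares + k(k-1)/2
# cross-products. With many regressors this quickly eats up degrees
# of freedom.

def white_aux_regressors(k):
    """Number of terms in the White auxiliary regression for k regressors."""
    return 2 * k + k * (k - 1) // 2

print(white_aux_regressors(3))    # 9
print(white_aux_regressors(10))   # 65 -- needs well over 65 observations
```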

Remember, there is no heteroskedasticity associated with prison, metpct, or rpcpi. For these reasons, I prefer the Breusch-Pagan test.

Weighted least squares

Suppose we know what is causing the heteroskedasticity. We can then divide each observation by the known error standard deviation, which makes the transformed errors homoskedastic. Applying OLS to this weighted regression will be unbiased, consistent, and efficient relative to unweighted least squares. The problem with this approach is that you have to know the exact form of the heteroskedasticity. If you were wrong and there was no heteroskedasticity, you have created it.

You could make things worse by using WLS inappropriately. There is one case where weighted least squares is clearly appropriate: when the dependent variable is a per capita value. We want to estimate the model as closely as possible to the truth, so we divide the state totals by the state population to produce per capita values.

This does not mean that it is always correct to weight by the square root of population. After all, the error term might have some heteroskedasticity in addition to being an average. The most common practice in crime studies is to weight by population, not the square root, although some researchers weight by the square root of population.
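To see the weighting mechanics concretely, here is a minimal Python sketch of WLS for a simple regression, assuming the error standard deviation sigma_i of each observation is known (the data are invented):

```python
# Weighted least squares for a simple regression: weight each
# observation by w_i = 1 / sigma_i^2, equivalent to dividing through
# by the (assumed known) error standard deviation sigma_i. Toy data.

def wls_slope(x, y, sigma):
    w = [1.0 / s ** 2 for s in sigma]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return num / den

# If y lies exactly on the line y = 2x, any valid weighting
# recovers the slope of 2.
print(round(wls_slope([1, 2, 3], [2, 4, 6], [1, 2, 3]), 6))   # 2.0
```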

Robust standard errors and t-ratios

Since heteroskedasticity does not bias the coefficient estimates, we could simply correct the standard errors and resulting t-ratios so that our hypothesis tests are valid. We start with a simple regression model in deviations. As above, we will use the square of the residual, ei^2. We will call the corrected standard errors robust standard errors. It is easy to get robust standard errors in Stata. For example, in the crime equation above we can see the robust standard errors and t-ratios by adding the option robust to the regress command.

In fact, it is the standard methodology in crime regressions. Robust standard errors yield consistent estimates of the true standard error no matter what the form of the heteroskedasticity. Using them allows us to avoid having to discover the particular form of the problem. Finally, if there is no heteroskedasticity, we get the usual standard errors anyway.
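Here is a hedged sketch of the robust calculation for the simple-regression case in deviations. This is the basic (HC0) variant; Stata's robust option also applies a small-sample adjustment, so its numbers would differ slightly. The data are invented.

```python
# Heteroskedasticity-robust (HC0) standard error for the slope of a
# simple regression in deviations:
#   se = sqrt(sum(xd_i^2 * e_i^2)) / sum(xd_i^2)
import math

def ols_with_robust_se(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    xd = [xi - mx for xi in x]
    yd = [yi - my for yi in y]
    sxx = sum(d ** 2 for d in xd)
    b = sum(a * c for a, c in zip(xd, yd)) / sxx
    e = [c - b * a for a, c in zip(xd, yd)]   # OLS residuals
    se_robust = math.sqrt(sum((a * r) ** 2 for a, r in zip(xd, e))) / sxx
    return b, se_robust

b, se = ols_with_robust_se([1, 2, 3, 4], [1, 2, 3, 5])
print(round(b, 4), round(se, 4))   # 1.3 0.1158
```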

This is becoming standard procedure among applied researchers. If you know that a variable like population is a problem, then you should weight in addition to using robust standard errors. The existence of a problem variable will become obvious in your library research when you find that most of the previous studies weight by a certain variable.

Errors in variables

OLS estimates can also be biased and inconsistent if there are errors of measurement in the independent variable or variables.

This means that the OLS estimate is biased. In this case, the OLS estimate is biased downward, since the true value of beta is divided by a number greater than one.

Cure for errors in variables

There are only two things we can do to fix errors in variables. The first is to use the true value of x. Since this is usually not possible, we have to fall back on statistics. Suppose we know of a variable, z, that is (1) highly correlated with xT, but (2) uncorrelated with the error term.

This variable, called an instrument or instrumental variable, allows us to derive consistent (though not unbiased) estimates. The IV estimator collapses to the OLS estimator if x can serve as an instrument for itself, which would happen if x were not measured with error. Then x would be highly correlated with itself and not correlated with the error term, making it the perfect instrument. This IV estimator is consistent because z is not correlated with v.
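A toy numerical illustration of the IV idea (invented numbers; the instrument z happens to equal the true x, so it is a perfect instrument): OLS on the error-ridden x is attenuated toward zero, while the IV slope recovers the true coefficient.

```python
# IV slope in deviations: b_IV = sum(zd * yd) / sum(zd * xd),
# versus OLS: b_OLS = sum(xd * yd) / sum(xd * xd).
# True model: y = 3 * (true x); x is observed with measurement error.

def deviations(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def slope_ols(x, y):
    xd, yd = deviations(x), deviations(y)
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

def slope_iv(x, y, z):
    xd, yd, zd = deviations(x), deviations(y), deviations(z)
    return sum(a * b for a, b in zip(zd, yd)) / sum(a * b for a, b in zip(zd, xd))

z = [1.0, 2.0, 3.0, 4.0]      # instrument = the true x
y = [3.0, 6.0, 9.0, 12.0]     # y = 3 * (true x)
x = [1.5, 1.5, 2.5, 4.5]      # true x plus measurement error

print(slope_ols(x, y))        # 2.5 -- attenuated toward zero
print(slope_iv(x, y, z))      # 3.0 -- consistent
```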

Two-stage least squares

In the first stage we regress the variable with the measurement error, x, on the instrument, z, using ordinary least squares, saving the predicted values. In the second stage, we substitute the predicted values of x for the actual x and apply ordinary least squares again. The result is the consistent IV estimate above. To test for errors in variables, assume that the relationship between the instrument z and the possibly mis-measured variable, x, is linear.

We can then compare the OLS and IV estimates: if these two estimates are significantly different, it must be due to measurement error. If the test is significant, then the two parameter estimates are different and there is significant errors-in-variables bias. If it is not significant, then the two parameters are equal and there is no significant bias. We can implement this test in Stata as follows. Wooldridge has a nice example, and the data set is available on the web. The data are in the Stata data set called errorsinvars.

The dependent variable is the log of the wage earned by working women. The independent variable is education. The estimated coefficient is the return to education. However, the high wages earned by highly educated women might be due in part to greater intelligence. Brighter women will probably do both: go to college and earn high wages. So how much of the return to education is due to the college education itself? We can implement instrumental variables with ivreg.

The first-stage regression varlist is put in parentheses. In fact, education is now only marginally significant. Also, there might still be some bias, because the coefficient on the residual, w-hat, is almost significant. More research is needed.

Simultaneous equations

Another situation in which a right-hand-side variable is correlated with the error term arises in simultaneous equation models. For example, consider the simple Keynesian system. Before we go on, we need to define some terms. The above simultaneous equation system of two equations in two unknowns, C and Y, is known as the structural model.

It is the theoretical model from the textbook or journal article, with an error term added for statistical estimation. The variables C and Y are known as the endogenous variables because their values are determined within the system. I, G, and X are exogenous variables because their values are determined outside the system. This structural model has two types of equations.

The first equation the familiar consumption function is known as a behavioral equation. Behavioral equations have error terms, indicating some randomness. The second equation is an identity because it simply defines national income as the sum of all spending endogenous plus exogenous. Substituting the second equation into the first yields the value for C.

We could, if we had the data, estimate this equation with ordinary least squares. Solving the system for each endogenous variable yields the reduced form equations. They are both linear, with almost the same parameters. The reduced form parameters are functions of the parameters of the structural model.
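The algebra can be checked numerically. In the sketch below (hypothetical parameter values, not estimates), the reduced form for Y in the simple Keynesian system C = a + b*Y + u, Y = C + I + G + X is Y = (a + I + G + X + u) / (1 - b), so bumping any exogenous spending component by one unit raises Y by the multiplier 1 / (1 - b):

```python
# Reduced form for Y in the simple Keynesian system:
#   Y = (a + I + G + X + u) / (1 - b)
# where b is the marginal propensity to consume. Values are made up.

def reduced_form_Y(a, b, I, G, X, u=0.0):
    return (a + I + G + X + u) / (1.0 - b)

b = 0.8   # hypothetical marginal propensity to consume
base = reduced_form_Y(a=50, b=b, I=100, G=100, X=50)
bumped = reduced_form_Y(a=50, b=b, I=101, G=100, X=50)
print(round(bumped - base, 6))   # 5.0 -- the multiplier 1 / (1 - 0.8)
```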

Both equations have the same variables on the right-hand side, namely all of the exogenous variables. Because the reduced form equations share the same right-hand-side variables, all we need to write them down in general form is the list of endogenous variables (we have one reduced form equation for each endogenous variable) and the list of exogenous variables.

We are now in a position to show that the endogenous variable, Y, on the right-hand side of the consumption function is correlated with the error term in the consumption equation. Look at the reduced form equation for Y (RF2) above. Note that Y is a function of u. Clearly, variation in u will cause variation in Y.

Therefore, cov(Y,u) is not zero. On the other hand, OLS estimates of the reduced form equations are unbiased and consistent because the exogenous variables are taken as given and are therefore uncorrelated with the error terms, v1 and v2. In the case of the simple Keynesian model, the coefficients from the reduced form equation for Y correspond to multipliers, giving the change in Y for a one-unit change in I, G, or X.

Example: supply and demand

Consider the following structural supply and demand model in deviations.

There are two endogenous variables p and q and one exogenous variable, y. We can derive the reduced form equations by solving by back substitution. Now we solve for p. RF2 reveals that price is a function of u, so applying OLS to either the supply or demand curve will result in biased and inconsistent estimates.

However, we can again appeal to instrumental variables (2SLS) to yield consistent estimates. The good news is that the structural model itself produces the instruments. Remember, instruments are required to be highly correlated with the problem variable (p), but uncorrelated with the error term. The exogenous variables in the model satisfy these conditions. In the supply and demand model, income is correlated with price (see RF2), but independent of the error term because it is exogenous.

Suppose we choose the supply curve as our target equation. In the first stage we regress the problem variable, p, on all the exogenous variables in the model. Here there is only one exogenous variable, y. In other words, estimate the reduced form equation, RF2. Suppose we estimate the two reduced form equations above.

If we take the probability limit of the two estimators, we get the IV estimator again. Remember, consistency "carries over." But now we have two different estimates of the same structural parameter. Which one is right? Or which is more right? What is going on?

The identification problem

An equation is identified if we can derive estimates of the structural parameters from the reduced form. If we cannot, it is said to be unidentified. The supply curve was just identified in the first example, but overidentified when we added the weather variable.

It would be nice to know if the structural equation we are trying to estimate is identified. A simple rule is the so-called order condition for identification. To apply the order condition, we first have to choose the equation of interest. Let G be the number of endogenous variables in the equation of interest, including the dependent variable on the left-hand side. Let K be the number of exogenous variables that are in the structural model but excluded from the equation of interest.

To be identified, the following relationship must hold: K ≥ G - 1. If K = G - 1, the equation is just identified; if K > G - 1, it is overidentified. That is why we got two estimates from indirect least squares. Two notes concerning the order condition. First, it is a necessary, but not sufficient, condition for identification. That is, no equation that fails the order condition will be identified.
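The bookkeeping is simple enough to put in a few lines, using the definitions of G and K above (a sketch of the rule, not a Stata feature):

```python
# Order condition for identification: with G endogenous variables in
# the equation of interest and K exogenous variables excluded from it,
# the equation is unidentified if K < G - 1, just identified if
# K == G - 1, and overidentified if K > G - 1.

def order_condition(G, K):
    if K < G - 1:
        return "unidentified"
    if K == G - 1:
        return "just identified"
    return "overidentified"

# Supply-and-demand example: the supply curve q = f(p) has G = 2
# endogenous variables (q and p) and excludes the one exogenous
# variable, income, so K = 1.
print(order_condition(G=2, K=1))   # just identified
print(order_condition(G=2, K=2))   # overidentified (e.g., adding weather)
```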

However, there are rare cases where the order condition is satisfied but the equation is still not identified. Fortunately, this is not a major problem because, if we try to estimate an equation using two-stage least squares or some other instrumental variable technique, the model will simply fail to yield an estimate.

At that point we would know that the equation was unidentified. A more complicated condition, called the rank condition, is necessary and sufficient, but the order condition is usually good enough. Second, there is always the possibility that the model will be identified according to the order condition, but fail to yield an estimate because the exogenous variables excluded from the equation of interest are too highly correlated with each other. Say we require that two variables be excluded from the equation of interest and we have two exogenous variables in the model but not in the equation.

Looks good.

Illustrative example

Dennis Epple and Bennett McCallum, from Carnegie Mellon, have developed a nice example of a simultaneous equation model using real data. The demand for broiler chicken meat is assumed to be a function of the price of chicken meat, real income, and the price of beef, a close substitute.

Supply is assumed to be a function of price and the price of chicken feed (corn). Epple and McCallum have collected annual data for the US as a whole. They recommend estimating the model in logarithms. The data are available in DandS. First we try ordinary least squares on the demand curve. For reasons that will be discussed in a later chapter, we estimate the chicken demand model in first differences. This is a short-run demand equation relating the change in price to the change in consumption.

To estimate this equation in Stata, we have to tell Stata that we have a time variable with the tsset year command. We can then use the D. operator to create the first differences. We will use the noconstant option to avoid adding a time trend to the demand curve. This is not a bad estimated demand function.

The demand curve is downward sloping, but very inelastic, and the price coefficient is statistically significant. The income elasticity is positive, indicating that chicken is not an inferior good, which makes sense. The coefficient on the price of beef is positive, which is consistent with beef being a substitute. All the explanatory variables are highly significant.

Turning to the supply curve, the OLS coefficient on price is virtually zero. This would mean that the supply of chicken does not respond to changes in price. We need to use instrumental variables. The exogenous variables are lpfeed, ly, and lpbeef. Note that the demand curve is just identified by the exclusion of lpfeed. After instrumenting, the coefficient on price is positive and significant, indicating that the supply of chickens will increase in response to an increase in price. Also, increases in chicken feed prices will reduce the supply of chickens. The supply curve is overidentified by the omission of ly and lpbeef.

Note to the reader. I have taken a few liberties with the original example developed by Professors Epple and McCallum. They actually develop a slightly more sophisticated, and more believable, model, where the coefficient on price in the supply equation starts out negative and significant and eventually winds up positive and significant.

However, they have to complicate the model in a number of ways to make it work.

To use this command, simply provide the two probabilities to be used: the probability of success for group 1 is given first, then the probability of success for group 2. At this point we need to pause for a brief discussion regarding the coding of data. The dependent variable should be coded 0 and 1, and these codes must be numeric (i.e., not string). Many statistical packages, including Stata, will not perform logistic regression unless the dependent variable is coded 0 and 1.

Specifically, Stata assumes that all non-zero values of the dependent variable are 1. Therefore, if the dependent variable was coded 3 and 4, which would make it a dichotomous variable, Stata would regard all of the values as 1. This is hard-coded into Stata; there are no options to override this. If your dependent variable is coded in any way other than 0 and 1, you will need to recode it before running the logistic regression. By default, Stata predicts the probability of the event happening.

Stata has two commands for logistic regression, logit and logistic. The main difference between the two is that the former displays the coefficients and the latter displays the odds ratios. You can also obtain the odds ratios by using the logit command with the or option. Which command you use is a matter of personal preference.

Below, we discuss the relationship between the coefficients and the odds ratios and show how one can be converted into the other. However, before we discuss some examples of logistic regression, we need to take a moment to review some basic math regarding logarithms. In this web book, all logarithms will be natural logs.

The key fact is that exponentiating a logit coefficient yields the corresponding odds ratio. This is critical, as it is the relationship between the coefficients and the odds ratios. We have created some small data sets to help illustrate the relationship between the logit coefficients given in the output of the logit command and the odds ratios given in the output of the logistic command.
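The whole relationship rests on the fact that exp and ln are inverses: a logit coefficient b and its odds ratio are linked by OR = exp(b) and b = ln(OR). A quick check in Python (the coefficient 0.75 is arbitrary):

```python
# Odds ratio = exp(coefficient); coefficient = ln(odds ratio).
import math

b = 0.75                                  # an arbitrary logit coefficient
odds_ratio = math.exp(b)
print(round(math.log(odds_ratio), 10))    # 0.75 -- the round trip recovers b
print(math.exp(0.0))                      # 1.0 -- a coefficient of 0 is an OR of 1
```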

We will use the tabulate command to see how the data are distributed. We will also obtain the predicted values and graph them against x, as we would in OLS regression. We use the expand command here for ease of data entry. On each line we enter the x and y values, and for the variable cnt, we enter the number of times we want that line repeated in the data set. We use the expand command to finish creating the data set. We can see this by using the list command.

If the list command is issued by itself (i.e., without specifying any variables), all of the variables are listed. In this example, we compared the output from the logit and the logistic commands. Later in this chapter, we will use probabilities to assist with the interpretation of the findings.

Many people find probabilities easier to understand than odds ratios. You will notice that the information at the top of the two outputs is the same. The Wald test values (called z) and the p-values are the same, as are the log likelihood and the standard errors. However, the logit command gives coefficients and their confidence intervals, while the logistic command gives odds ratios and their confidence intervals.

You will also notice that the logistic command does not give any information regarding the constant, because it does not make much sense to talk about a constant with odds ratios. The output from the logit command indicates that the coefficient of x is 0. This means that with a one-unit change in x, you would predict a zero-unit change in the log odds of y. To transform the coefficient into an odds ratio, take the exponential of the coefficient. This yields 1, which is the odds ratio. An odds ratio of 1 means that there is no effect of x on y.

Looking at the z test statistic, we see that it is not statistically significant, and the confidence interval of the coefficient includes 0. Note that when there is no effect, the confidence interval of the odds ratio will include 1. In the next example, we see that the coefficient of x is again 0 (an odds ratio of 1). Again, we conclude that x has no statistically significant effect on y. However, in this example, the constant is not 0.

The constant (also called the intercept) is the predicted log odds when all of the variables in the model are held equal to 0. Here we see that the odds ratio is 4, or more precisely, 4 to 1. In other words, the odds for the group coded as 1 are four times the odds for the group coded as 0. Because both of our variables are dichotomous, we have used the jitter option so that the points are not plotted exactly on top of one another. While we will briefly discuss the outputs from the logit and logistic commands, please see our Annotated Output pages for a more complete treatment.
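For two dichotomous variables, the odds ratio can also be computed straight from the 2x2 table. The counts below are hypothetical, chosen to reproduce a 4-to-1 odds ratio like the one in the text:

```python
# Odds ratio from a 2x2 table: (a/b) / (c/d), where a and b are the
# success and failure counts in one group, and c and d in the other.
# Hypothetical counts, not the data set used in the text.

def odds_ratio(a, b, c, d):
    return (a / b) / (c / d)

# Group 1: 40 successes, 10 failures (odds 4); group 0: 25 and 25 (odds 1).
print(odds_ratio(40, 10, 25, 25))   # 4.0
```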

The meaning of the iteration log will be discussed later. We will not try to interpret the meaning of the "pseudo R-squared" here, except to say that emphasis should be put on the term "pseudo" and to note that some authors (including Hosmer and Lemeshow) discount the usefulness of this statistic. The log likelihood of the fitted model is also reported. The likelihood is the probability of observing a given set of observations, given the value of the parameters.

The negative coefficient indicates that an increase in the predictor is associated with a decrease in the log odds of the outcome. This coefficient is also statistically significant, with a large Wald test value (z). Because the Wald test is statistically significant, the confidence interval for the coefficient does not include 0. As before, the coefficient can be converted into an odds ratio by exponentiating it. You can obtain the odds ratio from Stata either by issuing the logistic command or by using the or option with the logit command.

You will notice that the only difference between these two outputs is that the logit command includes an iteration log at the top. Our point here is that you can use more than one method to get this information, and which one you use is up to you. The odds ratio is interpreted as the multiplicative change in the odds of the outcome for a one-unit change in the predictor.

Notice that the odds ratio here is less than 1. In other words, as you go from a non-year-round school to a year-round school, the ratio of the odds becomes smaller. In the previous example, we used a dichotomous independent variable. Traditionally, when researchers and data analysts analyze the relationship between two dichotomous variables, they often think of a chi-square test. Chi-square is actually a special case of logistic regression. In a chi-square analysis, both variables must be categorical, and neither variable is designated as independent or dependent (that distinction is not made).

In logistic regression, while the dependent variable must be dichotomous, the independent variable can be dichotomous or continuous. Also, logistic regression is not limited to only one independent variable. In the next example, the predictor is a measure of the educational achievement of the parents of the children in the schools that participated in the study. Looking at the output from the logit command, we see that the LR chi-squared is very high and is clearly statistically significant.

The value of the Wald statistic indicates that the coefficient is significantly different from 0. However, it is not obvious what a change of that size in the log odds means, so we exponentiate the coefficient to obtain the odds ratio. This is the amount of change expected in the odds when there is a one-unit change in the predictor variable, with all of the other variables in the model held constant. If you tried to draw a straight line through the points as you would in OLS regression, the line would not do a good job of describing the data.

One possible solution to this problem is to transform the values of the dependent variable into predicted probabilities, as we did when we predicted yhat1 in the example at the beginning of this chapter; plotted against the predictor, these probabilities trace out an s-shaped curve. This s-shaped curve resembles some statistical distributions and can be used to generate a type of regression equation and its statistical tests. To get from the straight line seen in OLS to the s-shaped curve in logistic regression, we need to do some mathematical transformations.
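Those transformations are just the logit and its inverse: the model is linear in the log odds, and the inverse logit maps the linear predictor back to an s-shaped probability curve. A minimal sketch:

```python
# The logit maps a probability to a log odds; the inverse logit maps
# a log odds back to a probability (the s-shaped curve).
import math

def inv_logit(xb):
    """Probability from a log odds."""
    return 1.0 / (1.0 + math.exp(-xb))

def logit(p):
    """Log odds from a probability."""
    return math.log(p / (1.0 - p))

print(inv_logit(0.0))                    # 0.5 -- a log odds of 0 is even odds
print(round(logit(inv_logit(1.7)), 10))  # 1.7 -- the two maps are inverses
```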

When looking at these formulas, it becomes clear why we need to talk about probabilities, natural logs and exponentials when talking about logistic regression. Interpreting the output from this logistic regression is not much different from the previous ones. The LR-chi-square is very high and is statistically significant.

Both of these coefficients are significantly different from 0 according to the Wald test. In OLS regression, the R-square statistic indicates the proportion of the variability in the dependent variable that is accounted for by the model (i.e., by the independent variables).

Unfortunately, creating a statistic to provide the same information for a logistic regression model has proved to be very difficult. Many people have tried, but no approach has been widely accepted by researchers or statisticians. The output from the logit and logistic commands give a statistic called "pseudo-R-square", and the emphasis is on the term "pseudo".

This statistic should be used only to give the most general idea as to the proportion of variance that is being accounted for. The fitstat command gives a listing of various pseudo-R-squares. You can download fitstat over the internet (see How can I use the search command to search for programs and get additional help?). As you can see from the output, some statistics indicate that the model fit is relatively good, while others indicate that it is not so good.

The values are so different because they are measuring different things. We will not discuss the items in this output; rather, our point is to let you know that there is little agreement regarding an R-square statistic in logistic regression, and that different approaches lead to very different conclusions. If you use an R-square statistic at all, use it with great care.

Next, we will describe some tools that can be used to help you better understand the logistic regressions that you have run. These commands are part of an add-on package. Graphs are again helpful. When the outcome is categorical and the predictor is also categorical, a grouped bar graph is informative. The following is the graph of vote choice and gender.

The figure shows that, within males, Trump support was higher. Within females, Clinton support was higher. Boxplots are useful for examining the association between a categorical variable and a variable measured on an interval scale. The interpretation is that older respondents tend to be more likely to vote for Trump.

Stata has two commands for fitting a logistic regression, logit and logistic. The difference is only in the default output. The logit command reports coefficients on the log-odds scale, whereas logistic reports odds ratios. The syntax for the logit command is the following:. After specifying logit , the dependent variable is listed first followed by the independent variables.

The i. prefix tells Stata to treat a variable as categorical (a factor variable). The last variable is age. The syntax for logistic is the same except that we swap out the name of the command. Both commands estimate exactly the same model. The significance tests for the individual coefficients are also the same, as they are both based on the coefficients on the logit scale (the output from logit).

The odds ratios presented by logistic are simply the exponentiated coefficients from logit. For example, the coefficient for educ is negative, and exponentiating it gives its odds ratio. The standard errors for the odds ratios are based on the delta method. Note also that we can still get odds ratios from logit if we specify the or option. The coefficients returned by logit are difficult to interpret intuitively, and hence it is common to report odds ratios instead.

In general, the percent change in the odds given a one-unit change in the predictor can be determined as 100 * (exp(b) - 1). Odds ratios are commonly reported, but they are still somewhat difficult to intuit, given that an odds ratio is built from four separate probabilities. An alternative is to report predicted probabilities. However, due to the nonlinearity of the model, it is not possible to talk about a one-unit change in an independent variable having a constant effect on the probability. Instead, predicted probabilities require us to also take into account the other variables in the model.
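Stated as code, the percent-change formula looks like this (a sketch; the coefficient values are arbitrary):

```python
# Percent change in the odds for a one-unit change in a predictor:
# 100 * (exp(b) - 1), where b is the logit coefficient.
import math

def pct_change_in_odds(b):
    return 100.0 * (math.exp(b) - 1.0)

print(round(pct_change_in_odds(math.log(2)), 6))   # 100.0 -- the odds double
print(pct_change_in_odds(0.0))                     # 0.0 -- no effect
```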

For example, the difference in the probability of voting for Trump between males and females may be different depending on if we are talking about educated voters in their 30s or uneducated voters in their 60s. Stata makes it easy to determine predicted probabilities for any combination of independent variables using the margins command.

For example, say we want to know the probability that a college-educated male of a given age votes for Trump. We can specify the age and education level with the at() option of margins. Since gender is a factor, we can get the probabilities for both females and males in one command.

Factor variables can be specified after the margins command and before the options, but covariates (non-categorical predictors) must be specified using the at option. For both outputs, the value in the Margin column is the predicted probability, such as the probability that a college-educated male of a given age votes for Trump. We can also request probabilities at several ages at once. The top of the output provides a key for interpreting the table. For example, where the table reads 3 Female, we have the probability of voting for Trump among females at the third age value.
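Under the hood, a margins-style predicted probability is just the inverse logit of the linear predictor evaluated at a chosen covariate profile. The coefficients below are hypothetical, not the estimates from the voting model:

```python
# Predicted probability at a covariate profile: p = 1 / (1 + exp(-xb)).
# Coefficients and profile are hypothetical (male, college degree, age).
import math

def predicted_probability(intercept, coefs, profile):
    xb = intercept + sum(b * x for b, x in zip(coefs, profile))
    return 1.0 / (1.0 + math.exp(-xb))

coefs = [0.5, -0.4, 0.02]   # hypothetical logit coefficients
p = predicted_probability(-1.0, coefs, [1, 1, 50])
print(round(p, 3))          # 0.525
```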

This is a lot of output, so Stata provides the extraordinarily useful marginsplot command, which can be called after running any margins command. We get the predicted probabilities plotted across the range of ages, with separate lines for male and female, holding education constant at a college degree. Based on the model, the probability of voting for Trump increases with age, but it is always higher for males than females.

Univariate Summaries

The first step in any statistical analysis should be to perform a visual inspection of the data in order to check for coding errors, outliers, or funky distributions. We can check using the tab command (tab vote), which reports the frequency, percent, and cumulative percent of each value of vote. To see the underlying numeric codes rather than the value labels, add the nolab option: tab vote, nolab.