Logistic regression

From Wikiversity
Type classification: this is a notes resource.
  1. Logistic regression is a statistical technique that allows the prediction of categorical dependent variables on the basis of categorical and/or continuous independent variables (Pallant, 2005; Tabachnick & Fidell, 2007).
  2. Logistic regression assumptions relate to sample size, multicollinearity and outliers.
  3. The output statistics of interest include the:
    1. Classification Tables: used to determine whether prediction has improved across the models
    2. Omnibus Tests of Model Coefficients: indicates whether the improvement across the models is significant
    3. Model Summary: indicates the amount of variance accounted for by the models (reported as pseudo-R² values such as Cox & Snell and Nagelkerke R²)
    4. Variables in the Equation: indicates the importance of each of the independent variables within the models:
      1. Wald statistic: the squared ratio of the unstandardized logit coefficient to its standard error.
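The Wald statistic described above can be sketched in a few lines. This is a minimal illustration, not SPSS's own code: the coefficient `b = 0.8` and standard error `se = 0.4` are hypothetical values, and the p-value uses the fact that a chi-square variable on 1 df is a squared standard normal, so it equals a two-tailed z-test on B/SE.

```python
from statistics import NormalDist
import math


def wald_statistic(b: float, se: float) -> float:
    """Wald statistic: (B / SE) squared."""
    return (b / se) ** 2


def wald_p_value(b: float, se: float) -> float:
    """p-value on 1 df, computed as a two-tailed z-test on B / SE."""
    z = abs(b / se)
    return 2 * (1 - NormalDist().cdf(z))


# Hypothetical coefficient and standard error, not taken from any real output.
b, se = 0.8, 0.4
print(wald_statistic(b, se))           # 4.0
print(round(wald_p_value(b, se), 4))
print(round(math.exp(b), 3))           # Exp(B): odds multiplier per one-unit increase
```

A large Wald value (small p) suggests that the predictor contributes to the model once the other predictors are included.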

References

  1. Brace, N., Kemp, R., & Snelgar, R. (2003). An introduction to logistic regression (Section 4). In SPSS for psychologists (2nd ed.). New York: Palgrave Macmillan.
  2. Mertler, C. A., & Vannatta, R. A. (2002). Logistic regression (Ch 11). In Advanced and multivariate statistical methods: Practical application and interpretation (2nd ed., pp. 313-330). Los Angeles: Pyrczak Publishing.
  3. Pallant, J. (2005). Logistic regression (Ch 14). In SPSS survival manual (2nd ed.). Sydney: Allen & Unwin.
  4. Peng, J. C., Lee, K. L., & Ingersoll, G. M. (2002). An introduction to logistic regression analysis and reporting. The Journal of Educational Research, 96(1), 1-14.
  5. Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston, MA: Allyn & Bacon.


P308D - Categorical Data Analysis - Dale Berger


Mediation & Suppression

What does a negative correlation between gender and graduation date mean? People who are higher on gender (as coded) are lower on graduation date. Are they graduating earlier or later? Earlier.

So males are graduating earlier and females later. What does that imply if you look at the

(See the Google Sheet.)

-.149: Put that in a sentence. A: "On average, females graduated -.149 semesters later." A2: Probably better to say, "More females than males graduated later." …

When we say "holding graduation date constant," or even better, "for men and women graduating in the same semester," on average the difference in salary was 2253. That is a larger number than what we got when we didn't control for graduation date.

Q: So, why? A: There are a couple of ways to see this. One is graphically:

This graph is simplified. (J: I don't really get it.)

Suppression: the relationship between gender and salary was suppressed by the graduation date: people who graduated later tended to get greater salaries, but because proportionally more women graduated later, this suppressed the difference in salary between men and women.

Q: How do you tell if there is suppression? A: You compare C and C' ("cee" and "cee prime"): if C' is bigger than C in absolute value, you have suppression.

So, in this situation we have mediation, and we also have suppression.
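The C versus C' comparison can be sketched with standardized coefficients computed from three correlations. This is an assumption-laden illustration, not the class's SPSS analysis: X = gender, M = graduation date, Y = salary; the -.149 from these notes is treated as the gender-date correlation, and the other two correlations are invented purely to show the pattern. C is the simple correlation of X with Y, and C' is the standardized coefficient of X when Y is regressed on both X and M.

```python
def total_and_direct_effects(r_xy: float, r_xm: float, r_my: float) -> tuple[float, float]:
    """Standardized C (X -> Y ignoring M) and C' (X -> Y controlling for M)."""
    c = r_xy
    # Standardized partial regression coefficient of X in the regression of Y on X and M.
    c_prime = (r_xy - r_xm * r_my) / (1 - r_xm ** 2)
    return c, c_prime


r_xy = 0.20     # hypothetical gender-salary correlation
r_xm = -0.149   # assumed to be the gender-graduation-date correlation
r_my = 0.40     # hypothetical: later graduation goes with higher salary

c, c_prime = total_and_direct_effects(r_xy, r_xm, r_my)
print(f"C = {c:.3f}, C' = {c_prime:.3f}")
print("suppression" if abs(c_prime) > abs(c) else "no suppression")
```

Because X and M are negatively correlated while M and Y are positively correlated, removing M's influence makes the gender effect larger, which is the suppression pattern described above.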

Q: Two-tailed tests (p. 42): each table puts .025 in the bottom tail and .025 in the top tail.
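Splitting α = .05 into .025 per tail can be shown with the standard normal distribution. This sketch assumes a z statistic; the table on p. 42 may be for a different statistic, but the idea of putting half of α in each tail is the same.

```python
from statistics import NormalDist

alpha = 0.05
# Two-tailed: .025 in the bottom tail and .025 in the top tail.
lower = NormalDist().inv_cdf(alpha / 2)
upper = NormalDist().inv_cdf(1 - alpha / 2)
print(round(lower, 2), round(upper, 2))   # -1.96 1.96
```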

Q: When doing the Wilcoxon T, how do you figure out which T to choose from the possible values? A: The table lists only surprisingly small values. Depending on how you sum ranks … you get two rank sums, and the Wilcoxon T is the smaller of them. The same concept applies to the Mann-Whitney U: you get more than one answer, and you take the smaller one because the tables are constructed for only one end of the distribution.

Theoretically, you have the whole distribution. . .
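Taking the smaller value can be sketched directly. This is a minimal illustration with invented data; ties get average ranks and zero differences are dropped for the Wilcoxon T, which are common conventions but may differ from the packet's exact rules.

```python
def rank_with_ties(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_t(before, after):
    """Wilcoxon signed-rank T: the smaller of the positive and negative rank sums."""
    diffs = [a - b for a, b in zip(after, before) if a != b]   # drop zero differences
    ranks = rank_with_ties([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus)


def mann_whitney_u(group1, group2):
    """Mann-Whitney U: the smaller of U1 and U2, where U1 + U2 = n1 * n2."""
    ranks = rank_with_ties(list(group1) + list(group2))
    n1, n2 = len(group1), len(group2)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)


# Invented example data.
print(wilcoxon_t([10, 12, 9, 15, 11], [12, 15, 8, 18, 14]))   # 1.0
print(mann_whitney_u([3, 5, 8], [1, 2, 4, 6]))                # 3.0
```

Either rank sum carries the same information (they add up to a fixed total), so reporting only the smaller one lets a single lower-tail table serve every case.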

Q: If you use the table on the left, you're using α = .025.

Lambda

Gamma

This statistic is useful if you have two ordered variables, going from low to high. They don't need to be interval-scaled; all you need is an ordering, i.e., to know which value is bigger and which is smaller.

If you have temporary, hourly, exempt, and profit-sharing employees, you could code them 1, 2, 3, 4, but would it make sense to run a correlation looking for a linear relationship between them? Not really. It would make sense as a job level, though, if you could put these categories in a specific order.

Take an attitude measure: Strongly Agree (1), ? (2), Strongly Disagree (3). There aren't necessarily equal intervals here, but what we're interested in is whether there is an ordinal trend, such as "Are people higher in job level more likely to agree?" (Example 2)

In a 2x2 table like Example 3, what test would you apply? A chi-square test of independence.

If you ran a χ^2 on this table (Example 2: https://docs.google.com/spreadsheets/d/1qc381NZ-FnCABig4oXXr4eEQh7TX2L3so1rrF_U0smc/edit#gid=0), how many rows would it have? You can't really tell what the salary is: it's either really big or really small. You have 6 degrees of freedom: (rows - 1)(columns - 1) = (3 - 1)(4 - 1) = 6.
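A chi-square test of independence and its degrees of freedom can be sketched as follows. The 2x2 counts are an assumption read off the Gamma and Odds Ratio notes (no BA: 24 on salary, 16 not; BA: 51 on salary, 9 not); no p-value is computed, to stay within the standard library.

```python
def chi_square_independence(table):
    """Pearson chi-square for a table of observed counts, plus its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # expected count under independence
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: no BA, BA; columns: on salary, not on salary.
chi2, df = chi_square_independence([[24, 16], [51, 9]])
print(f"chi-square = {chi2:.2f}, df = {df}")

# Identical rows mean perfect independence, so chi-square is zero.
print(chi_square_independence([[16, 24], [16, 24]])[0])

# Any 3x4 table has (3 - 1) * (4 - 1) = 6 degrees of freedom.
print(chi_square_independence([[1, 1, 1, 1]] * 3)[1])
```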

Say someone is higher on 01 (Example 2.1-2). If you take someone from cell 'a' and someone from cell 'b', they can be concordant or discordant (otherwise we don't count them?).

If I sample one person from each of two cells, I can ask: are they a concordant pair, a discordant pair, or neither? In this case, they would be a concordant pair, because the person in cell 'd' is higher on both BA and salary than the person in cell 'a'.

This tells us how many pairs we have: 100 choices for the first pick and 99 for the second, divided by 2 because the order within a pair doesn't matter (we want combinations, not permutations): 100 × 99 / 2 = 4,950 possible pairs.

How many concordant pairs would there be? 16 × 51 = 816.

Where:

  1. Number of (possible) concordant pairs = P (one person is higher than the other on both variables)
  2. Number of (possible) discordant pairs = Q (one person is higher on one variable but lower on the other; here 24 × 9 = 216)
  3. Gamma = (P - Q)/(P + Q)
  4. In this example: (816 - 216)/(816 + 216) = 600/1032 = .581
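The concordant/discordant counting can be sketched for any table whose rows and columns are both ordered from low to high. The layout here is an assumption chosen to match the counts above: rows are no BA, BA; columns are not on salary, on salary. Tied pairs (same row or same column) are simply not counted, which is why P + Q is far smaller than the 4,950 possible pairs.

```python
from math import comb


def goodman_kruskal_gamma(table):
    """Return (P, Q, gamma) for a table with ordered rows and columns."""
    n_rows, n_cols = len(table), len(table[0])
    p = q = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):          # rows strictly higher than row i
                for col2 in range(n_cols):
                    if col2 > j:                    # higher on both: concordant
                        p += table[i][j] * table[k][col2]
                    elif col2 < j:                  # higher on one, lower on other: discordant
                        q += table[i][j] * table[k][col2]
    return p, q, (p - q) / (p + q)


p, q, gamma = goodman_kruskal_gamma([[16, 24], [9, 51]])
print(p, q, round(gamma, 3))     # 816 216 0.581
print(comb(100, 2))              # 4950 possible pairs of 100 people
```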

To do this, you have to code the variables consistently with some theory. In Example 2, anyone lower and further to the right would be concordant.


Odds Ratio

What is an odds ratio? If someone has no BA, what are the odds that they're on salary?

The odds would be 24:16, which is 3:2 = 1.5. That's not the proportion of people on salary (that would be 24/40 = .60). The odds ratio is a ratio of odds! (laughter from class)


  1. Odds of being on salary if you have no BA:
    1. 24/16 = 1.5
  2. Odds of being on salary if you have a BA:
    1. 51/9 = 5.667
  3. Odds ratio: odds of being on salary for people with a BA over odds of being on salary for people with no BA:
    1. (51/9)/(24/16) = 3.778
    2. Q: Why is it "BA"/"No BA" rather than the other way around? A: Answered below!

Q: If the odds are “1”, what does that mean? A: Equally likely. Q: If the odds were the same for both groups, what would the ratio be? A: “1”

If you did this the other way (no BA over BA), you would get a ratio of 0.265.

Q: If you see an odds ratio of 10 and another odds ratio of .1, which is bigger? A: They're the same size of effect, just in opposite directions. (J: 1/(odds ratio 1) = odds ratio 2.)
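The odds and odds-ratio arithmetic above, including the reversed ratio, can be checked in a few lines. This sketch uses the counts from these notes; `odds` is a hypothetical helper name. The last lines add Yule's Q, (OR - 1)/(OR + 1), which for a 2x2 table equals gamma.

```python
import math


def odds(on_salary: int, not_on_salary: int) -> float:
    """Odds of being on salary: count on salary over count not on salary."""
    return on_salary / not_on_salary


odds_no_ba = odds(24, 16)                 # 1.5
odds_ba = odds(51, 9)                     # about 5.667
or_ba_vs_no_ba = odds_ba / odds_no_ba     # about 3.778
or_no_ba_vs_ba = odds_no_ba / odds_ba     # about 0.265, the reciprocal

print(24 / (24 + 16))                     # proportion on salary (no BA), not the odds
print(round(or_ba_vs_no_ba, 3), round(or_no_ba_vs_ba, 3))

# On the log scale the two directions are mirror images: ln(10) = -ln(0.1).
print(math.log(10), math.log(0.1))

# Yule's Q equals gamma in a 2x2 table.
yules_q = (or_ba_vs_no_ba - 1) / (or_ba_vs_no_ba + 1)
print(round(yules_q, 3))
```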

(J: Odds ratio as a test of independence?)

Q: If you have 16/24 - 16*24, you have perfect independence, and you will get a χ^2 of zero. If you have perfect independence, you expect Bumble had something to do with your data...

Q: In a 2x2 table, the odds ratio is identical to a test of independence. (J: Is that true?)


Binary Logistic Regression

Packet 6:

Assumptions of the linear model:

  1. Normally distributed errors
  2. Equal variance of errors

Bottom line: ordinary least squares (OLS) regression is clearly inappropriate for a binary dependent variable.

Now, the logistic model fitted to the data looks like a lazy S. [image: https://www.evernote.com/shard/s95/sh/f634acd0-a7d0-4ebe-9a0c-c919c213881e/31a50a753d8f6c7b4cdc26dfa6e3d794]
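The "lazy S" is the logistic function, which maps any linear predictor b0 + b1·x to a probability between 0 and 1. This sketch uses hypothetical coefficients (b0 = -4, b1 = 1) only to show the shape. It also prints p(1 - p), the variance of a binary outcome at that probability, which changes with p and is one reason the equal-variance assumption of OLS fails for a binary dependent variable.

```python
import math


def logistic(x: float, b0: float, b1: float) -> float:
    """Predicted probability P(Y = 1) from a simple logistic model."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))


b0, b1 = -4.0, 1.0   # hypothetical coefficients
for x in range(0, 9):
    p = logistic(x, b0, b1)
    # p * (1 - p) is the variance of a 0/1 outcome with probability p.
    print(x, round(p, 3), round(p * (1 - p), 3))
```

The curve is flat near 0 and 1 and steepest at p = .5 (here at x = 4), unlike a straight OLS line, which would eventually predict values below 0 or above 1.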

(J: you’d better hope this sample doesn’t contain ))

Section 2: Examples in SPSS

Questions

  1. J: Question: If C and C’ are both significant, is that sufficient grounds for calling the suppression significant?
  2. How do you calculate degrees of freedom for a 3x4 table?
  3. I just ran with random numbers and got a Z of -184.985526. . . ??

If you have success Rate; Failure rate; Etc.,


See also

External links

* Binary Logistic Regression with SPSS