Latent class analysis (LCA) is the method behind some of our analysis of our 2017 polling with Ipsos MORI on public attitudes to the NHS and social care services.
Imagine trying to split your friends into subgroups. You could do it based on basic, directly measurable characteristics: height, gender, birthplace. Or you could use things which are harder to put your finger on: political ideology, intelligence, tech savviness. If you go down the second route then you probably think that different subgroups (or ways of classifying people, such as ‘liberal conservative’ or ‘libertarian’) exist for each, even if you can’t directly observe them. These are latent variables. When we’re trying to find distinct groupings, and using categorical data (eg ‘yes/no’ answers rather than a numerical scale), they are called latent classes – hence latent class analysis.
The premise of the method is that we can find out whether these classes exist by trying to sensibly group people based on their responses to surveys. So if we had three questions (Do you like ice cream? Do you like tiramisu? Do you like fizzy drinks?) we may end up identifying three natural groups: people who like them all (‘Sweet Tooth’), people who like none (‘Healthy Eaters’) and people who only like ice cream and tiramisu (‘Dessert Fans’). Or we may just find the first two groups cover everyone.
What data did we analyse?
We analysed respondent-level survey data from our work with Ipsos MORI. Interviews were carried out on Capibus, Ipsos MORI’s face-to-face omnibus survey. 1,985 adults aged 15 and over in Great Britain were interviewed between 5 and 15 May 2017 in respondents’ homes. Data are weighted to age, region, working status and social grade within gender, as well as household tenure and respondent ethnicity.
A subset of the data was used. Questions which seemed more descriptive than normative were dropped to simplify the model, and looked at later as potential covariates (variables which we include in the model to see if they influence the results). The remaining questions were then recoded so that people’s responses were either a positive statement about the NHS, or not. The ‘or not’ would sometimes include negative statements as well as ‘don’t know’ and ‘refused’.
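The recoding step can be sketched in a few lines of Python. This is only an illustration: the answer labels in `POSITIVE_ANSWERS` are hypothetical, as the survey's actual response options are not listed here, and the function name `recode` is our own.

```python
# Hypothetical answer labels: the real response options differ by question.
POSITIVE_ANSWERS = {"strongly agree", "tend to agree"}


def recode(answer: str) -> int:
    """Return 1 for an NHS-positive answer and 0 for anything else.

    'Anything else' covers negative answers as well as
    'don't know' and 'refused'.
    """
    return 1 if answer.strip().lower() in POSITIVE_ANSWERS else 0


# Made-up answers from four respondents to one question.
answers = ["Strongly agree", "Tend to disagree", "Don't know", "Refused"]
coded = [recode(a) for a in answers]
print(coded)
```

Collapsing everything that is not positive into a single category keeps each question binary, at the cost of treating ‘don’t know’ the same as a negative view.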
This left us with seven questions, and two possible answers for each – so 2 × 2 × 2 × 2 × 2 × 2 × 2 = 128 possible response profiles. 117 of these were present in the data, suggesting that most possible views were represented. Around 10% of people fitted the most NHS-positive of the profiles (having a pro-NHS answer to all seven questions).
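To make the idea of a response profile concrete, here is a small Python sketch that enumerates every possible profile for seven yes/no questions and counts which ones appear in a dataset. The four respondents are made up for illustration and are not taken from the survey.

```python
from collections import Counter
from itertools import product

N_QUESTIONS = 7

# Every possible combination of 0/1 answers to seven questions.
all_profiles = list(product([0, 1], repeat=N_QUESTIONS))
print(len(all_profiles))  # 2 ** 7 = 128

# Made-up respondents, each a tuple of seven recoded answers.
respondents = [
    (1, 1, 1, 1, 1, 1, 1),
    (1, 1, 1, 1, 1, 1, 1),
    (0, 1, 0, 1, 1, 0, 1),
    (0, 0, 0, 0, 0, 0, 0),
]

profile_counts = Counter(respondents)
n_observed = len(profile_counts)  # distinct profiles present in the data
all_positive = (1,) * N_QUESTIONS
share_all_positive = profile_counts[all_positive] / len(respondents)
```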
We also had considerable demographic data, as well as some descriptive questions we included in the survey. These were considered, where appropriate, as covariates. This allowed us to see that ‘The 60%’ referred to in my blog were more likely to have used the NHS in the last year. Surprisingly few other covariates were significantly different across the groups.
How did we decide how many groups to use?
LCA can be either hypothesis-testing or exploratory. If it’s hypothesis-testing you may go in with an idea based on theory (eg, ‘people either love the NHS, hate the NHS, or love the principles of the NHS but don’t want it to have more money’), and then see if it adds up. This would be a ‘three class model’.
But we took an exploratory approach. We fitted a number of models with between two and seven classes (after this the model stops being able to fit, as the data are too spread out). While choosing between LCA models is a bit tricky, the general rule of thumb is that the lower the Bayesian Information Criterion (BIC) the better. The BIC is a criterion which rewards how well the model fits, but penalises overfitting (ie when your model is too granular and only really applies to your data).
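As a rough illustration of how the BIC trades fit against complexity, the standard formula is BIC = k × ln(n) − 2 × ln(L), where k is the number of free parameters, n the number of respondents and L the model's likelihood. The sketch below assumes an unrestricted latent class model with yes/no questions, which has (classes − 1) class sizes plus one probability per question per class. The log-likelihood values are made up purely to show the mechanics; they are not results from our models, and the function names are our own.

```python
import math


def lca_free_parameters(n_classes: int, n_items: int) -> int:
    """Free parameters in an unrestricted latent class model with binary items."""
    return (n_classes - 1) + n_classes * n_items


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


N_RESPONDENTS = 1985  # sample size of the survey
N_ITEMS = 7           # questions kept in the model

# Made-up log-likelihoods for illustration only.
made_up_logliks = {2: -8100.0, 3: -8080.0, 4: -8070.0}

bics = {
    n_classes: bic(ll, lca_free_parameters(n_classes, N_ITEMS), N_RESPONDENTS)
    for n_classes, ll in made_up_logliks.items()
}
best_n_classes = min(bics, key=bics.get)
```

In this made-up example each extra class improves the log-likelihood a little, but not by enough to pay for the eight extra parameters it adds, so the two class model has the lowest BIC.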
This criterion (as well as other fit statistics) suggested that a two class model was the best option, although a three class model was also justifiable. The two class option was chosen as, on top of being statistically more robust (including having smaller standard errors – a measure of how precise the estimates are), it was easily interpretable. The model was run a number of times, and tended to converge (this means that the model fit stops improving) after 500 iterations.
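For readers who want to see what ‘running the model a number of times until it converges’ can look like, below is a minimal sketch of fitting a latent class model with the expectation-maximisation (EM) algorithm, a common way of estimating these models. It is not the software or code used for our analysis: the function `fit_lca`, its defaults and the toy dessert data are all assumptions made for illustration, and survey weights and covariates are ignored.

```python
import math
import random


def fit_lca(data, n_classes, max_iter=500, tol=1e-8, seed=0):
    """Fit a latent class model to rows of 0/1 answers using EM.

    Returns (class_sizes, yes_probs, loglik), where loglik is the
    log-likelihood evaluated in the final E-step.
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    sizes = [1.0 / n_classes] * n_classes
    # Random starting values: P(answer is 'yes') per class and question.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    prev_loglik = -math.inf
    for _ in range(max_iter):
        # E-step: how likely is each respondent to belong to each class?
        loglik = 0.0
        resp = []
        for row in data:
            joint = []
            for c in range(n_classes):
                p = sizes[c]
                for j, x in enumerate(row):
                    p *= probs[c][j] if x else 1.0 - probs[c][j]
                joint.append(p)
            total = sum(joint)
            loglik += math.log(total)
            resp.append([p / total for p in joint])
        # M-step: update class sizes and answer probabilities.
        for c in range(n_classes):
            size = sum(r[c] for r in resp)
            sizes[c] = size / len(data)
            for j in range(n_items):
                yes = sum(r[c] * row[j] for r, row in zip(resp, data))
                # Keep probabilities away from exactly 0 or 1.
                probs[c][j] = min(max(yes / size, 1e-6), 1.0 - 1e-6)
        # Converged: the fit has (almost) stopped improving.
        if loglik - prev_loglik < tol:
            break
        prev_loglik = loglik
    return sizes, probs, loglik


# Toy data: (ice cream, tiramisu, fizzy drinks), made up for illustration.
toy_data = ([(1, 1, 1)] * 40 + [(1, 1, 0)] * 10
            + [(0, 0, 0)] * 40 + [(0, 0, 1)] * 10)

# Run from several random starts and keep the best-fitting solution.
runs = [fit_lca(toy_data, n_classes=2, seed=s) for s in range(5)]
best_sizes, best_probs, best_loglik = max(runs, key=lambda run: run[2])
one_class_loglik = fit_lca(toy_data, n_classes=1)[2]
```

Starting from several random seeds matters because EM can settle on a poor local solution; keeping the run with the highest log-likelihood is a simple guard against that.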
A third class?
As mentioned, a three class option was also an appropriate choice. Its main effect was to split the ‘40%’ class. It resulted in three classes: ‘The 55%’, ‘The 31%’ and ‘The 14%’. The middle class was mostly uninteresting, as it sat between the two more extreme classes on most of the questions. However, its members were more likely than those in either of the other classes to be willing to use non-NHS services, and to think the NHS wastes money. It is hard to know exactly why this is. Models with more than three classes could not be justified, as they did not fit well.