Aggregated datasets

The most flexible form in which LS data may be supplied to users is the aggregated dataset. It lets the user run exploratory analyses as well as more complex ones, such as multivariate regressions.

An aggregated (or collapsed) dataset contains a separate record for each populated combination of values of variables, and a count of individuals matching that combination.

For example, suppose you were looking at marital status in 1991 in relation to marital status ten years earlier and also to age, housing tenure and car access in 1981. The table below shows a few combinations of values taken from the dataset produced for such a study. The user will have specified how the variables should be derived and which values should be used.

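| Count | Age group 1981 | Housing tenure 1981 | Car access 1981 | Marital status 1981 | Marital status 1991 |
|-------|----------------|---------------------|-----------------|---------------------|---------------------|
| 733   | 20-29          | Owner-occupier      | No car          | Single              | Single              |
| 821   | 20-29          | Owner-occupier      | No car          | Single              | Married             |
| 77    | 20-29          | Owner-occupier      | No car          | Single              | Divorced            |
| 3     | 20-29          | Owner-occupier      | No car          | Single              | Widowed             |
| 29    | 20-29          | Owner-occupier      | No car          | Married             | Single              |
| 3054  | 20-29          | Owner-occupier      | No car          | Married             | Married             |
| 403   | 20-29          | Owner-occupier      | No car          | Married             | Divorced            |
| etc...|                |                     |                 |                     |                     |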

Data extracted from the LS on 05/09/2003.
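As a rough illustration of an exploratory analysis on such a dataset, the sketch below treats the count column as a frequency weight and works out how the people who were single in 1981 are distributed across marital statuses in 1991. It uses only the four "Single in 1981" rows shown in the table above (the rows for people married in 1981 are cut off by "etc..."), and the variable names are illustrative assumptions rather than names from any real LS extract.

```python
# Rows copied from the example table: people aged 20-29, owner-occupiers,
# with no car in 1981, who were single in 1981.
# Key: marital status in 1991. Value: count of individuals.
single_1981 = {
    "Single": 733,
    "Married": 821,
    "Divorced": 77,
    "Widowed": 3,
}

# The counts act as frequency weights: each record stands for that many people.
total_single = sum(single_1981.values())

# Share of the 1981 single group found in each 1991 status.
shares = {status: count / total_single for status, count in single_1981.items()}

print(total_single)                  # 1634
print(round(shares["Married"], 3))   # 0.502
```

In a statistical package the same idea is usually expressed by declaring the count variable as a frequency weight, so that tabulations and regressions treat each record as that many individuals.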

Suppose that there are 5 age groups, covering the population aged 20-70 years in 10-year intervals, 3 categories for housing tenure, 2 categories for car access, and 4 for each of the marital status variables. There are then 5 x 3 x 2 x 4 x 4 = 480 possible combinations of values. Because only populated combinations are recorded, the dataset contains at most 480 records.
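The arithmetic can be checked with a short sketch. The dictionary keys are hypothetical variable names chosen for illustration; only the category counts come from the text.

```python
from itertools import product
from math import prod

# Number of categories per variable, as stated in the text.
# The key names are illustrative, not real LS variable names.
n_categories = {
    "age_group_1981": 5,
    "housing_tenure_1981": 3,
    "car_access_1981": 2,
    "marital_status_1981": 4,
    "marital_status_1991": 4,
}

# Multiply the category counts together.
max_records = prod(n_categories.values())

# Enumerating every combination explicitly gives the same number.
all_combinations = list(product(*(range(n) for n in n_categories.values())))

print(max_records)            # 480
print(len(all_combinations))  # 480
```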

Such datasets are "collapsed" in the sense that records with the same values are grouped. In the example above, there are 821 individuals in the LS who were single owner-occupiers aged 20 – 29 in 1981, who did not have access to a car in 1981, and who were married by 1991. Rather than include 821 separate records for these people, they are collapsed into a single record with a count of 821.
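A minimal sketch of the collapsing step, assuming individual-level records are available as tuples of variable values. The five records below are made up for illustration and are not LS data.

```python
from collections import Counter

# Hypothetical individual-level records (invented for illustration):
# (age group 1981, housing tenure 1981, car access 1981,
#  marital status 1981, marital status 1991)
individuals = [
    ("20-29", "Owner-occupier", "No car", "Single", "Married"),
    ("20-29", "Owner-occupier", "No car", "Single", "Married"),
    ("20-29", "Owner-occupier", "No car", "Single", "Single"),
    ("20-29", "Owner-occupier", "No car", "Single", "Married"),
    ("20-29", "Owner-occupier", "No car", "Married", "Married"),
]

# Group identical records and count them: one entry per populated combination.
collapsed = Counter(individuals)

for combination, count in collapsed.items():
    print(count, *combination)
```

Five individual records collapse into three aggregated records here, and combinations that nobody matches simply do not appear.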

Datasets can be provided in the format of the user's choice, for example Stata, SPSS, SAS or a delimited text file. As well as the dataset itself, you will be provided with the command file (generated by Stata or another statistical package), which shows exactly how the dataset was produced.
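If the delimited text option is chosen, the file could be read along the following lines. This is only a sketch: the comma delimiter and the column names (`count`, `age81`, and so on) are assumptions, and the real layout would be whatever the supplied command file documents. An in-memory string stands in for the file, reusing two rows from the example table.

```python
import csv
import io

# Stand-in for a delimited text file. The column names are hypothetical;
# the two data rows reuse counts from the example table.
text = """count,age81,tenure81,car81,marstat81,marstat91
733,20-29,Owner-occupier,No car,Single,Single
821,20-29,Owner-occupier,No car,Single,Married
"""

records = []
for row in csv.DictReader(io.StringIO(text)):
    row["count"] = int(row["count"])  # values are read as strings
    records.append(row)

# Each record represents `count` individuals, so sum the counts
# rather than counting rows.
total_individuals = sum(r["count"] for r in records)

print(len(records), total_individuals)  # 2 1554
```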