Editing, Data Sources, and Imputation in the 2023 Census

The paper "Editing, data sources, and imputation in the 2023 Census" summarises the approach to filling gaps in census attributes when valid information has not been provided on 2023 Census forms. This covers questions for which no information is provided, as well as those where the information provided is inconsistent or otherwise unusable.

This paper first provides a general overview of the different data sources and methods used for filling gaps. These methods are:

  • editing
  • deterministic derivation, based on information from other census variables
  • historical census responses
  • administrative (admin) data
  • statistical imputation.

It then provides more detailed information about the sources and methods used for each 2023 Census variable. Finally, the paper describes the methods used to assess the quality of each source.

This paper is one of a collection of documents summarising the methodology used to combine 2023 Census responses with admin data.

"Using a combined census model for the 2023 Census" provides links to the other papers.


Summary of key points

For the 2023 Census, we use a range of methods to ensure valid values for individual and dwelling attributes where possible. This includes editing to detect and resolve errors, as well as the use of alternative data sources to fill gaps.

The 2023 Census builds on the combined model first implemented in the 2018 Census. As in 2018, the 2023 Census uses historical census data, admin data, and statistical imputation to supplement 2023 Census responses. Using these data sources makes more information available for data users and improves the overall quality of census outputs.
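As a rough illustration of how a combined model can supplement responses, the Python sketch below fills a single variable from the first source that holds a valid value. The source names, their priority order, the validity check, and the function fill_value are all assumptions made for illustration; they are not taken from the paper and do not represent the actual Stats NZ implementation.

```python
from typing import Optional

# Hypothetical source labels, loosely mirroring the sources named in the text.
# The priority order is an assumption for illustration only.
SOURCE_ORDER = [
    "census_response",
    "historical_census",
    "admin_data",
    "statistical_imputation",
]


def fill_value(
    candidates: dict[str, Optional[str]],
    valid_values: set[str],
) -> tuple[Optional[str], str]:
    """Return the first valid value and the label of the source it came from.

    A value that is absent, or that is not in valid_values, is treated as a
    gap, and the next source in SOURCE_ORDER is tried instead.
    """
    for source in SOURCE_ORDER:
        value = candidates.get(source)
        if value in valid_values:
            return value, source
    # No source held a usable value, so the variable stays missing.
    return None, "missing"


# Made-up example: the form response is unusable, so admin data fills the gap.
valid = {"yes", "no"}
record = {"census_response": "??", "historical_census": None, "admin_data": "yes"}
value, source = fill_value(record, valid)
print(value, source)
```

Returning the source label alongside the value keeps track of where each output value came from, which is the kind of information needed to assess the quality of each source separately.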

Compared with the 2018 Census, we have expanded the use of alternative data sources for the 2023 Census. Some variables, such as post-school field of study and usual residence one year ago, use admin data for the first time. Other variables, such as Māori descent and total personal income, incorporate new sources or improved methods to increase data quality. For the new sex at birth and gender variables, we use both admin data and statistical imputation to provide the highest quality outputs.

However, despite extensive research, some variables still lack viable alternative data sources. These variables will have more missing data and will be more affected by differential response rates across subpopulations.

ISBN 978-1-99-104995-7


Census communications

Stats NZ Public Release.