Microdata bases
Due to the relevance of the ICT surveys and to meet the growing demand for microdata, the Regional Center for Studies on the Development of the Information Society (Cetic.br), a department of the Brazilian Network Information Center (NIC.br), outlines below the steps to provide users with access to the anonymized microdata files.
Microdata is the smallest fraction of data collected in a survey. Through numeric codes, it provides the individual responses to a questionnaire. By grouping microdata into certain units (such as geographic region or social class), users can generate new aggregated data that describe those larger units.
The microdata bases can be opened with statistical software, such as R, SAS and SPSS, which allows users to manipulate the microdata to compose new aggregations and, consequently, new analyses.
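As a minimal sketch of how grouping microdata yields aggregated indicators, the Python example below computes a weighted share by region. Everything in it is hypothetical: the records, the variable names (`region`, `weight`, `internet_user`) and their numeric codes are invented for illustration, and the presence of a sampling weight on each record is an assumption. Actual variable names and codes should be taken from each survey's data dictionary.

```python
from collections import defaultdict

# Hypothetical microdata: one record per respondent, answers stored as numeric codes.
# Variable names, codes and weights are illustrative only.
records = [
    {"region": 1, "weight": 2.0, "internet_user": 1},
    {"region": 1, "weight": 1.0, "internet_user": 0},
    {"region": 2, "weight": 1.5, "internet_user": 1},
    {"region": 2, "weight": 1.5, "internet_user": 1},
    {"region": 2, "weight": 1.0, "internet_user": 0},
]


def weighted_share_by_group(rows, group_var, target_var, weight_var="weight"):
    """Return the weighted share of rows with target_var == 1 within each group."""
    totals = defaultdict(float)  # sum of weights per group
    hits = defaultdict(float)    # sum of weights per group where target_var == 1
    for row in rows:
        totals[row[group_var]] += row[weight_var]
        if row[target_var] == 1:
            hits[row[group_var]] += row[weight_var]
    return {group: hits[group] / totals[group] for group in totals}


shares = weighted_share_by_group(records, "region", "internet_user")
for region, share in sorted(shares.items()):
    print(f"region {region}: {share:.1%}")
```

The same function can be reused with a different grouping variable (for example, a social class code) to compose a new aggregation from the same records.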
To make them available, the microdata bases undergo an anonymization process to protect the identity of respondents.
This process can be carried out through restricted access methods, which control who may access the data, or through data restriction methods, which limit the data made available.
Restricted access methods aim to define who and/or under what conditions access to the data is permitted. This includes, for example, authorizing only certain individuals to handle the data, allowing access to protected data via passwords/encryption, or sharing the data under a usage agreement/license.
Data restriction methods aim to limit the content of the data made available to protect it. This can be done through suppression of available data (e.g., data that could identify individuals) or by adding noise to the database (i.e., including fictitious occurrences).
These techniques are employed to enable both the protection of respondent confidentiality and access to research microdata bases. In the microdata bases of the ICT Surveys, restricted access and/or data restriction methods are used. In surveys involving establishments, they are used in combination.
The confidentiality protection methods applied in the construction of microdata bases were guided by the following principles:
- 1. De-identification: removal, prior to the publication of the microdata bases, of all data elements that could directly identify responding units, such as names, identification numbers, and contact information.
- 2. Generalization and suppression: generalization or omission of information when necessary to prevent the data from being linked to specific responding units. Recoding is used as a generalization method to reduce the number of categories for some variables. Additionally, suppression is performed after recoding to omit values of certain variables for some responding units. These methods aim to reduce the number of unique observations and, consequently, lower the risk of identification, which is essential for confidentiality protection.
- 3. Utility of use: assessment of the resulting microdata bases after confidentiality protection processing, considering the tabular plan disclosed for the survey. The statistics produced from these bases do not show significant differences when compared to the tabulation of the original survey data.
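The first two principles above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure applied to the ICT survey microdata: the field names, the age bands, the choice of `age_band` and `region` as linking variables, and the minimum group size of 2 are all hypothetical.

```python
from collections import Counter

DIRECT_IDENTIFIERS = {"name", "phone"}      # hypothetical identifying fields
QUASI_IDENTIFIERS = ("age_band", "region")  # hypothetical linking variables
MIN_GROUP_SIZE = 2                          # illustrative threshold, not a documented value


def deidentify(row):
    """Drop fields that directly identify the responding unit."""
    return {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}


def generalize(row):
    """Recode exact age into broad bands to reduce the number of categories."""
    out = dict(row)
    age = out.pop("age")
    out["age_band"] = "under 35" if age < 35 else "35-59" if age < 60 else "60+"
    return out


def suppress_rare(rows):
    """Omit the region of records whose quasi-identifier combination is rare."""
    counts = Counter(tuple(r[q] for q in QUASI_IDENTIFIERS) for r in rows)
    result = []
    for r in rows:
        out = dict(r)
        if counts[tuple(r[q] for q in QUASI_IDENTIFIERS)] < MIN_GROUP_SIZE:
            out["region"] = None  # suppressed value
        result.append(out)
    return result


# Hypothetical original records, invented for this example.
original = [
    {"name": "A", "phone": "0001", "age": 23, "region": 1, "internet_user": 1},
    {"name": "B", "phone": "0002", "age": 27, "region": 1, "internet_user": 0},
    {"name": "C", "phone": "0003", "age": 41, "region": 1, "internet_user": 1},
    {"name": "D", "phone": "0004", "age": 45, "region": 1, "internet_user": 1},
    {"name": "E", "phone": "0005", "age": 68, "region": 2, "internet_user": 0},
]

# Suppression runs after recoding, matching the order described above.
protected = suppress_rare([generalize(deidentify(r)) for r in original])
for row in protected:
    print(row)
```

In this toy data, only the last record has a unique combination of age band and region, so only its region is suppressed. A utility assessment in the spirit of the third principle would then compare tabulations from `protected` with those from `original`.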
As part of the process to disseminate the results of the ICT surveys, Cetic.br|NIC.br makes the microdata bases of the surveys and their documentation available; they can be downloaded or obtained through an Access and Use Agreement. Three types of microdata sets are made available: 1) original de-identified databases available for download (e.g., ICT Households and ICT Kids Online Brazil); 2) databases processed for statistical confidentiality control, also available for download (e.g., ICT Enterprises); and 3) original databases available through an Access and Use Agreement (as detailed below).
Original de-identified microdata bases available for download
De-identified microdata bases from the ICT Households and ICT Kids Online Brazil surveys, starting from 2015, are available for download on the Cetic.br|NIC.br website.
De-identification is carried out to preserve respondent confidentiality. The process consists of removing information that would allow for the direct identification of respondents.
The following documents are made available:
- Microdata base;
- Data collection instrument, containing the questionnaire applied to respondents;
- Methodological report, detailing the survey methodology;
- Data collection report, detailing the data collection process for each survey edition;
- Data dictionary, identifying the variables included in the microdata base.
All the microdata bases are anonymized to protect the identity of the respondents. The database dictionaries – variable labels and value labels associated with the answer options – are available exclusively in Portuguese.
De-identified microdata bases with processing for statistical confidentiality control available for download
Microdata bases from the ICT Enterprises survey, processed for statistical confidentiality control, are available for download on the Cetic.br|NIC.br website, starting from 2015. The original microdata bases from this survey (i.e., without confidentiality control processing) can be accessed through an Access and Use Agreement (as detailed below).
The following documents are made available:
- Microdata base processed for statistical confidentiality control;
- Data collection instrument, containing the questionnaire applied to respondents;
- Methodological report, detailing the survey methodology;
- Data collection report, detailing the data collection process for each survey edition;
- Data dictionary, identifying the variables included in the microdata base.
De-identified microdata bases available through an Access and Use Agreement
These microdata bases are made available upon the signing of an Access and Use Agreement between the requesting institution and NIC.br.
The requesting institution must complete a form (Portuguese / English) with information about the project, specifying the microdata bases of interest (survey and year), listing all individuals from the institution who will be involved in handling and analyzing the data, and providing the objectives, justification, and methodology of the study to be conducted using the microdata base. The completed form must be submitted in PDF format via email to: acordos.cetic@nic.br.
Following the review and approval of the submitted information, the Access and Use Agreement will be prepared and signed by NIC.br and a representative of the institution. To finalize the agreement, a document must be submitted confirming that the institution’s legal representative has the authority to sign it.
It is important to note that the Agreement is signed only once between the Parties and remains valid until its expiration. Once signed, future requests for microdata bases can be submitted to Cetic.br|NIC.br at any time via email.
The following microdata bases are available through the Access and Use Agreement:
In accordance with the Data on the Web Best Practices of the World Wide Web Consortium (W3C), the documents and microdata bases of the ICT household surveys are made available on the website under the Attribution 4.0 International (CC BY 4.0) license. This means that you can share and adapt the material but, whenever you do so or use any of the materials provided, you must give NIC.br/Cetic.br appropriate credit. See the full license here.
If you have any questions or suggestions regarding access to the microdata bases, the documents provided or any other issue related to the databases, please contact us.
