
Methodology

Freedom of information is vital to democracy – but a poorly understood mechanism. Each public institution interprets access laws in its own way, and governments are wildly inconsistent when it comes to complying with their obligations. There is also no central resource for searching through previously filed requests and no single place that teaches people how to best navigate the system.

The Secret Canada project aims to change that.

Since the fall of 2021, we have filed more than 400 freedom-of-information (FOI) requests and spoken with more than 200 experts, all with the purpose of compiling the best resource Canada has ever seen.

The result is a database of more than 300,000 FOI request summaries spanning more than 600 government and public institutions across the country. Our team spent more than a year collecting and cleaning the data for this project; during that time, we developed a process for how we worked with the information.

If you spot an error or omission in this methodology, let us know.

Table of contents:

- Deciding which institutions to track
- Collecting the data
- Inconsistent data and files
- Cleaning and standardization
- The audit
- Reviewing the data
- Exclusions and pending data

Deciding which institutions to track

From the beginning, we knew we wanted to collect data from hundreds of public institutions in order to study and compare how they handled FOI requests. The idea was to request the databases FOI offices use to track requests, which we hoped would be roughly similar from one body and jurisdiction to the next. We requested data from every federal, provincial and territorial ministry and department (at the time, 253 in all) – effectively all major core government institutions in Canada. We called this first group of requests an “audit,” since they asked for a significant amount of information.

To supplement that, we filed a second group of requests expressly designed to expand our completed FOI summary database’s coverage. This group of requests was not part of our audit. In all, we filed almost 200 additional requests that captured FOI data from other essential public bodies such as municipalities, police forces, transit agencies, hospitals and educational organizations, including school boards and universities.

For this second group, we set rules that helped us determine which institutions would receive our requests. For example, we filed requests to every municipality with a population of 100,000 or more – 57 in all. Similarly, we filed requests to all police forces that oversee populations of at least 100,000. We also filed requests to the largest school boards and hospitals in the country. For school boards, we asked each province and territory for a list of their largest boards by enrolment and filed a request to the first one on each list (or, if several boards were similarly sized, multiple requests). For hospitals, we asked the Canadian Institute for Health Information for a list of hospitals by bed count; we then filed a request to the largest in each province and territory.

We did not file requests with some public bodies directly because their information is held elsewhere. For example, FOI requests to the Ontario Provincial Police are handled by the Ministry of the Solicitor General, and the Toronto Transit Commission’s requests are contained in data from the City of Toronto.

In total, our database now tracks more than 600 public institutions.

Collecting the data

The audit phase of our project began in early May, 2022, when we filed 253 separate FOI requests to every federal, provincial and territorial ministry and department in the country. The requests were identical and read:

We are conducting a study of how freedom of information requests are processed across the country. To that end, we are requesting information relating to how these requests are dealt with by your office. Please provide us with an electronic (that is, machine-readable) record of all requests received between January 1, 2021 and end of day December 31, 2021, whether completed or not. Please include fields containing the following information: request number, request summary, type of requester (academic, media, etc.), whether the request is sensitive or contentious, date received, date request completed, disclosure decision, reason for time extension, extension duration in days, exemptions applied, whether it was subject to an appeal, and number of pages disclosed. We recognize your system may not capture all of these fields. If that’s the case, please give us the fields you do have. Please provide in an electronic format such as Excel or delimited text (e.g. CSV). Do not provide PDF or image files.

(For Quebec, we filed these in French.)

Once we’d received responses for most of our audit requests, we began the second phase of the project. Beginning in August, 2022, we filed FOIs to municipalities, police, hospitals and so on. In total, we filed 192 of these supplementary requests. These requests were simpler, since we only needed basic FOI summary information. They read as follows:

The vast majority of large public entities use a records/data management system to track and manage FOI requests. I am seeking access to several fields included in that system: 1) the request text (if this is not available, a summary of the request), 2) the date it was received, 3) the date it was completed, 4) whether access was granted in full or in part, and 5) how many pages were released. I am seeking this information for records that were completed between January 1, 2021 and June 30, 2022. I am only seeking access to general records requests, not personal record requests.

If your system is missing any of the fields, for any part of this time period, I am comfortable proceeding without that information, as long as the request text/summary and date of release is included. You do not need to check in with me about this.

If your system exports spreadsheet files, please provide the records in a machine-readable table format (Excel or CSV, for instance).

Many of the requests for both our first and second rounds were filed through the mail, though we also filed via online portals or e-mail when those options were available. We tracked our FOIs in a Google Sheets spreadsheet we developed for the project.

Our first round of requests used a 12-month time frame, from January, 2021, to the end of December, 2021. For our second round, we increased the time span of our requests to cover an 18-month period, from January, 2021, to the end of June, 2022.

Going forward, the time periods for our requests will be consistent across institutions, and we’ll file targeted requests to backfill the missing six months for institutions that gave us only 12 months of request summaries.

We spent months negotiating for access with FOI offices across the country. In cases where they didn’t have the data we’d requested, we asked them to provide what they thought was the closest match. For our first round of requests, we asked for data on both personal information FOIs and general FOIs, since we needed that data to inform our analysis and audit. During the second round, when all we wanted was summaries of FOIs that could be re-requested by others, we excluded personal information requests.

Some jurisdictions and public bodies made virtually all the data we needed available online. The federal government, for instance, posts a spreadsheet listing all completed requests online. We eventually decided to use that data instead of what we’d obtained through our requests, since it stretched back to 2012 and covered a larger number of institutions than the ones we’d filed FOIs to. This means some of the data we received in our first round of requests was superseded by what we downloaded and explains why the time range for some institutions is much broader than what we’d originally asked for.
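
As a rough illustration, ingesting a published summaries file like that one is straightforward in Python. The URL and column names below are placeholders for illustration, not the federal government’s actual endpoint or schema:

```python
# A minimal sketch of loading a published spreadsheet of completed
# requests. The URL and column names are placeholders, not the
# government's real endpoint or schema.
import pandas as pd

SUMMARIES_URL = "https://example.gc.ca/completed-requests.csv"  # hypothetical

completed = pd.read_csv(SUMMARIES_URL)
completed = completed.rename(columns={
    "Request Number": "request_number",  # assumed source column names
    "Summary": "summary",
    "Disposition": "disposition",
})
print(len(completed), "request summaries downloaded")
```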

In a small number of cases, we determined that the FOI summary data we downloaded was incomplete, so we supplemented it by combining it with what we received via FOI requests. This was the case for the City of Ottawa, the City of St. John’s, Canada Post and the RCMP.

Inconsistent data and files

Given the 445 FOI summary data requests we filed for this project, we were left with hundreds of files of extremely variable quality. The text in many files was near-unreadable, for instance, and the data structures (the column names, column types, etc.) used by various institutions were highly inconsistent. These issues meant we could not easily combine the various datasets we received into a single, comprehensive database.

We ran into other issues, too: some files were PDFs of spreadsheets, despite our insistence that public institutions provide machine-readable formats. Others were programmatically locked, meaning their contents couldn’t be copied and pasted or analyzed. We spent months going back and forth with FOI offices, sometimes pressing them for files that were closer to what we’d originally requested or seeking clarification on what information certain columns were designed to capture.

Some of the most laborious work on this project involved manually reviewing each FOI response and deciding how we’d approach its data. We looked for irregularities and tested how difficult it would be to adapt each file into a database- and analysis-friendly structure. That process helped us decide whether, and how, the information could be used. In the end, for each institution we either used a downloaded file, used an FOI response, combined the two, or abandoned the institution entirely due to quality issues.

When we received PDFs of spreadsheets instead of machine-readable files, this meant the FOI office had effectively destroyed the underlying raw data. In those cases, we ran the files through optical character recognition (better known as “OCR”) software to convert the text and numbers back into data. In some cases, this conversion process worked; in others, we were forced to transcribe requests manually.
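
To give a sense of what that conversion step involves, here is a minimal Python sketch built on the pdf2image and pytesseract libraries – illustrative tools, not necessarily the ones we used:

```python
# A sketch of recovering text from a PDF'd spreadsheet with OCR.
# Assumes pdf2image and pytesseract are installed (they require the
# poppler and tesseract binaries); illustrative, not our exact tooling.
from pdf2image import convert_from_path
import pytesseract

def ocr_pdf(path: str) -> str:
    """Render each PDF page to an image, then OCR it back to text."""
    pages = convert_from_path(path, dpi=300)
    return "\n".join(pytesseract.image_to_string(page) for page in pages)

text = ocr_pdf("foi_response.pdf")  # hypothetical file name
# The output still needs review: OCR misreads characters, and
# blacked-out passages come back as noise or gaps.
```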

A few places, such as the City of Charlottetown and the Delta Police Department, did not have FOI tracking databases or systems at all. In those cases, we asked for copies of the acknowledgment letters they sent to requesters, which note the request text and request date, along with final response letters for those requests, which would give us a closed date. We then manually transcribed the information in those letters to build up datasets for those public bodies.

For some examples of the types of responses we received from public institutions, read our blog post.

Cleaning and standardization

Once we had our data in various spreadsheets, we had to clean it.

We received dates in various formats (YYYY-MM-DD, MM-DD-YYYY, DD-MM-YYYY, etc.). Some files couldn’t be ingested into our database because of invisible text characters or because column names didn’t hew to what our database expected. We manually corrected these errors over the course of several months.
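
A simplified sketch of that kind of date cleanup, in Python (our actual process handled far more formats and edge cases):

```python
# Try each date format we encountered until one parses; a simplified
# sketch of the cleanup described above.
from datetime import datetime

DATE_FORMATS = ["%Y-%m-%d", "%m-%d-%Y", "%d-%m-%Y", "%d/%m/%Y", "%B %d, %Y"]

def normalize_date(raw: str) -> str | None:
    """Return an ISO-8601 date string, or None if nothing parses."""
    cleaned = raw.strip().replace("\u00a0", " ")  # drop non-breaking spaces
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # ambiguous day/month orders still need per-file review

print(normalize_date("03-15-2021"))  # -> 2021-03-15
```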

We used spreadsheet applications like Microsoft Excel and Google Sheets to work with these files, since they had to be accessible to the entire Secret Canada team. This presented its own challenges, however, since these programs (particularly Microsoft Excel) are notorious for converting text and numbers into dates. Our database import code warned us of these errors, which we then manually fixed.
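
Those warnings worked roughly like the check below – a sketch, assuming a pattern-based flag for values Excel tends to turn into dates:

```python
# Flag cells that look like Excel silently converted them to dates
# (e.g. a request number like "2021-12" re-rendered as "Dec-21").
# A sketch of the kind of warning an import step can raise.
import re

EXCEL_DATE_PATTERN = re.compile(r"^\d{1,2}-[A-Z][a-z]{2}$|^[A-Z][a-z]{2}-\d{2}$")

def looks_excel_mangled(value: str) -> bool:
    return bool(EXCEL_DATE_PATTERN.match(value.strip()))

for cell in ["Dec-21", "3-Mar", "A-2021-0042"]:  # hypothetical values
    if looks_excel_mangled(cell):
        print(f"warning: {cell!r} may be an Excel-converted date")
```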

We made an effort to modify request texts as little as possible. Some had spelling mistakes, while others were entirely capitalized or lowercased. In most cases, we left them as they were. The only exception was when we manually transcribed PDFs. In those situations, we added the word “redacted” to indicate blacked-out passages and sometimes corrected spelling for clarity.

Because we used OCR technology on some files, certain request texts may read oddly where words were blacked out. Some requests may say things like “[REDACTED]” or “**REDACTED**”, while others may not note a redaction at all and simply skip over several words.

Many of the requests we received were from people requesting their own personal information. These types of requests are not releasable to anyone but the original requester. While we needed this data for our audit, personal requests are of no value to the public or our searchable database, so we removed them during the cleaning process.

We wrote special text standardization code to improve the readability of our requests. For instance, we replaced multiple hard returns with a single hard return and multiple spaces with a single space, and we trimmed spaces from the start and end of requests. We also removed invisible text characters in the data and replaced various types of bullet point characters with a standard bullet point. We removed summaries that had no informational value – they may have consisted entirely of dashes, punctuation, etc. – and filtered out requests with blank summaries.
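
In rough outline, that standardization logic looked something like this simplified Python sketch (not our production code):

```python
# A simplified sketch of the text standardization described above.
import re

def standardize(text: str) -> str | None:
    text = text.replace("\u200b", "").replace("\ufeff", "")  # invisible chars
    text = re.sub(r"[▪◦‣·]", "•", text)     # standardize bullet characters
    text = re.sub(r"\n{2,}", "\n", text)    # collapse repeated hard returns
    text = re.sub(r"[ \t]{2,}", " ", text)  # collapse repeated spaces
    text = text.strip()
    # Drop summaries with no informational value (all dashes, punctuation).
    if not re.search(r"[A-Za-z0-9]", text):
        return None
    return text

print(standardize("Records   re:\n\n\n▪ budget"))  # "Records re:\n• budget"
```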

The audit

Part of this project involved an audit of how every federal, provincial and territorial ministry and department in the country handles FOI requests. We did this by requesting a copy of each institution’s FOI tracking database.

We standardized the files in two stages: first, we manually modified them in spreadsheet applications so that they shared a common data structure. From there, we used a programming language called Python to further “normalize” the data.
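
The second stage looked roughly like this – a sketch in which the column mappings are invented examples, not our real configuration:

```python
# Map each institution's column names onto one shared schema.
# The mappings shown are invented examples for illustration.
import pandas as pd

COLUMN_MAPS = {
    "city_of_x": {"Req #": "request_number", "Details": "summary",
                  "Recv'd": "date_received", "Closed": "date_completed"},
    "ministry_y": {"File Number": "request_number", "Request Text": "summary",
                   "Date In": "date_received", "Date Out": "date_completed"},
}

def normalize_file(path: str, institution: str) -> pd.DataFrame:
    df = pd.read_csv(path).rename(columns=COLUMN_MAPS[institution])
    df["institution"] = institution
    return df[["institution", "request_number", "summary",
               "date_received", "date_completed"]]
```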

During our analysis, we learned that Ontario’s and Quebec’s environment ministries were skewing some of our results, particularly our analysis of dispositions (whether files were released in full, in part, withheld, no records existed, etc.) and timelines (that is, how long requests took to be completed).

These ministries received thousands of FOI requests for documents regarding specific properties and often told requesters that no documents existed. Because these requests were so voluminous and closed so quickly, they skewed both the overall statistics and Ontario’s and Quebec’s figures.

An FOI coordinator for Quebec’s Ministère de l’Environnement et de la Lutte contre les changements climatiques explained that these requests are often made pro forma. “We have a lot of requests regarding environmental studies of specific sites by engineer firms and due diligence from legal counsels or law firms,” she told us in an e-mail. “Often, these are conducted before starting a building site or buying a property.”

We ultimately ran two versions of our analysis: one with the two ministries and another excluding them.

We received different types of data from the various ministries and departments we audited. Some included information we could use to determine request timelines, while others did not. The same was true of dispositions. For this reason, we used different subsets of the audit data for our disposition and timeline analyses.
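
In practice, that subsetting can be expressed in a few lines of pandas – a sketch whose column and institution names are assumptions, not the audit’s actual schema:

```python
# Timelines need both dates; dispositions need a disposition value.
# Column and institution names here are assumed for illustration.
import pandas as pd

audit = pd.read_csv("audit.csv",
                    parse_dates=["date_received", "date_completed"])

timelines = audit.dropna(subset=["date_received", "date_completed"]).copy()
timelines["days_open"] = (timelines["date_completed"]
                          - timelines["date_received"]).dt.days

dispositions = audit.dropna(subset=["disposition"])
print(dispositions["disposition"].value_counts(normalize=True))

# Rerun excluding the two environment ministries to gauge the skew.
mask = ~audit["institution"].isin(["ON Environment", "QC Environnement"])
audit_excluded = audit[mask]
```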

Reviewing the data

Institutions are barred from releasing certain types of personal information. However, as we were constructing our database, we discovered this kind of information in the texts of a few requests.

Once we had a draft version of our database, we vetted all information we received through FOI requests. (We didn’t vet the summaries for requests that we downloaded from public, online resources.)

A team of seven reporters and editors read through 74,585 requests. In one case, a request revealed the name of a child involved in a Children’s Aid Society proceeding. (That file was removed.) In the end, we identified and removed a total of eight requests in their entirety from the database. Before we publish any original source files we received via FOI for this project, we will manually redact the removed requests in full.

Exclusions and pending data

Fifty-two public bodies were excluded from our database because they did not respond to our FOIs, because the files they sent us were unreadable or because they did not provide summary texts. We are still waiting for responses from several others. You can learn more about the status of a given public body’s data by visiting the page listing all the institutions we track.

This methodology was last updated on June 7, 2023.


We’d love to hear about how you’re using Secret Canada. Send us a note or use the hashtag #SecretCanada on social media. This information helps us grow the project.