Testing With Sample Data

Introduction

The Ed-Fi Alliance development team primarily uses two sample data sets for testing, named Glendale and Northridge. These data sets are "realistic" but do not represent real students. Community users will typically use different test data, sourced from their own systems.

The way data are stored in these data sets can diverge significantly. Although the Ed-Fi Data Standard prescribes how to store data, vendor-specific nuances and local business rules can cause the same conceptual data to be stored in slightly different ways. For example, one system may track attendance by recording that a student is present, while another records only absences, so that the lack of an attendance record on a day when the student is enrolled implies that the student was present.
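The attendance example can be made concrete with a small sketch. It is not taken from the Analytics Middle Tier code: the function names, the "Present"/"Absent" labels, and the dates are hypothetical and chosen only for illustration. It shows how two differently shaped sources can be normalized to the same set of days present.

```python
from datetime import date


def present_days_positive(enrolled_days, attendance_events):
    """Source that records an explicit event per day: keep the 'Present' days."""
    return {d for d in enrolled_days if attendance_events.get(d) == "Present"}


def present_days_negative(enrolled_days, absence_days):
    """Source that records only absences: an enrolled day with no record counts as present."""
    return set(enrolled_days) - set(absence_days)


# Hypothetical enrollment calendar for one student.
enrolled = [date(2024, 9, 2), date(2024, 9, 3), date(2024, 9, 4)]

# System A stores one event per enrolled day.
system_a_events = {
    date(2024, 9, 2): "Present",
    date(2024, 9, 3): "Absent",
    date(2024, 9, 4): "Present",
}

# System B stores only the day the student was absent.
system_b_absences = {date(2024, 9, 3)}

days_a = present_days_positive(enrolled, system_a_events)
days_b = present_days_negative(enrolled, system_b_absences)
print(days_a == days_b)  # True
```

A view that assumes only one of these conventions would give misleading results against data stored in the other, which is one reason to test against more than one data set.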

In addition to the unit testing process built into the Analytics Middle Tier code, we suggest that anyone developing new views install the Glendale database for the Data Standard version(s) on which they're developing. With that data set available, you can perform exploratory testing of the output from new views. While it is impossible to predict the query outcome before writing the query, exploratory testing can:

  • Confirm that all columns have a value (no nulls).
  • Investigate any view that returns zero records, looking carefully to confirm that the cause is a lack of data in the source database rather than improper join conditions or where clauses.
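The two checks above can be sketched as a pair of helper functions. This is a minimal illustration and not part of the Analytics Middle Tier tooling. To stay self-contained it uses an in-memory SQLite database as a stand-in for a restored sample database, and the table and view names (Student, Enrollment, StudentEnrollmentSample) are hypothetical. Against a real database you would use the appropriate SQL Server or PostgreSQL driver and the actual view name.

```python
import sqlite3


def null_counts(conn, view_name):
    """Return {column name: number of NULL values} for every column of the view."""
    cursor = conn.execute(f'SELECT * FROM "{view_name}" LIMIT 0')
    columns = [description[0] for description in cursor.description]
    counts = {}
    for column in columns:
        query = f'SELECT COUNT(*) FROM "{view_name}" WHERE "{column}" IS NULL'
        counts[column] = conn.execute(query).fetchone()[0]
    return counts


def row_count(conn, name):
    """Return the number of rows in a table or view."""
    return conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]


# Hypothetical stand-in for a sample database and a new view under test.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Student (StudentKey INTEGER, LastName TEXT);
    CREATE TABLE Enrollment (StudentKey INTEGER, SchoolKey INTEGER);
    INSERT INTO Student VALUES (1, 'Lopez'), (2, NULL);
    INSERT INTO Enrollment VALUES (1, 100), (2, 100);
    CREATE VIEW StudentEnrollmentSample AS
        SELECT s.StudentKey, s.LastName, e.SchoolKey
        FROM Student s
        JOIN Enrollment e ON e.StudentKey = s.StudentKey;
    """
)

print(null_counts(conn, "StudentEnrollmentSample"))
# {'StudentKey': 0, 'LastName': 1, 'SchoolKey': 0}

# If the view is empty while its source tables are not, suspect the joins or filters.
print(row_count(conn, "StudentEnrollmentSample"), row_count(conn, "Student"))
# 2 2
```

Comparing the view's row count with the row counts of its source tables is a quick way to tell missing source data apart from a join or where clause that filters everything out.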

In the latter case, you can add a few missing records manually to test the output. When submitting views to the Alliance for inclusion in an upcoming Analytics Middle Tier release, save any records added manually in a SQL script and attach the script to the Tracker ticket for the submission. This allows the Ed-Fi development team to repeat the same testing.

The Northridge database exists only for Data Standard 3+. The Ed-Fi development team focuses most of its testing effort on the Glendale data set, which allows comparisons between Data Standard 2.2 and Data Standard 3+. Northridge is made available for additional testing on a different sample data set.

Sample Data Sets

Many of these files are in bacpac format instead of SQL backup files. The format provides a much smaller file size, which comes at the expense of a slower restoration process. Instructions from Microsoft: Import a BACPAC File to Create a New User Database.

PostgreSQL files should be restored using psql.exe --file <file path>, not pg_restore.exe.
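As a hedged sketch of scripting that restore, the helper below only assembles a psql command line; it does not run it. The function name, the database name, the file name, and the default host and user are all hypothetical. The flags used (--host, --username, --dbname, --file) are standard psql options. The sketch assumes the file is a plain SQL script and that the target database either already exists or is created by the script.

```python
import subprocess  # only needed if the command is actually executed


def build_psql_restore_args(sql_file, dbname, host="localhost", username="postgres"):
    """Build the argument list for restoring a plain SQL file with psql (not pg_restore)."""
    return [
        "psql",
        "--host", host,
        "--username", username,
        "--dbname", dbname,
        "--file", sql_file,
    ]


# Hypothetical file and database names, for illustration only.
args = build_psql_restore_args("glendale_sample.sql", "edfi_glendale_sample")
print(" ".join(args))

# To run the restore for real, uncomment the next line:
# subprocess.run(args, check=True)
```

On Windows the executable is psql.exe, which resolves from the same "psql" name when the PostgreSQL bin directory is on the PATH.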

Grand Bend

Available for all data standards and ODS/API technology versions. This is the "populated template" that comes with the application by default. It contains about 2,000 students.

Glendale

This data set contains about 48,000 students, and was created by anonymizing real data from a Local Education Agency (LEA) in the early days of Ed-Fi.

Northridge

This data set contains about 21,000 students, and was created synthetically using the Ed-Fi Sample Data Generator.
