Introduction

Analytics Middle Tier 2.0 is becoming a full citizen in the Ed-Fi platform in 2020, rather than just a proof-of-concept on the Ed-Fi Exchange. As it grows up, it needs to correct some architectural concerns that came up as feedback from the field. It also needs to be up to par with the latest release of the ODS/API, version 3.3. This document aims to inform about the challenges and elicit feedback on the real-world usefulness of the proposed solutions.

Naming Convention

Requirement

Hold names to under 63 characters for PostgreSQL compatibility.

Design

Currently, none of the objects have a name that violates this constraint. To help avoid problems with future views, it is proposed to either truncate "Dimension" to "Dim" or drop the word altogether. Generally, clarity should be preferred over length when naming objects, hence the question is: how much clarity is lost if we move away from the "Dimension" suffix?

Old NameA - TruncateB - Drop
​analytics.StudentDimensionanalytics.StudentDim​analytics.Student​

Is there risk of confusing the analytics.Student  view with the real edfi.Student  table when "Dimension" is entirely dropped? The typical use case for Analytics Middle Tier is to only import the views into a business intelligence / reporting data model - thus the end-user would not see the edfi.Student  table

Status

Committing to "Dim" suffix on dimensions for best balance between name length and clarity of intent.  Stephen Fuqua

BIA-289 - Getting issue details... STATUS

Multi Data Standard Support

Requirement

Support installing the views on ODS databases supporting multiple data standards (2.2, 3.1, 3.2).

Design

Version 1.3.0 added support for Data Standard 3.1, which was used by ODS/API 3.1.1 and 3.2, through the use of the –dataStandard <Ds2 | Ds31>  argument. 

Forcing the user to remember which data standard is installed is sub-optimal. We should be able to detect this implicitly and install the correct version without user input.

  1. If table AddressType exists, then install Data Standard 2. (warning) if needed
  2. Else if table VersionLevel exists, then install Data Standard 3.1.
  3. Else if table DeployJournal exists, then install Data Standard 3.2.
  4. Else throw an error: "Unable to determine the ODS database version".


Status

Will proceed with this design.  Stephen Fuqua

Key Summary Status
Loading...
Refresh

Data Standard 2 support plan:

  • New views submitted by the community on Data Standard 3+ will not be translated to Data Standard 2 by the Alliance. Pull requests from community members adding the view(s) to Data Standard 2 will be welcomed.
  • This will be documented in the official notes.
  • Data Standard 2 support will signal deprecation - that is, we reserve the right to remove Data Standard from a future Analytics Middle Tier 3.0 release.

Student, Parent, and Staff Keys

Requirement

The views should expose "Key" fields based on the natural key of the underlying table. 

Design

In the case of StudentDimension , ContactPersonDimension , and UserDimension , the original release used StudentUSI , ParentUSI , and StaffUSI  respectively. The "USI" columns are primary keys and were used by mistake. The "UniqueId" columns are the correct natural keys.

Change all instances of StudentKey , ContactPersonKey , and UserKey  to use the corresponding "UniqueId" column from the source table.

Status

Will proceed with this design.  Stephen Fuqua

Key Summary Status
Loading...
Refresh

Descriptor and Type Mapping

Requirement

Decouple the views from hard-coded Descriptor and Type values. 

Context

many of the views need to lookup records by Descriptor value - for instance, looking up the Attendance records where a student has an "Excused Absence" or "Unexcused Absence." Because the original developer had access to only a limited dataset, it was not realized that the Descriptor values will vary widely from one implementation to the next. Thus the hard-coding needs to be decoupled, allowing the implementation to provide a mapping from their Descriptor value to the concept used by the Analytics Middle Tier.

In theory, the various "Types" values in Data Standard 2 should provide a more universal constant than the Descriptors. However, some community members report that these too are mutable. Therefore, (a) using Types is not a solution for Data Standard 2, and (b) even those views with hard-coding to Types instead of Descriptors must be modified for greater independence. Note: Type tables were removed in Data Standard 3 precisely because they were not being used in the originally-designed manner. 

Design

Summary

  1. Move hard-coded values to a "Constants" table.
  2. Create mapping tables that link Descriptors or Types to Constants.
  3. Modify all views as needed to join to the Constants and new mapping tables.

Descriptor Map 2

ConstantNamePurpose
AddressType.Home Looking up ContactPerson's Home address
AddressType.Mailing Looking up ContactPerson's Mailing address
AddressType.Physical Looking up ContactPerson's Physical address
AddressType.TemporaryLooking up ContactPerson's Temporary address
AddressType.Work Looking up ContactPerson's Work address
Descriptor.Absent

Looking up StudentAbsenceEvents that should be treated as "Absent" in an Early Warning System. Example descriptor values to map might be "Excused Absence" and "Unexcused Absence."

As another example, if a Field Trip absence event should be treated as an absence from school for the purpose of Early Warning, then one would also map the descriptor for "Field Trip" to the constant "Descriptor.Absent".

Descriptor.TardyLooks up StudentAbsenceEvents that should be treated as "Tardy".
Descriptor.InstructionalDay

Determines if a calendar date is an instructional day that should be used in calculating attendance rates.

The Ed-Fi default template mapping would use both the "Instructional Day" and "Make Up Day" descriptors.

  
EmailType.Home/Personal Looking up ContactPerson's home or personal e-mail address.
EmailType.WorkLooking up ContactPerson's work e-mail address.  
FoodServicesDescriptor.FullPriceDetermines if a student is eligible for school food service.
GradeType.Grading PeriodLooking up the Grade records by the most granular period, which by default is "Grading Period". Some implementations might instead use terms like "Quarter" or "Six Weeks".
TelephoneNumberType.HomeLooking up ContactPerson's Home phone number.
TelephoneNumberType.MobileLooking up ContactPerson's Mobile phone number.
TelephoneNumberType.WorkLooking up ContactPerson's Work phone number.
Group.TeacherSupports creation of row-level authorization data.
Group.PrincipalSupports creation of row-level authorization data.
Group.SuperintendentSupports creation of row-level authorization data.

Example

In Version 1.x, the StudentEarlyWarningFact view reports on excused and unexcused absences, looking for StudentSchoolAttendanceEvent  records with attendance descriptor values of either "Excused Absence" or "Unexcused Absence".

In version 2, the view would now search for all StudentSchoolAttendanceEvent records whose descriptor maps to the relevant constant. Thus there would be two DescriptorMap  values, one each for "Excused Absence" and "Unexcused Absence." Any school who uses a different term than these two would create a DescriptorMap  record mapping that term to the DescriptorConstant  value of "Absent".

Implications

Those who install the Analytics Middle Tier will need to carefully assess their Descriptors and Types, and then manage the DescriptorMap table (and TypeMap , for Data Standard 2) accordingly. 

Default Mappings

A new command-line Option will be provided to run a script that loads the default Descriptor mapping for the default Ed-Fi descriptors (minimal/populated template descriptors).

.\EdFi.AnalyticsMiddleTier.exe --connectionString "..." --options DefaultMap


Status

Will proceed with this design.  Stephen Fuqua

Key Summary Status
Loading...
Refresh

Changes to the Student Dimension

Requirements

  1. Create a "Student" dimension with a single unique key.
  2. Provide intuitive access to student demographics.

Context

Student Dimension Uniqueness

The Early Warning System fact views both assumed that the StudentDimension  would only have a single record for a student. However, a student could be enrolled in multiple schools at the same time, resulting in two records in the StudentDimension for the same StudentKey. This is problematic for the PowerBI Starter Kit, which has a hard requirement for unique StudentKeys.

Demographics in Ed-Fi UDM v2.2

Sources for student demographics:

  • edfi.Student  contains sex, Hispanic/Latino ethnicity, economic disadvantaged (Bool), school foodservice eligibility, limited English proficiency.
  • One-to-many tables:
    • edfi.StudentCohortYear 
    • edfi.StudentDisability
    • edfi.StudentLanguage
    • edfi.StudentLanguageUse
    • edfi.StudentProgramAssociation
    • edfi.StudentCharacteristic is a generic table, and contains begin/end date
    • edfi.StudentRace 

Demographics in Ed-Fi UDM v3.x

Sources for student demographics:

  • edfi.StudentSchoolAssociation  contains School Year, Enrollment Date and Grade Level
  • edfi.StudentEducationOrganizationAssociation  contains Sex, Hispanic/Latino ethnicity, and Limited English Proficiency
  • There are a series of many-to-many tables to store specific types of multi-value demographic characteristics - (warning) note these can be saved for either the school or the district (or charter, state, ESC, etc.)
    • edfi.StudentEducationOrganizationAssociationCohortYear 
    • edfi.StudentEducationOrganizationAssociationDisability 
    • edfi.StudentEducationOrganizationAssociationLanguage 
    • edfi.StudentEducationOrganizationAssociationLanguageUse 
    • edfi.StudentEducationOrganizationAssociationRace 
    • edfi.StudentEducationOrganizationAssociationTribalAffiliation 
  • And there is the generic edfi.StudentEducationOrganizationAssociationStudentCharacteristic table, which has a time Period associated with it.
    • Includes food service eligibility, which was present on Student  as a Boolean in version 1.

Dimension or Fact?

While gender, race, and ethnicity all have strings associated with them, some elements of the demographic and enrollment data are more fact-oriented than dimension-oriented:

  • IsHispanic
  • IsEconomicallyDisadvantaged
  • LimitedEnglishProficiency
  • SchoolEnrollmentDate

The ODS does not support slowly-changing dimensions, so there is only ever one current snapshot of these data - one cannot tie them to a date unless referring to the enrollment date... except in the case of food service eligibility because there is a time period.

Foodservice Eligibility

Foodservice eligibility is tracked via a Program Association, which not a demographic in the Ed-Fi Unified Data Model. Therefore it should be removed from demographics and placed in the program association views. For supporting the Power Bi Starter Kit, a new Early Warning System view might be needed that tries to preserve the old StudentDimension in some ways, including flattening the foodservice eligibility into a single Boolean value.

Design Proposal

Summary:

  1. Eliminate the idea of a separate "Student Dimension" in the core data collection.
  2. Create two new, very similar, dimensions to replace the old StudentDimension:
    1. StudentSchoolDim 
    2. StudentLocalEducationAgencyDim 
  3. Combine the data from various edfi.StudentEducationOrganizationAssociationXYZ  tables into a single view, DemographicDim.
  4. Create two bridge tables to link student information to the characteristics
    1. StudentSchoolDemographicBridge 
    2. StudentLocalEducationAgencyDemographicBridge 


Student to Characteristic Relationships

Rationale for the Student Dimension replacement:

  1. Students don't (or shouldn't) exist in isolation from an organization - hence no need for a StudentDim.
  2. Across the Ed-Fi data model, there are two different student relationships:
    1. With the school (e.g. StudentSchoolAssociation, StudentDisciplineIncident, StudentGradebookEntry, etc.).
    2. And with the more generic Education Organization - which in the current context of the Analytics Middle Tier, generally means Local Education Agency.

      Exception: StudentAssessment is only connected directly to a Student! From analytics viewpoint, we will define the Analytics Middle Tier as assuming that analytics on assessment data will always be in the context of a School or Local Education Agency.

  3. In defining meaningful StudentSchool  and StudentLocalEducationAgency  entities, there will be some overlap of fields - but the data could be different. This is an inherently dangerous area of the Ed-Fi data model. If we were to combine the data into a single perspective, then we would be hiding the danger. The data analyst will need to read and understand why there are two "root entities" for data reporting, and then choose which one to use based on their implementation.
  4. The Analytics Middle Tier is intended for Local Education Agency use cases. Other use cases can be added in the future as needed to support other types of Education Organizations (e.g. a future view StudentStateEducationAgencyDim ).

Rationale for combining the various characteristic tables into a single point in time view:

  1. There are seven (in Data Standard 3+) similar characteristics tables that do not have time periods associated with them
    1. CohortYear
    2. Disability
    3. DisabilityDesignation
    4. Language
    5. LanguageUse
    6. Race
    7. TribalAffiliation
  2. And there is one with a time period: StudentCharacteristic
  3. Those without a time period can be combined into a single view for "demographics"
  4. Those  without a time period can also be included in that view, so long as the data analyst understands that the "Bridge" between the student and the demographics represents "data as of right now".
  5. The two with a time period can, in the future, be used to create new Fact views that link to the date range. See Program Views below.

DemographicDim

Ultimately these values come from the edfi.Descriptor  table, although not all descriptors will be here. String values will be used for keys instead of DescriptorId in order to allow combining data from multiple year-specific ODS databases into a single data mart - this would not be possible with the auto-incremented DescriptorId  since that value will differ between ODS database instances.

Structure

ColumnData TypeSourceDescription
​DemographicKeyString​

"{Source Table}" or 

"{Source Table}.{Descriptor.CodeValue}"

Primary key.​

Made up of the table source and the Descriptor value. To support hierarchies, there will also be a root Key with only the table source value.

DemographicParentKeyStringsame as above

Facilitates creation of roll-up / hierarchy in BI tools by relating each individual record to its "parent concept".

DemographicLabelString"{Descriptor}.{CodeValue}" for all Descriptors related to the relevant tables*.For parent entities, will be the same as the Key. For child entities, will be the actual demographic label.

Data Standard 2.2 Source Tables

Descriptors for the following tables:

  1. StudentCohortYear
  2. StudentDisability
  3. StudentLanguage
  4. StudentLanguageUse
  5. StudentRace
  6. StudentCharacteristic (where time dates encompass "now")

Student contains "IsEconomicDisadvantaged" in DS 2, whereas this is now one of the "StudentCharacteristics" in DS 3. In order to have parity between the two data standards, the DemographicDim  view therefor needs a hard-coded row that does not come from a table:

DemographicKeyParentKeyDemographicLable
​StudentCharacteristic#EconomicDisadvantagedStudentCharacteristic​Economic Disadvantaged​

Otherwise the sample records will be as with the Data Standard 3+ samples below.

Data Standard 3+ Source Tables

Descriptors for the following tables:

  1. StudentEducationAgencyCohortYear
  2. StudentEducationAgencyDisability
  3. StudentEducationAgencyDisabilityDesignation
  4. StudentEducationAgencyLanguage
  5. StudentEducationAgencyLanguageUse
  6. StudentEducationAgencyRace
  7. StudentEducationAgencyTribalAffiliation
  8. StudentEducationAgencyStudentCharacteristic (where time StudentEducationAgencyStudentCharacteristicPeriod encompasses "now")

Sample Records

DemographicKeyParentKeyDemographicLabel
StudentCharacteristicStudentCharacteristicStudentCharacteristic
StudentCharacteristic#Economic DisadvantagedStudentCharacteristicEconomic Disadvantaged
StudentCharacteristic#HomelessStudentCharacteristicHomeless
StudentCharacteristic#RunawayStudentCharacteristicRunaway
RaceRaceRace
Race#American Indian - Alaska NativeRaceAmerican Indian - Alaska Native
Race#AsianRaceAsian
Race#Black - African AmericanRaceBlack - African American
LanguageNoneLanguage
Language#AdygheLanguageAdyghe
Language#Swiss GermanLanguageSwiss German
etc.



StudentSchoolDim

Data Standard 2.2

Student to Characteristic Relationships (DS 2)

ColumnData TypeSourceDescription
StudentSchoolKeyString"{Student.StudentUniqueId}-{StudentSchoolAssociation.SchoolId}"Primary key
​StudentKeyStringedfi.Student.UniqueId​
SchoolKeyStringedfi.StudentSchoolAssociation.SchoolId
SchoolYearStringedfi.StudentSchoolAssocation.SchoolYearconvert to string to signal to modeling tools that this is not an aggregatable number
StudentFirstNameStringedfi.Student.FirstName
StudentMiddleNameStringedfi.Student.MiddleName
StudentLastNameString

edfi.Student.LastSurname


EnrollmentDateKeyStringedfi.StudentSchoolAssociation.EntryDateformatted as YYYY-MM-DD
GradeLevelStringedfi.Descriptor.CodeValue via edfi.StudentSchoolAssociation.EntryGradeLevelDescriptorId
LimitedEnglishProficiency

String

edfi.Descriptor.CodeValue via edfi.Student.LimitedEnglishProficiencyDescriptorId

Replace null with "Not Applicable"
IsHispanic

Boolean

edfi.Student.HispanicLatinoEthnicity

Replace null with 0
SexStringedfi.SexType.CodeValue via edfi.Student.SexTypeId
LastModifiedDateDateTime

Most recent date from any source that has a LastModifiedDate column


Data Standard 3+

StudentSchoolDim ERD

ColumnData TypeSourceDescription
StudentSchoolKeyString"{Student.StudentUniqueId}-{StudentSchoolAssociation.SchoolId}"Primary key
​StudentKeyStringedfi.Student.UniqueId​
SchoolKeyStringedfi.StudentSchoolAssociation.SchoolId
SchoolYearStringedfi.StudentSchoolAssocation.SchoolYearconvert to string to signal to modeling tools that this is not an aggregatable number
StudentFirstNameStringedfi.Student.FirstName
StudentMiddleNameStringedfi.Student.MiddleName
StudentLastNameString

edfi.Student.LastSurname


EnrollmentDateKeyStringedfi.StudentSchoolAssociation.EntryDateformatted as YYYY-MM-DD
GradeLevelStringedfi.Descriptor.CodeValue via edfi.StudentSchoolAssociation.EntryGradeLevelDescriptorId
LimitedEnglishProficiency

String

edfi.Descriptor.CodeValue via edfi.StudentEducationOrganizationAssociation.LimitedEnglishProficiencyDescriptorId

Replace null with "Not Applicable"
IsHispanic

Boolean

edfi.StudentEducationOrganizationAssociation.HispanicLatinoEthnicity

Replace null with 0
SexStringedfi.Descriptor.CodeValue via edfi.StudentEducationOrganizationAssociation.SexDescriptorId
LastModifiedDateDateTime

Most recent date from any source that has a LastModifiedDate column



The (first) primary contact was included in the original Student Dimension to further flatten the model. However, this had a large performance cost. To improve performance, flattening the primary contact is now left as an exercise for downstream semantic models - for example in a SSAS Tabular Data Model. 

StudentLocalEducationAgencyDim

Data Standard 2.2

StudentLocalEducationAgencyDim for DS 2.2

ColumnData TypeSourceDescription
StudentLocalEducationAgencyKeyString"{Student.StudentUniqueId}-{School.LocalEducationAgencyId}"Primary key
​StudentKeyStringedfi.Student.UniqueId​
LocalEducationAgencyKeyStringedfi.School.LocalEducationAgencyId
SchoolYearStringedfi.StudentSchoolAssocation.SchoolYearconvert to string to signal to modeling tools that this is not an aggregatable number
StudentFirstNameString

edfi.Student.FirstName


StudentMiddleNameStringedfi.Student.MiddleName
StudentLastNameStringedfi.Student.LastSurname
GradeLevelStringedfi.Descriptor.CodeValue via edfi.StudentSchoolAssociation.EntryGradeLevelDescriptorId
LimitedEnglishProficiency

String

edfi.Descriptor.CodeValue via edfi.Student.LimitedEnglishProficiencyDescriptorId

Replace null with "Not Applicable"
IsHispanic

Boolean

edfi.Student.HispanicLatinoEthnicity

Replace null with 0
SexStringedfi.SexType.CodeValue via edfi.Student.SexTypeId
LastModifiedDateDateTime

Most recent date from any source that has a LastModifiedDate column


Data Standard 3+

StudentLocalEducationAgencyDim

ColumnData TypeSourceDescription
StudentLocalEducationAgencyKeyString"{Student.StudentUniqueId}-{School.LocalEducationAgencyId}"Primary key
​StudentKeyStringedfi.Student.UniqueId​
LocalEducationAgencyKeyStringedfi.School.LocalEducationAgencyId
SchoolYearStringedfi.StudentSchoolAssocation.SchoolYearconvert to string to signal to modeling tools that this is not an aggregatable number
StudentFirstNameString

edfi.Student.FirstName


StudentMiddleNameStringedfi.Student.MiddleName
StudentLastNameStringedfi.Student.LastSurname
GradeLevelStringedfi.Descriptor.CodeValue via edfi.StudentSchoolAssociation.EntryGradeLevelDescriptorId
LimitedEnglishProficiency

String

edfi.Descriptor.CodeValue via edfi.StudentEducationOrganizationAssociation.LimitedEnglishProficiencyDescriptorId

Replace null with "Not Applicable"
IsHispanic

Boolean

edfi.StudentEducationOrganizationAssociation.HispanicLatinoEthnicity

Replace null with 0
SexStringedfi.Descriptor.CodeValue via edfi.StudentEducationOrganizationAssociation.SexDescriptorId
LastModifiedDateDateTime

Most recent date from any source that has a LastModifiedDate column


StudentSchoolDemographicsBridge

Data Standard 2.2

ColumnData TypeSourceDescription
​StudentSchoolDemographicBridgeKeyString​"{DemographicKey}-{StudentSchoolKey}"​Primary key​
StudentSchoolKeyString"{Student.StudentUniqueId}-{StudentSchoolAssociation.SchoolId}"Foreign key
DemographicKeyStringDemographicDim.DemographicKeyForeign key

Must be composed of a series of union queries that combine records from these tables:

  1. StudentCohortYear
  2. StudentDisability
  3. StudentLanguage
  4. StudentLanguageUse
  5. StudentRace
  6. StudentCharacteristic (where time dates encompass "now")

As well as a record for "StudentCharacteristic#Economic Disadvantaged" if Student.IsEconomicDisadvantaged is true.

Data Standard 3+

ColumnData TypeSourceDescription
​StudentSchoolDemographicBridgeKeyString​"{DemographicKey}-{StudentSchoolKey}"​Primary key​
StudentSchoolKeyString"{Student.StudentUniqueId}-{StudentSchoolAssociation.SchoolId}"Foreign key
DemographicKeyStringDemographicDim.DemographicKeyForeign key

Must be composed of a series of union queries that combine records from these tables:

  1. StudentEducationOrganizationCohortYear
  2. StudentEducationOrganizationDisability
  3. StudentEducationOrganizationDisabilityDesignation
  4. StudentEducationOrganizationLanguage
  5. StudentEducationOrganizationLanguageUse
  6. StudentEducationOrganizationRace
  7. StudentEducationOrganizationTribalAffiliation
  8. StudentEducationOrganizationStudentCharacteristic (where StudentEducationOrganizationStudentCharacteristicPeriod dates encompass "now")

The joins need to be from Student → StudentSchoolAssociation → these tables, with StudentSchoolAssociation.SchoolId serving as the EducationOrganizationId in the joins.

StudentLocalEducationAgencyDemographicsBridge

Data Standard 2.2

ColumnData TypeSourceDescription
​StudentLocalAgencyDemographicBridgeKeyString​"{DemographicKey}-{StudentLocalEducationAgencyKey}"​Primary key​
StudentLocalEducationAgencyKeyString"{Student.StudentUniqueId}-{School.LocalEduationAgencyId}"Foreign key
DemographicKeyStringDemographicDim.DemographicKeyForeign key

Must be composed of a series of union queries that combine records from these tables:

  1. StudentCohortYear
  2. StudentDisability
  3. StudentLanguage
  4. StudentLanguageUse
  5. StudentRace
  6. StudentCharacteristic (where time dates encompass "now")

As well as a record for "StudentCharacteristic#Economic Disadvantaged" if Student.IsEconomicDisadvantaged is true.

Data Standard 3+

ColumnData TypeSourceDescription
​StudentLocalAgencyDemographicBridgeKeyString​"{DemographicKey}-{StudentLocalEducationAgencyKey}"​Primary key​
StudentLocalEducationAgencyKeyString"{Student.StudentUniqueId}-{LocalEduationAgency.EducationOrganizationId}"Foreign key
DemographicKeyStringDemographicDim.DemographicKeyForeign key

Must be composed of a series of union queries that combine records from these tables:

  1. StudentEducationOrganizationCohortYear
  2. StudentEducationOrganizationDisability
  3. StudentEducationOrganizationDisabilityDesignation
  4. StudentEducationOrganizationLanguage
  5. StudentEducationOrganizationLanguageUse
  6. StudentEducationOrganizationRace
  7. StudentEducationOrganizationTribalAffiliation
  8. StudentEducationOrganizationStudentCharacteristic (where StudentEducationOrganizationStudentCharacteristicPeriod dates encompass "now")

The joins need be from Student → StudentSchoolAssociation → School, with the School.LocalEducationAgencyId serving as the EducationOrganizationId in the other joins.

Alternatives

The following alternatives were considered and rejected

Split StudentDim into StudentDim and StudentEnrollmentDim

The original StudentDimension  would be split in two: StudentDim  with no SchoolKey  in it and a StudentEnrollmentDim  (or StudentDemographicDim ) holding the Student-to-school relationship and demographics. Rejected for these reasons:


  1. Trying to keep the number of views as small as possible, so that the domain model is easier to understand compared to the source Ed-Fi data model.
  2. Generally need to query for that student-school relationship - not for a student in isolation.
  3. Keeping only a single "enrollment" or "demographic" dimension for the student requires implementing business logic to determine which demographics take precedence - if demographics are saved for both school and local education agency, then when one should be used? Whichever choice is made, it will likely be wrong for many implementations.

Change StudentDim to StudentSchoolDim

In this version, the old StudentDimension  is essentially renamed to StudentSchoolDim - largely preserving the old structure. Compared to the proposed model, this version allows the data analyst to quickly and easily find the right student information. It also relieves the analyst from having to decide which version of truth to use - the School or the Local Education Agency. As mentioned above, it has been decided that the Analytics Middle Tier should not gloss over this difficulty: the data analyst must inspect their implementation and decide which perspective (School or Local Education Agency) is appropriate in each circumstance.

Create Separate Bridge Tables for Each Demographic

Instead of combining the demographics into a single bridge view, we could have created one for each concept: Disability, Race, Tribal Affiliation, etc. On one level, this would have simplified the data analyst's work when looking for a particular demographic field: they can just look for the view with the word "Race" in the name, for example. However, this comes at the expense of proliferating more tables, making the Analytics Middle Tier look too much like the Ed-Fi data standard.

Multiple Bridges

Status

Planning to adopt this dual-root (Student-School and Student-LocalEducationAgency) approach.   Stephen Fuqua

Key Summary Status
Loading...
Refresh

Program Views

Requirement

Support analytics on Program Participation at the school level.

Context

A small set of Program-related views was added to Analytics Middle Tier as an experiment in supporting a second use-case: analyzing student program participation. Programs in the default Ed-Fi ODS template include "Bilingual", "Career and Technical Education", "Special Education", and a few others. These data are represented in two different fact views: analytics.StudentProgramEvent  and analytics.StudentProgramFact 

The "Event" view represents the date on which a student entered or exited a program. The "Fact" view represents every day on which the student was in a program. Each perspective has its own utility in analytics / reporting.

Note, however, that they both join to analytics.LocalEducationAgencyDimension . There is no linkage to schools. This is because the data modeler originally heard (or thought he heard) that program enrollment is "always" at the district level. Since then, he has received feedback that many implementations do link students to programs at the school level, or even at the state level.

Design

a Remove the Views

Eliminate the problem by eliminating the views, unless and until we get a detailed real-world use case definition that would solve these problems.

b Add a SchoolKey to Both Views

This implies that SchoolKey  or LocalEducationAgencyKey  could be null, generally an undesirable situation in dimensional modeling. A few options:

  1. Ignore the problem: downstream data analyst have to join the program views to SchoolDimension  or LocalEducationAgencyDimension  with an outer join.
    1. (tick) Good for data architect.
    2. (warning) Dangerous for data analyst.

  2. Nulls can be eliminated - or at least nearly eliminated - for LocalEducationAgencyKey by loading a School's LocalEducationAgencyKey  value. 
    1. (warning) Moderate additional complexity for data architecture.
    2. (tick)(warning) Resolves one outer join problem but leaves the other in place.

  3. Create a "fake school" for each LEA in the SchoolDimension , with SchoolName = 'n/a' . Use this as the SchoolKey  when program participation is only at the LEA level.
    1. (warning) Ugly for the data architect, although not impossible.
    2. (tick)(warning) Resolves the other outer join problem, at the expense of having a strange "District" entry show up in School filters. Dubious value.

  4. Separate the views into copies for School and LocalEducationAgency.
    1. (error) Just forces the problem onto the data analyst.

The Data Standard  shows that a School can belong to 0 or 1 Local Education Agency. Side note: that Agency might be a Charter Management Organization. Thus option 2 can still lead to lost records when using an INNER JOIN. As with Option 3, null/missing records can be eliminated by creating a "n/a" LocalEducationAgency for these schools.

If these program views are to be kept, then a combination of options 2 and 3 seems like the only option that presents a useful interface to the data analyst.

Status

Going to defer for a real use case so that we don't mislead anyone. Taking the program views out of Analytics Middle Tier 2.0. Must remember to address FoodService when coming back to this in Analytics Middle Tier 2.1+.

BIA-293 - Getting issue details... STATUS

 Stephen Fuqua

School Year

Requirement

Add SchoolYear to help support longitudinal data / multi-year databases. Wherever possible, would be nice to support drill-down hierarchies by school year.

Design

The following dimension views could have a SchoolYear  column in them; Data Standard 2's support for School Year is limited compared to Data Standard 3. 

Data Standard 2Data Standard 3
​Student / Student EnrollmentStudent / Student Enrollment
Student SectionStudent Section

Date

Grading Period

The multi-year use-case was not originally one of the goals of the Analytics MIddle Tier, so no consideration was given to adding to the two views that could support it. It will be trivial to add to these two views in common above.

For Date and Grading Period, there is real value. To support in Data Standard 2, we would need to create a mapping table or extra column on each of those two tables. This takes into account that one record could below to multiple school years in some edge cases. The additional effort required may push solving for Date and Grading Period to a future release, e.g. Analytics Middle Tier 2.1.

Decided to support SchoolYear in the student-school relationship and in the student-section relationship in Analytics Middle Tier 2.0. SchoolYear column will exist for Data Standard 2 but will not be populated where not available.  Stephen Fuqua

Key Summary Status
Loading...
Refresh

Separation Between Core and Use-Case Views

Requirement

Manage a collection of "core" views and separate collections of use-case specific views.

Design

The application already has a concept for installing optional components, which was first created for optional install of additional indexes in the ODS. Proposal:

  1. Always install a core set of views
    1. ContactPersonDimension
    2. DateDimension
    3. GradingPeriodDimension
    4. LocalEducationAgencyDimension
    5. MostRecentGradingPeriod
    6. SchoolDimension
    7. SchoolNetworkAssociationDimension
    8. StudentDimension
    9. StudentEnrollmentDimension (if created, see above)
    10. StudentSectionDimension

  2. Move some of the existing views into new optional collections:
    1. Row-level Security (RLS)
      1. StudentDataAuthorization
      2. UserAuthorization
      3. UserDimension

    2. Early Warning System (EWS)
      1. StudentEarlyWarningFact
      2. StudentSectionGradeFact

    3. QuickSight-Early Warning System (QEWS)
      1. Ews_SchoolRiskTrend
      2. Ews_StudentAttendanceTrend
      3. Ews_StudentEnrolledSectionGrade
      4. Ews_StudentEnrolledSectionGradeTrend
      5. Ews_StudentIndicators
      6. Ews_StudentIndicatorsByGradingPeriod
      7. Ews_UserSchoolAuthorization

    4. Program Analysis (PROGRAM)
      1. ProgramTypeDimension
      2. StudentProgramEvent
      3. StudentProgramFact

        Thus to install the Early Warning System and Row-level security collections used by the Power BI Starter Kit v2, the admin user would run this command:

        .\EdFi.AnalyticsMiddleTier.exe --connectionString "..." --options EWS RLS
  3. Avoid name overlaps
    1. Option 1: separate by "namespace" (schema).  Instead of having a single analytics  schema, we could create an analytics_core  schema and other schemas to match use cases:

      v1 Namev2 Name
      analytics.​ContactPersonDimensionanalytics_core.​ContactPersonDimension
      analytics.DateDimensionanalytics_core.DateDimension
      analytics.Ews_SchoolRiskTrendanalytics_qews.SchoolRiskTrend
      analytics.Ews_StudentAttendanceTrendanalytics_qews.StudentAttendanceTrend
      analytics.Ews_StudentEnrolledSectionGradeanalytics_qews.StudentEnrolledSectionGrade
      analytics.Ews_StudentEnrolledSectionGradeTrendanalytics_qews.StudentEnrolledSectionGradeTrend
      analytics.Ews_StudentIndicatorsanalytics_qews.StudentIndicators
      analytics.Ews_StudentIndicatorsByGradingPeriodanalytics_qews.StudentIndicatorsByGradingPeriod
      analytics.Ews_UserSchoolAuthorizationanalytics_qews.UserSchoolAuthorization
      analytics.GradingPeriodDimensionanalytics_core.GradingPeriodDimension
      analytics.LocalEducationAgencyDimensionanalytics_core.LocalEducationAgencyDimension
      analytics.MostRecentGradingPeriodanalytics_core.MostRecentGradingPeriod
      analytics.ProgramTypeDimensionanalytics_program.ProgramTypeDimension
      analytics.SchoolDimensionanalytics_core.SchoolDimension
      analytics.SchoolNetworkAssociationDimensionanalytics_core.SchoolNetworkAssociationDimension
      analytics.StudentDataAuthorizationanalytics_rls.StudentDataAuthorization
      analytics.StudentDimensionanalytics_core.StudentDimension
      analytics.StudentEarlyWarningFactanalytics_ews.StudentEarlyWarningFact
      analytics.StudentProgramEventanalytics_program.StudentProgramEvent
      analytics.StudentProgramFactanalytics_program.StudentProgramFact
      analytics.StudentSectionDimensionanalytics_core.StudentSectionDimension
      analytics.StudentSectionGradeFactanalytics_ews.StudentSectionGradeFact
      analytics.UserAuthorizationanalytics_rls.UserAuthorization
      analytics.UserDimensionanalytics_rls.UserDimension
      analytics.UserStudentDataAuthorizationanalytics_rls.UserStudentDataAuthorization
    2. Option 2: keep everything in a single schema, ensuring unique names, so that downstream data models (without namespaces/schemas) do not need to name their models differently than the views. Put use case name as object name prefix.

      v1 Namev2 Name
      analytics.​ContactPersonDimensionanalytics.​ContactPersonDim
      analytics.DateDimensionanalytics.DateDim
      analytics.Ews_SchoolRiskTrendanalytics.qews_SchoolRiskTrend
      analytics.Ews_StudentAttendanceTrendanalytics.qews_StudentAttendanceTrend
      analytics.Ews_StudentEnrolledSectionGradeanalytics.qews_StudentEnrolledSectionGrade
      analytics.Ews_StudentEnrolledSectionGradeTrendanalytics.qews_StudentEnrolledSectionGradeTrend
      analytics.Ews_StudentIndicatorsanalytics.qews_StudentIndicators
      analytics.Ews_StudentIndicatorsByGradingPeriodanalytics.qews_StudentIndicatorsByGradingPeriod
      analytics.Ews_UserSchoolAuthorizationanalytics.qews_UserSchoolAuthorization
      analytics.GradingPeriodDimensionanalytics.GradingPeriodDim
      analytics.LocalEducationAgencyDimensionanalytics.LocalEducationAgencyDim
      analytics.MostRecentGradingPeriodanalytics.MostRecentGradingPeriod
      analytics.ProgramTypeDimensionanalytics.program_ProgramTypeDimension
      analytics.SchoolDimensionanalytics.SchoolDim
      analytics.SchoolNetworkAssociationDimensionanalytics.SchoolNetworkAssociationDim
      analytics.StudentDataAuthorizationanalytics.rls_StudentDataAuthorization
      analytics.StudentDimensionanalytics.StudentDim
      analytics.StudentEarlyWarningFactanalytics.ews_StudentEarlyWarningFact
      analytics.StudentProgramEventanalytics.program_StudentProgramEvent
      analytics.StudentProgramFactanalytics.program_StudentProgramFact
      analytics.StudentSectionDimensionanalytics.StudentSectionDim
      analytics.StudentSectionGradeFactanalytics.ews_StudentSectionGradeFact
      analytics.UserAuthorizationanalytics.rls_UserAuthorization
      analytics.UserDimensionanalytics.rls_UserDim
      analytics.UserStudentDataAuthorizationanalytics.rls_UserStudentDataAuthorization
    3. Option 3: keep everything in single schema and don't force prefixing for use cases. Just have clear and unique names for views. Prefix on case-by-case basis.

      Leaning toward option (b). Additional benefit: helps the reader know where to look up additional information about use-case specific views, such as important usage notes.

Going with option (b).  Stephen Fuqua

Key Summary Status
Loading...
Refresh

Additional Views

Will not add any new views in the 2.0 release. New views can be added with 2.1, 2.2 etc. This 2.0 release is all about fixing architectural problems and setting the stage for broader adoption.

Documentation

End-Users

Decisions made in defining the Ed-Fi data model are allowing a great deal of flexibility in storing data, at the expense of un-intuitive complexity. Users of the Analytics Middle Tier need to know about the complexities in order to use this tool effectively. For example, if adopting option (b) to solve the Program view problem, there needs to be clear guidance to help the end-user.

Users also need to be made aware of potential data quality issues, for example with the Student Demographics. If a student is enrolled in two schools at a time, and they don't both enter the same demographic information (e.g. one accidentally clicks on the wrong gender, or one does not mark student as Hispanic/Latino), then how will the data analyst know and reconcile this? The Ed-Fi Alliance cannot prescribe an answer: it depends on the implementation.

For the (rare?) case that the console deployment tool does not work, provide guidance on directly accessing the views from the source code repository. Warn that scripts, when manually executed, need to be run in numeric order of file name, starting with the Core collection first and then installing other collections as needed.

Other issues will likely arise, so that end-user documentation will be an ongoing exercise.

Contributors

Documentation for contributors to the project will need to spell out how to contribute; how to create use-cases; naming conventions; when and how to place a new view into the Core collection.

Version 2 versus Version 3 support.

  • No labels

10 Comments

  1. Received via Slack from Brian Mullins :

    1. Drop back to Dim.
    2. yes, StudentDimension should be about the student, not student enrollments. We need an additional StudentEnrollmentDim. As for Programs, we have desire to know the school of the program association.
    3. Yes, please add the SchoolYear columns to the views. Mostly agree with Stephens choices for which to include it in. I'd think DataDimension would be one to include it in, but it would be just for convenience.
    4. Absolutely separate the individual use-case views.
    1. In followup, I confirmed that Brian's installation does have StudentProgramAssociations where the EducationOrganizationId is for a School, rather than a LocalEducationAgency. This may necessitate a change to the Program views. Will analyze and update this document if needed.

  2. Received via email from Rafael Loustaunau:

    1. They are fine as they are
    2. Yes -Preferably -eventhough its crunched data you may have students coming and going multiple times during one school year 
    3. Yes. Crunching data by school year ok, so same view two lines, 1 for each student-school-year ok
    4. Yes if it means placing use-case-specific data items like at-risk indicators for Texas  vs core for all users nationwide.
  3. Received via Slack from Zackary Geers:

    Re #1. I'm not using it, so I don't care. I think your point of limited adoption is a weighty factor, Re #2, I think the Dims should be relatively flat. I think the StudentEnrollment use case is much higher then the Student use case, though you probably have a better view of the problem. All the reports I run are nearly always based on the StudentEnrollment, not the student itself.

    I don't use the analytics tools, but I'm talking about my general reporting use cases.

    Re naming, if you're trying to keep under 63 chars, I'd trim your prefix/schema name a bunch as well. I'd rather have an abbreviated schema name vs having to really trim table names. I'd keep the Dim on the end though, it makes it easier to discuss it. When I say "look at the student table", it's going to be more clear I'm talking about the edfi.Student table, vs when I'm talking about "look at studentdim".
    And in general, I'm pro adding school year, it's a good idea, even if the current use cases are only the current school year, it'll give us the flexibility to store other years in the future.

  4. Stephen Fuqua

    I have a few comments and questions.  I'll try to break them into sections in this comment and provide explanation where I think it may be useful.


    Questions

    • Are the 'views' discussed in the document actual database 'view' objects or would these be tables that can be indexed?
      • If these are 'views' how well does the performance scale across the number of students being served by an organization?
    • Why are the student demographics stored on an enrollment dimension instead of being attributed to the student directly?
    • How are changing dimensions handled in the model?  For example, the proposed student dimension includes data about a contact person that can change/update independently of any attribute of the student changing and doesn't seem to provide a way to map student addresses into the same dimension.
    • How does the proposed StudentSectionGradeFact handle standards-based grading values?  
    • Why is it not feasible to have multiple date dimensions that better reflect some of the common use cases in schools (e.g., calendar dates, school dates, fiscal dates, etc...)?  What about using a snowflake for the date dimension that would provide mappings for individual schools to calendar dates, etc?  
      • What implications does this have for simpler queries like the demographics of a school on a given date?

    Comments

    • If the intent is to build the middle tier using database views (e.g., view objects), it seems like it could pose some performance constraints that would limit use and adoption among larger organizations.
      • This is based on the assumption that the views would be stored in a separate schema which would prevent indexing in SQL server instances.
      • The other assumption I am making is that arbitrary queries against the views will incur a non-trivial cost in the query processor/engine to solve the joins and likely wouldn't be cached for the long-term.
    • In the naming conventions, I think solution B (e.g., dropping) the dimension suffix makes the most sense and would likely improve clarity if anyone is thinking about developing something like a GraphQL API to query the data; basically the dimensional tables would map more directly to entities defined in a GraphQL API.
    • It seems like it would provide better performance to decode descriptor values and populate appropriate fields in the dimension tables with those strings.  If that is done, I would also suggest including an INTEGER (or SMALLINT) field that can be provided to users working with analytic software.
      • Basically the INTEGER (or SMALLINT) and STRING representation in the same table would make it possible to define the hashtable that maps the integer values to the text labels since any statistical analysis would require transforming the strings into some type of numeric representation anyway.
    • For the query about programs, the query was ill formed.  My assumption is that this would return the correct value and more clearly expresses the desired business rules:

      SELECT ProgramTypeDimension.ProgramType,
          -- If the goal is to count the number of individual students enrolled, it is better to make it explicit
          COUNT(DISTINCT StudentEnrollmentDimension.StudentKey) as EnrolledCount,
          SUM(CASE WHEN StudentEnrollmentDimension.LimitedEnglishProficiency <> 'Not applicable' THEN 1 ELSE 0 END) AS LimitedEnglishProficiencyCount
      FROM analytics.ProgramTypeDimension
      INNER JOIN analytics.StudentProgramEvent WITH (NOLOCK) ON ProgramTypeDimension.ProgramTypeKey = StudentProgramEvent.ProgramTypeKey
      INNER JOIN analytics.StudentEnrollmentDimension WITH (NOLOCK) ON StudentProgramEvent.StudentKey = StudentEnrollmentDimension.StudentKey
      WHERE StudentProgramEvent.ProgramEventType = 'Enter'
      GROUP BY ProgramTypeDimension.ProgramType;
    • It seems like the MostRecentGradingPeriod 'view' could be replaced fairly easily using a bit column in a grading period view that would indicate whether or not that record belongs to the most recent grading period and could be included in an index to improve performance for queries related to the most recent grading period.
    • I would recommend against creating multiple schema related to small segments of the data model in favor of a single schema for all analytic tasks with a strong/opinionated naming convention that can be adopted by end users wishing to add additional views/tables in their own instances.
    1. Billy - thank you for the great & detailed response. Meta comment: the view designs are very much dependent on specific needs and use cases identified so far; only the ProgramFact and ProgramEvent views were designed without real-world requirements (as examples).

      Q1: views, not tables. See Analytics Middle Tier Enhancements for more background info on this. For a visualization tool, the views should be materialized whether as SQL tables or in another tool, e.g. Analysis Services or Tableau.

      Q2: some of the views definitely do not scale well, hence the recommendation not to use direct queries against the transactional ODS. For more information, see Limiting Impact on the Production ODS.

      Q3: student demographics are attached to the school enrollment in Data Standard 3. My understanding is this is because two different schools in the same district might record the demographics differently. When student moves from one school to the other, previously the second school's demographic record would overwrite the first school's. Now they are both retained. Please see Eric Jansson for more detailed discussion if needed.

      Q4: the existing use case and discussions have not required any planning for slowly changing dimensions, therefore I have not even analyzed the question.

      Q5: I do not know what "standards-based grading values" means.

      Q6: Where would the multiple date dimensions come from, and what purpose would they solve? I'm open to that, but need requirements that would drive creation of those multiple dimensions. Right now the DateDimension simply gives us various string representations of a date. It handles all of the dates in the CalendarDateCalendarEvent table, ignoring the SchoolId because it simply isn't relevant in any of the views or use cases so far.

      Q7: In the current design, you would not be able to get student demographics for a school by historical date - because StudentEnrollmentDim would only include currently enrolled students, a query for anything other than "today" would be missing any students who were enrolled at that time but have since exited. This system as currently designed definitely does not fulfill temporal requirements. This is a very interesting point though and worth exploring in the future. We could consider adding a variant on the StudentEnrollment dimension that would include exit date on it and thus have historical records. At first glance I'm not inclined to change the existing view but also not completely ruling it out yet.

      Responding to comments...

      • Most of the views could be indexed in SQL Server. Haven't spent time evaluating that though. Would probably want to install that as an optional component. There is an option called "Indexes" right now that installs additional indexes in a few edfi.xyz tables. I should change that option name to something else, leaving room to have an option for installing indexes on views. Indexes on views would not be part of release 2.0 but good option to consider for future releases.
      • No decision yet on suffixing "Dimension", "Dim", or nothing - although I'm personally favoring no suffix, and you present a good argument for that: consistency in downstream models, whether GraphQL or Tabular Data Model, etc.
      • FYI, the development team is planning on doing a development spike on putting GraphQL over the Analytics Middle Tier.
      • Can you provide a concrete exmaple of the int/smallint discussion? I'm not following the problem or the solution, to be honest.
      • The program query was definitely quick-and-dirty to try to illustrate a problem. The query you have provided does not account for a student being dual-enrolled and thus having two records, which is the reason I used the outer apply with "select distinct".
      • MostRecentGradingPeriod view was created for the Early Warning System use case in Power BI or QuickSight and is very useful as is. Could consider adding a column to GradingPeriod if useful to someone.
      • Segmenting by schemas is definitely not required... it does not allow you to have the same name in different schemas, but if you want to have multiple schemas in the same data model, then this would get very confusing. I will think about an alternate proposal following your suggestion.
      1. Hi Stephen Fuqua,


        Thanks for the quick response.  I'll try to add follow up where appropriate/necessary below.


        Q1.  Since SQL Server requires the views to be schema bound in order to add indexes to the views, how would the view be "materialized"?  I don't think PostgreSQL has this restriction and know that Oracle doesn't have the restriction for generating a materialized view.  That said, even BI tools will query the source system (e.g., Tableau, SAP BO, Qlik, etc....), which means executing queries against a view that can get expensive fairly quickly.  

        Q2.  If the recommendation is to not query these views directly, what is the value added for this effort?  Basically, if the organization would still need to spin up a new server to host a different representation of the data, is the analytics middle tier serving the purpose of analytic needs (e.g., denormalized data structure that requires less complex queries to retrieve data used for analytic/research needs).

        Q3. Wow....I would think it would be better to resolve the inconsistency in the data pushed into the system instead of promulgating a true and false version of the data simultaneously.  

        Q4. Would it be fair to say that this means the analytic middle tier should only represent the current state and nothing else?

        Q5. Some grading is based on rubrics that have text labels which are neither numeric nor bi-gram based.  For example, the grading scale might look like 'Struggling', 'Approaching Standard', 'Meets Standard', 'Exceeds Standard'.  Integers can be mapped to the text labels, but because the scale is ordinal in nature there are no defined arithmetic operations for the integer representation (e.g., if 'Struggling' = 1 and 'Approaching Standard' = 2, what would 1 + 2 equal that would correctly describe the sum of the values?).

        Q6. Most SEAs and LEAs operate either on a 01JUL fiscal year or follow the federal govt fiscal year of 01OCT.  Calendar dates all begin on 01JAN.  Academic dates, however, can vary by district and school, and sometimes programs within schools.  Depending on the underlying question, it may be necessary to be able to represent the data based on dates related to any of those calendars.  The fiscal date and school date keys can be defined using calendar dates by adding a few columns to the existing table.  This can get a bit bulky at the SEA level, or at the LEA level if there are a lot of schools, but partitioning the table and using reasonable indexes would take care of any potential performance issues while retaining a greater amount of flexibility.


        1. Q1: I hadn't yet thought through implications for schema binding. Having schema-bound views could be a barrier to in-place version update of a database (which is not common in Ed-Fi world, from what I hear). Ultimately,  the Reporting & Visualization Work Group may want to investigate a standardized data mart that starts from the Analytics Middle Tier views and then materializes & optimizes them on a secondary server. Yes, BI tools will query directly - but I encourage people to use the built-in mechanism to schedule full system cache instead of every end-user request generating a query against the ODS.

          Q2: Using on a different server is not ideal but simply a matter of practicality - one cannot optimize a single database for both transactions and reporting. Even without the analytics middle tier, I would not recommend that people run analytics directly off of a transactional ODS because of the potential for writing bad queries that can escalate row and table locks.

          Q3: Resolving the inconsistency is really important. But it is not something that Analytics Middle Tier can solve. Either the implementation, or the Ed-Fi Data Standard, needs to resolve that inconsistency.

          Q4: the analytics middle tier as designed so far represents current state. There is nothing to keep us from evolving toward historical data, once we define requirements for such.

          Q5: one of the limitations I have is access to only one old sample data set. There is a LetterGradeTranslation table that is used by the grade table to create an equivalency for analytics, not sure if that would help in this case though. Would like to hear specific ideas on how we could represent that rubric you describe, whether in the current views or in new use-case specific views.

          Q6: I don't see Fiscal Year in the ODS. I can see the value of having that data, but unless I'm overlooking something (entirely possible!), it is simply not available for use in the Analytics Middle Tier. That said, Analytics Middle Tier is already providing some extra configuration, like the aforementioned LetterGradeTranslation. If we want to provide views with fiscal dates in the future, we could add a FiscalYear configuration table. We can explore that for version 2.1 or beyond, which will include new views and optionally could include new columns on existing views.

          Generally speaking the reporting & visualization work group will have an opportunity to significantly shape the additions made after version 2.0. Right now I want to solve this current set of architectural challenges, which includes some breaking changes, and get the product out to the community soon.

  5. Brief notes from conversation with Jean-Francois Guertin:

    • Prefer StudentDim, or DimStudent
    • Keeping data standard 2 support good for letting users create multi-year data marts
    • Investigate performance impact of hashing long concatenated natural keys
    • "Enrollment" implies a fact/event. Consider not having a "student demographics" or "student enrollment" dimension (at least in the current design), and instead just have a fact / event.
    • School Year helps with ETL into a data mart. For data standard 2, could have an analytics_config table for storing the database's year, and then add that to the DateDimension and GradingPeriod views, allowing the support of School Year in both 2.x and 3.x databases.
    • Like the "namespace separation" of schemas.
  6. Ralph Kimball says:

    "In the real world, there are many compelling reasons to build the FK-PK pairs as surrogate keys that are just sequentially assigned integers. It's a major mistake to build data warehouse keys out of the natural keys that come from the underlying operational data sources." (The Kimball Group Reader, ch 6).

    The Analytics Middle Tier is using the natural keys to create surrogate keys. This is driven by the fact that the Analytics Middle Tier is really just a bunch of views on top of multiple tables. Which underlying table's auto-increment integer key would be used? Still, worth thinking about this a little bit more. Some of the key values in the Analytics Middle Tier are ridiculously large because they include text.

    Might add this as a new section.