Improving ODS Query Performance With Indexes

Overview

The Analytics Middle Tier views — especially fact-type views — are complex and contain many conditions in the JOIN and WHERE clauses. The ODS database is optimized for high performance on REST API operations, necessarily striking a balance between write and read performance. Thus there are performance-enhancing indexes on natural keys, which are used in some GET requests, but not much else. Adding other indexes can significantly boost performance of ad hoc queries and permanent views such as those created by the Analytics Middle Tier, but they come with a cost: higher disk space utilization and slower write performance.

Optional Indexes

In addition to the optional AMT Collections, there exists an option to install extra indexes that significantly improve the performance of some of the AMT queries. Install is simple: just provide the "indexes" keyword on the options flag (-o  or --options):

.\EdFi.AnalyticsMiddleTier.Console.exe -c "..." -o indexes

This installs the following indexes:

Table

On Columns

Include Columns

​edfi.Grade

StudentUSI, SchoolId, LocalCourseCode, SchoolYear, SectionIdentifier, SessionName

NumericGradeEarned​

edfi.StudentSectionAssociation

SchoolId, LocalCourseCode, SchoolYear, SessionName

EndDate, LastModifiedDate

edfi.AcademicSubjectType
(Data Standard v2.2 only)

CodeValue

(none)

On the Glendale sample database, the Grade  table storage allocation grows by approximately 9% with the addition of this index: 75 MB additional space for sample grade data covering 16,724 students. This is not a significant additional burden, and may not cause a noticeable performance impact. The impact for a large district or a state, however, could be problematic for write performance to the Grade table. The storage allocation for StudentSectionAssociation  similarly grows by about 10% in Glendale, with approximately 30 MB dedicated to the new index. The v2.2-only AcademicSubjectType index has a negligible footprint.

Query for index size
SELECT 
    tn.[name] AS [Table name], 
    ix.[name] AS [Index name],
    SUM(sz.[used_page_count]) * 8 AS [Index size (KB)]
FROM
    sys.dm_db_partition_stats AS sz
INNER JOIN 
    sys.indexes AS ix
ON
    sz.[object_id] = ix.[object_id] 
AND
    sz.[index_id] = ix.[index_id]
INNER JOIN
    sys.tables tn
ON
    tn.OBJECT_ID = ix.object_id
GROUP BY
    tn.[name],
    ix.[name]
ORDER BY 
    3 desc

Adding New Indexes

If you have a performance problem on one or more views, one of the patterns for limiting impact on the ODS is to copy the data to a reporting server via replication or log shipping. This reporting database is a better place for adding the optional indexes collection. Once you have created some representative queries against the Analytics Middle Tier views, either directly or through a BI/reporting tool, you may find that the performance warrants another look at additional indexes. Don't just blindly create those "missing" indexes! is a good reference article on how to detect potential indexes and decide on whether to implement them.

Alternatively, if the views all return within minutes (where microseconds to seconds is the objective), then you may benefit from streaming the view contents out to new tables on another database, where you can then optimize a concrete table instead of optimizing a view.

LastModifiedDate

Finally, a special word about LastModifiedDate: this column is used on many different tables for display only; it is not part of the WHERE clause or other selection criteria in the Analytics Middle Tier views. However, if you are regularly querying for changed records based on this column, your query time will benefit greatly from adding an index on this column to the key tables used by the Core View Collection and/or other view collections.

Contents