Meadowlark Application Architecture

Introduction

Meadowlark 0.2.0 has a significantly different architecture than Meadowlark 0.1.0. A primary goal of Meadowlark 0.2.0 is to support running under multiple frontend services and using multiple backend datastores, which requires a new pluggable architecture. Please note that as a research project, the Meadowlark architecture is subject to change as specific design choices are refined and tested.

Meadowlark Internal Architecture

The Meadowlark internal architecture is broken up broadly into three parts: Service plugins for web framework implementations, Meadowlark Core itself, and Backend plugins for document-oriented datastores. Meadowlark Core is further composed of a pluggable middleware pipeline, and provides in-process event notifications to Backend plugins that register as Listeners.

Meadowlark Services

A Meadowlark service is the frontend piece of Meadowlark that accepts API requests and returns API responses. Services are expected to be thin wrappers that translate and forward API client requests to Meadowlark Core and translate Meadowlark Core responses back into API client responses.

Frontend requests and responses are standardized by Meadowlark Core, and the interface includes standard CRUD operations and ad-hoc query support.



Starting a Service is what launches a Meadowlark instance and determines where that instance is hosted and executed. A Service is also where Meadowlark Core picks up configuration information for backend plugin loading.
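
To make the Service boundary concrete, here is a minimal sketch of how such a standardized request/response interface could be typed. All of the names below (FrontendRequest, FrontendResponse, handleWebRequest, and the action list) are illustrative assumptions, not the actual Meadowlark Core API.

```ts
// Illustrative types only; not the actual Meadowlark Core interface.
type CrudAction = 'upsert' | 'getById' | 'update' | 'deleteById' | 'query';

interface FrontendRequest {
  action: CrudAction;
  resourcePath: string;            // e.g. '/ed-fi/academicWeeks'
  body: string | null;             // raw JSON body, if any
  headers: Record<string, string>; // includes the Authorization header carrying the JWT
}

interface FrontendResponse {
  statusCode: number;
  body: string;
  headers?: Record<string, string>;
}

// A Service is a thin adapter: translate the web framework's request into a
// FrontendRequest, hand it to Core, and translate the result back.
async function handleWebRequest(
  core: { handle(request: FrontendRequest): Promise<FrontendResponse> },
  request: FrontendRequest,
): Promise<FrontendResponse> {
  return core.handle(request);
}
```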

Examples of Services include:

  • API Gateway + AWS Lambda
  • Fastify web framework
  • Google Cloud Functions*
  • Azure Functions*

* Not yet implemented as of Milestone 0.2.0.

Meadowlark Core

Meadowlark Core is a JavaScript library that provides an implementation of the ODS/API based on a given Ed-Fi data standard project as defined in the MetaEd language.

At its heart, Meadowlark Core is a lightweight validation engine. Its behavior is essentially* the same regardless of resource endpoint (* see info box below). On startup, Meadowlark Core loads and caches a MetaEd project and uses that project to validate an API request from a Service. If the request is valid according to the shape of the MetaEd project, it forwards the request to a Meadowlark backend to store/retrieve an API document from a document-oriented datastore. 

Core validation proceeds in several steps. The first is resource endpoint validation: Meadowlark Core determines whether the resource name is valid. The next is document shape validation: Meadowlark Core understands the schema for each API resource and validates the document body if one is part of the request.

The final two parts of validation are in some ways deferred to the backend datastore. The first is security. In Meadowlark 0.2.0, the only security available is ownership-based security. Ownership is determined by a JSON Web Token (JWT) from the incoming API request. This requires Meadowlark Core to query the backend for authorization to access the specific document.

The second part of validation deferred to the backend is reference validation. This is a fundamental part of Meadowlark, and indeed a fundamental part of the ODS/API’s behavior. This validation must be fully deferred to the datastore because reference validation requires that referenced documents are transactionally checked for existence.

"Essentially" the Same

The description above applies to all resources. However, a few resources also need additional code to match the behavior of the legacy ODS/API. For example, the Ed-Fi Data Standard defines SchoolYear as a resource, but the ODS/API treats this as SchoolYearType. There are also additional processing rules for handling sub-class validation (i.e. checking that a reference to educationOrganizationId = 5 points to a real Education Organization, which is stored as a School, Local Education Organization, etc.). These functional differences are planned for implementation in Milestone 0.3.0.

Middleware

Internally, Meadowlark is designed as a pipeline that takes an incoming request and processes it all the way through to a response. The Meadowlark pipeline design takes its cue from the middleware designs found in web frameworks like Ruby on Rails and Express (Node.js). Middleware cleanly separates out the different stages of processing a request. A middleware component can pass a request to the next middleware component, optionally enriching the request with additional information, or it can halt processing and return a response to the frontend. Middleware components are also a possible future plugin point for third-party Meadowlark extensions.

Example middleware pipeline for POST and PUT requests:

  1. Parse Request - Headers, body, JWT
  2. Resource Validation - Is this a valid resource endpoint according to the MetaEd model?
  3. Document Validation - Is this a valid document for this resource according to the MetaEd model?
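
As a rough illustration of this design, the sketch below shows a minimal middleware chain in TypeScript. The model shape, the isKnownResource lookup, and the pipeline runner are simplified assumptions, not the real Meadowlark internals.

```ts
// Minimal middleware-pipeline sketch; names are illustrative assumptions.
interface MiddlewareModel {
  frontendRequest: { path: string; body: string | null };
  frontendResponse: { statusCode: number; body: string } | null;
}

type Middleware = (model: MiddlewareModel) => Promise<MiddlewareModel>;

// Stand-in for a lookup against the cached MetaEd model.
function isKnownResource(path: string): boolean {
  return path.startsWith('/ed-fi/');
}

const parseRequest: Middleware = async (model) => {
  // Parse headers, body, and JWT; attach the results to the model.
  return model;
};

const validateResource: Middleware = async (model) => {
  if (!isKnownResource(model.frontendRequest.path)) {
    // Halt the pipeline by setting a response.
    return { ...model, frontendResponse: { statusCode: 404, body: '' } };
  }
  return model;
};

async function runPipeline(
  model: MiddlewareModel,
  middlewares: Middleware[],
): Promise<MiddlewareModel> {
  for (const middleware of middlewares) {
    model = await middleware(model);
    if (model.frontendResponse !== null) break; // a middleware short-circuited
  }
  return model;
}
```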

Security

Meadowlark 0.1.0 prototyped a limited version of EducationOrganization-based security. In 0.2.0, that has been replaced by ownership-based security. The client name is taken from the JWT and passed to the Backend plugin.

CRUD and Query Handlers

After a request for a CRUD operation or a query passes through the middleware pipeline, it is forwarded to a specific handler for that operation. For example, a POST request goes to the Upsert handler. These handlers forward the request to the configured Backend plugin to further process the request. These handlers also provide pre- and post-event notifications to Listener plugins.

Upserts

Consistent with the Ed-Fi API Guidelines (/wiki/spaces/EFAPIGUIDE/pages/24281157), a POST request with identity information matching a previously stored resource will be treated as if it were a PUT request or update; hence, the code provides combined "update or insert" (upsert) functionality.

Event Listeners

Backend plugins can also take on the role of a Listener plugin. These plugins receive in-process event notifications on actions taken by the CRUD and Query handlers. For example, a “search engine” plugin might listen to an “AfterUpsert” datastore event. If the event indicates a successful document insert, then the plugin might send a corresponding insert request to a search engine for indexing.
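
A minimal sketch of this listener pattern might look like the following. The notification types and registration function are hypothetical; the real Meadowlark listener API may differ.

```ts
// Hypothetical notification types and registration function.
interface UpsertResult {
  documentId: string;
  document: object;
  success: boolean;
}

type AfterUpsertListener = (result: UpsertResult) => Promise<void>;

const afterUpsertListeners: AfterUpsertListener[] = [];

// Backend plugins acting as Listeners register callbacks at startup.
function onAfterUpsert(listener: AfterUpsertListener): void {
  afterUpsertListeners.push(listener);
}

// Example listener: mirror successful inserts into a search engine index.
onAfterUpsert(async (result) => {
  if (result.success) {
    await indexInSearchEngine(result.documentId, result.document);
  }
});

// Stand-in for an OpenSearch (or similar) indexing call.
async function indexInSearchEngine(id: string, document: object): Promise<void> {
  // send an index request keyed by document id
}
```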



Change Data Capture and "AfterUpsert"

The Milestone 0.2.0 release will use this "AfterUpsert" process to send data out to OpenSearch. The longer-term vision is to use Change Data Capture to send data to Kafka and thence to OpenSearch, so that the API application is only writing to a single data store.

Resource/Document Identifiers

Unlike the generated resource ids of the ODS/API, Meadowlark resource ids are derived directly from the resource name (e.g. AcademicWeek) and the parts of the document that make up its identity (e.g. schoolId and weekIdentifier). By design, this provides a uniqueness guarantee for documents similar to a “natural key” in relational database terminology. It also means that the references a document makes to other documents, which are made up of these same identity elements, are directly convertible to these identifiers.

A document's resource name and identity elements are combined into a string, a hash value is calculated (SHA3-224), and the resulting value is Base64-encoded for string storage. Unlike an auto-generated numeric identifier or a random UUID, this kind of document id is deterministic: the same input will always result in the same value. Thus anyone implementing the ToString() → Hash → Encode logic can generate and predict the correct id value for an individual resource.
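
Assuming Node's built-in crypto module (which supports SHA3-224), a sketch of the derivation could look like this. Only the hash-and-encode steps are specified above; the string-assembly format shown here is an assumption for illustration.

```ts
import { createHash } from 'node:crypto';

// Deterministic document id: ToString -> SHA3-224 -> Base64.
// The exact string-assembly format is an assumption, not the real one.
function documentIdFor(resourceName: string, identity: Record<string, string>): string {
  const identityString = Object.keys(identity)
    .sort() // stable ordering so the same identity always yields the same string
    .map((key) => `${key}=${identity[key]}`)
    .join('#');
  return createHash('sha3-224')
    .update(`${resourceName}#${identityString}`)
    .digest('base64');
}

// The same input always yields the same id:
documentIdFor('AcademicWeek', { schoolId: '123', weekIdentifier: '2022-W01' });
```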

Meadowlark Backend Plugins

Meadowlark backend plugins provide reference validation and persistence to document-oriented datastores. They can also take on the role of a query handler, which provides ad-hoc query support. Backend plugins are JavaScript packages that export functions according to a standard interface. These functions include typical CRUD operations, security checks, and optional ad-hoc query support.
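
The sketch below suggests what such an exported interface might look like in TypeScript; the names and request/result shapes are illustrative assumptions, not the actual plugin contract.

```ts
// Illustrative plugin contract; names and shapes are assumptions.
interface UpsertRequest {
  documentId: string;
  documentType: string;            // e.g. 'AcademicWeek'
  document: object;                // the API document to store
  outboundReferenceIds: string[];  // ids of documents this one references
}

interface BackendPlugin {
  upsertDocument(request: UpsertRequest): Promise<{ success: boolean; failureMessage?: string }>;
  getDocumentById(documentId: string): Promise<{ document: object | null }>;
  deleteDocumentById(documentId: string): Promise<{ success: boolean; failureMessage?: string }>;
  // Optional: implemented only by plugins that can serve ad-hoc queries.
  queryDocuments?(query: object): Promise<{ documents: object[] }>;
}
```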

Backend Datastores

Backend datastores provide upsert, update, delete, and read-by-id semantics. Meadowlark is designed to be used with schemaless datastores, typically document-oriented ones. Wide-column datastores, as well as relational databases with document store features, are also appropriate. Importantly, the datastore must have transaction support that allows for a mix of reads and writes.

For external integrations, it is important for the datastore to have a change data capture feature (sometimes called change streams). Ad-hoc document query support can be useful but is not required. Note that Meadowlark is designed so that join semantics are unnecessary.

Document Storage Schema

API documents are stored with the document id as the key and the API document as the value. Additional metadata is stored as appropriate for the datastore (e.g. additional columns, additional document nodes). This metadata includes such things as the document type (e.g. AcademicWeek) and the owner (for ownership-based security).

Metadata on the references one document has to others is also stored as a series of document ids. This information is used to support validation on delete, and is usually stored separately from the API document. No joining is necessary, but this metadata needs to be indexed.

Note that this design requires a maximum of two tables in relational database terminology, or their non-relational equivalent (e.g. two collections in MongoDB, two item types in DynamoDB).
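
As an illustration, the two stored shapes described above might be typed as follows; the field names here are assumptions, not the actual schema.

```ts
// One record per API document, plus indexed reference metadata.
interface StoredDocument {
  documentId: string;              // key: deterministic hash of type + identity
  documentType: string;            // e.g. 'AcademicWeek'
  ownerId: string;                 // for ownership-based security
  apiDocument: object;             // the API document itself, stored as-is
}

interface StoredReferences {
  documentId: string;              // the referencing document
  outboundReferenceIds: string[];  // ids of the documents it references (indexed)
}
```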

Reference Validation

A backend plugin provides reference validation as a part of upsert, update and delete behaviors. These validations are performed as part of a datastore transaction that culminates in the requested datastore change.

Because document ids can be derived directly from an API document identity (document type + identity elements), a simple lookup by document id can check for document existence and uniqueness. Because references in a given API document include identity elements, the document ids of those referenced documents can be derived.

Upsert and update actions perform reference validation by transactionally checking each reference id to confirm existence in the datastore before making the change.

Delete actions perform reference validation in reverse, by first checking for the existence of any documents (by id) referencing the document to be deleted. This is done via lookup on the indexed references metadata.
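
A sketch of both directions of reference validation, against a hypothetical transaction-scoped datastore client, might look like this:

```ts
// Hypothetical transaction-scoped client; method names are assumptions.
interface TransactionClient {
  exists(documentId: string): Promise<boolean>;
  referencedBy(documentId: string): Promise<string[]>; // lookup on indexed reference metadata
}

// Upsert/update direction: every referenced document must already exist.
async function validateUpsertReferences(
  tx: TransactionClient,
  referenceIds: string[],
): Promise<string[]> {
  const missing: string[] = [];
  for (const id of referenceIds) {
    if (!(await tx.exists(id))) missing.push(id);
  }
  return missing; // non-empty => fail the transaction with a validation error
}

// Delete direction: refuse the delete if any document still references this one.
async function validateDelete(tx: TransactionClient, documentId: string): Promise<string[]> {
  return tx.referencedBy(documentId);
}
```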



Query Handlers

Query handlers are Meadowlark backend plugins that can accept ad-hoc query requests. This plugin role is separate from the datastore plugin role because many schemaless datastores are not able to provide ad-hoc query support efficiently, if at all. In such cases, Meadowlark can be configured to forward these queries to a datastore that supports ad-hoc queries (e.g. a search engine) and is synchronized with the main datastore. 

Security

Security checks need to be deferred to the backend plugin. This takes the form of a security middleware provided by the backend, which checks CRUD requests for authorization. Query security is handled at the time of the query, as the client may have authorization for only some of the query results.
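
A minimal sketch of such an ownership check, again with a hypothetical transaction client, could look like this; real authorization logic would be richer.

```ts
// Hypothetical ownership check performed by backend-provided security middleware.
async function checkOwnership(
  tx: { getOwner(documentId: string): Promise<string | null> },
  documentId: string,
  clientName: string, // taken from the JWT by Meadowlark Core
): Promise<'authorized' | 'forbidden' | 'not-found'> {
  const owner = await tx.getOwner(documentId);
  if (owner === null) return 'not-found';
  return owner === clientName ? 'authorized' : 'forbidden';
}
```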

Meadowlark External Architecture

Meadowlark Event Streaming

Also see: Meadowlark - Durable Change Data Capture

Meadowlark is designed to fit within a larger system architecture by providing easy integration via event streaming. The Meadowlark event stream takes every change to the datastore and formats it as an immutable event. These events have a shape almost identical to that of the API document storage schema, and are enabled by the change data capture features of a datastore.

Change Data Capture

Modern datastores provide a feature known as change data capture (or change streams) that provides events when a data element changes in the datastore. Because Meadowlark only requires a single table (or collection, item type, etc.) for API document storage, a change listener for a datastore can be very simple. On any API document change, it bundles the document plus its metadata into a Meadowlark event and sends it to an event streaming platform. For Meadowlark 0.2.0, we are investigating Kafka as that platform.
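
For example, with a MongoDB backend, a change listener could be built on the driver's change streams. The sketch below assumes a local connection string and illustrative database, collection, and event-producer names.

```ts
import { MongoClient } from 'mongodb';

// Turn MongoDB document changes into Meadowlark events.
async function watchDocuments(sendEvent: (event: object) => Promise<void>): Promise<void> {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const documents = client.db('meadowlark').collection('documents');

  // 'updateLookup' includes the post-change document in each change event.
  const changeStream = documents.watch([], { fullDocument: 'updateLookup' });
  for await (const change of changeStream) {
    // Bundle the document plus metadata into an immutable Meadowlark event.
    const fullDocument = 'fullDocument' in change ? change.fullDocument ?? null : null;
    await sendEvent({ operation: change.operationType, document: fullDocument });
  }
}
```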

Apache Kafka

Kafka is an open-source, append-only event store that allows for sophisticated event processing. It is quite popular, with managed offerings available from cloud providers and connectors for a wide variety of event sources and sinks. However, it also has a reputation for being complex to manage, which is part of what the Meadowlark investigation will evaluate.

Kafka messages are published to topics; for Meadowlark 0.2.0, all messages are published to the same topic. Kafka messages also have keys, which act like primary keys, and Meadowlark uses the document id as the message key. This fits well with Kafka features like log compaction, where Kafka can discard obsolete changes for a Meadowlark document (e.g. create + delete = no event) and provide an event-based view of the document datastore.
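
Using the kafkajs client as an example (the broker address and topic name here are assumptions), publishing with the document id as the message key might look like this:

```ts
import { Kafka } from 'kafkajs';

const kafka = new Kafka({ clientId: 'meadowlark', brokers: ['localhost:9092'] });
const producer = kafka.producer();

async function publishDocumentEvent(documentId: string, event: object): Promise<void> {
  await producer.connect(); // resolves immediately if already connected
  await producer.send({
    topic: 'meadowlark-documents',
    // Keying by document id lets log compaction collapse obsolete changes
    // for the same document.
    messages: [{ key: documentId, value: JSON.stringify(event) }],
  });
}
```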

Meadowlark Code Organization

Meadowlark is organized as a monorepo with npm packages in three directories:

Backends

The location for backend plugins. This includes datastore plugins for MongoDB and PostgreSQL, a query handler plugin for OpenSearch, and the 0.1.0 DynamoDB code in plugin form.

Services

This is for the frontend plugins, including one for AWS API Gateway + Lambda and one for the Fastify web framework.

Packages

This is the location of the Meadowlark library packages, including Meadowlark Core itself. It also includes a MetaEd plugin that creates the metadata Meadowlark uses for document schema validation and for identity and reference extraction.