Delivering a unified workflow with full traceability for data scientists

Data scientists often reuse previously built components to accelerate the development of AI applications. The main roadblock to reuse, however, is how difficult it can be to find, share, and integrate prebuilt components into a current project.

Many of these reusable components are features. In the world of machine learning, a feature is a data input used to train models. For example, in a wind turbine predictive maintenance model, the variance of the power output and the average of the power output could be useful features. Typically, features are derived by transforming the source data available for a problem. These transformations may include value mappings, value scaling, and aggregations.
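To make the three transformation types concrete, here is a minimal sketch in plain Pandas. The data, column names, and status codes are all hypothetical and chosen only for illustration; real source data would look different.

```python
import pandas as pd

# Hypothetical raw readings for two turbines; all column names are illustrative.
raw = pd.DataFrame({
    "turbine_id": ["T1", "T1", "T1", "T2", "T2", "T2"],
    "status": ["ok", "ok", "fault", "ok", "ok", "ok"],
    "power_kw": [100.0, 110.0, 90.0, 200.0, 220.0, 210.0],
})

# Value mapping: encode a categorical status as a number.
raw["status_code"] = raw["status"].map({"ok": 0, "fault": 1})

# Value scaling: min-max scale power output to [0, 1] across all readings.
p_min, p_max = raw["power_kw"].min(), raw["power_kw"].max()
raw["power_scaled"] = (raw["power_kw"] - p_min) / (p_max - p_min)

# Aggregation: per-turbine mean and (sample) variance of power output.
features = raw.groupby("turbine_id")["power_kw"].agg(
    power_mean="mean", power_var="var"
).reset_index()
```

The resulting `features` frame has one row per turbine, which is the shape a model would typically consume for training.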

During the development process, data scientists turn to feature stores, which act as centralized hubs for feature recipes, metadata, and pre-computed feature data and make features easier to discover and reuse.

This blog post provides an overview of the common functionalities of feature stores and highlights the unique benefits of the C3 AI Feature Store, such as its end-to-end data lineage and ability to speed up time-to-production for machine learning projects.

A feature store is a centralized repository of materialized (pre-computed) feature data. It provides three main functions:

  1. The sharing and discoverability of features across teams.
  2. The reuse of named features in both training and prediction/inference contexts.
  3. A point-in-time view of multiple features (e.g., the most recent data value for each feature, as of a specific point in time).
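The point-in-time view in item 3 can be sketched with `pandas.merge_asof`, which matches each query timestamp to the most recent earlier row. This is only an illustration of the idea in plain Pandas, not how any particular feature store implements it; the feature names and timestamps are made up.

```python
import pandas as pd

# Hypothetical feature values, each stamped with the time it became available.
power_mean = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 06:00", "2024-01-01 12:00"]),
    "turbine_id": ["T1", "T1", "T1"],
    "power_mean": [100.0, 105.0, 98.0],
})
vibration = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-01 03:00", "2024-01-01 09:00"]),
    "turbine_id": ["T1", "T1"],
    "vibration_rms": [0.2, 0.5],
})

# Points in time at which we want a consistent view of both features.
asof = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-01 07:00", "2024-01-01 13:00"]),
    "turbine_id": ["T1", "T1"],
})

# For each query row, merge_asof picks the latest feature row with ts <= query ts
# (inputs must be sorted by the "on" column, which they are here).
view = pd.merge_asof(asof, power_mean, on="ts", by="turbine_id")
view = pd.merge_asof(view, vibration, on="ts", by="turbine_id")
```

Because each lookup only sees values recorded at or before the query time, a training set built this way does not leak future information into past examples.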

Overall, the feature store provides a higher level of abstraction for both data scientists and MLOps teams to train, optimize, and monitor ML models. However, existing feature stores suffer from several limitations:

  1. The actual feature values are computed outside of the store using data engineering tools like Apache Spark. As a result, feature code must be completely rewritten when transitioning from a data science proof-of-concept to production.
  2. Individual features may be aggregations computed at different intervals, and combining them directly can yield incorrect results.
  3. As a standalone component, feature stores are unable to track where individual data records originated and where they are used.
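The second limitation is easy to reproduce. The sketch below uses made-up hourly and daily series in plain Pandas to show how a direct combination goes wrong and how resampling to a common interval fixes it; the feature names are hypothetical.

```python
import pandas as pd

# Hypothetical features at different intervals: hourly energy and daily fault counts.
hours = pd.date_range("2024-01-01", periods=48, freq="h")
energy_hourly = pd.Series(10.0, index=hours, name="energy_kwh")  # 10 kWh every hour
days = pd.date_range("2024-01-01", periods=2, freq="D")
faults_daily = pd.Series([2, 4], index=days, name="faults")

# Naive combination: aligning on raw timestamps pairs each daily count with only
# the midnight hour, so the ratio is off by a factor of 24.
naive = (energy_hourly / faults_daily).dropna()

# Resample the hourly feature to the daily interval first, then combine.
energy_daily = energy_hourly.resample("D").sum()
energy_per_fault = energy_daily / faults_daily
```

Here `naive` reports 5.0 and 2.5 kWh per fault, while the correctly resampled `energy_per_fault` reports 120.0 and 60.0.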

 

The C3 AI Feature Store

The C3 AI Feature Store, available on the C3 AI Platform, has unique benefits that accelerate time-to-production for ML projects.

By design, the C3 AI Feature Store tightly integrates with the upstream and downstream steps of a machine learning workflow. This is achieved by maintaining metadata about data sources, transformations, storage, and models trained using the feature store. This metadata then drives the process of efficiently computing and storing features from both experimental data and production data.

Many tasks related to using a feature store in production can be automated with this approach: the transition from experimentation to production, the recomputation of features when new data arrives, the resampling of time series features, the aggregation of composite features, and end-to-end data lineage.
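To picture how metadata alone can support lineage and recomputation, consider a toy dependency graph in plain Python. This is a generic sketch, not the C3 AI implementation: the `Node` class, the `upstream` and `downstream` helpers, and every node name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A data source, feature, or model in a hypothetical lineage graph."""
    name: str
    kind: str  # "source", "feature", or "model"
    inputs: list = field(default_factory=list)  # names of upstream nodes


def upstream(graph, name):
    """Return every node that `name` depends on, directly or indirectly."""
    seen = set()
    stack = list(graph[name].inputs)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current].inputs)
    return seen


def downstream(graph, name):
    """Return every node that would need refreshing when `name` changes."""
    return {n for n in graph if name in upstream(graph, n)}


nodes = [
    Node("scada_readings", "source"),
    Node("power_mean", "feature", ["scada_readings"]),
    Node("power_var", "feature", ["scada_readings"]),
    Node("failure_model", "model", ["power_mean", "power_var"]),
]
graph = {n.name: n for n in nodes}

model_lineage = upstream(graph, "failure_model")    # where the model's data came from
to_recompute = downstream(graph, "scada_readings")  # what new source data affects
```

With the dependencies recorded, both questions ("where did this come from?" and "what must be recomputed?") reduce to graph traversals, which is why they lend themselves to automation.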

 

Fast Transition from Experimentation to Production

C3 AI supports authoring features in two ways: with C3 AI’s time series expression language or with arbitrary Python functions (e.g., calling Pandas APIs). Features created with either approach are ready to be used in a production application.

Data scientists can perform exploration and feature engineering using their preferred interface. The C3 AI Platform then provides the necessary metadata to enable using these features in production. A simple example of this workflow is illustrated below.

First, the data scientist authors their features using Pandas:
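As a rough sketch of what Pandas-authored features could look like, the function below computes a rolling mean and variance of power output per turbine. It uses plain Pandas only; how such a function is registered with the C3 AI Platform is not shown, and the function name, column names, and window size are assumptions made for illustration.

```python
import pandas as pd


def power_output_features(readings: pd.DataFrame, window: int = 3) -> pd.DataFrame:
    """Compute rolling mean and variance of power output per turbine.

    `readings` is assumed to have columns turbine_id, ts, and power_kw
    (hypothetical names) and a unique row index.
    """
    readings = readings.sort_values(["turbine_id", "ts"])
    rolling = readings.groupby("turbine_id")["power_kw"].rolling(
        window, min_periods=window
    )
    out = readings[["turbine_id", "ts"]].copy()
    # droplevel(0) removes the group key so results align with the original rows.
    out["power_mean"] = rolling.mean().droplevel(0)
    out["power_var"] = rolling.var().droplevel(0)
    return out


# Hypothetical hourly readings for a single turbine.
readings = pd.DataFrame({
    "turbine_id": ["T1"] * 4,
    "ts": pd.date_range("2024-01-01", periods=4, freq="h"),
    "power_kw": [100.0, 110.0, 90.0, 100.0],
})
feats = power_output_features(readings)
```

The first two rows of `feats` are empty because a full window of three readings is not yet available; from the third row on, each turbine has a mean and variance feature per timestamp.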

 

Once features are defined, they are viewable in the C3 AI Studio.

These features are now included when the application is deployed to production.

 

Putting the C3 AI Feature Store to Use

The C3 AI Feature Store is a key component in the development of all C3 AI applications and shortens the path to production. It has helped build a diverse set of applications that large enterprises use daily for predictive maintenance, fraud detection, and supply chain optimization.

In the next blog post on the C3 AI Feature Store, we’ll continue to explore how you can use it to author features and dive into the full user experience.

 


Learn more about the C3 AI Platform and the C3 AI Feature Store.
