Feature Stores for ML

A forum for the international community of users and developers
of Feature Store platforms for machine learning.

What is a feature store?

The short version: a data management layer for machine learning that lets teams share and discover features and build more effective machine learning pipelines.
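To make the definition concrete, here is a minimal in-memory sketch of the core ideas: a registry that makes features discoverable, an offline store that keeps the full history of values for building training data, and an online store that keeps only the newest value per entity for serving. The class and method names (`FeatureStore`, `register`, `discover`, `ingest`, `get_online`) are hypothetical and do not come from any product listed on this page.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FeatureStore:
    """Hypothetical in-memory feature store: a registry plus offline and online stores."""
    registry: dict[str, str] = field(default_factory=dict)  # feature name -> description
    offline: list[tuple[str, str, int, Any]] = field(default_factory=list)  # full history
    online: dict[tuple[str, str], tuple[int, Any]] = field(default_factory=dict)  # latest only

    def register(self, name: str, description: str) -> None:
        # Registering a feature makes it discoverable and reusable by other teams.
        self.registry[name] = description

    def discover(self, keyword: str) -> list[str]:
        # Naive keyword search over feature names and descriptions.
        kw = keyword.lower()
        return sorted(n for n, d in self.registry.items() if kw in n.lower() or kw in d.lower())

    def ingest(self, name: str, entity: str, ts: int, value: Any) -> None:
        if name not in self.registry:
            raise KeyError(f"unregistered feature: {name}")
        # The offline store appends every value so training data can be rebuilt later.
        self.offline.append((name, entity, ts, value))
        # The online store keeps only the newest value per (feature, entity) for fast lookups.
        current = self.online.get((name, entity))
        if current is None or ts >= current[0]:
            self.online[(name, entity)] = (ts, value)

    def get_online(self, name: str, entity: str) -> Any:
        return self.online[(name, entity)][1]


fs = FeatureStore()
fs.register("user_clicks_7d", "Clicks by a user over the last 7 days")
fs.ingest("user_clicks_7d", "u1", ts=100, value=3)
fs.ingest("user_clicks_7d", "u1", ts=200, value=5)
print(fs.get_online("user_clicks_7d", "u1"))  # 5
print(fs.discover("clicks"))  # ['user_clicks_7d']
```

Real feature stores replace the lists and dicts with durable systems (for example, a data warehouse offline and a key-value database online), as the platform descriptions below show.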

EXISTING FEATURE STORES

Hopsworks

The first open-source feature store and the first with a DataFrame API. It supports most batch and streaming data sources, and features can be ingested using SQL, Spark, Python, or Flink. It is the only feature store that supports stream processing for writes. Available as a managed platform and on-premises.

Company: Hopsworks

Vertex AI

A centralized repository for organizing, storing, and serving ML features on the GCP Vertex AI platform. Vertex AI Feature Store supports BigQuery and GCS as data sources. Features are engineered in BigQuery and then loaded by separate ingestion jobs. The offline store is BigQuery and the online store is Bigtable.

Company: Google

SageMaker

SageMaker Feature Store integrates with other AWS services, such as Redshift and S3 as data sources and SageMaker for serving. It has a feature registry UI in SageMaker and Python/SQL APIs. The online store is DynamoDB; the offline store is Parquet files on S3.

Company: Amazon

Databricks

A feature store built around Spark DataFrames. It supports Spark and SQL for feature engineering and has a UI in Databricks. The online store is AWS RDS (MySQL) or Aurora; the offline store is Delta Lake.

Company: Databricks

Zipline

One of the first feature stores, dating from 2018. Features are engineered in a DSL that provides resource-efficient, point-in-time-correct training set backfills, scheduled updates, feature visualizations, and automatic data quality monitoring.
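"Point-in-time correct" means that each training example only sees feature values that were already known when the labeled event happened, so no future information leaks into training. Zipline's DSL is not shown here; the following is a generic standard-library sketch of that join, assuming each entity's feature history is a list of `(timestamp, value)` pairs sorted by timestamp. The function name `point_in_time_join` and the data layout are illustrative assumptions.

```python
from bisect import bisect_right


def point_in_time_join(labels, feature_history):
    """Attach to each labeled event the latest feature value known at that event's time.

    labels:          list of (entity, event_ts, label)
    feature_history: dict mapping entity -> list of (ts, value), sorted by ts
    returns:         list of (entity, event_ts, feature_value, label); feature_value
                     is None if no value existed yet at event_ts
    """
    # Precompute the timestamp column per entity so each lookup is a binary search.
    ts_index = {e: [ts for ts, _ in h] for e, h in feature_history.items()}
    rows = []
    for entity, event_ts, label in labels:
        timestamps = ts_index.get(entity, [])
        # Count of feature values with ts <= event_ts; later values would leak the future.
        i = bisect_right(timestamps, event_ts)
        value = feature_history[entity][i - 1][1] if i > 0 else None
        rows.append((entity, event_ts, value, label))
    return rows


history = {"u1": [(10, 1.0), (20, 2.0), (30, 3.0)]}
labels = [("u1", 5, 0), ("u1", 20, 1), ("u1", 25, 0)]
print(point_in_time_join(labels, history))
# [('u1', 5, None, 0), ('u1', 20, 2.0, 1), ('u1', 25, 2.0, 0)]
```

A naive join on the entity key alone would attach the latest value (3.0) to every row, including events that happened before that value existed. Whether a value stamped exactly at the event time counts as "known" is a design choice; this sketch treats it as known.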

Company: Airbnb

Michelangelo

The mother of feature stores. Michelangelo is an end-to-end ML platform, and Palette is its feature store. Features are defined in a DSL that is translated into Spark and Flink jobs. The online store is Redis/Cassandra; the offline store is Hive.

Company: Uber

Bigabid

A centralized feature store where different data professionals across the company can each create and manage canonical features. This allows data scientists to add features they’ve built into a shared feature store where it is easy to consume both online and offline.

Company: Bigabid

FBLearner

Facebook's internal end-to-end ML platform, which includes a feature store. It provides innovative functionality, such as automatic generation of UI experiences from pipeline definitions and automatic parallelization of Python code.

Company: Facebook

Overton

Internal end-to-end ML platform at Apple. It automates the life cycle of model construction, deployment, and monitoring by providing a set of novel high-level, declarative abstractions. It has been used in production to support multiple applications in both near-real-time applications and back-of-house processing.

Company: Apple

Featureform

Featureform is a virtual feature store platform: you plug in your own offline and online data stores. It supports Flink, Snowflake, Airflow, Kafka, and other frameworks.
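A "virtual" feature store owns no storage itself; it orchestrates whichever offline and online stores you plug in. The sketch below illustrates that idea with Python `typing.Protocol` interfaces and two toy adapters. It is not Featureform's API: `OfflineStore`, `OnlineStore`, `ListOffline`, `DictOnline`, `VirtualFeatureStore`, `materialize`, and `serve` are all hypothetical names, and the in-memory adapters merely stand in for systems such as a warehouse or a key-value store.

```python
from typing import Optional, Protocol


class OfflineStore(Protocol):
    def read_all(self, feature: str) -> list[tuple[str, int, float]]: ...


class OnlineStore(Protocol):
    def put(self, feature: str, entity: str, value: float) -> None: ...
    def get(self, feature: str, entity: str) -> Optional[float]: ...


class ListOffline:
    """Stand-in for a warehouse table of (entity, ts, value) rows (hypothetical adapter)."""

    def __init__(self, rows: dict[str, list[tuple[str, int, float]]]) -> None:
        self.rows = rows

    def read_all(self, feature: str) -> list[tuple[str, int, float]]:
        return self.rows.get(feature, [])


class DictOnline:
    """Stand-in for a key-value store (hypothetical adapter)."""

    def __init__(self) -> None:
        self.data: dict[tuple[str, str], float] = {}

    def put(self, feature: str, entity: str, value: float) -> None:
        self.data[(feature, entity)] = value

    def get(self, feature: str, entity: str) -> Optional[float]:
        return self.data.get((feature, entity))


class VirtualFeatureStore:
    """Owns no storage; it orchestrates whichever stores are plugged in."""

    def __init__(self, offline: OfflineStore, online: OnlineStore) -> None:
        self.offline = offline
        self.online = online

    def materialize(self, feature: str) -> None:
        # Find the newest value per entity in the offline store ...
        latest: dict[str, tuple[int, float]] = {}
        for entity, ts, value in self.offline.read_all(feature):
            if entity not in latest or ts >= latest[entity][0]:
                latest[entity] = (ts, value)
        # ... and copy it into the online store for low-latency serving.
        for entity, (_, value) in latest.items():
            self.online.put(feature, entity, value)

    def serve(self, feature: str, entity: str) -> Optional[float]:
        return self.online.get(feature, entity)


offline = ListOffline({"avg_order_value": [("u1", 1, 10.0), ("u1", 2, 12.5), ("u2", 1, 7.0)]})
vfs = VirtualFeatureStore(offline, DictOnline())
vfs.materialize("avg_order_value")
print(vfs.serve("avg_order_value", "u1"))  # 12.5
```

Because `Protocol` uses structural typing, any adapter with matching methods can be plugged in without inheriting from a base class, which is the property a pluggable design relies on.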

Company: Featureform

Twitter

Twitter's first feature store was a set of shared feature libraries and metadata. Since then, Twitter has built its own feature store by customizing Feast for GCP.

Company: Twitter

Feast

Originally developed as an open-source feature store by Go-JEK, Feast has been taken on by Tecton as a minimal, configurable feature store. You can plug in different online and offline data stores, and it can run on any platform. Feature engineering is done outside of Feast.

Company: Linux Foundation

Tecton

Tecton.ai is a managed feature store that uses PySpark or SQL (Databricks or EMR) or Snowflake to compute features and DynamoDB to serve online features. It provides a Python-based DSL for orchestration and feature transformations that are computed as a PySpark job. Available on AWS.

Company: Tecton

Scribble Enrich

The platform handles the complexity of computation and data semantics by providing a Python SDK to develop, document, and test feature engineering modules (transforms, pipelines, scheduling, etc.) and controlled execution on the server side.

Company: ScribbleData

Jukebox

Spotify built their own ML platform that leverages TensorFlow Extended (TFX) and Kubeflow. They focus on designing and analyzing their ML experiments instead of building and maintaining their own infrastructure, resulting in faster time from prototyping to production.

Company: Spotify

Intuit

Intuit has built a feature store as part of its data science platform. It was developed for AWS and uses S3 and DynamoDB as its offline and online feature serving layers.

Company: Intuit

Iguazio

A centralized, versioned feature store built around Iguazio's open-source MLRun MLOps orchestration framework for ML model management. It uses V3IO as its offline and online feature store.

Company: Iguazio

Kaskada

The company that first solved temporal streaming joins, enabling predictive models to run on event-based data, powered by cutting-edge data infrastructure.

Company: Kaskada

Rasgo

A feature store for preparing, understanding, and deploying features using cloud data warehouses (Snowflake). It can be accessed either through a web app, for browsing your features and creating modeling datasets, or through a Python package, for publishing and deploying your features.

Company: Rasgo

Abacus.ai

The platform lets you build real-time machine learning and deep learning features, upload IPython notebooks, monitor model drift, and set up CI/CD for machine learning systems.

Company: Abacus.ai

FeatureBase

A database developed to support machine-scale analytics and data science workflows. The core format allows for granular scans at a feature-by-feature level rather than a columnar or tabular data format.

Company: Molecula

Doordash

An ML platform with an effective online prediction ecosystem. It serves traffic for a large number of ML models, including ensemble models, through the Sibyl Prediction Service. The team extended Redis with sharding and compression to work as its online feature store.
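The description does not say how the sharding or compression was implemented, so the following is only a generic illustration of the two techniques: keys are routed to shards with a stable hash, and each entity's features are stored as one compressed blob. Plain Python dicts stand in for Redis nodes, no Redis client is used, and the class `ShardedCompressedKV` is a hypothetical name.

```python
import json
import zlib
from typing import Optional


class ShardedCompressedKV:
    """Hypothetical stand-in for a sharded Redis cluster: each dict plays one node."""

    def __init__(self, num_shards: int) -> None:
        self.shards: list[dict[str, bytes]] = [{} for _ in range(num_shards)]

    def _shard_for(self, key: str) -> dict[str, bytes]:
        # CRC32 is stable across processes, unlike Python's salted built-in hash() for str.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def put(self, entity: str, features: dict[str, float]) -> None:
        # Store all features of one entity as a single compressed JSON blob.
        payload = json.dumps(features, sort_keys=True).encode("utf-8")
        self._shard_for(entity)[entity] = zlib.compress(payload)

    def get(self, entity: str) -> Optional[dict[str, float]]:
        blob = self._shard_for(entity).get(entity)
        return None if blob is None else json.loads(zlib.decompress(blob))


kv = ShardedCompressedKV(num_shards=4)
kv.put("store_42", {"avg_prep_time": 12.5, "orders_1h": 30.0})
print(kv.get("store_42"))  # {'avg_prep_time': 12.5, 'orders_1h': 30.0}
```

Sharding spreads memory and request load across nodes, while compression trades some CPU for memory; for tiny payloads like this one, compression may not save space, so it mainly pays off for larger feature vectors.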

Company: Doordash

Salesforce

ML Lake is a shared service that provides the right data, optimizes the right access patterns, and alleviates the machine learning application developer from having to manage data pipelines, storage, security and compliance. Built on an early version of Feast based around Spark.

Company: Salesforce

H2O AI Hybrid Cloud

H2O.ai and AT&T co-created the H2O AI Feature Store to store, update, and share the features data scientists, developers, and engineers need to build AI models.

Company: H2O.ai and AT&T

Continual

Continual is a SQL-centric feature store that maintains the relationships between all features of an entity and performs the joins when the entity is needed. It is available in open beta.

Company: Continual

Metarank

A real-time feature engineering framework and feature store with YAML support for defining features for online shopping recommendation use cases. It builds on Apache Flink and, optionally, Redis. A low-code learning-to-rank online recommendation engine.

Company: Findify

Feathr

Feathr automatically computes your feature values and joins them to your training data, using point-in-time-correct semantics to avoid data leakage, and supports materializing and deploying your features for use online in production.

Company: Microsoft / LinkedIn

FEATURE STORE COMPARISON

Platform | Open Source | Offline | Online | Write Amplification | Supported Platforms | Training API | Training Data
Hopsworks | AGPL-V3 | Hudi/Hive and pluggable | RonDB | No | AWS, GCP, On-Prem | Spark | DataFrame (Spark or Pandas), files (.csv, .tfrecord, etc.)
Michelangelo | No | Hive | Cassandra | None | Proprietary | Spark | DataFrame (Pandas)
Zipline | No | Hive | Unknown KV Store | None | Proprietary | Spark | Streamed to models?
Twitter | No | GCS | Manhattan, Cockroach | Yes (ingestion jobs) | Proprietary | BigQuery | DataFrame (Pandas)
Iguazio | No | Parquet | V3IO, proprietary DB | Unknown | AWS, Azure, GCP, on-prem | No details | DataFrame (Pandas)
Databricks | No | Delta Lake | MySQL or Aurora | Unknown | Unknown | Spark | Spark DataFrames
Kaskada | No | Pluggable | Redis | No | AWS, Azure, GCP | Proprietary | DataFrame (Pandas)
Rasgo | No | Snowflake | None | No | AWS, Azure, GCP | Snowflake | DataFrame (Pandas)
SageMaker | No | S3, Parquet | DynamoDB | Yes (ingestion jobs) | AWS | Aurora | DataFrame (Pandas)
Featureform | No | Pluggable | Pluggable | Unknown | AWS, Azure, GCP | Unknown | Unknown
Jukebox | No | BigQuery | Bigtable | Yes (ingestion jobs) | Proprietary | BigQuery | DataFrame (Pandas)
Doordash | No | Snowflake | Redis | Unknown | Proprietary | Snowflake | DataFrame (Pandas)
Salesforce | No | S3, Iceberg | DynamoDB | Yes (ingestion jobs) | Proprietary | Iceberg | DataFrame (Pandas)
Intuit | No | S3 | GraphQL API, unknown backend | Unknown | Proprietary | Unknown | DataFrame (Pandas)
OLX | No | Kafka | Kafka | No | Proprietary | ksqlDB | From feature logging
Continual | No | Snowflake | Coming soon | No | Snowflake, more coming | Snowflake | Proprietary
Metarank | Yes | N/A | Redis | No | Open-Source | XGBoost, LightGBM | CSV files?
Scribble Enrich | No | Pluggable | Pluggable | No | AWS, Azure, GCP, on-prem | No details | DataFrame (Pandas)


Feature Ingestion API: What APIs and languages are supported for writing features to the feature store?
Write Amplification: Do you write your features more than once, e.g., write them to stable storage first and then run a separate job to ingest them?
Training API (PIT Join Engine): When you create training data from reusable features, you need to join the feature values together. What compute engine performs this point-in-time JOIN?
Training Data: How is the training data made available to machine learning frameworks: as DataFrames or as files?
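The write amplification question can be made concrete by counting row writes in two ingestion designs. In this hypothetical sketch (`Store`, `direct_ingest`, `staged_ingest`, and `writes_per_row` are illustrative names, not any vendor's API), the direct design writes each row once into the offline store and once into the online store, while the staged design first lands the rows in stable storage and then has a separate job write them again.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical storage target that counts every row written to it."""
    name: str
    writes: int = 0
    rows: list = field(default_factory=list)

    def write(self, batch: list) -> None:
        self.writes += len(batch)
        self.rows.extend(batch)

    def drain(self) -> list:
        # Read and clear everything staged so far (used by the ingestion job).
        batch, self.rows = self.rows, []
        return batch


def direct_ingest(batch, offline, online):
    # The feature pipeline writes each row straight into the offline and online stores.
    offline.write(batch)
    online.write(batch)


def staged_ingest(batch, staging, offline, online):
    # Step 1: the pipeline lands the rows in stable storage (e.g. files on object storage).
    staging.write(batch)
    # Step 2: a separate ingestion job reads the staged rows and writes them again.
    staged = staging.drain()
    offline.write(staged)
    online.write(staged)


def writes_per_row(stores, n_rows):
    return sum(s.writes for s in stores) / n_rows


batch = [{"user": i, "clicks_7d": i % 5} for i in range(1000)]

off_a, on_a = Store("offline"), Store("online")
direct_ingest(batch, off_a, on_a)
print(writes_per_row([off_a, on_a], len(batch)))  # 2.0

stg, off_b, on_b = Store("staging"), Store("offline"), Store("online")
staged_ingest(batch, stg, off_b, on_b)
print(writes_per_row([stg, off_b, on_b], len(batch)))  # 3.0
```

The staged design costs one extra write (and one extra read) per row, plus the latency of waiting for the separate ingestion job before features become available.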

Build or Buy

Developing software can be extremely costly and time-consuming, so reusing existing systems is often a reasonable solution. Even so, the number of companies building their own feature store is on the rise.

Learn more about the industry's conundrum by watching the relevant panel discussion from the latest Feature Store Summit.
