
Assets & Files

Assets are pointers to data resources registered with Ellf. They can refer to files, directories and machine learning models on your cluster, as well as other sources like data from an API call or a custom remote database. In addition to assets, you can upload and manage Python packages on your cluster and create datasets used to store data and annotations. In the web application, the Cluster → Assets page shows you an overview of the assets available on your cluster.

Use Ellf to manage files and resources

If you’ve connected Ellf to your coding assistant, it will be able to create and manage assets for you. You can also use the in-app chat and reference resources via @, for example to start a task using a data source, train from a dataset or assign an agent to a running task.

Asset types

Assets cover all common data types like input files and resources, match patterns, annotated and raw datasets, models, vectors, and Python packages, including built-in and custom recipes. You can also implement your own custom asset types.

Data and data files

Naturally, data is one of the most important parts of many workflows and you often want to start by adding your input data to the app. The hybrid cloud architecture of Ellf means that all of your data stays private and on the data processing cluster hosted by you, and our servers only store a record of the data, called an “asset”.

raw_documents.jsonl (input) · April 15, 2026 at 5:46 PM
company_names.jsonl (patterns) · April 15, 2026 at 5:46 PM

Assets can point to pretty much anything that’s loadable – commonly, this includes files and directories of files, but it can also include remote resources from a database or an API. See the section on custom assets for details on how to implement your own dataclasses for custom asset types and associated logic like loading.
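As a rough illustration of the idea, a custom asset type could be a dataclass bundling the information needed to locate a resource with the logic to load it. Everything below is hypothetical: the class name, fields and load() method are invented for this sketch and are not taken from Ellf's SDK.

```python
from dataclasses import dataclass

# Hypothetical sketch of a custom asset type. The class name, fields and
# load() method are invented for illustration; Ellf's real SDK may differ.
@dataclass
class RemoteTableAsset:
    name: str   # asset name shown in the UI
    table: str  # table to load from the remote database
    query: str  # query used to fetch the rows

    def load(self) -> list[dict]:
        # A real implementation would connect to the database and run
        # self.query; this placeholder just echoes the configuration.
        return [{"table": self.table, "query": self.query}]
```

An asset defined like this could then be registered and loaded like any built-in type.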

Datasets

Datasets are named collections of data and annotations in Prodigy’s JSON format, stored in the database on your cluster. When you start an annotation task, you typically provide the name of a new or existing dataset to save the collected annotations to. Datasets can then be used for training and evaluating models and performing other analysis. The datasets command lets you manage your datasets.

legal-contracts-entities (dataset) · April 15, 2026 at 5:46 PM

Exporting datasets

The datasets export command lets you export examples from an existing dataset to a file, e.g. to inspect it locally or to integrate the data into a larger automation pipeline. Datasets are stored in Prodigy’s JSONL (newline-delimited JSON) format. Also see the docs on Prodigy’s annotation interfaces for more details on the data format and structure they produce.

Exporting a dataset
$ ellf datasets export my_dataset --output ./my_dataset.jsonl
my_dataset.jsonl
{"text":"Uber\u2019s Lesson: Silicon Valley\u2019s Start-Up Machine Needs Fixing","meta":{"source":"The New York Times"},"_input_hash":1886699658,"_task_hash":-1952856502,"tokens":[{"text":"Uber","start":0,"end":4,"id":0},{"text":"\u2019s","start":4,"end":6,"id":1},{"text":"Lesson","start":7,"end":13,"id":2},{"text":":","start":13,"end":14,"id":3},{"text":"Silicon","start":15,"end":22,"id":4},{"text":"Valley","start":23,"end":29,"id":5},{"text":"\u2019s","start":29,"end":31,"id":6},{"text":"Start","start":32,"end":37,"id":7},{"text":"-","start":37,"end":38,"id":8},{"text":"Up","start":38,"end":40,"id":9},{"text":"Machine","start":41,"end":48,"id":10},{"text":"Needs","start":49,"end":54,"id":11},{"text":"Fixing","start":55,"end":61,"id":12}],"_session_id":null,"_view_id":"ner_manual","spans":[{"start":0,"end":4,"token_start":0,"token_end":0,"label":"ORG"},{"start":15,"end":29,"token_start":4,"token_end":5,"label":"LOCATION"}],"answer":"accept"}
{"text":"Pearl Automation, Founded by Apple Veterans, Shuts Down","meta":{"source":"The New York Times"},"_input_hash":1487477437,"_task_hash":-1298236362,"tokens":[{"text":"Pearl","start":0,"end":5,"id":0},{"text":"Automation","start":6,"end":16,"id":1},{"text":",","start":16,"end":17,"id":2},{"text":"Founded","start":18,"end":25,"id":3},{"text":"by","start":26,"end":28,"id":4},{"text":"Apple","start":29,"end":34,"id":5},{"text":"Veterans","start":35,"end":43,"id":6},{"text":",","start":43,"end":44,"id":7},{"text":"Shuts","start":45,"end":50,"id":8},{"text":"Down","start":51,"end":55,"id":9}],"_session_id":null,"_view_id":"ner_manual","spans":[{"start":0,"end":16,"token_start":0,"token_end":1,"label":"ORG"},{"start":29,"end":34,"token_start":5,"token_end":5,"label":"ORG"}],"answer":"accept"}
{"text":"How Silicon Valley Pushed Coding Into American Classrooms","meta":{"source":"The New York Times"},"_input_hash":1842734674,"_task_hash":636683182,"tokens":[{"text":"How","start":0,"end":3,"id":0},{"text":"Silicon","start":4,"end":11,"id":1},{"text":"Valley","start":12,"end":18,"id":2},{"text":"Pushed","start":19,"end":25,"id":3},{"text":"Coding","start":26,"end":32,"id":4},{"text":"Into","start":33,"end":37,"id":5},{"text":"American","start":38,"end":46,"id":6},{"text":"Classrooms","start":47,"end":57,"id":7}],"_session_id":null,"_view_id":"ner_manual","spans":[{"start":4,"end":18,"token_start":1,"token_end":2,"label":"LOCATION"}],"answer":"accept"}
{"text":"Women in Tech Speak Frankly on Culture of Harassment","meta":{"source":"The New York Times"},"_input_hash":-487516519,"_task_hash":62119900,"tokens":[{"text":"Women","start":0,"end":5,"id":0},{"text":"in","start":6,"end":8,"id":1},{"text":"Tech","start":9,"end":13,"id":2},{"text":"Speak","start":14,"end":19,"id":3},{"text":"Frankly","start":20,"end":27,"id":4},{"text":"on","start":28,"end":30,"id":5},{"text":"Culture","start":31,"end":38,"id":6},{"text":"of","start":39,"end":41,"id":7},{"text":"Harassment","start":42,"end":52,"id":8}],"_session_id":null,"_view_id":"ner_manual","answer":"accept"}

Keep in mind that the data may contain references to files hosted on your cluster, like image or audio file paths. If you want to export all assets and datasets, e.g. for backup purposes, you can use the ellf export command.

Models

Models are a sub-type of assets, since they’re also just collections of files under the hood. They can be required in recipes, produced by actions (e.g. after training), or used by agents to perform specific tasks.

en_core_web_trf (model) · April 15, 2026 at 5:46 PM
en_core_web_lg.vectors (vectors) · April 15, 2026 at 5:46 PM

Models can be uploaded to your cluster using publish data, pointing the destination path to a location on your cluster’s storage bucket. You can also produce models from action recipes – for instance, training a model from a dataset will automatically register the resulting model as a new asset. To use a model in a recipe, annotate the argument with the Model type, which provides a load method that returns the loaded model. For more details and examples, see the recipe development guide.
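The "asset with a load method" pattern can be sketched with a local stand-in class. This is only an illustration of the pattern described above; the real Model type lives in Ellf's recipes SDK and its actual API may differ.

```python
from dataclasses import dataclass
from pathlib import Path

# Stand-in sketch of a Model-typed recipe argument. The real Model type is
# provided by Ellf's recipes SDK; this local class only mimics the idea of
# an asset exposing a load() method.
@dataclass
class Model:
    path: str  # location of the model files, e.g. on the cluster's bucket

    def load(self) -> Path:
        # A real implementation would deserialize the model files here;
        # this placeholder just resolves the path.
        return Path(self.path)

def run(model: Model) -> str:
    loaded = model.load()
    return f"loaded model from {loaded}"
```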

Packages

Recipes often depend on other Python packages, which can include libraries available on PyPI as well as your own private packages. Ellf manages those centrally alongside assets to ensure each worker in the cluster has the correct set of package dependencies available when executing a recipe. Recipes are packaged and uploaded as versioned Python packages, and you can see the recipe version used by a task or action in the app and in the CLI info.

prodigy-recipes (package) · April 15, 2026 at 5:46 PM
Named Entity Recognition (recipe) · April 15, 2026 at 5:46 PM

Publishing data assets

The easiest way to upload and publish data to your cluster is by using the publish data command. This takes care of uploading the files to the storage bucket on your cluster and creates a record of the asset in Ellf so you can use it within the app. The {__bucket__} variable in the destination path is a path alias built in by default, which refers to the cluster’s storage bucket URL.

Publishing a JSONL file
$ ellf publish data ./example.jsonl "{__bucket__}/example/data.jsonl" \
    --name "My first asset" --version 1.0.0 --kind input --loader jsonl

Under the hood, the above command performs the following two steps, which you can also execute separately if it’s useful for your workflow or automation.

  1. Copy the files to the cluster using files cp, creating parent directories if needed.
  2. Create an asset record in Ellf using assets create with the given name, kind and path, and set the loader in the asset’s meta.

The newly created asset will now show up under Cluster → Assets in the UI and in assets list on the CLI. The copied files are also shown when you run files ls on the assets directory.

List assets in Ellf
$ ellf assets list
id                                     name             kind
------------------------------------   --------------   -----
bdfd0a49-0877-49e6-bd37-d481162df351   My first asset   input

List files on the cluster
$ ellf files ls "{__bucket__}" --recurse
{__bucket__}/example/data.jsonl

My first asset (input) · April 15, 2026 at 5:46 PM

Assets can also be created dynamically from Python within recipes, for example to save preprocessed data or import from an internal or external resource like a database. If your data source requires authentication, you can use the built-in secrets feature to securely make your API keys or credentials available to the recipe.


Path aliases

Assets are pointers to files that can be located anywhere, most commonly on your cluster. Under the hood, those files are added to a cloud storage bucket, e.g. an S3 bucket if you’re using AWS. To make it easier to manage file paths on your cluster, Ellf lets you create named path aliases using the paths command. When managing files and creating assets, you can then use the alias as a variable in the path, e.g. {alias}/, so you don’t have to repeat the full URL. Built-in path aliases are wrapped in double underscores, e.g. {__bucket__}, and can also be used within custom path aliases.

Built-in alias   Description
__nfs__          The path to the NFS drive.
__bucket__       The data bucket of the cluster.

For example, you can have an alias {train} that points to {__bucket__}/training. After adding a path, you can then use it in commands like publish and files.

Creating a path
$ ellf paths create train "{__bucket__}/training"

Publishing an asset with a custom path alias
$ ellf publish data ./data.jsonl "{train}/data.jsonl" \
    --name "Corpus" --kind input --loader jsonl
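Conceptually, alias resolution is repeated substitution: custom aliases can reference built-in ones, which are expanded in turn. The sketch below illustrates the idea only; it is not Ellf's implementation, and the alias values are hypothetical.

```python
# Rough sketch of how {alias} expansion could work; not Ellf's actual code.
ALIASES = {
    "__bucket__": "s3://my-cluster-bucket",  # hypothetical bucket URL
    "train": "{__bucket__}/training",        # custom alias built on a built-in one
}

def expand(path: str, aliases: dict) -> str:
    """Repeatedly substitute {alias} variables until none remain."""
    while "{" in path:
        before = path
        for name, value in aliases.items():
            path = path.replace("{" + name + "}", value)
        if path == before:  # unknown alias left over: stop rather than loop forever
            break
    return path
```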


Secrets

Secrets let you securely manage API keys and other credentials across your organization, which you can then select and use in recipes – for example, if you need to connect to a model via an API. Under the hood, secrets are named pointers, and you can view and manage them using the secrets command or under Cluster → Assets → Secrets in the UI. Secret values are stored only on your cluster and are never sent to our servers.

Adding secrets

You can create a secret using secrets create on the CLI or via the UI. If you don’t provide a --value, you’ll be prompted to enter it interactively so it doesn’t appear in your shell history.

Creating a secret
$ ellf secrets create GEMINI_API_KEY
Enter secret value for GEMINI_API_KEY:

Creating a secret with an explicit value
$ ellf secrets create GEMINI_API_KEY --value "sk-..."

Hiding the secret in your shell history

To make sure your secret isn’t saved in your shell history, don’t pass --value and enter the value when prompted instead. You can also prefix the command with a space to keep it out of the history, if your shell is configured with HISTCONTROL=ignorespace in bash or setopt HIST_IGNORE_SPACE in zsh.
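For reference, the corresponding shell configuration lines look like this (add the one matching your shell to its rc file):

```shell
# bash: add to ~/.bashrc -- commands starting with a space are not saved
export HISTCONTROL=ignorespace

# zsh: add to ~/.zshrc -- same behaviour for zsh
setopt HIST_IGNORE_SPACE
```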

Using secrets in recipes

Recipes for tasks, actions and agents can require secrets as their arguments, e.g. to authenticate with an API. In the UI, you can then see a dropdown of the available secrets. On the CLI, you can simply provide the secret name.

Using a secret as an argument
$ellf

agents

create

gemini_agent"Gemini Auto-Labeler Agent"

--model gemini-2.0-flash

--api-key GEMINI_API_KEY

If you’re developing your own recipes, you can require the Secret type for an argument, which allows the user to select a secret and passes an instance of it to your recipe function. Calling the secret’s value() method returns the plain-text string, which you can then pass on to API calls and other libraries that require it.

Using a secret in a recipe
from ellf_recipes_sdk import agent_recipe, Secret

@agent_recipe(title="My Recipe")
def recipe(*, api_key: Secret):
    # Resolve the plain-text value, e.g. to pass to an API client
    gemini_key = api_key.value()
Read next: Command-line interface

from the makers of spaCy and Prodigy

© 2026 Explosion