Assets & Files
Assets are pointers to data resources that are registered with Ellf and can refer to files, directories and machine learning models on your cluster, as well as any other sources like data from an API call or a custom remote database. In addition to assets, you can upload and manage Python packages on your cluster and create datasets used to store data and annotations. In the web application, the ClusterAssets page shows you an overview of assets available on your cluster.
Use Ellf to manage files and resources
If you’ve connected Ellf to your coding assistant, it will be able to create and manage assets for you. You can also use the in-app chat and reference resources via @, for example to start a task using a data source, train from a dataset or assign an agent to a running task.
Asset types
Assets cover all common data types like input files and resources, match patterns, annotated and raw datasets, models, vectors, and Python packages, including built-in and custom recipes. You can also implement your own custom asset types.
Data and data files
Naturally, data is one of the most important parts of many workflows and you often want to start by adding your input data to the app. The hybrid cloud architecture of Ellf means that all of your data stays private and on the data processing cluster hosted by you, and our servers only store a record of the data, called an “asset”.
Assets can point to pretty much anything that’s loadable – commonly, this includes files and directories of files, but it can also include remote resources from a database or an API. See the section on custom assets for details on how to implement your own dataclasses for custom asset types and associated logic like loading.
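As a rough illustration of what such a custom asset type could look like (the class name, fields and loading logic below are invented for this sketch and are not part of the Ellf SDK):

```python
import json
from dataclasses import dataclass


@dataclass
class JSONLAsset:
    """Illustrative custom asset: a pointer to a JSONL file plus loading logic."""

    name: str
    path: str  # location of the file, e.g. "{__bucket__}/example/data.jsonl"

    def load(self) -> list[dict]:
        # Load one JSON object per line (newline-delimited JSON)
        with open(self.path, encoding="utf8") as f:
            return [json.loads(line) for line in f if line.strip()]
```

The same pattern extends to remote resources: the dataclass holds whatever is needed to locate the data, and the load method encapsulates the logic to fetch it.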
Datasets
Datasets are named collections of data and annotations in Prodigy’s JSON format, stored in the database on your cluster. When you start an annotation task, you typically provide the name of a new or existing dataset to save the collected annotations to. Datasets can then be used for training and evaluating models and performing other analysis. The datasets command lets you manage your datasets.
Exporting datasets
The datasets export command lets you export examples from an existing dataset to a file, e.g. to inspect it locally or to integrate the data into a larger automation pipeline. Datasets are stored in Prodigy’s JSONL (newline-delimited JSON) format. Also see the docs on Prodigy’s annotation interfaces for more details on the data format and structure they produce.
{"text":"Uber\u2019s Lesson: Silicon Valley\u2019s Start-Up Machine Needs Fixing","meta":{"source":"The New York Times"},"_input_hash":1886699658,"_task_hash":-1952856502,"tokens":[{"text":"Uber","start":0,"end":4,"id":0},{"text":"\u2019s","start":4,"end":6,"id":1},{"text":"Lesson","start":7,"end":13,"id":2},{"text":":","start":13,"end":14,"id":3},{"text":"Silicon","start":15,"end":22,"id":4},{"text":"Valley","start":23,"end":29,"id":5},{"text":"\u2019s","start":29,"end":31,"id":6},{"text":"Start","start":32,"end":37,"id":7},{"text":"-","start":37,"end":38,"id":8},{"text":"Up","start":38,"end":40,"id":9},{"text":"Machine","start":41,"end":48,"id":10},{"text":"Needs","start":49,"end":54,"id":11},{"text":"Fixing","start":55,"end":61,"id":12}],"_session_id":null,"_view_id":"ner_manual","spans":[{"start":0,"end":4,"token_start":0,"token_end":0,"label":"ORG"},{"start":15,"end":29,"token_start":4,"token_end":5,"label":"LOCATION"}],"answer":"accept"}
{"text":"Pearl Automation, Founded by Apple Veterans, Shuts Down","meta":{"source":"The New York Times"},"_input_hash":1487477437,"_task_hash":-1298236362,"tokens":[{"text":"Pearl","start":0,"end":5,"id":0},{"text":"Automation","start":6,"end":16,"id":1},{"text":",","start":16,"end":17,"id":2},{"text":"Founded","start":18,"end":25,"id":3},{"text":"by","start":26,"end":28,"id":4},{"text":"Apple","start":29,"end":34,"id":5},{"text":"Veterans","start":35,"end":43,"id":6},{"text":",","start":43,"end":44,"id":7},{"text":"Shuts","start":45,"end":50,"id":8},{"text":"Down","start":51,"end":55,"id":9}],"_session_id":null,"_view_id":"ner_manual","spans":[{"start":0,"end":16,"token_start":0,"token_end":1,"label":"ORG"},{"start":29,"end":34,"token_start":5,"token_end":5,"label":"ORG"}],"answer":"accept"}
{"text":"How Silicon Valley Pushed Coding Into American Classrooms","meta":{"source":"The New York Times"},"_input_hash":1842734674,"_task_hash":636683182,"tokens":[{"text":"How","start":0,"end":3,"id":0},{"text":"Silicon","start":4,"end":11,"id":1},{"text":"Valley","start":12,"end":18,"id":2},{"text":"Pushed","start":19,"end":25,"id":3},{"text":"Coding","start":26,"end":32,"id":4},{"text":"Into","start":33,"end":37,"id":5},{"text":"American","start":38,"end":46,"id":6},{"text":"Classrooms","start":47,"end":57,"id":7}],"_session_id":null,"_view_id":"ner_manual","spans":[{"start":4,"end":18,"token_start":1,"token_end":2,"label":"LOCATION"}],"answer":"accept"}
{"text":"Women in Tech Speak Frankly on Culture of Harassment","meta":{"source":"The New York Times"},"_input_hash":-487516519,"_task_hash":62119900,"tokens":[{"text":"Women","start":0,"end":5,"id":0},{"text":"in","start":6,"end":8,"id":1},{"text":"Tech","start":9,"end":13,"id":2},{"text":"Speak","start":14,"end":19,"id":3},{"text":"Frankly","start":20,"end":27,"id":4},{"text":"on","start":28,"end":30,"id":5},{"text":"Culture","start":31,"end":38,"id":6},{"text":"of","start":39,"end":41,"id":7},{"text":"Harassment","start":42,"end":52,"id":8}],"_session_id":null,"_view_id":"ner_manual","answer":"accept"}Keep in mind that the data may contain references to files hosted on your cluster, like image or audio file paths. If you want to export all assets and datasets, e.g. for backup purposes, you can use the ellf export command.
Models
Models are a sub-type of assets, since they’re also just collections of files under the hood. They can be required in recipes, produced by actions (e.g. after training), or used by agents to perform specific tasks.
Models can be uploaded to your cluster using publish data, pointing the destination path to a location on your cluster’s storage bucket. You can also produce models from action recipes – for instance, training a model from a dataset will automatically register the resulting model as a new asset. To use a model in a recipe, annotate the argument with the Model type, which provides a load method that returns the loaded model. For more details and examples, see the recipe development guide.
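To sketch the pattern in plain Python (the Model class below is a simplified stand-in for illustration only, not the real SDK type; see the recipe development guide for the actual API):

```python
from dataclasses import dataclass
from pathlib import Path


# Stand-in for Ellf's Model asset type: under the hood, a model asset is
# a pointer to a collection of files on the cluster.
@dataclass
class Model:
    path: str  # directory containing the model files

    def load(self):
        # A real implementation would deserialize the model here; this
        # sketch just returns the file names the asset points to.
        return sorted(p.name for p in Path(self.path).iterdir())


def evaluate_recipe(*, model: Model):
    # In a recipe, you annotate the argument with the Model type and
    # call .load() to get the loaded model object back.
    return model.load()
```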
Packages
Recipes often depend on various other Python packages, which can include libraries available on PyPI, as well as your own private packages. Ellf manages those centrally alongside assets, to ensure each worker in the cluster has the correct set of package dependencies available when executing a recipe. Recipes are packaged and uploaded as versioned Python packages and you’ll be able to see the recipe version used by a task or action in the app and CLI info.
Publishing data assets
The easiest way to upload and publish data to your cluster is by using the publish data command. This takes care of uploading the files to the storage bucket on your cluster and creates a record of the asset in Ellf so you can use it within the app. The {__bucket__} variable in the destination path is a path alias built in by default, which refers to the cluster’s storage bucket URL.
```shell
$ ellf publish data ./example.jsonl "{__bucket__}/example/data.jsonl" \
    --name "My first asset" --version 1.0.0 --kind input --loader jsonl
```
The newly created asset will now show up under ClusterAssets in the UI and via the assets list command on the CLI. The copied files are also shown when you run files ls on the assets directory.
Assets can also be created dynamically from Python within recipes, for example to save preprocessed data or import from an internal or external resource like a database. If your data source requires authentication, you can use the built-in secrets feature to securely make your API keys or credentials available to the recipe.
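As an illustrative sketch of the first half of that flow (the table and file names are invented, and the call that registers the file as an Ellf asset is omitted, since its exact API depends on the recipe SDK), a recipe might dump rows from an internal database to a JSONL file like this:

```python
import json
import sqlite3


def export_table_to_jsonl(conn: sqlite3.Connection, table: str, out_path: str) -> int:
    """Dump each row of a table as one JSON object per line; returns the row count."""
    conn.row_factory = sqlite3.Row  # make rows behave like mappings
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    with open(out_path, "w", encoding="utf8") as f:
        for row in rows:
            f.write(json.dumps(dict(row)) + "\n")
    return len(rows)
```

The resulting file could then be registered as a new asset from within the recipe, so it shows up alongside your other data in the app.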
Path aliases
Assets are pointers to files that can be located anywhere, most commonly on your cluster. Under the hood, those files are added to a cloud storage bucket, e.g. an S3 bucket if you’re using AWS. To make it easier to manage file paths on your cluster, Ellf lets you create named path aliases using the paths command. When managing files and creating assets, you can then use the alias as a variable in the path, e.g. {alias}/, so you don’t have to repeat the full URL. Built-in path aliases are wrapped in double underscores, e.g. {__bucket__}, and can also be used in custom path aliases.
| Built-in alias | Description |
|---|---|
| __nfs__ | The path to the NFS drive. |
| __bucket__ | The data bucket of the cluster. |
For example, you can have an alias {train} that points to {__bucket__}/training. After adding a path, you can then use it in commands like publish and files.
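Conceptually, alias expansion is simple string substitution, with custom aliases allowed to reference the built-ins. A minimal sketch, assuming an invented bucket URL and NFS mount point:

```python
# Invented example values: the real built-in aliases resolve to your
# cluster's actual bucket URL and NFS path.
BUILT_IN = {"__bucket__": "s3://my-cluster-bucket", "__nfs__": "/mnt/nfs"}


def expand(path: str, custom: dict[str, str]) -> str:
    """Expand {alias} variables; custom aliases may contain built-ins."""
    # Resolve custom aliases first, since they may reference built-ins
    for name, value in custom.items():
        path = path.replace("{" + name + "}", value)
    for name, value in BUILT_IN.items():
        path = path.replace("{" + name + "}", value)
    return path
```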
Secrets
Secrets let you securely manage API keys and other credentials across your organization, which you can then select and use in recipes – for example, if you need to connect to a model via an API. Under the hood, secrets are named pointers and you can view and manage them using the secrets command or via ClusterAssetsSecrets in the UI. Secret values are stored only on your cluster and never sent to our servers.
Adding secrets
You can create a secret using secrets create on the CLI or via the UI. If you don’t provide a --value, you’ll be prompted to enter it interactively so it doesn’t appear in your shell history.
Hiding the secret in your shell history
To make sure your secret isn’t saved in the history, omit the --value option and enter the secret when prompted instead. You can also prefix the command with a space to keep it out of the history, provided your shell is configured accordingly, e.g. with HISTCONTROL=ignorespace in bash or setopt HIST_IGNORE_SPACE in zsh.
Using secrets in recipes
Recipes for tasks, actions and agents can require secrets as their arguments, e.g. to authenticate with an API. In the UI, you can then see a dropdown of the available secrets. On the CLI, you can simply provide the secret name.
```shell
$ ellf agents create gemini_agent "Gemini Auto-Labeler Agent" \
    --model gemini-2.0-flash --api-key GEMINI_API_KEY
```

If you’re developing your own recipes, you can require the Secret type for an argument, which will then allow the user to select a secret and pass an instance of it to your recipe function. Calling Secret.value() returns the plain-text string, which you can then pass forward to API calls and other libraries that require it.
```python
from ellf_recipes_sdk import agent_recipe, Secret

@agent_recipe(title="My Recipe")
def recipe(*, api_key: Secret):
    # value() returns the plain-text secret, e.g. to pass to an API client
    gemini_key = api_key.value()
```