
Modules

Modules are collections of tools, skills and agents for different steps of your development process and lifecycle. By default, Ellf will guide you through the whole end-to-end process and select the right modules in the order that best fits your requirements. You can also run individual modules directly as slash commands.

Project Planning

Break your business problem down into actionable steps

Data Annotation

Define label scheme, NLP components and annotate data

Prodigy Recipes

Implement custom workflows for the Prodigy annotation tool

Pattern Generation

Interactively develop patterns and rule-based logic

Model Training

Train and export custom models and components

Question Answering

Ask specific questions and query the knowledge base

Human Support

Get support and feedback from our NLP experts

Coding Handover

Hand over in-app conversations to your coding assistant

Monitoring

Monitor jobs, training progress and cluster health

Platform Operations

Navigate the platform and CLI to perform actions


Project planning

This module helps you break down your business problems and requirements into actionable steps and outlines a project plan, including strategies for data preparation and development, NLP components to use, label schemes, annotation and model training. It will ask follow-up questions as needed to determine the right approach and next steps. This is usually the first step in the Ellf process.

PROJECT_PLAN.md

NLP Project Plan: Fraud Report Classifier

Problem Statement

Build a pipeline that processes analyst-written fraud investigation summaries and produces four outputs: fraud type (multi-class, 6 labels), affected product, urgency level, and legal escalation flag. Urgency and legal escalation are derived from business rules applied downstream of model predictions — not model outputs themselves. Starting from scratch on annotation.

Pipeline Overview

Analyst report text
  ├─→ [Fraud Type Classifier]        ← supervised textcat, 6-class exclusive
  ├─→ [Social Engineering Detector]  ← supervised textcat, binary
  ├─→ [Product Extractor]            ← rules (PhraseMatcher on known product names)
  ├─→ [Amount Extractor]             ← regex / MONEY NER for dollar amounts
  └─→ [Business Logic Layer]        ← NOT a model
        ├─→ urgency: fraud_type × amount × product → policy thresholds
        └─→ escalate: urgency == HIGH → legal team routing

Key architecture decision: Urgency and legal escalation are policy decisions, not language-understanding tasks. Internal thresholds ($100K, insider fraud type) are encoded as rules owned by the fraud team, not embedded in model weights.
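Because the business logic layer is just policy rules over upstream predictions, it can be sketched in a few lines. The $100K threshold and the insider_fraud rule come from the plan above; the MEDIUM tier and its $10K cutoff are invented for illustration:

```python
def urgency(fraud_type: str, amount: float) -> str:
    """Policy rules, not model outputs: map predictions to an urgency level."""
    if fraud_type == "insider_fraud" or amount >= 100_000:
        return "HIGH"
    if amount >= 10_000:  # assumed intermediate tier, not from the plan
        return "MEDIUM"
    return "LOW"

def escalate_to_legal(urgency_level: str) -> bool:
    """Escalation is derived from urgency (policy-owned)."""
    return urgency_level == "HIGH"

print(urgency("insider_fraud", 5_000))                     # HIGH
print(escalate_to_legal(urgency("card_fraud", 250_000)))   # True
```

Because the thresholds live in code owned by the fraud team rather than in model weights, changing policy is a one-line edit with no retraining.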

Components

Component                 | Approach                              | Rationale
--------------------------|---------------------------------------|-----------------------------------------------
Fraud type                | Supervised textcat, 6-class exclusive | Core NLP task; domain-specific; needs training
Social engineering vector | Supervised textcat, binary            | Method flag separate from outcome type; binary is fast to annotate
Product extraction        | PhraseMatcher rules                   | Analyst reports name products explicitly; rules are fast and auditable
Amount extraction         | Regex / spaCy MONEY entity            | Structured format; rules are sufficient
Urgency                   | Business logic rules                  | Thresholds ($100K, fraud type) are policy, not language
Legal escalation          | Business logic rules                  | Derived from urgency; policy-owned
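The PhraseMatcher row above can be sketched with a blank spaCy pipeline, since phrase rules need only a tokenizer, not a trained model. The product names here are made up:

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")  # tokenizer only; no trained model required
products = ["Platinum Card", "Instant Transfer"]  # hypothetical product list
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")  # case-insensitive matching
matcher.add("PRODUCT", [nlp.make_doc(p) for p in products])

doc = nlp("Chargeback on a platinum card funded via instant transfer.")
hits = [doc[start:end].text for _, start, end in matcher(doc)]
print(hits)  # ['platinum card', 'instant transfer']
```

Matching on the LOWER attribute makes the rules robust to casing while keeping every hit auditable, which is the rationale given in the table.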

Data Strategy

  • Source: Analyst-written fraud investigation summaries (free text, high quality, domain-consistent language)
  • No existing labels — annotating from scratch
  • insider_fraud and insurance_fraud estimated at 10–15% each — random sampling sufficient

Annotation Plan

Fraud type classifier

  • Recipe: textcat.correct with LLM pre-annotation
  • Labels: account_takeover, money_laundering, card_fraud, application_fraud, insider_fraud, insurance_fraud (exclusive)
  • Target volume: 400–500 annotated examples
  • Evaluation split: Set aside ~100 examples before annotation starts (document-level split)
  • Pilot first: Annotate 50–75 manually (no LLM). Fix schema before scaling.

Social engineering vector

  • Recipe: textcat.binary
  • Label: social_engineering_vector (true/false)
  • Pass: Second pass after fraud type annotation is stable
  • Target volume: 200–300 examples

Schema decision

  • social_engineering is a method, not an outcome — removed from fraud type labels
  • Added as a separate binary attribute to avoid confusable label pairs

Evaluation Strategy

Test set

  • Hold out ~100 examples before any annotation begins (document-level split)
  • Never used in training; kept constant as the ground truth benchmark

Metrics

  • Per-class F1 for all 6 fraud types — do not rely on macro average alone
  • insider_fraud and insurance_fraud tracked separately
  • Confusion matrix — expected confusables: card_fraud ↔ account_takeover, application_fraud ↔ money_laundering
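The per-class F1 check above is easy to compute without any dependencies; the toy gold and predicted labels below are illustrative:

```python
def per_class_f1(gold: list[str], pred: list[str]) -> dict[str, float]:
    """F1 per label, so rare classes aren't hidden by a macro average."""
    scores = {}
    for label in set(gold) | set(pred):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(p == label and g != label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return scores

gold = ["card_fraud", "insider_fraud", "card_fraud", "account_takeover"]
pred = ["card_fraud", "card_fraud", "card_fraud", "account_takeover"]
print(per_class_f1(gold, pred))
```

Note how the single missed insider_fraud example scores 0.0 on its own row while overall accuracy is still 75%: exactly the failure a macro average would blur.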

Baselines

  • Most-frequent-class baseline before any model evaluation
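The baseline itself is a one-liner over the gold labels, shown here with made-up counts:

```python
from collections import Counter

def most_frequent_class_accuracy(gold: list[str]) -> float:
    """Accuracy of always predicting the most common label."""
    return Counter(gold).most_common(1)[0][1] / len(gold)

print(most_frequent_class_accuracy(["card_fraud"] * 7 + ["insider_fraud"] * 3))  # 0.7
```

Any trained model has to beat this number before its scores mean anything.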

Training curves

  • Train on 25/50/75/100% of the data after each annotation batch
  • If the score is still rising at 100%, annotate more; if it's flat, investigate the schema or architecture

Memorisation check

  • Train on pilot examples, evaluate on those same examples — must be near-perfect

Roadmap

Phase                | What                                                           | Output
---------------------|----------------------------------------------------------------|-----------------------------
1 — Pilot            | Read reports, write guidelines, annotate manually              | Stable schema + 75 examples
2 — Baseline         | Train first model, memorisation check                          | Go/no-go on schema
3 — Scale annotation | LLM-assisted textcat.correct to 400–500 examples               | Training dataset
4 — Train & evaluate | Full training run, per-class F1, error analysis                | v1 fraud type model
5 — SE vector pass   | Binary annotation pass + training                              | v1 SE vector model
6 — Rules layer      | Product PhraseMatcher, amount regex, urgency/escalation rules  | Complete pipeline

Next Steps

  1. Hold out ~100 docs as the test set before touching any annotation tool
  2. Read 50–75 reports manually to validate the schema
  3. Write one-page annotation guidelines with label definitions and tiebreaker rules
  4. Annotate the pilot batch using the textcat recipe
  5. Train a quick baseline and run the memorisation check
  6. If training curves are rising, scale with LLM-assisted annotation
  7. Build the rules layer (product PhraseMatcher, amount regex) in parallel

Decision Log

  • Urgency: Business logic. Thresholds ($100K, insider fraud type) are policy rules.
  • Legal escalation: Business logic derived from urgency. Policy-owned.
  • Product extraction: PhraseMatcher rules first. Analyst reports name products explicitly.
  • Fraud type: Supervised textcat, 6-class exclusive. Core NLP task.
  • Social engineering: Binary vector flag, not a fraud type label. It's a method, not an outcome.
  • Evaluation-first: 100-doc test set held out before annotation begins.

Ellf also uses the plans CLI to store and sync project plans so you can work on them locally in your coding assistant and in the in-app chat. Plans are scoped to Ellf projects and each project can have one or more plans attached to it, for example one plan for each NLP component you’re developing. As you work with Ellf, your project plans are continuously updated to reflect the state and progress of your project.


Data annotation

This module helps you convert your business problem and project plans into the right NLP components, develop the best matching label schemes that are easy to annotate and learn from, and set up automated data annotation workflows. It will start the Prodigy annotation tool if needed and provide you with the setup to scale up your data labelling process by creating tasks in the platform and setting up agents to auto-label data for you.

The data annotation module also incorporates many best practices for creating generalizable categories and implementing the most efficient human-in-the-loop and model-assisted data development strategies, like our guide for optimizing annotation workflows. This information is also available in the knowledge base that you can access via the /ellf-ask module.

✳︎Claude Code: Setting up annotation on the cluster
› /ellf-annotate

I want to start annotating my customer support emails for intent classification. The labels are REFUND, SHIPPING, TECHNICAL, and BILLING. My data is in the cluster at data/support-emails.

⏺︎

Your label scheme, data source, and annotation objective all look good. Since this is a text classification task starting from scratch, I’ll use the built-in textcat.manual recipe with your four labels. Let me preview the setup first.

✔ Preview server running at http://localhost:8080
⏺︎

The form and annotation interface look correct — four labels rendered as exclusive options. I’ll go ahead and create the task on your cluster.

✔ Created task “Support Email Classification” (textcat.manual)
✔ Task started on cluster
⏺︎

Your annotation task is running. You can open it at https://app.ellf.ai/tasks/support-email-classification or assign annotators and agents to start labelling.


Prodigy recipe builder

This module makes coding assistants proficient at using the Prodigy annotation tool and implementing custom recipes and interfaces for specific use cases. Ellf will use it if a custom annotation workflow is required for data collection in your project, e.g. during the data annotation step. You can also call this module directly if you already have an annotation goal in mind and need help implementing the workflow and annotation interface for it.

The recipe builder can implement recipes for both the standalone workflow that runs Prodigy locally, as well as Ellf tasks that run on your cluster and let you assign human annotators and autonomous annotation agents in the platform. Compared to the data annotation module, which focuses on annotation strategy, this module specifically targets the Python recipe implementation.

✳︎Claude Code: Building a custom annotation recipe
› /ellf-prodigy

I need a recipe for annotating product labels in images. The images are in Japanese and annotators need to see an English translation.

⏺︎

This will need a custom recipe with two parts:

  1. Image display with on-demand translation — call a translation API when loading each image and show the result alongside it
  2. Label interface — classification UI for the product categories

Let me start with the image loader and translation component. I’ll use a blocks layout that renders the image on top and the translated text below. Sound good?
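A Prodigy recipe's stream is a plain Python generator, so the translation step described above can be sketched independently of the recipe itself. `translate` is a placeholder for a real translation API call, and the task fields are illustrative:

```python
def translate(text: str) -> str:
    # Placeholder: a real recipe would call a translation API here.
    return f"[EN] {text}"

def add_translation(stream):
    """Attach a translation to each incoming annotation task."""
    for task in stream:
        task["translation"] = translate(task.get("text", ""))
        yield task

tasks = [{"image": "shelf_01.jpg", "text": "抹茶ラテ"}]
print(list(add_translation(tasks)))
```

Wrapping the stream in a generator keeps the translation lazy: each image is only translated when an annotator is about to see it.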


Pattern generation

This module helps you determine which of the information you want to extract requires machine learning, and which parts can be reliably extracted with rule-based logic like match patterns, regular expressions or other heuristics for faster and more predictable results. It can also automate the creation of those match patterns and iteratively refine them with alternating precision and recall passes, asking you for interactive feedback on the results along the way. At the end of the process, the module can provide you with the code or script needed to apply the generated patterns, or integrate them into your existing NLP pipeline to combine models with rules.

✳︎Claude Code: Generating patterns for invoice extraction
› /ellf-patterns

I need to extract invoice numbers and amounts from procurement documents. Invoice numbers look like INV-2024-00123 and amounts are in USD.

⏺︎

Both of these are great candidates for patterns — they have predictable surface forms. Let me generate an initial set.

⏺︎

INVOICE_NUMBER — 48 matches across 50 documents, e.g.:

  • “Please reference INV-2024-00123 for payment”
  • “Submitted under INV-2023-45678 on March 3”

AMOUNT — 62 matches, e.g.:

  • “Total due: $14,250.00”
  • “Line item subtotal $899.99”

Precision looks high. Want me to run a recall pass to catch edge cases, or export these patterns as-is?
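The docs don't show the patterns the module actually generates, but regexes of this shape would cover the matches listed above (illustrative only, not the module's output):

```python
import re

# INV-YYYY-NNNNN invoice identifiers, e.g. INV-2024-00123
INVOICE_RE = re.compile(r"\bINV-\d{4}-\d{5}\b")
# USD amounts with thousands separators and optional cents
AMOUNT_RE = re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?")

text = "Please reference INV-2024-00123 for payment. Total due: $14,250.00"
print(INVOICE_RE.findall(text))  # ['INV-2024-00123']
print(AMOUNT_RE.findall(text))   # ['$14,250.00']
```

A recall pass would then probe edge cases these miss, e.g. amounts written without cents or with a "USD" suffix.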


Model training

This module uses your data to train the best possible small, fast and private model components, e.g. a spaCy pipeline consisting of the previously developed components. Based on your requirements and project plan, it will automatically pick a suitable configuration, embeddings and hyperparameters. It will also log the training results for further analysis and export a loadable model and code to use it in your application. To perform and manage training in the cloud, the module can also set up actions that run on your cluster and use GPU resources if available.

Once your training experiments are running, the module can analyze your training logs and results, interpret and explain them for you, and recommend strategies for next steps and further improvements.

✳︎Claude Code: Detecting overfitting during training
› /ellf-train

My NER model for extracting company names and job titles from resumes has been training for a while. How’s it looking?

⚠ Overfitting detected — dev score declining while training loss continues to decrease (sustained over last 4 evaluation steps)
⏺︎

Your best model was saved at step 1,600 with 0.79 F1 (COMPANY 0.85, JOB_TITLE 0.71). After that, training loss keeps dropping but dev scores are going down — the model is memorizing your training data.

JOB_TITLE is the weaker label. Checking your data: you have 312 COMPANY examples but only 87 for JOB_TITLE. That’s likely not enough for reliable generalization. I’d recommend:

  1. Annotate more JOB_TITLE examples — target at least 150
  2. Increase dropout from 0.1 to 0.3 in the next run
  3. Use model-best from this run as your current baseline

Want me to set up an annotation task to collect more JOB_TITLE examples?
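Recommendation 2 corresponds to a single value in the spaCy training config. A minimal excerpt (the patience value shown is spaCy's default, included only for context):

```ini
[training]
dropout = 0.3     # raised from 0.1 per the overfitting diagnosis
patience = 1600   # training steps without improvement before early stopping
```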


Question answering

This module connects to Ellf’s extensive knowledge base consisting of documentation, code, project examples and advice for different use cases and domains, as well as development best practices and our distilled expertise from 10+ years of solving real-world NLP problems in industry. The knowledge base also includes documentation and examples of Ellf itself, as well as our other tools, spaCy and Prodigy, and their related ecosystems.

✳︎Claude Code: Example
› /ellf-ask

I’m developing a system to extract product names from internal company reports. I have about 2000 annotated training examples. How many evaluation examples do I need to report meaningful accuracy?

⏺︎

To report meaningful accuracy for your system extracting product names from internal company reports, a good rule of thumb is to have at least ten evaluation samples per significant digit of your accuracy metric. For instance, if you want to differentiate between 90% and 91% accuracy, you would need at least 1,000 evaluation examples annotated.

Given that you have 2,000 annotated training examples, an evaluation set of around 200 to 400 examples (roughly 10–20% of your data) is typically recommended for initial assessments.
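The rule of thumb in the answer can be written down directly; `min_eval_examples` is a hypothetical helper, and `resolution` is the smallest accuracy difference you want to resolve:

```python
def min_eval_examples(resolution: float) -> int:
    """~10 evaluation examples per unit of resolution: 10 / resolution
    examples to reliably tell accuracies that far apart."""
    return round(10 / resolution)

print(min_eval_examples(0.01))  # 1000 — enough to separate 90% from 91%
print(min_eval_examples(0.05))  # 200
```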


Human support

If you ever feel stuck or need human feedback on your project plans and workflow, you can use this module to send us your logs and have them reviewed by one of our NLP experts. Alternatively, you can also export your conversation log to a file and email it to us at support@ellf.ai.

✳︎Claude Code: Submitting a support request
› /ellf-support

The model’s accuracy isn’t improving with more data, so I’m considering changing the label scheme. Is this a good approach?

⏺︎Should the support request include your full session transcript, or a summary only?
› Full transcript
⏺︎

Support request sent successfully.

  • Reference ID: a7cf3184865e4f05a8bf269ad34c767a
  • Session transcript: attached
  • Debug log: not found (not attached)

The Ellf team will receive your question about whether changing the label scheme is the right approach for the accuracy plateau. You’ll get a copy via email.

Support requests are reviewed by our team and you’ll typically receive a response within a few business days. The module will always ask for your permission before sharing your conversation logs.


Handover to coding assistant

This module allows in-app chat conversations to be handed over to a coding assistant like Claude Code in order to create and edit files locally and write code. The conversation history can be moved to your cluster, and by default, Ellf will automatically pick up from the latest conversation. You can also call the module explicitly with a conversation ID or select from the available past conversations.

Handover from in-app chat to Claude Code

Can you implement this?

Sure! Here’s the handoff summary:

Request ID: 523060d8-b2db-4fd5-b541-5ee119007501
Status: planned
Description: Write loader.py for documents from data warehouse

In your local Claude Code, run /ellf-todo to pick it up – it will load our full conversation context automatically.

✳︎Claude Code: Continuing development after handover
› /ellf-todo
✔ Successfully updated coding request 523060d8-b2db-4fd5-b541-5ee119007501
⏺︎Context summary: Phase 1 of an e-commerce aspect extraction project. The task is to write loader.py for the internal data warehouse API that segments sentences and yields {id, text, date} dictionaries.

Under the hood, this module uses the todo CLI to create and manage requests, which can then be picked up by your coding assistant. It can also use the plans CLI to fetch and update your project plans so it has the available context for the given todo.


Monitoring and observability

This module monitors and diagnoses running jobs from your coding environment and the in-app chat. It can check on tasks, actions and agents running on your cluster, local training runs, standalone Prodigy servers, and overall cluster health. Instead of dumping raw logs into your conversation, the module produces compact structured summaries with actionable alerts. The module covers several monitoring surfaces:

  • Training jobs: Tracks loss, dev scores, best checkpoint and detects problems like overfitting, score plateaus, loss spikes and NaN loss.
  • Annotation tasks: Reports task state, dataset growth signals and annotator or agent activity to help you spot stalled or misconfigured annotation workflows.
  • Agents: Checks agent state, recent errors and whether the agent is producing work or appears blocked.
  • Cluster health: Runs connectivity, broker-side and node capacity checks to surface degraded state or resource pressure.

✳︎Claude Code: Checking on a training action
› /ellf-monitor How’s the NER training going?
⏺︎
Job: ner-product-names-v2 (action)
State: running — step 800
Best score: 0.81 (ents_f) at step 720
Train loss: 0.52 (stable)
Alerts: None

Training looks healthy. The score has been improving steadily and there are no signs of overfitting or loss instability.


Platform operations

This module makes your coding assistant and the in-app chat proficient at navigating the web platform and ellf command-line interface so it’s able to create and manage tasks, actions and agents, interact with the cluster, navigate to the right pages and resources, and perform any other actions for you. It essentially makes Ellf good at using itself to its full potential.

Ellf will generally ask you for additional confirmation before creating, running or deleting things for you, and will execute everything under your user account.

Example: Performing actions via the in-app chat

Start @Actions/Company Reports Training and show me the logs.

Start job: Company Reports Training (838e401c-f070-44de-b6a7-3db840e9dc7c)
✳︎Claude Code: Performing actions via the CLI
› Start the Company Reports Training action and show me the logs.
⏺︎

✔ Task started. Here’s a summary of the logs:

  • Recipe: ner (Named Entity Recognition) with view ner_manual
  • Labels: PERSON, ORG, DATE
  • Input: Loaded asset “Freedom Raw (train)” via txt loader
  • Dataset: Created freedom-ner-train in PostgreSQL
  • Server: Prodigy is running on port 80, health check passing

You can open it for annotation at:
https://app.ellf.ai/annotate/7a2c49e1-83bf-4d12-ae95-f1c620d8b537

Read next: Platform

from the makers of spaCy and Prodigy

Terms & Conditions · Privacy Policy · Imprint · © 2026 Explosion