# Get Started with Lilypad
This guide will have you up and running with Lilypad in less than 5 minutes.
## Create an account
First, navigate to https://app.lilypad.so and create an account. You’ll need a GitHub or Google account to sign up.
Next, navigate to Settings -> Organization and:
- Create a new project.
- Generate an API key for that project.
We recommend saving the project ID and API key in your environment (e.g. in a .env file):

```bash
LILYPAD_PROJECT_ID=...
LILYPAD_API_KEY=...
```
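If you use a .env file, you can load it with python-dotenv (installed separately; it is not part of the Lilypad SDK) and confirm both values are set before configuring the SDK. A minimal sketch:

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads LILYPAD_PROJECT_ID and LILYPAD_API_KEY from a local .env file

# Fail fast if either credential is missing.
for var in ("LILYPAD_PROJECT_ID", "LILYPAD_API_KEY"):
    if not os.getenv(var):
        raise RuntimeError(f"Missing required environment variable: {var}")
```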
## Installation
Install the Lilypad Python SDK:
```bash
# For OpenAI support
uv add "python-lilypad[openai]"

# For multiple providers
uv add "python-lilypad[openai,anthropic,google]"
```
Available provider extras:

- `openai` - OpenAI models
- `anthropic` - Anthropic models
- `google` - Google models (`genai` SDK)
- `bedrock` - AWS Bedrock models
- `azure` - Azure AI models
- `mistral` - Mistral models
- `outlines` - Outlines framework
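If you're not using uv, the same extras should work with plain pip. A sketch of the equivalent commands:

```bash
# For OpenAI support
pip install "python-lilypad[openai]"

# For multiple providers
pip install "python-lilypad[openai,anthropic,google]"
```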
## Configure Tracing
Lilypad automatically traces any LLM API call you make, no proxy required.

To enable tracing, simply call `configure`:
```python
from dotenv import load_dotenv
import lilypad
from openai import OpenAI

load_dotenv()
lilypad.configure()  # enables automatic tracing for supported provider clients

client = OpenAI()
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(completion.choices[0].message.content)
# > The capital of France is Paris.
```
Follow the link or navigate to the Home page for your project to see the trace.
## Trace Arbitrary Functions
You can also use the `trace` decorator to trace arbitrary Python functions, adding more structure and color to your traces:
```python
from dotenv import load_dotenv
import lilypad
from openai import OpenAI

load_dotenv()
lilypad.configure()

client = OpenAI()


@lilypad.trace()
def answer_question(question: str) -> str | None:
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    return completion.choices[0].message.content


output = answer_question("What is the capital of France?")
print(output)
# > The capital of France is Paris.
```
You’ll see the `answer_question` trace on your project’s Home page.
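The decorator isn't limited to functions that call an LLM; any Python function can be traced this way. A minimal sketch, where `normalize_question` is a hypothetical helper used only to illustrate the point:

```python
from dotenv import load_dotenv
import lilypad

load_dotenv()
lilypad.configure()


@lilypad.trace()  # traced like any other function; no LLM call required
def normalize_question(question: str) -> str:
    # Hypothetical helper: trim whitespace and ensure a trailing question mark.
    question = question.strip()
    return question if question.endswith("?") else f"{question}?"


print(normalize_question("  What is the capital of France "))
# > What is the capital of France?
```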
## Trace Arbitrary Code Blocks
Lilypad also supports tracing arbitrary code blocks using the `span` context manager:
```python
from dotenv import load_dotenv
import lilypad
from openai import OpenAI

load_dotenv()
lilypad.configure()

client = OpenAI()


@lilypad.trace()
def answer_question(question: str) -> str | None:
    with lilypad.span("Answer Question Prompt") as span:
        messages = [{"role": "user", "content": question}]
        span.metadata({"messages": messages})

    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    )

    with lilypad.span("Answer Question Completion") as span:
        span.metadata(completion.model_dump())

    return completion.choices[0].message.content


output = answer_question("What is the capital of France?")
print(output)
# > The capital of France is Paris.
```
You should now also see the `Answer Question Prompt` and `Answer Question Completion` spans inside the `answer_question` trace on your project’s Home page.
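Assuming spans can also be opened outside of a decorated function (once `configure` has run), you can use them to group any standalone step under a named span. A hypothetical sketch:

```python
from dotenv import load_dotenv
import lilypad

load_dotenv()
lilypad.configure()

# Hypothetical standalone step, traced without any enclosing decorator.
with lilypad.span("Load Documents") as span:
    documents = ["doc-1", "doc-2"]  # stand-in for real loading logic
    span.metadata({"num_documents": len(documents)})
```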
## Automatically Version an LLM Function
For non-deterministic functions, it’s extremely important that we take a snapshot of the exact version of the code that was used to produce an output. This reproducibility is necessary for properly evaluating quality down the line.
Replacing the `trace` decorator with the `generation` decorator does exactly that: Lilypad will automatically snapshot and version your non-deterministic LLM function alongside the trace.
```python
from dotenv import load_dotenv
import lilypad
from openai import OpenAI

load_dotenv()
lilypad.configure()

client = OpenAI()


@lilypad.generation()  # Automatically versions `answer_question`
def answer_question(question: str) -> str | None:
    with lilypad.span("Answer Question Prompt") as span:
        messages = [{"role": "user", "content": question}]
        span.metadata({"messages": messages})

    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    )

    with lilypad.span("Answer Question Completion") as span:
        span.metadata(completion.model_dump())

    return completion.choices[0].message.content


output = answer_question("What is the capital of France?")
print(output)
# > The capital of France is Paris.
```
You can see the generation in the Generations tab of your project. This will contain not only the trace but also everything you need to know about that exact generation for future analysis and evaluation.
### The version is the runnable closure
Whenever you run a `generation`-decorated LLM function, we compute the entire code execution graph, and that graph is what determines the version.
This means you will always be able to reproduce the exact version of the code you ran simply by running the closure again.
This also means that reverting the code to match an existing older version will automatically attach new traces to that older version.
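To make that concrete, here is a hypothetical sketch in which the versioned function depends on a helper. Since the execution graph includes everything `answer_question` runs, editing `build_messages` (not just the decorated function) would produce a new version, and reverting it would attach new traces back to the older version:

```python
from dotenv import load_dotenv
import lilypad
from openai import OpenAI

load_dotenv()
lilypad.configure()

client = OpenAI()


def build_messages(question: str) -> list[dict]:
    # Hypothetical helper: part of `answer_question`'s execution graph,
    # so changing this function changes the computed version too.
    return [{"role": "user", "content": question}]


@lilypad.generation()
def answer_question(question: str) -> str | None:
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=build_messages(question),
    )
    return completion.choices[0].message.content
```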
## Evaluations
We are working on some cool tooling around evaluations, starting with analysis and annotation.
It’s currently in closed beta. If you’re interested in participating, reach out!
See our pricing page for more information.