# Get Started with Lilypad
This guide will have you up and running with Lilypad in less than 5 minutes.
## Create an account
First, navigate to https://app.lilypad.so and create an account. You’ll need a GitHub or Google account to sign up.
Next, navigate to Settings -> Organization and:
- Create a new project.
- Generate an API key for that project.
We recommend saving the project ID and API key in your environment (e.g. in a .env file):

```bash
LILYPAD_PROJECT_ID=...
LILYPAD_API_KEY=...
```
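If you use a .env file, you can load it with python-dotenv (installed separately; it is not part of the Lilypad SDK) and confirm both values are set before configuring the SDK. A minimal sketch:

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads LILYPAD_PROJECT_ID and LILYPAD_API_KEY from a local .env file

# Fail fast if either credential is missing.
for var in ("LILYPAD_PROJECT_ID", "LILYPAD_API_KEY"):
    if not os.getenv(var):
        raise RuntimeError(f"Missing required environment variable: {var}")
```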
## Installation
Install the Lilypad Python SDK:
```bash
# For OpenAI support
uv add "python-lilypad[openai]"

# For multiple providers
uv add "python-lilypad[openai,anthropic,google]"
```
Available provider extras:

- `openai` - OpenAI models
- `anthropic` - Anthropic models
- `google` - Google models (`genai` SDK)
- `bedrock` - AWS Bedrock models
- `azure` - Azure AI models
- `mistral` - Mistral models
- `outlines` - Outlines framework
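If you're not using uv, the same extras should work with plain pip. A sketch of the equivalent commands:

```bash
# For OpenAI support
pip install "python-lilypad[openai]"

# For multiple providers
pip install "python-lilypad[openai,anthropic,google]"
```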
## Configure Tracing
Lilypad automatically traces any LLM API call you make, no proxy required.

To enable tracing, simply call `configure`:
```python
from dotenv import load_dotenv
import lilypad
from openai import OpenAI

load_dotenv()
lilypad.configure()  # enables automatic tracing for supported provider clients

client = OpenAI()
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(completion.choices[0].message.content)
# > The capital of France is Paris.
```
Follow the link or navigate to the Home page for your project to see the trace.
## Trace Arbitrary Functions
You can also use the `trace` decorator to trace arbitrary Python functions, adding more structure and color to your traces:
```python
from dotenv import load_dotenv
import lilypad
from openai import OpenAI

load_dotenv()
lilypad.configure()

client = OpenAI()


@lilypad.trace()
def answer_question(question: str) -> str | None:
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    return completion.choices[0].message.content


output = answer_question("What is the capital of France?")
print(output)
# > The capital of France is Paris.
```
You’ll see the `answer_question` trace on your project’s Home page.
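The decorator isn't limited to functions that call an LLM; any Python function can be traced this way. A minimal sketch, where `normalize_question` is a hypothetical helper used only to illustrate the point:

```python
from dotenv import load_dotenv
import lilypad

load_dotenv()
lilypad.configure()


@lilypad.trace()  # traced like any other function; no LLM call required
def normalize_question(question: str) -> str:
    # Hypothetical helper: trim whitespace and ensure a trailing question mark.
    question = question.strip()
    return question if question.endswith("?") else f"{question}?"


print(normalize_question("  What is the capital of France "))
# > What is the capital of France?
```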
## Trace Arbitrary Code Blocks
Lilypad also supports tracing arbitrary code blocks using the `span` context manager:
```python
from dotenv import load_dotenv
import lilypad
from openai import OpenAI

load_dotenv()
lilypad.configure()

client = OpenAI()


@lilypad.trace()
def answer_question(question: str) -> str | None:
    with lilypad.span("Answer Question Prompt") as span:
        messages = [{"role": "user", "content": question}]
        span.metadata({"messages": messages})

    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    )

    with lilypad.span("Answer Question Completion") as span:
        span.metadata(completion.model_dump())

    return completion.choices[0].message.content


output = answer_question("What is the capital of France?")
print(output)
# > The capital of France is Paris.
```
You should now also see the `Answer Question Prompt` and `Answer Question Completion` spans inside the `answer_question` trace on your project’s Home page.
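Assuming spans can also be opened outside of a decorated function (once `configure` has run), you can use them to group any standalone step under a named span. A hypothetical sketch:

```python
from dotenv import load_dotenv
import lilypad

load_dotenv()
lilypad.configure()

# Hypothetical standalone step, traced without any enclosing decorator.
with lilypad.span("Load Documents") as span:
    documents = ["doc-1", "doc-2"]  # stand-in for real loading logic
    span.metadata({"num_documents": len(documents)})
```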
## Automatically Version an LLM Function
For non-deterministic functions, it’s extremely important that we take a snapshot of the exact version of the code that was used to produce an output. This reproducibility is necessary for properly evaluating quality down the line.
Replacing the `trace` decorator with the `generation` decorator does exactly that: Lilypad will automatically snapshot and version your non-deterministic LLM function alongside the trace.
```python
from dotenv import load_dotenv
import lilypad
from openai import OpenAI

load_dotenv()
lilypad.configure()

client = OpenAI()


@lilypad.generation()  # Automatically versions `answer_question`
def answer_question(question: str) -> str | None:
    with lilypad.span("Answer Question Prompt") as span:
        messages = [{"role": "user", "content": question}]
        span.metadata({"messages": messages})

    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    )

    with lilypad.span("Answer Question Completion") as span:
        span.metadata(completion.model_dump())

    return completion.choices[0].message.content


output = answer_question("What is the capital of France?")
print(output)
# > The capital of France is Paris.
```
You can see the generation in the Generations tab of your project. This will contain not only the trace but also everything you need to know about that exact generation for future analysis and evaluation.
### The version is the runnable closure
Whenever you run a `generation`-decorated LLM function, we compute the entire code execution graph, and that graph is what determines the version.
This means you will always be able to reproduce the exact version of the code you ran simply by running the closure again.
This also means that reverting the code to match an existing older version will automatically attach new traces to that older version.
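To make that concrete, here is a hypothetical sketch in which the versioned function depends on a helper. Since the execution graph includes everything `answer_question` runs, editing `build_messages` (not just the decorated function) would produce a new version, and reverting it would attach new traces back to the older version:

```python
from dotenv import load_dotenv
import lilypad
from openai import OpenAI

load_dotenv()
lilypad.configure()

client = OpenAI()


def build_messages(question: str) -> list[dict]:
    # Hypothetical helper: part of `answer_question`'s execution graph,
    # so changing this function changes the computed version too.
    return [{"role": "user", "content": question}]


@lilypad.generation()
def answer_question(question: str) -> str | None:
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=build_messages(question),
    )
    return completion.choices[0].message.content
```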
## Evaluations
We are working on some cool tooling around evaluations, starting with analysis and annotation.
It’s currently in closed beta. If you’re interested in participating, reach out!
See our pricing page for more information.