Today I'm excited to introduce FlorDB, a versatile hindsight logging database that simplifies how we manage the AI and machine learning lifecycle.
Let me start by explaining what makes FlorDB unique. While there are many tools out there for managing ML workflows, FlorDB introduces something particularly useful: hindsight logging. Imagine you're many hours into training a model, and you suddenly realize you forgot to track an important metric. Traditionally, this would mean starting over from scratch. But with FlorDB, you can add those logging statements after the fact and efficiently replay your training with the new logging in place -- often in just seconds.
FlorDB is designed to integrate seamlessly with your existing workflow. Whether you're using Make for basic automation, Airflow for complex pipelines, MLFlow for experiment tracking, or Slurm for cluster management – FlorDB works alongside all of them.
What makes FlorDB particularly useful is its adaptability. It can serve as your:
- Git-aware logging library
- Checkpoint/Restore system for long-running Python tasks
- Model registry for version control
- Feature store for materializing results of featurization
- Label management solution for data annotation
- And more, adapting to your specific needs
Getting started with FlorDB is straightforward. If you want the latest stable version, you can simply run:
pip install flordb
Let me show you how simple it is to start using FlorDB. One of our core design principles is "low floor, high ceiling" – meaning it should be easy to get started, but capable enough for complex use cases.
Here's all you need to log your first message:
import flor
flor.log("msg", "Hello world!")
When you run this, you'll see:
msg: Hello world!
Changes committed successfully
And retrieving your logs? Just as simple. You can use a Flor Dataframe:
import flor
flor.dataframe("msg")
This gives you a clean, organized view of all your logged messages. No need to set up a database schema, no complex configurations – just straightforward logging capabilities.
What's particularly useful about this approach is that you can start small, logging just the basics, and expand your logging as your needs grow. There's no upfront commitment to a complex infrastructure – FlorDB grows with your project's needs.
[Pause for transition]
Now that we've covered the basics, let me show you how FlorDB handles more complex scenarios, like tracking machine learning experiments...
Now let's look at how FlorDB handles real machine learning workflows. While the basic logging we just saw is useful, FlorDB really shines when working with complex experiments that have multiple hyperparameters and metrics to track.
Let me show you how you can adapt your existing PyTorch training script to incorporate FlorDB logging. We'll break this down into three key parts: logging hyperparameters, managing model checkpoints, and tracking metrics.
First, let's look at how we handle hyperparameters:
import flor
import torch
# Define and log hyperparameters
hidden_size = flor.arg("hidden", default=500)
batch_size = flor.arg("batch_size", 32)
learning_rate = flor.arg("lr", 1e-3)
Notice how we're using flor.arg here. This does two important things: it logs the parameter values, and it makes them configurable from the command line. This means you can easily run experiments with different parameters without changing your code:
python train.py --kwargs hidden=250 lr=5e-4
Next, let's look at the training loop. FlorDB provides a checkpointing system that works seamlessly with PyTorch:
# Use FlorDB's checkpointing to manage model states
with flor.checkpointing(model=net, optimizer=optimizer):
    for epoch in flor.loop("epoch", range(num_epochs)):
        for data in flor.loop("step", trainloader):
            inputs, labels = data
            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            # Log the loss value for each step
            flor.log("loss", loss.item())
        # Evaluate the model on the test set
        eval(net, testloader)
Let me highlight a few important features here:
- The flor.checkpointing context manager handles saving and loading model states
- flor.loop helps track iteration progress
- flor.log captures metrics like loss values during training
To view all this logged information, you can use a Flor Dataframe just like before, but now with multiple columns:
import flor
flor.dataframe("hidden", "batch_size", "lr", "loss")
This gives you a comprehensive view of your experiment, showing how different hyperparameters affect your model's performance.
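If you want to go a step further, here's a minimal sketch of how you might summarize those results, assuming the Flor Dataframe can be manipulated like a pandas DataFrame; the groupby logic and the float cast below are illustrative assumptions, not part of FlorDB's API:

import flor

df = flor.dataframe("hidden", "batch_size", "lr", "loss")

# Assumption: logged values may come back as strings, so cast before comparing
df["loss"] = df["loss"].astype(float)

# Lowest loss observed for each hyperparameter setting
summary = (
    df.groupby(["hidden", "batch_size", "lr"])["loss"]
    .min()
    .reset_index()
    .sort_values("loss")
)
print(summary)

This kind of quick, ad hoc analysis is exactly what the dataframe interface is meant to support.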
[Pause for transition]
Now that we've covered basic experiment logging, let me introduce you to one of FlorDB's most distinctive capabilities: hindsight logging, the ability to add logging statements after your experiments have run.
It's particularly useful when you realize you need to track something you didn't think of initially.
Let me walk you through a practical example.
First, let's use a sample repository to demonstrate this.
You can get it by running:
git clone https://github.com/rlnsanz/ml_tutorial.git
cd ml_tutorial
make install
Let's start with our first training run:
python train.py
When we run this, we'll see output like:
Created and switched to new branch: flor.shadow
device: cuda
seed: 9288
hidden: 500
epochs: 5
batch_size: 32
lr: 0.001
print_every: 500
[Training progress output...]
accuracy: 90.9
correct: 9090
Now, let's run another experiment with different parameters:
python train.py --kwargs epochs=3 batch_size=64 lr=0.0005
At this point, we have two training runs in our database. We can view them easily:
import flor
flor.dataframe("device", "seed", "epochs", "batch_size", "lr", "accuracy")
Here's where hindsight logging becomes valuable.
Imagine you're analyzing these runs and realize you need to know what the gradient norms were during training.
A naive solution would be to retrain both models from scratch after adding the relevant logging statement.
With FlorDB, you instead add the new logging statement to the latest version of your code and replay your previous runs efficiently.
Here's how we add gradient norm logging to our training script:
flor.log("gradient_norm",
torch.nn.utils.clip_grad_norm_(
model.parameters(), max_norm=float('inf')
).item()
)
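For context, here's a sketch of where that statement would sit in a training loop like the one we saw earlier (assuming the network is bound to a variable named model, matching the snippet above; the rest of the loop is unchanged):

for data in flor.loop("step", trainloader):
    inputs, labels = data
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    # New hindsight-logging statement: record the total gradient norm.
    # max_norm=float('inf') means nothing is actually clipped; we only measure.
    flor.log("gradient_norm",
        torch.nn.utils.clip_grad_norm_(
            model.parameters(), max_norm=float('inf')
        ).item()
    )
    optimizer.step()
    flor.log("loss", loss.item())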
To replay our previous runs with this new logging, we simply use:
python -m flor replay gradient_norm
FlorDB will analyze your previous runs and replay them efficiently, only executing the parts needed to capture the new information.
When it's done, you can view the updated results:
import flor
flor.dataframe("seed", "batch_size", "lr", "gradient_norm")
Notice how the gradient norm, recovered through hindsight logging, is now just another column in the Flor Dataframe.
The power here is that we didn't have to modify our original experiments or rerun them from scratch.
FlorDB handled all the complexity of propagating logging statements back in time and replaying the necessary parts of our training history efficiently.
[Pause for transition]
Now that we've covered the core features of FlorDB, let me show you how it fits into larger AI/ML applications. In the real world, AI/ML applications need to manage complex pipelines spanning multiple components - from feature computation to model training to human feedback. Let me demonstrate this using our Document Parser application.
When processing PDF documents, we need to extract and store various features. Here's how FlorDB handles this:
# featurize.py
for doc_name in flor.loop("document", os.listdir(...)):
    N = get_num_pages(doc_name)
    for page in flor.loop("page", range(N)):
        # text_src is 'OCR' or 'TXT'
        text_src, page_text = read_page(doc_name, page)
        flor.log("text_src", text_src)
        flor.log("page_text", page_text)
        # Run featurization
        headings, page_numbers = analyze_text(page_text)
        flor.log("headings", headings)
        flor.log("page_numbers", page_numbers)
What's important to note is that FlorDB automatically tracks:
- The source of each feature (OCR vs raw text)
- Text features like headings and page numbers
- The relationship between documents, pages, and their features
- Complete provenance of how features were computed
All this happens without needing a predefined schema or complex setup. We'll see this in action next.
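To make that concrete, here's a minimal sketch of how you could pull those features back out afterwards; the column names simply mirror the flor.log keys above, and treating the result as a pandas-style DataFrame is an assumption:

import flor

# Each flor.log key becomes a queryable column, and the loop names
# ("document", "page") identify which document and page each row came from.
features = flor.dataframe("text_src", "headings", "page_numbers")
print(features.head())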
Now let me show you how FlorDB manages model training and versioning:
# train.py
# Flor Dataframe for training data
labeled_data = flor.dataframe("first_page", "page_color")

# Define and track model parameters
hidden_size = flor.arg("hidden", default=500)
num_epochs = flor.arg("epochs", 5)
batch_size = flor.arg("batch_size", 32)
learning_rate = flor.arg("lr", 1e-3)

with flor.checkpointing(model=net, optimizer=optimizer):
    for epoch in flor.loop("epoch", range(num_epochs)):
        for data in flor.loop("step", trainloader):
            inputs, labels = data
            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            flor.log("loss", loss.item())
            optimizer.step()
        # Log evaluation metrics
        acc, recall = eval(net, testloader)
        flor.log("acc", acc)
        flor.log("recall", recall)
During inference, we can automatically select the best model:
# infer.py
# Query for best model based on metrics
best_model = flor.dataframe("acc", "recall")
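For example, here's a minimal sketch of how you might rank the logged runs, assuming the Flor Dataframe can be manipulated like a pandas DataFrame; the float cast and the sorting logic are illustrative assumptions, not FlorDB API:

import flor

metrics = flor.dataframe("acc", "recall")

# Assumption: cast metrics to floats, then rank runs by accuracy, breaking ties on recall
metrics[["acc", "recall"]] = metrics[["acc", "recall"]].astype(float)
best_run = metrics.sort_values(["acc", "recall"], ascending=False).iloc[0]
print(best_run)

From there, you can load the matching checkpoint with whatever mechanism your project uses.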
One of the most powerful aspects is how FlorDB handles human feedback. In our PDF Parser application, we have a Flask interface where experts can review and correct model predictions:
# app.py
@app.route("/save_colors", methods=["POST"])
def save_colors():
    colors = request.get_json().get("colors", [])
    pdf_name = pdf_names.pop()
    with flor.iteration("document", None, pdf_name):
        for i in flor.loop("page", range(len(colors))):
            # FlorDB saves changes to ground truth (i.e. feedback)
            flor.log("page_color", colors[i])
    flor.commit()
    return jsonify({"message": "Colors saved"}), 200
The key here is that FlorDB maintains complete provenance of both machine predictions and human corrections, making it easy to:
- Track which predictions were corrected
- Use corrections to improve model training
- Maintain data quality over time
To see this in practice, you can query the labels just like any other logged values:
import flor
flor.dataframe("first_page", "page_color")
This gives you a complete view of the data, including both machine-generated and human-corrected labels.
Finally, let's look at how all these pieces fit together in a real ML application. Here's our complete pipeline as defined in the Makefile:
process_pdfs: $(PDFS) pdf_demux.py
	@echo "Processing PDF files..."
	@python pdf_demux.py
	@touch process_pdfs

featurize: process_pdfs featurize.py
	@echo "Featurizing Data..."
	@python featurize.py
	@touch featurize

train: featurize hand_label train.py
	@echo "Training..."
	@python train.py

model.pth: train export_ckpt.py
	@echo "Generating model..."
	@python export_ckpt.py

infer: model.pth infer.py
	@echo "Inferencing..."
	@python infer.py
	@touch infer

hand_label: label_by_hand.py
	@echo "Labeling by hand"
	@python label_by_hand.py
	@touch hand_label

run: featurize infer
	@echo "Starting Flask..."
	@flask run
We've decided to manage dependencies and dataflow using Make, but you could just as easily use Airflow, Kubeflow, or any other workflow management system. FlorDB operates at the Python layer, and adapts to your existing infrastructure, making it easy to integrate into your AI/ML applications.
So, that's FlorDB - a versatile logging database that brings together experiment tracking, model management, and human feedback in one cohesive system. Let me point you to some resources to learn more:
To recap, the simplest way to get started is to install FlorDB via pip:
pip install flordb
FlorDB is actively maintained and developed at ASU's School of Computing & Augmented Intelligence (SCAI), building on years of research at UC Berkeley's RISE Lab. We welcome contributions and feedback from the community.
If you're interested in learning more about the research behind FlorDB, check out our papers:
- "Flow with FlorDB: Incremental Context Maintenance for the Machine Learning Lifecycle" (CIDR 2025)
- "Hindsight Logging for Model Training" (VLDB Journal, 2021)
Thank you for your interest in FlorDB. We look forward to seeing what you'll build with it!