This application has been tested on Chrome and Firefox only; it may not work on other browsers yet.

Live preview

Scientia

Make retrieval-augmented generation (RAG) observable.

If you have ever changed a top_k value, gotten a "better" answer, and been unable to explain why, you have met the normal reality of RAG. These systems are not deterministic. Without visibility, improvement turns into guessing, and guessing does not scale to real workloads.

Scientia turns that invisible loop into something you can inspect. It helps you see what was retrieved, what was ignored, and what likely influenced the answer so progress becomes repeatable.


The core value

Stop guessing. Start diagnosing.

Scientia helps you improve RAG workflows by making the relationships visible: what you changed, what retrieval returned, and how the output responded.

  • Compare configurations side-by-side.
  • Inspect grounding and evidence usage.
  • Build an eval habit based on evidence, not vibes.

Try the loop (60 seconds)

  1. Load a small set of documents you know well.
  2. Ask a question that tends to cause weak retrieval or confident wrongness.
  3. Run A/B with one change, then inspect the retrieved context and trace.

If one change improves one query but breaks another, that is still progress. You learned something real.
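The A/B step above can be sketched with a toy keyword retriever. Everything here is an illustrative placeholder, not Scientia's API: the point is that only one variable (top_k) differs between the two runs, so any difference in retrieved context is attributable to it.

```python
# Toy A/B comparison: vary only top_k and inspect what retrieval returns.
# All names here are illustrative stand-ins, not Scientia's actual API.

DOCS = [
    "The billing cycle resets on the first of each month.",
    "Refunds are processed within five business days.",
    "Invoices are emailed to the account owner.",
    "The API rate limit is 100 requests per minute.",
]

def retrieve(query: str, top_k: int) -> list[str]:
    """Score docs by shared words with the query; return the top_k."""
    q = set(query.lower().split())
    scored = sorted(DOCS, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:top_k]

def run_ab(query: str, k_a: int, k_b: int) -> dict:
    """Run the same query under two configs and record both contexts."""
    return {"A": retrieve(query, k_a), "B": retrieve(query, k_b)}

result = run_ab("when is the billing cycle reset", k_a=1, k_b=3)
# Config B retrieves strictly more context than config A; inspect the diff.
assert result["A"][0] == result["B"][0]
assert len(result["B"]) > len(result["A"])
```

A real retriever replaces the word-overlap scorer with embeddings, but the discipline is the same: one change, two runs, compare the contexts before comparing the answers.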

Modes

Baseline mode

Fast, low-friction checks for "is this grounded?" questions.

A/B testing

Change one variable and compare outputs side-by-side.

Graph mode

Trace multi-hop questions across related evidence.

What you can see (without opening a notebook)

Retrieved context (and how it ranked)

Rerank decisions (when applicable)

Latency and retrieval signals

Traces you can export and share

Eval signals that keep evidence in focus

Evals are not about perfection. They are about consistency so your system gets more reliable over time.
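Consistency in practice means running the same fixed query set after every change and comparing scores run over run. A minimal sketch (the names and the pass/fail metric are illustrative assumptions, not Scientia's eval format):

```python
# A tiny eval harness: the same fixed queries every run, so scores are
# comparable across config changes. The scoring rule is a toy stand-in
# for a real grounding or relevance metric.

EVAL_SET = [
    ("when do refunds post?", "refunds"),
    ("what is the rate limit?", "rate limit"),
]

def answer(query: str) -> str:
    # Stand-in for the full retrieve + synthesize pipeline.
    return {
        "when do refunds post?": "refunds post in five days",
        "what is the rate limit?": "the rate limit is 100 rpm",
    }[query]

def score_run() -> float:
    """Fraction of eval queries whose answer contains the expected phrase."""
    hits = sum(1 for q, expected in EVAL_SET if expected in answer(q))
    return hits / len(EVAL_SET)

baseline = score_run()  # Record this; compare it after each config change.
```

The metric does not need to be sophisticated to be useful; it needs to be stable, so a drop after a change means something.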

Links

Dive deeper or jump to the about section.

About

The RAG loop

RAG work is often treated as a linear task: load documents, ask a question, ship a feature. In practice, it is a loop:

Ask → Retrieve → Synthesize → Evaluate → Adjust

The loop is where quality comes from. When that loop is invisible, teams move slowly, repeat the same mistakes, and ship brittle experiences. When the loop is visible, you can actually engineer the outcome.

Scientia exists to keep that loop visible without forcing you to live inside notebooks or build custom dashboards for every experiment.
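The loop can be written down as control flow. Every function below is a toy stand-in (real stages are models and indexes); the shape to notice is that Adjust feeds back into Retrieve, one change per iteration:

```python
# The RAG loop as control flow: Ask -> Retrieve -> Synthesize -> Evaluate -> Adjust.
# All functions are toy stand-ins for real components.

def retrieve(query: str, top_k: int) -> list[str]:
    corpus = ["fact one", "fact two", "fact three", "fact four"]
    return corpus[:top_k]

def synthesize(query: str, context: list[str]) -> str:
    return f"Answer to {query!r} using {len(context)} passages."

def evaluate(answer: str, context: list[str]) -> float:
    # Toy score: more context means a higher score, capped at 1.0.
    return min(1.0, len(context) / 3)

top_k, score = 1, 0.0
while score < 1.0:          # Adjust until the eval signal is satisfied.
    context = retrieve("example question", top_k)
    answer = synthesize("example question", context)
    score = evaluate(answer, context)
    if score < 1.0:
        top_k += 1          # One change per iteration keeps causes visible.
```

When the loop is visible like this, "why did quality change?" has an answer: you can point at the iteration where it did.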

What "observable" means here

Observable does not mean "more metrics." It means you can answer basic questions quickly:

  • What evidence did retrieval return?
  • Was the evidence relevant, or just vector-adjacent?
  • Did the answer actually use that evidence?
  • What changed between config A and config B?
  • Are we getting better in a way that generalizes?
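One way to make those questions answerable quickly is to capture a structured trace per query. A sketch, with illustrative field names rather than a fixed Scientia schema:

```python
from dataclasses import dataclass, asdict

# A per-query trace whose fields map onto the questions above.
# Field names are illustrative, not a fixed Scientia schema.

@dataclass
class RagTrace:
    query: str
    config: dict                 # What changed between config A and B lives here.
    retrieved: list[str]         # What evidence did retrieval return?
    used_in_answer: list[str]    # Did the answer actually use that evidence?
    latency_ms: float

trace = RagTrace(
    query="when do refunds post?",
    config={"top_k": 3, "reranker": None},
    retrieved=["Refunds post in 5 days.", "Invoices are emailed monthly."],
    used_in_answer=["Refunds post in 5 days."],
    latency_ms=412.0,
)

# "Vector-adjacent but unused" evidence is the gap between these two sets.
unused = set(trace.retrieved) - set(trace.used_in_answer)
exported = asdict(trace)  # A plain dict: a trace you can export and share.
```

Most of the observability questions reduce to diffing these records across runs.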

A note on world context

Some questions need more than your uploaded artifacts. A good system should distinguish between:

  • Document-grounded answers backed by your content.
  • Blended answers that mix your content with world knowledge.

Scientia makes that distinction explicit so you do not treat "sounds plausible" as "supported by evidence."
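A crude way to make the distinction concrete is to measure how much of an answer's wording appears in the retrieved evidence. This word-overlap heuristic is a toy illustration (not Scientia's method, and the 0.6 threshold is an arbitrary assumption):

```python
def grounding_label(answer: str, evidence: list[str], threshold: float = 0.6) -> str:
    """Label an answer by the fraction of its words found in the evidence."""
    answer_words = set(answer.lower().split())
    evidence_words = set(" ".join(evidence).lower().split())
    if not answer_words:
        return "blended"
    overlap = len(answer_words & evidence_words) / len(answer_words)
    return "document-grounded" if overlap >= threshold else "blended"

evidence = ["refunds are processed within five business days"]
label_a = grounding_label("refunds are processed within five days", evidence)
label_b = grounding_label("refunds usually depend on your bank and card issuer", evidence)
```

Here `label_a` comes out "document-grounded" and `label_b` comes out "blended": the second answer sounds plausible but shares almost no wording with the evidence, which is exactly the case worth flagging.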

How to use Scientia without boiling the ocean

If you only remember one habit, make it this: change one thing at a time. A/B testing is less glamorous than "try five new settings," but it is how you build a system you can trust.

It is

  • A knowledge explorer for RAG workflows.
  • A way to compare retrieval strategies and parameters.
  • A place to build a repeatable evaluation habit.

It is not

  • A turnkey production RAG stack.
  • A promise that RAG becomes deterministic.
  • A replacement for good source material and good questions.

Scientia is built to make the "why did it answer that?" question easier to answer, especially in reviews where "it seems better" is not good enough.