Core Concepts
Understand the key concepts in Risicare observability.
This guide explains the core concepts you'll encounter when using Risicare.
Traces and Spans
Traces
A trace represents a complete execution flow through your agent. It starts when your agent receives input and ends when it produces output.
Trace: "Answer user question about weather"
├── Span: parse_input (2ms)
├── Span: llm_call to gpt-4o (1.2s)
├── Span: tool_call to weather_api (300ms)
└── Span: format_response (5ms)
Spans
A span represents a single unit of work within a trace. Spans have:
- Name: What operation this span represents
- Kind: The type of span (LLM_CALL, TOOL_CALL, AGENT, etc.)
- Timing: Start time and duration
- Attributes: Key-value metadata
- Events: Named events that occurred during the span
- Status: OK or ERROR
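The fields listed above can be pictured as a small record type. The following is a minimal sketch only: `SpanRecord`, `SpanKind`, `SpanStatus`, and every field name are hypothetical stand-ins chosen for illustration, not the classes Risicare ships.

```python
import time
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional


class SpanKind(Enum):
    LLM_CALL = "LLM_CALL"
    TOOL_CALL = "TOOL_CALL"
    AGENT = "AGENT"


class SpanStatus(Enum):
    OK = "OK"
    ERROR = "ERROR"


@dataclass
class SpanRecord:
    """Hypothetical stand-in for a span; not the Risicare class."""
    name: str                                   # what operation this span represents
    kind: SpanKind                              # the type of span
    start_time: float = field(default_factory=time.time)
    duration_s: Optional[float] = None          # set when the span ends
    attributes: dict[str, Any] = field(default_factory=dict)
    events: list[tuple[float, str]] = field(default_factory=list)
    status: SpanStatus = SpanStatus.OK
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    parent_id: Optional[str] = None             # None for a root span

    def add_event(self, name: str) -> None:
        # Record a named event with the time at which it occurred.
        self.events.append((time.time(), name))

    def end(self, error: bool = False) -> None:
        # Close the span: compute its duration and set the final status.
        self.duration_s = time.time() - self.start_time
        if error:
            self.status = SpanStatus.ERROR


span = SpanRecord(name="llm_call", kind=SpanKind.LLM_CALL,
                  attributes={"model": "gpt-4o"})
span.add_event("first_token")
span.end()
print(span.name, span.kind.value, span.status.value, len(span.events))
```

Keeping `parent_id` on the record is what lets a flat list of spans be reassembled into a tree later.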
Span Hierarchy
Spans form a tree structure with parent-child relationships:
with start_span("process_request") as parent:
    with start_span("call_llm") as child:
        # child.parent_id == parent.span_id
        pass
Agents
An agent is a logical component that makes decisions. In Risicare, agents are identified by:
- ID: Unique identifier (auto-generated or explicit)
- Name: Human-readable name (e.g., "planner", "researcher")
- Role: The agent's role (orchestrator, worker, reviewer)
- Type: The agent framework/pattern used
@agent(name="planner", role="orchestrator")
def plan_task(objective):
    # All spans inside this function are associated with this agent
    pass
Sessions
A session groups related traces from the same user interaction. Use sessions to:
- Track multi-turn conversations
- Group related agent executions
- Analyze user journeys
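To make the grouping idea concrete, here is a minimal sketch of how traces that carry a session ID could be collected into ordered user journeys. `TraceRecord` and `group_by_session` are hypothetical names used only for illustration; they are not part of the Risicare API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TraceRecord:
    """Hypothetical trace summary used only for this illustration."""
    trace_id: str
    session_id: str
    started_at: float  # seconds since some reference point
    user_input: str


def group_by_session(traces: list[TraceRecord]) -> dict[str, list[TraceRecord]]:
    """Group traces by session ID and order each group by start time."""
    sessions: dict[str, list[TraceRecord]] = defaultdict(list)
    for trace in traces:
        sessions[trace.session_id].append(trace)
    for session_traces in sessions.values():
        session_traces.sort(key=lambda t: t.started_at)
    return dict(sessions)


traces = [
    TraceRecord("t2", "user-123-session", 20.0, "Follow-up request"),
    TraceRecord("t1", "user-123-session", 10.0, "First request"),
    TraceRecord("t3", "user-456-session", 15.0, "Unrelated request"),
]
journeys = group_by_session(traces)
for session_id, session_traces in journeys.items():
    print(session_id, [t.user_input for t in session_traces])
```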
with session_context(session_id="user-123-session"):
    # All traces here belong to this session
    result1 = agent.run("First request")
    result2 = agent.run("Follow-up request")
Semantic Phases
Risicare tracks semantic phases to understand agent decision-making:
| Phase | Description | Example |
|---|---|---|
| THINK | Reasoning and planning | Analyzing the problem |
| DECIDE | Making a decision | Choosing which tool to use |
| ACT | Taking an action | Calling an API |
| OBSERVE | Reading state | Checking memory |
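One way to picture a phase decorator is as a thin wrapper that tags each call with one of the four phases in the table. This is a sketch under stated assumptions: `Phase`, `trace_phase`, and `PHASE_LOG` are hypothetical names, and a real implementation would presumably attach the phase to a span rather than append to a list.

```python
import functools
from enum import Enum


class Phase(Enum):
    THINK = "THINK"
    DECIDE = "DECIDE"
    ACT = "ACT"
    OBSERVE = "OBSERVE"


# Hypothetical in-memory log of (phase, function name) pairs.
PHASE_LOG: list[tuple[Phase, str]] = []


def trace_phase(phase: Phase):
    """Return a decorator that records the phase each time the function runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            PHASE_LOG.append((phase, func.__name__))
            return func(*args, **kwargs)
        return wrapper
    return decorator


@trace_phase(Phase.THINK)
def analyze_problem(context):
    return f"plan for {context}"


@trace_phase(Phase.OBSERVE)
def check_memory(key):
    return None


analyze_problem("weather question")
check_memory("last_city")
print([(phase.value, name) for phase, name in PHASE_LOG])
```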
@trace_think
def analyze_problem(context):
    """This is a THINK phase - reasoning about the problem"""
    pass

@trace_decide
def choose_tool(options):
    """This is a DECIDE phase - selecting an action"""
    pass

@trace_act
def execute_tool(tool, args):
    """This is an ACT phase - performing the action"""
    pass
Context Propagation
Risicare automatically propagates context through your code:
- Thread-safe: Uses Python contextvars for thread isolation
- Async-safe: Works correctly with asyncio
- Cross-process: Supports W3C Trace Context for distributed tracing
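The sketch below illustrates the two standard mechanisms named above, using only the Python standard library. It shows that each asyncio task sees its own value of a context variable, and how a W3C Trace Context `traceparent` header value is laid out (version, 32-hex trace ID, 16-hex parent span ID, 2-hex flags). The names `current_trace_id`, `to_traceparent`, and `from_traceparent` are hypothetical, and the parser is simplified (it accepts version 00 only and does not reject all-zero IDs); it is not how Risicare is implemented.

```python
import asyncio
import contextvars
import re
import secrets

# Hypothetical context variable holding the active trace ID.
current_trace_id: contextvars.ContextVar[str] = contextvars.ContextVar(
    "current_trace_id", default=""
)


async def handle_request() -> tuple[str, str]:
    # Each asyncio task runs in its own copy of the context,
    # so this set() is invisible to sibling tasks.
    trace_id = secrets.token_hex(16)
    current_trace_id.set(trace_id)
    await asyncio.sleep(0)  # yield so the two tasks interleave
    return trace_id, current_trace_id.get()


def to_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Format a version-00 W3C `traceparent` header value."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def from_traceparent(header: str) -> tuple[str, str, bool]:
    """Parse a version-00 `traceparent` value into (trace_id, span_id, sampled)."""
    match = TRACEPARENT_RE.match(header)
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    trace_id, span_id, flags = match.groups()
    return trace_id, span_id, bool(int(flags, 16) & 1)


async def main() -> list[tuple[str, str]]:
    return await asyncio.gather(handle_request(), handle_request())


results = asyncio.run(main())
header = to_traceparent(results[0][0], secrets.token_hex(8))
print(header)
```

Passing the header string to another process and parsing it there is what allows spans created remotely to join the same trace.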
Automatic Propagation
@agent(name="parent")
async def parent_agent():
    # Context automatically propagates to child calls
    await child_agent()  # Inherits trace context

@agent(name="child")
async def child_agent():
    # This agent's spans are children of parent_agent's span
    pass
Manual Context
# Extract context for passing to another system
context = get_trace_context()

# Restore context in another thread/process
with restore_trace_context(context):
    # Spans created here continue the trace
    pass
Error Taxonomy
When errors occur, Risicare classifies them using a 10-module taxonomy:
| Module | What It Covers |
|---|---|
| PERCEPTION | Input parsing, validation |
| REASONING | Logic errors, hallucinations |
| TOOL | Tool execution failures |
| MEMORY | State management issues |
| OUTPUT | Response formatting |
| COORDINATION | Workflow problems |
| COMMUNICATION | Inter-agent messages |
| ORCHESTRATION | Agent lifecycle |
| CONSENSUS | Multi-agent agreement |
| RESOURCES | Resource contention |
Each module contains categories, and each category contains specific error codes:
TOOL.EXECUTION.TIMEOUT
│    │         └── Specific error code
│    └── Category (EXECUTION)
└── Module (TOOL)
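The three-part structure lends itself to simple parsing. The following is a minimal sketch: the `Module` enum mirrors the ten modules in the table above, while `ErrorCode` and `parse_error_code` are hypothetical helpers introduced for illustration. Category and code names are kept as plain strings because the full list is not given here.

```python
from enum import Enum
from typing import NamedTuple

# The ten taxonomy modules listed in the table above.
Module = Enum(
    "Module",
    "PERCEPTION REASONING TOOL MEMORY OUTPUT COORDINATION "
    "COMMUNICATION ORCHESTRATION CONSENSUS RESOURCES",
)


class ErrorCode(NamedTuple):
    module: Module
    category: str
    code: str


def parse_error_code(value: str) -> ErrorCode:
    """Split 'MODULE.CATEGORY.CODE' into its parts, validating the module."""
    parts = value.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected MODULE.CATEGORY.CODE, got {value!r}")
    module_name, category, code = parts
    try:
        module = Module[module_name]
    except KeyError:
        raise ValueError(f"unknown module: {module_name!r}") from None
    return ErrorCode(module, category, code)


parsed = parse_error_code("TOOL.EXECUTION.TIMEOUT")
print(parsed.module.name, parsed.category, parsed.code)
```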
The Self-Healing Pipeline
When an error is detected, Risicare runs a 4-stage diagnosis pipeline:
Error Detected
      ↓
1. Context Extraction
   Extract relevant spans, messages, and state
      ↓
2. Taxonomy Classification
   Classify using gpt-4o-mini (fast)
      ↓
3. Root Cause Analysis
   Deep analysis using gpt-4o (thorough)
      ↓
4. Fix Suggestion
   Generate fix configurations
Fixes are then validated through hypothesis testing and deployed via A/B testing.
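Structurally, the four diagnosis stages form a chain in which each stage enriches a shared record. The sketch below shows only that shape. Every name in it (`Diagnosis`, the stage functions, `run_pipeline`) is hypothetical, the model calls are replaced by trivial stubs, and the `"UNCLASSIFIED"` value and fix dictionary are placeholders rather than real Risicare output.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Diagnosis:
    """Hypothetical record passed from stage to stage."""
    error_message: str
    context: dict = field(default_factory=dict)
    error_code: str = ""
    root_cause: str = ""
    suggested_fix: dict = field(default_factory=dict)


def extract_context(d: Diagnosis) -> Diagnosis:
    # Stage 1: gather relevant spans, messages, and state (stubbed).
    d.context = {"spans": [], "messages": [d.error_message], "state": {}}
    return d


def classify(d: Diagnosis) -> Diagnosis:
    # Stage 2: stand-in for the fast model call; a keyword check, not real logic.
    timed_out = "timed out" in d.error_message
    d.error_code = "TOOL.EXECUTION.TIMEOUT" if timed_out else "UNCLASSIFIED"
    return d


def analyze_root_cause(d: Diagnosis) -> Diagnosis:
    # Stage 3: stand-in for the thorough model call.
    d.root_cause = f"placeholder analysis of {d.error_code}"
    return d


def suggest_fix(d: Diagnosis) -> Diagnosis:
    # Stage 4: produce a fix configuration (placeholder content).
    d.suggested_fix = {"target": d.error_code, "action": "placeholder"}
    return d


PIPELINE: list[Callable[[Diagnosis], Diagnosis]] = [
    extract_context, classify, analyze_root_cause, suggest_fix,
]


def run_pipeline(error_message: str) -> Diagnosis:
    diagnosis = Diagnosis(error_message)
    for stage in PIPELINE:
        diagnosis = stage(diagnosis)
    return diagnosis


result = run_pipeline("weather_api timed out")
print(result.error_code, result.suggested_fix)
```

Running the cheap classification before the expensive analysis means the deeper stage can be given the error code as input.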