Core Concepts

Understand the key concepts in Risicare observability.

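This guide explains the core concepts you'll encounter when using Risicare: traces and spans, agents, sessions, semantic phases, context propagation, the error taxonomy, and the self-healing pipeline.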

Traces and Spans

Traces

A trace represents a complete execution flow through your agent. It starts when your agent receives input and ends when it produces output.

Trace: "Answer user question about weather"
├── Span: parse_input (2ms)
├── Span: llm_call to gpt-4o (1.2s)
├── Span: tool_call to weather_api (300ms)
└── Span: format_response (5ms)

Spans

A span represents a single unit of work within a trace. Spans have:

Name: What operation this span represents
Kind: The type of span (LLM_CALL, TOOL_CALL, AGENT, etc.)
Timing: Start time and duration
Attributes: Key-value metadata
Events: Named events that occurred during the span
Status: OK or ERROR
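
To make the fields above concrete, here is a minimal sketch of a span as a plain data record. SketchSpan and its method names are hypothetical and chosen for illustration; they are not Risicare's actual classes, which may store these fields differently.

```python
from __future__ import annotations

import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SketchSpan:
    """Hypothetical span record (illustration only, not the Risicare class)."""

    name: str                      # what operation this span represents
    kind: str                      # e.g. "LLM_CALL", "TOOL_CALL", "AGENT"
    trace_id: str                  # the trace this span belongs to
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    parent_id: Optional[str] = None
    start_time: float = field(default_factory=time.time)
    duration_s: Optional[float] = None
    attributes: dict[str, Any] = field(default_factory=dict)
    events: list[tuple[float, str]] = field(default_factory=list)
    status: str = "OK"

    def add_event(self, name: str) -> None:
        """Record a named event with the time it occurred."""
        self.events.append((time.time(), name))

    def finish(self, error: Optional[BaseException] = None) -> None:
        """Close the span, marking it ERROR if an exception is supplied."""
        self.duration_s = time.time() - self.start_time
        if error is not None:
            self.status = "ERROR"
            self.attributes["error.message"] = str(error)


span = SketchSpan(name="llm_call", kind="LLM_CALL", trace_id=uuid.uuid4().hex)
span.attributes["model"] = "gpt-4o"
span.add_event("first_token_received")
span.finish()
```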

Span Hierarchy

Spans form a tree structure with parent-child relationships:

with start_span("process_request") as parent:
    with start_span("call_llm") as child:
        # child.parent_id == parent.span_id
        pass

Agents

An agent is a logical component that makes decisions. In Risicare, agents are identified by:

ID: Unique identifier (auto-generated or explicit)
Name: Human-readable name (e.g., "planner", "researcher")
Role: The agent's role (orchestrator, worker, reviewer)
Type: The agent framework/pattern used

@agent(name="planner", role="orchestrator")
def plan_task(objective):
    # All spans inside this function are associated with this agent
    pass

Sessions

A session groups related traces from the same user interaction. Use sessions to:

Track multi-turn conversations
Group related agent executions
Analyze user journeys

with session_context(session_id="user-123-session"):
    # All traces here belong to this session
    result1 = agent.run("First request")
    result2 = agent.run("Follow-up request")
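
The grouping behaviour can be pictured as an index from session ID to the traces created while that session was active. The following is a minimal sketch under that assumption; sketch_session_context, run_agent, and traces_by_session are hypothetical stand-ins, not Risicare functions.

```python
import contextvars
import uuid
from collections import defaultdict
from contextlib import contextmanager
from typing import Iterator, Optional

_session_id: contextvars.ContextVar = contextvars.ContextVar(
    "session_id", default=None
)
# session ID -> list of (trace_id, prompt) created under that session
traces_by_session: defaultdict = defaultdict(list)


@contextmanager
def sketch_session_context(session_id: str) -> Iterator[None]:
    """Make `session_id` the active session for the enclosed block."""
    token = _session_id.set(session_id)
    try:
        yield
    finally:
        _session_id.reset(token)


def run_agent(prompt: str) -> str:
    """Stand-in for an agent run: each call starts a new trace."""
    trace_id = uuid.uuid4().hex
    session: Optional[str] = _session_id.get()
    traces_by_session[session].append((trace_id, prompt))
    return trace_id


with sketch_session_context("user-123-session"):
    first = run_agent("First request")
    second = run_agent("Follow-up request")

unscoped = run_agent("Request outside any session")
```

Each call produces its own trace ID, while the shared session ID is what lets the two turns be analyzed together.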

Semantic Phases

Risicare tracks semantic phases to help you understand how an agent makes decisions:

Phase	Description	Example
THINK	Reasoning and planning	Analyzing the problem
DECIDE	Making a decision	Choosing which tool to use
ACT	Taking an action	Calling an API
OBSERVE	Reading state	Checking memory

@trace_think
def analyze_problem(context):
    """This is a THINK phase - reasoning about the problem"""
    pass
 
@trace_decide
def choose_tool(options):
    """This is a DECIDE phase - selecting an action"""
    pass
 
@trace_act
def execute_tool(tool, args):
    """This is an ACT phase - performing the action"""
    pass
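
To show what phase tagging buys you, the sketch below records the phase of each call and reads the result back as a timeline. Phase, sketch_trace_phase, and phase_log are hypothetical names; the decorators above (trace_think, trace_decide, trace_act) are the ones the SDK documents, and this example only imitates their effect.

```python
import functools
from enum import Enum
from typing import Callable


class Phase(Enum):
    THINK = "THINK"
    DECIDE = "DECIDE"
    ACT = "ACT"
    OBSERVE = "OBSERVE"


# Ordered record of (phase, function name) for every decorated call.
phase_log: list[tuple[Phase, str]] = []


def sketch_trace_phase(phase: Phase) -> Callable:
    """Hypothetical decorator factory: log the phase, then run the function."""

    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            phase_log.append((phase, fn.__name__))
            return fn(*args, **kwargs)

        return wrapper

    return decorator


@sketch_trace_phase(Phase.THINK)
def analyze_problem(context: str) -> str:
    return f"user wants {context}"


@sketch_trace_phase(Phase.DECIDE)
def choose_tool(options: list[str]) -> str:
    return options[0]


@sketch_trace_phase(Phase.ACT)
def execute_tool(tool: str, args: str) -> str:
    return f"{tool}({args})"


analyze_problem("weather in Paris")
tool = choose_tool(["weather_api", "search"])
result = execute_tool(tool, "Paris")
timeline = [phase.value for phase, _ in phase_log]
```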

Progressive Integration

Risicare supports incremental adoption. Start with zero-code auto-instrumentation and add richer observability as needed:

[Figure: Tier progression from Tier 0 (zero code) through Tier 5 (full platform), showing increasing depth of observability.]

Context Propagation

Risicare automatically propagates context through your code:

Thread-safe: Uses Python contextvars for thread isolation
Async-safe: Works correctly with asyncio
Cross-process: Supports W3C Trace Context for distributed tracing

Automatic Propagation

@agent(name="parent")
async def parent_agent():
    # Context automatically propagates to child calls
    await child_agent()  # Inherits trace context
 
@agent(name="child")
async def child_agent():
    # This agent's spans are children of parent_agent's span
    pass

Manual Context

# Extract context for passing to another system
context = get_trace_context()
 
# Restore context in another thread/process
with restore_trace_context(context):
    # Spans created here continue the trace
    pass
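
For cross-process propagation, W3C Trace Context carries the trace in a traceparent header of the form version-traceid-parentid-flags. The sketch below formats and parses version 00 of that header. The TraceParent type and both function names are made up for this example, and how Risicare serializes its own context object is not shown in this guide.

```python
import re
from typing import NamedTuple, Optional

# Version 00 layout: 2 hex, 32 hex trace ID, 16 hex parent ID, 2 hex flags.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def format_traceparent(tp: TraceParent) -> str:
    """Serialize to a version-00 traceparent header value."""
    flags = "01" if tp.sampled else "00"
    return f"00-{tp.trace_id}-{tp.parent_id}-{flags}"


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the specification.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return TraceParent(trace_id, parent_id, bool(int(flags, 16) & 0x01))


header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
parsed = parse_traceparent(header)
round_trip = format_traceparent(parsed) if parsed else None
```

This sketch is strict about the version 00 layout and rejects anything else, which is simpler than the forward-compatibility rules a production parser would follow.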

Error Taxonomy

When errors occur, Risicare classifies them using a 10-module taxonomy:

Module	What It Covers
PERCEPTION	Input parsing, validation
REASONING	Logic errors, hallucinations
TOOL	Tool execution failures
MEMORY	State management issues
OUTPUT	Response formatting
COORDINATION	Workflow problems
COMMUNICATION	Inter-agent messages
ORCHESTRATION	Agent lifecycle
CONSENSUS	Multi-agent agreement
RESOURCES	Resource contention

Each module contains categories, and each category contains specific error codes:

TOOL.EXECUTION.TIMEOUT
 │      │        └── Specific error code
 │      └── Category (EXECUTION)
 └── Module (TOOL)
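
If you need to group or filter errors in your own tooling, a dotted code like the one above can be split into its three levels. This is a minimal sketch that assumes every code has exactly three dot-separated segments; ErrorCode and parse_error_code are hypothetical helpers, not SDK functions.

```python
from typing import NamedTuple

# The ten modules listed in the table above.
MODULES = frozenset({
    "PERCEPTION", "REASONING", "TOOL", "MEMORY", "OUTPUT",
    "COORDINATION", "COMMUNICATION", "ORCHESTRATION", "CONSENSUS", "RESOURCES",
})


class ErrorCode(NamedTuple):
    module: str
    category: str
    code: str


def parse_error_code(value: str) -> ErrorCode:
    """Split MODULE.CATEGORY.CODE and check the module is a known one."""
    parts = value.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected MODULE.CATEGORY.CODE, got {value!r}")
    module, category, code = parts
    if module not in MODULES:
        raise ValueError(f"unknown taxonomy module: {module!r}")
    return ErrorCode(module, category, code)


parsed_code = parse_error_code("TOOL.EXECUTION.TIMEOUT")
```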

The Self-Healing Pipeline

When an error is detected, Risicare runs a 4-stage diagnosis pipeline:

Error Detected
     ↓
1. Context Extraction
   Extract relevant spans, messages, and state
     ↓
2. Taxonomy Classification
   Classify using gpt-4o-mini (fast)
     ↓
3. Root Cause Analysis
   Deep analysis using gpt-4o (thorough)
     ↓
4. Fix Suggestion
   Generate fix configurations

Fixes are then validated through hypothesis testing and deployed via A/B testing.
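
The staged structure can be pictured as a list of functions that each enrich a shared diagnosis record. The sketch below shows only that shape. Every name in it is hypothetical, the classification and analysis steps are placeholder rules rather than model calls, and the values it produces are dummy data, not real Risicare output.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Diagnosis:
    """Hypothetical record passed from stage to stage."""

    error_message: str
    context: dict[str, Any] = field(default_factory=dict)
    error_code: str = ""
    root_cause: str = ""
    fix: dict[str, Any] = field(default_factory=dict)
    completed: list[str] = field(default_factory=list)


def extract_context(d: Diagnosis) -> Diagnosis:
    # Placeholder: a real stage would gather spans, messages, and state.
    d.context = {"failing_span": "tool_call", "message": d.error_message}
    return d


def classify(d: Diagnosis) -> Diagnosis:
    # Placeholder rule standing in for the fast classification model.
    timed_out = "timed out" in d.error_message
    d.error_code = "TOOL.EXECUTION.TIMEOUT" if timed_out else "UNCLASSIFIED"
    return d


def analyze_root_cause(d: Diagnosis) -> Diagnosis:
    # Placeholder standing in for the deeper analysis model.
    d.root_cause = f"placeholder analysis of {d.error_code}"
    return d


def suggest_fix(d: Diagnosis) -> Diagnosis:
    # Placeholder fix configuration (dummy values).
    d.fix = {"target": d.context["failing_span"], "action": "placeholder"}
    return d


STAGES: list[Callable[[Diagnosis], Diagnosis]] = [
    extract_context, classify, analyze_root_cause, suggest_fix,
]


def run_pipeline(error_message: str) -> Diagnosis:
    """Run the four stages in order, noting each one as it completes."""
    d = Diagnosis(error_message)
    for stage in STAGES:
        d = stage(d)
        d.completed.append(stage.__name__)
    return d


diagnosis = run_pipeline("weather_api timed out after 30s")
```

Ordering matters here: each stage reads what the previous one wrote, which is why classification runs before root cause analysis.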

Next Steps

Decorators Reference: All SDK decorators explained
Error Taxonomy: Full taxonomy reference
Self-Healing Overview: How automatic fixes work
Multi-Agent Observability: Track agent interactions
