deeprecall
| Entity Passport | |
|---|---|
| Registry ID | gh-model--kothapavan1998--deeprecall |
| License | MIT |
| Provider | github |
Cite this model
Academic & Research Attribution
@misc{gh_model__kothapavan1998__deeprecall,
author = {kothapavan1998},
title = {deeprecall Model},
year = {2026},
howpublished = {\url{https://github.com/kothapavan1998/deeprecall}},
note = {Accessed via Free2AITools Knowledge Fortress}
}
Quick Commands
git clone https://github.com/kothapavan1998/deeprecall
Nexus Index V2.0
FNI V2.0 for deeprecall: Semantic (S:50), Authority (A:0), Popularity (P:41), Recency (R:89), Quality (Q:50).
Technical Deep Dive
DeepRecall
Recursive reasoning over your data. Plug into any vector DB or LLM framework.
Standard RAG retrieves documents once and stuffs them into a prompt. DeepRecall uses MIT's Recursive Language Models to let your LLM search, reason, search again, and repeat -- until it actually has enough information to answer properly.
The LLM gets a search_db() function injected into a sandboxed Python REPL. It decides what to search for, analyzes results with code, refines its queries based on what it found, and synthesizes a final answer. This is not a fixed pipeline -- the LLM drives the retrieval strategy.
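For intuition, here is a minimal sketch of the kind of code the LLM might write inside that REPL. This is illustrative only: the search_db() and FINAL() names come from the docs below, but the hit structure (a "text" field per result) is an assumption.

# Illustrative only -- the LLM generates code like this itself at query time.
hits = search_db("2008 financial crisis causes", top_k=5)
relevant = [h for h in hits if "subprime" in h["text"].lower()]  # analyze results with code (hit schema assumed)
more = search_db("subprime mortgage securitization", top_k=5)    # refine the query based on findings
FINAL("The crisis was driven by ...")                            # synthesize and return the answer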
Install
pip install deeprecall[chroma] # ChromaDB (local, zero-config)
pip install deeprecall[milvus] # Milvus
pip install deeprecall[qdrant] # Qdrant
pip install deeprecall[pinecone] # Pinecone
pip install deeprecall[faiss] # FAISS (local, ML-native)
pip install deeprecall[server] # API server (FastAPI + uvicorn)
pip install deeprecall[rich] # Rich console output (verbose mode)
pip install deeprecall[redis] # Redis distributed cache
pip install deeprecall[otel] # OpenTelemetry tracing
pip install deeprecall[langchain] # LangChain adapter
pip install deeprecall[llamaindex] # LlamaIndex adapter
pip install deeprecall[rerank-cohere] # Cohere reranker
pip install deeprecall[rerank-cross-encoder] # Cross-encoder reranker
pip install deeprecall[all] # Everything
Note: DeepRecall depends on rlms, which transitively installs its own dependencies (OpenAI SDK, etc.). If you see dependency conflicts, check pip show rlms for the transitive tree.
Quick Start
from deeprecall import DeepRecall
from deeprecall.vectorstores import ChromaStore
store = ChromaStore(collection_name="my_docs")
store.add_documents(["doc 1 text...", "doc 2 text...", "doc 3 text..."])
# Context manager ensures cleanup (search server, connections)
with DeepRecall(
vectorstore=store,
backend="openai",
backend_kwargs={"model_name": "gpt-4o-mini", "api_key": "sk-..."},
) as engine:
result = engine.query("What are the key themes across these documents?")
print(result.answer)
print(f"Sources: {len(result.sources)}")
print(f"Steps: {len(result.reasoning_trace)}")
print(f"Time: {result.execution_time:.1f}s")
Tip: Always use a with block or call engine.close() when done to release background resources. Vector stores with persistent connections (Milvus, Qdrant) also support with store: for automatic cleanup.
What's New in v0.4
RLM v0.1.1a Support
DeepRecall now requires rlms>=0.1.1,<0.2.0, unlocking depth>1 recursive subcalls, context compaction, cost tracking, and scaffold protection from the upstream RLM library. RLM-level limit exceptions (TimeoutExceededError, TokenLimitExceededError, ErrorThresholdExceededError) are now caught gracefully with partial answer recovery.
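As an illustrative sketch, assuming the max_timeout config documented below: when a limit trips mid-reasoning, query() returns rather than raising, and result.answer holding the recovered partial answer is our reading of "partial answer recovery", not a documented guarantee.

from deeprecall import DeepRecall, DeepRecallConfig

config = DeepRecallConfig(
    backend="openai",
    backend_kwargs={"model_name": "gpt-4o-mini"},
    max_timeout=10.0,  # deliberately tight wall-clock limit
)
with DeepRecall(vectorstore=store, config=config) as engine:
    result = engine.query("long multi-hop question")
    print(result.answer)  # may hold a partial answer if the timeout fired (assumption)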
Cost Tracking
Real USD cost is now extracted automatically when using OpenRouter. Every result includes cost data in result.usage.total_cost_usd and per-model breakdown.
result = engine.query("question")
print(f"Cost: ${result.usage.total_cost_usd}") # e.g. $0.0045
print(result.usage.model_breakdown) # per-model cost_usd
print(f"Budget spent: ${result.budget_status['cost_usd']}")
Cost Budget Enforcement
max_cost_usd is now actively enforced -- both at the RLM level (stops the reasoning loop) and at the tracer level. Previously this was reserved for future use.
from deeprecall import QueryBudget

result = engine.query(
    "Complex question?",
    budget=QueryBudget(max_cost_usd=0.10),  # Hard USD cap
)
Context Compaction
For queries that require many reasoning steps, enable compaction to avoid hitting the model's context window limit. When enabled, RLM automatically summarises the conversation history when token usage nears the threshold.
from deeprecall import DeepRecallConfig

config = DeepRecallConfig(
    backend="openai",
    backend_kwargs={"model_name": "gpt-4o-mini"},
    compaction=True,                # Enable context summarisation
    compaction_threshold_pct=0.85,  # Trigger at 85% of context window
)
Execution Limits
New max_timeout, max_errors, and max_tokens config params give fine-grained control over RLM execution.
from deeprecall import DeepRecallConfig

config = DeepRecallConfig(
    backend="openai",
    backend_kwargs={"model_name": "gpt-4o-mini"},
    max_timeout=120.0,  # Kill after 2 minutes wall-clock
    max_errors=5,       # Abort after 5 consecutive REPL errors
    max_tokens=50000,   # Total token limit (input + output)
)
Iteration Lifecycle Callbacks
New on_iteration_start and on_iteration_complete hooks fire before/after each RLM reasoning iteration, giving more granular observability than the existing on_reasoning_step.
from deeprecall.core.callbacks import BaseCallback
class MyCallback(BaseCallback):
def on_iteration_start(self, iteration):
print(f"Starting iteration {iteration}...")
def on_iteration_complete(self, iteration, has_final_answer):
if has_final_answer:
print(f"Got final answer at iteration {iteration}")
Metadata Filters & Context Injection
query() now accepts filters (metadata dict applied to every search) and context_prefix (text prepended to the prompt). Useful for per-rule or per-section compliance queries.
result = engine.query(
"Does this deal comply with anti-money laundering rules?",
filters={"section": "4.2"},
context_prefix="Policy Section 4.2: Anti-Money Laundering Requirements",
)
Async Embedding Functions
Vector stores now accept async embedding_fn callables. Async functions are auto-detected and wrapped, so you no longer need a manual thread pool executor workaround.
from deeprecall.vectorstores import MilvusStore

async def my_async_embed(texts: list[str]) -> list[list[float]]:
    return await embedding_service.embed(texts)

store = MilvusStore(collection_name="docs", embedding_fn=my_async_embed)
What's New in v0.3
Exception Handling
All DeepRecall errors inherit from DeepRecallError -- catch at the boundary for production use.
from deeprecall import DeepRecall, DeepRecallError, LLMProviderError, VectorStoreError
try:
result = engine.query("question")
except LLMProviderError:
# LLM call failed (timeout, rate limit, etc.)
...
except VectorStoreError:
# Vector DB unreachable or query failed
...
except DeepRecallError:
# Catch-all for any DeepRecall error
...
Retry with Exponential Backoff
Automatic retries for transient LLM and vector store failures.
from deeprecall import DeepRecall, DeepRecallConfig, RetryConfig
config = DeepRecallConfig(
backend="openai",
backend_kwargs={"model_name": "gpt-4o-mini"},
retry=RetryConfig(max_retries=3, base_delay=1.0, jitter=True),
)
engine = DeepRecall(vectorstore=store, config=config)
Batch Queries
Run multiple queries concurrently with a thread pool.
results = engine.query_batch(
["Question 1?", "Question 2?", "Question 3?"],
max_concurrency=4,
)
for r in results:
print(r.answer[:100])
FAISS Vector Store
A local, in-process vector index widely used in ML workflows.
from deeprecall.vectorstores import FAISSStore
store = FAISSStore(dimension=384, embedding_fn=my_embed_fn)
store.add_documents(["Hello world", "Foo bar"])
results = store.search("greeting")
# Persistence
store.save("./my_index")
store = FAISSStore.load("./my_index", embedding_fn=my_embed_fn)
Budget Guardrails
Control exactly how much a query can spend -- tokens, time, searches, or dollars.
from deeprecall import DeepRecall, QueryBudget
engine = DeepRecall(vectorstore=store, backend="openai",
backend_kwargs={"model_name": "gpt-4o-mini"})
result = engine.query(
"Complex multi-hop question?",
budget=QueryBudget(
max_search_calls=10, # Stop after 10 vector DB searches
max_tokens=50000, # Total token budget
max_time_seconds=30.0, # Wall-clock timeout
),
)
# Check what was used
print(result.budget_status) # {"iterations_used": 5, "search_calls_used": 8, ...}
Reasoning Trace
Full visibility into what the LLM did at every step -- code executed, outputs, searches made.
result = engine.query("What caused the 2008 financial crisis?")
for step in result.reasoning_trace:
print(f"Step {step.iteration}: {step.action}")
if step.searches:
print(f" Searched: {[s['query'] for s in step.searches]}")
if step.code:
print(f" Code: {step.code[:100]}...")
Callbacks
Hook into the reasoning pipeline for monitoring, logging, or custom integrations.
from deeprecall import DeepRecall, DeepRecallConfig, ConsoleCallback, JSONLCallback
config = DeepRecallConfig(
backend="openai",
backend_kwargs={"model_name": "gpt-4o-mini"},
callbacks=[
ConsoleCallback(), # Live step-by-step output
JSONLCallback(log_dir="./logs"), # Structured logging
],
)
engine = DeepRecall(vectorstore=store, config=config)
OpenTelemetry Tracing
Emit distributed traces to Jaeger, Datadog, Grafana Tempo, Honeycomb, or any OTLP backend.
from deeprecall import DeepRecall, DeepRecallConfig, OpenTelemetryCallback
otel = OpenTelemetryCallback(
service_name="my-rag-service",
# endpoint="https://otlp.datadoghq.com:4317", # Datadog
# headers={"DD-API-KEY": "your-key"},
)
config = DeepRecallConfig(
backend="openai",
backend_kwargs={"model_name": "gpt-4o-mini"},
callbacks=[otel],
)
# Every query() call emits a trace with child spans for each reasoning step and search
Caching (In-Memory, Disk, Redis)
Avoid redundant LLM and vector DB calls. Three backends: in-memory (dev), SQLite (single-machine), Redis (distributed/production).
from deeprecall import DeepRecall, DeepRecallConfig, InMemoryCache, DiskCache, RedisCache
# In-memory (fastest, ephemeral -- good for dev)
config = DeepRecallConfig(
backend="openai",
backend_kwargs={"model_name": "gpt-4o-mini"},
cache=InMemoryCache(max_size=500, default_ttl=3600),
)
# Disk / SQLite (persists across restarts, single machine)
config = DeepRecallConfig(
backend="openai",
backend_kwargs={"model_name": "gpt-4o-mini"},
cache=DiskCache(db_path="./deeprecall_cache.db"),
)
# Redis (distributed, production -- works with AWS ElastiCache, GCP Memorystore, etc.)
config = DeepRecallConfig(
backend="openai",
backend_kwargs={"model_name": "gpt-4o-mini"},
cache=RedisCache(url="redis://localhost:6379/0"),
)
engine = DeepRecall(vectorstore=store, config=config)
# Second identical query hits cache -- zero LLM cost
Reranking
Improve search quality with Cohere or cross-encoder rerankers.
from deeprecall import DeepRecallConfig
from deeprecall.core import CohereReranker  # or: CrossEncoderReranker
config = DeepRecallConfig(
backend="openai",
backend_kwargs={"model_name": "gpt-4o-mini"},
reranker=CohereReranker(api_key="co-..."),
)
Async Support & Thread Safety
DeepRecall is designed for high-concurrency production use. Every blocking operation (LLM calls, vector DB searches, cache I/O, file writes) is offloaded from the async event loop via asyncio.to_thread(). All shared state is protected with proper synchronization.
from deeprecall import AsyncDeepRecall
engine = AsyncDeepRecall(vectorstore=store, backend="openai",
backend_kwargs={"model_name": "gpt-4o-mini"})
# Non-blocking -- multiple queries can run concurrently
result = await engine.query("question")
await engine.add_documents(["new doc..."])
# Async batch queries
results = await engine.query_batch(["q1?", "q2?"], max_concurrency=4)
Server Auth & Rate Limiting
deeprecall serve --api-keys "key1,key2" --rate-limit 60 --port 8000
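A hedged client sketch, assuming the served API is OpenAI-compatible (as the Framework Adapters section suggests) and that the configured API keys are accepted as bearer tokens; the /v1 route and the model name are assumptions, not documented values.

from openai import OpenAI

# Assumptions: OpenAI-compatible route at /v1, API key sent as a bearer token.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="key1")
response = client.chat.completions.create(
    model="deeprecall",  # hypothetical model name
    messages=[{"role": "user", "content": "What are the key themes?"}],
)
print(response.choices[0].message.content)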
How It Works
- A lightweight HTTP server wraps your vector store on a random port
- A search_db(query, top_k, filters) function is injected into the RLM's sandboxed REPL
- The LLM enters a recursive loop -- it can search, write Python, call sub-LLMs, and search again
- When it has enough info, it returns a FINAL() answer
- You get back the answer, sources, full reasoning trace, budget usage, and confidence score
Vector Stores
| Store | Install | Needs embedding_fn? |
|---|---|---|
| ChromaDB | deeprecall[chroma] | No (built-in) |
| Milvus | deeprecall[milvus] | Yes |
| Qdrant | deeprecall[qdrant] | Yes |
| Pinecone | deeprecall[pinecone] | Yes |
| FAISS | deeprecall[faiss] | Yes |
All stores implement the same interface: add_documents(), search(), delete(), count(), close().
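A minimal sketch of that shared interface. The return value of add_documents() and the delete-by-ID signature are assumptions, mirroring the CLI's deeprecall delete command.

store = ChromaStore(collection_name="my_docs")
ids = store.add_documents(["alpha", "beta"])  # assumed to return document IDs
print(store.count())                          # 2
results = store.search("alpha")
store.delete(ids[0])                          # assumed delete-by-ID, as in the CLI
store.close()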
All stores support context managers for automatic cleanup:
with ChromaStore(collection_name="my_docs") as store:
store.add_documents(["Hello world"])
results = store.search("greeting")
# connections released automatically
Custom Embedding Functions
Stores that require embedding_fn expect a callable with this signature. Both sync and async functions are supported -- async functions are auto-detected and wrapped.
def my_embed_fn(texts: list[str]) -> list[list[float]]:
"""Takes a list of strings, returns a list of embedding vectors."""
# Example using OpenAI:
from openai import OpenAI
client = OpenAI()
response = client.embeddings.create(input=texts, model="text-embedding-3-small")
return [e.embedding for e in response.data]
store = MilvusStore(collection_name="docs", embedding_fn=my_embed_fn)
# Async embedding functions also work directly (v0.4+):
async def my_async_embed(texts: list[str]) -> list[list[float]]:
return await embedding_service.embed(texts)
store = MilvusStore(collection_name="docs", embedding_fn=my_async_embed)
Framework Adapters
LangChain / LlamaIndex / OpenAI-compatible API -- see adapters docs.
deeprecall serve --vectorstore chroma --collection my_docs --port 8000
CLI
deeprecall init # Generate starter config
deeprecall ingest --path ./docs/ # Ingest documents
deeprecall query "question" --max-searches 10 --max-time 30
deeprecall serve --port 8000 --api-keys "key1,key2"
deeprecall delete doc_id_1 doc_id_2 # Delete documents
deeprecall status # Show version, installed extras
deeprecall benchmark --queries q.json # Run benchmark
The CLI automatically loads environment variables from a .env file via python-dotenv, so you can set OPENAI_API_KEY, ANTHROPIC_API_KEY, etc. without exporting them in your shell.
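For example, a minimal .env file (illustrative; variable names are the ones listed above):

# .env -- picked up automatically by the CLI
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=...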
Project Structure
deeprecall/
├── core/                  # Engine, config, guardrails, tracer, cache, callbacks, reranker
│   ├── exceptions.py      # DeepRecallError hierarchy
│   ├── retry.py           # Exponential backoff with jitter
│   ├── deprecations.py    # @deprecated decorator
│   ├── logging_config.py  # configure_logging() helper
│   ├── cache.py           # InMemoryCache, DiskCache (SQLite)
│   ├── cache_redis.py     # RedisCache (distributed)
│   ├── callbacks.py       # ConsoleCallback, JSONLCallback, UsageTrackingCallback, ProgressCallback
│   ├── callback_otel.py   # OpenTelemetry distributed tracing
│   ├── async_engine.py    # AsyncDeepRecall (non-blocking wrapper)
│   └── ...
├── vectorstores/          # ChromaDB, Milvus, Qdrant, Pinecone, FAISS adapters
├── adapters/              # LangChain, LlamaIndex, OpenAI-compatible server
├── middleware/            # API key auth (sync + async), rate limiting (thread-safe)
├── prompts/               # System prompts for the RLM
└── cli.py                 # CLI entry point
tests/
├── test_exceptions.py     # Exception hierarchy tests
├── test_retry.py          # Retry logic tests
├── test_batch.py          # Batch query tests
├── test_deprecations.py   # Deprecation utility tests
├── test_concurrency.py    # Thread safety & race condition tests
└── ...                    # 460+ tests (unit + integration + live + e2e)
Contributing
git clone https://github.com/kothapavan1998/deeprecall.git
cd deeprecall
pip install -e ".[all]"
make check
See CONTRIBUTING.md.
Citation
Built on Recursive Language Models by Zhang, Kraska, and Khattab (MIT).
License
MIT
Incomplete Data
Some information about this model is not available. Verify details from the original source before relying on this data.
Limitations & Considerations
- Benchmark scores may vary based on evaluation methodology and hardware configuration.
- VRAM requirements are estimates; actual usage depends on quantization and batch size.
- FNI scores are relative rankings and may change as new models are added.
- Verify licensing terms from the source repository before commercial use.
Model Transparency Report
Technical metadata sourced from upstream repositories.
Identity & Source
- id: gh-model--kothapavan1998--deeprecall
- slug: kothapavan1998--deeprecall
- source: github
- author: kothapavan1998
- license: MIT
- tags: ai-agents, chromadb, langchain, llamaindex, milvus, openai, pinecone, python, qdrant, rag, reasoning-engine, recursive-language-model, retrieval-augmented-generation, rlm, vector-database
Technical Specs
- architecture: null
- params (billions): null
- context length: null
- pipeline tag: other
Engagement & Metrics
- downloads: 0
- stars: 8
- forks: 0
Data indexed from public sources. Updated daily.