API guide
Authentication, the recall API, and the full REST reference. New to ThinkingMemory? Start with the documentation to understand how it works and why, then come back here to build.
Introduction
A raw vector store returns nearest neighbours. An agent needs the useful context for its current task, within a token budget. ThinkingMemory's single query primitive is recall: you send an intent, and it returns a deduped, cited context string built from hybrid retrieval, reranking, and token-budget packing. Results improve as your agent runs, because each recall boosts the salience of the memories it surfaces. It is not a vector store and not an SDK; it is a database whose query is recall.
The hosted service runs at https://memory.thinkingdbx.com. The core engine is open source under Apache-2.0 (github.com/mplusm/thinkingmemory).
Core concepts
One unified memory substrate
Everything is a memory: a row with text, structured content, an embedding, and metadata. The classic "layers" are just a tag (mtype) on that row, so recall can search across or within them.
| mtype | Use it for |
|---|---|
| episodic | Events and observations: what happened, when. |
| semantic | Facts and knowledge: stable truths about the world or user. |
| procedural | How-to: steps, skills, and successful procedures. |
| working | Short-lived scratch context for the current task. |
Memory attributes
| Field | Meaning |
|---|---|
| agent_id | Which agent the memory belongs to. The data plane is per-agent. |
| scope | private (default), shared, or global visibility within your tenant. |
| salience | Importance weight. Each recall boosts the salience of what it surfaced. |
| confidence | How sure you are of the memory (0–1). |
| decay_rate | How quickly relevance fades over time (defaults per mtype). |
| provenance | Where it came from: source, derived_from, etc. Powers the trace. |
Getting started
1. Create an account (a tenant is provisioned automatically).
2. In the console, go to API keys and mint a key. It is shown only once.
3. Authenticate every data-plane request with the X-API-Key header.
```bash
export TM=https://memory.thinkingdbx.com
export TM_API_KEY=tm_live_...  # from the console -> API keys
```

Two credential types exist: API keys (tm_live_…) for agents and apps calling /v1, and a console session (email + password, or magic link) for humans managing the account. This guide uses API keys.
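If you would rather call the API from Python than from curl, the sketch below wraps the two environment variables and the X-API-Key header using only the standard library. The helper names build_request and post are our own illustration, not an official client. The request shape mirrors the curl examples in this guide.

```python
import json
import os
import urllib.request
from typing import Any, Dict, Optional

# Same environment variables as the shell exports; the fallback is the hosted URL.
TM = os.environ.get("TM", "https://memory.thinkingdbx.com")


def build_request(path: str, body: Dict[str, Any],
                  api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST to a data-plane endpoint."""
    key = api_key or os.environ["TM_API_KEY"]
    return urllib.request.Request(
        url=TM.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": key, "Content-Type": "application/json"},
        method="POST",
    )


def post(path: str, body: Dict[str, Any], api_key: Optional[str] = None) -> Dict[str, Any]:
    """Send the request; urlopen raises urllib.error.HTTPError for 4xx/5xx responses."""
    with urllib.request.urlopen(build_request(path, body, api_key)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `post("/v1/recall", {"agent_id": "support-bot", "intent": "How should I contact ACME?"})["context"]` reads the packed context string out of the recall response.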
Quickstart
Store a memory
```bash
curl -X POST $TM/v1/remember \
  -H "X-API-Key: $TM_API_KEY" -H "Content-Type: application/json" \
  -d '{
    "agent_id": "support-bot",
    "content": {"text": "Customer ACME prefers email over phone."},
    "mtype": "semantic"
  }'
```

Recall context
```bash
curl -X POST $TM/v1/recall \
  -H "X-API-Key: $TM_API_KEY" -H "Content-Type: application/json" \
  -d '{
    "agent_id": "support-bot",
    "intent": "How should I contact ACME?",
    "token_budget": 512
  }'
```

The response is ready to drop into your prompt:
```json
{
  "intent": "How should I contact ACME?",
  "context": "[1] Customer ACME prefers email over phone.",
  "items": [
    { "citation": 1, "id": 42, "mtype": "semantic", "score": 0.91,
      "why": ["vector", "keyword", "rerank"], "text": "Customer ACME prefers email over phone.",
      "provenance": null }
  ],
  "tokens_used": 12,
  "tokens_saved_vs_dump": 240,
  "dump_tokens": 252,
  "candidates_considered": 7
}
```

Recall in depth
A single recall call runs a pipeline:
- Candidate generation: three retrievers run in parallel (vector similarity using cosine over embeddings, keyword search using PostgreSQL full-text, and recency).
- Graph expansion: optionally pull in neighbours of the top hits, up to graph_hops away.
- Fusion: combine the ranked lists with Reciprocal Rank Fusion, weighted by salience.
- Rerank: optionally score the top candidates against the intent with a cross-encoder (on by default on the cloud).
- Packing: fit the highest-ranked memories into token_budget, dedupe them, and number them with [n] citations.
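Reciprocal Rank Fusion is a standard technique. Each retriever's list contributes 1 / (k + rank) for every item it contains, and the contributions are summed across lists. The sketch below shows that step. The constant k = 60 and applying salience as a plain multiplier are assumptions for illustration; this guide does not describe ThinkingMemory's actual weighting.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rrf_fuse(ranked_lists: Sequence[Sequence[int]], salience: Dict[int, float],
             k: int = 60) -> List[Tuple[int, float]]:
    """Fuse best-first ranked lists of memory ids with Reciprocal Rank Fusion.

    Each list adds 1 / (k + rank) for every id it contains, so agreement
    between retrievers (vector, keyword, recency) is rewarded.
    """
    scores: Dict[int, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, mem_id in enumerate(ranking, start=1):
            scores[mem_id] += 1.0 / (k + rank)
    # Assumption: salience multiplies the fused score (ids without a weight get 1.0).
    weighted = {m: s * salience.get(m, 1.0) for m, s in scores.items()}
    return sorted(weighted.items(), key=lambda item: item[1], reverse=True)
```

With this choice, a memory that sits mid-list in every retriever can outrank one that appears near the top of only one list. Raising a memory's salience can then lift it further.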
The why array on each item tells you which signals surfaced it (vector, keyword, recency, graph, rerank). tokens_saved_vs_dump shows how much smaller the packed context is than dumping every memory. In the example above, dump_tokens (252) minus tokens_used (12) gives 240.
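The packing step can be pictured as a greedy pass over the ranked list. This is a simplified sketch, not the engine's implementation, and it makes three assumptions:

- It counts one token per word, which is cruder than a real tokenizer.
- It treats texts that match after lowercasing and collapsing whitespace as duplicates.
- It does not charge the [n] markers against the budget.

```python
from typing import Iterable, List, Set, Tuple


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def pack(ranked_texts: Iterable[str], token_budget: int) -> Tuple[str, int]:
    """Greedily pack best-first memory texts into a numbered, cited context string."""
    seen: Set[str] = set()
    lines: List[str] = []
    used = 0
    for text in ranked_texts:
        key = " ".join(text.lower().split())  # normalise case and spacing for dedupe
        if key in seen:
            continue  # drop duplicates of something already packed
        cost = count_tokens(text)
        if used + cost > token_budget:
            continue  # too big for what is left; a shorter memory further down may fit
        seen.add(key)
        used += cost
        lines.append(f"[{len(lines) + 1}] {text}")
    return "\n".join(lines), used
```

Skipping an oversized item, rather than stopping, lets smaller lower-ranked memories use the remaining budget.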
Parameters
| Field | Default | Description |
|---|---|---|
| agent_id | none | Required. Whose memory to search. |
| intent | none | Required. What the agent needs right now. |
| token_budget | 4000 | Max tokens for the packed context. |
| k | 20 | Max number of items returned. |
| mtypes | all | Restrict to certain memory types. |
| scopes | all | Restrict to certain scopes. |
| rerank | configured default | Cross-encoder rerank on/off; on by default on the cloud. |
| graph_hops | 0 | Expand via the entity graph (0 = off). |
| as_of | now | Recall against what the agent believed at a past time. |
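When you build recall requests in code, it can help to send only the fields you set and let the server apply the defaults in the table. The hypothetical helper below does that. It also catches a misspelled field name before the request is sent. Two details are assumptions about the wire format: mtypes is passed as a list, and as_of as an ISO-8601 string (as in the timeline example later on).

```python
from typing import Any, Dict

# Optional recall fields from the parameters table; required ones are positional.
OPTIONAL_RECALL_FIELDS = {"token_budget", "k", "mtypes", "scopes", "rerank", "graph_hops", "as_of"}


def recall_body(agent_id: str, intent: str, **options: Any) -> Dict[str, Any]:
    """Build a /v1/recall JSON body, leaving unset options to the server defaults."""
    if not agent_id or not intent:
        raise ValueError("agent_id and intent are required")
    unknown = set(options) - OPTIONAL_RECALL_FIELDS
    if unknown:
        raise ValueError(f"unknown recall fields: {sorted(unknown)}")
    body: Dict[str, Any] = {"agent_id": agent_id, "intent": intent}
    # Options passed as None are dropped so the server applies its own default.
    body.update({name: value for name, value in options.items() if value is not None})
    return body
```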
Memory lifecycle
Memory curates itself in the background so quality does not degrade as volume grows. The lifecycle engine runs on a daily scheduler (and on demand via /v1/maintenance/run):
- Decay: relevance fades over time according to each memory's decay_rate; recall counteracts decay for useful memories.
- Extraction: turn recent episodic memories into durable semantic facts.
- Consolidation: merge near-duplicates into a single, stronger memory.
- Supersession: newer information replaces stale information.
- Contradiction resolution: detect conflicting memories and keep the right one.
- Forgetting: prune low-salience, expired memories.
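This guide does not publish the decay formula. A common model, shown here purely as an assumption, has three parts:

- Relevance decays exponentially with age.
- It is scaled by salience.
- Each recall adds a small salience boost, so useful memories resist fading.

The per-day unit for decay_rate and the 0.1 boost are hypothetical.

```python
import math


def relevance(salience: float, decay_rate: float, age_days: float) -> float:
    """Exponential decay (assumed model): relevance halves every ln(2) / decay_rate days."""
    return salience * math.exp(-decay_rate * age_days)


def boost_on_recall(salience: float, boost: float = 0.1) -> float:
    """Raise salience when recall surfaces a memory (the 0.1 step is arbitrary)."""
    return salience + boost
```

Under this model, a memory with decay_rate = 0.05 per day keeps about 22% of its relevance after 30 days, since e^(-1.5) ≈ 0.223.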
```bash
curl -X POST $TM/v1/maintenance/run \
  -H "X-API-Key: $TM_API_KEY" -H "Content-Type: application/json" \
  -d '{"agent_id": "support-bot", "interval_days": 1}'
```

Bitemporal & audit
Every memory carries a validity window, so you can ask what an agent believed at any point in time, not just what is true now. Pass as_of to recall, or read a full snapshot from the timeline.
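Conceptually, a validity window means a memory counts as believed from the moment it became valid until it was superseded. The sketch below filters an in-memory list that way. The Belief type and its valid_from / valid_to fields are hypothetical names, not the server's schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Belief:
    text: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means still believed


def believed_as_of(beliefs: List[Belief], when: datetime) -> List[str]:
    """Return what was believed at `when`, using a half-open window [valid_from, valid_to)."""
    return [
        b.text
        for b in beliefs
        if b.valid_from <= when and (b.valid_to is None or when < b.valid_to)
    ]


# A superseded preference keeps its history instead of being overwritten.
utc = timezone.utc
old = Belief("ACME prefers phone.", datetime(2026, 1, 1, tzinfo=utc), datetime(2026, 5, 1, tzinfo=utc))
new = Belief("ACME prefers email.", datetime(2026, 5, 1, tzinfo=utc))
```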
```bash
# what the agent believed on a given date
curl "$TM/v1/timeline/support-bot?as_of=2026-06-01T00:00:00Z" \
  -H "X-API-Key: $TM_API_KEY"
```

The trace endpoint answers "why do I know this?" by walking a memory's provenance chain. The audit endpoint returns an append-only log of every operation.
```bash
curl "$TM/v1/trace/42?depth=3" -H "X-API-Key: $TM_API_KEY"
curl "$TM/v1/audit?agent_id=support-bot&limit=50" -H "X-API-Key: $TM_API_KEY"
```

Entity graph
Memories can be linked into a graph, and recall can expand along those links. Create edges with link, inspect them with neighbors, and set graph_hops on recall to fold neighbours into the candidate set.
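Setting graph_hops on recall behaves like a breadth-first walk outward from the top hits. The sketch below models that over a plain adjacency list. It illustrates hop-limited expansion only and is not the engine's code; the real expansion may also take the relation type into account.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple


def expand(edges: Dict[int, List[int]], seeds: List[int], hops: int) -> Set[int]:
    """Return every memory id within `hops` edges of a seed (hops=0 returns the seeds)."""
    found: Set[int] = set(seeds)
    frontier: Deque[Tuple[int, int]] = deque((s, 0) for s in seeds)
    while frontier:
        node, dist = frontier.popleft()
        if dist == hops:
            continue  # reached the hop limit; do not look further out
        for nxt in edges.get(node, []):
            if nxt not in found:
                found.add(nxt)
                frontier.append((nxt, dist + 1))
    return found


# Bidirectional links are stored under both endpoints.
edges = {42: [43], 43: [42, 44], 44: [43, 45], 45: [44]}
```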
```bash
curl -X POST $TM/v1/link \
  -H "X-API-Key: $TM_API_KEY" -H "Content-Type: application/json" \
  -d '{"src_id": 42, "dst_id": 43, "relation": "relates_to", "bidirectional": true}'
curl "$TM/v1/neighbors/42?depth=2" -H "X-API-Key: $TM_API_KEY"
```

Working memory
For ephemeral, per-agent scratch state, use the Redis-backed working-memory store under /working. It is a simple TTL'd key/value space, separate from the durable memory database.
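The semantics of a TTL'd key/value space are easy to model locally. The class below is an in-process stand-in, not the Redis-backed service. Keys are namespaced per agent, and an entry disappears once its time-to-live has elapsed. The TTLStore name and the 60-second TTL are hypothetical; this guide does not state the service's default TTL. In Redis itself, `SET key value EX seconds` gives the same expiry behaviour server-side.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLStore:
    """In-process model of a per-agent key/value space whose entries expire."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._data: Dict[Tuple[str, str], Tuple[Any, float]] = {}

    def store(self, agent_id: str, key: str, value: Any) -> None:
        self._data[(agent_id, key)] = (value, self.clock() + self.ttl)

    def get(self, agent_id: str, key: str) -> Optional[Any]:
        item = self._data.get((agent_id, key))
        if item is None:
            return None
        value, expires_at = item
        if self.clock() >= expires_at:
            del self._data[(agent_id, key)]  # lazily evict the expired entry
            return None
        return value
```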
```bash
curl -X POST $TM/working/store \
  -H "X-API-Key: $TM_API_KEY" -H "Content-Type: application/json" \
  -d '{"agent_id": "support-bot", "key": "current_ticket", "value": "ACME-1021"}'
```

API reference
Base URL https://memory.thinkingdbx.com. Data-plane endpoints require X-API-Key; account endpoints use a console session.
Memory database — /v1
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/remember | Store a memory. |
| POST | /v1/remember/batch | Store many memories. |
| POST | /v1/recall | Intent in, packed cited context out. |
| GET | /v1/memory/{id} | Fetch one memory. |
| GET | /v1/trace/{id} | Provenance chain. |
| GET | /v1/timeline/{agent_id} | Beliefs as of a time (?as_of=). |
| GET | /v1/audit | Audit log. |
| POST | /v1/forget | Soft (default) or hard delete a memory. |
| POST | /v1/link | Create a graph edge. |
| GET | /v1/neighbors/{id} | Graph neighbours. |
| POST | /v1/maintenance/run | Run the lifecycle now. |
remember — request body
| Field | Default | Notes |
|---|---|---|
| agent_id | none | Required. |
| content | none | Required. Structured object; text defaults to a rendering of it. |
| text | auto | The text used for embedding and keyword search. |
| mtype | episodic | episodic / semantic / procedural / working. |
| scope | private | private / shared / global. |
| salience / confidence | 1.0 | Weighting and certainty. |
| decay_rate | per mtype | Optional override. |
| provenance | null | Optional source metadata. |
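A small hypothetical helper can catch an invalid mtype, scope, or confidence before the request is sent. It leaves unset fields, such as text (which the server renders from content), to the server defaults. Only a subset of fields is shown; salience and decay_rate could be added the same way. The provenance value in the test is made up.

```python
from typing import Any, Dict, Optional

MTYPES = {"episodic", "semantic", "procedural", "working"}
SCOPES = {"private", "shared", "global"}


def remember_body(
    agent_id: str,
    content: Dict[str, Any],
    mtype: str = "episodic",
    scope: str = "private",
    text: Optional[str] = None,
    confidence: Optional[float] = None,
    provenance: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a /v1/remember body, checking enum values and the 0-1 confidence range."""
    if mtype not in MTYPES:
        raise ValueError(f"mtype must be one of {sorted(MTYPES)}")
    if scope not in SCOPES:
        raise ValueError(f"scope must be one of {sorted(SCOPES)}")
    if confidence is not None and not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    body: Dict[str, Any] = {"agent_id": agent_id, "content": content, "mtype": mtype, "scope": scope}
    # Omit unset optional fields so the server fills in its own defaults.
    for name, value in (("text", text), ("confidence", confidence), ("provenance", provenance)):
        if value is not None:
            body[name] = value
    return body
```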
Errors & status codes
| Code | Meaning |
|---|---|
| 200 / 201 | Success. |
| 401 | Missing or invalid API key / session. |
| 402 | Plan limit reached (storage, agents, or monthly operations). Upgrade to continue. |
| 404 | Resource not found. |
| 422 | Validation error (check the detail field). |
| 429 | Rate limit exceeded for your plan. |
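These codes suggest a simple client policy:

- Retry 429 with backoff.
- Surface 402 as a plan problem, since retrying cannot help until you upgrade.
- Fail fast on everything else.

The sketch below applies that policy around any send() callable that returns (status, body). The retry count and delays are arbitrary choices. This guide does not say whether the service sends a Retry-After header, so the sketch does not rely on one.

```python
import time
from typing import Any, Callable, Dict, Tuple


class PlanLimitError(Exception):
    """402: a storage, agent, or monthly-operation cap was hit; retrying will not help."""


def call_with_retry(send: Callable[[], Tuple[int, Dict[str, Any]]], max_retries: int = 3,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Call send() and handle the documented status codes."""
    attempt = 0
    while True:
        status, body = send()
        if status in (200, 201):
            return body
        if status == 402:
            raise PlanLimitError(body)
        if status == 429 and attempt < max_retries:
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 0.5 s, 1 s, 2 s, ...
            attempt += 1
            continue
        # 401, 404, 422 (see body["detail"]) and exhausted 429 retries are not retried.
        raise RuntimeError(f"HTTP {status}: {body}")
```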
Plans, limits & quotas
Every plan sets caps on agents, stored memories, monthly operations (recall + remember), and request rate. Exceeding a storage, agent, or monthly-operation cap returns 402; exceeding the request rate returns 429. See current numbers and upgrade on the pricing page, and track usage on the console's Billing page.
Embeddings
You send text, not vectors. Embeddings are generated server-side with a local model (BAAI/bge-small-en-v1.5, 384 dimensions), so there are no per-token embedding fees and nothing extra to run. The same model is used for storing and for recall queries.
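The vector signal in recall compares embeddings by cosine similarity. Stored memories and recall queries are embedded by the same model, so both land in the same 384-dimensional space and can be compared directly. The function below is the textbook formula. The tests use tiny vectors for readability rather than real 384-dimensional embeddings.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|); returns 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension (384 for bge-small-en-v1.5)")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```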
MCP integration
ThinkingMemory is MCP-native. Point an MCP-compatible client at it to give the agent remember and recall tools directly.
```json
{
  "mcpServers": {
    "thinkingmemory": {
      "command": "npx",
      "args": ["-y", "@thinkingmemory/mcp"],
      "env": { "TM_API_KEY": "tm_live_...", "TM_BASE_URL": "https://memory.thinkingdbx.com" }
    }
  }
}
```

Security & isolation
- Per-tenant isolation — every memory is scoped to your tenant and enforced at the database with PostgreSQL row-level security plus hash partitioning, on top of application scoping.
- API keys — stored only as hashes; the full value is shown once. Rotate or revoke from the console at any time.
- Encryption in transit — all traffic is TLS.
- Payments — handled by our Merchant of Record, Paddle; we never store card details. See the Privacy Policy.
FAQ
Is this a vector database?
No. Vectors are one signal; recall fuses vector, keyword, recency, and graph, reranks, and packs a budgeted, cited context.
Do I have to send embeddings?
No — send text. Embeddings are generated server-side.
Which frameworks and models work?
Any. ThinkingMemory is agent-agnostic and MCP-native.
Can I self-host?
Yes — the core engine is Apache-2.0. The cloud is the managed, multi-tenant service on top.
Questions? Contact us or read the open-source engine docs on GitHub.