API guide

Authentication, the recall API, and the full REST reference. New to ThinkingMemory? Start with the documentation to understand how it works and why, then come back here to build.

Introduction

A raw vector store returns nearest neighbours. An agent needs the useful context for its current task, within a token budget. ThinkingMemory's single query primitive is recall: you send an intent, and it returns a deduplicated, cited context string built from hybrid retrieval, reranking, and token-budget packing. Recall improves as your agent runs, because each call boosts the salience of the memories it surfaced. It is not a vector store and not an SDK; it is a database whose query is recall.

The hosted service runs at https://memory.thinkingdbx.com. The core engine is open source under Apache-2.0 (github.com/mplusm/thinkingmemory).

Core concepts

One unified memory substrate

Everything is a memory: a row with text, structured content, an embedding, and metadata. The classic "layers" are just a tag (mtype) on that row, so recall can search across or within them.

mtype	Use it for
`episodic`	Events and observations: what happened, when.
`semantic`	Facts and knowledge: stable truths about the world or user.
`procedural`	How-to: steps, skills, and successful procedures.
`working`	Short-lived scratch context for the current task.

Memory attributes

Field	Meaning
`agent_id`	Which agent the memory belongs to. The data plane is per-agent.
`scope`	`private` (default), `shared`, or `global` visibility within your tenant.
`salience`	Importance weight. Each recall boosts the salience of what it surfaced.
`confidence`	How sure you are of the memory (0–1).
`decay_rate`	How quickly relevance fades over time (defaults per mtype).
`provenance`	Where it came from: source, `derived_from`, etc. Powers the trace.

Getting started

1. Create an account (a tenant is provisioned automatically).
2. In the console, go to API keys and mint a key. The key is shown only once.
3. Authenticate every data-plane request with the X-API-Key header.

export TM=https://memory.thinkingdbx.com
export TM_API_KEY=tm_live_...   # from the console -> API keys

Two credential types exist: API keys (tm_live_…) for agents/apps calling /v1, and a console session (email + password or magic-link) for humans managing the account. This guide uses API keys.
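
For callers not using curl, the same authentication can be sketched in Python with only the standard library. The helper names build_request and post_json are illustrative, not part of any official client; the header names and base URL are the ones given in this guide.

```python
import json
import urllib.request

DEFAULT_BASE_URL = "https://memory.thinkingdbx.com"


def build_request(path, body, api_key, base_url=DEFAULT_BASE_URL):
    """Build (but do not send) an authenticated JSON POST for a data-plane endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def post_json(path, body, api_key, base_url=DEFAULT_BASE_URL, timeout=30):
    """Send the request and decode the JSON response body."""
    request = build_request(path, body, api_key, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would then look like post_json("/v1/recall", {"agent_id": "support-bot", "intent": "How should I contact ACME?"}, os.environ["TM_API_KEY"]), with the key read from the environment rather than hard-coded.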

Quickstart

Store a memory

curl -X POST $TM/v1/remember \
  -H "X-API-Key: $TM_API_KEY" -H "Content-Type: application/json" \
  -d '{
    "agent_id": "support-bot",
    "content": {"text": "Customer ACME prefers email over phone."},
    "mtype": "semantic"
  }'

Recall context

curl -X POST $TM/v1/recall \
  -H "X-API-Key: $TM_API_KEY" -H "Content-Type: application/json" \
  -d '{
    "agent_id": "support-bot",
    "intent": "How should I contact ACME?",
    "token_budget": 512
  }'

The response is ready to drop into your prompt:

{
  "intent": "How should I contact ACME?",
  "context": "[1] Customer ACME prefers email over phone.",
  "items": [
    { "citation": 1, "id": 42, "mtype": "semantic", "score": 0.91,
      "why": ["vector", "keyword", "rerank"], "text": "Customer ACME prefers email over phone.",
      "provenance": null }
  ],
  "tokens_used": 12,
  "tokens_saved_vs_dump": 240,
  "dump_tokens": 252,
  "candidates_considered": 7
}
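
In this sample, tokens_saved_vs_dump equals dump_tokens minus tokens_used (252 - 12 = 240). The sketch below shows one way a caller might consume the response: the context string goes into the prompt, and items are indexed by citation number so that a cited [n] can be resolved back to a memory. The function names and the prompt wording are illustrative assumptions.

```python
def citation_index(recall_response):
    """Map each [n] citation number to the item that produced it."""
    return {item["citation"]: item for item in recall_response["items"]}


def build_prompt(recall_response, question):
    """Place the packed, cited context ahead of the user's question."""
    return (
        "Use the numbered context below and cite sources as [n].\n\n"
        f"Context:\n{recall_response['context']}\n\n"
        f"Question: {question}"
    )
```

If the model's answer cites [1], citation_index(response)[1]["id"] gives the memory id to pass to the trace endpoint.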

Recall in depth

A single recall call runs a pipeline:

Candidate generation — three retrievers in parallel: vector similarity (cosine over embeddings), keyword (PostgreSQL full-text), and recency.
Graph expansion — optionally pull in neighbours of the top hits up to graph_hops.
Fusion — combine the ranked lists with Reciprocal Rank Fusion, weighted by salience.
Rerank — optionally score the top candidates against the intent with a cross-encoder (on by default on the cloud).
Packing — fit the highest-ranked memories into token_budget, dedupe, and number them with [n] citations.
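
To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over several ranked lists. The constant k=60 is the value commonly used for RRF, and multiplying the fused score by salience is an assumption about how the weighting might be applied; the engine's actual formula is not specified in this guide.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, salience=None, k=60):
    """Fuse ranked id lists with Reciprocal Rank Fusion, then weight by salience.

    ranked_lists maps a signal name (e.g. "vector") to memory ids, best first.
    Returns (memory_id, score, why) tuples sorted by descending score.
    """
    salience = salience or {}
    scores = defaultdict(float)
    why = defaultdict(list)
    for signal, ids in ranked_lists.items():
        for rank, memory_id in enumerate(ids, start=1):
            scores[memory_id] += 1.0 / (k + rank)  # standard RRF term
            why[memory_id].append(signal)
    fused = [
        # Assumption: salience acts as a multiplicative weight (default 1.0).
        (memory_id, score * salience.get(memory_id, 1.0), why[memory_id])
        for memory_id, score in scores.items()
    ]
    return sorted(fused, key=lambda row: row[1], reverse=True)
```

A memory that appears near the top of several lists outranks one that tops a single list, which is why hybrid retrieval tends to be more robust than any one retriever.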

The why array on each item tells you which signals surfaced it (vector, keyword, recency, graph, rerank). tokens_saved_vs_dump shows how much smaller the packed context is than dumping every memory.
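
The packing step can be sketched as a greedy loop over memories that are already ranked best-first. This is an illustration only: the whitespace token counter is a crude stand-in for a real tokenizer, and deduplication here is exact match after normalising case and spacing, whereas the service may well use something smarter.

```python
def pack(ranked_texts, token_budget, count_tokens=lambda s: len(s.split())):
    """Greedily pack ranked texts into a cited context within token_budget.

    Returns (context, tokens_used). Duplicates are dropped, and an item that
    does not fit is skipped so that a shorter, lower-ranked one still can.
    """
    seen = set()
    lines = []
    used = 0
    for text in ranked_texts:
        key = " ".join(text.lower().split())  # normalise for dedupe
        if key in seen:
            continue
        line = f"[{len(lines) + 1}] {text}"  # next citation number
        cost = count_tokens(line)
        if used + cost > token_budget:
            continue
        seen.add(key)
        lines.append(line)
        used += cost
    return "\n".join(lines), used
```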

Parameters

Field	Default	Description
`agent_id`	—	Required. Whose memory to search.
`intent`	—	Required. What the agent needs right now.
`token_budget`	`4000`	Max tokens for the packed context.
`k`	`20`	Max number of items returned.
`mtypes`	all	Restrict to certain memory types.
`scopes`	all	Restrict to certain scopes.
`rerank`	server setting	Cross-encoder rerank on/off. On by default on the cloud.
`graph_hops`	`0`	Expand via the entity graph (0 = off).
`as_of`	now	Recall against what the agent believed at a past time.
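
As a sketch, a request body using these parameters could be assembled as follows. The function name recall_body is illustrative. Sending mtypes and scopes as JSON arrays and as_of as an ISO 8601 UTC string are assumptions based on the examples elsewhere in this guide, so check the validation error detail if the API rejects them.

```python
from datetime import timezone


def recall_body(agent_id, intent, *, token_budget=4000, k=20, mtypes=None,
                scopes=None, rerank=None, graph_hops=0, as_of=None):
    """Build a /v1/recall request body, leaving optional filters out when unset."""
    if not agent_id or not intent:
        raise ValueError("agent_id and intent are required")
    body = {
        "agent_id": agent_id,
        "intent": intent,
        "token_budget": token_budget,
        "k": k,
        "graph_hops": graph_hops,
    }
    if mtypes is not None:
        body["mtypes"] = list(mtypes)
    if scopes is not None:
        body["scopes"] = list(scopes)
    if rerank is not None:  # None means "use the server setting"
        body["rerank"] = bool(rerank)
    if as_of is not None:
        if as_of.tzinfo is None:
            raise ValueError("as_of must be timezone-aware")
        body["as_of"] = as_of.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return body
```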

Memory lifecycle

Memory curates itself in the background so quality does not degrade as volume grows. The lifecycle engine runs on a daily scheduler (and on demand via /v1/maintenance/run):

Decay — relevance fades over time per each memory's decay_rate; recall counteracts decay for useful memories.
Extraction — turn recent episodic memories into durable semantic facts.
Consolidation — merge near-duplicates into a single stronger memory.
Supersession — newer information replaces stale information.
Contradiction resolution — detect conflicting memories and keep the right one.
Forgetting — prune low-salience, expired memories.

curl -X POST $TM/v1/maintenance/run \
  -H "X-API-Key: $TM_API_KEY" -H "Content-Type: application/json" \
  -d '{"agent_id": "support-bot", "interval_days": 1}'
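
To build intuition for how decay and forgetting interact, here is a toy model. Exponential decay and the 0.05 threshold are assumptions chosen for illustration; this guide does not specify the engine's actual decay function or pruning rule.

```python
import math


def effective_salience(salience, decay_rate, age_days):
    """Assumed model: salience shrinks by a factor of e every 1/decay_rate days."""
    return salience * math.exp(-decay_rate * age_days)


def should_forget(salience, decay_rate, age_days, threshold=0.05):
    """A memory becomes a pruning candidate once its decayed salience is below threshold."""
    return effective_salience(salience, decay_rate, age_days) < threshold
```

Under this model, a memory that recall keeps surfacing has its salience boosted and stays above the threshold, while one that is never recalled eventually drops below it.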

Bitemporal & audit

Every memory carries a validity window, so you can ask what an agent believed at any point in time, not just what is true now. Pass as_of to recall, or read a full snapshot from the timeline.

# what the agent believed on a given date
curl "$TM/v1/timeline/support-bot?as_of=2026-06-01T00:00:00Z" \
  -H "X-API-Key: $TM_API_KEY"
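
Conceptually, an as_of query filters memories by their validity window. The sketch below assumes each memory has a valid_from timestamp and an optional valid_to (None meaning still current); these field names are hypothetical and may not match the stored schema.

```python
def believed_at(memories, as_of):
    """Return the memories whose validity window contains as_of.

    The window is treated as half-open: valid_from <= as_of < valid_to.
    """
    return [
        memory for memory in memories
        if memory["valid_from"] <= as_of
        and (memory["valid_to"] is None or as_of < memory["valid_to"])
    ]
```

With a half-open window, a memory superseded at time t stops being believed exactly at t, and its replacement starts at t, so the two never overlap.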

trace answers "why do I know this?" by walking a memory's provenance chain, and audit is an append-only log of every operation.

curl "$TM/v1/trace/42?depth=3" -H "X-API-Key: $TM_API_KEY"
curl "$TM/v1/audit?agent_id=support-bot&limit=50" -H "X-API-Key: $TM_API_KEY"

Entity graph

Memories can be linked into a graph, and recall can expand along those links. Create edges with link, inspect them with neighbors, and set graph_hops on recall to fold neighbours into the candidate set.

curl -X POST $TM/v1/link \
  -H "X-API-Key: $TM_API_KEY" -H "Content-Type: application/json" \
  -d '{"src_id": 42, "dst_id": 43, "relation": "relates_to", "bidirectional": true}'

curl "$TM/v1/neighbors/42?depth=2" -H "X-API-Key: $TM_API_KEY"
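
The effect of graph_hops can be pictured as a bounded breadth-first search from the top hits. This is a sketch of the idea, assuming edges are available as (src_id, dst_id, bidirectional) tuples; it is not the engine's implementation.

```python
from collections import deque


def expand(seed_ids, edges, graph_hops):
    """Return the seeds plus every memory reachable within graph_hops edges."""
    adjacency = {}
    for src, dst, bidirectional in edges:
        adjacency.setdefault(src, set()).add(dst)
        if bidirectional:
            adjacency.setdefault(dst, set()).add(src)

    seen = set(seed_ids)
    queue = deque((seed, 0) for seed in seed_ids)
    while queue:
        node, hops = queue.popleft()
        if hops == graph_hops:
            continue  # hop limit reached; do not expand further
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return seen
```

With graph_hops set to 0 the function returns only the seeds, which matches the documented default of expansion being off.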

Working memory

For ephemeral, per-agent scratch state, use the Redis-backed working-memory store under /working. It is a simple TTL'd key/value space, separate from the durable memory database.

curl -X POST $TM/working/store \
  -H "X-API-Key: $TM_API_KEY" -H "Content-Type: application/json" \
  -d '{"agent_id": "support-bot", "key": "current_ticket", "value": "ACME-1021"}'

API reference

Base URL https://memory.thinkingdbx.com. Data-plane endpoints require X-API-Key; account endpoints use a console session.

Memory database — `/v1`

Method	Endpoint	Description
POST	`/v1/remember`	Store a memory.
POST	`/v1/remember/batch`	Store many memories.
POST	`/v1/recall`	Intent in, packed cited context out.
GET	`/v1/memory/{id}`	Fetch one memory.
GET	`/v1/trace/{id}`	Provenance chain.
GET	`/v1/timeline/{agent_id}`	Beliefs as of a time (`?as_of=`).
GET	`/v1/audit`	Audit log.
POST	`/v1/forget`	Soft (default) or hard delete a memory.
POST	`/v1/link`	Create a graph edge.
GET	`/v1/neighbors/{id}`	Graph neighbours.
POST	`/v1/maintenance/run`	Run the lifecycle now.

remember — request body

Field	Default	Notes
`agent_id`	—	Required.
`content`	—	Required. Structured object; `text` defaults to a rendering of it.
`text`	auto	The text used for embedding/keyword search.
`mtype`	`episodic`	episodic / semantic / procedural / working.
`scope`	`private`	private / shared / global.
`salience` / `confidence`	`1.0`	Weighting and certainty.
`decay_rate`	per mtype	Optional override.
`provenance`	null	Optional source metadata.
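
A client-side builder that mirrors this table might look like the sketch below. The function name remember_body and the local validation are illustrative; the server remains the authority on what it accepts, and the 0 to 1 range check is applied only to confidence, as documented under Memory attributes.

```python
MTYPES = {"episodic", "semantic", "procedural", "working"}
SCOPES = {"private", "shared", "global"}


def remember_body(agent_id, content, *, text=None, mtype="episodic",
                  scope="private", salience=1.0, confidence=1.0,
                  decay_rate=None, provenance=None):
    """Build a /v1/remember request body using the documented defaults."""
    if not agent_id or not isinstance(content, dict):
        raise ValueError("agent_id and a structured content object are required")
    if mtype not in MTYPES:
        raise ValueError(f"unknown mtype: {mtype}")
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    body = {"agent_id": agent_id, "content": content, "mtype": mtype,
            "scope": scope, "salience": salience, "confidence": confidence}
    if text is not None:  # omitted: the server renders text from content
        body["text"] = text
    if decay_rate is not None:  # omitted: the per-mtype default applies
        body["decay_rate"] = decay_rate
    if provenance is not None:
        body["provenance"] = provenance
    return body
```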

Errors & status codes

Code	Meaning
`200 / 201`	Success.
`401`	Missing or invalid API key / session.
`402`	Plan limit reached (storage, agents, or monthly operations). Upgrade to continue.
`404`	Resource not found.
`422`	Validation error (check the `detail` field).
`429`	Rate limit exceeded for your plan.
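
One way a client might act on these codes is sketched below. The policy is an assumption, not documented behaviour: only 429 is treated as retryable, with exponential backoff, and every other error code is surfaced to the caller.

```python
def classify(status):
    """Map an HTTP status code from the table above to a client action."""
    if status in (200, 201):
        return "ok"
    if status == 429:
        return "retry"  # rate limited: wait, then try again
    if status in (401, 402, 404, 422):
        return "fail"   # retrying will not help; fix the key, plan, id, or payload
    return "unexpected"


def backoff_delays(attempts, base=0.5, cap=30.0):
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```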

Plans, limits & quotas

Every plan sets caps on agents, stored memories, monthly operations (recall + remember), and request rate. Exceeding an agent, storage, or monthly-operation cap returns 402; exceeding the request rate returns 429. See current numbers and upgrade on the pricing page; track usage on the console's Billing page.

Embeddings

You send text, not vectors. Embeddings are generated server-side with a local model (BAAI/bge-small-en-v1.5, 384 dimensions), so there are no per-token embedding fees and nothing extra to run. The same model is used for storing and for recall queries.

MCP integration

ThinkingMemory is MCP-native. Point an MCP-compatible client at it to give the agent remember and recall tools directly.

{
  "mcpServers": {
    "thinkingmemory": {
      "command": "npx",
      "args": ["-y", "@thinkingmemory/mcp"],
      "env": { "TM_API_KEY": "tm_live_...", "TM_BASE_URL": "https://memory.thinkingdbx.com" }
    }
  }
}

Security & isolation

Per-tenant isolation — every memory is scoped to your tenant and enforced at the database with PostgreSQL row-level security plus hash partitioning, on top of application scoping.
API keys — stored only as hashes; the full value is shown once. Rotate or revoke from the console at any time.
Encryption in transit — all traffic is TLS.
Payments — handled by our Merchant of Record, Paddle; we never store card details. See the Privacy Policy.

FAQ

Is this a vector database?

No. Vectors are one signal; recall fuses vector, keyword, recency, and graph, reranks, and packs a budgeted, cited context.

Do I have to send embeddings?

No — send text. Embeddings are generated server-side.

Which frameworks and models work?

Any. ThinkingMemory is agent-agnostic and MCP-native.

Can I self-host?

Yes — the core engine is Apache-2.0. The cloud is the managed, multi-tenant service on top.

Questions? Contact us or read the open-source engine docs on GitHub.