server: usage api

Add a new /api/usage endpoint that shows aggregate usage statistics per model since the server started.
2026-04-24 17:55:43 +02:00 · 2026-01-27 17:01:18 -08:00
parent 9aefd2dfee
commit 91dc088e8b
6 changed files with 366 additions and 12 deletions
--- a/docs/api.md
+++ b/docs/api.md
@@ -15,6 +15,7 @@
 - [Push a Model](#push-a-model)
 - [Generate Embeddings](#generate-embeddings)
 - [List Running Models](#list-running-models)
+- [Usage](#usage)
 - [Version](#version)
 - [Experimental: Image Generation](#image-generation-experimental)

@@ -1854,6 +1855,53 @@ curl http://localhost:11434/api/embeddings -d '{
 }
 ```

+## Usage
+
+```
+GET /api/usage
+```
+
+Show aggregate usage statistics per model since the server started. All timestamps are UTC in RFC 3339 format.
+
+### Examples
+
+#### Request
+
+```shell
+curl http://localhost:11434/api/usage
+```
+
+#### Response
+
+```json
+{
+  "start": "2025-01-27T20:00:00Z",
+  "usage": [
+    {
+      "model": "llama3.2",
+      "requests": 5,
+      "prompt_tokens": 130,
+      "completion_tokens": 890
+    },
+    {
+      "model": "deepseek-r1",
+      "requests": 2,
+      "prompt_tokens": 48,
+      "completion_tokens": 312
+    }
+  ]
+}
+```
+
+#### Response fields
+
+- `start`: when the server started tracking usage (UTC, RFC 3339)
+- `usage`: list of per-model usage statistics
+  - `model`: model name
+  - `requests`: total number of completed requests
+  - `prompt_tokens`: total prompt tokens evaluated
+  - `completion_tokens`: total completion tokens generated
+
 ## Version

 ```