Michael Yang
8cb7b94c40
next ollama runner
...
implement llama and mllama model architectures in go using ggml (through
cgo)
2025-01-20 09:39:43 -08:00
Patrick Devine
86a622cbdc
Update the /api/create endpoint to use JSON (#7935)
...
Replaces `POST /api/create` to use JSON instead of a Modelfile.
This is a breaking change.
2024-12-31 18:02:30 -08:00
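As an illustration of the change above, the sketch below builds a JSON body for `POST /api/create` in place of a Modelfile. The field names `model`, `from`, and `system` are assumptions based on the public API documentation rather than on this commit, and the `createRequest` type and `buildCreateBody` helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// createRequest is a hypothetical, partial view of the JSON body.
// The field names are assumptions, not taken from the commit itself.
type createRequest struct {
	Model  string `json:"model"`
	From   string `json:"from,omitempty"`
	System string `json:"system,omitempty"`
}

// buildCreateBody marshals the request into the JSON payload that would be
// sent to POST /api/create.
func buildCreateBody(model, from, system string) ([]byte, error) {
	return json.Marshal(createRequest{Model: model, From: from, System: system})
}

func main() {
	body, err := buildCreateBody("mario", "llama3.2", "You are Mario.")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

A client that previously sent Modelfile text would send a body like this instead, which is why the commit calls the change breaking.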
Patrick Devine
c7cb0f0602
image processing for llama3.2 (#6963)
...
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>
2024-10-18 16:12:35 -07:00
Jeffrey Morgan
d05da29912
server: add tool parsing support for nemotron-mini (#6849)
2024-09-17 18:06:16 -07:00
Patrick Devine
0c819e167b
convert safetensor adapters into GGUF (#6327)
2024-08-23 11:29:56 -07:00
Josh
980dd15f81
cmd: speed up gguf creates (#6324)
2024-08-12 11:46:09 -07:00
Josh
1dc3ef3aa9
Revert "server: speed up single gguf creates (#5898)" (#6323)
...
This reverts commit 8aac22438e.
2024-08-12 09:57:51 -07:00
Josh
8aac22438e
server: speed up single gguf creates (#5898)
2024-08-12 09:28:55 -07:00
Jesse Gross
7edaf6e7e8
manifest: Store layers inside manifests consistently as values.
...
Commit 1829fb61 ("manifest: Fix crash on startup when trying to clean up
unused files (#5840)") changed the config layer stored in manifests
from a pointer to a value. This was done in order to avoid potential
nil pointer dereferences after it is deserialized from JSON in the
event that the field is missing.
This changes the Layers slice to also be stored by value. This enables
consistency in handling across the two objects.
2024-08-07 17:03:06 -07:00
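The pointer-versus-value distinction described above can be sketched as follows. The `Layer`, `PointerManifest`, and `ValueManifest` types are hypothetical stand-ins, not the repository's actual definitions. The sketch only shows how `encoding/json` treats a missing field or a `null` element in each shape.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Layer is a hypothetical stand-in for a manifest layer entry.
type Layer struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
}

// PointerManifest stores its config and layers by pointer.
type PointerManifest struct {
	Config *Layer   `json:"config"`
	Layers []*Layer `json:"layers"`
}

// ValueManifest stores its config and layers by value.
type ValueManifest struct {
	Config Layer   `json:"config"`
	Layers []Layer `json:"layers"`
}

// decode unmarshals the same document into both shapes.
func decode(data []byte) (PointerManifest, ValueManifest, error) {
	var p PointerManifest
	var v ValueManifest
	if err := json.Unmarshal(data, &p); err != nil {
		return p, v, err
	}
	if err := json.Unmarshal(data, &v); err != nil {
		return p, v, err
	}
	return p, v, nil
}

func main() {
	// "config" is missing and the first layer is an explicit null.
	p, v, err := decode([]byte(`{"layers":[null,{"digest":"sha256:abc"}]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Config == nil)       // pointer field stays nil; p.Config.Digest would panic
	fmt.Println(p.Layers[0] == nil)    // null element decodes to a nil pointer
	fmt.Println(v.Config.Digest == "") // value field is a usable zero value
	fmt.Println(v.Layers[1].Digest)    // populated element is read normally
}
```

With values, callers can read fields without a nil check. The cost is that a missing entry is indistinguishable from an empty one unless a field such as the digest is inspected.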
Michael Yang
eafc607abb
convert: only extract large files
2024-07-31 15:58:55 -07:00
Michael Yang
5e9db9fb0b
refactor convert
2024-07-31 15:58:33 -07:00
Michael Yang
ec4c35fe99
Merge pull request #5512 from ollama/mxyng/detect-stop
...
autodetect stop parameters from template
2024-07-26 13:48:23 -07:00
Jeffrey Morgan
b3e5491e41
server: collect nested tool call objects when parsing (#5824)
2024-07-22 12:38:03 -04:00
Michael Yang
43606d6d6a
fix parsing tool calls
2024-07-18 12:08:11 -07:00
Michael Yang
b255445557
marshal json automatically for some template values (#5758)
2024-07-17 15:35:11 -07:00
Michael Yang
5fd6988126
parse tool call as individual objects
2024-07-17 11:19:04 -07:00
Michael Yang
5a83f79afd
remove unneeded tool calls
2024-07-16 13:48:45 -07:00
Michael Yang
5afbb60fc4
fix unmarshal type errors
2024-07-16 11:39:34 -07:00
Michael Yang
d02bbebb11
tools
2024-07-15 15:26:16 -07:00
Michael Yang
ebc529cbb3
autodetect stop parameters from template
2024-07-12 16:01:23 -07:00
Michael Yang
dddb58a38b
Merge pull request #5051 from ollama/mxyng/capabilities
...
add model capabilities
2024-07-02 14:26:07 -07:00
Michael Yang
88bcd79bb9
err on insecure path
2024-07-01 15:55:59 -07:00
Michael Yang
58e3fff311
rename templates to template
2024-07-01 10:40:54 -07:00
Michael Yang
123a722a6f
zip: prevent extracting files into parent dirs (#5314)
2024-06-26 21:38:21 -07:00
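A common way to guard against archive entries that escape the extraction directory is to resolve each entry name against the target directory and reject results that fall outside it. The sketch below shows that general technique. `safeJoin` is a hypothetical helper, and the commit's actual implementation may differ.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// safeJoin is a hypothetical helper: it joins an archive entry name onto dir
// and returns an error if the result would land outside dir.
func safeJoin(dir, name string) (string, error) {
	// Join cleans the path, so "a/../../b" collapses before the check.
	target := filepath.Join(dir, name)
	rel, err := filepath.Rel(dir, target)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("insecure path: %q escapes %q", name, dir)
	}
	return target, nil
}

func main() {
	for _, name := range []string{"model/config.json", "../../etc/passwd"} {
		target, err := safeJoin("/tmp/extract", name)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("ok:", target)
	}
}
```

An extraction loop would call such a check on every entry name before creating the file.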
Blake Mizerany
cb42e607c5
llm: speed up gguf decoding by a lot (#5246)
...
Previously, some costly things were causing the loading of GGUF files
and their metadata and tensor information to be VERY slow:
* Too many allocations when decoding strings
* Hitting disk for each read of each key and value, resulting in a
not-okay amount of syscalls/disk I/O.
The show API is now down to 33ms from 800ms+ for llama3 on a macbook pro
m3.
This commit also prevents collecting large arrays of values when
decoding GGUFs (if desired). When such keys are encountered, their
values are null, and are encoded as such in JSON.
Also, this fixes a broken test that was not encoding valid GGUF.
2024-06-24 21:47:52 -07:00
Michael Yang
c16f8af911
fix: multiple templates when creating from model
...
multiple templates may appear in a model if a model is created from
another model that 1) has an autodetected template and 2) defines a
custom template
2024-06-12 13:35:49 -07:00
Michael Yang
d61ef8b954
update create handler to use model.Name
2024-06-04 13:28:25 -07:00
Michael Yang
e40145a39d
lint
2024-06-04 11:13:30 -07:00
Michael Yang
f36f1d6be9
tidy intermediate blobs
2024-05-20 15:15:06 -07:00
Michael Yang
3520c0e4d5
cache and reuse intermediate blobs
...
particularly useful for zipfiles and f16s
2024-05-20 13:25:10 -07:00
Michael Yang
b2f00aa977
close zip files
2024-05-06 15:27:19 -07:00
Michael Yang
f5e8b207fb
s/DisplayLongest/String/
2024-05-06 15:24:01 -07:00
Michael Yang
4d0d0fa383
no iterator
2024-05-06 15:24:01 -07:00
Michael Yang
01811c176a
comments
2024-05-06 15:24:01 -07:00
Michael Yang
9685c34509
quantize any fp16/fp32 model
...
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
2024-05-06 15:24:01 -07:00