Commit graph

19 commits

Author SHA1 Message Date
Peter Tripp
b4659bb44e
Fix inaccurate Ollama context length for qwen2.5 models (#20933)
Since Ollama/llama.cpp do not currently support YaRN for context length
extension, the context length is limited to `32768`, as confirmed by the
Ollama model card.
See the corresponding issue on the Ollama repo:
https://github.com/ollama/ollama/issues/6865

Co-authored-by: Patrick Samson <1416027+patricksamson@users.noreply.github.com>
2024-11-22 10:10:01 -05:00
Thorsten Ball
aee01f2c50
assistant: Remove low_speed_timeout (#20681)
This removes the `low_speed_timeout` setting from all providers as a
response to issue #19509.

The reason is that the original `low_speed_timeout` was only added as part of
#9913 because users wanted to _get rid of timeouts_: they wanted to bump
the default timeout from 5 seconds to something much higher.

Then, in #19055, `low_speed_timeout` was turned into a normal `timeout`,
which is a different thing and breaks slower LLMs that don't deliver a
complete response within the configured time.

So we figured: let's remove the whole thing and replace it with a
default _connect_ timeout to make sure that we can connect to a server
in 10s, but then give the server as long as it wants to complete its
response.
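
For anyone still carrying the old setting, here is a minimal sketch of what is affected, assuming the `language_models` settings location used at the time; the value shown is illustrative:

```jsonc
// Before: users raised this to stop slow local models from being cut off.
// After this change the key is no longer read; connecting is capped at ~10s,
// but the response itself can take as long as the server needs.
{
  "language_models": {
    "ollama": {
      "low_speed_timeout_in_seconds": 120 // no longer honored; safe to delete
    }
  }
}
```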

Closes #19509

Release Notes:

- Removed the `low_speed_timeout` setting from LLM provider settings. It
was only ever used to _increase_ the timeout to give LLMs more time, and
since it has no other use, the setting is removed entirely so LLMs get as
long as they need to respond.

---------

Co-authored-by: Antonio <antonio@zed.dev>
Co-authored-by: Peter Tripp <peter@zed.dev>
2024-11-15 07:37:31 +01:00
Conrad Irwin
02d0561586
Fix read timeout for ollama (#18417)
Supersedes: #18310

Release Notes:

- Fixed `low_speed_timeout_in_seconds` for Ollama
2024-09-27 00:36:17 -06:00
Peter Tripp
7398f795e3
Ollama llama3.2 default context size (#18366)
Release Notes:

- Ollama: Added llama3.2 support
2024-09-25 18:01:12 -04:00
John Cummings
8a7ef4db59
ollama: Add max tokens for qwen2.5-coder (#18290) 2024-09-24 13:17:17 -04:00
Piotr Osiewicz
43e005e936
chore: Remove commented out code following 15446 (#18047)

Release Notes:

- N/A
2024-09-19 02:19:58 +02:00
Piotr Osiewicz
2c8a6ee7cc
remote_server: Remove dependency on libssl and libcrypto (#15446)
Fixes: #15599
Release Notes:

- N/A

---------

Co-authored-by: Mikayla <mikayla@zed.dev>
Co-authored-by: Conrad <conrad@zed.dev>
2024-09-18 23:29:34 +02:00
Daniel Rauber
8660719bd1
ollama: Add context_size for new "yi-coder" model (#17409)
Release Notes:

- Added context_size for the "yi-coder" model in Ollama

More information about the model on ollama:
https://ollama.com/library/yi-coder:9b
2024-09-05 11:05:57 -04:00
Peter Tripp
b62e63349b
Ollama max_tokens settings (#17025)
- Support `available_models` for Ollama (see the sketch below).
- Clamp default max tokens (context length) to 16384.
- Add documentation for Ollama context configuration.
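
A minimal sketch of the `available_models` shape, assuming the `language_models.ollama` settings section and the fields described here; the model name and token count are illustrative:

```jsonc
{
  "language_models": {
    "ollama": {
      "available_models": [
        {
          "name": "llama3.1:70b",  // as reported by `ollama list`
          "max_tokens": 32768      // overrides the clamped 16384 default context length
        }
      ]
    }
  }
}
```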
2024-08-30 08:52:00 -04:00
Peter Tripp
7936fe40ae
ollama: Support model context_size (num_ctx) >2048 (#16877) 2024-08-26 11:09:47 -04:00
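For context, `num_ctx` is the request option this maps to on the Ollama side; without it, Ollama falls back to its 2048-token default window. A minimal sketch of an Ollama `/api/chat` request body (model and value are illustrative):

```jsonc
{
  "model": "llama3.1",
  "messages": [{ "role": "user", "content": "Summarize this file..." }],
  "options": {
    "num_ctx": 16384 // request a context window larger than Ollama's 2048 default
  }
}
```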
Max Brunsfeld
4c390b82fb
Make LanguageModel::use_any_tool return a stream of chunks (#16262)
This PR is a refactor to pave the way for allowing the user to view and
edit workflow step resolutions. I've made tool calls work more like
normal streaming completions for all providers. The `use_any_tool`
method returns a stream of strings (which contain chunks of JSON). I've
also done some minor cleanup of language model providers in general,
removing the duplication around handling streaming responses.

Release Notes:

- N/A
2024-08-14 18:02:46 -07:00
Piotr Osiewicz
874f0c0712
assistant: Use tools in other providers (#15803)
- [x] OpenAI
- [ ] ~Google~ Moved into a separate branch at:
https://github.com/zed-industries/zed/tree/tool-calls-in-google-ai. I ran
into issues with having the API digest our schema without tripping over
itself; the function call parameters come out malformed. We can resume
from that branch if needed.
- [x] Ollama
- [x] Cloud
- [ ] ~Copilot Chat (?)~

Release Notes:

- Added tool calling capabilities to OpenAI and Ollama models (see the request sketch below).
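
Both OpenAI and Ollama accept OpenAI-style function tools in the chat request; a minimal sketch of the `tools` payload, where the tool name and parameters are illustrative only:

```jsonc
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "list_todos", // hypothetical tool
        "description": "List TODO comments in the given file",
        "parameters": {
          "type": "object",
          "properties": {
            "path": { "type": "string" }
          },
          "required": ["path"]
        }
      }
    }
  ]
}
```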
2024-08-06 15:45:47 +02:00
Bennet Bo Fenner
af4b9805c9
assistant: Fix issues when configuring different providers (#15072)
Release Notes:

- N/A

---------

Co-authored-by: Antonio Scandurra <me@as-cii.com>
2024-07-24 11:21:31 +02:00
Mikayla Maki
855048041d
Update http crate name (#15041)
Release Notes:

- N/A
2024-07-23 15:01:05 -07:00
Kyle Kelley
53f702c92f
Allow Ollama Model KeepAlive to be None, defaulting to indefinite (#13059)
Putting this back to `Option<KeepAlive>` to make existing configs keep
working.

Release Notes:

- N/A
2024-06-14 10:33:28 -07:00
Kyle Kelley
d9c21b4eb1
Accept numeric keep alive in Ollama settings (#13046)
This adds the ability to set the keep alive as an integer, including
`-1` for staying alive indefinitely until a new model is loaded or
Ollama exits. I've also set the default to `-1` so that models stay
ready to go for Zed to use.
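
For reference, this corresponds to the Ollama API's `keep_alive` field, which accepts either a duration string like `"5m"` or a number of seconds, with `-1` keeping the model loaded until another model replaces it or Ollama exits. A minimal sketch of a request body (model and message are illustrative):

```jsonc
{
  "model": "llama3",
  "messages": [{ "role": "user", "content": "Hello" }],
  "keep_alive": -1 // keep the model resident indefinitely
}
```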

Release Notes:

- N/A
2024-06-14 09:35:04 -07:00
Marshall Bowers
72dac24acf
Add missing LICENSE file to ollama crate (#12943)
This PR adds a missing LICENSE file to the recently-added `ollama`
crate.

Also added the missing `lints.workspace = true` to the `Cargo.toml`.

Release Notes:

- N/A
2024-06-12 15:12:36 -04:00
Kyle Kelley
bee3441c78
Ollama improvements (#12921)
Attempt to load the model early on when the user has switched the model.

This is a follow up to #12902

Release Notes:

- N/A
2024-06-12 08:10:51 -07:00
Kyle Kelley
4cb8d6f40e
Ollama Provider for Assistant (#12902)
Closes #4424.

A few design decisions that may need some rethinking or later PRs:

* Other providers have a check for authentication. I use this
opportunity to fetch the models, which doubles as a way of finding out
whether the Ollama server is running.
* Ollama has _no_ API for getting the max tokens per model
* Ollama has _no_ API for getting the current token count
https://github.com/ollama/ollama/issues/1716
* Ollama does allow setting the `num_ctx` so I've defaulted this to
4096. It can be overridden in settings.
* Ollama models will be "slow" to start inference because they're
loading the model into memory. It's faster after that. There's no UI
affordance to show that the model is being loaded.

Release Notes:

- Added an Ollama Provider for the assistant. If you have
[Ollama](https://ollama.com/) running locally on your machine, you can
enable it in your settings under:

```jsonc
"assistant": {
    "version": "1",
    "provider": {
      "name": "ollama",
      // Recommended setting to allow for model startup
      "low_speed_timeout_in_seconds": 30,
    }
}
```

Chat like usual

<img width="1840" alt="image"
src="https://github.com/zed-industries/zed/assets/836375/4e0af266-4c4f-4d9e-9d74-1a91f76a12fe">

Interact with any model from the [Ollama
Library](https://ollama.com/library)

<img width="587" alt="image"
src="https://github.com/zed-industries/zed/assets/836375/87433ac6-bf87-4a99-89e1-96a93bf8de8a">

Open up the terminal to download new models via `ollama pull`:


![image](https://github.com/zed-industries/zed/assets/836375/af7ec411-76bf-41c7-ba81-64bbaeea98a8)
2024-06-11 17:35:27 -07:00