This one is tricky. My first thought (which is likely wrong) was that the two endpoints use different LLM model variants (a chat-tuned model vs. a plain completion model).
I didn't check the code, but the API docs say we can specify the model in the call, so nothing stops you from using a chat model with the generate API and a completion model with the chat API.
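To illustrate, here's a minimal sketch of calling both endpoints with the same model name, assuming a local Ollama server on the default port (11434) and a pulled `llama2` model:

```python
import requests

BASE = "http://localhost:11434"  # default Ollama port

# /api/generate takes a raw prompt string
gen = requests.post(f"{BASE}/api/generate", json={
    "model": "llama2",  # the same model name works on both endpoints
    "prompt": "Why is the sky blue?",
    "stream": False,
}).json()
print(gen["response"])

# /api/chat takes a list of role/message objects
chat = requests.post(f"{BASE}/api/chat", json={
    "model": "llama2",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "stream": False,
}).json()
print(chat["message"]["content"])
```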
That leaves me guessing the real difference is mostly about how the input data is prepared. Looking at the completion endpoint, the samples use `[INST]` and `[/INST]` markers to enclose the message, and that wrapping is done for you when you use the Ollama client (I've read there is a template that processes the message). I vaguely suspect those markers relate to how the Llama models were trained (I have absolutely no evidence for this).
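As a rough sketch of what that templating might amount to, here's the `[INST]` wrapping done by hand. The `llama2_prompt` helper is hypothetical, but `"raw": True` is a documented `/api/generate` option that bypasses Ollama's own template so the string is sent as-is (and doesn't get wrapped twice):

```python
import requests

# Hypothetical hand-rolled version of what the template might do for
# Llama 2: wrap the user message in the [INST]...[/INST] markers the
# model was fine-tuned on.
def llama2_prompt(user_msg: str) -> str:
    return f"[INST] {user_msg} [/INST]"

resp = requests.post("http://localhost:11434/api/generate", json={
    "model": "llama2",
    "prompt": llama2_prompt("Why is the sky blue?"),
    "raw": True,  # skip Ollama's templating; send the string as-is
    "stream": False,
}).json()
print(resp["response"])
```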
The chat endpoint, by contrast, takes the standard role/message objects as input, so I can only assume there is yet another template that flattens those messages into some form of `[INST]<msg>[/INST]` string before they reach the model (again, I haven't checked the code on that; see the sketch below).
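Purely as a guess at what such a flattening step could look like (this is NOT Ollama's actual implementation, just an illustration of turning role/message objects into one prompt string):

```python
# Speculative sketch: flatten chat messages into a Llama-2-style prompt.
def flatten_messages(messages: list[dict]) -> str:
    parts = []
    for msg in messages:
        if msg["role"] == "user":
            parts.append(f"[INST] {msg['content']} [/INST]")
        else:
            # assistant turns sit between one [/INST] and the next [INST]
            parts.append(msg["content"])
    return " ".join(parts)

print(flatten_messages([
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi, how can I help?"},
    {"role": "user", "content": "Why is the sky blue?"},
]))
```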