Quick note on using AWS Bedrock with Autogen and LiteLLM
Intention
The intention of this article is to remind myself of the hiccups I encountered while testing AWS Bedrock with Autogen.
Note that the AI landscape and its libraries change rapidly, so this article could become outdated very soon.
(Date of writing and testing: Nov 2023)
Journey / Challenges
First Challenge — Autogen interface with AWS Bedrock
The first challenge is that Autogen is implemented with the OpenAI API in mind, so there is no native support for calling AWS Bedrock (or other LLMs).
The obvious solution is to wrap the target model behind an API that looks like the OpenAI API, so that the openai python package (and thus autogen) can interact with it.
Several packages can do this: the Autogen blog post on using local LLMs suggests FastChat, and there are others like vLLM…
My pick is LiteLLM, which I found in a pull request on the autogen repository. The nice thing about LiteLLM is that it supports AWS Bedrock (while other frameworks tend to focus more on running local LLMs).
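For reference, serving a Bedrock model through LiteLLM's OpenAI-compatible proxy looked roughly like this in my setup (the model id and flags reflect what I used in Nov 2023; treat this as an illustrative sketch, not current syntax):

```shell
# Serve Llama2 13B chat on AWS Bedrock behind an OpenAI-compatible endpoint.
# AWS credentials are read from the environment / ~/.aws as usual.
litellm --model bedrock/meta.llama2-13b-chat-v1 --drop_params
# The proxy starts a uvicorn server, by default on http://0.0.0.0:8000
```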
Second Challenge — package version (openai, autogen, litellm)
As of the time of writing and testing (Nov 2023), OpenAI had recently released v1.0 of openai-python (seemingly 30 Sept 2023), which introduced breaking changes.
My previous tests of autogen were on package version 0.1.x, which implies openai-python version 0.28, but the litellm install expects openai-python 1.x. The obvious solution is to use autogen version 0.2+, which depends on openai v1.x, so both libraries are happy.
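In other words, a fresh environment that keeps both libraries happy looks roughly like this (the version pins are from my Nov 2023 testing, so take them as an assumption rather than a recommendation):

```shell
# autogen 0.2+ depends on openai>=1.x, matching what litellm expects
pip install "pyautogen>=0.2" "openai>=1.0" litellm
```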
Third Challenge — Serving LiteLLM while NOT blocking the Google Colab notebook execution
Running LiteLLM involves a shell command that serves up the API (with uvicorn), which would block further cell execution.
There is a nice extension called “colab-xterm” that helps:
Side note: if one encounters a 403 when running this extension, some cookies are being blocked; here is how to enable them in Chrome:
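The usual pattern with colab-xterm is to install the package, load the extension, and open a terminal pane, then run the LiteLLM serve command inside that terminal so the notebook cells stay unblocked (a sketch of the Colab magics involved):

```python
# In a Colab cell:
!pip install colab-xterm
%load_ext colabxterm
%xterm   # opens a terminal pane; run the litellm command there
```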
Fourth Challenge (which is more of an observation) — passing extra parameters
Note that the command we use to run LiteLLM already includes the switch “drop_params”; this tells LiteLLM to ignore unnecessary parameters being passed to the target model/API (and avoids breaking the call).
A typical config for autogen that most tutorials suggest is as follows:
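A sketch of such a config, adapted to point at the local LiteLLM proxy (the endpoint, model name, and timeout here are assumptions from my setup; most tutorials target the OpenAI API directly):

```python
# Typical Autogen (0.2.x) llm_config, pointed at a local LiteLLM proxy.
config_list = [
    {
        "model": "bedrock/meta.llama2-13b-chat-v1",  # model id served by the proxy (assumption)
        "base_url": "http://localhost:8000",         # LiteLLM's default serve address
        "api_key": "not-needed",                     # the proxy does not check this
    }
]

llm_config = {
    "config_list": config_list,
    "temperature": 0,  # this is the key that later shows up as {temperature: 0}
    "timeout": 120,
}
```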
When we go back to check the xterm output from LiteLLM, we see that the generation is empty. Another interesting finding is that a dictionary {temperature: 0} also shows up in the request.
So I tried removing the temperature key from llm_config:
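Removing the key is a one-liner; a minimal sketch, assuming the llm_config dictionary from before:

```python
llm_config = {
    "config_list": [{"model": "bedrock/meta.llama2-13b-chat-v1",
                     "base_url": "http://localhost:8000",
                     "api_key": "not-needed"}],
    "temperature": 0,
    "timeout": 120,
}

# Drop the temperature key so it is not forwarded to the model
llm_config.pop("temperature", None)
```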
Note that, with or without this key, the quality of the results does not improve: Llama2 13B chat is simply not a great model for Autogen agent conversation (while… there are countries on Earth where the Claude model is not supported…).
Last observation — Model performance of Llama2 13B chat
Checking the issue discussions on the Autogen GitHub about non-OpenAI LLM performance in general, the common conclusion is that there are issues with role selection in group discussions, as well as a need for prompt improvements.
From my own experiments with the Llama2 model, I think its ability to retain conversation context does not match the level of the OpenAI or Claude models, so we might need to craft very specific prompts (which implies we cannot use Autogen's built-in agents and would need to write our own agent with its own prompt template).
Also, the model's 4k context window makes it almost impossible to sustain any agent conversation (which easily exceeds 4k tokens within one or two rounds of conversation among agents).
Conclusion
The conclusion is simple: as long as I am in a country that cannot use Claude yet, I will not further explore AWS Bedrock's Llama 2 chat, and will stick with OpenAI.