Quick note on using AWS Bedrock with Autogen and LiteLLM
Intention
The intention of this article is to remind myself of the hiccups I encountered while testing AWS Bedrock with Autogen.
Note that the AI landscape and its libraries change rapidly, so this article could become outdated very soon.
(Date of writing and testing: Nov 2023)
Journey / Challenges
First Challenge — Autogen interface with AWS Bedrock
The first challenge is that Autogen is implemented with the OpenAI API in mind, so there is no native support for calling AWS Bedrock (or other LLMs).
The obvious solution is to wrap the target model behind an API that looks like the OpenAI API, so that the openai python package (and thus autogen) can interact with it.
Several packages can do this: the Autogen blog post on using local LLMs suggests FastChat, and there are others like vLLM…
My pick is LiteLLM, which I found in a pull request on the autogen repository. The nice thing about LiteLLM is that it supports AWS Bedrock (while other frameworks tend to focus more on running local LLMs).
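For reference, serving a Bedrock model through LiteLLM's OpenAI-compatible proxy looked roughly like this in my setup (the model id and flags reflect what I used in Nov 2023; treat this as an illustrative sketch, not current syntax):

```shell
# Serve Llama2 13B chat on AWS Bedrock behind an OpenAI-compatible endpoint.
# AWS credentials are read from the environment / ~/.aws as usual.
litellm --model bedrock/meta.llama2-13b-chat-v1 --drop_params
# The proxy starts a uvicorn server, by default on http://0.0.0.0:8000
```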
Second Challenge — package version (openai, autogen, litellm)
As of the time of writing and testing (Nov 2023), OpenAI had recently released v1.0 of openai-python (seemingly 30 Sept 2023), which introduced breaking changes.
My previous tests of autogen were on package version 0.1.x, which implies openai-python version 0.28, but the litellm install expects openai-python 1.x. The obvious solution is to use autogen version 0.2+, which depends on openai v1.x, so both libraries are happy.
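In other words, a fresh environment that keeps both libraries happy looks roughly like this (the version pins are from my Nov 2023 testing, so take them as an assumption rather than a recommendation):

```shell
# autogen 0.2+ depends on openai>=1.x, matching what litellm expects
pip install "pyautogen>=0.2" "openai>=1.0" litellm
```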
Third Challenge — Serving LiteLLM while NOT blocking the Google Colab notebook execution
Running LiteLLM involves a shell command that serves up the API (with uvicorn), which would block further cell execution.
There is a nice extension called “colab-xterm” that helps:
Side note: if one encounters a 403 when running this extension, some cookies are being blocked; here is how to enable them in Chrome:
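The usual pattern with colab-xterm is to install the package, load the extension, and open a terminal pane, then run the LiteLLM serve command inside that terminal so the notebook cells stay unblocked (a sketch of the Colab magics involved):

```python
# In a Colab cell:
!pip install colab-xterm
%load_ext colabxterm
%xterm   # opens a terminal pane; run the litellm command there
```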
Fourth Challenge (which is more of an observation) — passing extra parameters
Note that the command we use to run LiteLLM already includes the switch “drop_params”; this tells LiteLLM to ignore unnecessary parameters being passed to the target model/API (and avoids breaking the call).
A typical config for autogen that most tutorials suggest is as follows:
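A sketch of such a config, adapted to point at the local LiteLLM proxy (the endpoint, model name, and timeout here are assumptions from my setup; most tutorials target the OpenAI API directly):

```python
# Typical Autogen (0.2.x) llm_config, pointed at a local LiteLLM proxy.
config_list = [
    {
        "model": "bedrock/meta.llama2-13b-chat-v1",  # model id served by the proxy (assumption)
        "base_url": "http://localhost:8000",         # LiteLLM's default serve address
        "api_key": "not-needed",                     # the proxy does not check this
    }
]

llm_config = {
    "config_list": config_list,
    "temperature": 0,  # this is the key that later shows up as {temperature: 0}
    "timeout": 120,
}
```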
When we go back to check the xterm output from LiteLLM, we see that the generation is empty. Another interesting finding is that a dictionary {temperature: 0} also shows up in the request.
So I tried removing the temperature key from llm_config:
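Removing the key is a one-liner; a minimal sketch, assuming the llm_config dictionary from before:

```python
llm_config = {
    "config_list": [{"model": "bedrock/meta.llama2-13b-chat-v1",
                     "base_url": "http://localhost:8000",
                     "api_key": "not-needed"}],
    "temperature": 0,
    "timeout": 120,
}

# Drop the temperature key so it is not forwarded to the model
llm_config.pop("temperature", None)
```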
Note that, with or without this key, the quality of the results does not improve: Llama2 13B chat is simply not a great model for Autogen agent conversation (while… there are countries on Earth where the Claude model is not supported…).
Last observation — Model performance of Llama2 13B chat
Checking the issue discussions on the Autogen GitHub about non-OpenAI LLM performance in general, the common conclusion is that there are issues with role selection in group discussions, as well as a need for prompt improvements.
From my own experiments with the Llama2 model, I think its ability to retain conversation context does not match the level of the OpenAI or Claude models, so we might need to craft very specific prompts (which implies we cannot use Autogen's built-in agents and would need to write our own agent with its own prompt template).
Also, the model's 4k context window makes it almost impossible to sustain any agent conversation (which easily exceeds 4k tokens within one or two rounds of conversation among agents).
Conclusion
The conclusion is simple: as long as I am in a country that cannot use Claude yet, I will not further explore AWS Bedrock's Llama 2 chat, and will stick with OpenAI.