A deeper dive into AI models in the Open WebUI environment
What is this article about?
This article shows how you can customize AI models in the Open WebUI interface, how different hardware affects different models, and how the models compare to one another in speed, quality, and use cases. We will also look at how larger parameter sizes compare to smaller ones. If you want to try this yourself but don't have Open WebUI installed yet, you can learn how to set it up in our install guide.
Here are a few examples comparing the models
Note: The hardware used for these examples is listed in the "Tools that we're using" section of the article.
Models used: Llama 3.2:3b and Gemma3:12b
Prompt given:
Can you tell me 3 jokes?
Expected result: Llama 3.2:3b answers a lot faster because it is a smaller model and puts less load on your system.
Recording of the comparison:
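If you want to measure the speed difference yourself instead of eyeballing a recording, you can time the models through Ollama's REST API. This is a minimal sketch, assuming Ollama's default localhost:11434 address and the two models used above:

```python
import time
import requests

PROMPT = "Can you tell me 3 jokes?"

def time_model(model: str) -> float:
    # Wall-clock time from sending the prompt until the full answer arrives.
    # Note: the first request for a model also includes its load time.
    start = time.perf_counter()
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    return time.perf_counter() - start

for model in ["llama3.2:3b", "gemma3:12b"]:
    print(f"{model}: {time_model(model):.1f} s")
```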
Note: The hardware used for these examples is listed in the "Tools that we're using" section of the article.
Models used: Gemma3:12b and deepseek-r1:1.5b
Prompt given:
Pretend you are a customer service representative. A customer is complaining about a broken product. Respond to their complaint.
Expected result: A response addressing the problem with the broken product and a solution for it.
Screenshot of the answers side by side: You can see that Gemma3 gave a more extensive answer with several variations, and it added notes at the end on how to improve them.
Note: The hardware used for these examples is listed in the "Tools that we're using" section of the article.
Models used: Deepseek-r1:8b and Gemma3:1b
Prompts given:
Prompt 1: Summarize this text for me
Prompt 2: A farmer has 12 cows, 25 chickens, and 8 pigs. If each cow produces 5 gallons of milk per day, and each chicken lays 2 eggs per day, how many total eggs and gallons of milk does the farmer have at the end of a single day? Show your working
Expected result: A short summary of the given text, and for the math prompt an answer of 60 gallons of milk (12 cows × 5 gallons) and 50 eggs (25 chickens × 2 eggs). The 8 pigs produce neither and are only there as a distractor.
Screenshot of the answers: Below you can see that Deepseek got the math right, while Gemma hallucinated, counting the pigs toward the totals, and ended up with a wrong final answer.
For the text summarization test below, Gemma had no issues, since summarization is one of its main use cases.
Here are overviews of 3 popular models
Deepseek is a large language model geared toward data analysis, mathematical computation, and software development. It is available in Ollama in 7 parameter sizes: 1.5b, 7b, 8b, 14b, 32b, 70b, and the largest, 671b.
I tested execution time and response quality on different Deepseek-r1 parameter sizes, using the 1.5b and 8b versions. Both ran acceptably on my computer.
Note: These test results may differ on your system, since many variables affect them. The hardware used is listed in the "Tools that we're using" section of the article.
Prompt given: This prompt measures the time from submitting the prompt until the answer is finished.
Summarize the main arguments of the declaration of independence in 3-4 sentences.
Expected result: 3-4 sentences and a response time of 3-7 seconds.
Recording of the results: Below you can see that the difference in response speed is about 7 seconds, but the answer deepseek-r1:8b gives is more informative. Deepseek runs a visible "thought" process before the response, so results vary with the length of that thinking phase.
Gemma is a large language model built for multitasking, and it excels at tasks like summarization and reasoning. Gemma3 is available in Ollama in 1b, 4b, 12b, and 27b parameter sizes. Gemma3 also has a cost-efficiency advantage: Google positions it as the most capable model that can run on a single graphics card.
I tested execution time and response quality on different Gemma3 parameter sizes, using the 4b and 12b versions. The 4b model ran well since it is on the smaller side, but the 12b had performance issues because it calls for a stronger system.
Note: These test results may differ on your system, since many variables affect them. The hardware used is listed in the "Tools that we're using" section of the article.
Prompt given: This prompt measures the time from submitting the prompt until the answer is finished.
Summarize the main arguments of the declaration of independence in 3-4 sentences.
Expected result: 3-4 sentences and a response time of 3-7 seconds.
Recording of the results: Below you can see that the difference in response speed is about 20 seconds. The 12b output is noticeably slower: it can run on this system, but it is recommended to run it on stronger hardware.
Llama 3.2 is a large language model built for multitasking, and it excels at tasks like summarization and text rewriting. It is available in Ollama in 2 parameter sizes, 1b and 3b. There is also a variant called llama3.2-vision, which comes in 11b and 90b parameter sizes.
I tested execution time and response quality on different Llama 3.2 parameter sizes, using both the 1b and 3b versions. Both ran well, as they are comparatively small models.
Note: These test results may differ on your system, since many variables affect them. The hardware used is listed in the "Tools that we're using" section of the article.
Prompt given: This prompt measures the time from submitting the prompt until the answer is finished. It also shows the difference in response quality.
Summarize the main arguments of the declaration of independence in 3-4 sentences.
Expected result: 3-4 sentences and a response time of 3-7 seconds.
Recording of the results: Below you can see that the difference in response speed is about 5 seconds. There is also a slight quality difference: the 3b model includes dates and named people in its response.
What you’ll learn
- How to operate the Open WebUI web interface
- How to customize AI models in the Open WebUI web interface
- How different AI models compare to one another
- What your system needs to run different models
Tools that we’re using
- Ollama: Used to download and run AI models in different environments.
- Open WebUI: A web interface for using the AI models in your browser, where you can chat with the models directly and run several of them at the same time.
- Computer: Intel Core i7-6700 processor, NVIDIA GTX 1080 graphics card with 8GB of VRAM, and 64GB of system RAM.
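Before running any comparisons, you can check that Ollama is reachable and see which models you have downloaded. A minimal sketch using Ollama's REST API, assuming the default localhost:11434 address:

```python
import requests

# Ask the local Ollama server which models are available.
resp = requests.get("http://localhost:11434/api/tags", timeout=10)
resp.raise_for_status()

for model in resp.json()["models"]:
    # "name" is the tag you select in Open WebUI, e.g. "llama3.2:3b"
    print(model["name"])
```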
Why should you try this?
- Models: Different models give you different results. Some are better for certain tasks like calculation or reasoning.
- Customization: You get to try different AI models for different tasks and you can compare them side by side.
- Future of AI: AI will be a part of our future without a doubt. So having information about how different models work might be useful.
Notes before you start
- Operating platform: You’ll need an updated Open WebUI web interface environment with at least 2 AI models downloaded.
- Results: Performance may differ from my tests depending on your hardware.
- Graphics card: Not mandatory, but recommended for faster processing speed.
- Updates: AI models and platforms update at a rapid pace so this guide might not be 100% up to date.
Here is an overview of how system requirements work.
The terms to understand are "RAM" and "VRAM", the two most important factors in the system requirements for running AI models.
RAM is short for random-access memory. When running AI models, it is used for pre-processing and post-processing the data.
VRAM is short for video random-access memory, the memory on your graphics card. It handles data at much faster speeds than RAM, and if there is not enough VRAM for the model, the overflow is loaded into RAM instead.
The processes you run with AI models use either RAM or VRAM. If neither is big enough for the process, it simply won't work.
Here are some VRAM recommendations for models like deepseek-r1, llama3.2, and gemma3.
Note: These are just recommendations and they might not be 100% accurate, because of variations.
- Deepseek-r1:8b: recommended VRAM 8GB
- Llama3.2:3b: recommended VRAM 7GB
- Gemma3:4b: recommended VRAM 9GB
In summary: to run a model on the GPU you need enough VRAM, and to run it on the CPU you need enough RAM. Processing is a lot faster on the GPU, because it is built to handle this kind of data at speed. On the computer listed in the "Tools that we're using" section of the article, the typical difference in answer speed with and without the GPU is anywhere from 30 seconds to several minutes, depending on the model and the prompt.
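As a rough sanity check on recommendations like the ones above, you can estimate a model's memory footprint from its parameter count. This is only a back-of-envelope sketch: the bytes-per-parameter figure assumes a 4-bit quantization like Ollama commonly ships, and the overhead multiplier for the context cache and runtime buffers is an assumption, not an exact number:

```python
# Rough VRAM estimate for a quantized model.
# Assumptions: ~0.6 bytes per parameter (4-bit weights plus quantization
# metadata) and ~1.2x overhead for the context cache and runtime buffers.
# Real usage varies with quantization type and context length.

def estimate_vram_gb(params_billions: float,
                     bytes_per_param: float = 0.6,
                     overhead: float = 1.2) -> float:
    return params_billions * bytes_per_param * overhead

for name, size_b in [("deepseek-r1:8b", 8), ("llama3.2:3b", 3), ("gemma3:4b", 4)]:
    print(f"{name}: ~{estimate_vram_gb(size_b):.1f} GB")
```

These estimates land below the recommendations above, which makes sense: the recommendations leave headroom for longer contexts and for other programs using the graphics card.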
The Open WebUI web interface makes it possible to compare multiple models at the same time, which is handy when you want several answers to choose from. It takes 3 simple steps.
Note: This demands more from your system, because you are running multiple models at once.
Step 1: Press New Chat in the upper-left corner of your Open WebUI page
Step 2: Add the models you want to compare from the + icon next to your chosen model
Step 3: Write a question or enter a prompt in the chat and you will see a side-by-side comparison
How to navigate to model settings in Open WebUI web interface:
Step 1: Click the profile icon to go to admin panel
Step 2: Go to settings and models
Step 3: Click the pen icon on the model you want to customize
Step 4: You should see the system prompt field under Model Params. Write the prompt you want the model to follow
Step 5: Click Save & Update at the bottom. The model is saved with your system prompt and should answer with it in mind
In the model settings you can customize everything from the model name to custom prompts. Setting a system prompt makes the model act as a specific character or persona, which gives its answers a distinct voice.
Example prompt:
you are my assistant called Donkey Kong
Then ask the model "who are you?" The answer should be something like "Wahoo, I'm Donkey Kong, a super strong ape, and your assistant now". The model will also carry the Donkey Kong persona into every prompt you give it.
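The same system prompt behavior can be reproduced outside the interface, which is handy for testing. A minimal sketch using Ollama's chat endpoint, where the "system" message plays the role of the system prompt field in Open WebUI (the model name is just an example; use any model you have downloaded):

```python
import requests

payload = {
    "model": "llama3.2:3b",  # example model; any downloaded model works
    "messages": [
        # Same role as the system prompt field in Open WebUI's model settings.
        {"role": "system", "content": "you are my assistant called Donkey Kong"},
        {"role": "user", "content": "who are you?"},
    ],
    "stream": False,
}

resp = requests.post("http://localhost:11434/api/chat", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```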
Every model also has advanced params you can adjust based on what you use the model for. These are the 5 steps for reaching the advanced params.
Step 1: Click your profile icon and select admin panel
Step 2: Select settings and then models
Step 3: Select the pen icon next to your model that you want to customize
Step 4: Click Show next to Advanced Params and adjust the parameters you want to change
Step 5: After making your changes click "Save & Update" from the bottom of the page
Let's use the advanced param called "temperature" as an example.
Temperature controls the creativity of the answer: the higher the value, the more creative the output.
For non-creative tasks like translation or categorization, 0-0.3 works well. For creative tasks like marketing copy, the default range of 0.7-1 is fine.
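Temperature can also be overridden per request instead of in the model settings. A minimal sketch against Ollama's generate endpoint; the model name and the two example values are assumptions to illustrate the low-versus-high contrast:

```python
import requests

def generate(prompt: str, temperature: float) -> str:
    payload = {
        "model": "gemma3:4b",  # example model; any downloaded model works
        "prompt": prompt,
        "stream": False,
        # Per-request override of the same "temperature" advanced param
        # you can set in Open WebUI's model settings.
        "options": {"temperature": temperature},
    }
    resp = requests.post("http://localhost:11434/api/generate",
                         json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()["response"]

# Low temperature for a predictable task, high for a creative one.
print(generate("Categorize 'apple' as a fruit or a vegetable.", 0.2))
print(generate("Write a playful slogan for a coffee shop.", 1.0))
```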
Notes before continuing: There are 3 concepts you should know before customizing how your models answer from documents.
RAG is short for retrieval-augmented generation. RAG is a technique for enhancing large language models with external data sources such as text files: the model searches the document for relevant information and uses it to generate a more accurate and relevant response, instead of relying only on its trained data.
A RAG template is a set of rules that controls how the model retrieves and uses information from a given document, such as a text file.
A reranking model is a tool often used with RAG to improve retrieval results by putting the most relevant documents first.
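To make the retrieve-then-generate idea concrete, here is a deliberately minimal sketch. The keyword-overlap scoring stands in for the embedding search and reranking that Open WebUI actually performs, and the model name and sample chunks are just examples:

```python
import requests

# Toy document store: in Open WebUI this would be your uploaded files,
# split into chunks and indexed with embeddings instead of keywords.
chunks = [
    "Our office is open Monday to Friday, 9:00 to 17:00.",
    "Support tickets are answered within two business days.",
    "The cafeteria serves lunch between 11:00 and 13:00.",
]

def retrieve(question: str) -> str:
    # Score each chunk by the words it shares with the question (toy retrieval).
    words = set(question.lower().split())
    return max(chunks, key=lambda c: len(words & set(c.lower().split())))

def answer(question: str) -> str:
    # Put the retrieved chunk into the prompt so the model answers
    # from the document instead of its trained data.
    prompt = (
        "Answer the question using only this context.\n"
        f"Context: {retrieve(question)}\n"
        f"Question: {question}"
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.2:3b", "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(answer("When is lunch served?"))
```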
Document settings are used when you upload a document that the model should search for information. You can find these settings in 2 steps.
Step 1: Click your profile icon and head to admin panel
Step 2: Click settings and select documents
Summary of what you've done
You have learned how to use and compare different AI models, picked up the basics of their system requirements, and seen how to customize models for different situations. Now you can use what you've learned to pick the right model for the task at hand.