Backyard AI is primarily used with LLaMa models that have been fine-tuned for conversations.
These models have been trained to continue a back-and-forth dialogue between two or more parties. The model generates text until it is about to start the user’s response, at which point it stops and allows you to add the user’s side of the chat.
Base models are foundational large language models that have been trained on a wide variety of text data. The most common base model found on Backyard AI is LLaMa, developed by Meta.
The base model comes in several sizes, each one with a different number of parameters (e.g. 8B, 70B). A higher parameter count means more nuanced and accurate responses, while also requiring significantly more processing power.
Other base model architectures, including Mistral, Phi, and others, are also available on Backyard AI.
Fine-tuned models extend the capabilities of their base model. They are trained (via curated datasets) to generate text in a specific style, such as instruction-following, programming assistance, or roleplay.
Each model in the Backyard AI Model Manager is a fine-tuned model (the base model is never run directly). Each one has been trained on conversational datasets by various third parties.
Large Language Models (LLM)s generate text by calculating which tokens are most likely to come next based on a given input sequence. This calculation requires that words be converted to numbers, i.e. "tokens". A token is approximately 3-4 letters.
Here is a visualization of text broken into tokens:
The exact set of tokens processed when generating the next token is called the "context".
Models can only process a certain amount of context at once. This is what people are referring to when they talk about "context size" or "model memory".
When your conversation history starts to exceed the context window, the older messages are removed incrementally from the model context. If you find the model forgetting something you talked about an hour ago, this is why.