Models
Vello provides access to leading models for a variety of use cases through our inference engine.
OpenAI Models
- GPT 3.5 Turbo (openai/gpt-3.5-turbo): Great for most day-to-day use.
- GPT 4 (openai/gpt-4): The best model for deeper reasoning, creativity, and challenging coding tasks.
- GPT 4 Classic (openai/gpt-4-classic): The pre-Turbo GPT-4 model, useful for compatibility or legacy behavior.
- GPT 4 Vision (openai/gpt-4-vision-preview): This model can see, understand, and describe what is in an image.
Anthropic Models
- Claude 2.1 (anthropic/claude-2): The best model for longer contexts (~300 pages) and more natural-sounding writing.
- Claude 3 Opus (anthropic/claude-3-opus): Anthropic’s new premium model, beats GPT-4 on most benchmarks.
- Claude 3 Sonnet (anthropic/claude-3-sonnet): High-quality fast model, beats GPT-3.5 on most benchmarks.
- Claude 3 Haiku (anthropic/claude-3-haiku): Anthropic’s smallest and fastest model, balancing quality with speed.
Perplexity Models
- Perplexity Web Fast (perplexity/pplx-7b-online): Fast web search model. Great for quick answers and web searches.
- Perplexity Web (perplexity/pplx-70b-online): Deeper web search model. Great for harder questions and research.
- Perplexity Fast (perplexity/pplx-7b-chat): Fast chat model, based on Mistral 7B.
- Perplexity (perplexity/pplx-70b-chat): Deeper chat model, based on Llama 2 70B.
Gryphe Models
- Mytho Mist (gryphe/mythomist-7b): Mixture of models, great for character generation.
- MythoMax (gryphe/mythomax-l2-13b): MythoMax 13B.
Nous Research Models
- Nous Hermes (nousresearch/nous-hermes-llama2-13b): Mixture of models, great for character generation.
Fireworks Models
- Mixtral Chat (fireworks/mixtral-8x7b-fw-chat): Mistral’s mixture-of-experts model, great for character generation.
- Fireworks Function Call (fireworks/models/fw-function-call-34b-v0): Open Source Function Calling Model from Fireworks.
Google Models
- Gemini Pro (google/gemini-pro): On par with GPT-4 for coding, complex reasoning, and other difficult queries, but with a more human feel.
Mistral Models
- Mistral Medium (mistral/mistral-medium): Mistral’s best model, on par with GPT-4 in performance.
- Mistral Small (mistral/mistral-small): Mid-tier offering based on Mixtral 8x7B. Fast, yet capable model.
- Mistral Tiny (mistral/mistral-tiny): Fastest, smallest model, great for most simple tasks.
- Mistral Large (mistral/mistral-large, mistral/mistral-large-latest): Premium model, rivals GPT-4 in performance.
Premium Models
A subset of models are designated as premium; they offer better performance at a higher per-token cost. The premium models are GPT 4, Claude 2.1, Claude 3 Opus & Sonnet, and Mistral Large. All paid plans include access to premium models at varying capacity. See our plans page for more details.
Context Size
Understanding the context size offered for each AI model in the Vello suite is crucial for getting the best performance out of your tasks. These are the current context sizes, in tokens, for each model:
- GPT-4 & Variants: 3,000 tokens (Pro Plan: 4,000 tokens)
- Claude 3 Series:
  - Claude 3 Opus: 4,000 tokens (Pro Plan: 6,000 tokens)
  - Claude 3 Sonnet: 100,000 tokens (Pro Plan: 150,000 tokens)
  - Claude 3 Haiku: 200,000 tokens
- GPT-3.5 Turbo & Variants: up to 16,384 tokens
- Gemini Pro: 32,000 tokens
- Claude 2: 100,000 tokens
- Mistral & Davinci Codes: varies, up to 8,000 tokens
These token limits currently apply across all premium plans. We plan to increase the context size for all models over time as we better understand our costs and as API costs decrease.
If you need the full context size of any model, the Flex Plan offers pay-as-you-go pricing and full contexts for all models at cost.
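As a rough sketch of how these limits might be applied before sending a request (the limits are copied from the list above; `fits_in_context` and the reserve value are illustrative assumptions, not part of Vello's API):

```python
# Context limits in tokens, from the list above (non-Pro values)
CONTEXT_LIMITS = {
    "openai/gpt-4": 3000,
    "anthropic/claude-3-haiku": 200_000,
    "google/gemini-pro": 32_000,
}

def fits_in_context(model: str, prompt_tokens: int, reserve_for_output: int = 500) -> bool:
    # The prompt and the reply share one window, so leave headroom for the output
    return prompt_tokens + reserve_for_output <= CONTEXT_LIMITS[model]

print(fits_in_context("openai/gpt-4", 2000))  # True: 2000 + 500 <= 3000
print(fits_in_context("openai/gpt-4", 2800))  # False: 2800 + 500 > 3000
```

A prompt that fails this check would need a longer-context model (or the Flex Plan's full context) rather than the capped window.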
Flex Pricing
Flex is an option for users who need more volume or larger context sizes than the Plus or Pro plans provide. It offers pay-as-you-go pricing and full context sizes for all models at cost.
Tokens are pieces of words; 100 tokens correspond to approximately 75 words.
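That ratio makes for a quick back-of-the-envelope conversion (a minimal sketch; `estimate_tokens` is a hypothetical helper based only on the approximation above, not a real tokenizer):

```python
def estimate_tokens(word_count: int) -> int:
    # 100 tokens ≈ 75 words, so tokens ≈ words * 100 / 75
    return round(word_count * 100 / 75)

print(estimate_tokens(75))    # -> 100
print(estimate_tokens(1500))  # -> 2000
```

Actual token counts depend on the model's tokenizer, so treat this as an estimate only.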
OpenAI Models
- GPT 4
  - Input Tokens: $0.00003
  - Output Tokens: $0.00006
- GPT 4 Classic
  - Input Tokens: $0.00006
  - Output Tokens: $0.00012
- GPT 4 Vision
  - Input Tokens: $0.00001
  - Output Tokens: $0.00003
Anthropic Models
- Claude 2.1
  - Input Tokens: $0.000008
  - Output Tokens: $0.000024
- Claude 3 Opus
  - Input Tokens: $0.000015
  - Output Tokens: $0.000075
- Claude 3 Sonnet
  - Input Tokens: $0.000003
  - Output Tokens: $0.000015
- Claude 3 Haiku
  - Input Tokens: $0.00000025
  - Output Tokens: $0.00000125
Perplexity Models
- Perplexity Web Fast
  - Input Cost per Request: $0.005 (fixed)
  - Output Tokens: $0.00000028
- Perplexity Web
  - Input Cost per Request: $0.005 (fixed)
  - Output Tokens: $0.0000028
- Perplexity Fast
  - Input Tokens: $0.00000007
  - Output Tokens: $0.00000028
- Perplexity
  - Input Tokens: $0.0000007
  - Output Tokens: $0.0000028
Gryphe Models
- Mytho Mist
  - Input Tokens: $0.000001875
  - Output Tokens: $0.000001875
- MythoMax
  - Input Tokens: $0.000001875
  - Output Tokens: $0.000001875
Nous Research Models
- Nous Hermes
  - Input Tokens: $0.0000002
  - Output Tokens: $0.0000002
Google Models
- Gemini Pro
  - Input Tokens: $0.00000025
  - Output Tokens: $0.0000005
Mistral Models
- Mistral Medium
  - Input Tokens: $0.0000027
  - Output Tokens: $0.0000081
- Mistral Small
  - Input Tokens: $0.000002
  - Output Tokens: $0.000006
- Mistral Tiny
  - Input Tokens: $0.00000015
  - Output Tokens: $0.00000046
- Mistral Large
  - Input Tokens: $0.000008
  - Output Tokens: $0.000024
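Putting the per-token rates above to use, here is a minimal sketch of estimating the cost of a single request (rates are copied from the tables above for a few models; `estimate_cost` is an illustrative helper, not part of Vello's API, and it does not cover the Perplexity Web models' fixed per-request input fee):

```python
# (input price, output price) per token, from the tables above
PRICES = {
    "openai/gpt-4": (0.00003, 0.00006),
    "anthropic/claude-3-opus": (0.000015, 0.000075),
    "mistral/mistral-large": (0.000008, 0.000024),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES[model]
    return input_tokens * in_price + output_tokens * out_price

# 1,000 input tokens + 500 output tokens on GPT 4:
cost = estimate_cost("openai/gpt-4", 1000, 500)
print(f"${cost:.4f}")  # 1000*0.00003 + 500*0.00006 -> $0.0600
```

Since output tokens cost more than input tokens for most models here, long completions dominate the bill for chat-heavy workloads.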