Anthropic enables prompt caching in Claude to reduce development costs

  • August 16, 2024

Prompt Caching, a new feature that allows developers to save repeatedly used context between API calls, is now available on the Anthropic API.

Prompt caching lets customers provide Claude with more background information and example outputs while reducing the cost of long prompts by up to 90 percent and latency by up to 85 percent. The feature is now in public beta for Claude 3.5 Sonnet and Claude 3 Haiku, and Anthropic says support for Claude 3 Opus is coming soon.
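As a rough sketch of what this looks like in practice, assuming the anthropic Python SDK and the beta header Anthropic documented at launch (LONG_DOCUMENT is a placeholder for the reusable context, not something from the article):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_DOCUMENT = "..."  # placeholder: the large, reusable context to be cached

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    # opt-in header required while prompt caching is in public beta
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    system=[
        {"type": "text", "text": "You are a helpful assistant."},
        {
            "type": "text",
            "text": LONG_DOCUMENT,
            # marks the prompt prefix up to and including this block for caching
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Summarize the key points."}],
)
print(response.content[0].text)
```

Later requests that repeat the same prefix can then be served from the cache instead of being reprocessed from scratch.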

Applications of prompt caching

Prompt caching is useful when a large amount of context is sent once and then reused across subsequent requests. Examples include:

  • Conversational agents: Reduce costs and latency in longer conversations, especially those involving complex instructions or attached documents.
  • Coding assistants: Improve autocompletion and codebase Q&A by keeping a summarized version of the codebase in the prompt.
  • Large document processing: Include complete long-form texts and images in the prompt without increasing response latency.
  • Detailed instruction sets: Share extensive instructions and examples to fine-tune Claude’s answers, with room for dozens of high-quality examples thanks to caching.
  • Agentic search and tool use: Improve performance in scenarios that require multiple tool calls and iterative changes.
  • Interacting with books, essays, and other long content: Embed an entire document in the prompt and let users ask questions about it (see the sketch after this list).
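For that last use case, a minimal sketch under the same assumptions as above (BOOK_TEXT stands in for the full document): the first request writes the document to the cache, and each follow-up question reads it back at the discounted rate.

```python
import anthropic

client = anthropic.Anthropic()
BOOK_TEXT = "..."  # placeholder: the full text of a long document

for question in ["Who is the narrator?", "How does the story end?"]:
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=512,
        extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
        system=[{
            "type": "text",
            "text": BOOK_TEXT,
            "cache_control": {"type": "ephemeral"},  # cache the document prefix
        }],
        messages=[{"role": "user", "content": question}],
    )
    # usage shows how the prompt was billed: the first call reports
    # cache_creation_input_tokens, subsequent calls cache_read_input_tokens
    print(response.usage)
    print(response.content[0].text)
```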

Early adopters report substantial speed improvements and cost savings across use cases ranging from ingesting entire knowledge bases to multi-turn conversations. For a prompt of 100,000 tokens, Anthropic cites a 90 percent reduction in cost and a 79 percent reduction in latency.

Pricing structure for cached prompts

Cached prompts are priced based on the number of tokens stored and how often they are reused. Writing content to the cache costs 25 percent more than the standard per-token input price, while reading it back is significantly cheaper, at just 10 percent of that price.

For example, Claude 3.5 Sonnet offers a 200,000-token context window and charges $3 per million input tokens, with cache writes priced at $3.75 per million tokens and cache reads at $0.30 per million tokens. Claude 3 Haiku is the fastest and most cost-effective option, with input priced at $0.25 per million tokens and cache reads at $0.03 per million tokens.
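A quick back-of-the-envelope calculation with the Sonnet figures above shows how the savings compound with reuse (the prompt size and call count here are illustrative, and the cached prefix is assumed to stay warm between calls):

```python
# Claude 3.5 Sonnet prices quoted above, in dollars per token
INPUT = 3.00 / 1_000_000
CACHE_WRITE = 3.75 / 1_000_000
CACHE_READ = 0.30 / 1_000_000

PROMPT_TOKENS = 100_000  # illustrative: a large shared prompt prefix
CALLS = 10               # illustrative: how often that prefix is reused

uncached = CALLS * PROMPT_TOKENS * INPUT
cached = PROMPT_TOKENS * CACHE_WRITE + (CALLS - 1) * PROMPT_TOKENS * CACHE_READ

print(f"uncached: ${uncached:.2f}")              # $3.00
print(f"cached:   ${cached:.2f}")                # about $0.65
print(f"saved:    {1 - cached / uncached:.1%}")  # 78.5%, approaching 90% with more reuse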

With prompt caching, Anthropic is emphasizing cost savings and efficiency for users of its API, with a clear focus on its advanced Claude models.

Source: IT Daily
