How to Evaluate LLMs Amid Disruptions Like DeepSeek

Focus on agility, ethical integrity and value delivery amid rapid AI evolution.

Key considerations for evaluating LLMs and deployment strategies

As the large language model (LLM) market rapidly evolves — with disruptors like DeepSeek being the latest to intensify competition and upend cost assumptions — users still need to evaluate LLMs based on business impact. That means assessing LLMs to ensure scalability, efficiency and long-term value.

Evaluating LLMs solely on cost can lead to misalignment with long-term business objectives. Several critical factors should guide your assessments.


LLM pricing trends and their short-term impact

In mid-2024, when DeepSeek, ByteDance and others first announced steep price cuts, Gartner predicted that by 2027 the average price of GenAI APIs would fall to less than 1% of the mid-2024 average, while those APIs would maintain the same levels of quality, throughput and latency.

However, we maintain that this decline in AI inference costs (the costs of using a trained AI model to generate output) has little immediate impact on enterprises with on-premises GenAI solutions, largely because of limited deployment options, the early phase of adoption and existing cost structures.

API prices: The short-term impact on enterprises

For cloud-based AI, API cost is just one factor in the total cost of ownership (TCO), which also includes fine-tuning and model adaptation, governance, security and compliance, and talent and infrastructure expenses.

Recommendations for evaluating LLMs given volatile costs:

  • Prioritize AI investments based on value delivery, risk and total cost structures.

  • Incorporate security, governance and regulatory compliance into AI cost planning.

  • Assess model effectiveness beyond cost, considering quality, throughput and latency.

The shift to cloud: Long-term AI deployment considerations

As GenAI API costs decline, organizations should reassess AI deployment strategies, balancing cloud-based and on-premises models. Cloud AI offers scalability, agility and integration with existing AI ecosystems, while on-premises solutions may be preferable for regulatory compliance, security and specialized infrastructure needs.

Recommendations for evaluating use of cloud-based LLMs:

  • Align AI deployment with business priorities, balancing cloud and on-premises trade-offs.

  • Evaluate cloud adoption for AI use cases while ensuring compliance with data security policies.

  • Consider hybrid models that leverage both cloud and on-premises infrastructure for flexibility.
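The hybrid recommendation above can be expressed as a simple routing rule. This is a minimal sketch, not a prescribed architecture: the policy fields (regulated data, latency sensitivity) and the two deployment targets are illustrative assumptions.

```python
# Minimal sketch of a hybrid deployment routing rule.
# Policy inputs and target names are illustrative assumptions,
# not a recommended production design.

def route_request(contains_regulated_data: bool, needs_low_latency: bool) -> str:
    """Pick a deployment target for a single inference request."""
    if contains_regulated_data:
        return "on_premises"  # data residency and compliance needs
    if needs_low_latency:
        return "on_premises"  # avoid network round trips to a cloud API
    return "cloud_api"        # default: scalability and ecosystem integration
```

In practice, the routing policy would be driven by an organization's data classification and compliance rules rather than two booleans, but the shape of the decision is the same.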

How to evaluate and select the right LLM for your use case

Assess LLMs across three key criteria: model type, performance and cost-efficiency.

Model type

  • General-purpose LLMs: Versatile models (e.g., GPT-4 Turbo) for content generation, summarization and conversational AI

  • Domain-specific LLMs: Tailored for industry-specific applications (e.g., finance, healthcare) with specialized capabilities

Performance metrics 

Combine industry benchmarks with custom evaluation metrics, including:

  • Accuracy and groundedness — fact-based responses and precision

  • Relevance and recall — alignment with business needs

  • Safety and bias detection — identifying and mitigating risks in outputs
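One common way to combine custom metrics like these is a weighted scorecard. The sketch below assumes per-metric scores normalized to a 0-1 scale; the metric weights and candidate scores are invented for illustration, not benchmarks.

```python
# Minimal sketch of a weighted LLM evaluation scorecard.
# Weights and scores are illustrative assumptions, not benchmark data.

def weighted_score(scores: dict, weights: dict) -> float:
    """Combine per-metric scores (0-1) into one weighted score."""
    total_weight = sum(weights.values())
    return sum(scores[m] * weights[m] for m in weights) / total_weight

# Hypothetical weighting reflecting business priorities.
weights = {"accuracy": 0.4, "relevance": 0.3, "safety": 0.3}

# Hypothetical evaluation results for two candidate models.
candidates = {
    "model_a": {"accuracy": 0.92, "relevance": 0.85, "safety": 0.90},
    "model_b": {"accuracy": 0.88, "relevance": 0.91, "safety": 0.95},
}

# Rank candidates by weighted score, best first.
ranked = sorted(
    candidates,
    key=lambda m: weighted_score(candidates[m], weights),
    reverse=True,
)
```

Adjusting the weights to match your use case (for example, weighting safety heavily in regulated industries) can change which model ranks first, which is exactly why cost alone is a poor selection criterion.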

Cost considerations

Beyond API pricing, account for:

  • Fine-tuning and model adaptation costs

  • AI governance, security and compliance expenses

  • Talent and infrastructure investment
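A rough TCO roll-up shows why API pricing is only one cost line. All figures below are illustrative assumptions, not vendor quotes: in this scenario, even a 90% drop in API prices barely moves the annual total because fixed costs dominate.

```python
# Minimal sketch of rolling API usage into an annual TCO estimate.
# All dollar figures and token volumes are illustrative assumptions.

def annual_tco(api_cost_per_1k_tokens: float,
               tokens_per_month: float,
               fine_tuning: float,
               governance: float,
               talent_infra: float) -> float:
    """Annual TCO: API usage spend plus fixed annual cost lines."""
    api_spend = api_cost_per_1k_tokens * (tokens_per_month / 1_000) * 12
    return api_spend + fine_tuning + governance + talent_infra

# Hypothetical scenario: 50M tokens/month, fixed annual costs of
# $40k fine-tuning, $60k governance/compliance, $200k talent/infrastructure.
base = annual_tco(0.01, 50_000_000, 40_000, 60_000, 200_000)
cheap = annual_tco(0.001, 50_000_000, 40_000, 60_000, 200_000)  # API price cut 90%
```

Under these assumptions, the 90% API price cut reduces annual TCO by under 2%, which is the article's point: evaluate models on total cost structure and value delivery, not API price alone.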

FAQ on evaluating LLMs amid disruption

What is DeepSeek, and why are its AI cost claims significant?

DeepSeek is one of several Chinese developers claiming to build LLM-based AI APIs at a fraction of the cost of U.S. developers without sacrificing performance. Other low-price Chinese LLM APIs include those from ByteDance, Alibaba, Baidu and Tencent. These models upend the cost assumptions of traditional AI providers but raise concerns about content filtering that varies across religious and cultural topics.


What are AI inference costs, and why do they matter?

AI inference costs refer to the expenses associated with running trained AI models in production, covering compute power, energy consumption and infrastructure overhead. These costs impact scalability, efficiency and cloud expenses, making cost-optimized AI deployment critical for long-term success.


How does Gartner view the future of AI?

The future of AI is dynamic and expansive. Trends such as domain-specific models, synthetic data, and AI-driven automation are reshaping industries. Our Top Technology Trends 2025 report offers a detailed view of these emerging patterns and their implications.
