
Generative AI’s path to production echoes traditional AI, but with new twists

By Prakash Prakash and Subbiah Sethuraman

May 8, 2024 | Article | 4-minute read

AI proof of concept? Nailed it. Production rollout? Take a breath. While gen AI’s initial promise has ignited excitement across industries, transitioning concepts into real-world applications presents a new set of challenges demanding thoughtful leadership.


Gartner has been showcasing the massive leap in AI adoption: More than 80% of enterprises are projected to use generative AI application programming interfaces (APIs) or to have deployed gen AI-enabled applications in production by 2026, up from less than 5% in 2023. This is a phenomenal shift considering the historical challenge of moving AI to production, where 8 in 10 AI solutions have traditionally failed in the real world.


And while generative AI combined with classical AI offers immense potential, navigating the journey from concept to production requires careful consideration of gen AI's unique challenges.

“Navigating the journey from proof of concept to production requires careful consideration of gen AI’s unique challenges.”


The stakes are higher for scaling generative AI solutions



In contrast to classical AI, generative AI models are now often set up to interact directly with end consumers, raising the stakes for robust performance and responsible implementation. Consider the case of one auto manufacturer's chatbot that was tricked into agreeing to sell a car for $1, a stark reminder of the potential risks.

This shift in focus, from model-centric to product-centric, is exposing new twists, and leaders must watch out for these common pitfalls:

1. A lack of domain knowledge for large language model (LLM) domain translation. For example, in the life sciences industry, definitions of a patient's line of therapy can vary based on the therapy and the organizations involved. Similarly, customer segmentation definitions can vary by industry, business needs, targeting approach and more. To handle this translation, gen AI solutions must use either in-context learning or domain-specific fine-tuning, and both approaches come with their own challenges. The effectiveness of in-context learning is highly sensitive to the input examples, while domain-specific fine-tuning can lead to model overspecialization, reducing the model's zero-shot capabilities. Consequently, both approaches may run into scaling issues during deployment if consumer input is not controlled.

The takeaway: Ensure high-quality training data through deep domain knowledge.
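To make the in-context learning option concrete, here is a minimal sketch of domain translation via few-shot examples. It assumes the OpenAI Python client; the line-of-therapy conventions, model name and example events are hypothetical illustrations, not a definitive implementation.

```python
# Minimal sketch of in-context domain translation, assuming the official
# OpenAI Python client (pip install openai). The few-shot examples and the
# line-of-therapy conventions below are hypothetical illustrations.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Curated, domain-reviewed examples anchor the model's interpretation of an
# ambiguous term ("line of therapy") to this organization's definition.
FEW_SHOT_EXAMPLES = [
    {"role": "user", "content": "Patient switched from drug A to drug B after progression."},
    {"role": "assistant", "content": "line_of_therapy: advance to L2 (switch after progression)"},
    {"role": "user", "content": "Drug C added to ongoing drug A regimen."},
    {"role": "assistant", "content": "line_of_therapy: stay on L1 (add-on, no progression)"},
]

def classify_therapy_event(event_text: str) -> str:
    """Map a free-text therapy event to this org's line-of-therapy rules."""
    messages = (
        [{"role": "system", "content": "You map oncology therapy events to "
          "lines of therapy using ONLY the conventions shown in the examples."}]
        + FEW_SHOT_EXAMPLES
        + [{"role": "user", "content": event_text}]
    )
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    return response.choices[0].message.content
```

Note that swapping even one of the few-shot examples can change the model's interpretation, which is exactly the input sensitivity this pitfall describes.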
2. Return-on-investment (ROI) roadblocks. With conventional AI, organizations report minimal to no return on investment from 70% of applications, hindering wider adoption. Today's wave of fast gen AI solutions, roughly 75% of them, consists largely of chatbots and co-pilots aimed at areas such as call centers, sales reps, customers and email generation.

However, many of these solutions prioritize speed over value, focusing on tasks where faster completion doesn't always translate to higher impact. Organizations that focus solely on immediate gains can miss the bigger picture.

Consider this: A patent-authoring co-pilot can significantly reduce patent filing time, allowing an organization to file more patents, and those additional filings may hold far higher value than the hours saved.

The takeaway: A strong ROI evaluation framework is the key to taking your AI solutions from innovation to production.
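A back-of-the-envelope calculation shows why the capacity gain, not the time saved, can dominate the ROI case. All figures below are hypothetical illustrations, not benchmarks.

```python
# Hypothetical ROI comparison for the patent co-pilot example.
hours_saved_per_patent = 20        # drafting hours the co-pilot removes
hourly_cost = 150                  # fully loaded cost of an attorney-hour
extra_patents_per_year = 12        # filings enabled by the freed capacity
value_per_patent = 50_000          # expected value of one additional filing
annual_solution_cost = 200_000     # licenses, hosting, maintenance

speed_value = hours_saved_per_patent * hourly_cost * extra_patents_per_year
capacity_value = extra_patents_per_year * value_per_patent
roi = (speed_value + capacity_value - annual_solution_cost) / annual_solution_cost

print(f"Speed-only value: ${speed_value:,}")     # $36,000
print(f"Capacity value:   ${capacity_value:,}")  # $600,000
print(f"ROI: {roi:.0%}")                         # 218%
```

Under these assumptions, the value of additional filings is more than 15 times the value of the hours saved, which is the bigger picture a speed-only evaluation would miss.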

3. LLMs can come up with many different answers. This variability helps them find better solutions in some cases, but it can also lead to poor user experiences and undermine trust. The stochastic nature of LLMs contributes to this on several dimensions:

• Repeatability: LLMs often encounter the "no one right answer" (NORA) scenario, in which the output changes with each run for the same input query and model parameters. The same inherent randomness that helps LLMs explore multiple valid solutions also creates unit testing and validation challenges during scaling.

• Reproducibility: This refers to the inability of LLMs to generate a specific "right" answer across users, given the same input and model conditions. Reproducibility is affected by context, model parameters, probabilistic token selection and hallucinations, leading to different responses across users. Think about this in practice: A diagnostic AI system must consistently provide prompt and accurate interpretations of medical images to assist radiologists in identifying abnormalities. In uses like this, both repeatability and reproducibility issues can erode verifiability and benchmarks, as well as trust and transparency for the end user.

• Adaptability: LLMs are sensitive to varying inputs, contexts, training data and model conditions, so ensuring model adaptability over time is key to keeping performance consistent as conditions evolve. One effective strategy is to collect user feedback, which can inform intent detection, path selection, personalization, dynamic prompt optimization or fine-tuning to refine responses, ultimately enhancing overall performance and the user experience.

• Latency: LLM response times can be high and depend on multiple factors, such as model type, context window, task complexity (a single task can involve multiple LLM calls), model rate limits, the model's inherent stochasticity and more. High and variable latency is a challenge from a scaling and service-level agreement (SLA) perspective.

Guarding the user experience will require strong LLMOps: guardrails that prevent misuse, LLM API throttling, rate limits, smart use of a caching layer and more.

The takeaway: Focus on repeatability, reliability and adaptability during scaling to ensure a good user experience and consistent solution performance. Both are needed to build trust.
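The sketch below shows two common mitigations: pinned decoding parameters for repeatability, and a cache in front of the model for latency and cost. It assumes the OpenAI Python client; seed support is best-effort and model-dependent, and the in-memory cache is a stand-in for a real caching layer such as Redis.

```python
# Sketch: improving repeatability and latency, assuming the OpenAI Python
# client. temperature=0 plus a fixed seed gives best-effort (not guaranteed)
# determinism; the cache avoids repeat calls for identical inputs.
import hashlib
from openai import OpenAI

client = OpenAI()
_cache: dict[str, str] = {}  # in production, use Redis or similar

def ask(prompt: str, model: str = "gpt-4o") -> str:
    key = hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()
    if key in _cache:                      # cache hit: near-zero latency and cost
        return _cache[key]
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,   # greedy decoding reduces run-to-run variation
        seed=42,         # best-effort reproducibility on supported models
        timeout=30,      # bound worst-case latency for SLA purposes
    )
    answer = response.choices[0].message.content
    _cache[key] = answer
    return answer
```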

4. Gen AI applications in production introduce a range of evaluation and troubleshooting challenges. One key dimension is monitoring and logging key performance parameters and errors. Many gen AI applications rely heavily on third-party service APIs for LLM calls, which can introduce fault points beyond an organization's direct control. That's why it's vital to track application parameters such as response time, service downtime, rate limits and the error logs generated by these APIs. This monitoring helps teams proactively manage potential issues and ensures smoother operations for their gen AI-based applications.

Additionally, error identification and mitigation are paramount in the realm of gen AI. Errors can take various forms: a user lacks the data access needed to answer a question or asks a question outside the application's purview, a generated response contains inappropriate language, biases surface in responses, or hallucinations occur. Identifying these fault points within an application, and establishing guardrails to mitigate them, is crucial for maintaining the integrity and reliability of gen AI systems in production environments.

The takeaway: Ensure you have the right skills in place to monitor and mitigate performance challenges and errors.
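As a concrete starting point, here is a sketch of a monitored wrapper around a third-party LLM call, again assuming the OpenAI Python client; the log fields and alerting comments are illustrative, not a full observability stack.

```python
# Sketch: logging latency, rate-limit hits and errors around third-party
# LLM calls, assuming the OpenAI Python client. Metric names are illustrative.
import logging
import time
from openai import OpenAI, RateLimitError, APIError

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("genai.monitor")
client = OpenAI()

def monitored_completion(prompt: str, model: str = "gpt-4o") -> str | None:
    start = time.perf_counter()
    try:
        response = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        log.info("llm_call_ok model=%s latency_ms=%.0f tokens=%s",
                 model, (time.perf_counter() - start) * 1000,
                 response.usage.total_tokens)
        return response.choices[0].message.content
    except RateLimitError:
        log.warning("llm_rate_limited model=%s", model)  # back off or alert here
        return None
    except APIError as exc:
        log.error("llm_api_error model=%s err=%s", model, exc)  # page on-call
        return None
```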


5. The cost of operating LLMs must be managed carefully. Depending on the model chosen, for example, the cost of processing the same 1 million input tokens can vary significantly, from $2 to $60. Monitoring cost and usage patterns within an application is therefore essential. Implementing hard limits on user-level tokens helps manage usage anomalies that may arise while the application is live. Other operating cost elements include the infrastructure and personnel costs necessary to maintain the application's operation and functionality. Demonstrating good cost management practices during scaling can contribute significantly to overall sustainability and profitability in the eyes of business decision-makers.


The takeaway: Practice diligent cost monitoring and efficient resource allocation, and regularly identify cost-saving opportunities.
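A minimal sketch of the user-level hard limit follows. The model names, per-token prices and daily budget are hypothetical placeholders chosen to echo the $2-to-$60 spread above.

```python
# Sketch: per-user token budgets and cost tracking. Prices and budgets
# are illustrative placeholders, not published rates.
from collections import defaultdict

PRICE_PER_1M_INPUT_TOKENS = {"cheap-model": 2.00, "premium-model": 60.00}
DAILY_TOKEN_BUDGET = 50_000  # hypothetical hard cap per user per day

usage: dict[str, int] = defaultdict(int)      # user_id -> tokens used today
spend: dict[str, float] = defaultdict(float)  # user_id -> dollars spent today

def charge(user_id: str, model: str, tokens: int) -> bool:
    """Record usage; return False if the request would exceed the budget."""
    if usage[user_id] + tokens > DAILY_TOKEN_BUDGET:
        return False  # reject or queue the request; flag the anomaly for review
    usage[user_id] += tokens
    spend[user_id] += tokens / 1_000_000 * PRICE_PER_1M_INPUT_TOKENS[model]
    return True

assert charge("u1", "premium-model", 10_000)      # allowed: $0.60 recorded
assert not charge("u1", "premium-model", 60_000)  # blocked: would exceed the cap
```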


6. Without the right guardrails in place, gen AI may be vulnerable to misuse or adversarial attacks, posing significant risks to both organizations and their customers. While the promise of generative AI is undeniable, successfully scaling these solutions requires a thoughtful and strategic approach to risk. The EU's AI Act provides detailed guidelines on what constitutes safe and responsible AI and can help shape your approach.

Keep in mind that different personas within the organization may face varying levels of risk, which underscores the importance of comprehensive risk management strategies. For instance, certain applications raise ethical concerns and should be prohibited, such as using a chatbot to help pharmaceutical sales representatives answer physicians' sales-related inquiries regarding the Prescription Drug Redistribution Program (PDRP). Another example is an adversarial attack, in which a user attempts to manipulate the system through unauthorized prompts. It's important to anticipate these unique risks as part of the overall approach to scale.


The takeaway: Make responsible AI part of your governance framework, including the right input validation, data sanitization, access control, use of trusted libraries and regular security audits. Educate users on responsible practices that match the level of risk, so they understand their role in using AI responsibly. 
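As an illustration of the input-validation and access-control layer, here is a lightweight guardrail sketch. The role names, permitted topics and injection patterns are hypothetical, and mirror the example above in which sales-related topics are off-limits for a rep-facing assistant; production systems would layer this with moderation services and regular security audits.

```python
# Sketch: a lightweight pre-LLM guardrail layer. The patterns and the role
# check are illustrative; production systems combine these with moderation
# APIs, allow-lists and regular security audits.
import re

INJECTION_PATTERNS = [
    r"ignore (all|previous) instructions",
    r"reveal (your )?system prompt",
]
ALLOWED_TOPICS_BY_ROLE = {"pharma_rep_assistant": {"medical_information"}}

def validate_request(user_role: str, topic: str, prompt: str) -> tuple[bool, str]:
    """Screen a request before it ever reaches the model."""
    if topic not in ALLOWED_TOPICS_BY_ROLE.get(user_role, set()):
        return False, "topic not permitted for this role"     # access control
    lowered = prompt.lower()
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, lowered):
            return False, "possible prompt-injection attempt"  # adversarial input
    return True, "ok"

print(validate_request("pharma_rep_assistant", "sales_inquiry", "..."))
# -> (False, 'topic not permitted for this role')
```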

If you’re facing any of these challenges right now in your company’s AI journey, we can help. Ask your ZS team or contact us.
