Starting around 2018 with ELMo, a new class of machine learning models called large language models (LLMs) has been developing at an extremely rapid pace. These models differ in their architectures, training data sets, goals and limitations, but they are all typically neural network models trained on massive data sets, with parameter counts growing at an exponential rate. What is remarkable about these models is that a single pre-trained model can be applied effectively to a huge variety of natural language processing tasks such as text classification, text summarization, text generation, human language translation and even computer programming automation.
In 2020, OpenAI released GPT-3, which achieved outstanding performance on many NLP benchmark data sets and even demonstrated the ability to perform mathematical operations phrased in a language context. Since then the arms race has continued, with Google releasing the Switch Transformer model with over 1 trillion parameters and OpenAI releasing GPT-4 in March 2023.
The good news for those of us who do not have a spare 10^24 training FLOPs (at an estimated cost of $9M-$23M) to train our own model is that many of these models are being made available, either open sourced or accessible through APIs. However, practitioners should be aware of some of the risks and potential issues with these models. They have been shown to repeat hate speech and other unseemly content, and they have given rise to ethical issues such as the use of open-source computer programs as training data and an emphasis on English to the exclusion of other languages.
Risks aside, the size and importance of language models will continue to increase in 2023 – even after the buzz of ChatGPT subsides. If you are an analytics leader with needs for any type of language task – sentiment analysis, text summarization, question answering and much more – you and your team should be exploring the performance of these pretrained large language models.
The following interview with fellow IIA Expert Pete Gordon sheds more light on the business application of LLMs and generative AI. In collaboration with Pete, IIA’s editorial team enhanced the text with ChatGPT-4.
IIA: Thanks for taking the time to help us wrap our arms around the implications of large language models and generative AI in an enterprise analytics setting. For starters, to add a bit of color to Adam’s introduction above, how do large language models work, and what are some best practices for training and fine-tuning them?
Pete Gordon: Generative AI LLMs came on the scene with the use of Transformers in a paper called “Attention Is All You Need,” written primarily by researchers at Google in 2017. In the paper, the authors propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. This was a major breakthrough in natural language processing (NLP). The LLMs that have taken the world by storm over the past few months are not something that companies will be creating and training on their own. The cost to create a general LLM trained on all of the available corpora on the Internet is in the tens of millions of dollars. In fact, there is a standing invitation from a Stanford professor to work to build one if you have $50 million. But the opportunity to build on top of LLMs with vector similarity and custom data sources is absolutely available to companies that wish to augment their existing knowledge bases and documented processes with conversational AI and NLP. The leader in this area is OpenAI with its APIs.
The first step in best practices is the same as in traditional analytics and data science: data literacy and data preparation, to properly define the entities and create a reusable knowledge base structure for vectorized search. Data quality and diversity are key. This helps the model learn a broad range of language patterns and reduces potential biases. From there, it’s important to continuously evaluate the model’s performance, implement measures to protect user data, and continuously update and improve the model based on user feedback and new data. Above all, as practitioners, we need to engage with the AI research community and share best practices, challenges, and lessons learned to drive the development of more effective and ethical AI systems.
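The vectorized search Pete describes can be sketched with a toy example. Everything below is illustrative, not any vendor’s API: in practice each vector would come from an embedding model applied to a chunk of the knowledge base, and the similarity search would typically run in a vector database rather than plain NumPy.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 2) -> list:
    """Return indices of the k documents most similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]

# Toy "knowledge base": hand-made 3-d vectors standing in for real
# embeddings of enterprise document chunks.
docs = np.array([
    [0.9, 0.1, 0.0],   # doc 0: HR policy chunk
    [0.1, 0.8, 0.1],   # doc 1: accounting process chunk
    [0.0, 0.2, 0.9],   # doc 2: IT runbook chunk
])
query = np.array([0.85, 0.15, 0.05])  # the embedded user question

print(top_k(query, docs, k=1))  # → [0]
```

The retrieved chunks would then be placed into the LLM prompt as context, which is how a general model gets grounded in a company’s own knowledge base.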
IIA: Clearly, the race is on among solution providers to integrate next-generation chatbots into their technologies. The constant stream of headlines in this field may leave senior leadership wondering what is possible and what is probable. What are some practical use cases of large language models in an enterprise setting? What is the effort/investment required to “personalize” models for specific use cases?
Pete Gordon: The headlines are definitely a constant stream. Few of them get into first principles: LLMs’ conversational APIs, embedding vectorized data sets, and the ability to build conversational understanding on top of enterprise knowledge bases and documented processes. This can make it easier to train employees and communicate what was previously considered tribal enterprise knowledge. Enterprises can begin piloting LLMs immediately, and a good place to start is shared services (e.g., human resources, accounting). Among these services, LLMs are primed to create consistent streams of knowledge and standard processes.
To personalize LLMs, you’d want to use the same fundamentals as any analytics model. Gather domain-specific data relevant to the use case, fine-tune the model, and continuously evaluate and refine the model based on user feedback and new data. The effort and investment required to personalize LLMs will depend on several factors, such as the complexity of the task, the quality of the available data, and the desired level of customization. The costs associated with personalizing models for specific use cases may include data acquisition, storage, and processing; computation resources for training and fine-tuning; human expertise for model development and maintenance; and integration with existing systems.
IIA: What are some common challenges or limitations of large language models, and how can these be addressed?
Pete Gordon: The number one challenge that seems to be top of mind for users of LLMs such as OpenAI’s ChatGPT is hallucinations. LLMs can be wrong, providing answers as if they are true and certain when they are in fact false. It is important to be precise and apply quality checks when working with LLMs, and to embrace prompt engineering when there are areas of uncertainty that must be navigated. It is too easy to think things are certain in language when ambiguity still exists. Prompt engineering is the technique of crafting in-context prompts and contextual dialogue to engineer the appropriate question and response from an LLM conversation model.
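A minimal sketch of the prompt-engineering idea Pete describes: constrain the model to supplied context and give it an explicit way to express uncertainty, which is one common quality check against hallucination. The template wording and `build_prompt` helper below are hypothetical, not a prescribed format.

```python
def build_prompt(context: str, question: str) -> str:
    """Assemble a grounded prompt: the model is restricted to the
    supplied context and told how to answer when the context falls short."""
    return (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, reply exactly "
        "'I don't know.'\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_prompt(
    context="Employees accrue 1.5 vacation days per month of service.",
    question="How many vacation days do employees accrue per month?",
)
print(prompt)
```

The same template, filled with retrieved enterprise content, would be sent as the user message to a conversational LLM API; the explicit “I don’t know” instruction is what gives the model a sanctioned path through uncertainty instead of a confident fabrication.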
Another key barrier is model explainability and transparency. The decision-making process can be complex and difficult to interpret, making it challenging to understand how and why a specific output was generated. Enterprises should invest in explainable AI techniques like attention visualization, feature attribution, or model-agnostic approaches to help stakeholders understand and trust the model’s output.
IIA: What are some ethical considerations to keep in mind when working with large language models and generative AI?
Pete Gordon: The primary ethical concern is the potential for hallucinations. In addition, it is important to remember that LLMs are trained on the content of the Internet, which can be biased and limited in scope. The same ethical approaches for AI/ML apply to prompt engineering and API integrations with LLMs. It is critical to go into working with LLMs knowing they can be wrong, they can be biased, and they can lack specific knowledge and agreement.
As mentioned earlier, it’s crucial to curate diverse and representative training datasets to address bias. Also, clearly define the roles and responsibilities of stakeholders involved in the development, deployment, and maintenance of AI systems. Establish processes for identifying, reporting, and addressing potential ethical issues or unintended consequences that may arise during the AI system’s lifecycle. IIA recently published a good comprehensive guide to building responsible analytics governance across the enterprise (client only).
It's also worth mentioning here that a big concern that gets less airtime is the environmental footprint of AI systems. Training LLMs takes a lot of energy and contributes to carbon emissions. ESG considerations play more of a role in corporate strategy these days, given our climate outlook, and companies should do their best to optimize a model’s architecture, training strategies, and resource usage to minimize environmental impact.
IIA: How can enterprises measure the effectiveness and ROI of using large language models in their operations?
Pete Gordon: Enterprise implementations for LLMs are in the stage of piloting and determining ROI. It is necessary to take a staged approach to the implementation of NLP and LLM projects into your enterprise operations. As LLMs take hold in enterprise settings, I’d anticipate companies focusing on familiar analytics ROI metrics like cost savings, efficiency and productivity improvements, and scalability. Of course, enterprises will be keen on the relationship between LLMs and revenue generation as well. How can LLMs be used to increase sales, improve customer lifetime value, or conversion rates?
IIA: Predicting the future is always a precarious sport, but where do you see large language models and generative AI 3-5 years from now? How will large, legacy enterprises be leveraging this technology to drive business value?
Pete Gordon: LLMs have given us a much greater opportunity for a natural language interface to all of our systems. Whether it is software engineers augmenting their workflow with GitHub Copilot X, HR representatives communicating with employees in consistent terminology, or accounting professionals understanding specific business processes exponentially faster, there’s huge opportunity for efficiency gains in the enterprise.
Asking an LLM what its future will look like was too irresistible. Here’s what ChatGPT said:
While predicting the future is indeed challenging, we can make some educated projections about the direction of large language models and generative AI in the next 3-5 years. These predictions are based on current trends and advancements in the field.
- Improved capabilities: Large language models are expected to become more accurate, versatile, and efficient, thanks to continuous research and development. Their ability to understand context, reason, and generate high-quality outputs will likely improve, making them even more valuable across various applications.
- More domain-specific models: We may see an increase in domain-specific language models tailored to the needs of specific industries, such as healthcare, finance, legal, or engineering. These specialized models will help address unique challenges and drive more significant value in their respective fields.
- Integration with other AI technologies: Large language models are likely to be combined with other AI technologies like computer vision, reinforcement learning, and knowledge graphs to create more comprehensive and robust AI solutions. This could lead to new applications and use cases that leverage the strengths of multiple AI approaches.
- Focus on ethical and responsible AI: As large language models become more powerful and pervasive, there will be an increased focus on addressing ethical concerns like bias, fairness, transparency, and data privacy. Organizations and the AI research community will work together to develop frameworks, guidelines, and technologies to ensure responsible AI deployment.
- Better explainability: Efforts to improve the explainability and interpretability of AI models will likely gain traction. Improved explainability will foster trust, enable better decision-making, and facilitate the integration of AI in various aspects of enterprise operations.
Large, legacy enterprises can leverage these advancements in large language models and generative AI to drive business value in several ways:
- Streamlining internal processes: AI-powered solutions can help automate and optimize various internal processes, such as data analysis, report generation, or document review, resulting in increased efficiency and reduced operational costs.
- Enhancing customer experience: By deploying advanced AI-driven chatbots, personalization engines, and content generation tools, enterprises can offer personalized and seamless customer experiences across various touchpoints.
- Empowering decision-making: AI-driven insights and predictions can assist decision-makers with more accurate, data-driven insights, enabling them to make informed strategic choices and respond proactively to market changes.
- Expanding product and service offerings: Leveraging generative AI capabilities, enterprises can innovate and expand their product and service portfolios, creating new revenue streams and staying competitive in the market.
- Fostering a data-driven culture: Large language models and generative AI can help enterprises harness the power of their data, fostering a data-driven culture that values evidence-based decision-making and continuous improvement.
By adopting and integrating large language models and generative AI, large legacy enterprises can unlock new opportunities, enhance their operations, and maintain a competitive edge in the rapidly evolving business landscape.