Exploring the Landscape of Large Language Models

Evolution and Market Trends

In recent years, the world has witnessed an unprecedented surge in interest and investment in the field of Artificial Intelligence (AI). This phenomenon has been driven by a confluence of factors, including advances in machine learning, the availability of vast datasets, and the ever-expanding computational capabilities of modern technology. At the forefront of this AI revolution stand Large Language Models (LLMs), a category of deep learning models designed to understand and generate human language. This transformation has not only redefined the technological paradigm but has also reshaped the economic and investment landscape. In this report, we explore the evolution of LLM technology, its applications, the competitive landscape, and the investment prospects it presents.

Definition of Large Language Models

Large language models are AI systems engineered to handle extensive natural language data, utilizing this information to generate responses to user queries. They are trained on massive datasets through advanced machine learning algorithms, enabling them to discern the patterns and structures of human language and produce natural responses across diverse written inputs. These models are gaining significance in applications like natural language processing, machine translation, code generation, and beyond.

A large language model (LLM) is defined by its substantial size, made possible by AI accelerators capable of processing extensive text data, often gathered from the internet. LLMs are artificial neural networks containing billions to trillions of parameters, and they undergo (pre-)training using self-supervised and semi-supervised learning techniques. The Transformer architecture contributed to expediting their training. An alternative architecture, the mixture of experts (MoE), was proposed by Google in 2017.

LLMs predict the next token or word in a sequence when provided with input text. LLMs are believed to accumulate knowledge about the syntax, semantics, and "ontology" found in human language corpora, but they may also inherit inaccuracies and biases present in the data. Prominent examples of LLMs include OpenAI's GPT models (such as GPT-3.5 and GPT-4, used in ChatGPT), Google's PaLM (utilized in Bard), Meta's LLaMA, and others like BLOOM, Ernie 3.0 Titan, and Anthropic's Claude 2.
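To make the next-token objective concrete, the short sketch below asks a small, publicly available causal model for the single most likely continuation of a prompt. It is a minimal illustration only, assuming the Hugging Face transformers library and the GPT-2 checkpoint; commercial LLMs apply the same idea at far larger scale.

```python
# Minimal next-token prediction sketch (assumes: pip install torch transformers).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # small public checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Large language models are trained to"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits                 # shape: [batch, seq_len, vocab]

next_token_id = int(logits[0, -1].argmax())         # most likely next token
print(prompt + tokenizer.decode(next_token_id))
```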

Evolution of LLMs Technology

Table 1. The History of LLMs

Timeline Evolution
1950s–1990s In the initial decades, attempts were made to establish rigid language rules and follow logical procedures to achieve tasks like language translation. While this approach yielded results on occasion, it was effective only for specific, well-defined tasks that the system had prior knowledge of.
1990s Language models began to transition into statistical models, with the analysis of language patterns. However, the scope of larger-scale projects was constrained by limited computing power.
2000s Advancements in machine learning led to more intricate language models. The widespread use of the internet contributed to a substantial increase in available training data.
2012 Breakthroughs in deep learning architectures and the availability of larger datasets laid the groundwork for the neural language models that followed, culminating a few years later in GPT (Generative Pre-trained Transformer).
2018 Google introduced BERT (Bidirectional Encoder Representations from Transformers), a significant leap in architectural innovation that paved the way for future large language models.
2020 OpenAI unveiled GPT-3, the largest model to date with 175 billion parameters, setting a new standard for performance in language-related tasks.
2022 The launch of ChatGPT transformed models like GPT-3 into widely accessible services through a web interface, sparking a substantial increase in public awareness of Large Language Models (LLMs) and generative AI.
2023 Open source LLMs, including releases like Dolly 2.0, LLaMA, Alpaca, and Vicuna, began to deliver increasingly impressive results. Concurrently, the release of GPT-4 set new benchmarks in both parameter size and performance.

Stages in the Evolution of LLMs:

  1. N-gram-based Models, 1980s: In the early days, statistical language models relied on n-grams to assess the likelihood of word sequences. These models were straightforward but served as a foundation for language modelling (a minimal bigram sketch follows this list).
  2. Neural Network-based Models, late 2000s and early 2010s: The emergence of deep learning brought neural network-based language models into the spotlight. These models utilize deep neural networks to learn language patterns. Each word is represented as a vector, and the network processes them, enhancing efficiency compared to n-grams.
  3. Recurrent Neural Network (RNN) Models, early 2010s: RNN language models, leveraging specialized networks for sequences, consider contextual information, improving performance. However, they face challenges like vanishing gradients when dealing with long sequences.
  4. Long Short-Term Memory (LSTM) Models, early 2010s: LSTMs were introduced to tackle gradient-related issues in RNNs. They excel at capturing dependencies in lengthy sequences, enhancing overall performance.
  5. Gated Recurrent Unit (GRU) Models, early 2010s: GRUs, a simpler variant of LSTMs, maintain sequence processing capabilities with fewer parameters. This results in quicker training and competitive performance in certain applications.
  6. Large-scale Pretrained Models, ~2018: The latest trend involves large-scale pretrained language models. They leverage self-supervised learning on massive text datasets to create pretrained models like BERT, GPT, and XLNet. After fine-tuning on specific data, these models have demonstrated exceptional performance across various natural language processing tasks.
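As a toy illustration of stage 1, the sketch below builds a bigram (2-gram) model from raw counts over a made-up corpus; real n-gram systems used far larger corpora and smoothing, but the estimation idea is the same.

```python
# Toy bigram language model: estimate P(next_word | current_word) from counts.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the cat ate the fish .".split()

bigram_counts = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigram_counts[w1][w2] += 1

def next_word_probs(word):
    counts = bigram_counts[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("the"))   # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
```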

Deep into LLM Technology & Comparison

Booming in Recent Years

Large language models (LLMs) typically pertain to Transformer-based language models with hundreds of billions or more parameters. These models undergo extensive training on vast text datasets. Prominent examples include GPT-3, PaLM, Galactica, and LLaMA. LLMs demonstrate robust capabilities in comprehending natural language and addressing intricate tasks, particularly in text generation. To gain a brief insight into LLMs' operation, this section provides fundamental context on LLMs, encompassing scaling principles, emerging capabilities, and essential techniques.

Figure 1. The Evolution of Large Language Models. Source: “Getting Started and Follow GPT Path Analysis”


Decoder-only models exclusively comprise the decoding component, lacking an encoder. They offer distinct advantages over Encoder-Decoder models:

  • Simplicity and Speed: With no encoder component, Decoder-only models have fewer parameters and demand less computation. This simplicity accelerates both training and inference, enhancing efficiency.
  • Tailored for Pure Generation Tasks: Decoder-only models focus solely on generating output sequences, making them well-suited for tasks like text generation, story creation, and dialogue generation that involve pure content generation.
  • Mitigating Training Complexities: Training only a decoder sidesteps challenges often encountered in Encoder-Decoder training, including issues related to different weight initializations and the avoidance of information bottlenecks.
  • Decoder Self-Supervision: In Decoder-only model training, the output from the previous step becomes the input for the next one. This self-supervision mechanism promotes the generation of coherent and structured output sequences (illustrated by the decoding-loop sketch after this list).
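The self-feeding behaviour described in the last bullet can be seen in a simple greedy decoding loop. The sketch below assumes the same transformers/GPT-2 setup as the earlier snippet and appends each predicted token to the input before the next step; it is illustrative rather than a production decoding strategy.

```python
# Greedy autoregressive decoding: each generated token is fed back as input.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tokenizer("Decoder-only models generate text by", return_tensors="pt").input_ids
for _ in range(20):                                    # generate 20 more tokens
    with torch.no_grad():
        logits = model(ids).logits
    next_id = logits[0, -1].argmax().reshape(1, 1)     # most likely next token
    ids = torch.cat([ids, next_id], dim=1)             # output becomes input
print(tokenizer.decode(ids[0]))
```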

Table 2. Recent Developments of LLMs

Encoder-Only Models (Red) vs. Encoder-Decoder Models (Green) vs. Decoder-Only Models (Blue)

Workflow
  • Encoder-Only (Red): These models, initially led by BERT, have seen reduced development after 2020. While they have simple structures and are faster for certain tasks, they excel less in complex sequence-to-sequence tasks.
  • Encoder-Decoder (Green): Companies like Google have been favouring this approach. These models exhibit strong sequence learning and generation capabilities, making them ideal for tasks like translation and summarization. However, they are computationally intensive.
  • Decoder-Only (Blue): This category, including GPT-1 and models from various companies, focuses solely on the decoding part, omitting the encoder. They offer simplicity, faster training, and are suitable for pure generation tasks. They also bypass some training challenges but are not as powerful in complex sequence tasks as Encoder-Decoder models.

Training Approach: Masked Language Model (Encoder-Only) vs. Autoregressive Language Model (Decoder-Only)
Model Type: Discriminative (Encoder-Only) vs. Generative (Decoder-Only)
Pre-training Task: Predicting masked words (Encoder-Only) vs. Predicting the next word (Decoder-Only)
Representative Models: ELMo, BERT, RoBERTa, DistilBERT, BioBERT, XLM, XLNet, ALBERT, ELECTRA, T5, XLM-E, ST-MoE, AlexaTM (Encoder-Only and Encoder-Decoder); GPT-3, OPT, PaLM, BLOOM, GLM, MT-NLG, GLaM, Gopher, Chinchilla, LaMDA, GPT-J, LLaMA, GPT-4, BloombergGPT (Decoder-Only)

Source: “A Comprehensive Comparison Of Open Source LLMs”


  • Key Change - The Transformer

The Transformer is a deep learning model introduced by Google Brain in 2017, widely used in natural language processing (NLP). Its key features include self-attention for capturing dependencies, encoder-decoder architecture for sequence tasks, multi-head attention, positional encoding, and layer normalization. Transformers revolutionized NLP by handling long-range dependencies, and they are the foundation for many large pretrained language models like BERT and GPT, driving advances in the field.
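To make the self-attention mechanism mentioned above a little more tangible, here is a minimal NumPy sketch of scaled dot-product attention for a single head; the projection matrices are random placeholders rather than trained weights, so the output is illustrative only.

```python
# Scaled dot-product self-attention for one head (random, untrained weights).
import numpy as np

def self_attention(X, d_k):
    """X: (seq_len, d_model) token embeddings; returns attended representations."""
    d_model = X.shape[1]
    rng = np.random.default_rng(0)
    W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(d_k)                    # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
    return weights @ V                                 # weighted mix of value vectors

X = np.random.default_rng(1).normal(size=(5, 16))      # 5 tokens, d_model = 16
print(self_attention(X, d_k=8).shape)                  # (5, 8)
```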

Figure 2. A Timeline of Existing Large Language Models. Source: Medium


LLM Workflow

At a broad level, the process of training a Large Language Model (LLM) encompasses three primary phases: data collection, training, and evaluation.

  • Step 1: Data Collection: Initially, data is accumulated from a diverse array of sources, which may encompass, but are not limited to, Wikipedia, news articles, books, and websites. This multifaceted dataset forms the fundamental basis for the model's training.
  • Step 2: Training: The collected data undergoes a rigorous pre-processing and cleansing procedure before integration into the training pipeline. This training phase is both computationally demanding and time-intensive, necessitating substantial computational resources.
  • Step 3: Evaluation: In the final stage, the model's performance is scrutinized across various tasks, including question answering, summarization, translation, and more. This evaluation serves to measure the model's efficacy and suitability for diverse language-related applications.

The culmination of this training pipeline results in an LLM model, characterized by its acquired parameters or weights. These parameters are typically serialized and stored in a file, enabling their utilization in a wide array of applications that require language processing capabilities, spanning text generation, question answering, and natural language processing, among others.
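As a rough sketch of what "serialized and stored in a file" means in practice, the snippet below uses PyTorch's state-dict convention on a tiny stand-in model; a real LLM is saved and reloaded the same way, just with vastly more parameters, often sharded across many files.

```python
# Serializing and reloading model parameters (PyTorch state_dict convention).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Embedding(1000, 64), nn.Linear(64, 1000))  # tiny stand-in

torch.save(model.state_dict(), "llm_weights.pt")        # write parameters to disk

restored = nn.Sequential(nn.Embedding(1000, 64), nn.Linear(64, 1000))
restored.load_state_dict(torch.load("llm_weights.pt"))  # reuse them in another app
restored.eval()
```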

Figure 3. The Understanding of Large Language Models. Source: “Introduction to LLMs”


Cost of LLMs

Advancements in both software and hardware have significantly reduced costs in recent years.

  • In 2023, training a 12-billion-parameter Large Language Model (LLM) requires approximately 72,300 A100-GPU-hours;
  • To put this in perspective, back in 2020, training a 1.5-billion-parameter LLM, which was considerably smaller than the state-of-the-art at the time, cost anywhere from $80,000 to $1.6 million.

The investment in larger models has been substantial since 2020.

  • For instance, training GPT-2, a 1.5-billion-parameter model, in 2019 cost $50,000;
  • In contrast, training PaLM, a massive 540-billion-parameter model in 2022, came with a staggering price tag of $8 million.

It's worth noting that for Transformer-based LLMs, the cost of training far exceeds the cost of inference.

  • Training consumes about 6 FLOPs (Floating-Point Operations) per parameter per token;
  • Inference typically ranges from 1 to 2 FLOPs per parameter per token. This distinction highlights the substantial computational resources required for training these models (a back-of-the-envelope estimate follows below).
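A back-of-the-envelope check of these rules of thumb, using GPT-3's published figures (175 B parameters, 300 B training tokens, as listed in Table 8): the estimate lands close to the 3,640 petaFLOP-day training cost reported there.

```python
# Rough training-compute estimate: ~6 FLOPs x parameters x training tokens.
params = 175e9                                     # GPT-3 parameter count
tokens = 300e9                                     # GPT-3 training tokens
train_flops = 6 * params * tokens                  # ~3.15e23 FLOPs

petaflop_day = 1e15 * 86400                        # one petaFLOP/s sustained for a day
print(train_flops / petaflop_day)                  # ~3,600 petaFLOP-days

# Inference over the same number of tokens at ~2 FLOPs/parameter/token:
print(2 * params * tokens / petaflop_day)          # ~1,200 petaFLOP-days
```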

Table 3. Yearly Costs Based On Usage And OpenAI Model

Model / Yearly cost – 1K requests per day / Yearly cost – 1M requests per day
GPT-3.5 4K Context: $949 / $949,000
GPT-3.5 16K Context: $1,898 / $1,898,000
GPT-4 8K Context: $28,470 / $28,470,000
GPT-4 32K Context: $56,940 / $56,940,000
Cloud-Hosted 7B params: $26,280 / $29,930
Cloud-Hosted 40B params: $175,200 / $178,850

Source: “The Economics of Large Language Models”


  • GPT-3.5/4 API Costs: The ChatGPT API is priced at $0.002 per 1,000 tokens. For lower usage, the cost is reasonable, but processing a million documents a day can cost around $1 million a year. More powerful models like GPT-4 have varying costs based on usage, ranging from roughly $1,000 to $56 million annually (see the cost sketch after this list).
  • Open-Source Model Hosting Costs: Hosting LLMs involves GPU RAM and hourly charges. For a 7 billion parameter model at 16-bit precision, it would cost around $2-3 per hour. Larger open-source models require more GPU memory and can cost around $20 per hour.
  • On-Premises Hosting Costs: Running LLMs on private servers requires high-quality GPUs, which can be expensive. Scaling up on-premises involves purchasing additional infrastructure, setting it up, and handling maintenance. It's important to consider other cloud-related infrastructure, such as load balancers, for maintaining network traffic and ensuring low latency.
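To reproduce the order of magnitude in Table 3, the sketch below multiplies out a simple usage model (requests per day x tokens per request x price per 1,000 tokens); the 1,300-tokens-per-request average is an assumption added here, not a figure from the source.

```python
# Rough yearly API cost: requests/day x tokens/request x $ per 1K tokens x 365 days.
def yearly_api_cost(requests_per_day, tokens_per_request, usd_per_1k_tokens):
    daily_tokens = requests_per_day * tokens_per_request
    return daily_tokens / 1000 * usd_per_1k_tokens * 365

# Assumed ~1,300 tokens per request (prompt plus completion combined).
print(yearly_api_cost(1_000, 1_300, 0.002))        # GPT-3.5-style pricing: ~$949/year
print(yearly_api_cost(1_000_000, 1_300, 0.002))    # ~$949,000/year at 1M requests/day
```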

LLMs Comparison

Key Factors

  • Capacity: the number of parameters;
  • Pre-training Data Scale: the scale of the pre-training data in terms of tokens or storage size;
  • Hardware Resource Costs: the computational resources required;
  • Release Time: the official release date of the corresponding paper;
  • Accessibility: distinguishing between publicly available and closed-source models;
  • Adaptation: specifying whether the model has undergone subsequent fine-tuning, denoted as IT (instruction tuning) or RLHF (reinforcement learning with human feedback);
  • Evaluation: indicating whether the model has been evaluated for specific abilities, such as ICL (in-context learning) or CoT (chain-of-thought). These factors are captured in the record sketch below.
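These factors lend themselves to a simple record type for side-by-side comparison. The sketch below is illustrative only; the example entry uses GPT-3 figures consistent with Table 8.

```python
# Illustrative record for comparing LLMs along the factors listed above.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class LLMProfile:
    name: str
    parameters_b: float                      # capacity, billions of parameters
    pretraining_tokens_b: Optional[float]    # pre-training data scale, billions of tokens
    training_cost_pf_days: Optional[float]   # hardware cost, petaFLOP-days
    release: str                             # release date of the corresponding paper
    public: bool                             # publicly available vs closed source
    adaptation: Tuple[str, ...] = ()         # e.g. ("IT", "RLHF")
    evaluated_for: Tuple[str, ...] = ()      # e.g. ("ICL", "CoT")

gpt3 = LLMProfile("GPT-3", 175, 300, 3640, "2020-05", public=False,
                  evaluated_for=("ICL",))
print(gpt3)
```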

Table 4. Statistics of Large Language Models (LLMs) with a Size Larger Than 10 Billion Parameters. Source: “A Survey of Large Language Models”


Parameter Numbers Comparison

Taking the NL2Code task as an example for comparing LLM parameter counts, a comprehensive survey of 27 representative LLMs was conducted; the figure below summarizes the detailed information for each model. The evolution of model size is as follows:

Figure 4. LLM Parameter Number Comparison. Source: “The Emergence of the Transformer - The Architectural Breakthrough, History, and the Future of The Model Behind ChatGPT”


According to the figure below, larger models usually produce better results. In addition, current models, regardless of their size, can still achieve performance improvements by further increasing their parameter counts.

Figure 5. LLMs Performance Comparison. Source: “The Emergence of the Transformer - The Architectural Breakthrough, History, and the Future of The Model Behind ChatGPT”


LLM Business Strategy

Value Chain Positioning

LLM technology is developed, deployed, and utilized in real-world applications along a value chain: it starts with the cloud computing infrastructure, moves through the core LLM models, extends to the platforms for sharing and utilizing models, and finally reaches the end-users who benefit from LLM-powered applications.

Table 5. Value Chain Analysis

Categories Companies and practice
Upstream - Cloud Computing Platforms: Azure, Google Cloud, AWS, Ali Cloud
 These are the foundational cloud computing platforms that provide the infrastructure for hosting and running LLM models.
Large Language Models - OpenAI GPT, DeepMind Gopher, Google GLaM, EML, BERT
 LLMs represent the core language models and technologies used in various applications across the industry chain.
Downstream (Parallel Categories) - Model Hosting and Trading Platforms: Hugging Face, Replicate, LanguageX
  These platforms offer services for hosting, sharing, and trading LLM models, making them accessible to developers and businesses.
- Intermediate Layer
 This category may include additional services, APIs, or middleware that facilitate the integration of LLMs into various applications.
End-users - Consumer and Business Applications
 These are the final applications and solutions that leverage LLM technology for both consumer-facing (C-end) and business-facing (B-end) purposes.

LLM Business Model

Table 6. LLM Business Model

Approaches Detail
Licensing and Subscription Model - Many LLM developers and organizations offer licensing or subscription-based access to their models. - This model allows other companies and developers to use LLMs for various purposes, such as natural language understanding, content generation, or customer support.
API Services - Some LLM providers offer Application Programming Interface (API) services, allowing developers to integrate LLM capabilities directly into their applications and services. These services are typically offered on a pay-as-you-go or subscription basis.
Customized Solutions - LLM developers may offer customized solutions tailored to the specific needs of businesses or industries. This can include developing LLMs with specialized training data and fine-tuned parameters for tasks like medical diagnosis, legal document analysis, or financial forecasting.
Consulting and Integration Services - Companies specializing in LLM technology may provide consulting and integration services to help businesses implement LLM solutions into their existing workflows. This can involve training and fine-tuning LLMs for specific use cases.
Content Generation and Monetization - Some organizations leverage LLMs to generate content, such as articles, reports, or creative works, which can be monetized through advertising, subscriptions, or other revenue models.
Data Annotation and Enrichment - LLM technology can be used to annotate and enrich datasets, improving the quality and relevance of data for various industries like e-commerce, recommendation systems, and sentiment analysis.
Research and Development Collaborations - Academic institutions and research organizations may collaborate with LLM developers on joint research projects, contributing to advancements in language modelling and AI.
Regulatory and Compliance Solutions - LLMs can be applied to develop solutions for regulatory compliance, risk assessment, and fraud detection in industries like finance and healthcare.
Content Moderation and Safety - LLMs are utilized for content moderation to identify and mitigate harmful or inappropriate content on online platforms, which can be offered as a service.
Government and defence Contracts - Governments and defence organizations may contract LLM developers to create models for various purposes, including natural language understanding, translation, and information extraction.

Market Landscape

Market Size

The generative AI market is poised for significant growth, with an expected increase from its 2023 valuation of USD 11.3 billion to a substantial USD 76.8 billion by 2030. This growth will be achieved at an impressive compound annual growth rate (CAGR) of 31.5% throughout the forecast period.

The rapid expansion of the LLM market has been a pivotal growth driver within the broader Generative AI market. In recent years, LLMs have been increasing by an average of 10x per year in size and sophistication, ushering in new opportunities for the Large Language Model market.

Figure 6. Generative AI Market Global Forecast to 2030 (USD BN). Source: “Generative AI Market”


Market Map

Large language models, as the infrastructure in the field of artificial intelligence, play an essential role in Generative AI. LLMs have revolutionized natural language processing and understanding by leveraging machine learning advances. Several organizations are actively involved in LLM research and development. Notable players include OpenAI, Google, Meta (formerly Facebook), Hugging Face, Cohere, and Stability AI. Prominent LLM models that have gained attention include GPT-4, Google's PaLM 2, Meta's LLaMA (Large Language Model Meta AI), and Anthropic's Claude.

Generative AI has diverse vertical use cases, including text generation, music generation, figure generation, and multimodal generation. LLMs facilitate comprehension and reasoning, reshaping the way of human-computer interaction. In terms of applications, we can see that generative AI is opening a new chapter across industries by providing high-quality prompt-based services.

Figure 7. Large Language Models Market Map


Competitive Landscape Overview

OpenAI has played a significant role in this field in recent years through its models and research endeavours. Nevertheless, the market features a diverse array of other players.

Table 7. Representative Companies in Different Categories

Categories Companies and practice
Large Technology Companies - OpenAI is one of the leaders in the LLM field and has developed the GPT family of models, including GPT-3 and GPT-4
- Google has contributed FLAN-T5 and BERT models
- Meta has introduced OPT, OPT-IML, and LLaMA models
Open Source Communities - Hugging Face offers pre-trained LLM models and NLP tools - EleutherAI is known for contributing models like GPT-Neo
Startups - Stability AI offers the StableLM
Research Organizations & Universities - Carnegie Mellon University researchers involved in the Tartan
- Stanford has developed Alpaca

Table 8. Latest LLMs Market Players

Name Release Date Developer Number of Parameters Corpus Size Training Cost (petaFLOP-day) License Notes
BERT 2018 Google 340 M 3.3 billion words 9 Apache 2.0 An early influential model, encoder-only, not built for prompting or generation.
XLNet 2019 Google ~340 M 33 billion words An alternative to BERT, encoder-only.
GPT-2 2019 OpenAI 1.5 B 40GB (~10 billion tokens) MIT General-purpose model based on Transformer architecture.
GPT-3 2020 OpenAI 175 B 300 billion tokens 3640 Proprietary A fine-tuned variant, GPT-3.5, was made available to the public in 2022.
GPT-Neo 2021.3 EleutherAI 2.7 B 825 GiB MIT First of free GPT-3 alternatives by EleutherAI.
GPT-J 2021.6 EleutherAI 6 B 825 GiB 200 Apache 2.0 GPT-3-style language model.
Megatron-Turing NLG 2021.10 Microsoft and Nvidia 530 B 338.6 billion tokens Restricted web access Standard architecture, trained on supercomputing cluster.
Ernie 3.0 Titan 2021.12 Baidu 260 B 4 TB Proprietary Chinese-language LLM. Ernie Bot is based on this model.
Claude 2021.12 Anthropic 52 B 400 billion tokens Beta Fine-tuned for desirable behavior in conversations.
GLaM 2021.12 Google 1.2 T 1.6 trillion tokens 5600 Proprietary Sparse mixture of experts model; more expensive to train but cheaper to run inference.
Gopher 2021.12 DeepMind 280 B 300 billion tokens 5833 Proprietary
LaMDA 2022.1 Google 137 B 1.56T words, 168 billion tokens 4110 Proprietary Specialized for response generation in conversations.
GPT-NeoX 2022.2 EleutherAI 20 B 825 GiB 740 Apache 2.0 Based on the Megatron architecture.
Chinchilla 2022.3 DeepMind 70 B 1.4 trillion tokens 6805 Proprietary Reduced-parameter model trained on more data; used in Sparrow bot.
PaLM (Pathways Language Model) 2022.4 Google 540 B 768 billion tokens 29250 Proprietary Aimed to reach the practical limits of model scale.
OPT (Open Pretrained Transformer) 2022.5 Meta 175 B 180 billion tokens 310 Non-commercial research GPT-3 architecture with adaptations from Megatron.
YaLM 100B 2022.6 Yandex 100 B 1.7TB Apache 2.0 English-Russian model based on Microsoft's Megatron-LM.
Minerva 2022.6 Google 540 B 38.5B tokens from webpages and papers Proprietary LLM trained for solving mathematical and scientific questions.
BLOOM 2022.7 Large collaboration led by Hugging Face 175 B 350 billion tokens (1.6TB) Responsible AI Essentially GPT-3 but trained on a multi-lingual corpus.
Galactica 2022.11 Meta 120 B 106 billion tokens CC-BY-NC-4.0 Trained on scientific text and modalities.
AlexaTM (Teacher Models) 2022.11 Amazon 20 B 1.3 trillion Proprietary Bidirectional sequence-to-sequence architecture.
LLaMA (Large Language Model Meta AI) 2023.2 Meta 65 B 1.4 trillion 6300 Non-commercial research Trained on a large 20-language corpus for better performance. Researchers fine-tuned Alpaca based on LLaMA weights.
GPT-4 2023.3 OpenAI Exact number unknown Unknown Unknown Proprietary Available for ChatGPT Plus users and used in several products.
Cerebras-GPT 2023.3 Cerebras 13 B 270 Apache 2.0 Trained with Chinchilla formula.
Falcon 2023.3 Technology Innovation Institute 40 B 1 trillion tokens 2800 Apache 2.0 Training cost around 2700 petaFLOP-days, 75% that of GPT-3.
BloombergGPT 2023.3 Bloomberg L.P. 50 B 363 billion token dataset Proprietary LLM trained on financial data, outperforms existing models on financial tasks.
PanGu-Σ 2023.3 Huawei 1.085 T 329 billion tokens Proprietary
OpenAssistant 2023.3 LAION 17 B 1.5 trillion tokens Apache 2.0 Trained on crowdsourced open data.
Jurassic-2 2023.3 AI21 Labs Exact size unknown Unknown Proprietary Multilingual.
PaLM 2 (Pathways Language Model 2) 2023.5 Google 340 B 3.6 trillion tokens 85000 Proprietary Used in Bard chatbot.
Llama 2 2023.7 Meta 70 B 2 trillion tokens Llama 2 license Successor of LLaMA.
Falcon 180B 2023.9 Technology Innovation Institute 180 B 3.5 trillion tokens Falcon 180B TII license
Mistral 7B 2023.9 Mistral 7.3 B Unknown Apache 2.0

Source: Wikipedia


Listed Companies

Google

  • Bard

Bard, a conversational AI service by Google, is reported to draw on models with as many as 1.6 trillion parameters, enabling highly precise and contextually relevant text generation. It excels in providing scientifically accurate explanations and reasoning capabilities, thanks to its diverse training corpus of scientific manuscripts, research papers, books, and articles. Bard's specialized knowledge makes it a potent tool for tasks in the scientific domain.

Through its dedicated API, developers and researchers can seamlessly integrate Bard into various applications and platforms. Beyond text, Bard processes multimodal inputs, enhancing comprehension and responses, and supports multiple languages, making it adaptable to diverse linguistic contexts.

  • LaMDA / PaLM 2

Google's LaMDA has attracted substantial attention from mainstream observers, rivalling the level of interest seen with GPT-3. LaMDA operates in a manner similar to its primary competitor, albeit with fewer parameters, totalling 137 billion compared to GPT-3.5's 175 billion, which was employed in ChatGPT's training.

LaMDA served as the foundation for Google's chatbot competitor, Bard, which is presently undergoing testing for search functionality with select users. However, in response to the rapidly evolving landscape of LLM development, Google has elevated Bard to operate on PaLM 2, a significantly more advanced framework featuring over 340 billion parameters.

Table 9. Comparison: OpenAI vs Google

Main Competition Between OpenAI and Google in Recent Years
Round 1: BERT outperformed GPT-1 due to its bidirectional nature, allowing contextual analysis, whereas GPT, being unidirectional, lacked this contextual capability and could only utilize preceding text.
- In June 2018, OpenAI released the 117 million parameter GPT-1 model, using a two-step process of pretraining and fine-tuning, setting the stage for future models. - In June 2017, Google introduced the 65 million parameter Transformer model, a seminal moment in large language models.
- In October 2018, Google presented BERT (Bidirectional Encoder Representation from Transformers) with 300 million parameters, outperforming GPT-1 due to its bidirectional nature.
Round 2: T5 won by a wide margin, with Google still in the lead.
- In February 2019, OpenAI launched GPT-2 with 1.5 billion parameters, raising concerns about its potential misuse.
- In 2021, OpenAI's VP of Research, Dario Amodei, and his team parted ways with the company, founding Anthropic and later creating Claude.

- In October 2019, Google introduced T5 (Transfer Text-to-Text Transformer) with 11 billion parameters, setting a new NLP SOTA.
Round 3: The industry was still more accepting of Google's large models.
- In May 2020, OpenAI unveiled GPT-3 with a whopping 175 billion parameters, showcasing the power of large-scale models.
- In January 2021, OpenAI introduced DALL-E, bridging text and image generation.
- In June 2021, OpenAI and GitHub jointly released GitHub Copilot, powered by a 12 billion parameter Codex.
- In March 2022, OpenAI released InstructGPT, highlighting the benefits of a 1.3 billion parameter model.
- In July 2022, OpenAI unveiled DALL-E 2.
- In January 2021, Google debuted the Switch Transformer with 1.6 trillion parameters, surpassing GPT-3 in scale.
- In May 2021, Google showcased LaMDA (Language Model for Dialogue Applications) with 137 billion parameters.
Round 4: OpenAI's GPT-4 leads the way, but the model has become more closed off, accessible primarily through its API, raising concerns about transparency.
- In November 2022, OpenAI launched ChatGPT, a specialized conversational model built on GPT-3.5 (around 175 billion parameters).
- In January 2023, Microsoft deepened its partnership with OpenAI, investing around $10 billion and valuing OpenAI at roughly $29 billion.
- In March 2023, OpenAI released GPT-4, with undisclosed parameters but eight times the token capacity of GPT-3, supporting multimodal input.
In February 2023, Google introduced Bard, a conversational AI system built on LaMDA, akin to ChatGPT.

Meta

  • LLaMA

Meta AI unveiled LLaMA (Large Language Model Meta AI), a versatile LLM with 65B parameters. Its weights were leaked shortly after a restricted research release, effectively making the model open, which provided research flexibility but raised security concerns. Committed to AI advancement, Meta AI, formerly Facebook AI Research, offers open-source tools for collaborative research. LLaMA, trained on 20 languages, includes models of 7B, 13B, 33B, and 65B parameters. Remarkably, LLaMA-13B surpasses GPT-3 (175B) on many benchmarks, and LLaMA-65B competes with PaLM-540B, showcasing superior performance per parameter. Leveraging the transformer architecture and 1.4 trillion tokens of training data, LLaMA excels in understanding intricate language patterns.

Salesforce – CTRL

Salesforce's Conditional Transformer Language Model (CTRL), a formidable NLP model with 1.6B parameters, excels in precise text generation and control. It stands out with its capacity to attribute data sources to generated text, aiding analysis. Utilizing over 50 control codes, CTRL empowers users to finely control content and style, reducing randomness. Moreover, CTRL's adaptable nature has the potential to enhance various NLP applications through fine-tuning or leveraging its learned representations.

Databricks - Dolly

Dolly, developed by Databricks, is a practical 12B parameter LLM based on Pythia. It excels at following instructions, covering various domains like brainstorming, QA, and summarization. Dolly 2.0, an open-source model, offers ChatGPT-like interactivity, fine-tuned on a high-quality dataset. Databricks provides the entire package, including code, data, and model weights, allowing commercial use without external data sharing. The 15,000-pair training dataset is available for modification and commercial applications. While not state-of-the-art, Dolly serves specific instruction-following tasks effectively.

Open Source Communities

  • Hugging Face - Transformers library

Hugging Face is a company specializing in natural language processing (NLP) technologies. It has played a significant role in advancing the Transformers library, a resource offering pre-trained models for a wide range of NLP tasks, including question answering and translation. Established in 2016 by Clément Delangue, Thomas Wolf, and Julien Chaumond, Hugging Face has steadily made remarkable strides in the field since its inception.

  • BLOOM (BigScience)

BLOOM (BigScience Large Open-science Open-access Multilingual Language Model) is a massive autoregressive model developed by over 1,000 AI researchers. It aims to provide a large-scale language model for public access. Trained on 366 billion tokens from March to July 2022, it competes with OpenAI's GPT-3 and has 176 billion parameters. BLOOM uses a decoder-only transformer architecture and has inspired successors such as Firefly, BELLE, and TigerBot.

Startup Companies

OpenAI - GPT

OpenAI has released influential language models like GPT-3 and GPT-4, shaping the AI landscape. These models excel in various language tasks, offering customizable features through APIs. OpenAI's pioneering use of Reinforcement Learning from Human Feedback (RLHF) improves model behavior in chat contexts. Their consistent top-tier performance makes them invaluable in research and development.

Figure 8. The Evolution of Large Language Models. Source: “A Survey of Large Language Models”


OpenAI's GPT-3.5 and GPT-4 API pricing:

  • GPT-3.5: $0.002 per 1,000 tokens for both input and output.
  • GPT-4 8K: $0.03 per 1,000 tokens for input and $0.06 per 1,000 tokens for output.
  • GPT-4 32K: $0.06 per 1,000 tokens for input and $0.12 per 1,000 tokens for output.

Anthropic – Claude

Anthropic, founded by former OpenAI experts, rivals ChatGPT with Claude. Claude employs "constitutional AI" and adheres to predefined principles, setting it apart. Google's $300 million investment for a 10% stake mirrors Microsoft's involvement with OpenAI. Anthropic is a key figure in AI ethics and development, emphasizing safety and value alignment. Claude, their versatile language model, excels in text processing and natural conversations, offering extensive general knowledge, multilingual abilities, and automation. It introduces constitutional AI, enabling precise control over AI behaviour.

Cohere

Cohere, a prominent AI company, offers two key language models: command-xlarge (roughly 50 billion parameters) and command-medium (6 billion parameters). These models excel in handling instruction-like prompts, making them ideal for chatbots. Cohere's LLMs are geared toward content generation, summarization, and search, with a focus on enterprise-scale operations and data security. They provide a user-friendly API, supporting semantic search, text summarization, generation, and classification in over 100 languages. Cohere also offers the Cohere Playground, simplifying model testing through a code-free visual interface.

AI21 – Jurassic-2

AI21, an Israeli startup, challenges OpenAI effectively. In 2021, it unveiled Jurassic-1, a large language model with 178 billion parameters, slightly surpassing GPT-3. In March 2023, it released Jurassic-2, prioritizing performance over size and claiming that even the smallest Jurassic-2 outperforms its predecessor's largest version. Jurassic-2 includes a grammatical correction API and text segmentation features.

AI21 Studio allows users to train their own LLM versions with just 50-100 examples, granting exclusive access. Notably, AI21 used Jurassic-1 and Jurassic-2 to enhance its WordTune Spices chatbot, which sets itself apart from ChatGPT with live data retrieval and source citations, addressing accuracy and plagiarism concerns in a competitive field.

Stability AI - StableLM

StableLM is an open-source language model created by Stability AI. It's trained on "The Pile," a substantial dataset composed of over 20 smaller datasets, totaling 825 GB of natural language data. This model offers lightweight language generation capabilities suitable for various applications.

The Outlook of Large Language Models

Future Advancements in LLMs

ChatGPT, for example, already surpasses GPT-3.5 with context lengths exceeding 10,000 tokens, and future enhancements can potentially scale this to hundreds of thousands or even millions of tokens through techniques like efficient attention and recursive encoding. Current models have not reached their size limit, and models of greater scale are achievable. Likewise, the volume of training data, including human feedback, continues to grow. In terms of multimodal processing, video can expand the training dataset by two orders of magnitude, allowing for linear scaling of existing capabilities and the emergence of new ones; for example, exposure to diverse visual and algebraic content might lead to spontaneous learning of analytical geometry. In addition, existing models excel in the humanities but lag in scientific domains. Fine-tuning for specialization is feasible, transferring skills from one domain to another, even at the cost of some general capabilities and without extensive scaling; for instance, bolstering humanities proficiency to expert level while trading off scientific prowess.

Table 10. LLM Influence Forecast on Generative AI market

Timeline LLM Influence and Trends
Short-term 2023-2025 - Enhanced fine-tuning and optimization of Large Language Models (LLMs) across various applications, resulting in more precise and context-aware text generation.
- Integration of text-based Generative AI with other modalities like images and videos, addressing diverse market needs.
Mid-term 2026-2028 - Increasing specialization of Generative AI models for specific domains, tailoring AI-generated content to industries and niches within the LLM market.
- Users gaining fine-grained control over style, tone, and other attributes, enhancing versatility for the expanding LLM market.
- Designing Generative AI models for seamless integration with commonly used software and platforms across industry verticals.
Long-term 2029-2030 - Emergence of fully autonomous Generative AI systems capable of generating content across various domains, including art, music, and literature.
- Growing utilization of Generative AI in discovery and material design as researchers explore new technological possibilities.
- Establishment of ethical guidelines for the responsible and safe use of Generative AI across diverse applications within the LLM market and the Generative AI ecosystem.

Challenges

Current evaluation datasets for large models are often limited and don't represent real-world complexity. We need diverse, real-world datasets to ensure models can handle real-world challenges effectively. In terms of ethical alignment, ensuring that large models align with human values and don't reinforce harmful behavior is crucial. Addressing ethical concerns is vital to prevent potential disasters.

In addition, large model research must prioritize security. This includes enhancing model interpretability and supervision to eliminate security risks. Security should be an integral part of model development. What’s more, whether model performance will continue to improve with larger sizes remains uncertain. Our understanding of large models is limited, and gaining fundamental insights is valuable.

Investment Insights

  • Companies with Strong Model Training Capabilities: Consider allocating investments to companies that possess robust and sustained capabilities in the development of large language models (LLMs). Such companies, when they leverage their models to create applications, enjoy substantial technological barriers and a competitive edge over those relying solely on third-party LLMs.
  • Multimodal Directions: Explore investment prospects in companies diversifying into multimodal applications after establishing a strong foundation in LLM development.
    • Multimodal applications encompass the processing of various types of media data, including text, images, audio, and more.
    • Investing in this area may yield innovative applications and technologies that cater to a wide range of industry needs, building upon the solid groundwork laid by LLM capabilities.
  • Cost-Advantaged Companies: Companies with cost competitiveness may present attractive opportunities in the market, especially for enterprise clients seeking large-scale deployments.
    • Identify companies that excel in effectively controlling and reducing deployment costs associated with large language models.
    • Consider enterprises with advantages in GPU resource management, model precision optimization, and cost-efficient infrastructure solutions.