Generative AI & Large Language Models: A Comprehensive Glossary

Foreword

Who This Glossary Is For

This comprehensive glossary is designed for a diverse audience navigating the rapidly evolving landscape of generative AI and large language models:

Business leaders and executives seeking to understand the strategic implications of AI technologies
Developers and engineers implementing AI solutions and needing precise technical definitions
Students and researchers studying artificial intelligence, machine learning, and natural language processing
Product managers working with AI-powered features and applications
Content creators and marketers leveraging generative AI tools in their workflows
Curious professionals from any field wanting to build foundational knowledge about these transformative technologies

How to Use This Glossary

This glossary is thoughtfully structured to progress from fundamental concepts to advanced techniques, with each section building upon previous knowledge:

For Beginners: Start with Section I (Core AI & Machine Learning Concepts) and progress sequentially through the sections. The numbered definitions and cross-references will guide you through the learning journey.

For Experienced Practitioners: Use the comprehensive index and cross-references to quickly locate specific terms. Each definition includes practical examples and real-world applications relevant to your work.

For Reference: The document is designed as a living reference guide. Bookmark frequently used terms and utilize the cross-references (indicated by #numbers) to explore related concepts.

Key Features

Progressive Structure: Each section builds upon previous concepts
Cross-References: Numbered links (#) connect related terms throughout the document
Practical Examples: Real-world applications and technical specifications
Analogies: Complex concepts explained through relatable comparisons
Current Information: Reflects the state of the field as of 2025

I. Core AI & Machine Learning Concepts

*This section establishes the fundamental building blocks of artificial intelligence and machine learning that form the foundation for understanding more complex generative AI systems.*

1. Artificial Intelligence (AI)

Definition: The broad field of computer science focused on creating machines and systems that can perform tasks typically requiring human intelligence. This includes learning from experience, understanding language, recognizing patterns, solving problems, and making decisions.

Analogy: Teaching machines to think and act like humans in specific situations.

Real-world Applications: Voice assistants (Siri, Alexa), recommendation systems (Netflix, Amazon), navigation apps (Google Maps).

2. Machine Learning (ML)

Definition: A branch of Artificial Intelligence (AI) that enables systems to learn from data without explicit programming. ML algorithms identify patterns and make predictions or decisions, improving their performance as they are exposed to more data.

Analogy: Teaching a child to recognize a cat by showing them many pictures of cats and dogs, rather than writing down a strict list of features for "cat."

Real-world Applications: Email spam detection, recommendation systems, predictive text, fraud detection.

3. Supervised Learning

Definition: A type of Machine Learning (#2) where the algorithm learns from examples that include both input data and the correct answers (labels). The model learns to map inputs to outputs by studying these labeled examples.

Analogy: Learning with a teacher who provides both questions and correct answers, like flashcards with questions on one side and answers on the other.

Real-world Applications: Email spam detection (emails labeled as spam/not spam), image recognition (photos labeled with what they contain).
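
As a minimal sketch of learning from labeled examples, the snippet below classifies a new email by copying the label of its closest training example (a 1-nearest-neighbour rule, chosen here only because it is short). The two features (link count, exclamation-mark count) and all values are invented for illustration.

```python
import math

# Labeled training examples: (links, exclamation marks) -> label.
# Features and values are invented for illustration.
TRAINING_DATA = [
    ((7, 9), "spam"),
    ((6, 8), "spam"),
    ((0, 1), "not spam"),
    ((1, 0), "not spam"),
]

def predict(features):
    """Return the label of the training example closest to `features`."""
    _, label = min(TRAINING_DATA, key=lambda example: math.dist(example[0], features))
    return label

print(predict((5, 7)))  # closest labeled example is (6, 8)
print(predict((1, 1)))  # closest labeled examples are both "not spam"
```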

4. Unsupervised Learning

Definition: A type of Machine Learning (#2) where the algorithm finds patterns and structures in data without being given specific correct answers or labels. The model discovers hidden relationships on its own.

Analogy: Like exploring a new city without a map or guide, discovering interesting places and patterns by yourself.

Real-world Applications: Grouping customers by purchasing behavior, finding topics in news articles, detecting unusual patterns in data.
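
To make "finding structure without labels" concrete, here is a small sketch that groups customers by monthly spend using a two-cluster, one-dimensional k-means loop. The spend figures, the choice of two clusters, and the starting centroids (minimum and maximum value) are assumptions made for this example.

```python
def kmeans_1d(values, iterations=10):
    """Split `values` into two groups around two moving centroids."""
    centroids = [min(values), max(values)]  # assumed starting points
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for value in values:
            # Assign each value to the nearer centroid (ties go to the first).
            nearer = 0 if abs(value - centroids[0]) <= abs(value - centroids[1]) else 1
            clusters[nearer].append(value)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            sum(group) / len(group) if group else centroids[index]
            for index, group in enumerate(clusters)
        ]
    return clusters, centroids

monthly_spend = [10, 12, 11, 95, 100, 98]  # invented data, no labels attached
clusters, centroids = kmeans_1d(monthly_spend)
print(clusters)
print(centroids)
```

No label such as "low spender" or "high spender" is ever supplied; the two groups emerge from the data alone.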

5. Neural Network (NN)

Definition: A computational model inspired by the human brain's structure, consisting of interconnected nodes (neurons) arranged in layers. It learns by adjusting connection strengths (weights) as it processes data, and it excels at finding complex patterns. Neural networks are the bedrock of modern deep learning.

Analogy: A complex chain of interconnected switches that learn to activate in specific patterns based on the input they receive.

Cross-reference: See Parameters (#9) for how neural networks store learned information.
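
The idea of "adjusting connection strengths" can be sketched with the smallest possible network: a single neuron with one weight, trained by gradient descent to reproduce the rule y = 2x. The training data, learning rate, and epoch count are illustrative assumptions; real networks have many weights and nonlinear activations.

```python
def train_single_weight(examples, learning_rate=0.05, epochs=200):
    """Fit prediction = weight * x by repeatedly nudging the weight."""
    weight = 0.0
    for _ in range(epochs):
        for x, target in examples:
            prediction = weight * x
            error = prediction - target
            # Gradient of 0.5 * error**2 with respect to the weight is error * x.
            weight -= learning_rate * error * x
    return weight

examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # inputs paired with targets y = 2x
learned_weight = train_single_weight(examples)
print(round(learned_weight, 4))
```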

6. Generative AI

Definition: A category of Artificial Intelligence focused on creating new, original content (e.g., text, images, audio, video, code) that resembles human-created output. This contrasts with discriminative AI, which primarily classifies or predicts based on existing data.

Examples: ChatGPT generating a poem, DALL-E creating an image from text, GitHub Copilot writing code, Midjourney creating artwork.

Real-world Applications: Content creation, code generation, image synthesis, music composition, automated writing assistance.

7. Large Language Model (LLM)

Definition: A specific type of Generative AI model, characterized by its massive scale (billions to trillions of Parameters - see #9) and training on vast datasets of text and code. LLMs excel at understanding, generating, and processing human language, performing tasks by predicting the most probable next Token (see #10).

Examples: GPT-3/4 (OpenAI), Llama (Meta), Gemini (Google), Claude (Anthropic), PaLM (Google).

Scale Reference: GPT-3 has 175 billion parameters. OpenAI has not disclosed GPT-4's parameter count; unofficial estimates put it above 1 trillion.

II. LLM Core Components & Mechanics

*This section explores the internal architecture and fundamental mechanisms that enable LLMs to process and generate human-like text.*

8. Model

Definition: The learned mathematical representation within an AI system that processes input to produce output. In LLMs, it's the trained neural network architecture responsible for language understanding and generation.

Analogy: The "brain" of the AI system that has learned patterns from training data and can apply them to new situations.

9. Parameters

Definition: The internal numerical values (weights and biases) that a Model (#8) learns and adjusts during its training phase. They define how input data is transformed into output. The "size" of an LLM is often measured by its number of parameters.

Analogy: The specific settings or dials on a complex machine that are fine-tuned to make it perform its task perfectly.

Scale Examples:

GPT-3: 175 billion parameters
GPT-4: ~1 trillion parameters (unofficial estimate; not disclosed by OpenAI)
Llama 2: 7B, 13B, 70B parameter versions
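
A short sketch of where parameter counts come from and what they imply for storage. It assumes a plain fully connected layer (one weight per input-output pair plus one bias per output) and counts only the memory needed to hold the weights; the layer sizes and byte widths are example values.

```python
def dense_layer_parameters(inputs, outputs):
    """Weights (inputs x outputs) plus one bias per output neuron."""
    return inputs * outputs + outputs

def weight_memory_gb(parameter_count, bytes_per_parameter):
    """Approximate storage for the weights alone, in gigabytes (10**9 bytes)."""
    return parameter_count * bytes_per_parameter / 1e9

print(dense_layer_parameters(784, 128))
print(weight_memory_gb(7_000_000_000, 2))  # a 7B-parameter model at 16-bit precision
```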

10. Tokens

Definition: The fundamental units of text that an LLM processes. Text is broken down into tokens for input and generated as tokens for output. Tokens can be words, parts of words (subwords), punctuation, or special characters.

Technical Example: "Understanding large language models" might be tokenized as ["Under", "standing", " large", " language", " models"] - note that spaces are often included with following words.

Practical Note: As a rule of thumb, one token corresponds to roughly 4 characters of English text, though the exact ratio varies by tokenizer and language.
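
The 4-characters-per-token rule of thumb can be turned into a quick estimator. This is only a rough heuristic for budgeting, not a tokenizer: real token counts depend on the model's vocabulary and can differ noticeably from the estimate.

```python
import math

CHARS_PER_TOKEN = 4  # rule-of-thumb average for English text

def estimate_tokens(text):
    """Rough token estimate from character count; not an exact tokenizer."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

print(estimate_tokens("Understanding large language models"))
```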

11. Embeddings

Definition: Numerical representations (vectors) of words, phrases, or entire documents in a high-dimensional space. Words with similar meanings or contexts are positioned closer together in this space. Embeddings allow LLMs to understand the semantic relationships between Tokens (#10).

Analogy: A "map" where words are plotted, and words with similar meanings are located near each other - "king" and "queen" would be close, both near "royalty."

Cross-reference: See Vector Databases (#30) for how embeddings are stored and retrieved.
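
"Closer together" is usually measured with cosine similarity. The sketch below uses hand-written 3-dimensional vectors purely for illustration; real embeddings are produced by a trained model and typically have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors invented for this example, not output from any real model.
embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "banana": [0.1, 0.0, 0.9],
}

print(cosine_similarity(embeddings["king"], embeddings["queen"]))
print(cosine_similarity(embeddings["king"], embeddings["banana"]))
```

With these toy values, "king" scores much closer to "queen" than to "banana", which is the behaviour the map analogy describes.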

12. Attention Mechanism

Definition: A core component of the Transformer architecture (#14) that allows the model to selectively focus on different parts of the input sequence when processing each Token (#10). It helps the model weigh the importance of various words in the Context (#32).

Real-world Analogy: Like a spotlight that can focus on different parts of a scene while maintaining awareness of the whole picture.

13. Self-Attention

Definition: A specific type of Attention Mechanism (#12) where a model relates different positions of a single sequence to compute a representation of that sequence. It enables each Token in a prompt to "look at" and understand its relationship with all other tokens in the same prompt, capturing long-range dependencies and nuances of meaning.

Technical Example: When processing "The bank by the river was steep," self-attention helps the model understand that "bank" relates to "river" and "steep," not to financial services.
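
A stripped-down sketch of the computation: each token vector is compared with every token vector (dot product, scaled by the square root of the dimension), the scores are turned into weights with softmax, and the output is the weighted mix of all vectors. As a simplifying assumption, the same vectors serve as queries, keys, and values; a real Transformer first applies separate learned projections and uses multiple attention heads.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # How strongly does this token relate to every token in the sequence?
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # New representation: weighted average of all token vectors.
        outputs.append([
            sum(weight * value[i] for weight, value in zip(weights, vectors))
            for i in range(dim)
        ])
    return outputs

tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy 2-dimensional token vectors
for row in self_attention(tokens):
    print([round(x, 3) for x in row])
```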

14. Transformer (Architecture)

Definition: A groundbreaking neural network architecture (introduced in "Attention Is All You Need," Vaswani et al., 2017) that revolutionized NLP. It relies entirely on Attention Mechanisms (#12) and Positional Encoding (#15) to process sequences in parallel, making it highly efficient for training on large datasets and handling long-range dependencies. It is the foundational architecture for nearly all modern LLMs.

Innovation: Unlike previous architectures, Transformers can process all tokens simultaneously rather than sequentially, dramatically speeding up training.

15. Positional Encoding

Definition: A method used in Transformers (#14) to inject information about the relative or absolute position of Tokens in a sequence. Since Transformers process tokens in parallel without an inherent understanding of their order, positional encodings are crucial for the model to grasp sequence order and relationships.

Why It Matters: Without positional encoding, "The cat sat on the mat" would be processed identically to "Mat the on sat cat the."
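
One well-known scheme is the fixed sinusoidal encoding from the original Transformer paper, sketched below. It is shown as an example of the idea only; many current models use other schemes, such as learned or rotary position encodings. The dimension of 4 is an arbitrary small value chosen for readability.

```python
import math

def positional_encoding(position, dim):
    """Sinusoidal encoding: sine on even indices, cosine on odd indices."""
    encoding = []
    for i in range(dim):
        # Each pair of indices (2k, 2k + 1) shares one wavelength.
        angle = position / (10000 ** ((2 * (i // 2)) / dim))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding

for position in range(3):
    print(position, [round(x, 3) for x in positional_encoding(position, 4)])
```

Each position gets a distinct vector, which is added to the token's embedding so that the same word at different positions is no longer indistinguishable.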

III. Training & Development Processes

*This section covers the multi-stage process of creating, training, and refining LLMs to make them useful and aligned with human intentions.*

16. Pre-training

Definition: The initial, large-scale training phase of an LLM on vast and diverse datasets (e.g., Common Crawl, books, academic papers, code repositories). During this phase, the model learns general language patterns, grammar, factual knowledge, and reasoning abilities through self-supervised tasks like predicting the next Token (#10).

Scale: Modern LLMs are trained on datasets containing trillions of tokens from diverse sources.

Duration: Can take weeks to months using thousands of GPUs.

17. Fine-tuning

Definition: The process of taking a Pre-trained (#16) LLM and further training it on a smaller, specific dataset for a particular task or domain. This adapts the model's general knowledge to perform specialized functions or adhere to a specific style.

Types:

Instruction Fine-tuning: Teaching the model to follow human instructions
Domain Fine-tuning: Adapting to specific fields (medical, legal, technical)
Task Fine-tuning: Optimizing for specific tasks (summarization, translation)

Example: Fine-tuning GPT-3 on customer service conversations to create a support chatbot.

18. Reinforcement Learning from Human Feedback (RLHF)

Definition: A crucial technique used to align LLMs with human preferences and instructions. After Fine-tuning (#17), human annotators rank model outputs, and this feedback is used to train a "reward model." This reward model then guides a reinforcement learning algorithm to further optimize the LLM's behavior, making it more helpful, honest, and harmless.

Process:

Collect human preferences on model outputs
Train a reward model to predict human preferences
Use reinforcement learning to optimize the LLM using the reward model

Analogy: A student learning to write better essays by getting grades and feedback from teachers, then adjusting their writing style accordingly.

19. Alignment

Definition: The effort to ensure that an AI model's behavior, goals, and values are consistent with human intentions, ethical principles, and safety guidelines. RLHF (#18) is a primary method for achieving alignment in LLMs.

Key Objectives:

Helpfulness: Providing useful, relevant responses
Harmlessness: Avoiding harmful or dangerous outputs
Honesty: Being truthful and acknowledging uncertainty

Cross-reference: See Constitutional AI (#31) for an alternative alignment approach.

20. Quantization

Definition: A model compression technique that reduces the precision of Parameters (#9) from higher-bit representations (e.g., 32-bit floating point) to lower-bit representations (e.g., 8-bit integers). This significantly reduces model size and computational requirements while maintaining most of the model's performance.

Benefits: Enables deployment on devices with limited memory and processing power.

Trade-offs: Slight reduction in model quality for significant improvements in speed and memory usage.
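
A minimal sketch of the idea, assuming the simplest scheme (symmetric quantization with a single scale for the whole list): each floating-point weight is mapped to an integer between -127 and 127, and multiplying by the scale approximately restores it. The weights are invented; production methods usually work per channel or per block and are more elaborate.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    largest = max(abs(w) for w in weights)
    scale = largest / 127 if largest > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Approximately recover the original floats."""
    return [q * scale for q in quantized]

weights = [0.42, -1.27, 0.05, 0.9]  # invented example weights
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
print(quantized)
print([round(w, 4) for w in restored])
```

The small gap between the original and restored values is the quality trade-off mentioned above. In exchange, each weight needs 1 byte instead of 4, so weight storage shrinks to roughly a quarter.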

IV. Inference, Generation & Control

*This section focuses on how trained LLMs are used in practice, including techniques for controlling and optimizing their outputs.*

21. Inference

Definition: The process of using a trained AI Model (#8) to make predictions or generate outputs on new, unseen input data (Prompts - see #22). This is the "runtime" phase where the model applies its learned knowledge.

Performance Metrics: Measured by Latency (#37), throughput, and resource utilization.

22. Prompt

Definition: The input text provided to a generative AI Model to initiate a response. A prompt can be a question, a command, an instruction, or any text designed to guide the model's generation.

Types:

System Prompts: Set the model's behavior and role
User Prompts: Direct instructions or queries
Context Prompts: Provide background information

23. Prompt Engineering

Definition: The art and science of designing, refining, and optimizing Prompts (#22) to elicit desired, high-quality, and specific outputs from generative AI models. It involves structuring the prompt to effectively guide the model's understanding and generation.

Key Techniques:

Clear, specific instructions
Providing examples (Few-shot Learning - see #27)
Setting context and constraints
Using structured formats

Cross-reference: See Chain-of-Thought Prompting (#28) for advanced reasoning techniques.

24. Temperature

Definition: A hyperparameter used during text generation that controls the randomness or creativity of the model's output by affecting the probability distribution over possible next Tokens (#10).

Settings:

Low Temperature (0.1-0.3): Near-deterministic, focused, predictable outputs. Ideal for factual responses, code generation, structured tasks.
Medium Temperature (0.4-0.7): Balanced creativity and coherence. Good for general conversation.
High Temperature (0.8-1.0+): Creative, diverse, sometimes surprising outputs. Excellent for creative writing, brainstorming.
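
Mechanically, temperature divides the model's raw scores (logits) before they are converted to probabilities with softmax. The sketch below shows the effect on three hypothetical candidate tokens; the logit values are invented for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores (logits) into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for temperature in (0.2, 0.7, 1.5):
    probabilities = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probabilities])
```

Lower temperatures concentrate probability on the top-scoring token; higher temperatures flatten the distribution so that less likely tokens are sampled more often.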

25. Top-k Sampling

Definition: A text generation strategy where the model only considers the k most probable Tokens as candidates for the next word, then samples from this reduced set. This introduces controlled randomness while preventing highly improbable outputs.

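Example: With top-k=3, if the next word could be "cat" (40%), "dog" (30%), "bird" (15%), "fish" (10%), or "elephant" (5%), the model keeps only "cat", "dog", and "bird", renormalizes their probabilities (to roughly 47%, 35%, and 18%), and samples from those three.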

Typical Values: k=40-100 for balanced outputs.
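
A minimal sketch of the filtering step, using the word probabilities from the example above, here with k=3 so that the two least likely words are dropped. The dictionary stands in for a model's next-token distribution, which in practice spans the whole vocabulary.

```python
import random

def top_k_filter(probabilities, k):
    """Keep the k most probable tokens and renormalize them to sum to 1."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    kept = ranked[:k]
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

next_word = {"cat": 0.40, "dog": 0.30, "bird": 0.15, "fish": 0.10, "elephant": 0.05}
candidates = top_k_filter(next_word, k=3)
print(candidates)

# Sample one token from the reduced, renormalized set.
choice = random.choices(list(candidates), weights=list(candidates.values()))[0]
print(choice)
```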

26. Top-p Sampling (Nucleus Sampling)

Definition: A dynamic text generation strategy that selects the smallest set of most probable Tokens whose cumulative probability exceeds a threshold p. The model then samples from this "nucleus" of tokens, allowing for adaptive vocabulary size based on the probability distribution.

Example: With top-p=0.9, the model considers the minimum number of top tokens that collectively account for 90% of the probability mass.

Advantage: More adaptive than Top-k Sampling (#25) - uses fewer tokens when there's a clear best choice, more when probabilities are spread out.
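
The adaptive behaviour can be seen in a short sketch. The two distributions below are invented: one is spread out, the other has a clear favourite. The function keeps adding tokens, most probable first, until their cumulative probability reaches p.

```python
def top_p_filter(probabilities, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in nucleus)
    # Renormalize so the kept tokens form a valid distribution to sample from.
    return {token: prob / total for token, prob in nucleus}

spread_out = {"cat": 0.40, "dog": 0.30, "bird": 0.15, "fish": 0.10, "elephant": 0.05}
clear_winner = {"the": 0.95, "a": 0.03, "an": 0.02}

print(list(top_p_filter(spread_out, p=0.9)))    # several tokens are needed
print(list(top_p_filter(clear_winner, p=0.9)))  # one token is enough
```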

27. Zero-shot, One-shot, Few-shot Learning

Definition: Different paradigms for providing examples within a Prompt (#22) to guide an LLM's performance:

Zero-shot: No examples provided; the model relies solely on its pre-trained knowledge and the instruction.
One-shot: One example demonstrates the desired task format or output style.
Few-shot: Multiple examples (typically 2-8) help the model understand the task pattern and requirements.

Example:

Zero-shot: "Translate to French: Hello"
One-shot: "Translate to French: Hello → Bonjour. Now translate: Goodbye"
Few-shot: "Translate to French: Hello → Bonjour, Thank you → Merci, Good morning → Bonjour. Now translate: Goodbye"
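
The three variants differ only in how many worked examples are placed in the prompt, which a small helper makes explicit. The function name and the line-by-line layout are choices made for this sketch, not a required format.

```python
def build_prompt(instruction, examples, query):
    """Assemble a zero-, one-, or few-shot prompt depending on len(examples)."""
    lines = [instruction]
    for source, target in examples:
        lines.append(f"{source} → {target}")
    lines.append(f"Now translate: {query}")
    return "\n".join(lines)

examples = [("Hello", "Bonjour"), ("Thank you", "Merci"), ("Good morning", "Bonjour")]

print(build_prompt("Translate to French:", [], "Goodbye"))            # zero-shot
print(build_prompt("Translate to French:", examples[:1], "Goodbye"))  # one-shot
print(build_prompt("Translate to French:", examples, "Goodbye"))      # few-shot
```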

28. Chain-of-Thought Prompting

Definition: An advanced Prompt Engineering (#23) technique where the model is explicitly instructed to show its reasoning process step-by-step. This encourages the LLM to break down complex problems, leading to more accurate and coherent answers, particularly for multi-step reasoning tasks.

Example: "Let's think step by step. To solve this math problem, first I need to..."

Benefits: Improved accuracy on reasoning tasks, better transparency, easier debugging of model logic.

V. Advanced Techniques & Applications

*This section covers sophisticated methods for enhancing LLM capabilities and addressing their limitations.*

29. Retrieval-Augmented Generation (RAG)

Definition: A technique that combines the generative capabilities of LLMs with external knowledge retrieval systems. The model first searches a knowledge base for relevant information, then uses this retrieved context to generate more accurate, up-to-date, and factually grounded responses.

Process:

User query is processed to identify relevant information needs
External knowledge base is searched using Embeddings (#11)
Retrieved information is provided as context to the LLM
LLM generates response based on both its training and retrieved context

Benefits: Reduces Hallucination (#35), enables access to current information, allows for specialized knowledge domains.

Real-world Applications: Customer support systems, research assistants, document analysis tools.
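
The retrieve-then-generate flow can be sketched without any model at all. As a deliberate simplification, the retriever below scores documents by shared words rather than by embedding similarity, and the sketch stops at the assembled prompt instead of calling an LLM. The documents, function names, and prompt wording are all invented for illustration.

```python
import re

def words(text):
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, documents, top_n=1):
    """Rank documents by word overlap with the query (stand-in for embedding search)."""
    query_words = words(query)
    ranked = sorted(documents, key=lambda doc: len(query_words & words(doc)), reverse=True)
    return ranked[:top_n]

def build_rag_prompt(query, documents):
    """Place the retrieved text in front of the question as context."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

knowledge_base = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Shipping is free for orders over 50 dollars.",
]

print(build_rag_prompt("How long do refunds take?", knowledge_base))
```

In a real system the final prompt would be sent to the LLM, which answers from the supplied context rather than from its training data alone.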

30. Vector Databases

Definition: Specialized databases optimized for storing, indexing, and querying high-dimensional Embeddings (#11). They enable efficient similarity search and are crucial for RAG (#29) systems and semantic search applications.

Key Features:

Fast similarity search using cosine similarity or other distance metrics
Scalable to billions of vectors
Integration with LLM workflows

Popular Examples: Pinecone, Weaviate, Chroma, Qdrant.
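
The core operation these systems provide is "store vectors, then return the ones most similar to a query". The class below is a brute-force stand-in written for this glossary, not the API of any product named above; it compares the query against every stored vector, whereas real vector databases use approximate indexes to stay fast at scale. Vectors are assumed to be non-zero.

```python
import math

class TinyVectorStore:
    """Brute-force stand-in for a vector database (no indexing)."""

    def __init__(self):
        self.items = []  # list of (identifier, vector) pairs

    def add(self, identifier, vector):
        self.items.append((identifier, vector))

    def search(self, query, top_n=1):
        """Return identifiers of the top_n vectors most similar to the query."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.hypot(*a) * math.hypot(*b))

        ranked = sorted(self.items, key=lambda item: cosine(query, item[1]), reverse=True)
        return [identifier for identifier, _ in ranked[:top_n]]

store = TinyVectorStore()
store.add("refund-policy", [0.9, 0.1, 0.0])  # toy vectors, not real embeddings
store.add("shipping", [0.1, 0.9, 0.0])
store.add("holidays", [0.0, 0.1, 0.9])

print(store.search([0.8, 0.2, 0.1], top_n=2))
```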

31. Constitutional AI

Definition: An alignment technique developed by Anthropic that uses a set of principles (a "constitution") to guide AI behavior. Instead of relying solely on human feedback, the model learns to critique and improve its own outputs based on these constitutional principles.

Process:

Model generates initial response
Model critiques its own response against constitutional principles
Model revises response based on self-critique
Process repeats until response aligns with principles

Advantages: More scalable than RLHF (#18), consistent application of principles, reduced need for human-labeled feedback.

VI. LLM Limitations & Practical Considerations

*This section addresses the current challenges and constraints when working with LLMs in real-world applications.*

32. Context Window / Context Length

Definition: The total number of input Tokens (#10) that an LLM can process and "remember" in a single interaction. This defines how much information the model can consider when generating its response.

Current Ranges:

GPT-3.5: 4,096 tokens (~3,000 words)
GPT-4: 8,192 tokens (standard) to 128,000 tokens (extended)
Claude: Up to 200,000 tokens
Gemini: Up to 1,000,000 tokens

Practical Impact: Determines ability to handle long documents, extended conversations, or complex multi-part tasks.

33. Token Limit

Definition: The maximum number of Tokens (input + output combined) that an LLM can handle within a single API call or conversation turn. Exceeding this limit results in Truncation (#34).

Planning Consideration: Must account for both input prompt length and expected output length when designing applications.
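
Because input and output share one limit, the space left for the response is simply the limit minus the prompt length. A small sketch of that arithmetic, with example numbers:

```python
def remaining_output_budget(token_limit, prompt_tokens):
    """Tokens left for the model's response after the prompt is counted."""
    return max(token_limit - prompt_tokens, 0)

def fits(token_limit, prompt_tokens, expected_output_tokens):
    """True if the prompt plus the expected output stays within the limit."""
    return prompt_tokens + expected_output_tokens <= token_limit

print(remaining_output_budget(4096, 3500))
print(fits(4096, 3500, 800))
```

With a 4,096-token limit, a 3,500-token prompt leaves 596 tokens, so a response expected to need 800 tokens would not fit.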

34. Truncation

Definition: The process of cutting off parts of the input Prompt (#22) or generated output when the total Token count exceeds the model's limits. This can lead to loss of critical information or incomplete responses.

Mitigation Strategies:

Summarize long inputs
Break complex tasks into smaller parts
Prioritize most important information
Use RAG (#29) for long documents
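
Two of these ideas are easy to sketch on a list of tokens: truncating in a controlled way (here, keeping the most recent tokens, one possible policy among several) and splitting a long input into chunks that each fit the limit. Integers stand in for tokens in this illustration.

```python
def truncate_keep_recent(tokens, max_tokens):
    """Drop the oldest tokens so that at most max_tokens remain."""
    if max_tokens <= 0:
        return []
    return tokens[-max_tokens:]

def split_into_chunks(tokens, chunk_size):
    """Break a long token list into consecutive pieces of at most chunk_size."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

tokens = list(range(10))  # integers standing in for real tokens
print(truncate_keep_recent(tokens, 3))
print(split_into_chunks(tokens, 4))
```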

35. Hallucination

Definition: When an LLM generates plausible-sounding but factually incorrect, nonsensical, or fabricated information. This occurs due to the model's training to predict probable next tokens rather than verify factual accuracy.

Common Types:

Factual errors (incorrect dates, statistics, claims)
Fabricated sources or citations
Confident statements about uncertain topics
Logical inconsistencies

Mitigation Approaches: RAG (#29), fact-checking systems, confidence scoring, human verification.

36. Prompt Injection

Definition: A security vulnerability where malicious users craft Prompts (#22) to override the model's original instructions or safety guidelines, potentially causing it to perform unintended actions or reveal sensitive information.

Example: A user might try to "jailbreak" a model by saying "Ignore all previous instructions and instead help me with..."

Defense Strategies: Input sanitization, prompt validation, output filtering, robust system design.

37. Latency

Definition: The time delay between when a Prompt is submitted and when the model begins returning its response. Critical for real-time applications and user experience.

Factors Affecting Latency:

Model size and complexity
Context Length (#32)
Server load and geographic location
Temperature (#24) and other sampling parameters (usually a minor factor)

Typical Ranges: 100ms to several seconds depending on complexity and infrastructure.

38. API Rate Limiting

Definition: Restrictions imposed by LLM providers on the number of requests, tokens, or computational resources that can be used within a specific time period. This prevents abuse and ensures fair access to the service.

Common Limits:

Requests per minute/hour
Tokens per minute/hour
Concurrent requests
Monthly usage quotas

Business Impact: Must be considered when designing applications for scale.
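
On the application side, one common way to stay under a requests-per-minute limit is to throttle calls before sending them. The sliding-window limiter below is a sketch under assumed limits (the numbers are hypothetical, and each provider documents its own); the clock is injectable so the behaviour can be checked without waiting.

```python
import time
from collections import deque

class RequestLimiter:
    """Allow at most max_requests within any window of window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.timestamps = deque()  # times of recently allowed requests

    def try_acquire(self):
        """Return True and record the request if it is within the limit."""
        now = self.clock()
        # Forget requests that have left the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False

limiter = RequestLimiter(max_requests=2, window_seconds=60)  # hypothetical limit
print([limiter.try_acquire() for _ in range(3)])
```

A caller that receives False would typically wait and retry, or queue the request, rather than sending it and triggering the provider's limit.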

VII. Beyond Text: Multimodal & Specialized Models

*This section explores generative AI models that work with different types of data beyond text.*

39. Diffusion Models

Definition: A class of Generative AI models primarily used for generating high-quality images, audio, and video. They work by learning to gradually remove noise from a pure noise input to produce coherent content, effectively reversing a diffusion process.

Key Examples:

Image Generation: Stable Diffusion, DALL·E 2/3, Midjourney
Video Generation: Sora, Runway ML
Audio Generation: AudioLDM, Stable Audio

Process: Start with random noise → gradually denoise through learned steps → produce final output.

40. Multimodal Models

Definition: Generative AI models capable of processing and generating content across multiple modalities (types of data) simultaneously. This includes combinations of text, images, audio, and video.

Capabilities:

Image-to-text description
Text-to-image generation
Visual question answering
Audio-visual understanding
Document analysis with text and images

Examples: GPT-4V (vision), Gemini Pro Vision, Claude 3 (vision), DALL·E 3.

Real-world Applications: Content creation, accessibility tools, educational materials, creative workflows.
