Foreword
Who This Glossary Is For
This comprehensive glossary is designed for a diverse audience navigating the rapidly evolving landscape of generative AI and large language models:
- Business leaders and executives seeking to understand the strategic implications of AI technologies
- Developers and engineers implementing AI solutions and needing precise technical definitions
- Students and researchers studying artificial intelligence, machine learning, and natural language processing
- Product managers working with AI-powered features and applications
- Content creators and marketers leveraging generative AI tools in their workflows
- Curious professionals from any field wanting to build foundational knowledge about these transformative technologies
How to Use This Glossary
This glossary is thoughtfully structured to progress from fundamental concepts to advanced techniques, with each section building upon previous knowledge:
For Beginners: Start with Section I (Core AI & Machine Learning Concepts) and progress sequentially through the sections. The numbered definitions and cross-references will guide you through the learning journey.
For Experienced Practitioners: Use the comprehensive index and cross-references to quickly locate specific terms. Each definition includes practical examples and real-world applications relevant to your work.
For Reference: The document is designed as a living reference guide. Bookmark frequently used terms and utilize the cross-references (indicated by #numbers) to explore related concepts.
Key Features
- Progressive Structure: Each section builds upon previous concepts
- Cross-References: Numbered links (#) connect related terms throughout the document
- Practical Examples: Real-world applications and technical specifications
- Analogies: Complex concepts explained through relatable comparisons
- Current Information: Reflects the state of the field as of 2025
I. Core AI & Machine Learning Concepts
*This section establishes the fundamental building blocks of artificial intelligence and machine learning that form the foundation for understanding more complex generative AI systems.*
1. Artificial Intelligence (AI)
Definition: The broad field of computer science focused on creating machines and systems that can perform tasks typically requiring human intelligence. This includes learning from experience, understanding language, recognizing patterns, solving problems, and making decisions.
Analogy: Teaching machines to think and act like humans in specific situations.
Real-world Applications: Voice assistants (Siri, Alexa), recommendation systems (Netflix, Amazon), navigation apps (Google Maps).
2. Machine Learning (ML)
Definition: A branch of Artificial Intelligence (AI) that enables systems to learn from data without explicit programming. ML algorithms identify patterns and make predictions or decisions, improving their performance as they are exposed to more data.
Analogy: Teaching a child to recognize a cat by showing them many pictures of cats and dogs, rather than writing down a strict list of features for "cat."
Real-world Applications: Email spam detection, recommendation systems, predictive text, fraud detection.
3. Supervised Learning
Definition: A type of Machine Learning (#2) where the algorithm learns from examples that include both input data and the correct answers (labels). The model learns to map inputs to outputs by studying these labeled examples.
Analogy: Learning with a teacher who provides both questions and correct answers, like flashcards with questions on one side and answers on the other.
Real-world Applications: Email spam detection (emails labeled as spam/not spam), image recognition (photos labeled with what they contain).
4. Unsupervised Learning
Definition: A type of Machine Learning (#2) where the algorithm finds patterns and structures in data without being given specific correct answers or labels. The model discovers hidden relationships on its own.
Analogy: Like exploring a new city without a map or guide, discovering interesting places and patterns by yourself.
Real-world Applications: Grouping customers by purchasing behavior, finding topics in news articles, detecting unusual patterns in data.
5. Neural Network (NN)
Definition: A computational model inspired by the human brain's structure, consisting of interconnected nodes (neurons) in layers. They learn by adjusting connection strengths (weights) as they process data, excelling at finding complex patterns. Neural networks are the bedrock of modern deep learning.
Analogy: A complex chain of interconnected switches that learn to activate in specific patterns based on the input they receive.
Cross-reference: See Parameters (#9) for how neural networks store learned information.
6. Generative AI
Definition: A category of Artificial Intelligence focused on creating new, original content (e.g., text, images, audio, video, code) that resembles human-created output. This contrasts with discriminative AI, which primarily classifies or predicts based on existing data.
Examples: ChatGPT generating a poem, DALL-E creating an image from text, GitHub Copilot writing code, Midjourney creating artwork.
Real-world Applications: Content creation, code generation, image synthesis, music composition, automated writing assistance.
7. Large Language Model (LLM)
Definition: A specific type of Generative AI model, characterized by its massive scale (billions to trillions of Parameters - see #9) and training on vast datasets of text and code. LLMs excel at understanding, generating, and processing human language, performing tasks by predicting the most probable next Token (see #10).
Examples: GPT-3/4 (OpenAI), Llama (Meta), Gemini (Google), Claude (Anthropic), PaLM (Google).
Scale Reference: GPT-3 has 175 billion parameters. OpenAI has not disclosed GPT-4's size; unofficial estimates put it above 1 trillion parameters.
II. LLM Core Components & Mechanics
*This section explores the internal architecture and fundamental mechanisms that enable LLMs to process and generate human-like text.*
8. Model
Definition: The learned mathematical representation within an AI system that processes input to produce output. In LLMs, it's the trained neural network architecture responsible for language understanding and generation.
Analogy: The "brain" of the AI system that has learned patterns from training data and can apply them to new situations.
9. Parameters
Definition: The internal numerical values (weights and biases) that a Model (#8) learns and adjusts during its training phase. They define how input data is transformed into output. The "size" of an LLM is often measured by its number of parameters.
Analogy: The specific settings or dials on a complex machine that are fine-tuned to make it perform its task perfectly.
Scale Examples:
- GPT-3: 175 billion parameters
- GPT-4: ~1 trillion parameters (unofficial estimate; not disclosed by OpenAI)
- Llama 2: 7B, 13B, 70B parameter versions
10. Tokens
Definition: The fundamental units of text that an LLM processes. Text is broken down into tokens for input and generated as tokens for output. Tokens can be words, parts of words (subwords), punctuation, or special characters.
Technical Example: "Understanding large language models" might be tokenized as ["Under", "standing", " large", " language", " models"] - note that spaces are often included with following words.
Practical Note: Most models use ~4 characters per token on average for English text.
11. Embeddings
Definition: Numerical representations (vectors) of words, phrases, or entire documents in a high-dimensional space. Words with similar meanings or contexts are positioned closer together in this space. Embeddings allow LLMs to understand the semantic relationships between Tokens (#10).
Analogy: A "map" where words are plotted, and words with similar meanings are located near each other - "king" and "queen" would be close, both near "royalty."
Cross-reference: See Vector Databases (#30) for how embeddings are stored and retrieved.
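The idea of "closeness" in embedding space is often measured with cosine similarity. The sketch below is only an illustration: the three-dimensional vectors are hand-picked, hypothetical values, whereas real embeddings come from a trained model and typically have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, chosen by hand for illustration only.
embeddings = {
    "king":   [0.90, 0.80, 0.10],
    "queen":  [0.85, 0.90, 0.15],
    "banana": [0.10, 0.05, 0.95],
}

sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["banana"])
print(f"king vs queen:  {sim_royal:.3f}")
print(f"king vs banana: {sim_fruit:.3f}")
```

With these made-up vectors, "king" scores much closer to "queen" than to "banana," which is the behavior the map analogy describes.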
12. Attention Mechanism
Definition: A core component within Transformer architecture (#14) that allows the model to selectively focus on different parts of the input sequence when processing each Token in the sequence. It helps the model weigh the importance of various words in the Context (#32).
Real-world Analogy: Like a spotlight that can focus on different parts of a scene while maintaining awareness of the whole picture.
13. Self-Attention
Definition: A specific type of Attention Mechanism (#12) where a model relates different positions of a single sequence to compute a representation of that sequence. It enables each Token in a prompt to "look at" and understand its relationship with all other tokens in the same prompt, capturing long-range dependencies and nuances of meaning.
Technical Example: When processing "The bank by the river was steep," self-attention helps the model understand that "bank" relates to "river" and "steep," not to financial services.
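A minimal sketch of scaled dot-product self-attention may make the mechanism more concrete. It is deliberately simplified: it uses a single head and sets queries, keys, and values equal to the input vectors, whereas a real Transformer applies learned projection matrices first. The token vectors are hypothetical values chosen so that "bank" and "river" point in similar directions.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Single-head scaled dot-product attention with Q = K = V = vectors.

    Simplification: learned query, key, and value projections are omitted.
    """
    dim = len(vectors[0])
    all_weights, outputs = [], []
    for query in vectors:
        # Score this token against every token in the same sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # The output is a weighted average of all token vectors.
        output = [
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(dim)
        ]
        all_weights.append(weights)
        outputs.append(output)
    return outputs, all_weights

tokens = ["bank", "river", "was"]
vectors = [          # hypothetical token vectors
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.1],
    [0.0, 0.1, 1.0],
]
outputs, attention = self_attention(vectors)
for token, row in zip(tokens, attention):
    print(token, [round(w, 2) for w in row])
```

Each printed row shows how much one token attends to every token in the sequence; in this toy setup "bank" places more weight on "river" than on "was."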
14. Transformer (Architecture)
Definition: A groundbreaking neural network architecture (introduced in "Attention Is All You Need," Vaswani et al., 2017) that revolutionized NLP. It relies entirely on Attention Mechanisms (#12) and Positional Encoding (#15) to process sequences in parallel, making it highly efficient for training on large datasets and handling long-range dependencies. It is the foundational architecture for nearly all modern LLMs.
Innovation: Unlike previous architectures, Transformers can process all tokens simultaneously rather than sequentially, dramatically speeding up training.
15. Positional Encoding
Definition: A method used in Transformers (#14) to inject information about the relative or absolute position of Tokens in a sequence. Since Transformers process tokens in parallel without an inherent understanding of their order, positional encodings are crucial for the model to grasp sequence order and relationships.
Why It Matters: Without positional encoding, "The cat sat on the mat" would be processed identically to "Mat the on sat cat the."
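One well-known scheme is the sinusoidal encoding from the original Transformer paper, sketched below. Many modern LLMs use other schemes (for example learned or rotary position embeddings), so treat this as one illustrative option rather than what any particular model does. The sizes `seq_len=6` and `d_model=8` are arbitrary.

```python
import math

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of position vectors.

    Even dimensions use sine, odd dimensions use cosine, and each
    sine/cosine pair shares a frequency that falls as the dimension grows.
    """
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            pair_index = i // 2
            angle = pos / (10000 ** (2 * pair_index / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

encoding = sinusoidal_positional_encoding(seq_len=6, d_model=8)
for pos, row in enumerate(encoding[:3]):
    print(pos, [round(v, 3) for v in row])
```

Each position receives a distinct vector that is added to the token's embedding, which lets the model tell the first "the" apart from a later one.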
III. Training & Development Processes
*This section covers the multi-stage process of creating, training, and refining LLMs to make them useful and aligned with human intentions.*
16. Pre-training
Definition: The initial, large-scale training phase of an LLM on vast and diverse datasets (e.g., Common Crawl, books, academic papers, code repositories). During this phase, the model learns general language patterns, grammar, factual knowledge, and reasoning abilities through self-supervised tasks like predicting the next Token (#10).
Scale: Modern LLMs are trained on datasets containing trillions of tokens from diverse sources.
Duration: Can take weeks to months using thousands of GPUs.
17. Fine-tuning
Definition: The process of taking a Pre-trained (#16) LLM and further training it on a smaller, specific dataset for a particular task or domain. This adapts the model's general knowledge to perform specialized functions or adhere to a specific style.
Types:
- Instruction Fine-tuning: Teaching the model to follow human instructions
- Domain Fine-tuning: Adapting to specific fields (medical, legal, technical)
- Task Fine-tuning: Optimizing for specific tasks (summarization, translation)
Examples: Training GPT-3 on customer service conversations to create a support chatbot.
18. Reinforcement Learning from Human Feedback (RLHF)
Definition: A crucial technique used to align LLMs with human preferences and instructions. After Fine-tuning (#17), human annotators rank model outputs, and this feedback is used to train a "reward model." This reward model then guides a reinforcement learning algorithm to further optimize the LLM's behavior, making it more helpful, honest, and harmless.
Process:
- Collect human preferences on model outputs
- Train a reward model to predict human preferences
- Use reinforcement learning to optimize the LLM using the reward model
Analogy: A student learning to write better essays by getting grades and feedback from teachers, then adjusting their writing style accordingly.
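The reward-model step can be illustrated with a pairwise preference loss, one common formulation in the RLHF literature. This is a sketch under assumptions: the glossary does not specify a loss function, and the reward values below are made-up scalars standing in for a reward model's scores on two responses to the same prompt.

```python
import math

def pairwise_preference_loss(reward_chosen, reward_rejected):
    """Compute -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the reward model scores the human-preferred
    response higher than the rejected one, and large when it does not.
    """
    diff = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-diff)).
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))

# Hypothetical reward scores for two responses to the same prompt.
agrees_with_humans = pairwise_preference_loss(2.0, 0.0)
disagrees_with_humans = pairwise_preference_loss(0.0, 2.0)
print(f"loss when ranking matches humans: {agrees_with_humans:.3f}")
print(f"loss when ranking is reversed:    {disagrees_with_humans:.3f}")
```

Minimizing this loss over many human-ranked pairs pushes the reward model to reproduce human preferences; the reinforcement learning step then optimizes the LLM against that learned reward.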
19. Alignment
Definition: The effort to ensure that an AI model's behavior, goals, and values are consistent with human intentions, ethical principles, and safety guidelines. RLHF (#18) is a primary method for achieving alignment in LLMs.
Key Objectives:
- Helpfulness: Providing useful, relevant responses
- Harmlessness: Avoiding harmful or dangerous outputs
- Honesty: Being truthful and acknowledging uncertainty
Cross-reference: See Constitutional AI (#31) for an alternative alignment approach.
20. Quantization
Definition: A model compression technique that reduces the precision of Parameters (#9) from higher-bit representations (e.g., 32-bit floating point) to lower-bit representations (e.g., 8-bit integers). This significantly reduces model size and computational requirements while maintaining most of the model's performance.
Benefits: Enables deployment on devices with limited memory and processing power.
Trade-offs: Slight reduction in model quality for significant improvements in speed and memory usage.
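A toy version of 8-bit quantization shows where both the savings and the quality loss come from. The sketch assumes simple symmetric, per-tensor quantization with a single scale factor; production schemes are usually more elaborate (per-channel scales, calibration data, 4-bit formats). The weights are made-up values.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.42, -1.30, 0.07, 0.95, -0.61]  # hypothetical parameters
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("int8 values:", quantized)
print("restored:   ", [round(r, 4) for r in restored])
print(f"largest rounding error: {max_error:.5f}")
```

Each weight now needs one byte instead of four, and the rounding error per weight is bounded by half the scale factor, which is the "slight reduction in quality" noted above.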
IV. Inference, Generation & Control
*This section focuses on how trained LLMs are used in practice, including techniques for controlling and optimizing their outputs.*
21. Inference
Definition: The process of using a trained AI Model (#8) to make predictions or generate outputs on new, unseen input data (Prompts - see #22). This is the "runtime" phase where the model applies its learned knowledge.
Performance Metrics: Measured by Latency (#37), throughput, and resource utilization.
22. Prompt
Definition: The input text provided to a generative AI Model to initiate a response. A prompt can be a question, a command, an instruction, or any text designed to guide the model's generation.
Types:
- System Prompts: Set the model's behavior and role
- User Prompts: Direct instructions or queries
- Context Prompts: Provide background information
23. Prompt Engineering
Definition: The art and science of designing, refining, and optimizing Prompts (#22) to elicit desired, high-quality, and specific outputs from generative AI models. It involves structuring the prompt to effectively guide the model's understanding and generation.
Key Techniques:
- Clear, specific instructions
- Providing examples (Few-shot Learning - see #27)
- Setting context and constraints
- Using structured formats
Cross-reference: See Chain-of-Thought Prompting (#28) for advanced reasoning techniques.
24. Temperature
Definition: A hyperparameter used during text generation that controls the randomness or creativity of the model's output by affecting the probability distribution over possible next Tokens (#10).
Settings:
- Low Temperature (0.1-0.3): Focused, predictable, near-deterministic outputs. Ideal for factual responses, code generation, structured tasks.
- Medium Temperature (0.4-0.7): Balanced creativity and coherence. Good for general conversation.
- High Temperature (0.8-1.0+): Creative, diverse, sometimes surprising outputs. Excellent for creative writing, brainstorming.
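The effect of these settings can be seen by dividing the model's raw scores (logits) by the temperature before applying softmax. The sketch below uses four made-up logits; the function name and values are illustrative assumptions, not any provider's API.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling by 1/temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores for four candidate tokens
for temperature in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At low temperature almost all probability mass lands on the top-scoring token; at higher temperature the distribution flattens, so less likely tokens are sampled more often.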
25. Top-k Sampling
Definition: A text generation strategy where the model only considers the k most probable Tokens as candidates for the next word, then samples from this reduced set. This introduces controlled randomness while preventing highly improbable outputs.
Example: Suppose the next word could be "cat" (40%), "dog" (30%), "bird" (15%), "fish" (10%), or "elephant" (5%). With top-k=3, the model keeps only "cat," "dog," and "bird," renormalizes their probabilities, and samples from those three.
Typical Values: k=40-100 for balanced outputs.
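A short sketch of top-k filtering, reusing the five word probabilities from the example with k=3 so the cut-off is visible. The helper names are hypothetical, and a real implementation would operate on the model's full vocabulary rather than a small dictionary.

```python
import random

def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize to sum to 1."""
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {token: p / total for token, p in top}

def sample(probs, rng):
    """Draw one token according to its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

probs = {"cat": 0.40, "dog": 0.30, "bird": 0.15, "fish": 0.10, "elephant": 0.05}
filtered = top_k_filter(probs, k=3)
rng = random.Random(0)  # fixed seed so repeated runs behave the same
next_token = sample(filtered, rng)

print({token: round(p, 3) for token, p in filtered.items()})
print("sampled:", next_token)
```

"fish" and "elephant" can never be chosen once the filter is applied, no matter how the random draw goes.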
26. Top-p Sampling (Nucleus Sampling)
Definition: A dynamic text generation strategy that selects the smallest set of most probable Tokens whose cumulative probability exceeds a threshold p. The model then samples from this "nucleus" of tokens, allowing for adaptive vocabulary size based on the probability distribution.
Example: With top-p=0.9, the model considers the minimum number of top tokens that collectively account for 90% of the probability mass.
Advantage: More adaptive than Top-k Sampling (#25) - uses fewer tokens when there's a clear best choice, more when probabilities are spread out.
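The adaptive behavior can be sketched by applying the same threshold to a peaked and a flat distribution. Both distributions are made-up, and the function name is a hypothetical helper rather than a library call.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability
    reaches p, then renormalize so the kept probabilities sum to 1."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in nucleus)
    return {token: prob / total for token, prob in nucleus}

# Hypothetical next-token distributions.
peaked = {"Paris": 0.92, "Lyon": 0.04, "Nice": 0.03, "Rome": 0.01}
flat = {"cat": 0.26, "dog": 0.22, "bird": 0.18,
        "fish": 0.14, "frog": 0.12, "elephant": 0.08}

print("peaked nucleus:", list(top_p_filter(peaked, p=0.9)))
print("flat nucleus:  ", list(top_p_filter(flat, p=0.9)))
```

With p=0.9 the peaked distribution keeps a single token while the flat one keeps five of six, which a fixed k could not do for both cases.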
27. Zero-shot, One-shot, Few-shot Learning
Definition: Different paradigms for providing examples within a Prompt (#22) to guide an LLM's performance:
- Zero-shot: No examples provided; the model relies solely on its pre-trained knowledge and the instruction.
- One-shot: One example demonstrates the desired task format or output style.
- Few-shot: Multiple examples (typically 2-8) help the model understand the task pattern and requirements.
Example:
- Zero-shot: "Translate to French: Hello"
- One-shot: "Translate to French: Hello → Bonjour. Now translate: Goodbye"
- Few-shot: "Translate to French: Hello → Bonjour, Thank you → Merci, Good morning → Bonjour. Now translate: Goodbye"
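In application code, the three styles usually differ only in how many worked examples are placed ahead of the query. The sketch below assembles such prompts as plain strings; the `build_prompt` helper and its line-per-example layout are assumptions for illustration, not a required format.

```python
def build_prompt(instruction, examples, query):
    """Assemble a prompt from an instruction, example pairs, and a query.

    An empty examples list gives a zero-shot prompt, one pair gives
    one-shot, and several pairs give few-shot.
    """
    lines = [instruction]
    for source, target in examples:
        lines.append(f"{source} → {target}")
    lines.append(f"{query} →")
    return "\n".join(lines)

instruction = "Translate to French:"
examples = [
    ("Hello", "Bonjour"),
    ("Thank you", "Merci"),
    ("Good morning", "Bonjour"),
]

zero_shot = build_prompt(instruction, [], "Goodbye")
one_shot = build_prompt(instruction, examples[:1], "Goodbye")
few_shot = build_prompt(instruction, examples, "Goodbye")
print(few_shot)
```

Ending the prompt with the unanswered query in the same format as the examples nudges the model to continue the pattern.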
28. Chain-of-Thought Prompting
Definition: An advanced Prompt Engineering (#23) technique where the model is explicitly instructed to show its reasoning process step-by-step. This encourages the LLM to break down complex problems, leading to more accurate and coherent answers, particularly for multi-step reasoning tasks.
Example: "Let's think step by step. To solve this math problem, first I need to..."
Benefits: Improved accuracy on reasoning tasks, better transparency, easier debugging of model logic.
V. Advanced Techniques & Applications
*This section covers sophisticated methods for enhancing LLM capabilities and addressing their limitations.*
29. Retrieval-Augmented Generation (RAG)
Definition: A technique that combines the generative capabilities of LLMs with external knowledge retrieval systems. The model first searches a knowledge base for relevant information, then uses this retrieved context to generate more accurate, up-to-date, and factually grounded responses.
Process:
- User query is processed to identify relevant information needs
- External knowledge base is searched using Embeddings (#11)
- Retrieved information is provided as context to the LLM
- LLM generates response based on both its training and retrieved context
Benefits: Reduces Hallucination (#35), enables access to current information, allows for specialized knowledge domains.
Real-world Applications: Customer support systems, research assistants, document analysis tools.
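The retrieval and prompt-assembly steps of the process can be sketched end to end with a toy retriever. Everything here is an assumption made for illustration: `embed` is a bag-of-words stand-in for a real embedding model, the knowledge base is three made-up sentences, and the final call to an LLM is left out.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, top_n=1):
    """Return the top_n documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(documents,
                    key=lambda doc: cosine(query_vec, embed(doc)),
                    reverse=True)
    return ranked[:top_n]

def build_rag_prompt(query, documents, top_n=1):
    """Place retrieved text ahead of the question as context."""
    context = "\n".join(retrieve(query, documents, top_n))
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")

knowledge_base = [  # hypothetical support articles
    "Refunds are processed within 5 business days.",
    "Support is available Monday through Friday.",
    "Passwords can be reset from the account settings page.",
]
prompt = build_rag_prompt("How long do refunds take?", knowledge_base)
print(prompt)
```

The resulting prompt would then be sent to the LLM, which answers from the retrieved sentence rather than from its training data alone.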
30. Vector Databases
Definition: Specialized databases optimized for storing, indexing, and querying high-dimensional Embeddings (#11). They enable efficient similarity search and are crucial for RAG (#29) systems and semantic search applications.
Key Features:
- Fast similarity search using cosine similarity or other distance metrics
- Scalable to billions of vectors
- Integration with LLM workflows
Popular Examples: Pinecone, Weaviate, Chroma, Qdrant.
31. Constitutional AI
Definition: An alignment technique developed by Anthropic that uses a set of principles (a "constitution") to guide AI behavior. Instead of relying solely on human feedback, the model learns to critique and improve its own outputs based on these constitutional principles.
Process:
- Model generates initial response
- Model critiques its own response against constitutional principles
- Model revises response based on self-critique
- Process repeats until response aligns with principles
Advantages: More scalable than RLHF (#18), consistent application of principles, reduced need for human-labeled feedback.
VI. LLM Limitations & Practical Considerations
*This section addresses the current challenges and constraints when working with LLMs in real-world applications.*
32. Context Window / Context Length
Definition: The maximum number of Tokens (#10) that an LLM can process and "remember" in a single interaction; for most models this covers both the prompt and the text generated so far. This defines how much information the model can consider when generating its response.
Current Ranges:
- GPT-3.5: 4,096 tokens (~3,000 words)
- GPT-4: 8,192 tokens (standard) to 128,000 tokens (GPT-4 Turbo)
- Claude: Up to 200,000 tokens
- Gemini: Up to 1,000,000 tokens
Practical Impact: Determines ability to handle long documents, extended conversations, or complex multi-part tasks.
33. Token Limit
Definition: The maximum number of Tokens (input + output combined) that an LLM can handle within a single API call or conversation turn. Exceeding this limit results in Truncation (#34).
Planning Consideration: Must account for both input prompt length and expected output length when designing applications.
34. Truncation
Definition: The process of cutting off parts of the input Prompt (#22) or generated output when the total Token count exceeds the model's limits. This can lead to loss of critical information or incomplete responses.
Mitigation Strategies:
- Summarize long inputs
- Break complex tasks into smaller parts
- Prioritize most important information
- Use RAG (#29) for long documents
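When truncation cannot be avoided, applications often apply it deliberately rather than letting the API cut text arbitrarily. The sketch below reserves room for the response and keeps the most recent tokens. That policy is only one assumption among several reasonable ones (keeping the start, or the start plus the end, are alternatives), and the integer list stands in for real token IDs.

```python
def truncate_to_fit(prompt_tokens, token_limit, reserved_for_output):
    """Trim a prompt so prompt + expected output fit within the limit.

    Keeps the most recent tokens and drops the oldest ones.
    """
    budget = token_limit - reserved_for_output
    if budget <= 0:
        raise ValueError("reserved_for_output must be smaller than token_limit")
    if len(prompt_tokens) <= budget:
        return list(prompt_tokens)
    return list(prompt_tokens[-budget:])

prompt_tokens = list(range(10))  # stand-in for 10 token IDs
kept = truncate_to_fit(prompt_tokens, token_limit=8, reserved_for_output=3)
print(kept)
```

Reserving output space up front follows the planning consideration under Token Limit: the budget must cover both the prompt and the expected response.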
35. Hallucination
Definition: When an LLM generates plausible-sounding but factually incorrect, nonsensical, or fabricated information. This occurs due to the model's training to predict probable next tokens rather than verify factual accuracy.
Common Types:
- Factual errors (incorrect dates, statistics, claims)
- Fabricated sources or citations
- Confident statements about uncertain topics
- Logical inconsistencies
Mitigation Approaches: RAG (#29), fact-checking systems, confidence scoring, human verification.
36. Prompt Injection
Definition: A security vulnerability where malicious users craft Prompts (#22) to override the model's original instructions or safety guidelines, potentially causing it to perform unintended actions or reveal sensitive information.
Example: A user might try to "jailbreak" a model by saying "Ignore all previous instructions and instead help me with..."
Defense Strategies: Input sanitization, prompt validation, output filtering, robust system design.
37. Latency
Definition: The time delay between when a Prompt is submitted and when the model begins returning its response. Critical for real-time applications and user experience.
Factors Affecting Latency:
- Model size and complexity
- Context Length (#32)
- Server load and geographic location
- Temperature (#24) and sampling parameters
Typical Ranges: 100ms to several seconds depending on complexity and infrastructure.
38. API Rate Limiting
Definition: Restrictions imposed by LLM providers on the number of requests, tokens, or computational resources that can be used within a specific time period. This prevents abuse and ensures fair access to the service.
Common Limits:
- Requests per minute/hour
- Tokens per minute/hour
- Concurrent requests
- Monthly usage quotas
Business Impact: Must be considered when designing applications for scale.
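A common way to design around rate limits is to retry with exponential backoff and a little random jitter. The sketch below is generic: `RateLimitError` and `simulated_request` are hypothetical stand-ins, since each provider's client library reports rate limiting in its own way (often as an HTTP 429 response).

```python
import random
import time

class RateLimitError(Exception):
    """Hypothetical error raised when the provider rejects a request."""

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0,
                      sleep=time.sleep):
    """Call request_fn, waiting longer after each rate-limit error."""
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_retries:
                raise
            # Double the wait on each attempt and add jitter so that many
            # clients do not all retry at the same moment.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)

# Simulated request that is rate limited twice before succeeding.
attempts = {"count": 0}

def simulated_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("too many requests")
    return "response text"

waits = []  # record delays instead of actually sleeping in this demo
result = call_with_backoff(simulated_request, sleep=waits.append)
print(result, "after", attempts["count"], "attempts")
```

Passing `sleep` as a parameter keeps the demo instant and makes the retry logic easy to test; in production the default `time.sleep` would be used.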
VII. Beyond Text: Multimodal & Specialized Models
*This section explores generative AI models that work with different types of data beyond text.*
39. Diffusion Models
Definition: A class of Generative AI models primarily used for generating high-quality images, audio, and video. They work by learning to gradually remove noise from a pure noise input to produce coherent content, effectively reversing a diffusion process.
Key Examples:
- Image Generation: Stable Diffusion, DALL·E 2/3, Midjourney
- Video Generation: Sora, Runway ML
- Audio Generation: MusicLM, AudioLDM
Process: Start with random noise → gradually denoise through learned steps → produce final output.
40. Multimodal Models
Definition: Generative AI models capable of processing and generating content across multiple modalities (types of data) simultaneously. This includes combinations of text, images, audio, and video.
Capabilities:
- Image-to-text description
- Text-to-image generation
- Visual question answering
- Audio-visual understanding
- Document analysis with text and images
Examples: GPT-4V (vision), Gemini Pro Vision, Claude 3 (vision), DALL·E 3.
Real-world Applications: Content creation, accessibility tools, educational materials, creative workflows.
Conclusion
This glossary provides a comprehensive foundation for understanding the rapidly evolving field of generative AI and large language models. As these technologies continue to advance, new terms and concepts will emerge, but the fundamental principles covered here will remain essential for anyone working with or studying these powerful systems.
The journey from basic Machine Learning concepts to sophisticated Multimodal Models represents one of the most significant technological advances of our time, with applications spanning creative arts, scientific research, business automation, and human-computer interaction.
Copyright © 2025 Neelkanth Kaushik. All rights reserved.