100 GenAI Interview Questions
Terms in this set (100)
1
What is generative AI?
AI that can create new content such as text, images, or code
2
How does generative AI differ from discriminative AI?
Generative models learn the joint distribution p(x, y); discriminative models learn the conditional p(y|x)
3
What is a Large Language Model (LLM)?
A neural network trained on large text corpora to model language
4
Which architecture underpins most state-of-the-art LLMs?
Transformers
5
What does the self-attention mechanism do in Transformers?
Computes weighted interactions between all token pairs in a sequence
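The "weighted interactions" in the answer can be sketched numerically. Below is a minimal single-head, single-example sketch in numpy (random weight matrices, no masking or multi-head logic, purely illustrative):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a token sequence X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise token interactions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over key positions
    return weights @ V                               # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                          # 4 tokens, embedding dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)                  # same shape as X: one vector per token
```

Each output row is a convex combination of the value vectors, with weights determined by query-key similarity for that token.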
6
Why are positional encodings used in Transformers?
To provide token order information
7
What is tokenization in LLMs?
Splitting text into subword units or tokens
8
What is the context window of an LLM?
The maximum number of tokens the model can consider in one prompt
9
What does 'temperature' control in LLM sampling?
Randomness of the output distribution
10
What is 'top-k' or 'top-p' sampling used for?
Restricting sampling to the most likely tokens to control diversity
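How temperature and top-p interact can be shown in a minimal sampling sketch (toy logits, pure-stdlib, not any provider's API):

```python
import math, random

def sample(logits, temperature=1.0, top_p=1.0, seed=None):
    """Sample a token index with temperature scaling and nucleus (top-p) filtering."""
    scaled = [l / temperature for l in logits]       # lower temp = sharper distribution
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # keep the smallest set of tokens whose cumulative probability reaches top_p
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in ranked:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    rng = random.Random(seed)
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

token = sample([2.0, 1.0, 0.1, -1.0], temperature=0.7, top_p=0.9, seed=42)
```

With a very small top_p only the most likely token survives the nucleus filter, which is why low top_p (like low temperature) reduces output diversity.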
11
What is Retrieval-Augmented Generation (RAG)?
A method that combines retrieval from external data with generation by an LLM
12
Which components are core to a RAG system?
Retriever, knowledge store, and generator
13
Why use RAG instead of relying solely on an LLM’s internal knowledge?
To ground answers in up-to-date, domain-specific data
14
What is typically stored in the knowledge store for RAG?
Documents or chunks with associated embeddings
15
What does the retriever do in RAG?
Fetches relevant documents given a query embedding
16
Which type of database is commonly used for RAG similarity search?
Vector database or embedding index
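Under the hood, that similarity search is nearest-neighbor lookup over embeddings. A brute-force sketch (toy 3-d vectors and hypothetical doc ids; real systems use approximate indexes):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, index, k=2):
    """Return the top-k (score, doc_id) pairs by cosine similarity."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    return sorted(scored, reverse=True)[:k]

index = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
top = retrieve([1.0, 0.05, 0.0], index, k=2)   # doc_a and doc_b outrank doc_c
```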
17
What is 'chunking' in RAG pipelines?
Splitting documents into smaller text segments for embedding
18
Why is chunk size important in RAG?
It trades off retrieval granularity vs. context completeness
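A minimal character-based chunker makes the trade-off concrete (chunk_size and overlap values are illustrative; production pipelines often chunk by tokens or sentences instead):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character chunks for embedding.

    Larger chunks keep more context per chunk; smaller chunks make
    retrieval more precise -- the granularity/completeness trade-off.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap        # slide forward, keeping some overlap
    return chunks

chunks = chunk_text("word " * 200, chunk_size=100, overlap=20)
```

The overlap means a sentence cut at a chunk boundary still appears whole in the neighboring chunk.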
19
What is 'query rewriting' in advanced RAG systems?
Rephrasing or expanding the query to improve retrieval quality
20
Which technique improves RAG by re-ranking retrieved documents?
Cross-encoder re-ranking
21
What is an 'AI agent' in the LLM ecosystem?
A system where an LLM can perceive context, plan, call tools, and act in a loop
22
What does 'tool use' or 'function calling' enable for LLM agents?
Calling external APIs, databases, or functions from model outputs
23
Which is a common pattern for an LLM agent loop?
Observe → Plan → Act → Observe
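That loop can be sketched with a stubbed planner and tool registry (all names here are hypothetical; a real planner would be an LLM call returning a tool choice):

```python
def run_agent(goal, plan_fn, tools, max_steps=5):
    """Minimal observe-plan-act loop.

    plan_fn stands in for an LLM: given the goal and observations so far,
    it returns (tool_name, arg) or ("finish", answer).
    """
    observations = []
    for _ in range(max_steps):              # bound steps to avoid runaway loops
        tool_name, arg = plan_fn(goal, observations)
        if tool_name == "finish":
            return arg
        result = tools[tool_name](arg)      # act: invoke the chosen tool
        observations.append(result)         # observe: feed the result back in
    return None                             # step budget exhausted

# Toy planner: look the goal up once, then finish with what we saw.
def toy_planner(goal, observations):
    if not observations:
        return ("lookup", goal)
    return ("finish", observations[-1])

tools = {"lookup": lambda q: f"result for {q!r}"}
answer = run_agent("capital of France", toy_planner, tools)
```

Note the max_steps bound: it is the simplest guard against the unconstrained-agent risks covered later in this set.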
24
What is a 'multi-agent' system in GenAI?
Multiple specialized agents collaborating or competing on tasks
25
Why are tools important for production agents?
They let agents access real-time data, systems, and side effects
26
What is a 'planner' in an agent architecture?
A component that breaks goals into steps or sub-tasks
27
What is a 'memory' module for agents?
Structured store for past interactions, facts, or state
28
Short-term conversational memory is typically implemented using:
Appending recent messages into the context window
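A sketch of that rolling window, approximating token cost by word count (an assumption for brevity; a real system would use the model's tokenizer):

```python
def rolling_window(messages, max_tokens=50):
    """Keep the most recent messages that fit a rough token budget."""
    kept, used = [], 0
    for msg in reversed(messages):          # walk newest-first
        cost = len(msg["content"].split())  # crude proxy for token count
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))             # restore chronological order

history = [
    {"role": "user", "content": "one " * 30},
    {"role": "assistant", "content": "two " * 30},
    {"role": "user", "content": "three " * 30},
]
window = rolling_window(history, max_tokens=70)   # oldest message gets dropped
```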
29
Long-term memory for agents is often stored in:
A vector store or database
30
Which is a risk of unconstrained autonomous agents?
Unbounded tool calls, costs, and harmful or unintended actions
31
What is prompt engineering?
Designing and structuring inputs to steer LLM behavior
32
Which is an example of a prompt engineering technique?
Few-shot prompting with examples
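A minimal sketch of assembling such a prompt (the Input/Output layout is illustrative, not any provider's required schema):

```python
def few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and the new query into one prompt."""
    lines = [instruction, ""]
    for inp, out in examples:               # each example shows the desired mapping
        lines.append(f"Input: {inp}")
        lines.append(f"Output: {out}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")                 # model completes from here
    return "\n".join(lines)

prompt = few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("I loved it", "positive"), ("Terrible service", "negative")],
    "Great value for money",
)
```

The trailing "Output:" steers the model to continue the established pattern with no weight updates, which is the point of few-shot prompting.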
33
What is chain-of-thought (CoT) prompting?
Asking the model to show intermediate reasoning steps
34
Why do many providers restrict explicit chain-of-thought output?
To reduce risk of over-reliance, leakage, and misinterpretation of internal reasoning
35
What is 'system message' vs 'user message' in chat-based LLM APIs?
System sets high-level behavior, user provides the query/task
36
What is hallucination in LLMs?
The model generating confident but factually incorrect content
37
Which strategy helps reduce hallucinations?
RAG with grounded retrieval and citation
38
What is a safety or guardrail layer in GenAI systems?
A post-processing or filtering step to block unsafe or disallowed outputs
39
What is jailbreak testing?
Trying to bypass or defeat safety constraints of LLMs
40
Which is a common evaluation approach for GenAI answers?
Human evaluation or model-based rubric scoring
41
Which metric is often used for text similarity in GenAI evaluation?
BLEU / ROUGE / BERTScore
42
What is model-based evaluation in LLM systems?
Using another model (or the same) as a 'judge' to rate outputs
43
What is instruction tuning?
Fine-tuning a model on (instruction, response) pairs to follow instructions better
44
What is supervised fine-tuning (SFT) in LLM training?
Training the model on curated input-output examples with cross-entropy loss
45
What is RLHF (Reinforcement Learning from Human Feedback)?
Reinforcement learning where human preferences guide a reward model
46
What is DPO (Direct Preference Optimization) conceptually used for?
Directly aligning model outputs with preference pairs without an explicit reward model
47
What is LoRA (Low-Rank Adaptation) used for?
Parameter-efficient fine-tuning by adding low-rank matrices to weight updates
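The low-rank idea can be sketched in numpy (toy dimensions; B is initialized to zero as in the usual LoRA setup, so training starts from exactly the frozen weights):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Forward pass with a LoRA update: y = x @ (W + (alpha / r) * B @ A).T

    W is frozen; only the low-rank factors A (r x d_in) and B (d_out x r)
    are trained, shrinking trainable parameters from d_out * d_in
    to r * (d_in + d_out).
    """
    r = A.shape[0]
    delta = (alpha / r) * (B @ A)        # low-rank weight update
    return x @ (W + delta).T

d_in, d_out, r = 16, 8, 2
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))       # frozen base weight
A = rng.normal(size=(r, d_in))
B = np.zeros((d_out, r))                 # zero init: delta starts at zero
x = rng.normal(size=(3, d_in))
y = lora_forward(x, W, A, B, alpha=4)    # identical to the base model before training
```

Here the trainable parameters drop from 128 (8x16) to 48 (2x16 + 8x2), which is the cost saving the next card refers to.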
48
Why is parameter-efficient fine-tuning popular?
It allows adapting large models with fewer trainable parameters and lower cost
49
What is knowledge distillation in the context of LLMs?
Training a smaller student model to imitate a larger teacher model
50
What is the main idea behind Mixture-of-Experts (MoE) LLMs?
Multiple expert subnetworks where only a subset is activated per token
51
Why are MoE architectures attractive for scaling?
They allow very large parameter counts while keeping per-token compute manageable
52
What is a 'small language model' (SLM) in current discussions?
A loosely defined term, often meaning models under roughly 10B parameters, optimized for on-device or low-latency use
53
Which is a key tradeoff between large LLMs and SLMs?
SLMs tend to be faster/cheaper but may have lower general capability
54
What is a diffusion model in generative AI?
A model that iteratively denoises random noise to generate images or other data
55
Which generative paradigm is most associated with image generation today?
Diffusion models and latent diffusion
56
What is a Generative Adversarial Network (GAN)?
Two models, generator and discriminator, trained in opposition
57
What is a multimodal model?
A model that handles multiple data types such as text, images, or audio together
58
Which is an example use case of a vision-language model?
Image captioning or visual question answering
59
What is 'few-shot' learning with LLMs?
Providing a few examples in the prompt to steer behavior without weight updates
60
What is 'zero-shot' capability in LLMs?
Model can perform tasks without explicit task-specific training or examples
61
Why is observability important in GenAI applications?
To understand prompts, outputs, costs, and failures for monitoring and debugging
62
What is a 'prompt log' in production systems?
Structured record of prompts, model versions, and outputs
63
What is response caching for LLMs?
Caching outputs for repeated prompts or semantically similar queries
64
Why is cost control critical for GenAI deployments?
LLM APIs and GPU inference can be expensive at scale
65
Which approach commonly reduces GenAI inference costs?
Routing easy queries to cheaper or smaller models (model routing)
66
What is 'model routing' or 'model cascade'?
Using smaller/cheaper models first and escalating to more powerful ones when needed
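A sketch of the cascade, with toy lambdas standing in for real model calls and the escalation check (a production router might instead use a classifier or confidence score):

```python
def route(query, cheap_model, strong_model, escalate_if):
    """Try the cheap model first; escalate when its answer looks inadequate."""
    answer = cheap_model(query)
    if escalate_if(query, answer):
        return strong_model(query), "strong"
    return answer, "cheap"

# Toy stand-ins: the cheap model gives up on long queries.
cheap = lambda q: "short answer" if len(q) < 40 else "IDK"
strong = lambda q: f"detailed answer to {q!r}"
needs_escalation = lambda q, a: a == "IDK"

easy = route("What is 2 + 2?", cheap, strong, needs_escalation)
hard = route("Explain the trade-offs of MoE routing under load.", cheap, strong, needs_escalation)
```

Only the hard query pays for the strong model, which is where the cost savings come from.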
67
Why is PII handling important in GenAI systems?
To comply with privacy regulations and avoid leaking sensitive data
68
Which technique helps protect sensitive user data in logs?
Redaction or pseudonymization of PII
69
What is 'drift' in the context of GenAI systems?
Change in data distribution, user behavior, or model versions affecting performance
70
Why is versioning of prompts and models important?
To reproduce behavior, debug issues, and compare performance across changes
71
What is an orchestration framework like LangChain or LlamaIndex used for?
Managing pipelines of prompts, tools, retrieval, and models in GenAI apps
72
In LangChain terminology, what is a 'Chain'?
A sequence or graph of calls (LLMs, tools, retrievers) composed into a pipeline
73
What is the role of a 'Tool' in LangChain-style frameworks?
External function or API that the LLM/agent can invoke
74
What is 'guardrailing' in GenAI frameworks?
Enforcing safety, compliance, or formatting rules on LLM inputs and outputs
75
What is a 'stateful agent'?
Agent whose behavior depends on stored state or memory across turns
76
Why might you use a graph-based workflow (DAG) for GenAI pipelines?
To model complex branching, dependencies, and parallel steps
77
Which is a typical risk when connecting agents to powerful tools (e.g., shell, database writes)?
Potential destructive actions, data loss, or security issues
78
What is 'tool grounding'?
Verifying that tool calls and parameters are well-formed and safe before execution
79
Why are eval harnesses (automated tests) important for GenAI apps?
They provide repeatable checks that changes in prompts/models don’t regress behavior
80
What is a 'golden set' or 'eval set' in GenAI evaluation?
Curated set of inputs with expected or reference outputs to test system quality
81
Which of the following is an example of an LLM 'judge' pattern?
One LLM grades or critiques another LLM’s answer using a rubric
82
Why is deterministic behavior sometimes desired in GenAI APIs?
To ensure reproducible outputs for testing and compliance
83
How can you increase determinism for an LLM call?
Set temperature to 0 (greedy decoding), though minor nondeterminism can remain across hardware and providers
84
Why do many GenAI systems use hybrid search (sparse + dense)?
To combine keyword and semantic similarity for better retrieval
85
What is 're-ranking' in RAG or search?
Reordering retrieved documents using a more expensive scoring model
86
Why is grounding with citations valuable in GenAI responses?
It helps users verify sources and trust the answer
87
Which is a common ethical risk of deploying GenAI?
Amplifying bias, misinformation, or privacy leaks at scale
88
What is watermarking in the context of generative AI outputs?
Embedding hidden signals in outputs to indicate they are AI-generated
89
Why is domain adaptation important for enterprise GenAI?
Enterprises need models aligned to their domain vocabulary, workflows, and policies
90
Which is generally the lowest-risk way to adapt a base LLM to a domain?
RAG plus careful prompt design and safety filters
91
What is 'toolformer'-style training?
Training models to decide when and how to call tools during generation
92
Why are latency and throughput trade-offs important in GenAI APIs?
They determine user experience and how many requests can be served per second
93
What is streaming output from an LLM?
Sending tokens incrementally as they are generated to reduce perceived latency
94
Which technique helps reduce context window usage in long conversations?
Summarizing earlier turns into shorter context
95
What is 'semantic caching'?
Caching based on embedding similarity so semantically similar queries reuse answers
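A minimal sketch of such a cache, with a toy word-count "embedding" standing in for a real embedding model (brute-force scan; real systems use a vector index):

```python
import math

class SemanticCache:
    """Cache answers keyed by query embedding; reuse when similarity is high enough."""

    def __init__(self, embed, threshold=0.95):
        self.embed = embed            # embedding function (a model in real use)
        self.threshold = threshold
        self.entries = []             # list of (embedding, answer)

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    def get(self, query):
        q = self.embed(query)
        for vec, answer in self.entries:
            if self._cosine(q, vec) >= self.threshold:
                return answer         # a semantically similar query was seen before
        return None                   # cache miss: caller invokes the model

    def put(self, query, answer):
        self.entries.append((self.embed(query), answer))

# Toy 2-d "embedding", just to exercise the cache.
def toy_embed(text):
    words = text.lower().split()
    return [sum(w.startswith("price") for w in words) + 1, len(words)]

cache = SemanticCache(toy_embed, threshold=0.99)
cache.put("what is the price", "The price is $10.")
hit = cache.get("what is the price?")   # near-duplicate query reuses the answer
```

Unlike exact-match caching, the trailing "?" still hits because the embeddings are nearly identical.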
96
Why do some systems separate 'orchestration layer' from 'model provider'?
To allow multi-model, multi-provider routing, observability, and safety in one place
97
What is a 'safety policy' in GenAI systems?
Set of rules describing allowed and disallowed content or behavior
98
Why is continuous evaluation important after deploying a GenAI feature?
User behavior, data, or provider models can change over time impacting quality and risk
99
What is a 'safety sandwich' pattern?
Wrapping LLM calls with pre- and post-safety filters or checkers
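The pattern reduces to wrapping one model call between two checks; a sketch with keyword filters standing in for real safety classifiers (all names hypothetical):

```python
def safety_sandwich(prompt, model, pre_check, post_check, refusal="Request declined."):
    """Wrap a model call with input and output safety filters (the 'sandwich')."""
    if not pre_check(prompt):        # bottom slice: block disallowed inputs
        return refusal
    output = model(prompt)
    if not post_check(output):       # top slice: block unsafe outputs
        return refusal
    return output

# Toy keyword checks standing in for real safety classifiers.
BLOCKED = {"secret", "exploit"}
pre = lambda text: not any(w in text.lower() for w in BLOCKED)
post = pre
echo_model = lambda p: f"Answer: {p}"

ok = safety_sandwich("How do transformers work?", echo_model, pre, post)
blocked = safety_sandwich("Tell me a secret exploit", echo_model, pre, post)
```

The post-check matters even with a clean prompt, since the model itself can produce disallowed content.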
100
Which of the following best describes a robust GenAI application architecture today?
Orchestration layer with RAG, tools/agents, safety, evals, logging, and multiple model backends