Database Administrator (DBA) Interview Preparation Guide

Prepared for: Scalong AI — Junior DBA Role

This document covers the theoretical concepts you need to understand before your DBA interview. It is written at a fresher/intern level and explains each topic in depth.


Table of Contents

  1. SQL Server Administration
  2. Performance Tuning
  3. High Availability (HA) Concepts
  4. Windows Server Basics
  5. PostgreSQL Basics
  6. SSIS and SSRS
  7. Production Environment Mindset
  8. Tools Used by DBAs

1. SQL Server Administration

What is SQL Server?

Microsoft SQL Server is a Relational Database Management System (RDBMS) developed by Microsoft. It stores, retrieves, and manages structured data using the SQL language. SQL Server is widely used in enterprise environments and is the core tool you will use as a DBA at Scalong AI.

Think of SQL Server as a highly organized warehouse. You are the warehouse manager (DBA) — responsible for making sure everything is stored correctly, accessible quickly, and never lost.


1.1 SQL Server Installation

When you install SQL Server, you are setting up the database engine on a Windows (or Linux) server. During installation, several key decisions are made:

Instance Types:

  • Default Instance — Only one can exist on a server. Accessed simply by the server name (e.g., MYSERVER).
  • Named Instance — Multiple can exist. Accessed by server name + instance name (e.g., MYSERVER\SQLEXPRESS).

SQL Server Editions:

  • Enterprise — Large production environments, full feature set.
  • Standard — Mid-sized businesses.
  • Developer — Free, full-featured, but licensed for development/testing only — not production.
  • Express — Free, limited features — good for learning.

Services Installed:

  • SQL Server Database Engine — The core engine that processes queries and manages data.
  • SQL Server Agent — A job scheduler that runs automated tasks (backups, maintenance).
  • SQL Server Browser — Helps clients find the correct SQL Server instance on the network.

Key Directories Created:

  • Data Directory — Where .mdf (primary data file) and .ndf (secondary data file) are stored.
  • Log Directory — Where .ldf (transaction log file) is stored.
  • Backup Directory — Default location for backup files.

1.2 Authentication Modes

SQL Server supports two authentication modes:

Windows Authentication Mode (Recommended)

  • Users log in using their Windows (Active Directory) credentials.
  • SQL Server trusts the Windows operating system to verify the identity.
  • Considered more secure because passwords are managed centrally by Windows.
  • Example: A domain user COMPANY\john.doe can access SQL Server without a separate SQL password.

Mixed Mode Authentication

  • Allows both Windows Authentication AND SQL Server Authentication.
  • SQL Server Authentication means a separate username and password is stored inside SQL Server itself (not Windows).
  • The built-in sa (system administrator) account uses SQL Server Authentication.
  • Useful when connecting from non-Windows applications or external systems.

Interview Tip: Always say Windows Authentication is preferred for security. Mixed Mode is used for compatibility reasons with older applications.


1.3 SQL Server Databases — System vs User

System Databases (Created automatically during installation):

  • master — Stores all system-level information: logins, linked servers, server configuration, and the list of all databases. If this is corrupted, SQL Server cannot start.
  • model — A template database. Every new database you create is based on a copy of model. If you add a table to model, every new database will have that table.
  • msdb — Used by SQL Server Agent to store job schedules, backup history, and alert information.
  • tempdb — A temporary workspace. Used for temporary tables, sorting operations, and intermediate query results. Recreated fresh every time SQL Server restarts.

User Databases: Databases you create for applications. For example, a company might have a database called SalesDB or CustomerDB.
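A quick way to see both kinds of database on an instance is the sys.databases catalog view (a minimal sketch — run it in SSMS or sqlcmd against any instance):

```sql
-- List every database on the instance with its recovery model and state.
-- The four system databases (master, tempdb, model, msdb) have the lowest IDs.
SELECT name, database_id, recovery_model_desc, state_desc
FROM sys.databases
ORDER BY database_id;
```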


1.4 Creating Databases and Understanding Database Files

A SQL Server database consists of at least two files:

MDF — Primary Data File

  • Contains the actual data and database objects (tables, indexes, procedures).
  • Every database has exactly one MDF file.

NDF — Secondary Data File

  • Optional additional data files. Used when you want to spread data across multiple disks for performance.
  • A database can have zero or many NDF files.

LDF — Log File

  • Contains the transaction log.
  • Records every change made to the database before it is written to the data file.
  • Critical for recovery and rollback operations.
  • Every database must have at least one LDF file.

Why does this matter? As a DBA, you must place these files on the correct disks. Typically:

  • Data files (MDF/NDF) go on fast storage (SSD).
  • Log files (LDF) go on a separate disk to avoid I/O contention.
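The file layout above can be sketched in T-SQL. The database name, drive letters, paths, and sizes here are illustrative assumptions, not recommendations for any specific server:

```sql
-- Hypothetical database with data and log on separate drives (D: and L: are assumptions).
CREATE DATABASE SalesDB
ON PRIMARY
    (NAME = SalesDB_data, FILENAME = 'D:\SQLData\SalesDB.mdf',
     SIZE = 512MB, FILEGROWTH = 256MB)      -- primary data file (MDF) on fast storage
LOG ON
    (NAME = SalesDB_log,  FILENAME = 'L:\SQLLogs\SalesDB.ldf',
     SIZE = 256MB, FILEGROWTH = 256MB);     -- transaction log file (LDF) on its own disk
```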

1.5 Creating Users and Roles

SQL Server has a two-layer security model:

Layer 1 — Login (Server Level)

  • A login is an account that can connect to the SQL Server instance.
  • Logins are created at the server level, not at the database level.
  • Types: Windows Login (domain account) or SQL Server Login (username + password inside SQL Server).

Layer 2 — User (Database Level)

  • A user is mapped to a login and exists inside a specific database.
  • A login must be mapped to a user in each database it needs to access.
  • Example: Login john_doe maps to User john_doe in SalesDB.

Roles — Grouping Permissions: Instead of giving permissions to each user individually, you add users to roles and grant permissions to the roles.

Fixed Server Roles (affect the whole server):

  • sysadmin — Full control over the entire server.
  • dbcreator — Can create and alter databases.
  • securityadmin — Can manage logins and server security.
  • serveradmin — Can configure server-wide settings.

Fixed Database Roles (affect one database):

  • db_owner — Full control over the database.
  • db_datareader — Can read all data (SELECT only).
  • db_datawriter — Can insert, update, and delete data.
  • db_ddladmin — Can create and modify database objects.

Interview Tip: A common interview question is “How do you give a user read-only access to a database?” Answer: Create a login, create a user mapped to that login, then add the user to the db_datareader role.
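That interview answer maps directly onto three statements. The login, password, and database names below are hypothetical:

```sql
-- 1. Create a server-level login (SQL Server Authentication here; a Windows login also works).
CREATE LOGIN report_reader WITH PASSWORD = 'UseAStrongPassword!1';

-- 2. Create a database user mapped to that login inside the target database.
USE SalesDB;
CREATE USER report_reader FOR LOGIN report_reader;

-- 3. Grant read-only access by adding the user to the db_datareader fixed role.
ALTER ROLE db_datareader ADD MEMBER report_reader;
```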


1.6 Backup and Restore

Backup and Restore is one of the most important DBA responsibilities. If data is lost due to hardware failure, human error, or disaster, backups are what save the company.

Types of SQL Server Backups:

Full Backup

  • A complete copy of the entire database at a specific point in time.
  • Largest backup size.
  • Takes the longest time.
  • Foundation for all other backup types.
  • Example: You take a full backup every Sunday night at 11:00 PM.

Differential Backup

  • Backs up only the data that has changed since the last full backup.
  • Smaller and faster than a full backup.
  • To restore: You need the last full backup + the last differential backup.
  • Example: You take differential backups every night Monday through Saturday.

Transaction Log Backup

  • Backs up the transaction log — the record of every change made to the database.
  • Allows point-in-time recovery (restore to an exact time, e.g., “restore to 2:37 PM before the accidental deletion”).
  • Only available when the database is in Full Recovery Model (see below).
  • Example: You take log backups every 15 minutes throughout the day.
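The three backup types correspond to three forms of the BACKUP statement (a minimal sketch — database name and paths are assumptions):

```sql
-- Full backup: complete copy of the database.
BACKUP DATABASE SalesDB TO DISK = 'E:\Backups\SalesDB_full.bak';

-- Differential backup: only the data changed since the last full backup.
BACKUP DATABASE SalesDB TO DISK = 'E:\Backups\SalesDB_diff.bak' WITH DIFFERENTIAL;

-- Transaction log backup: requires the Full (or Bulk-Logged) recovery model.
BACKUP LOG SalesDB TO DISK = 'E:\Backups\SalesDB_log.trn';
```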

The Recovery Models:

  • Simple — Log is automatically truncated; no log backups possible. Log backups: No. Point-in-time recovery: No.
  • Full — Log is kept until backed up. Log backups: Yes. Point-in-time recovery: Yes.
  • Bulk-Logged — Like Full, but bulk operations are minimally logged for performance. Log backups: Yes. Point-in-time recovery: Partial.

Interview Tip: Production databases should always be in Full Recovery Model so you can recover to any point in time.

Backup Strategy Example (3-2-1 Rule):

  • 3 copies of data (original + 2 backups)
  • 2 different storage types (disk + tape or cloud)
  • 1 copy offsite

Restoration Process:

  1. Restore the most recent Full Backup (using NORECOVERY mode — leaves the database in a restoring state to accept more backups).
  2. Apply the most recent Differential Backup (using NORECOVERY).
  3. Apply Transaction Log Backups in order from oldest to newest.
  4. On the final restore, use RECOVERY mode to bring the database online.
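The four steps above can be sketched as a restore sequence (file paths, database name, and the STOPAT time are illustrative assumptions):

```sql
-- 1. Restore the full backup, leaving the database in a restoring state.
RESTORE DATABASE SalesDB FROM DISK = 'E:\Backups\SalesDB_full.bak' WITH NORECOVERY;

-- 2. Apply the most recent differential, still with NORECOVERY.
RESTORE DATABASE SalesDB FROM DISK = 'E:\Backups\SalesDB_diff.bak' WITH NORECOVERY;

-- 3/4. Apply log backups in order; STOPAT gives point-in-time recovery,
--      and RECOVERY on the final restore brings the database online.
RESTORE LOG SalesDB FROM DISK = 'E:\Backups\SalesDB_log.trn'
    WITH STOPAT = '2024-06-01 14:37:00', RECOVERY;
```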

1.7 Monitoring Performance

As a DBA, you must constantly watch the health of your databases.

Key things to monitor:

  • CPU Usage — High CPU may indicate inefficient queries or too many concurrent connections.
  • Memory Usage — SQL Server uses RAM aggressively; low memory causes slow performance.
  • Disk I/O — Slow disk reads/writes cause query delays.
  • Blocking & Deadlocks — Queries waiting on each other can freeze applications.
  • Wait Statistics — Show what SQL Server is spending time waiting for.
  • Active Connections — Too many connections can exhaust server resources.

SQL Server Dynamic Management Views (DMVs): DMVs are special system views in SQL Server that provide real-time information about server health and performance.

Key DMVs to know:

  • sys.dm_exec_requests — Shows currently running queries.
  • sys.dm_os_wait_stats — Shows what SQL Server is waiting on (disk, CPU, locks).
  • sys.dm_exec_query_stats — Shows execution statistics for cached query plans.
  • sys.dm_db_index_usage_stats — Shows which indexes are being used (or not).
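A common triage query combines sys.dm_exec_requests with the SQL text of each request (a sketch of typical usage, not a tuned monitoring script):

```sql
-- Currently running requests, joined to their SQL text.
SELECT r.session_id, r.status, r.wait_type, r.blocking_session_id,
       r.cpu_time, r.total_elapsed_time, t.text AS query_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id > 50;  -- filters out most system sessions
```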

1.8 Managing Disk Space

Databases grow over time. As a DBA, you must manage disk space proactively.

Key concepts:

Database Auto Growth

  • SQL Server can automatically expand a data or log file when it runs out of space.
  • This is helpful but can cause performance problems if the auto-growth events happen frequently (each expansion causes a pause).
  • Best practice: Set files to an appropriate initial size and grow in large fixed increments (e.g., 256 MB), not percentages.
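That best practice is a one-line change per file (SalesDB and SalesDB_data are hypothetical database and logical file names):

```sql
-- Switch a data file from the default percentage growth to a fixed 256 MB increment.
ALTER DATABASE SalesDB
MODIFY FILE (NAME = SalesDB_data, FILEGROWTH = 256MB);
```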

Shrinking Files

  • You can shrink a database file to reclaim disk space.
  • However, shrinking is generally discouraged because it causes index fragmentation and the file will just grow again.
  • Only shrink after a one-time large deletion of data that won’t recur.

Log File Growth

  • The transaction log grows until it is backed up (in Full Recovery Model).
  • If you forget to take log backups, the log file will fill the disk — causing the entire database to stop accepting writes.
  • Monitoring log space usage is critical.
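One quick way to monitor log space is the classic DBCC command (runs on any database-engine version; newer versions also expose sys.dm_db_log_space_usage):

```sql
-- Per-database log file size and percentage of the log currently in use.
DBCC SQLPERF(LOGSPACE);
```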

1.9 Managing Indexes

Indexes are database structures that speed up data retrieval. Over time, indexes become fragmented (their internal structure becomes disorganized), which slows down queries.

Index Fragmentation:

  • Internal Fragmentation — Pages inside the index are not full.
  • External Fragmentation — Pages are not stored in logical order on disk.

Measuring Fragmentation: Use the DMV sys.dm_db_index_physical_stats to check fragmentation percentage.

Fixing Fragmentation:

  • Under 10% — No action needed.
  • 10% – 30% — REORGANIZE (online, less resource-intensive).
  • Over 30% — REBUILD (offline or online; fixes fragmentation completely).
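The check-then-fix workflow above looks like this in T-SQL (IX_Orders_Date and dbo.Orders are hypothetical names; the thresholds are the common guidelines, not hard rules):

```sql
-- Check fragmentation for every index in the current database.
SELECT OBJECT_NAME(ips.object_id) AS table_name, i.name AS index_name,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 10;

-- Then apply the thresholds:
ALTER INDEX IX_Orders_Date ON dbo.Orders REORGANIZE;  -- 10% - 30%
ALTER INDEX IX_Orders_Date ON dbo.Orders REBUILD;     -- over 30%
```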

Index Maintenance Jobs: DBAs schedule regular maintenance jobs (using SQL Server Agent) to rebuild/reorganize indexes automatically, usually on weekends or low-traffic periods.


2. Performance Tuning

Performance tuning is the process of making your database queries and server run faster and more efficiently. This is one of the most valued skills of a DBA.


2.1 Query Execution Plans

When you run a SQL query, SQL Server does not just immediately execute it. It first compiles the query and creates an execution plan — a detailed step-by-step map of exactly how it will retrieve the data.

Why Execution Plans Matter: The execution plan shows you exactly where time and resources are being spent. It can reveal that a query is doing a slow full table scan instead of using an index, or that it is sorting millions of rows unnecessarily.

Types of Execution Plans:

Estimated Execution Plan

  • Generated without running the query.
  • Based on SQL Server’s statistics about the data.
  • Useful for checking what SQL Server plans to do before running an expensive query.

Actual Execution Plan

  • Generated after running the query.
  • Shows real runtime statistics (actual rows processed, actual time taken).
  • More accurate and useful for diagnosing problems.

Reading an Execution Plan: Plans are read right to left, top to bottom. The most expensive operations are shown with the highest percentage cost.

Common Operations in Execution Plans:

  • Index Seek — Uses an index to find specific rows directly. Fast. ✅
  • Index Scan — Reads the entire index. Moderate.
  • Table Scan — Reads the entire table page by page. Slow. ❌
  • Key Lookup — Fetches extra columns from the main table after an index seek. Can be slow.
  • Hash Match — Used for joining large datasets without matching indexes. Memory-intensive.
  • Nested Loops — Join operator that is efficient for small datasets. Good for small data.
  • Sort — Sorts data in memory or on disk. Slow if spilled to disk.

Interview Tip: If you see a “Table Scan” on a large table in an execution plan, that is a red flag. It means no index exists for that query. Creating an appropriate index can dramatically improve performance.


2.2 Index Optimization

What is an Index? An index is a separate data structure that SQL Server maintains alongside a table. It allows SQL Server to find rows quickly without reading the entire table — similar to the index at the back of a textbook.

Types of Indexes:

Clustered Index

  • Defines the physical order of data in the table.
  • A table can have only ONE clustered index.
  • The actual table data is stored in the leaf level of the clustered index.
  • Usually created on the Primary Key column.
  • Think of it as a phone book sorted alphabetically by last name — the data itself is sorted.

Non-Clustered Index

  • A separate structure that contains the indexed column(s) and a pointer back to the actual data row.
  • A table can have many non-clustered indexes (up to 999 in SQL Server).
  • Think of it as the index at the back of a book — it points you to the page but is separate from the content.

Unique Index

  • Enforces uniqueness on the indexed column(s). No two rows can have the same value.
  • Automatically created when you define a UNIQUE constraint or PRIMARY KEY constraint.

Composite Index

  • An index on multiple columns.
  • The order of columns matters: the index is most effective when queries filter on the leading column first.
  • Example: An index on (LastName, FirstName) helps queries filtering by LastName or by LastName + FirstName, but NOT queries filtering only by FirstName.

Covering Index (Include Columns)

  • A non-clustered index that includes all columns needed by a query in the index itself.
  • This avoids the need for a “Key Lookup” back to the main table, making queries much faster.
  • In SQL Server: CREATE INDEX idx_name ON table(col1) INCLUDE (col2, col3)

When NOT to Over-Index:

  • Every index speeds up SELECT queries but slows down INSERT, UPDATE, DELETE operations because SQL Server must maintain all indexes when data changes.
  • Too many indexes waste disk space and memory.
  • DBAs regularly review “unused indexes” (via sys.dm_db_index_usage_stats) and remove ones that are never used.
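An unused-index review can be sketched with the DMV mentioned above. Note that these counters reset when SQL Server restarts, so judge them only after a representative period of uptime:

```sql
-- Non-clustered indexes with no reads since the last restart, but ongoing write cost.
SELECT OBJECT_NAME(i.object_id) AS table_name, i.name AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
  ON  s.object_id = i.object_id AND s.index_id = i.index_id
  AND s.database_id = DB_ID()
WHERE i.index_id > 1                                      -- skip heaps and clustered indexes
  AND ISNULL(s.user_seeks + s.user_scans + s.user_lookups, 0) = 0;
```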

2.3 Query Optimization

Beyond indexes, the way a query is written affects its performance significantly.

Common Query Performance Problems:

Non-SARGable Queries (SARGable = Search ARGument able)

  • A query is non-SARGable when it cannot use an index effectively because of how the WHERE clause is written.
  • Example of non-SARGable (bad): WHERE YEAR(OrderDate) = 2024 — SQL Server cannot use an index on OrderDate because a function is applied.
  • Example of SARGable (good): WHERE OrderDate >= '2024-01-01' AND OrderDate < '2025-01-01' — SQL Server can use an index.

Implicit Data Type Conversion

  • When comparing columns of different data types, SQL Server converts one type to match the other.
  • This conversion often prevents index use and slows queries.
  • Example: comparing a VARCHAR column with an INT value.

SELECT * (Selecting All Columns)

  • Fetching all columns when only a few are needed wastes memory and network bandwidth.
  • Always specify only the columns you need.

Missing Statistics

  • SQL Server uses statistics (histograms of data distribution) to estimate how many rows a query will return and choose the best execution plan.
  • Outdated statistics lead to bad execution plans.
  • DBAs schedule automatic statistics updates as part of maintenance.

Parameter Sniffing

  • SQL Server caches execution plans for stored procedures based on the first set of parameters used.
  • If the first execution was with a rare value, the cached plan may be poor for typical values.
  • Solution: Use OPTION (RECOMPILE) or OPTIMIZE FOR hints.

2.4 Detecting Blocking and Deadlocks

Blocking: Blocking occurs when one query holds a lock on a resource (a table or row) and another query must wait for that lock to be released before it can proceed.

  • Short blocking (milliseconds) is normal and acceptable.
  • Long blocking (seconds or minutes) indicates a problem — users see the application “hang.”
  • Blocking is caused by transactions that hold locks too long.

How to Detect Blocking:

  • sys.dm_exec_requests — Shows blocking_session_id if a session is being blocked.
  • SQL Server Activity Monitor — Visual tool in SSMS showing blocking chains.
  • Extended Events — Capture blocking events automatically.

How to Resolve Blocking:

  • Identify and kill (using KILL <session_id>) a blocking session as an emergency measure.
  • Long-term fix: Optimize the blocking query or reduce transaction duration.
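Both steps can be sketched in T-SQL (the session ID 73 is hypothetical — always confirm what the blocker is running before killing it):

```sql
-- Who is blocked, and by which session.
SELECT session_id, blocking_session_id, wait_type, wait_time
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;

-- Emergency measure only: terminate the blocking session (its open transaction rolls back).
KILL 73;
```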

Deadlocks: A deadlock is a specific situation where two (or more) sessions are each waiting for the other to release a lock — neither can proceed.

Example:

  • Session A holds a lock on Table 1 and wants Table 2.
  • Session B holds a lock on Table 2 and wants Table 1.
  • Both are stuck forever.

SQL Server automatically detects deadlocks and kills one of the sessions (the “deadlock victim”), allowing the other to proceed. The killed session receives Error 1205.

How to Detect Deadlocks:

  • Trace Flag 1222 — Writes deadlock details to the SQL Server error log.
  • Extended Events — The built-in system_health session captures deadlock graphs automatically.
  • Deadlock Graph — Visual representation showing exactly which queries and resources were involved.

Preventing Deadlocks:

  • Access tables in the same order in all transactions.
  • Keep transactions short — do not hold locks longer than necessary.
  • Use appropriate isolation levels.
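Because the deadlock victim receives Error 1205, applications typically wrap transactions in retry logic. A minimal sketch (Table1/Table2 are the hypothetical tables from the example above, accessed in a fixed order):

```sql
BEGIN TRY
    BEGIN TRANSACTION;
    UPDATE dbo.Table1 SET Col1 = 1 WHERE Id = 1;  -- both transactions touch the tables
    UPDATE dbo.Table2 SET Col1 = 1 WHERE Id = 1;  -- in the same order, reducing deadlocks
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
    IF ERROR_NUMBER() = 1205
        PRINT 'Deadlock victim - safe to retry the transaction';
    ELSE
        THROW;  -- re-raise anything that is not a deadlock
END CATCH;
```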

2.5 Monitoring CPU and Memory

CPU:

  • SQL Server is a CPU-intensive application.
  • High CPU can be caused by: inefficient queries, missing indexes, too many compilations/recompilations, or insufficient parallelism settings.
  • Check sys.dm_os_ring_buffers and sys.dm_exec_query_stats for high-CPU queries.

Memory:

  • SQL Server uses as much RAM as the operating system allows (by default).
  • You must configure the Max Server Memory setting to leave some RAM for Windows and other processes; otherwise the operating system itself can be starved of memory and become unstable.
  • SQL Server uses memory for the Buffer Pool — caching data pages in RAM so it does not have to read from disk repeatedly.
  • A high Page Life Expectancy (PLE) (above 300 seconds as a baseline) means data is staying in memory — good.
  • A low PLE means data is being constantly evicted from memory — SQL Server is reading from disk too often — bad.
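PLE can be read directly from the performance-counter DMV (a minimal sketch; on NUMA hardware there is also a per-node counter):

```sql
-- Current Page Life Expectancy, in seconds.
SELECT object_name, counter_name, cntr_value AS ple_seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
  AND object_name LIKE '%Buffer Manager%';
```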

2.6 SQL Server Profiler

SQL Server Profiler is a graphical tool used to trace and capture SQL Server activity in real time or to a file.

What Profiler Can Capture:

  • All queries being executed against the server
  • Stored procedure calls
  • Login and logout events
  • Deadlock events
  • Long-running queries (queries exceeding a duration threshold)

When to Use It:

  • Identifying which queries are consuming the most resources
  • Capturing a workload for replay or analysis
  • Troubleshooting intermittent slow performance

Important Note: SQL Server Profiler has high overhead and can slow down the server. In production, Extended Events is the preferred, lighter-weight alternative for tracing. However, Profiler is easier to learn and still valuable for development/testing environments.


3. High Availability (HA) Concepts

High Availability means designing your database infrastructure so that the database remains accessible even if one server or component fails. Downtime can cost companies millions — HA is critical for production systems.


3.1 Replication

What is Replication? Replication is the process of copying and distributing data from one SQL Server (Publisher) to one or more other SQL Servers (Subscribers). The goal is to keep data synchronized across multiple locations.

Key Roles in Replication:

  • Publisher — The source database that has the data you want to replicate.
  • Distributor — An intermediary server (or the same as the Publisher) that stores the replicated changes temporarily before sending to Subscribers.
  • Subscriber — The destination server(s) that receive the replicated data.
  • Publication — A collection of articles (tables, stored procedures, etc.) being replicated.
  • Article — An individual object (e.g., one table) included in a publication.

Types of Replication:

1. Snapshot Replication

  • Takes a complete “snapshot” (full copy) of the published data at a scheduled time and delivers it to the Subscriber.
  • The Subscriber replaces all its data with the fresh snapshot.
  • Best for: Data that does not change frequently and where subscribers can tolerate having slightly old data.
  • Examples: Reference tables, lookup data, product catalogs.
  • Downside: High bandwidth usage for large datasets. Subscribers have stale data between snapshots.

2. Transactional Replication

  • Starts with an initial snapshot, then continuously delivers individual data changes (INSERT, UPDATE, DELETE transactions) from Publisher to Subscriber.
  • Changes are captured from the Publisher’s transaction log and sent to Subscribers with very low latency (near real-time).
  • Best for: Highly active databases where subscribers need up-to-date data.
  • Examples: Reporting servers that need current data, load balancing reads to a secondary server.
  • Downside: One-directional by default (Publisher → Subscriber). The Subscriber is usually read-only.

3. Merge Replication

  • Both the Publisher and Subscriber can make changes to the data independently.
  • Changes from both sides are periodically synchronized and merged together.
  • Conflict resolution rules handle cases where the same row was changed differently on both sides.
  • Best for: Mobile applications, disconnected users who sync periodically.
  • Examples: Field sales reps with laptops that sync to headquarters when connected.
  • Downside: Conflict resolution is complex. Performance overhead.

3.2 Log Shipping

What is Log Shipping? Log Shipping is a method of maintaining a warm standby database by automatically backing up the transaction log from a primary database and restoring it on one or more secondary servers.

How It Works (Step by Step):

  1. Backup Job — A SQL Server Agent job on the Primary server takes transaction log backups at a defined interval (e.g., every 15 minutes).
  2. Copy Job — A SQL Server Agent job on the Secondary server copies the log backup files from the Primary server’s shared folder to the Secondary server.
  3. Restore Job — A SQL Server Agent job on the Secondary server restores the copied log backups onto the secondary database.

The Result: The Secondary database is always a few minutes behind the Primary (the delay depends on your backup interval). It is kept in a restoring or standby state.

Modes for the Secondary Database:

  • No Recovery Mode — The secondary database is in a restoring state. No one can read it. Used when you want the fastest restore operations.
  • Standby Mode — The secondary database is readable between log restore operations. Users can run read-only queries against it (useful for reporting).

Failover Process: If the Primary server fails:

  1. Restore any remaining log backups on the Secondary (if available).
  2. Bring the Secondary database online (take it out of restoring state).
  3. Redirect application connections to the Secondary server.

Limitations of Log Shipping:

  • No automatic failover — a DBA must manually perform the failover.
  • Secondary is always slightly behind the Primary (data loss equal to the last backup interval).
  • Not a true HA solution; more of a Disaster Recovery (DR) solution.

Monitoring Log Shipping: SQL Server includes the Log Shipping Monitor (viewable in SSMS) that shows whether jobs are running on schedule and alerts you if the secondary is falling too far behind.


3.3 Clustering (Windows Server Failover Clustering)

What is Clustering? A SQL Server Failover Cluster is a group of two or more Windows servers (nodes) that work together to host a single SQL Server instance. If one node fails, another node automatically takes over, keeping the database available.

Key Components:

Nodes The individual servers in the cluster. Typically 2 nodes (Active/Passive) but can be more.

Shared Storage All nodes in the cluster use shared storage (typically a SAN — Storage Area Network). The actual database files (MDF, LDF) live on this shared storage. Only one node accesses the storage at a time.

Windows Server Failover Cluster (WSFC) The underlying Windows technology that enables the cluster. It monitors node health and manages failover.

Quorum A voting mechanism to prevent “split-brain” (two nodes both thinking they are the active node). The cluster requires a majority of votes to continue operating. Votes come from nodes and optionally a “witness” (a disk or file share).

Virtual Network Name (VNN) Applications connect to the cluster using a virtual server name and IP address — not the individual node names. This virtual name always points to the currently active node, so applications automatically connect to the right server without reconfiguration after a failover.

Active/Passive vs Active/Active:

  • Active/Passive — One node runs SQL Server (Active), the other is on standby (Passive). The passive node uses no resources but is available for failover. Simple but the passive node is “wasted.”
  • Active/Active — Both nodes run different SQL Server instances simultaneously. More efficient use of hardware. Failover means one node temporarily hosts both instances (must have enough resources).

Failover Process:

  1. WSFC detects that the active node has failed (missed heartbeat).
  2. WSFC selects another node to take ownership of the SQL Server resources.
  3. The new node mounts the shared storage and starts the SQL Server service.
  4. The virtual network name and IP address now point to the new node.
  5. Client applications reconnect (most modern drivers handle this automatically).

Typical failover time: 20–60 seconds.

Limitation: Clustering uses shared storage, meaning if the storage itself fails, all nodes are affected. Clustering protects against server failure but not storage failure.


3.4 Always On Availability Groups (Modern HA)

Note: This is not in your roadmap but is worth knowing as a bonus concept.

Always On Availability Groups (AG) is SQL Server’s modern high availability solution (introduced in SQL Server 2012). It improves on clustering by replicating the database itself to secondary replicas over the network — no shared storage required.

Key Advantages over Clustering:

  • Secondary replicas can be in different data centers (disaster recovery).
  • Secondary replicas can serve read-only queries (offload reporting).
  • Automatic failover available with synchronous replication.
  • Protects against both server and storage failure.

This is the industry standard for enterprise SQL Server HA today.


4. Windows Server Basics

SQL Server runs on Windows Server operating systems. As a DBA, you must be comfortable navigating and managing Windows Server environments.


4.1 Windows Services

A Windows Service is a background process that runs without user interaction, even when no one is logged into the server.

SQL Server itself runs as a collection of Windows Services:

  • SQL Server (MSSQLSERVER) — The database engine service. If this stops, no one can access the database.
  • SQL Server Agent (SQLSERVERAGENT) — Runs scheduled jobs, alerts, and maintenance tasks.
  • SQL Server Browser — Helps clients discover SQL Server instances on the network.

Managing Services:

  • Services.msc — The Windows Services management console. You can start, stop, pause, and configure services.
  • SQL Server Configuration Manager — The preferred tool for SQL Server services. Also manages network protocols and SQL Server aliases.

Service Accounts: Each service runs under a Windows account (service account). The SQL Server service account needs:

  • Access to the database files on disk.
  • Network access if connecting to network shares.
  • Sufficient permissions for backup/restore operations.

Best Practice: Use dedicated, low-privilege service accounts — not the built-in LocalSystem or a domain administrator account.


4.2 Disk Management

Disk management is critical for a DBA because SQL Server performance is heavily tied to disk speed and configuration.

Key Disk Management Concepts:

Disk Types:

  • HDD (Hard Disk Drive) — Spinning magnetic disk. Slower, cheaper, higher capacity. Used for less critical data or cold storage.
  • SSD (Solid State Drive) — No moving parts. Much faster random read/write. Used for production database files.
  • NVMe — Even faster than SATA SSD. Used for highest-performance database environments.

RAID (Redundant Array of Independent Disks): RAID combines multiple physical disks for performance and/or redundancy.

  • RAID 0 — Striping: data split across disks. Fast but NO redundancy; if one disk fails, all data is lost. Not recommended for databases.
  • RAID 1 — Mirroring: exact copy on two disks. Full redundancy but only 50% usable space. Often used for transaction log files.
  • RAID 5 — Striping with parity: data plus error checking across 3+ disks. Good balance of performance and redundancy. General data files.
  • RAID 10 (1+0) — Mirroring + striping. Best performance and redundancy; requires 4+ disks. Recommended for production SQL Server data files.

Disk Partitioning and Formatting:

  • Windows uses NTFS file system for SQL Server data.
  • For best SQL Server performance, format data volumes with a 64KB allocation unit size (instead of the default 4KB). This aligns with SQL Server’s 64KB I/O operations.

Volume Letters and Mount Points: Large environments use mount points instead of drive letters (there are only 26 letters). A mount point is a folder path that represents a separate disk volume.


4.3 Task Scheduler

The Windows Task Scheduler is a built-in tool to run programs, scripts, or commands automatically at specified times or when certain events occur.

DBAs use Task Scheduler for:

  • Running PowerShell scripts to check disk space and email alerts.
  • Automating file cleanup (deleting old backup files).
  • Kicking off batch processes outside of SQL Server Agent.

Note: For SQL Server-specific tasks (backups, index maintenance, statistics updates), SQL Server Agent is preferred over Task Scheduler because it integrates directly with SQL Server and provides better history and alerting.

Key Concepts in Task Scheduler:

  • Trigger — When the task runs (schedule, system event, login, etc.).
  • Action — What the task does (run a program, send an email, show a message).
  • Conditions — Additional conditions that must be true (only run if the computer is on AC power, for example).
  • Settings — Behavior if the task misses its scheduled time, how to handle running instances, etc.
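A disk-space check of the kind a DBA would schedule through Task Scheduler can be sketched in Python (the path and the 20% threshold are illustrative, not prescriptive):

```python
# Sketch of a scheduled disk-space check. In a real task the alert would
# trigger an email rather than a print statement.
import shutil

def check_free_space(path: str, min_free_pct: float = 20.0) -> bool:
    """Return True if the volume holding `path` has enough free space."""
    usage = shutil.disk_usage(path)
    free_pct = usage.free / usage.total * 100
    if free_pct < min_free_pct:
        print(f"ALERT: {path} at {free_pct:.1f}% free (threshold {min_free_pct}%)")
        return False
    return True

if __name__ == "__main__":
    check_free_space("/")   # on Windows, a drive root such as D:\ instead
```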

4.4 Event Viewer

The Windows Event Viewer is a log viewing tool that records significant events from the operating system and applications.

Three Main Log Types:

Application Log — Logs events from applications and programs. SQL Server logs its startup, shutdown, and error messages here (in addition to its own error log).

System Log — Logs events from Windows system components: driver failures, service starts/stops, disk errors.

Security Log — Logs security-related events: successful and failed login attempts (Windows authentication). Requires audit policy to be configured.

SQL Server Error Log: SQL Server also maintains its own error log, separate from Event Viewer. It lives in the instance's Log directory (the current file is named ERRORLOG). View it in SSMS under Management > SQL Server Logs or with the system stored procedure sp_readerrorlog.

Why DBAs Use Event Viewer:

  • Diagnose SQL Server startup failures (if SQL Server won’t start, check Application log for error codes).
  • Investigate disk errors that could be causing SQL Server I/O errors.
  • Check for failed Windows services or out-of-memory (OOM) events.

4.5 Server Patching

What is Server Patching? Patching means applying updates (security fixes, bug fixes, feature enhancements) to Windows Server and SQL Server software.

Types of SQL Server Updates:

  • Service Packs (SP) — Major cumulative updates, released infrequently. Include all previous fixes. (No longer released: starting with SQL Server 2017, Microsoft ships only CUs and GDRs.)
  • Cumulative Updates (CU) — Released every few weeks for active SQL Server versions. Contain a collection of bug fixes since the last CU or SP.
  • General Distribution Releases (GDR) — Smaller, critical security-only patches. Applied between CUs when urgent.
  • Hotfix — A fix for a specific, critical bug — usually applied urgently.

Why Patching Requires Care in Production:

  • Patches often require a restart of SQL Server or Windows, causing downtime.
  • A patch may fix one bug but introduce another (regression).
  • Patching must be tested in a non-production environment first.

Patching Process (Best Practice):

  1. Review the patch release notes to understand what it fixes and what it changes.
  2. Apply the patch to a Development environment first.
  3. Test that applications work correctly.
  4. Apply to a QA/Staging environment.
  5. Schedule a maintenance window for Production patching (typically off-peak hours).
  6. Notify users/stakeholders of planned downtime.
  7. Take a fresh backup before applying the patch.
  8. Apply patch and verify SQL Server starts successfully.
  9. Verify application functionality after patching.
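The promote-through-environments discipline in steps 2–5 can be sketched as a simple gate check (environment names and the patch ID are illustrative):

```python
# A patch may only move to the next environment after passing tests in the
# current one; Production is reached only when every lower tier passed.
PIPELINE = ["Development", "QA/Staging", "Production"]

def promote(patch: str, test_results: dict) -> str:
    """Return the highest environment the patch may currently be applied to."""
    allowed = []
    for env in PIPELINE:
        allowed.append(env)
        if env != "Production" and not test_results.get(env, False):
            break  # failed (or not yet tested) here: promotion stops
    return allowed[-1]

# Passed Dev, but QA testing failed: the patch stays out of Production.
print(promote("KB500123", {"Development": True, "QA/Staging": False}))
```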

5. PostgreSQL Basics

PostgreSQL (often called “Postgres”) is a powerful, open-source RDBMS. While SQL Server is Microsoft’s commercial product, PostgreSQL is free and widely used in modern cloud applications. Many companies run both.


5.1 PostgreSQL vs SQL Server — Key Differences

  • License — SQL Server: commercial (Microsoft). PostgreSQL: open source (free).
  • Operating System — SQL Server: Windows (primarily), Linux. PostgreSQL: Linux (primarily), Windows, macOS.
  • Management Tool — SQL Server: SSMS. PostgreSQL: pgAdmin, psql (command line).
  • Default Port — SQL Server: 1433. PostgreSQL: 5432.
  • Authentication — SQL Server: Windows Auth or SQL Auth. PostgreSQL: password-based (md5, scram-sha-256).
  • Default Schema — SQL Server: dbo. PostgreSQL: public.
  • Identity Columns — SQL Server: IDENTITY. PostgreSQL: SERIAL or GENERATED AS IDENTITY.
  • Quoting — Both use single quotes for string values; SQL Server uses square brackets [ ] for identifiers, PostgreSQL uses double quotes " ".

5.2 PostgreSQL Architecture

Cluster: In PostgreSQL, a “cluster” refers to a single PostgreSQL server instance that manages a collection of databases — not a HA cluster like in Windows. This is confusing terminology for people coming from SQL Server.

Databases: Each PostgreSQL cluster can contain multiple databases. Unlike SQL Server, objects in different databases cannot be directly joined (cross-database access requires extensions such as postgres_fdw or dblink).

Schemas: Within a database, schemas are namespaces that organize tables and other objects. The default schema is called public. Similar to SQL Server’s dbo schema.

Processes: PostgreSQL uses a multi-process model (one process per client connection), unlike SQL Server’s multi-threaded model. This affects memory usage at high connection counts.


5.3 PostgreSQL Database Creation and Management

Creating a Database:

CREATE DATABASE mydb;

You can also specify options like encoding, owner, and collation.

Connecting to a Database: In psql (command-line tool): \c mydb

Basic Management Commands in psql:

Command Description
\l List all databases
\c dbname Connect to a database
\dt List all tables in current database
\d tablename Describe a table (show columns, types)
\du List all users/roles
\q Quit psql

5.4 User Management in PostgreSQL

In PostgreSQL, there is no distinction between a “login” and a “user” — they are both called roles.

Creating a Role:

CREATE ROLE john LOGIN PASSWORD 'securepassword';
  • LOGIN attribute allows the role to connect to the database.
  • Without LOGIN, the role is a group role (like a SQL Server role without login).

Granting Superuser:

CREATE ROLE admin SUPERUSER LOGIN PASSWORD 'password';

Equivalent to sysadmin in SQL Server.

Granting Permissions on a Database:

GRANT CONNECT ON DATABASE mydb TO john;
GRANT USAGE ON SCHEMA public TO john;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO john;

Role Inheritance: One role can inherit permissions from another. This is how group-based access works in PostgreSQL.


5.5 Backup and Restore in PostgreSQL

pg_dump — Database Backup Tool: pg_dump is the standard command-line utility for backing up a single PostgreSQL database.

pg_dump -U postgres -d mydb -F c -f mydb_backup.dump
  • -U postgres — Connect as the postgres superuser.
  • -d mydb — Database to back up.
  • -F c — Custom format (compressed, supports parallel restore).
  • -f mydb_backup.dump — Output file.

pg_dumpall — Full Server Backup: Backs up all databases plus global objects (roles, tablespaces).

pg_dumpall -U postgres -f full_backup.sql

pg_restore — Restore from Custom Format Backup:

pg_restore -U postgres -d mydb -F c mydb_backup.dump

psql — Restore from Plain SQL Backup:

psql -U postgres -d mydb -f backup.sql

Continuous Archiving (WAL Archiving): PostgreSQL equivalent of SQL Server transaction log backups. WAL (Write-Ahead Log) files are archived continuously and used for Point-in-Time Recovery (PITR). This is PostgreSQL’s equivalent of Full Recovery Model in SQL Server.
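Continuous archiving is enabled through a few postgresql.conf settings. A minimal sketch, assuming a local archive directory (the path and copy command are illustrative):

```ini
wal_level = replica                        # enough WAL detail for archiving/replication
archive_mode = on                          # turn on continuous archiving
archive_command = 'cp %p /backup/wal/%f'   # %p = WAL file path, %f = WAL file name
```

In production, the archive_command usually copies to network or cloud storage, and it must report success only once the file is safely stored; PostgreSQL will not recycle a WAL segment until the command succeeds.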


5.6 PostgreSQL Configuration Files

postgresql.conf: The main configuration file. Controls memory, connections, WAL settings, logging, and more.

Key parameters:

  • max_connections — Maximum number of client connections.
  • shared_buffers — Memory allocated for PostgreSQL buffer pool (like SQL Server buffer pool). Typically set to 25% of total RAM.
  • work_mem — Memory for each sorting/hashing operation per query.
  • maintenance_work_mem — Memory for maintenance operations (VACUUM, CREATE INDEX).
  • wal_level — Controls what information is written to WAL (important for replication and archiving).
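These parameters interact: work_mem is allocated per sort/hash operation, not per connection, so a back-of-envelope worst-case estimate looks like this (the server sizes are hypothetical):

```python
# Rough PostgreSQL memory estimate. Real usage varies, since one query can
# run several sort/hash operations at once; this is a worst-case-style sketch.

def rough_memory_mb(shared_buffers_mb: int, max_connections: int,
                    work_mem_mb: int, ops_per_query: int = 1) -> int:
    return shared_buffers_mb + max_connections * work_mem_mb * ops_per_query

# 16 GB server: shared_buffers = 4 GB (25%), 200 connections, work_mem = 8 MB
print(rough_memory_mb(4096, 200, 8))   # 5696 MB if every connection sorts at once
```

This is why raising work_mem globally on a busy server is risky; it multiplies across every concurrent operation.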

pg_hba.conf (Host-Based Authentication): Controls who can connect to PostgreSQL and how they authenticate. Every connection attempt is checked against this file.

Example entries:

# TYPE  DATABASE  USER  ADDRESS     METHOD
local   all       all               peer
host    all       all   127.0.0.1/32  md5
host    mydb      john  10.0.0.0/8   scram-sha-256

This is a critical security file — only the PostgreSQL administrator should be able to modify it.


6. SSIS and SSRS

6.1 SSIS — SQL Server Integration Services

What is SSIS? SSIS is Microsoft’s platform for ETL (Extract, Transform, Load) — the process of moving and transforming data between different systems.

Why DBAs and Data Engineers Use SSIS:

  • Import data from Excel files, CSV files, or external databases into SQL Server.
  • Transform data (clean it, reformat it, apply business rules) during the move.
  • Load processed data into a data warehouse or reporting database.
  • Automate recurring data movement tasks.

Core Concept — The ETL Process:

  • Extract — Pull data from a source (a file, a database, a web service, etc.).
  • Transform — Clean, filter, aggregate, convert, or enrich the data.
  • Load — Write the final data to a destination.
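The Extract → Transform → Load steps above can be sketched in plain Python (SSIS does this graphically; the sample data and business rule here are illustrative):

```python
# A miniature ETL pass: read rows from a CSV source, apply a rule and a
# derived column, and collect the result for loading.
import csv
import io

raw = "id,amount\n1,100\n2,-5\n3,250\n"   # Extract: stand-in for a CSV source

def transform(rows):
    """Transform: enforce a business rule, convert types, derive a column."""
    for row in rows:
        amount = int(row["amount"])
        if amount < 0:                    # rule: reject negative amounts
            continue
        yield {"id": int(row["id"]),
               "amount": amount,
               "taxed": round(amount * 1.1, 2)}   # derived column

warehouse = list(transform(csv.DictReader(io.StringIO(raw))))   # Load
print(warehouse)   # row 2 was filtered out by the business rule
```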

SSIS Components:

Package: The top-level unit of work in SSIS. A package contains the workflow of tasks and data flows. Packages are saved as .dtsx files.

Control Flow: The orchestration layer of a package. Defines the order of execution.

  • Tasks — Individual units of work: Execute SQL Task (run a SQL query), File System Task (copy/move files), Send Mail Task (send email notifications), Script Task (run custom .NET code).
  • Containers — Group tasks together: Sequence Container (run tasks in order), For Loop Container (repeat tasks), Foreach Loop Container (iterate over files or rows).
  • Precedence Constraints — Lines connecting tasks that define execution order and conditions (run next task only if the previous succeeded, failed, or completed).

Data Flow: The component within a package that handles the actual movement and transformation of data.

  • Sources — Where data comes from: OLE DB Source (SQL Server), Flat File Source (CSV/text), Excel Source.
  • Transformations — Operations applied to data in flight: Data Conversion (change data types), Derived Column (calculate new values), Lookup (join to a reference table), Aggregate (sum/count/average), Conditional Split (route rows based on conditions), Sort.
  • Destinations — Where data is loaded: OLE DB Destination (SQL Server), Flat File Destination, Excel Destination.

SSIS Package Execution:

  • Packages can be run manually from SSDT (SQL Server Data Tools) during development.
  • In production, packages are deployed to the SSIS Catalog (a database called SSISDB) and executed via SQL Server Agent jobs on a schedule.

SSIS Catalog (SSISDB): A centralized repository in SQL Server for deploying, managing, monitoring, and executing SSIS packages. Provides detailed execution logs and reports.

Common SSIS Use Cases at a Company:

  • Nightly load of sales transactions from a point-of-sale system into a SQL Server data warehouse.
  • Weekly import of vendor product pricing from Excel files.
  • Real-time replication of customer records from a CRM system into an analytics database.

6.2 SSRS — SQL Server Reporting Services

What is SSRS? SSRS is Microsoft’s platform for creating, delivering, and managing reports and dashboards from SQL Server and other data sources.

Why Companies Use SSRS:

  • Deliver formatted, professional business reports (sales reports, financial summaries, operational dashboards) to business users.
  • Reports can be scheduled to run automatically and emailed as PDF or Excel attachments.
  • Users can access reports through a web portal or directly in applications.

Key Concepts:

Report Server: The server-side component that hosts, processes, and delivers reports. It can run in Native Mode (standalone web portal) or SharePoint Integrated Mode (embedded in SharePoint).

Report Definition (.rdl file): A report is defined in an XML-based file with a .rdl extension (Report Definition Language). This file contains the data source connections, dataset queries, layout, and formatting.

Report Designer (in SSDT/Visual Studio): The tool used to create and design reports. Provides a drag-and-drop interface for placing report items (tables, charts, images) and writing expressions.

Report Builder: A lightweight, standalone tool for business users (not just developers) to create their own reports without needing Visual Studio.

Data Sources: Reports connect to data sources to retrieve data. SSRS supports SQL Server, Oracle, OLE DB, ODBC, XML, SharePoint lists, and more.

Datasets: Each report contains one or more datasets — the result of a query (usually SQL) that brings back data for the report. A dataset is linked to a data source.

Report Items:

  • Table — Display data in rows and columns (classic report grid).
  • Matrix — A cross-tabulation (pivot table) report.
  • Chart — Bar, line, pie, and other charts.
  • Gauge — Speedometer-style KPI visualizations.
  • Map — Geographic data visualizations.
  • Text Box — Static text, calculated values, report title.
  • Image — Company logos, icons.
  • Subreport — A report embedded inside another report.

Parameters: Reports can have parameters that users enter to filter the report data (e.g., choose a date range, select a department). Parameters can be visible dropdowns or hidden.

Subscriptions:

  • Standard Subscriptions — Deliver a report automatically on a schedule (e.g., every Monday at 7 AM) to email addresses or a file share.
  • Data-Driven Subscriptions — Deliver personalized reports to a list of recipients stored in a database (e.g., send each manager only their department’s report).

Report Delivery Formats: PDF, Excel, Word, CSV, XML, TIFF (image), and web-based rendering.

SSRS Web Portal: A web-based interface where users browse, run, and subscribe to reports. The DBA/report administrator manages security and folder permissions here.


7. Production Environment Mindset

This is perhaps the most important section for a fresher entering a DBA role. Technical skills are necessary, but understanding the professional responsibility of working with production systems is equally critical.


7.1 Understanding Downtime Impact

A production environment is the live system that real users and customers depend on. This is not a test environment — mistakes here have real consequences.

What Downtime Means:

  • Users cannot access the application.
  • Transactions cannot be processed (financial, orders, etc.).
  • Company reputation is damaged.
  • Revenue is lost — in some industries, one hour of downtime can cost tens of thousands to millions of dollars.
  • SLA (Service Level Agreements) may be breached — the company may have contractually agreed to 99.9% uptime.

99.9% uptime = only about 8.76 hours of downtime allowed per year.
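That figure is simple arithmetic on the yearly hour count, and each extra "nine" shrinks the budget tenfold:

```python
# Downtime budget implied by an availability SLA, per (non-leap) year.
def downtime_hours_per_year(availability_pct: float) -> float:
    return (100 - availability_pct) / 100 * 365 * 24

print(round(downtime_hours_per_year(99.9), 2))    # 8.76 hours ("three nines")
print(round(downtime_hours_per_year(99.99), 2))   # 0.88 hours, roughly 53 minutes
```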

Types of Downtime:

  • Planned Downtime — Scheduled maintenance windows (patching, migrations). Stakeholders are informed in advance.
  • Unplanned Downtime — Unexpected failures (hardware crash, software bug, human error). Must be resolved as fast as possible.

The DBA’s Responsibility:

  • Never make unapproved changes in production without a change management ticket.
  • Always test in lower environments (Dev → QA → Staging) before production.
  • Always have a rollback plan — what do you do if the change goes wrong?
  • Communicate proactively — if you suspect a problem, tell someone early.

7.2 Backup Strategy

A production backup strategy answers these questions:

How often should we back up?

  • RTO (Recovery Time Objective) — How long can the business afford to be down? If RTO = 1 hour, you need to be able to restore within 1 hour.
  • RPO (Recovery Point Objective) — How much data can the business afford to lose? If RPO = 15 minutes, you need log backups every 15 minutes.

A Typical Production Backup Strategy:

  • Full backup — weekly (Sunday midnight); retained 4 weeks.
  • Differential backup — daily (Monday–Saturday midnight); retained 1 week.
  • Transaction log backup — every 15 minutes; retained 3 days.
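With this schedule, a restore applies the most recent full backup, then the most recent differential, then every log backup taken since that differential. A small Python sketch of choosing that chain (timestamps in hours; logs simplified to hourly for brevity):

```python
# Pick the restore chain for a failure time from lists of backup timestamps.
def restore_chain(fulls, diffs, logs, failure_time):
    full = max(t for t in fulls if t <= failure_time)
    diff_candidates = [t for t in diffs if full <= t <= failure_time]
    diff = max(diff_candidates) if diff_candidates else None
    start = diff if diff is not None else full
    chain_logs = sorted(t for t in logs if start <= t <= failure_time)
    return full, diff, chain_logs

# Full at hour 0, differentials at 24 and 48, hourly logs, failure at 50.5:
full, diff, chain = restore_chain([0], [24, 48], list(range(60)), 50.5)
print(full, diff, chain)   # full=0, diff=48, then logs 48, 49, 50
```

Anything committed after the last log backup (here, the final half hour) is lost, which is exactly what the RPO quantifies.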

Backup Verification: A backup that has never been tested for restore is not a real backup — it may be corrupt and fail when you need it most.

  • Regularly restore backups to a test server and verify data integrity.
  • Use RESTORE VERIFYONLY to check backup integrity without restoring.

Backup Storage:

  • Keep backups on separate storage from the database server.
  • If the server’s disk fails and the backups are on the same disk, both are lost.
  • Best practice: backup to local disk first (fast), then copy to network share or cloud storage (offsite).

7.3 Disaster Recovery (DR)

Disaster Recovery (DR) is the plan and process for restoring normal database operations after a major disaster — fire, flood, complete data center failure, ransomware attack.

Key DR Concepts:

DR Site: A secondary data center (physical or cloud) where database copies are maintained. If the primary site is destroyed, operations can be switched to the DR site.

RPO and RTO in DR Context:

  • If DR backups are replicated to the secondary site once per night, your RPO is up to 24 hours of data loss.
  • If a data center fire occurs, your RTO may be several hours to bring systems back online at the DR site.

DR vs HA:

  • HA protects against server/component failures — fast automatic recovery (seconds to minutes).
  • DR protects against site-wide/catastrophic failures — manual recovery (minutes to hours).

Both are needed in a production environment.

DR Drill: Companies periodically simulate a disaster to verify the DR process actually works. This is called a DR drill or failover test.


7.4 Change Documentation

Every change made to a production database must be documented. This is not optional — it is a professional and often legal/compliance requirement.

What to Document:

Change Management Ticket: Before making any production change, create a ticket that includes:

  • What is being changed (specific script, configuration, index, etc.)
  • Why the change is needed (business reason, performance fix, bug resolution)
  • When the change will be made (maintenance window date/time)
  • How the change will be made (exact steps, scripts)
  • Risk assessment — What could go wrong?
  • Rollback plan — Exact steps to undo the change if it goes wrong
  • Testing evidence — Proof the change was tested in a lower environment

Post-Change Documentation: After the change, document:

  • Was it successful or did issues occur?
  • Actual time taken
  • Any unexpected observations
  • Final status

Why Documentation Matters:

  • If something goes wrong weeks later, documentation shows what changed.
  • New team members can understand the history of the system.
  • Auditors require proof that changes were authorized and controlled.
  • Change log helps during disaster recovery.

7.5 Monitoring Alerts

Proactive monitoring means detecting and resolving problems before users notice them. Reactive monitoring means users call you because the system is already down — this is unacceptable in a professional environment.

What to Monitor:

Example alert thresholds:

  • Disk — free space on database volumes: alert at < 20% free.
  • SQL Server — transaction log space used: alert at > 80% used.
  • SQL Server — long-running queries: alert if a query runs > 5 minutes.
  • SQL Server — Agent job failures: alert immediately on any failure.
  • SQL Server — database backup age: alert if no backup for > 25 hours.
  • SQL Server — blocking: alert if a session is blocked > 30 seconds.
  • Windows — CPU usage: alert at > 90% for more than 5 minutes.
  • Windows — available memory: alert at < 500 MB free.
  • Windows — stopped Windows service: alert immediately.
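Threshold checks like these reduce to a single pattern: compare each collected metric against a limit in a known direction. A sketch (metric names and numbers are illustrative):

```python
# Evaluate collected metrics against example thresholds and report breaches.
THRESHOLDS = {
    "disk_free_pct":    ("below", 20),   # alert when value drops below limit
    "log_used_pct":     ("above", 80),   # alert when value rises above limit
    "cpu_pct":          ("above", 90),
    "backup_age_hours": ("above", 25),
}

def evaluate(metrics: dict) -> list:
    alerts = []
    for name, (direction, limit) in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            continue  # metric not collected this cycle
        breached = value < limit if direction == "below" else value > limit
        if breached:
            alerts.append(f"{name}={value} breached {direction} {limit}")
    return alerts

print(evaluate({"disk_free_pct": 12, "cpu_pct": 45, "backup_age_hours": 30}))
```

Real monitoring tools add what this sketch omits: sustained-duration rules (CPU > 90% for 5 minutes, not one sample) and de-duplication so one incident does not page repeatedly.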

Alerting Tools:

  • SQL Server Agent Alerts + Database Mail (built-in, sends emails on SQL Server events)
  • Windows Performance Monitor (tracks counters over time, triggers alerts)
  • Third-party monitoring tools: SolarWinds Database Performance Analyzer, SentryOne/SQL Sentry, Redgate SQL Monitor, Nagios, Zabbix

On-Call and Escalation: In production environments, DBAs are often on-call. When a critical alert fires at 3 AM, someone must respond. Escalation procedures define: who is alerted first, how long they have to respond before escalating to a senior DBA or manager.


8. Tools Used by DBAs

8.1 Microsoft SQL Server Management Studio (SSMS)

SSMS is the primary graphical management tool for SQL Server. It is a free download from Microsoft.

Key Features:

  • Object Explorer — Tree view of all databases, tables, users, jobs, and server configuration. Your main navigation tool.
  • Query Editor — Write, execute, and debug T-SQL queries. Shows results in a grid.
  • Execution Plans — View graphical execution plans directly in the query editor.
  • Activity Monitor — Real-time view of CPU, waits, data I/O, and recent expensive queries.
  • SQL Server Agent — Manage and view status of scheduled jobs.
  • Database Backup/Restore Wizard — Graphical interface for backup and restore operations.
  • Import/Export Wizard — Simple data movement between sources (wrapper around SSIS).
  • Reports — Built-in performance reports (disk usage, top queries, etc.).

Essential SSMS Shortcuts:

  • F5 — Execute query
  • Ctrl + L — Display estimated execution plan (without running)
  • Ctrl + M — Toggle actual execution plan on/off (runs query with plan)
  • Ctrl + K, Ctrl + C — Comment out selected text
  • Ctrl + K, Ctrl + U — Uncomment selected text

8.2 pgAdmin

pgAdmin is the primary graphical management tool for PostgreSQL. It is free and open-source.

Key Features:

  • Browser Panel — Tree view of servers, databases, schemas, tables, and other objects.
  • Query Tool — Write and execute SQL queries against PostgreSQL databases.
  • Explain (Execution Plan) — View graphical query execution plans.
  • Dashboard — Server activity, sessions, transactions per second, block I/O.
  • Backup/Restore — Graphical interface wrapping pg_dump and pg_restore.
  • pgAgent — A job scheduler for PostgreSQL (similar to SQL Server Agent).

Difference from SSMS: pgAdmin is web-based — it runs as a local web server on your computer and you access it through a browser. SSMS is a standalone Windows desktop application.


8.3 Command Line Tools

SQL Server Command Line:

sqlcmd — Microsoft’s command-line utility for executing T-SQL scripts and queries against SQL Server.

sqlcmd -S MyServer -U sa -P password -Q "SELECT name FROM sys.databases"

Useful for: automating scripts in batch files or PowerShell, running maintenance scripts, and checking SQL Server status when SSMS is unavailable.

PostgreSQL Command Line:

psql — PostgreSQL’s interactive command-line client.

psql -U postgres -d mydb

pg_dump — Backup utility. pg_restore — Restore utility. pg_ctl — Control the PostgreSQL server service (start, stop, restart).

PowerShell: Windows PowerShell is increasingly important for DBAs. The SqlServer PowerShell module from Microsoft allows you to manage SQL Server programmatically.

Invoke-Sqlcmd -ServerInstance "MyServer" -Query "SELECT @@VERSION"

Used for: automated backup verification, disk space monitoring, bulk user creation, and deployment scripts.


8.4 Linux Command Line Basics for DBAs

As SQL Server now runs on Linux, and PostgreSQL is primarily a Linux tool, basic Linux command-line knowledge is increasingly important.

Essential Linux Commands for DBAs:

Command Description
ls -la List files with details and permissions
cd /path/to/directory Change directory
cat filename Display file contents
tail -f logfile Watch a log file in real time
grep "error" logfile Search for “error” in a log file
df -h Show disk space usage (human-readable)
du -sh /var/data Show directory disk usage
ps aux List all running processes
systemctl status postgresql Check PostgreSQL service status
systemctl start/stop/restart postgresql Control PostgreSQL service
chmod 750 file Change file permissions
chown postgres:postgres file Change file owner
top Real-time process and CPU/memory monitor
free -h Show memory usage

Final Summary: How These Topics Connect

As a DBA, your core responsibility can be summarized in one sentence:

Ensure the database is always available, performs well, and the data is never lost.

Here is how everything connects:

  • SQL Server Administration — You install, configure, and manage the database engine. You create databases, users, and manage disk space.
  • Backup & Restore — If disaster strikes, you restore the database. Without good backups, everything else is meaningless.
  • Performance Tuning — You use execution plans, indexes, and monitoring to keep queries fast and the server healthy.
  • High Availability — You implement Replication, Log Shipping, or Clustering to keep the database available even when hardware fails.
  • Windows Server — SQL Server lives on Windows Server. You must understand services, disk, patching, and event logs to manage the environment.
  • PostgreSQL — Many modern systems use PostgreSQL. Basic administration concepts (backup, users, monitoring) are similar to SQL Server.
  • SSIS/SSRS — You support data pipelines (SSIS) and report infrastructure (SSRS) as part of the SQL Server ecosystem.
  • Production Mindset — All of the above must be executed with discipline: change documentation, proactive monitoring, recovery planning, and respect for the impact of every action you take on a live system.

This document was prepared to support interview preparation for a Database Administrator role. All concepts are theoretical foundations — practical lab experience building on these concepts will significantly strengthen your interview performance.