Executive Summary
SQL remains the universal language of data in 2026. Despite decades of NoSQL alternatives, SQL databases power 62% of all applications, and PostgreSQL has become the most popular database among developers, surpassing MySQL and Oracle. The SQL standard continues to evolve with SQL:2023 adding JSON improvements, property graph queries, and enhanced window functions. Meanwhile, DuckDB has emerged as the fastest-growing database for analytics, bringing columnar SQL to embedded applications.
This report covers every major SQL concept from the basics (SELECT, WHERE, JOIN) through advanced features (window functions, recursive CTEs, lateral joins) to practical concerns (indexing, query optimization, transactions, normalization). Every section includes reference tables with 100+ SQL functions, comparison data, and practical examples that work across PostgreSQL, MySQL, and SQLite. Whether you are writing your first query or optimizing production databases, this is the single SQL reference you need.
- PostgreSQL holds 30% market share, becoming the most popular database among developers for the third consecutive year. Its extensibility (pgvector for AI, PostGIS for spatial, JSONB for documents) makes it the "Swiss Army knife" of databases.
- DuckDB grew from 0% to 5% share in 4 years, becoming the go-to analytical database for data scientists. Its zero-configuration, in-process architecture and columnar storage bring OLAP performance to any application.
- SQL databases hold 62% of the market, with hybrid approaches (SQL databases with JSON/document support) growing to 11%. The SQL vs NoSQL debate has shifted to "use the right tool for the job."
- Window functions and CTEs are now universally supported across all major databases (PostgreSQL, MySQL 8+, SQLite 3.25+, SQL Server), enabling analytical queries that previously required application-level code or stored procedures.
- PostgreSQL market share: 30%
- DuckDB adoption: 5%
- SQL database market: 62%
- Functions covered: 100+
Part 1: SELECT and WHERE
The SELECT statement is the foundation of SQL. It retrieves data from one or more tables. The basic syntax is: SELECT columns FROM table WHERE condition ORDER BY column LIMIT n. Every SQL query, no matter how complex, builds upon these fundamental clauses. Understanding their execution order is critical: FROM (identify tables) -> WHERE (filter rows) -> GROUP BY (group rows) -> HAVING (filter groups) -> SELECT (choose columns) -> ORDER BY (sort results) -> LIMIT (restrict output).
The WHERE clause filters rows before any grouping occurs. It supports: comparison operators (=, !=, <, >, <=, >=), logical operators (AND, OR, NOT), pattern matching (LIKE, ILIKE for case-insensitive), range checking (BETWEEN x AND y), set membership (IN (value1, value2)), NULL checking (IS NULL, IS NOT NULL), and existence testing (EXISTS). Combining these with parentheses for operator precedence enables arbitrarily complex filtering. Performance tip: ensure filtered columns have indexes; modern optimizers reorder predicates themselves, so the textual order of conditions rarely matters.
SELECT DISTINCT removes duplicate rows from results. SELECT * retrieves all columns (avoid in production code; always list specific columns for clarity, performance, and schema change safety). Column aliases (AS alias_name) rename output columns. Expression columns allow computed values: SELECT price * quantity AS total. CASE expressions provide conditional logic inline: SELECT CASE WHEN score >= 90 THEN 'A' WHEN score >= 80 THEN 'B' ELSE 'C' END AS grade.
ORDER BY sorts results by one or more columns in ascending (ASC, default) or descending (DESC) order. NULLS FIRST / NULLS LAST controls NULL positioning (PostgreSQL). LIMIT restricts the number of rows returned. OFFSET skips rows (avoid for large offsets; use cursor-based pagination instead). PostgreSQL, MySQL, and SQLite use LIMIT/OFFSET; SQL Server uses TOP or OFFSET-FETCH.
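The clauses above can be exercised together with Python's built-in sqlite3 module. A minimal sketch (the products table, its columns, and its data are invented for illustration):

```python
import sqlite3

# In-memory SQLite database with a hypothetical "products" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, category TEXT, price REAL, quantity INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [("Widget", "tools", 9.99, 3), ("Gadget", "tools", 24.50, 1), ("Doohickey", "toys", 4.25, 10)],
)

# SELECT with an expression column, a CASE expression, WHERE, ORDER BY, and LIMIT.
rows = conn.execute("""
    SELECT name,
           price * quantity AS total,
           CASE WHEN price >= 20 THEN 'premium' ELSE 'standard' END AS tier
    FROM products
    WHERE category = 'tools'
    ORDER BY total DESC
    LIMIT 2
""").fetchall()

for name, total, tier in rows:
    print(name, total, tier)
```

The SQL itself runs unchanged on PostgreSQL and MySQL; only the connection boilerplate differs.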
Part 2: JOIN Types
JOINs combine rows from two or more tables based on a related column. They are the primary mechanism for querying data spread across normalized tables. Understanding when to use each JOIN type is one of the most important SQL skills. The JOIN condition (ON clause) specifies how tables relate; it typically matches a foreign key to a primary key.
INNER JOIN returns only rows where the join condition is satisfied in both tables. If a customer has no orders, they will not appear in the result. LEFT JOIN (LEFT OUTER JOIN) returns all rows from the left table and matching rows from the right; unmatched right-side columns are filled with NULL. This is the most common join when you want "all X, with optional Y." RIGHT JOIN is the mirror of LEFT JOIN; it is rarely used because you can always rewrite it as a LEFT JOIN with the tables swapped.
FULL OUTER JOIN returns all rows from both tables, with NULLs where there is no match on either side. CROSS JOIN produces the Cartesian product (every row from table A combined with every row from table B). Use CROSS JOIN for generating combinations (all sizes x all colors) but be cautious of the result set size (n * m rows). SELF JOIN joins a table with itself using aliases, useful for hierarchical data (employees/managers) and comparing rows within the same table.
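The INNER vs LEFT JOIN distinction is easiest to see with real data. A sketch using Python's sqlite3 module (the customers/orders schema and rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customers(id), amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 50.0);
""")

# INNER JOIN: only customers that have at least one order.
inner = conn.execute("""
    SELECT c.name, o.amount FROM customers c
    JOIN orders o ON o.customer_id = c.id
""").fetchall()

# LEFT JOIN: all customers; the amount is NULL (None) for Grace, who has no orders.
left = conn.execute("""
    SELECT c.name, o.amount FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.name
""").fetchall()
```

Grace is dropped entirely by the INNER JOIN but kept with a NULL amount by the LEFT JOIN, which is exactly the "all X, with optional Y" behavior described above.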
SQL JOIN Types Comparison
| JOIN Type | Description | NULL Handling | Use Case |
|---|---|---|---|
| INNER JOIN | Returns only rows that have matching values in both tables. | Excludes NULLs in join columns | Finding matching records (orders with customers) |
| LEFT JOIN (LEFT OUTER JOIN) | Returns all rows from the left table and matched rows from the right. NULLs for unmatched right rows. | Right columns NULL for unmatched rows | All customers with optional orders |
| RIGHT JOIN (RIGHT OUTER JOIN) | Returns all rows from the right table and matched rows from the left. NULLs for unmatched left rows. | Left columns NULL for unmatched rows | Rarely used; rewrite as LEFT JOIN for readability |
| FULL OUTER JOIN | Returns all rows from both tables. NULLs where there is no match on either side. | NULLs on both sides for unmatched | Finding all records with/without matches |
| CROSS JOIN | Returns the Cartesian product: every row from left combined with every row from right. | N/A | Generate all combinations (sizes x colors) |
| SELF JOIN | Joins a table with itself using table aliases. | Depends on join type used | Hierarchical data (employees/managers), comparisons within same table |
| NATURAL JOIN | Automatically joins on columns with the same name in both tables. | Same as INNER JOIN | Quick prototyping; avoid in production (fragile) |
| LATERAL JOIN | Allows the right side to reference columns from the left side (correlated subquery in FROM). | Depends on LEFT/INNER | Top-N per group, dependent subqueries |
Part 3: GROUP BY and HAVING
GROUP BY groups rows sharing common values into summary rows, enabling aggregate calculations per group. Every non-aggregated column in the SELECT clause must appear in the GROUP BY clause (standard SQL; MySQL historically allowed non-deterministic results). Aggregate functions (COUNT, SUM, AVG, MIN, MAX) operate on each group independently. Without GROUP BY, aggregate functions operate on the entire result set as one group.
HAVING filters groups after aggregation (unlike WHERE, which filters individual rows before grouping). HAVING can use aggregate functions: HAVING COUNT(*) > 5 keeps only groups with more than 5 rows. WHERE cannot reference aggregate functions. Performance tip: filter as much as possible with WHERE (before grouping) rather than HAVING (after grouping), because WHERE reduces the number of rows that need to be grouped.
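The WHERE-before-grouping, HAVING-after-grouping pipeline can be sketched with sqlite3 (the sales table and figures are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('east', 100), ('east', 200), ('east', 50),
        ('west', 300), ('north', 10);
""")

# WHERE filters rows first, GROUP BY aggregates, HAVING filters the groups.
rows = conn.execute("""
    SELECT region, COUNT(*) AS n, SUM(amount) AS total
    FROM sales
    WHERE amount > 20          -- row filter: drops the single 10 sale (north)
    GROUP BY region
    HAVING COUNT(*) >= 2       -- group filter: drops west (only 1 qualifying row)
    ORDER BY region
""").fetchall()
```

Only 'east' survives: north loses its only row to WHERE, and west is eliminated by HAVING after aggregation.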
Advanced GROUP BY extensions: ROLLUP generates subtotals and a grand total (GROUP BY ROLLUP(department, city) produces totals by department+city, by department, and overall). CUBE generates subtotals for all combinations. GROUPING SETS specifies exactly which groupings to compute. The GROUPING() function distinguishes the NULLs that ROLLUP/CUBE generate in subtotal rows from actual NULL data values. These extensions are essential for reporting and business intelligence queries.
Part 4: Subqueries
A subquery is a SELECT statement nested inside another SQL statement. Subqueries can appear in the WHERE clause (most common), the FROM clause (derived tables), and the SELECT clause (scalar subqueries). Subqueries are classified by their return type: scalar (single value), row (single row, multiple columns), and table (multiple rows and columns). They can also be correlated (reference the outer query) or non-correlated (independent).
Non-correlated subqueries execute once and their result is used by the outer query. Example: SELECT * FROM products WHERE price > (SELECT AVG(price) FROM products). Correlated subqueries reference the outer query and execute once for each outer row. Example: SELECT * FROM employees e WHERE salary > (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id). Correlated subqueries can be expensive; consider rewriting as JOINs.
EXISTS is often the most efficient way to test for the existence of related rows: WHERE EXISTS (SELECT 1 FROM orders WHERE orders.user_id = users.id). EXISTS stops at the first match (short-circuits), making it faster than COUNT(*) > 0. IN vs EXISTS: IN materializes the subquery result into a set; EXISTS checks row by row. For large subquery results, EXISTS is usually faster. NOT EXISTS and NOT IN differ in NULL handling: NOT IN returns no rows if the subquery contains NULL; NOT EXISTS handles NULLs correctly.
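The NOT IN NULL trap is worth verifying for yourself. A sketch with sqlite3 (users/banned tables invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE banned (user_id INTEGER);      -- note: contains a NULL
    INSERT INTO users VALUES (1), (2), (3);
    INSERT INTO banned VALUES (2), (NULL);
""")

# NOT IN against a set containing NULL: every comparison evaluates to
# "unknown" rather than true, so the query returns no rows at all.
not_in = conn.execute(
    "SELECT id FROM users WHERE id NOT IN (SELECT user_id FROM banned)"
).fetchall()

# NOT EXISTS handles the NULL correctly and returns users 1 and 3.
not_exists = conn.execute("""
    SELECT id FROM users u
    WHERE NOT EXISTS (SELECT 1 FROM banned b WHERE b.user_id = u.id)
""").fetchall()
```

One stray NULL in the subquery silently empties the NOT IN result, which is why NOT EXISTS is the safer default for anti-joins.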
Part 5: Window Functions
Window functions perform calculations across a set of rows related to the current row, without collapsing them into a single output row (unlike GROUP BY). The syntax is: function() OVER (PARTITION BY ... ORDER BY ... frame_clause). Window functions are computed after WHERE, GROUP BY, and HAVING, but before ORDER BY and LIMIT. They enable analytical queries that are impossible or extremely awkward with standard GROUP BY.
Ranking functions: ROW_NUMBER() assigns a unique sequential number (1, 2, 3, ...) to each row within a partition. RANK() assigns the same rank to ties, leaving gaps (1, 2, 2, 4). DENSE_RANK() assigns the same rank to ties without gaps (1, 2, 2, 3). NTILE(n) distributes rows into n groups. These are essential for top-N queries, pagination, and deduplication.
Value functions: LAG(column, offset) accesses data from a previous row (previous month sales, previous day price). LEAD(column, offset) accesses data from a subsequent row. FIRST_VALUE(), LAST_VALUE(), and NTH_VALUE() return values from specific positions within the window frame. These enable period-over-period comparisons without self-joins.
Aggregate window functions: SUM(), AVG(), COUNT(), MIN(), MAX() can all be used as window functions with OVER(). SUM(amount) OVER (ORDER BY date) produces a running total. AVG(price) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) produces a 7-day moving average. The frame clause (ROWS BETWEEN ... AND ...) controls which rows are included in the calculation relative to the current row.
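A running total and a ranking can be combined in a single query without collapsing rows. A sketch with sqlite3 (requires SQLite 3.25+ for window functions, which ships with modern Python builds; the daily table is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily (day INTEGER, amount REAL);
    INSERT INTO daily VALUES (1, 10), (2, 20), (3, 5);
""")

# Two windows in one query: a cumulative SUM ordered by day,
# and a RANK over the same rows ordered by amount.
rows = conn.execute("""
    SELECT day,
           amount,
           SUM(amount) OVER (ORDER BY day) AS running_total,
           RANK() OVER (ORDER BY amount DESC) AS amount_rank
    FROM daily
    ORDER BY day
""").fetchall()
```

All three input rows survive in the output; each simply carries extra computed columns, which is the key difference from GROUP BY.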
Part 6: Common Table Expressions (CTEs)
CTEs are temporary named result sets defined with the WITH keyword. They improve query readability by breaking complex queries into logical steps. Syntax: WITH cte_name AS (SELECT ...) SELECT * FROM cte_name. Multiple CTEs can be chained: WITH cte1 AS (...), cte2 AS (SELECT * FROM cte1 ...) SELECT * FROM cte2. CTEs are not materialized by default in PostgreSQL 12+ (the optimizer may inline them); add the MATERIALIZED keyword to force materialization.
Recursive CTEs (WITH RECURSIVE) enable traversing hierarchical and graph data. Structure: WITH RECURSIVE tree AS (base_case UNION ALL recursive_step) SELECT * FROM tree. The base case provides the starting rows; the recursive step references the CTE itself to add more rows. Each iteration adds new rows until the recursive step returns no rows. Common uses: organizational hierarchies (employee/manager), category trees, bill of materials, finding all connected nodes in a graph, and generating number series.
Practical CTE patterns: (1) Readability: name each logical step (WITH active_users AS ..., recent_orders AS ... SELECT ...). (2) Deduplication: WITH ranked AS (SELECT *, ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at DESC) AS rn FROM users) SELECT * FROM ranked WHERE rn = 1. (3) Recursive hierarchy: WITH RECURSIVE org AS (SELECT id, name, manager_id, 1 AS level FROM employees WHERE manager_id IS NULL UNION ALL SELECT e.id, e.name, e.manager_id, o.level + 1 FROM employees e JOIN org o ON e.manager_id = o.id) SELECT * FROM org.
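Pattern (3), the recursive hierarchy, runs as-is on SQLite. A sketch via sqlite3 with an invented three-level org chart:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'CEO', NULL),
        (2, 'VP', 1),
        (3, 'Engineer', 2);
""")

# Base case: employees with no manager (level 1).
# Recursive step: everyone whose manager is already in the CTE, one level deeper.
rows = conn.execute("""
    WITH RECURSIVE org AS (
        SELECT id, name, manager_id, 1 AS level
        FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, e.manager_id, o.level + 1
        FROM employees e JOIN org o ON e.manager_id = o.id
    )
    SELECT name, level FROM org ORDER BY level
""").fetchall()
```

Iteration stops naturally when the recursive step matches no new employees.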
Part 7: Indexes
Indexes are data structures that speed up data retrieval at the cost of additional storage and slower writes. Without an index, the database must scan every row (sequential scan) to find matching data. With a B-Tree index, lookup is O(log n) instead of O(n). The trade-off: every INSERT, UPDATE, and DELETE must also update all relevant indexes, so over-indexing slows down write operations.
Index strategy: (1) Index columns used in WHERE clauses (equality and range conditions). (2) Index columns used in JOIN conditions. (3) Index columns used in ORDER BY to avoid sorts. (4) Use composite indexes for multi-column queries (column order matters: leftmost prefix rule). (5) Consider partial indexes for queries that filter on a specific condition (CREATE INDEX ON orders(user_id) WHERE status = 'active'). (6) Use covering indexes (INCLUDE) to avoid table lookups. (7) Do not index columns with low cardinality (boolean, status with few values) unless combined with selective columns.
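Point (5)'s partial index can be checked with EXPLAIN QUERY PLAN in SQLite. A sketch (table, index name, and data are invented; exact plan wording varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders (user_id, status) VALUES (?, ?)",
    [(i % 50, "active" if i % 2 else "done") for i in range(1000)],
)

# Partial index: only indexes the rows the hot query actually touches.
conn.execute("CREATE INDEX idx_active_user ON orders(user_id) WHERE status = 'active'")

# EXPLAIN QUERY PLAN reveals whether the planner chose the index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM orders WHERE user_id = 7 AND status = 'active'"
).fetchall()
detail = " ".join(str(row[-1]) for row in plan)
print(detail)
```

Because the query's WHERE clause implies the index's WHERE clause, the planner can use the partial index; drop the status condition and it no longer qualifies.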
SQL Index Types
| Index Type | Description | Best For | Databases |
|---|---|---|---|
| B-Tree | Balanced tree index. Default in most databases. Supports equality and range queries. | =, <, >, <=, >=, BETWEEN, LIKE 'prefix%', ORDER BY | All (default) |
| Hash | Hash table index. Only supports exact equality lookups. | = (exact match only) | PostgreSQL, MySQL (Memory engine) |
| GIN (Generalized Inverted) | Inverted index for composite values (arrays, JSONB, full-text). | Array containment (@>), JSONB queries, full-text search (tsvector) | PostgreSQL |
| GiST (Generalized Search Tree) | Supports various indexing strategies including spatial. | Geometric/spatial data, range types, full-text (alternative to GIN) | PostgreSQL |
| BRIN (Block Range Index) | Stores summary info for ranges of physical table blocks. Very small index. | Naturally ordered data (timestamps, sequential IDs) in large tables | PostgreSQL |
| Full-Text | Specialized index for text search with stemming, ranking, and boolean queries. | MATCH AGAINST (MySQL), to_tsvector/to_tsquery (PG) | MySQL, PostgreSQL, MSSQL |
| Spatial (R-Tree) | Index for geographic/geometric data types. | ST_Contains, ST_Distance, bounding box queries | MySQL (MyISAM/InnoDB), PostgreSQL (PostGIS) |
| Covering (INCLUDE) | B-Tree index that includes additional columns in leaf nodes. | Index-only scans, avoiding table lookups | PostgreSQL 11+, MSSQL, MySQL 8+ |
| Partial / Filtered | Index that only includes rows matching a WHERE condition. | Queries on a subset of data (active users, recent orders) | PostgreSQL, MSSQL, SQLite |
| Expression / Functional | Index on the result of an expression or function. | LOWER(email), EXTRACT(year FROM date), computed columns | PostgreSQL, MySQL 8+, SQLite |
Part 8: Transactions and ACID
A transaction is a sequence of SQL operations treated as a single atomic unit of work. Transactions ensure data integrity: either all operations succeed (COMMIT) or none of them take effect (ROLLBACK). The ACID properties (Atomicity, Consistency, Isolation, Durability) define the guarantees that a database transaction provides. Understanding ACID is essential for any application that modifies data.
ACID Properties
| Property | Description | Mechanism | Example |
|---|---|---|---|
| Atomicity | A transaction is all-or-nothing. If any part fails, the entire transaction is rolled back and the database remains unchanged. | Write-ahead logging (WAL), undo logs | Bank transfer: debit $100 from A and credit $100 to B both succeed or both fail. No partial state. |
| Consistency | A transaction brings the database from one valid state to another. All integrity constraints, triggers, and cascades are satisfied. | Constraints (PRIMARY KEY, FOREIGN KEY, CHECK, UNIQUE), triggers | After transfer, total balance across all accounts remains the same. No orphan foreign keys. |
| Isolation | Concurrent transactions execute as if they were serial. Each transaction is unaware of other concurrent transactions. | Locks (shared/exclusive), MVCC (Multi-Version Concurrency Control) | Two users reading the same balance see consistent values, not a mix of before/after a transfer. |
| Durability | Once a transaction is committed, it remains committed even in the event of power loss, crash, or error. | Write-ahead logging (WAL), fsync, replication | After COMMIT returns success, the data survives a server crash and is recoverable from the WAL. |
Isolation levels control how concurrent transactions interact. READ UNCOMMITTED allows dirty reads (seeing uncommitted changes). READ COMMITTED (PostgreSQL default) only sees committed data. REPEATABLE READ (MySQL InnoDB default) provides a consistent snapshot for the entire transaction. SERIALIZABLE provides full isolation at the cost of potential serialization failures. Choose the lowest isolation level that maintains correctness for your use case to maximize concurrency.
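Atomicity is easy to demonstrate with sqlite3: when one statement in a transaction fails, earlier statements in the same transaction are rolled back with it (the accounts table and amounts are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

# Transfer 150 from A to B. The credit to B succeeds, the debit from A
# violates the CHECK constraint, and the context manager (which wraps the
# statements in a transaction) rolls the whole unit back.
try:
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + 150 WHERE name = 'B'")
        conn.execute("UPDATE accounts SET balance = balance - 150 WHERE name = 'A'")
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

No partial state survives: B's intermediate +150 disappears along with the failed debit, exactly the bank-transfer guarantee described in the ACID table.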
Part 9: Stored Procedures and Triggers
Stored procedures are precompiled SQL code blocks stored in the database and executed with CALL procedure_name(). They encapsulate business logic, reduce network traffic (one call instead of many queries), and enforce data integrity at the database level. Stored procedures support variables, control flow (IF/ELSE, LOOP, WHILE), error handling (EXCEPTION in PostgreSQL, DECLARE HANDLER in MySQL), and cursors. However, they are harder to version control, test, and debug than application code.
Triggers are procedural code blocks that automatically execute in response to table events (INSERT, UPDATE, DELETE). They fire BEFORE or AFTER the event, and FOR EACH ROW or FOR EACH STATEMENT. Common uses: audit logging (record who changed what and when), maintaining computed columns (update a denormalized total when line items change), enforcing complex constraints (business rules that cannot be expressed as CHECK constraints), and synchronizing related tables. Overuse of triggers makes debugging difficult because they execute implicitly.
User-Defined Functions (UDFs) differ from stored procedures in that they return a value and can be used in SELECT statements. Scalar functions return a single value. Table-valued functions return a result set. PostgreSQL supports functions in PL/pgSQL, SQL, Python (PL/Python), and JavaScript (PL/v8). MySQL supports functions in SQL and C. UDFs enable reusable calculations (PostgreSQL syntax): CREATE FUNCTION discount_price(price DECIMAL, discount DECIMAL) RETURNS DECIMAL AS $$ SELECT price * (1 - discount) $$ LANGUAGE SQL;
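SQLite takes a different route from CREATE FUNCTION: it has no SQL-level function definition, so the driver registers scalar UDFs on the connection. A sketch with Python's sqlite3 (function and table names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite has no CREATE FUNCTION; the driver registers scalar UDFs instead.
# Args: SQL name, argument count, Python callable.
conn.create_function("discount_price", 2, lambda price, discount: price * (1 - discount))

conn.execute("CREATE TABLE items (name TEXT, price REAL)")
conn.execute("INSERT INTO items VALUES ('book', 20.0)")
rows = conn.execute("SELECT name, discount_price(price, 0.25) FROM items").fetchall()
```

Once registered, the function is callable from any SQL statement on that connection, just like a built-in.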
Part 10: Normalization
Normalization is the process of organizing data to reduce redundancy and improve data integrity. Each normal form builds on the previous one, adding stricter rules. Most production databases should be in Third Normal Form (3NF). Over-normalization leads to excessive JOINs and slower reads; strategic denormalization improves performance for read-heavy workloads.
Normalization Forms
| Normal Form | Rule | Violation Example | Fix |
|---|---|---|---|
| 1NF (First Normal Form) | Each column contains atomic (indivisible) values. No repeating groups or arrays. | Column "phone_numbers" containing "555-1234, 555-5678" | Create a separate phone_numbers table with one number per row. |
| 2NF (Second Normal Form) | Must be in 1NF. Every non-key column depends on the entire primary key (no partial dependencies). | Composite key (student_id, course_id) with student_name depending only on student_id | Move student_name to a separate students table. |
| 3NF (Third Normal Form) | Must be in 2NF. No transitive dependencies (non-key column depends on another non-key column). | employee table with department_id and department_name (name depends on dept_id, not employee) | Move department_name to a departments table. |
| BCNF (Boyce-Codd Normal Form) | Must be in 3NF. Every determinant is a candidate key. Stricter than 3NF. | A table where a non-candidate-key column determines part of the key | Decompose into tables where each determinant is a key. |
| 4NF (Fourth Normal Form) | Must be in BCNF. No multi-valued dependencies. | employee(id, skill, language) where skills and languages are independent | Separate into employee_skills and employee_languages tables. |
| 5NF (Fifth Normal Form) | Must be in 4NF. No join dependencies that are not implied by candidate keys. | Supplier-part-project relationships that cannot be reconstructed from binary projections | Decompose into binary relationship tables. |
Part 11: Query Optimization
Query optimization is the process of making SQL queries run faster. The systematic approach: (1) Identify slow queries (log queries exceeding a threshold). (2) Run EXPLAIN ANALYZE to see the execution plan. (3) Look for sequential scans on large tables (add an index). (4) Check estimated vs actual row counts (large differences mean stale statistics; run ANALYZE). (5) Identify expensive operations (sorts, hash joins on large tables). (6) Rewrite the query if possible (subqueries to JOINs, add CTEs for readability without performance cost).
Common optimization patterns: (1) Index WHERE and JOIN columns. (2) Use LIMIT with ORDER BY for pagination (and cursor-based pagination for large offsets). (3) Avoid SELECT * (list specific columns). (4) Use EXISTS instead of COUNT(*) > 0 for existence checks. (5) Batch INSERT operations. (6) Use prepared statements for repeated queries. (7) Consider materialized views for expensive aggregations. (8) Partition large tables by date or category. (9) Use connection pooling (PgBouncer, ProxySQL). (10) Monitor and tune database configuration parameters (shared_buffers, work_mem, effective_cache_size).
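Patterns (5) and (6) combine naturally: executemany prepares the INSERT once and binds each parameter set against the cached plan. A sketch with sqlite3 (the events table is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")

# One prepared INSERT, many parameter bindings, one transaction:
# far cheaper than 10,000 separately parsed single-row INSERTs.
conn.executemany(
    "INSERT INTO events (kind) VALUES (?)",
    [("click",) for _ in range(10_000)],
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

On a client-server database the same pattern also collapses thousands of network round-trips into a handful.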
Anti-patterns to avoid: (1) N+1 query problem (use JOINs or eager loading). (2) Using OFFSET for deep pagination (use cursor/keyset pagination). (3) Implicit type conversions in WHERE (prevents index usage). (4) Functions on indexed columns in WHERE (prevents index usage; use expression indexes instead). (5) Missing VACUUM/ANALYZE (stale statistics lead to bad plans). (6) Over-indexing (slows writes, wastes storage). (7) Using ORM without monitoring generated SQL.
Database Market Share (2019-2026)
Source: OnlineTools4Free Research
SQL vs NoSQL Adoption (2018-2026)
Source: OnlineTools4Free Research
Part 12: SQL Functions Reference (100+)
A comprehensive, searchable reference of 100+ SQL functions covering aggregate, string, date/time, math, conditional, conversion, window, subquery, JSON, and utility functions across PostgreSQL, MySQL, SQLite, and SQL Server.
SQL Functions Reference (100+)
| Function | Category | Description | Syntax | Standard |
|---|---|---|---|---|
| COUNT() | Aggregate | Returns the number of rows matching a condition. | COUNT(*) / COUNT(column) / COUNT(DISTINCT column) | SQL-92 |
| SUM() | Aggregate | Returns the sum of numeric values in a column. | SUM(column) | SQL-92 |
| AVG() | Aggregate | Returns the average value of a numeric column. | AVG(column) | SQL-92 |
| MIN() | Aggregate | Returns the minimum value in a column. | MIN(column) | SQL-92 |
| MAX() | Aggregate | Returns the maximum value in a column. | MAX(column) | SQL-92 |
| GROUP_CONCAT() | Aggregate | Concatenates values from multiple rows into a single string. | GROUP_CONCAT(column SEPARATOR ",") | MySQL (STRING_AGG in PG/MSSQL) |
| STRING_AGG() | Aggregate | Concatenates string values with a separator (portable alternative to GROUP_CONCAT). | STRING_AGG(column, separator) | PostgreSQL/MSSQL (SQL:2016 standardizes LISTAGG) |
| ARRAY_AGG() | Aggregate | Collects values into an array. | ARRAY_AGG(column) | SQL:2003 (PostgreSQL) |
| CONCAT() | String | Concatenates two or more strings. | CONCAT(str1, str2, ...) | SQL:2003 |
| CONCAT_WS() | String | Concatenates with a separator, skipping NULLs. | CONCAT_WS(separator, str1, str2, ...) | MySQL/PostgreSQL |
| SUBSTRING() | String | Extracts a substring from a string. | SUBSTRING(string FROM start FOR length) | SQL-92 |
| LENGTH() | String | Returns the length of a string in characters. | LENGTH(string) / CHAR_LENGTH(string) | SQL-92 |
| UPPER() | String | Converts a string to uppercase. | UPPER(string) | SQL-92 |
| LOWER() | String | Converts a string to lowercase. | LOWER(string) | SQL-92 |
| TRIM() | String | Removes leading and/or trailing characters (default: spaces). | TRIM([LEADING / TRAILING / BOTH] char FROM string) | SQL-92 |
| REPLACE() | String | Replaces all occurrences of a substring. | REPLACE(string, old, new) | SQL:2003 |
| LEFT() | String | Returns the leftmost N characters. | LEFT(string, n) | MySQL/MSSQL |
| RIGHT() | String | Returns the rightmost N characters. | RIGHT(string, n) | MySQL/MSSQL |
| LPAD() | String | Pads a string from the left to a specified length. | LPAD(string, length, pad_string) | SQL:2008 |
| RPAD() | String | Pads a string from the right to a specified length. | RPAD(string, length, pad_string) | SQL:2008 |
Glossary (50+ Terms)
DDL (Data Definition Language)
(SQL Categories) SQL commands that define database structure: CREATE, ALTER, DROP, TRUNCATE, RENAME. DDL statements modify the schema and are typically auto-committed (cannot be rolled back in most databases). Examples: CREATE TABLE, ALTER TABLE ADD COLUMN, DROP INDEX.
DML (Data Manipulation Language)
(SQL Categories) SQL commands that manipulate data: SELECT, INSERT, UPDATE, DELETE, MERGE (UPSERT). DML statements can be wrapped in transactions and rolled back. SELECT is sometimes classified separately as DQL (Data Query Language).
DCL (Data Control Language)
(SQL Categories) SQL commands that control access: GRANT, REVOKE. DCL manages permissions on database objects. GRANT SELECT ON table TO user gives read access. REVOKE removes it. Permissions can be granted on tables, views, schemas, and databases.
TCL (Transaction Control Language)
(SQL Categories) SQL commands that manage transactions: BEGIN/START TRANSACTION, COMMIT, ROLLBACK, SAVEPOINT. Transactions group multiple SQL statements into an atomic unit. SAVEPOINT creates a point within a transaction to partially roll back to.
Primary Key
(Constraints) A column (or combination of columns) that uniquely identifies each row in a table. Primary keys must be unique, NOT NULL, and there can be only one per table. Automatically creates a unique index. Best practice: use surrogate keys (auto-increment integers or UUIDs) or natural keys (email, ISBN) depending on the domain.
Foreign Key
(Constraints) A column that references the primary key of another table, enforcing referential integrity. The referenced row must exist (or the foreign key must be NULL if allowed). CASCADE options: ON DELETE CASCADE (delete child rows), ON UPDATE CASCADE (update child references), SET NULL, SET DEFAULT, RESTRICT (prevent deletion).
Index
(Performance) A data structure that improves the speed of data retrieval operations on a table at the cost of additional storage and slower writes. Like a book index: instead of scanning every page, go directly to the relevant page. Indexes are critical for WHERE, JOIN, ORDER BY, and GROUP BY performance. Too many indexes slow down INSERT/UPDATE/DELETE.
Composite Index
(Performance) An index on multiple columns. The order of columns matters: an index on (last_name, first_name) efficiently supports WHERE last_name = 'Smith' and WHERE last_name = 'Smith' AND first_name = 'John', but NOT WHERE first_name = 'John' alone. Follow the leftmost prefix rule.
Query Plan (EXPLAIN)
(Performance) The database execution plan for a query. EXPLAIN (or EXPLAIN ANALYZE) shows how the database will execute a query: which indexes it uses, join algorithms, estimated row counts, and costs. Essential for query optimization. Look for: Sequential Scans on large tables (add index), Nested Loops (consider hash/merge join), Sort operations (add index).
Window Function
(Advanced SQL) A function that performs a calculation across a set of rows related to the current row, without collapsing them into a single output row (unlike GROUP BY). Syntax: function() OVER (PARTITION BY ... ORDER BY ...). Examples: ROW_NUMBER(), RANK(), LAG(), LEAD(), running SUM(). Window functions operate on a "window frame" defined by ROWS/RANGE.
CTE (Common Table Expression)
(Advanced SQL) A temporary named result set defined with the WITH keyword. CTEs improve readability and allow self-referencing (recursive CTEs). WITH active_users AS (SELECT * FROM users WHERE active = true) SELECT * FROM active_users. Recursive CTEs can traverse trees/graphs: WITH RECURSIVE tree AS (...).
Subquery
(Advanced SQL) A SELECT statement nested inside another SQL statement. Scalar subqueries return a single value. Row subqueries return one row. Table subqueries return multiple rows/columns. Correlated subqueries reference the outer query (executed once per outer row). Often rewritable as JOINs for better performance.
View
(Database Objects) A virtual table defined by a SELECT query. Views do not store data; they execute the underlying query when accessed. Uses: simplify complex queries, restrict column access, provide backward compatibility when table structure changes. Materialized views (PostgreSQL, Oracle) store results and must be refreshed.
Stored Procedure
(Database Objects) A precompiled collection of SQL statements stored in the database. Called with CALL procedure_name(args). Benefits: reduce network traffic, enforce business logic at the database level, reuse complex operations. Downsides: harder to version control, debug, and test than application code. Syntax varies significantly between databases.
Trigger
(Database Objects) A procedural code block that automatically executes in response to certain events on a table (INSERT, UPDATE, DELETE). Triggers can fire BEFORE or AFTER the event, and FOR EACH ROW or FOR EACH STATEMENT. Uses: audit logging, enforcing complex constraints, maintaining denormalized data. Overuse makes debugging difficult.
Transaction
(Transactions) A sequence of SQL operations treated as a single unit of work. Transactions follow ACID properties. BEGIN starts a transaction, COMMIT makes changes permanent, ROLLBACK undoes all changes since BEGIN. Isolation levels control how transactions interact: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE.
Isolation Level
(Transactions) Controls the degree to which transactions are isolated from each other. READ UNCOMMITTED: allows dirty reads. READ COMMITTED: prevents dirty reads (PostgreSQL default). REPEATABLE READ: prevents non-repeatable reads (MySQL InnoDB default). SERIALIZABLE: full isolation, prevents phantom reads. Higher isolation = more correctness but lower concurrency.
Deadlock
(Transactions) A situation where two or more transactions are waiting for each other to release locks, creating a circular dependency. Transaction A locks row 1 and waits for row 2; Transaction B locks row 2 and waits for row 1. The database detects deadlocks and rolls back one transaction (the "victim"). Prevent by: consistent lock ordering, short transactions, appropriate isolation levels.
MVCC (Multi-Version Concurrency Control)
(Concurrency) A concurrency control method used by PostgreSQL, MySQL InnoDB, and Oracle. Instead of locking rows for reads, MVCC maintains multiple versions of each row. Readers see a snapshot of the data as of their transaction start time, never blocking writers. Writers create new row versions. Old versions are cleaned up by VACUUM (PostgreSQL) or purge threads (MySQL).
Partitioning
(Performance) Dividing a large table into smaller, more manageable pieces (partitions) based on a column value. Range partitioning (by date), list partitioning (by region), hash partitioning (by ID). Benefits: faster queries on partition key, easier maintenance (drop old partitions), parallel query execution. The application typically queries the parent table transparently.
Sharding
Scaling: Distributing data across multiple database instances (shards) based on a shard key. Each shard contains a subset of the data. Used when a single database cannot handle the load. Adds complexity: cross-shard queries, distributed transactions, rebalancing. Solutions: Vitess (MySQL), Citus (PostgreSQL), CockroachDB, TiDB.
Replication
Scaling: Copying data from one database server (primary/master) to one or more servers (replicas/slaves). Synchronous replication: primary waits for replica acknowledgment. Asynchronous: primary does not wait. Uses: read scaling, high availability, disaster recovery. PostgreSQL streaming replication, MySQL replication, and managed services handle this automatically.
Connection Pooling
Performance: Reusing database connections instead of creating new ones for each request. Opening a connection is expensive (TCP handshake, authentication, TLS). Connection pools maintain a set of open connections. Tools: PgBouncer (PostgreSQL), ProxySQL (MySQL), built-in pool in most ORMs. Typical pool size: 10-20 connections per application instance.
Prepared Statement
Security: A precompiled SQL statement with parameter placeholders (? or $1). The database parses and plans the query once, then executes it multiple times with different parameters. Benefits: prevents SQL injection (parameters are never interpolated into the query text), improves performance for repeated queries. All modern database drivers support prepared statements.
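A minimal sketch of parameter binding, using SQLite's ? placeholder via Python's sqlite3 (table and data are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# The driver sends the SQL text and the parameters separately;
# values are bound at execution time, never spliced into the string.
stmt = "INSERT INTO users (name, age) VALUES (?, ?)"
conn.executemany(stmt, [("alice", 30), ("bob", 25)])

row = conn.execute("SELECT name FROM users WHERE age > ?", (26,)).fetchone()
print(row)  # → ('alice',)
```

PostgreSQL drivers use $1/$2 (or %s in psycopg) instead of ?, but the principle is identical: the statement text is fixed, only the bound values vary.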
SQL Injection
Security: A code injection attack where malicious SQL is inserted into a query through user input. Example: the input "'; DROP TABLE users; --" in a login field closes the string literal and appends a destructive statement. Prevention: ALWAYS use prepared statements/parameterized queries, never concatenate user input into SQL strings, use ORM query builders, validate input types, apply least-privilege database permissions.
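The classic tautology attack can be demonstrated in a few lines; this sketch uses SQLite via Python's sqlite3, with an invented users table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('admin', 'secret')")

malicious = "' OR '1'='1"

# UNSAFE: the input is spliced into the SQL text, turning the WHERE
# clause into  name = '' OR '1'='1'  — a tautology matching every row.
unsafe = f"SELECT * FROM users WHERE name = '{malicious}'"
leaked = conn.execute(unsafe).fetchall()
print(leaked)  # → [('admin', 'secret')]

# SAFE: the same input bound as a parameter is compared as a literal
# string, so it matches nothing.
safe = conn.execute("SELECT * FROM users WHERE name = ?", (malicious,)).fetchall()
print(safe)  # → []
```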
N+1 Query Problem
Performance: A performance antipattern where an application executes 1 query to fetch a list of N items, then N additional queries to fetch related data for each item. Example: fetch 100 users, then run 100 separate queries for each user's orders. Solution: use JOINs, subqueries, or ORM eager loading (Django select_related, SQLAlchemy joinedload).
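Both patterns side by side, as a minimal sketch with SQLite via Python's sqlite3 (tables and data invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total INTEGER);
    INSERT INTO users  VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 10), (2, 1, 20), (3, 2, 5);
""")

# N+1 antipattern: one query for the users, then one query per user.
users = conn.execute("SELECT id, name FROM users").fetchall()
for uid, name in users:
    conn.execute("SELECT total FROM orders WHERE user_id = ?", (uid,))  # N extra round trips

# Fix: a single JOIN fetches everything in one query.
rows = conn.execute("""
    SELECT u.name, SUM(o.total)
    FROM users u JOIN orders o ON o.user_id = u.id
    GROUP BY u.id
""").fetchall()
print(rows)  # → [('alice', 30), ('bob', 5)]
```

With 2 users the difference is invisible; with 10,000 users the N+1 version issues 10,001 round trips while the JOIN still issues one.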
Execution Plan
Performance: The step-by-step strategy the database optimizer chooses to execute a query. View with EXPLAIN (PostgreSQL/MySQL) or SET SHOWPLAN_ALL ON (MSSQL). Key operations: Seq Scan (full table scan), Index Scan (uses index), Nested Loop Join, Hash Join, Merge Join, Sort, Aggregate. The optimizer chooses based on table statistics, indexes, and cost estimates.
Cardinality
Performance: The number of distinct values in a column. High cardinality (many unique values like user_id) benefits from B-Tree indexes. Low cardinality (few unique values like status, gender) may not benefit from traditional indexes. The query optimizer uses cardinality estimates to choose between index scan and sequential scan.
Denormalization
Design: Intentionally adding redundant data to tables to improve read performance, at the cost of write complexity and potential inconsistency. Examples: storing a computed total on an order, caching a user name in a comment table. Common in read-heavy applications, data warehouses, and NoSQL databases. Always benchmark before denormalizing.
ORM (Object-Relational Mapping)
Tools: A technique that maps database tables to programming language objects. ORMs generate SQL from method calls: User.objects.filter(age__gte=18) becomes SELECT * FROM users WHERE age >= 18. Popular ORMs: Django ORM, SQLAlchemy (Python), Prisma, Drizzle (TypeScript), ActiveRecord (Ruby), Hibernate (Java). Trade-off: convenience vs control over generated SQL.
VACUUM (PostgreSQL)
PostgreSQL: A maintenance operation that reclaims storage occupied by dead tuples (rows marked for deletion by MVCC). VACUUM also updates table statistics for the query planner. VACUUM FULL rewrites the entire table to reclaim maximum space but locks the table. Autovacuum runs automatically. Monitor bloat with pg_stat_user_tables.
WAL (Write-Ahead Logging)
Internals: A technique where all changes are written to a log file before being applied to the actual data files. This ensures durability (changes survive crashes) and enables point-in-time recovery. PostgreSQL WAL, MySQL redo log, and SQLite WAL mode all use this concept. WAL also enables streaming replication.
DuckDB
Databases: An in-process analytical database (OLAP) designed for fast analytical queries. Like SQLite but for analytics instead of OLTP. Runs inside the application process with zero configuration. Columnar storage, vectorized query execution, supports Parquet/CSV/JSON directly. Rapidly growing in data science and analytics workloads since 2023.
PostgreSQL
Databases: The most advanced open-source relational database. Features: MVCC, full ACID compliance, JSON/JSONB, full-text search, GIS (PostGIS), custom types, extensions, materialized views, CTEs, window functions, partitioning, logical replication. Became the most popular database among developers by 2024 according to Stack Overflow surveys.
MySQL
Databases: The most widely deployed open-source relational database. Storage engines: InnoDB (default, ACID compliant), MyISAM (legacy). Features: replication, partitioning, JSON support, window functions (8.0+), CTEs (8.0+). Used by Facebook, Twitter, YouTube. Forked into MariaDB after Oracle acquisition. Recent MySQL releases added JavaScript stored programs (an Enterprise Edition feature).
SQLite
Databases: A serverless, zero-configuration, self-contained SQL database engine. The most deployed database in the world (in every smartphone, browser, and many applications). Single-file database. Full SQL support, ACID compliant (WAL mode). Not designed for high-concurrency server workloads but excellent for embedded, desktop, mobile, and edge computing.
Cursor
Advanced SQL: A database object used to retrieve and process rows one at a time from a result set. Cursors are typically used in stored procedures and scripts for row-by-row processing. Types: forward-only, scrollable, static, dynamic. Generally avoided in favor of set-based operations which are faster.
Materialized View
Database Objects: A database object that stores the result of a query physically (unlike a regular view which re-executes). Must be refreshed to reflect changes. REFRESH MATERIALIZED VIEW CONCURRENTLY (PostgreSQL) allows reads during refresh. Useful for expensive aggregations, dashboard queries, and pre-computed reports.
Lateral Join
Advanced SQL: A join where the right-hand side can reference columns from the left-hand side. Enables correlated subqueries in the FROM clause. Useful for top-N per group queries, applying a function to each row, and dependent subqueries. Syntax: FROM table1, LATERAL (SELECT ... WHERE ref = table1.col).
Upsert (MERGE)
Advanced SQL: An operation that inserts a row if it does not exist, or updates it if it does. PostgreSQL: INSERT ... ON CONFLICT DO UPDATE. MySQL: INSERT ... ON DUPLICATE KEY UPDATE. Standard SQL: MERGE INTO. Eliminates the need for separate SELECT-then-INSERT-or-UPDATE logic and avoids race conditions.
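SQLite (3.24+) supports the same ON CONFLICT DO UPDATE syntax as PostgreSQL; a minimal sketch via Python's sqlite3, with an invented counters table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, hits INTEGER)")

# First execution inserts the row; later executions hit the PRIMARY KEY
# conflict and take the UPDATE branch instead.
upsert = """
    INSERT INTO counters (name, hits) VALUES (?, 1)
    ON CONFLICT (name) DO UPDATE SET hits = hits + 1
"""
for _ in range(3):
    conn.execute(upsert, ("homepage",))

print(conn.execute("SELECT hits FROM counters WHERE name = 'homepage'").fetchone())
# → (3,)
```

Inside the DO UPDATE clause, the special excluded table exposes the values the failed INSERT tried to write (e.g. SET hits = excluded.hits).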
Collation
Configuration: A set of rules that determine how strings are compared and sorted. Collations control: case sensitivity (ci = case-insensitive), accent sensitivity, sort order. Example: utf8mb4_unicode_ci (MySQL), en_US.UTF-8 (PostgreSQL). Choosing the wrong collation causes unexpected sorting and comparison results. Set at database, table, or column level.
Schema
Database Objects: A namespace within a database that contains tables, views, functions, and other objects. PostgreSQL uses schemas extensively (public is the default). MySQL uses "database" and "schema" interchangeably. Schemas organize objects and control access. Example: CREATE SCHEMA analytics; CREATE TABLE analytics.events (...).
Query Optimization
Performance: The process of improving query performance. Steps: (1) Use EXPLAIN ANALYZE to understand the current plan. (2) Add appropriate indexes. (3) Rewrite queries (replace subqueries with JOINs, use CTEs for readability). (4) Ensure statistics are up to date (ANALYZE). (5) Consider partitioning for large tables. (6) Use connection pooling and caching.
OLTP vs OLAP
Architecture: OLTP (Online Transaction Processing): optimized for many small transactions (inserts, updates, point queries). Row-oriented storage. Examples: PostgreSQL, MySQL. OLAP (Online Analytical Processing): optimized for complex analytical queries over large datasets. Columnar storage. Examples: ClickHouse, DuckDB, BigQuery, Snowflake. Many modern systems support both.
Columnar Storage
Architecture: A database storage format that stores data by column rather than by row. Advantages: better compression (similar values together), faster analytical queries (read only needed columns), SIMD vectorized processing. Used by DuckDB, ClickHouse, Parquet files, BigQuery. Row storage (PostgreSQL, MySQL) is better for OLTP workloads with mixed read/write patterns.
CTEs (Recursive)
Advanced SQL: Recursive Common Table Expressions traverse hierarchical or graph data. WITH RECURSIVE tree AS (base_case UNION ALL recursive_step) SELECT * FROM tree. Base case: the initial anchor rows. Recursive step: references the CTE itself to add more rows. Uses: org charts, category trees, bill of materials, shortest path. Add a depth counter to prevent infinite loops.
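A worked org-chart traversal with a depth counter, as a minimal sketch using SQLite via Python's sqlite3 (the employees table and names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'ceo', NULL), (2, 'cto', 1), (3, 'dev', 2), (4, 'cfo', 1);
""")

# Walk the tree from the root (no manager) downward, one level per iteration.
rows = conn.execute("""
    WITH RECURSIVE tree(id, name, depth) AS (
        SELECT id, name, 0 FROM employees WHERE manager_id IS NULL   -- anchor rows
        UNION ALL
        SELECT e.id, e.name, t.depth + 1                             -- recursive step
        FROM employees e JOIN tree t ON e.manager_id = t.id
    )
    SELECT name, depth FROM tree ORDER BY depth, name
""").fetchall()
print(rows)  # → [('ceo', 0), ('cfo', 1), ('cto', 1), ('dev', 2)]
```

To cap runaway recursion on cyclic data, add WHERE t.depth < 10 (or similar) to the recursive step.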
Aggregate Function
SQL Basics: A function that performs a calculation on a set of rows and returns a single result. Standard: COUNT, SUM, AVG, MIN, MAX. Advanced: STRING_AGG, ARRAY_AGG, JSON_AGG, PERCENTILE_CONT, PERCENTILE_DISC. Used with GROUP BY to aggregate per group. Without GROUP BY, aggregates the entire result set. NULL values are ignored by all aggregate functions except COUNT(*).
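The NULL-handling rule is easy to verify directly; a minimal sketch using SQLite via Python's sqlite3 (invented scores table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (points INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?)", [(10,), (20,), (None,)])

# COUNT(*) counts rows; COUNT(points), SUM, and AVG skip the NULL,
# so the average is 30/2, not 30/3.
row = conn.execute(
    "SELECT COUNT(*), COUNT(points), SUM(points), AVG(points) FROM scores"
).fetchone()
print(row)  # → (3, 2, 30, 15.0)
```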
GROUP BY
SQL Basics: A clause that groups rows sharing common values into summary rows. Used with aggregate functions. All non-aggregated columns in SELECT must appear in GROUP BY (standard SQL; MySQL traditionally allowed non-deterministic results). HAVING filters groups (vs WHERE which filters rows before grouping). GROUP BY ROLLUP/CUBE/GROUPING SETS generate subtotals.
HAVING
SQL Basics: A clause that filters groups after GROUP BY aggregation. WHERE filters rows before grouping; HAVING filters groups after. Example: SELECT department, AVG(salary) FROM employees GROUP BY department HAVING AVG(salary) > 50000. HAVING can use aggregate functions; WHERE cannot.
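The salary example runs as-is on SQLite; a minimal sketch via Python's sqlite3 with invented data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('eng', 60000), ('eng', 70000), ('sales', 40000), ('sales', 45000);
""")

# Rows are grouped by department, then HAVING drops the groups
# whose average salary is 50000 or below (sales averages 42500).
rows = conn.execute("""
    SELECT department, AVG(salary)
    FROM employees
    GROUP BY department
    HAVING AVG(salary) > 50000
""").fetchall()
print(rows)  # → [('eng', 65000.0)]
```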
UNION vs UNION ALL
SQL Basics: UNION combines results from two SELECT statements and removes duplicates. UNION ALL combines results without removing duplicates (faster). Always prefer UNION ALL unless you specifically need deduplication, as UNION requires a sort or hash operation to eliminate duplicates. Column count and compatible types must match between SELECT statements.
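The difference in two queries, as a minimal sketch using SQLite via Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# UNION deduplicates (paying for a sort/hash); UNION ALL keeps everything.
dedup = conn.execute(
    "SELECT 1 AS n UNION SELECT 1 UNION SELECT 2 ORDER BY n"
).fetchall()
kept = conn.execute(
    "SELECT 1 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 ORDER BY n"
).fetchall()
print(dedup)  # → [(1,), (2,)]
print(kept)   # → [(1,), (1,), (2,)]
```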
