Skip to main content

SQL Formatter

What it does

The SQL Formatter turns one-line SQL blobs into properly indented, multi-line, readable code. It uses dialect-aware AST parsing, which means it understands the keywords, quoting rules, and syntactic quirks of the major SQL dialects — PostgreSQL, MySQL, MariaDB, SQLite, BigQuery, Snowflake, Trino, Spark, Redshift, plus a generic SQL parser for ANSI-standard cases. Indent width and keyword case (UPPER, lower, preserve) are configurable.

Common situations

You’re reading machine-generated SQL from an ORM. Active Record, Sequel, SQLAlchemy, Doctrine, Eloquent, Prisma — they all produce SQL that runs perfectly but reads as a single line of nested clauses. Pasting it here unwinds the structure into something you can actually evaluate.

You’re code-reviewing a query someone hand-wrote. The query works; the formatting is inconsistent. Running it through the formatter produces the version reviewers actually want to see — uppercase keywords, aligned clauses, predictable indentation.

You’re committing a migration to version control and want the SQL to diff cleanly when modified later. Formatted SQL diffs at the level of actual changes; unformatted SQL diffs at the level of “everything moved”.

You’re producing SQL examples for documentation or a knowledge base. Raw queries copied from a database client are unreadable in a static page. Formatted output produces examples readers can actually parse without running them.

You’re debugging a complex CTE or window-function query that returns wrong results. Reading the formatted version often reveals what’s being joined incorrectly or what’s missing from a GROUP BY — structural visibility you cannot get from a single line.

What you need to know

Real SQL formatting requires real parsing. Naive approaches that just insert newlines at keywords break on edge cases — keywords inside string literals, comments, complex CTEs, window functions. Dialect-aware AST parsing builds a syntax tree from the input, then re-emits it with proper formatting. The dialect choice matters because, for example, LIMIT in PostgreSQL and TOP in SQL Server occupy the same role with different syntax — a generic parser would stumble on dialect-specific keywords.

The formatter under the hood is the long-established sql-formatter package, which supports the dialects listed above and stays current with new SQL features. Quality is high — modern features like CTEs, window functions, lateral joins, JSON operators, and recursive queries all format cleanly.

Dialect choice matters when the SQL contains dialect-specific syntax. PostgreSQL’s RETURNING clause; MySQL’s LIMIT offset, count form; BigQuery’s STRUCT and ARRAY types; Snowflake’s LATERAL FLATTEN; Trino’s WITH RECURSIVE quirks. A generic parser handles ANSI SQL but fails on these. Pick the matching dialect for accurate formatting.

Keyword case is a style choice. SELECT (UPPER) is the convention in most style guides — it makes structural keywords scan visually distinct from table and column names. select (lower) is preferred by some teams; preserve keeps the input’s case unchanged. Pick one and stick with it across the codebase.

Indent width is similarly stylistic. 2 is compact; 4 is the most common professional standard. Match whatever your codebase already uses. Consistency matters more than the specific value.

Comments are preserved in formatted output. Inline -- comments stay on their lines; block /* */ comments stay where they are. The formatter does not strip explanatory text.

Output normalises subtle variations. Aliases without AS, column lists with different spacing, JOIN clauses with various indent styles — all collapse to the formatter’s canonical output. This is what makes formatted SQL diff-friendly: the same query always produces the same output regardless of how the input was written.

For very large queries (thousands of lines, deeply nested CTEs), the formatter may visibly delay. The parser is fast but not instant; for genuinely large SQL files, a CLI tool runs faster.

Frequently asked questions

How do I format SQL?

Paste the SQL into the tool, pick the dialect, copy the output. The formatter handles indentation, keyword case, and clause alignment automatically.

What’s the difference between SQL dialects?

Each database vendor has extended ANSI SQL with their own keywords, syntax, and behaviours. PostgreSQL has RETURNING; MySQL has IGNORE; BigQuery has STRUCT. A formatter that doesn’t know the dialect breaks on these.

Should SQL keywords be UPPER or lower case?

Convention is UPPER for keywords (SELECT, FROM, WHERE), lower for identifiers (table_name, column_name). Most professional style guides recommend this; it makes structure visually scannable.

Can I format SQL inside a stored procedure?

Yes — the formatter handles procedure bodies, including DECLARE, BEGIN/END blocks, and control flow. Pick the dialect that matches your database (PostgreSQL for plpgsql, MySQL for stored procs, etc.).

Does the formatter validate SQL?

It parses the SQL, which catches syntax errors. It does not check semantic validity (whether tables exist, whether columns are spelled right). Use the database’s own EXPLAIN or dry-run capability for that.

Will the formatter change my query’s behaviour?

No — formatting only changes whitespace and case. The query’s logic is identical before and after.

How do I format SQL in VS Code?

Several extensions handle this — “SQL Formatter” by adpyke is the popular choice. Install the extension, open a .sql file, run the formatter from the command palette.

Why does the formatter add AS to my aliases?

Some dialects allow column aliases without AS (SELECT col alias). The formatter normalises to explicit AS for clarity. This is consistent with most style guides; some teams prefer no-AS style. Currently no toggle for this — add manually after if needed.

Common problems

Problem: Formatter rejects my query with “syntax error”.

The dialect is wrong. PostgreSQL syntax fed to a MySQL formatter (or vice versa) produces parse errors on dialect-specific keywords. Check the dialect setting matches your actual database.

Problem: Output doesn’t match my team’s style guide.

Adjust the indent width and keyword case settings. If your team uses 4-space indent and lower-case keywords, set those. The output should now match. If a specific stylistic detail (alias AS, JOIN style) is wrong, that may need a different formatter.

Problem: Comments are being stripped.

They shouldn’t be — the formatter preserves comments. If you see them stripped, your input may have had comment-syntax inside string literals (-- inside a string is just text, but some formatters mishandle this). Try a different dialect or report the case.

Problem: Very long query takes 10+ seconds to format.

Multi-thousand-line queries with deeply nested CTEs can be slow to parse. For these, a CLI tool (pg_format for PostgreSQL, sqlformat for general SQL) handles them faster.

Problem: Output keeps formatting differently between attempts.

The formatter is deterministic — same input + same settings produces same output. Different output means the input or settings changed slightly between attempts. Check for trailing whitespace, dialect mismatch, or keyword-case toggle.

Quick guides

For VS Code: Install “SQL Formatter” extension. Cmd/Ctrl+Shift+P → Format Document. Configure default dialect in settings.

For build pipelines: Run a SQL formatter as a pre-commit hook. sql-formatter (npm) and pg_format (PostgreSQL-specific) both support CLI usage. Fail the build if formatted output doesn’t match committed input.

For database GUI clients: DataGrip, DBeaver, and TablePlus all have built-in formatters. Configure dialect and style; format-on-save makes it automatic.

Tips

  • Pick the right dialect. Generic ANSI SQL works for simple queries; anything involving dialect-specific functions (STRING_AGG, LISTAGG, JSON_EXTRACT) needs the matching dialect to parse correctly.
  • Uppercase keywords are the convention in most style guides. They make structure scannable — your eye finds SELECT, FROM, WHERE faster than the table and column names around them.
  • Indent width 2 is compact; 4 is the common professional standard. Match whatever your codebase already uses.
  • Very large queries (thousands of lines, deeply nested CTEs) may format in subtle ways that surprise — particularly around long argument lists. Compare formatted vs original carefully on any query that runs in production before committing the formatted version.
  • Comments are preserved. Trailing comments stay attached to their statements; block comments stay where they are.
  • For dialect-specific features, the matching dialect formatter is the only one that handles them correctly. ANSI SQL formatter on PostgreSQL WITH RECURSIVE produces wrong indentation.
  • Format on save (in your editor) keeps the codebase consistent without manual effort.

Related tools in this suite

The natural pairing is the JSON Formatter — both turn line-noise into readable structure, and both come up in API and database work. The Mock JSON Generator is useful when you need realistic test data to seed a database for SQL development.

Take it further

SQL formatting is a discipline that scales: enforce it in pre-commit hooks, run it in CI, configure your IDE to format on save. The services we deliver include the wider engineering hygiene work — turning ad-hoc SQL practices into documented standards, automated checks, and code review processes that catch SQL quality issues at the same level as application code.