Text Diff
What it does
The Text Diff tool compares two blocks of text and shows what changed inline: additions in green, removals struck through in red, unchanged segments plain. Three granularity modes (lines, words, characters) control how the comparison is broken up, and a statistics bar tallies additions, removals, and unchanged segments. The engine is a real Longest Common Subsequence algorithm rather than a naive line-by-line scan, so the output stays meaningful even when content has been partially rewritten or has long unchanged stretches between scattered edits. Reordered content still diffs sensibly, though a moved block appears as a removal in one place and an addition in another.
Common situations
You changed something in a production config an hour ago, you have the “before” pasted from your terminal scrollback, and you want to know exactly what differs from the current state. The change is not in version control — it was an emergency edit, or it predates a clean commit history. Pasting both halves into a diff tool is faster than reconstructing the situation in git, and the visual output is what you would have wanted from a code review anyway.
You’re comparing two API responses to spot drift. The staging API returns one shape, the production API returns another, and the documentation insists they match. Diffing the two responses highlights the divergence: often one extra field, one renamed property, or a value type that quietly switched from string to number after a recent deploy.
You’re reviewing a colleague’s copy edits. They rewrote a paragraph; you want to know what they actually changed versus what stayed. Word-level diff is the right mode for prose — line-level diff treats the paragraph as a single unit and shows wholesale replacement, which is rarely what reviewers want to see. Word mode reveals individual substitutions, which matches how editors think about changes.
You’re auditing a legal document or contract for redlines. Tracked changes in Word are the conventional answer, but two plain-text exports compared side-by-side give a faster, lighter view — especially when the originals were not in Word to begin with (Markdown, plaintext, exported PDF text).
You’re investigating why a deployment behaves differently across environments. The .env.production file and .env.staging file should be near-identical with a handful of values flipped; a line-level diff surfaces unexpected drift — a flag set in one but missing in the other, a service URL that quietly changed.
What you need to know
Diff algorithms are built around finding the Longest Common Subsequence (LCS) between two inputs: the longest sequence of items that appear on both sides in the same order, though not necessarily contiguously. Once you have the LCS, additions and removals fall out naturally: anything on the left side that is not part of the LCS is a removal, anything on the right side that is not part of the LCS is an addition. git diff solves the same problem (its default is the Myers algorithm), which is why the output of a properly implemented text diff usually matches what version control would show for equivalent inputs. When several equally short edit scripts exist, the two tools can legitimately pick different ones.
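The LCS idea can be sketched in a few lines of Python. This is an illustrative textbook version, not the tool's actual engine; the function name lcs_diff and the ' ', '-', '+' markers are assumptions made for the example.

```python
from typing import List, Tuple


def lcs_diff(left: List[str], right: List[str]) -> List[Tuple[str, str]]:
    """Return (op, item) pairs: ' ' kept, '-' removed, '+' added."""
    n, m = len(left), len(right)
    # table[i][j] holds the LCS length of left[i:] and right[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if left[i] == right[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start: matching items belong to the LCS,
    # everything else is a removal (left only) or an addition (right only).
    ops: List[Tuple[str, str]] = []
    i = j = 0
    while i < n and j < m:
        if left[i] == right[j]:
            ops.append((" ", left[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            ops.append(("-", left[i]))
            i += 1
        else:
            ops.append(("+", right[j]))
            j += 1
    ops.extend(("-", item) for item in left[i:])
    ops.extend(("+", item) for item in right[j:])
    return ops


# Items can be lines, words, or characters; here they are lines.
print(lcs_diff(["a", "b", "c"], ["a", "x", "c"]))
# [(' ', 'a'), ('-', 'b'), ('+', 'x'), (' ', 'c')]
```

The full table costs time and memory proportional to the product of the two input lengths, which is why production engines use more economical variants of the same idea.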
The mode controls what counts as an “item” for the LCS calculation. Line mode treats each line as atomic — added or removed wholesale. Word mode tokenises on whitespace and word boundaries; word substitutions show as a removed word followed by an added word in place. Character mode operates on individual Unicode code points; useful for short strings, overwhelming for long ones. Each mode produces a visibly different diff for the same inputs, which is why mode choice matters.
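As a rough illustration of how the mode changes the "items", the sketch below splits the same text three ways. The tokenize function and its regular expression are assumptions for this example; the tool's real tokenisation rules may differ in detail.

```python
import re
from typing import List


def tokenize(text: str, mode: str) -> List[str]:
    """Split text into the units a diff would compare in the given mode."""
    if mode == "lines":
        # Keep the line terminator so each item is a complete line.
        return text.splitlines(keepends=True)
    if mode == "words":
        # Words, runs of whitespace, and single punctuation marks.
        return re.findall(r"\w+|\s+|[^\w\s]", text)
    if mode == "chars":
        # Python strings iterate by Unicode code point.
        return list(text)
    raise ValueError(f"unknown mode: {mode}")


sample = "Hi, you\nbye"
print(tokenize(sample, "lines"))  # ['Hi, you\n', 'bye']
print(tokenize(sample, "words"))  # ['Hi', ',', ' ', 'you', '\n', 'bye']
print(len(tokenize(sample, "chars")))  # 11
```

Joining the tokens back together reproduces the input in every mode, so the choice of mode changes only where the boundaries fall, not what is compared.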
The implementation uses jsdiff, a long-standing JavaScript diff library based on the Myers algorithm. It is fast enough for inputs up to several megabytes and handles Unicode text, including emoji and combining characters.
Whitespace handling is the source of most surprises. Trailing spaces, mixed tabs and spaces, and line endings (CRLF vs. LF) all count as differences. A diff that “looks like everything changed” is almost always whitespace mismatch — check the line endings of both sources before assuming the content actually differs. Some workflows benefit from a whitespace-normalisation step before the diff.
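A whitespace-normalisation step of the kind mentioned above might look like the following sketch. The function name and the default tab width of 4 are assumptions; adjust both to the conventions of the files being compared.

```python
def normalise_whitespace(text: str, tab_width: int = 4) -> str:
    """Unify line endings, expand tabs, and strip trailing whitespace."""
    # Convert CRLF and bare CR to LF so line endings cannot differ.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Expand tabs and drop trailing whitespace line by line.
    lines = [line.expandtabs(tab_width).rstrip() for line in text.split("\n")]
    return "\n".join(lines)


print(repr(normalise_whitespace("a  \r\n\tb\r\n")))  # 'a\n    b\n'
```

Run both inputs through the same function before diffing. Skip this step for content where whitespace is significant, such as Makefiles, where a tab and spaces are not interchangeable.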
The diff runs entirely client-side. The text never leaves the browser, which matters when comparing logs that contain API keys, customer data, internal URLs, or anything else not appropriate to upload. Online diff sites that send content to a backend are convenient but expose that content to a third party; this one does not.
For code review at scale, version control diff tools (GitHub, GitLab, native git diff) are the right shape. They handle whole-repo context, blame, and PR threading. This tool is for the everyday “compare these two snippets right now” case that doesn’t need a repo around it.
Frequently asked questions
What is a text diff?
A side-by-side or inline comparison of two text inputs that highlights what was added, removed, or unchanged. Diff is shorthand for “difference” and originated as a Unix utility in the early 1970s.
What’s the difference between line, word, and character diff?
Line diff treats each line as the smallest unit of change: adding a comma to a sentence shows the whole line as removed and the new line as added. Word diff splits on whitespace and word boundaries, so the same edit is confined to the spot where the comma was added. Character diff splits on individual characters, showing exactly the comma. Use line for code, word for prose, character for short strings.
How is this different from git diff?
Same underlying approach (both look for the longest common subsequence), different inputs. git diff works on tracked files in a repository; this tool works on any two text snippets you paste. For ad-hoc comparison without a repo, this is faster.
Can I compare JSON or XML structurally?
Not directly — this is a text diff, not a structural diff. Two JSON documents with the same content but different key order will appear different. For structural comparison, normalise both sides first (sort keys, format consistently) using a JSON or XML formatter, then diff the normalised output.
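For JSON, the normalisation step can be as small as the sketch below, which uses Python's standard json module. The function name canonical_json is an assumption for the example; any formatter that sorts keys and indents consistently would do the same job.

```python
import json


def canonical_json(raw: str) -> str:
    """Re-serialise JSON with sorted keys and fixed indentation."""
    return json.dumps(json.loads(raw), sort_keys=True, indent=2, ensure_ascii=False)


a = canonical_json('{"b": 1, "a": {"d": 2, "c": 3}}')
b = canonical_json('{"a": {"c": 3, "d": 2}, "b": 1}')
print(a == b)  # True
```

Array order is left untouched, since it is usually meaningful. If two responses differ only in the order of array elements, that difference will still show up in the diff.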
Why does the diff show changes when the text looks identical?
Whitespace and line endings. Tabs vs. spaces, trailing spaces, CRLF vs. LF — all invisible to the eye, all counted by the diff engine. Use a whitespace-stripping pre-process if you only care about content changes.
Does the diff handle very large inputs?
Up to a few megabytes per side. Beyond that, browser performance degrades because LCS is O(n×m) in the worst case. For genuinely large diffs, use a CLI tool (diff, git diff); these are implemented in C and typically much faster.
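To put the O(n×m) figure in perspective, the sketch below counts the cells a textbook LCS table would need for two inputs. The function name is hypothetical, and real engines generally avoid building the full table, so read the numbers as a worst-case illustration rather than a measurement of this tool.

```python
def lcs_table_cells(left_units: int, right_units: int) -> int:
    """Cells in a full (n + 1) x (m + 1) dynamic-programming LCS table."""
    return (left_units + 1) * (right_units + 1)


# Two files of 20,000 lines each, compared in line mode.
print(lcs_table_cells(20_000, 20_000))  # 400040001

# The same files as roughly 1,000,000 characters each, in character mode.
print(lcs_table_cells(1_000_000, 1_000_000))  # 1000002000001
```

The unit count, not the byte count, drives the cost, which is one reason character mode becomes impractical on large inputs well before line mode does.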
Can I export the diff result?
The visual output is in the browser; copy it via the browser’s selection. For programmatic use (CI, automation), use a CLI tool whose output is machine-parseable — diff -u produces a unified diff format that other tools understand.
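For scripted use, unified-diff output can also be produced without leaving Python. The sketch below uses the standard difflib module, whose matching heuristic is not the same as an LCS diff, so hunks may differ slightly from what diff -u prints. The wrapper name and the default file labels are assumptions.

```python
import difflib


def unified(before: str, after: str,
            name_before: str = "before", name_after: str = "after") -> str:
    """Return a unified diff of two strings, or '' if they are identical."""
    return "".join(difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=name_before,
        tofile=name_after,
    ))


print(unified("a\nb\n", "a\nc\n"), end="")
# --- before
# +++ after
# @@ -1,2 +1,2 @@
#  a
# -b
# +c
```

An empty result means no differences, which makes the function easy to use as a pass/fail check in automation.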
Is the comparison case-sensitive?
Yes — by default, diffs are case-sensitive because most use cases (code, structured data) require it. For case-insensitive comparison of prose, lowercase both sides before pasting.
Common problems
Problem: Diff shows the entire file as changed when only one line was edited.
Almost certainly line-ending mismatch. The “before” was saved with CRLF (Windows line endings), the “after” with LF (Unix). Every line technically differs because of the invisible carriage return. Convert both to the same line ending before diffing — most editors have a “Save with LF” or “Convert line endings” option.
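A quick way to confirm this diagnosis is to count the line endings in each input before diffing. The helper below is a minimal sketch with a hypothetical name; it only reports counts and changes nothing.

```python
from typing import Dict


def line_ending_counts(text: str) -> Dict[str, int]:
    """Count CRLF, bare LF, and bare CR line endings in text."""
    crlf = text.count("\r\n")
    # Every CRLF also contains one LF and one CR, so subtract those.
    lf = text.count("\n") - crlf
    cr = text.count("\r") - crlf
    return {"CRLF": crlf, "LF": lf, "CR": cr}


print(line_ending_counts("one\r\ntwo\r\nthree\n"))
# {'CRLF': 2, 'LF': 1, 'CR': 0}
```

If one side reports mostly CRLF and the other mostly LF, the "everything changed" diff is a line-ending artefact. A single input with mixed counts, as in the example, is worth fixing at the source.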
Problem: Word diff highlights single-character changes as whole-word substitutions.
That’s how word mode works — the smallest unit is a word, and any change inside a word marks the whole word changed. For sub-word visibility (typo fixes, single character corrections), switch to character mode.
Problem: Diff is too noisy to read.
Switch granularity. Character mode on long text is overwhelming; word mode on code is misleading; line mode on prose collapses too much. Match the mode to the content type. If still noisy, the inputs may genuinely have many small differences — a wholesale rewrite, not an edit.
Problem: Diff misaligns when one side has long unchanged sections.
LCS handles long unchanged sections correctly: the algorithm finds them and aligns around them. If the diff misaligns visibly, the supposedly unchanged section probably has subtle whitespace differences (mixed indent, trailing space) that stop the two sides from matching. Normalise whitespace and try again.
Problem: Comparing two JSON responses produces noise from key-order differences.
JSON object key order is technically meaningful to text diffs but semantically irrelevant in most APIs. Sort keys on both sides first (use a JSON formatter with sort-keys enabled), then diff the normalised output. Differences that survive this process are the real ones.
Quick guides
For code review without a repo: Paste old and new versions, set mode to Lines. The diff should closely match what git diff would show. Good for snippets shared on Slack, copied from Stack Overflow, or pulled out of a longer file.
For prose review: Paste old and new versions, set mode to Words. Word-level changes are how editors think about prose; line-level groups too coarsely.
For comparing API responses: Format both with a JSON formatter (sort-keys on), paste the formatted output into the diff. Key-order noise disappears; only real differences remain.
Tips
- For code, use line mode. Programmers think in lines, code review tools work in lines, version control diffs in lines. Stay aligned with that mental model.
- For prose, use word mode. Sentences flow across lines unpredictably; word-level changes are the unit of editorial revision.
- Character mode is rarely the right answer except for short strings — passwords, tokens, hex values, single sentences with typo fixes. For longer text, character mode produces visual noise.
- If a diff looks like everything changed, check line endings first. CRLF vs. LF is the single most common cause of “diff thinks everything is different”.
- The “swap sides” action is the fix when you accidentally paste “after” into “before”. Faster than re-pasting both.
- For JSON or XML comparison, sort and format both sides before diffing. Structural sameness with cosmetic difference is the most common false-positive in API drift investigations.
- The diff stays in the browser, so you can paste internal logs, customer data, or anything else you would not upload to a third-party site, without worry.
Related tools in this suite
The natural pairing is the JSON Formatter — sort-keys both sides, then diff, and JSON comparisons stop being noisy. The Regex Tester is useful when “what changed?” is a question of pattern matching rather than literal text. The SQL Formatter helps when comparing two versions of a query and wanting them to diff cleanly at clause level rather than as one line of difference.
Take it further
Diffing is part of every developer’s day — code review, log analysis, configuration drift detection, API contract verification. A good diff tool covers the everyday tactical case; a culture of small, reviewable changes makes diffing cheap because each diff is small. The services we deliver include the wider engineering hygiene work — turning ad-hoc change processes into documented review flows, automated checks, and team practices that catch issues at the diff stage rather than in production.