How to Compare Two Texts and Find Differences Online

Understanding diff algorithms and practical uses for text comparison tools

TL;DR: A "diff" finds the differences between two pieces of text by computing the Longest Common Subsequence -- basically figuring out what stayed the same and highlighting everything else. It works at the line, word, or character level. Super useful for code reviews, contract revisions, and catching that one sneaky comma someone moved.

You have two versions of a document. Something changed. Maybe it is two drafts of a contract, two revisions of your thesis, or two versions of a config file that "nobody touched." Reading both versions line by line, squinting at each sentence, trying to spot the difference? That is a special kind of torture.

This is what diff tools are for: they automate the comparison and highlight exactly what was added, removed, or changed. Let's look at how they work and why they are more useful than you might think.

What Even Is a "Diff"?

In computing, a "diff" is the set of differences between two pieces of text. The name comes from the Unix diff command, written by Douglas McIlroy and James Hunt at Bell Labs and first shipped with Unix in 1974. It compares two files and outputs a minimal set of changes that turns one into the other.

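A diff shows three types of changes: additions (text that exists only in the new version), deletions (text that exists only in the old version), and modifications, which most tools display as a deletion paired with an addition.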

🟢🔴 Green = added. Red = removed. The universal language of "what did you change?"

How Diff Algorithms Actually Work

Under the hood, most diff tools are built on the Longest Common Subsequence (LCS) problem. The idea is: find the longest sequence of elements (lines, words, or characters) that appear in both texts in the same order, though not necessarily next to each other. Everything not in that shared sequence? That is a change.

Here is a simple example:

Old: "The cat sat on the mat"
New: "The cat sat on a rug"

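At the word level, the LCS is "The cat sat on" (here it is simply the shared beginning, but in general an LCS can skip over changed regions in the middle). The diff tells you: "the mat" was removed, "a rug" was added. Simple enough.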

The classic LCS algorithm uses dynamic programming: it fills a table with one cell per pair of elements, so it needs O(mn) time and O(mn) memory, where m and n are the lengths of the two texts in lines (or words, or characters). For two 1,000-line files, that is a million operations -- totally fine. For two 100,000-line files, that is 10 billion operations -- not fine at all.
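The sketch below is a minimal JavaScript version of that textbook table, run on the word-level example above. The name lcsDiff is hypothetical, and it returns { type, text } records in the same shape as the simpleDiff function at the end of this article. The nested loops visit every cell of an (m + 1) × (n + 1) table, which is where both the O(mn) time and the O(mn) memory come from.

```javascript
// Classic LCS diff: O(m*n) time and O(m*n) memory.
// lcsDiff is a hypothetical name; tokens can be lines, words, or characters.
function lcsDiff(oldTokens, newTokens) {
  const m = oldTokens.length;
  const n = newTokens.length;
  // table[i][j] = LCS length of oldTokens[i..] and newTokens[j..]
  const table = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));
  for (let i = m - 1; i >= 0; i--) {
    for (let j = n - 1; j >= 0; j--) {
      table[i][j] = oldTokens[i] === newTokens[j]
        ? table[i + 1][j + 1] + 1
        : Math.max(table[i + 1][j], table[i][j + 1]);
    }
  }

  // Walk the table from (0, 0) to turn LCS lengths into an edit script.
  const result = [];
  let i = 0;
  let j = 0;
  while (i < m && j < n) {
    if (oldTokens[i] === newTokens[j]) {
      result.push({ type: 'unchanged', text: oldTokens[i] });
      i++;
      j++;
    } else if (table[i + 1][j] >= table[i][j + 1]) {
      // Skipping oldTokens[i] keeps the LCS length, so report it as removed
      result.push({ type: 'removed', text: oldTokens[i++] });
    } else {
      result.push({ type: 'added', text: newTokens[j++] });
    }
  }
  // Whatever is left on either side has no partner
  while (i < m) result.push({ type: 'removed', text: oldTokens[i++] });
  while (j < n) result.push({ type: 'added', text: newTokens[j++] });
  return result;
}

const oldWords = 'The cat sat on the mat'.split(' ');
const newWords = 'The cat sat on a rug'.split(' ');
for (const { type, text } of lcsDiff(oldWords, newWords)) {
  const mark = type === 'added' ? '+' : type === 'removed' ? '-' : ' ';
  console.log(`${mark} ${text}`);
}
```

For the cat example it reports the four shared words as unchanged, then "the" and "mat" as removed and "a" and "rug" as added.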

Enter Myers' Algorithm (The One Git Uses)

In 1986, Eugene Myers published a smarter algorithm that is now the standard. Git uses it by default. GNU diff uses it. Many online diff tools use it too.

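The clever part: Myers' algorithm runs in O(ND) time, where N is the combined length of the two texts and D is the size of the shortest edit script, meaning the number of inserted plus deleted elements. If two texts are mostly similar (which is the common case -- you usually change a few lines, not rewrite everything), D is small and the algorithm is blazing fast. For two 100,000-line files that differ by 100 inserted or deleted lines, N × D is 200,000 × 100, on the order of 20 million steps instead of 10 billion.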

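Think of it like this: lay the old text along the top of a grid and the new text down the side. In this "edit graph," a diagonal move means "these elements match" and costs nothing, a move to the right deletes an element of the old text, and a move down inserts an element of the new text. Myers' algorithm finds the path from the top-left corner to the bottom-right corner with the fewest non-diagonal moves. It is like solving a maze where the shortest path is the cleanest diff.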
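Here is a compact JavaScript sketch of Myers' greedy search as described in the 1986 paper, plus a second pass that replays saved snapshots to recover the actual edits. The names myersDiff and backtrack are hypothetical, and production implementations typically add heuristics and memory optimizations this sketch leaves out. The array v records, for each diagonal k = x - y, the furthest x reached so far using d non-diagonal moves; x counts elements consumed from the old text and y from the new text.

```javascript
// Myers' greedy O(ND) diff (hypothetical names; a sketch, not production code).
// `a` and `b` are arrays of tokens such as lines; returns { type, text } records.
function myersDiff(a, b) {
  const n = a.length;
  const m = b.length;
  const max = n + m;
  const offset = max; // shifts diagonal k (from -max to max) to a non-negative index
  const v = new Array(2 * max + 2).fill(0); // v[offset + k] = furthest x on diagonal k
  const trace = []; // copy of v before each round d, used by backtrack

  for (let d = 0; d <= max; d++) {
    trace.push(v.slice());
    for (let k = -d; k <= d; k += 2) {
      let x;
      if (k === -d || (k !== d && v[offset + k - 1] < v[offset + k + 1])) {
        x = v[offset + k + 1]; // step down from diagonal k + 1: insert from b
      } else {
        x = v[offset + k - 1] + 1; // step right from diagonal k - 1: delete from a
      }
      let y = x - k;
      // Follow the "snake": diagonal moves are free while elements match
      while (x < n && y < m && a[x] === b[y]) {
        x++;
        y++;
      }
      v[offset + k] = x;
      if (x >= n && y >= m) return backtrack(trace, a, b, offset);
    }
  }
  return []; // unreachable: d = n + m always reaches the end
}

// Walk from (n, m) back to (0, 0), re-deriving which step each round took.
function backtrack(trace, a, b, offset) {
  let x = a.length;
  let y = b.length;
  const ops = [];
  for (let d = trace.length - 1; d >= 0; d--) {
    const v = trace[d];
    const k = x - y;
    const cameFromAbove =
      k === -d || (k !== d && v[offset + k - 1] < v[offset + k + 1]);
    const prevK = cameFromAbove ? k + 1 : k - 1;
    const prevX = v[offset + prevK];
    const prevY = prevX - prevK;
    while (x > prevX && y > prevY) {
      ops.push({ type: 'unchanged', text: a[x - 1] }); // undo the snake
      x--;
      y--;
    }
    if (d > 0) {
      if (x === prevX) ops.push({ type: 'added', text: b[y - 1] });
      else ops.push({ type: 'removed', text: a[x - 1] });
    }
    x = prevX;
    y = prevY;
  }
  return ops.reverse();
}

// The example pair used in Myers' paper
const demo = myersDiff('ABCABBA'.split(''), 'CBABAC'.split(''));
console.log(demo.filter(op => op.type !== 'unchanged').length); // 5
```

On A = ABCABBA and B = CBABAC it finds D = 5 edits and four unchanged characters. Keeping one snapshot per round costs O(ND) extra memory; the paper's linear-space refinement avoids that, at the price of more intricate code.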

Line-Level vs. Word-Level vs. Character-Level

Diff tools can zoom in to different levels of detail:

Line-level diff is what Git and most code tools use. If any character on a line changes, the whole line lights up. Great for code.

// Line-level diff:
- const greeting = "Hello, World!";
+ const greeting = "Hello, Universe!";

Word-level diff is better for prose. Instead of flagging the whole line, it highlights just the changed words, much like the inline markup you see in Google Docs' "Suggesting" mode.

Character-level diff is the most granular. It catches a single changed letter or number. Useful for spotting subtle edits, but it can get noisy with larger changes.
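Switching granularity mostly means changing how the text is cut into tokens before the same comparison runs. This sketch uses hypothetical helpers (tokenize, lcsLength, changeCounts), a space-saving two-row version of the LCS table, and a deliberately crude word rule that splits on whitespace and keeps punctuation attached. It counts how many tokens each level marks as removed or added for the greeting line above.

```javascript
// Cut text into tokens at the chosen granularity (hypothetical helper).
function tokenize(text, level) {
  if (level === 'line') return text.split(/\r?\n/);
  if (level === 'word') {
    // Crude rule: split on whitespace, keeping the whitespace runs as their own tokens
    return text.split(/(\s+)/).filter(token => token !== '');
  }
  return Array.from(text); // 'char': one token per Unicode code point
}

// LCS length using two rows instead of the full table: O(m*n) time, O(n) memory.
function lcsLength(a, b) {
  let prev = new Array(b.length + 1).fill(0);
  for (const token of a) {
    const cur = [0];
    for (let j = 0; j < b.length; j++) {
      cur.push(token === b[j] ? prev[j] + 1 : Math.max(prev[j + 1], cur[j]));
    }
    prev = cur;
  }
  return prev[b.length];
}

// Tokens outside the LCS are the ones a diff would mark as removed or added.
function changeCounts(oldText, newText, level) {
  const a = tokenize(oldText, level);
  const b = tokenize(newText, level);
  const common = lcsLength(a, b);
  return { removed: a.length - common, added: b.length - common };
}

const before = 'const greeting = "Hello, World!";';
const after = 'const greeting = "Hello, Universe!";';
for (const level of ['line', 'word', 'char']) {
  const { removed, added } = changeCounts(before, after, level);
  console.log(`${level}: -${removed} +${added}`);
}
// line: -1 +1
// word: -1 +1
// char: -4 +7
```

The line and word levels each flag a single token, while the character level flags 11 because "World" and "Universe" share only the letter r. That extra detail is the noise the character-level paragraph warns about.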

Lawyer joke: A diff tool walks into a law firm. The senior partner says, "We changed 'may' to 'shall' on page 47." The diff tool says, "I know. I also found the 12 other changes you did not mention."

Where Text Diff Is Surprisingly Useful

Comparing Contract Revisions

Legal documents go through many drafts. A diff tool lets you see exactly what the other party changed. In legal negotiations, a single word change ("may" to "shall") can have massive implications. Don't trust someone saying "just minor formatting edits."

Reviewing Code Changes

Every pull request on GitHub shows a diff. Here is what the unified diff format looks like:

@@ -15,6 +15,6 @@ function processData(input) {
   const validated = validate(input);
   if (!validated) {
-    throw new Error("Invalid input");
+    throw new ValidationError("Invalid input: " + input.type);
   }
   return transform(validated);
 }

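Lines starting with - are removed. Lines starting with + are added. Lines starting with a single space are unchanged context to help you orient yourself. The @@ line is the hunk header: the numbers after - give the starting line and line count of this excerpt in the old file, the numbers after + give the same for the new file, and any text after the closing @@ is a hint (Git fills it with the nearest enclosing function signature).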
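To see how a hunk header is derived, this sketch formats a list of { type, text } records (the shape simpleDiff returns at the end of this article) as a single hunk. formatHunk is a hypothetical name, and it is deliberately simplified: it always prints start,count and keeps every record, whereas real tools may abbreviate the count, trim context to a few lines, and split distant changes into separate hunks.

```javascript
// Format { type, text } records as one unified-diff hunk (hypothetical helper).
// Simplified: every record is printed, and the header always uses start,count.
function formatHunk(ops, oldStart, newStart) {
  const prefix = { unchanged: ' ', removed: '-', added: '+' };
  // The old file sees context + removed lines; the new file sees context + added lines
  const oldCount = ops.filter(op => op.type !== 'added').length;
  const newCount = ops.filter(op => op.type !== 'removed').length;
  const lines = [`@@ -${oldStart},${oldCount} +${newStart},${newCount} @@`];
  for (const op of ops) lines.push(prefix[op.type] + op.text);
  return lines.join('\n');
}

const hunkOps = [
  { type: 'unchanged', text: '  const validated = validate(input);' },
  { type: 'unchanged', text: '  if (!validated) {' },
  { type: 'removed', text: '    throw new Error("Invalid input");' },
  { type: 'added', text: '    throw new ValidationError("Invalid input: " + input.type);' },
  { type: 'unchanged', text: '  }' },
  { type: 'unchanged', text: '  return transform(validated);' },
  { type: 'unchanged', text: '}' },
];
console.log(formatHunk(hunkOps, 15, 15));
```

With the lines of the hunk above, both counts come out to 6: five context lines plus the removed line on the old side, and five context lines plus the added line on the new side.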

📋 Every developer's morning routine: coffee, then reviewing diffs

Catching Sneaky Edits

Teachers can diff a suspicious essay against a known source. Not a full plagiarism detector, but it quickly reveals copy-paste with minor rewording.

Verifying Script Output

Ran a script that transforms a config file? Diff the before and after to make sure it did exactly what you expected and nothing more.

The "Final_FINAL_v3" Problem

Not everyone uses version control for documents. If you have a file named "proposal_v3_final_FINAL.docx" (we have all been there), a diff tool lets you compare the plain text of different versions without guessing what changed.

Tips for Better Comparisons

Normalize whitespace first. Trailing spaces, tabs vs. spaces, and line endings (LF vs. CRLF) create noise. Good diff tools have an option to ignore whitespace differences.
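A minimal sketch of that normalization step, assuming you can preprocess both texts before they reach the diff: it converts CRLF and lone CR line endings to LF, collapses runs of spaces and tabs into one space, and drops trailing whitespace. normalizeWhitespace is a hypothetical name. Collapsing runs of whitespace also flattens changes in indentation, so skip that part for files where indentation is meaningful, such as Python or YAML.

```javascript
// Hypothetical pre-diff normalizer: run it on both texts, then diff the results.
function normalizeWhitespace(text) {
  return text
    .replace(/\r\n?/g, '\n')        // CRLF (Windows) and lone CR -> LF
    .split('\n')
    .map(line => line
      .replace(/[ \t]+/g, ' ')      // tabs vs. spaces, double spaces -> one space
      .trimEnd())                   // trailing spaces and tabs
    .join('\n');
}

const fromWindows = 'port = 8080  \r\nhost =\tlocalhost\r\n';
const fromLinux = 'port = 8080\nhost = localhost\n';
console.log(fromWindows === fromLinux);                                           // false
console.log(normalizeWhitespace(fromWindows) === normalizeWhitespace(fromLinux)); // true
```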

Use side-by-side view for big changes. When texts are very different, two columns are easier to read than inline. For small changes in mostly similar text, inline works better.
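Side-by-side views need one extra step: pairing each run of removed lines with the run of added lines that follows it, so both land on the same rows. This sketch shows one simple pairing rule; toSideBySide is a hypothetical name, it takes { type, text } records like the ones simpleDiff returns at the end of this article, and it uses null for an empty cell so that gap is not confused with a genuinely blank line.

```javascript
// Pair diff records into [left, right] rows for a two-column view (hypothetical helper).
// null marks an empty cell, so it is not confused with a genuinely blank line.
function toSideBySide(ops) {
  const rows = [];
  let i = 0;
  while (i < ops.length) {
    if (ops[i].type === 'unchanged') {
      rows.push([ops[i].text, ops[i].text]);
      i++;
      continue;
    }
    // Collect a run of removals, then the run of additions right after it
    const removed = [];
    const added = [];
    while (i < ops.length && ops[i].type === 'removed') removed.push(ops[i++].text);
    while (i < ops.length && ops[i].type === 'added') added.push(ops[i++].text);
    if (removed.length === 0 && added.length === 0) {
      throw new Error(`unknown record type: ${ops[i].type}`);
    }
    // Line the two runs up row by row; the shorter side gets empty cells
    for (let r = 0; r < Math.max(removed.length, added.length); r++) {
      rows.push([removed[r] ?? null, added[r] ?? null]);
    }
  }
  return rows;
}

const rows = toSideBySide([
  { type: 'unchanged', text: 'Dear Sam,' },
  { type: 'removed', text: 'The fee is $500.' },
  { type: 'removed', text: 'Payment is due in 30 days.' },
  { type: 'added', text: 'The fee is $650.' },
  { type: 'unchanged', text: 'Regards,' },
]);
for (const [left, right] of rows) {
  console.log(`${(left ?? '').padEnd(28)} | ${right ?? ''}`);
}
```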

Compare the right format. Diffing a Markdown file against an HTML rendering of the same content will show differences everywhere, even though the content is identical.

Before diffing, make sure both texts are in the same format and encoding. A tool that compares raw bytes will see a UTF-8 file and a UTF-16 file as completely different, even if the words are identical.
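If your inputs arrive as raw bytes (for example, the Buffer that Node.js's fs.readFileSync returns when you pass no encoding), decode both to strings before diffing. This minimal sketch uses the standard TextDecoder API; decodeText is a hypothetical name, and it assumes any UTF-16 file starts with a byte-order mark (BOM), reading everything else as UTF-8. Files in other encodings, or UTF-16 files without a BOM, would need real detection, which this does not attempt.

```javascript
// Hypothetical helper: turn raw file bytes into a string before diffing.
// Assumes UTF-16 files begin with a byte-order mark; everything else is read as UTF-8.
function decodeText(bytes) {
  if (bytes[0] === 0xff && bytes[1] === 0xfe) {
    return new TextDecoder('utf-16le').decode(bytes); // BOM FF FE: little-endian
  }
  if (bytes[0] === 0xfe && bytes[1] === 0xff) {
    return new TextDecoder('utf-16be').decode(bytes); // BOM FE FF: big-endian
  }
  // TextDecoder strips a leading BOM by default (ignoreBOM: false)
  return new TextDecoder('utf-8').decode(bytes);
}

// The same sentence encoded two ways (Buffer is Node.js-specific)
const sentence = 'Résumé attached.\n';
const asUtf8 = new TextEncoder().encode(sentence);
const asUtf16 = Buffer.concat([Buffer.from([0xff, 0xfe]), Buffer.from(sentence, 'utf16le')]);

console.log(Buffer.from(asUtf8).equals(asUtf16));        // false: the bytes differ
console.log(decodeText(asUtf8) === decodeText(asUtf16)); // true: the text is identical
```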

A Simple Diff in JavaScript

Need a basic diff in your own project? Here is a minimal line-by-line comparison to get you started:

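function simpleDiff(oldText, newText) {
  // Split on LF or CRLF so Windows line endings don't create false differences
  const oldLines = oldText.split(/\r?\n/);
  const newLines = newText.split(/\r?\n/);
  const result = [];

  // Walk both line arrays in lockstep (no LCS, so lines are never realigned)
  let i = 0, j = 0;
  while (i < oldLines.length || j < newLines.length) {
    if (i >= oldLines.length) {
      // Old text exhausted: every remaining new line is an addition
      result.push({ type: 'added', text: newLines[j++] });
    } else if (j >= newLines.length) {
      // New text exhausted: every remaining old line is a removal
      result.push({ type: 'removed', text: oldLines[i++] });
    } else if (oldLines[i] === newLines[j]) {
      result.push({ type: 'unchanged', text: oldLines[i] });
      i++; j++;
    } else {
      // Lines differ at the same position: report a removal plus an addition
      result.push({ type: 'removed', text: oldLines[i++] });
      result.push({ type: 'added', text: newLines[j++] });
    }
  }
  return result;
}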

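This is a naive approach that only works for simple cases. It compares lines position by position and never realigns them, so inserting a single line near the top makes every line after it show up as removed and re-added. For production use, grab the diff package from npm (jsdiff). It implements Myers' algorithm and handles all the edge cases properly.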

Try It Yourself

Paste two versions of any text into our Text Diff tool to instantly see additions, deletions, and modifications highlighted. Supports side-by-side and inline views. No sign-up required.
