How Fresh Loads Huge Files Fast

Fresh uses a piece tree (also called a piece table) for text storage. The piece table originated in the early 1970s - first in J. Strother Moore and Bob Boyer's "77-Editor" at Edinburgh (1971-1973), where they applied structure-sharing techniques from their theorem proving research to text editing. Moore later brought the idea to Xerox PARC, where Charles Simonyi adopted it for the Bravo editor (1974). Simonyi later brought it to Microsoft Word (1983). The design minimizes memory copying by never modifying original file content - critical for early systems with limited RAM. VS Code adopted piece tables in 2018, replacing their previous line-array implementation.

Fresh's implementation adds immutable nodes (for concurrency and cheap snapshots) and lazy loading (for large files). (GitHub)

The Core Idea

Instead of storing text in the tree, store pointers to text that lives elsewhere.

Rope (Helix, Zed)

Text lives inside tree nodes:

Internal NodeLeaf: 'Hello 'Leaf: 'World'

Piece Tree (Fresh)

Tree nodes point to separate buffers:

BuffersTreeInternal NodePiece: buffer 0, offset 0, len6Piece: buffer 0, offset 6, len5Buffer 0: 'Hello World'

A piece is just three numbers: which buffer, starting offset, and length.

What Happens When You Edit

When you type or paste, Fresh appends to a new buffer and creates a piece pointing to it. The original buffer stays untouched.

Before: File contains "Hello World"

BuffersTreePiece: buf 0, 0-11Buffer 0 (original): 'HelloWorld'

After: Insert "Big " at position 6

BuffersTreePiece: buf 0, 0-6Piece: buf 1, 0-4Piece: buf 0, 6-11Buffer 0 (original): 'HelloWorld'Buffer 1 (added): 'Big '

Document now reads: "Hello Big World"

The original file content in Buffer 0 was never modified or copied.

Lazy Loading

For large files, Fresh doesn't load the entire file into memory. Buffers can be unloaded - they store a file path and byte range instead of actual data.

This is actually how the original Word 1.1a piece table worked: pieces pointed directly to spans in the original file on disk, loading content only when needed. VS Code's 2018 implementation loads files fully into memory; Fresh returns to the original memory-conscious design.

Before: 3MB file just opened

BuffersTreePiece: buf 0, 0-1MBPiece: buf 1, 0-1MBPiece: buf 2, 0-1MBBuffer 0: UNLOADEDfile: large.txt, bytes 0-1MBBuffer 1: UNLOADEDfile: large.txt, bytes 1-2MBBuffer 2: UNLOADEDfile: large.txt, bytes 2-3MB

After: User scrolls to middle

Fresh loads only the visible chunk from disk:

BuffersTreePiece: buf 0, 0-1MBPiece: buf 1, 0-1MBPiece: buf 2, 0-1MBBuffer 0: UNLOADEDfile: large.txt, bytes 0-1MBBuffer 1: LOADEDactual bytes in memoryBuffer 2: UNLOADEDfile: large.txt, bytes 2-3MB

Zero-Copy Reads

When reading text for display, Fresh doesn't copy bytes. It returns slices pointing directly into the buffers:

// Returns &[u8] pointing into buffer memory
// No allocation, no copy
let text = buffer.get_text_range(100, 200);

Multiple pieces can reference the same buffer region. Displaying the same text twice doesn't double memory usage.

The Tree Structure

The pieces are organized in a balanced binary tree for fast lookups:

Internalleft_bytes: 100Internalleft_bytes: 50Piece: 50 bytesPiece: 50 bytesPiece: 50 bytes

Each internal node caches the total bytes in its left subtree. To find byte offset 75:

  1. Root has left_bytes: 100, and 75 < 100, so go left
  2. Internal has left_bytes: 50, and 75 >= 50, so go right with offset 75-50=25
  3. Piece has 50 bytes, offset 25 is within it

This is O(log P) where P is the number of pieces.

Line Feed Tracking

Text editors need fast line-based operations: "go to line 1000", "what line is byte offset 50000 on?", or simply displaying line numbers. Scanning the entire document for newlines would be O(N) - too slow for large files.

Fresh tracks line feeds the same way it tracks bytes. Each piece stores its newline count, and internal nodes cache the count for their left subtree:

Internalleft_bytes: 100lf_left: 3Internalleft_bytes: 50lf_left: 2Piece: 50 bytesline_feeds: 1Piece: 30 bytesline_feeds: 2Piece: 20 bytesline_feeds: 0

To find which piece contains line 3:

  1. Root has lf_left: 3, and 3 >= 3, so go right with line 3-3=0
  2. Piece L3 has line_feeds: 1, line 0 is within it

This enables O(log P) line-to-offset conversion. The same structure supports offset-to-line lookup by accumulating line counts while traversing.

Large File Mode

For files over 100MB, Fresh switches to large file mode: no line indexing, pure byte-based navigation.

Why skip line indexing? Counting newlines in a 500MB file means reading the entire file - exactly what lazy loading tries to avoid. Instead, Fresh:

This means "go to line 50000" in a huge file is an approximation, but opening and scrolling remains instant regardless of file size.

Immutable Tree Structure

Fresh uses immutable nodes - once created, a node is never modified. Edits create new nodes instead.

Implementation note: Fresh is written in Rust. Nodes are wrapped in Arc<T> (Atomic Reference Counted pointer) - think of it as a pointer that tracks how many things reference it. Multiple parts of the code (even different threads) can safely hold the same Arc, and the data is automatically freed when the last reference is dropped. We'll gloss over this detail going forward - just know that "sharing" a node is cheap and thread-safe.

After an edit, old tree versions remain valid:

Version 2 (after edit)Version 1 (before edit)Root v1LeftRightRoot v2Right v2

Version 2 shares the unchanged left subtree with Version 1. This enables:

Structural Diffing

Fresh uses piece tree structure to power gutter change indicators (the marks showing modified lines). The algorithm finds what changed since last save:

  1. Pointer check: If saved and current tree roots are the same object, content is identical (instant).
  2. Structure diff: Compare piece references to find differing byte ranges. O(pieces), not O(file size).
  3. Content verification (optional): For small regions, verify bytes actually differ (handles undo edge cases).

The diff returns line ranges that Fresh checks against the current viewport - only visible changed lines need indicators. This is especially useful for large files: detecting modifications in a 500MB file requires comparing piece metadata, not reading from disk.

Crash Recovery

Fresh auto-saves modified buffers every 2 seconds for crash recovery. The piece tree structure makes this efficient for large files.

The key insight: Recovery doesn't need to save the entire file - just the modifications. For a 500MB file with a few edits, Fresh saves only the changed chunks (a few KB), not 500MB.

The recovery format mirrors piece tree structure:

This approach dates back to Word 1.1's "Fast Save" feature (1990). From the source code comments: "Quick save adds modification information to the end of the file rather than going through the laborious process of reconstructing and rewriting the file from scratch." Word appended "File Extensions" containing new text and formatting, with the piece table (Clx) tracking where everything lives. Fresh uses the same principle - chunk-based modifications on top of an original - but writes to a separate recovery file instead of the document itself.

On crash detection (lock file exists but process died), Fresh offers to recover by reconstructing: original file + stored chunks → recovered content.

Fresh vs VS Code

VS Code also uses a piece tree, but with a mutable tree - nodes are modified in place during rebalancing. Fresh uses a fully persistent data structure: edits create new tree versions via path-copying, sharing unchanged subtrees between versions.

This gives Fresh cheap undo (keep old roots), lock-free concurrent reads, and instant snapshots - at the cost of more allocations per edit.

Fresh also implements lazy loading; VS Code loads entire files into memory.

Comparison: Piece Tree vs Rope

Different editors use different text storage structures:

Aspect Piece Tree (Fresh, VS Code, MS Word 1.1) Rope (Helix, Zed, Neovim?)
Node contents Pointer to buffer Actual text (~1KB chunks)
Insert Append to buffer, add piece May copy/split chunk
Original file Preserved in buffer 0 Reorganized into chunks
Large files Lazy load regions (Fresh, Word 1.1a) Must load or stream
Memory for edits Minimal (just pointers) Copies chunk data

Piece Tree Trade-offs

Pros: Original file preserved, inserts never copy existing text, natural lazy loading, pieces are small (~24 bytes) so more fit in cache, enables structural diffing (see above).

Cons: Extra indirection (find piece, then fetch buffer), buffer fragmentation over many edits, poor cache locality for text content. Piece count grows with edits, though periodic coalescing can mitigate this.

(Note: Fresh's current implementation rebuilds the entire tree on each edit rather than path-copying, making edits O(P) instead of O(log P). This is an implementation gap, not inherent to the design.)

Further Reading