Recovering a Weak-Head Seagate Drive with Controlled HDDSuperClone Imaging
This is a technical write-up of a real recovery workflow used on an unstable 1TB Seagate drive that would reset under sustained reads. The goal was to safely extract photos for a photographer while minimizing drive stress.
The drive would not show up in Windows Device Manager, and it would lock up the BIOS during hardware checks, so we attached it through a SATA PCIe card and hot-swapped it in after boot.
Under Linux the drive did appear in lsblk output, so we took the following steps to recover the customer's data.
Tools used: Linux, smartctl, hdparm, HDDSuperClone, PhotoRec, ExifTool.
Symptoms and initial triage
The drive presented as readable for short operations (SMART and small reads), but would become unstable under sustained sequential reads.
ddrescue would stall, and imaging attempts with HDDSuperClone would trigger firmware resets and capacity/identify glitches; the error message read "Source drive reports wrong size / size changed". No new clicking sounds were observed.
Quick verification commands
$ sudo smartctl -a /dev/sdX
$ sudo hdparm -I /dev/sdX
$ sudo dd if=/dev/sdX of=/dev/null bs=512 count=1
A key indicator was SMART responsiveness: after stressful imaging sessions, smartctl became slow, but returned to fast responses
after a longer cool-down. This strongly suggested thermal / sustained-load sensitivity consistent with a weak head (not a dead motor, not obvious
mechanical crash, and not immediate firmware lockup).
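The fatigue check can be made repeatable by timing the probe. The sketch below is an illustration, not part of the original session: probe_latency times an arbitrary command, and is_fatigued flags a probe that takes several times longer than a known-good baseline (the 3x factor is an assumed threshold, not a measured one).

```python
import subprocess
import time

def probe_latency(cmd: list) -> float:
    """Run a command and return how long it took, in seconds."""
    start = time.monotonic()
    subprocess.run(cmd, capture_output=True)
    return time.monotonic() - start

def is_fatigued(latency_s: float, baseline_s: float, factor: float = 3.0) -> bool:
    """Treat a probe several times slower than baseline as a sign of fatigue."""
    return latency_s > baseline_s * factor

# On a real drive the probe would be something like:
#   probe_latency(["smartctl", "-a", "/dev/sdX"])
print(is_fatigued(4.8, 0.4))  # slow probe against a 0.4 s baseline
print(is_fatigued(0.5, 0.4))  # normal probe
```

Logging these latencies between imaging runs gives an objective "rest until fast again" signal instead of guessing.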
Why sustained imaging was failing
In this case, the drive tolerated only a limited amount of continuous reading before entering heavy internal retries and eventually resetting. The practical takeaway: avoid long, continuous reads. Use controlled reads of small sections of the drive and allow true rest between runs.
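The "small sections, then stop" pattern looks like this in code. This is a minimal sketch of the idea, not how HDDSuperClone is implemented: read one bounded burst in small chunks, bail out early on an unreadable region, and return whatever was captured.

```python
import os

def burst_read(path: str, start: int, length: int, chunk: int = 65536) -> bytes:
    """Read one bounded burst [start, start+length) in small chunks, then stop."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = bytearray()
        pos = start
        end = start + length
        while pos < end:
            buf = os.pread(fd, min(chunk, end - pos), pos)
            if not buf:  # EOF or unreadable region: end the burst early
                break
            data.extend(buf)
            pos += len(buf)
        return bytes(data)
    finally:
        os.close(fd)

# On a real device the path would be e.g. "/dev/sdX", and the caller would
# sleep between bursts (time.sleep(...)) so the drive gets true rest.
```

The key point is that each burst has a hard upper bound, so the drive never sees a long continuous read.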
Controlled micro-burst imaging with HDDSuperClone
HDDSuperClone was used to image in small, controlled segments. The critical technique was to increase segment size gradually, only when the drive remained stable and SMART stayed responsive.
Segment sizing approach
Start conservative and increase in measured steps. Example working sizes observed during this recovery:
- 512000 sectors (~250MB)
- 640000 sectors (~320MB)
- 768000 sectors (~384MB)
If hesitation increases, SMART slows, or resets occur, drop back to the last stable size.
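That sizing rule can be written as a tiny helper. This is an illustration of the pattern, not a feature of HDDSuperClone; the 128000-sector step is an assumption inferred from the observed 512000 → 640000 → 768000 progression.

```python
def next_segment(current: int, stable: bool,
                 step: int = 128_000, floor: int = 512_000) -> int:
    """
    Additive step-up while the drive stays stable; on trouble, drop
    one step back, never below the floor. Sizes are 512-byte sectors.
    """
    if stable:
        return current + step
    return max(current - step, floor)

size = 512_000                             # ~250MB starting point
size = next_segment(size, stable=True)     # drive stayed stable
size = next_segment(size, stable=True)     # stable again
size = next_segment(size, stable=False)    # reset / SMART slowdown: back off
print(size)  # back at 640000
```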
Operational pattern used between runs
After each successful segment, the drive was placed into standby to stop spindle rotation and park heads cleanly. This provided real rest compared to simply waiting while the drive continued spinning.
$ sudo hdparm -Y /dev/sdX
$ sudo hdparm -C /dev/sdX
The workflow was: run a segment, stop, disconnect the device handle, issue standby (hdparm -Y), rest 10–15 minutes, then resume.
After several larger segments, take a longer cool-down.
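If you script the rest cycle, a hypothetical wrapper (our assumption, not a tool from the workflow) might build and run the same hdparm commands:

```python
import subprocess
import time

def rest_drive(dev: str, minutes: int = 15, dry_run: bool = True) -> list:
    """Issue standby (hdparm -Y), check power state (hdparm -C), then wait."""
    cmds = [["hdparm", "-Y", dev], ["hdparm", "-C", dev]]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=False)  # requires root on a real device
        time.sleep(minutes * 60)
    return cmds

print(rest_drive("/dev/sdX"))
```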
Validating the image and attempting mounts
On partial images, mounting the whole-disk image directly often fails because the filesystem lives inside a partition at an offset. First, inspect partitions within the image:
$ sudo fdisk -l badimage.img
For a typical NTFS partition starting at sector 2048, the mount offset is 2048 * 512 = 1048576 bytes. Attempt a read-only mount via ntfs-3g:
$ sudo mkdir -p /mnt/test
$ sudo ntfs-3g -o ro,loop,offset=1048576 badimage.img /mnt/test
$ sudo umount /mnt/test
In this recovery, the image did not mount early on (likely incomplete NTFS metadata), so we pivoted to file carving.
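The offset arithmetic is trivial but worth making explicit, since getting it wrong is the most common reason a loop mount fails:

```python
SECTOR_SIZE = 512  # bytes per logical sector, as reported by fdisk

def mount_offset(start_sector: int, sector_size: int = SECTOR_SIZE) -> int:
    """Byte offset to pass as offset= when mounting a partition inside an image."""
    return start_sector * sector_size

print(mount_offset(2048))  # 1048576, the value used above
```

Note that 4Kn drives report 4096-byte logical sectors; always use the sector size fdisk actually prints.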
Photo recovery with PhotoRec (JPEG-focused)
PhotoRec can recover photos from partial images even when filesystem metadata is incomplete. Run PhotoRec against the image:
$ sudo photorec badimage.img
In the PhotoRec UI, choose File Opt, press s to disable all, then enable only the photo types you care about
(JPEG/JPG and optionally PNG/RAW). This reduces junk output and speeds up the carve.
PhotoRec filenames: f######## vs t########
PhotoRec assigns generated filenames. Two common patterns:
- f########.jpg – a carved file recovered by signature scan
- t########.jpg – commonly thumbnails / small previews (often safe to separate)
If you want to isolate thumbnails for review later:
$ mkdir -p thumbs
$ mv t*.jpg thumbs/ 2>/dev/null || true
Sorting recovered photos for a photographer (EXIF-based)
Carving loses original folder structure and filenames. For photographers, sorting by EXIF date is usually the fastest way to rebuild timelines. Install ExifTool:
$ sudo apt install exiftool
We inspected the metadata of a sample file to decide which tag to sort on; the most useful value was the Create Date:
$ exiftool f6121520.jpg
The following script was written to sort the recovered photos by date based on their EXIF data:
#!/usr/bin/env python3
import argparse
import json
import os
import shutil
import subprocess
from pathlib import Path
from typing import Dict, List


def find_recup_dirs(src_root: Path) -> List[Path]:
    if not src_root.is_dir():
        raise FileNotFoundError(f"Source root not found or not a directory: {src_root}")
    return sorted([p for p in src_root.iterdir() if p.is_dir() and p.name.startswith("recup")])


def iter_files_under(dirs: List[Path]) -> List[Path]:
    files: List[Path] = []
    for d in dirs:
        for p in d.rglob("*"):
            if p.is_file():
                files.append(p)
    return files


def chunked(lst: List[Path], n: int) -> List[List[Path]]:
    return [lst[i : i + n] for i in range(0, len(lst), n)]


def exiftool_create_month(files: List[Path]) -> Dict[str, str]:
    """
    Returns mapping: SourceFile -> "YYYY-MM"
    Missing CreateDate will simply not exist in mapping.
    Uses exiftool date formatting to output YYYY-MM directly.
    """
    if not files:
        return {}
    cmd = ["exiftool", "-json", "-CreateDate", "-d", "%Y-%m"] + [str(f) for f in files]
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        raise RuntimeError("exiftool not found. Install it first (e.g., sudo apt install libimage-exiftool-perl).")
    if proc.returncode != 0 and not proc.stdout.strip():
        raise RuntimeError(f"exiftool failed:\n{proc.stderr.strip()}")
    try:
        data = json.loads(proc.stdout) if proc.stdout.strip() else []
    except json.JSONDecodeError as e:
        raise RuntimeError(f"Failed to parse exiftool JSON output: {e}\nStderr:\n{proc.stderr.strip()}")
    out: Dict[str, str] = {}
    for item in data:
        src = item.get("SourceFile")
        month = item.get("CreateDate")  # already formatted like YYYY-MM due to -d
        if src and month and isinstance(month, str) and len(month) == 7 and month[4] == "-":
            out[src] = month
    return out


def safe_dest_path(dest_dir: Path, filename: str) -> Path:
    dest_dir.mkdir(parents=True, exist_ok=True)
    base = Path(filename).stem
    ext = Path(filename).suffix
    candidate = dest_dir / (base + ext)
    if not candidate.exists():
        return candidate
    i = 1
    while True:
        candidate = dest_dir / f"{base}_{i}{ext}"
        if not candidate.exists():
            return candidate
        i += 1


def copy_file(src: Path, dest_dir: Path, dry_run: bool) -> Path:
    dest = safe_dest_path(dest_dir, src.name)
    if dry_run:
        return dest
    shutil.copy2(src, dest)
    return dest


def move_file(src: Path, dest_dir: Path, dry_run: bool) -> Path:
    dest = safe_dest_path(dest_dir, src.name)
    if dry_run:
        return dest
    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dest))
    return dest


def main():
    ap = argparse.ArgumentParser(description="Sort PhotoRec recovered files into YYYY-MM folders using ExifTool CreateDate.")
    ap.add_argument("--src-root", default="/mnt/samsung/photorec_out", help="Root that contains recup* directories.")
    ap.add_argument("--dest-root", default="/mnt/samsung/recoveredphotos", help="Destination base directory.")
    ap.add_argument("--chunk-size", type=int, default=500, help="How many files to send to exiftool per batch.")
    ap.add_argument("--dry-run", action="store_true", help="Print what would happen, but don't copy/move anything.")
    args = ap.parse_args()

    src_root = Path(args.src_root)
    dest_root = Path(args.dest_root)
    thumbs_dir = dest_root / "thumbnails"
    misc_dir = dest_root / "Misc"

    recup_dirs = find_recup_dirs(src_root)
    if not recup_dirs:
        print(f"No recup* directories found under: {src_root}")
        return
    all_files = iter_files_under(recup_dirs)
    if not all_files:
        print(f"No files found under recup* directories in: {src_root}")
        return

    # Separate thumbnails (filename starts with 't')
    thumbs = [p for p in all_files if p.name.startswith("t")]
    normal = [p for p in all_files if not p.name.startswith("t")]

    print(f"Found recup dirs: {len(recup_dirs)}")
    print(f"Total files: {len(all_files)} | thumbnails (move): {len(thumbs)} | normal (copy): {len(normal)}")
    if args.dry_run:
        print("DRY RUN enabled: no changes will be made.\n")

    # Move thumbnails
    moved_thumbs = 0
    for p in thumbs:
        dest = move_file(p, thumbs_dir, args.dry_run)
        moved_thumbs += 1
        if moved_thumbs <= 10 or moved_thumbs % 500 == 0:
            print(f"[THUMB] {p} -> {dest}")
    if moved_thumbs > 10:
        print(f"[THUMB] ... moved {moved_thumbs} thumbnails total")

    # Copy normal files based on CreateDate -> YYYY-MM
    copied = 0
    to_misc = 0
    for batch in chunked(normal, args.chunk_size):
        month_map = exiftool_create_month(batch)
        for p in batch:
            month = month_map.get(str(p))
            target_dir = (dest_root / month) if month else misc_dir
            dest = copy_file(p, target_dir, args.dry_run)
            copied += 1
            if not month:
                to_misc += 1
            if copied <= 10 or copied % 1000 == 0:
                label = month if month else "Misc"
                print(f"[COPY:{label}] {p} -> {dest}")
    if copied > 10:
        print(f"[COPY] ... copied {copied} files total")

    print("\nDone.")
    print(f"Thumbnails moved: {moved_thumbs} -> {thumbs_dir}")
    print(f"Normal copied: {copied} -> {dest_root}")
    print(f"Missing CreateDate (copied to Misc): {to_misc} -> {misc_dir}")


if __name__ == "__main__":
    main()
Make the script executable with chmod, test it with the --dry-run flag, and after validating the output run it for real to sort the photos by their EXIF dates:
$ chmod +x sort_recup_photos.py
$ sudo ./sort_recup_photos.py --dry-run
$ sudo ./sort_recup_photos.py
Takeaways
- Weak-head drives can be readable in short bursts while failing under sustained reads.
- SMART responsiveness is a practical indicator of fatigue; longer cool-downs can restore stability.
- Using hdparm -Y between runs provides real rest without hard power pulls.
- File carving can produce usable results before a filesystem is mountable.
- For photographers, EXIF-based sorting quickly restores usable organization.
Recovering Data from 23 Unreadable CDs Using Multi-Drive ddrescue and DMDE
This is a real-world workflow used to recover data from a large batch of CDs that would not read in standard environments. The customer had 23 discs containing photos and files accumulated over years, most of which failed to open or would hang during access.
Instead of treating each disc as a full imaging job, we built a workflow that prioritized quick wins, reduced unnecessary stress on marginal media, and used multiple optical drives to maximize read success.
Tools used: Linux, ddrescue, DMDE, multiple SATA and USB optical drives.
Initial Triage with DMDE
Before committing to full imaging, each CD was tested in DMDE to quickly determine whether files were accessible without heavy read operations.
cd Downloads/dmde
sudo ./dmde
This step allowed us to:
- Identify discs that could be recovered immediately
- Avoid unnecessary ddrescue runs on readable media
- Reduce wear on already fragile discs
A small number of discs were successfully recovered at this stage, eliminating hours of imaging work.
Building a Multi-Drive Recovery Setup
Optical drives vary significantly in how they handle damaged or degraded discs. Instead of relying on a single drive, we created an “assembly line” using multiple drives simultaneously.
- Older SATA DVD-RW drives (2008–2013 era)
- USB DVD drives
- USB Blu-ray drives
Each drive had different read tolerances. Some would fail instantly on a disc that another drive could partially read.
Devices appeared as:
/dev/sr0
/dev/sr1
/dev/sr2
/dev/sr3
This allowed us to rotate discs between drives without losing progress.
Imaging Workflow with ddrescue
Each disc was labeled physically and matched to an image file and log file inside a customer directory:
mkdir ~/mike
cd ~/mike
Initial imaging always started with a no-retry pass to quickly capture readable sectors:
sudo ddrescue -b 2048 -n /dev/sr0 ddcd24.img ddcd24.log
If the drive stalled or slowed significantly, the same job was resumed on another drive using the same mapfile:
sudo ddrescue -b 2048 -n /dev/sr1 ddcd24.img ddcd24.log
sudo ddrescue -b 2048 -n /dev/sr2 ddcd24.img ddcd24.log
This is a key technique. ddrescue’s mapfile allows seamless continuation across different drives, effectively combining their strengths.
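The mapfile that makes this hand-off possible is plain text and easy to inspect. Below is an illustrative parser (not part of the workflow) that totals bytes per block status, using ddrescue's documented status characters: '+' finished, '-' bad sector, '?' non-tried, '*' non-trimmed, '/' non-scraped.

```python
def summarize_mapfile(text: str) -> dict:
    """Total bytes per status in a GNU ddrescue mapfile."""
    totals = {}
    seen_status_line = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if not seen_status_line:
            seen_status_line = True  # first data line is the current-pos/status line
            continue
        pos, size, status = line.split()[:3]
        totals[status] = totals.get(status, 0) + int(size, 16)
    return totals

sample = """# Mapfile. Created by GNU ddrescue
# current_pos  current_status  current_pass
0x00000000     ?               1
#      pos        size  status
0x00000000  0x00010000  +
0x00010000  0x00000800  -
0x00010800  0x0000F800  ?
"""
print(summarize_mapfile(sample))
```

A quick summary like this tells you whether it is worth rotating a disc to yet another drive or moving on to the retry pass.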
Final pass on best-performing drive
Once the majority of readable sectors were captured, the best-performing drive was selected for retry attempts:
sudo ddrescue -b 2048 -r 3 /dev/sr3 ddcd24.img ddcd24.log
Limiting retries is important. CDs degrade quickly under repeated reads, and excessive retries can make things worse.
Why Multiple Drives Matter
One of the biggest takeaways from this job is how inconsistent optical drives are.
- Some drives read through scratches better
- Some handle dye degradation better
- Some fail quickly on discs that others can grind through slowly but successfully
Rotating drives is often the difference between partial recovery and near-complete recovery.
Extracting Files with DMDE
Once imaging completed, DMDE was used to scan raw images and extract files.
In many cases, filesystem metadata was incomplete or damaged, so raw scanning was required.
JPEG files were recovered and organized into folders that matched the original disc labels written by the customer years ago.
This small step makes a big difference for usability. Instead of a flat dump of files, the customer received structured data that reflected how they originally organized their discs.
Final Delivery
All recovered files were consolidated and transferred to a USB drive for the customer.
- Recovered images organized by disc
- Readable files separated from partial or corrupted files
- Delivered on a single accessible device
Out of 23 discs, the majority yielded recoverable data, including photos that had not been accessible for years.
Key Takeaways
- Always triage first. Not every disc needs full imaging
- Use ddrescue mapfiles to continue across multiple drives
- Older optical drives can outperform newer ones on damaged media
- Limit retries to avoid further degradation
- Organizing output improves customer experience significantly
If you have old CDs that won’t read, especially photo archives, there is often still recoverable data even when they appear completely dead.