Building a cinematic scroll-frame sequence in plain HTML

The scroll sequence on our homepage runs 241 frames across a 300vh runway. Total weight: 4MB. No framework. Here's how it works.

The naive approach (don't do this)

You put 241 <img> tags in the DOM, each with position: absolute. You toggle opacity as the user scrolls. It looks fine on a MacBook and dies on an iPhone — layout thrashing, memory bloat, GPU stalls while compositing two hundred layers.

What we do instead

One <canvas>. Draw one frame at a time. Two-stage load: critical batch first (covers state-1 and state-2 entry), then 6 parallel workers chew through the rest in the background.

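// Loads one frame into the shared `images` array. Always resolves, so a
// failed frame never blocks the rest of the queue.
function loadFrame(idx) {
  return new Promise((res) => {
    if (images[idx]) return res(); // already loaded
    const img = new Image();
    img.decoding = "async"; // hint: decode off the main thread where possible
    img.onload = () => { images[idx] = img; res(); };
    img.onerror = () => res(); // skip failed frames; images[idx] stays empty
    // idx is zero-based, file names are one-based: idx 0 -> frame_0001.webp
    img.src = `/frames/frame_${String(idx + 1).padStart(4, "0")}.webp`;
  });
}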

Plain new Image(), not createImageBitmap. Safari stalls on bitmap creation; the decoded image cache is fast enough after the first paint.
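
The two-stage load can be sketched roughly like this. It is a minimal sketch, not our exact loader: loadAll, runPool, and the constant names are made up for illustration, and firing the whole critical batch at once is an assumption. Pass loadFrame in as the load argument.

```javascript
// Illustrative names only; the counts come from the figures in this post.
const TOTAL_FRAMES = 241;
const CRITICAL_COUNT = 40; // the "first 40 frames" critical batch
const WORKER_COUNT = 6;    // concurrent background loaders

// Runs load(idx) for every index, with at most `limit` requests in flight.
async function runPool(indices, limit, load) {
  let next = 0;
  async function worker() {
    // Each worker pulls the next unclaimed index until the list is empty.
    while (next < indices.length) {
      const idx = indices[next++];
      await load(idx);
    }
  }
  const count = Math.min(limit, indices.length);
  await Promise.all(Array.from({ length: count }, () => worker()));
}

async function loadAll(load, onCriticalReady) {
  const critical = Array.from({ length: CRITICAL_COUNT }, (_, i) => i);
  const rest = Array.from(
    { length: TOTAL_FRAMES - CRITICAL_COUNT },
    (_, i) => i + CRITICAL_COUNT
  );
  // Stage 1 (assumption): request the whole critical batch at once.
  await Promise.all(critical.map((idx) => load(idx)));
  onCriticalReady(); // safe to draw the first frame and enable scrolling
  // Stage 2: six workers drain the remaining frames in order.
  await runPool(rest, WORKER_COUNT, load);
}
```

Because loadFrame always resolves, one broken frame can't wedge a worker; the pool just moves on to the next index.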

The lerp that sells it

A raw scroll-to-frame mapping feels stepped. You scroll half a line and the canvas jumps two frames.
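
For reference, the raw mapping looks something like the sketch below. The function name and parameters are hypothetical, and it assumes the canvas stays pinned while the runway scrolls, so the usable distance is the runway height minus one viewport.

```javascript
const FRAME_COUNT = 241;

// Maps a scroll position to a fractional frame index in [0, FRAME_COUNT - 1].
// All four arguments are in CSS pixels.
function scrollToFrame(scrollY, runwayTop, runwayHeight, viewportHeight) {
  // Assumption: the canvas is pinned, so one viewport of the runway is
  // "used up" by the pinned element itself.
  const scrollable = runwayHeight - viewportHeight;
  if (scrollable <= 0) return 0;
  const progress = Math.min(1, Math.max(0, (scrollY - runwayTop) / scrollable));
  return progress * (FRAME_COUNT - 1);
}
```

On an 800px-tall viewport a 300vh runway leaves 1600px for 240 frame steps, about 6.7px per frame, so even a small wheel tick skips frames. In the browser you would feed it something like window.scrollY, the runway's offsetTop and offsetHeight, and window.innerHeight.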

The fix is a lerp between a scroll-derived target and a rendered value that chases it:

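// One animation step: ease renderedFrame toward targetFrame and redraw
// only when the rounded frame index changes.
function tick() {
  const diff = targetFrame - renderedFrame;
  if (Math.abs(diff) < 0.01) {
    // Close enough: snap to the target and stop the loop.
    renderedFrame = targetFrame;
    lerpRunning = false;
  } else {
    renderedFrame += diff * 0.14; // close 14% of the remaining gap
  }
  const idx = Math.round(renderedFrame);
  // Skip frames that haven't finished loading; the last drawn frame stays up.
  if (idx !== cur && images[idx]?.complete) {
    draw(idx);
    cur = idx;
  }
  if (lerpRunning) requestAnimationFrame(tick);
}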

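A lerp factor of 0.14 closes 14% of the remaining gap every frame. At 60fps that works out to a time constant of roughly 6 to 7 frames (about 110ms), with the rendered value within 5% of the target after about 20 frames. Any faster and you lose the tactile weight that hides frame stepping. Any slower and it feels laggy.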
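
If you want to check the numbers, or keep the same feel on a 120Hz display, the arithmetic fits in two small helpers. Both names are hypothetical, and the frame-rate correction is a suggestion rather than something the tick function above does: a fixed per-frame factor settles twice as fast when frames arrive twice as often.

```javascript
// Frames needed for the remaining gap to shrink below `fraction` of its
// starting size, given a per-frame lerp factor.
function framesToSettle(factor, fraction) {
  return Math.ceil(Math.log(fraction) / Math.log(1 - factor));
}

// Per-frame factor that reproduces `base` at `baseFps` for a frame that
// actually took `dtMs` milliseconds. Two 120Hz steps then cover the same
// ground as one 60Hz step.
function factorForDelta(dtMs, base = 0.14, baseFps = 60) {
  return 1 - Math.pow(1 - base, (dtMs * baseFps) / 1000);
}

console.log(framesToSettle(0.14, 0.05));           // frames to get within 5%
console.log(factorForDelta(1000 / 120).toFixed(4)); // factor for a 120Hz frame
```

To use the second helper you would pass the timestamp delta from requestAnimationFrame into tick and multiply diff by factorForDelta(dt) instead of the constant.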

File format

WebP at quality 78. Ran the sequence through cwebp with -q 78 -m 6 and it took the total from 33MB (JPEG) to 4MB. On real 4G connections the critical batch (first 40 frames) lands in under 1.5s.

The blend-mode trick

The source video had a grey volumetric-light haze around the glass subject. Rendered as a plain image, the haze competed with the page background.

The fix: mix-blend-mode: screen plus an aggressive filter: brightness(0.55) contrast(1.75). Screen-blend drops dark pixels into the page bg. Crush the grey haze to near-black before the blend and only the bright glass plus the light shaft paint through. Costs one GPU layer composite. Looks expensive, isn't.
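
To see why the order matters, here is the per-channel arithmetic as a small sketch, with values normalised to 0..1. The formulas follow the CSS definitions of brightness(), contrast(), and the screen blend mode; the sample pixel values (haze at 0.4, page background at 0.05) are assumptions picked for illustration, not measurements from our footage.

```javascript
const clamp01 = (v) => Math.min(1, Math.max(0, v));

// CSS filter functions, per channel.
const brightness = (c, amount) => clamp01(c * amount);
const contrast = (c, amount) => clamp01((c - 0.5) * amount + 0.5);

// Screen blend: the result is never darker than the backdrop.
const screen = (backdrop, source) => backdrop + source - backdrop * source;

// Filters run first, in the order listed; the blend happens afterwards.
function composite(pixel, backdrop) {
  const filtered = contrast(brightness(pixel, 0.55), 1.75);
  return screen(backdrop, filtered);
}

const pageBg = 0.05; // assumed near-black page background
console.log(composite(0.4, pageBg)); // grey haze: barely above the background
console.log(composite(1.0, pageBg)); // brightest glass: clearly visible
console.log(composite(0.0, pageBg)); // black: exactly the page background
```

With these two filter values, anything darker than roughly 0.39 is clamped to zero before the blend, which is why the haze vanishes while the highlights survive.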

Reduced motion

On prefers-reduced-motion: reduce, we skip the whole system — replace the canvas with a single static image at frame 120, collapse the 300vh runway to auto, show only the first text state. No rAF loop, no scroll listener, no accessibility violation.
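
The branch itself is small. A minimal sketch, assuming hypothetical startAnimated and showStatic callbacks that do the actual DOM work, and assuming "frame 120" means the file numbered 0120:

```javascript
// True when the user has asked the OS or browser for reduced motion.
// `win` is passed in so the check can run outside a browser too.
function prefersReducedMotion(win) {
  return Boolean(
    win.matchMedia &&
    win.matchMedia("(prefers-reduced-motion: reduce)").matches
  );
}

// Chooses one path up front; the animated path never starts otherwise.
// startAnimated and showStatic are hypothetical hooks, not real APIs.
function initSequence(win, { startAnimated, showStatic }) {
  if (prefersReducedMotion(win)) {
    showStatic(120); // static still, collapsed runway, first text state only
    return "static";
  }
  startAnimated(); // canvas, scroll listener, rAF loop
  return "animated";
}
```

In the page this would be called once as initSequence(window, { ... }). Deciding before any listener is attached is what keeps the reduced-motion path free of scroll and rAF work.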

Framework-less doesn't mean primitive. It means you understand every line.