dchain/media/scrub_test.go
vsecoder f885264d23 feat(media): mandatory metadata scrubbing on /feed/publish + FFmpeg sidecar
Photos straight from a phone camera typically ship with an EXIF block that leaks:
GPS coordinates, camera model + serial, original timestamp, software
name, author/copyright fields, sometimes an embedded thumbnail that
survives cropping. For a social feed positioned as privacy-friendly
we can't trust the client alone to scrub — a compromised build,
a future plugin, or a hostile fork would simply skip the step and
leak authorship data.

So: server-side scrub is mandatory for every /feed/publish upload.

New package: media

  media/scrub.go
    - Scrubber type with Scrub(ctx, bytes, claimedMIME) → (clean, actualMIME)
    - ScrubImage handles JPEG/PNG/GIF/WebP in-process: decodes, optionally
      downscales to 1080px max-dim, re-encodes as JPEG Q=75. Stdlib
      jpeg.Encode emits ZERO metadata → scrub is complete by construction.
    - Sidecar client (HTTP): posts video/audio bytes to an external
      FFmpeg worker at DCHAIN_MEDIA_SIDECAR_URL
    - Magic-byte MIME detection: rejects uploads where declared MIME
      doesn't match actual bytes (prevents a PDF dressed as image/jpeg
      from bypassing the scrubber)
    - ErrSidecarUnavailable: explicit error when video arrives but no
      sidecar is wired; operator opts in to fallback via
      --allow-unscrubbed-video (default: reject)

  media/scrub_test.go
    - Crafted EXIF segment with "SECRETGPS-…Canon-EOS-R5" canary —
      verifies the string is gone after ScrubImage
    - Downscale test (2000×1000 → 1080×540, aspect preserved)
    - MIME-mismatch rejection
    - Magic-byte detector sanity table

FFmpeg sidecar — new docker/media-sidecar/

  Tiny Go HTTP service (~180 LOC, no non-stdlib deps) that shells out
  to ffmpeg with -map_metadata -1 + -map 0:v -map 0:a? to guarantee
  only video + audio streams survive (no subtitles, attached pictures,
  or data channels that could carry hidden info).

  Re-encode profile:
    video → H.264 CRF 28 preset=fast, Opus 64k, MP4 faststart
    audio → Opus 64k, Ogg container

  Dockerfile: two-stage build (Go → alpine+ffmpeg), ~90 MB image,
  non-root user, /healthz endpoint for compose probes.

  Node reaches it via DCHAIN_MEDIA_SIDECAR_URL. Without it, video uploads
  are rejected with 503 unless operator sets DCHAIN_ALLOW_UNSCRUBBED_VIDEO.

/feed/publish wiring

  - cfg.Scrubber is a required dependency
  - Before storing post body we call scrubber.Scrub(); attachment bytes
    + MIME are replaced with the cleaned version
  - content_hash is computed over the SCRUBBED bytes — so the on-chain
    CREATE_POST tx references exactly what readers will fetch
  - EstimatedFeeUT uses the scrubbed size, so author's fee reflects
    actual on-disk cost
  - Content-type mismatches → 400; sidecar unavailable for video → 503

Flags / env vars

  --feed-db / DCHAIN_FEED_DB            (existing)
  --feed-ttl-days / DCHAIN_FEED_TTL_DAYS (existing)
  --media-sidecar-url / DCHAIN_MEDIA_SIDECAR_URL   (NEW)
  --allow-unscrubbed-video / DCHAIN_ALLOW_UNSCRUBBED_VIDEO (NEW; default false)
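A minimal sketch of how the two new options could be read with the stdlib `flag` package, which accepts both `-name` and `--name`. The precedence (explicit flag beats env var, env var beats the built-in default) is an assumption, as are `mediaConfig` and `parseMediaFlags`; the node's real flag setup may differ.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
)

type mediaConfig struct {
	SidecarURL           string
	AllowUnscrubbedVideo bool
}

// parseMediaFlags registers the two new flags with their env vars as defaults,
// so an explicit flag overrides the environment (assumed precedence).
func parseMediaFlags(args []string, getenv func(string) string) (mediaConfig, error) {
	allowDefault := false // reject unscrubbed video unless told otherwise
	if v := getenv("DCHAIN_ALLOW_UNSCRUBBED_VIDEO"); v != "" {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return mediaConfig{}, fmt.Errorf("DCHAIN_ALLOW_UNSCRUBBED_VIDEO: %w", err)
		}
		allowDefault = b
	}
	var cfg mediaConfig
	fs := flag.NewFlagSet("media", flag.ContinueOnError)
	fs.StringVar(&cfg.SidecarURL, "media-sidecar-url", getenv("DCHAIN_MEDIA_SIDECAR_URL"),
		"base URL of the FFmpeg sidecar")
	fs.BoolVar(&cfg.AllowUnscrubbedVideo, "allow-unscrubbed-video", allowDefault,
		"store video unscrubbed when no sidecar is configured")
	if err := fs.Parse(args); err != nil {
		return mediaConfig{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := parseMediaFlags(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", cfg)
}
```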

Client responsibilities (for reference — client work lands in Phase C)

  Even with server-side scrub, the client should still compress aggressively
  BEFORE upload, because:
    - upload time scales with file size, so uncompressed media is slow to
      send over mobile networks
    - the server's 256 KiB MaxPostSize is a HARD cap — oversized uploads
      are rejected, not silently truncated
    - the on-chain fee is size-based, so users pay for every byte the
      client didn't bother to shrink

  Recommended client pipeline:
    images → expo-image-manipulator: resize max-dim 1080px, WebP or
             JPEG quality 50-60
    videos → react-native-compressor: H.264 CRF 28, 720p max, 64k audio
    audio  → expo-audio's default Opus 32k (already compressed)

  To be documented in docs/media-sidecar.md (added with the Phase C PR).

Tests
  - go test ./... green across 6 packages (blockchain consensus identity
    media relay vm)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 19:15:14 +03:00

package media

import (
	"bytes"
	"image"
	"image/color"
	"image/jpeg"
	"testing"
)

// TestScrubImageRemovesEXIF: our scrubber re-encodes via stdlib JPEG, which
// does not preserve EXIF by construction. We verify that a crafted input
// carrying an EXIF marker produces an output without one.
func TestScrubImageRemovesEXIF(t *testing.T) {
	// Build a JPEG that explicitly contains an APP1 EXIF segment.
	// Structure: JPEG SOI + APP1 with "Exif\x00\x00" header + real image data.
	var base bytes.Buffer
	img := image.NewRGBA(image.Rect(0, 0, 8, 8))
	for y := 0; y < 8; y++ {
		for x := 0; x < 8; x++ {
			img.Set(x, y, color.RGBA{uint8(x * 32), uint8(y * 32), 128, 255})
		}
	}
	if err := jpeg.Encode(&base, img, &jpeg.Options{Quality: 80}); err != nil {
		t.Fatalf("encode base: %v", err)
	}
	input := injectEXIF(t, base.Bytes())
	if !bytes.Contains(input, []byte("Exif\x00\x00")) {
		t.Fatalf("test setup broken: EXIF not injected")
	}
	// injectEXIF also plants an identifiable canary string in the EXIF
	// payload; confirm it is there so its absence after scrubbing proves
	// the metadata was stripped.
	if !bytes.Contains(input, []byte("SECRETGPS")) {
		t.Fatalf("test setup broken: EXIF canary not injected")
	}

	cleaned, mime, err := ScrubImage(input, "image/jpeg")
	if err != nil {
		t.Fatalf("ScrubImage: %v", err)
	}
	if mime != "image/jpeg" {
		t.Errorf("mime: got %q, want image/jpeg", mime)
	}
	// Verify the scrubbed output doesn't contain our canary string.
	if bytes.Contains(cleaned, []byte("SECRETGPS")) {
		t.Errorf("EXIF canary survived scrub: metadata not stripped")
	}
	// Verify the output doesn't contain the EXIF segment marker.
	if bytes.Contains(cleaned, []byte("Exif\x00\x00")) {
		t.Errorf("EXIF header string survived scrub")
	}
	// Output must still be a valid JPEG.
	if _, err := jpeg.Decode(bytes.NewReader(cleaned)); err != nil {
		t.Errorf("scrubbed output is not a valid JPEG: %v", err)
	}
}

// injectEXIF splices a synthetic APP1 EXIF segment after the JPEG SOI.
// Segment layout: FF E1 <len_hi> <len_lo> "Exif\0\0" + arbitrary payload.
// The payload is NOT valid TIFF. That's fine: the stdlib JPEG decoder skips
// APPn segments it doesn't interpret instead of aborting.
func injectEXIF(t *testing.T, src []byte) []byte {
	t.Helper()
	if len(src) < 2 || src[0] != 0xFF || src[1] != 0xD8 {
		t.Fatalf("not a JPEG")
	}
	payload := []byte("Exif\x00\x00" + "SECRETGPS-51.5074N-0.1278W-Canon-EOS-R5")
	segmentLen := len(payload) + 2 // +2 = 2 bytes of len field itself
	var seg bytes.Buffer
	seg.Write([]byte{0xFF, 0xE1})
	seg.WriteByte(byte(segmentLen >> 8))
	seg.WriteByte(byte(segmentLen & 0xff))
	seg.Write(payload)
	out := make([]byte, 0, len(src)+seg.Len())
	out = append(out, src[:2]...) // SOI
	out = append(out, seg.Bytes()...)
	out = append(out, src[2:]...)
	return out
}

// TestScrubImageMIMEMismatch: rejects bytes that don't match claimed MIME.
func TestScrubImageMIMEMismatch(t *testing.T) {
	var buf bytes.Buffer
	img := image.NewRGBA(image.Rect(0, 0, 4, 4))
	if err := jpeg.Encode(&buf, img, nil); err != nil {
		t.Fatalf("encode: %v", err)
	}
	// Claim it's a PNG; the magic bytes say JPEG, so ScrubImage must refuse.
	_, _, err := ScrubImage(buf.Bytes(), "image/png")
	if err == nil {
		t.Fatalf("expected ErrMIMEMismatch, got nil")
	}
}

// TestScrubImageDownscale: images over ImageMaxDim are shrunk.
func TestScrubImageDownscale(t *testing.T) {
	// Make a 2000×1000 image: larger dim 2000 > 1080.
	img := image.NewRGBA(image.Rect(0, 0, 2000, 1000))
	for y := 0; y < 1000; y++ {
		for x := 0; x < 2000; x++ {
			img.Set(x, y, color.RGBA{128, 64, 200, 255})
		}
	}
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, img, &jpeg.Options{Quality: 80}); err != nil {
		t.Fatalf("encode: %v", err)
	}
	cleaned, _, err := ScrubImage(buf.Bytes(), "image/jpeg")
	if err != nil {
		t.Fatalf("ScrubImage: %v", err)
	}
	decoded, err := jpeg.Decode(bytes.NewReader(cleaned))
	if err != nil {
		t.Fatalf("decode scrubbed: %v", err)
	}
	b := decoded.Bounds()
	if b.Dx() > ImageMaxDim || b.Dy() > ImageMaxDim {
		t.Errorf("not downscaled: got %dx%d, want max %d", b.Dx(), b.Dy(), ImageMaxDim)
	}
	// The larger dimension should land exactly on the cap.
	if b.Dx() != ImageMaxDim {
		t.Errorf("larger dim: got %d, want %d", b.Dx(), ImageMaxDim)
	}
	// Aspect ratio roughly preserved (2:1 → 1080:540, ±1 for rounding).
	if want := ImageMaxDim / 2; b.Dy() < want-1 || b.Dy() > want+1 {
		t.Errorf("smaller dim: got %d, want %d±1", b.Dy(), want)
	}
}

// TestDetectMIME: a few magic-byte cases to ensure magic detection works.
func TestDetectMIME(t *testing.T) {
	cases := []struct {
		data []byte
		want string
	}{
		{[]byte("\xff\xd8\xff\xe0garbage"), "image/jpeg"},
		{[]byte("\x89PNG\r\n\x1a\n..."), "image/png"},
		{[]byte("GIF89a..."), "image/gif"},
		{[]byte{}, ""},
	}
	for _, tc := range cases {
		got := detectMIME(tc.data)
		if got != tc.want {
			t.Errorf("detectMIME(%q): got %q want %q", string(tc.data[:min(len(tc.data), 12)]), got, tc.want)
		}
	}
}

func min(a, b int) int {
	if a < b {
		return a
	}
	return b
}