feat(media): mandatory metadata scrubbing on /feed/publish + FFmpeg sidecar

Every photo from a phone camera ships with an EXIF block that leaks:
GPS coordinates, camera model + serial, original timestamp, software
name, author/copyright fields, sometimes an embedded thumbnail that
survives cropping. For a social feed positioned as privacy-friendly
we can't trust the client alone to scrub — a compromised build,
a future plugin, or a hostile fork would simply skip the step and
leak authorship data.

So: server-side scrub is mandatory for every /feed/publish upload.

New package: media

  media/scrub.go
    - Scrubber type with Scrub(ctx, bytes, claimedMIME) → (clean, actualMIME)
    - ScrubImage handles JPEG/PNG/GIF/WebP in-process: decodes, optionally
      downscales to 1080px max-dim, re-encodes as JPEG Q=75. Stdlib
      jpeg.Encode emits ZERO metadata → scrub is complete by construction.
    - Sidecar client (HTTP): posts video/audio bytes to an external
      FFmpeg worker at DCHAIN_MEDIA_SIDECAR_URL
    - Magic-byte MIME detection: rejects uploads where declared MIME
      doesn't match actual bytes (prevents a PDF dressed as image/jpeg
      from bypassing the scrubber)
    - ErrSidecarUnavailable: explicit error when video arrives but no
      sidecar is wired; operator opts in to fallback via
      --allow-unscrubbed-video (default: reject)

  media/scrub_test.go
    - Crafted EXIF segment with "SECRETGPS-…Canon-EOS-R5" canary —
      verifies the string is gone after ScrubImage
    - Downscale test (2000×1000 → 1080×540, aspect preserved)
    - MIME-mismatch rejection
    - Magic-byte detector sanity table

FFmpeg sidecar — new docker/media-sidecar/

  Tiny Go HTTP service (~180 LOC, no non-stdlib deps) that shells out
  to ffmpeg with -map_metadata -1 + -map 0:v -map 0:a? to guarantee
  only video + audio streams survive (no subtitles, attached pictures,
  or data channels that could carry hidden info).

  Re-encode profile:
    video → H.264 CRF 28 preset=fast, Opus 64k, MP4 faststart
    audio → Opus 64k, Ogg container

  Dockerfile: two-stage build (Go → alpine+ffmpeg), ~90 MB image, non-
  root user, /healthz endpoint for compose probes.

  Node reaches it via DCHAIN_MEDIA_SIDECAR_URL. Without it, video uploads
  are rejected with 503 unless the operator sets DCHAIN_ALLOW_UNSCRUBBED_VIDEO.

/feed/publish wiring

  - cfg.Scrubber is a required dependency
  - Before storing post body we call scrubber.Scrub(); attachment bytes
    + MIME are replaced with the cleaned version
  - content_hash is computed over the SCRUBBED bytes — so the on-chain
    CREATE_POST tx references exactly what readers will fetch
  - EstimatedFeeUT uses the scrubbed size, so the author's fee reflects the
    actual on-disk cost
  - Content-type mismatches → 400; sidecar unavailable for video → 503

Flags / env vars

  --feed-db / DCHAIN_FEED_DB            (existing)
  --feed-ttl-days / DCHAIN_FEED_TTL_DAYS (existing)
  --media-sidecar-url / DCHAIN_MEDIA_SIDECAR_URL   (NEW)
  --allow-unscrubbed-video / DCHAIN_ALLOW_UNSCRUBBED_VIDEO (NEW; default false)
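One way to let each new flag fall back to its environment variable, sketched with the standard flag package. The precedence (an explicit flag overrides the environment), the use of strconv.ParseBool for the boolean, and the `parseMediaFlags`/`mediaConfig` names are assumptions; `env` is injected so a caller can pass os.Getenv.

```go
package main

import (
	"flag"
	"strconv"
)

type mediaConfig struct {
	SidecarURL      string
	AllowUnscrubbed bool
}

// parseMediaFlags registers the two new flags. Environment values become the
// defaults, so a flag given on the command line wins.
func parseMediaFlags(args []string, env func(string) string) (mediaConfig, error) {
	allowDefault := false // secure default: reject unscrubbed video
	if v := env("DCHAIN_ALLOW_UNSCRUBBED_VIDEO"); v != "" {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return mediaConfig{}, err
		}
		allowDefault = b
	}
	var cfg mediaConfig
	fs := flag.NewFlagSet("node", flag.ContinueOnError)
	fs.StringVar(&cfg.SidecarURL, "media-sidecar-url", env("DCHAIN_MEDIA_SIDECAR_URL"),
		"base URL of the FFmpeg media sidecar")
	fs.BoolVar(&cfg.AllowUnscrubbed, "allow-unscrubbed-video", allowDefault,
		"accept video without scrubbing when no sidecar is configured")
	if err := fs.Parse(args); err != nil {
		return mediaConfig{}, err
	}
	return cfg, nil
}
```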

Client responsibilities (for reference — client work lands in Phase C)

  Even with server-side scrub, the client should still compress aggressively
  BEFORE upload, because:
    - upload time grows with file size, and full-size camera originals
      are slow to push over mobile networks
    - the server's 256 KiB MaxPostSize is a HARD cap — oversized uploads
      are rejected, not silently truncated
    - the on-chain fee is size-based, so users pay for every byte the
      client didn't bother to shrink

  Recommended client pipeline:
    images → expo-image-manipulator: resize max-dim 1080px, WebP or
             JPEG quality 50-60
    videos → react-native-compressor: H.264 CRF 28, 720p max, 64k audio
    audio  → expo-audio's default Opus 32k (already compressed)

  To be documented in docs/media-sidecar.md, which lands with the Phase C PR.

Tests
  - go test ./... green across 6 packages (blockchain, consensus, identity,
    media, relay, vm)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
vsecoder
2026-04-18 19:15:14 +03:00
parent 126658f294
commit f885264d23
8 changed files with 830 additions and 35 deletions

media/scrub_test.go (new file, 149 lines)

@@ -0,0 +1,149 @@
package media

import (
	"bytes"
	"image"
	"image/color"
	"image/jpeg"
	"testing"
)

// TestScrubImageRemovesEXIF: our scrubber re-encodes via stdlib JPEG, which
// does not preserve EXIF by construction. We verify that a crafted input
// carrying an EXIF marker produces an output without one.
func TestScrubImageRemovesEXIF(t *testing.T) {
	// Build a JPEG that explicitly contains an APP1 EXIF segment.
	// Structure: JPEG SOI + APP1 with "Exif\x00\x00" header + real image data.
	var base bytes.Buffer
	img := image.NewRGBA(image.Rect(0, 0, 8, 8))
	for y := 0; y < 8; y++ {
		for x := 0; x < 8; x++ {
			img.Set(x, y, color.RGBA{uint8(x * 32), uint8(y * 32), 128, 255})
		}
	}
	if err := jpeg.Encode(&base, img, &jpeg.Options{Quality: 80}); err != nil {
		t.Fatalf("encode base: %v", err)
	}
	input := injectEXIF(t, base.Bytes())
	if !bytes.Contains(input, []byte("Exif\x00\x00")) {
		t.Fatalf("test setup broken: EXIF not injected")
	}
	// injectEXIF also plants an identifiable canary string in the EXIF
	// payload, so we can prove it's gone after scrubbing.
	if !bytes.Contains(input, []byte("SECRETGPS")) {
		t.Fatalf("test setup broken: EXIF canary not injected")
	}
	cleaned, mime, err := ScrubImage(input, "image/jpeg")
	if err != nil {
		t.Fatalf("ScrubImage: %v", err)
	}
	if mime != "image/jpeg" {
		t.Errorf("mime: got %q, want image/jpeg", mime)
	}
	// Verify the scrubbed output doesn't contain our canary string.
	if bytes.Contains(cleaned, []byte("SECRETGPS")) {
		t.Errorf("EXIF canary survived scrub: metadata not stripped")
	}
	// Verify the output doesn't contain the EXIF segment marker.
	if bytes.Contains(cleaned, []byte("Exif\x00\x00")) {
		t.Errorf("EXIF header string survived scrub")
	}
	// Output must still be a valid JPEG.
	if _, err := jpeg.Decode(bytes.NewReader(cleaned)); err != nil {
		t.Errorf("scrubbed output is not a valid JPEG: %v", err)
	}
}

// injectEXIF splices a synthetic APP1 EXIF segment after the JPEG SOI.
// Segment layout: FF E1 <len_hi> <len_lo> "Exif\0\0" + arbitrary payload.
// The payload is NOT valid TIFF, which is fine: the stdlib JPEG decoder
// skips unknown APP1 segments rather than aborting.
func injectEXIF(t *testing.T, src []byte) []byte {
	t.Helper()
	if len(src) < 2 || src[0] != 0xFF || src[1] != 0xD8 {
		t.Fatalf("not a JPEG")
	}
	payload := []byte("Exif\x00\x00" + "SECRETGPS-51.5074N-0.1278W-Canon-EOS-R5")
	segmentLen := len(payload) + 2 // +2 = 2 bytes of len field itself
	var seg bytes.Buffer
	seg.Write([]byte{0xFF, 0xE1})
	seg.WriteByte(byte(segmentLen >> 8))
	seg.WriteByte(byte(segmentLen & 0xff))
	seg.Write(payload)
	out := make([]byte, 0, len(src)+seg.Len())
	out = append(out, src[:2]...) // SOI
	out = append(out, seg.Bytes()...)
	out = append(out, src[2:]...)
	return out
}

// TestScrubImageMIMEMismatch: rejects bytes that don't match the claimed MIME.
func TestScrubImageMIMEMismatch(t *testing.T) {
	var buf bytes.Buffer
	img := image.NewRGBA(image.Rect(0, 0, 4, 4))
	if err := jpeg.Encode(&buf, img, nil); err != nil {
		t.Fatalf("encode: %v", err)
	}
	// Real JPEG bytes, but claim it's a PNG.
	_, _, err := ScrubImage(buf.Bytes(), "image/png")
	if err == nil {
		t.Fatalf("expected ErrMIMEMismatch, got nil")
	}
}

// TestScrubImageDownscale: images over ImageMaxDim are shrunk.
func TestScrubImageDownscale(t *testing.T) {
	// Make a 2000×1000 image: larger dim 2000 > 1080.
	img := image.NewRGBA(image.Rect(0, 0, 2000, 1000))
	for y := 0; y < 1000; y++ {
		for x := 0; x < 2000; x++ {
			img.Set(x, y, color.RGBA{128, 64, 200, 255})
		}
	}
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, img, &jpeg.Options{Quality: 80}); err != nil {
		t.Fatalf("encode: %v", err)
	}
	cleaned, _, err := ScrubImage(buf.Bytes(), "image/jpeg")
	if err != nil {
		t.Fatalf("ScrubImage: %v", err)
	}
	decoded, err := jpeg.Decode(bytes.NewReader(cleaned))
	if err != nil {
		t.Fatalf("decode scrubbed: %v", err)
	}
	b := decoded.Bounds()
	if b.Dx() > ImageMaxDim || b.Dy() > ImageMaxDim {
		t.Errorf("not downscaled: got %dx%d, want max %d", b.Dx(), b.Dy(), ImageMaxDim)
	}
	// Aspect ratio preserved: the larger dim hits the cap exactly and the
	// smaller one keeps the 2:1 ratio (540 for a 1080 cap), ±1 px slack.
	if b.Dx() != ImageMaxDim {
		t.Errorf("larger dim: got %d, want %d", b.Dx(), ImageMaxDim)
	}
	wantDy := 1000 * ImageMaxDim / 2000
	if d := b.Dy() - wantDy; d < -1 || d > 1 {
		t.Errorf("smaller dim: got %d, want %d±1", b.Dy(), wantDy)
	}
}

// TestDetectMIME: a few magic-byte cases to ensure magic detection works.
func TestDetectMIME(t *testing.T) {
	cases := []struct {
		data []byte
		want string
	}{
		{[]byte("\xff\xd8\xff\xe0garbage"), "image/jpeg"},
		{[]byte("\x89PNG\r\n\x1a\n..."), "image/png"},
		{[]byte("GIF89a..."), "image/gif"},
		{[]byte{}, ""},
	}
	for _, tc := range cases {
		got := detectMIME(tc.data)
		if got != tc.want {
			t.Errorf("detectMIME(%q): got %q want %q", string(tc.data[:minInt(len(tc.data), 12)]), got, tc.want)
		}
	}
}

// minInt is deliberately not called min: a package-level min in this file
// would shadow Go 1.21's builtin min for every file in package media
// during `go test`.
func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}