feat(media): mandatory metadata scrubbing on /feed/publish + FFmpeg sidecar

Every photo from a phone camera ships with an EXIF block that leaks:
GPS coordinates, camera model + serial, original timestamp, software
name, author/copyright fields, sometimes an embedded thumbnail that
survives cropping. For a social feed positioned as privacy-friendly
we can't trust the client alone to scrub — a compromised build,
a future plugin, or a hostile fork would simply skip the step and
leak authorship data.

So: server-side scrub is mandatory for every /feed/publish upload.

New package: media

  media/scrub.go
    - Scrubber type with Scrub(ctx, bytes, claimedMIME) → (clean, actualMIME)
    - ScrubImage handles JPEG/PNG/GIF/WebP in-process: decodes, optionally
      downscales to 1080px max-dim, re-encodes as JPEG Q=75. Stdlib
      jpeg.Encode emits ZERO metadata → scrub is complete by construction.
    - Sidecar client (HTTP): posts video/audio bytes to an external
      FFmpeg worker at DCHAIN_MEDIA_SIDECAR_URL
    - Magic-byte MIME detection: rejects uploads where declared MIME
      doesn't match actual bytes (prevents a PDF dressed as image/jpeg
      from bypassing the scrubber)
    - ErrSidecarUnavailable: explicit error when video arrives but no
      sidecar is wired; operator opts in to fallback via
      --allow-unscrubbed-video (default: reject)
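
A minimal sketch of the magic-byte check described above, assuming the stdlib sniffer http.DetectContentType is an acceptable detector. The names checkMIME and errMIMEMismatch are hypothetical and may not match what media/scrub.go actually uses.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// errMIMEMismatch is a hypothetical sentinel error for this sketch.
var errMIMEMismatch = errors.New("declared MIME does not match content")

// checkMIME sniffs the leading bytes (http.DetectContentType looks at no
// more than the first 512) and compares the result with the type the
// client declared. It returns the detected type either way.
func checkMIME(data []byte, claimed string) (string, error) {
	actual := http.DetectContentType(data)
	if actual != claimed {
		return actual, fmt.Errorf("%w: claimed %q, detected %q",
			errMIMEMismatch, claimed, actual)
	}
	return actual, nil
}

func main() {
	// A PDF dressed as image/jpeg is rejected before it reaches the scrubber.
	pdf := []byte("%PDF-1.7\n")
	_, err := checkMIME(pdf, "image/jpeg")
	fmt.Println(err)

	// A real JPEG signature passes.
	jpg := []byte("\xff\xd8\xff\xe0rest-of-file")
	mime, err := checkMIME(jpg, "image/jpeg")
	fmt.Println(mime, err)
}
```

Returning the detected type even on mismatch lets the caller log what the upload really was.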

  media/scrub_test.go
    - Crafted EXIF segment with "SECRETGPS-…Canon-EOS-R5" canary —
      verifies the string is gone after ScrubImage
    - Downscale test (2000×1000 → 1080×540, aspect preserved)
    - MIME-mismatch rejection
    - Magic-byte detector sanity table
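
The canary test can be sketched roughly as follows: splice an APP1 segment holding a marker string into a valid JPEG, decode and re-encode it with the stdlib, and check that the marker is gone. The canary value and all helper names here are placeholders for illustration, not the ones in scrub_test.go.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"image"
	"image/color"
	"image/jpeg"
)

// canary is a placeholder marker string for this sketch.
const canary = "CANARY-GPS-AND-CAMERA-MODEL"

// sampleJPEG encodes a small gradient so there is a valid JPEG to tamper with.
func sampleJPEG() []byte {
	img := image.NewRGBA(image.Rect(0, 0, 16, 16))
	for y := 0; y < 16; y++ {
		for x := 0; x < 16; x++ {
			img.Set(x, y, color.RGBA{uint8(x * 16), uint8(y * 16), 128, 255})
		}
	}
	var buf bytes.Buffer
	_ = jpeg.Encode(&buf, img, nil)
	return buf.Bytes()
}

// withFakeEXIF inserts an APP1 segment carrying the canary right after the
// two-byte SOI marker (FF D8).
func withFakeEXIF(jpg []byte) []byte {
	payload := append([]byte("Exif\x00\x00"), canary...)
	seg := []byte{0xFF, 0xE1, 0, 0}
	// The segment length field counts its own two bytes.
	binary.BigEndian.PutUint16(seg[2:], uint16(len(payload)+2))
	seg = append(seg, payload...)

	out := append([]byte{}, jpg[:2]...)
	out = append(out, seg...)
	return append(out, jpg[2:]...)
}

// reencode decodes the pixels and writes a fresh JPEG at quality 75.
// Only pixel data crosses the decode/encode boundary.
func reencode(in []byte) ([]byte, error) {
	img, _, err := image.Decode(bytes.NewReader(in))
	if err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, img, &jpeg.Options{Quality: 75}); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// hasCanary reports whether the marker string appears anywhere in b.
func hasCanary(b []byte) bool { return bytes.Contains(b, []byte(canary)) }

func main() {
	dirty := withFakeEXIF(sampleJPEG())
	clean, err := reencode(dirty)
	if err != nil {
		panic(err)
	}
	fmt.Println("before:", hasCanary(dirty), "after:", hasCanary(clean))
}
```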

FFmpeg sidecar — new docker/media-sidecar/

  Tiny Go HTTP service (~180 LOC, no non-stdlib deps) that shells out
  to ffmpeg with -map_metadata -1 + -map 0:v -map 0:a? to guarantee
  only video + audio streams survive (no subtitles, attached pictures,
  or data channels that could carry hidden info).

  Re-encode profile:
    video → H.264 CRF 28 preset=fast, Opus 64k, MP4 faststart
    audio → Opus 64k, Ogg container

  Dockerfile: two-stage build (Go → alpine+ffmpeg), ~90 MB image,
  non-root user, /healthz endpoint for compose probes.

  Node reaches it via DCHAIN_MEDIA_SIDECAR_URL. Without it, video uploads
  are rejected with 503 unless operator sets DCHAIN_ALLOW_UNSCRUBBED_VIDEO.
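
For orientation, the node-side call to the sidecar might look like the sketch below. It assumes the sidecar endpoints are /scrub/video and /scrub/audio and that a missing URL maps to ErrSidecarUnavailable. The function names and the in-process fake server are illustrative only, not the actual client in media/scrub.go.

```go
package main

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// ErrSidecarUnavailable mirrors the error named in this commit; its exact
// definition in the media package is assumed.
var ErrSidecarUnavailable = errors.New("media sidecar unavailable")

// scrubViaSidecar posts raw bytes to <baseURL>/scrub/<kind> and returns the
// cleaned bytes plus the Content-Type reported by the sidecar.
func scrubViaSidecar(ctx context.Context, baseURL, kind, mime string, data []byte) ([]byte, string, error) {
	if baseURL == "" {
		return nil, "", ErrSidecarUnavailable
	}
	url := strings.TrimRight(baseURL, "/") + "/scrub/" + kind
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(data))
	if err != nil {
		return nil, "", err
	}
	req.Header.Set("Content-Type", mime)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, "", fmt.Errorf("%w: %v", ErrSidecarUnavailable, err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, "", err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, "", fmt.Errorf("sidecar returned %d: %s", resp.StatusCode, body)
	}
	return body, resp.Header.Get("Content-Type"), nil
}

// demoScrub runs the client against an in-process stand-in for the sidecar.
func demoScrub() (string, string, error) {
	fake := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "video/mp4")
		_, _ = w.Write([]byte("cleaned"))
	}))
	defer fake.Close()

	out, mime, err := scrubViaSidecar(context.Background(), fake.URL, "video", "video/quicktime", []byte("raw"))
	return string(out), mime, err
}

func main() {
	out, mime, err := demoScrub()
	fmt.Println(out, mime, err)

	// No URL configured: the caller sees ErrSidecarUnavailable.
	_, _, err = scrubViaSidecar(context.Background(), "", "video", "video/mp4", nil)
	fmt.Println(errors.Is(err, ErrSidecarUnavailable))
}
```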

/feed/publish wiring

  - cfg.Scrubber is a required dependency
  - Before storing post body we call scrubber.Scrub(); attachment bytes
    + MIME are replaced with the cleaned version
  - content_hash is computed over the SCRUBBED bytes — so the on-chain
    CREATE_POST tx references exactly what readers will fetch
  - EstimatedFeeUT uses the scrubbed size, so author's fee reflects
    actual on-disk cost
  - Content-type mismatches → 400; sidecar unavailable for video → 503
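
The ordering matters: scrub first, then hash. A rough sketch, assuming SHA-256 with hex encoding for content_hash (the commit does not say which hash is used) and using hypothetical names throughout:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// attachment is a hypothetical stand-in for the post attachment type.
type attachment struct {
	Bytes []byte
	MIME  string
}

// scrubFunc is the shape assumed for the scrubber in this sketch.
type scrubFunc func(data []byte, claimedMIME string) ([]byte, string, error)

// contentHash hashes the given bytes. SHA-256 is an assumption here.
func contentHash(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// finalize replaces the attachment with its scrubbed form and only then
// computes the hash, so the hash matches what readers will fetch.
func finalize(raw attachment, scrub scrubFunc) (attachment, string, error) {
	clean, mime, err := scrub(raw.Bytes, raw.MIME)
	if err != nil {
		return attachment{}, "", err
	}
	out := attachment{Bytes: clean, MIME: mime}
	return out, contentHash(out.Bytes), nil
}

// fakeScrub pretends every upload scrubs down to the same three bytes.
func fakeScrub(data []byte, claimedMIME string) ([]byte, string, error) {
	return []byte("abc"), "image/jpeg", nil
}

func main() {
	raw := attachment{Bytes: []byte("abc plus EXIF"), MIME: "image/png"}
	out, hash, err := finalize(raw, fakeScrub)
	fmt.Println(out.MIME, len(out.Bytes), hash, err)
}
```

Any size-based fee estimate would likewise take len(out.Bytes), not the size of the original upload.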

Flags / env vars

  --feed-db / DCHAIN_FEED_DB            (existing)
  --feed-ttl-days / DCHAIN_FEED_TTL_DAYS (existing)
  --media-sidecar-url / DCHAIN_MEDIA_SIDECAR_URL   (NEW)
  --allow-unscrubbed-video / DCHAIN_ALLOW_UNSCRUBBED_VIDEO (NEW; default false)
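
One common way to wire a flag with an env-var fallback is to use the env value as the flag's default, so an explicit flag wins. The sketch below assumes that precedence. The mediaConfig type and parseMediaFlags helper are hypothetical, not the node's actual config code.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
)

// mediaConfig is a hypothetical holder for the two new settings.
type mediaConfig struct {
	SidecarURL           string
	AllowUnscrubbedVideo bool
}

// parseMediaFlags reads defaults from the environment and lets command-line
// flags override them. getenv is injected so the function is testable.
func parseMediaFlags(args []string, getenv func(string) string) (mediaConfig, error) {
	var cfg mediaConfig

	// Unset or unparsable values fall back to false (reject by default).
	envAllow, _ := strconv.ParseBool(getenv("DCHAIN_ALLOW_UNSCRUBBED_VIDEO"))

	fs := flag.NewFlagSet("node", flag.ContinueOnError)
	fs.StringVar(&cfg.SidecarURL, "media-sidecar-url",
		getenv("DCHAIN_MEDIA_SIDECAR_URL"), "base URL of the FFmpeg sidecar")
	fs.BoolVar(&cfg.AllowUnscrubbedVideo, "allow-unscrubbed-video",
		envAllow, "store video as-is when no sidecar is configured")

	err := fs.Parse(args)
	return cfg, err
}

// fakeEnv builds a getenv function backed by a map.
func fakeEnv(m map[string]string) func(string) string {
	return func(k string) string { return m[k] }
}

func main() {
	cfg, err := parseMediaFlags(
		[]string{"--media-sidecar-url=http://media-sidecar:8090"}, os.Getenv)
	fmt.Printf("%+v %v\n", cfg, err)
}
```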

Client responsibilities (for reference — client work lands in Phase C)

  Even with server-side scrub, the client should still compress aggressively
  BEFORE upload, because:
    - upload time scales with payload size: uncompressed media takes
      ~N× longer to send (mobile networks)
    - the server's 256 KiB MaxPostSize is a HARD cap — oversized uploads
      are rejected, not silently truncated
    - the on-chain fee is size-based, so users pay for every byte the
      client didn't bother to shrink

  Recommended client pipeline:
    images → expo-image-manipulator: resize max-dim 1080px, WebP or
             JPEG quality 50-60
    videos → react-native-compressor: H.264 CRF 28, 720p max, 64k audio
    audio  → expo-audio's default Opus 32k (already compressed)

  To be documented in docs/media-sidecar.md (lands with the Phase C PR).

Tests
  - go test ./... green across 6 packages (blockchain consensus identity
    media relay vm)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
vsecoder
2026-04-18 19:15:14 +03:00
parent 126658f294
commit f885264d23
8 changed files with 830 additions and 35 deletions

@@ -0,0 +1,201 @@
// Media scrubber sidecar — tiny HTTP service that re-encodes video/audio
// through ffmpeg with all metadata stripped. Runs alongside the DChain
// node in docker-compose; the node calls it via DCHAIN_MEDIA_SIDECAR_URL.
//
// Contract (matches media.Scrubber in the node):
//
//   POST /scrub/video   Content-Type: video/*   body: raw bytes
//     → 200, Content-Type: video/mp4, body: cleaned bytes
//   POST /scrub/audio   Content-Type: audio/*   body: raw bytes
//     → 200, Content-Type: audio/ogg, body: cleaned bytes
//
// ffmpeg flags of note:
//
//   -map_metadata -1      drop ALL metadata (title, author, encoder,
//                         GPS location atoms, XMP blocks, etc.)
//   -map 0:v -map 0:a?    keep only video and, if present, audio streams;
//                         drops subtitles and data channels that might
//                         carry hidden info. Caveat: ffmpeg exposes cover
//                         art as a video stream, so excluding attached
//                         pictures needs 0:V rather than 0:v.
//   -movflags +faststart+frag_keyframe
//                         faststart: put MOOV atom at the front so clients
//                         can start playback before the full download
//                         lands; frag_keyframe: start a new fragment at
//                         each video keyframe
//   -c:v libx264 -crf 28 -preset fast
//                         H.264 with aggressive-but-not-painful CRF; ~70-80%
//                         size reduction on phone-camera source
//   -c:a libopus -b:a 64k
//                         Opus at 64 kbps is transparent for speech, fine
//                         for music at feed quality
//
// Environment:
//
//   LISTEN_ADDR        default ":8090"
//   FFMPEG_BIN         default "ffmpeg" (must be in PATH)
//   MAX_INPUT_MB       default 32; reject anything larger pre-ffmpeg
//   JOB_TIMEOUT_SECS   default 60
//
// The service is deliberately dumb: no queuing, no DB, no state. If you
// need higher throughput, run N replicas behind a TCP load balancer.
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"os/exec"
	"strconv"
	"time"
)

func main() {
	addr := envOr("LISTEN_ADDR", ":8090")
	ffmpegBin := envOr("FFMPEG_BIN", "ffmpeg")
	maxInputMB := envInt("MAX_INPUT_MB", 32)
	jobTimeoutSecs := envInt("JOB_TIMEOUT_SECS", 60)

	// Fail fast if ffmpeg is missing: easier to debug at container start
	// than to surface cryptic errors per-request.
	if _, err := exec.LookPath(ffmpegBin); err != nil {
		log.Fatalf("ffmpeg not found in PATH (looked for %q): %v", ffmpegBin, err)
	}

	srv := &server{
		ffmpegBin:    ffmpegBin,
		maxInputSize: int64(maxInputMB) * 1024 * 1024,
		jobTimeout:   time.Duration(jobTimeoutSecs) * time.Second,
	}

	mux := http.NewServeMux()
	mux.HandleFunc("/scrub/video", srv.scrubVideo)
	mux.HandleFunc("/scrub/audio", srv.scrubAudio)
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		_, _ = w.Write([]byte("ok"))
	})

	log.Printf("media-sidecar: listening on %s, ffmpeg=%s, max_input=%d MiB, timeout=%ds",
		addr, ffmpegBin, maxInputMB, jobTimeoutSecs)
	if err := http.ListenAndServe(addr, mux); err != nil {
		log.Fatalf("ListenAndServe: %v", err)
	}
}

type server struct {
	ffmpegBin    string
	maxInputSize int64
	jobTimeout   time.Duration
}

func (s *server) scrubVideo(w http.ResponseWriter, r *http.Request) {
	body, err := s.readLimited(r)
	if err != nil {
		httpErr(w, err.Error(), http.StatusBadRequest)
		return
	}
	ctx, cancel := context.WithTimeout(r.Context(), s.jobTimeout)
	defer cancel()

	// Video path: re-encode with metadata strip, H.264 CRF 28, opus audio.
	// Output format is MP4 (widest client compatibility).
	args := []string{
		"-hide_banner", "-loglevel", "error",
		"-i", "pipe:0",
		"-map", "0:v", "-map", "0:a?",
		"-map_metadata", "-1",
		"-c:v", "libx264", "-preset", "fast", "-crf", "28",
		"-c:a", "libopus", "-b:a", "64k",
		"-movflags", "+faststart+frag_keyframe",
		"-f", "mp4",
		"pipe:1",
	}
	out, ffErr, err := s.runFFmpeg(ctx, args, body)
	if err != nil {
		log.Printf("video scrub failed: %v | stderr=%s", err, ffErr)
		httpErr(w, "ffmpeg failed: "+err.Error(), http.StatusUnprocessableEntity)
		return
	}
	w.Header().Set("Content-Type", "video/mp4")
	w.Header().Set("Content-Length", strconv.Itoa(len(out)))
	_, _ = w.Write(out)
}

func (s *server) scrubAudio(w http.ResponseWriter, r *http.Request) {
	body, err := s.readLimited(r)
	if err != nil {
		httpErr(w, err.Error(), http.StatusBadRequest)
		return
	}
	ctx, cancel := context.WithTimeout(r.Context(), s.jobTimeout)
	defer cancel()

	args := []string{
		"-hide_banner", "-loglevel", "error",
		"-i", "pipe:0",
		"-vn", "-map", "0:a",
		"-map_metadata", "-1",
		"-c:a", "libopus", "-b:a", "64k",
		"-f", "ogg",
		"pipe:1",
	}
	out, ffErr, err := s.runFFmpeg(ctx, args, body)
	if err != nil {
		log.Printf("audio scrub failed: %v | stderr=%s", err, ffErr)
		httpErr(w, "ffmpeg failed: "+err.Error(), http.StatusUnprocessableEntity)
		return
	}
	w.Header().Set("Content-Type", "audio/ogg")
	w.Header().Set("Content-Length", strconv.Itoa(len(out)))
	_, _ = w.Write(out)
}

// runFFmpeg feeds input to ffmpeg on stdin and returns what it wrote to
// stdout, along with captured stderr for logging. The process is killed
// if ctx expires before it finishes.
func (s *server) runFFmpeg(ctx context.Context, args []string, input []byte) ([]byte, string, error) {
	cmd := exec.CommandContext(ctx, s.ffmpegBin, args...)
	cmd.Stdin = bytes.NewReader(input)
	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	err := cmd.Run()
	if err != nil {
		return nil, stderr.String(), err
	}
	return stdout.Bytes(), stderr.String(), nil
}

// readLimited accepts only POST and reads at most maxInputSize bytes.
// It reads one extra byte so an oversized body can be told apart from
// one that is exactly at the limit.
func (s *server) readLimited(r *http.Request) ([]byte, error) {
	if r.Method != http.MethodPost {
		return nil, fmt.Errorf("method not allowed")
	}
	limited := io.LimitReader(r.Body, s.maxInputSize+1)
	buf, err := io.ReadAll(limited)
	if err != nil {
		return nil, fmt.Errorf("read body: %w", err)
	}
	if int64(len(buf)) > s.maxInputSize {
		return nil, fmt.Errorf("input exceeds %d bytes", s.maxInputSize)
	}
	return buf, nil
}

func httpErr(w http.ResponseWriter, msg string, status int) {
	w.Header().Set("Content-Type", "text/plain; charset=utf-8")
	w.WriteHeader(status)
	_, _ = w.Write([]byte(msg))
}

func envOr(k, d string) string {
	if v := os.Getenv(k); v != "" {
		return v
	}
	return d
}

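// envInt returns the integer value of env var k, or d when the variable
// is unset or does not parse as an integer.
func envInt(k string, d int) int {
	v := os.Getenv(k)
	if v == "" {
		return d
	}
	n, err := strconv.Atoi(v)
	if err != nil {
		return d
	}
	return n
}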
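
As written, both handlers answer every readLimited failure with 400, including a wrong method and an oversized body. One possible refinement, sketched here with hypothetical names (readBody, statusFor), returns a status alongside the error and uses http.MaxBytesReader with *http.MaxBytesError, which needs Go 1.19 or newer. It is a suggestion, not part of this commit.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// readBody reads up to max bytes from a POST request and reports which
// HTTP status the caller should send if it fails.
func readBody(w http.ResponseWriter, r *http.Request, max int64) ([]byte, int, error) {
	if r.Method != http.MethodPost {
		return nil, http.StatusMethodNotAllowed, errors.New("method not allowed")
	}
	buf, err := io.ReadAll(http.MaxBytesReader(w, r.Body, max))
	if err != nil {
		var tooBig *http.MaxBytesError
		if errors.As(err, &tooBig) {
			return nil, http.StatusRequestEntityTooLarge,
				fmt.Errorf("input exceeds %d bytes", max)
		}
		return nil, http.StatusBadRequest, fmt.Errorf("read body: %w", err)
	}
	return buf, http.StatusOK, nil
}

// statusFor builds an in-memory request and returns the status readBody picks.
func statusFor(method, body string, max int64) int {
	r := httptest.NewRequest(method, "/scrub/video", strings.NewReader(body))
	_, status, _ := readBody(httptest.NewRecorder(), r, max)
	return status
}

func main() {
	fmt.Println(
		statusFor(http.MethodGet, "", 4),       // wrong method
		statusFor(http.MethodPost, "12345", 4), // one byte over the limit
		statusFor(http.MethodPost, "1234", 4),  // exactly at the limit
	)
}
```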