feat(media): mandatory metadata scrubbing on /feed/publish + FFmpeg sidecar

Every photo from a phone camera ships with an EXIF block that can leak:
GPS coordinates, camera model + serial, original timestamp, software
name, author/copyright fields, sometimes an embedded thumbnail that
survives cropping. For a social feed positioned as privacy-friendly
we can't trust the client alone to scrub — a compromised build,
a future plugin, or a hostile fork would simply skip the step and
leak authorship data.

So: server-side scrub is mandatory for every /feed/publish upload.

New package: media

  media/scrub.go
    - Scrubber type with Scrub(ctx, bytes, claimedMIME) → (clean, actualMIME, err)
    - ScrubImage handles JPEG/PNG/GIF/WebP in-process: decodes, optionally
      downscales to 1080px max-dim, re-encodes as JPEG Q=75. Stdlib
      jpeg.Encode emits ZERO metadata → scrub is complete by construction.
    - Sidecar client (HTTP): posts video/audio bytes to an external
      FFmpeg worker at DCHAIN_MEDIA_SIDECAR_URL
    - Magic-byte MIME detection: rejects uploads where declared MIME
      doesn't match actual bytes (prevents a PDF dressed as image/jpeg
      from bypassing the scrubber)
    - ErrSidecarUnavailable: explicit error when video arrives but no
      sidecar is wired; operator opts in to fallback via
      --allow-unscrubbed-video (default: reject)
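  For reference, a minimal sketch of the image path described above (sniff
  the real type, reject a mismatch, decode, re-encode as JPEG Q=75). It is
  not the code in media/scrub.go: scrubImage, errMIMEMismatch,
  sampleWithCanary and hasCanary are hypothetical names, sniffing is done
  with net/http's DetectContentType as a stand-in for the package's own
  magic-byte detector, and downscaling and WebP decoding are left out.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"image"
	_ "image/gif" // registers the GIF decoder with image.Decode
	"image/jpeg"  // importing it also registers the JPEG decoder
	_ "image/png" // registers the PNG decoder
	"net/http"
)

// errMIMEMismatch is a hypothetical sentinel, not the real package's error.
var errMIMEMismatch = errors.New("declared MIME does not match content")

// scrubImage sniffs the actual type, rejects a mismatch with the declared
// MIME, then decodes and re-encodes as JPEG. Only pixel data passes through
// the decoder, so source metadata segments are not carried over.
func scrubImage(raw []byte, claimedMIME string) ([]byte, string, error) {
	if actual := http.DetectContentType(raw); actual != claimedMIME {
		return nil, "", fmt.Errorf("%w: claimed %q, sniffed %q", errMIMEMismatch, claimedMIME, actual)
	}
	img, _, err := image.Decode(bytes.NewReader(raw))
	if err != nil {
		return nil, "", fmt.Errorf("decode: %w", err)
	}
	var out bytes.Buffer
	if err := jpeg.Encode(&out, img, &jpeg.Options{Quality: 75}); err != nil {
		return nil, "", fmt.Errorf("encode: %w", err)
	}
	return out.Bytes(), "image/jpeg", nil
}

// sampleWithCanary builds an 8x8 JPEG and splices a fake EXIF (APP1)
// segment carrying the canary string in right after the SOI marker.
func sampleWithCanary(canary string) []byte {
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, image.NewRGBA(image.Rect(0, 0, 8, 8)), nil); err != nil {
		panic(err)
	}
	plain := buf.Bytes()
	payload := append([]byte("Exif\x00\x00"), canary...)
	n := len(payload) + 2 // the segment length field counts its own two bytes
	seg := append([]byte{0xFF, 0xE1, byte(n >> 8), byte(n)}, payload...)
	out := append([]byte{}, plain[:2]...) // SOI
	out = append(out, seg...)
	return append(out, plain[2:]...)
}

// hasCanary reports whether the canary string appears anywhere in b.
func hasCanary(b []byte, canary string) bool {
	return bytes.Contains(b, []byte(canary))
}

func main() {
	src := sampleWithCanary("SECRETGPS")
	clean, mime, err := scrubImage(src, "image/jpeg")
	if err != nil {
		panic(err)
	}
	fmt.Println("mime:", mime)
	fmt.Println("canary in source:", hasCanary(src, "SECRETGPS"))
	fmt.Println("canary after scrub:", hasCanary(clean, "SECRETGPS"))
}
```

  The canary check mirrors the approach listed for media/scrub_test.go:
  plant a recognisable string in a metadata segment and assert it is gone
  from the re-encoded bytes.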

  media/scrub_test.go
    - Crafted EXIF segment with "SECRETGPS-…Canon-EOS-R5" canary —
      verifies the string is gone after ScrubImage
    - Downscale test (2000×1000 → 1080×540, aspect preserved)
    - MIME-mismatch rejection
    - Magic-byte detector sanity table

FFmpeg sidecar — new docker/media-sidecar/

  Tiny Go HTTP service (~180 LOC, no non-stdlib deps) that shells out
  to ffmpeg with -map_metadata -1 + -map 0:v -map 0:a? to guarantee
  only video + audio streams survive (no subtitles, attached pictures,
  or data channels that could carry hidden info).

  Re-encode profile:
    video → H.264 CRF 28 preset=fast, Opus 64k, MP4 faststart
    audio → Opus 64k, Ogg container
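  As an illustration of the invocation described above, a sketch of how
  the argument list might be assembled. This is not the sidecar's code:
  videoArgs, audioArgs and scrubVideo are hypothetical helpers, the file
  names are placeholders, and libx264/libopus are assumed to be compiled
  into the ffmpeg build in the image.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"strings"
)

// videoArgs drops global metadata, keeps only video plus optional audio
// streams, and applies the H.264 CRF 28 / Opus 64k / faststart profile.
func videoArgs(in, out string) []string {
	return []string{
		"-y", "-i", in,
		"-map_metadata", "-1", // discard container-level metadata
		"-map", "0:v", "-map", "0:a?", // video required, audio optional
		"-c:v", "libx264", "-crf", "28", "-preset", "fast",
		"-c:a", "libopus", "-b:a", "64k",
		"-movflags", "+faststart",
		out,
	}
}

// audioArgs is the audio-only counterpart; ffmpeg picks the Ogg container
// from the output file extension (for example out.ogg).
func audioArgs(in, out string) []string {
	return []string{
		"-y", "-i", in,
		"-map_metadata", "-1",
		"-map", "0:a",
		"-c:a", "libopus", "-b:a", "64k",
		out,
	}
}

// scrubVideo runs ffmpeg and returns its combined output on failure.
func scrubVideo(ctx context.Context, in, out string) error {
	cmd := exec.CommandContext(ctx, "ffmpeg", videoArgs(in, out)...)
	if b, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("ffmpeg: %w: %s", err, b)
	}
	return nil
}

func main() {
	// Print the command lines instead of running them.
	fmt.Println("ffmpeg " + strings.Join(videoArgs("in.mov", "out.mp4"), " "))
	fmt.Println("ffmpeg " + strings.Join(audioArgs("in.m4a", "out.ogg"), " "))
}
```

  Passing the arguments as a slice to exec.CommandContext avoids shell
  quoting issues, and the context lets the caller bound the transcode time.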

  Dockerfile: two-stage build (Go → alpine+ffmpeg), ~90 MB image, non-
  root user, /healthz endpoint for compose probes.

  Node reaches it via DCHAIN_MEDIA_SIDECAR_URL. Without it, video uploads
  are rejected with 503 unless operator sets DCHAIN_ALLOW_UNSCRUBBED_VIDEO.

/feed/publish wiring

  - cfg.Scrubber is a required dependency
  - Before storing the post body we call cfg.Scrubber.Scrub(); attachment
    bytes + MIME are replaced with the cleaned version
  - content_hash is computed over the SCRUBBED bytes — so the on-chain
    CREATE_POST tx references exactly what readers will fetch
  - EstimatedFeeUT uses the scrubbed size, so author's fee reflects
    actual on-disk cost
  - Content-type mismatches and other scrub failures → 400; sidecar
    unavailable for video, or no scrubber configured → 503
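  The status mapping and the hash-over-scrubbed-bytes rule above can be
  sketched as follows. The names are stand-ins: errSidecarUnavailable
  mirrors media.ErrSidecarUnavailable, errMIMEMismatch is hypothetical, and
  hex encoding of the digest is an assumption. The sketch matches with
  errors.Is, which also recognises a wrapped sentinel.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"net/http"
)

// Stand-ins for the media package's sentinel errors (assumed names).
var (
	errSidecarUnavailable = errors.New("media sidecar unavailable")
	errMIMEMismatch       = errors.New("declared MIME does not match content")
)

// scrubStatus maps a scrub error to an HTTP status: a missing sidecar is a
// server-side condition (503), anything else is treated as a bad upload (400).
func scrubStatus(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, errSidecarUnavailable):
		return http.StatusServiceUnavailable
	default:
		return http.StatusBadRequest
	}
}

// contentHash hashes the post text followed by the scrubbed attachment, so
// the digest matches the bytes readers will actually fetch.
func contentHash(content string, scrubbed []byte) string {
	h := sha256.New()
	h.Write([]byte(content))
	h.Write(scrubbed)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	wrapped := fmt.Errorf("scrub attachment: %w", errSidecarUnavailable)
	fmt.Println(scrubStatus(wrapped))         // 503
	fmt.Println(scrubStatus(errMIMEMismatch)) // 400
	fmt.Println(contentHash("hello", []byte{0xFF, 0xD8}))
}
```

  Hashing after the scrub matters because re-encoding changes the bytes: a
  digest of the original upload would not match what the relay stores.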

Flags / env vars

  --feed-db / DCHAIN_FEED_DB            (existing)
  --feed-ttl-days / DCHAIN_FEED_TTL_DAYS (existing)
  --media-sidecar-url / DCHAIN_MEDIA_SIDECAR_URL   (NEW)
  --allow-unscrubbed-video / DCHAIN_ALLOW_UNSCRUBBED_VIDEO (NEW; default false)
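  One way the two new options could be parsed, as a sketch only: the
  mediaOpts type and parseMediaOpts helper are hypothetical, and the
  precedence shown (env var supplies the default, an explicit flag
  overrides it) is an assumption rather than the node's documented
  behaviour.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
)

// mediaOpts holds the two new settings (hypothetical type).
type mediaOpts struct {
	SidecarURL           string
	AllowUnscrubbedVideo bool
}

// parseMediaOpts uses env vars as flag defaults, so an explicit flag wins.
// getenv is injected to keep the function testable.
func parseMediaOpts(args []string, getenv func(string) string) (mediaOpts, error) {
	var o mediaOpts
	// An unset or unparsable value falls back to false (reject by default).
	envAllow, _ := strconv.ParseBool(getenv("DCHAIN_ALLOW_UNSCRUBBED_VIDEO"))

	fs := flag.NewFlagSet("node", flag.ContinueOnError)
	fs.StringVar(&o.SidecarURL, "media-sidecar-url",
		getenv("DCHAIN_MEDIA_SIDECAR_URL"), "base URL of the FFmpeg sidecar")
	fs.BoolVar(&o.AllowUnscrubbedVideo, "allow-unscrubbed-video",
		envAllow, "store video as-is when no sidecar is configured")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseMediaOpts(os.Args[1:], os.Getenv)
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("sidecar=%q allowUnscrubbedVideo=%v\n", o.SidecarURL, o.AllowUnscrubbedVideo)
}
```

  With neither the flag nor the env var set, AllowUnscrubbedVideo stays
  false, which matches the default-reject behaviour described above.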

Client responsibilities (for reference — client work lands in Phase C)

  Even with server-side scrub, the client should still compress aggressively
  BEFORE upload, because:
    - upload time scales with payload size, and uncompressed media is ~N×
      larger (this matters most on mobile networks)
    - the server's 256 KiB MaxPostSize is a HARD cap — oversized uploads
      are rejected, not silently truncated
    - the on-chain fee is size-based, so users pay for every byte the
      client didn't bother to shrink

  Recommended client pipeline:
    images → expo-image-manipulator: resize max-dim 1080px, WebP or
             JPEG quality 50-60
    videos → react-native-compressor: H.264 CRF 28, 720p max, 64k audio
    audio  → expo-audio's default Opus 32k (already compressed)

  To be documented in docs/media-sidecar.md (lands with the Phase C PR).

Tests
  - go test ./... green across 6 packages (blockchain, consensus, identity,
    media, relay, vm)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
vsecoder
2026-04-18 19:15:14 +03:00
parent 126658f294
commit f885264d23
8 changed files with 830 additions and 35 deletions


@@ -29,11 +29,13 @@ package node
// re-publish to another relay.
import (
"context"
"crypto/sha256"
"encoding/base64"
"encoding/hex"
"encoding/json"
"fmt"
"log"
"net/http"
"sort"
"strings"
@@ -41,6 +43,7 @@ import (
"go-blockchain/blockchain"
"go-blockchain/identity"
"go-blockchain/media"
"go-blockchain/relay"
)
@@ -53,6 +56,18 @@ type FeedConfig struct {
// /feed/publish so the client knows who to put in CREATE_POST tx.
HostingRelayPub string
// Scrubber strips metadata from image/video/audio attachments before
// they are stored. MUST be non-nil; a zero Scrubber (NewScrubber with
// empty sidecar URL) still handles images in-process — only video/audio
// require sidecar config.
Scrubber *media.Scrubber
// AllowUnscrubbedVideo controls server behaviour when a video upload
// arrives and no sidecar is configured. false (default) → reject; true
// → store as-is with a warning log. Set via --allow-unscrubbed-video
// flag on the node. Leave false in production.
AllowUnscrubbedVideo bool
// Chain lookups (nil-safe; endpoints degrade gracefully).
GetPost func(postID string) (*blockchain.PostRecord, error)
LikeCount func(postID string) (uint64, error)
@@ -136,6 +151,7 @@ func feedPublish(cfg FeedConfig) http.HandlerFunc {
// Decode attachment.
var attachment []byte
var attachmentMIME string
if req.AttachmentB64 != "" {
b, err := base64.StdEncoding.DecodeString(req.AttachmentB64)
if err != nil {
@@ -145,11 +161,48 @@ func feedPublish(cfg FeedConfig) http.HandlerFunc {
}
}
attachment = b
attachmentMIME = req.AttachmentMIME
// MANDATORY server-side scrub: strip ALL metadata (EXIF/GPS/
// camera/author/ICC/etc.) and re-compress. Client is expected
// to have done a first pass, but we never trust it — a photo
// from a phone carries GPS coordinates by default and the client
// might forget or a hostile client might skip the scrub entirely.
//
// Images are handled in-process (stdlib re-encode to JPEG kills
// all metadata by construction). Videos/audio are forwarded to
// the media sidecar; if none is configured and the operator
// hasn't opted in to AllowUnscrubbedVideo, we reject.
if cfg.Scrubber == nil {
	jsonErr(w, fmt.Errorf("media scrubber not configured on this node"), 503)
	return
}
ctx, cancel := context.WithTimeout(r.Context(), 60*time.Second)
cleaned, newMIME, err := cfg.Scrubber.Scrub(ctx, attachment, attachmentMIME)
cancel()
if err != nil {
	// Graceful video fallback only when explicitly allowed.
	if err == media.ErrSidecarUnavailable && cfg.AllowUnscrubbedVideo {
		// Keep bytes as-is (operator accepted the risk), just log.
		log.Printf("[feed] WARNING: storing unscrubbed video — no sidecar configured (author=%s)", req.Author)
	} else {
		status := 400
		if err == media.ErrSidecarUnavailable {
			status = 503
		}
		jsonErr(w, fmt.Errorf("scrub attachment: %w", err), status)
		return
	}
} else {
	attachment = cleaned
	attachmentMIME = newMIME
}
}
-// Content hash binds the body to the on-chain metadata. We hash
-// content+attachment so the client can't publish body-A off-chain
-// and commit hash-of-body-B on-chain.
+// Content hash is computed over the scrubbed bytes — that's what
+// the on-chain tx will reference, and what readers fetch. Binds
+// the body to the metadata so a misbehaving relay can't substitute
+// a different body under the same PostID.
h := sha256.New()
h.Write([]byte(req.Content))
h.Write(attachment)
@@ -181,7 +234,7 @@ func feedPublish(cfg FeedConfig) http.HandlerFunc {
Content: req.Content,
ContentType: req.ContentType,
Attachment: attachment,
-AttachmentMIME: req.AttachmentMIME,
+AttachmentMIME: attachmentMIME,
ReplyTo: req.ReplyTo,
QuoteOf: req.QuoteOf,
}