feat(media): mandatory metadata scrubbing on /feed/publish + FFmpeg sidecar

Every photo from a phone camera ships with an EXIF block that leaks:
GPS coordinates, camera model + serial, original timestamp, software
name, author/copyright fields, sometimes an embedded thumbnail that
survives cropping. For a social feed positioned as privacy-friendly
we can't trust the client alone to scrub — a compromised build,
a future plugin, or a hostile fork would simply skip the step and
leak authorship data.

So: server-side scrub is mandatory for every /feed/publish upload.

New package: media

media/scrub.go
- Scrubber type with Scrub(ctx, bytes, claimedMIME) → (clean, actualMIME)
- ScrubImage handles JPEG/PNG/GIF/WebP in-process: decodes, optionally
  downscales to 1080px max-dim, re-encodes as JPEG Q=75. Stdlib
  jpeg.Encode emits ZERO metadata → scrub is complete by construction.
- Sidecar client (HTTP): posts video/audio bytes to an external
  FFmpeg worker at DCHAIN_MEDIA_SIDECAR_URL
- Magic-byte MIME detection: rejects uploads where the declared MIME
  doesn't match the actual bytes (prevents a PDF dressed as image/jpeg
  from bypassing the scrubber)
- ErrSidecarUnavailable: explicit error when video arrives but no
  sidecar is wired; operator opts in to fallback via
  --allow-unscrubbed-video (default: reject)

media/scrub_test.go
- Crafted EXIF segment with "SECRETGPS-…Canon-EOS-R5" canary —
  verifies the string is gone after ScrubImage
- Downscale test (2000×1000 → 1080×540, aspect preserved)
- MIME-mismatch rejection
- Magic-byte detector sanity table

FFmpeg sidecar — new docker/media-sidecar/

Tiny Go HTTP service (~180 LOC, no non-stdlib deps) that shells out
to ffmpeg with -map_metadata -1 plus explicit -map stream selectors
(video + optional audio only), so that no subtitles, attached pictures
or data channels that could carry hidden info survive.

Re-encode profile:
  video → H.264 CRF 28 preset=fast, Opus 64k, fragmented MP4 (+faststart)
  audio → Opus 64k, Ogg container

Dockerfile: two-stage build (Go → alpine+ffmpeg), ~90 MB image, non-root
user, /healthz endpoint for compose probes.

Node reaches it via DCHAIN_MEDIA_SIDECAR_URL. Without it, video uploads
are rejected with 503 unless the operator sets DCHAIN_ALLOW_UNSCRUBBED_VIDEO.

/feed/publish wiring
- cfg.Scrubber is a required dependency
- Before storing the post body we call scrubber.Scrub(); attachment bytes
  + MIME are replaced with the cleaned version
- content_hash is computed over the SCRUBBED bytes, so the on-chain
  CREATE_POST tx references exactly what readers will fetch
- EstimatedFeeUT uses the scrubbed size, so the author's fee reflects
  the actual on-disk cost
- Content-type mismatch → 400; sidecar unavailable for video → 503

Flags / env vars
  --feed-db / DCHAIN_FEED_DB (existing)
  --feed-ttl-days / DCHAIN_FEED_TTL_DAYS (existing)
  --media-sidecar-url / DCHAIN_MEDIA_SIDECAR_URL (NEW)
  --allow-unscrubbed-video / DCHAIN_ALLOW_UNSCRUBBED_VIDEO (NEW; default false)

Client responsibilities (for reference — client work lands in Phase C)
Even with the server-side scrub, the client should still compress
aggressively BEFORE upload, because:
- upload time scales with file size, which hurts on mobile networks
  when media isn't compressed first
- the server's 256 KiB MaxPostSize is a HARD cap — oversized uploads
  are rejected, not silently truncated
- the on-chain fee is size-based, so users pay for every byte the
  client didn't bother to shrink

Recommended client pipeline:
  images → expo-image-manipulator: resize max-dim 1080px, WebP or
           JPEG quality 50-60
  videos → react-native-compressor: H.264 CRF 28, 720p max, 64k audio
  audio  → expo-audio's default Opus 32k (already compressed)

To be documented in docs/media-sidecar.md (added with the Phase C PR).

Tests
- go test ./... green across 6 packages (blockchain, consensus, identity,
  media, relay, vm)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

docker/media-sidecar/Dockerfile
@@ -0,0 +1,35 @@
# media-sidecar — FFmpeg-based metadata scrubber for DChain node.
#
# Build:   docker build -t dchain/media-sidecar -f docker/media-sidecar/Dockerfile .
# Run:     docker run -p 8090:8090 dchain/media-sidecar
# Compose: see docker-compose.yml; node points DCHAIN_MEDIA_SIDECAR_URL at it.
#
# Stage 1 — build a tiny static Go binary.
FROM golang:1.22-alpine AS build
WORKDIR /src
# Copy only what we need (the sidecar main is self-contained, no module
# deps on the rest of the repo, so this is a cheap, cache-friendly build).
COPY docker/media-sidecar/main.go ./main.go
RUN go mod init dchain-media-sidecar 2>/dev/null || true
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /out/media-sidecar ./main.go

# Stage 2 — runtime with ffmpeg. Alpine has a lean ffmpeg build (~90 MB
# total image, most of it codecs we actually need).
FROM alpine:3.19
RUN apk add --no-cache ffmpeg ca-certificates \
    && addgroup -S dchain && adduser -S -G dchain dchain
COPY --from=build /out/media-sidecar /usr/local/bin/media-sidecar

USER dchain
EXPOSE 8090

# Pin sensible defaults; operator overrides via docker-compose env.
ENV LISTEN_ADDR=:8090 \
    FFMPEG_BIN=ffmpeg \
    MAX_INPUT_MB=32 \
    JOB_TIMEOUT_SECS=60

HEALTHCHECK --interval=30s --timeout=5s --start-period=5s --retries=3 \
    CMD wget -qO- http://127.0.0.1:8090/healthz || exit 1

ENTRYPOINT ["/usr/local/bin/media-sidecar"]

docker/media-sidecar/main.go
@@ -0,0 +1,201 @@
// Media scrubber sidecar — tiny HTTP service that re-encodes video/audio
// through ffmpeg with all metadata stripped. Runs alongside the DChain
// node in docker-compose; the node calls it via DCHAIN_MEDIA_SIDECAR_URL.
//
// Contract (matches media.Scrubber in the node):
//
//	POST /scrub/video   Content-Type: video/*   body: raw bytes
//	  → 200, Content-Type: video/mp4, body: cleaned bytes
//	POST /scrub/audio   Content-Type: audio/*   body: raw bytes
//	  → 200, Content-Type: audio/ogg, body: cleaned bytes
//
// Errors: 405 for anything but POST, 413 if the body exceeds MAX_INPUT_MB,
// 400 if the body can't be read, 422 if the job fails (ffmpeg error, no
// usable stream, or timeout).
//
// ffmpeg flags of note:
//
//	-map_metadata -1    drop all input metadata tags (title, author,
//	                    creation time, GPS location atoms, XMP blocks,
//	                    etc.); ffmpeg still writes its own encoder tag
//	-map_chapters -1    drop chapters as well (chapter titles are metadata)
//	-map 0:V -map 0:a?  keep only real video and (if present) audio
//	                    streams; capital V excludes attached pictures and
//	                    cover art, and subtitle/data streams are never
//	                    mapped, so nothing else can carry hidden info
//	-movflags +faststart+frag_keyframe
//	                    fragmented MP4 (moov up front, a new fragment at
//	                    each keyframe); the MP4 muxer needs seekable output
//	                    for a classic MP4, and we write to a pipe
//	-c:v libx264 -crf 28 -preset fast -pix_fmt yuv420p
//	                    h264 with aggressive-but-not-painful CRF; ~70-80%
//	                    size reduction on phone-camera source; 8-bit 4:2:0
//	                    keeps 10-bit (HDR) phone footage playable on
//	                    ordinary decoders
//	-c:a libopus -b:a 64k
//	                    opus at 64 kbps is transparent for speech, fine
//	                    for music at feed quality
//
// Input is staged in a temp file rather than piped on stdin: the MP4/MOV
// demuxer has to seek, and phone recordings often store the moov atom
// after the media data, which cannot be read from a pipe.
//
// Environment:
//
//	LISTEN_ADDR       default ":8090"
//	FFMPEG_BIN        default "ffmpeg" (must be in PATH)
//	MAX_INPUT_MB      default 32 — reject anything larger pre-ffmpeg
//	JOB_TIMEOUT_SECS  default 60
//
// The service is deliberately dumb: no queuing, no DB, no state. If you
// need higher throughput, run N replicas behind a TCP load balancer.
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"os/exec"
	"strconv"
	"time"
)

func main() {
	addr := envOr("LISTEN_ADDR", ":8090")
	ffmpegBin := envOr("FFMPEG_BIN", "ffmpeg")
	maxInputMB := envInt("MAX_INPUT_MB", 32)
	jobTimeoutSecs := envInt("JOB_TIMEOUT_SECS", 60)

	// Fail fast if ffmpeg is missing — easier to debug at container start
	// than to surface cryptic errors per-request.
	if _, err := exec.LookPath(ffmpegBin); err != nil {
		log.Fatalf("ffmpeg not found in PATH (looked for %q): %v", ffmpegBin, err)
	}

	srv := &server{
		ffmpegBin:    ffmpegBin,
		maxInputSize: int64(maxInputMB) * 1024 * 1024,
		jobTimeout:   time.Duration(jobTimeoutSecs) * time.Second,
	}

	mux := http.NewServeMux()
	mux.HandleFunc("/scrub/video", srv.scrubVideo)
	mux.HandleFunc("/scrub/audio", srv.scrubAudio)
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		_, _ = w.Write([]byte("ok"))
	})

	log.Printf("media-sidecar: listening on %s, ffmpeg=%s, max_input=%d MiB, timeout=%ds",
		addr, ffmpegBin, maxInputMB, jobTimeoutSecs)
	if err := http.ListenAndServe(addr, mux); err != nil {
		log.Fatalf("ListenAndServe: %v", err)
	}
}

type server struct {
	ffmpegBin    string
	maxInputSize int64
	jobTimeout   time.Duration
}

func (s *server) scrubVideo(w http.ResponseWriter, r *http.Request) {
	body, ok := s.readLimited(w, r)
	if !ok {
		return
	}
	ctx, cancel := context.WithTimeout(r.Context(), s.jobTimeout)
	defer cancel()
	// Video path: re-encode with metadata strip, H.264 CRF 28, opus audio.
	// Output format is MP4 (widest client compatibility).
	args := []string{
		"-map", "0:V", "-map", "0:a?",
		"-map_metadata", "-1",
		"-map_chapters", "-1",
		"-c:v", "libx264", "-preset", "fast", "-crf", "28", "-pix_fmt", "yuv420p",
		"-c:a", "libopus", "-b:a", "64k",
		"-movflags", "+faststart+frag_keyframe",
		"-f", "mp4",
		"pipe:1",
	}
	out, ffErr, err := s.runFFmpeg(ctx, args, body)
	if err != nil {
		log.Printf("video scrub failed: %v | stderr=%s", err, ffErr)
		httpErr(w, "ffmpeg failed: "+err.Error(), http.StatusUnprocessableEntity)
		return
	}
	w.Header().Set("Content-Type", "video/mp4")
	w.Header().Set("Content-Length", strconv.Itoa(len(out)))
	_, _ = w.Write(out)
}

func (s *server) scrubAudio(w http.ResponseWriter, r *http.Request) {
	body, ok := s.readLimited(w, r)
	if !ok {
		return
	}
	ctx, cancel := context.WithTimeout(r.Context(), s.jobTimeout)
	defer cancel()
	args := []string{
		"-vn", "-map", "0:a",
		"-map_metadata", "-1",
		"-map_chapters", "-1",
		"-c:a", "libopus", "-b:a", "64k",
		"-f", "ogg",
		"pipe:1",
	}
	out, ffErr, err := s.runFFmpeg(ctx, args, body)
	if err != nil {
		log.Printf("audio scrub failed: %v | stderr=%s", err, ffErr)
		httpErr(w, "ffmpeg failed: "+err.Error(), http.StatusUnprocessableEntity)
		return
	}
	w.Header().Set("Content-Type", "audio/ogg")
	w.Header().Set("Content-Length", strconv.Itoa(len(out)))
	_, _ = w.Write(out)
}

// runFFmpeg stages the input in a temp file (the MP4/MOV demuxer must be
// able to seek), runs ffmpeg with outArgs, and returns stdout and stderr.
func (s *server) runFFmpeg(ctx context.Context, outArgs []string, input []byte) ([]byte, string, error) {
	f, err := os.CreateTemp("", "scrub-in-*")
	if err != nil {
		return nil, "", fmt.Errorf("temp file: %w", err)
	}
	defer os.Remove(f.Name()) // best effort
	if _, err := f.Write(input); err != nil {
		_ = f.Close()
		return nil, "", fmt.Errorf("temp file: %w", err)
	}
	if err := f.Close(); err != nil {
		return nil, "", fmt.Errorf("temp file: %w", err)
	}

	args := append([]string{"-hide_banner", "-loglevel", "error", "-nostdin", "-i", f.Name()}, outArgs...)
	// CommandContext kills ffmpeg when the job timeout or the client's
	// request context expires.
	cmd := exec.CommandContext(ctx, s.ffmpegBin, args...)
	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return nil, stderr.String(), err
	}
	return stdout.Bytes(), stderr.String(), nil
}

// readLimited enforces POST and the MAX_INPUT_MB cap. On failure it has
// already written the HTTP error and returns ok=false.
func (s *server) readLimited(w http.ResponseWriter, r *http.Request) ([]byte, bool) {
	if r.Method != http.MethodPost {
		w.Header().Set("Allow", http.MethodPost)
		httpErr(w, "method not allowed", http.StatusMethodNotAllowed)
		return nil, false
	}
	// Read one byte past the cap so "exactly at the cap" and "over the cap"
	// can be told apart without trusting Content-Length.
	buf, err := io.ReadAll(io.LimitReader(r.Body, s.maxInputSize+1))
	if err != nil {
		httpErr(w, "read body: "+err.Error(), http.StatusBadRequest)
		return nil, false
	}
	if int64(len(buf)) > s.maxInputSize {
		httpErr(w, fmt.Sprintf("input exceeds %d bytes", s.maxInputSize), http.StatusRequestEntityTooLarge)
		return nil, false
	}
	return buf, true
}

func httpErr(w http.ResponseWriter, msg string, status int) {
	w.Header().Set("Content-Type", "text/plain; charset=utf-8")
	w.WriteHeader(status)
	_, _ = w.Write([]byte(msg))
}

func envOr(k, d string) string {
	if v := os.Getenv(k); v != "" {
		return v
	}
	return d
}

// envInt parses a positive integer from the environment, falling back to
// d (with a log line) when the variable is unset or invalid.
func envInt(k string, d int) int {
	v := os.Getenv(k)
	if v == "" {
		return d
	}
	n, err := strconv.Atoi(v)
	if err != nil || n <= 0 {
		log.Printf("ignoring invalid %s=%q, using default %d", k, v, d)
		return d
	}
	return n
}