feat(media): mandatory metadata scrubbing on /feed/publish + FFmpeg sidecar

Every photo from a phone camera ships with an EXIF block that leaks:
GPS coordinates, camera model + serial, original timestamp, software
name, author/copyright fields, sometimes an embedded thumbnail that
survives cropping. For a social feed positioned as privacy-friendly
we can't trust the client alone to scrub — a compromised build,
a future plugin, or a hostile fork would simply skip the step and
leak authorship data.

So: server-side scrub is mandatory for every /feed/publish upload.

New package: media

  media/scrub.go
    - Scrubber type with Scrub(ctx, bytes, claimedMIME) → (clean, actualMIME)
    - ScrubImage handles JPEG/PNG/GIF/WebP in-process: decodes, optionally
      downscales to 1080px max-dim, re-encodes as JPEG Q=75. Stdlib
      jpeg.Encode emits ZERO metadata → scrub is complete by construction.
    - Sidecar client (HTTP): posts video/audio bytes to an external
      FFmpeg worker at DCHAIN_MEDIA_SIDECAR_URL
    - Magic-byte MIME detection: rejects uploads where declared MIME
      doesn't match actual bytes (prevents a PDF dressed as image/jpeg
      from bypassing the scrubber)
    - ErrSidecarUnavailable: explicit error when video arrives but no
      sidecar is wired; operator opts in to fallback via
      --allow-unscrubbed-video (default: reject)

  media/scrub_test.go
    - Crafted EXIF segment with "SECRETGPS-…Canon-EOS-R5" canary —
      verifies the string is gone after ScrubImage
    - Downscale test (2000×1000 → 1080×540, aspect preserved)
    - MIME-mismatch rejection
    - Magic-byte detector sanity table

FFmpeg sidecar — new docker/media-sidecar/

  Tiny Go HTTP service (~180 LOC, no non-stdlib deps) that shells out
  to ffmpeg with -map_metadata -1 + -map 0:v -map 0:a? to guarantee
  only video + audio streams survive (no subtitles, attached pictures,
  or data channels that could carry hidden info).

  Re-encode profile:
    video → H.264 CRF 28 preset=fast, Opus 64k, MP4 faststart
    audio → Opus 64k, Ogg container

  Dockerfile: two-stage build (Go → alpine+ffmpeg), ~90 MB image, non-
  root user, /healthz endpoint for compose probes.

  Node reaches it via DCHAIN_MEDIA_SIDECAR_URL. Without it, video uploads
  are rejected with 503 unless operator sets DCHAIN_ALLOW_UNSCRUBBED_VIDEO.

/feed/publish wiring

  - cfg.Scrubber is a required dependency
  - Before storing post body we call scrubber.Scrub(); attachment bytes
    + MIME are replaced with the cleaned version
  - content_hash is computed over the SCRUBBED bytes — so the on-chain
    CREATE_POST tx references exactly what readers will fetch
  - EstimatedFeeUT uses the scrubbed size, so author's fee reflects
    actual on-disk cost
  - Content-type mismatches → 400; sidecar unavailable for video → 503

Flags / env vars

  --feed-db / DCHAIN_FEED_DB            (existing)
  --feed-ttl-days / DCHAIN_FEED_TTL_DAYS (existing)
  --media-sidecar-url / DCHAIN_MEDIA_SIDECAR_URL   (NEW)
  --allow-unscrubbed-video / DCHAIN_ALLOW_UNSCRUBBED_VIDEO (NEW; default false)

Client responsibilities (for reference — client work lands in Phase C)

  Even with server-side scrub, the client should still compress aggressively
  BEFORE upload, because:
    - upload time scales with size: uncompressed media takes ~N× longer to
      send, which hurts most on mobile networks
    - the server's 256 KiB MaxPostSize is a HARD cap — oversized uploads
      are rejected, not silently truncated
    - the on-chain fee is size-based, so users pay for every byte the
      client didn't bother to shrink

  Recommended client pipeline:
    images → expo-image-manipulator: resize max-dim 1080px, WebP or
             JPEG quality 50-60
    videos → react-native-compressor: H.264 CRF 28, 720p max, 64k audio
    audio  → expo-audio's default Opus 32k (already compressed)

  To be documented in docs/media-sidecar.md (added later with the Phase C PR).

Tests
  - go test ./... green across 6 packages (blockchain consensus identity
    media relay vm)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
vsecoder
2026-04-18 19:15:14 +03:00
parent 126658f294
commit f885264d23
8 changed files with 830 additions and 35 deletions


@@ -41,6 +41,7 @@ import (
 	"go-blockchain/consensus"
 	"go-blockchain/economy"
 	"go-blockchain/identity"
+	"go-blockchain/media"
 	"go-blockchain/node"
 	"go-blockchain/node/version"
 	"go-blockchain/p2p"
@@ -79,6 +80,8 @@ func main() {
 	mailboxDB := flag.String("mailbox-db", envOr("DCHAIN_MAILBOX_DB", "./mailboxdata"), "BadgerDB directory for relay mailbox (env: DCHAIN_MAILBOX_DB)")
 	feedDB := flag.String("feed-db", envOr("DCHAIN_FEED_DB", "./feeddata"), "BadgerDB directory for social-feed post bodies (env: DCHAIN_FEED_DB)")
 	feedTTLDays := flag.Int("feed-ttl-days", int(envUint64Or("DCHAIN_FEED_TTL_DAYS", 30)), "how long feed posts are retained before auto-eviction (env: DCHAIN_FEED_TTL_DAYS)")
+	mediaSidecarURL := flag.String("media-sidecar-url", envOr("DCHAIN_MEDIA_SIDECAR_URL", ""), "URL of the media scrubber sidecar (FFmpeg-based video/audio re-encoder). Empty = images only (env: DCHAIN_MEDIA_SIDECAR_URL)")
+	allowUnscrubbedVideo := flag.Bool("allow-unscrubbed-video", envBoolOr("DCHAIN_ALLOW_UNSCRUBBED_VIDEO", false), "accept video uploads without server-side metadata scrubbing (only when no sidecar is configured). DANGEROUS — leaves EXIF/GPS/author tags intact (env: DCHAIN_ALLOW_UNSCRUBBED_VIDEO)")
 	govContractID := flag.String("governance-contract", envOr("DCHAIN_GOVERNANCE_CONTRACT", ""), "governance contract ID for dynamic chain parameters (env: DCHAIN_GOVERNANCE_CONTRACT)")
 	joinSeedURL := flag.String("join", envOr("DCHAIN_JOIN", ""), "bootstrap from a running node: comma-separated HTTP URLs (env: DCHAIN_JOIN)")
 	// Observer mode: the node participates in the P2P network, applies
@@ -938,9 +941,22 @@ func main() {
 		},
 	}

+	// Media scrubber: strips EXIF/GPS/author/camera metadata from every
+	// uploaded image in-process, and forwards video/audio to the FFmpeg
+	// sidecar when configured. Mandatory for all /feed/publish traffic.
+	scrubber := media.NewScrubber(media.SidecarConfig{URL: *mediaSidecarURL})
+	if *mediaSidecarURL != "" {
+		log.Printf("[NODE] media sidecar: %s", *mediaSidecarURL)
+	} else {
+		log.Printf("[NODE] media sidecar: not configured (images scrubbed in-process; video/audio %s)",
+			map[bool]string{true: "stored unscrubbed (DANGEROUS)", false: "rejected"}[*allowUnscrubbedVideo])
+	}
+
 	feedConfig := node.FeedConfig{
 		Mailbox:              feedMailbox,
 		HostingRelayPub:      id.PubKeyHex(),
+		Scrubber:             scrubber,
+		AllowUnscrubbedVideo: *allowUnscrubbedVideo,
 		GetPost:              chain.Post,
 		LikeCount:            chain.LikeCount,
 		HasLiked:             chain.HasLiked,

docker/media-sidecar/Dockerfile

@@ -0,0 +1,35 @@
# media-sidecar — FFmpeg-based metadata scrubber for DChain node.
#
# Build: docker build -t dchain/media-sidecar -f docker/media-sidecar/Dockerfile .
# Run: docker run -p 8090:8090 dchain/media-sidecar
# Compose: see docker-compose.yml; node points DCHAIN_MEDIA_SIDECAR_URL at it.
#
# Stage 1 — build a tiny static Go binary.
FROM golang:1.22-alpine AS build
WORKDIR /src
# Copy only what we need (the sidecar main is self-contained, no module
# deps on the rest of the repo, so this is a cheap, cache-friendly build).
COPY docker/media-sidecar/main.go ./main.go
RUN go mod init dchain-media-sidecar 2>/dev/null || true
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /out/media-sidecar ./main.go
# Stage 2 — runtime with ffmpeg. Alpine has a lean ffmpeg build (~90 MB
# total image, most of it codecs we actually need).
FROM alpine:3.19
RUN apk add --no-cache ffmpeg ca-certificates \
&& addgroup -S dchain && adduser -S -G dchain dchain
COPY --from=build /out/media-sidecar /usr/local/bin/media-sidecar
USER dchain
EXPOSE 8090
# Pin sensible defaults; operator overrides via docker-compose env.
ENV LISTEN_ADDR=:8090 \
FFMPEG_BIN=ffmpeg \
MAX_INPUT_MB=32 \
JOB_TIMEOUT_SECS=60
HEALTHCHECK --interval=30s --timeout=5s --start-period=5s --retries=3 \
CMD wget -qO- http://127.0.0.1:8090/healthz || exit 1
ENTRYPOINT ["/usr/local/bin/media-sidecar"]

docker/media-sidecar/main.go

@@ -0,0 +1,201 @@
// Media scrubber sidecar: a tiny HTTP service that re-encodes video/audio
// through ffmpeg with all metadata stripped. Runs alongside the DChain
// node in docker-compose; the node calls it via DCHAIN_MEDIA_SIDECAR_URL.
//
// Contract (matches media.Scrubber in the node):
//
//	POST /scrub/video   Content-Type: video/*   body: raw bytes
//	  → 200, Content-Type: video/mp4, body: cleaned bytes
//	POST /scrub/audio   Content-Type: audio/*   body: raw bytes
//	  → 200, Content-Type: audio/ogg, body: cleaned bytes
//
// ffmpeg flags of note:
//
//	-map_metadata -1
//	    drop ALL metadata (title, author, encoder, GPS location atoms,
//	    XMP blocks, etc.)
//	-map 0:v -map 0:a?
//	    keep only video and, if present, audio streams; drops attached
//	    pictures, subtitles, and data channels that might carry hidden
//	    info (the trailing "?" makes the audio mapping optional)
//	-movflags +faststart
//	    put MOOV atom at the front so clients can start playback before
//	    the full download lands
//	-c:v libx264 -crf 28 -preset fast
//	    h264 with aggressive-but-not-painful CRF; ~70-80% size reduction
//	    on phone-camera source
//	-c:a libopus -b:a 64k
//	    opus at 64 kbps is transparent for speech, fine for music at
//	    feed quality
//
// Environment:
//
//	LISTEN_ADDR       default ":8090"
//	FFMPEG_BIN        default "ffmpeg" (must be in PATH)
//	MAX_INPUT_MB      default 32; anything larger is rejected pre-ffmpeg
//	JOB_TIMEOUT_SECS  default 60
//
// The service is deliberately dumb: no queuing, no DB, no state. If you
// need higher throughput, run N replicas behind a TCP load balancer.
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"os/exec"
	"strconv"
	"time"
)

func main() {
	addr := envOr("LISTEN_ADDR", ":8090")
	ffmpegBin := envOr("FFMPEG_BIN", "ffmpeg")
	maxInputMB := envInt("MAX_INPUT_MB", 32)
	jobTimeoutSecs := envInt("JOB_TIMEOUT_SECS", 60)

	// Fail fast if ffmpeg is missing; easier to debug at container start
	// than to surface cryptic errors per-request.
	if _, err := exec.LookPath(ffmpegBin); err != nil {
		log.Fatalf("ffmpeg not found in PATH (looked for %q): %v", ffmpegBin, err)
	}

	srv := &server{
		ffmpegBin:    ffmpegBin,
		maxInputSize: int64(maxInputMB) * 1024 * 1024,
		jobTimeout:   time.Duration(jobTimeoutSecs) * time.Second,
	}

	mux := http.NewServeMux()
	mux.HandleFunc("/scrub/video", srv.scrubVideo)
	mux.HandleFunc("/scrub/audio", srv.scrubAudio)
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		_, _ = w.Write([]byte("ok"))
	})

	log.Printf("media-sidecar: listening on %s, ffmpeg=%s, max_input=%d MiB, timeout=%ds",
		addr, ffmpegBin, maxInputMB, jobTimeoutSecs)
	if err := http.ListenAndServe(addr, mux); err != nil {
		log.Fatalf("ListenAndServe: %v", err)
	}
}

type server struct {
	ffmpegBin    string
	maxInputSize int64
	jobTimeout   time.Duration
}

func (s *server) scrubVideo(w http.ResponseWriter, r *http.Request) {
	body, err := s.readLimited(r)
	if err != nil {
		httpErr(w, err.Error(), http.StatusBadRequest)
		return
	}
	ctx, cancel := context.WithTimeout(r.Context(), s.jobTimeout)
	defer cancel()

	// Video path: re-encode with metadata strip, H.264 CRF 28, opus audio.
	// Output format is MP4 (widest client compatibility).
	args := []string{
		"-hide_banner", "-loglevel", "error",
		"-i", "pipe:0",
		"-map", "0:v", "-map", "0:a?",
		"-map_metadata", "-1",
		"-c:v", "libx264", "-preset", "fast", "-crf", "28",
		"-c:a", "libopus", "-b:a", "64k",
		"-movflags", "+faststart+frag_keyframe",
		"-f", "mp4",
		"pipe:1",
	}
	out, ffErr, err := s.runFFmpeg(ctx, args, body)
	if err != nil {
		log.Printf("video scrub failed: %v | stderr=%s", err, ffErr)
		httpErr(w, "ffmpeg failed: "+err.Error(), http.StatusUnprocessableEntity)
		return
	}
	w.Header().Set("Content-Type", "video/mp4")
	w.Header().Set("Content-Length", strconv.Itoa(len(out)))
	_, _ = w.Write(out)
}

func (s *server) scrubAudio(w http.ResponseWriter, r *http.Request) {
	body, err := s.readLimited(r)
	if err != nil {
		httpErr(w, err.Error(), http.StatusBadRequest)
		return
	}
	ctx, cancel := context.WithTimeout(r.Context(), s.jobTimeout)
	defer cancel()

	args := []string{
		"-hide_banner", "-loglevel", "error",
		"-i", "pipe:0",
		"-vn", "-map", "0:a",
		"-map_metadata", "-1",
		"-c:a", "libopus", "-b:a", "64k",
		"-f", "ogg",
		"pipe:1",
	}
	out, ffErr, err := s.runFFmpeg(ctx, args, body)
	if err != nil {
		log.Printf("audio scrub failed: %v | stderr=%s", err, ffErr)
		httpErr(w, "ffmpeg failed: "+err.Error(), http.StatusUnprocessableEntity)
		return
	}
	w.Header().Set("Content-Type", "audio/ogg")
	w.Header().Set("Content-Length", strconv.Itoa(len(out)))
	_, _ = w.Write(out)
}

func (s *server) runFFmpeg(ctx context.Context, args []string, input []byte) ([]byte, string, error) {
	cmd := exec.CommandContext(ctx, s.ffmpegBin, args...)
	cmd.Stdin = bytes.NewReader(input)
	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	err := cmd.Run()
	if err != nil {
		return nil, stderr.String(), err
	}
	return stdout.Bytes(), stderr.String(), nil
}

func (s *server) readLimited(r *http.Request) ([]byte, error) {
	if r.Method != http.MethodPost {
		return nil, fmt.Errorf("method not allowed")
	}
	limited := io.LimitReader(r.Body, s.maxInputSize+1)
	buf, err := io.ReadAll(limited)
	if err != nil {
		return nil, fmt.Errorf("read body: %w", err)
	}
	if int64(len(buf)) > s.maxInputSize {
		return nil, fmt.Errorf("input exceeds %d bytes", s.maxInputSize)
	}
	return buf, nil
}

func httpErr(w http.ResponseWriter, msg string, status int) {
	w.Header().Set("Content-Type", "text/plain; charset=utf-8")
	w.WriteHeader(status)
	_, _ = w.Write([]byte(msg))
}

func envOr(k, d string) string {
	if v := os.Getenv(k); v != "" {
		return v
	}
	return d
}

func envInt(k string, d int) int {
	v := os.Getenv(k)
	if v == "" {
		return d
	}
	n, err := strconv.Atoi(v)
	if err != nil {
		return d
	}
	return n
}

go.mod

@@ -1,6 +1,6 @@
 module go-blockchain

-go 1.21
+go 1.25.0

 require (
 	github.com/dgraph-io/badger/v4 v4.2.0
@@ -9,7 +9,12 @@ require (
 	github.com/libp2p/go-libp2p-pubsub v0.10.0
 	github.com/multiformats/go-multiaddr v0.12.3
 	github.com/tetratelabs/wazero v1.7.3
-	golang.org/x/crypto v0.18.0
+	golang.org/x/crypto v0.49.0
+)
+
+require (
+	golang.org/x/image v0.39.0
+	golang.org/x/telemetry v0.0.0-20260311193753-579e4da9a98c // indirect
 )

 require (
@@ -114,12 +119,12 @@ require (
 	go.uber.org/multierr v1.11.0 // indirect
 	go.uber.org/zap v1.26.0 // indirect
 	golang.org/x/exp v0.0.0-20231006140011-7918f672742d // indirect
-	golang.org/x/mod v0.13.0 // indirect
-	golang.org/x/net v0.17.0 // indirect
-	golang.org/x/sync v0.4.0 // indirect
-	golang.org/x/sys v0.16.0 // indirect
-	golang.org/x/text v0.14.0 // indirect
-	golang.org/x/tools v0.14.0 // indirect
+	golang.org/x/mod v0.34.0 // indirect
+	golang.org/x/net v0.52.0 // indirect
+	golang.org/x/sync v0.20.0 // indirect
+	golang.org/x/sys v0.42.0 // indirect
+	golang.org/x/text v0.36.0 // indirect
+	golang.org/x/tools v0.43.0 // indirect
 	gonum.org/v1/gonum v0.13.0 // indirect
 	google.golang.org/protobuf v1.31.0 // indirect
 	lukechampine.com/blake3 v1.2.1 // indirect

go.sum

@@ -123,8 +123,8 @@ github.com/google/go-cmp v0.5.0/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/
github.com/google/go-cmp v0.5.2/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= github.com/google/go-cmp v0.5.2/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/google/go-cmp v0.5.3/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= github.com/google/go-cmp v0.5.3/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/google/go-cmp v0.5.5/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= github.com/google/go-cmp v0.5.5/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/google/go-cmp v0.5.9 h1:O2Tfq5qg4qc4AmwVlvv0oLiVAGB7enBSJ2x2DqQFi38= github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI=
github.com/google/go-cmp v0.5.9/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY= github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
github.com/google/go-github v17.0.0+incompatible/go.mod h1:zLgOLi98H3fifZn+44m+umXrS52loVEgC2AApnigrVQ= github.com/google/go-github v17.0.0+incompatible/go.mod h1:zLgOLi98H3fifZn+44m+umXrS52loVEgC2AApnigrVQ=
github.com/google/go-querystring v1.0.0/go.mod h1:odCYkC5MyYFN7vkCjXpyrEuKhc/BUO6wN/zVPAxq5ck= github.com/google/go-querystring v1.0.0/go.mod h1:odCYkC5MyYFN7vkCjXpyrEuKhc/BUO6wN/zVPAxq5ck=
github.com/google/gopacket v1.1.19 h1:ves8RnFZPGiFnTS0uPQStjwru6uO6h+nlr9j6fL7kF8= github.com/google/gopacket v1.1.19 h1:ves8RnFZPGiFnTS0uPQStjwru6uO6h+nlr9j6fL7kF8=
@@ -443,11 +443,13 @@ golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550/go.mod h1:yigFU9vqHzYiE8U
golang.org/x/crypto v0.0.0-20200602180216-279210d13fed/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto= golang.org/x/crypto v0.0.0-20200602180216-279210d13fed/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto=
golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto= golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto=
golang.org/x/crypto v0.0.0-20210322153248-0c34fe9e7dc2/go.mod h1:T9bdIzuCu7OtxOm1hfPfRQxPLYneinmdGuTeoZ9dtd4= golang.org/x/crypto v0.0.0-20210322153248-0c34fe9e7dc2/go.mod h1:T9bdIzuCu7OtxOm1hfPfRQxPLYneinmdGuTeoZ9dtd4=
golang.org/x/crypto v0.18.0 h1:PGVlW0xEltQnzFZ55hkuX5+KLyrMYhHld1YHO4AKcdc= golang.org/x/crypto v0.49.0 h1:+Ng2ULVvLHnJ/ZFEq4KdcDd/cfjrrjjNSXNzxg0Y4U4=
golang.org/x/crypto v0.18.0/go.mod h1:R0j02AL6hcrfOiy9T4ZYp/rcWeMxM3L6QYxlOuEG1mg= golang.org/x/crypto v0.49.0/go.mod h1:ErX4dUh2UM+CFYiXZRTcMpEcN8b/1gxEuv3nODoYtCA=
golang.org/x/exp v0.0.0-20190121172915-509febef88a4/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA= golang.org/x/exp v0.0.0-20190121172915-509febef88a4/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA=
golang.org/x/exp v0.0.0-20231006140011-7918f672742d h1:jtJma62tbqLibJ5sFQz8bKtEM8rJBtfilJ2qTU199MI= golang.org/x/exp v0.0.0-20231006140011-7918f672742d h1:jtJma62tbqLibJ5sFQz8bKtEM8rJBtfilJ2qTU199MI=
golang.org/x/exp v0.0.0-20231006140011-7918f672742d/go.mod h1:ldy0pHrwJyGW56pPQzzkH36rKxoZW1tw7ZJpeKx+hdo= golang.org/x/exp v0.0.0-20231006140011-7918f672742d/go.mod h1:ldy0pHrwJyGW56pPQzzkH36rKxoZW1tw7ZJpeKx+hdo=
golang.org/x/image v0.39.0 h1:skVYidAEVKgn8lZ602XO75asgXBgLj9G/FE3RbuPFww=
golang.org/x/image v0.39.0/go.mod h1:sIbmppfU+xFLPIG0FoVUTvyBMmgng1/XAMhQ2ft0hpA=
golang.org/x/lint v0.0.0-20180702182130-06c8688daad7/go.mod h1:UVdnD1Gm6xHRNCYTkRU2/jEulfH38KcIWyp/GAMgvoE= golang.org/x/lint v0.0.0-20180702182130-06c8688daad7/go.mod h1:UVdnD1Gm6xHRNCYTkRU2/jEulfH38KcIWyp/GAMgvoE=
golang.org/x/lint v0.0.0-20181026193005-c67002cb31c3/go.mod h1:UVdnD1Gm6xHRNCYTkRU2/jEulfH38KcIWyp/GAMgvoE= golang.org/x/lint v0.0.0-20181026193005-c67002cb31c3/go.mod h1:UVdnD1Gm6xHRNCYTkRU2/jEulfH38KcIWyp/GAMgvoE=
golang.org/x/lint v0.0.0-20190227174305-5b3e6a55c961/go.mod h1:wehouNa3lNwaWXcvxsM5YxQ5yQlVC4a0KAMCusXpPoU= golang.org/x/lint v0.0.0-20190227174305-5b3e6a55c961/go.mod h1:wehouNa3lNwaWXcvxsM5YxQ5yQlVC4a0KAMCusXpPoU=
@@ -459,8 +461,8 @@ golang.org/x/mod v0.1.1-0.20191105210325-c90efee705ee/go.mod h1:QqPTAvyqsEbceGzB
golang.org/x/mod v0.2.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA= golang.org/x/mod v0.2.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA=
golang.org/x/mod v0.3.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA= golang.org/x/mod v0.3.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA=
golang.org/x/mod v0.4.2/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA= golang.org/x/mod v0.4.2/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA=
golang.org/x/mod v0.13.0 h1:I/DsJXRlw/8l/0c24sM9yb0T4z9liZTduXvdAWYiysY= golang.org/x/mod v0.34.0 h1:xIHgNUUnW6sYkcM5Jleh05DvLOtwc6RitGHbDk4akRI=
golang.org/x/mod v0.13.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c= golang.org/x/mod v0.34.0/go.mod h1:ykgH52iCZe79kzLLMhyCUzhMci+nQj+0XkbXpNYtVjY=
golang.org/x/net v0.0.0-20180724234803-3673e40ba225/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= golang.org/x/net v0.0.0-20180724234803-3673e40ba225/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/net v0.0.0-20180826012351-8a410e7b638d/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= golang.org/x/net v0.0.0-20180826012351-8a410e7b638d/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/net v0.0.0-20180906233101-161cd47e91fd/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= golang.org/x/net v0.0.0-20180906233101-161cd47e91fd/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
@@ -479,8 +481,8 @@ golang.org/x/net v0.0.0-20210119194325-5f4716e94777/go.mod h1:m0MpNAwzfU5UDzcl9v
golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg= golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg=
golang.org/x/net v0.0.0-20210405180319-a5a99cb37ef4/go.mod h1:p54w0d4576C0XHj96bSt6lcn1PtDYWL6XObtHCRCNQM= golang.org/x/net v0.0.0-20210405180319-a5a99cb37ef4/go.mod h1:p54w0d4576C0XHj96bSt6lcn1PtDYWL6XObtHCRCNQM=
golang.org/x/net v0.0.0-20210423184538-5f58ad60dda6/go.mod h1:OJAsFXCWl8Ukc7SiCT/9KSuxbyM7479/AVlXFRxuMCk= golang.org/x/net v0.0.0-20210423184538-5f58ad60dda6/go.mod h1:OJAsFXCWl8Ukc7SiCT/9KSuxbyM7479/AVlXFRxuMCk=
golang.org/x/net v0.17.0 h1:pVaXccu2ozPjCXewfr1S7xza/zcXTity9cCdXQYSjIM= golang.org/x/net v0.52.0 h1:He/TN1l0e4mmR3QqHMT2Xab3Aj3L9qjbhRm78/6jrW0=
golang.org/x/net v0.17.0/go.mod h1:NxSsAGuq816PNPmqtQdLE42eU2Fs7NoRIZrHJAlaCOE= golang.org/x/net v0.52.0/go.mod h1:R1MAz7uMZxVMualyPXb+VaqGSa3LIaUqk0eEt3w36Sw=
golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U= golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U=
golang.org/x/oauth2 v0.0.0-20181017192945-9dcd33a902f4/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U= golang.org/x/oauth2 v0.0.0-20181017192945-9dcd33a902f4/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U=
golang.org/x/oauth2 v0.0.0-20181203162652-d668ce993890/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U= golang.org/x/oauth2 v0.0.0-20181203162652-d668ce993890/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U=
@@ -494,8 +496,8 @@ golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJ
golang.org/x/sync v0.0.0-20190911185100-cd5d95a43a6e/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.0.0-20190911185100-cd5d95a43a6e/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20201020160332-67f06af15bc9/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.0.0-20201020160332-67f06af15bc9/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20210220032951-036812b2e83c/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.0.0-20210220032951-036812b2e83c/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.4.0 h1:zxkM55ReGkDlKSM+Fu41A+zmbZuaPVbGMzvvdUPznYQ= golang.org/x/sync v0.20.0 h1:e0PTpb7pjO8GAtTs2dQ6jYa5BWYlMuX047Dco/pItO4=
golang.org/x/sync v0.4.0/go.mod h1:FU7BRWz2tNW+3quACPkgCx/L+uEAv1htQ0V83Z9Rj+Y= golang.org/x/sync v0.20.0/go.mod h1:9xrNwdLfx4jkKbNva9FpL6vEN7evnE43NNNJQ2LF3+0=
golang.org/x/sys v0.0.0-20180810173357-98c5dad5d1a0/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= golang.org/x/sys v0.0.0-20180810173357-98c5dad5d1a0/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20180830151530-49385e6e1522/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= golang.org/x/sys v0.0.0-20180830151530-49385e6e1522/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20180909124046-d0be0721c37e/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= golang.org/x/sys v0.0.0-20180909124046-d0be0721c37e/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
@@ -517,15 +519,17 @@ golang.org/x/sys v0.0.0-20210630005230-0f9fa26af87c/go.mod h1:oPkhp1MJrh7nUepCBc
golang.org/x/sys v0.0.0-20221010170243-090e33056c14/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.0.0-20221010170243-090e33056c14/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.16.0 h1:xWw16ngr6ZMtmxDyKyIgsE93KNKz5HKmMa3b8ALHidU= golang.org/x/sys v0.42.0 h1:omrd2nAlyT5ESRdCLYdm3+fMfNFE/+Rf4bDIQImRJeo=
golang.org/x/sys v0.16.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA= golang.org/x/sys v0.42.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw=
golang.org/x/telemetry v0.0.0-20260311193753-579e4da9a98c h1:6a8FdnNk6bTXBjR4AGKFgUKuo+7GnR3FX5L7CbveeZc=
golang.org/x/telemetry v0.0.0-20260311193753-579e4da9a98c/go.mod h1:TpUTTEp9frx7rTdLpC9gFG9kdI7zVLFTFFlqaH2Cncw=
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo= golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.1-0.20180807135948-17ff2d5776d2/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= golang.org/x/text v0.3.1-0.20180807135948-17ff2d5776d2/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ= golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ= golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/text v0.14.0 h1:ScX5w1eTa3QqT8oi6+ziP7dTV1S2+ALU0bI+0zXKWiQ= golang.org/x/text v0.36.0 h1:JfKh3XmcRPqZPKevfXVpI1wXPTqbkE5f7JA92a55Yxg=
golang.org/x/text v0.14.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU= golang.org/x/text v0.36.0/go.mod h1:NIdBknypM8iqVmPiuco0Dh6P5Jcdk8lJL0CUebqK164=
golang.org/x/time v0.0.0-20180412165947-fbb02b2291d2/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ= golang.org/x/time v0.0.0-20180412165947-fbb02b2291d2/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
golang.org/x/time v0.0.0-20181108054448-85acf8d2951c/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ= golang.org/x/time v0.0.0-20181108054448-85acf8d2951c/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
golang.org/x/tools v0.0.0-20180828015842-6cd1fcedba52/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ= golang.org/x/tools v0.0.0-20180828015842-6cd1fcedba52/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
@@ -545,8 +549,8 @@ golang.org/x/tools v0.0.0-20200130002326-2f3ba24bd6e7/go.mod h1:TB2adYChydJhpapK
golang.org/x/tools v0.0.0-20200619180055-7c47624df98f/go.mod h1:EkVYQZoAsY45+roYkvgYkIh4xh/qjgUK9TdY2XT94GE= golang.org/x/tools v0.0.0-20200619180055-7c47624df98f/go.mod h1:EkVYQZoAsY45+roYkvgYkIh4xh/qjgUK9TdY2XT94GE=
golang.org/x/tools v0.0.0-20210106214847-113979e3529a/go.mod h1:emZCQorbCU4vsT4fOWvOPXz4eW1wZW4PmDk9uLelYpA= golang.org/x/tools v0.0.0-20210106214847-113979e3529a/go.mod h1:emZCQorbCU4vsT4fOWvOPXz4eW1wZW4PmDk9uLelYpA=
golang.org/x/tools v0.1.5/go.mod h1:o0xws9oXOQQZyjljx8fwUC0k7L1pTE6eaCbjGeHmOkk= golang.org/x/tools v0.1.5/go.mod h1:o0xws9oXOQQZyjljx8fwUC0k7L1pTE6eaCbjGeHmOkk=
golang.org/x/tools v0.14.0 h1:jvNa2pY0M4r62jkRQ6RwEZZyPcymeL9XZMLBbV7U2nc= golang.org/x/tools v0.43.0 h1:12BdW9CeB3Z+J/I/wj34VMl8X+fEXBxVR90JeMX5E7s=
golang.org/x/tools v0.14.0/go.mod h1:uYBEerGOWcJyEORxN+Ek8+TT266gXkNlHdJBwexUsBg= golang.org/x/tools v0.43.0/go.mod h1:uHkMso649BX2cZK6+RpuIPXS3ho2hZo4FVwfoy1vIk0=
golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=

media/scrub.go

@@ -0,0 +1,332 @@
// Package media contains metadata scrubbing and re-compression helpers for
// files uploaded to the social feed.
//
// Why this exists
// ---------------
// Every image file carries an EXIF block that can leak:
// - GPS coordinates where the photo was taken
// - Camera model, serial number, lens
// - Original timestamp (even if the user clears their clock)
// - Software name / version
// - Author / copyright fields
// - A small embedded thumbnail that may leak even after cropping
//
// Videos and audio have analogous containers (MOV/MP4 atoms, ID3 tags,
// Matroska tags). For a social feed that prides itself on privacy we
// can't trust the client to have stripped all of it — we scrub again
// on the server before persisting the file to the feed mailbox.
//
// Strategy
// --------
// Images: decode → strip any ICC profile → re-encode with the stdlib
// JPEG/PNG encoders. These encoders DO NOT emit EXIF, so re-encoding is
// a complete scrub by construction. Output is JPEG (quality 75) unless
// the input is a lossless PNG small enough to keep as PNG.
//
// Videos: require an external ffmpeg worker (the "media sidecar");
// doing this in pure Go would need a huge CGo footprint. A tiny HTTP
// contract (see docs/media-sidecar.md) lets node operators plug in
// compressO-like services behind an env var. If the sidecar is not
// configured, video uploads are rejected with ErrSidecarUnavailable
// unless the operator explicitly opts in to storing them unscrubbed
// (--allow-unscrubbed-video) and accepts that risk.
//
// Magic-byte detection: the claimed Content-Type must match what's
// actually in the bytes; mismatches are rejected (prevents a PDF
// labelled as image/jpeg from bypassing the scrubber).
package media

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"image"
	"image/jpeg"
	"image/png"
	"io"
	"net/http"
	"strings"
	"time"

	// Register decoders for the formats we accept.
	_ "image/gif"

	_ "golang.org/x/image/webp"
)

// Errors returned by scrubber.
var (
	// ErrUnsupportedMIME is returned when the caller claims a MIME we
	// don't know how to scrub.
	ErrUnsupportedMIME = errors.New("unsupported media type")

	// ErrMIMEMismatch is returned when the bytes don't match the claimed
	// MIME; this blocks a crafted upload from bypassing the scrubber.
	ErrMIMEMismatch = errors.New("actual bytes don't match claimed content-type")

	// ErrSidecarUnavailable is returned when video scrubbing was required
	// but no external worker is configured and the operator policy does
	// not allow unscrubbed video storage.
	ErrSidecarUnavailable = errors.New("media sidecar required for video scrubbing but not configured")
)

// ── Image scrubbing ────────────────────────────────────────────────────────
// ImageMaxDim caps the larger dimension of a stored image. 1080px is the
// "full-HD-ish" sweet spot — larger rarely matters on a phone feed and
// drops file size dramatically. The client is expected to have downscaled
// already (expo-image-manipulator), but we re-apply the cap server-side
// as a defence-in-depth and to guarantee uniform storage cost.
const ImageMaxDim = 1080
// ImageJPEGQuality is the re-encode quality for JPEG output. 75 balances
// perceived quality with size — below 60 artifacts become visible, above
// 85 we're paying for noise we can't see.
const ImageJPEGQuality = 75
// ScrubImage decodes src, removes all metadata (by way of re-encoding
// with the stdlib JPEG encoder), optionally downscales to ImageMaxDim,
// and returns the clean JPEG bytes + the canonical MIME the caller
// should store.
//
// claimedMIME is what the client said the file is; if the bytes don't
// match, ErrMIMEMismatch is returned. Accepts image/jpeg, image/png,
// image/gif, image/webp on input; output is always image/jpeg (one less
// branch in the reader, and no decoder has to touch EXIF).
func ScrubImage(src []byte, claimedMIME string) (out []byte, outMIME string, err error) {
actualMIME := detectMIME(src)
if !isImageMIME(actualMIME) {
return nil, "", fmt.Errorf("%w: %s", ErrUnsupportedMIME, actualMIME)
}
if claimedMIME != "" && !mimesCompatible(claimedMIME, actualMIME) {
return nil, "", fmt.Errorf("%w: claimed %s, actual %s",
ErrMIMEMismatch, claimedMIME, actualMIME)
}
img, _, err := image.Decode(bytes.NewReader(src))
if err != nil {
return nil, "", fmt.Errorf("decode image: %w", err)
}
// Downscale if needed. We use a draw-based nearest-neighbour style
// approach via stdlib to avoid pulling in x/image/draw unless we need
// higher-quality resampling. For feed thumbnails nearest is fine since
// content is typically downsampled already.
if bounds := img.Bounds(); bounds.Dx() > ImageMaxDim || bounds.Dy() > ImageMaxDim {
img = downscale(img, ImageMaxDim)
}
// Re-encode as JPEG. stdlib's jpeg.Encode writes ZERO metadata —
// no EXIF, no ICC, no XMP, no MakerNote. That's the scrub.
var buf bytes.Buffer
if err := jpeg.Encode(&buf, img, &jpeg.Options{Quality: ImageJPEGQuality}); err != nil {
return nil, "", fmt.Errorf("encode jpeg: %w", err)
}
return buf.Bytes(), "image/jpeg", nil
}
// downscale returns a new image whose larger dimension equals maxDim,
// preserving aspect ratio. Uses stdlib image.NewRGBA + a nearest-neighbour
// copy loop, which is good enough for feed images that are already compressed.
func downscale(src image.Image, maxDim int) image.Image {
	b := src.Bounds()
	w, h := b.Dx(), b.Dy()
	var nw, nh int
	if w >= h {
		nw = maxDim
		nh = h * maxDim / w
	} else {
		nh = maxDim
		nw = w * maxDim / h
	}
	// Extreme aspect ratios can round the shorter side down to 0, which
	// would produce an empty image; keep at least one pixel.
	if nw < 1 {
		nw = 1
	}
	if nh < 1 {
		nh = 1
	}
	dst := image.NewRGBA(image.Rect(0, 0, nw, nh))
	for y := 0; y < nh; y++ {
		sy := b.Min.Y + y*h/nh
		for x := 0; x < nw; x++ {
			sx := b.Min.X + x*w/nw
			dst.Set(x, y, src.At(sx, sy))
		}
	}
	return dst
}
// pngEncoder is kept for callers that explicitly want lossless output
// (future — not used by ScrubImage which always produces JPEG).
var pngEncoder = png.Encoder{CompressionLevel: png.BestCompression}
// ── MIME detection & validation ────────────────────────────────────────────
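// detectMIME inspects magic bytes to figure out what the data actually is,
// independent of what the caller claimed. It wraps stdlib
// http.DetectContentType and maps the sniffer's spellings onto the names
// used by isAudioMIME.
func detectMIME(data []byte) string {
	if len(data) == 0 {
		return ""
	}
	// http.DetectContentType recognises JPEG, PNG, GIF, WebP, MP4, WebM,
	// ID3-tagged MP3, Ogg, and WAV, but it reports Ogg as "application/ogg"
	// and WAV as "audio/wave". Ogg is treated as audio here. MP4 detection
	// keys on ftyp brands starting with "mp4", so QuickTime files may come
	// back as application/octet-stream.
	switch m := strings.SplitN(http.DetectContentType(data), ";", 2)[0]; m {
	case "application/ogg":
		return "audio/ogg"
	case "audio/wave":
		return "audio/wav"
	default:
		return m
	}
}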
func isImageMIME(m string) bool {
switch m {
case "image/jpeg", "image/png", "image/gif", "image/webp":
return true
}
return false
}
func isVideoMIME(m string) bool {
switch m {
case "video/mp4", "video/webm", "video/quicktime":
return true
}
return false
}
func isAudioMIME(m string) bool {
switch m {
case "audio/mpeg", "audio/ogg", "audio/webm", "audio/wav", "audio/mp4":
return true
}
return false
}
// mimesCompatible tolerates small aliases (image/jpg vs image/jpeg, etc.)
// so a misspelled client header doesn't cause a 400. Claimed MIME is
// the caller's; actual is from magic bytes — we trust magic bytes when
// they disagree with a known-silly alias.
func mimesCompatible(claimed, actual string) bool {
claimed = strings.ToLower(strings.TrimSpace(claimed))
if claimed == actual {
return true
}
aliases := map[string]string{
"image/jpg": "image/jpeg",
"image/x-png": "image/png",
"video/mov": "video/quicktime",
}
if canon, ok := aliases[claimed]; ok && canon == actual {
return true
}
return false
}
// ── Video scrubbing (sidecar) ──────────────────────────────────────────────
// SidecarConfig describes how to reach an external media scrubber worker
// (typically a tiny FFmpeg-wrapper HTTP service running alongside the
// node — see docs/media-sidecar.md). Leaving URL empty disables sidecar
// use; callers then decide whether to fall back to "store as-is and warn"
// or to reject video uploads entirely.
type SidecarConfig struct {
// URL is the base URL of the sidecar. Expected routes:
//
// POST /scrub/video body: raw bytes → returns scrubbed bytes
// POST /scrub/audio body: raw bytes → returns scrubbed bytes
//
// Both MUST strip metadata (-map_metadata -1 in ffmpeg terms) and
// re-encode with a sane bitrate cap (default: H.264 CRF 28 for
// video, libopus 96k for audio). See the reference implementation
// at docker/media-sidecar/ in this repo.
URL string
// Timeout guards against a hung sidecar. 30s is enough for a 5 MB
// video on modest hardware; larger inputs should be pre-compressed
// by the client.
Timeout time.Duration
// MaxInputBytes caps what we forward to the sidecar (protects
// against an attacker tying up the sidecar on a 1 GB upload).
MaxInputBytes int64
}
// Scrubber bundles image + sidecar capabilities. Create once at node
// startup and reuse.
type Scrubber struct {
sidecar SidecarConfig
http *http.Client
}
// NewScrubber returns a Scrubber. sidecar.URL may be empty (image-only
// mode); in that case Scrub returns ErrSidecarUnavailable for video and
// audio input.
func NewScrubber(sidecar SidecarConfig) *Scrubber {
if sidecar.Timeout == 0 {
sidecar.Timeout = 30 * time.Second
}
if sidecar.MaxInputBytes == 0 {
sidecar.MaxInputBytes = 16 * 1024 * 1024 // 16 MiB input → client should have shrunk
}
return &Scrubber{
sidecar: sidecar,
http: &http.Client{
Timeout: sidecar.Timeout,
},
}
}
// Scrub picks the right strategy based on the actual MIME of the bytes.
// Returns the cleaned payload and the canonical MIME to store under.
func (s *Scrubber) Scrub(ctx context.Context, src []byte, claimedMIME string) ([]byte, string, error) {
actual := detectMIME(src)
if claimedMIME != "" && !mimesCompatible(claimedMIME, actual) {
return nil, "", fmt.Errorf("%w: claimed %s, actual %s",
ErrMIMEMismatch, claimedMIME, actual)
}
switch {
case isImageMIME(actual):
// Images handled in-process, no sidecar needed.
return ScrubImage(src, claimedMIME)
case isVideoMIME(actual):
return s.scrubViaSidecar(ctx, "/scrub/video", src, actual)
case isAudioMIME(actual):
return s.scrubViaSidecar(ctx, "/scrub/audio", src, actual)
default:
return nil, "", fmt.Errorf("%w: %s", ErrUnsupportedMIME, actual)
}
}
// scrubViaSidecar POSTs src to the configured sidecar route and returns
// the response bytes. Errors:
// - ErrSidecarUnavailable if sidecar.URL is empty
// - wrapping the HTTP error otherwise
func (s *Scrubber) scrubViaSidecar(ctx context.Context, path string, src []byte, actual string) ([]byte, string, error) {
if s.sidecar.URL == "" {
return nil, "", ErrSidecarUnavailable
}
if int64(len(src)) > s.sidecar.MaxInputBytes {
return nil, "", fmt.Errorf("input exceeds sidecar max %d bytes", s.sidecar.MaxInputBytes)
}
req, err := http.NewRequestWithContext(ctx, http.MethodPost,
strings.TrimRight(s.sidecar.URL, "/")+path, bytes.NewReader(src))
if err != nil {
return nil, "", fmt.Errorf("build sidecar request: %w", err)
}
req.Header.Set("Content-Type", actual)
resp, err := s.http.Do(req)
if err != nil {
return nil, "", fmt.Errorf("call sidecar: %w", err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
body, _ := io.ReadAll(io.LimitReader(resp.Body, 4096))
return nil, "", fmt.Errorf("sidecar returned %d: %s", resp.StatusCode, string(body))
}
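	// Limit the reply we buffer, since an evil sidecar could try to amplify.
	// Read one byte past the cap so an oversized reply is rejected instead
	// of being silently truncated by io.LimitReader.
	const maxReply = 64 * 1024 * 1024 // 64 MiB hard cap
	out, err := io.ReadAll(io.LimitReader(resp.Body, maxReply+1))
	if err != nil {
		return nil, "", fmt.Errorf("read sidecar reply: %w", err)
	}
	if len(out) > maxReply {
		return nil, "", fmt.Errorf("sidecar reply exceeds %d bytes", maxReply)
	}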
respMIME := resp.Header.Get("Content-Type")
if respMIME == "" {
respMIME = actual
}
return out, strings.SplitN(respMIME, ";", 2)[0], nil
}
// IsSidecarConfigured reports whether video/audio scrubbing is available.
// Callers can use this to decide whether to accept video attachments or
// reject them with a clear "this node doesn't support video" message.
func (s *Scrubber) IsSidecarConfigured() bool {
return s.sidecar.URL != ""
}
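
detectMIME delegates to http.DetectContentType, so the names listed in isImageMIME, isVideoMIME, and isAudioMIME only match when they agree with (or are mapped onto) the sniffer's own spellings. The following is a standalone sketch, not part of the media package, that prints what the stdlib sniffer reports for a few hand-written magic-byte prefixes. sniffBase is a hypothetical stand-in for detectMIME, and the sample bytes are illustrative headers rather than complete files.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// sniffBase returns the media type reported by net/http's content sniffer
// with any "; charset=..." parameter removed. Hypothetical helper name.
func sniffBase(data []byte) string {
	if len(data) == 0 {
		return ""
	}
	return strings.SplitN(http.DetectContentType(data), ";", 2)[0]
}

func main() {
	// Header prefixes only; the trailing dots are filler bytes.
	samples := []struct {
		name string
		data []byte
	}{
		{"jpeg", []byte("\xff\xd8\xff\xe0....")},
		{"png", []byte("\x89PNG\r\n\x1a\n....")},
		{"webp", []byte("RIFF\x00\x00\x00\x00WEBPVP8 ")},
		{"webm", []byte("\x1a\x45\xdf\xa3....")},
		{"mp3", []byte("ID3\x03\x00....")},
		{"ogg", []byte("OggS\x00....")},
		{"wav", []byte("RIFF\x00\x00\x00\x00WAVEfmt ")},
		{"pdf", []byte("%PDF-1.7")},
	}
	for _, s := range samples {
		fmt.Printf("%-5s -> %s\n", s.name, sniffBase(s.data))
	}
}
```

Note that the sniffer names Ogg "application/ogg" and WAV "audio/wave", which differ from the "audio/ogg" and "audio/wav" spellings commonly sent by clients.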

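scrubViaSidecar buffers the sidecar reply through io.LimitReader. On its own, io.LimitReader stops at the cap without reporting an error, so a caller that wants to reject an oversized reply has to read one byte past the cap and check the length. A minimal sketch of that pattern, with hypothetical names (readCapped, readCappedBytes, errReplyTooLarge) that do not appear in the commit:

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// errReplyTooLarge is a hypothetical sentinel for this sketch.
var errReplyTooLarge = errors.New("reply exceeds cap")

// readCapped reads at most max bytes from r. It asks for max+1 bytes so
// that an over-long stream can be told apart from one that fits exactly.
func readCapped(r io.Reader, max int64) ([]byte, error) {
	out, err := io.ReadAll(io.LimitReader(r, max+1))
	if err != nil {
		return nil, err
	}
	if int64(len(out)) > max {
		return nil, errReplyTooLarge
	}
	return out, nil
}

// readCappedBytes is a convenience wrapper for in-memory input.
func readCappedBytes(data []byte, max int64) ([]byte, error) {
	return readCapped(bytes.NewReader(data), max)
}

func main() {
	_, err := readCappedBytes([]byte("abcdef"), 3)
	fmt.Println("6 bytes, cap 3:", err)

	out, err := readCappedBytes([]byte("abc"), 3)
	fmt.Println("3 bytes, cap 3:", string(out), err)
}
```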
149
media/scrub_test.go Normal file

@@ -0,0 +1,149 @@
package media
import (
"bytes"
"image"
"image/color"
"image/jpeg"
"testing"
)
// TestScrubImageRemovesEXIF: our scrubber re-encodes via stdlib JPEG, which
// does not preserve EXIF by construction. We verify that a crafted input
// carrying an EXIF marker produces an output without one.
func TestScrubImageRemovesEXIF(t *testing.T) {
// Build a JPEG that explicitly contains an APP1 EXIF segment.
// Structure: JPEG SOI + APP1 with "Exif\x00\x00" header + real image data.
var base bytes.Buffer
img := image.NewRGBA(image.Rect(0, 0, 8, 8))
for y := 0; y < 8; y++ {
for x := 0; x < 8; x++ {
img.Set(x, y, color.RGBA{uint8(x * 32), uint8(y * 32), 128, 255})
}
}
if err := jpeg.Encode(&base, img, &jpeg.Options{Quality: 80}); err != nil {
t.Fatalf("encode base: %v", err)
}
input := injectEXIF(t, base.Bytes())
if !bytes.Contains(input, []byte("Exif\x00\x00")) {
t.Fatalf("test setup broken: EXIF not injected")
}
// Also drop an identifiable string in the EXIF payload so we can prove
// it's gone.
if !bytes.Contains(input, []byte("SECRETGPS")) {
t.Fatalf("test setup broken: EXIF marker not injected")
}
cleaned, mime, err := ScrubImage(input, "image/jpeg")
if err != nil {
t.Fatalf("ScrubImage: %v", err)
}
if mime != "image/jpeg" {
t.Errorf("mime: got %q, want image/jpeg", mime)
}
// Verify the scrubbed output doesn't contain our canary string.
if bytes.Contains(cleaned, []byte("SECRETGPS")) {
t.Errorf("EXIF canary survived scrub — metadata not stripped")
}
// Verify the output doesn't contain the EXIF segment marker.
if bytes.Contains(cleaned, []byte("Exif\x00\x00")) {
t.Errorf("EXIF header string survived scrub")
}
// Output must still be a valid JPEG.
if _, err := jpeg.Decode(bytes.NewReader(cleaned)); err != nil {
t.Errorf("scrubbed output is not a valid JPEG: %v", err)
}
}
// injectEXIF splices a synthetic APP1 EXIF segment after the JPEG SOI.
// Segment layout: FF E1 <len_hi> <len_lo> "Exif\0\0" + arbitrary payload.
// The payload is NOT valid TIFF — that's fine; stdlib JPEG decoder skips
// unknown APP1 segments rather than aborting.
func injectEXIF(t *testing.T, src []byte) []byte {
t.Helper()
if len(src) < 2 || src[0] != 0xFF || src[1] != 0xD8 {
t.Fatalf("not a JPEG")
}
payload := []byte("Exif\x00\x00" + "SECRETGPS-51.5074N-0.1278W-Canon-EOS-R5")
segmentLen := len(payload) + 2 // +2 = 2 bytes of len field itself
var seg bytes.Buffer
seg.Write([]byte{0xFF, 0xE1})
seg.WriteByte(byte(segmentLen >> 8))
seg.WriteByte(byte(segmentLen & 0xff))
seg.Write(payload)
out := make([]byte, 0, len(src)+seg.Len())
out = append(out, src[:2]...) // SOI
out = append(out, seg.Bytes()...)
out = append(out, src[2:]...)
return out
}
// TestScrubImageMIMEMismatch: rejects bytes that don't match claimed MIME.
func TestScrubImageMIMEMismatch(t *testing.T) {
	var buf bytes.Buffer
	img := image.NewRGBA(image.Rect(0, 0, 4, 4))
	if err := jpeg.Encode(&buf, img, nil); err != nil {
		t.Fatalf("encode: %v", err)
	}
	// Claim it's a PNG.
	_, _, err := ScrubImage(buf.Bytes(), "image/png")
	if err == nil {
		t.Fatalf("expected ErrMIMEMismatch, got nil")
	}
}
// TestScrubImageDownscale: images over ImageMaxDim are shrunk.
func TestScrubImageDownscale(t *testing.T) {
// Make a 2000×1000 image — larger dim 2000 > 1080.
img := image.NewRGBA(image.Rect(0, 0, 2000, 1000))
for y := 0; y < 1000; y++ {
for x := 0; x < 2000; x++ {
img.Set(x, y, color.RGBA{128, 64, 200, 255})
}
}
var buf bytes.Buffer
if err := jpeg.Encode(&buf, img, &jpeg.Options{Quality: 80}); err != nil {
t.Fatalf("encode: %v", err)
}
cleaned, _, err := ScrubImage(buf.Bytes(), "image/jpeg")
if err != nil {
t.Fatalf("ScrubImage: %v", err)
}
decoded, err := jpeg.Decode(bytes.NewReader(cleaned))
if err != nil {
t.Fatalf("decode scrubbed: %v", err)
}
b := decoded.Bounds()
if b.Dx() > ImageMaxDim || b.Dy() > ImageMaxDim {
t.Errorf("not downscaled: got %dx%d, want max %d", b.Dx(), b.Dy(), ImageMaxDim)
}
	// Aspect ratio preserved: 2000×1000 scales to exactly 1080×540.
	if b.Dx() != ImageMaxDim || b.Dy() != ImageMaxDim/2 {
		t.Errorf("dims: got %dx%d, want %dx%d", b.Dx(), b.Dy(), ImageMaxDim, ImageMaxDim/2)
	}
}
// TestDetectMIME: a few magic-byte cases to ensure magic detection works.
func TestDetectMIME(t *testing.T) {
cases := []struct {
data []byte
want string
}{
{[]byte("\xff\xd8\xff\xe0garbage"), "image/jpeg"},
{[]byte("\x89PNG\r\n\x1a\n..."), "image/png"},
{[]byte("GIF89a..."), "image/gif"},
{[]byte{}, ""},
}
for _, tc := range cases {
got := detectMIME(tc.data)
if got != tc.want {
t.Errorf("detectMIME(%q): got %q want %q", string(tc.data[:min(len(tc.data), 12)]), got, tc.want)
}
}
}
func min(a, b int) int {
if a < b {
return a
}
return b
}
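
TestScrubImageRemovesEXIF checks for the canary with bytes.Contains over the whole file. A stricter check walks the JPEG marker segments and looks for an APP1 segment that begins with the "Exif\0\0" identifier. The standalone sketch below illustrates that approach under simplifying assumptions: it stops at the first SOS marker and does not handle fill bytes before markers. All helper names (hasExifSegment, withExif, reencode, sampleJPEG) are hypothetical and not part of the commit.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/jpeg"
)

// hasExifSegment walks the segments between SOI and SOS and reports
// whether any APP1 (0xFFE1) segment starts with "Exif\0\0".
func hasExifSegment(data []byte) bool {
	if len(data) < 4 || data[0] != 0xFF || data[1] != 0xD8 {
		return false // not a JPEG
	}
	i := 2
	for i+4 <= len(data) {
		if data[i] != 0xFF {
			return false // not positioned on a marker; give up
		}
		marker := data[i+1]
		if marker == 0xDA || marker == 0xD9 { // SOS or EOI: headers are over
			return false
		}
		segLen := int(data[i+2])<<8 | int(data[i+3]) // includes the 2 length bytes
		if segLen < 2 || i+2+segLen > len(data) {
			return false
		}
		payload := data[i+4 : i+2+segLen]
		if marker == 0xE1 && bytes.HasPrefix(payload, []byte("Exif\x00\x00")) {
			return true
		}
		i += 2 + segLen
	}
	return false
}

// withExif splices a synthetic APP1 segment right after the SOI marker.
func withExif(src []byte, note string) []byte {
	payload := append([]byte("Exif\x00\x00"), note...)
	n := len(payload) + 2
	out := append([]byte{}, src[:2]...)
	out = append(out, 0xFF, 0xE1, byte(n>>8), byte(n))
	out = append(out, payload...)
	return append(out, src[2:]...)
}

// reencode decodes a JPEG and encodes the pixels again with the stdlib encoder.
func reencode(src []byte) ([]byte, error) {
	img, err := jpeg.Decode(bytes.NewReader(src))
	if err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, img, &jpeg.Options{Quality: 75}); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// sampleJPEG returns a small blank JPEG to use as a carrier.
func sampleJPEG() []byte {
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, image.NewRGBA(image.Rect(0, 0, 8, 8)), nil); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	tagged := withExif(sampleJPEG(), "canary")
	clean, err := reencode(tagged)
	if err != nil {
		panic(err)
	}
	fmt.Println("before:", hasExifSegment(tagged), "after:", hasExifSegment(clean))
}
```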

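The 1080×540 figure checked in TestScrubImageDownscale follows from integer arithmetic on the larger side. A small sketch of that arithmetic in isolation, assuming the same rounding-down rule as downscale; targetDims is a hypothetical helper, and the clamp to one pixel is an added safeguard for extreme aspect ratios:

```go
package main

import "fmt"

// targetDims returns the output size for an image of w×h pixels so that
// the larger side equals maxDim and the aspect ratio is kept (rounded
// down). Images already within the cap are returned unchanged.
func targetDims(w, h, maxDim int) (int, int) {
	if w <= maxDim && h <= maxDim {
		return w, h
	}
	var nw, nh int
	if w >= h {
		nw, nh = maxDim, h*maxDim/w
	} else {
		nw, nh = w*maxDim/h, maxDim
	}
	// Very thin images can round the short side down to 0; keep one pixel.
	if nw < 1 {
		nw = 1
	}
	if nh < 1 {
		nh = 1
	}
	return nw, nh
}

func main() {
	for _, in := range [][2]int{{2000, 1000}, {1000, 2000}, {4032, 3024}, {800, 600}, {5000, 1}} {
		nw, nh := targetDims(in[0], in[1], 1080)
		fmt.Printf("%dx%d -> %dx%d\n", in[0], in[1], nw, nh)
	}
}
```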

@@ -29,11 +29,13 @@ package node
 // re-publish to another relay.
 import (
+	"context"
 	"crypto/sha256"
 	"encoding/base64"
 	"encoding/hex"
 	"encoding/json"
 	"fmt"
+	"log"
 	"net/http"
 	"sort"
 	"strings"
@@ -41,6 +43,7 @@ import (
 	"go-blockchain/blockchain"
 	"go-blockchain/identity"
+	"go-blockchain/media"
 	"go-blockchain/relay"
 )
@@ -53,6 +56,18 @@ type FeedConfig struct {
 	// /feed/publish so the client knows who to put in CREATE_POST tx.
 	HostingRelayPub string
+	// Scrubber strips metadata from image/video/audio attachments before
+	// they are stored. MUST be non-nil; a zero Scrubber (NewScrubber with
+	// empty sidecar URL) still handles images in-process, and only
+	// video/audio require sidecar config.
+	Scrubber *media.Scrubber
+	// AllowUnscrubbedVideo controls server behaviour when a video upload
+	// arrives and no sidecar is configured. false (default) → reject; true
+	// → store as-is with a warning log. Set via --allow-unscrubbed-video
+	// flag on the node. Leave false in production.
+	AllowUnscrubbedVideo bool
 	// Chain lookups (nil-safe; endpoints degrade gracefully).
 	GetPost func(postID string) (*blockchain.PostRecord, error)
 	LikeCount func(postID string) (uint64, error)
@@ -136,6 +151,7 @@ func feedPublish(cfg FeedConfig) http.HandlerFunc {
 		// Decode attachment.
 		var attachment []byte
+		var attachmentMIME string
 		if req.AttachmentB64 != "" {
 			b, err := base64.StdEncoding.DecodeString(req.AttachmentB64)
 			if err != nil {
@@ -145,11 +161,48 @@ func feedPublish(cfg FeedConfig) http.HandlerFunc {
 				}
 			}
 			attachment = b
+			attachmentMIME = req.AttachmentMIME
+			// MANDATORY server-side scrub: strip ALL metadata (EXIF/GPS/
+			// camera/author/ICC/etc.) and re-compress. Client is expected
+			// to have done a first pass, but we never trust it: a photo
+			// from a phone carries GPS coordinates by default and the client
+			// might forget or a hostile client might skip the scrub entirely.
+			//
+			// Images are handled in-process (stdlib re-encode to JPEG kills
+			// all metadata by construction). Videos/audio are forwarded to
+			// the media sidecar; if none is configured and the operator
+			// hasn't opted in to AllowUnscrubbedVideo, we reject.
+			if cfg.Scrubber == nil {
+				jsonErr(w, fmt.Errorf("media scrubber not configured on this node"), 503)
+				return
+			}
+			ctx, cancel := context.WithTimeout(r.Context(), 60*time.Second)
+			cleaned, newMIME, err := cfg.Scrubber.Scrub(ctx, attachment, attachmentMIME)
+			cancel()
+			if err != nil {
+				// Graceful video fallback only when explicitly allowed.
+				if err == media.ErrSidecarUnavailable && cfg.AllowUnscrubbedVideo {
+					// Keep bytes as-is (operator accepted the risk), just log.
+					log.Printf("[feed] WARNING: storing unscrubbed video — no sidecar configured (author=%s)", req.Author)
+				} else {
+					status := 400
+					if err == media.ErrSidecarUnavailable {
+						status = 503
+					}
+					jsonErr(w, fmt.Errorf("scrub attachment: %w", err), status)
+					return
+				}
+			} else {
+				attachment = cleaned
+				attachmentMIME = newMIME
+			}
 		}
-		// Content hash binds the body to the on-chain metadata. We hash
-		// content+attachment so the client can't publish body-A off-chain
-		// and commit hash-of-body-B on-chain.
+		// Content hash is computed over the scrubbed bytes, since that's what
+		// the on-chain tx will reference, and what readers fetch. Binds
+		// the body to the metadata so a misbehaving relay can't substitute
+		// a different body under the same PostID.
 		h := sha256.New()
 		h.Write([]byte(req.Content))
 		h.Write(attachment)
@@ -181,7 +234,7 @@ func feedPublish(cfg FeedConfig) http.HandlerFunc {
 			Content: req.Content,
 			ContentType: req.ContentType,
 			Attachment: attachment,
-			AttachmentMIME: req.AttachmentMIME,
+			AttachmentMIME: attachmentMIME,
 			ReplyTo: req.ReplyTo,
 			QuoteOf: req.QuoteOf,
 		}
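
The error handling in feedPublish reduces to a small decision table: no error means the scrubbed bytes are stored, a missing sidecar is either tolerated (operator opt-in) or answered with 503, and anything else is a 400. The sketch below isolates that table as a pure function so it can be checked without an HTTP server. It is an illustration under assumptions: classify, wrap, errSidecarUnavailable, and errBadInput are hypothetical stand-ins, and errors.Is is used here so that wrapped errors are also recognised, whereas the handler compares with ==.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Stand-ins for the media package sentinels (hypothetical for this sketch).
var (
	errSidecarUnavailable = errors.New("media sidecar required but not configured")
	errBadInput           = errors.New("unsupported media type")
)

// classify maps a scrub result onto the handler's behaviour.
// status == 0 means "continue publishing"; keepOriginal reports whether
// the unscrubbed bytes are stored because the operator opted in.
func classify(err error, allowUnscrubbed bool) (status int, keepOriginal bool) {
	switch {
	case err == nil:
		return 0, false
	case errors.Is(err, errSidecarUnavailable) && allowUnscrubbed:
		return 0, true
	case errors.Is(err, errSidecarUnavailable):
		return http.StatusServiceUnavailable, false
	default:
		return http.StatusBadRequest, false
	}
}

// wrap adds context the way a caller might before classifying.
func wrap(err error) error {
	return fmt.Errorf("scrub attachment: %w", err)
}

func main() {
	cases := []struct {
		err   error
		allow bool
	}{
		{nil, false},
		{errSidecarUnavailable, true},
		{wrap(errSidecarUnavailable), false},
		{errBadInput, true},
	}
	for _, c := range cases {
		status, keep := classify(c.err, c.allow)
		fmt.Printf("err=%v allow=%v -> status=%d keepOriginal=%v\n", c.err, c.allow, status, keep)
	}
}
```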