Commit Graph

2 Commits

vsecoder
f885264d23 feat(media): mandatory metadata scrubbing on /feed/publish + FFmpeg sidecar
Every photo from a phone camera ships with an EXIF block that leaks:
GPS coordinates, camera model + serial, original timestamp, software
name, author/copyright fields, sometimes an embedded thumbnail that
survives cropping. For a social feed positioned as privacy-friendly
we can't trust the client alone to scrub — a compromised build,
a future plugin, or a hostile fork would simply skip the step and
leak authorship data.

So: server-side scrub is mandatory for every /feed/publish upload.

New package: media

  media/scrub.go
    - Scrubber type with Scrub(ctx, bytes, claimedMIME) → (clean, actualMIME)
    - ScrubImage handles JPEG/PNG/GIF/WebP in-process: decodes, optionally
      downscales to 1080px max-dim, re-encodes as JPEG Q=75. Stdlib
      jpeg.Encode emits ZERO metadata → scrub is complete by construction.
    - Sidecar client (HTTP): posts video/audio bytes to an external
      FFmpeg worker at DCHAIN_MEDIA_SIDECAR_URL
    - Magic-byte MIME detection: rejects uploads where declared MIME
      doesn't match actual bytes (prevents a PDF dressed as image/jpeg
      from bypassing the scrubber)
    - ErrSidecarUnavailable: explicit error when video arrives but no
      sidecar is wired; operator opts in to fallback via
      --allow-unscrubbed-video (default: reject)
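
  A minimal sketch of how the in-process path could look. Assumptions:
  the golang.org/x/image modules handle WebP decoding and scaling, and
  detectMIME/maxDim are illustrative names, not necessarily the real API.

    package media

    import (
        "bytes"
        "errors"
        "image"
        _ "image/gif" // register stdlib decoders
        "image/jpeg"
        _ "image/png"

        "golang.org/x/image/draw"   // scaling
        _ "golang.org/x/image/webp" // WebP decoding
    )

    const maxDim = 1080

    var ErrMIMEMismatch = errors.New("declared MIME does not match magic bytes")

    // detectMIME sniffs well-known signatures; unknown formats are rejected.
    func detectMIME(b []byte) string {
        switch {
        case len(b) >= 3 && b[0] == 0xFF && b[1] == 0xD8 && b[2] == 0xFF:
            return "image/jpeg"
        case len(b) >= 8 && string(b[:8]) == "\x89PNG\r\n\x1a\n":
            return "image/png"
        case len(b) >= 6 && (string(b[:6]) == "GIF87a" || string(b[:6]) == "GIF89a"):
            return "image/gif"
        case len(b) >= 12 && string(b[:4]) == "RIFF" && string(b[8:12]) == "WEBP":
            return "image/webp"
        }
        return ""
    }

    // ScrubImage re-encodes through the stdlib: jpeg.Encode writes pixel
    // data only (no APP1/EXIF, XMP, or ICC segments), so all metadata is
    // dropped by construction.
    func ScrubImage(raw []byte, claimedMIME string) ([]byte, string, error) {
        if actual := detectMIME(raw); actual == "" || actual != claimedMIME {
            return nil, "", ErrMIMEMismatch
        }
        src, _, err := image.Decode(bytes.NewReader(raw))
        if err != nil {
            return nil, "", err
        }
        if b := src.Bounds(); b.Dx() > maxDim || b.Dy() > maxDim {
            longest := b.Dx()
            if b.Dy() > longest {
                longest = b.Dy()
            }
            scale := float64(maxDim) / float64(longest)
            dst := image.NewRGBA(image.Rect(0, 0,
                int(float64(b.Dx())*scale), int(float64(b.Dy())*scale)))
            draw.ApproxBiLinear.Scale(dst, dst.Bounds(), src, b, draw.Over, nil)
            src = dst
        }
        var buf bytes.Buffer
        if err := jpeg.Encode(&buf, src, &jpeg.Options{Quality: 75}); err != nil {
            return nil, "", err
        }
        return buf.Bytes(), "image/jpeg", nil
    }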

  media/scrub_test.go
    - Crafted EXIF segment with "SECRETGPS-…Canon-EOS-R5" canary —
      verifies the string is gone after ScrubImage
    - Downscale test (2000×1000 → 1080×540, aspect preserved)
    - MIME-mismatch rejection
    - Magic-byte detector sanity table
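
  For reference, a stripped-down shape of the canary test, written against
  the ScrubImage sketch above (the real test crafts a fuller EXIF block):

    package media

    import (
        "bytes"
        "image"
        "image/jpeg"
        "testing"
    )

    func TestScrubImageStripsEXIF(t *testing.T) {
        // A valid metadata-free JPEG to start from.
        var src bytes.Buffer
        if err := jpeg.Encode(&src, image.NewRGBA(image.Rect(0, 0, 2, 2)), nil); err != nil {
            t.Fatal(err)
        }

        // APP1 segment: marker, big-endian length (which includes the
        // two length bytes), "Exif\0\0" header, canary payload.
        canary := []byte("SECRETGPS-canary")
        payload := append([]byte("Exif\x00\x00"), canary...)
        seg := append([]byte{0xFF, 0xE1,
            byte((len(payload) + 2) >> 8), byte(len(payload) + 2)}, payload...)

        // Splice it in directly after the 2-byte SOI marker.
        raw := append(append(src.Bytes()[:2:2], seg...), src.Bytes()[2:]...)

        clean, _, err := ScrubImage(raw, "image/jpeg")
        if err != nil {
            t.Fatal(err)
        }
        if bytes.Contains(clean, canary) {
            t.Fatal("EXIF canary survived ScrubImage")
        }
    }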

FFmpeg sidecar — new docker/media-sidecar/

  Tiny Go HTTP service (~180 LOC, no non-stdlib deps) that shells out
  to ffmpeg with -map_metadata -1 + -map 0:v -map 0:a? to guarantee
  only video + audio streams survive (no subtitles, attached pictures,
  or data channels that could carry hidden info).

  Re-encode profile:
    video → H.264 CRF 28 preset=fast, Opus 64k, MP4 faststart
    audio → Opus 64k, Ogg container
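
  A sketch of the handler core under those flags. The route, port, and
  temp-file handling are assumptions; the ffmpeg flag set mirrors the
  description above, plus -strict -2 because some ffmpeg builds still
  flag Opus-in-MP4 as experimental.

    package main

    import (
        "io"
        "log"
        "net/http"
        "os"
        "os/exec"
    )

    func scrubVideo(w http.ResponseWriter, r *http.Request) {
        // Buffer the upload to disk so ffmpeg can probe it.
        in, err := os.CreateTemp("", "scrub-in-*")
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        defer os.Remove(in.Name())
        if _, err := io.Copy(in, r.Body); err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        in.Close()
        out := in.Name() + ".mp4"
        defer os.Remove(out)

        cmd := exec.CommandContext(r.Context(), "ffmpeg", "-y",
            "-i", in.Name(),
            "-map_metadata", "-1", // drop global + per-stream metadata
            "-map", "0:v", "-map", "0:a?", // video + optional audio only
            "-c:v", "libx264", "-crf", "28", "-preset", "fast",
            "-c:a", "libopus", "-b:a", "64k", "-strict", "-2",
            "-movflags", "+faststart",
            "-f", "mp4", out)
        if err := cmd.Run(); err != nil {
            http.Error(w, "ffmpeg re-encode failed", http.StatusUnprocessableEntity)
            return
        }
        f, err := os.Open(out)
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        defer f.Close()
        w.Header().Set("Content-Type", "video/mp4")
        io.Copy(w, f)
    }

    func main() {
        http.HandleFunc("/scrub/video", scrubVideo)
        http.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
            w.WriteHeader(http.StatusOK)
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }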

  Dockerfile: two-stage build (Go → alpine+ffmpeg), ~90 MB image,
  non-root user, /healthz endpoint for compose probes.

  Node reaches it via DCHAIN_MEDIA_SIDECAR_URL. Without it, video uploads
  are rejected with 503 unless operator sets DCHAIN_ALLOW_UNSCRUBBED_VIDEO.

/feed/publish wiring

  - cfg.Scrubber is a required dependency
  - Before storing post body we call scrubber.Scrub(); attachment bytes
    + MIME are replaced with the cleaned version
  - content_hash is computed over the SCRUBBED bytes — so the on-chain
    CREATE_POST tx references exactly what readers will fetch
  - EstimatedFeeUT uses the scrubbed size, so author's fee reflects
    actual on-disk cost
  - Content-type mismatches → 400; sidecar unavailable for video → 503
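
  In sketch form (the scrubber interface, error value, and fee constant
  are stand-ins for the real media package; ctx plumbing omitted):

    package node

    import (
        "crypto/sha256"
        "encoding/hex"
        "errors"
        "net/http"
    )

    type scrubber interface {
        Scrub(raw []byte, claimedMIME string) (clean []byte, actualMIME string, err error)
    }

    var errSidecar = errors.New("sidecar unavailable") // stands in for media.ErrSidecarUnavailable

    const feeRatePerByte = 1 // hypothetical UT/byte rate

    type publishResult struct {
        ContentHash    string
        ActualMIME     string
        Size           int
        EstimatedFeeUT uint64
    }

    // scrubAndPrice shows the ordering that matters: scrub first, then
    // derive content_hash and the fee from the scrubbed bytes, so the
    // on-chain CREATE_POST tx references exactly what readers will fetch.
    func scrubAndPrice(s scrubber, raw []byte, claimedMIME string) (*publishResult, int, error) {
        clean, mime, err := s.Scrub(raw, claimedMIME)
        switch {
        case errors.Is(err, errSidecar):
            return nil, http.StatusServiceUnavailable, err // 503: video, no sidecar
        case err != nil:
            return nil, http.StatusBadRequest, err // 400: MIME mismatch etc.
        }
        sum := sha256.Sum256(clean)
        return &publishResult{
            ContentHash:    hex.EncodeToString(sum[:]),
            ActualMIME:     mime,
            Size:           len(clean),
            EstimatedFeeUT: uint64(len(clean)) * feeRatePerByte,
        }, http.StatusOK, nil
    }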

Flags / env vars

  --feed-db / DCHAIN_FEED_DB            (existing)
  --feed-ttl-days / DCHAIN_FEED_TTL_DAYS (existing)
  --media-sidecar-url / DCHAIN_MEDIA_SIDECAR_URL   (NEW)
  --allow-unscrubbed-video / DCHAIN_ALLOW_UNSCRUBBED_VIDEO (NEW; default false)
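
  Hypothetical wiring for the two new knobs (flag-over-env precedence is
  an assumption; only the names come from this list):

    package main

    import (
        "flag"
        "os"
    )

    func main() {
        sidecarURL := flag.String("media-sidecar-url",
            os.Getenv("DCHAIN_MEDIA_SIDECAR_URL"),
            "base URL of the FFmpeg scrub sidecar (empty = video rejected)")
        allowUnscrubbed := flag.Bool("allow-unscrubbed-video",
            os.Getenv("DCHAIN_ALLOW_UNSCRUBBED_VIDEO") == "true",
            "accept video uploads unscrubbed when no sidecar is configured")
        flag.Parse()
        _, _ = sidecarURL, allowUnscrubbed // threaded into node config in reality
    }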

Client responsibilities (for reference — client work lands in Phase C)

  Even with server-side scrub, the client should still compress aggressively
  BEFORE upload, because:
    - uncompressed media takes several times longer to upload, which
      hurts most on mobile networks
    - the server's 256 KiB MaxPostSize is a HARD cap — oversized uploads
      are rejected, not silently truncated
    - the on-chain fee is size-based, so users pay for every byte the
      client didn't bother to shrink

  Recommended client pipeline:
    images → expo-image-manipulator: resize max-dim 1080px, WebP or
             JPEG quality 50-60
    videos → react-native-compressor: H.264 CRF 28, 720p max, 64k audio
    audio  → expo-audio's default Opus 32k (already compressed)

  Documented in docs/media-sidecar.md (added later with Phase C PR).

Tests
  - go test ./... green across 6 packages (blockchain consensus identity
    media relay vm)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 19:15:14 +03:00
vsecoder
126658f294 feat(feed): relay body storage + HTTP endpoints (Phase B of v2.0.0)
Phase A (the previous commit) added the on-chain foundations. Phase B
is the off-chain layer: post bodies live in a BadgerDB-backed feed
mailbox, and a full HTTP surface makes the feed usable from clients.

New components

  relay/feed_mailbox.go (+ tests)
    - FeedPost: body + content-type + attachment + hashtags + thread refs
    - Store / Get / Delete with TTL-bounded eviction (30 days default)
    - View counter (IncrementView / ViewCount) — off-chain because one
      tx per view would be nonsense
    - Hashtag inverted index: auto-extracts #tokens from content on
      Store, lowercased + deduped + capped at 8/post
    - Author chrono index: PostsByAuthor returns newest-first IDs
    - RecentPostIDs: scan-by-age helper used by trending/foryou
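
  The hashtag rule in sketch form (tokenization details are a guess):

    package relay

    import "strings"

    const maxHashtagsPerPost = 8

    // extractHashtags finds #tokens, lowercases them, drops duplicates,
    // and caps the result at 8 per post.
    func extractHashtags(content string) []string {
        seen := make(map[string]struct{}, maxHashtagsPerPost)
        var tags []string
        for _, f := range strings.Fields(content) {
            if len(f) < 2 || f[0] != '#' {
                continue
            }
            tag := strings.ToLower(strings.TrimRight(f[1:], ".,!?:;"))
            if tag == "" {
                continue
            }
            if _, dup := seen[tag]; dup {
                continue
            }
            seen[tag] = struct{}{}
            tags = append(tags, tag)
            if len(tags) == maxHashtagsPerPost {
                break
            }
        }
        return tags
    }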

  node/api_feed.go
    POST /feed/publish           — author-signed body upload, returns
                                   post_id + content_hash + size +
                                   hashtags + estimated fee for the
                                   follow-up on-chain CREATE_POST tx
    GET  /feed/post/{id}         — fetch body (respects on-chain soft
                                   delete, returns 410 when deleted)
    GET  /feed/post/{id}/stats   — {views, likes, liked_by_me?}
    POST /feed/post/{id}/view    — bump the counter
    GET  /feed/author/{pub}      — chain-authoritative post list
                                   enriched with body + stats
    GET  /feed/timeline          — merged feed from people the user
                                   follows (reads chain.Following,
                                   fetches each author's recent posts;
                                   see the sketch after this list)
    GET  /feed/trending          — top-scored posts in last 24h
                                   (score = likes × 3 + views)
    GET  /feed/foryou            — simple recommendations: recent posts
                                   minus authors the user already
                                   follows, already-liked posts, and
                                   own posts; ranked by engagement
    GET  /feed/hashtag/{tag}     — posts tagged with the given #tag
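
  The timeline merge, sketched with assumed types (the real FeedPost
  lives in relay; field names and the per-author limit are invented):

    package node

    import (
        "sort"
        "time"
    )

    type feedPost struct {
        ID        string
        Author    string
        CreatedAt time.Time
    }

    // timeline gathers each followed author's recent posts and orders
    // the union newest-first.
    func timeline(following []string,
        recentByAuthor func(author string, limit int) []feedPost) []feedPost {

        var merged []feedPost
        for _, author := range following {
            merged = append(merged, recentByAuthor(author, 50)...)
        }
        sort.Slice(merged, func(i, j int) bool {
            return merged[i].CreatedAt.After(merged[j].CreatedAt)
        })
        return merged
    }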

  cmd/node/main.go wiring
    - --feed-db flag (DCHAIN_FEED_DB) + --feed-ttl-days (DCHAIN_FEED_TTL_DAYS)
    - Opens FeedMailbox + registers FeedRoutes alongside RelayRoutes
    - Threads chain.Post / LikeCount / HasLiked / PostsByAuthor / Following
      into FeedConfig so HTTP handlers can merge on-chain metadata with
      off-chain body+stats.

Auth & safety
  - POST /feed/publish: Ed25519 signature over "publish:<post_id>:
    <content_sha256_hex>:<ts>"; ±5-minute skew window for anti-replay.
  - content_hash binds body to the on-chain tx — you can't publish
    body-A off-chain and commit hash-of-body-B on-chain.
  - Writes wrapped in withSubmitTxGuards (rate-limit + size cap), reads
    in withReadLimit — same guards as /relay.
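
  The publish check in sketch form. The signed-message layout is quoted
  from above; the helper name and unix-seconds reading of <ts> are
  assumptions.

    package node

    import (
        "crypto/ed25519"
        "errors"
        "fmt"
        "time"
    )

    const maxSkew = 5 * time.Minute

    func verifyPublish(pub ed25519.PublicKey, postID, contentHashHex string,
        ts int64, sig []byte) error {

        // Anti-replay: reject anything outside the ±5-minute window.
        if skew := time.Since(time.Unix(ts, 0)); skew > maxSkew || skew < -maxSkew {
            return errors.New("timestamp outside allowed skew")
        }
        msg := fmt.Sprintf("publish:%s:%s:%d", postID, contentHashHex, ts)
        if !ed25519.Verify(pub, []byte(msg), sig) {
            return errors.New("invalid signature")
        }
        return nil
    }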

Trending / recommendations
  - V1 heuristic (likes × 3 + views) + time window. Documented as
    v2.2.0 "Feed algorithm" candidate for a proper ranking layer
    (half-life decay, follow-of-follow boost, hashtag-based
    collaborative filtering).
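
  The v1 heuristic is small enough to state as code (the scored type is
  illustrative; callers pre-filter to the 24h window):

    package node

    import "sort"

    type scored struct {
        PostID string
        Likes  uint64
        Views  uint64
    }

    // rankTrending orders posts by score = likes*3 + views, highest first.
    func rankTrending(posts []scored) []scored {
        score := func(p scored) uint64 { return p.Likes*3 + p.Views }
        sort.Slice(posts, func(i, j int) bool {
            return score(posts[i]) > score(posts[j])
        })
        return posts
    }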

Tests
  - Store round-trip, size enforcement, hashtag indexing (case-insensitive
    + dedup), view counter increments, author chrono order, delete
    cleans all indices, RecentPostIDs time-window filter.
  - Full go test ./... is green (blockchain + consensus + identity +
    relay + vm all pass).

Next (Phase C): client Feed tab — composer, timeline, post detail,
profile follow, For You + Trending screens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 18:52:22 +03:00