Observability (OpenTelemetry)

Vendor-neutral observability with OpenTelemetry: gRPC-based trace and metric export, automatic instrumentation for HTTP, database, cache, and queue layers, and configurable sampling.

The observability system uses OpenTelemetry (OTel) to produce traces and metrics from both the API server and the background worker. It is designed from the ground up to be platform-agnostic — no vendor-specific SDKs, no proprietary codecs. Switch backends by changing a single environment variable.

Design Principles

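  1. Vendor-Neutral: Only official @opentelemetry/* packages, plus integrations published by the maintainers of the instrumented libraries (@fastify/otel, @prisma/instrumentation, bullmq-otel). Zero lock-in to Signoz, Grafana, Datadog, Sentry, or any other backend.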
  2. gRPC First: OTLP over gRPC (port 4317) for high-throughput, low-latency export. Suitable for financial-system traffic volumes from day one.
  3. Automatic Instrumentation: Traces and metrics are produced by instrumentation wrappers — no manual span creation for standard operations.
  4. Separate Service Identity: The API server and background worker each report as distinct services with their own resource attributes, making it easy to distinguish trace origin in any dashboard.
  5. Graceful Lifecycle: Telemetry SDK starts before application code and shuts down after the server closes, so buffered spans and metrics are flushed rather than dropped during deployments.

Architecture

┌─────────────────────┐    ┌─────────────────────────┐
│   API Server        │    │   Background Worker     │
│   (remote-eaze-api) │    │   (remote-eaze-worker)  │
│                     │    │                         │
│  ┌───────────────┐  │    │  ┌───────────────────┐  │
│  │  NodeSDK      │  │    │  │  NodeSDK          │  │
│  │  - HTTP       │  │    │  │  - Prisma         │  │
│  │  - Fastify    │  │    │  │  - IORedis        │  │
│  │  - Prisma     │  │    │  │  - Pino           │  │
│  │  - IORedis    │  │    │  │  - BullMQ Otel    │  │
│  │  - Pino       │  │    │  └───────┬───────────┘  │
│  └───────┬───────┘  │    └──────────┼──────────────┘
└──────────┼──────────┘               │
           │                          │
           │      OTLP / gRPC         │
           │      (port 4317)         │
           └──────────┬───────────────┘
                      │
                      ▼
         ┌────────────────────────┐
         │  OTLP-Compatible       │
         │  Backend               │
         │  (Signoz / Grafana /   │
         │   Datadog / New Relic  │
         │   / Jaeger / etc.)     │
         └────────────────────────┘

Platform Compatibility

The system exports standard OTLP over gRPC. Any backend that accepts OTLP gRPC can consume the data:

| Platform | Endpoint Configuration |
| --- | --- |
| Signoz | OTEL_EXPORTER_OTLP_ENDPOINT=http://signoz-otel-collector:4317 |
| Grafana Tempo | OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4317 |
| Datadog | Datadog Agent OTLP intake: http://datadog-agent:4317 |
| New Relic | OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.nr-data.net:4317 + API key header |
| Jaeger | Jaeger OTLP gRPC receiver (v1.35+) |
| AWS X-Ray | Via AWS Distro for OpenTelemetry Collector |
| Google Cloud Trace | Via OTel Collector with Google Cloud exporter |

No application code changes are required: set the endpoint environment variable, plus any authentication header the platform requires.

OTel Collector Sidecar

In production, an OpenTelemetry Collector sidecar will be deployed alongside each service. The collector handles authentication headers, batch processing, retry logic, and multi-backend routing. The current direct-export design allows the team to select any platform first, then add the collector sidecar configuration once the platform is chosen.

Service Configuration

API Server (remote-eaze-api)

The API service instruments all layers that handle inbound traffic and external calls:

| Instrumentation | What It Captures |
| --- | --- |
| HTTP (@opentelemetry/instrumentation-http) | Inbound/outbound HTTP requests. Health endpoint (/health) excluded to reduce noise. |
| Fastify (@fastify/otel) | Route handlers, middleware, and request lifecycle. Registered on initialization. |
| Prisma (@prisma/instrumentation) | Database queries: query text, model, duration. |
| IORedis (@opentelemetry/instrumentation-ioredis) | Redis commands: gets, sets, pub/sub, etc. |
| Pino (@opentelemetry/instrumentation-pino) | Log correlation: trace and span IDs injected into log entries. |

Health endpoint exclusions:

HTTP instrumentation:   ignoreIncomingRequestHook → req.url === "/health"
Fastify instrumentation: ignorePaths → "/health"

This prevents health-check spam from polluting trace data in load-balanced environments.
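
Both exclusions reduce to a predicate over the request URL. The sketch below is one way to write that predicate, not the project's actual code: `RequestLike`, `HEALTH_PATHS`, and `isHealthCheck` are hypothetical names, and stripping the query string goes slightly beyond the exact-match rule quoted above (an assumption, made so that probes such as /health?probe=lb are also skipped).

```typescript
// Hypothetical helper: decides whether a request should be left out of traces.
// Only the `url` field is needed, which Node's IncomingMessage provides.
interface RequestLike {
  url?: string;
}

// Assumption: a single health route, as described in the text above.
const HEALTH_PATHS: ReadonlySet<string> = new Set(["/health"]);

function isHealthCheck(req: RequestLike): boolean {
  if (!req.url) return false;
  // Drop any query string so "/health?probe=lb" is treated like "/health".
  const queryStart = req.url.indexOf("?");
  const path = queryStart === -1 ? req.url : req.url.slice(0, queryStart);
  return HEALTH_PATHS.has(path);
}

// Option object using the hook name quoted above; handing it to the HTTP
// instrumentation constructor is left out of this sketch.
const httpInstrumentationOptions = {
  ignoreIncomingRequestHook: (req: RequestLike): boolean => isHealthCheck(req),
};
```

Keeping the rule in one function means the HTTP hook and the Fastify ignorePaths setting cannot drift apart if more probe routes are added later.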

Background Worker (remote-eaze-worker)

The worker service has a focused instrumentation set — no HTTP or Fastify, since it processes jobs rather than serving requests:

| Instrumentation | What It Captures |
| --- | --- |
| Prisma (@prisma/instrumentation) | Database queries within job handlers. |
| IORedis (@opentelemetry/instrumentation-ioredis) | Redis commands: queue state, caching, pub/sub. |
| Pino (@opentelemetry/instrumentation-pino) | Log correlation within job processing. |
| BullMQ (bullmq-otel) | Job-level traces: queue name, job ID, attempts, duration. |

The BullMQ integration is not an OpenTelemetry instrumentation package. It is a BullMQ-specific wrapper (BullMQOtel) passed to Queue and Worker constructors to create spans for each job lifecycle event.

gRPC vs HTTP Export

| Factor | gRPC (port 4317) | HTTP (port 4318) |
| --- | --- | --- |
| Encoding | Binary protobuf | JSON or protobuf |
| Transport | HTTP/2 multiplexed | HTTP/1.1 per-request |
| Latency | Lower | Higher |
| Throughput | Higher | Lower |
| Connection | Persistent, multiplexed | Per-request |
| Compression | Built-in gzip support | Varies |

The implementation uses gRPC because financial systems produce high trace volumes and benefit from lower serialization and network overhead. Port 4317 is the OTel convention for gRPC receivers.

Sampling Strategy

Sampling controls what percentage of traces are collected, balancing observability with storage cost.

OTEL_SAMPLING_RATIO=1.0    → 100% of traces (development default)
OTEL_SAMPLING_RATIO=0.1    → 10% of traces (typical production)
OTEL_SAMPLING_RATIO=0.01   →  1% of traces (high-volume production)

The system uses ParentBasedSampler with TraceIdRatioBasedSampler as the root:

Incoming request (no parent)
        │
        ▼
  TraceIdRatioBasedSampler
  (decides based on OTEL_SAMPLING_RATIO)
        │
        ▼
  Sampled? ──Yes──→ All child spans are sampled
        │
       No
        │
        ▼
  Entire trace discarded

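Parent-based sampling keeps distributed traces complete: if a root span is sampled, all of its children are sampled too, and a request that arrives with a parent context follows the parent's decision instead of the ratio. As long as every service propagates context, there are no partial traces with missing spans.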
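
The decision flow can be illustrated in a few lines. This is a simplified model of parent-based, trace-ID-ratio sampling, not the SDK's implementation: `ParentContext`, `foldTraceId`, `sampleRoot`, and `shouldSample` are hypothetical names, and the folding scheme is only one plausible way to map a trace ID onto the ratio.

```typescript
// Simplified illustration of parent-based, trace-ID-ratio sampling.
// Not the SDK's implementation; all names here are hypothetical.

interface ParentContext {
  sampled: boolean; // decision already made upstream
}

// Fold a 32-hex-character trace ID into one unsigned 32-bit number.
function foldTraceId(traceId: string): number {
  let acc = 0;
  for (let pos = 0; pos < traceId.length; pos += 8) {
    const part = parseInt(traceId.slice(pos, pos + 8), 16);
    acc = (acc ^ part) >>> 0;
  }
  return acc;
}

// Root decision: deterministic for a given trace ID and ratio.
function sampleRoot(traceId: string, ratio: number): boolean {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  const upperBound = Math.floor(ratio * 0xffffffff);
  return foldTraceId(traceId) < upperBound;
}

// Parent-based wrapper: children inherit, roots consult the ratio.
function shouldSample(traceId: string, ratio: number, parent?: ParentContext): boolean {
  if (parent) return parent.sampled;
  return sampleRoot(traceId, ratio);
}
```

Because the root decision depends only on the trace ID and the ratio, every service that sees the same trace ID with the same ratio reaches the same verdict, which is what makes the sampled subset consistent across processes.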

Resource Attributes

Each service reports identity and environment metadata:

| Attribute | API Server | Worker |
| --- | --- | --- |
| service.name | remote-eaze-api | remote-eaze-worker |
| deployment.environment.name | From NODE_ENV (default: development) | From NODE_ENV (default: development) |

These attributes appear in every exported span and metric, enabling filtering by service and environment in any dashboard.
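
As a rough sketch of how that attribute map could be derived, assuming the environment is passed in as a plain record: `EnvVars` and `buildResourceAttributes` are hypothetical names, and wiring the result into the SDK's resource is omitted.

```typescript
// Environment passed in explicitly so the helper stays easy to test.
type EnvVars = Record<string, string | undefined>;

// Hypothetical helper returning the attribute map described in the table above.
function buildResourceAttributes(
  serviceName: "remote-eaze-api" | "remote-eaze-worker",
  env: EnvVars,
): Record<string, string> {
  return {
    "service.name": serviceName,
    // Falls back to "development" when NODE_ENV is unset or empty.
    "deployment.environment.name": env["NODE_ENV"] || "development",
  };
}
```

Each service would call this once at startup with its own name, for example with `process.env` as the second argument.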

Audit-Trace Correlation

The audit log system captures the active trace ID alongside every audit event:

traceId: trace.getActiveSpan()?.spanContext().traceId

This creates a bidirectional link:

Trace Dashboard ──→ Find trace by ID → See full request lifecycle
Audit Log Search ──→ Find traceId field → Jump to trace in dashboard

Every audit log entry recorded while a span is active includes traceId in its context fields, indexed for fast lookups.
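
A minimal sketch of attaching the trace ID to an audit entry follows. Only the traceId field comes from this document; the `AuditEntry` shape, `isValidTraceId`, and `buildAuditEntry` are hypothetical. The validation assumes W3C Trace Context IDs (32 lowercase hex characters, not all zeros), and the field is omitted when no valid ID is available.

```typescript
// Hypothetical audit entry shape; only `traceId` comes from the text above.
interface AuditEntry {
  action: string;
  actorId: string;
  occurredAt: string; // ISO 8601 timestamp
  traceId?: string; // absent when no span is active
}

// W3C Trace Context: 32 lowercase hex characters, not all zeros.
const TRACE_ID_PATTERN = /^[0-9a-f]{32}$/;
const INVALID_TRACE_ID = "0".repeat(32);

function isValidTraceId(value: string | undefined): value is string {
  return value !== undefined && TRACE_ID_PATTERN.test(value) && value !== INVALID_TRACE_ID;
}

function buildAuditEntry(
  action: string,
  actorId: string,
  activeTraceId: string | undefined,
  now: Date = new Date(),
): AuditEntry {
  const entry: AuditEntry = { action, actorId, occurredAt: now.toISOString() };
  // Only attach the ID when it can actually be looked up in a trace backend.
  if (isValidTraceId(activeTraceId)) entry.traceId = activeTraceId;
  return entry;
}
```

A caller would pass `trace.getActiveSpan()?.spanContext().traceId` as `activeTraceId`, so entries written outside any span simply carry no traceId.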

Log Correlation

Pino instrumentation automatically injects trace and span IDs into log entries:

{
  "level": 30,
  "time": 1710000000000,
  "msg": "Transaction processed",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "span_id": "00f067aa0ba902b7"
}

This enables log-to-trace and trace-to-log navigation in any observability platform that supports OTel log correlation.
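
To illustrate the log-to-trace direction, the sketch below pulls the correlation fields back out of a JSON log line. `LogTraceContext` and `extractTraceContext` are hypothetical names, and the code assumes IDs are written as 32 and 16 lowercase hex characters, the W3C Trace Context format.

```typescript
interface LogTraceContext {
  traceId: string;
  spanId: string;
}

// Parses one JSON log line and returns its trace context, or null when the
// line is not a JSON object or carries no well-formed IDs.
function extractTraceContext(line: string): LogTraceContext | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(line);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const record = parsed as Record<string, unknown>;
  const traceId = record["trace_id"];
  const spanId = record["span_id"];
  if (typeof traceId !== "string" || !/^[0-9a-f]{32}$/.test(traceId)) return null;
  if (typeof spanId !== "string" || !/^[0-9a-f]{16}$/.test(spanId)) return null;
  return { traceId, spanId };
}
```

The returned traceId is the value to paste into a trace search, or to match against the traceId field of an audit log entry.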

Startup and Shutdown

Startup: Pre-Load Pattern

Telemetry must initialize before application code so that early operations (database connections, plugin registration) are captured.

Production (Docker):

CMD ["node", "--enable-source-maps", "--import", "./dist/telemetry.js", "dist/main.js"]

The --import flag loads the telemetry module before the main entry point, ensuring the NodeSDK is fully initialized before Fastify starts.

Development:

A dev-bootstrap.ts file imports telemetry first, then the application. This guarantees consistent behavior across environments.

Shutdown: Graceful Flush

Both services flush pending telemetry on termination:

SIGINT / SIGTERM received
        │
        ▼
  Server closes (stops accepting requests)
        │
        ▼
  Database disconnects
        │
        ▼
  Telemetry SDK flushes remaining spans/metrics
        │
        ▼
  Process exits (code 0) or error exit (code 1)

A singleton guard prevents double-shutdown:

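// `sdk` is assumed to be the NodeSDK instance created at startup
// (declared elsewhere in this module).
// The first call stores the shutdown promise; later calls reuse it.
let shutdownPromise: Promise<void> | null = null;

export const shutdownTelemetry = async (): Promise<void> => {
  if (!shutdownPromise) {
    // Log instead of rethrowing so a failed flush never blocks process exit.
    shutdownPromise = sdk.shutdown().catch((err: unknown) => {
      console.error("Telemetry shutdown error:", err);
    });
  }
  await shutdownPromise;
};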

If the SDK has already been shut down, subsequent calls await the same promise rather than attempting a second shutdown.
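
The shutdown order described in this section could be orchestrated roughly as follows. This is a sketch under stated assumptions, not the services' actual handler: `ShutdownSteps` and `gracefulShutdown` are hypothetical names, and continuing after a failed step (so telemetry is still flushed) is a design choice this document does not confirm.

```typescript
// Hypothetical dependencies; in the services these would wrap the server,
// the database client, and the telemetry shutdown function.
interface ShutdownSteps {
  closeServer: () => Promise<void>;
  disconnectDatabase: () => Promise<void>;
  shutdownTelemetry: () => Promise<void>;
}

// Runs the steps in order and returns the exit code to use.
// Assumption: a failed step does not stop later steps, so telemetry is
// still flushed when the server or database fails to close cleanly.
async function gracefulShutdown(steps: ShutdownSteps): Promise<0 | 1> {
  let exitCode: 0 | 1 = 0;
  const ordered = [steps.closeServer, steps.disconnectDatabase, steps.shutdownTelemetry];
  for (const step of ordered) {
    try {
      await step();
    } catch {
      exitCode = 1;
    }
  }
  return exitCode;
}
```

A SIGINT or SIGTERM handler would await `gracefulShutdown(...)` and pass the returned code to `process.exit`. Flushing telemetry last means spans produced while the server and database are closing are still exported.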

Frontend Instrumentation

Decision Pending

The web application (PWA) does not currently include OpenTelemetry instrumentation. Browser-side tracing (Real User Monitoring) will be evaluated based on whether the additional dependency bundle size is acceptable for the PWA's offline-first constraints. If added, it would use @opentelemetry/sdk-trace-web (the successor to the older @opentelemetry/web package) with XMLHttpRequestInstrumentation from @opentelemetry/instrumentation-xml-http-request to capture frontend HTTP calls and link them to backend traces via W3C Trace Context propagation.

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| OTEL_EXPORTER_OTLP_ENDPOINT | http://localhost:4317 | OTLP gRPC endpoint (no trailing slash, no path) |
| OTEL_SAMPLING_RATIO | 1.0 | Trace sampling ratio (1.0 = 100%, 0.1 = 10%) |

The endpoint must point to a gRPC receiver. Port 4317 is the standard OTel gRPC port. Do not include a path suffix (e.g., /v1/traces): signal-specific paths belong to OTLP over HTTP, while the gRPC exporters address the receiver by host and port only.
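
The two variables and their constraints can be checked at startup. The sketch below is illustrative only: `TelemetryEnv`, `TelemetryConfig`, and `resolveTelemetryConfig` are hypothetical names, the endpoint pattern is deliberately simple (no IPv6 hosts), and rejecting out-of-range ratios with an error is an assumption about the desired behavior.

```typescript
type TelemetryEnv = Record<string, string | undefined>;

interface TelemetryConfig {
  endpoint: string;
  samplingRatio: number;
}

const DEFAULT_ENDPOINT = "http://localhost:4317";
const DEFAULT_SAMPLING_RATIO = 1.0;

// scheme://host[:port], with any trailing path captured in group 1.
const ENDPOINT_PATTERN = /^https?:\/\/[^\/\s:]+(?::\d+)?(\/.*)?$/;

function resolveTelemetryConfig(env: TelemetryEnv): TelemetryConfig {
  const endpoint = env["OTEL_EXPORTER_OTLP_ENDPOINT"] || DEFAULT_ENDPOINT;
  const match = ENDPOINT_PATTERN.exec(endpoint);
  if (!match) {
    throw new Error(`Invalid OTLP endpoint: ${endpoint}`);
  }
  if (match[1]) {
    throw new Error(`OTLP gRPC endpoint must not include a path or trailing slash: ${endpoint}`);
  }

  const rawRatio = env["OTEL_SAMPLING_RATIO"];
  const samplingRatio =
    rawRatio === undefined || rawRatio === "" ? DEFAULT_SAMPLING_RATIO : Number(rawRatio);
  if (!Number.isFinite(samplingRatio) || samplingRatio < 0 || samplingRatio > 1) {
    throw new Error(`OTEL_SAMPLING_RATIO must be a number between 0 and 1: ${rawRatio}`);
  }

  return { endpoint, samplingRatio };
}
```

Failing fast on a malformed endpoint or ratio surfaces misconfiguration at boot rather than as silently missing traces.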

Package Dependencies

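Most telemetry packages are official @opentelemetry/* packages. The exceptions (@fastify/otel, @prisma/instrumentation, and bullmq-otel) are published by the maintainers of the libraries they instrument, not by an observability vendor: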

Root (shared):

| Package | Purpose |
| --- | --- |
| @opentelemetry/api | Public API (trace, metrics, context) |
| @opentelemetry/sdk-node | NodeSDK, the all-in-one SDK for Node.js |
| @opentelemetry/sdk-metrics | Metrics SDK (PeriodicExportingMetricReader) |
| @opentelemetry/sdk-trace-node | Trace SDK (samplers) |
| @opentelemetry/resources | Resource attribute construction |
| @opentelemetry/semantic-conventions | Standard attribute names |
| @opentelemetry/exporter-trace-otlp-grpc | gRPC trace exporter |
| @opentelemetry/exporter-metrics-otlp-grpc | gRPC metric exporter |

API Service additional:

| Package | Purpose |
| --- | --- |
| @fastify/otel | Fastify auto-instrumentation |
| @opentelemetry/instrumentation-http | HTTP/HTTPS auto-instrumentation |
| @opentelemetry/instrumentation-ioredis | Redis auto-instrumentation |
| @opentelemetry/instrumentation-pino | Pino log correlation |
| @prisma/instrumentation | Prisma query auto-instrumentation |

Worker / Queue package:

| Package | Purpose |
| --- | --- |
| bullmq-otel | BullMQ job-level tracing integration |
