Monitoring Overview
Monitoring and observability are crucial for maintaining healthy, performant applications. This guide covers the monitoring tools, metrics, and best practices for Solverhood applications.
Monitoring Stack
Core Components
- Application Performance Monitoring (APM): New Relic, Datadog, or Sentry
- Infrastructure Monitoring: Prometheus with Grafana
- Logging: ELK Stack (Elasticsearch, Logstash, Kibana) or CloudWatch
- Error Tracking: Sentry for error monitoring and alerting
- Health Checks: Custom health check endpoints
Recommended Tools
- New Relic: Full-stack observability platform
- Prometheus + Grafana: Metrics collection and visualization
- Sentry: Error tracking and performance monitoring
- CloudWatch: AWS-native monitoring and logging
- Datadog: Comprehensive monitoring platform
Key Metrics to Monitor
Application Metrics
- Response Time: Average, 95th percentile, 99th percentile
- Throughput: Requests per second (RPS)
- Error Rate: Percentage of failed requests
- Availability: Uptime percentage
- Memory Usage: Heap usage, garbage collection
- CPU Usage: CPU utilization percentage
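To make these metrics concrete: the alert rules and Grafana panels later in this guide query series named http_requests_total and http_request_duration_seconds. One way to produce them from an Express app with prom-client is a small middleware like the sketch below; the metric and label names are illustrative and should match whatever your dashboards and alerts expect.

const express = require('express');
const client = require('prom-client');
const app = express();
// Counter feeds throughput and error-rate queries; histogram feeds latency percentiles
const httpRequestsTotal = new client.Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status'],
});
const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route'],
  buckets: [0.05, 0.1, 0.25, 0.5, 1, 2, 5],
});
app.use((req, res, next) => {
  const endTimer = httpRequestDuration.startTimer();
  res.on('finish', () => {
    const route = req.route ? req.route.path : req.path;
    httpRequestsTotal.inc({ method: req.method, route, status: res.statusCode });
    endTimer({ method: req.method, route });
  });
  next();
});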
Business Metrics
- User Activity: Daily/Monthly Active Users (DAU/MAU)
- Feature Usage: Most used features and endpoints
- Conversion Rates: User journey completion rates
- Revenue Metrics: If applicable to your business
Infrastructure Metrics
- Server Resources: CPU, memory, disk usage
- Database Performance: Query execution time, connection pool usage
- Network: Bandwidth usage, latency
- Cache Performance: Hit/miss ratios
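Cache hit/miss ratios are usually easiest to capture as a pair of counters incremented around cache lookups, with the ratio computed at query time in Prometheus. A minimal sketch, assuming prom-client and a Redis-style client named cache:

const client = require('prom-client');
const cacheHits = new client.Counter({ name: 'cache_hits_total', help: 'Cache hits' });
const cacheMisses = new client.Counter({ name: 'cache_misses_total', help: 'Cache misses' });
// `cache` is assumed to be a Redis-style client with async get/set
async function getWithMetrics(key, loadFn) {
  const cached = await cache.get(key);
  if (cached !== null) {
    cacheHits.inc();
    return JSON.parse(cached);
  }
  cacheMisses.inc();
  const value = await loadFn();
  await cache.set(key, JSON.stringify(value));
  return value;
}
// Hit ratio in PromQL:
//   rate(cache_hits_total[5m]) / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m]))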
Implementation
1. Health Check Endpoints
Implement comprehensive health checks:
const express = require('express');
const app = express();
// Basic health check
app.get('/health', (req, res) => {
  res.json({
    status: 'healthy',
    timestamp: new Date().toISOString(),
    version: process.env.APP_VERSION || '1.0.0',
  });
});
// Detailed health check
app.get('/health/detailed', async (req, res) => {
  const checks = {
    database: await checkDatabase(),
    redis: await checkRedis(),
    externalServices: await checkExternalServices(),
  };
  const isHealthy = Object.values(checks).every((check) => check.status === 'healthy');
  res.status(isHealthy ? 200 : 503).json({
    status: isHealthy ? 'healthy' : 'unhealthy',
    timestamp: new Date().toISOString(),
    checks,
  });
});
async function checkDatabase() {
  const start = Date.now();
  try {
    await db.query('SELECT 1'); // `db` is the application's database client
    return { status: 'healthy', responseTime: Date.now() - start };
  } catch (error) {
    return { status: 'unhealthy', error: error.message };
  }
}
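The detailed endpoint above also calls checkRedis and checkExternalServices, which are not shown. They follow the same pattern as checkDatabase; a sketch of the Redis check, assuming an ioredis- or node-redis-style client named redis:

async function checkRedis() {
  const start = Date.now();
  try {
    await redis.ping(); // both ioredis and node-redis expose ping()
    return { status: 'healthy', responseTime: Date.now() - start };
  } catch (error) {
    return { status: 'unhealthy', error: error.message };
  }
}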
2. Application Performance Monitoring
Set up APM with New Relic:
# Install the New Relic agent
npm install newrelic
// newrelic.js configuration
exports.config = {
  app_name: ['Solverhood App'],
  license_key: process.env.NEW_RELIC_LICENSE_KEY,
  logging: {
    level: 'info'
  },
  distributed_tracing: {
    enabled: true
  },
  transaction_tracer: {
    enabled: true,
    transaction_threshold: 5, // seconds; transactions slower than this are traced
    record_sql: 'obfuscated',
    stack_trace_threshold: 0.5
  }
};
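The agent only instruments modules that are loaded after it, so require it before anything else in your entry point:

// server.js (or whatever your entry point is)
require('newrelic'); // must be loaded before express and other instrumented modules
const express = require('express');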
3. Structured Logging
Implement structured logging with Winston:
const winston = require('winston');
const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    winston.format.json()
  ),
  defaultMeta: { service: 'solverhood-api' },
  transports: [
    new winston.transports.File({ filename: 'error.log', level: 'error' }),
    new winston.transports.File({ filename: 'combined.log' }),
  ],
});
// Add console transport in development
if (process.env.NODE_ENV !== 'production') {
  logger.add(
    new winston.transports.Console({
      format: winston.format.simple(),
    })
  );
}
// Usage example
logger.info('User logged in', {
  userId: user.id,
  email: user.email,
  ip: req.ip,
  userAgent: req.get('User-Agent'),
});
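To correlate all log lines belonging to a single request, a common pattern is to attach a request ID and emit one summary line per response. A minimal sketch using the logger above; the header name and field names are illustrative choices, not part of the guide's setup:

const crypto = require('crypto');
app.use((req, res, next) => {
  req.id = req.get('X-Request-Id') || crypto.randomUUID();
  const start = Date.now();
  res.on('finish', () => {
    logger.info('Request completed', {
      requestId: req.id,
      method: req.method,
      path: req.path,
      status: res.statusCode,
      durationMs: Date.now() - start,
    });
  });
  next();
});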
4. Error Tracking
Set up Sentry for error tracking:
const Sentry = require('@sentry/node');
Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV,
  tracesSampleRate: 1.0, // lower this in production to control event volume
  integrations: [new Sentry.Integrations.Http({ tracing: true }), new Sentry.Integrations.Express({ app })],
});
// The request handler must be registered before any routes
app.use(Sentry.Handlers.requestHandler());
// Routes go here, e.g. a test route that triggers an error
app.get('/debug-sentry', function mainHandler(req, res) {
  throw new Error('My first Sentry error!');
});
// The error handler must come after all routes
app.use(Sentry.Handlers.errorHandler());
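Unhandled errors reach Sentry through the error handler. For errors you catch and recover from, report them explicitly; processPayment below is a hypothetical operation used only for illustration:

app.post('/api/payments', async (req, res) => {
  try {
    await processPayment(req.body); // hypothetical operation
    res.status(200).json({ ok: true });
  } catch (error) {
    // Report the handled error with extra context, then respond gracefully
    Sentry.withScope((scope) => {
      scope.setTag('feature', 'payments');
      Sentry.captureException(error);
    });
    res.status(502).json({ error: 'Payment provider unavailable' });
  }
});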
Alerting
1. Alert Rules
Define alerting rules for critical metrics:
# prometheus/rules/alerts.yml
groups:
  - name: solverhood-alerts
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: 'High error rate detected'
          description: 'Error rate is {{ $value }} errors per second'
      - alert: HighResponseTime
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: 'High response time detected'
          description: '95th percentile response time is {{ $value }} seconds'
      - alert: DatabaseConnectionHigh
        expr: pg_stat_database_numbackends > 80
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: 'High database connections'
          description: 'Database has {{ $value }} active connections'
2. Notification Channels
Configure notification channels:
// notifications/slack.js
const { WebClient } = require('@slack/web-api');
const slack = new WebClient(process.env.SLACK_BOT_TOKEN);
async function sendSlackAlert(alert) {
  try {
    await slack.chat.postMessage({
      channel: process.env.SLACK_ALERT_CHANNEL,
      text: `🚨 Alert: ${alert.summary}`,
      blocks: [
        {
          type: 'section',
          text: {
            type: 'mrkdwn',
            text: `*${alert.summary}*\n${alert.description}`,
          },
        },
        {
          type: 'context',
          elements: [
            {
              type: 'mrkdwn',
              text: `Severity: ${alert.labels.severity} | Environment: ${process.env.NODE_ENV}`,
            },
          ],
        },
      ],
    });
  } catch (error) {
    console.error('Failed to send Slack alert:', error);
  }
}
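If Prometheus alerts are routed through your own service instead of a native Slack receiver, a small webhook endpoint can translate Alertmanager's payload into the shape sendSlackAlert expects. This is a sketch; the /alerts/webhook path and the reuse of the Express app are assumptions, not part of the guide's setup:

// notifications/webhook.js — assumes the Express `app` and sendSlackAlert from above
app.use(express.json());
app.post('/alerts/webhook', async (req, res) => {
  // Alertmanager posts a JSON body containing an `alerts` array
  const alerts = req.body.alerts || [];
  for (const a of alerts) {
    await sendSlackAlert({
      summary: a.annotations.summary,
      description: a.annotations.description,
      labels: a.labels,
    });
  }
  res.status(200).json({ received: alerts.length });
});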
Dashboards
1. Grafana Dashboard
Create comprehensive dashboards:
{
  "dashboard": {
    "title": "Solverhood Application Metrics",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(http_requests_total[5m])",
            "legendFormat": "{{method}} {{route}}"
          }
        ]
      },
      {
        "title": "Response Time",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))",
            "legendFormat": "95th percentile"
          }
        ]
      },
      {
        "title": "Error Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(http_requests_total{status=~\"5..\"}[5m])",
            "legendFormat": "5xx errors"
          }
        ]
      }
    ]
  }
}
2. Business Metrics Dashboard
Track business-critical metrics:
// metrics/business.js
const { register, Counter, Histogram } = require('prom-client');
// Business metrics
const userRegistrations = new Counter({
  name: 'user_registrations_total',
  help: 'Total number of user registrations',
  labelNames: ['source'],
});
const projectCreations = new Counter({
  name: 'project_creations_total',
  help: 'Total number of projects created',
  labelNames: ['user_type'],
});
const apiUsage = new Histogram({
  name: 'api_requests_duration_seconds',
  help: 'API request duration',
  labelNames: ['endpoint', 'method'],
  buckets: [0.1, 0.5, 1, 2, 5],
});
// Usage in application
app.post('/api/users', async (req, res) => {
  const timer = apiUsage.startTimer();
  try {
    const user = await createUser(req.body);
    userRegistrations.inc({ source: req.body.source || 'direct' });
    timer({ endpoint: '/api/users', method: 'POST' });
    res.status(201).json(user);
  } catch (error) {
    timer({ endpoint: '/api/users', method: 'POST' });
    res.status(500).json({ error: error.message });
  }
});
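For Prometheus to scrape these counters and histograms, the application also has to expose the registry. A conventional /metrics endpoint using the register imported above:

// Exposition endpoint scraped by Prometheus
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});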
Log Analysis
1. Log Aggregation
Set up centralized logging:
// logging/elasticsearch.js
const { Client } = require('@elastic/elasticsearch');
const client = new Client({
  node: process.env.ELASTICSEARCH_URL,
  auth: {
    username: process.env.ELASTICSEARCH_USERNAME,
    password: process.env.ELASTICSEARCH_PASSWORD,
  },
});
async function logToElasticsearch(logEntry) {
  try {
    await client.index({
      index: `solverhood-logs-${new Date().toISOString().split('T')[0]}`,
      body: {
        timestamp: new Date().toISOString(),
        level: logEntry.level,
        message: logEntry.message,
        service: logEntry.service,
        ...logEntry.meta,
      },
    });
  } catch (error) {
    console.error('Failed to log to Elasticsearch:', error);
  }
}
2. Log Search and Analysis
Create Kibana dashboards for log analysis:
{
  "dashboard": {
    "title": "Application Logs Analysis",
    "panels": [
      {
        "title": "Log Volume by Level",
        "type": "visualization",
        "visState": {
          "type": "pie",
          "aggs": [
            {
              "type": "count",
              "schema": "metric"
            },
            {
              "type": "terms",
              "schema": "segment",
              "params": {
                "field": "level.keyword"
              }
            }
          ]
        }
      }
    ]
  }
}
Performance Optimization
1. Database Monitoring
Monitor database performance:
-- Slow query analysis (requires the pg_stat_statements extension;
-- on PostgreSQL 13+ these columns are named total_exec_time / mean_exec_time)
SELECT
  query,
  calls,
  total_time,
  mean_time,
  rows
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;
-- Connection monitoring
SELECT
  datname,
  numbackends,
  xact_commit,
  xact_rollback
FROM pg_stat_database;
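To run these checks continuously rather than ad hoc, a small scheduled job can push the results through the structured logger. A sketch assuming a node-postgres-style db.query that resolves to { rows } and the Winston logger from earlier:

// monitoring/slow-queries.js — assumes `db` and `logger` from the sections above
async function logSlowQueries() {
  try {
    const { rows } = await db.query(
      'SELECT query, calls, mean_time FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 10'
    );
    logger.info('Top slow queries', { queries: rows });
  } catch (error) {
    logger.error('Slow query check failed', { error: error.message });
  }
}
// Check once an hour
setInterval(logSlowQueries, 60 * 60 * 1000);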
2. Memory Monitoring
Track memory usage:
// monitoring/memory.js
const v8 = require('v8');
function logMemoryUsage() {
  const usage = process.memoryUsage();
  const heapStats = v8.getHeapStatistics();
  logger.info('Memory usage', {
    rss: usage.rss,
    heapUsed: usage.heapUsed,
    heapTotal: usage.heapTotal,
    external: usage.external,
    heapSizeLimit: heapStats.heap_size_limit,
    totalAvailableSize: heapStats.total_available_size,
  });
}
// Log memory usage every 5 minutes
setInterval(logMemoryUsage, 5 * 60 * 1000);
Best Practices
1. Monitoring Strategy
- Start Simple: Begin with basic health checks and error tracking
- Gradual Enhancement: Add more sophisticated monitoring over time
- Focus on Business Impact: Monitor metrics that directly affect users
- Automate Everything: Use automated alerting and response
2. Alert Management
- Avoid Alert Fatigue: Set appropriate thresholds and cooldown periods (see the cooldown sketch after this list)
- Escalation Policies: Define clear escalation procedures
- Runbooks: Create documentation for common issues
- Post-Incident Reviews: Learn from incidents to improve monitoring
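For Prometheus-driven alerts, Alertmanager's group_wait, group_interval, and repeat_interval settings are the standard way to control noise. If you also send alerts directly from application code (as with the sendSlackAlert helper earlier), a simple in-process cooldown is one option; the 15-minute window below is an illustrative value:

// notifications/throttle.js — assumes sendSlackAlert from the Notification Channels section
const lastSent = new Map();
const COOLDOWN_MS = 15 * 60 * 1000;
async function sendAlertWithCooldown(alert) {
  const key = `${alert.summary}:${alert.labels.severity}`;
  const now = Date.now();
  if (lastSent.has(key) && now - lastSent.get(key) < COOLDOWN_MS) {
    return; // the same alert fired recently; skip to avoid noise
  }
  lastSent.set(key, now);
  await sendSlackAlert(alert);
}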
3. Data Retention
- Log Retention: Define retention policies for different log types (a rotation sketch follows this list)
- Metrics Storage: Plan for long-term metrics storage
- Cost Management: Monitor costs of monitoring infrastructure
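At the application level, one common way to enforce retention for the Winston file transports shown earlier is a rotating transport such as winston-daily-rotate-file; the 30-day window and 50 MB size cap here are example values, not Solverhood requirements:

const DailyRotateFile = require('winston-daily-rotate-file');
// Rotate the combined log daily and keep 30 days of compressed history
logger.add(
  new DailyRotateFile({
    filename: 'combined-%DATE%.log',
    datePattern: 'YYYY-MM-DD',
    zippedArchive: true,
    maxSize: '50m',
    maxFiles: '30d',
  })
);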
Next Steps
- Error Tracking Setup - Coming soon
- Performance Monitoring - Coming soon
- Log Management - Coming soon
- Alert Configuration - Coming soon