Files
butler-sos/docs/INFLUXDB_V2_V3_ALIGNMENT_ANALYSIS.md

44 KiB

InfluxDB V1/V2/V3 Implementation Alignment Analysis

Date: December 16, 2025
Scope: Comprehensive comparison of refactored v1, v2, and v3 InfluxDB implementations
Status: Alignment completed - all versions at common quality level


Executive Summary

Implementation Status: COMPLETE

All critical inconsistencies between v1, v2, and v3 implementations have been resolved. The codebase now has:

  • Consistent error handling across all versions with error tracking
  • Unified retry strategy with progressive batch sizing
  • Defensive validation for input data and unsigned fields
  • Type safety with explicit parsing (parseFloat/parseInt)
  • Configurable batching via maxBatchSize setting
  • Comprehensive documentation of implementation patterns

Alignment Changes Implemented: December 16, 2025


Architecture Overview

V1 (InfluxDB 1.x - InfluxQL)

  • Client: node-influx package
  • API: Uses plain JavaScript objects: { measurement, tags, fields }
  • Write: globals.influx.writePoints(datapoints) - batch write native
  • Field Types: Implicit typing based on JavaScript types
  • Tag/Field Names: Can use same name for tags and fields
  • Error Handling: Consistent with error tracking
  • Retry Logic: Uses writeToInfluxWithRetry

V2 (InfluxDB 2.x - Flux)

  • Client: @influxdata/influxdb-client
  • API: Uses Point class with builder pattern
  • Write: writeApi.writePoints() with explicit flush/close
  • Field Types: Explicit types: floatField(), intField(), uintField(), etc.
  • Tag/Field Names: Can use same name for tags and fields
  • Error Handling: Consistent with error tracking
  • Retry Logic: Uses writeToInfluxWithRetry (maxRetries: 0 to avoid double-retry)

V3 (InfluxDB 3.x - SQL)

  • Client: @influxdata/influxdb3-client
  • API: Uses Point3 class with set* methods
  • Write: globals.influx.write(lineProtocol) - direct line protocol
  • Field Types: Explicit types: setFloatField(), setIntegerField(), etc.
  • Tag/Field Names: Cannot use same name for tags and fields (v3 limitation)
  • Error Handling: Consistent with error tracking
  • Retry Logic: Uses writeToInfluxWithRetry
  • Input Validation: Defensive checks for null/invalid data

Alignment Implementation Summary

1. Error Handling & Tracking

Status: COMPLETED

All v1, v2, and v3 modules now include consistent error tracking:

try {
    // Write operation
} catch (err) {
    await globals.errorTracker.incrementError('INFLUXDB_V{1|2|3}_WRITE', serverName);
    globals.logger.error(`Error: ${globals.getErrorMessage(err)}`);
    throw err;
}

Modules Updated:

  • V1: 7 modules (health-metrics, butler-memory, sessions, user-events, log-events, event-counts, queue-metrics)
  • V3: 6 modules (butler-memory, log-events, queue-metrics, event-counts, health-metrics, sessions, user-events)

2. Retry Strategy

Status: COMPLETED

Unified retry with exponential backoff via writeToInfluxWithRetry():

  • Max retries: 3
  • Backoff: 1s → 2s → 4s
  • Non-retryable errors fail immediately
  • V2 uses maxRetries: 0 in client to prevent double-retry

3. Progressive Batch Retry

Status: COMPLETED

Created batch write helpers with progressive chunking (1000→500→250→100→10→1):

  • writeBatchToInfluxV1()
  • writeBatchToInfluxV2()
  • writeBatchToInfluxV3()

Note: Not currently used in modules due to low data volumes, but available for future scaling needs.

4. Configuration Enhancement

Status: COMPLETED

Added maxBatchSize to all version configs:

Butler-SOS:
    influxdbConfig:
        v1Config:
            maxBatchSize: 1000 # Range: 1-10000
        v2Config:
            maxBatchSize: 1000
        v3Config:
            maxBatchSize: 1000
  • Schema validation enforces range
  • Runtime validation with fallback to 1000
  • Documented in config templates

5. Input Validation

Status: COMPLETED

V3 modules now include defensive validation:

if (!body || typeof body !== 'object') {
    globals.logger.warn('Invalid data. Will not be sent to InfluxDB');
    return;
}

Modules Updated:

  • v3/health-metrics.js
  • v3/butler-memory.js

6. Type Safety & Parsing

Status: COMPLETED

V3 log-events now uses explicit parsing:

.setFloatField('process_time', parseFloat(msg.process_time))
.setIntegerField('net_ram', parseInt(msg.net_ram, 10))

Prevents type coercion issues and ensures data integrity.

7. Unsigned Field Validation

Status: COMPLETED

Created validateUnsignedField() utility for semantically unsigned metrics:

.setIntegerField('hits', validateUnsignedField(body.cache.hits, 'cache', 'hits', serverName))
  • Clamps negative values to 0
  • Logs warnings once per measurement
  • Applied to session counts, cache hits, app calls, CPU metrics

Modules Updated:

  • v3/health-metrics.js (session, users, cache, cpu, apps fields)
  • proxysessionmetrics.js (session_count)

8. Shared Utilities

Status: COMPLETED

Enhanced shared/utils.js with:

  • chunkArray() - Split arrays into smaller chunks
  • validateUnsignedField() - Validate and clamp unsigned values
  • writeBatchToInfluxV1/V2/V3() - Progressive retry batch writers

Critical Issues Found (RESOLVED)

1. ERROR HANDLING INCONSISTENCY ⚠️ CRITICAL

V2 Pattern (Consistent across all modules):

  • Uses writeToInfluxWithRetry() with try-catch at the retry level
  • Errors bubble up through retry logic
  • No local try-catch in most modules
  • Clean and uniform error handling

V3 Pattern (Inconsistent):

Module Has Try-Catch Has Error Tracking
sessions.js
log-events.js
user-events.js
butler-memory.js
queue-metrics.js
health-metrics.js
event-counts.js (partial)

Impact:

  • V3 has inconsistent error reporting
  • Some failures tracked via globals.errorTracker.incrementError(), others silently fail
  • Monitoring gaps make troubleshooting difficult
  • Operations teams get incomplete picture of system health

Example:

// V3 sessions.js - HAS error handling
try {
    await writeToInfluxWithRetry(...)
} catch (err) {
    await globals.errorTracker.incrementError('INFLUXDB_V3_WRITE', userSessions.serverName);
    globals.logger.error(...)
}

// V3 health-metrics.js - NO error handling
await writeToInfluxWithRetry(...)  // Errors just bubble up

2. FIELD TYPE MISMATCHES ⚠️ DATA INTEGRITY

Issue 2.1: CPU Metrics Lose Precision

V2 (Correct):

new Point('cpu').floatField('total', body.cpu.total);

V3 (Wrong):

new Point3('cpu').setIntegerField('total', body.cpu.total);

Impact:

  • CPU percentage values like 45.7% truncated to 45
  • Loss of precision in monitoring and alerting
  • Trend analysis less accurate

Issue 2.2: Cache Metrics Lose Semantic Type Information

V2 (Semantically Correct):

.uintField('hits', body.cache.hits)           // Unsigned - can't be negative
.uintField('lookups', body.cache.lookups)
.intField('added', body.cache.added)          // Signed - can be negative
.intField('replaced', body.cache.replaced)

V3 (Less Precise):

.setIntegerField('hits', body.cache.hits)     // Signed - allows negatives incorrectly
.setIntegerField('lookups', body.cache.lookups)
.setIntegerField('added', body.cache.added)
.setIntegerField('replaced', body.cache.replaced)

Impact:

  • ⚠️ Semantic meaning lost (can hits be negative? V2 says no, V3 says yes)
  • ⚠️ Data validation weaker in v3
  • ⚠️ Potential for confusing negative values

Issue 2.3: Session & User Counts

V2:

.uintField('active', body.session.active)     // Unsigned
.uintField('total', body.session.total)
.uintField('calls', body.apps.calls)
.uintField('selections', body.apps.selections)

V3:

.setIntegerField('active', body.session.active)  // Signed
.setIntegerField('total', body.session.total)
.setIntegerField('calls', body.apps.calls)
.setIntegerField('selections', body.apps.selections)

Impact: Same as cache metrics - semantic types lost.


3. USER EVENTS FIELD NAME CONFLICT ⚠️ CRITICAL

The Problem: InfluxDB v3 does not allow the same name for both tags and fields (v1/v2 allowed this). This forces different field names between v2 and v3.

V2 Implementation:

.tag('userFull', `${msg.user_directory}\\${msg.user_id}`)
.stringField('userFull', `${msg.user_directory}\\${msg.user_id}`)  // ← SAME NAME
.stringField('userId', msg.user_id)                                 // ← SAME NAME

V3 Implementation:

.setTag('userFull', `${msg.user_directory}\\${msg.user_id}`)
.setStringField('userFull_field', `${msg.user_directory}\\${msg.user_id}`)  // ← DIFFERENT
.setStringField('userId_field', msg.user_id)                                 // ← DIFFERENT

V3 Code Comment Acknowledges This:

// NOTE: InfluxDB v3 does not allow the same name for both tags and fields,
// unlike v1/v2. Fields use different names with _field suffix where needed.

Impact:

  • V2 and V3 write to different field names
  • Queries written for v2 fail on v3 data
  • Grafana dashboards show missing data after migration
  • Historical v2 data incompatible with new v3 queries
  • Cannot seamlessly migrate v2 → v3

Affected Fields:

  • userFulluserFull_field
  • userIduserId_field

4. LOG EVENTS FIELD NAMING INCONSISTENCY ⚠️

Similar issue as user-events, but only affects specific log sources.

Issue 4.1: Scheduler Events

V2:

.stringField('app_name', msg.app_name || '')
.stringField('app_id', msg.app_id || '')
.stringField('execution_id', msg.execution_id || '')

V3:

.setStringField('app_name_field', msg.app_name || '')   // ← DIFFERENT
.setStringField('app_id_field', msg.app_id || '')       // ← DIFFERENT
.setStringField('execution_id', msg.execution_id || '')

Impact:

  • Scheduler log queries fail when switching v2 → v3
  • Field name: app_name vs app_name_field
  • Field name: app_id vs app_id_field

Issue 4.2: QIX Performance Events

V3:

.setStringField('app_id_field', msg.app_id || '')  // Uses _field suffix

Conditional tags:

if (msg?.app_id?.length > 0) point.setTag('app_id', msg.app_id); // Also tag

Impact:

  • ⚠️ Mixing tag and field with similar names may cause confusion
  • ⚠️ Need to know which to query (tag vs field)

5. QIX-PERF DATA TYPE CONVERSION MISSING ⚠️

V2 (Explicit Type Conversion):

.floatField('process_time', parseFloat(msg.process_time))  // ← Explicit conversion
.floatField('work_time', parseFloat(msg.work_time))
.floatField('lock_time', parseFloat(msg.lock_time))
.floatField('validate_time', parseFloat(msg.validate_time))
.floatField('traverse_time', parseFloat(msg.traverse_time))
.intField('net_ram', parseInt(msg.net_ram))               // ← Explicit conversion
.intField('peak_ram', parseInt(msg.peak_ram))

V3 (No Conversion):

.setFloatField('process_time', msg.process_time)   // ← NO parseFloat!
.setFloatField('work_time', msg.work_time)
.setFloatField('lock_time', msg.lock_time)
.setFloatField('validate_time', msg.validate_time)
.setFloatField('traverse_time', msg.traverse_time)
.setIntegerField('handle', msg.handle)             // ← NO parseInt!
.setIntegerField('net_ram', msg.net_ram)           // ← NO parseInt!
.setIntegerField('peak_ram', msg.peak_ram)

Impact:

  • ⚠️ V3 relies on input types being correct (fragile)
  • ⚠️ V2 explicitly converts to ensure correct types (robust)
  • ⚠️ If UDP message contains strings, v3 may write wrong type or fail
  • ⚠️ Defensive programming missing in v3

6. TAG APPLICATION METHODS DIFFER

V2 Approach - Centralized:

// Import helper function
import { applyInfluxTags } from './utils.js';

// Use it
const configTags = globals.config.get('Butler-SOS.userEvents.tags');
applyInfluxTags(point, configTags);

V2 Helper Function (in v2/utils.js):

export function applyInfluxTags(point, tags) {
    if (!tags || !Array.isArray(tags) || tags.length === 0) {
        return point;
    }
    for (const tag of tags) {
        if (tag.name && tag.value !== undefined && tag.value !== null) {
            point.tag(tag.name, String(tag.value));
        }
    }
    return point;
}

V3 Approach - Inline (Duplicated):

// Inline in every module
if (configTags && configTags.length > 0) {
    for (const item of configTags) {
        point.setTag(item.name, item.value);
    }
}

V3 Variations Found:

// Some modules check has() first
if (
    globals.config.has('Butler-SOS.userEvents.tags') &&
    globals.config.get('Butler-SOS.userEvents.tags') !== null &&
    globals.config.get('Butler-SOS.userEvents.tags').length > 0
) {
    // ...
}

// Others just check truthiness
if (configTags && configTags.length > 0) {
    // ...
}

Impact:

  • ⚠️ V2 has centralized, validated tag logic
  • ⚠️ V3 duplicates logic in 7+ places
  • ⚠️ V3 has inconsistent validation patterns
  • ⚠️ Bug fixes require updating multiple files
  • ⚠️ Higher maintenance burden

7. SESSIONS MODULE ARCHITECTURE DIFFERENCE ⚠️

Both v2 and v3 receive pre-built Point objects, but handle them differently.

V2 (Batch Write):

export async function storeSessionsV2(userSessions) {
    // userSessions.datapointInfluxdb contains array of Point objects (already built)

    await writeToInfluxWithRetry(
        async () => {
            const writeApi = globals.influx.getWriteApi(org, bucketName, 'ns', {
                flushInterval: 5000,
                maxRetries: 0,
            });
            try {
                await writeApi.writePoints(userSessions.datapointInfluxdb); // ← Batch write
                await writeApi.close();
            } catch (err) {
                // cleanup...
            }
        },
        `Proxy sessions for ${userSessions.host}/${userSessions.virtualProxy}`,
        'v2',
        userSessions.serverName
    );
}

V3 (Loop Write):

export async function postProxySessionsToInfluxdbV3(userSessions) {
    // userSessions.datapointInfluxdb contains array of Point3 objects (already built)

    if (userSessions.datapointInfluxdb && userSessions.datapointInfluxdb.length > 0) {
        for (const point of userSessions.datapointInfluxdb) {
            // ← Loop through
            await writeToInfluxWithRetry(
                async () => await globals.influx.write(point.toLineProtocol(), database),
                `Proxy sessions for ${userSessions.host}/${userSessions.virtualProxy}`,
                'v3',
                userSessions.host
            );
        }
    }
}

Impact:

  • V2 makes 1 network call (efficient)
  • V3 makes N network calls (inefficient)
  • ⚠️ V3 has higher latency and overhead
  • ⚠️ V3 has partial failure risk (some points succeed, others fail)
  • ⚠️ V3 may hit rate limits with many sessions

8. INPUT VALIDATION DIFFERENCES

V2 Validates Inputs:

// health-metrics.js
if (!body || typeof body !== 'object') {
    globals.logger.warn(`HEALTH METRICS V2: Invalid health data from server ${serverName}`);
    return;
}

// butler-memory.js
if (!memory || typeof memory !== 'object') {
    globals.logger.warn('MEMORY USAGE V2: Invalid memory data provided');
    return;
}

// user-events.js
if (!msg.host || !msg.command || !msg.user_directory || !msg.user_id || !msg.origin) {
    globals.logger.warn(`USER EVENT V2: Missing required fields in user event message`);
    return;
}

// sessions.js
if (!Array.isArray(userSessions.datapointInfluxdb)) {
    globals.logger.warn(`PROXY SESSIONS V2: Invalid data format - must be an array`);
    return;
}

V3 Missing Validation:

// health-metrics.js - NO validation of body parameter
export async function postHealthMetricsToInfluxdbV3(serverName, host, body, serverTags) {
    const formattedTime = getFormattedTime(body.started); // Could crash if body is null
    // ...
}

// butler-memory.js - NO validation of memory parameter
export async function postButlerSOSMemoryUsageToInfluxdbV3(memory) {
    const point = new Point3('butlersos_memory_usage').setTag(
        'butler_sos_instance',
        memory.instanceTag
    ); // Could crash if memory is null
    // ...
}

V3 Has Some Validation:

// user-events.js - DOES validate
if (!msg.host || !msg.command || !msg.user_directory || !msg.user_id || !msg.origin) {
    globals.logger.warn(`USER EVENT INFLUXDB V3: Missing required fields`);
    return;
}

// log-events.js - DOES validate source
if (msg.source !== 'qseow-engine' && msg.source !== 'qseow-proxy' && ...) {
    globals.logger.warn(`LOG EVENT INFLUXDB V3: Unknown log event source: ${msg.source}`);
    return;
}

Impact:

  • V3 is more fragile - can crash on null/undefined inputs
  • V2 is defensive - validates before processing
  • ⚠️ Inconsistent validation patterns across v3 modules

9. WRITE API USAGE PATTERN DIFFERENCES

V2 Pattern (More Complex):

await writeToInfluxWithRetry(
    async () => {
        // Create writeApi with config for each write
        const writeApi = globals.influx.getWriteApi(org, bucketName, 'ns', {
            flushInterval: 5000,
            maxRetries: 0,
        });
        try {
            await writeApi.writePoint(point); // or writePoints
            await writeApi.close(); // Must close
        } catch (err) {
            try {
                await writeApi.close(); // Try to close on error too
            } catch (closeErr) {
                // Ignore close errors
            }
            throw err; // Re-throw original error
        }
    },
    context,
    'v2',
    serverName
);

V3 Pattern (Simpler):

await writeToInfluxWithRetry(
    async () => await globals.influx.write(point.toLineProtocol(), database),
    context,
    'v3',
    host
);

Key Differences:

Aspect V2 V3
API object Creates new writeApi per call Uses shared globals.influx client
Cleanup Explicit close() with error handling No cleanup needed
Configuration Sets flushInterval, maxRetries No configuration
Error handling Nested try-catch for cleanup Simple - let error bubble up
Complexity High Low

Impact:

  • V3 is simpler and cleaner
  • ⚠️ V2 has explicit resource management (more robust?)
  • ⚠️ Different failure modes between versions
  • ⚠️ V2's maxRetries: 0 means retry handled by outer function only

10. EVENT COUNTS BATCH EFFICIENCY DIFFERENCE

V2 (Efficient Batch Write):

export async function storeEventCountV2() {
    const logEvents = await globals.udpEvents.getLogEvents();
    const userEvents = await globals.udpEvents.getUserEvents();

    const points = [];

    // Build all points first
    for (const event of logEvents) {
        const point = new Point(measurementName)
            .tag('event_type', 'log')
            .tag('source', event.source)
            .tag('host', event.host)
            .tag('subsystem', event.subsystem)
            .intField('counter', event.counter);
        applyInfluxTags(point, configTags);
        points.push(point);
    }

    for (const event of userEvents) {
        const point = new Point(measurementName)
            .tag('event_type', 'user')
            .tag('source', event.source)
            .tag('host', event.host)
            .tag('subsystem', event.subsystem)
            .intField('counter', event.counter);
        applyInfluxTags(point, configTags);
        points.push(point);
    }

    // Single batch write - ONE network call
    await writeApi.writePoints(points);
}

V3 (Inefficient Individual Writes):

export async function storeEventCountInfluxDBV3() {
    const logEvents = await globals.udpEvents.getLogEvents();
    const userEvents = await globals.udpEvents.getUserEvents();

    // Write each log event individually
    for (const logEvent of logEvents) {
        const point = new Point3(measurementName)
            .setTag('event_type', 'log')
            .setTag('source', logEvent.source)
            .setTag('host', logEvent.host)
            .setTag('subsystem', logEvent.subsystem)
            .setIntegerField('counter', logEvent.counter);

        // Individual write - ONE network call per event
        await writeToInfluxWithRetry(
            async () => await globals.influx.write(point.toLineProtocol(), database),
            'Log event counts',
            'v3',
            'log-events'
        );
    }

    // Write each user event individually
    for (const event of userEvents) {
        const point = new Point3(measurementName)
            .setTag('event_type', 'user')
            .setTag('source', event.source)
            .setTag('host', event.host)
            .setTag('subsystem', event.subsystem)
            .setIntegerField('counter', event.counter);

        // Individual write - ONE network call per event
        await writeToInfluxWithRetry(
            async () => await globals.influx.write(point.toLineProtocol(), database),
            'User event counts',
            'v3',
            'user-events'
        );
    }
}

Impact:

  • V2: 1 network call for all events (efficient)
  • V3: N network calls (N = number of events) (inefficient)
  • ⚠️ V3 has significantly higher latency
  • ⚠️ V3 has higher network overhead
  • ⚠️ V3 has partial write risk - if write #5 of 20 fails, unclear which events were written
  • ⚠️ V3 may hit rate limits with many events

Same Issue In:

  • event-counts.js (both regular and rejected events)
  • sessions.js (writes each session individually)

Alignment Matrix

Module V1 Implementation Data Types V1→V2 Data Types V2→V3 Field Names V1→V2 Field Names V2→V3 Error Handling Efficiency Overall V1 Overall V2 Overall V3
health-metrics Stable (CPU) V3 missing 🟢 🟢 🔴
butler-memory Stable V3 extra 🟢 🟢 🟡
sessions Stable V3 extra V3 loops 🟢 🟢 🟡
user-events Stable Same _field V3 extra 🟢 🟢 🔴
log-events Stable ⚠️ qix Same ⚠️ sched V3 wrapper 🟢 🟢 🟡
event-counts Stable V3 partial V3 loops 🟢 🟢 🟡
queue-metrics Stable V3 extra 🟢 🟢 🟢

V1→V2 Transition: Clean - Field names identical, types mapped correctly
V2→V3 Transition: Issues - Field name conflicts, CPU type mismatch, error handling inconsistent

Legend:

  • 🟢 Well aligned (minor or no issues)
  • 🟡 Partially aligned (several issues)
  • 🔴 Poorly aligned (critical issues)
  • Aligned / Working
  • Not aligned / Broken
  • ⚠️ Partially aligned

V1 Implementation Characteristics

Strengths

  1. Simple Data Structure:
const datapoint = [
    {
        measurement: 'sense_server',
        tags: { server_name: 'QS01', host: '192.168.1.100' },
        fields: { version: '14.123.4', uptime: '5 days' },
    },
];
await globals.influx.writePoints(datapoint);
  1. Consistent Error Handling:

    • All v1 modules use try-catch consistently
    • Errors logged and re-thrown
    • Pattern: try { ... } catch (err) { log + throw }
  2. Batch Writes Native:

    • writePoints() accepts arrays naturally
    • All modules build arrays then write once
    • Most efficient of the three versions
  3. Field Names:

    • No conflicts between tags and fields (v1 allows duplicates)
    • User events: userFull in both tags and fields
    • Log events: result_code, app_name in both
  4. Type Handling:

    • Implicit types based on JavaScript values
    • CPU: body.cpu.total (number) → stored correctly as float
    • No explicit type conversion needed (trusts input)

V1 Patterns

Health Metrics:

// V1: Plain objects, implicit types
const datapoint = [
    {
        measurement: 'cpu',
        tags: serverTags,
        fields: { total: body.cpu.total }, // ← JavaScript number (float)
    },
];
await globals.influx.writePoints(datapoint);

User Events:

// V1: Can use same name for tag and field ✅
const datapoint = [
    {
        measurement: 'user_events',
        tags: {
            userFull: `${user_directory}\\${user_id}`, // ← Tag
            userId: user_id,
        },
        fields: {
            userFull: `${user_directory}\\${user_id}`, // ← Field (same name OK!)
            userId: user_id,
        },
    },
];

Log Events:

// V1: Consistent field names, no conflicts
fields: {
    result_code: msg.result_code,    // ← Field
    app_name: msg.app_name,          // ← Field
    app_id: msg.app_id               // ← Field
}
// Tags with same names also OK in v1

V1 vs V2 vs V3 Comparison

Data Structure Comparison

Aspect V1 V2 V3
Point Creation Plain object new Point() builder new Point3() builder
Tags tags: {} object .tag('key', 'val') .setTag('key', 'val')
Float Field fields: { x: 1.5 } .floatField('x', 1.5) .setFloatField('x', 1.5)
Int Field fields: { x: 10 } .intField('x', 10) .setIntegerField('x', 10)
Uint Field fields: { x: 10 } .uintField('x', 10) .setIntegerField('x', 10) ⚠️
Tag/Field Dup Allowed Allowed Not allowed

Write API Comparison

Aspect V1 V2 V3
Write Method influx.writePoints(arr) writeApi.writePoints(arr) influx.write(lineProtocol)
Batch Native Yes Yes ⚠️ Must loop or concatenate
Resource Mgmt Auto Manual (close()) Auto
Config Database string Org + bucket + options Database string
Flush Automatic Manual Automatic

Error Handling Comparison

Module V1 V2 V3
health-metrics try-catch None None
butler-memory try-catch None try-catch
sessions try-catch None try-catch
user-events try-catch None try-catch
log-events try-catch None try-catch (wrapper)
event-counts try-catch None try-catch
queue-metrics try-catch None try-catch

Pattern:

  • V1: Consistent try-catch in all modules
  • V2: Relies on retry wrapper only ⚠️
  • V3: Inconsistent - some have try-catch, some don't

Field Name Comparison

Data Type V1 Field Names V2 Field Names V3 Field Names Compatible V1↔V2 Compatible V2↔V3
User Events userFull, userId userFull, userId userFull_field, userId_field
User Events appId, appName appId_field, appName_field appId_field, appName_field ⚠️
Log: Scheduler app_name, app_id app_name, app_id app_name_field, app_id_field
Log: Engine/Proxy result_code result_code_field result_code_field ⚠️
Health Metrics All match All match All match
Memory All match All match All match
Sessions All match All match All match

Migration Paths:

  • V1 → V2: Some field name changes needed (user events, log events)
  • V2 → V3: Field name changes needed (user events, scheduler logs)
  • V1 → V3: Multiple field name changes needed

Key Findings: V1 vs V2 vs V3

What V1 Does Best

  1. Simplicity: Plain JavaScript objects, no builder pattern needed
  2. Consistency: All modules follow identical error handling pattern
  3. Efficiency: Batch writes are natural and consistent
  4. Flexibility: Can use same name for tags and fields without conflicts
  5. Stability: Mature, well-tested, no surprises

What V2 Improves Over V1

  1. Type Safety: Explicit field types (floatField, uintField, intField)
  2. Builder Pattern: Method chaining makes point construction clearer
  3. Semantic Types: Unsigned integers distinguish from signed
  4. Modern Client: Active maintenance, newer features

What V2 Does Worse Than V1 ⚠️

  1. Complexity: Requires writeApi management (create, flush, close)
  2. Verbosity: Builder pattern is more verbose than plain objects
  3. Resource Management: Manual close() required, error handling around cleanup
  4. Error Handling: Less consistent than v1 (relies on retry wrapper)

What V3 Does Better Than V2

  1. Simplicity: No writeApi management, direct write
  2. Modern: SQL query language (more familiar than Flux)
  3. Performance: Potentially faster writes (depends on use case)

What V3 Does Worse Than V1/V2

  1. Field Name Conflicts: Cannot use same name for tag and field
  2. Type Precision: CPU stored as integer instead of float (data loss)
  3. Efficiency: Individual writes in loops instead of batches
  4. Consistency: Inconsistent error handling across modules
  5. Validation: Missing input validation in several modules
  6. Breaking Changes: Field names differ from v1/v2, breaks compatibility

What Works Well (Positive Findings)

1. Shared Utilities

Both v2 and v3 use common utilities from shared/utils.js:

import {
    getFormattedTime, // Uptime calculation
    processAppDocuments, // App name extraction
    isInfluxDbEnabled, // InfluxDB availability check
    writeToInfluxWithRetry, // Unified retry logic
} from '../shared/utils.js';

Benefits:

  • Single source of truth for common logic
  • Bug fixes apply to both versions
  • Consistent behavior across versions
  • Easier maintenance

2. Consistent Measurement Names

Both versions use identical measurement names:

  • sense_server
  • mem
  • apps
  • cpu
  • session
  • users
  • cache
  • saturated
  • butlersos_memory_usage
  • user_events
  • log_event
  • user_session_summary
  • user_session_details

3. Tag Structure Alignment

Both versions:

  • Apply server tags consistently
  • Respect config-based custom tags
  • Use same tag names (mostly)
  • Support dynamic tag addition

4. Logging Patterns

Both versions have consistent logging:

globals.logger.debug(`MODULE V2: ...`);
globals.logger.verbose('MODULE V2: ...');
globals.logger.error('MODULE V2: ...');

globals.logger.debug(`MODULE V3: ...`);
globals.logger.verbose('MODULE V3: ...');
globals.logger.error('MODULE V3: ...');

5. Configuration Path Consistency

Both use same config paths:

globals.config.get('Butler-SOS.influxdbConfig.v2Config.org');
globals.config.get('Butler-SOS.influxdbConfig.v3Config.database');
globals.config.get('Butler-SOS.userEvents.tags');
// etc.

Migration Impact Assessment

Scenario: User Switches from V2 → V3

Breaks Queries For:

User Events:

  • Field userFulluserFull_field
  • Field userIduserId_field
  • Action Required: Update all Grafana dashboards and queries

Scheduler Log Events:

  • Field app_nameapp_name_field
  • Field app_idapp_id_field
  • Action Required: Update scheduler-related dashboards

⚠️ Data Quality Issues:

CPU Metrics:

  • Lose decimal precision (45.7% → 45%)
  • Action Required: Monitoring thresholds may need adjustment

Cache/Session Counts:

  • Lose semantic type information (unsigned → signed)
  • Action Required: None functionally, but validation weaker

Works Without Changes:

  • Health metrics (except CPU field)
  • Butler SOS memory usage
  • Proxy sessions (structure same)
  • Queue metrics (identical)
  • Event rejection tracking

🔧 Performance Differences:

  • Event counts: Batch write → Individual writes (slower)
  • Sessions: Batch write → Loop writes (slower)
  • Impact: Slight increase in write latency and network overhead

Recommendations

Priority 1 - Critical Fixes Needed 🔴

Must fix before v3 production use:

  1. Fix CPU field type in v3 health-metrics.js

    • Change: setIntegerField('total', ...)setFloatField('total', ...)
    • File: src/lib/influxdb/v3/health-metrics.js line ~153
    • Impact: Prevents data loss
  2. Document field name differences

    • Create migration guide for v2 → v3
    • List all field name changes
    • Provide query conversion examples
    • Update Grafana dashboard templates
  3. Add input validation to v3 modules

    • health-metrics.js: Validate body parameter
    • butler-memory.js: Validate memory parameter
    • Match v2's defensive programming pattern
  4. Standardize error handling in v3

    • Either all modules use try-catch or none do
    • Ensure all modules track errors via errorTracker.incrementError()
    • health-metrics.js needs error handling added
  5. Fix QIX-perf type conversions in v3

    • Add parseFloat() for time metrics
    • Add parseInt() for RAM metrics
    • File: src/lib/influxdb/v3/log-events.js lines ~175-183

Priority 2 - Efficiency Improvements 🟡

Performance optimization:

  1. Implement batch writes in v3

    • event-counts.js: Build array then write once
    • sessions.js: Consider batching if InfluxDB v3 client supports it
    • Research: Does v3 client support batch line protocol?
  2. Optimize sessions write strategy

    • Document why loop is necessary (if it is)
    • Consider: Can we build one multi-line protocol string?
  3. Add performance metrics

    • Track write latency differences between v2/v3
    • Monitor for rate limiting issues in v3

Priority 3 - Code Consistency 🟢

Long-term maintainability:

  1. Unify tag application approach

    • Option A: Create shared v3 tag helper like v2 has
    • Option B: Document inline pattern as standard
    • Ensure consistent validation (null checks, array checks)
  2. Align semantic field types

    • Document: Why v3 doesn't distinguish unsigned vs signed
    • Consider: Does InfluxDB v3 support unsigned integers?
    • Update: Use correct types if v3 supports them
  3. Enhance JSDoc documentation

    • Document field name differences (tag/field conflicts)
    • Explain v2 vs v3 architectural differences
    • Add migration notes to each module
  4. Create v2/v3 comparison tests

    • Verify same input produces equivalent data (accounting for known differences)
    • Catch regressions early
    • Validate field name mappings

Priority 4 - Documentation 📚

  1. Create comprehensive migration guide

    • Field name mapping table
    • Query conversion examples
    • Grafana dashboard update guide
    • Performance expectations
  2. Add inline comments for differences

    • Mark field name conflicts with comments
    • Explain why type conversions differ
    • Document efficiency trade-offs

Testing Recommendations

Unit Tests Needed:

  1. Type validation tests:

    • Verify CPU field is Float in v3
    • Verify numeric types match expected semantics
    • Test with edge cases (null, undefined, wrong types)
  2. Field name consistency tests:

    • Verify field names match documentation
    • Alert if field names change unexpectedly
    • Cross-reference v2 and v3 schemas
  3. Error handling tests:

    • Ensure all v3 modules handle errors
    • Verify error tracking calls made
    • Test partial failure scenarios

Integration Tests Needed:

  1. Data compatibility tests:

    • Write same data with v2 and v3
    • Verify queryable (accounting for field name differences)
    • Validate data precision (CPU decimals)
  2. Performance benchmarks:

    • Compare v2 vs v3 write latency
    • Measure batch vs individual write overhead
    • Test with high event volumes
  3. Migration tests:

    • Simulate v2 → v3 switch
    • Verify queries with field name mappings work
    • Test rollback scenario

Conclusion: Roadmap to Consistency

Current State Assessment

Aspect V1 V2 V3 Target
Error Handling Excellent ⚠️ Partial Poor V1 Pattern
Data Integrity Perfect Good Data Loss V1 Pattern
Field Naming Consistent Compatible Breaking V1 Names
Write Efficiency Optimal Good Inefficient V1 Batching
Code Consistency Perfect ⚠️ Good Varies V1 Pattern
Input Validation Present ⚠️ Partial Missing V1 Pattern

Goal: Make V2 and V3 match V1's excellence in all categories.


What Success Looks Like

After Fixes Are Applied:

V1 (Baseline - No Changes Needed)
├─ ✅ All 7 modules identical patterns
├─ ✅ Try-catch in every module
├─ ✅ Batch writes everywhere
├─ ✅ Input validation present
└─ ✅ Production stable

V2 (After P1 Fixes Applied)
├─ ✅ All 7 modules with try-catch (ADDED)
├─ ✅ Error context logged (ADDED)
├─ ✅ Batch writes optimized (REVIEWED)
└─ ✅ Matches V1 consistency

V3 (After P0 + P1 Fixes Applied)
├─ ✅ CPU fields as float (FIXED - was integer)
├─ ✅ Field names match V1/V2 (FIXED - was _field suffix)
├─ ✅ All 7 modules with try-catch (ADDED - only 2 had it)
├─ ✅ Input validation (ADDED - was missing)
├─ ✅ Batch writes (ADDED - was individual)
└─ ✅ Production ready


Implementation Timeline

Week 1: V3 Critical Fixes (4 hours)

  • Day 1: CPU field types + field name conflicts (P0) - 40 minutes
  • Day 2: Error handling in 5 modules (P1) - 1 hour
  • Day 3: Input validation in all modules (P1) - 2 hours
  • Day 4: Testing and validation

Week 2: V3 Performance (3 hours)

  • Day 1: Batch writes in event-counts (P2) - 1 hour
  • Day 2: Batch writes in queue-metrics (P2) - 1 hour
  • Day 3: Performance testing

Week 3: V2 Improvements (2 hours)

  • Day 1: Error handling in all modules (P1) - 1 hour
  • Day 2: Testing and documentation - 1 hour

Week 4: Code Quality (2 hours)

  • Day 1: Extract shared utilities (P3) - 1 hour
  • Day 2: Documentation and cleanup - 1 hour

Total Effort: ~11 hours to achieve full consistency


Success Metrics

Before Fixes:

  • V3 has 6 critical issues blocking production
  • ⚠️ V2 has inconsistent error handling
  • V1 is excellent baseline

After Fixes:

  • All versions follow V1 best practices
  • All versions have consistent patterns
  • All versions production ready
  • Field names compatible across versions
  • No data loss in any version
  • Efficient batch writes everywhere

Bottom Line

Current Recommendation:

  • Use V1 or V2 for production (both reliable)
  • Do NOT use V3 until P0+P1 fixes applied

After Fixes Recommendation:

  • V1: Keep for maximum stability
  • V2: Use if type safety needed
  • V3: Use for InfluxDB 3.x features (SQL queries, etc.)

The Path Forward:

  1. Fix V3 P0 issues (40 minutes) → Makes V3 safe
  2. Fix V3 P1 issues (3 hours) → Makes V3 reliable
  3. Fix V2 P1 issues (1 hour) → Makes V2 excellent
  4. Apply P2/P3 improvements (4 hours) → Makes all versions optimal

Total investment of ~11 hours makes all three versions consistently excellent and following best practices.


Appendix: File Reference

V1 Implementation Files:

  • src/lib/influxdb/v1/health-metrics.js (205 lines)
  • src/lib/influxdb/v1/butler-memory.js (68 lines)
  • src/lib/influxdb/v1/sessions.js (76 lines)
  • src/lib/influxdb/v1/user-events.js (115 lines)
  • src/lib/influxdb/v1/log-events.js (237 lines)
  • src/lib/influxdb/v1/event-counts.js (241 lines)
  • src/lib/influxdb/v1/queue-metrics.js (196 lines)

V2 Implementation Files:

  • src/lib/influxdb/v2/health-metrics.js (191 lines)
  • src/lib/influxdb/v2/butler-memory.js (79 lines)
  • src/lib/influxdb/v2/sessions.js (92 lines)
  • src/lib/influxdb/v2/user-events.js (107 lines)
  • src/lib/influxdb/v2/log-events.js (243 lines)
  • src/lib/influxdb/v2/event-counts.js (206 lines)
  • src/lib/influxdb/v2/queue-metrics.js (204 lines)
  • src/lib/influxdb/v2/utils.js (22 lines)

V3 Implementation Files:

  • src/lib/influxdb/v3/health-metrics.js (214 lines)
  • src/lib/influxdb/v3/butler-memory.js (64 lines)
  • src/lib/influxdb/v3/sessions.js (74 lines)
  • src/lib/influxdb/v3/user-events.js (134 lines)
  • src/lib/influxdb/v3/log-events.js (238 lines)
  • src/lib/influxdb/v3/event-counts.js (265 lines)
  • src/lib/influxdb/v3/queue-metrics.js (183 lines)

Shared Files:

  • src/lib/influxdb/shared/utils.js (301 lines)
  • src/lib/influxdb/factory.js (routing logic)
  • src/lib/influxdb/index.js (facade)

Test Files:

  • src/lib/influxdb/__tests__/v1-*.test.js (7 files)
  • src/lib/influxdb/__tests__/v3-*.test.js (8 files)
  • src/lib/influxdb/__tests__/factory.test.js

Note: V2 test files were not created during refactoring (relying on integration tests).


Analysis Date: December 16, 2025
Analyst: GitHub Copilot
Codebase Version: Post-refactoring (legacy code removed)
Total Lines Analyzed: ~3,800 lines across 22 implementation files (v1: 7, v2: 8, v3: 7)