Troubleshooting

Common issues, diagnostics, and debugging techniques for merobox

6 issue categories · log level: DEBUG · CLI flag: --verbose · node output: merobox logs

Node Startup Problems

Docker not running

Symptom: Cannot connect to the Docker daemon or docker.errors.DockerException

  • Verify Docker Desktop (or dockerd) is running: docker info
  • On Linux, check the daemon: sudo systemctl status docker
  • Ensure the Docker socket is accessible: ls -la /var/run/docker.sock
  • If using a remote Docker host, verify DOCKER_HOST is set correctly

Port conflicts

Symptom: Bind for 0.0.0.0:2428 failed: port is already allocated

  • Check what’s using the port: lsof -i :2428 (macOS/Linux) or netstat -ano | findstr 2428 (Windows)
  • Find conflicting containers with docker ps and stop them with docker stop <container-id>; for other processes, stop the PID reported by lsof.
  • Run merobox nuke -y to clean up stale merobox containers and free ports
  • Use a different port range in your workflow YAML via node configuration
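The port check can also be scripted, for example to test several ports before a run. The following is a minimal sketch and not part of merobox: port_in_use is a hypothetical helper, and the port list uses 2428 and 3030 only because they appear in this guide.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP listener accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Ports mentioned in this guide; adjust to match your workflow.
    for port in (2428, 3030):
        state = "in use" if port_in_use(port) else "free"
        print(f"port {port}: {state}")
```

This only detects listeners reachable on the loopback address; a service bound to a different interface would need that interface's address passed as host.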

Permission issues

Symptom: Permission denied when creating containers, volumes, or binding ports

  • On Linux, add your user to the docker group: sudo usermod -aG docker $USER (requires logout/login)
  • Verify group membership: groups
  • For binary backend, ensure the merod binary is executable: chmod +x /path/to/merod
  • Check that the data directory is writable: ls -la ~/.merobox/

Workflow Execution Issues

Variable resolution fails

Symptom: KeyError or Variable 'XXX' not found during step execution

  • Environment variables (${ENV_VAR}): ensure the variable is exported in the shell before running merobox. Use printenv ENV_VAR to verify; echo $ENV_VAR also prints shell variables that were never exported.
  • Dynamic values (${results.step_name.field}): verify the referenced step has a name field and has executed successfully before the current step.
  • Check for typos in variable names — both the reference and the definition must match exactly.
  • Use ${VAR:-default} syntax for optional environment variables.
  • Run with --verbose to see which variables are being resolved and their values.
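A pre-flight check for missing environment variables can be sketched in a few lines. This is an illustration of the ${VAR} and ${VAR:-default} forms described above, not merobox's own resolver; the helper name, the variable names MEROD_IMAGE and NODE_LOG, and the sample fragment are all made up for the example.

```python
import os
import re

# Matches ${NAME} and ${NAME:-default}. Dotted references such as
# ${results.step_name.field} do not match and are ignored on purpose.
ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def unset_env_refs(text, env=None):
    """Return names referenced without a default that are missing from env."""
    env = os.environ if env is None else env
    missing = []
    for match in ENV_REF.finditer(text):
        name, default = match.group(1), match.group(2)
        if default is None and name not in env and name not in missing:
            missing.append(name)
    return missing


# Hypothetical fragment; field and variable names are illustrative only.
workflow_text = """
image: ${MEROD_IMAGE}
log: ${NODE_LOG:-info}
context: ${results.create_context.context_id}
"""
print(unset_env_refs(workflow_text, env={}))  # ['MEROD_IMAGE']
```

Passing env=None checks against the real process environment, so the same function can be run on the contents of a workflow file before invoking merobox.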

Step validation fails

Symptom: StepValidationError: Missing required field 'xxx'

  • Check the step type’s required fields in the Workflow YAML reference.
  • Ensure YAML indentation is correct — misaligned fields may not be parsed as part of the step.
  • Verify field types: some fields expect strings, others expect lists or integers.
  • Run merobox run --dry-run workflow.yaml to validate without executing.
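One indentation mistake that is easy to miss is a tab character: YAML does not allow tabs for indentation, and many editors render them like spaces. A minimal sketch of a check follows; the helper name is hypothetical, the sample text is made up, and this is not a merobox feature.

```python
def tab_indented_lines(text):
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            flagged.append(number)
    return flagged


# Made-up sample: the third line is indented with a tab.
sample = "steps:\n  - name: first\n\t  type: call\n"
print(tab_indented_lines(sample))  # [3]
```

To check a real file, pass its contents, for example tab_indented_lines(open("workflow.yaml").read()).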

API calls fail (JSON-RPC errors)

Symptom: ClientError, ConnectionRefused, or JSON-RPC error responses in call steps

  • Verify the target node is running and healthy: merobox health --name <node>
  • Check that the application is installed and the context exists on the target node.
  • Ensure the method name and arguments match the application’s expected interface.
  • Inspect node logs for server-side errors: merobox logs --name <node> --tail 50
  • For auth-related failures, verify JWT token validity or re-authenticate.

Auth Service Issues

nip.io URL not resolving

Symptom: DNS resolution failure for *.nip.io addresses used by the Traefik auth stack

  • Check DNS resolution: nslookup 127.0.0.1.nip.io
  • Some corporate networks or VPNs block wildcard DNS services — try disconnecting from VPN.
  • As a workaround, add manual entries to /etc/hosts:
    127.0.0.1 auth.127.0.0.1.nip.io
  • Consider using the binary backend, which does not require Traefik or nip.io.
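The resolution check and the /etc/hosts workaround can be combined in a small script. This is a sketch under stated assumptions: resolve_or_none, hosts_line, and report are hypothetical helpers, and the lookup parameter exists only so the logic can be exercised without network access.

```python
import socket


def resolve_or_none(hostname, lookup=socket.gethostbyname):
    """Return the IPv4 address for hostname, or None if resolution fails."""
    try:
        return lookup(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def hosts_line(hostname, ip="127.0.0.1"):
    """Format an /etc/hosts entry for a name that does not resolve."""
    return f"{ip} {hostname}"


def report(hostname, lookup=socket.gethostbyname):
    """Describe whether hostname resolves, suggesting a hosts entry if not."""
    address = resolve_or_none(hostname, lookup=lookup)
    if address is None:
        return f"{hostname} does not resolve; /etc/hosts candidate: {hosts_line(hostname)}"
    return f"{hostname} -> {address}"
```

Calling print(report("auth.127.0.0.1.nip.io")) prints either the resolved address or a candidate /etc/hosts line for the hostname used in the workaround above.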

404 on auth URLs

Symptom: 404 Not Found when accessing /auth/token or /auth/refresh

  • Verify the Traefik container is running: docker ps | grep traefik
  • Check Traefik routing rules: docker logs <traefik-container>
  • Ensure the auth service container is healthy and registered with Traefik.
  • Verify the correct host header is being sent — auth routing depends on Host-based rules.
  • Try a direct request bypassing Traefik to isolate the issue.

Network connection problems

Symptom: ConnectionError or timeouts when authenticating with remote nodes

  • Test basic connectivity: curl -v <node-url>/health
  • For remote nodes, check firewall rules and security group settings.
  • Verify TLS certificates if using HTTPS — self-signed certs may need to be trusted.
  • Run merobox remote test <name> for a structured connectivity diagnostic.
  • Check if the node’s auth endpoint is on a different port than the main API.
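When triaging auth problems it helps to separate "a server answered with an error status" (a routing or path issue, as in the 404 case) from "nothing answered at all" (DNS, firewall, or TLS). The sketch below does that with the Python standard library; classify_endpoint is a hypothetical helper, not a merobox command, and it deliberately ignores proxy settings so that local nodes are contacted directly.

```python
import urllib.error
import urllib.request

# An empty ProxyHandler disables proxies picked up from the environment.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def classify_endpoint(url, timeout=3.0):
    """Return 'HTTP <code>' if a server answered, else 'unreachable: <reason>'."""
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            return f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered with an error status such as 404 or 401.
        return f"HTTP {exc.code}"
    except urllib.error.URLError as exc:
        # DNS failure, refused connection, TLS error, and similar.
        return f"unreachable: {exc.reason}"
    except OSError as exc:
        # For example a timeout while reading the response.
        return f"unreachable: {exc}"
```

For example, classify_endpoint("http://localhost:2428/health") returning "HTTP 404" means the request reached a server and the path or Host-based routing is the likely problem, while an "unreachable" result points at connectivity.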

Docker Issues

Container creation fails

Symptom: docker.errors.APIError during container creation

  • Check Docker disk space: docker system df — prune unused resources with docker system prune
  • Verify the image exists locally or can be pulled: docker pull <image>
  • Check Docker resource limits (memory, CPU) in Docker Desktop settings.
  • For image pull failures, check network connectivity to the container registry.
  • Review the full error message — Docker API errors usually include the root cause.

Docker networking problems

Symptom: Nodes can’t discover each other, peer connections fail, or bootstrap nodes are unreachable

  • Verify the merobox Docker network exists: docker network ls | grep merobox
  • Check that all containers are on the same network: docker network inspect merobox-net
  • Ensure bootstrap multiaddresses use container names (not localhost) for inter-container communication.
  • Run merobox nuke -y and re-run to recreate the network from scratch.
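  • On macOS, Docker Desktop runs containers inside a Linux VM, so container IPs are not routable from the host: reach nodes through their published ports on localhost, and use host.docker.internal when a container needs to reach the host.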
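The output of docker network inspect is JSON, so checking that every expected node is attached to the network can be automated. The following is a sketch, not a merobox feature: the helper names are hypothetical, the sample is a trimmed, made-up document in the shape Docker prints, and node-2 is an invented name used to show a missing container.

```python
import json


def containers_on_network(inspect_output):
    """Map container name -> IPv4 address from `docker network inspect` JSON."""
    attached = {}
    for network in json.loads(inspect_output):
        # "Containers" maps container IDs to details of attached containers.
        for details in (network.get("Containers") or {}).values():
            attached[details["Name"]] = details.get("IPv4Address", "")
    return attached


def missing_nodes(inspect_output, expected):
    """Return expected container names that are not attached to the network."""
    attached = containers_on_network(inspect_output)
    return [name for name in expected if name not in attached]


# Trimmed, made-up sample in the shape `docker network inspect` prints.
SAMPLE = """[{"Name": "merobox-net", "Containers": {
  "abc123": {"Name": "node-1", "IPv4Address": "172.18.0.2/16"}
}}]"""
print(missing_nodes(SAMPLE, ["node-1", "node-2"]))  # ['node-2']
```

In practice the input would be the captured output of docker network inspect merobox-net, for example via subprocess.run([...], capture_output=True, text=True).stdout.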

Performance Issues

Slow workflow execution

Symptom: Workflows take significantly longer than expected to complete

  • Enable debug logging to identify which steps are slow: LOG_LEVEL=DEBUG merobox run workflow.yaml
  • Check for excessive retries in step configuration — each retry adds delay with exponential backoff.
  • Verify node health — unhealthy nodes cause timeouts and retries: merobox health
  • For wait_for_sync steps, ensure the timeout and poll interval are appropriate for the data volume.
  • Consider using parallel steps to execute independent operations concurrently.
  • Check Docker resource allocation — insufficient CPU or memory causes slow container performance.
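To see why retries can dominate run time, it helps to add up the waits. The sketch below assumes a 1-second base delay that doubles on every retry; these values are assumptions for illustration, since the actual backoff parameters are not stated here.

```python
def backoff_delays(retries, base=1.0, factor=2.0):
    """Delay before each retry when the wait grows by `factor` every attempt."""
    return [base * factor ** attempt for attempt in range(retries)]


delays = backoff_delays(5)
print(delays)       # [1.0, 2.0, 4.0, 8.0, 16.0]
print(sum(delays))  # 31.0
```

Under these assumptions, a single step that exhausts five retries spends 31 seconds waiting before it finally fails, which is why fixing the underlying failure usually helps more than raising the retry count.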

High memory usage

Symptom: System becomes unresponsive or Docker reports OOM (out of memory) kills

  • Check container memory usage: docker stats
  • Reduce the number of concurrent nodes in the workflow.
  • Increase Docker Desktop memory allocation in Settings → Resources.
  • For the binary backend, check merod process memory: ps aux | grep merod
  • If running a NEAR sandbox, it consumes additional memory — ensure at least 4 GB total is available.
  • Use merobox nuke -y between test runs to clean up orphaned containers.

Debugging

Systematic techniques for diagnosing issues in merobox workflows and node management.

Enable Debug Logging

Set the LOG_LEVEL environment variable to DEBUG to get detailed output from all merobox components.

LOG_LEVEL=DEBUG merobox run workflow.yaml

# Log levels: DEBUG, INFO (default), WARNING, ERROR, CRITICAL

Verbose CLI Output

The --verbose flag increases output detail for any merobox command.

merobox run workflow.yaml --verbose
merobox health --verbose
merobox remote test my-server --verbose

Check Node Logs

View merod output for a specific node to diagnose server-side issues.

# View recent logs
merobox logs --name node-1 --tail 100

# Follow live output
merobox logs --name node-1 --follow

# Logs since a time
merobox logs --name node-1 --since "5m"

Inspect Containers

Access a running container’s shell for direct inspection.

# List merobox containers
docker ps --filter "label=merobox"

# Shell into a container
docker exec -it <container-id> /bin/sh

# View container details
docker inspect <container-id>

Network Diagnostics

Diagnose connectivity between nodes, the sandbox, and external services.

# Test node endpoint
curl -s http://localhost:2428/health | python3 -m json.tool

# Check NEAR sandbox
curl -s http://localhost:3030/status | python3 -m json.tool

# Inspect Docker network
docker network inspect merobox-net

# DNS resolution test
nslookup node-1.127.0.0.1.nip.io