Etcd Troubleshooting Skill
Description
# Etcd Troubleshooting Skill

This document defines the Claude Code skill for troubleshooting etcd issues on two-node OpenShift clusters with fencing topology. When activated, Claude becomes an expert etcd/Pacemaker troubleshooter capable of iterative diagnosis and remediation.

## Skill Overview

This skill enables Claude to:

- Validate and test access to cluster components via Ansible and OpenShift CLI
- Iteratively collect diagnostic data from Pacemaker, etcd, and OpenShift
- Analyze symptoms and identify root causes
- Propose and execute remediation steps
- Verify fixes and adjust approach based on results
- Provide comprehensive troubleshooting throughout the diagnostic process

## Step-by-Step Procedure

### 1. Validate Access

**1.1 Ansible Inventory Validation:**

- Check if `deploy/openshift-clusters/inventory.ini` exists
- Verify the inventory file has valid cluster node entries
- Test SSH connectivity to cluster nodes using Ansible ping module

**1.2 OpenShift Cluster Access Validation:**

- Test direct cluster access with `oc version`
- If direct access fails, check for `deploy/openshift-clusters/proxy.env`
- If proxy.env exists, source it before running oc commands
- Verify cluster access with `oc get nodes`
- Remember proxy requirement for all subsequent oc commands

**IMPORTANT: No Cluster Access Scenario**

If OpenShift cluster API access is unavailable (which is expected when etcd is down), **all diagnostics and remediation must be performed via Ansible** using direct VM access.
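
The access checks in step 1 can be sketched as two small shell functions. This is a minimal illustration, not the skill's actual implementation: the function names and the `direct` / `proxy` / `ansible-only` mode strings are invented here, the `cluster_vms` host group is assumed to be defined in the inventory, and `oc get nodes` with a request timeout stands in for the combined `oc version` and `oc get nodes` checks.

```bash
#!/usr/bin/env bash
# Illustrative sketch of step 1. Paths come from the skill text; the function
# names and the printed mode strings are hypothetical.

INVENTORY="${INVENTORY:-deploy/openshift-clusters/inventory.ini}"
PROXY_ENV="${PROXY_ENV:-deploy/openshift-clusters/proxy.env}"

# 1.1: the inventory must exist and the nodes must answer an Ansible ping.
# Assumes the inventory defines a host group named cluster_vms.
check_ansible_access() {
  [ -f "$INVENTORY" ] || { echo "missing inventory: $INVENTORY" >&2; return 1; }
  ansible cluster_vms -i "$INVENTORY" -m ping >/dev/null
}

# 1.2: try oc directly, then retry with proxy.env sourced if that file exists.
# Prints "direct", "proxy", or "ansible-only" (API unreachable either way).
detect_cluster_access() {
  if oc get nodes --request-timeout=10s >/dev/null 2>&1; then
    echo "direct"
  elif [ -f "$PROXY_ENV" ] &&
       ( . "$PROXY_ENV" && oc get nodes --request-timeout=10s >/dev/null 2>&1 ); then
    echo "proxy"
  else
    echo "ansible-only"
  fi
}
```

A caller might run `check_ansible_access || exit 1` first, then store `mode=$(detect_cluster_access)`. When the mode is `proxy`, every later `oc` command needs `proxy.env` sourced, since the probe above sources it only inside a subshell.
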
The troubleshooting workflow remains fully functional using only:

- Ansible ad-hoc commands to cluster VMs
- Ansible playbooks for diagnostics collection
- Direct SSH access to nodes via Ansible

When cluster access is unavailable:

- ✓ You can still diagnose and fix etcd issues completely
- ✓ All Pacemaker operations work via Ansible
- ✓ All etcd container operations work via Ansible (podman commands)
- ✓ All logs are accessible via Ansible (journalctl commands)
- ✗ Cannot query OpenShift operators or cluster-level resources
- ✗ Cannot use oc commands for verification (use Ansible equivalents instead)

This is a **normal scenario** when etcd is down - proceed with VM-based troubleshooting.

### 2. Collect Data

**Choose Your Diagnostic Approach**

There are two approaches to data collection:

**Quick Manual Triage (recommended for initial assessment)**

Start with a few targeted commands to assess the situation:

- `pcs status` - Check Pacemaker cluster state and failed actions
- `podman ps -a --filter name=etcd` - Verify etcd containers are running
- `etcdctl endpoint health` - Confirm etcd health

This takes ~30 seconds and is often sufficient to identify simple issues (stale failures, container restarts, etc.) that can be fixed immediately with `pcs resource cleanup etcd`.
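
As a rough sketch, the three triage commands above could be run on every cluster node through Ansible ad-hoc calls. The function names are hypothetical. The `cluster_vms` host group, the use of `--become`, and the assumption that `etcdctl` can be invoked directly on the node as written are not confirmed by the text, so adjust them to the actual environment.

```bash
#!/usr/bin/env bash
# Illustrative sketch of the quick triage pass. Function names are hypothetical.

INVENTORY="${INVENTORY:-deploy/openshift-clusters/inventory.ini}"

# Emit one triage command per line, in the order the skill lists them.
triage_commands() {
  echo "pcs status"
  echo "podman ps -a --filter name=etcd"
  echo "etcdctl endpoint health"
}

# Run each command on all nodes in cluster_vms (assumed group name).
# A failing command does not stop the loop, so the remaining output is still
# collected. The return status is the number of commands that failed.
run_triage() {
  failed=0
  while IFS= read -r cmd; do
    echo "=== $cmd ==="
    # --become assumes these commands need root on the nodes.
    ansible cluster_vms -i "$INVENTORY" -m shell -a "$cmd" --become </dev/null ||
      failed=$((failed + 1))
  done <<EOF
$(triage_commands)
EOF
  return "$failed"
}
```

Calling `run_triage` prints a header per command followed by the per-node Ansible output. A non-zero return only means at least one command failed on at least one node, which is the cue to read the output rather than a diagnosis in itself.
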
**Full Diagnostic Collection (for complex/unclear issues)**

If quick triage reveals complex problems or the root cause is unclear, run the comprehensive diagnostic script:

```bash
./helpers/etcd/collect-all-diagnostics.sh
```

This script (~5-10 minutes):

- Validates Ansible and cluster access automatically
- Collects all VM-level diagnostics via Ansible playbook
- Collects OpenShift cluster-level data (if accessible)
- Saves everything to `/tmp/etcd-diagnostics-<timestamp>/`
- Generates a `DIAGNOSTIC_REPORT.txt` with analysis commands

Use the full collection when:

- Quick triage doesn't reveal the cause
- Multiple components appear affected
- You need to preserve diagnostic data for later analysis
- The issue involves cluster ID mismatches or split-brain scenarios

**Manual Collection Commands**

For manual data collection, use the commands below.

**IMPORTANT: Target the Correct Host Group**

- **All etcd/Pacemaker commands** must target the `cluster_vms` host group (the OpenShift cluster nodes)
- **VM lifecycle commands** (start/stop VMs) target the hypervisor host
- Use Ansible ad-hoc commands with `-m shell` or run playbooks that target `cluster_vms`
- All