Browse Source
Nostr-Signature: fd370d2613105f16b0cfdd55b33f50c5b724ecef272109036a7cce5477da29bc 573634b648634cbad10f2451776089ea21090d9407f715e83c577b4611ae6edc 1d3cb4392f722b1b356247bde64691576d41fdb697e8dfe62d5e7ecd5ad8ea35757da2d56db310a2005e4b5528013aa1205256e37fc230f024d3b5a2e26735bfmain
3 changed files with 265 additions and 3 deletions
@ -0,0 +1,204 @@ |
|||||||
|
# Server Maintenance Commands |
||||||
|
|
||||||
|
## 1. Investigate Zombie Processes (CRITICAL - 3300 zombies) |
||||||
|
|
||||||
|
```bash |
||||||
|
# Find processes with zombie children |
||||||
|
ps aux | awk '$8 ~ /^Z/ { print $2, $11 }' | head -20 |
||||||
|
|
||||||
|
# Find parent processes that are creating zombies |
||||||
|
ps aux | awk '$8 ~ /^Z/ { print $3 }' | sort | uniq -c | sort -rn | head -10 |
||||||
|
|
||||||
|
# Check for specific problematic processes |
||||||
|
ps auxf | grep -E 'Z|defunct' |
||||||
|
|
||||||
|
# Check systemd services that might be spawning zombies |
||||||
|
systemctl status | grep -i failed |
||||||
|
systemctl list-units --type=service --state=failed |
||||||
|
``` |
||||||
|
|
||||||
|
## 2. Identify the Root Cause (Git Processes Detected) |
||||||
|
|
||||||
|
Based on initial investigation, zombies are `[git]` processes. Run these commands: |
||||||
|
|
||||||
|
```bash |
||||||
|
# Check all git processes (including zombies) |
||||||
|
ps aux | grep -E 'git|\[git\]' | head -30 |
||||||
|
|
||||||
|
# Find what's spawning git processes |
||||||
|
ps auxf | grep -B 5 -A 5 git | head -50 |
||||||
|
|
||||||
|
# Check for web server processes that might spawn git |
||||||
|
ps aux | grep -E 'node|nginx|apache|php-fpm|plesk' | head -20 |
||||||
|
|
||||||
|
# Check system logs for git-related errors |
||||||
|
journalctl -p err -n 100 | grep -i git |
||||||
|
journalctl -u nginx -n 50 |
||||||
|
journalctl -u apache2 -n 50 |
||||||
|
|
||||||
|
# Check for processes with many children (potential zombie creators) |
||||||
|
ps aux --sort=-%cpu | head -20 |
||||||
|
ps aux --sort=-%mem | head -20 |
||||||
|
|
||||||
|
# Monitor zombie creation in real-time (run for 30 seconds) |
||||||
|
watch -n 1 'ps aux | awk '\''$8 ~ /^Z/ { count++ } END { print "Zombies:", count+0 }'\''' |
||||||
|
|
||||||
|
# Check if it's a GitRepublic application issue |
||||||
|
ps aux | grep -E 'node.*gitrepublic|gitrepublic.*node' |
||||||
|
systemctl status | grep -i gitrepublic |
||||||
|
``` |
||||||
|
|
||||||
|
## 3. Apply Security Updates |
||||||
|
|
||||||
|
```bash |
||||||
|
# Update package lists |
||||||
|
apt update |
||||||
|
|
||||||
|
# See what security updates are available |
||||||
|
apt list --upgradable | grep -i security |
||||||
|
|
||||||
|
# Apply security updates |
||||||
|
apt upgrade -y |
||||||
|
|
||||||
|
# Or apply all updates (after investigating zombies) |
||||||
|
apt upgrade |
||||||
|
``` |
||||||
|
|
||||||
|
## 4. System Health Check |
||||||
|
|
||||||
|
```bash |
||||||
|
# Check disk space |
||||||
|
df -h |
||||||
|
|
||||||
|
# Check memory usage |
||||||
|
free -h |
||||||
|
|
||||||
|
# Check system load |
||||||
|
uptime |
||||||
|
top -bn1 | head -20 |
||||||
|
|
||||||
|
# Check for failed services |
||||||
|
systemctl list-units --type=service --state=failed |
||||||
|
|
||||||
|
# Check system logs |
||||||
|
journalctl -p err -n 50 |
||||||
|
``` |
||||||
|
|
||||||
|
## 5. Plan System Restart |
||||||
|
|
||||||
|
```bash |
||||||
|
# Check what requires restart |
||||||
|
cat /var/run/reboot-required.pkgs 2>/dev/null || echo "No reboot required file found" |
||||||
|
|
||||||
|
# Schedule maintenance window and restart |
||||||
|
# (Only after fixing zombie issue) |
||||||
|
# reboot |
||||||
|
``` |
||||||
|
|
||||||
|
## 6. Plesk-Specific Checks |
||||||
|
|
||||||
|
```bash |
||||||
|
# Check Plesk services |
||||||
|
plesk repair all -y |
||||||
|
|
||||||
|
# Check Plesk logs |
||||||
|
tail -100 /var/log/plesk/panel.log |
||||||
|
|
||||||
|
# Check for Plesk-related zombie processes |
||||||
|
ps aux | grep -i plesk | grep -i defunct |
||||||
|
``` |
||||||
|
|
||||||
|
## Root Cause Identified ✅ |
||||||
|
|
||||||
|
**Problem**: Node.js GitRepublic process (PID 330225, `node build`) is spawning git processes that aren't being properly reaped, creating zombies. |
||||||
|
|
||||||
|
**Evidence**: |
||||||
|
- All zombie processes are `[git] <defunct>` children of the Node.js process |
||||||
|
- Active git process: `git remote set-head remote-0 -a` (from `git-remote-sync.ts`) |
||||||
|
- Git spawns subprocesses like `git-remote-https` that can become zombies if not properly waited for |
||||||
|
|
||||||
|
**Code Fix**: Updated `src/lib/services/git/git-remote-sync.ts` to: |
||||||
|
- Add timeout handling (30 minutes) |
||||||
|
- Properly clean up processes on exit |
||||||
|
- Handle signals correctly |
||||||
|
- Prevent zombie processes |
||||||
|
|
||||||
|
## Immediate Server Fix |
||||||
|
|
||||||
|
**Option 1: Restart the GitRepublic service (RECOMMENDED)** |
||||||
|
```bash |
||||||
|
# Find the service/container |
||||||
|
docker ps | grep gitrepublic |
||||||
|
# or |
||||||
|
systemctl list-units | grep -i gitrepublic |
||||||
|
|
||||||
|
# Restart it (this will clean up zombies temporarily) |
||||||
|
docker restart <container-id> |
||||||
|
# or |
||||||
|
systemctl restart <service-name> |
||||||
|
``` |
||||||
|
|
||||||
|
**Option 2: Kill and let it restart (if managed by systemd/docker)** |
||||||
|
```bash |
||||||
|
# Find the process |
||||||
|
ps aux | grep "node build" | grep -v grep |
||||||
|
|
||||||
|
# Kill it (systemd/docker will restart it) |
||||||
|
kill -TERM 330225 |
||||||
|
|
||||||
|
# Wait a moment, then check if it restarted |
||||||
|
ps aux | grep "node build" | grep -v grep |
||||||
|
``` |
||||||
|
|
||||||
|
**Option 3: Clean up zombies manually (temporary fix)** |
||||||
|
```bash |
||||||
|
# This won't fix the root cause but will clean up existing zombies |
||||||
|
# The zombies will come back until the code is fixed |
||||||
|
# Note: You can't kill zombies directly, but killing the parent will clean them up |
||||||
|
``` |
||||||
|
|
||||||
|
## Recommended Action Plan |
||||||
|
|
||||||
|
1. **IMMEDIATE**: Restart GitRepublic service to clean up existing zombies |
||||||
|
2. **URGENT**: Deploy the code fix (updated `git-remote-sync.ts`) |
||||||
|
3. **HIGH PRIORITY**: Apply security updates (section 3) |
||||||
|
4. **MONITOR**: Watch for zombie process count after restart |
||||||
|
5. **MAINTENANCE WINDOW**: Schedule system restart after deploying fix |
||||||
|
|
||||||
|
## Common Causes of Zombie Processes |
||||||
|
|
||||||
|
- Process spawning children without proper signal handling |
||||||
|
- Systemd service not properly configured |
||||||
|
- Application bugs (especially Node.js, Python, or long-running processes) |
||||||
|
- Resource exhaustion causing process management issues |
||||||
|
- Plesk or web server processes not reaping children |
||||||
|
|
||||||
|
## Git-Specific Zombie Issues |
||||||
|
|
||||||
|
Since zombies are `[git]` processes, likely causes: |
||||||
|
- **Git operations not being properly waited for** - parent process exits before git finishes |
||||||
|
- **Git HTTP backend issues** - web server spawning git processes that aren't reaped |
||||||
|
- **GitRepublic application** - Node.js app spawning git commands without proper signal handling |
||||||
|
- **Plesk Git integration** - Plesk's git features not properly managing child processes |
||||||
|
- **Git hooks** - hooks spawning processes that become zombies |
||||||
|
|
||||||
|
### Quick Fixes to Try |
||||||
|
|
||||||
|
```bash |
||||||
|
# Restart web server (if using nginx/apache) |
||||||
|
systemctl restart nginx |
||||||
|
# or |
||||||
|
systemctl restart apache2 |
||||||
|
|
||||||
|
# Restart GitRepublic application (if running as service) |
||||||
|
systemctl restart gitrepublic-web |
||||||
|
# or find and restart the Node.js process |
||||||
|
ps aux | grep node | grep gitrepublic |
||||||
|
# Then restart it |
||||||
|
|
||||||
|
# Check git-http-backend processes |
||||||
|
ps aux | grep git-http-backend |
||||||
|
|
||||||
|
# Kill any stuck git processes (CAREFUL - only if safe) |
||||||
|
# pkill -9 git # Only if you're sure no important operations are running |
||||||
|
``` |
||||||
Loading…
Reference in new issue