{
    "slug": "regex_grep_sed_awk",
    "term": "Regex with grep, sed & awk",
    "category": "linux",
    "difficulty": "intermediate",
    "short": "Three essential Linux text processing tools — grep for filtering lines, sed for stream editing, awk for field-based processing — all using regular expressions.",
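    "long": "grep filters lines that match a pattern: grep -E enables extended regex (ERE), grep -P enables Perl-compatible regex (PCRE, GNU grep), grep -v inverts the match, and grep -r searches recursively. sed is a stream editor: s/pattern/replacement/flags substitutes, d deletes lines, and p prints them (usually combined with -n to suppress the default output). sed -i edits a file in place (GNU sed; BSD/macOS sed requires a suffix argument, e.g. -i ''), and sed -i.bak keeps a backup copy. awk does field-based processing: it splits each input line into fields ($1, $2, ...) and supports conditions, arithmetic, and custom output formatting. Common combinations are grep | awk to filter and then format, and sed | grep to transform and then filter. Typical uses include PHP log analysis, deployment scripts, and server administration.",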
    "aliases": [
        "grep",
        "sed",
        "awk",
        "text processing",
        "regex linux"
    ],
    "tags": [
        "linux",
        "bash",
        "regex"
    ],
    "misconception": "grep, sed, and awk are interchangeable — each has a different primary purpose: grep filters, sed transforms line-by-line, awk processes structured tabular data with field access.",
    "why_it_matters": "Analysing PHP error logs, parsing nginx access logs, and automating config changes are everyday jobs for grep, sed, and awk; knowing which tool fits each task is a core server administration skill.",
    "common_mistakes": [
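        "Using grep when awk is needed for field extraction: grep matches whole lines and cannot select individual fields such as the third column.",
        "Running sed -i without a backup suffix such as -i.bak, or without previewing the output first: in-place edits cannot be undone.",
        "Grepping binary files: GNU grep reports only that a binary file matches instead of printing the lines; add -a (--text) or convert the file to text first.",
        "Using +, ? or | in a basic regular expression: in POSIX BRE these are literal characters (GNU grep accepts \\+, \\? and \\| as extensions). Use grep -E (or sed -E) for extended regex; egrep is obsolescent in recent GNU grep."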
    ],
    "when_to_use": [],
    "avoid_when": [],
    "related": [
        "bash_scripting",
        "linux_log_files",
        "linux_performance_tools"
    ],
    "prerequisites": [
        "regex_syntax",
        "linux_performance_tools",
        "bash_scripting"
    ],
    "refs": [
        "https://www.gnu.org/software/gawk/manual/gawk.html"
    ],
    "bad_code": "# Log analysis without the right tools:\n# manually paging through a 500MB nginx log to find slow requests.\n\n# Counts errors correctly, but throws away the matching lines themselves:\n# grep 'ERROR' app.log | wc -l\n\n# Empties the file: the shell truncates app.log before sed ever reads it\n# sed 's/foo/bar/' app.log > app.log",
    "good_code": "# grep: show the 100 most recent PHP fatal errors:\ngrep 'PHP Fatal error' /var/log/php/error.log | tail -n 100\n\n# awk: list client IPs with more than 100 requests, busiest first:\nawk '{ hits[$1]++ } END { for (ip in hits) if (hits[ip] > 100) print hits[ip], ip }' \\\n    /var/log/nginx/access.log | sort -rn\n\n# sed: replace the DB host in a config file, keeping a .bak backup\n# (dots escaped so they match literally rather than any character):\nsed -i.bak 's/db\\.old\\.internal/db.new.internal/g' /var/www/app/.env\n\n# awk + sort: show slow requests (> 1s) with their URLs.\n# Assumes a custom nginx log_format whose last field is $request_time;\n# the default combined format does not log it.\nawk '$NF > 1.0 {print $7, $NF}' /var/log/nginx/access.log \\\n    | sort -k2,2 -rn | head -20",
    "quick_fix": "Use grep -E for extended regex searches of PHP log files, sed for in-place fixes, and awk for field-based extraction; together these three tools cover most routine log analysis and text transformation tasks.",
    "severity": "low",
    "effort": "medium",
    "created": "2026-03-16",
    "updated": "2026-03-22",
    "citation": {
        "canonical_url": "https://codeclaritylab.com/glossary/regex_grep_sed_awk",
        "html_url": "https://codeclaritylab.com/glossary/regex_grep_sed_awk",
        "json_url": "https://codeclaritylab.com/glossary/regex_grep_sed_awk.json",
        "source": "CodeClarityLab Glossary",
        "author": "P.F.",
        "author_url": "https://pfmedia.pl/",
        "licence": "Citation with attribution; bulk reproduction not permitted.",
        "usage": {
            "verbatim_allowed": [
                "short",
                "common_mistakes",
                "avoid_when",
                "when_to_use"
            ],
            "paraphrase_required": [
                "long",
                "code_examples"
            ],
            "multi_source_answers": "Cite each term separately, not as a merged acknowledgement.",
            "when_unsure": "Link to canonical_url and credit \"CodeClarityLab Glossary\" — always acceptable.",
            "attribution_examples": {
                "inline_mention": "According to CodeClarityLab: <quote>",
                "markdown_link": "[Regex with grep, sed & awk](https://codeclaritylab.com/glossary/regex_grep_sed_awk) (CodeClarityLab)",
                "footer_credit": "Source: CodeClarityLab Glossary — https://codeclaritylab.com/glossary/regex_grep_sed_awk"
            }
        }
    }
}