Mining Data from Git Repos
Interesting Files
The regex patterns found on this page are just some examples you could use to extract data from git repositories. Get creative and build upon these with your own grep regex patterns.
cd git_loot
find . -type f \( -name '*config*' -o -name '*setting*' \) |
grep -iE '\.ya?ml$|\.ini$|\.php$|\.json$|\.conf$|\.config$|\.txt$'
Find configuration or settings files ending with specific extensions
find . -type f \( -name '*config*' -o -name '*setting*' \) |
grep -iE '\.ya?ml$|\.ini$|\.php$|\.json$|\.conf$|\.config$|\.txt$' |
xargs -I {} grep --color -PHair '^(?!(\s{0,}?//\s?|\s{0,}?\*\s{1,}?|\s{0,}?\#\s{1,}?)).*(root|admin|passw|database|db|sql|domain\.tld)' {}
Send interesting files down the pipeline and search for strings in these files
Explaining the Regex
'^(?!(\s{0,}?//\s?|\s{0,}?\*\s{1,}?|\s{0,}?\#\s{1,}?)).*(root|admin|passw|database|db|sql|domain\.tld)'
The pattern has two parts:
- `^(?!( ... ))` anchors at the start of the line (`^`) and applies a Perl-style negative lookahead, `(?! ... )`, which rejects the line if it begins with a comment marker:
  - `\s{0,}?//\s?` matches `//` preceded by optional whitespace and optionally followed by one whitespace character
  - `\s{0,}?\*\s{1,}?` matches `*` preceded by optional whitespace and followed by at least one whitespace character
  - `\s{0,}?\#\s{1,}?` matches `#` preceded by optional whitespace and followed by at least one whitespace character
- `.*(root|admin|passw|...)` matches any run of characters (including none) followed by one of the keywords in the alternation group
The lookahead is an attempt to ignore comments in files. The goal is to find any line that is not a comment and that contains a sensitive keyword likely to appear in configuration: password, secret, etc. Note that a `#` or `*` immediately followed by text, with no space in between, is not treated as a comment and can still match.
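To see how the lookahead behaves in practice, the sketch below runs the same pattern against a throwaway file. The file name and its contents are invented for illustration, and it assumes GNU grep built with PCRE support (`-P`).

```shell
# Hypothetical sample file; every line in it is made up for this demo.
demo_dir=$(mktemp -d)
cat > "$demo_dir/app.config.ini" <<'EOF'
# password = commented_with_hash
// admin user is commented out
 * root note inside a block comment
db_host = 10.0.0.5
admin_user = svc_backup
#passw0rd_without_space
EOF

pattern='^(?!(\s{0,}?//\s?|\s{0,}?\*\s{1,}?|\s{0,}?\#\s{1,}?)).*(root|admin|passw|database|db|sql|domain\.tld)'

# -P: Perl-compatible regex, -a: treat binary files as text, -i: ignore case
matches=$(grep -Pai "$pattern" "$demo_dir/app.config.ini")
printf '%s\n' "$matches"
rm -rf "$demo_dir"
```

The first three lines are skipped as comments. The `db_host` and `admin_user` lines are printed, and so is `#passw0rd_without_space`, because the pattern only treats `#` as a comment marker when whitespace follows it.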
Recursively Grep Files
Emails and Hostnames
cd git_loot
TARGET_DOMAIN='domain.tld'
grep -Eair "$TARGET_DOMAIN"
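The recursive search above prints whole matching lines. As a rough sketch, the variant below pulls out only the unique email addresses and hostnames. The sample files are invented, the `escaped_domain` helper variable and the two character classes are assumptions rather than a complete email or hostname grammar, and GNU grep is assumed.

```shell
# Throwaway sample data; file names and contents are invented.
demo_dir=$(mktemp -d)
printf 'contact: ops@domain.tld\nurl=https://vpn.domain.tld/login\n' > "$demo_dir/notes.txt"
printf 'MAILTO=ops@domain.tld\nhost = db01.corp.domain.tld\n' > "$demo_dir/cron.conf"

TARGET_DOMAIN='domain.tld'
# Escape the dots so "." only matches a literal dot in the regex.
escaped_domain=$(printf '%s' "$TARGET_DOMAIN" | sed 's/\./\\./g')

# -o prints only the matched text, -h hides file names; sort -u de-duplicates.
emails=$(grep -Eaiorh "[a-z0-9._%+-]+@([a-z0-9-]+\.)*${escaped_domain}" "$demo_dir" | sort -u)
hosts=$(grep -Eaiorh "([a-z0-9-]+\.)+${escaped_domain}" "$demo_dir" | sort -u)

printf 'emails:\n%s\nhosts:\n%s\n' "$emails" "$hosts"
rm -rf "$demo_dir"
```

In a real run, replace `"$demo_dir"` with `.` after `cd git_loot`. The hostname pattern requires at least one subdomain label, so the bare domain itself is not listed.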
Passwords
cd git_loot
grep -Eair "(secret|passwd|password) ?[=:] ?['\"]?\w{1,}['\"]?" \
--exclude '*.css' --exclude '*.js'
Explaining the Regex
"(secret|passwd|password) ?[=:] ?['\"]?\w{1,}['\"]?"
The pattern has three parts, read left to right:
- `(secret|passwd|password)` Either secret, passwd, or password
- ` ?[=:] ?` The separator:
  - ` ?` Optional space on the left
  - `[=:]` Either a `=` or `:` (a `|` inside the brackets would be matched literally, so it is left out)
  - ` ?` Optional space on the right
- `['\"]?\w{1,}['\"]?` The value:
  - `['\"]?` Optional ' or " on the left
  - `\w{1,}` One or more word characters (letters, digits, or underscore)
  - `['\"]?` Optional ' or " on the right
All together, trying to match on
secret=some_text
passwd="some_text"
password = "some_text"
secret: some_text
password:some_text
etc.
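The sample lines above can be checked directly. The sketch below writes them to a throwaway file, adds one line that should not match, and counts the hits. The file is made up, and the pattern is a slightly tightened variant that drops `|` from the bracket expressions (inside brackets it is a literal pipe) and uses a plain space instead of `\ `. GNU grep is assumed for `\w`.

```shell
# Sample lines mirror the examples above; the file itself is invented.
demo_dir=$(mktemp -d)
cat > "$demo_dir/settings.py" <<'EOF'
secret=some_text
passwd="some_text"
password = "some_text"
secret: some_text
password:some_text
password_hint_label
EOF

# Tightened variant: [=:] and ['"] without a literal "|" inside the brackets.
pattern="(secret|passwd|password) ?[=:] ?['\"]?\w{1,}['\"]?"

# -c counts matching lines; use -o instead to print only the matched fragments.
hits=$(grep -Eaic "$pattern" "$demo_dir/settings.py")
echo "matching lines: $hits"
rm -rf "$demo_dir"
```

The first five lines match. `password_hint_label` does not, because no `=` or `:` follows the keyword, so the count is 5.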
Search the Git Revision History
cd git_loot
git rev-list --all | xargs git -P grep --color -Eair "(secret|passwd|password) ?[=:] ?['\"]?\w{1,}['\"]?" | sort -u
Search for passwords in the git commit history
TARGET_DOMAIN='domain.tld'
git rev-list --all | xargs git -P grep --color -Eair "$TARGET_DOMAIN" | sort -u
Search for emails and hostnames in the git commit history
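Searching the revision history matters because a value removed from the working tree usually still exists in older commits. The sketch below builds a throwaway repository to show this. The repository, the file name, the `commit` helper function, and the `hunter2` value are all invented for the demo, and it assumes a git version that supports `-P` (`--no-pager`).

```shell
# Throwaway repository; names, contents, and the "hunter2" value are invented.
repo=$(mktemp -d)
git -C "$repo" init -q
commit() {
  git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid commit -q -m "$1"
}

# Commit 1 adds a credential; commit 2 replaces it with a harmless comment.
printf 'password = "hunter2"\n' > "$repo/app.conf"
git -C "$repo" add app.conf && commit 'add config'
printf '# credentials moved elsewhere\n' > "$repo/app.conf"
git -C "$repo" add app.conf && commit 'remove credential'

pattern="(secret|passwd|password) ?[=:] ?['\"]?\w{1,}['\"]?"

# The current working tree no longer contains the value...
current=$(grep -Eaic "$pattern" "$repo/app.conf")
# ...but the first commit still does, and git grep can search each revision.
history=$(git -C "$repo" rev-list --all |
  xargs git -C "$repo" -P grep -Eai "$pattern" | sort -u)

printf 'working tree hits: %s\nhistory hits:\n%s\n' "$current" "$history"
rm -rf "$repo"
```

Each history hit is prefixed with the commit hash and file path, which makes it straightforward to check out or `git show` the revision that held the value.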
GitLeaks
- Download latest release: https://github.com/gitleaks/gitleaks/releases
- Download `gitleaks.toml` file: https://raw.githubusercontent.com/gitleaks/gitleaks/refs/heads/master/config/gitleaks.toml
cd git_loot
/path/to/gitleaks -c /path/to/gitleaks.toml -r findings.json dir "$PWD"
Search the current directory, git_loot, using gitleaks
less findings.json
Peruse the findings.json file for any goodies