Mining Data from Git Repos

Interesting Files

The regex patterns found on this page are just some examples you could use to extract data from git repositories. Get creative and build upon these with your own grep regex patterns.

cd git_loot
find . -type f \( -name '*config*' -o -name '*setting*' \) | 
grep -iE '\.ya?ml$|\.ini$|\.php$|\.json$|\.conf$|\.config$|\.txt$'

Find configuration or settings files ending with specific extensions

find . -type f \( -name '*config*' -o -name '*setting*' \) |
grep -iE '\.ya?ml$|\.ini$|\.php$|\.json$|\.conf$|\.config$|\.txt$' |
xargs -I {} grep --color -PHair '^(?!(\s{0,}?//\s?|\s{0,}?\*\s{1,}?|\s{0,}?\#\s{1,}?)).*(root|admin|passw|database|db|sql|domain\.tld)' {}

Send interesting files down the pipeline and search for strings in these files
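
The `xargs -I {}` form above can trip over file names that contain quotes or backslashes. A minimal sketch of a NUL-delimited variant follows, assuming GNU find, grep (built with `-P` support), and xargs. The `demo_dir` tree and its files are hypothetical stand-ins so the sketch runs on its own; in practice you would already be inside git_loot.

```shell
# Hypothetical demo tree standing in for git_loot.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/app"
printf '%s\n' '# admin note, commented out' 'db_password: hunter2' > "$demo_dir/app/my config.yml"
printf '%s\n' 'color: blue' > "$demo_dir/app/theme.yml"

cd "$demo_dir"
# -print0 / grep -z / xargs -0 pass file names NUL-terminated end to end,
# so spaces, quotes, and backslashes in names are left alone.
results="$(
  find . -type f \( -name '*config*' -o -name '*setting*' \) -print0 |
  grep -ziE '\.ya?ml$|\.ini$|\.php$|\.json$|\.conf$|\.config$|\.txt$' |
  xargs -0 -r grep -PHai '^(?!(\s{0,}?//\s?|\s{0,}?\*\s{1,}?|\s{0,}?\#\s{1,}?)).*(root|admin|passw|database|db|sql|domain\.tld)'
)"
printf '%s\n' "$results"
```

Only the uncommented `db_password` line from the file with a space in its name should be reported; `theme.yml` is never searched because its name matches neither `*config*` nor `*setting*`.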

Explaining the Regex
'^(?!(\s{0,}?//\s?|\s{0,}?\*\s{1,}?|\s{0,}?\#\s{1,}?)).*(root|admin|passw|database|db|sql|domain\.tld)'
 ^                                                   ^^                                              ^
 '---------------------------------------------------''----------------------------------------------'
                   |                                                          |
                   |                                                          '---- .* Zero or more of any character
                   |                                                                (root|admin|passw|...) Alternation: matches any one of these keywords
                   |
                   |
                   '------- ^ Anchors the match at the start of the line
                            (?!(\s{0,}? ...) Perl-style negative lookahead: reject the line if it starts with any of
                                             `//`, optionally preceded by whitespace and optionally followed by one whitespace character
                                             `*`, optionally preceded by whitespace and followed by at least one whitespace character
                                             `#`, optionally preceded by whitespace and followed by at least one whitespace character
                                             Effectively skips commented-out lines
                                             A `*` or `#` with no whitespace after it (e.g. `#password=...`) is not skipped

                    The goal is to find any line that is not a comment
                    and that contains a sensitive keyword, such as a configuration
                    password or secret.
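
To see the lookahead in action, here is a small sketch that pipes a handful of made-up lines through the same pattern. The sample lines are hypothetical, and GNU grep with `-P` support is assumed.

```shell
pattern='^(?!(\s{0,}?//\s?|\s{0,}?\*\s{1,}?|\s{0,}?\#\s{1,}?)).*(root|admin|passw|database|db|sql|domain\.tld)'

# Hypothetical sample lines: three comments, two keyword hits, one miss.
matches="$(
  printf '%s\n' \
    'db_host = 10.0.0.5' \
    '// admin password lives elsewhere' \
    '   # root login disabled' \
    ' * database notes in a block comment' \
    '#password=kept_because_no_space_follows_the_hash' \
    'listen_port = 8080' |
  grep -Pai "$pattern"
)"
printf '%s\n' "$matches"
```

The `//`, `# ` and ` * ` lines are dropped by the lookahead, and `listen_port` has no keyword. The `#password=...` line still comes through because the pattern only treats `#` as a comment marker when whitespace follows it.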

Recursively Grep Files

Emails and Hostnames

cd git_loot
TARGET_DOMAIN='domain.tld'
grep -Eair "$TARGET_DOMAIN"
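
If you only want the addresses and hostnames rather than whole lines, something along these lines may help. It is a rough sketch, not a complete email or hostname grammar: the character classes are a simplifying assumption, `escaped_domain` and `demo_dir` are names made up for the example, and GNU grep is assumed. It also escapes the dot in the domain, since an unescaped `.` in a regex matches any character.

```shell
TARGET_DOMAIN='domain.tld'
# Escape dots so "domain.tld" does not also match e.g. "domainxtld".
escaped_domain="$(printf '%s' "$TARGET_DOMAIN" | sed 's/\./\\./g')"

# Hypothetical sample file standing in for git_loot content.
demo_dir="$(mktemp -d)"
printf '%s\n' \
  'contact: alice@domain.tld' \
  'api = https://vpn.domain.tld/login' \
  'unrelated: bob@domainxtld.example' > "$demo_dir/notes.txt"

# -o prints only the matched text, -h drops the file name prefix.
found="$(
  grep -Eaiorh "[a-z0-9._%+-]+@${escaped_domain}|([a-z0-9-]+\.)*${escaped_domain}" "$demo_dir" |
  sort -u
)"
printf '%s\n' "$found"
```

The result is a de-duplicated list you can feed straight into other tooling.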

Passwords

cd git_loot
grep -Eair "(secret|passwd|password) ?[=:] ?['\"]?\w{1,}['\"]?" \
--exclude '*.css' --exclude '*.js'

Explaining the Regex
"(secret|passwd|password) ?[=:] ?['\"]?\w{1,}['\"]?"
 ^                      ^^      ^^                ^
 '----------------------''------''----------------'
             |               |            |
             |               |            '--- ['\"]? Optional ' or " on the left
             |               |                 \w{1,} One or more word characters (letters, digits, or underscore)
             |               |                 ['\"]? Optional ' or " on the right
             |               |
             |               '--- ` ?` Optional space on the left
             |                    [=:] Either a `=` or `:`
             |                    ` ?` Optional space on the right
             |
             '------- Either secret, passwd, or password

                      All together, trying to match on
                      secret=some_text
                      passwd="some_text"
                      password = "some_text"
                      secret: some_text
                      password:some_text
                      etc.
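
A quick way to sanity-check a pattern like this is to feed it lines you expect to hit and miss. The sketch below uses hypothetical sample lines and assumes GNU grep (for `\w` under `-E`). Note that a `|` inside a bracket expression is a literal pipe character rather than an OR, so the sketch writes the classes as `[=:]` and `['\"]`, and it writes the optional space as a plain ` ?`.

```shell
pattern="(secret|passwd|password) ?[=:] ?['\"]?\w{1,}['\"]?"

# Hypothetical sample lines: the first four should hit, the last two should not.
hits="$(
  printf '%s\n' \
    'secret=some_text' \
    'passwd="some_text"' \
    "password = 'some_text'" \
    'secret: some_text' \
    'password | not_an_assignment' \
    'username = alice' |
  grep -Eai "$pattern"
)"
printf '%s\n' "$hits"
```

With `[=|:]` instead of `[=:]`, the `password | not_an_assignment` line would also be reported, which is usually just noise.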


Search the Git Revision History

cd git_loot
git rev-list --all | xargs git -P grep --color -Eair "(secret|passwd|password) ?[=:] ?['\"]?\w{1,}['\"]?" | sort -u

Search for passwords in the git commit history

TARGET_DOMAIN='domain.tld'
git rev-list --all | xargs git -P grep --color -Eair "$TARGET_DOMAIN" | sort -u

Search for emails and hostnames in the git commit history
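
Why bother with the history at all? The sketch below builds a throwaway repository (everything in it is hypothetical), commits a hardcoded password, then replaces it in a later commit. A plain recursive grep of the working tree comes up empty, while the `git rev-list --all` approach still surfaces the old value. It assumes git is installed and that the platform regex library accepts `\w`.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q .
# Identity is passed per command so the sketch leaves global git config alone.
commit() { git -c user.name='demo' -c user.email='demo@example.invalid' commit -q -m "$1"; }

printf '%s\n' 'password = "hunter2"' > app.conf
git add app.conf
commit 'add config'

printf '%s\n' 'password_file = /run/secrets/app' > app.conf
git add app.conf
commit 'stop hardcoding the password'

pattern="(secret|passwd|password) ?[=:] ?['\"]?\w{1,}['\"]?"
# The current checkout no longer contains the secret...
in_tree="$(grep -Eair "$pattern" --exclude-dir=.git . || true)"
# ...but the first commit still does.
in_history="$(git rev-list --all | xargs git -P grep -Eai "$pattern" | sort -u)"

printf 'working tree: %s\n' "${in_tree:-<nothing>}"
printf 'history:      %s\n' "$in_history"
```

Each history hit is prefixed with the commit hash it was found in, so you can `git show` that commit for more context.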

GitLeaks

  1. Download the latest release: https://github.com/gitleaks/gitleaks/releases
  2. Download the gitleaks.toml file: https://raw.githubusercontent.com/gitleaks/gitleaks/refs/heads/master/config/gitleaks.toml

cd git_loot
/path/to/gitleaks -c /path/to/gitleaks.toml -r findings.json dir "$PWD"

Search the current directory, git_loot, using gitleaks

less findings.json

Peruse the findings.json file for any goodies
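
For a large report, a quick tally by rule can tell you where to look first. The sketch below is only that: it assumes each finding in the report carries a `"RuleID"` string field, and it runs against a hypothetical stand-in file (`report`) rather than real gitleaks output, so check the field names against your own findings.json before relying on it.

```shell
# Hypothetical stand-in for findings.json; field names are an assumption.
report="$(mktemp)"
cat > "$report" <<'EOF'
[
 {"RuleID": "generic-api-key", "File": "app/config.yml", "StartLine": 12},
 {"RuleID": "private-key", "File": "deploy/id_rsa", "StartLine": 1},
 {"RuleID": "generic-api-key", "File": "app/settings.json", "StartLine": 4}
]
EOF

# Pull out every RuleID value, then count occurrences, most frequent first.
summary="$(
  grep -oE '"RuleID": *"[^"]*"' "$report" |
  sed -E 's/.*: *"([^"]*)"/\1/' |
  sort | uniq -c | sort -rn
)"
printf '%s\n' "$summary"
```

To use it on a real report, point `report` at your findings.json instead of the temporary file.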