It’s hard not to have a GitHub repository if you’re doing any kind of development.
Along with the code, a person usually leaves an email address in the repository: most often personal, sometimes work, sometimes both.
What can we do when all we have is an email address?
Code search on GitHub itself has several limitations and quirks. Still, it gives a good chance of stumbling upon a match, since the author’s contact often appears in the code.
Searching commits by email is, unfortunately, not possible. There is, however, a search over public profile mailboxes using a modifier of the form test in:email. It is useless for the most part, because a mailbox shown in a profile is very likely to be indexed by Google already.
Fuzzy search is frustrating. For example, in the screenshot above, GitHub for some reason searched not for the string test@example.com, but for three substrings: test, example and com.
The grep.app site solves the problem with fuzzy search. It searches 500K popular repositories and supports regular expressions and, of course, exact matches.
As can be seen from the screenshot, the results are sparse, and the site cannot search commits either. But it is well suited to complex searches, so it is very useful when you only know part of the email.
For example, take a search in a corporate mailbox with the following format: [email protected], where the name “Vlad” and the end of the surname, “ov”, are known.
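A partial search like this comes down to a regular expression. The sketch below is only an illustration: the format name.surname@example.com, the domain, and the sample addresses are made-up assumptions, since the real corporate format is not shown here. It is written with Python's re module; the pattern uses only basic constructs, but it may still need adjusting for the regex dialect of the search site.

```python
import re

# Assumed (hypothetical) format: <name>.<surname>@example.com
# Known parts: the name starts with "vlad" (it may be a longer form),
# and the surname ends with "ov".
PATTERN = re.compile(r"\bvlad[a-z]*\.[a-z]+ov@example\.com\b", re.IGNORECASE)

def find_candidates(text):
    """Return every address in text that fits the partial pattern."""
    return PATTERN.findall(text)

# Made-up sample data, for illustration only.
sample = (
    "Contacts: vlad.petrov@example.com, "
    "vladimir.ivanov@example.com, olga.popova@example.com"
)
print(find_candidates(sample))
```

Only the first two sample addresses fit: the third has neither the known name nor a surname ending in “ov”.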
The biggest disadvantage is, of course, the small amount of data. We are not sure that the site will ever be able to search all repositories in real time, because GitHub simply does not provide an API for that.
I still consider the dataset of open GitHub repositories published in Google BigQuery to be the most promising tool. It holds 2.8 million repositories, more than 2 billion files and 145 million commits, with a total size of about 3 terabytes.
All of this is available through an SQL query interface, which lets you search by email, name, or pieces of code.
The dataset is a bit depersonalized. This means that in the commit metadata we will not find the original mailbox: the name of the box (the part before the @) is replaced with its SHA1 hash, while the domain stays as it is. This will not stop us, since we must already have the name of the box. It is therefore enough to compute the SHA1 of the name part of the email, and the search expression is ready.
And yes, the commit reference is always stored alongside, so the email can be checked manually if the repository is still available on GitHub.
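The hashing step can be sketched in a few lines of Python. This is a minimal sketch that assumes the dataset hashes the name part exactly as written (same case, no extra whitespace); the function name hashed_local_part and the example.com domain are ours, used only for illustration.

```python
import hashlib

def hashed_local_part(email_or_name):
    """SHA1 hex digest of the part before '@'.

    If there is no '@', the whole string is treated as the mailbox name.
    """
    local = email_or_name.split("@", 1)[0]
    return hashlib.sha1(local.encode("utf-8")).hexdigest()

# Both calls print the same 40-character digest, because only the
# part before '@' is hashed.
print(hashed_local_part("soxoj"))
print(hashed_local_part("soxoj@example.com"))
```

The resulting digest is what goes at the beginning of the email filter in the search query.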
1. Suppose we are sure that the person’s nickname is also used as the mailbox name. We take it as the search criterion (example: soxoj).
2. We compute the SHA1, online or via the console: echo -n "soxoj" | sha1sum
3. We open the query interface. We will be asked to create a project; if you don’t have one yet, create it.
4. Enter the following query:
select repo_name, commit, author.name, author.email, committer.name, committer.email
from `bigquery-public-data.github_repos.commits`
where author.email like '4b9e910872a66d9b7d7e137ad70e3abfaad7eda7%'
   or committer.email like '4b9e910872a66d9b7d7e137ad70e3abfaad7eda7%'
What does this query do?
We request the name of the repository, the hash of the commit, and the name and email of both the author and the committer. We search the table with commits and filter by email, specifying its beginning: the hashed name.
5. Click “Run”, wait, and get a table of repositories, commits and “impersonal” email addresses.
The email is now evident (we can see the domain), and the rest can be viewed directly in the repository. The link is formed from repo_name and commit: https://github.com/<repo_name>/commit/<commit>
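For repeated searches, the query text and the commit links can be assembled in code. The sketch below is not an official client: build_query and commit_urls are hypothetical helper names, the repository and commit values in the usage lines are placeholders, and we assume that a row may list several repository names for one commit, so the link builder accepts either a single name or a list.

```python
import re

# Table name taken from the query above; everything else is illustrative.
TABLE = "bigquery-public-data.github_repos.commits"

def build_query(digest):
    """Return the search query for a SHA1 hex digest of the mailbox name."""
    # Accept only a plain hex digest so nothing else leaks into the SQL text.
    if not re.fullmatch(r"[0-9a-f]{40}", digest):
        raise ValueError("expected a lowercase 40-character SHA1 hex digest")
    return (
        "select repo_name, commit, author.name, author.email, "
        "committer.name, committer.email "
        f"from `{TABLE}` "
        f"where author.email like '{digest}%' "
        f"or committer.email like '{digest}%'"
    )

def commit_urls(repo_names, commit):
    """Build one GitHub commit link per repository name."""
    if isinstance(repo_names, str):
        repo_names = [repo_names]
    return [f"https://github.com/{name}/commit/{commit}" for name in repo_names]

digest = "4b9e910872a66d9b7d7e137ad70e3abfaad7eda7"  # hash from the query above
print(build_query(digest))
print(commit_urls(["some-user/some-repo"], "0123abc"))  # placeholder values
```

The printed query can be pasted into the query interface as is, and the links are worth opening only while the repository is still available on GitHub.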
Keep quotas in mind. That is why we were asked to create a project: with mass requests to the API, you would have to allocate some money at Google. For example, I quickly ran out of the 1 TB quota because of regional restrictions; it is calculated for each project.
In the same way, you can build searches for more complex situations, depending on your case.
Of course, there are no universal tools that will solve all your search problems. It always makes sense to take a fresh look at familiar things and try to use them in unconventional ways.
Posted by: @ESPYER.