Rafał Wójcicki

Junior Pentester, EXATEL

Is it possible to remain anonymous online?

October 11, 2021

In the following article I will endeavour to show you unusual techniques for finding information about any person, and how seemingly small pieces of information add up to a larger whole, allowing you to obtain even more data about your target.

I would also like to encourage the readers to do a better job of hiding online and managing their data (OPSEC).

I will start with a brief introduction covering the main definitions, then move on to less common but very effective methods of searching for information. Towards the end of the article, I will provide you with an interesting example of OSINT.

There are as many methods as there are objectives, so I will not describe each and every one of them; even several volumes would not be enough for that. This article was written for information purposes. It calls for OPSEC and can serve as a simple guide for non-technical persons interested in the subject; it may even contain some tidbits for the experienced.

What is OSINT?

OSINT, or open-source intelligence, is, according to many industry experts, the most important step in a well-conducted penetration test. OSINT techniques, often referred to as white intelligence, involve gathering information about, for example, a company, its employees or its facilities from publicly available data. Thorough reconnaissance can often be the only guarantee of success, especially in attacks where a foothold in the system is gained by phishing. An important advantage of white intelligence is its passivity (no contact with the target). A person who obtains information in this way is undetectable, at least in most cases. We can also distinguish grey and black intelligence, which I will cover in this publication, but it is worth noting that grey intelligence often balances on the edge of the law, while black intelligence usually already involves obvious violations of it. OSINT techniques are often specific and personalised (tailored to the target), but several main ways of obtaining information can be identified.

What is OPSEC?

In the introduction, I mentioned OPSEC (Operations Security), i.e. the process of managing risks and creating strategies to reveal as little as possible about oneself in order to protect information, the company, the facility or oneself. OPSEC is the answer to white and grey intelligence. It does not exclude it, but complements it. By knowing OSINT methods you can improve your OPSEC and thus benefit the company you work for as well as yourself.

OSINT and Doxing

White intelligence is closely associated with doxing, or revealing private information about a target. OSINT in this case is mainly used to find information about the victim and then, based on that, attempt various attacks to gain as much data as possible. The information obtained is published on websites such as doxbin.org. The aim of doxing is to harm the victim, and the methods used during the attack often go beyond white intelligence. Nevertheless, the two concepts have much in common.

Preliminary assumptions for OSINT

It is important to remember that here, just like in every science, there are certain axioms which constitute the basis and according to which further assumptions are made. The difficulty of OSINT lies mainly in choosing an appropriate method of looking for information, finding the right axioms and correlations between data. In the following, I will outline various preliminary points (axioms); the information we want to extract will be case-specific. It is also important to remember that when it comes to hacking and information extraction, it is worth repeating your steps but with different tools. Sometimes the results will differ or introduce something new into our search.

Basic methods of obtaining information

#1 Obtaining data – from phone number to account information and personal data.

Many people, especially the elderly, do not realise that some of their data, for example passwords, circulates on the internet because of leaks that can happen to any company, even the biggest giants. Those who are aware of this may use special sites to check whether their data has been leaked (https://haveibeenpwned.com/), but how many people know about websites like DeHashed (https://www.dehashed.com/) or breachdirectory (https://breachdirectory.tk/, which has recently been taken down but may be reactivated), where huge databases of leaked data are aggregated? There seems to be a widespread belief that leak databases are difficult to access, as if one needed great skills, access to the relevant sites on the Tor network and a handful of acquaintances. While this is true for some databases, including ones that even their owners do not know about, it does not apply in general. A huge amount of stolen data is available at your fingertips. The DeHashed website aggregates really extensive content. If you take their word for it, at the time of writing (September 2021) it holds around 14.5 billion records.

The records include logins, passwords, phone numbers, names, home addresses, e-mail addresses and VINs (vehicle identification numbers). Searches become even more effective with regular expressions, which I will present in the following example.

Scenario:

Bob knows our phone number and writes harassing text messages to us without any restraints. Our goal is to determine Bob’s identity.

Attack:

Using DeHashed we can find a lot of information about him, including his name, IP address (which has probably been changed a long time ago), login data and email, which is quite a substantial amount of information. Here is an example of data from one of the leaks:

Now we know who is behind these text messages.

#2 When everything else fails, Gmail will help

Scenario:

We get lots of messages from a mysterious admirer, and we’d like to find out who they really are. We only have their e-mail address in the gmail.com domain, and we cannot find any other information about it.

Attack:

No problem at all. Just use the GHunt tool (https://github.com/mxrch/GHunt), which pulls interesting information from a Google profile, such as: name (or whatever the owner put in those fields), avatar, reviews, calendar, Google images, YouTube channel, location and Google Hangouts. Some of this information is given only with a certain level of probability. Using this tool, we were able to learn the person’s likely name and location based on reviews of various restaurants. Additionally, by using the “I forgot my password” option and flipping between the different recovery methods, we learned the phone model used by the admirer.

#3 There’s almost nothing left, but Google dorks are here to help

Google dorking involves obtaining information using the popular Google search engine, which supports various operators that make your search more precise. A list of them can be found at https://gist.github.com/sundowndev/283efaddbcf896ab405488330d1bbc06

Scenario:

Our target sent us an e-mail that looks like spear phishing (targeting a specific victim, not a multitude of targets as in phishing). We want to find out who this person is or at least gain more information about them.

Attack:

We will make use of Google dorking, and more specifically the “site:” filter, which restricts the search to a particular site and lists all matching results from it. Two sites are perfect for this purpose: https://pastebin.com/ and https://doxbin.org/. The first is a popular site for pasting mainly code snippets, but data from small leaks is very often published there as well; doxbin I presented earlier (you will probably not find anything there, but it’s always worth a try). In a Google search, type site:pastebin.com example@email.com or site:doxbin.org example@email.com. In this case, we were able to find an unusual password that, when searched in DeHashed, pointed to a specific person. This does not end the investigation yet: you must now demonstrate that the newly found information really belongs to the owner of the e-mail address.
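The queries above can also be generated for several sites at once. The following is only a minimal sketch: the helper names are hypothetical, the address is wrapped in quotation marks to request an exact match (my choice, not part of the original queries), and the search URL simply uses Google's ordinary `q` parameter.

```python
from urllib.parse import quote_plus

# Sites named in the text; extend the list as needed.
SITES = ["pastebin.com", "doxbin.org"]


def build_dork(site: str, term: str) -> str:
    """Return a query limited to one site, with the term quoted for an exact match."""
    return f'site:{site} "{term}"'


def search_url(query: str) -> str:
    """URL-encode the query so it can be opened directly in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


def dorks_for(term: str) -> list[str]:
    """Build one dork per site for the same search term."""
    return [build_dork(site, term) for site in SITES]


if __name__ == "__main__":
    for dork in dorks_for("example@email.com"):
        print(dork)
        print(search_url(dork))
```

The same helper works for any other string worth searching for, such as a username or an unusual password.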

#4 Images with metadata

Scenario:

We want to know the name of the person who sent us a mysterious photo via text message.

Attack:

Every file has some metadata in it that serves different purposes. In the case of photos, it’s usually camera parameters and (increasingly rarely) location. Unfortunately, this technique cannot be used often, because many websites scrub this data from uploaded files. This time, however, it’s different: we received the picture in a text message. Reading metadata is easy. You can use online tools such as http://exif-viewer.com/ or desktop tools such as exiftool.

(Be wary of older versions of exiftool, which are affected by CVE-2021-22204: https://blogs.blackberry.com/en/2021/06/from-fix-to-exploit-arbitrary-code-execution-for-cve-2021-22204-in-exiftool)

After reading the EXIF data, we can easily track down the sender, as long as the metadata contains a location. The rest is quite simple. Sometimes you can also find out the model of the phone used to take the picture, which narrows down your search when, for example, you are dealing with several e-mail addresses registered under gmail.com.

The above screenshot shows some of the information that we were able to pull out. Here I used the gnome-screenshot tool.
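EXIF stores GPS coordinates as degrees, minutes and seconds together with a reference letter (N/S for latitude, E/W for longitude), while most maps expect decimal degrees. The conversion can be sketched as follows; the function name and the sample coordinates are only illustrative assumptions, not values taken from the photo discussed here.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert EXIF-style degrees/minutes/seconds to signed decimal degrees.

    ref is the EXIF reference letter: "N" or "S" for latitude,
    "E" or "W" for longitude. South and west become negative values.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref.upper() in ("S", "W") else value


if __name__ == "__main__":
    # Hypothetical values as a metadata viewer might display them.
    lat = dms_to_decimal(52, 13, 47.0, "N")
    lon = dms_to_decimal(21, 0, 44.0, "E")
    print(f"{lat:.6f}, {lon:.6f}")
```

The resulting pair of numbers can be pasted directly into any map service.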

#5 Images without metadata

Scenario:

This time we got a photo without metadata (e.g. from Facebook). Once again, we want to track down the sender.

Attack:

If the photo depicts a place or a person, try using reverse image search. Most of you have probably used Google Images, which is a really great tool, but unfortunately only for identical images; searching for similar images with it is inefficient. There are much more effective search engines, such as Yandex, that can do this really well. I could end with this, but it certainly wouldn’t be satisfying. There is another site that may be even better than Yandex, namely https://pimeyes.com/. See the example below:

As expected, in this case it was quite easy. Unfortunately, I can’t use a private person’s image for an example, so I encourage you to upload your own photo.

Sometimes the results are downright surprising so I really recommend checking them out. Using that tool I was able to find this person’s photos in the local newspaper. The rest is just a matter of time.

#6 Stroke of luck

Scenario:

Once again we received a picture – but not just any picture. There is an airplane and a patch of wet grass in the background, but, unfortunately, no Bob. Bob, of course, is in Poland.

Attack:

This case is a bit of a stretch, because it’s rare to get a photo with a plane in the background, let alone one that also shows what the weather was like. Additionally, it is not always possible to verify where the victim is. I mention it as an interesting scenario that might happen someday. We have just received this photo and the context suggests that it was taken no more than a few minutes earlier. Using https://www.flightradar24.com/ we can track down potential locations where the aircraft, and therefore the person we are looking for, may be. See the example below:

It turned out that at that time it was raining in only two voivodeships, and just then several planes were flying over them, one of them over a city. Depending on the situation, a voivodeship or even a city could perhaps be determined. What matters is that we have obtained some information and have already narrowed down the area to “only” two provinces. In open-source intelligence, any piece of information can be valuable. Let’s not forget that!
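The reasoning used here is simply an intersection of two independent observations: where it was raining and where planes were flying at the time. A minimal sketch, with entirely hypothetical region names standing in for data read manually from a weather map and from a flight tracker:

```python
def narrow_down(rain_regions, flight_regions):
    """Return the regions that satisfy both observations, sorted by name."""
    return sorted(set(rain_regions) & set(flight_regions))


if __name__ == "__main__":
    # Hypothetical observations, not real data.
    raining = ["lubelskie", "mazowieckie"]
    planes_overhead = ["mazowieckie", "pomorskie", "lubelskie", "slaskie"]
    print(narrow_down(raining, planes_overhead))
```

Every additional observation (time of day, visible landmarks, language on signs) can be added as another set and intersected in the same way.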

#7 When everything else fails

This case has no scenario. I want to mention one of the most effective open-source intelligence resources, namely https://osintframework.com/. It links to a ton of pages and tools for gathering information. Additionally, the whole collection is divided into sections depending on what we already know about the victim. Try it out, because OSINT Framework is an invaluable help.

#8 Too much information

Scenario:

One person persistently writes to us from a single e-mail address. We easily find a lot of information about this person (even accounts on specific domains!), but we have too much data and too many connections.

Attack:

This is where Maltego comes into play. It is a really great tool that is even more effective when combined with the websites mentioned earlier. Maltego sifts through huge amounts of information collected from various social networks and aggregates all this content, displaying it as a mind map, which makes it easier to interpret. We easily determined which information was relevant to the victim. We also got a handful of new data, such as the domain owner, potential phone numbers and some usernames. Personally, I don’t use it as one of my first tools, because it does a much better job looking for information about a website than about a specific user.

#9 Be like Sherlock

Scenario:

Alice inadvertently revealed her username to us.

Attack:

Of course, you can use the DeHashed website, but there is no guarantee that you will get any results. After all, all of its data comes from leaks. What about a case where the username has not leaked? This seems to be a problem, but we can search popular portals for the username instead. For this purpose we can use the tool https://github.com/sherlock-project/sherlock, which searches many popular websites for accounts with the specified username. Sometimes it returns a few false positives (results that appear to be true, but turn out to be false), but either way it is really impressive. An alternative, as well as a complementary tool, is https://whatsmyname.app/. It is a good idea to use both tools, as the results can often exceed expectations.
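The general idea behind such tools can be illustrated with a short sketch: substitute the username into known profile URL patterns and check which pages respond. This is not Sherlock's actual implementation; the three sites and the helper names are my own assumptions, and a status code alone is an unreliable signal, which is exactly where false positives come from.

```python
import urllib.error
import urllib.request

# A tiny, hand-picked set of profile URL patterns (assumption: real tools
# maintain hundreds of these together with per-site detection rules).
SITE_TEMPLATES = {
    "GitHub": "https://github.com/{}",
    "Reddit": "https://www.reddit.com/user/{}",
    "Twitter": "https://twitter.com/{}",
}


def candidate_profiles(username: str) -> dict[str, str]:
    """Map each site name to the profile URL the username would have there."""
    return {site: url.format(username) for site, url in SITE_TEMPLATES.items()}


def responds(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with HTTP 200.

    Some sites return 200 even for missing profiles, and some block
    automated requests, so treat the result as a hint only.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, TimeoutError):
        return False


if __name__ == "__main__":
    # Only the URLs are printed here; call responds(url) to probe them.
    for site, url in candidate_profiles("alice_example").items():
        print(site, url)
```

Any hit still has to be verified manually, for example by comparing avatars, bios or linked accounts.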

#10 Regex is really powerful

Scenario:

John impersonates us on Facebook. He created a profile with the same name and picture. Worst of all, he has more friends and received more likes than us!

Attack:

At first glance, it seems that we will not be able to figure anything out. The photo is ours, and Facebook strips EXIF data. There is, however, one more source of information that we can use (it works on many portals, by the way): account recovery. Most of the time the account owner does not notice it (Instagram is the exception to the rule!). We are usually shown very little information before we recover the account. In the case of Facebook and Gmail, these are the last two digits of the phone number. Additionally, Facebook reveals the first and last character of the e-mail name and its length. You will have to guess the domain yourself, but that is not a problem: most popular domains have different lengths, so it is not particularly hard to determine which mailbox you are dealing with. Click on the “Forgot Password” button and enter his Facebook username, which appears in the URL bar at the top of the browser. If it is an FBID (a string of numbers replacing the name), you can try using the option “I Can’t Access My Account”. Unfortunately, I do not know whether the latter method goes unnoticed. Fortunately, most people have a Facebook username set up.

From this we can deduce much more information. Let us suppose John has an e-mail address:
j****7@*****.com

We can enter the phrase provided into DeHashed.

email:j????7&domain:gmail&number:???????54.

DeHashed will search for all records matching the given patterns, narrowing the search down to no more than a few results. Then we can use GHunt to try to find some more information.
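The same kind of matching can be done locally against a list of candidate addresses collected from other sources. The sketch below turns a recovery hint such as j****7@*****.com into a regular expression; the function names and the sample addresses are hypothetical, and this is not DeHashed's own query syntax.

```python
import re


def mask_to_regex(mask: str) -> re.Pattern:
    """Turn a masked address into a regex: '*' is exactly one unknown character."""
    parts = ["[^@]" if ch == "*" else re.escape(ch) for ch in mask]
    return re.compile("".join(parts), re.IGNORECASE)


def matching(mask: str, candidates: list[str]) -> list[str]:
    """Keep only the candidates that fit the mask over their whole length."""
    pattern = mask_to_regex(mask)
    return [c for c in candidates if pattern.fullmatch(c)]


if __name__ == "__main__":
    # Hypothetical candidate addresses.
    candidates = [
        "john17@gmail.com",
        "jsmith7@gmail.com",   # local part is one character too long
        "jane_7@gmail.com",
        "john17@outlook.com",  # domain is too long for the mask
    ]
    print(matching("j****7@*****.com", candidates))
```

Because the mask fixes the length of both the local part and the domain, even a short hint discards most candidates.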

#10 WiFi name only

Scenario:

Someone sent us a screenshot of our photos that they should not have access to. The status bar of their phone shows the WiFi name.

Attack:

It turns out that even seemingly insignificant information can reveal much more about us than we could ever imagine. One such piece of information is the name of our WiFi network. On the website https://wigle.net you can look up various WiFi parameters and the network’s location. Of course, many WiFi networks do not appear in the database, but this time we were lucky and found the culprit.

#11 Allegro is not just about shopping

Scenario:

We know that our target has a Twitter account with a specific username. We would like to learn a little more about him.

Attack:

Thanks to the username, with the help of the Maltego tool, we were able to find a matching e-mail address. Unfortunately, all other methods have failed, and this is far too little information. We know that the victim is Polish, so he most likely has an Allegro account. We click on the “Forgot password” button, enter the e-mail address and we’re done. Allegro reveals as many as 6 digits of the phone number. Now, using DeHashed and regex, we can search for potential phone numbers and then, after excluding some of them, determine the identity of our target. I have covered this in more detail on my blog: https://wandenreich.weebly.com/blog/ustalanie-numeru-telefonu-oraz-innych-danych-z-wykorzystaniem-allegro-oraz-baz-danych.
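To see why six revealed digits matter so much: a nine-digit number with three unknown digits leaves only 10^3 = 1000 possibilities, compared with 10^7 when only the last two digits are known. The sketch below enumerates the possibilities for a mask and intersects them with numbers found elsewhere. The mask layout, the numbers and the function names are hypothetical; which positions are actually revealed depends on the service.

```python
from itertools import product


def expand_mask(mask: str):
    """Yield every number that fits the mask; '*' marks one unknown digit."""
    unknown = mask.count("*")
    for digits in product("0123456789", repeat=unknown):
        fill = iter(digits)
        yield "".join(next(fill) if ch == "*" else ch for ch in mask)


def search_space(mask: str) -> int:
    """Number of candidates left: 10 to the power of the unknown digits."""
    return 10 ** mask.count("*")


if __name__ == "__main__":
    mask = "***456789"  # hypothetical: six known digits, three unknown
    print(search_space(mask))          # size of the remaining search space
    print(search_space("*******54"))   # only the last two digits known

    # Hypothetical numbers gathered from leaks or other sources.
    seen_elsewhere = {"500456789", "600111222", "123456780"}
    print(sorted(set(expand_mask(mask)) & seen_elsewhere))
```

With a space this small, every remaining candidate can be checked against other data and excluded one by one.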

#12 No information available

Scenario:

We only know the full name of the target. All OSINT methods have failed.

Attack:

In that case, the only option left is e-mail permutation, which means creating potential e-mails based on some standard such as firstname.lastname@domain. One of the tools we can use is:

https://github.com/jacobgoh101/email-permutator

With a bit of luck, we will find the right person. Before we start searching, however, we still need a list of domains; only by combining these two elements can we check which e-mail addresses exist and which do not. When using this method, we must be careful not to go too far. Some e-mail addresses can be checked without the owner noticing (Gmail), while some, unfortunately, cannot.

If we have the e-mail address we can do much more. In this case, we found the location of the person assigned to a particular e-mail address.
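The permutation step itself is easy to sketch. The patterns below are only a handful of common conventions that I have assumed for illustration; they are not the list used by the tool linked above, and names with diacritics would first need to be transliterated.

```python
def local_parts(first: str, last: str) -> list[str]:
    """Generate common local-part conventions for a first and last name."""
    f, l = first.lower(), last.lower()
    patterns = [
        f"{f}.{l}",     # firstname.lastname
        f"{f}{l}",      # firstnamelastname
        f"{f[0]}{l}",   # flastname
        f"{f[0]}.{l}",  # f.lastname
        f"{l}.{f}",     # lastname.firstname
        f"{l}{f}",      # lastnamefirstname
        f"{f}_{l}",     # firstname_lastname
        f"{f}{l[0]}",   # firstnamel
    ]
    # dict.fromkeys removes duplicates while keeping the original order.
    return list(dict.fromkeys(patterns))


def candidate_emails(first: str, last: str, domains: list[str]) -> list[str]:
    """Combine every local part with every domain."""
    return [f"{lp}@{d}" for d in domains for lp in local_parts(first, last)]


if __name__ == "__main__":
    # Hypothetical name and domain list.
    for address in candidate_emails("Jan", "Kowalski", ["gmail.com", "wp.pl"]):
        print(address)
```

The output is only a list of guesses; each address still has to be verified, keeping in mind that some verification methods alert the owner.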

Advanced information retrieval methods

#4chan vs Trump’s opponent

I decided to include the following content in order to demonstrate the fact that OSINT is a really broad subject and the number of ways to access data is huge, so I will start with a brief introduction.

One of Donald Trump’s opponents often displayed the phrase “He will not divide us” in various places. Sometimes it was on the walls of buildings or at famous locations, but each time the police intervened. Therefore, he decided to hide a flag with this inscription in a place where no one would find it. He then started a 24-hour broadcast of it; one would think that no one could find such a small flag, but nothing could be further from the truth. One user of the notorious 4chan determined a reasonably accurate location, within a radius of several kilometres, based on the planes that flew over the flag at different times of the day. Another user who lived in the area decided to drive around it, honking his horn the whole time. This way, it was possible to determine on which side the flag was most likely placed. After this it was slightly easier, and as dusk approached one user determined a very accurate location by analysing the position of the stars. When the location was discovered, the flag was of course swapped. The moral of the story is that unless there is a need, you should not reveal any information, no matter how insignificant it may seem.

Summary

I hope that with this post I have encouraged many people to practise OPSEC and to study OSINT methods further. This is a really interesting topic that comes in handy quite often in everyday life. In my next post, I will look at advanced methods and fascinating cases of white intelligence. Thank you all for your time!

Published by: Katarzyna Chojecka