Effective Subdomain Crawling using Python

Azhar Ghafoor
Aug 24, 2022 · 3 min read

Crawling to discover subdomains of any domain

If you want to test a website’s security and hack into it, you must be able to examine the whole web application, along with all of the features and technologies it exposes, in order to find a flaw that lets you into the system. A vulnerability in something as simple as a search field or a forgotten page can be enough to compromise the whole system. It is therefore essential to gather as much information about the website as possible: its directories, files, subdomains, and so on.

What exactly do we mean by a subdomain?

Subdomains are subsets of the main domain. For instance, “Google.com” is the main domain, and a variety of subdomains give access to other Google services, such as Google Drive, Gmail, Google Meet, Google Maps, etc. We can visit “mail.google.com” to access Google’s email web application (Gmail), “plus.google.com” to access Google’s social network, and so on.

So, if you want to hack into any of these websites, you must first test all of their subdomains, which means you first have to locate them. This is essential because subdomains are often excellent places to find vulnerabilities: they are usually not designed and hardened as carefully as the main website.

Domain Crawler

So, now we will write our own tool to discover the subdomains of a target website. Testing them manually is a very time-consuming task, so we need an automated way of doing this tiresome work for us. We will use the Python programming language to craft a “subdomain crawler” that tries different combinations from a list of subdomain names. The whole process is shown in the figure below.

Flow of Instructions

To determine whether a subdomain exists, we need a way of communicating with the website, much like entering the subdomain’s URL into a browser and checking whether it responds. Since we want to accomplish this with a Python script, we need a way to make web requests automatically. This program will use the “requests” library, which can be installed with the command pip install requests. At runtime, distinct subdomains will be built from different words in a word list, and their responses will be fetched with the requests library.
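As a quick, illustrative sketch (the URL here is only an example, not taken from the script), a single such check with the requests library looks roughly like this:

import requests

# Example only: check whether one subdomain URL answers with an HTTP response
test_url = "http://mail.google.com"

try:
    response = requests.get(test_url, timeout=5)
    print(test_url, "->", response.status_code)
except requests.exceptions.ConnectionError:
    print(test_url, "-> no response, the subdomain probably does not exist")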

Building each candidate URL looks like this:

word = "apt"
domain = "google.com"
link_formation = word + "." + domain
final_link = "http://" + link_formation

So, the final link that is passed to the requests library to get the response looks like this: “http://apt.google.com”.

Steps to Follow

You can get the code from the GitHub repository or the LinkedIn post. These are the simple steps to follow to use the script:

  1. In this simple script, you set the target domain name, such as target.com
  2. The script uses a file words_list.txt that contains a list of words used to build candidate subdomains
  3. Each candidate subdomain is requested and tested for an HTTP 200 response
  4. In the end, it prints the list of all discovered subdomains
  5. To run the script, use the command python sub-domain-crawler.py
import requests

discovered_subdomains = []

url = "cytomate.net"  # main domain to test against


def url_test(url):
    # Send a GET request; return None if the subdomain does not respond at all
    try:
        return requests.get(url)
    except requests.exceptions.ConnectionError:
        return None


with open("words_list.txt", "r") as words:
    for word in words:
        # Build a candidate URL such as http://word.cytomate.net
        test_url = "http://" + word.strip() + "." + url

        response = url_test(test_url)
        if response and response.status_code == 200:
            print("[+] Discovered >", test_url)
            discovered_subdomains.append(test_url)

print(discovered_subdomains)  # to see all discovered subdomains
print("Done")
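For reference, words_list.txt is simply a plain text file with one candidate name per line. The entries below are only an illustration of what such a word list might contain; real word lists are usually much longer:

www
mail
ftp
blog
admin
dev

Place this file in the same folder as sub-domain-crawler.py before running the script.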

I publish posts on a range of cybersecurity topics that you may find useful. Check out my LinkedIn and Medium profiles for more updates.

