Effective Subdomain Crawling using Python
Crawling to discover subdomains of any domain
If you want to test a website’s security and break into it, you must be able to examine the whole web application, along with all of its exposed features and technologies, in order to find a flaw that grants access to the system. A vulnerability in something as simple as a search field or a forgotten page can be exploited to compromise an entire system. It is therefore essential to gather as much information about the website as possible, including its directories, files, subdomains, and so on.
What exactly do we mean by a subdomain?
Subdomains are subsets of the main domain. For instance, “google.com” is the main domain, and a variety of subdomains give access to other Google services, such as Google Drive, Gmail, Google Meet, Google Maps, etc. We can visit “mail.google.com” to access Google’s email web application (Gmail), “plus.google.com” to access Google’s social network, and so on.
So, if you want to test any of these websites, you must first test all of their subdomains, which requires you to locate them. This step is essential because subdomains are often excellent places to uncover vulnerabilities: they tend not to be built as securely as the main website.
Domain Crawler
Now we will write our own tool to discover all of the subdomains of a target website. Testing subdomains manually is a very time-consuming task, so we need an automated way of doing this tiresome work for us. We will use the Python programming language to craft a “subdomain crawler” that tries different combinations from a list of subdomain names. The whole process is shown in the figure below.
Flow of Instructions
To determine whether a subdomain exists, we need a way of communicating with the website, such as entering the subdomain’s URL into a browser and checking whether it responds. Since we want to accomplish this with a Python script, we need a way to make these requests automatically. This program will use the “requests” library, which can be installed with the command pip install requests.
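As a quick, minimal sketch of how the requests library is used (the URL below is only a placeholder chosen for illustration), fetching a page and reading its HTTP status code looks like this:

import requests

# Request a URL and inspect the HTTP status code.
# "http://example.com" is just a placeholder target for illustration.
response = requests.get("http://example.com")
print(response.status_code)  # 200 means the page responded successfully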
Using different words from the wordlist, distinct subdomain URLs will be created at runtime and their responses will be obtained using the requests library.
This process looks like this:
word = apt
domain = google.com
link_formation = word + '.' + domain
final_link = 'http://' + link_formation
So, the final link that will be passed to the requests library to get the response will look like this: “http://apt.google.com”
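In Python, using the same names as the pseudocode above, this URL construction can be sketched as follows (the word and domain values are only examples):

word = "apt"            # candidate word taken from the wordlist
domain = "google.com"   # main domain being tested
link_formation = word + "." + domain
final_link = "http://" + link_formation
print(final_link)  # http://apt.google.com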
Steps to Follow
You may get the code from the GitHub repository or the LinkedIn post. These are the simple steps to follow to use this script:
- In this simple script, you type the target domain name, such as target.com.
- The script uses a file, words_list.txt, that contains a list of words for creating candidate subdomains (an illustrative example of such a file is shown after this list).
- Each time a new subdomain URL is created, it is tested for a 200 response.
- In the end, it will create a list of all discovered subdomains.
- To run the script, use the command python sub-domain-crawler.py.
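For reference, words_list.txt is simply a plain-text file with one candidate subdomain word per line. The entries below are only illustrative examples, not the actual wordlist that ships with the script:

mail
www
blog
dev
api
shop

The full script, sub-domain-crawler.py, then looks like this: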
import requests

discovered_subdomains = []
url = "cytomate.net"  # main domain to test against

def url_test(url):
    # Send a GET request; return None if the subdomain does not resolve
    try:
        return requests.get(url)
    except requests.exceptions.ConnectionError:
        return None

with open("words_list.txt", "r") as words:
    for word in words:
        # Build a candidate subdomain URL, e.g. http://apt.cytomate.net
        test_url = "http://" + word.strip() + "." + url
        response = url_test(test_url)
        if response is not None and response.status_code == 200:
            print("[+] Discovered >", test_url)
            discovered_subdomains.append(test_url)

print(discovered_subdomains)  # to see all discovered subdomains
print("Done")