Extraction of Emails & Header Information from Outlook in Order to Detect Phishing Attacks

Note: Section-1 of this article described the prerequisites for this topic; please read it before proceeding with this part.

Before getting to the practical demonstration, it is worth looking at the modules I used in this project. The whole procedure is carried out on Windows 10 using the Python programming language and the Outlook client, installed either as part of MS Office or separately. The Python libraries that helped me complete this project are described below.

A practical demonstration of phishing detection

imaplib: This module provides three classes, IMAP4, IMAP4_SSL, and IMAP4_stream, each of which represents a connection to an IMAP4 server.

email: A library for managing email messages; it is not designed to send them to SMTP or other servers. In the email.utils module, parsedate() attempts to parse a date according to the rules in RFC 2822, while parsedate_tz() returns None or a 10-tuple: the first nine elements make up a tuple that can be passed to time.mktime(), and the tenth is the time zone offset from UTC in seconds. The email.parser module understands most email document structures and returns the root object of the parsed message (an EmailMessage or Message instance, depending on the policy in use).
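To make the two date helpers concrete, here is a minimal sketch that parses a small message and reads its Date header both ways. The message text and the addresses in it are made up purely for illustration.

```python
from email import message_from_string
from email.utils import mktime_tz, parsedate, parsedate_tz

# A made-up message used only for illustration.
RAW_MESSAGE = (
    "From: Alice Example <alice@example.com>\n"
    "To: bob@example.org\n"
    "Subject: Quarterly report\n"
    "Date: Mon, 20 Nov 1995 19:12:08 -0500\n"
    "\n"
    "Please find the report attached.\n"
)

# Parse the text into a message object; headers are read like dict keys.
msg = message_from_string(RAW_MESSAGE)
date_header = msg["Date"]

# parsedate() returns a 9-tuple that can be passed to time.mktime().
date_tuple = parsedate(date_header)

# parsedate_tz() returns a 10-tuple; the last element is the UTC offset in seconds.
date_tuple_tz = parsedate_tz(date_header)
utc_offset_seconds = date_tuple_tz[9]

# mktime_tz() turns the 10-tuple into a UTC timestamp.
utc_timestamp = mktime_tz(date_tuple_tz)

print(msg["From"], "|", msg["Subject"])
print(date_tuple[:6], utc_offset_seconds, utc_timestamp)
```

Both helpers return None when the header cannot be parsed, so real code should check for that before indexing into the tuple.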

Moving on to the coding phase, the code written below fetches the emails in the Outlook inbox of the account you want to check for phishing attempts. Simply replace imap_password and imap_username with your own credentials in the code, and it will fetch all emails and their header information from the account and load them into a pandas DataFrame for further analysis.
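The original code box is not reproduced here, so what follows is only a sketch of how such a fetch could look, not the exact script. The host name outlook.office365.com, the helper names (headers_to_row, fetch_inbox_rows, build_dataframe) and the list of collected headers are my assumptions. Some accounts may also reject plain password login over IMAP, in which case an app password or another login method would be needed.

```python
import email
import imaplib
from email.message import Message

# Assumed values: replace with your own account details.
IMAP_HOST = "outlook.office365.com"  # assumption: Outlook's IMAP endpoint
imap_username = "you@example.com"
imap_password = "your-password"

# Headers collected for each message (an illustrative selection).
HEADER_FIELDS = ["From", "To", "Subject", "Date", "Message-ID",
                 "X-Sender-Id", "X-Mailer", "Received-SPF"]


def headers_to_row(msg: Message) -> dict:
    """Return the selected headers as a dict (None when a header is missing)."""
    return {field: msg.get(field) for field in HEADER_FIELDS}


def fetch_inbox_rows(host: str, username: str, password: str) -> list:
    """Log in over IMAP with SSL and return one header row per inbox message."""
    rows = []
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(username, password)
        conn.select("INBOX", readonly=True)  # read-only: do not mark mail as seen
        status, data = conn.search(None, "ALL")
        for num in data[0].split():
            status, parts = conn.fetch(num, "(RFC822)")
            if status != "OK":
                continue
            raw_bytes = parts[0][1]  # full RFC 822 message as bytes
            rows.append(headers_to_row(email.message_from_bytes(raw_bytes)))
    return rows


def build_dataframe(rows: list):
    """Load the header rows into a pandas DataFrame (requires pandas)."""
    import pandas as pd
    return pd.DataFrame(rows, columns=HEADER_FIELDS)
```

A typical call would be df = build_dataframe(fetch_inbox_rows(IMAP_HOST, imap_username, imap_password)). Keeping the network step separate from headers_to_row makes the header extraction easy to try on a saved message without logging in.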

If you wish to export all of the retrieved data to a CSV file, call df.to_csv("myMail.csv"); otherwise, go to the next step.

As discussed in my previous article, SPF verification checks whether the server sending the mail is authorized by the domain, so we can use its result as a first step in detecting phishing emails. If SPF verification fails, it is more likely that the received email does not come from an authentic source. For now, to keep things simple, I will not use this step and will move directly to the main part.
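Although this step is skipped in the rest of the article, a rough sketch of reading the SPF verdict that the receiving server has already recorded may still be useful. It assumes the verdict appears either as the first word of a Received-SPF header or as an spf=... entry in Authentication-Results. The function name spf_result and the sample message are hypothetical.

```python
import re
from email import message_from_string
from typing import Optional


def spf_result(msg) -> Optional[str]:
    """Return the recorded SPF verdict in lowercase, or None if none is found."""
    # Received-SPF starts with the verdict, e.g. "Fail (domain of ...)".
    tokens = str(msg.get("Received-SPF") or "").split()
    if tokens:
        return tokens[0].lower()
    # Otherwise look for "spf=<verdict>" in Authentication-Results.
    auth_results = str(msg.get("Authentication-Results") or "")
    match = re.search(r"\bspf=(\w+)", auth_results, re.IGNORECASE)
    return match.group(1).lower() if match else None


# A made-up message used only for illustration.
SAMPLE = (
    "From: Support <support@example.com>\n"
    "Received-SPF: Fail (domain of example.com does not designate "
    "203.0.113.5 as permitted sender)\n"
    "Subject: Password reset\n"
    "\n"
    "Click the link below.\n"
)

verdict = spf_result(message_from_string(SAMPLE))
print(verdict)
```

A verdict other than pass does not prove phishing by itself; it only raises the suspicion level, which is how the paragraph above treats it.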

Systems that transmit fraudulent emails often do so by modifying the header values of the emails being sent. Different systems use different tactics to keep phishing undetected. I was able to work out two of these tactics and identify the emails' original senders.


In the first attempt, I used the x-sender-id header to track down the individual behind the fake email, which pretended to come from the user listed in the header's from field. As can be seen in the code snippet below, SPF verification also fails when an email is faked, in addition to the obvious difference between x-sender-id and from.

In the code below, I used Python's re module to extract the email address from x-sender-id, because that header also contains other information. X-mailer is used to check which software sent the mail, such as Outlook, Gophish (in the case of spoofed emails), or Gmail.
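The comparison described above could be sketched roughly as follows. This is an illustration under assumptions rather than the original snippet: the email regular expression is deliberately simple, the helper names extract_email and check_sender are hypothetical, and the sample X-Sender-Id value is invented to show an address surrounded by other information.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# A simple pattern that is good enough to pull an address out of a header value.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_email(value):
    """Return the first email address found in value, lowercased, or None."""
    match = EMAIL_PATTERN.search(value or "")
    return match.group(0).lower() if match else None


def check_sender(msg):
    """Compare the claimed From address with the address in X-Sender-Id."""
    claimed = parseaddr(msg.get("From", ""))[1].lower() or None
    actual = extract_email(msg.get("X-Sender-Id"))
    return {
        "from": claimed,
        "x_sender_id": actual,
        "x_mailer": msg.get("X-Mailer"),
        # Flag only when X-Sender-Id is present and disagrees with From.
        "mismatch": actual is not None and claimed != actual,
    }


# A made-up spoofed message used only for illustration.
SPOOFED = (
    "From: Support Team <support@example.com>\n"
    "X-Sender-Id: relay|x-authuser|attacker@example.net\n"
    "X-Mailer: gophish\n"
    "Subject: Verify your account\n"
    "\n"
    "Please log in here.\n"
)

report = check_sender(message_from_string(SPOOFED))
print(report)
```

Messages without an X-Sender-Id header are reported as mismatch False, because there is nothing to compare against; they need a different check.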



Although I was able to detect the phishing emails successfully, in some experiments the servers changed those header values, so my script did not work reliably. I had to look again for values that could help sort out the issue, and I came across the message-id, which contained information about the fake user. The code for this approach is given in the code box.
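The text does not spell out which part of the message-id gives the sender away, so the sketch below rests on one assumption: that the part after the "@" in Message-ID names the system that really generated the mail, and can be compared with the domain of the From address. The helper names domain_of and message_id_mismatch and the sample message are hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr


def domain_of(value):
    """Return the lowercased text after the last '@', or None if there is no '@'."""
    if not value or "@" not in value:
        return None
    domain = value.rpartition("@")[2]
    return domain.strip("<> \t").lower() or None


def message_id_mismatch(msg):
    """Return True/False for a domain mismatch, or None when it cannot be decided."""
    from_domain = domain_of(parseaddr(msg.get("From", ""))[1])
    id_domain = domain_of(msg.get("Message-ID", ""))
    if from_domain is None or id_domain is None:
        return None
    return from_domain != id_domain


# A made-up spoofed message used only for illustration.
SPOOFED = (
    "From: Support Team <support@example.com>\n"
    "Message-ID: <1234.abcd@mail.attacker.example>\n"
    "Subject: Verify your account\n"
    "\n"
    "Please log in here.\n"
)

suspicious = message_id_mismatch(message_from_string(SPOOFED))
print(suspicious)
```

Legitimate mail sent through third-party services can also carry a Message-ID from a different domain, so a mismatch is better treated as one signal among several than as a verdict.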




Azhar Ghafoor

Cybersecurity Researcher | Ethical Hacking | Data Analyst