{"id":11970,"date":"2013-12-09T11:32:27","date_gmt":"2013-12-09T11:32:27","guid":{"rendered":"http:\/\/www.davidnaylor.co.uk\/?p=11970"},"modified":"2023-02-28T14:48:39","modified_gmt":"2023-02-28T14:48:39","slug":"bulk-email-validation-checker-in-python-free-tool","status":"publish","type":"post","link":"https:\/\/www.bronco.co.uk\/our-ideas\/bulk-email-validation-checker-in-python-free-tool\/","title":{"rendered":"Bulk Email Validation Checker in Python [Free Tool]"},"content":{"rendered":"<p>Whether it\u2019s for outreach purposes or link removals, you won\u2019t get far without searching for email addresses, this of course can be time consuming but well worth it and a manual approach is always recommended, this way you\u2019re going to be able to contact the right person first time and have a more tailored approach when you\u2019re doing so &#8211; but of course there are ways to speed this process up.<\/p>\n<p>There are times though when you can\u2019t find contact details; even legitimate sites don\u2019t always display a clear way to contact them. There are a number of ways you can try to get the details: checking Whois data, checking to see if they have social accounts, or using an email validation service and having a guess at common email addresses. I\u2019ve also had some good success in the past using services such as spyonweb.com to see if they have other properties with shared analytics that might be displaying the contact details.<\/p>\n<p>Recently I was doing a large link clean-up, removing all the poor quality links that had been built to a new client\u2019s site over the years, and I was left with around 100 domains that had no visible contact details or contact forms. Now I could just disavow the links and hopefully that would be enough, but there\u2019s no fun in that, and I prefer to remove links rather than just disavow them to ensure they don\u2019t get caught in an algorithm update at some point in the future. The problem is, this is now a 
big task.<\/p>\n<p>To speed up the process I wrote a Python script that tests a list of URLs against common email names such as \u2018admin@example.com\u2019, \u2018info@example.com\u2019, \u2018support@example.com\u2019 and so on.<\/p>\n<p><strong>Here it is in action\u2026 (play it in full screen)<\/strong><\/p>\n<p><iframe loading=\"lazy\" width=\"650\" height=\"396\" src=\"http:\/\/www.screenr.com\/embed\/pnoH\" frameborder=\"0\"><\/iframe><\/p>\n<p>These names can be whatever you want. I have just provided a few for you to start with, but if you really wanted you could check against a names database or registrar names. The script also checks against multiple URLs (line by line in the text file); I just added the one in the video as a quick example.<\/p>\n<p>The validation works by checking whether the host has an SMTP server and therefore actually exists. Once you have the list of valid emails, you&#8217;ll want to make a judgement call on which email is best. You can also use a tracking <a title=\"Email tracking script\" href=\"http:\/\/www.labnol.org\/internet\/email\/track-gmail-with-google-analytics\/8082\/\" target=\"_blank\" rel=\"noopener\">script to see if your email has been opened<\/a>; if not, try another email address.<\/p>\n<p>Now I know what you&#8217;re thinking: why not just stick the variations in an email and send them all through BCC? Well, I can think of a few reasons. It\u2019s potentially more time consuming to start with and less organised. Also, if one person receives the emails for multiple accounts, you may come across as a spammer and simply have your emails deleted. 
You also might run into problems if you do this on a large scale as Gmail, for example, has a 500 message send limit per day (1 message to 500 recipients counts as 500 emails).<\/p>\n<h2>A few things to keep in mind&#8230;<\/h2>\n<p>The code reads two files: <strong>common.txt<\/strong> (this contains your &#8216;admin@&#8217; type names) and <strong>urls.txt<\/strong> (this contains your URLs). A third file, <strong>output.csv<\/strong>, is created automatically in the directory where you have the program running, so don\u2019t worry about it too much (this should also be the location of the other two files).<\/p>\n<p>This uses Python 2.7. If you&#8217;ve not used Python before you can just install it; it&#8217;s as easy as installing any software. Download it from <a href=\"http:\/\/www.python.org\/download\/\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/p>\n<p>The more domains and names you load into it, the longer it will take to run, but just set it going and get on with something else.<\/p>\n<p>If there are any issues, the program just keeps going and marks it as an error in the CSV file. These will be timeout errors for whatever reason.<\/p>\n<p>It works with and without the http protocol as the URLs go through a cleaning process, so for example &#8216;http(s):\/\/bronco.test&#8217;, &#8216;http(s):\/\/www.bronco.co.uk&#8217; and &#8216;http(s):\/\/www.bronco.co.uk\/who-we-are.html&#8217; will all work!<\/p>\n<h2>Modules needed<\/h2>\n<p><a href=\"https:\/\/pypi.python.org\/pypi\/validate_email\" target=\"_blank\" rel=\"noopener\">Validate Email<\/a>. This module is needed to do the actual validation checks.<\/p>\n<p><a href=\"https:\/\/pypi.python.org\/pypi\/interruptingcow\/\" target=\"_blank\" rel=\"noopener\">Interrupting Cow<\/a>. When testing this script I was checking 25 email addresses, which was taking about 9 minutes! 
This was due to the length of time it was taking for errors to time out. I used this module to set a timeout limit of 5 seconds, which cut the time down to about a minute and four seconds, so it was well worth my Sunday evening figuring it out!<\/p>\n<p>I hope you find this useful and if you have suggestions you can email me or add some comments below \ud83d\ude42<\/p>\n<p>Anyway enough chit-chat, <strong><a href=\"\/our-ideas\/wp-content\/uploads\/email_val1.txt\" title=\"Click here to grab the code!\" target=\"_blank\" rel=\"noopener\">here&#8217;s the code*<\/a>!<\/strong> <\/p>\n<p>*I&#8217;m not a developer and I&#8217;m sure there is a more efficient way of writing this code; Python is just something I have been exploring in my spare time. We have a software developer in-house who writes all our tools, custom to our needs, whether it&#8217;s in Python or any other language, and I&#8217;m sure if he wrote this it would look very different. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Whether it\u2019s for outreach purposes or link removals, you won\u2019t get far without searching for email addresses, this of course can be time consuming but well worth it and a manual approach is always recommended, this way you\u2019re going to be able to contact the right person first time and have a more tailored approach 
[&hellip;]<\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[4],"class_list":["post-11970","post","type-post","status-publish","format-standard","hentry","category-search-engine-optimisation"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.bronco.co.uk\/our-ideas\/wp-json\/wp\/v2\/posts\/11970","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.bronco.co.uk\/our-ideas\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.bronco.co.uk\/our-ideas\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.bronco.co.uk\/our-ideas\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bronco.co.uk\/our-ideas\/wp-json\/wp\/v2\/comments?post=11970"}],"version-history":[{"count":0,"href":"https:\/\/www.bronco.co.uk\/our-ideas\/wp-json\/wp\/v2\/posts\/11970\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.bronco.co.uk\/our-ideas\/wp-json\/wp\/v2\/media?parent=11970"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.bronco.co.uk\/our-ideas\/wp-json\/wp\/v2\/categories?post=11970"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}