X-Scripts

READY-MADE SOLUTIONS FOR YOUR BUSINESS

OUR CONTACTS:
Skype: igor_sev2
Email : order@x-scripts.com

Human Emulator script: email address collector

A script for building an email address database. It will help you collect your own database of email addresses.

The script works as follows. It takes search phrases from a file and enters them into the Google search box, then parses the links out of the search results. It visits each site in turn, opens its "Contacts" or "About" page, and collects every email address on that page with a PHP regular expression. Everything collected is written to a file, with duplicates removed. The result is a ready-made database of targeted email addresses.
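The extraction step can be tried on its own in plain PHP, without Human Emulator. The following is a minimal sketch, not the script's actual code: the function name extract_emails and the sample HTML are assumptions made for illustration, and the pattern is a cleaned-up variant of the one used in the script.

```php
<?php
// Hypothetical helper for illustration; not part of the original script
// and not part of the Human Emulator API.
function extract_emails(string $html): array
{
    // The hyphen sits last in each character class so it is a literal
    // character rather than the start of a range.
    preg_match_all('#[\w.-]+@(?:[\w-]+\.)+[a-zA-Z]{2,6}#', $html, $matches);

    // Lower-case the addresses so differently-cased copies collapse to one.
    $emails = array_map('strtolower', $matches[0]);

    // Remove duplicates and reindex from 0.
    return array_values(array_unique($emails));
}

// Sample page fragment (invented for the example).
$html = '<a href="mailto:Info@example.com">write to us</a> or '
      . 'sales@shop.example.org, info@example.com';

print_r(extract_emails($html));
```

Because the pattern only matches the address itself, prefixes such as "mailto:" and surrounding tags do not end up in the result.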

As input, the script takes a file of search phrases, one per line:
furniture Ukraine contacts
furniture Kiev contacts
furniture Donetsk contacts
furniture Kharkov contacts
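A phrase file in this format can be loaded with PHP's file() function. The sketch below assumes one phrase per line and skips blank lines; the helper name load_phrases and the temporary file are hypothetical and only serve the example.

```php
<?php
// Hypothetical helper for illustration: load search phrases, one per line.
function load_phrases(string $path): array
{
    if (!is_readable($path)) {
        return [];
    }

    // Drop the trailing newlines and skip completely empty lines.
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return [];
    }

    // Trim stray whitespace and discard lines that contained only spaces.
    $phrases = array_filter(
        array_map('trim', $lines),
        function ($s) { return $s !== ''; }
    );

    return array_values($phrases);
}

// Demo with a temporary file instead of data/keys.txt.
$path = tempnam(sys_get_temp_dir(), 'keys');
file_put_contents($path, "furniture Ukraine contacts\n\n  furniture Kiev contacts  \n");
print_r(load_phrases($path));
unlink($path);
```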

The output is a file with the results in the format:
decor2004@inbox.ru
info@liganova.kiev.ua
evgenzap@ukr.net
vlabi@optima.com.ua
tasi@io.zp.ua

Script settings:

// the data file for the script
$keys = file("data/keys.txt");
// file with results
$file_res="res/email.txt";

// number of search result pages to go through
$cnt_pages = 10;

For the script to work better, disable everything unnecessary except JavaScript. This script was written on top of the Google Parser script template. Using the similar Yandex Parser template, you can easily write an equivalent script that collects email addresses through Yandex.

The script itself looks like this:

<?php

$xhe_host ="127.0.0.1:7010";

// The following code is required to properly run XWeb Human Emulator
require("../../Templates/xweb_human_emulator.php");

// //////////////////////// the script settings /////////////////////////
// the data file for the script
$keys = file("data/keys.txt");
// file with results
$file_res="res/email.txt";

// number of search result pages to go through
$cnt_pages = 10;
// current page
$crnt_page =1; 

// run the script in debug mode
$dbg = true;

// //////////////////////// additional modules ///////////////
// helper functions
require_once("functions.php");

// /////////////////////// script ///////////////////////////////////////////

debug_mess(date("[ d.m.y H:i:s] ")." script started");

// loop over all search phrases
for($ii=0;$ii<count($keys);$ii++)
{
// get the current search phrase
$key = trim($keys[$ii]);

// go to Google 
$browser->navigate("google.com");

// set the word to search
$input->set_value_by_name("q",$key);
$input->click_by_name("q");
// press the space bar to disable the pop-up hints
$keyboard->send_key(32,true);

// press enter
$keyboard->send_key(13,true);

// wait for the results page to load
sleep(3);

// reset before next pass
$crnt_page=1;

while(true)
{
// get all the site links enclosed in <cite> tags
$sites=$webpage->get_body_inter_prefix_all("<cite>","</cite>");
$sites=explode("<br>",$sites);
// go through all the links
for($i=0;$i<count($sites);$i++)
{ 
// strip the <b> highlighting from the link
$site=str_replace("<b>","",trim($sites[$i]));
$site=str_replace("</b>","",$site);
if($site=="")
continue;
// print the link to the debug panel
//debug_mess("link : ".$site); 

// open and set active a new browser
$browser->set_count(2);
$browser->set_active_browser(1,true);

// go to the site
$browser->navigate($site);
sleep(1);
// go to the contacts or about page
$anchor->click_by_inner_text("contacts");
$anchor->click_by_inner_text("Contacts");
$anchor->click_by_inner_text("About");
$anchor->click_by_inner_text("about");
sleep(2);
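// search for all email addresses on the page
// (the hyphen goes last in each character class: written as ".-_" it formed a range from "." to "_" that also matched "/", ":", "<", ">" and "@")
preg_match_all('#[\w.-]+@([\w-]+\.)+[a-zA-Z]{2,6}#i', $webpage->get_source(), $matches);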

// iterate over the results ($key is not reused here, so the search phrase is not overwritten)
foreach ($matches[0] as $value)
{
//debug_mess("email: ".$value);
// strip leftover markup and prefixes
$str_mail=str_replace(">","",$value);
$str_mail=str_replace("<","",$str_mail); 
$str_mail=str_replace("mailto:","",$str_mail); 
$str_mail=str_replace("/","",$str_mail); 
$str_mail=str_replace("mail:","",$str_mail); 

// write to the file
$textfile->add_string_to_file($file_res,trim($str_mail)."\n",60) ;
}

// close and go back
$browser->set_active_browser(0,true);
$browser->close_all_tabs();

// remove duplicates from file
dedupe($file_res);
}

// stop if there is no next results page
if(!next_page($crnt_page)) 
break;
}

}

debug_mess(date("[ d.m.y H:i:s] ")." script finished<br>");

// Quit
$app->quit();
?>
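The script calls dedupe() from functions.php, which is not shown in the article. As a rough idea of what such a helper might do, here is a minimal plain-PHP sketch. The name dedupe_file and its behavior are assumptions, not the author's implementation: it rewrites the file with each line kept once, in order of first appearance.

```php
<?php
// Hypothetical stand-in for the dedupe() helper from functions.php,
// whose real code is not shown in the article.
function dedupe_file(string $path): int
{
    if (!is_readable($path)) {
        return 0;
    }

    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return 0;
    }

    // Trim whitespace, drop blank entries, keep the first copy of each line.
    $lines  = array_filter(array_map('trim', $lines), function ($s) { return $s !== ''; });
    $unique = array_values(array_unique($lines));

    // Write the result back, one address per line.
    $body = count($unique) > 0 ? implode("\n", $unique) . "\n" : '';
    file_put_contents($path, $body, LOCK_EX);

    return count($unique);
}

// Demo with a temporary file instead of res/email.txt.
$path = tempnam(sys_get_temp_dir(), 'mail');
file_put_contents($path, "a@example.com\nb@example.org\na@example.com\n");
echo dedupe_file($path), " unique addresses\n";
echo file_get_contents($path);
unlink($path);
```

Calling the helper after every visited site, as the script does, keeps the results file clean at any moment at the cost of rereading it each time; calling it once at the end would also work.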



The script was written on 2.10.2012 in XHE Human Emulator 4.4.19 Advanced. At the time this article was published (3.10.2012), the script was working.
