X-Scripts

READY-MADE SOLUTIONS FOR YOUR BUSINESS

OUR CONTACTS:
Skype: igor_sev2
Email : order@x-scripts.com

Human Emulator script: a parser for Yandex.Market

One of the topical tasks today is parsing products from Yandex.Market. We decided not to pass this task by and wrote a script that collects the specified products with all of their information and stores the result in a MySQL database.

Before writing the script, we downloaded the latest MySQL build from the official site, installed it with all of its components, and uncommented the mysql library in php.ini so that the PHP functions for working with MySQL databases are available.

Since the data on the site is stored in UTF-8, we developed the script in the Unicode version of Human Emulator. This version sits next to the regular executable and is called XWeb Human Emulator MT UE.exe. The input data for the script must also be in Unicode. As input, the script takes a file with the keywords it uses to search for the products you need, in the following format:

laptop
monitor
keyboard
mouse
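
A minimal sketch of reading such a keyword file in plain PHP, assuming one keyword per line in UTF-8. The helper name load_keywords is hypothetical and is not part of the published script, which calls file() directly; the sketch only shows one way to drop blank lines, stray whitespace and a leading byte order mark before the keywords are used.

```php
<?php
// Hypothetical helper (not part of the original script):
// read keywords, one per line, skipping empty lines.
function load_keywords($path)
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        // the file could not be read
        return array();
    }

    $keywords = array();
    foreach ($lines as $i => $line) {
        if ($i === 0) {
            // remove a UTF-8 byte order mark from the first line, if present
            $line = preg_replace('/^\xEF\xBB\xBF/', '', $line);
        }
        // remove surrounding whitespace, including a trailing "\r"
        $line = trim($line);
        if ($line !== '') {
            $keywords[] = $line;
        }
    }
    return $keywords;
}
```

With the four-line file above, load_keywords("data/markets.txt") would return the four keywords without trailing line breaks.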

As a result, the script creates a database with a table of products, markets, with the following columns: keyword, product name, average price, price range, HTML code, HTML code of the brief characteristics, HTML code of all characteristics. All of the HTML results can also be obtained as plain text. To do this, when parsing the product page, use the functions that return the inner text instead of the inner HTML: in the function get_market_info($market_key), replace $element->get_inner_html_by_attribute with $element->get_inner_text_by_attribute.
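
If the HTML has already been stored, a rough plain-text version can also be derived afterwards in plain PHP. This is an alternative to the inner-text functions mentioned above, not what the script does, and html_to_text is a hypothetical helper name. The tag-matching pattern is deliberately naive and is only a sketch for simple markup.

```php
<?php
// Hypothetical helper (not part of the original script):
// turn a stored HTML fragment into rough plain text.
function html_to_text($html)
{
    // replace every tag with a space so neighbouring values do not fuse
    $text = preg_replace('/<[^>]*>/', ' ', $html);
    // decode entities such as &amp; and &quot;
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // collapse runs of whitespace into single spaces
    $text = preg_replace('/\s+/u', ' ', $text);
    return trim($text);
}
```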
Result of the script in MySQL Workbench:

Yandex.Market products in a MySQL database

Script settings:

// file with the products to parse: here you specify the keyword file used to search for products
// the data must be in Unicode
$a_markets = file("data/markets.txt");

// depth of the pass through the search results:
// how many pages of products to collect before moving on to the next keyword
// to collect all products, set this parameter to -1
$cnt_pages = 15;
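
The paging logic itself lives in functions.php, which is not shown in this article. As an illustration only, the meaning of $cnt_pages described above could be expressed as follows; the function name may_go_to_next_page is hypothetical, and the sketch assumes that page numbering starts at 1, as $crnt_page does in the script.

```php
<?php
// Hypothetical helper (not taken from functions.php):
// decide whether another results page may be visited.
//   $crnt_page - number of the page that has just been collected (starts at 1)
//   $cnt_pages - depth setting; -1 means "no limit"
function may_go_to_next_page($crnt_page, $cnt_pages)
{
    if ($cnt_pages == -1) {
        // no limit: continue until the site runs out of pages
        return true;
    }
    // otherwise stop once $cnt_pages pages have been collected
    return $crnt_page < $cnt_pages;
}
```

With $cnt_pages = 15, pages 1 to 15 are collected and the check fails after page 15.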

The script itself looks like this:

<?php

$xhe_host ="127.0.0.1:7011";

// the following include is required to run the script in XWeb Human Emulator
require("../../Templates/xweb_human_emulator.php");

// ////////////////// script settings //////////////////

// file with the products to parse
$a_markets = file("data/markets.txt");

// depth of the pass through the search results
$cnt_pages = 15;
// current page
$crnt_page = 1;

// debug mode
$dbg = true;

// ////////////////// additional modules //////////////////

// helper functions
require_once("functions.php");

// ////////////////// script //////////////////

debug_mess(date("\[ d.m.y H:i:s\] ")." script started");
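// connect to the database
$mysql_bd = @mysql_connect('localhost:3306', 'root', 'password');
// if the connection failed, report the error and stop
if (!$mysql_bd)
{
    echo 'connection error: ' . mysql_error();
    // close Human Emulator and end the script
    // (in the original order, die() came first and made this call unreachable)
    $app->quit();
    exit;
}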
// create the database for the products
$sql = 'CREATE DATABASE IF NOT EXISTS yandex_markets';
if (!mysql_query($sql, $mysql_bd))
    debug_mess("failed to create database yandex_markets: " . mysql_error());

// wait until the query has been processed
sleep(1);
// select the database (whether it was just created or already existed)
mysql_select_db("yandex_markets", $mysql_bd);
// wait until the query has been processed
sleep(1);
// create the table if it does not exist yet
$sql = 'CREATE TABLE IF NOT EXISTS markets (Market TEXT, Market_Name TEXT, Avg_Price TEXT, Range_Price TEXT, Image TEXT, Properties MEDIUMTEXT, All_Properties MEDIUMTEXT) ENGINE=MyISAM DEFAULT CHARSET=utf8';
// run the query
if (!mysql_query($sql, $mysql_bd))
    debug_mess("query error: " . mysql_error());

// go through all the keywords
for ($j = 0; $j < count($a_markets); $j++)
{
    // reset the page counter before each search
    $crnt_page = 1;
    // go to Yandex.Market
    $browser->navigate("http://market.yandex.ru");

    // enter the keyword and start the search
    $input->set_value_by_name("text", $a_markets[$j]);
    $button->click_by_number(0);

    $button->click_by_inner_text("Pick", true);
    // keep going until the numbered page links run out
    while (true)
    {
        // collect the product links found on the current page
        $models = $webpage->get_body_inter_prefix_all("class=b-offers__name", "\">");
        $a_models = explode("<br>", $models);
        // go through all the links
        for ($k = 0; $k < count($a_models); $k++)
        {
            // go to the product page
            $str_href = get_string($a_models[$k], "modelid", "&");
            $anchor->click_by_href("modelid" . $str_href, false);

            // get the product information
            get_market_info(trim($a_markets[$j]));
            // go back to the search results
            $browser->go_back();
        }

        // stop if there is no next page to move to
        if (!next_page($crnt_page))
            break;
    }
}
// close database
mysql_close($mysql_bd);

debug_mess(date("\[ d.m.y H:i:s\] ")." script finished<br>");

// Quit
$app->quit();
?>
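
The helper get_string() used in the main loop is defined in functions.php, which is not reproduced here. Judging only by how it is called, it appears to return the text between a prefix and a suffix. A minimal sketch of such a helper, under that assumption, might look like this; the name get_string_between and the sample link in the usage note are hypothetical.

```php
<?php
// Hypothetical stand-in for the get_string() helper from functions.php:
// return the part of $str between the first $prefix and the next $suffix,
// or an empty string if either marker is missing.
function get_string_between($str, $prefix, $suffix)
{
    $start = strpos($str, $prefix);
    if ($start === false) {
        return '';
    }
    // move past the prefix itself
    $start += strlen($prefix);

    $end = strpos($str, $suffix, $start);
    if ($end === false) {
        return '';
    }
    return substr($str, $start, $end - $start);
}
```

For a made-up link fragment such as href="/model.xml?modelid=12345&hid=91", calling get_string_between($fragment, "modelid", "&") gives =12345, which the script would then append to "modelid" to find the link to click.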



The script was written on 11.09.2012 in Human Emulator 4.4.19 Advanced. At the time this article was published (11.09.2012), the script was working.
