How To Scrape Data From Website Using PHP

As PHP programmers, we often need to pull data from another website. Fetching a page and extracting data from it programmatically is known as web scraping. In this tutorial you will learn how to scrape data from a website using PHP. The tutorial is explained in easy steps, with a live demo and downloadable demo source code.


So let's start coding. We will use the following file structure for this data scraping tutorial:


  • index.php

  • scrape.php




Step 1: Create Form To Enter Website URL
Since this tutorial is built around a demo, we will first create a form in index.php with a field for the website URL to scrape and a submit button.

<form method="post" name="scrap_form" id="scrap_form" action="scrape.php">
<label>Enter Website URL To Scrape Data</label>
<input type="text" name="website_url" id="website_url">
<input type="submit" name="submit" value="Submit" >
</form>
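
Because the form accepts any text, scrape.php may want to check the submitted value before fetching it. The following is a minimal sketch and not part of the original script: the helper name isScrapableUrl is hypothetical, and the rule it applies (a syntactically valid http or https URL) is an assumption about what the demo should accept.

```php
<?php
// Hypothetical helper (not in the original tutorial): returns true only
// for syntactically valid http:// or https:// URLs.
function isScrapableUrl($url)
{
    // filter_var() returns false when the value is not a valid URL.
    if (!is_string($url) || filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // Allow only web schemes, so values such as file:///etc/passwd are rejected.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, array('http', 'https'), true);
}
```

In scrape.php, such a check could run on $_POST['website_url'] before the page is requested, with an error message shown when it returns false.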



Step 2: Create PHP Function To Get Website Data
Now we will create a PHP function scrapeWebsiteData in scrape.php that fetches the page with PHP's cURL library, which lets you connect to and communicate with many different types of servers over many different protocols.

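function scrapeWebsiteData($website_url){
    // Stop early if the cURL extension is not available.
    if (!function_exists('curl_init')) {
        die('cURL is not installed. Please install and try again.');
    }
    $curl = curl_init();
    // Set the URL of the page to fetch.
    curl_setopt($curl, CURLOPT_URL, $website_url);
    // Return the response as a string instead of printing it.
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    // Run the request; $output is false if the request fails.
    $output = curl_exec($curl);
    curl_close($curl);
    return $output;
}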



In the above function, we first check whether PHP cURL is installed. We then use three cURL functions: curl_init() initializes the session, curl_exec() executes the request, and curl_close() closes the session. The CURLOPT_URL option sets the URL of the website we are scraping. The CURLOPT_RETURNTRANSFER option tells cURL to return the fetched page as a string so it can be stored in a variable, rather than its default behavior of printing the entire page directly.
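
The same three calls also leave room for a few commonly used options. As a rough sketch, assuming you want redirects followed and a timeout applied (neither is in the original function), a variant might look like this. The function name fetchPage, the 10-second default timeout, and the choice to return null on failure are illustrative assumptions.

```php
<?php
// Illustrative variant of scrapeWebsiteData(); the name, the timeout value,
// and the null-on-failure convention are assumptions, not the original code.
function fetchPage($website_url, $timeout_seconds = 10)
{
    $curl = curl_init();
    curl_setopt_array($curl, array(
        CURLOPT_URL            => $website_url,
        CURLOPT_RETURNTRANSFER => true,             // return the page as a string
        CURLOPT_FOLLOWLOCATION => true,             // follow HTTP redirects
        CURLOPT_TIMEOUT        => $timeout_seconds, // give up after this many seconds
    ));
    $output = curl_exec($curl);
    // curl_exec() returns false and curl_errno() is non-zero when the transfer failed.
    $failed = ($output === false || curl_errno($curl) !== 0);
    curl_close($curl);
    return $failed ? null : $output;
}
```

Returning null lets the caller tell a failed request apart from a page that happens to be empty.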



Step 3: Scrape Particular Data from Website
Finally, we will scrape one particular section of the page. Most of the time we don't want all the data from a page, just one section of it. In this example, we will look for the latest posts at PHPZAG.COM. To do this, we use strpos() to find the point where that section starts (its heading) and the point where it ends (the next closing </div>), then use substr() to cut out just that part of the scraped page.

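if(isset($_POST['submit'])){
    $html = scrapeWebsiteData($_POST['website_url']);
    // Find where the "Latest Posts" section starts (false if the request failed).
    $start_point = is_string($html) ? strpos($html, '<h3 class="widgettitle">Latest Posts</h3>') : false;
    // Find the first closing </div> after the start marker.
    $end_point = ($start_point === false) ? false : strpos($html, '</div>', $start_point);
    if ($start_point === false || $end_point === false) {
        echo 'Could not find the Latest Posts section on this page.';
    } else {
        $length = $end_point - $start_point;
        echo substr($html, $start_point, $length);
    }
}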
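
The same strpos()/substr() slicing can be tried without any network request. The sketch below wraps it in a helper and runs it on a small inline string. The function name extractSection and the sample HTML are made up for illustration and are not taken from PHPZAG.COM.

```php
<?php
// Hypothetical helper: returns the part of $html that starts at
// $start_marker and stops just before the next $end_marker,
// or null when either marker is missing.
function extractSection($html, $start_marker, $end_marker)
{
    $start_point = strpos($html, $start_marker);
    if ($start_point === false) {
        return null;
    }
    $end_point = strpos($html, $end_marker, $start_point);
    if ($end_point === false) {
        return null;
    }
    return substr($html, $start_point, $end_point - $start_point);
}

// Made-up sample markup, used only to exercise the helper.
$sample = '<div><h3 class="widgettitle">Latest Posts</h3><ul><li>Post A</li></ul></div><p>Footer</p>';
$section = extractSection($sample, '<h3 class="widgettitle">Latest Posts</h3>', '</div>');
```

Here $section holds the heading and the list that follows it, and stops before the closing </div>. Note that the result includes the start marker itself, just like the code in this step.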



You now have the list of latest posts from PHPZAG.COM. This is a very simple example that gets one particular section of a page. You can go further and extract whatever data you need from websites. For example, you can scrape eCommerce websites to get product details, prices, etc. The point is, once the website data is in your hands, you can process it however you want.
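
One way to go further, offered as a sketch rather than as part of the original script, is to parse the extracted section with PHP's DOM extension and pull out each post's title and link. The function name listLinks and the sample markup are assumptions for illustration; real pages will need their own XPath queries.

```php
<?php
// Hypothetical helper: returns an array of ['title' => ..., 'url' => ...]
// for every <a href="..."> found in an HTML fragment.
function listLinks($html_fragment)
{
    // loadHTML() rejects an empty string, so return early.
    if (trim($html_fragment) === '') {
        return array();
    }
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them,
    // since real-world HTML is rarely perfectly formed.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html_fragment);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array();
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $links[] = array(
            'title' => trim($anchor->textContent),
            'url'   => $anchor->getAttribute('href'),
        );
    }
    return $links;
}

// Made-up fragment standing in for a scraped "Latest Posts" section.
$fragment = '<ul><li><a href="/post-a">Post A</a></li><li><a href="/post-b">Post B</a></li></ul>';
$links = listLinks($fragment);
```

A DOM parser is generally less brittle than string offsets when the markup around the section changes, at the cost of requiring the DOM extension.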

You can view the live demo from the Demo link and can download the script from the Download link below.
Demo Download
