Scraping ASP.NET page using Java WebClient (HtmlUnit)
Scraping ASP.NET sites is hard because of ASP.NET view state. This is a continuation of my last post, Scraping ASP.NET page in PHP Curl. I discovered this technique from a...
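HtmlUnit's WebClient behaves like a browser, so it submits a form's hidden view state fields for you. Without such a client, the usual manual approach is to read the hidden __VIEWSTATE and __EVENTVALIDATION inputs from the page and post them back with the form. The minimal PHP sketch below shows that idea. extract_aspnet_state() and build_postback_body() are hypothetical helpers, the inline HTML stands in for a real response, and it assumes the state lives in hidden inputs whose names start with "__".

```php
<?php
// Sketch: collect ASP.NET hidden state fields (__VIEWSTATE, __EVENTVALIDATION, ...)
// so they can be posted back with the next request.
// Assumption: the state is rendered as <input type="hidden" name="__..."> elements.
function extract_aspnet_state(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $fields = [];
    foreach ($xpath->query('//input[@type="hidden"][starts-with(@name, "__")]') as $input) {
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $fields;
}

// Merge the state fields with the form values to submit and encode them
// as an application/x-www-form-urlencoded POST body.
function build_postback_body(array $state, array $formValues): string
{
    return http_build_query(array_merge($state, $formValues));
}

// Example with inline HTML instead of a live request.
$html = '<form method="post"><input type="hidden" name="__VIEWSTATE" value="abc123" />'
      . '<input type="hidden" name="__EVENTVALIDATION" value="xyz" />'
      . '<input type="text" name="txtSearch" /></form>';

$state = extract_aspnet_state($html);
echo build_postback_body($state, ['txtSearch' => 'java']), PHP_EOL;
// __VIEWSTATE=abc123&__EVENTVALIDATION=xyz&txtSearch=java
```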
PHP: Scraping paginated list – recursion with anonymous function
The function below is helpful for scraping content from a paginated site, using recursion and PHP’s anonymous functions. I haven’t tested this function, but you can use the same idea. $done_pages =...
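Because the excerpt is cut off and the original function is untested, here is a self-contained sketch of the same idea: an anonymous function that captures itself by reference (use (&$scrape)) so it can call itself for the next page. $done_pages mirrors the name in the excerpt, but $fetch_page and the in-memory $pages array are hypothetical stand-ins for real HTTP requests, so the example runs offline.

```php
<?php
// Hypothetical in-memory "site": each page lists items and the next page number.
$pages = [
    1 => ['items' => ['a', 'b'], 'next' => 2],
    2 => ['items' => ['c'], 'next' => 3],
    3 => ['items' => ['d', 'e'], 'next' => null],
];

// Stand-in for a real HTTP fetch plus parsing step.
$fetch_page = function (int $n) use ($pages): array {
    return $pages[$n];
};

$done_pages = [];   // pages already visited, guards against pagination loops
$results = [];

// The closure captures $scrape by reference, so after assignment it can recurse.
$scrape = function (int $page) use (&$scrape, &$done_pages, &$results, $fetch_page): void {
    if (isset($done_pages[$page])) {
        return; // already scraped; stop to avoid infinite recursion
    }
    $done_pages[$page] = true;

    $data = $fetch_page($page);
    foreach ($data['items'] as $item) {
        $results[] = $item;
    }
    if ($data['next'] !== null) {
        $scrape($data['next']); // follow the "next page" link
    }
};

$scrape(1);
echo implode(',', $results), PHP_EOL; // a,b,c,d,e
```

For very long listings a simple loop avoids deep recursion, but the closure form keeps the scraping logic in one place.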
PHP: Check if the linked file is a valid file
This is one way to check whether a file linked on a page really exists and, at the same time, whether it is really a valid PDF or Word file. Some links just return an HTML page, so you need to check the...
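The excerpt cuts off before the actual check. One way to tell a real document from an HTML page (swapped in here, not necessarily the author's method) is to inspect the file's first bytes: PDFs start with %PDF-, legacy .doc files with the OLE2 signature, and .docx files with the ZIP signature PK\x03\x04. detect_document_type() is a hypothetical helper; note that a ZIP signature alone cannot distinguish .docx from .xlsx or a plain .zip.

```php
<?php
// Sketch: classify downloaded bytes by their file signature ("magic bytes")
// instead of trusting the link's extension. An HTML error page returns null.
function detect_document_type(string $bytes): ?string
{
    if (strncmp($bytes, '%PDF-', 5) === 0) {
        return 'pdf';
    }
    // Legacy Word .doc files are OLE2 compound documents.
    if (strncmp($bytes, "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", 8) === 0) {
        return 'doc';
    }
    // .docx is a ZIP container, as are .xlsx, .pptx and ordinary archives.
    if (strncmp($bytes, "PK\x03\x04", 4) === 0) {
        return 'zip';
    }
    return null; // unknown, for example an HTML page
}

// Usage idea (assumes allow_url_fopen is enabled): read only the first 8 bytes.
// $head = file_get_contents($url, false, null, 0, 8);
// if ($head !== false && detect_document_type($head) !== null) { /* looks valid */ }
```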
PHP: Removing special characters from scraped content
Special characters can cause errors, or cause strings to be truncated, when inserted into the database. Here is my code to remove those special characters:...
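The original code is elided in the excerpt. As one possible approach (an assumption, not necessarily the author's), this sketch first maps common typographic characters such as curly quotes and dashes to ASCII, then removes anything outside printable ASCII. It also drops accented letters, so it only suits columns meant to hold plain ASCII. clean_scraped_text() is a hypothetical name.

```php
<?php
// Sketch: normalise scraped UTF-8 text to plain printable ASCII.
function clean_scraped_text(string $text): string
{
    // Replace common typographic characters with ASCII equivalents first.
    $text = strtr($text, [
        "\u{2018}" => "'", "\u{2019}" => "'",   // curly single quotes
        "\u{201C}" => '"', "\u{201D}" => '"',   // curly double quotes
        "\u{2013}" => '-', "\u{2014}" => '-',   // en dash, em dash
        "\u{00A0}" => ' ',                      // non-breaking space
        "\u{2026}" => '...',                    // horizontal ellipsis
    ]);
    // Byte-wise: drop every byte outside printable ASCII, keeping tab, LF and CR.
    return preg_replace('/[^\x09\x0A\x0D\x20-\x7E]/', '', $text);
}

echo clean_scraped_text("It\u{2019}s \u{201C}fine\u{201D}\x00 \u{2014} ok"), PHP_EOL;
// It's "fine" - ok
```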
PHP: htmlspecialchars_decode, htmlentities and strip_tags – my best friends...
As the title says, I’ve been using htmlspecialchars_decode, htmlentities and strip_tags to prevent code injection attacks in PHP. You can combine the three in one line of code: $text =...
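The excerpt stops before the actual line, so the ordering below is an assumption rather than the author's code: decode first so encoded tags like &lt;script&gt; become real tags, strip the tags, then re-encode what is left. The result is safe to echo into HTML; it does not replace prepared statements when the text goes into SQL. sanitize_text() is a hypothetical name.

```php
<?php
// Sketch: combine the three functions in a single expression.
// 1. htmlspecialchars_decode turns &lt;b&gt; back into <b>.
// 2. strip_tags removes the (now real) tags, keeping their inner text.
// 3. htmlentities encodes the remaining special characters for HTML output.
function sanitize_text(string $input): string
{
    return htmlentities(strip_tags(htmlspecialchars_decode($input, ENT_QUOTES)), ENT_QUOTES, 'UTF-8');
}

echo sanitize_text('&lt;script&gt;alert(1)&lt;/script&gt; <b>Tom & Jerry</b>'), PHP_EOL;
// alert(1) Tom &amp; Jerry
```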
PHP – CURL: Scraping page with cookies with CURLOPT_COOKIEJAR option
Sometimes you want to scrape a page that requires a cookie to be sent with the request. Here is the solution: $ckfile = tempnam("/tmp", "cookie"); $url = 'http://example.com/page'; $ch =...
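The excerpt is truncated after curl_init, so here is a minimal sketch of the general pattern, not the author's full code. It assumes the site sets its cookie on a first request (for example a landing page) and that the cookie must accompany the request for the target page. cookie_curl_options() and fetch_with_cookies() are hypothetical helpers; the URLs in the usage comment reuse example.com from the excerpt.

```php
<?php
// Options shared by both requests. Setting CURLOPT_COOKIEFILE also enables
// cURL's in-memory cookie engine for the handle.
function cookie_curl_options(string $cookieFile): array
{
    return [
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are written here when the handle is freed
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are read from here before a transfer
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects, common after a cookie is set
    ];
}

function fetch_with_cookies(string $firstUrl, string $pageUrl, string $cookieFile): string|false
{
    $ch = curl_init();
    curl_setopt_array($ch, cookie_curl_options($cookieFile));

    // First request: the server sets its cookie, which the handle keeps.
    curl_setopt($ch, CURLOPT_URL, $firstUrl);
    curl_exec($ch);

    // Second request on the same handle: the cookie is sent automatically.
    curl_setopt($ch, CURLOPT_URL, $pageUrl);
    $html = curl_exec($ch);

    // Dropping the last reference frees the handle; cURL then saves the
    // cookies to the jar file so a later run can reuse them.
    unset($ch);
    return $html;
}

// Usage:
// $ckfile = tempnam(sys_get_temp_dir(), 'cookie');
// $html = fetch_with_cookies('http://example.com/', 'http://example.com/page', $ckfile);
```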
PHP: Sort an associative array by value of a given key
This is how I sort an associative array by the value of a given key. Here is the function. Note that I have not benchmarked its performance. function order_by_key_value(&$array, $key) {...
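The excerpt ends before the function body. Here is a minimal sketch of the same idea using usort() and the spaceship operator (the arrow function needs PHP 7.4+). It assumes the array is a list of rows that each contain $key. sort_rows_by_key() is a hypothetical name, not the original order_by_key_value().

```php
<?php
// Sketch: sort a list of associative arrays by the value stored under $key.
function sort_rows_by_key(array &$rows, string $key): void
{
    // <=> returns -1, 0 or 1; the arrow function captures $key automatically.
    usort($rows, fn(array $a, array $b): int => $a[$key] <=> $b[$key]);
}

$people = [
    ['name' => 'Carol', 'age' => 41],
    ['name' => 'Alice', 'age' => 29],
    ['name' => 'Bob',   'age' => 35],
];
sort_rows_by_key($people, 'age');
echo implode(', ', array_column($people, 'name')), PHP_EOL; // Alice, Bob, Carol
```

usort() reindexes the outer array; use uasort() with the same callback if the outer keys must be preserved.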
Zend PHP 5.3 certification
Today, I passed the Zend PHP 5.3 certification exam. I purchased the voucher earlier this year and planned to take the exam before the end of the year. And I did it! As Hiro Nakamura said:...
Django 1.5 + WSGI + Apache VirtualHost
Here is a sample VirtualHost configuration for a Django 1.5 application. <VirtualHost *:80> ServerAdmin hello@example.com ServerName example.com ServerAlias www.example.com WSGIScriptAlias /...
CodeIgniter + CloudFlare: This website is offline
I have a website built with the CodeIgniter framework, and it works fine. I tried using CloudFlare to speed up my site, but it kept displaying ‘This website is offline’. I did some research and found out...