Web Data Scraping Guide for SEO

What is Web data scraping?

We frequently hear terms like “data scraping” and “data mining”. It’s true that there is a wealth of information out there and every marketer can reap some benefit from it, but the question is how. I have been using a number of tools for a while now to scrape data for SEO, mainly for on-site audits, link analysis, and finding blog prospects and outreach data. Data scraping for SEO does sound complicated, but trust me, it’s not. You just need to make sure you have the right tools for the job and that you are going after the right data.

How to use scraped data for SEO?

On-site Audit

You can use a number of crawlers to crawl the site you are auditing and scrape very useful data on on-page elements to determine whether the site is well optimised. Options include Screaming Frog (paid, with a functional free version) and Xenu (free). I would recommend Screaming Frog; you could call it the “Ferrari” of crawlers. There is a great article on the SEER Interactive blog on how to use Screaming Frog to its maximum potential.
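Screaming Frog and Xenu do the crawling for you. Purely to show what “scraping on-page elements” means for a single page, here is a minimal Python sketch using only the standard library `html.parser`: it pulls the title, meta description and H1 headings out of a page’s HTML and flags a few common issues. The specific checks and the 60-character title limit are my own illustrative assumptions, not rules taken from either tool.

```python
from html.parser import HTMLParser


class OnPageParser(HTMLParser):
    """Collects the <title>, meta description and <h1> headings from one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = None
        self.h1s = []
        self._in_title = False
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "h1":
            self._in_h1 = True
            self.h1s.append("")
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_h1:
            self.h1s[-1] += data


# Hypothetical threshold, chosen for illustration only.
MAX_TITLE_LENGTH = 60


def audit_page(html):
    """Return a list of on-page issues found in an HTML string."""
    parser = OnPageParser()
    parser.feed(html)
    title = parser.title.strip()
    issues = []
    if not title:
        issues.append("missing title")
    elif len(title) > MAX_TITLE_LENGTH:
        issues.append("title too long")
    if not parser.meta_description:
        issues.append("missing meta description")
    if len(parser.h1s) != 1:
        issues.append(f"{len(parser.h1s)} h1 tags")
    return issues
```

For example, `audit_page("<title>Home</title><h1>Hi</h1>")` reports only the missing meta description. A crawler simply runs checks like these across every URL it finds.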

Link Analysis

With bad-link penalties from Google now a reality, link analysis (or a link audit) is becoming more vital by the day. Given the pre-2010 SEO tactics used by many business owners and agencies, many of them now have a number of links in their backlink profile that they wish would disappear. Mighty Google (and Bing as well) took pity on us and now lets us tell the search engines about these bad links via their Disavow tools (Google and Bing). However, I believe disavowing is not a good solution until you know which links are good and which are bad.
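For reference, once you do know which links are bad, Google’s disavow tool takes a plain-text file (UTF-8 or 7-bit ASCII) with one entry per line: either a full URL, or `domain:` followed by a domain to disavow every link from it, with lines starting `#` treated as comments. A minimal Python sketch, assuming you have already decided which domains and URLs to disavow, that builds the contents of such a file:

```python
def build_disavow_file(bad_domains, bad_urls=(), note=None):
    """Build the text of a Google disavow file.

    bad_domains: domains to disavow entirely (written as 'domain:example.com').
    bad_urls: individual URLs to disavow.
    note: optional comment written at the top, prefixed with '#'.
    """
    lines = []
    if note:
        lines.append(f"# {note}")
    # Normalise, de-duplicate and sort so the file is stable between runs.
    for domain in sorted({d.strip().lower() for d in bad_domains if d.strip()}):
        lines.append(f"domain:{domain}")
    for url in sorted({u.strip() for u in bad_urls if u.strip()}):
        lines.append(url)
    return "\n".join(lines) + "\n"
```

Write the result to a `.txt` file with `open("disavow.txt", "w", encoding="utf-8")` before uploading it. Bing’s tool works differently, so this format applies to Google only.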

This is where scraping data can help you determine external link quality. You might say backlink data is already available from third-party tools such as Open Site Explorer, Ahrefs, Majestic SEO, etc., but none of that data is complete or covers the complete backlink profile. As Google is our primary target engine, I prefer to use its data, and it is certain that if someone receives a link penalty, the offending link(s) will be listed in the backlink profile in Google Webmaster Tools. Then you face a new problem: Google does give you backlink data, but without any additional metrics, so it’s hard to tell which links are the bad ones! Yes, you can check them one by one in your browser, but if you have a link profile with thousands of links, I wish you all the best with that!

Data scraping to the rescue. You can gather link value metrics with a number of scraping tools, depending on which metrics you use to evaluate links. I personally use: the PageRank of the linking domain; server status (404, 500, etc.); page title (e.g. a title in a foreign language when you have an English-language site); the number of external links on the linking page (e.g. over 100 external links); and a match against a blacklist (linking domains from known spam sites, low-quality directories, etc.). You might want to use different metrics; however, the metrics above can be scraped using the tools below:

PageRank – SEO Tools for Excel (free), or ScrapeBox if you need to scrape thousands of domains

Server status, page title and number of external links – Screaming Frog in list mode (see under Configuration)

Blacklist – I have a list of a couple of million blacklisted domains that you can download for free.
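The tools above do the scraping; judging the links is still up to you. As a minimal sketch of how the metrics I listed could be combined once exported (for example from a Screaming Frog list-mode crawl), here is some Python that flags a linking page on server status, a foreign-language title, more than 100 external links and a blacklist match. The `LinkingPage` record, the non-ASCII-letter test for “foreign” titles and its 0.3 cut-off are my own illustrative assumptions (the test will not catch foreign languages written in Latin script, such as Spanish). PageRank is left out because it comes from a separate tool.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Threshold taken from the example in the text ("over 100 external links").
MAX_EXTERNAL_LINKS = 100
# Hypothetical cut-off: share of non-ASCII letters that marks a title as "foreign".
FOREIGN_TITLE_RATIO = 0.3


@dataclass
class LinkingPage:
    url: str
    status_code: int
    title: str
    external_links: int


def domain_of(url):
    """Lower-cased host name without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_blacklisted(domain, blacklist):
    """True if the domain or any parent domain is in the blacklist set."""
    parts = domain.split(".")
    return any(".".join(parts[i:]) in blacklist for i in range(len(parts)))


def looks_foreign(title):
    """Crude check for an English site: many non-ASCII letters in the title."""
    letters = [c for c in title if c.isalpha()]
    if not letters:
        return False
    non_ascii = sum(1 for c in letters if not c.isascii())
    return non_ascii / len(letters) > FOREIGN_TITLE_RATIO


def flag_link(page, blacklist):
    """Return the reasons a linking page looks suspicious (empty list = no flags)."""
    reasons = []
    if page.status_code >= 400:  # client and server errors such as 404 and 500
        reasons.append(f"server status {page.status_code}")
    if looks_foreign(page.title):
        reasons.append("foreign-language title")
    if page.external_links > MAX_EXTERNAL_LINKS:
        reasons.append(f"{page.external_links} external links")
    if is_blacklisted(domain_of(page.url), blacklist):
        reasons.append("blacklisted domain")
    return reasons
```

Sorting the exported rows by how many reasons each one collects gives you a shortlist to review by hand before anything goes near a disavow file.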

You can also find an example of a bad-link analysis I did using this method here (it includes a couple of extra metrics, such as TrustFlow and CitationFlow). I will soon publish a blog post here with the detailed process for creating this analysis.

Finding blog prospects and outreach data

It is becoming more and more difficult to run a link-building campaign within a set budget, mainly because of the number of human hours needed to prospect and find outreach information (email, social media, etc.). Most agencies charge $150–$300 per hour for SEO work, so it is quite hard to justify investing 20–30 hours in link prospecting and collecting outreach emails. There are services out there to help with blog outreach, but they seem too protective of their data (no raw export, etc.) and quite costly for the service they provide. On the other hand, you cannot manually compile a massive list of blogs and potential sites to reach out to.
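To put that in numbers, using only the rates and hours quoted above, one round of manual prospecting at agency rates costs somewhere between:

```latex
20\,\text{h} \times \$150/\text{h} = \$3{,}000
\quad\text{and}\quad
30\,\text{h} \times \$300/\text{h} = \$9{,}000
```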

You can use a crawler such as Screaming Frog to crawl sites that list lots of qualified blogs (verified by actual humans and categorised by content): run a complete site crawl and export the external links. You can also crawl a specific category to export only the sites under that category. Once you have a list of high-quality blogs, you can use ScrapeBox to generate data for evaluating blog popularity, such as PageRank, Alexa Rank, number of pages indexed, etc. You will then have a great list of blogs from the niche you are working in, along with blog value metrics, so you know whom to reach out to first. You can also use a browser-based scraper such as Multi-Links for Firefox or Scraper for Chrome to scrape lists already created by known publishers, or to scrape search results. Again, I would love to write a detailed blog post about this process.
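Screaming Frog’s external-link export does this across a whole crawl. For a single directory or category page, here is a minimal standard-library Python sketch of the same idea: collect every `<a href>`, resolve relative links, and keep the unique domains that are not the directory’s own. Treating `www.` as the same domain and ignoring non-HTTP links (such as `mailto:`) are my assumptions; fetching the pages is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def _host(url):
    """Lower-cased host name without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def external_domains(page_url, html):
    """Unique external domains linked from one page, in order of first appearance."""
    collector = LinkCollector()
    collector.feed(html)
    own = _host(page_url)
    seen = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        host = _host(absolute)
        if host and host != own and host not in seen:
            seen.append(host)
    return seen
```

The resulting domain list is what you would then feed into ScrapeBox (or similar) for popularity metrics.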

Once you have a nice list of blogs and sites to reach out to, the next thing you need is contact details for the identified opportunities. If you are after email addresses only, you can purchase Web Email Extractor from Newprosoft; it works really well for large lists. If you need social profiles as well, you can buy BuzzStream for link building, which includes a scraper capable of gathering social profiles along with an email scraper (however, this email scraper searches certain URLs only and returns far fewer contacts than Web Email Extractor). You can also create a custom scraper on 80legs to scrape social profiles from URLs.
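I can’t speak to how Web Email Extractor, BuzzStream or 80legs work internally. As a rough illustration of the underlying idea only, here is a minimal Python sketch that pulls email addresses and social profile links out of a page’s HTML with regular expressions. The patterns and the list of networks are my own assumptions and will produce some misses (obfuscated addresses) and false positives (image names like `logo@2x.png`, share buttons).

```python
import re

# A deliberately simple email pattern; it misses obfuscated addresses
# such as "name [at] example [dot] com" and can match "logo@2x.png".
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical set of networks to look for; share-button URLs will also match.
SOCIAL_RE = re.compile(
    r"https?://(?:www\.)?"
    r"(?:twitter\.com|facebook\.com|linkedin\.com/in|plus\.google\.com)"
    r"/[A-Za-z0-9_./+-]+",
    re.IGNORECASE,
)


def extract_contacts(html):
    """Return (emails, social_profile_urls) found in an HTML string, de-duplicated."""
    emails = sorted({e.lower() for e in EMAIL_RE.findall(html)})
    profiles = sorted(set(SOCIAL_RE.findall(html)))
    return emails, profiles
```

Running this over the contact or about page of each prospect gives you a first-pass contact sheet to check by hand.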

Happy scraping! Let me know in the comments if you love scraping too and which tools you use to get the job done. Also let me know if you want to see more posts on data scraping like this.

Bonus: list of around 5000 Australian Blogs!


I have seen a lot of buzz around Internet marketing, and SEO is one of its primary aspects, so more and more people are choosing SEO as a career. Here I am compiling a list of the SEO blogs I found most beneficial as an SEO beginner. When I was building myself up as an online marketer after finishing my Bachelor of Information Systems degree, I took a lot of help from quite a few sites, and over the last couple of years I have found a few more sites very instrumental in my SEO/Internet marketing learning. To be honest, picking the top ten blogs for SEO beginners is quite a tough task, as I still read about 50 blogs almost every day.

SEOmoz SEO Blog

You can’t go past ’em! The SEOmoz and YouMoz blogs are a must-read for every SEO professional. There is a wealth of information for SEOers at every level: on-page, off-page, technical, link building, SEO tools, SEO news, you name it, they have it! SEOmoz is the best of its kind. It is my number one choice because when you learn, you had better learn it well, and if you get the chance, learn from the best. You will find all the top blokes here at SEOmoz.

Search Engine Land

Search Engine Land is a must-read hub for news and information about search engine marketing, optimization and how search engines such as Google, Yahoo, Microsoft Live.com and Ask.com work for searchers. It is packed with the latest information from the search world and also has a great collection of articles on the search industry.

Google Webmaster Central Blog

This site offers official news on crawling and indexing sites for the Google index. They also publish a number of great resources to help you understand your SEO work process better, and this information adds great value to your SEO work. The dynamic nature of the SEO industry makes it essential to get updates from the original source and implement them as soon as possible.

Ross Hudgens Link Building

I would not say he is the most frequent poster in the SEO industry; however, he is the kind of SEO blogger who, even if you wait over a month for a post, will make every minute of the wait worth it! He is a great link-building strategist with superb knowledge and very solid, unique techniques. Maybe not the best blog for beginners, but I can’t afford not to have him on a list about SEO. And yes, don’t forget to follow him on Twitter, because that’s where he is even more awesome: @rosshudgens

John Doherty

John’s blog is always full of surprises. He will make your link-building efforts way easier by giving you free tools, and he makes existing tools more awesome. A number of times I have found a tool online and thought, “It would be better if this tool did these things for me as well”; a week or two later I checked John’s site and voilà! He had done it again (he had customised the tool so it now does everything I wanted it to do). Again, not the best site when you are just starting to learn SEO, but you can’t bypass this site, NO WAY! John is awesome on Twitter; follow him for your own sake: @dohertyjf

Wiep.net

Wiep Knol is the artist of link building. He is a smart, creative link builder and the best link-building reviewer I know of, and he offers a comprehensive reading list for his readers. His style is unique, and he takes a fresh approach to link building and SEO every day. He is a great SEOer to follow on Twitter.

SEO Gadget Blog

Richard Baxter and his team run this fantastic blog. There is a lot to learn, and in particular this blog will make your SEO life a lot easier by providing so many great SEO tools. If you are an Excel fan, this is your place. Follow SEO Gadget on Twitter.

SEER Interactive Blog

SEER is home to many brilliant minds. The blokes at SEER Interactive are always on the hunt to bring you the best of the SEO world. They make the art of link building enjoyable by introducing new methods every single week. This is a great learning platform for both SEO and SEM. I would encourage you to follow SEER Interactive on Twitter.

Search Engine People

This site continuously produces great blog posts on SEO and other aspects of Internet marketing. Follow them on Twitter and stay updated on what’s happening in the SEO world.

Kaiserthesage

You’ve got to love those long posts from Jason. He gives his all to make you a better SEOer. He is innovative, he is honest, and he will teach you SEO and Internet marketing, and teach you well, for free. This young gentleman caught my eye more than a year ago with his highly technical yet simple approach to SEO and Internet marketing. Don’t forget to follow him on Twitter: @jasonacidre. He is awesome on Google+ too.

Pointblankseo

Jon Cooper took the SEO world by storm. He was little known even a year ago and is now one of the most influential bloggers in link building and SEO. He is young and energetic, he has uncovered just about every possible link-building opportunity, and he shares everything he’s got on his blog. He can cut your research work in half, because he is out there reading every bit of information and sharing it. Following him on Twitter is a must for every SEOer: @pointblankseo

It was not easy to make a list with so many great people blogging about SEO. Do you have a blog you would like to recommend? Please put it in the comments section.


Learning SEO? Here is a great starting point

April 4, 2012

I still remember what it was like when I started learning SEO. There are lots of resources out there for beginners; however, with SEO being such a dynamic industry, it’s not always easy to pick the best source of information to begin learning from. I have compiled a list here of what I believe will aid [...]


I am a Content Marketer

March 29, 2012

It’s been a while since I last wrote on my blog. I am not sure how other folks in my industry (Jason from Kaiserthesage or Jon from Pointblankseo) manage to produce such quality posts regularly, given the amount of work and reading we need to do to survive in this industry. There is a lot going on in [...]


This Keyword Competition Tool = less work+more bucks!

October 15, 2011

I was (almost) blown away when I took the trial of the keyword competition tool by Pasha Stewart and Darrin Demchuk of SerpIQ. Until then, an amazingly fast, easy-to-use tool like this had existed only in my imagination. I believe the tool will help thousands of professionals involved in internet marketing. This tool will make your [...]


Google Plus and SEO?

September 17, 2011

Guest post by Alex Petrovic

There has been a lot of hype, and some trembling, regarding Google Plus. Web surfers and geeks are raving that it will revolutionize internet searching, while businesses and SEO consulting firms fear that it will end the search engine ranking procedures that have been used for years. Google Plus is [...]


Bingo! Bing Knows Now That My Site is From Australia

July 9, 2011

Today I want to discuss geographic targeting in Bing and Yahoo. I have quite a few clients who are located in Australia, have a .com.au domain and are still not ranking on google.com.au pages from Australia. So I would assume that the search engines are not getting geographic signals from these websites. I do [...]


SEO Magic – The Impact of Google Supplemental Index on Rankings

June 4, 2011

So many times I have been asked about the significance of Google PageRank in modern SEO. Good question: many people think PageRank is no longer valued, as Google doesn’t update the metric regularly. As far as I know, Google is updating PageRank, but you only get the visual representation once or twice [...]


Google Panda Update: It’s only the beginning!

May 28, 2011

I have been working in the SEO industry for a while now, and the question I am asked most by other professionals in my industry is “What’s the future of SEO?” Not a very easy question if you are not a fortune teller. The question makes me think a lot about the way SEO is going and its [...]
