Plugin to scrape a third-party website’s data?


Hi, I run a blog specialising in mixed martial arts content.

In MMA, there are several websites that document fighter records, primarily Sherdog and Tapology.

I’m wondering if I can somehow use the data on Sherdog or Tapology to have live fighter records on my own website that update whenever Sherdog/Tapology update. I’d like these records to be on my tag pages, so when I tag Conor McGregor, for example, his tag page shows his record and the posts he’s been tagged in.

Is this possible with plugins alone? I’ve noticed apps and similar services showing live fighter data, but I wouldn’t know where to start.

I’m grateful for any help!

Thank you

Edit: I’m not looking to steal data, I will first get consent from the website owner. I’d just like to know the means of how to create the feature on my site

7 Comments
  1. What you’re asking seems a bit much to ask of a plugin.

    I would use Beautiful Soup in a Python script to do the scraping, then append the results to a CSV file in a folder that syncs to your site via FTP (e.g. FileZilla).
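    As a rough illustration of the scrape-to-CSV idea, here is a minimal sketch. It swaps in Python's standard-library `html.parser` for Beautiful Soup so that it runs without extra installs, and it parses an inline sample instead of fetching a live page. The table markup, the `TableRows` class and the `html_table_to_csv` helper are all assumptions made for the example; the real pages will be structured differently.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup and placeholder data: not taken from any real site.
SAMPLE_HTML = """
<table class="fights">
  <tr><th>Result</th><th>Opponent</th><th>Event</th></tr>
  <tr><td>Win</td><td>Fighter A</td><td>Event 1</td></tr>
  <tr><td>Loss</td><td>Fighter B</td><td>Event 2</td></tr>
</table>
"""


class TableRows(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


csv_text = html_table_to_csv(SAMPLE_HTML)
print(csv_text)
```

    In a real script the CSV text would be written to a file in the synced folder (opened in append mode if you want to accumulate rows), and the HTML would come from an HTTP request made with the site owner's permission.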

  2. What you’re effectively asking is to ‘steal’ their content – best to ask those sites for API or RSS access and go from there.

  3. There is a paid plugin that’s called Octolooks Scrapes. I’ve used it to scrape old and repetitive content and use it in the new version. It kinda works OK.

  4. > Edit: I’m not looking to steal data, I will first get consent from the website owner. I’d just like to know the means of how to create the feature on my site

    If you have their permission, just ask them to send you the data.

  5. Sounds like a copyright issue waiting to happen. Even if it’s just Google’s bot detecting duplicate data.
    But if you can find some kind of RSS or JSON feed with the data you want then it shouldn’t be too hard to set up with a plugin that supports such feeds.
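    To give a sense of why a JSON feed is the easy case, here is a small sketch of turning one feed entry into a display string. The payload shape, the field names (`fighter`, `wins`, `losses`, `draws`) and the `format_record` helper are assumptions for illustration, and the values are placeholders; an actual feed would define its own structure.

```python
import json

# Hypothetical feed payload with made-up placeholder values.
SAMPLE_FEED = '{"fighter": "Example Fighter", "wins": 10, "losses": 2, "draws": 0}'


def format_record(feed_text):
    """Turn one JSON feed entry into a 'Name: W-L-D' line."""
    data = json.loads(feed_text)
    return f'{data["fighter"]}: {data["wins"]}-{data["losses"]}-{data["draws"]}'


record_line = format_record(SAMPLE_FEED)
print(record_line)
```

    A feed-aware plugin does essentially this for you: it reads structured fields rather than picking values out of page markup.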

  6. There definitely won’t be a plugin that can do this. I’ve built scrapers in PHP; it was a lot more common before APIs covered nearly everything. If you research APIs for UFC you might find something, but in a quick search I just did I couldn’t see that the UFC has any official API for its statistics. ESPN does, for many sports; they’re one of the leaders in sports APIs, naturally.

    Whenever possible you want an API rather than a scrape, but if you do need to scrape, the first thing you need is a fairly skilled programmer, because there is zero hope of doing that without custom programming. This is also completely outside of WordPress. The only connection is that you want the final data in WP, but a scraper is a web app or script that runs from a server: it hits the target server, assembles the raw content, and then parses through it to get the data you need. It’s a pretty challenging programming project because, for instance, you have to scrape data in batches, otherwise your scraper will get blocked or time out, and you have to store the raw data first and parse through it later.

    I’d suggest maybe stepping back from the problem and thinking: hey, if this data is useful and nobody has built an API yet, maybe there is a business opportunity in curating the content and then building an API to serve it to sites like your own. A scratch-your-own-itch project. Specialized APIs can be lucrative; for example, on one project I worked on, my client was paying 700 USD a month to access esports statistics.
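
    The two-stage approach described in that last comment (fetch in batches and store the raw pages, then parse later) could be sketched roughly as follows. This is only an outline under stated assumptions: `fetch_in_batches`, `parse_stored_pages`, the file naming scheme, the batch size and the pause length are all hypothetical, and the `fetch` and `parse` callables are passed in so the example can run against a fake fetcher instead of a real server.

```python
import re
import tempfile
import time
from pathlib import Path


def fetch_in_batches(urls, fetch, raw_dir, batch_size=5, pause_seconds=2.0):
    """Stage 1: download pages in small batches and save the raw HTML untouched."""
    raw_dir = Path(raw_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)
    for start in range(0, len(urls), batch_size):
        for offset, url in enumerate(urls[start:start + batch_size]):
            path = raw_dir / f"page_{start + offset:05d}.html"
            if path.exists():
                continue  # already stored on an earlier run, so skip the request
            path.write_text(fetch(url), encoding="utf-8")
        if start + batch_size < len(urls):
            time.sleep(pause_seconds)  # pause between batches to go easy on the server


def parse_stored_pages(raw_dir, parse):
    """Stage 2: parse the saved files later, without touching the target server."""
    files = sorted(Path(raw_dir).glob("*.html"))
    return [parse(f.read_text(encoding="utf-8")) for f in files]


# Stand-ins for a real HTTP client and a real parser (assumptions for the demo).
def fake_fetch(url):
    return f"<title>{url}</title>"


def title_of(html):
    match = re.search(r"<title>(.*?)</title>", html)
    return match.group(1) if match else None


with tempfile.TemporaryDirectory() as tmp:
    fetch_in_batches(["a", "b", "c"], fake_fetch, tmp, batch_size=2, pause_seconds=0)
    titles = parse_stored_pages(tmp, title_of)

print(titles)
```

    Keeping the raw pages on disk means a parsing bug can be fixed and the parse re-run without requesting anything from the source site again.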

 
