The Price Is Right! Airbnb edition. Part I: Scraping Airbnb

August 7, 2016

Getting the most bang for your vacation dollar

Ever wondered whether you were getting a bargain for that awesome Airbnb you chanced upon? Just how do Airbnb hosts decide on their pricing anyway? In this series of posts, I will try to tackle those questions by analysing data from 900 listings on the Airbnb website. This project is actually part of a challenge given to candidates applying to the Brooklyn Data Science talent network for data scientists. If you are looking to apply there as well, feel free to check out my code for ideas.

I have just scraped the data I needed from Airbnb’s website. BeautifulSoup made it really easy for me to locate the relevant fields and extract them, and its documentation concisely covers pretty much everything you need to know to perform a simple scraping task. There are more sophisticated tools out there, like Scrapy, which supports XPath selectors, but I will save those for a more complex scraping task in the future.
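To give a flavour of what locating and extracting fields with BeautifulSoup looks like, here is a minimal sketch. It is not the scraper from the repository: the HTML snippet, the tag and class names, and the parse_listing helper are all made up for illustration, and the real Airbnb pages are structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real listing pages
# use different tags and class names.
SAMPLE_HTML = """
<div class="listing">
  <span class="price">$120</span>
  <span class="rating" data-stars="4.5"></span>
  <ul class="amenities"><li>Wifi</li><li>Kitchen</li></ul>
</div>
"""


def parse_listing(html):
    """Extract price, rating and amenities from one (made-up) listing page."""
    soup = BeautifulSoup(html, "html.parser")

    # find() returns the first matching tag, or None if nothing matches.
    price_tag = soup.find("span", class_="price")
    rating_tag = soup.find("span", class_="rating")
    amenities_tag = soup.find("ul", class_="amenities")

    amenities = []
    if amenities_tag is not None:
        # find_all() returns every matching descendant tag.
        amenities = [li.get_text(strip=True) for li in amenities_tag.find_all("li")]

    return {
        "price": float(price_tag.get_text(strip=True).lstrip("$")) if price_tag else None,
        "rating": float(rating_tag["data-stars"]) if rating_tag else None,
        "amenities": amenities,
    }


listing = parse_listing(SAMPLE_HTML)
print(listing)
```

Returning None for a missing field, rather than raising an error, keeps one incomplete listing from stopping a long scraping run.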

Another nugget of knowledge I picked up from the experience was how to work with rate limits imposed by the website. As their name suggests, rate limits prevent you from making too many requests to the server within a fixed period of time. They act as a first line of defense against DDoS attacks, where the server is deluged with requests and either crashes or becomes unresponsive to other users. Staying below the rate limit was simple, at least in my case. Python's time module has a sleep function that pauses your code for a given number of seconds. After playing around with several delay lengths, I found that 10 seconds per listing was the smallest delay I could use without having my requests denied for going over the limit.
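The delay-between-requests idea can be sketched roughly as follows. This is an illustration and not the code from the repository: the fetch_all helper, its parameters, and the placeholder listing names are assumptions. Only time.sleep and the 10-second figure come from the text above. The sleep function is passed in as an argument so the demo at the bottom can record the pauses instead of actually waiting.

```python
import time


def fetch_all(urls, fetch, delay_seconds=10.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests.

    fetch is any callable that downloads one page (hypothetical here).
    sleep defaults to time.sleep; it is injectable for testing.
    """
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        results.append(fetch(url))
    return results


# Demo with a fake fetch function and a recorded (not real) sleep.
pauses = []
pages = fetch_all(
    ["listing-1", "listing-2", "listing-3"],
    fetch=lambda url: "<html>" + url + "</html>",
    sleep=pauses.append,
)
print(len(pages), "pages fetched with pauses", pauses)
```

In a real run you would leave sleep at its default and pass a fetch function that performs the HTTP request. At 10 seconds per listing, 900 listings take about two and a half hours.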

The variables I’ve chosen to extract from each listing include the geospatial location, ratings by previous guests, and the presence or absence of various amenities. These are the variables that would conceivably affect the listing price. The idea here is to learn from our analysis which of these variables are the most predictive of the listing price. There are good applications for that insight: hosts can better price their listing by learning what the norms are, and guests can assess whether a given listing is fairly priced. That analysis will be performed in the next step, so keep your eyes peeled for more!

Get the scraper from my Airbnb-Price repository!

About me

I am a Fellow at Insight Data Science. I was trained in computational biophysics and used to simulate and watch proteins wiggling on my screen before shifting my focus towards machine learning and data analytics. My professional interests also include learning to write good code and marrying data science solutions to real-world needs.