How epidemiological models are a lot like Zillow

Years ago, when my husband and I were looking at selling our house, I became absolutely obsessed with the mechanics of the real estate site Zillow.

I had listened to several real estate agents complain about the false expectations Zillow gave their clients about the valuation of houses and how it was really starting to make their jobs a living hell. What fascinated me the most was that their complaints did not all run in one direction – i.e. some clients believed houses in a certain neighborhood would be worth a lot more than they were, and some a lot less. I started to wonder how Zillow arrived at its “zestimates” – its proprietary valuations of individual properties – and how they could deviate so much from what a professional (human) valuation would provide.

I ended up reading an article about how to manipulate Zillow’s zestimate algorithm, which turned out to be a lot easier than I would have guessed. Zillow uses what I like to call an “HGTV algorithm,” meaning that it gives considerable weight to certain buzzwords that HGTV’s audience would care about: stately, luxurious, trendy, remodeled, open floor plan, professional kitchen, garden, library, granite, stainless steel, Electrolux, Sub-Zero, Wolf, fireplace, hardwood, built-ins, natural light, masonry, cabinetry, security system, etc.

What the algorithm does not do – or at least does not do well enough – is get into the things that actually affect the real valuation of a house. The school district. Overlapping taxing districts. Commute time to major employers. Nearby parks and shopping. Whether other houses in the neighborhood have been maintained.

This means that you can easily min-max (to borrow a term from Dungeons and Dragons enthusiasts) the Zillow algorithm if you so choose. Reading through the listings that real estate agents write for houses – which are barely even grammatically correct, let alone optimized for the site all their clients and potential buyers would see – I was surprised that more of them had not studied how to manipulate the HGTV algorithm. It’s buzzword bingo, y’all, not rocket science.

I tried this before our own listing, and within a month the zestimate on our house had increased a whopping 20%. (Note: All of the details I used in the description of our house were factually true of the house. I was not doing anything illegal.) I ended up fighting with our realtor when she tried to change what I had written in the description. After I explained to her what I had done and why I had done it, she asked me for a list of buzzwords that worked. To everyone in that neighborhood who sold a house in the last three years: you’re welcome.

But one of the things that made this experiment interesting is what else Zillow does with their algorithm: they edit the history of their projected valuations to eliminate any data that users of the site might see that would give them a clue their valuations are mostly bogus. After the zestimate on our house was inflated, all of the previous valuations for the house were brought into line with the new one. They revised their past opinions on the property. This means no one could see what the cause for the increase was.

It’s remarkable how much the simplistic algorithm Zillow uses for real estate is like the simplistic algorithms that academics use for modeling the spread of disease.

For one thing, they are not focused on the real forces that impact the spread of disease – things like network effects within a specific population (who has megachurches, public transportation, etc.) and network effects between populations (how many people in New York City have second homes in Florida, for example). In fact, as far as I can tell, many of these models do not take simple things like population density into consideration at all. Under such a model, you have the same chance of being infected if you live in Lincoln, Nebraska, as if you live in downtown Miami. It’s kind of like prioritizing the word granite over a Bellaire zip code. And it has the same effect of inflating values.
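To make the population density point concrete, here is a minimal sketch of a textbook SIR (susceptible, infected, recovered) model in Python. It is not any particular research group's model, and every number in it is an illustrative assumption: the city sizes, the transmission rate `beta`, the recovery rate `gamma`, and especially the density multipliers. It shows that when everyone mixes uniformly and `beta` is fixed, the share of people infected comes out the same regardless of how big the city is, and that the answer only changes once something like density is allowed to move the contact rate.

```python
def sir_attack_rate(population, beta, gamma, days=365, dt=0.1,
                    initial_infected_fraction=1e-4):
    """Share of the population ever infected in a basic SIR model.

    Uses simple Euler time steps and assumes uniform mixing:
    every person is equally likely to meet every other person.
    """
    s = population * (1 - initial_infected_fraction)  # susceptible
    i = population * initial_infected_fraction        # infected
    r = 0.0                                           # recovered
    for _ in range(int(days / dt)):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
    return (i + r) / population


# Hypothetical inputs, chosen only for illustration.
BETA, GAMMA = 0.3, 0.1

# Same beta in a small city and a large one: city size drops out.
small_city = sir_attack_rate(300_000, BETA, GAMMA)
large_city = sir_attack_rate(3_000_000, BETA, GAMMA)

# Assumed density multipliers on the contact rate (made-up values).
sparse_city = sir_attack_rate(300_000, BETA * 0.6, GAMMA)
dense_city = sir_attack_rate(300_000, BETA * 1.5, GAMMA)

print(f"small city, fixed beta: {small_city:.3f}")
print(f"large city, fixed beta: {large_city:.3f}")
print(f"sparse city, scaled beta: {sparse_city:.3f}")
print(f"dense city, scaled beta: {dense_city:.3f}")
```

The first two results match because the model divides by `population` and so only tracks fractions. The last two differ only because the sketch assumes denser places have more contacts per person, which is exactly the kind of input a model has to be told about.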

But beyond that, you have a lot of academics who have been just flat-out wrong about the initial conditions of the spread of disease. And what are they doing about it? Deleting tweets. Thinking that new assumptions improve the accuracy of their models.

Quietly revising the record always works on low-information crowds, however, much as it does for Zillow. I imagine most home buyers think Zillow is superior to real estate agents precisely because it is an “impartial” software model and not a salesperson. But to someone who writes algorithms and develops financial software, that assumption is hilarious. Algorithms are not impartial or intelligent in themselves. They are only as good as the person who created them. And that can be a tremendous source of risk and bad decision-making.
