May 27, 2026 · 9 min read
Six things to know before building a travel-data API
A retrospective. If I were starting again tomorrow, these are the lessons I'd want pinned to the wall on day one. Most of them I learned the expensive way.
1. Foursquare Open Source Places is a good starting point, but it's not a travel feed
Foursquare's Open Source Places dataset is the most generous gift the POI world has ever received: roughly a hundred million points, Apache 2.0, with attribution. Start there. You'd be foolish not to.
But it's a places dataset, not a travel dataset. It's optimized for "what's near me right now": coffee shops, ATMs, dry cleaners. A traveler asking for "things to do in Hampi" needs a different cut of the world. Conflating the two is the single most common mistake I see teams make. Plan to build the travel layer on top yourself rather than assuming it's already there.
2. You'll spend most of your time on data quality, not features
The product roadmap looks like a list of endpoints. The actual work is reconciling the same monument appearing under three spellings, in two languages, with conflicting coordinates from different sources.
Budget for it. You'd expect an 80/20 split between "shipping new endpoints" and "fixing data weirdness"; in year one it's closer to 20/80. Hire someone who likes spreadsheets. Don't pretend ML alone solves this: you need humans in the loop, every week, forever.
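To make the reconciliation problem concrete, here's a minimal Python sketch of a first pass: flag pairs of records whose names match after normalization and whose coordinates sit within a few hundred meters, then hand them to a human. The `Place` type, the 500 m radius, and the 0.8 similarity cutoff are assumptions for illustration, and the coordinates are approximate. It only catches spelling and diacritic variants; names written in two different scripts need an alias table, which this sketch doesn't attempt.

```python
import math
import unicodedata
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations


@dataclass(frozen=True)
class Place:
    source: str  # which upstream feed the record came from (hypothetical field)
    name: str
    lat: float
    lon: float


def normalize(name: str) -> str:
    """Lowercase and strip diacritics and punctuation, so 'Virūpākṣa' -> 'virupaksa'."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_marks.lower())
    return " ".join(cleaned.split())


def haversine_m(a: Place, b: Place) -> float:
    """Great-circle distance between two places, in meters."""
    earth_radius_m = 6_371_000.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    d_phi = phi2 - phi1
    d_lambda = math.radians(b.lon - a.lon)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_m * math.asin(min(1.0, math.sqrt(h)))


def candidate_duplicates(places, max_dist_m=500.0, min_ratio=0.8):
    """Yield (a, b, similarity) for pairs that look like the same POI.

    Nothing is merged here: the output is a review queue for a human.
    """
    for a, b in combinations(places, 2):
        if haversine_m(a, b) > max_dist_m:
            continue  # the same name far apart is usually a different monument
        ratio = SequenceMatcher(None, normalize(a.name), normalize(b.name)).ratio()
        if ratio >= min_ratio:
            yield a, b, round(ratio, 2)


# Illustrative records with approximate coordinates.
places = [
    Place("feed_a", "Virupaksha Temple", 15.3350, 76.4600),  # Hampi
    Place("feed_b", "Virūpākṣa Temple", 15.3353, 76.4597),   # Hampi, other spelling
    Place("feed_c", "Virupaksha Temple", 15.9485, 75.8166),  # Pattadakal, a different temple
]
for a, b, score in candidate_duplicates(places):
    print(a.source, b.source, score)
```

Comparing every pair is quadratic, so at real scale you'd likely bucket records by a coarse grid cell or geohash first and only compare records in the same or neighboring buckets.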
3. License clarity matters more than coverage
Coverage is what marketing pages compare. License is what buyers actually ask about on the second sales call. A B2B customer with a legal team won't ship your data into their product if your terms are fuzzy. A customer without a legal team will ship it and then panic in 18 months when they raise a Series A.
Pick clean source licenses, surface license metadata per row, and write a one-page license guide that says exactly what each row can and can't be used for. This pays for itself the first time a Fortune 500 procurement team reads it.
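Per-row license metadata can be as simple as a license identifier and a credit line on every record, backed by a lookup table that plays the role of the one-page guide. The sketch below assumes SPDX identifiers; `LICENSE_GUIDE`, `PlaceRow`, and the field names are hypothetical, and the permissions in the table are simplified summaries you'd want to check against each license's actual text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    spdx_id: str
    commercial_use: bool
    attribution_required: bool
    share_alike: bool


# Hypothetical one-page guide expressed as data, keyed by SPDX identifier.
LICENSE_GUIDE = {
    "Apache-2.0": LicenseTerms("Apache-2.0", commercial_use=True,
                               attribution_required=True, share_alike=False),
    "CC-BY-4.0": LicenseTerms("CC-BY-4.0", commercial_use=True,
                              attribution_required=True, share_alike=False),
    "CC-BY-NC-4.0": LicenseTerms("CC-BY-NC-4.0", commercial_use=False,
                                 attribution_required=True, share_alike=False),
}


@dataclass(frozen=True)
class PlaceRow:
    place_id: str
    name: str
    license_id: str   # key into LICENSE_GUIDE; every row carries its own
    attribution: str  # credit line the upstream source asks for


def commercially_usable(rows):
    """Keep only rows whose license allows commercial use."""
    return [r for r in rows if LICENSE_GUIDE[r.license_id].commercial_use]


def attribution_lines(rows):
    """Unique credit lines, in first-seen order, for the rows actually shipped."""
    lines = []
    for r in rows:
        terms = LICENSE_GUIDE[r.license_id]
        if terms.attribution_required and r.attribution not in lines:
            lines.append(r.attribution)
    return lines


# Illustrative rows; "Source A" and "Source B" are placeholder credit lines.
rows = [
    PlaceRow("p1", "Virupaksha Temple", "Apache-2.0", "Source A"),
    PlaceRow("p2", "Hampi Bazaar", "CC-BY-NC-4.0", "Source B"),
    PlaceRow("p3", "Vittala Temple", "Apache-2.0", "Source A"),
]
shipped = commercially_usable(rows)
print([r.place_id for r in shipped], attribution_lines(shipped))
```

Because the license travels with the row, a customer can filter to what they're allowed to ship and generate the attribution text from the same data, rather than cross-referencing a PDF.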
4. Ratings rot fast — don't pretend you have live ones
The temptation to ship a "rating" field is enormous. The temptation to scrape one is also enormous. Resist both unless you're prepared to be a real-time data company.
A rating from 2024 is more dangerous than no rating at all, because customers will trust it. We chose to ship structured "tier" classifications based on heritage status, admin importance, and inclusion in tourism circuits — facts that don't rot in a quarter — and to leave live ratings to the partners who specialize in them. Honest is faster than impressive.
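A tier classification can be a small, deterministic function over slow-changing facts. Here's a sketch; the `Heritage` levels, the `admin_seat` and `circuit_count` signals, and the scoring thresholds are hypothetical stand-ins for whatever rules you settle on.

```python
from dataclasses import dataclass
from enum import Enum


class Heritage(Enum):
    NONE = 0
    STATE_PROTECTED = 1
    NATIONAL_MONUMENT = 2
    WORLD_HERITAGE = 3


@dataclass(frozen=True)
class Attraction:
    name: str
    heritage: Heritage
    admin_seat: bool    # hypothetical proxy for admin importance, e.g. a district HQ
    circuit_count: int  # how many curated tourism circuits include it


def tier(a: Attraction) -> int:
    """Return 1 (must-see) to 3 (local interest). Thresholds are illustrative."""
    score = (a.heritage.value
             + (1 if a.admin_seat else 0)
             + min(a.circuit_count, 2))  # cap so circuits can't outweigh heritage
    if score >= 4:
        return 1
    if score >= 2:
        return 2
    return 3


examples = [
    Attraction("Virupaksha Temple", Heritage.WORLD_HERITAGE, False, 2),
    Attraction("Town museum", Heritage.STATE_PROTECTED, True, 0),
    Attraction("Neighborhood shrine", Heritage.NONE, False, 0),
]
for a in examples:
    print(a.name, tier(a))
```

None of the inputs change quarter to quarter, so unlike a rating, the output doesn't need a freshness timestamp to stay honest.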
5. Admin hierarchy varies wildly by country
The American assumption (country, state, county, city) breaks everywhere else. The UK has counties and unitary authorities and boroughs that overlap. India has states, then districts, then sub-districts, then cities and villages, with union territories sitting alongside the states at the top level. Indonesia has provinces and regencies. France has régions and départements and communes.
Don't pick a global schema and force every country into it. Pick a flexible parent-child shape, and let each country use as many or as few levels as it actually has. Travelers care about the levels that matter locally, not your tidy global tree.
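In code, the flexible shape can be one table of units with a free-text `level` and a `parent_id`. Here's a minimal sketch; the `AdminUnit` type and the IDs below the ISO 3166-2 state codes are hypothetical, and the rows are illustrative. A village in Karnataka sits five levels deep while a district of Delhi, a union territory, sits three levels deep, and neither needs empty placeholder levels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AdminUnit:
    id: str
    name: str
    level: str                # country-specific label: "state", "taluk", "commune", ...
    parent_id: Optional[str]  # None only for the country itself


def ancestry(units: dict[str, AdminUnit], unit_id: str) -> list[AdminUnit]:
    """Walk parent links from a unit up to its country, nearest unit first."""
    chain: list[AdminUnit] = []
    seen: set[str] = set()
    current = units.get(unit_id)
    while current is not None and current.id not in seen:
        seen.add(current.id)
        chain.append(current)
        current = units.get(current.parent_id) if current.parent_id else None
    return chain


def breadcrumb(units: dict[str, AdminUnit], unit_id: str) -> str:
    """Render the chain top-down, e.g. for display in an API response."""
    return " > ".join(u.name for u in reversed(ancestry(units, unit_id)))


# Illustrative rows: as many or as few levels as each place actually has.
UNITS = {u.id: u for u in [
    AdminUnit("IN", "India", "country", None),
    AdminUnit("IN-KA", "Karnataka", "state", "IN"),
    AdminUnit("IN-KA-VJN", "Vijayanagara", "district", "IN-KA"),
    AdminUnit("IN-KA-VJN-HSP", "Hosapete", "taluk", "IN-KA-VJN"),
    AdminUnit("IN-KA-VJN-HSP-HMP", "Hampi", "village", "IN-KA-VJN-HSP"),
    AdminUnit("IN-DL", "Delhi", "union territory", "IN"),
    AdminUnit("IN-DL-ND", "New Delhi", "district", "IN-DL"),
]}

print(breadcrumb(UNITS, "IN-KA-VJN-HSP-HMP"))
print(breadcrumb(UNITS, "IN-DL-ND"))
```

The `seen` set guards against a parent cycle, which is exactly the kind of thing that turns up in messy imported data.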
6. Chatbots are likely your biggest customer segment
I started this project assuming the customer was a travel-tech CTO rebuilding a recommendation system. The actual customer is a small team building a travel chatbot on top of an LLM and discovering — in week three — that the LLM hallucinates Indian destinations.
They don't want a 200-page spec. They want a free tier, a Postman collection, and an endpoint that returns clean JSON in <200ms. If you build for the chatbot use case first, every other use case becomes a subset. If you build for the legacy travel-tech buyer first, you'll spend six months in procurement and never ship.
So what do you do with this list?
None of it is novel. All of it is the thing I wish someone had told me. Pin it to the wall. Read it again in month four. The shape of the work matters more than the shape of the schema.