June 13, 2026 · 7 min read · Data

Cambridge is at least four cities. Ranking which one a user means.

A user types "things to do in Cambridge." Which one? The English university town, the Massachusetts university town, the Ontario industrial town, or the New Zealand pastoral town? All four are real cities, and all four show up in the same search index. If your travel app doesn't disambiguate, you've shipped a four-way coin flip.

Why this happens everywhere

Place-name collisions are not an edge case. Migration, colonialism, nostalgia, and unimaginative founders have created hundreds of thousands of name duplicates worldwide. A representative sample:

Latin American countries collide internally because every colonial-era municipality named itself after a saint. Spain has six Santiagos before you leave the peninsula.

iso_alpha2 is the load-bearing field

The /v1/cities endpoint returns iso_alpha2 (the two-letter ISO 3166-1 alpha-2 country code) on every row, no exceptions. Without it, you have nothing reliable to tell same-named rows apart:

GET /v1/cities/search?q=cambridge

[
  { "city_id": "cambridge-uk",  "name": "Cambridge", "iso_alpha2": "GB", "tier": 2, "population": 145000 },
  { "city_id": "cambridge-ma",  "name": "Cambridge", "iso_alpha2": "US", "tier": 3, "population": 118000 },
  { "city_id": "cambridge-on",  "name": "Cambridge", "iso_alpha2": "CA", "tier": 4, "population": 138000 },
  { "city_id": "cambridge-nz",  "name": "Cambridge", "iso_alpha2": "NZ", "tier": 4, "population": 20000 }
]

Now the disambiguation is mechanical: pass an iso_alpha2 filter, or rank by tier and population, or use surrounding context (the user's IP, their previous query, the language of their request).
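As a minimal sketch of the first two options, the snippet below filters the sample rows by iso_alpha2 and then ranks what is left by tier and population. The function name rank_candidates is hypothetical, and the code assumes a lower tier number means a more-touristed city, which the sample response suggests but does not state.

```python
from typing import Optional

# Rows copied from the sample response above.
CANDIDATES = [
    {"city_id": "cambridge-uk", "name": "Cambridge", "iso_alpha2": "GB", "tier": 2, "population": 145000},
    {"city_id": "cambridge-ma", "name": "Cambridge", "iso_alpha2": "US", "tier": 3, "population": 118000},
    {"city_id": "cambridge-on", "name": "Cambridge", "iso_alpha2": "CA", "tier": 4, "population": 138000},
    {"city_id": "cambridge-nz", "name": "Cambridge", "iso_alpha2": "NZ", "tier": 4, "population": 20000},
]


def rank_candidates(rows: list, iso_alpha2: Optional[str] = None) -> list:
    """Optionally filter by country code, then sort best guess first.

    Assumption: a lower tier number means a more-touristed city.
    Ties on tier are broken by larger population.
    """
    if iso_alpha2 is not None:
        rows = [r for r in rows if r["iso_alpha2"] == iso_alpha2.upper()]
    return sorted(rows, key=lambda r: (r["tier"], -r["population"]))


if __name__ == "__main__":
    for row in rank_candidates(CANDIDATES):
        print(row["city_id"], row["iso_alpha2"], row["tier"], row["population"])
```

Calling rank_candidates(CANDIDATES, "nz") narrows the list to the single New Zealand row, while calling it with no country code keeps all four rows in ranked order.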

Disambiguation strategies, ranked by how well they work

  1. Explicit user input. The cleanest path: show the four candidates and let the user pick. Travel apps that do this look more thoughtful, not less. "Did you mean Cambridge, UK or Cambridge, MA?" is the kind of polish users notice.
  2. Context from the conversation. If the user just searched "Heathrow flights to Cambridge," it's the UK one. If they said "MIT campus tour," it's the MA one. Use the LLM's context window for this — don't try to do it deterministically.
  3. Geographic priors from the user's locale. A user with a US locale searching "Athens" probably means Greece for travel queries (the touristic one) but Georgia for college-football queries. The signal is ambiguous on its own, so don't lean too hard on it.
  4. Population and tourism tier. When all else fails, default to the largest or most-touristed variant. "Lima" probably means Peru, not Ohio. "London" probably means UK, not Ontario. Most of the time this is right.
  5. Never: a single canonical row. Don't pick one and call it the "real" Cambridge. The Ontario Cambridge has 138,000 residents who disagree.
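The priority order above can be sketched as a sort key. Everything here beyond the row fields is an assumption made for illustration: the Signals class, its field names, and the idea that the conversation context (for example, an LLM reading the chat) has already been reduced to a country code. The function returns every candidate, best guess first, so no row is ever treated as canonical.

```python
from dataclasses import dataclass
from typing import Optional

# Rows copied from the sample response earlier in the post.
CANDIDATES = [
    {"city_id": "cambridge-uk", "name": "Cambridge", "iso_alpha2": "GB", "tier": 2, "population": 145000},
    {"city_id": "cambridge-ma", "name": "Cambridge", "iso_alpha2": "US", "tier": 3, "population": 118000},
    {"city_id": "cambridge-on", "name": "Cambridge", "iso_alpha2": "CA", "tier": 4, "population": 138000},
    {"city_id": "cambridge-nz", "name": "Cambridge", "iso_alpha2": "NZ", "tier": 4, "population": 20000},
]


@dataclass
class Signals:
    """Hypothetical container for the disambiguation signals."""
    chosen_city_id: Optional[str] = None   # strategy 1: the user picked a row
    context_country: Optional[str] = None  # strategy 2: country inferred from the conversation
    locale_country: Optional[str] = None   # strategy 3: country from the user's locale


def disambiguate(rows: list, signals: Signals) -> list:
    """Return all rows, best guess first. No row is dropped."""
    def sort_key(row: dict) -> tuple:
        # False sorts before True, so a matching row moves to the front.
        return (
            row["city_id"] != signals.chosen_city_id,
            row["iso_alpha2"] != signals.context_country,
            row["iso_alpha2"] != signals.locale_country,
            row["tier"],          # strategy 4; assumes lower tier = more touristed
            -row["population"],
        )
    return sorted(rows, key=sort_key)


if __name__ == "__main__":
    guess = disambiguate(CANDIDATES, Signals(context_country="GB", locale_country="US"))
    print([row["city_id"] for row in guess])
```

One design caveat: a strict lexicographic key lets a locale match always beat tier and population, which leans on locale harder than the list recommends. A weighted score would soften that, at the cost of having to pick weights.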

The hidden collision: subnational duplicates

The above collisions cross country borders. A more insidious pattern is intra-country duplicates:

iso_alpha2 doesn't help here. You need a state/province code plus the city name. The /v1/cities response carries admin1_code (subnational, ISO 3166-2-style) for exactly this reason.
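A minimal sketch of the composite lookup follows. The two Springfield rows are illustrative and were written for this example, not taken from the API; the city_id values and the "US-IL" form of admin1_code are assumptions, since the post only says the field is ISO 3166-2-style.

```python
# Illustrative rows: two US cities that share a name and a country code.
ROWS = [
    {"city_id": "springfield-il", "name": "Springfield", "iso_alpha2": "US", "admin1_code": "US-IL"},
    {"city_id": "springfield-mo", "name": "Springfield", "iso_alpha2": "US", "admin1_code": "US-MO"},
]


def city_key(row: dict) -> tuple:
    """Composite key: country, subnational code, case-insensitive name."""
    return (row["iso_alpha2"], row["admin1_code"], row["name"].casefold())


def find_city(rows: list, name: str, iso_alpha2: str, admin1_code: str) -> list:
    """Return the rows whose composite key matches exactly."""
    wanted = (iso_alpha2, admin1_code, name.casefold())
    return [row for row in rows if city_key(row) == wanted]


if __name__ == "__main__":
    # Country alone still leaves two candidates.
    print(len([r for r in ROWS if r["iso_alpha2"] == "US"]))
    # Adding the subnational code narrows it to one.
    print(find_city(ROWS, "springfield", "US", "US-MO"))
```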

Why this matters for AI

LLMs are particularly bad at place disambiguation because they pattern-match. A model asked "best restaurants in Cambridge" will lean toward whichever Cambridge appeared most often in its training corpus, usually the UK one or the MA one, and be confidently wrong for the 30% of users who meant a different one. Grounding the response in a structured city row with an explicit iso_alpha2 is the cheapest way to make the answer reliable.

Cost of doing this wrong: a user who asked about Cambridge, NZ gets a writeup of MIT and Harvard. They don't bother to write back. They just close the tab.
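One way to picture the grounding step is to resolve the city first and put the structured row into the prompt before the model sees the question. The helper names and the prompt wording below are hypothetical, and the sketch stops short of calling any model.

```python
def grounding_line(row: dict) -> str:
    """Render one resolved city row as an unambiguous context line."""
    parts = [
        f"city_id={row['city_id']}",
        f"name={row['name']}",
        f"iso_alpha2={row['iso_alpha2']}",
    ]
    # admin1_code is included only when the row carries it.
    if row.get("admin1_code"):
        parts.append(f"admin1_code={row['admin1_code']}")
    return "Resolved city: " + ", ".join(parts)


def build_prompt(question: str, row: dict) -> str:
    """Prefix the user's question with the resolved city row."""
    return (
        f"{grounding_line(row)}\n"
        "Answer only about the resolved city above.\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    row = {"city_id": "cambridge-nz", "name": "Cambridge", "iso_alpha2": "NZ"}
    print(build_prompt("best restaurants in Cambridge", row))
```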
