What we are testing, and why
We are building routing that promises more than the shortest path: a calm walk, a safe walk home. Before anyone outside the company gets to try it, we test those promises ourselves — on foot, in Zurich. This page explains what we need to learn, why we go about it the way we do, and how the routing actually works today — including the parts where it does not yet keep its word.
How our routing actually works today
Under the map sits a street network for Zurich — every walkable segment the city knows about. Our data pipeline attaches five signals to each segment: accidents, crime, street lighting, tree coverage, and presence (how many people tend to be around). The time-dependent ones come in slices — weekday or weekend, morning, afternoon, night — because a street that is lively at noon can be deserted at midnight, and the routing should know the difference.
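One way to picture a segment and its time-sliced signals is sketched below. This is an illustration under stated assumptions, not the pipeline's schema. `Segment`, `time_slice`, the field names, and the 0-to-1 scale are all hypothetical. Only the 18:00 start of “night” comes from our data; the 06:00 and 12:00 boundaries are placeholders. Which signals are static and which are time-dependent is also assumed here.

```python
from dataclasses import dataclass, field
from datetime import datetime


def time_slice(when: datetime) -> str:
    """Map a timestamp to a slice key such as 'weekday/night'.

    Only the 18:00 start of "night" comes from our data; the 06:00 and
    12:00 boundaries are hypothetical placeholders.
    """
    day = "weekend" if when.weekday() >= 5 else "weekday"  # Sat=5, Sun=6
    if 6 <= when.hour < 12:
        part = "morning"
    elif 12 <= when.hour < 18:
        part = "afternoon"
    else:
        part = "night"  # 18:00 through 05:59
    return f"{day}/{part}"


@dataclass
class Segment:
    """A walkable segment with its five signals (hypothetical schema).

    Values are assumed to run from 0.0 to 1.0, higher meaning "more of it":
    more accidents, more light, more people, and so on.
    """
    segment_id: str
    length_m: float
    # Assumed static: these do not change with the time of day.
    lighting: float
    tree_cover: float
    # Assumed time-dependent: one value per slice key.
    accidents: dict[str, float] = field(default_factory=dict)
    crime: dict[str, float] = field(default_factory=dict)
    presence: dict[str, float] = field(default_factory=dict)

    def signal(self, name: str, when: datetime) -> float:
        """Value of one signal for the slice that `when` falls into."""
        value = getattr(self, name)
        if isinstance(value, dict):
            # A missing slice falls back to 0.0 here; a real pipeline needs a better rule.
            return value.get(time_slice(when), 0.0)
        return value


# A street that is lively at noon and deserted at midnight.
street = Segment(
    "seg-001", 120.0, lighting=0.9, tree_cover=0.4,
    presence={"weekday/afternoon": 0.8, "weekday/night": 0.1},
)
print(street.signal("presence", datetime(2025, 6, 10, 12, 30)))  # a Tuesday; prints 0.8
print(street.signal("presence", datetime(2025, 6, 10, 23, 45)))  # prints 0.1
```

In this shape, “lively at noon, deserted at midnight” is simply two entries in `presence` under different slice keys.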
When you ask for a route, the engine computes three options for the same start and destination. It does this by making street segments more or less “expensive” to walk depending on their signals. Fast ignores the signals entirely; Calm and Safe trade a bit of extra distance for streets whose signals look better. The route you see is simply the cheapest path under that mode’s idea of expensive. There is no magic beyond that — which is exactly why testing works: when a route feels wrong, either a signal is wrong or our idea of expensive is wrong, and we can find out which.
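The mechanism is small enough to sketch. This is a minimal illustration that assumes a plain Dijkstra search; this page does not say which search the engine actually runs. A single hypothetical `badness` score per segment stands in for the real signals, and the weight of 2.0 is illustrative. Only the cost function changes between modes.

```python
import heapq
from typing import Callable

# Hypothetical graph shape: node -> [(neighbour, segment attributes), ...]
Edge = dict[str, float]
Graph = dict[str, list[tuple[str, Edge]]]


def cheapest_path(graph: Graph, start: str, goal: str,
                  cost: Callable[[Edge], float]) -> list[str]:
    """Dijkstra's algorithm: the path with the lowest summed segment cost.

    Assumes every cost is non-negative, which Dijkstra requires.
    """
    best = {start: 0.0}
    previous: dict[str, str] = {}
    queue = [(0.0, start)]
    while queue:
        total, node = heapq.heappop(queue)
        if node == goal:
            break
        if total > best[node]:
            continue  # stale entry: a cheaper way here was already found
        for neighbour, edge in graph.get(node, []):
            candidate = total + cost(edge)
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))
    if goal not in best:
        return []  # goal unreachable
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1]


def fast(edge: Edge) -> float:
    return edge["length_m"]  # distance only; signals ignored


def signal_aware(edge: Edge) -> float:
    # "badness" (0..1) stands in for the real signals; 2.0 is an illustrative weight.
    return edge["length_m"] * (1.0 + 2.0 * edge["badness"])


graph: Graph = {
    "A": [("B", {"length_m": 100.0, "badness": 0.9}),  # short, poor signals
          ("C", {"length_m": 70.0, "badness": 0.0})],
    "C": [("B", {"length_m": 60.0, "badness": 0.0})],  # 130 m detour, good signals
}
print(cheapest_path(graph, "A", "B", fast))          # ['A', 'B']
print(cheapest_path(graph, "A", "B", signal_aware))  # ['A', 'C', 'B']
```

The two calls differ only in the cost function. If the detour feels wrong on foot, the fault lies either in the `badness` inputs (a signal) or in the weight (our idea of expensive).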
The three promises
| Mode | What it promises | What the engine actually does today |
|---|---|---|
| Fast | The shortest sensible walk. | Ignores all signals. This is the baseline the other two must beat — if Calm or Safe is worth choosing, it has to earn its extra minutes. |
| Calm | Quiet streets — less traffic, less noise, fewer crowds. | Honestly: the engine cannot see traffic, noise, or crowds yet. It approximates calm by preferring tree-lined streets and nudging away from accident-prone segments. Part of what this cycle measures is how far that approximation carries — and where it visibly breaks. |
| Safe | A reassuring walk — fewer accidents and less crime for this time of day, better lit, with people around. | Uses the accident, crime, presence, and lighting signals, sliced by time. The route shape responds to the data; whether it feels right is exactly what walking it tells us. |
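The table’s right-hand column can be read as three per-segment cost functions. This sketch assumes each signal has already been looked up for the current time slice and scaled from 0 to 1, with higher meaning more of it. The function name and every weight are hypothetical, since tuning the real weights is part of what this cycle is for.

```python
def mode_cost(mode: str, seg: dict[str, float]) -> float:
    """Cost of walking one segment under a mode (hypothetical weights)."""
    length = seg["length_m"]
    if mode == "fast":
        return length  # Fast: distance only, every signal ignored
    if mode == "calm":
        # Calm today: prefer tree cover, nudge away from accidents.
        # No traffic or noise term, because those signals do not exist yet.
        penalty = 0.5 * (1.0 - seg["tree_cover"]) + 0.3 * seg["accidents"]
    elif mode == "safe":
        # Safe: accidents and crime raise the cost; light and people lower it.
        penalty = (0.5 * seg["accidents"] + 1.0 * seg["crime"]
                   + 0.5 * (1.0 - seg["lighting"])
                   + 0.5 * (1.0 - seg["presence"]))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return length * (1.0 + penalty)


leafy_and_lit = {"length_m": 100.0, "tree_cover": 1.0, "accidents": 0.0,
                 "crime": 0.0, "lighting": 1.0, "presence": 1.0}
unlit = dict(leafy_and_lit, lighting=0.0)
print(mode_cost("safe", unlit))  # 150.0: darkness costs Safe extra
print(mode_cost("calm", unlit))  # 100.0: Calm does not read lighting
```

Because Calm has no term for traffic or noise, a busy, tree-lined road costs it nothing extra. That is exactly the kind of break this cycle is meant to surface.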
The four things we need to learn
| Question | Who acts on the answer |
|---|---|
| Can you set up a route and understand the three options without confusion? | The app team. |
| Does the route survive the real world — paths that exist, entrances that are entrances, crossings you can actually make? | The data pipeline. |
| Does each mode keep its promise? | Routing — either the weights we tune or the signals we still need to build. |
| Are three options worth having — or do they look like the same line drawn three times? | Product. |
Notice that these answers go to four different teams. That is the quiet design goal of the whole setup: every piece of feedback should land with the people who can fix it, without anyone having to re-sort a pile of notes afterwards.
Why we test it this way
Everyone walks the same frozen routes. Routes are computed once, saved, and assigned — they do not change between testers or days. That means a complaint sticks to an exact spot on an exact route, and stays re-checkable forever. When we change the engine next month, we replay the same routes and see whether the complaints would still happen. Feedback that cannot be replayed is an anecdote.
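The replay idea can be sketched as follows. The sketch assumes a complaint is pinned to one segment ID on a frozen route and that the engine can be called as a function of start, destination, and mode. `FrozenRoute`, `Complaint`, `replay`, and the engine signature are all hypothetical. A complaint “would still happen” if the recomputed route still passes through the flagged segment.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical engine interface: (start, destination, mode) -> segment IDs in order.
Engine = Callable[[str, str, str], Sequence[str]]


@dataclass(frozen=True)
class FrozenRoute:
    route_id: str
    start: str
    destination: str
    mode: str
    segments: tuple[str, ...]  # computed once and saved; never recomputed in place


@dataclass(frozen=True)
class Complaint:
    route_id: str
    segment_id: str  # the exact spot on the exact route
    issue: str


def replay(routes: Sequence[FrozenRoute], complaints: Sequence[Complaint],
           engine: Engine) -> dict[Complaint, bool]:
    """True if the new engine would still send a walker through the flagged spot."""
    by_id = {route.route_id: route for route in routes}
    still_happens = {}
    for complaint in complaints:
        route = by_id[complaint.route_id]
        if complaint.segment_id not in route.segments:
            raise ValueError(f"{complaint} does not point into its frozen route")
        new_segments = engine(route.start, route.destination, route.mode)
        still_happens[complaint] = complaint.segment_id in new_segments
    return still_happens


def next_month_engine(start: str, destination: str, mode: str) -> tuple[str, ...]:
    return ("s1", "s4", "s3")  # stand-in for a changed engine


route = FrozenRoute("r-17", "start", "home", "safe", ("s1", "s2", "s3"))
complaints = [Complaint("r-17", "s2", "feels unsafe"),
              Complaint("r-17", "s3", "couldn’t walk it")]
result = replay([route], complaints, next_month_engine)
```

Pinning complaints to segment IDs keeps them checkable after an engine change, provided the IDs themselves stay stable across data refreshes; that stability is an assumption of this sketch.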
You tap an issue; you don’t write an essay. The issue chips in the field app are not arbitrary: each one routes the problem to the team that can fix it. A “couldn’t walk it” goes to the data pipeline; a “confusing instruction” goes to guidance; “traffic and noise on my calm route” is counted as evidence for the signal we know we still owe.
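The chip-to-owner routing amounts to a lookup table. A minimal sketch follows. The chip keys, the `Team` enum, and `route_chip` are hypothetical, and only the three pairings named in the text are filled in. Raising on an unknown chip is one possible design choice: a chip with no owner would otherwise land in the pile nobody sorts.

```python
from enum import Enum


class Team(Enum):
    DATA_PIPELINE = "data pipeline"
    GUIDANCE = "guidance"
    ROUTING = "routing (evidence for the missing traffic/noise signal)"


# Hypothetical chip keys; only these three pairings are described in the text.
CHIP_OWNER = {
    "couldnt_walk_it": Team.DATA_PIPELINE,
    "confusing_instruction": Team.GUIDANCE,
    "calm_route_traffic_noise": Team.ROUTING,
}


def route_chip(chip: str) -> Team:
    """Return the team that owns a tapped chip."""
    try:
        return CHIP_OWNER[chip]
    except KeyError:
        raise ValueError(f"chip has no owner: {chip}") from None
```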
We record what the engine believed. “This alley feels unsafe” has two very different fixes: if the engine thought the alley was well-lit, our data is wrong; if it knew and routed you through anyway, our priorities are wrong. Same complaint — opposite fixes. The review tools show the tester’s note and the engine’s belief side by side, so we never have to guess which one it was.
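The “same complaint, opposite fixes” rule can be written as a small decision, assuming the review tools store the engine’s signal value next to each note. This sketch checks only the lighting signal. The `Annotation` type, the 0-to-1 lighting scale, and the 0.5 threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    segment_id: str
    note: str               # the tester's words, e.g. "this alley feels unsafe"
    engine_lighting: float  # what the engine believed at routing time: 0 dark, 1 well lit


def diagnose_dark_alley(a: Annotation, lit_threshold: float = 0.5) -> str:
    """Decide who fixes an 'unsafe alley' note from what the engine believed."""
    if a.engine_lighting >= lit_threshold:
        # Engine thought it was well lit, but the tester found otherwise: the data is wrong.
        return "data pipeline: lighting signal disagrees with the street"
    # Engine knew it was dark and routed through it anyway: the priorities are wrong.
    return "routing: weights allowed a known-dark segment"


print(diagnose_dark_alley(Annotation("seg-042", "this alley feels unsafe", 0.9)))
print(diagnose_dark_alley(Annotation("seg-042", "this alley feels unsafe", 0.1)))
```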
The short version we keep coming back to: ratings tell us how bad it is; the annotated spots tell us what to fix; the replay bench (the frozen routes, re-run after every engine change) tells us whether we fixed it.
It would be quicker to walk around for a week and share impressions in Slack. We are deliberately not doing that. If calm and safe routes are going to be the product — and they are the product — they cannot rest on vibes. The point of the machinery is to turn feelings into evidence: every impression lands attached to a place, a route, and a number, where it can change something.
What this round will not tell us
- Whether Safe works after dark. June in Zurich: our data’s night slice starts at 18:00, but the sun stays up until about half past nine. The regular walks happen in daylight; a small volunteer evening probe covers actual darkness separately, and weekends wait for the next round.
- Whether Calm fully keeps its promise. We know it does not, and we wrote that down before the first walk. This round measures how often and where it falls short, so the data work gets pointed at the right streets instead of at a hunch.
- Statistical certainty. Sixty walks find repeated problems and build the baseline; they do not prove a weight is correct to two decimal places. That comes later, from replaying the growing library of tested routes every time something changes.
How a cycle runs — who freezes routes, who walks when, and what happens to the feedback — is on the field testing page. The tap-by-tap tester instructions are in the field guide.