Has experimentation become an anchor? Testing for learning, not validation | Paul Davidson, Product Manager

14 Plays1 month ago

Most teams blame the platform when experimentation stalls. The uncomfortable truth is that it's usually people and process.

Paul Davidson was a Senior Product Manager at Expedia Group and Senior Product Owner at NI (National Instruments). He spent years as a product manager on an internal experimentation platform, working on the roadmap and alongside the analysts and engineers using it every day. He has watched thousands of tests, and he knows the patterns behind the ones that go nowhere.

This conversation breaks down how a program drifts into an anchor, and the specific moves that cut one team's test cycles from six weeks to two. You walk away knowing how to simplify your metrics, validate earlier, and hold the trust of stakeholders when real money is on the line.

Trim your secondary metrics and move your primary metric closer to the change you're testing, so readings come back in days instead of weeks.
Validate early and accept imperfect feature work, because a test that fails fast on instrumentation saves the weeks you would otherwise lose validating at the end.
Protect financial goals with circuit breakers and guardrail metrics, then lean on proven proxy metrics so not every test has to move revenue on its own.

Transcript

Introduction to User Journeys and Experimentation

00:00:00

Speaker

What if the purpose of this feature is not always to progress people? Because we tend to think about the user experience in this very clean linear path. They start at the upper funnel, land on the homepage, they do a search, they land on a PDP or a product page, and then, yeah, hopefully they go check out and book. And the reality is it's very messy and people do loop. So it's like, what if a feature doesn't always progress a user? What if it actually makes them happier if you show them they're in the wrong place and get them regressed faster?

Welcome and Guest Introduction

00:00:30

Speaker

Welcome to Unite Voices, hosted by Katie Green. Real stories from the people behind today's most innovative experimentation programs. No fluff, just wins, failures, and the lessons in between.

00:00:45

Speaker

Paul, welcome to Unite Voices, hosted by me, Katie Green.

The Pitfalls of Experimentation as an Anchor

00:00:49

Speaker

I always feel so silly saying that, but it is part of the song and dance. I'm really excited to have you on the show. You have incredible experience in experimentation. I'm really excited for the world to hear what you and I discussed previously and get to know you a little bit better. So why don't you get us started and tell the people who you are?

00:01:09

Speaker

Yeah, thanks. So I'm Paul Davidson. I was a product manager on an internal experimentation platform for a few years. And part of that work had me working not only on the platform features and the strategy and the roadmap and all that, but on the practitioner side as well. So working with our other product managers, our analysts and engineers that were using the platform, and trying to help them experiment better as well.

00:01:31

Speaker

When you and I last spoke, you mentioned that experimentation can become an anchor, something that teams have to do but aren't necessarily learning from. And I've personally seen that a ton, where people have these really good experimentation engines but they don't have the infrastructure to be learning at the same rate that they're experimenting. I'm curious to get some of your learnings on that, because I know people in the community are probably going to listen to that and say, oh my gosh, that sounds like my experimentation program.

Complex Experiments and Sunk Cost Fallacy

00:02:01

Speaker

Have you ever experienced that? And what is your insight there?

Yeah, I think the feeling of it becoming an anchor for us, it's something that happened over time, and it was a drift that kind of happened. And it's kind of this series of factors that we'll be talking about, where they're designing experiments too complex, they're waiting way too late to validate, and just not really setting the experiment up for a good outcome and success of its own. And so what was happening is they were spending months sometimes developing these features, really complex features, putting all this work into it.

00:02:34

Speaker

They'd get to the end of it, go to the readout, and it would be really conflicting or inconclusive. And then they're left really frustrated, and also with this really big sunk cost fallacy of all the effort they'd already put in. So that was where the anchor started to be felt for us.

00:02:49

Speaker

Something that I've seen in the narrative generally with teams that are operating at a high scale is they come to the conclusion that maybe the platform is the problem.

Is It the Platform or the People?

00:03:00

Speaker

And at Chameleon, obviously we're a platform. We talk about this a lot. We're more AI focused, I think, than others. But, you know, how does the tool rationalize some of that thinking? Like, what role does the platform play when you're running into that kind of scenario?

00:03:11

Speaker

Yeah. So for us at least, and, you know, I would imagine for Chameleon, you're working with a pretty mature product, and what's going to happen is that product is always going to keep maturing. There are always going to be edges to round out. There are going to be new features and, most importantly, new data as well. And that's something that kind of came up in our pilot as well, exploring some new data. But the point is that if you're working with a company that's growing, the business requirements keep evolving. And so that's part of what we were feeling there: we were never going to get to perfect, but people wanted that. And they were using that as a justification, even saying, well, hey, maybe we shouldn't experiment until this thing is perfect. And my counter to that was, actually, hey, look, for about 70 to 80% of our use cases, the platform was pretty much complete and ready to go.

Cultural Impact on Experimentation Effectiveness

00:04:08

Speaker

And really, instead of it being a platform challenge, my hypothesis was that it was really a people and process challenge in how we do it. And the thing is, that's really uncomfortable for the organization to hear, because when it's a platform feature, we can file that in the backlog and come back to it when we're ready, or hopefully soon. But when it's a people problem, one, it kind of feels like an indictment, like somebody's done something wrong, and it's also not something that can be solved as straightforwardly as, you know, developing some code and pushing it out.

00:04:36

Speaker

I feel like we could have an entirely separate episode about that exact statement. You know, it's really tough. It's really easy for teams to change their platform or, you know, whatever it is. And it's a lot harder for people to change their teams. But I think something that we do a lot as experimenters is, you know, it's almost funny. I feel like we don't let perfect be the enemy of good when you're doing really well. But at the same time, are we striving for perfect through experimentation?

00:05:11

Speaker

I think some people think we are. And that's really challenging. So I'm really loving this exact answer from you. And I want to just underline and bold it, because when you are on the search for perfect, I feel like that creates a culture where you're not able to make the shifts on the people side that you need to, because that's not saying you need new people.

Common Failures in Experiments

00:05:32

Speaker

It's just saying you need new culture. And culture of experimentation is kind of my big talking point. I always say the biggest threat to experimentation is not doing it at all. And that feels like something that happens with teams when they're observing, like, oh, the platform's right. The platform's good. We can't figure it out. We don't know why. You know, it's tough. But I'm curious. You've spent a ton of time observing things.

00:05:58

Speaker

Would you say hundreds or thousands of tests? I'm not sure. Thousands. Thousands, yeah. I was like, mountains and mountains of tests. Is there any particular pattern or trait of failure that you saw when it comes to individual tests and testing programs at large?

Decentralized Teams and Experimentation Consistency

00:06:14

Speaker

I'm curious to get your takes on the micro and macro level, any patterns you might have seen or traits that you regularly see in those instances.

00:06:24

Speaker

Yeah. So at the more micro level, what kept repeating itself on the tests was kind of three core areas. And one of those was the scope of the change. Like I said earlier, we would see these tests and this development work that had gone on for months. And not only had they spent a long time on that, there's a lot of complexity in that change. They changed a lot or they built out a lot of things. And so when you're trying to understand what moved, and, as we'll find out in a minute, these are metrics that are really far away as well, and you change 5, 10, or 15 things all at once, you know, some of those things might be pushing your goal up, some might be pushing it down. But it's really hard to tell what's going on, and it's really hard to untangle that and understand what really happened.

00:07:04

Speaker

The second part of it was, like I've alluded to as well, that this came in way too late. So the development was done, all the costs had been sunk, and then it's just really painful to step back.

00:07:15

Speaker

And so that also leads to a behavior that we call trying to save the test. So then you're just in there digging, doing kind of this post hoc analysis, trying to find anything that'll justify a rollout, maybe because you don't want to go back through that process again, or maybe the thing should have never been developed in the first place.

00:07:34

Speaker

And then the final thing there was the metrics, and there were two parts to that problem. One, the metric list, the metric count, had really grown over time. Even though we only supported one primary metric, we allowed for secondary metrics and also informational ones. And that secondary metric list had grown, in a lot of cases, to 10 to 15 metrics.

00:07:53

Speaker

And they're all kind of checkered and overlapping and stuff like that. And the other challenge, mostly with the primary metric, was that oftentimes they'd pick a really insensitive one. And the reason for that is that it's the one usually tied to their org goals. And so this is like the big thing that they're trying to move. And so they felt like every single experiment that ran had to prove that it moved the big thing.

00:08:15

Speaker

But that wasn't really going to be achievable in a lot of cases, especially for teams that were way up funnel from where these financial goals were happening.

I want to latch onto the teams way up funnel from where the financial goals are happening, because that sounds like, you know, a large decentralized program. You and I have talked about this previously, that culture can drift when your program is too decentralized. Can you talk about that? Is it inevitable? Is it preventable?

Experimentation for Validation vs. Learning

00:08:41

Speaker

What kind of things are you looking out for when you're seeing a team that is really, really large and trying to tackle a problem maybe they're really far away from?

Yeah, so the challenge with the decentralized org is that, with decentralization, what kind of lapsed over time was that central connective tissue that would have pulled them all back together. And so what was happening is that as the org went through leadership changes and these financial goals, and without that central thing to bring everybody back together, like you said, it was just drifting along and kind of everybody had forgotten why we were there.

00:09:19

Speaker

And so that was part of the drift that happened. What we were working on for that, we tried different iterations of ideas like champions and heroes and informal groups. And that's really challenging because a lot of times those are kind of informal roles. Those are things that may not show up on your performance review. So it's really tough to get people interested, even if they really enjoy experimentation and want to learn more. Their annual review also says they need to go hit all these goals. And so what are they going to choose, you know, joining the experimentation group or driving their product goals? Well, they're always going to pick the product goals.

Strategic Changes in Experimentation

00:09:49

Speaker

And there's one other thing you've mentioned before that I want to make sure we don't lose sight of. I love that point, that they're always going to choose the product goals, which is something that I've talked a lot about on this show, actually, where people are like, okay, well, why would we test if our KPI, our OKR, is to ship more features? Like if I'm testing out of features,

00:10:12

Speaker

why would I do that? Right? So that's a really interesting piece. And I feel like we'll go back to metrics. There's a whole section here that I know you and I have discussed where we want to talk about metrics. But before we get into that, something that you've mentioned is that experimentation for learning has been abandoned, right? Are you seeing experimentation more on the validation exercise side, for decisions that are kind of already made? Can you tell us a little bit more about that?

Yeah. So that's kind of the trap that we'd fallen into. There were these financial goals that had been set, the teams were measuring themselves against them as an individual, a team, an org, and they're all, you know, trying to row towards that goal. And so there's a lot of pressure. And how that had kind of drifted was that every experiment had to be towards that and not for learning.

00:11:06

Speaker

And so that led to a bunch of these really bad behaviors. They would run these tests, they would try to test the financial goals, and at the end, they weren't getting a clear answer on that. Or what was worse, and kind of the whole anchor part of it, was that to even get your metric powered enough to get that reading, which could still be inconclusive at the end of it, you're having to run this test a really long time.
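To put rough numbers on why a far-downstream metric forces a long runtime, here is a minimal sketch using the standard two-proportion sample size approximation. The baseline rates, lifts, and daily traffic are hypothetical illustrations, not figures from the episode, and `users_per_arm` and `days_to_power` are names made up for this example.

```python
from math import ceil
from statistics import NormalDist


def users_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed per arm for a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def days_to_power(baseline_rate, relative_lift, daily_users_per_arm):
    """Days of traffic needed to reach the required sample size."""
    needed = users_per_arm(baseline_rate, relative_lift)
    return ceil(needed / daily_users_per_arm)


# Hypothetical inputs: a booking metric far from the change (low baseline,
# small expected lift) versus a click metric right next to the change.
DAILY_USERS_PER_ARM = 20_000
print("booking metric days:", days_to_power(0.02, 0.02, DAILY_USERS_PER_ARM))
print("click metric days:  ", days_to_power(0.30, 0.05, DAILY_USERS_PER_ARM))
```

Under these assumed inputs the far-downstream metric needs months of traffic while the metric close to the change is powered almost immediately, which is the gap being described here.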

00:11:28

Speaker

And so what was happening is we weren't learning much, and people, like you said, were using it for a validation check. It's the final thing. It's the thing we have to do before we go live, and that's all. But there are also all these assumptions being made about how these metrics correlate and work with each other, and maybe even what the core purpose of a feature or a part of the experience is. And that's what we really started challenging in this pilot as well: what if the purpose of this feature is not always to progress people? Because we tend to think about user experience in this very clean linear path. They start at the upper funnel, land on the homepage, they do a search, they land on a PDP or a product page, and then, you know, hopefully they check out and book. And the reality is it's very messy and people do loop. So it's like,

00:12:15

Speaker

what if a feature doesn't always progress a user? What if it actually makes them happier if you show them they're in the wrong place and get them regressed faster, things like that.

Building Trust in Metrics

00:12:24

Speaker

So we were talking about the kind of tactical experience you had when you worked with a team through the really uncomfortable task of redesigning their approach from the ground up. They've blown everything up. They're going from 15 key metrics to more like three. What are some of the things that you experienced in that tactical change?

00:12:42

Speaker

Right. And so, yeah, the first thing we did in this pilot, I needed to find a team that was willing to work with me. And I approached this team, I threw out my sales pitch, and I got like halfway through it and they were bought in and ready to go. And I was like, wow, this is great. And so then I really had to get to work from there. I asked the product manager, and she shared her overall strategy and her research and the recent experiments she had been running. That led us to breaking down three work streams, which I'll kind of cover today in high-level detail. But then what we did from there is I set out working with this product manager and her analyst. So we're just a real tight team of three working on this effort. And so I took all this stuff that she had shared with me and I generated, myself, the experimentation strategy for this first experience stream that they were going to be running. And that was a really hard process. And I thought, honestly, it was going to be a little bit easier.

AI's Role in Sustainable Experimentation

00:13:33

Speaker

And it took me about an hour to really get through this thing and really think through it. And the problem was, even though I saw the problem, or I thought I saw the problem, from a distance, getting in there, getting my hands dirty myself, you learn that you've been influenced by the org. And it gets really tough to pull apart. Like you said, there's these 10 or 15 metrics. How are you going to get down to three? It's really uncomfortable to pull those out. We always see these. They're always there. And so that was kind of the first step of it.

00:14:02

Speaker

And so from there, we got together with the team, the analyst and product manager, and we went through a refinement process. They had a ton of questions, which was great. They also didn't fully understand all the features on the platform, which was another great learning experience, kind of learning where there were some gaps there, things like that. And so we whittled it down and then were able to launch our first test out of that and had some good learnings from it.

00:14:23

Speaker

Something that practitioners are going to find really valuable out of this episode is that your test cycles dropped from like six weeks to two weeks, right? It's the time savings that I think people are going to be really inspired by. Can you tell us a little bit more about what you think was the primary driver for that? You know, just for somebody who's listening and saying, I want to drop my cycles from six weeks to two weeks, right? What are some of the tangible outcomes there?

00:14:53

Speaker

Yeah, so it was all the things we talked about before. We reduced the scope and got the change into the experiment way sooner. And we were willing to accept that the feature work wasn't perfect yet, but there was an opportunity to just go ahead and take something and try it out. We also accepted that the metrics weren't fully instrumented the way that we wanted. In doing all that discovery work, we discovered new metrics that didn't exist. And actually what happened, because we were moving fast, is we went ahead and launched it. The test failed within a couple of days because the metrics failed, not the experience, not a bug. And that was actually really helpful, because if we had just waited to the end and tried to do this stuff, then it was going to fail anyways and we would just have more time lost.

00:15:36

Speaker

But by reducing the metrics, it came back really crystal clear what was happening. And we moved the key metric way closer to the actual change. We didn't have all these conflicting metrics. Some of those we kind of kept around, by the way. We moved some of those down into our informational metrics, and I kind of challenged them like, look, we can keep them in the readout. We just don't want to make a decision on them. And what we started learning as well, not only did we have the six week to two week reduction,

00:16:04

Speaker

we also saw that some of these things that they were really convinced were going to be highly correlated to the change showed none at all, like a P value close to one type of situation. And so there were actually some good side learnings in there as well. And so the key takeaways: there were fewer metrics to try to interpret, so we didn't have this whole checkerboard of results come back, and they were super sensitive as well, so we didn't have to wait weeks to get enough power to get that reading. We had a really strong reading, actually, within several days.

Simplicity and Learning in Experimentation Culture

00:16:33

Speaker

And that actually opened up for us as well the opportunity that, hey, maybe we could adjust our playbook for some of these and not even need a minimum two week runtime.
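One way to see why a long decision-metric list tends to produce a "checkerboard" readout: if each metric is tested at a 5% significance level, the chance that at least one of them lights up purely by noise grows quickly with the count. The sketch below assumes the metrics are independent, which overlapping metrics are not, so treat it as a rough guide only; `chance_of_false_alarm` is a name made up for this example.

```python
def chance_of_false_alarm(metric_count, alpha=0.05):
    """Probability that at least one of `metric_count` independent metrics
    shows a false positive when there is no real effect on any of them."""
    return 1 - (1 - alpha) ** metric_count


# One primary metric, the three used in the pilot, and the 15 the team started with.
for count in (1, 3, 15):
    print(f"{count:>2} metrics -> {chance_of_false_alarm(count):.0%} chance of a spurious signal")
```

With 15 decision metrics the chance of at least one spurious signal is above 50% under this assumption, compared with about 14% for three, which is one reason moving metrics down to informational status makes the readout easier to trust.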

00:16:43

Speaker

That piece of the metric being closer to the test change being made. When I was a baby doing CRO, when I was just a CRO baby, you know, I had a really hard time talking to decision makers who were like, well, we're making decisions based on actual dollars here. I don't really care if the navigation change increased navigation clicks, as long as it's increasing dollars. And I'm like, but there are so many steps between that and this, right? Like, I just

00:17:15

Speaker

think that that's really important, and it really perfectly tees up my next question here, which is for the teams who are measuring those downstream financial goals, right? Because they take weeks to move. That's obviously the thing that's going to make the higher-up decision makers most passionate about a decision. How do you convince a team to trust, you know, a metric closer to the actual change that you made? What does that trust building look like? For me, it was something that took many, many years to master. And I'm curious, because you've done so much work breaking it down and building it back up. How do you build that trust when money moves?

00:17:54

Speaker

Right. Yeah. So there's like three or four things to unpack in there. One of them is part of the work and why we say we're feature complete. We had implemented a circuit breaker and these metrics, the canary metrics, that would run when the test launched. So we're protecting the core experience there to make sure there's not a large degradation. We'd also implemented guardrail metrics for the end of the experiment. And a lot of those were based around these key financial goals. So we have the protections there in place to protect them. So one, there are automatic protections built in and automatically being calculated and all that stuff as well. And the second part of it then was that not everything needs to move the financial goal today. And you can work through that in a couple of ways. One way you can work through it is you can work with an analyst to have them develop a, you know, really in-depth proxy metric.

00:18:48

Speaker

And then they can prove that there's a correlation there. So they can say, hey, look, when this thing that's up funnel moves, we've proven that there's a correlation to this thing at the end of the funnel that moves. So that's one way to do it.
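As a rough illustration of what "proving a correlation" for a proxy metric could look like, the sketch below compares observed lifts on an up-funnel proxy with lifts on the end-of-funnel financial metric across a set of past experiments. The lift values are invented for the example, and a real analysis would need many more experiments and a check that the relationship holds causally, not just on average.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical percentage lifts from six past experiments.
proxy_lifts = [4.0, -2.0, 1.5, 6.0, -0.5, 3.0]     # up-funnel proxy metric
revenue_lifts = [0.9, -0.4, 0.2, 1.3, -0.1, 0.6]   # end-of-funnel financial metric

r = pearson(proxy_lifts, revenue_lifts)
same_direction = sum(
    (p > 0) == (v > 0) for p, v in zip(proxy_lifts, revenue_lifts)
) / len(proxy_lifts)

print(f"correlation across past tests: {r:.2f}")
print(f"share of tests where both moved the same way: {same_direction:.0%}")
```

If the proxy and the financial metric reliably move together across past tests, a team has some evidence for deciding on the faster proxy reading; if they do not, that is a learning in itself.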

00:19:05

Speaker

The other way to do it as well, and what I was advocating, is, look, just collect these learnings that we're talking about. So run a bunch of these short tests. And if you really need that reading on that financial goal, collect all your learnings, make your tweaks, blah, blah, blah. And then run one big experiment at the end and prove that, you know, you are going to move the financial goal, if that's really where you have to be. But I don't think that's really necessary in a lot of cases. I think a true proxy metric can just get the job done.
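The circuit breaker and guardrail protections mentioned earlier in this answer could be sketched roughly as follows. This is not the platform's actual implementation: the metric names, the thresholds, and the `MetricReading` and `breaches` helpers are all assumptions for illustration, and a production check would compare confidence intervals rather than raw point estimates.

```python
from dataclasses import dataclass


@dataclass
class MetricReading:
    name: str
    control: float
    treatment: float

    @property
    def relative_change(self):
        """Relative change of treatment versus control (negative = worse)."""
        return (self.treatment - self.control) / self.control


def breaches(readings, max_drop):
    """Names of metrics whose treatment value fell more than `max_drop` below control."""
    return [r.name for r in readings if r.relative_change < -max_drop]


# Canary metrics are checked shortly after launch with a loose threshold,
# to catch a large degradation of the core experience (hypothetical values).
canaries = [
    MetricReading("page_load_success_rate", control=0.99, treatment=0.80),
    MetricReading("search_results_shown_rate", control=0.95, treatment=0.95),
]
tripped = breaches(canaries, max_drop=0.10)
if tripped:
    print("circuit breaker: stop the test,", tripped)

# Guardrail metrics are checked at the end of the experiment with a tighter
# threshold, protecting the financial goals without making them the primary metric.
guardrails = [MetricReading("revenue_per_visitor", control=12.40, treatment=12.35)]
print("guardrail breaches:", breaches(guardrails, max_drop=0.02))
```

The design point is the split in roles: the primary metric sits close to the change and drives the decision, while the financial metrics act as protections that can veto a rollout.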

00:19:30

Speaker

And this question we hadn't talked about previously, but it's something that I really want to make sure is covered in this conversation, which is AI, right? Everyone is looking for the AI in this conversation, the AI in everything. What is the role of AI in decentralizing culture and maintaining health? You did a lot of work breaking this down and building this back up. I don't think AI could do that. But is there a role for AI to help create that sustainability long term?

00:20:04

Speaker

Yeah, I mean, I think AI mixed with traditional logic and metrics together can do that. I think you can instrument dashboards. I think you can add insights. You could take the learnings that came out of this and my hypothesis and train an AI to look for those types of things. I think you could create a health dashboard. And that was kind of one of the things we were working on to help bring this thing back in. So you can create a health dashboard. You can start adding insights to that. You can even go further than that as well.

00:20:36

Speaker

And with some of these metrics, and not just these very granular metrics, some of these big metrics that kind of describe how the users move through the big points in the funnel, stuff like that,

00:20:48

Speaker

you can start to show how these things correlate, and it'll help people not only understand how their individual tests and experiments are moving things, but they can understand how this larger body of work, some of which may be coming from their team or org, is doing. But also a lot of times there's work within the sister teams and stuff all around that. They're also moving it. It's also very unclear. So I think you can use AI, and analytics and stuff, to help start recognizing those connections as well.
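The "traditional logic" half of a health dashboard like the one described could start as a handful of rule-based checks over experiment plans, flagging the patterns discussed in this conversation. Everything in this sketch is an assumption for illustration: the `ExperimentPlan` fields, the flag wording, and the thresholds, which loosely echo the three-metric and two-week figures from the pilot.

```python
from dataclasses import dataclass


@dataclass
class ExperimentPlan:
    name: str
    decision_metrics: int          # primary plus secondary metrics used to decide
    planned_days: int              # planned runtime
    months_in_development: float   # time spent building before first validation


def health_flags(plan, max_decision_metrics=3, max_days=14, max_dev_months=1.0):
    """Return human-readable warnings for patterns that tend to stall a test."""
    flags = []
    if plan.decision_metrics > max_decision_metrics:
        flags.append("too many decision metrics; move some to informational")
    if plan.planned_days > max_days:
        flags.append("long runtime; consider a primary metric closer to the change")
    if plan.months_in_development > max_dev_months:
        flags.append("late validation; test a smaller slice earlier")
    return flags


# Hypothetical plans: one resembling the six-week pattern, one resembling the pilot.
plans = [
    ExperimentPlan("big redesign", decision_metrics=15, planned_days=42, months_in_development=3.0),
    ExperimentPlan("pilot test", decision_metrics=3, planned_days=14, months_in_development=0.5),
]
for plan in plans:
    print(plan.name, "->", health_flags(plan) or "healthy")
```

Rules like these give a baseline that an AI-generated insight layer could then build on, for example by summarizing which flags recur across a team's experiments.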

00:21:18

Speaker

Got to ask the AI question. You know, everyone will be looking for that. I mean, we're coming up on the end of our time here, and I want to make sure I ask my favorite question, which is, I call it the Monday morning advice, right? But somebody could be listening to this on a Wednesday. I understand that. But if somebody is listening on a Sunday and they're saying, my program feels gated, like I'm trying to figure out a way to make it not feel like an anchor, you've successfully done this. What's the first thing they're doing tomorrow after listening to this episode?

00:21:53

Speaker

Yeah. I mean, the first step is the classic: find a beachhead. Don't try to boil the ocean. So find somebody that is bought in, that's feeling that pain, you know, acutely. Sit down with them, learn what they're working on, and then really, you know, bring this alternate plan to them and focus on things like: don't test too much at once, reduce the number of metrics, increase the metric sensitivity.

00:22:19

Speaker

Just work through that, and work through it together. And then from there, if you can get through it, like we did, and show the change and show the outcomes, then we kind of went on our promotional tour. We had our monthly newsletter. We had basically a whole piece written by that product manager from her point of view on how everything went. We opened up a Slack channel, we opened up office hours. I made a full-blown training out of it. And then, you know, I was starting to roll that out to others. I ran a version of that with one team. And then from there, we were looking at institutionalizing that within our full-blown learning and development so that I wasn't in the loop anymore on it. So that's kind of the gist of it. Simplify things, find a beachhead, win it, and then use that to build momentum to roll it out further.

00:23:06

Speaker

I'm calling this episode Don't Boil the Ocean. That's perfect advice, though. I think a lot of people, and myself included, right? They see the potential and it's like, oh my God, there's literally 50 ways I could start, but I could start with simplifying my metrics. Like, you do a very good job of saying, okay, simplify your metrics. Okay, next.

00:23:31

Speaker

How are you democratizing insights next? I feel like you do a very good job of breaking that out into bite-sized pieces. So I hope, you know, we can talk about this another time too. I think we could do a part two in a year. I want the update on all things Paul. But okay, I think with that, thank you so much for being on the show. We really appreciate you.

00:23:54

Speaker

Cool. Thank you very much. And we'll see you next time.