Jumping in right where we left off from part one:
The failure rate and the field tech parameters are related in a triple-constraint sort of way. As a result, there are two schools of thought on how to increase velocity:
Option 1
If we can drive down the failure rate and keep the field tech count as it is, we can drive up the MPD
Option 2
If we have more field techs, we can handle more failures, thus allowing the MPD to increase
Obviously, the first option is more advantageous, since more field techs ultimately carry more project cost. Decreasing the failure rate also yields a larger increase in MPD than employing more field techs, as we will see below.
Getting back to 2%, let's go through a thought exercise and see what this would look like. Let's say we are in charge of a 30k seat org. From a technology standpoint, as mentioned, the number of deployments per day is in theory infinite, and we should be able to do all our migrations at once. For our 30k estate, in theory, we can do them all in one day; our MPD is 100%. This is our starting point.
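To keep the arithmetic concrete, here is a minimal sketch of the MPD calculation used throughout this exercise; the function name and figures are ours, purely for illustration:

```python
# Minimal sketch: MPD (machines per day) expressed as a percentage of
# the estate migrated in one day. Illustrative only.

def mpd_percent(machines_per_day: float, estate_size: int) -> float:
    """MPD as a percentage of the total estate."""
    return 100.0 * machines_per_day / estate_size

# Theoretical best case: the entire 30k estate in a single day.
print(mpd_percent(30_000, 30_000))  # 100.0
```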
Let’s dive deeper using some more realistic figures.
From customers, both current and past, we are hearing of roughly 3–7% failure rates for migration deployments as a whole. This number will be higher during the early days of the migration, as we showed with the deployment profile, but a success rate in the mid-90s is good.
I'm going to use a 5% failure rate as a round number, i.e. a 95% success rate. At 5%, a 30k seat org could have 1,500 machines that fail the automatic migration, meaning they will need to be serviced by a field tech via heavy touch. That is a lot of machines at first glance. The goal is to find ways to drive this failure rate as low as possible. And as the migration progresses and we get better and better, this failure percentage will go down and the MPD will go up naturally.
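As a quick sanity check on that figure (illustrative numbers only):

```python
# Expected heavy-touch workload at the assumed 5% failure rate.
estate_size = 30_000
failure_rate = 0.05  # assumed; the real rate will vary over the migration

failed_machines = int(estate_size * failure_rate)
print(failed_machines)  # 1500 machines needing a field tech
```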
Another data point is the amount of time it takes to fix a machine. Per a related Gartner report, a manual rebuild of a machine can take up to 5 hours depending on the complexity of the machine and the environment. This may seem like a lot, but think through the process: a ticket comes in, a tech gets assigned and goes to the office; once there, he or she has to find the user's desk, assess the situation, try to salvage the user data, copy it to a thumb drive, bare-metal the machine, install the OS, join the domain, encrypt the data, re-apply the user profile, re-install the apps, set up Outlook, and so on. It's a lot.
Therefore, let's say a field tech can service 3 machines a day via heavy touch, which means we have to schedule migrations such that the field techs can keep up with the migration velocity. Three serviced machines is 5% of 60 machines, so the number of machines per day immediately drops from 30k in one day to just 60, which puts us at a 0.2% MPD.
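Here is that throughput cap as a quick sketch, using the same assumed figures:

```python
# The failure stream must not outrun field tech repair capacity, so the
# daily migration cap is the volume that produces exactly as many
# failures as the techs can fix in a day.
estate_size = 30_000
failure_rate = 0.05          # assumed
fixes_per_tech_per_day = 3   # assumed heavy-touch capacity
field_techs = 1

daily_fix_capacity = field_techs * fixes_per_tech_per_day   # 3 machines
max_migrations_per_day = daily_fix_capacity / failure_rate  # 60 machines
print(100.0 * max_migrations_per_day / estate_size)         # 0.2 (% MPD)
```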
What is a way we can increase the MPD? As a lowly non-tech project manager, I don't understand TSs and I don't understand computers; but I do understand scheduling meetings and I understand resource allocation, so one thing PMs can do is throw more resources at the problem. How many field techs are included as part of an organization? For a 30k seat, national organization, let's say there are 10 field techs covering the W10 project. 10 field techs, each still spending several hours per fix and servicing 3 machines per day, can collectively cover 30 broken machines per day, which is 5% of 600 machines. Now the MPD is 2% of the estate.
In our thought exercise we used numbers that make the math work, but managers can modify these figures based on their org to see what might work best. For example, 15 field techs can cover 45 broken machines per day, which is 5% of 900 machines, putting the daily limit at 3% of the estate. But as mentioned, driving down the failure rate makes a much bigger impact on MPD.
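Putting the two levers side by side makes that point concrete; this sketch reuses the same illustrative figures:

```python
def max_mpd(techs: int, fixes_per_day: int, failure_rate: float,
            estate: int = 30_000) -> float:
    """Highest sustainable MPD (% of estate) before the techs fall behind."""
    return 100.0 * (techs * fixes_per_day / failure_rate) / estate

print(max_mpd(10, 3, 0.05))  # 2.0 -- the baseline above
print(max_mpd(15, 3, 0.05))  # 3.0 -- 50% more field techs
print(max_mpd(10, 3, 0.02))  # 5.0 -- failure rate driven down to 2%
```

Cutting the failure rate from 5% to 2% more than doubles the cap, while adding 50% more techs raises it by only half.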
How can we decrease failures from a process standpoint?
From our experience, there are 4 main inputs to migration velocity: Visibility, Targeting, Agility, and Strategy. These 4 areas will answer the question of how fast you should go.
Perhaps the most critical factor is visibility into the environment: how small or large is the problem, and how small or large are the potential risks? The greater the awareness of the enterprise environment, the less 'hope' needs to be relied upon for the migration. The goal of visibility is to ascertain the various components of the environment that may impact a successful migration, as well as factor into its speed. Reporting is paramount in order to gather this information. Tools such as Tachyon can help uncover and quantify various aspects of migration visibility.
What is the "application footprint"? – Visibility
What is the hardware readiness? – Visibility. The goal is to deploy the most secure configuration possible, in order to take advantage of the advanced security features in Windows 10. It is important to know not only the current state of each machine, but also what its end-state configuration will be at the end of the Windows 10 deployment.
How much variation is in the environment? – Visibility
How standardized are the machines? – Visibility
What is the site breakdown? – Visibility
Application rules and readiness – Visibility
What data is there, and where is it? – Visibility
Once we have visibility into the estate, it will act as a filter to determine which machines to target and when. The goal is to always be targeting the machines that can go right now, and therefore to see where the exceptions will present themselves. There is no use targeting machines for migration that have applications not yet supported on W10. There are also benefits to targeting the easier machines first, to gain momentum and overall migration acumen, while leaving outlier machines to the end.
Are there machine hardware models that aren't going to support W10 and should instead be migrated naturally via attrition and the standard machine replacement process over time? – Targeting
Start Small and Ramp Up – Targeting
Advance the Content – Targeting
Good visibility will lead to improved targeting, but things will still come up. The ability to be agile with deployments, thanks to visibility, will keep the migration velocity high. If there is an issue or a strategic change in direction, it's important to have a contingency: if a section of machines cannot be migrated as planned, are there other groups of machines that can migrate instead?
How much capacity can you handle when you get errors (they will happen)? – Agility
Holidays, Weather Impacts – Agility
Automation can bring out flaws in processes and team procedures – Agility
The overarching strategy of the business will factor into the migration velocity, both accelerating and potentially limiting it. Identifying the goals and concerns of the business stakeholders will uncover any ground rules to be used in velocity considerations. It's important to understand the technology solution expectations.
What outputs or solutions are expected? – Strategy
How much end-user communication is required? – Strategy
How savvy are the users? – Strategy
Who 'owns' the machine, the user or the business? – Strategy
The best approach is to start with Visibility, understand all of these considerations, and divide the estate into reasonable migration tranches. Some of these tranches can be done very quickly (e.g. corporate offices, call centers, well-connected sites, pilot program users, field techs) while others will need to be done over time (e.g. very remote sites, developer machines, manufacturing machines). It's important to give the business a target range, but also to inform them there will be peaks and valleys.