Moving from Type 2 to Type 3 Data Organizations

Introduction

In the previous post on the Kardashev scale for data organizations we discussed the three general types and their characteristics, but didn’t dive deeply into how to make the move from Type 2 to Type 3 (the move from Type 1 to Type 2 doesn’t really require much explanation).

The move from Type 2 to Type 3 requires more focus on data and how it integrates into the product (including developing totally new products) and a departure from focusing on the data organization as a service unit. It also implies that the infrastructure and tools are in place for people to look into data on operational metrics themselves and that they do not require the expertise of a dedicated BI group.

All of these changes require appetite within the organization, as well as people who are willing to be distributed within the organization in an effort to help educate and enable non-technical people to use data and tooling effectively. I mostly discuss the organizational and people side of this problem, as the technical side is a much easier problem to solve and has much clearer criteria for success.

Think distributed

As organizations grow, assuming they want to maintain fast growth and flexibility, the desire to completely control anything centrally should decrease. Instead, organizations should decentralize where possible, leading to semi-autonomous groups all working towards the same goal. This approach also helps people to maintain a sense of independence and responsibility in addition to the growth and focus benefits. Since the previous centralized approach is no longer viable for sufficiently large organizations progressing along the Type 2 to Type 3 spectrum, the approach to how data is used within the organization must also become decentralized.

The extreme end of this would of course be that every employee is also a data scientist, but since this is obviously impractical, the focus shifts to building basic data literacy within the organization and commoditizing basic data needs while freeing up specialists to work on more valuable topics. The goal of the enablement effort is that every employee is able to do basic data collection and analysis and has the critical thinking skills required to interpret and reason about data problems effectively.

In order to do this, it is critical that disparate teams have the specialist support they need. This means that the centralized data function should be distributed to the extent possible, and begin to engage in an education and awareness campaign with the broader organization. The goal of this campaign must be to change the perception of data from a Central Services approach to something for which everyone is responsible and which everyone is capable of handling.

It was not so long ago that organizations went through the progression of single typist, to centralized group of typists, to distributed typists, to everyone in the organization being responsible for their own typing. Imagine the reaction if any modern organization still maintained a centralized typing staff. Should a sufficiently advanced organization with a strictly centralized Data team be viewed any differently?

Winning hearts and minds

In theory, the organization will have evolved to a point where there is a large appetite for using data more effectively. However, this is often coupled with a desire for people to minimize the work that they have to do themselves. In other words, they want all the benefits of increased data usage and literacy but without having to do any additional work. This is a difficult challenge because progress along the Type 2 to Type 3 spectrum requires that individuals in the organization are able to access and analyze data independently. This does not actually add to their workload as they might think, but rather decreases the time they spend waiting for results and therefore improves their ability to make decisions. So even though people may think they will have more work to do, they will likely end up more productive than they were previously due to fewer delays and context switches.

Helping people become more data literate requires a huge amount of persistence from those actually doing the teaching and enabling. They are at the front line of the effort, and therefore must approach the problem with compassion, patience, and a non-adversarial attitude. In the end it’s about building relationships with teams and helping them become more independent users of data in their daily work.

To build these relationships, it is best to have data specialists embedded within the teams of the organization. Embedding provides support throughout the process, reduces the latency between issues arising and being resolved, and eliminates any remaining perception within the company that there is a centralized data function to which teams can delegate any collection or analysis they do not want to do themselves.

Identify requirements

The first step in the winning of hearts and minds is to simply have productive discussions with the team in which the data specialist is embedded. It is always important to first understand the problem teams want to solve and what kinds of decisions they need to make before discussing any particulars of data which could be collected or analyzed. Throughout this process of discovery it is also critical to provide different perspectives on data usage wherever possible. Things like helping people understand how to use the right kind of data for the right kind of decision begin in this phase and continue throughout all interactions.

A favorite example of mine that should be addressed during this phase is the Data-Decision Frequency Gap. People are often of the opinion that more data will allow them to make better decisions, but due to things like noise in the data and cognitive biases in humans (which are very extensively documented), it is known that providing more data than is needed to make a decision may lead to worse outcomes than if the data were not available at all. Simply helping people understand that there is no requirement to receive data at a frequency higher than that at which they will make decisions, and all of the associated reasoning for that, is one example of a topic which can be very helpful in improving data literacy and usage.
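The noise half of this argument can be sketched with a small simulation. This is an illustration under assumed numbers, not a model of any real metric: the function name, the true value of 100, the noise level, and the alarm threshold are all hypothetical. The underlying metric never changes, yet someone reading it daily sees far more apparent "problems" than someone reading a weekly average of the same data.

```python
import random
import statistics


def false_alarm_rates(days=364, true_value=100.0, noise_sd=10.0,
                      threshold=10.0, seed=0):
    """Return (daily_rate, weekly_rate) of readings that look alarming.

    All parameters are illustrative assumptions. The true value is
    constant, so every reading further than `threshold` from it is a
    false alarm caused purely by noise.
    """
    rng = random.Random(seed)

    # One noisy reading per day around an unchanging true value.
    daily = [rng.gauss(true_value, noise_sd) for _ in range(days)]

    # The same data averaged into complete, non-overlapping weeks.
    full_weeks_end = days - days % 7
    weekly = [statistics.fmean(daily[i:i + 7])
              for i in range(0, full_weeks_end, 7)]

    def rate(readings):
        alarms = sum(1 for r in readings if abs(r - true_value) > threshold)
        return alarms / len(readings)

    return rate(daily), rate(weekly)


if __name__ == "__main__":
    daily_rate, weekly_rate = false_alarm_rates()
    print(f"daily false alarm rate:  {daily_rate:.1%}")
    print(f"weekly false alarm rate: {weekly_rate:.1%}")
```

If decisions are only made weekly, the daily feed adds no information that the weekly average lacks, but it does add many more occasions to react to noise.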

Automate where practical

Once the discussions have produced basic requirements in terms of what kinds of decisions need to be made and what kind of data is needed to support those decisions, it is helpful to automate the collection and processing of the data. Much of this is too complicated and unnecessary for non-technical people, but is an absolutely required step in moving data specialists away from the service-based role common in early and mid-stage Type 2 organizations.

In addition to automating the required collection and processing of the data, it is highly likely that there were other tasks for which data specialists were previously responsible that have not been discussed in the requirements analysis. These tasks can be stopped, thus freeing up additional time for data specialists to work on more relevant and beneficial things.

The result of this step should be that anything which was required by the team before the data specialist was embedded and was still deemed required after the initial discussions has been automated, and that work on any other data collection and analysis activities now understood to be unnecessary has been stopped. This should result in a dramatic reduction in recurring workload, thus allowing the data specialist to provide support and education to the team. The goal should ultimately be that basic ad hoc and exploratory data questions can be answered by people individually, but this requires additional tools and help from the data specialist.

Tooling and support

Now that the team understands what they need, and they have automated processes for things done in the past, the data specialist can help educate the team on tooling and provide ongoing support. The ongoing support responsibilities should lessen over time as the team becomes more data literate, but may never disappear entirely.

This effort is largely focused on helping people in the team become proficient in accessing the data in which they are interested and in training them to do the common reporting tasks which take up most of their time. The majority of all work in non-technical teams (and arguably much in technical data teams as well) is some twist on taking a set of data, grouping it in some way, and computing some kind of aggregation, typically over some time interval. Assuming the data is not excessively large or complicated, this is very simple to do in a standard relational database. The knowledge of SQL required to accomplish these kinds of tasks is very limited and can be learned by non-technical people in a relatively short time. Having this knowledge allows them to dramatically increase their self-sufficiency for common reporting tasks, and also provides them with the power to engage in basic exploratory analysis directly on the data. By not having to rely on others to answer questions for them, people can reap enormous benefits with just a little time invested.
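As a rough sketch of how little SQL this pattern needs, the example below groups a handful of rows by month and region and aggregates them. The table name, column names, and sample rows are hypothetical, and an in-memory SQLite database stands in for whatever relational database the team actually uses.

```python
import sqlite3

# Hypothetical sample data: (order_date, region, amount).
ORDERS = [
    ("2024-01-03", "north", 120.0),
    ("2024-01-17", "south", 80.0),
    ("2024-01-25", "north", 50.0),
    ("2024-02-02", "north", 200.0),
    ("2024-02-14", "south", 40.0),
    ("2024-02-20", "south", 60.0),
]

# Take a set of data, group it (by month and region), and aggregate it
# (count and sum) over a time interval (calendar months).
MONTHLY_REPORT_SQL = """
SELECT strftime('%Y-%m', order_date) AS month,
       region,
       COUNT(*)    AS orders,
       SUM(amount) AS revenue
FROM orders
GROUP BY month, region
ORDER BY month, region
"""


def monthly_report(rows):
    """Load rows into a throwaway in-memory table and run the report."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders (order_date TEXT, region TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        return conn.execute(MONTHLY_REPORT_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in monthly_report(ORDERS):
        print(row)
```

The query itself is the part a non-technical colleague would need to learn: a SELECT with a couple of aggregates and a GROUP BY. Changing the grouping columns or the time bucket covers a large share of routine reporting questions.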

With the above in mind, education at this stage should focus on the basic SQL needed to address the problems the team wants to solve, in addition to any education needed for dashboard creation tools which may be used. Depending on the company culture, there may be resistance from some about learning SQL, typically combined with the claim that they are not technical people and therefore should not need to learn technical things. Leaving aside for the moment the question of whether or not the organization would want to continue working with someone opposed to learning new things, that position just doesn’t make sense. In a similar way to the typists example mentioned previously, in order for organizations to advance in the way data is used, most people must have a basic capability to work with data. Helping people understand that basic SQL is not so difficult and that they will gain huge benefits in independence and productivity by knowing it may help to encourage them, but some will always resist. Do not focus effort on helping these people. You can’t push a string. The best hope is that seeing their peers who have learned become more productive and capable will encourage them to improve their basic data skills.

Conclusion

After going through the process above, the result should be teams that understand basic principles for how to collect and analyze the data they need to make decisions in the course of their work. There should no longer be massive reliance on a centralized data team, although more advanced data problems will naturally require specialists. The overall data literacy and capabilities of the organization should have advanced significantly as a result of the process, with an associated reduction in ad hoc requests to data specialists and in the time data specialists spend on recurring reporting tasks.

All of these things support the transition towards a Type 3 data organization, and help build the momentum for the transition by freeing up time of data specialists. These specialists are then able to spend more time working on advanced data topics, including building prototypes and products for external customers in addition to further improving the depth and breadth of their own technical knowledge.

With enough persistence, the data literacy within an organization can be improved considerably. When combined with improved tooling and infrastructure, the ability of people to understand and make decisions about the business can rapidly become a competitive advantage. This process, more than anything currently being discussed in the industry (Big Data, Cloud Computing, etc.), is what has the promise to rapidly accelerate the pace of innovation in modern organizations. It’s not about the technology, it’s about the people.