This is an anonymized analysis, based on one I did for a small production company. The Company offers a wide range of production services from script writing and character design to game asset creation and animation.

The owners approached me about analyzing some of their internal workflow data. In particular, they were interested in trying to better predict the time/effort investment required for their projects. In other words: Are there properties of projects that could be identified in advance that would help them estimate the total time and cost needed to complete the work?

To explore this question, they shared with me a dataset describing their work on 115 projects* (between December 2016 and March 2019). For each project, there is a list of the tasks involved, how long each task took (in hours), and who performed the task.
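Throughout this write-up, I'll assume the data can be arranged as a simple "one row per task" table. The file name and column names below are my own labels (not the Company's actual codes), so treat this as a sketch of the layout rather than the real ingestion step:

```python
import pandas as pd

# Hypothetical layout: one row per task performed on a project.
# The file name and column names are my own labels, not the Company's.
tasks = pd.read_csv("project_tasks.csv", parse_dates=["date"])

# Expected columns: project, task_category, employee, hours, date
print(tasks.head())
```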

Here’s what I found.

The Basics

Across all projects in the dataset, the average time spent on a project was 28.7 hours.

The longest project, in terms of hours invested, was “op_nuvectra_explainer,” which involved 303.3 hours of work.

The shortest project was “weatherChain,” which involved 0.66 hours of work. However, this project only involved meetings and e-mails, so the fact that it appears to have been completed in less than an hour is not especially interesting.
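For anyone who wants to reproduce these summary numbers from a table like the one sketched above, they fall out of a simple per-project aggregation (again, the column names are my assumed labels):

```python
# Total hours per project, then the summary numbers quoted above
project_hours = tasks.groupby("project")["hours"].sum()

print(project_hours.mean())                          # average hours per project (~28.7 here)
print(project_hours.idxmax(), project_hours.max())   # longest project
print(project_hours.idxmin(), project_hours.min())   # shortest project
```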

More interesting is the breakdown of time investment by task type. This is shown in the categorical scatterplot below (Figure 1), which depicts the distribution of hours spent (y-axis) by task category (x-axis). Each glyph in the figure corresponds to one task. The shape of the glyph corresponds to who performed the task (as detailed in the legend). Color-coding also corresponds to the task category, so this is redundant information, but it does (I think) make the picture look prettier.
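The real Figure 1 is interactive, but a rough approximation of the same chart could be built with Plotly Express along the following lines (in this sketch the employee encoding is tucked into the hover text rather than the glyph shape):

```python
import plotly.express as px

# Rough approximation of Figure 1: one point per task,
# x = task category, y = hours, color = task category (redundant, but pretty)
fig = px.strip(
    tasks,
    x="task_category",
    y="hours",
    color="task_category",
    hover_data=["employee", "project"],
)
fig.show()
```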


Figure 1. Distribution of Hours By Task Category

Open Pixel Project Data

What does this show? The first thing to note about this figure is that most of the distributions are tightly clustered around 5 hours (or less), and the vast majority of tasks take less than 10 hours. This suggests a largely streamlined process, since the time per task is generally pretty consistent within task categories. So if the Company knows how many tasks are required for a project, they could estimate about 5 hours per task as a reasonable starting point.
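As a purely hypothetical back-of-the-envelope example of that rule of thumb (the task counts below are made up, not drawn from the dataset):

```python
# Hypothetical project scope: planned task counts by category (made-up numbers)
planned_tasks = {"asset creation": 4, "storyboards": 2, "animation": 3, "project management": 2}

hours_per_task = 5  # rough per-task figure suggested by Figure 1
estimate = hours_per_task * sum(planned_tasks.values())
print(estimate)     # 11 tasks x 5 hours = 55 hours
```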

However, there are some exceptions to this pattern. The distributions for asset creation, project management, storyboards, and concept/design all have some outliers, with an instance or two where the task actually took at least twice as long as the average. But, by far, the most unusual distribution is animation, where some tasks took more than 30 hours. Given the data, it is hard to say if this should be expected or if each of these outliers can be explained or would have been predictable in advance. But if the Company is looking to identify types of activities that take longer or are less predictable, then animation would seem to be the area to investigate further.
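One way to pull these exceptions out of the data systematically is to flag any task that took at least twice the average for its category; a minimal sketch, assuming the same column names as above:

```python
# Mean hours per task category, broadcast back onto each task
category_mean = tasks.groupby("task_category")["hours"].transform("mean")

# Tasks that took at least twice the average for their category
unusual = tasks[tasks["hours"] >= 2 * category_mean]
print(unusual.sort_values("hours", ascending=False)[["project", "task_category", "employee", "hours"]])
```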

The figure also allows you to explore how this labor distribution differs by employee. By clicking on the employee’s name in the legend, it is possible to show/hide their tasks.

The Complex

Now that we have a basic sense of the data distribution, let’s drill a little bit deeper and look at how the tasks break down by project. Figure 2 (below) depicts this relationship as a matrix, which I like to call a “project landscape”—that is, a systematic representation of every task and every project that the Company has ever done (or at least everything that is in the dataset).

In this figure, projects are arranged along the y-axis, ordered by time, with the oldest project at the top and the newest project at the bottom. As in Figure 1, each glyph still corresponds to a single task, performed for a single project, and the glyph shape corresponds to the employee who performed the task (as indicated in the legend). Color-coding still reflects the task category; however, the size of the glyph now corresponds to how many hours the task took to complete.

Since multiple instances of the same task can be involved in a project (e.g., some projects involved multiple animation tasks), glyphs can overlap. Each individual glyph is also semi-transparent, so the greater the color saturation for each task/project combination (and the more visible “rings”), the more instances of that task activity were performed on the project.

You can also now mouse over glyphs to see more details about all of the tasks at that “location” in the project landscape.
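Again, the real figure is interactive; a rough Plotly Express sketch of the same encoding (size = hours, color = task category, shape = employee, semi-transparent glyphs) might look something like this:

```python
import plotly.express as px

# Rough sketch of the "project landscape": projects on the y-axis (oldest first),
# task categories on the x-axis, glyph size proportional to hours spent.
project_order = (
    tasks.groupby("project")["date"].min().sort_values().index.tolist()
)

fig = px.scatter(
    tasks,
    x="task_category",
    y="project",
    size="hours",
    color="task_category",
    symbol="employee",
    opacity=0.5,                               # overlapping glyphs build up the "rings"
    category_orders={"project": project_order},
    hover_data=["employee", "hours"],
)
fig.show()
```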


Figure 2. Project Landscape of Task Category By Project

Open Pixel Project Data


What does this show? Each row in this figure is “telling the story” of a project: What tasks did the project involve? Who did the work? And how much effort did each task require? Some projects are made up of only a few small “dots” on the landscape, indicating relatively simple projects that only involved one person, a few tasks, and not a ton of labor. Other projects are composed of a complex mixture of sizes and shapes, indicating that everyone at the company was involved and spent a lot of time on the work.

Now, of course, on some level, the Company owners surely know these facts about their history. But this figure makes that knowledge explicit, showing the whole history of the Company in a single view!

Scrolling through the project landscape reveals the kinds of tasks that make up the bulk of the Company’s time and effort. Almost every project involves asset creation, storyboards, and animation. Relatively few projects involve script writing, sound, or editing. The business development and marketing activities are largely confined to the “op” project, which, I take it, is a “catch-all” category for work related to promoting the Company itself.

Since the size of each glyph in this figure corresponds to hours of effort, the project landscape can show, at a glance, which individual projects and tasks stand out as considerably larger than others.
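The same “at a glance” question can also be answered numerically, by pivoting hours by project and task category and pulling out the largest combinations; a small sketch, assuming the same table as before:

```python
# Hours by project and task category (the table underlying the landscape)
landscape = tasks.pivot_table(
    index="project",
    columns="task_category",
    values="hours",
    aggfunc="sum",
    fill_value=0,
)

# The ten largest individual project / task-category combinations
print(landscape.stack().nlargest(10))
```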

Next Steps

To my mind, this is really just the start of the analysis. To truly answer the driving question about how to predict (or optimize?) the effort required for a project, some more information and data are needed. In particular, here’s what I think would be valuable next steps to gain deeper insight into the Company’s workflow:

  1. Developing a Project Taxonomy. Rather than organizing the project landscape by time, I suspect it would be more useful to categorize and cluster the projects by “similarity” (see the sketch after this list). Since I only have the project codes, I can’t really do this on my own. But if there is a logical way to cluster these projects together, then it would be possible to see whether projects that intuitively should “look the same” (in terms of activities and hours) actually do look the same. This could reveal whether some kinds of projects are more or less predictable than others, and the Company could then use this insight to better predict how much time and effort their next project of that type is likely to take (and plan accordingly).

  2. Developing a Task Hierarchy. I suspect it would also be useful to have a more structured taxonomy of tasks (a sketch of what this might look like also follows this list). So, for example, I would start with some broad task categories like “Creative,” “Administrative,” and “Business Development”. Then, within each category, use a more fine-grained taxonomy (e.g., “animation,” “script,” and “asset creation” could all go in the “Creative” category). And then I might have another layer of task detail that specifies the precise piece of software used (or something like that). This would mean a little more work in terms of task coding and data entry, but it would allow the Company to track and explore their workflow with greater precision and detail.

  3. Cost and Revenue. So far, I have just assumed that more time spent on a task is a bad thing. But, of course, that is not necessarily the case if all of that time is billable to the client and the client is happy to pay. So in addition to clustering projects by type or content, it may be valuable to know how these projects cluster by revenue for the Company. If, for example, the extra time spent on the longer, animation-driven projects does not translate into more money for the Company, then this could be useful in thinking about how to pivot toward attracting more of the projects that give a better return. Conversely, if those “big” projects are the “successful” ones, then clustering them together to identify the particular clients or marketing strategies that led to those projects could also generate actionable insights.
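To make next step 1 a bit more concrete, here is a minimal sketch of what clustering projects by similarity might look like, using the per-project mix of hours by task category as the “profile” of each project. The choice of four clusters is arbitrary and purely illustrative; a real grouping would need the Company’s input on what “similar” actually means.

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

# Each project becomes a vector of hours per task category
profiles = tasks.pivot_table(
    index="project", columns="task_category",
    values="hours", aggfunc="sum", fill_value=0,
)

# Normalize by total project hours so the clustering compares the *mix*
# of work on each project rather than its overall size
X = normalize(profiles.values, norm="l1")

# Four clusters is an arbitrary choice, purely for illustration
labels = KMeans(n_clusters=4, random_state=0, n_init=10).fit_predict(X)

for cluster_id in range(4):
    print(cluster_id, list(profiles.index[labels == cluster_id]))
```

And for next step 2, the task hierarchy could start as nothing fancier than a lookup table. The groupings below are my own guesses, and the category strings would need to match the Company’s actual task codes:

```python
# Hypothetical two-level task hierarchy (broad group -> fine-grained task categories)
TASK_HIERARCHY = {
    "Creative": ["animation", "script writing", "asset creation", "storyboards", "concept/design", "sound", "editing"],
    "Administrative": ["project management"],
    "Business Development": ["business development", "marketing"],
}

# Invert the mapping so each task row can be tagged with its broad group
task_to_group = {task: group for group, members in TASK_HIERARCHY.items() for task in members}
tasks["task_group"] = tasks["task_category"].map(task_to_group)
```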