
Best practices from physical data products

It is easy to stumble into the trap of thinking that, because data exists somewhere abstract, on a spreadsheet or in the cloud, data products are just abstract algorithms. So we would like to conclude by showing you how objective-based data products are already a part of the tangible world.

What is most important about these examples is that the engineers who designed these data products didn't start by building a neato robot and then looking for something to do with it. They started with an objective like, "I want my car to drive me places," and then designed a covert data product to accomplish that task. Engineers are often quietly on the leading edge of algorithmic applications because they have long been thinking about their own modeling challenges in an objective-based way. Industrial engineers were among the first to begin using neural networks, applying them to problems like the optimal design of assembly lines and quality control. Brian Ripley's seminal book on pattern recognition gives credit for many ideas and techniques to largely forgotten engineering papers from the 1970s.

For systems engineers designing a product or manufacturing process, a drivetrain-like process followed by model integration, simulation and optimization is a familiar part of the toolkit.

In engineering, it is often necessary to link many component models together so that they can be simulated and optimized in tandem. Engineering firms have plenty of experience building models of each of the components and systems in their final product, whether they're building a server farm or a fighter jet. There may be one detailed model for mechanical systems, a separate model for thermal systems, and yet another for electrical systems. All of these systems have critical interactions. For example, resistance in the electrical system produces heat, which needs to be included as an input for the thermal diffusion and cooling model. That excess heat could cause mechanical components to warp, producing stresses that should be inputs to the mechanical models.
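The chain of interactions described above can be sketched as three small functions whose outputs feed one another. This is only an illustration under simplifying assumptions: the lumped formulas (Joule heating, a single thermal resistance, stress in a fully constrained bar), the function names, and every coefficient are hypothetical choices made for this sketch, not values from any real design.

```python
# Hypothetical chained component models: electrical -> thermal -> mechanical.
# All function names and coefficients are illustrative assumptions.

def electrical_heat_watts(current_amps: float, resistance_ohms: float) -> float:
    """Electrical model: Joule heating, P = I^2 * R."""
    return current_amps ** 2 * resistance_ohms


def temperature_rise_kelvin(heat_watts: float, thermal_resistance_k_per_w: float) -> float:
    """Thermal model: steady-state rise over ambient for a lumped system."""
    return heat_watts * thermal_resistance_k_per_w


def thermal_stress_pa(delta_t_kelvin: float, youngs_modulus_pa: float,
                      expansion_per_kelvin: float) -> float:
    """Mechanical model: stress in a fully constrained bar, sigma = E * alpha * dT."""
    return youngs_modulus_pa * expansion_per_kelvin * delta_t_kelvin


def simulate(current_amps: float) -> dict:
    """Run the linked models in order, passing each output to the next model."""
    heat = electrical_heat_watts(current_amps, resistance_ohms=0.5)
    delta_t = temperature_rise_kelvin(heat, thermal_resistance_k_per_w=2.0)
    stress = thermal_stress_pa(delta_t, youngs_modulus_pa=70e9,
                               expansion_per_kelvin=23e-6)
    return {"heat_w": heat, "delta_t_k": delta_t, "stress_pa": stress}


if __name__ == "__main__":
    print(simulate(current_amps=2.0))
```

Because the models are linked, changing a single electrical input changes the mechanical stress at the end of the chain, which is what makes it possible to simulate and optimize the whole system in tandem.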

The screenshot below is taken from a model integration tool designed by Phoenix Integration. Although it's from a completely different engineering discipline, this diagram is very similar to the Drivetrain Approach we've recommended for data products. The objective is clearly defined: build an airplane wing. The wing box includes the design levers like span, taper ratio and sweep. The data is in the wing materials' physical properties; costs are listed in another tab of the application. There is a Modeler for aerodynamics and mechanical structure that can then be fed to a Simulator to produce the Key Wing Outputs of cost, weight, lift coefficient and induced drag. These outcomes can be fed to an Optimizer to build a functioning and cost-effective airplane wing.

Screenshot from a model integration tool designed by Phoenix Integration.

As predictive modeling and optimization become more vital to a wide variety of activities, look out for the engineers to disrupt industries that wouldn't immediately appear to be in the data business. The inspiration for the phrase "Drivetrain Approach," for example, is already on the streets of Mountain View. Instead of being data driven, we can now let the data drive us.

Suppose we wanted to get from San Francisco to the Strata 2012 Conference in Santa Clara. We could just build a simple model of distance / speed-limit to predict arrival time with little more than a ruler and a road map. If we want a more sophisticated system, we can build another model for traffic congestion and yet another model to forecast weather conditions and their effect on the safest maximum speed. There are plenty of cool challenges in building these models, but by themselves, they do not take us to our destination.
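As a minimal sketch of these predictive models, the function below starts from distance divided by speed limit and optionally layers on a congestion multiplier and a weather-based speed cap. The function name, the parameters, and the numbers in the usage example are assumptions made for illustration; real congestion and weather models would be far richer.

```python
from typing import Optional


def travel_time_hours(distance_miles: float,
                      speed_limit_mph: float,
                      congestion_factor: float = 1.0,
                      weather_speed_cap_mph: Optional[float] = None) -> float:
    """Predict drive time from distance and speed limit.

    congestion_factor >= 1.0 stretches the trip (1.5 means 50% longer).
    weather_speed_cap_mph, if given, is the safest maximum speed in the
    current conditions and overrides a higher speed limit.
    """
    speed = speed_limit_mph
    if weather_speed_cap_mph is not None:
        speed = min(speed, weather_speed_cap_mph)
    if speed <= 0:
        raise ValueError("speed must be positive")
    if congestion_factor < 1.0:
        raise ValueError("congestion_factor must be at least 1.0")
    return distance_miles / speed * congestion_factor


if __name__ == "__main__":
    # Illustrative inputs only: a 45-mile trip with a 60 mph limit.
    print(travel_time_hours(45, 60))
    print(travel_time_hours(45, 60, congestion_factor=1.5, weather_speed_cap_mph=45))
```

Note that the function only returns a number. As the text says, a prediction of arrival time does not by itself get anyone to the destination.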

These days, it is trivial to use some type of heuristic search algorithm to predict the drive times along various routes (a Simulator) and then pick the shortest one (an Optimizer) subject to constraints like avoiding bridge tolls or maximizing gas mileage. But why not think bigger? Instead of the femme-bot voice of the GPS unit telling us which route to take and where to turn, what would it take to build a car that would make those decisions by itself? Why not bundle simulation and optimization engines with a physical engine, all inside the black box of a car?
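The split between Simulator and Optimizer can be made concrete with a small sketch. It deliberately simplifies: instead of a heuristic search over a road network, it scores a fixed list of candidate routes and picks the fastest one that satisfies a toll constraint. The `Route` class, the route names, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Route:
    name: str
    distance_miles: float
    average_speed_mph: float
    has_toll: bool


def simulate_drive_time(route: Route) -> float:
    """Simulator: predict the drive time in hours for one candidate route."""
    return route.distance_miles / route.average_speed_mph


def pick_route(routes: List[Route], avoid_tolls: bool = False) -> Route:
    """Optimizer: choose the fastest route that satisfies the constraints."""
    candidates = [r for r in routes if not (avoid_tolls and r.has_toll)]
    if not candidates:
        raise ValueError("no route satisfies the constraints")
    return min(candidates, key=simulate_drive_time)


# Hypothetical candidate routes, for illustration only.
ROUTES = [
    Route("highway_a", distance_miles=45, average_speed_mph=60, has_toll=True),
    Route("highway_b", distance_miles=48, average_speed_mph=55, has_toll=False),
    Route("surface_streets", distance_miles=50, average_speed_mph=30, has_toll=False),
]

if __name__ == "__main__":
    print(pick_route(ROUTES).name)
    print(pick_route(ROUTES, avoid_tolls=True).name)
```

Keeping the simulator and the optimizer as separate functions means a better drive-time model can be swapped in without touching the selection logic.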

Let's consider how this is an application of the Drivetrain Approach. We have already defined our objective: building a car that drives itself. The levers are the vehicle controls we are all familiar with: steering wheel, accelerator, brakes, etc. Next, we consider what data the car needs to collect; it needs sensors that gather data about the road as well as cameras that can detect road signs, red or green lights, and unexpected obstacles (including pedestrians). We need to define the models we will need, such as physics models to predict the effects of steering, braking and acceleration, and pattern recognition algorithms to interpret data from the road signs.

As one engineer on the Google self-driving car project put it in a recent Wired article, "We're analyzing and predicting the world 20 times a second." What gets lost in the quote is what happens as a result of that prediction. The vehicle needs to use a simulator to examine the results of the possible actions it could take. If it turns left now, will it hit that pedestrian? If it makes a right turn at 55 mph in these weather conditions, will it skid off the road? Merely predicting what will happen isn't good enough. The self-driving car needs to take the next step: after simulating all the possibilities, it must optimize the results of the simulation to pick the best combination of acceleration and braking, steering and signaling, to get us safely to Santa Clara.

Prediction only tells us that there is going to be an accident. An optimizer tells us how to avoid accidents.
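The difference between predicting and optimizing can be sketched with the skidding question raised above. The prediction step uses a deliberately crude physics model (the car skids when the lateral acceleration needed for the turn exceeds the grip the road can supply), and the optimization step picks the fastest candidate speed that the model predicts is safe. The function names, the friction coefficients, and the candidate speeds are assumptions for illustration, not a description of how any real self-driving system works.

```python
from typing import Iterable, Optional

G = 9.81  # gravitational acceleration in m/s^2


def will_skid(speed_mps: float, turn_radius_m: float,
              friction_coefficient: float) -> bool:
    """Prediction: does the required lateral acceleration exceed available grip?"""
    required = speed_mps ** 2 / turn_radius_m
    available = friction_coefficient * G
    return required > available


def best_turn_speed(candidate_speeds_mps: Iterable[float], turn_radius_m: float,
                    friction_coefficient: float) -> Optional[float]:
    """Optimization: the fastest candidate speed predicted not to skid.

    Returns None when every candidate is predicted to skid.
    """
    safe = [v for v in candidate_speeds_mps
            if not will_skid(v, turn_radius_m, friction_coefficient)]
    return max(safe) if safe else None


if __name__ == "__main__":
    speeds = [5.0, 10.0, 14.0, 19.0, 24.6]  # 24.6 m/s is roughly 55 mph
    # Hypothetical friction values: 0.8 for a dry road, 0.4 for a wet one.
    print(will_skid(24.6, turn_radius_m=50, friction_coefficient=0.8))
    print(best_turn_speed(speeds, turn_radius_m=50, friction_coefficient=0.8))
    print(best_turn_speed(speeds, turn_radius_m=50, friction_coefficient=0.4))
```

`will_skid` alone only reports that a given action ends badly. `best_turn_speed` uses the same prediction across the available actions to select one that does not.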

Improving the data collection and predictive models is very important, but we want to emphasize the importance of beginning by defining a clear objective with levers that produce actionable outcomes. Data science is beginning to pervade even the most bricks-and-mortar elements of our lives. As scientists and engineers become more adept at applying prediction and optimization to everyday problems, they are expanding the art of the possible, optimizing everything from our personal health to the houses and cities we live in. Models developed to simulate fluid dynamics and turbulence have been applied to improving traffic and pedestrian flows by using the placement of exits and crowd control barriers as levers. This has improved emergency evacuation procedures for subway stations and reduced the danger of crowd stampedes and trampling during sporting events. Nest is designing smart thermostats that learn the homeowner's temperature preferences and then optimize their energy consumption. For motor vehicle traffic, IBM performed a project with the city of Stockholm to optimize traffic flows that reduced congestion by nearly a quarter and improved the air quality in the inner city by 25%. What is particularly interesting is that there was no need to build an elaborate new data collection system. Any city with metered stoplights already has all the necessary information; they just haven't found a way to suck the meaning out of it.

In another area where objective-based data products have the power to change lives, the CMU extension in Silicon Valley has an active project for building data products to help first responders after natural or man-made disasters. Jeannie Stamberger of Carnegie Mellon University Silicon Valley explained to us many of the possible applications of predictive algorithms to disaster response, from text-mining and sentiment analysis of tweets to determine the extent of the damage, to swarms of autonomous robots for reconnaissance and rescue, to logistic optimization tools that help multiple jurisdictions coordinate their responses. These disaster applications are a particularly good example of why data products need simple, well-designed interfaces that produce concrete recommendations. In an emergency, a data product that just produces more data is of little use. Data scientists now have the predictive tools to build products that increase the common good, but they need to be aware that building the models is not enough if they do not also produce optimized, implementable outcomes.

Source: Jeremy Howard, Margit Zwemer, and Mike Loukides. Designing Great Data Products. 2012
