Using Minimal Data For Maximum Impact

The right questions coupled with a problem-solving approach can turbocharge your AI ROI. Here’s how.

Our relationship with the world has changed almost unfathomably in the 21st century. Emerging technologies have revolutionized how we connect and the products we cherish. Every day, enterprises generate millions of terabytes of data, yet despite this abundance, many struggle to pinpoint value. The greatest opportunities for growth and competition are too often lost in a sea of data. This gap between the questions asked and the data needed to answer them leaves customers disconnected from the products they love and use, and blocks companies from realizing meaningful growth. The challenge is multifaceted: an overemphasis on technology for its own sake, a maze of available data with no clear path and a misalignment between technical potential and strategic business goals. Together, these factors make it harder to tap the full potential of data and truly drive enterprise growth.

Typically, organizations begin with the “universe of data,” focusing on what data is collected and how it needs to be cleaned, organized and governed. However, a more effective approach starts with the “universe of questions.” This involves evaluating hypotheses, identifying how different use cases connect and reverse-engineering to determine which data is essential for validating these hypotheses and answering critical questions.

The path to harnessing the power of big data is marked not by the quantity of data gathered but by the quality of questions asked and the precision with which data is used. By focusing on targeted data collection and analysis, organizations can unlock significant value, driving strategic growth and operational efficiency in an increasingly data-saturated world.

You Do Not Need a Lot of Data to Get Started, You Need the MVD

The concept of minimum viable data (MVD) is inspired by the lean startup methodology’s minimum viable product (MVP). In data science and AI, MVD denotes the smallest dataset required to train a machine learning (ML) model to a specified performance standard. This idea of “smart sizing” data has been pivotal in the shift toward data-centric AI, where the focus is on the quality, relevance and cleanliness of data rather than its quantity. This approach advocates for a more efficient and focused method of data collection and preparation.

Andrew Ng, a pioneer in the development of deep learning and artificial intelligence, champions this “data-centric AI” approach, prioritizing data quality over complex AI algorithms, which are now more accessible than ever. Ng argues that the key to effectively leveraging AI lies in the meticulous selection, preparation and governance of data. By employing strategies that optimize data use—requiring less but more precise data—businesses can develop efficient AI systems that compete with those of tech giants, even with smaller datasets. Ng’s emphasis on data consistency and strategic curation of training sets signals a significant shift toward more intelligent, accessible AI development. In this model, continuous data refinement and system retraining are crucial for success in AI.

Embracing MVD enables organizations to use resources more efficiently, shorten data processing times and accelerate the iterative cycles of model training and refinement. MVD is not just about reducing data size; it’s about intelligently identifying which data elements are most crucial for model performance and decision-making. This streamlined approach facilitates quicker deployment and agility in adjusting models based on initial insights and performance feedback. For example, in a customer sentiment analysis tool, the MVD might be recent reviews and ratings, focusing on keywords and sentiments rather than extensive historical transactional data or detailed customer profiles. In predictive maintenance for manufacturing, MVD would likely include recent operational data and error logs, emphasizing timeliness and specificity over comprehensive historical records.

MVD promotes a focus on data quality: accuracy, consistency and relevance to the problem at hand. Starting with high-quality, targeted data can significantly enhance model performance, even with a smaller dataset. This initial model serves as a foundation, similar to an MVP, which can be expanded and refined with additional data based on user feedback and emerging needs. By identifying the MVD, organizations make strategic decisions about data acquisition and curation, guiding them on what types of data are most valuable and what additional data might be necessary to tackle specific challenges or enhance model accuracy. Through these practices, MVD emerges as an approach that balances efficiency with effectiveness, enabling smarter data practices focused on achieving substantial, meaningful results.
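
One way to make "the smallest dataset that reaches a specified performance standard" concrete is to train on progressively larger slices of data and stop at the first slice that meets the target on held-out data. The sketch below is illustrative only: the synthetic data, the toy threshold model, the candidate sizes and the 0.80 target are all assumptions, and the names (find_mvd_size, evaluate and so on) are hypothetical rather than taken from the book.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Row = Tuple[float, int]  # (feature value, binary label)


def find_mvd_size(
    sizes: Sequence[int],
    evaluate: Callable[[int], float],
    target: float,
) -> Optional[int]:
    """Return the smallest candidate size whose holdout score meets the target."""
    for n in sorted(sizes):
        if evaluate(n) >= target:
            return n
    return None  # no candidate size reached the target


def make_rows(count: int, rng: random.Random) -> List[Row]:
    """Synthetic stand-in data: one noisy feature centered on +1 or -1 by label."""
    rows = []
    for _ in range(count):
        label = rng.randint(0, 1)
        feature = rng.gauss(1.0 if label else -1.0, 1.0)
        rows.append((feature, label))
    return rows


def fit_threshold(rows: List[Row]) -> float:
    """Toy model: the midpoint between the two class means."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    if not pos or not neg:
        return 0.0  # fall back when a slice contains only one class
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold: float, rows: List[Row]) -> float:
    """Share of rows where 'feature above threshold' matches the label."""
    hits = sum((x > threshold) == bool(y) for x, y in rows)
    return hits / len(rows)


rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_rows(2000, rng)
holdout = make_rows(500, rng)


def evaluate(n: int) -> float:
    """Train on the first n rows, score on the untouched holdout set."""
    return accuracy(fit_threshold(train[:n]), holdout)


CANDIDATE_SIZES = [20, 50, 100, 500, 2000]  # assumed sizes to try
mvd_size = find_mvd_size(CANDIDATE_SIZES, evaluate, target=0.80)
print("smallest size meeting target:", mvd_size)
```

Stopping at the first size that clears the target is a simplification, since scores on small slices can fluctuate. In practice one would likely repeat each size over several random samples before settling on a number.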

Asking the Right Questions

Asking the right questions is critical in defining an optimal MVD set, ensuring that the focus is tightly aligned with the intended outcomes of a project. This process begins with a thorough understanding of the problem statement and the specific goals of the model. Such clarity helps in discerning which data elements are essential and which are superfluous. Teams should consider the key predictors of the outcome, the availability and quality of the data, the necessary granularity and timeliness of the data, and the cost implications of data acquisition and maintenance. Prioritizing data that provides the greatest predictive power and relevance to the task at hand guides the selection process, emphasizing the most impactful variables and eliminating unnecessary information. This method streamlines data collection and processing efforts.
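
As a rough illustration of prioritizing data by predictive power, the sketch below ranks candidate data elements by the strength of their linear association with an outcome. It rests on several assumptions: the column names and values are invented, Pearson correlation is only one possible proxy for predictive power (it misses nonlinear effects), and cost and timeliness would still need to be weighed separately.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_features(
    columns: Dict[str, List[float]], outcome: List[float]
) -> List[Tuple[str, float]]:
    """Sort candidate columns by absolute correlation with the outcome."""
    scores = {name: abs(pearson(values, outcome)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical candidate data elements for a satisfaction model.
satisfaction = [1, 2, 3, 4, 5, 6]
candidates = {
    "account_age": [5, 1, 4, 2, 6, 3],       # weakly related to the outcome
    "store_id": [7, 7, 7, 7, 7, 7],          # constant, carries no signal
    "recent_rating": [2, 4, 6, 8, 10, 12],   # moves in step with the outcome
}

ranking = rank_features(candidates, satisfaction)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Columns near the bottom of such a ranking are candidates for exclusion from the MVD, subject to domain judgment about whether they matter in ways a linear measure cannot see.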

This targeted approach to defining MVD not only speeds up the development and deployment of ML models but also ensures these processes are constructed on a data foundation most indicative of the desired outcomes. Regularly revisiting these questions facilitates the iterative refinement of the dataset, enabling it to adapt to changes in project goals, emergent insights or shifts in the external environment. As a result, the MVD evolves over time, staying aligned with the project’s objectives and the dynamic nature of data. By strategically focusing on the right data from the beginning, organizations can improve the efficiency, effectiveness and economic viability of their data-driven initiatives, leading to more precise decision-making and enhanced outcomes.

A few years ago, about six months after launching Theory+Practice, I had the opportunity to collaborate with one of the largest logistics companies in the world. The marketing team, which focused on e-commerce and retail marketing as well as customer engagement, was tasked with identifying AI and ML use cases and gathering the necessary data. During a meeting in Memphis, I learned they had pinpointed 120 different use cases, ranging from customer segmentation to advanced recommendation systems and algorithms for suggesting the next best action. The team had also identified 25 different datasets.

Sitting there, I was puzzled by the lack of clear prioritization criteria, value metrics or defined return-on-investment goals that should have guided the ranking of these use cases. It struck me as inefficient: Without knowing which use cases to prioritize, how could we possibly determine the most appropriate datasets? Additionally, there were concerns about identity resolution and the ability to create a comprehensive 360° view of customers across different datasets—one might capture online interactions, another could reveal insights into promotional and price sensitivities, and a third might inform us about customer preferences regarding communication methods and timing.

Despite the team’s diligent effort in identifying these use cases, there was no clear roadmap or guiding principles to ensure value creation, increase efficiency and reduce duplication. The necessary connections were not being made to ensure that investments in the data foundation would maximize the addressable use cases, rather than perpetually starting from scratch with new initiatives.

This lack of strategic foresight in the initial stages of project planning presented a significant barrier to leveraging the full potential of AI and ML technologies. The absence of a structured roadmap not only hindered the team’s ability to effectively use the identified datasets but also complicated the integration of new technologies into existing workflows. The realization of these challenges led to a fundamental shift in our approach.

We initiated a comprehensive review of the use cases and datasets to establish a hierarchy based on potential impact and feasibility. This involved setting clear, measurable objectives for each use case, aligning them with overarching business goals and identifying key performance indicators to track progress and outcomes. We also emphasized the importance of data integration, ensuring that each dataset could be harmoniously linked to provide a unified and complete view of customer interactions.

Through these adjustments, we aimed to create a more efficient and targeted strategy that not only reduced redundancy and waste but also enhanced the overall effectiveness of the team’s efforts. By prioritizing use cases with the highest potential for ROI and ensuring a cohesive data strategy, the organization could better align its technological investments with its strategic objectives, paving the way for more informed decision-making and robust customer engagement strategies.
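
A hierarchy of this kind can be sketched in a few lines. The example below is hypothetical throughout: the use cases, the 1-to-5 scores, the dataset names and the choice of impact multiplied by feasibility as the ranking rule are assumptions for illustration, not the scoring actually used in the engagement described above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class UseCase:
    name: str
    impact: int        # assumed scale: 1 (low) to 5 (high)
    feasibility: int   # assumed scale: 1 (hard) to 5 (easy)
    datasets: FrozenSet[str]

    @property
    def score(self) -> int:
        """Simple priority score: impact multiplied by feasibility."""
        return self.impact * self.feasibility


def prioritize(use_cases: Iterable[UseCase]) -> List[UseCase]:
    """Order use cases from highest to lowest priority score."""
    return sorted(use_cases, key=lambda uc: uc.score, reverse=True)


def dataset_demand(use_cases: Iterable[UseCase]) -> Counter:
    """Count how many of the given use cases depend on each dataset."""
    demand = Counter()
    for uc in use_cases:
        demand.update(uc.datasets)
    return demand


# Hypothetical backlog; the real list had 120 use cases and 25 datasets.
backlog = [
    UseCase("next_best_action", 5, 2,
            frozenset({"web_interactions", "promo_response", "contact_preferences"})),
    UseCase("customer_segmentation", 4, 5,
            frozenset({"web_interactions", "transactions"})),
    UseCase("promo_targeting", 4, 4,
            frozenset({"promo_response", "transactions"})),
]

ranked = prioritize(backlog)
top_two_demand = dataset_demand(ranked[:2])

for uc in ranked:
    print(uc.name, uc.score)
print(top_two_demand.most_common())
```

Counting dataset demand across the top-ranked use cases shows which data investments serve several initiatives at once, which is the kind of connection that prevents each new project from starting from scratch.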

The right questions, rooted in a deep understanding of the underlying problems and challenges, illuminate potential avenues for both value creation and capture. These inquiries facilitate the identification of connections among various use cases, fostering the development of a strategic roadmap. This roadmap is designed to enhance efficiency and minimize redundancies by pinpointing the most relevant datasets that maximize the predictive and analytical capabilities of statistical and ML models. It’s crucial to recognize that our ultimate goal is to influence specific behaviors or to make more informed decisions. This goal is closely linked to the insights gleaned from data and the quality of data employed in the models and algorithms. A rigorous approach to data selection and use directly strengthens the impact and efficacy of our data-driven initiatives, ensuring that they are not only aligned with but also advance the organization’s strategic objectives.

This is an edited extract from Behavioural AI: Unleash Decision Making with Data, by Rogayeh Tabrizi (published by Wiley, February 2025).

