Machine learning has become an indispensable tool for businesses seeking to gain insights, automate decisions, and stay competitive. But the rapid pace of AI research can be dizzying for companies trying to adopt ML. Should you invest in sophisticated deep neural networks or proven gradient boosted decision trees? Does generative AI live up to the hype? A rigorous new study suggests that going back to ML fundamentals may be the best approach.
Researchers conducted the largest-ever analysis of machine learning algorithms on tabular data, comparing 19 techniques across 176 datasets. This monumental study had two goals: settle the debate over neural nets versus decision trees for tabular data, and determine what properties make certain algorithms succeed or fail.
Tabular data, organized into rows (records) and columns (features), is the most common type used by businesses for tasks like predicting customer churn, forecasting sales, or personalizing recommendations.
Neural networks have revolutionized fields like computer vision, but their value on this core business data has been unclear.
The study yielded surprising results with important implications. Three key findings should guide business ML strategy:
- In many cases, basic algorithms performed just as well as cutting-edge neural nets. Properly tuning and configuring established methods like random forests had more impact than the choice of model.
- On average, CatBoost (a gradient boosted decision tree technique) and TabPFN (a transformer-based neural net pretrained on synthetic tabular data) achieved top results. But different algorithms succeeded on different datasets, and no single approach dominated.
- Decision trees handled messy real-world data better. Neural nets had an edge on smaller, cleaner datasets.
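To make the second finding concrete, here is a minimal sketch of how one model can rank best on average while still losing on individual datasets. The scores and the model and dataset names are hypothetical, invented purely for illustration; they are not results from the study.

```python
from statistics import mean

# Hypothetical accuracy scores, invented purely for illustration.
# They are NOT results from the study.
scores = {
    "dataset_a": {"gbdt": 0.91, "neural_net": 0.88, "random_forest": 0.90},
    "dataset_b": {"gbdt": 0.80, "neural_net": 0.86, "random_forest": 0.79},
    "dataset_c": {"gbdt": 0.75, "neural_net": 0.70, "random_forest": 0.77},
    "dataset_d": {"gbdt": 0.93, "neural_net": 0.89, "random_forest": 0.90},
}


def rank_models(dataset_scores):
    """Return {model: rank} for one dataset, where rank 1 is the best score."""
    ordered = sorted(dataset_scores, key=dataset_scores.get, reverse=True)
    return {model: position for position, model in enumerate(ordered, start=1)}


def average_ranks(all_scores):
    """Average each model's rank across all datasets (lower is better)."""
    per_dataset = [rank_models(s) for s in all_scores.values()]
    models = per_dataset[0].keys()
    return {m: mean(r[m] for r in per_dataset) for m in models}


def winners(all_scores):
    """Return the best-scoring model for each dataset."""
    return {name: max(s, key=s.get) for name, s in all_scores.items()}


avg = average_ranks(scores)
best_on_average = min(avg, key=avg.get)
per_dataset_winner = winners(scores)

print("Average rank per model:", avg)
print("Best on average:", best_on_average)
print("Winner per dataset:", per_dataset_winner)
```

In this toy table the boosted-tree model has the best average rank, yet two of the four datasets are won by other models. That is the pattern the finding describes: an average ranking is a reasonable starting point, not a substitute for testing on your own data.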
These data-driven insights contradict the AI hype cycle.
Businesses need to take a disciplined, back-to-basics approach focused on fundamentals like data quality, feature engineering, validation, and classic techniques before considering glittering advances like generative AI. Matching algorithms to business data is key for ML success.
Business Leaders: Simpler ML Should Not Be Overlooked
First, don’t get sucked into the hype of chasing the latest, greatest AI algorithms. Simpler ML approaches like random forests and logistic regression should not be overlooked — they frequently perform just as well as cutting-edge neural nets.
Second, focus your efforts on properly tuning and configuring established algorithms like CatBoost rather than endlessly testing new models. Optimization is more valuable than being on the bleeding edge.
Finally, match the algorithm to the data. For handling messy real-world data, gradient boosted decision trees are remarkably robust. But for smaller or cleaner datasets, neural networks may excel.
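The tuning advice above can be sketched with a toy example. This is an illustration under stated assumptions rather than the study's method: a tiny k-nearest-neighbors classifier stands in for an established algorithm, the data is synthetic, and the candidate values for `k` are arbitrary. In practice you would tune a real library model such as a gradient boosted tree in the same way.

```python
import random


def make_data(n, rng):
    """Synthetic 1-D data: label is 1 when x > 0.5, with 15% of labels flipped."""
    rows = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < 0.15:
            label = 1 - label  # inject label noise
        rows.append((x, label))
    return rows


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (k must be odd)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(votes * 2 > k)


def accuracy(train, valid, k):
    """Fraction of validation points classified correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in valid)
    return hits / len(valid)


rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_data(200, rng)
valid = make_data(100, rng)

default_k = 1                        # the untuned starting configuration
candidate_ks = [1, 3, 5, 9, 15, 25]  # arbitrary search grid (assumption)
results = {k: accuracy(train, valid, k) for k in candidate_ks}
best_k = max(results, key=results.get)

print("Validation accuracy by k:", results)
print("Default k:", default_k, "accuracy:", results[default_k])
print("Best k:", best_k, "accuracy:", results[best_k])
```

The model family never changes here; only its configuration does. Because the best setting is picked on the validation set, its score is optimistic, so a real project would confirm it on a separate held-out test set.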
The Bigger Picture: Data Quality and Feature Engineering
This study highlights the fact that picking the right algorithm is just one piece of the ML puzzle for businesses. Ensuring quality data, engineering the right features, and proper validation matter just as much. No algorithm can overcome dirty or biased data.
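As a rough illustration of what checking data quality can mean in practice, the sketch below audits a handful of records for missing values and exact duplicate rows before any model is trained. The records, column names, and the rule that blank strings count as missing are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical customer records, invented for illustration only.
rows = [
    {"customer_id": 1, "age": 34, "plan": "basic"},
    {"customer_id": 2, "age": None, "plan": "premium"},
    {"customer_id": 3, "age": 51, "plan": ""},
    {"customer_id": 2, "age": None, "plan": "premium"},  # exact duplicate
]


def is_missing(value):
    """Treat None and blank strings as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def missing_rate(records):
    """Fraction of missing values per column."""
    columns = records[0].keys()
    return {
        col: sum(is_missing(r.get(col)) for r in records) / len(records)
        for col in columns
    }


def duplicate_count(records):
    """Number of rows that repeat an earlier row exactly."""
    counts = Counter(
        tuple(sorted(r.items(), key=lambda kv: kv[0])) for r in records
    )
    return sum(n - 1 for n in counts.values())


report = {
    "missing_rate": missing_rate(rows),
    "duplicates": duplicate_count(rows),
}
print(report)
```

A report like this is only a first pass. It says nothing about bias or mislabeled values, but it catches the kind of basic problems that no choice of algorithm will fix.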
As such, companies should take a holistic view, investing in data infrastructure, monitoring, and lifecycle management. Hiring expert ML engineers and architects to oversee this process is as important as choosing between CatBoost and TabNet, if not more so.
The paper also revealed gaps where current algorithms struggle, summarized in a challenging benchmark suite. Progress on these hard problems will better position businesses to take advantage of AI on complex real-world data.
Generative AI’s Role
Another recent paper, “ML-Bench,” evaluated large language models on practical programming tasks using machine learning libraries. The models struggled to make effective use of libraries and documentation the way human developers do. This highlights the limitations of today’s generative AI.
While generative models show promise for content creation, they lack the reasoning, comprehension, and tool usage skills needed for many business use cases. Their role should be carefully evaluated rather than overstated.
Taken together, this research dispels some AI hype, guides smarter ML solutions, and outlines a measured approach to leveraging generative models. Matching algorithms to data and focusing on fundamentals over trends is the recipe for ML success.
Final Thoughts
The takeaway is clear: companies should adopt a pragmatic, back-to-basics approach as they integrate ML into their operations. Rather than chase the latest hype, invest in quality data, pipeline infrastructure, and fundamental techniques like random forests and gradient boosting that deliver consistent value.
Once these foundations are in place, generative AI can play a supporting role in increasing human productivity through responsible and gradual implementation. For example, large language models could assist developers in leveraging libraries or provide customer service agents with suggested responses. But these systems require meticulous design to avoid harmful mistakes.
For most organizations yet to implement machine learning, the biggest wins lie in finally deploying core analytical capabilities at scale. Robust prediction, personalization, forecasting and optimization systems built on classic methods will propel competitiveness. Rather than stretch for speculative advances, implement the long-established techniques that continue generating immense value.
By taking this measured approach, companies can extract the maximum benefit from AI while managing risks wisely. Focus resources on what delivers results today, and selectively incorporate emerging technologies once existing systems are smooth and productive. With the right strategy, businesses can thrive in the age of artificial intelligence.
Originally published in In Plain English.