Pentaho Unveils AI-Ready Version 11 Platform Update

Today we’re joined by Chloe Maraina, a Business Intelligence expert with a passion for transforming big data into compelling visual stories. With the industry racing to build a trusted data foundation for AI, we’ll explore how Pentaho’s Version 11 platform update is tackling this challenge head-on. Our conversation will delve into the practical impact of new browser-based tools on daily workflows, the crucial role of semantic modeling in creating data consistency, and how containerization simplifies deployments in complex multi-cloud environments. We’ll also examine how these advancements directly support the rigorous data quality demands of AI development and look ahead to the exciting future of “AI for data” concepts.

With the new browser-based Pipeline Designer, how does the daily workflow change for data teams, and what new capabilities does this unlock for less-technical business users? Please share a step-by-step example of a data integration task that is now significantly simpler to complete.

It’s a complete game-changer for daily operations. The biggest shift is the removal of friction. Before, a significant hurdle was just getting started; you had to deal with local installations and specific system configurations, which often created a bottleneck and required dedicated engineering support. Now, it’s all handled within the browser. This shift truly democratizes the process. For example, think about a marketing analyst who needs to pull campaign performance data from Google, blend it with sales figures from an internal database, and load it into a dashboard. Previously, this was a ticket in an engineer’s queue. Today, that same analyst can open a browser tab, visually drag and drop the sources, apply transformations through an intuitive interface, and create that job themselves. This dramatically accelerates pipeline development and empowers business users to be more self-sufficient.
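To make that step-by-step flow concrete, here is a minimal sketch of the same extract-blend-load task written in pandas. This is an illustration of the task, not Pentaho's own tooling, and the file name, connection string, table, and column names (campaign_id, order_total, and so on) are all hypothetical.

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical sources: a campaign performance export (CSV) and an internal sales database.
campaigns = pd.read_csv("campaign_export.csv")  # columns: campaign_id, campaign_name, spend
engine = create_engine("postgresql://analyst:secret@internal-db/sales")  # placeholder connection
sales = pd.read_sql("SELECT campaign_id, order_total FROM orders", engine)

# Blend: join campaign performance with sales figures on campaign_id.
blended = campaigns.merge(sales, on="campaign_id", how="left")

# Transform: total revenue, spend, and order count per campaign.
summary = (
    blended.groupby(["campaign_id", "campaign_name"], as_index=False)
           .agg(spend=("spend", "first"),
                revenue=("order_total", "sum"),
                orders=("order_total", "count"))
)

# Load: write the prepared table for the dashboard to read.
summary.to_sql("campaign_performance", engine, if_exists="replace", index=False)
```

In the browser-based Pipeline Designer, each of these steps corresponds to a visual node the analyst drags onto the canvas rather than code they have to write.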

The new Semantic Model Editor aims to create consistent data definitions across an organization. What were the most common challenges with the older tools it replaces? Can you share an anecdote illustrating how standardizing metadata with this new editor prevents costly errors in analytics projects?

The older tools it replaces, like Schema Workbench and Data Source Wizard, were functional but often led to a ‘wild west’ of metadata. Different departments would create their own definitions for the same business terms, leading to massive inconsistencies. I remember a project where the finance team’s definition of “quarterly revenue” differed from the sales team’s because of how they handled returns and credits. When we built a company-wide performance dashboard, the numbers clashed, creating weeks of painful reconciliation and a huge loss of trust in the data from leadership. The new Semantic Model Editor centralizes this process in a clean, web-based workflow. By creating one standardized, governed definition for “quarterly revenue,” we ensure that every report, every dashboard, every analysis across the entire organization is speaking the same language. This prevents those costly errors and builds a foundation of trust that is absolutely essential for making data-driven decisions.
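To illustrate the underlying idea (this is not the Semantic Model Editor's actual format), a single governed definition of "quarterly revenue" might be captured once and shared everywhere, as in the sketch below; the column names and the sale/return/credit rule are assumptions for the example.

```python
import pandas as pd

def quarterly_revenue(transactions: pd.DataFrame) -> pd.DataFrame:
    """One shared, governed definition used by every report and dashboard.

    Expects columns: amount, type ('sale', 'return', 'credit'), transaction_date.
    Returns and credits reduce revenue -- the exact rule that caused the
    finance/sales discrepancy when each team defined it differently.
    """
    df = transactions.copy()
    df["transaction_date"] = pd.to_datetime(df["transaction_date"])
    df["signed_amount"] = df["amount"].where(df["type"] == "sale", -df["amount"])
    df["quarter"] = df["transaction_date"].dt.to_period("Q")
    return (df.groupby("quarter", as_index=False)["signed_amount"]
              .sum()
              .rename(columns={"signed_amount": "quarterly_revenue"}))

# Both the finance and sales dashboards call the same definition,
# so their numbers cannot diverge.
transactions = pd.DataFrame({
    "amount": [1000.0, 200.0, 50.0],
    "type": ["sale", "return", "credit"],
    "transaction_date": ["2025-01-15", "2025-02-10", "2025-02-11"],
})
print(quarterly_revenue(transactions))  # 2025Q1 -> 750.0
```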

The Project Profile feature containerizes data integration jobs for easier deployment. How does this practically help a DevOps engineer manage workflows in a complex hybrid or multi-cloud environment? Could you describe a common deployment challenge this feature directly solves and the potential time savings?

For a DevOps engineer, managing deployments across hybrid and multi-cloud environments is a constant battle with dependencies and configuration drift. A common nightmare is developing a data workflow that works perfectly in a local development environment but breaks when deployed to a production cloud server because of a slight difference in a configuration file or a missing dependency. Project Profile directly solves this by packaging the related jobs, transformations, and all their configuration files into a single, modular container. This means the engineer can move that entire self-contained project from on-premises to AWS, then to Azure, with confidence that it will run identically everywhere. It eliminates the tedious, error-prone process of manually reconfiguring everything for each environment, easily saving hours, if not days, on complex deployment cycles and significantly reducing governance risk.
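The "self-contained bundle" idea can be sketched in a few lines. Everything below is hypothetical -- the manifest name project-profile.json, the bundle layout, and the run-job.sh runner are illustrative stand-ins, not Pentaho's actual Project Profile format -- but it shows why a project whose configuration travels with it behaves the same way in every environment.

```python
import json
import pathlib
import subprocess

# The bundle carries its jobs, transformations, and configuration together.
PROJECT_ROOT = pathlib.Path(__file__).parent

def run_project() -> None:
    # Configuration is read from inside the bundle itself -- never from /etc
    # or a per-server properties file -- so a laptop, an AWS instance, and an
    # Azure instance all see identical settings and identical behavior.
    manifest = json.loads((PROJECT_ROOT / "project-profile.json").read_text())
    for job in manifest["jobs"]:
        subprocess.run(
            [str(PROJECT_ROOT / "bin" / "run-job.sh"), str(PROJECT_ROOT / job)],
            check=True,  # fail fast rather than drifting silently
        )

if __name__ == "__main__":
    run_project()
```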

Enterprises are building trusted foundations for AI, which requires high-quality data. How do the new governance controls and semantic modeling capabilities specifically address the data consistency challenges unique to training AI models? Please provide an example of how this helps prevent a common AI development failure.

AI models are incredibly sensitive to data inconsistencies; garbage in, garbage out is an absolute truth. A common failure in AI development occurs when a model is trained on data with ambiguous or conflicting definitions. For example, imagine training a predictive customer churn model. If the training data has five different definitions of an “inactive customer” pulled from various legacy systems, the model will learn a muddled, unreliable pattern. The new semantic modeling capabilities prevent this by enforcing a single, standardized definition for every piece of metadata. The improved governance controls, like integration with identity providers such as Azure AD and Okta, then ensure only authorized users can alter these definitions. This creates a trusted, consistent, and secure data foundation, ensuring the AI model is trained on high-quality, unambiguous data, which is the only way to get accurate and reliable predictions.
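A minimal sketch of that "one definition everywhere" idea follows; the 90-day inactivity rule, the column names, and the sample extracts are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical canonical rule, replacing five conflicting legacy definitions:
# a customer is "inactive" if they have made no purchase in the last 90 days.
INACTIVITY_DAYS = 90

def label_inactive(customers: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    """Apply the single governed definition to every training record."""
    df = customers.copy()
    df["last_purchase"] = pd.to_datetime(df["last_purchase"])
    df["inactive"] = (as_of - df["last_purchase"]).dt.days > INACTIVITY_DAYS
    return df

# Extracts from two (of potentially many) legacy systems. Each system had its
# own "inactive" flag; those are ignored and the label is re-derived with the
# one shared rule before the churn model ever sees the data.
crm_extract = pd.DataFrame({"customer_id": [1, 2],
                            "last_purchase": ["2024-12-20", "2024-06-01"]})
erp_extract = pd.DataFrame({"customer_id": [3],
                            "last_purchase": ["2024-11-15"]})

as_of = pd.Timestamp("2025-01-01")
training_data = pd.concat(
    [label_inactive(crm_extract, as_of), label_inactive(erp_extract, as_of)],
    ignore_index=True,
)
print(training_data)
```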

Your roadmap includes “AI for data” concepts like semantic search and agentic workflows. How do you see these capabilities evolving to automate data discovery and preparation? What might an “agentic workflow” look like in practice for a data analyst in the near future?

These “AI for data” concepts are about to fundamentally change how analysts interact with data. Instead of manually searching through catalogs and databases, an analyst will use semantic search to simply ask a question in natural language, like “Find all datasets related to customer satisfaction in the last quarter.” The AI will understand the intent and return the relevant tables and files. An “agentic workflow” takes this a step further. An analyst could give a prompt like, “Prepare a dataset to analyze the impact of our last marketing campaign on sales in the Northeast region.” An AI agent would then automatically discover the relevant marketing and sales data, identify and join the correct tables, clean the data by handling missing values, and present a prepared dataset to the analyst. This automates the most time-consuming parts of data preparation, allowing the analyst to focus almost entirely on generating insights.
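Purely as a sketch of the shape such a workflow could take: the catalog, the keyword-matching "discovery" step, and the data below are invented stand-ins for what a real agent, backed by an LLM and a governed catalog, would do.

```python
import pandas as pd

def discover(prompt: str, catalog: dict[str, pd.DataFrame]) -> list[pd.DataFrame]:
    """Stand-in for semantic search: match prompt words against table names."""
    words = set(prompt.lower().split())
    return [df for name, df in catalog.items()
            if words & set(name.lower().split("_"))]

def prepare(datasets: list[pd.DataFrame]) -> pd.DataFrame:
    """Stand-in for the agent's join-and-clean step."""
    marketing, sales = datasets
    joined = marketing.merge(sales, on="campaign_id", how="inner")
    joined = joined[joined["region"] == "Northeast"]
    return joined.fillna({"order_total": 0})  # handle missing values

# Hypothetical catalog the agent would search.
catalog = {
    "marketing_campaigns": pd.DataFrame(
        {"campaign_id": [1, 2], "campaign_name": ["Spring Promo", "Fall Launch"]}),
    "sales_orders": pd.DataFrame(
        {"campaign_id": [1, 1, 2], "region": ["Northeast", "West", "Northeast"],
         "order_total": [120.0, 80.0, None]}),
}

prompt = ("Prepare a dataset to analyze the impact of our last marketing "
          "campaign on sales in the Northeast region")
prepared = prepare(discover(prompt, catalog))
print(prepared)
```

The analyst's only input is the natural-language prompt; everything after that is the kind of discovery, joining, and cleaning an agent would handle automatically.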

What is your forecast for data integration?

My forecast is that the future of data integration is inextricably linked to AI, evolving along two parallel tracks: “data for AI” and “AI for data.” The first track, “data for AI,” will focus on building robust, governed, and high-quality data foundations, because as we’ve discussed, AI initiatives are completely dependent on them. The second track, “AI for data,” is where we’ll see the most radical transformation. We’re moving beyond manual processes toward a future of intelligent automation. I expect to see AI-enabled discovery, semantic search, and agentic workflows become standard. These capabilities will automate the mundane and complex tasks of data preparation, allowing data professionals to deliver insights faster and enabling a much broader range of users to harness the power of their organization’s data.
