In a few clicks, build validated machine learning models for customer churn prediction, predictive maintenance, credit risk modeling, sentiment analysis, and more. Seamlessly combine R or Python code.
RapidMiner Auto Model builds validated models in 5 clicks, with automated machine learning.
Pre-built templates for common use cases including customer churn, predictive maintenance, fraud detection, and many more.
Visual Workflow Designer
Increase productivity across the entire data science team, from analysts to experts.
Accelerate and automate the creation of predictive models in a drag and drop visual interface.
Rich library of 1500+ algorithms and functions ensures the best model for any use case.
“Wisdom of Crowds” provides proactive recommendations at every step to help beginners.
Connect to any data source
Work with all of your data, no matter where it lives.
Create point and click connections to databases, enterprise data warehouses, data lakes, cloud storage, business applications, and social media.
Re-use connections at any time and easily share them with anyone who needs access.
Connect to new sources, including social media, with extensions from the RapidMiner Marketplace.
Automated In-database Processing
Run data prep and ETL inside databases to keep your data optimized for advanced analytics.
Query and retrieve data without writing complex SQL.
Harness the power of highly scalable database clusters.
Supports MySQL, PostgreSQL, and Google BigQuery.
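For illustration, pushing an aggregation into the database looks like the minimal Python sketch below; the connection string, table, and column names are hypothetical and the SQL generation would normally be handled for you.

```python
# Minimal sketch: push an aggregation into PostgreSQL instead of pulling raw rows.
# Connection string, table, and column names are hypothetical.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://analyst:secret@db-host:5432/sales")

# The GROUP BY runs inside the database; only the aggregated result is transferred.
query = """
    SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM orders
    GROUP BY region
"""
summary = pd.read_sql(query, engine)
print(summary.head())
```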
Data Visualization
Understand patterns, trends, and distributions with scatter plots, histograms, line charts, parallel coordinates, box plots, and more.
Explore data using robust statistical overviews and over 30 interactive visualizations.
Data profiling and exploration with the Statistics view
Summary statistics for categorical, numeric, and date-time data.
Quickly identify and fix common data quality problems including missing values and outliers.
Data Prep and Blending
Eliminate the hassle of preparing data for predictive modeling.
RapidMiner Turbo Prep offers a fully interactive point and click data prep experience.
Extract, join, filter, and group data across any number of sources.
Create repeatable data prep and ETL processes that can be scheduled and shared.
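The same extract, join, filter, and group steps can be sketched in a few lines of Python with pandas; file and column names below are hypothetical and stand in for whatever sources you blend.

```python
# Minimal sketch of the blending steps above (extract, join, filter, group)
# using pandas; file names and columns are hypothetical.
import pandas as pd

customers = pd.read_csv("customers.csv")          # extract
orders = pd.read_csv("orders.csv")

blended = customers.merge(orders, on="customer_id", how="inner")   # join
recent = blended[blended["order_year"] >= 2020]                    # filter

# group: total and average spend per segment
summary = recent.groupby("segment")["amount"].agg(["sum", "mean"])
print(summary)
```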
Machine Learning
Quickly create impactful machine learning models, without writing code. Choose from hundreds of supervised and unsupervised machine learning algorithms.
Implement basic and advanced ML techniques including regression, clustering, time-series, text analytics, and deep learning.
Build models that are sensitive to constraints like costs to optimize predictions for your desired business impact. Use both automated and manual feature engineering to optimize model accuracy.
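As a generic illustration of cost-sensitive modeling (not RapidMiner's internal mechanism), asymmetric error costs can be reflected through class weights or a cost-optimal decision threshold; the data and cost figures in this scikit-learn sketch are made up.

```python
# Illustrative sketch: make a classifier sensitive to asymmetric error costs.
# Costs and data are made up; this is not RapidMiner's internal mechanism.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Penalize missing the rare positive class 10x more than a false alarm.
model = LogisticRegression(class_weight={0: 1, 1: 10}, max_iter=1000)
model.fit(X_train, y_train)

# Alternatively, pick the probability threshold that minimizes expected cost.
cost_fp, cost_fn = 1.0, 10.0
probs = model.predict_proba(X_test)[:, 1]
thresholds = np.linspace(0.05, 0.95, 19)
costs = [(t, ((probs >= t) & (y_test == 0)).sum() * cost_fp
             + ((probs < t) & (y_test == 1)).sum() * cost_fn) for t in thresholds]
best_threshold = min(costs, key=lambda c: c[1])[0]
print("cost-optimal threshold:", best_threshold)
```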
RapidMiner Studio's model validation operators work with any machine learning model – just select the model to validate.
Understand the true performance of a model before deploying to production.
Eliminate overfitting through a unique approach that prevents model training pre-processing data from leaking into the application of the model.
Add proven techniques, like cross validation, to a model with just a single mouse click.
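The leakage-safe validation idea can be illustrated with a generic scikit-learn sketch (not RapidMiner's internal implementation): preprocessing and the model are wrapped into one unit so the transformation is re-fit inside every cross-validation fold instead of seeing the test data.

```python
# Minimal sketch of leakage-safe validation: wrap preprocessing and the model in
# one pipeline so the scaler is re-fit inside every cross-validation fold.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

pipeline = make_pipeline(StandardScaler(), SVC())   # preprocessing + model as one unit
scores = cross_val_score(pipeline, X, y, cv=10)     # scaler never sees the test fold
print("10-fold accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```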
Explainable Models Not Black Boxes
Create visual data science workflows that are easy to explain and easy to understand. Each step in the data prep, modeling, and validation process is documented for complete transparency.
Visuals are easy to explain to others in the organization to gain buy-in for deployment.
Supports the Local Interpretable Model-Agnostic Explanations (LIME) framework.
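For reference, the open-source `lime` package exposes the same idea in Python; the sketch below explains a single prediction of a generic tabular classifier and is only an illustration of the framework, not of RapidMiner's integration.

```python
# Hedged sketch using the open-source `lime` package to explain one prediction
# of a tabular classifier; any trained model with predict_proba would work.
import lime.lime_tabular
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
model = RandomForestClassifier(random_state=0).fit(iris.data, iris.target)

explainer = lime.lime_tabular.LimeTabularExplainer(
    iris.data, feature_names=iris.feature_names,
    class_names=iris.target_names, mode="classification")

# Explain a single row: which features pushed the prediction, and by how much.
explanation = explainer.explain_instance(iris.data[0], model.predict_proba, num_features=4)
print(explanation.as_list())
```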
Scalable code deployment and collaboration between coders and non-coders. Deploy code-based and code-containing models on a scalable platform.
Eliminate duplicate work and let others re-use your work by uploading code snippets to the RapidMiner repository for use in the visual workflow designer.
Make use of the vibrant Python ecosystem by building on top of specialized, cutting-edge libraries and making them available to the rest of your team through RapidMiner.
Leverage data prep and ETL pipelines others have produced.
Flexible Scoring
Turn predictive insights into business impact. Quickly deploy scored data to spreadsheets and data visualization tools or turn models into production web services with RapidMiner AI Hub.
RapidMiner Model Ops delivers an easy way for less advanced users to put models into production and properly manage them.
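A minimal scoring sketch in Python, assuming a previously saved model artifact and matching input columns (both hypothetical): load the model, score fresh records, and hand the results to a spreadsheet.

```python
# Minimal sketch: load a previously trained model, score fresh records, and write
# the results to a spreadsheet. File names and the joblib artifact are hypothetical;
# the input columns must match the features the model was trained on.
import joblib
import pandas as pd

model = joblib.load("churn_model.joblib")        # trained earlier and saved
new_customers = pd.read_csv("new_customers.csv")

new_customers["churn_score"] = model.predict_proba(new_customers)[:, 1]
new_customers.to_excel("scored_customers.xlsx", index=False)
```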
RapidMiner for Teams
RapidMiner AI Hub – Team collaboration, process automation, accelerated model creation, scaling, and deployment.
RapidMiner Go – Automated and guided machine learning in a web interface. Point-and-click data science for domain experts, business users, and analysts.
Request pricing or a 30 minute demo
Let’s create a plan for your specific needs!
Feature List
Data Access & Management
Access, load and analyze any type of data – both traditional structured data and unstructured data like text, images, and media – and transform unstructured data into structured data.
Access to more than 40 file types including SAS, ARFF, and Stata, plus access via URL
Wizards for Microsoft Excel & Access, CSV, and database connections
Access to NoSQL databases MongoDB and Cassandra
Write to Qlik QVX or Tableau TDE files
Access to Cloud storage like Dropbox and Amazon S3
Access to text documents and web pages, PDF, HTML, and XML
Support for all JDBC database connections including Oracle, IBM DB2, Microsoft SQL Server, MySQL, Postgres, Teradata, Ingres, VectorWise, and more
Access to full-text index & search platform SOLR
Access to Twitter & Salesforce.com
Repository-based data management on local systems or central servers via RapidMiner Server
Connect to Zapier and trigger Zapier tasks
Access to time series data, audio files, images, and many more
Enhanced data and metadata editor for repository entries
Data Exploration:
Descriptive Statistics
Univariate statistics: mean, median, minimum, maximum, standard deviation, number of missing values, and plots
Nominal / categorical attributes: number of categories, counts, mode, number of missing values
Date attributes: minimum, maximum, number of missing values
Distribution plots
Bivariate statistics and plots: Covariance matrix
Correlation matrix
Anova matrix
Grouped Anova
Transition matrix
Transition graph
Mutual information matrix
Rainflow matrix
Scaled and non-scaled mean-deviation plots
Plots of attribute weights based on multiple measures of association with the target
Simple rescaling of axes
Plots can be easily copied and pasted into other applications or exported in PNG, SVG, JPEG, EPS, or PDF formats
Choose from a variety of different color schemes
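For readers who prefer code, the same kind of overview can be sketched in a few lines of pandas; the CSV name is hypothetical.

```python
# Quick sketch of a statistical overview with pandas: univariate summaries,
# missing-value counts, and a correlation matrix. The CSV name is hypothetical.
import pandas as pd

data = pd.read_csv("customers.csv")

print(data.describe(include="all"))           # mean, std, min/max, counts, modes
print(data.isna().sum())                      # missing values per attribute
print(data.select_dtypes("number").corr())    # bivariate correlation matrix
```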
Graphs and Information
Easy-to-configure charts for fast insight generation from various visualizations: Scatter, scatter matrices
Line
Bubble
Parallel
Deviation
Box
3-D
Density
Histograms
Area
Bar charts, stacked bars
Pie charts
Survey plots
Self-organizing maps
Andrews curves
Quartile
Surface / contour plots, time series plots
Pareto / lift chart
Support for zooming and panning
Additional advanced chart engine for arbitrary definition of multiple charts including: on-the-fly grouping, filtering & aggregation
Data Prep Basics
Select attributes operator
Aggregations for multiple groups and functions like sum, average, median, standard deviation, variance, count, least, mode, minimum, maximum, product, or log product
Set operators like join, merge, append, union, or intersect
Operators for handling meta data like rename or attribute role definition
Filtering rows / examples according to range, missing values, wrong or correct predictions, or specific attribute value
Filtering outliers according to distances, densities, local outlier factors, class outlier factors, local correlation integrals, or clustering based outlier detections
Identification and removal of duplicates
Sampling
Absolute, relative, or probability-based
Balanced
Stratified
Bootstrapping
Model-based
Kennard-Stone
Range
Transformations
Normalization and standardization
Z-transformation, range transformation, proportion transformation, or interquartile ranges
Preprocessing models for applying the same transformations on test / scoring data (see the sketch after this list)
De-normalization making use of preprocessing models
Scaling by weights
All kinds of type conversions between numerical attributes, nominal / categorical attributes, and date attributes
Operator for guessing correct meta data from existing data sets
Adjustment of calendar dates and times
Sorting and Pareto sort
Shuffling
Rotations of data sets: Pivoting, De-Pivoting, and transposing data sets
Expression builder for arbitrary transformations on attributes: Statistical functions: round, floor, ceiling, average, minimum, maximum
Basic functions: addition, subtraction, multiplication, division, less than, greater than, less or equal, greater or equal, equal, not equal, Boolean not, Boolean and, Boolean or
Log and exponential functions: natural logarithm, logarithm base 10, logarithm dualis, exponential, power
Text functions: to string, to number, cut, concatenation, replace and replace all, lower, upper, index, length, character at, compare, contains, equals, starts with, ends with, matches, suffix, prefix, trim, escape HTML
Date functions: parse, parse with locale, parse custom, before, after, to string, to string with locale, to string with custom pattern, create current, difference, add, set, and get
Ensure high model quality through hold-out data sets
Create training, validation, and test data sets
Default stratification by the class if available
User-defined partitions possible
Resulting in example sets usable for modeling or further transformations
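The hold-out and preprocessing-model items above boil down to one pattern: fit every transformation on the training partition only and reuse the fitted "preprocessing model" on validation and test data. A minimal scikit-learn sketch of that pattern, using a z-transformation and a stratified hold-out on an illustrative data set:

```python
# Sketch of the hold-out + preprocessing-model pattern: fit the transformation on
# the training partition only, then reuse it unchanged on hold-out data.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)   # stratified hold-out

scaler = StandardScaler().fit(X_train)    # "preprocessing model" learned on training data
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)  # same z-transformation applied to hold-out data
```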
Binning
Interactive binning by user specification
Simple binning
Count-based
Size-based
Frequency-based
Entropy-based minimizing the entropy in the induced partitions
Handling of missing values as its own group
Weighting and Selection
Attribute weighting
– 30+ weighting schemes measuring the influence of attributes and forming the basis for weight-based selections (filter approach)
Attribute selection
– Selection of attributes by user specification
– Removal of “useless” attributes
– Removal of attributes unrelated to target based on a chi-square or correlation-based selection criterion
– Removal of attributes unrelated to target based on arbitrary weighting schemes like information gain, Gini index, and others
– Removal of attributes with missing values
– Selection of random attribute subsets
Attribute space transformations
– Principal Component Analysis (PCA)
– Singular Value Decomposition
Support for Fast Map
Plots for principal component coefficients, eigenvalues, and cumulative variance of eigenvalues
Calculates eigenvalues and eigenvectors from correlation and covariance matrices
Choose the number of components to be retained
Independent component analysis (ICA)
Generalized Hebbian Algorithm (GHA)
Dimensionality reduction with Self-Organizing Maps (SOM)
Correspondence Analysis
Attribute Generation
Operators for generating IDs, copies, concatenations, aggregations, products, Gaussian distributions, and more
Automatically optimized generation and detection of latent variables: Evolutionary weighting
Forward weighting
Backward weighting
Multiple algorithms for the automatic creation of new attributes based on arbitrary functions of existing attributes
Genetic programming
Similarity Calculation
Calculation of similarities between data points
Cross Distances operator computes similarities between data points of two data sets
Numerical distance measures
Euclidean
Canberra
Chebychev
Correlation
Cosine
Dice
Dynamic Time Warping
Inner product
Jaccard
Kernel-Euclidean
Manhattan
Max-Product
Overlap
Nominal / categorical distance measures
Nominal
Dice
Jaccard
Kulczynski
Rogers-Tanimoto
Russel-Rao
Simple Matching
Mixed Euclidean distance for cases with numerical & nominal attributes
Bregman divergences
Itakura-Saito
Kullback-Leibler
Logarithmic loss
Logistic loss
Mahalanobis
Squared Euclidean
Squared Loss
Clustering
User-defined number of clusters or automatic selection of the best number of clusters
Support Vector Clustering
Several strategies for encoding class into the clustering
k-Means (for all available distance and similarity measures)
k-Medoids (for all available distance and similarity measures)
Kernel k-Means
X-Means
Cobweb
Clope
DBScan
Expectation Maximization Clustering
Self-organizing maps
Agglomerative Clustering
Top Down Clustering
Operators for flattening hierarchical cluster models
Extraction of prototypes for centroid-based cluster models
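As a generic illustration of the "automatically chooses the best clusters" idea (not RapidMiner's algorithm), this scikit-learn sketch runs k-Means over a range of k values on toy data and keeps the clustering with the best silhouette score.

```python
# Minimal k-Means sketch with automatic choice of k via silhouette score.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

best_k, best_score = None, -1.0
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score = k, score

print("best k:", best_k, "silhouette:", round(best_score, 3))
```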
Market Basket Analysis
Associations and sequence discovery
Measuring quality of rules by support, confidence, Laplace, gain, ps-value, lift, or conviction
Interactive filter for frequent item sets
Interactive visualization of association rules as a network graph
Rules description table
User defined rule filtering depending on minimum value for the above criteria or matching criteria for specific items
Fast FP-Growth operator (produces the same results as Apriori but far more efficiently)
Generalized sequential patterns
Modular operators for the creation of frequent item sets or association rules only
Post-processing to unify item sets
Application of association rules to deploy as a recommendation engine
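For orientation, the FP-Growth / association-rule workflow looks roughly like the Python sketch below, assuming the open-source mlxtend package is available; the transactions are toy data and the thresholds are illustrative.

```python
# Sketch of frequent-itemset mining and rule generation with the open-source
# mlxtend package (assumed available); transactions are toy data.
import pandas as pd
from mlxtend.frequent_patterns import association_rules, fpgrowth
from mlxtend.preprocessing import TransactionEncoder

transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]

encoder = TransactionEncoder()
onehot = pd.DataFrame(encoder.fit_transform(transactions), columns=encoder.columns_)

itemsets = fpgrowth(onehot, min_support=0.4, use_colnames=True)   # frequent item sets
rules = association_rules(itemsets, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```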
Decision Trees
Easy-to-understand models
Supported methods: classification and regression trees (CART), CHAID, decision stumps, ID3, C4.5, Random Forest, bagging and boosting
Support for multi-way trees
Gradient Boosted Trees (GBT)
Pre-pruning and pruning
Split criteria include information gain, gain ratio, accuracy, and Gini index
Error-based and confidence-based pruning
Distribution shown at tree leaves
Height of distribution bars correlates to the number of examples in each leaf
Majority class shown at tree leaves
Class counts shown as tool tip at tree leaves
The darkness of connections correlates with the number of examples on each path
Graphical and textual representation of trees
Interactive visualization of trees including selecting and moving of nodes
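A generic scikit-learn sketch of the same ideas – Gini split criterion, depth-based pre-pruning, and a graphical rendering of the learned tree – on an illustrative data set:

```python
# Minimal decision-tree sketch: Gini split criterion, depth-based pre-pruning,
# and a graphical rendering of the learned tree.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

data = load_iris()
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(data.data, data.target)

plot_tree(tree, feature_names=data.feature_names, class_names=data.target_names,
          filled=True)   # node color intensity reflects the class distribution
plt.show()
```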
Rule Induction
Recursive technique with easy-to-read results
Especially useful for modeling rare events, e.g. for subgroup discovery
Supported methods: rule induction, single rule induction, single attribute, subgroup discovery, tree to rules
Supported splitting criteria include information gain and accuracy
Definition of pureness of rules
Error-based pruning
Easy to read and parse representation of rule sets as textual descriptions or tables
Bayesian Modeling
Naïve Bayes
Kernel Naïve Bayes
Bayes models can be updated and are therefore especially suitable for large data sets or online stream mining
Regression
Linear
Logistic
Generalized Linear Model (H2O)
Kernel Logistic Regression
Linear Discriminant Analysis (LDA)
Quadratic Discriminant Analysis (QDA)
Regularized Discriminant Analysis (RDA)
Stepwise forward and backward selection
Selection with M5’, t-test or iterative t-test
Seemingly unrelated regression
Vector linear regression
Polynomial regression
Local polynomial regression
Gaussian Processes
Neural networks
Flexible network architectures with different activation functions
Multiple layers with different numbers of nodes
Different training techniques
Perceptron
Multilayer Perceptron
Deep Learning (H2O)
Automatic optimization of the learning rate and network size during training
Support Vector Machines
Powerful and robust modeling techniques for large numbers of dimensions
Offers overfitting control by regularization
Especially suitable for modeling unstructured information like text data
More than 10 different methods for support vector classification, regression, and clustering
Support Vector Machine
Relevance vector machine
Linear, Evolutionary, PSO, Fast Large Margin, Hyper Hyper
Kernel functions include dot, radial basis function, polynomial, neural, Anova, Epanechnikov, Gaussian combination, or multiquadric
Simple support vector machines suitable for use with boosting
Linear-time support vector machine for fast training also for large numbers of dimensions and examples
Memory-Based Reasoning
k-Nearest Neighbors for classification and regression
Locally weighted learning
Optimized scoring through a ball-tree search data structure
Model Ensembles
Hierarchical models
Combination of multiple models to form a potentially stronger model
Vote
Additive regression
Ada boost
Bayesian boosting
Bagging
Stacking
Classification by regression
Meta cost for defining costs for different error types and detecting optimal models avoiding expensive errors
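Several of the listed ensemble types (voting, bagging, stacking) can be illustrated with a generic scikit-learn sketch on an example data set; the base learners and fold counts are arbitrary choices.

```python
# Sketch of combining base learners into voting, bagging, and stacking ensembles.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (BaggingClassifier, StackingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
base = [("tree", DecisionTreeClassifier(max_depth=4)),
        ("logreg", LogisticRegression(max_iter=5000))]

ensembles = {
    "voting": VotingClassifier(estimators=base, voting="soft"),
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50),
    "stacking": StackingClassifier(estimators=base,
                                   final_estimator=LogisticRegression(max_iter=5000)),
}
for name, model in ensembles.items():
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```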
Model Validation
RapidMiner Studio follows a stringent modular approach which prevents information used in pre-processing steps from leaking from model training into the application of the model. This approach ensures that no overfitting is introduced and that prediction performance is not overestimated.
Performance criteria for numerical and nominal / categorical targets, including:
Accuracy
Classification error
Kappa
Area under curve (AUC)
Precision
Recall
Lift
Fallout
F-measure
False positives
False negatives
True positives
True negatives
Sensitivity
Specificity
Youden index
Positive predictive value
Negative predictive value
PSEP
Correlation
Spearman rho
Kendall tau
Squared correlation
Absolute error
Relative error
Normalized absolute error
Root mean squared error (RMSE)
Root relative squared error (RRSE)
Squared error
Cross entropy
Margin
Soft margin loss
Logistic loss
Calculating significance tests to determine if and which models performed better
T-test
Anova
Find threshold operator to determine optimal cutoff point for binominal classes
Performance estimation for cluster models based on distance calculations, density calculations, or item distributions
Embed pre-processing steps into the validation
Display multiple results in history to help better evaluate model performance
Various techniques for the estimation of model performance: Cross validation (with parallel execution of the folds)
Split validation
Bootstrapping
Batch cross validation
Wrapper cross validation
Wrapper split validation
Visual evaluation techniques
Lift chart
ROC curves
Confusion matrix
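A few of the listed criteria and visual checks can be reproduced with a generic scikit-learn/matplotlib sketch (confusion matrix, accuracy, AUC, ROC curve) on an illustrative data set:

```python
# Sketch of a few of the listed criteria and visual checks: confusion matrix,
# accuracy, AUC, and an ROC curve.
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, roc_auc_score,
                             roc_curve)
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]
preds = model.predict(X_test)

print(confusion_matrix(y_test, preds))             # TP / FP / TN / FN counts
print("accuracy:", accuracy_score(y_test, preds))
print("AUC:", roc_auc_score(y_test, probs))

fpr, tpr, _ = roc_curve(y_test, probs)
plt.plot(fpr, tpr)                                  # ROC curve
plt.xlabel("false positive rate")
plt.ylabel("true positive rate")
plt.show()
```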
Scoring
Operator for applying models to datasets (Scoring)
Support of predictive models, cluster models, preprocessing models, transformation models, and models for missing value imputations
Storing of models in central repositories for reuse in other processes and projects
Applying a model creates optimal scores by ignoring unused attributes and handling previously unseen values
Import and export of RapidMiner models, R models, and Weka models from repository or files
Support of PMML 3.2 and 4.0
Background process execution
Utility-like process control operations that let you build processes that behave like a program: repeat and loop over tasks, branch flows, and call on system resources
Execute multiple processes in parallel
Long-running processes can be run in the background while you continue to work on other processes in the foreground, for faster and more effective development iterations.
Processes running in the background can be monitored. Results and logs can be reviewed once they are available
The maximum number of allowed processes running simultaneously can be configured to adapt to the hardware resources and the demand of the processes being executed. The default is the number of cores minus one
Scripting
Write scripts for easy-to-complex data preparation and transformation tasks where existing operators might not be sufficient
Incorporate procedures from other processes or projects
Develop custom models
Augment scoring logic by custom post-processing or model application procedures
Easy-to-use program development interface: Predefined imports for common data structures
Syntactic sugar for simplified data access and alteration
Interactive code editor and syntax highlighting
Execute command line programs and integrate results and result codes in processes
Execution of SQL statements directly in database
Seamless integration of the various programming languages into the RapidMiner Studio user interface: Execution of Groovy scripts within RapidMiner Studio processes
Execution of OS scripts within RapidMiner Studio processes
Execution of R scripts within RapidMiner Studio processes
Execution of Python scripts within RapidMiner Studio processes
Predefined scripted models & transformations available as operators
Custom scripts can be stored and executed as their own operators within a process
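As an example of the Python integration, a script for the Execute Python operator conventionally defines a function named rm_main that receives and returns pandas DataFrames; the sketch below assumes that convention (details may vary by extension version) and uses a hypothetical "amount" column.

```python
# Sketch of a script for RapidMiner's Execute Python operator. The Python
# Scripting extension conventionally calls a function named rm_main with the
# connected example sets as pandas DataFrames; details may vary by version.
import numpy as np


def rm_main(data):
    # 'data' arrives from the operator's input port as a pandas DataFrame.
    data = data.copy()
    data["amount_log"] = np.log1p(data["amount"])  # 'amount' is a hypothetical column
    # Whatever is returned is delivered back to the process as an example set.
    return data
```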
Process Control
Organize segments in sub-processes and reuse them in different projects
Repeat execution over a segment of a process
Support for loops
Loop (basic loop, with parallel execution of the iterations)
Attributes (parallel execution of the iterations)
Labels
Subsets
Values (parallel execution of the iterations)
Examples
Clusters
Batches
Data Sets
Data Fractions
Parameters
Files (parallel execution of the iterations)
Repository entries
Branches (if-then-else) based on:
Data values
Attribute existence
Numbers of examples
Performance values
Existence of files and process inputs
Definition of macros
Arbitrary expressions
Creation of collections of the same type
Collection handling: selection, flattening, or looping
Remembering and recalling (intermediate) process results for complex process designs
Handling expected and unexpected errors and exceptions
Automatic Optimization
Automatic selection of best performing sub processes
Measuring the influence of preprocessing steps by nested cross validations / other validations
Automatic selection of best model type and parameters
Automatic selection of best attribute subsets
Automatic optimization of process parameters, including modeling parameters
Grid
Quadratic
Evolutionary
Macros
Centralized definition of macros / variables containing arbitrary textual or numerical content
Usage of macros everywhere in the process design, especially as value for parameters
Macros can be defined during the process or in the process context
Definition of macros in the context allows for parameterization of complete processes, e.g. for transforming processes into customizable web services
Extraction of macro values from data values, meta data or statistics supported
Expression engine for calculating arbitrary macro values from existing macros
Logging and Process-Based Reporting
Logging can be introduced at arbitrary places within a process
Logging can collect parameter values, performance values, or specific values for each operator, e.g. the current generation for evolutionary algorithms
Data values can be logged
Macro values can be logged
Logged values can be transformed into several formats including: data sets and weights which can be stored, transformed, analyzed, or visualized like any other data set.
In cases where logging alone is not sufficient, a complete process-based reporting engine allows for the collection of arbitrary results in static reports
Different formats like PDF, Excel, HTML, or RTF supported
Different reporting styles including a sequential report or portals
Support of sections with up to 5 levels
Arbitrary process results as well as intermediate results can be transformed into different types of visualizations like tables, charts etc.
Support for page breaks and other style information
Combination with loops or other process control structures allows for highly-detailed result overviews even for complex process designs
System Requirements
Processor
3GHz or faster, Quad core (2 GHz, Dual core, minimum)
RAM
16GB RAM (4GB RAM, minimum)
Hard disk space
>100GB free disk space (>1GB free disk space, minimum)
Display
(1280×1024 pixel resolution, minimum)
Operating System
Windows 10 (64-bit highly recommended), Windows 8.1, Windows 8, Windows 7, Linux (64-bit only), macOS X 10.10 – 10.15