Miner Feature List
What's New in Miner
Association Rules
Association Rules help uncover relationships between variables in large
data sets, most commonly to analyze customer behavior such as purchasing
patterns (Market Basket Analysis), but also in many other areas, such as
web site usage.
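As a rough illustration of what an association rule measures, the sketch below computes support, confidence, and lift for one rule over a toy set of transactions. It is plain Python, not Miner's implementation. The transactions and the function names `support`, `confidence`, and `lift` are hypothetical and chosen only for this example.

```python
# Hypothetical toy transactions; each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions containing the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by the baseline support of the consequent."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


# Evaluate the rule {bread} -> {milk}
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
rule_lift = lift({"bread"}, {"milk"}, transactions)
```

A lift below 1, as in this toy data, suggests the two items appear together less often than they would if purchased independently.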
Spotfire Integration
In addition to new nodes to read and write Spotfire Text Data files,
Miner now provides examples of using Spotfire Professional to visualize,
explore, and share model results. The Spotfire platform makes it easier to
communicate results to business professionals in your organization and
to provide them with the tools they need to make better decisions based on
the insights from your models.
New Deployment and Integration Options
Custom Java & C++ nodes: Spotfire Miner now supports custom nodes
written using Java and C++.
Remote Script Execution: Leveraging the new functionality in Spotfire S+,
Miner workflows can execute S+ scripts remotely on Statistics Services to
offload and distribute intensive jobs. The Spotfire Miner workflow
interface provides a convenient way to organize and track the progress of
these jobs.
Global Worksheet Parameters: Spotfire Miner users can now set
global worksheet parameters as a property of a workflow. These
parameters can be accessed by interactive and batch applications, and
open up new flexibility and reusability for workflows.
Data Import/Export and Data Preparation Enhancements
New Data File Types: Spotfire Miner provides nodes to access new
data formats, including Spotfire Text Data, Microsoft Excel 2007,
Microsoft Access 2007, and Matlab 7 data files.
JDBC: Spotfire Miner can also import from and export to JDBC databases
using the sjdbc library, opening up many new data sources for analysis.
Recode Values Node: Preparing data from multiple sources is now easier.
The node lets you change the values in a column to new values, including
renaming the levels of a categorical variable.
- Improved graphics tools, such as a new trellis hexbin plot and
hexbin matrix, and the ability to create charts without the need
for sampling
- Extended file format support, including 64-bit SAS® and
compressed SAS files, plus new report and graphics output formats
- The S+ Script Node and over 20 charting nodes are now included;
no separate S+ license is required
Miner Feature List
Visual Workflow Environment
- Create self-documenting visual programs
- Intuitive drag-and-drop interface
- Link nodes together to describe analytic process
- On-screen annotations
- Node-level change-tracking for multi-user collaboration
- Visual confirmation of validity and caching
- Save and share worksheets as templates for best practices
- Export worksheet image to a file
Data Access (Input and Output)
- Delimited ASCII files
- Fixed format ASCII
- Data dictionary support
- SAS®, SPSS®, Excel® and many other flat-file
formats
- ODBC access to compliant databases (Windows®)
- Native access to Oracle®, DB2, Microsoft® SQL Server,
Sybase
Data Manipulation
- Powerful sampling, including stratified methods
- Row: Aggregate, Append, Filter, Partition, Sample, Shuffle,
Sort, Stack, and Unstack
- Column: Bin, Create, Filter, Join, Modify, Reorder, Transpose
and Normalize
- Automatically bin continuous variables
- Continuous, date, categorical and string data types
- Create or modify columns and filter rows using powerful expression
language
Data Cleaning
- Detect and repair missing values with variance-preserving
methods
- Detect duplicates
- Missing value handling: drop, replace, impute and last observation
carried forward
- Detect multi-dimensional outliers with leading-edge robust
methods
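Of the missing-value options listed above, last observation carried forward is the simplest to illustrate. The sketch below is a minimal plain-Python version, not Miner's node. The function name `locf` and the use of `None` to mark a missing value are assumptions made for this example.

```python
def locf(values):
    """Fill each missing entry (None) with the most recent non-missing value.

    Leading missing entries stay missing because there is no earlier
    observation to carry forward.
    """
    filled = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Hypothetical column with gaps
readings = [None, 1.0, None, None, 4.0, None]
filled_readings = locf(readings)
```

Carrying a value forward keeps the row count intact, but it can understate variability in the column, which is why the list above also mentions variance-preserving methods.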
Exploratory Data Analysis and Visualization
- Trellis graphics quickly show structure of high-dimension
data
- Univariate descriptive statistics, plus Correlation and Covariance
calculations
- Table views and Visual Crosstabs rapidly slice and dice data
- Compare datasets for validation purposes
- 1-D Charts: Pie, Bar, Column, Dot, Histogram, Boxplot
- 2-D Charts: Scatterplot, Boxplot, Strip plot, Quantile-Quantile,
Density
- Hexagonal Binning chart to view relationships between variables
of very large data sets
- 3-D Charts: Contour, Level plot, Surface plot, Cloud plot
- Multivariate charts: Multiple 2-D plot, Scatterplot matrix,
Hexbin Matrix, Parallel plot
- Time series charts: Line plot, High-Low plot, Stacked Bar
plot
Model Types, Algorithms and Visualizers
- Prediction and classification outcome models with basic and
advanced model options
- Highly scalable algorithms: train models on very large data
sets without the need for sampling or aggregation
- Decision trees for classification and regression with single-tree
or ensemble techniques using Block Model Averaging; K-Fold
cross-validation, plus Gini and Entropy splitting rules
- Linear and logistic regression implemented as QR decomposition
with Householder transformations
- Neural Networks with Multi-layer perceptrons
- Neural Network training methods: Resilient Propagation, Quick
Propagation, Delta-Bar-Delta, Conjugate Gradient, and Online
methods
- Neural Networks: up to three hidden layers with user-specified
number of nodes per layer
- Interactive Neural Network visualizer allows real-time control
over learning process
- Naïve Bayes Classifier
- Principal components analysis
- Cox Proportional Hazard models for censored data with time-varying
covariates
- Customer segmentation models with K-Means Clustering
- Collapsible tree viewer with interactive dendrogram
- Assess models with gain charts, lift charts, ROC charts and
agreement matrices
- Variable importance tool for selection of the most significant
variables
- Automatic calculation of dummy variable and interaction columns
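The Gini and Entropy splitting rules named in the decision tree item above both score how mixed the class labels in a node are, and a tree prefers the split with the lowest weighted impurity. The sketch below shows the textbook definitions in plain Python. It is not Miner's implementation, and the function names and sample labels are hypothetical.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class proportions, in bits."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())


def split_impurity(left, right, measure):
    """Impurity of a two-way split, weighted by the size of each child node."""
    n = len(left) + len(right)
    return len(left) / n * measure(left) + len(right) / n * measure(right)


# Hypothetical parent node and one candidate split
parent = ["yes", "yes", "no", "no"]
perfect = split_impurity(["yes", "yes"], ["no", "no"], gini)
useless = split_impurity(["yes", "no"], ["yes", "no"], gini)
```

A split that separates the classes completely scores 0 under either measure, while a split that leaves each child as mixed as the parent gives no improvement.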
Scalability
- All components operate out-of-memory and in-memory
- Unique "Pipeline Architecture" moves data in blocks
through processing components
- Classical incremental techniques
- Block Model Averaging techniques
- Tailor size of blocks to optimize use of computing resources
- Automatic and manual control of caching to balance quick response
with massive scalability
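To give a feel for block-wise processing with incremental updates, the sketch below reads a stream in fixed-size blocks and maintains a running mean and variance without holding the full data set in memory. This is a generic illustration in plain Python. The source does not say which incremental algorithms Miner uses, and the names `blocks` and `RunningStats` are hypothetical.

```python
def blocks(rows, block_size):
    """Yield successive lists of at most block_size items from an iterable."""
    block = []
    for row in rows:
        block.append(row)
        if len(block) == block_size:
            yield block
            block = []
    if block:
        yield block


class RunningStats:
    """Running count, mean, and sum of squared deviations, updated per block."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, block):
        # Summarize the block, then merge the summary into the running totals.
        block_n = len(block)
        block_mean = sum(block) / block_n
        block_m2 = sum((x - block_mean) ** 2 for x in block)
        delta = block_mean - self.mean
        total = self.n + block_n
        self.mean += delta * block_n / total
        self.m2 += block_m2 + delta * delta * self.n * block_n / total
        self.n = total

    @property
    def variance(self):
        """Sample variance; NaN until at least two values have been seen."""
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


# Hypothetical stream of ten values processed three at a time
stats = RunningStats()
for block in blocks(range(1, 11), block_size=3):
    stats.update(block)
```

Only one block and three running numbers are held at any time, so memory use depends on the block size rather than on the number of rows.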
Extensibility
- Compound nodes: create an entire process within a single node
- Create new nodes using S programming language
- Complete access to all S-PLUS 8 Enterprise Developer functions
and libraries through S programming language
- Create custom predictive models, charts and reports
- Create and share user libraries of custom nodes
- Manage multiple custom libraries
Deployment and Scoring
- Web-ready graphical reports
- HTML, PDF, PostScript and RTF model summary exports
- Non-interactive batch execution of all components*
- Model ports support automatically-updating scoring components
- Score custom predictive models created using S-PLUS on very
large databases
- Predictive Model Markup Language (PMML) model import and export
- Generate C code for run-time model scoring*
Note: * Requires Statistics Services