pmml.rpart {pmml} | R Documentation |
Generate the PMML (Predictive Model Markup Language) representation of an rpart object. The rpart object is currently expected to be a classification tree. The resulting PMML can then be imported into other systems that accept PMML.
## S3 method for class 'rpart':
pmml(model, model.name="RPart_Model", app.name="Rattle/PMML",
     description="RPart Decision Tree Model", copyright=NULL,
     transforms=NULL, ...)
model | an rpart object. |
model.name | a name to give to the model in the PMML. |
app.name | the name of the application that generated the PMML. |
description | a descriptive text for the header of the PMML. |
copyright | the copyright notice for the model. |
transforms | a coded list of transforms performed. |
... | further arguments passed to or from other methods. |
The generated PMML can be imported into any PMML-consuming application, such as Teradata Warehouse Miner and DB2. Generally, these applications convert the PMML into SQL for execution across a database.
Teradata, for example, generates a single SELECT statement to implement a decision tree. Consider the kyphosis example from the rpart documentation, which builds a model stored in the variable fit (the Examples section below uses the iris data instead). A segment of the PMML for this model is:
<Node score="absent" recordCount="81">
  <True/>
  <Node score="absent" recordCount="62">
    <SimplePredicate field="Start" operator="greaterOrEqual" value="8.5"/>
    <Node score="absent" recordCount="29">
      <SimplePredicate field="Start" operator="greaterOrEqual" value="14.5"/>
    </Node>
    <Node score="absent" recordCount="33">
      <SimplePredicate field="Start" operator="lessThan" value="14.5"/>
      <Node score="absent" recordCount="12">
        <SimplePredicate field="Age" operator="lessThan" value="55"/>
      </Node>
      <Node score="absent" recordCount="21">
        <SimplePredicate field="Age" operator="greaterOrEqual" value="55"/>
        <Node score="absent" recordCount="14">
          <SimplePredicate field="Age" operator="greaterOrEqual" value="111"/>
        </Node>
        <Node score="present" recordCount="7">
          <SimplePredicate field="Age" operator="lessThan" value="111"/>
        </Node>
      </Node>
    </Node>
  </Node>
  <Node score="present" recordCount="19">
    <SimplePredicate field="Start" operator="lessThan" value="8.5"/>
  </Node>
</Node>
The resulting SQL from Teradata includes:
CREATE TABLE "MyScores" AS (
  SELECT "UserID",
    (CASE
       WHEN _node = 0 THEN 'absent'
       WHEN _node = 1 THEN 'absent'
       WHEN _node = 2 THEN 'absent'
       WHEN _node = 3 THEN 'present'
       WHEN _node = 4 THEN 'present'
       ELSE NULL
     END) (VARCHAR(8)) AS "Kyphosis"
  FROM (
    SELECT "UserID",
      (CASE
         WHEN ("Start" >= 8.5) AND ("Start" >= 14.5) THEN 0
         WHEN ("Start" >= 8.5) AND ("Start" < 14.5) AND ("Age" < 55) THEN 1
         WHEN ("Start" >= 8.5) AND ("Start" < 14.5) AND ("Age" >= 55) AND ("Age" >= 111) THEN 2
         WHEN ("Start" >= 8.5) AND ("Start" < 14.5) AND ("Age" >= 55) AND ("Age" < 111) THEN 3
         WHEN ("Start" < 8.5) THEN 4
         ELSE -1
       END) AS _node
    FROM "MyData"
    WHERE _node IS NOT NULL
  ) A
  WHERE "Kyphosis" IS NOT NULL
) WITH DATA UNIQUE PRIMARY INDEX ("UserID");
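To make the mapping from tree to SQL more concrete, the following sketch rebuilds the inner CASE expression in plain R from the five leaf paths of the PMML segment above. The `leaves` list and the `case_sql()` helper are hypothetical names introduced here for illustration. This is a hand-written approximation of the idea, not the way Teradata or the pmml package actually generate SQL.

```r
# Leaf paths copied by hand from the PMML segment above (illustrative only).
# Each element lists the predicates on the route from the root to one leaf.
leaves <- list(
  c('"Start" >= 8.5', '"Start" >= 14.5'),
  c('"Start" >= 8.5', '"Start" < 14.5', '"Age" < 55'),
  c('"Start" >= 8.5', '"Start" < 14.5', '"Age" >= 55', '"Age" >= 111'),
  c('"Start" >= 8.5', '"Start" < 14.5', '"Age" >= 55', '"Age" < 111'),
  c('"Start" < 8.5')
)

# Hypothetical helper: emit one WHEN clause per leaf, numbering leaves from 0,
# and fall back to -1 when no path matches.
case_sql <- function(paths) {
  whens <- vapply(seq_along(paths), function(i) {
    cond <- paste0("(", paths[[i]], ")", collapse = " AND ")
    sprintf("WHEN %s THEN %d", cond, i - 1L)
  }, character(1))
  paste("(CASE", paste(whens, collapse = " "), "ELSE -1 END) AS _node")
}

cat(case_sql(leaves), "\n")
```

Each leaf becomes the conjunction of the predicates along its path, which is why a single SELECT statement is enough to score the whole tree.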
Package home page: http://rattle.togaware.com
PMML home page: http://www.dmg.org
Zementis' useful PMML converters: http://www.zementis.com/pmml_converters.htm
pmml.
library(pmml)
library(rpart)
# Print the fitted tree, then its PMML
(iris.rpart <- rpart(Species ~ ., data=iris))
pmml(iris.rpart)
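The PMML segment and SQL discussed earlier come from the kyphosis data rather than iris. A minimal sketch of how such a model might be built and its PMML written to a file for import into another system is given below. It assumes that the XML package is installed alongside pmml and that `saveXML()` is used for writing; the output file name is a placeholder, and the default header arguments are used throughout.

```r
library(rpart)

# The kyphosis data set ships with rpart; this call follows the example
# given in the rpart documentation.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Guarded so that the sketch still runs where pmml or XML is not installed.
if (requireNamespace("pmml", quietly = TRUE) &&
    requireNamespace("XML", quietly = TRUE)) {
  doc <- pmml::pmml(fit)
  # Hypothetical output location; any writable path would do.
  out_file <- file.path(tempdir(), "kyphosis_rpart.xml")
  XML::saveXML(doc, file = out_file)
}
```

The resulting file is what a PMML-consuming application would read in place of the R object.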