create.fused {StatMatch}    R Documentation

Creates a matched (synthetic) dataset

Description

Creates a synthetic data frame after the statistical matching of two data sources at micro level.

Usage

create.fused(data.rec, data.don, mtc.ids, 
                z.vars, dup.x=FALSE, match.vars=NULL)  

Arguments

data.rec A matrix or data frame that has been used as the recipient in the statistical matching application.
data.don A matrix or data frame that has been used as the donor in the statistical matching application.
mtc.ids A matrix with two columns. Each row must contain the name or the index of the recipient record (row) in data.rec and the name or the index of the corresponding donor record (row) in data.don. Note that this type of matrix is returned by the functions NND.hotdeck and RANDwNND.hotdeck.
z.vars A character vector with the names of the variables available only in data.don that should be “donated” to data.rec.
dup.x Logical. When TRUE, the values of the matching variables in data.don are also “donated” to data.rec. The names of the matching variables must be specified with the argument match.vars. To avoid confusion, the matching variables added to data.rec are renamed by adding the suffix “don”. By default dup.x=FALSE.
match.vars A character vector with the names of the matching variables. It has to be specified only when dup.x=TRUE.

Details

This function creates the synthetic (or fused) data set after a statistical matching procedure has been applied in a micro framework. For details see D'Orazio et al. (2006).
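
The fusion step can be pictured as a row lookup driven by mtc.ids. The following is a minimal base R sketch of that idea, not the package's actual implementation. The toy data frames rec and don, the column names rec.id and don.id, and the exact name x.don are hypothetical and chosen only for illustration.

```r
# Hypothetical toy data: x is the matching variable, z exists only in the donor
rec <- data.frame(x = c(1.0, 2.0, 3.0),
                  row.names = c("r1", "r2", "r3"))
don <- data.frame(x = c(1.1, 2.9, 2.2), z = c(10, 30, 20),
                  row.names = c("d1", "d2", "d3"))

# Two-column matrix: recipient row name, matched donor row name
# (column names here are illustrative assumptions)
mtc.ids <- cbind(rec.id = c("r1", "r2", "r3"),
                 don.id = c("d1", "d3", "d2"))

# Take the recipient rows in the order given by mtc.ids
fused <- rec[mtc.ids[, "rec.id"], , drop = FALSE]

# "Donate" z from the matched donor rows
fused$z <- don[mtc.ids[, "don.id"], "z"]

# Analogue of dup.x=TRUE: also copy the donor's matching variable,
# renamed with a "don" suffix (exact naming is an assumption)
fused$x.don <- don[mtc.ids[, "don.id"], "x"]

print(fused)
```

Keeping the donor's value of the matching variable next to the recipient's own value makes it easy to see how close each matched pair is.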

Value

The data.rec data frame with the z.vars filled in and, when dup.x=TRUE, with the values of the matching variables for the donor records.

Author(s)

Marcello D'Orazio madorazi@istat.it

References

D'Orazio, M., Di Zio, M. and Scanu, M. (2006). Statistical Matching: Theory and Practice. Wiley, Chichester.

See Also

NND.hotdeck, RANDwNND.hotdeck

Examples


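library(StatMatch)  # provides NND.hotdeck and create.fused

lab <- c(1:15, 51:65, 101:115)  # 15 rows from each of the three species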
iris.rec <- iris[lab, c(1:3,5)]  # recipient data.frame
iris.don <- iris[-lab, c(1:2,4:5)] # donor data.frame

# Now iris.rec and iris.don have the variables
# "Sepal.Length", "Sepal.Width"  and "Species"
# in common.
#  "Petal.Length" is available only in iris.rec
#  "Petal.Width"  is available only in iris.don

# find the closest donors using NND hot deck;
# distances are computed on "Sepal.Length" and "Sepal.Width"

out.NND <- NND.hotdeck(data.rec=iris.rec, data.don=iris.don,
            match.vars=c("Sepal.Length", "Sepal.Width"), don.class="Species")

# create the synthetic data set without duplicating the matching variables

fused.0 <- create.fused(data.rec=iris.rec, data.don=iris.don, 
            mtc.ids=out.NND$mtc.ids, z.vars="Petal.Width")

# create the synthetic data set with the "duplication" of the matching variables

fused.1 <- create.fused(data.rec=iris.rec, data.don=iris.don,
            mtc.ids=out.NND$mtc.ids, z.vars="Petal.Width", dup.x=TRUE,
            match.vars=c("Sepal.Length", "Sepal.Width"))

[Package StatMatch version 0.6 Index]