Automatic identification of variables in epidemiological datasets using logic regression

Matthias W. Lorenz, Negin Ashtiani Abdi, Frank Scheckenbach, Anja Pflug, Alpaslan Bülbül, Alberico L. Catapano, Stefan Agewall, Marat Ezhov, Michiel L. Bots, Stefan Kiechl, Andreas Orth

2017

Background: For an individual participant data (IPD) meta-analysis, multiple datasets must be transformed into a consistent format, e.g. by using uniform variable names. When large numbers of datasets have to be processed, this can be a time-consuming and error-prone task. Automated or semi-automated identification of variables can help to reduce the workload and improve data quality. For semi-automation, high sensitivity in the recognition of matching variables is particularly important: it makes it possible to build software that, for each target variable, presents a shortlist of source variables from which a user can choose the matching one, with only a low risk that the correct source variable is missing from the list.
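The semi-automated workflow described above can be illustrated with a minimal sketch. It is not the logic regression approach named in the title: it substitutes plain string similarity on variable names (Python's standard `difflib`) purely to show the idea of a deliberately generous shortlist that favours sensitivity over precision. The function name `shortlist_candidates`, the threshold `min_score`, the cap `max_candidates`, and the example column names are all hypothetical.

```python
from difflib import SequenceMatcher


def shortlist_candidates(target, source_names, min_score=0.3, max_candidates=5):
    """Rank source variable names by string similarity to a target name.

    Assumption: name similarity stands in for a real matching model.
    A low min_score keeps the list generous, so a correct match is
    unlikely to be dropped before the user sees it.
    """
    scored = []
    for name in source_names:
        # Case-insensitive similarity ratio in the range [0, 1].
        score = SequenceMatcher(None, target.lower(), name.lower()).ratio()
        if score >= min_score:
            scored.append((score, name))
    # Best-scoring candidates first; the user picks the true match.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:max_candidates]]


# Hypothetical source dataset columns and one target variable.
source_columns = ["pat_id", "AGE_YEARS", "sex", "sbp_mmHg", "chol_total"]
candidates = shortlist_candidates("age", source_columns)
```

Lowering `min_score` lengthens the shortlist and reduces the chance of missing the correct variable, at the cost of more options for the user to review.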

Keywords:

Data mining
Software
Logistic regression
Workload
Individual participant data
Data management
Health informatics
Computer science
Data quality
Meta-analysis
Statistics
Backup
Predictive value of tests
