Datasets

These are some of the datasets used in the Symbolic Regression book.

Friedman

Sample of 100 rows from the Friedman equation. Target variable is y, input variables are x1 to x10.

$$f(x) = 10 \sin(\pi x_1 x_2) + 20 \left(x_3 - \frac{1}{2}\right)^2 + 10 x_4 + 5 x_5$$
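A minimal Python sketch of how a comparable sample could be generated. It assumes the inputs are drawn uniformly from [0, 1) and that no noise is added to the target; neither is stated above. The names `friedman` and `sample_friedman` are illustrative and not part of the dataset.

```python
import math
import random


def friedman(x):
    """Evaluate the Friedman function; only x[0]..x[4] (x1..x5) enter the formula."""
    return (10 * math.sin(math.pi * x[0] * x[1])
            + 20 * (x[2] - 0.5) ** 2
            + 10 * x[3]
            + 5 * x[4])


def sample_friedman(n_rows=100, n_inputs=10, seed=0):
    """Return a list of (inputs, target) pairs.

    Uniform inputs on [0, 1) are an assumption, not taken from the dataset.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        x = [rng.random() for _ in range(n_inputs)]
        rows.append((x, friedman(x)))
    return rows
```

Because x6 to x10 do not appear in the formula, a good model should ignore them.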

Boston Housing

Preprocessed Boston housing dataset (taken from StatLib). Target variable is log(CMEDV) or NOX.

Yacht

Yacht dataset from Gerritsma, Onnink, and Versluis, "Geometry, Resistance and Stability of The Delft Systematic Yacht Hull Series", International Shipbuilding Progress, Vol. 28, No. 328, Dec. 1981.
(taken from UCI Machine Learning Repository)

Friction coefficient (static and dynamic)

Predict mu_dyn_avg (dynamic friction coefficient) from Exp (categorical), pressure, velocity, and initial temperature (T_0).
Predict mu_stat (static friction coefficient) from Exp (categorical), pressure, and initial temperature (T_0).

(data provided by Miba frictec GmbH)
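
Since Exp is categorical, it usually has to be encoded numerically before a regression tool can use it. The sketch below shows one common option, one-hot encoding, in plain Python. The labels "A", "B", "C" are hypothetical placeholders; the actual Exp values in the file may differ.

```python
def one_hot(values):
    """Map each category to a 0/1 indicator vector, ordered by sorted label."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        encoded.append(row)
    return categories, encoded


# Hypothetical Exp labels, for illustration only.
exp_column = ["A", "B", "A", "C"]
categories, encoded = one_hot(exp_column)
```

Each indicator column can then be used alongside pressure, velocity, and T_0 as an ordinary numeric input.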

Water Level Prediction

Preprocessed water level dataset including features for harmonics (original source: NOAA, Station 9452634 Elfin Cove, AK).
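
As a rough illustration of what a harmonic feature looks like, the sketch below computes a sine/cosine pair for one tidal period. Which harmonics the preprocessed file actually contains is not stated here; the use of the M2 constituent (period of about 12.42 hours) and the function name `harmonic_features` are assumptions.

```python
import math

# Period in hours of the principal lunar semidiurnal tide (M2).
# Assumption: the dataset's real harmonics may use other periods.
M2_PERIOD_HOURS = 12.4206012


def harmonic_features(t_hours, period_hours=M2_PERIOD_HOURS):
    """Return the (sin, cos) feature pair for time t at one harmonic period."""
    phase = 2 * math.pi * t_hours / period_hours
    return math.sin(phase), math.cos(phase)
```

A linear combination of such pairs can represent a periodic signal with arbitrary amplitude and phase at each period.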

Battery: State of charge (dataset 1)

Preprocessed dataset containing data from four cells (original source: NASA Ames Research Center)
Citation: B. Saha and K. Goebel (2007). “Battery Data Set”, NASA Prognostics Data Repository, NASA Ames Research Center, Moffett Field, CA.
Use data from B0005, B0006, B0007 for training and B0018 for testing. Predict the remaining discharge time from the initial capacity, number of discharge cycles, and current voltage.
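
The split described above can be sketched as follows, assuming each row is a dict that carries its cell id. The column name `cell` is hypothetical; check the file header for the real name.

```python
TRAIN_CELLS = {"B0005", "B0006", "B0007"}
TEST_CELLS = {"B0018"}


def split_by_cell(rows, cell_key="cell"):
    """Split rows into (train, test) by cell id.

    `cell_key` is a hypothetical column name, not taken from the dataset.
    """
    train = [r for r in rows if r[cell_key] in TRAIN_CELLS]
    test = [r for r in rows if r[cell_key] in TEST_CELLS]
    return train, test
```

Splitting by cell rather than by row keeps all measurements of the test cell out of training, so the test error reflects generalization to an unseen cell.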

Battery: State of charge (dataset 2)

Preprocessed dataset containing data from multiple cells (original source: NASA Ames Research Center)
Citation: B. Saha and K. Goebel (2007). “Battery Data Set”, NASA Prognostics Data Repository, NASA Ames Research Center, Moffett Field, CA.
Data for multiple cells with different discharge currents. Find a model that predicts the remaining discharge time from the initial capacity, number of discharge cycles, current voltage, and discharge current.

Battery: State of charge (dataset 3)

Preprocessed dataset containing data from multiple cells (original source: NASA Ames Research Center)
Citation: B. Saha and K. Goebel (2007). “Battery Data Set”, NASA Prognostics Data Repository, NASA Ames Research Center, Moffett Field, CA.
Uses only a single cell (to reduce amount of data). Find a regression model structure that can be fit to the first part of the voltage curve and predict the remaining curve until end of discharge.
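
The evaluation protocol for this task (fit on the first part of the curve, score on the rest) can be sketched as below. The straight line is only a placeholder for the model structure the task asks you to find, and the names `fit_line`, `evaluate_extrapolation`, and `fit_fraction` are illustrative assumptions.

```python
def fit_line(t, v):
    """Ordinary least squares fit of v = a + b * t; returns (a, b)."""
    n = len(t)
    t_mean = sum(t) / n
    v_mean = sum(v) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (vi - v_mean) for ti, vi in zip(t, v))
    b = sxy / sxx
    a = v_mean - b * t_mean
    return a, b


def evaluate_extrapolation(t, v, fit_fraction=0.5):
    """Fit on the first part of the curve and return the RMSE on the remainder."""
    cut = int(len(t) * fit_fraction)
    a, b = fit_line(t[:cut], v[:cut])
    errors = [(a + b * ti - vi) ** 2 for ti, vi in zip(t[cut:], v[cut:])]
    return (sum(errors) / len(errors)) ** 0.5
```

Replacing `fit_line` with a candidate model structure and comparing the resulting errors gives a simple way to rank structures by how well they extrapolate to the end of discharge.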