Indonesian Journal of Geospatial
Vol 3, No 2 (2014)

Reconstructing Disrupted Water Level Records in A Tide Dominated Region Using Data Mining Technique

Hidayat, Hidayat (Research Center for Limnology, Indonesian Institute of Sciences, Cibinong Science Centre)
Setiawan, Fajar (Research Center for Limnology, Indonesian Institute of Sciences, Cibinong Science Centre)
Handoko, Unggul (Research Center for Limnology, Indonesian Institute of Sciences, Cibinong Science Centre)



Article Info

Publish Date
06 Dec 2014

Abstract

A continuous time series of hydrographical data, such as water levels, is required for purposes such as time-series analysis of system behaviour and prediction. However, technical failures or natural obstacles may disrupt measurements. A gap-filling technique is then required to obtain a reliable, continuous reconstructed time series. Linear regression is among the simplest gap-filling techniques for parameters that can be linearized. Most hydrographical data, however, are highly non-linear, so more advanced techniques are required to fill in the missing data. This paper discusses the application of a data mining technique, the M5 model tree, to obtain continuous water level data. The main idea of the M5 model tree machine-learning technique is that the algorithm splits the parameter space into subspaces and then builds a linear regression model for each subspace. The resulting model can therefore be regarded as a modular model. This technique was applied to reconstruct a disrupted water level record of the Mahakam Delta, East Kalimantan, Indonesia. A dataset obtained during a measurement campaign in 2008-2009 was split into training and validation sets. The model was trained using three-hourly water level data from the Delta Apex and Tenggarong measurement stations. Water level records show the semi-diurnal character of tides in the region, and that tides are still dominant in the upstream area at the Tenggarong station, located about 40 km from the Delta Apex. Data from four previous time steps at the Tenggarong station were included as model input to cover the time lag of tide propagation between the two stations. The Nash–Sutcliffe coefficient of efficiency was used to evaluate the model. Training of the M5 model tree yielded nine model rules (using smoothed linear models), which are evaluated sequentially until one whose conditions are met is found. Validation shows that the M5 model tree can be applied satisfactorily as an alternative tool for water level data gap filling in the tide-dominated region.

Keywords: data mining, hydrographical data, water levels, time-series
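
The two data-handling steps mentioned in the abstract, lagged inputs and the Nash–Sutcliffe evaluation, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `lagged_inputs` and `nash_sutcliffe` are hypothetical, and the sketch assumes that one station's series serves as the predictor and the other as the target, with each input row holding the current value plus the four previous time steps. The abstract does not state which station is the target or whether the current value is included.

```python
from typing import List, Sequence, Tuple


def nash_sutcliffe(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - (sum of squared errors) /
    (sum of squared deviations of the observations from their mean)."""
    if len(observed) == 0 or len(observed) != len(simulated):
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("observed series is constant; efficiency is undefined")
    return 1.0 - sse / spread


def lagged_inputs(
    predictor: Sequence[float], target: Sequence[float], n_lags: int = 4
) -> Tuple[List[List[float]], List[float]]:
    """Pair each target value with the predictor's current value and its
    n_lags previous values (assumed layout, oldest value first)."""
    if len(predictor) != len(target):
        raise ValueError("predictor and target must have equal length")
    rows: List[List[float]] = []
    targets: List[float] = []
    for t in range(n_lags, len(target)):
        rows.append(list(predictor[t - n_lags : t + 1]))
        targets.append(target[t])
    return rows, targets
```

An efficiency of 1 indicates a perfect match, while 0 means the model performs no better than predicting the mean of the observations. The rows returned by `lagged_inputs` could be passed to any regression learner, including a model tree implementation.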

Copyright © 2014