Name: Earthquake prediction
Challenge: Predicting when an earthquake will occur from real-time seismic data
Solution: Develop a machine learning model that predicts the occurrence of an earthquake from the latest acoustic sensor data
Technologies Used: Python, Scikit-learn, Linear Regression Models
Earthquakes may not be preventable, but their casualties and destruction can be reduced if they are forecast in time. In this case study, we attempted to forecast the timing of laboratory-synthesized earthquakes. We were provided a training dataset containing a continuous stream of experimental seismic sensor data. The dataset was a time series of seismic signal strength in which each sample was labeled with the time remaining until the next synthesized earthquake. Our goal was to predict the time remaining until the next earthquake in the test set.
Because this is a time series problem, we encountered two fundamental challenges. First, the training set was a record of a continuous stream of seismic signals over a very long span of time, whereas the test set consisted of much smaller segments of seismic signals that covered a much shorter span of time. Adjusting for this difference proved to be an important breakthrough in this study. Second, the small number of features made it difficult to relate the time of an earthquake to the seismic signal strength. Each data point in our dataset consisted of only two features: the time at which the signal strength was measured and the value of the signal strength itself. On their own, these two features proved not to be correlated with the timing of an earthquake.
A major breakthrough resulted from segmenting the training set into smaller intervals that matched the time interval of the test data. This approach made the training and test datasets comparable and provided the grounds for engineering a new set of features for each segment. Our results continued to improve as we engineered additional features. The engineered features consisted of the mean, standard deviation, minimum, maximum, quantiles, trend, and rolling metrics that captured the time series behavior of the signals within each segment.
We evaluated a number of models to develop our predictive engine. We explored the results generated by Random Forest, SVM, LGBM, XGBoost, and LSTM models. After performing grid search and tuning the models for performance, we achieved our best results by stacking several of them together in our predictive engine.
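As a rough illustration of tuning and stacking, the sketch below combines two of the model families mentioned above using scikit-learn's StackingRegressor and tunes them with GridSearchCV. It is not the configuration used in the study: the synthetic data, the choice of base models, the Ridge meta-learner, the parameter grid, and the mean absolute error metric are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for per-segment features and time-remaining targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Base models feed their out-of-fold predictions to a simple meta-learner.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(random_state=0)),
        ("svr", make_pipeline(StandardScaler(), SVR())),
    ],
    final_estimator=Ridge(),  # assumed meta-learner
    cv=3,
)

# Small illustrative grid; parameters are addressed as <estimator>__<param>.
param_grid = {
    "rf__n_estimators": [20, 50],
    "svr__svr__C": [1.0, 10.0],
}
search = GridSearchCV(
    stack, param_grid, scoring="neg_mean_absolute_error", cv=3
)
search.fit(X_train, y_train)

mae = mean_absolute_error(y_test, search.predict(X_test))
print("best parameters:", search.best_params_)
print("held-out MAE:", round(mae, 3))
```

The random split is acceptable here only because the synthetic rows are independent. With real segments cut from one continuous recording, a split that keeps neighboring segments together would give a more honest estimate of performance.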
To reduce the destruction and casualties of earthquakes, scientists need to predict the timing, location, and magnitude of a potential earthquake. In this experiment, we tried to predict the timing of an earthquake using laboratory-simulated earthquakes. We achieved an acceptable level of accuracy for near-immediate earthquakes, owing to the consistency of signal patterns just before an earthquake. However, our model failed to accurately predict earthquakes well in advance. We believe additional input features and larger datasets may be required to develop models capable of making accurate predictions long before a strike.
We hope that, in the near future, similar studies, together with studies focused on predicting the location and magnitude of potential earthquakes, will provide a holistic framework for forecasting real-world earthquakes and saving lives.