Geomagnetic Signal Analysis Based Classification of Earthquake Magnitudes


Introduction

In this paper, we explore the effect that earthquakes occurring anywhere in the world have on geomagnetic waves over the course of a day. To achieve this goal, all magnetic data recorded at a single station is organized according to the maximum earthquake of each day.

According to the studies, not a day passes in the world without an earthquake; indeed, hundreds of earthquakes occur continuously across the globe. Building on this fact, relevant discriminant features are extracted from the event signal using feature extraction methods such as the Mel Frequency Cepstral Coefficient (MFCC) and the Continuous Wavelet Transform (CWT).

These features are passed separately to dimension reduction algorithms, namely Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA), and the classification system is built on the K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) methods. Classification results on real geomagnetic data indicate that the distinctive structure imposed on geomagnetic waves by large earthquakes can be detected and classified with high accuracy.

Methodology

The method comprises three stages: i. feature extraction, ii. dimension reduction, and iii. classification. Each stage is explained in more detail below.

Feature Extraction Methods

1. Mel Frequency Cepstral Coefficient

Mel Frequency Cepstral Coefficient (MFCC) is a popular method for extracting features from audio signals, especially in speech and sound analysis. In short, the signal is split into short frames, the spectrum of each frame is mapped onto the mel scale with a filter bank, and the logarithm of the filter-bank energies is decorrelated with a discrete cosine transform to produce the cepstral coefficients.

This method effectively converts a time-domain signal into a compact set of frequency-domain features.
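A minimal sketch of how such features could be extracted is shown below; the librosa library, the sampling rate, the window lengths, and the n_mfcc value are illustrative assumptions, since the paper names only the MFCC method itself.

```python
import numpy as np
import librosa  # assumed tooling choice; the paper only names MFCC, not a library

def extract_mfcc_features(day_signal: np.ndarray, n_mfcc: int = 13) -> np.ndarray:
    """Turn one day (1440 one-minute samples) of a geomagnetic component into MFCCs.

    The sampling rate, window length, and n_mfcc below are illustrative placeholders,
    not values reported in the paper.
    """
    mfcc = librosa.feature.mfcc(
        y=day_signal.astype(np.float32),
        sr=1.0,          # nominal rate; the record is one sample per minute
        n_mfcc=n_mfcc,   # number of cepstral coefficients to keep
        n_fft=256,       # short analysis window, since a day has only 1440 samples
        hop_length=128,
        n_mels=40,
    )
    # Average each coefficient over the frames to get a fixed-length feature vector
    return mfcc.mean(axis=1)
```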

2. Continuous Wavelet Transform

The Wavelet Transform (WT) is a powerful tool for analyzing non-stationary signals, as it captures both time and frequency characteristics. Unlike traditional Fourier analysis, which provides only frequency information, WT allows us to see how the frequency content of a signal changes over time. It does this by convolving the signal with a set of wavelet functions—short, oscillating filters—at various scales and time positions. These wavelets represent different frequency bands, and by adjusting the scale, the transform can focus on high-frequency (small scale) or low-frequency (large scale) features.

The most commonly used wavelets for spectral analysis are the Morlet or Gabor wavelets, which have a Gaussian envelope. The result of this process is a set of wavelet coefficients, which indicate how much of the wavelet is present in the signal at a given time and scale. This produces a time-scale representation that helps in detecting localized features in the signal. The Continuous Wavelet Transform (CWT), defined as an integral of the product of the signal and scaled wavelet functions, is especially useful for applications where frequency components vary over time, such as in seismic, audio, or geomagnetic signal analysis.
Formally, the CWT of a signal \(x(t)\) is given by \( W(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt \), where \(a\) is the scale, \(b\) is the time shift, and \(\psi\) is the mother wavelet.
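A minimal sketch of this transform with the PyWavelets library (an assumed tooling choice; the paper does not name an implementation), using the Morlet wavelet mentioned above:

```python
import numpy as np
import pywt  # PyWavelets; an assumed implementation choice for the CWT

def extract_cwt_features(day_signal: np.ndarray, num_scales: int = 64) -> np.ndarray:
    """Compute a Morlet-wavelet time-scale representation of one day of data.

    The scale range and the reduction to a fixed-length vector are illustrative
    choices, not the exact settings used in the study.
    """
    scales = np.arange(1, num_scales + 1)
    coefficients, _frequencies = pywt.cwt(day_signal, scales, wavelet="morl")
    # coefficients has shape (num_scales, len(day_signal)); summarise each scale
    # by the mean absolute coefficient to obtain one feature per scale
    return np.abs(coefficients).mean(axis=1)
```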

Dimension Reduction Methods

Here, two dimension reduction approaches are explained in more detail: i. Principal Component Analysis (PCA) and ii. Linear Discriminant Analysis (LDA).

1. Principal Component Analysis

PCA is a technique used to reduce the number of variables in a dataset while keeping the most important information. It works by transforming the original variables into a new set of uncorrelated variables called principal components, which are sorted by how much variance (information) they capture.

PCA uses the eigenvalues and eigenvectors of the covariance matrix of the data to find the most important directions (principal components) in the data.
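As a small illustration of that idea, here is a sketch using NumPy with placeholder data (the array sizes and number of kept components are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))             # placeholder data: 100 samples, 5 features

X_centered = X - X.mean(axis=0)           # center each feature
cov = np.cov(X_centered, rowvar=False)    # covariance matrix of the features

eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]     # sort directions by captured variance
components = eigenvectors[:, order[:2]]   # keep the two most important directions

X_reduced = X_centered @ components       # project the data onto those directions
```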

After PCA, you can reduce thousands of features into just a few and use them for classification or other analysis.
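In practice this can be done with a library such as scikit-learn (an assumed tooling choice); the data and the number of retained components below are placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA

# X: one row per day, one column per extracted feature (placeholder data here)
X = np.random.default_rng(0).normal(size=(200, 1440))

pca = PCA(n_components=50)                  # number of retained components is an assumption
X_reduced = pca.fit_transform(X)            # shape (200, 50)

print(pca.explained_variance_ratio_.sum())  # fraction of total variance that was kept
```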

Classification of Magnetic Features

1. Support Vector Machine

Support Vector Machine (SVM) is a supervised machine learning algorithm that uses linear or non-linear kernels for classification or regression tasks and performs well in high-dimensional spaces.

The goal of the algorithm is to find the optimal separating hyperplane between classes by focusing on the training cases that maximize the margin (distance) between the classes while minimizing the error.
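A minimal sketch with scikit-learn (an assumed tooling choice), comparing the kernels that appear in the result tables; the data here is a random placeholder standing in for the reduced geomagnetic features:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder features and two class labels
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = rng.integers(1, 3, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# The kernels compared later: polynomial, RBF, and sigmoid
for kernel in ("poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel)
    clf.fit(X_train, y_train)
    print(kernel, accuracy_score(y_test, clf.predict(X_test)))
```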

2. K Nearest Neighbor

KNN is a simple classification algorithm with a low error rate. At the start, a training data set with accurate class labels must be available. Then, for a test point qi whose label is unknown, the distances between it and every point in the training set are calculated. After sorting these distances, the class label of qi is decided according to the labels of the k nearest points in the training set [33]. The distance between two points can be defined in many ways; using the Euclidean distance, it is given by \( d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \).

def knn_classify(test_point, training_data, k, distance_fn):
    """
    test_point: the unknown point to classify
    training_data: list of tuples (x_i, y_i)
    k: number of neighbors
    distance_fn: function to compute distance between two points
    """
    distances = []

    # Step 1: Compute distance from test_point to all training points
    for x_i, y_i in training_data:
        d = distance_fn(test_point, x_i)
        distances.append((d, y_i))

    # Step 2: Sort distances
    distances.sort(key=lambda tup: tup[0])

    # Step 3: Select K nearest neighbors
    k_nearest = distances[:k]

    # Step 4: Count class votes
    from collections import Counter
    labels = [label for _, label in k_nearest]
    majority_vote = Counter(labels).most_common(1)[0][0]

    return majority_vote
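For example, with a Euclidean distance function matching the equation above, the classifier could be used as follows (the data points are made up for illustration):

```python
import math

def euclidean_distance(p, q):
    # d(p, q) = sqrt(sum_i (p_i - q_i)^2)
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Tiny illustrative training set: (feature vector, class label)
training_data = [((1.0, 1.0), 1), ((1.2, 0.9), 1), ((8.0, 8.5), 2), ((7.7, 8.1), 2)]

print(knn_classify((1.1, 1.0), training_data, k=3, distance_fn=euclidean_distance))  # -> 1
```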

The choice of the parameter K (the number of neighbors) is very important and can affect the classification results. K should not be a multiple of the number of classes, and an odd value should be chosen for binary classification problems. As the dimensionality of the data set increases, the time complexity of the algorithm also increases.

Results and Discussions

In this part, we prepare the geomagnetic data used in the implementation process. The required preparation steps are described in more detail below.

Catalog Preparation

In the first step, we prepare the earthquake catalog containing the earthquake information. To achieve this, earthquakes of magnitude 2.5 and above recorded all over the world between 2007 and 2018 are downloaded from https://earthquake.usgs.gov/earthquakes/search/. This file contains all information about the selected earthquakes, including location, longitude, latitude, time, magnitude, depth, and some additional fields.
Reviewing the data, we noticed that hundreds of earthquakes happen all over the world in every hour of the day, so the data should be organized in a comprehensible structure. For this reason, only the maximum earthquake of each day is kept when building the final catalog. The resulting catalog consists of the maximum daily earthquake information between 2007 and 2018.

| Date | Time | Latitude | Longitude | Depth (km) | Magnitude | Place |
| --- | --- | --- | --- | --- | --- | --- |
| 2007-01-01 | 00:01:21 | 10.981 | -85.325 | 40.4 | 4.2 | Costa Rica |
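A sketch of this step with pandas, assuming the catalog was exported as a CSV with the standard USGS column names (`time`, `latitude`, `longitude`, `depth`, `mag`, `place`); the file name is a placeholder:

```python
import pandas as pd

# Load the downloaded USGS catalog export
catalog = pd.read_csv("usgs_catalog_2007_2018.csv", parse_dates=["time"])

# Keep, for every calendar day, only the event with the largest magnitude
catalog["date"] = catalog["time"].dt.date
daily_max = catalog.loc[catalog.groupby("date")["mag"].idxmax()].reset_index(drop=True)

print(daily_max[["time", "latitude", "longitude", "depth", "mag", "place"]].head())
```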

In addition, we count the days according to their maximum magnitude, as shown in Table 4.2. For example, a magnitude 5.0 earthquake was the daily maximum 111 times between 2007 and 2017. The total number of earthquakes is 4383.

| Magnitude | Count | Magnitude | Count | Magnitude | Count | Magnitude | Count |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 4.7 | 1 | 4.8 | 18 | 4.87 | 1 | 4.9 | 56 |
| 5.0 | 111 | 5.06 | 1 | 5.1 | 200 | 5.2 | 289 |
| 5.29 | 1 | 5.3 | 336 | 5.35 | 1 | 5.36 | 1 |
| 5.4 | 337 | 5.5 | 376 | 5.6 | 357 | 5.7 | 315 |
| 5.71 | 1 | 5.8 | 309 | 5.88 | 1 | 5.9 | 287 |
| 6.0 | 260 | 6.1 | 213 | 6.2 | 142 | 6.3 | 153 |
| 6.4 | 106 | 6.5 | 91 | 6.6 | 71 | 6.7 | 61 |
| 6.8 | 51 | 6.9 | 54 | 7.0 | 33 | 7.1 | 30 |
| 7.2 | 21 | 7.3 | 21 | 7.4 | 12 | 7.5 | 14 |
| 7.6 | 9 | 7.7 | 10 | 7.8 | 12 | 7.9 | 6 |
| 8.0 | 2 | 8.1 | 3 | 8.2 | 3 | 8.3 | 2 |
| 8.4 | 1 | 8.6 | 1 | 8.8 | 1 | 9.1 | 1 |

Total Earthquakes: 4383

Geomagnetic Data Preparation

The Iznik (Turkey) station is selected for downloading the geomagnetic data matching the catalog. At this station, data is recorded every minute in three components: North (X), East (Y), and Z. This means that each component has 1440 data points per day (60 minutes × 24 hours), as shown in the figure below.

Figure: Geomagnetic Signal of One Day in Three Dimensions
The years 2007 to 2017 were selected because the Iznik station started operating in 2007. In the last step of this stage, the data is divided into two classes:
| Class | Threshold | Total |
| --- | --- | --- |
| 1 | < 5.4 | 1016 |
| 2 | > 6.1 | 911 |
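A small sketch of the class assignment implied by the table above (dropping the days that fall between the two thresholds is an assumption, consistent with the two class totals summing to less than the 4383 days):

```python
def assign_class(day_max_magnitude: float):
    """Class 1 for daily maxima below 5.4, class 2 for those above 6.1."""
    if day_max_magnitude < 5.4:
        return 1
    if day_max_magnitude > 6.1:
        return 2
    return None  # days between the thresholds are excluded from both classes

# Example labels for a few daily maxima
for magnitude in (4.9, 5.8, 6.5):
    print(magnitude, assign_class(magnitude))  # -> 1, None, 2
```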

Performance Analysis of Algorithms

Here, four different scenarios are defined to evaluate the performance of the SVM and KNN classification algorithms. Feature extraction (MFCC, CWT) and dimension reduction (PCA, LDA) are applied to the geomagnetic data in different combinations. The results of each scenario are described in more detail in the corresponding part below.

1. First Scenario Results

In this scenario, we concatenate the three axes of one day of data before applying the dimension reduction methods. The results of the SVM and KNN classification methods are then analyzed for the different cases shown in Table 4.6 and Table 4.7, based on Figure 4.4.
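A sketch of this pipeline with scikit-learn and placeholder arrays (the component counts and the particular PCA-then-LDA case are illustrative choices, not the exact configuration used):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.svm import SVC

# Placeholder one-day signals per axis (days x 1440 minutes) and binary class labels
rng = np.random.default_rng(0)
x_axis, y_axis, z_axis = (rng.normal(size=(100, 1440)) for _ in range(3))
labels = rng.integers(1, 3, size=100)

# Scenario 1: concatenate the three axes of each day first ...
features = np.concatenate([x_axis, y_axis, z_axis], axis=1)   # shape (100, 4320)

# ... then reduce the dimensionality (here PCA followed by LDA, one of the tested cases) ...
reduced = PCA(n_components=20).fit_transform(features)
reduced = LDA(solver="svd").fit_transform(reduced, labels)

# ... and finally train a classifier (an SVM with a sigmoid kernel, as one of the cases)
clf = SVC(kernel="sigmoid").fit(reduced, labels)
```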

Figure: First Scenario Flowchart
| Kernel | Case 1: SVM after PCA | Case 2: SVM after LDA (solver='SVD') | Case 3: SVM after LDA (solver='Eigen') | Case 4: SVM after PCA and LDA (solver='SVD') | Case 5: SVM after PCA and LDA (solver='Eigen') |
| --- | --- | --- | --- | --- | --- |
| Polynomial | 47.28 | 47.28 | 98.39 | 52.68 | 47.28 |
| RBF | 52.72 | 98.46 | 64.94 | 46.43 | 51.34 |
| Sigmoid | 54.82 | 98.89 | 52.72 | 49.02 | 50.50 |

Table: SVM Mean of Accuracy (%) in the First Scenario
| Algorithm | Case 1: KNN | Case 2: KNN after LDA (solver='SVD') | Case 3: KNN after LDA (solver='Eigen') | Case 4: KNN after PCA (solver='SVD') | Case 5: KNN after PCA and LDA (solver='Eigen') |
| --- | --- | --- | --- | --- | --- |
| Auto | 25.64 | 98.60 | 61.14 | 52.42 | 49.20 |
| Ball Tree | 25.64 | 98.60 | 61.14 | 53.92 | 49.20 |
| Kd Tree | 26.27 | 98.70 | 60.98 | 53.45 | 49.30 |
| Brute | 25.64 | 98.60 | 61.14 | 53.92 | 49.20 |

Table: KNN Accuracy (%) in the First Scenario

As shown in the table, the SVM algorithm after applying LDA with the SVD solver provides the best results for all kernels, especially the Sigmoid kernel.
According to the KNN classification results, the best accuracy is obtained when LDA with the SVD solver is applied to the raw data.

2. Second Scenario Results

In this scenario, the dimension reduction methods are applied to each axis separately, and the reduced features are then concatenated to form the input for the SVM and KNN classification algorithms.
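The corresponding sketch for this scenario, under the same placeholder assumptions as before:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.neighbors import KNeighborsClassifier

# Placeholder one-day signals for the X, Y and Z axes, plus binary class labels
rng = np.random.default_rng(0)
axes = [rng.normal(size=(100, 1440)) for _ in range(3)]
labels = rng.integers(1, 3, size=100)

# Scenario 2: reduce each axis separately, then concatenate the reduced features
reduced_axes = [LDA(solver="svd").fit_transform(axis, labels) for axis in axes]
features = np.concatenate(reduced_axes, axis=1)

# Classify with KNN; 'auto', 'ball_tree', 'kd_tree' and 'brute' are the algorithm
# options that appear in the result tables
clf = KNeighborsClassifier(n_neighbors=5, algorithm="auto").fit(features, labels)
```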

Figure: Second Scenario Flowchart
| Kernel | Case 1: SVM | Case 2: SVM after LDA (solver='SVD') | Case 3: SVM after LDA (solver='Eigen') | Case 4: SVM after PCA (solver='SVD') | Case 5: SVM after PCA and LDA (solver='Eigen') |
| --- | --- | --- | --- | --- | --- |
| Polynomial | 47.28 | 62.70 | 100.00 | 38.92 | 43.61 |
| RBF | 52.72 | 99.99 | 100.00 | 94.13 | 83.45 |
| Sigmoid | 96.25 | 99.99 | 52.72 | 95.83 | 90.37 |

Table: SVM Mean of Accuracy (%) in the Second Scenario
| Algorithm | Case 1: KNN | Case 2: KNN after LDA (solver='SVD') | Case 3: KNN after LDA (solver='Eigen') | Case 4: KNN after PCA and LDA (solver='SVD') | Case 5: KNN after PCA and LDA (solver='Eigen') |
| --- | --- | --- | --- | --- | --- |
| Auto | 94.04 | 100.00 | 100.00 | 94.40 | 91.78 |
| Ball Tree | 94.04 | 100.00 | 100.00 | 94.40 | 91.78 |
| Kd Tree | 94.04 | 100.00 | 100.00 | 94.45 | 90.80 |
| Brute | 94.04 | 100.00 | 100.00 | 94.45 | 91.78 |

Table: KNN Accuracy (%) in the Second Scenario

• In the SVM classification results, applying LDA with the Eigen solver yields high accuracy for the Polynomial and RBF kernels.
• Generally, KNN shows high performance.

3. Third Scenario Results

In this scenario, the features extracted with the MFCC approach are analyzed in various cases. The number of selected features is controlled by the n_mfcc parameter.

Figure: Third Scenario Flowchart
| Kernel | Case 0: SVM | Case 1: SVM after PCA | Case 2: SVM after LDA (solver='SVD') | Case 3: SVM after LDA (solver='Eigen') | Case 4: SVM after PCA and LDA (solver='SVD') | Case 5: SVM after PCA and LDA (solver='Eigen') |
| --- | --- | --- | --- | --- | --- | --- |
| Polynomial | 92.69 | 65.38 | 40.21 | 93.43 | 41.36 | 52.72 |
| RBF | 79.48 | 65.44 | 93.05 | 93.55 | 93.88 | 92.15 |
| Sigmoid | 52.72 | 67.00 | 95.69 | 52.72 | 95.64 | 92.26 |

Table: SVM Mean of Accuracy (%) in the Third Scenario
| Algorithm | Case 0: KNN | Case 1: KNN after PCA | Case 2: KNN after LDA (solver='SVD') | Case 3: KNN after LDA (solver='Eigen') | Case 4: KNN after PCA and LDA (solver='SVD') | Case 5: KNN after PCA and LDA (solver='Eigen') |
| --- | --- | --- | --- | --- | --- | --- |
| Auto | 86.44 | 89.77 | 92.75 | 94.72 | 94.72 | 93.05 |
| Ball Tree | 86.44 | 89.77 | 92.75 | 94.72 | 94.72 | 93.05 |
| Kd Tree | 85.35 | 89.67 | 92.64 | 94.41 | 94.40 | 93.00 |
| Brute | 86.44 | 89.77 | 92.75 | 94.72 | 94.72 | 93.05 |

Table: KNN Accuracy (%) in the Third Scenario

• According to Table 4.13, the SVM achieves its highest accuracy after LDA with the SVD solver and the Sigmoid kernel. We also notice that after PCA and LDA, the SVM accuracy with the RBF and Sigmoid kernels increases.

4. Fourth Scenario Results

In this scenario, the CWT feature extraction algorithm is applied to the geomagnetic data, and the resulting features are passed through the dimension reduction methods before being used as input to the classification methods.

Figure: Fourth Scenario Flowchart
| Kernel | Case 0: SVM | Case 1: SVM after PCA | Case 2: SVM after LDA (solver='SVD') | Case 3: SVM after LDA (solver='Eigen') | Case 4: SVM after PCA and LDA (solver='SVD') | Case 5: SVM after PCA and LDA (solver='Eigen') |
| --- | --- | --- | --- | --- | --- | --- |
| Polynomial | 92.78 | 47.28 | 96.83 | 47.28 | 38.61 | 57.48 |
| RBF | 84.85 | 52.72 | 99.61 | 93.99 | 93.17 | 79.64 |
| Sigmoid | 52.72 | 96.77 | 100.00 | 52.72 | 96.07 | 85.56 |

Table: SVM Mean of Accuracy (%) in the Fourth Scenario
| Algorithm | Case 0: KNN | Case 1: KNN after PCA | Case 2: KNN after LDA (solver='SVD') | Case 3: KNN after LDA (solver='Eigen') | Case 4: KNN after PCA and LDA (solver='SVD') | Case 5: KNN after PCA and LDA (solver='Eigen') |
| --- | --- | --- | --- | --- | --- | --- |
| Auto | 94.61 | 94.14 | 100.00 | 94.72 | 93.05 | 84.22 |
| Ball Tree | 94.61 | 94.14 | 100.00 | 94.72 | 93.05 | 84.22 |
| Kd Tree | 94.71 | 94.30 | 100.00 | 94.41 | 92.53 | 83.19 |
| Brute | 94.61 | 94.14 | 100.00 | 94.72 | 93.05 | 84.22 |

Table: KNN Accuracy (%) in the Fourth Scenario

• The SVM algorithm after LDA with the SVD solver produces high accuracy for all kernels, as can be seen in Table 4.16. After applying PCA, the accuracy with the Sigmoid kernel increases.
• KNN classification achieves high accuracy in all cases, especially after LDA with the SVD solver.

Comparison Results of SVM and KNN Classification Methods

