Method

This document gives a comprehensive overview of Grond’s methodological background. It describes how the objective function and the data weighting are defined, how the optimisation algorithm works, and how model uncertainties are estimated.

The very core of any optimisation is the evaluation of an objective function or misfit value between observed and predicted data. This is most often based on the difference

\[{\bf d}_{\mathrm{obs}} - {\bf d}_{\mathrm{synth}},\]

but it can also be any other comparison measure, for instance a correlation.

Observed data \({\bf d}_{\mathrm{obs}}\) are post-processed data (or features) derived from the raw measurements. For example, in the context of seismic source inversion, seismic waveforms are tapered to seismic phases of interest, restituted to displacement and filtered. Predicted data \({\bf d}_{\mathrm{synth}}\) are then modelled seismograms which are tapered and filtered in the same way as the observed waveforms.

Forward modelling with pre-calculated Green’s functions

The forward modelling of raw synthetic data \({\bf d}_{\mathrm{raw, synth}}\) for earthquake source models requires the calculation of the Green’s function (GF) between all source points and receiver positions involved, based on a medium (velocity) model. In the general earthquake source problem, the positions of the sources change during the optimisation because the misfit is calculated for many different source-receiver configurations. The calculation of the GFs for each specific source-receiver pair is computationally costly. Therefore, Grond uses pre-calculated GFs, stored in a database called Pyrocko GF store. Such GF stores can be created with the Fomosto GF management tool of Pyrocko.
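As a minimal, hedged sketch of how such pre-calculated GFs are used for forward modelling with Pyrocko’s GF engine (the store ID 'crust2_ib', the directory 'gf_stores' and all source and receiver values below are placeholders):

    from pyrocko import gf

    # The engine looks up pre-calculated Green's functions in local GF stores.
    engine = gf.LocalEngine(store_superdirs=['gf_stores'])

    # A candidate point source, as proposed during the optimisation.
    source = gf.DCSource(
        lat=0., lon=0., depth=10e3,
        strike=30., dip=60., rake=90., magnitude=6.0)

    # A receiver for which synthetic waveforms are requested.
    target = gf.Target(
        lat=10., lon=10.,
        store_id='crust2_ib',             # placeholder store ID
        codes=('', 'STA', '', 'Z'),
        quantity='displacement')

    # Synthetic seismograms are assembled from the pre-calculated GFs.
    response = engine.process(source, [target])
    traces = response.pyrocko_traces()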

Different applications need different types of GF stores. For the purpose of forward modelling in Grond, we have to distinguish between

  1. GFs for dynamic seismic waveforms and corresponding ray attributes, and

  2. GFs for static near-field displacement.

Ready-to-use GF stores can be found in our online repository. Global GFs for several standard earth models, such as the global 1D PREM model, are available, as well as regional-distance GFs for many profiles of the CRUST 2.0 earth model database. Custom GF stores can be created using the Fomosto tool and an appropriate choice of the numerical method to calculate the GFs (Fomosto back end).

For more details on GF stores, see the Pyrocko documentation.

GFs for seismic waveforms

For regional data analyses, the QSEIS method for layered media (Wang et al., 1999) is appropriate. For global forward models, the QSSP method (Wang et al., 2017) is better suited.

GFs for static near-field displacements (measured with GNSS or InSAR)

For the calculation of purely static coseismic surface displacements the use of the PSGRN/PSCMP method by Wang et al. (2006) is suggested for fast forward modelling.

Objective function design

The objective function (or misfit function) gives a scalar misfit value which describes how well the source model fits the observed data. A smaller misfit value indicates a better fit than a larger one. The global minimum of the objective function represents the best-fitting source model. The objective function defines how different observed data sets are combined, which difference measure is applied, and how data uncertainties are considered.


Figure 1: Concept of Grond’s objective function design. The objective function combines different data sets (e.g., waveform, satellite and GNSS) with different weighting schemes to balance their influence. It also integrates bootstrap weights that are needed to assess uncertainties. Details on how each target and weight vector is formed are described in the sections below.

Misfit calculation and objective function

The measure of disagreement between observed and synthetic data is generally based on a data-point-wise difference:

\[|{\bf d}_{\mathrm{obs}} - {\bf d}_{\mathrm{synth}}|.\]

Grond supports different seismological observations and a combination of those, thus \({\bf d}_{\mathrm{obs}}\) and \({\bf d}_{\mathrm{synth}}\) can be:

  • Seismic waveforms
    • in time domain

    • in spectral domain

    • in logarithmic spectral domain

    • trace’s spectral ratios

    • first-motion polarities

    • arrival time picks

    • cross-correlation of time domain waveforms (In this case, the misfit calculation is not a typical data-point-wise measure.)

  • Static surface displacements
    • from unwrapped InSAR images

    • from pixel offsets

    • measured with GNSS sensors

This framework is open for the implementation of additional methods.

The misfit \(M\) is defined as a weighted \(L^p\) norm of the sample differences, normalised by the corresponding norm of the observations.

(1)\[ M = \frac{\lVert e \rVert_p}{\lVert e_0 \rVert_p}\]

where

(2)\[\lVert e \rVert_p = \left(\sum_i \left( w_{i} \, \lvert d_{\mathrm{obs},i} - d_{\mathrm{synth},i} \rvert \right)^{p}\right)^{\frac{1}{p}}\]

and

(3)\[\lVert e_0 \rVert_p = \left(\sum_i \left( w_{i} \, \lvert d_{\mathrm{obs},i} \rvert \right)^{p}\right)^{\frac{1}{p}}\]

with adjustable weighting factors \(w_{i}\), where \(p=2\) corresponds to the commonly used \(L^2\)-norm.
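A compact NumPy sketch of equations (1)–(3), assuming observed and synthetic samples and the weights are already available as arrays:

    import numpy as np

    def misfit(d_obs, d_synth, w, p=2):
        """Weighted, normalised Lp misfit as in equations (1)-(3)."""
        e = np.sum((w * np.abs(d_obs - d_synth))**p)**(1.0 / p)   # eq. (2)
        e0 = np.sum((w * np.abs(d_obs))**p)**(1.0 / p)            # eq. (3)
        return e / e0                                             # eq. (1)

    # Example: three samples, equal weights, L2 norm.
    d_obs = np.array([1.0, 2.0, -1.5])
    d_synth = np.array([0.8, 2.2, -1.0])
    print(misfit(d_obs, d_synth, w=np.ones(3)))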

Waveform misfit

Waveform data are preprocessed before the misfit is calculated in the time or frequency domain: observed and synthetic data are tapered within a time window and bandpass filtered.

The waveform misfit in Grond can also be based on the maximum waveform cross-correlation. In this case, the misfit is derived from the maximum cross-correlation \(\mathrm{max}(CC)\) between observed and synthetic data:

(4)\[\begin{align*} M = 1 - \mathrm{max}(CC). \end{align*}\]
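A simplified NumPy sketch of eq. (4); here both traces are normalised to unit energy as a whole, whereas a full implementation may taper and normalise differently:

    import numpy as np

    def cc_misfit(d_obs, d_synth):
        """Misfit from the maximum normalised cross-correlation, eq. (4)."""
        a = d_obs / np.linalg.norm(d_obs)
        b = d_synth / np.linalg.norm(d_synth)
        cc = np.correlate(a, b, mode='full')   # normalised CC over all lags
        return 1.0 - np.max(cc)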

Satellite misfit

The surface deformation data are preprocessed with Kite (see the example project: Rectangular source plane from InSAR observations) to obtain a subsampled quadtree. The misfit is then calculated for each quadtree tile \(d_{i}\).

GNSS misfit

Each GNSS component (north, east and up) is forward modelled and compared with the observed data.

Target weighting

Grond can employ several different kinds of weights:

  • \(w_{\mathrm{tba},i}\) - target balancing (for waveforms and GNSS campaign only)

  • \(w_{\mathrm{noise},i}\) - noise-based data weights (for waveforms only)

  • \(w_{\mathrm{man},i}\) - user-defined, manual weights of target groups

These weights are applied as factors to the misfits, optionally as a product of several weights. For a waveform target, for example, combining all data weights gives:

(5)\[ w_{i} = w_{\mathrm{tba},i} \cdot w_{\mathrm{noise},i} \ \cdot w_{\mathrm{man},i}\]

Target balancing weights

With these weights, waveform targets are balanced with respect to the expected earthquake signal amplitude.


Figure 2: Qualitative sketch of how target balancing weight increases with source-receiver distance to balance amplitude decay by geometrical spreading. Large triangles indicate larger weights and vice versa.

Signal amplitudes depend on (1) the source-receiver distance, (2) the phase type, and (3) the signal processing. Target balancing weights prevent large signal amplitudes from dominating the misfit without contributing more information about the source mechanism. The weight of each target is the inverse of the expected signal amplitude at each station \(k\). This amplitude is estimated by forward modelling a sufficiently large number of random source models.

(6)\[{w}_{\mathrm{tba},k}=\frac{1}{{a}_k},\qquad \textrm{with} \qquad a_k= \frac{1}{N} \sum^N_j{|{\bf d}_{\mathrm{synth}}|_{jk}}.\]

For details see Heimann (2011; page 23, adaptive station weighting).
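A rough NumPy sketch of eq. (6), assuming the synthetic signal amplitude of each random source model at each target has already been reduced to a single scalar value:

    import numpy as np

    def balancing_weights(synthetic_amplitudes):
        """synthetic_amplitudes: array of shape (N_models, N_targets) holding
        |d_synth| for N random source models at each target."""
        a = np.mean(synthetic_amplitudes, axis=0)   # expected amplitude a_k
        return 1.0 / a                              # w_tba,k = 1 / a_k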

Data weights based on data error statistics

Grond optionally allows weighting based on data quality or error statistics.

There are direct data weight vectors \(\bf{w}\) or weight matrices \(\bf{W}\) based on empirical data error variance estimates. For some data types, e.g. InSAR and GNSS data, these weights are derived from data error correlations expressed in the data error variance-covariance matrix \(\bf{\Sigma}\):

(7)\[{\bf w} = \frac{1}{{\bf \sigma}}, \quad \bf{W} = \sqrt{{\bf \Sigma}^{-1}}.\]
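A minimal numerical sketch of eq. (7) using NumPy and SciPy (the numbers are arbitrary placeholders):

    import numpy as np
    from scipy.linalg import sqrtm

    sigma = np.array([0.5, 1.0, 2.0])            # per-sample standard deviations
    w = 1.0 / sigma                              # diagonal data weight vector

    Sigma = np.array([[1.0, 0.3],                # data error variance-covariance
                      [0.3, 0.5]])
    W = np.real(sqrtm(np.linalg.inv(Sigma)))     # matrix square root of Sigma^-1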

For a SatelliteTargetGroup the data error statistics have to be pre-calculated with Kite beforehand and loaded together with the scenes. Kite allows the noise to be estimated interactively from areas not affected by coseismic deformation, as described in e.g. Sudhaus and Jonsson (2009).

When using a GNSSCampaignTargetGroup the data error statistics are provided as part of the input data set. For details visit the corresponding chapter in the Pyrocko tutorial.

For a WaveformTargetGroup the data error statistics can be estimated by Grond from pre-event data noise (NoiseAnalyser).

Manual data weighting

User-defined manual data weights enable an arbitrary weighting of data sets. No rules apply other than the user’s rationale. In Grond, they are called manual_weight and are given in the targets config of the configuration file.

Normalisation of different data types

Different input data types have different scaling behaviours with different source parameters. This can lead to over- or underweighting of particular input targets, especially when balancing weights are involved. Therefore, the normalisation in equation (1) is applied to each data group separately. A group is formed by assigning targets to a normalisation family. We advise putting data of the same kind and the same misfit type into the same normalisation family (see Fig. 1). This could be time domain P and S waveforms or two InSAR data sets.

Example: Fitting waveforms of P and S waves

Let’s say we combine frequency-domain waveform fits of P and S waves with cross-correlation-based waveform fits. To account for the different scaling behaviour of the cross-correlation and frequency-domain misfits with magnitude, we form one normalisation_family for each domain. The waveforms of P and S waves in the frequency domain are of a similar kind and can be normalised together. The same holds for the normalisation of the cross-correlated P and S waves.

Using an \(L^2\,\)-norm, the global misfit for the two normalisation families is then:

(8)\[ M = \sqrt{\frac{M_{\mathrm{cc}}^2 + M_{\mathrm{frequency}}^2}{2}}\]
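In NumPy terms, with placeholder misfit values for the two families, eq. (8) reads:

    import numpy as np

    M_cc, M_frequency = 0.4, 0.7                    # example per-family misfits
    M = np.sqrt((M_cc**2 + M_frequency**2) / 2.0)   # global misfit, eq. (8)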

The bootstrap method

Bootstrapping is a method to estimate model uncertainties (see also Bootstrapping [Wikipedia]).

It is based on the idea that we get slightly different results if we randomise the input data. The variance of these results can be used to assess uncertainties.

In Grond, bootstrapping is applied in a number of parallel bootstrap chains, in which individual bootstrap weights and bootstrap noise are applied to the model misfits. Technically, each bootstrap chain carries out its own optimisation. More details are given below under Bayesian Bootstrap Optimisation (BABO). (What is an optimiser?)

In Grond two different bootstrapping methods are implemented:

  1. Bayesian and classic bootstrapping through misfit weighting and

  2. Residual bootstrapping by adding synthetic noise to the residuals (Fig. 1).

Currently, we use (1) for waveform targets and (2) for InSAR targets.

Bayesian and classic bootstrapping

These bootstrap types are based on residual weighting. Conceptually, the inversion is repeated with a different set of random weights in each bootstrap chain.

Classic weights

For a classic bootstrap realisation we draw \(N_{\mathrm{targets}}\) random integers \({\bf r}\) uniformly from the target indices \(\{1, \dots, N_{\mathrm{targets}}\}\) (Fig. 3, left). We then sort these into \(N_{\mathrm{targets}}\) bins (Fig. 3, right). The occurrence frequency in each bin forms the bootstrap target weights.


Figure 3: Formation of classical bootstrap weights. Uniformly drawn random samples (left) and the corresponding histogram (right) with the occurrence frequencies being used as bootstrap weights.
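A NumPy sketch of this construction, drawing target indices and counting their occurrences:

    import numpy as np

    def classic_bootstrap_weights(n_targets, rng=None):
        """Occurrence frequencies of uniformly drawn target indices (Fig. 3)."""
        rng = rng or np.random.default_rng()
        r = rng.integers(0, n_targets, size=n_targets)    # N_targets random indices
        return np.bincount(r, minlength=n_targets)        # counts used as weights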

Bayesian weights

For a Bayesian bootstrap realisation we draw \(N_{\mathrm{targets}}-1\) random real numbers \({\bf r} \, \in \, [0, 1]\) from a uniform distribution (Fig. 4, left). We then sort the obtained random values in ascending order (Fig. 4, middle); together with the interval boundaries 0 and 1 they form \(N_{\mathrm{targets}}\) intervals. The widths of the intervals are used as the bootstrap weights (Fig. 4, right):

\[w_{\mathrm{bootstr},\,i}=r_{i+1}-r_i\]

Figure 4: Formation of Bayesian bootstrap weights. Uniformly random samples (left) are sorted (middle) and the differences of neighbouring points (right) are used as bootstrap weights.
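A corresponding NumPy sketch, including the interval boundaries 0 and 1:

    import numpy as np

    def bayesian_bootstrap_weights(n_targets, rng=None):
        """Interval widths of sorted uniform random numbers (Fig. 4)."""
        rng = rng or np.random.default_rng()
        r = np.sort(rng.uniform(0.0, 1.0, size=n_targets - 1))
        r = np.concatenate(([0.0], r, [1.0]))    # N_targets intervals in [0, 1]
        return np.diff(r)                        # widths used as weights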

Residual bootstrap

Residual bootstrapping is a variant of the Randomize-then-optimize approach: based on empirical estimates of the data error statistics, individual realisations of synthetic correlated random noise are systematically added to the data to obtain perturbed optimisation results (Fig. 5).

Earthquake source parameter distributions retrieved with the Randomize-then-optimize method based on the data error variance-covariance matrices have been shown to match the model parameter distributions obtained through Markov Chain Monte Carlo sampling of the model space (Jonsson et al., 2014). In our residual bootstrapping method we add realisations of synthetic correlated random noise to each bootstrap chain (Figs. 5C and 1).

To generate random noise for InSAR data, we use functions of the Kite module. From the noise estimation region defined in the Kite scenes (Fig. 5A), the noise power spectrum is used directly with a randomised phase spectrum to create new random noise with the same spectral characteristics (Fig. 5B). The noise is then subsampled through the same quadtree as defined for the observed data (Fig. 5C).


Figure 5: Residual bootstrapping of InSAR surface displacement data in Grond. (A) From displacement maps we extract noise and (B) synthesise random correlated data noise. (C) This synthetic noise is then subsampled exactly as the observed data. These random realisations are added to the residuals of each bootstrap chain.
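The following NumPy sketch only illustrates the idea of the noise synthesis (keep the amplitude spectrum of an observed noise patch, randomise its phase); the actual implementation is provided by Kite and its API is not reproduced here:

    import numpy as np

    def synthetic_correlated_noise(noise_patch, rng=None):
        """Random noise with the same spectral characteristics as noise_patch."""
        rng = rng or np.random.default_rng()
        spectrum = np.fft.rfft2(noise_patch)
        amplitude = np.abs(spectrum)                                 # keep power spectrum
        phase = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape)   # randomise phase
        return np.fft.irfft2(amplitude * np.exp(1j * phase), s=noise_patch.shape)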

Optimisation

Grond’s modular framework is open for different optimisation schemes; the native optimisation scheme is the so-called Bayesian Bootstrap Optimisation (BABO).

Bayesian Bootstrap Optimisation (BABO)

Bayesian Bootstrap Optimisation (BABO) allows for earthquake source optimisation whilst providing the complete information for a full Bayesian analysis. BABO is based on direct search, where random model parameters are drawn from a defined model space. Synthetics are then calculated for these models and compared with the targets’ observed data. This approach needs no assumptions about the topology of the misfit space and is appropriate for highly non-linear problems.

BABO can be configured for a simple Monte-Carlo random direct search. It can also resemble a simulated annealing optimisation approach. BABO enables fully probabilistic bootstrapping of the optimisation. This is realised by optimising an ensemble of bootstrap chains in parallel.

Note

Bootstrap weights are explained above. The specific weighting is configured with the targets config and with the problem. The model space in which the optimisation takes place is also configured with the problem.

Sampling scheme and sampling phases

As in any direct search, candidate models are drawn from a model space. From all visited and evaluated models we form and keep a highscore list. The sampling is set up to progressively converge towards the low-misfit regions. The behaviour of the BABO optimiser can be tuned to be more explorative or more targeted. In this way, the algorithm’s ability to resolve multiple minima can be balanced against its efficiency.

Highscore list

The BABO sampler uses a highscore list to keep track of the best models found so far. This list contains a fixed number of models. It is updated after every tested candidate model. The highscore list length \(L_{\mathrm{hs}}\) (i.e. number of member models) is problem dependent: \(L_{\mathrm{hs}} = f_{\mathrm{len}} (N_{\mathrm{par}} -1)\), with \(N_{\mathrm{par}}\) being the number of model parameters. \(f_{\mathrm{len}}\) is configurable (chain_length_factor, default is 8).
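Conceptually, the highscore bookkeeping can be sketched as follows (an illustration only, not Grond’s actual implementation):

    def update_highscore(highscore, model, misfit, length):
        """Insert a candidate model and keep only the `length` best models."""
        highscore.append((misfit, model))
        highscore.sort(key=lambda entry: entry[0])   # ascending misfit
        del highscore[length:]                       # truncate to L_hs members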

How new models are proposed is defined by the sampler phase. Three different sampler phases are currently available and can be applied successively in an inversion run:

  • UniformSamplerPhase - models are drawn uniformly from the model space ignoring the highscore list

  • InjectionSamplerPhase - allows specific models to be injected into the search

  • DirectedSamplerPhase - existing low-misfit models direct the sampling

Details about the sampler configuration: optimiser config


Figure 6: Schematic sketch of the different optimiser sampling phases.

Bootstrap chains

BABO optimises multiple bootstrap chains in parallel. Each bootstrap chain uses its own fixed set of bootstrap weights. Accordingly, each chain represents a perturbed objective function providing a different misfit for the same candidate source model. With a single forward model, \(N_{\mathrm{bootstrap}}\) different bootstrap misfits can be calculated (Fig. 7B).
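In a simplified NumPy sketch (ignoring the normalisation of eqs. (1)–(3)), the per-chain misfits follow from the per-target misfits of a single forward model:

    import numpy as np

    rng = np.random.default_rng()
    n_targets, n_bootstrap = 50, 100
    target_misfits = rng.random(n_targets)                    # from one forward model
    bootstrap_weights = rng.random((n_bootstrap, n_targets))  # fixed per chain
    bootstrap_misfits = bootstrap_weights @ target_misfits    # one misfit per chain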

The highscore list member models in each bootstrap chain (Fig. 7B) will differ to some extent, and therefore, different bootstrap chains may converge to different places within the model space (Fig. 7C, Fig. 8). These differences mark the uncertainty of the models with respect to data errors.


Figure 7: Illustration of BABO’s highscore optimiser after twelve iterations. (A) Bootstrap weights, larger triangles indicate higher weight. (B) Bootstrap chain highscore lists and (C) their influence on the convergence in the model parameter space due to the individual weighted objective function of each bootstrap chain.

BABO maintains a separate highscore list for each chain. Consequently, each chain will converge to a slightly different minimum in model space. At the beginning of an inversion, the regions covered by the highscore lists of different chains will overlap (Fig. 7). Later, these regions will separate (Fig. 8) and no further overall convergence can be achieved. This remaining overall region is the solution space of the inversion with BABO. The final models from all bootstrap chains’ highscore lists are interpreted as a non-parametric error distribution.


Figure 8: Drawing new model candidates from the described sampling strategies - the proposal is based on the existing solution space.

Movies: BABO at work

To illustrate the optimisation procedure of BABO, we apply it to a simple toy problem. The aim is to find a best-fitting source location in 3D, given noisy 1D distance measures from 10 observation points (triangles). In the following, we show the evolution of the optimiser for different set-ups. The true source location is marked with a star and located at x=0, y=0, z=5. Projections to the x-z-plane are shown.

Contour lines indicate regions of low misfit and are computed from a grid search in x-y-z, which is shown only for comparison. The currently tested candidate model is marked with an empty circle. Large dots represent the highscore models; small dots represent so far tested candidate models. The green background colour depicts the sampler’s current proposal distribution during the directed sampler phase (label: transition).

Single chain

Only the upper half-space (z>0) is searched; the problem is unimodal. The inversion converges to a single point close to the true solution. With only a single bootstrap chain, we cannot obtain a meaningful uncertainty measure.

Global + 3 bootstrap chains

Only the upper half-space (z>0) is searched; the problem is unimodal. The black colour marks the so-called global chain, in which all observations get the same bootstrap weights. The colours represent three bootstrap chains, each with its own fixed set of random weights. The chains result in slightly different highscore distributions. While in the first example the models converged to a single point, here the distributions converge towards multiple points in the search space, providing a meaningful uncertainty estimate.

Ill-posed problem, no eccentricity correction

Only the upper half-space (z>0) is searched; the problem is unimodal. In this example, all observation points are located at x>0, making the location problem ill-posed, which results in elongated, curved misfit distributions.

In this case, we observe a convergence to locations close to the true location. However, compared with the grid search, low-misfit regions tend to be undersampled in the tails (especially in the pink distribution). This behaviour is undesirable in a setting in which the true location is unknown and a reliable uncertainty estimate is important.

Ill-posed problem, eccentricity correction applied

In the same example as above, when eccentricity correction is applied, the models are drawn from the full regions of low misfit. This allows for a reliable uncertainty estimate.

The eccentricity correction tries to compensate for the fact that the proposal distributions obtained from different bootstrap chains strongly overlap. Without the correction, a disproportionately large fraction of candidate models would be drawn from the region of overlap.

Bimodal, standard deviations from highscore models

To investigate the behaviour of BABO in the case of bimodal objective functions, we search the full space in this toy model. Due to the symmetry, two mirrored minima exist. The proposal distribution (green background color) is elongated parallel to the y-axis because the standard deviation of the y-component remains high due to the separation of the two minima. This standard deviation is used to scale the proposal distribution. The approach works fine to solve the problem, but the search is not very efficient because the proposal distribution does not mimic the low misfit regions.

Bimodal, standard deviations from median density

To address the efficiency deficit observed in the previous bimodal example, an effective standard deviation is obtained from the median spacing of the model parameters.