
locate

locate.py

Detects pop-ins (sudden displacement jumps) in nano-indentation curves using multiple methods:

• IsolationForest anomaly detection on stiffness and curvature features
• CNN-based autoencoder reconstruction error
• Finite difference method using Fourier spectral analysis
• Savitzky-Golay derivative method

To ensure relevance, all detection methods operate only on the portion of the indentation curve up to the maximum load point, because pop-in events occur during the loading phase of indentation. After peak load, the signal is dominated by unloading behaviour and post-penetration artifacts, which are irrelevant to pop-in analysis.

Provides:

- compute_stiffness
- compute_features
- detect_popins_iforest
- detect_popins_cnn
- detect_popins_fd_fourier
- detect_popins_savgol
- default_locate (combines all methods)
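A minimal usage sketch (assuming the indentation data is already loaded into a pandas DataFrame with the default 'Depth (nm)' and 'Load (µN)' columns; the file name below is hypothetical):

import pandas as pd
from merrypopins.locate import default_locate

# Hypothetical input file with 'Depth (nm)' and 'Load (µN)' columns.
df = pd.read_csv("indent_curve.csv")

# Run all four detectors with their default settings.
results = default_locate(df, iforest_random_state=42)

# Points flagged by at least two methods are marked as confident pop-ins.
popins = results[results["popin_confident"]]
print(popins[["Depth (nm)", "Load (µN)", "popin_methods"]])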

build_cnn_autoencoder(window_size, n_features)

Build a 1D Convolutional Autoencoder for time-series anomaly detection.

This model learns to reconstruct input sequences composed of features like stiffness difference and curvature. During inference, reconstruction error is used to detect anomalies—samples with high error are likely pop-ins.

Architecture overview
  • Encoder: Conv1D -> MaxPooling -> Conv1D -> MaxPooling -> Conv1D
  • Decoder: UpSampling -> Conv1D -> UpSampling -> Conv1D

The model operates on fixed-size input windows and uses symmetric encoding and decoding layers. The final layer has linear activation to match the original feature values.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| window_size | int | Number of time steps per sequence. | required |
| n_features | int | Number of input features per time step. | required |

Returns:

| Type | Description |
|------|-------------|
| keras.Model | Keras autoencoder model (uncompiled). |

Source code in src/merrypopins/locate.py
def build_cnn_autoencoder(window_size, n_features):
    """
    Build a 1D Convolutional Autoencoder for time-series anomaly detection.

    This model learns to reconstruct input sequences composed of features like
    stiffness difference and curvature. During inference, reconstruction error
    is used to detect anomalies—samples with high error are likely pop-ins.

    Architecture overview:
      - Encoder:
          Conv1D -> MaxPooling -> Conv1D -> MaxPooling -> Conv1D
      - Decoder:
          UpSampling -> Conv1D -> UpSampling -> Conv1D

    The model operates on fixed-size input windows and uses symmetric encoding
    and decoding layers. The final layer has linear activation to match the
    original feature values.

    Args:
        window_size (int): Number of time steps per sequence.
        n_features (int): Number of input features per time step.

    Returns:
        keras.Model: Keras autoencoder model (uncompiled).
    """
    inp = Input(shape=(window_size, n_features))
    x = Conv1D(32, 3, activation="relu", padding="same")(inp)
    x = MaxPooling1D(2, padding="same")(x)
    x = Conv1D(16, 3, activation="relu", padding="same")(x)
    x = MaxPooling1D(2, padding="same")(x)
    x = Conv1D(16, 3, activation="relu", padding="same")(x)
    x = UpSampling1D(2)(x)
    x = Conv1D(32, 3, activation="relu", padding="same")(x)
    x = UpSampling1D(2)(x)
    x = Conv1D(n_features, 3, activation="linear", padding="same")(x)
    return Model(inp, x)
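A minimal sketch of compiling and inspecting the returned model (detect_popins_cnn compiles it the same way, with Adam and MSE loss; the shapes below are illustrative):

from merrypopins.locate import build_cnn_autoencoder

ae = build_cnn_autoencoder(window_size=64, n_features=2)
ae.compile(optimizer="adam", loss="mse")
ae.summary()  # input and output shape: (None, 64, 2)
# Note: with two pooling/upsampling stages, window_size should be a multiple
# of 4 so the decoder restores the original sequence length.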

compute_features(df, depth_col='Depth (nm)', load_col='Load (µN)', window=5, return_derivatives=True)

Compute derived indentation features for anomaly detection.

This function calculates three features:
  1. Stiffness: local slope of load vs. depth (ΔLoad/ΔDepth)
  2. Stiffness difference: the rate of change in stiffness (first derivative)
  3. Curvature: the rate of change in stiffness difference (second derivative)

These features help detect sudden shifts in indentation behavior, often indicative of pop-in events.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| df | DataFrame | Input indentation data. | required |
| depth_col | str | Column name for depth. | 'Depth (nm)' |
| load_col | str | Column name for load. | 'Load (µN)' |
| window | int | Sliding window size for stiffness calculation. | 5 |
| return_derivatives | bool | If True (default), return the DataFrame with added feature columns; if False, return the original DataFrame without added columns. | True |

Returns:

| Type | Description |
|------|-------------|
| DataFrame | Enhanced DataFrame with 'stiffness', 'stiff_diff', and 'curvature' columns. |
Source code in src/merrypopins/locate.py
def compute_features(
    df, depth_col="Depth (nm)", load_col="Load (µN)", window=5, return_derivatives=True
):
    """
    Compute derived indentation features for anomaly detection.

    This function calculates three features:
      1. Stiffness: local slope of load vs. depth (ΔLoad/ΔDepth)
      2. Stiffness difference: the rate of change in stiffness (first derivative)
      3. Curvature: the rate of change in stiffness difference (second derivative)

    These features help detect sudden shifts in indentation behavior, often indicative
    of pop-in events.

    Args:
        df (DataFrame): Input indentation data.
        depth_col (str): Column name for depth.
        load_col (str): Column name for load.
        window (int): Sliding window size for stiffness calculation.
        return_derivatives (bool): If True (default), return DataFrame with added features.
                                   If False, return original DataFrame without added columns.

    Returns:
        DataFrame: Enhanced DataFrame with 'stiffness', 'stiff_diff', and 'curvature' columns.
    """
    df2 = df.copy()
    df2["stiffness"] = compute_stiffness(df, depth_col, load_col, window)
    df2["stiff_diff"] = df2["stiffness"].diff()
    df2["curvature"] = df2["stiff_diff"].diff()
    return df2 if return_derivatives else df
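A short sketch on synthetic data (all values are illustrative; the plateau in load is a crude stand-in for a pop-in signature):

import numpy as np
import pandas as pd
from merrypopins.locate import compute_features

# Synthetic loading curve with a short plateau around index 60.
depth = np.linspace(0, 100, 120)
load = 0.05 * depth**1.5
load[60:68] = load[60]  # load stalls while depth advances, mimicking a pop-in

df = pd.DataFrame({"Depth (nm)": depth, "Load (µN)": load})
feat = compute_features(df, window=5)
print(feat[["stiffness", "stiff_diff", "curvature"]].describe())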

compute_stiffness(df, depth_col='Depth (nm)', load_col='Load (µN)', window=5)

Compute local stiffness (dLoad/dDepth) using sliding-window linear regression.

In nano-indentation, 'stiffness' is the local slope of the load–depth curve and reflects how resistant the material is to deformation. It is computed as:

stiffness = change in Load / change in Depth
          = ΔLoad / ΔDepth

This is estimated using linear regression over a moving window centered on each point.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| df | DataFrame | Input indentation data. | required |
| depth_col | str | Column name for depth. | 'Depth (nm)' |
| load_col | str | Column name for load. | 'Load (µN)' |
| window | int | Sliding window size. | 5 |

Returns:

| Type | Description |
|------|-------------|
| Series | Stiffness at each data point. |
Source code in src/merrypopins/locate.py
def compute_stiffness(df, depth_col="Depth (nm)", load_col="Load (µN)", window=5):
    """
    Compute local stiffness (dLoad/dDepth) using sliding-window linear regression.

    In nano-indentation, 'stiffness' is the local slope of the load–depth curve and
    reflects how resistant the material is to deformation. It is computed as:

        stiffness = change in Load / change in Depth
                  = ΔLoad / ΔDepth

    This is estimated using linear regression over a moving window centered on each point.

    Args:
        df (DataFrame): Input indentation data.
        depth_col (str): Column name for depth.
        load_col (str): Column name for load.
        window (int): Sliding window size.

    Returns:
        Series: Stiffness at each data point.
    """
    if depth_col not in df.columns or load_col not in df.columns:
        raise ValueError(
            f"Required columns '{depth_col}' and/or '{load_col}' not found in DataFrame."
        )
    if len(df) < window:
        raise ValueError(
            f"Not enough data points ({len(df)}) for window size ({window})."
        )

    x, y = df[depth_col].values, df[load_col].values
    stiffness = np.full(len(x), np.nan)
    half_win = window // 2

    for i in range(half_win, len(x) - half_win):
        dx = x[i - half_win : i + half_win + 1]
        dy = y[i - half_win : i + half_win + 1]
        A = np.vstack([dx, np.ones_like(dx)]).T
        stiffness[i], _ = np.linalg.lstsq(A, dy, rcond=None)[0]

    return pd.Series(stiffness, index=df.index)
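As a quick sanity check, on a perfectly linear curve the sliding-window fit should recover the constant slope (a stand-alone sketch):

import numpy as np
import pandas as pd
from merrypopins.locate import compute_stiffness

depth = np.arange(0.0, 50.0, 1.0)
df_lin = pd.DataFrame({"Depth (nm)": depth, "Load (µN)": 2.5 * depth + 1.0})

s = compute_stiffness(df_lin, window=5)
# Edge points are NaN (no full window); interior points recover the slope 2.5.
print(s.dropna().round(6).unique())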

default_locate(df, iforest_contamination=0.001, iforest_random_state=None, cnn_window_size=64, cnn_epochs=10, cnn_threshold_multiplier=5.0, cnn_batch_size=32, cnn_validation_split=0.0, fd_threshold=3.0, fd_spacing=1.0, savgol_window_length=11, savgol_polyorder=2, savgol_threshold=3.0, sg_deriv_order=1, stiffness_window=5, trim_edges_enabled=True, trim_margin=None, max_load_trim_enabled=True, use_iforest=True, use_cnn=True, use_fd=True, use_savgol=True, depth_col='Depth (nm)', load_col='Load (µN)')

Apply all (default) or selected detection methods to identify pop-ins.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| df | DataFrame | Input indentation data. | required |
| iforest_contamination | float | Expected contamination level for IsolationForest. | 0.001 |
| iforest_random_state | int or None | Seed for reproducibility. | None |
| cnn_window_size | int | Window size for CNN autoencoder. | 64 |
| cnn_epochs | int | Training epochs for CNN. | 10 |
| cnn_threshold_multiplier | float | Threshold multiplier for CNN anomaly detection. | 5.0 |
| cnn_batch_size | int | Batch size for CNN autoencoder. | 32 |
| cnn_validation_split | float | Validation split for CNN autoencoder. | 0.0 |
| fd_threshold | float | Standard deviation threshold for the finite difference method. | 3.0 |
| fd_spacing | float | Spacing between samples for the FFT derivative. | 1.0 |
| savgol_window_length | int | Window size for the Savitzky-Golay filter. | 11 |
| savgol_polyorder | int | Polynomial order for the Savitzky-Golay filter. | 2 |
| savgol_threshold | float | Standard deviation threshold for Savitzky-Golay. | 3.0 |
| sg_deriv_order | int | Derivative order for Savitzky-Golay. | 1 |
| stiffness_window | int | Sliding window size for stiffness computation. | 5 |
| trim_edges_enabled | bool | If True, suppresses detections in the first few points of the curve to remove edge effects (see trim_margin). | True |
| trim_margin | int or None | Number of elements to trim from the start; if None, each method falls back to its own default. | None |
| max_load_trim_enabled | bool | If True, masks out any anomalies after the maximum load point. | True |
| use_iforest | bool | Whether to use the IsolationForest method. | True |
| use_cnn | bool | Whether to use the CNN method. | True |
| use_fd | bool | Whether to use the finite difference method. | True |
| use_savgol | bool | Whether to use the Savitzky-Golay method. | True |
| depth_col | str | Column name for depth data. | 'Depth (nm)' |
| load_col | str | Column name for load data. | 'Load (µN)' |

Returns:

| Type | Description |
|------|-------------|
| DataFrame | Data with the individual method flags (popin_iforest, popin_cnn, popin_fd, popin_savgol, as enabled), the combined popin flag, and the metadata columns popin_methods, popin_score, and popin_confident. |
Source code in src/merrypopins/locate.py
def default_locate(
    df,
    iforest_contamination=0.001,
    iforest_random_state=None,
    cnn_window_size=64,
    cnn_epochs=10,
    cnn_threshold_multiplier=5.0,
    cnn_batch_size=32,
    cnn_validation_split=0.0,
    fd_threshold=3.0,
    fd_spacing=1.0,
    savgol_window_length=11,
    savgol_polyorder=2,
    savgol_threshold=3.0,
    sg_deriv_order=1,
    stiffness_window=5,
    trim_edges_enabled=True,
    trim_margin=None,
    max_load_trim_enabled=True,
    use_iforest=True,
    use_cnn=True,
    use_fd=True,
    use_savgol=True,
    depth_col="Depth (nm)",
    load_col="Load (µN)",
):
    """
    Apply all (default) or selected detection methods to identify pop-ins.

    Args:
        df (DataFrame): Input indentation data.
        iforest_contamination (float): Expected contamination level for IsolationForest.
        iforest_random_state (int or None): Seed for reproducibility.
        cnn_window_size (int): Window size for CNN autoencoder.
        cnn_epochs (int): Training epochs for CNN.
        cnn_threshold_multiplier (float): Threshold multiplier for CNN anomaly detection.
        cnn_batch_size (int): Batch size for CNN autoencoder.
        cnn_validation_split (float): Validation split for CNN autoencoder.
        fd_threshold (float): Standard deviation threshold for finite difference method.
        fd_spacing (float): Spacing between samples for FFT derivative.
        savgol_window_length (int): Window size for Savitzky-Golay filter.
        savgol_polyorder (int): Polynomial order for Savitzky-Golay filter.
        savgol_threshold (float): Std deviation threshold for Savitzky-Golay.
        sg_deriv_order (int): Derivative order for Savitzky-Golay.
        stiffness_window (int): Sliding window size for stiffness computation.
        trim_edges_enabled (bool): If True, trims the first `window` elements
        trim_margin (int or None): Number of elements to trim from the start.
        max_load_trim_enabled (bool): If True, masks out any anomalies after the maximum load point. Default is True.
        use_iforest (bool): Whether to use IsolationForest method.
        use_cnn (bool): Whether to use CNN method.
        use_fd (bool): Whether to use finite difference method.
        use_savgol (bool): Whether to use Savitzky-Golay method.
        depth_col (str): Column name for depth data.
        load_col (str): Column name for load data.

    Returns:
        DataFrame: Data with individual method flags, combined flag, and metadata columns.
    """
    df_combined = df.copy()
    method_flags = []

    if use_iforest:
        df_iforest = detect_popins_iforest(
            df,
            contamination=iforest_contamination,
            random_state=iforest_random_state,
            depth_col=depth_col,
            load_col=load_col,
            window=stiffness_window,
            trim_edges_enabled=trim_edges_enabled,
            trim_margin=trim_margin,
            max_load_trim_enabled=max_load_trim_enabled,
        )
        df_combined["popin_iforest"] = df_iforest["popin_iforest"]
        method_flags.append("popin_iforest")

    if use_cnn:
        df_cnn = detect_popins_cnn(
            df,
            window_size=cnn_window_size,
            epochs=cnn_epochs,
            threshold_multiplier=cnn_threshold_multiplier,
            batch_size=cnn_batch_size,
            validation_split=cnn_validation_split,
            depth_col=depth_col,
            load_col=load_col,
            window=stiffness_window,
            trim_edges_enabled=trim_edges_enabled,
            trim_margin=trim_margin,
            max_load_trim_enabled=max_load_trim_enabled,
        )
        df_combined["popin_cnn"] = df_cnn["popin_cnn"]
        method_flags.append("popin_cnn")

    if use_fd:
        df_fd = detect_popins_fd_fourier(
            df,
            threshold=fd_threshold,
            spacing=fd_spacing,
            trim_edges_enabled=trim_edges_enabled,
            trim_margin=trim_margin,
            load_col=load_col,
            max_load_trim_enabled=max_load_trim_enabled,
        )
        df_combined["popin_fd"] = df_fd["popin_fd"]
        method_flags.append("popin_fd")

    if use_savgol:
        df_savgol = detect_popins_savgol(
            df,
            window_length=savgol_window_length,
            polyorder=savgol_polyorder,
            threshold=savgol_threshold,
            deriv=sg_deriv_order,
            load_col=load_col,
            trim_edges_enabled=trim_edges_enabled,
            trim_margin=trim_margin,
            max_load_trim_enabled=max_load_trim_enabled,
        )
        df_combined["popin_savgol"] = df_savgol["popin_savgol"]
        method_flags.append("popin_savgol")

    df_combined["popin"] = df_combined[method_flags].any(axis=1)
    df_combined["popin_methods"] = df_combined[method_flags].apply(
        lambda row: ",".join(
            [col.replace("popin_", "") for col in method_flags if row[col]]
        ),
        axis=1,
    )
    df_combined["popin_score"] = df_combined[method_flags].sum(axis=1)
    df_combined["popin_confident"] = df_combined["popin_score"] >= 2

    total_popins = df_combined["popin"].sum()
    logger.info(f"Total pop-ins detected by selected methods: {total_popins}")

    return df_combined
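Continuing the usage sketch from the top of this page, the metadata columns can be used to rank detections by cross-method agreement:

# 'results' is the DataFrame returned by default_locate in the earlier sketch.
flagged = results[results["popin"]]              # flagged by at least one method
print(flagged["popin_score"].value_counts())     # how many methods agreed per point
print(flagged["popin_methods"].value_counts())   # e.g. "iforest,fd" or "cnn"
confident = flagged[flagged["popin_confident"]]  # agreement of two or more methods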

detect_popins_cnn(df, window_size=64, epochs=10, threshold_multiplier=5.0, batch_size=32, validation_split=0.0, depth_col='Depth (nm)', load_col='Load (µN)', window=5, trim_edges_enabled=True, trim_margin=None, max_load_trim_enabled=True)

Detect pop-ins using a Convolutional Autoencoder trained on stiffness features.

This method uses an unsupervised CNN-based autoencoder to learn a compressed representation of local indentation behavior. It reconstructs short time windows of two features:

- Stiffness difference: rate of change of the slope (d²Load/dDepth²)
- Curvature: rate of change of the stiffness difference (d³Load/dDepth³)

The reconstruction error (mean squared error) is computed between input and output. High reconstruction errors indicate patterns that the model considers unusual— these are flagged as potential pop-in events.

The method uses a sliding window to extract overlapping sequences from the full curve, trains the model on all windows, and flags windows whose error exceeds a dynamic threshold.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| df | DataFrame | Input indentation data containing load and depth columns. | required |
| window_size | int | Number of time steps per CNN input window. | 64 |
| epochs | int | Number of training epochs for the autoencoder. | 10 |
| threshold_multiplier | float | Multiplier for the anomaly-detection threshold based on the standard deviation of reconstruction errors. | 5.0 |
| batch_size | int | Mini-batch size during training. | 32 |
| validation_split | float | Proportion of data used for validation (0.0 disables validation). | 0.0 |
| depth_col | str | Column name for depth measurements. | 'Depth (nm)' |
| load_col | str | Column name for load measurements. | 'Load (µN)' |
| window | int | Size of the moving window used for stiffness calculation. | 5 |
| trim_edges_enabled | bool | If True, suppresses detections in the first few points to remove edge effects (see trim_margin). | True |
| trim_margin | int or None | Number of elements to trim from the start; if None, defaults to max(10, window). | None |
| max_load_trim_enabled | bool | If True, masks out any anomalies after the maximum load point. | True |

Returns:

| Type | Description |
|------|-------------|
| DataFrame | Original DataFrame with a new boolean column "popin_cnn" (True for detected anomalies, False otherwise). When max_load_trim_enabled is True (the default), only pre-max-load anomalies are flagged, focusing on loading-phase events. |

Source code in src/merrypopins/locate.py
def detect_popins_cnn(
    df,
    window_size=64,
    epochs=10,
    threshold_multiplier=5.0,
    batch_size=32,
    validation_split=0.0,
    depth_col="Depth (nm)",
    load_col="Load (µN)",
    window=5,
    trim_edges_enabled=True,
    trim_margin=None,
    max_load_trim_enabled=True,
):
    """
    Detect pop-ins using a Convolutional Autoencoder trained on stiffness features.

    This method uses an unsupervised CNN-based autoencoder to learn a compressed
    representation of local indentation behavior. It reconstructs short time windows
    of two features:
      - Stiffness difference: rate of change of the slope (d²Load/dDepth²)
      - Curvature: second derivative of load (d³Load/dDepth³)

    The reconstruction error (mean squared error) is computed between input and output.
    High reconstruction errors indicate patterns that the model considers unusual—
    these are flagged as potential pop-in events.

    The method uses a sliding window to extract overlapping sequences from the full curve,
    trains the model on all windows, and flags windows whose error exceeds a dynamic threshold.

    Args:
        df (DataFrame): Input indentation data containing load and depth columns.
        window_size (int): Number of time steps per CNN input window.
        epochs (int): Number of training epochs for the autoencoder.
        threshold_multiplier (float): Multiplier for anomaly detection threshold based on std dev.
        batch_size (int): Mini-batch size during training.
        validation_split (float): Proportion of data used for validation (0.0 disables validation).
        depth_col (str): Column name for depth measurements.
        load_col (str): Column name for load measurements.
        window (int): Size of the moving window used for stiffness calculation.
        trim_edges_enabled (bool): If True, trims the first `window` elements
        trim_margin (int or None): Number of elements to trim from the start.
        max_load_trim_enabled (bool): If True, masks out any anomalies after the maximum load point. Default is True.

    Returns:
        DataFrame: Original DataFrame with a new boolean column:
            - "popin_cnn": True for detected anomalies, False otherwise.
                - Only pre-max-load anomalies are returned to focus on loading-phase events. If `max_load_trim_enabled` is True which is the default.
    """
    df2 = compute_features(df, depth_col, load_col, window)
    X = df2[["stiff_diff", "curvature"]].fillna(0).values
    W = np.array([X[i : i + window_size] for i in range(len(X) - window_size)])

    ae = build_cnn_autoencoder(window_size, 2)
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(
        W,
        W,
        epochs=epochs,
        batch_size=batch_size,
        validation_split=validation_split,
        verbose=0,
        callbacks=(
            [EarlyStopping(patience=3, restore_best_weights=True)]
            if validation_split > 0
            else None
        ),
    )

    W_pred = ae.predict(W, verbose=0)
    errors = np.mean((W - W_pred) ** 2, axis=(1, 2))
    threshold = errors.mean() + threshold_multiplier * errors.std()

    flags = np.zeros(len(X), dtype=bool)
    flags[window_size // 2 : -window_size // 2] = errors > threshold
    if trim_edges_enabled:
        margin = trim_margin if trim_margin is not None else max(10, window)
        flags = trim_edges(flags, margin=margin)

    if max_load_trim_enabled:
        # Mask out anything *after* max load
        max_idx = find_max_load_index(df, load_col)
        flags[max_idx + 1 :] = False

    df2["popin_cnn"] = flags
    logger.info(f"CNN flagged {df2['popin_cnn'].sum()} anomalies")
    return df2
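A stand-alone sketch of the CNN detector (training is stochastic, so the exact flags can vary between runs; 'df' stands for any DataFrame with more than window_size rows and the default column names):

from merrypopins.locate import detect_popins_cnn

df_cnn = detect_popins_cnn(df, window_size=64, epochs=5, threshold_multiplier=5.0)
print(df_cnn["popin_cnn"].sum(), "points flagged by the autoencoder")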

detect_popins_fd_fourier(df, threshold=3.0, spacing=1.0, trim_edges_enabled=True, trim_margin=None, load_col='Load (µN)', max_load_trim_enabled=True)

Detect pop-ins by estimating the derivative of Load using a Fourier spectral method.

This method computes the first derivative in the frequency domain using the Fourier Transform. The basic idea is that differentiation in the time domain corresponds to multiplying by a frequency component in the Fourier domain:

dLoad/dDepth ≈ IFFT( i * 2π * frequency * FFT(Load) )

The inverse FFT (IFFT) is then used to convert the differentiated signal back into the spatial domain. IFFT takes frequency-domain data and reconstructs the original time-domain (or spatial) signal.

Anomalies are flagged where the resulting derivative deviates from the mean by more than a given number of standard deviations.
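As a stand-alone sanity check of the spectral-derivative identity above (independent of any indentation data), the same formula recovers cos(x) as the derivative of sin(x) on a periodic grid:

import numpy as np

n, L = 256, 2 * np.pi
x = np.arange(n) * (L / n)           # periodic grid with spacing d = L/n
signal = np.sin(x)

freqs = np.fft.fftfreq(n, d=L / n)   # frequencies in cycles per unit length
deriv = np.real(np.fft.ifft(1j * 2 * np.pi * freqs * np.fft.fft(signal)))

print(np.max(np.abs(deriv - np.cos(x))))  # close to machine precision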

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| df | DataFrame | Input indentation data. | required |
| threshold | float | Standard-deviation multiplier for flagging anomalies in the derivative. | 3.0 |
| spacing | float | Spacing between data points (in nm or similar units). | 1.0 |
| trim_edges_enabled | bool | If True, suppresses detections in the first few points to remove edge effects (see trim_margin). | True |
| trim_margin | int or None | Number of elements to trim from the start; if None, defaults to 10. | None |
| load_col | str | Column name for load data. | 'Load (µN)' |
| max_load_trim_enabled | bool | If True, masks out any anomalies after the maximum load point. | True |

Returns:

| Type | Description |
|------|-------------|
| DataFrame | Original DataFrame with a new boolean column "popin_fd" (True for detected anomalies, False otherwise). When max_load_trim_enabled is True (the default), only pre-max-load anomalies are flagged, focusing on loading-phase events. |

Source code in src/merrypopins/locate.py
def detect_popins_fd_fourier(
    df,
    threshold=3.0,
    spacing=1.0,
    trim_edges_enabled=True,
    trim_margin=None,
    load_col="Load (µN)",
    max_load_trim_enabled=True,
):
    """
    Detect pop-ins by estimating the derivative of Load using a Fourier spectral method.

    This method computes the first derivative in the frequency domain using the Fourier Transform.
    The basic idea is that differentiation in the time domain corresponds to multiplying by a frequency
    component in the Fourier domain:

        dLoad/dDepth ≈ IFFT( i * 2π * frequency * FFT(Load) )

    The inverse FFT (IFFT) is then used to convert the differentiated signal back into the spatial domain.
    IFFT takes frequency-domain data and reconstructs the original time-domain (or spatial) signal.

    Anomalies are flagged where the resulting derivative deviates from the mean by more than a
    given number of standard deviations.

    Args:
        df (DataFrame): Input indentation data.
        threshold (float): Std deviation multiplier to flag anomalies in derivative.
        spacing (float): Spacing between data points (in nm or similar units).
        trim_edges_enabled (bool): If True, trims the first `window` elements
        trim_margin (int or None): Number of elements to trim from the start.
        load_col (str): Column name for load data.
        max_load_trim_enabled (bool): If True, masks out any anomalies after the maximum load point. Default is True.

    Returns:
        DataFrame: Original DataFrame with a new boolean column:
            - "popin_fd": True for detected anomalies, False otherwise.
                - Only pre-max-load anomalies are returned to focus on loading-phase events. If `max_load_trim_enabled` is True which is the default.
    """
    load = df[load_col].values
    fft_load = np.fft.fft(load)
    freqs = np.fft.fftfreq(len(load), d=spacing)
    derivative = np.real(np.fft.ifft(1j * 2 * np.pi * freqs * fft_load))
    anomalies = np.abs(derivative - np.mean(derivative)) > threshold * np.std(
        derivative
    )
    if trim_edges_enabled:
        margin = trim_margin if trim_margin is not None else 10
        anomalies = trim_edges(anomalies, margin=margin)

    if max_load_trim_enabled:
        # Mask out anything *after* max load
        max_idx = find_max_load_index(df, load_col)
        anomalies[max_idx + 1 :] = False

    df2 = df.copy()
    df2["popin_fd"] = anomalies
    logger.info(f"Fourier spectral method flagged {anomalies.sum()} anomalies")
    return df2

detect_popins_iforest(df, contamination=0.001, random_state=None, depth_col='Depth (nm)', load_col='Load (µN)', window=5, trim_edges_enabled=True, trim_margin=None, max_load_trim_enabled=True)

Detect pop-ins using Isolation Forest based on local stiffness and curvature.

This method computes two time-series features:
  • Stiffness difference: the rate of change in the slope of the load–depth curve
  • Curvature: the second derivative of the load curve (change in stiffness difference)

It then applies the Isolation Forest algorithm from scikit-learn, which isolates anomalies by recursively partitioning the feature space. Points that require fewer partitions to isolate are more likely to be outliers.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| df | DataFrame | Indentation dataset containing load and depth columns. | required |
| contamination | float | Proportion of expected anomalies in the dataset. | 0.001 |
| random_state | int or None | Random seed for reproducibility. | None |
| depth_col | str | Name of the depth column. | 'Depth (nm)' |
| load_col | str | Name of the load column. | 'Load (µN)' |
| window | int | Size of the sliding window used to compute stiffness. | 5 |
| trim_edges_enabled | bool | If True, suppresses detections in the first few points to remove edge effects (see trim_margin). | True |
| trim_margin | int or None | Number of elements to trim from the start; if None, defaults to max(10, window). | None |
| max_load_trim_enabled | bool | If True, masks out any anomalies after the maximum load point. | True |

Returns:

| Type | Description |
|------|-------------|
| DataFrame | A copy of the original DataFrame with a new boolean column "popin_iforest" (True for detected pop-ins, False otherwise). When max_load_trim_enabled is True (the default), only pre-max-load anomalies are flagged, focusing on loading-phase events. |

Source code in src/merrypopins/locate.py
def detect_popins_iforest(
    df,
    contamination=0.001,
    random_state=None,
    depth_col="Depth (nm)",
    load_col="Load (µN)",
    window=5,
    trim_edges_enabled=True,
    trim_margin=None,
    max_load_trim_enabled=True,
):
    """
    Detect pop-ins using Isolation Forest based on local stiffness and curvature.

    This method computes two time-series features:
      - Stiffness difference: the rate of change in the slope of the load–depth curve
      - Curvature: the second derivative of the load curve (change in stiffness difference)

    It then applies the Isolation Forest algorithm from scikit-learn, which isolates
    anomalies by recursively partitioning the feature space. Points that require fewer
    partitions to isolate are more likely to be outliers.

    Args:
        df (DataFrame): Indentation dataset containing load and depth columns.
        contamination (float): Proportion of expected anomalies in the dataset.
        random_state (int or None): Random seed for reproducibility.
        depth_col (str): Name of the depth column.
        load_col (str): Name of the load column.
        window (int): Size of the sliding window used to compute stiffness.
        trim_edges_enabled (bool): If True, trims the first `window` elements
        trim_margin (int or None): Number of elements to trim from the start.
        max_load_trim_enabled (bool): If True, masks out any anomalies after the maximum load point. Default is True.

    Returns:
        DataFrame: A copy of the original DataFrame with a new boolean column:
            - "popin_iforest": True for detected pop-ins (anomalies), False otherwise.
                - Only pre-max-load anomalies are returned to focus on loading-phase events. If `max_load_trim_enabled` is True which is the default.
    """
    df2 = compute_features(df, depth_col, load_col, window)
    iso = IsolationForest(contamination=contamination, random_state=random_state)
    features = df2[["stiff_diff", "curvature"]].fillna(0)
    preds = iso.fit_predict(features) == -1
    if trim_edges_enabled:
        margin = trim_margin if trim_margin is not None else max(10, window)
        preds = trim_edges(preds, margin=margin)

    if max_load_trim_enabled:
        # Mask out anything *after* max load
        max_idx = find_max_load_index(df, load_col)
        preds[max_idx + 1 :] = False

    df2["popin_iforest"] = preds
    logger.info(f"IsolationForest flagged {df2['popin_iforest'].sum()} anomalies")
    return df2
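A stand-alone sketch with a fixed seed for reproducibility ('df' stands for any DataFrame with the default column names; the contamination value is illustrative):

from merrypopins.locate import detect_popins_iforest

df_if = detect_popins_iforest(df, contamination=0.005, random_state=0)
print(df_if.loc[df_if["popin_iforest"], ["Depth (nm)", "Load (µN)"]])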

detect_popins_savgol(df, window_length=11, polyorder=2, threshold=3.0, deriv=1, load_col='Load (µN)', trim_edges_enabled=True, trim_margin=None, max_load_trim_enabled=True)

Detect pop-ins using Savitzky-Golay filtered derivatives.

This method smooths the load data using a polynomial filter and computes its derivative. Anomalies are flagged where the derivative differs significantly from its mean value.

The steps are:
  1. Apply Savitzky-Golay filter to compute the derivative (e.g., velocity or acceleration)
  2. Flag points where |derivative - mean| > threshold * std deviation

The Savitzky-Golay filter works by fitting successive subsets of adjacent data points with a low-degree polynomial using linear least squares.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| df | DataFrame | Input indentation data. | required |
| window_length | int | Length of the filter window (must be odd). | 11 |
| polyorder | int | Order of the polynomial used for smoothing. | 2 |
| threshold | float | Threshold in standard deviations for detecting anomalies. | 3.0 |
| deriv | int | Order of derivative to compute (1 = first derivative). | 1 |
| load_col | str | Column name for load data. | 'Load (µN)' |
| trim_edges_enabled | bool | If True, suppresses detections in the first few points to remove edge effects (see trim_margin). | True |
| trim_margin | int or None | Number of elements to trim from the start; if None, defaults to max(10, window_length). | None |
| max_load_trim_enabled | bool | If True, masks out any anomalies after the maximum load point. | True |

Returns:

| Type | Description |
|------|-------------|
| DataFrame | Original DataFrame with a new boolean column "popin_savgol" (True for detected anomalies, False otherwise). When max_load_trim_enabled is True (the default), only pre-max-load anomalies are flagged, focusing on loading-phase events. |
Source code in src/merrypopins/locate.py
def detect_popins_savgol(
    df,
    window_length=11,
    polyorder=2,
    threshold=3.0,
    deriv=1,
    load_col="Load (µN)",
    trim_edges_enabled=True,
    trim_margin=None,
    max_load_trim_enabled=True,
):
    """
    Detect pop-ins using Savitzky-Golay filtered derivatives.

    This method smooths the load data using a polynomial filter and computes its derivative.
    Anomalies are flagged where the derivative differs significantly from its mean value.

    The steps are:
      1. Apply Savitzky-Golay filter to compute the derivative (e.g., velocity or acceleration)
      2. Flag points where |derivative - mean| > threshold * std deviation

    The Savitzky-Golay filter works by fitting successive subsets of adjacent data points
    with a low-degree polynomial using linear least squares.

    Args:
        window_length (int): Length of the filter window (must be odd).
        polyorder (int): Order of polynomial for smoothing.
        threshold (float): Threshold in standard deviations for detecting anomalies.
        deriv (int): Order of derivative to compute (default is 1 for first derivative).
        load_col (str): Column name for load data.
        trim_edges_enabled (bool): If True, trims the first `window` elements
        trim_margin (int or None): Number of elements to trim from the start.
        max_load_trim_enabled (bool): If True, masks out any anomalies after the maximum load point. Default is True.

    Returns:
        - DataFrame: Original dataframe with a new boolean column:
            - "popin_savgol": True for detected anomalies, False otherwise.
                - Only pre-max-load anomalies are returned to focus on loading-phase events. If `max_load_trim_enabled` is True which is the default.
    """
    derivative = savgol_filter(df[load_col], window_length, polyorder, deriv=deriv)
    anomalies = np.abs(derivative - np.mean(derivative)) > threshold * np.std(
        derivative
    )
    if trim_edges_enabled:
        margin = trim_margin if trim_margin is not None else max(10, window_length)
        anomalies = trim_edges(anomalies, margin=margin)

    if max_load_trim_enabled:
        # Mask out anything *after* max load
        max_idx = find_max_load_index(df, load_col)
        anomalies[max_idx + 1 :] = False

    df2 = df.copy()
    df2["popin_savgol"] = anomalies
    logger.info(f"Savitzky-Golay flagged {anomalies.sum()} anomalies")
    return df2
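A stand-alone sketch (window_length must be odd and greater than polyorder; 'df' stands for any DataFrame with the default load column):

from merrypopins.locate import detect_popins_savgol

df_sg = detect_popins_savgol(df, window_length=11, polyorder=2, threshold=3.0)
print(df_sg["popin_savgol"].sum(), "points flagged by the Savitzky-Golay derivative")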

find_max_load_index(df, load_col='Load (µN)')

Find the index of the maximum load point in the indentation curve.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| df | DataFrame | Input indentation data. | required |
| load_col | str | Column name for the load data. | 'Load (µN)' |

Returns:

| Type | Description |
|------|-------------|
| int | Index of the maximum load value. |

Source code in src/merrypopins/locate.py
def find_max_load_index(df, load_col="Load (µN)"):
    """
    Find the index of the maximum load point in the indentation curve.

    Args:
        df (DataFrame): Input indentation data.
        load_col (str): Column name for the load data.

    Returns:
        int: Index of the maximum load value.
    """
    return df[load_col].idxmax()

trim_edges(series, margin)

Trim the first margin elements of a pandas Series. This is useful for removing edge effects in time-series data, where the first few points may not be reliable.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| series | pd.Series | Input time-series data. | required |
| margin | int | Number of elements to trim from the start. | required |

Returns:

| Type | Description |
|------|-------------|
| pd.Series | A copy of the input series with the first margin elements set to False. |

Source code in src/merrypopins/locate.py
def trim_edges(series, margin):
    """
    Trim the first `margin` elements of a pandas Series.
    This is useful for removing edge effects in time-series data where
    the first few points may not be reliable.
    Args:
        series (pd.Series): Input time-series data.
        margin (int): Number of elements to trim from the start.
    Returns:
        pd.Series: A copy of the input series with the first `margin` elements set to False.
    """
    trimmed = series.copy()
    trimmed[:margin] = False
    return trimmed