locate
locate.py
Detects pop-ins (sudden displacement jumps) in nano-indentation curves using multiple methods:
- IsolationForest anomaly detection on stiffness and curvature features
- CNN-based autoencoder reconstruction error
- Finite difference method using Fourier spectral analysis
- Savitzky-Golay derivative method
To ensure relevance, all detection methods operate only on the indentation curve up to the maximum load point. This is because pop-in events occur during the loading phase of indentation. After reaching peak load, material unloading or post-penetration artifacts may dominate, which are irrelevant for pop-in analysis.
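The restriction to the loading phase can be sketched as follows. This is only an illustration under stated assumptions: `mask_after_max_load` is a hypothetical helper (not a merrypopins function), and it assumes the anomaly flags and the load values are aligned arrays of equal length.

```python
import numpy as np

def mask_after_max_load(flags, load):
    """Return a copy of `flags` with every entry after the peak-load index set to False.

    Hypothetical helper for illustration; not part of merrypopins.
    """
    flags = np.asarray(flags, dtype=bool).copy()
    peak = int(np.argmax(load))   # first index at which the load is maximal
    flags[peak + 1:] = False      # discard anything flagged during unloading
    return flags

load = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0])
flags = np.array([False, True, False, False, True, True])
masked = mask_after_max_load(flags, load)
```

In this toy curve the peak load sits at index 3, so the two flags raised during unloading are cleared while the flag at index 1 survives.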
Provides:
- compute_stiffness
- compute_features
- detect_popins_iforest
- detect_popins_cnn
- detect_popins_fd_fourier
- detect_popins_savgol
- default_locate (combines all methods)
build_cnn_autoencoder(window_size, n_features)
Build a 1D Convolutional Autoencoder for time-series anomaly detection.
This model learns to reconstruct input sequences composed of features such as stiffness difference and curvature. During inference, the reconstruction error is used to detect anomalies: samples with high error are likely pop-ins.
Architecture overview
- Encoder: Conv1D -> MaxPooling -> Conv1D -> MaxPooling -> Conv1D
- Decoder: UpSampling -> Conv1D -> UpSampling -> Conv1D
The model operates on fixed-size input windows and uses symmetric encoding and decoding layers. The final layer has linear activation to match the original feature values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
window_size | int | Number of time steps per sequence. | required |
n_features | int | Number of input features per time step. | required |
Returns:
Type | Description |
---|---|
keras.Model | Keras autoencoder model (uncompiled). |
Source code in src/merrypopins/locate.py
compute_features(df, depth_col='Depth (nm)', load_col='Load (µN)', window=5, return_derivatives=True)
Compute derived indentation features for anomaly detection.
This function calculates three features:
- Stiffness: local slope of load vs. depth (ΔLoad/ΔDepth)
- Stiffness difference: the rate of change in stiffness (first derivative)
- Curvature: the rate of change in stiffness difference (second derivative)
These features help detect sudden shifts in indentation behavior, often indicative of pop-in events.
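The chain of derivatives described above can be sketched with NumPy. This is a simplified illustration, not the library's implementation: `sketch_features` is a hypothetical name, and it uses `np.gradient` with respect to depth for every step, whereas `compute_features` obtains stiffness from a sliding-window regression and may difference the result in another way.

```python
import numpy as np

def sketch_features(depth, load):
    """Illustrative only: successive derivatives of load with respect to depth."""
    stiffness = np.gradient(load, depth)         # dLoad/dDepth
    stiff_diff = np.gradient(stiffness, depth)   # rate of change of stiffness
    curvature = np.gradient(stiff_diff, depth)   # rate of change of stiffness difference
    return stiffness, stiff_diff, curvature

# Synthetic quadratic loading curve: load = depth**2, so stiffness = 2 * depth.
depth = np.linspace(1.0, 10.0, 50)
load = depth ** 2
stiffness, stiff_diff, curvature = sketch_features(depth, load)
```

On a smooth curve like this one the higher-order features stay nearly constant; a pop-in would show up as a short, sharp excursion in `stiff_diff` and `curvature`.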
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Input indentation data. | required |
depth_col | str | Column name for depth. | 'Depth (nm)' |
load_col | str | Column name for load. | 'Load (µN)' |
window | int | Sliding window size for stiffness calculation. | 5 |
return_derivatives | bool | If True (default), return DataFrame with added features. If False, return original DataFrame without added columns. | True |
Returns:
Type | Description |
---|---|
DataFrame | Enhanced DataFrame with 'stiffness', 'stiff_diff', and 'curvature' columns. |
Source code in src/merrypopins/locate.py
compute_stiffness(df, depth_col='Depth (nm)', load_col='Load (µN)', window=5)
Compute local stiffness (dLoad/dDepth) using sliding-window linear regression.
In nano-indentation, 'stiffness' is the local slope of the load–depth curve and reflects how resistant the material is to deformation. It is computed as:
stiffness = change in Load / change in Depth
= ΔLoad / ΔDepth
This is estimated using linear regression over a moving window centered on each point.
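A minimal sketch of a sliding-window slope estimate is shown below. It assumes an odd `window`, fits a straight line with `np.polyfit` in each window, and leaves the edge points (where a centered window does not fit) as NaN. `sliding_slope` is a hypothetical name, and the edge handling in `compute_stiffness` itself may differ.

```python
import numpy as np

def sliding_slope(depth, load, window=5):
    """Slope of a least-squares line fitted in a centered window (illustrative)."""
    depth = np.asarray(depth, dtype=float)
    load = np.asarray(load, dtype=float)
    half = window // 2                       # points on each side of the center
    slopes = np.full(depth.shape, np.nan)    # edges stay NaN in this sketch
    for i in range(half, len(depth) - half):
        x = depth[i - half:i + half + 1]
        y = load[i - half:i + half + 1]
        slopes[i] = np.polyfit(x, y, 1)[0]   # leading coefficient is the slope
    return slopes

# A perfectly linear curve has the same stiffness everywhere.
depth = np.arange(20, dtype=float)
load = 3.0 * depth + 1.0
slopes = sliding_slope(depth, load, window=5)
```

Fitting over several points rather than differencing neighbors trades some spatial resolution for robustness against measurement noise.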
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Input indentation data. | required |
depth_col | str | Column name for depth. | 'Depth (nm)' |
load_col | str | Column name for load. | 'Load (µN)' |
window | int | Sliding window size. | 5 |
Returns:
Type | Description |
---|---|
Series | Stiffness at each data point. |
Source code in src/merrypopins/locate.py
default_locate(df, iforest_contamination=0.001, iforest_random_state=None, cnn_window_size=64, cnn_epochs=10, cnn_threshold_multiplier=5.0, cnn_batch_size=32, cnn_validation_split=0.0, fd_threshold=3.0, fd_spacing=1.0, savgol_window_length=11, savgol_polyorder=2, savgol_threshold=3.0, sg_deriv_order=1, stiffness_window=5, trim_edges_enabled=True, trim_margin=None, max_load_trim_enabled=True, use_iforest=True, use_cnn=True, use_fd=True, use_savgol=True, depth_col='Depth (nm)', load_col='Load (µN)')
Apply all (default) or selected detection methods to identify pop-ins.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Input indentation data. | required |
iforest_contamination | float | Expected contamination level for IsolationForest. | 0.001 |
iforest_random_state | int or None | Seed for reproducibility. | None |
cnn_window_size | int | Window size for CNN autoencoder. | 64 |
cnn_epochs | int | Training epochs for CNN. | 10 |
cnn_threshold_multiplier | float | Threshold multiplier for CNN anomaly detection. | 5.0 |
cnn_batch_size | int | Batch size for CNN autoencoder. | 32 |
cnn_validation_split | float | Validation split for CNN autoencoder. | 0.0 |
fd_threshold | float | Standard deviation threshold for finite difference method. | 3.0 |
fd_spacing | float | Spacing between samples for FFT derivative. | 1.0 |
savgol_window_length | int | Window size for Savitzky-Golay filter. | 11 |
savgol_polyorder | int | Polynomial order for Savitzky-Golay filter. | 2 |
savgol_threshold | float | Std deviation threshold for Savitzky-Golay. | 3.0 |
sg_deriv_order | int | Derivative order for Savitzky-Golay. | 1 |
stiffness_window | int | Sliding window size for stiffness computation. | 5 |
trim_edges_enabled | bool | If True, trims the first | True |
trim_margin | int or None | Number of elements to trim from the start. | None |
max_load_trim_enabled | bool | If True, masks out any anomalies after the maximum load point. Default is True. | True |
use_iforest | bool | Whether to use IsolationForest method. | True |
use_cnn | bool | Whether to use CNN method. | True |
use_fd | bool | Whether to use finite difference method. | True |
use_savgol | bool | Whether to use Savitzky-Golay method. | True |
depth_col | str | Column name for depth data. | 'Depth (nm)' |
load_col | str | Column name for load data. | 'Load (µN)' |
Returns:
Type | Description |
---|---|
DataFrame | Data with individual method flags, combined flag, and metadata columns. |
Source code in src/merrypopins/locate.py
detect_popins_cnn(df, window_size=64, epochs=10, threshold_multiplier=5.0, batch_size=32, validation_split=0.0, depth_col='Depth (nm)', load_col='Load (µN)', window=5, trim_edges_enabled=True, trim_margin=None, max_load_trim_enabled=True)
Detect pop-ins using a Convolutional Autoencoder trained on stiffness features.
This method uses an unsupervised CNN-based autoencoder to learn a compressed representation of local indentation behavior. It reconstructs short time windows of two features:
- Stiffness difference: rate of change of the slope (d²Load/dDepth²)
- Curvature: rate of change of stiffness difference, i.e. the second derivative of stiffness (d³Load/dDepth³)
The reconstruction error (mean squared error) is computed between input and output. High reconstruction errors indicate patterns that the model considers unusual; these are flagged as potential pop-in events.
The method uses a sliding window to extract overlapping sequences from the full curve, trains the model on all windows, and flags windows whose error exceeds a dynamic threshold.
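The windowing and thresholding steps can be sketched without the neural network itself. In the sketch below, `flag_windows` is a hypothetical helper, `reconstruct` stands in for the trained autoencoder's prediction, and the threshold is assumed to be the mean error plus `threshold_multiplier` standard deviations; the source only states that the threshold is dynamic and based on the standard deviation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def flag_windows(features, reconstruct, window_size=64, threshold_multiplier=5.0):
    """Flag windows whose reconstruction error is unusually high (illustrative).

    features: array of shape (n_samples, n_features)
    reconstruct: callable mapping windows to reconstructions of the same shape
    """
    # Overlapping windows, initially shaped (n_windows, n_features, window_size).
    windows = sliding_window_view(features, window_size, axis=0)
    windows = np.swapaxes(windows, 1, 2)     # -> (n_windows, window_size, n_features)
    recon = reconstruct(windows)
    errors = np.mean((windows - recon) ** 2, axis=(1, 2))   # MSE per window
    # Assumed threshold rule: mean + multiplier * standard deviation.
    threshold = errors.mean() + threshold_multiplier * errors.std()
    return errors, errors > threshold

# Toy data: flat features with one spike; the stand-in "model" predicts zeros.
features = np.zeros((300, 2))
features[100, 0] = 10.0
errors, flags = flag_windows(features, np.zeros_like, window_size=8)
```

Every window that overlaps the spike gets the same elevated error, so a single event is reported by several consecutive windows; mapping flagged windows back to individual samples is a separate step not shown here.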
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Input indentation data containing load and depth columns. | required |
window_size | int | Number of time steps per CNN input window. | 64 |
epochs | int | Number of training epochs for the autoencoder. | 10 |
threshold_multiplier | float | Multiplier for anomaly detection threshold based on std dev. | 5.0 |
batch_size | int | Mini-batch size during training. | 32 |
validation_split | float | Proportion of data used for validation (0.0 disables validation). | 0.0 |
depth_col | str | Column name for depth measurements. | 'Depth (nm)' |
load_col | str | Column name for load measurements. | 'Load (µN)' |
window | int | Size of the moving window used for stiffness calculation. | 5 |
trim_edges_enabled | bool | If True, trims the first | True |
trim_margin | int or None | Number of elements to trim from the start. | None |
max_load_trim_enabled | bool | If True, masks out any anomalies after the maximum load point. Default is True. | True |
Returns:
Type | Description |
---|---|
DataFrame | Original DataFrame with a new boolean column "popin_cnn": True for detected anomalies, False otherwise. Only pre-max-load anomalies are returned to focus on loading-phase events. If |
Source code in src/merrypopins/locate.py
detect_popins_fd_fourier(df, threshold=3.0, spacing=1.0, trim_edges_enabled=True, trim_margin=None, load_col='Load (µN)', max_load_trim_enabled=True)
Detect pop-ins by estimating the derivative of Load using a Fourier spectral method.
This method computes the first derivative in the frequency domain using the Fourier transform. Differentiation in the original (time or spatial) domain corresponds to multiplying each Fourier coefficient by i · 2π · frequency:
dLoad/dDepth ≈ IFFT( i * 2π * frequency * FFT(Load) )
The inverse FFT (IFFT) is then used to convert the differentiated signal back into the spatial domain. IFFT takes frequency-domain data and reconstructs the original time-domain (or spatial) signal.
Anomalies are flagged where the resulting derivative deviates from the mean by more than a given number of standard deviations.
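The spectral derivative and the standard-deviation test can be sketched with NumPy's FFT routines. The function names `fourier_derivative` and `flag_outliers` are hypothetical, and the sketch assumes uniformly spaced samples; it is meant to illustrate the formula above rather than reproduce `detect_popins_fd_fourier`.

```python
import numpy as np

def fourier_derivative(signal, spacing=1.0):
    """First derivative via IFFT(i * 2*pi * f * FFT(signal)) (illustrative)."""
    signal = np.asarray(signal, dtype=float)
    freqs = np.fft.fftfreq(signal.size, d=spacing)   # cycles per unit of spacing
    spectrum = np.fft.fft(signal)
    return np.real(np.fft.ifft(2j * np.pi * freqs * spectrum))

def flag_outliers(values, threshold=3.0):
    """True where a value deviates from the mean by more than threshold * std."""
    values = np.asarray(values, dtype=float)
    return np.abs(values - values.mean()) > threshold * values.std()

# Check on a periodic signal, where the spectral derivative is essentially exact.
x = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
deriv = fourier_derivative(np.sin(x), spacing=x[1] - x[0])

# One isolated spike stands out from an otherwise flat derivative.
spiky = np.zeros(100)
spiky[50] = 10.0
spike_flags = flag_outliers(spiky)
```

The FFT treats its input as periodic, so a monotonic load ramp, whose first and last values differ, produces large oscillations in the derivative near both ends of the record.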
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Input indentation data. | required |
threshold | float | Std deviation multiplier to flag anomalies in derivative. | 3.0 |
spacing | float | Spacing between data points (in nm or similar units). | 1.0 |
trim_edges_enabled | bool | If True, trims the first | True |
trim_margin | int or None | Number of elements to trim from the start. | None |
load_col | str | Column name for load data. | 'Load (µN)' |
max_load_trim_enabled | bool | If True, masks out any anomalies after the maximum load point. Default is True. | True |
Returns:
Type | Description |
---|---|
DataFrame | Original DataFrame with a new boolean column "popin_fd": True for detected anomalies, False otherwise. Only pre-max-load anomalies are returned to focus on loading-phase events. If |
Source code in src/merrypopins/locate.py
detect_popins_iforest(df, contamination=0.001, random_state=None, depth_col='Depth (nm)', load_col='Load (µN)', window=5, trim_edges_enabled=True, trim_margin=None, max_load_trim_enabled=True)
Detect pop-ins using Isolation Forest based on local stiffness and curvature.
This method computes two time-series features:
- Stiffness difference: the rate of change in the slope of the load–depth curve
- Curvature: the rate of change in stiffness difference (the second derivative of stiffness)
It then applies the Isolation Forest algorithm from scikit-learn, which isolates anomalies by recursively partitioning the feature space. Points that require fewer partitions to isolate are more likely to be outliers.
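A minimal sketch of the Isolation Forest step on a two-column feature matrix follows. The synthetic data stands in for the stiffness-difference and curvature columns, and the `contamination` value is chosen for this toy example rather than taken from the library default.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for (stiffness difference, curvature) features.
rng = np.random.default_rng(0)
features = rng.normal(0.0, 1.0, size=(500, 2))
features[250] = [40.0, -40.0]            # one extreme point imitating a pop-in

model = IsolationForest(contamination=0.01, random_state=0)
labels = model.fit_predict(features)     # -1 for anomalies, 1 for inliers
flags = labels == -1
```

`contamination` sets the fraction of samples that end up flagged, so a small value keeps only the most easily isolated points; fixing `random_state` makes the result reproducible.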
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Indentation dataset containing load and depth columns. | required |
contamination | float | Proportion of expected anomalies in the dataset. | 0.001 |
random_state | int or None | Random seed for reproducibility. | None |
depth_col | str | Name of the depth column. | 'Depth (nm)' |
load_col | str | Name of the load column. | 'Load (µN)' |
window | int | Size of the sliding window used to compute stiffness. | 5 |
trim_edges_enabled | bool | If True, trims the first | True |
trim_margin | int or None | Number of elements to trim from the start. | None |
max_load_trim_enabled | bool | If True, masks out any anomalies after the maximum load point. Default is True. | True |
Returns:
Type | Description |
---|---|
DataFrame | A copy of the original DataFrame with a new boolean column "popin_iforest": True for detected pop-ins (anomalies), False otherwise. Only pre-max-load anomalies are returned to focus on loading-phase events. If |
Source code in src/merrypopins/locate.py
detect_popins_savgol(df, window_length=11, polyorder=2, threshold=3.0, deriv=1, load_col='Load (µN)', trim_edges_enabled=True, trim_margin=None, max_load_trim_enabled=True)
Detect pop-ins using Savitzky-Golay filtered derivatives.
This method smooths the load data using a polynomial filter and computes its derivative. Anomalies are flagged where the derivative differs significantly from its mean value.
The steps are:
- Apply Savitzky-Golay filter to compute the derivative (e.g., velocity or acceleration)
- Flag points where |derivative - mean| > threshold * std deviation
The Savitzky-Golay filter works by fitting successive subsets of adjacent data points with a low-degree polynomial using linear least squares.
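The two steps can be sketched with `scipy.signal.savgol_filter`, which returns the smoothed derivative directly when `deriv` is non-zero. `savgol_flags` is a hypothetical name, and the synthetic load record with an artificial drop is only a stand-in for real indentation data.

```python
import numpy as np
from scipy.signal import savgol_filter

def savgol_flags(load, window_length=11, polyorder=2, threshold=3.0, deriv=1):
    """Flag points whose smoothed derivative is far from its mean (illustrative)."""
    d = savgol_filter(load, window_length=window_length,
                      polyorder=polyorder, deriv=deriv)
    return np.abs(d - d.mean()) > threshold * d.std()

# Linear ramp with an abrupt drop starting at sample 120.
load = np.linspace(0.0, 100.0, 200)
load[120:] -= 15.0
flags = savgol_flags(load)
```

Because the filter fits a polynomial over `window_length` samples, an abrupt change is smeared across neighboring points, so several adjacent samples around the event are flagged rather than a single one.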
Parameters:
Name | Type | Description | Default |
---|---|---|---|
window_length | int | Length of the filter window (must be odd). | 11 |
polyorder | int | Order of polynomial for smoothing. | 2 |
threshold | float | Threshold in standard deviations for detecting anomalies. | 3.0 |
deriv | int | Order of derivative to compute (default is 1 for first derivative). | 1 |
load_col | str | Column name for load data. | 'Load (µN)' |
trim_edges_enabled | bool | If True, trims the first | True |
trim_margin | int or None | Number of elements to trim from the start. | None |
max_load_trim_enabled | bool | If True, masks out any anomalies after the maximum load point. Default is True. | True |
Returns:
Type | Description |
---|---|
|
Source code in src/merrypopins/locate.py
find_max_load_index(df, load_col='Load (µN)')
Find the index of the maximum load point in the indentation curve.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Input indentation data. | required |
load_col | str | Column name for the load data. | 'Load (µN)' |
Returns:
Type | Description |
---|---|
int | Index of the maximum load value. |
Source code in src/merrypopins/locate.py
trim_edges(series, margin)
Trim the first margin elements of a pandas Series.
This is useful for removing edge effects in time-series data where
the first few points may not be reliable.
Args:
series (pd.Series): Input time-series data.
margin (int): Number of elements to trim from the start.
Returns:
pd.Series: A copy of the input series with the first margin elements set to False.
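The behavior described in Returns can be sketched in a few lines of pandas. `trim_edges_sketch` is a hypothetical stand-in written from the description above, not the library's source.

```python
import pandas as pd

def trim_edges_sketch(series, margin):
    """Return a copy of `series` with its first `margin` elements set to False."""
    trimmed = series.copy()          # leave the caller's Series untouched
    trimmed.iloc[:margin] = False    # positional slice, independent of index labels
    return trimmed

flags = pd.Series([True, True, False, True, False])
result = trim_edges_sketch(flags, 2)
```

The Series keeps its full length; the leading entries are masked rather than dropped, so the result stays aligned with the original DataFrame index.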