statistics
statistics.py
Extracts statistics from nanoindentation data:
- Postprocess located pop-ins
- Extract pop-in intervals
- Stress–strain transformation following Kalidindi & Pathak (2008)
- Calculate pop-in statistics (load-depth and stress-strain)
- Calculate curve-level summary statistics (load-depth)
calculate_curve_summary(df, start_col='start_idx', end_col='end_idx', time_col='Time (s)')
Compute curve-level summary statistics about pop-in activity.
This function calculates the number of pop-ins, total pop-in duration, first and last pop-in times, and the average time between consecutive pop-ins.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | DataFrame that includes pop-in intervals. | required |
start_col, end_col | str | Column names for start and end indices of pop-ins. | 'start_idx', 'end_idx' |
time_col | str | Column name for time. | 'Time (s)' |
Returns:
Type | Description |
---|---|
pd.Series | Summary metrics: count, total duration, first/last timing, average interval. |
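The summary metrics listed above can be illustrated with a minimal sketch. It is not the library's implementation: it assumes that rows carrying a pop-in hold positional `start_idx`/`end_idx` values (NaN elsewhere), and the function name and output keys (`n_popins`, `total_duration`, and so on) are hypothetical.

```python
import numpy as np
import pandas as pd


def curve_summary_sketch(df, start_col="start_idx", end_col="end_idx",
                         time_col="Time (s)"):
    """Illustrative curve-level summary; output keys are hypothetical."""
    # Assumption: only rows that mark a pop-in have non-NaN interval indices.
    popins = df.dropna(subset=[start_col, end_col])
    if popins.empty:
        return pd.Series({"n_popins": 0, "total_duration": 0.0,
                          "first_time": np.nan, "last_time": np.nan,
                          "avg_interval": np.nan})

    t = df[time_col].to_numpy(dtype=float)
    # Assumption: indices are positional (default RangeIndex).
    start_t = t[popins[start_col].to_numpy(dtype=int)]
    end_t = t[popins[end_col].to_numpy(dtype=int)]
    gaps = np.diff(np.sort(start_t))  # time between consecutive pop-in starts

    return pd.Series({
        "n_popins": len(popins),
        "total_duration": float((end_t - start_t).sum()),
        "first_time": float(start_t.min()),
        "last_time": float(start_t.max()),
        "avg_interval": float(gaps.mean()) if gaps.size else np.nan,
    })
```

With a single pop-in the average interval is undefined, so the sketch returns NaN for it rather than zero.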
Source code in src/merrypopins/statistics.py
calculate_popin_statistics(df, precursor_stats=True, temporal_stats=True, popin_shape_stats=True, time_col='Time (s)', load_col='Load (µN)', depth_col='Depth (nm)', start_col='start_idx', end_col='end_idx', before_window=0.5, after_window=0.5)
Compute descriptive statistics for each detected pop-in.
This function calculates time-based, precursor-based, and shape-based features for each interval where a pop-in occurred (based on start and end index).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Input DataFrame with indentation data and interval metadata. | required |
precursor_stats | bool | Whether to calculate average dLoad and slope before the pop-in. | True |
temporal_stats | bool | Whether to calculate duration and inter-event timing features. | True |
popin_shape_stats | bool | Whether to compute shape-based features like velocity and curvature. | True |
time_col, load_col, depth_col | str | Column names for time, load, and depth. | 'Time (s)', 'Load (µN)', 'Depth (nm)' |
start_col, end_col | str | Column names for the start and end index of pop-in intervals. | 'start_idx', 'end_idx' |
before_window, after_window | float | Time window in seconds to use for context before/after the pop-in. | 0.5, 0.5 |
Returns:
Type | Description |
---|---|
pd.DataFrame | Original DataFrame with per-pop-in statistics added (NaNs elsewhere). |
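To make the three feature groups concrete, the sketch below computes a few of them for a single interval. It is an illustration under assumptions, not the library's code: the function name, the returned keys, and the exact definitions (for example, velocity as depth jump divided by duration, and precursor slope as a linear fit of load against time) are hypothetical choices.

```python
import numpy as np
import pandas as pd


def popin_features_sketch(df, start, end, time_col="Time (s)",
                          load_col="Load (µN)", depth_col="Depth (nm)",
                          before_window=0.5):
    """Illustrative features for one pop-in spanning positions start..end."""
    t = df[time_col].to_numpy(dtype=float)
    load = df[load_col].to_numpy(dtype=float)
    depth = df[depth_col].to_numpy(dtype=float)

    # Temporal and shape features measured across the interval itself.
    duration = t[end] - t[start]
    depth_jump = depth[end] - depth[start]
    velocity = depth_jump / duration if duration > 0 else np.nan

    # Precursor features from samples in the window just before the start.
    pre = (t >= t[start] - before_window) & (t < t[start])
    if pre.sum() >= 2:
        avg_dload = float(np.diff(load[pre]).mean())
        slope = float(np.polyfit(t[pre], load[pre], 1)[0])  # load rate
    else:
        avg_dload = slope = np.nan

    return {"duration": float(duration), "depth_jump": float(depth_jump),
            "velocity": float(velocity), "avg_dload_before": avg_dload,
            "slope_before": slope}
```

Calling this once per interval and writing the results back to the row that marks each pop-in would give the "NaNs elsewhere" layout described in the Returns table.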
calculate_stress_strain(df, depth_col='Depth (nm)', load_col='Load (µN)', Reff_um=5.323, min_load_uN=2000, smooth_stress=True, smooth_window=11, smooth_polyorder=2, copy_popin_cols=True)
Convert load–depth data to stress–strain using Kalidindi & Pathak (2008) formulas.
This function converts load–depth indentation data to indentation stress and strain using the approach of Kalidindi & Pathak (2008). It optionally copies pop-in markers from the input DataFrame and drops points below a minimum load. The stress signal can additionally be smoothed with a Savitzky-Golay filter. With the current setup, the stress–strain data are accurate up to the yield point and become increasingly inaccurate beyond it. This is to be expanded upon in a future version.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | DataFrame containing the indentation data. | required |
depth_col | str | Column name for the depth data. | 'Depth (nm)' |
load_col | str | Column name for the load data. | 'Load (µN)' |
Reff_um | float | Effective tip radius in microns. | 5.323 |
min_load_uN | float | Minimum load threshold to filter out low-load points (in µN). | 2000 |
smooth_stress | bool | Whether to apply smoothing to the stress signal. | True |
smooth_window | int | Window size for the Savitzky-Golay filter. | 11 |
smooth_polyorder | int | Polynomial order for the Savitzky-Golay filter. | 2 |
copy_popin_cols | bool | Whether to copy pop-in markers from the input DataFrame. | True |
Returns:
Type | Description |
---|---|
pd.DataFrame | DataFrame with additional columns for stress, strain, and optionally pop-in markers. |
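As a rough illustration of the conversion, the sketch below applies the indentation stress and strain definitions from Kalidindi & Pathak (2008), σ = P / (π a²) and ε = (4 / 3π) · h / a, with a Hertzian contact radius a = √(R_eff · h). Several points are assumptions rather than confirmed library behaviour: the measured depth is used directly as the elastic depth (no zero-point correction), stress is reported in GPa, smoothing is omitted, and the function name is hypothetical. The Hertzian contact radius only holds for elastic contact, which is consistent with the accuracy limit at the yield point noted above.

```python
import numpy as np
import pandas as pd


def hertz_stress_strain_sketch(df, depth_col="Depth (nm)",
                               load_col="Load (µN)", Reff_um=5.323,
                               min_load_uN=2000):
    """Illustrative load-depth to stress-strain conversion (elastic regime)."""
    # Drop low-load points, where contact-radius estimates are least reliable.
    out = df[df[load_col] >= min_load_uN].copy()
    h_nm = out[depth_col].to_numpy(dtype=float)
    P_uN = out[load_col].to_numpy(dtype=float)

    # Hertzian contact radius in nm (assumption: depth is purely elastic).
    a_nm = np.sqrt(Reff_um * 1e3 * h_nm)

    # 1 µN / nm^2 = 1e12 Pa = 1e3 GPa, hence the factor 1e3 for GPa.
    out["stress"] = P_uN / (np.pi * a_nm**2) * 1e3
    out["strain"] = 4.0 / (3.0 * np.pi) * h_nm / a_nm
    return out
```

Smoothing could be layered on top, for example with `scipy.signal.savgol_filter(out["stress"], 11, 2)`, matching the `smooth_window` and `smooth_polyorder` defaults listed in the table.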
calculate_stress_strain_statistics(df, start_col='start_idx', end_col='end_idx', time_col='Time (s)', stress_col='stress', strain_col='strain', before_window=0.5, precursor_stats=True, temporal_stats=True, shape_stats=True)
Compute statistics for each pop-in in stress–strain space.
This function computes various statistics related to stress and strain for each detected pop-in event. It calculates features such as the jump in stress and strain, slope of the stress-strain curve, and temporal statistics.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Data with stress/strain and pop-in intervals. | required |
start_col, end_col | str | Columns marking start and end indices of pop-ins. | 'start_idx', 'end_idx' |
time_col | str | Time column. | 'Time (s)' |
stress_col, strain_col | str | Stress and strain columns. | 'stress', 'strain' |
before_window | float | Time window to use for precursor features. | 0.5 |
precursor_stats | bool | Whether to compute precursor statistics (e.g., slope). | True |
temporal_stats | bool | Whether to compute temporal statistics (e.g., pop-in duration). | True |
shape_stats | bool | Whether to compute shape-based statistics (e.g., velocity, curvature). | True |
Returns:
Type | Description |
---|---|
pd.DataFrame | DataFrame with per-pop-in stress/strain statistics added. |
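The stress and strain jumps mentioned above can be sketched as differences between the end and start of each interval. This is only an illustration: the function name and output column names are hypothetical, the indices are assumed to be positional, and the library may define the jump and slope differently.

```python
import numpy as np
import pandas as pd


def stress_strain_jumps_sketch(df, start_col="start_idx", end_col="end_idx",
                               stress_col="stress", strain_col="strain"):
    """Illustrative per-pop-in jumps in stress-strain space."""
    stress = df[stress_col].to_numpy(dtype=float)
    strain = df[strain_col].to_numpy(dtype=float)

    rows = []
    # Assumption: only rows that mark a pop-in have non-NaN interval indices.
    for _, row in df.dropna(subset=[start_col, end_col]).iterrows():
        s, e = int(row[start_col]), int(row[end_col])
        d_stress = stress[e] - stress[s]
        d_strain = strain[e] - strain[s]
        rows.append({
            "start_idx": s,
            "end_idx": e,
            "stress_jump": d_stress,
            "strain_jump": d_strain,
            # Secant slope across the pop-in; undefined for a zero strain jump.
            "jump_slope": d_stress / d_strain if d_strain != 0 else np.nan,
        })
    return pd.DataFrame(rows)
```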
default_statistics(df_locate, popin_flag_column='popin', before_window=0.5, after_window=0.5)
Pipeline to compute pop-in statistics from raw located popins.
This function extracts relevant columns, selects valid pop-in candidates based on local maxima, extracts intervals for each pop-in event, and calculates descriptive statistics for each interval.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df_locate | DataFrame | Input data containing pop-in candidate flags and indentation curve. | required |
popin_flag_column | str | Column name indicating Boolean pop-in candidate (True/False). | 'popin' |
before_window | float | Time window (in seconds) to use for features before the pop-in event. | 0.5 |
after_window | float | Time window (in seconds) to use for features after the pop-in event. | 0.5 |
Returns:
Type | Description |
---|---|
pd.DataFrame | DataFrame with annotated pop-in intervals and computed statistics (e.g., time, shape, precursor). |
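The pipeline described above can plausibly be reproduced by chaining the individual functions documented on this page. The sketch below is an assumption about how the steps fit together, not the actual body of `default_statistics`: it assumes that `postprocess_popins_local_max` writes a `'popin_selected'` column (the default `popin_col` of `extract_popin_intervals`), that the interval columns are named `'start_idx'` and `'end_idx'`, and it omits the initial column-selection step. The wrapper name is hypothetical.

```python
def default_statistics_by_hand(df_locate, popin_flag_column="popin",
                               before_window=0.5, after_window=0.5):
    """Hypothetical manual equivalent of the default pipeline."""
    # Imported lazily so the sketch can be defined without the package.
    from merrypopins.statistics import (
        calculate_popin_statistics,
        extract_popin_intervals,
        postprocess_popins_local_max,
    )

    # 1. Keep only candidates that sit on a local load maximum.
    df = postprocess_popins_local_max(
        df_locate, popin_flag_column=popin_flag_column
    )
    # 2. Attach start/end indices to each selected pop-in.
    df = extract_popin_intervals(df)
    # 3. Compute per-pop-in statistics over those intervals.
    return calculate_popin_statistics(
        df, before_window=before_window, after_window=after_window
    )
```

Running the steps separately like this is mainly useful when an intermediate result needs inspecting, for example to check which candidates survive the local-maximum filter.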
default_statistics_stress_strain(df_locate, popin_flag_column='popin', before_window=0.5, after_window=0.5, Reff_um=5.323, min_load_uN=2000, smooth_stress=True, stress_col='stress', strain_col='strain', time_col='Time (s)')
Full pipeline: from raw data to stress–strain statistics.
This includes:
- Load–depth pop-in detection
- Interval extraction
- Stress–strain transformation
- Stress–strain statistics
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df_locate | DataFrame | Raw indentation data with pop-in flag column. | required |
popin_flag_column | str | Column with Boolean flags for pop-in candidates. | 'popin' |
before_window | float | Time window (in seconds) for computing precursor features. | 0.5 |
after_window | float | Time window (in seconds) for computing shape-based features. | 0.5 |
Reff_um | float | Effective tip radius in microns. | 5.323 |
min_load_uN | float | Minimum load threshold for stress–strain conversion. | 2000 |
smooth_stress | bool | Whether to smooth the stress signal. | True |
stress_col | str | Column name for stress data. | 'stress' |
strain_col | str | Column name for strain data. | 'strain' |
time_col | str | Column name for time data. | 'Time (s)' |
Returns:
Type | Description |
---|---|
pd.DataFrame | DataFrame with stress-strain statistics and pop-in intervals. |
extract_popin_intervals(df, popin_col='popin_selected', load_col='Load (µN)')
Extract start and end indices for each pop-in event.
For each detected pop-in, this function identifies the start and end points based on the load curve. The start of a pop-in is where the load first increases, and the end is when the load returns to baseline.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | DataFrame with pop-in flags. | required |
popin_col | str | The column with Boolean values indicating pop-in events. | 'popin_selected' |
load_col | str | The load column used to identify the recovery point. | 'Load (µN)' |
Returns:
Type | Description |
---|---|
pd.DataFrame | DataFrame with added start and end index columns for each pop-in interval. |
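One possible reading of the start and end rule is sketched below: the flagged sample is taken as the start, and the end is the first later sample whose load has recovered to at least the load at the start. This interpretation, the fallback to the last sample when the load never recovers, and the function name are all assumptions for illustration; the library's rule may differ.

```python
import numpy as np
import pandas as pd


def extract_intervals_sketch(df, popin_col="popin_selected",
                             load_col="Load (µN)"):
    """Illustrative interval extraction based on load recovery."""
    load = df[load_col].to_numpy(dtype=float)
    flags = df[popin_col].to_numpy(dtype=bool)

    start_idx = np.full(len(df), np.nan)
    end_idx = np.full(len(df), np.nan)
    for i in np.flatnonzero(flags):
        # First later position where the load is back at the starting level.
        recovered = np.flatnonzero(load[i + 1:] >= load[i])
        end = i + 1 + recovered[0] if recovered.size else len(load) - 1
        start_idx[i] = i
        end_idx[i] = end

    out = df.copy()
    out["start_idx"] = start_idx  # NaN on rows without a pop-in
    out["end_idx"] = end_idx
    return out
```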
postprocess_popins_local_max(df, popin_flag_column='popin', window=1)
Select pop-ins that coincide with a local load maximum.
This function filters out pop-in events that do not represent local maxima in the load curve. A local maximum is defined as a point where the load is higher than the adjacent points within a sliding window.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Input indentation data with a pop-in flag column. | required |
popin_flag_column | str | The column that marks pop-in candidates (True/False). | 'popin' |
window | int | The local window size to assess if the current load is a maximum. | 1 |
Returns:
Type | Description |
---|---|
pd.DataFrame | The original DataFrame with a new column indicating selected pop-ins. |
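The local-maximum test can be sketched as a strict comparison against the neighbours inside the window. The sketch is illustrative only: the `load_col` argument, the output column name `'popin_selected'`, the strict inequality, and the function name are assumptions that are not confirmed by the signature above.

```python
import numpy as np
import pandas as pd


def select_local_max_sketch(df, popin_flag_column="popin",
                            load_col="Load (µN)", window=1):
    """Illustrative filter keeping candidates at a local load maximum."""
    load = df[load_col].to_numpy(dtype=float)
    flags = df[popin_flag_column].to_numpy(dtype=bool)

    selected = np.zeros(len(df), dtype=bool)
    for i in np.flatnonzero(flags):
        lo, hi = max(0, i - window), min(len(load), i + window + 1)
        # Neighbours within the window, excluding the candidate itself.
        neighbours = np.delete(load[lo:hi], i - lo)
        selected[i] = neighbours.size > 0 and bool(np.all(load[i] > neighbours))

    out = df.copy()
    out["popin_selected"] = selected
    return out
```

A larger `window` makes the filter stricter, since the candidate must then exceed more of its neighbours.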