preprocess
preprocess.py
Provides pre-processing functions for indentation datasets.
Functions:
Name | Description |
---|---|
remove_pre_min_load | Remove all points up to the minimum Load point. |
rescale_data | Automatically detect contact point and rescale Depth. |
finalise_contact_index | Optionally trim and/or flag the contact point. |
default_preprocess | Recommended preprocessing pipeline. |
Usage
from merrypopins.preprocess import ( remove_pre_min_load, rescale_data, finalise_contact_index, default_preprocess )
default_preprocess(df)
Default preprocessing pipeline using recommended settings.
Steps
- Remove early data up to the minimum Load point
- Automatically detect contact and rescale Depth
- Remove Depth < 0 rows and flag the contact point
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Raw indentation data. | required |
Returns:
Type | Description |
---|---|
DataFrame | Preprocessed DataFrame. |
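As a rough illustration of the input this pipeline expects, the sketch below builds a synthetic raw DataFrame and passes it to `default_preprocess` when the package is importable. The column names are taken from the default arguments documented on this page; the synthetic curve (a load dip at row 5, a noisy zero-load approach, then a linear ramp from row 120) is purely an assumption made for demonstration.

```python
import numpy as np
import pandas as pd

# Synthetic raw curve (illustrative only, not real instrument data).
rng = np.random.default_rng(0)
n = 300
depth = np.linspace(-40.0, 200.0, n)      # nm, arbitrary zero before rescaling
load = rng.normal(0.0, 0.02, n)           # µN, noisy zero-load approach segment
load[5] = -1.0                            # early settling dip: the minimum Load point
load[120:] += 0.5 * np.arange(n - 120)    # load ramps up once the tip is in contact

raw = pd.DataFrame({"Depth (nm)": depth, "Load (µN)": load})

try:
    from merrypopins.preprocess import default_preprocess
except ImportError:
    # Package not installed; `raw` still shows the expected column layout.
    default_preprocess = None

if default_preprocess is not None:
    processed = default_preprocess(raw)
    print(processed.head())
```

Real datasets may carry additional columns; only the two default column names are assumed here.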
Source code in src/merrypopins/preprocess.py
finalise_contact_index(df, depth_col='Depth (nm)', remove_pre_contact=True, add_flag_column=True, flag_column='contact_point')
Optionally remove all rows before contact (Depth < 0) and/or flag the first contact point.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Rescaled DataFrame. | required |
depth_col | str | Depth column name. | 'Depth (nm)' |
remove_pre_contact | bool | If True, remove rows with Depth < 0. | True |
add_flag_column | bool | If True, add a column marking the contact index. | True |
flag_column | str | Name of the column used to flag the contact point. | 'contact_point' |
Returns:
Type | Description |
---|---|
DataFrame | DataFrame after trimming/flagging contact point. |
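The trimming and flagging behaviour described for `finalise_contact_index` can be sketched roughly as follows. The function `trim_and_flag` is a hypothetical stand-in written for this illustration, not the library's source; it assumes that "contact" means the first row with Depth >= 0 and that the index is reset after trimming.

```python
import numpy as np
import pandas as pd

def trim_and_flag(df, depth_col="Depth (nm)", remove_pre_contact=True,
                  add_flag_column=True, flag_column="contact_point"):
    """Hypothetical stand-in: drop pre-contact rows and/or flag the contact row."""
    out = df.copy()
    if remove_pre_contact:
        # Keep only rows at or past contact (Depth >= 0); index reset is an assumption.
        out = out[out[depth_col] >= 0].reset_index(drop=True)
    if add_flag_column:
        out[flag_column] = False
        at_or_past = (out[depth_col] >= 0).to_numpy()
        if at_or_past.any():
            first = int(np.argmax(at_or_past))  # position of the first True
            out.iloc[first, out.columns.get_loc(flag_column)] = True
    return out

frame = pd.DataFrame({"Depth (nm)": [-2.0, -1.0, 0.0, 1.5, 3.0]})
trimmed = trim_and_flag(frame)                               # trim and flag
flag_only = trim_and_flag(frame, remove_pre_contact=False)   # keep all rows, flag only
```

With `remove_pre_contact=False` the pre-contact rows survive, so the flag column is what locates the contact point downstream.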
remove_pre_min_load(df, load_col='Load (µN)')
Remove all points up to and including the minimum Load point.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Input DataFrame. | required |
load_col | str | Load column name. | 'Load (µN)' |
Returns:
Type | Description |
---|---|
DataFrame | Cleaned DataFrame. |
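A minimal sketch of this trimming step, assuming the first occurrence of the minimum is used and that the index is reset afterwards (neither detail is stated above). `drop_through_min_load` is a hypothetical name used only for this example.

```python
import pandas as pd

def drop_through_min_load(df, load_col="Load (µN)"):
    """Hypothetical stand-in: drop every row up to and including the minimum load."""
    min_pos = int(df[load_col].to_numpy().argmin())  # position of the first minimum
    return df.iloc[min_pos + 1:].reset_index(drop=True)

raw = pd.DataFrame({"Load (µN)": [0.5, 0.1, -0.3, 0.2, 1.0, 2.5]})
trimmed = drop_through_min_load(raw)
print(trimmed)
```

Here the minimum (-0.3) sits at position 2, so the three rows after it are kept.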
rescale_data(df, depth_col='Depth (nm)', load_col='Load (µN)', N_baseline=50, k=5, window_length=11, polyorder=2)
Automatically detect the contact point using a noise threshold and rescale Depth so that contact is at 0.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Input DataFrame. | required |
depth_col | str | Depth column name. | 'Depth (nm)' |
load_col | str | Load column name. | 'Load (µN)' |
N_baseline | int | Number of points for baseline noise estimation. | 50 |
k | float | Noise multiplier for threshold. | 5 |
window_length | int | Smoothing window (must be odd). | 11 |
polyorder | int | Polynomial order for smoothing. | 2 |
Returns:
Type | Description |
---|---|
DataFrame | Rescaled DataFrame. |
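One plausible reading of the parameters above is sketched below: smooth the load with a Savitzky-Golay filter, estimate noise from the first `N_baseline` points, and take the first later point that exceeds mean + `k` × standard deviation as contact. This is an assumption-laden illustration, not the library's implementation; the choice of mean and standard deviation, the search starting after the baseline window, and the name `rescale_by_noise_threshold` are all hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter

def rescale_by_noise_threshold(df, depth_col="Depth (nm)", load_col="Load (µN)",
                               N_baseline=50, k=5, window_length=11, polyorder=2):
    """Hypothetical stand-in: shift Depth so the detected contact point sits at 0."""
    out = df.copy()
    load = out[load_col].to_numpy(dtype=float)
    smooth = savgol_filter(load, window_length, polyorder)  # suppress point-to-point noise
    baseline = smooth[:N_baseline]                          # assumed to be pre-contact
    threshold = baseline.mean() + k * baseline.std()
    # Assumption: look for contact only after the baseline window.
    above = np.flatnonzero(smooth[N_baseline:] > threshold)
    if above.size == 0:
        return out                                          # no contact found; unchanged
    contact = N_baseline + int(above[0])
    out[depth_col] = out[depth_col] - out[depth_col].iloc[contact]
    return out

# Synthetic curve: noisy zero load, then a linear ramp starting at row 80.
rng = np.random.default_rng(0)
load = rng.normal(0.0, 0.02, 200)
load[80:] += np.arange(120.0)
raw = pd.DataFrame({"Depth (nm)": 500.0 + np.arange(200.0), "Load (µN)": load})

rescaled = rescale_by_noise_threshold(raw)
print(rescaled["Depth (nm)"].iloc[78:83].tolist())
```

Because smoothing spreads the kink over neighbouring samples, the detected contact can land a point or so away from the true onset; a larger `k` pushes it later, and a smaller `window_length` keeps it sharper at the cost of more noise.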