sciduck.basic_qc module
- sciduck.basic_qc.add_exclude_constraint(adata: AnnData, column: str, exclude_values: list, subset: str | None = None, subset_values: list | None = None)
Adds an exclude constraint to a specific column in the AnnData object.
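When the constraints are applied, cells whose value in the specified column is in ‘exclude_values’ fail this constraint. If ‘subset’ and ‘subset_values’ are given, the constraint applies only to cells whose value in the ‘subset’ column is in ‘subset_values’.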
- Parameters:
adata – The AnnData object to which the constraint will be added.
column – The column in ‘adata.obs’ for which the exclude constraint will be applied.
exclude_values – The list of values that should be excluded from ‘column’.
subset – Optional column in ‘adata.obs’ used to restrict the constraint to a subset of cells. If None (default), the constraint applies to all cells.
subset_values – A list of values for subsetting, applied to the ‘subset’ column.
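The following is a minimal sketch of the exclude-constraint logic, written against a plain pandas DataFrame standing in for ‘adata.obs’. The helper ‘exclude_mask’ and the column names are hypothetical and not part of sciduck; the sketch also assumes that, when ‘subset’ is given, cells outside the subset are left unaffected by the constraint.

```python
import pandas as pd


def exclude_mask(obs, column, exclude_values, subset=None, subset_values=None):
    """Return True for cells that pass an exclude constraint.

    Hypothetical helper modelling the documented behaviour; not part of sciduck.
    """
    # Cells whose value is in exclude_values fail the constraint.
    passes = ~obs[column].isin(exclude_values)
    if subset is not None:
        # Assumption: cells outside the subset are not affected by the constraint.
        in_subset = obs[subset].isin(subset_values)
        passes = passes | ~in_subset
    return passes


obs = pd.DataFrame(
    {
        "doublet_class": ["singlet", "doublet", "singlet", "doublet"],
        "batch": ["A", "A", "B", "B"],
    },
    index=["c1", "c2", "c3", "c4"],
)

# Exclude doublets in every batch.
print(exclude_mask(obs, "doublet_class", ["doublet"]).tolist())
# Exclude doublets only in batch "A".
print(exclude_mask(obs, "doublet_class", ["doublet"], subset="batch", subset_values=["A"]).tolist())
```

With the subset restriction, the doublet in batch "B" (‘c4’) passes because the constraint does not look at it.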
- sciduck.basic_qc.add_group_level_constraint(adata: AnnData, column: str, groupby: str, gt: float | None = None, lt: float | None = None, agg_func: str = 'mean')
Adds a group-level constraint to a specific column in the AnnData object.
Cells are grouped by ‘groupby’, and an aggregation function (e.g., ‘mean’) is applied to the specified column within each group. Cells belonging to groups whose aggregated value satisfies the ‘gt’ and ‘lt’ bounds are kept; all cells in other groups fail the constraint.
- Parameters:
adata – The AnnData object to which the constraint will be added.
column – The column in ‘adata.obs’ for which the group-level constraint will be applied.
groupby – The column in ‘adata.obs’ used to group the cells (e.g., cluster ID).
gt – The lower bound for the aggregated group value. If None, the lower bound is open.
lt – The upper bound for the aggregated group value. If None, the upper bound is open.
agg_func – The aggregation function to apply to the groups. Can be ‘mean’, ‘sum’, ‘std’, or ‘median’. Default is ‘mean’.
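A group-level constraint can be sketched with pandas ‘groupby(...).transform’, which computes the aggregate per group and broadcasts it back to every cell in that group. The helper ‘group_level_mask’ and the column names are hypothetical, and treating the bounds as inclusive is an assumption borrowed from the ‘add_range_constraint’ documentation; sciduck's own implementation may differ.

```python
import pandas as pd


def group_level_mask(obs, column, groupby, gt=None, lt=None, agg_func="mean"):
    """Return True for cells whose group passes a group-level constraint.

    Hypothetical helper modelling the documented behaviour; not part of sciduck.
    """
    # Aggregate the column per group, then broadcast each group's value back
    # to its member cells so the result aligns with obs.index.
    group_values = obs.groupby(groupby)[column].transform(agg_func)
    passes = pd.Series(True, index=obs.index)
    # Assumption: bounds are inclusive, as documented for add_range_constraint.
    if gt is not None:
        passes &= group_values >= gt
    if lt is not None:
        passes &= group_values <= lt
    return passes


obs = pd.DataFrame(
    {
        "cluster": ["0", "0", "1", "1", "2"],
        "pct_counts_mt": [2.0, 4.0, 10.0, 20.0, 5.0],
    },
    index=["c1", "c2", "c3", "c4", "c5"],
)

# Drop whole clusters whose mean mitochondrial percentage exceeds 10.
print(group_level_mask(obs, "pct_counts_mt", "cluster", lt=10).tolist())
```

Cluster "1" has a mean of 15, so both of its cells (‘c3’ and ‘c4’) fail, even though ‘c3’ alone sits exactly at the bound.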
- sciduck.basic_qc.add_range_constraint(adata: AnnData, column: str, gt: float | None = None, lt: float | None = None, subset: str | None = None, subset_values: list | None = None)
Adds a range constraint to a specific column in the AnnData object.
Cells pass this constraint only if their value in the specified column falls within the range [gt, lt]. If ‘subset’ and ‘subset_values’ are given, the constraint applies only to cells whose value in the ‘subset’ column is in ‘subset_values’.
- Parameters:
adata – The AnnData object to which the constraint will be added.
column – The column in ‘adata.obs’ for which the range constraint will be applied.
gt – The lower bound of the range (inclusive). If None, the lower bound is open.
lt – The upper bound of the range (inclusive). If None, the upper bound is open.
subset – Optional column in ‘adata.obs’ used to restrict the constraint to a subset of cells. If None (default), the constraint applies to all cells.
subset_values – A list of values for subsetting, applied to the ‘subset’ column.
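The range check with inclusive bounds and an optional subset can be sketched as below, again using a pandas DataFrame in place of ‘adata.obs’. The helper ‘range_mask’ and the column names are hypothetical; as in the exclude sketch, it assumes that cells outside the subset pass unconditionally.

```python
import pandas as pd


def range_mask(obs, column, gt=None, lt=None, subset=None, subset_values=None):
    """Return True for cells that pass a range constraint [gt, lt].

    Hypothetical helper modelling the documented behaviour; not part of sciduck.
    """
    values = obs[column]
    passes = pd.Series(True, index=obs.index)
    # A None bound leaves that side of the range open; bounds are inclusive.
    if gt is not None:
        passes &= values >= gt
    if lt is not None:
        passes &= values <= lt
    if subset is not None:
        # Assumption: cells outside the subset are not affected by the constraint.
        passes |= ~obs[subset].isin(subset_values)
    return passes


obs = pd.DataFrame(
    {
        "n_genes": [150, 800, 5000, 300],
        "tissue": ["brain", "brain", "liver", "liver"],
    },
    index=["c1", "c2", "c3", "c4"],
)

# Keep cells with 200..4000 detected genes, in every tissue.
print(range_mask(obs, "n_genes", gt=200, lt=4000).tolist())
# Apply the same range only to brain cells.
print(range_mask(obs, "n_genes", gt=200, lt=4000, subset="tissue", subset_values=["brain"]).tolist())
```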
- sciduck.basic_qc.apply_constraints(adata: AnnData, inplace: bool = False) → AnnData
Applies all quality control (QC) constraints stored in ‘adata.uns[“qc_constraints”]’ to ‘adata.obs’ and stores the filtered indices in ‘adata.uns[“qc_filtered”]’.
This function handles both individual cell-level constraints (range, exclude) and group-level constraints (using a ‘groupby’ column). The resulting ‘keeper_cells’ column in ‘adata.obs’ indicates which cells passed the constraints.
- Parameters:
adata – The AnnData object to which the constraints are applied.
inplace – If True, only the cells passing the constraints will be retained in ‘adata’. If False (default), the ‘keeper_cells’ column will be added to ‘adata.obs’ to indicate which cells passed the filtering.
- Returns:
- The AnnData object with the ‘keeper_cells’ column added to ‘adata.obs’. If ‘inplace=True’, the returned object contains only the cells that passed the constraints.
- Return type:
AnnData
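The sketch below illustrates how per-constraint results could be combined into a single ‘keeper_cells’ flag and then used to subset, with a pandas DataFrame standing in for ‘adata.obs’. It assumes a cell is a keeper only if it passes every constraint (logical AND); the helper ‘combine_masks’ is hypothetical and not sciduck's implementation, and it does not model ‘adata.uns[“qc_filtered”]’.

```python
from functools import reduce
import operator

import pandas as pd


def combine_masks(obs, masks):
    """Combine per-constraint boolean masks into one keep/drop flag per cell.

    Hypothetical helper; assumes a cell must pass every constraint.
    """
    # Start from "everyone passes" so an empty list of constraints keeps all cells.
    start = pd.Series(True, index=obs.index)
    return reduce(operator.and_, masks, start)


obs = pd.DataFrame(
    {
        "n_genes": [150, 800, 5000, 300],
        "doublet_class": ["singlet", "singlet", "singlet", "doublet"],
    },
    index=["c1", "c2", "c3", "c4"],
)
masks = [
    obs["n_genes"].between(200, 4000),        # range constraint, inclusive bounds
    ~obs["doublet_class"].isin(["doublet"]),  # exclude constraint
]

# Analogue of inplace=False: annotate every cell, drop none.
obs["keeper_cells"] = combine_masks(obs, masks)
# Analogue of inplace=True: keep only the cells that passed.
filtered = obs[obs["keeper_cells"]]
print(filtered.index.tolist())
```

On a real AnnData object returned with ‘inplace=False’, the passing cells can be selected with ordinary boolean indexing, for example ‘adata[adata.obs["keeper_cells"]].copy()’.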