chowclassifier package

Submodules

chowclassifier.chow module

Implementation of a Chow Classifier

class chowclassifier.chow.Chow(df, name: str = '', initial_breakpoint: float = None, timecol: str = 'year', ycol: str = 'value', groupcol: str = 'g', alpha=0.01, margin: int = 2)

Bases: object

Class handling the Chow analysis.

Call run() to run the analysis (find the breakpoint and classify). Afterwards, call params() to get the regression (y = k * x + b) parameters as [[model1_k, model1_b, model1_Rsquared], [model2_k, model2_b, model2_Rsquared]]; if the breakpoint is not significant, model1 = model2.

The significance level of the breakpoint is adjusted with the Bonferroni correction, i.e. alpha is divided by the number of tests. The confidence intervals of the slope and intercept use significance level alpha (not corrected).

Parameters:
  • df – data on which to perform the Chow categorisation

  • name – name of the dataset (optional)

  • initial_breakpoint – breakpoint (if None, midpoint of X series will be used)

  • timecol – name of time column (X)

  • ycol – name of the variable column (y)

  • groupcol – name of the group column (g); ignored if not present in df

  • alpha – significance level (without Bonferroni correction); can be changed with Chow.set_alpha(0.01)

  • margin – number of index positions on either side of the breakpoint (or midpoint) within which the best breakpoint is searched. E.g. if X = [0, 1, 1.5, 2, 3, 4, 5], breakpoint = 2 and margin = 1, the breakpoints [1.5, 2, 3] are tested. Set margin = 0 to test only the given breakpoint (or the midpoint).
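
The margin example above can be reproduced with a few lines of plain Python. This is only a sketch of the described behaviour, not the package's code: the helper name candidate_breakpoints is hypothetical, and both the "closest value" lookup and the "middle element" reading of the midpoint are assumptions.

```python
def candidate_breakpoints(x, breakpoint=None, margin=2):
    """Return the X values within `margin` index positions of the breakpoint."""
    values = sorted(set(x))
    if breakpoint is None:
        # Assumption: "midpoint" means the middle element of the sorted values.
        centre = len(values) // 2
    else:
        # Assumption: use the index of the value closest to the requested breakpoint.
        centre = min(range(len(values)), key=lambda i: abs(values[i] - breakpoint))
    lo = max(0, centre - margin)
    hi = min(len(values), centre + margin + 1)
    return values[lo:hi]


x = [0, 1, 1.5, 2, 3, 4, 5]
print(candidate_breakpoints(x, breakpoint=2, margin=1))  # [1.5, 2, 3]
print(candidate_breakpoints(x, breakpoint=2, margin=0))  # [2]
```

The window is clipped at both ends of the series, so a breakpoint near the start or end simply yields fewer candidates.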

classify(**_)

Classify the dataset based on results of Chow test at significance level alpha.

Possible classification values:

No significant breakpoint:

  1. N: non-significant overall trend

  2. I: significant increasing overall trend

  3. D: significant decreasing overall trend

Significant breakpoint (set1 and set2 indicate points before/after breakpoint):

  1. NN: non-significant trend on set1 and non-significant trend on set2

  2. NI: non-significant trend on set1 and significant increasing trend on set2

  3. ND: non-significant trend on set1 and significant decreasing trend on set2

  4. IN: significant increasing trend on set1 and non-significant trend on set2

  5. ID: significant increasing trend on set1 and significant decreasing trend on set2

  6. iI: significant increasing trend on both set1 and set2 with greater increase in set2

  7. Ii: significant increasing trend on both set1 and set2 with greater increase in set1

  8. DN: significant decreasing trend on set1 and non-significant trend on set2

  9. DI: significant decreasing trend on set1 and significant increasing trend on set2

  10. dD: significant decreasing trend on both set1 and set2 with greater decrease in set2

  11. Dd: significant decreasing trend on both set1 and set2 with greater decrease in set1
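
The codes listed above follow a simple rule, which the sketch below spells out. It is an illustration only: the function names are hypothetical, the inputs (a slope and a significance flag per segment) are an assumed simplification of whatever the class stores internally, and the handling of exactly equal slopes is arbitrary.

```python
def segment_label(slope, significant):
    """Label one segment: N (non-significant), I (increasing), D (decreasing)."""
    if not significant:
        return "N"
    return "I" if slope > 0 else "D"


def classification_code(slope1, sig1, slope2=None, sig2=None):
    """Build the code; omit slope2/sig2 when the breakpoint is not significant."""
    first = segment_label(slope1, sig1)
    if slope2 is None:
        return first  # N, I or D for the overall trend
    second = segment_label(slope2, sig2)
    if first == second == "I":
        # Lower-case letter marks the segment with the smaller increase.
        return "iI" if slope2 > slope1 else "Ii"
    if first == second == "D":
        # Lower-case letter marks the segment with the smaller decrease.
        return "dD" if slope2 < slope1 else "Dd"
    return first + second


print(classification_code(0.5, True))               # I
print(classification_code(0.1, False, -2.0, True))  # ND
print(classification_code(1.0, True, 3.0, True))    # iI
print(classification_code(-3.0, True, -1.0, True))  # Dd
```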

Parameters:

kwargs – Additional keyword arguments (currently unused)

Returns:

Dictionary with model parameters

find_best_bkp(breakpoint_indices=None, **_)

Find the best breakpoint and run Chow test to find OLS parameters.

Tests multiple breakpoints and selects the one with the best score. Updates self.initial_breakpoint with the optimal value.

Parameters:
  • breakpoint_indices – Optional list of indices to test. If None, uses get_breakpoint_indices()

  • kwargs – Additional keyword arguments (currently unused)

get_breakpoint_indices()

Calculate and set the breakpoint indices to search.

Returns:

List of breakpoint indices to test

params()

Return parameters of the fitted model(s).

Returns:

Dictionary containing model0, model1, and model2 parameters. If breakpoint is significant, model0 is empty and model1/model2 are populated. Otherwise, model0 is populated and model1/model2 are empty.

params_names()

Get list of parameter names for model output.

Returns:

List of parameter name strings

plot(filename: str = None, ax=None, figsize=(16, 8), ylog: bool = False, fill: bool = True, scatter: bool = True, scatterparams: dict = None, fillparams: dict = None, fitparams: dict = None, linestyles: list = None, show_legend: bool = True, **params)

Plot the regression model(s).

Parameters:
  • filename – Output filename, including the format extension; if None, plt.show() is called instead

  • ax – Matplotlib axis on which to plot; if None, a new figure is created

  • figsize – Tuple with figure size

  • ylog – Make the y-axis log-scale

  • fill – Plot the confidence interval

  • scatter – Plot individual points

  • scatterparams – Parameters for the scatter plot

  • fillparams – Parameters for the confidence intervals

  • fitparams – Parameters for the fitted line

  • linestyles – Line style of each trend (list of size 3); set to None to define it in fitparams instead

  • show_legend – Whether to draw the legend via ax.legend()

  • params – Additional parameters including color, xlabel, ylabel, title, xlim, ylim

plot_by_group(filename: str = None, ax=None, figsize=(16, 8), cmap: str = 'Set1', show_legend: bool = True, plot_overall: bool = True, plot_individual_fill: bool = True, scatterparams: dict = None, fillparams: dict = None, fitparams: dict = None, groups_order: list = None, **params)

Plot the regression model(s) grouped by the group column.

Parameters:
  • filename – Output filename, including the format extension; if None, plt.show() is called instead

  • ax – Matplotlib axis on which to plot; if None, a new figure is created

  • figsize – Tuple with figure size

  • cmap – Colormap name (from matplotlib) or custom color mapping

  • show_legend – Include the legend

  • plot_overall – Plot overall trend and confidence interval

  • plot_individual_fill – Plot confidence interval for each individual group

  • scatterparams – Parameters for the scatter plot

  • fillparams – Parameters for the confidence intervals

  • fitparams – Parameters for the fitted line

  • groups_order – List showing the order in which the groups must be plotted

  • params – Additional parameters including xlabel, ylabel, title, ylog, scatter

run(**kwargs)

Run the complete Chow classification analysis.

Finds best breakpoint, runs Chow test, and classifies the trend.

Parameters:

kwargs – Keyword arguments forwarded to find_best_bkp and classify. Can include 'normalise' for run_chow and 'alpha' for classify.

Returns:

Classification code (string) or None if analysis fails

run_chow(bkp, normalise=False)

Run Chow test at specified breakpoint.

Fits OLS models to the full dataset and two subsets (before and after breakpoint), then calculates the F-statistic to test for structural break.

Parameters:
  • bkp – Breakpoint value at which to split the data

  • normalise – Whether to normalise the y values before fitting

Returns:

Dictionary containing score, F-statistic, and residual sum of squares
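
For readers unfamiliar with the test, the sketch below computes the textbook Chow F-statistic, F = ((RSS_pooled - (RSS_1 + RSS_2)) / k) / ((RSS_1 + RSS_2) / (n - 2k)) with k = 2 parameters per line, using only the standard library. It is not the package's implementation: the function names are hypothetical, the split rule (x < bkp goes to set1) is an assumption, and the "score" returned by run_chow may be defined differently.

```python
def ols_rss(x, y):
    """Fit y = k * x + b by least squares and return the residual sum of squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    k = sxy / sxx
    b = mean_y - k * mean_x
    return sum((yi - (k * xi + b)) ** 2 for xi, yi in zip(x, y))


def chow_f(x, y, bkp, n_params=2):
    """Chow F-statistic for a structural break at `bkp`."""
    # Assumption: points with x < bkp form set1, the remaining points form set2.
    set1 = [(xi, yi) for xi, yi in zip(x, y) if xi < bkp]
    set2 = [(xi, yi) for xi, yi in zip(x, y) if xi >= bkp]
    rss_pooled = ols_rss(x, y)
    rss_split = ols_rss(*zip(*set1)) + ols_rss(*zip(*set2))
    dof = len(x) - 2 * n_params
    return ((rss_pooled - rss_split) / n_params) / (rss_split / dof)


x = list(range(10))
noise = [0.1 if i % 2 == 0 else -0.1 for i in x]
# Trend rises until x = 5, then falls: a clear structural break.
y_break = [(xi if xi < 5 else 10 - xi) + e for xi, e in zip(x, noise)]
# A single straight line with the same noise: no break.
y_line = [2 * xi + e for xi, e in zip(x, noise)]

f_break = chow_f(x, y_break, bkp=5)
f_line = chow_f(x, y_line, bkp=5)
print(f_break > f_line)  # True
```

A large F means that two separate lines fit markedly better than one pooled line; the statistic is then compared against an F distribution with (k, n - 2k) degrees of freedom to decide significance.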

set_alpha(alpha: float)

Change the current statistical significance level.

The Bonferroni correction will be applied based on the number of tests.

Parameters:

alpha – Significance level, positive float in (0,1)
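
As a worked example of the correction, the sketch below divides alpha by the number of tests. The helper name is hypothetical, and taking the number of tests to be the number of candidate breakpoints is an assumption; the documentation only says "the number of tests".

```python
def bonferroni_alpha(alpha, n_tests):
    """Return the per-test significance level after Bonferroni correction."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in the open interval (0, 1)")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumption: three candidate breakpoints are tested, so n_tests = 3.
corrected = bonferroni_alpha(0.01, 3)
print(corrected < 0.01)  # True
```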

set_color()

Set the color attribute based on the trend type (tt).

Uses predefined color scheme from TREND_COLORS.

summary()

Print the statistical summary of the fitted model(s).

If breakpoint is significant, prints summaries for both model1 and model2. Otherwise, prints summary for model0 (whole dataset).

Module contents

Chow classifier package for time-series trend classification.