freqtrade_origin/docs/freqai-configuration.md
2022-09-11 17:50:50 +02:00

Configuration

Setting up the configuration file

The user interface is isolated to the typical Freqtrade config file. Although there are plenty of additional parameters to choose from, as highlighted in the parameter table, a FreqAI config must at minimum include the following parameters (the example values are not required):

    "freqai": {
        "enabled": true,
        "purge_old_models": true,
        "train_period_days": 30,
        "backtest_period_days": 7,
        "identifier" : "unique-id",
        "feature_parameters" : {
            "include_timeframes": ["5m","15m","4h"],
            "include_corr_pairlist": [
                "ETH/USD",
                "LINK/USD",
                "BNB/USD"
            ],
            "label_period_candles": 24,
            "include_shifted_candles": 2,
            "indicator_periods_candles": [10, 20]
        },
        "data_split_parameters" : {
            "test_size": 0.25
        },
        "model_training_parameters" : {
            "n_estimators": 100
        },
    }
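
As a quick sanity check before launching a bot, the presence of the mandatory keys can be verified with a few lines of Python. This is only a sketch: `REQUIRED_KEYS` and `missing_freqai_keys` are hypothetical names that are not part of Freqtrade, and the key list is an assumption taken from the entries marked Required in the parameter table on this page.

```python
import json

# Hypothetical helper, not part of Freqtrade. The key list mirrors the
# entries marked "Required" in the parameter table of this page.
REQUIRED_KEYS = ("train_period_days", "backtest_period_days", "identifier")


def missing_freqai_keys(config: dict) -> list:
    """Return the required FreqAI keys that are absent from a config dict."""
    freqai = config.get("freqai")
    if not isinstance(freqai, dict):
        return ["freqai"]
    return [key for key in REQUIRED_KEYS if key not in freqai]


example = json.loads("""
{
    "freqai": {
        "enabled": true,
        "train_period_days": 30,
        "backtest_period_days": 7,
        "identifier": "unique-id"
    }
}
""")

print(missing_freqai_keys(example))         # nothing is missing here
print(missing_freqai_keys({"freqai": {}}))  # all three keys are reported
```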

Building a FreqAI strategy

The FreqAI strategy requires the user to include the following lines of code in the standard Freqtrade strategy:

    # user should define the maximum startup candle count (the largest number of candles
    # passed to any single indicator)
    startup_candle_count: int = 20

    def informative_pairs(self):
        whitelist_pairs = self.dp.current_whitelist()
        corr_pairs = self.config["freqai"]["feature_parameters"]["include_corr_pairlist"]
        informative_pairs = []
        for tf in self.config["freqai"]["feature_parameters"]["include_timeframes"]:
            for pair in whitelist_pairs:
                informative_pairs.append((pair, tf))
            for pair in corr_pairs:
                if pair in whitelist_pairs:
                    continue  # avoid duplication
                informative_pairs.append((pair, tf))
        return informative_pairs

    def populate_indicators(self, dataframe: DataFrame, metadata: dict) -> DataFrame:

        # the model will return all labels created by user in `populate_any_indicators`
        # (& appended targets), an indication of whether or not the prediction should be accepted,
        # the target mean/std values for each of the labels created by user in
        # `populate_any_indicators()` for each training period.

        dataframe = self.freqai.start(dataframe, metadata, self)

        return dataframe

    def populate_any_indicators(
        self, pair, df, tf, informative=None, set_generalized_indicators=False
    ):
        """
        Function designed to automatically generate, name and merge features
        from user indicated timeframes in the configuration file. User controls the indicators
        passed to the training/prediction by prepending indicators with `'%-' + coin `
        (see convention below). I.e. user should not prepend any supporting metrics
        (e.g. bb_lowerband below) with % unless they explicitly want to pass that metric to the
        model.
        :param pair: pair to be used as informative
        :param df: strategy dataframe which will receive merges from informatives
        :param tf: timeframe of the dataframe which will modify the feature names
        :param informative: the dataframe associated with the informative pair
        :param set_generalized_indicators: if True, also add the generalized
            features and the labels (targets) to the dataframe.
        """

        coin = pair.split('/')[0]

        if informative is None:
            informative = self.dp.get_pair_dataframe(pair, tf)

        # first loop is automatically duplicating indicators for time periods
        for t in self.freqai_info["feature_parameters"]["indicator_periods_candles"]:
            t = int(t)
            informative[f"%-{coin}rsi-period_{t}"] = ta.RSI(informative, timeperiod=t)
            informative[f"%-{coin}mfi-period_{t}"] = ta.MFI(informative, timeperiod=t)
            informative[f"%-{coin}adx-period_{t}"] = ta.ADX(informative, timeperiod=t)

        indicators = [col for col in informative if col.startswith("%")]
        # This loop duplicates and shifts all indicators to add a sense of recency to data
        for n in range(self.freqai_info["feature_parameters"]["include_shifted_candles"] + 1):
            if n == 0:
                continue
            informative_shift = informative[indicators].shift(n)
            informative_shift = informative_shift.add_suffix("_shift-" + str(n))
            informative = pd.concat((informative, informative_shift), axis=1)

        df = merge_informative_pair(df, informative, self.config["timeframe"], tf, ffill=True)
        skip_columns = [
            (s + "_" + tf) for s in ["date", "open", "high", "low", "close", "volume"]
        ]
        df = df.drop(columns=skip_columns)

        # Add generalized indicators here (because in live, it will call this
        # function to populate indicators during training). Notice how we ensure not to
        # add them multiple times
        if set_generalized_indicators:

            # user adds targets here by prepending them with &- (see convention below)
            # If user wishes to use multiple targets, a multioutput prediction model
            # needs to be used such as templates/CatboostPredictionMultiModel.py
            df["&-s_close"] = (
                df["close"]
                .shift(-self.freqai_info["feature_parameters"]["label_period_candles"])
                .rolling(self.freqai_info["feature_parameters"]["label_period_candles"])
                .mean()
                / df["close"]
                - 1
            )

        return df


Notice that populate_any_indicators() is where the user adds their own features (more information) and labels (more information). See a full example at templates/FreqaiExampleStrategy.py.

Another structure to consider is the location of the labels at the bottom of the example function (below if set_generalized_indicators:). This is where the user adds single features and labels to the feature set, so that they are not duplicated by configuration parameters that multiply the feature set, such as include_timeframes.
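
To get a feeling for how quickly the feature set multiplies, the count can be estimated from the config. The sketch below is an assumption derived from the example strategy on this page, not a FreqAI function: `estimate_feature_count` is a hypothetical name, it assumes the traded pair is not itself listed in include_corr_pairlist, and it ignores generalized features added under set_generalized_indicators.

```python
def estimate_feature_count(feature_parameters: dict, indicators_per_period: int) -> int:
    """Estimate how many '%' columns the example strategy creates for one traded pair."""
    n_periods = len(feature_parameters["indicator_periods_candles"])
    n_shifts = feature_parameters["include_shifted_candles"]
    n_timeframes = len(feature_parameters["include_timeframes"])
    # the traded pair plus every correlated pair
    n_pairs = 1 + len(feature_parameters["include_corr_pairlist"])

    # each indicator exists once unshifted and once per shifted candle
    per_pair_and_timeframe = indicators_per_period * n_periods * (1 + n_shifts)
    return per_pair_and_timeframe * n_timeframes * n_pairs


feature_parameters = {
    "include_timeframes": ["5m", "15m", "4h"],
    "include_corr_pairlist": ["ETH/USD", "LINK/USD", "BNB/USD"],
    "include_shifted_candles": 2,
    "indicator_periods_candles": [10, 20],
}

# RSI, MFI and ADX are the three indicators in the example strategy
print(estimate_feature_count(feature_parameters, indicators_per_period=3))  # 216
```

Even this minimal configuration yields a few hundred features, which is why single features and labels belong in the set_generalized_indicators branch.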

!!! Note Features must be defined in populate_any_indicators(). Defining FreqAI features in populate_indicators() will cause the algorithm to fail in live/dry mode. If the user wishes to add generalized features that are not associated with a specific pair or timeframe, they should use the following structure inside populate_any_indicators() (as exemplified in freqtrade/templates/FreqaiExampleStrategy.py):

```python
    def populate_any_indicators(self, pair, df, tf, informative=None, set_generalized_indicators=False):

        ...

        # Add generalized indicators here (because in live, it will call only this function to populate
        # indicators for retraining). Notice how we ensure not to add them multiple times by associating
        # these generalized indicators to the basepair/timeframe
        if set_generalized_indicators:
            df['%-day_of_week'] = (df["date"].dt.dayofweek + 1) / 7
            df['%-hour_of_day'] = (df['date'].dt.hour + 1) / 25

            # user adds targets here by prepending them with &- (see convention below)
            # If user wishes to use multiple targets, a multioutput prediction model
            # needs to be used such as templates/CatboostPredictionMultiModel.py
            df["&-s_close"] = (
                df["close"]
                .shift(-self.freqai_info["feature_parameters"]["label_period_candles"])
                .rolling(self.freqai_info["feature_parameters"]["label_period_candles"])
                .mean()
                / df["close"]
                - 1
            )
```

(Please see the example script located in `freqtrade/templates/FreqaiExampleStrategy.py` for a full example of `populate_any_indicators()`.)
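
The `&-s_close` label in the examples can be read as "the mean close of the next label_period_candles candles, relative to the current close". The pure-Python sketch below reproduces that arithmetic on a small list so that the shift/rolling chain is easier to follow. `future_mean_return` is a hypothetical helper written for illustration. It mirrors the pandas expression, including the rows that expression leaves empty, and is not how FreqAI computes labels internally.

```python
from statistics import mean


def future_mean_return(closes, label_period_candles):
    """Mean of the next n closes divided by the current close, minus 1."""
    n = label_period_candles
    labels = []
    for i in range(len(closes)):
        # pandas' rolling(n) needs n shifted values, so the first n - 1 rows
        # are empty; the last n rows have no complete future window
        if i < n - 1 or i + n > len(closes) - 1:
            labels.append(None)
            continue
        future_window = closes[i + 1 : i + n + 1]
        labels.append(mean(future_window) / closes[i] - 1)
    return labels


closes = [100, 101, 102, 103, 104, 105]
for close, label in zip(closes, future_mean_return(closes, 2)):
    print(close, label)
```

With a period of 2, the row with close 101 looks at the closes 102 and 103, so its label is 102.5 / 101 - 1.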

Important: The self.freqai.start() function cannot be called outside of populate_indicators().

Setting the startup_candle_count

Users need to take care to set the startup_candle_count in their strategy the same way they would for any normal Freqtrade strategy (see details here). This value is used by Freqtrade to ensure that a sufficient amount of data is provided when calling on the dataprovider, which avoids NaNs at the beginning of the first training. Users can easily set this value by identifying the longest period (in candle units) that they pass to their indicator creation functions (e.g. talib functions). In the present example, the user would pass 20 as this value (since it is the maximum value in their indicator_periods_candles).

!!! Note Typically it is best for users to be safe and multiply their expected startup_candle_count by 2. There are instances where the talib functions actually require more data than just the passed period. Anecdotally, multiplying the startup_candle_count by 2 always leads to a fully NaN free training dataset. Look out for this log message to confirm that your data is clean:

```
2022-08-31 15:14:04 - freqtrade.freqai.data_kitchen - INFO - dropped 0 training points due to NaNs in populated dataset 4319.
```
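
The rule of thumb above can be written down directly. `suggested_startup_candle_count` is a hypothetical helper for illustration only. The factor of 2 is the anecdotal safety margin from the note, not a value enforced by Freqtrade.

```python
def suggested_startup_candle_count(indicator_periods_candles, safety_factor=2):
    """Longest indicator period, multiplied by a safety margin."""
    return max(indicator_periods_candles) * safety_factor


print(suggested_startup_candle_count([10, 20]))     # 40
print(suggested_startup_candle_count([10, 20], 1))  # 20, the bare minimum
```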

Creating a dynamic target threshold

The &*_std/mean return values describe the statistical fit of the user defined label during the most recent training. These values allow the user to gauge the rarity of a given prediction. For example, templates/FreqaiExampleStrategy.py creates a target_roi that filters out predictions below a z-score of 1.25.

    dataframe["target_roi"] = dataframe["&-s_close_mean"] + dataframe["&-s_close_std"] * 1.25
    dataframe["sell_roi"] = dataframe["&-s_close_mean"] - dataframe["&-s_close_std"] * 1.25
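
The same thresholds can be worked through on plain numbers. The sketch below assumes the mean and standard deviation are taken over a list of training labels. `roi_thresholds` is a hypothetical name, and the population standard deviation is used here as an assumption, since this page does not state how FreqAI computes `&-s_close_std`.

```python
from statistics import mean, pstdev


def roi_thresholds(labels, z_score=1.25):
    """Return (target_roi, sell_roi) as mean +/- z_score * std of the labels."""
    mu = mean(labels)
    sigma = pstdev(labels)  # assumption: population standard deviation
    return mu + z_score * sigma, mu - z_score * sigma


training_labels = [-0.02, -0.01, 0.0, 0.01, 0.02]
target_roi, sell_roi = roi_thresholds(training_labels)

# only predictions rarer than the z-score threshold pass the filter
predictions = [0.005, 0.015, 0.03, -0.025]
print([p for p in predictions if p > target_roi])  # candidates for entries
print([p for p in predictions if p < sell_roi])    # candidates for exits
```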

If the user wishes to compute the dynamic target from the population of historical predictions instead of the trained labels (as discussed above), they can do so by setting fit_live_predictions_candles in the config to the number of historical prediction candles to use for generating the target statistics.

    "freqai": {
        "fit_live_predictions_candles": 300,
    }

If the user sets this value, FreqAI will initially use the predictions from the training data and subsequently begin introducing real prediction data as it is generated. FreqAI will save this historical data to be reloaded if the user stops and restarts a model with the same identifier.
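
Conceptually, this behaves like a fixed-length window that starts out filled with predictions on the training data and is gradually overwritten by live predictions. The class below is a minimal sketch of that idea under this assumption. `LivePredictionStats` is a hypothetical name and does not reflect FreqAI's actual implementation or storage format.

```python
from collections import deque
from statistics import mean, pstdev


class LivePredictionStats:
    """Rolling mean/std over the most recent predictions."""

    def __init__(self, training_predictions, fit_live_predictions_candles):
        # seed the window with training predictions; maxlen keeps only the newest
        self.window = deque(training_predictions, maxlen=fit_live_predictions_candles)

    def update(self, prediction):
        """Add one live prediction and return the current (mean, std)."""
        self.window.append(prediction)  # the oldest value drops out automatically
        return mean(self.window), pstdev(self.window)


stats = LivePredictionStats([0.0, 0.0, 0.0], fit_live_predictions_candles=3)
for live_prediction in (0.03, 0.03, 0.03):
    print(stats.update(live_prediction))
```

After as many live candles as the window is long, the statistics depend on live predictions only.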

Parameter table

The table below lists all configuration parameters available for FreqAI, presented in the same order as config_examples/config_freqai.example.json.

Mandatory parameters are marked as Required, which means that they are required to be set in one of the possible ways.

| Parameter | Description |
|-----------|-------------|
| | General configuration parameters |
| `freqai` | Required. The parent dictionary containing all the parameters for controlling FreqAI. Datatype: Dictionary. |
| `purge_old_models` | Delete obsolete models (otherwise, all historic models will remain on disk). Datatype: Boolean. Default: False. |
| `train_period_days` | Required. Number of days to use for the training data (width of the sliding window). Datatype: Positive integer. |
| `backtest_period_days` | Required. Number of days to inference from the trained model before sliding the window defined above, and retraining the model. This can be fractional days, but beware that the user-provided timerange will be divided by this number to yield the number of trainings necessary to complete the backtest. Datatype: Float. |
| `save_backtest_models` | Backtesting operates most efficiently by saving the prediction data and reusing them directly for subsequent runs (when users wish to tune entry/exit parameters). If a user wishes to save models to disk when running backtesting, they should activate save_backtest_models. A user may wish to do this if they plan to use the same model files for starting a dry/live instance with the same identifier. Datatype: Boolean. Default: False. |
| `identifier` | Required. A unique name for the current model. This can be reused to reload pre-trained models/data. Datatype: String. |
| `live_retrain_hours` | Frequency of retraining during dry/live runs. Default set to 0, which means the model will retrain as often as possible. Datatype: Float > 0. |
| `expiration_hours` | Avoid making predictions if a model is more than expiration_hours old. Default set to 0, which means models never expire. Datatype: Positive integer. |
| `fit_live_predictions_candles` | Number of historical candles to use for computing target (label) statistics from prediction data, instead of from the training data set. Datatype: Positive integer. |
| `follow_mode` | If true, this instance of FreqAI will look for models associated with identifier and load those for inferencing. A follower will not train new models. Datatype: Boolean. Default: False. |
| `continual_learning` | If true, FreqAI will start training new models from the final state of the most recently trained model. Datatype: Boolean. Default: False. |
| | Feature parameters |
| `feature_parameters` | A dictionary containing the parameters used to engineer the feature set. Details and examples are shown here. Datatype: Dictionary. |
| `include_timeframes` | A list of timeframes that all indicators in populate_any_indicators will be created for. The list is added as features to the base asset feature set. Datatype: List of timeframes (strings). |
| `include_corr_pairlist` | A list of correlated coins that FreqAI will add as additional features to all pair_whitelist coins. All indicators set in populate_any_indicators during feature engineering (see details here) will be created for each coin in this list, and that set of features is added to the base asset feature set. Datatype: List of assets (strings). |
| `label_period_candles` | Number of candles into the future that the labels are created for. This is used in populate_any_indicators (see templates/FreqaiExampleStrategy.py for detailed usage). The user can create custom labels, making use of this parameter or not. Datatype: Positive integer. |
| `include_shifted_candles` | Add features from previous candles to subsequent candles to add historical information. FreqAI takes all features from the include_shifted_candles previous candles, duplicates and shifts them so that the information is available for the subsequent candle. Datatype: Positive integer. |
| `weight_factor` | Used to set weights for training data points according to their recency. See details about how it works here. Datatype: Positive float (typically < 1). |
| `indicator_max_period_candles` | No longer used. The user must instead set startup_candle_count in the strategy, which defines the maximum period used in populate_any_indicators() for indicator creation (timeframe independent). FreqAI uses this information in combination with the maximum timeframe to calculate how many data points it should download so that the first data point does not have a NaN. Datatype: Positive integer. |
| `indicator_periods_candles` | Calculate indicators for indicator_periods_candles time periods and add them to the feature set. Datatype: List of positive integers. |
| `stratify_training_data` | This value is used to indicate the grouping of the data. For example, 2 would set every 2nd data point into a separate dataset to be pulled from during training/testing. See details about how it works here. Datatype: Positive integer. |
| `principal_component_analysis` | Automatically reduce the dimensionality of the data set using Principal Component Analysis. See details about how it works here. Datatype: Boolean. |
| `DI_threshold` | Activates the Dissimilarity Index for outlier detection when > 0. See details about how it works here. Datatype: Positive float (typically < 1). |
| `use_SVM_to_remove_outliers` | Train a support vector machine to detect and remove outliers from the training data set, as well as from incoming data points. See details about how it works here. Datatype: Boolean. |
| `svm_params` | All parameters available in Sklearn's SGDOneClassSVM(). See details about some select parameters here. Datatype: Dictionary. |
| `use_DBSCAN_to_remove_outliers` | Cluster data using DBSCAN to identify and remove outliers from training and prediction data. See details about how it works here. Datatype: Boolean. |
| `inlier_metric_window` | If set, FreqAI will add the inlier_metric to the training feature set and set the lookback to be the inlier_metric_window. Details of how the inlier_metric is computed can be found here. Datatype: Integer. Default: 0 |
| `noise_standard_deviation` | If > 0, FreqAI adds noise to the training features. FreqAI generates random deviates from a Gaussian distribution with a standard deviation of noise_standard_deviation and adds them to all data points. The value should be chosen relative to the normalized space between -1 and 1. In other words, since data is always normalized between -1 and 1 in FreqAI, the user can expect noise_standard_deviation: 0.05 to randomly increase/decrease 32% of the data by more than 2.5% (i.e. the share of data falling outside the first standard deviation). Good for preventing overfitting. Datatype: Float. Default: 0 |
| `outlier_protection_percentage` | If more than outlier_protection_percentage % of points are detected as outliers by the SVM or DBSCAN, FreqAI will log a warning message and ignore outlier detection while keeping the original dataset intact. If the outlier protection is triggered, no predictions will be made based on the training data. Datatype: Float. Default: 30 |
| `reverse_train_test_order` | If true, FreqAI will train on the latest data split and test on the historical split of the data. This allows the model to be trained up to the most recent data point, while avoiding overfitting. However, users should be careful to understand the unorthodox nature of this parameter before employing it. Datatype: Boolean. Default: False |
| | Data split parameters |
| `data_split_parameters` | Include any additional parameters available from Scikit-learn train_test_split(), which are shown here (external website). Datatype: Dictionary. |
| `test_size` | Fraction of data that should be used for testing instead of training. Datatype: Positive float < 1. |
| `shuffle` | Shuffle the training data points during training. Typically, for time-series forecasting, this is set to False. Datatype: Boolean. |
| | Model training parameters |
| `model_training_parameters` | A flexible dictionary that includes all parameters available by the user selected model library. For example, if the user uses LightGBMRegressor, this dictionary can contain any parameter available by the LightGBMRegressor here (external website). If the user selects a different model, this dictionary can contain any parameter from that model. Datatype: Dictionary. |
| `n_estimators` | The number of boosted trees to fit in regression. Datatype: Integer. |
| `learning_rate` | Boosting learning rate during regression. Datatype: Float. |
| `n_jobs`, `thread_count`, `task_type` | Set the number of threads for parallel processing and the task_type (gpu or cpu). Different model libraries use different parameter names. Datatype: Float. |
| | Extraneous parameters |
| `keras` | If your model makes use of Keras (typical for Tensorflow-based prediction models), activate this flag so that the model save/loading follows Keras standards. Datatype: Boolean. Default: False. |
| `conv_width` | The width of a convolutional neural network input tensor. This replaces the need for shifting candles (include_shifted_candles) by feeding in historical data points as the second dimension of the tensor. Technically, this parameter can also be used for regressors, but it only adds computational overhead and does not change the model training/prediction. Datatype: Integer. Default: 2. |
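
The interaction of train_period_days and backtest_period_days is easiest to see by enumerating the sliding windows. The sketch below is an illustration under stated assumptions, not FreqAI's scheduling code: `sliding_windows` is a hypothetical name, and it assumes that the user-provided timerange covers the backtest period while each training window is the train_period_days immediately before a backtest slice.

```python
from datetime import datetime, timedelta


def sliding_windows(backtest_start, backtest_end, train_period_days, backtest_period_days):
    """Return (train_start, train_end, predict_end) for every retraining."""
    train = timedelta(days=train_period_days)
    step = timedelta(days=backtest_period_days)
    windows = []
    cursor = backtest_start
    while cursor < backtest_end:
        # train on the days before the cursor, then predict up to the next step
        windows.append((cursor - train, cursor, min(cursor + step, backtest_end)))
        cursor += step
    return windows


start, end = datetime(2022, 1, 1), datetime(2022, 1, 29)  # 28 days of backtest

weekly = sliding_windows(start, end, train_period_days=30, backtest_period_days=7)
print(len(weekly))   # 4 trainings
print(weekly[0])     # first model trains on the 30 days before 2022-01-01

half_daily = sliding_windows(start, end, train_period_days=30, backtest_period_days=0.5)
print(len(half_daily))  # 56 trainings for the same timerange
```

Halving backtest_period_days doubles the number of trainings, which is the cost the table entry for backtest_period_days warns about.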

Important dataframe key patterns

Below are the values the user can expect to include/use inside a typical strategy dataframe (df[]):

| DataFrame Key | Description |
|---------------|-------------|
| `df['&*']` | Any dataframe column prepended with & in populate_any_indicators() is treated as a training target (label) inside FreqAI (typically following the naming convention &-s*). The names of these dataframe columns are fed back to the user as the predictions. For example, if the user wishes to predict the price change in the next 40 candles (similar to templates/FreqaiExampleStrategy.py), they set df['&-s_close']. FreqAI makes the predictions and gives them back under the same key (df['&-s_close']) to be used in populate_entry/exit_trend(). Datatype: Depends on the output of the model. |
| `df['&*_std/mean']` | Standard deviation and mean values of the user-defined labels during training (or live tracking with fit_live_predictions_candles). Commonly used to understand the rarity of a prediction (use the z-score as shown in templates/FreqaiExampleStrategy.py to evaluate how often a particular prediction was observed during training or historically with fit_live_predictions_candles). Datatype: Float. |
| `df['do_predict']` | Indication of an outlier data point. The return value is an integer between -1 and 2, which lets the user know if the prediction is trustworthy or not. do_predict==1 means the prediction is trustworthy. If the Dissimilarity Index (DI, see details here) of the input data point is above the user-defined threshold, FreqAI will subtract 1 from do_predict, resulting in do_predict==0. If use_SVM_to_remove_outliers is active, the Support Vector Machine (SVM) may also detect outliers in training and prediction data. In this case, the SVM will also subtract 1 from do_predict. If the input data point was considered an outlier by the SVM but not by the DI, the result will be do_predict==0. If both the DI and the SVM consider the input data point to be an outlier, the result will be do_predict==-1. A particular case is do_predict==2, which means that the model has expired due to exceeding expiration_hours. Datatype: Integer between -1 and 2. |
| `df['DI_values']` | Dissimilarity Index values are proxies for the level of confidence FreqAI has in the prediction. A lower DI means the prediction is close to the training data, i.e., higher prediction confidence. Datatype: Float. |
| `df['%*']` | Any dataframe column prepended with % in populate_any_indicators() is treated as a training feature. For example, the user can include the RSI in the training feature set (similar to templates/FreqaiExampleStrategy.py) by setting df['%-rsi']. See more details on how this is done here. Note: Since the number of features prepended with % can multiply very quickly (tens of thousands of features are easily engineered using the multiplicative functionality described in the feature_parameters table shown above), these features are removed from the dataframe upon return from FreqAI. If the user wishes to keep a particular type of feature for plotting purposes, they can prepend it with %%. Datatype: Depends on the output of the model. |
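
The do_predict arithmetic described above can be summarized in a few lines. This is a sketch of the documented rules only: `do_predict_value` is a hypothetical function, and letting an expired model take precedence over the outlier checks is an assumption, since this page does not say how the two cases combine.

```python
def do_predict_value(di_outlier: bool, svm_outlier: bool, model_expired: bool = False) -> int:
    """Reproduce the documented do_predict values for one data point."""
    if model_expired:
        return 2  # assumption: expiry overrides the outlier checks
    value = 1  # trustworthy by default
    if di_outlier:
        value -= 1  # Dissimilarity Index above DI_threshold
    if svm_outlier:
        value -= 1  # flagged by the SVM outlier detector
    return value


for flags in [(False, False), (True, False), (False, True), (True, True)]:
    print(flags, do_predict_value(*flags))

# a strategy would typically act only on trustworthy predictions
prediction, target_roi = 0.02, 0.015
enter_long = do_predict_value(False, False) == 1 and prediction > target_roi
print(enter_long)  # True
```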

Building a custom prediction model

FreqAI has multiple example prediction model libraries, such as Catboost regression (freqai/prediction_models/CatboostRegressor.py), LightGBM and XGBoost regression. However, the user can customize and create their own prediction models using the IFreqaiModel class. The user is encouraged to inherit from it and override fit(), train() and predict() to customize various aspects of their training procedures.

Setting classifier targets

FreqAI includes a variety of classifiers, such as the CatboostClassifier via the flag --freqaimodel CatboostClassifier. If the user elects to use a classifier, they must ensure the classes are set using strings. For example:

    df['&s-up_or_down'] = np.where(df["close"].shift(-100) > df["close"], 'up', 'down')

Additionally, the example classifier models do not accommodate multiple labels, but they do allow multi-class classification within a single label column.
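
The labeling rule can be illustrated on a short list of closes. `up_or_down_labels` is a hypothetical pure-Python helper written for this page, not part of FreqAI. Unlike the one-liner above, where the trailing rows are compared against NaN and therefore fall into the 'down' branch, this sketch marks rows without a future candle as None so that the difference is visible.

```python
def up_or_down_labels(closes, horizon):
    """Label each candle 'up' or 'down' by comparing it with the close `horizon` candles later."""
    labels = []
    for i, close in enumerate(closes):
        if i + horizon >= len(closes):
            labels.append(None)  # no future candle to compare against
        else:
            labels.append("up" if closes[i + horizon] > close else "down")
    return labels


print(up_or_down_labels([1, 2, 1, 3], horizon=1))  # ['up', 'down', 'up', None]
```

The labels stay strings throughout, which matches the requirement that classifier classes are set using strings.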