Merge pull request #8692 from freqtrade/feat/outsource-data-pipeline

Outsource data pipeline handling to improve flexibility
This commit is contained in:
Matthias 2023-06-18 13:39:36 +02:00 committed by GitHub
commit 02071df8fa
21 changed files with 587 additions and 995 deletions

View File

@ -212,41 +212,7 @@ Another example, where the user wants to use live metrics from the trade databas
You need to set the standard dictionary in the config so that FreqAI can return proper dataframe shapes. These values will likely be overridden by the prediction model, but in the case where the model has yet to set them, or needs a default initial value, the pre-set values are what will be returned.
## Feature normalization
FreqAI is strict when it comes to data normalization. The train features, $X^{train}$, are always normalized to [-1, 1] using a shifted min-max normalization:
$$X^{train}_{norm} = 2 * \frac{X^{train} - X^{train}.min()}{X^{train}.max() - X^{train}.min()} - 1$$
All other data (test data and unseen prediction data in dry/live/backtest) is always automatically normalized to the training feature space according to industry standards. FreqAI stores all the metadata required to ensure that test and prediction features will be properly normalized and that predictions are properly denormalized. For this reason, it is not recommended to eschew industry standards and modify FreqAI internals; however, advanced users can do so by inheriting `train()` in their custom `IFreqaiModel` and using their own normalization functions.
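For intuition, the shifted min-max scaling above can be sketched in a few lines of pandas (an illustrative helper, not FreqAI's internal code; `train` and `other` are assumed DataFrames of features):

```python
import pandas as pd

def minmax_normalize(train: pd.DataFrame, other: pd.DataFrame):
    """Shifted min-max normalization to [-1, 1], for illustration only."""
    train_min, train_max = train.min(), train.max()
    # Test/prediction data is scaled with the *training* min/max,
    # mirroring the formula above.
    train_norm = 2 * (train - train_min) / (train_max - train_min) - 1
    other_norm = 2 * (other - train_min) / (train_max - train_min) - 1
    return train_norm, other_norm
```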
## Data dimensionality reduction with Principal Component Analysis
You can reduce the dimensionality of your features by activating the `principal_component_analysis` in the config:
```json
    "freqai": {
        "feature_parameters" : {
            "principal_component_analysis": true
        }
    }
```
This will perform PCA on the features and reduce their dimensionality so that the explained variance of the data set is >= 0.999. Reducing data dimensionality makes training the model faster and hence allows for more up-to-date models.
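Conceptually, this corresponds to fitting scikit-learn's PCA with a target explained variance (a sketch of the idea, not FreqAI's exact code path; `train_features` is an assumed DataFrame of training features):

```python
import pandas as pd
from sklearn.decomposition import PCA

# Keep as many components as needed to explain >= 99.9% of the variance
pca = PCA(0.999)
train_components = pca.fit_transform(train_features)
train_features_pca = pd.DataFrame(
    train_components,
    columns=[f"PC{i}" for i in range(pca.n_components_)],
    index=train_features.index,
)
```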
## Inlier metric
The `inlier_metric` is a metric aimed at quantifying how similar the features of a data point are to the most recent historical data points.
You define the lookback window by setting `inlier_metric_window` and FreqAI computes the distance between the present time point and each of the previous `inlier_metric_window` lookback points. A Weibull function is fit to each of the lookback distributions and its cumulative distribution function (CDF) is used to produce a quantile for each lookback point. The `inlier_metric` is then computed for each time point as the average of the corresponding lookback quantiles. The figure below explains the concept for an `inlier_metric_window` of 5.
![inlier-metric](assets/freqai_inlier-metric.jpg)
FreqAI adds the `inlier_metric` to the training features and hence gives the model access to a novel type of temporal information.
This function does **not** remove outliers from the data set.
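A rough, self-contained sketch of this computation is shown below (a hypothetical helper for intuition only; FreqAI's internal implementation differs in detail):

```python
import numpy as np
import pandas as pd
from scipy.stats import weibull_min

def inlier_metric(features: pd.DataFrame, window: int = 5) -> pd.Series:
    """Rough sketch of the inlier metric idea; not FreqAI's implementation."""
    quantiles = pd.DataFrame(index=features.index)
    for offset in range(1, window + 1):
        # distance between each point and the point `offset` rows earlier
        dist = np.linalg.norm(features.values - features.shift(offset).values, axis=1)
        dist = pd.Series(dist, index=features.index).dropna()
        # fit a Weibull distribution to these distances and map each distance to a quantile
        shape, loc, scale = weibull_min.fit(dist)
        quantiles[offset] = pd.Series(
            weibull_min.cdf(dist, shape, loc=loc, scale=scale), index=dist.index)
    # the inlier metric of a point is the average quantile over its lookback window
    return quantiles.mean(axis=1)
```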
## Weighting features for temporal importance
FreqAI allows you to set a `weight_factor` to weight recent data more strongly than past data via an exponential function:
@ -256,13 +222,103 @@ where $W_i$ is the weight of data point $i$ in a total set of $n$ data points. B
![weight-factor](assets/freqai_weight-factor.jpg)
## Building the data pipeline
By default, FreqAI builds a dynamic pipeline based on user configuration settings. The default settings are robust and designed to work with a variety of methods. The two default steps are a `MinMaxScaler(-1,1)` and a `VarianceThreshold`, which removes any column with zero variance. Users can activate additional steps with more configuration parameters. For example, adding `use_SVM_to_remove_outliers: true` to the `freqai` config will automatically add the [`SVMOutlierExtractor`](#identifying-outliers-using-a-support-vector-machine-svm) to the pipeline. Likewise, `principal_component_analysis: true` activates PCA, the [DissimilarityIndex](#identifying-outliers-with-the-dissimilarity-index-di) is activated with `DI_threshold: 1`, noise can be added to the data with `noise_standard_deviation: 0.1`, and [DBSCAN](#identifying-outliers-with-dbscan) outlier removal is activated with `use_DBSCAN_to_remove_outliers: true`.
!!! note "More information available"
    Please review the [parameter table](freqai-parameter-table.md) for more information on these parameters.
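For orientation, the default pipeline described above is roughly equivalent to the following construction (an illustrative sketch using the `DataSieve` API shown later in this document; the exact step names and ordering inside FreqAI may differ):

```python
from datasieve.pipeline import Pipeline
from datasieve.transforms import SKLearnWrapper
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import MinMaxScaler

# Approximate default feature pipeline: drop zero-variance columns,
# then scale all remaining features to the [-1, 1] range.
feature_pipeline = Pipeline([
    ("const", SKLearnWrapper(VarianceThreshold(0))),
    ("scaler", SKLearnWrapper(MinMaxScaler(feature_range=(-1, 1)))),
])
```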
### Customizing the pipeline
Users are encouraged to customize the data pipeline to their needs by building their own. This can be done by setting `dk.feature_pipeline` to the desired `Pipeline` object inside the `IFreqaiModel` `train()` function, or, to avoid touching `train()`, by overriding the `define_data_pipeline`/`define_label_pipeline` functions in the `IFreqaiModel`:
!!! note "More information available"
    FreqAI uses the [`DataSieve`](https://github.com/emergentmethods/datasieve) pipeline, which follows the SKlearn pipeline API but adds, among other features, coherence between X, y, and sample_weight point removals, feature removal, and feature name tracking.
```python
from datasieve.transforms import SKLearnWrapper, DissimilarityIndex
from datasieve.pipeline import Pipeline
from sklearn.preprocessing import QuantileTransformer, StandardScaler
from freqai.base_models import BaseRegressionModel


class MyFreqaiModel(BaseRegressionModel):
    """
    Some cool custom model
    """
    def fit(self, data_dictionary: Dict, dk: FreqaiDataKitchen, **kwargs) -> Any:
        """
        My custom fit function
        """
        model = cool_model.fit()
        return model

    def define_data_pipeline(self) -> Pipeline:
        """
        User defines their custom feature pipeline here (if they wish)
        """
        feature_pipeline = Pipeline([
            ('qt', SKLearnWrapper(QuantileTransformer(output_distribution='normal'))),
            ('di', DissimilarityIndex(di_threshold=1)),
        ])

        return feature_pipeline

    def define_label_pipeline(self) -> Pipeline:
        """
        User defines their custom label pipeline here (if they wish)
        """
        label_pipeline = Pipeline([
            ('qt', SKLearnWrapper(StandardScaler())),
        ])

        return label_pipeline
```
Here, you are defining the exact pipeline that will be used for your feature set during training and prediction. You can use *most* SKLearn transformation steps by wrapping them in the `SKLearnWrapper` class as shown above. In addition, you can use any of the transformations available in the [`DataSieve` library](https://github.com/emergentmethods/datasieve).
You can easily add your own transformation by creating a class that inherits from the datasieve `BaseTransform` and implementing your `fit()`, `transform()` and `inverse_transform()` methods:
```python
from datasieve.transforms.base_transform import BaseTransform
# import whatever else you need

class MyCoolTransform(BaseTransform):
    def __init__(self, **kwargs):
        self.param1 = kwargs.get('param1', 1)

    def fit(self, X, y=None, sample_weight=None, feature_list=None, **kwargs):
        # do something with X, y, sample_weight, or/and feature_list
        return X, y, sample_weight, feature_list

    def transform(self, X, y=None, sample_weight=None,
                  feature_list=None, outlier_check=False, **kwargs):
        # do something with X, y, sample_weight, or/and feature_list
        return X, y, sample_weight, feature_list

    def inverse_transform(self, X, y=None, sample_weight=None, feature_list=None, **kwargs):
        # do/don't do something with X, y, sample_weight, or/and feature_list
        return X, y, sample_weight, feature_list
```
!!! note "Hint"
    You can define this custom class in the same file as your `IFreqaiModel`.
### Migrating a custom `IFreqaiModel` to the new Pipeline
If you have created your own custom `IFreqaiModel` with a custom `train()`/`predict()` function, *and* you still rely on `data_cleaning_train/predict()`, then you will need to migrate to the new pipeline. If your model does *not* rely on `data_cleaning_train/predict()`, then you do not need to worry about this migration.
More details about the migration can be found [here](strategy_migration.md#freqai---new-data-pipeline).
## Outlier detection
Equity and crypto markets suffer from a high level of non-patterned noise in the form of outlier data points. FreqAI implements a variety of methods to identify such outliers and hence mitigate risk.
### Identifying outliers with the Dissimilarity Index (DI)
The Dissimilarity Index (DI) aims to quantify the uncertainty associated with each prediction made by the model.
You can tell FreqAI to remove outlier data points from the training/test data sets using the DI by including the following statement in the config:
@ -274,7 +330,7 @@ You can tell FreqAI to remove outlier data points from the training/test data se
    }
```
This will add the `DissimilarityIndex` step to your `feature_pipeline` and set the threshold to 1. The DI allows predictions which are outliers (not existent in the model feature space) to be thrown out due to low levels of certainty. To do so, FreqAI measures the distance between each training data point (feature vector), $X_{a}$, and all other training data points:
$$ d_{ab} = \sqrt{\sum_{j=1}^p(X_{a,j}-X_{b,j})^2} $$ $$ d_{ab} = \sqrt{\sum_{j=1}^p(X_{a,j}-X_{b,j})^2} $$
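As a quick illustration, the pairwise distances $d_{ab}$ defined above can be computed with SciPy (a sketch; `train_features` is an assumed array of training feature vectors):

```python
from scipy.spatial.distance import pdist, squareform

# d[a, b] is the Euclidean distance between training feature vectors X_a and X_b
d = squareform(pdist(train_features, metric="euclidean"))
```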
@ -308,9 +364,9 @@ You can tell FreqAI to remove outlier data points from the training/test data se
    }
```
This will add the `SVMOutlierExtractor` step to your `feature_pipeline`. The SVM will be trained on the training data, and any data point that the SVM deems to be beyond the feature space will be removed.
FreqAI uses `sklearn.linear_model.SGDOneClassSVM` (details are available on scikit-learn's webpage [here](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDOneClassSVM.html) (external website)). You can elect to provide additional parameters for the SVM, such as `shuffle` and `nu`, via the `feature_parameters.svm_params` dictionary in the config.
The parameter `shuffle` is by default set to `False` to ensure consistent results. If it is set to `True`, running the SVM multiple times on the same data set might result in different outcomes due to `max_iter` being too low for the algorithm to reach the demanded `tol`. Increasing `max_iter` solves this issue but causes the procedure to take a longer time.
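For intuition, the underlying one-class SVM outlier filter can be sketched directly with scikit-learn (a standalone example, not the `SVMOutlierExtractor` step itself; `train_features` is an assumed feature matrix):

```python
from sklearn.linear_model import SGDOneClassSVM

# Fit the one-class SVM on the training features
svm = SGDOneClassSVM(nu=0.1, shuffle=False)
svm.fit(train_features)

# predict() returns +1 for inliers and -1 for outliers; keep only the
# rows the SVM considers inside the training feature space
labels = svm.predict(train_features)
train_features_filtered = train_features[labels == 1]
```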
@ -328,7 +384,7 @@ You can configure FreqAI to use DBSCAN to cluster and remove outliers from the t
    }
```
This will add the `DataSieveDBSCAN` step to your `feature_pipeline`. DBSCAN is an unsupervised machine learning algorithm that clusters data without needing to know how many clusters there should be.
Given a number of data points $N$, and a distance $\varepsilon$, DBSCAN clusters the data set by setting all data points that have $N-1$ other data points within a distance of $\varepsilon$ as *core points*. A data point that is within a distance of $\varepsilon$ from a *core point* but that does not have $N-1$ other data points within a distance of $\varepsilon$ from itself is considered an *edge point*. A cluster is then the collection of *core points* and *edge points*. Data points that have no other data points at a distance $<\varepsilon$ are considered outliers. The figure below shows a cluster with $N = 3$.
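The outlier flagging described above can be reproduced in a few lines with scikit-learn (a standalone sketch with hand-picked parameters, not the `DataSieveDBSCAN` step itself; `train_features` is an assumed feature matrix):

```python
from sklearn.cluster import DBSCAN

# min_samples corresponds to N and eps to epsilon in the description above
clustering = DBSCAN(eps=0.5, min_samples=3).fit(train_features)

# scikit-learn labels noise points (outliers) with -1
outlier_mask = clustering.labels_ == -1
train_features_filtered = train_features[~outlier_mask]
```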

View File

@ -728,3 +728,86 @@ Targets now get their own, dedicated method.
        return dataframe
```
### FreqAI - New data Pipeline
If you have created your own custom `IFreqaiModel` with a custom `train()`/`predict()` function, *and* you still rely on `data_cleaning_train/predict()`, then you will need to migrate to the new pipeline. If your model does *not* rely on `data_cleaning_train/predict()`, then you do not need to worry about this migration. That means that this migration guide is relevant for a very small percentage of power-users. If you stumbled upon this guide by mistake, feel free to inquire in depth about your problem in the Freqtrade discord server.
The conversion involves first removing `data_cleaning_train/predict()` and replacing them with `define_data_pipeline()` and `define_label_pipeline()` functions in your `IFreqaiModel` class:
```python linenums="1" hl_lines="11-14 47-49 55-57"
class MyCoolFreqaiModel(BaseRegressionModel):
    """
    Some cool custom IFreqaiModel you made before Freqtrade version 2023.6
    """
    def train(
        self, unfiltered_df: DataFrame, pair: str, dk: FreqaiDataKitchen, **kwargs
    ) -> Any:

        # ... your custom stuff

        # Remove these lines
        # data_dictionary = dk.make_train_test_datasets(features_filtered, labels_filtered)
        # self.data_cleaning_train(dk)
        # data_dictionary = dk.normalize_data(data_dictionary)
        # (1)

        # Add these lines. Now we control the pipeline fit/transform ourselves
        dd = dk.make_train_test_datasets(features_filtered, labels_filtered)
        dk.feature_pipeline = self.define_data_pipeline(threads=dk.thread_count)
        dk.label_pipeline = self.define_label_pipeline(threads=dk.thread_count)

        (dd["train_features"],
         dd["train_labels"],
         dd["train_weights"]) = dk.feature_pipeline.fit_transform(dd["train_features"],
                                                                  dd["train_labels"],
                                                                  dd["train_weights"])

        (dd["test_features"],
         dd["test_labels"],
         dd["test_weights"]) = dk.feature_pipeline.transform(dd["test_features"],
                                                             dd["test_labels"],
                                                             dd["test_weights"])

        dd["train_labels"], _, _ = dk.label_pipeline.fit_transform(dd["train_labels"])
        dd["test_labels"], _, _ = dk.label_pipeline.transform(dd["test_labels"])

        # ... your custom code

        return model

    def predict(
        self, unfiltered_df: DataFrame, dk: FreqaiDataKitchen, **kwargs
    ) -> Tuple[DataFrame, npt.NDArray[np.int_]]:

        # ... your custom stuff

        # Remove these lines:
        # self.data_cleaning_predict(dk)
        # (2)

        # Add these lines:
        dk.data_dictionary["prediction_features"], outliers, _ = dk.feature_pipeline.transform(
            dk.data_dictionary["prediction_features"], outlier_check=True)

        # Remove this line
        # pred_df = dk.denormalize_labels_from_metadata(pred_df)
        # (3)

        # Replace with these lines
        pred_df, _, _ = dk.label_pipeline.inverse_transform(pred_df)

        if self.freqai_info.get("DI_threshold", 0) > 0:
            dk.DI_values = dk.feature_pipeline["di"].di_values
        else:
            dk.DI_values = np.zeros(len(outliers.index))

        dk.do_predict = outliers.to_numpy()

        # ... your custom code
        return (pred_df, dk.do_predict)
```
1. Data normalization and cleaning is now homogenized with the new pipeline definition. This is created in the new `define_data_pipeline()` and `define_label_pipeline()` functions. The `data_cleaning_train()` and `data_cleaning_predict()` functions are no longer used. You can override `define_data_pipeline()` to create your own custom pipeline if you wish.
2. Data normalization and cleaning is now homogenized with the new pipeline definition. This is created in the new `define_data_pipeline()` and `define_label_pipeline()` functions. The `data_cleaning_train()` and `data_cleaning_predict()` functions are no longer used. You can override `define_data_pipeline()` to create your own custom pipeline if you wish.
3. Data denormalization is done with the new pipeline. Replace this with the lines below.

View File

@ -8,6 +8,7 @@ from typing import Any, Dict, List, Literal, Tuple
from freqtrade.enums import CandleType, PriceType, RPCMessageType
DOCS_LINK = "https://www.freqtrade.io/en/stable"
DEFAULT_CONFIG = 'config.json'
DEFAULT_EXCHANGE = 'bittrex'
PROCESS_THROTTLE_SECS = 5  # sec

View File

@ -83,6 +83,9 @@ class BaseReinforcementLearningModel(IFreqaiModel):
        if self.ft_params.get('use_DBSCAN_to_remove_outliers', False):
            self.ft_params.update({'use_DBSCAN_to_remove_outliers': False})
            logger.warning('User tried to use DBSCAN with RL. Deactivating DBSCAN.')

        if self.ft_params.get('DI_threshold', False):
            self.ft_params.update({'DI_threshold': False})
            logger.warning('User tried to use DI_threshold with RL. Deactivating DI_threshold.')

        if self.freqai_info['data_split_parameters'].get('shuffle', False):
            self.freqai_info['data_split_parameters'].update({'shuffle': False})
            logger.warning('User tried to shuffle training data. Setting shuffle to False')
@ -108,27 +111,37 @@ class BaseReinforcementLearningModel(IFreqaiModel):
            training_filter=True,
        )

        dd: Dict[str, Any] = dk.make_train_test_datasets(
            features_filtered, labels_filtered)
        self.df_raw = copy.deepcopy(dd["train_features"])
        dk.fit_labels()  # FIXME useless for now, but just satiating append methods

        # normalize all data based on train_dataset only
        prices_train, prices_test = self.build_ohlc_price_dataframes(dk.data_dictionary, pair, dk)

        dk.feature_pipeline = self.define_data_pipeline(threads=dk.thread_count)

        (dd["train_features"],
         dd["train_labels"],
         dd["train_weights"]) = dk.feature_pipeline.fit_transform(dd["train_features"],
                                                                  dd["train_labels"],
                                                                  dd["train_weights"])

        if self.freqai_info.get('data_split_parameters', {}).get('test_size', 0.1) != 0:
            (dd["test_features"],
             dd["test_labels"],
             dd["test_weights"]) = dk.feature_pipeline.transform(dd["test_features"],
                                                                 dd["test_labels"],
                                                                 dd["test_weights"])

        logger.info(
            f'Training model on {len(dk.data_dictionary["train_features"].columns)}'
            f' features and {len(dd["train_features"])} data points'
        )

        self.set_train_and_eval_environments(dd, prices_train, prices_test, dk)

        model = self.fit(dd, dk)

        logger.info(f"--------------------done training {pair}--------------------")
@ -239,13 +252,10 @@ class BaseReinforcementLearningModel(IFreqaiModel):
            unfiltered_df, dk.training_features_list, training_filter=False
        )

        dk.data_dictionary["prediction_features"] = self.drop_ohlc_from_df(filtered_dataframe, dk)

        dk.data_dictionary["prediction_features"], _, _ = dk.feature_pipeline.transform(
            dk.data_dictionary["prediction_features"], outlier_check=True)

        pred_df = self.rl_model_predict(
            dk.data_dictionary["prediction_features"], dk, self.model)

View File

@ -17,8 +17,8 @@ logger = logging.getLogger(__name__)
class BaseClassifierModel(IFreqaiModel):
    """
    Base class for regression type models (e.g. Catboost, LightGBM, XGboost etc.).
    User *must* inherit from this class and set fit(). See example scripts
    such as prediction_models/CatboostClassifier.py for guidance.
    """

    def train(
@ -50,21 +50,30 @@ class BaseClassifierModel(IFreqaiModel):
        logger.info(f"-------------------- Training on data from {start_date} to "
                    f"{end_date} --------------------")

        # split data into train/test data.
        dd = dk.make_train_test_datasets(features_filtered, labels_filtered)

        if not self.freqai_info.get("fit_live_predictions_candles", 0) or not self.live:
            dk.fit_labels()

        dk.feature_pipeline = self.define_data_pipeline(threads=dk.thread_count)

        (dd["train_features"],
         dd["train_labels"],
         dd["train_weights"]) = dk.feature_pipeline.fit_transform(dd["train_features"],
                                                                  dd["train_labels"],
                                                                  dd["train_weights"])

        if self.freqai_info.get('data_split_parameters', {}).get('test_size', 0.1) != 0:
            (dd["test_features"],
             dd["test_labels"],
             dd["test_weights"]) = dk.feature_pipeline.transform(dd["test_features"],
                                                                 dd["test_labels"],
                                                                 dd["test_weights"])

        logger.info(
            f"Training model on {len(dk.data_dictionary['train_features'].columns)} features"
        )
        logger.info(f"Training model on {len(dd['train_features'])} data points")

        model = self.fit(dd, dk)

        end_time = time()
@ -89,10 +98,11 @@ class BaseClassifierModel(IFreqaiModel):
        filtered_df, _ = dk.filter_features(
            unfiltered_df, dk.training_features_list, training_filter=False
        )
        dk.data_dictionary["prediction_features"] = filtered_df

        dk.data_dictionary["prediction_features"], outliers, _ = dk.feature_pipeline.transform(
            dk.data_dictionary["prediction_features"], outlier_check=True)

        predictions = self.model.predict(dk.data_dictionary["prediction_features"])
        if self.CONV_WIDTH == 1:
@ -107,4 +117,10 @@ class BaseClassifierModel(IFreqaiModel):
        pred_df = pd.concat([pred_df, pred_df_prob], axis=1)

        if dk.feature_pipeline["di"]:
            dk.DI_values = dk.feature_pipeline["di"].di_values
        else:
            dk.DI_values = np.zeros(len(outliers.index))

        dk.do_predict = outliers.to_numpy()

        return (pred_df, dk.do_predict)

View File

@ -1,5 +1,6 @@
import logging
from time import time
from typing import Any, Dict, List, Tuple

import numpy as np
import numpy.typing as npt
@ -35,6 +36,7 @@ class BasePyTorchClassifier(BasePyTorchModel):
        return dataframe
    """

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.class_name_to_index = None
@ -68,9 +70,12 @@ class BasePyTorchClassifier(BasePyTorchModel):
        filtered_df, _ = dk.filter_features(
            unfiltered_df, dk.training_features_list, training_filter=False
        )
        dk.data_dictionary["prediction_features"] = filtered_df

        dk.data_dictionary["prediction_features"], outliers, _ = dk.feature_pipeline.transform(
            dk.data_dictionary["prediction_features"], outlier_check=True)

        x = self.data_convertor.convert_x(
            dk.data_dictionary["prediction_features"],
            device=self.device
@ -85,6 +90,13 @@ class BasePyTorchClassifier(BasePyTorchModel):
        pred_df_prob = DataFrame(probs.detach().tolist(), columns=class_names)
        pred_df = DataFrame(predicted_classes_str, columns=[dk.label_list[0]])
        pred_df = pd.concat([pred_df, pred_df_prob], axis=1)

        if dk.feature_pipeline["di"]:
            dk.DI_values = dk.feature_pipeline["di"].di_values
        else:
            dk.DI_values = np.zeros(len(outliers.index))

        dk.do_predict = outliers.to_numpy()

        return (pred_df, dk.do_predict)
    def encode_class_names(
@ -149,3 +161,58 @@ class BasePyTorchClassifier(BasePyTorchModel):
        )
        return self.class_names

    def train(
        self, unfiltered_df: DataFrame, pair: str, dk: FreqaiDataKitchen, **kwargs
    ) -> Any:
        """
        Filter the training data and train a model to it. Train makes heavy use of the datakitchen
        for storing, saving, loading, and analyzing the data.
        :param unfiltered_df: Full dataframe for the current training period
        :return:
        :model: Trained model which can be used to inference (self.predict)
        """

        logger.info(f"-------------------- Starting training {pair} --------------------")

        start_time = time()

        features_filtered, labels_filtered = dk.filter_features(
            unfiltered_df,
            dk.training_features_list,
            dk.label_list,
            training_filter=True,
        )

        # split data into train/test data.
        dd = dk.make_train_test_datasets(features_filtered, labels_filtered)
        if not self.freqai_info.get("fit_live_predictions_candles", 0) or not self.live:
            dk.fit_labels()

        dk.feature_pipeline = self.define_data_pipeline(threads=dk.thread_count)

        (dd["train_features"],
         dd["train_labels"],
         dd["train_weights"]) = dk.feature_pipeline.fit_transform(dd["train_features"],
                                                                  dd["train_labels"],
                                                                  dd["train_weights"])

        if self.freqai_info.get('data_split_parameters', {}).get('test_size', 0.1) != 0:
            (dd["test_features"],
             dd["test_labels"],
             dd["test_weights"]) = dk.feature_pipeline.transform(dd["test_features"],
                                                                 dd["test_labels"],
                                                                 dd["test_weights"])

        logger.info(
            f"Training model on {len(dk.data_dictionary['train_features'].columns)} features"
        )
        logger.info(f"Training model on {len(dd['train_features'])} data points")

        model = self.fit(dd, dk)

        end_time = time()

        logger.info(f"-------------------- Done training {pair} "
                    f"({end_time - start_time:.2f} secs) --------------------")

        return model

View File

@ -1,12 +1,8 @@
import logging
from abc import ABC, abstractmethod

import torch

from freqtrade.freqai.freqai_interface import IFreqaiModel
from freqtrade.freqai.torch.PyTorchDataConvertor import PyTorchDataConvertor
@ -29,51 +25,6 @@ class BasePyTorchModel(IFreqaiModel, ABC):
        self.splits = ["train", "test"] if test_size != 0 else ["train"]
        self.window_size = self.freqai_info.get("conv_width", 1)
def train(
self, unfiltered_df: DataFrame, pair: str, dk: FreqaiDataKitchen, **kwargs
) -> Any:
"""
Filter the training data and train a model to it. Train makes heavy use of the datakitchen
for storing, saving, loading, and analyzing the data.
:param unfiltered_df: Full dataframe for the current training period
:return:
:model: Trained model which can be used to inference (self.predict)
"""
logger.info(f"-------------------- Starting training {pair} --------------------")
start_time = time()
features_filtered, labels_filtered = dk.filter_features(
unfiltered_df,
dk.training_features_list,
dk.label_list,
training_filter=True,
)
# split data into train/test data.
data_dictionary = dk.make_train_test_datasets(features_filtered, labels_filtered)
if not self.freqai_info.get("fit_live_predictions", 0) or not self.live:
dk.fit_labels()
# normalize all data based on train_dataset only
data_dictionary = dk.normalize_data(data_dictionary)
# optional additional data cleaning/analysis
self.data_cleaning_train(dk)
logger.info(
f"Training model on {len(dk.data_dictionary['train_features'].columns)} features"
)
logger.info(f"Training model on {len(data_dictionary['train_features'])} data points")
model = self.fit(data_dictionary, dk)
end_time = time()
logger.info(f"-------------------- Done training {pair} "
f"({end_time - start_time:.2f} secs) --------------------")
return model
    @property
    @abstractmethod
    def data_convertor(self) -> PyTorchDataConvertor:

View File

@ -1,5 +1,6 @@
import logging
from time import time
from typing import Any, Tuple

import numpy as np
import numpy.typing as npt
@ -17,6 +18,7 @@ class BasePyTorchRegressor(BasePyTorchModel):
    A PyTorch implementation of a regressor.
    User must implement fit method
    """

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
@ -36,10 +38,11 @@ class BasePyTorchRegressor(BasePyTorchModel):
        filtered_df, _ = dk.filter_features(
            unfiltered_df, dk.training_features_list, training_filter=False
        )
        dk.data_dictionary["prediction_features"] = filtered_df

        dk.data_dictionary["prediction_features"], outliers, _ = dk.feature_pipeline.transform(
            dk.data_dictionary["prediction_features"], outlier_check=True)

        x = self.data_convertor.convert_x(
            dk.data_dictionary["prediction_features"],
            device=self.device
@ -47,5 +50,71 @@ class BasePyTorchRegressor(BasePyTorchModel):
        self.model.model.eval()
        y = self.model.model(x)
        pred_df = DataFrame(y.detach().tolist(), columns=[dk.label_list[0]])
        pred_df, _, _ = dk.label_pipeline.inverse_transform(pred_df)

        if dk.feature_pipeline["di"]:
            dk.DI_values = dk.feature_pipeline["di"].di_values
        else:
            dk.DI_values = np.zeros(len(outliers.index))

        dk.do_predict = outliers.to_numpy()
        return (pred_df, dk.do_predict)

    def train(
        self, unfiltered_df: DataFrame, pair: str, dk: FreqaiDataKitchen, **kwargs
    ) -> Any:
        """
        Filter the training data and train a model to it. Train makes heavy use of the datakitchen
        for storing, saving, loading, and analyzing the data.
        :param unfiltered_df: Full dataframe for the current training period
        :return:
        :model: Trained model which can be used to inference (self.predict)
        """

        logger.info(f"-------------------- Starting training {pair} --------------------")

        start_time = time()

        features_filtered, labels_filtered = dk.filter_features(
            unfiltered_df,
            dk.training_features_list,
            dk.label_list,
            training_filter=True,
        )

        # split data into train/test data.
        dd = dk.make_train_test_datasets(features_filtered, labels_filtered)
        if not self.freqai_info.get("fit_live_predictions_candles", 0) or not self.live:
            dk.fit_labels()

        dk.feature_pipeline = self.define_data_pipeline(threads=dk.thread_count)
        dk.label_pipeline = self.define_label_pipeline(threads=dk.thread_count)

        (dd["train_features"],
         dd["train_labels"],
         dd["train_weights"]) = dk.feature_pipeline.fit_transform(dd["train_features"],
                                                                  dd["train_labels"],
                                                                  dd["train_weights"])
        dd["train_labels"], _, _ = dk.label_pipeline.fit_transform(dd["train_labels"])

        if self.freqai_info.get('data_split_parameters', {}).get('test_size', 0.1) != 0:
            (dd["test_features"],
             dd["test_labels"],
             dd["test_weights"]) = dk.feature_pipeline.transform(dd["test_features"],
                                                                 dd["test_labels"],
                                                                 dd["test_weights"])
            dd["test_labels"], _, _ = dk.label_pipeline.transform(dd["test_labels"])

        logger.info(
            f"Training model on {len(dk.data_dictionary['train_features'].columns)} features"
        )
        logger.info(f"Training model on {len(dd['train_features'])} data points")

        model = self.fit(dd, dk)

        end_time = time()

        logger.info(f"-------------------- Done training {pair} "
                    f"({end_time - start_time:.2f} secs) --------------------")

        return model

View File

@ -16,8 +16,8 @@ logger = logging.getLogger(__name__)
class BaseRegressionModel(IFreqaiModel):
    """
    Base class for regression type models (e.g. Catboost, LightGBM, XGboost etc.).
    User *must* inherit from this class and set fit(). See example scripts
    such as prediction_models/CatboostRegressor.py for guidance.
    """

    def train(
@ -49,21 +49,33 @@ class BaseRegressionModel(IFreqaiModel):
        logger.info(f"-------------------- Training on data from {start_date} to "
                    f"{end_date} --------------------")

        # split data into train/test data.
        dd = dk.make_train_test_datasets(features_filtered, labels_filtered)

        if not self.freqai_info.get("fit_live_predictions_candles", 0) or not self.live:
            dk.fit_labels()

        dk.feature_pipeline = self.define_data_pipeline(threads=dk.thread_count)
        dk.label_pipeline = self.define_label_pipeline(threads=dk.thread_count)

        (dd["train_features"],
         dd["train_labels"],
         dd["train_weights"]) = dk.feature_pipeline.fit_transform(dd["train_features"],
                                                                  dd["train_labels"],
                                                                  dd["train_weights"])
        dd["train_labels"], _, _ = dk.label_pipeline.fit_transform(dd["train_labels"])

        if self.freqai_info.get('data_split_parameters', {}).get('test_size', 0.1) != 0:
            (dd["test_features"],
             dd["test_labels"],
             dd["test_weights"]) = dk.feature_pipeline.transform(dd["test_features"],
                                                                 dd["test_labels"],
                                                                 dd["test_weights"])
            dd["test_labels"], _, _ = dk.label_pipeline.transform(dd["test_labels"])

        logger.info(
            f"Training model on {len(dk.data_dictionary['train_features'].columns)} features"
        )
        logger.info(f"Training model on {len(dd['train_features'])} data points")

        model = self.fit(dd, dk)

        end_time = time()
@ -85,14 +97,12 @@
        """

        dk.find_features(unfiltered_df)
        dk.data_dictionary["prediction_features"], _ = dk.filter_features(
            unfiltered_df, dk.training_features_list, training_filter=False
        )

        dk.data_dictionary["prediction_features"], outliers, _ = dk.feature_pipeline.transform(
            dk.data_dictionary["prediction_features"], outlier_check=True)

        predictions = self.model.predict(dk.data_dictionary["prediction_features"])
        if self.CONV_WIDTH == 1:
@ -100,6 +110,11 @@ class BaseRegressionModel(IFreqaiModel):
        pred_df = DataFrame(predictions, columns=dk.label_list)

        pred_df, _, _ = dk.label_pipeline.inverse_transform(pred_df)
        if dk.feature_pipeline["di"]:
            dk.DI_values = dk.feature_pipeline["di"].di_values
        else:
            dk.DI_values = np.zeros(len(outliers.index))

        dk.do_predict = outliers.to_numpy()

        return (pred_df, dk.do_predict)

View File

@ -1,70 +0,0 @@
import logging
from time import time
from typing import Any
from pandas import DataFrame
from freqtrade.freqai.data_kitchen import FreqaiDataKitchen
from freqtrade.freqai.freqai_interface import IFreqaiModel
logger = logging.getLogger(__name__)
class BaseTensorFlowModel(IFreqaiModel):
"""
Base class for TensorFlow type models.
User *must* inherit from this class and set fit() and predict().
"""
def train(
self, unfiltered_df: DataFrame, pair: str, dk: FreqaiDataKitchen, **kwargs
) -> Any:
"""
Filter the training data and train a model to it. Train makes heavy use of the datakitchen
for storing, saving, loading, and analyzing the data.
:param unfiltered_df: Full dataframe for the current training period
:param metadata: pair metadata from strategy.
:return:
:model: Trained model which can be used to inference (self.predict)
"""
logger.info(f"-------------------- Starting training {pair} --------------------")
start_time = time()
# filter the features requested by user in the configuration file and elegantly handle NaNs
features_filtered, labels_filtered = dk.filter_features(
unfiltered_df,
dk.training_features_list,
dk.label_list,
training_filter=True,
)
start_date = unfiltered_df["date"].iloc[0].strftime("%Y-%m-%d")
end_date = unfiltered_df["date"].iloc[-1].strftime("%Y-%m-%d")
logger.info(f"-------------------- Training on data from {start_date} to "
f"{end_date} --------------------")
# split data into train/test data.
data_dictionary = dk.make_train_test_datasets(features_filtered, labels_filtered)
if not self.freqai_info.get("fit_live_predictions_candles", 0) or not self.live:
dk.fit_labels()
# normalize all data based on train_dataset only
data_dictionary = dk.normalize_data(data_dictionary)
# optional additional data cleaning/analysis
self.data_cleaning_train(dk)
logger.info(
f"Training model on {len(dk.data_dictionary['train_features'].columns)} features"
)
logger.info(f"Training model on {len(data_dictionary['train_features'])} data points")
model = self.fit(data_dictionary, dk)
end_time = time()
logger.info(f"-------------------- Done training {pair} "
f"({end_time - start_time:.2f} secs) --------------------")
return model

View File

@ -28,6 +28,11 @@ from freqtrade.strategy.interface import IStrategy
logger = logging.getLogger(__name__)
FEATURE_PIPELINE = "feature_pipeline"
LABEL_PIPELINE = "label_pipeline"
TRAINDF = "trained_df"
METADATA = "metadata"
class pair_info(TypedDict):
    model_filename: str
@ -425,7 +430,7 @@ class FreqaiDataDrawer:
        dk.data["training_features_list"] = list(dk.data_dictionary["train_features"].columns)
        dk.data["label_list"] = dk.label_list

        with (save_path / f"{dk.model_filename}_{METADATA}.json").open("w") as fp:
            rapidjson.dump(dk.data, fp, default=self.np_encoder, number_mode=rapidjson.NM_NATIVE)

        return
@ -450,39 +455,39 @@ class FreqaiDataDrawer:
        elif self.model_type in ["stable_baselines3", "sb3_contrib", "pytorch"]:
            model.save(save_path / f"{dk.model_filename}_model.zip")

        dk.data["data_path"] = str(dk.data_path)
        dk.data["model_filename"] = str(dk.model_filename)
        dk.data["training_features_list"] = dk.training_features_list
        dk.data["label_list"] = dk.label_list
        # store the metadata
        with (save_path / f"{dk.model_filename}_{METADATA}.json").open("w") as fp:
            rapidjson.dump(dk.data, fp, default=self.np_encoder, number_mode=rapidjson.NM_NATIVE)

        # save the pipelines to pickle files
        with (save_path / f"{dk.model_filename}_{FEATURE_PIPELINE}.pkl").open("wb") as fp:
            cloudpickle.dump(dk.feature_pipeline, fp)

        with (save_path / f"{dk.model_filename}_{LABEL_PIPELINE}.pkl").open("wb") as fp:
            cloudpickle.dump(dk.label_pipeline, fp)

        # save the train data to file for post processing if desired
        dk.data_dictionary["train_features"].to_pickle(
            save_path / f"{dk.model_filename}_{TRAINDF}.pkl"
        )

        dk.data_dictionary["train_dates"].to_pickle(
            save_path / f"{dk.model_filename}_trained_dates_df.pkl"
        )

        self.model_dictionary[coin] = model
        self.pair_dict[coin]["model_filename"] = dk.model_filename
        self.pair_dict[coin]["data_path"] = str(dk.data_path)

        if coin not in self.meta_data_dictionary:
            self.meta_data_dictionary[coin] = {}

        self.meta_data_dictionary[coin][METADATA] = dk.data
        self.meta_data_dictionary[coin][FEATURE_PIPELINE] = dk.feature_pipeline
        self.meta_data_dictionary[coin][LABEL_PIPELINE] = dk.label_pipeline

        self.save_drawer_to_disk()

        return
@ -492,7 +497,7 @@ class FreqaiDataDrawer:
        Load only metadata into datakitchen to increase performance during
        presaved backtesting (prediction file loading).
        """
        with (dk.data_path / f"{dk.model_filename}_{METADATA}.json").open("r") as fp:
            dk.data = rapidjson.load(fp, number_mode=rapidjson.NM_NATIVE)
            dk.training_features_list = dk.data["training_features_list"]
            dk.label_list = dk.data["label_list"]
@ -512,15 +517,17 @@ class FreqaiDataDrawer:
        dk.data_path = Path(self.pair_dict[coin]["data_path"])

        if coin in self.meta_data_dictionary:
            dk.data = self.meta_data_dictionary[coin][METADATA]
            dk.feature_pipeline = self.meta_data_dictionary[coin][FEATURE_PIPELINE]
            dk.label_pipeline = self.meta_data_dictionary[coin][LABEL_PIPELINE]
        else:
            with (dk.data_path / f"{dk.model_filename}_{METADATA}.json").open("r") as fp:
                dk.data = rapidjson.load(fp, number_mode=rapidjson.NM_NATIVE)

            with (dk.data_path / f"{dk.model_filename}_{FEATURE_PIPELINE}.pkl").open("rb") as fp:
                dk.feature_pipeline = cloudpickle.load(fp)

            with (dk.data_path / f"{dk.model_filename}_{LABEL_PIPELINE}.pkl").open("rb") as fp:
                dk.label_pipeline = cloudpickle.load(fp)

        dk.training_features_list = dk.data["training_features_list"]
        dk.label_list = dk.data["label_list"]
@ -530,9 +537,6 @@ class FreqaiDataDrawer:
            model = self.model_dictionary[coin]
        elif self.model_type == 'joblib':
            model = load(dk.data_path / f"{dk.model_filename}_model.joblib")
        elif 'stable_baselines' in self.model_type or 'sb3_contrib' == self.model_type:
            mod = importlib.import_module(
                self.model_type, self.freqai_info['rl_config']['model_type'])
@ -544,9 +548,6 @@ class FreqaiDataDrawer:
            model = zip["pytrainer"]
            model = model.load_from_checkpoint(zip)

        if not model:
            raise OperationalException(
                f"Unable to load model, ensure model exists at " f"{dk.data_path} "
@ -556,11 +557,6 @@ class FreqaiDataDrawer:
        if coin not in self.model_dictionary:
            self.model_dictionary[coin] = model

        return model

    def update_historic_data(self, strategy: IStrategy, dk: FreqaiDataKitchen) -> None:

View File

@ -4,7 +4,6 @@ import logging
import random
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
@ -12,16 +11,12 @@ import numpy as np
import numpy.typing as npt
import pandas as pd
import psutil
from datasieve.pipeline import Pipeline
from pandas import DataFrame
from sklearn.model_selection import train_test_split

from freqtrade.configuration import TimeRange
from freqtrade.constants import DOCS_LINK, Config
from freqtrade.data.converter import reduce_dataframe_footprint
from freqtrade.exceptions import OperationalException
from freqtrade.exchange import timeframe_to_seconds
@ -81,11 +76,12 @@ class FreqaiDataKitchen:
        self.backtest_predictions_folder: str = "backtesting_predictions"
        self.live = live
        self.pair = pair
        self.keras: bool = self.freqai_config.get("keras", False)
        self.set_all_pairs()
        self.backtest_live_models = config.get("freqai_backtest_live_models", False)
        self.feature_pipeline = Pipeline()
        self.label_pipeline = Pipeline()
        self.DI_values: npt.NDArray = np.array([])

        if not self.live:
            self.full_path = self.get_full_models_path(self.config)
@ -227,13 +223,7 @@ class FreqaiDataKitchen:
        drop_index = pd.isnull(filtered_df).any(axis=1)  # get the rows that have NaNs,
        drop_index = drop_index.replace(True, 1).replace(False, 0)  # pep8 requirement.
        if (training_filter):
const_cols = list((filtered_df.nunique() == 1).loc[lambda x: x].index)
if const_cols:
filtered_df = filtered_df.filter(filtered_df.columns.difference(const_cols))
self.data['constant_features_list'] = const_cols
logger.warning(f"Removed features {const_cols} with constant values.")
else:
self.data['constant_features_list'] = []
            # we don't care about total row number (total no. datapoints) in training, we only care
            # about removing any row with NaNs
            # if labels has multiple columns (user wants to train multiple models), we detect here
@ -264,8 +254,7 @@ class FreqaiDataKitchen:
            self.data["filter_drop_index_training"] = drop_index

        else:
if 'constant_features_list' in self.data and len(self.data['constant_features_list']):
filtered_df = self.check_pred_labels(filtered_df)
            # we are backtesting so we need to preserve row number to send back to strategy,
            # so now we use do_predict to avoid any prediction based on a NaN
            drop_index = pd.isnull(filtered_df).any(axis=1)
@ -307,107 +296,6 @@ class FreqaiDataKitchen:
        return self.data_dictionary
def normalize_data(self, data_dictionary: Dict) -> Dict[Any, Any]:
"""
Normalize all data in the data_dictionary according to the training dataset
:param data_dictionary: dictionary containing the cleaned and
split training/test data/labels
:returns:
:data_dictionary: updated dictionary with standardized values.
"""
# standardize the data by training stats
train_max = data_dictionary["train_features"].max()
train_min = data_dictionary["train_features"].min()
data_dictionary["train_features"] = (
2 * (data_dictionary["train_features"] - train_min) / (train_max - train_min) - 1
)
data_dictionary["test_features"] = (
2 * (data_dictionary["test_features"] - train_min) / (train_max - train_min) - 1
)
for item in train_max.keys():
self.data[item + "_max"] = train_max[item]
self.data[item + "_min"] = train_min[item]
for item in data_dictionary["train_labels"].keys():
if data_dictionary["train_labels"][item].dtype == object:
continue
train_labels_max = data_dictionary["train_labels"][item].max()
train_labels_min = data_dictionary["train_labels"][item].min()
data_dictionary["train_labels"][item] = (
2
* (data_dictionary["train_labels"][item] - train_labels_min)
/ (train_labels_max - train_labels_min)
- 1
)
if self.freqai_config.get('data_split_parameters', {}).get('test_size', 0.1) != 0:
data_dictionary["test_labels"][item] = (
2
* (data_dictionary["test_labels"][item] - train_labels_min)
/ (train_labels_max - train_labels_min)
- 1
)
self.data[f"{item}_max"] = train_labels_max
self.data[f"{item}_min"] = train_labels_min
return data_dictionary
def normalize_single_dataframe(self, df: DataFrame) -> DataFrame:
train_max = df.max()
train_min = df.min()
df = (
2 * (df - train_min) / (train_max - train_min) - 1
)
for item in train_max.keys():
self.data[item + "_max"] = train_max[item]
self.data[item + "_min"] = train_min[item]
return df
def normalize_data_from_metadata(self, df: DataFrame) -> DataFrame:
"""
Normalize a set of data using the mean and standard deviation from
the associated training data.
:param df: Dataframe to be standardized
"""
train_max = [None] * len(df.keys())
train_min = [None] * len(df.keys())
for i, item in enumerate(df.keys()):
train_max[i] = self.data[f"{item}_max"]
train_min[i] = self.data[f"{item}_min"]
train_max_series = pd.Series(train_max, index=df.keys())
train_min_series = pd.Series(train_min, index=df.keys())
df = (
2 * (df - train_min_series) / (train_max_series - train_min_series) - 1
)
return df
def denormalize_labels_from_metadata(self, df: DataFrame) -> DataFrame:
"""
Denormalize a set of data using the min and max values stored from
the associated training data.
:param df: Dataframe of predictions to be denormalized
"""
for label in df.columns:
if df[label].dtype == object or label in self.unique_class_list:
continue
df[label] = (
(df[label] + 1)
* (self.data[f"{label}_max"] - self.data[f"{label}_min"])
/ 2
) + self.data[f"{label}_min"]
return df
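For reference, a minimal standalone sketch (not FreqAI's exact code; the toy frame and helper names are illustrative) of the shifted min-max scaling and its inverse that these removed helpers applied:

```python
# Illustrative only: map features to [-1, 1] with the training min/max,
# then map predictions back with the inverse transform.
import pandas as pd

def scale_to_train(df: pd.DataFrame, train_min: pd.Series, train_max: pd.Series) -> pd.DataFrame:
    # shifted min-max normalization to [-1, 1]
    return 2 * (df - train_min) / (train_max - train_min) - 1

def unscale_labels(df: pd.DataFrame, train_min: pd.Series, train_max: pd.Series) -> pd.DataFrame:
    # inverse of the transform above
    return (df + 1) * (train_max - train_min) / 2 + train_min

train = pd.DataFrame({"f1": [1.0, 2.0, 3.0], "f2": [10.0, 20.0, 30.0]})
tmin, tmax = train.min(), train.max()
scaled = scale_to_train(train, tmin, tmax)     # all values land in [-1, 1]
restored = unscale_labels(scaled, tmin, tmax)  # recovers the original frame
```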
def split_timerange(
self, tr: str, train_split: int = 28, bt_split: float = 7
) -> Tuple[list, list]:
@ -452,9 +340,7 @@ class FreqaiDataKitchen:
tr_training_list_timerange.append(copy.deepcopy(timerange_train))
# associated backtest period
timerange_backtest.startts = timerange_train.stopts
timerange_backtest.stopts = timerange_backtest.startts + int(bt_period)
if timerange_backtest.stopts > config_timerange.stopts:
@ -485,426 +371,6 @@ class FreqaiDataKitchen:
return df
def check_pred_labels(self, df_predictions: DataFrame) -> DataFrame:
"""
Check that prediction feature labels match training feature labels.
:param df_predictions: incoming predictions
"""
constant_labels = self.data['constant_features_list']
df_predictions = df_predictions.filter(
df_predictions.columns.difference(constant_labels)
)
logger.warning(
f"Removed {len(constant_labels)} features from prediction features, "
f"these were considered constant values during most recent training."
)
return df_predictions
def principal_component_analysis(self) -> None:
"""
Performs Principal Component Analysis on the data for dimensionality reduction
and outlier detection (see self.remove_outliers())
No parameters or returns, it acts on the data_dictionary held by the DataHandler.
"""
from sklearn.decomposition import PCA  # avoid importing if we don't need it
pca = PCA(0.999)
pca = pca.fit(self.data_dictionary["train_features"])
n_keep_components = pca.n_components_
self.data["n_kept_components"] = n_keep_components
n_components = self.data_dictionary["train_features"].shape[1]
logger.info("reduced feature dimension by %s", n_components - n_keep_components)
logger.info("explained variance %f", np.sum(pca.explained_variance_ratio_))
train_components = pca.transform(self.data_dictionary["train_features"])
self.data_dictionary["train_features"] = pd.DataFrame(
data=train_components,
columns=["PC" + str(i) for i in range(0, n_keep_components)],
index=self.data_dictionary["train_features"].index,
)
# normalising transformed training features
self.data_dictionary["train_features"] = self.normalize_single_dataframe(
self.data_dictionary["train_features"])
# keeping a copy of the non-transformed features so we can check for errors during
# model load from disk
self.data["training_features_list_raw"] = copy.deepcopy(self.training_features_list)
self.training_features_list = self.data_dictionary["train_features"].columns
if self.freqai_config.get('data_split_parameters', {}).get('test_size', 0.1) != 0:
test_components = pca.transform(self.data_dictionary["test_features"])
self.data_dictionary["test_features"] = pd.DataFrame(
data=test_components,
columns=["PC" + str(i) for i in range(0, n_keep_components)],
index=self.data_dictionary["test_features"].index,
)
# normalise transformed test features to transformed training features
self.data_dictionary["test_features"] = self.normalize_data_from_metadata(
self.data_dictionary["test_features"])
self.data["n_kept_components"] = n_keep_components
self.pca = pca
logger.info(f"PCA reduced total features from {n_components} to {n_keep_components}")
if not self.data_path.is_dir():
self.data_path.mkdir(parents=True, exist_ok=True)
return None
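As an aside, a small illustrative snippet (toy data, not project code) showing the scikit-learn behaviour this method relies on: passing a float to `PCA` keeps just enough components to explain that fraction of the variance.

```python
# Hedged illustration of PCA with a variance-ratio target of 0.999.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 20))          # stand-in for the training features
pca = PCA(0.999).fit(X)                 # keep components until >= 99.9% variance explained
print(pca.n_components_, pca.explained_variance_ratio_.sum())
```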
def pca_transform(self, filtered_dataframe: DataFrame) -> None:
"""
Use an existing pca transform to transform data into components
:param filtered_dataframe: DataFrame = the cleaned dataframe
"""
pca_components = self.pca.transform(filtered_dataframe)
self.data_dictionary["prediction_features"] = pd.DataFrame(
data=pca_components,
columns=["PC" + str(i) for i in range(0, self.data["n_kept_components"])],
index=filtered_dataframe.index,
)
# normalise transformed predictions to transformed training features
self.data_dictionary["prediction_features"] = self.normalize_data_from_metadata(
self.data_dictionary["prediction_features"])
def compute_distances(self) -> float:
"""
Compute distances between each training point and every other training
point. This metric defines the neighborhood of trained data and is used
for prediction confidence in the Dissimilarity Index
"""
# logger.info("computing average mean distance for all training points")
pairwise = pairwise_distances(
self.data_dictionary["train_features"], n_jobs=self.thread_count)
# remove the diagonal distances, which are self-distances of ~0
np.fill_diagonal(pairwise, np.NaN)
pairwise = pairwise.reshape(-1, 1)
avg_mean_dist = pairwise[~np.isnan(pairwise)].mean()
return avg_mean_dist
def get_outlier_percentage(self, dropped_pts: npt.NDArray) -> float:
"""
Check if more than X% of points were dropped during outlier detection.
"""
outlier_protection_pct = self.freqai_config["feature_parameters"].get(
"outlier_protection_percentage", 30)
outlier_pct = (dropped_pts.sum() / len(dropped_pts)) * 100
if outlier_pct >= outlier_protection_pct:
return outlier_pct
else:
return 0.0
def use_SVM_to_remove_outliers(self, predict: bool) -> None:
"""
Build or run inference with a Support Vector Machine to detect outliers
in training data and predictions
:param predict: bool = If true, run inference with an existing SVM model, else construct one
"""
if self.keras:
logger.warning(
"SVM outlier removal not currently supported for Keras based models. "
"Skipping user requested function."
)
if predict:
self.do_predict = np.ones(len(self.data_dictionary["prediction_features"]))
return
if predict:
if not self.svm_model:
logger.warning("No svm model available for outlier removal")
return
y_pred = self.svm_model.predict(self.data_dictionary["prediction_features"])
do_predict = np.where(y_pred == -1, 0, y_pred)
if (len(do_predict) - do_predict.sum()) > 0:
logger.info(f"SVM tossed {len(do_predict) - do_predict.sum()} predictions.")
self.do_predict += do_predict
self.do_predict -= 1
else:
# use SGDOneClassSVM to increase speed?
svm_params = self.freqai_config["feature_parameters"].get(
"svm_params", {"shuffle": False, "nu": 0.1})
self.svm_model = linear_model.SGDOneClassSVM(**svm_params).fit(
self.data_dictionary["train_features"]
)
y_pred = self.svm_model.predict(self.data_dictionary["train_features"])
kept_points = np.where(y_pred == -1, 0, y_pred)
# keep_index = np.where(y_pred == 1)
outlier_pct = self.get_outlier_percentage(1 - kept_points)
if outlier_pct:
logger.warning(
f"SVM detected {outlier_pct:.2f}% of the points as outliers. "
f"Keeping original dataset."
)
self.svm_model = None
return
self.data_dictionary["train_features"] = self.data_dictionary["train_features"][
(y_pred == 1)
]
self.data_dictionary["train_labels"] = self.data_dictionary["train_labels"][
(y_pred == 1)
]
self.data_dictionary["train_weights"] = self.data_dictionary["train_weights"][
(y_pred == 1)
]
logger.info(
f"SVM tossed {len(y_pred) - kept_points.sum()}"
f" train points from {len(y_pred)} total points."
)
# same for test data
# TODO: This (and the part above) could be refactored into a separate function
# to reduce code duplication
if self.freqai_config['data_split_parameters'].get('test_size', 0.1) != 0:
y_pred = self.svm_model.predict(self.data_dictionary["test_features"])
kept_points = np.where(y_pred == -1, 0, y_pred)
self.data_dictionary["test_features"] = self.data_dictionary["test_features"][
(y_pred == 1)
]
self.data_dictionary["test_labels"] = self.data_dictionary["test_labels"][(
y_pred == 1)]
self.data_dictionary["test_weights"] = self.data_dictionary["test_weights"][
(y_pred == 1)
]
logger.info(
f"{self.pair}: SVM tossed {len(y_pred) - kept_points.sum()}"
f" test points from {len(y_pred)} total points."
)
return
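An illustrative sketch (toy data; parameter values chosen only for the example) of the `SGDOneClassSVM` mechanics used above: fit on the training features, then treat predictions of `-1` as outliers.

```python
# Hedged sketch, not FreqAI internals: one-class SVM outlier flagging.
import numpy as np
from sklearn import linear_model

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))                       # stand-in training features
svm = linear_model.SGDOneClassSVM(shuffle=False, nu=0.1).fit(X_train)
y_pred = svm.predict(X_train)                             # +1 = inlier, -1 = outlier
kept = X_train[y_pred == 1]                               # rows kept for training
print(f"tossed {np.sum(y_pred == -1)} of {len(y_pred)} points")
```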
def use_DBSCAN_to_remove_outliers(self, predict: bool, eps=None) -> None:
"""
Use DBSCAN to cluster training data and remove "noisy" data (read outliers).
User controls this via the config param `DBSCAN_outlier_pct` which indicates the
pct of training data that they want to be considered outliers.
:param predict: bool = If False (training), iterate to find the best hyper parameters
to match user requested outlier percent target.
If True (prediction), use the parameters determined from
the previous training to estimate if the current prediction point
is an outlier.
"""
if predict:
if not self.data['DBSCAN_eps']:
return
train_ft_df = self.data_dictionary['train_features']
pred_ft_df = self.data_dictionary['prediction_features']
num_preds = len(pred_ft_df)
df = pd.concat([train_ft_df, pred_ft_df], axis=0, ignore_index=True)
clustering = DBSCAN(eps=self.data['DBSCAN_eps'],
min_samples=self.data['DBSCAN_min_samples'],
n_jobs=self.thread_count
).fit(df)
do_predict = np.where(clustering.labels_[-num_preds:] == -1, 0, 1)
if (len(do_predict) - do_predict.sum()) > 0:
logger.info(f"DBSCAN tossed {len(do_predict) - do_predict.sum()} predictions")
self.do_predict += do_predict
self.do_predict -= 1
else:
def normalise_distances(distances):
normalised_distances = (distances - distances.min()) / \
(distances.max() - distances.min())
return normalised_distances
def rotate_point(origin, point, angle):
# rotate a point counterclockwise by a given angle (in radians)
# around a given origin
x = origin[0] + cos(angle) * (point[0] - origin[0]) - \
sin(angle) * (point[1] - origin[1])
y = origin[1] + sin(angle) * (point[0] - origin[0]) + \
cos(angle) * (point[1] - origin[1])
return (x, y)
MinPts = int(len(self.data_dictionary['train_features'].index) * 0.25)
# measure pairwise distances to nearest neighbours
neighbors = NearestNeighbors(
n_neighbors=MinPts, n_jobs=self.thread_count)
neighbors_fit = neighbors.fit(self.data_dictionary['train_features'])
distances, _ = neighbors_fit.kneighbors(self.data_dictionary['train_features'])
distances = np.sort(distances, axis=0).mean(axis=1)
normalised_distances = normalise_distances(distances)
x_range = np.linspace(0, 1, len(distances))
line = np.linspace(normalised_distances[0],
normalised_distances[-1], len(normalised_distances))
deflection = np.abs(normalised_distances - line)
max_deflection_loc = np.where(deflection == deflection.max())[0][0]
origin = x_range[max_deflection_loc], line[max_deflection_loc]
point = x_range[max_deflection_loc], normalised_distances[max_deflection_loc]
rot_angle = np.pi / 4
elbow_loc = rotate_point(origin, point, rot_angle)
epsilon = elbow_loc[1] * (distances[-1] - distances[0]) + distances[0]
clustering = DBSCAN(eps=epsilon, min_samples=MinPts,
n_jobs=int(self.thread_count)).fit(
self.data_dictionary['train_features']
)
logger.info(f'DBSCAN found eps of {epsilon:.2f}.')
self.data['DBSCAN_eps'] = epsilon
self.data['DBSCAN_min_samples'] = MinPts
dropped_points = np.where(clustering.labels_ == -1, 1, 0)
outlier_pct = self.get_outlier_percentage(dropped_points)
if outlier_pct:
logger.warning(
f"DBSCAN detected {outlier_pct:.2f}% of the points as outliers. "
f"Keeping original dataset."
)
self.data['DBSCAN_eps'] = 0
return
self.data_dictionary['train_features'] = self.data_dictionary['train_features'][
(clustering.labels_ != -1)
]
self.data_dictionary["train_labels"] = self.data_dictionary["train_labels"][
(clustering.labels_ != -1)
]
self.data_dictionary["train_weights"] = self.data_dictionary["train_weights"][
(clustering.labels_ != -1)
]
logger.info(
f"DBSCAN tossed {dropped_points.sum()}"
f" train points from {len(clustering.labels_)}"
)
return
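A simplified, self-contained sketch (toy data and a hand-picked `eps`, unlike the elbow estimate above) of how DBSCAN's `-1` labels translate into dropped outliers:

```python
# Illustrative only: points labelled -1 by DBSCAN are treated as noise/outliers.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, size=(100, 2)), [[5.0, 5.0]]])  # one obvious outlier
labels = DBSCAN(eps=0.5, min_samples=5).fit(X).labels_
outliers = labels == -1
print(f"DBSCAN flagged {outliers.sum()} outlier(s) out of {len(X)} points")
X_kept = X[~outliers]                                             # rows kept for training
```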
def compute_inlier_metric(self, set_='train') -> None:
"""
Compute inlier metric from backwards distance distributions.
This metric defines how well features from a timepoint fit
into previous timepoints.
"""
def normalise(dataframe: DataFrame, key: str) -> DataFrame:
if set_ == 'train':
min_value = dataframe.min()
max_value = dataframe.max()
self.data[f'{key}_min'] = min_value
self.data[f'{key}_max'] = max_value
else:
min_value = self.data[f'{key}_min']
max_value = self.data[f'{key}_max']
return (dataframe - min_value) / (max_value - min_value)
no_prev_pts = self.freqai_config["feature_parameters"]["inlier_metric_window"]
if set_ == 'train':
compute_df = copy.deepcopy(self.data_dictionary['train_features'])
elif set_ == 'test':
compute_df = copy.deepcopy(self.data_dictionary['test_features'])
else:
compute_df = copy.deepcopy(self.data_dictionary['prediction_features'])
compute_df_reindexed = compute_df.reindex(
index=np.flip(compute_df.index)
)
pairwise = pd.DataFrame(
np.triu(
pairwise_distances(compute_df_reindexed, n_jobs=self.thread_count)
),
columns=compute_df_reindexed.index,
index=compute_df_reindexed.index
)
pairwise = pairwise.round(5)
column_labels = [
'{}{}'.format('d', i) for i in range(1, no_prev_pts + 1)
]
distances = pd.DataFrame(
columns=column_labels, index=compute_df.index
)
for index in compute_df.index[no_prev_pts:]:
current_row = pairwise.loc[[index]]
current_row_no_zeros = current_row.loc[
:, (current_row != 0).any(axis=0)
]
distances.loc[[index]] = current_row_no_zeros.iloc[
:, :no_prev_pts
]
distances = distances.replace([np.inf, -np.inf], np.nan)
drop_index = pd.isnull(distances).any(axis=1)
distances = distances[drop_index == 0]
inliers = pd.DataFrame(index=distances.index)
for key in distances.keys():
current_distances = distances[key].dropna()
current_distances = normalise(current_distances, key)
if set_ == 'train':
fit_params = stats.weibull_min.fit(current_distances)
self.data[f'{key}_fit_params'] = fit_params
else:
fit_params = self.data[f'{key}_fit_params']
quantiles = stats.weibull_min.cdf(current_distances, *fit_params)
df_inlier = pd.DataFrame(
{key: quantiles}, index=distances.index
)
inliers = pd.concat(
[inliers, df_inlier], axis=1
)
inlier_metric = pd.DataFrame(
data=inliers.sum(axis=1) / no_prev_pts,
columns=['%-inlier_metric'],
index=compute_df.index
)
inlier_metric = (2 * (inlier_metric - inlier_metric.min()) /
(inlier_metric.max() - inlier_metric.min()) - 1)
if set_ in ('train', 'test'):
inlier_metric = inlier_metric.iloc[no_prev_pts:]
compute_df = compute_df.iloc[no_prev_pts:]
self.remove_beginning_points_from_data_dict(set_, no_prev_pts)
self.data_dictionary[f'{set_}_features'] = pd.concat(
[compute_df, inlier_metric], axis=1)
else:
self.data_dictionary['prediction_features'] = pd.concat(
[compute_df, inlier_metric], axis=1)
self.data_dictionary['prediction_features'].fillna(0, inplace=True)
logger.info('Inlier metric computed and added to features.')
return None
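A minimal sketch (illustrative distances, not FreqAI internals) of the Weibull step used here: fit `weibull_min` to a distance distribution and read each point's quantile from the fitted CDF.

```python
# Hedged example of the Weibull-quantile idea behind the inlier metric.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
distances = rng.weibull(a=1.5, size=500)            # stand-in lookback distances
params = stats.weibull_min.fit(distances)           # (shape, loc, scale)
quantiles = stats.weibull_min.cdf(distances, *params)
print(quantiles.min(), quantiles.max())             # all quantiles lie in [0, 1]
```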
def remove_beginning_points_from_data_dict(self, set_='train', no_prev_pts: int = 10):
features = self.data_dictionary[f'{set_}_features']
weights = self.data_dictionary[f'{set_}_weights']
labels = self.data_dictionary[f'{set_}_labels']
self.data_dictionary[f'{set_}_weights'] = weights[no_prev_pts:]
self.data_dictionary[f'{set_}_features'] = features.iloc[no_prev_pts:]
self.data_dictionary[f'{set_}_labels'] = labels.iloc[no_prev_pts:]
def add_noise_to_training_features(self) -> None:
"""
Add noise to train features to reduce the risk of overfitting.
"""
mu = 0 # no shift
sigma = self.freqai_config["feature_parameters"]["noise_standard_deviation"]
compute_df = self.data_dictionary['train_features']
noise = np.random.normal(mu, sigma, [compute_df.shape[0], compute_df.shape[1]])
self.data_dictionary['train_features'] += noise
return
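A toy illustration (hypothetical frame and sigma) of the noise injection above: zero-mean Gaussian noise with the configured standard deviation, added element-wise to the training features.

```python
# Illustrative only: additive Gaussian noise on a feature frame.
import numpy as np
import pandas as pd

features = pd.DataFrame(np.ones((4, 3)), columns=["a", "b", "c"])
sigma = 0.05                                        # example value for noise_standard_deviation
noise = np.random.normal(0, sigma, features.shape)
noisy_features = features + noise                   # same shape, slightly perturbed values
```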
def find_features(self, dataframe: DataFrame) -> None:
"""
Find features in the strategy provided dataframe
@ -925,37 +391,6 @@ class FreqaiDataKitchen:
labels = [c for c in column_names if "&" in c]
self.label_list = labels
def check_if_pred_in_training_spaces(self) -> None:
"""
Compares the distance from each prediction point to each training data
point. It uses this information to estimate a Dissimilarity Index (DI)
and avoid making predictions on any points that are too far away
from the training data set.
"""
distance = pairwise_distances(
self.data_dictionary["train_features"],
self.data_dictionary["prediction_features"],
n_jobs=self.thread_count,
)
self.DI_values = distance.min(axis=0) / self.data["avg_mean_dist"]
do_predict = np.where(
self.DI_values < self.freqai_config["feature_parameters"]["DI_threshold"],
1,
0,
)
if (len(do_predict) - do_predict.sum()) > 0:
logger.info(
f"{self.pair}: DI tossed {len(do_predict) - do_predict.sum()} predictions for "
"being too far from training data."
)
self.do_predict += do_predict
self.do_predict -= 1
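For clarity, a hedged sketch of the Dissimilarity Index idea (toy data; the threshold value is only an example): the minimum distance from each prediction point to the training set, divided by the average pairwise training distance, compared against `DI_threshold`.

```python
# Illustrative only, not FreqAI internals.
import numpy as np
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(3)
train = rng.normal(size=(300, 4))
preds = rng.normal(size=(10, 4))

pairwise = pairwise_distances(train)
np.fill_diagonal(pairwise, np.nan)                  # drop self-distances
avg_mean_dist = np.nanmean(pairwise)

di_values = pairwise_distances(train, preds).min(axis=0) / avg_mean_dist
DI_threshold = 1.0                                  # example threshold only
do_predict = (di_values < DI_threshold).astype(int) # 0 = too far from training data
```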
def set_weights_higher_recent(self, num_weights: int) -> npt.ArrayLike:
"""
Set weights so that recent data is more heavily weighted during
@ -1325,9 +760,9 @@ class FreqaiDataKitchen:
" which was deprecated on March 1, 2023. Please refer " " which was deprecated on March 1, 2023. Please refer "
"to the strategy migration guide to use the new " "to the strategy migration guide to use the new "
"feature_engineering_* methods: \n" "feature_engineering_* methods: \n"
"https://www.freqtrade.io/en/stable/strategy_migration/#freqai-strategy \n" f"{DOCS_LINK}/strategy_migration/#freqai-strategy \n"
"And the feature_engineering_* documentation: \n" "And the feature_engineering_* documentation: \n"
"https://www.freqtrade.io/en/latest/freqai-feature-engineering/" f"{DOCS_LINK}/freqai-feature-engineering/"
) )
tfs: List[str] = self.freqai_config["feature_parameters"].get("include_timeframes") tfs: List[str] = self.freqai_config["feature_parameters"].get("include_timeframes")
@ -1515,3 +950,32 @@ class FreqaiDataKitchen:
timerange.startts += buffer * timeframe_to_seconds(self.config["timeframe"])
return timerange
# deprecated functions
def normalize_data(self, data_dictionary: Dict) -> Dict[Any, Any]:
"""
Deprecation warning, migration assistance
"""
logger.warning(f"Your custom IFreqaiModel relies on the deprecated"
" data pipeline. Please update your model to use the new data pipeline."
" This can be achieved by following the migration guide at "
f"{DOCS_LINK}/strategy_migration/#freqai-new-data-pipeline "
"We added a basic pipeline for you, but this will be removed "
"in a future version.")
return data_dictionary
def denormalize_labels_from_metadata(self, df: DataFrame) -> DataFrame:
"""
Deprecation warning, migration assistance
"""
logger.warning(f"Your custom IFreqaiModel relies on the deprecated"
" data pipeline. Please update your model to use the new data pipeline."
" This can be achieved by following the migration guide at "
f"{DOCS_LINK}/strategy_migration/#freqai-new-data-pipeline "
"We added a basic pipeline for you, but this will be removed "
"in a future version.")
pred_df, _, _ = self.label_pipeline.inverse_transform(df)
return pred_df

View File

@ -7,14 +7,18 @@ from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List, Literal, Optional, Tuple
import datasieve.transforms as ds
import numpy as np
import pandas as pd
import psutil
from datasieve.pipeline import Pipeline
from datasieve.transforms import SKLearnWrapper
from numpy.typing import NDArray
from pandas import DataFrame
from sklearn.preprocessing import MinMaxScaler
from freqtrade.configuration import TimeRange
from freqtrade.constants import Config
from freqtrade.constants import DOCS_LINK, Config
from freqtrade.data.dataprovider import DataProvider
from freqtrade.enums import RunMode
from freqtrade.exceptions import OperationalException
@ -503,68 +507,43 @@ class IFreqaiModel(ABC):
"feature_engineering_* functions" "feature_engineering_* functions"
) )
def data_cleaning_train(self, dk: FreqaiDataKitchen) -> None:
"""
Base data cleaning method for train.
Functions here improve/modify the input data by identifying outliers,
computing additional metrics, adding noise, reducing dimensionality etc.
"""
ft_params = self.freqai_info["feature_parameters"]
if ft_params.get('inlier_metric_window', 0):
dk.compute_inlier_metric(set_='train')
if self.freqai_info["data_split_parameters"]["test_size"] > 0:
dk.compute_inlier_metric(set_='test')
if ft_params.get(
"principal_component_analysis", False
):
dk.principal_component_analysis()
if ft_params.get("use_SVM_to_remove_outliers", False):
dk.use_SVM_to_remove_outliers(predict=False)
if ft_params.get("DI_threshold", 0):
dk.data["avg_mean_dist"] = dk.compute_distances()
if ft_params.get("use_DBSCAN_to_remove_outliers", False):
if dk.pair in self.dd.old_DBSCAN_eps:
eps = self.dd.old_DBSCAN_eps[dk.pair]
else:
eps = None
dk.use_DBSCAN_to_remove_outliers(predict=False, eps=eps)
self.dd.old_DBSCAN_eps[dk.pair] = dk.data['DBSCAN_eps']
if self.freqai_info["feature_parameters"].get('noise_standard_deviation', 0):
dk.add_noise_to_training_features()
def data_cleaning_predict(self, dk: FreqaiDataKitchen) -> None:
"""
Base data cleaning method for predict.
Functions here are complementary to the functions of data_cleaning_train.
"""
ft_params = self.freqai_info["feature_parameters"]
# ensure user is feeding the correct indicators to the model
self.check_if_feature_list_matches_strategy(dk)
if ft_params.get('inlier_metric_window', 0):
dk.compute_inlier_metric(set_='predict')
if ft_params.get(
"principal_component_analysis", False
):
dk.pca_transform(dk.data_dictionary['prediction_features'])
if ft_params.get("use_SVM_to_remove_outliers", False):
dk.use_SVM_to_remove_outliers(predict=True)
if ft_params.get("DI_threshold", 0):
dk.check_if_pred_in_training_spaces()
if ft_params.get("use_DBSCAN_to_remove_outliers", False):
dk.use_DBSCAN_to_remove_outliers(predict=True)
def define_data_pipeline(self, threads=-1) -> Pipeline:
ft_params = self.freqai_info["feature_parameters"]
pipe_steps = [
('const', ds.VarianceThreshold(threshold=0)),
('scaler', SKLearnWrapper(MinMaxScaler(feature_range=(-1, 1))))
]
if ft_params.get("principal_component_analysis", False):
pipe_steps.append(('pca', ds.PCA()))
pipe_steps.append(('post-pca-scaler',
SKLearnWrapper(MinMaxScaler(feature_range=(-1, 1)))))
if ft_params.get("use_SVM_to_remove_outliers", False):
svm_params = ft_params.get(
"svm_params", {"shuffle": False, "nu": 0.01})
pipe_steps.append(('svm', ds.SVMOutlierExtractor(**svm_params)))
di = ft_params.get("DI_threshold", 0)
if di:
pipe_steps.append(('di', ds.DissimilarityIndex(di_threshold=di, n_jobs=threads)))
if ft_params.get("use_DBSCAN_to_remove_outliers", False):
pipe_steps.append(('dbscan', ds.DBSCAN(n_jobs=threads)))
sigma = self.freqai_info["feature_parameters"].get('noise_standard_deviation', 0)
if sigma:
pipe_steps.append(('noise', ds.Noise(sigma=sigma)))
return Pipeline(pipe_steps)
def define_label_pipeline(self, threads=-1) -> Pipeline:
label_pipeline = Pipeline([
('scaler', SKLearnWrapper(MinMaxScaler(feature_range=(-1, 1))))
])
return label_pipeline
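A hedged usage sketch of the new hooks: a custom model could override `define_data_pipeline()` and return its own datasieve `Pipeline`. The parent class (`XGBoostRegressor`) and the specific steps chosen below are illustrative assumptions, not part of this commit.

```python
# Illustrative only: overriding the new pipeline hook in a custom FreqAI model.
import datasieve.transforms as ds
from datasieve.pipeline import Pipeline
from datasieve.transforms import SKLearnWrapper
from sklearn.preprocessing import MinMaxScaler

# assumed import path for an existing prediction model
from freqtrade.freqai.prediction_models.XGBoostRegressor import XGBoostRegressor


class MyFreqaiModel(XGBoostRegressor):
    def define_data_pipeline(self, threads=-1) -> Pipeline:
        # drop constant features, scale to [-1, 1], then gate predictions on a
        # Dissimilarity Index (threshold chosen purely for illustration)
        return Pipeline([
            ('const', ds.VarianceThreshold(threshold=0)),
            ('scaler', SKLearnWrapper(MinMaxScaler(feature_range=(-1, 1)))),
            ('di', ds.DissimilarityIndex(di_threshold=1.0, n_jobs=threads)),
        ])
```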
def model_exists(self, dk: FreqaiDataKitchen) -> bool:
"""
@ -576,8 +555,6 @@ class IFreqaiModel(ABC):
"""
if self.dd.model_type == 'joblib':
file_type = ".joblib"
elif self.dd.model_type == 'keras':
file_type = ".h5"
elif self.dd.model_type in ["stable_baselines3", "sb3_contrib", "pytorch"]:
file_type = ".zip"
@ -701,7 +678,7 @@ class IFreqaiModel(ABC):
# # for keras type models, the conv_window needs to be prepended so
# # viewing is correct in frequi
if self.freqai_info.get('keras', False) or self.ft_params.get('inlier_metric_window', 0):
if self.ft_params.get('inlier_metric_window', 0):
n_lost_points = self.freqai_info.get('conv_width', 2)
zeros_df = DataFrame(np.zeros((n_lost_points, len(hist_preds_df.columns))),
columns=hist_preds_df.columns)
@ -991,3 +968,50 @@ class IFreqaiModel(ABC):
:do_predict: np.array of 1s and 0s to indicate places where freqai needed to remove
data (NaNs) or felt uncertain about data (i.e. SVM and/or DI index)
"""
# deprecated functions
def data_cleaning_train(self, dk: FreqaiDataKitchen, pair: str):
"""
throw deprecation warning if this function is called
"""
logger.warning(f"Your model {self.__class__.__name__} relies on the deprecated"
" data pipeline. Please update your model to use the new data pipeline."
" This can be achieved by following the migration guide at "
f"{DOCS_LINK}/strategy_migration/#freqai-new-data-pipeline")
dk.feature_pipeline = self.define_data_pipeline(threads=dk.thread_count)
dd = dk.data_dictionary
(dd["train_features"],
dd["train_labels"],
dd["train_weights"]) = dk.feature_pipeline.fit_transform(dd["train_features"],
dd["train_labels"],
dd["train_weights"])
(dd["test_features"],
dd["test_labels"],
dd["test_weights"]) = dk.feature_pipeline.transform(dd["test_features"],
dd["test_labels"],
dd["test_weights"])
dk.label_pipeline = self.define_label_pipeline(threads=dk.thread_count)
dd["train_labels"], _, _ = dk.label_pipeline.fit_transform(dd["train_labels"])
dd["test_labels"], _, _ = dk.label_pipeline.transform(dd["test_labels"])
return
def data_cleaning_predict(self, dk: FreqaiDataKitchen, pair: str):
"""
throw deprecation warning if this function is called
"""
logger.warning(f"Your model {self.__class__.__name__} relies on the deprecated"
" data pipeline. Please update your model to use the new data pipeline."
" This can be achieved by following the migration guide at "
f"{DOCS_LINK}/strategy_migration/#freqai-new-data-pipeline")
dd = dk.data_dictionary
dd["predict_features"], outliers, _ = dk.feature_pipeline.transform(
dd["predict_features"], outlier_check=True)
if self.freqai_info.get("DI_threshold", 0) > 0:
dk.DI_values = dk.feature_pipeline["di"].di_values
else:
dk.DI_values = np.zeros(len(outliers.index))
dk.do_predict = outliers.to_numpy()
return

View File

@ -103,13 +103,13 @@ class PyTorchTransformerRegressor(BasePyTorchRegressor):
""" """
dk.find_features(unfiltered_df) dk.find_features(unfiltered_df)
filtered_df, _ = dk.filter_features( dk.data_dictionary["prediction_features"], _ = dk.filter_features(
unfiltered_df, dk.training_features_list, training_filter=False unfiltered_df, dk.training_features_list, training_filter=False
) )
filtered_df = dk.normalize_data_from_metadata(filtered_df)
dk.data_dictionary["prediction_features"] = filtered_df
self.data_cleaning_predict(dk) dk.data_dictionary["prediction_features"], outliers, _ = dk.feature_pipeline.transform(
dk.data_dictionary["prediction_features"], outlier_check=True)
x = self.data_convertor.convert_x( x = self.data_convertor.convert_x(
dk.data_dictionary["prediction_features"], dk.data_dictionary["prediction_features"],
device=self.device device=self.device
@ -131,7 +131,13 @@ class PyTorchTransformerRegressor(BasePyTorchRegressor):
yb = yb.cpu().squeeze()
pred_df = pd.DataFrame(yb.detach().numpy(), columns=dk.label_list)
pred_df = dk.denormalize_labels_from_metadata(pred_df)
pred_df, _, _ = dk.label_pipeline.inverse_transform(pred_df)
if self.freqai_info.get("DI_threshold", 0) > 0:
dk.DI_values = dk.feature_pipeline["di"].di_values
else:
dk.DI_values = np.zeros(len(outliers.index))
dk.do_predict = outliers.to_numpy()
if x.shape[1] > 1:
zeros_df = pd.DataFrame(np.zeros((x.shape[1] - len(pred_df), len(pred_df.columns))),

View File

@ -5,6 +5,7 @@ from xgboost import XGBRFRegressor
from freqtrade.freqai.base_models.BaseRegressionModel import BaseRegressionModel
from freqtrade.freqai.data_kitchen import FreqaiDataKitchen
from freqtrade.freqai.tensorboard import TBCallback
logger = logging.getLogger(__name__)
@ -44,7 +45,10 @@ class XGBoostRFRegressor(BaseRegressionModel):
model = XGBRFRegressor(**self.model_training_parameters)
model.set_params(callbacks=[TBCallback(dk.data_path)], activate=self.activate_tensorboard)
model.fit(X=X, y=y, sample_weight=sample_weight, eval_set=eval_set,
sample_weight_eval_set=eval_weights, xgb_model=xgb_model)
# set the callbacks to empty so that we can serialize to disk later
model.set_params(callbacks=[])
return model

View File

@ -34,7 +34,7 @@ class FreqaiModelResolver(IResolver):
Load the custom class from config parameter
:param config: configuration dictionary
"""
disallowed_models = ["BaseRegressionModel", "BaseTensorFlowModel"]
disallowed_models = ["BaseRegressionModel"]
freqaimodel_name = config.get("freqaimodel")
if not freqaimodel_name:

View File

@ -3,7 +3,7 @@ import logging
from packaging import version
from sqlalchemy import select
from freqtrade.constants import Config
from freqtrade.constants import DOCS_LINK, Config
from freqtrade.enums.tradingmode import TradingMode
from freqtrade.exceptions import OperationalException
from freqtrade.persistence.pairlock import PairLock
@ -25,7 +25,7 @@ def migrate_binance_futures_names(config: Config):
if version.parse("2.6.26") > version.parse(ccxt.__version__): if version.parse("2.6.26") > version.parse(ccxt.__version__):
raise OperationalException( raise OperationalException(
"Please follow the update instructions in the docs " "Please follow the update instructions in the docs "
"(https://www.freqtrade.io/en/latest/updating/) to install a compatible ccxt version.") f"({DOCS_LINK}/updating/) to install a compatible ccxt version.")
_migrate_binance_futures_db(config) _migrate_binance_futures_db(config)
migrate_binance_futures_data(config) migrate_binance_futures_data(config)

View File

@ -9,3 +9,4 @@ catboost==1.2; 'arm' not in platform_machine
lightgbm==3.3.5
xgboost==1.7.5
tensorboard==2.13.0
datasieve==0.1.5

View File

@ -16,7 +16,8 @@ freqai = [
'catboost; platform_machine != "aarch64"',
'lightgbm',
'xgboost',
'tensorboard'
'tensorboard',
'datasieve>=0.1.5'
]
freqai_rl = [

View File

@ -9,9 +9,9 @@ from freqtrade.configuration import TimeRange
from freqtrade.data.dataprovider import DataProvider
from freqtrade.exceptions import OperationalException
from freqtrade.freqai.data_kitchen import FreqaiDataKitchen
from tests.conftest import get_patched_exchange, log_has_re
from tests.conftest import get_patched_exchange
from tests.freqai.conftest import (get_patched_data_kitchen, get_patched_freqai_strategy,
make_data_dictionary, make_unfiltered_dataframe)
make_unfiltered_dataframe)
from tests.freqai.test_freqai_interface import is_mac
@ -72,68 +72,6 @@ def test_check_if_model_expired(mocker, freqai_conf):
shutil.rmtree(Path(dk.full_path))
def test_use_DBSCAN_to_remove_outliers(mocker, freqai_conf, caplog):
freqai = make_data_dictionary(mocker, freqai_conf)
# freqai_conf['freqai']['feature_parameters'].update({"outlier_protection_percentage": 1})
freqai.dk.use_DBSCAN_to_remove_outliers(predict=False)
assert log_has_re(r"DBSCAN found eps of 1\.7\d\.", caplog)
def test_compute_distances(mocker, freqai_conf):
freqai = make_data_dictionary(mocker, freqai_conf)
freqai_conf['freqai']['feature_parameters'].update({"DI_threshold": 1})
avg_mean_dist = freqai.dk.compute_distances()
assert round(avg_mean_dist, 2) == 1.98
def test_use_SVM_to_remove_outliers_and_outlier_protection(mocker, freqai_conf, caplog):
freqai = make_data_dictionary(mocker, freqai_conf)
freqai_conf['freqai']['feature_parameters'].update({"outlier_protection_percentage": 0.1})
freqai.dk.use_SVM_to_remove_outliers(predict=False)
assert log_has_re(
"SVM detected 7.83%",
caplog,
)
def test_compute_inlier_metric(mocker, freqai_conf, caplog):
freqai = make_data_dictionary(mocker, freqai_conf)
freqai_conf['freqai']['feature_parameters'].update({"inlier_metric_window": 10})
freqai.dk.compute_inlier_metric(set_='train')
assert log_has_re(
"Inlier metric computed and added to features.",
caplog,
)
def test_add_noise_to_training_features(mocker, freqai_conf):
freqai = make_data_dictionary(mocker, freqai_conf)
freqai_conf['freqai']['feature_parameters'].update({"noise_standard_deviation": 0.1})
freqai.dk.add_noise_to_training_features()
def test_remove_beginning_points_from_data_dict(mocker, freqai_conf):
freqai = make_data_dictionary(mocker, freqai_conf)
freqai.dk.remove_beginning_points_from_data_dict(set_='train')
def test_principal_component_analysis(mocker, freqai_conf, caplog):
freqai = make_data_dictionary(mocker, freqai_conf)
freqai.dk.principal_component_analysis()
assert log_has_re(
"reduced feature dimension by",
caplog,
)
def test_normalize_data(mocker, freqai_conf):
freqai = make_data_dictionary(mocker, freqai_conf)
data_dict = freqai.dk.data_dictionary
freqai.dk.normalize_data(data_dict)
assert any('_max' in entry for entry in freqai.dk.data.keys())
assert any('_min' in entry for entry in freqai.dk.data.keys())
def test_filter_features(mocker, freqai_conf):
freqai, unfiltered_dataframe = make_unfiltered_dataframe(mocker, freqai_conf)
freqai.dk.find_features(unfiltered_dataframe)

View File

@ -38,21 +38,22 @@ def can_run_model(model: str) -> None:
pytest.skip("Reinforcement learning / PyTorch module not available on intel based Mac OS.")
@pytest.mark.parametrize('model, pca, dbscan, float32, can_short, shuffle, buffer', [
('LightGBMRegressor', True, False, True, True, False, 0),
('XGBoostRegressor', False, True, False, True, False, 10),
('XGBoostRFRegressor', False, False, False, True, False, 0),
('CatboostRegressor', False, False, False, True, True, 0),
('PyTorchMLPRegressor', False, False, False, False, False, 0),
('PyTorchTransformerRegressor', False, False, False, False, False, 0),
('ReinforcementLearner', False, True, False, True, False, 0),
('ReinforcementLearner_multiproc', False, False, False, True, False, 0),
('ReinforcementLearner_test_3ac', False, False, False, False, False, 0),
('ReinforcementLearner_test_3ac', False, False, False, True, False, 0),
('ReinforcementLearner_test_4ac', False, False, False, True, False, 0),
])
def test_extract_data_and_train_model_Standard(mocker, freqai_conf, model, pca,
dbscan, float32, can_short, shuffle, buffer):
@pytest.mark.parametrize('model, pca, dbscan, float32, can_short, shuffle, buffer, noise', [
('LightGBMRegressor', True, False, True, True, False, 0, 0),
('XGBoostRegressor', False, True, False, True, False, 10, 0.05),
('XGBoostRFRegressor', False, False, False, True, False, 0, 0),
('CatboostRegressor', False, False, False, True, True, 0, 0),
('PyTorchMLPRegressor', False, False, False, False, False, 0, 0),
('PyTorchTransformerRegressor', False, False, False, False, False, 0, 0),
('ReinforcementLearner', False, True, False, True, False, 0, 0),
('ReinforcementLearner_multiproc', False, False, False, True, False, 0, 0),
('ReinforcementLearner_test_3ac', False, False, False, False, False, 0, 0),
('ReinforcementLearner_test_3ac', False, False, False, True, False, 0, 0),
('ReinforcementLearner_test_4ac', False, False, False, True, False, 0, 0),
])
def test_extract_data_and_train_model_Standard(mocker, freqai_conf, model, pca,
dbscan, float32, can_short, shuffle,
buffer, noise):
can_run_model(model)
@ -69,12 +70,14 @@ def test_extract_data_and_train_model_Standard(mocker, freqai_conf, model, pca,
freqai_conf.update({"reduce_df_footprint": float32}) freqai_conf.update({"reduce_df_footprint": float32})
freqai_conf['freqai']['feature_parameters'].update({"shuffle_after_split": shuffle}) freqai_conf['freqai']['feature_parameters'].update({"shuffle_after_split": shuffle})
freqai_conf['freqai']['feature_parameters'].update({"buffer_train_data_candles": buffer}) freqai_conf['freqai']['feature_parameters'].update({"buffer_train_data_candles": buffer})
freqai_conf['freqai']['feature_parameters'].update({"noise_standard_deviation": noise})
if 'ReinforcementLearner' in model: if 'ReinforcementLearner' in model:
model_save_ext = 'zip' model_save_ext = 'zip'
freqai_conf = make_rl_config(freqai_conf) freqai_conf = make_rl_config(freqai_conf)
# test the RL guardrails # test the RL guardrails
freqai_conf['freqai']['feature_parameters'].update({"use_SVM_to_remove_outliers": True}) freqai_conf['freqai']['feature_parameters'].update({"use_SVM_to_remove_outliers": True})
freqai_conf['freqai']['feature_parameters'].update({"DI_threshold": 2})
freqai_conf['freqai']['data_split_parameters'].update({'shuffle': True}) freqai_conf['freqai']['data_split_parameters'].update({'shuffle': True})
if 'test_3ac' in model or 'test_4ac' in model: if 'test_3ac' in model or 'test_4ac' in model:
@ -163,7 +166,6 @@ def test_extract_data_and_train_model_MultiTargets(mocker, freqai_conf, model, s
assert Path(freqai.dk.data_path / f"{freqai.dk.model_filename}_model.joblib").is_file()
assert Path(freqai.dk.data_path / f"{freqai.dk.model_filename}_metadata.json").is_file()
assert Path(freqai.dk.data_path / f"{freqai.dk.model_filename}_trained_df.pkl").is_file()
assert Path(freqai.dk.data_path / f"{freqai.dk.model_filename}_svm_model.joblib").is_file()
assert len(freqai.dk.data['training_features_list']) == 14
shutil.rmtree(Path(freqai.dk.full_path))
@ -219,7 +221,6 @@ def test_extract_data_and_train_model_Classifiers(mocker, freqai_conf, model):
f"{freqai.dk.model_filename}_model{model_file_extension}").exists() f"{freqai.dk.model_filename}_model{model_file_extension}").exists()
assert Path(freqai.dk.data_path / f"{freqai.dk.model_filename}_metadata.json").exists() assert Path(freqai.dk.data_path / f"{freqai.dk.model_filename}_metadata.json").exists()
assert Path(freqai.dk.data_path / f"{freqai.dk.model_filename}_trained_df.pkl").exists() assert Path(freqai.dk.data_path / f"{freqai.dk.model_filename}_trained_df.pkl").exists()
assert Path(freqai.dk.data_path / f"{freqai.dk.model_filename}_svm_model.joblib").exists()
shutil.rmtree(Path(freqai.dk.full_path)) shutil.rmtree(Path(freqai.dk.full_path))
@ -284,9 +285,6 @@ def test_start_backtesting(mocker, freqai_conf, model, num_files, strat, caplog)
_, base_df = freqai.dd.get_base_and_corr_dataframes(sub_timerange, "LTC/BTC", freqai.dk)
df = base_df[freqai_conf["timeframe"]]
for i in range(5):
df[f'%-constant_{i}'] = i
metadata = {"pair": "LTC/BTC"}
freqai.dk.set_paths('LTC/BTC', None)
freqai.start_backtesting(df, metadata, freqai.dk, strategy)
@ -294,14 +292,6 @@ def test_start_backtesting(mocker, freqai_conf, model, num_files, strat, caplog)
assert len(model_folders) == num_files
Trade.use_db = True
assert log_has_re(
"Removed features ",
caplog,
)
assert log_has_re(
"Removed 5 features from prediction features, ",
caplog,
)
Backtesting.cleanup()
shutil.rmtree(Path(freqai.dk.full_path))
@ -426,36 +416,6 @@ def test_backtesting_fit_live_predictions(mocker, freqai_conf, caplog):
shutil.rmtree(Path(freqai.dk.full_path))
def test_principal_component_analysis(mocker, freqai_conf):
freqai_conf.update({"timerange": "20180110-20180130"})
freqai_conf.get("freqai", {}).get("feature_parameters", {}).update(
{"princpial_component_analysis": "true"})
strategy = get_patched_freqai_strategy(mocker, freqai_conf)
exchange = get_patched_exchange(mocker, freqai_conf)
strategy.dp = DataProvider(freqai_conf, exchange)
strategy.freqai_info = freqai_conf.get("freqai", {})
freqai = strategy.freqai
freqai.live = True
freqai.dk = FreqaiDataKitchen(freqai_conf)
freqai.dk.live = True
timerange = TimeRange.parse_timerange("20180110-20180130")
freqai.dd.load_all_pair_histories(timerange, freqai.dk)
freqai.dd.pair_dict = MagicMock()
data_load_timerange = TimeRange.parse_timerange("20180110-20180130")
new_timerange = TimeRange.parse_timerange("20180120-20180130")
freqai.dk.set_paths('ADA/BTC', None)
freqai.extract_data_and_train_model(
new_timerange, "ADA/BTC", strategy, freqai.dk, data_load_timerange)
assert Path(freqai.dk.data_path / f"{freqai.dk.model_filename}_pca_object.pkl")
shutil.rmtree(Path(freqai.dk.full_path))
def test_plot_feature_importance(mocker, freqai_conf):
from freqtrade.freqai.utils import plot_feature_importance