Merge pull request #7808 from freqtrade/fix-freqai-rl-reward-link

Fix custom reward function link
2024-09-20 09:31:12 +00:00 · 2022-11-28 06:33:24 +01:00 · 2022-11-28 06:33:24 +01:00 · 8e60364f0d
commit 8e60364f0d
parent 51e773fe37 f21dbbd8bb
2 changed files with 18 additions and 17 deletions
--- a/docs/freqai-reinforcement-learning.md
+++ b/docs/freqai-reinforcement-learning.md
@ -1,14 +1,14 @@
 # Reinforcement Learning

 !!! Note "Installation size"
-    Reinforcement learning dependencies include large packages such as `torch`, which should be explicitly requested during `./setup.sh -i` by answering "y" to the question "Do you also want dependencies for freqai-rl (~700mb additional space required) [y/N]?".  
+    Reinforcement learning dependencies include large packages such as `torch`, which should be explicitly requested during `./setup.sh -i` by answering "y" to the question "Do you also want dependencies for freqai-rl (~700mb additional space required) [y/N]?".
    Users who prefer docker should ensure they use the docker image appended with `_freqairl`.

 ## Background and terminology

 ### What is RL and why does FreqAI need it?

-Reinforcement learning involves two important components, the *agent* and the training *environment*. During agent training, the agent moves through historical data candle by candle, always making 1 of a set of actions: Long entry, long exit, short entry, short exit, neutral). During this training process, the environment tracks the performance of these actions and rewards the agent according to a custom user made `calculate_reward()` (here we offer a default reward for users to build on if they wish [details here](#creating-the-reward)). The reward is used to train weights in a neural network.
+Reinforcement learning involves two important components, the *agent* and the training *environment*. During agent training, the agent moves through historical data candle by candle, always making 1 of a set of actions: Long entry, long exit, short entry, short exit, neutral). During this training process, the environment tracks the performance of these actions and rewards the agent according to a custom user made `calculate_reward()` (here we offer a default reward for users to build on if they wish [details here](#creating-a-custom-reward-function)). The reward is used to train weights in a neural network.

 A second important component of the FreqAI RL implementation is the use of *state* information. State information is fed into the network at each step, including current profit, current position, and current trade duration. These are used to train the agent in the training environment, and to reinforce the agent in dry/live (this functionality is not available in backtesting). *FreqAI + Freqtrade is a perfect match for this reinforcing mechanism since this information is readily available in live deployments.*

@ -16,15 +16,15 @@ Reinforcement learning is a natural progression for FreqAI, since it adds a new

 ### The RL interface

-With the current framework, we aim to expose the training environment via the common "prediction model" file, which is a user inherited `BaseReinforcementLearner` object (e.g. `freqai/prediction_models/ReinforcementLearner`). Inside this user class, the RL environment is available and customized via `MyRLEnv` as [shown below](#creating-the-reward).
+With the current framework, we aim to expose the training environment via the common "prediction model" file, which is a user inherited `BaseReinforcementLearner` object (e.g. `freqai/prediction_models/ReinforcementLearner`). Inside this user class, the RL environment is available and customized via `MyRLEnv` as [shown below](#creating-a-custom-reward-function).

-We envision the majority of users focusing their effort on creative design of the `calculate_reward()` function [details here](#creating-the-reward), while leaving the rest of the environment untouched. Other users may not touch the environment at all, and they will only play with the configuration settings and the powerful feature engineering that already exists in FreqAI. Meanwhile, we enable advanced users to create their own model classes entirely.
+We envision the majority of users focusing their effort on creative design of the `calculate_reward()` function [details here](#creating-a-custom-reward-function), while leaving the rest of the environment untouched. Other users may not touch the environment at all, and they will only play with the configuration settings and the powerful feature engineering that already exists in FreqAI. Meanwhile, we enable advanced users to create their own model classes entirely.

 The framework is built on stable_baselines3 (torch) and OpenAI gym for the base environment class. But generally speaking, the model class is well isolated. Thus, the addition of competing libraries can be easily integrated into the existing framework. For the environment, it is inheriting from `gym.env` which means that it is necessary to write an entirely new environment in order to switch to a different library.

 ### Important considerations

-As explained above, the agent is "trained" in an artificial trading "environment". In our case, that environment may seem quite similar to a real Freqtrade backtesting environment, but it is *NOT*. In fact, the RL trading environment is much more simplified. It does not incorporate any of the complicated strategy logic, such as callbacks such as `custom_exit`, `custom_stoploss`, leverage controls, etc. The RL environment is instead a very "raw" representation of the true market, where the agent has free-will to learn the policy (read: stoploss, take profit, ect) which is enforced by the `calculate_reward()`. Thus, it is important to consider that the agent training environment is not identical to the real world.
+As explained above, the agent is "trained" in an artificial trading "environment". In our case, that environment may seem quite similar to a real Freqtrade backtesting environment, but it is *NOT*. In fact, the RL training environment is much more simplified. It does not incorporate any of the complicated strategy logic, such as callbacks like `custom_exit`, `custom_stoploss`, leverage controls, etc. The RL environment is instead a very "raw" representation of the true market, where the agent has free-will to learn the policy (read: stoploss, take profit, etc.) which is enforced by the `calculate_reward()`. Thus, it is important to consider that the agent training environment is not identical to the real world.

 ## Running Reinforcement Learning

@ -95,7 +95,7 @@ Most of the function remains the same as for typical Regressors, however, the fu
        informative[f"%-{pair}raw_low"] = informative["low"]
 ```

-Finally, there is no explicit "label" to make - instead the you need to assign the `&-action` column which will contain the agent's actions when accessed in `populate_entry/exit_trends()`. In the present example, the neutral action to 0. This value should align with the environment used. FreqAI provides two environments, both use 0 as the neutral action.
+Finally, there is no explicit "label" to make - instead it is necessary to assign the `&-action` column which will contain the agent's actions when accessed in `populate_entry/exit_trends()`. In the present example, the neutral action to 0. This value should align with the environment used. FreqAI provides two environments, both use 0 as the neutral action.

 After users realize there are no labels to set, they will soon understand that the agent is making its "own" entry and exit decisions. This makes strategy construction rather simple. The entry and exit signals come from the agent in the form of an integer - which are used directly to decide entries and exits in the strategy:

@ -130,7 +130,7 @@ After users realize there are no labels to set, they will soon understand that t
        return df
 ```

-It is important to consider that `&-action` depends on which environment they choose to use. The example above shows 5 actions, where 0 is neutral, 1 is enter long, 2 is exit long, 3 is enter short and 4 is exit short. 
+It is important to consider that `&-action` depends on which environment they choose to use. The example above shows 5 actions, where 0 is neutral, 1 is enter long, 2 is exit long, 3 is enter short and 4 is exit short.

 ## Configuring the Reinforcement Learner

@ -166,25 +166,26 @@ As you begin to modify the strategy and the prediction model, you will quickly r

 ```python
    from freqtrade.freqai.prediction_models.ReinforcementLearner import ReinforcementLearner
-    from freqtrade.freqai.RL.Base5ActionRLEnv import Base5ActionRLEnv
+    from freqtrade.freqai.RL.Base5ActionRLEnv import Actions, Base5ActionRLEnv, Positions
+

    class MyCoolRLModel(ReinforcementLearner):
        """
-        User created RL prediction model. 
+        User created RL prediction model.

        Save this file to `freqtrade/user_data/freqaimodels`

        then use it with:

        freqtrade trade --freqaimodel MyCoolRLModel --config config.json --strategy SomeCoolStrat
-        
-        Here the users can override any of the functions 
-        available in the `IFreqaiModel` inheritance tree. Most importantly for RL, this 
+
+        Here the users can override any of the functions
+        available in the `IFreqaiModel` inheritance tree. Most importantly for RL, this
        is where the user overrides `MyRLEnv` (see below), to define custom
        `calculate_reward()` function, or to override any other parts of the environment.
-        
+
        This class also allows users to override any other part of the IFreqaiModel tree.
-        For example, the user can override `def fit()` or `def train()` or `def predict()` 
+        For example, the user can override `def fit()` or `def train()` or `def predict()`
        to take fine-tuned control over these processes.

        Another common override may be `def data_cleaning_predict()` where the user can
@ -253,7 +254,7 @@ FreqAI provides two base environments, `Base4ActionEnvironment` and `Base5Action
 * the actions available in the `calculate_reward`
 * the actions consumed by the user strategy

-Both of the FreqAI provided environments inherit from an action/position agnostic environment object called the `BaseEnvironment`, which contains all shared logic. The architecture is designed to be easily customized. The simplest customization is the `calculate_reward()` (see details [here](#creating-the-reward)). However, the customizations can be further extended into any of the functions inside the environment. You can do this by simply overriding those functions inside your `MyRLEnv` in the prediction model file. Or for more advanced customizations, it is encouraged to create an entirely new environment inherited from `BaseEnvironment`.
+Both of the FreqAI provided environments inherit from an action/position agnostic environment object called the `BaseEnvironment`, which contains all shared logic. The architecture is designed to be easily customized. The simplest customization is the `calculate_reward()` (see details [here](#creating-a-custom-reward-function)). However, the customizations can be further extended into any of the functions inside the environment. You can do this by simply overriding those functions inside your `MyRLEnv` in the prediction model file. Or for more advanced customizations, it is encouraged to create an entirely new environment inherited from `BaseEnvironment`.

 !!! Note
    FreqAI does not provide by default, a long-only training environment. However, creating one should be as simple as copy-pasting one of the built in environments and removing the `short` actions (and all associated references to those).
--- a/freqtrade/templates/FreqaiExampleHybridStrategy.py
+++ b/freqtrade/templates/FreqaiExampleHybridStrategy.py
@ -19,7 +19,7 @@ class FreqaiExampleHybridStrategy(IStrategy):

    Launching this strategy would be:

-    freqtrade trade --strategy FreqaiExampleHyridStrategy --strategy-path freqtrade/templates
+    freqtrade trade --strategy FreqaiExampleHybridStrategy --strategy-path freqtrade/templates
    --freqaimodel CatboostClassifier --config config_examples/config_freqai.example.json

    or the user simply adds this to their config:
@ -86,7 +86,7 @@ class FreqaiExampleHybridStrategy(IStrategy):
    process_only_new_candles = True
    stoploss = -0.05
    use_exit_signal = True
-    startup_candle_count: int = 300
+    startup_candle_count: int = 30
    can_short = True

    # Hyperoptable parameters