Fix doc and example indentation

Matthias 2023-05-19 07:02:54 +02:00
parent 292bd62973
commit 707c6744b9

@ -145,94 +145,94 @@ As you begin to modify the strategy and the prediction model, you will quickly r
The best reward functions are continuously differentiable and well scaled. In other words, adding a single large negative penalty to a rare event is not a good idea, because the neural net will not be able to learn that function. Instead, it is better to add a small negative penalty to a common event, which helps the agent learn faster. You can also improve the continuity of your rewards/penalties by scaling them with severity according to a linear or exponential function. For example, you could slowly increase the penalty as the duration of a trade grows, which is better than a single large penalty occurring at a single point in time.
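For intuition, here is a minimal sketch (the helper functions below are hypothetical and not part of FreqAI) contrasting a one-off "cliff" penalty with penalties that scale continuously with trade duration; the full environment example that follows applies the same idea inside `calculate_reward()`:
```python
import math


# Hypothetical standalone helpers, shown only to illustrate reward shaping.
def cliff_penalty(trade_duration: int, max_duration: int = 300) -> float:
    """Discontinuous: no signal at all until one large penalty suddenly hits."""
    return -100.0 if trade_duration > max_duration else 0.0


def linear_penalty(trade_duration: int, max_duration: int = 300) -> float:
    """Continuous: the penalty grows in proportion to how long the trade is held."""
    return -1.0 * trade_duration / max_duration


def exponential_penalty(trade_duration: int, max_duration: int = 300) -> float:
    """Continuous and convex: mild early on, increasingly harsh near the limit."""
    return -1.0 * (math.exp(trade_duration / max_duration) - 1.0)
```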
```python
from freqtrade.freqai.prediction_models.ReinforcementLearner import ReinforcementLearner
from freqtrade.freqai.RL.Base5ActionRLEnv import Actions, Base5ActionRLEnv, Positions


class MyCoolRLModel(ReinforcementLearner):
    """
    User created RL prediction model.

    Save this file to `freqtrade/user_data/freqaimodels`

    then use it with:

    freqtrade trade --freqaimodel MyCoolRLModel --config config.json --strategy SomeCoolStrat

    Here the users can override any of the functions
    available in the `IFreqaiModel` inheritance tree. Most importantly for RL, this
    is where the user overrides `MyRLEnv` (see below), to define custom
    `calculate_reward()` function, or to override any other parts of the environment.

    This class also allows users to override any other part of the IFreqaiModel tree.
    For example, the user can override `def fit()` or `def train()` or `def predict()`
    to take fine-tuned control over these processes.

    Another common override may be `def data_cleaning_predict()` where the user can
    take fine-tuned control over the data handling pipeline.
    """
    class MyRLEnv(Base5ActionRLEnv):
        """
        User made custom environment. This class inherits from BaseEnvironment and gym.env.
        Users can override any functions from those parent classes. Here is an example
        of a user customized `calculate_reward()` function.

        Warning!
        This function is a showcase of functionality designed to show as many
        environment control features as possible. It is also designed to run quickly
        on small computers. This is a benchmark, it is *not* for live production.
        """
        def calculate_reward(self, action: int) -> float:
            # first, penalize if the action is not valid
            if not self._is_valid(action):
                return -2
            pnl = self.get_unrealized_profit()

            factor = 100

            pair = self.pair.replace(':', '')

            # you can use feature values from the dataframe.
            # Assumes the shifted RSI indicator has been generated in the strategy.
            rsi_now = self.raw_features[f"%-rsi-period_10_shift-1_{pair}_"
                                        f"{self.config['timeframe']}"].iloc[self._current_tick]

            # reward agent for entering trades
            if (action in (Actions.Long_enter.value, Actions.Short_enter.value)
                    and self._position == Positions.Neutral):
                if rsi_now < 40:
                    factor = 40 / rsi_now
                else:
                    factor = 1
                return 25 * factor

            # discourage agent from not entering trades
            if action == Actions.Neutral.value and self._position == Positions.Neutral:
                return -1

            max_trade_duration = self.rl_config.get('max_trade_duration_candles', 300)
            trade_duration = self._current_tick - self._last_trade_tick

            if trade_duration <= max_trade_duration:
                factor *= 1.5
            elif trade_duration > max_trade_duration:
                factor *= 0.5

            # discourage sitting in position
            if self._position in (Positions.Short, Positions.Long) and \
                    action == Actions.Neutral.value:
                return -1 * trade_duration / max_trade_duration

            # close long
            if action == Actions.Long_exit.value and self._position == Positions.Long:
                if pnl > self.profit_aim * self.rr:
                    factor *= self.rl_config['model_reward_parameters'].get('win_reward_factor', 2)
                return float(pnl * factor)

            # close short
            if action == Actions.Short_exit.value and self._position == Positions.Short:
                if pnl > self.profit_aim * self.rr:
                    factor *= self.rl_config['model_reward_parameters'].get('win_reward_factor', 2)
                return float(pnl * factor)

            return 0.
```

## Using Tensorboard

Reinforcement Learning models benefit from tracking training metrics. FreqAI has integrated Tensorboard to allow users to track training and evaluation performance across all coins and across all retrainings. Tensorboard is activated via the following command:
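A minimal sketch of that invocation, assuming the models are stored under the default `user_data/models` directory:
```bash
tensorboard --logdir user_data/models/unique-id
```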
@ -245,32 +245,30 @@ where `unique-id` is the `identifier` set in the `freqai` configuration file. Th
![tensorboard](assets/tensorboard.jpg)

## Custom logging

FreqAI also provides a built-in episodic summary logger called `self.tensorboard_log` for adding custom information to the Tensorboard log. By default, this function is already called once per step inside the environment to record the agent actions. All values accumulated for all steps in a single episode are reported at the conclusion of each episode, followed by a full reset of all metrics to 0 in preparation for the subsequent episode.
`self.tensorboard_log` can also be used anywhere inside the environment, for example, it can be added to the `calculate_reward` function to collect more detailed information about how often various parts of the reward were called:
```python
class MyRLEnv(Base5ActionRLEnv):
    """
    User made custom environment. This class inherits from BaseEnvironment and gym.env.
    Users can override any functions from those parent classes. Here is an example
    of a user customized `calculate_reward()` function.
    """
    def calculate_reward(self, action: int) -> float:
        if not self._is_valid(action):
            self.tensorboard_log("invalid")
            return -2
```
!!! Note
    The `self.tensorboard_log()` function is designed for tracking incremented objects only, i.e. events or actions inside the training environment. If the event of interest is a float, the float can be passed as the second argument, e.g. `self.tensorboard_log("float_metric1", 0.23)`. In this case the metric values are not incremented.
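As a further illustration, here is a hedged sketch (reusing names from the reward example above; the metric labels are arbitrary) that records both incremented events and a float metric from inside `calculate_reward()`:
```python
from freqtrade.freqai.RL.Base5ActionRLEnv import Actions, Base5ActionRLEnv, Positions


class MyRLEnv(Base5ActionRLEnv):
    def calculate_reward(self, action: int) -> float:
        # increment an event counter when an invalid action is chosen
        if not self._is_valid(action):
            self.tensorboard_log("invalid")
            return -2
        # count how often the agent stays flat while no position is open
        if action == Actions.Neutral.value and self._position == Positions.Neutral:
            self.tensorboard_log("neutral_while_flat")
            return -1
        # record a float metric (not incremented, as described in the note above)
        self.tensorboard_log("unrealized_pnl", self.get_unrealized_profit())
        return 0.
```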

## Choosing a base environment

FreqAI provides three base environments, `Base3ActionRLEnv`, `Base4ActionRLEnv` and `Base5ActionRLEnv`. As the names imply, the environments are customized for agents that can select from 3, 4 or 5 actions. The `Base3ActionRLEnv` is the simplest: the agent can select from hold, long, or short. This environment can also be used for long-only bots (it automatically follows the `can_short` flag from the strategy), where long is the enter condition and short is the exit condition. Meanwhile, in the `Base4ActionRLEnv`, the agent can enter long, enter short, hold neutral, or exit position. Finally, in the `Base5ActionRLEnv`, the agent has the same actions as in Base4, but instead of a single exit action, it separates exit long and exit short. The main changes stemming from the environment selection include: