Richard Jozsa | 66c326b789 | Add proper handling of multiple environments | 2023-03-20 15:54:58 +01:00
robcaulk | cb80d7c26f | close the multi_proc env before creating new ones in an attempt to avoid increasing processes | 2023-02-24 11:19:54 +01:00
robcaulk | 4fc0edb8b7 | add pair to environment for access inside calculate_reward | 2023-02-10 14:45:50 +01:00
robcaulk | 7b4abd5ef5 | use a dictionary to make code more readable | 2022-12-15 12:25:33 +01:00
Emre | 2018da0767 | Add env_info dict to base environment | 2022-12-14 22:03:05 +03:00
robcaulk | 2285ca7d2a | add dp to multiproc | 2022-12-14 18:22:20 +01:00
robcaulk | 24766928ba | reorganize/generalize tensorboard callback | 2022-12-04 13:54:30 +01:00
smarmau | d6f45a12ae | add multiproc fix flake8 | 2022-12-03 22:30:04 +11:00
robcaulk | 8dbfd2cacf | improve docstring clarity about how to inherit from ReinforcementLearner, demonstrate inheritance with ReinforcementLearner_multiproc | 2022-11-26 11:51:08 +01:00
robcaulk | 6394ef4558 | fix docstrings | 2022-11-13 17:43:52 +01:00
robcaulk | 8d7adfabe9 | clean RL tests to avoid dir pollution and increase speed | 2022-10-08 12:10:38 +02:00
robcaulk | 83343dc2f1 | control number of threads, update doc | 2022-09-29 00:10:18 +02:00
Timothy Pogue | 099137adac | remove hasattr calls | 2022-09-27 22:35:15 -06:00
Timothy Pogue | 9e36b0d2ea | fix formatting | 2022-09-27 22:02:33 -06:00
Timothy Pogue | caa47a2f47 | close subproc env on shutdown | 2022-09-28 03:06:05 +00:00
robcaulk | 647200e8a7 | isort | 2022-09-23 19:30:56 +02:00
robcaulk | 77c360b264 | improve typing, improve docstrings, ensure global tests pass | 2022-09-23 19:17:27 +02:00
robcaulk | 8aac644009 | add tests. add guardrails. | 2022-09-15 00:46:35 +02:00
robcaulk | 240b529533 | fix tensorboard path so that users can track all historical models | 2022-08-31 16:50:39 +02:00
robcaulk | 7766350c15 | refactor environment inheritance tree to accommodate flexible action types/counts. fix bug in train profit handling | 2022-08-28 19:21:57 +02:00
robcaulk | 3199eb453b | reduce code for base use-case, ensure multiproc inherits custom env, add ability to limit ram use. | 2022-08-25 19:05:51 +02:00
robcaulk | 05ccebf9a1 | automate eval freq in multiproc | 2022-08-25 12:29:48 +02:00
robcaulk | 94cfc8e63f | fix multiproc callback, add continual learning to multiproc, fix totalprofit bug in env, set eval_freq automatically, improve default reward | 2022-08-25 11:46:18 +02:00
robcaulk | bd870e2331 | fix monitor bug, set default values in case user doesn't set params | 2022-08-24 16:32:14 +02:00
robcaulk | b708134c1a | switch multiproc thread count to rl_config definition | 2022-08-24 13:00:55 +02:00
robcaulk | b26ed7dea4 | fix generic reward, add time duration to reward | 2022-08-24 13:00:55 +02:00
robcaulk | 29f0e01c4a | expose environment reward parameters to the user config | 2022-08-24 13:00:55 +02:00
robcaulk | 3eb897c2f8 | reuse callback, allow user to access all stable_baselines3 agents via config | 2022-08-24 13:00:55 +02:00