Commit 8ce8412

commit on master
1 parent 6bc7a4c commit 8ce8412

1 file changed

Lines changed: 59 additions & 0 deletions

sh000016/README.md

# PyTorch-DDPG-Stock-Trading
An implementation of DDPG in PyTorch for algorithmic trading on the Chinese SH50 stock market, based on [Continuous Control with Deep Reinforcement Learning](https://arxiv.org/pdf/1509.02971.pdf).

## Environment
The reinforcement learning environment simulates high-frequency trading on the Chinese SH50 stock market at an average of 5 s per tick. It is based on `gym` and optimised with PyTorch to run on a GPU; to switch hardware, change only the target device to `cuda` or `cpu`.

The environment exposes several parameters, for example: `asset` is the initial cash, `unit` is the minimum volume that can be bought or sold, `rate` is the overall transaction rate, and `short_rate` is the additional charge on short positions (a charge that genuinely exists in the Chinese stock market).

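To make the roles of these parameters concrete, here is a minimal sketch in plain Python. Only the four names `asset`, `unit`, `rate` and `short_rate` come from this README; the `EnvParams` container, the helper functions, the default values and the fee formula itself are assumptions for illustration and may differ from what the environment actually does.

```python
from dataclasses import dataclass


@dataclass
class EnvParams:
    """Hypothetical container; only the four field names come from the README."""
    asset: float = 1_000_000.0  # initial cash (illustrative value)
    unit: int = 100             # minimum tradable volume (illustrative value)
    rate: float = 1e-3          # overall transaction rate (illustrative value)
    short_rate: float = 1e-3    # extra charge on short positions (illustrative value)


def round_to_unit(volume: float, unit: int) -> int:
    """Truncate a requested volume toward zero to a whole multiple of `unit`."""
    return int(volume / unit) * unit


def transaction_fee(price: float, volume: int, position: int, p: EnvParams) -> float:
    """Assumed fee model: `rate` on the traded notional, plus `short_rate`
    on the part of the trade that increases the short exposure."""
    notional = abs(volume) * price
    new_position = position + volume
    # Units of short exposure added by this trade (negative when covering).
    shorted = max(0, -new_position) - max(0, -position)
    return notional * p.rate + max(0, shorted) * price * p.short_rate
```

Under this assumed model, selling 300 shares at a price of 10 while holding 100 pays `rate` on the full 3000 notional and `short_rate` only on the 200 shares that end up short.
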
## Model
The Actor-Critic model is defined in `actor_critic.py`, with an acting network and a target network for both the actor and the critic. Following the original DDPG algorithm, the target networks are updated by a soft update (`soft-copy`).

Training on data follows the original DDPG algorithm, using (state, action, reward, next state) transitions sampled from the memory buffer.
```
# Compute the target value y = r + GAMMA * Q'(s', mu'(s')) from the target networks
next_action_batch = self.actor_network.target_action(next_state_batch)
q_batch = self.critic_network.target_q(next_action_batch, next_state_batch)
y_batch = torch.add(reward_batch, q_batch, alpha=GAMMA).view(-1, 1)

# Train the critic on the target value; its output is passed on to train the actor
self.actor_network.train(
    self.critic_network.train(
        y_batch, action_batch, state_batch
    )
)

# Update both target networks by soft update
self.actor_network.update_target()
self.critic_network.update_target()
```

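The two arithmetic steps in the snippet above, the target value `y_batch` and the soft update of the target networks, can be sketched without any framework. This is a minimal plain-Python sketch, not the repository's code: `td_target`, `soft_update` and the values of `TAU` and `GAMMA` are assumptions for illustration (the README uses `GAMMA` but does not state its value).

```python
TAU = 0.001   # assumed soft-update rate; not stated in this README
GAMMA = 0.99  # assumed discount factor; not stated in this README


def td_target(reward, next_q, gamma=GAMMA):
    """Per-sample target y = r + gamma * Q'(s', mu'(s'))."""
    return [r + gamma * q for r, q in zip(reward, next_q)]


def soft_update(target, source, tau=TAU):
    """Move each target parameter a fraction `tau` toward its source parameter."""
    return [(1.0 - tau) * t + tau * s for t, s in zip(target, source)]


# Usage: with tau = 0.25 the target moves a quarter of the way to the source.
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.25)
```

With PyTorch tensors the same arithmetic applies elementwise to every parameter of the acting and target networks; a small `tau` keeps the targets slowly moving, which is what stabilises the bootstrapped target value.
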
The policy gradient is read from the critic's first layer, where the actor's output enters the critic, and is fed into the actor's backward pass.
```
# Mean policy gradient, taken from the critic's first layer
return torch.mean(self.critic_weights[0].grad[:, :self.action_dim], dim=0)
```
```
# Train the actor with the policy gradient
self.actor_weights[-1].backward(-loss_grad)
```

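The two snippets above can be read against the deterministic policy gradient of the DDPG paper. The estimator below is the paper's, written with its usual symbols (actor $\mu$ with parameters $\theta^{\mu}$, critic $Q$ with parameters $\theta^{Q}$, batch size $N$); none of these symbols appear in the repository's code, so treat it as a reading aid rather than an exact description of `actor_critic.py`:

$$
\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{i=1}^{N}
\nabla_{a} Q(s, a \mid \theta^{Q}) \Big|_{s = s_i,\, a = \mu(s_i)}
\; \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \Big|_{s = s_i}
$$

The first factor is what the critic supplies and the second is what the actor's backward pass computes. Because optimisers minimise, ascending $Q$ is implemented as descending $-Q$, which is presumably why the snippet passes `-loss_grad` to `backward`. Note that the snippet averages a slice of the first critic layer's weight gradient, which may differ in detail from the exact $\nabla_{a} Q$ term.
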
## Agent
`DDPG.py` wraps the agent, which fills the memory buffer and trains on the stored data. Only `train_on_batch` and `perceive` are relevant to the algorithm. Random sampling is done directly on the CUDA device, which is more efficient:
```
# Draw batch_size end positions on the GPU; `cuda` is assumed to be the target device defined elsewhere
sample = torch.randint(self.time_dim, self.replay_reward.shape[0], [self.batch_size], device=cuda)

# For each sample, collect the indices of the time_dim ticks that precede it
index = torch.stack([sample - i for i in range(self.time_dim, 0, -1)]).t().reshape(-1)
```
```
# Gather the batch from the replay data using the indices above
state_batch = torch.index_select(state_data, 0, index).view(self.batch_size, -1)
next_amount_data = torch.index_select(next_amount_data, 0, sample).view(self.batch_size, -1)
action_batch = torch.index_select(self.replay_action / self.unit, 0, sample)
reward_batch = torch.index_select(self.replay_reward, 0, sample)
```

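The index construction above is compact, so here is the same logic spelled out in plain Python. This is a sketch for illustration only: `draw_samples` and `window_indices` are hypothetical names, and lists stand in for the CUDA tensors used in `DDPG.py`.

```python
import random


def draw_samples(buffer_len, time_dim, batch_size, rng):
    """Pick batch_size positions in [time_dim, buffer_len), like torch.randint(low, high)."""
    return [rng.randrange(time_dim, buffer_len) for _ in range(batch_size)]


def window_indices(samples, time_dim):
    """For each sample s, list s - time_dim, ..., s - 1 (the ticks just before s).

    Equivalent to stacking `sample - i` for i = time_dim..1, transposing,
    and flattening row by row.
    """
    return [s - i for s in samples for i in range(time_dim, 0, -1)]


# Usage: two samples with a window of three ticks each.
example = window_indices([5, 9], 3)
```

Starting the draw at `time_dim` guarantees that every window has `time_dim` earlier ticks available, so no index is ever negative.
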
## OUNoise
The Ornstein-Uhlenbeck (OU) noise implementation is by [Flood Sung](https://github.com/rllab/rllab/blob/master/rllab/exploration_strategies/ou_strategy.py).

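For readers unfamiliar with OU noise, the following is a minimal plain-Python sketch of the discrete update commonly used for DDPG exploration, `x <- x + theta * (mu - x) + sigma * N(0, 1)`. It is not the linked implementation: the class layout, the `seed` argument and the default values of `mu`, `theta` and `sigma` are assumptions for illustration.

```python
import random


class OUNoise:
    """Temporally correlated noise that drifts back toward `mu`."""

    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, seed=None):
        self.dim = dim
        self.mu = mu
        self.theta = theta  # strength of the pull back toward mu
        self.sigma = sigma  # scale of the Gaussian perturbation
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Restart the process at its mean, e.g. at the start of an episode."""
        self.state = [self.mu] * self.dim

    def sample(self):
        """Advance the process one step and return a copy of the new state."""
        self.state = [
            x + self.theta * (self.mu - x) + self.sigma * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)
```

Unlike independent Gaussian noise, successive samples are correlated, which tends to produce smoother exploration in continuous action spaces.
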
## Playground
`DDPG-agent.py` is the playground for interacting with the agent. This repo provides Chinese SH50 stock market data from 17/04/2020 to 13/04/2020, more than 13000 ticks in total.
