
[EPIC] Make $ trading across 500 tokens; 5m; trade rarely per token. Sim, and live #1712

@trentmc

Description

Background / motivation

Make $ trading on 5m, a timescale good enough for agent evolution. (And more fun in general:)

How:

  • trade across 500+ tokens
  • trade rarely per token. Eg only 1-3 trades every 5m. Aka "snipe". Few relative trades → low fees.
  • have a framework where I can test super-fast. Chop out the rest. Ideally, fast enough to evolve with trading-sim-in-the-loop
  • make the approach profitable: tune params (e.g. conf/TP/SL), and make bigger changes (e.g. model "up > 0.2%?" and "down < 0.2%?")

Phases / tasks

(completed phases are farther below)

Phase: build / improve until "make $ on 0.035% fees (HL starter rate), on BTC, 3mo sim"

  • Try: "improve $ via larger ensembles (on lin) for better conf est". Result: doesn't help. Details.
  • Try: "center model on most recently seen price". Setup, expt's
    • Test: 0% fees. Result: APY = 4766% (!)
    • Test: 0.025% fees, basic tuning. Result: loses $
  • Fast model test/tune flow. Details below.
    • Build X/y save & load module
    • Grab 6 yrs BTC historical data
    • Do sim that generates 1000 X/y datasets. Test_n=52500 (5y), therefore gen a new model every 52 epochs (just over 4h). For transform=None
    • ^ but for transform = center-on-recent
    • In an aimodel unit test, build a testing framework: for each X/y dataset: load it, build a model, report log loss. Calc & report average log loss across all X/y datasets.
    • ^ with: MBO across all X/y datasets. Obj function = average log loss. Design space = xgboost params (and maybe some ppss params too)
  • (add tasks until goal met)

Phase: like ^, but now 2y sim

Phase: like ^, but now make $ on 8 of top-10 tokens

  • Test: run benchmark for each of top-10 separately, record results
  • (add tasks until goal met)
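The per-token benchmark step above can be sketched like this. `run_benchmark` is a hypothetical stand-in for a real sim run (here it returns a canned APY so the recording and goal-check logic is runnable), and the ticker list is illustrative.

```python
# Sketch of "run benchmark for each of top-10 separately, record results",
# plus the phase's goal check (make $ on 8 of the top-10 tokens).
# `run_benchmark` and the ticker list are hypothetical placeholders.
TOP_10 = ["BTC", "ETH", "BNB", "SOL", "XRP", "ADA", "DOGE", "TON", "AVAX", "DOT"]


def run_benchmark(token: str) -> float:
    """Placeholder: would run the 5m sim for `token` and return its APY (%)."""
    return 0.0  # canned result; a real run returns the sim's measured APY


def record_results(tokens) -> dict:
    """Run the benchmark once per token and record APY per token."""
    return {token: run_benchmark(token) for token in tokens}


def goal_met(results: dict, needed: int = 8) -> bool:
    """Phase goal: positive APY on at least `needed` of the tokens."""
    return sum(apy > 0 for apy in results.values()) >= needed
```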

Phase: a single run trades on 500 tokens at once

  • Update sim etc to consider top-10 tokens at once
  • Update sim etc to consider top-500 tokens at once
  • (add tasks until goal met)

Completed phases / tasks

Phase: init experiments

  • Revive pdr-backend sim, on 5m, btc, binance. Backend/cli, sim, and new dashboard too.
  • Conduct cheap experiment: set params to trade rarely (high confidence_thr), and run for a long time. Fiddle with different philosophies. Results in next steps.
  • Q: make $, with no fees? A: yes. See comment 2. 227.2% APY
  • Q: make $ with nonzero fees (and init params)? See comment 4
    • Q: $ with 0.1% fees? (Binance starter rate) A: no
    • Q: $ with 0.025% fees? (HL moderate rate) A: no
    • Q: $ with 0.01% fees? (2x lower than lowest Binance/HL) A: yes
  • Q: on 0.025% fees (HL moderate rate), make $? (Tune params as needed.) A: yes. 20.5% APY. See comment 5.
  • Try: Improve classifier calibration accuracy (with lin model). Status quo: 5-fold CV on CalibratedClassifierCV_Sigmoid. Try: 10-fold, 100-fold
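The calibration experiment in the last bullet can be sketched as below: wrap the linear model in sklearn's CalibratedClassifierCV (sigmoid method) and vary the CV fold count. Synthetic data stands in for the real sim's X/y; fold counts beyond 10 (e.g. 100) just change `cv`.

```python
# Sketch of the calibration experiment: compare CalibratedClassifierCV
# (sigmoid) at different CV fold counts. Synthetic X/y stands in for the
# sim's real feature set.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)

for n_folds in (5, 10):  # the issue also suggests trying 100-fold
    clf = CalibratedClassifierCV(
        LogisticRegression(max_iter=1000), method="sigmoid", cv=n_folds
    )
    clf.fit(X, y)
    p = clf.predict_proba(X)[:, 1]
    print(f"{n_folds}-fold CV: in-sample log loss = {log_loss(y, p):.4f}")
```

Note: more folds means each calibration fit sees more training data, at the cost of proportionally more fits; whether that improves conf estimates enough to change trades is the experiment.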

Phase: reduce complexity & speed runtime

  • Create new branch "tdr" link
  • Simplify codebase: chop pdr-backend code that I won't use. Basically, everything but what sim/ needs. That is, delete: smart contracts/web3, barge, dashboard/plots/analytics, predictoor/, trader/, DF rewards, trueval, ETL/Duck, more.
  • Try to make code super-fast, just for trading. Result: no big wins possible. Details

Appendix: things to add, only if we need, to make $

  • Fast model test/tune flow. Steps:
    • (a) New sim param that saves X/y right before model-build.
    • (b) Do a sim run with train_every_n_epochs = 10000; test_n = 100,000; transform = None.
    • (c) Do another sim run, with transform = center_on_recent.
    • (d) Add saved X/y to a new dir, in github repo.
    • (e) Build a benchmarking framework within aimodel factory tests - leverage existing.
  • Manual tune xgboost params. Draw on new fast model test/tune flow to find more ideal xgboost params. Docs. Guides: Analytics Vidhya, RITHP
  • Auto-tune xgboost params. Use the approach that Udit used
  • Model "up > 0.2%?" and "down < 0.2%?".
    • This is a classifier-based approach for the model to account for fees
    • Specifically, reframe models to: "If prediction says 'price goes up > 0.2% anytime in the next 5 min', then buy; and as soon as actual price > 0.2% above, then sell. And, vice versa for going down. And, if still in position at 5min mark, then exit position".
    • This was #1278. See PRs reframe2, reframe3.
    • Those PRs ran into complexity issues. The codebase was getting too big & crazy, and hard to iterate. So if we want to pursue this, we need to simplify the codebase. How: focus on trading, chop out unneeded
    • Another idea: just as we'll have separate loops for different tokens, we can have separate loops for up-vs-down models. Will keep things simpler. BUT don't do that, because we can take advantage of 2 models on the same output: if they disagree, then skip. i.e. Only proceed if they agree. It's another gate for confidence-building. reframe2/3 above has this.
  • Predict profitability; only trade when profitable.
    • This is regressor-based approach for the model to account for fees.
    • First, build an AI model that predicts continuous-value prices. From the price prediction, have a simple function to compute expected profitability (accounting for order fees, maybe slippage). Include uncertainty. For each pair, sweep across different long/short levels and compute profitability for each. Only trade when expected profitability > 0; or 95% lower bound > 0, or other thr.
    • Concern: regression models so far have poor performance. Because they have to do more work to model more stuff, including stuff we don't need (compared to classifiers). If we do it, we'd probably need nonlinear models, Udit's trick, more.
  • Use "# orders" as an input to model. make $ trading GDoc
  • Udit's non-stationary trick, via quantiles. To properly model longer-term historical data. Though we can probably get away with short-term data & models, given so many tokens. (And it'd be very expensive to do longer-term anyway.)
  • Trade on cowswap only when fee=zero thanks to coincidence-of-wants. Sub-q: where do we get historical data, and what do we model? Maybe the A is simply: just use Binance?
  • Ref: make $ trading GDoc. A bit dated, but many great ideas.
  • Ref: 5m trading ideas in 1e 9 GDoc
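The two-model agreement gate from the up/down reframing above (only proceed if the up-model and down-model agree) can be sketched like this. Names and the confidence threshold are illustrative, not pdr-backend's actual API.

```python
# Sketch of the two-model agreement gate: go long only when the up-model
# fires and the down-model does not (vice versa for short); otherwise skip.
# Disagreement acts as one more confidence-building gate. Names and the
# threshold are illustrative.
from enum import Enum


class Action(Enum):
    LONG = "long"
    SHORT = "short"
    SKIP = "skip"


def gate(p_up: float, p_down: float, conf_thr: float = 0.6) -> Action:
    """p_up: P(price rises > 0.2% in next 5m); p_down: P(price falls > 0.2%).

    Trade only when the classifiers agree: exactly one is confident.
    """
    up = p_up >= conf_thr
    down = p_down >= conf_thr
    if up and not down:
        return Action.LONG
    if down and not up:
        return Action.SHORT
    return Action.SKIP  # both fire, or neither: disagreement / low confidence
```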

Appendix: Fees

Binance & HL fee schedules
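A rough back-of-envelope on the fee tiers mentioned in this issue: a round trip pays the taker fee on entry and exit, so the break-even price move is about 2x the per-side fee (ignoring slippage and maker/taker nuances).

```python
# Back-of-envelope: minimum price move to break even at each fee tier
# mentioned above (taker fee paid on both entry and exit; slippage ignored).
def breakeven_move_pct(fee_pct: float) -> float:
    """A round trip costs ~2x the per-side fee; that's the move to beat."""
    return 2 * fee_pct


for name, fee in [
    ("Binance starter", 0.10),
    ("HL starter", 0.035),
    ("HL moderate", 0.025),
    ("2x below lowest", 0.01),
]:
    print(f"{name}: {fee:.3f}%/side -> breakeven move ~{breakeven_move_pct(fee):.3f}%")
```

This helps contextualize the numbers above: at 0.035%/side a trade must clear ~0.07% just to cover fees, so a move threshold like 0.2% leaves meaningful margin.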
