Performance Optimization Roadmap
Current Performance Baseline (from bench.py profiling)
Mock Mode Results
- Store creation: 4,634ms (90MB) - Test fixture import overhead
- Pandas operations: 707ms (24MB) - Benchmark simulation
- Pure Backtrader: 110ms - External library baseline
- CrackTrader overhead: 17ms - Our actual code overhead
- Data processing: ~5ms - Very fast
Sandbox Mode Results
- Average latency: 775ms - Includes real network calls
- Network operations: 200-1000ms - External Binance testnet
- Performance score: 56/100 - Acceptable for sandbox with network
Optimization Priority Analysis
Priority 1: High Impact, Production Critical
Historical Data Pipeline (MAJOR INSIGHT!)
- Current: No caching enabled by default, CSV reading = 10-80s for large datasets
- Discovery: We already have a sophisticated caching system but it's disabled!
- Target: Enable caching by default, add Parquet support
- Impact: 90%+ time reduction for backtesting workflows
- Immediate Action: Enable HistoricalDataCache by default
Network Layer Optimization
- Current: 200-1000ms per API call
- Target: Connection pooling, request batching
- Impact: Direct trading latency reduction
- Implementation: Python asyncio improvements first, then Rust HTTP client
Order Processing Pipeline
- Current: 17ms CrackTrader overhead
- Target: <5ms order validation and routing
- Impact: Critical for HFT aspirations
- Rust Candidate: Order validation, risk checks, position calculations
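The "enable caching by default" idea from the Historical Data Pipeline item can be sketched with the standard library alone. This is not the project's HistoricalDataCache, whose API is not shown in this roadmap. The function name `load_candles`, the `cache_dir` parameter, and the pickle-based cache format are all assumptions made for illustration; the roadmap's actual target format is Parquet.

```python
import csv
import hashlib
import pickle
from pathlib import Path


def load_candles(csv_path, cache_dir=".candle_cache", use_cache=True):
    """Load numeric rows from a CSV, reusing a binary cache when it is fresh.

    Hypothetical helper for illustration; caching is on by default.
    """
    csv_path = Path(csv_path)
    stat = csv_path.stat()
    # Key the cache on path, size and mtime so an edited CSV invalidates it.
    key = hashlib.sha256(
        f"{csv_path.resolve()}:{stat.st_size}:{stat.st_mtime_ns}".encode()
    ).hexdigest()[:16]
    cache_file = Path(cache_dir) / f"{key}.pkl"

    if use_cache and cache_file.exists():
        with cache_file.open("rb") as fh:
            return pickle.load(fh)

    # Slow path: parse the CSV (header row expected) and convert to floats.
    with csv_path.open(newline="") as fh:
        rows = [
            {name: float(value) for name, value in row.items()}
            for row in csv.DictReader(fh)
        ]

    if use_cache:
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        with cache_file.open("wb") as fh:
            pickle.dump(rows, fh)
    return rows
```

The first call pays the CSV parsing cost and later calls read the cached file. Passing `use_cache=False` forces a fresh parse, which is useful when comparing timings.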
Priority 2: Medium Impact, Scalability
Large Dataset Processing (5M+ candles)
- Current: Pandas CSV reading = 20-50s, 1GB+ memory usage
- Target: Streaming + Parquet = <1s, <100MB memory
- Impact: Enable multi-year, multi-symbol backtesting
- Rust Candidate: Technical indicators (estimated 100x speedup), data I/O pipeline
WebSocket Data Ingestion
- Current: Python asyncio WebSocket handling
- Target: Higher throughput tick processing
- Impact: More symbols, higher frequency data
- Rust Candidate: Tick data parsing, OHLCV candle building
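To make the "OHLCV candle building" candidate concrete, here is a minimal pure-Python sketch of the logic a Rust version would replace. The `Candle` dataclass, the `build_candles` name, and the `(timestamp_seconds, price, size)` tick layout are assumptions for illustration, not CrackTrader's actual types.

```python
from dataclasses import dataclass


@dataclass
class Candle:
    start: int      # bucket start time, in seconds
    open: float
    high: float
    low: float
    close: float
    volume: float


def build_candles(ticks, interval=60):
    """Aggregate (timestamp_s, price, size) ticks into fixed-interval candles.

    Assumes ticks arrive in time order; empty intervals produce no candle.
    """
    candles = []
    current = None
    for ts, price, size in ticks:
        bucket = int(ts) - int(ts) % interval
        if current is None or bucket != current.start:
            # A new interval started: close out the previous candle.
            if current is not None:
                candles.append(current)
            current = Candle(bucket, price, price, price, price, size)
        else:
            current.high = max(current.high, price)
            current.low = min(current.low, price)
            current.close = price
            current.volume += size
    if current is not None:
        candles.append(current)
    return candles
```

Profiling a loop like this on real tick volumes would show whether candle building is actually a bottleneck before any native rewrite.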
Priority 3: Low Impact, Nice-to-Have
Test Infrastructure
- Current: 4.6s test setup time (FakeExchange import)
- Target: Faster test runs
- Impact: Developer experience only
- Solution: Lazy imports, smaller test fixtures
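One way to approach the "lazy imports" solution is the `LazyLoader` recipe from the `importlib` documentation, which defers executing a module until its first attribute access. The `lazy_import` helper name is an assumption. The example uses the standard-library `fractions` module as a stand-in, because the import path of the FakeExchange fixture is not given here.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose code runs on first attribute access.

    Follows the LazyLoader recipe from the importlib documentation.
    """
    if name in sys.modules:
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # defers real execution until first use
    return module


# A standard-library module stands in for a heavy test fixture here.
fractions = lazy_import("fractions")
```

A simpler alternative is to move the heavy `import` statement inside the function or fixture that needs it, so that tests which never touch the fixture do not pay for it.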
Native Acceleration Strategy (optional)
Phase 1: Proof of Concept (Post-Documentation)
Target: Order validation module
# Current Python
def validate_order(symbol, size, price, order_type):
    ...  # validation logic, ~1-2ms

# Future Rust FFI (cracktrader_fast is the planned extension module)
import cracktrader_fast
result = cracktrader_fast.validate_order(symbol, size, price, order_type)  # target: ~0.1ms
Benefits:
- Clear performance win (10x faster)
- Isolated, testable component
- Low integration risk
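Because native acceleration is meant to be optional, one low-risk integration pattern is to import the Rust function when it is available and fall back to pure Python otherwise. The sketch below assumes both implementations return a list of error strings. That return convention, the `VALID_ORDER_TYPES` set, and the specific checks are illustrative assumptions, not CrackTrader's actual validation rules.

```python
try:
    # Planned Rust extension from this roadmap; it may not be installed.
    from cracktrader_fast import validate_order
except ImportError:
    VALID_ORDER_TYPES = {"market", "limit"}  # assumed set, for illustration

    def validate_order(symbol, size, price, order_type):
        """Pure-Python fallback; returns a list of problems (empty if valid)."""
        errors = []
        if not symbol:
            errors.append("symbol is required")
        if size <= 0:
            errors.append("size must be positive")
        if order_type not in VALID_ORDER_TYPES:
            errors.append(f"unknown order type: {order_type}")
        elif order_type == "limit" and (price is None or price <= 0):
            errors.append("limit orders need a positive price")
        return errors
```

Callers use `validate_order` the same way in both cases, so the extension can ship later without touching call sites, and the Python version doubles as a reference implementation for tests.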
Phase 2: Data Pipeline (If Needed)
Target: Technical indicators for backtesting
# Current (if we implemented it with pandas)
df['sma_20'] = df['close'].rolling(20).mean()  # ~700ms for 1000 points in the benchmark simulation
# Rust alternative (target)
sma_values = cracktrader_fast.sma(prices, window=20)  # ~10ms for 1000 points
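Before committing to a Rust implementation, a pure-Python baseline with the same call shape is cheap to write and gives a fairer comparison point than the pandas benchmark. The sketch below is an assumption about what `sma` would compute, not the planned `cracktrader_fast.sma` itself. It keeps a running sum so each price is processed once.

```python
from collections import deque


def sma(prices, window=20):
    """Simple moving average with a running sum: O(n) time, O(window) memory.

    Returns None for positions that do not yet have a full window,
    mirroring the NaN values pandas' rolling(window).mean() produces.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    buf = deque()
    total = 0.0
    out = []
    for price in prices:
        buf.append(price)
        total += price
        if len(buf) > window:
            # Drop the oldest price so the sum covers exactly `window` items.
            total -= buf.popleft()
        out.append(total / window if len(buf) == window else None)
    return out
```

For example, `sma([1, 2, 3, 4, 5], window=3)` yields `[None, None, 2.0, 3.0, 4.0]`. On very long series the running sum can accumulate floating-point drift, which a production version would need to handle.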
Phase 3: Network Layer (Advanced)
Target: HTTP client with connection pooling
- Replace Python requests/aiohttp with Rust reqwest
- Custom protocol optimizations
- Only if network becomes the bottleneck
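Since the Network Layer item puts Python asyncio improvements ahead of a Rust client, the request batching it mentions can be sketched as bounded-concurrency fan-out. The `fetch_all` helper and the `fake_fetch` stand-in are hypothetical, and the sleep only simulates network latency. With aiohttp, the real `fetch` would typically share a single `ClientSession` so that connections are pooled and reused.

```python
import asyncio


async def fetch_all(fetch, requests, max_concurrency=5):
    """Run fetch(request) for every request, at most max_concurrency at a time.

    Results come back in the same order as `requests`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(request):
        async with semaphore:
            return await fetch(request)

    return await asyncio.gather(*(bounded(r) for r in requests))


async def fake_fetch(symbol):
    # Stand-in for a real REST call; the sleep simulates network latency.
    await asyncio.sleep(0.01)
    return f"{symbol}:ok"


results = asyncio.run(fetch_all(fake_fetch, ["BTC/USDT", "ETH/USDT", "SOL/USDT"]))
```

The concurrency cap is there to respect exchange rate limits. Independent calls overlap rather than queueing behind one another, so total wall time approaches that of the slowest call.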
Integration Approach
Option 1: PyO3 Rust Extension
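[package]
name = "cracktrader-fast"
version = "0.1.0"
edition = "2021"

[lib]
# Python imports the module as `cracktrader_fast`; cdylib builds a loadable shared library
name = "cracktrader_fast"
crate-type = ["cdylib"]

[dependencies]
pyo3 = { version = "0.20", features = ["extension-module"] }
serde = "1.0"
tokio = "1.0"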
Option 2: Separate Process (microservice)
- Rust binary for intensive computations
- IPC via message queues
- Better isolation, easier deployment
Option 3: WASM (Future consideration)
- Rust -> WebAssembly for browser compatibility
- Could enable web-based backtesting
Success Metrics
Before Optimization (Current)
- Order validation: ~1ms Python
- Mock tests: 1.6s total time
- Sandbox tests: 775ms average (network-bound)
- Memory usage: High pandas overhead when used
Target After Rust Optimization
- Order validation: <0.1ms (10x improvement)
- Mock tests: <500ms total time
- Technical indicators: 50x faster than pandas
- Memory usage: Reduced allocations
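Targets such as "<0.1ms order validation" are only meaningful if they are measured the same way before and after. A small harness like the following could report per-call latency percentiles. The `measure_latency_ms` and `meets_target` names, and the choice of p99 as the pass criterion, are assumptions for illustration.

```python
import statistics
import time


def measure_latency_ms(func, *args, repeats=1000, **kwargs):
    """Call func repeatedly and summarize per-call latency in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "median": statistics.median(samples),
        "p99": samples[min(len(samples) - 1, int(len(samples) * 0.99))],
        "max": samples[-1],
    }


def meets_target(stats, target_ms, percentile="p99"):
    """True when the chosen percentile is under the target latency."""
    return stats[percentile] < target_ms
```

Checking a tail percentile rather than the mean matters for trading paths, where an occasional slow call does more damage than the average suggests.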
Timeline & Dependencies
Prerequisites (current priority)
- Complete performance benchmarking framework
- Finish documentation
- Stabilize core Python functionality
- Comprehensive test coverage
Rust Development (Post-Documentation)
- Week 1-2: Set up Rust development environment, PyO3 hello world
- Week 3-4: Implement order validation in Rust
- Week 5-6: Performance testing and integration
- Week 7-8: Technical indicators (if needed)
Key Questions to Revisit
- Do we actually need pandas-like operations?
  - Current code doesn't use pandas heavily
  - Most technical analysis could be done in Backtrader
- What's our HFT ambition level?
  - Sub-millisecond: Need Rust/Go network layer
  - Sub-10ms: Python optimization may suffice
  - Sub-100ms: Current performance probably fine
- Deployment complexity vs performance gain
  - Rust extensions add build complexity
  - Cross-platform distribution challenges
  - Is the performance gain worth it?
Immediate Actions (low effort, high impact)
- Profile real trading scenarios (not benchmark simulations)
- Optimize Python imports (lazy loading)
- Connection pooling for REST API calls
- Memory profiling of actual trading sessions
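The profiling items above can start from the standard library before any dedicated tooling is added. The sketch below wraps a single call in `cProfile` and `tracemalloc`. The `profile_call` helper is hypothetical, and whichever trading-session entry point gets passed to it is left to the caller.

```python
import cProfile
import io
import pstats
import tracemalloc


def profile_call(func, *args, top=5, **kwargs):
    """Run func once under cProfile and tracemalloc.

    Returns (result, peak_bytes, report), where report lists the `top`
    functions by cumulative time.
    """
    profiler = cProfile.Profile()
    tracemalloc.start()
    try:
        result = profiler.runcall(func, *args, **kwargs)
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()

    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(top)
    return result, peak_bytes, buffer.getvalue()
```

Running this against a real strategy loop, rather than the benchmark simulations, would show whether time and memory go to CrackTrader code, Backtrader, or network waits.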
Notes
- Store creation slowness is test infrastructure, not production concern
- Pandas operations were benchmark artifacts, not real bottlenecks
- Our actual CrackTrader overhead is only 17ms - quite good!
- Network latency (200-1000ms) dominates real trading performance