# Performance Overview

Current performance characteristics and optimization status.
## Performance Philosophy

- Measure everything with the `bench.py` tool
- Cache historical data to avoid repeated API calls
- Profile bottlenecks before optimizing (see the sketch below)
- Document what's actually slow versus what's fast
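For "profile bottlenecks before optimizing", Python's built-in cProfile is enough to find hot spots in a single backtest run; in this minimal sketch, `run_backtest()` is a placeholder for your own Cerebro setup.

```python
import cProfile
import pstats

def run_backtest():
    # Placeholder: build your Cerebro, add feeds and strategies, then call cerebro.run().
    ...

cProfile.run("run_backtest()", "backtest.prof")   # write raw stats to a file
stats = pstats.Stats("backtest.prof")
stats.sort_stats("cumulative").print_stats(15)    # top 15 entries by cumulative time
```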
## Current Performance Baseline

### System Performance (recent benchmarks)

| Component | Mock Mode | Sandbox Mode | Notes |
|---|---|---|---|
| Store Creation | 4.6s | 0.6s | One-time startup cost |
| Order Processing | 17ms | N/A | Our code overhead (very good!) |
| Data Processing | 5ms | N/A | 1000 candles, raw processing |
| Network Operations | N/A | 200-1000ms | External exchange latency |
| Total Benchmark | 1.6s | 775ms | Full system test |

Note: Sandbox mode includes real network latency; use mock mode to measure pure code paths.
### Large Dataset Performance

| Dataset Size | Load Time | Memory Usage | With Caching |
|---|---|---|---|
| 1K candles | 0.01s | 5MB | 0.001s |
| 100K candles | 2-5s | 50MB | 0.1s (90% reduction) |
| 1M candles | 20-50s | 500MB | 0.5s (98% reduction) |
| 5M candles | 60-300s | 2GB | 2s (99% reduction) |

Note: Historical data caching provides a 90-99% performance improvement for repeated backtesting.
## Performance Features

### 1. Historical Data Caching

Already implemented, but not enabled by default.

```python
# Enable automatic caching for large datasets
store = CCXTStore(
    exchange='binance',
    cache_enabled=True,         # 90%+ speedup for backtesting
    cache_dir="./trading_data"  # Configurable location
)
```
Features:

- Monthly file segmentation for efficient storage
- Automatic deduplication and incremental updates
- Cross-exchange and multi-timeframe support
- Thread-safe operations for concurrent strategies
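To verify the caching speedup on your own setup, time a cold run against a warm one. The sketch below is generic: `load_history()` is a placeholder for whatever actually pulls data in your backtest (for example, a `cerebro.run()` over a feed backed by the cache-enabled store above).

```python
import time

def timed(label: str, fn) -> float:
    """Run fn() once and report wall-clock time."""
    start = time.perf_counter()
    fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return elapsed

def load_history():
    # Placeholder: run whatever pulls historical data in your backtest.
    ...

timed("cold run (cache being built)", load_history)
timed("warm run (served from cache)", load_history)
```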
### 2. Memory-Efficient Data Processing

```python
# Stream large datasets instead of loading everything at once
feed = CCXTDataFeed(
    store=store,
    symbol="BTC/USDT",
    ccxt_timeframe="1m",
    historical_limit=1000000
)
```
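If you post-process candles outside the feed, working through them in fixed-size chunks keeps memory flat regardless of dataset size; the generator below is plain Python, not a Cracktrader API.

```python
from typing import Iterable, Iterator, List, Tuple

Candle = Tuple[float, float, float, float, float, float]  # ts, open, high, low, close, volume

def in_chunks(candles: Iterable[Candle], size: int = 10_000) -> Iterator[List[Candle]]:
    """Yield fixed-size batches so only one chunk is resident in memory at a time."""
    batch: List[Candle] = []
    for candle in candles:
        batch.append(candle)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

# Usage:
# for chunk in in_chunks(candle_source):
#     process(chunk)
```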
### 3. Performance Monitoring

Track timings in your strategy using standard tools:

```python
import time

start = time.perf_counter()
results = cerebro.run()
elapsed = (time.perf_counter() - start) * 1000
print(f"Backtest runtime: {elapsed:.1f} ms")
```
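The one-off timing above can be made reusable; a small context manager (plain Python, not part of the library) lets you time individual components such as indicator updates inside `next()`.

```python
import time
from contextlib import contextmanager

@contextmanager
def timer(label: str):
    """Print how long the wrapped block took, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = (time.perf_counter() - start) * 1000
        print(f"{label}: {elapsed:.2f} ms")

# Usage inside a strategy's next():
# with timer("indicator update"):
#     recompute_indicators()
```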
## Bottleneck Analysis

### What's Fast
- Order processing paths
- Data validation and transformation
- WebSocket streaming
- Cached data access
### What's Slow
- Network calls (exchange dependent)
- Initial data loading without caching
- Complex indicators over large datasets
- Cold start imports in some environments
### Optimization Targets
- Historical data pipeline (90% solved with caching)
- Technical indicators for large datasets (Rust candidate)
- Network layer optimization (connection pooling)
- Memory usage for multi-million candle backtests
## Performance Tools

### Benchmarking Tool

Run comprehensive performance tests:

```bash
# Fast mock tests (in-memory)
python performance/bench.py --verbose

# Realistic sandbox tests (with network)
python performance/bench.py --sandbox --verbose

# Deep profiling with memory analysis
python performance/bench.py --profile --verbose
```

Output includes component timings, memory usage, and bottleneck hints.
### Performance Reports

All benchmarks generate detailed reports:

```text
performance/reports/
├── benchmark_mock_20250812.json        # Detailed metrics
├── memory_profile_mock_20250812.txt    # Memory analysis
├── latest_benchmark_mock.json          # Quick access
└── optimization_recommendations.json   # Action items
```
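To pull numbers out of a report programmatically, load the JSON and inspect its top-level structure first; the sketch below deliberately avoids assuming specific field names, since the report schema isn't documented here.

```python
import json
from pathlib import Path

report_path = Path("performance/reports/latest_benchmark_mock.json")
report = json.loads(report_path.read_text())

# Print the top-level keys and value types before relying on any specific field.
for key, value in report.items():
    print(f"{key}: {type(value).__name__}")
```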
## Optimization Roadmap

### Phase 1: Enable Existing Optimizations (Immediate)
- Enable caching by default for large datasets
- Document performance best practices
- Add configuration examples for different scales
### Phase 2: Data Pipeline Enhancement (Next)
- Parquet format support (5x smaller files, 10x faster loading)
- Streaming data processing for memory efficiency
- Vectorized technical indicators (see the sketch below)
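To illustrate what "vectorized technical indicators" buys, here is a simple moving average computed over the whole close series in one NumPy pass instead of a per-bar Python loop; this is an illustrative sketch, not the planned implementation.

```python
import numpy as np

def sma(closes: np.ndarray, period: int = 20) -> np.ndarray:
    """Simple moving average over the full series in one vectorized pass."""
    kernel = np.ones(period) / period
    return np.convolve(closes, kernel, mode="valid")  # one value per fully-populated window

closes = np.random.default_rng(0).normal(30_000, 500, size=1_000_000)
print(sma(closes).shape)  # ~1M averages computed without a Python-level loop
```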
### Phase 3: Rust Integration (Future)
- Technical indicator computation (100x speedup potential)
- Data I/O optimization
- Network layer improvements
## Performance Best Practices

### For Backtesting

```python
# ✅ Enable caching for repeated tests
store = CCXTStore(exchange_config, cache_enabled=True)

# ✅ Use appropriate historical limits
feed = CCXTDataFeed(historical_limit=50000)  # Not unlimited
```

```bash
# ✅ Profile your strategies
python performance/bench.py --profile
```
### For Live Trading

```python
# ✅ Monitor system health
cerebro.run(enable_health_monitoring=True)

# ✅ Use connection pooling
store = CCXTStore(exchange_config, connection_pool_size=10)

# ✅ Set appropriate log levels
import logging
logging.getLogger('cracktrader').setLevel(logging.WARNING)
```
### For Large Datasets

```python
# ✅ Stream data instead of loading everything at once
feed = CCXTDataFeed(streaming=True)

# ✅ Use monthly data segmentation
cache_config = {
    "segmentation": "monthly",  # Optimal for large datasets
    "compression": True,
    "format": "parquet"         # Future: 5x smaller than pickle
}
```
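Native Parquet support is on the roadmap above, but OHLCV data can already be converted by hand with pandas (pyarrow backend) to gauge the size and load-time difference; the frame below is illustrative.

```python
import pandas as pd

# Illustrative OHLCV frame; in practice, load it from your cache or an exchange export.
df = pd.DataFrame({
    "timestamp": pd.date_range("2025-01-01", periods=100_000, freq="min"),
    "open": 30_000.0,
    "high": 30_050.0,
    "low": 29_950.0,
    "close": 30_010.0,
    "volume": 1.5,
})

df.to_parquet("btc_usdt_1m.parquet")   # requires pyarrow (or fastparquet)
reloaded = pd.read_parquet("btc_usdt_1m.parquet")
print(len(reloaded), "rows reloaded")
```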
## Try It Yourself

- Run the benchmark: `python performance/bench.py --verbose`
- Enable caching: set `cache_enabled=True` in your store config
- Monitor a strategy: add `enable_health_monitoring=True` to `cerebro.run()`
- Profile bottlenecks: use the `--profile` flag for detailed analysis
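Putting the steps above together, here is a minimal end-to-end sketch; the import paths and the `adddata` wiring are assumptions based on the backtrader-style API shown earlier, so adapt them to your setup.

```python
import time
from cracktrader import CCXTStore, CCXTDataFeed, Cerebro  # import paths assumed

# Cache-enabled store and a bounded historical feed
store = CCXTStore(exchange="binance", cache_enabled=True, cache_dir="./trading_data")
feed = CCXTDataFeed(store=store, symbol="BTC/USDT",
                    ccxt_timeframe="1m", historical_limit=50_000)

# Backtrader-style wiring, run with health monitoring, timed end to end
cerebro = Cerebro()
cerebro.adddata(feed)
# cerebro.addstrategy(MyStrategy)  # add your strategy class here

start = time.perf_counter()
results = cerebro.run(enable_health_monitoring=True)
print(f"Run took {time.perf_counter() - start:.1f}s")
```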
Next: Benchmarking Guide | Large Datasets | Optimization Roadmap