Looking at a trader's P&L over a week or a month tells you almost nothing about their underlying skill. The variance in short-window outcomes for both skilled and unskilled traders is high enough that you can't distinguish them by P&L alone. A skilled trader can have a losing month. An unskilled trader can have a winning month. The interesting question is: what can you measure that actually predicts whether a trader will be profitable over the next 12 months?
This is the question the Trading Quality scoring engine at Arizet was built to answer. After three years of refinement against real prop firm flow at our B2B partners, the current engine uses 14 behavioral signals that, when composited into a single rating, correlate strongly with forward-looking trader outcomes. This article describes the 14 signals, why each one matters, and how to think about your own trading through this lens.
Why behavioral signals beat raw P&L
Three structural reasons:
- P&L is path-dependent and high-variance. A skilled trader with a 1.4 expectancy will have 5-day losing streaks. An unskilled trader can stack three wins in a row. Short-window P&L can't distinguish them.
- P&L can be gamed. If you grade traders only on P&L, traders learn to game it: martingale, lottery-ticket trades, leverage spikes. The leaderboard becomes a lottery winners list, not a skill ranking.
- P&L is the outcome, not the cause. Trading is a behavioral skill. The behaviors that produce sustainable P&L (risk management, consistency, discipline) are measurable directly. Measuring the cause is more predictive than measuring the outcome.
The Trading Quality score gives 50% weight to behavioral signals, 30% to competition results, 20% to consistency over time. The composite is far more predictive of forward-looking trader success than P&L alone.
The 14 signals
The signals fall into four categories. We don't disclose the exact mathematical formulation of each one: four of the signals are patent-pending, and we keep the implementation details closed to protect that advantage. The concepts themselves are general, and the descriptions below are accurate.
Category A: Risk management signals (4 signals)
Signal 1. Position sizing consistency. Does the trader size their positions similarly across similar setup types, or do they vary sizing wildly based on emotional conviction? Skilled traders are remarkably consistent in sizing; unskilled traders vary sizing 3-5x between trades. We measure the coefficient of variation of position sizes within each setup category. Lower = better.
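As a rough illustration of the coefficient-of-variation idea (not the production formulation), the sketch below groups trades by setup and divides the standard deviation of position size by its mean. The `(setup_label, position_size)` input shape and the function name are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean, pstdev

def sizing_cv_by_setup(trades):
    """Return {setup: coefficient of variation of position size}.

    `trades` is an iterable of (setup_label, position_size) pairs; this
    input shape is hypothetical, chosen only for the illustration.
    """
    sizes = defaultdict(list)
    for setup, size in trades:
        sizes[setup].append(size)
    # CV = population standard deviation / mean. Setups with fewer than
    # two trades are skipped because their CV is not informative.
    return {
        setup: pstdev(vals) / mean(vals)
        for setup, vals in sizes.items()
        if len(vals) >= 2 and mean(vals) > 0
    }

trades = [
    ("breakout", 1.0), ("breakout", 1.0), ("breakout", 1.2),
    ("pullback", 0.5), ("pullback", 2.5),
]
cv = sizing_cv_by_setup(trades)
```

In this toy data the breakout trades come out near 0.09 and the pullback trades near 0.67, so the pullback sizing would read as far less consistent.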
Signal 2. Stop discipline. Does the trader place stops, and do they respect them when hit? We measure (a) percentage of trades with a defined stop loss at entry, (b) percentage of stopped-out trades where the stop was honored vs. moved against the trader. Skilled traders honor stops 95%+ of the time; unskilled traders move stops on 30-50% of losing trades, turning small losses into large ones.
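The two percentages in (a) and (b) could be computed along these lines. This is a minimal sketch: the `Trade` fields are hypothetical, and deciding whether a stop was "moved against the trader" would in practice require order-modification history that is simply assumed to be available here.

```python
from dataclasses import dataclass

@dataclass
class Trade:
    had_stop_at_entry: bool  # a stop loss was defined when the trade opened
    stopped_out: bool        # the trade closed at its stop level
    stop_moved_away: bool    # the stop was widened against the trader

def stop_discipline(trades):
    """Return (share of trades with an entry stop, share of stop exits honored)."""
    with_stop = [t for t in trades if t.had_stop_at_entry]
    stopped = [t for t in with_stop if t.stopped_out]
    honored = [t for t in stopped if not t.stop_moved_away]
    pct_with_stop = len(with_stop) / len(trades) if trades else 0.0
    pct_honored = len(honored) / len(stopped) if stopped else 0.0
    return pct_with_stop, pct_honored

trades = [
    Trade(True, True, False),
    Trade(True, True, True),    # stop was widened before the exit
    Trade(True, False, False),
    Trade(False, False, False), # no stop at entry
    Trade(True, True, False),
]
pct_with_stop, pct_honored = stop_discipline(trades)
```

Here four of five trades carry an entry stop, and two of the three stopped-out trades honored the original stop.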
Signal 3. Maximum drawdown discipline. When a drawdown reaches 80% of the trader's typical maximum drawdown, what happens? Skilled traders reduce sizing or pause; unskilled traders maintain or increase sizing in an attempt to recover. We measure behavior in these stress windows.
Signal 4. Daily risk capping. Does the trader stop trading after hitting a personal daily loss limit, or do they keep going? We measure the empirical distribution of daily loss outcomes; skilled traders have a sharp cutoff in their daily loss distribution, suggesting active stopping. Unskilled traders have a long left tail.
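One crude way to see the difference between a sharp cutoff and a long left tail is to compare the worst losing day with the median losing day. The ratio below is a hypothetical proxy invented for this illustration, not the measure the engine uses.

```python
from statistics import median

def daily_loss_tail_ratio(daily_pnls):
    """Worst losing day divided by the median losing day (both as positive sizes).

    A value near 1 suggests losses pile up at a cap; a large value suggests
    a long left tail. Hypothetical proxy, not a standard metric.
    """
    losses = [-p for p in daily_pnls if p < 0]
    if not losses:
        return None  # no losing days to assess
    return max(losses) / median(losses)

capped = [120, -100, 80, -95, -100, 60, -98]    # losses cluster near 100
uncapped = [120, -50, 80, -80, -400, 60, -60]   # one day runs far past the rest

capped_ratio = daily_loss_tail_ratio(capped)
uncapped_ratio = daily_loss_tail_ratio(uncapped)
```

The capped series gives a ratio just above 1, while the uncapped series gives a ratio above 5.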
Category B: Execution quality signals (4 signals)
Signal 5. Setup selectivity. Does the trader take 100 trades per week or 10? More importantly, do they show measurable patience between trades? We measure the inter-trade time distribution. Skilled traders show clear "waiting for setup" patterns; unskilled traders show forced entries clustered together.
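The inter-trade time distribution can be inspected with something as simple as the gaps between consecutive entries. The sketch below is illustrative only; the 5-minute clustering threshold is an arbitrary assumption.

```python
from datetime import datetime

def inter_trade_gaps_minutes(entry_times):
    """Minutes between consecutive trade entries, in chronological order."""
    ordered = sorted(entry_times)
    return [(b - a).total_seconds() / 60 for a, b in zip(ordered, ordered[1:])]

def clustered_share(gaps, threshold_minutes=5.0):
    """Fraction of gaps shorter than the (assumed) clustering threshold."""
    return sum(g < threshold_minutes for g in gaps) / len(gaps) if gaps else 0.0

entries = [
    datetime(2024, 1, 8, 9, 30),
    datetime(2024, 1, 8, 9, 32),
    datetime(2024, 1, 8, 9, 33),
    datetime(2024, 1, 8, 13, 0),
]
gaps = inter_trade_gaps_minutes(entries)
share = clustered_share(gaps)
```

Three entries within three minutes followed by a long pause would show up as a high clustered share, the "forced entries" pattern described above.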
Signal 6. Entry timing. When the trader enters a trade, where does price typically go in the next 5-30 minutes? Skilled traders' entries are followed by favorable moves more often than chance would predict. Unskilled traders' entries cluster near short-term turning points (they buy local highs and sell local lows).
Signal 7. Exit timing. On both winners and losers, where does the trader exit relative to the optimal exit? Skilled traders capture 40-60% of the available move on average; unskilled traders capture 15-30%. The gap between "entry quality" and "exit quality" is also revealing: many traders have decent entries but cut winners early and let losers run.
Signal 8. Average R-multiple. The ratio of the typical winning trade to the typical losing trade. Most retail traders have an R-multiple of 0.7-1.2, meaning their typical winner is roughly the same size as, or smaller than, their typical loser. Skilled traders are at 1.4-2.5. The ratio is simple to calculate but often miscalculated; we use the median, not the mean, so that outlier trades don't distort it.
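To see why the choice of median over mean matters, here is a small sketch that computes a win-to-loss size ratio both ways. It takes one plausible reading of "profit-to-loss ratio" (typical winner divided by typical loser); the function name and sample data are made up for the example.

```python
from statistics import mean, median

def win_loss_ratio(pnls, center=median):
    """Typical winning trade divided by typical losing trade (as positive sizes)."""
    wins = [p for p in pnls if p > 0]
    losses = [-p for p in pnls if p < 0]
    if not wins or not losses:
        return None  # ratio is undefined without both winners and losers
    return center(wins) / center(losses)

# One outsized winner (2000) among otherwise ordinary trades.
pnls = [120, 150, 90, 2000, -100, -110, -90]

median_ratio = win_loss_ratio(pnls)               # uses the median
mean_ratio = win_loss_ratio(pnls, center=mean)    # uses the mean
```

With this data the median-based ratio is 1.35, while the mean-based ratio is 5.9, driven almost entirely by the single large winner.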
Category C: Behavioral / psychological signals (3 signals)
Signal 9. Revenge trading detection. After a losing trade, does the trader's next-trade size increase? Does the next-trade-time-window compress? Both are signs of revenge trading. Skilled traders show no statistical correlation between previous-trade outcome and next-trade size. Unskilled traders show a strong correlation. This is one of the four patent-pending signals.
Signal 10. Overconfidence cycle detection. After a winning streak, does the trader's risk-taking increase? Most traders increase sizing after a winning streak (overconfidence), then give back the gains in a drawdown. We measure the rolling-window correlation between recent P&L and recent risk-taking; a flat correlation indicates skilled behavior.
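The general idea behind signals 9 and 10, correlating what just happened with how much risk comes next, can be sketched with a plain Pearson correlation. This is a generic illustration and not the patent-pending formulation; the function names, the 3-trade window, and the use of position size as the risk measure are all assumptions.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series has no variation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def revenge_corr(pnls, sizes):
    """Signal 9 idea: was the previous trade a loss, vs. size of the next trade."""
    prev_was_loss = [1.0 if p < 0 else 0.0 for p in pnls[:-1]]
    return pearson(prev_was_loss, sizes[1:])

def overconfidence_corr(pnls, sizes, window=3):
    """Signal 10 idea: trailing P&L over `window` trades vs. size of the next trade."""
    trailing = [sum(pnls[i - window:i]) for i in range(window, len(pnls))]
    return pearson(trailing, sizes[window:])

# Sizes jump after every loss: strong positive correlation.
r_revenge = revenge_corr(
    [50, -40, 30, -60, 20, -30, 10],
    [1.0, 1.0, 2.0, 1.0, 2.5, 1.0, 2.0],
)
# Sizes grow after a winning run and shrink after losses.
r_overconf = overconfidence_corr(
    [10, 20, 30, 40, -50, -60, 10, 20],
    [1, 1, 1, 2, 3, 1, 1, 1],
)
# Constant sizing: no relationship to measure.
r_flat = revenge_corr([50, -40, 30, -60, 20, -30, 10], [1.0] * 7)
```

Both toy traders produce correlations above 0.8, whereas the constant-size trader scores 0.0, which matches the "flat correlation" pattern described for skilled behavior.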
Signal 11. Time-of-day discipline. Does the trader trade during their personally most-profitable hours, or do they trade randomly through the day? Most traders have measurably better outcomes during 1-2 specific hourly windows but trade through all session hours, diluting their edge.
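A trader can check this on their own history by bucketing P&L by entry hour and seeing how much of their activity falls in the best buckets. The sketch below assumes trades are available as `(hour, pnl)` pairs, and it picks the "best" hours from the same data it evaluates, so it overstates the edge compared with a proper out-of-sample check.

```python
from collections import defaultdict

def pnl_by_hour(trades):
    """Sum P&L per entry hour. `trades` is a list of (hour, pnl) pairs."""
    totals = defaultdict(float)
    for hour, pnl in trades:
        totals[hour] += pnl
    return dict(totals)

def share_in_best_hours(trades, top_n=2):
    """Return (set of the top_n hours by total P&L, share of trades taken in them)."""
    totals = pnl_by_hour(trades)
    best = set(sorted(totals, key=totals.get, reverse=True)[:top_n])
    in_best = sum(1 for hour, _ in trades if hour in best)
    return best, in_best / len(trades)

trades = [(9, 100), (9, 80), (10, 60), (13, -40), (14, -70), (15, -20)]
best_hours, share = share_in_best_hours(trades)
```

In this example all of the profit comes from the 9:00 and 10:00 hours, yet only half of the trades are taken there.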
Category D: Long-term consistency signals (3 signals)
Signal 12. Performance variance across regimes. Does the trader perform consistently across different market regimes (trend, range, high-vol, low-vol), or do they have one regime they do well in and three they break down in? We measure performance by regime tag.
Signal 13. Strategy adherence over time. Does the trader follow the strategy they wrote down 90 days ago, or have they drifted? Hard to measure directly; we use a proxy of "trade similarity to historical setup distribution." Sharp deviations indicate strategy-hopping.
Signal 14. Recovery from drawdowns. When the trader has a 10%+ drawdown, what's the path back? Skilled traders recover with the same strategy at the same sizing; unskilled traders go quiet for a few weeks, then return with a different strategy. We measure post-drawdown trading behavior.
How the composite score works
Each signal is normalized to a 0-100 scale based on percentile across the trader population. The composite Trader Rating is a weighted average:
| Category | Weight | What it captures |
|---|---|---|
| Risk management (signals 1-4) | 30% | Whether the trader has hard rules and follows them |
| Execution quality (signals 5-8) | 25% | Whether trades are entered and exited with edge |
| Behavioral signals (9-11) | 25% | Whether emotional state corrupts decision-making |
| Long-term consistency (12-14) | 20% | Whether the trader is robust across regimes and time |
The composite is then scaled to the 0-10,000 range that traders see as their Trader Rating.
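The percentile normalization, category weighting, and final scaling can be sketched as follows. The category weights come from the table above; equal weighting of signals within a category and a simple linear 0-100 to 0-10,000 scaling are assumptions, since neither is specified here.

```python
# Weights from the table; signal groupings follow categories A-D.
CATEGORY_WEIGHTS = {"risk": 0.30, "execution": 0.25, "behavioral": 0.25, "consistency": 0.20}
CATEGORY_SIGNALS = {
    "risk": [1, 2, 3, 4],
    "execution": [5, 6, 7, 8],
    "behavioral": [9, 10, 11],
    "consistency": [12, 13, 14],
}

def percentile_score(value, population):
    """Percent of the population with a raw value at or below `value` (0-100).

    For "lower is better" signals the raw value would need to be inverted
    first; that step is omitted here.
    """
    return 100.0 * sum(v <= value for v in population) / len(population)

def trader_rating(signal_scores):
    """signal_scores: {signal number: 0-100 score}. Returns a 0-10,000 rating."""
    composite = 0.0
    for category, weight in CATEGORY_WEIGHTS.items():
        ids = CATEGORY_SIGNALS[category]
        # Assumption: signals within a category are weighted equally.
        category_avg = sum(signal_scores[i] for i in ids) / len(ids)
        composite += weight * category_avg
    # Assumption: linear scaling from 0-100 to 0-10,000.
    return round(composite * 100)

scores = {i: 80 for i in CATEGORY_SIGNALS["risk"]}
scores.update({i: 60 for i in CATEGORY_SIGNALS["execution"]})
scores.update({i: 40 for i in CATEGORY_SIGNALS["behavioral"]})
scores.update({i: 50 for i in CATEGORY_SIGNALS["consistency"]})
rating = trader_rating(scores)
```

For this hypothetical trader the weighted average is 0.30 × 80 + 0.25 × 60 + 0.25 × 40 + 0.20 × 50 = 59, which scales to a rating of 5,900.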
How well does this predict future outcomes?
The honest answer based on our backtested data: materially better than P&L over short windows, but still imperfect. Forward-looking 12-month P&L correlates with current Trader Rating at r ≈ 0.42 in our data. That's far higher than the correlation between trailing-90-day P&L and forward 12-month P&L (r ≈ 0.18 in the same population), but it's still well below a perfect predictor. Trading remains a partially stochastic activity; even the best signals can't fully predict outcomes.
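One way to put these correlations in perspective is to square them: under a simple linear relationship, r² approximates the share of variance in forward P&L that the predictor accounts for. The snippet below is only that arithmetic applied to the two figures quoted above.

```python
def variance_explained(r):
    """Share of variance accounted for by a linear predictor with correlation r."""
    return r ** 2

rating_r2 = variance_explained(0.42)  # Trader Rating vs. forward 12-month P&L
pnl_r2 = variance_explained(0.18)     # trailing 90-day P&L vs. forward 12-month P&L
```

That works out to roughly 18% of variance for the rating against roughly 3% for trailing P&L: a large relative improvement, with most of the variance still unexplained.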
What this means practically: a trader with rating 8,000 is dramatically more likely to be profitable over the next 12 months than a trader with rating 4,000, but the rating-8,000 trader can still have a bad year, and the rating-4,000 trader can occasionally have a good one. The signals are informative, not deterministic.
What to do with this if you're a trader
If you're on a platform that gives you visibility into your component signal scores (the Desk's Pro tier and above show signal-level breakdowns), the practical use is straightforward: identify your two weakest signals and engineer your routine to improve them specifically.
If you're a typical struggling retail trader, you'll likely score poorly on signals 2 (stop discipline), 9 (revenge trading), and 14 (recovery from drawdowns). The good news is that these are all behavioral, not technical. They don't require you to learn a new strategy or a new market, just to add specific friction to your decision-making process.
The deeper point
Most retail traders fail not because they lack edge in their strategy but because they don't execute their strategy with consistency. The Trading Quality framework isolates the components of consistent execution. Improving on the framework is more productive than chasing the next strategy.
The signals above also explain why the Trader Rating can't be gamed by lottery-ticket trades, martingale, or other behaviors that produce short-term P&L spikes. The behavioral signals catch those patterns immediately. A single oversized lottery trade tanks signal 1 (position sizing consistency) and signal 4 (daily risk capping) within hours. The composite rating responds, the trader sees the rating drop, and the lesson is learned without a blown account.