REINFORCE Policy

1 source - 4 claims

In practice, the REINFORCE component mainly adjusted the feature retention fraction; per-gene penalty modulation played little material role. The policy had five parameters: four that produced per-gene scores and a fifth that controlled the feature retention fraction. The learned retention fraction was bounded between 0.25 and 0.90 and initialized at 0.575. The gene-selection gradient treats each gene's selection as an independent Bernoulli outcome, which is only an approximation because ElasticNet selections are correlated.
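The two mechanisms described above can be illustrated with a minimal sketch. It is not the original implementation. The sigmoid squashing of the fifth parameter is an assumption (the source gives only the bounds and the initial value), although it is consistent with the numbers: 0.575 is exactly the midpoint of 0.25 and 0.90, which is what a raw parameter of zero would produce. The function names `retention_fraction`, `bernoulli_score_grad`, and `reinforce_grad`, and the use of a scalar reward with a baseline, are hypothetical and chosen only for illustration.

```python
import math

# Bounds on the retention fraction, as stated in the text.
F_MIN, F_MAX = 0.25, 0.90


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def retention_fraction(raw: float) -> float:
    """Map an unconstrained parameter into [F_MIN, F_MAX].

    Assumption: a sigmoid squashing; the source does not name the mapping.
    """
    return F_MIN + (F_MAX - F_MIN) * sigmoid(raw)


def bernoulli_score_grad(selected, logits):
    """Gradient of the log-probability of a selection mask w.r.t. each logit.

    Under the independence assumption, log p(s) = sum_i log Bernoulli(s_i; p_i)
    with p_i = sigmoid(z_i), so d/dz_i log p(s) = s_i - p_i.
    """
    return [s - sigmoid(z) for s, z in zip(selected, logits)]


def reinforce_grad(selected, logits, reward, baseline=0.0):
    """REINFORCE estimate: (reward - baseline) * grad log p(s)."""
    advantage = reward - baseline
    return [advantage * g for g in bernoulli_score_grad(selected, logits)]


# A raw parameter of zero gives the midpoint of the bounds.
print(retention_fraction(0.0))
# Gene 0 was selected, gene 1 was not; both logits are zero.
print(reinforce_grad([1, 0], [0.0, 0.0], reward=2.0, baseline=1.0))
```

Because the per-gene terms are simply summed, this estimator ignores any coupling between selections. When ElasticNet tends to keep or drop correlated genes together, the true joint probability of a mask does not factor this way, which is the approximation the text refers to.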