I think it would be useful to more clearly define what is being tested here.
Are you testing the rate of player attribute decline when team training is not conducted?
Are you testing the rate of team cohesion/tactical familiarity decline when team training is not conducted?
Or something else?
Training, it seems to me, is multi-faceted. There is team training (which the original poster's example stopped), and there is also individual training (including additional focus and PPM training).
The rate of player attribute decline/improvement appears more closely related to individual training than team training, whereas team cohesion/tactical familiarity appears more closely related to team training.
If the goal is to test the rate of attribute decline, it's probably more appropriate to stop individual training instead.
If the goal is to test the rate of decline for team cohesion/tactical familiarity, then it would probably be useful to monitor those values over time as well as match results.
If the point is to examine the prevalence of injuries, things get a bit more complicated. You probably want to first establish a few baseline cases where the team trains normally and plays matches using a specific tactic over a period of time, and record how often injuries occur.
Then compare this baseline injury frequency to a few schemes where the team is not doing team training.
Repeat with a few schemes where the team is not doing team or individual training.
Doing this on the same save (i.e. test one case, document the results, reload to the first day of the game, test another case, etc.) is probably the best practice to ensure players have the same starting attributes (in case some players have random attributes/PA).
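To illustrate the baseline-vs-scheme comparison above, here's a minimal sketch of how you might check whether an observed difference in injury counts is more than noise. All the numbers and the function name are made up for illustration; it's just a standard two-proportion z-test on injuries per player-match.

```python
# Hypothetical sketch: comparing injury frequency between a baseline run
# (normal training) and a no-team-training run of the same save.
# The injury/exposure numbers below are invented for illustration.
from math import sqrt

def injury_rate_diff_z(injuries_a, exposure_a, injuries_b, exposure_b):
    """Two-proportion z-statistic for injury rates.

    exposure = total player-matches observed under that scheme,
    so the rate is injuries per player-match.
    """
    p_a = injuries_a / exposure_a
    p_b = injuries_b / exposure_b
    # Pooled rate under the null hypothesis that both schemes are the same
    p_pool = (injuries_a + injuries_b) / (exposure_a + exposure_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / exposure_a + 1 / exposure_b))
    return (p_a - p_b) / se

# Baseline: 14 injuries over 400 player-matches.
# No team training: 23 injuries over 400 player-matches.
z = injury_rate_diff_z(14, 400, 23, 400)
print(round(z, 2))  # |z| > 1.96 would suggest a real difference at the ~5% level
```

With sample sizes this small the test is underpowered, which is really the point: a season or two of one save often isn't enough player-matches to separate a genuine injury-rate change from randomness, so multiple reloads per scheme help.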
So I think it's important to clearly define what is being tested and to use an experiment design that puts the appropriate stress on the relevant game mechanics. This will probably be of great help to @Neil Brock and others at SI in making specific adjustments where necessary, rather than reviewing thousands of lines of code looking for a needle in a haystack.