Using Machine Learning to Predict Current Ability

CAE82 · August 20, 2023

Background

I decided to see if I could use machine learning (ML) to accurately predict the current ability (CA) of players based upon their attribute values and position ratings. I'd previously attempted to calculate CA using the attribute weights found in the pre-game editor. This was relatively successful but only worked well for players with a single, natural position. When a player was able to play multiple positions, I wasn't able to work out analytically how the different position weights combined to create the overall CA. However, this type of task is what machine learning algorithms excel at.

Machine Learning

The task is to supply a dataset of input values (or 'features' in ML parlance) as well as a corresponding output value (or 'target'). The ML algorithm will then learn how to map the features to the target. Once the model is trained, you can then provide the model with a set of features and it will predict the target value.

In terms of the Football Manager current ability task, we need to supply a sample of players to the ML algorithm (their attribute values and position ratings) along with their current ability. The model will then be trained on this sample of players and will learn how the attribute values and positions map to current ability. Once the model is trained, we can then feed in the attribute and position values of a player and the model will predict the current ability.

The particular task is a regression problem - if you are interested in more of the background and implementation details then you can search for 'support vector regression' (SVR). This is a type of 'supervised machine learning' and all this means is that we provide the model with the training data from which it learns rather than the algorithm 'teaching itself'.

I wrote the code in Python using the 'scikit-learn' machine learning library.
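The thread does not include the actual code, so the following is only a minimal sketch of what fitting an SVR with scikit-learn can look like. The data is synthetic, the five attributes and their weights are made up (they are not the real CA formula), and the hyperparameters `C` and `epsilon` are assumptions rather than the values used for the model described here.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in for the exported players: 500 players, 5 made-up attributes (1-20).
X = rng.integers(1, 21, size=(500, 5)).astype(float)

# Made-up target: a weighted sum of the attributes, NOT the real CA formula.
true_weights = np.array([3.0, 2.0, 1.5, 1.0, 0.5])
y = X @ true_weights

# Scale the features, then fit support vector regression with an RBF kernel.
# C and epsilon are illustrative guesses, not tuned values.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=0.5))
model.fit(X, y)

# Predict the target for one new (hypothetical) player.
new_player = np.array([[15, 12, 14, 10, 8]], dtype=float)
predicted_ca = model.predict(new_player)[0]
print(f"Predicted value: {predicted_ca:.1f}")
```

Scaling the inputs before the SVR is a common choice because kernel methods are sensitive to feature ranges; whether the original model did this is not stated.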

Training Data

By using a modified version of a 3rd party scouting tool, I was able to export the players along with their attribute values, position ratings and current ability from a save game. This amounted to around 28,000 players. In the first instance I have focussed on outfield players so after filtering out the goalkeepers, I was left with around 25,000 players. This sample of players is further (randomly) split into two groups: 75% of the players act as the training data (the players from which the SVR algorithm learns) and the remaining 25% act as 'unseen' test data. These are the players that are used to test the accuracy of the model.
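The filtering and 75/25 split described above could be sketched as follows. The arrays are random placeholders for the exported save-game data, and the `is_goalkeeper` flag is a hypothetical column; how goalkeepers were actually identified in the export is not stated.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Placeholder data standing in for the exported players.
n_players = 1000
X = rng.integers(1, 21, size=(n_players, 5)).astype(float)  # attribute values
y = rng.integers(1, 201, size=n_players).astype(float)      # current ability
is_goalkeeper = rng.random(n_players) < 0.1                 # hypothetical flag

# Keep outfield players only.
X_out, y_out = X[~is_goalkeeper], y[~is_goalkeeper]

# Random 75% / 25% split into training and unseen test players.
X_train, X_test, y_train, y_test = train_test_split(
    X_out, y_out, test_size=0.25, random_state=42
)
print(len(X_train), "training players,", len(X_test), "test players")
```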

A histogram of the CA distribution of the players is shown below. Note the very few players with high values of CA. This has implications later for predicting the CA of top players; they are essentially outliers to the model so there is not much data for the model to be trained on.

Model Accuracy

Surprisingly, the model only took around 5 minutes to train on my fairly standard laptop. After playing with the model parameters, I was able to obtain a model accuracy of 98%. A plot showing the target CA against the predicted CA is shown below.

Each blue circle represents a player and the red line represents the situation in which the predicted CA is exactly equal to the target CA. In an ideal world, all the blue circles would lie on that red line. Note the small group of players at 175+. Despite the fact there are only a few of them, the model still accurately predicts their CA.
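The post does not say how the 98% figure was measured. For a regression model, one plausible reading (an assumption here) is the R² score on the unseen players, which is what scikit-learn regressors report by default. A small sketch with purely illustrative target and predicted values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Illustrative target and predicted CA values for a handful of test players.
y_true = np.array([152.0, 186.0, 120.0, 95.0, 140.0])
y_pred = np.array([151.0, 180.0, 123.0, 97.0, 138.0])

# R^2: share of the variance in target CA explained by the predictions.
r2 = r2_score(y_true, y_pred)

# Mean absolute error: average miss in CA points, often easier to interpret.
mae = mean_absolute_error(y_true, y_pred)

print(f"R^2 = {r2:.3f}, MAE = {mae:.1f} CA points")
```

Reporting the error in CA points alongside R² makes it easier to judge how far off an individual prediction is likely to be.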

Examples

I used the model to predict the CA of a few specific players. The first player I tested was Ridle Baku.

He is a player with multiple positions which has a strong impact on his CA. His recommended CA in the test save is 152. The ML model predicted a CA of 151. Very promising!

The next player to test was Kevin De Bruyne.

He too can play multiple positions, has a strong weaker foot and is one of the 'outliers' at the top-end of the current ability range. His recommended CA in the test save is 186. The ML model predicted a CA of 180. Not bad for such an outlier!

I hope you find this article interesting and maybe it will help you to better understand how machine learning can be used.

CAE

Edited August 27, 2023 by CAE82
Change title to better reflect content

enigmatic · August 20, 2023

You probably already know this, but CA is a weighted average of attribute scores (technically attribute scores on a 200 point scale with the displayed values from 1-20 being rounded), so only the slight nonlinearity of the model stops it from being relatively trivial to figure out the weights by position from some sort of regression.

As I understand it, where a player is 20 out of 20 in multiple positions, the weight for each attribute is the highest weight of any of the positions a player can play, e.g. a natural DC/ST weights marking as important like a DC and finishing as important like an ST. I assume this is scaled for positions a player has 15 or 18 ability in.
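The "highest weight" idea above can be written down as a short sketch. The weights are invented for illustration (they are not the editor's values), and scaling each position's weights by rating/20 is just one way to read the "scaled for 15 or 18" remark, not a confirmed rule.

```python
import numpy as np

attributes = ["marking", "finishing", "pace"]

# Hypothetical per-position attribute weights (NOT the pre-game editor values).
weights = {
    "DC": np.array([8.0, 1.0, 5.0]),
    "ST": np.array([1.0, 8.0, 7.0]),
}

def combined_weights(position_ratings, weights):
    """For each attribute, take the highest weight across the player's positions.

    Each position's weights are first scaled by rating / 20 (an assumption).
    """
    scaled = [weights[pos] * (rating / 20.0) for pos, rating in position_ratings.items()]
    return np.max(scaled, axis=0)

# Natural in both positions: marking weighted like a DC, finishing like an ST.
natural_dc_st = combined_weights({"DC": 20, "ST": 20}, weights)

# Natural DC with a 15 rating at ST: the ST weights are scaled down to 75%.
partial_st = combined_weights({"DC": 20, "ST": 15}, weights)

print(dict(zip(attributes, natural_dc_st)))
print(dict(zip(attributes, partial_st)))
```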

Ability with the weaker foot has a relatively high weighting towards CA too, and some attributes like Determination and Natural Fitness have zero weights (although they're probably correlated with CA anyway which might confound the model a little). This might explain the [rest of the] prediction error in your model.

Edited August 20, 2023 by enigmatic

CAE82 · August 21, 2023

Thanks for your comment @enigmatic. I think the attributes are actually 1-100 based on using CheatEngine to load the RAM in FM and having a poke around. Could be wrong though.

Unfortunately I do not think the weights work like that (highest weight) based on my testing; I could not get decent results trying lots of different weight combinations. I do think the positions are grouped. So a DL becoming a D/WBL does not make a big difference due to the similarity in the positions. Similarly, if a DL becomes a DRL, playing on both sides doesn't seem to cause an obvious increase in CA either. In the end I gave up and hence used ML.

I actually only included the features that are known to contribute to CA; so the ML model does not include things like Determination, Flair and so on.

I also think there is some non-linearity in the model for high-valued attributes i.e. the weights are not constant for the whole 1-20 range. It seems that going from 12-16 does not cause as big a CA increase as going from 16-20 for example. Based on my earlier experiments I know that 6 in all attributes corresponds to a CA of 1 and 16's in all attributes corresponds to a CA of 200. So I presume that the range of 16-20 is treated somewhat differently.
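The two reference points quoted above (6 in all attributes gives CA 1, 16 in all attributes gives CA 200) pin down a straight line if one assumes the relationship between them is linear and the result is clamped to the 1-200 range. Both of those are assumptions made for this worked example, not known properties of the game.

```python
def uniform_ca(attribute_value: float) -> float:
    """Estimated CA for a player whose CA-relevant attributes all share one value.

    Linear interpolation through (6, 1) and (16, 200), clamped to 1-200.
    """
    # 199 CA points are spread over 10 attribute points: 19.9 CA per point.
    ca = 1.0 + (attribute_value - 6.0) * 199.0 / 10.0
    return max(1.0, min(200.0, ca))

for value in (6, 10, 16, 20):
    print(value, round(uniform_ca(value), 1))
```

Under this straight-line reading, every value above 16 is clipped at the cap, which is consistent with the suspicion that the 16-20 range has to be handled differently for real players.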

CAE82 · August 27, 2023

A simple tool to test is here:

https://replit.com/@fmcae/FMCAP?v=1

enigmatic · August 27, 2023

On 21/08/2023 at 10:20, CAE82 said:

Unfortunately I do not think the weights work like that (highest weight) based on my testing; I could not get decent results trying lots of different weight combinations. I do think the positions are grouped. So a DL becoming a D/WBL does not make a big difference due to the similarity in the positions. Similarly, if a DL becomes a DRL, playing on both sides doesn't seem to cause an obvious increase in CA either. In the end I gave up and hence used ML.

Highest weight is why the weights work like that in your testing: a DR has the exact same weight as a DL so taking the highest weight of both positions doesn't affect CA. WB weights are very similar to fullback weights [WB slightly higher for dribbling and slightly lower for concentration IIRC] so a D/WBL will have similar CA to the equivalent DL. But a good finishing rating will not have much effect on a DL and quite a bit more on a D/AML because it will take fullback weightings for defensive attributes and winger weightings for finishing.

It might be difficult to estimate for players with 15-19 positional ratings though as the weights will be <100%

CAE82 · August 27, 2023

It’s not a simple highest weight, even for a DC/MC/SC. For two positions, the CA seems to be the same as the highest CA for each individual position. So if you had a player who was a 120CA DC and a 130CA MC, he would have a 130CA as a D/MC. It seems to be when you add in a 3rd position then the CA changes significantly. But it doesn’t appear to be simply from the highest of each of the three weights.

Anyhow, it doesn’t really matter the exact formula but we all know that playing in multiple positions will lead to a larger CA for a given set of attributes, thus leaving less growth until the PA is reached.

herne79 · August 27, 2023

50 minutes ago, CAE82 said:

Anyhow, it doesn’t really matter the exact formula but we all know that playing in multiple positions will lead to a larger CA for a given set of attributes, thus leaving less growth until the PA is reached.

It’s not that straightforward. Sure, if you use an editor to force a player to become “Natural” (or lower) in a new position we may see an uplift in CA so long as the player has relevant attributes for that new position which would affect weightings.

But in reality, getting a player to learn a new position - and thus potentially impact CA - depends on a large number of conditions such as age, playing time, versatility, attribute suitability and so on. Thus the impact on CA of position training isn’t necessarily as large as is often thought.

CAE82 · August 27, 2023

I’m not sure you’re correct there @herne79. I’ve never seen an ‘uplift’ in CA after a player gaining new positions (naturally or ‘forced’ through the editor). It usually seems to be the (R)CA stays the same and the attributes redistribute accordingly.

But of course I respect and appreciate your insight and will bear it in mind.

After more than a quarter of a century playing (CM) FM it’s regrettable that the real interest and enjoyment comes from these kinds of toy experiments and I hope that FM24/25 bring in some genuinely new features that actually make the game interesting again.

herne79 · August 27, 2023

1 hour ago, CAE82 said:

I’m not sure you’re correct there @herne79. I’ve never seen an ‘uplift’ in CA after a player gaining new positions (naturally or ‘forced’ through the editor). It usually seems to be the (R)CA stays the same and the attributes redistribute accordingly.

But of course I respect and appreciate your insight and will bear it in mind.

After more than a quarter of a century playing (CM) FM it’s regrettable that the real interest and enjoyment comes from these kinds of toy experiments and I hope that FM24/25 bring in some genuinely new features that actually make the game interesting again.

This might help (quote from SI):

https://community.sigames.com/forums/topic/520654-new-position-training-and-current-ability/?do=findComment&comment=12411290

CAE82 · August 28, 2023

Thanks. Have seen that before. Lots of interesting stuff posted by Seb Wassell, pity he doesn’t seem to post a lot any more. Maybe he left SI. I used to spend a long time on the ‘Developers Posts’ tab going through and reading all their posts to try and glean any insights into the game. But now seems to be mostly off-topic stuff.

wazzaflow10 · August 29, 2023

On 27/08/2023 at 13:52, CAE82 said:

It’s not a simple highest weight, even for a DC/MC/SC. For two positions, the CA seems to be the same as the highest CA for each individual position. So if you had a player who was a 120CA DC and a 130CA MC, he would have a 130CA as a D/MC. It seems to be when you add in a 3rd position then the CA changes significantly. But it doesn’t appear to be simply from the highest of each of the three weights.

Anyhow, it doesn’t really matter the exact formula but we all know that playing in multiple positions will lead to a larger CA for a given set of attributes, thus leaving less growth until the PA is reached.

Not really sure what you're trying to predict here. If you mean explain CA that makes more sense as an experiment. Predicting CA from current attributes would be like "predicting height in centimeters using inches". I think what you're trying to say is you want to find out that cm = 2.54 * in.

CA is a known value with a known, albeit complicated, formula you can observe from the editor. This would be more a "fun learn how regression works in stats 101" model since we should be able to extract the beta weights of attributes to match the editor values exactly if we have the right formula. If we don't, we know the weights are suffering from omitted variable bias or multicollinearity or the wrong specification. TBH you could get away with a boring old linear regression. I'd personally start with

CA ~ attributes + binaries of positions or the values provided for awkward/competent/accomplished/natural + footedness + an interaction between attributes and position + interaction between attributes and footedness + interaction between position and footedness + interaction between attributes, position and footedness.

I'm probably forgetting some things that affect CA but that's how I would start if this were a real experiment with unknown inputs.

CAE82 · August 29, 2023

The CA formula is not really known @wazzaflow10 (unless you have inside information from SI). I’ve worked it out for a player with a single position and ‘average’ attributes to a pretty good approximation, but it was too complex to work out for multiple positions and it had some flaws for high attribute values (for example, players with say 19 for acceleration and pace) which hints at some non-linearity. Indeed, the ‘boring old’ linear kernel didn’t work well (neither did the polynomial one) and I got the best results using the RBF kernel.
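A kernel comparison of the kind mentioned above could be run along these lines. This is only a sketch on synthetic data with a made-up, capped target; which kernel comes out ahead here says nothing about the real player data, and `C=10.0` is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(3)

# Synthetic players: 4 made-up attributes on the 1-20 scale.
X = rng.integers(1, 21, size=(400, 4)).astype(float)

# Made-up target: a weighted sum capped at 200 to introduce some non-linearity.
y = np.minimum(200.0, X @ np.array([5.0, 4.0, 3.0, 2.0]))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit one SVR per kernel and record R^2 on the held-out players.
scores = {}
for kernel in ("linear", "poly", "rbf"):
    model = make_pipeline(StandardScaler(), SVR(kernel=kernel, C=10.0))
    model.fit(X_train, y_train)
    scores[kernel] = model.score(X_test, y_test)

for kernel, score in scores.items():
    print(f"{kernel:6s} R^2 = {score:.3f}")
```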

Now with the model you just supply the attributes that contribute to CA, position ratings and weaker foot (47 inputs in total) and it does a pretty good job of predicting CA without any a priori knowledge of the attribute weights. I’ve spoken with colleagues who are experts in deep learning and I know about physics-informed models (i.e. providing the model with a priori knowledge of attribute weights) but do not have enough knowledge (or motivation) to try and implement it.

I’m not sure what you mean by multicollinearity but if you are suggesting that the inputs are not independent when it comes to CA then I’ve never read or seen anything to suggest that is the case? I’d be interested to hear more. Of course, I’d also be interested to see the outcome of your ‘real experiment’ if you ever find the time or motivation.

CAE82 · August 29, 2023

Actually, regarding multicollinearity, I guess you are referring to the fact that the attribute weights are affected by the position ratings? So the position rating inputs essentially affect the attribute inputs too?

wazzaflow10 · August 30, 2023

7 hours ago, CAE82 said:

The CA formula is not really known @wazzaflow10 (unless you have inside information from SI).

If you open the pre game editor you can see how much weight an attribute has on CA points. That is known and the value of that weight depends on position (and possibly a few other things I'm not aware of). Position matters a great deal as it determines how much CA is used by an attribute as enigmatic has pointed out. If that weight is zero then that attribute should yield no input into determining the CA of a player.

https://www.fmscout.com/a-guide-to-current-ability-in-football-manager.html?pg=1

I'm actually assuming you've posted this article given the name... which makes this post strange to me since you're aware that weightings are different.

I assume this value ultimately is what you are trying to get at in the model. If it isn't I'm not really sure what this ML exercise is trying to accomplish since attributes are what determine CA. It's not really predicting anything since you're using the thing that directly determines CA to predict CA. You should be able to get CA exactly for every player with the right formula. Again it'd be like using inches to determine how tall someone is in centimeters. Contrast this with say a scout report (that you codify into values) or perhaps the stats the game produces where there isn't a direct relationship between the target variable and the inputs. So a prediction in height would be using relatives of someone or height at a younger age.

7 hours ago, CAE82 said:

but it was too complex to work out for multiple positions and it had some flaws for high attribute values (for example, players with say 19 for acceleration and pace) which hints at some non-linearity. Indeed, the ‘boring old’ linear kernel didn’t work well (neither did the polynomial one) and I got the best results using the RBF kernel.

Well this could just be plain old not enough data rather than non-linearity. As for a linear regression not working, I would suspect that it is due to the wrong specification rather than the wrong choice of model, since again we can see the attribute weighting on CA in the editor. The only thing I'm unaware of for sure is whether going from say 10 pace to 11 pace costs less CA than going from 15 to 16, so that the cost in CA between 10 pace and 15 is not linear. If that is true then a linear model would of course not work and you'd have to find another latent variable that increases the cost of CA per attribute point.

I unfortunately don't have time to do this with a day job and a kid.. i'd rather just play the game

CAE82 · August 30, 2023

Yes, I did write that a few years ago and know all about the weights. That works for a single position, but not multiple positions.

The aim of this ML model is to take into account the effect of the positions, since we don't know the formula and so that is the unknown. Although we know the weights of the attributes for each position, we do not know how the attribute weights are weighted for players with multiple positions e.g.

The overall CA is not a simple weighted mean of the CA for each individual position nor does it appear that it is a simple maximum weight. I could never deduce the relationship and hence resorted to using ML which does exactly what it is meant to do - we provide the attributes and positions and it gives a pretty good prediction of the CA.

In terms of linearity, if you look at an individual attribute, then it has a linear relationship to CA e.g.

However, the method posted on FM Scout seemed to break down for players with several high-valued, high-weighted attributes. For example, Dan James was an 'outlier' due to his very high acceleration, agility, pace and stamina; all highly-weighted for wingers. Adama Traore was another. This hinted at some non-linearity at the extremes (see the graph on FM Scout - it is clearly linear in the range 6-16 but then the CA caps at 200). I imagine this is why players like Haaland, Mbappe and De Bruyne have some very low values for some (albeit irrelevant) attributes. Marking, positioning and tackling are typically very low. Of course, you may say 4 is appropriate but it could also be because if it is 6 then they'll hit the 200 cap.

Anyhow, the model does what I wanted it to - I can now provide attributes and positions and predict a fairly accurate CA. No one knows the underlying formula for sure and with ML we don't need to. I've not seen a more accurate model of CA in the FM Community and so I am happy with the outcome of a fun experiment one weekend when I had a few spare hours (after too many years I'm bored of the game itself, hoping for some genuinely new features in FM25, not expecting much from FM24).

Edited August 30, 2023 by CAE82
Remove duplicate image.

wazzaflow10 · August 30, 2023

4 hours ago, CAE82 said:

The aim of this ML model is to take into account the effect of the positions, since we don't know the formula and so that is the unknown. Although we know the weights of the attributes for each position, we do not know how the attribute weights are weighted for players with multiple positions e.g.

So this would be the point of creating an interaction between an attribute and a position. I'm simplifying here but this is how I would construct it (ignoring effects of footedness for now though I suspect you wouldn't need to interact footedness and it could be left as a main effects variable only).

CA ~ B1*pace + B2*GK Pos Ability + ... B(n) * ST Pos Ability + B(i) * Pace * GK Pos Ability + ... + B(i+n) * Pace * ST Pos Ability + B(j) * Pace * GK Pos Ability * CB Pos Ability + B(j+n) * Pace * GK Pos Ability * ST Pos Ability ...

In the full model you'd have every attribute + every position interacted all with each other.

If you have a player that is natural at both CM and RW for instance, to find out the effect an attribute has on CA you'd find the attribute you want (we'll call this the attribute baseline), the positions they have listed (I'd expect this to be insignificant), the attribute interacted with each of CM and RW individually (we'll call this the positional baseline), and the attribute interacted with CM and RW together (this is your multipositional weight). The sum of that equation would be how much CA an attribute eats up and would provide you a clear weighting schema for every position.
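The summation described above can be sketched as follows, assuming the positions enter the regression as 0/1 dummies (natural or not) so that each interaction coefficient adds directly to the per-point weight of the attribute. The coefficient values and the `attribute:position` naming scheme are hypothetical; the position main effects are left out because they shift CA by a constant rather than changing the weight of an attribute.

```python
from itertools import combinations

# Hypothetical fitted coefficients for one attribute (pace) and two positions.
coefs = {
    "pace": 1.0,         # attribute baseline
    "pace:CM": 0.4,      # positional baseline for CM
    "pace:RW": 0.9,      # positional baseline for RW
    "pace:CM:RW": -0.3,  # multipositional term for CM and RW together
}

def effective_weight(attribute, positions, coefs):
    """Sum the baseline and every interaction term that applies to the player."""
    total = coefs.get(attribute, 0.0)
    ordered = sorted(positions)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            total += coefs.get(":".join((attribute, *combo)), 0.0)
    return total

print("CM only:  ", effective_weight("pace", ["CM"], coefs))
print("CM and RW:", effective_weight("pace", ["CM", "RW"], coefs))
```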

I'm not shocked that a "more complicated" regression technique works better. They typically do lots of these interactions/high dimensionality models automatically and don't tell you what the inputs are (hence the term 'black boxes'). I don't know what is available in Python but in R there is a package called LIME that will try to extract higher dimensionality models into human readable output. The benefit of using a simpler technique is that you'll be able to read the results much more easily. It's not often you can make sense of an interaction but in this case we have a good idea what it is we're putting in the model and the interpretation.
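On the Python side, one model-agnostic option that ships with scikit-learn is permutation importance. It is a different technique from LIME (it gives a global ranking of inputs rather than local explanations), so the sketch below is an alternative, not an equivalent. The data and weights are synthetic, with `determination` deliberately given zero weight in the made-up target.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(4)
feature_names = ["pace", "finishing", "determination"]

# Synthetic players with three made-up attributes on the 1-20 scale.
X = rng.integers(1, 21, size=(300, 3)).astype(float)

# Made-up target: pace matters most, determination not at all.
y = 5.0 * X[:, 0] + 1.0 * X[:, 1]

# A 'black box' model: RBF-kernel SVR on scaled inputs.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0)).fit(X, y)

# Shuffle one column at a time and measure how much the R^2 score drops.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)

ranking = sorted(
    zip(feature_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name:14s} {importance:.3f}")
```

This gives a rough ordering of which inputs the model relies on, but it does not recover per-position weights the way the interaction regression would.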

If you have raw data in a csv or something maybe i'll toy around with it if i find time. just pm me or however files can be transferred here.
