Using Machine Learning to Predict Current Ability


You probably already know this, but CA is a weighted average of attribute scores (technically attribute scores on a 200 point scale with the displayed values from 1-20 being rounded), so only the slight nonlinearity of the model stops it from being relatively trivial to figure out the weights by position from some sort of regression.

As I understand it, where a player is 20 out of 20 in multiple positions, the weight for each attribute is the highest weight of any of the positions the player can play, e.g. a natural DC/ST weights marking as heavily as a DC does and finishing as heavily as an ST does. I assume this is scaled down for positions in which a player has a rating of 15 or 18.
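
A minimal sketch of that rule, for illustration only. The weights are made-up numbers, not the game's values, and scaling a position's weights by rating/20 is purely an assumption about how partial positional ability might be handled:

```python
# Hypothetical per-position attribute weights; NOT the game's real values.
POSITION_WEIGHTS = {
    "DC": {"marking": 8, "finishing": 1, "pace": 5},
    "ST": {"marking": 1, "finishing": 8, "pace": 7},
}

def combined_weights(position_ratings):
    """Highest-weight rule: for each attribute keep the largest weight over
    all positions played, scaled by rating/20 (the scaling is an assumption)."""
    combined = {}
    for position, rating in position_ratings.items():
        scale = rating / 20
        for attribute, weight in POSITION_WEIGHTS[position].items():
            combined[attribute] = max(combined.get(attribute, 0.0), weight * scale)
    return combined

# Natural in both positions: marking counts like a DC, finishing like an ST.
natural_dc_st = combined_weights({"DC": 20, "ST": 20})
# Natural DC who is only 15/20 as an ST: the ST weights are scaled by 0.75.
dc_with_partial_st = combined_weights({"DC": 20, "ST": 15})
```

Under these assumptions a 15-rated second position still raises the finishing weight, just by less than a natural rating would.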

Ability with the weaker foot has a relatively high weighting towards CA too, and some attributes like Determination and Natural Fitness have zero weights (although they're probably correlated with CA anyway which might confound the model a little). This might explain the [rest of the] prediction error in your model.

 

Edited by enigmatic

Thanks for your comment @enigmatic. I think the attributes are actually 1-100 based on using CheatEngine to load the RAM in FM and having a poke around. Could be wrong though.

Unfortunately I do not think the weights work like that (highest weight) based on my testing, I could not get decent results trying lots of different weight combinations. I do think the positions are grouped. So a DL becoming a D/WBL does not make a big difference due to the similarity in the positions. Similar if a DL becomes a DRL; playing on both sides doesn't seem to cause an obvious increase in CA either. In the end I gave up and hence used ML :).

I actually only included the features that are known to contribute to CA; so the ML model does not include things like Determination, Flair and so on.

I also think there is some non-linearity in the model for high-valued attributes i.e. the weights are not constant for the whole 1-20 range. It seems that going from 12-16 does not cause as big a CA increase as going from 16-20 for example. Based on my earlier experiments I know that 6 in all attributes corresponds to a CA of 1 and 16's in all attributes corresponds to a CA of 200. So I presume that the range of 16-20 is treated somewhat differently.
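
As a rough sketch of what those two data points imply (6 in everything gives CA 1, 16 in everything gives CA 200), here is the straight line through them, clamped to the 1-200 range. The clamp and the function name are assumptions for illustration; how the game actually treats values above 16 is exactly the open question:

```python
def ca_from_uniform_attributes(value, low=(6, 1), high=(16, 200), cap=200):
    """CA estimate for a player with the same value in every weighted attribute.

    Straight line through the two observed points, clamped to [1, cap].
    """
    (x0, y0), (x1, y1) = low, high
    slope = (y1 - y0) / (x1 - x0)          # 19.9 CA per attribute point
    estimate = y0 + slope * (value - x0)
    return max(1, min(cap, estimate))

for v in (6, 11, 16, 20):
    print(v, round(ca_from_uniform_attributes(v), 1))
```

On this reading every attribute point between 6 and 16 is worth about 19.9 CA when applied to all attributes at once, and anything above 16 across the board has nowhere left to go.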


  • CAE82 changed the title to Using Machine Learning to Predict Current Ability
On 21/08/2023 at 10:20, CAE82 said:

Unfortunately I do not think the weights work like that (highest weight) based on my testing, I could not get decent results trying lots of different weight combinations. I do think the positions are grouped. So a DL becoming a D/WBL does not make a big difference due to the similarity in the positions. Similar if a DL becomes a DRL; playing on both sides doesn't seem to cause an obvious increase in CA either. In the end I gave up and hence used ML :).

Highest weight is exactly why you see those results in your testing: a DR has exactly the same weights as a DL, so taking the highest weight across both positions doesn't affect CA. WB weights are very similar to fullback weights [WB slightly higher for dribbling and slightly lower for concentration IIRC], so a D/WBL will have a similar CA to the equivalent DL. But a good finishing rating will have little effect on a DL and quite a bit more on a D/AML, because it will take fullback weightings for defensive attributes and winger weightings for finishing.

It might be difficult to estimate for players with 15-19 positional ratings though as the weights will be <100%


It’s not a simple highest weight, even for a DC/MC/SC. For two positions, the CA seems to be the same as the highest CA for each individual position. So if you had a player who was a 120CA DC and a 130CA MC, he would have a 130CA as a D/MC. It seems to be when you add in a 3rd position then the CA changes significantly. But it doesn’t appear to be simply from the highest of each of the three weights.
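
To make the difference between the two hypotheses concrete, here is a small sketch. The weights and the raw "points" score are invented for illustration and are not CA values from the game:

```python
# Hypothetical weights, chosen only to make the two rules disagree.
WEIGHTS = {
    "DC": {"marking": 8, "passing": 2},
    "MC": {"marking": 2, "passing": 8},
}

def points(attributes, weights):
    """Raw weighted sum of attribute values (a stand-in for CA)."""
    return sum(attributes[a] * w for a, w in weights.items())

def highest_ca_rule(attributes, positions):
    """Score each position separately, then keep the best one."""
    return max(points(attributes, WEIGHTS[p]) for p in positions)

def highest_weight_rule(attributes, positions):
    """Merge the positions by taking the largest weight per attribute."""
    merged = {a: max(WEIGHTS[p][a] for p in positions) for a in attributes}
    return points(attributes, merged)

player = {"marking": 10, "passing": 15}
print(highest_ca_rule(player, ["DC", "MC"]))      # best single position
print(highest_weight_rule(player, ["DC", "MC"]))  # merged weights
```

The merged-weight score can never be lower than the best single-position score, so the two rules only agree when one position's weights dominate the other's for every attribute.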

Anyhow, it doesn’t really matter the exact formula but we all know that playing in multiple positions will lead to a larger CA for a given set of attributes, thus leaving less growth until the PA is reached. 


50 minutes ago, CAE82 said:

Anyhow, it doesn’t really matter the exact formula but we all know that playing in multiple positions will lead to a larger CA for a given set of attributes, thus leaving less growth until the PA is reached.

It’s not that straightforward. Sure, if you use an editor to force a player to become “Natural” (or lower) in a new position, we may see an uplift in CA so long as the player has relevant attributes for that new position which would affect weightings.

But in reality, getting a player to learn a new position, and thus potentially impact CA, depends on a large number of conditions such as age, playing time, versatility, attribute suitability and so on. Thus the impact on CA of position training isn’t necessarily as large as is often thought.


I’m not sure you’re correct there @herne79. I’ve never seen an ‘uplift’ in CA after a player gaining new positions (naturally or ‘forced’ through the editor). It usually seems to be the (R)CA stays the same and the attributes  redistribute accordingly.

But of course I respect and appreciate your insight and will bear it in mind.

After more than a quarter of a century playing (CM) FM it’s regrettable that the real interest and enjoyment comes from these kinds of toy experiments, and I hope that FM24/25 bring in some genuinely new features that actually make the game interesting again.


1 hour ago, CAE82 said:

I’m not sure you’re correct there @herne79. I’ve never seen an ‘uplift’ in CA after a player gaining new positions (naturally or ‘forced’ through the editor). It usually seems to be the (R)CA stays the same and the attributes  redistribute accordingly.

But of course I respect and appreciate your insight and will bear it in mind.

After more than a quarter of a century playing (CM) FM it’s regrettable that the real interest and enjoyment comes from these kinds of toy experiments, and I hope that FM24/25 bring in some genuinely new features that actually make the game interesting again.

This might help (quote from SI):

https://community.sigames.com/forums/topic/520654-new-position-training-and-current-ability/?do=findComment&comment=12411290


Thanks. Have seen that before. Lots of interesting stuff posted by Seb Wassell, pity he doesn’t seem to post a lot any more. Maybe he left SI. I used to spend a long time on the ‘Developers Posts’ tab going through and reading all their posts to try and glean any insights into the game. But now seems to be mostly off-topic stuff. 


On 27/08/2023 at 13:52, CAE82 said:

It’s not a simple highest weight, even for a DC/MC/SC. For two positions, the CA seems to be the same as the highest CA for each individual position. So if you had a player who was a 120CA DC and a 130CA MC, he would have a 130CA as a D/MC. It seems to be when you add in a 3rd position then the CA changes significantly. But it doesn’t appear to be simply from the highest of each of the three weights.

Anyhow, it doesn’t really matter the exact formula but we all know that playing in multiple positions will lead to a larger CA for a given set of attributes, thus leaving less growth until the PA is reached. 

Not really sure what you're trying to predict here. If you mean explain CA that makes more sense as an experiment. Predicting CA from current attributes would be like "predicting height in centimeters using inches". I think what you're trying to say is you want to find out that cm = 2.54 * in. 

CA is a known value with a known, albeit complicated, formula you can observe from the editor. This would be more of a "fun, learn how regression works in stats 101" model, since we should be able to extract the beta weights of the attributes to match the editor values exactly if we have the right formula. If we don't, we know the weights are suffering from omitted variable bias, multicollinearity or the wrong specification. TBH you could get away with a boring old linear regression. I'd personally start with

CA ~ attributes + binaries of positions or the values provided for awkward/competent/accomplished/natural + footedness + an interaction between attributes and position + interaction between attributes and footedness + interaction between position and footedness + interaction between attributes, position and footedness.  

I'm probably forgetting some things that affect CA but that's how I would start if this were a real experiment with unknown inputs. 
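
A rough sketch of how one player could be expanded into the terms of that specification before fitting an ordinary linear regression. The function name, the `a:b` naming of interaction columns and the 0-1 scaling of position and weak-foot values are all assumptions for illustration; the position x footedness and three-way terms follow the list above:

```python
from itertools import product

def design_row(attributes, positions, weak_foot):
    """Expand one player into main effects plus interaction terms."""
    row = {}
    row.update(attributes)                    # main effects: attributes
    row.update(positions)                     # main effects: position ratings
    row["weak_foot"] = weak_foot              # main effect: footedness
    for (a, av), (p, pv) in product(attributes.items(), positions.items()):
        row[f"{a}:{p}"] = av * pv                         # attribute x position
        row[f"{a}:{p}:weak_foot"] = av * pv * weak_foot   # three-way term
    for a, av in attributes.items():
        row[f"{a}:weak_foot"] = av * weak_foot            # attribute x footedness
    for p, pv in positions.items():
        row[f"{p}:weak_foot"] = pv * weak_foot            # position x footedness
    return row

# Toy player: two attributes, natural DL (1.0), half-rated AML (0.5).
row = design_row({"pace": 15, "marking": 8}, {"DL": 1.0, "AML": 0.5}, weak_foot=0.4)
print(len(row), "columns")
```

With the full attribute and position lists the number of columns grows quickly, so a fit like this needs far more players than columns to be trustworthy.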


The CA formula is not really known @wazzaflow10 (unless you have inside information from SI). I’ve worked it out for a player with a single position and ‘average’ attributes to a pretty good approximation, but it was too complex to work out for multiple positions and it had some flaws for high attribute values (for example, players with say 19 for acceleration and pace) which hints at some non-linearity. Indeed, the ‘boring old’ linear kernel didn’t work well (neither did the polynomial one) and I got the best results using the RBF kernel.

Now with the model you just supply the attributes that contribute to CA, position ratings and weaker foot (47 inputs in total) and it does a pretty good job of predicting CA without any a priori knowledge of the attribute weights. I’ve spoken with colleagues who are experts in deep learning and I know about physics-informed models (i.e. providing the model with a priori knowledge of attribute weights) but do not have enough knowledge (or motivation) to try and implement it.
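
For anyone curious what an RBF-kernel fit looks like in practice, here is a self-contained toy. It is not the model described above: kernel ridge regression is swapped in because it fits in a few lines of plain Python, there is a single input instead of 47, and the target is a made-up curve (a straight line that caps at 200). `gamma` and `ridge` are arbitrary choices:

```python
import math

def rbf(x, y, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting on a small dense system."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution

def fit(X, y, gamma=0.5, ridge=1e-6):
    """Kernel ridge regression: solve (K + ridge * I) alpha = y."""
    K = [[rbf(a, b, gamma) + (ridge if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    alphas = solve(K, y)
    return lambda x: sum(al * rbf(x, xi, gamma) for al, xi in zip(alphas, X))

# Made-up training data: one "attribute" from 6 to 20, target capped at 200.
values = list(range(6, 21, 2))
X = [[float(v)] for v in values]
y = [min(200.0, 1.0 + 19.9 * (v - 6)) for v in values]
model = fit(X, y)
```

The point of the toy is only that a kernel model can follow the kink at the cap without being told about it, which a single straight line cannot do.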

I’m not sure what you mean by multicollinearity, but if you are suggesting that the inputs are not independent when it comes to CA, then I’ve never read or seen anything to suggest that is the case. I’d be interested to hear more. Of course, I’d also be interested to see the outcome of your ‘real experiment’ if you ever find the time or motivation.


Actually, regarding multicollinearity, I guess you are referring to the fact that the attribute weights are affected by the position ratings? So the position rating inputs essentially affect the attribute inputs too?


7 hours ago, CAE82 said:

The CA formula is not really known @wazzaflow10 (unless you have inside information from SI). 

If you open the pre game editor you can see how much weight an attribute has on CA points. That is known and the value of that weight depends on position (and possibly a few other things I'm not aware of). Position matters a great deal as it determines how much CA is used by an attribute as enigmatic has pointed out. If that weight is zero then that attribute should yield no input into determining the CA of a player.

https://www.fmscout.com/a-guide-to-current-ability-in-football-manager.html?pg=1

I'm actually assuming you've posted this article given the name... which makes this post strange to me since you're aware that weightings are different.

I assume this value is ultimately what you are trying to get at in the model. If it isn't, I'm not really sure what this ML exercise is trying to accomplish, since attributes are what determine CA. It's not really predicting anything, since you're using the thing that directly determines CA to predict CA. You should be able to get CA exactly for every player with the right formula. Again, it'd be like using inches to determine how tall someone is in centimeters. Contrast this with, say, a scout report (that you codify into values) or perhaps the stats the game produces, where there isn't a direct relationship between the target variable and the inputs. A genuine prediction of height would use something like the heights of someone's relatives or their own height at a younger age.

 

7 hours ago, CAE82 said:

but it was too complex to work out for multiple positions and it had some flaws for high attribute values (for example, players with say 19 for acceleration and pace) which hints at some non-linearity. Indeed, the ‘boring old’ linear kernel didn’t work well (neither did the polynomial one) and I got the best results using the RBF kernel.

Well, this could just be plain old not enough data rather than non-linearity. As for a linear regression not working, I would suspect that it is due to the wrong specification rather than the wrong choice of model, since again we can see the attribute weighting on CA in the editor. The only thing I'm not sure about is whether going from, say, 10 pace to 11 pace costs less CA than going from 15 to 16, so that the cost in CA between 10 pace and 15 is not linear. If that is true, then a linear model would of course not work and you'd have to find another latent variable that increases the cost of CA per attribute point.
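
That last question is easy to check if someone reads CA off the editor while stepping a single attribute. A sketch of the check, with placeholder numbers rather than real editor observations:

```python
def marginal_costs(ca_by_value):
    """CA gained per +1 step of a single attribute, between observed values."""
    values = sorted(ca_by_value)
    return {
        (lo, hi): (ca_by_value[hi] - ca_by_value[lo]) / (hi - lo)
        for lo, hi in zip(values, values[1:])
    }

def is_linear(ca_by_value, tolerance=0.5):
    """True if every step costs the same CA, within a rounding tolerance."""
    costs = list(marginal_costs(ca_by_value).values())
    return max(costs) - min(costs) <= tolerance

# Placeholder observations {attribute value: CA}, NOT taken from the game.
flat_cost = {10: 100, 11: 102, 15: 110, 16: 112}
rising_cost = {10: 100, 11: 102, 15: 110, 16: 115}
print(is_linear(flat_cost), is_linear(rising_cost))
```

The tolerance is there because displayed CA is an integer, so small differences between steps can be rounding rather than a real change in cost.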

I unfortunately don't have time to do this with a day job and a kid.. i'd rather just play the game :)


Yes, I did write that a few years ago and know all about the weights. That works for a single position, but not multiple positions.

The aim of this ML model is to take into account the effect of the positions, since we don't know the formula and so that is the unknown. Although we know the weights of the attributes for each position, we do not know how the attribute weights are weighted for players with multiple positions e.g.

[attached image]

The overall CA is not a simple weighted mean of the CA for each individual position, nor does it appear to be a simple maximum weight. I could never deduce the relationship and hence resorted to using ML, which does exactly what it is meant to do: we provide the attributes and positions and it gives a pretty good prediction of the CA.

In terms of linearity, if you look at an individual attribute, then it has a linear relationship to CA e.g.

[attached image]

However, the method posted on FM Scout seemed to break down for players with several high-valued, high-weighted attributes. For example, Dan James was an 'outlier' due to his very high acceleration, agility, pace and stamina, all highly weighted for wingers. Adama Traore was another. This hinted at some non-linearity at the extremes (see the graph on FM Scout: it is clearly linear in the range 6-16 but then the CA caps at 200). I imagine this is why players like Haaland, Mbappe and De Bruyne have some very low values for some (albeit irrelevant) attributes. Marking, positioning and tackling are typically very low. Of course, you may say 4 is appropriate, but it could also be because at 6 they would hit the 200 cap.

Anyhow, the model does what I wanted it to - I can now provide attributes and positions and predict a fairly accurate CA. No one knows the underlying formula for sure and with ML we don't need to. I've not seen a more accurate model of CA in the FM Community and so I am happy with the outcome of a fun experiment one weekend when I had a few spare hours (after too many years I'm bored of the game itself, hoping for some genuinely new features in FM25, not expecting much from FM24).

Edited by CAE82
Remove duplicate image.

4 hours ago, CAE82 said:

The aim of this ML model is to take into account the effect of the positions, since we don't know the formula and so that is the unknown. Although we know the weights of the attributes for each position, we do not know how the attribute weights are weighted for players with multiple positions e.g.

So this would be the point of creating an interaction between an attribute and a position. I'm simplifying here but this is how I would construct it (ignoring effects of footedness for now though I suspect you wouldn't need to interact footedness and it could be left as a main effects variable only).

CA ~ B1*pace + B2*GK Pos Ability + ... B(n) * ST Pos Ability + B(i) * Pace * GK Pos Ability + ... + B(i+n) * Pace * ST Pos Ability + B(j) * Pace * GK Pos Ability * CB Pos Ability + B(j+n) * Pace * GK Pos Ability * ST Pos Ability ...

In the full model you'd have every attribute and every position all interacted with each other.

If you have a player who is natural at both CM and RW, for instance, then to find out the effect an attribute has on CA you'd take the coefficient of the attribute you want (we'll call this the attribute baseline), the positions they have listed (I'd expect this to be insignificant), the attribute interacted with each of CM and RW individually (we'll call this the positional baseline), and the attribute interacted with CM and RW together (this is your multipositional weight). The sum of those terms would be how much CA an attribute eats up, and it gives you a clear weighting schema for every position.
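
In code, reading a weight off the fitted model would look something like this. The coefficient values and the `attribute:position` naming are invented for illustration, and positions are treated as 0/1 indicators (natural or not):

```python
def effective_weight(coefficients, attribute, positions):
    """Sum the attribute baseline, each attribute x position term, and the
    joint attribute x position x position term for the positions played."""
    total = coefficients.get(attribute, 0.0)                       # attribute baseline
    for position in positions:
        total += coefficients.get(f"{attribute}:{position}", 0.0)  # positional baseline
    if len(positions) > 1:
        joint = ":".join([attribute, *sorted(positions)])
        total += coefficients.get(joint, 0.0)                      # multipositional weight
    return total

# Made-up fitted coefficients, NOT estimates from real data.
betas = {"pace": 1.0, "pace:CM": 0.5, "pace:RW": 2.0, "pace:CM:RW": -0.25}
print(effective_weight(betas, "pace", ["CM"]))
print(effective_weight(betas, "pace", ["CM", "RW"]))
```

A negative joint term like the one above would be the model's way of saying the two positions overlap, so the combined weight is less than the sum of the parts.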

I'm not shocked that a "more complicated" regression technique works better. They typically do lots of these interactions/high dimensionality models automatically and don't tell you what the inputs are (hence the term 'black boxes'). I don't know what is available in Python, but in R there is a package called LIME that will try to extract higher dimensionality models into human readable output. The benefit of using a simpler technique is that you'll be able to read the results much more easily. It's not often you can make sense of an interaction, but in this case we have a good idea what it is we're putting in the model and how to interpret it.

If you have raw data in a csv or something maybe i'll toy around with it if i find time. just pm me or however files can be transferred here.

