This model associates with each word a vector which locates it in a high-dimensional abstract space, near other words that occur in similar contexts and far from those which don’t. When producing text, it looks at the previous string of words and constructs a different vector, locating the word’s surroundings – its context – near those that occur in the context of similar words. We can think of these heuristically as representing the meaning of the word and the content of its context. But because these spaces are constructed using machine learning by repeated statistical analysis of large amounts of text, we can’t know what sorts of similarity are represented by the dimensions of this high-dimensional vector space. Hence we do not know how similar they are to what we think of as meaning or context. The model then takes these two vectors and produces a set of likelihoods for the next word; it selects and places one of the more likely ones—though not always the most likely. Allowing the model to choose randomly amongst the more likely words produces more creative and human-like text; the parameter which controls this is called the ‘temperature’ of the model and increasing the model’s temperature makes it both seem more creative and more likely to produce falsehoods. The system then repeats the process until it has a recognizable, complete-looking response to whatever prompt it has been given. - https://link.springer.com/article/10.1007/s10676-024-09775-5
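The sampling step described in the passage above, where the model picks one of the more likely next words and a 'temperature' parameter controls how random that pick is, can be illustrated with a minimal sketch. This is not taken from the quoted article or from any particular model: it assumes the common approach of dividing raw scores by the temperature before a softmax, and the function names, candidate words, and scores are all hypothetical, chosen only for illustration.

```python
import math
import random


def softmax_with_temperature(scores, temperature):
    """Convert raw next-word scores into probabilities.

    Assumption: scores are divided by the temperature before the softmax,
    so low temperatures sharpen the distribution and high ones flatten it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_word(candidates, scores, temperature, rng):
    """Pick one candidate at random, weighted by its probability."""
    probs = softmax_with_temperature(scores, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Hypothetical candidates and scores for the word after "The cat sat on the".
candidates = ["mat", "sofa", "roof", "moon"]
scores = [3.0, 2.0, 1.0, -1.0]

rng = random.Random(0)  # fixed seed so repeated runs give the same picks
for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(scores, temperature)
    picks = [sample_next_word(candidates, scores, temperature, rng) for _ in range(10)]
    rounded = [round(p, 3) for p in probs]
    print(f"temperature={temperature}: probabilities={rounded} picks={picks}")
```

At a low temperature almost all of the probability sits on the highest-scoring word, so the output is close to always choosing the most likely one. At a higher temperature the distribution flattens and less likely words are selected more often, which matches the passage's point that raising the temperature yields more varied text.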