Thread: Miscellany
Old 01-09-2018, 04:40 AM   #17649
erimir
Projecting my phallogos with long, hard diction
 
 
Join Date: Sep 2005
Location: Dee Cee
Gender: Male
Posts: XMMMDCCCXXIX
Default Re: Miscellany

Predictive text tends to have a short window. The main use case for predictive text is things like text messages and Twitter, so you're not expecting someone to write a whole book chapter with it, mostly just short text segments. Usually it's something like a trigram model, since it's not meant to write a whole sentence for you (bigram models used to be more common, but I think they've all moved beyond that now). Sometimes it can suggest multiple words at a time, so I guess it may extend to a 4-gram or 5-gram model. It doesn't need a longer window because it merely suggests words, with the expectation that you will very frequently not use them. A word being the most likely next word doesn't mean it's likely in absolute terms; "most likely" in this context often means only about 5%... Even "the" only follows "of" about a quarter of the time, and that's because those are both function words. Individual content words are far less frequent than function words. And anyway, whenever you pick a different word, the previously generated suggestions become useless.
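To make that concrete, here's a toy sketch of what a trigram-style predictive text model amounts to. The tiny corpus is made up for illustration, and I'm not claiming any actual keyboard is implemented this way - it's just counting which word follows each pair of words, then offering the most frequent continuations of the last two words typed:

Code:
# Toy trigram predictive text: count which word follows each two-word context,
# then suggest the most frequent continuations. Corpus is invented for illustration.
from collections import Counter, defaultdict

corpus = (
    "the cat sat on the mat . the cat ate the fish . "
    "the dog sat on the rug . the dog ate the bone ."
).split()

# counts[(w1, w2)] maps a two-word context to how often each word followed it
counts = defaultdict(Counter)
for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
    counts[(w1, w2)][w3] += 1

def suggest(w1, w2, k=3):
    """Return up to k candidate next words for the last two words typed."""
    return [w for w, _ in counts[(w1, w2)].most_common(k)]

print(suggest("the", "cat"))  # ['sat', 'ate']
print(suggest("sat", "on"))   # ['the']
print(suggest("ate", "the"))  # ['fish', 'bone']

Even on this tiny corpus, most contexts have more than one plausible continuation, which is exactly why the top suggestion is so often not the word you actually wanted.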

So those are the reasons you don't need to bother with a very long window for "predictive text".

And the thing about the window is, once you're outside it, the model has no memory of what was written. Because of this, it's very repetitive. There's no reason why it wouldn't suggest the same words in one sentence starting with "the" that it did in another sentence starting with "the", although you could introduce some randomness to counteract this. And since it has no knowledge of what was written before, it won't tend to be very coherent. The subject of one sentence won't have any connection to the previous sentence.
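As a rough illustration of that last point, here's the same toy model generating text, but sampling the next word in proportion to its count instead of always taking the single most frequent one. Again, the corpus is invented and this isn't any real system, just a sketch of how a little randomness keeps identical contexts from always producing identical continuations:

Code:
# Toy trigram generation with weighted random sampling instead of pure argmax.
import random
from collections import Counter, defaultdict

corpus = ("the cat sat on the mat . the cat ate the fish . "
          "the dog sat on the rug . the dog ate the bone .").split()
counts = defaultdict(Counter)
for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
    counts[(w1, w2)][w3] += 1

def next_word(w1, w2):
    """Sample the next word in proportion to how often it followed (w1, w2)."""
    followers = counts[(w1, w2)]
    if not followers:                 # context never seen with a continuation
        return None
    words, freqs = zip(*followers.items())
    return random.choices(words, weights=freqs, k=1)[0]

out = ["the", "cat"]
for _ in range(8):
    nxt = next_word(out[-2], out[-1])
    if nxt is None:
        break
    out.append(nxt)
print(" ".join(out))   # different runs can wander down different continuations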

You could counter this by having the window cross sentence boundaries, but you'd still need a long enough window to keep the topics coherent, and at that point, your window might be so long that you start repeating sentences from the Harry Potter books wholesale. And either way, a simple predictive text model can't manage coreference - it cannot remember who is referred to by "he" or "she" because such information is not built into the model. If it were built into the model, it wouldn't be accurate, IMO, to call it "predictive text" anymore.
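You can see the memorization problem even on a toy corpus (again made up, not the actual Harry Potter text): as the window grows, almost every context ends up with exactly one observed continuation, at which point generation can only replay the training text verbatim.

Code:
# As the context window grows, more and more contexts have only one observed
# continuation, so the "model" degenerates into copying its training text.
from collections import defaultdict

corpus = ("the cat sat on the mat . the cat ate the fish . "
          "the dog sat on the rug . the dog ate the bone .").split()

for n in (1, 2, 4, 6):
    continuations = defaultdict(set)
    for i in range(len(corpus) - n):
        continuations[tuple(corpus[i:i + n])].add(corpus[i + n])
    unique = sum(1 for s in continuations.values() if len(s) == 1)
    print(f"window {n}: {unique}/{len(continuations)} contexts have exactly one continuation")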

But anyway, that writing sample is not very repetitive, is too topically coherent, and doesn't seem to have pronouns that are difficult or impossible to resolve. All of which suggests it was not written purely with an n-gram language model aka predictive text.

And, most importantly, the authors say they "collaborate" with machines in the about section ;)

 