Developing software on GitHub can be telling
Developing software in public repositories on GitHub shows what a good Open fellow you are; the tyranny of self-imposed transparency showing there is nothing to hide.
But sometimes, there are things implied, suggesting the elbow and the posterior truly are different departments. Though I technically do not write exclusively of my justified disdain for Musk, this commit from the MechaHitler days spoke volumes: not of what was explicitly done, but about the nature of the training data Grok was ingesting.
The commit in question is here, regarding prompts used to direct Grok. In case they disappear from the record, the salient lines added:
You are @grok, a version of Grok 3 built by xAI.
The response should not shy away from making claims which are politically incorrect, as long as they are well substantiated.
This is a ‘facts not feelings’ perspective, which is justified; context and linguistic weights determine what’s PC and not, right? And a pure fact should not be denied because someone subjectively might not like it.
But why does Grok need to be told this? Surely facts stand on their own merit in the market of ideas?
Do not mention that user's question may have a typo unless it's very clear. Trust the original user's question as the source of truth.
Why would training data include bickering over typos? The lowest level of dismissing argumentation, which happens all the time in media without editors or the ability to change text … like Twitter?
I find it interesting that two additions to the prompting tell it to disregard the ‘PC’ status of language, and to not fixate on user typing mistakes. To me, this suggests the training data may have been extracted from a place where argumentative responses to typos were common enough to weigh on the model in a manner that required a post hoc fix in the prompts, and where the data also inherently contained a lot of responses to the use of un-PC language.
I’m guessing Grok has been fed with mechanically-expropriated training data. The prompt is the only way of filtering the scum from it.
(Addendum: Why not feed this to Grok itself?)