Bug Byte puzzle here - https://bit.ly/4bnlcb9 - and apply to Jane Street programs here - https://bit.ly/3JdtFBZ (episode sponsor). More info in full descript...
A simple path forward, is to go from classifying single elements of training data, to classifying multiple elements and their relationship in the training data.
Slightly less simple, is to gather orders of magnitude more data, by just hooking the input to an IRL robot.
Another step, is for the NN to control the robot and decide which parts of the data require refinement, and focus on that.
There is a lot of ways to improve data acquisition still on the table, it isn’t going to stop at creating large corpora and having humans to fine-tune them.
It’s a “push as much data as a baby gets to train its NN” step, which is several orders of magnitude more, and more focused, than any training dataset in existence right now.
Even with diminishing returns, it’s bound to get better results.
A simple path forward, is to go from classifying single elements of training data, to classifying multiple elements and their relationship in the training data.
Training data already has multiple labels.
Slightly less simple, is to gather orders of magnitude more data, by just hooking the input to an IRL robot.
An entire point of the paper and video is that massive increases in training set size are showing diminishing returns.
Another step, is for the NN to control the robot and decide which parts of the data require refinement, and focus on that.
Rule of headlines? 🙄
No, it’s not peaked out.
There is a lot of ways to improve data acquisition still on the table, it isn’t going to stop at creating large corpora and having humans to fine-tune them.
this has “draw the rest of the fucking owl” vibes to it. especially step 3
It’s a “push as much data as a baby gets to train its NN” step, which is several orders of magnitude more, and more focused, than any training dataset in existence right now.
Even with diminishing returns, it’s bound to get better results.
that’s not how asymptotes work.
That’s not how watching the video or reading the paper works either.
Whatever.
Training data already has multiple labels.
An entire point of the paper and video is that massive increases in training set size are showing diminishing returns.
🤡