Nyaggin'

Envisioning Future Forms of Artificial Intelligence Through the Lens of Knowledge Development

In large-scale deep learning, empirically observed scaling laws constrain the model scales that are practical to pursue:

For a given AI task, the optimal loss L(N) achievable by a deep learning model of scale N is well approximated by L(N) ≈ a·N^(-α) + b. Here, α is an empirically fitted exponent, typically between 0.05 and 0.1, and b is the asymptotic lower bound of the loss, potentially related to the task type or the intrinsic entropy of the data.

This implies that each further fixed reduction of the reducible loss L(N) − b requires multiplying the model scale N by a constant factor of 2^(1/α), so equal increments of improvement cost exponentially more parameters. In a reality with limited computational resources, it therefore becomes impractical to rely on model scaling alone once a certain size is reached.
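
To make the arithmetic concrete, here is a minimal sketch in Python; the constants a, b, and α below are assumptions chosen purely for illustration, not fitted values from any real model.

```python
# Illustrative scaling-law arithmetic: L(N) = a * N**(-alpha) + b
# The constants below are assumptions for illustration, not fitted values.
a, b, alpha = 1.0, 1.5, 0.07

def loss(n_params: float) -> float:
    """Loss predicted by the power-law fit for a model with n_params parameters."""
    return a * n_params ** (-alpha) + b

# Factor by which N must grow to halve the reducible loss L(N) - b:
halving_factor = 2 ** (1 / alpha)   # ~20,000x for alpha = 0.07

for n in [1e8, 1e9, 1e10, 1e11, 1e12]:
    print(f"N = {n:.0e}  L(N) = {loss(n):.4f}  reducible = {loss(n) - b:.4f}")

print(f"To halve the reducible loss, N must grow by ~{halving_factor:,.0f}x")
```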

I believe this law arises from intrinsic limitations of deep learning itself:

  • Knowledge is represented as continuous tensors, lacking discrete compositionality;
  • Information flows only in one forward direction, lacking feedback and reconstruction;
  • Learning is driven by averaged error, lacking cognitive motivation;
  • There is no mechanism for "hypothesis generation and verification," and no ability for active construction.

In short, deep learning is extremely inefficient at extracting knowledge from existing data.

To quantify this inefficiency, consider the following idealized analysis:

Suppose we have a superintelligent system. For any given set of knowledge, it can derive a new piece of knowledge from any subset of that set. Then, for an input set of size N, the system could in principle generate on the order of 2^N new knowledge items (one per subset), an exponential capacity.
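
As a toy illustration in Python: the number of candidate derivations is simply the number of nonempty subsets of the knowledge set. The `derive` function here is a hypothetical placeholder standing in for whatever reasoning the system actually performs.

```python
from itertools import combinations

# Toy illustration of the superintelligence's derivation capacity.
# `derive` is a hypothetical placeholder: it only records which premises
# were combined, standing in for genuine reasoning.
def derive(premises: tuple) -> str:
    return "conclusion(" + ", ".join(premises) + ")"

knowledge = ["k1", "k2", "k3", "k4"]   # N = 4 pieces of knowledge

candidates = [
    derive(subset)
    for r in range(1, len(knowledge) + 1)
    for subset in combinations(knowledge, r)
]

# 2^N - 1 nonempty subsets, i.e. exponential in N.
print(len(candidates))   # 15 for N = 4
```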

Of course, a finite set of knowledge contains a finite amount of information, and there is no reason to believe this superintelligence could keep snowballing its way to infinite knowledge. We can regard the total information ultimately derivable from an initial knowledge set as a finite upper bound, one that may be far greater than the literal information content of the original set. In the early stages of inference, newly derived knowledge may grow the total information exponentially, but as that total approaches the theoretical ceiling, further derivations yield diminishing returns and eventually contribute nothing new. This is somewhat analogous to renormalization in physics: a naive theory may predict infinite energy near some limit, but in reality, unknown effects must intervene near that limit to keep the energy finite and meaningful.
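
To make the saturation dynamic concrete, here is a toy model in which each round of derivation adds information in proportion to the remaining headroom below the ceiling. Both constants are arbitrary assumptions chosen only to show the shape of the curve: roughly exponential early on, flat near the bound.

```python
# Toy model of knowledge growth that saturates at a finite information ceiling.
# Both constants are arbitrary assumptions chosen only to show the shape.
ceiling = 1000.0      # total information ultimately derivable (upper bound)
info = 1.0            # information content of the initial knowledge set
growth = 1.0          # per-round derivation strength

for round_ in range(1, 21):
    # Each round's gain shrinks as the remaining headroom shrinks.
    info += growth * info * (1 - info / ceiling)
    print(f"round {round_:2d}: information = {info:8.1f}")
```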

Now consider the human brain. Constrained by limited working memory, a human can only reason over two or three pieces of knowledge at once. Thus, the number of combinations from which the brain can derive new knowledge out of an existing set of N items can be modeled as the binomial coefficient C(N, t), where t is a small integer (roughly in the range 1 to 4). For fixed t this grows on the order of N^t, a polynomial rate.

Deep learning, as the scaling law already tells us, develops knowledge at only a logarithmic rate: since each equal increment of improvement costs exponentially more scale, the knowledge extracted grows roughly as the logarithm of the resources invested.

The above is an imprecise but illustrative way to express the following insight: Compared to deep learning's statistical pattern recognition, the human brain possesses the capacity to deliberately derive new knowledge from existing knowledge. However, even the human brain falls short of the ideal exponential development capacity of the imagined superintelligence.
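
To put the three regimes side by side, here is a minimal sketch; all three functions are caricatures with assumed parameters, not models of any real system.

```python
import math

# Idealized knowledge-development rates as a function of the size N of the
# existing knowledge base. All three are caricatures for comparison only.
def superintelligence(n: int) -> float:
    return 2 ** n                     # exponential: one derivation per subset

def human(n: int, t: int = 3) -> float:
    return math.comb(n, t)            # ~N^t: reasoning over t items at a time

def deep_learning(n: int) -> float:
    return math.log2(n)               # logarithmic: ever-larger scale per gain

for n in [10, 20, 40, 80]:
    print(f"N={n:3d}  exp={superintelligence(n):.2e}  "
          f"poly={human(n):.2e}  log={deep_learning(n):.2f}")
```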

Based on this, I propose that a future AI paradigm, one more powerful than deep learning, should exhibit the following characteristics (a rough structural sketch follows the list):

  • A discrete base of abstract knowledge;
  • The ability to "attempt understanding" of input datasets, converting them into knowledge and incorporating them into the base;
  • The capacity for internal reasoning over existing knowledge to discover new insights;
  • A heuristic-driven intermediary that selects valuable combinations of knowledge during each internal reasoning step—something I refer to as the “subconscious.”
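
As a very rough sketch of how these four ingredients might fit together structurally: every class and method name below is hypothetical, invented only for illustration, and the hard parts (understanding, reasoning, heuristic selection) are left as stubs.

```python
from dataclasses import dataclass, field
from itertools import combinations

# Hypothetical skeleton of the proposed paradigm. Every name below is
# invented for illustration, not a real API or a concrete design.

@dataclass(frozen=True)
class Knowledge:
    """A discrete, compositional piece of abstract knowledge."""
    statement: str

@dataclass
class KnowledgeBase:
    items: set[Knowledge] = field(default_factory=set)

    def understand(self, raw_data: str) -> None:
        """Attempt to convert raw input data into knowledge and absorb it."""
        self.items.add(Knowledge(statement=raw_data))  # stub: real parsing goes here

    def subconscious(self, k: int = 2) -> list[tuple[Knowledge, ...]]:
        """Heuristically select promising combinations of existing knowledge."""
        return list(combinations(self.items, k))       # stub: a real heuristic would rank these

    def reason(self) -> None:
        """Derive new knowledge from selected combinations and add it back."""
        for combo in self.subconscious():
            derived = Knowledge(" & ".join(k.statement for k in combo))  # stub derivation
            self.items.add(derived)

kb = KnowledgeBase()
kb.understand("water boils at 100 C at sea level")
kb.understand("boiling sterilizes water")
kb.reason()
print(len(kb.items))   # grows as internal reasoning adds derived items
```

The point of the skeleton is only that the knowledge base is discrete, that absorbing data and deriving new items are explicit operations, and that the “subconscious” is a separate selection step rather than an averaged gradient update.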