# On Universal Prediction and Bayesian Confirmation

@article{Hutter2007OnUP, title={On Universal Prediction and Bayesian Confirmation}, author={Marcus Hutter}, journal={Theor. Comput. Sci.}, year={2007}, volume={384}, pages={33-48} }

#### Abstract

The Bayesian framework is a well-studied and successful framework for inductive reasoning, which includes hypothesis testing and confirmation, parameter estimation, sequence prediction, classification, and regression. But standard statistical guidelines for choosing the model class and prior are not always available or can fail, in particular in complex situations. Solomonoff completed the Bayesian framework by providing a rigorous, unique, formal, and universal choice for the model…
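As a concrete illustration of the Bayesian sequence-prediction setup the abstract describes, the sketch below mixes a countable class of models under a prior and predicts the next symbol from posterior-weighted model predictions. This is an illustrative sketch only, not code from the paper; the Bernoulli grid and uniform prior are assumed choices standing in for a general model class.

```python
# Illustrative sketch: Bayesian mixture prediction over a (here finite)
# class of Bernoulli models. The mixture xi(x_1..x_n) = sum_nu w_nu * nu(x_1..x_n)
# predicts the next symbol by posterior-weighting each model's prediction.

def bayes_mixture_predict(sequence, models, weights):
    """Return P(next bit = 1 | sequence) under the Bayes mixture."""
    # Unnormalized posterior weight of each model: w_nu * nu(sequence).
    posts = []
    for theta, w in zip(models, weights):
        likelihood = 1.0
        for bit in sequence:
            likelihood *= theta if bit == 1 else 1.0 - theta
        posts.append(w * likelihood)
    total = sum(posts)
    posts = [p / total for p in posts]
    # Mixture prediction: posterior-weighted probability of observing a 1.
    return sum(p * theta for p, theta in zip(posts, models))

# Assumed model class: Bernoulli(theta) on a uniform grid, uniform prior.
models = [i / 10 for i in range(1, 10)]
weights = [1 / len(models)] * len(models)

seq = [1, 1, 0, 1, 1, 1, 0, 1]  # observed bits (6 ones out of 8)
p1 = bayes_mixture_predict(seq, models, weights)
```

With enough data the posterior concentrates on the model closest to the true source, which is the finite-class analogue of the convergence results for the universal mixture discussed in the references below.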

#### 86 Citations

Parsimonious Inference

- Computer Science, Mathematics
- ArXiv
- 2021

The approaches combine efficient encodings with prudent sampling strategies to construct predictive ensembles without cross-validation, addressing the fundamental challenge of obtaining predictions from data efficiently.

Computability, inference and modeling in probabilistic programming

- Computer Science
- 2011

A new construction of the Mondrian process is presented as a partition-valued Markov process in continuous time, which can be viewed as placing a distribution on an infinite kd-tree data structure.

Foundations of Induction

- Computer Science
- 2011

Solomonoff’s formal, general, complete, and essentially unique theory of universal induction and prediction, rooted in algorithmic information theory and based on the philosophical and technical ideas of Ockham, Epicurus, Bayes, Turing, and Kolmogorov, essentially constitutes a conceptual solution to the induction problem.

On the Computability of Conditional Probability

- Mathematics, Computer Science
- J. ACM
- 2019

This work investigates the computability of conditional probability, a fundamental notion in probability theory and a cornerstone of Bayesian statistics, and shows that there are computable joint distributions with noncomputable conditional distributions.

Don't choose theories: Normative inductive reasoning and the status of physical theories

- Physics
- 2017

Evaluating theories in physics used to be easy. Our theories provided very distinct predictions. Experimental accuracy was so small that worrying about epistemological problems was not necessary.…

On Finding Predictors for Arbitrary Families of Processes

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2010

It is shown that if any predictor works, then there exists a Bayesian predictor with a discrete prior that also works. Several sufficient and necessary conditions for the existence of a predictor are found, in terms of topological characterizations of the family $C$ as well as the local behaviour of the measures in $C$, which in some cases lead to procedures for constructing such predictors.

Causal Inference Using the Algorithmic Markov Condition

- Mathematics, Computer Science
- IEEE Transactions on Information Theory
- 2010

This work explains why a consistent reformulation of causal inference in terms of algorithmic complexity implies a new inference principle that also takes into account the complexity of conditional probability densities, making it possible to select among Markov-equivalent causal graphs.

Characterizing predictable classes of processes

- Computer Science, Mathematics
- UAI
- 2009

It is shown that if such a predictor exists, then a predictor can also be obtained as a convex combination of countably many elements of $C$; in other words, it can be obtained as a Bayesian predictor whose prior is concentrated on a countable set.

Replacing Causal Faithfulness with Algorithmic Independence of Conditionals

- Mathematics, Computer Science
- Minds and Machines
- 2012

This paper compares IC with causal faithfulness (FF), which states that only those conditional independences that are implied by the causal Markov condition hold true.

Noncomputable Conditional Distributions

- Mathematics, Computer Science
- 2011 IEEE 26th Annual Symposium on Logic in Computer Science
- 2011

It is shown that in general one cannot compute conditional probabilities: a pair of computable random variables is constructed in the unit interval whose conditional distribution $P[Y|X]$ encodes the halting problem.

#### References

Showing 1–10 of 61 references.

On Generalized Computable Universal Priors and their Convergence

- Computer Science, Mathematics
- Theor. Comput. Sci.
- 2006

The central result is that the posterior of the universal semimeasure $M$ converges rapidly to the true sequence-generating posterior $\mu$, if the latter is computable; hence $M$ is eligible as a universal predictor when $\mu$ is unknown.

The Selection of Prior Distributions by Formal Rules

- Mathematics
- 1996

Subjectivism has become the dominant philosophical foundation for Bayesian inference. Yet in practice, most Bayesian analyses are performed with so-called “noninformative” priors, that is,…

Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2003

Pareto-optimality of $\xi$ is shown, together with an Occam’s razor argument that the choice $w_\nu \sim 2^{-K(\nu)}$ for the weights is optimal, where $K(\nu)$ is the length of the shortest program describing $\nu$; the results are applied to games of chance.
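Written out in standard notation (my transcription, not a quote from the paper; $\mathcal{M}$ denotes the model class), the Bayes mixture and its Occam weights are:

```latex
\xi(x_{1:n}) \;=\; \sum_{\nu \in \mathcal{M}} w_\nu\, \nu(x_{1:n}),
\qquad w_\nu \;=\; 2^{-K(\nu)},
```

where $K(\nu)$ is the length of the shortest program computing $\nu$, so simpler environments receive exponentially larger prior weight.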

On the Existence and Convergence of Computable Universal Priors

- Mathematics, Computer Science
- ALT
- 2003

The existence and convergence of computable universal (semi)measures are investigated for a hierarchy of computability classes: finitely computable, estimable, enumerable, and approximable.

The Speed Prior: A New Simplicity Measure Yielding Near-Optimal Computable Predictions

- Mathematics, Computer Science
- COLT
- 2002

This work replaces Solomonoff’s optimal but noncomputable method for inductive inference with the novel Speed Prior $S$, under which the cumulative a priori probability of all data whose computation through an optimal algorithm requires more than $O(n)$ resources is $1/n$.

Inferences from Multinomial Data: Learning About a Bag of Marbles

- Mathematics
- 1996

A new method is proposed for making inferences from multinomial data in cases where there is no prior information. A paradigm is the problem of predicting the colour of the next marble to be drawn…

On the Convergence Speed of MDL Predictions for Bernoulli Sequences

- Computer Science, Mathematics
- ALT
- 2004

A new upper bound on the prediction error for countable Bernoulli classes is derived, which implies a small bound (comparable to the one for Bayes mixtures) for certain important model classes.

MDL convergence speed for Bernoulli sequences

- Mathematics, Computer Science
- Stat. Comput.
- 2006

The Minimum Description Length principle for online sequence estimation/prediction in a proper learning setup is studied, and a new upper bound on the prediction error for countable Bernoulli classes is derived.

A Formal Theory of Inductive Inference. Part II

- Computer Science, Mathematics
- Inf. Control.
- 1964

In Part I, four ostensibly different theoretical models of induction are presented, in which the problem dealt with is the extrapolation of a very long sequence of symbols, presumably…

Information-theoretic asymptotics of Bayes methods

- Mathematics, Computer Science
- IEEE Trans. Inf. Theory
- 1990

The authors examine the relative entropy distance $D_n$ between the true density and the Bayesian density and show that the asymptotic distance is $(d/2)\log n + c$, where $d$ is the dimension of the parameter vector.
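In symbols, the asymptotics described above read as follows (a transcription under standard regularity assumptions, not a quote from the paper; $\mu$ is the true density and $\xi$ the Bayes mixture):

```latex
D_n \;=\; \mathbb{E}_\mu\!\left[\log \frac{\mu(x_{1:n})}{\xi(x_{1:n})}\right]
\;=\; \frac{d}{2}\log n + c + o(1),
```

where $d$ is the parameter dimension and $c$ a constant depending on the prior and the Fisher information at the true parameter.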