1596471720

In data science, the traditional approach to dealing with prediction error reminds me of the iceberg metaphor. Our models cannot predict the future perfectly, so we optimize them to push the prediction error down as far as we can go. Better models, hopefully, lead to smaller errors. But there is a certain lower bound no model will be able to beat: the Bayes error. The Bayes error is the manifestation of the randomness inherent to the underlying problem, which is often considerable.

Once we come close to this Bayes performance, we are tempted to declare victory and conclude our work. In doing so, we have dealt with the visible part of the iceberg. But we can do much better by diving in and building a model for the uncertainty part of the problem, the area of the iceberg we cannot see. Quantifying the uncertainty will allow us to amend the point predictions (= best guesses) with deviation estimates and, ultimately, will lead to better and more robust models we can trust.

In this article, I am going to talk about probabilistic predictions, which unify point prediction and uncertainty modeling in one consistent framework. Using probabilistic prediction models we can calculate best-guess predictions and derive the “safety buffers” that, together, result in well-informed decisions and our peace of mind. Let’s dive in!

Russian roulette might be a somewhat extreme example of a game of chance, but it’s an ideal playground to explain some fundamental concepts from probability theory that I am going to use in this article. In case you are not familiar with this “game”, I suggest you consult the Wikipedia article on it.

The act of loading a gun with one round, spinning the cylinder and pulling the trigger is an example of a *random experiment*. The defining property of a random experiment is the fact that each repetition of such an experiment might lead to a different outcome, but the set of outcomes is known in advance.

In Russian roulette, there are two possible outcomes of the experiment: the gun fires or not. It is handy to assign a mathematical object Y to the outcome of a random experiment. Y is called a *random element* of the outcome set. (Sometimes people call Y a random variable.) This means that Y = ‘gun fired’, if that is the event we observed after pulling the trigger, and, otherwise, the value of Y is ‘gun didn’t fire’. If you are a programmer, you might think of Y as a function with no arguments that, if evaluated, returns the outcome of the experiment.
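To make the programmer's view concrete, here is a minimal Python sketch of Y as a function with no arguments. It assumes a six-chamber revolver loaded with a single round; the names `CHAMBERS`, `OUTCOMES` and `position_of_round` are made up for this illustration.

```python
import random

# Assumption: a six-chamber revolver loaded with a single round.
CHAMBERS = 6
OUTCOMES = ("gun fired", "gun didn't fire")  # the outcome set, known in advance


def Y():
    """Run the random experiment once and return its outcome."""
    # After the spin, the round is equally likely to sit in any chamber;
    # chamber 0 stands for the one lined up with the barrel.
    position_of_round = random.randrange(CHAMBERS)
    return OUTCOMES[0] if position_of_round == 0 else OUTCOMES[1]


# Each evaluation is a new repetition of the experiment, so results may differ.
print([Y() for _ in range(5)])
```

Calling `Y()` again and again is the code equivalent of repeating the experiment: the result may change from call to call, but it is always one of the two entries in `OUTCOMES`.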

If we load a gun with just one round, spin the cylinder, and try to shoot, it *probably* won’t fire. More precisely, the probability of the gun firing is ⅙. Let’s celebrate this rare occasion: our intuition matches the mathematical definition of the thing exactly. The *probability* of an event is a number between 0 and 1. Zero means “it ain’t gonna happen”. We assign one to a surefire event. Numbers strictly between zero and one indicate various degrees of “likeliness”.

A *probability distribution*, often denoted by F, encodes the probabilities of the outcomes of a random experiment, or, equivalently, the probabilities of the possible values of a random element Y. So you might ask F, what is the probability that Y = “gun fires”. And F will give you the answer: F(Y = “gun fires”) = ⅙. Because F is so reliable in answering these questions, people sometimes call F the *law* of Y.
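One way to picture F in code is as a lookup table from outcomes to probabilities. The sketch below is only an illustration, assuming the six-chamber, one-round setup; the helper `probability` is a hypothetical name, not part of any library.

```python
from fractions import Fraction

# Assumption: six chambers and one round, which gives the 1/6 from the text.
F = {
    "gun fires": Fraction(1, 6),
    "gun doesn't fire": Fraction(5, 6),
}


def probability(law, event):
    """Return the probability of an event, given as a set of outcomes."""
    return sum(law[outcome] for outcome in event)


# Every probability lies between 0 and 1, and together they add up to 1.
assert all(0 <= p <= 1 for p in F.values())
assert sum(F.values()) == 1

print(probability(F, {"gun fires"}))  # 1/6
```

Using exact fractions instead of floats keeps the check that the probabilities sum to one free of rounding error.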

The business of mathematics is that of generalizing ideas and applying the results to new problems. It usually works as follows: now that we know almost everything about the classical and venerable Russian roulette, let’s go a bit further and define the game of *Generalized Russian Roulette*.

#probability #machine-learning #data-science #statistics #deep learning

1623223443

Predictive modeling in data science is used to answer the question “What is going to happen in the future, based on known past behaviors?” Modeling is an essential part of data science, and it is mainly divided into predictive and preventive modeling. Predictive modeling, also known as predictive analytics, is the process of using data and statistical algorithms to predict outcomes with data models. Anything from sports outcomes and television ratings to technological advances and corporate economies can be predicted using these models.

- **Classification Model:** It is the simplest of all predictive analytics models. It puts data in categories based on its historical data. Classification models are best to answer “yes or no” types of questions.
- **Clustering Model:** This model groups data points into separate groups, based on similar behavior.
- **Forecast Model:** One of the most widely used predictive analytics models. It deals with metric value prediction, and this model can be applied wherever historical numerical data is available.
- **Outliers Model:** This model, as the name suggests, is oriented around exceptional data entries within a dataset. It can identify exceptional figures either by themselves or in concurrence with other numbers and categories.
- **Time Series Model:** This predictive model consists of a series of data points captured, using time as the input limit. It uses the data from previous years to develop a numerical metric and predicts the next three to six weeks of data using that metric.

#big data #data science #predictive analytics #predictive analysis #predictive modeling #predictive models

1617419868

As AI becomes more ubiquitous, it’s also become more autonomous — able to act on its own without human supervision. This demonstrates progress, but it also introduces concerns around control over AI. The AI Arms Race has driven organizations everywhere to deliver the most sophisticated algorithms around, but this can come at a price, ignoring cultural and ethical values that are critical to responsible AI. Here are five predictions on what we should expect to see in AI in 2021:

- Something’s going to give around AI governance
- Most consumers will continue to be sceptical of AI
- Digital transformation (DX) finds its moment
- Organizations will increasingly push AI to the edge
- ModelOps will become the “go-to” approach for AI deployment.

#opinions #2021 ai predictions #ai predictions for 2021 #artificial intelligence predictions #five artificial intelligence predictions for 2021

1601204400

It helps to create “**statistical models**” of real-world processes.

“Probabilistic programming allows for incorporating domain knowledge in the models and makes the machine learning system more interpretable”

Infer.NET works by compiling a model definition into the source code required to compute a set of inference queries on the model. The diagram below summarises the inference process.

Source: Microsoft

The steps are:

- The user creates a **model definition** and specifies a collection of **inference queries** related to that model.
- The user passes the model definition and inference queries to the **model compiler**, which creates the source code and writes it into a file.
- The source code is compiled to create an **algorithm**.
- With a set of **observed values**, the inference engine executes the compiled algorithm.

#dotnet-core #machine-learning #probabilistic #csharp #probabilistic-programming

1623983672

We’ve had our share of predictions in possibly every field that one can think of. Data analytics is one field that is never behind when it comes to predictions. With an enormous amount of data to deal with, this field opens doors for truckloads of predictions, and it is for this very reason that “analytics” has been the centre of attraction in probably every aspect and garnered eyeballs from all across the world. Some of the key predictions for this field in the years to come are:

…

#big data #latest news #prediction for the world of big data analytics #big data analytics #predictions #data

1624992720

The past year has been tumultuous, with many lessons still being revealed today. The COVID-19 crisis has stimulated digital transformation, forcing incumbent organisations to digitize their processes, modernise business models, enable data access, and upskill their workforce for a data-driven age. The COVID-19 pandemic has also proven the need for everyone to be data-fluent, informed citizens, as data can be used to inform and misinform us on the state of the pandemic.

As data science matures, organisations across the world are trying to increase their digital resilience and become more data-driven in the process. With data science methodologies and technologies, specialised teams have worked on solving critical problems like self-driving cars, protein folding, and algorithmic trading programs. However, the applications of data science are widespread. It’s about creating data-fluent organisations and societies, where everyone is equipped with the skills they need to be informed citizens and employees. Over the coming years, we will see better tooling across the spectrum of data fluency. Meanwhile, let’s explore data trends and predictions for 2021.

#big data #latest news #data trends #predictions #data trends and predictions for the year of 2021