
17 More Must-Know Data Science Interview Questions and Answers. The post 21 Must-Know Data Science Interview Questions and Answers was the most viewed post of 2016. For 2017, KDnuggets Editors bring you 17 more Data Science Interview Questions and Answers. Because some of the answers are quite lengthy, we will publish them in 3 parts over 3 weeks. This is part 1, which answers the 6 questions below.

Here is part 2 and part 3. This post answers questions:

Q1. What are Data Science lessons from failure to predict the 2016 US Presidential election (and from the Super Bowl LI comeback)?
Q2. What problems arise if the distribution of the new (unseen) test data is significantly different from the distribution of the training data?
Q3. What are bias and variance, and what is their relation to modeling data?
Q4. Why might it be preferable to include fewer predictors over many?
Q5. What error metric would you use to evaluate how good a binary classifier is? What if the classes are imbalanced? What if there are more than 2 groups?
Q6. What are some ways I can make my model more robust to outliers?

Q1. What are Data Science lessons from failure to predict the 2016 US Presidential election (and from the Super Bowl LI comeback)?

Gregory Piatetsky answers: Just before the Nov 8, 2016 election, pollsters gave Hillary Clinton an edge of ~3% in the popular vote and favored her to win the Electoral College.

Nate Silver's FiveThirtyEight gave Trump the highest chance of victory among these forecasters; New York Times Upshot and Princeton Election Consortium estimated lower chances, and Huffington Post gave Trump only a 2% chance of victory. So what are the lessons for Data Scientists?

To make a statistically valid prediction we need [...]. Events can be placed on a scale from deterministic (2+2 will always equal 4) through strongly predictable to weakly predictable. Elections fall at the weakly predictable end: pollsters need to get a representative sample, estimate the likelihood of a person actually voting, make many justified and unjustified assumptions, and avoid following their conscious and unconscious biases.

In the case of the US Presidential election, correct prediction is even more difficult because of the antiquated Electoral College system, in which each state (except for Maine and Nebraska) awards the winner all its electoral votes, and because results must be polled and predicted for each state separately. In the 2016 US presidential election, pollsters were off the mark in many states. They mostly underestimated the Trump vote, especially in the 3 critical states of Michigan, Wisconsin, and Pennsylvania, which all flipped to Trump. (Chart source: @NateSilver538, Nov 9, 2016.)

A few statisticians like Salil Mehta (@salilstatistics) were warning about the unreliability of polls, and David Wasserman of FiveThirtyEight wrote in Sep 2016 "How Trump Could Win The White House While Losing The Popular Vote", but most pollsters were way off.

So a good lesson for Data Scientists is to question their assumptions and to be very skeptical when predicting a weakly predictable event, especially one based on human behavior. Other important lessons: examine data quality (in this election, polls were not reaching all likely voters), and beware of your own biases (many pollsters were likely Clinton supporters and did not want to question the results that favored their candidate). For example, Huffington Post had forecast an over 90% chance of a Clinton victory.

Note: this answer is based on a previous KDnuggets post.

We had another example of a statistically very unlikely event in Super Bowl LI on Feb 5, 2017. When the Atlanta Falcons held a large lead over the New England Patriots, ESPN estimated the Falcons' win probability at almost 100%, yet the Patriots came back to win. (Source: Salil Mehta tweet, Feb 6, 2017.) Never before has a team lost a Super Bowl after holding such an advantage.

You need to understand the risk factors when dealing with such events, and try to avoid using probabilities; if you have to use numbers, give a wide confidence range. Finally, if the odds seem to be against you but the event is only weakly predictable, go ahead and do your best: sometimes you will be able to beat the odds.

Q2. What problems arise if the distribution of the new (unseen) test data is significantly different from the distribution of the training data?

Gregory Piatetsky and Thuy Pham answer: The main problem is that the predictions will be wrong! If the new test data is sufficiently different from the training data in key parameters of the prediction model, then the predictive model is no longer valid. The main reasons this can happen are sample selection bias, population drift, or a non-stationary environment.

Sample selection bias. Here the data is static, but the training examples have been obtained through a biased method, such as non-uniform selection or a non-random split of the data into train and test. If you have a large static dataset, you should split it randomly into train and test data, so that the distribution of the test data is similar to that of the training data.
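As a rough illustration of the random split described above, here is a minimal Python sketch using only the standard library. The function names (random_split, ks_statistic), the synthetic feature values, and the 75/25 proportion are assumptions made for this example, not part of the original answer. The check used here is the two-sample Kolmogorov-Smirnov statistic, chosen as one simple way to compare two distributions; other tests would serve equally well.

```python
import random
from bisect import bisect_right


def random_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows, then cut it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b.

    0 means the two empirical distributions coincide;
    1 means the two samples occupy non-overlapping ranges.
    """
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


# Hypothetical data: 1000 draws of a single numeric feature.
rng = random.Random(42)
values = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Random split: train and test should look alike.
train, test = random_split(values)

# Biased split: sort first, so the test set only contains the largest values.
ordered = sorted(values)
biased_train, biased_test = ordered[:750], ordered[750:]

print("random split KS:", ks_statistic(train, test))
print("biased split KS:", ks_statistic(biased_train, biased_test))
```

A small statistic for the random split and a large one for the sorted split is the pattern to expect. In practice one would compare each key variable (and the target) this way, or use an established implementation instead of this hand-rolled helper.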

Covariate shift, aka population drift. Here the data is not static: one population is used as training data, and another population is used for testing. (Figure from F. Herrera's invited talk at IWANN.) Sometimes the training data and test data are derived via different processes, e.g. a drug tested on one population is given to a new population that may have significant differences. As a result, a classifier based on the training data will perform poorly. One proposed solution is to apply a statistical test to decide if the probabilities of target classes and key variables used by the classifier are significantly different, and if they are, to retrain the model using new data.

Non-stationary environments. The training environment is different from the test one, whether due to a temporal or a spatial change.

This is similar to covariate shift (case b), but applies to the situation when the data is not static: we have a stream of data and we periodically sample it to develop predictive models of future behavior. Another typical case is customer analytics, where customer behavior changes over time.

Q3. What are bias and variance, and what is their relation to modeling data?

Matthew Mayo answers: Bias is how far removed a model's predictions are from correctness, while variance is the degree to which these predictions vary between model iterations.

(Figure: Bias vs. variance.)

As an example, consider a simple, flawed Presidential election survey whose errors are explained through the twin lenses of bias and variance: selecting survey participants from a phonebook is a source of bias; a small sample size is a source of variance. Minimizing total model error relies on balancing bias and variance errors. Ideally, models are the result of a collection of unbiased data of low variance. Unfortunately, the more complex a model becomes, the more it tends toward less bias but greater variance; an optimal model therefore needs to balance these 2 properties. The statistical evaluation method of cross-validation is useful both for demonstrating the importance of this balance and for actually searching it out.

(Figure: Bias and variance contributing to total error.)

The number of data folds to use (the value of k in k-fold cross-validation) is an important decision: the lower the value, the higher the bias in the error estimates and the lower the variance. Conversely, when k is set equal to the number of instances, the error estimate is very low in bias but has the possibility of high variance.
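To make the role of k concrete, the following is a small standard-library Python sketch of k-fold cross-validation. Everything in it is illustrative: the helper names (k_fold_indices, fit_line, cross_val_mse), the synthetic linear data, and the choice of a one-variable least-squares line as the model are assumptions for this example, not something taken from the original answer.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k roughly equal, shuffled folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (a stand-in model for the sketch)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def cross_val_mse(xs, ys, k, seed=0):
    """Return the mean squared error measured on each of the k held-out folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k, seed):
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(mean((ys[i] - (a + b * xs[i])) ** 2 for i in test_idx))
    return scores


# Hypothetical data: a noisy line y = 1 + 2x.
rng = random.Random(1)
xs = [i / 10 for i in range(30)]
ys = [1 + 2 * x + rng.gauss(0.0, 0.5) for x in xs]

# k = len(xs) is leave-one-out: every fold holds out a single instance.
for k in (2, 5, len(xs)):
    scores = cross_val_mse(xs, ys, k)
    print(f"k={k:2d}  folds={len(scores):2d}  mean MSE={mean(scores):.3f}")
```

Rerunning with different seed values gives a feel for how much the estimate moves from one partition to another at each k; with k equal to the number of instances there is only one possible partition, so the seed has no effect on the result.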

The most important takeaways are that bias and variance are two sides of an important trade-off when building models, and that even the most routine statistical evaluation methods rely directly on such a trade-off.

On the next page, we answer:
Q4. Why might it be preferable to include fewer predictors over many?
Q5. What error metric would you use to evaluate how good a binary classifier is?
Q6. What are some ways I can make my model more robust to outliers?