What's the point of the test set?

I get the point of a validation and training set, but the importance of a test set doesn't click for me.

Let's say you train a model and you try your best to avoid overfitting by testing your model on the validation set.

After you've decided you have a model you're proud of, you do a final sanity check on the test set. Let's say the performance is trash. Are you really going to start all over? What decision-making does it inform? In my workplace, the way timelines are structured, there's no time to start over.

asked 2 hours ago

Nick Corona

The test set is so that you don't cheat.
– Stephen Rauch♦ 1 hour ago

2 Answers

The point of a test set is to give you a final, unbiased performance measure of your entire model-building process. This includes all modelling decisions in your pipeline: preprocessing, algorithm selection, feature engineering, feature selection, hyperparameter tuning, and how you trained your model in general (5-fold cross-validation? bootstrapping? etc.). All of these decisions can lead to overfitting; for instance, you might select a set of hyperparameters that happens to be optimal for a particular validation set but not for the general population. Without a test set you would not be able to detect this, and you would potentially be reporting highly optimistic scores.
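
To make the "coincidentally optimal" point concrete, here is a minimal simulation sketch. It assumes an extreme case that is not from the question: every candidate "model" is a pure coin-flipper, so its true accuracy is 50%. The function names and the sizes (200 candidates, 100 validation rows, 2000 test rows) are made up for illustration.

```python
import random


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def selection_bias_demo(n_candidates=200, n_val=100, n_test=2000, seed=0):
    """Pick the best of many random guessers by validation accuracy.

    Returns (validation accuracy of the winner, test accuracy of the winner).
    All candidates guess at random, so any validation score well above 0.5
    is pure selection luck.
    """
    rng = random.Random(seed)
    val_labels = [rng.randint(0, 1) for _ in range(n_val)]
    test_labels = [rng.randint(0, 1) for _ in range(n_test)]

    best_val_acc, best_test_acc = -1.0, None
    for _ in range(n_candidates):
        # A "model" here is just a fixed set of random predictions.
        val_preds = [rng.randint(0, 1) for _ in range(n_val)]
        test_preds = [rng.randint(0, 1) for _ in range(n_test)]
        val_acc = accuracy(val_preds, val_labels)
        if val_acc > best_val_acc:
            # Selection uses the validation score only; the test score
            # of the current winner is merely recorded.
            best_val_acc = val_acc
            best_test_acc = accuracy(test_preds, test_labels)
    return best_val_acc, best_test_acc


if __name__ == "__main__":
    val_acc, test_acc = selection_bias_demo()
    print(f"winner on validation: {val_acc:.3f}, same model on test: {test_acc:.3f}")
```

The winner's validation score should look clearly better than chance even though nothing was learned, while its score on the untouched test set should stay near 0.5. Real tuning is less extreme, but the mechanism is the same.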

Also, because the modelling pipeline above can get very complex, the risk of leaking data and overfitting becomes high. If you only tune to your validation set, how will you know whether your modelling process as a whole is leaking data (and therefore overfitting)?
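
One common source of leakage is computing preprocessing statistics (for example the mean and standard deviation used for scaling) on all the data before splitting. As a rough sketch of the non-leaking order, assuming a single numeric feature and hypothetical helper names `fit_scaler` and `apply_scaler`:

```python
import statistics


def fit_scaler(train_values):
    """Learn centering/scaling statistics from the training data only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)
    if std == 0:
        std = 1.0  # avoid dividing by zero for a constant feature
    return mean, std


def apply_scaler(values, scaler):
    """Apply previously learned statistics to any split."""
    mean, std = scaler
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    train = [1.0, 2.0, 3.0]
    test = [4.0]
    scaler = fit_scaler(train)            # the test rows are never seen here
    print(apply_scaler(train, scaler))
    print(apply_scaler(test, scaler))     # scaled with training statistics
```

The design choice is simply that anything "fitted" sees the training split alone, and the validation and test splits are only ever transformed. The same ordering applies to imputation, feature selection, and similar steps.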

You bring up a good point: of course, if we see that the test set score is poor, we will probably go back and tweak again. Used too many times, the test set is demoted to a validation set, because you now run the risk of overfitting to the test set (see almost every Kaggle competition). However, through repeated test set evaluation (train the model, test it, then repeat with a different partitioning) you will at least get a gauge of how variable your model's performance is, which helps mitigate this problem. The number of repetitions will depend on how much the test set scores vary, how much uncertainty you are willing to accept, and your time constraints.
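
The "train, test, repeat with a different partitioning" loop could look roughly like the sketch below. Everything in it is an assumption for illustration: the synthetic one-feature data, the toy threshold "model", the 30% test fraction and the 20 repeats.

```python
import random
import statistics


def make_data(n, rng):
    """Synthetic 1-D data: class 0 centred at 0.0, class 1 centred at 2.0."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * label, 1.0), label))
    return data


def fit_threshold(train):
    """Toy model: decision threshold halfway between the two class means."""
    mean0 = statistics.fmean([x for x, y in train if y == 0])
    mean1 = statistics.fmean([x for x, y in train if y == 1])
    return (mean0 + mean1) / 2.0


def score(threshold, rows):
    """Accuracy of predicting class 1 whenever x exceeds the threshold."""
    return sum((x > threshold) == (y == 1) for x, y in rows) / len(rows)


def repeated_holdout(data, n_repeats=20, test_fraction=0.3, seed=0):
    """Refit on a fresh random train/test partition each time; collect test scores."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_fraction)
        test, train = shuffled[:n_test], shuffled[n_test:]
        scores.append(score(fit_threshold(train), test))
    return scores


if __name__ == "__main__":
    data = make_data(300, random.Random(42))
    scores = repeated_holdout(data)
    print(f"mean={statistics.fmean(scores):.3f}  stdev={statistics.stdev(scores):.3f}")
```

The spread of the scores is the useful part: a large standard deviation says a single test number should not be trusted much. Note that this estimates variability of the procedure; it does not give back an untouched test set once decisions have been made from it.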

In my opinion, in a business setting you should always make time to properly test your model. The dangers of overfitting are far too high and, even worse, you would not even know it was happening. If the test set scores end up being "trash", then at least you know the model is trash, and you either don't use it or change your approach. This is far better than thinking the model is fantastic based on non-rigorous validation and then having it fail in production. The scientific method is there for a reason, right?

edited 1 hour ago

answered 1 hour ago

aranglol


I like your question; it is somewhat philosophical in nature.

We know that a test set should not affect the model; otherwise it acts as a validation set. Therefore, even if there is enough time, if we act on a bad test result and change the model, the test set becomes a validation set, although it is less involved than a validation set used for early stopping or parameter tuning.
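
In practice, one way to keep the test set from affecting the model is to carve it out once, up front, and do all tuning against the validation part only. A minimal sketch, where the function name `three_way_split` and the 60/20/20 proportions are just assumptions:

```python
import random


def three_way_split(items, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, then cut into disjoint train / validation / test parts."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]                  # set aside; look at it once, at the end
    val = shuffled[n_test:n_test + n_val]     # used for tuning and early stopping
    train = shuffled[n_test + n_val:]         # used for fitting
    return train, val, test


if __name__ == "__main__":
    train, val, test = three_way_split(range(100))
    print(len(train), len(val), len(test))
```

Every time a decision is made from the `test` part, it moves a step closer to being another validation set, which is exactly the point above.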

In other words, a test set must be useless in just the way you have described! The moment it is useful, it becomes a validation set. To be more precise, though, a test set is not THAT useless, because it probably lowers your (and your boss's) expectations about the model's later performance in production, so there is a lower risk of heart failure there.

As an example, in a Kaggle competition the final set is a "test set", since it does not affect the submitted models. However, as soon as the final leaderboard is announced, that test set becomes a validation set; e.g., it affects which algorithms we choose later, namely those of the top competitors.

In summary, it seems that most of the time we are using less-involved validation sets to double-check more-involved validation sets.

P.S.: While I was writing this answer, @aranglol posted similar notes and examples :) (+1)

answered 19 mins ago

Esmailian
