Iteratively Build a Summary Dataset in an Effective Way
$begingroup$
This is a problem I run into a lot. Can I achieve this goal without it taking so much time?
My code below does what I want it to do. However, I believe it could be a lot more efficient and Pythonic.
PROBLEM:
I want to extract summary data from a larger dataset, and I only know how to do so using nested for loops. For example, I have a large dataset containing golf data, and I would like to extract summary statistics for the individual golf holes.
This code creates a scoring distribution and mean score for each Season-Hole-Round-Score vs. Par combination (48 rows in total).
import numpy as np
import pandas as pd
import itertools

seasons = [2001,2001,2001,2001,2002,2002,2002,2002]
holes = [1,1,2,2,1,1,2,2]
rounds = [3,4,3,4,3,4,3,4]
scores = [1,-1,0,0,0,1,-1,1] # actual scores vs. par
df = pd.DataFrame({'season': seasons, 'hole': holes, 'round': rounds, 'score': scores})

all_seasons = set(seasons); all_holes = set(holes); all_scores = [-1,0,1]
all_rounds = ["R3","R4","Weekend"] # some averages combine rounds
round_iter = np.arange(0,4) # 0..3; used below as round-1, so 0 wraps to index -1 (Weekend)
round_ids = [[3],[4],[3,4]] # weekend includes rounds 3 and 4
hold_list = [] # rows for the summary frame

for season,round,hole in itertools.product(all_seasons,round_iter,all_holes):
    hold_data = df[((df['season'] == season) & (df['hole'] == hole))
                   & (df['round'].isin(round_ids[round-1]))]
    mean_score = hold_data['score'].mean()
    vspar_distro = hold_data['score'].value_counts().to_dict()
    for score in all_scores:
        count_score = 0
        if score in vspar_distro:
            count_score = vspar_distro[score]
        hold_list.append([season,all_rounds[round-1]
                          ,hole,mean_score,score,count_score])

historical_df = pd.DataFrame(hold_list,columns
                             = ['season','round','hole','mean_score','vspar_score','count'])
This produces the df that I desire (here are the first 5 rows), but applying this to a file with 100k+ records takes a long time and I believe there is a more efficient way. Thanks!
   season    round  hole  mean_score  vspar_score  count
0    2001  Weekend     1         0.0           -1      1
1    2001  Weekend     1         0.0            0      0
2    2001  Weekend     1         0.0            1      1
3    2001  Weekend     2         0.0           -1      0
4    2001  Weekend     2         0.0            0      2
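One possible way to get the same kind of summary without explicit loops is to label the rows once per round grouping and let `groupby` do the aggregation. The following is only a sketch under stated assumptions: the names `round_groups`, `labelled`, `full_index` and `summary` are made up for illustration, and it emits each round label once (36 rows for the sample data), whereas the loop as posted visits the Weekend grouping twice to reach 48 rows.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "season": [2001, 2001, 2001, 2001, 2002, 2002, 2002, 2002],
    "hole":   [1, 1, 2, 2, 1, 1, 2, 2],
    "round":  [3, 4, 3, 4, 3, 4, 3, 4],
    "score":  [1, -1, 0, 0, 0, 1, -1, 1],  # actual scores vs. par
})

all_scores = [-1, 0, 1]
# Hypothetical mapping from a round label to the rounds it covers.
round_groups = {"R3": [3], "R4": [4], "Weekend": [3, 4]}

# One labelled copy of the matching rows per round grouping.
labelled = pd.concat(
    [df[df["round"].isin(ids)].assign(round=label)
     for label, ids in round_groups.items()],
    ignore_index=True,
)

keys = ["season", "round", "hole"]

# Mean score per season / round label / hole.
mean_score = labelled.groupby(keys)["score"].mean().rename("mean_score")

# Count of each score per group, then fill in the combinations that never occur.
counts = labelled.groupby(keys + ["score"]).size().rename("count")
full_index = pd.MultiIndex.from_product(
    [sorted(df["season"].unique()), list(round_groups),
     sorted(df["hole"].unique()), all_scores],
    names=keys + ["vspar_score"],
)
counts = counts.reindex(full_index, fill_value=0)

# Attach the group mean to every score row and order the columns as in the question.
summary = counts.reset_index().merge(mean_score.reset_index(), on=keys, how="left")
summary = summary[["season", "round", "hole", "mean_score", "vspar_score", "count"]]
print(summary.head())
```

Because the filtering happens once per round label rather than once per season/round/hole combination, the amount of Python-level looping no longer grows with the number of seasons or holes. Whether that is fast enough for a 100k+ row file would need to be measured on the real data.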
python python-3.x
$endgroup$
asked 2 mins ago by python_rube