Iteratively Build a Summary Dataset in an Effective Way

This is a problem I run into a lot. Can I achieve this goal without it taking so much time?

My code below achieves what I want it to achieve. However, I believe it could be a lot more efficient and Pythonic.

PROBLEM:
I want to extract summary data from a larger dataset, and I only know how to do so using nested for loops. For example, I have a large dataset containing golf data, and I would like to extract summary statistics for the individual golf holes.

This code creates a scoring distribution and mean score for each Season-Hole-Round-Score vs. Par combination (48 rows in total).

import numpy as np
import pandas as pd
import itertools

seasons = [2001,2001,2001,2001,2002,2002,2002,2002]
holes = [1,1,2,2,1,1,2,2]
rounds = [3,4,3,4,3,4,3,4]
scores = [1,-1,0,0,0,1,-1,1] # actual scores vs. par

df = pd.DataFrame({'season' : seasons, 'hole': holes, 'round':rounds, 'score': scores})


all_seasons = set(seasons); all_holes = set(holes); all_scores = [-1,0,1]
all_rounds = ["R3","R4","Weekend"] #some averages combine rounds
round_iter = np.arange(0,4) #position of rounds list
round_ids = [[3],[4],[3,4]] # weekend includes rounds 3 and 4

hold_list = [] #blank list

for season,round,hole in itertools.product(all_seasons,round_iter,all_holes):

    hold_data = df[((df['season'] == season) & (df['hole'] == hole))
                   & (df['round'].isin(round_ids[round-1]))]

    mean_score = hold_data['score'].mean()
    vspar_distro = hold_data['score'].value_counts().to_dict()
    for score in all_scores:
        count_score = 0
        if score in vspar_distro:
            count_score = vspar_distro[score]
        hold_list.append([season,all_rounds[round-1]
                          ,hole,mean_score,score,count_score])


historical_df = pd.DataFrame(hold_list,columns
                             = ['season','round','hole','mean_score','vspar_score','count'])

This produces the df that I desire (here are the first 5 rows), but applying this to a file with 100k+ records takes a long time and I believe there is a more efficient way. Thanks!

   season    round  hole  mean_score  vspar_score  count
0    2001  Weekend     1         0.0           -1      1
1    2001  Weekend     1         0.0            0      0
2    2001  Weekend     1         0.0            1      1
3    2001  Weekend     2         0.0           -1      0
4    2001  Weekend     2         0.0            0      2
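
For comparison, here is a minimal sketch of how the same summary might be built with pandas grouping instead of filtering the frame once per combination. It is one possible approach, not a drop-in replacement: the names `round_labels`, `labelled` and `summary` are made up for illustration, and it assumes that "Weekend" simply means rounds 3 and 4 pooled together.

```python
import pandas as pd

seasons = [2001, 2001, 2001, 2001, 2002, 2002, 2002, 2002]
holes = [1, 1, 2, 2, 1, 1, 2, 2]
rounds = [3, 4, 3, 4, 3, 4, 3, 4]
scores = [1, -1, 0, 0, 0, 1, -1, 1]  # actual scores vs. par

df = pd.DataFrame({"season": seasons, "hole": holes,
                   "round": rounds, "score": scores})

all_scores = [-1, 0, 1]
round_labels = {3: "R3", 4: "R4"}  # hypothetical mapping from round number to label

# One copy of the rows labelled per round, plus one copy labelled "Weekend",
# so a single groupby covers all three round categories.
per_round = df.assign(round=df["round"].map(round_labels))
weekend = df[df["round"].isin([3, 4])].assign(round="Weekend")
labelled = pd.concat([per_round, weekend], ignore_index=True)

keys = ["season", "round", "hole"]

# Mean score per season/round/hole.
mean_score = labelled.groupby(keys)["score"].mean().rename("mean_score")

# Count of each score per season/round/hole, with zeros for scores
# that never occurred in a group.
counts = (
    labelled.groupby(keys + ["score"]).size()
    .unstack("score", fill_value=0)
    .reindex(columns=all_scores, fill_value=0)
    .rename_axis(columns="vspar_score")
    .stack()
    .rename("count")
)

summary = counts.reset_index().merge(mean_score.reset_index(), on=keys)
summary = summary[["season", "round", "hole",
                   "mean_score", "vspar_score", "count"]]
print(summary.head())
```

Two differences from the loop version are worth knowing about. On the sample data this gives 36 rows rather than 48: in the loop, `round_iter` runs over 0 to 3, so `round-1` is -1 on the first pass and the "Weekend" rows are generated twice. Also, the grouped version only returns season/round/hole combinations that actually occur in the data, whereas the loop emits a row for every combination, with a NaN mean where there is no data.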

asked 2 mins ago

python_rube

New contributor

python python-3.x
