ArXiV Technical Paper API Github Repo
$begingroup$
My name is Ethan and I am trying to build an API for scraping technical papers, for developers to use. Right now it only works for arXiv, but I would greatly appreciate some mentoring or a code review of my repo. I am a new developer and want to get my code to professional quality.
Repo: https://github.com/evader110/ArXivPully
Source provided as well:
from falcon import API
from urllib import request
from bs4 import BeautifulSoup

class ArXivPully:
    # Removes rogue newline characters from the title and abstract
    def cleanText(self,text):
        return ' '.join(text.split('\n'))

    def pullFromArXiv(self,search_query, num_results=10):
        # Fix Input if it has spaces in it
        split_query = search_query.split(' ')
        if(len(split_query) > 1):
            search_query = '%20'.join(split_query)

        url = 'https://export.arxiv.org/api/query?search_query=all:'+search_query+'&start=0&max_results='+str(num_results)
        data = request.urlopen(url).read()

        output = []
        soup = BeautifulSoup(data, 'html.parser')
        titles = soup.find_all('title')
        # ArXiv populates the first title value as the search query
        titles.pop(0)

        bodies = soup.find_all('summary')
        links = soup.find_all('link', title='pdf')

        for i in range(len(titles)):
            title = self.cleanText(titles[i].text.strip())
            body = self.cleanText(bodies[i].text.strip())
            pdf_link = links[i]['href']
            output.append([pdf_link, title, body])
        return output

    def on_get(self, req, resp):
        """Handles GET requests"""
        output = []
        for item in req.params.items():
            output.append(self.pullFromArXiv(item[0],item[1]))
        resp.media = output

api = API()
api.add_route('/api/query', ArXivPully())
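To show the response shape the parsing step depends on, here is a minimal offline sketch. It rests on several assumptions: the feed below is hand-written and hypothetical (not real arXiv output), `parse_entries` and `clean_text` are illustrative names that are not in the repo, and the standard library's `xml.etree.ElementTree` stands in for BeautifulSoup so the example has no third-party dependency. The feed carries its own `title` ahead of the entries, which is why the code above drops the first one; walking each `entry` sidesteps that.

```python
import xml.etree.ElementTree as ET

# Atom namespace prefix in ElementTree's {uri}tag notation.
ATOM = '{http://www.w3.org/2005/Atom}'

# Hypothetical, hand-written feed in the Atom shape the code expects.
# This is NOT real arXiv output.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>feed-level title (placeholder)</title>
  <entry>
    <title>First
 paper</title>
    <summary>An abstract
 split over lines.</summary>
    <link title="pdf" href="http://example.org/pdf/0001"/>
  </entry>
</feed>"""


def clean_text(text):
    # Collapses every run of whitespace (newlines included) to one space.
    # Note: this is broader than the original split on '\n' only.
    return ' '.join(text.split())


def parse_entries(xml_text):
    """Return [pdf_link, title, body] for each <entry> in the feed."""
    root = ET.fromstring(xml_text)
    results = []
    for entry in root.findall(ATOM + 'entry'):
        title = clean_text(entry.findtext(ATOM + 'title', default=''))
        body = clean_text(entry.findtext(ATOM + 'summary', default=''))
        # Pick the link whose title attribute is 'pdf', if there is one.
        pdf_link = next(
            (link.get('href') for link in entry.findall(ATOM + 'link')
             if link.get('title') == 'pdf'),
            None,
        )
        results.append([pdf_link, title, body])
    return results


print(parse_entries(SAMPLE_FEED))
```

Because each result is built from a single entry, a missing PDF link yields `None` for that entry instead of shifting the indices of the entries after it.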
Some design explanations: I run this API on Google Cloud Platform using Falcon because both options are free for me and were the simplest to implement. Some known issues are already posted in the repo, but I want to better understand software development skills, best practices, etc. I greatly appreciate any tips, big or small, and I look forward to drastically changing this source code to make it more robust.
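As a usage illustration of the `/api/query` route: `on_get` treats each query-string parameter name as a search term and its value as the number of results. The sketch below shows that mapping without any network access. `build_arxiv_url` and `urls_for_request` are hypothetical names that are not in the repo, and `urllib.parse.quote` is swapped in for the manual `'%20'` join.

```python
from urllib.parse import parse_qsl, quote

ARXIV_ENDPOINT = 'https://export.arxiv.org/api/query'


def build_arxiv_url(search_query, num_results=10):
    # Same URL shape as pullFromArXiv, with the search term
    # percent-encoded by quote() rather than joined by hand.
    return (ARXIV_ENDPOINT + '?search_query=all:' + quote(search_query)
            + '&start=0&max_results=' + str(num_results))


def urls_for_request(query_string):
    # Mirrors on_get: one arXiv URL per (parameter name, value) pair.
    # parse_qsl decodes the raw query string into such pairs.
    return [build_arxiv_url(term, count)
            for term, count in parse_qsl(query_string)]


# A request such as GET /api/query?neural%20networks=5&graph=2
# would, under these assumptions, fan out to two arXiv URLs.
for url in urls_for_request('neural%20networks=5&graph=2'):
    print(url)
```

Note that the parameter values arrive as strings, so the result count is passed through unvalidated in both this sketch and the original handler.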
python web-scraping api git
New contributor
$endgroup$
asked 18 mins ago
evader110