arXiv Technical Paper API GitHub Repo


My name is Ethan, and I am trying to build an API for scraping technical papers, intended for use by developers. Right now it only works for arXiv, but I would greatly appreciate some mentoring or a code review of my repo. I am a new developer and want to bring my code up to professional quality.

Repo: https://github.com/evader110/ArXivPully

Source provided as well:

from falcon import API
from urllib import request
from bs4 import BeautifulSoup

class ArXivPully:
    # Removes rogue newline characters from the title and abstract
    def cleanText(self,text):
        return ' '.join(text.split('\n'))

    def pullFromArXiv(self,search_query, num_results=10):
        # Fix Input if it has spaces in it
        split_query = search_query.split(' ')
        if(len(split_query) > 1):
            search_query = '%20'.join(split_query)
        url = 'https://export.arxiv.org/api/query?search_query=all:'+search_query+'&start=0&max_results='+str(num_results)
        data = request.urlopen(url).read()
        output = []
        soup = BeautifulSoup(data, 'html.parser')
        titles = soup.find_all('title')

        # ArXiv populates the first title value as the search query
        titles.pop(0)

        bodies = soup.find_all('summary')
        links = soup.find_all('link', title='pdf')
        for i in range(len(titles)):
            title = self.cleanText(titles[i].text.strip())
            body = self.cleanText(bodies[i].text.strip())
            pdf_link = links[i]['href']
            output.append([pdf_link, title, body])
        return output

    def on_get(self, req, resp):
        """Handles GET requests"""
        output = []
        for item in req.params.items():
            output.append(self.pullFromArXiv(item[0],item[1]))
        resp.media = output

api = API()
api.add_route('/api/query', ArXivPully())
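
For comparison, the "fix input if it has spaces" step in `pullFromArXiv` could also be done with the standard library's `urllib.parse.urlencode`. This is only a minimal sketch, not what the repo currently does: the helper name `build_query_url` and the constant `ARXIV_API` are hypothetical, and keeping `:` unescaped is an assumption made so the result matches the hand-built URL above.

```python
from urllib.parse import quote, urlencode

# Base endpoint taken from the URL string used in pullFromArXiv.
ARXIV_API = 'https://export.arxiv.org/api/query'


def build_query_url(search_query, num_results=10):
    """Hypothetical helper: build the query URL with percent-encoding."""
    params = {
        'search_query': 'all:' + search_query,
        'start': 0,
        # int() rejects non-numeric input early with a ValueError.
        'max_results': int(num_results),
    }
    # quote_via=quote encodes spaces as %20 (as the original code does);
    # safe=':' leaves the "all:" prefix readable.
    return ARXIV_API + '?' + urlencode(params, safe=':', quote_via=quote)


if __name__ == '__main__':
    print(build_query_url('machine learning', 5))
```

For a query such as `machine learning` this yields the same URL as the string concatenation, while characters like `&` or `#` inside the search term are escaped instead of silently changing the request.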

Some design explanations: I run this API on Google Cloud Platform using the Falcon framework, because both are free for me and were the simplest to implement. Some known issues are already posted in the repo, but I want to better understand software development skills, best practices, and so on. I greatly appreciate any tips, big or small, and I look forward to drastically changing this source code to make it more robust.
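
To make the request shape concrete: `on_get` treats each query-parameter name as a search term and its value as the number of results to fetch. The sketch below mirrors that mapping with the standard library only. The function name `planned_searches` is hypothetical, `parse_qsl` stands in for Falcon's `req.params`, and the example path is made up for illustration.

```python
from urllib.parse import parse_qsl, urlsplit


def planned_searches(request_url):
    """Hypothetical helper: list the (search term, result count) pairs
    that a request to /api/query would trigger, one per query parameter."""
    query_string = urlsplit(request_url).query
    # parse_qsl decodes %20 and '+' to spaces; values stay strings.
    return parse_qsl(query_string)


if __name__ == '__main__':
    url = '/api/query?quantum%20computing=5&graph+theory=3'
    for term, count in planned_searches(url):
        print(term, count)
```

Note that the counts arrive as strings, which is why `pullFromArXiv` can pass them straight into the URL through `str(num_results)` without converting them.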

asked 18 mins ago

evader110

New contributor

python web-scraping api git
