ArXiV Technical Paper API Github Repo

My name is Ethan, and I am trying to build an API for scraping technical papers for developers to use. Right now it only works for arXiv, but I would greatly appreciate some mentoring or a code review of my repo. I am a new developer and want to get my code to professional quality.

Repo: https://github.com/evader110/ArXivPully

The source is provided here as well:

    from falcon import API
    from urllib import request
    from bs4 import BeautifulSoup

    class ArXivPully:
        # Removes rogue newline characters from the title and abstract
        def cleanText(self,text):
            return ' '.join(text.split('\n'))

        def pullFromArXiv(self,search_query, num_results=10):
            # Fix Input if it has spaces in it
            split_query = search_query.split(' ')
            if(len(split_query) > 1):
                search_query = '%20'.join(split_query)
            url = 'https://export.arxiv.org/api/query?search_query=all:'+search_query+'&start=0&max_results='+str(num_results)
            data = request.urlopen(url).read()
            output = []
            soup = BeautifulSoup(data, 'html.parser')
            titles = soup.find_all('title')

            # ArXiv populates the first title value as the search query
            titles.pop(0)

            bodies = soup.find_all('summary')
            links = soup.find_all('link', title='pdf')
            for i in range(len(titles)):
                title = self.cleanText(titles[i].text.strip())
                body = self.cleanText(bodies[i].text.strip())
                pdf_link = links[i]['href']
                output.append([pdf_link, title, body])
            return output

        def on_get(self, req, resp):
            """Handles GET requests"""
            output = []
            for item in req.params.items():
                output.append(self.pullFromArXiv(item[0],item[1]))
            resp.media = output

    api = API()
    api.add_route('/api/query', ArXivPully())

Some design explanations: I run this API on Google Cloud Platform using the Falcon framework because both are free for me and were the simplest to implement. Some known issues are already posted in the repo, but I want to better understand software development skills, best practices, and so on. I greatly appreciate any tips, big or small, and I look forward to drastically changing this source code to make it more robust.
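To make the two steps in `pullFromArXiv` concrete (building the query URL, then pulling the title, summary, and PDF link out of the response), here is a minimal sketch of one possible alternative. It is not code from the repo. It assumes the standard library's `urllib.parse` and `xml.etree.ElementTree` in place of manual `%20` joining and BeautifulSoup. The names `build_query_url`, `clean_text`, `parse_feed`, and `SAMPLE_FEED` are hypothetical, and the sample feed is made up for illustration so the example runs without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, urlencode

# Namespace used by Atom feeds.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Made-up feed for illustration only; a real response has more fields.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Query title that should not be treated as a paper</title>
  <entry>
    <title>An Example
  Title</title>
    <summary>  First line
second line. </summary>
    <link title="pdf" href="https://example.org/paper.pdf"/>
  </entry>
</feed>"""


def build_query_url(search_query, num_results=10):
    """Build the query URL, letting urlencode escape spaces and other characters."""
    params = {
        "search_query": "all:" + search_query,
        "start": 0,
        "max_results": num_results,
    }
    # quote_via=quote encodes spaces as %20; safe=":" keeps the "all:" prefix readable.
    return "https://export.arxiv.org/api/query?" + urlencode(
        params, safe=":", quote_via=quote
    )


def clean_text(text):
    """Collapse newlines and runs of whitespace into single spaces."""
    return " ".join(text.split())


def parse_feed(xml_text):
    """Read each <entry> on its own so title, summary, and link stay matched."""
    root = ET.fromstring(xml_text)
    results = []
    for entry in root.findall("atom:entry", ATOM):
        pdf = entry.find("atom:link[@title='pdf']", ATOM)
        results.append({
            "pdf_link": pdf.get("href") if pdf is not None else None,
            "title": clean_text(entry.findtext("atom:title", default="", namespaces=ATOM)),
            "summary": clean_text(entry.findtext("atom:summary", default="", namespaces=ATOM)),
        })
    return results
```

Reading fields per entry avoids relying on three separate lists lining up by index, and the feed-level title no longer needs to be popped because only `entry` elements are visited.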


















Tags: python, web-scraping, api, git






asked 18 mins ago by evader110



