[Udacity Review] P3: Wrangle OpenStreetMap Data--Wrangling JSON (2)
Published: 2019-06-23



#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
This exercise shows some important concepts that you should be aware of:
- using the codecs module to write unicode files
- using authentication with web APIs
- using offset when accessing web APIs

To run this code locally you have to register at the NYTimes developer site
and get your own API key. You will be able to complete this exercise in our UI
without doing so, as we have provided a sample result.

Your task is to process the saved file that represents the most popular
articles (by view count) from the last day, and return the following data:
- list of dictionaries, each mapping an article's "section" to its "title"
- list of URLs for all media entries with "format": "Standard Thumbnail"

All your changes should be in the article_overview function.
The rest of the functions are provided for your convenience, if you want to access
the API by yourself.
"""
import json
import codecs
import requests

URL_MAIN = "http://api.nytimes.com/svc/"
URL_POPULAR = URL_MAIN + "mostpopular/v2/"
API_KEY = {"popular": "",
           "article": ""}

def get_from_file(kind, period):
    filename = "popular-{0}-{1}.json".format(kind, period)
    with open(filename, "r") as f:
        return json.loads(f.read())

def article_overview(kind, period):
    # Load the saved API results for the given kind and period
    data = get_from_file(kind, period)
    titles = []
    urls = []

    for article in data:
        # One single-entry dictionary per article: {section: title}
        section = article["section"]
        title = article["title"]
        titles.append({section: title})
        # Collect the URL of every "Standard Thumbnail" media entry, if the article has media
        if "media" in article:
            for m in article["media"]:
                for mm in m["media-metadata"]:
                    if mm["format"] == "Standard Thumbnail":
                        urls.append(mm["url"])

    return (titles, urls)

def query_site(url, target, offset):
    # This will set up the query with the API key and offset.
    # Web services often use an offset parameter to return data in small chunks.
    # NYTimes returns 20 articles per request; if you want the next 20,
    # you have to provide the offset parameter.
    if API_KEY["popular"] == "" or API_KEY["article"] == "":
        print("You need to register for a NYTimes Developer account to run this program.")
        print("See Instructor notes for information")
        return False
    params = {"api-key": API_KEY[target], "offset": offset}
    r = requests.get(url, params=params)

    if r.status_code == requests.codes.ok:
        return r.json()
    else:
        r.raise_for_status()

def get_popular(url, kind, days, section="all-sections", offset=0):
    # This function will construct the query according to the requirements of the site
    # and return the data, or print an error message if called incorrectly
    if days not in [1, 7, 30]:
        print("Time period can be 1, 7, or 30 days only")
        return False
    if kind not in ["viewed", "shared", "emailed"]:
        print("kind can be only one of viewed/shared/emailed")
        return False

    url += "most{0}/{1}/{2}.json".format(kind, section, days)
    data = query_site(url, "popular", offset)
    return data

def save_file(kind, period):
    # This will process all results by calling the API repeatedly with the supplied offset value,
    # combine the data, and then write all results to a file.
    # The first call uses the requested kind and period so num_results matches the data saved below.
    data = get_popular(URL_POPULAR, kind, period)
    num_results = data["num_results"]
    full_data = []
    with codecs.open("popular-{0}-{1}.json".format(kind, period), encoding='utf-8', mode='w') as v:
        for offset in range(0, num_results, 20):
            data = get_popular(URL_POPULAR, kind, period, offset=offset)
            full_data += data["results"]

        v.write(json.dumps(full_data, indent=2))

def test():
    titles, urls = article_overview("viewed", 1)
    assert len(titles) == 20
    assert len(urls) == 30
    assert titles[2] == {'Opinion': 'Professors, We Need You!'}
    assert urls[20] == 'http://graphics8.nytimes.com/images/2014/02/17/sports/ICEDANCE/ICEDANCE-thumbStandard.jpg'

if __name__ == "__main__":
    test()
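
The test above needs the saved popular-viewed-1.json file. To try the extraction logic without that file, the sketch below applies the same rules to a small in-memory sample. The sample records, the example.com URLs, and the helper name overview_from_records are all made up for illustration; only the keys ("section", "title", "media", "media-metadata", "format", "url") come from the listing above.

```python
# Hypothetical sample that mirrors only the fields the exercise reads.
# It is not real API output.
sample = [
    {
        "section": "Opinion",
        "title": "Example Title A",
        "media": [
            {
                "media-metadata": [
                    {"format": "Standard Thumbnail",
                     "url": "http://example.com/a-thumb.jpg"},
                    {"format": "Large",
                     "url": "http://example.com/a-large.jpg"},
                ]
            }
        ],
    },
    # An article with no "media" key contributes a title but no URL.
    {"section": "Sports", "title": "Example Title B"},
]


def overview_from_records(records):
    """Return (titles, urls) using the same rules as article_overview."""
    titles = []
    urls = []
    for article in records:
        # One single-entry dictionary per article: {section: title}
        titles.append({article["section"]: article["title"]})
        for m in article.get("media", []):
            for mm in m.get("media-metadata", []):
                if mm.get("format") == "Standard Thumbnail":
                    urls.append(mm["url"])
    return titles, urls


titles, urls = overview_from_records(sample)
print(titles)
print(urls)
```

Only the "Standard Thumbnail" entry of the first article ends up in urls; the "Large" entry and the article without media are skipped.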

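The offset handling in save_file can be tried without an API key. The following is a minimal sketch, assuming a service that returns "num_results" and a page of "results" for each offset, as the listing above expects. The names fetch_all, fake_fetch_page, and FAKE_ITEMS are hypothetical stand-ins and make no network calls.

```python
def fetch_all(fetch_page, page_size=20):
    """Collect every result by calling fetch_page(offset) page by page."""
    first = fetch_page(0)
    total = first["num_results"]
    results = list(first["results"])
    # The first page is already in hand, so continue from the second page.
    for offset in range(page_size, total, page_size):
        results += fetch_page(offset)["results"]
    return results


# Hypothetical stand-in for the remote API: 45 items served 20 at a time.
FAKE_ITEMS = list(range(45))


def fake_fetch_page(offset):
    return {
        "num_results": len(FAKE_ITEMS),
        "results": FAKE_ITEMS[offset:offset + 20],
    }


all_items = fetch_all(fake_fetch_page)
print(len(all_items))
```

Unlike save_file, this sketch reuses the first response instead of requesting offset 0 a second time, which saves one call per download.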
Reposted from: https://my.oschina.net/Bettyty/blog/756382
