How can I automatically fetch multiple pages of results?

Edited April 17, 2025

Question

How can I automatically fetch multiple pages of results?

Answer

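Sometimes it is convenient to fetch several pages of results and put them in a single JSON file. I created a PHP script that does this; it is attached and pasted below.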

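Note: Zendesk provides this tip for instructional purposes only. Zendesk does not support or guarantee this information or any code samples. Zendesk also cannot provide support for third-party technologies such as PHP. Post any questions in the comments section of the tip, or search for a solution online.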

<?php

$resource = ''; // name of the endpoint to fetch, for example organizations or tickets
$start_page = 1; // first page you want to load
$end_page = 10; // last page you want to load (large ranges can take a while)
$subdomain = ''; // your Zendesk subdomain
$userpwd = ''; // credentials: {username}/token:{token} or {username}:{password}

// Send an authenticated GET request and return the raw response body,
// or the cURL error message if the request fails.
function makerequest($url, $userpwd) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERPWD, $userpwd);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true); // verify the server certificate
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);    // check that the certificate matches the host name
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    $result = curl_exec($ch);

    if (curl_errno($ch)) $result = curl_error($ch);
    curl_close($ch);
    return $result;
}

// Fetch pages $start_page through $end_page and write the merged records to results.json.
function getpages($resource, $start_page, $end_page, $userpwd, $subdomain) {
    // Begin at $start_page; each response supplies the URL of the following page.
    $url = "https://$subdomain.zendesk.com/api/v2/$resource.json?page=$start_page";
    $array = array();
    for ($i = $start_page; $i <= $end_page && $url; $i++) {
        $results = json_decode(makerequest($url, $userpwd));
        // Stop on a failed request or unexpected response so earlier pages are not lost.
        if (!isset($results->$resource) || !is_array($results->$resource)) {
            break;
        }
        $array = array_merge($array, $results->$resource);
        // next_page is null after the last page, which ends the loop.
        $url = isset($results->next_page) ? $results->next_page : null;
    }
    $file = fopen("results.json", "w") or exit("Unable to open file!");
    fwrite($file, json_encode(array($resource => $array)));
    fclose($file);
}
getpages($resource, $start_page, $end_page, $userpwd, $subdomain);

?>

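Download this script, or copy and paste it into a file named paginate.php. As described in the script comments, fill in the variables at the top: $resource, $start_page, $end_page, $subdomain, and $userpwd.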

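Then navigate to the folder containing the file and run php paginate.php from your terminal. This creates a new file named results.json containing the results. The file is overwritten on each run rather than appended to.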
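As a quick sanity check after a run, something like the following reports how many records were written for each resource. This is only a sketch: it assumes jq is installed and that results.json is a single object keyed by the resource name, as the script writes it. The function name summarize_results is illustrative and not part of the original script.

```shell
#!/bin/sh
# Print "<resource>: <record count>" for each top-level key in a results file.
# Assumes the file looks like {"organizations": [ {...}, {...} ]}.
summarize_results() {
  jq -r 'to_entries[] | "\(.key): \(.value | length)"' "$1"
}

# Usage (hypothetical): summarize_results results.json
```

If the count is lower than expected, the requested page range may not have covered every record.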

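Once you have the results, you can process them however you need. For example, they can help you find which entries in an array are unique and which are duplicated. With the results written to results.json, run one of the two commands below. The first writes each distinct id line to uniq.txt prefixed with the number of times it occurs; the second writes only the distinct id lines:

cat results.json | jq '.' | grep '"id"' | sort | uniq -c > uniq.txt
cat results.json | jq '.' | grep '"id"' | sort | uniq > uniq.txt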

These commands also require jq, which you can download from https://stedolan.github.io/jq/download/.
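If the goal is specifically to spot duplicated records, one possible variation is to let jq extract the id of each record directly and pass the list to uniq -d, which prints only values that occur more than once. This is a sketch under the assumption that results.json has the shape the script writes (one object whose key holds an array of records, each with an id field). The function names list_ids and duplicate_ids are made up for illustration.

```shell
#!/bin/sh
# Print the id of every record in a results file, one per line.
# Assumes the shape {"<resource>": [ {"id": ...}, ... ]}.
list_ids() {
  jq -r '.[] | .[] | .id' "$1"
}

# Print each id that appears more than once (one line per duplicated id).
duplicate_ids() {
  list_ids "$1" | sort | uniq -d
}

# Usage (hypothetical): duplicate_ids results.json
```

Unlike matching lines with grep, this reads only the id of each top-level record, so id fields nested deeper inside a record are not counted.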

You can also use this information to export data for import into a new system.


2 comments


Brett Bowser

January 21, 2021

Thanks for taking the time to share this with everyone Seneca :)

Seneca Spurling

January 21, 2021

It seems that CURLOPT_SSL_VERIFYHOST now needs to be set to 2.

If you don't already have php-curl installed, you'll need that package as well.

If you run it and get a zero-length output file, try changing the $end_page to 2. If you ask for a page that doesn't exist it overwrites the file with nothing at all. However it seems to handle things correctly up to then. You get 100 records per page, so for example if you have 650 records, you'll need to set $end_page to 7.
If you're getting organizations, for example, you can use something like this to get the count so you know how many pages to get. The count may not be accurate over 100,000. See https://developer.zendesk.com/rest_api/docs/support/organizations.

curl https://{subdomain}.zendesk.com/api/v2/organizations/count.json \
  -v -u {email_address}:{password}

I know this is just an example and not meant to be robust. This isn't a complaint or request for changes, just adding these comments here in case they help someone like myself in the future.
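The page arithmetic discussed in the comment above can be sketched as a small shell helper. It is a ceiling division of the record count by the page size. The default of 100 records per page comes from the comment; the function name pages_needed is an assumption made for illustration.

```shell
#!/bin/sh
# Number of pages needed to hold a given number of records (ceiling division).
# Usage: pages_needed RECORD_COUNT [PER_PAGE]
pages_needed() {
  count=$1
  per_page=${2:-100}   # default page size of 100 records
  echo $(( (count + per_page - 1) / per_page ))
}
```

For 650 records at 100 per page this gives 7: six full pages plus one partial page.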
