
[Notes] Writing an ICP Filing Lookup API in PHP with file_get_contents

December 29, 2020 • Views: 793 • Notes

I don't know regular expressions, so this is the only way I could write this API.

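Prerequisites:

  • A website for looking up domain filing records that requires no CAPTCHA. This post uses icp.chinaz.com
  • A browser that can show page source (obviously)

The full code first: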

<?php
header('Content-Type: application/json; charset=utf-8');
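$domain = isset($_GET['domain']) ? trim($_GET['domain']) : ''; // domain to look up; empty if the parameter is missing
$str = file_get_contents('http://icp.chinaz.com/'.rawurlencode($domain)); // fetch the result page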
function GetBetween($str,$start,$end){
    $r = explode($start, $str);
    if (isset($r[1])){
        $r = explode($end, $r[1]);
        return $r[0];
    }
    return '';
}
$unitname1 = GetBetween($str,'<span>主办单位名称</span>
                            <p>','</p>');
// Reuse GetBetween on the captured <a> element; no second helper is needed.
$unitname = GetBetween($unitname1,'">','</a>');
$nature = GetBetween($str,"<span>主办单位性质</span>
                            <p><strong class=\"fl fwnone\">","</strong></p>");
$icp = GetBetween($str,"<span>网站备案/许可证号</span>
                            <p><font>","</font>");
$sitename = GetBetween($str,'<span>网站名称</span>
                            <p>','</p>');
$url = GetBetween($str,'<span>网站首页网址</span>
                            <p class="Wzno">','</p>');
$check = GetBetween($str,'<span>审核时间</span>
                            <p>','</p>');
$result = [
    '主办单位名称' => $unitname,
    '主办单位性质' => $nature,
    '网站备案/许可证号' => $icp,
    '网站名称' => $sitename,
    '网站首页网址' => $url,
    '审核时间' => $check,
    ];
die(json_encode($result,JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES|JSON_PRETTY_PRINT)); // print the JSON and stop
?>

1. Find the information on the page and the matching source code
[Images: pic5, pic6]

2. Define the PHP function GetBetween (extracts the content between two strings)

function GetBetween($str,$start,$end){
    $r = explode($start, $str);
    if (isset($r[1])){
        $r = explode($end, $r[1]);
        return $r[0];
    }
    return '';
}
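
As a quick check of how the function behaves, here is a minimal sketch that runs GetBetween against a made-up HTML fragment. The fragment is an assumption for illustration and is not the real page markup. Note that only the first occurrence of $start is used, and a missing marker gives an empty string.

```php
<?php
function GetBetween($str, $start, $end) {
    $r = explode($start, $str);
    if (isset($r[1])) {
        $r = explode($end, $r[1]);
        return $r[0];
    }
    return '';
}

// Made-up fragment for illustration only.
$html = '<span>网站名称</span><p>百度</p><span>审核时间</span><p>2020-11-13</p>';

$name    = GetBetween($html, '<span>网站名称</span><p>', '</p>');
$date    = GetBetween($html, '<span>审核时间</span><p>', '</p>');
$missing = GetBetween($html, '<span>not-there</span>', '</p>'); // marker absent

var_dump($name, $date, $missing);
```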

The earlier lines are still needed, of course, because they provide $str. Here they are:

$domain = isset($_GET['domain']) ? trim($_GET['domain']) : ''; // read the domain parameter (domain)
$str = file_get_contents('http://icp.chinaz.com/'.rawurlencode($domain)); // read the page content
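
Since the domain comes straight from the query string, it may be worth checking it before building the URL. The sketch below is one possible way to do that. The helper names isPlausibleDomain and fetchIcpPage and the 5-second timeout are my own assumptions, not part of the original script.

```php
<?php
// Hypothetical helper: accept only strings that look like a hostname with at least one dot.
function isPlausibleDomain(string $domain): bool
{
    if ($domain === '' || strpos($domain, '.') === false) {
        return false;
    }
    return filter_var($domain, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME) !== false;
}

// Hypothetical helper: fetch the lookup page, returning null on any failure.
function fetchIcpPage(string $domain): ?string
{
    if (!isPlausibleDomain($domain)) {
        return null;
    }
    // Assumed timeout of 5 seconds so a slow upstream site cannot hang the request.
    $context = stream_context_create(['http' => ['timeout' => 5]]);
    $html = @file_get_contents('http://icp.chinaz.com/' . rawurlencode($domain), false, $context);
    return $html === false ? null : $html;
}

var_dump(isPlausibleDomain('qq.com'));
var_dump(isPlausibleDomain('../etc/passwd'));
```

A caller could then return an error JSON when fetchIcpPage gives null instead of parsing an empty page.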

3. Extract the fields you want

$nature = GetBetween($str,"<span>主办单位性质</span>
                            <p><strong class=\"fl fwnone\">","</strong></p>");
$icp = GetBetween($str,"<span>网站备案/许可证号</span>
                            <p><font>","</font>");
$sitename = GetBetween($str,'<span>网站名称</span>
                            <p>','</p>');
$url = GetBetween($str,'<span>网站首页网址</span>
                            <p class="Wzno">','</p>');
$check = GetBetween($str,'<span>审核时间</span>
                            <p>','</p>');
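
The start markers above include the exact line break and indentation from the page source, so they stop matching if the site changes its whitespace. A possible alternative, sketched here with a hypothetical helper named extractAfterLabel, is to find the label first and then look for the next opening and closing markers after it. The sample HTML is made up for illustration.

```php
<?php
// Hypothetical helper: locate $label, then return the text between the next $open and $close.
function extractAfterLabel(string $html, string $label, string $open, string $close): string
{
    $labelPos = strpos($html, $label);
    if ($labelPos === false) {
        return '';
    }
    $start = strpos($html, $open, $labelPos + strlen($label));
    if ($start === false) {
        return '';
    }
    $start += strlen($open);
    $end = strpos($html, $close, $start);
    if ($end === false) {
        return '';
    }
    return trim(substr($html, $start, $end - $start));
}

// Made-up fragment; the amount of whitespace between the tags no longer matters.
$sample = "<span>网站名称</span>\n          <p>百度</p>";
$sitename = extractAfterLabel($sample, '<span>网站名称</span>', '<p>', '</p>');
echo $sitename, PHP_EOL;
```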

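Q: Why isn't the organizer name (主办单位名称) extracted here?

A: Because the organizer name sits inside an <a> tag whose href value changes with the domain (for qq.com the name is 深圳市腾讯计算机系统有限公司). I tried leaving the href empty, as shown below, and also tried writing (.*?) in place of its value. Neither attempt worked, because explode() matches its delimiter literally and does not interpret wildcards. Step 4 solves the problem.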

$unitname1 = GetBetween($str,'<span>主办单位名称</span>
                            <p>
                                <a target="_blank" href="">','</a>
                            </p>');
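
To make the failure concrete, here is a small sketch showing that explode() treats its delimiter as plain text. The anchor tag and its href value are invented for the demonstration.

```php
<?php
// Invented anchor tag; the real page uses a different, domain-dependent href.
$html = '<a target="_blank" href="/c/abc">Example Co.</a>';

// The delimiter is compared literally, so "(.*?)" is not a wildcard here.
$wildcardParts = explode('<a target="_blank" href="(.*?)">', $html);
var_dump(count($wildcardParts)); // int(1): the delimiter never occurs, nothing is split

// With the exact literal text the split succeeds.
$literalParts = explode('<a target="_blank" href="/c/abc">', $html);
var_dump(count($literalParts)); // int(2)
```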

4. So I switched to a different approach: take out the whole <a> tag first, then take the wanted value (深圳市腾讯计算机系统有限公司) out of that tag

function GetBetween($str,$start,$end){
    $r = explode($start, $str);
    if (isset($r[1])){
        $r = explode($end, $r[1]);
        return $r[0];
    }
    return '';
}
$unitname1 = GetBetween($str,'<span>主办单位名称</span>
                            <p>','</p>'); // this captures the whole <a> element
// Reuse GetBetween on the captured <a> element; no second helper is needed.
$unitname = GetBetween($unitname1,'">','</a>'); // this extracts the wanted value
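
Another option for the second pass, offered only as a sketch, is PHP's built-in strip_tags(), which removes the tag and leaves its text. The value assigned to $unitname1 below is a made-up stand-in with an invented href.

```php
<?php
// Made-up stand-in for the captured value; the href is invented.
$unitname1 = '
    <a target="_blank" href="/example-path">深圳市腾讯计算机系统有限公司</a>
';

// strip_tags() drops the <a ...> and </a> markup; trim() removes the surrounding whitespace.
$unitname = trim(strip_tags($unitname1));
echo $unitname, PHP_EOL;
```

This does not depend on the characters `">` appearing exactly once before the name, which the two-pass GetBetween call does.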

5. Put all the extracted values into an array

$result = [
    '主办单位名称' => $unitname,
    '主办单位性质' => $nature,
    '网站备案/许可证号' => $icp,
    '网站名称' => $sitename,
    '网站首页网址' => $url,
    '审核时间' => $check,
    ];

6. This step declares the response as JSON, so that the JSON returned in step 7 is displayed in formatted form

header('Content-Type: application/json; charset=utf-8');

7. Return the JSON

die(json_encode($result,JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES|JSON_PRETTY_PRINT)); // print the JSON and stop
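
To show what the encoding flags change, the sketch below encodes the same array with and without them. The array contents, including the `url` key, are made up for the comparison.

```php
<?php
// Made-up data for the comparison.
$row = ['网站名称' => '百度', 'url' => 'http://baidu.com/'];

// Default encoding: non-ASCII characters become \uXXXX escapes and "/" becomes "\/".
$default = json_encode($row);

// With the flags, Chinese text and slashes are written as-is.
$readable = json_encode($row, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);

echo $default, PHP_EOL;
echo $readable, PHP_EOL;
```

Both strings decode to the same data. The flags only affect how readable the raw output is.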

8. How to query:
Visit:
http://your-domain/xxx/index.php?domain=qq.com or http://your-domain/xxx?domain=qq.com (the second form works only if the PHP file above is the directory's default document)
Example:
https://api.jerryiweb.com/api/icp?domain=baidu.com
Result:

{
    "主办单位名称": "北京百度网讯科技有限公司",
    "主办单位性质": "企业",
    "网站备案/许可证号": "京ICP证030173号-1",
    "网站名称": "百度",
    "网站首页网址": "baidu.com",
    "审核时间": "2020-11-13"
}
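
On the consuming side, a caller would decode this response and read the fields by their Chinese keys. The sketch below pastes the example result in as a string so it runs offline. Treating an empty filing number as "no record" is my own assumption about how to interpret the output.

```php
<?php
// The example response, embedded as a string instead of being fetched over HTTP.
$json = <<<'JSON'
{
    "主办单位名称": "北京百度网讯科技有限公司",
    "主办单位性质": "企业",
    "网站备案/许可证号": "京ICP证030173号-1",
    "网站名称": "百度",
    "网站首页网址": "baidu.com",
    "审核时间": "2020-11-13"
}
JSON;

$data = json_decode($json, true); // true: decode objects as associative arrays
$icp = is_array($data) ? ($data['网站备案/许可证号'] ?? '') : '';

// Assumption: GetBetween returns '' for missing fields, so an empty value means no record was parsed.
echo $icp === '' ? "No filing record found" : "ICP number: $icp", PHP_EOL;
```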
Last edited: March 5, 2021