When making a call to create or update a job, the client does not close the TCP connection after the request to the API endpoint.
Running the following code demonstrates the issue:
$diffbot = new Diffbot("xxxxxxxxx");
$job = $diffbot->crawl("jonathan_test");
$job->setSeeds(["http://www.example.com"])
    ->setMaxToCrawl(100)
    ->setMaxToProcess(100)
    ->setMaxRounds(1)
    ->setOnlyProcessIfNew(1)
    ->setMaxHops(3);
$api = $diffbot->createArticleAPI('crawl')
    ->setMeta(true)
    ->setDiscussion(false)
    ->setQuerystring(true);
$job->setApi($api);
// Send the create/update request to the API endpoint.
$x = $job->call();
// Keep the process alive so the socket can be inspected with netstat.
sleep(100);
Now the socket remains open until the process quits:
$ netstat -an | grep 443
tcp 0 0 192.168.22.214:50844 35.192.184.37:443 TIME_WAIT
This is tolerable for a single socket, but if you create many Diffbot objects using new Diffbot(), the process can quickly run out of open files on the system, because the sockets aren't closed even when the object falls out of scope.
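To put a number on the growth, something like the following could be used. This is only a rough sketch, not part of the Diffbot client: the helper names countOpenFds() and fdGrowth() are made up for illustration, and it assumes a Linux system where /proc/self/fd lists the descriptors of the current process.

```php
<?php
declare(strict_types=1);

// Hypothetical helper, not part of the Diffbot client.
// Counts the file descriptors open in the current process.
// Linux only: relies on the /proc/self/fd directory.
function countOpenFds(): int
{
    $entries = scandir('/proc/self/fd');
    if ($entries === false) {
        throw new RuntimeException('Cannot read /proc/self/fd');
    }
    // Drop the "." and ".." directory entries.
    return count(array_diff($entries, ['.', '..']));
}

// Hypothetical helper: runs $work $iterations times and returns how many
// more descriptors are open afterwards than before.
function fdGrowth(callable $work, int $iterations): int
{
    $before = countOpenFds();
    for ($i = 0; $i < $iterations; $i++) {
        $work($i);
    }
    // Collect unreachable objects so that only descriptors which are
    // still held after cleanup are counted.
    gc_collect_cycles();
    return countOpenFds() - $before;
}
```

Wrapping the reproduction code above (without the sleep) in a closure and passing it to fdGrowth() with, say, 50 iterations should return a value near zero if the sockets are released, and a value that grows with the iteration count if they are not.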