This powerful function lets you solve any captcha without having to identify its type. To do this, Grimport creates a screen-sharing session between a specific area of your PC screen and the device of a human agent who solves the captcha. The advantage is that mouse movements and keystrokes are faithfully reproduced as if the solver were actually sitting at your screen, and your fingerprint is not altered (the solver does not go through a proxy, there is no DNS leak, and the fingerprint is not changed by a browser with different dimensions, for example).
This function requires the captcha to be loaded in Firefox Developer Edition and the Grimport Bridge extension to be up to date. Use the firefox function to load the page.
The function is synchronised between crawlers so that it cannot be executed twice simultaneously.
Installation
To use this technology, you need to install AutoHotkey v2 on Windows, or xdotool and wmctrl on Linux (sudo apt-get install xdotool wmctrl), so that Grimport can control the keyboard and mouse.
You also need a Cloud code, because the function consumes idIA Tech Cloud Credits. The cost depends on the price of the captcha but is generally between 0.005€ and 0.01€ per captcha solved. Your Cloud code can be found on your idIA Tech invoices, and your account must be funded. Enter your Cloud code in the Grimport Crawler settings or with the setCloudCode function.
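As a rough budgeting aid, the typical price range quoted above can be turned into a cost estimate. The sketch below is plain JavaScript, not a Grimport function: estimateCaptchaBudget is a hypothetical helper, and its bounds are only the usual range given above, so actual billing may differ.

```javascript
// Hypothetical helper: not part of Grimport.
// Prices are kept in thousandths of a euro to avoid floating-point drift.
const MIN_PRICE_MILLI_EUR = 5;   // 0.005 € per captcha (typical lower bound)
const MAX_PRICE_MILLI_EUR = 10;  // 0.01 € per captcha (typical upper bound)

function estimateCaptchaBudget(captchaCount) {
  if (!Number.isInteger(captchaCount) || captchaCount < 0) {
    throw new RangeError("captchaCount must be a non-negative integer");
  }
  return {
    minEur: (captchaCount * MIN_PRICE_MILLI_EUR) / 1000,
    maxEur: (captchaCount * MAX_PRICE_MILLI_EUR) / 1000,
  };
}

// Estimated range for 2000 solved captchas.
console.log(estimateCaptchaBudget(2000));
```

An estimate like this can help you decide how much credit to put on the account before a long crawl.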
Advice
It is important to make the solver's task as easy as possible. If the interaction zone is too large, the instructions are unclear or there are too many steps, solvers are more likely to cancel the task and move on to another one, which increases the waiting time for your script. Restrict the interaction zone as much as possible to the captcha itself and, where possible, preselect the field or open the puzzle in advance, unless doing so risks the captcha being cancelled and refreshed during a long wait.
While the solver is in control, do not use your keyboard or mouse, as you would interfere with their actions.
You do not know the nationality of the captcha solver in advance. Make sure that captchas with instructions are displayed in English, and configure your browser to display pages in English only. It is often possible to switch captcha pages to English by changing the site language in the URL or on the site itself.
If you need a pixel ruler, we recommend ScreenRuler.
Example
setCloudCode(111111)
firefox("browse", "https://captcha.com/demos/features/captcha-demo.aspx")
firefox("waitLoaded")
universalCaptchaSolver("#demoForm fieldset", { ->
if(firefox("javascript", """ document.querySelector(".correct") ? document.querySelector(".correct").innerHTML : '' """)) return true
else return false
})
setCloudCode(111111)
firefox("setHeaders", [
/.*/ : [ "Accept-Language": "en-US,en;q=0.5" ] //we ensure that all pages are in English
])
firefox("browse", "https://nopecha.com/demo/hcaptcha#easy")
firefox("waitLoaded")
positionIframe = firefox("positionInScreen", "iframe") //capture the initial position
screenZone = [ //extension of the area
"x": get(positionIframe, "x"),
"y": get(positionIframe, "y") - 50,
"width": get(positionIframe, "width") + 500,
"height": get(positionIframe, "height") + 650,
"from": "screenInFirefox",
]
universalCaptchaSolver(
screenZone,
{ ->
if(firefox("javascript", """ document.querySelector(".response") ? document.querySelector(".response").innerHTML : '' """)) return true
else return false
},
"Task: click on 'I am a human' and solve the captcha to make it disappear and validate!", //solving the captcha in two stages using a puzzle
true, //for test, false in production
{-> //refresh the page to reset everything
firefox("browse", "https://nopecha.com/demo/hcaptcha#easy")
firefox("waitLoaded")
},
3600, //one hour
40
)
Often the captcha is in an iframe (e.g. Datadome). You will then need to capture the coordinates inside the iframe in JS. These coordinates are relative to the page and not to the screen.
screenZone = firefox("javascript", """
captchaFrame = document.querySelector('#captcha__frame'); //the Datadome captcha box
// Get the position and size of captcha__frame
captchaFrameRect = captchaFrame.getBoundingClientRect();
// Build an object with the coordinates
coordinates = {
x: captchaFrameRect.left + window.scrollX,
y: captchaFrameRect.top + window.scrollY,
width: captchaFrameRect.width,
height: captchaFrameRect.height,
from: 'pageInFirefox'
};
//return the object (it will be returned as an object directly usable by screenZone)
coordinates""", 4) // index 4 was found by iterative testing. It represents the index of the frame in which Javascript is executed. We start with index 1 and gradually increase until we find a frame that works. There may be frames within frames and there may be many of them.
//we solve the captcha
universalCaptchaSolver(screenZone,{->
iframeDatadome = regex(/(?si)<iframe [^<>]*src="([^<>"]+captcha[^<>"]+)"/, firefox("sourceCode"), "outerHTML") //it's not about frame 4. regex will run on the main frame. It simply detects the iframe containing the other iframes which themselves contain frame 4. This method is sufficient because when the captcha is solved, the target site no longer has any iframes containing the word "captcha".
if(iframeDatadome) return false
return true
})
Here is a practical example of captcha resolution with Datadome for a site that also uses IP filtering:
identification = {->
console("identification")
setCloudCode(11111111111)
//cookie page
firefox("browse", "https://www.website.com/")
firefox("waitLoaded")
await(10*1000) //ajax functions load the cookie in a complex way, so we wait to make sure everything is in place
//universalCaptchaSolver
positionCaptcha = firefox("positionInScreen", "#captcha__frame", "*") //note the last argument "*", which is used to search for the element in iframes
if(positionCaptcha)
{
positionCaptcha.put("from", "screenInFirefox") //the capture mode is added to the array because it is not returned by firefox/positionInScreen
universalCaptchaSolver(positionCaptcha,{->
if(contains(firefox("javascript","el = document.querySelector('.captcha__human__container'); if(!el) ''; else el.outerHTML", "*"), "blocked")) return true //in this case, the site blocks completely without offering a captcha, so we exit universalCaptchaSolver and deal with this case later with an IP change
else if(firefox("javascript","""el = document.querySelector("iframe[src*='geo.captcha-delivery.com']"); if(!el) ''; else el.outerHTML""", "*")) return false //the iframe is present in the main frame, which means that the captcha is still there
else return true //the captcha is solved
}, "Solve the captcha (usually a puzzle piece to be moved to the right place). Refresh if necessary (wait 5 seconds after doing so).", false) //the instructions are important: guide the captcha solver carefully and in detail. At the slightest oddity, they will move on to another captcha.
}
//use of the cookie in grimport
setCookieValue(firefox("getAllCookies", ".website.com"), "https://www.website.com/")
setCookieValue(firefox("getAllCookies", ".www.website.com"), "https://www.website.com/")
setCookieValue(firefox("getAllCookies", "www.website.com"), "https://www.website.com/")
//check for blocking
if(contains(firefox("javascript","el = document.querySelector('.captcha__human__container'); if(!el) ''; else el.outerHTML", "*"), "blocked"))
{
console("Total blocking - you have to change your ip and cookie")
return false //failed identification
}
return true //successful identification
}
nbHTTPerrors = 0 //HTTP error counter
actionHttpError({httpErrorIdentifier,errorLink,codeHTTP-> //actionHttpError is triggered every time an HTTP error occurs somewhere in Grimport
if(equals(codeHTTP, 403)) //reverse engineering of requests has established that in the event of a 403 error you need to identify yourself again
{
nbHTTPerrors++ //each time an error occurs, the counter is incremented
successIdentification = false
if(nbHTTPerrors > 0 && nbHTTPerrors<=4)
{
successIdentification = identification()
}
if(nbHTTPerrors > 4 || !successIdentification) //if there are many requests for identification or if there is a problem with identification, the IP is changed
{
nbHTTPerrors=0
proxyChange(["method":"nordvpn_by_command","zone":"europe",
"action_between_change":{->
firefox("clearAllCookiesAllDomains") //Deleting cookies on Firefox
clearAllCookies() //Deleting cookies in Grimport Crawler
identification() //we identify again on the website once the proxy has changed
}])
}
}
})
See also
antiCaptcha
firefox
Parameters
screenPosition
The captcha solver cannot interact outside this area and cannot perform keystrokes. Its commands remain basic.
There are two ways of defining this zone:
- string (ex: ".captcha-puzzle") : If you use a text string, it is interpreted as a CSS selector. The x,y,width,height coordinates of the designated HTML element will be used. Select a parent element of the captcha so that it can be viewed in full. Note that some captchas display an "I'm human" box, for example, which opens a larger popup when clicked. If you use the selector of the small box, the puzzle popup will be cut off and the solver will not be able to solve it.
- associative array (ex: ["x": 20, "y": 100, "width": 500, "height": 300, "from": "pageInFirefox"]) : If you use an array, you specify the x,y coordinates and the width and height of the area to be captured. An additional parameter must be used: "from", which can have the values "pageInFirefox" (the coordinates are taken from inside the Firefox web page, i.e. (0,0) designates the top left corner of the web page) or "screenInFirefox" (the coordinates are taken from your screen, (0,0) designates the top left corner of your screen). It may be useful to use positionInScreen from firefox to obtain these coordinates. With these two options, even if Firefox is minimised, the solver cannot see anything other than the Firefox window. This is a security feature for the Grimport user. A third mode exists for "from": "screen", which allows you to capture an area of the screen outside Firefox ((0,0) designates the top left corner of your screen).
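To illustrate the shape of the associative array, here is a small JavaScript sketch that checks a zone and enlarges it so that a popup opened by the captcha still fits inside. Both checkZone and expandZone are hypothetical helpers written for this illustration, not Grimport functions, and the margin values are arbitrary.

```javascript
// Hypothetical helpers: not part of Grimport.
const FROM_MODES = ["pageInFirefox", "screenInFirefox", "screen"];

// Returns a list of problems; an empty list means the zone looks well-formed.
function checkZone(zone) {
  const problems = [];
  for (const key of ["x", "y", "width", "height"]) {
    if (typeof zone[key] !== "number" || !Number.isFinite(zone[key])) {
      problems.push(key + " must be a finite number");
    }
  }
  if (zone.width <= 0 || zone.height <= 0) {
    problems.push("width and height must be positive");
  }
  if (!FROM_MODES.includes(zone.from)) {
    problems.push("from must be one of: " + FROM_MODES.join(", "));
  }
  return problems;
}

// Grows a zone by a margin on each side, keeping the same "from" mode.
function expandZone(zone, { top = 0, right = 0, bottom = 0, left = 0 } = {}) {
  return {
    x: zone.x - left,
    y: zone.y - top,
    width: zone.width + left + right,
    height: zone.height + top + bottom,
    from: zone.from,
  };
}

const box = { x: 20, y: 100, width: 500, height: 300, from: "pageInFirefox" };
console.log(checkZone(box));
console.log(expandZone(box, { top: 50, bottom: 600 }));
```

Keeping the margins explicit makes it easier to shrink the zone again later, which matters because a smaller interaction zone is easier for the solver.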
checkerIsSolved
- string (ex: ".captcha-puzzle") : If you use a character string, it designates the CSS selector of an HTML element present in the page. As long as this element is detected, the captcha has not been resolved. When it disappears, the captcha is solved and universalCaptchaSolver ends.
- closure (ex: {-> return !equals(firefox("javascript", "window.location.href"), "https://site.com/captcha-check.php") } ) : You can define a function. As long as it returns false, the captcha is still being resolved. If it returns true, universalCaptchaSolver ends because the captcha has been solved.
{jsonResponse -> ... }
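The closures in the examples rely on a JavaScript probe that returns the element's HTML when it exists and an empty string otherwise, so that the surrounding if treats "element absent" as false. A minimal sketch of that probe follows. probeHtml and fakeDocument are hypothetical names used only for this illustration, and the fake document exists only so that the snippet runs outside a browser.

```javascript
// Hypothetical helper: returns the element's HTML, or '' when it is absent.
function probeHtml(doc, selector) {
  const el = doc.querySelector(selector);
  return el ? el.innerHTML : "";
}

// Minimal stand-in for a page, so the sketch runs outside a browser.
function fakeDocument(elements) {
  return { querySelector: (selector) => elements[selector] || null };
}

const beforeSolving = fakeDocument({});
const afterSolving = fakeDocument({ ".correct": { innerHTML: "Correct!" } });

console.log(JSON.stringify(probeHtml(beforeSolving, ".correct")));
console.log(JSON.stringify(probeHtml(afterSolving, ".correct")));
```

In a real page, the same expression would be evaluated against document, as the examples do through firefox("javascript", ...).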
_instructions (optional)
By default, this text is displayed : "Task: solve the captcha to make it disappear and validate!"
_testYourself (optional)
_actionBetweenSolvingTry (optional)
If you add an argument to the function, the JSON of the response from the captcha service will be transmitted.
{jsonResponse -> ... }