Allows you to easily interact with Firefox.
It is important not to touch the Firefox Developer Edition browser at all. You can, however, work in parallel on another window as long as Firefox Developer Edition remains open.
The function is synchronous.
It is recommended to stop the synchronization when the script is finished by clicking the Grimport button in Firefox or to close the browser, as this consumes resources.
actionFirefox is a function of last resort, often used when the identification step is too complex to reproduce otherwise. Whenever possible, work in pure Grimport with functions like post or setCookie. The dependency on Firefox Developer Edition, the connection latency, and the risk that the browser is closed by mistake make a script slower and less robust, and performance and robustness are essential in web mining.
Development and list of actions
Main actions
- browse. 1 argument: the URL. Browse to the indicated URL. A new tab is opened on the first call and reused on every later call.
- javascript. 1 or 2 arguments: the first is the JavaScript code. Executes JavaScript code on the page. The result of the last instruction is returned by the function (see the example). It is recommended to use ' as the string delimiter instead of ", because $ is interpreted inside double-quoted strings. Note: you cannot use JS libraries loaded by the page, such as jQuery.
Frame JS: You can specify an optional second argument to target the frame where the JavaScript should be executed. To identify the frames, run firefox("javascript", "document.documentElement.outerHTML", frameTargeted). There are 3 options for the frameTargeted argument:
- integer : The frame is targeted using an arbitrary index. The main frame has ID 0, the first inner frame has ID 1, the second has ID 2, and so on. Frame IDs are assigned following the alphabetical order of the frame URLs. It is often necessary to try several numbers before finding the right frame.
- regex/string : You define a regular expression matching the URL of the frame to find. Don't use (?si): the Firefox regex format isn't quite the same as Grimport's.
- "*" : The code is tested on all frames until it returns something. Disables Javascript error reporting. This behaviour may be random if your JS code is not specific to a single frame.
- sourceCode. No argument. Returns the source code of the current page (obtained from the parsed DOM).
- waitLoaded. No argument. Wait until the page is fully loaded.
- getCookie. 2 arguments: the first is the domain, the second is the cookie name. Returns the value of a cookie. Be careful to specify the relevant domain: sub-domains like www. have an impact. Sometimes you need a domain written as ".myDomain.com", sometimes as "myDomain.com"; check your cookie manager to see which form applies.
- getAllCookies. 1 argument: the domain where the cookies are accessible. Returns the cookie value corresponding to this domain (concatenation of name1=value1; name2=value2; ...). Be careful to specify the relevant domain: sub-domains like www. have an impact. Sometimes you need a domain written as ".myDomain.com", sometimes as "myDomain.com"; check your cookie manager to see which form applies.
- setCookie. 3 arguments: the first is the url (not the domain, so include http://), the second is the cookie name, the third is the cookie value. Set a cookie to a new value.
- setAllCookies. 2 arguments: the first is the url (not the domain, so include http://), the second is the cookie value which include name and value of cookies (format: name1=value1; name2=value2; ...). Set cookies using the cookie value like in the Cookie header.
- clearCookie. 2 arguments: the first is the domain, the second is the cookie name. Remove a cookie by name. Be careful to specify a relevant domain, sub-domains like www. have an impact.
- clearAllCookies. 1 argument: the domain where the cookie is accessible. Remove all cookies from a domain. Be careful to specify a relevant domain, sub-domains like www. have an impact.
- clearAllCookiesAllDomains. no argument. Remove all cookies from all domains.
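The three frame-targeting options of the javascript action can be sketched as follows. This is only an illustration: the URLs, the frame index, the frame URL pattern and the #price-in-frame selector are hypothetical placeholders, and the right values have to be found by inspecting the real page.

```groovy
// Hypothetical page embedding inner frames; replace with the real URL.
firefox("browse", "https://www.site.com/page-with-frames.php")
firefox("waitLoaded")

// 1) Integer index: 0 is the main frame, 1 the first inner frame, and so on.
//    Dumping the HTML helps to check which frame an index points to.
console(firefox("javascript", 'document.documentElement.outerHTML', 1))

// 2) Regular expression matched against the frame URL (no (?si) flags).
//    'embed\\.site\\.com' is an assumed pattern for this sketch.
frameTitle = firefox("javascript", 'document.title', 'embed\\.site\\.com')
console("Frame title: " + frameTitle)

// 3) "*": the code is tried on every frame until one returns something.
//    The if statement yields no value in frames lacking the element,
//    and the selector should exist in a single frame only.
price = firefox("javascript", '''
element = document.querySelector("#price-in-frame");
if (element) element.innerHTML;
''', "*")
console("Price: " + price)
```

Starting with an integer index and outerHTML is a convenient way to discover the frames; once the right frame is known, a URL pattern is usually less fragile than an index, since indexes depend on the alphabetical order of the frame URLs.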
Other possible actions
- userAgent. 1 argument: the new User Agent. Change the User agent of Firefox.
- httpCode. No argument. Returns the HTTP code of the main frame.
- activateTab. no argument. Makes this tab active. See the documentation on multi-tab management. To select a tab other than the tab dedicated to the script, enter your personal tab id at the end of @actionType (ex: "activateTab-tabForTargetSite")
- lastRequests. no argument. Returns the list of requests captured using the "regexLastRequests" pattern (to be defined beforehand). Each request in the list is an associative array with the following keys : "headers", "method", "url", "tabId", "frameId", "documentUrl", "cookieStoreId", "proxyInfo", "ip", "incognito", "responseSize", "type", "urlClassification", "timeStamp", "thirdParty", "parentFrameId", "requestId", "originUrl", "frameAncestors", "requestSize". Useful for extracting a particular header, for example.
- regexLastRequests. 1 argument: the regular expression used to filter the request URLs. To be called before "lastRequests". Only the requests whose URL matches this regular expression are stacked in the "lastRequests" list.
- delayAction. 1 argument: the delay in seconds (60 by default). By default, each action should take a maximum of 60 seconds, otherwise Grimport will move to the next action. You can increase this time with this action.
- positionInScreen. 1 or 2 arguments: CSS selector of the targeted element for the first. Returns an associative array of properties [x: posX, y: posY, width: elemWidth, height: elemHeight] of the targeted HTML element within the screen (and not in the page). Useful for screenshots. You can specify a second optional argument to target the frame where CSS selector should be executed. This argument works in the same way as for "javascript".
- resizeWindow. 2 arguments: width and height in pixels. Resizes the Firefox window. Useful for varying the browser footprint.
- screenshot. 1 argument: file name without extension. Takes a screenshot of the Firefox window and saves it as a PNG file in Firefox's Downloads directory, under the given name (the .png extension is added automatically).
- firefoxType. 1 argument: "firefox" (for the normal Firefox browser) or "firefox developer" (for Firefox Developer Edition). Selects which Firefox browser is used.
- autoClose. 1 argument: true (default) or false. Defines whether the Firefox browser is automatically closed when no script depends on this function any more. It can be useful to disable auto-close to make a script more stable, as repeatedly opening and closing the browser can make it unstable.
- autoOpen. 1 argument: true (default) or false. Defines whether the Firefox browser is automatically opened when a script needs Firefox. The method detects whether a browser is already open and opens one if needed. This option is recommended and is not known to cause instability. A second launch of Firefox may occur, but this is not a problem.
- setLocalStorage. 1 argument: associative array of [name : value] of the variables to set in the local storage. Ex: [ "_ym40786994_lsid": "322553582843", "_ym40786994_reqNum": "3" ]. Changes the local storage content of Firefox.
- setHeaders. 1 argument: associative array of regex : [name : value]. Ex: [ /google\.com/ : [ "Accept-Language": "en-US,en;q=0.5" ], /idia-tech\.com/ : [ "Sec-Fetch-Mode": "navigate", "Sec-Fetch-Site": "cross-site" ]]. Changes or adds headers to requests which validate the regex pattern as a key in the array. Set the value of the array to an associative array of the names and values of the substitution headers.
- changeInitialDelayWaitLoaded. 1 argument: the desired number of milliseconds. By default waitLoaded waits 500ms and then waits for the page to load. With this method, you can change this initial wait time.
- closeFirefox. No argument. Manually close Firefox.
- closeTab. No argument. Close the tab. See the documentation on multi-tab management.
- connected. No argument. Returns 1 if the extension is connected to Grimport.
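As an illustration of how regexLastRequests and lastRequests work together, here is a minimal sketch. The URL and the "api/products" pattern are hypothetical, and the loop assumes that the returned list can be iterated with a standard for loop and read with get, as in the JSON example of this page.

```groovy
// Declare the URL filter first: only matching requests are stacked.
// "api/products" is a hypothetical pattern for this sketch.
firefox("regexLastRequests", "api/products")

firefox("browse", "https://www.site.com/catalog.php")
firefox("waitLoaded")

// Each captured request is an associative array with the documented keys.
requests = firefox("lastRequests")
for (request in requests)
{
    console("Request: " + get(request, "method") + " " + get(request, "url"))
    console("Headers: " + get(request, "headers"))
}
```

The filter is set before browsing so that the requests triggered by the page load are captured.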
Example
Get the source code of a page:
firefox("browse","https://www.idia-tech.com/grimport.php")
firefox("waitLoaded")
console(firefox("sourceCode"))
Get the cookie value of a website:
firefox("browse","https://www.site.com/identification.php")
firefox("waitLoaded")
firefox("javascript","""
document.getElementsByClassName("login")[0].value="my_login"
document.getElementById("pass").value="my_password"
document.querySelector("#submit").click()
""")
firefox("waitLoaded")
cookieValue=firefox("getAllCookies","www.site.com")
setCookieValue(cookieValue, "https://www.site.com")
Use javascript to return something (The last instruction is the return value):
firefox("setAllCookies","https://www.idia-tech.com/grimport.php", getCookieValue("https://www.idia-tech.com/")) //set the cookies of idia-tech.com of Grimport into Firefox
firefox("browse","https://www.idia-tech.com/grimport.php")
firefox("waitLoaded")
jsReturn=firefox("javascript",'''
description=document.getElementsByClassName("entry-content")[0].innerHTML;
description;''')
console("Page description: "+jsReturn)
Export the data in javascript and return them through a JSON:
firefox("browse", "https://www.idia-tech.com/activites.php")
firefox("waitLoaded")
data = jsonDecode(firefox("javascript","""
h1 = document.querySelector('h1').innerHTML
h4 = document.querySelector('h4').innerHTML
JSON.stringify({ h1: h1, h4: h4 })
"""))
console(get(data, "h1"))
console(get(data, "h4"))
Wait for the display of an HTML element that loads after the page is loaded:
firefox("browse", "https://www.site/page.php")
firefox("waitLoaded")
isOK=false
while(!isOK)
{
returnValue = firefox("javascript","""if(document.querySelector('.myElement')) "OK"; else "Nope!" """)
if( equals(returnValue, "OK")) isOK = true
else wait(500)
}
console( firefox("javascript","""document.querySelector('.myElement').getAttribute("data-name")""") )
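The loop above never ends if the element does not appear. A possible variant with a bounded number of attempts is sketched below; maxAttempts is a name introduced for this sketch, and the 10-second estimate assumes that wait takes milliseconds.

```groovy
firefox("browse", "https://www.site/page.php")
firefox("waitLoaded")

maxAttempts = 20   // hypothetical budget: 20 checks of 500 ms, about 10 seconds
attempt = 0
isOK = false
while (!isOK && attempt < maxAttempts)
{
    returnValue = firefox("javascript", '''if(document.querySelector(".myElement")) "OK"; else "Nope!" ''')
    if (equals(returnValue, "OK")) isOK = true
    else
    {
        attempt = attempt + 1
        wait(500)
    }
}

// Read the attribute only if the element was found within the budget.
if (isOK) console(firefox("javascript", '''document.querySelector(".myElement").getAttribute("data-name")'''))
else console("Element .myElement did not appear in time")
```

Capping the number of attempts keeps the script from blocking forever when the target site changes its markup.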
Debugging your script
JavaScript errors can appear in the Firefox console (Menu icon > More tools > Browser tools > Console) or in the module console. To debug your firefox script in the module console, go to Menu icon > Extensions and Themes > Extensions > Cogwheel > Debugging modules > Grimport Crawler Bridge > Review. The error is displayed with the message "JAVASCRIPT ACTION", and you can see all actions.
Multi-tab management
The function includes multi-tab management, allowing each Grimport script to execute its actions in its own tab to avoid interference. You can use several tabs for a script by specifying a tab ID of your choice at the end of @actionType. For example, "browse" becomes "browse-tab_of_google". Any character is allowed in the ID, but be sure to use the "-" separator between the action and the ID.
Because tabs are identified by ID, two different scripts using the same ID interact with the same tab. If you want a script to work on 2 tabs without interference from other scripts, it may be a good idea to include a random part or the script's unique ID in the tab ID.
To target the current active tab, use "current" in this parameter.
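A minimal sketch of a script driving two tabs is given below. The tab IDs "catalog" and "detail" and the URLs are arbitrary choices for the illustration, and the sketch assumes that the "-" suffix can be appended to every action in the same way as for "browse" and "activateTab".

```groovy
// "catalog" and "detail" are arbitrary tab IDs chosen for this sketch.
firefox("browse-catalog", "https://www.site.com/catalog.php")
firefox("browse-detail", "https://www.site.com/product.php")

firefox("waitLoaded-catalog")
firefox("waitLoaded-detail")

// Run the same JavaScript in each tab and compare the results.
catalogTitle = firefox("javascript-catalog", 'document.title')
detailTitle = firefox("javascript-detail", 'document.title')
console("Catalog: " + catalogTitle)
console("Detail: " + detailTitle)

// Bring one tab to the foreground, then close both tabs when done.
firefox("activateTab-detail")
firefox("closeTab-catalog")
firefox("closeTab-detail")
```

If several scripts may run at the same time, adding a random or script-specific part to these IDs avoids sharing a tab by accident.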
Installation instructions
This installation procedure is not normally necessary, but may be useful if you have problems using Grimport Crawler or if you use it in an unusual way.
1) Download, install and open Firefox Developer Edition
2) Write in the address bar about:config
3) Click on the "I'll be careful" button
4) Write xpinstall.signatures.required and set this property to false
5) Write browser.sessionstore.resume_from_crash and set this property to false
6) Restart Firefox Developer Edition
7) Go to the address of the extension : https://www.idia-tech.com/grimport-crawler/grimport-actionFirefox.xpi
8) Install the extension (allow for privacy mode)
9) Close and reopen Firefox Developer Edition, then check that the red Grimport square is present.
There is an alternative way to install the extension: rename the file extension from .xpi to .zip and unzip the file. In any Firefox, go to about:debugging#/runtime/this-firefox and click the button to load a temporary module. Then select any file of the unzipped extension.
Firefox can be temperamental: it may be necessary to install the extension twice, or even to download the XPI from another browser and add it to Firefox as a file.
Parameters