How do you program in Grimport?


Export Filtered Data

Now that we have seen how to extract data from a website, the next step is to integrate it.

Integrating data means, for example, taking the information about a product from an online store and then exporting it to a CSV file or using it to create a product in a web shop.

Grimport Crawler can integrate data in many different ways.

 

File export (Csv, Excel)


It is possible to export data into a CSV file with Grimport Script Editor. The csv() function is easy to use: give the path of the file, followed by an associative array ["column name" : "cell value"].
Here is an example of how to write a file on the desktop:

csv(path("desktop")+"my_csv_on_the_desktop.csv",  [ 
	"Product Title" : name, 
	"Product Price" : price, 
])

You will get a table with two columns, Product Title and Product Price, filled with the extracted values.

 

Since we want to reuse the same file path every time we call csv(), it is a good idea to define it in the INITIAL script and make it a super-global, so that it is available in the FORPAGE script, where we extract the data from the web pages and export it to the CSV.

Example:

INITIAL script:
csvPath = path("desktop")+"my_csv_on_the_desktop.csv"
setGlobal("csvPath")
FORPAGE script:
name = cleanSelect("h1")
price = cleanSelect(".price", null, null, "price")

csv(csvPath,  [ 
	"Product Title" : name, 
	"Product Price" : price, 
])

If a CSV file with the same name already exists, you can delete it at the beginning of the script with the delete() function, so that the new extraction is not mixed with what was previously extracted.

Example:

csvPath = path("desktop")+"my_csv_on_the_desktop.csv"
deleteOldCSVIfExists = true
if(deleteOldCSVIfExists)  delete(csvPath)    //The file will be deleted

Let's now try a slightly more complex script. It defines two maps of elements to retrieve, one by CSS selectors and one by regular expressions, which are then merged and exported into a CSV:

Example:

INITIAL SCRIPT:

path0=path("desktop")+"my_csv_on_the_desktop.csv"

deleteCSV = true
if(deleteCSV) delete(path0)

cssSeletors=[:] //associative array of type map
regexs=[:] //associative array of type map

column1 = 'title'; cssSelector1 = "h1" //title css selector
column2 = 'price'; cssSelector2 = "#myPrice" //price css selector
column3 = 'brand'; regex3 = /(?si)\s*Brand:\s*<a\s*href=\"https[^<>]*>([^<>]*)<\/a>/ //brand regex

if(cssSelector1) add(cssSeletors, column1, cssSelector1)
if(cssSelector2) add(cssSeletors, column2, cssSelector2)
if(regex3) add(regexs, column3, regex3)

setGlobal("path0")
setGlobal("cssSeletors")
setGlobal("regexs")

 

FORPAGE SCRIPT:


data=[:] 
add(data,  "URL of the page", urlPage) 
cssSeletors.each{title,cssSelector->  
	add(data,  title, cleanSelect(cssSelector)) 
} 
regexs.each{title,reg->  
	add(data,  title, cleanRegex(reg)) 
} 

csv(path0,data)

To access a more complete script, you can also click on the following link: Export website data to a CSV (Script).

In Grimport Script Editor, you can also write in an Excel file with writeInExcelCache() and writeCacheToExcelFile().
These functions work similarly to csv(), but you specify rows, columns and sheets. Indexes start at 0: column A, row 1 and the first sheet are each designated by the value 0.

You can also use the write() function to build a text file with the desired content.
Use writeEnd() to append to an existing file without overwriting its content.

 

Export with a CMS


Now that you have seen how to integrate data in a CSV file, let's see how to integrate it in a CMS.
If you use Prestashop you can interface Prestashop with Grimport Crawler. For this you will need to install our Catalog Importer module. You can access the installation tutorial for Prestashop and Intelligent Importer on the page Installation of Grimport and other software.
You can also install our modules on a CMS like WordPress, Magento or Shopify.
Once your CMS is interfaced with Grimport Crawler, you will be able to use functions that allow you to integrate data in a CMS.

To perform these integrations, we use the function:
function( string function_name , array arguments )

The first argument is the name of the function in the PHP library and the second argument contains an array of the function's arguments.
function() allows you to call PHP functions (not to be confused with Grimport functions). They can be functions that already exist in PHP (like file_get_contents) or new functions defined in the library of the module. To see all available PHP functions, click on the button at the bottom of the script panel in Grimport Script Editor.
You can also use the php() function which sends PHP code directly to be executed.

Let's insert for example a product in Prestashop with an image, a name and a price:

//we  extract the title of the product
product_name = cleanSelect("h1")    
//we extract the price which is in a “price” class tag. The last argument cleans the data 
product_price = cleanSelect(".price",null,null,"price") 
//we extract the image from the src attribute of the tag 
product_image = cleanSelect("img.main",null,"src")

//let's  start inserting the extracted data 
//we create a product with a name and a price. Be careful at this point, nothing is created!
//This command is simply put on hold in a stack and will only be sent when send() is called. 
function("add_product" , [product_name, product_price] ) 

//creation of the image. Note that we do not specify the identifier of the product created above again:
//Grimport remembers this product and passes it implicitly to this function
function("add_image" , [product_image] ) 
 
//the stack of commands (defined by the two function() calls) is finally sent 
//and the product is created with an associated image
send()  

In the code above, we add a product with a name, a price and an image.
We use add_product() and add_image(). It is also possible to use the update_or_add_product() function which will create the product if it does not exist and update it if it does.

The send() function executes the commands: before send(), nothing is created. The commands are simply placed in an array and are only sent to the server when send() is called.

Sometimes you can't wait for send() to launch a command, so you have to use the function:
functionNow ( string function_name , array arguments )

We can also use the phpNow() function.

Here is an example that calls a native PHP function with functionNow(). It returns true if the function exists:

isImapInstalledOnServer = functionNow("function_exists", ["imap_open"])  
//like the example #1 of the documentation: https://www.php.net/manual/function.function-exists.php

For example, if we want to check whether a reference exists in our database, we will write:

isProductExist=functionNow('id_product_reference',["F55OYN3"])

And here is an example with phpNow():

jsonReturn = phpNow('''$var =  2; 
$map = array("var1" => $var,  "yop"=>"toto"); 
echo json_encode($map);''')

mixedContentFromServer = jsonDecode( jsonReturn )  
// -> {"var1":2,"yop":"toto"}

The example above runs PHP code that creates an array. At the end, the PHP code prints this array encoded as JSON. Grimport decodes it on reception and turns it back into a Grimport array. This is a way to pass data from PHP to Grimport.

The jsonDecode() and jsonEncode() functions are described at the bottom of this page.

 

IMPORTANT:

Sometimes your program will contain a succession of function() calls, for example to add a product and then an image to your online store.
The PHP function library of our modules uses contextual identifiers.
A contextual identifier is an identifier that is shared by several successive PHP functions. Within the same PHP instance, we do not need to pass the ID to each function(). If there are several instances, we have to pass the IDs explicitly.
A PHP instance is the execution of one or several PHP functions in a single request (like one call to functionNow() or to send()).

Let's take the example below:

product_exist = functionNow("id_product_reference", [reference])

function("update_or_add_product",[ 
	product_exist ? null : name, 
	product_exist ? null : price, 
	reference, 
])

// add image
if(!product_exist)
{
	selectAllInCode("#gallery a",null,"href").each {img->
		function("add_image",[img, name])
	}
}

send()

In the example above, if the product is created for the first time, the images are added with the "add_image" function, otherwise they are not added in order to avoid duplication.
The "add_image" function will not need "id_product" to know to which product you can associate the image, because it will take the "id_product" from the "update_or_add_product" function.
The "add_image" function works as follows: if no "id_product_for_image" is given (4th argument of the "add_image" function) and there is an "id_product" (the global variable instantiated by update_or_add_product and accessible in this PHP instance), then "id_product_for_image" = "id_product", and the ID of the product created in this instance is used to associate the image.


Another possibility is to define a Grimport variable "id_product" with functionNow() and to pass it as the "id_product_for_image" argument of the "add_image" function.

The script would look like this one:

product_exist = functionNow("id_product_reference", [reference])

id_product = functionNow("update_or_add_product",[ 
	product_exist ? null : name, 
	product_exist ? null : price, 
	reference, 
])

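// add image
if(!product_exist)
{
	selectAllInCode("#gallery a",null,"href").each {img->
		//each functionNow() is its own PHP instance, so id_product is passed explicitly
		//as the 4th argument (id_product_for_image); the 3rd argument is left null here (assumption)
		functionNow("add_image",[img, name, null, id_product])
	}
}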


The problem with this version is that it takes much longer, because many requests (round trips) have to be sent to the server. That is why it is better to use function() with send() rather than functionNow().

Let's try to understand why.
You should be aware that your lines of code will induce some processing time. Each time a request is sent to the server, it takes time for the request to travel from the computer where Grimport is running to the server, plus the time for the server to execute the instructions, plus the time for the response to be sent back from the server to the client computer.
For the example, let's say that a request takes 1 second to travel over the network and that the execution time on the server is zero. Let's also imagine that each product has 10 images.

In the first version of the script, a first functionNow() command sends one request and therefore takes 2 seconds (sending the command plus receiving the answer). Then send() sends a stack of 11 commands (update_or_add_product + 10 add_image) in a single request, which also takes 2 seconds. So each product takes 4 seconds to create.

In the second version, the functionNow() that checks the existence of the product also takes 2 seconds, a second one creates the product and takes 2 seconds, and the add_image loop runs 10 iterations of 2 seconds each. So with this version of the script, each product takes 24 seconds to create!
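
The comparison above can be written out as a short calculation. This is only a sketch using the figures assumed in the text (1 second per network leg, no server processing time, 10 images per product); the variable names are illustrative and not part of Grimport.

```groovy
// Figures assumed in the text above
secondsPerLeg = 1                      // one-way travel time of a request
roundTrip = 2 * secondsPerLeg          // sending the command + receiving the answer
imagesPerProduct = 10

// Version 1: one functionNow() for the check, then one send() carrying the whole stack
version1 = roundTrip + roundTrip

// Version 2: one functionNow() for the check, one for the product, one per image
version2 = roundTrip + roundTrip + imagesPerProduct * roundTrip

// version1 -> 4 seconds per product, version2 -> 24 seconds per product
```

The gap grows with the number of images: every extra image adds a full round trip in the second version and nothing in the first, since the whole stack travels in one request.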

Now you understand the value of contextual identifiers: not only do they lighten the code, they also give much better performance.

 

Export to remote server


It is possible to insert the data into a MySQL database with sqlQuery(). For the first two arguments, use the table provided in the documentation of the function. Indicate your login and password, then compose your SQL query. Use sqlEscape() to escape the data when building your query:

sqlQuery("com.mysql.jdbc.Driver", "jdbc:mysql://myhostname/databaseName","my login", "my password", 
"UPDATE my_table SET field="+sqlEscape(fieldToInsert)+" WHERE id=1234")

You can also create a file and upload it to an FTP server. For example, use the csv() function to create your CSV file in the temporary directory (the path of the file would be path("tmp")+"my_temp_csv.csv"). Then upload it to the server with the ftpUpload() function:

ftpUpload("68.258.354.12", "admin", "sd44f88t4h", "/var/www/download/myFile.csv", path("tmp")+"my_temp_csv.csv")
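
Put together, the two steps could look like the sketch below. It only reuses calls shown on this page. The selectors, file name, server address, credentials and remote path are placeholders, and the argument order of ftpUpload() is simply copied from the example above.

```groovy
// Placeholder file name in the temporary directory
tmpCsv = path("tmp")+"my_temp_csv.csv"

// Extraction, with the same selectors as in the CSV examples (placeholders for your own site)
name = cleanSelect("h1")
price = cleanSelect(".price", null, null, "price")

// Write the extracted values into the temporary CSV
csv(tmpCsv, [
	"Product Title" : name,
	"Product Price" : price,
])

// Upload the local file, same argument order as in the example above
ftpUpload("68.258.354.12", "admin", "sd44f88t4h", "/var/www/download/myFile.csv", tmpCsv)
```

In a real crawl you would probably upload the file once, after all the rows have been written, rather than after every page, for the same round-trip reasons discussed in the CMS section.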

You can use the functions jsonDecode() and jsonEncode() to manage JSON data.
jsonDecode() is used to decode a JSON string. It converts a JSON-encoded string into a Grimport object.
jsonEncode() is used to convert an array or an object into a JSON representation.
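
As a minimal sketch of a round trip between the two functions: the single-argument call to jsonEncode() is an assumption based on the description above, and the variable names are illustrative.

```groovy
// A Grimport associative array, same content as in the phpNow() example
data = ["var1" : 2, "yop" : "toto"]

// Assumption: jsonEncode() takes the value to encode as its only required argument
jsonString = jsonEncode(data)

// jsonDecode() turns the JSON string back into a Grimport object
decoded = jsonDecode(jsonString)
```

This mirrors what happens in the phpNow() example, where the encoding is done on the PHP side with json_encode and the decoding on the Grimport side.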

 

 

