Vincent Doba's Blog

Technical Blog

Option versus nullable: which type spark deserializes faster

on 2020-11-12

Recently, I was wondering about Spark’s deserialization performance, and about one question in particular: when a dataframe has a nullable column, is it better to deserialize it to an Option or to a nullable type? Let’s answer this question in this blog post.

The benchmark

To answer this question, I define the following benchmark. I create simple input data and read it with three Spark applications, each of which selects a column, replaces its null values with a default value, and writes the result to Parquet.
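As a rough illustration of the two mappings being compared, here is a minimal sketch. It is not the benchmark itself: the case classes `WithOption` and `WithNullable`, the column names, the `"default"` value, and the local master are all assumptions made for this example, and the results are shown on the console instead of being written to Parquet.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record types: the same nullable column mapped two ways.
case class WithOption(id: Long, label: Option[String])
case class WithNullable(id: Long, label: String)

object OptionVersusNullable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("option-versus-nullable")
      .master("local[*]") // assumption: local run, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Small in-memory input with one null in the nullable column.
    val input = Seq[(Long, String)]((1L, "a"), (2L, null)).toDF("id", "label")

    // Variant 1: deserialize to Option and fall back with getOrElse.
    val viaOption: Dataset[String] =
      input.as[WithOption].map(_.label.getOrElse("default"))

    // Variant 2: deserialize to a nullable String and test for null explicitly.
    val viaNullable: Dataset[String] =
      input.as[WithNullable].map(r => if (r.label == null) "default" else r.label)

    viaOption.show()
    viaNullable.show()

    spark.stop()
  }
}
```

Both variants produce the same values; the question the post asks is which one Spark deserializes faster.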

#spark #scala

Read more of Option versus nullable: which type spark deserializes faster

Reading parquets with different schemas in Spark

on 2020-10-25

Yesterday, I ran into a behavior of Spark’s DataFrameReader when reading Parquet data that can be misleading. If a Parquet data directory contains several Parquet files with different schemas, and we neither provide a schema nor enable the mergeSchema option, the inferred schema depends on the order of the Parquet files in the data directory.

The setup

I am reading data stored in Parquet format.
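The three ways of reading such a directory can be sketched as follows. This is only an assumed reproduction, not the setup used in the post: the temporary directory, the column names, and the two tiny dataframes are invented for the example.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object ReadParquetSchemas {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-schemas")
      .master("local[*]") // assumption: local run, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data directory, created under a fresh temporary folder.
    val path = Files.createTempDirectory("parquet-schemas").resolve("data").toString

    // Write two batches with different schemas into the same directory.
    Seq((1L, "a")).toDF("id", "label").write.parquet(path)
    Seq((2L, "b", 3.0)).toDF("id", "label", "score").write.mode("append").parquet(path)

    // Default read: schemas are not merged, so the result may miss "score".
    spark.read.parquet(path).printSchema()

    // Merge the schemas of all the files in the directory.
    spark.read.option("mergeSchema", "true").parquet(path).printSchema()

    // Or skip inference entirely by providing the schema explicitly.
    spark.read.schema("id LONG, label STRING, score DOUBLE").parquet(path).printSchema()

    spark.stop()
  }
}
```

Providing the schema or enabling `mergeSchema` makes the result independent of file order, at the cost of either maintaining the schema by hand or reading the footers of all the files.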

#parquet #spark #scala

Read more of Reading parquets with different schemas in Spark

Install Hugo static website with nginx and let’s encrypt certificate using ansible

on 2020-07-19

In this article, I will show how I configured the deployment of my blog, which uses the static site generator Hugo. To do so, I used the following tools: Ansible (version 2.9.10) as the configuration management system, Nginx as the web server, and Let’s Encrypt with certbot for TLS/SSL certificates. This article will go through all the Ansible tasks I had to put in place in order to install Nginx, configure my blog’s virtual host, add TLS/SSL encryption to the blog, and build the blog with Hugo.

#hugo #ansible #debian

Read more of Install Hugo static website with nginx and let's encrypt certificate using ansible

Install JDK 8 on MacOS Without Admin Rights

on 2020-07-04

Many corporations are still developing applications using the 8th version of Java. And many corporations have strict security rules forbidding employees from having administrator rights on their machines. However, not so many corporations use the MacBook Pro as their employees’ machine. The advantage of the MacBook Pro for a developer is that it is rather easy to install software without administrator rights. You just install Homebrew in your home directory and you can install lots of software without asking anyone’s permission.

#java #macos

Read more of Install JDK 8 on MacOS Without Admin Rights