All posts by mgalushka

About mgalushka

Java and Processing enthusiast, interested in computer vision and machine learning. Lover of Raspberry Pi and Unix. Programmer and proud father.

Measuring large data sets

March 24, 2019 | Uncategorized | mgalushka

I always confuse these two, so I am writing them down to remember them.

1 TiB is a tebibyte = 2⁴⁰ bytes = exactly 1,099,511,627,776 bytes (more than 1 trillion)

1 TB is a terabyte = 10¹² bytes = exactly 1 trillion bytes (1,000,000,000,000: 4 groups of three zeroes)
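To make the difference concrete, here is a small sketch of the arithmetic. The constant and helper names (`TIB`, `TB`, `to_tib`, `to_tb`) are my own for illustration, not from any library.

```python
# Binary (IEC) vs decimal (SI) units, in bytes.
TIB = 2 ** 40   # tebibyte
TB = 10 ** 12   # terabyte


def to_tib(num_bytes):
    """Convert a byte count to tebibytes."""
    return num_bytes / TIB


def to_tb(num_bytes):
    """Convert a byte count to terabytes."""
    return num_bytes / TB


print(TIB)                    # 1099511627776
print(TIB - TB)               # 99511627776 (a TiB is almost 100 GB larger)
print(f"{to_tb(TIB):.3f}")    # 1.100 (1 TiB expressed in TB)
print(f"{to_tib(TB):.3f}")    # 0.909 (1 TB expressed in TiB)
```

So a disk sold as "1 TB" shows up as roughly 0.909 TiB in tools that report binary units.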

ffmpeg -i input.flac -ab 320k -map_metadata 0 -id3v2_version 3 output.mp3

Some C++ definitions

July 26, 2017 | c++ | mgalushka

SIOF = static initialization order fiasco
https://isocpp.org/wiki/faq/ctors#static-init-order

ODR = one definition rule
https://en.wikipedia.org/wiki/One_Definition_Rule

DOF = Destruction Order Fiasco
https://isocpp.org/wiki/faq/dtors

head k* && g++ -fsanitize=address -static-libasan -g -O -Wall -std=c++14 k.cpp k2.cpp && ASAN_OPTIONS=detect_odr_violation=1 ./a.out

Debugging undefined symbols in c++

June 14, 2016

This is an awesome step-by-step guide on how to debug undefined symbol issues in your C++ programs:

http://gdwarner.blogspot.co.uk/2009/03/c-runtime-symbol-lookup-error.html

How to see git/mercurial branch on your command line

May 18, 2016 | productivity, Uncategorized, unix | mgalushka

Add this to your `~/.bashrc` file:


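# Print the current git branch as " (branch)", or nothing outside a git repo
function parse_git_branch () {
    git branch 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/ (\1)/'
}

# Print "!" if there are changes to tracked files, "?" if there are only unknown files
function hg_dirty() {
    hg status --no-color 2> /dev/null \
    | awk '$1 == "?" { unknown = 1 }
           $1 != "?" { changed = 1 }
           END {
               if (changed) printf "!"
               else if (unknown) printf "?"
           }'
}

# Print the current Mercurial branch and the bookmark marked with "*", if any
function hg_branch() {
    hg branch 2> /dev/null | awk '{ printf " (" $1 ")" }'
    hg bookmarks -a 2> /dev/null | awk '/\*/ { printf " (" $2 ")"}'
}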

RED="\[\033[0;31m\]"
YELLOW="\[\033[0;33m\]"
GREEN="\[\033[0;32m\]"
NO_COLOR="\[\033[0m\]"

DEFAULT="\[\033[37;40m\]"
PINK="\[\033[35;40m\]"
RANGE="\[\033[33;40m\]"

PS1="$GREEN\u@\h$NO_COLOR:\w$YELLOW\$(parse_git_branch)$YELLOW\$(hg_branch)$NO_COLOR\$ "

Google Code Jam 2016. Problem D – Fractiles.

April 10, 2016 | Uncategorized | competition, google code jam | mgalushka

Google Code Jam 2016 – solution to problem D: Fractiles.

This is a problem that gives you 10 points almost for free.
The small part is very easy.

Problem text.

If S = K, we can clean K tiles to test whether there is gold.
We will use all of them.

Let’s assume there is gold in the 1st position of the original sequence.
Then the 1st position of the final sequence is gold too, because a gold tile always expands into gold tiles.

So we choose tile 1.

If there is no gold in the 1st position, the first K tiles of the final sequence are the same as the original sequence.
So a small solution which works for all inputs is:

1 2 3 4 … K

To solve the large input, the following idea leads to a solution:

Let’s imagine some original sequence and its final sequence at complexity C = 2.
How can we clean 1 tile and check whether either of 2 original tiles contains gold?

Here is an image which explains which tile we need to clean to check tiles 1 and 2 of the original sequence.

This generalizes into a solution: for complexity C, each additional level lets us check one more original tile by cleaning the tile at the corresponding position, so a single cleaned tile covers C original tiles.

The minimum number of tiles required is ceil(K / C); if S is smaller than that, the answer is IMPOSSIBLE.
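The idea can be sketched in code as follows. This is my own minimal illustration, not the original contest submission: the function name `fractiles` and the choice to pad the last group by repeating the last original tile are assumptions. Each cleaned tile covers a group of up to C original tiles, so ceil(K / C) tiles are needed. The position is built by treating the original tile indices as digits of a base-K number, one digit per complexity level.

```python
def fractiles(k, c, s):
    """Return 1-based positions of tiles to clean, or None if s tiles are not enough.

    k: length of the original sequence
    c: complexity of the final artwork
    s: number of tiles we are allowed to clean
    """
    needed = -(-k // c)  # ceil(k / c) using integer arithmetic
    if needed > s:
        return None

    positions = []
    for start in range(0, k, c):
        pos = 0
        for level in range(c):
            # Original tile checked at this level; the last group may be
            # shorter than c, so clamp to the last original tile.
            idx = min(start + level, k - 1)
            pos = pos * k + idx
        positions.append(pos + 1)  # convert to 1-based position
    return positions


print(fractiles(2, 3, 2))  # [4]
print(fractiles(3, 2, 3))  # [2, 9]
print(fractiles(2, 1, 1))  # None, i.e. IMPOSSIBLE
```

The tile at the computed position is gold exactly when at least one original tile in its group is gold, which is why one cleaned tile per group is enough.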

How to configure Presto/Hive/HDFS on Mac

April 6, 2016 | hadoop, hive, unix | mgalushka

It is quite a pain to set everything up.
Here are some links which helped me significantly:

Tricks:

Use Java 1.7 with the newest Hadoop/HDFS and Hive 2.0.0

To create metastore – go to $HIVE_HOME/bin and run:

schematool -initSchema -dbType derby

Derby is an embedded Java database. In this mode only one process can open the database at a time, so you cannot run the Hive metastore service (required for Presto) and the Hive CLI simultaneously. Consider using MySQL for the metastore instead.

Then install Presto following the instructions on prestodb.io

So to use Presto, you need to shut down the Hive CLI and start the metastore service from the same directory where your Derby database was created with schematool. To start the metastore:

hive --service metastore

To check which components of Hive/HDFS are running on machine, run:

jps

To start datanode:

hdfs datanode

Create 2 aliases in ~/.bashrc to start/stop hadoop/hdfs:

alias hstart="/usr/local/Cellar/hadoop/2.7.1/sbin/start-dfs.sh;/usr/local/Cellar/hadoop/2.7.1/sbin/start-yarn.sh"
alias hstop="/usr/local/Cellar/hadoop/2.7.1/sbin/stop-yarn.sh;/usr/local/Cellar/hadoop/2.7.1/sbin/stop-dfs.sh"

How to extend swap on Amazon Linux

It is easy.

Let’s say we want to extend it by 500M. With a block size of 1024 bytes, count = 500 x 1024 = 512000:

sudo /bin/dd if=/dev/zero of=/var/swap.1 bs=1024 count=512000
sudo /sbin/mkswap /var/swap.1
sudo /sbin/swapon /var/swap.1
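dd writes bs x count bytes in total, so the two values have to multiply to the size you want. Here is a quick sketch of that check, assuming a block size of 1024 bytes; the helper name `dd_bytes` is made up for illustration.

```python
MIB = 1024 * 1024


def dd_bytes(block_size, count):
    """Total number of bytes dd writes for a given bs and count."""
    return block_size * count


# count = 500 x 1024 blocks of 1024 bytes each
size = dd_bytes(1024, 500 * 1024)
print(size)         # 524288000
print(size // MIB)  # 500
```

If you pick a bigger block size, scale count down by the same factor so that the product stays the same.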

Left Semi Join on Hive

January 22, 2015 | hive, sql | mgalushka

Instead of writing:

SELECT a.key, a.value
FROM a
WHERE a.key IN
 (SELECT b.key
  FROM b);

Let’s write:

SELECT a.key, a.value
FROM a LEFT SEMI JOIN b ON (a.key = b.key)

So LEFT SEMI JOIN is just an efficient way to implement IN/EXISTS subquery semantics in your queries.

Note that the right-hand side table cannot be referenced in the WHERE or SELECT clauses; it may only be used in the ON join condition.
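To see how this differs from a regular join, here is a small sketch of the semantics on in-memory rows. The function names and sample data are made up for illustration and have nothing to do with how Hive actually executes the query.

```python
def left_semi_join(left, right, key):
    """Keep each left row that has at least one match on the right."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] in right_keys]


def inner_join(left, right, key):
    """Emit the left row once per matching right row (left columns only)."""
    return [l for l in left for r in right if l[key] == r[key]]


a = [{"key": 1, "value": "x"}, {"key": 2, "value": "y"}]
b = [{"key": 1}, {"key": 1}, {"key": 3}]  # key 1 appears twice

print(left_semi_join(a, b, "key"))   # [{'key': 1, 'value': 'x'}]
print(len(inner_join(a, b, "key")))  # 2
```

The semi join returns the matching row from a only once even though b contains the key twice, which is the same result the IN subquery gives. The output has only the columns of a, which fits the restriction that b can only appear in the ON condition.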