All posts by mgalushka

About mgalushka

Java and Processing enthusiast, interested in computer vision and machine learning. Lover of Raspberry Pi and Unix. Programmer and proud father.

Slate – windows manager for Mac

I have discovered Slate, a window manager for Mac that does an exceptionally good job of improving my productivity.

Start by reading this article by Tristan Hume on how to get started with Slate:

http://thume.ca/howto/2012/11/19/using-slate/

And here is Slate itself, with a full tutorial:

https://github.com/jigish/slate

Here is part of my config, which illustrates the idea:


config windowHintsShowIcons true
config windowHintsIgnoreHiddenWindows false
config windowHintsSpread true
config windowHintsDuration 5

bind esc:cmd hint

alias mon-laptop      1680x1050
alias mon-monitor     2560x1600

alias right-top move screenOriginX+screenSizeX/2;screenOriginY                   screenSizeX/2;screenSizeY/2 ${mon-monitor}
alias right-bottom move screenOriginX+screenSizeX/2;screenOriginY+screenSizeY/2     screenSizeX/2;screenSizeY/2 ${mon-monitor}
alias left-top move screenOriginX;screenOriginY                                 screenSizeX/2;screenSizeY/2 ${mon-monitor}
alias left-bottom move screenOriginX;screenOriginY+screenSizeY/2                   screenSizeX/2;screenSizeY/2 ${mon-monitor}
alias full-screen move screenOriginX;screenOriginY                                 screenSizeX;screenSizeY ${mon-laptop}

bind pad9:cmd ${right-top}
bind pad3:cmd ${right-bottom}
bind pad7:cmd ${left-top}
bind pad1:cmd ${left-bottom}
bind pad5:cmd ${full-screen}
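
The arithmetic in the move aliases above can be checked with a small Python sketch. It is only an illustration of the screenOriginX/screenSizeX expressions, not part of Slate; the Rect type, the quadrant function and the (0, 0) screen origin are assumptions made for the example.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Top-left corner and size of a window, in pixels."""
    x: int
    y: int
    width: int
    height: int


def quadrant(origin_x, origin_y, size_x, size_y, right, bottom):
    """Mirror 'move <x>;<y> <width>;<height>' for one quarter of a screen."""
    half_w = size_x // 2
    half_h = size_y // 2
    # right-* aliases add screenSizeX/2 to the X origin,
    # *-bottom aliases add screenSizeY/2 to the Y origin.
    x = origin_x + (half_w if right else 0)
    y = origin_y + (half_h if bottom else 0)
    return Rect(x, y, half_w, half_h)


# The 2560x1600 monitor from the config, assumed to start at (0, 0).
print(quadrant(0, 0, 2560, 1600, right=True, bottom=False))   # right-top
print(quadrant(0, 0, 2560, 1600, right=False, bottom=True))   # left-bottom
```

So right-top on the big monitor is a 1280x800 window whose top-left corner sits at x = 1280, y = 0.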

Here is my full config:
https://gist.github.com/mgalushka/d79c68464f191ba8e11a

I use the following key combinations to manage windows:

cmd + a number on the numeric keypad moves the currently active window to the corresponding position: a corner of the screen, or its top/bottom half.

For window hints I use cmd+esc.

Find process_id to be killed

Very often I need to kill some background job on Unix.

To do this, I need to find its process ID and pass it to


kill -9 process_id

so that the job is actually killed.

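Here is a quick way to find the process ID of a specific job:


# replace the quoted text with something that identifies your task;
# grep -v grep drops the grep process itself from the results
ps aux | grep 'my-fancy-filter-to-find-a-task' | grep -v grep |
         awk '{print $2}'

This prints just the process ID(s) of the matching task, ready to be passed to kill.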

Caution! Use this with care: if grep matches a process other than the one you are expecting, you may end up killing the wrong one.
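
To make the filtering step explicit, here is a rough Python sketch of what the pipeline does to the text printed by ps aux. It works on a captured string rather than on live processes; the function name matching_pids and the sample output are made up for illustration, and it assumes the usual 11-column ps aux layout with the PID in the second column.

```python
from typing import List

SAMPLE_PS_OUTPUT = """\
USER   PID %CPU %MEM  VSZ  RSS TTY   STAT START TIME COMMAND
max   4242  0.0  0.1 1000  500 ?     S    10:00 0:01 python my-fancy-job.py
max   4300  0.0  0.0  900  100 pts/0 S+   10:05 0:00 grep my-fancy-job
root     1  0.0  0.0  800  200 ?     Ss   09:00 0:02 /sbin/init
"""


def matching_pids(ps_output: str, needle: str) -> List[int]:
    """Return PIDs of lines whose command contains needle, skipping grep itself."""
    pids = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so the command keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or something unexpected
        command = fields[10]
        if needle in command and not command.startswith("grep "):
            pids.append(int(fields[1]))
    return pids


print(matching_pids(SAMPLE_PS_OUTPUT, "my-fancy-job"))
```

Printing the list before killing anything is a cheap way to check that the filter matches only the job you meant.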

Watch git/mercurial branch in command prompt

Sometimes it is crucial not to commit to the wrong branch by mistake.

To help prevent this kind of error, show the current branch right in your command prompt.

The following code works for both git and mercurial branches; put it into your ~/.bashrc file.

function parse_git_branch () {
  git branch 2> /dev/null |
      sed -e '/^[^*]/d' -e 's/* \(.*\)/ (\1)/'
}

function hg_branch() {
      hg branch 2> /dev/null |
           awk '{ printf "\033[37;0m\033[35;40m" $1 }'
      hg bookmarks 2> /dev/null |
           awk '/\*/ { printf " (" $2 ")"}'
}

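# Color variables used in PS1 (example values, not taken from the original config)
GREEN="\[\033[0;32m\]"
YELLOW="\[\033[0;33m\]"
NO_COLOR="\[\033[0m\]"

PS1="$GREEN\u@\h$NO_COLOR:\w$YELLOW\$(parse_git_branch)$YELLOW\$(hg_branch)$NO_COLOR\$ "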

For mercurial it also displays the bookmark you are currently on.
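
The sed expression in parse_git_branch is a bit cryptic, so here is a rough Python equivalent of what it does to the output of git branch: drop every line that does not start with an asterisk, then wrap the remaining branch name in parentheses. The function name current_branch is hypothetical, and the sketch works on a string instead of calling git.

```python
def current_branch(git_branch_output: str) -> str:
    """Return ' (branch)' for the line marked with '* ', or '' if there is none."""
    for line in git_branch_output.splitlines():
        if line.startswith("* "):
            # Same effect as sed's 's/* \(.*\)/ (\1)/'
            return " (" + line[2:] + ")"
    return ""


sample = "  develop\n* master\n  feature/login\n"
print(repr(current_branch(sample)))
```

Outside a repository git branch prints nothing to stdout, so the result is an empty string and the prompt stays unchanged.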

This is how it looks in my command prompt now:

Command prompt with git/mercurial branch

Twitter posts real-time clustering

The huge growth in popularity of natural language processing has really grabbed my attention recently.

Special thanks to @jmgomez for his unique Twitter posts on NLP and data mining.

I’m not an expert in Python, but I was lucky to find the best tools for working with it from the very beginning. There are lots of IDEs for Python, but in my opinion nothing beats PyCharm (from JetBrains).

I also started by taking a few courses on the topic (from Coursera); they are good, and if you are seriously interested in NLP you should check them out as well:

BTW, if you happen to have the practical assignments for the latter, please drop me a private message on Twitter or by email, as I wasn’t able to find them online.

I decided to start my research with a simple problem: analyzing tweets to classify them into a set of classes.

Here is the whole project on GitHub; I will launch it on the web very soon.

https://github.com/mgalushka/opinions-classifier

Oracle hierarchical queries

I discovered the power of Oracle hierarchical queries.

Let’s assume you have a table with the following structure (basically a graph stored in one table, where parent references id in the same table):

create table nodes(
    id number not null,
    parent number,
    type varchar2(10) not null);

table-hierarchical-queries

Let’s imagine we want to count how many children each parent in this structure has (any node can be both a parent and a child):


select parent, count(id)
from nodes
    start with id = 1
    connect by prior id = parent
group by parent;

Here the start with clause indicates the id from which the search should start.

In this case it starts with the specific id = 1, then for each child matched by prior id = parent it repeats the same operation, looking for that child's children, so the full tree gets walked.

The connect by clause specifies the condition used to find matches across the whole hierarchy. The prior keyword marks the column taken from the parent row (id, on the left side of the equation); the other side (parent) is the column of the child row that is matched against it.

You can play with the SQL example here:

http://sqlfiddle.com/#!4/04b91/12
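
To see what the walk described above does step by step, here is a small Python sketch that imitates start with / connect by prior id = parent on an in-memory list of rows and then groups the visited rows by parent. It is only a simulation for illustration, not how Oracle executes the query; the sample rows and the function name count_children are made up, and it assumes unique ids and no cycles.

```python
from collections import Counter, deque


def count_children(rows, start_id):
    """rows is a list of (id, parent) tuples; returns {parent: number of rows}."""
    parent_of = {}
    children_of = {}
    for node_id, parent in rows:
        parent_of[node_id] = parent
        children_of.setdefault(parent, []).append(node_id)

    counts = Counter()
    queue = deque([start_id])                 # START WITH id = start_id
    while queue:
        node_id = queue.popleft()
        counts[parent_of[node_id]] += 1       # GROUP BY parent, COUNT(id)
        # CONNECT BY PRIOR id = parent: next come rows whose parent is this id
        queue.extend(children_of.get(node_id, []))
    return dict(counts)


rows = [(1, None), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3)]
print(count_children(rows, start_id=1))
```

The starting row itself is part of the result, so it shows up as one row under its own parent (None here, NULL in the SQL version). The sketch visits nodes breadth-first, which does not affect the counts.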

MTV Exit has finished.

Just came back from the 2-day MTV Exit hackathon, where we tried to solve issues related to the problem of slavery in the world.

Very impressive people and community, and unbelievable passion from the people in the organizations which are trying to tackle this hot topic.

We were building the riskmap application (site riskmap.org.ua) to solve two issues in one shot:

1. The problem of insufficient public awareness of the real statistics, cases and the real risk for each of us.

2. The problem of too many resources on too many sites which are not related to my specific questions about labour migration to another country.

Have a look at our site and let me know your opinion.

BTW, we took second place in the competition and are planning to see this project through to the end.

Control sound with sound keys from Processing app on android

I was looking for a way to use the volume keys on Android to remotely control (from the device) omxplayer running on a Raspberry Pi:

sound-controls

Here is code showing how to do this easily in Processing:


void keyPressed() {
    if (key == CODED && keyCode ==
        android.view.KeyEvent.KEYCODE_VOLUME_DOWN) {
        println ("Volume down");
    }
    if (key == CODED && keyCode ==
        android.view.KeyEvent.KEYCODE_VOLUME_UP) {
        println ("Volume up");
    }
}

Hadoop, Unix and lots of command line…

I decided to try Hadoop for processing some huge files.

Basically, I’m doing some testing for one of the Kaggle problems and needed to process 2-8 GB files in a way that requires a lot of CPU power.

I decided to try Amazon EMR with its pre-configured Hadoop machines.

EMR is actually very good, but I have found it useful to keep one small cluster running all the time for tests: checking a job on small inputs there first saves time before submitting large files to big clusters.

I discovered that Hive is probably not the best choice if you have a lot of logic or very complex queries to run.

For myself, I’m using custom jar clusters only.

How do I test a job before submitting it to a big cluster? Connect to the master machine and run:

hadoop jar myjar.jar input-files-from-s3

How do you check the status of the jobs you are running?

1. Look at the monitoring status on the Amazon screens

Amazon EMR monitoring

2. Port-forward to the Hadoop web interface and look there (the recommended way):

ssh -i your-ssh-key.pem -L 9100:amazon-public-ip-for-master-node:9100 hadoop@amazon-public-ip-for-master-node

And then just open http://localhost:9100 in a browser to see the Hadoop web console.

Why I’m using Processing framework.

It is simply a way to bring the full power of creativity into otherwise boring programming work.

I watched this course, which revealed the power of Processing to me:

https://class.coursera.org/digitalmedia-001/class

The whole point is that you don’t need to make your program excellent from the first day of development. It is quite possible that you will not have enough time and motivation to complete it.

My idea is to use Processing for quick prototyping and then, only if the application is promising (you can check this with real users), go ahead with a more powerful solution.

First, create a mockup which may be broken but is able to communicate the idea to the user. Only after that, fix and polish it.

That is what Processing was created for: to test, experiment and create quickly, without much hassle or heavy development tools.

Tracking people from webcam with OpenCV

I spent last weekend at #douhack (in Donetsk), creating a program to count the number of people walking along the street in front of a web camera.

This turned out not to be such a simple task. To recognize moving objects I used the simple technique of background subtraction: each new frame captured from the camera is subtracted pixel by pixel from the previous one, which reveals the regions that moved from one frame to the next.

A more advanced algorithm is described in the documentation (see the referenced works).
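
The simple frame-by-frame subtraction can be sketched in a few lines of plain Python. This is not the code from the hackathon project and it does not use OpenCV; frames are assumed to be small grayscale images stored as lists of rows with values 0-255, and the function name moving_mask and the threshold of 30 are arbitrary choices for the illustration.

```python
def moving_mask(previous, current, threshold=30):
    """Mark pixels whose brightness changed by more than threshold between frames."""
    return [
        [abs(cur - prev) > threshold for prev, cur in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous, current)
    ]


previous_frame = [
    [10, 10, 10],
    [10, 10, 10],
]
current_frame = [
    [10, 200, 10],   # something bright appeared in the middle column
    [12, 210, 10],
]

for row in moving_mask(previous_frame, current_frame):
    print(row)
```

The threshold keeps small brightness changes (camera noise, like the 10 to 12 change above) from being counted as movement.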

To track a moving person I tried a few techniques; the CamShift algorithm didn’t really help. The reason is that the algorithm doesn’t have enough “memory” to track objects which disappear behind other objects on the street. So I did a hack: linearize the person's movement to estimate where the moving object will appear again.
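
One way to picture the linearization hack is plain linear extrapolation: take the last two known positions of the person, assume the speed stays constant while they are hidden, and project forward. The sketch below only illustrates that idea under those assumptions; it is not the code from the repository, and predict_position is a hypothetical name.

```python
def predict_position(before_last, last, frames_ahead):
    """Extrapolate an (x, y) position, assuming constant per-frame velocity.

    before_last and last are positions seen in two consecutive frames.
    """
    (x1, y1), (x2, y2) = before_last, last
    vx, vy = x2 - x1, y2 - y1          # displacement per frame
    return (x2 + vx * frames_ahead, y2 + vy * frames_ahead)


# A person moving 4 pixels to the right per frame disappears behind a tree;
# where should we look for them 5 frames later?
print(predict_position((100, 50), (104, 50), frames_ahead=5))
```

When an object shows up near the predicted point, it can be treated as the same person rather than counted twice.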

Here is a demo of how it works (pretty lame anyway):

http://www.youtube.com/watch?v=gcONLfkFSNM

Also, the GitHub link with the sources:

https://github.com/mgalushka/pedestrians-traffic-calc

I strongly recommend this book for understanding the basics of OpenCV and object tracking (if the PDF is not available, give me a shout and I will update the link).

Big special thanks to Mateusz Stankiewicz for his blog post on the topic.