Reading List


  1. From Socrates to Sartre, T. Z. Lavine: an introduction to philosophy.
  2. The Concept of Mind, Gilbert Ryle: final nail in the coffin of Descartes' dualism.
  3. Godel, Escher, Bach, Douglas Hofstadter


  1. Mechanics of Manipulation, Matt Mason: textbook on the mechanics of manipulation, a field Matt Mason pioneered. Would love to chat with him once I understand the mathematical formulation of the problem and his contributions to the field.
  2. Calculus of Variations and Optimal Control Theory, Daniel Liberzon: a treatment of optimal control theory based on Pontryagin's Maximum Principle. A non-dynamic-programming-based approach to solve optimal control seems unthinkable to me.

Decision Making

  1. The Book of Why, Judea Pearl: an accessible introduction to causality.


  1. An Introduction to Measure Theory, Terence Tao

I Bought My First Air Conditioner.

This summer in Pittsburgh was unbearably hot. When the mercury hit 40 degrees, I finally had enough and ordered an air conditioner for my bedroom. It took another week to arrive, time I spent trying every trick up my sleeve from my Kharagpur days to stay cool. All my friends were surprised that I held up for so long.

Not needing an air conditioner was an act of rebellion to me. The AC is a tool for the standardization of weather; for replacing discomfort with monotony. I, on the other hand, like my three seasons: summer, rainy and winter. Especially the monsoon rains after a spell of hot summer. Scorchingly hot days that leave you tossing all night in sweat also make you dance outside in the rain when the rain gods arrive. What would you do if you had spent all your summer without feeling so much as a whiff of hot air? You would only complain about the mud. You see, I do not want to be that guy. I wish to accept the lows so that I can savor the highs.

So, when I finally confirmed my order on Amazon, it was with a deep sense of loss, for I had chosen comfort.

The Philosophic Quest

Socrates and Plato (~400 B.C.)

Historical Context: Plato was born circa 427 B.C. at the end of what is often called the Golden Age of Athens. This age ended with the defeat of a democratic and industrial Athens by a militaristic and agricultural Sparta in the Peloponnesian War. The defeat led to a brief reign of terror by a group of aristocrats, who were ultimately overthrown, after which democracy was re-established. The newly democratic Athens sentenced Socrates to death because of his anti-democratic views. Following this, Plato too fled from Athens and, after trying unsuccessfully to implement his idea of the philosopher-king in the real world, came back to Athens to take up full-time teaching.

Philosophy: I will mostly talk about Plato's philosophy as he was highly influenced by Socrates and hence his philosophy mostly includes that of Socrates. The heart of Plato's philosophy is that Virtue is Knowledge.


Descartes (~17th century AD)

Historical Context: After the time of Plato, the Greeks were conquered by the Macedonians, who were in turn defeated by the Romans. The decline of the Roman empire coincided with the increasing dominance of Christian beliefs. The Church destroyed many Greek and Roman writings, charging them with being pagan and un-Christian. For over a thousand years, from the 4th century to the 14th century, the Church controlled the social and cultural life of Europe. Plato's and Aristotle's writings re-emerged in the Western world via the Muslims. Saint Augustine and Saint Thomas led the construction of a new philosophy combining Catholic beliefs with the rediscovered philosophies of Plato and Aristotle. All this led to what is now called the Renaissance.

Descartes was born in this fast-changing world. The centuries-old dominance of the Church over Western life was in decline. A new empirical science based on mathematics and deductive reasoning was being built by people like Copernicus and Galileo. However, the Church was still quite powerful and Descartes deemed it too risky to offend the Church.

Philosophy: Descartes wanted to explain the world rationally. Starting from self-evident truths, he wanted to give the field of mathematical physics a mathematical foundation.


Hume (~1740 AD)

Historical Context: The period of roughly a hundred years starting from Descartes' death and ending with Hume's death was a period of utmost optimism and self-confidence in Western philosophy. So much so that this period has also been called the Age of Enlightenment (in the West). This optimism was rooted in fundamental discoveries in science, the rise of a self-made middle class and wildly new opportunities for trade and commerce. All these events contributed to a newfound confidence in the superiority of human reason. Newton was the poster child of this belief, as he had succeeded in demonstrating that every mechanical interaction, whether on earth or between celestial bodies, could be explained by a small set of simple laws.

Philosophy: Hume was going to puncture massive holes in this belief in the superiority of rationality and the human mind. Hume was the greatest of a line of British empiricists, a line that included the likes of John Locke and George Berkeley. The empiricists posed a seemingly innocent question, "How do you know?". In the hands of Hume, this question turned out to be a wrecking ball that devastated the philosophic optimism of the time.

Shivam Vats on #book,

Optimal Control and Reinforcement Learning


This post tries to sketch the structure and main features of two related fields - Optimal Control and Reinforcement Learning.

Optimal Control is the older field. It originated in the Calculus of Variations and classical Control Theory, and it addresses the problem of minimizing an objective function while controlling an agent. The problem is fully defined by two objects:

  1. The system dynamics equation (or the system model): This equation models how the system to be controlled evolves. The standard notation for the discrete version of this problem is \(x_{t+1} = f(x_t, u_t)\), where \(x_t\) is the state and \(u_t\) is the control at time \(t\).

  2. A cost function \(J = \sum_{t=1}^{N} L(x_t, u_t)\) to be minimized.

The most well-studied optimal control problem is the Linear Quadratic Regulator (LQR) which assumes a linear dynamics equation and a quadratic cost. It is all well and good if both these functions are known. However, things start getting complicated as we attack problems in which we know less and less about these functions, or the simplifying assumptions do not hold.
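To make the LQR case concrete, here is a minimal sketch of the finite-horizon, discrete-time version, solved with a backward Riccati recursion. The matrices A, B, Q and R, the terminal cost (taken equal to Q) and the function names are my own assumptions for illustration; nothing above fixes them.

```python
import numpy as np

def lqr_finite_horizon(A, B, Q, R, N):
    """Feedback gains K_0..K_{N-1} for x_{t+1} = A x_t + B u_t.

    Assumed stage cost: L(x, u) = x^T Q x + u^T R u, plus a terminal
    cost x_N^T Q x_N (an assumption made for this sketch).
    """
    P = Q.copy()  # cost-to-go matrix at the final step
    gains = []
    for _ in range(N):
        # Gain that minimizes stage cost plus cost-to-go of the next state.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: cost-to-go one step earlier.
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # the recursion runs backward in time
    return gains, P

def simulate(A, B, Q, R, gains, x0):
    """Roll out u_t = -K_t x_t and accumulate the quadratic cost."""
    x, cost = x0, 0.0
    for K in gains:
        u = -K @ x
        cost += float(x @ Q @ x + u @ R @ u)
        x = A @ x + B @ u
    return x, cost + float(x @ Q @ x)  # add the terminal cost
```

The optimal control is the linear feedback \(u_t = -K_t x_t\), and the cost of the simulated rollout should agree with \(x_0^\top P_0 x_0\), where \(P_0\) is the matrix returned by the recursion.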

Reinforcement Learning is a loose term for a set of techniques that can attack Optimal Control problems in which we know very little about the problem structure. In particular, RL assumes only that the state is Markov (which is not a very restrictive assumption) and that the agent receives a reward/cost upon executing an action. What makes RL algorithms powerful is the fact that neither the cost function nor the system dynamics (or transition function) needs to be known. It would certainly make the problem easier if they were known, but then the problem would be closer to the domain of Optimal Control.
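As a rough illustration of learning without a model, here is a sketch of tabular Q-learning (one standard RL algorithm) on a made-up five-state chain. The environment `step`, its unit cost per move and all the hyperparameters are assumptions of mine. The learner never inspects `step`; it only sees sampled (state, action, cost, next state) tuples.

```python
import random

N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # move left / move right

def step(state, action):
    """Hypothetical environment, treated as a black box by the learner."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    return next_state, 1.0, next_state == GOAL  # unit cost per move

def q_learning(episodes=500, alpha=0.5, gamma=0.95, max_steps=200, seed=0):
    rng = random.Random(seed)
    # Q[s][a]: estimated discounted cost-to-go of taking action a in state s.
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(episodes):
        state = 0
        for _t in range(max_steps):
            a = rng.randrange(len(ACTIONS))  # purely exploratory behaviour
            next_state, cost, done = step(state, ACTIONS[a])
            future = 0.0 if done else gamma * min(Q[next_state])
            Q[state][a] += alpha * (cost + future - Q[state][a])
            if done:
                break
            state = next_state
    return Q

def greedy_policy(Q):
    """Cheapest action in every non-goal state."""
    return [ACTIONS[row.index(min(row))] for row in Q[:GOAL]]
```

Because Q-learning is off-policy, the agent can act at random here and still recover the cost-minimizing policy, which on this chain is to always move right.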

The Subtle Art of Not Giving a Fuck

This is a surprisingly concise and free-flowing book that explains Buddhist philosophy to a modern reader. The arguments in this book are derived from exactly the same realization that propelled Siddharth Gautam on his path to become the Buddha: the inevitability of death. The realization that you are going to die no matter what puts a question mark on the utility of everything in your life. If you fool yourself into ignoring this painful and harsh reality, you start caring too much about too many things, giving too many fucks. On the other hand, if you are going to die anyway, why should you care about anything? Mark Manson proposes that we should care about what we are leaving behind: our legacy. Will the world be a better place because of us? This gives us a higher purpose in life, which Mark believes is the way to live a happy and fulfilling life.

However, he also quotes Ernest Becker, who believed that people's immortality projects (attempts to leave a legacy) were part of the problem. I am not sure how to reconcile these conflicting ideas. While attempting to make the world a better place, could you not end up starting another immortality project?

Important Points

  1. You can't care about anything and everything in life. If you do, you will never be happy, because your life will never be perfect. The key is to have a small list of high-value things that you really care about and work towards making them right. Everything else doesn't matter. The key to finding a better Pareto-optimal solution is to drop objectives.

  2. The human brain, by design, needs to keep itself engaged. If you do not have real problems in life, your brain will simply create problems out of thin air. The one secret to living a fulfilling life is to find a real problem that you truly care about and work on it your whole life.

  3. Pain is an indispensable part of the process of getting better. It is powerful. Intense pain makes you rethink even your most deeply held beliefs.

  4. Life is about not knowing and then doing something anyway. There will never be a time when you are 100 percent sure of the right way forward. Even if you think you are, you may prove to be wrong later. You need to accept the uncertainty and keep moving forward.

  5. Action -> Inspiration -> Motivation

Shivam Vats on #book,

Dynamic Manipulation

Robot manipulation is where the action is these days on the control side of robotics. Navigation, which saw tremendous research in the past few decades, is finally in a state where self-driving car companies can deploy cars on roads (at least for testing) and we see drones doing all sorts of things. The state of robotic manipulation, however, is still quite primitive. The suction cup remains the gripper of choice in the industry, while decades' worth of research in anthropomorphic hands is rarely used.

I am using this post to outline the field of manipulation and to collect my thoughts on interesting problems that remain to be solved. Matt Mason talks about four categories of manipulation in an increasing order of difficulty:

  1. Kinematic Manipulation: Only kinematic constraints are considered. You don't worry about stability and ignore all forces. For example, the robot arm moves to the grasp pose, closes the gripper (who cares if the object was grasped) and moves somewhere else.

  2. Static Manipulation: Now you are taking the force applied by the gripper (say, a parallel-jaw gripper) into account. Because of friction, different objects might need different amounts of normal force to be lifted.

  3. Quasi-static Manipulation: In the previous category, there was no relative motion between the surfaces involved, i.e., the system was in equilibrium (static). This is no longer the case if the robot needs to push an object across a surface. This category still ignores acceleration.

  4. Dynamic Manipulation: In quasi-static manipulation, even though the object was moving, it was acted upon by the gripper at every point in time and we ignored acceleration. Many actions like throwing and dynamic closure are not possible without acceleration. The gripper imparts momentum to the object to get the job done and as a result, it might not even need to remain in contact with the object.
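The static case above can be made concrete with a back-of-the-envelope calculation. The sketch below assumes a Coulomb friction model, a symmetric parallel-jaw grasp with two contacts, and an object held still against gravity. These assumptions and the function name are mine, not part of Mason's taxonomy.

```python
def min_grip_force(mass_kg, mu, g=9.81, n_contacts=2):
    """Smallest normal force per jaw (in newtons) that holds the object.

    Assumes Coulomb friction: each contact can resist at most mu * N of
    tangential load, so n_contacts * mu * N must be at least mass_kg * g.
    """
    if mu <= 0:
        raise ValueError("friction coefficient must be positive")
    return mass_kg * g / (n_contacts * mu)
```

For example, a 0.5 kg object with a friction coefficient of 0.5 needs about 4.9 N per jaw, and halving the friction coefficient doubles the required force.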

Principles of Effective Research

This is a summary (with some commentary from my side) of a piece with the same title by Michael Nielsen.

Self Discipline

Effective people are self-disciplined. I have been fighting with my lack of self-discipline for as long as I can remember. And for the longest time I treated it as a side-effect of my weak will-power. But over the years, experience has taught me otherwise:

  1. Willpower is a finite and scarce resource. Use it only when you absolutely have to.

  2. Your state of mind and environment have a much bigger and longer lasting impact on your discipline. If both of them are aligned with your goal, things will fall in place naturally and you will not need to discipline yourself. You may have heard of people putting in 12 hours a day at work and still having fun.

  3. Your state of mind is affected by having clarity about what you are doing. In terms of the theory of reinforcement learning, having clarity means having a good and confident estimate of the expected future value of doing something. For example, if you are certain that putting in 12 hours a day for a year is going to get you a Nobel prize and you really want one, the 12 hours stop seeming like too much.

  4. The environment includes both your social and physical environment. If your social environment (advisor, lab and fellow researchers) supports the development of research skills, it can make an enormous difference. Every once in a while you are going to doubt the worth of your work. Having a supportive social environment is critical in these situations.

  5. The last factor is self-honesty. You know yourself the best. This means you are most susceptible to being fooled by yourself. And between you and yourself, there is no one else watching. One way to enforce honesty is to collect hard data about yourself on a regular basis and evaluate that every once in a while. Diary entries, research logs, daily logs are some different ways to do it.

2019 - A Review

TL;DR: 2019 was a mixed year and a local minimum (as I would like to believe). In the first half of the year (January through May), my focus was solely on aesthetics - physical fitness and art. This came at the cost of progress in my research and my intellectual development. By the start of June, I had begun to realize the effects of this skewed focus and I took drastic corrective actions. In the second half of the year (June through November), I focused solely on getting my research up to speed by curtailing my physical activities. Though this led to limited progress in my research (as opposed to zero progress in the first half), it entailed heavy stress that is not sustainable in the long term.

First Half (Jan - May)

I took to three habits:

  • working out thrice a week

  • reading for 3-4 hrs on the weekends

  • playing the piano

I had started going to the gym towards the end of 2018 and had made limited gains. I wanted to make visible changes in my body and for this I trained hard - choosing not too frequent (thrice a week) but intense and long (2 hrs) workout sessions. Unfortunately, my recovery wasn't quick enough and I would feel tired after these intense workouts. This negatively affected my work. It also negatively affected my sleep quality (or was it coffee?) which further degraded my work-life balance. However, I had decided to go all in and hence chose to ignore these warning signs, as I felt that physical fitness was a non-negotiable aspect of my life.

In the end, I did notice visible changes in my physique. This was the fittest I had ever been - 8 pull-ups, 75 kg squats, 50 kg bench-press and 5.5 miles in 52 mins.

However, this intensity was unsustainable.

I also spent some time learning how to play the piano - I took piano lessons and even bought a keyboard. However, I did not feel like continuing it because it was yet another solitary hobby in the list of solitary activities that I do. In the summer, I found a fun alternative: Latin dancing.

Second Half (June - Nov)

By this time, I had made zero progress in my research despite my attempts to maintain a work-life balance (basically aesthetics/work balance). My advisor wanted me to submit a paper by the end of the year, which was simply not possible if I did not take corrective actions.

So I stopped working out, reading books and playing the piano, and tried to spend more time working. This led to an interesting realization: I was not really interested in what I was working on and, further, I did not really understand the value of my research. I do not like working on short-term objectives. I have always liked to go after the bigger picture. Not being able to understand where my research was headed was a major peeve and rather demoralizing.

I was aiming for ICRA (due in September) but missed the deadline. I then aimed for ICAPS in November. Even though my paper had very weak results, I still submitted the paper to keep my sanity. I do not like missing important deadlines (the practical reason being that it is a slippery slope).


  • Publishing papers regularly is non-negotiable during a Ph.D., not just because it keeps you in good standing in the program but also because it keeps you on your toes.

  • Do not worry about the quality of your paper during the initial days. Just submit!

  • Always ensure that you are aware of the bigger picture where your research fits. This requires studying consistently (read books) and keeping abreast of the latest research (read papers).

Bouncing Back (Dec)

As I finished my course projects and exams towards the end of November, I could feel that I was gradually getting a better sense of the research being done around me, and even of my own research. Surprisingly, this was the sum total of a number of unplanned and providential experiences and interactions: for example, my course projects (which I had ignored for the most part), one-off discussions with colleagues and coming across eye-opening papers.

This reinforces my belief that for good research, it is critical to stay around the center of gravity. In the case of robotics and AI, CMU is that place.

Next, every once in a while, it is important to read books on fundamental ideas. You would anyway read about the latest algorithms and techniques. It pays to study problems and formulations that are more abstract. For example, study classical RL vs the latest DRL papers, study MDPs vs RL, study Dynamic Programming vs MDPs.

In my case, this realization came as I studied Dynamic Programming and Optimal Control by Dmitri Bertsekas and 'A Unified Framework for Sequential Decisions'. Both these readings (the latter being a chance discovery), studied together, emphasize how well connected the fields of Heuristic Search, Optimal Control, RL and MDPs are - they can all be looked at through the same Dynamic Programming lens.

For the first time, I now have a good idea who my ancestors are (ideologically speaking). Where once I could only look up to Dijkstra and Judea Pearl, I now have all the stars of optimal control and DP to look up to - Bellman, Bertsekas and so on.

Shivam Vats on #log,

A Guide to Planning Using Heuristic-based Lattice Search

Planning, or more generally decision-making, is a fundamental problem in Artificial Intelligence. Planning is useful whenever there is some notion of a goal and we would like to come up with a plan that helps us achieve the goal in a reasonable amount of time. Formally, we need to have a state space S, a start state s, a set of goals G and a cost function C that we would like to minimize. The goal of planning, then, is to come up with a plan that takes us from the start state to one of the goal states. The plan could be defined in various ways: it could be a set of states $n \in S$.
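To ground the formal definition, here is a small sketch of A* search, one common heuristic search algorithm, over a 4-connected grid lattice. The grid, the unit step costs, the Manhattan-distance heuristic and all function names are assumptions chosen for illustration; other lattices and heuristics fit the same interface.

```python
import heapq

def astar(start, goals, neighbors, heuristic):
    """Return (path, cost) from start to the cheapest-to-reach goal.

    neighbors(s) yields (next_state, step_cost) pairs; heuristic(s) is
    assumed to be a consistent lower bound on the remaining cost.
    """
    frontier = [(heuristic(start), 0.0, start)]
    best_cost = {start: 0.0}
    parent = {start: None}
    while frontier:
        _, g, s = heapq.heappop(frontier)
        if g > best_cost[s]:
            continue  # stale entry: a cheaper route to s was found later
        if s in goals:
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1], g
        for s2, c in neighbors(s):
            g2 = g + c
            if g2 < best_cost.get(s2, float("inf")):
                best_cost[s2] = g2
                parent[s2] = s
                heapq.heappush(frontier, (g2 + heuristic(s2), g2, s2))
    return None, float("inf")  # no goal is reachable

def grid_neighbors(blocked, width, height):
    """Successor function for a 4-connected grid with unit step costs."""
    def neighbors(s):
        x, y = s
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if 0 <= nxt[0] < width and 0 <= nxt[1] < height and nxt not in blocked:
                yield nxt, 1.0
    return neighbors

def manhattan_to(goals):
    """Manhattan distance to the nearest goal cell."""
    return lambda s: min(abs(s[0] - gx) + abs(s[1] - gy) for gx, gy in goals)
```

Here the state space S is the set of free grid cells, the plan is the returned list of states, and the cost function C is the number of moves.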