In math, we call these relations functions. It's our way of representing a set of patterns, a mapping, a relationship between many variables.

No matter what machine learning model we use, no matter what dataset we use, the goal of machine learning is to optimize for an objective, and by doing so we are approximating a function.

We'll see that there exists a valley, the minimum. We'll use our error to help us compute the partial derivative with respect to each weight value we have, and this gives us our gradient.

The gradient represents the change in the error when the weights are changed by a very small value from their original value.

We use the gradient to update the values of our weights in a direction such that the error is minimized.
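As a concrete sketch of that update rule, here's plain gradient descent on a made-up one-weight error surface; the error function, learning rate, and starting weight are all illustrative assumptions, not from any particular dataset:

```python
# Gradient descent sketch on a toy error surface with a valley at w = 3.

def error(w):
    return (w - 3.0) ** 2          # the "valley" (minimum) sits at w = 3

def gradient(w):
    return 2.0 * (w - 3.0)         # derivative of the error w.r.t. the weight

w = 0.0                            # initial weight guess
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * gradient(w)   # step in the direction that lowers error

print(round(w, 4))                 # ends up very close to 3.0
```

If the learning rate is too large the weight overshoots the valley and bounces around; too small and it crawls. That trade-off is exactly what second order methods try to sidestep.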

The first derivative tells us if the function is increasing or decreasing at a certain point, and the second derivative tells us if the first derivative is increasing or decreasing, which hints at its curvature.

First order methods provide us with a line that is tangential to a point on an error surface, and second order methods provide us with a quadratic surface that kisses the curvature of the error surface.

Haha, get a room, you two. The advantage, then, of second order methods is that they don't ignore the curvature of the error surface, and in terms of step-wise performance, they are better.

Let's look at a popular second order optimization technique called Newton's method, named after the dude who invented calculus.

Whose name was…

There are actually two versions of Newton's method. The first version is for finding the roots of a polynomial, all those points where it intersects the x-axis.

So if you threw a ball and recorded its trajectory, finding the root of the equation would tell you exactly what time it hits the ground.

Let's say we have a function f of x and some initial guessed solution. Newton's method says that we first find the slope of the tangent line at our guess point, then find the point at which the tangent line intersects the x-axis.
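A minimal sketch of that root-finding loop, using an assumed example function f(x) = x² − 2 (whose positive root is √2) and an assumed starting guess of 1.0:

```python
# Newton's method for root finding: follow the tangent line down
# to the x-axis, and repeat from where it lands.

def f(x):
    return x ** 2 - 2.0            # example polynomial, root at sqrt(2)

def f_prime(x):
    return 2.0 * x                 # slope of the tangent line at x

x = 1.0                            # initial guessed solution
for _ in range(10):
    # x-intercept of the tangent line at the current guess
    x = x - f(x) / f_prime(x)

print(round(x, 6))                 # ~1.414214, i.e. sqrt(2)
```

Each iteration roughly doubles the number of correct digits when the guess is close, which is why a handful of steps is usually enough.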

At a high level, given a random starting location, we construct a quadratic approximation to the objective function that matches the first and second derivative values at that point.

OK, let's go over two cases of Newton's Method for optimization to learn more: a 1D case and a 2D case.

In the first case we've got a 1-dimensional function. We can obtain a quadratic approximation at a given point of the function using what's called a Taylor series expansion, neglecting terms of order three or higher.
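Minimizing that quadratic approximation gives the 1D Newton update x ← x − f′(x)/f″(x). Here's a sketch on an assumed example objective, f(x) = x⁴ − 3x³ + 2, which has a minimum at x = 2.25; the starting point is also an arbitrary choice:

```python
# 1D Newton's method for optimization: at each point, minimize the
# local second-order Taylor (quadratic) approximation and jump there.

def f_prime(x):
    return 4 * x ** 3 - 9 * x ** 2     # first derivative of the objective

def f_double_prime(x):
    return 12 * x ** 2 - 18 * x        # second derivative (curvature)

x = 4.0                                # starting location
for _ in range(20):
    # the minimizer of the quadratic approximation centered at x
    x = x - f_prime(x) / f_double_prime(x)

print(round(x, 4))                     # ~2.25
```

Note this is exactly root finding applied to f′: optimization via Newton's method searches for a point where the first derivative is zero.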

A Taylor series is a representation of a function as an infinite sum of terms that are calculated from the values of the function's derivatives at a single point.

So when should you use a second order method? First order methods are usually less computationally expensive per step and less time expensive overall, converging pretty fast on large datasets, so second order methods pay off when the extra curvature information buys you bigger, better steps.

Here are the key points to remember: First order optimization techniques use the first derivative of a function to minimize it; second order optimization techniques use the second derivative. The Jacobian is a matrix of first partial derivatives and the Hessian is a matrix of second partial derivatives.
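To make the Hessian concrete, here's a single multivariate Newton step on an assumed 2D quadratic objective, f(x, y) = 3x² + 2y² − 12x − 8y, whose minimum is at (2, 2); since the objective is quadratic, one step lands exactly on the minimum:

```python
import numpy as np

# One Newton step in 2D: solve H p_step = gradient, then move by -p_step.

def gradient(p):
    x, y = p
    return np.array([6 * x - 12, 4 * y - 8])   # vector of first partials

def hessian(p):
    return np.array([[6.0, 0.0],               # matrix of second
                     [0.0, 4.0]])              # partial derivatives

p = np.array([10.0, -5.0])                     # arbitrary starting location
p = p - np.linalg.solve(hessian(p), gradient(p))  # the Newton step

print(p)                                       # [2. 2.]
```

In practice you solve the linear system rather than inverting the Hessian, and for high-dimensional models even that gets expensive, which is why quasi-Newton methods approximate the Hessian instead.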

And Newton's Method is a popular second order optimization technique that can sometimes outperform gradient descent. Last week's coding challenge winner is Alberto Garces.