- Reinforcement learning is a sequential decision-making approach in which an agent maximizes cumulative reward by interacting with an environment.
- Model-based and model-free methods, deep RL and multi-agent RL enable applications in robotics, vision, healthcare, finance and large-scale operations.
- Deploying RL successfully in the enterprise requires simulation, substantial compute, MLOps, domain experts and clear business KPIs.
- The main challenges are data efficiency, stability, bias, explainability and safe transfer from simulation to the real world.
Reinforcement learning (RL) has moved from an academic curiosity to one of the most powerful approaches for building adaptive decision-making systems. Instead of learning from labeled datasets, RL agents learn directly from interaction, trial and error, and delayed feedback. That shift changes everything: how we design algorithms, how we build infrastructure and how we apply AI to real business value.
To understand what it means to apply reinforcement learning in practice, you have to connect several layers at once: the mathematical foundations (policies, rewards, value functions), the algorithmic toolbox (Q-learning, policy gradients, deep RL), the engineering pieces (simulators, GPUs, MLOps) and, crucially, the strategic questions facing CIOs and leaders (ROI, risk, integration with legacy processes, regulation). This article covers the landscape end to end, with a focus on practical implementation rather than textbook definitions alone.
What exactly is reinforcement learning (and how does it differ from classical ML)
Reinforcement learning is a learning paradigm in which an agent discovers how to behave by interacting with an environment and receiving feedback in the form of rewards or penalties. The agent is not given correct labels as in supervised learning, and it is not simply finding structure in data as in unsupervised learning. Instead, it has to work out which actions lead to the highest cumulative reward over time.
Formally, most RL problems are modeled as Markov Decision Processes (MDPs). At each time step the environment is in some state, the agent chooses an action, and the environment transitions to a new state and returns a scalar reward. The goal is to learn a policy, a mapping from states to actions, that maximizes the expected long-term return rather than only the immediate reward.
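To make the long-term objective concrete, here is a minimal sketch of the discounted return that many MDP formulations maximize in expectation. The discount factor `gamma`, the function name `discounted_return` and the two reward sequences are illustrative assumptions, not something the text above prescribes.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * rewards[t], accumulated from the last step backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Two hypothetical reward sequences: a small payoff now vs. a larger one later.
greedy_now = [1.0, 0.0, 0.0, 0.0]
patient = [0.0, 0.0, 0.0, 5.0]

for name, rewards in (("greedy_now", greedy_now), ("patient", patient)):
    print(name, round(discounted_return(rewards, gamma=0.9), 3))
```

With `gamma=0.9` the delayed payoff still scores higher than the immediate one, which is the behavior an agent optimizing long-term return should prefer. A smaller `gamma` would shift the balance toward immediate rewards.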
This creates a major difference from traditional machine learning: instead of minimizing a fixed loss on a prepared dataset, RL agents optimize a moving objective defined by interaction. They must balance the exploration-exploitation trade-off: sometimes exploit what already looks good, sometimes try unfamiliar actions that may lead to better long-term outcomes.
From a systems perspective, another important difference is that in RL "the dataset is the environment itself". In supervised ML you ask "what historical data do we have?", whereas in RL the central question is "can we simulate or emulate the environment in which decisions are made?". That is why high-fidelity simulators and digital twins are central to serious RL deployments.
Core building blocks: agent, environment, policy and rewards
Any reinforcement learning setup, from a toy game bot to an industrial controller, revolves around a small set of core components. Understanding them clearly matters more than memorizing individual algorithms.
The agent is the decision-maker we are training. It can be a software service that sets prices, a robot arm controlling its motors, a trading algorithm choosing orders or a recommendation engine choosing what to show a user. The agent is the component that produces actions.
The environment is the world in which the agent operates and which responds to its actions. It can be a physics simulator, a logistics network, a market, a video game emulator or a hospital workflow. It exposes the state (or an observation of it), defines which actions are legal, and returns the next state and a numeric reward after each action.
The policy defines the agent's behavior: given the observed state, which action should it take? Policies can be simple tables (for small problems), linear models or deep neural networks, and they can be deterministic or stochastic. The main goal of training is to improve the policy so that it yields better long-term rewards.
The reward signal defines what "success" means in the environment. Every action leads to a scalar reward, which may be positive, negative or zero. Unlike the labels in supervised learning, rewards are often sparse and delayed: a self-driving car is rewarded for completing a route safely and efficiently, but individual decisions may not look clearly good or bad at the moment they are taken.
Closely related is the value function, which estimates how good a state (or a state-action pair) is in terms of expected future reward. While rewards are immediate, the value function captures long-term benefit, allowing the agent to avoid short-term gains that turn out badly later. In many RL algorithms, learning value functions is as important as learning the policy itself.
Model-based vs model-free reinforcement learning
One of the most important design decisions when applying RL is whether or not you rely on a model of the environment. This splits the field into model-based and model-free approaches, with far-reaching practical consequences.
Model-based RL assumes that you know, or can learn, a model of how the environment evolves. Given a state and an action, that model predicts the next state and the reward you are likely to observe. Once you have such a model, you can plan by simulating many hypothetical action sequences and choosing the one with the highest expected return. This is especially useful when real-world experimentation is expensive, risky or slow, for example in power grids, industrial processes or medical treatment.
A typical model-based workflow looks like this: the agent interacts with the environment, collects transitions (state, action, reward, next state), fits or updates a transition model, and then uses that model to simulate different policies internally. By rolling out possible futures in silico, the agent can evaluate strategies without spending real-world resources.
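The workflow above can be sketched for a tiny tabular problem. This is a minimal illustration under stated assumptions: the two-state environment, the logged transitions, and the helper names `fit_model` and `plan` are all hypothetical, and value iteration is just one of several possible planning methods.

```python
from collections import defaultdict


def fit_model(transitions):
    """Estimate P(s' | s, a) and the mean reward from (s, a, r, s_next) tuples."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for s, a, r, s_next in transitions:
        counts[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r
    model = {}
    for sa, next_counts in counts.items():
        n = sum(next_counts.values())
        probs = {s2: c / n for s2, c in next_counts.items()}
        model[sa] = (probs, reward_sum[sa] / n)
    return model


def plan(model, states, actions, gamma=0.9, sweeps=100):
    """Run value iteration on the learned model and return a greedy policy."""
    v = {s: 0.0 for s in states}

    def q(s, a):
        if (s, a) not in model:
            return 0.0  # assumption: unseen state-action pairs are worth 0
        probs, r = model[(s, a)]
        return r + gamma * sum(p * v[s2] for s2, p in probs.items())

    for _ in range(sweeps):
        v = {s: max(q(s, a) for a in actions) for s in states}
    return {s: max(actions, key=lambda a: q(s, a)) for s in states}


# Hypothetical logged experience: state 1 pays 1.0 per step if the agent stays.
logged = [(0, "stay", 0.1, 0), (0, "go", 0.0, 1),
          (1, "stay", 1.0, 1), (1, "go", 0.0, 0)]
policy = plan(fit_model(logged), states=[0, 1], actions=["stay", "go"])
print(policy)
```

All planning happens against the fitted model, so no further real interaction is needed once the transitions are logged. In practice the model would be refit as new data arrives, and its errors would limit how far ahead it is safe to plan.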
In contrast, model-free RL does not build an explicit model of the environment and learns behavior directly from experience. Algorithms such as Q-learning and most policy-gradient methods focus on improving value functions or policies from observed rewards and next states, using bootstrapping techniques rather than planning ahead with a learned dynamics model.
Model-free methods work well when the environment is large, complex, unknown or constantly changing, and when online or simulated trial and error is cheap. Think of fleets of autonomous vehicles trained in large-scale driving simulators, or a game-playing agent exploring millions of episodes where physical safety is not a concern.
Key reinforcement learning algorithms and families
Under the hood, most RL implementations today use variants of a few major algorithm families: value-based methods, policy-gradient methods and actor-critic hybrids. On top of these, neural networks extend RL to large-scale problems such as perception and complex control.
Value-based methods, such as Q-learning, learn a function that estimates the expected return of taking an action in a given state and then act greedily with respect to it. In tabular Q-learning you maintain a table of Q(s,a) values and update it with temporal-difference (TD) updates that bootstrap from the current estimates. When the state space becomes large or continuous, deep Q-networks (DQN) replace the table with a neural network, typically a convolutional network for image inputs.
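As a rough illustration of the tabular case, the sketch below runs Q-learning on a five-cell corridor. The corridor environment, the hyperparameters and the random tie-breaking rule are assumptions chosen to keep the example small; only the TD update itself follows the standard Q-learning rule.

```python
import random

N_STATES, GOAL = 5, 4      # hypothetical corridor: cells 0..4, reward at cell 4
ACTIONS = (-1, +1)         # move left or right


def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)            # explore
            else:
                best = max(q[(s, b)] for b in ACTIONS)
                # exploit, breaking ties at random
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            s2, r, done = step(s, a)
            # TD target bootstraps from the current estimate of the next state
            target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
    return q


q_table = train()
greedy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)}
print(greedy)
```

After training, the greedy policy moves right in every non-terminal cell. A DQN keeps the same update but replaces the `q` dictionary with a neural network trained by gradient descent on the TD error.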
Temporal-difference learning is the core idea behind many RL algorithms: instead of waiting until the end of an episode to compute the actual return (as Monte Carlo methods do), TD methods update their estimates from other learned estimates. This bootstrapping makes learning more efficient but also introduces stability problems.
Policy-gradient methods optimize the policy parameters directly by estimating the gradient of the expected return with respect to those parameters. Rather than learning Q-values and then choosing actions greedily, these methods adjust the probability distribution over actions so that high-reward trajectories become more likely. Algorithms such as REINFORCE, Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) are widely used in continuous control and robotics.
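A minimal sketch of the policy-gradient idea, reduced to a one-step problem (a two-armed bandit) so it fits in a few lines. The arm reward probabilities, learning rate and running-average baseline are illustrative assumptions; full REINFORCE applies the same log-probability gradient to multi-step trajectories.

```python
import math
import random


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_bandit(arm_means=(0.2, 0.8), steps=2000, lr=0.1, seed=0):
    """Softmax policy trained with the score-function (REINFORCE) gradient."""
    rng = random.Random(seed)
    prefs = [0.0] * len(arm_means)   # policy parameters, one per action
    baseline = 0.0                   # running mean reward, reduces variance
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        a = rng.choices(range(len(prefs)), weights=probs)[0]
        r = 1.0 if rng.random() < arm_means[a] else 0.0
        advantage = r - baseline
        baseline += (r - baseline) / t
        for i in range(len(prefs)):
            # gradient of log pi(a) w.r.t. prefs[i] for a softmax policy
            grad_log = (1.0 if i == a else 0.0) - probs[i]
            prefs[i] += lr * advantage * grad_log
    return softmax(prefs)


final_probs = reinforce_bandit()
print([round(p, 3) for p in final_probs])
```

The policy never stores action values. It only shifts probability mass toward actions whose sampled reward beat the baseline, which is the mechanism that TRPO and PPO refine with constraints on how far each update may move the policy.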
Actor-critic methods combine both worlds by maintaining an explicit policy (the actor) and a value function (the critic). The critic guides the actor's updates by providing lower-variance estimates of how advantageous each action is. Popular deep actor-critic variants include A2C/A3C, DDPG (for continuous actions) and SAC, all of which have been successful in industrial and research settings.
As problems have grown harder, researchers have proposed refinements such as Double Q-learning, Dueling DQN, Bootstrapped DQN and distributional RL. For example, Double Q-learning uses two separate estimators to reduce overestimation bias, while Bootstrapped DQN maintains several Q-value heads and encourages deep exploration by sampling a different head for each episode.
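To show how two estimators reduce overestimation, here is a sketch of a single tabular Double Q-learning update. The dictionary-based tables, the function name `double_q_update` and the example values are assumptions for illustration.

```python
import random


def double_q_update(qa, qb, s, a, r, s2, actions, done,
                    alpha=0.1, gamma=0.99, rng=random):
    """One tabular Double Q-learning step; qa and qb map (state, action) -> value."""
    # A coin flip decides which table learns on this step.
    learn, evaluate = (qa, qb) if rng.random() < 0.5 else (qb, qa)
    if done:
        target = r
    else:
        # The learning table selects the next action...
        best = max(actions, key=lambda b: learn.get((s2, b), 0.0))
        # ...and the other table evaluates it, decoupling selection from evaluation.
        target = r + gamma * evaluate.get((s2, best), 0.0)
    old = learn.get((s, a), 0.0)
    learn[(s, a)] = old + alpha * (target - old)


# Hypothetical tables that disagree about which next action is best.
table_a = {("s2", "x"): 1.0, ("s2", "y"): 0.0}
table_b = {("s2", "x"): 0.0, ("s2", "y"): 2.0}
double_q_update(table_a, table_b, "s", "a", 1.0, "s2", ["x", "y"],
                done=False, rng=random.Random(0))
print(table_a.get(("s", "a")), table_b.get(("s", "a")))
```

In this example each table's favorite next action is scored at 0.0 by the other table, so the target is just the immediate reward. Plain Q-learning on either table would have bootstrapped from its own optimistic maximum (1.0 or 2.0) instead.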
Reinforcement learning meets deep learning: deep RL
Deep reinforcement learning (deep RL) is simply reinforcement learning in which the policy, the value function or the world model is represented by a deep neural network. This has allowed RL to handle unstructured inputs such as images, audio or high-dimensional vectors that classical tables and linear models cannot cope with.
A landmark example is the use of convolutional neural networks as Q-value approximators in Atari games. The DQN algorithm takes raw screen pixels as input, processes them with convolutional layers and outputs estimated action values. This allowed agents to learn strategies at or above human level directly from images, without hand-crafted features or explicit knowledge of the game rules.
In computer vision more broadly, deep RL has been combined with attention mechanisms and specialized architectures to tackle segmentation, object detection, depth estimation and image-based control. For example, selective attention models can focus computational resources on the relevant regions of an image, guided by reward signals that reflect task performance.
However, deep RL is compute-hungry and notoriously unstable. Training large networks with bootstrapped targets, non-stationary data and delayed rewards can easily diverge if hyperparameters, exploration strategies and network architectures are not tuned carefully. This is one of the main reasons why robust simulators and powerful hardware (GPUs, TPUs, distributed clusters) are non-negotiable in real projects.
From theory to practice: a typical RL workflow
Implementing an RL system is not just a matter of picking an algorithm. It means building a complete pipeline that runs from the business problem through environment modeling, algorithm selection, training, validation, deployment and monitoring. The steps are interdependent and usually iterative.
First, you define the decision problem and check whether it is genuinely sequential and reward-driven. Many business tasks are a poor fit for RL and are better solved with supervised models or even simple heuristics. Good RL candidates involve long-horizon interaction, feedback loops and changing conditions: route planning, resource allocation, pricing over time, robot control, long-horizon recommendation strategies.
Second, you formalize the environment as an MDP: states, actions, rewards and transitions. This requires deep domain knowledge. What information does the agent observe at each step, which actions are allowed, how do those actions change the system, and which reward structure truly matches the business objectives? A poorly designed reward function can lead to "reward hacking", where agents maximize the numeric score in ways that conflict with the real goals.
Third, you decide whether to build a simulator or rely on historical interaction data. When the real environment is dangerous or slow (production lines, power systems, physical robots), a high-fidelity digital twin is essential. In less critical settings, such as online recommendations or some operational decisions, you can start with offline RL on logs and carefully introduce online exploration later.
Fourth, you choose and configure an algorithm family that fits your state and action spaces, data regime and constraints. Tabular Q-learning can be enough for small, discrete problems; DQN-style architectures work for discrete control from images; actor-critic methods are the usual choice for continuous actions; model-based methods help when you can simulate cheaply but real data is expensive.
Finally, you build an MLOps pipeline around the RL agent: experiment tracking, reproducible training, evaluation against baselines, safe deployment strategies and continuous monitoring. This pipeline has to version not only models but also environments, because changes in the simulation dynamics can drastically change the agent's behavior.
Real-world applications of reinforcement learning
Despite its complexity, RL is already used in a surprisingly wide range of real systems, often behind the scenes. Robotics, logistics, finance, healthcare and digital platforms are among the areas with the most traction.
In robotics, RL teaches robots complex motor skills, navigation in cluttered spaces and precise manipulation. Instead of hand-coding every trajectory, robots learn through repeated interaction, gradually improving grasping, assembly or locomotion. Deep RL with visual inputs lets them reason directly from camera feeds and adapt to changing conditions.
Games have become a proving ground for RL research and have produced some of its most visible results. Agents trained with RL have mastered classic Atari games, Go, chess, StarCraft and other complex strategy games, often surpassing top human experts. These successes demonstrate RL's ability to find long-term strategies in very large decision spaces.
In finance, RL has been applied to portfolio management, trading strategies and risk management. Agents learn to allocate capital, open and close positions or rebalance portfolios as market conditions change, optimizing risk-adjusted returns. Here, constraints such as transaction costs, regulatory limits and risk appetite have to be built into the reward and environment design.
Healthcare is another promising but challenging domain: RL is used to optimize treatment policies, schedule radiation therapy or manage chronic conditions over time. By modeling the patient's state and the available interventions as an MDP, an RL agent can recommend courses of action aimed at long-term health outcomes. Because the stakes are high, explainability, fairness and safety are non-negotiable.
In transport and logistics, RL improves routing, fleet management and warehouse operations. From delivery vehicle routing that reacts to real-time conditions to robotic picking and packing stations, RL agents target lower costs, faster delivery and higher reliability by learning from continuous feedback.
Vision systems driven by reinforcement learning
Computer vision is a natural fit for reinforcement learning, especially when systems have to act on visual perception rather than only produce static predictions. Deep RL extends conventional vision models by letting their outputs drive actions that are continuously evaluated by a reward function.
For example, vision-based RL systems for drones learn to avoid obstacles and navigate complex environments using only a camera. By training in realistic simulators, drones can experience millions of flight scenarios and learn behaviors that can transfer to the real world. Metrics such as obstacle-avoidance success rate or task completion time serve as the reward that shapes the behavior.
In industrial inspection, RL-enhanced vision systems decide where and how to look for defects instead of detecting them in a fixed way. Rather than scanning every product identically, an RL policy can choose zoom levels, angles or regions of interest based on what it has already seen, improving both speed and accuracy.
Medical imaging also benefits from RL, where policies can guide image acquisition, attention to suspicious regions or sequential diagnostic tests. The goal is not only to detect abnormalities but to optimize the whole diagnostic workflow under constraints such as time, cost and patient safety.
In general, combining vision with RL turns static perception systems into active perception-action loops that adjust their behavior in real time. That adaptability is exactly what many real-world tasks require, from autonomous driving to intelligent surveillance.
Multi-agent reinforcement learning and collaborative perception
Many real scenarios involve not one intelligent agent but many of them interacting in a shared environment. Multi-agent reinforcement learning (MARL) addresses these settings, in which agents may cooperate, compete or both.
In collaborative perception, multiple robots, drones or cameras coordinate to achieve a common goal, such as mapping a disaster area or monitoring a large industrial plant. Each agent only has a local view, so sharing information and learning cooperative policies becomes essential.
Key characteristics of multi-agent systems include decentralized decision-making, communication protocols and task specialization. Instead of a single central controller, each agent makes local decisions and sometimes communicates explicit information to the others. Some agents specialize in navigation, others in perception or manipulation, and RL has to learn policies that exploit this division of labor.
MARL raises new challenges, such as non-stationarity (because the other agents' policies keep changing during training) and scalability. When it works, though, it can be more robust and effective than a single-agent system: if one agent fails, the others can compensate and adapt.
Beyond robotics and vision, multi-agent RL supports applications in traffic management, distributed energy systems, ad auctions and any domain where many decision-makers interact strategically. For implementers, the design of communication protocols, credit assignment and training schemes becomes as important as the underlying RL algorithm.
Current limits and challenges of reinforcement learning
For all its promise, RL is not a silver bullet, and it comes with serious obstacles that any implementation team has to face head-on. Ignoring them usually leads to unstable systems, wasted compute budgets or prototypes that never leave the lab.
Data and sample efficiency is the biggest pain point: most RL algorithms need a very large number of interactions to learn good policies. In simulated games that is acceptable; in physical systems or expensive domains it is not. Model-based methods, offline RL and better exploration strategies all try to make RL more sample-efficient.
The exploration-exploitation dilemma is not just a theoretical curiosity but a practical engineering problem. Agents that explore too little get stuck in suboptimal behavior; agents that explore too much waste resources or take unsafe actions. Techniques such as epsilon-greedy policies, optimistic initialization, curiosity bonuses or variants of Thompson sampling are used, but tuning them remains problem-specific.
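As one example of these techniques, the sketch below applies Thompson sampling to a Bernoulli bandit. The success rates, the uniform Beta(1, 1) prior and the function name `thompson_bernoulli` are illustrative assumptions; the point is only to show how exploration shrinks as evidence accumulates.

```python
import random


def thompson_bernoulli(true_rates, rounds=2000, seed=0):
    """Return how often each arm was pulled under Thompson sampling."""
    rng = random.Random(seed)
    wins = [1] * len(true_rates)     # Beta(1, 1) prior for every arm
    losses = [1] * len(true_rates)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        # Sample a plausible success rate per arm from its posterior...
        samples = [rng.betavariate(w, l) for w, l in zip(wins, losses)]
        # ...and play the arm whose sample is highest.
        arm = samples.index(max(samples))
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
        pulls[arm] += 1
    return pulls


pull_counts = thompson_bernoulli((0.3, 0.5, 0.7))
print(pull_counts)
```

Uncertain arms still get sampled occasionally because their posteriors are wide, while well-understood weak arms are tried less and less. There is no exploration rate to tune, but the approach relies on having a reasonable posterior model of the rewards.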
Stability and convergence are another source of headaches: deep RL algorithms can oscillate, diverge or collapse abruptly when conditions change slightly. Seemingly minor changes to reward scaling, learning rate or network architecture can make or break training. That is why rigorous experimentation, ablations and monitoring are essential to any serious RL project.
Transfer learning and generalization across environments remain difficult. Agents often learn policies that are finely tuned to the training simulator or training conditions but fail when circumstances change: new lighting, different user behavior, policy changes or modified hardware. Techniques such as domain randomization, meta-learning and multi-task training help, but robust out-of-distribution performance is still an active research area.
Interpretability and transparency are particular concerns for deep RL. When policies are represented by large neural networks, understanding why a specific action was taken at a given moment is far from trivial. In regulated sectors such as finance and healthcare, "black-box" behavior is increasingly hard to accept, which is driving work on explainable RL and on tools for auditing policy behavior.
The strategic view for CIOs: when does RL make business sense?
From a leadership perspective, the key question is not "can we use RL?" but "should we use RL for this problem, and if so, when?". RL is a "second-wave" technology: it usually only makes sense once an organization already has solid data, analytics and supervised ML in place.
Good RL candidates share several traits: decisions are sequential, feedback is available, the environment can be simulated or replayed, and there are clear, measurable KPIs tied to long-term performance. Energy optimization, dynamic pricing, large-scale inventory, complex industrial control and long-term personalization are typical examples.
Before launching an RL initiative, CIOs should assess readiness in four areas: data, technology, talent and business value. On the data side, the question is not just how much data exists but whether interactions can be simulated. On the technology side, access to GPUs, distributed infrastructure and a solid MLOps stack is mandatory. On the talent side, teams need RL specialists and engineers who are comfortable with large-scale systems; it is also worth considering how AI agent teams are designed and built.
A critical step is to design the reward function together with domain experts so that it reflects the business objectives and their constraints. If the reward captures only a narrow slice (for example, revenue) and ignores the rest (compliance, fairness, safety, customer satisfaction), the agent will optimize the wrong thing and create risk instead of value.
Finally, RL business cases should link the reward accumulated by the agent directly to financial metrics: cost reduction, revenue uplift or efficiency gains. Without that link it is impossible to justify the total cost of ownership (simulation, compute, MLOps, maintenance) or to compare RL solutions against simpler baselines.
Engineering frameworks and toolchains for RL
On the engineering side, applying RL means assembling a stack of environments, libraries, training infrastructure and experimentation tools. Although the algorithmic ideas are general, the ecosystem you choose strongly affects productivity and reliability.
Environment frameworks provide standard ways for agents to interact with simulated or wrapped real systems. Classic platforms expose a simple API: reset the environment, take a step with an action, and receive the new state, the reward and a termination flag. A large catalog of environments, from Atari and classic video games to driving simulators and industrial scenarios, enables fast prototyping and benchmarking.
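The reset/step interaction pattern can be sketched without any external dependency. The method names `reset` and `step` mirror the description above; the `CorridorEnv` class, its three-value return and its reward rule are assumptions made for illustration and are not the API of any specific library.

```python
class CorridorEnv:
    """Hypothetical environment: walk right along a corridor to reach the goal."""

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps
        self.pos = 0
        self.t = 0

    def reset(self):
        self.pos, self.t = 0, 0
        return self.pos                      # initial state

    def step(self, action):
        # action: 1 moves right, anything else moves left
        delta = 1 if action == 1 else -1
        self.pos = min(max(self.pos + delta, 0), self.length - 1)
        self.t += 1
        reached = self.pos == self.length - 1
        reward = 1.0 if reached else 0.0
        done = reached or self.t >= self.max_steps
        return self.pos, reward, done        # new state, reward, termination flag


def run_episode(env, policy):
    """Generic agent-environment loop that works with any reset/step environment."""
    state, total, done = env.reset(), 0.0, False
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
    return total


print(run_episode(CorridorEnv(), policy=lambda state: 1))
```

Because `run_episode` only depends on `reset` and `step`, the same loop can drive a simulator, a replayed log or a wrapped production system, which is what makes a shared interface valuable for benchmarking.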
On top of the environments, RL libraries implement a range of algorithms (DQN, PPO, A2C, DDPG, SAC, Bootstrapped DQN and others) with sensible defaults and tuning hooks. These libraries are usually tightly integrated with deep learning frameworks such as TensorFlow or PyTorch, giving you GPU acceleration, automatic differentiation and a mature tooling ecosystem.
Higher-level frameworks add features such as distributed training, off-policy replay buffers, population-based training, hyperparameter sweeps and support for non-standard environments (such as driving simulators, first-person 3D games or custom industrial models). For large projects, the ability to train at scale, reproduce experiments and compare many variants becomes a key differentiator.
Finally, the MLOps layer ties everything together: experiment tracking, data and environment versioning, continuous integration and deployment, monitoring and alerting. In RL you have to treat the environment definition as a first-class artifact: any change to the dynamics, the reward structure or the constraints effectively creates a new "dataset" that can invalidate earlier results.
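One lightweight way to treat the environment definition as a versioned artifact is to fingerprint its specification and store that fingerprint with every training run. The sketch below assumes the environment can be described by a JSON-serializable dictionary; the field names and the `env_fingerprint` helper are hypothetical.

```python
import hashlib
import json


def env_fingerprint(spec):
    """Stable short hash of an environment specification (a JSON-serializable dict)."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Hypothetical specification covering dynamics, reward and constraints.
spec_v1 = {
    "dynamics": {"friction": 0.8, "time_step": 0.05},
    "reward": {"goal_bonus": 1.0, "step_cost": 0.01},
    "constraints": {"max_speed": 2.0},
}
spec_v2 = {**spec_v1, "reward": {"goal_bonus": 1.0, "step_cost": 0.02}}

print(env_fingerprint(spec_v1), env_fingerprint(spec_v2))
```

Changing a single reward coefficient yields a different fingerprint, so results logged under the old value cannot be silently compared with runs under the new one. Sorting the keys makes the hash independent of dictionary ordering.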
Risk, ethics and bias in reinforcement learning systems
As RL systems move into high-stakes domains, risk management and ethics stop being optional extras and become central design concerns. Because agents push hard to maximize reward, they can exploit loopholes, biases or flaws in the environment design in ways that humans did not anticipate.
Bias in training data or simulations can lead to discriminatory policies, especially in vision-based or decision-making systems that deal with people. If some groups are under-represented or misrepresented in the environment, the learned policy may perform poorly or unfairly for them. This is not unique to RL, but the interaction loop can amplify such effects.
Tools for auditing fairness, measuring bias and enforcing constraints need to be integrated into the RL pipeline. Regular reviews of the environment design, the reward structure and performance across subgroups are required, along with technical tools such as fairness-aware optimization, bias detection frameworks and explanation methods designed for RL.
Another concern is that deep RL policies are "black boxes". Regulators and stakeholders increasingly demand explanations for automated decisions, especially when they affect credit, health, employment or safety. Work on explainable RL aims to extract human-understandable rationales, highlight influential states and test for undesirable behavior.
Finally, many risk management frameworks stress the need for continuous monitoring, traceability and rigorous validation of reward functions and policies. In regulated settings, logs of actions, states and outcomes must be stored and auditable, and rollback mechanisms must be ready in case an agent behaves unexpectedly.
From simulation to the real world: closing the sim-to-real gap
Most large RL projects rely heavily on simulation during training and then face the problem of transferring policies to the real world. Differences between simulated and real environments (lighting, textures, noise, unmodeled dynamics, human behavior) can cause large drops in performance.
This difference, known as the sim-to-real gap, is measured in various ways, including distributional metrics that compare simulated observations with real ones. A large divergence means the policy has never seen anything resembling the real data it will encounter, and its behavior may be brittle.
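As a rough illustration of a distributional metric, the sketch below computes the 1-Wasserstein distance between two equal-sized samples of a single scalar observation feature. The choice of this particular metric, the feature (image brightness) and the sample values are assumptions; the text above does not name a specific measure.

```python
def wasserstein_1d(sim, real):
    """1-Wasserstein distance between two equal-sized 1-D samples.

    For equal sample sizes this is the mean absolute difference of the sorted values.
    """
    if len(sim) != len(real) or not sim:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(sim), sorted(real))) / len(sim)


# Hypothetical brightness readings from simulated and real camera frames.
sim_brightness = [0.50, 0.52, 0.48, 0.51, 0.49]
real_brightness = [0.70, 0.65, 0.72, 0.68, 0.75]

gap = wasserstein_1d(sim_brightness, real_brightness)
print(round(gap, 3))
```

A gap close to zero suggests the simulator covers the real distribution for that feature, while a large value flags a feature worth randomizing or recalibrating. Real observations are high-dimensional, so in practice such a check would be run per feature or on learned embeddings.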
To reduce the gap, practitioners use domain randomization (varying textures, lighting and physics during training), fine-tuning on real data, robust policy optimization and knowledge-transfer techniques. The idea is to expose the agent to as much variation as possible so that it learns general strategies instead of memorizing the quirks of a single simulator.
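Domain randomization can be as simple as drawing new simulator parameters at the start of every training episode. In the sketch below the parameter names, their ranges and the `SimParams` container are hypothetical; a real project would pass the sampled values to its simulator's configuration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    friction: float
    light_level: float
    sensor_noise_std: float


# Assumed plausible ranges; in practice these come from domain experts.
RANGES = {
    "friction": (0.5, 1.5),
    "light_level": (0.2, 1.0),
    "sensor_noise_std": (0.0, 0.05),
}


def sample_params(rng):
    """Draw one independent uniform sample per simulator parameter."""
    return SimParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})


def randomized_episodes(n, seed=0):
    """Yield a fresh parameter set for each of n training episodes."""
    rng = random.Random(seed)
    for _ in range(n):
        yield sample_params(rng)


for params in randomized_episodes(3):
    print(params)
```

Wider ranges tend to improve robustness at the cost of slower learning, so the ranges themselves become hyperparameters that should be versioned alongside the environment.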
In safety-critical applications, deployment is staged: agents first run in "shadow" mode, making recommendations that are logged but not executed, and then gain autonomy gradually as their performance and robustness are validated. This approach lets you test policies under real conditions without handing them full control prematurely.
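A shadow-mode wrapper can be sketched in a few lines. The `ShadowDeployment` class, the agreement metric and the toy policies below are assumptions for illustration; the essential property is that only the trusted baseline's action is ever returned for execution.

```python
class ShadowDeployment:
    """Run a candidate policy alongside a trusted baseline without executing it."""

    def __init__(self, baseline, candidate):
        self.baseline = baseline
        self.candidate = candidate
        self.log = []

    def act(self, observation):
        executed = self.baseline(observation)
        proposed = self.candidate(observation)   # recorded, never executed
        self.log.append({"obs": observation, "executed": executed, "proposed": proposed})
        return executed

    def agreement_rate(self):
        """Fraction of logged steps where the candidate matched the baseline."""
        if not self.log:
            return 0.0
        matches = sum(entry["executed"] == entry["proposed"] for entry in self.log)
        return matches / len(self.log)


# Hypothetical policies: open a valve when a sensor reading exceeds a threshold.
shadow = ShadowDeployment(baseline=lambda x: x > 0, candidate=lambda x: x > 1)
actions = [shadow.act(reading) for reading in (0, 1, 2, 3)]
print(actions, shadow.agreement_rate())
```

The log of disagreements is what reviewers would inspect before granting the candidate any autonomy. Agreement with the baseline is only one possible gate, since a better policy is expected to disagree in some cases.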
Looking ahead, advances in high-fidelity simulation, modeling and hybrid model-based/model-free techniques will keep narrowing the sim-to-real gap, making RL practical for more real-world systems.
Bringing all of these pieces together, from MDP basics and algorithm design to simulation, ethics, infrastructure and business alignment, is what turns reinforcement learning from a clever idea into a deployable technology that can create value in complex, changing environments.